Product Management for Mission Critical Systems and Thoughts on Elon Musk


10 years in the trade is not enough, you can’t cut it
I let you take a swing, and you bunted
For an easy out, I leave mc’s with doubt
Of exceeding, my name is [Elon Musk] and I’m proceeding,
Leading — Drop, Pharcyde, 1995 (historic video recorded in reverse)

Bugs on a Plane

A couple weeks ago, due to internal resource constraints, I needed to take an early call from an airplane still at the gate. I needed to talk with a customer about problems that they were experiencing with performance and resource utilization for a relatively mundane and routine query. The airplane and the customer presented competing priorities. I needed to play “Bop It”: (1) the Zoom mobile app’s mute button, (2) my noise canceling headphones activate/deactivate button, and (3) the flight attendant prompting spoken confirmation of my life-critical commitment to strangers as an exit row passenger. While thinking through complex technical details — “mute it,” “hear it”, “speak it.”

Amid the contradictory directives, there was some synchronicity between my life situation and my product’s situation. The customer wanted a promise. They wanted to know when x feature would be delivered and why it could not happen faster. I wanted my flight to take off as soon as possible. And the plane was flagged as needing critical maintenance before taking off. Just as it’s obviously better to take off in a plane that is safe than in one with a critical maintenance issue, it’s better to deliver a safe feature in a mission-critical system than to rush out something half-baked.

My work uniform and station for PM-ing databases

System Failures and Outages Are Like Plane Crashes

For some software engineers, system failures are plane crashes. Sadly, that’s not an exaggeration and is a reality that many readers will know about. There are many factors that contributed to such a horrific consequence so I don’t want to conflate most of my work on database technologies and related applications to life or death scenarios because it is not usually the case. As companies increasingly find uses for Apache Lucene-powered search engines (What Amazon gets from Giving Back to Lucene), the analogy does creep closer to home.

My grandma Joann relies on Siri and Alexa for lots of life stuff as her hand tremor grows stronger with age. Her grocery and Uber orders involve a multi-step process where many of the features are driven by Lucene under-the-hood. Beyond her own advanced usage of these voice interfaces for vital needs, her doctors need search and flexible query capabilities as well. The medical staff she sees regularly for ongoing care rely on research systems that are powered by the same subsystem for patient records, clinical pharmacology, and staying up-to-date on changes in academic research.

In an effort to understand the bar for stability, it’s important for me to understand my users. Ultimately, as a Product Manager I try to prioritize work and convey expectations based on what the users want and need — not based on what I, executives, or engineers want. Even if a plane crash may not be the consequence of your prioritization efforts, your customers’ livelihood could be at risk. I think about my grandmother’s reliance on voice search engines, aerospace systems, and my customers whenever I want to rush a risky rollout.

Communicating Transparently to Customers

One of the things that frustrates me so much about some airlines is an unwillingness to communicate transparently about timelines. When I later found myself stuck on the tarmac, it was after many rounds of delays being extended by another 20 minutes here, and another 20 minutes there. The new departure time was 11:07, until the clock struck 11:08. So then the update became 11:27. Once 11:28 rolled around, the departure time was 11:47. I don’t know the ins and outs of their data infrastructure and communication protocols, but there are enough crazies in the airports that I try to exploit the extra time to read more pages of whatever my nose is in. As a product manager, I try not to drip-feed delays like that. I want the customer to know my best guess on the timeline with a few weeks of buffer added.

Unpredictable performance or system behavior is like a constantly late takeoff, landing, luggage service, and estimated departure time. Generally, it’s a frustrating experience and can have serious revenue and retention implications. The only way you can alleviate customer pain here is to communicate with customers at regular touchpoints, or as soon as you know if it’s critical. I have a recurring reminder to inform customers just in case updates come in before we sync again.

Building computer infrastructure software with uncertain timelines affects customers much as commercial airlines affect frequent fliers when their timelines are consistently thrown for a loop. Product managers must be able to balance the prioritization of reliability, performance, and feature delivery. In the infrastructure-for-other-companies business, features come last. Safety/reliability first. Speed/performance second. Nevertheless, features remain of critical importance. If United weren’t the first US airline to launch an international on-board wifi service, I probably would not be the loyal customer I remain today, for instance. In other businesses where the systems are more single-purpose and can be changed or reverted on your own timeline, the ordering of these considerations can change.

Elon Musk, Twitter, and the Execution Tax of Engineering Brand

When Elon Musk completed his take-private chess move of Twitter Inc, it was met largely with scorn, even by some of my favorite business leaders and cultural critics. They were not completely wrong in my view. I don’t care for Donald Trump’s Twitter one bit, despite my disdain for cancel culture. I do not think that all employees should be forced to work in an office, yet I try to work from some office 5+ days a week. But Musk is experimenting and taking significant risks with the hope of reaping a significant reward, something he paid $44 billion for the right to do. In one area, I think he could prove to be right.

At many companies, either where I have been employed or worked with as a consultant, I have observed some product and engineering resources that seemingly do not do much. I will not ever reduce engineering down to something as simple as coding. That’s the easy part. However, there are many skilled professionals across the computing industry that will coast because of poor systems for understanding productivity and effectiveness. In other cases where there are robust systems for understanding productivity, technical resources can move at a varying pace because the job they do is extremely high risk and very stressful.

In other cases, companies will be scared to lose current and future employees to competition. I once had a friend in 2017 tell me that a member of his FAANG team made a lot of money and had not completed a single project in over one year. This lack of perceived productivity was deemed acceptable to “preserve the company’s engineering brand” for employees similar to the one in question. To be fair, I doubt my friend, a somewhat famous, brilliant, and neurodivergent engineer, considered any mentorship, collaboration, or education the colleague may have been performing. He may have only viewed writing code as important. In an online chat with this guy Alex, CEO of Moment and formerly Bing, I heard this idea again a few weeks ago, “there is an execution risk associated with engineering brand.” For most companies, your engineering brand is an incredibly precious asset worth preserving or racing to improve.

The Musk experiment is a test of prioritizing execution over engineering brand for a struggling business. I expect a few phenomena to surface.

  1. Twitter will survive because the systems were largely robust before Musk arrived.
  2. The team will reduce the system’s complexity, making it more resilient in the long run.
  3. The team will release new features faster due to the smaller congress of consensus.
  4. The company will report three quarters of profitability over the next 4
  5. The staff that does remain will be emboldened by their newly acquired badges of resilience.
  6. The average execution throughput will increase by necessity.
  7. Unless there are some really gracious things being done that are not yet public or formalized today, twenty percent of the staff will leave sometime over the next 12 months due to burnout, cognitive overload, exhaustion, or because they were fired.
  8. The other 80 percent will not leave for many years because they will feel rich. Shipping is the currency of software engineers, designers, and product managers. The opportunity to work at Twitter at this time will be liberating for some.
  9. Short-term thinkers on Wall Street and novice venture capitalists will take a more scrutinizing look at valuation multiples and operating margins of the entire industry, and conflate apples with oranges.
  10. If I am wrong about 1–9, who cares? Another social media company will fill the void!

With the public information I have reviewed, I think what Musk is doing will succeed. I do not know if that is a good thing or not yet for the broader world of problem solvers. For a subset of bloated businesses, there are lessons to learn from Elon Musk and adopt quickly before you are in trouble. For financial professionals that focus on squeezing lemonade out of distressed assets, you probably could learn something from Musk as well. For startups and growing businesses in desperate need of technical talent, most of these people apparently do not work there anymore so give ’em a good ole DM slide!

Based on the same information, I think that if most other company leaders took similar approaches to running their businesses, the results would be catastrophic. Here are the companies I suspect would struggle from a Musk-like shakeup:

  1. Technology companies that are still growing over 30% year over year cannot risk an exodus like the one observed at Twitter because you need a steady stream of new employees of all levels of experience to facilitate this growth.
  2. Technology companies that are in a transitional phase from one modality to multiple modalities, aka “transitioning from product to platform,” would risk doing more harm than good because the flexibility required to execute on multiple visions at once requires a healthy pulse on global and local extrema.
  3. Startups that are young and still finding their way to product-market fit would gain little, because they should already be operating on a skeleton crew.
  4. And, finally, companies that build infrastructure for other companies to run their businesses can face irreparable damage to their relationships, reputation, and ultimately the bottom line by prioritizing execution efficiency over reliability and performance. If the team is walking on pins and needles to ship features fast, the product will not be stable enough to generate sustained revenue.

For product managers building mission critical systems, execution efficiency is not as important as your customers. Focus on them.

Marcus

PS, for those who made it this far, I want to point out three things:

  1. This opinion is my own and not that of my employer or any of the companies with which I am affiliated. It’s a risky piece to write because it is an emotionally charged topic, but I hope this helps people who are trying to solve the world’s problems.
  2. This piece leaves open questions about preserving the engineering brand following a RIF. I’m not interested in writing that piece ahead of other planned pieces about technology, search, and machine learning. For those that are interested, I hear Stripe has done a really good job there. Maybe there are other companies that have as well. “If you don’t want to be here leave” has been shown to be an effective tactic in some cases but not all.
  3. I do want to express one final opinion here, which I feel is most important: the U.S. immigration laws are bullshit and in need of reform. The H-1B lottery system is fraught with inefficiencies and inequities that hold back the United States. If you want to work, have skills, and need sponsorship because you were recently laid off, please drop a comment with your email address spelled out to avoid the most basic of spam bots. While I cannot promise employment in the United States or anywhere, I will try to make introductions for people who were recently laid off from Twitter. I have referral powers at more than one company, and many of them are among the best on Earth.
© Marcus Eagan