The P0 Principle: A Revenue-First Approach to Product Issue Prioritization

Image by Pirmin Lenherr from Pixabay

This applies to B2B SaaS because that’s what I build. Simply put:

P0: Your clients are losing money due to your issue/bug.
P1: You are losing money due to your issue/bug.

I’m using the word “money” in my definitions because it’s tangible, but “money” could mean anything that ultimately equates (or contributes) to the “return” on investment that your product is optimized to produce; e.g. impressions, pageviews, calls, GB stored, DiskWriteBytes, contract renewals, SLAs met, etc.
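The two definitions are simple enough to capture in a tiny triage helper. This is just a sketch to make the rule concrete; the function name and boolean inputs are my own, not from any real ticketing system:

```python
def triage(client_losing_money: bool, we_are_losing_money: bool) -> str:
    """Classify an issue using the revenue-first P0/P1 rule.

    'Money' is a stand-in for whatever return the product is
    optimized to produce (impressions, GB stored, renewals, etc.).
    """
    if client_losing_money:
        return "P0"   # your clients are losing money: drop everything
    if we_are_losing_money:
        return "P1"   # only you are losing money: urgent, not an emergency
    return "P2+"      # everything else falls to ordinary prioritization
```

Note the ordering: an issue that costs both you and your clients money is still a P0, because the client impact dominates.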

I’m not concerned with P2-P4 issues in this article. A ton of content already exists related to classifying and prioritizing bugs using various methods. Most are very logical and are written from an Engineering perspective.


P0 Issue Examples from the Real World

Twitter

At EdgeCast, we delivered Twitter’s content via our CDN. The more content that Twitter users could consume, the more “promoted tweets” (early-days ads) Twitter could serve. Users expect fast-loading content, and they'll bounce if it's slow.

If we made a change to our infrastructure that inadvertently led to an increase in cache misses, Twitter would serve less content and fewer ads. Serving fewer ads means less revenue for Twitter; therefore, it’s a P0 issue and we’d jump on it right away.

It’s important to note that in almost all cases, a P0 issue also costs your company money, but it’s not always as immediately obvious. Often, it doesn’t surface until quarterly reviews of revenue or SLA performance, or when the client takes their business to one of your competitors.

PriceMaster

At MarketShare, we had a joint venture (partnership) with Live Nation / TicketMaster. We built an internal-facing dynamic ticket pricing platform (PriceMaster) that’s still used to this day.

The foundation was a large codebase of data ingest pipelines, MATLAB routines, and mountains of data stored in DynamoDB. Every night, the scripts would run and output the optimal ticket price for each individual seat in a venue that was selling tickets to a future live event (ball game, concert, etc.). This price optimization was done for thousands of different events, venues, and accommodation tiers every day.

“Orphaned seats” are those that do not have an available seat to the immediate left or right in the same row. This concept is extremely important in ticket pricing because people typically do not like to attend live events alone. Thus, orphaned seats are often worth far less to event-goers than other seats. In a model, an orphaned seat can be represented as a dummy feature with a binary value (0 or 1).

The value of that single feature determines whether our model priced an orphaned seat at $20 (its real market value) or at $3,000 because it’s the last available seat in an otherwise sold-out section. Thus, if occupancy data for the surrounding seats was stale or interpreted incorrectly, TicketMaster wouldn’t sell the orphaned seat because it wasn’t priced correctly. This would be a P0 issue because it would cost TicketMaster sales.

You’re probably thinking: “Yeah, but $20 is nothing in the grand scheme of things.” Correct, but our pricing models were very intelligent, and depending on what our objective function was—maximize event revenue, maximize occupancy (tickets sold), or a sequence of conditional optimizations—they could favor the creation of orphaned seats by mispricing adjacent seats. Long story short, it mattered.

Microsoft

At EdgeCast (again), Microsoft used our API to create a white-label CDN for their clients. Basically, they resold our CDN as a service in Azure. We built tons of endpoints specifically for them.

Our Engineering, QA, and DevOps teams had done an incredible job of implementing unit tests (via TDD), integration tests, and automated tests with a high level of coverage throughout the codebase. We were VERY good at catching and resolving any issues in the early stages of our release cycle, before anything was deployed to Prod. We also had robust observability and RUM tooling via New Relic and Loggly.

But, as you might imagine, anytime you’re serving terabits of data per second in the form of streaming media, digital assets, OS updates, etc., there’s an ever-present possibility of resource constraints and unforeseen issues due to correlated traffic and requests.

Any bottleneck in the DAL (data access layer) of our application could cause POSTs (used by Azure to create/configure a new service) to fail or time out. Given that Microsoft was spending millions of dollars a day to market Azure to web-based businesses in order to sell them our white-labeled CDN, any failed requests were a really big deal.

If a newly converted user experienced an error when trying to create their first CDN service via Azure, they’d exit and wouldn’t come back. Thus, the money that Microsoft had spent to acquire that user was wasted. This would be a P0 issue because it was costing Microsoft money.


P1 Issue Examples from the Real World

EdgeCast MCC

We allowed SMBs to use our CDN in a self-serve paradigm through our client-facing application called the MCC (Media Control Center). Most clients had multiple content-rich sites, oftentimes hundreds. If we released a breaking change to the frontend of the app, existing clients wouldn’t be able to add new domains (for content delivery) or configure existing domains. It wasn’t a huge deal for our direct clients because their websites would still be operational and deliver their content directly from their host. But less content delivery for EdgeCast meant less revenue. This would cost us money but wouldn’t cost our clients anything. Therefore, it was a P1 issue: important and urgent, but we didn’t start calling engineers in the middle of the night to resolve it (like we would for a P0 issue).

Replay.io

This is a simple one. We had a $300/day marketing budget for paid search in order to bring visitors to our product marketing website, which was of course branded as Replay.io (not Originate) and had its own domain.

We were using StormPath as an Identity Provider (part of our account creation workflow). A few days after we deployed a change to our app (a new integration with an analytics provider), I noticed that our conversion rate on free trials went to zero… didn’t seem right. Some investigation revealed that we inadvertently broke our integration with StormPath’s API. It cost us a thousand bucks (on wasted marketing budget) but didn’t negatively impact our existing customers or cost them anything because our product continued working. Thus, it was a P1 issue.


Implementation

Strategic

I've simplified the P0/P1 model in this article because I wanted to present it from a Product perspective. In reality, there are a lot of additional considerations.

This should primarily be a conversation between Engineering and Product teams. I don't recommend involving every stakeholder at the company because that approach typically leads to lengthy documents and systems that nobody actually uses.

What's important is establishing a shared understanding between your engineers and product managers about what constitutes a true emergency versus an important but less urgent issue. This clarity helps everyone respond appropriately when issues arise.

Consider addressing these elements together:

  • Define what specifically makes an issue a P0 in your product context
  • Establish who needs to be notified when a P0 occurs
  • Set expectations around response times for different priority levels
  • Review recent incidents and classify them to build a common reference point

Tactical

As a Product Manager, I like using MoSCoW prioritization for my user stories (works well for tasks and change requests too). But, despite what most Agile consultants will tell you, MoSCoW isn’t appropriate for prioritizing bugs—especially when those bugs could significantly impact revenue.

For practical implementation, consider these approaches:

  1. Customize your issue tracking system - Jira and most commonly used project management tools allow you to create different priority schemas for different issue types, and this is the proper way to do it. Use a different (and appropriate) priority scale for bugs, and use MoSCoW prioritization for user stories (and/or as a default).
  2. Incorporate impact - It's often helpful if you can provide a ballpark estimate of the impact, especially when it's large. Do NOT go down the path of requiring a revenue-impact assessment in order to prioritize bugs; otherwise, each and every issue will be an all-consuming project. By "ballpark" I mean something like: an estimated 100k users will see a broken link tomorrow morning, but only 16 of them are likely to click it (and thus experience a functional defect). That level of estimate goes a long way towards understanding priority and urgency.
  3. Set up appropriate notifications - Create automated alerts that notify the right people immediately when P0 issues are logged, reducing response time.
  4. Measure performance - Build a simple dashboard tracking resolution times for P0/P1 issues to help evaluate whether your system is working effectively.
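The measurement in point 4 can start very small. Here's a minimal sketch of the kind of calculation such a dashboard would run; the field names (`priority`, `opened`, `resolved`) are illustrative, not from any particular tracker's API:

```python
from datetime import datetime
from statistics import mean

def resolution_hours_by_priority(issues: list[dict]) -> dict[str, float]:
    """Mean time-to-resolution (in hours) per priority level.

    Each issue dict is assumed to carry 'priority', 'opened', and
    'resolved' timestamps; unresolved issues are skipped.
    """
    buckets: dict[str, list[float]] = {}
    for issue in issues:
        if issue.get("resolved") is None:
            continue  # still open: excluded from the average
        hours = (issue["resolved"] - issue["opened"]).total_seconds() / 3600
        buckets.setdefault(issue["priority"], []).append(hours)
    return {prio: round(mean(hrs), 1) for prio, hrs in buckets.items()}
```

If your mean P0 resolution time isn't meaningfully shorter than your P1 time, the schema exists on paper but not in practice.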

Conclusion

Perhaps the most important thing is that after you've worked together to define your prioritization schema based on issue type, you establish a shared understanding of how each priority level should be handled. Once you've agreed on these responses, communicate them clearly to teams across all functions. That way, when a real P0 issue rears its ugly head, you can act in concert with the right amount of urgency to get it resolved. And, even better, you can take a deep breath when others are frantic about a supposed P0 issue because you already know it doesn't meet the criteria, and you won't be there all night unnecessarily working on it.
