Story: the SLO that was too strict to be useful
We set an SLO tighter than reality and spent months failing against it without learning much. Here’s what we changed.
What happened
We set an SLO we couldn’t meet.
It sounded reasonable:
- 99.99% availability for a core API
- tight latency targets at multiple percentiles
On paper, it matched what we wished the system did.
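Translating that availability target into an error budget shows why it was unforgiving. A minimal sketch of the arithmetic, assuming a 30-day rolling window (the window length here is illustrative, not our actual configuration):

```python
# Error budget implied by an availability SLO over a 30-day window.
# The 30-day window is an illustrative assumption.
slo = 0.9999                   # 99.99% availability target
window_minutes = 30 * 24 * 60  # 43,200 minutes in the window

budget_minutes = (1 - slo) * window_minutes
print(f"allowed downtime: {budget_minutes:.1f} minutes per 30 days")
# -> allowed downtime: 4.3 minutes per 30 days
```

At that level, a single incident lasting a few minutes consumes the entire month's budget.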
In practice:
- we "burned" error budget almost every week
- alerts based on the SLO fired frequently
- incident reviews spent time explaining why this was "really fine"
The SLO became background noise.
It didn’t help us make decisions.
The symptoms
From the outside, our too-strict SLO looked like "discipline":
- many alerts
- lots of red on SLO dashboards
Inside, we saw:
- on-call tuning out SLO alerts in favor of symptom-based alerts
- teams rationalizing budget burns as "acceptable for now"
- difficulty using the SLO to justify reliability work, because it always looked bad
The worst part was that we:
- were not actually performing worse than before
- had improved some underlying metrics
The SLO was out of sync with reality.
What we changed
1. Start from observed behavior
We re-baselined.
We looked at a year of data:
- actual availability and latency
- times when users and stakeholders complained
- incidents that we agreed "really mattered"
We then:
- set initial SLOs slightly looser than current performance, so we started inside budget (sketched below)
- planned to tighten them gradually as we invested in reliability
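A sketch of how one might pick that baseline from historical data; the function, headroom value, and numbers are all hypothetical, not our real tooling or data:

```python
import statistics

def baseline_slo(daily_availability: list[float], headroom: float = 0.0005) -> float:
    """Pick an initial SLO target just below observed performance.

    Uses a low percentile rather than the mean so that ordinary bad days
    don't immediately blow the budget. Hypothetical helper for illustration.
    """
    p5 = statistics.quantiles(daily_availability, n=20)[0]  # ~5th percentile
    return round(p5 - headroom, 5)

# Illustrative numbers standing in for a year of monitoring history:
year = [0.9991, 0.9987, 0.9995, 0.9999, 0.9985] * 73  # 365 daily ratios
print(baseline_slo(year))  # -> 0.998, a target we can meet today
```

Tightening then becomes an explicit, scheduled change to the target rather than a permanent gap between target and reality.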
2. Involve stakeholders in SLO setting
We brought product, support, and operations into the conversation:
- what do they consider "good enough"?
- where do they see pain today?
This shifted the goal from "perfect" to "agreed acceptable levels of risk."
3. Make SLOs specific and few
We trimmed:
- overlapping SLOs that measured nearly the same thing
- rarely-used metrics that created noise
We kept:
- one or two primary SLOs per critical flow
- clear definitions for how they're measured (example below)
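For us, a "clear definition" meant writing the numerator and denominator down next to the target. A minimal sketch of what that can look like; the structure, names, and numbers are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slo:
    """One primary SLO with its measurement spelled out explicitly."""
    name: str
    target: float      # fraction of good events over the window
    window_days: int
    good_events: str   # the numerator: what counts as "good"
    total_events: str  # the denominator: what we divide by

# One primary SLO per critical flow (values are illustrative):
checkout_availability = Slo(
    name="checkout-availability",
    target=0.999,
    window_days=30,
    good_events="2xx/3xx/4xx responses on /checkout (4xx are user errors)",
    total_events="all responses on /checkout, excluding health checks",
)
```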
4. Use SLOs to make trade-offs, not to punish
We clarified how we use SLOs:
- to decide when to spend time on reliability vs. features (see the burn-rate sketch below)
- to choose between different kinds of reliability work
We explicitly did not use them to grade individuals or teams.
This made it easier to argue for re-baselining when reality and targets diverged.
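The mechanism that made this concrete for us was burn rate: how fast the error budget is being spent relative to the window. A sketch of the decision rule; the thresholds and numbers are illustrative, not our actual policy:

```python
def burn_rate(bad_fraction: float, slo: float) -> float:
    """Budget spend speed: 1.0 means the budget lasts exactly the window;
    above 1.0 we run out early. Hypothetical helper for illustration."""
    return bad_fraction / (1 - slo)

# Example: 0.05% of requests failing against a 99.9% availability SLO.
rate = burn_rate(bad_fraction=0.0005, slo=0.999)
if rate > 2.0:
    print("pause feature work; prioritize reliability now")
elif rate > 1.0:
    print("schedule reliability work into the next iteration")
else:
    print(f"burn rate {rate:.2f}: budget on track; keep shipping")
```

Note that the output is a scheduling decision, not a grade for a team, which keeps it compatible with the no-punishment principle above.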
Takeaways
- An SLO that is impossible to meet isn’t "ambitious"; it’s uninformative.
- Starting from observed performance and real user pain leads to more useful targets.
- Fewer, clearer SLOs make it easier to reason about reliability investments.
- SLOs should help teams make decisions, not just produce red dashboards.