Story: the SLO that was too strict to be useful
We set an SLO tighter than reality and spent months failing against it without learning much. Here’s what we changed.
What happened
We set an SLO we couldn’t meet.
It sounded reasonable:
- 99.99% availability for a core API
- tight latency targets at multiple percentiles
On paper, it matched what we wished the system did.
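Translating that availability target into an error budget shows why it was unforgiving. A minimal sketch of the arithmetic, assuming a 30-day rolling window (the window length here is illustrative, not our actual configuration):

```python
# Error budget implied by an availability SLO over a 30-day window.
# The 30-day window is an illustrative assumption.
slo = 0.9999                   # 99.99% availability target
window_minutes = 30 * 24 * 60  # 43,200 minutes in the window

budget_minutes = (1 - slo) * window_minutes
print(f"allowed downtime: {budget_minutes:.1f} minutes per 30 days")
# -> allowed downtime: 4.3 minutes per 30 days
```

At that level, a single incident lasting a few minutes consumes the entire month's budget.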
In practice:
- we "burned" error budget almost every week
- alerts based on the SLO fired frequently
- incident reviews spent time explaining why this was "really fine"
The SLO became background noise.
It didn’t help us make decisions.
The symptoms
From the outside, our too-strict SLO looked like "discipline":
- many alerts
- lots of red on SLO dashboards
Inside, we saw:
- on-call tuning out SLO alerts in favor of symptom-based alerts
- teams rationalizing budget burns as "acceptable for now"
- difficulty using the SLO to justify reliability work, because it always looked bad
The worst part was that we:
- were not actually performing worse than before
- had improved some underlying metrics
The SLO was out of sync with reality.
What we changed
1. Start from observed behavior
We re-baselined.
We looked at a year of data:
- actual availability and latency
- times when users and stakeholders complained
- incidents that we agreed "really mattered"
We then:
- set initial SLOs slightly looser than current performance, so we started inside budget (sketched below)
- planned to tighten them gradually as we invested in reliability
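A sketch of how one might pick that baseline from historical data; the function, headroom value, and numbers are all hypothetical, not our real tooling or data:

```python
import statistics

def baseline_slo(daily_availability: list[float], headroom: float = 0.0005) -> float:
    """Pick an initial SLO target just below observed performance.

    Uses a low percentile rather than the mean so that ordinary bad days
    don't immediately blow the budget. Hypothetical helper for illustration.
    """
    p5 = statistics.quantiles(daily_availability, n=20)[0]  # ~5th percentile
    return round(p5 - headroom, 5)

# Illustrative numbers standing in for a year of monitoring history:
year = [0.9991, 0.9987, 0.9995, 0.9999, 0.9985] * 73  # 365 daily ratios
print(baseline_slo(year))  # -> 0.998, a target we can meet today
```

Tightening then becomes an explicit, scheduled change to the target rather than a permanent gap between target and reality.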
2. Involve stakeholders in SLO setting
We brought product, support, and operations into the conversation:
- what do they consider "good enough"?
- where do they see pain today?
This shifted the goal from "perfect" to "agreed acceptable levels of risk."
3. Make SLOs specific and few
We trimmed:
- overlapping SLOs that measured nearly the same thing
- rarely-used metrics that created noise
We kept:
- one or two primary SLOs per critical flow
- clear definitions for how they're measured (example below)
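For us, a "clear definition" meant writing the numerator and denominator down next to the target. A minimal sketch of what that can look like; the structure, names, and numbers are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slo:
    """One primary SLO with its measurement spelled out explicitly."""
    name: str
    target: float      # fraction of good events over the window
    window_days: int
    good_events: str   # the numerator: what counts as "good"
    total_events: str  # the denominator: what we divide by

# One primary SLO per critical flow (values are illustrative):
checkout_availability = Slo(
    name="checkout-availability",
    target=0.999,
    window_days=30,
    good_events="2xx/3xx/4xx responses on /checkout (4xx are user errors)",
    total_events="all responses on /checkout, excluding health checks",
)
```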
4. Use SLOs to make trade-offs, not to punish
We clarified how we use SLOs:
- to decide when to spend time on reliability vs. features (see the burn-rate sketch below)
- to choose between different kinds of reliability work
We explicitly did not use them to grade individuals or teams.
This made it easier to argue for re-baselining when reality and targets diverged.
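The mechanism that made this concrete for us was burn rate: how fast the error budget is being spent relative to the window. A sketch of the decision rule; the thresholds and numbers are illustrative, not our actual policy:

```python
def burn_rate(bad_fraction: float, slo: float) -> float:
    """Budget spend speed: 1.0 means the budget lasts exactly the window;
    above 1.0 we run out early. Hypothetical helper for illustration."""
    return bad_fraction / (1 - slo)

# Example: 0.05% of requests failing against a 99.9% availability SLO.
rate = burn_rate(bad_fraction=0.0005, slo=0.999)
if rate > 2.0:
    print("pause feature work; prioritize reliability now")
elif rate > 1.0:
    print("schedule reliability work into the next iteration")
else:
    print(f"burn rate {rate:.2f}: budget on track; keep shipping")
```

Note that the output is a scheduling decision, not a grade for a team, which keeps it compatible with the no-punishment principle above.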
Takeaways
- An SLO that is impossible to meet isn’t "ambitious"; it’s uninformative.
- Starting from observed performance and real user pain leads to more useful targets.
- Fewer, clearer SLOs make it easier to reason about reliability investments.
- SLOs should help teams make decisions, not just produce red dashboards.