STEWARDSHIP · 2025-01-15 · BY PRIYA PATEL

Story: the metrics that became compliance signals

Operational metrics we treated as internal-only later became compliance and reporting signals. We describe how we adapted.

stewardship · observability · compliance · metrics

What happened

We built our observability stack for ourselves.

The first dashboards showed:

  • request rates
  • error rates
  • latencies
  • resource usage

They were designed for on-call engineers and incident leads.

Over time, other people started using them:

  • support teams checking user impact
  • product managers looking at adoption
  • finance asking about unit economics

Then a new category of consumer arrived: compliance and audit.

A few regulatory and contractual changes meant we had to:

  • demonstrate that certain flows behaved within defined limits
  • prove we could detect and respond to specific classes of failures

We realized that some of the metrics we had been treating as "internal" were now part of external expectations.

The shift wasn’t immediate. It happened in three steps.

Step 1: a question we couldn’t answer cleanly

During a review, we were asked a simple question:

"How do you know that users can access their data within the defined time window, and how do you prove it over the last 12 months?"

We had:

  • per-service SLOs
  • dashboards showing uptime and latency
  • scattered logs indicating when certain operations completed

What we didn’t have:

  • a clear, documented mapping between those SLOs and the external requirement
  • a stable way to archive and reproduce the evidence

It took days to reconstruct an answer from:

  • old dashboards
  • exported metrics
  • incident reports

The answer was ultimately positive, but the path there was brittle.

Step 2: metrics as contracts

We decided to treat a subset of metrics as contracts, not just tools.

For each externally relevant requirement, we:

  • identified which internal metrics and SLOs supported it
  • wrote down that mapping in a short document
  • clarified ownership for keeping both sides up to date

Examples:

  • data access latency requirements mapped to read-path SLOs
  • availability requirements mapped to per-service uptime SLOs
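
The mapping doesn’t need tooling to start. Here is a sketch of one of those short documents expressed as data, so it can be reviewed and versioned; every name and identifier below is invented for illustration:

    # Hypothetical mapping from external requirements to the internal SLOs that
    # support them, plus the team that keeps both sides up to date.
    REQUIREMENT_TO_SLOS = [
        {
            "requirement": "users can access their data within the defined time window",
            "source": "contract / regulatory clause (reference lives in the doc)",
            "slos": ["read-path p99 latency", "data-export completion time"],
            "owner": "storage-team",
        },
        {
            "requirement": "service availability commitments",
            "source": "customer SLA",
            "slos": ["per-service uptime SLO"],
            "owner": "platform-team",
        },
    ]

The exact format matters less than having one place where the mapping and its owner are written down.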

Suddenly, those SLO dashboards were no longer "for us" only.

They were evidence.

Step 3: designing with those consumers in mind

Knowing that audit and compliance would read some dashboards changed how we designed them:

  • labels and descriptions became more precise
  • we added annotations when large changes or incidents occurred
  • we made it easy to export and snapshot data for specific windows
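
The annotation habit, for example, can be a small helper called from deploy pipelines or incident tooling. A minimal sketch, assuming a Grafana-style annotations HTTP API and a service-account token in a GRAFANA_TOKEN environment variable (both assumptions; adapt to whatever dashboarding tool you run):

    # Stamp a deploy or incident onto dashboards so reviewers see why a graph moved.
    # Assumes a Grafana-style POST /api/annotations endpoint; the base URL and token
    # handling are placeholders.
    import os
    import time
    import requests

    GRAFANA_URL = "https://grafana.example.internal"

    def annotate(text: str, tags: list[str]) -> None:
        payload = {
            "time": int(time.time() * 1000),  # epoch milliseconds
            "tags": tags,                      # e.g. ["incident", "read-path"]
            "text": text,
        }
        resp = requests.post(
            f"{GRAFANA_URL}/api/annotations",
            json=payload,
            headers={"Authorization": f"Bearer {os.environ['GRAFANA_TOKEN']}"},
            timeout=10,
        )
        resp.raise_for_status()

    # annotate("INC-1234: elevated read latency, mitigated 14:32 UTC",
    #          ["incident", "read-path"])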

We also became more careful about:

  • renaming or replacing key metrics
  • changing definitions without an accompanying note or migration plan

The goal was not to freeze the system, but to avoid "we changed that graph three times" as an explanation.

What we changed

1. Tag metrics that support external promises

We introduced a simple tagging scheme:

  • metrics that support external or contractual guarantees carry a tag (for example, externally_visible)
  • dashboards built from those metrics include a short "purpose" section at the top
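
The scheme stays deliberately small. As a sketch, the tag can be nothing more than a checked-in registry that tooling consults before touching a metric; the registry and names below are hypothetical, not any particular library’s API:

    # Hypothetical registry: which metrics back external or contractual guarantees.
    EXTERNALLY_VISIBLE = "externally_visible"

    METRIC_TAGS: dict[str, set[str]] = {
        "read_latency_seconds": {EXTERNALLY_VISIBLE},  # backs the data-access requirement
        "service_uptime_ratio": {EXTERNALLY_VISIBLE},  # backs contractual availability
        "build_queue_depth": set(),                    # internal-only, free to refactor
    }

    def needs_extra_scrutiny(metric_name: str) -> bool:
        """Checked by dashboard tooling and refactor linting before renaming a metric."""
        return EXTERNALLY_VISIBLE in METRIC_TAGS.get(metric_name, set())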

This helped us:

  • see at a glance which graphs had extra scrutiny
  • avoid accidental breakage during refactors

2. Introduce retention and export plans

We adjusted storage and export for tagged metrics:

  • ensuring they are retained for as long as our obligations require
  • providing a way to export time ranges in a stable format

This avoided ad-hoc scripts every time someone needed a year’s worth of data for a review.
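
The export path itself can stay boring. A minimal sketch, assuming a Prometheus-compatible backend (its /api/v1/query_range endpoint) and an output layout we made up:

    # Export one query over a fixed window to a self-contained JSON file that can be
    # archived alongside the review. Assumes a Prometheus-compatible HTTP API; the
    # URL and file layout are placeholders.
    import json
    import requests

    PROM_URL = "https://prometheus.example.internal"

    def export_range(query: str, start: str, end: str, step: str, out_path: str) -> None:
        resp = requests.get(
            f"{PROM_URL}/api/v1/query_range",
            params={"query": query, "start": start, "end": end, "step": step},
            timeout=60,
        )
        resp.raise_for_status()
        with open(out_path, "w") as f:
            json.dump(
                {"query": query, "start": start, "end": end, "step": step,
                 "result": resp.json()["data"]["result"]},
                f,
                indent=2,
            )

    # Example: a year of read-path latency evidence for a review.
    # export_range(
    #     "histogram_quantile(0.99, sum(rate(read_latency_seconds_bucket[5m])) by (le))",
    #     "2024-01-01T00:00:00Z", "2025-01-01T00:00:00Z", "1h",
    #     "exports/read_latency_2024.json",
    # )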

3. Align incident reviews with external narratives

Incident reviews for systems covered by external guarantees gained a small section:

  • how did this incident affect the metrics we treat as evidence?
  • did we breach any defined windows or limits?
  • how will we present this if asked six months from now?

We did not let this dominate the technical discussion.

But we did make sure someone thought about it.

4. Keep "engineering" and "reporting" honest

We resisted the temptation to create a separate layer of "reporting metrics" that painted a nicer picture.

Instead, we:

  • reused the same SLOs and metrics we used for ourselves
  • added context and explanation where needed

This kept us from drifting into a world where we met the report but not reality.

Takeaways

  • Some internal metrics will eventually become external promises; it’s better to plan for that than to be surprised.
  • Tagging and documenting which metrics support those promises makes refactors safer.
  • Retention and export are part of observability design when metrics serve as evidence.
  • Using the same signals for engineering and external reporting keeps incentives aligned.
