STEWARDSHIP | 2022-11-18 | BY PRIYA PATEL

Cost visibility for infrastructure decisions

How we made infrastructure cost visible enough that engineers could treat it like latency or reliability when making decisions.

stewardship, cost, infrastructure, observability

Cost decisions used to arrive as surprises.

A quarterly email would say "infra spend is up 20%" and everyone would scramble to remember what changed.

Engineers rarely saw the connection between day-to-day work and those numbers.

This wasn’t because people didn’t care. It was because the information arrived too late and in the wrong format.

We decided to make infrastructure cost visible enough that engineers could treat it like latency or reliability: a signal to balance, not a bill to dread.

Constraints

We did not control all layers of billing; some costs were aggregated.
We didn’t want everyone to become a cloud-pricing expert.
We needed to avoid shaming teams; the goal was better decisions, not blame.

What we changed

1. Rough, actionable dashboards

We built dashboards that answered a few concrete questions:

What are the top N services by cost?
How has cost changed for each over the last few months?
Are there obvious jumps tied to known events (launches, migrations)?

We did not chase perfect allocation.

Instead, we:

grouped services where needed
accepted approximate per-service numbers
focused on trends and outliers
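As an illustration only, the three dashboard questions can be answered from very coarse data. The sketch below assumes a hypothetical export of (service group, month, approximate cost) rows; the service names, figures, and the 25% jump threshold are invented for the example and are not from our billing data.

```python
from collections import defaultdict

# Hypothetical rows: (service_group, month, approximate_cost).
# Names and numbers are made up for illustration.
ROWS = [
    ("search", "2022-08", 1000.0),
    ("search", "2022-09", 1050.0),
    ("search", "2022-10", 1600.0),
    ("checkout", "2022-08", 400.0),
    ("checkout", "2022-09", 410.0),
    ("checkout", "2022-10", 405.0),
    ("batch-jobs", "2022-08", 2000.0),
    ("batch-jobs", "2022-09", 2100.0),
    ("batch-jobs", "2022-10", 2150.0),
]


def monthly_costs(rows):
    """Sum approximate cost per service group and month."""
    costs = defaultdict(dict)
    for service, month, cost in rows:
        costs[service][month] = costs[service].get(month, 0.0) + cost
    return costs


def top_services(rows, month, n):
    """Return the n most expensive service groups for the given month."""
    costs = monthly_costs(rows)
    ranked = sorted(costs, key=lambda s: costs[s].get(month, 0.0), reverse=True)
    return ranked[:n]


def jumps(rows, threshold=0.25):
    """Flag months where cost rose by more than `threshold` (a fraction)
    relative to the previous month that has data for that service."""
    flagged = []
    for service, by_month in monthly_costs(rows).items():
        months = sorted(by_month)  # "YYYY-MM" strings sort chronologically
        for prev, cur in zip(months, months[1:]):
            before = by_month[prev]
            if before > 0:
                change = (by_month[cur] - before) / before
                if change > threshold:
                    flagged.append((service, cur, round(change, 3)))
    return flagged


print(top_services(ROWS, "2022-10", 2))
print(jumps(ROWS))
```

A flagged jump is only a prompt to check whether it lines up with a known launch or migration. It is not a verdict on the team that owns the service.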

2. Cost annotations for major changes

We started annotating major infrastructure-related changes with cost expectations:

"this will increase storage by ~X% for service Y"
"this may double traffic to dependency Z"

In incident and change reviews, we asked:

Is this cost acceptable?
How will we know if it overshoots?

This made cost part of the design conversation instead of an afterthought.
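One way to make "How will we know if it overshoots?" concrete is to record the expectation as data and compare it with what is observed later. This is a minimal sketch, not the tooling we used: the `CostAnnotation` type, its field names, and the 10% tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CostAnnotation:
    """Hypothetical record attached to a major infrastructure change."""
    change: str
    service: str
    baseline_monthly_cost: float   # approximate cost before the change
    expected_increase_pct: float   # e.g. 20.0 means "~20% more"
    tolerance_pct: float = 10.0    # assumed slack before we call it an overshoot

    def expected_cost(self) -> float:
        """Cost we said we expected after the change."""
        return self.baseline_monthly_cost * (1 + self.expected_increase_pct / 100)

    def overshoots(self, observed_monthly_cost: float) -> bool:
        """True if observed cost exceeds the expectation plus tolerance."""
        limit = self.expected_cost() * (1 + self.tolerance_pct / 100)
        return observed_monthly_cost > limit


note = CostAnnotation(
    change="add secondary index",      # illustrative values only
    service="service-y",
    baseline_monthly_cost=1000.0,
    expected_increase_pct=20.0,
)
print(note.overshoots(1300.0))  # within expectation plus tolerance
print(note.overshoots(1400.0))  # above it, worth a follow-up question
```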

3. Simple per-service budgets

We introduced rough budgets for services with high or fast-growing costs.

A budget looked like:

"Keep monthly cost for service X within Y–Z range"

When a service trended above that range, we:

asked what changed
looked for low-risk optimizations (e.g., removing unused indexes, dialing down unnecessary telemetry)

The point was not to freeze spending, but to attach intent to changes.
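A budget of this shape is simple enough to check mechanically. The sketch below is an assumption about how such a check could look, with hypothetical function names and made-up numbers; requiring two consecutive months above the range is one possible reading of "trended above", not a rule from our process.

```python
def budget_status(monthly_cost, low, high):
    """Classify a month's cost against a rough budget range [low, high]."""
    if monthly_cost > high:
        return "above"
    if monthly_cost < low:
        return "below"
    return "within"


def trending_above(costs, high, months=2):
    """True if the most recent `months` values are all above the range.

    `costs` is a chronological list of approximate monthly costs.
    Using several months avoids reacting to a single noisy reading.
    """
    recent = costs[-months:]
    return len(recent) == months and all(c > high for c in recent)


history = [900.0, 1150.0, 1250.0, 1300.0]  # illustrative numbers
print(budget_status(history[-1], low=800.0, high=1200.0))
print(trending_above(history, high=1200.0))
```

When the check fires, the outcome is the two questions above (what changed, and is there a low-risk optimization), not an automatic spending freeze.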

4. Partner with product on trade-offs

We worked with product teams to frame trade-offs explicitly:

"This feature will likely add ~N% to infra cost. In return, it should reduce support contacts by M%."

Sometimes the answer was "yes, ship it." Sometimes it was "not yet" or "let’s try a cheaper version first."

The key was that cost was part of the same conversation as user value and reliability.

Results / Measurements

Within a couple of quarters, we saw:

Fewer surprises. Quarterly cost reviews became "this is what we expected" instead of "what happened?"
Targeted optimizations. Teams identified specific places where small changes had outsized cost impact (e.g., turning down over-provisioned resources, cleaning up abandoned experiments).
Better migration planning. When planning major changes, cost impact was a first-class section in design docs.

We did not dramatically reduce total cost; that wasn’t the immediate goal.

What did change was how cost grew, with stated intent behind each increase, and how comfortable people felt talking about it.

Takeaways

Engineers make better infrastructure decisions when they see cost alongside latency and error rates.
Rough, visible numbers are more useful than perfect ones hidden in quarterly reports.
Simple per-service budgets and annotations tie changes to expected cost impact.
Bringing cost into design conversations early leads to more intentional trade-offs, not an automatic "no."