Q&A: what "good enough" means for staging
Answers to the questions we kept hearing about how realistic staging needs to be and where to spend the effort.
Why can’t staging be “just like production”?
Because "just like production" is a moving target and a hidden cost center.
Keeping every knob—data volume, traffic shape, third-party integrations, configuration—perfectly in sync would require:
- duplicating a lot of infrastructure
- running expensive workloads twice
- syncing data in ways that introduce privacy and safety risks
Instead of chasing a perfect mirror, we aim for "representative enough" in the dimensions that matter for a given change.
For most features, that means:
- similar configuration for the services involved
- realistic sample data for the flows being tested
- enough load to exercise caching, timeouts, and retries at least once (see the sketch after this list)
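To make the last bullet concrete, here is a minimal smoke-load sketch in Python. The endpoint, timeout, retry count, and request volume are all placeholders (and it assumes the `requests` library); the goal is only that a staging run touches the timeout, retry, and cache paths at least once.

```python
"""Minimal staging smoke-load sketch (assumes the `requests` library).

STAGING_URL, the timeout, and the retry count are placeholders; the goal
is only to hit the timeout, retry, and cache paths at least once.
"""
from __future__ import annotations

import time

import requests

STAGING_URL = "https://staging.example.internal/api/items"  # hypothetical
TIMEOUT_S = 2.0   # short enough that a slow dependency actually times out
ATTEMPTS = 3      # guarantees the retry path runs when a request fails


def fetch_with_retry(url: str) -> requests.Response | None:
    for attempt in range(1, ATTEMPTS + 1):
        try:
            resp = requests.get(url, timeout=TIMEOUT_S)
            resp.raise_for_status()
            return resp
        except requests.RequestException as exc:
            print(f"attempt {attempt} failed: {exc}")
            time.sleep(0.5 * attempt)  # simple linear backoff
    return None


if __name__ == "__main__":
    # Repeated identical requests exercise both the cold and warm cache paths.
    for _ in range(200):
        fetch_with_retry(STAGING_URL)
```

Anything heavier belongs in a dedicated load test; this is just enough traffic to make the failure handling fire.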
What absolutely needs to match production?
We insist on a few things being as close as practical:
- Critical dependencies. If production talks to a specific database engine, queue, or identity provider, staging should use the same kind (even if smaller).
- Configuration patterns. Feature flags, environment variables, and secrets should be wired the same way, even if the values differ.
- Failure modes. Timeouts, retries, and circuit breakers should be configured similarly so we see the same classes of failures (a configuration sketch follows this list).
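As a rough illustration of "same wiring, different values", the sketch below loads identical keys in every environment; the key names, defaults, and flag are hypothetical.

```python
"""'Same wiring, different values' sketch; all key names are hypothetical.

Staging and production both call load_config(); only the environment
variable values differ between them.
"""
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    database_url: str
    request_timeout_s: float
    max_retries: int
    checkout_enabled: bool


def load_config() -> ServiceConfig:
    return ServiceConfig(
        # Required in every environment, so a missing value fails fast.
        database_url=os.environ["DATABASE_URL"],
        # Same failure-mode knobs everywhere, even if staging uses smaller values.
        request_timeout_s=float(os.environ.get("REQUEST_TIMEOUT_S", "2.0")),
        max_retries=int(os.environ.get("MAX_RETRIES", "3")),
        # Flags are wired identically; the default may differ per environment.
        checkout_enabled=os.environ.get("CHECKOUT_FLAG", "false") == "true",
    )
```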
If a change touches a critical path (checkout, authentication, billing), we also try to mirror:
- request/response shape (spot-checked in the sketch after this list)
- authentication flows
- basic rate limits
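A spot check of the response shape can be as small as the sketch below; the URL, payload, and expected fields are invented, and it assumes the `requests` library.

```python
"""Response-shape spot check sketch for a critical-path endpoint in staging.

The URL and expected fields are hypothetical; the idea is that staging
returns the same shape production does, even if the data is synthetic.
"""
import requests

CHECKOUT_URL = "https://staging.example.internal/api/checkout/quote"  # hypothetical
EXPECTED_FIELDS = {"order_id", "currency", "total_cents", "expires_at"}


def check_shape() -> None:
    resp = requests.post(
        CHECKOUT_URL,
        json={"items": [{"sku": "demo", "qty": 1}]},
        timeout=5,
    )
    resp.raise_for_status()
    missing = EXPECTED_FIELDS - resp.json().keys()
    assert not missing, f"staging response is missing fields: {missing}"


if __name__ == "__main__":
    check_shape()
```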
Where is it fine to diverge?
We allow staging to diverge from production when the risk and cost are low:
- Data volume. We rarely need a full copy of production data. A good synthetic dataset plus a thin slice of anonymized real data is usually enough.
- Integrations. For non-critical third-party services, staging can use sandboxes or mocks.
- Traffic level. We don’t need production-level QPS to catch most logic bugs.
The key is to document the differences.
For each environment, we maintain a short list (an example is sketched below):
- what’s the same as prod
- what’s intentionally different
- what that means for the kinds of bugs we can and can’t catch
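One way to keep that list honest is to check it in next to the environment's configuration. The sketch below is purely illustrative (every entry is invented); a short markdown file serves the same purpose.

```python
"""Illustrative 'what differs from prod' note kept as data; every entry
here is invented. A short markdown file works just as well.
"""
STAGING_PARITY = {
    "same_as_prod": [
        "database engine, schema, and migration tooling",
        "feature flag and secrets wiring",
        "timeout, retry, and circuit-breaker settings",
    ],
    "intentionally_different": [
        "small anonymized data slice instead of full production volume",
        "third-party payment provider replaced by its sandbox",
        "a fraction of production traffic levels",
    ],
    "blind_spots": [
        "load-dependent failures (e.g. connection pool exhaustion)",
        "provider-specific errors that only the live integration returns",
    ],
}
```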
How do we know if staging is “good enough” for a specific change?
We ask three questions during planning and review:
- What can go wrong in production if this change fails?
- Which of those failures can realistically be surfaced in staging?
- What needs to be true about staging for that to happen?
If a change could break a critical data migration, for example, we focus staging effort on:
- having representative data shapes
- running the migration against that data
- checking performance and rollback behavior (see the rehearsal sketch below)
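A rehearsal for that kind of change can be scripted; the sketch below assumes an Alembic-managed schema, so swap in whatever migration tool the service actually uses.

```python
"""Staging migration rehearsal sketch, assuming an Alembic-managed schema.

The point is to time the forward migration against staging-sized data and
rehearse the rollback before production sees either.
"""
import subprocess
import time


def timed(cmd: list[str]) -> float:
    """Run a command against the staging database and return elapsed seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # fail loudly if the step errors
    return time.monotonic() - start


if __name__ == "__main__":
    upgrade_s = timed(["alembic", "upgrade", "head"])   # forward migration
    print(f"upgrade took {upgrade_s:.1f}s on staging-sized data")

    rollback_s = timed(["alembic", "downgrade", "-1"])  # rollback rehearsal
    print(f"rollback took {rollback_s:.1f}s")
```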
If a change only affects a feature flag default for a small cohort, we accept a lighter-weight check.
Do we ever skip staging for small changes?
Yes, but we treat it as an explicit decision, not a habit.
We sometimes skip staging for:
- changes that only affect non-production environments
- small adjustments to internal dashboards and documentation
When we do, we still:
- run automated tests
- apply feature flags or config changes gradually (see the rollout sketch below)
- monitor relevant metrics during and after rollout
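For the gradual part, a percentage-based rollout with stable bucketing is usually enough. The sketch below is generic rather than any particular flag provider's API; the percentage and hashing scheme are illustrative.

```python
"""Gradual rollout sketch; not any specific flag provider's API.

Stable hashing means the same user always lands in the same bucket, so
widening the percentage only ever adds users.
"""
import hashlib

ROLLOUT_PERCENT = 5  # start small; widen only after the metrics look healthy


def in_rollout(user_id: str, percent: int = ROLLOUT_PERCENT) -> bool:
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable 0-99 bucket
    return bucket < percent


if __name__ == "__main__":
    # Sanity check: roughly ROLLOUT_PERCENT of sample users should be enabled.
    enabled = sum(in_rollout(f"user-{i}") for i in range(10_000))
    print(f"{enabled / 100:.1f}% of sample users are in the rollout")
```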
The bar for skipping staging is higher for:
- anything touching auth, billing, or data integrity
- changes that alter retries, timeouts, or rate limits
How does remote work affect staging expectations?
Remote work mainly changed how we coordinate, not what staging needs to do.
We try to make staging runs reproducible and observable for people who are not on the same network:
- scripts to set up test data instead of "ask someone with access" (see the seed sketch below)
- logs and dashboards that clearly separate staging from production
- written checklists for high-risk changes that depend on staging behavior
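A seed script along these lines is what the first bullet means in practice; the table, columns, and SQLite stand-in are invented, and the real script would point at the staging database.

```python
"""Repeatable staging seed sketch; table, columns, and the SQLite stand-in
are invented for illustration. The real script would target the staging
database so anyone can rebuild test data without asking for access.
"""
import os
import sqlite3


def seed(db_path: str = os.environ.get("STAGING_DB", "staging.db")) -> None:
    conn = sqlite3.connect(db_path)
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)"
        )
        # INSERT OR IGNORE keeps the script idempotent across re-runs.
        conn.executemany(
            "INSERT OR IGNORE INTO users (email) VALUES (?)",
            [(f"test-user-{i}@example.com",) for i in range(50)],
        )
    conn.close()


if __name__ == "__main__":
    seed()
```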
Takeaways
- Staging doesn’t need to be an exact mirror of production, but it does need to be honest about where it diverges.
- Make a short, explicit list of what must match prod for each environment, and keep it updated.
- Decide what “good enough” means per change, based on what can go wrong and what staging can realistically surface.
- Treat skipping staging as an exception with a rationale, not as a convenience.
- Good staging environments make remote collaboration easier by being scriptable and observable, not by being perfect copies.