Story: the feature that only worked in our favorite environment
We built and tested a feature in one staging environment and one region. It behaved very differently elsewhere. Here’s why and what we changed.
What happened
We built a feature that looked solid.
It worked in:
- our main staging environment
- one production region we used for early rollouts
When we enabled it elsewhere, we saw:
- inconsistent behavior
- higher error rates
- confusing logs that didn’t match what we’d seen in staging
The feature wasn’t broken everywhere.
It was broken everywhere that wasn’t our favorite environment.
The blind spots
Our "favorite" environment had:
- a particular data shape
- a subset of integrations
- network and latency characteristics close to one major region
Other regions and environments had:
- different data distributions
- different patterns of third-party errors
- slightly different configuration defaults
We had treated success in one environment as proof.
It was only proof for that environment.
What we changed
1. Make environment differences explicit
We documented key differences between environments and regions:
- data volume and shape
- enabled integrations and flags
- latency profiles
We stopped calling staging "like prod." We started saying what it was actually good for testing, and what it wasn't.
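One thing that helped was keeping those differences as data rather than prose, so they were harder to ignore. A minimal sketch of what that could look like; the environment names, fields, and values here are made up, not our real setup:

```python
# A minimal sketch of an environment profile registry.
# Environment names, fields, and values are illustrative only.
ENVIRONMENTS = {
    "staging": {
        "data_volume": "small, mostly synthetic",
        "integrations": ["payments", "email"],  # subset of prod
        "latency_profile": "close to one major region",
        "good_for_testing": ["core flows", "schema changes"],
    },
    "prod-us-east": {
        "data_volume": "large, real traffic",
        "integrations": ["payments", "email", "sms", "partner-api"],
        "latency_profile": "baseline",
        "good_for_testing": ["early rollouts"],
    },
    "prod-eu-west": {
        "data_volume": "large, different locale mix",
        "integrations": ["payments", "email", "partner-api"],
        "latency_profile": "higher cross-region latency",
        "good_for_testing": [],
    },
}

def differences(env_a: str, env_b: str) -> dict:
    """Return the profile fields where two environments differ."""
    a, b = ENVIRONMENTS[env_a], ENVIRONMENTS[env_b]
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

Having this as data meant "is staging actually like prod for this change?" became a question we could answer by diffing profiles, not by guessing.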
2. Design rollouts to sample diversity
Instead of:
- staging → one region → everywhere
we aimed for:
- staging → multiple regions or cohorts with different characteristics → broader rollout
This meant:
- picking early regions with different traffic patterns
- including internal or low-risk cohorts from more than one environment
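One way to encode that shape is an explicit stage list where the early stage has to mix cohorts with different characteristics. A sketch, with hypothetical region and cohort names:

```python
# Sketch of a staged rollout plan that deliberately samples diverse cohorts.
# Region and cohort names are hypothetical.
ROLLOUT_STAGES = [
    {"name": "staging", "cohorts": ["staging"]},
    {
        "name": "early",
        # Intentionally mix traffic patterns instead of one favorite region.
        "cohorts": ["internal-us-east", "internal-eu-west", "low-risk-apac"],
    },
    {"name": "broad", "cohorts": ["all-regions"]},
]

def next_stage(current: str) -> str | None:
    """Return the stage after `current`, or None if the rollout is complete."""
    names = [stage["name"] for stage in ROLLOUT_STAGES]
    index = names.index(current)
    return names[index + 1] if index + 1 < len(names) else None
```

The point isn't the code; it's that the early stage can't quietly collapse back into "our favorite region" without someone editing the plan.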
3. Align configuration and flags
We found cases where:
- flags had different defaults in different regions
- config values drifted over time
We:
- standardized flag and config baselines where possible
- made differences explicit where they were intentional
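A small drift check made this concrete: diff each region's flag defaults against a shared baseline and report anything that differs without an explicit allowlist entry. A sketch, with made-up flags and regions:

```python
# Sketch of a flag-drift check: compare each region's flag defaults against
# a shared baseline and report anything that differs without an explicit
# allowlist entry. All names and values here are illustrative.
BASELINE = {"new_checkout": False, "retry_budget": 3, "cache_ttl_s": 60}

REGION_DEFAULTS = {
    "us-east": {"new_checkout": True, "retry_budget": 3, "cache_ttl_s": 60},
    "eu-west": {"new_checkout": False, "retry_budget": 5, "cache_ttl_s": 60},
}

# Differences we decided to keep, documented instead of silently drifting.
INTENTIONAL = {("us-east", "new_checkout")}

def find_drift() -> list[tuple[str, str, object, object]]:
    """Return (region, flag, baseline_value, region_value) for unexpected diffs."""
    drift = []
    for region, flags in REGION_DEFAULTS.items():
        for flag, baseline_value in BASELINE.items():
            value = flags.get(flag, baseline_value)
            if value != baseline_value and (region, flag) not in INTENTIONAL:
                drift.append((region, flag, baseline_value, value))
    return drift

if __name__ == "__main__":
    for region, flag, expected, actual in find_drift():
        print(f"{region}: {flag} = {actual!r} (baseline {expected!r})")
```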
4. Test failure modes, not just success
In staging and early rollouts, we:
- simulated dependency failures common in other regions
- tested data from more than one segment or locale
This caught some environment-specific issues before they reached users.
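In tests, this can be as simple as wrapping the dependency client and injecting the failures we had seen in other regions. A sketch, with a hypothetical client interface and failure rates:

```python
# Sketch of injecting the dependency failures we saw in other regions
# (timeouts, 5xx-style errors) into staging tests. The client interface
# and error types are hypothetical.
import random

class FlakyDependency:
    """Wraps a dependency client and injects region-typical failures."""

    def __init__(self, client, timeout_rate=0.05, error_rate=0.02, seed=None):
        self.client = client
        self.timeout_rate = timeout_rate
        self.error_rate = error_rate
        self.random = random.Random(seed)

    def call(self, request):
        roll = self.random.random()
        if roll < self.timeout_rate:
            raise TimeoutError("injected: dependency timeout")
        if roll < self.timeout_rate + self.error_rate:
            raise RuntimeError("injected: dependency 5xx")
        return self.client.call(request)

# In a staging test (illustrative):
# flaky = FlakyDependency(real_client, timeout_rate=0.2, seed=42)
# run_feature_under_test(dependency=flaky)
```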
Takeaways
- Success in one environment is a useful signal but not a guarantee.
- Rollouts should sample environments and regions that reflect real diversity.
- Configuration and data-shape differences matter as much in testing as code paths do.