CULTURE 2023-06-30 BY STORECODE

On-call across time zones

How we adjusted our on-call rotation and habits once the team was no longer sitting in the same room, or even on the same continent.

culture, on-call, remote-work, incidents

When the team fit in one office, on-call was noisy but simple.

People overheard pages. Someone would wander over to the on-call’s desk. If an incident stretched late, others could choose to stay.

As we spread across time zones, those informal supports disappeared.

We didn’t want to centralize everything back into a single region’s workday. We also didn’t want incidents to turn into 24-hour marathons because people felt obligated to "stay on" after their shift.

We had to treat on-call across time zones as a design problem, not a social one.

Constraints

  • We had engineers in multiple time zones, with thin coverage during some hours.
  • Not all services justified a fully separate regional on-call rotation.
  • We wanted handoffs that worked even when people never overlapped in real time.
  • We didn’t want to double the cognitive load by having a different process per region.

What we changed

1. Make rotations local where it matters

We stopped pretending a single global rotation was fine for everything.

For services where incidents often needed product context or rapid iteration during the local day, we:

  • created regionally focused primary rotations (e.g., "Americas daytime", "Europe daytime")
  • paired them with a smaller, global backup rotation for off-hours

For truly global, low-latency surfaces, we kept a more traditional 24/7 SRE-style rotation.
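
To make the regional-primary-plus-global-backup idea concrete, here is a minimal sketch of how a page could be mapped to a rotation. The rotation names come from the examples above; the 09:00 to 17:00 windows, the fixed UTC offsets, and the names `RegionalRotation` and `active_rotation` are assumptions for illustration, not our actual configuration.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone


@dataclass(frozen=True)
class RegionalRotation:
    name: str
    utc_offset_hours: int  # fixed offset keeps the sketch simple; it ignores DST
    start: time            # local start of the daytime window (assumed)
    end: time              # local end of the daytime window (assumed)

    def covers(self, now_utc: datetime) -> bool:
        """True if now_utc falls inside this rotation's local daytime window."""
        local_tz = timezone(timedelta(hours=self.utc_offset_hours))
        local = now_utc.astimezone(local_tz)
        return self.start <= local.time() < self.end


# Hypothetical windows; real ones would come from the scheduling tool.
ROTATIONS = [
    RegionalRotation("Americas daytime", -5, time(9), time(17)),
    RegionalRotation("Europe daytime", 1, time(9), time(17)),
]
GLOBAL_BACKUP = "Global backup"


def active_rotation(now_utc: datetime) -> str:
    """Pick the first regional rotation in its local day, else the global backup."""
    for rotation in ROTATIONS:
        if rotation.covers(now_utc):
            return rotation.name
    return GLOBAL_BACKUP
```

The sketch ignores weekends and daylight saving time, and where two windows overlap the first entry in the list wins. A real setup would need an explicit decision on each of those.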

2. Write handoffs for people, not buildings

We rewrote handoff expectations assuming no shared office:

  • every shift ends with an explicit written handoff (even if there was a live call)
  • the handoff lives in the incident doc, not just in chat
  • the incoming shift acknowledges in writing when they’ve taken over

This mirrors what we do for incident handovers generally, with more emphasis on making shifts feel bounded.
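
One way to picture these handoff rules is as a small record that is not complete until the incoming person has acknowledged it in writing. This is a sketch under assumptions: the `Handoff` class and its field names are hypothetical, and in practice the same information can live as plain text in the incident doc.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Handoff:
    outgoing: str   # person ending their shift
    incoming: str   # person starting their shift
    summary: str    # written state of the incident at handoff time
    open_questions: list[str] = field(default_factory=list)
    written_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    acknowledged_at: Optional[datetime] = None

    def acknowledge(self, who: str) -> None:
        """Record the written acknowledgement from the incoming shift."""
        if who != self.incoming:
            raise ValueError(f"{who} is not the incoming on-call for this handoff")
        self.acknowledged_at = datetime.now(timezone.utc)

    @property
    def complete(self) -> bool:
        """A handoff counts only with a written summary and an acknowledgement."""
        return bool(self.summary.strip()) and self.acknowledged_at is not None
```

The point of the `complete` check is that the outgoing shift has a clear, bounded end: once the acknowledgement is recorded, they are done.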

3. Clarify who wakes up and who doesn’t

Time zones introduce a real risk: people wake up for pages that someone else is already handling.

We clarified a few rules:

  • only the active primary and backup for a rotation are expected to respond to pages
  • others can join if they are awake and willing, but it is a choice
  • we don’t praise "I stayed up all night even though I wasn’t on call" as heroics

We also tuned alert routing so that pages go first to the person actually on shift, not to a long list of people around the world.
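
The routing rule can be sketched in a few lines. The `Shift` type and both function names are hypothetical; a real paging tool expresses the same thing through its own escalation policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shift:
    rotation: str
    primary: str
    backup: str


def page_targets(shift: Shift, escalate: bool = False) -> list[str]:
    """Page the active primary first; add the backup only on escalation."""
    if escalate:
        return [shift.primary, shift.backup]
    return [shift.primary]


def is_expected_to_respond(shift: Shift, person: str) -> bool:
    """Only the active primary and backup are expected to respond."""
    return person in (shift.primary, shift.backup)
```

Anyone else can still join an incident, but nothing in the routing wakes them up.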

4. Make asynchronous incident updates normal

In a single-time-zone team, most updates happen in real time during a call.

Across time zones, we:

  • rely more heavily on timestamped updates in the incident doc
  • add short "state of the incident" summaries when shifts change
  • keep a consistent log of decisions and hypotheses

This helps people coming online in another region quickly understand whether they need to join or just monitor.
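
As an illustration of why the shift-change summaries matter, here is a minimal sketch of a "catch up" helper: someone coming online reads the most recent summary and whatever was logged after it. The `Entry` type, the `kind` values, and `catch_up` are assumptions for the example, not a description of our tooling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Entry:
    at: datetime   # timestamp of the update
    author: str
    kind: str      # "update", "decision", "hypothesis", or "summary" (assumed)
    text: str


def catch_up(log: list[Entry]) -> list[Entry]:
    """Return the latest summary and every entry logged after it.

    If no summary exists yet, return the whole log in time order.
    """
    ordered = sorted(log, key=lambda entry: entry.at)
    for index in range(len(ordered) - 1, -1, -1):
        if ordered[index].kind == "summary":
            return ordered[index:]
    return ordered
```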

5. Align expectations with managers and product

On-call across time zones is not just a technical design; it’s a staffing and expectation problem.

We aligned with managers and product partners on:

  • which services truly require 24/7 response
  • which ones can reasonably wait for local business hours for non-critical issues

This avoided quietly expecting people in one region to carry the pager for systems that primarily serve another.

Results / Measurements

We looked at:

  • how often people outside the active shift joined incidents anyway
  • how long incidents stretched past shift boundaries
  • self-reported burnout and sleep disruption in rotation surveys
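
The first two measures can be computed from basic incident records. The sketch below assumes each record carries the responders, the people on shift when the incident began, the end of that shift, and the resolution time; `IncidentRecord` and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IncidentRecord:
    responders: frozenset[str]  # everyone who joined the incident
    on_shift: frozenset[str]    # active primary and backup when it began
    shift_end: datetime         # end of the shift in which it began
    resolved_at: datetime


def off_shift_join_rate(incidents: list[IncidentRecord]) -> float:
    """Fraction of incidents that at least one off-shift person joined."""
    if not incidents:
        return 0.0
    joined = sum(1 for i in incidents if i.responders - i.on_shift)
    return joined / len(incidents)


def overrun(incident: IncidentRecord) -> timedelta:
    """How far the incident ran past the shift boundary (zero if it did not)."""
    return max(incident.resolved_at - incident.shift_end, timedelta(0))
```

Neither number says anything about whether people felt pressured to join, which is why the survey questions sit alongside them.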

After a few months:

  • we saw fewer cases of "off-shift" engineers feeling pressured to stay on
  • incidents that crossed shifts had clearer handovers and less duplicate investigation
  • surveys showed more people felt the rotation was sustainable

We still adjust. Time zones change, people move, and services grow.

But instead of treating each change as an exception, we treat on-call structure as part of system design.

Takeaways

  • Time zones are a property of your system, not just your calendar.
  • On-call structures should match service needs and where people actually live.
  • Clear written handoffs and scoped expectations matter more when people don’t share a room.
  • Avoid rewarding unsustainable behavior; design rotations that don’t depend on it.
