RELIABILITY | 2018-05-22 | BY ELI NAVARRO

Reversibility over bravado

Decision: default to reversible changes; treat irreversible work as a planned cutover.

reliability, deployments, migrations, change-management

This is a decision record.

It exists because teams keep repeating the same mistake under deadline pressure: shipping an irreversible change and calling it confidence.

Context

Most systems do not start with a reliable rollback story.

Deploys may be manual. Environments may not match production. Database changes may be coupled to application releases in ways that are hard to unwind.

In that world, “bravado” looks attractive. Big-bang deploys. Late-night cutovers. Fingers crossed.

Bravado isn’t courage. It’s uncertainty with a deadline.

Irreversible work shows up in familiar forms:

  • dropping or rewriting data in place
  • changing a contract in a way old clients can’t survive
  • removing the “old path” before the new path has proven itself
  • bundling many changes so you can’t isolate what broke

The cost shows up later. During the first incident after the change, the team spends more time reconstructing a backout path than fixing the actual problem.

Decision

Default to reversibility.

  • Ship smaller changes that can be observed and undone.
  • Make rollback a first-class requirement for “ready to deploy.”
  • Keep the old path working during migrations (expand/contract when possible).
  • Treat database changes as higher risk than application changes. Prefer patterns that allow dual-write and staged cleanup.
  • If a change cannot be reversed safely, label it honestly: cutover.
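As an illustration of the expand/contract and dual-write bullets above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (users, fullname, display_name) and all function names are hypothetical, chosen only for the example. It is one possible shape for the pattern, not a prescribed implementation.

```python
import sqlite3


def expand(conn):
    # Expand: add the new column alongside the old one. Nothing is removed,
    # so the previous application version keeps working.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def write_user(conn, user_id, name):
    # Dual-write: every new write fills both the old and the new column,
    # so either application version can read the row.
    conn.execute(
        "INSERT INTO users (id, fullname, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )


def backfill(conn):
    # Copy old values into the new column for rows written before the expand.
    conn.execute(
        "UPDATE users SET display_name = fullname WHERE display_name IS NULL"
    )


def ready_to_contract(conn):
    # Staged cleanup gate: the old column may only be dropped once
    # no row is missing its new-column value.
    (missing,) = conn.execute(
        "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
    ).fetchone()
    return missing == 0


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
    conn.execute("INSERT INTO users (id, fullname) VALUES (1, 'Ada')")
    expand(conn)
    write_user(conn, 2, "Grace")
    before = ready_to_contract(conn)  # row 1 has not been backfilled yet
    backfill(conn)
    after = ready_to_contract(conn)
    rows = conn.execute(
        "SELECT id, fullname, display_name FROM users ORDER BY id"
    ).fetchall()
    return before, after, rows
```

Every step in the sketch can be undone without losing data, because the old column stays populated throughout. Dropping the old column is deliberately left out: that is the one irreversible step, and it belongs in a separate, later change once the new path has proven itself.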

Cutovers are allowed. But they are planned work.

A cutover must have:

  • a written plan
  • a rehearsal (even if small)
  • a stop condition
  • a backout condition
  • a verification step that is not “no one complained yet”
  • a comms plan (who says what, where)
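One way to make the list above enforceable is to represent a cutover as a record and refuse to schedule it while any item is blank. The sketch below assumes a hypothetical CutoverPlan structure whose field names mirror the six items. A real team would likely keep this in a ticket template or runbook rather than in code.

```python
from dataclasses import dataclass, fields


@dataclass
class CutoverPlan:
    # One field per required item; an empty string means "not done yet".
    written_plan: str = ""
    rehearsal: str = ""
    stop_condition: str = ""
    backout_condition: str = ""
    verification: str = ""
    comms_plan: str = ""


def missing_items(plan):
    # Return the names of required items that are empty or whitespace-only,
    # in the order they are declared on the dataclass.
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


def is_cutover_ready(plan):
    # A cutover is ready only when every required item has been filled in.
    return not missing_items(plan)
```

For example, a plan with only the first three items filled in (the values here are made up for illustration) reports exactly what still blocks the cutover:

```python
plan = CutoverPlan(
    written_plan="docs/cutover.md",
    rehearsal="ran against a staging copy",
    stop_condition="error rate above agreed threshold",
)
print(missing_items(plan))
# ['backout_condition', 'verification', 'comms_plan']
```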

Consequences

This decision changes the shape of delivery.

Expected benefits:

  • incidents are smaller because changes are smaller
  • time-to-restore after a bad deploy drops because rollback exists and is practiced
  • teams stop depending on after-hours bravery as the default

Costs:

  • some work moves earlier (planning, rollout, verification)
  • you say “not deployable yet” more often
  • you maintain rollback paths instead of letting them rot

Reversibility is not overhead. It is the cost of shipping changes you are willing to own.