ARCHITECTURE · 2023-07-19 · BY ELI NAVARRO

Checklist: Safe schema changes in a shared database

A short checklist we run before and during schema changes on shared databases.

architecture, databases, migrations, checklists

Schema changes in a shared database are one of the fastest ways to surprise multiple teams at once.

This checklist is not a full migration guide.

It’s the set of questions we make ourselves answer before we run anything that can lock or reshape hot tables.

Context

Use this checklist when:

  • changing table structure that multiple services touch
  • adding or dropping indexes on busy tables
  • modifying column types or constraints

It assumes:

  • you already have a basic design for the change
  • you’ve identified which tables and queries might be affected

The checklist is there to prevent "we didn’t think of that" from turning into a paging alert.

Checklist

  • Is the change additive first?

    • prefer adding new columns, tables, or indexes before removing or changing old ones
    • plan destructive steps as a follow-up once traffic has moved
  • Do we understand query impact?

    • list the top queries that touch the table(s) by frequency and latency
    • check how they will use the new or changed columns or indexes
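    • consider the worst case: full-table scans if a query cannot use the expected index, and locks that block reads or writes while the change runs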
  • Have we tested on production-shaped data?

    • run the migration on a realistic copy or subset of data
    • measure how long it takes and what locks it acquires
  • Is there a clear rollout and rollback plan?

    • document the steps for rolling forward and back
    • record the previous state (schema and any data you will overwrite) so rollback does not require guessing it
  • Are we monitoring the right signals?

    • database CPU, I/O, and lock wait times
    • application latency and error rates for affected flows
    • replication lag, if applicable
  • Is the timing right?

    • schedule during a lower-traffic window when possible
    • confirm on-call coverage for both database and application teams
  • Do other teams know this is happening?

    • notify owners of services that depend on the database
    • share the rollout and rollback plan in a place they can find during an incident
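
To make the first few items concrete, here is a minimal sketch of an additive-first step with a recorded "before" state and a timed run. It uses Python's built-in sqlite3 module only so that it runs anywhere; the `users` table, the `email_normalized` column, and the helper names `snapshot_schema` and `expand` are hypothetical. SQLite's locking is nothing like a shared server database, so treat this as the shape of the workflow, not as a way to measure locks.

```python
import sqlite3
import time


def snapshot_schema(conn):
    """Return {object name: CREATE statement} for every table and index."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return dict(rows)


def expand(conn):
    """Additive step only: add a nullable column and backfill it.

    The old `email` column is left untouched so existing readers keep working.
    """
    conn.execute("ALTER TABLE users ADD COLUMN email_normalized TEXT")
    conn.execute("UPDATE users SET email_normalized = lower(trim(email))")
    conn.commit()


# Hypothetical table standing in for a production-shaped copy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(" Ana@Example.com ",), ("bo@example.com",)],
)
conn.commit()

before = snapshot_schema(conn)  # recorded previous state for the rollback notes
started = time.perf_counter()
expand(conn)
elapsed_s = time.perf_counter() - started
after = snapshot_schema(conn)

print(f"expand took {elapsed_s:.4f}s")
print("before:", before["users"])
print("after: ", after["users"])
```

The destructive half (dropping the old column) is deliberately absent: in this sketch it would be a separate, later change once readers and writers have moved to the new column.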

Notes

We’ve found that most migration incidents come from missing one of these basics, not from exotic edge cases.

A few practical lessons:

  • dry runs on toy data are better than nothing, but can hide locking and timing problems
  • even "online" schema changes can cause noticeable load; monitoring is not optional
  • coordination with consuming teams is as important as the SQL itself
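
One way to act on the load and monitoring lessons is to backfill in small batches and check a health signal between batches. The sketch below assumes a hypothetical `is_healthy` callback (in practice it might wrap lock wait time or replication lag) and a hypothetical `users` table with an `email_normalized` column. It uses Python's built-in sqlite3 module purely so the example is runnable; it is an illustration, not our production tooling.

```python
import sqlite3
import time


def backfill_in_batches(conn, is_healthy, batch_size=1000, pause_s=0.0):
    """Backfill email_normalized in id-ordered batches.

    Stops before the next batch if is_healthy() returns False.
    Returns (rows_updated, completed).
    """
    last_id = 0
    total = 0
    while True:
        if not is_healthy():
            return total, False
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM users "
                "WHERE id > ? AND email_normalized IS NULL "
                "ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            return total, True
        cur = conn.execute(
            "UPDATE users SET email_normalized = lower(trim(email)) "
            "WHERE id BETWEEN ? AND ? AND email_normalized IS NULL",
            (ids[0], ids[-1]),
        )
        conn.commit()  # short transactions: each batch commits on its own
        total += cur.rowcount
        last_id = ids[-1]
        time.sleep(pause_s)  # optional breathing room between batches


# Hypothetical data: 2,500 rows that still need the new column filled in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, email_normalized TEXT)"
)
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"User{i}@Example.com",) for i in range(2500)],
)
conn.commit()

# Simulated signal: healthy for two checks, then unhealthy.
checks = iter([True, True, False])
updated, completed = backfill_in_batches(conn, lambda: next(checks))

# Later, once the signal recovers, the same call picks up the remaining rows.
updated_resume, completed_resume = backfill_in_batches(conn, lambda: True)
```

Because each batch only touches rows where the new column is still NULL, a run that stops early can simply be started again; nothing needs to remember where it left off.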

We also treat the checklist as living: when an incident teaches us something new, we add one or two items and prune anything that’s no longer useful.

Takeaways

  • Schema changes on shared databases are operations work, not just schema design.
  • Additive-first changes, realistic tests, and clear rollback plans prevent many surprises.
  • Coordination and monitoring matter as much as the SQL you run.

Further reading