ARCHITECTURE | 2023-07-19 | BY ELI NAVARRO

Checklist: Safe schema changes in a shared database

A short checklist we run before and during schema changes on shared databases.

Tags: architecture, databases, migrations, checklists

Schema changes in a shared database are one of the fastest ways to surprise multiple teams at once.

This checklist is not a full migration guide.

It’s the set of questions we make ourselves answer before we run anything that can lock or reshape hot tables.

Context

Use this checklist when:

It assumes:

The checklist is there to prevent "we didn’t think of that" from turning into a paging alert.

Is the change additive first?
- prefer adding new columns, tables, or indexes before removing or changing old ones
- plan destructive steps as a follow-up once traffic has moved
Do we understand query impact?
- list the top queries that touch the affected table(s), ranked by frequency and by latency
- check how they will use the new or changed columns or indexes
- consider the worst case: full table scans, and which locks the change takes and how long it holds them
Have we tested on production-shaped data?
- run the migration on a realistic copy or subset of data
- measure how long it takes and what locks it acquires
Is there a clear rollout and rollback plan?
- document the steps for rolling forward and back
- ensure rollback does not require guessing the previous state
Are we monitoring the right signals?
- database CPU, I/O, and lock wait times
- application latency and error rates for affected flows
- replication lag, if applicable
Is the timing right?
- schedule during a lower-traffic window when possible
- confirm on-call coverage for both database and application teams
Do other teams know this is happening?
- notify owners of services that depend on the database
- share the rollout and rollback plan in a place they can find during an incident
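The first few checklist items fit together in one small sketch. Below is a hypothetical additive-first ("expand now, contract later") change in Python, using an in-memory SQLite database as a stand-in for the shared database. It records the column list before the change so the rollback plan states the previous shape instead of guessing it, adds a nullable column, and backfills it in small committed batches. The table and column names (`users`, `full_name`, `display_name`) and the `should_pause` hook, where checks on lock waits, replication lag, or latency would plug in, are assumptions for illustration, not our actual tooling.

```python
import sqlite3

# Sketch of an additive-first ("expand now, contract later") change.
# Assumptions: in-memory SQLite stands in for the shared database, and the
# table/column names (users, full_name, display_name) are hypothetical.


def column_names(conn, table):
    """Current column names, recorded before the change for the rollback plan."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def expand(conn, batch_size=1000, should_pause=lambda: False):
    """Add a nullable column, then backfill it in small committed batches.

    Returns True when the backfill finished, False if should_pause() stopped it.
    """
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.commit()
    while True:
        # Hypothetical hook: wire this to lock waits, replication lag, or latency.
        if should_pause():
            return False
        cur = conn.execute(
            "UPDATE users SET display_name = full_name WHERE rowid IN ("
            " SELECT rowid FROM users"
            " WHERE display_name IS NULL AND full_name IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:  # nothing left to backfill
            return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.executemany(
    "INSERT INTO users (full_name) VALUES (?)",
    [(f"user {i}",) for i in range(2500)],
)
conn.commit()

before = column_names(conn, "users")
done = expand(conn, batch_size=1000)
after = column_names(conn, "users")

# Old columns are untouched; dropping full_name would be a later, separate step.
print(done, before, after)
# → True ['id', 'full_name'] ['id', 'full_name', 'display_name']
```

Because the old column is never modified in this phase, rolling back is a code change (read `full_name` again), and the destructive step waits for a follow-up migration once traffic has moved.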

We’ve found that most migration incidents come from missing one of these basics, not from exotic edge cases.

A few practical lessons:

- dry runs on toy data are better than nothing, but they can hide locking and timing problems
- even "online" schema changes can cause noticeable load; monitoring is not optional
- coordination with consuming teams is as important as the SQL itself
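One way to make a dry run measure something is to time the migration at more than one data size and then check the plan of a hot query. The sketch below does that with Python's `sqlite3` and `time.perf_counter`; the `orders` table, the index name, and the row counts are hypothetical. SQLite's locking is far coarser than a server database's, so a run like this captures timing and query-plan changes but not the lock behavior you would see in production.

```python
import sqlite3
import time

# Sketch of a dry run that measures a migration instead of just running it.
# Assumptions: in-memory SQLite, a hypothetical orders table, and row counts
# chosen only to show how timing changes with data size.


def build_orders(rows):
    """Create a throwaway copy of a hypothetical orders table with `rows` rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        ((i % 5000, i * 0.5) for i in range(rows)),
    )
    conn.commit()
    return conn


def dry_run(conn, migration_sql, probe_query, index_name):
    """Time the migration, then check whether a hot query uses the new index."""
    start = time.perf_counter()
    conn.execute(migration_sql)
    conn.commit()
    elapsed = time.perf_counter() - start
    plan = conn.execute("EXPLAIN QUERY PLAN " + probe_query, (42,)).fetchall()
    # The last column of each plan row is the human-readable detail string.
    uses_index = any(index_name in row[-1] for row in plan)
    return elapsed, uses_index


migration = "CREATE INDEX idx_orders_customer ON orders (customer_id)"
probe = "SELECT total FROM orders WHERE customer_id = ?"

for rows in (100, 100_000):
    seconds, uses_index = dry_run(
        build_orders(rows), migration, probe, "idx_orders_customer"
    )
    print(f"{rows:>7} rows: {seconds:.4f}s, uses index: {uses_index}")
```

The larger run should take noticeably longer; that gap between toy and production-shaped data is what a toy-only dry run hides.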

We also treat the checklist as living: when an incident teaches us something new, we add one or two items and prune anything that’s no longer useful.
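Because the checklist changes over time, it can help to keep it as data rather than only as prose, so adding or pruning an item is a one-line edit and a migration script can refuse to start until every question has an answer. The sketch below is hypothetical: the question text comes from the checklist in this post, but `PreflightReview` and the gate around it are illustrative names, not tooling the post describes.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: the checklist kept as data so a migration can be gated
# on every question having a recorded answer. Questions are from this post;
# PreflightReview and its methods are illustrative only.

CHECKLIST = [
    "Is the change additive first?",
    "Do we understand query impact?",
    "Have we tested on production-shaped data?",
    "Is there a clear rollout and rollback plan?",
    "Are we monitoring the right signals?",
    "Is the timing right?",
    "Do other teams know this is happening?",
]


@dataclass
class PreflightReview:
    migration: str
    answers: dict = field(default_factory=dict)  # question -> short note or link

    def missing(self, checklist=CHECKLIST):
        """Questions with no recorded answer (blank notes count as missing)."""
        return [q for q in checklist if not self.answers.get(q, "").strip()]

    def ready(self, checklist=CHECKLIST):
        """True only when every checklist question has a non-blank answer."""
        return not self.missing(checklist)


review = PreflightReview("add display_name to users")
review.answers["Is the change additive first?"] = "nullable column; drop later"
print(review.ready())         # False
print(len(review.missing()))  # 6
```

Pruning an item is just removing a string from `CHECKLIST`; reviews already in flight pick up the change the next time `missing()` runs.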

Schema changes on shared databases are operations work, not just schema design.
Additive-first changes, realistic tests, and clear rollback plans prevent many surprises.
Coordination and monitoring matter as much as the SQL you run.