Checklist: Safe schema changes in a shared database
A short checklist we run before and during schema changes on shared databases.
Schema changes in a shared database are one of the fastest ways to surprise multiple teams at once.
This checklist is not a full migration guide.
It’s the set of questions we make ourselves answer before we run anything that can lock or reshape hot tables.
Context
Use this checklist when:
- changing table structure that multiple services touch
- adding or dropping indexes on busy tables
- modifying column types or constraints
It assumes:
- you already have a basic design for the change
- you’ve identified which tables and queries might be affected
The checklist is there to prevent "we didn’t think of that" from turning into a paging alert.
Checklist
- Is the change additive first?
  - prefer adding new columns, tables, or indexes before removing or changing old ones
  - plan destructive steps as a follow-up once traffic has moved
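As a rough illustration of additive-first ordering, the sketch below splits a column rename into an "expand" phase and a later "contract" phase. The `users` table, the `name` and `display_name` columns, and the helper functions are all hypothetical, and SQLite is used only so the example runs anywhere; your engine's syntax and locking behavior will differ.

```python
import sqlite3

# Hypothetical change: replace users.name with users.display_name.
# Expand phase: purely additive, safe to run while old code is still live.
EXPAND_STEPS = [
    "ALTER TABLE users ADD COLUMN display_name TEXT",
    "UPDATE users SET display_name = name WHERE display_name IS NULL",
]

# Contract phase: destructive, deferred until every reader and writer
# has moved to display_name. Not executed in this sketch.
CONTRACT_STEPS = [
    "ALTER TABLE users DROP COLUMN name",
]


def run_steps(conn, steps):
    """Apply each statement in order, committing only if all succeed."""
    with conn:
        for statement in steps:
            conn.execute(statement)


def column_names(conn, table):
    """Return the column names of a table (table name is trusted input here)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada'), ('Grace')")
conn.commit()

run_steps(conn, EXPAND_STEPS)
print(column_names(conn, "users"))  # ['id', 'name', 'display_name']
```

Keeping the contract steps in a separate list makes it harder to run them by accident in the same deploy as the additive steps.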
- Do we understand query impact?
  - list the top queries that touch the table(s) by frequency and latency
  - check how they will use the new or changed columns or indexes
  - consider the worst case: full-table scans and locks held for the duration of the change
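One way to build the "top queries" list is to rank by total time, that is, call count multiplied by mean latency. The sketch below assumes you can export per-query statistics from your database; the `QueryStat` fields, the sample numbers, and the substring match on the table name are all illustrative assumptions, not a real statistics API.

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    # Hypothetical fields; real values would come from your database's
    # query statistics or slow-query log.
    sql: str
    calls: int
    mean_ms: float

    @property
    def total_ms(self) -> float:
        """Total time spent in this query: frequency times latency."""
        return self.calls * self.mean_ms


def top_queries(stats, table, limit=5):
    """Return queries whose text mentions `table`, heaviest first.

    The substring match is deliberately crude and can over-match
    (for example "orders" also matches "reorders").
    """
    touching = [s for s in stats if table.lower() in s.sql.lower()]
    return sorted(touching, key=lambda s: s.total_ms, reverse=True)[:limit]


stats = [
    QueryStat("SELECT * FROM orders WHERE customer_id = ?", calls=90_000, mean_ms=2.0),
    QueryStat("SELECT count(*) FROM orders WHERE status = ?", calls=300, mean_ms=850.0),
    QueryStat("SELECT * FROM customers WHERE id = ?", calls=120_000, mean_ms=0.5),
]

for stat in top_queries(stats, "orders"):
    print(f"{stat.total_ms:>10.0f} ms  {stat.sql}")
```

In this made-up sample the rare but slow `count(*)` query outranks the frequent lookup, which is the kind of result that sorting by frequency alone would hide.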
- Have we tested on production-shaped data?
  - run the migration on a realistic copy or subset of data
  - measure how long it takes and what locks it acquires
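A minimal sketch of timing a migration against a data copy follows. It uses an in-memory SQLite database filled with synthetic rows as a stand-in for a production-shaped copy; the `events` table, row count, and index name are assumptions. It measures duration only: inspecting which locks are taken is engine-specific and not shown here.

```python
import sqlite3
import time


def timed_migration(conn, statements):
    """Run each statement and return (statement, seconds) pairs."""
    timings = []
    for statement in statements:
        start = time.perf_counter()
        conn.execute(statement)
        conn.commit()
        timings.append((statement, time.perf_counter() - start))
    return timings


# Stand-in for a realistic copy: 200,000 synthetic rows in a
# hypothetical "events" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, account_id INTEGER, kind TEXT)"
)
conn.executemany(
    "INSERT INTO events (account_id, kind) VALUES (?, ?)",
    ((i % 5000, "click") for i in range(200_000)),
)
conn.commit()

migration = ["CREATE INDEX idx_events_account ON events (account_id)"]
for statement, seconds in timed_migration(conn, migration):
    print(f"{seconds:8.3f}s  {statement}")
```

Timings from a toy dataset like this one say little about production; the point is to run the same harness against a copy whose row counts and value distribution resemble the real table.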
- Is there a clear rollout and rollback plan?
  - document the steps for rolling forward and back
  - ensure rollback does not require guessing the previous state
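One possible way to avoid guessing during rollback is to write each forward step together with its reverse before anything runs. The sketch below is an assumption about how such a plan could be recorded; the `Step` type, the `orders` table, and the SQL strings are hypothetical, and the SQL is generic (for example, some engines require `DROP INDEX ... ON table`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    forward_sql: str
    rollback_sql: str  # written in advance, never derived during an incident


# Hypothetical plan: add a nullable column, then index it.
PLAN = [
    Step(
        "add nullable column",
        "ALTER TABLE orders ADD COLUMN shipped_at TIMESTAMP NULL",
        "ALTER TABLE orders DROP COLUMN shipped_at",
    ),
    Step(
        "index the new column",
        "CREATE INDEX idx_orders_shipped_at ON orders (shipped_at)",
        "DROP INDEX idx_orders_shipped_at",
    ),
]


def rollback_statements(plan, completed):
    """Return rollback SQL for the first `completed` steps, newest first."""
    return [step.rollback_sql for step in reversed(plan[:completed])]


# If both steps ran, undo the index before the column it depends on.
for statement in rollback_statements(PLAN, completed=2):
    print(statement)
```

Undoing steps in reverse order matters whenever a later step depends on an earlier one, as the index does on the column here.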
- Are we monitoring the right signals?
  - database CPU, I/O, and lock wait times
  - application latency and error rates for affected flows
  - replication lag, if applicable
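The signals above are easier to act on when abort thresholds are agreed before the change starts. The following is a sketch under stated assumptions: the metric names and limits are invented for illustration, and the sample would in practice come from whatever monitoring system you already use.

```python
# Hypothetical thresholds; tune them against your own baseline.
ABORT_THRESHOLDS = {
    "db_cpu_percent": 85.0,
    "lock_wait_ms_p95": 500.0,
    "app_error_rate_percent": 1.0,
    "replication_lag_seconds": 30.0,
}


def breached_signals(sample, thresholds=ABORT_THRESHOLDS):
    """Return the names of signals whose sampled value exceeds its threshold.

    A signal missing from the sample is also reported. That is a design
    choice of this sketch: running a migration without visibility is
    treated as a reason to pause.
    """
    breached = []
    for name, limit in thresholds.items():
        value = sample.get(name)
        if value is None or value > limit:
            breached.append(name)
    return breached


sample = {
    "db_cpu_percent": 62.0,
    "lock_wait_ms_p95": 740.0,
    "app_error_rate_percent": 0.2,
    "replication_lag_seconds": 4.0,
}
print(breached_signals(sample))  # ['lock_wait_ms_p95']
```

A non-empty result would be the cue to pause the migration and consult the rollback plan rather than push through.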
- Is the timing right?
  - schedule during a lower-traffic window when possible
  - confirm on-call coverage for both database and application teams
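If migrations are launched from a script, a small guard can refuse to start outside the agreed window. This is only a sketch: the window times are made up, and the function assumes `now` is already expressed in the same time zone as the window.

```python
from datetime import datetime, time


def in_window(now, start, end):
    """Return True if `now` falls in [start, end), allowing windows that cross midnight."""
    current = now.time()
    if start <= end:
        return start <= current < end
    # Window wraps past midnight, e.g. 23:00 to 04:00.
    return current >= start or current < end


# Hypothetical low-traffic window: 23:00 to 04:00.
WINDOW_START = time(23, 0)
WINDOW_END = time(4, 0)

print(in_window(datetime(2024, 3, 5, 1, 30), WINDOW_START, WINDOW_END))  # True
print(in_window(datetime(2024, 3, 5, 14, 0), WINDOW_START, WINDOW_END))  # False
```

The guard covers only the clock; confirming on-call coverage remains a human step.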
- Do other teams know this is happening?
  - notify owners of services that depend on the database
  - share the rollout and rollback plan in a place they can find during an incident
Notes
We’ve found that most migration incidents come from missing one of these basics, not from exotic edge cases.
A few practical lessons:
- dry runs on toy data are better than nothing, but can hide locking and timing problems
- even "online" schema changes can cause noticeable load; monitoring is not optional
- coordination with consuming teams is as important as the SQL itself
We also treat the checklist as a living document: when an incident teaches us something new, we add one or two items and prune anything that’s no longer useful.
Takeaways
- Schema changes on shared databases are operations work, not just schema design.
- Additive-first changes, realistic tests, and clear rollback plans prevent many surprises.
- Coordination and monitoring matter as much as the SQL you run.