DELIVERY | 2019-04-02 | BY JONAS "JO" CARLIN

Q&A: What does “done” mean?

Done isn’t shipped. Done is shipped, observable, reversible, and supportable.

delivery, quality, operations, handoff

Teams argue about “done” when they’re actually arguing about who carries risk.

Q&A

Q: Is “done” the same as “shipped”?

A: No.

Shipped means code is in production.

Done means we can live with it.

You can ship something that isn’t done. Sometimes you have to. But then you label the missing parts honestly and you schedule the follow-up like it’s real work, not a wish.

Q: What has to be true for something to be done?

A: Four boring things:

  • Observable: we can tell if it’s working.
  • Reversible: we can undo it when it isn’t.
  • Supportable: a non-author can take a safe first action.
  • Bounded: we can describe what it does, and what it doesn’t.

Observable means the page (the alert that wakes someone up) points to a dashboard that answers “is it broken?” in under a minute.

Reversible means rollback is a button (or a short, tested sequence), not an emergency research project.

Supportable means support and on-call can answer “what just happened?” without waking the person who wrote the code.

Bounded means we can say “this does X, and it does not do Y” so people don’t build expectations out of silence.
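
One way to keep these four honest is to write them down next to the change and let a script tell you what is missing. This is a minimal sketch, not a real tool: the ChangeRecord type and every field name in it are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChangeRecord:
    """Hypothetical record attached to a change; all field names are illustrative."""
    name: str
    dashboard_url: str = ""   # Observable: where to see if it's working
    rollback_steps: list[str] = field(default_factory=list)  # Reversible
    runbook_url: str = ""     # Supportable: a safe first action for a non-author
    does: str = ""            # Bounded: what it does
    does_not: str = ""        # Bounded: what it doesn't do


def missing_for_done(change: ChangeRecord) -> list[str]:
    """Return the properties that are still missing, in a fixed order."""
    missing = []
    if not change.dashboard_url:
        missing.append("observable")
    if not change.rollback_steps:
        missing.append("reversible")
    if not change.runbook_url:
        missing.append("supportable")
    if not (change.does and change.does_not):
        missing.append("bounded")
    return missing
```

An empty list means the change meets the four properties on paper. It does not prove the dashboard is useful or the rollback is tested; someone still has to check that.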

Q: Does documentation count?

A: The kind that helps in the first ten minutes does.

A runbook link, a dashboard link, and a rollback path are “done” work.

A long wiki page that no one can find is not.

Q: Is it done when tests are green?

A: Tests are part of “done,” not the definition of it.

Green tests don’t tell you if the dashboard loads quickly, if the alert is pointed at the right symptom, or if the rollback is safe.

When tests are green and you still can’t operate the change, you’ve shipped risk, not work.

Q: What if we have to ship and fill in the rest later?

A: Then it’s not done.

That doesn’t mean you can’t ship.

It means you label the missing parts honestly, assign an owner, and schedule the follow-up before the memory evaporates.

If the follow-up isn’t scheduled, “later” means “never,” and “never” becomes your new operational baseline.

Q: How do you keep “done” from becoming process theater?

A: Keep it tied to risk.

If a check doesn’t reduce a real failure mode, remove it.

If a failure mode keeps showing up, add a check that eliminates it.

Q: Who decides “done”?

A: The person who will carry the risk.

If on-call is going to be paged when it breaks, on-call gets a vote in what “done” requires. If support will handle the fallout, support gets a vote too.

Product can decide what’s valuable. Operations decides what’s survivable. “Done” sits in the overlap.

Q: What does “done” mean for data changes?

A: It means a migration plan, not a script.

If you can’t keep the old path working while the data changes shape, you don’t have rollback.

A staged migration with verification is “done.” A big-bang cutover with hope is not.
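
A staged migration can be sketched in three small steps: backfill the new shape, verify it against the old one, and switch reads behind a flag while the old path keeps working. The example below assumes a made-up case (adding a full_name field to rows that already have first and last) and uses in-memory dicts in place of a real database.

```python
def backfill(rows):
    """Stage 1: populate the new field without touching the old ones."""
    for row in rows:
        if "full_name" not in row:
            row["full_name"] = f'{row["first"]} {row["last"]}'


def verify(rows):
    """Stage 2: return ids of rows where the old and new shapes disagree."""
    return [
        row["id"]
        for row in rows
        if row.get("full_name") != f'{row["first"]} {row["last"]}'
    ]


def read_name(row, use_new_path):
    """Stage 3: reads switch behind a flag; the old path stays available."""
    if use_new_path and "full_name" in row:
        return row["full_name"]
    return f'{row["first"]} {row["last"]}'
```

Rollback here is turning use_new_path off. That only works because the old fields are never modified during the migration.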

Q: Isn’t this slower?

A: Up front, yes.

But “fast” that creates fragile systems becomes slow later. You pay in pages, rewrites, and the kind of incident where everyone is guessing.

The goal of a “done” definition is to move the cost earlier, while you still have choices.

Q: How do you keep “done” lightweight?

A: Keep it attached to failure modes.

If a check doesn’t reduce a real risk, delete it.

If you’re adding checks because they sound responsible, stop.

A good “done” definition is boring and short. It’s the minimum set of things that prevent the most common and expensive failures.

Q: What’s the smallest version of “done” you accept under deadline?

A: Observable enough to know if it’s failing, and reversible enough to stop the bleeding.

If we ship without supportability or perfect docs, we still include:

  • a dashboard link that answers “is it broken?”
  • a safe rollback/backout path
  • an owner for the missing pieces with a date

If there’s no owner and no date, we didn’t ship debt. We shipped abandonment.
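
The owner-and-date rule is easy to check mechanically. A minimal sketch, assuming a hypothetical FollowUp record rather than any particular ticket system:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FollowUp:
    """Hypothetical follow-up item for a piece that shipped incomplete."""
    description: str
    owner: Optional[str] = None
    due: Optional[date] = None


def abandoned(follow_ups):
    """Return descriptions of follow-ups that lack an owner or a date."""
    return [f.description for f in follow_ups if not f.owner or f.due is None]
```

Anything this returns is the work that “later” will quietly turn into “never.”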

Q: What does “done” mean for handoff?

A: It means the next person can operate it.

That can be as small as:

  • a link to the runbook section for the new behavior
  • the one dashboard that answers “is it broken?”
  • a short note on how to roll back

If handoff is “ask the author,” the work isn’t done. It’s just shipped.

Q: What does “done” mean for monitoring?

A: The page points to a starting point.

It doesn’t mean you need perfect alerting on day one.

It means you can answer two questions quickly:

  • is it broken for users?
  • is it our change or someone else’s?

“Done” monitoring is usually one fast dashboard and one alert that pages only on real impact.

Everything else can be tickets.

If a system has no page/ticket boundary, “monitoring” becomes a second incident generator.
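
A page/ticket boundary can be as simple as one routing rule. The sketch below is an assumption, not a recommendation for specific numbers: the signal fields and the 5% threshold are placeholders to replace with whatever “real impact” means for your system.

```python
PAGE_ERROR_RATE = 0.05  # assumed threshold for "real impact"; pick your own


def route(signal):
    """Page only on user-facing impact above the threshold; ticket the rest."""
    if signal["user_facing"] and signal["error_rate"] >= PAGE_ERROR_RATE:
        return "page"
    return "ticket"
```

The point is that the rule exists and is written down, so a noisy internal signal cannot wake someone up by default.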

And “done” includes deletion.

Remove flags when they’re no longer needed. Delete dashboards that aren’t used. Update the runbook so it describes the world you actually have.

If “done” never includes deletion, the system only grows.

Deletion is how you keep the system legible.

Takeaways

“Done” is an operational statement: can we detect, recover, and support this change?

If you can’t point to the rollback, you’re shipping a bet.

If you can’t point to the runbook, you’re shipping a mystery.

Further reading