STEWARDSHIP · 2018-01-12 · BY JONAS "JO" CARLIN

We will stay

A Q&A on long-term ownership: what it changes, what it costs, and what it buys you.

stewardship · maintenance · delivery

At 2:07am we got an email with the subject line: “Do you still fix things?”

No deck. No roadmap. Just a system that was failing and a team that didn’t know who still owned it.

That message is the shape of a lot of real software work. Not the launch. The part after the launch.

Internally we keep a short rule:

We will stay.

Not as a vow. As a constraint we use when we scope work and make decisions.

Q&A

Q: What do you mean by “We will stay”?

A: We mean ownership doesn’t end at launch.

We plan for the second deploy, the first incident, the first handoff, and the first time a dependency goes end-of-life. We treat “operate” as part of “build,” not as a separate phase that somehow happens later.

Q: What does “stay” not mean?

A: It doesn’t mean unlimited scope.

Staying requires boundaries: what we support, when we support it, what counts as urgent, and what we will not do. The easiest way to burn out is to pretend “ownership” is the same thing as “always on.”

Q: Is this just a retainer pitch?

A: It can look like that from the outside. Internally, it’s a cost model.

If a team plans to leave, shortcuts and heroics become rational. If a team plans to stay, those same shortcuts compound into costs the team will pay itself. You don't need a moral argument. You can just do the math.

Q: What changes when you assume you’ll still be responsible later?

A: You start building boring escape hatches.

You bias toward reversible changes. You keep rollback paths simple. You write the runbook before you need it. You ship smaller changes because smaller changes are easier to understand and easier to undo.
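To make "keep rollback paths simple" concrete, here is a minimal sketch. It assumes a hypothetical deploy history where every release is recorded and a release is marked known-good once it has proven itself in production. `Release`, `DeployHistory`, and `rollback` are illustrative names, not any particular tool's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Release:
    """One deployed version in a hypothetical deploy log."""
    version: str
    known_good: bool = False


@dataclass
class DeployHistory:
    """Append-only list of releases; the last entry is what is live."""
    releases: List[Release] = field(default_factory=list)

    def deploy(self, version: str) -> None:
        self.releases.append(Release(version))

    def mark_good(self) -> None:
        # Promote the live release once it has proven healthy.
        live = self.releases[-1]
        self.releases[-1] = Release(live.version, known_good=True)

    def live(self) -> str:
        return self.releases[-1].version

    def rollback(self) -> str:
        # Re-deploy the most recent known-good release before the live one.
        for release in reversed(self.releases[:-1]):
            if release.known_good:
                self.releases.append(release)
                return release.version
        raise RuntimeError("no known-good release to roll back to")


history = DeployHistory()
history.deploy("v1")
history.mark_good()
history.deploy("v2")  # turns out to be bad
print(history.rollback())  # prints v1
```

Rollback appends a new entry instead of rewriting history, so the log still shows that v2 went out and came back. The point of the design is that the escape hatch is one call with no judgment calls left for 2 a.m.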

Q: How do you talk about timelines without lying?

A: We separate “comfort” from “plan.”

A date without discovery is usually comfort. A plan is a set of constraints, unknowns, and decisions you can defend.

When we can’t defend a timeline yet, we say so, and we propose the work that gets us to something defensible.

Q: What do you measure?

A: We care about measurements that map to operations, not optics.

Time-to-restore after a bad deploy. Time-to-understand for an engineer joining the system. Frequency of emergency releases. How often an incident requires “the one person who knows.”

Sometimes the first measurement is admitting we don’t have a baseline yet—then adding enough observability to establish one.
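With even a simple incident log, two of these measurements come down to a few lines. This sketch assumes a hypothetical `Incident` record with start and restore timestamps plus two yes/no fields; the field names are illustrative. When there is nothing to measure it returns `None` rather than a number, which is the "no baseline yet" case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Incident:
    """One entry in a hypothetical incident log."""
    started: datetime
    restored: datetime
    caused_by_deploy: bool        # was a bad deploy the trigger?
    needed_specific_person: bool  # could only one named person resolve it?


def median_time_to_restore(incidents: List[Incident]) -> Optional[timedelta]:
    """Median time-to-restore for deploy-caused incidents; None means no baseline yet."""
    seconds = [
        (i.restored - i.started).total_seconds()
        for i in incidents
        if i.caused_by_deploy
    ]
    if not seconds:
        return None
    return timedelta(seconds=median(seconds))


def one_person_share(incidents: List[Incident]) -> Optional[float]:
    """Fraction of incidents that required 'the one person who knows'."""
    if not incidents:
        return None
    return sum(i.needed_specific_person for i in incidents) / len(incidents)


t0 = datetime(2018, 1, 12, 2, 7)
log = [
    Incident(t0, t0 + timedelta(minutes=10), caused_by_deploy=True, needed_specific_person=False),
    Incident(t0, t0 + timedelta(minutes=30), caused_by_deploy=True, needed_specific_person=True),
    Incident(t0, t0 + timedelta(minutes=50), caused_by_deploy=False, needed_specific_person=True),
]
print(median_time_to_restore(log))  # prints 0:20:00
print(one_person_share(log))
```

The median keeps one very long outage from hiding whether typical recovery is getting faster; if the long tail matters to you, track it separately.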

Q: What does “stay” look like for a small team?

A: Pick one service and practice staying with it.

Write a short runbook. Make rollback real. Add one useful health signal. Do one dependency upgrade before it becomes an emergency. Repeat for six months. The system will get easier to change.
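"One useful health signal" can be as small as a rolling error rate. A minimal sketch, assuming the service can report whether each request failed; `ErrorRateSignal`, the window size, and the threshold are hypothetical placeholders, not recommendations.

```python
from collections import deque


class ErrorRateSignal:
    """Rolling error rate over the last `window` requests (hypothetical health signal)."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        # Oldest outcomes fall off automatically once the deque is full.
        self.outcomes = deque(maxlen=window)  # True = request failed
        self.threshold = threshold

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def healthy(self) -> bool:
        return self.error_rate() <= self.threshold


signal = ErrorRateSignal(window=4, threshold=0.25)
for failed in [False, False, False, True]:
    signal.record(failed)
print(signal.error_rate(), signal.healthy())  # prints 0.25 True
signal.record(True)
print(signal.error_rate(), signal.healthy())  # prints 0.5 False
```

An empty window reports 0.0 here. A real signal may want to report "no data" separately, because silence is not the same as health.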

Q: What if the system is too far gone?

A: Then “stay” becomes a different kind of work.

Stabilize first. Reduce blast radius. Make a replacement plan that doesn’t require pretending unknowns aren’t real. Sometimes the right move is not “rebuild everything.” Sometimes the right move is “make the next six months survivable.”

Q: What do you do when the original builders are gone?

A: You bootstrap ownership the same way you bootstrap reliability.

Pick one dashboard as the on-call starting point. Write the shortest possible runbook. Make sure someone can access logs and roll back a deploy. Turn "who owns this?" into a team/role answer, not a name.

Then start deleting unknowns: pages that don’t map to impact, jobs no one can explain, dashboards no one can load.

Staying starts with making the system legible.

Q: How do you avoid “stay” turning into “always on”?

A: By putting boundaries in the system, not in someone’s stamina.

Pages are for impact. Tickets are for drift. If you don’t separate those, you’ll end up being “on call” for everything.
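The page/ticket split can be written down as a rule instead of being left to whoever is on call. A minimal sketch, assuming each alert carries a hypothetical `user_impact` flag saying whether someone is experiencing a failure right now; anything without current impact is treated as drift.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PAGE = "page"      # interrupt a human now
    TICKET = "ticket"  # queue for scheduled work


@dataclass
class Alert:
    name: str
    user_impact: bool  # is anyone experiencing a failure right now?


def route(alert: Alert) -> Route:
    """Pages are for impact; everything else is drift and becomes a ticket."""
    return Route.PAGE if alert.user_impact else Route.TICKET


alerts = [
    Alert("checkout requests failing", user_impact=True),
    Alert("TLS certificate expires in 20 days", user_impact=False),
    Alert("disk 70% full on a batch host", user_impact=False),
]
for a in alerts:
    print(f"{route(a).value:>6}  {a.name}")
```

The function is the easy part. The real work is agreeing, alert by alert, on what `user_impact` means; an alert nobody can map to impact is a candidate for deletion, not a page.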

“Stay” also means you schedule maintenance before it’s urgent. The easiest way to become always-on is to treat upgrades, runbooks, and access as optional until they become emergencies.

Q: What does “stay” buy you?

A: Compounding.

When you stay, you invest in rollback paths, clear docs, and boring runbooks because you’ll be the one using them later.

Over time you get fewer midnight surprises, faster recovery, and a system that is cheaper to change. That doesn’t show up on launch day. It shows up in year two.

Q: What do you do when leadership only funds launches?

A: You translate “stay” into the costs they already feel.

Support load. Emergency releases. Vendor upgrades done in panic. The quiet tax of having to wake up specific people because only they know what to do.

We don’t sell “staying” as virtue. We sell it as risk reduction and total cost.

If the budget can’t cover ongoing ownership, we scope the build differently: fewer moving parts, fewer integrations, fewer irreversible bets. A system you can’t afford to own should not be complicated.

We also write the staying terms down: response windows, what counts as urgent, and what we do when we’re not available. Staying is operational, not sentimental.
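Writing the terms down can be literal: a small structure that tooling and humans read the same way. A minimal sketch, assuming hypothetical severity labels (`sev1` to `sev3`) and wall-clock response windows; `StayingTerms` and all values are placeholders for whatever you actually agree to.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, FrozenSet


@dataclass
class StayingTerms:
    """Hypothetical written support terms."""
    response_windows: Dict[str, timedelta]  # severity -> max time to first response
    urgent: FrozenSet[str]                  # severities that justify a page
    when_unavailable: str                   # the documented fallback

    def is_urgent(self, severity: str) -> bool:
        return severity in self.urgent

    def response_deadline(self, severity: str, reported_at: datetime) -> datetime:
        if severity not in self.response_windows:
            # A severity with no agreed window is a gap in the terms; fail loudly
            # instead of silently treating it as "respond immediately, always".
            raise ValueError(f"no response window agreed for {severity!r}")
        return reported_at + self.response_windows[severity]


terms = StayingTerms(
    response_windows={
        "sev1": timedelta(hours=1),
        "sev2": timedelta(hours=8),
        "sev3": timedelta(days=3),
    },
    urgent=frozenset({"sev1"}),
    when_unavailable="escalate to the backup contact named in the runbook",
)

print(terms.response_deadline("sev2", datetime(2018, 1, 12, 2, 7)))  # prints 2018-01-12 10:07:00
```

These windows are wall-clock time. Terms that exclude nights or weekends would need a business-hours calendar, which this sketch leaves out.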

If you can’t state the boundary, you don’t have staying. You have wishful thinking.

And if you can’t take a safe first action during the first incident, you don’t have staying yet. You have a launch.

Takeaways

“We will stay” is a delivery constraint: staffing, boundaries, documentation, and operations.

If you assume ownership continues, you’ll choose smaller changes, clearer docs, and calmer recovery.

If you assume you’re leaving, the system will learn that too.
