DELIVERY · 2025-04-09 · BY STORECODE

Using LLMs to draft runbook improvements

How we use internal LLM tools to propose changes to runbooks based on incident docs, without letting the tool edit production docs on its own.

delivery · runbooks · llm · incidents

We wrote a lot of incident docs.

We also had a backlog of runbooks that didn’t fully reflect what we had learned from those incidents.

The gap was predictable:

  • incident leads had fresh context but limited time
  • runbook updates felt like "extra" work
  • some teams were better at closing the loop than others

When our internal LLM tools became reliable enough for drafting, we tried using them to help bridge this gap.

We were careful about one rule:

The tool can propose edits, but it cannot change runbooks without a human accepting and editing them.

Constraints

  • Runbooks live in version-controlled repositories.
  • Incident docs may contain sensitive data; we must redact or summarize appropriately before sending anything to a model.
  • Engineers own their runbooks and must be able to disagree with suggestions.

What we changed

1. Structure inputs from incident docs

We added lightweight structure to incident docs:

  • tagged sections for "What worked" and "What was missing from the runbook"
  • short bullet lists for mitigation steps actually taken

An internal tool then:

  • extracts these tagged sections
  • strips or masks identifiers and sensitive payloads
  • sends the sanitized text to an LLM with a constrained prompt (a sketch of the whole pipeline follows)
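
A minimal sketch of that pipeline, assuming incident docs are markdown with "## What worked" and "## What was missing from the runbook" headings. The regexes, redaction rules, and function names are illustrative, not our actual tooling.

```python
import re

# Headings we tag in incident docs; everything else in the doc is ignored.
TAGGED_HEADINGS = ["What worked", "What was missing from the runbook"]

# Crude masks for identifiers and payloads we never want to send to a model.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"(?i)(token|secret|password)\s*[:=]\s*\S+"), r"\1=<redacted>"),
]

def extract_tagged_sections(incident_md: str) -> dict[str, str]:
    """Pull only the tagged sections out of an incident doc."""
    sections = {}
    for heading in TAGGED_HEADINGS:
        match = re.search(
            rf"^##\s*{re.escape(heading)}\s*\n(.*?)(?=^##\s|\Z)",
            incident_md, flags=re.M | re.S,
        )
        if match:
            sections[heading] = match.group(1).strip()
    return sections

def sanitize(text: str) -> str:
    """Strip or mask identifiers and sensitive payloads before anything leaves our systems."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

def build_prompt(sections: dict[str, str], runbook_md: str) -> str:
    """Constrained prompt: propose edits only, never rewrite the whole runbook."""
    notes = "\n\n".join(f"### {h}\n{sanitize(b)}" for h, b in sections.items())
    return (
        "You suggest edits to an operational runbook.\n"
        "Only propose additions, clarifications, or re-ordering of existing steps.\n"
        "Do not speculate about root causes. Do not rewrite unrelated sections.\n\n"
        f"Incident notes:\n{notes}\n\nCurrent runbook:\n{runbook_md}"
    )
```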

2. Ask for specific kinds of suggestions

We don’t ask the model to "improve the runbook."

We ask for:

  • missing steps that should be added to existing sections
  • clarifications where existing runbook steps proved confusing
  • suggestions to simplify or re-order steps based on what actually happened

The tool produces a structured proposal, not a full edited document.
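
For illustration, a proposal can be a small typed structure rather than free-form prose. The field names and the parse_proposal helper below are hypothetical; the point is that output which doesn't fit the schema gets rejected before a human ever reads it.

```python
import json
from dataclasses import dataclass
from enum import Enum

class SuggestionKind(str, Enum):
    MISSING_STEP = "missing_step"    # a step the incident showed was needed
    CLARIFICATION = "clarification"  # an existing step that confused responders
    REORDER = "reorder"              # steps whose order didn't match what actually happened

@dataclass
class Suggestion:
    kind: SuggestionKind
    runbook_section: str  # heading of the section the suggestion targets
    proposed_text: str    # the new or revised wording
    rationale: str        # why, tied back to the incident notes

@dataclass
class Proposal:
    runbook_path: str
    suggestions: list[Suggestion]

def parse_proposal(raw_json: str, runbook_path: str) -> Proposal:
    """Reject anything that doesn't fit the schema instead of trusting free-form prose."""
    data = json.loads(raw_json)
    return Proposal(
        runbook_path=runbook_path,
        suggestions=[
            Suggestion(
                kind=SuggestionKind(item["kind"]),
                runbook_section=item["runbook_section"],
                proposed_text=item["proposed_text"],
                rationale=item["rationale"],
            )
            for item in data["suggestions"]
        ],
    )
```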

3. Keep edits in code review

Proposed changes arrive as:

  • a diff against the current runbook
  • comments explaining why each change is suggested

Engineers review these diffs like any other change:

  • accept, modify, or reject sections
  • add their own context

This keeps change history clear and avoids silent edits.
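
A rough sketch of how a proposal can become a reviewable change, assuming runbooks live in a plain git repo. The branch name and the gh pr create call are examples, not a prescription for any particular code host.

```python
import subprocess
from pathlib import Path

def open_suggestion_pr(repo: Path, runbook: str, new_content: str, rationale: str) -> None:
    """Put the runbook text with suggestions applied on a branch and open a PR; humans do the rest."""
    branch = "runbook-suggestions/incident-draft"
    subprocess.run(["git", "-C", str(repo), "checkout", "-b", branch], check=True)

    # new_content is the current runbook with the proposed edits applied,
    # so the PR diff shows exactly what would change.
    (repo / runbook).write_text(new_content)

    subprocess.run(["git", "-C", str(repo), "add", runbook], check=True)
    subprocess.run(
        ["git", "-C", str(repo), "commit",
         "-m", f"Suggest runbook updates from incident notes\n\n{rationale}"],
        check=True,
    )
    subprocess.run(["git", "-C", str(repo), "push", "-u", "origin", branch], check=True)

    # Open the PR with the rationale as the body so reviewers see why each change is suggested.
    subprocess.run(
        ["gh", "pr", "create", "--title", "Runbook suggestions from incident",
         "--body", rationale],
        cwd=repo, check=True,
    )
```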

4. Track which suggestions are useful

We added two questions to the pull request template:

  • "Did model-suggested changes match reality?"
  • "Were any suggestions misleading or wrong?"

This gives us feedback on:

  • which prompt patterns work
  • where the model tends to overstep or hallucinate

Over time, we tuned prompts to:

  • avoid guessing at root causes
  • focus on mechanics (steps, ordering, clarity); the prompt sketch below shows the kind of constraints we settled on
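
Most of that tuning lives in the system prompt. The wording below is an illustrative reconstruction, not our production prompt.

```python
# Constraints baked into the system prompt. The goal is to keep the model on
# mechanics (steps, ordering, clarity) and away from diagnosis.
SYSTEM_PROMPT = """\
You review an operational runbook against notes from a real incident.

Rules:
- Suggest only: missing steps, clarifications of confusing steps, or re-ordering.
- Do not guess at root causes; if the notes do not state a cause, say nothing about it.
- Do not invent commands, hostnames, or dashboards that are not in the notes or runbook.
- Every suggestion must cite the part of the incident notes that motivated it.
"""
```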

Results / Measurements

We didn’t try to measure "AI productivity."

We did track a few signals:

  • how often runbooks were edited within a week of an incident (see the sketch after this list)
  • how many edits were seeded by tool suggestions
  • qualitative feedback from incident leads and on-call engineers
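
The first of those signals is cheap to compute from git history. A sketch, assuming runbooks sit under a runbooks/ directory and incident dates come from elsewhere; both are assumptions for the example.

```python
import subprocess
from datetime import date, timedelta

def runbook_edited_within_week(repo: str, incident_day: date) -> bool:
    """True if any file under runbooks/ changed in the 7 days after the incident."""
    since = incident_day.isoformat()
    until = (incident_day + timedelta(days=7)).isoformat()
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", f"--until={until}",
         "--name-only", "--pretty=format:", "--", "runbooks/"],
        capture_output=True, text=True, check=True,
    )
    return any(line.strip() for line in out.stdout.splitlines())
```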

After a few months:

  • more incidents led to concrete runbook improvements
  • engineers reported that starting from a draft diff was easier than from a blank editor
  • we still threw away suggestions regularly, which we treat as expected

Takeaways

  • LLM tools can help propose runbook changes, but ownership must stay with humans.
  • Structuring incident docs and prompts around specific questions works better than asking for "better docs."
  • Keeping suggestions as code-reviewed diffs preserves accountability and history.
