DELIVERY · 2025-04-09 · BY STORECODE

Using LLMs to draft runbook improvements

How we use internal LLM tools to propose changes to runbooks based on incident docs, without letting the tool edit production docs on its own.

delivery · runbooks · llm · incidents

We wrote a lot of incident docs.

We also had a backlog of runbooks that didn’t fully reflect what we had learned from those incidents.

The gap was predictable:

  • incident leads had fresh context but limited time
  • runbook updates felt like "extra" work
  • some teams were better at closing the loop than others

When our internal LLM tools became reliable enough for drafting, we tried using them to help bridge this gap.

We were careful about one rule:

The tool can propose edits, but it cannot change runbooks without a human accepting and editing them.

Constraints

  • Runbooks live in version-controlled repositories.
  • Incident docs may contain sensitive data; we must redact or summarize appropriately before sending anything to a model.
  • Engineers own their runbooks and must be able to disagree with suggestions.

What we changed

1. Structure inputs from incident docs

We added lightweight structure to incident docs:

  • tagged sections for "What worked" and "What was missing from the runbook"
  • short bullet lists for mitigation steps actually taken

An internal tool then:

  • extracts these tagged sections
  • strips or masks identifiers and sensitive payloads
  • sends the sanitized text to an LLM with a constrained prompt (a sketch of the whole pipeline follows)
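
A minimal sketch of that pipeline, assuming incident docs are markdown with "## What worked" and "## What was missing from the runbook" headings. The regexes, redaction rules, and function names are illustrative, not our actual tooling.

```python
import re

# Headings we tag in incident docs; everything else in the doc is ignored.
TAGGED_HEADINGS = ["What worked", "What was missing from the runbook"]

# Crude masks for identifiers and payloads we never want to send to a model.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"(?i)(token|secret|password)\s*[:=]\s*\S+"), r"\1=<redacted>"),
]

def extract_tagged_sections(incident_md: str) -> dict[str, str]:
    """Pull only the tagged sections out of an incident doc."""
    sections = {}
    for heading in TAGGED_HEADINGS:
        match = re.search(
            rf"^##\s*{re.escape(heading)}\s*\n(.*?)(?=^##\s|\Z)",
            incident_md, flags=re.M | re.S,
        )
        if match:
            sections[heading] = match.group(1).strip()
    return sections

def sanitize(text: str) -> str:
    """Strip or mask identifiers and sensitive payloads before anything leaves our systems."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

def build_prompt(sections: dict[str, str], runbook_md: str) -> str:
    """Constrained prompt: propose edits only, never rewrite the whole runbook."""
    notes = "\n\n".join(f"### {h}\n{sanitize(b)}" for h, b in sections.items())
    return (
        "You suggest edits to an operational runbook.\n"
        "Only propose additions, clarifications, or re-ordering of existing steps.\n"
        "Do not speculate about root causes. Do not rewrite unrelated sections.\n\n"
        f"Incident notes:\n{notes}\n\nCurrent runbook:\n{runbook_md}"
    )
```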

2. Ask for specific kinds of suggestions

We don’t ask the model to "improve the runbook."

We ask for:

  • missing steps that should be added to existing sections
  • clarifications where existing runbook steps proved confusing
  • suggestions to simplify or re-order steps based on what actually happened

The tool produces a structured proposal, not a full edited document.
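
For illustration, a proposal can be a small typed structure rather than free-form prose. The field names and the parse_proposal helper below are hypothetical; the point is that output which doesn't fit the schema gets rejected before a human ever reads it.

```python
import json
from dataclasses import dataclass
from enum import Enum

class SuggestionKind(str, Enum):
    MISSING_STEP = "missing_step"    # a step the incident showed was needed
    CLARIFICATION = "clarification"  # an existing step that confused responders
    REORDER = "reorder"              # steps whose order didn't match what actually happened

@dataclass
class Suggestion:
    kind: SuggestionKind
    runbook_section: str  # heading of the section the suggestion targets
    proposed_text: str    # the new or revised wording
    rationale: str        # why, tied back to the incident notes

@dataclass
class Proposal:
    runbook_path: str
    suggestions: list[Suggestion]

def parse_proposal(raw_json: str, runbook_path: str) -> Proposal:
    """Reject anything that doesn't fit the schema instead of trusting free-form prose."""
    data = json.loads(raw_json)
    return Proposal(
        runbook_path=runbook_path,
        suggestions=[
            Suggestion(
                kind=SuggestionKind(item["kind"]),
                runbook_section=item["runbook_section"],
                proposed_text=item["proposed_text"],
                rationale=item["rationale"],
            )
            for item in data["suggestions"]
        ],
    )
```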

3. Keep edits in code review

Proposed changes arrive as:

  • a diff against the current runbook
  • comments explaining why each change is suggested

Engineers review these diffs like any other change:

  • accept, modify, or reject sections
  • add their own context

This keeps change history clear and avoids silent edits.
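
A rough sketch of how a proposal can become a reviewable change, assuming runbooks live in a plain git repo. The branch name and the gh pr create call are examples, not a prescription for any particular code host.

```python
import subprocess
from pathlib import Path

def open_suggestion_pr(repo: Path, runbook: str, new_content: str, rationale: str) -> None:
    """Put the runbook text with suggestions applied on a branch and open a PR; humans do the rest."""
    branch = "runbook-suggestions/incident-draft"
    subprocess.run(["git", "-C", str(repo), "checkout", "-b", branch], check=True)

    # new_content is the current runbook with the proposed edits applied,
    # so the PR diff shows exactly what would change.
    (repo / runbook).write_text(new_content)

    subprocess.run(["git", "-C", str(repo), "add", runbook], check=True)
    subprocess.run(
        ["git", "-C", str(repo), "commit",
         "-m", f"Suggest runbook updates from incident notes\n\n{rationale}"],
        check=True,
    )
    subprocess.run(["git", "-C", str(repo), "push", "-u", "origin", branch], check=True)

    # Open the PR with the rationale as the body so reviewers see why each change is suggested.
    subprocess.run(
        ["gh", "pr", "create", "--title", "Runbook suggestions from incident",
         "--body", rationale],
        cwd=repo, check=True,
    )
```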

4. Track which suggestions are useful

We added two questions to the pull request template:

  • "Did model-suggested changes match reality?"
  • "Were any suggestions misleading or wrong?"

This gives us feedback on:

  • which prompt patterns work
  • where the model tends to overstep or hallucinate

Over time, we tuned prompts to:

  • avoid guessing at root causes
  • focus on mechanics (steps, ordering, clarity); the prompt sketch below shows the kind of constraints we settled on
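
Most of that tuning lives in the system prompt. The wording below is an illustrative reconstruction, not our production prompt.

```python
# Constraints baked into the system prompt. The goal is to keep the model on
# mechanics (steps, ordering, clarity) and away from diagnosis.
SYSTEM_PROMPT = """\
You review an operational runbook against notes from a real incident.

Rules:
- Suggest only: missing steps, clarifications of confusing steps, or re-ordering.
- Do not guess at root causes; if the notes do not state a cause, say nothing about it.
- Do not invent commands, hostnames, or dashboards that are not in the notes or runbook.
- Every suggestion must cite the part of the incident notes that motivated it.
"""
```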

Results / Measurements

We didn’t try to measure "AI productivity."

We did track a few signals:

  • how often runbooks were edited within a week of an incident (see the sketch after this list)
  • how many edits were seeded by tool suggestions
  • qualitative feedback from incident leads and on-call engineers
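
The first of those signals is cheap to compute from git history. A sketch, assuming runbooks sit under a runbooks/ directory and incident dates come from elsewhere; both are assumptions for the example.

```python
import subprocess
from datetime import date, timedelta

def runbook_edited_within_week(repo: str, incident_day: date) -> bool:
    """True if any file under runbooks/ changed in the 7 days after the incident."""
    since = incident_day.isoformat()
    until = (incident_day + timedelta(days=7)).isoformat()
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", f"--until={until}",
         "--name-only", "--pretty=format:", "--", "runbooks/"],
        capture_output=True, text=True, check=True,
    )
    return any(line.strip() for line in out.stdout.splitlines())
```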

After a few months:

  • more incidents led to concrete runbook improvements
  • engineers reported that starting from a draft diff was easier than from a blank editor
  • we still threw away suggestions regularly, which we treat as expected

Takeaways

  • LLM tools can help propose runbook changes, but ownership must stay with humans.
  • Structuring incident docs and prompts around specific questions works better than asking for "better docs."
  • Keeping suggestions as code-reviewed diffs preserves accountability and history.
