Using LLM tools to assist incident retrospectives
How we use LLM-based tools to help with retrospectives—clustering themes, drafting sections—while keeping humans in charge of conclusions.
Running good incident retrospectives is work.
For complex incidents, we may have:
- long chat threads
- multiple dashboards and screenshots
- several partial narratives from different teams
Our goal in a retro is to:
- understand what happened
- agree on what we learned
- identify changes we want to make
We experimented with internal LLM tools to help with some of the mechanics, without outsourcing judgment.
Constraints
- Incident participants must feel safe; we don’t want tools that sound like they are grading people.
- We redact or summarize sensitive data before sending anything to a model (a sketch of this step follows the list).
- Humans own the final retro document and action items.
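To make the second constraint concrete, here is a minimal sketch of the kind of redaction pass we mean, assuming a simple regex-based scrub. The patterns, sample messages, and the `redact` helper are illustrative, not our production pipeline.

```python
import re

# Hypothetical redaction patterns; a real pipeline would be tuned to the kinds
# of sensitive data that actually show up in our incident channels.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),           # email addresses
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip-address>"),  # IPv4 addresses
    (re.compile(r"(?i)\b(secret|token|password)\s*[:=]\s*\S+"), r"\1=<redacted>"),
]

def redact(text: str) -> str:
    """Replace obviously sensitive substrings before text leaves our systems."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

raw_chat_messages = [
    "paging oncall, db-primary at 10.0.3.17 is not responding",
    "alice@example.com restarted it; token=abc123 was rotated afterwards",
]
sanitized = [redact(message) for message in raw_chat_messages]
print(sanitized)
```

Whatever the exact mechanism, the rule is the same: only sanitized text ever reaches a model.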
What we changed
1. Use tools to cluster themes, not blame
We feed sanitized incident artifacts into an internal tool that can:
- cluster related comments or observations
- highlight repeated patterns (e.g., "missing owner", "confusing alert")
We explicitly do not ask for:
- "root cause" statements
- judgments about individuals or teams
The output is a set of suggested themes, which facilitators can accept, merge, or ignore; a rough sketch of this clustering step follows.
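The clustering tool itself is internal, but the shape of the step is roughly as follows. This is a hedged sketch: TF-IDF vectors stand in for whatever embeddings the real tool uses, and the observations, distance threshold, and scikit-learn calls are illustrative.

```python
from collections import defaultdict

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

# Sanitized observations pulled from the incident channel (illustrative only).
observations = [
    "the alert fired but nobody knew which team owned the service",
    "ownership of the failing job was unclear for the first hour",
    "the alert text was confusing and did not link to a runbook",
    "dashboard for the queue lagged several minutes behind reality",
    "the paging alert wording made it hard to tell what was broken",
]

# Stand-in for the embedding step: TF-IDF vectors keep the example runnable.
vectors = TfidfVectorizer().fit_transform(observations).toarray()

# Group similar observations; the threshold is a knob a facilitator would tune.
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.8, metric="cosine", linkage="average"
).fit_predict(vectors)

themes = defaultdict(list)
for label, text in zip(labels, observations):
    themes[label].append(text)

# Facilitators review the suggested groupings and accept, merge, or ignore them.
for label, items in themes.items():
    print(f"theme {label}:")
    for item in items:
        print(f"  - {item}")
```

The important design choice is that the output is only a grouping; naming the themes and deciding what they mean stays with the facilitator.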
2. Draft structure, not conclusions
We ask tools to propose:
- a candidate outline for the retro doc (sections, headings)
- bullet-point summaries of events we already agree on
Facilitators then:
- edit for accuracy and nuance
- fill in analysis and conclusions
This reduces the time spent retyping known facts and leaves more time for discussion; a sketch of such a drafting prompt follows.
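Below is a sketch of what this kind of drafting prompt can look like. The wording and the `agreed_facts` list are illustrative, and the actual call to our internal tool is omitted.

```python
# Sketch of a drafting prompt; the call to the internal tool is not shown, and
# the wording here is illustrative rather than the exact prompt we use.
agreed_facts = [
    "14:02 latency alert fired for the checkout service",
    "14:15 rollback of the 13:50 deploy started",
    "14:40 error rate back to baseline",
]

prompt = (
    "You are helping structure an incident retrospective document.\n"
    "Propose:\n"
    "1. A candidate outline for the document (section headings only).\n"
    "2. Bullet-point summaries of the agreed facts below, in neutral language.\n"
    "Do NOT propose root causes, conclusions, action items, or judgments "
    "about individuals or teams.\n\n"
    "Agreed facts:\n" + "\n".join(f"- {fact}" for fact in agreed_facts)
)

print(prompt)  # sent to the drafting tool; facilitators edit whatever comes back
```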
3. Keep action-item generation human-led
We avoid asking tools to "suggest fixes."
Instead, we:
- use clustered themes as prompts in the meeting ("we saw several mentions of X")
- let participants propose and debate actions
This keeps ownership of work with the people who will do it.
4. Be transparent about usage
We are explicit in retro invites and docs about:
- which tools we’re using
- what inputs they see
- what outputs they produce
Participants can opt out of having certain parts of conversations used as inputs.
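How opt-out is recorded depends on the chat platform. As one hedged sketch, if each message carried an opt-out flag and we also tracked participants who opted out entirely, the filter applied before anything reaches a tool might look like this; all names and fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Message:
    author: str
    text: str
    opted_out: bool = False  # how opt-out is actually recorded is an assumption here

def retro_inputs(messages: list[Message], opted_out_authors: set[str]) -> list[str]:
    """Drop anything a participant asked to exclude before it reaches any tool."""
    return [
        m.text
        for m in messages
        if not m.opted_out and m.author not in opted_out_authors
    ]

thread = [
    Message("alice", "the alert fired twice before anyone was paged"),
    Message("bob", "I was debugging something unrelated at the time", opted_out=True),
    Message("carol", "the runbook link in the alert was stale"),
]

print(retro_inputs(thread, opted_out_authors={"dave"}))
```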
Results / Measurements
We look for:
- whether retros happen more reliably for significant incidents
- whether participants feel they have more time for discussion
- whether themes feel more consistent across incidents
Early feedback:
- facilitators appreciated help with organizing raw material
- participants valued seeing patterns highlighted across incidents
- some skepticism remained, which we treat as healthy pressure to keep the tools tightly scoped
Takeaways
- LLM tools can help with the mechanics of retrospectives—clustering and drafting—but should not drive conclusions.
- Being transparent about what tools do and don’t do helps maintain trust.
- Keeping action items and analysis human-led keeps ownership where it belongs.