CULTURE | 2025-07-08 | BY STORECODE

Checklist: Promoting an internal tool to tier-1 status

A checklist we use when an internal tool quietly becomes essential and needs to be treated like a tier-1 service.

culture, internal-tools, reliability, ownership

Internal tools have a way of becoming important before anyone notices.

A script turns into a UI.

A UI turns into the only way to perform some operations.

At some point, the tool is effectively tier-1, even if our processes and on-call rotations haven’t caught up.

This checklist is how we turn "this tool is important" into "this tool is treated like a tier-1 service."

Context

Use this checklist when:

  • an internal tool is required during incidents
  • support or operations can’t do their work without it
  • outages of the tool cause urgent escalations

Checklist

  • Is there a clearly named owning team?

    • the tool appears on a team’s backlog and reliability reviews
    • there is a documented point of contact
  • Does it have SLOs?

    • basic availability and latency targets
    • defined error budgets
  • Is it in the on-call rotation?

    • alerts go to people who can fix it
    • severity for tool outages is calibrated to impact
  • Are observability and logging in place?

    • dashboards for key flows
    • logs that make it possible to debug incidents quickly
  • Are risky operations guarded?

    • confirmation steps for bulk or destructive actions
    • role-based access for sensitive capabilities
  • Is there a runbook?

    • basic "first steps" for common failures
    • clear rollback or disable paths for features
  • Can we deploy and roll back safely?

    • changes are version-controlled
    • deploys are observable and reversible
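
For the SLO item, "defined error budgets" is easier to discuss with numbers in front of you. The sketch below is one way to turn an availability target into a budget. The `AvailabilitySlo` name, the 99.5% target, and the 30-day window are illustrative assumptions, not values from any of our tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySlo:
    """Hypothetical SLO record for an internal tool (names are illustrative)."""

    target: float      # e.g. 0.995 means 99.5% of requests should succeed
    window_days: int   # rolling window the target applies to

    def __post_init__(self) -> None:
        # A target of exactly 1.0 leaves no budget at all, so reject it here.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")
        if self.window_days <= 0:
            raise ValueError("window_days must be positive")

    def error_budget_fraction(self) -> float:
        # The error budget is whatever the target leaves over: 1 - target.
        return 1.0 - self.target

    def allowed_downtime_minutes(self) -> float:
        # Time-based view: minutes of full outage the window can absorb.
        return self.error_budget_fraction() * self.window_days * 24 * 60

    def budget_remaining(self, total_requests: int, failed_requests: int) -> float:
        # Request-based view: fraction of the budget still unspent.
        # A negative result means the budget is overspent.
        if total_requests == 0:
            return 1.0
        allowed_failures = self.error_budget_fraction() * total_requests
        return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    slo = AvailabilitySlo(target=0.995, window_days=30)  # assumed example values
    print(f"allowed downtime: {slo.allowed_downtime_minutes():.0f} minutes")
    print(f"budget remaining: {slo.budget_remaining(100_000, 250):.0%}")
```

Under these assumed values, the window can absorb a few hours of full outage. Writing the number down is what lets a team decide when to pause feature work on the tool in favor of reliability work.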

Notes

Promoting a tool to tier-1 doesn’t mean making it perfect overnight.

It does mean:

  • bringing it into the same engineering discipline as external services
  • giving teams permission to invest in its reliability and UX

We’ve found it helps to:

  • add the tool to regular reliability and product reviews
  • include it in incident drills

Takeaways

  • Internal tools that are critical for operations should be treated like tier-1 services.
  • Ownership, SLOs, on-call coverage, and runbooks are the minimum, not the end state.
  • Making the promotion explicit unlocks investment in work that was previously treated as "nice to have."
