SRE Programme · Tier 2 of 5

SRE Programme — Operational.

You have practices, but not yet discipline. SLOs exist on some services; error budgets are a concept, not enforced; postmortems are written, sometimes read. The next moves make error budgets real — numbers that change what teams do next.

DORA Medium cluster per the DORA Accelerate State of DevOps 2024 report: time to restore service of hours to a day; change-failure rate 16–30%; weekly deploys. PagerDuty State of Digital Operations: median actionable-alert rate 40–60%.

What this tier actually looks like.

You have SLOs on critical services. Postmortems are blameless. Runbooks exist for the obvious incidents. The team is reasonably proud of its on-call shape. Then someone asks: “When did we last freeze feature work because we burned the error budget?” The answer is silence, because the error budget is a number on a dashboard, not a policy with consequences.

You probably have:

Why most teams get stuck here.

Operational-tier SRE programmes stall because error budgets aren’t enforced. Three patterns:

The three substrate moves to the next tier.

1. Define error budgets for the top 3 services. Agree the policy.

Per the Google SRE Workbook: when the budget is burnt, what happens? Feature freeze? Reduced change velocity? Auto-pause of risky deploys? Pick the consequences before the budget is burnt, and agree them with product. The Error Budget calculator helps size the conversation.
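To make "pick consequences before the budget gets burnt" concrete, here is a minimal sketch of a request-based error budget tied to a policy ladder. It assumes a single SLO window with known total and failed request counts; the class, function names, thresholds and actions are all hypothetical placeholders for whatever you agree with product, and this is not the Error Budget calculator itself.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    slo: float          # target success ratio over the window, e.g. 0.999
    total_events: int   # requests observed in the SLO window
    failed_events: int  # requests that violated the SLI in that window

    @property
    def allowed_failures(self) -> float:
        # Budget = (1 - SLO) x total events in the window.
        return (1.0 - self.slo) * self.total_events

    @property
    def consumed(self) -> float:
        # Fraction of the budget spent; 1.0 or more means it is exhausted.
        if self.allowed_failures == 0:
            return float("inf") if self.failed_events else 0.0
        return self.failed_events / self.allowed_failures


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    # Time-based view of the same budget, handy for the product conversation.
    return (1.0 - slo) * window_days * 24 * 60


# Hypothetical policy ladder: thresholds and actions are placeholders to be
# agreed with product, checked from most to least severe.
POLICY = [
    (1.0, "feature freeze: only reliability fixes ship"),
    (0.75, "reduce change velocity: pause risky deploys"),
    (0.5, "warn: review upcoming launches"),
]


def policy_action(budget: ErrorBudget) -> str:
    for threshold, action in POLICY:
        if budget.consumed >= threshold:
            return action
    return "normal: ship as planned"
```

With a 99.9% SLO over one million requests, the budget is about 1,000 failed requests (or about 43.2 minutes of full downtime over 30 days), so 800 failures lands on the "reduce change velocity" rung. The point of encoding the ladder is that the consequence is decided in advance, not negotiated mid-incident.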

2. Track toil. Target <50%. Fund the automation.

Per Chapter 5 of the Google SRE book (“Eliminating Toil”). Categorise team work each sprint and report the toil percentage alongside feature delivery. Above 50%, fund the automation explicitly, not as “we’ll do it when we have time.”
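A sketch of the per-sprint toil report, assuming work is logged in hours against a category label. The `WorkItem` type, the category names and the report fields are hypothetical; only the 50% ceiling comes from the text above.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple

TOIL_CAP = 0.50  # ceiling from Chapter 5 of the Google SRE book


class WorkItem(NamedTuple):
    hours: float
    category: str  # hypothetical labels, e.g. "toil", "project", "overhead"


def toil_share(items: Iterable[WorkItem]) -> float:
    # Sum hours per category, then take toil as a fraction of all logged hours.
    totals = defaultdict(float)
    for item in items:
        totals[item.category] += item.hours
    worked = sum(totals.values())
    return totals["toil"] / worked if worked else 0.0


def sprint_report(sprint: str, items: Iterable[WorkItem]) -> dict:
    share = toil_share(items)
    return {
        "sprint": sprint,
        "toil_pct": round(share * 100, 1),
        # Above the cap, automation gets an explicit line in the plan.
        "fund_automation": share > TOIL_CAP,
    }
```

A sprint with 35 toil hours and 25 project hours reports 58.3% toil and flags automation for funding. Publishing this next to feature delivery is what turns "we should automate that" into a budgeted item.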

3. Build runbooks for the top 10 alert types. Test one in a game-day.

Runbooks tested in game-days survive the 3am page; untested runbooks decay. Cadence: a quarterly game-day, rotating which runbook gets exercised. This closes the gap between “we have runbooks” and “our on-call rotation actually uses them.”
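One way to drive this step from data, sketched under assumptions: the alert log is a plain list of alert-type names (one per page), and game-day history is a mapping from runbook name to the date it was last exercised. The function names and the one-year staleness rule are hypothetical, not a prescribed standard.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Mapping, Sequence


def top_alert_types(alert_log: Iterable[str], n: int = 10) -> list[str]:
    # alert_log holds one alert-type name per page fired; rank by frequency.
    return [name for name, _ in Counter(alert_log).most_common(n)]


def next_gameday_runbook(runbooks: Sequence[str],
                         last_exercised: Mapping[str, date]) -> str:
    # Rotate: exercise the runbook tested least recently; never-tested first.
    return min(runbooks, key=lambda rb: last_exercised.get(rb, date.min))


def stale_runbooks(runbooks: Sequence[str], last_exercised: Mapping[str, date],
                   today: date, max_age_days: int = 365) -> list[str]:
    # Hypothetical decay rule: untested, or not exercised within max_age_days.
    return [rb for rb in runbooks
            if rb not in last_exercised
            or (today - last_exercised[rb]).days > max_age_days]
```

Feed `top_alert_types` a quarter's paging history to choose which ten runbooks to write, then call `next_gameday_runbook` before each quarterly game-day so the rotation covers every runbook rather than the favourite one.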

What changes when you cross.

Crossing puts you in the Disciplined tier, which maps to the DORA “High” cluster. The jump from Disciplined to Engineered (DORA “Elite”) is platform-level: golden signals inherited, blast radius as a design constraint, chaos engineering as a habit. See the Platform Engineering IDP reference architecture.

Run the diagnostic.

To find out whether your team scores at this tier or another, run the SRE Programme diagnostic. It takes 2–4 minutes and surfaces both your overall tier and the capability breakdown that shows you where the move starts.

For the bigger picture: the compound diagnostic takes results from all six diagnostics and shows you the substrate gap that bounds your overall delivery, not the per-discipline symptom.