SLO and Alert Lab

Active use only. At the end of this lab you should have one SLO document and at least one working burn-rate alert in your capstone, not a theoretical discussion of either.

Retrieval Prompts

State the difference between an SLI, an SLO, and an SLA in your own words.
Write the formula for an availability SLI as a ratio of events.
For a 99.5% SLO over 30 days, what is the error budget as a percentage of total events?
State the two windows typically combined in a multi-window burn-rate fast-burn alert.
Why is "CPU > 80%" usually a bad primary alert?

Compare and Distinguish

Separate these pairs clearly:

SLI vs a "system metric" like CPU utilization
percent-based SLO vs event-based error budget
single-window alert vs multi-window burn-rate alert
page-worthy symptom vs ticket-worthy symptom
aspirational SLO vs defensible SLO for current architecture

Common Mistake Check

For each statement, identify the error:

"Our SLO is 100% -- anything less and users complain."
"We alert if error rate exceeds 1%; that's our SLO alert."
"Budget is fine; we're at 78% consumed with 3 days left in the window."
"We used the AmazonFreeTierMetrics default for our SLO target."
"The alert fires on CPU > 90% because that's when things get slow."

Mini Application

For your capstone:

SLI formula (write it as a ratio):

SLI = <good events expression> / <total events expression>
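As a minimal sketch of the ratio above, the following computes an availability SLI from two counters. The function name, the event counts, and the choice to treat zero traffic as fully available are all assumptions made for illustration; substitute whatever your capstone defines as a "good" event.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Return the SLI as a fraction in [0, 1]: good events / total events."""
    if good_events < 0 or total_events < 0 or good_events > total_events:
        raise ValueError("need 0 <= good_events <= total_events")
    if total_events == 0:
        # Assumption: with no traffic there were no failures to count.
        return 1.0
    return good_events / total_events


if __name__ == "__main__":
    # Hypothetical counts: 120,000 requests, 300 of which failed.
    total = 120_000
    good = total - 300
    print(f"SLI = {availability_sli(good, total):.4%}")
```

Writing the SLI as good over total, rather than as an error rate, keeps it directly comparable to the SLO target.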

SLO + window + error budget (fill in concrete numbers):
- target: ____ %
- window: rolling ____ days
- error budget (% of events): ____
- error budget (absolute events, at current traffic): ____

Consequence for missing the SLO (one sentence):
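The two budget lines above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the 99.9% target, 28-day window, and 50,000 events per day are placeholder numbers, not recommendations, and `error_budget` is a hypothetical helper name.

```python
def error_budget(target_pct: float, window_days: int, events_per_day: float):
    """Return (budget as a fraction of events, budget in absolute events)."""
    if not 0 < target_pct < 100:
        raise ValueError("target_pct must be strictly between 0 and 100")
    budget_fraction = 1.0 - target_pct / 100.0
    total_events = events_per_day * window_days
    return budget_fraction, budget_fraction * total_events


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    fraction, events = error_budget(target_pct=99.9, window_days=28,
                                    events_per_day=50_000)
    print(f"error budget: {fraction:.3%} of events")
    print(f"error budget: {events:,.0f} events in the window")
```

The absolute figure depends on traffic, so recompute it whenever your request volume changes noticeably.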

Fast-burn alert (pseudocode):

condition:  error_ratio(last ___ [long window])  > <multiplier> * <budget fraction>
AND         error_ratio(last ___ [short window]) > <multiplier> * <budget fraction>
action:     PAGE

Slow-burn alert (pseudocode): same structure, with longer windows and a lower multiplier; the action is TICKET, not PAGE.
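The pseudocode can be sketched as a small evaluator that takes the error ratios for both windows as inputs. Everything named here (`BurnRateRule`, `burn_multiplier`, `should_fire`) is hypothetical, and the 99.9% target, 30-day window, 6-hour and 30-minute windows, and 5% budget share are illustrative values only; in practice your monitoring tool evaluates the windows and you supply the thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    """One multi-window burn-rate rule. Field values are illustrative."""
    long_window_hours: float
    short_window_hours: float
    multiplier: float
    action: str  # "PAGE" or "TICKET"


def burn_multiplier(budget_share: float, slo_window_hours: float,
                    long_window_hours: float) -> float:
    """Burn rate that spends `budget_share` of the budget in the long window."""
    return budget_share * slo_window_hours / long_window_hours


def should_fire(rule: BurnRateRule, slo_target: float,
                long_error_ratio: float, short_error_ratio: float) -> bool:
    """Fire only if BOTH windows exceed multiplier * budget fraction."""
    threshold = rule.multiplier * (1.0 - slo_target)
    return long_error_ratio > threshold and short_error_ratio > threshold


if __name__ == "__main__":
    slo_target = 0.999          # assumed SLO, as a fraction
    slo_window_hours = 30 * 24  # assumed 30-day window
    # Assumed policy: alert when 5% of the budget is spent within 6 hours.
    rule = BurnRateRule(
        long_window_hours=6.0,
        short_window_hours=0.5,
        multiplier=burn_multiplier(0.05, slo_window_hours, 6.0),
        action="PAGE",
    )
    print(rule.multiplier)
    print(should_fire(rule, slo_target, 0.010, 0.008))  # both windows burning
    print(should_fire(rule, slo_target, 0.010, 0.001))  # short window recovered
```

The short window is what lets the alert clear quickly once the problem stops, while the long window keeps brief blips from firing it. Deriving the multiplier from a budget share, as `burn_multiplier` does, is one way to make each threshold explainable.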

Evidence Check

This lab is complete only if:

library/raw/slo.md exists and contains the SLI, SLO, window, budget, and consequence
library/raw/error-budget-policy.md contains the 5-tier ladder
at least one burn-rate alert is configured in your monitoring tool
you have deleted or demoted to ticket status at least one non-SLO page-level alert
you can explain every threshold number in both alerts without checking a book
