Athas Boilerplate

Incident Response

internal

Severity model, triage, mitigation, and communication flow.

Severity

  • SEV-1: broad outage or critical data/security risk
  • SEV-2: major feature unavailable or heavily degraded
  • SEV-3: partial impact with workaround
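
The severity levels above can be sketched as a small classifier. This is a minimal illustration only: the `Impact` fields and the `classify` function are hypothetical names, not part of the boilerplate, and treating every remaining case as SEV-3 is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    SEV1 = "SEV-1"  # broad outage or critical data/security risk
    SEV2 = "SEV-2"  # major feature unavailable or heavily degraded
    SEV3 = "SEV-3"  # partial impact with workaround


@dataclass(frozen=True)
class Impact:
    # Hypothetical triage inputs; adapt to whatever signals the team records.
    broad_outage: bool = False
    data_or_security_risk: bool = False
    major_feature_degraded: bool = False


def classify(impact: Impact) -> Severity:
    """Map observed impact to a severity, checking the most severe case first."""
    if impact.broad_outage or impact.data_or_security_risk:
        return Severity.SEV1
    if impact.major_feature_degraded:
        return Severity.SEV2
    # Assumption: anything left is partial impact and is handled as SEV-3.
    return Severity.SEV3
```

Checking the most severe condition first means an incident that matches several levels is always filed at the highest one.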

First 15 minutes

  1. Confirm impact scope and affected environments.
  2. Check recent deploys/config changes.
  3. Validate health endpoints and dependencies.
  4. Decide between rollback and runtime mitigation.
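
Steps 3 and 4 could be supported by a small script like the sketch below. The function names, the 60-minute deploy window, and the "recent deploy implies rollback" heuristic are all assumptions made for illustration; the actual health endpoints and decision criteria are in the detailed runbook.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered with a non-2xx status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, etc.


def check_health(
    urls: Iterable[str], probe: Callable[[str], int] = http_status
) -> Dict[str, bool]:
    """Probe each endpoint; an endpoint is healthy when it returns a 2xx status."""
    return {url: 200 <= probe(url) < 300 for url in urls}


def recommend(
    health: Dict[str, bool],
    minutes_since_last_deploy: Optional[float],
    deploy_window_minutes: float = 60.0,  # assumed threshold, tune per team
) -> str:
    """Suggest a next action; a human still makes the final call."""
    if all(health.values()):
        return "monitor"
    if (
        minutes_since_last_deploy is not None
        and minutes_since_last_deploy <= deploy_window_minutes
    ):
        return "rollback"
    return "runtime-mitigation"
```

The `probe` parameter is injectable so the logic can be exercised without network access, for example `check_health(urls, probe=lambda u: 503)`.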

Mitigation options

  • rollback latest deployment
  • disable risky paths with runtime flags
  • pause non-critical async processing
  • switch to an alternate provider path when a single provider is degraded
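
Three of the options above (disabling risky paths, pausing async work, switching providers) can be driven by runtime flags. The sketch below assumes a simple in-memory flag store; `RuntimeFlags`, the flag names, and the helper functions are hypothetical and stand in for whatever flag mechanism the project actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


@dataclass
class RuntimeFlags:
    """Hypothetical in-memory flag store; a real one would read shared config."""
    values: Dict[str, bool] = field(default_factory=dict)

    def enabled(self, name: str, default: bool = True) -> bool:
        return self.values.get(name, default)

    def set(self, name: str, value: bool) -> None:
        self.values[name] = value


def call_provider(
    flags: RuntimeFlags, primary: Callable[[], T], fallback: Callable[[], T]
) -> T:
    """Route to the fallback provider when the primary is flagged off."""
    if flags.enabled("use_primary_provider"):
        return primary()
    return fallback()


def drain_queue(flags: RuntimeFlags, jobs: List[str]) -> List[str]:
    """Process queued jobs unless non-critical async processing is paused."""
    if not flags.enabled("async_processing"):
        return []  # leave jobs queued for later
    processed = list(jobs)
    jobs.clear()
    return processed
```

During an incident, an operator would call something like `flags.set("use_primary_provider", False)` and flip it back once the provider recovers. Defaulting unknown flags to enabled keeps normal behavior unchanged when no mitigation is active.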

Closeout

  • verify core journeys are healthy
  • publish postmortem with corrective actions
  • update runbooks/alerts/tests

Detailed operations copy: docs/runbooks/incident-response.md
