Incident postmortem · INC-2026-001
Mentor queue worker drained slowly
SEV-3 · resolved
Queue consumer fell behind by ~20 min during peak intake; backlog drained without manual intervention once worker count auto-scaled.
Started
Jan 12, 2026, 02:22 PM UTC
Resolved
Jan 12, 2026, 04:08 PM UTC
Duration
1h 46m
Root cause
Auto-scaler cooldown was set to 600s; the intake spike built up the backlog before the cooldown allowed a second worker to be admitted.
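To illustrate how a cooldown can delay admitting a second worker, here is a minimal sketch of a cooldown-gated scale-up check. It is not the production auto-scaler: `ScalerState`, `try_scale_up`, and the `depth_threshold` parameter are hypothetical names, and only the 600s and 90s cooldown values come from this incident.

```python
from dataclasses import dataclass


@dataclass
class ScalerState:
    """Hypothetical scaler state; not the real auto-scaler's data model."""
    workers: int
    last_scale_at: float  # time of the last scaling action, in seconds


def try_scale_up(state: ScalerState, now: float, queue_depth: int,
                 cooldown_s: float, depth_threshold: int) -> bool:
    """Admit one more worker only if the backlog is high AND the cooldown has elapsed."""
    if queue_depth <= depth_threshold:
        return False  # backlog not high enough to justify another worker
    if now - state.last_scale_at < cooldown_s:
        return False  # still inside the cooldown window; scale-up is blocked
    state.workers += 1
    state.last_scale_at = now
    return True


# Illustrative values: 120s after the previous scaling action, backlog above threshold.
slow = ScalerState(workers=1, last_scale_at=0.0)
fast = ScalerState(workers=1, last_scale_at=0.0)
blocked = try_scale_up(slow, now=120.0, queue_depth=80, cooldown_s=600, depth_threshold=50)
admitted = try_scale_up(fast, now=120.0, queue_depth=80, cooldown_s=90, depth_threshold=50)
```

Under these assumed inputs, the 600s cooldown rejects the scale-up while the 90s cooldown admits the second worker.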
Customer impact
Mentor review queue depth peaked at ~120 items; learners saw longer review-arrival ETAs, but no item was lost or double-claimed.
Remediation
- Cut auto-scaler cooldown from 600s to 90s.
- Added a queue-depth burn alert that pages on >50 items waiting >5 minutes.
- Ran a queue-claim atomicity eval to confirm no items were double-claimed during the spike.
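The alert condition and the double-claim check from the remediation items can be sketched as follows. This is an illustration under stated assumptions, not the deployed alert rule or the actual eval: `should_page` and `double_claimed` are hypothetical names, the alert is read as "more than 50 items have each waited longer than 5 minutes", and the claim log is assumed to be a sequence of `(item_id, worker_id)` pairs.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def should_page(enqueued_at: Iterable[float], now: float,
                max_items: int = 50, max_wait_s: float = 300.0) -> bool:
    """Page when more than max_items items have each waited longer than max_wait_s.

    Assumption: this is one reading of ">50 items waiting >5 minutes".
    """
    waiting_too_long = sum(1 for t in enqueued_at if now - t > max_wait_s)
    return waiting_too_long > max_items


def double_claimed(claims: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return item ids claimed by more than one distinct worker.

    Assumption: claims is a log of (item_id, worker_id) pairs; a repeated
    claim by the same worker is not counted as a double-claim.
    """
    workers_by_item = defaultdict(set)
    for item_id, worker_id in claims:
        workers_by_item[item_id].add(worker_id)
    return {item for item, workers in workers_by_item.items() if len(workers) > 1}
```

An empty result from `double_claimed` over the spike window would correspond to the "no items were double-claimed" finding, assuming the claim log is complete.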