UpSkillZone

Incident postmortem · INC-2026-001

Mentor queue worker drained slowly

SEV-3 · resolved

The queue consumer fell ~20 min behind during peak intake; the backlog drained without manual intervention once the worker count auto-scaled.

Started

Jan 12, 2026, 02:22 PM UTC

Resolved

Jan 12, 2026, 04:08 PM UTC

Duration

1h 46m

Root cause

The auto-scaler cooldown was set to 600s, so the intake spike built up a backlog before the cooldown allowed a second worker to be admitted.
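To illustrate how a cooldown can delay scale-up, here is a minimal sketch, not the production auto-scaler. It assumes the cooldown is measured from the previous scaling event and that the scaler is polled on a fixed tick. `Scaler`, `seconds_until_second_worker`, and all timing values other than the 600s and 90s cooldowns are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scaler:
    """Hypothetical cooldown-gated scaler (illustration only)."""
    cooldown_s: int
    last_scale_at: float = float("-inf")  # time of the previous scaling event
    workers: int = 1

    def maybe_scale_up(self, now: float, wants_more: bool) -> bool:
        """Add one worker if demand calls for it and the cooldown has elapsed."""
        if not wants_more:
            return False
        if now - self.last_scale_at < self.cooldown_s:
            return False  # still cooling down; the backlog keeps growing
        self.workers += 1
        self.last_scale_at = now
        return True


def seconds_until_second_worker(cooldown_s, spike_at, prior_scale_event_at, tick_s=10):
    """Seconds from the start of the spike until a second worker is admitted."""
    scaler = Scaler(cooldown_s, last_scale_at=prior_scale_event_at)
    now = spike_at
    # Poll on a fixed tick until the cooldown lets the scale-up through.
    while not scaler.maybe_scale_up(now, wants_more=True):
        now += tick_s
    return now - spike_at


# Assumed scenario: a scaling event at t=0 and a spike starting at t=30.
print(seconds_until_second_worker(600, spike_at=30, prior_scale_event_at=0))
print(seconds_until_second_worker(90, spike_at=30, prior_scale_event_at=0))
```

Under these assumed timings, the shorter cooldown admits the second worker minutes earlier, which is the effect the remediation targets.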

Customer impact

Mentor review queue depth peaked at ~120 items; learners saw a longer review-arrival ETA, but no item was lost or double-claimed.

Remediation

  • Cut auto-scaler cooldown from 600s to 90s.
  • Added a queue-depth burn alert that pages when more than 50 items have been waiting for more than 5 minutes.
  • Ran a queue-claim atomicity eval to confirm no items were double-claimed during the spike.
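The double-claim check in the last remediation item could look roughly like the sketch below. It is a sketch under assumptions, not the eval that was actually run. It assumes a claim log of `(item_id, worker_id)` pairs, one per successful claim, and treats an item as double-claimed when more than one distinct worker claimed it. The function name, log shape, and sample IDs are hypothetical.

```python
from collections import defaultdict


def find_double_claims(claim_log):
    """Return {item_id: sorted worker ids} for items claimed by more than one worker.

    claim_log is an iterable of (item_id, worker_id) pairs (assumed log shape).
    """
    claimants = defaultdict(set)
    for item_id, worker_id in claim_log:
        claimants[item_id].add(worker_id)
    return {item: sorted(ws) for item, ws in claimants.items() if len(ws) > 1}


# Hypothetical sample logs: a clean one, then one with a double claim added.
clean_log = [("rev-1", "w1"), ("rev-2", "w2"), ("rev-3", "w1")]
bad_log = clean_log + [("rev-2", "w1")]

print(find_double_claims(clean_log))  # {}
print(find_double_claims(bad_log))    # {'rev-2': ['w1', 'w2']}
```

An empty result over the spike window would be consistent with the finding that no item was double-claimed.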