Job Twin brief — UpSkillZone AI
Job Twin 4 — Live incident response
Scenario
A customer-facing model is hallucinating regulated content. Diagnose the failure mode, deploy a hotfix, document the post-mortem.
Live window: 90 minutes. Post-mortem due within 4 hours of the live window.
Deliverables
- Diagnosis — minimal repro of the hallucination class.
- Hotfix — code change plus a regression test that fails before, passes after.
- Post-mortem — timeline, root cause, blast radius, follow-ups.
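The "fails before, passes after" requirement on the hotfix can be illustrated with a minimal sketch. Everything here is hypothetical: the failure class (a reply stating a dosage with no retrieved source), the pattern, and the function names `reply_before_fix` and `reply_after_fix` are stand-ins for whatever the real incident turns out to be, not part of the brief.

```python
import re

# Hypothetical failure class: the reply states a dosage although no
# source document was retrieved to ground it.
DOSAGE_PATTERN = re.compile(r"\b\d+\s?(mg|ml)\b", re.IGNORECASE)
REFUSAL = "I can't provide that without a verified source."


def violates_policy(reply: str, sources: list[str]) -> bool:
    """True when the reply contains a dosage and nothing grounds it."""
    return bool(DOSAGE_PATTERN.search(reply)) and not sources


def reply_before_fix(model_output: str, sources: list[str]) -> str:
    """Pre-hotfix behaviour: the raw model output is returned unchanged."""
    return model_output


def reply_after_fix(model_output: str, sources: list[str]) -> str:
    """Post-hotfix behaviour: ungrounded dosage claims become a refusal."""
    if violates_policy(model_output, sources):
        return REFUSAL
    return model_output


def regression_check(reply_fn) -> bool:
    """Regression test body: True when reply_fn guards the failure class."""
    reply = reply_fn("Take 500 mg twice daily.", [])
    return not violates_policy(reply, [])
```

Run against `reply_before_fix` the check returns `False`, and against `reply_after_fix` it returns `True`, which is the before/after evidence the deliverable asks for.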
Materials
Time-box
90 min live + 4h post-mortem
Server-authoritative clock. The deadline is hard; auto-save does not extend it.
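As a rough sketch of the time-box, the two deadlines can be derived from the server-side start time. This assumes the 4-hour post-mortem window is counted from the end of the live window, which the brief does not state explicitly; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

LIVE_WINDOW = timedelta(minutes=90)
POSTMORTEM_WINDOW = timedelta(hours=4)  # assumed to start when the live window ends


def deadlines(started_at: datetime) -> tuple[datetime, datetime]:
    """Return (live_window_end, postmortem_due) for a server-side start time."""
    live_end = started_at + LIVE_WINDOW
    return live_end, live_end + POSTMORTEM_WINDOW


def accepts(server_now: datetime, deadline: datetime) -> bool:
    """Hard deadline: only the server clock counts, auto-save adds no time."""
    return server_now <= deadline


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
live_end, postmortem_due = deadlines(start)
```

With a 09:00 UTC start, `live_end` is 10:30 UTC and `postmortem_due` is 14:30 UTC under the stated assumption.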
Submission modes
- repo_url
- file_upload
The first mode listed is the default on the submit screen.
Rubric
Each dimension is scored on [0.0, 1.0] in 0.05 increments. The overall score is the weighted average of the dimension scores; the pass threshold is 65%.
| Dimension | Weight | What it tests |
|---|---|---|
| Diagnosis speed (`diagnosis_speed`) | 20% | Time from incident open to confirmed root cause. |
| Hotfix correctness (`hotfix_correctness`) | 25% | Fix actually addresses the failure class. |
| Regression safety (`regression_safety`) | 20% | Regression test guards the failure class going forward. |
| Post-mortem quality (`postmortem_quality`) | 20% | Blameless, specific, with concrete follow-ups. |
| Communication (`communication`) | 15% | Status updates during the live window were clear. |
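The scoring rule can be worked through with a short sketch. The weights and dimension identifiers come from the rubric; the function names, the use of exact fractions, and treating exactly 65% as a pass are assumptions made for illustration.

```python
from fractions import Fraction

# Weights from the rubric; they sum to 1, so the weighted average
# equals the weighted sum.
WEIGHTS = {
    "diagnosis_speed": Fraction(20, 100),
    "hotfix_correctness": Fraction(25, 100),
    "regression_safety": Fraction(20, 100),
    "postmortem_quality": Fraction(20, 100),
    "communication": Fraction(15, 100),
}
PASS_THRESHOLD = Fraction(65, 100)
STEP = Fraction(1, 20)  # 0.05 increments


def overall_score(scores: dict[str, str]) -> Fraction:
    """Weighted average of per-dimension scores given as decimal strings."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the rubric dimensions")
    total = Fraction(0)
    for dimension, raw in scores.items():
        score = Fraction(raw)
        if not 0 <= score <= 1 or score % STEP != 0:
            raise ValueError(f"{dimension}: {raw} is not on the 0.05 grid")
        total += WEIGHTS[dimension] * score
    return total


def passed(scores: dict[str, str]) -> bool:
    """Assumes the threshold is inclusive (a score of exactly 65% passes)."""
    return overall_score(scores) >= PASS_THRESHOLD


example = {
    "diagnosis_speed": "0.70",
    "hotfix_correctness": "0.80",
    "regression_safety": "0.60",
    "postmortem_quality": "0.50",
    "communication": "0.60",
}
```

For `example`, the sum is 0.14 + 0.20 + 0.12 + 0.10 + 0.09 = 0.65, which sits exactly on the threshold. Exact fractions are used so that a borderline score is not lost to floating-point rounding.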
Failure modes
Self-checks the learner answers before submitting. Critical checks block submission unless explicitly forced; the force flag is then surfaced to the mentor.
- F1 (critical): Did your hotfix include a regression test?
- F2 (reflective): Did you publish at least one status update during the live window?
- F3 (reflective): Does the post-mortem name follow-ups with owners?
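The gating behaviour of these self-checks could look roughly like the sketch below. The check IDs and severities are taken from the list; the `SelfCheck` type, the `gate` function, and the shape of its result are hypothetical, and an unanswered check is assumed to count as "no".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelfCheck:
    check_id: str
    severity: str  # "critical" or "reflective"


CHECKS = [
    SelfCheck("F1", "critical"),
    SelfCheck("F2", "reflective"),
    SelfCheck("F3", "reflective"),
]


def gate(answers: dict[str, bool], force: bool = False) -> dict:
    """Decide whether a submission may proceed.

    Only critical checks answered "no" (or left unanswered) block;
    reflective checks never block. `forced` is the flag the mentor sees.
    """
    failed_critical = [
        check.check_id
        for check in CHECKS
        if check.severity == "critical" and not answers.get(check.check_id, False)
    ]
    return {
        "allowed": not failed_critical or force,
        "forced": bool(failed_critical) and force,
        "failed_critical": failed_critical,
    }
```

In this sketch `forced` is only set when the force option actually overrode a failed critical check, so a learner who passes F1 never carries the flag.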
Skill assertions on offer
On a passing review the mentor selects a subset of these to assert, with an asserted weight bounded by the per-skill ceiling shown below.
- llm.safety.hallucination-mitigation (LLM safety: hallucination mitigation), max weight 1.00
- llm.ops.incident-response (LLM ops: incident response), max weight 1.00
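A minimal sketch of the ceiling rule follows. The skill identifiers and ceilings are the ones listed; `validate_assertions` is a hypothetical helper, and requiring a weight strictly above zero is an assumption the brief does not state.

```python
# Per-skill ceilings as listed in the brief.
SKILL_CEILINGS = {
    "llm.safety.hallucination-mitigation": 1.00,
    "llm.ops.incident-response": 1.00,
}


def validate_assertions(asserted: dict[str, float]) -> dict[str, float]:
    """Check a mentor's chosen subset of skills against the ceilings."""
    for skill, weight in asserted.items():
        if skill not in SKILL_CEILINGS:
            raise ValueError(f"{skill} is not on offer for this twin")
        # Lower bound of 0 (exclusive) is an assumption for illustration.
        if not 0.0 < weight <= SKILL_CEILINGS[skill]:
            raise ValueError(f"{skill}: weight {weight} is outside the ceiling")
    return dict(asserted)
```

A mentor asserting only `llm.ops.incident-response` at 0.8 would pass validation; asserting it at 1.2 would be rejected.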
Mentor SLA
48h
From mentor claim to signoff.
Pass threshold
65%
Weighted-average overall score.
Re-attempts
1
Higher of the two scores flows to the credential.
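The re-attempt rule reduces to taking the better of at most two overall scores. The function name below is illustrative, not part of the platform.

```python
def credential_score(attempt_scores: list[float]) -> float:
    """Score that flows to the credential: the higher of the original
    attempt and the single permitted re-attempt."""
    if not 1 <= len(attempt_scores) <= 2:
        raise ValueError("expected one attempt plus at most one re-attempt")
    return max(attempt_scores)
```

A first attempt of 0.60 followed by a re-attempt of 0.70 yields 0.70, and a weaker re-attempt never lowers the recorded score.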
Start this twin
The clock starts when you press start. Read the brief above first.