Job Twin brief — UpSkillZone AI
Job Twin 3 — Production deployment + observability
Scenario
Take an existing model-serving service and harden it for production: containerize, add health/readiness probes, structured logs, OpenTelemetry traces, and a basic SLO dashboard.
Time-box: 8 hours. Submit a runnable repository.
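As one possible starting point for the structured logs named in the scenario, the sketch below emits one JSON object per log line using only the Python standard library. The field names (`ts`, `level`, `logger`, `message`, `request_id`) and the `JsonFormatter` class are illustrative assumptions, not something the starter repo is known to define.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object with a fixed set of keys.

    The key names are an assumption made for this sketch; what matters is that
    every line carries the same fields so the logs can be queried consistently.
    """

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # request_id is supplied per call through `extra=`; it is null otherwise.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def configure_json_logging(logger_name: str) -> logging.Logger:
    """Attach a stderr handler with the JSON formatter to the named logger."""
    logger = logging.getLogger(logger_name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

A call such as `logger.info("prediction served", extra={"request_id": "req-123"})` would then produce a line whose `request_id` field can be used to correlate all log entries for one request.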
Deliverables
- Container — multi-stage Dockerfile, non-root user, healthcheck.
- Probes — `/healthz` and `/readyz` with meaningful semantics.
- Telemetry — structured JSON logs + OTel traces on the inference path.
- SLO dashboard — at least latency p50/p95/p99 and error rate.
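To make the SLO dashboard deliverable concrete, here is a minimal sketch of how latency percentiles and error rate could be derived from raw request samples. In a real deployment a metrics backend would normally compute these from histograms; the `RequestSample` type, the nearest-rank percentile method, and the choice to count only 5xx responses as errors are all assumptions of this example.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One observed request (hypothetical shape for this sketch)."""
    latency_ms: float
    status_code: int


def percentile(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending-sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("need at least one sample")
    # Multiply before dividing so whole-number percentiles stay exact.
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def slo_summary(samples: list[RequestSample]) -> dict[str, float]:
    """Return p50/p95/p99 latency and the fraction of 5xx responses."""
    latencies = sorted(s.latency_ms for s in samples)
    # Assumption: only server errors (5xx) count against the error-rate SLO.
    errors = sum(1 for s in samples if s.status_code >= 500)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": errors / len(samples),
    }
```

The tail percentiles are reported alongside the median because a healthy p50 can hide a slow p99; showing all three on the dashboard makes that gap visible.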
Materials
- Starter repo (repo)
- OTel quickstart (doc)
Time-box
8 hours
Server-authoritative clock. The deadline is hard; auto-save does not extend it.
Submission modes
- repo_url
The first mode listed is the default on the submit screen.
Rubric
Each dimension is scored on [0.0, 1.0] in 0.05 increments. The overall score is the weighted average of the dimension scores; the pass threshold is 70%.
| Dimension | Weight | What it tests |
|---|---|---|
| Containerization (`containerization`) | 15% | Multi-stage build, non-root, reproducible. |
| Health probes (`health_probes`) | 10% | Liveness vs readiness reflect real dependencies. |
| Structured logging (`structured_logging`) | 15% | JSON logs with request IDs and consistent fields. |
| Telemetry (`telemetry`) | 15% | OTel traces span the inference path end-to-end. |
| SLO dashboard (`slo_dashboard`) | 15% | Latency percentiles and error rate are visible. |
| Code quality (`code_quality`) | 15% | Readable, deterministic, runs. |
| Docs quality (`docs_quality`) | 15% | README explains how to run, deploy, and observe. |
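The weighted average can be checked by hand with a short sketch like the one below. The weights and the 70% threshold come from the rubric; the function names, the integer arithmetic used to avoid floating-point drift, and the assumption that a score of exactly 70% passes are choices made for this example and may not match how the platform computes the result.

```python
# Weights in whole percent, taken from the rubric table; they sum to 100.
WEIGHTS_PCT = {
    "containerization": 15,
    "health_probes": 10,
    "structured_logging": 15,
    "telemetry": 15,
    "slo_dashboard": 15,
    "code_quality": 15,
    "docs_quality": 15,
}
STEP = 0.05              # scoring increment from the rubric
STEPS_PER_UNIT = 20      # 1.0 / 0.05
PASS_THRESHOLD_PCT = 70  # assumption: exactly 70% counts as a pass


def to_steps(score: float) -> int:
    """Convert a score in [0.0, 1.0] to a whole number of 0.05 steps."""
    steps = round(score / STEP)
    if not 0 <= steps <= STEPS_PER_UNIT or abs(steps * STEP - score) > 1e-9:
        raise ValueError(f"score {score!r} is not a 0.05 increment in [0.0, 1.0]")
    return steps


def overall_score(scores: dict[str, float]) -> tuple[float, bool]:
    """Return (weighted average, passed) for one complete set of dimension scores."""
    if scores.keys() != WEIGHTS_PCT.keys():
        raise ValueError("scores must cover exactly the rubric dimensions")
    # Integer total in units of (percent x steps), so the threshold test is exact.
    total = sum(WEIGHTS_PCT[dim] * to_steps(value) for dim, value in scores.items())
    overall = total / (100 * STEPS_PER_UNIT)
    passed = total >= PASS_THRESHOLD_PCT * STEPS_PER_UNIT
    return overall, passed
```

For instance, scoring 1.0 on every dimension except `health_probes` at 0.5 gives (90 × 1.0 + 10 × 0.5) / 100 = 0.95, because the probes dimension carries only 10% of the weight.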
Failure modes
Self-checks the learner answers before submitting. Critical checks block submission unless explicitly forced; the force flag is then surfaced to the mentor.
- F1 (critical): Does the container build and run on a clean clone?
- F2 (reflective): Do `/healthz` and `/readyz` actually probe dependencies?
- F3 (reflective): Are traces propagated across at least one service boundary?
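For the probe self-check, one common way to give the two endpoints distinct meanings is sketched below, independent of any web framework. It follows the widespread Kubernetes convention that liveness stays cheap and local while readiness checks the dependencies the service needs to answer requests; the brief does not confirm this exact split, and the function names, response bodies, and the dependency names in the usage note are hypothetical.

```python
from typing import Callable

# Hypothetical registry type: dependency name -> zero-argument check returning True when usable.
DependencyCheck = Callable[[], bool]


def healthz() -> tuple[int, dict]:
    """Liveness: the process is running and can respond. No dependency calls here,
    so a failing downstream service does not cause the container to be restarted."""
    return 200, {"status": "alive"}


def readyz(checks: dict[str, DependencyCheck]) -> tuple[int, dict]:
    """Readiness: report 200 only when every registered dependency check succeeds."""
    results: dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises is treated as a failed dependency, not a crash.
            results[name] = False
    ready = all(results.values())
    body = {"status": "ready" if ready else "not_ready", "checks": results}
    return (200 if ready else 503), body
```

A route handler would call `readyz({"model_loaded": ..., "feature_store": ...})` with real checks and return the status code and body as the HTTP response. Returning per-dependency results in the body makes it easier to see which dependency is keeping the service out of rotation.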
Skill assertions on offer
On a passing review the mentor selects a subset of these to assert, with an asserted weight bounded by the per-skill ceiling shown below.
- `llm.ops.containerization` (LLM ops — containerization), max weight 1.00
- `llm.ops.observability` (LLM ops — observability), max weight 1.00
Mentor SLA
72h
From mentor claim to signoff.
Pass threshold
70%
Weighted-average overall score.
Re-attempts
1
Higher of the two scores flows to the credential.
Start this twin
The clock starts when you press start. Read the brief above first.