Job Twin brief — UpSkillZone AI

Job Twin 3 — Production deployment + observability

Day twin · 8 hours · pass at 70%

Scenario

Take an existing model-serving service and harden it for production: containerize it, then add liveness and readiness probes, structured logs, OpenTelemetry (OTel) traces, and a basic SLO dashboard.

Time-box: 8 hours. Submit a runnable repository.

Deliverables

Container: a multi-stage Dockerfile that runs as a non-root user and defines a healthcheck.
Probes: /healthz (liveness) and /readyz (readiness), each with meaningful semantics.
Telemetry: structured JSON logs and OTel traces covering the inference path.
SLO dashboard: at minimum, p50/p95/p99 latency and error rate.
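A minimal sketch of what structured JSON logs with request IDs and consistent fields can look like, using only the Python standard library. It assumes a Python serving stack; the logger name "inference", the /predict route, the "fields" extra key, and the handle_predict handler are hypothetical stand-ins for whatever the starter repo actually uses. A ContextVar holds the request ID, so every line logged while a request is being handled carries it without passing it through every function signature.

```python
import io
import json
import logging
import time
import uuid
from contextvars import ContextVar
from datetime import datetime, timezone

# Request ID of the request currently being served ("-" outside any request).
request_id_var: ContextVar[str] = ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object with consistent fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            "request_id": request_id_var.get(),
        }
        # Per-call structured fields, passed as extra={"fields": {...}} (hypothetical key).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Attach one JSON handler to the hypothetical "inference" logger."""
    logger = logging.getLogger("inference")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_predict(logger: logging.Logger, payload: dict) -> dict:
    """Hypothetical /predict handler: every line it logs carries one request ID."""
    token = request_id_var.set(uuid.uuid4().hex)
    start = time.perf_counter()
    try:
        result = {"label": "positive"}  # placeholder for the real model call
        logger.info(
            "prediction served",
            extra={"fields": {
                "route": "/predict",
                "status": 200,
                "input_chars": len(payload.get("text", "")),
                "latency_ms": round((time.perf_counter() - start) * 1000, 3),
            }},
        )
        return result
    finally:
        # Restore the previous value so IDs never leak between requests.
        request_id_var.reset(token)


def demo() -> dict:
    """Serve one fake request and return its parsed JSON log line."""
    buf = io.StringIO()
    handle_predict(build_logger(buf), {"text": "hello"})
    return json.loads(buf.getvalue())
```

Each request produces one JSON object per log line with the keys ts, level, logger, msg, and request_id plus the per-call fields; sorting the keys keeps the field order identical across lines, which makes the logs easier to diff and query.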

Materials

Starter repository
OTel quickstart (documentation)

Time-box

8 hours

The clock is server-authoritative and the deadline is hard; auto-save does not extend it.

Submission modes

repo_url

The first mode listed is the default on the submit screen.

Rubric

Each dimension is scored on [0.0, 1.0] in 0.05 increments. The overall score is the weighted average of the dimension scores; the pass mark is 70%.

Dimension (key)	Weight	What it tests
Containerization (containerization)	15%	Multi-stage build, non-root user, reproducible.
Health probes (health_probes)	10%	Liveness and readiness are distinct and reflect real dependencies.
Structured logging (structured_logging)	15%	JSON logs with request IDs and consistent fields.
Telemetry (telemetry)	15%	OTel traces span the inference path end to end.
SLO dashboard (slo_dashboard)	15%	Latency percentiles and error rate are visible.
Code quality (code_quality)	15%	Readable, deterministic, and runs.
Docs quality (docs_quality)	15%	README explains how to run, deploy, and observe the service.
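The scoring rule can be checked by hand. The sketch below is an illustration, not the platform's grading code: it takes the weights from the rubric table, rejects scores outside [0.0, 1.0] or off the 0.05 grid, and computes the weighted average with Decimal so that 0.05 steps stay exact. Treating a score of exactly 70% as a pass is an assumption.

```python
from decimal import Decimal

# Weights (percent) copied from the rubric table; they sum to 100.
WEIGHTS = {
    "containerization": 15,
    "health_probes": 10,
    "structured_logging": 15,
    "telemetry": 15,
    "slo_dashboard": 15,
    "code_quality": 15,
    "docs_quality": 15,
}
STEP = Decimal("0.05")
PASS_THRESHOLD = Decimal("0.70")  # assumption: exactly 70% counts as a pass


def overall_score(scores: dict[str, str]) -> Decimal:
    """Weighted average of per-dimension scores, each in [0.0, 1.0] in 0.05 steps."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the rubric dimensions")
    total = Decimal(0)
    for name, raw in scores.items():
        score = Decimal(str(raw))  # str() keeps float inputs such as 0.85 exact
        if not (0 <= score <= 1) or score % STEP != 0:
            raise ValueError(f"{name}: {raw} is not in [0, 1] in 0.05 steps")
        total += WEIGHTS[name] * score
    return total / sum(WEIGHTS.values())


def passes(scores: dict[str, str]) -> bool:
    """True when the weighted average reaches the pass threshold."""
    return overall_score(scores) >= PASS_THRESHOLD


# Illustrative scores, not real results.
EXAMPLE = {
    "containerization": "0.80",
    "health_probes": "0.60",
    "structured_logging": "0.75",
    "telemetry": "0.70",
    "slo_dashboard": "0.65",
    "code_quality": "0.70",
    "docs_quality": "0.70",
}
```

For the illustrative scores, the weighted sum is 70.5 out of 100, so the overall score is 0.705: a narrow pass, even though two dimensions sit below 0.70. Because weights differ, a weak score on the 10% dimension costs less than the same score on any 15% dimension.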

Failure modes

Self-checks the learner answers before submitting. A critical check blocks submission unless the learner explicitly forces it; the force flag is then shown to the mentor.

F1 (critical): Does the container build and run on a clean clone?
F2 (reflective): Do `/healthz` and `/readyz` actually probe dependencies?
F3 (reflective): Are traces propagated across at least one service boundary?
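F2 asks whether the probes check anything real. One common convention, used for Kubernetes-style liveness and readiness probes, keeps /healthz cheap (it reports only that the process can answer) and makes /readyz run a check against each dependency, returning 503 while any of them fails. The sketch below shows that split as plain Python functions; the ProbeResult type and the dependency names are hypothetical, and wiring them into the service's HTTP framework is left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProbeResult:
    status: int                                # HTTP status the endpoint would return
    body: dict = field(default_factory=dict)   # JSON-serializable response body


def healthz() -> ProbeResult:
    """Liveness: the process is up and can answer; no dependency calls."""
    return ProbeResult(200, {"status": "ok"})


def readyz(checks: dict[str, Callable[[], bool]]) -> ProbeResult:
    """Readiness: ready only when every dependency check passes."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency, not a crash.
            results[name] = False
    ready = all(results.values())
    return ProbeResult(200 if ready else 503, {"ready": ready, "checks": results})


# Hypothetical dependency checks for an inference service.
DEPENDENCY_CHECKS = {
    "model_loaded": lambda: True,     # e.g. weights are in memory
    "feature_store": lambda: False,   # e.g. a downstream store is unreachable
}
```

With DEPENDENCY_CHECKS as written, healthz() still returns 200 while readyz() returns 503 and names the failing dependency, so traffic is withheld without the process being restarted. Real checks should use short timeouts so a slow dependency makes /readyz fail fast instead of hanging the probe.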

Skill assertions on offer

On a passing review, the mentor selects a subset of these skills to assert; each asserted weight is capped by the per-skill ceiling shown below.

llm.ops.containerization: LLM ops, containerization (max weight 1.00)
llm.ops.observability: LLM ops, observability (max weight 1.00)

Mentor SLA

72 hours

Measured from mentor claim to sign-off.

Pass threshold

70%

Weighted-average overall score.

Re-attempts

The higher of the two attempt scores is the one that counts toward the credential.

Start this twin

The clock starts when you press Start, so read the brief above first. You will be asked to sign in if you have not already.

Twin ID: jt-prod-deploy-3 (open it from the dashboard).