UpSkillZone AI

Job Twin brief — UpSkillZone AI

Job Twin 3 — Production deployment + observability

day twin · 8 hours · pass 70%

Scenario

Take an existing model-serving service and harden it for production: containerize, add health/readiness probes, structured logs, OpenTelemetry traces, and a basic SLO dashboard.

Time-box: 8 hours. Submit a runnable repository.

Deliverables

  1. Container — multi-stage Dockerfile, non-root user, healthcheck.
  2. Probes — /healthz and /readyz with meaningful semantics.
  3. Telemetry — structured JSON logs + OTel traces on the inference path.
  4. SLO dashboard — at least latency p50/p95/p99 and error rate.
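As an illustration of deliverable 3, here is a minimal sketch of structured JSON logging using only the Python standard library. The brief does not prescribe a language, logger name, or field set; the `inference` logger and the `request_id` and `latency_ms` fields are assumptions chosen for the example.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object with a fixed field set."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical fields, supplied by callers through `extra=`.
            "request_id": getattr(record, "request_id", None),
            "latency_ms": getattr(record, "latency_ms", None),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes one JSON line per event to `stream`."""
    logger = logging.getLogger("inference")  # assumed logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_prediction(logger: logging.Logger, latency_ms: float) -> str:
    """Log one served prediction and return the request ID that was attached."""
    request_id = str(uuid.uuid4())
    logger.info(
        "prediction served",
        extra={"request_id": request_id, "latency_ms": latency_ms},
    )
    return request_id
```

Emitting the same keys on every line, even when a value is null, is what keeps the fields consistent for downstream queries. In a real service the request ID would more likely come from an incoming header or the active trace context than from a fresh UUID.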

Materials

Time-box

8 hours

Server-authoritative clock. The deadline is hard; auto-save does not extend it.

Submission modes

  • repo_url

The first mode listed is the default on the submit screen.

Rubric

Each dimension scored on [0.0, 1.0] in 0.05 increments. The overall score is the weighted average; pass at 70%.

Dimension            Key                  Weight   What it tests

Containerization     containerization     15%      Multi-stage build, non-root, reproducible.
Health probes        health_probes        10%      Liveness vs readiness reflect real dependencies.
Structured logging   structured_logging   15%      JSON logs with request IDs and consistent fields.
Telemetry            telemetry            15%      OTel traces span the inference path end-to-end.
SLO dashboard        slo_dashboard        15%      Latency percentiles and error rate are visible.
Code quality         code_quality         15%      Readable, deterministic, runs.
Docs quality         docs_quality         15%      README explains how to run, deploy, and observe.
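For the SLO dashboard dimension, it can help to check the dashboard against numbers computed by hand. The sketch below derives p50/p95/p99 and error rate from raw request samples using the nearest-rank percentile definition. The sample shape `(latency_ms, status_code)` and the choice to count only 5xx responses as errors are assumptions; a real dashboard would usually derive these figures from histogram metrics rather than raw samples.

```python
import math
from typing import Dict, List, Tuple


def percentile(sorted_values: List[float], q: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    # Multiply before dividing so integer inputs avoid floating-point drift.
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(requests: List[Tuple[float, int]]) -> Dict[str, float]:
    """Summarize (latency_ms, status_code) samples into the four dashboard figures."""
    latencies = sorted(latency for latency, _ in requests)
    # Assumption: only server-side failures (HTTP 5xx) count against the SLO.
    errors = sum(1 for _, status in requests if status >= 500)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": errors / len(requests),
    }
```

Reporting p95 and p99 alongside p50 matters because tail latency can degrade badly while the median stays flat.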

Failure modes

Self-checks the learner answers before submit. Critical checks block submission unless explicitly forced; the force flag is then surfaced to the mentor.

  • F1

    Does the container build and run on a clean clone?

    critical

  • F2

    Do `/healthz` and `/readyz` actually probe dependencies?

    reflective

  • F3

    Are traces propagated across at least one service boundary?

    reflective
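To make check F2 concrete, here is a hedged sketch of one common convention for the two probes: `/healthz` reports only that the process is alive, while `/readyz` runs a named check per dependency and returns 503 if any check fails. The brief does not mandate this split, and the function name `handle_probe` and the check names in the usage note are hypothetical.

```python
import json
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


def handle_probe(path: str, readiness_checks: Dict[str, Check]) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) for a probe path.

    Framework-agnostic on purpose: wire it into whatever HTTP layer the service uses.
    """
    if path == "/healthz":
        # Liveness: the process can answer requests. No dependency calls here,
        # so a slow downstream does not get the container restarted.
        return 200, json.dumps({"status": "ok"})

    if path == "/readyz":
        results = {}
        for name, check in readiness_checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                # A check that raises is treated as a failed dependency.
                results[name] = False
        ready = all(results.values())
        body = {"status": "ready" if ready else "not_ready", "checks": results}
        return (200 if ready else 503), json.dumps(body)

    return 404, json.dumps({"error": "not found"})
```

A call such as `handle_probe("/readyz", {"model_loaded": lambda: True, "feature_store": lambda: False})` would return status 503 with a per-check breakdown, which tells an operator which dependency is holding traffic back.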

Skill assertions on offer

On a passing review the mentor selects a subset of these to assert, with an asserted weight bounded by the per-skill ceiling shown below.

  • llm.ops.containerization

    LLM ops — containerization

    max weight 1.00
  • llm.ops.observability

    LLM ops — observability

    max weight 1.00

Mentor SLA

72h

From mentor claim to signoff.

Pass threshold

70%

Weighted-average overall score.
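As a worked example of the weighted average, the sketch below combines per-dimension scores using the weights from the rubric. It works in integer units to avoid floating-point drift at the boundary. Treating exactly 70% as a pass is an assumption, as is the absence of any platform-side rounding.

```python
from typing import Dict, Tuple

# Weights in percent, keyed by the rubric's dimension keys.
WEIGHTS_PCT = {
    "containerization": 15,
    "health_probes": 10,
    "structured_logging": 15,
    "telemetry": 15,
    "slo_dashboard": 15,
    "code_quality": 15,
    "docs_quality": 15,
}
PASS_PCT = 70      # assumption: a score of exactly 70% passes
STEPS_PER_UNIT = 20  # scores move in 0.05 increments, so 20 steps span [0.0, 1.0]


def overall_score(scores: Dict[str, float]) -> Tuple[float, bool]:
    """Return (weighted average in [0, 1], passed) for a full set of dimension scores."""
    if set(scores) != set(WEIGHTS_PCT):
        raise ValueError("scores must cover exactly the rubric dimensions")
    total = 0
    for key, score in scores.items():
        steps = round(score * STEPS_PER_UNIT)
        if not 0 <= steps <= STEPS_PER_UNIT or abs(score * STEPS_PER_UNIT - steps) > 1e-9:
            raise ValueError(f"{key}: score must be in [0.0, 1.0] in 0.05 increments")
        total += WEIGHTS_PCT[key] * steps
    maximum = sum(WEIGHTS_PCT.values()) * STEPS_PER_UNIT
    passed = total * 100 >= PASS_PCT * maximum
    return total / maximum, passed
```

For instance, scores of 0.80, 0.50, 0.70, 0.75, 0.60, 0.80 and 0.70 (in rubric order) give (15·0.80 + 10·0.50 + 15·0.70 + 15·0.75 + 15·0.60 + 15·0.80 + 15·0.70) / 100 = 0.7025, which clears the threshold despite a weak health-probes score.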

Re-attempts

1

Higher of the two scores flows to the credential.

Start this twin

The clock starts when you press start. Read the brief above first. You will be asked to sign in if you have not already.
