Job Twin brief — UpSkillZone AI

Capstone — Production AI service

capstone · 14 days · pass 75%

Scenario

Ship an end-to-end production AI service of your choosing. Two-week build window. Two mentor reviewers (separate ledger). Public artifact at the end.

Deliverables

Repo — production-grade, with CI, tests, and a deployable container.
Evals — domain-specific suite with adversarial coverage.
Security review — threat model and at least one mitigated risk.
Reflection — what you'd do differently with another two weeks.
Public artifact — demo, write-up, or talk.

Materials

Capstone handbook (doc)
Reference architectures (repo)

Time-box

14 days

Server-authoritative clock. The deadline is hard; auto-save does not extend it.
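The hard deadline can be pictured as a comparison made entirely on the server side. The sketch below is illustrative only: the function names are hypothetical, and treating a submission that arrives exactly at the deadline as on time is an assumption the brief does not confirm.

```python
from datetime import datetime, timedelta, timezone

# 14-day time-box, as stated in the brief.
TIME_BOX = timedelta(days=14)


def deadline(started_at: datetime) -> datetime:
    """Deadline derived from the moment the clock started (hypothetical helper)."""
    return started_at + TIME_BOX


def accepts_submission(started_at: datetime, server_now: datetime) -> bool:
    """Compare against the server's clock only; client time and auto-save
    timestamps play no role. Inclusive comparison is an assumption."""
    return server_now <= deadline(started_at)


start = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)  # example start time
on_time = accepts_submission(start, start + timedelta(days=13, hours=23))
too_late = accepts_submission(start, start + timedelta(days=14, seconds=1))
```

Because the deadline depends only on the start time, nothing that happens during the build window (including auto-save) can move it.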

Submission modes

repo_url

The first mode listed is the default on the submit screen.

Rubric

Each dimension is scored on [0.0, 1.0] in 0.05 increments. The overall score is the weighted average of the dimension scores; the pass threshold is 75% (0.75).

Dimension	ID	Weight	What it tests
Problem framing	problem_framing	15%	Clear user, clear value, clear scope.
System design	system_design	20%	Architecture matches the constraints; tradeoffs named.
Production quality	production_quality	20%	CI, container, observability, runbook.
Evals coverage	evals_coverage	15%	Domain-specific suite with adversarial cases.
Security posture	security_posture	10%	Threat model plus at least one mitigated risk.
Reflection	reflection	10%	Honest account of what you'd do differently.
Polish	polish	10%	Public artifact is something you'd link from a resume.
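As a rough illustration of how the weighted average works out, the sketch below uses the weights and dimension identifiers from the rubric. The function names are hypothetical, and treating a score of exactly 0.75 as a pass is an assumption. `Decimal` is used so that the 0.05 grid check is exact.

```python
from decimal import Decimal

# Weights copied from the rubric; keys are the dimension identifiers.
WEIGHTS = {
    "problem_framing": Decimal("0.15"),
    "system_design": Decimal("0.20"),
    "production_quality": Decimal("0.20"),
    "evals_coverage": Decimal("0.15"),
    "security_posture": Decimal("0.10"),
    "reflection": Decimal("0.10"),
    "polish": Decimal("0.10"),
}
PASS_THRESHOLD = Decimal("0.75")
STEP = Decimal("0.05")


def overall_score(scores: dict[str, str]) -> Decimal:
    """Weighted average of per-dimension scores given as decimal strings."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the rubric dimensions")
    total = Decimal("0")
    for name, raw in scores.items():
        value = Decimal(raw)
        on_grid = value % STEP == 0
        if not (Decimal("0") <= value <= Decimal("1")) or not on_grid:
            raise ValueError(f"{name}: {raw} is not on the 0.05 grid in [0, 1]")
        total += WEIGHTS[name] * value
    # The weights sum to 1, so the weighted sum is already the average.
    return total


def passes(scores: dict[str, str]) -> bool:
    """Assumption: the threshold is inclusive (exactly 0.75 passes)."""
    return overall_score(scores) >= PASS_THRESHOLD


example = {
    "problem_framing": "0.80",
    "system_design": "0.75",
    "production_quality": "0.70",
    "evals_coverage": "0.80",
    "security_posture": "0.60",
    "reflection": "0.90",
    "polish": "0.70",
}
result = overall_score(example)
```

With these example scores the weighted sum is 0.12 + 0.15 + 0.14 + 0.12 + 0.06 + 0.09 + 0.07 = 0.75, which sits exactly on the threshold. System design and production quality together carry 40% of the weight, so a weak score there is hard to recover elsewhere.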

Failure modes

Self-checks the learner answers before submitting. Critical checks block submission unless explicitly forced; the force flag is then surfaced to the mentor.

ID	Self-check	Severity
F1	Does `pytest` pass on a clean clone?	critical
F2	Does a deployable container exist and start?	critical
F3	Is the evals suite runnable end-to-end?	reflective
F4	Is the public artifact actually public?	reflective
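The gating rule for these self-checks can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `Check` type, the `gate` function, and the shape of the returned dictionary are all hypothetical, and it assumes that reflective checks never block submission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    check_id: str
    severity: str  # "critical" or "reflective"


# IDs and severities taken from the failure-mode list.
CHECKS = [
    Check("F1", "critical"),
    Check("F2", "critical"),
    Check("F3", "reflective"),
    Check("F4", "reflective"),
]


def gate(answers: dict[str, bool], force: bool = False) -> dict:
    """Decide whether a submission may proceed.

    A missing answer counts as "no". Only critical checks can block;
    reflective answers are recorded but ignored here (assumption).
    """
    failed_critical = [
        c.check_id
        for c in CHECKS
        if c.severity == "critical" and not answers.get(c.check_id, False)
    ]
    blocked = bool(failed_critical) and not force
    return {
        "allowed": not blocked,
        "failed_critical": failed_critical,
        # Reported only when forcing actually overrode a failed check,
        # so the mentor sees the flag when it mattered.
        "forced": bool(failed_critical) and force,
    }
```

For example, `gate({"F1": False, "F2": True})` is blocked, while the same answers with `force=True` go through with `forced` set so the override is visible downstream.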

Skill assertions on offer

On a passing review, the mentor selects a subset of these skills to assert, each with an asserted weight no greater than the per-skill ceiling shown below.

Skill ID	Skill	Max weight
llm.ops.system-design	LLM ops — system design	1.00
llm.evals.dataset-design	LLM evals — dataset design	1.00
llm.safety.security-review	LLM safety — security review	0.90
llm.api.production-readiness	LLM API — production readiness	1.00
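The ceiling rule can be expressed as a small validation step. The sketch below is hypothetical: the function name, the lower bound of 0.0, and the choice to reject out-of-range weights rather than clamp them are assumptions, while the skill IDs and ceilings come from the list above.

```python
# Per-skill ceilings from the list of skill assertions on offer.
CEILINGS = {
    "llm.ops.system-design": 1.00,
    "llm.evals.dataset-design": 1.00,
    "llm.safety.security-review": 0.90,
    "llm.api.production-readiness": 1.00,
}


def validate_assertions(asserted: dict[str, float], passed: bool) -> dict[str, float]:
    """Check a mentor's selected subset against the per-skill ceilings.

    Raises instead of clamping, so an over-ceiling weight is never
    silently altered (a design assumption, not a documented rule).
    """
    if not passed:
        raise ValueError("skills can only be asserted on a passing review")
    for skill, weight in asserted.items():
        if skill not in CEILINGS:
            raise KeyError(f"unknown skill: {skill}")
        if not 0.0 <= weight <= CEILINGS[skill]:
            raise ValueError(
                f"{skill}: weight {weight} exceeds ceiling {CEILINGS[skill]}"
            )
    return dict(asserted)


# The mentor may assert any subset, including a single skill.
ok = validate_assertions({"llm.safety.security-review": 0.90}, passed=True)
```

Note that `llm.safety.security-review` is the only skill capped below 1.00, so a weight of 0.95 is valid for the other three skills but rejected for that one.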

Mentor SLA

168h (7 days)

From mentor claim to signoff.

Pass threshold

75%

Weighted-average overall score.

Re-attempts

none

The capstone is the final exam.

Start this twin

The clock starts when you press start. Read the brief above first. You will be asked to sign in if you have not already.
