UpSkillZone AI

Job Twin brief — UpSkillZone AI

Capstone — Production AI service

capstone · 14 days · pass 75%

Scenario

Ship an end-to-end production AI service of your choosing. You have a two-week build window and two mentor reviewers (separate ledger), and you publish a public artifact at the end.

Deliverables

  1. Repo — production-grade, with CI, tests, and a deployable container.
  2. Evals — domain-specific suite with adversarial coverage.
  3. Security review — threat model and at least one mitigated risk.
  4. Reflection — what you'd do differently with another two weeks.
  5. Public artifact — demo, write-up, or talk.

Materials

Time-box

14 days

Server-authoritative clock. The deadline is hard; auto-save does not extend it.

Submission modes

  • repo_url

The first mode listed is the default on the submit screen.

Rubric

Each dimension is scored on [0.0, 1.0] in 0.05 increments. The overall score is the weighted average of the dimension scores; the pass mark is 75%.

Dimension                                 Weight  What it tests
Problem framing (problem_framing)         15%     Clear user, clear value, clear scope.
System design (system_design)             20%     Architecture matches the constraints; tradeoffs named.
Production quality (production_quality)   20%     CI, container, observability, runbook.
Evals coverage (evals_coverage)           15%     Domain-specific suite with adversarial cases.
Security posture (security_posture)       10%     Threat model plus at least one mitigated risk.
Reflection (reflection)                   10%     Honest account of what you'd do differently.
Polish (polish)                           10%     Public artifact is something you'd link from a resume.
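
To make the scoring arithmetic concrete, here is a minimal sketch of how the weighted average could be computed from the rubric weights. The function names (`overall_score`, `passes`) are hypothetical, and treating exactly 75% as a pass is an assumption; the brief does not say how the platform implements this.

```python
from decimal import Decimal

# Weights copied from the rubric; keys are the dimension identifiers.
WEIGHTS = {
    "problem_framing": Decimal("0.15"),
    "system_design": Decimal("0.20"),
    "production_quality": Decimal("0.20"),
    "evals_coverage": Decimal("0.15"),
    "security_posture": Decimal("0.10"),
    "reflection": Decimal("0.10"),
    "polish": Decimal("0.10"),
}
PASS_THRESHOLD = Decimal("0.75")
STEP = Decimal("0.05")  # scores move in 0.05 increments


def overall_score(scores):
    """Return the weighted average of per-dimension scores as a Decimal."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the rubric dimensions")
    total = Decimal("0")
    for name, raw in scores.items():
        # Go through str() so floats like 0.7 become exact decimals.
        score = Decimal(str(raw))
        if not (0 <= score <= 1) or score % STEP != 0:
            raise ValueError(f"{name}: {raw} is not on the 0.05 grid in [0, 1]")
        total += WEIGHTS[name] * score
    return total


def passes(scores):
    """Assumption: a score of exactly 75% counts as a pass."""
    return overall_score(scores) >= PASS_THRESHOLD
```

For example, scores of 0.80, 0.75, 0.70, 0.80, 0.60, 0.90, and 0.75 (in rubric order) come to 0.755 overall, which clears the pass mark despite the weak security score. Decimal arithmetic is used here so that a borderline 0.75 is not lost to floating-point rounding.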

Failure modes

Self-checks the learner answers before submitting. A failed critical check blocks submission unless the learner explicitly forces it; the force flag is then surfaced to the mentor.

  • F1 (critical): Does `pytest` pass on a clean clone?
  • F2 (critical): Does a deployable container exist and start?
  • F3 (reflective): Is the evals suite runnable end-to-end?
  • F4 (reflective): Is the public artifact actually public?
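
The gating rule for these self-checks can be sketched as follows. This is an illustration only: `SelfCheck`, `Decision`, and `can_submit` are hypothetical names, and two behaviours are assumptions not stated in the brief (an unanswered critical check counts as failed, and the force flag is recorded only when it was actually needed).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelfCheck:
    check_id: str
    question: str
    critical: bool  # False means the check is reflective


CHECKS = [
    SelfCheck("F1", "Does `pytest` pass on a clean clone?", True),
    SelfCheck("F2", "Does a deployable container exist and start?", True),
    SelfCheck("F3", "Is the evals suite runnable end-to-end?", False),
    SelfCheck("F4", "Is the public artifact actually public?", False),
]


@dataclass(frozen=True)
class Decision:
    allowed: bool
    forced: bool  # when True, this would be surfaced to the mentor
    failed_critical: tuple


def can_submit(answers: dict[str, bool], force: bool = False) -> Decision:
    """Decide whether a submission goes through, given yes/no answers."""
    # Assumption: a missing answer to a critical check is treated as "no".
    failed = tuple(
        c.check_id
        for c in CHECKS
        if c.critical and not answers.get(c.check_id, False)
    )
    if not failed:
        # Nothing critical failed, so there is nothing to force.
        return Decision(allowed=True, forced=False, failed_critical=())
    # A failed critical check blocks unless the learner explicitly forces it.
    return Decision(allowed=force, forced=force, failed_critical=failed)
```

Reflective checks never block in this sketch; a "no" on F3 or F4 still lets the submission through, consistent with only critical checks being described as blocking.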

Skill assertions on offer

On a passing review the mentor selects a subset of these to assert, with an asserted weight bounded by the per-skill ceiling shown below.

  • llm.ops.system-design

    LLM ops — system design

    max weight 1.00
  • llm.evals.dataset-design

    LLM evals — dataset design

    max weight 1.00
  • llm.safety.security-review

    LLM safety — security review

    max weight 0.90
  • llm.api.production-readiness

    LLM API — production readiness

    max weight 1.00
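
As a rough illustration of the per-skill ceiling, the check below validates a mentor's chosen subset against the maximum weights listed above. `validate_assertions` is a hypothetical helper; rejecting an over-ceiling weight (rather than clamping it) and using 0 as the lower bound are both assumptions.

```python
# Ceilings copied from the list of skill assertions on offer.
SKILL_CEILINGS = {
    "llm.ops.system-design": 1.00,
    "llm.evals.dataset-design": 1.00,
    "llm.safety.security-review": 0.90,
    "llm.api.production-readiness": 1.00,
}


def validate_assertions(asserted: dict[str, float]) -> dict[str, float]:
    """Check a subset of skill assertions against the per-skill ceilings."""
    for skill, weight in asserted.items():
        if skill not in SKILL_CEILINGS:
            raise KeyError(f"{skill} is not on offer for this capstone")
        ceiling = SKILL_CEILINGS[skill]
        # Assumption: weights below 0 or above the ceiling are rejected.
        if not 0.0 <= weight <= ceiling:
            raise ValueError(f"{skill}: weight {weight} outside [0, {ceiling}]")
    return dict(asserted)
```

A mentor asserting only `llm.safety.security-review` at 0.90 would pass this check, while 0.95 for the same skill would be rejected because it exceeds that skill's ceiling.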

Mentor SLA

168h

From mentor claim to signoff.

Pass threshold

75%

Weighted-average overall score.

Re-attempts

none

The capstone is the final exam.

Start this twin

The clock starts when you press start. Read the brief above first. You will be asked to sign in if you have not already.
