A mastery model that survived a real curriculum

01 — Context

A mathematics program had adopted a mastery framing on paper, but operated as a coverage checklist in practice. Teachers marked standards “done”; the system reported green; and nobody could say what a green cell actually entitled you to believe about a student.

The brief looked like a reporting problem. It was a measurement problem wearing a reporting costume.

02 — The real decision

The decision was not “which dashboard?” It was: what is the unit of mastery, and what evidence licenses the claim that a student holds it? Everything downstream — item design, reporting, intervention — is determined by that one answer.

A mastery model is a claim about a learner. If you can’t say what would make the claim false, you are not measuring — you are decorating.

03 — My role

I led the curricular strategy and owned the measurement logic end to end — defining the constructs, designing and validating items, and specifying how the model would be read by teachers and rendered by the product. I worked between the classroom, the psychometrics, and the engineering, and translated in all three directions.

04 — Constraints

Minutes, not hours: Teachers had to interpret a result at a glance.
Finite item volume: We could not test every sub-skill directly.
Audit-ready: Reporting had to survive a skeptical head of department.

05 — The logic used

We modeled each mastery target as a latent construct with an explicit evidence model. We calibrated it with Item Response Theory, so that difficulty and discrimination were properties of items, not opinions. Where targets were sequential, Bayesian Knowledge Tracing carried belief forward instead of resetting it at every assessment.

construct        → evidence model → item bank
response         → IRT calibration → ability estimate
prior × evidence → BKT posterior   → mastery claim
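The chain above can be sketched in Python. This is a minimal illustration under stated assumptions, not the production model. It assumes a two-parameter logistic (2PL) IRT model, an expected-a-posteriori (EAP) ability estimate under a standard normal prior, and the standard four-parameter BKT update. The function names and every parameter value (slip, guess, transit, item calibrations) are hypothetical.

```python
import math
from typing import Sequence, Tuple

Item = Tuple[float, float]  # (discrimination a, difficulty b); hypothetical calibration values


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability of a correct answer at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_ability(responses: Sequence[int], items: Sequence[Item], n_points: int = 161) -> float:
    """Expected-a-posteriori ability over a grid on [-4, 4] with an assumed N(0, 1) prior."""
    grid = [-4.0 + 8.0 * i / (n_points - 1) for i in range(n_points)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised standard normal prior
        for x, (a, b) in zip(responses, items):
            p = p_correct_2pl(theta, a, b)
            w *= p if x == 1 else 1.0 - p  # multiply in the likelihood of each response
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def bkt_update(p_know: float, correct: bool, p_slip: float = 0.1,
               p_guess: float = 0.2, p_transit: float = 0.15) -> float:
    """One BKT step: condition P(known) on the response (prior x evidence), then allow for learning."""
    if correct:
        evidence_known = p_know * (1.0 - p_slip)
        evidence_total = evidence_known + (1.0 - p_know) * p_guess
    else:
        evidence_known = p_know * p_slip
        evidence_total = evidence_known + (1.0 - p_know) * (1.0 - p_guess)
    posterior = evidence_known / evidence_total
    return posterior + (1.0 - posterior) * p_transit


if __name__ == "__main__":
    items = [(1.2, -1.0), (1.0, 0.0), (1.5, 0.5), (0.8, 1.0)]  # illustrative item bank
    print(f"ability estimate: {eap_ability([1, 1, 1, 0], items):.2f}")

    belief = 0.3  # hypothetical prior P(known) carried in from an earlier target
    for response in [True, True, False, True]:
        belief = bkt_update(belief, response)
    print(f"P(known) after four responses: {belief:.2f}")
```

The design point is the last row of the diagram. BKT does not reset: each call takes the previous belief as its prior, so one wrong answer lowers the mastery claim without erasing the evidence that came before it.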

06 — Alternatives considered

A raw percent-correct cutoff was simplest but conflated easy and hard evidence. A pure machine-learned classifier predicted well but couldn’t be explained to a teacher or defended in an audit. We chose the model we could argue for, accepting a small cost in raw fit for a large gain in legibility and accountability.
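Why a percent-correct cutoff conflates easy and hard evidence can be shown in a few lines. This sketch assumes a 2PL item response function. The two forms, their item parameters, and the 75% cutoff are hypothetical values chosen only for illustration. One student of fixed ability sits each form.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """Assumed 2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_percent_correct(theta, items):
    """Expected proportion correct for a student of ability theta on a given form."""
    return sum(p_correct(theta, a, b) for a, b in items) / len(items)


CUTOFF = 0.75                  # hypothetical percent-correct mastery threshold
easy_form = [(1.0, -1.5)] * 4  # (discrimination, difficulty); illustrative values
hard_form = [(1.0, 1.5)] * 4
theta = 0.0                    # one student, one ability

for name, form in [("easy", easy_form), ("hard", hard_form)]:
    pc = expected_percent_correct(theta, form)
    print(f"{name}: expected {pc:.0%}, mastered={pc >= CUTOFF}")
```

Run as written, this prints `easy: expected 82%, mastered=True` and `hard: expected 18%, mastered=False`. The same student clears the cutoff on one form and fails it on the other. IRT avoids this by placing ability and item difficulty on one scale, so which items a student happened to receive no longer decides the verdict.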

07 — The system designed

The output was not a dashboard but a small, honest object: a mastery claim, the evidence behind it, and a stated confidence — designed so a teacher could disagree with it intelligently.

Signature module: The mastery claim, dissected

This student can model a linear relationship from a table — not just complete the worksheet that contained one.

Claim: States the construct, the conditions, and the “again” (evidence that the performance recurs beyond the original worksheet).
Evidence: Four items across two difficulties, plus one transfer task.
Confidence: High, because it was checked against held-out evidence rather than self-confirmed.

[ Abstracted mastery-claim card ]

Fig. 1 — Reconstructed from the production card; axes relabeled, data synthetic.
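The claim card's three parts (claim, evidence, confidence) map naturally onto a small data structure. This is a hypothetical sketch, not the production schema. The class and field names, the "core"/"stretch" band labels, the conditions text, and the coverage rule in `is_well_evidenced` are all assumptions. The rule mirrors the card's "four items across two difficulties, plus one transfer task".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Evidence:
    item_id: str          # hypothetical item identifier
    difficulty_band: str  # e.g. "core" or "stretch" (assumed labels)
    is_transfer: bool     # True for a task set in an unfamiliar context
    correct: bool


@dataclass
class MasteryClaim:
    construct: str
    conditions: str
    evidence: List[Evidence] = field(default_factory=list)
    confidence: str = "unrated"
    confidence_basis: str = "unstated"  # e.g. "held-out" vs "self-confirmed"

    def is_well_evidenced(self, min_items: int = 4, min_bands: int = 2) -> bool:
        """Hypothetical coverage rule: enough non-transfer items across difficulty bands,
        plus at least one correct transfer task. Whether the responses support mastery
        is left to the ability model; this only checks that the evidence is adequate."""
        core = [e for e in self.evidence if not e.is_transfer]
        bands = {e.difficulty_band for e in core}
        transfer_ok = any(e.is_transfer and e.correct for e in self.evidence)
        return len(core) >= min_items and len(bands) >= min_bands and transfer_ok


claim = MasteryClaim(
    construct="Model a linear relationship from a table",
    conditions="Unfamiliar table, no worked example on the page",  # illustrative
    evidence=[
        Evidence("q1", "core", False, True),
        Evidence("q2", "core", False, True),
        Evidence("q3", "stretch", False, True),
        Evidence("q4", "stretch", False, True),
        Evidence("t1", "stretch", True, True),
    ],
    confidence="high",
    confidence_basis="held-out",
)
print(claim.is_well_evidenced())  # True
```

Keeping `confidence_basis` as an explicit field lets a teacher see why the confidence is stated, which is what allows the claim to be disagreed with intelligently.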

08 — Abstracted artifacts

[ Construct map ]

[ Item-validation report ]

09 — Validation & quality criteria

Items had to pass fit statistics and were reviewed for construct relevance, not just for difficulty.
Mastery claims were checked against held-out performance, not against themselves.
A claim a teacher couldn’t act on was treated as a defect, not a feature.
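The first two criteria can be sketched in Python. The source does not name its fit statistics. This sketch assumes outfit mean-square, a standard residual-based fit statistic from Rasch measurement, computed here under an assumed 2PL model. It also assumes a 0.7 to 1.3 acceptance band, which is a commonly cited rule of thumb. The held-out check is a plain agreement rate. All function names and data are hypothetical.

```python
import math
from typing import Sequence, Tuple

FIT_RANGE: Tuple[float, float] = (0.7, 1.3)  # assumed rule-of-thumb band for mean-square fit


def p_correct(theta: float, a: float, b: float) -> float:
    """Assumed 2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def outfit_mnsq(responses: Sequence[int], thetas: Sequence[float], a: float, b: float) -> float:
    """Outfit mean-square for one item: the average squared standardized residual.

    Near 1.0, responses are about as noisy as the model predicts; well above 1.0,
    the item behaves erratically; well below, it is more predictable than expected.
    """
    total = 0.0
    for x, theta in zip(responses, thetas):
        p = p_correct(theta, a, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)


def item_fits(mnsq: float, fit_range: Tuple[float, float] = FIT_RANGE) -> bool:
    low, high = fit_range
    return low <= mnsq <= high


def held_out_agreement(claims: Sequence[bool], held_out_correct: Sequence[bool]) -> float:
    """Share of students whose mastery claim matches a held-out outcome.

    The held-out outcome must come from evidence NOT used to make the claim;
    scoring a claim against its own evidence would be self-confirmation.
    """
    matches = sum(1 for claim, outcome in zip(claims, held_out_correct) if claim == outcome)
    return matches / len(claims)


if __name__ == "__main__":
    thetas = [-1.0, 0.0, 1.0, 2.0]  # hypothetical ability estimates for four students
    print(outfit_mnsq([0, 1, 0, 1], thetas, a=1.0, b=0.0))  # noise the model can absorb
    print(outfit_mnsq([1, 1, 0, 0], thetas, a=1.0, b=0.0))  # weaker students succeed: misfit
```

With four students the numbers are only illustrative. Real calibration uses far more responses per item before a fit statistic means anything.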

10 — Reflections

The hardest work was deciding what not to measure. A smaller set of well-evidenced claims beat a complete map of guesses — and it is the part that transfers to every measurement problem I have touched since.