Knowledge tracing without the black box

01 — Context

An adaptive-practice product used a knowledge-tracing model to decide what each student saw next. It worked — but no one outside the data team could say why a student was shown a particular problem, and teachers had quietly stopped trusting the recommendation.

A model nobody can question is a model nobody can correct.

02 — The real decision

The question was not “is the model accurate?” It was: can a teacher and an engineer hold the same mental picture of what the model believes, and act on it? Accuracy that can’t be inspected buys very little in a classroom.

A recommendation a teacher can’t interrogate isn’t personalization. It’s a slot machine with a progress bar.

03 — My role

I owned the translation layer: turning the four Bayesian Knowledge Tracing (BKT) parameters (prior knowledge, learn rate, slip, and guess) into language a teacher could reason with, and into a set of guardrails an engineer could enforce. I did not rebuild the model; I made its beliefs legible.
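The guardrail half of that translation can be sketched as a check on the fitted parameters. This is a minimal sketch, not the production code. `BKTParams` and `guardrail_violations` are hypothetical names. Apart from range checks, the only rule is a standard property of BKT: once slip + guess reaches 1, conditioning on a correct answer no longer raises the mastery estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    """Hypothetical container for the four standard BKT parameters of one skill."""
    prior: float  # P(L0): probability the skill is known before any evidence
    learn: float  # T: probability of moving from unknown to known per opportunity
    slip: float   # S: probability of a wrong answer despite knowing the skill
    guess: float  # G: probability of a right answer without knowing the skill


def guardrail_violations(p: BKTParams) -> list[str]:
    """Return readable problems with a parameter set; an empty list means it passes."""
    problems = []
    for name in ("prior", "learn", "slip", "guess"):
        value = getattr(p, name)
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is not a probability")
    # Conditioning on a correct answer raises P(known) only when (1 - S) > G,
    # i.e. when slip + guess < 1. Otherwise the evidence points the wrong way.
    if p.slip + p.guess >= 1.0:
        problems.append("slip + guess >= 1: a correct answer is not evidence of mastery")
    return problems
```

A check like this only reads the fitted values, which suits the no-retraining constraint: a skill that fails would be flagged for review, not refit.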

04 — Constraints

No retraining: The change had to wrap the existing model, not replace it.
Glanceable: A teacher needed the gist in seconds, the detail on demand.
Honest about doubt: Low-confidence beliefs had to look low-confidence.

05 — The logic used

We exposed the four BKT parameters as a small, named story per skill: what we assumed coming in, how fast this student tends to learn it, and how noisy the evidence is. Slip and guess stopped being hidden knobs and became a stated reason a green cell might still be wrong.

prior knowledge  → "where we started believing"
learn rate       → "how fast this clicks for them"
slip / guess     → "how noisy the evidence is"
posterior        → "what we believe now, and how sure"
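For reference, the posterior row follows the classic BKT update. It is written here in the usual notation: $P(L_0)$ is the prior, $T$ the learn rate, $S$ slip and $G$ guess, and there is no forgetting. Each answer first conditions the belief on the evidence, then the learn rate moves it toward mastery:

```latex
P(L_t \mid \text{correct}) = \frac{P(L_t)\,(1 - S)}{P(L_t)\,(1 - S) + \bigl(1 - P(L_t)\bigr)\,G}

P(L_t \mid \text{incorrect}) = \frac{P(L_t)\,S}{P(L_t)\,S + \bigl(1 - P(L_t)\bigr)\,(1 - G)}

P(L_{t+1}) = P(L_t \mid \text{obs}_t) + \bigl(1 - P(L_t \mid \text{obs}_t)\bigr)\,T
```

Slip and guess enter only through the evidence step. That is why they read naturally as "how noisy the evidence is", while the learn rate alone governs "how fast this clicks".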

06 — Alternatives considered

We could have shown a single mastery percentage and hidden the machinery. It tested well in demos and badly in classrooms: teachers either over-trusted it or ignored it entirely. Exposing the uncertainty cost us a cleaner-looking UI and bought us a model teachers would actually argue with.

07 — The system designed

Signature module: Two readings of one student

Same answers, two stories: a percent-correct score and the BKT posterior disagree about what this student knows.

Percent: 60% correct — flattens easy and hard into one number.
Posterior: Likely mastered; two misses were low-confidence slips.
Why it matters: The percent would re-teach what the student already holds.
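Here is a minimal sketch of how the two readings can diverge on the same answers. It assumes the classic BKT update (no forgetting), and like Fig. 1 it uses synthetic parameters and a synthetic five-answer sequence with three correct answers. `bkt_step` and `two_readings` are hypothetical names, and the sketch does not reproduce the production panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    prior: float  # P(L0): belief before any answers
    learn: float  # T: chance of learning after each opportunity
    slip: float   # S: chance of a miss despite knowing the skill
    guess: float  # G: chance of a hit without knowing the skill


def bkt_step(p_known: float, correct: bool, params: BKTParams) -> float:
    """One classic BKT step: condition on the answer, then apply the learn transition."""
    if correct:
        evidence = p_known * (1 - params.slip)
        posterior = evidence / (evidence + (1 - p_known) * params.guess)
    else:
        evidence = p_known * params.slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - params.guess))
    # Unknown can become known after this opportunity; classic BKT has no forgetting.
    return posterior + (1 - posterior) * params.learn


def two_readings(answers: list[int], params: BKTParams) -> tuple[float, float]:
    """Return (percent correct, BKT belief after the last answer) for one skill."""
    percent = sum(answers) / len(answers)
    p_known = params.prior
    for answer in answers:
        p_known = bkt_step(p_known, bool(answer), params)
    return percent, p_known


if __name__ == "__main__":
    # Synthetic parameters and answers (1 = correct), chosen only for illustration.
    params = BKTParams(prior=0.5, learn=0.2, slip=0.2, guess=0.2)
    percent, belief = two_readings([0, 1, 0, 1, 1], params)
    print(f"percent correct: {percent:.0%}, BKT belief: {belief:.2f}")
```

With these synthetic numbers the answers score 60% correct, but the belief ends above 0.9. The misses are partly attributed to slips, and the running belief lets the later correct answers outweigh the earlier misses. Other parameters can narrow or reverse the gap, so the sketch shows the mechanism, not production figures.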

[ Abstracted skill-belief panel ]

Fig. 1 — Reconstructed from the production panel; parameters and learner data are synthetic.

08 — Validation & quality criteria

Every recommendation could be traced to a stated belief and its confidence.
Teachers in review sessions could correctly predict the next item the model would choose — the test that it had become legible.
A confidently-wrong belief was logged as a defect, not smoothed over.
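One way to make the next choice predictable and traceable is a deterministic, stated rule that returns its reason with its pick. The rule here practises the lowest-belief skill below a mastery threshold, and it comes with a matching defect check. This is a minimal sketch built on hypothetical rules: the 0.95 threshold, the two-miss defect rule, and all names are assumptions, not the production logic, which the text does not specify. It picks a skill; choosing the item within that skill is out of scope.

```python
from dataclasses import dataclass
from typing import Optional

MASTERY_THRESHOLD = 0.95  # hypothetical cutoff, not taken from the production system


@dataclass(frozen=True)
class Recommendation:
    skill: str
    belief: float
    reason: str  # the stated belief a teacher can check against the panel


def recommend_next(beliefs: dict[str, float]) -> Optional[Recommendation]:
    """Hypothetical policy: practise the least-mastered skill still below threshold."""
    open_skills = {s: b for s, b in beliefs.items() if b < MASTERY_THRESHOLD}
    if not open_skills:
        return None  # everything is believed mastered; nothing to recommend
    skill = min(open_skills, key=open_skills.get)
    belief = open_skills[skill]
    reason = f"P(knows {skill}) = {belief:.2f}, below {MASTERY_THRESHOLD}"
    return Recommendation(skill, belief, reason)


def is_confidently_wrong(belief: float, recent_answers: list[int], misses: int = 2) -> bool:
    """Hypothetical defect rule: belief says mastered, yet the last `misses` answers were all wrong."""
    tail = recent_answers[-misses:]
    return belief >= MASTERY_THRESHOLD and len(tail) == misses and not any(tail)
```

Because the rule is fixed and its reason is spelled out, a teacher who can read the beliefs can predict the pick, and a flagged belief becomes a record to log rather than a number to smooth over.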

09 — Reflections

The win was not a better model; it was a model that earned the right to be questioned. Legibility is not a UI veneer over the math — it is a constraint you design the math to satisfy.