Picture a teacher on a Tuesday morning at 7:45 AM: coffee cooling, dashboard glowing. Each of her students has been reduced, on the screen in front of her, to a single row of colored bars. Behind those bars, hidden from her view but very much running, lives some of the most sophisticated machinery in educational research. For two decades the field — Carnegie Mellon's PSLC DataShop, Stanford's open EdNet replications, the ASSISTments trials — has been quietly answering one question with extraordinary precision: how likely is this student to get the next problem right? Newer architectures (DKT, SAKT, AKT) push the predictive AUC higher with deep nets, but they have not changed the shape of the question. A stream of correct-or-incorrect bits flows in. A probability of the next bit flows out. To see what that means (and what it cannot see) we need to meet the two models quietly running inside almost every production mastery system: Bayesian Knowledge Tracing and Learning Factors Analysis.
In 1995, Albert Corbett and John Anderson at Carnegie Mellon proposed something elegant. Picture each skill in a child's mind as a single hidden coin, heads if she knows it, tails if she does not, and picture that you can never look at the coin directly. All you can do is watch her answers and, in good Bayesian fashion, update your belief about the coin after each one. A correct answer makes knows it more probable, but you discount that evidence by the chance she just guessed. A wrong answer makes doesn't more probable, but you discount it by the chance she knew it and slipped. Then comes the move that gives BKT its quiet beauty: you assume that the very act of attempting nudges the coin a little toward knowing, even when the attempt missed. Four numbers do all the work.
Eleven years later, in 2006, Hao Cen, Ken Koedinger, and Brian Junker took a different angle. Drop the hidden coin entirely, they said; just watch how the probability of a right answer rises sigmoidally as the student accumulates practice on a skill. Three parameters: the student's overall strength θ, the skill's intrinsic difficulty β, and a learning rate γ that controls how steeply the curve climbs and you have a model whose γ answers the one question every teacher actually asks: how many more attempts until she has it? The recent deep replacements (DKT, SAKT, AKT) swap the closed-form posterior for an RNN or transformer head, but the answer they hand back to the teacher has the same shape.
One coin per skill, flipped every attempt.
One sigmoid per skill, fit on the cohort.
Both models are honest, fast, and, for the question they ask, astonishingly good. They are also, philosophically, doing the same strange thing: taking the rich, branching, sometimes wandering texture of a child's reasoning and crushing it down to a single bit per attempt, then inferring a probability over a single hidden scalar. Two students who arrive at the same wrong answer through wildly different roads of reasoning are, to the model, the same student. The interior life of the thought has been thrown away. In 1998, two philosophers, Andy Clark and David Chalmers, asked a question that turns out to matter here: where, exactly, does the mind end? Their answer, the Extended Mind hypothesis, was that a student's notebook is not a record of her cognition but a part of it, functionally on equal footing with what is in her head. If that is right (and we believe it is), then the right object to compute on is not the attempt stream alone. It is the whole externalized graph of her thinking: her freewrites, her zettels, the canvas she is drawing on.
What does that graph actually carry? Let me introduce you to the cognitive move that, more than any other, separates a mature mathematical mind from a beginner's. It has no glamorous name; we will call it transposition: the act of looking at a problem in one representational space and recognizing it, suddenly, as a problem in another, where the math becomes tractable. The history of mathematics is, more than anything else, a sequence of great transpositions. Three of them carry most of the modern enterprise, and almost every K-12 graduate has met their consequences without ever being shown the move itself.
Each row is one mature mathematician's move: see this problem in a different space, where the math is easier.
Descartes, 1637. Every geometric incidence becomes a polynomial equation. The student who owns φ_Cartesian sees a circle and the equation as two views of the same object.
A novice reader sees a sequence of scenes. A mature reader sees the same scenes as evidence of who wants what — and the love triangle becomes the engine of the whole plot.
n₁+n₂=5, n₁·n₂=6
A polynomial-factoring problem becomes a tiny arithmetic puzzle. Find two integers that multiply to 6 and add to 5 — the factors (x+2)(x+3) fall out.
How do we make this precise? Think of each subject domain, algebra, geometry, the analysis of a novel, as a small language. It has nouns (its objects and concepts) and grammar (the relations and operations that hold its nouns together). A transposition is a faithful translation between two such languages, in this strict sense: not only does every noun on one side get a translation on the other, but every grammatical relation does too. "Parallel" on the geometry side translates to "same slope" on the algebra side and that means a sentence like "these two lines are parallel" can be re-asked as "these two equations have the same slope" without losing its meaning. When such a translation exists between two languages, the same problem can be posed in either one, and the right side of the translation is the side where the problem becomes easier. We will write this faithful translation as φ : P₁ → P₂.
To see this with full force, open The Great Gatsby. On the surface of the page, you are reading a sequence of scenes: a man named Nick narrates his summer in West Egg; he attends a party thrown by a mysterious millionaire named Gatsby; he watches Gatsby stand alone at the end of a dock, staring across a bay at a green light. A novice reader stays on that surface — the green light is a green light, the parties are parties, the events are the events. A mature reader, almost without noticing, performs a transposition. She moves out of the narrative language and into the language of characters and desires, where the people are not merely doing things but wanting things, where their wants press against one another, and where what looks on the page like "Gatsby threw a party" is really "Gatsby threw a party as a beacon for Daisy." She moves into a language of symbols, too, where the green light is no longer a light but a hope: the unreachable past Gatsby is trying to row backward into. The whole novel reorganizes itself: scenes become consequences of desires; objects become images of those desires; a love triangle becomes the engine of the plot. The same words on the page, faithfully translated into an entirely different language, in which the deeper question: what is this book about? finally becomes tractable. This is the move every English teacher in the country is trying, with varying success, to coach.
A sharper, smaller example — straight from algebra class. Suppose I ask you to factor x² + 5x + 6. There are several ways in. You could reach for the quadratic formula and crank out the discriminant, completely valid, but algorithmic overkill for an expression this clean. You could complete the square, elegant, but heavier than the problem demands. Or you could apply the transposition that François Viète laid down in the late sixteenth century and that we now teach (when we teach it at all) as Vieta's relations: if x² + bx + c factors as (x + n₁)(x + n₂), then it must be that n₁ + n₂ = b and n₁ · n₂ = c. With that one observation, the problem has changed languages. It is no longer about manipulating a polynomial. It is a tiny arithmetic puzzle: find two numbers that multiply to 6 and add to 5. The student's mind, almost without noticing, runs through pairs: (1, 6) sums to 7, no; (2, 3) sums to 5, yes, and the factorization (x + 2)(x + 3) falls out. The whole sub-skill chain (identify a, b, c; recall Vieta; search the integer pairs with the right product; check the sum; write the factored form) is a small thought-program. The transposition φ_Vieta is the move that makes the whole program cheap.
This, finally, is the structure we model. A student's thought-program, as we render it on the Notebook canvas, is a typed sub-graph parsed live from her freewrite by a Cerebras gpt-oss-120b loop. Each vertex is a labeled cognitive move (recall, compute, doubt, transpose, sum); each edge is a dependency between moves; and the catalog of transpositions Φ = { φ_Cartesian, φ_Character, φ_Vieta, … } is fixed across subjects, because the transpositions are. Where BKT puts a posterior on a hidden binary scalar, Luna puts a posterior on the catalog of transpositions a student carries. The Bayesian move is exactly the one Corbett and Anderson made in 1995. Only the latent variable has gone from a single number to a structured library of mental transformations.
How do you actually compute that likelihood? You match graphs. We embed both the student's parsed thought-graph and the canonical pattern of each transposition with a small graph neural network operating over the typed-edge structure; cosine similarity in that embedding is the soft form of the match. Her zettelkasten enters the model with the same weight as a freewrite, Clark and Chalmers' extended-mind hypothesis made literal in software. A [[wikilink]] term in a passage snaps into her personal graph and counts as evidence on equal footing with anything we infer from her writing. The teacher does not see a percentage. She sees a sentence: Maya owns φ_Cartesian and φ_Character fluently; she reached for φ_Vieta on Tuesday and abandoned it before checking the second integer pair; the next problem should be one that pushes the abandonment toward a successful match.
THE move. By Vieta's relations, if x²+bx+c = (x+n₁)(x+n₂) then n₁+n₂ = b and n₁·n₂ = c. The factoring problem has just become a pair-finding problem in two unknowns. The Cerebras loop tags this edge "φ_Vieta" in her thought-graph.
- recall— pull from owned model
- compute— evaluate piece
- doubt— flagged stuck
- transpose— change problem space
- sum— combine / answer
Maya executed φ_Vieta. Posterior over her transposition catalog updates: +0.21 on Vieta. Suggest next: a quadratic with leading coefficient ≠ 1 (e.g. 2x² + 7x + 3) to test whether she can extend Vieta to non-monic forms.
Jean Piaget, watching his own children outgrow one schema after another in his Geneva apartment, called this rhythm equilibration. It is the breathing of the mind between assimilation: fitting a new problem into a schema you already have, and accommodation or stretching the schema to absorb a problem that does not quite fit. The thought-canvas externalizes that breath. The student watches herself reach for a transposition, notice it doesn't quite hold, and stretch the schema; she watches herself, in a sense, learn. It is the closest thing computer-assisted education has come, so far, to a meditative practice, and we mean that precisely, not metaphorically. The unit of mastery, in this picture, is no longer a probability over a hidden bit. It is the catalog of mental moves a student can make, the graph she is editing in front of her own eyes, and the inventory of transpositions she has, at last, taken into her own ownership. That is the unit BKT and LFA could not see. It is the unit the bit could not carry. It is the one Luna is built to model.
For more on how this works, or to try Luna out yourself (High School Pre-Algebra, Algebra 1, English Language Arts, and Chemistry are now available for free for all learners) check us out here: lunalearn.io