The Body Knows What Words Cannot Say
Why AI Mastered Language Before It Could Move — and What That Tells Us About Intelligence Itself
Every child arrives at walking before they arrive at words. Somewhere around their first birthday, long before they can name the thing they are doing, they develop one of the most complex coordinated behaviours in the animal kingdom: upright bipedal locomotion on a shifting, unpredictable surface. They do this without instruction manuals, without labelled datasets, without being told what a "hip hinge" is. The knowledge is in the body first.
Artificial intelligence inverted this sequence entirely. Within the span of a few years, AI systems learned to hold a conversation, write poetry, and pass medical licensing exams. Fluid, graceful, contextually intelligent movement — the kind a trained dancer, a martial artist, or an experienced physiotherapist takes for granted — remains, by contrast, a hard and largely unsolved problem. The same field that produced engines capable of beating the world chess champion still builds robots that stumble on an uneven footpath. An AI trained on thousands of hours of yoga footage cannot tell you what a practitioner is actually doing with their attention.
This inversion is not a quirk of engineering timelines. It reveals something deep about the nature of intelligence itself.
The Two Kinds of Knowing
In the 1950s, the philosopher and scientist Michael Polanyi noticed something that should have been obvious but wasn't: we know far more than we can tell. He called what we can articulate explicit knowledge — the kind that lives in textbooks, instructions, and propositions. But beneath it lies a vast, mostly silent substrate he named tacit knowledge: the feel of a bicycle finding balance, the surgeon's sense of how much pressure a tissue will bear, the way a skilled potter reads clay through their palms.
Polanyi's famous example was face recognition. You can identify a friend in a crowd instantly and reliably. You almost certainly cannot describe how you do it in enough detail for someone else to replicate the feat. The knowledge is real, functional, and entirely resistant to verbalization.
Somatic practice — the family of disciplines that includes dance, martial arts, embodied yoga, Feldenkrais work, Continuum, and movement-based psychotherapy — is essentially the deliberate cultivation of tacit knowledge. Practitioners spend years learning to perceive and modulate phenomena that most people never notice: the micro-adjustments of weight distribution before a step is taken, the difference between initiated and responsive movement, the felt sense of a breath that organizes the whole spine. This is sophisticated knowledge. It is also, almost by definition, pre-linguistic.
Which means it is almost invisible to the methods AI currently uses to learn.
What Motion Capture Actually Captures
Contemporary AI learns about movement primarily through two pathways: video and motion capture. Both are genuinely impressive technologies. Motion capture systems can record the three-dimensional position of dozens of skeletal landmarks at hundreds of frames per second. Video analysis can now track pose and velocity with remarkable accuracy.
What neither technology captures is the interior of the movement.
Somatic practitioners routinely distinguish between movements that look identical from the outside but feel entirely different from the inside — and that produce different effects in the body of the practitioner, in the quality of touch transmitted to another person, and in the nervous system's response. A Feldenkrais practitioner adjusting someone's shoulder is doing something categorically different from a physiotherapist performing the same gesture, even if the joint angles are identical. A tai chi master's step has qualities — a particular organization of intention, weight, and attention — that cannot be recovered from the path of any skeletal marker.
Motion capture gives us the shape of movement. It misses the texture, the quality, the felt intention. This is not a gap that better hardware solves. It is an ontological problem: the data being recorded is simply a different thing from the phenomenon being studied.
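The point can be made concrete. Below is a minimal sketch of what a marker-based capture frame actually records; the structure and names are illustrative inventions, not any vendor's real format. Two gestures with different interiors can produce records the data cannot tell apart:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MocapFrame:
    """One sample from a hypothetical marker-based capture session."""
    timestamp_s: float
    # marker name -> (x, y, z) position in metres; this geometry is
    # the entirety of what the record contains
    positions: dict = field(default_factory=dict)


# Two "identical" gestures: a Feldenkrais adjustment and a physiotherapy
# manipulation that happen to pass through the same joint configuration.
feldenkrais_touch = MocapFrame(0.0, {"shoulder": (0.42, 1.38, 0.11)})
physio_touch = MocapFrame(0.0, {"shoulder": (0.42, 1.38, 0.11)})

# Everything the record can distinguish, it has distinguished.
print(feldenkrais_touch == physio_touch)  # True: the data cannot tell them apart
```

Nothing in the frame — and nothing derivable from any sequence of such frames — encodes intention, attention, or felt quality, which is the sense in which the gap is ontological rather than a matter of resolution.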
Why More Data Makes the Problem Worse
The standard response to AI's limitations is, predictably, more data. More video. More annotations. Bigger training sets. For many domains, this works. Language models genuinely improved as they consumed more text.
For embodied knowledge, the scaling logic breaks down — and breaks down in an interesting way. The problem is not quantity. It is the assumption that movement knowledge can be adequately represented in any external observational record.
Consider what annotation requires. A human expert watches a video and labels what they see. But the categories available to that expert are linguistic, and therefore already a translation. When a dance teacher says a student's movement "lacks groundedness," they are not describing a geometric property. They are pointing toward a quality that they recognize through their own bodily experience — a proprioceptive and kinesthetic resonance that the word "groundedness" gestures at rather than captures. The label is a lossy compression of the knowledge.
Scale up this process across thousands of hours of footage and millions of labels, and you have not reduced the translation problem. You have industrialized it. The resulting dataset is rich in descriptions of movement. It contains almost nothing of movement's actual intelligence.
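The compression can be made literal. Here is a toy annotation function — its inputs, thresholds, and three-word vocabulary are all invented for illustration — that maps a many-dimensional felt state onto a single linguistic label. Note that it discards most of what it is given, and that nothing in its output allows the original state to be recovered:

```python
def annotate(weight_shift_mm: float, breath_depth: float, attention: str) -> str:
    """Toy annotator: collapses a many-dimensional felt state into one word.

    The `attention` input is ignored entirely -- the schema has no
    category for it, so the information is simply lost.
    """
    if weight_shift_mm < 5.0 and breath_depth > 0.7:
        return "grounded"
    if breath_depth < 0.3:
        return "collapsed"
    return "floating"


# Two quite different interior states receive the same word:
a = annotate(weight_shift_mm=2.0, breath_depth=0.9, attention="peripheral")
b = annotate(weight_shift_mm=4.0, breath_depth=0.8, attention="narrow")
print(a, b)  # grounded grounded
```

However many examples are annotated this way, the mapping remains many-to-one: scaling the dataset multiplies the labels without ever making the compression invertible.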
This is compounded by cultural depth. Somatic traditions are not neutral. They carry epistemologies, lineages, healing frameworks, and ways of attending to the body that are specific to place, time, and community. A movement that is therapeutic in one tradition may be contraindicated in another not because the biomechanics differ but because the context — the meaning, the relational field, the practitioner's state — is doing the work. No annotation schema currently accounts for this.
What a Genuinely Movement-Intelligent AI Would Need
For AI to genuinely engage with somatic knowledge, the field would need to rethink what it is trying to capture. This means moving beyond performance data toward process data: the experience of movement as it unfolds. It means building methodologies that can hold qualitative, relational, and cultural information without collapsing it into fixed categories. It means treating practitioners not as sources of labelled examples but as co-investigators who bring irreplaceable knowledge.
It means, in short, taking seriously that the body is an organ of intelligence — not a vehicle that executes instructions issued by the brain, but a distributed knowing system whose sophistication took millions of years of evolution and, in the case of master practitioners, decades of disciplined attention to develop.
The child who learned to walk before they could speak had access to a pedagogical environment of extraordinary richness: a body with built-in sensorimotor feedback, a social world of demonstrating adults, a task with clear stakes. Current AI has access to none of these in any meaningful form.
The gap will not be closed by the next generation of cameras. It will require a genuinely different theory of what movement knowledge is — and a willingness to learn from the people who have spent their lives cultivating it.
Further Reading
Dreyfus, H. L. (1972). What computers can't do: A critique of artificial reason. Harper & Row.
Gallagher, S. (2005). How the body shapes the mind. Oxford University Press. https://doi.org/10.1093/0199271941.001.0001
Merleau-Ponty, M. (2012). Phenomenology of perception (D. A. Landes, Trans.). Routledge. (Original work published 1945)
Polanyi, M. (1966). The tacit dimension. Doubleday.
Sheets-Johnstone, M. (2011). The primacy of movement (2nd ed.). John Benjamins. https://doi.org/10.1075/aicr.82
Varela, F. J., Thompson, E., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. MIT Press.