Somatic-AI Content Platform

The Anticipatory Body: Predictive Coding in Nervous Systems and Generative AI as Convergent Architectures

Confidence Label: Plausible

Connection type: Structural homology between biological and computational anticipatory systems — convergent but not equivalent

The Synthesis

Somatic movement intelligence and generative AI are converging on the same architectural principle from opposite directions: prediction as the primary organising function of a movement system.

The convergence is not metaphorical. It reflects a deep structural homology between how the nervous system generates movement and how a new generation of generative video and motion models produces temporally coherent sequences. Both systems treat the current sensory or kinematic state not as a fixed input to be processed, but as an error signal relative to an internal prediction. Both systems improve by reducing that error. Both systems can be disrupted by inputs that violate the prediction in surprising ways.

Understanding this convergence is important for practitioners and researchers in somatic-AI co-creation because it reframes what AI motion generation is doing — and, more importantly, what it is not yet doing that somatic practice already does.

Predictive Coding in the Nervous System

The predictive coding framework in neuroscience traces back to Helmholtz's idea of perception as unconscious inference, was formalised for the visual cortex by Rao and Ballard (1999), and was later generalised by Karl Friston's free energy principle. It proposes that the nervous system is fundamentally a prediction machine. Rather than passively receiving sensory input and constructing a perception of the world, the brain continuously generates predictions about what it expects to sense. The sensory signal that actually arrives is compared against this prediction, and only the error (the difference between predicted and actual) propagates up the cortical hierarchy for further processing.
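
As a rough illustration of the compare-and-correct loop described above, the following sketch settles a single internal estimate against one sensory value. It is a toy under stated assumptions, not a model of cortex: the function name settle, the linear mapping weight, and the step size rate are all hypothetical choices made for illustration.

```python
def settle(observation, weight=2.0, steps=50, rate=0.1):
    """Refine one internal estimate until its prediction matches the input.

    Assumption: the internal model is a single linear mapping,
    prediction = weight * estimate. Real hierarchies stack many such levels.
    """
    estimate = 0.0
    errors = []
    for _ in range(steps):
        prediction = weight * estimate      # top-down prediction of the input
        error = observation - prediction    # only this residual is passed upward
        errors.append(error)
        # Gradient step on 0.5 * error**2 with respect to the estimate.
        estimate += rate * weight * error
    return estimate, errors


estimate, errors = settle(observation=3.0)
print(f"initial error: {errors[0]:.3f}, final error: {errors[-1]:.2e}")
```

The point of the sketch is that the quantity driving every update is the residual, not the raw observation: once the prediction matches the input, nothing further needs to be sent.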

For movement, this architecture has a specific implication, developed most fully in Friston's active inference account. Motor commands are not reactive; they are anticipatory. The motor cortex issues a prediction of what proprioceptive feedback should feel like as a result of the intended movement, before the movement happens. The actual movement then unfolds as a process of minimising the error between that predicted proprioceptive state and the incoming sensory signal. This is why skilled movements can outpace conscious reaction time: the movement is guided by a forward model that predicts its own consequences and adjusts before sensory feedback could arrive.

Crucially, this means that the nervous system's representation of movement is not primarily a record of what happened — it is a record of what was expected, plus the residual error. Movement skill, in this framework, is the precision of the prediction: a highly skilled mover has accurate, fine-grained forward models that generate almost no prediction error. A beginner generates large errors that require repeated correction.
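
The idea that skill is the precision of a forward model can be sketched with a one-parameter learner. This is a minimal illustration under loud assumptions: the "limb" is reduced to a single hypothetical gain that maps a motor command to proprioceptive feedback, and the learner uses a simple delta rule. The names practise, true_gain, and rate are invented for this example.

```python
def practise(true_gain=0.8, trials=30, rate=0.3):
    """Learn a forward model of a one-parameter 'limb' by trial and error."""
    commands = [0.5, 1.0, 1.5]   # fixed motor commands, cycled for repeatability
    estimated_gain = 0.0         # the novice's forward model knows nothing yet
    abs_errors = []
    for trial in range(trials):
        command = commands[trial % len(commands)]
        predicted = estimated_gain * command   # expected proprioceptive feedback
        actual = true_gain * command           # feedback the body actually returns
        error = actual - predicted             # prediction error for this trial
        abs_errors.append(abs(error))
        estimated_gain += rate * error * command   # delta-rule update
    return estimated_gain, abs_errors


gain, abs_errors = practise()
print(f"first-trial error: {abs_errors[0]:.3f}")
print(f"last-trial error:  {abs_errors[-1]:.6f}")
```

Early trials play the role of the beginner, with large errors that need correction; late trials play the role of the skilled mover, whose predictions leave almost no residual.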

Predictive Architectures in Generative AI

The parallel development in AI has proceeded largely independently. The Video Generation with Predictive Latents paper (Zhao et al., 2026; arXiv:2605.02134) introduces a training objective in which the decoder must not only reconstruct observed frames but simultaneously predict future frames that have not been seen. This predictive reconstruction objective pushes the model's latent representations to encode the causal structure of the sequence, that is, the rules that govern how the current state will evolve, rather than just the appearance of individual frames.

This is precisely the distinction the predictive coding framework draws in neuroscience: a system trained on reconstruction learns to compress what it has seen; a system trained on prediction learns the causal dynamics of what it is watching.
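
The difference between the two training signals can be made concrete with a schematic loss. This is an assumption-laden sketch, not the formulation used in the cited paper: frames are plain lists of numbers, both terms are mean squared errors, and the names predictive_reconstruction_loss and future_weight are hypothetical.

```python
def mse(predicted, target):
    """Mean squared error over two equally shaped lists of frames."""
    pairs = [(p, t) for pf, tf in zip(predicted, target) for p, t in zip(pf, tf)]
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


def predictive_reconstruction_loss(decoded_observed, observed,
                                   decoded_future, future, future_weight=1.0):
    """Reconstruction term on seen frames plus a prediction term on unseen ones."""
    reconstruction = mse(decoded_observed, observed)
    prediction = mse(decoded_future, future)
    return reconstruction + future_weight * prediction


# Toy sequence in which every value rises by 1 per frame.
observed = [[0.0, 1.0], [1.0, 2.0]]
future = [[2.0, 3.0]]

# Two hypothetical decoders that both reconstruct the observed frames perfectly.
copy_last_frame = [[1.0, 2.0]]   # repeats the last appearance it saw
follows_dynamics = [[2.0, 3.0]]  # continues the rule that generated the frames

loss_copy = predictive_reconstruction_loss(observed, observed, copy_last_frame, future)
loss_dynamics = predictive_reconstruction_loss(observed, observed, follows_dynamics, future)
print(loss_copy, loss_dynamics)
```

A reconstruction-only objective (future_weight set to 0) cannot tell these two decoders apart; the prediction term is what rewards the one that has captured how the sequence evolves.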

The implications compound: a generative model with predictive latents can, in principle, be conditioned on partial sequences and extrapolate them in causally coherent ways. It can respond to the beginning of a movement phrase and generate a continuation that respects the kinetic logic of that beginning, relying less on the most likely next frame taken in isolation and more on a learned forward model of the movement's physics and rhythm.

Where the Convergence Holds

The structural homology is genuine in three respects:

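Architecture: Both systems use layered hierarchies where higher levels encode more abstract, slower-changing aspects of the sequence (the movement phrase, the scene) and lower levels encode faster-changing, fine-grained aspects (the individual joint transition, the pixel value). In both, error signals drive what the layers learn: predictive coding passes prediction errors up the hierarchy and predictions down, while the AI model passes error gradients back through its layers during training, although its forward pass carries full activations rather than errors alone.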

Learning: Both systems improve by reducing prediction error. The nervous system reduces proprioceptive prediction error through motor learning; the AI model reduces reconstruction and prediction error through gradient descent. The objective is formally similar: minimise the divergence between the internal model's prediction and the observed data.

Anticipation: Both systems can generate movement in advance of sensory confirmation — the nervous system through motor commands that precede feedback, the AI model through latent extrapolation that generates future frames before they are "observed."
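
The formal similarity noted under Learning can be written out in simplified form. The notation here is an assumption introduced for illustration: s is the sensory signal, mu the internal estimate, g the generative mapping, Pi a fixed precision (inverse variance) matrix, x an observed frame, z a latent code, and theta the model parameters. The biological expression holds only under Gaussian assumptions, with prior terms dropped and precision treated as fixed.

```latex
% Biological side: free energy reduces to precision-weighted prediction error
\varepsilon = s - g(\mu), \qquad
F(\mu) \approx \tfrac{1}{2}\, \varepsilon^{\top} \Pi \, \varepsilon + \text{const}

% Model side: squared-error training loss
\mathcal{L}(\theta) = \tfrac{1}{2}\, \lVert x - g_{\theta}(z) \rVert^{2}
```

With Pi set to the identity, the two expressions share the same quadratic form. What differs is what is being adjusted (an estimate of the body's state versus the weights of a network) and on what timescale.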

Where the Convergence Breaks Down

The differences are as important as the similarities, and they are precisely where somatic practice adds what the AI architecture lacks.

The content of the prediction. Nervous systems make predictions about proprioceptive experience — the felt quality of the upcoming movement: where the weight will be, what the joint will feel like, how much effort will be required. AI models trained on video data make predictions about pixel configurations. The nervous system predicts the movement from the inside; the AI predicts its appearance from the outside.

Embodied error signals. In the nervous system, a prediction error is a felt discrepancy — the surprise of landing heavier than expected, the shock of a surface that gives more than anticipated. This felt error has motivational and attentional significance; it redirects the mover's attention and modifies future predictions. In AI systems, prediction error is a mathematical quantity in a loss function with no felt dimension.

Prior accumulation. The nervous system's forward models are built over years of embodied experience — every step taken, every fall recovered from, every partner's weight felt. An AI model trained on a video dataset has learned statistical patterns in recorded human movement, but it has no body with which to have had the experiences that generated those patterns. Its predictions are abstractions over others' embodied histories; the nervous system's predictions are distillations of its own.

The Implication for Somatic-AI Co-Creation

The convergence matters because it identifies the precise point where somatic intelligence and AI architecture can be connected rather than merely juxtaposed.

If AI motion systems are becoming prediction machines — and they are — then the most productive interface between a somatic practitioner and an AI system is not "the practitioner provides a pose for the AI to copy" but "the practitioner's forward model and the AI's forward model are placed in dialogue." The practitioner anticipates; the AI anticipates; the shared movement phrase emerges from the negotiation between those anticipations.

This is, structurally, what Contact Improvisation already is between two human movers: a dialogue between two nervous systems' forward models, each predicting what the other will do and adjusting in real time. The difference is that the AI's forward model is currently trained only on visual data — it predicts appearances, not felt states.

The research opportunity is to train AI forward models on signals that are closer to the nervous system's own: EMG pre-activation, accelerometry, proprioceptive signals that carry information about the movement's felt quality and intention rather than just its visual output. A system trained on those signals would have forward models that are genuinely commensurable with the practitioner's own anticipatory architecture — and a co-creative dialogue between them would be possible at the level that matters in somatic practice.
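
One small, concrete piece of that research programme is data preparation: turning a continuous sensor recording into examples a forward model could learn from. The sketch below assumes a recording is already available as a time-ordered list of samples; the function name make_forecast_pairs, the parameters context and horizon, and the synthetic two-channel recording are all hypothetical and stand in for real EMG or accelerometer data.

```python
def make_forecast_pairs(samples, context, horizon):
    """Slice a time-ordered recording into (past, future) training pairs.

    Each pair asks a forward model: given `context` samples of what the body
    just did, predict the next `horizon` samples.
    """
    if context < 1 or horizon < 1:
        raise ValueError("context and horizon must be positive")
    pairs = []
    last_start = len(samples) - context - horizon
    for start in range(last_start + 1):
        past = samples[start:start + context]
        future = samples[start + context:start + context + horizon]
        pairs.append((past, future))
    return pairs


# Synthetic stand-in: each sample is (emg_level, acceleration) at one time step.
recording = [(0.1 * t, 0.05 * t * t) for t in range(10)]
pairs = make_forecast_pairs(recording, context=4, horizon=2)
print(len(pairs), "training pairs")
```

The design choice worth noting is that the target is the future of the same bodily signals, not a rendered image, which is what would make such a model's predictions comparable in kind to the practitioner's own.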

Supporting Evidence

Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138. https://doi.org/10.1038/nrn2787

Friston, K., & Kiebel, S. (2009). Predictive coding under the free-energy principle. Philosophical Transactions of the Royal Society B, 364(1521), 1211–1221. https://doi.org/10.1098/rstb.2008.0300

Rao, R. P. N., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87. https://doi.org/10.1038/4580

Sheets-Johnstone, M. (2011). The primacy of movement (2nd ed.). John Benjamins. https://doi.org/10.1075/aicr.82

Zhao, Y., et al. (2026). Video generation with predictive latents. arXiv:2605.02134. https://arxiv.org/abs/2605.02134