Community Digest: Week of 23–29 June 2026
Intent-driven physical control, hierarchical motion tokens, and self-improving robots close out June
From the arXiv
MIND: Multi-Scale Intent Diffusion for Text-Driven Physics-Based Humanoid Control (arXiv:2605.26006). Li et al. (ShanghaiTech) bridge the gap between language and low-level physical control by introducing behavioural intent as a mid-level semantic representation, on the grounds that the body's state encodes motion dynamics that align more closely with text than raw actions do. Multi-scale diffusion generates intent at several temporal resolutions. The approach resonates conceptually with motor-intentionality accounts of movement organisation. → https://arxiv.org/abs/2605.26006
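For readers who want a concrete picture of "several temporal resolutions", here is a minimal sketch of a temporal pyramid built by average pooling. It is an illustration only and not the MIND method: the paper generates intent with diffusion, whereas this code merely shows what the same sequence looks like at different time scales. The function names `downsample` and `temporal_pyramid`, the strides, and the toy two-dimensional "state" frames are all assumptions made for this example.

```python
def downsample(sequence, stride):
    """Average consecutive windows of `stride` frames.

    A trailing window shorter than `stride` is averaged over the frames it has.
    """
    pooled = []
    for start in range(0, len(sequence), stride):
        window = sequence[start:start + stride]
        dim = len(window[0])
        pooled.append(
            [sum(frame[d] for frame in window) / len(window) for d in range(dim)]
        )
    return pooled


def temporal_pyramid(sequence, strides=(1, 4, 16)):
    """Return the same sequence at several temporal resolutions (hypothetical strides)."""
    return {stride: downsample(sequence, stride) for stride in strides}


# Toy per-frame state vectors: 32 frames, 2 features each (illustrative data only).
frames = [[float(t), float(t) * 2.0] for t in range(32)]
pyramid = temporal_pyramid(frames)

for stride, level in pyramid.items():
    print(f"stride {stride:>2}: {len(level)} steps")
```

Coarse levels summarise long-horizon structure (what the body is doing overall), while the stride-1 level keeps frame-level detail. A multi-scale generator would produce or condition on each level rather than pooling a known sequence as done here.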
SCRIPT: Scalable Diffusion Policy for Language-Driven Physics-Based Humanoid Control (arXiv:2605.22894). SCRIPT jointly models actions, physical states, and language (JAST-DiT) with history conditioning and RL post-training, and shows consistent gains with model scaling. The result suggests that language-driven physical control is becoming scalable. → https://arxiv.org/abs/2605.22894
DC-Motion: Decoupling Semantics and Details via Discrete-Continuous Tokens (arXiv:2606.14721). A Discrete-Continuous VAE splits motion into discrete tokens (semantics, temporal layout) and continuous residuals (joint smoothness, fine dynamics), avoiding quantisation loss while keeping compositional structure. The design reflects an architectural recognition that fine continuous detail, where movement quality lives, must be preserved separately from nameable structure. → https://arxiv.org/abs/2606.14721
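The discrete-plus-residual idea in the DC-Motion entry can be sketched in a few lines. This is a minimal illustration assuming a fixed random codebook and nearest-neighbour assignment; it is not the paper's Discrete-Continuous VAE, which learns its representations. The names `nearest_code`, `split_discrete_continuous`, and `reconstruct`, along with the codebook size and latent dimension, are hypothetical choices for this example.

```python
import random


def nearest_code(vec, codebook):
    """Index of the codebook entry closest to `vec` (squared Euclidean distance)."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vec, code))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def split_discrete_continuous(latents, codebook):
    """Split each latent into a discrete token and a continuous residual."""
    tokens = [nearest_code(z, codebook) for z in latents]
    residuals = [
        [v - c for v, c in zip(z, codebook[t])] for z, t in zip(latents, tokens)
    ]
    return tokens, residuals


def reconstruct(tokens, residuals, codebook):
    """Add each residual back onto its codebook entry."""
    return [
        [c + r for c, r in zip(codebook[t], res)] for t, res in zip(tokens, residuals)
    ]


# Illustrative random data: 8 codes and 16 per-frame latents, 4 dimensions each.
random.seed(0)
codebook = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
latents = [[random.gauss(0, 1) for _ in range(4)] for _ in range(16)]

tokens, residuals = split_discrete_continuous(latents, codebook)
recon = reconstruct(tokens, residuals, codebook)

# Tokens alone lose detail; tokens plus residuals recover the latents
# up to floating-point rounding.
max_err = max(abs(a - b) for ra, rb in zip(recon, latents) for a, b in zip(ra, rb))
print(tokens, max_err)
```

The token sequence is the nameable, composable part; the residuals carry whatever the codebook cannot express. Dropping the residuals is the quantisation loss that a discrete-only tokeniser would incur.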
From Import AI
Import AI #463 (Jack Clark, ~June 30, 2026) — "Self-improving robots; a 10k Chinese GPU cluster; and an elegiac essay for the human era." The lead item covers ENPIRE (NVIDIA), software that puts physical robots through the autonomous experiment-and-execute loop that software agents use — with returns to scale (8 agents reach better solutions faster than 1) but real infrastructure friction as robot count grows (agents underutilise robots while reading logs, debugging, waiting on the LLM backbone). Clark frames it as a suggestive early glimpse of how an advanced system might try to instantiate itself physically. The issue closes with an elegiac essay on the human era — a more reflective register. → https://importai.substack.com/p/import-ai-463-self-improving-robots
From X / Social
Physics-based humanoid control cluster: the near-simultaneous appearance of MIND and SCRIPT (both text-driven, physics-based humanoid control, both emphasising the language-to-action gap) led the community to note that one specific problem, grounding language commands in physically valid whole-body motion, has become a concentrated research front. Several researchers pointed to the convergence on intermediate representations (intent, state) as the key to bridging the modality gap.
Robotics / embodiment discussion: ENPIRE (via Import AI) sparked debate about self-improving physical systems and their limits. For the somatic-AI community, the relevant thread was sceptical. Autonomous robot self-improvement optimises for task success, not for movement quality or expressivity, which is a reminder that "better at the task" and "better movement" are different objectives and that only the former is currently being scaled.
Conference & Community Notes
Month-end field picture: June closed with three concurrent research fronts — the muscle-level turn (MuscleMimic, mid-June), the generalisation/evaluation challenge (ViMoGen, PP-Motion, late June), and physics-based language control (MIND, SCRIPT). All three advance the account of movement-as-produced-and-observed. The somatic dimension — movement-as-felt — remains the throughline gap this platform tracks.
Venues:
- MOCO 2026 (Movement and Computing) — priority somatic-AI venue; dates still to confirm at movementcomputing.org (carried).
- ECogS 2026 (OIST, Nov 9–13) — "Embodied cognition and AI"; abstracts expected to open summer.
- NeurIPS 2026 — notifications September.
References
Li, B., et al. (2026). MIND: Multi-scale intent diffusion for text-driven physics-based humanoid control. arXiv:2605.26006. https://arxiv.org/abs/2605.26006
SCRIPT: Scalable diffusion policy for language-driven physics-based humanoid control. (2026). arXiv:2605.22894. https://arxiv.org/abs/2605.22894
DC-Motion: Decoupling semantics and details via discrete-continuous tokens for human motion generation. (2026). arXiv:2606.14721. https://arxiv.org/abs/2606.14721