Community Digest: Week of 2–7 June 2026

CVPR 2026 underway in Denver, Humanoid-GPT sets a new scaling frontier, and EMG personalisation takes a leap forward


From the arXiv

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking (arXiv:2606.03985, June 2, CVPR 2026) Qi, Chen et al. (Tsinghua / Galbot) present a GPT-style causal Transformer pre-trained on a 2-billion-frame motion corpus, described as the largest training dataset for a motion tracking model to date. The model generalises zero-shot to unseen motions and control tasks while tracking highly dynamic behaviours. Presented as a CVPR 2026 poster; code released on GitHub. Applying scaling-law logic to motion tracking is significant because it suggests that many generalisation challenges in motion AI can be addressed through data and parameter scale rather than task-specific engineering. → https://arxiv.org/abs/2606.03985
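
To make "GPT-style causal" concrete, here is a minimal sketch of causal attention over motion frames, where each frame can only attend to itself and earlier frames. This illustrates the general technique only and is not taken from the Humanoid-GPT code; the function names and the toy scores are assumptions.

```python
import math
from typing import List


def causal_mask(n_frames: int) -> List[List[bool]]:
    """mask[t][s] is True when frame t may attend to frame s (s <= t)."""
    return [[s <= t for s in range(n_frames)] for t in range(n_frames)]


def causal_attention_weights(scores: List[List[float]]) -> List[List[float]]:
    """Row-wise softmax over raw attention scores with future frames masked out."""
    n = len(scores)
    mask = causal_mask(n)
    weights = []
    for t in range(n):
        # Keep only the scores for frames at or before t.
        visible = [scores[t][s] for s in range(n) if mask[t][s]]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in visible]
        total = sum(exps)
        # Future frames receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - len(visible)))
    return weights


# Toy scores for three frames (hypothetical values).
scores = [
    [0.0, 5.0, 5.0],
    [1.0, 1.0, 9.0],
    [0.0, 0.0, 0.0],
]
weights = causal_attention_weights(scores)
```

Large scores on future frames (the 5.0 and 9.0 entries) have no effect on the result, which is the property that lets such a model be rolled out frame by frame at inference time.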

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization (arXiv:2606.02000, June 1) Liang, Wei, Li et al. (DAMO Academy / Alibaba, Hupan Lab) compress 3D human mesh geometry into discrete tokens that are processed jointly with video tokens inside a DiT architecture — eliminating the render-to-2D step that previous motion-conditioned video generation required. The model reasons natively about 3D geometry, camera viewpoint, and appearance in a unified token stream. Notable for opening a direct path from 3D body representations (motion capture, IMU-derived pose) to video generation without intermediate 2D projection. → https://arxiv.org/abs/2606.02000
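
As a rough illustration of feeding mesh geometry into the same token stream as video, the sketch below quantises per-part mesh features to their nearest codebook entry and appends the resulting token embeddings to a list of video token embeddings. Nearest-neighbour vector quantisation, the codebook, the embedding table, and all values are assumptions made for illustration; the paper's actual tokenizer may differ.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(vector: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def tokenize_mesh(mesh_features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each continuous mesh feature to a discrete token id."""
    return [nearest_code(feature, codebook) for feature in mesh_features]


def joint_sequence(video_tokens: Sequence[Vector],
                   mesh_tokens: Sequence[int],
                   mesh_embeddings: Sequence[Vector]) -> List[Vector]:
    """Append one embedding per mesh token to the video tokens: one unified stream."""
    return list(video_tokens) + [mesh_embeddings[t] for t in mesh_tokens]


# Hypothetical 3-entry codebook over 3-D mesh features.
codebook = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mesh_features = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.0]]
mesh_tokens = tokenize_mesh(mesh_features, codebook)

# Hypothetical 2-D embeddings shared by video tokens and mesh tokens.
mesh_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
video_tokens = [[0.5, 0.5], [0.2, 0.3]]
sequence = joint_sequence(video_tokens, mesh_tokens, mesh_embeddings)
```

No rendering step appears anywhere in this path: the 3D features become tokens directly, which is the property the entry highlights.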

REACT: A Conditioning Framework for User-Adaptive sEMG Hand Pose Estimation (arXiv:2605.30127, May 28) Xie & Cheung demonstrate that FiLM (Feature-wise Linear Modulation) applied to a frozen EMG-to-pose backbone — using a compact user embedding learned from under 45 seconds of calibration data — significantly reduces angular error on the emg2pose benchmark across all generalisation splits. No gradient update required at deployment. The fast, gradient-free personalisation architecture is directly applicable to any EMG-conditioned system that needs to adapt to a new user without full retraining. → https://arxiv.org/abs/2605.30127
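
The FiLM mechanism itself is simple enough to sketch: a user embedding is mapped to a per-feature scale (gamma) and shift (beta), which modulate the frozen backbone's features in a single forward pass. The code below is a minimal sketch of that idea, not the REACT implementation. The user encoder (mean absolute amplitude per channel), the projection matrices, and all values are hypothetical stand-ins.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(matrix: Matrix, vector: Vector) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def embed_user(calibration: Sequence[Vector]) -> List[float]:
    """Hypothetical forward-only user encoder: mean absolute amplitude per EMG channel."""
    n_channels = len(calibration[0])
    return [fmean(abs(sample[c]) for sample in calibration) for c in range(n_channels)]


def film_params(user_embedding: Vector, w_gamma: Matrix, w_beta: Matrix) -> Tuple[List[float], List[float]]:
    """Project the user embedding to FiLM scale and shift; gamma is centred on 1."""
    gamma = [1.0 + delta for delta in matvec(w_gamma, user_embedding)]
    beta = matvec(w_beta, user_embedding)
    return gamma, beta


def film(features: Vector, gamma: Vector, beta: Vector) -> List[float]:
    """Feature-wise Linear Modulation: out[i] = gamma[i] * features[i] + beta[i]."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta must have the same length")
    return [g * f + b for f, g, b in zip(features, gamma, beta)]


# Stand-in for one window of frozen-backbone features (backbone weights never change).
hidden = [0.5, -1.0, 2.0]

# Two calibration samples from a two-channel recording (toy values).
calibration = [[0.2, -0.4], [-0.2, 0.4]]
embedding = embed_user(calibration)

# Hypothetical fixed projections from embedding space to feature space.
w_gamma = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
w_beta = [[0.0, 0.0], [0.5, 0.0], [0.0, -1.0]]

gamma, beta = film_params(embedding, w_gamma, w_beta)
modulated = film(hidden, gamma, beta)
```

Adapting to a new user here means recomputing `embedding` from that user's calibration data; nothing is backpropagated and the backbone stays untouched, which matches the gradient-free deployment the entry describes.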


From the Industry

ICRA 2026 wraps in Vienna (June 1–4): The International Conference on Robotics and Automation 2026 concluded its main programme in Vienna with strong representation from embodied AI, bimanual manipulation, and humanoid locomotion research. The WBCD (What Bimanuals Can Do) challenge, the largest real-world bimanual manipulation competition at ICRA, drew teams from across Europe, North America, and Asia. Humanoid locomotion results from ICRA 2026 and CVPR 2026 are appearing in the same week, a sign that the robotics and CV communities are converging on the same physical AI agenda.


From Import AI

Import AI #459 (Jack Clark, June 1, 2026): Three main threads: (1) a survey paper on how AI oversight has become structurally difficult as AI capabilities expand, with analysis of the institutional gaps that make it hard for evaluation to keep pace; (2) new scaling laws applied to protein folding models, showing that the compute-capability relationship observed in language models extends to structural biology prediction; (3) a methodological paper on pricing the extinction risk of AI systems, which treats catastrophic AI failure as a contingent liability and asks what insurance-market instruments would be required to price it. An unusually wide-ranging issue; the extinction risk pricing paper in particular has generated significant community discussion. → https://importai.substack.com/p/import-ai-459-ai-oversight-is-difficult


From X / Social

@GalaxyGeneralRobotics (Galbot / Humanoid-GPT team) — released code and demo videos for Humanoid-GPT on GitHub, showing the model tracking a wide range of highly dynamic human motion sequences zero-shot, including martial arts, dance, and parkour clips sourced from online video. The zero-shot dance tracking results in particular drew substantial attention from the somatic-AI and creative robotics communities: the model follows choreographic phrasing without prior exposure to dance-specific data.

@EmbodiedAIRead — shared a summary of the VLA (Vision-Language-Action) research landscape at ICLR 2026: VLA submissions grew from 9 to 164 in one year, now constituting a major sub-track. The rapid formalisation of VLA as a research category is relevant to somatic AI: VLA models that integrate proprioceptive and multimodal sensing alongside vision and language are the closest existing architecture family to the kind of system SSIN research envisions.


Conference Notes

CVPR 2026, Denver, June 1–19: The conference is underway. Oral and highlight papers relevant to human motion appearing in the first week include Humanoid-GPT, Superman (perception-generation unification), and several physics-based motion synthesis papers. The HuMoGen and PhysHuman workshops run as satellite events in the second week. Full programme available at cvpr.thecvf.com.


References

Liang, J., et al. (2026). Towards 3D-aware video diffusion models: Render-free human motion control with mesh tokenization. arXiv:2606.02000. https://arxiv.org/abs/2606.02000

Qi, Z., et al. (2026). Humanoid-GPT: Scaling data and structure for zero-shot motion tracking. arXiv:2606.03985. https://arxiv.org/abs/2606.03985

Xie, E., & Cheung, H. S. (2026). REACT: A conditioning framework for user-adaptive sEMG hand pose estimation. arXiv:2605.30127. https://arxiv.org/abs/2605.30127