Community Digest: Week of 12–18 May 2026

In-betweening goes neural, billion-parameter motion models go open, and the embodied AI conference calendar fills up


From the arXiv

Generative Motion In-betweening by Diffusion over Continuous Implicit Representations (arXiv:2605.12778, May 12) Fan, Henderson, & Ho (University of Glasgow) propose encoding motion as implicit neural representations (INRs) and using latent diffusion to sample plausible, diverse in-between sequences from sparse keyframe constraints. The INR approach lets the model generate motion at arbitrary temporal resolution rather than being locked to the training frame rate. The authors report strong results on sparse-keyframe in-betweening quality and diversity. Relevant for score-based and phrase-structured somatic co-creation workflows. → https://arxiv.org/abs/2605.12778
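To make the "arbitrary temporal resolution" point concrete, here is a minimal sketch of the general idea behind an implicit representation of motion: a function of continuous time that can be queried at any frame rate. This is not the paper's model. The function `toy_motion_inr`, the helper `sample_clip`, and the hand-picked `TOY_WEIGHTS` are hypothetical stand-ins (a linear readout over a few sinusoidal features) for what would in practice be a learned network.

```python
import math

# Hypothetical, hand-picked weights: 2 pose channels x 5 time features.
# A real INR would learn these (and would be a neural network, not a linear map).
TOY_WEIGHTS = [
    [0.0, 1.0, 0.0, 0.3, 0.0],
    [0.5, 0.0, 0.8, 0.0, 0.1],
]

def toy_motion_inr(t, weights):
    """Map a continuous time t in [0, 1] to a small pose vector."""
    feats = [1.0]
    for k in (1, 2):
        feats.append(math.sin(2 * math.pi * k * t))
        feats.append(math.cos(2 * math.pi * k * t))
    return [sum(w * f for w, f in zip(row, feats)) for row in weights]

def sample_clip(weights, duration_s, fps):
    """Query the same continuous function at whatever frame rate is requested."""
    n = int(round(duration_s * fps)) + 1
    if n < 2:
        raise ValueError("need at least two frames")
    return [toy_motion_inr(i / (n - 1), weights) for i in range(n)]

clip_30 = sample_clip(TOY_WEIGHTS, duration_s=1.0, fps=30)
clip_120 = sample_clip(TOY_WEIGHTS, duration_s=1.0, fps=120)
print(len(clip_30), len(clip_120))
```

Both clips come from the same underlying function, so frames that fall on the same instant (for example the midpoint) agree regardless of the sampling rate.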

UMo: Unified Sparse Motion Modeling for Real-Time Co-Speech Avatars (arXiv:2605.14731, May 14) Zhan et al. unify text, audio, and motion as discrete tokens in a single transformer, achieving tight audio–body gesture alignment at real-time throughput. The unified token approach removes the need for separate alignment modules per modality. Notable step toward multi-channel conditioning systems where speech, gesture, and movement are co-modelled rather than processed in parallel independent pipelines. → https://arxiv.org/abs/2605.14731

Riemannian Motion Generation via Flow Matching (arXiv:2603.15016, March — CVPR 2026 accepted) Miao, Huang, & Li (PolyU / SUSTech) represent motion on a product manifold and learn dynamics via Riemannian flow matching. State-of-the-art FID of 0.043 on HumanML3D; first place on all metrics in MotionStreamer format. This paper is appearing in CVPR 2026 proceedings, bringing manifold-aware motion representation into the mainstream conference canon. Geometrically principled encoding — relevant to the qualitative texture discussion from the May 1 deep analysis. → https://arxiv.org/abs/2603.15016


From HuggingFace

Tencent HY-Motion 1.0 — Tencent released HY-Motion 1.0 to HuggingFace this week: a series of text-to-3D human motion generation models based on Diffusion Transformer (DiT) + flow matching, scaled to the billion-parameter level. The first open-source text-to-motion model family at this scale. Generates skeleton-based 3D animation from text prompts with notably improved instruction-following over smaller open models. Available for community fine-tuning. → https://huggingface.co/tencent/HY-Motion-1.0


From Import AI

Import AI #457 (Jack Clark, May 18, 2026) — This week's edition covers: an investigation into fast16.sys, a ~20-year-old computer virus that selectively patches high-precision calculation software in memory to tamper with numerical results — a quiet threat to scientific computation; research on pathologies in the Muon optimizer (more than one in four neurons effectively dead by training step 500 under certain configurations); and a multi-institution paper on positive alignment — defining AI development that supports human and ecological flourishing in pluralistic, user-authored ways. The alignment paper's co-signatories include Oxford, Google DeepMind, Anthropic, OpenAI, and Stanford. → https://jack-clark.net/2026/05/18/import-ai-457-ai-stuxnet-cursed-muon-optimizer-and-positive-alignment/


From X / Social

@Marktechpost shared DanceGRPO (ByteDance Seed + University of Hong Kong), a unified framework adapting Group Relative Policy Optimization (GRPO) to visual generation. It is applied across diffusion and rectified flow paradigms and across text-to-image, text-to-video, and image-to-video tasks, with reward types including motion quality. Applying RL-style reward shaping to motion quality in generation pipelines is a technically interesting direction: motion quality rewards could eventually be defined in terms of somatic criteria rather than visual plausibility.
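For readers unfamiliar with GRPO, the core step is to score several samples generated from the same prompt and normalize each reward against the group's mean and spread, rather than training a separate value model. The sketch below shows only that normalization, under stated assumptions: the function name `group_relative_advantages` is hypothetical, the reward values are made up for illustration, and the use of population standard deviation is a choice made here, not something taken from DanceGRPO.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    rewards: scores for several samples generated from the same prompt,
             e.g. from a hypothetical motion-quality reward model.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]

# Made-up motion-quality scores for four samples of one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Samples scoring above the group mean receive positive advantages and those below receive negative ones, so swapping in a different reward (somatic rather than visual, say) changes only how `rewards` is produced.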

@Kling_ai announced new tutorial content for Kling 2.6 Motion Control — uploading a character image plus an original dance or expression video to generate a new combination. Practitioner-facing tool; the reference-video-to-new-character transfer workflow is directly usable for exploring how movement vocabulary transfers across body types and visual styles.

@DrTomFroese confirmed funding for the International Conference on Embodied Cognitive Science (ECogS 2026) at OIST, Nov. 9–13, 2026. Theme: Embodied cognition and AI. This is a significant conference for the phenomenological and cognitive-scientific grounding of somatic-AI research — abstract submissions likely to open over the coming weeks.


Conference Notes

HuMoGen Workshop @ CVPR 2026: The Human Motion Generation workshop is confirmed for CVPR 2026 (Denver, June). It invites accepted conference and journal papers in motion generation, with poster and presentation tracks. CVPR 2026 as a whole accepted 4,090 papers (25% acceptance rate, up 42% from the prior year), with a dedicated oral session on Human Motion. The scale of CVPR 2026 motion research suggests the field has fully entered the mainstream CV/ML canon.

Foundation Models Meet Embodied Agents (FMEA) Workshop @ CVPR 2026: Four challenges with $500/$500/$300/$200 cash prizes per track, covering VLMs, LLM planning, and VLA policies. Submissions to the embodied AI challenges are currently open.


References

Fan, S., Henderson, P., & Ho, E. S. L. (2026). Generative motion in-betweening by diffusion over continuous implicit representations. arXiv:2605.12778. https://arxiv.org/abs/2605.12778

Miao, F., Huang, J., & Li, T. (2026). Riemannian motion generation: A unified framework for human motion representation and generation via Riemannian flow matching. arXiv:2603.15016. https://arxiv.org/abs/2603.15016

Tencent. (2026). HY-Motion-1.0 [Model repository]. HuggingFace. https://huggingface.co/tencent/HY-Motion-1.0

Zhan, X., Fu, X., Yang, C., et al. (2026). UMo: Unified sparse motion modeling for real-time co-speech avatars. arXiv:2605.14731. https://arxiv.org/abs/2605.14731