Google DeepMind released Gemini Robotics, a Vision-Language-Action model built on Gemini 2.0 that claims to directly control robots across different hardware platforms. The system can handle complex manipulation tasks like folding origami and playing cards, adapt to new robot embodiments including bi-arm platforms, and learn new tasks from as few as 100 demonstrations. DeepMind says it works with unseen environments and follows open vocabulary instructions while executing "smooth and reactive movements."

This represents Google's most aggressive push into embodied AI, moving beyond chatbots into physical world control. The timing isn't coincidental—robotics companies are racing to solve the hardware-software integration problem that's kept useful robots out of real environments. DeepMind's approach of training one generalist model that adapts to any robot body could solve the fragmentation problem that's plagued robotics for decades.

Meanwhile, researchers at KAIST released Robot-R1, which takes a different approach: reinforcement learning instead of supervised fine-tuning. They argue that traditional training methods lead to "catastrophic forgetting and reduced generalization performance" in robotics tasks. Inspired by DeepSeek-R1's reasoning approach, Robot-R1 learns to predict the keypoint states required to complete a task. The competing methodologies highlight ongoing uncertainty about the best path to general-purpose robotics.
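To make the keypoint idea concrete, here is a minimal sketch of the kind of reward signal an RL setup like this could optimize: score a policy's predicted keypoints by how many land near the ground-truth positions. The function name, 2D pixel-coordinate format, and tolerance threshold are illustrative assumptions, not details from the Robot-R1 paper.

```python
import numpy as np

def keypoint_reward(predicted, target, tol=5.0):
    """Return the fraction of predicted keypoints within `tol` pixels
    of their ground-truth targets (a dense reward in [0, 1])."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    # Per-keypoint Euclidean distance between prediction and target.
    dists = np.linalg.norm(predicted - target, axis=-1)
    return float((dists <= tol).mean())

target = [[10.0, 20.0], [30.0, 40.0]]
print(keypoint_reward(target, target))                           # 1.0
print(keypoint_reward([[100.0, 200.0], [30.0, 41.0]], target))   # 0.5
```

A graded reward like this, rather than a binary task-success signal, is one plausible reason to frame the problem as keypoint-state prediction: it gives the policy gradient something to climb even when the task is not yet completed.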

For developers, the practical question is whether these models will actually ship as usable APIs or remain research demos. DeepMind's track record invites cautious optimism: the lab has delivered production models before, but the robotics industry has burned through billions chasing similar promises. The real test is whether Gemini Robotics works reliably enough for someone to bet a product on it.