Google DeepMind released Gemini Robotics-ER 1.6, positioning it as the "strategist" brain that works alongside its existing vision-language-action (VLA) model, which handles physical execution. The key architectural split: ER handles spatial reasoning, task planning, and success detection, and can call external tools like Google Search, while the VLA model translates those decisions into actual robot movements. The biggest addition is instrument reading: robots can now parse gauges, displays, and readouts in real environments.
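The strategist/executor split can be sketched as a simple orchestration loop. Everything below is hypothetical: the class names, methods, and canned plan are illustrations of the pattern, not Google's API.

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str   # high-level decision from the reasoning ("ER") layer
    target: str   # object or location the step refers to

class ReasoningModel:
    """Stand-in for the ER 'strategist': plans steps and judges success."""
    def plan(self, task: str) -> list[Step]:
        # A real model would reason over camera frames and tool results
        # (e.g. a web search); here we return a canned three-step plan.
        return [Step("locate", task), Step("grasp", task), Step("verify", task)]

    def check_success(self, step: Step, observation: str) -> bool:
        # Success detection lives in the reasoning layer, not the executor.
        return step.target in observation

class VLAModel:
    """Stand-in for the VLA 'executor': turns a step into motion, reports back."""
    def execute(self, step: Step) -> str:
        return f"{step.action} completed on {step.target}"

def run_task(task: str, brain: ReasoningModel, body: VLAModel) -> bool:
    for step in brain.plan(task):
        observation = body.execute(step)   # motor control stays in the VLA
        if not brain.check_success(step, observation):
            return False                   # the strategist decides to abort
    return True
```

The point of the loop is the division of labor: the reasoning model never emits joint angles, and the VLA model never decides whether the task succeeded.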

This dual-brain approach reflects where robotics AI is heading: away from monolithic models trying to do everything, toward specialized components that excel at distinct tasks. I've been tracking this trend since covering Google's initial Gemini Robotics claims in April—the industry realized that cramming vision, reasoning, and motor control into one model creates more problems than it solves. Tesla's FSD team learned this lesson years ago, and now robotics is catching up.

What's most telling are the pointing improvements. Gemini Robotics-ER 1.6 can accurately count objects and identify precise pixel locations, foundational skills that previous versions botched. In DeepMind's own benchmarks, the 1.5 version missed scissors entirely and hallucinated objects that weren't there. These aren't flashy capabilities, but they're the difference between a robot that works in controlled demos and one that functions in messy real-world environments.
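To make "precise pixel locations" concrete: models that point at objects typically return coordinates normalized to the image rather than raw pixels. The [y, x] ordering and the 0-1000 grid below are assumptions about the response format, not confirmed details of Robotics-ER 1.6.

```python
def to_pixels(point_yx: list[int], width: int, height: int) -> tuple[int, int]:
    """Map a normalized [y, x] point on an assumed 0-1000 grid
    onto a concrete image of the given size, returning (x, y) pixels."""
    y_norm, x_norm = point_yx
    x = round(x_norm / 1000 * width)
    y = round(y_norm / 1000 * height)
    return x, y

# A point at the vertical center, a quarter of the way across a 640x480 frame:
print(to_pixels([500, 250], 640, 480))
```

Normalized coordinates keep the model's output independent of camera resolution; the robot's perception stack does the final conversion.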

For developers building with robotics APIs, this split architecture matters. You're no longer betting on one model to handle everything; you can potentially swap out reasoning components without rebuilding motor control systems. But Google hasn't opened this integration to outside developers yet, so we're still watching from the sidelines while they refine it.
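The swappability argument boils down to putting the reasoning layer behind a narrow interface. A minimal sketch, with entirely hypothetical planner classes, of how motor control code could stay untouched while the "brain" changes:

```python
from typing import Protocol

class Planner(Protocol):
    """Anything that can turn an observation into a next action."""
    def next_action(self, observation: str) -> str: ...

class HostedPlanner:
    """Hypothetical stand-in for a cloud reasoning model."""
    def next_action(self, observation: str) -> str:
        return f"hosted-plan:{observation}"

class LocalPlanner:
    """Hypothetical stand-in for an on-device fallback."""
    def next_action(self, observation: str) -> str:
        return f"local-plan:{observation}"

class MotorController:
    """Depends only on the Planner interface, never on a concrete model."""
    def __init__(self, planner: Planner) -> None:
        self.planner = planner

    def tick(self, observation: str) -> str:
        return self.planner.next_action(observation)

controller = MotorController(HostedPlanner())
print(controller.tick("cup on table"))
controller = MotorController(LocalPlanner())  # swap brains, same body
print(controller.tick("cup on table"))
```

This is ordinary dependency inversion; the interesting part is that a strategist/executor split makes it applicable to robot stacks at all.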