Google DeepMind’s Gemini Robotics helps humanoids learn to make salads, origami foxes
In a blog post Tuesday, Google DeepMind outlined Gemini Robotics, a new offshoot of the software giant’s generative AI platform. Built atop Gemini 2.0, the new model adds physical movement to the list of outputs that already includes text, video, and audio. The researchers say a robotic system – in this case, Apptronik's Apollo humanoid – was able to execute new activities without additional preprogramming.
Examples include making salads, packing lunch, playing Tic-Tac-Toe, folding an origami fox, and “slam dunking” a tiny basketball into an equally tiny hoop.
“We’d trained models to help robots with specific tasks and to understand natural language before, but this was a step change,” DeepMind robotics head Carolina Parada explains. “The robot had never seen anything related to basketball, or this specific toy. Yet it understood something complex – ‘slam dunk the ball’ – and performed the action smoothly. On its first try.”
If the execution is, indeed, as seamless and speedy as Parada suggests, such models could prove invaluable to the new generation of humanoid robots. Onboarding new skills and operating in uncontrolled environments are currently two key bottlenecks on the way to the end goal of building more general-purpose robots.
Here, that means giving humanoid robots a level of real-world flexibility on par with their complex form factor. The ability of multi-purpose robots to rapidly develop a wide range of skills based on their existing model data would go a long way toward creating systems that can do different jobs in different environments, like expanding from the factory to the home.
“The possibilities for robots using highly general and capable models are broad and exciting,” says Parada. “They could be more useful in industries where setups are complex, precision is important and the spaces aren’t human-friendly. And they could be helpful in human-centric spaces, like the home.”
Apptronik's Apollo system appears to factor heavily into DeepMind's research on the subject. The two parties announced a deal at the end of 2024 designed to "bring together best-in-class artificial intelligence with cutting-edge hardware and embodied intelligence." Google also participated in Apptronik's recent $350 million raise.