The development of humanoid robots moved at a slow pace for the better part of two decades, but breakthroughs have been arriving more and more often. As we recently wrote, the AI robot Ameca was presented at MWC 2024, and the Unitree H1 broke the speed record among humanoid robots. Now a spectacular result of the collaboration between Figure AI and OpenAI has appeared on the Internet: a stunning video of a humanoid robot that can hold conversations with people.
Startup Figure AI released a video showing the Figure 01 robot running a new vision-language model (VLM). In it, Figure 01 stands at a table on which there are a plate, an apple, and a cup, with a drying rack to the left. When a person asks what the robot sees in front of it, it answers by describing in detail everything lying on the table.
Then the man asks if he can eat something, and the robot answers, "Of course," picks up the apple with a deft, smooth movement, and hands it to him. After that comes another stunning demonstration: the man dumps crumpled rubbish from a basket in front of Figure 01 and asks the robot to explain why it did that, and to collect the rubbish into the basket at the same time. The robot explains its "thinking" while putting the paper back into the basket: "So I gave you an apple because it's the only edible item I could give you off the table."
Company representatives explained that Figure 01 uses a pre-trained multimodal vision-language model (VLM) from OpenAI to understand images and text, and relies on voice prompts to generate its responses. This differs from, say, OpenAI's GPT-4, which focuses on written prompts.
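Figure has not published its integration code, but the described pipeline (a camera frame plus a transcribed voice prompt sent to a multimodal model) can be sketched with the public OpenAI Python SDK. Everything below is an assumption for illustration: the model name, the helper function, and the file names are placeholders, not Figure's actual stack.

```python
# Hypothetical sketch: how a robot loop might query a multimodal model.
# Figure's real integration is not public; model and prompt are assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_scene(image_path: str, spoken_question: str) -> str:
    """Send one camera frame plus the transcribed voice prompt to a VLM."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in; the Figure/OpenAI model is not public
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": spoken_question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(describe_scene("camera_frame.jpg", "What do you see in front of you?"))
```

In a real system the spoken question would come from a speech-to-text step and the reply would be fed to text-to-speech, but the core exchange is a single multimodal request like this one.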
It also uses what the company calls "learned, low-level bimanual manipulation." The system pairs pixel-level visual input with its neural networks for motion control. "These networks receive images at 10 Hz and generate 24-DOF actions (wrist poses and finger joint angles) at 200 Hz," Figure AI said in a statement.
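Those quoted numbers imply a two-rate loop: the policy consumes an image at 10 Hz and streams 24-DOF actions at 200 Hz, roughly 20 action ticks per frame. Here is a minimal, hypothetical sketch of such a loop; all function names, shapes, and the chunked-action scheme are placeholders, since Figure has not released its controller.

```python
# Illustrative two-rate control loop: vision at 10 Hz, actions at 200 Hz.
# All names (read_camera, policy, send_to_actuators) are hypothetical.
import time
import numpy as np

VISION_HZ = 10     # images consumed by the policy network
ACTION_HZ = 200    # 24-DOF actions: wrist poses + finger joint angles
STEPS_PER_FRAME = ACTION_HZ // VISION_HZ  # 20 action ticks per image

def read_camera() -> np.ndarray:
    """Placeholder for grabbing one onboard camera frame."""
    return np.zeros((224, 224, 3), dtype=np.uint8)

def policy(image: np.ndarray) -> np.ndarray:
    """Placeholder policy network: one image in, a chunk of
    STEPS_PER_FRAME x 24 actions out (wrist poses + finger angles)."""
    return np.zeros((STEPS_PER_FRAME, 24), dtype=np.float32)

def send_to_actuators(action: np.ndarray) -> None:
    """Placeholder for streaming one 24-DOF setpoint to wrists/hands."""
    pass

for _ in range(VISION_HZ):         # run one simulated second of control
    frame = read_camera()          # 10 Hz: new visual observation
    actions = policy(frame)        # the network emits a short action chunk
    for action in actions:         # 200 Hz: play the chunk out tick by tick
        send_to_actuators(action)
        time.sleep(1.0 / ACTION_HZ)
```

The design point is that the slow vision network does not have to keep up with the actuators: it emits a short chunk of actions, and a faster inner loop plays that chunk out until the next frame arrives.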
The company claims that every behaviour in the video is the product of system learning, so no one behind the scenes is pulling Figure 01's strings. There is a caveat, of course: we don't know how many times the robot rehearsed this routine. Perhaps it was the hundredth run, which would explain its precise movements. Either way, the achievement looks spectacular and a little fantastical.
Figure 01 is now completing real-world tasks
Everything is autonomous:
- Autonomous navigation & force-based manipulation
- Learned vision model for bin detection & prioritization
- Reactive bin manipulation (robust to pose variation)
- Generalizable to other pick/place tasks

pic.twitter.com/0wFmYnq0GC

— Figure (@Figure_robot) February 26, 2024
Read also:
- The Unitree H1 robot has become the fastest humanoid robot in the world
- ChatGPT has been integrated into the robotic pet Loona