Google's latest advancement could significantly expand what robotic systems are capable of.
For years, the idea of robots seamlessly integrating into our daily lives has been a staple of science fiction. Google has now taken a significant step towards making this a reality with the introduction of Robotics Transformer 2 (RT-2), a pioneering vision-language-action (VLA) model. Trained on text and images from the internet, the model is designed to translate that knowledge into robotic actions, paving the way for a new era of informed and helpful robots. As the tech giant puts it, "RT-2 can speak robot."
Unlike traditional robotic systems, RT-2 is not just about recognizing objects and their properties; it is about contextual understanding. For instance, while many robots can be trained to recognize an apple based on its properties, RT-2 can identify an apple in its environment, differentiate it from similar objects, and know how to handle it.
Understanding RT-2’s capabilities
Recent advancements have bolstered robots' reasoning and problem-solving abilities. Techniques like chain-of-thought prompting now allow robots to break down multistep tasks, while vision-language models like PaLM-E have been instrumental in helping robots better interpret their environment. Additionally, the success of RT-1 demonstrated that Transformers, a deep learning architecture known for generalizing well from large datasets, could facilitate knowledge transfer between different types of robots.
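To make the chain-of-thought idea concrete, here is a minimal sketch of what decomposing a multistep robot task into intermediate steps might look like. The prompt format and task are illustrative assumptions, not Google's actual prompting scheme.

```python
# Illustrative sketch (not Google's actual format): a chain-of-thought style
# prompt that breaks a multistep robot task into intermediate steps before
# any low-level action is chosen.
prompt = (
    "Task: put the apple in the bowl.\n"
    "Plan:\n"
    "1. Locate the apple on the table.\n"
    "2. Move the gripper above the apple.\n"
    "3. Close the gripper to grasp the apple.\n"
    "4. Move the gripper above the bowl.\n"
    "5. Open the gripper to release the apple.\n"
)

def plan_steps(prompt_text: str) -> list[str]:
    """Extract the numbered plan steps from the prompt text."""
    return [
        line.split(". ", 1)[1]
        for line in prompt_text.splitlines()
        if line[:1].isdigit()
    ]

steps = plan_steps(prompt)
```

Each extracted step could then be handed to a lower-level controller one at a time, which is the intuition behind letting a language model plan before acting.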
Historically, robots relied on complex stacks of subsystems, with high-level reasoning and low-level control handled by separate components that had to pass information back and forth. RT-2 streamlines this, allowing a single model to handle intricate reasoning and directly produce robotic actions. Notably, RT-2 can work from minimal robot training data and apply concepts learned during training to guide robotic actions, even for unfamiliar tasks. For instance, while traditional systems needed explicit training to dispose of trash, RT-2 inherently understands the concept and can act accordingly.
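A key mechanism that lets one model produce actions directly is treating the action itself as a short string of tokens, which the robot stack then decodes into continuous commands. The 8-slot layout below (terminate flag, 3D translation, 3D rotation, gripper) follows RT-2's published description; the bin count and scaling are assumptions for the sketch.

```python
# Hedged sketch of the action-as-text idea behind VLA models such as RT-2:
# the model emits an action as a string of integer tokens, and the robot
# stack de-tokenizes them into continuous commands. Exact scaling and bin
# count here are illustrative assumptions.

def detokenize_action(token_str: str,
                      num_bins: int = 256,
                      max_delta: float = 0.05) -> dict:
    """Map a string of discrete action tokens to continuous robot commands."""
    tokens = [int(t) for t in token_str.split()]
    assert len(tokens) == 8, "expected: terminate, xyz translation, xyz rotation, gripper"
    terminate, *rest = tokens

    def scale(bin_index: int) -> float:
        # Rescale a bin index from [0, num_bins - 1] to [-max_delta, +max_delta].
        return (bin_index / (num_bins - 1)) * 2 * max_delta - max_delta

    return {
        "terminate": bool(terminate),
        "translation": [scale(b) for b in rest[:3]],   # end-effector displacement
        "rotation": [scale(b) for b in rest[3:6]],     # end-effector rotation delta
        "gripper": rest[6] / (num_bins - 1),           # 0.0 to 1.0 opening (assumed)
    }

action = detokenize_action("1 128 91 241 5 101 127 217")
```

Because the action is just another token sequence, the same decoder weights that produce language can produce motion, which is what collapses the old reasoning/action split into one model.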
Advancing robotic learning with knowledge transfer
RT-2 exhibits a promising capability to transfer knowledge into actions, indicating a potential for robots to swiftly adapt to new and unfamiliar situations and surroundings. In extensive testing, consisting of over 6,000 robotic trials, RT-2 performed as well as its predecessor, RT-1, when handling tasks from its training data (referred to as “seen” tasks).
However, what sets the new model apart is its significant improvement in handling novel, previously unseen scenarios, achieving an impressive success rate of 62 percent, compared to RT-1’s 32 percent. This enhanced performance demonstrates the potential of RT-2 in enabling robots to effectively learn and apply knowledge to new challenges.
Press release link: https://www.blog.google/technology/ai/google-deepmind-rt2-robotics-vla-model/