Google DeepMind Unveils Enhanced Robotic Control with RT-2

Sep 12, 2023

Google DeepMind has unveiled Robotic Transformer 2 (RT-2), a vision-language-action (VLA) model designed to let robots be controlled through plain-language instructions. Drawing on data from the Internet, RT-2 aims to help robots operate capably in human environments, much like the robot companions of science fiction.

Redefining Robotic Capabilities

RT-2 draws inspiration from how humans learn by reading and observing. It builds on a large language model of the kind behind ChatGPT, trained on online text and images. This lets RT-2 generalize: it can recognize patterns and perform tasks it was never explicitly trained on.

Google showcased RT-2’s proficiency by demonstrating its ability to identify and discard trash without prior training on that task. This includes recognizing potentially ambiguous items, such as food packaging, as trash. In a separate test, a robot powered by RT-2 successfully picked out a dinosaur figurine when instructed to “Pick up the extinct animal.” These capabilities matter because robot training has traditionally been labor-intensive, relying on extensive manual data collection.

The Technical Mastery Behind RT-2

RT-2’s capabilities stem from Google DeepMind’s adoption of transformer AI models, which are known for generalizing well. The technology builds on Google’s prior AI work, such as the Pathways Language and Image model (PaLI-X) and the Pathways Language model Embodied (PaLM-E). RT-2 was also co-trained on data from its precursor, RT-1, which was gathered over 17 months.

The RT-2 framework fine-tunes a pre-trained vision-language model (VLM) on robotics and web data, producing a model that takes camera images from a robot and predicts the actions it should take next. Actions are represented as tokens, akin to word fragments, so the model can output robot commands in the same form it outputs text. This approach, first applied in RT-1, was carried over to RT-2: actions are converted into symbolic “string” representations to facilitate new skill acquisition.
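To make the idea of actions as tokens more concrete, here is a minimal Python sketch of one way continuous robot commands could be turned into a string of integer tokens and back. It is an illustration only, not Google DeepMind's code: the bin count of 256, the value range of -1.0 to 1.0, and the function names `encode_action` and `decode_action` are all assumptions made for this example.

```python
from typing import List, Sequence

# Assumption for illustration: each action dimension is split into 256 bins.
NUM_BINS = 256


def encode_action(values: Sequence[float], low: float = -1.0, high: float = 1.0) -> str:
    """Map continuous values in [low, high] to a space-separated token string."""
    tokens = []
    for v in values:
        # Clip out-of-range commands so they land in the first or last bin.
        clipped = min(max(v, low), high)
        # Scale to [0, NUM_BINS - 1] and round to the nearest bin index.
        index = round((clipped - low) / (high - low) * (NUM_BINS - 1))
        tokens.append(str(index))
    return " ".join(tokens)


def decode_action(text: str, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Invert encode_action, recovering each value up to quantization error."""
    return [low + int(tok) / (NUM_BINS - 1) * (high - low) for tok in text.split()]


if __name__ == "__main__":
    command = [0.3, -0.7, 0.05]  # hypothetical arm displacements
    as_text = encode_action(command)
    print(as_text)
    print(decode_action(as_text))
```

Because the encoded action is just a short string of numbers, a language model can be trained to produce it the same way it produces words. The trade-off is a small loss of precision from rounding each value to its nearest bin.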

Additionally, RT-2 uses chain-of-thought reasoning, which lets it make multi-stage decisions, such as choosing an alternative tool or a suitable beverage for a specific situation. In comparative tests on new situations, RT-2 recorded a 62% success rate against RT-1’s 32%.
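One way to picture a multi-stage decision is as model output that states a plan in words before giving the action tokens. The Python sketch below parses such an output. The "Plan: ... Action: ..." layout, the example string, and the function name `parse_plan_and_action` are assumptions made for illustration; the article does not specify the format RT-2 actually uses.

```python
from typing import List, Tuple


def parse_plan_and_action(output: str) -> Tuple[str, List[int]]:
    """Split 'Plan: ... Action: ...' text into a plan string and action tokens."""
    plan_part, separator, action_part = output.partition("Action:")
    if not separator:
        raise ValueError("no 'Action:' segment found in model output")
    # Drop the 'Plan:' label, surrounding whitespace, and a trailing period.
    plan = plan_part.replace("Plan:", "", 1).strip().rstrip(".")
    # The remaining text is assumed to be space-separated integer tokens.
    tokens = [int(tok) for tok in action_part.split()]
    return plan, tokens


if __name__ == "__main__":
    # Hypothetical output: the model reasons first, then acts.
    sample = "Plan: pick up the rock. Action: 1 128 91 241 5 101 127 217"
    plan, tokens = parse_plan_and_action(sample)
    print(plan)
    print(tokens)
```

Separating the plan from the action keeps the reasoning step readable for people while leaving the action tokens in a form a robot controller could consume.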

However, the model has its limitations. Although web data improves generalization over concepts, it cannot give the robot new physical skills it hasn’t practiced. Google acknowledges these constraints and the considerable research still ahead but remains optimistic, viewing RT-2 as a significant step toward general-purpose robots.

Elizabeth Wallace

Elizabeth Wallace is a Nashville-based freelance writer with a soft spot for data science and AI and a background in linguistics. She spent 13 years teaching language in higher ed and now helps startups and other organizations explain, clearly, what it is they do.
