A research prototype from Google DeepMind that explores the future of human-AI agent interaction, starting with the browser. Built with Gemini 2.0, it combines multimodal understanding and reasoning to automate tasks in the browser.
Understands and reasons about everything on a browser screen, including pixels and web elements such as text, code, images, and forms
Understands and navigates complex websites, executing tasks on behalf of users
Achieves a state-of-the-art result of 83.5% on the WebVoyager benchmark in a single-agent setup
Only open to trusted testers
Automatically browses and interacts with complex websites
Handles repetitive tasks on web pages, saving users time