UI-TARS Desktop is a GUI agent application based on UI-TARS, a vision-language model that lets you control your computer using natural language. It integrates key components such as perception, reasoning, grounding, and memory into a single vision-language model, enabling end-to-end task automation without predefined workflows or manual rules.
- Perception: processes multimodal inputs (text, images, interactions) to build a coherent understanding of interfaces, monitoring in real time and responding accurately to dynamic GUI changes
- Action (grounding): standardized action definitions across platforms (desktop, mobile, and web), supporting additional operations such as hotkeys, long press, and platform-specific gestures
- Reasoning: combines fast, intuitive responses with deliberate high-level planning, supporting multi-step planning, reflection, and error correction for robust task execution
- Memory: short-term memory captures task-specific context, while long-term memory retains historical interactions and knowledge to improve decision-making
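The perception, reasoning, action, and memory components described above can be pictured as a single agent loop. The sketch below is illustrative only: every name in it (`Action`, `Memory`, `run_task`, `capture_screen`, `execute`, `model.predict`) is a hypothetical placeholder, not the project's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    """A standardized, platform-agnostic action (hypothetical schema)."""
    kind: str                  # e.g. "click", "type", "hotkey", "long_press"
    args: dict = field(default_factory=dict)

@dataclass
class Memory:
    """Short-term task context plus long-term retained history."""
    short_term: list = field(default_factory=list)   # steps of the current task
    long_term: list = field(default_factory=list)    # knowledge kept across tasks

def run_task(instruction, model, capture_screen, execute, max_steps=20):
    """Run one perception -> reasoning -> action cycle per step."""
    memory = Memory()
    for _ in range(max_steps):
        screenshot = capture_screen()                        # perception
        action = model.predict(instruction, screenshot,      # reasoning + grounding
                               memory.short_term)
        if action.kind == "finished":
            memory.long_term.extend(memory.short_term)       # retain for later tasks
            return True
        execute(action)                                      # act on the GUI
        memory.short_term.append(action)                     # update task context
    return False                                             # step budget exhausted
```

The `max_steps` cap and the `"finished"` sentinel stand in for whatever stopping criterion the real agent uses; the point is only that each iteration feeds a fresh screenshot and the accumulated short-term memory back into the model.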
- Natural-language control: executes various computer tasks from plain-language instructions, such as browsing websites and sending tweets
- Cross-platform: supports automated operation on Windows and macOS with a unified user experience
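To make the natural-language control concrete, here is one way an instruction and a screenshot might be packaged for a vision-language model served behind an OpenAI-compatible chat endpoint. This is a sketch under stated assumptions: the payload shape, the `"ui-tars"` model name, and the placeholder image bytes are all illustrative, not the application's documented protocol.

```python
import base64
import json

def build_request(instruction, screenshot_png_bytes, model="ui-tars"):
    """Pair a natural-language instruction with a screenshot in an
    OpenAI-compatible multimodal chat payload (assumed endpoint shape)."""
    image_b64 = base64.b64encode(screenshot_png_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }

# Placeholder bytes stand in for a real PNG screenshot capture.
payload = build_request("Open the browser and search for weather", b"\x89PNG-placeholder")
body = json.dumps(payload)  # ready to POST to the model endpoint
```

The model's reply would then be parsed into a concrete GUI action (click, type, hotkey) and executed on the local machine, repeating until the task completes.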