Use Case
Give computer use agents their own machine.
Computer use agents — AI agents that interact with desktop GUIs, browsers, and screen-based applications — need more than a code sandbox. They need a full machine with a display server, a browser, and system-level access.
Agent Cloud lets these agents provision their own isolated Linux VMs via API, set up the desktop environment they need, and tear it down when the task is done.
What computer use agents need
Unlike coding agents that execute scripts and return text output, computer use agents interact with graphical interfaces. They need:
- A display server. Xvfb (the X virtual framebuffer) or a similar virtual display for rendering GUI applications headlessly.
- A real browser. Chrome or Firefox with full rendering, JavaScript execution, and extension support — not a lightweight HTTP client.
- Screenshot and input capabilities. The agent takes screenshots, reasons about what it sees, and sends mouse clicks and keyboard input. This requires system-level tools, such as xdotool for input and scrot or ImageMagick for screenshots.
- Isolation. Computer use agents interact with real applications and websites. Running them on a developer's local machine is risky. An isolated VM limits the blast radius.
- Time. Computer use tasks often take minutes or hours — navigating multi-step workflows, filling forms, extracting data from complex UIs. Ephemeral sandboxes with tight time limits don't work.
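In practice, the screenshot-and-act loop comes down to a few system commands. The sketch below assumes a Debian or Ubuntu VM with the `xvfb`, `xdotool`, and `scrot` packages, and a virtual display on `:99`. The action dictionary format (`type`, `x`, `y`, `text`, `keys`) is hypothetical and stands in for whatever your agent's model emits. The helpers only build command lines, so you can inspect or log them before running them on the VM.

```python
import os
import shlex
import subprocess
from typing import Any

# Assumed display number for the virtual framebuffer (any free number works).
DISPLAY = ":99"

# One-time setup on an assumed Debian/Ubuntu image: install an X virtual
# framebuffer plus tools for input (xdotool) and screenshots (scrot), then
# start a 1280x1024, 24-bit virtual screen in the background.
SETUP_COMMANDS = [
    "sudo apt-get update",
    "sudo apt-get install -y xvfb xdotool scrot",
    f"Xvfb {DISPLAY} -screen 0 1280x1024x24 &",
]


def screenshot_argv(path: str) -> list[str]:
    """Command that saves the whole virtual screen to a PNG file."""
    return ["scrot", path]


def action_to_argv(action: dict[str, Any]) -> list[str]:
    """Translate one hypothetical agent action into an xdotool command."""
    kind = action["type"]
    if kind == "click":
        # Move the pointer, then press and release mouse button 1 (left).
        return ["xdotool", "mousemove", str(action["x"]), str(action["y"]), "click", "1"]
    if kind == "type":
        # Type literal text with a 50 ms delay between keystrokes.
        return ["xdotool", "type", "--delay", "50", action["text"]]
    if kind == "key":
        # Press a named key or chord, such as "Return" or "ctrl+l".
        return ["xdotool", "key", action["keys"]]
    raise ValueError(f"unsupported action type: {kind!r}")


def run_on_display(argv: list[str]) -> None:
    """Run a command against the virtual display (call this on the VM)."""
    subprocess.run(argv, check=True, env={**os.environ, "DISPLAY": DISPLAY})


print(shlex.join(action_to_argv({"type": "click", "x": 640, "y": 400})))
# xdotool mousemove 640 400 click 1
```

Keeping the translation in a single function gives you one place to log, validate, or allow-list every action before it reaches the display.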
Why VMs over sandboxes
Code execution sandboxes like E2B and Daytona are optimized for running scripts and returning text. Computer use agents need a different kind of environment:
| Requirement | Code Sandbox | Full VM |
|---|---|---|
| GUI / display server | Not available | Install Xvfb, VNC, or noVNC |
| Full browser | Limited or unavailable | Install Chrome, Firefox, Playwright |
| System packages | Restricted | apt install anything |
| Execution time | Seconds to minutes | Hours to days |
| Networking | Sandboxed | Full network access (Sandbox-tier restrictions apply) |
| Agent self-provisioning | Human setup required | Agent provisions via API |
Example workflows
- Web research with screenshots. Agent provisions a VM, installs Chrome and Playwright, navigates websites, takes screenshots, extracts information, and returns structured results.
- Form filling and data entry. Agent navigates multi-step web forms, uploads documents, and completes submission workflows that require GUI interaction.
- Application testing. Agent installs a web application, runs it, interacts with the UI, and reports bugs or validates functionality.
- Desktop automation. Agent sets up a Linux desktop environment and automates interactions with GUI applications — spreadsheets, design tools, legacy systems.
- Multi-agent orchestration. A coordinator agent provisions multiple VMs, each running a different computer use agent working on a subtask. Results are collected and synthesized.
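Multi-agent orchestration is a fan-out/fan-in over VMs. In the sketch below, `provision_vm`, `run_agent`, and `delete_vm` are hypothetical stand-ins that only simulate the work; they are not Agent Cloud's API. A thread pool runs subtasks in parallel, and `try`/`finally` releases each VM even when its agent fails.

```python
from concurrent.futures import ThreadPoolExecutor

deleted: list[str] = []  # records teardown calls so the cleanup is visible


def provision_vm(index: int) -> str:
    # Hypothetical: a real version would call the provisioning API.
    return f"vm-{index}"


def run_agent(vm_id: str, subtask: str) -> str:
    # Hypothetical: a real version would drive a computer use agent on the VM.
    return f"{subtask}: done on {vm_id}"


def delete_vm(vm_id: str) -> None:
    # Hypothetical: a real version would call the teardown API.
    deleted.append(vm_id)


def handle_subtask(index: int, subtask: str) -> str:
    """Give one subtask its own VM and always release the VM afterwards."""
    vm_id = provision_vm(index)
    try:
        return run_agent(vm_id, subtask)
    finally:
        delete_vm(vm_id)


def orchestrate(subtasks: list[str], max_parallel: int = 4) -> list[str]:
    """Run subtasks in parallel, one VM each, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(handle_subtask, range(len(subtasks)), subtasks))


results = orchestrate(["compare prices", "collect reviews", "check stock"])
```

`max_parallel` also caps how many VMs exist at once. That matters if your account limits concurrent VMs.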
The Manus pattern
Manus and similar computer use agents have popularized the pattern of AI agents that can "see" and "click" their way through software. These agents are most capable when they have their own isolated machine — and most constrained when they share the user's desktop or run in a restricted sandbox.
Agent Cloud fits this pattern naturally. The agent provisions a VM, configures the desktop environment to its needs, runs the task, and cleans up. The human never needs to set up infrastructure or worry about the agent interfering with their local machine.
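The provision, configure, run, clean up cycle maps naturally onto a context manager. `AgentCloudClient` below is a hypothetical in-memory placeholder, not the real Agent Cloud API. It records calls so the ordering is visible. Setup runs right after provisioning, and deletion happens in `finally`, so the VM is removed even if the task raises.

```python
from contextlib import contextmanager
from typing import Iterator


class AgentCloudClient:
    """Hypothetical client that logs calls instead of contacting an API."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self._count = 0

    def create_vm(self) -> str:
        self._count += 1
        vm_id = f"vm-{self._count}"
        self.log.append(f"create {vm_id}")
        return vm_id

    def run_command(self, vm_id: str, command: str) -> None:
        self.log.append(f"run {vm_id}: {command}")

    def delete_vm(self, vm_id: str) -> None:
        self.log.append(f"delete {vm_id}")


@contextmanager
def disposable_vm(client: AgentCloudClient, setup: list[str]) -> Iterator[str]:
    """Provision a VM, apply setup commands, and delete it on exit."""
    vm_id = client.create_vm()
    try:
        for command in setup:
            client.run_command(vm_id, command)
        yield vm_id
    finally:
        # Runs on normal exit and on exceptions, so no VM is left behind.
        client.delete_vm(vm_id)


client = AgentCloudClient()
try:
    with disposable_vm(client, ["apt-get install -y xvfb"]) as vm:
        client.run_command(vm, "start-agent")  # hypothetical task command
        raise RuntimeError("agent crashed mid-task")
except RuntimeError:
    pass
print(client.log[-1])  # delete vm-1
```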
Safety and isolation
Computer use agents interact with real websites and applications. Running them on your local machine means they have access to your accounts, files, and network. An isolated VM changes the risk profile:
- The agent operates on a disposable machine with no access to your data
- Sandbox-tier VMs expire automatically after 72 hours
- Network restrictions limit abuse (SMTP is blocked and some ports are restricted)
- If something goes wrong, delete the VM — your machine is untouched
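Sandbox-tier VMs are deleted automatically, so a long-running agent may want to check its remaining time before it starts a multi-hour task. This sketch applies only the 72-hour lifetime stated above. It assumes the VM's creation time is available as a timezone-aware UTC timestamp, and it assumes expiry is measured from creation. Neither assumption is specified on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Sandbox-tier lifetime stated on this page.
SANDBOX_TTL = timedelta(hours=72)


def expires_at(created_at: datetime) -> datetime:
    """Moment a Sandbox-tier VM created at `created_at` is removed (assumed)."""
    return created_at + SANDBOX_TTL


def time_remaining(created_at: datetime, now: Optional[datetime] = None) -> timedelta:
    """Time left before expiry, clamped to zero once the VM has expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    return max(expires_at(created_at) - now, timedelta(0))


def fits_before_expiry(created_at: datetime, estimate: timedelta,
                       now: Optional[datetime] = None) -> bool:
    """True if a task of the estimated duration can finish before expiry."""
    return estimate <= time_remaining(created_at, now)


created = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 3, 9, 0, tzinfo=timezone.utc)
print(time_remaining(created, now))  # 1 day, 0:00:00
```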
Get started
Provision a free sandbox VM, install a display server and browser, and point your computer use agent at it. Read the quickstart to go from zero to a running VM in four API calls.