Use Case

Give computer use agents their own machine.

Computer use agents — AI agents that interact with desktop GUIs, browsers, and screen-based applications — need more than a code sandbox. They need a full machine with a display server, a browser, and system-level access.

Agent Cloud lets these agents provision their own isolated Linux VMs via API, set up the desktop environment they need, and tear it down when the task is done.

What computer use agents need

Unlike coding agents that execute scripts and return text output, computer use agents interact with graphical interfaces. They need:

  • A display server. Xvfb or another virtual framebuffer for rendering GUI applications headlessly.
  • A real browser. Chrome or Firefox with full rendering, JavaScript execution, and extension support — not a lightweight HTTP client.
  • Screenshot and input capabilities. The agent takes screenshots, reasons about what it sees, and sends mouse clicks and keyboard input. This requires system-level tools.
  • Isolation. Computer use agents interact with real applications and websites. Running them on a developer's local machine is risky. An isolated VM limits the blast radius.
  • Time. Computer use tasks often take minutes or hours — navigating multi-step workflows, filling forms, extracting data from complex UIs. Ephemeral sandboxes with tight time limits don't work.
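
The display, screenshot, and input requirements above can be sketched with standard Linux tools. This is a minimal illustration, not Agent Cloud's own tooling: it assumes Xvfb, ImageMagick (for `import`), and xdotool are installed on the VM, and the display number `:99` and the helper names are choices made for this example.

```python
import os
import subprocess

DISPLAY = ":99"  # assumed display number; any unused one works


def xvfb_command(display=DISPLAY, width=1280, height=800, depth=24):
    """Build the argv that starts a headless X server on `display`."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def screenshot_command(path):
    """Capture the whole virtual screen with ImageMagick's `import`."""
    return ["import", "-window", "root", path]


def click_command(x, y, button=1):
    """Move the pointer to (x, y) and click with xdotool."""
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def type_command(text):
    """Type text into the focused window with xdotool."""
    return ["xdotool", "type", text]


def display_env(display=DISPLAY):
    """Copy of the current environment with DISPLAY pointing at Xvfb."""
    env = dict(os.environ)
    env["DISPLAY"] = display
    return env


def run(argv, display=DISPLAY):
    """Run one tool against the virtual display; raises if it fails."""
    return subprocess.run(argv, env=display_env(display), check=True)
```

On a VM with those tools installed, an agent could start the server once with `subprocess.Popen(xvfb_command())`, then alternate `run(screenshot_command("screen.png"))` with `run(click_command(640, 400))` as it works through a task.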

Why VMs over sandboxes

Code execution sandboxes like E2B and Daytona are optimized for running scripts and returning text. Computer use agents need a different kind of environment:

Requirement              | Code Sandbox           | Full VM
GUI / display server     | Not available          | Install Xvfb, VNC, or noVNC
Full browser             | Limited or unavailable | Install Chrome, Firefox, Playwright
System packages          | Restricted             | apt install anything
Execution time           | Seconds to minutes     | Hours to days
Networking               | Sandboxed              | Full network access (sandbox-tier limits apply)
Agent self-provisioning  | Human setup required   | Agent provisions via API

Example workflows

  • Web research with screenshots. Agent provisions a VM, installs Chrome and Playwright, navigates websites, takes screenshots, extracts information, and returns structured results.
  • Form filling and data entry. Agent navigates multi-step web forms, uploads documents, and completes submission workflows that require GUI interaction.
  • Application testing. Agent installs a web application, runs it, interacts with the UI, and reports bugs or validates functionality.
  • Desktop automation. Agent sets up a Linux desktop environment and automates interactions with GUI applications — spreadsheets, design tools, legacy systems.
  • Multi-agent orchestration. A coordinator agent provisions multiple VMs, each running a different computer use agent working on a subtask. Results are collected and synthesized.
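
The multi-agent orchestration workflow can be sketched as a simple fan-out and collect. This is an illustration under stated assumptions: `run_on_vm` is a hypothetical callable supplied by the caller (it would provision a VM, run one subtask there, delete the VM, and return the result), and `synthesize` stands in for whatever merging step the coordinator actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subtasks(subtasks, run_on_vm, max_vms=4):
    """Run each subtask through `run_on_vm` and return results in input order.

    `run_on_vm` is hypothetical: one call handles one subtask on its own VM.
    `max_vms` caps how many VMs are in use at the same time.
    """
    with ThreadPoolExecutor(max_workers=max_vms) as pool:
        # map() preserves input order even when workers finish out of order.
        return list(pool.map(run_on_vm, subtasks))


def synthesize(results):
    """Merge per-agent results into one report (here: a numbered list)."""
    return "\n".join(f"{i + 1}. {r}" for i, r in enumerate(results))
```

Threads suit this sketch because each worker would spend most of its time waiting on a remote VM rather than computing locally.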

The Manus pattern

Manus and similar computer use agents have popularized the pattern of AI agents that can "see" and "click" their way through software. These agents are most capable when they have their own isolated machine — and most constrained when they share the user's desktop or run in a restricted sandbox.

Agent Cloud fits this pattern naturally. The agent provisions a VM, configures the desktop environment to its needs, runs the task, and cleans up. The human never needs to set up infrastructure or worry about the agent interfering with their local machine.
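
The provision, configure, run, and clean-up cycle described above can be sketched as a context manager. The client interface here is an assumption made for illustration: `create_vm`, `run`, and `delete_vm` are hypothetical method names, not Agent Cloud's documented API, and `RecordingClient` is an in-memory stand-in so the sketch runs without a network.

```python
from contextlib import contextmanager


@contextmanager
def temporary_vm(client, setup_commands):
    """Provision a VM, configure it, and always delete it afterwards.

    `client` is a hypothetical object with create_vm(), run(vm_id, cmd),
    and delete_vm(vm_id); these names are assumptions for this sketch.
    """
    vm_id = client.create_vm()
    try:
        for command in setup_commands:
            client.run(vm_id, command)
        yield vm_id
    finally:
        # Runs on success and on failure, so no VM is left behind.
        client.delete_vm(vm_id)


class RecordingClient:
    """In-memory stand-in that records calls instead of contacting an API."""

    def __init__(self):
        self.calls = []

    def create_vm(self):
        self.calls.append(("create",))
        return "vm-1"

    def run(self, vm_id, command):
        self.calls.append(("run", vm_id, command))

    def delete_vm(self, vm_id):
        self.calls.append(("delete", vm_id))
```

A task would then run inside `with temporary_vm(client, ["apt-get update", "apt-get install -y xvfb xdotool"]) as vm_id:`, and the VM is deleted even if the task raises.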

Safety and isolation

Computer use agents interact with real websites and applications. Running them on your local machine means they have access to your accounts, files, and network. An isolated VM changes the risk profile:

  • The agent operates on a disposable machine with no access to your data
  • Sandbox-tier VMs expire after 72 hours automatically
  • Network restrictions prevent abuse (no SMTP, restricted ports)
  • If something goes wrong, delete the VM — your machine is untouched
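
Because sandbox VMs expire on a fixed schedule, an agent may want to check that a long task fits in the remaining lifetime before starting it. The sketch below takes the 72-hour figure from the list above and assumes the clock starts at VM creation; that starting point and the function names are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone

SANDBOX_LIFETIME = timedelta(hours=72)  # figure taken from the text above


def expires_at(created_at):
    """Expiry time, assuming the 72-hour clock starts at creation."""
    return created_at + SANDBOX_LIFETIME


def fits_before_expiry(created_at, now, task_budget):
    """True if a task lasting `task_budget`, started at `now`, ends by expiry."""
    return now + task_budget <= expires_at(created_at)


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
late_start = created + timedelta(hours=70)
print(fits_before_expiry(created, late_start, timedelta(hours=1)))  # True
print(fits_before_expiry(created, late_start, timedelta(hours=3)))  # False
```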

Get started

Provision a free sandbox VM, install a display server and browser, and point your computer use agent at it. Read the quickstart to go from zero to a running VM in four API calls.