Computer Use

Computer use lets your agent control a real desktop. It clicks, types, scrolls, and reads the screen the same way a person would. Use it for any task that needs a browser, a GUI app, or a terminal.

When it activates

When your account has sandbox execution enabled, every run gets a full Linux desktop alongside your normal tools. The agent gains three additional tools:

  • computer: mouse clicks, keyboard input, screenshots
  • bash: shell commands, file system, processes
  • str_replace_based_edit_tool: read and edit files

Your existing integrations (Slack, GitHub, etc.) are available in the same run. Computer use tools and MCP integrations work side by side.

What the agent can do

Your agent gets a full Linux desktop with a browser, terminal, and file system. It can:

  • Browse the web and fill out forms
  • Run scripts and commands
  • Open, edit, and save files
  • Use any installed GUI application
  • Copy, paste, drag, scroll, and use keyboard shortcuts

After every action, the agent sees a screenshot and decides what to do next.
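
The act-then-observe cycle described above can be pictured with a toy simulation. This is only a sketch: `FakeDesktop`, `decide`, `run_loop`, and the action dictionaries are hypothetical names invented for illustration, not part of the product's API. In a real run the agent receives actual screenshots and picks its own actions.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for the sandbox desktop (not a real API)."""
    typed: str = ""
    log: list = field(default_factory=list)

    def act(self, action: dict) -> str:
        """Apply one action, then return a 'screenshot' (here, plain text)."""
        if action["type"] == "type":
            self.typed += action["text"]
        self.log.append(action["type"])
        return f"screen shows: {self.typed!r}"


def decide(screenshot: str, goal: str):
    """Hypothetical policy: type the goal text unless it is already visible."""
    if goal in screenshot:
        return None  # goal reached, nothing left to do
    return {"type": "type", "text": goal}


def run_loop(desktop: FakeDesktop, goal: str, max_steps: int = 5) -> list:
    """Act, look at the resulting screenshot, then decide the next action."""
    screenshots = []
    action = {"type": "screenshot"}  # first step: just look at the screen
    for _ in range(max_steps):
        shot = desktop.act(action)
        screenshots.append(shot)
        action = decide(shot, goal)
        if action is None:
            break
    return screenshots
```

Calling `run_loop(FakeDesktop(), "hello")` takes two steps in this toy setup: one initial look at the screen and one typing action, each followed by a screenshot.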

Stream events

Computer use runs produce the same SSE event stream as regular runs, plus a few extras:

Event               When                     Key fields
sandbox-connecting  Environment starting     message
sandbox-connected   Desktop ready            message, duration_ms
tool_use            Agent takes an action    name (computer/bash/editor), input
tool_result         Action result            content (screenshot for computer actions)
text                Agent thinking out loud  text delta
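
As a rough illustration of consuming these events, the sketch below parses raw SSE lines and summarizes each event by name. It assumes that every frame carries an `event:` line and that the `data:` line holds a JSON object with the fields listed above. That wire format is an assumption, not something this page specifies, and `parse_sse` and `describe` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from raw SSE lines.

    ASSUMPTION: each frame's data is a JSON object.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line ends the current frame
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())
    if data:  # lenient: flush a final frame that lacks a trailing blank line
        yield event, json.loads("\n".join(data))


def describe(event: str, payload: dict) -> str:
    """One-line summary per event type, using the field names listed above."""
    if event == "sandbox-connecting":
        return f"starting: {payload.get('message', '')}"
    if event == "sandbox-connected":
        return f"ready in {payload.get('duration_ms')} ms"
    if event == "tool_use":
        return f"action: {payload.get('name')}"
    if event == "tool_result":
        return "result received"
    if event == "text":
        return f"text: {payload.get('text', '')}"
    return f"other: {event}"
```

`parse_sse` accepts any iterable of lines, so it can be fed from a file or from a streaming HTTP response that yields decoded lines.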

Screenshots appear in the chat after each action. The web UI renders them inline; in the stream, they arrive as the content of tool_result events for computer actions.
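
A minimal sketch of pulling screenshots out of already-parsed events follows. The exact shape of `content` is not documented here, so the code assumes it is a list of blocks in which an image block looks like `{"type": "image", "source": {"data": "<base64 PNG>"}}`. Treat that shape, and the helper name `save_screenshots`, as assumptions to check against real stream output.

```python
import base64
from pathlib import Path


def save_screenshots(events, out_dir) -> list:
    """Write each screenshot found in tool_result events to out_dir.

    `events` is an iterable of (event_name, payload) pairs.
    ASSUMPTION: payload["content"] is a list of blocks, and an image block
    looks like {"type": "image", "source": {"data": "<base64 PNG>"}}.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for event, payload in events:
        if event != "tool_result":
            continue
        content = payload.get("content")
        if not isinstance(content, list):
            continue  # e.g. plain-text output with no image blocks
        for block in content:
            if isinstance(block, dict) and block.get("type") == "image":
                path = out / f"screenshot_{len(saved):03d}.png"
                path.write_bytes(base64.b64decode(block["source"]["data"]))
                saved.append(path)
    return saved
```

Files are numbered in arrival order, which gives a simple frame-by-frame record of what the agent saw during the run.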

Limitations

  • No session resume. Desktop state resets between runs. Files you write are saved and accessible in the run's file list, but the desktop itself starts fresh each time.
  • No live view. You see screenshots after each action, not a live video feed.