Google has turned computer use into a built-in tool inside Gemini 3.5 Flash, the fast, low-cost tier of its model family. With it, developers can build agents that see what is on a screen, reason about it, and then act on it: clicking, typing, and navigating across browser, mobile, and desktop environments. It is available now through the Gemini API and the Gemini Enterprise Agent Platform.
The capability itself is not new. Until now, computer use lived in a standalone Gemini 2.5 Computer Use model that developers called as its own endpoint. The change here is placement rather than invention: by folding screen control directly into the main Flash model, Google makes it a default capability of the model most people already reach for, instead of a specialized tool off to the side.
That placement is the real story, because Flash is the cheap, high-volume tier. Computer use is exactly the kind of feature whose usefulness depends on cost, since an agent that drives software for hours consumes a lot of tokens. Google is aiming the capability at long-horizon and enterprise automation, such as continuous software testing and knowledge work across professional applications: the repetitive, multi-step jobs where having a cheap model do the clicking actually changes the economics.
The part worth paying attention to is what Google shipped alongside the capability. An agent that operates a live browser or a real desktop is especially exposed to prompt injection, where a malicious web page, email, or document slips instructions into what the agent reads and hijacks its behavior. Google says it used targeted adversarial training to harden Gemini 3.5 Flash against this, and it released two optional enterprise safeguard systems: one that requires explicit user confirmation before the agent takes a sensitive action, and one that automatically stops a task if it detects an indirect injection attempt. Defense, not just capability, in the same announcement.
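For developers wondering how two such safeguards fit into an agent, the general pattern is an observe-reason-act loop with two checkpoints. The sketch below is a minimal, hypothetical Python illustration of that pattern, not Google's implementation and not the Gemini API: `Action`, `run_agent`, `TaskHalted`, `naive_injection_check`, and every callable passed in are invented stand-ins, and the keyword check is a toy placeholder that says nothing about how Google's safeguards actually detect injections.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


# Hypothetical action record; NOT the Gemini API's function-call schema.
@dataclass
class Action:
    kind: str                # e.g. "click", "type", "submit_payment"
    target: str              # what the action operates on
    sensitive: bool = False  # True for actions that need human sign-off


class TaskHalted(Exception):
    """Raised when the loop aborts a task over a suspected injection."""


def run_agent(
    propose: Callable[[str], Optional[Action]],  # model: screen -> next action (None = done)
    observe: Callable[[], str],                  # environment: returns current screen content
    execute: Callable[[Action], None],           # environment: performs the action
    confirm: Callable[[Action], bool],           # human: approves a sensitive action
    looks_injected: Callable[[str], bool],       # detector: flags injected instructions
    max_steps: int = 20,
) -> List[Tuple[str, str]]:
    """Observe-reason-act loop with two guardrails; returns a log of outcomes."""
    log: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        screen = observe()
        # Guardrail 2: stop the whole task if the screen appears to carry injected instructions.
        if looks_injected(screen):
            raise TaskHalted("possible indirect prompt injection detected")
        action = propose(screen)
        if action is None:  # the model reports the task is complete
            break
        # Guardrail 1: a sensitive action runs only after explicit user confirmation.
        if action.sensitive and not confirm(action):
            log.append(("declined", action.kind))
            break  # the user said no, so end the task rather than improvise
        execute(action)
        log.append(("done", action.kind))
    return log


def naive_injection_check(screen: str) -> bool:
    """Toy keyword heuristic, purely illustrative."""
    return "ignore previous instructions" in screen.lower()


if __name__ == "__main__":
    # A scripted "model" that clicks once, then tries to pay.
    plan = iter([Action("click", "search box"),
                 Action("submit_payment", "checkout", sensitive=True)])
    log = run_agent(
        propose=lambda screen: next(plan, None),
        observe=lambda: "Search results page",
        execute=lambda a: None,
        confirm=lambda a: False,  # the user declines the payment
        looks_injected=naive_injection_check,
    )
    print(log)  # [('done', 'click'), ('declined', 'submit_payment')]
```

Ending the task when the user declines, instead of letting the model look for another route, is a deliberately conservative choice in this sketch; a production system might instead hand control back to the user.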
The honest read keeps two things in view. Computer use agents are still brittle in practice, and reliability on long, multi-step tasks remains the hard, unsolved part, so a built-in tool does not make the agents themselves trustworthy. And the safeguards are optional add-ons described only in Google's own terms, not independently verified guarantees. But the combination is the signal: making screen-driving cheaper while naming and shipping a defense against its single biggest failure mode is a more grown-up way to push agents forward than capability alone, and it raises the bar for how rivals release comparable features.
