AWS opened Amazon WorkSpaces to AI agents in preview this week, giving any MCP-compliant agent framework (including LangChain, CrewAI and AWS's own Strands Agents) a managed virtual desktop from which to operate legacy applications through computer vision and input simulation. The agent authenticates through IAM, connects to a WorkSpaces instance via a pre-signed URL, and interacts the way a human employee would: taking screenshots, clicking, typing, scrolling. The target application doesn't know an agent is driving it, so nothing about the software needs to be modified. AWS demonstrated the pattern with a Strands agent on Bedrock running a prescription-refill workflow inside a sample pharmacy system (patient lookup, medication search, order placement, refill confirmation), none of it via API.
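The interaction loop described above is simple to state: take a screenshot, let a model choose one input action, apply it, repeat. The following is a minimal sketch of that observe-act loop under stated assumptions. The `Desktop` interface, `FakeDesktop`, `Action` and the scripted policy are hypothetical illustrations, not the WorkSpaces MCP tool set, and a real agent would replace the script with a vision-model call. The loop counts screenshots because each one is model input, which is where a vision agent's token bill comes from.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass(frozen=True)
class Action:
    kind: str        # "click", "type", "scroll" or "done"
    x: int = 0
    y: int = 0
    text: str = ""
    dy: int = 0


class Desktop(Protocol):
    # Hypothetical interface for illustration; not the WorkSpaces or MCP API.
    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...
    def scroll(self, dy: int) -> None: ...


@dataclass
class FakeDesktop:
    """In-memory stand-in that records every call it receives."""
    log: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        self.log.append(("screenshot",))
        return b"frame-%d" % len(self.log)

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))

    def scroll(self, dy: int) -> None:
        self.log.append(("scroll", dy))


# A policy maps (latest screenshot, step index) to the next action.
Policy = Callable[[bytes, int], Action]


def run_agent(desktop: Desktop, policy: Policy, max_steps: int = 20) -> int:
    """Screenshot, ask the policy for one action, apply it; return screenshot count."""
    screenshots = 0
    for step in range(max_steps):
        frame = desktop.screenshot()
        screenshots += 1
        action = policy(frame, step)
        if action.kind == "done":
            break
        if action.kind == "click":
            desktop.click(action.x, action.y)
        elif action.kind == "type":
            desktop.type_text(action.text)
        elif action.kind == "scroll":
            desktop.scroll(action.dy)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    return screenshots


# Usage: a scripted stand-in for the model (coordinates and name are made up).
script = [
    Action("click", x=120, y=40),       # focus a hypothetical patient search box
    Action("type", text="Jane Doe\n"),  # hypothetical patient name
    Action("scroll", dy=-3),
    Action("done"),
]
desktop = FakeDesktop()
shots = run_agent(desktop, lambda frame, step: script[step])
print(shots)  # 4
```

Even this four-step script costs four screenshots, one more than the number of actions, because the agent has to look again to decide it is finished.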
The architecture is more interesting than the demo. WorkSpaces exposes a managed MCP endpoint as the agent control plane, which leaves the framework choice to the builder rather than locking it into AWS-native runtimes. Security inherits the model already used for human WorkSpaces users: isolated instances, a unique IAM identity per agent (so CloudTrail can distinguish agentic actions from human activity), CloudWatch observability, and configurable per-stack capabilities such as desktop resolution, image format, screenshot storage and computer-input enablement. The honest cost reality is the part most readers will miss: Reflex's recent benchmark showed a vision agent consuming roughly 500,000 input tokens to complete a task an API agent handled in 12,000, a 45× token spread, with the vision agent taking 17 minutes versus 20 seconds. Palash Awasthi at Reflex framed it crisply: "Better vision models reduce error rates per screenshot, but they do not reduce the number of screenshots required to reach the relevant data."
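The per-agent IAM identity is what makes attribution mechanical. CloudTrail records the caller of each API call in a record's `userIdentity` element, including its `arn`, so separating agent activity from human activity becomes a filter. This is a minimal sketch that assumes a hypothetical naming convention in which every agent's IAM role name starts with `agent-`. The sample records are made up, and it does not claim which events a WorkSpaces agent session actually emits.

```python
import json
from collections import Counter

# Hypothetical convention: each agent assumes its own IAM role named "agent-...".
AGENT_ROLE_PREFIX = "agent-"


def principal_name(record: dict) -> str:
    """Extract the user or role name from a CloudTrail record's userIdentity.arn."""
    arn = record.get("userIdentity", {}).get("arn", "")
    resource = arn.split(":", 5)[-1]  # e.g. "assumed-role/agent-refill/s1"
    parts = resource.split("/")
    return parts[1] if len(parts) > 1 else parts[0]


def is_agent(record: dict) -> bool:
    return principal_name(record).startswith(AGENT_ROLE_PREFIX)


def split_activity(log_text: str) -> tuple[Counter, Counter]:
    """Count (principal, eventName) pairs separately for agents and humans."""
    agents, humans = Counter(), Counter()
    for record in json.loads(log_text)["Records"]:
        bucket = agents if is_agent(record) else humans
        bucket[(principal_name(record), record["eventName"])] += 1
    return agents, humans


# Usage with made-up records in CloudTrail's log-file shape.
sample = json.dumps({"Records": [
    {"eventName": "ConsoleLogin",
     "userIdentity": {"type": "IAMUser",
                      "arn": "arn:aws:iam::123456789012:user/alice"}},
    {"eventName": "GetObject",
     "userIdentity": {"type": "AssumedRole",
                      "arn": "arn:aws:sts::123456789012:assumed-role/agent-refill/s1"}},
    {"eventName": "GetObject",
     "userIdentity": {"type": "AssumedRole",
                      "arn": "arn:aws:sts::123456789012:assumed-role/agent-refill/s2"}},
]})
agents, humans = split_activity(sample)
print(agents)  # Counter({('agent-refill', 'GetObject'): 2})
print(humans)  # Counter({('alice', 'ConsoleLogin'): 1})
```

Grouping by role name rather than full ARN folds every session of the same agent together. That is the unit an auditor usually wants to see.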
The ecosystem read here is two-track. AWS is betting that the 75% of organizations Gartner flagged as still running legacy apps without modern APIs, and the 71% of Fortune 500 firms with critical mainframe processes, will choose a 45× more expensive agent over a multi-year modernization project, because the math actually works at enterprise pricing. The MCP plumbing matters more than the WorkSpaces brand: this is the first managed cloud desktop exposed as an MCP endpoint, which makes it the cloud-side counterpart to Anthropic's Claude computer use and OpenAI's Operator. Microsoft is building the same category with Windows 365 for AI agents. The bottleneck is no longer whether agents can drive GUIs (Claude 3.5 Sonnet's computer use showed that in late 2024); it is who hosts the desktop the agent runs on. AWS just bid for that layer with an MCP front door.
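The "MCP front door" is concrete at the wire level. MCP messages are JSON-RPC 2.0, and a client invokes a server tool with a `tools/call` request that carries the tool `name` and its `arguments`. Image results come back as base64-encoded `image` content blocks. The sketch below covers only that message layer and makes several assumptions. The tool name `click`, its arguments and the sample response are hypothetical; the real tool names and schemas would come from the endpoint's `tools/list` response. Session initialization, transport and IAM-signed authentication are omitted.

```python
import base64
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids, unique per client


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def images_from_result(response_text: str) -> list[bytes]:
    """Decode every image content block in a tools/call response."""
    response = json.loads(response_text)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "MCP error"))
    blocks = response["result"].get("content", [])
    return [base64.b64decode(b["data"]) for b in blocks if b.get("type") == "image"]


# Usage: hypothetical tool name and a made-up response containing a screenshot.
req = tools_call("click", {"x": 120, "y": 40})
fake_response = json.dumps({
    "jsonrpc": "2.0", "id": 1,
    "result": {"content": [
        {"type": "image", "mimeType": "image/png",
         "data": base64.b64encode(b"\x89PNG...").decode()},
    ]},
})
print(json.loads(req)["method"])          # tools/call
print(images_from_result(fake_response))  # [b'\x89PNG...']
```

Because this envelope is the same for any MCP server, an agent framework that already speaks MCP needs no WorkSpaces-specific client code to reach the desktop.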
For builders deploying agents into regulated industries: the IAM-per-agent pattern, CloudTrail audit and isolated-instance model are the parts to copy if you're building elsewhere, because regulators will want exactly this trace, not a "trust the agent" story. For builders evaluating computer use versus API integration: do the token math at your own scale and workflow length. Where an API exists, the 20-second API path beats a 17-minute vision agent on cost; on legacy stacks where modernization means a year of work and seven figures, a 45× more expensive agent that ships next week is the rational choice. The preview is available in US East (N. Virginia, Ohio), US West (Oregon), Canada Central, four European regions and five Asia-Pacific regions, with sample code on GitHub.
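The token math recommended above reduces to a break-even question: how many months of the vision agent's token premium would pay for modernization? The sketch below assumes that input tokens dominate cost. It ignores output tokens, desktop instance charges, retries and the time a rewrite takes to land. The 500,000 and 12,000 per-run figures are the benchmark numbers quoted earlier. The $3-per-million-token price, 10,000 runs per month and $1M modernization budget are hypothetical placeholders to replace with your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    vision_tokens_per_run: int    # input tokens on the computer-use path
    api_tokens_per_run: int       # input tokens if an API path existed
    runs_per_month: int
    usd_per_million_input: float  # hypothetical model price


def monthly_cost(tokens_per_run: int, w: Workload) -> float:
    """Monthly input-token spend for one path at the workload's volume."""
    return tokens_per_run * w.runs_per_month * w.usd_per_million_input / 1_000_000


def breakeven_months(w: Workload, modernization_usd: float) -> float:
    """Months of vision-over-API premium that would add up to the modernization budget."""
    premium = monthly_cost(w.vision_tokens_per_run, w) - monthly_cost(w.api_tokens_per_run, w)
    return float("inf") if premium <= 0 else modernization_usd / premium


# Usage: benchmark token counts, hypothetical price, volume and budget.
w = Workload(vision_tokens_per_run=500_000, api_tokens_per_run=12_000,
             runs_per_month=10_000, usd_per_million_input=3.0)
print(round(w.vision_tokens_per_run / w.api_tokens_per_run, 1))  # 41.7
print(monthly_cost(w.vision_tokens_per_run, w))                   # 15000.0
print(f"{breakeven_months(w, 1_000_000):.1f} months")            # 68.3 months
```

On these rounded inputs the ratio comes out near 42×, a little under the reported 45×, because the 500,000 figure is approximate. At 10,000 runs a month, the premium would fund a $1M rewrite in under six years. At 100 runs a month, the same formula pushes break-even past 500 years, which is the case where shipping the vision agent wins outright.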
