Microsoft Fara1.5: 4B/9B/27B browser agents, 27B 72% Mind2Web vs Operator 58%

Microsoft Research's AI Frontiers lab released Fara1.5: a family of browser computer-use agents at 4B, 9B, and 27B parameter sizes, built on Qwen3.5 base checkpoints. The models read screenshots and emit mouse/keyboard actions through an observe-think-act loop — each step takes prior conversation history plus the three most recent screenshots, outputs thoughts and a single action. Action space includes standard inputs plus web-specific operations (searches) and meta-actions for context management and user clarification. Online-Mind2Web (300 tasks, 136 websites): Fara1.5-27B at 72%, Fara1.5-9B at 63.4%. Comparison set: OpenAI Operator 58.3%, Gemini 2.5 Computer Use 57.3%, Yutori Navigator n1 64.7%. WebVoyager: 27B 88.6%, 9B 86.6%, 4B 80.8%. Training: ~2 million supervised samples — 60% web trajectories, 12.8% synthetic environments, 12.5% form filling/interactions, 8.8% grounding, 4.9% VQA, plus safety data. Safety pauses on missing personal info, ambiguous task descriptions, irreversible actions without approval. Open-source availability, weights, license, and HuggingFace/Azure deployment details not specified in the announcement yet.

Two things to note. Microsoft Research building on Qwen3.5 base — that is Microsoft using Chinese open-weight foundations to build a Western agentic product. Same cross-vendor weight-initialization pattern we covered last week with NVIDIA's Nemotron-Labs-Diffusion built on Ministral3. Microsoft has its own Phi family but chose Qwen3.5 as the browser-agent starting point. The OpenAI Operator comparison is the strategic move. Microsoft is OpenAI's largest investor and partner, yet Microsoft Research is shipping a research-grade browser agent that outperforms Operator by 13.7 points on Online-Mind2Web. Microsoft is hedging its OpenAI dependence by building in-house at Microsoft Research. Three sizes (4B/9B/27B) means deployment flexibility: edge tasks at 4B locally, server-grade tasks at 27B in datacenter. The meta-action space supporting context management and user clarification — pause for personal info, pause for ambiguous tasks, pause before irreversible — is the differentiator that makes browser agents shippable. Agents that won't ask before destructive actions are agents you can't put in production.

Ecosystem context. Browser-agent space heating up beyond the closed-API incumbents. OpenAI Operator (closed, GPT-class). Google Gemini 2.5 Computer Use (closed, Gemini-based). Anthropic Computer Use (closed, Claude-based). Now Microsoft Fara1.5 (Qwen3.5-based, three sizes, availability TBD). The benchmark numbers say Microsoft's research-grade family already beats the closed-API frontier on Online-Mind2Web. If Microsoft releases Fara1.5 weights publicly, the open-weights browser-agent category gets a real frontier-class entry overnight. If they keep it closed and route through Azure/Bing/Edge integration, it becomes Microsoft's defense against OpenAI capturing the agent layer. Either way, the benchmark pressure is now on Operator and Gemini Computer Use to ship the next iteration with comparable numbers. For builders shipping browser-automation products today: the 4B model at 80.8% WebVoyager is the interesting size class — accessible enough for local deployment, capable enough to handle most browser tasks.

Monday: if you ship browser-automation or computer-use products (RPA replacements, web scraping, QA testing, customer-support workflow automation), evaluate Fara1.5 as soon as availability lands. Specific tests on your task distribution: (1) login flows with MFA, (2) form filling with conditional logic, (3) multi-page navigation preserving state, (4) error-recovery from unexpected page states. The 4B variant is the size to start with — if 80.8% WebVoyager translates to 70-80% on your tasks, you have a deployable agent without datacenter inference. For closed-source competitors (Operator, Gemini Computer Use, Anthropic Computer Use): the pricing competitive position just got real pressure. Operator at $200/month per user versus deploy-your-own Fara1.5-4B locally is a fundamentally different cost curve if Microsoft releases weights. Watch HuggingFace and the Microsoft Research blog over the next 48 hours for the weight and license announcement. The benchmark gap (72% vs 58%) is real, and the downstream competitive consequence depends on whether Microsoft ships weights or keeps Fara1.5 as Azure-internal capability.

Microsoft Fara1.5: 4B/9B/27B browser agents, 27B 72% Mind2Web vs Operator 58%

More News