IBM announced the general availability of Bob on April 28, an AI development platform pitched at enterprises wrestling with technical debt, hybrid cloud, and compliance. Bob spans the full software development lifecycle (discovery, planning, coding, testing, deployment, ops) and uses role-based agents coordinated through governed workflows. The launch lands the same week GitHub announced usage-based Copilot pricing, and the framing is similar: enterprise AI dev tools have to control cost and risk, not just speed up typing. IBM says 80,000 of its own employees are already using Bob and reports a 45% average productivity gain, a figure to file under "vendor self-report" until independently verified.

Under the hood, Bob is a model-routing platform. It mixes frontier LLMs (unspecified which), open-source models, IBM's own Granite SLMs, and fine-tuned task-specific models, choosing per task based on accuracy, latency, and cost. The agentic part is multi-agent coordination across testing, documentation, and CI/CD pipelines, with human-in-the-loop controls at the boundaries. The most differentiated capability, and the one most other AI dev tools quietly skip, is mainframe modernization. IBM cites APIS IT, a Croatian government IT operator, migrating .NET services and legacy JCL/PL/I systems with claims of 10× faster architecture documentation and "100% accuracy on legacy JCL/PL/I." The latter is the kind of claim that's only meaningful if you see the test corpus; on its own, it's marketing.
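IBM hasn't published Bob's routing logic, but the trade-off it describes (accuracy vs. latency vs. cost per task) can be sketched as a simple selection policy. The model names, quality scores, prices, and thresholds below are illustrative assumptions for the pattern, not Bob's actual catalog:

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str
    quality: float       # assumed expected accuracy on the task class, 0..1
    cost_per_1k: float   # assumed $ per 1k tokens
    latency_ms: int      # assumed typical response latency

# Hypothetical catalog mixing a small SLM, an open-weight model, and a frontier LLM.
CATALOG = [
    ModelProfile("granite-small",   quality=0.72, cost_per_1k=0.0002, latency_ms=150),
    ModelProfile("open-weight-mid", quality=0.85, cost_per_1k=0.002,  latency_ms=400),
    ModelProfile("frontier-llm",    quality=0.95, cost_per_1k=0.03,   latency_ms=1200),
]

def route(task_difficulty: float, latency_budget_ms: int) -> ModelProfile:
    """Pick the cheapest model whose expected quality clears the bar
    implied by task difficulty, subject to the latency budget."""
    candidates = [m for m in CATALOG
                  if m.quality >= task_difficulty and m.latency_ms <= latency_budget_ms]
    if not candidates:
        # Nothing qualifies: fall back to the highest-quality model.
        return max(CATALOG, key=lambda m: m.quality)
    return min(candidates, key=lambda m: m.cost_per_1k)
```

Under these assumed numbers, a high-volume easy task (`route(0.6, 500)`) lands on the small Granite-style model, while a hard task with a loose latency budget (`route(0.9, 2000)`) escalates to the frontier model; that cheap-by-default, escalate-on-difficulty shape is the economic point of the pattern.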

IBM has been here before. Watson AIOps, Watson Code Assistant for Z, the original Watson Discovery: IBM has shipped enterprise AI dev tools repeatedly, and the trail of customer outcomes has been mixed. Bob is interesting because the architecture is genuinely modern (multi-agent plus model routing across SLMs and frontier LLMs, not just one big model behind a wrapper) and because mainframe support is a real moat. Cursor, Claude Code, and Copilot do not ship with PL/I expertise. But the failure modes IBM is warning about are real: hallucination on undocumented legacy environments, RAG silos, models suggesting syntactically correct but functionally useless code. Whether Bob actually solves these or just papers over them with multi-agent orchestration is the open question, and IBM's launch materials don't answer it.

For builders not in enterprise SDLC, Bob is mostly a market signal: IBM thinks the next round of enterprise AI dev tooling competition is about governance, model routing, and legacy system integration, not chat speed. For builders inside enterprises with mainframe debt, this is one of the few options that explicitly targets your problem; the smart move is to pilot it on a contained legacy module and measure the actual hallucination rate against your codebase rather than trusting the 100% accuracy headline. The 30-day trial is the lever there. And for everyone watching the space: pay attention to IBM's model-routing pattern. Mixing small Granite models for cheap high-volume tasks with frontier LLMs for hard tasks is the architecture every cost-conscious enterprise will want to replicate, with or without IBM.