OpenAI launched GPT-5.5 on April 23, with API access opening April 24. The framing: "a new class of intelligence for real work and powering agents," designed to plan, use tools, self-check, and work through tasks independently. The model is the first retrained base model since GPT-4.5, co-designed with NVIDIA's GB200 and GB300 NVL72 rack-scale systems. It is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. The pricing is the part to read carefully: GPT-5.5 standard is US$5 per million input tokens and US$30 per million output tokens, exactly 2× GPT-5.4's rates. GPT-5.5 Pro, with additional parallel test-time compute, is US$30 input / US$180 output. OpenAI's defense of the doubled rate is that GPT-5.5 completes the same Codex tasks with fewer tokens, and independent testing lab Artificial Analysis broadly confirms it: effective cost per completed task lands roughly 20% higher than GPT-5.4, not 2×.
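The arithmetic behind that reconciliation is worth making explicit. A minimal sketch, in Python: the 0.6× tokens-per-task multiplier is an assumption back-solved from the two reported figures (2× price, ~1.2× effective cost), not a number OpenAI or Artificial Analysis has published.

```python
# Effective cost is dollars per completed task, not dollars per token:
#   effective_cost = price_per_token * tokens_per_task
# The 0.6x below is an assumption back-solved from the reported figures
# (2.0 * 0.6 = 1.2), not a published per-task token count.

PRICE_MULTIPLIER = 2.0    # GPT-5.5 vs GPT-5.4 sticker price (US$30 vs US$15 output)
TOKENS_MULTIPLIER = 0.6   # assumed: ~40% fewer tokens for the same Codex task

effective = PRICE_MULTIPLIER * TOKENS_MULTIPLIER
print(f"effective cost vs GPT-5.4: {effective:.1f}x")  # 1.2x, i.e. ~20% higher
```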
The benchmarks explain why OpenAI is willing to charge double. On Terminal-Bench 2.0, which tests command-line workflows requiring planning and tool coordination in a sandboxed environment, GPT-5.5 hits 82.7% against GPT-5.4's 75.1% and Claude Opus 4.7's 69.4%. On SWE-Bench Pro (GitHub issue resolution), it reaches 58.6%. On Expert-SWE, OpenAI's internal benchmark of tasks with 20-hour median human completion times, it scores 73.1% against GPT-5.4's 68.5%. The most striking jump is MRCR v2 at one million tokens, a long-context retrieval benchmark, where GPT-5.5 scores 74.0% against GPT-5.4's 36.6%, roughly a doubling. The honest numbers are there too: on MCP Atlas, Scale AI's Model Context Protocol tool-use benchmark, Claude Opus 4.7 leads at 79.1%, and OpenAI left the GPT-5.5 cell blank in its own published table. GPT-5.5 Pro leads BrowseComp, the web-browsing benchmark, at 90.1%.
Three patterns connect. First, GPT-5.5's April 23 release is the proximate cause of this week's pricing-cluster news: GitHub announced Copilot's shift to usage-based AI Credits on April 28, explicitly citing surging inference costs. Microsoft is making its users pay for the same tokens OpenAI is charging double for. Second, the comparison math at 10 million output tokens per month is concrete: GPT-5.5 standard is US$300, Claude Opus 4.7 is US$250, a 20% premium that is only worth paying if GPT-5.5's "fewer task iterations" claim holds for your specific workload (a break-even sketch follows below). The 20% figure from Artificial Analysis is a population average, not the per-task answer. Third, OpenAI's willingness to publish a benchmark table where Claude Opus 4.7 leads on MCP Atlas, and to leave GPT-5.5's score blank, is the most useful disclosure in the launch. It signals that Anthropic is still ahead on protocol tool use, and that GPT-5.5's advantage is in long-context retrieval and end-to-end agentic tasks, not in MCP integrations specifically.
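Here is that break-even, sketched with the output rates the 10-million-token comparison implies (US$30 vs US$25 per million output tokens) and, as a simplifying assumption, ignoring input-token costs:

```python
# Break-even: GPT-5.5 wins on cost per completed task only if its
# tokens-per-task stay below the inverse of the price ratio.
GPT55_OUTPUT_RATE = 30.0   # US$ per million output tokens (published)
OPUS47_OUTPUT_RATE = 25.0  # US$ per million, implied by US$250 at 10M tokens

breakeven = OPUS47_OUTPUT_RATE / GPT55_OUTPUT_RATE
print(f"GPT-5.5 breaks even at {breakeven:.1%} of Opus 4.7's tokens per task")
# -> 83.3%: GPT-5.5 needs ~17% fewer tokens per task just to match on cost
```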
For builders, three concrete things. First, do not switch from GPT-5.4 or Claude Opus 4.7 to GPT-5.5 on the marketing math. Run your specific workload through both for two weeks, measure tokens per completed task, and compute effective cost from your own numbers, not from the 20% population average (a minimal harness sketch follows this paragraph). Second, if your application leans on MCP-style tool calling, Claude Opus 4.7 is still the leader on the public benchmark, and OpenAI's silent absence from MCP Atlas is the signal. The MCP convergence we have been covering this week (Anthropic connectors, Google Agents CLI, Slack agent context) has not yet settled in GPT-5.5's favor. Third, OpenAI says more than 85% of its employees use Codex weekly; expect OpenAI's own product surface to be the most aggressive deployer of GPT-5.5, which means the failure modes (the goblin-attractor problem we covered yesterday is one) will surface there first. Watch what OpenAI itself ships before you commit.
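A minimal sketch of that two-week measurement, assuming you log every run with token counts and a pass/fail verdict from your own acceptance test. The model names and log shape here are placeholders, not any vendor's API, and the Opus 4.7 input rate is an assumption (only its output rate is implied by the comparison above):

```python
from dataclasses import dataclass

@dataclass
class Run:
    model: str
    input_tokens: int
    output_tokens: int
    completed: bool  # per your own acceptance test, not the model's self-report

# US$ per million tokens (input, output). GPT-5.5 rates are the published ones;
# the Opus 4.7 input rate is a placeholder, and its output rate is the US$25/M
# implied by the 10M-token comparison.
PRICES = {
    "gpt-5.5": (5.00, 30.00),
    "claude-opus-4.7": (5.00, 25.00),  # input rate: assumption
}

def cost_per_completed_task(runs: list[Run], model: str) -> float:
    """Failed runs still burn tokens, so divide total spend by completions only."""
    in_rate, out_rate = PRICES[model]
    spend = sum(
        r.input_tokens * in_rate / 1e6 + r.output_tokens * out_rate / 1e6
        for r in runs
        if r.model == model
    )
    completed = sum(1 for r in runs if r.model == model and r.completed)
    return spend / completed if completed else float("inf")

# After two weeks of logged runs, compare:
#   cost_per_completed_task(runs, "gpt-5.5")
#   cost_per_completed_task(runs, "claude-opus-4.7")
```

The division by completions rather than total runs is the point: a model that is cheaper per token but fails more often can still be more expensive per finished task.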
