OpenAI shipped GPT-5.5 Instant today as the new default ChatGPT model, replacing GPT-5.3 Instant. The benchmark moves are large enough to flag: AIME 2025 climbs from 65.4 to 81.2, a 15.8-point jump on a held-out math benchmark designed to resist contamination, and MMMU-Pro multimodal reasoning lifts from 69.2 to 76.0. The model is on the API as `chat-latest`; 5.3 stays available to paid users for a three-month sunset window. Pricing details, latency benchmarks, and architecture notes weren't disclosed in the launch coverage, which puts the substantive eval read squarely on the public benchmark numbers OpenAI chose to highlight.
The "Instant" suffix continues OpenAI's tier strategy from the GPT-5 generation: Instant variants are the latency-optimized default for ChatGPT consumer traffic, with Thinking variants reserved for deliberate reasoning workloads. Whether 5.5 Instant is a fully retrained backbone or an enhanced post-training pass on the 5.3 weights isn't disclosed, and the 15.8-point AIME jump could reasonably come from either. AIME 2025 was selected partly because the test problems weren't released until after most pretraining cutoffs, so contamination is unlikely; that suggests the gain reflects real reasoning capability rather than memorization. The MMMU-Pro number tells a similar story on the multimodal side: 76.0 closes the gap to GPT-5 Thinking territory at a fraction of the latency cost. For builders who've been routing simple multimodal queries through Gemini 2.5 Flash because GPT-5.3 Instant's vision was the weak spot, the calculus shifts.
The ecosystem read is that OpenAI is converging the Instant-to-Thinking gap deliberately. Anthropic's Sonnet 4.5-to-Opus split has the same shape but a smaller delta; Google's Gemini 2.5 Flash vs Pro is wider. By pushing the default Instant to AIME 81 and MMMU-Pro 76, OpenAI is making the case that you can run consumer chat traffic on the cheap tier without forcing users to know which mode to pick. For builders shipping chat experiences on the API, the `chat-latest` alias is the relevant signal: if you've been pinning to a specific model version for stability, expect default-model promotions to keep moving the floor under you, and budget eval re-runs into your release cadence. The three-month sunset on 5.3 is OpenAI's standard pace; if your eval harness depends on a frozen 5.3 baseline, you have a clock now.
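The pin-versus-alias tension above can be made concrete with a small routing table: latency- or eval-critical routes stay pinned to a frozen model ID, while everything else floats on the `chat-latest` alias and inherits default-model promotions automatically. A minimal sketch follows; the route names and the pinned model ID string `gpt-5.3-instant` are illustrative assumptions, not confirmed API identifiers, and only the `chat-latest` alias comes from the launch coverage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPin:
    """Pin a specific model ID to a route; the alias moves, a pin does not."""
    route: str
    model: str


# Hypothetical routes: pin paths with frozen eval baselines, float the rest.
PINS = [
    ModelPin(route="eval-baseline", model="gpt-5.3-instant"),  # assumed ID, not confirmed
    ModelPin(route="general-chat", model="chat-latest"),       # follows default promotions
]


def model_for(route: str, pins=PINS, default: str = "chat-latest") -> str:
    """Resolve the model ID for a route, falling back to the moving alias."""
    for pin in pins:
        if pin.route == route:
            return pin.model
    return default
```

The design choice is that the fallback is the alias, so new routes pick up promotions by default and pinning is an explicit, auditable exception with a sunset clock attached.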
Practical move: re-eval your top traffic prompts on `chat-latest` this week. If your downstream consumers ranked GPT-5.3 Instant against Sonnet 4.5 or Gemini 2.5 Flash, the new numbers might shift your routing logic. Math and multimodal use cases get the biggest lift; pure text-completion and tool-calling tasks haven't been benchmarked publicly yet, so test your own. The three-month window for 5.3 is enough to do a controlled rollout but not enough to defer it: start the comparison now, or you'll be making the switch under deadline pressure with the deprecation looming. For ChatGPT consumer-side builders (custom GPTs, Apps SDK), the underlying model is now stronger by default and your earlier prompt engineering may need lighter scaffolding.
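One way to turn those re-eval runs into routing decisions is a promotion rule with a hysteresis margin: switch a task to a new model only when its eval score beats the incumbent's by a threshold, so small score wobbles don't cause churn. A sketch under stated assumptions follows; the score table is made-up placeholder data except for the 76.0 MMMU-Pro figure cited above, and the task and model keys are illustrative.

```python
def route_task(task: str, scores: dict[str, dict[str, float]], margin: float = 2.0) -> str:
    """Pick the model for `task`: challengers must beat the incumbent
    (the first key in `scores`) by at least `margin` eval points."""
    incumbent = next(iter(scores))  # dicts preserve insertion order
    ranked = sorted(scores.items(), key=lambda kv: kv[1].get(task, 0.0), reverse=True)
    best_model, best_evals = ranked[0]
    if best_model != incumbent:
        incumbent_score = scores[incumbent].get(task, 0.0)
        if best_evals.get(task, 0.0) - incumbent_score < margin:
            return incumbent  # lead too small to justify a switch
    return best_model


# Placeholder scores: 76.0 is the reported MMMU-Pro number; the rest are invented.
EXAMPLE_SCORES = {
    "gemini-2.5-flash": {"multimodal": 72.0},  # incumbent route
    "chat-latest": {"multimodal": 76.0},
}
```

Here `route_task("multimodal", EXAMPLE_SCORES)` would promote `chat-latest`, since a 4-point lead clears the 2-point margin; with `margin=5.0` the incumbent would hold.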
