OpenAI's GPT-5.5 Instant: AIME 2025 81.2, MMMU-Pro 76.0, default in ChatGPT

OpenAI shipped GPT-5.5 Instant today as the new default model in ChatGPT, replacing GPT-5.3 Instant. The benchmark moves are large enough to flag: AIME 2025 climbs from 65.4 to 81.2, a 15.8-point jump on a held-out math benchmark designed to resist contamination, and MMMU-Pro multimodal reasoning rises from 69.2 to 76.0, a gain of 6.8 points. The model is on the API as `chat-latest`; 5.3 remains available to paid users for a three-month sunset window. Pricing details, latency benchmarks, and architecture notes were not disclosed in the launch coverage, which leaves any substantive reading of the evals resting on the public benchmark numbers OpenAI chose to highlight.
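For readers who want to check the arithmetic, here is a minimal sketch that recomputes the gains from the four scores quoted above. The relative-gain figure is an extra derived number, not something reported in the launch coverage, and the names `REPORTED` and `deltas` are illustrative only.

```python
# Scores as quoted in the article: (GPT-5.3 Instant, GPT-5.5 Instant).
REPORTED = {
    "AIME 2025": (65.4, 81.2),
    "MMMU-Pro": (69.2, 76.0),
}


def deltas(scores):
    """Return the absolute point gain and the relative gain (%) per benchmark."""
    out = {}
    for name, (old, new) in scores.items():
        gain = round(new - old, 1)  # round away floating-point noise
        out[name] = {
            "points": gain,
            "relative_pct": round(100 * gain / old, 1),
        }
    return out


if __name__ == "__main__":
    for name, d in deltas(REPORTED).items():
        print(f"{name}: +{d['points']} points ({d['relative_pct']}% relative)")
```

Absolute points are the figure the article quotes; the relative percentage is only a convenience for comparing benchmarks that start from different baselines.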

The "Instant" suffix continues the tier strategy OpenAI has used since the GPT-5 generation: Instant variants are the latency-optimized default for ChatGPT consumer traffic, while Thinking variants are reserved for deliberate reasoning workloads. Whether 5.5 Instant is a fully retrained backbone or an enhanced post-training pass on the 5.3 weights was not disclosed, and the 15.8-point AIME jump could plausibly come from either. AIME 2025 was chosen partly because most of its test problems were released after pretraining cutoffs, which makes contamination implausible; the gain therefore more likely reflects real reasoning capability than memorization. The MMMU-Pro number tells a similar story on the multimodal side: 76.0 closes the gap to GPT-5 Thinking territory, presumably at a fraction of the latency cost, although no latency figures were published. For builders who were routing simple multimodal queries through Gemini 2.5 Flash because vision was a weak spot for GPT-5.3 Instant, the calculus shifts.

The ecosystem reading is that OpenAI is deliberately converging the Instant-to-Thinking gap. Anthropic's Sonnet 4.5 → Opus split has the same shape but a smaller delta; Google's Gemini 2.5 Flash versus Pro gap is wider. By pushing the default Instant to AIME 81 and MMMU-Pro 76, OpenAI is making the case that you can run consumer chat traffic on the cheap tier without forcing users to pick a mode. For builders shipping chat experiences on the API, the `chat-latest` alias is the relevant signal: if you call the alias instead of pinning a specific model version for stability, expect default-model promotions to keep moving the floor under you, and budget eval re-runs into your release cadence. The three-month sunset on 5.3 is OpenAI's standard pace; if your eval harness depends on a frozen 5.3 baseline, you now have a clock.
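One way to notice when an alias has moved is to record which model actually served your requests and flag any change. The sketch below assumes you can obtain that identifier as a string (for example from response metadata); how you fetch it is outside this example, and the function name, state file, and model strings are all hypothetical.

```python
import json
from pathlib import Path


def alias_changed(state_file: Path, alias: str, resolved_model: str) -> bool:
    """Record what `alias` resolved to; return True if it differs from last time.

    `resolved_model` is whatever identifier your client reports for the model
    that served the request (an assumption of this sketch).
    """
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    previous = state.get(alias)
    state[alias] = resolved_model
    state_file.write_text(json.dumps(state, indent=2))
    # The first observation is not a change; only a differing value is.
    return previous is not None and previous != resolved_model


if __name__ == "__main__":
    state = Path("alias_state.json")
    # "model-a" is a placeholder, not a real model identifier.
    if alias_changed(state, "chat-latest", "model-a"):
        print("Alias moved: schedule an eval re-run.")
```

A check like this could run in CI or a scheduled job, so a default-model promotion triggers a re-evaluation instead of surfacing first as a production regression.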

The practical move: re-evaluate your top traffic prompts on `chat-latest` this week. If your downstream consumers ranked GPT-5.3 Instant against Sonnet 4.5 or Gemini 2.5 Flash, the new numbers may shift your routing logic. Math and multimodal use cases get the biggest lift; pure text-completion and tool-calling tasks have not been publicly benchmarked yet, so test your own. The three-month window for 5.3 is enough for a controlled rollout but not for deferring: start the comparison now, or you will be making the switch under deadline pressure with deprecation looming. For builders on the ChatGPT consumer side (custom GPTs, Apps SDK), the underlying model is now stronger by default, and your earlier prompt engineering may need lighter scaffolding.
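A re-evaluation of this kind can be sketched as a small side-by-side harness. This is an illustration under stated assumptions, not a definitive implementation: `call_model` and `score` are hypothetical callables you supply (your own API client and your own grading logic), and the baseline identifier is a placeholder because the article does not give the pinned 5.3 model name.

```python
from typing import Callable, Dict, List


def compare_models(
    prompts: List[str],
    call_model: Callable[[str, str], str],  # (model_id, prompt) -> answer; hypothetical
    score: Callable[[str, str], float],     # (prompt, answer) -> score; hypothetical
    baseline: str,
    candidate: str,
    tolerance: float = 0.0,
) -> Dict[str, object]:
    """Run every prompt on both models and list prompts where the candidate regresses."""
    regressions = []
    base_total = cand_total = 0.0
    for prompt in prompts:
        base_score = score(prompt, call_model(baseline, prompt))
        cand_score = score(prompt, call_model(candidate, prompt))
        base_total += base_score
        cand_total += cand_score
        if cand_score < base_score - tolerance:
            regressions.append(prompt)
    n = max(len(prompts), 1)  # avoid division by zero on an empty prompt list
    return {
        "baseline_mean": base_total / n,
        "candidate_mean": cand_total / n,
        "regressions": regressions,
    }


if __name__ == "__main__":
    # Stub client and scorer so the sketch runs without network access.
    def fake_call(model_id: str, prompt: str) -> str:
        return prompt.upper() if model_id == "chat-latest" else prompt

    def fake_score(prompt: str, answer: str) -> float:
        return 1.0 if answer.isupper() else 0.0

    report = compare_models(
        ["what is 2+2", "describe this chart"],
        fake_call,
        fake_score,
        baseline="your-pinned-5.3-id",  # placeholder, not a real model name
        candidate="chat-latest",
    )
    print(report)
```

Tracking per-prompt regressions alongside the means matters because an average can improve while individual high-traffic prompts get worse, and those prompts are the ones that would inform a routing change.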
