Gemini 3.5 Flash ज़्यादातर evals पर 3.1 Pro को मात, Antigravity 2.0 agent-first IDE, Zubnet AI समाचार

Google ने I/O 2026 पर Gemini 3.5 Flash announce किया: coding और agentic tasks सहित ज़्यादातर benchmarks पर Gemini 3.1 Pro को outperform करता है, अन्य frontier models से 4x faster होने का दावा एक optimized variant के साथ 12x पर। आज से Gemini app और AI Mode in Search में globally default। Claims में autonomous coding pipelines execute करना, research projects manage करना, scratch से operating systems build करना, और multiple hours तक independently run करना shamil हैं human judgment के लिए pause behavior के साथ। Antigravity 2.0 standalone desktop application के रूप में launch हुआ — "agentic development platform और IDE जहाँ agents live, work, और execute कर सकते हैं," Gemini 3.5 Flash के साथ co-developed। Gemini Spark 24/7 personal AI agent के रूप में debut हुआ। Search agentic capabilities और generative interfaces के साथ rebuilt। Announcement में disclosed नहीं numbers: context window, MMLU, SWE-bench Verified, और pricing।

Tier inversion वो move है जो मायने रखता है। Historically Flash Google का cheap sub-tier था और Pro production frontier था। 3.5 के साथ, Flash अब production model है और Pro tier ambiguous हो जाता है — Google ने नहीं कहा कि 3.1 Pro deprecated option के रूप में रहता है या 3.5 Pro क्या होगा। अन्य frontier models के against 4x speedup plausible है अगर OpenAI o-class और Anthropic Opus 4.7 के against default settings पर benchmarked हो, लेकिन Google ने harness या comparison details publish नहीं किए। 12x-faster optimized variant तुरंत quantization या distillation tradeoffs पर सवाल उठाता है जो address नहीं किए गए। Autonomous-for-hours claims को कुछ भी concrete मतलब होने से पहले adversarial benchmarking की ज़रूरत है — कल Anthropic का Capability Curve framing (बारह महीनों में SWE-bench Verified पर 62% से 87%) उस तरह का grounded number है जिसकी इस announcement में कमी है। Antigravity 2.0 standalone desktop IDE के रूप में Claude Code (जिसने कल Routines, Managed Agents, और Capability Curve framing ship किया) और Cursor के background-agent track के against direct positioning है। "Gemini 3.5 Flash के साथ co-developed" का मतलब है कि Google ने IDE और model एक साथ design किए — वही vertical integration play जो Anthropic Claude Code के साथ और Cursor अपनी model selection के साथ चलाता है।

तीन labs अलग-अलग bets पर lined up। OpenAI: compute-and-scale Stargate trajectory। Anthropic: AI-assisted research velocity (कल का Karpathy hire), Capability Curve framing, MCP और Managed Agents के via infrastructure primitives। Google: agent-first development environment plus full-stack integration (Search, Workspace, IDE, mobile)। IDEs evaluate करने वाले builders के लिए, Antigravity 2.0 अब Claude Code, Cursor, और OpenAI Codex के साथ four-way fight में join करता है; deeper question है कि क्या agent-IDE integration सही abstraction layer है। Anthropic का MCP-as-protocol bet कहता है नहीं, value protocol layer पर accrue होती है। Google का Antigravity bet कहता है हाँ, IDE खुद agent operating environment बन जाता है। Google का tier inversion (Flash beats Pro) भी model-pricing convergence signal करता है — cheap और expensive frontier models के बीच price/performance gap collapse हो रहा है, जो business strategy के रूप में model routing पर wrapper-ecosystem margin को compress करता है।

सोमवार: अगर आप Gemini API पर build करते हैं, Google AI Studio docs को 3.1 Pro पर actual deprecation path और 3.5 Flash पर pricing के लिए watch करें। User-facing default switch आपको नहीं बताता कि आपकी production API integrations automatically migrate होती हैं या change require करती हैं। अगर आप अपनी team के लिए IDEs evaluate कर रहे हैं, Antigravity 2.0 Claude Code, Cursor, और Codex के साथ bake-off में belongs करता है। Real test: अपने repo में एक four-hour autonomous coding task pick करें (एक real subsystem refactor करें, tests के साथ scratch से एक service build करें) और same starting prompt के साथ चारों पर run करें। पहले valid PR तक wall-clock time, defect rate, और agent के judgment-pause prompts की quality compare करें। 12x-faster optimized variant को standard model के against specifically eval-test किया जाना चाहिए — quantization या distillation अक्सर multi-step reasoning को degrade करता है even जब single-shot benchmarks hold करते हैं। Gemini Spark और personal agents के लिए generally, adversarial reviews का wait करें कि "अपनी digital life को 24/7 manage करना" actually क्या करता है, कितना costs करता है, और क्या protect करता है। Google के claims का सबसे strong हिस्सा यह है कि वे testable हैं; अगले चार हफ्तों का adversarial benchmarking आपको आज की announcement से ज़्यादा बताएगा।

Gemini 3.5 Flash ज़्यादातर evals पर 3.1 Pro को मात, Antigravity 2.0 agent-first IDE

और समाचार