Google Research's ReasoningBank lets agents learn from their own failures: +8.3 points on WebArena and +4.6 points on SWE-Bench-Verified

Google Research shipped ReasoningBank today, with a paper and an open-source repo. The premise is simple and useful: existing agent-memory approaches either log exhaustive action trajectories (Synapse-style), which fail to distill transferable patterns, or save only successful workflows (AWM-style), which ignore a primary source of learning: the agent's own failures. ReasoningBank argues that you want both, in structured form.

The architecture is lean. Each memory entry has three fields: a title (a concise strategy identifier), a description (a brief summary), and content (distilled reasoning steps, decision rationales, or operational insights). At inference time, the agent retrieves relevant memories before acting, interacts with the environment, then uses an LLM-as-a-judge to self-assess the outcome and extract new memories. There is no fine-tuning; everything happens at runtime. The authors note that the self-judgement does not need to be perfectly accurate, because the system is robust to judgment noise. Memories evolve across runs: early entries look like procedural checklists ("Look at page links"), while later entries mature into preventative logic ("Continuously cross-reference tasks with active page filters to ensure retrieved datasets are not prematurely paginated").
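The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in the ReasoningBank repo: the names `MemoryItem`, `MemoryBank`, and `run_task` are hypothetical, the `act`, `judge`, and `extract` callables stand in for the agent and the LLM calls, and the word-overlap retrieval is a deliberately crude placeholder for whatever retriever a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class MemoryItem:
    """One entry with the three fields described in the article."""
    title: str        # concise strategy identifier
    description: str  # brief summary
    content: str      # distilled reasoning steps, rationales, or insights


def _tokens(text: str) -> set:
    return set(text.lower().split())


@dataclass
class MemoryBank:
    """Hypothetical in-memory store; not the repo's API."""
    items: List[MemoryItem] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 2) -> List[MemoryItem]:
        # Assumption: word overlap stands in for a real retriever.
        q = _tokens(query)
        scored = [(len(q & _tokens(f"{m.title} {m.description}")), m)
                  for m in self.items]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m for _, m in scored[:k]]

    def add(self, new_items: List[MemoryItem]) -> None:
        self.items.extend(new_items)


def run_task(
    task: str,
    bank: MemoryBank,
    act: Callable[[str, List[MemoryItem]], str],
    judge: Callable[[str, str], bool],
    extract: Callable[[str, str, bool], List[MemoryItem]],
) -> bool:
    """Retrieve, act, self-judge, then extract memories from the outcome."""
    memories = bank.retrieve(task)          # 1. retrieve before acting
    trajectory = act(task, memories)        # 2. interact with the environment
    success = judge(task, trajectory)       # 3. LLM-as-a-judge (stubbed here)
    # 4. Extract from failures as well as successes.
    bank.add(extract(task, trajectory, success))
    return success


if __name__ == "__main__":
    bank = MemoryBank()

    def act(task, memories):
        return f"attempted '{task}' with {len(memories)} memories"

    def judge(task, trajectory):
        return False  # pretend the attempt failed

    def extract(task, trajectory, success):
        label = "success" if success else "failure"
        return [MemoryItem(f"{label}: {task}", f"lesson from a {label}", trajectory)]

    run_task("filter orders by date", bank, act, judge, extract)
    print(bank.retrieve("filter orders"))
```

The point of the sketch is the last step of `run_task`: the extractor is called whether or not the judge reports success, so a failed attempt still leaves an entry that the next run can retrieve.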

The numbers are honest. On Gemini-2.5-Flash, ReasoningBank lifts WebArena success by 8.3 points and SWE-Bench-Verified by 4.6 points over a memory-free baseline, and saves roughly 3 execution steps per task on SWE-Bench-Verified. With MaTTS (parallel scaling, k=5), WebArena gains an additional +3% with 0.4 fewer steps. The baselines compared include vanilla ReAct, Synapse (trajectory memory), and AWM (workflow memory). These are single-digit gains on top of already capable agents rather than framework-changing leaps, but they come from a memory layer that costs nothing beyond retrieval plus judge LLM calls, with no training required.

Two practical notes if you are building agents. One, the failure-learning insight is the clean takeaway. If your agent-memory system stores only successful trajectories (the default for most workflow-memory implementations in circulation), you are leaving a significant share of your potential gains on the table. The 4.6 points on SWE-Bench-Verified do not come from better actions; they come from storing what went wrong last time in a form the agent can retrieve next time. Two, the code is at github.com/google-research/reasoning-bank and the paper is arXiv 2509.25140. The three-field entry schema is simple enough to retrofit into existing agent loops without a rewrite, which is usually where these academic memory architectures get stuck.

