Google Research shipped ReasoningBank today with a paper and an open-source repo. The premise is direct and useful: existing agent-memory approaches either log exhaustive action trajectories (Synapse-style) that fail to distill transferable patterns, or save only successful workflows (AWM-style) that ignore a primary source of learning, the agent's own failures. ReasoningBank argues you want both, structured.
The architecture is lean. Each memory entry has three fields: a title (concise strategy identifier), a description (brief summary), and content (distilled reasoning steps, decision rationales, or operational insights). At inference time, the agent retrieves relevant memories before acting, interacts with the environment, then uses an LLM-as-a-judge to self-assess the outcome and extract new memories. No fine-tuning, all runtime. The authors note self-judgment does not need to be perfectly accurate; the system is robust to judgment noise. Memories evolve across runs: early entries look like procedural checklists ("Look for page links"), later entries mature into preventative logic ("Cross-reference tasks continuously with active page filters to ensure retrieved datasets aren't paginated prematurely").
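The schema and loop are small enough to sketch directly. This is a minimal, hedged rendering, not the paper's implementation: `act`, `judge`, and `extract` are hypothetical stand-ins for the agent's environment interaction and the LLM calls, and the lexical retriever is a toy placeholder for whatever embedding-based retrieval a real system would use.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    title: str        # concise strategy identifier
    description: str  # brief summary
    content: str      # distilled reasoning steps, rationales, or pitfalls

@dataclass
class ReasoningBank:
    entries: list = field(default_factory=list)

    def retrieve(self, task: str, k: int = 3):
        # Toy lexical overlap; a real system would use embedding similarity.
        words = set(task.lower().split())
        def score(e):
            text = f"{e.title} {e.description} {e.content}".lower()
            return sum(w in text for w in words)
        return sorted(self.entries, key=score, reverse=True)[:k]

    def add(self, new_entries):
        self.entries.extend(new_entries)

def run_task(task, bank, act, judge, extract):
    """One inference-time pass: retrieve -> act -> self-judge -> distill."""
    memories = bank.retrieve(task)          # inject past lessons before acting
    trajectory = act(task, memories)        # agent interacts with the environment
    success = judge(task, trajectory)       # LLM-as-a-judge self-assessment
    bank.add(extract(task, trajectory, success))  # learn from successes AND failures
    return trajectory, success
```

The key design point is the last line of `run_task`: extraction happens unconditionally, so failed trajectories deposit memories too.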
The numbers are honest. On Gemini-2.5-Flash, ReasoningBank lifts WebArena success by 8.3 points and SWE-Bench-Verified by 4.6 points over a memory-free baseline, and saves roughly 3 execution steps per task on SWE-Bench-Verified. With MaTTS (parallel test-time scaling at k=5), it gains roughly another 3 points on WebArena and shaves a further 0.4 steps. Baselines compared include vanilla ReAct, Synapse (trajectory memory), and AWM (workflow memory). These are single-digit gains on top of already-capable agents rather than framework-changing leaps, but they come from a memory layer that costs nothing except the retrieval plus judge LLM calls, with no training required.
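The parallel-scaling side of MaTTS can be sketched as best-of-k selection under the same self-judge. Everything here is an assumption for illustration: `rollout` and `judge` are placeholder functions standing in for real environment interaction and a real LLM-as-a-judge, and the selection rule (keep the first rollout the judge accepts) is one plausible reading, not the paper's exact procedure.

```python
import random

def rollout(task, memories, rng):
    # Placeholder agent: returns a trajectory (list of action strings).
    return [f"step-{rng.randint(0, 9)}" for _ in range(3)]

def judge(task, trajectory):
    # Placeholder self-judge; a real system would ask a model to assess the outcome.
    return "step-0" in trajectory

def matts_parallel(task, memories, k=5, seed=0):
    """Best-of-k parallel scaling: k rollouts, judge each, keep a success if any.

    The contrast across the k rollouts is also what feeds richer memory extraction.
    """
    rng = random.Random(seed)
    rollouts = [rollout(task, memories, rng) for _ in range(k)]
    verdicts = [judge(task, t) for t in rollouts]
    best = next((t for t, ok in zip(rollouts, verdicts) if ok), rollouts[0])
    return best, verdicts
```

The extra cost is linear in k (k rollouts plus k judge calls), which is where the "+3% on WebArena" trades compute for accuracy.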
Two practical notes if you're building agents. One, the failure-learning insight is the clean part. If your agent-memory system only stores successful trajectories (which is the default for most workflow-memory implementations in circulation), you're leaving a significant fraction of your potential gains on the table. The 4.6-point gain on SWE-Bench-Verified isn't from better actions; it's from storing what went wrong last time in a form the agent can retrieve next time. Two, the code is at github.com/google-research/reasoning-bank and the paper is at arXiv:2509.25140. The three-field entry schema is simple enough to retrofit into existing agent loops without a rewrite, which is usually where these academic memory architectures get stuck.
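The retrofit really can be this small: if your existing loop already builds a system prompt, the schema reduces to formatting retrieved entries into a preamble. A minimal sketch, using plain dicts so it drops into any loop (the field names match the paper's schema; the wording of the preamble is my own):

```python
def memory_prompt(entries):
    """Format retrieved memory entries into a prompt preamble for an existing agent loop."""
    lines = ["Relevant lessons from past tasks:"]
    for e in entries:
        # Each entry carries the three ReasoningBank fields: title, description, content.
        lines.append(f"- {e['title']}: {e['description']}\n  {e['content']}")
    return "\n".join(lines)
```

Prepend the result to whatever system prompt your agent already uses; no change to the action loop itself is needed.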
