At its core, an agent is just a loop. The model receives a goal, decides on its next action (usually a tool call), observes the result, and repeats until the goal is met or it decides it can't proceed. This is sometimes called the "ReAct" pattern — Reason, Act, Observe. What makes it powerful is that the growing conversation history carries state across iterations: the model can see what it already tried, what failed, and what information it gathered. The loop is orchestrated by a harness — a piece of code that sends messages to the model, executes the tool calls the model requests, and feeds the results back in. Frameworks like LangChain, CrewAI, and Anthropic's own Agent SDK provide this harness, but you can also build one in about fifty lines of code. The model itself never "runs" anything; it just outputs structured JSON saying "call this function with these arguments," and your code does the rest.
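Here is a minimal sketch of that harness. Everything in it (the message shapes, the `TOOLS` registry, the scripted stand-in for the model) is a hypothetical illustration of the pattern, not any particular framework's API:

```python
# Tool registry: name -> callable. A real harness would also hand the
# model a JSON schema for each tool; here the "model" is a stub that
# scripts its replies so the loop is runnable on its own.
TOOLS = {
    "add": lambda args: str(args["a"] + args["b"]),
}

def stub_model(messages):
    """Stand-in for an LLM API call. Requests a tool until it has an
    observation to work with, then emits a final answer."""
    last = messages[-1]
    if last["role"] == "user":
        return {"type": "tool_call", "name": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "text": f"The answer is {last['content']}"}

def run_agent(goal, model=stub_model, max_iters=10):
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_iters):          # iteration cap: a basic guardrail
        reply = model(messages)
        if reply["type"] == "final":    # model declares the goal met
            return reply["text"]
        # Execute the requested tool and feed the observation back in.
        result = TOOLS[reply["name"]](reply["args"])
        messages.append({"role": "tool", "content": result})
    return "Gave up: iteration limit reached"
```

Swap `stub_model` for a real API call and pass the tool schemas alongside the messages, and the loop itself stays this small.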
The practical difference between a good agent and a frustrating one comes down to how you define its tools and how much autonomy you give it. A coding agent like Claude Code or Cursor's agent mode might have tools for reading files, writing files, running shell commands, and searching a codebase. A customer-support agent might have tools for looking up orders, issuing refunds, and escalating tickets. The key design decision is granularity: too few tools and the agent can't do anything useful; too many and it gets confused about which to pick. In production, most teams find that 5–15 well-defined tools is the sweet spot. Each tool needs a clear name, a good description (this is what the model reads to decide when to use it), and a well-typed parameter schema.
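For illustration, a customer-support tool definition might look like this in the JSON-schema style that most function-calling APIs share (exact field names vary by provider, and `lookup_order` and its fields are hypothetical):

```python
# A tool the model can choose to call. The "description" is what the model
# reads when deciding whether this tool fits the current step, so it should
# say when to use the tool, not just what it does.
lookup_order_tool = {
    "name": "lookup_order",
    "description": (
        "Look up a customer order by its ID. Use this before issuing any "
        "refund, so you can confirm the order exists and check its status."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The order identifier, e.g. 'ORD-12345'.",
            },
        },
        "required": ["order_id"],
    },
}
```

Marking parameters as `required` with explicit types is what lets the harness validate the model's arguments before executing anything.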
One of the biggest misconceptions about agents is that they need elaborate multi-agent architectures to be useful. The industry went through a phase of "agent swarms" and "crew" patterns where you'd have a planner agent, a researcher agent, a writer agent, and a critic agent all talking to each other. In practice, a single model in a tight loop with good tools usually outperforms these complex setups. Multi-agent patterns add latency, cost, and failure modes. They make sense for genuinely parallel workloads — say, scanning ten repos simultaneously — but for most sequential tasks, one agent with clear instructions does the job. The companies shipping real agent products (Anthropic, OpenAI, Google) have converged on this simpler architecture.
Reliability is the hard part. An agent that works 90% of the time sounds good until you realize that in a 10-step task, a 90% per-step success rate gives you a ~35% chance of completing the whole thing. This is why production agents need guardrails: maximum iteration limits, cost caps, human-in-the-loop checkpoints for dangerous actions (like deleting data or spending money), and graceful failure modes. The best agent implementations also include retry logic with backoff, structured error handling that feeds failures back to the model so it can try a different approach, and logging that lets you trace exactly what happened when things go sideways.
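The compounding math, and one of those guardrails (retry with exponential backoff that surfaces the final error to the model instead of crashing), can be sketched as follows; the helper is hypothetical, not any library's API:

```python
import time

# Per-step reliability compounds: ten steps at 90% each succeed
# end-to-end only about 35% of the time.
p_task = 0.9 ** 10  # ~0.349

def call_tool_with_retry(tool, args, max_retries=3, base_delay=1.0):
    """Run a tool call with exponential backoff. On final failure, return
    the error as text so the model can read it and try another approach."""
    for attempt in range(max_retries):
        try:
            return tool(**args)
        except Exception as exc:
            if attempt == max_retries - 1:
                return f"ERROR after {max_retries} attempts: {exc}"
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Returning the error string rather than raising is deliberate: fed back into the loop as an observation, it gives the model the chance to route around the failure instead of silently dying.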
The evolution of agents has been rapid. In 2023, AutoGPT went viral but was mostly a demo — it burned through tokens and rarely completed complex tasks. By 2025, Claude Code, Devin, and similar tools were writing production code, running tests, and shipping pull requests with real reliability. The difference wasn't just better models; it was better tool design, better prompting, and hard-won engineering lessons about keeping the loop tight. If you're building an agent today, start with a single loop, a handful of tools, and invest your time in making those tools return clean, useful output. That matters more than any framework choice.