Fundamentals

AGI

Also known as: Artificial General Intelligence

A hypothetical AI system that can understand, learn, and perform any intellectual task a human can, transferring knowledge across domains without dedicated training for each one. Unlike current AI, which excels at narrow tasks such as generating text or classifying images, AGI could handle novel situations, reason abstractly, and adapt to any challenge. Whether AGI is imminent, decades away, or impossible altogether is one of the field's most contentious debates.

Why It Matters

AGI is the North Star (or the bogeyman) of the entire AI industry. It drives billions of dollars in investment, shapes safety research priorities, and dominates policy debates. Whether or not you believe AGI is near, the concept defines how companies like Anthropic, OpenAI, and DeepMind frame their missions, and understanding the debate helps you separate genuine progress from hype.

Deep Dive

The first problem with AGI is that nobody agrees on what it means. OpenAI published a five-level framework in 2024: Level 1 is chatbots (conversational AI), Level 2 is reasoners (human-level problem solving), Level 3 is agents (systems that take actions), Level 4 is innovators (systems that aid in invention), and Level 5 is organizations (AI that can do the work of an entire company). By their own definition, they claimed to be approaching Level 2 with o1. François Chollet, creator of Keras and the ARC benchmark, takes a fundamentally different view — he argues that AGI means efficient skill acquisition, the ability to pick up genuinely new tasks with minimal examples, not just impressive performance on tasks similar to training data. Google DeepMind proposed yet another framework that separates generality from performance, creating a matrix where you could have narrow superintelligence or general incompetence. These are not minor definitional quibbles. Which definition you adopt determines whether AGI is two years away or two centuries away.
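The point of the DeepMind framework is that generality and performance are independent axes. A toy sketch makes this concrete; the level names below follow DeepMind's "Levels of AGI" paper, but the example classifications are illustrative assumptions, not official ratings.

```python
# Toy sketch of DeepMind's performance x generality matrix.
# Level names follow the "Levels of AGI" framework; the example
# classifications below are illustrative guesses, not official ratings.

PERFORMANCE_LEVELS = ["Emerging", "Competent", "Expert", "Virtuoso", "Superhuman"]
GENERALITY = ["Narrow", "General"]

def classify(performance: str, generality: str) -> str:
    """Place a system in the matrix by its two independent coordinates."""
    assert performance in PERFORMANCE_LEVELS and generality in GENERALITY
    return f"{performance} {generality} AI"

# Decoupling the axes is what allows "narrow superintelligence"
# (superhuman at one task) to coexist with "emerging general AI"
# (broadly capable but unreliable) as distinct cells of the matrix.
examples = {
    "protein-folding model": classify("Superhuman", "Narrow"),
    "frontier chatbot":      classify("Emerging", "General"),
}
print(examples["protein-folding model"])  # Superhuman Narrow AI
```

The matrix view explains why timeline disagreements persist: two people can look at the same system and place it in different cells depending on which axis they weight.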

The Current State of Play

Where we actually stand depends entirely on how you measure. Large language models can pass the bar exam, write publishable code, explain quantum mechanics, compose poetry, and reason through novel logic puzzles. By any standard from even five years ago, this would have been considered strong evidence of general intelligence. And yet these same systems sometimes cannot reliably count the letters in a word, struggle with spatial reasoning, confuse correlation with causation, and confidently state false information. Is this 90% of the way to AGI, with the remaining 10% being engineering details? Or is it 10% of the way, with the impressive parts being a parlor trick built on pattern matching at scale? Honest researchers disagree sharply. The optimists point out that each new model generation fixes many of the previous failure modes. The skeptics point out that the remaining failures suggest fundamental architectural limitations, not just scaling issues.

The Scaling Debate

The most consequential technical debate in AI right now is whether scaling — more data, more compute, more parameters — will eventually produce AGI, or whether we need fundamentally new architectures. The scaling hypothesis, championed most visibly by researchers at OpenAI, holds that intelligence is primarily a function of scale: make the model big enough, train it on enough data, and general capability emerges. The evidence for this view is real — GPT-4 is qualitatively more capable than GPT-3, which was qualitatively more capable than GPT-2, and each jump came largely from scaling. The counter-argument is that scaling laws show diminishing returns, that current architectures have fundamental limitations (no persistent memory, no world model, no causal reasoning), and that throwing more compute at a flawed architecture just produces a bigger flawed system. The truth is probably somewhere in between. Scaling has produced genuine breakthroughs that nobody predicted, but there are classes of problems — long-horizon planning, physical reasoning, reliable arithmetic — where more scale has not reliably helped.
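The "diminishing returns" claim is visible in the functional form of empirical scaling laws, which fit loss as a power law in parameter count and training tokens. The sketch below uses the form and published coefficient fits from the Chinchilla paper (Hoffmann et al. 2022); treat the exact numbers as illustrative of the curve's shape, not as predictions for any particular model.

```python
# Chinchilla-style scaling law: predicted loss as a power law in
# model size N (parameters) and dataset size D (tokens).
# Coefficients are the published fits from Hoffmann et al. 2022;
# treat them as illustrative, not as predictions for any real model.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Irreducible loss E plus power-law terms that shrink with scale."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Each 10x jump in parameters buys a smaller absolute loss reduction
# than the previous one: that is the diminishing-returns shape.
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> predicted loss {loss(n, 1e12):.3f}")
```

Note what the power law does and does not say: loss keeps falling with scale (the optimists' point), but each order of magnitude costs ten times more compute for a smaller absolute gain, and nothing in the formula guarantees that a lower loss translates into the specific capabilities, like long-horizon planning, that the skeptics highlight.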

The Economic Argument

There is a pragmatic reframing of AGI that sidesteps the philosophical debate entirely: AGI does not need to match or exceed human intelligence in every domain. It just needs to be good enough to automate most knowledge work. A system that can write code at a senior engineer level, draft legal documents, analyze medical images, manage projects, and handle customer support — even if it cannot tie a shoelace or understand a joke about its own limitations — would transform the global economy as profoundly as any hypothetical "true" AGI. Some economists argue we are already entering this era. The question is not whether AI will be conscious or "truly" intelligent but whether it will make most white-collar jobs automatable. That framing makes the AGI timeline feel much shorter and much more concrete, regardless of where you stand on the philosophical questions.

Safety and the Timeline Problem

The timeline for AGI matters enormously for safety research, and this is not a theoretical concern. Alignment — the work of ensuring advanced AI systems do what we actually want — is genuinely hard. Current techniques like RLHF and constitutional AI work reasonably well for today's systems, but they rely on humans being able to evaluate the AI's outputs. As systems become more capable, this assumption breaks down. If AGI is fifty years away, there is time to develop robust alignment techniques, build institutional frameworks, and iterate through many rounds of testing. If AGI is five years away, we are running alignment research on a deadline that may not be sufficient. This is why timeline estimates are not just academic curiosity — they directly determine how urgently we need to solve alignment, how aggressively we should regulate AI development, and how much risk the major labs should be willing to accept in pursuit of capability gains. The researchers who worry most about AGI safety are not necessarily the ones who think AGI is most likely; they are the ones who think the consequences of getting it wrong are irreversible.
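The human-evaluation bottleneck described above is built into the core RLHF training signal: the reward model is fit to human preference labels with a Bradley-Terry loss, so it can only be as reliable as the comparisons humans are able to make. A minimal sketch of that loss, with made-up scores:

```python
import math

# Bradley-Terry preference loss used to train RLHF reward models:
# given reward-model scores for the human-chosen and human-rejected
# responses, minimize -log(sigmoid(r_chosen - r_rejected)).
# If humans cannot tell which response is actually better, the
# label itself becomes noise -- the assumption that breaks down
# as systems exceed what evaluators can judge.
def preference_loss(r_chosen: float, r_rejected: float) -> float:
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Made-up scores: the loss is small when the reward model already
# ranks the human-preferred response higher, large when it doesn't.
print(preference_loss(2.0, 0.5))  # correct ranking -> low loss
print(preference_loss(0.5, 2.0))  # inverted ranking -> high loss
```

The design choice to anchor everything to pairwise human judgments is exactly why the technique scales with today's systems but is not guaranteed to scale with tomorrow's.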

Related Concepts

Agentic Workflow · AI Benchmarks