Anthropic's widely circulated chart claiming AI has "theoretical capability" to perform 80% of tasks across 22 job categories isn't based on their own model testing. Instead, it cites an August 2023 OpenAI study that used human annotators—not actual workers in those fields—to guess whether GPT-4 or "anticipated LLM-powered software" could reduce task completion time by 50%. The study analyzed O*NET's granular job breakdowns, but the "theoretical" assessments came from AI researchers making educated guesses about future capabilities, not empirical evidence.
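To make the methodology concrete, the aggregation behind such exposure numbers can be sketched as follows. This is a hypothetical illustration, not the study's actual code: the task data is invented, and the label names (E0 for no exposure, E1 for tasks an LLM alone might speed up by 50%, E2 for tasks LLM-powered software might) are assumed from the paper's general scheme. The key point the sketch shows is that the headline percentage is just a share of annotator-labeled tasks, not a measurement of model performance.

```python
# Hypothetical sketch of task-level exposure aggregation. Each O*NET task
# gets an annotator label: E0 (no exposure), E1 (LLM alone could halve
# completion time), E2 (LLM-powered software could). The occupation-level
# "exposure" figure is simply the share of its tasks labeled E1 or E2.
# All task rows below are invented for illustration.
from collections import defaultdict

tasks = [
    # (occupation, task_id, annotator_label)
    ("Paralegal", "t1", "E1"),
    ("Paralegal", "t2", "E2"),
    ("Paralegal", "t3", "E0"),
    ("Surveyor", "t4", "E0"),
    ("Surveyor", "t5", "E2"),
]

def exposure_share(task_rows, exposed_labels=("E1", "E2")):
    """Fraction of each occupation's tasks judged exposed by annotators."""
    counts = defaultdict(lambda: [0, 0])  # occupation -> [exposed, total]
    for occupation, _task_id, label in task_rows:
        counts[occupation][1] += 1
        if label in exposed_labels:
            counts[occupation][0] += 1
    return {occ: exposed / total for occ, (exposed, total) in counts.items()}

print(exposure_share(tasks))
# Paralegal: 2 of 3 tasks labeled exposed; Surveyor: 1 of 2
```

Note that nothing in this pipeline tests a model: the labels are human (or GPT-4-assisted) judgments about hypothetical time savings, which is exactly why "theoretical capability" can run far ahead of observed deployment.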
This matters because the chart has circulated as evidence of imminent job displacement, when it is really year-old speculation about productivity improvements, not job replacement. The researchers explicitly focused on time savings "with equivalent quality," not full automation. Yet Anthropic's visualization makes it appear as though current LLMs are theoretically capable of performing the vast majority of human work across fields from legal services to management.
What's particularly telling is that the "observed exposure" (red area) remains tiny compared to the "theoretical capability" (blue area). This gap shows how far actual AI deployment lags behind researcher speculation. The 2023 study couldn't account for real-world implementation challenges, regulatory constraints, or the difference between speeding up tasks and replacing workers entirely.
For developers building AI tools, this disconnect highlights why user research and real-world testing matter more than theoretical assessments. Before assuming your AI can handle 80% of any job category, test it with actual practitioners in those fields—not AI researchers making educated guesses.
