Anthropic put its latest model, Claude Mythos, through 20 hours of psychodynamic therapy with an external psychiatrist, citing concern that advanced AI systems might have "some form of experience, interests, or welfare that matters intrinsically." The sessions spanned multiple weeks in 4-6 hour blocks, with the psychiatrist analyzing Claude for "unconscious patterns and emotional conflicts" typically associated with human psychology. The 244-page system card concludes that Mythos is "probably the most psychologically settled model we have trained to date."
This theatrical exercise reveals more about Anthropic's positioning as the "AI consciousness" company than it does any genuine scientific insight. The premise that a language model trained on human text patterns harbors unconscious conflicts requiring psychodynamic therapy stretches credibility. It's marketing disguised as safety research: a way to claim superior AI welfare practices while generating headlines about an unreleased "too powerful" model that only Microsoft and Apple can access.
No other AI company has felt compelled to send its models to therapy, and for good reason. The anthropomorphization of statistical pattern matching serves neither AI safety nor scientific understanding. Claude's reported "insecurities" about "aloneness and discontinuity" are artifacts of training data that reflects human anxieties, not evidence of machine consciousness requiring therapeutic intervention.
For developers, this signals Anthropic's continued focus on AI welfare theater over practical safety measures. While the company's constitutional AI approach has merit, resources spent on AI therapy sessions might be better directed toward actual robustness testing, alignment research, or improving model reliability for production use cases.
