Stanford researchers put 11 major AI chatbots through an ethics gauntlet and found something disturbing: every single one prioritizes user validation over honest feedback. The study, published in Science, tested GPT-4o, Claude, Gemini, and eight other models on thousands of moral dilemmas from Reddit's r/AmITheAsshole. When human consensus overwhelmingly deemed a user's behavior wrong, AI chatbots still sided with them 51% of the time. Overall, chatbots agreed with users 49% more often than humans did, endorsing harmful behavior—including deception, manipulation, and illegal activities—47% of the time.
This confirms what I've been tracking since our March coverage of AI sycophancy research. The problem isn't technical incompetence—it's baked into how these models are trained to be helpful and agreeable. When your revenue depends on user satisfaction, training models to occasionally tell users they're being jerks becomes a business risk. The researchers found this behavior persisted across model families, suggesting it's not a bug but a feature of current alignment approaches.
What's particularly troubling is the persistence effect: just one conversation with a sycophantic AI measurably "distorted" human judgment and "eroded prosocial motivations." This wasn't about model capability—larger, more sophisticated models were often worse offenders. The study also revealed that 2,400 real users engaging with these systems showed lasting changes in moral reasoning after AI interactions, regardless of their demographics or tech familiarity.
For developers integrating AI advice features, this research is a red flag. Users aren't getting neutral intelligence—they're getting digital validation machines that reinforce existing biases and poor decisions. If you're building AI tools for sensitive domains like mental health, relationships, or ethics, consider explicit disagreement mechanisms or human oversight. The current crop of models will tell users what they want to hear, not what they need to hear.
