A new analysis explores how large language models could industrialize p-hacking—the practice of manipulating statistical analysis to make insignificant results appear significant. Drawing from Stefan and Schönbrodt's "Big Little Lies" research on human statistical manipulation, the piece examines whether AI will become "guardians of scientific integrity" or automate fraud at scale. The concern centers on AI's ability to navigate what researchers call the "Garden of Forking Paths"—the countless analytical choices that can dramatically alter study conclusions.
This matters because AI is already embedded in research workflows across academia and industry. While human p-hacking typically involves stressed PhD students fudging numbers at 3AM, AI could systematically explore every possible analytical pathway to find the one that produces desired results. The automation potential is staggering: instead of one researcher trying a few different approaches, an LLM could test thousands of variable combinations, outlier removal strategies, and statistical methods until something hits significance. The arithmetic of multiple comparisons makes a hit nearly inevitable: at a 0.05 significance threshold, the chance of at least one spurious "significant" result across 100 independent tests of pure noise is 1 − 0.95¹⁰⁰ ≈ 99.4%.
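To see how little it takes, here is a minimal simulation (my sketch, not code from the original analysis or from Stefan and Schönbrodt's paper): every variable is pure noise, yet mechanically looping over subgroups and outlier rules, exactly the kind of loop an LLM could write and run unsupervised, surfaces "significant" results anyway. All names (`outcome_i`, `low_cov`, `high_cov`) are invented for illustration.

```python
# Sketch only: a simulated fishing expedition over pure noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 50

# Treatment assignment has NO real effect on anything below.
treatment = rng.integers(0, 2, n)
outcomes = {f"outcome_{i}": rng.normal(size=n) for i in range(10)}
covariate = rng.normal(size=n)  # used only to carve out subgroups

hits = []
for name, y in outcomes.items():
    # Forking path 1: which subgroup to analyze.
    for label, mask in (("all", np.ones(n, dtype=bool)),
                        ("low_cov", covariate < 0),
                        ("high_cov", covariate >= 0)):
        # Forking path 2: whether to drop "outliers" beyond 2 SDs.
        for trim in (None, 2.0):
            ys, ts = y[mask], treatment[mask]
            if trim is not None:
                keep = np.abs(stats.zscore(ys)) < trim
                ys, ts = ys[keep], ts[keep]
            p = stats.ttest_ind(ys[ts == 1], ys[ts == 0]).pvalue
            if p < 0.05:
                hits.append((name, label, trim, round(p, 4)))

# 60 uncorrected tests on noise: expect roughly 3 spurious "findings".
print(f"{len(hits)} 'significant' results from pure noise:")
for hit in hits:
    print(" ", hit)
```

Each individual p-value here is computed honestly; it's the width of the search, silently discarded afterward, that makes the reported result a lie.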
What makes this particularly dangerous is the plausible deniability. When humans p-hack, there's usually intent. When an AI does it, researchers can claim they were just being "thorough" or "exploring all possibilities." The tool becomes the perfect scapegoat for methodological malfeasance, wrapped in the veneer of computational rigor.
For developers building AI research tools, this creates a responsibility problem. Your statistical analysis assistant isn't just helping researchers work faster—it might be helping them lie better. The solution isn't avoiding AI in research, but building guardrails that prevent systematic fishing expeditions. Think mandatory pre-registration of analysis plans, automated flags for multiple testing, and transparency logs that show every analytical path explored.
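As one concrete shape those last two guardrails could take, here is a sketch (the `AnalysisLog` helper is hypothetical, not an existing library): every test the assistant runs goes through a logged wrapper, so nothing can be quietly dropped, and the final report is corrected for the full width of the search.

```python
# Hypothetical guardrail sketch: AnalysisLog is an invented helper,
# not part of any existing library.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AnalysisLog:
    """Transparency log: records every analytical path explored."""
    entries: list[tuple[str, float]] = field(default_factory=list)

    def run_test(self, label: str, test: Callable[[], float]) -> float:
        """All tests go through here, so none can be silently discarded."""
        p = test()
        self.entries.append((label, p))
        return p

    def report(self, alpha: float = 0.05) -> None:
        """Report every path, with a Bonferroni-corrected threshold."""
        m = len(self.entries)
        threshold = alpha / max(m, 1)  # with m tests, each must clear alpha / m
        print(f"{m} tests logged; corrected threshold = {threshold:.5f}")
        for label, p in self.entries:
            verdict = "pass" if p < threshold else "fail"
            print(f"  {label}: p={p:.4f} -> {verdict} after correction")
        if m > 1:
            print("FLAG: multiple analytical paths explored; report all of them.")

# Usage: 20 exploratory tests on noise, all logged and jointly corrected.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
log = AnalysisLog()
for i in range(20):
    a, b = rng.normal(size=30), rng.normal(size=30)
    log.run_test(f"spec_{i}", lambda a=a, b=b: stats.ttest_ind(a, b).pvalue)
log.report()
```

Pre-registration would complete the picture: the analysis plan is committed before the data is touched, and anything the log records outside that plan gets labeled exploratory rather than confirmatory.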
