Organization Science, an INFORMS management journal, has published an editorial from its AI Task Force documenting what AI has done to its peer-review pipeline since ChatGPT launched. The dataset is 6,957 initial submissions and 10,389 reviews from January 2021 through February 2026. Submissions jumped 42% after December 2022. By February 2026, the majority of papers analyzed showed at least some AI involvement; writing quality, measured via readability metrics, declined by 1.28 standard deviations from baseline. About 30% of peer reviews now show detectable AI use, versus near-zero before ChatGPT. The detection tool is Pangram, scoring on a continuous 0-1 scale, with the editors explicitly acknowledging "no detection system is fully reliable for judging individual texts."
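For concreteness, here's the arithmetic behind "declined by 1.28 standard deviations from baseline," assuming the decline is standardized against the pre-ChatGPT mean and standard deviation (an assumption; the editorial's exact standardization isn't specified here, and every number below is made up):

```python
# Illustrative z-score arithmetic for a "1.28 SD decline from baseline".
# The readability values are hypothetical; only the formula matters:
# (post-period mean - baseline mean) / baseline standard deviation.
baseline_mean = 45.0   # hypothetical pre-ChatGPT mean readability score
baseline_sd = 5.0      # hypothetical pre-ChatGPT standard deviation
post_mean = 38.6       # hypothetical post-ChatGPT mean

z = (post_mean - baseline_mean) / baseline_sd
print(f"standardized change: {z:+.2f} SD")  # -1.28 SD
```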

The interesting numbers are downstream of the headline. Papers in the 0-15% AI-content bucket received revise-and-resubmit decisions 11.9% of the time; papers in the 70%+ bucket, only 3.2% of the time. Since an initial decision is essentially revise-and-resubmit or reject, the heavily AI-assisted papers were being rejected outright at much higher rates. That doesn't mean editors can reliably detect AI; it means the AI-assisted writing is identifiably weaker on the dimensions reviewers actually measure. The senior editor running the analysis is Claudine Gartenberg at Wharton. The editorial doesn't propose automated gatekeeping; it flags the deeper structural issue as tenure and hiring incentives that reward submission volume regardless of marginal contribution.
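A minimal sketch of that bucket analysis, assuming a submissions table with hypothetical `ai_share` (the continuous 0-1 detector score) and `decision` columns; the editorial names only the 0-15% and 70%+ buckets, so the middle edges here are illustrative:

```python
# Hedged sketch: revise-and-resubmit rate by AI-content bucket.
# The DataFrame schema and the toy rows are stand-ins, not the
# editorial's actual data.
import pandas as pd

submissions = pd.DataFrame({
    "ai_share": [0.05, 0.10, 0.12, 0.40, 0.65, 0.72, 0.88, 0.95],
    "decision": ["r&r", "reject", "r&r", "reject",
                 "reject", "reject", "reject", "reject"],
})

# Bucket the continuous score; middle bin edges are illustrative.
submissions["bucket"] = pd.cut(
    submissions["ai_share"],
    bins=[0.0, 0.15, 0.40, 0.70, 1.0],
    labels=["0-15%", "15-40%", "40-70%", "70%+"],
    include_lowest=True,
)

# Share of each bucket receiving revise-and-resubmit, the quantity
# the editorial reports (11.9% for 0-15%, 3.2% for 70%+).
rr_rate = (
    submissions.groupby("bucket", observed=True)["decision"]
    .apply(lambda d: (d == "r&r").mean())
)
print(rr_rate)
```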

For builders, the second-order effect matters more than the headline. Peer-reviewed publication has been the trust signal builders use to filter what's worth reading — "this passed review at NeurIPS / Nature / a top venue" is the proxy for technical credibility. If 30% of reviews are now AI-assisted and submission volume is up 42%, the noise floor under that signal is rising. The reviewer who used to spend four hours on a paper might be approving an LLM-summarized version in twenty minutes. The eval-of-evals problem in AI research becomes recursive: we use peer review to validate AI claims, but peer review itself is now partly conducted by AI. It's the same shape as the Harvard ER medical-AI accountability gap — clinical evidence ahead of regulatory infrastructure, scientific evidence ahead of review infrastructure.

Practical reads. When you consume research, don't outsource skepticism to the journal name; read the methods section, check the eval harness, look for code releases, and validate central claims yourself when a procurement or architectural decision rests on a paper's findings. If you're in academic publishing or running an internal research program, Pangram-style detection with continuous scoring is the eval methodology worth tracking: not for gatekeeping, but for distributional analysis of where review attention is going and where it's already gone (a sketch below). Tenure incentives are the structural lever Organization Science identifies, and they're outside any single journal's control. The signal: trust-via-venue is a 2010s assumption that doesn't survive the volume shift.
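What distributional tracking looks like in practice, as a minimal sketch assuming review-level detector scores are already computed (the `month`/`ai_score` schema and every value are hypothetical): report how the whole score distribution drifts over time instead of thresholding any single review, consistent with the editors' caveat about per-text reliability.

```python
# Hedged sketch: quartile drift of continuous detector scores over time.
# No per-review verdicts; the object of interest is the distribution.
import pandas as pd

reviews = pd.DataFrame({
    "month":    ["2021-06", "2021-06", "2021-06",
                 "2024-11", "2024-11", "2024-11"],
    "ai_score": [0.02, 0.04, 0.07,   # toy pre-ChatGPT-era scores
                 0.12, 0.46, 0.81],  # toy recent scores
})

# Quartiles per month: a rightward shift across the whole distribution
# is evidence of rising AI use even when no single score is conclusive.
drift = (
    reviews.groupby("month")["ai_score"]
    .quantile([0.25, 0.50, 0.75])
    .unstack()
)
print(drift)
```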