Safety

Watermarking

AI Watermark, Text Watermarking
A technique for embedding invisible signals in AI-generated content so it can be detected after the fact. Text watermarking subtly biases token selection during generation so that a detector can statistically identify watermarked text. Image watermarking embeds imperceptible patterns in the generated pixels. The goal is to make AI content identifiable without degrading its quality.

Why It Matters

As AI-generated content becomes indistinguishable from human-created content, watermarking is one of the few technical methods that can help tell them apart at scale. It matters for fighting misinformation, for academic integrity, and for content provenance. But it is not a solved problem: text watermarks can be removed by paraphrasing, and the arms race between watermarking and removal continues.

Deep Dive

The most cited approach to text watermarking (Kirchenbauer et al., 2023) works by splitting the vocabulary into "green" and "red" lists at each generation step, using a hash of the previous token as the seed. The model is then biased to prefer green-list tokens. A detector that knows the hashing scheme can check whether a text uses statistically more green-list tokens than expected by chance. The bias is small enough that humans don't notice, but large enough for statistical detection over a few hundred tokens.
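The scheme above can be sketched in a few dozen lines. This is a minimal, self-contained illustration, not the paper's implementation: a toy vocabulary stands in for a real tokenizer, uniform base logits stand in for a language model, and the SHA-256 seeding plus the `GAMMA`/`DELTA` values are illustrative choices.

```python
import hashlib
import math
import random

GAMMA = 0.5   # fraction of the vocabulary placed on the green list (illustrative)
DELTA = 2.0   # logit bias added to green-list tokens (illustrative)

def green_list(prev_token: str, vocab: list[str]) -> set[str]:
    """Partition the vocabulary using a hash of the previous token as the seed."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    shuffled = list(vocab)
    rng.shuffle(shuffled)
    return set(shuffled[: int(GAMMA * len(shuffled))])

def biased_sample(prev_token: str, vocab: list[str], rng: random.Random) -> str:
    """Stand-in for a model's sampling step: boost green-list logits by DELTA."""
    green = green_list(prev_token, vocab)
    # A real model would add DELTA to its own logits; here the base logits are 0.
    weights = [math.exp(DELTA if tok in green else 0.0) for tok in vocab]
    return rng.choices(vocab, weights=weights, k=1)[0]

def detect_z_score(tokens: list[str], vocab: list[str]) -> float:
    """z-score of green-list hits vs. the GAMMA * n expected without a watermark."""
    n = len(tokens) - 1
    hits = sum(
        tok in green_list(prev, vocab)
        for prev, tok in zip(tokens, tokens[1:])
    )
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

Over a few hundred tokens, text sampled with `biased_sample` yields a z-score far above a typical detection threshold (around 2 to 4), while unwatermarked text stays near zero, which is exactly the statistical gap the detector relies on.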

The Robustness Problem

Text watermarks are fragile. Paraphrasing the text (manually or with another model), translating to another language and back, or even inserting/deleting a few words can destroy the statistical signal. This is fundamentally different from image watermarks, which can survive cropping, compression, and resizing. The research community is working on more robust schemes, but there's an inherent tension: a stronger watermark affects text quality, while a subtler watermark is easier to remove.

Adoption and Regulation

The EU AI Act mandates that AI-generated content be labeled as such, pushing watermarking from research toward deployment. Google's SynthID is a production implementation, and Meta has published watermarking research as well. But voluntary adoption is uneven: if only some providers watermark, users can simply switch to one that doesn't. Effective watermarking may ultimately require regulation or industry-wide standards, similar to how content ratings work for media.
