Copyright in AI: Definition & Meaning — AI Wiki

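The unresolved legal questions at the intersection of AI and intellectual property: Can training AI on copyrighted data qualify as fair use? Who owns AI-generated content? Can AI output infringe copyright when it resembles the training data? These questions are being fought out in courts around the world, and cases such as NYT v. OpenAI, Getty Images v. Stability AI, and Kadrey v. Meta are shaping the legal landscape.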

Why It Matters

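Copyright is the legal fault line of AI development. Virtually every major AI model has been trained on copyrighted material: books, articles, code, and images. The outcome of current litigation will determine whether this is legal, and the answer will reshape the economics of AI training, the viability of open-source models, and whether creators are compensated for their contributions to AI training data.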

Deep Dive

The core legal question is whether training AI on copyrighted works constitutes fair use (in US law) or falls under similar exceptions in other jurisdictions. The fair use argument: training is "transformative" because the model does not store or reproduce the works; it learns statistical patterns from them. The counter-argument: models can sometimes reproduce near-verbatim passages, and they compete economically with the original works by generating substitutes.

Who Owns the Output?

Most jurisdictions currently hold that AI-generated content with no human creative input cannot be copyrighted, and the US Copyright Office has been explicit about this. Content where a human provides substantial creative direction (detailed prompts, curation, editing) may qualify, however. The line between "human-directed" and "AI-generated" is blurry and is being actively litigated. In practice, most companies treat AI-assisted output as copyrightable when there is meaningful human involvement.

The Training Data Divide

The industry is splitting into camps. Some companies are licensing training data (for example, OpenAI's deals with news publishers and Google's agreement with Reddit). Others argue that training on publicly available data is inherently fair use. Open-source models face particular challenges: if a court rules that training requires licenses, the cost could be prohibitive for non-commercial projects. Independently of the fair use question, the EU AI Act requires providers of general-purpose AI models to publish a sufficiently detailed summary of the content used to train them, adding a transparency obligation.

Related Concepts