Copyright in AI: Definition & Meaning — AI Wiki

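The unresolved legal questions surrounding AI and intellectual property: Can training AI on copyrighted material qualify as fair use? Who owns AI-generated content? Can AI output infringe copyright if it resembles the training data? These questions are being fought out in courts around the world, and cases such as NYT v. OpenAI, Getty v. Stability AI, and Kadrey v. Meta are shaping the legal landscape.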

Why It Matters

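Copyright is the legal fault line of AI development. Every major AI model has been trained on copyrighted material: books, articles, code, images. The outcome of the current litigation will determine whether this is legal, and the answer will reshape the economics of AI training, the viability of open-source models, and whether creators are compensated for their contributions to AI training data.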

Deep Dive

The core legal question is whether training AI on copyrighted works constitutes fair use (in US law) or falls under similar exceptions in other jurisdictions. The fair use argument: training is "transformative" because the model learns statistical patterns rather than storing or reproducing the works. The counter-argument: the model can sometimes reproduce near-verbatim passages, and it competes economically with the original works by generating substitutes.

Who Owns the Output?

Most jurisdictions currently hold that AI-generated content with no human creative input cannot be copyrighted (the US Copyright Office has been explicit about this). But content where a human provides substantial creative direction — detailed prompts, curation, editing — may qualify. The line between "human-directed" and "AI-generated" is blurry and being actively litigated. For practical purposes, most companies treat AI-assisted output as copyrightable when there's meaningful human involvement.

The Training Data Divide

The industry is splitting into camps. Some companies are licensing training data (OpenAI's deals with publishers, Google's agreements with Reddit). Others argue that training on public data is inherently fair use. Open-source models face a particular challenge: if a court rules that training requires licenses, the cost could be prohibitive for non-commercial projects. The EU AI Act requires providers of general-purpose AI models to publish a summary of the content used for training, adding transparency obligations regardless of how the fair use question is resolved.
