Copyright in AI: Definition & Meaning — AI Wiki

The unresolved legal questions around AI and intellectual property: Can training AI on copyrighted data constitute fair use? Who owns AI-generated content? Can AI output infringe copyright if it resembles the training data? These questions are being fought out in courts around the world, with cases such as NYT v. OpenAI, Getty v. Stability AI and Authors Guild v. Meta shaping the legal landscape.

Why It Matters

Copyright is the legal fault line of AI development. Every major AI model has been trained on copyrighted material: books, articles, code, images. The outcome of the current lawsuits will determine whether that is legal, and the answer will reshape the economics of AI training, the viability of open-source models, and whether creators are compensated for their contributions to AI training data.

Deep Dive

The core legal question is whether training AI on copyrighted works constitutes fair use (in US law) or falls under similar exceptions in other jurisdictions. The fair use argument: training is "transformative" because the model does not store or reproduce the works; it learns statistical patterns from them. The counter-argument: the model can sometimes reproduce near-verbatim passages, and it competes economically with the original works by generating substitutes.

Who Owns the Output?

Most jurisdictions currently hold that AI-generated content with no human creative input cannot be copyrighted (the US Copyright Office has been explicit about this). But content where a human provides substantial creative direction — detailed prompts, curation, editing — may qualify. The line between "human-directed" and "AI-generated" is blurry and being actively litigated. For practical purposes, most companies treat AI-assisted output as copyrightable when there's meaningful human involvement.

The Training Data Divide

The industry is splitting into camps. Some companies are licensing training data (OpenAI's deals with publishers, Google's agreement with Reddit). Others argue that training on publicly available data is inherently fair use. Open-source models face a particular challenge: if a court rules that training requires licenses, the cost could be prohibitive for non-commercial projects. The EU AI Act requires providers of general-purpose models to publish a summary of the content used for training, adding transparency requirements regardless of how the fair use question is resolved.
