Kadrey v. Meta: AI training found to be fair use, but it all depends on the facts
Last week, we looked at Bartz v. Anthropic PBC, No. 3:23-cv-04768-WHA (N.D. Cal. Feb. 12, 2025), where the U.S. District Court for the Northern District of California ruled that training an AI on copyrighted books constituted fair use, while storing entire pirated copies for retrieval did not enjoy the same protection. Now there’s another major decision: Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-AMO (N.D. Cal. June 25, 2025), also from the Northern District of California and also addressing fair use.
The courts came to a similar outcome, but the paths they took are different and have important implications for future cases. The main headline on these cases seems to be that it is fair use to use copyrighted books to train AI. But a closer look shows things aren’t that simple.
The Kadrey case
Thirteen authors, including Richard Kadrey, sued Meta, alleging that Meta infringed their copyrights by training its large language models on their books. Some of the books were allegedly acquired from so-called shadow libraries, sometimes called “pirate sites,” like Library Genesis or Bibliotik. Meta moved for summary judgment, arguing that training on these works qualified as fair use.
The court agreed. It granted Meta’s motion and denied the authors’ cross-motion. The court concluded that, on the factual record before it, Meta’s use was protected by fair use, which barred a copyright infringement claim.
A short technical detour
By the way, if you are interested in technical matters, you’ll enjoy learning about the mechanics of how Meta assembled copies of the books. It used “torrenting” to do so, and the parties had a dispute over whether it engaged in “seeding” or only “leeching,” although that didn’t affect the outcome of the case. Read the opinion beginning at page 11 of the slip opinion if you want to learn more. You can then go on to see how Meta post-trained its models to prevent them from “memorizing” and outputting certain text from their training data, including copyrighted material.
Now, let’s get back to the law.
Fair use analysis
The court’s discussion followed the traditional four-factor test for fair use: 1) purpose and character of the use, 2) nature of the copyrighted work, 3) amount and substantiality of the portion used, and 4) effect on the market for the original work. The court acknowledged that the most important factor is often the fourth, market effect, but it still discussed all four.
First, it found that the purpose and character of Meta’s use strongly favored fair use. Training a general-purpose large language model was highly transformative. It repurposed the books into a tool that could generate summaries, stories, or code. That’s a fundamentally different use from simply reading the books.
Second, the nature of the works weighed against fair use. The authors’ books were creative and expressive, so this factor cut against Meta.
Third, on the amount used, the court determined that copying entire works was reasonable. Complete copying was necessary to achieve the transformative purpose of training large language models.
Finally, the court held that the market impact factor favored fair use. The plaintiffs offered no evidence that large language models’ outputs acted as substitutes for their books or otherwise harmed sales. The court acknowledged the theoretical risk of “market dilution,” where AI-generated content could saturate the market with derivative or style-based works. But it found no actual evidence of that.
The court also criticized the analogy drawn by the Bartz court between training people to write better by reading books and training AI on copyrighted books. It called that an “inapt analogy” which is “not a basis for blowing off the most important factor in the fair use analysis [i.e., market impact].”
Shadow and pirate libraries
One key nuance is how the Kadrey court treated Meta’s use of shadow libraries. The plaintiff authors argued that Meta’s reliance on pirate sites to obtain their works undermined any fair use defense. But the court treated the copying as part of a single process directed at training the AI. It did not focus on whether the texts came from legitimate publishers or shadow libraries. In other words, the court focused on the ultimate transformative use, not on how Meta obtained the books.
This contrasts with the Bartz court’s approach. There, Anthropic not only trained its Claude models on books from pirate sites, but also maintained a central library of the infringing works for uses beyond training. While the training itself was fair use, the court ruled that maintaining this separate repository was not protected.
Now what?
These cases are early wins for generative AI developers. Both courts held that, on the particular facts before them, training language models on copyrighted books was transformative and fair use, especially where the authors couldn’t establish concrete market harm.
But these decisions also provide a blueprint for future plaintiffs. If they can show that AI outputs displace their markets, or that developers are maintaining unauthorized copies for more than training, the results are likely to be very different.