Generative AI: Transformative Fair Use Meets Piracy

Earlier, I highlighted the June ruling in Bartz v. Anthropic that was favorable to AI developers: training an AI model on lawfully obtained books was found to be “exceedingly transformative,” and therefore potentially fair use.

But there was a second part to the Court’s order: downloading books from pirate sites — those shadow libraries that exist outside the law — is not fair use at all. And that part of the ruling looks distinctly unfavorable to the defendant AI company.

The Court has now granted class certification for the authors and rightsholders of nearly seven million allegedly pirated books to bring copyright infringement claims.  

Massive infringement was not well received

The Court’s characterization of the defendant’s actions is worth quoting in full (cleaned up to remove record citations) to understand just how massive — and blatant — the infringement was:

In early 2021, a co-founder of Anthropic downloaded 196,640 unauthorized copies of copyrighted books from the pirate library known as Books3. This library conveniently packaged for mass download pairs of each book’s extracted text and filename, enabling the books to be readily rebuilt into separate files or reviewed. The pirate behind Books3 publicly downplayed any “copyright backlash” from these efforts. Anthropic’s co-founder downloaded the pirated books to avoid the trouble of paying for them, hoping they might prove useful for training large language models (LLMs) or for something else. 

Later that year, Anthropic wanted still more. But centralized pirate libraries were being shut down by the government. So, in June 2021, Anthropic’s co-founder used the infamous BitTorrent protocol to copy books peer-to-peer from decentralized copies of another pirate library — Library Genesis, or LibGen. He set a computer program to pause torrenting if disk space got “exhausted.” These five million copies again came from ebooks, this time remaining in file types like .epub and .pdf to be “viewed as eBooks.” They generally had covers including title and authors.

The Court quickly rejected the argument that the infringement was so massive it would be too hard to sort out. It said, “A denial of a motion to certify the class would amount to a concession that copyright owners’ credible allegations of infringement will go unchecked by courts so long as a copyist allegedly violates the Copyright Act not a little but a whole lot.”

Pirating creates grounds for class certification

The Court found that Fed. R. Civ. P. 23’s four requirements for class certification were met because:

1. The alleged infringement grew out of a uniform process — downloading from pirate sites and storing in a central repository;

2. The central legal question — whether that wholesale copying qualifies as fair use — can be resolved once for all class members;

3. A class action is exponentially more efficient and manageable than millions of individual suits. In fact, the Court said, “Here, if not brought as a class action, there will likely be no action at all. . . . Defendant is formidable. Few if any potential class members alone could muster comparable resources [to those of defendant], nor is there any guarantee [any alone] could find counsel willing to work pro bono or on a contingency basis;”

4. The named plaintiffs’ claims are typical of the class, and they adequately represent absent class members.

Why This Matters

My earlier article emphasized the fair use path for training AI with lawfully acquired books. Score one for AI. But now things have taken a serious turn. Training AI requires massive amounts of data, but when that data is unlawfully sourced, fair use doesn’t hold, and a court can take swift and broad action. At least this court did.

And if willful infringement is proven, a court may increase statutory damages to as much as $150,000 per work. 17 U.S.C. § 504. With millions of works at issue, this is a mighty big deal.
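To put the stakes in rough perspective, here is a back-of-the-envelope sketch. The class size (nearly seven million books) comes from the certification order, and the per-work figures come from § 504(c); the totals are purely illustrative and assume, unrealistically, a uniform award across every work — no court has made any such finding:

```python
# Illustrative statutory-damages arithmetic under 17 U.S.C. § 504(c).
# Assumptions (not findings of the Court): ~7 million works, and a
# uniform per-work award at either the ordinary floor or willful ceiling.
works = 7_000_000          # approximate class size noted in the ruling
ordinary_floor = 750       # per-work statutory minimum, non-willful
willful_ceiling = 150_000  # per-work maximum if willfulness is proven

print(f"Low-end exposure:  ${works * ordinary_floor:,}")   # $5,250,000,000
print(f"High-end exposure: ${works * willful_ceiling:,}")  # $1,050,000,000,000
```

Even at the ordinary statutory floor, the hypothetical total runs into the billions; at the willful ceiling it exceeds a trillion dollars, which is why class certification over a pool of this size matters so much.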
