The Anthropic Decision: A (Sorta) Win for AI

By Emily Poler

Two recent court decisions are starting to provide some clarity about when AI companies can incorporate copyrighted works into their large language models (LLMs) without licenses from the copyright holders. One is in a suit against Meta; we’ll get to that in a future post. 

Today, let’s focus on the suit brought by a group of authors against Anthropic PBC, the company behind Claude, a ChatGPT and Copilot competitor. (For what it’s worth, I’ve found Claude to be the best AI of the three.) Bottom line: “The training use was a fair use,” wrote Judge William Alsup. “The use of the books at issue to train Claude and its precursors was exceedingly transformative.” This ruling is a landmark because it’s one of the first substantive decisions on how fair use applies to AI, and it’s a big win for AI, right? Well, there’s a catch.

But first, some background. To create Claude (I love how AI companies give their LLMs these friendly, teddy-bear names that obscure the fact that they’re machines that can cause real harm), Anthropic collected a library of more than seven million books. In some cases, Anthropic purchased hard copies and scanned them. But mostly, it just grabbed “free” (aka, pirated) digital copies from the Internet. At least three authors whose books were used (Andrea Bartz, Charles Graeber and Kirk Wallace Johnson) were not amused, and in 2024 they filed a class action suit against Anthropic, alleging copyright infringement for training Claude on their works and for obtaining the materials without paying for them.

As for Anthropic’s training of its LLM on copyrighted materials, the Court found this to be fair use because that use differs dramatically from the works’ original purpose. As the judge wrote, “the technology at issue was among the most transformative many of us will see in our lifetimes.” This is a big deal.

But what’s also a big deal, and the catch for Anthropic, is that if you’re going to train an AI on copyrighted materials, you have to pay for them. In most cases, Anthropic didn’t. As a result, Judge Alsup is allowing the case to proceed to trial over the pirated copies, writing that Anthropic “downloaded for free millions of copyrighted books in digital form from pirate sites on the internet.”

For me, there are a couple of notable takeaways here, some purely legal and some the kind of common sense that I suspect most kindergartners could point out. Let’s talk about the purely legal point first. The Court went to great lengths to distinguish among the different ways Anthropic used the works, and those distinctions were critical to its fair use analysis.

As part of Anthropic’s process, when it scanned a purchased book, it discarded the original print copy. The Court found this to be fair use, as long as the hard copy was destroyed and the digitized version was not distributed outside the company. However, Anthropic kept all the books, including the millions of pirated copies, in a general library even after deciding that some of them would not be used for training now, or maybe ever. The judge specifically noted that this suggested the company’s primary purpose was to amass a vast library without paying for it, regardless of whether it might someday be used for a transformative purpose, and that such a practice directly displaced legitimate demand for the authors’ works.

The opinion is especially interesting to me because of how the Court distinguished the facts of this case from those of other fair use cases. For example, the Court pointed out that in most (if not all) other fair use cases, the defendant obtained the initial copy legally, either by purchasing it or by using a library copy.

This brings us to the other big takeaway, which mixes legal reasoning with morals and common sense: A defendant doesn’t get a free pass on stealing copyrighted materials just because it does something neat with those materials. Throughout his opinion, the judge made clear that it’s not OK to pirate books. This should have been obvious to Anthropic (and its lawyers), since I think most children could tell you that doing something cool or interesting with the proceeds of a bank robbery doesn’t make the bank robbery legal. It should have been especially obvious given that Anthropic’s whole marketing schtick is that it’s less evil than other technology companies. In fact, Anthropic’s lawyers seemed to acknowledge as much at oral argument, saying “You can’t just bless yourself by saying I have a research purpose and, therefore, go and take any textbook you want. That would destroy the academic publishing market if that were the case.”

It will be fascinating to see what happens at the trial, slated to start in December. If judgment goes against Anthropic, U.S. copyright law allows for statutory damages of up to $150,000 per infringed work where the infringement is willful. With more than seven million pirated books in Anthropic’s library, the damages could be huge.

Also huge, of course, is the precedent set here that training AI on copyrighted works is fair use. It’s a significant decision, one that many have been waiting for, and it will have enormous repercussions on, well, just about everything going forward.

Stay tuned. More to come soon on the suit against Meta over Llama, its LLM.