May 27, 2025
Copyright, Fair Use and AI: A Kinda Official Report
By Emily Poler
My last post discussed, in part, the United States Copyright Office and its developing standards for determining whether works created with artificial intelligence are eligible for copyright. Following up on that, we can now dive into the recent “pre-publication” release of a report detailing that Office’s thinking on AI and fair use. A final version of the report is supposed to be published in the near future; however, the Trump Administration’s termination of the Director of the US Copyright Office, which, according to some, is linked to the Office’s issuance of this report, makes me wonder if this time frame might change.
Why is this important? The Copyright Office is the government body that handles applications to register copyrights, and it also advises Congress on matters related to copyright. Given that role, its reports are likely to influence the judges currently considering the lawsuits against OpenAI and its ilk.
As a reminder, the fair use doctrine (some background here) allows use of copyrighted material under certain circumstances to be considered non-infringing (i.e., fair). Where a use is fair, the entity repurposing the copyrighted materials does not have to pay a license fee to the original creator. Courts have developed a four-part framework for determining if a new use furthers the ultimate goal of copyright: the promotion of the “Progress of Science and useful Arts,” U.S. Const., Art. I, Section 8 (and yes, that capitalization is in the original). This involves considering things like the degree to which the new work transforms the original and the extent to which the new work can substitute for the original work.
Overall, the Copyright Office’s report is quite interesting and replete with good background for anyone wanting to understand generative AI and the issues related to AI and copyright. Here are some highlights (and my thoughts):
- Different uses of copyrighted materials should be treated differently. (Okay, that’s maybe not so surprising.) For example, in the Copyright Office’s analysis, using copyrighted materials for initial training of an AI model is different from using copyrighted materials for retrieval-augmented generation, where AI delivers users information drawn from scraped original works. This makes sense to me because, as with many (most?) things, context matters. Moreover, numerous cases (including the Supreme Court’s decision in Andy Warhol Foundation v. Goldsmith) stress the importance of analyzing the specific use at issue. However, the Copyright Office also noted that “compiling a dataset or training alone is rarely the ultimate purpose. Fair use must also be evaluated in the context of the overall use.” Which leads us to the next point…
- The report describes how “training a generative AI foundation model on a large and diverse dataset will often be transformative” because it converts a massive collection of copyrighted materials “into a statistical model that can generate a wide range of outputs across a diverse array of new situations.” On the flip side, the more closely a model’s outputs resemble the original materials, the less likely those outputs are to be transformative. This could represent trouble for AI platforms that allow users to create outputs that replicate the style of a copyrighted work. In those cases (every case?) where an AI platform allows users to generate entirely original works as well as ones that are similar or identical to copyrighted materials, courts will have to figure out what constitutes fair use.
- There is a common argument advanced by AI platforms that using copyrighted materials to train AI models is “inherently transformative because it is not for expressive purposes” since the models reduce movies, novels and other works to digital tokens. The Copyright Office isn’t buying this. It says changing “O Romeo, Romeo! Wherefore art thou Romeo?” into a string of numbers does not render it non-expressive because that digital information can subsequently be used to create expressive content. This makes a lot of sense. Translating Shakespeare into Russian isn’t transformative, so there’s no good reason that converting it into a “language” readable by a machine should be any different.
- The use of entire copyrighted works for training weighs against a finding of fair use; however, the ingestion of whole works could be fair if a platform implements “guardrails” that prevent a user from obtaining substantial portions of the original work. Again, courts are going to need to examine real-world uses and draw lines between those that are OK and those that are not.
- Output that an AI platform produces based on its training on copyrighted materials can weigh against a finding of fair use even if it lacks protectable elements of the original (for example, the exact melody or lyrics of a song). That is because output that is stylistically similar to an original work could compete with that original work.
While at first blush there’s nothing particularly new or revelatory in the report, it is nonetheless effective at concisely synthesizing the issues raised in the various AI copyright-related lawsuits in the courts at the moment (and to come in the future). As such, it highlights the many areas where courts are going to have to define what does and does not constitute fair use, and the even trickier questions of precisely where those lines will need to be drawn. Fun times ahead!