AI

February 19, 2025

A Hint of How AI Infringement Suits Will Go?

As the lawyers reading this know, media giant Thomson Reuters has a proprietary online research database called Westlaw. In addition to hosting cases and statutes, Westlaw also includes original material written by Westlaw editors. A recent decision involving that original content and its use by Ross Intelligence, a potential Thomson Reuters competitor, to create an AI-powered product may provide a bit of a roadmap on fair use and other issues facing the courts considering cases against OpenAI, Perplexity and other generative AI platforms.

First, some background: while the bulk of Westlaw’s content (statutes, rules, ordinances, cases, administrative codes, etc.) is not subject to copyright protection, Westlaw editors concisely restate the important points of a case with short summaries. Each is called a Headnote. Westlaw organizes Headnotes into something called the West Key Number System, which makes it much easier to find what you’re looking for.

This case began when Ross asked to license Westlaw’s Headnotes to create its own, AI-powered legal research search engine. Not surprisingly, Thomson Reuters didn’t want to help create a competitor and said no.

As a workaround, Ross hired a company called LegalEase, which in turn hired a bunch of lawyers to create training data for Ross’ AI. This training data took the form of a list of questions, each with correct and incorrect answers. While the lawyers answering these questions were told not to simply cut and paste Headnotes, the answers were formulated using Westlaw’s Headnotes and the West Key Number System. LegalEase called these “Bulk Memos.”

Thomson Reuters was none too happy about this and sued Ross for, among other things, copyright infringement, claiming that “Ross built its competing product from Bulk Memos, which in turn were built from Westlaw [H]eadnotes.” In its defense, Ross claimed that Westlaw’s Headnotes were not subject to copyright protection, and that to the extent it infringed on Thomson Reuters’ copyrights, its use constituted fair use.

In 2023 the Court largely denied Thomson Reuters’ motion for summary judgment, ruling that, among other things, the question of whether Headnotes qualify for copyright protection would have to be decided by a jury. The Court, however, subsequently had a change of heart and asked Thomson Reuters and Ross to renew their motions for summary judgment. Earlier this month, the Court ruled on these renewed motions.

Of note, the Court found that at least some Headnotes qualified for copyright protection, as did the West Key Number System. On the Headnotes, the Court found that the effort of “distilling, synthesizing, or explaining” a judicial opinion was sufficiently original to qualify for copyright protection. The Court also found the West Key Number System to be sufficiently original to clear the “minimal threshold for originality” required for copyright protection. The Court further found that the Bulk Memos infringed on some of the Headnotes.

The Court also rejected Ross’ assertion of fair use. Its decision was based largely on the fact that Ross was using Thomson Reuters’ Headnotes to create a competing product. Here, the Court looked at not only Thomson Reuters’ current market, but also potential markets it might develop, finding that since Thomson Reuters might create its own AI products the Ross product could negatively impact the market for Thomson Reuters, which weighed against fair use.

The Court was not impressed with Ross’ reliance on a line of cases finding copying of computer code at an intermediate step to be fair use. Here, the Court noted that Ross was not copying computer code. Moreover, in those cases, the copying was necessary to access purely functional elements of a computer program and achieve new, transformative purposes. In contrast, Ross used Headnotes to make it easier to develop a competitive product.

Ultimately, these conclusions are most interesting because of what other courts hearing AI infringement cases may take from them. Sure, there are differences (notably, Ross doesn’t seem to be using generative AI), but this case highlights some of the legal and factual issues we’re going to see as other cases move forward. In particular, I think the fact that the Court here found that the process of summarizing or distilling longer cases into Headnotes renders the Headnotes subject to copyright protection may be problematic for companies such as OpenAI, which has tried to claim that it is only ingesting underlying facts from news articles. If creating Headnotes is sufficiently original to qualify for copyright protection, then it seems likely that a reporter selecting the facts to include in a news article is also sufficiently original.

Stay tuned. There is much, much more to come.

February 11, 2025

Get Ready: DeepSeek is Here

And this week, it’s DeepSeek. Every few days it seems there’s something new dominating tech headlines, and since right now it’s the low-cost, low-energy Chinese AI roiling world governments and markets, I thought I’d use this week’s post to take a look at some portions of DeepSeek’s Terms of Use (ToU). Of course, keep in mind nothing I write here is legal advice and, as I’ve covered at greater length previously, there’s a whole lot of uncertainty about the rules governing the creation of large language and diffusion models, as well as their outputs. But that doesn’t mean there’s not a lot to chew on already.

With that disclaimer out of the way, I’m going to start with something that’s rather mundane, but where litigators’ minds tend to go right off the bat: forum selection. For the non-attorneys out there, that’s where a lawsuit against DeepSeek would have to be brought. What do DeepSeek’s ToU say? “In the event of a dispute arising from the signing, performance, or interpretation of these Terms, the Parties shall make efforts to resolve it amicably through negotiation. If negotiation fails, either Party has the right to file a lawsuit with a court having jurisdiction over the location of the registered office of Hangzhou DeepSeek Artificial Intelligence Co., Ltd.”

In other words, if you want to sue DeepSeek, you have to do so in China. This is not atypical — technology companies generally include favorable forum selection clauses in their ToU — but from an American perspective, this will make it hard or impossible for most US-based DeepSeek users to sue the company in the event of a dispute.

More disturbing is section 4.2 of DeepSeek’s ToU: “Subject to applicable law and our Terms, you have the following rights regarding the Inputs and Outputs of the Services: (1) You retain any rights, title, and interests—if any—in the Inputs you submit; (2) We assign any rights, title, and interests—if any—in the Outputs of the Services to you.” Sounds benign, right?

Nope. What it really means is if DeepSeek decides a user has violated its ToU (or Chinese law), it could unilaterally decide that the user has given up rights to its materials and/or the rights to use output from DeepSeek. This means DeepSeek could use this provision to claim ownership over the material users put into DeepSeek, or could sue a user who includes output generated by DeepSeek in any of their own commercial activities. People and organizations will have to make their own calls about whether this is an acceptable risk but, on top of the fact that any user who thinks their rights have been improperly rescinded would have to seek legal recourse in a Chinese court, this seems, um, bad.

I should also mention that the privacy and national security concerns involved in using DeepSeek are well above my pay grade — but I’d love to hear your thoughts on them. I’m particularly curious what privacy attorneys think about the provisions around the platform’s use by minors (“DeepSeek fully understands the importance of protecting minors and will take corresponding protective measures in accordance with legal requirements and industry mainstream practices”); and reports that a DeepSeek database containing sensitive information was publicly accessible. Neither the vague language on the protection of minors nor DeepSeek’s failure to protect its information inspires confidence. But I’m not a privacy lawyer so maybe I’m missing something.

Lastly, one especially amusing thing has come from the DeepSeek splash: OpenAI (creators of ChatGPT) has publicly accused DeepSeek of using its output to train DeepSeek’s AI, complaining that it is a violation of OpenAI’s terms of service. Ha! OpenAI, of course, is currently embroiled in several copyright infringement lawsuits (which I’ve covered here) with the New York Times and others over OpenAI’s use of their content to train its algorithms (and presumably compete with them). Oh, the irony.

December 17, 2024

Perplexity and the Perplexing Legalities of Data Scraping

Of the many lawsuits media giants have filed against AI companies for copyright infringement, the one filed by Dow Jones & Co. (publisher of the Wall Street Journal) and NYP Holdings Inc. (publisher of the New York Post) against Perplexity AI adds a new wrinkle.

Perplexity is a natural-language search engine that generates answers to user questions by scraping information from sources across the web, synthesizing the data and presenting it in an easily digestible chatbot interface. Its makers call it an “answer engine” because it’s meant to function like a mix of Wikipedia and ChatGPT. The plaintiffs, however, call it a thief that is violating Internet norms to take their content without compensation.

To me, this represents a particularly stark example of the problems with how AI platforms are operating vis-a-vis copyrighted materials, and one well worth analyzing.

According to its website, Perplexity pulls information “from the Internet the moment you ask a question, so information is always up-to-date.” Its AI seems to work by combining a large language model (LLM) with retrieval-augmented generation (RAG — oh, the acronyms!). As this is a blog about the law, not computer science, I won’t get too deep into this but Perplexity uses AI to improve a user’s question and then searches the web for up-to-date info, which it synthesizes into a seemingly clear, concise and authoritative answer. Perplexity’s business model appears to be that people will gather information through Perplexity (paying for upgraded “Pro” access) instead of doing a traditional web search that returns links the user then follows to the primary sources of the information (which is one way those media sources generate subscriptions and ad views).
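For the technically curious, the retrieve-then-generate idea can be sketched in a few lines of Python. This is a toy illustration only, not Perplexity’s actual system: the pages, URLs and function names are all made up, the “search” is simple word overlap, and the final step just assembles the prompt that a language model would be asked to answer from.

```python
import re

# Hypothetical mini "web" of already-fetched pages (made-up URLs and text).
DOCUMENTS = {
    "https://news.example.com/rates": "The central bank raised interest rates on Tuesday.",
    "https://news.example.com/storm": "A storm brought heavy rain to the coast.",
    "https://news.example.com/team": "The home team won the championship game.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents, k=1):
    """Rank pages by how many words they share with the question; keep the top k."""
    question_words = tokenize(question)
    ranked = sorted(
        documents.items(),
        key=lambda item: len(question_words & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, sources):
    """Combine the retrieved text and the question into one prompt for a model."""
    context = "\n".join(f"[{url}] {text}" for url, text in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


question = "Did the central bank raise interest rates?"
sources = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, sources)
print(prompt)
```

Note what ends up in the prompt: the text of the source page itself. That is the part that matters for the copyright analysis.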

Part of this requires Perplexity to scrape the websites of news outlets and other sources. Web scraping is an automated method to quickly extract large amounts of data from websites, using bots to find requested information by analyzing the HTML content of web pages, locating and extracting the desired data and then aggregating it into a structured format (like a spreadsheet or database) specified by the user. The data acquired this way can then be repurposed as the party doing the gathering sees fit. Is this copyright infringement? Probably, because at its most basic, copyright infringement is copying copyrighted material without permission.
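To make the scraping steps concrete, here is a minimal sketch using only Python’s standard library. Everything in it is an assumption for illustration: the sample page is invented and embedded in the code (a real scraper would download pages over the network), and the class and function names are my own. It parses the HTML, pulls out each article’s headline and text, and aggregates the results into spreadsheet-style rows.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<html><body>
  <article><h2>Markets rally</h2><p>Stocks rose on Tuesday.</p></article>
  <article><h2>Storm warning</h2><p>Heavy rain is expected.</p></article>
</body></html>
"""


class ArticleParser(HTMLParser):
    """Collects the <h2> headline and <p> text of each <article>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per article found
        self._field = None  # field that the current text belongs to

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.rows.append({"headline": "", "body": ""})
        elif tag == "h2" and self.rows:
            self._field = "headline"
        elif tag == "p" and self.rows:
            self._field = "body"

    def handle_endtag(self, tag):
        if tag in ("h2", "p"):
            self._field = None

    def handle_data(self, data):
        # Ignore text that is not inside a headline or paragraph.
        if self._field and self.rows:
            self.rows[-1][self._field] += data.strip()


def scrape(html):
    """Analyze the HTML and return the extracted articles as rows."""
    parser = ArticleParser()
    parser.feed(html)
    return parser.rows


def to_csv(rows):
    """Aggregate the rows into a structured, spreadsheet-style format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["headline", "body"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape(SAMPLE_HTML)
print(to_csv(rows))
```

The point for non-programmers: nothing here summarizes or transforms the material. The bot copies the text out of the page and stores it.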

To make matters worse, at least according to Dow Jones and NYP Holdings, Perplexity seems to have ignored the Robots Exclusion Protocol. This is a voluntary standard, implemented through a website’s robots.txt file, that tells crawlers and scraping bots which parts of the site they should not access. However, despite the fact that these media outlets deploy this protocol, Perplexity spits out verbatim copies of some of the Plaintiffs’ articles and other materials.
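Here is a minimal sketch of what honoring the protocol looks like from the bot’s side, using Python’s built-in `urllib.robotparser` module. The robots.txt contents, the bot names and the URLs are hypothetical, chosen only for illustration. Nothing in the protocol technically stops a bot from skipping this check, which is exactly the problem the Plaintiffs are complaining about.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site: one named bot is
# barred from the whole site, and every other bot is barred from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a rule set a crawler can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)


def may_fetch(user_agent, url):
    """Return True if the site's rules permit this bot to fetch this URL."""
    return rules.can_fetch(user_agent, url)


article = "https://news.example.com/articles/1"
print(may_fetch("ExampleBot", article))  # the barred bot
print(may_fetch("OtherBot", article))    # any other bot
```

A well-behaved scraper runs a check like `may_fetch` before every request and walks away when the answer is no.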

Of course, Perplexity has a defense, of sorts. Its CEO accuses the Plaintiffs and other media companies of being incredibly shortsighted, and wishing for a world in which AI didn’t exist. Perplexity says that media companies should work with, not against, AI companies to develop shared platforms. It’s not entirely clear what financial incentives Perplexity has offered or will offer to these and other content creators.

Moreover, it seems like Perplexity is the one that is incredibly shortsighted. The whole premise of copyright law is that if people are economically rewarded they will create new, useful and insightful (or at least, entertaining) materials. If Perplexity had its way, these creators would either not be paid at all or have to accept whatever it is that Perplexity deigns to offer. Presumably, this would not end well for the content creators and there would be no more reliable, up-to-date information to scrape. Moreover, Perplexity’s self-righteous claim that media companies just want to go back to the Stone Age (i.e., the 20th century) seems premised on a desire for a world in which the law allows anyone who wants copyrighted material to just take it without paying for it. And that’s not how the world works, at least for now.

November 19, 2024

OpenAI’s Texts and DMs: Business or Personal?

If you’ve been following this blog, you’re familiar with the copyright infringement cases the New York Times and the Authors Guild have brought against OpenAI, makers of ChatGPT. So familiar, in fact, I won’t summarize these suits again. You can find a prior post about these cases here. The current dispute is interesting, at least to me (social media + law = fun for a nerd like me!) because it is another data point on how courts grapple with the blurry line between business and personal communications on social media.

Taking a step back for the non-litigators and non-lawyers in the room: In litigation, the parties must exchange materials that could have a bearing on the case. This generally covers a pretty broad range of materials and requires each party to produce all such materials that are in its “possession, custody, or control.” A party can also subpoena a non-party to the case for relevant materials in the non-party’s “possession, custody, or control.” However, where possible, it’s generally better to get discovery materials from a party instead of a non-party.

Turning back to the cases against OpenAI, the Authors Guild asked the tech company to produce texts and social media direct messages from more than 30 current and former employees, including some of the company’s top executives. It claims these communications may shed light on the issues in the case.

OpenAI has pushed back strongly. It claims that its employees’ social media accounts and personal phones are, well, personal and, therefore, not in its control. It also contends the Guild’s request might intrude on these persons’ privacy. OpenAI also rejects the Guild’s assumption that OpenAI’s search of its internal materials relevant to the case will be inadequate without its employees’ and former employees’ texts and DMs. It sniffs that the Guild should wait until it receives OpenAI’s documents before presuming as much (how rude!).

The Authors Guild has responded by pointing to OpenAI employees’ posts on X (yes, formerly Twitter) that clearly indicate they used their “personal” social media for work purposes. Same goes for their phones which, while they may not be paid for by the company, seem to have been used to text about business.

So, who’s right here? For starters, it seems pretty likely that, at least for current OpenAI employees, OpenAI could just tell people to turn over DMs and text messages. Assuming the employees don’t object or refuse, this should be enough to establish that OpenAI has “control.” OpenAI apparently hasn’t taken this basic step before refusing to produce DMs and text messages, which seems like a really good way to piss off the Magistrate Judge hearing this issue, especially if the employees violated OpenAI policies requiring work-related communications to take place on devices and accounts owned by the company (it should have such policies if it doesn’t!) or if the communications were clearly within the scope of an employee’s employment. Without that basic showing, it seems likely that the Authors Guild will prevail.

If it does (or if it doesn’t) there will be more about it here!

March 19, 2024

Let’s Talk About Trademarks (And AI)

I’ve posted quite a bit about the growing legal battles involving AI companies, copyright infringement, and the right of publicity. These are still early days in the evolution of AI so it’s hard to envision all the ways the technology will develop and be utilized, but I predict AI is going to come up against even more existing intellectual property laws — specifically, trademark law.

For example, in its lawsuit against OpenAI and others (which I wrote about here), the New York Times Company alleged the Defendants engaged in trademark dilution. To take a step back, trademark dilution happens when someone uses a “famous” trademark (think Nike, McDonald’s, UPS, etc.) without permission, in a way that weakens or otherwise harms the reputation of the mark’s owner. This could happen when an AI platform, in response to a user query, delivers flat-out wrong or offensive content and attributes it to a famous brand such as the New York Times. Thus, according to the Times’ complaint, when asked “what the Times said are ‘the 15 most heart-healthy foods to eat,’” Bing Chat (a Microsoft AI product) responded with, among other things, “red wine (in moderation).” However, the actual Times article on the subject “did not provide a list of heart-healthy foods and did not even mention 12 of the 15 foods identified by Bing Chat (including red wine).” Who knows where Bing got its info from, but if the misinformation and misattribution cause people to think less of the “newspaper of record,” that could be construed as trademark dilution.

There are, however, potential pitfalls for brands that want to use trademark dilution to push back against AI platforms. Dilution is difficult to discover and expensive to pursue, and there can be a lot of ambiguity about whether a brand is “famous” and able to be significantly harmed by trademark dilution. In the New York Times’ case, the media giant has the resources to police the Internet and to file suits; nor should there be any dispute that it is a “famous” brand with a reputation that is vitally important. But smaller companies may not have the resources to search for situations where AI platforms incorrectly attribute information, or have a platform visible enough to meaningfully correct the record. Plus, calculating the brand damage from AI “hallucinations” will be very difficult and costly. Also, this area of the law does nothing for brands that aren’t “famous.”

Another area where trademark law and AI seem destined to face off is under the sections of the Lanham Act (the Federal trademark law) that allow celebrities to sue for non-consensual use of their persona in a way that leads to consumer confusion, or others to sue for false advertising that influences consumer purchasing decisions. AI makes it pretty easy to manipulate a celebrity’s (or anyone’s) image or video to do and say whatever a user wants, which opens up all sorts of troublesome trademark possibilities.

Again, there are a couple of serious limitations here. For starters, the false endorsement prong likely only applies to celebrities or others who are well-known and does little to protect the rest of us. Perhaps more important (and terrifying), it seems likely that there will be significant issues in applying the Lanham Act’s provisions on false advertising in the context of deepfakes in political campaigns — like, for example, the recent robocall in advance of the New Hampshire primary that sounded like it was from President Biden. To avoid problems with the First Amendment, the Lanham Act is limited to commercial speech and thus will be largely useless for dealing with this type of AI abuse.

One other potentially interesting (and creepy) area where AI and trademark law might intersect is when it comes to humans making purchasing decisions through an AI interface. For example, a user tells a chatbot to order a case of “ShieldSafe disinfecting wipes,” but what shows up on their porch is a case of “ShieldPro disinfecting wipes” (hat tip to ChatGPT for suggesting these fictional names). While the mistake of a few letters might mean nothing to an algorithm (or even to a consumer who just wants to clean a toilet), it’s certainly going to anger a ShieldSafe Corp. that wants to prevent copycat companies from stealing its customers (and keep its business from going down that aforementioned toilet).