Perplexity and the Perplexing Legalities of Data Scraping

Of the many lawsuits media giants have filed against AI companies for copyright infringement, the one filed by Dow Jones & Co. (publisher of the Wall Street Journal) and NYP Holdings Inc. (publisher of the New York Post) against Perplexity AI adds a new wrinkle. 

Perplexity is a natural-language search engine that generates answers to user questions by scraping information from sources across the web, synthesizing the data and presenting it in an easily digestible chatbot interface. Its makers call it an “answer engine” because it’s meant to function like a mix of Wikipedia and ChatGPT. The plaintiffs, however, call it a thief that is violating Internet norms to take their content without compensation.

To me, this represents a particularly stark example of the problems with how AI platforms are operating vis-a-vis copyrighted materials, and one well worth analyzing.

According to its website, Perplexity pulls information “from the Internet the moment you ask a question, so information is always up-to-date.” Its AI seems to work by combining a large language model (LLM) with retrieval-augmented generation (RAG — oh, the acronyms!). As this is a blog about the law, not computer science, I won’t get too deep into this but Perplexity uses AI to improve a user’s question and then searches the web for up-to-date info, which it synthesizes into a seemingly clear, concise and authoritative answer. Perplexity’s business model appears to be that people will gather information through Perplexity (paying for upgraded “Pro” access) instead of doing a traditional web search that returns links the user then follows to the primary sources of the information (which is one way those media sources generate subscriptions and ad views).

Part of this requires Perplexity to scrape the websites of news outlets and other sources. Web scraping is an automated method to quickly extract large amounts of data from websites, using bots to find requested information by analyzing the HTML content of web pages, locating and extracting the desired data and then aggregating it into a structured format (like a spreadsheet or database) specified by the user. The data acquired this way can then be repurposed as the party doing the gathering sees fit. Is this copyright infringement? Probably, because copyright infringement, at its most basic, is copying copyrighted material without permission.
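
For the curious, here is a minimal sketch of what "analyzing the HTML, extracting the desired data and aggregating it into a structured format" can look like in practice. It is purely illustrative: the sample page, the ArticleParser class and the field names are all made up for this example, and nothing here is meant to represent how Perplexity or any other company actually does it.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a website.
SAMPLE_HTML = """
<html><body>
  <h1>City Council Approves Budget</h1>
  <p class="byline">By A. Reporter</p>
  <p>The council voted 7-2 on Tuesday.</p>
</body></html>
"""


class ArticleParser(HTMLParser):
    """Collects a headline, byline and body paragraphs into one record."""

    def __init__(self):
        super().__init__()
        self._current = None  # which field the text we are reading belongs to
        self.record = {"headline": "", "byline": "", "body": []}

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._current = "headline"
        elif tag == "p":
            is_byline = dict(attrs).get("class") == "byline"
            self._current = "byline" if is_byline else "body"

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return  # skip whitespace and text outside the tags we care about
        if self._current == "body":
            self.record["body"].append(text)
        else:
            self.record[self._current] += text


def scrape(html):
    """Turn raw HTML into a structured record (think: one spreadsheet row)."""
    parser = ArticleParser()
    parser.feed(html)
    return parser.record


record = scrape(SAMPLE_HTML)
```

The point for legal purposes is the last step: once the text sits in a structured record, it is a copy that can be stored, combined with other copies and reused however the scraper likes.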

To make matters worse, at least according to Dow Jones and NYP Holdings, Perplexity seems to have ignored the Robots Exclusion Protocol. This is a standard (implemented through a website's robots.txt file) that tells crawlers and scraping bots which parts of a site they are asked not to access. Compliance is voluntary, but honoring it is a long-standing Internet norm. However, despite the fact that these media outlets deploy this protocol, Perplexity spits out verbatim copies of some of the Plaintiffs’ articles and other materials.
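
To make the protocol a little more concrete, here is a small sketch of the check a well-behaved bot would run before fetching a page, using Python's standard urllib.robotparser module. The robots.txt contents, the bot names (ExampleAnswerBot, SomeOtherBot) and the URLs are hypothetical; they are not taken from the Plaintiffs' actual sites or from the complaint.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is asked to stay out entirely,
# and every other bot is asked to stay out of the subscriber-only section.
ROBOTS_TXT = """\
User-agent: ExampleAnswerBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the robots.txt rules permit this bot to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named bot is asked not to fetch anything on the site.
blocked = may_fetch("ExampleAnswerBot", "https://news.example.com/articles/budget")

# Other bots may fetch ordinary articles but not the subscriber section.
allowed = may_fetch("SomeOtherBot", "https://news.example.com/articles/budget")
paywalled = may_fetch("SomeOtherBot", "https://news.example.com/subscribers/story")
```

Nothing in the protocol technically stops a bot from skipping this check and fetching the page anyway, which is why the allegation here is about ignoring a norm rather than breaking through a lock.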

Of course, Perplexity has a defense, of sorts. Its CEO accuses the Plaintiffs and other media companies of being incredibly shortsighted, and wishing for a world in which AI didn’t exist. Perplexity says that media companies should work with, not against, AI companies to develop shared platforms. It’s not entirely clear what financial incentives Perplexity has offered or will offer to these and other content creators.

Moreover, it seems like Perplexity is the one that is incredibly shortsighted. The whole premise of copyright law is that if people are economically rewarded they will create new, useful and insightful (or at least, entertaining) materials. If Perplexity had its way, these creators would either not be paid at all or have to accept whatever it is that Perplexity deigns to offer. Presumably, this would not end well for the content creators and, eventually, there would be no more reliable, up-to-date information to scrape. Perplexity’s self-righteous claim that media companies just want to go back to the Stone Age (i.e., the 20th century) also seems premised on a desire for a world in which the law allows anyone who wants copyrighted material to just take it without paying for it. And that’s not how the world works, at least for now.

Happier Life, Better Business

The year is drawing to a close, which means I’m looking back on the good and the bad of 2024 and trying to focus, naturally, on the good. Among the good things of 2024 are three key realizations that have helped improve my legal practice. I’m sharing because I think they can be of value to anyone starting, growing, or managing a business. 

One thing that really hit home this year is that when you’re a business owner, business problems are personal problems (and vice versa). This isn’t because a personal problem means that I’m making less money or that I take every frustrating or difficult situation personally. 

I, like most everyone else, enjoy doing the things I’m good at and don’t like to do stuff that feels hard or stresses me out. But, as a business owner, just because something isn’t easy doesn’t mean I can avoid dealing with it. I still have to either slog through it or go back to bed and pull the covers up over my head (I never do that but hey, technically it’s an option). 

There is, however, a third and, IMHO, better option: Understand why the task is hard and figure out how to make it less hard.

As I’ve talked about before, I used to dread posting to this blog because I was sure that some anonymous Internet troll was going to get offended by something I said, scold me for getting a fact wrong or get all huffy over a misplaced comma. I became so focused on not upsetting anyone or making mistakes that I ended up churning out some pretty pedestrian content. Worse, it took me FOREVER and a day to write anything because I obsessively examined and reexamined every damn word. Unsurprisingly, this did not make it easy to regularly post new material.

Acknowledging these feelings was a huge first step in overcoming them. It enabled me to look at my fears objectively and consider if there was any actual data to support their existence (surprise: there wasn’t!). Ultimately, addressing these personal fears and starting to make more regular and compelling blog posts turned out to have huge results for my business, as this blog has measurably helped attract new clients (for which I am extremely grateful). 

The second big revelation is that it’s not only okay to be choosy when taking on clients, it’s critical for my sanity and my firm’s success. For a long time, I operated as if every potential client might be my last. Irrational, to be sure, but also pretty normal. As a result, I felt like I was endangering my business and financial future if I didn’t say yes to any matter that even vaguely fit into my area of expertise. That meant ending up saddled with work that wasn’t profitable or, worse, made me miserable because I either wasn’t interested in the subject or the client didn’t value my insights, knowledge or ideas. Perfect example: Have I litigated securities fraud issues? Sure. Could I do so again? Of course. Do I want to? No! Securities fraud cases are not something I enjoy, nor will they lead to more of the cases I thrive on. In other words, while taking on cases or clients that aren’t a good fit may put money into my pocket in the short term, they don’t result in work I can excel at and people I enjoy working with. That’s where I need to focus my attention. Now, I am way more selective, and while I know that turning down work sounds a little crazy if you’re just starting your own business, it’s been a game changer for me. 

Which leads to my final big discovery of the year: By saying no to things that don’t serve my firm’s (and ultimately, my own) long-term interests, I have more time to focus on doing and getting work that I DO want. My time and energy are finite resources (this is really the BIG realization) and by using them more efficiently I’ve seen rapid, tangible results in the growth of my practice. I’m happier, my clients are happier, and my family is happier. And that’s ALL good, this year and for the years to come.

OpenAI’s Texts and DMs: Business or Personal?

If you’ve been following this blog, you’re familiar with the copyright infringement cases the New York Times and the Authors Guild have brought against OpenAI, makers of ChatGPT. So familiar, in fact, I won’t summarize these suits again. You can find a prior post about these cases here. The current dispute is interesting, at least to me (social media + law = fun for a nerd like me!) because it is another data point on how courts grapple with the blurry line between business and personal communications on social media.

Taking a step back for the non-litigators and non-lawyers in the room: In litigation, the parties must exchange materials that could have a bearing on the case. This generally covers a pretty broad range of materials and requires each party to produce all such materials that are in its “possession, custody, or control.” A party can also subpoena a non-party to the case for relevant materials in the non-party’s “possession, custody, or control.” However, where possible, it’s generally better to get discovery materials from a party instead of a non-party.

Turning back to the cases against OpenAI, the Authors Guild asked the tech company to produce texts and social media direct messages from more than 30 current and former employees, including some of the company’s top executives. It claims these communications may shed light on the issues in the case.

OpenAI has pushed back strongly. It claims that its employees’ social media accounts and personal phones are, well, personal and, therefore, not in its control. It also contends the Guild’s request might intrude on these persons’ privacy. OpenAI also rejects the Guild’s assumption that OpenAI’s search of its internal materials relevant to the case will be inadequate without its employees’ and former employees’ texts and DMs. It sniffs that the Guild should wait until it receives OpenAI’s documents before presuming as much (how rude!). 

The Authors Guild has responded by pointing to OpenAI employees’ posts on X (yes, formerly Twitter) that clearly indicate they used their “personal” social media for work purposes. Same goes for their phones which, while they may not be paid for by the company, seem to have been used to text about business. 

So, who’s right here? For starters, it seems pretty likely that, at least for current OpenAI employees, OpenAI could just tell people to turn over DMs and text messages. Assuming the employees don’t object or refuse, this should be enough to establish that OpenAI has “control.” If OpenAI hasn’t taken this basic step before refusing to produce DMs and text messages, that seems like a really good way to piss off the Magistrate Judge hearing this issue, especially if the employees violated OpenAI policies requiring work-related communications to take place on devices and accounts owned by the company (it should have such policies if it doesn’t!) or if the communications were clearly within the scope of an employee’s employment. Unless OpenAI can show it took that basic step, it seems likely that the Authors Guild will prevail.

If it does (or if it doesn’t) there will be more about it here!

IP in a Partnership: Who Owns What?

I talk a lot here about aspects of intellectual property law. It’s an area I find pretty fascinating because it has to do with how a society encourages people to create, and the law embodies beliefs about how to accomplish that. I also talk a lot about partnership disputes which, along with IP work, form a big part of my practice.

Sometimes, when you put two good things together you get something great (Reese’s!). Other times, though, you just get a mess. (Melted chocolate in your pocket? OK, I’ll stop now.) Often, it’s my job to sort out the issues created when partnership disagreements intertwine with intellectual property issues — specifically, who owns a company’s IP when a partnership falls apart.

In such disputes, there are a few rules that usually apply. I’ve found these are often unknown to or misunderstood by the people involved in these scenarios. So let’s run through them.

  1. Just because two people or a larger group didn’t formally register a company doesn’t mean there isn’t a partnership. In New York (where I primarily practice) and in other states, courts can find that people entered into a partnership even if they never filed paperwork to create a business entity. There are a range of factors that can come into play here but, in general, courts will look at whether the parties shared the business’s profits and losses; jointly managed or controlled the business; contributed money to the business; and/or whether they intended to be partners. Why does this matter? Because, during the existence of a partnership, the partners owe each other fiduciary duties, meaning they must treat each other fairly and, importantly, no individual member of the company can claim the company’s property for herself.

  2. Thus, even if a partner registers a partnership’s trademark in his or her own name, that trademark belongs to the partnership, not to the individual partner. For example, if a business operates under or sells a product with a name and/or logo, one of the members of the business can’t take ownership of that name or logo by individually obtaining a trademark registration for it. Nor can that member exclude other members of the business from using the name or logo if the partnership breaks up.

  3. Copyright rules are different! Generally speaking, a copyright vests in the creator, not the company. This means that if partners (either individually or together) create a work that is copyrighted or copyrightable, the copyright goes to the creator or creators, not the business. Moreover, under copyright law, transferring a copyright requires a written document, so if any owner wants to transfer a copyrighted work from themselves to the business, they need to have a document that says so.

  4. On a related note, just because something is created by a partner under the auspices of the business doesn’t mean it’s a “work for hire” and thus belongs to the business from the moment of its creation. Something only becomes a work for hire in two situations: (a) if it’s prepared by an employee within the scope of his or her employment; or (b) if it’s specially commissioned, falls into one of the specific categories of works listed in the Copyright Act, and there’s a signed written agreement stating that the material is a work for hire.

  5. Finally, the idea for a business is usually not protectable because, in general, ideas are not protectable intellectual property (I know, that sounds counterintuitive). Copyright law protects the expression of an idea, not the idea itself. So if you say to a friend, “Hey, we should open a business making ice cream for cats,” and your friend goes out and starts up Kitty Kreameries, you’re not entitled to any ownership of it. You have to put in the work and actually do the thing, not just think of the thing.  

No one starts a business with others expecting things to turn sour. But it happens a LOT. So the overall lesson here: If you’re entering into or already in a business with others, whether you’ve formally created it or not, be aware what belongs to you and what belongs to the business as a whole so you won’t be taken by surprise if it all comes crashing down someday.

Digging Through Yesterday to Plan for Tomorrow

First, a disclaimer: bear with me on this one. Even though I start off with descriptions of the various offices I’ve inhabited since 2021 and my struggles furnishing them, the tale does lead to some lessons that are worth thinking about as we prepare for the inevitable onslaught of articles and emails about how to plan for 2025. 

Like many people with desk jobs, I worked from home during the pandemic. It wasn’t a big deal, as I was used to meetings on Zoom and my bookkeeper, assistant, and paralegal had always been remote. 

In the fall of 2021, as COVID was starting to ease, New York City decided to install a new water main outside my bedroom/office. The ensuing construction cacophony was the end of my working from home.

I moved into a private office within a small shared space that was pretty great in many ways. It had a big window, a lovely view of the Manhattan skyline and one of my neighbors was a floral designer, which meant I frequently had fresh flowers in my office. However, there were rarely any other people around, so it still felt like I was stuck in my bedroom. After that, I moved to another space with the hopes that I would have a regular officemate. Unfortunately, that didn’t work out as planned and I found myself still mostly alone every day. 

About a year ago, I moved once again to my current office, which is in downtown Brooklyn. Third time is indeed the charm. There’s a nice mix of having other people around, but a door I can close when I’m on the phone or need to concentrate.

Even with this upgrade, though, my actual office was pretty bleak. My furniture amounted to a junky old filing cabinet, a hand-me-down bookshelf, and a depressingly blank Zoom background. Mostly, this was because I just hadn’t had time to find furniture that I liked.

Recently this changed. I finally had some time to buy a new bookcase and filing cabinet. They’re quite nice and certainly a big improvement over my prior decor. 

Of course, these purchases meant I had to transfer everything from the old furniture to the new. That archaeological dig unearthed a bunch of articles I had printed out and hand-scrawled notes I’d thrown in a folder to come back to later. As I read through this collection, I quickly realized almost all this material had to do, in some way, with growing a business. I soon became thoroughly engrossed in reading, stopped checking my email, let my computer go to sleep, and left my phone on the other side of the room. 

It was an interesting journey through the past few years of my practice. Some of these articles and ideas were no longer relevant, as they contained ideas or advice I’d tried that didn’t work for me, or experiences I’ve subsequently written about here. But a lot of it still resonated and, as I worked my way through this stuff, it became pretty clear that there were some recurring themes. Nothing particularly earthshaking or radical, but ideas that are definitely worth revisiting. More importantly, the process — particularly being separated from my phone and other distractions — allowed me to step back and see connections that I had forgotten about or previously missed.

So what are the lessons here? First, creating a strategy for growing a business isn’t a one-and-done deal. What worked a few months ago might not work now, or could be ripe for further improvement. Through my review of this collection of material, I could see the evolution in my thinking and approach, sort out what worked, examine whether improvements were possible, and chuck the stuff that didn’t work or was no longer relevant.

In the next two-and-a-half months we’re all going to be bombarded with articles, commercials, and general blather advising us to plan for 2025, and my experience reading my articles and notes reinforced how you can’t plan for the future without assessing where you’ve been. Looking back on decisions and moves I’ve made is essential for taking stock of what works (and what doesn’t) and how to deploy resources in the future. Simply having an idea once, implementing it and never reexamining it can too easily lead to stagnation. 

And what’s the best way to do this? By freeing ourselves from distraction! Stepping back from our phones and computers (and even some of the idle office chitchat I now enjoy that I missed so much during the pandemic) allows us to get new perspectives and see the connections between what we’ve done before and the results it has led to. Because the past is the strongest foundation we can build upon for the future.