Scrape Away! Social Media Posts and Web Scraping

I’ve written a number of posts about data scraping because it’s a big deal right now. Initially, I was interested in the issue because it was pretty clear that AI companies such as OpenAI engaged in widespread extraction of data from other companies’ websites (i.e., scraping) to collect materials to create their generative AI platforms. More recently, my interest has shifted to focus more on the extent to which social media companies are trying to use their terms of service to limit or prevent others from collecting and selling their users’ data.

The interesting wrinkle: the social media platforms don’t actually own their users’ content.

First, some background: In case you’ve never thought about it (and there’s no reason for this to cross most people’s minds), when you sign up for an account on a social app like LinkedIn or X and agree to its terms of service, you give the platform a license to use your content. Generally speaking, this means that you give it the right to display, reproduce, distribute, and adapt your posts. 

Why does this matter? Because the copyright to any posts vests in their creator (you) and without a license, a social media company would have to pay the creator (you again) each time a post is reproduced or displayed on the platform. The licenses users grant social media companies are non-exclusive and do not give them “ownership” over posts; they only allow the social media companies to publish the posts. 

This is important for a couple of reasons. First, it gives social media platforms safe harbor from civil liability under Section 230 of the Communications Decency Act. This protects them from being sued for defamation and the like based on a user’s posts. Second, it means that the original creator retains ownership of the posts. Despite this, and to the surprise of no one who has been alive over the last 20 years, social media platforms have tried to assert as much control over users’ posts as possible, often far beyond what the law actually permits. And the courts have started weighing in on this.

In a 2022 case brought against LinkedIn by a competitor that scraped LinkedIn’s data, the Ninth Circuit observed that “giving companies like LinkedIn free rein to decide, on any basis, who can collect and use data — data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use — risks the possible creation of information monopolies…”

More recently, a company called Bright Data Ltd. was sued by both Meta Platforms, Inc. and X Corp. In those cases, the social media platforms alleged that Bright Data violated their respective terms of service by scraping Facebook, Instagram, and X and selling the information it gathered to third parties.

In neither case was the Court particularly impressed with the plaintiffs’ arguments that Bright Data should not be able to scrape their social media platforms. 

In the Meta case, the Court found that Facebook’s and Instagram’s terms of use did not apply because Bright Data was not logged into an account on either platform when it engaged in scraping, or had deleted its accounts before engaging in any scraping. The Court thus rejected Meta’s arguments and granted Bright Data’s motion for summary judgment.

And in the case brought by X, the Court found that X’s claims that data scraping breached its terms of use impermissibly conflicted with the Copyright Act. The Court held that “X Corp. would upend the careful balance Congress struck between what copyright owners own and do not own, and what they leave for others to draw on. In addition to giving itself de facto copyright ownership in copyrighted content that X users designated for public use, X Corp. would give itself de facto copyright ownership over content that Congress declined to extend copyright protection in the first place (e.g., likes, user names, short comments)…” As a result, the Court dismissed X’s lawsuit. 

We’ll see where social media companies go in their efforts to keep as much of their users’ data as possible for themselves (and their AI platforms), but the courts have made it clear that there are limits. As of now, third-party scraping of social media can continue.