James Hennessy, Integration Engineer


The idea behind this legal theory is that web scraping, which often involves high-volume, unwanted data requests, is a form of trespass on private tangible property: computer servers. But trespass to chattels requires both a trespass to private tangible property and an element of damages. In the early days of the Internet, in the era of noisy dial-up connections, it didn’t take much extra traffic to damage someone’s server or its ability to serve a functioning website. Many web scrapers were clumsy, and their operators didn’t realize the impact of their additional requests on servers. In the late 1990s and early 2000s, web scraping often did burden or shut down websites.

But as technology improved, this legal theory stopped making as much sense. Server capacity improved by many orders of magnitude, and most scrapers became savvy enough to limit their requests so that the load was imperceptible, or at least inconsequential, to the host servers. Now, one of the elements of a trespass to chattels claim (damage to the servers or other tangible property of the host) is very rarely met.
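To illustrate what "limiting requests" can look like in practice, here is a minimal sketch of a per-host throttle in Python. It is not taken from any particular scraper: the class name `RequestThrottle`, the `min_interval_s` parameter, and the injectable `clock` and `sleep` arguments are all assumptions made for this example.

```python
import time


class RequestThrottle:
    """Enforce a minimum delay between consecutive requests to one host.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not part of any real scraping library.
    """

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock    # injectable so the logic can be tested without real time
        self._sleep = sleep    # injectable for the same reason
        self._last_request_at = None  # None until the first request is made

    def wait(self):
        """Block until min_interval_s has elapsed since the previous call.

        Returns the number of seconds actually waited (0.0 if none).
        """
        waited = 0.0
        if self._last_request_at is not None:
            elapsed = self._clock() - self._last_request_at
            remaining = self.min_interval_s - elapsed
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record when this request is allowed to proceed.
        self._last_request_at = self._clock()
        return waited
```

A scraper would call `throttle.wait()` immediately before each request to the same host, for example with `RequestThrottle(min_interval_s=2.0)`. What interval is actually inconsequential to a given server depends on that server; a fixed delay is only the simplest possible policy.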

Next, from the early 2000s until 2017, the primary legal theory used to deter web scraping was the Computer Fraud and Abuse Act (CFAA). The CFAA prohibits accessing a “protected computer” without authorization. In the context of web scraping, the question is whether, once a web scraper has its authorization revoked (usually via a cease-and-desist letter, though often through various anti-bot protections), any further scraping and use of a website’s data is “without authorization” within the meaning of the CFAA.

From 2001 to 2017, the simple answer was yes: any form of revocation of authorization was typically sufficient to trigger CFAA liability if the scraper continued to access the site without permission. Then, in 2017, the district court in the now-famous hiQ Labs, Inc. v. LinkedIn Corp. case sided with a plaintiff web scraper, finding that the CFAA likely did not bar it from accessing public LinkedIn data. The Ninth Circuit affirmed, holding:

We agree with the district court that giving companies like LinkedIn free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest.

Many interpreted this as establishing an affirmative right to scrape public data, even though that was not the correct reading of the law and the reality was always more nuanced.

In the end, it was a Pyrrhic victory. hiQ Labs ultimately lost the case: at summary judgment, the district court held that “LinkedIn’s User Agreement unambiguously prohibits scraping and the unauthorized use of scraped data.” LinkedIn obtained a permanent injunction and damages against hiQ Labs on that basis.

Web Scrap
