Is website scraping legal in my case?


It has become almost routine for hacks and data breaches at online services to put millions of user records into the wrong hands. In April 2021, for example, data from more than 533 million Facebook users, 500 million LinkedIn accounts and 1.3 million Clubhouse users appeared on the web.

What is unusual, however, is that the operators of all three services denied having been hacked. Instead, they said, the data had been scraped: the information, such as full names and telephone numbers, had been published by the users themselves and could be viewed by other members, or at least by their contacts. But what is scraping, how does it work, and how can you protect yourself against it?

Scraping - Definition

Scraping, short for screen scraping or web scraping, is a technique in which an application or script reads and saves information from a website or online service - that is, "scrapes" the information off the screen. A well-known use of the technology is the bots of search engines such as Google, which continuously roam the Internet to index websites (crawling). Comparison portals also use the method to collect and evaluate vast amounts of data.

In many cases this is also in the interest of the website operator, since such indexing can bring them greater reach or more sales of their products and services. However, the technology can also be misused. Companies can use scraping, for example, to automatically search through their competitors' web shops.

They can then adjust their own prices so that they are always slightly cheaper (price grabbing). Or they copy the product descriptions and images (content grabbing), or even the entire structure of the web shop, saving themselves a lot of time and money. The phone numbers and email addresses collected from Facebook have also been linked to subsequent waves of "smishing" (phishing via SMS) and phishing.

Web Scraping - How It Works

The scraping process basically consists of two parts: retrieving the desired web pages (static or dynamically generated) and then extracting the data from them. A large number of scraping tools are available; GitHub alone hosts numerous solutions and toolkits for a wide variety of applications.
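The two steps can be illustrated with a minimal Python sketch that uses only the standard library. The page content, the CSS class names ("name", "price") and the helper names are all hypothetical and chosen for illustration; to keep the example self-contained, the retrieval step is replaced by a hard-coded HTML string.

```python
from html.parser import HTMLParser

# Step 1 (retrieval) is simulated here. A real scraper would download the
# page first, for example with urllib.request.urlopen(url).read().decode().
# The markup below is made up for illustration.
SAMPLE_HTML = """
<html><body>
  <div class="product"><span class="name">Widget</span><span class="price">9.99</span></div>
  <div class="product"><span class="name">Gadget</span><span class="price">19.50</span></div>
</body></html>
"""


class ProductParser(HTMLParser):
    """Step 2 (extraction): collect the text of the name and price spans."""

    def __init__(self):
        super().__init__()
        self._field = None  # which field the parser is currently inside
        self.names = []
        self.prices = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field == "name":
            self.names.append(data.strip())
        elif self._field == "price":
            self.prices.append(float(data))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_products(html):
    """Return a list of (name, price) tuples found in the given HTML."""
    parser = ProductParser()
    parser.feed(html)
    return list(zip(parser.names, parser.prices))


products = extract_products(SAMPLE_HTML)
print(products)
```

Dedicated scraping toolkits add conveniences such as request scheduling, JavaScript rendering and more robust selectors, but the basic pattern of fetching a page and pulling fields out of its markup stays the same.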

In the case of the Facebook scrape, in which data marked as private was also extracted, the operator suspects a special method that exploited a weakness, closed at the end of 2019, in the platform's contact import function. This feature was meant to let users find friends and acquaintances on Facebook by uploading their phone book. According to Facebook, the attackers abused this functionality on a large scale to query a set of user profiles and then obtain the information contained in their public profiles.

Scraping - Legal or Illegal?

The answer is: it depends. If no technical protection measures are circumvented for scraping, the act itself is not illegal - after all, only information that is publicly available anyway is collected. However, what you do with the data after it has been scraped can be illegal. If, for example, pictures, articles and the like are copied and published elsewhere without permission, that is clearly a violation of copyright. It should also be clear that using the data sets for phishing and similar activities is not legal.

The situation is even clearer when it comes to scraping personal data. The GDPR and other data protection laws set clear rules for collecting and storing personal data. You must have a legal basis for doing so, such as express consent or a legitimate interest. The GDPR also requires that only as much data be processed as is necessary to fulfill a task (data minimization).

Most social network operators also prohibit scraping in their terms and conditions. The fact that, as in the cases of Facebook, LinkedIn and Clubhouse, there seem to be hardly any other control mechanisms casts a bad light on their security precautions.

Data Scraping - Defense Measures

There are various options for website operators to protect themselves from scrapers. For example, they can use a robots.txt file to deny access to web crawlers. In addition, web application firewalls are usually able to detect the suspicious activity of a scraper.
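As a rough illustration of the robots.txt approach, the following Python sketch uses the standard library's urllib.robotparser to evaluate a set of rules the way a well-behaved crawler would. The rules, the paths, the bot name "ExampleBot" and the domain example.com are assumptions made up for this example. Note that robots.txt is only a request: compliant crawlers honor it, while a malicious scraper can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that asks all crawlers to stay away
# from profile pages and the API.
ROBOTS_TXT = """\
User-agent: *
Disallow: /profiles/
Disallow: /api/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if the rules permit this user agent to fetch the path."""
    return robots.can_fetch(user_agent, "https://example.com" + path)


print(is_allowed("/about"))        # path not covered by any Disallow rule
print(is_allowed("/profiles/42"))  # path covered by "Disallow: /profiles/"
```

Because compliance is voluntary, such a file is best seen as a complement to server-side measures such as the firewall rules mentioned above, not as a replacement for them.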

Beyond that, automated data collectors should not have things made too easy for them. In the case of Clubhouse, it appears that user profiles were numbered consecutively when they were created in the SQL database. This gives scrapers relatively easy access: a simple script that increments the number in the profile link is sufficient for mass data scraping.

And on the user side? "Users have to be aware that any information that is publicly accessible in this way runs the risk of falling victim to scrapers, be it on Facebook, LinkedIn, Clubhouse or anywhere else," the security experts at Avast explain. Once information is published, it can be collected, and you have no control over who copies the data or what is done with it in the vastness of the Internet.

The only way to prevent public information from being gathered and used in an undesirable way is to not make it public. Facebook also recommends that all users regularly check their privacy settings and adapt them to their current preferences.
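The following Python sketch illustrates why consecutive numbering is a problem and shows one possible countermeasure, namely random public identifiers. The URL scheme and function names are hypothetical; nothing here reflects how Clubhouse actually builds its links.

```python
import secrets


def sequential_profile_urls(base_url, start, count):
    """What a scraper can do when IDs are consecutive: simply count upwards."""
    return [f"{base_url}/{user_id}" for user_id in range(start, start + count)]


def new_public_id():
    """Create a random, URL-safe identifier from 16 random bytes (128 bits).

    Such an ID cannot be guessed by counting, unlike a database row number.
    """
    return secrets.token_urlsafe(16)


# Enumerating consecutive IDs takes a single line (hypothetical URL scheme).
print(sequential_profile_urls("https://example.com/profile", 1, 3))

# A random public ID gives a scraper no "next" value to try.
print(new_public_id())
```

Random identifiers only remove the easy enumeration path. They are an assumption-laden sketch of one mitigation and would normally be combined with access controls and rate limiting.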