Web Scraping: Get to Know the Data Collection Technique


In early April, a new data leak exposed the data of 533 million Facebook users worldwide, including the social network's founder, Mark Zuckerberg, and around 8 million Brazilians with profiles on the service.

According to the platform, the exposure did not result from a breach of its servers. The information, which ended up on a hacking forum, was obtained through a technique known as scraping.

The method, used by marketing agencies, journalists and data scientists, has made headlines before. In September 2020, for example, data from 235 million YouTube, Instagram and TikTok users was leaked. Perhaps the most famous case, however, is the Cambridge Analytica scandal, in which information from Facebook profiles was used to build behavioral profiles of voters.

What is scraping?

Scraping, also called web scraping, is a technique for collecting information from the internet automatically, drawing on publicly available data from websites, social networks and other online services.

It is generally used to speed up searching for and collecting this information, work that would take far longer if done by hand. The speed comes from dedicated applications, programming languages or scripts that copy data on a large scale.
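To make the idea of a script that copies data on a large scale more concrete, here is a minimal sketch in Python using only the standard library. It assumes the target pages are public HTML and that the goal is simply to pull out each page's title and links; the names `LinkAndTitleParser`, `parse_page`, `fetch` and `scrape`, the User-Agent string and the one-second delay are hypothetical choices for illustration. Real scrapers usually add error handling and site-specific parsing.

```python
import time
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and every link target (href) in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs for the tag
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def parse_page(html):
    """Parse already-downloaded HTML and return (title, links)."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


def fetch(url, timeout=10.0):
    """Download one public page (hypothetical User-Agent string)."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def scrape(urls, delay=1.0):
    """Repeat fetch + parse over many pages, pausing between requests."""
    results = {}
    for url in urls:
        results[url] = parse_page(fetch(url))
        time.sleep(delay)  # assumed politeness delay between requests
    return results


if __name__ == "__main__":
    sample = ("<html><head><title>Open data</title></head>"
              "<body><a href='/a.csv'>A</a><a href='/b.csv'>B</a></body></html>")
    print(parse_page(sample))  # ('Open data', ['/a.csv', '/b.csv'])
```

Because parsing is kept separate from downloading, `parse_page` can be tried on a saved HTML string without network access; `scrape` then just repeats fetch-and-parse over a list of URLs, which is where the time savings over manual copying come from.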

Scraping is typically used when a researcher, scientist, journalist or other professional needs to collect a large amount of data for a study, survey or report, automating collection from a public federal government database or any other source.

Scraping can also be used to obtain publicly visible information from social network profiles (name, photo, address, phone, email, etc.) and from Google, for a wide range of purposes, such as targeting advertising campaigns and monitoring competitors.

Is data scraping legal?

Collecting data by scraping is not considered illegal as long as it is done on public databases, that is, when the information obtained is accessible to any internet user. Just as visiting someone's profile and viewing the data available there is not a crime, using an automated tool to do the same work does not violate the law either.

However, Facebook, Instagram, YouTube and TikTok, among other platforms, currently treat the automated copying of data stored on their services as a violation of their terms of use.

Are there risks for people whose data is copied?

Through scraping, people and companies can access the public information of any individual included in the source being scraped, such as phone number, email, profile picture, age and sex, depending on the type of source the automated tool accesses.

On social networks, scrapers can also obtain details such as follower counts, engagement and even shared links, in addition to public posts and other content visible to other users, provided the platform allows such access.