New research has revealed the scale of the growing web scraping problem on some of the world's largest websites.
Smartproxy’s Most Scraped Websites 2024 report states that social media pages account for over a quarter (27%) of the most scraped sites.
During 2023 and the first three months of 2024, bots were most interested in search engines like Google (42%). However, social media accounts and community forums collectively accounted for about a third (34%) of observed scraping cases.
Google is the most crawled website
While the figures are alarming, many of the most scraped sites are fortunately not targets for personal data mining, with search engines and e-commerce sites leading the way.
“This trend shows the critical need for real-time search data in various sectors, including the ever-growing field of AI, where data plays a crucial role in training AI models,” said Smartproxy CEO Vytautas Savickas.
“Additionally, e-commerce platforms contribute a large share of most scraped targets, reflecting the industry’s push for competitive intelligence needed for dynamic pricing strategies.”
E-commerce sites, which account for around one-fifth (18%) of data extraction requests, represent a growing segment. Smartproxy noted that new shopping trends are emerging and that, as consumers seek more competitive prices, real-time data has become increasingly important.
The report also notes that e-commerce scraping peaks during major shopping periods, with Black Friday (+64%), Christmas (+46%) and Amazon Prime Day (+22%) recording considerable spikes.
“Companies are stepping up their data collection efforts during these times to capture the value of data generated by the flood of online shoppers looking for discounts and special offers,” Savickas added.