
AI crawlers cause a 50% increase in bandwidth demand on Wikimedia Commons.
The Wikimedia Foundation reports that bandwidth consumption for multimedia downloads has increased by 50% since January 2024.
The Wikimedia Foundation, the organization behind Wikipedia and other collaborative knowledge projects, reported that bandwidth consumption for multimedia downloads from Wikimedia Commons has increased by 50% since January 2024. According to its statement, this growth is driven not by increased demand from human readers seeking information, but by automated bots scraping data to train artificial intelligence models.
The statement emphasizes that, although Wikimedia's infrastructure is designed to absorb traffic spikes during events of high human interest, the traffic generated by these bots has reached unprecedented levels, increasing both operational risk and cost. Wikimedia Commons is a freely accessible repository of images, videos, and audio files released under open licenses or in the public domain.
A deeper analysis reveals that nearly 65% of the most expensive traffic, measured by resource consumption, comes from these bots, even though they generate only about 35% of page views. The discrepancy stems from how content is served: frequently visited content is cached close to the user, while rarely visited content must be fetched from the core data center, which is far more costly. According to Wikimedia, human readers tend to focus on specific, popular topics, whereas bots typically "bulk read" large numbers of articles, including unpopular pages. Their requests are therefore much more likely to be routed to the core data center, driving up resource usage.
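The arithmetic behind this discrepancy can be sketched with a toy model. The numbers below are invented for illustration (only the 65%/35% split comes from Wikimedia's statement): cache hits are cheap, misses hit the core data center at a higher cost, and bots are assumed to miss the cache far more often than humans.

```python
# Toy model (made-up miss rates and costs, not Wikimedia data): how bots
# can account for ~65% of the most expensive traffic while generating
# only ~35% of pageviews.

ORIGIN_COST = 10  # relative cost of a cache miss served by the core data center

# Hypothetical traffic mix: humans read popular (cached) pages,
# bots "bulk read" the long tail and miss the cache more often.
human_views, human_miss_rate = 650, 0.10
bot_views, bot_miss_rate = 350, 0.35

def origin_cost(views, miss_rate):
    """Cost incurred at the core data center by cache misses."""
    return views * miss_rate * ORIGIN_COST

bot_share_views = bot_views / (bot_views + human_views)
bot_share_origin = origin_cost(bot_views, bot_miss_rate) / (
    origin_cost(bot_views, bot_miss_rate)
    + origin_cost(human_views, human_miss_rate)
)

print(f"bots: {bot_share_views:.0%} of pageviews, "
      f"{bot_share_origin:.0%} of origin (expensive) cost")
# → bots: 35% of pageviews, 65% of origin (expensive) cost
```

Even modest differences in cache-miss rates are enough to make a minority of pageviews dominate origin cost, which matches the pattern Wikimedia describes.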
In light of this situation, the Wikimedia Foundation's site reliability team must spend time and resources blocking crawlers to prevent service disruptions for regular users, on top of the associated cloud costs. This phenomenon is part of a growing trend that threatens the open internet. Recently, Drew DeVault, a software engineer and open source advocate, voiced concern that AI crawlers ignore the "robots.txt" files meant to keep automated traffic out. Gergely Orosz likewise described how AI scrapers from companies such as Meta have driven up bandwidth demand on his own projects.
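For context, robots.txt is a plain-text file placed at a site's root that asks crawlers to stay away from some or all pages; compliance is entirely voluntary, which is exactly the complaint above. A minimal example of the kind of rules many sites now add might look like this (GPTBot and CCBot are real AI-related user agents; the paths are illustrative):

```text
# robots.txt — a request, not an enforcement mechanism.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# All other crawlers: allow everything except an example private path.
User-agent: *
Disallow: /private/
```

A crawler that ignores these directives, as DeVault describes, faces no technical barrier at all, which is why operators fall back on active blocking or tools like Cloudflare's AI Labyrinth.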
Some tech companies, meanwhile, are taking steps to mitigate the problem: Cloudflare, for instance, has launched AI Labyrinth, a tool that feeds AI-generated content to crawlers to slow them down. Still, the situation is a cat-and-mouse game that could push many publishers behind login walls and paywalls, to the detriment of all web users.