Sat Jan 11 2025

Meta accused of using illegally downloaded books to train its artificial intelligence.

Court documents reveal that Meta fed its artificial intelligence with illegal torrent downloads.

Meta is facing a new scandal on the heels of its announcement that it would end moderation on its platforms. An investigation reveals that the company, led by Mark Zuckerberg, has been using books obtained illegally via torrents. Recently disclosed court documents indicate that Meta turned to the platform Library Genesis to train its artificial intelligence models, as reported by various media outlets.

The issue arose in a lawsuit filed by a group of authors, including writer Ta-Nehisi Coates and comedian Sarah Silverman. The documents were disclosed in full after a judge rejected Meta's attempts to keep parts of them sealed. According to reports, the judge considered that the company was trying to hide information to avoid bad publicity, citing a troubling remark from an unidentified Meta employee: the company feared that media coverage of its use of a pirated dataset such as LibGen could weaken its position with regulators.

Library Genesis, known as LibGen, is a site that provides free access to a vast collection of books, academic articles, and journals. Meta, along with other leaders in the field of artificial intelligence, claims that the use of data extracted from the web can be considered "fair use," but this claim will face rigorous scrutiny in court. An engineer at the company expressed discomfort with the situation, noting that "downloading torrents from a corporate laptop doesn't feel right."

Employee concerns even reached the company's upper management. A memorandum suggests that, following increased internal discussion about the use of LibGen, the AI team received approval to use that material. The plaintiffs argue that Meta has treated the "public availability" of these datasets as a justification for their use, despite internal records revealing that Zuckerberg and other senior executives were aware of LibGen's pirated nature.