Elsevier, Hachette, and Macmillan Sue Meta Over Llama Training Data

Elsevier, Hachette, Macmillan, and author Scott Turow filed a class action lawsuit on May 5 against Meta Platforms and CEO Mark Zuckerberg in the Southern District of New York, alleging that Meta reproduced copyrighted books and scientific papers without authorization to train its Llama large language models ^[1]. The complaint identifies two principal sources of the allegedly infringing material: paywalled academic journals scraped from sources including Sci-Hub and LibGen, and commercially published books ^[1]. Plaintiffs characterize the action as the first major AI copyright lawsuit brought by academic and trade publishers asserting their own direct infringement claims, distinct from prior suits filed by individual authors or news organizations ^[1].

The legal theory centers on the Copyright Act's exclusive reproduction right. Publishers contend that ingesting entire texts into a training corpus, without license or payment, constitutes actionable copying regardless of whether the model's outputs reproduce protected expression verbatim ^[1]. Elsevier, a division of RELX Group, publishes thousands of peer-reviewed journals and enforces strict paywall access; Hachette and Macmillan are two of the largest trade book houses in the United States. The Association of American Publishers, which has lobbied aggressively on AI training policy, counts all three among its members ^[1]. Meta has defended its training practices in parallel litigation by invoking fair use, arguing that model training is transformative and non-expressive.

The case lands in the Southern District of New York, which is already managing several overlapping AI copyright dockets, including suits brought by the Authors Guild and by news publishers. A class certification motion will be a threshold battleground: the proposed class would encompass a broad set of academic and trade rightsholders, and defendants will likely challenge typicality and commonality given the heterogeneous licensing arrangements across that universe. Meta has not yet filed a responsive pleading, and no scheduling order has been publicly reported as of the filing date ^[1].

The outcome carries structural consequences for the AI industry. If courts reject a fair-use defense at the motion-to-dismiss stage, operators of training pipelines across the industry, not just Meta, would face pressure to negotiate licensing deals or suspend ingestion of protected works. The publishing plaintiffs appear to be positioning for that leverage: filing as a class lets them aggregate damages and signal industry-wide coordination rather than piecemeal negotiation ^[1]. Legislative proposals on AI training disclosure are pending in Congress, and this filing may accelerate that debate.

References

[1] Nature. (2026, May 11). Elsevier vs Meta: first science publisher sues over scraped research papers. https://www.nature.com/articles/d41586-026-01481-0
