Apple is facing a proposed class action lawsuit in the U.S. District Court for the Northern District of California, filed by authors Grady Hendrix and Jennifer Roberson. The suit accuses the company of illegally using copyrighted books to train its artificial intelligence models, including those used for Apple Intelligence.
At the center of the case is a dataset called Books3, which is widely known to contain pirated works from “shadow libraries” such as Bibliotik. The authors allege that Apple relied on Books3 through its connection to RedPajama, a dataset used to train OpenELM, the company’s open-source large language model. Because Apple publicly documented OpenELM in 2024, the plaintiffs argue that the same pirated material may also have been used to train Apple’s Foundation Language Models.
The lawsuit alleges that Apple did not seek permission, provide credit, or pay compensation for the authors’ works, despite using them to train systems that underpin a potentially lucrative venture. The plaintiffs claim that this practice diluted the market for their books and deprived them of control over their intellectual property. They are seeking damages, restitution, attorneys’ fees, and the destruction of any Apple AI models trained on the pirated data.
This case follows a wave of lawsuits targeting companies developing generative AI. Anthropic recently agreed to pay $1.5 billion in a record settlement with authors over similar claims. Meanwhile, Meta has successfully defended itself against accusations of copyright infringement, with a court ruling that its training practices fell under fair use. Microsoft and OpenAI have also faced lawsuits alleging misuse of copyrighted books.
For Apple, the timing is notable. The company has publicly emphasized its efforts to train AI systems responsibly, including offering licensing deals to publishers and signing agreements with content providers like Shutterstock. Apple has also pledged to respect website restrictions such as robots.txt when scraping data. Despite these measures, the lawsuit contends that the company still benefited from datasets tied to pirated material.
The case will test how courts balance intellectual property rights against the needs of AI development. As the legal battle unfolds, the outcome could reshape the future of Apple Intelligence and set a precedent for how technology companies approach data sourcing for training their models.
(via Reuters)
