Adobe is in hot water after a proposed class-action lawsuit accused the tech giant of using pirated books, including works by author Elizabeth Lyon, to train its SlimLM AI model. This legal action highlights growing concerns over intellectual property rights in the rapidly advancing world of artificial intelligence.
The Allegations Behind the Adobe AI Lawsuit
The lawsuit, filed by Elizabeth Lyon, an author of non-fiction writing guidebooks, claims that Adobe used pirated versions of several books to train its SlimLM model. The complaint specifically targets SlimPajama, an open-source dataset Adobe allegedly used, asserting that it contains works copied from a dataset known as Books3. Books3, which various tech companies have used to train their AI systems, includes a substantial number of copyrighted works.
Lyon claims that her books were among the works swept into this dataset and used for AI training. SlimPajama is said to have been created by copying and filtering the RedPajama dataset, which has already been linked to similar lawsuits against other tech giants, including Apple and Salesforce, over the alleged use of pirated materials in AI training.
The Growing Legal Challenges for AI Training
This lawsuit is part of a larger trend in which tech companies, including Adobe, face legal challenges over the datasets used to train AI systems. These disputes stem from the fact that AI models require vast amounts of data, and that data sometimes includes pirated or copyrighted material used without the authors' permission.
Previously, Anthropic, the maker of the Claude chatbot, agreed to a $1.5 billion settlement with authors over similar allegations that it used pirated books. That case marked a significant turning point in the ongoing debate over AI training data and intellectual property rights.
What’s Next for Adobe?
The outcome of this case could have wide-reaching implications for AI development and its legal landscape. As Adobe fights these allegations, the tech industry as a whole will be watching closely. If the lawsuit succeeds, it could set a precedent for how AI companies handle copyrighted content in their training datasets, potentially forcing them to adopt more transparent and legally compliant practices.
