16 October 2025 (Washington, DC) - Last month, AI company Anthropic agreed to a blockbuster $1.5 billion settlement after being caught "red-handed" training its models on an enormous cache of pirated versions of copyrighted books and other material.
Note to readers: well, that was the mainstream tech media's call. Anthropic settled a copyright case that it had largely won in district court. Future litigants are likely to hold out for much more. A uniquely punitive provision of copyright law allows plaintiffs who may not have suffered any damage to seek awards in the trillions. Indeed, legal observers estimated that Anthropic dodged a $1+ trillion liability by settling, and their lawyers played it perfectly.
The big issue is that the avalanche of litigation, already 40 lawsuits and counting, doesn’t just put the artificial intelligence industry at risk of spending its investors’ money on settlements instead of advances in AI. It also raises the prospect that the full bill won’t be known for a decade, as different juries and different courts reach varying conclusions.
And now, a similar lawsuit aimed at ChatGPT maker OpenAI has taken a dramatic turn, raising the possibility of yet another major legal escalation regarding AI-facilitated copyright infringement — and a potentially much bigger payout to rights-holders.
Specifically, authors and publishers who filed a lawsuit against the Sam Altman-led firm have secured access to internal Slack messages and emails discussing the mass deletion of a pirated books dataset. Last week, a New York district court ordered OpenAI to hand over the communications regarding the data deletion.
As legal analysts have noted, the communications could demonstrate willful infringement, potentially leading to enhanced statutory damages of up to $150,000 per work, a massive increase from the statutory minimum of just $750 per work.
For context: Anthropic’s settlement only covered around half a million works out of an estimated seven million, resulting in a payout of around $3,000 per work. Authors and publishers in that case had just gone after similar communications, right before the case settled.
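To make the scale of those figures concrete, here is a rough back-of-the-envelope sketch. It uses only the approximate numbers quoted above ($1.5 billion, roughly 500,000 covered works, an estimated seven million works, and the $750 and $150,000 per-work figures). The function names are hypothetical, and the assumption that every work would receive the same award is a simplification for illustration, not a prediction of what any court would do.

```python
# Illustrative arithmetic only: every input is an approximate figure
# quoted in the article, not a value taken from a court filing.
SETTLEMENT_USD = 1_500_000_000      # reported Anthropic settlement
WORKS_COVERED = 500_000             # "around half a million works"
WORKS_ESTIMATED = 7_000_000         # "an estimated seven million"
STATUTORY_MIN_USD = 750             # per-work floor cited in the article
WILLFUL_MAX_USD = 150_000           # per-work ceiling for willful infringement


def per_work_payout(settlement_usd: int, works: int) -> float:
    """Average payout per covered work."""
    return settlement_usd / works


def uniform_exposure(works: int, award_per_work_usd: int) -> int:
    """Total exposure if every work received the same award (a simplification)."""
    return works * award_per_work_usd


if __name__ == "__main__":
    print(f"Payout per covered work: ${per_work_payout(SETTLEMENT_USD, WORKS_COVERED):,.0f}")
    print(f"Floor, all estimated works: ${uniform_exposure(WORKS_ESTIMATED, STATUTORY_MIN_USD):,}")
    print(f"Ceiling, all estimated works: ${uniform_exposure(WORKS_ESTIMATED, WILLFUL_MAX_USD):,}")
```

Under these simplified assumptions the ceiling works out to a little over $1 trillion, which is one way to arrive at the "$1+ trillion" exposure that observers cited.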
Finding out what attorneys said to clients, and what clients said back to attorneys, probably gives us a lot of evidence regarding state of mind.
The lawsuit highlights the AI industry’s largely careless treatment of copyrighted materials. Tech leaders have continued to argue that training AI models on protected content falls under “fair use,” a legal doctrine that allows for transformative use of copyrighted materials.
Most recently, OpenAI’s TikTok-like text-to-video app Sora 2 has been found to spit out a litany of videos heavily based on protected intellectual property, showing it’s not just ChatGPT potentially infringing copyright.
Note: Sloppily implemented guardrails have since stemmed the tide somewhat but, as we have reported, users easily find ways around them.
Given the major payouts at stake, the tide could be turning in favor of artists who had their life's work sucked up by AI models without permission, and who could now be looking at another consolation prize in the form of a settlement.
Communications could show that OpenAI deleted the dataset, a move that plaintiffs argue could be construed as intentional destruction of evidence. Judge Ona Wang has already found that OpenAI improperly withheld materials.
While it remains to be seen how the internal communications will influence the case, experts argue that they could allow plaintiffs to build a stronger case. Said one eDiscovery attorney we consulted:
“There may be a smoking gun, we don’t know. That is what we aim for in these cases. But the authors’ lawyers are going to get as much information as possible now to get as much money for plaintiffs as possible.”
As we reported over the summer, OpenAI’s lawyers already made a major misstep by at first claiming that the company had deleted the LibGen dataset “due to non-use,” only to reverse course and claim they had “misspoken.” If OpenAI were found to have intentionally destroyed evidence by deleting copyrighted materials, it could be in deep trouble. Juries hear that kind of stuff and it becomes a very powerful stick. It doesn’t necessarily guarantee the outcome, but it’s a heavy thumb on the scales.
And a key point: OpenAI relied on the standard eDiscovery rule of attorney-client privilege. Emails and documents protected. Shielded from discovery. But in an extraordinary move, the plaintiffs asked the judge for access to the communications between OpenAI and its attorneys by invoking the “crime-fraud” exception to privilege. The judge appears to agree.
And that's the thing. Having secured access to some Slack messages and emails discussing OpenAI’s deletion of a dataset containing pirated books, the plaintiffs now hope to reveal additional attorney communications about the decision. If they succeed, the communications could demonstrate willful infringement, triggering huge damages awards.
And this was not the first time that a Big Tech company’s internal communications with lawyers surrounding its use of copyrighted material showed up as evidence in a lawsuit. According to email messages revealed in discovery during a copyright case in 2024 that spilled over into 2025, Meta researchers expressed reservations about using LibGen, describing it as a “data set we know to be pirated”. Per the filings, the issue was escalated to “MZ,” who approved the pirated library’s use. This forced Zuckerberg to hand over emails and Slack chats.
If you are in the eDiscovery ecosystem, peruse the OpenAI Copyright Infringement court docket in this case. It is a Master Class. This case represents a front in the broader fight over discovery/eDiscovery in the litigation over novel questions about applying copyright law to AI. It carries massive ramifications for both the burgeoning technology and an array of content industries, plus eDiscovery rules.
And it is probably too soon to call out trends, but two thoughts jump to mind after following all of these copyright cases and content cases over the last three or more years:
1. AI outputs are now legal records in the U.S. AI-generated content, whether it’s a chatbot response, a code snippet, or a marketing draft, is now squarely within the scope of litigation discovery in the U.S. Given that AI-generated content is, by its very nature, public content, allowing discovery of input prompts and generated outputs where relevant to litigation is entirely understandable. Companies need to treat these outputs as potential evidence, with the same care they apply to emails, contracts, and internal memos.
If your AI tools generate content that could be relevant to a dispute, whether IP-related, employment-related, or regulatory, your company is likely going to need to be able to preserve, retrieve, and produce that data on demand.
2. Privacy compliance and "normal" eDiscovery rules are no longer a shield. OpenAI’s objection to the court order centered on privacy concerns and the normal eDiscovery rules about client/attorney privilege. The company argued under the Federal Rules of Civil Procedure, the EU General Data Protection Regulation and even the California Privacy Rights Act.
But the court prioritized key litigation discovery over privacy preferences and standard eDiscovery rules. Plus, as the judge noted, sticking a lawyer on a "cc" doesn't do it. Too many of the Slack messages were plainly devoid of any request for legal advice, and counsel was not asked to weigh in. Employees were discussing model commercialization and associated risks, not seeking legal advice.
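For readers wondering what "preserve, retrieve, and produce" from the first point might look like in practice, here is a minimal sketch. Everything in it is hypothetical (the `AIOutputRecord` and `PreservationStore` names, the fields, the in-memory list); a real deployment would sit on top of whatever archiving or eDiscovery platform a company already uses. It only illustrates three ideas: hash each prompt/output pair so tampering is detectable, refuse to purge anything belonging to a custodian under legal hold, and support a simple keyword search for production.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AIOutputRecord:
    """One prompt/output pair plus a hash for integrity checks (hypothetical schema)."""
    tool: str
    custodian: str
    prompt: str
    output: str
    created_at: str
    sha256: str


def _digest(tool, custodian, prompt, output, created_at):
    # Hash a canonical JSON encoding of the fields.
    payload = json.dumps([tool, custodian, prompt, output, created_at])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_record(tool, custodian, prompt, output, created_at=None):
    created_at = created_at or datetime.now(timezone.utc).isoformat()
    return AIOutputRecord(tool, custodian, prompt, output, created_at,
                          _digest(tool, custodian, prompt, output, created_at))


def is_intact(record):
    """True if the stored hash still matches the record's contents."""
    return record.sha256 == _digest(record.tool, record.custodian,
                                    record.prompt, record.output, record.created_at)


class PreservationStore:
    """In-memory stand-in for an archive; assumed for illustration only."""

    def __init__(self):
        self._records = []
        self._holds = set()  # custodians currently under legal hold

    def preserve(self, record):
        self._records.append(record)

    def place_hold(self, custodian):
        self._holds.add(custodian)

    def purge(self, custodian):
        """Delete a custodian's records unless a legal hold applies; return count deleted."""
        if custodian in self._holds:
            return 0
        before = len(self._records)
        self._records = [r for r in self._records if r.custodian != custodian]
        return before - len(self._records)

    def produce(self, keyword):
        """Return records whose prompt or output mentions the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [r for r in self._records
                if needle in r.prompt.lower() or needle in r.output.lower()]


if __name__ == "__main__":
    store = PreservationStore()
    store.preserve(make_record("chatbot", "alice", "Draft a launch memo", "Here is a draft memo."))
    store.place_hold("alice")
    print(store.purge("alice"))          # 0: the hold blocks deletion
    print(len(store.produce("memo")))    # 1
```

The design choice worth noting is that deletion is checked against the hold list at the moment of purge, which is the step where a routine "non-use" cleanup can otherwise turn into a spoliation argument.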