Precedent analysis: the chardet AI license change dispute

The closest existing precedent: Thomson Reuters v. Ross Intelligence

In February 2025, Judge Bibas in the District of Delaware granted partial summary judgment to Thomson Reuters, marking the first time a U.S. court reached a conclusion on fair use in the context of AI training data (Jenner & Block). The key finding was that Ross's use of Westlaw headnotes as training data for a competing legal research tool was commercial and non-transformative, so fair use was rejected (Reed Smith). However, the court was careful to note that this is not a generative AI case, and opinions on its applicability to generative AI have been split (Perkins Coie). The case is now on interlocutory appeal to the Third Circuit, which will be the first court of appeals to weigh in on fair use in AI (Wolterskluwer).

The "substantial similarity" framework

The existing copyright framework requires two things to prove infringement: (1) access to the original work, and (2) "substantial similarity" in the output. Copyright owners may be able to establish that AI outputs infringe their copyrights if the AI program both had access to their works and created substantially similar outputs (Congress.gov). This is the core of the chardet dispute: Blanchard's JPlag analysis showing under 1.3% similarity essentially argues that the second prong fails, regardless of whether Claude had access to the original during training.
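To make the second prong concrete: JPlag itself is a Java tool that tokenizes programs and compares them with greedy string tiling, so the sketch below is only a rough stand-in, using Python's difflib over a crude token stream to show what a percentage-similarity score like "under 1.3%" is measuring. The `tokenize` helper and the sample snippets are illustrative inventions, not JPlag's actual pipeline or chardet's code.

```python
# Rough illustration of a structural similarity score between two source files.
# JPlag's real algorithm is greedy string tiling over language-aware tokens;
# difflib.SequenceMatcher is used here purely as a simplified stand-in.
import difflib
import re

def tokenize(source: str) -> list[str]:
    # Crude lexer (hypothetical): identifiers, numbers, punctuation; comments dropped.
    source = re.sub(r"#.*", "", source)
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)

def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score over the two token sequences."""
    matcher = difflib.SequenceMatcher(None, tokenize(a), tokenize(b))
    return 100.0 * matcher.ratio()

# Two hypothetical implementations of the same task, written independently.
original = "def detect(buf):\n    state = START\n    for byte in buf:\n        state = table[state][byte]\n"
rewrite = "def guess_encoding(data):\n    scores = {}\n    for prober in PROBERS:\n        scores[prober.name] = prober.feed(data)\n"

print(f"similarity: {similarity(original, rewrite):.1f}%")
```

A low score on a comparison like this supports the argument that the new code is structurally independent; the caveat is that token-level similarity says nothing about whether the model had access to the original, which is the first prong.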

The clean room problem with LLMs

This is where it gets really interesting. The ShiftMag analysis nails the core problem: the entire concept of a clean-room implementation assumes that knowledge contamination is binary (either you've seen the code or you haven't), and LLMs break this model completely because the model has "seen" the code during training (ShiftMag).

But there's a counterargument from Heather Meeker, a well-known open source licensing attorney who wrote about this exact topic. She argues that as long as an AI model has been trained on enough inputs, it should not parrot any one input, and that AI greatly reduces the human effort necessary for clean-room projects, which are traditionally very human-intensive (Copyleft Currents).

The emerging judicial consensus (and disagreements)

A judicial consensus is developing that training a general-purpose AI model is highly transformative, a factor favoring fair use. But other issues remain the subject of sharp disagreements between courts, and 2026 is unlikely to bring final answers (Lexology). Whether training generative AI models on copyrighted material constitutes fair use is a moving target: the Thomson Reuters decision is on appeal, Kadrey v. Meta is still in discovery, and Bartz v. Anthropic settled (IPWatchdog).

The Copyright Office's position

In May 2025, the U.S. Copyright Office concluded that using copyrighted materials for AI model development may constitute prima facie infringement, warned that models could infringe if outputs closely resemble training data, and emphasized that "transformative" arguments are not inherently valid (Built In).

The bottom line for chardet specifically

The chardet case sits in a genuine legal gray zone. Simon Willison sees it as a microcosm of the larger question around using coding agents to produce fresh implementations of existing, mature code, and expects it will soon show up in commercial scenarios where well-funded litigation follows (Simon Willison).

The strongest argument for Blanchard is that the output is measurably, structurally independent code — the 1.3% similarity metric is pretty compelling evidence against "substantial similarity." The strongest argument against is the one the ShiftMag piece raised: take leaked Windows source code, run it through an LLM, and release the output as open source — if that's not acceptable, why is chardet different? The mechanism is identical; the only variable is sympathy for the copyright holder.

What's particularly interesting from your vantage point as someone who's been thinking about agentic AI engineering is that this isn't just an academic question — it has immediate practical implications for anyone building with AI tools on top of existing codebases. The Google v. Oracle precedent on API copyrightability (where the Supreme Court found reimplementing Java APIs was fair use) gives some comfort on the interface side, but the implementation side with AI in the loop is genuinely uncharted territory. We probably won't get a definitive answer until one of these cases works its way through the appellate courts over the next year or two.