Anthropic Leaked Its Own Source Code and May Not Own It

On March 31, Anthropic shipped version 2.1.88 of Claude Code to npm with a 60MB source map file that was supposed to stay internal. That file pointed to a zip archive on Anthropic's Cloudflare R2 bucket containing the entire TypeScript source. 1,900 files. 512,000 lines of code. The full architectural blueprint of one of the most commercially successful AI coding tools ever built.
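The leak vector here is mundane. A bundled JavaScript file typically ends with a `//# sourceMappingURL=` comment pointing at its map, and anyone who unpacks the published tarball can read that comment. A minimal sketch of how a researcher might spot one, using a tiny fabricated tarball in place of the real npm package (the file names are hypothetical):

```python
import io
import re
import tarfile

# Bundlers append this comment to point a built .js file at its source map.
SOURCEMAP_RE = re.compile(rb"//# sourceMappingURL=(\S+)")

def find_source_maps(tarball_path):
    """Return {member_name: referenced_map} for every .js file in an
    npm-style .tgz tarball that carries a sourceMappingURL comment."""
    hits = {}
    with tarfile.open(tarball_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".js")):
                continue
            data = tar.extractfile(member).read()
            match = SOURCEMAP_RE.search(data)
            if match:
                hits[member.name] = match.group(1).decode()
    return hits

def make_fake_package(path):
    """Build a toy tarball mimicking npm's layout (files under package/)."""
    bundle = b"console.log('hi');\n//# sourceMappingURL=cli.js.map\n"
    with tarfile.open(path, "w:gz") as tar:
        info = tarfile.TarInfo("package/cli.js")
        info.size = len(bundle)
        tar.addfile(info, io.BytesIO(bundle))

make_fake_package("demo.tgz")
print(find_source_maps("demo.tgz"))  # {'package/cli.js': 'cli.js.map'}
```

The point of the sketch: a source map reference is plain text sitting in a public artifact, which is why a packaging mistake like this surfaces within hours rather than months.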

Security researcher Chaofan Shou spotted it within hours. By the time Anthropic pulled the package, the codebase had been mirrored, forked over 41,500 times on GitHub, and archived on decentralized platforms that don't respond to takedown notices.

What followed was a 12-hour chain reaction that may have permanently changed the legal landscape for AI-generated code.

Timeline of the March 31 Claude Code leak

Anthropic's response was a DMCA blitz. GitHub disabled over 8,100 repositories. The original mirror and its entire fork network went dark. Lawyers moved fast.

But a Korean developer named Sigrid Jin moved faster. Jin, previously profiled by the Wall Street Journal as the heaviest Claude Code user in the world (25 billion tokens consumed), woke up at 4am and rewrote the entire core architecture in Python before sunrise. The repo hit 30,000 stars faster than any repository in GitHub history. He then rewrote it again in Rust.

Anthropic can't touch those rewrites. They're clean-room reimplementations. New code, new language, new creative expression. Copyright protects specific expression, not ideas, not architectures, not design patterns. Anyone can study how a system works and build their own version from scratch. That's how the entire software industry has always operated.

But the legal problems go deeper than clean-room rewrites. The DMCA takedowns themselves rest on a copyright claim that Anthropic's own public statements may have already destroyed.

The authorship problem

On March 18, 2025, the DC Circuit issued a unanimous opinion in Thaler v. Perlmutter holding that the Copyright Act requires human authorship. The case involved a computer scientist who tried to register copyright for artwork generated by his AI system. The court didn't just deny the registration on narrow grounds. It reasoned that the word "author," as used throughout the Copyright Act, presupposes a human being and becomes incoherent when applied to a non-human entity. Applying it to a machine would produce absurdities, like referring to a "machine's children" or a "machine's nationality." The Supreme Court declined to hear the appeal.

Some important caveats. That case was about visual art, not code. It dealt with a specific scenario where someone listed an AI as the sole author. It's one circuit, not a Supreme Court ruling. The court deliberately left the door open for "AI-assisted" works where a human contributes meaningful creative input.

The reasoning matters more than the narrow holding. The court established that "author" throughout the Copyright Act is structurally tied to human beings. That's not a ruling about art. That's a philosophical foundation that any future court addressing AI-generated code will find persuasive, even if it's not technically binding.

And then Anthropic's leadership walked up to that open door and welded it shut.

In January 2026, Boris Cherny, the head of Claude Code, posted on X that 100% of his code is written by Claude. No manual edits. Not even small ones. He shipped 22 pull requests in one day and 27 the next, each one "100% written by Claude." Across Anthropic, he said the figure is "pretty much 100%."

An Anthropic spokesperson softened that to 70-90% company-wide. For Claude Code specifically, about 90% of its code is written by Claude Code itself.

These aren't offhand comments. They're timestamped, attributed, reported by Fortune, The Register, CNBC, and others. They're discoverable evidence. And they directly undermine any claim of human authorship over the leaked codebase.

The court in Thaler left Anthropic a door. Their marketing team closed it.

You can't copyright an idea

There's a common response to this argument: "But the humans designed the system. They architected it. The AI just wrote the implementation."

This gets the law exactly backwards. Copyright protects specific expression, not ideas, not designs, not architectures. You can design a system all day long. That design isn't copyrightable. Patents can protect novel functional inventions, but that's a completely different legal regime with a completely different process and standard.

The part that copyright actually covers (the specific code) is the part Anthropic says the AI wrote. And the part the humans contributed (the design and architecture) is the part copyright doesn't protect.

So when someone rewrites the architecture in a different language, there's nothing to claim. The ideas are free. The original expression is AI-generated. And the new expression belongs to whoever wrote it.

The transitive authorship problem

Here's where it gets speculative, and where the implications extend far beyond one leaked npm package.

If 90% of Claude Code's source was written by Claude, then the training pipeline code that produces the next generation of Claude models was also substantially written by Claude. The model weights that come out of that pipeline are the output of an AI-authored system.

Can you copyright the product of a system that was built by AI?

The authorship chain across model generations

The existing precedent only addresses direct AI outputs. Nobody has litigated whether a work produced by an AI-coded system inherits the copyright problem of the system that created it. But the logic is hard to avoid. If human authorship is the prerequisite for copyright, and the authorship chain passes through a substantially non-human link, the claim gets weaker at every generation.

Nobody knows where the line is. No court has addressed it. But every frontier AI company should be thinking about it, because the answer affects whether their core asset is protectable at all.

The moat that isn't

Trade secret is the last legal defense in this analysis. Unlike copyright, it doesn't require human authorship and doesn't care who or what created the information. It only requires that the holder took "reasonable measures" to keep it secret.

Anthropic is not having a great month on that front either.

Days before the Claude Code leak, Fortune reported that descriptions of Anthropic's upcoming model (internally called "Mythos" or "Capybara") were sitting in a publicly accessible data cache along with close to 3,000 other files. Then the Claude Code source went out on npm. Two major exposures in one week.

If this ever went to court, opposing counsel would argue that Anthropic's operational security doesn't meet the "reasonable measures" threshold for trade secret protection. A single incident might be forgivable. A pattern is harder to defend.

Legal protections breakdown for Anthropic's leaked code

The protections collapse one by one. Copyright requires human authorship, and Anthropic publicly says AI writes the code. Trade secret requires maintained confidentiality, and Anthropic keeps accidentally publishing things. Patent requires specific novel invention claims and a formal process, not something you can retroactively blanket over a leaked codebase. DMCA takedowns require a valid underlying copyright, and they only work on centralized platforms anyway.

What's left is practical barriers: the cost of compute, the difficulty of assembling training data, the head start of an established product, brand trust, enterprise relationships. Those are real advantages. But they're business moats, not legal ones. They can't be enforced in court. They erode as compute gets cheaper, as open-source models close the gap, and as competitors absorb architectural insights from leaks exactly like this one.

The product demo

There's an irony at the center of this whole story that's hard to overstate.

Anthropic built Claude Code. They told the world it was so good that their own engineers stopped writing code entirely. Then a packaging error exposed the source. The world's heaviest Claude Code user used what was almost certainly Claude to rewrite Claude Code in Python overnight. The result is legally untouchable, it's the fastest-starred repo in GitHub history, and it demonstrates exactly the capability Anthropic has been selling.

Anthropic's own product, used by Anthropic's own power user, to neutralize Anthropic's own IP. Made possible because the product is exactly as good as they said it was.

That's not a leak. That's a product demo.

What this means

The Claude Code leak is entertaining. The legal questions it surfaces are not. Every frontier AI company that uses its own models to write production code is building on the same unstable ground. The more they market AI autonomy to sell products, the more they undermine the legal frameworks that protect those products. Every press quote about AI writing 100% of the code is a future exhibit in a case they hope never gets filed.

The law hasn't caught up. Congress hasn't acted. The courts have addressed one narrow question in one circuit. But the trajectory is clear, and every company in this space is exposed to it.

The question isn't whether AI-generated code is copyrightable. The court already answered that. The question is whether anyone in the industry is willing to admit what that answer means for them.


I'm not a lawyer. This article is speculative analysis based on public reporting, public court opinions, and public statements by Anthropic leadership. If you're making business decisions about AI-generated IP, talk to an actual attorney.
