The Fable You Saw, the Mythos You Didn't
On June 9, 2026, Anthropic released Fable 5 — its most powerful public language model to date — and simultaneously handed a restricted version, Mythos 5, to its vetted cybersecurity partners.
Most people immediately rushed to try it out, but a number of experts sat down to carefully read the system card — the 319-page safety document bundled with the model. They found a mechanism users had no idea existed. If the system decided someone was using the model to build a competing AI, it would quietly degrade the quality of its responses, and it defined "potential competitor" very broadly.
Crucially, the document stated outright that the restriction was "not visible to the user." For anyone who writes for a living, that means something simple and uncomfortable: an answer you accepted as the best available may have been deliberately weakened in advance. You didn't get the best the model could do, nobody told you, and you had no way to notice.
Anthropic moved quickly once experts took their findings to the media. A spokesperson acknowledged the misjudgment — "We made the wrong tradeoff and we apologize for not getting the balance right" — and the company promised to make such restrictions visible going forward, openly routing flagged requests to a fallback model. While user notification is a genuine improvement, the system still decides to pull the stronger version and doesn’t provide any appeal mechanism.
The Fable Behind Fable 5
Within the first 24 hours of launch, a well-known jailbreaker posted what they claimed was a leaked system prompt for Fable 5. It turned out to contain roughly 1,500 lines (approximately 120,000 characters or about 27,000 tokens) describing a complex, multi-step jailbreak. To get around the model's defenses, the author combined several techniques at once: substituting characters with visually identical lookalikes, burying commands inside long, innocuous-looking text, and splitting dangerous requests into fragments.
According to the rumors, the secret behind Fable 5's extraordinary power lay precisely in its invisible base instructions. The story goes that, when previous Anthropic model Opus 4.8 received the leaked prompt, it began replicating the new model's behavior to roughly 90 percent fidelity. Allegedly, a simpler model, Sonnet 4.6, recently got noticeably smarter for exactly the same reason. Developers apparently just added a Fable 5 wrapper, porting over a portion of those same secret instructions.
There was another rumor circulating as well: that a major corporation had quietly tipped off authorities about undisclosed vulnerabilities in the model. That report was soon confirmed,several leading outlets identified Amazon directly, saying thatCEO Andy Jassy personally called the U.S. Treasury Secretary, after which Anthropic was given just 1.5 hours to fix the problem.
To be clear, every detail of these back-channel talks remains unconfirmed. Anthropic itself directly disputed the reported jailbreak, noting that the model never generated some of the responses shown in the leak. We're recounting these stories because they are an inescapable part of how this industry operates and narrates itself, but we are keeping them separate from what is officially documented and verified.

The State Enters the Hidden Layer
The verified part of the story ended very abruptly. On June 12, just three days after launch, Anthropic received a government export-control directive and took Fable 5 and Mythos 5 offline worldwide. The absurdity of the situation was stark: U.S. authorities had effectively barred foreign nationals from accessing the model — a ban extending to non-American employees of Anthropic itself. With no way to instantly screen users by passport, the company was forced to pull the plug for everyone. The decision sent shockwaves even through allied nations at the G7 summit.
The trigger for the shutdown, according to the company, was a feature that "essentially consists of asking the model to read a specific codebase and fix any software flaws." It's a completely standard procedure for developers, but one the government deemed dangerous. At that moment, the lever of control moved still further than before from the user.
This kind of shutdown is only the opening round of a much larger fight over who controls frontier AI models — one that already has governments, leading labs, and a fast-closing China all in the ring. The battle will be long, and we'll be watching it closely and reporting back on what matters.

The Prompt Inside the Source
The system prompt story has direct implications for anyone who writes for a living. On top of the fact that developers can silently degrade your model at the hidden layer, and governments can pull it offline entirely, there is a third threat. A complete outsider can hijack the same mechanism Anthropic's engineers used to quietly constrain the AI. Feed the model a document containing the right instructions and it will follow those instead of yours, or subtly distort your own request without you noticing. That technique is called prompt injection. Anyone who uses AI to process text from untrusted sources is exposed to it. Simon Willison, who coined the term, frames the problem this way:
LLMs are unable to reliably distinguish the importance of instructions based on where they came from.
In future pieces, we'll dig into this topic and offer practical guidance to help writers protect their drafts and vet outside sources. We'll also look at hidden prompts that researchers have embedded directly inside academic papers. Subscribe to our weekly newsletter so you don't miss those articles when they go live.
Sources
References cited in this piece. Last verified on the published or revision date.
- 01
- 02
- 03
- 04
- 05
- 06
- 07
- 08
- 09