Fable 5 taken down by the US government
Recently, the AI company Anthropic, which you might know from its everyday AI assistant Claude or, if you are a software engineer, from Claude Code, showcased an AI model named Mythos Preview. The model apparently spotted flaws and bugs in major open source software and easily escaped its rigorous sandbox. To address the growing cyber capability of AI, Anthropic launched Project Glasswing, in which large companies such as Google use these models to strengthen their security, giving a glimpse of AI's potential for cybersecurity work.
Soon after, Anthropic launched Mythos 5 and Fable 5, both descendants of Mythos Preview. The two models scored extremely well on benchmarks, as shown in the following chart:
Agentic coding, SWE-Bench Pro
Claude Mythos 5 / Fable 5: 80.3%
Claude Mythos Preview: 77.8%
Claude Opus 4.8: 69.2%
GPT 5.5: 58.6%
Gemini 3.1 Pro: 54.2%
Agentic coding, FrontierCode Diamond, xhigh
Claude Mythos 5 / Fable 5: 29.3%
Claude Opus 4.8: 13.4%
GPT 5.5: 5.7%
Knowledge work, GDPval-AA
Claude Mythos 5 / Fable 5: 1932
Claude Opus 4.8: 1890
GPT 5.5: 1769
Gemini 3.1 Pro: 1314
Knowledge work vision, GDP.pdf, no tools
Claude Mythos 5 / Fable 5: 29.8%
Claude Opus 4.8: 22.5%
GPT 5.5: 24.9%
Gemini 3.1 Pro: 16.7%
Spatial reasoning, Blueprint-Bench 2
Claude Mythos 5 / Fable 5: 38.6%
Claude Opus 4.8: 14.5%
GPT 5.5: 36.2%
Gemini 3.1 Pro: 26.5%
Tool use, AutomationBench
Claude Mythos 5 / Fable 5: 17.4%
Claude Opus 4.8: 15.5%
GPT 5.5: 12.9%
Gemini 3.1 Pro: 9.6%
Computer use, OSWorld-Verified
Claude Mythos 5 / Fable 5: 85.0%
Claude Mythos Preview: 85.4%
Claude Opus 4.8: 83.4%
GPT 5.5: 78.7%
Gemini 3.1 Pro: 76.2%
Legal, Legal Agent Benchmark
Claude Mythos 5 / Fable 5: 13.3%
Claude Opus 4.8: 10.4%
GPT 5.5: 2.1%
Gemini 3.1 Pro: 0.0%
Multidisciplinary reasoning, Humanity’s Last Exam, no tools
Claude Mythos 5 / Fable 5: 59.0%*
Claude Mythos Preview: 56.8%
Claude Opus 4.8: 49.8%
GPT 5.5: 41.4%
Gemini 3.1 Pro: 44.4%
Multidisciplinary reasoning, Humanity’s Last Exam, with tools
Claude Mythos 5 / Fable 5: 64.5%*
Claude Mythos Preview: 64.7%
Claude Opus 4.8: 57.9%
GPT 5.5: 52.2%
Gemini 3.1 Pro: 51.4%
Biology, BioMysteryBench, hard
Claude Mythos 5 / Fable 5: 46.1%*
Claude Mythos Preview: 29.6%
Claude Opus 4.8: 40.0%
Biology, BioMysteryBench, human solved
Claude Mythos 5 / Fable 5: 83.9%*
Claude Mythos Preview: 82.6%
Claude Opus 4.8: 80.4%
Agentic coding, Terminal-Bench 2.1
Claude Mythos 5 / Fable 5: 88.0%*
Claude Opus 4.8: 82.7%
GPT 5.5: 83.4%, Codex CLI
Gemini 3.1 Pro: 70.7%, Gemini CLI
Cybersecurity, ExploitBench, Cap%
Claude Mythos 5 / Fable 5: 78.0%*
Claude Mythos Preview: 69.0%
Claude Opus 4.8: 40.0%
GPT 5.5: 34.0%
Health, HealthBench Professional
Claude Mythos 5 / Fable 5: 66.0%*
Claude Mythos Preview: 64.7%
Claude Opus 4.8: 56.9%
GPT 5.5: 51.8%
Methodology note
Scores for Claude Mythos 5 and Claude Fable 5 are within 1–3 percentage points of each other.
The table shows the higher score of the two.
Starred benchmarks show larger differences because of safeguards that block cybersecurity and biology-related questions.
This shows that Mythos 5 and Fable 5 are extremely good at agentic coding, reasoning, long workflows, cybersecurity, and health-related work. In fact, they were substantially better than earlier models and could one-shot beautiful website UIs.
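To put numbers on the comparison, here is a small Python sketch that computes the gap between Mythos 5 / Fable 5 and the best other model on a few of the benchmarks. The scores are copied from the chart above; the dictionary layout, the choice of rows, and the helper name `lead_over_best_other` are assumptions made for illustration only.

```python
# Scores (percent) copied from the chart above. Only a few benchmarks are
# included to keep the sketch short; the layout is an illustrative assumption.
SCORES = {
    "SWE-Bench Pro": {
        "Mythos 5 / Fable 5": 80.3, "Mythos Preview": 77.8,
        "Opus 4.8": 69.2, "GPT 5.5": 58.6, "Gemini 3.1 Pro": 54.2,
    },
    "FrontierCode Diamond": {
        "Mythos 5 / Fable 5": 29.3, "Opus 4.8": 13.4, "GPT 5.5": 5.7,
    },
    "OSWorld-Verified": {
        "Mythos 5 / Fable 5": 85.0, "Mythos Preview": 85.4,
        "Opus 4.8": 83.4, "GPT 5.5": 78.7, "Gemini 3.1 Pro": 76.2,
    },
    "ExploitBench": {
        "Mythos 5 / Fable 5": 78.0, "Mythos Preview": 69.0,
        "Opus 4.8": 40.0, "GPT 5.5": 34.0,
    },
}

NEW_MODEL = "Mythos 5 / Fable 5"


def lead_over_best_other(scores, model=NEW_MODEL):
    """Return (best other model, gap in percentage points) for one benchmark.

    A negative gap means the other model scored higher.
    """
    others = {name: score for name, score in scores.items() if name != model}
    best = max(others, key=others.get)
    return best, round(scores[model] - others[best], 1)


if __name__ == "__main__":
    for bench, scores in SCORES.items():
        rival, gap = lead_over_best_other(scores)
        print(f"{bench}: {gap:+.1f} pts vs {rival}")
```

On these rows the largest gaps are on FrontierCode Diamond and ExploitBench, while on OSWorld-Verified Mythos Preview is marginally ahead, so the size of the improvement depends on the benchmark.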
Fable 5 was then launched to the public with much stronger guardrails. It was available for general use, with cyber and medical requests routed to Opus 4.8.
Soon afterward, the US government ordered Anthropic to take these models down for foreign nationals, citing national security concerns. Anthropic interpreted the order as a response to a narrow jailbreak technique that could expose already known knowledge.
To comply with the order, Anthropic had to shut down the model for everyone, since selectively restricting access by nationality is difficult in practice.
Poll: Do you think the US government's concerns were valid, or was it a strategic move?
Choose one option.