Claude Fable 5: Pricing, Free Window, and Benchmark Breakdown
🚀 Claude Fable 5: What Actually Changed — Performance, Price, and Access
On June 9, 2026, Anthropic unveiled the Mythos-class, a new top tier that sits above its previous flagship Opus line. The first model from that class to ship publicly is Claude Fable 5. The headline facts first: pricing lands at $10 per million input tokens and $50 per million output tokens, less than half the cost of the previous most-expensive research-grade model. Through June 22 it is available to paid subscribers at no extra charge; starting June 23 you'll need usage credits. So the rumor about a time limit turned out to be true, although what expires is the free window, not the model. Let's unpack it piece by piece.
🧬 What "Mythos-class" Means — and How Fable and Mythos Relate
The Mythos-class splits into two branches that share the same brain — an identical underlying model. The common shorthand you'll hear — "Mythos and Fable are the same model, except Fable has more safeguards in place" — lines up exactly with Anthropic's official announcement.
▶ Claude Fable 5 — the general-purpose, top-performing model released to businesses and developers, with consumer-grade safeguards applied on top of the Mythos-class.
▶ Claude Mythos 5 — the same model as Fable 5, but with safeguards lifted in areas such as cybersecurity and biology research. For now it's restricted to a very small set of vetted groups, such as partners in Project Glasswing (which includes U.S. government participants), with select biology researchers to follow.
🛡️ Safeguards That Reroute Rather Than Refuse
Fable 5's most distinctive trait is that it doesn't simply block a risky request — it falls back to the previous flagship, Opus 4.8, to handle it. When a query lands in a high-misuse domain — think autonomous hacking or bioweapon design — a safer, less capable model answers in its place. Anthropic's safety classifiers route queries touching cybersecurity, biology and chemistry, or model distillation to Opus 4.8. This fallback fires in under 5% of sessions on average, but it's tuned conservatively, so a perfectly harmless request will occasionally get rerouted too. In practice that means the failure mode is a quieter, slightly weaker answer rather than a hard refusal — worth knowing if your workflow ever brushes against those domains.
flowchart TD
A([User submits a query]) --> B{"High-misuse domain?<br/>cyber · bio/chem · etc."}
B -->|Yes · under 5%| C["Routed to<br/>Opus 4.8"]
B -->|No · most queries| D["Answered directly<br/>by Fable 5"]
style A fill:#3498db,stroke:#2980b9,color:#ffffff
style B fill:#fef9e7,stroke:#f39c12
style C fill:#fdedec,stroke:#e74c3c,color:#c0392b
style D fill:#eafaf1,stroke:#27ae60,color:#1e8449
🔁 Diagram in brief: queries in high-misuse domains like hacking or bio/chem are answered by Opus 4.8, a fallback that fires in under 5% of sessions; the large majority of ordinary queries are handled directly by Fable 5.
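The reroute-rather-than-refuse behavior can be pictured as a small routing function. This is a toy sketch only: the domain labels, the `route` function, and the plain-string model names are hypothetical, and Anthropic's actual classifiers are not public.

```python
# Toy sketch of "reroute rather than refuse". All names here are hypothetical;
# the real safety classifiers and their labels are not public.
HIGH_MISUSE_DOMAINS = {"cybersecurity", "biology_chemistry", "model_distillation"}


def route(query_domains: set[str]) -> str:
    """Return the model that would answer, given the domains a classifier flagged."""
    if query_domains & HIGH_MISUSE_DOMAINS:
        return "Opus 4.8"  # fallback: the safer, less capable model answers
    return "Fable 5"       # default: answered directly


print(route({"web_development"}))
print(route({"cybersecurity", "networking"}))
```

Note that the function never returns a refusal: a flagged query still gets an answer, just from the fallback model. A conservatively tuned classifier corresponds to a broad `HIGH_MISUSE_DOMAINS` set, which is why harmless requests can be rerouted too.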
💰 Pricing — Less Than Half the Previous Top-Tier Model
Token pricing breaks down as follows. For a frontier tier, many observers have called it aggressive.
| Category | Price (per million tokens) |
|---|---|
| Input | $10 |
| Output | $50 |
At $10/$50, this comes in below half the price of Mythos Preview, the previous most-expensive research-grade model. One detail worth remembering: output costs 5× input — a workflow that emits long answers at volume can run up a bill fast.
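To see how the 5× output multiplier plays out, here is a minimal cost estimator using the two prices from the table. The function name and the sample token counts are made up for illustration; real bills may also depend on factors not covered here.

```python
# Prices from the table above, in USD per million tokens.
INPUT_PRICE_PER_M = 10.0
OUTPUT_PRICE_PER_M = 50.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed Fable 5 rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 20k tokens in, 4k tokens out.
print(f"${estimate_cost(20_000, 4_000):.2f}")  # $0.40
```

In this example the 4,000 output tokens cost exactly as much as the 20,000 input tokens ($0.20 each), which is the 5× ratio in action.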
⏰ The Free Window — June 22 Is the Cutoff
As rumored, there's a time limit — but the model isn't going away. What's limited is the window during which it's bundled into subscription plans for free.
▶ Included free: June 9–22 — usable at no extra cost on Pro, Max, Team, and seat-based Enterprise plans.
▶ Switches to metered: from June 23 it drops off the standard bundle, and after that you'll need separate usage credits. The API and pay-as-you-go Enterprise stay available through ongoing billing.
▶ Anthropic says it plans to fold Fable 5 back into subscription plans as it secures more server capacity.
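The window above reduces to a simple date check. The sketch below assumes both endpoints are inclusive and ignores time zones; the function name and return strings are illustrative, and it covers only the subscription bundle, not the API.

```python
from datetime import date

# Dates from the announcement; treating both endpoints as inclusive is an assumption.
FREE_START = date(2026, 6, 9)
FREE_END = date(2026, 6, 22)


def billing_mode(day: date) -> str:
    """Classify a date relative to the bundled free window on subscription plans."""
    if day < FREE_START:
        return "not yet released"
    if day <= FREE_END:
        return "included in subscription"
    return "usage credits required"


print(billing_mode(date(2026, 6, 22)))  # included in subscription
print(billing_mode(date(2026, 6, 23)))  # usage credits required
```

If Anthropic folds Fable 5 back into subscription plans later, the last branch would no longer hold, so treat this as a snapshot of the launch policy.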
⌨️ Turning It On in Claude Code
No specific version number is spelled out in the official docs. The guidance is simple: update the CLI (command-line interface) or desktop app to the latest version, then enable it with /model claude-fable-5. The model ID is claude-fable-5. Since no exact minimum build number is published, you're safe treating "be on the latest version" as the real requirement.
🤔 Why Cap the Free Window?
Mythos-class reasoning burns enormous compute. In an early launch where demand is hard to forecast, throwing it open to unlimited subscription use would all but guarantee serious latency. The "two-week trial → metered billing" buffer looks like the response. The reroute-instead-of-refuse safeguard reads the same way: a guard against misuse of a model whose capabilities are simply very strong. In short, it reflects a plain reality — the stronger the performance, the larger the operating and safety costs that ride along with it.
🏗️ What It Can Do — From Assistant to Autonomous Agent
Fable 5 leads with the ability to carry out long-horizon tasks with no human in the loop, well beyond simple code generation. The flagship example cited: payments company Stripe reported that Fable 5 autonomously completed, in a single day, a migration across a 50-million-line Ruby codebase that had taken dozens of engineers more than two months by hand.
🟡 A note as you read — This 50-million-line, one-day figure is an early case the vendor (Anthropic) cited. It's striking, but it's too soon to generalize that it reproduces identically in every environment.
📊 Benchmark Comparison vs. Rival Models (as of June 2026)
Lined up next to GPT-5.5, Gemini 3.1 Pro, and DeepSeek V4 Pro, its strengths come into focus. Some figures diverge by source, so read the table together with the "Reading the Numbers Carefully" section that follows it. First, what each metric measures: SWE-bench tests real GitHub issue resolution; FrontierCode Diamond covers the hardest competitive coding; CursorBench measures real in-IDE (integrated development environment) edits; Terminal-Bench measures terminal task completion; GPQA Diamond is PhD-level science questions; ARC-AGI-2 probes abstract reasoning; and GDPval-AA is a knowledge-work index.
| Metric | Fable 5 | GPT-5.5 | Gemini 3.1 Pro | DeepSeek V4 Pro |
|---|---|---|---|---|
| Release date | Jun 9, 2026 | Apr 23, 2026 | Feb 19, 2026 | Apr 24, 2026 |
| SWE-bench ※ | 80.3% (Pro) | 58.6% | 80.6% (Ver.) | 80.6% (Ver.) |
| FrontierCode Diamond | 29.3% | 5.7% | — | — |
| CursorBench 3.1 | 72.9% (SOTA) | — | — | — |
| Terminal-Bench ※ | 88.0% (v2.1) | 82.7% (v2.0) | — | — |
| GPQA Diamond | — | — | 94.3% | — |
| ARC-AGI-2 | — | — | 77.1% | — |
| GDPval-AA (knowledge work) | 1932 (#1) | 1769 | — | — |
※ Marked rows use different measurement settings or benchmark versions across models, so direct comparison needs care (see below). Ver. = Verified. GDPval-AA is a knowledge-work index tabulated by Artificial Analysis.
🔥 Where the Gap Is Widest — FrontierCode Diamond
On the hardest competitive-coding metric, Fable 5 scores 29.3% against GPT-5.5's 5.7%, roughly a fivefold gap (29.3 / 5.7 ≈ 5.1).
⚠️ Reading the Numbers Carefully — Where Sources Disagree
Some figures don't line up across sources. Here they are, unfiltered.
🔴 A suspicious SWE-bench tie: some tallies put Gemini 3.1 Pro and DeepSeek V4 Pro at exactly 80.6% on SWE-bench Verified. Two different models matching to the decimal suggests the figure comes from a secondary tally rather than a primary source, so cross-check it before relying on it. For context, mainstream reporting places Gemini 3.1 Pro closer to the mid-50s on SWE-bench Pro, a different variant from Verified, which shows how much the variant label matters.
🔴 Mixed measurement configs: Fable 5's SWE-bench is reported as "Pro 80.3%", while Gemini 3.1 Pro and DeepSeek V4 Pro are reported as "Verified". Scores from different settings sit in the same row, so a direct read of this row can mislead.
🔴 Different Terminal-Bench versions: GPT-5.5 is cited at 82.7% (v2.0) while Fable 5 is at 88.0% (v2.1); the benchmark versions themselves differ, so it isn't an apples-to-apples comparison.
Bottom line: FrontierCode Diamond, CursorBench, and the GDPval-AA knowledge-work index are the more internally consistent, trustworthy signals, while SWE-bench and Terminal-Bench are better read as reference points once you account for setting and version differences.
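One way to make that bottom line mechanical is to store each score with its variant label and flag rows where the labels differ. The sketch below transcribes three rows from the table; the data layout and the `is_comparable` helper are illustrative choices, and `None` simply means the table gives no variant label.

```python
# Scores and variant labels transcribed from the table above.
# None means the table gives no variant label for that score.
ROWS = {
    "SWE-bench": {
        "Fable 5": (80.3, "Pro"),
        "GPT-5.5": (58.6, None),
        "Gemini 3.1 Pro": (80.6, "Verified"),
        "DeepSeek V4 Pro": (80.6, "Verified"),
    },
    "FrontierCode Diamond": {"Fable 5": (29.3, None), "GPT-5.5": (5.7, None)},
    "Terminal-Bench": {"Fable 5": (88.0, "v2.1"), "GPT-5.5": (82.7, "v2.0")},
}


def is_comparable(row: dict) -> bool:
    """A row is directly comparable only if all its scores share one variant label."""
    variants = {variant for _, variant in row.values()}
    return len(variants) == 1


for name, row in ROWS.items():
    print(name, "->", "comparable" if is_comparable(row) else "mixed variants")

fable = ROWS["FrontierCode Diamond"]["Fable 5"][0]
gpt = ROWS["FrontierCode Diamond"]["GPT-5.5"][0]
print(f"FrontierCode Diamond ratio: {fable / gpt:.1f}x")  # 5.1x
```

Only the FrontierCode Diamond row passes the check here, which matches the reading above: SWE-bench mixes Pro, Verified, and unlabeled scores, and Terminal-Bench mixes v2.0 and v2.1.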
🧩 Strengths at a Glance
🟢 Claude Fable 5: Agentic autonomy and long-horizon projects. SOTA (state of the art) on FrontierCode Diamond and CursorBench, #1 on the GDPval-AA knowledge-work index. Per-token prices are high, but it reportedly needs fewer turns per task, which can offset the cost in overall efficiency.
🔵 GPT-5.5: Terminal workflows and self-correction. It was SOTA on Terminal-Bench 2.0 at launch (since surpassed by Fable 5).
🟣 Gemini 3.1 Pro: Native multimodality and very long context (1M tokens). Strong on reasoning: GPQA Diamond 94.3%, ARC-AGI-2 77.1%.
🟡 DeepSeek V4 Pro: Extreme cost-efficiency and MoE (mixture-of-experts) efficiency. High SWE-bench Verified scores make it attractive for high-volume work.
✅ So Which Should You Pick?
▶ High-complexity work — unattended edits across a giant codebase over several days, or orchestrating multiple agents → the most advanced option right now is Fable 5.
▶ Terminal-centric, conventional coding → GPT-5.5.
▶ Very-long-context analysis that fuses video and audio → Gemini 3.1 Pro.
▶ High-volume processing on a budget → DeepSeek V4 Pro.
Disclaimer — The prices, free window, and benchmark figures above reflect the announcement as of June 9, 2026; policies and availability can change. Some benchmarks differ in measurement settings or versions and may not be apples-to-apples, so confirm primary sources before making an adoption decision.
References — Anthropic newsroom, Cursor benchmarks, Google DeepMind, DeepSeek, Artificial Analysis (H1 2026 tallies).
I gather material from a software-development angle, write it up myself, and give it one more check before publishing.
This article was written from publicly available data and sources. Last updated: June 10, 2026