Claude Opus 4.8 vs Claude Fable 5 benchmark

I re-ran my April prompt steering benchmark on Claude Opus 4.8 [1m] and Claude Fable 5 [1m], looking at how effort level and prompt steering affect token usage, token cost, and performance (i.e. instruction following). The full write-up covers the prompt steering, cost, and latency results: https://ai.georgeliu.com/p/claude-opus-48-vs-claude-fable-5

The benchmark harness runs 200 headless claude -p sessions. The analysis comes from session-metrics, a Claude Code skill I built with Claude Code itself. It parses session JSONL files into per-turn cost, cache, latency, and token breakdowns, and it produced the comparison data and charts behind the article.
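
To make the "JSONL in, per-model breakdown out" idea concrete, here is a minimal sketch of that kind of aggregation. It is not the session-metrics implementation. The record shape (a "model" key plus a "usage" dict of token counts) and the sample data are assumptions made for illustration, and real session files may be laid out differently.

```python
import json
from collections import defaultdict

# Hypothetical record shape: one JSON object per line, with a "model" name
# and a "usage" dict of token counts. Real session files may differ.
SAMPLE_JSONL = """\
{"model": "model-a", "usage": {"input_tokens": 120, "output_tokens": 300}}
{"model": "model-a", "usage": {"input_tokens": 80, "output_tokens": 150}}
{"model": "model-b", "usage": {"input_tokens": 100, "output_tokens": 400}}
"""

def summarize_tokens(jsonl_text):
    """Sum token counts per model across all turns in a JSONL string."""
    totals = defaultdict(lambda: defaultdict(int))
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        model = record.get("model", "unknown")
        for field, count in record.get("usage", {}).items():
            totals[model][field] += count
    # Convert nested defaultdicts to plain dicts for printing and comparison.
    return {model: dict(fields) for model, fields in totals.items()}

summary = summarize_tokens(SAMPLE_JSONL)
print(summary)
```

Cost and latency breakdowns would follow the same pattern, with a price table or timestamps added per record.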

The results that surprised me:

The model-specific inversion survived but moved. The ultrathink wrapper is essentially free on Opus 4.8 at high effort (-2.6 percent cost) but is the single most expensive wrapper on Fable 5 at the same effort (+31.0 percent cost, output up 53 percent).

Fable 5 looks 2-3x more expensive than Opus 4.8, but it sits on a 2x rate card. Divide its cost by two and, at high effort, its token consumption is on par with 4.8. At medium effort it is genuinely ~43 percent heavier.
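
The divide-by-two step can be written as a small worked calculation. This is a sketch that assumes the 2x rate card applies uniformly to every token type. The function name and the input ratios are illustrative: 2.86x is simply the cost gap that would correspond to ~43 percent heavier consumption, not a measured figure from the benchmark.

```python
def rate_normalized_ratio(cost_ratio, rate_multiplier=2.0):
    """Turn a cost ratio into a token-consumption ratio by removing the price gap."""
    return cost_ratio / rate_multiplier

# Illustrative inputs, not measured values from the benchmark:
# a 2.0x cost gap normalizes to parity, a 2.86x gap to ~43 percent heavier.
for cost_ratio in (2.0, 2.86):
    tokens_ratio = rate_normalized_ratio(cost_ratio)
    print(f"{cost_ratio:.2f}x cost -> {tokens_ratio:.2f}x tokens")
```

If the input, output, and cache prices do not all scale by the same factor, a single divisor is only an approximation.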

Fable 5's medium effort cost MORE than high ($1.33 vs $1.14 per session). There is no economy rung hiding at medium.
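
For a sense of scale, the gap between those two per-session figures works out as follows. The helper name pct_change is my own, and the two dollar amounts are the ones quoted above.

```python
def pct_change(baseline, value):
    """Percent change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0

high_cost = 1.14    # USD per session, Fable 5 at high effort (from the post)
medium_cost = 1.33  # USD per session, Fable 5 at medium effort (from the post)

# prints: medium vs high: +16.7 percent
print(f"medium vs high: {pct_change(high_cost, medium_cost):+.1f} percent")
```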

"Be concise" reliably breaks refactoring tasks. Across two model generations, compression wrappers made models drop required content from the explanation around otherwise-correct code (4.7 failed the same constraint in 4 of 10 cells in April, all concise/no-tools; 4.8 repeated it in June). If your prompt requires the answer to contain specific things, do not also tell the model to trim words.

Instruction-following ranks across both runs: Opus 4.6 88/90, Fable 5 83/90, Opus 4.8 80/90, Opus 4.7 75/90. The oldest model is still the champion. Fable 5 hits "exactly 120 words" 10/10 but writes past character caps; 4.8 does the opposite.
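
The word-count and character-cap constraints are easy to check mechanically. The sketch below shows one possible pair of checks, not the scorer used in the benchmark. The function names, the whitespace-based word definition, and the 500-character cap are all assumptions made for illustration.

```python
def has_exact_word_count(text, target):
    """True if the text has exactly `target` whitespace-separated words."""
    return len(text.split()) == target

def within_char_cap(text, cap):
    """True if the text is at most `cap` characters long."""
    return len(text) <= cap

# Synthetic reply: 120 four-letter words joined by single spaces.
reply = " ".join(["word"] * 120)

print(has_exact_word_count(reply, 120))  # True
print(within_char_cap(reply, 500))       # False: 480 letters + 119 spaces = 599
```

The synthetic reply passes the word count and fails the cap, which is the same split described for Fable 5: the two constraints are independent, so a model can satisfy one and miss the other.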

Both June models are much faster: 4.8 averages ~3 seconds per turn vs 6.6-19.5 for April's 4.6.
