Multi-Provider LLM Fallback Done Right in 2026

Anthropic retired Claude Opus 4 on June 15 and a directive killed Fable 5 in hours. Your model ID runs on two clocks, and a naive fallback fails both.

Yesterday claude-opus-4-20250514 and claude-sonnet-4-20250514 stopped answering. Not slower, not dumber. Retired. Anthropic's deprecation table lists both as "Retired" and spells out the consequence in six words: "Requests to retired models will fail." If you hardcoded either snapshot ID and skipped the migration, your calls started returning errors on June 15 and nobody on the provider side is coming to fix it.

Three days earlier something different happened. On June 12 a US government directive disabled Fable 5 and Mythos 5 in a matter of hours. No deprecation table, no notice window, no successor ID to repoint at. One model died on a published schedule. The other died on someone else's command. Same week, same blast radius in your code, two completely different failure modes. That pairing is the whole lesson, and most fallback designs only account for one half of it.

A model ID runs on two clocks

Here is the framing I wish more teams started from: a versioned model string is a dependency that runs on two separate clocks, and you have to watch both.

The first clock is the deprecation treadmill. It is slow, dated, and predictable. Anthropic commits to "at least 60 days notice before model retirement for publicly released models," and the cadence is consistent enough to set your roadmap by. Opus 4.1 was deprecated on June 5 with a hard retirement of August 5, 61 days later and one day over the minimum. The Sonnet 4 / Opus 4 pair ran from April 14 to June 15, 62 days. A model ships, gets a successor within a quarter, and lands on a retirement clock you can see coming. This is a calendar problem, and a calendar fixes it.
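The calendar check can be small. Here is a minimal sketch, assuming you copy retirement dates from the provider's deprecation table into a dict by hand or by scraper. The year is an assumption taken from this article's title, the Opus 4.1 key is a placeholder rather than a real ID, and the function name is made up for illustration.

```python
from datetime import date

# Retirement dates as they might be copied from a deprecation table.
# Assumption: the year (2026) comes from the article title.
# "opus-4.1-placeholder-id" is a stand-in, not a real model ID.
RETIREMENTS = {
    "claude-opus-4-20250514": date(2026, 6, 15),
    "claude-sonnet-4-20250514": date(2026, 6, 15),
    "opus-4.1-placeholder-id": date(2026, 8, 5),
}


def ids_needing_migration(in_use, retirements, today, window_days=90):
    """Return (model_id, days_left) for each in-use ID retiring within the window.

    A negative days_left means the ID is already retired.
    """
    flagged = []
    for model_id in in_use:
        retires_on = retirements.get(model_id)
        if retires_on is None:
            # No published date: this clock cannot see the ID at all.
            continue
        days_left = (retires_on - today).days
        if days_left <= window_days:
            flagged.append((model_id, days_left))
    # Most urgent first.
    return sorted(flagged, key=lambda item: item[1])
```

Run it on a schedule against the IDs in your config and open a migration ticket for anything it returns. Note what it skips: an ID with no published date never shows up here, which is exactly the gap the second clock lives in.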

The second clock is the revocation cliff. Instant, undated, external. Fable 5 went dark in hours on an order nobody scheduled. Capacity reclamation, a legal directive, a region cutoff, a billing dispute: any of these can pull an ID out from under you with effectively zero notice. A 60-day email does nothing here. A calendar reminder does nothing here. This is a kill-switch problem and it needs a kill-switch defense.

Teams that survive both treat them as two different projects. Teams that get burned file both under "model upgrade work," plan for the treadmill, and get blindsided by the cliff. The defense for one is not the defense for the other. Conflating them is exactly why "don't worry, we have a fallback" tends to fall apart on the day the fallback is needed.

Retirement is a hard error, and the contract shifts even when you stay put

Worth being precise about what "retired" means, because people read it as "deprecated" and assume a grace period. There is none. A retired ID fails outright. Opus 4 and Sonnet 4 (both ...-20250514) retired June 15. claude-mythos-preview is scheduled to retire June 30. The recommended replacements are claude-opus-4-8 and claude-sonnet-4-6, and repointing to them is the easy part of this whole story.

The trap is assuming the swap is a string edit. It is not, even inside one vendor. Anthropic now returns a 400 error when temperature, top_p, or top_k are set to a non-default value on Claude Opus 4.7 and later. So you "just bump the model string" to a newer Opus, ship it, and start throwing 400s on a temperature parameter your code has been sending without complaint for a year. Portability is not only a cross-vendor problem. The contract drifts under you while you stand still on the same provider.
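One way to absorb that drift is to strip the rejected parameters before the request leaves your process. This is a sketch, not a provider API: the REJECTS_SAMPLING_PARAMS set is a hypothetical config you would maintain yourself, and it lists claude-opus-4-8 only because that ID falls under the "Opus 4.7 and later" rule described above.

```python
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")

# Hypothetical, hand-maintained config: model IDs assumed to reject
# non-default sampling parameters with a 400.
REJECTS_SAMPLING_PARAMS = {"claude-opus-4-8"}


def strip_unsupported_params(model_id, params):
    """Return (clean_params, dropped_names) without mutating the input dict."""
    if model_id not in REJECTS_SAMPLING_PARAMS:
        return dict(params), []
    dropped = [name for name in SAMPLING_PARAMS if name in params]
    clean = {k: v for k, v in params.items() if k not in SAMPLING_PARAMS}
    return clean, dropped
```

Returning the dropped names matters: log them, so that a silent behavior change (your temperature setting no longer applies) is at least visible somewhere.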

"OpenAI-compatible" gets you the envelope, not the letter

This is where the comfortable assumption dies. Nearly every provider now exposes an OpenAI-shaped endpoint, and that sells the dream of "point it at a different base URL and you're done." The envelope is genuinely portable. The letter inside is not.

Tool definitions are the clearest example. OpenAI nests a JSON Schema under function.parameters in each entry of its tools array. Anthropic puts the schema in an input_schema field on the tool itself and has no global JSON mode at all, so structured output has to be forced through a tool plus tool_choice. Swap the model behind a uniform endpoint without translating that, and your tool calls come back malformed. The HTTP layer says everything went fine.
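The translation itself is mechanical. A minimal sketch, assuming the OpenAI side uses the Chat Completions function-tool shape (type, then function with name, description, parameters) and the Anthropic side wants name, description, and input_schema. The two helper names are made up for illustration.

```python
def openai_tool_to_anthropic(tool):
    """Convert one Chat Completions style function tool to Anthropic's tool shape."""
    if tool.get("type") != "function":
        raise ValueError(f"unsupported tool type: {tool.get('type')!r}")
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Anthropic expects a JSON Schema object here; default to an
        # empty object schema when the source tool declares no parameters.
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


def forced_tool_choice(tool_name):
    """tool_choice value that forces a specific tool, used to get structured output."""
    return {"type": "tool", "name": tool_name}
```

This covers the request direction only. The response direction (tool_use content blocks versus tool_calls on the message) needs its own mapping, and that is usually where a parser silently drops output.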

And prompt behavior does not transfer either. OpenAI's own prompting guidance is blunt that different models, and different snapshots within the same family, can need different prompting, which is why they recommend pinning snapshots and keeping an eval suite to catch drift. So a fallback to a "comparable" model is an unverified bet until an eval proves otherwise. In practice the part that bites teams is that "comparable on the leaderboard" and "comparable on my prompt" are not the same claim, and the gap only shows up in production output.

The fallback that returns 200 and lies

If there is one idea to take from this, it is this one. The dangerous failure is not the model that goes down. It is the model that comes up wrong.

A gateway buys you availability, full stop. LiteLLM's Router is good at exactly this: order-based priority where a failed order=1 deployment rolls to order=2 then order=3, cooldowns that pull a deployment after 429s or a greater-than-50% failure rate inside a minute, and num_retries with backoff. That machinery keeps requests flowing. It is doing its job.

But watch what happens when LiteLLM rolls from Anthropic to an OpenAI-compatible target with no adapter underneath. You get a 200. From a model that may emit broken JSON, or the wrong tool format, or structured output your parser silently drops. Your uptime dashboard stays green. The agent quietly does the wrong thing, for every request, until a human notices the output is garbage.

Think about which failure you would rather have at 3 a.m. A clean 5xx pages someone and gets fixed in twenty minutes. The "successful" fallback sails straight past your monitoring because, as far as every metric is concerned, the request succeeded. Availability hid the incident. That is the trap: the gateway routes, it does not adapt, and routing without adapting is a generator of confident wrong answers.
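The cheapest way to turn the quiet failure back into a loud one is to validate the reply before anything downstream touches it. A sketch under stated assumptions: the reply is expected to be a JSON object, and FallbackContractError, parse_structured_reply, and the served_by label are hypothetical names, not part of any gateway.

```python
import json


class FallbackContractError(Exception):
    """Raised when a reply came back with a 200 but breaks the expected shape."""


def parse_structured_reply(raw_text, required_keys, served_by):
    """Parse a model reply and raise instead of silently dropping bad output."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise FallbackContractError(f"{served_by}: reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise FallbackContractError(f"{served_by}: reply is not a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise FallbackContractError(f"{served_by}: missing keys {missing}")
    return data
```

Count these exceptions per serving model and alert on the rate. A fallback that starts raising on every request is an incident your uptime dashboard would otherwise never show.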

Pick a gateway, but know what it won't do

The market has settled into three shapes, and the choice is real. OpenRouter is a hosted marketplace, one key for 300-plus models, lowest friction. LiteLLM is a self-hosted proxy with full routing control and no vendor lock-in. Portkey leans observability-first. Pick on how much control versus how little ops you want.

None of them solves behavior portability for you. Every one of them routes; not one of them adapts your tool schema or runs your eval before serving the fallback. Buying a gateway and calling the problem solved is the most common version of the mistake. The gateway is necessary. It is nowhere near sufficient.

What to ship before the next ID disappears

Two layers most teams collapse into one, plus the discipline that keeps the two clocks separate. In priority order:

Move every model ID into config today. Env var or config key, never a literal in source. Export your usage CSV from the Claude Console (Usage > Export) to see exactly which IDs you actually call, then grep your codebase for every hardcoded string and kill it. This is your only real defense against the revocation cliff: when an ID dies undated, you want a repoint in minutes, not a deploy. If you fix one thing this week, fix this.

Run two tracks, not one. The first is a recurring calendar job that diffs your live model IDs against the provider deprecation tables and triggers migration the moment any ID you use shows a retirement date inside 90 days. The August 5 Opus 4.1 retirement should already be on it. The second is a tested kill-switch path (config repoint plus a pre-validated alternate provider) for the undated case. One clock is a calendar. The other is a fire drill. Do not let them share a ticket.

Put a thin adapter under the gateway, per model. Normalize tool schemas (function.parameters versus input_schema), strip unsupported params (drop temperature, top_p, and top_k for Opus 4.7+ or eat the 400), and reconcile structured-output mechanics. A gateway alias without this layer is the 200-that-lies generator from two sections up.

Gate every swap behind a golden eval. Pin snapshots, keep an eval suite, and require the fallback target to pass before it can serve traffic. If the alternate fails the suite, fail loud. A quiet degrade is worse than an outage because you cannot see it.

Page on fallback activation. Treat "we rolled to order=2" as an event a human reads, not a silent success. The roll kept you up; the alarm is what turns a hidden problem back into a signal you can act on.

Game-day the cliff, not just the treadmill. Revoke your primary model ID with zero notice and time how long until correct traffic flows from the alternate. If the answer is longer than a few hours, Fable 5 already showed you how that day ends.
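Several items on that list fit in one small selection function: IDs come from config, a non-primary target must pass the golden eval, and any fallback pages a human. This is a sketch with hypothetical names throughout. LLM_MODEL_CHAIN is an assumed env var, and passes_golden_eval and page are callables you would supply.

```python
import os


class NoValidatedTargetError(Exception):
    """Raised when no configured model is both available and eval-approved."""


def load_model_chain(env=os.environ):
    """Read the ordered model IDs from config, so no literal lives in source."""
    raw = env.get("LLM_MODEL_CHAIN", "")
    return [m.strip() for m in raw.split(",") if m.strip()]


def choose_target(chain, revoked, passes_golden_eval, page):
    """Pick the first usable model; gate and announce anything past the primary."""
    for position, model_id in enumerate(chain):
        if model_id in revoked:
            continue
        if position > 0:
            if not passes_golden_eval(model_id):
                page(f"fallback candidate {model_id} failed the golden eval")
                continue
            page(f"fallback active: serving from {model_id}")
        return model_id
    # Fail loud: an outage is visible, a quiet degrade is not.
    raise NoValidatedTargetError("no configured model is available and eval-approved")
```

For the game-day drill, add your primary ID to revoked, start a timer, and stop it when choose_target returns an alternate whose output passes your checks. That elapsed time is the number the drill exists to measure.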

The string in your code looks like a constant. It is a lease, and the landlord can change the terms or evict you. Build like you believe that.

Sources

https://platform.claude.com/docs/en/about-claude/model-deprecations
https://docs.litellm.ai/docs/routing
https://developers.openai.com/api/docs/guides/prompting
https://openrouter.ai/blog/insights/llm-gateway/
https://tianpan.co/blog/2026-04-27-model-deprecation-treadmill-pre-sunset-discipline