Automatic model fallback when hitting rate limits or API outages

Problem

When running multiple Claude Code sessions in parallel or on intensive tasks, you frequently hit rate limits or encounter provider outages. This blocks all active sessions simultaneously:

> 529 {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}

On Claude Max 5x plans, heavy users report hitting hourly limits at least monthly. Running 3-4 parallel sessions per project accelerates this. When Anthropic's API goes down, every active agent stops.

Solution

Option 1: claude-code-router for automatic provider switching

Use claude-code-router or similar routing tools to automatically fall back between providers:

{
  "providers": [
    {
      "name": "anthropic",
      "model": "claude-opus-4-6",
      "priority": 1
    },
    {
      "name": "openai",
      "model": "codex-5.2",
      "priority": 2,
      "when": "rate_limited"
    }
  ],
  "fallback_strategy": "priority",
  "retry_after_seconds": 60
}

Option 2: Manual model switching with symlinked configs

Keep both providers configured and switch when rate limited:

# Symlink AGENTS.md for Codex compatibility
ln -s CLAUDE.md AGENTS.md

# Use opus for planning and conversation
claude --model claude-opus-4-6 "Plan the auth refactor into PLAN.md"

# Switch to codex for implementation when rate limited
codex "Read PLAN.md and implement phase 1"

Option 3: Tiered workflow by model strength

Assign tasks based on model capability and availability:

# Tier 1: Opus for planning and review (best quality)
# Tier 2: Codex for implementation (good at writing code)
# Tier 3: Sonnet for simple tasks (fast, rarely rate limited)

# Planning agent (opus)
claude --model claude-opus-4-6 "Create detailed plan in PLAN.md"

# Implementation agent (codex - different rate limit pool)
codex "Read PLAN.md, implement next phase"

# Review agent (opus reviews codex output)
claude --model claude-opus-4-6 "Review the changes from last commit"

Why It Works

Different providers and model tiers have independent rate limit pools. When Anthropic's Opus is overloaded, OpenAI's Codex likely is not, and vice versa. By distributing work across providers, you maintain continuous development throughput. The tiered approach also plays to each model's strengths -- Opus excels at planning and review while Codex handles implementation well, and each draws from a separate quota.

Context

Users on Max 5x report hitting limits with 3-4 parallel sessions; Max 20x users rarely hit limits
API-key users ($2k+/month) report better performance and availability than subscription plans during outages
Codex via AGENTS.md symlink reads the same project instructions as Claude via CLAUDE.md
The consensus ranking is Opus > Codex > Sonnet for code quality, but Codex handles harder implementation tasks well
Consider ccusage to track spending across providers and understand your actual consumption patterns