3 Model Routing Strategies That Cut Our Customers' Costs by 40%
We analyzed aggregated usage patterns across our customer base (anonymized) and found that most teams overspend on AI by using a single model for everything. Here are three routing strategies that consistently cut costs by 30-50%.
Strategy 1: Complexity-Based Routing
The simplest and most effective approach. Classify incoming requests by complexity, then route accordingly:
def route_model(prompt: str, max_tokens: int) -> str:
estimated_complexity = len(prompt) + max_tokens
if estimated_complexity < 500:
return "tokenfast/claude-haiku-4-5-20251001" # $1/M in, $5/M out
elif estimated_complexity < 5000:
return "tokenfast/claude-sonnet-4-6" # $5/M in, $25/M out
else:
return "tokenfast/claude-opus-4-6" # $30/M in, $150/M out
Real result: One customer moved from 100% Opus to this split and saw a 43% cost reduction with less than 2% quality degradation on their evaluation suite.
Strategy 2: Cascade (Try Cheap First)
Send the request to a cheaper model first. If the response passes your quality check, use it. If not, retry with a more capable model.
async def cascade_request(messages):
# Try the fast, cheap model first
response = await client.chat.completions.create(
model="tokenfast/claude-haiku-4-5-20251001",
messages=messages,
)
# Quality gate: check if response meets criteria
if passes_quality_check(response):
return response
# Fall back to the more capable model
return await client.chat.completions.create(
model="tokenfast/claude-sonnet-4-6",
messages=messages,
)
This works especially well for customer support and classification tasks where 70-80% of requests are straightforward.
Strategy 3: Task-Specific Model Assignment
Different models have different strengths. Map your task types to optimal models:
| Task | Recommended Model | Why |
|---|---|---|
| Code generation | Claude 4.6 Opus | Best at complex, multi-file code |
| Summarization | Claude Haiku 4.5 | Fast, accurate, cheap |
| Data extraction | Claude 4.6 Sonnet | Great structured output |
| Creative writing | Claude 4.6 Opus | Strongest at nuance |
| Classification | Claude Haiku 4.5 | Speed > depth |
| Long document analysis | Gemini 3.1 Pro | 1M token context |
The Math
A team spending $10,000/month on Claude Opus for everything could realistically spend $5,500-6,500/month with smart routing — and get faster responses for simple tasks as a bonus.
Through TokenFast, switching models is just changing a string. No separate API keys, no different SDKs, no billing complexity. That's what makes these strategies practical to implement.