Routing engine

Route every request intelligently.

Define routing as policy. Relay classifies the task, selects the right model for the moment, and falls back automatically when a provider degrades.

Incoming request
task: code
tokens: 3,820
region: us-east

Relay decision (8 ms)
Classify task: code · high stakes
Apply policy: premium_coding
Check budget: ok · $0.018 est.
Provider health: openai · healthy

Route: openai/gpt-4.1
Fallback: anthropic/claude-sonnet, google/gemini-2.5-pro

Task-based

Classify each prompt and route by intent: code, summary, agent, search, RAG, vision.

Optimize for

Set the goal (cost, latency, quality, or balanced) and Relay picks the model that best meets it.

Automatic failover

Provider down? Relay retries across a fallback chain with health-aware backoff.
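
To make the failover idea concrete, here is a minimal Python sketch of retrying across a fallback chain. It is an illustration under stated assumptions, not Relay's implementation: `call_with_fallback`, `send`, `is_healthy`, and the retry and delay parameters are hypothetical names, and "health-aware" is interpreted here as skipping providers already marked unhealthy and backing off exponentially between attempts.

```python
from __future__ import annotations

import time
from typing import Callable, Optional, Sequence


def call_with_fallback(
    chain: Sequence[str],
    send: Callable[[str], str],          # hypothetical: sends the request to one provider
    is_healthy: Callable[[str], bool],   # hypothetical: current health signal per provider
    retries_per_provider: int = 2,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str]:
    """Try each provider in order and return (provider, response)."""
    last_error: Optional[Exception] = None
    for provider in chain:
        if not is_healthy(provider):
            continue  # skip providers already marked unhealthy
        for attempt in range(retries_per_provider):
            try:
                return provider, send(provider)
            except Exception as exc:
                last_error = exc
                # Exponential backoff before the next attempt on the same provider.
                if attempt < retries_per_provider - 1:
                    sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError("all providers in the fallback chain failed") from last_error
```

Passing `sleep` and `is_healthy` as arguments keeps the sketch easy to test without real delays or network calls.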

Region-aware

Pin requests to a region for data residency or lowest latency.

A/B + canary

Split traffic between models to evaluate quality safely in production.
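
One common way to split traffic for a canary is deterministic hashing, sketched below in Python. This is an assumption about how such a split could work, not a description of Relay's internals; `pick_variant` and the weights shown are hypothetical. Hashing the request ID keeps the same request on the same model across retries.

```python
import hashlib


def pick_variant(request_id: str, weights: dict) -> str:
    """Map a request ID to a model, stable across calls and proportional to weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, total).
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0.0
    for model, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return model
    return model  # guard against floating-point rounding at the upper edge


# Hypothetical 95/5 canary split between two models.
canary = {"openai/gpt-4.1": 0.95, "anthropic/claude-sonnet": 0.05}
chosen = pick_variant("req-12345", canary)
```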

Circuit breakers

Auto-disable degraded providers and re-test in the background.
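
The circuit-breaker pattern behind this can be sketched in a few lines of Python. The class below is a simplified illustration, not Relay's code: `CircuitBreaker`, its thresholds, and the cooldown are hypothetical, and the background re-test is approximated by letting requests through again once the cooldown has elapsed.

```python
import time


class CircuitBreaker:
    """Open after repeated failures; allow a probe again after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: once the cooldown has passed, let traffic probe the provider.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the cooldown.
            self.opened_at = self.clock()
```

A production breaker would usually limit the half-open state to a single probe; this sketch omits that for brevity.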

Provider pinning

Force specific traffic to specific providers for compliance or contracts.

Live re-routing

Policies update in seconds — no redeploys.

Routing as code

Declarative. Reviewable. Versioned.

Define routing in YAML or JSON. Promote between environments with git workflows.

routing.yaml
# routing.yaml — declarative routing
policies:
  - name: premium_coding
    when: task == "code" or task == "review"
    route_to: openai/gpt-4.1
    fallback: [anthropic/claude-sonnet, google/gemini-2.5-pro]
    max_latency_ms: 2000
    region: us

  - name: cheap_summaries
    when: task == "summarize"
    optimize_for: cost
    fallback: [groq/llama-3.3-70b, deepseek/reasoner]
    max_cost_per_request: 0.002

  - name: long_context
    when: input_tokens > 100_000
    route_to: google/gemini-2.5-pro
    fallback: [anthropic/claude-sonnet]
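
To show how policies like these might be evaluated, here is a small Python sketch. It rests on assumptions the YAML does not state: that policies are checked in order and the first match wins, and that `when` expressions can be modeled as plain Python predicates. `Request`, `Policy`, and `match_policy` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Request:
    task: str
    input_tokens: int


@dataclass
class Policy:
    name: str
    when: Callable[[Request], bool]
    route_to: Optional[str] = None  # None: left to the optimizer (e.g. optimize_for: cost)
    fallback: List[str] = field(default_factory=list)


# Mirrors the three policies in routing.yaml above.
POLICIES = [
    Policy("premium_coding", lambda r: r.task in ("code", "review"),
           "openai/gpt-4.1", ["anthropic/claude-sonnet", "google/gemini-2.5-pro"]),
    Policy("cheap_summaries", lambda r: r.task == "summarize",
           None, ["groq/llama-3.3-70b", "deepseek/reasoner"]),
    Policy("long_context", lambda r: r.input_tokens > 100_000,
           "google/gemini-2.5-pro", ["anthropic/claude-sonnet"]),
]


def match_policy(request: Request, policies: List[Policy] = POLICIES) -> Optional[Policy]:
    """Return the first policy whose condition matches (assumed first-match-wins)."""
    for policy in policies:
        if policy.when(request):
            return policy
    return None


matched = match_policy(Request(task="code", input_tokens=3820))
```

Under first-match-wins, a 200,000-token code request would match `premium_coding` before `long_context`, so policy order matters in this model.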

Smarter routing in 5 minutes.