Define routing as policy. Relay classifies the task, selects the right model for the moment, and falls back automatically when a provider degrades.
Classify each prompt and route by intent: code, summary, agent, search, RAG, vision.
Set the goal (cost, latency, quality, or balanced) and Relay picks the model.
Provider down? Retry across a fallback chain with health-aware backoff.
Pin requests to a region for data residency or lowest latency.
Split traffic between models to evaluate quality safely in production.
Auto-disable degraded providers and re-test in the background.
Force specific traffic to specific providers for compliance or contracts.
Policies update in seconds — no redeploys.
Define routing in YAML or JSON. Promote between environments with git workflows.
# routing.yaml — declarative routing
policies:
  - name: premium_coding
    when: task == "code" or task == "review"
    route_to: openai/gpt-4.1
    fallback: [anthropic/claude-sonnet, google/gemini-2.5-pro]
    max_latency_ms: 2000
    region: us
  - name: cheap_summaries
    when: task == "summarize"
    optimize_for: cost
    fallback: [groq/llama-3.3-70b, deepseek/reasoner]
    max_cost_per_request: 0.002
  - name: long_context
    when: input_tokens > 100_000
    route_to: google/gemini-2.5-pro
    fallback: [anthropic/claude-sonnet]
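
To make the policy file above concrete, here is a rough Python sketch of how a router might resolve a request against those three policies. It is an illustration under stated assumptions, not Relay's actual implementation: it assumes policies are checked top to bottom and the first match wins, that `route_to` is tried before the `fallback` entries in order, and that provider health is known as a simple set. The names `Policy`, `candidates`, and `pick` are hypothetical, and the sketch ignores `optimize_for`, `max_latency_ms`, `max_cost_per_request`, and `region`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Policy:
    """Hypothetical in-memory form of one entry under `policies:`."""
    name: str
    when: Callable[[dict], bool]          # stands in for the `when:` expression
    route_to: Optional[str] = None
    fallback: List[str] = field(default_factory=list)


# The three policies from routing.yaml, with `when` written as Python predicates.
POLICIES = [
    Policy("premium_coding",
           lambda r: r["task"] in ("code", "review"),
           route_to="openai/gpt-4.1",
           fallback=["anthropic/claude-sonnet", "google/gemini-2.5-pro"]),
    Policy("cheap_summaries",
           lambda r: r["task"] == "summarize",
           fallback=["groq/llama-3.3-70b", "deepseek/reasoner"]),
    Policy("long_context",
           lambda r: r["input_tokens"] > 100_000,
           route_to="google/gemini-2.5-pro",
           fallback=["anthropic/claude-sonnet"]),
]


def candidates(request: dict) -> Tuple[Optional[str], List[str]]:
    """Return the first matching policy name and its ordered model chain."""
    for policy in POLICIES:  # assumption: first match wins
        if policy.when(request):
            primary = [policy.route_to] if policy.route_to else []
            return policy.name, primary + policy.fallback
    return None, []


def pick(request: dict, healthy: Set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Walk the chain and return the first model whose provider is healthy."""
    name, chain = candidates(request)
    for model in chain:
        if model in healthy:
            return name, model
    return name, None  # policy matched (or not), but nothing usable


if __name__ == "__main__":
    # Primary model is unhealthy here, so the first fallback is chosen.
    up = {"anthropic/claude-sonnet", "google/gemini-2.5-pro"}
    print(pick({"task": "code", "input_tokens": 500}, up))
```

Under the first-match assumption, a code review with more than 100,000 input tokens would resolve to `premium_coding` rather than `long_context`, so policy order matters in this sketch.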