Relay intelligently routes, monitors, and optimizes AI traffic across every major LLM provider — so your team ships reliable AI without managing infrastructure.
const response = await relay.chat.completions.create({
  model: "auto/best",
  messages: [{ role: "user", content: "Summarize this document" }],
  routing: {
    optimize_for: "cost_quality",
    fallback: ["openai/gpt-4.1", "anthropic/claude", "google/gemini"]
  }
})

Every team building serious AI hits the same wall.
Every provider has different pricing, endpoints, SDKs, limits, and reliability patterns.
AI spend grows quickly without routing, budgets, alerts, and per-team visibility.
Rate limits, outages, and latency spikes require automatic fallback logic.
The best model for your task today may not be the best model next month.
A single gateway between your application and every model — with routing, governance, and observability built in.
OpenAI-compatible interface for every provider.
Route by cost, quality, latency, or task.
Retry across providers when one fails.
Per-key, per-team, per-model dashboards.
p50/p95/p99 across providers in real time.
Scoped keys with rate limits and budgets.
Caps and alerts per team or environment.
Full traces with prompts, fallbacks, costs.
Compare up to 3 models side by side.
100+ models with pricing and benchmarks.
SSO, audit logs, data residency.
PII redaction, allowlists, deny rules.
Compare context windows, pricing, and latency across providers in one place.
Send simple summaries to low-cost models, code generation to premium models, and long documents to long-context models — automatically.
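As a rough illustration of task-based routing, the sketch below maps a task type to a model tier. It is not Relay's actual routing logic: the `chooseModel` function, the `Task` type, the tier assignments, and the 100,000-token cutoff are all assumptions made for this example. The model IDs are borrowed from elsewhere on this page.

```typescript
// Hypothetical task-based router. Tier assignments and the token
// threshold are illustrative assumptions, not Relay's real rules.
type Task = "summary" | "code" | "long_document";

interface RouteRequest {
  task: Task;
  inputTokens: number;
}

// Assumed mapping: low-cost tier, premium tier, long-context tier.
const MODEL_BY_TASK: Record<Task, string> = {
  summary: "groq/llama-3.3-70b",
  code: "openai/gpt-4.1",
  long_document: "google/gemini",
};

// Assumed cutoff above which any request is treated as a long document.
const LONG_CONTEXT_THRESHOLD = 100_000;

function chooseModel(req: RouteRequest): string {
  // Very long inputs go to the long-context tier regardless of task.
  if (req.inputTokens > LONG_CONTEXT_THRESHOLD) {
    return MODEL_BY_TASK.long_document;
  }
  return MODEL_BY_TASK[req.task];
}
```

With these assumptions, a short summary request resolves to the low-cost tier, while the same task with a 200,000-token input is sent to the long-context tier instead.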
Per-key budgets, team caps, rate limits, and real-time alerts — built in.
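One way to picture a budget cap with an alert threshold is the check below, run before a request is forwarded. This is a minimal sketch under stated assumptions: `checkBudget`, the `Budget` shape, and the `alertAtFraction` field are hypothetical names, and Relay's real enforcement may work differently.

```typescript
// Hypothetical budget check. Names and semantics are assumptions
// for illustration, not Relay's actual configuration schema.
interface Budget {
  capUsd: number;          // hard spending cap for the key or team
  alertAtFraction: number; // e.g. 0.8 means alert at 80% of the cap
}

type BudgetDecision = "allow" | "alert" | "block";

function checkBudget(
  spentUsd: number,
  estimatedCostUsd: number,
  budget: Budget,
): BudgetDecision {
  const projected = spentUsd + estimatedCostUsd;
  // Block when the request would push spend past the cap.
  if (projected > budget.capUsd) return "block";
  // Allow the request but raise an alert once the threshold is reached.
  if (projected >= budget.capUsd * budget.alertAtFraction) return "alert";
  return "allow";
}
```

Checking the projected spend rather than the current spend means a request is blocked before it exceeds the cap, not after.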
Trace every call: prompt, response, tokens, cost, latency, fallback chain.
| Time | Key | Team | Model | Provider | Tokens | Cost | Latency | Status |
|---|---|---|---|---|---|---|---|---|
| 09:14:22 | sk_***a91 | Platform | openai/gpt-4.1 | OpenAI | 2,431 | $0.018 | 412ms | 200 |
| 09:14:18 | sk_***a91 | Platform | anthropic/claude-sonnet | Anthropic | 3,820 | $0.024 | 488ms | 200 |
| 09:14:11 | sk_***f24 | Research | groq/llama-3.3-70b | Groq | 812 | $0.0008 | 94ms | 200 |
| 09:13:58 | sk_***a91 | Platform | openai/gpt-4.1 | OpenAI | — | — | — | 429 |
| 09:13:58 | sk_***a91 | Platform | anthropic/claude-sonnet | Anthropic (fallback) | 1,914 | $0.012 | 521ms | 200 |
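The two 09:13:58 rows show a fallback chain: a 429 from one provider followed by a successful call to the next. The sketch below shows how such a chain could be implemented and recorded. It is an assumption-laden illustration, not Relay's implementation: `withFallback`, the `ProviderResponse` shape, and the list of retryable statuses are all hypothetical.

```typescript
// Hypothetical fallback loop. The response shape and the set of
// retryable statuses are assumptions made for this example.
interface ProviderResponse<T> {
  status: number;
  body?: T;
}

interface Attempt {
  model: string;
  status: number;
}

interface FallbackResult<T> {
  model: string;       // the model that finally answered
  body: T;
  attempts: Attempt[]; // every attempt in order, for tracing
}

// Assumed: rate limits and server-side errors are worth retrying elsewhere.
const RETRYABLE_STATUSES = [429, 500, 502, 503, 504];

async function withFallback<T>(
  models: string[],
  call: (model: string) => Promise<ProviderResponse<T>>,
): Promise<FallbackResult<T>> {
  const attempts: Attempt[] = [];
  for (const model of models) {
    const res = await call(model);
    attempts.push({ model, status: res.status });
    if (res.status === 200) {
      return { model, body: res.body as T, attempts };
    }
    // Client errors such as 400 would fail on every provider, so stop early.
    if (!RETRYABLE_STATUSES.includes(res.status)) {
      throw new Error(`${model} failed with non-retryable status ${res.status}`);
    }
  }
  throw new Error(`all ${attempts.length} providers failed`);
}
```

Keeping the `attempts` array alongside the result is what allows a log like the one above to show both the failed call and the fallback that replaced it.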
Drop-in compatibility with the OpenAI SDK. Just change the base URL.
import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.RELAY_API_KEY,
  baseURL: "https://api.decks.ai/v1"
})

const completion = await client.chat.completions.create({
  model: "auto/best",
  messages: [{ role: "user", content: "Write a launch plan" }]
})

Ship faster without locking into one provider.
Add AI features with cost controls and team visibility.
Centralize governance, budgets, logs, and model access.
Manage client usage, keys, and billing in one place.
Compare models and build with one API.
"We cut our LLM bill by 41% in the first week and our p95 latency dropped by 280ms."
"Relay is the layer we wished we'd built ourselves. Fallbacks alone are worth the price."
"Finally one bill, one dashboard, one place to govern every model our product uses."
"Migrated from our own routing layer in an afternoon. Just changed the base URL."
Start free. Scale with confidence.
For exploring Relay.
For solo developers and small projects.