Relay v3 · Intelligent routing + AI observability, now in public beta

One API for every AI model.

Relay intelligently routes, monitors, and optimizes AI traffic across every major LLM provider — so your team ships reliable AI without managing infrastructure.

OpenAI-compatible
100+ models
Smart failovers
Real-time observability
Enterprise-ready
Requests / 24h: 1.24M (+18.2%)
Cost saved via routing: $3,482 (+9.1%)
p95 latency: 684 ms (−42 ms)
Charts: Request volume (requests and fallbacks) · Latency by provider (p50)
Live routing flow (status: healthy)
Request → Classify task → Choose model → Send to GPT-4.1 → Log + bill
Fallbacks: 1. GPT-4.1 · 2. Claude Sonnet · 3. Gemini Pro
Top models (last 24h)
  • openai/gpt-4.1: $1,241
  • anthropic/claude-sonnet: $980
  • google/gemini-2.5-pro: $540
  • groq/llama-3.3-70b: $84
route.ts
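// assumes `relay` is an initialized Relay client
const response = await relay.chat.completions.create({
  model: "auto/best", // let Relay pick the model for each request
  messages: [{ role: "user", content: "Summarize this document" }],
  routing: {
    optimize_for: "cost_quality",
    // tried in order if the selected model fails
    fallback: ["openai/gpt-4.1", "anthropic/claude-sonnet", "google/gemini-2.5-pro"]
  }
})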
Connect every major provider
OpenAI
Anthropic
Google
Meta
Mistral
Cohere
AWS Bedrock
Azure
Groq
Together
Perplexity
DeepSeek
xAI
The problem

LLM infrastructure gets messy fast.

Every team building serious AI hits the same wall.

Too many provider APIs

Every provider has different pricing, endpoints, SDKs, limits, and reliability patterns.

Costs are hard to control

AI spend grows quickly without routing, budgets, alerts, and per-team visibility.

Reliability breaks in production

Rate limits, outages, and latency spikes require automatic fallback logic.

Model choice changes constantly

The best model for your task today may not be the best model next month.

The solution

Relay gives your team one control plane for AI.

A single gateway between your application and every model — with routing, governance, and observability built in.

Your app (App / SDK / Edge)
→ Relay gateway (Routing · Auth · Budgets · Logs · Fallbacks)
→ Providers: OpenAI, Anthropic, Google, Meta, Mistral, Cohere, AWS Bedrock, Azure, Groq, Together, Perplexity, DeepSeek, xAI
Unified API

OpenAI-compatible interface for every provider.

Smart model routing

Route by cost, quality, latency, or task.

Automatic fallbacks

Retry across providers when one fails.

Spend tracking

Per-key, per-team, per-model dashboards.

Latency monitoring

p50/p95/p99 across providers in real time.

Virtual API keys

Scoped keys with rate limits and budgets.

Team budgets

Caps and alerts per team or environment.

Request logs

Full traces with prompts, fallbacks, costs.

Prompt playground

Compare up to 3 models side by side.

Model marketplace

100+ models with pricing and benchmarks.

Enterprise controls

SSO, audit logs, data residency.

Security policies

PII redaction, allowlists, deny rules.

Model marketplace

Find the right model for every task.

Compare context windows, pricing, and latency across providers in one place.

| Model | Provider | Context (tokens) | Input price | Output price | Latency | Strength | Use case |
|---|---|---|---|---|---|---|---|
| GPT-4.1 | OpenAI | 1M | $5/M | $15/M | Fast | Reasoning + coding | General + code |
| Claude Sonnet 4 | Anthropic | 200K | $3/M | $15/M | Fast | Writing + agents | Agents + writing |
| Gemini 2.5 Pro | Google | 1M | $1.25/M | $5/M | Fast | Multimodal + long context | Long context, vision |
| Llama 4 405B | Meta | 256K | $0.6/M | $0.9/M | Medium | Open weights | Customizable, self-host |
| DeepSeek R1 | DeepSeek | 128K | $0.27/M | $1.1/M | Medium | Reasoning | Math, logic, planning |
| Mistral Large 2 | Mistral | 128K | $2/M | $6/M | Fast | Enterprise EU | EU deployment |
| Grok 3 | xAI | 128K | $5/M | $15/M | Fast | Realtime + tools | Search-grounded |
| Llama 3.3 70B | Groq | 128K | $0.59/M | $0.79/M | Ultra-fast | Lowest latency | Realtime UX |
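The per-million-token prices above turn into a per-request cost as (input tokens × input price + output tokens × output price) / 1,000,000. A minimal sketch, assuming the table's listed prices as illustrative inputs (not live pricing); the `ModelInfo` shape and the `estimateCostUsd` helper are hypothetical and not part of Relay's API, and the context-window check is a simplification that counts input and output against one limit.

```typescript
// Hypothetical catalog entry mirroring a few columns of the table above.
interface ModelInfo {
  id: string;
  contextTokens: number;
  inputUsdPerM: number; // USD per 1M input tokens
  outputUsdPerM: number; // USD per 1M output tokens
}

// Prices copied from the table; treat them as illustrative.
const catalog: ModelInfo[] = [
  { id: "openai/gpt-4.1", contextTokens: 1_000_000, inputUsdPerM: 5, outputUsdPerM: 15 },
  { id: "anthropic/claude-sonnet", contextTokens: 200_000, inputUsdPerM: 3, outputUsdPerM: 15 },
  { id: "google/gemini-2.5-pro", contextTokens: 1_000_000, inputUsdPerM: 1.25, outputUsdPerM: 5 },
  { id: "groq/llama-3.3-70b", contextTokens: 128_000, inputUsdPerM: 0.59, outputUsdPerM: 0.79 },
];

// cost = (input tokens × input price + output tokens × output price) / 1,000,000
function estimateCostUsd(model: ModelInfo, inputTokens: number, outputTokens: number): number {
  // Simplified check: input and output together must fit in the context window.
  if (inputTokens + outputTokens > model.contextTokens) {
    throw new Error(`${model.id}: request exceeds ${model.contextTokens}-token context window`);
  }
  return (inputTokens * model.inputUsdPerM + outputTokens * model.outputUsdPerM) / 1_000_000;
}

// Compare a 2,000-token prompt with a 500-token answer across the catalog.
for (const m of catalog) {
  console.log(`${m.id}: $${estimateCostUsd(m, 2_000, 500).toFixed(4)}`);
}
```

For that workload, Claude Sonnet 4 works out to (2,000 × 3 + 500 × 15) / 1,000,000 = $0.0135, while GPT-4.1 comes to $0.0175.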
Smart routing

Route every request intelligently.

Send simple summaries to low-cost models, code generation to premium models, and long documents to long-context models — automatically.
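The rule above (summaries to a low-cost model, code to a premium model, long documents to a long-context model) can be sketched as a classifier feeding a lookup table. This is a minimal illustration, not Relay's router: the keyword heuristic, the rough 4-characters-per-token estimate, the 100K-token threshold, and the task-to-model mapping are all assumptions chosen for the example.

```typescript
type Task = "summary" | "code" | "long_document" | "general";

// Hypothetical task-to-model mapping, using model IDs that appear on this page.
const modelForTask: Record<Task, string> = {
  summary: "groq/llama-3.3-70b", // low cost, low latency
  code: "openai/gpt-4.1", // premium model for code generation
  long_document: "google/gemini-2.5-pro", // 1M-token context
  general: "anthropic/claude-sonnet",
};

// Rough estimate: about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keyword heuristic; a production router would more likely use a trained classifier.
function classifyTask(prompt: string): Task {
  if (estimateTokens(prompt) > 100_000) return "long_document";
  const p = prompt.toLowerCase();
  if (/\b(code|function|refactor|debug|typescript|python)\b/.test(p)) return "code";
  if (/\b(summarize|summarise|summary)\b/.test(p)) return "summary";
  return "general";
}

function routeByTask(prompt: string): string {
  return modelForTask[classifyTask(prompt)];
}

console.log(routeByTask("Summarize this document")); // groq/llama-3.3-70b
console.log(routeByTask("Refactor this function to use async/await")); // openai/gpt-4.1
```

The length check runs first so that a very long document is sent to the long-context model even when it also mentions code or asks for a summary.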

Routing modes
Lowest cost
Lowest latency
Best quality
Balanced
Fallback chain
A/B testing
Task-based
Provider preference
Region-aware
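Several of the modes above come down to picking the candidate that minimizes a single score. A minimal sketch, assuming each candidate already has an estimated cost, a recent p50 latency, and a 0-to-1 quality score (for example from internal evals). The `Candidate` shape, the quality numbers, the mode identifiers, and the equal-weight `balanced` formula are hypothetical. Fallback chains, A/B testing, and region-aware routing are not covered here.

```typescript
type Mode = "lowest_cost" | "lowest_latency" | "best_quality" | "balanced";

interface Candidate {
  model: string;
  costUsd: number; // estimated cost of this request
  p50LatencyMs: number; // recent median latency
  quality: number; // 0..1, hypothetical eval score
}

// Return the candidate with the smallest score (the array must be non-empty).
function minBy(cands: Candidate[], score: (c: Candidate) => number): Candidate {
  return cands.reduce((best, c) => (score(c) < score(best) ? c : best));
}

function pickModel(cands: Candidate[], mode: Mode): Candidate {
  if (cands.length === 0) throw new Error("no candidate models");
  switch (mode) {
    case "lowest_cost":
      return minBy(cands, (c) => c.costUsd);
    case "lowest_latency":
      return minBy(cands, (c) => c.p50LatencyMs);
    case "best_quality":
      return minBy(cands, (c) => -c.quality);
    case "balanced": {
      // Scale cost and latency to 0..1 relative to the worst candidate, then add
      // the quality shortfall; the three terms are weighted equally.
      const maxCost = Math.max(...cands.map((c) => c.costUsd)) || 1;
      const maxLatency = Math.max(...cands.map((c) => c.p50LatencyMs)) || 1;
      return minBy(cands, (c) => c.costUsd / maxCost + c.p50LatencyMs / maxLatency + (1 - c.quality));
    }
    default:
      throw new Error(`unknown mode: ${String(mode)}`);
  }
}

// Illustrative numbers only (quality scores are made up for the example).
const candidates: Candidate[] = [
  { model: "openai/gpt-4.1", costUsd: 0.0175, p50LatencyMs: 412, quality: 0.9 },
  { model: "anthropic/claude-sonnet", costUsd: 0.0135, p50LatencyMs: 488, quality: 0.92 },
  { model: "groq/llama-3.3-70b", costUsd: 0.0016, p50LatencyMs: 94, quality: 0.7 },
];

console.log(pickModel(candidates, "best_quality").model); // anthropic/claude-sonnet
console.log(pickModel(candidates, "balanced").model); // groq/llama-3.3-70b
```

Normalizing against the worst candidate keeps dollars and milliseconds on comparable scales. A real balanced mode would probably expose the weights instead of fixing them at one third each.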
Decision flow
  1. User request
  2. Classify task
  3. Choose model
  4. Check budget
  5. Send to provider
  6. If fail, fallback
  7. Log result
  8. Optimize future route
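Steps 4 through 7 of the flow above (check budget, send, fall back on failure, log) can be sketched as a loop over an ordered model chain. This is a minimal synchronous sketch written for readability. It assumes steps 1 to 3 have already produced the chain and leaves step 8 out. The hypothetical `SendFn` stands in for real provider calls, which in practice would be asynchronous HTTP requests.

```typescript
// Hypothetical provider call: returns a result, or throws on failure (e.g. HTTP 429).
type SendFn = (model: string, prompt: string) => { text: string; costUsd: number };

interface LogEntry {
  model: string;
  ok: boolean;
  costUsd?: number;
  error?: string;
}

interface RouteResult {
  model: string;
  text: string;
  log: LogEntry[];
}

function routeRequest(
  prompt: string,
  chain: string[], // chosen model first, then fallbacks in order
  budgetRemainingUsd: number,
  send: SendFn,
): RouteResult {
  // Step 4: block the request outright if the budget is used up.
  if (budgetRemainingUsd <= 0) throw new Error("budget exhausted");
  const log: LogEntry[] = [];
  for (const model of chain) {
    try {
      // Step 5: send to the provider.
      const res = send(model, prompt);
      // Step 7: log the successful attempt with its cost.
      log.push({ model, ok: true, costUsd: res.costUsd });
      return { model, text: res.text, log };
    } catch (err) {
      // Step 6: record the failure and move on to the next model in the chain.
      log.push({ model, ok: false, error: String(err) });
    }
  }
  throw new Error(`all ${chain.length} models failed`);
}

// Simulate a rate-limited primary model: GPT-4.1 fails with 429, Claude Sonnet succeeds.
const send: SendFn = (model) => {
  if (model === "openai/gpt-4.1") throw new Error("429 rate limited");
  return { text: `answer from ${model}`, costUsd: 0.012 };
};

const result = routeRequest("Write a launch plan", ["openai/gpt-4.1", "anthropic/claude-sonnet"], 2180, send);
console.log(result.model); // anthropic/claude-sonnet
```

Each attempt, including failed ones, ends up in `log`. That record is what a request trace showing the fallback chain would be built from.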
Cost control

Stop surprise AI bills.

Per-key budgets, team caps, rate limits, and real-time alerts — built in.

Monthly spend: $4,820
Budget remaining: $2,180
Cost per request: $0.0041
Token usage: 182M
OpenAI: $1,820 / $3,000
Anthropic: $1,240 / $2,000
Google: $940 / $2,000
DeepSeek: $320 / $1,000
Built-in guardrails
  • Per-key budgets
  • Team budgets
  • Rate limits
  • Spend alerts
  • Model allowlists
  • Usage caps
  • Daily / monthly reporting
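Two of the guardrails above, rate limits and per-key budgets, can be enforced with a token bucket and a running spend total. This is a minimal in-memory sketch. It assumes a single gateway process and timestamps supplied by the caller, which keeps it easy to test. The `KeyGuard` class is hypothetical, since this page does not describe how Relay actually enforces these limits, and a multi-instance gateway would need shared state instead of a local object.

```typescript
type Rejection = "rate_limited" | "over_budget";

// Hypothetical per-key guard: token-bucket rate limit plus a spend cap for the period.
class KeyGuard {
  private tokens: number;
  private lastRefillMs: number;
  private spentUsd = 0;
  private readonly ratePerSec: number;
  private readonly burst: number;
  private readonly budgetUsd: number;

  constructor(ratePerSec: number, burst: number, budgetUsd: number, nowMs: number) {
    this.ratePerSec = ratePerSec;
    this.burst = burst;
    this.budgetUsd = budgetUsd;
    this.tokens = burst; // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  // Returns null if the request may proceed, otherwise the reason it is rejected.
  check(nowMs: number, estimatedCostUsd: number): Rejection | null {
    // Refill for the time elapsed since the last check, capped at the burst size.
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefillMs = nowMs;
    if (this.spentUsd + estimatedCostUsd > this.budgetUsd) return "over_budget";
    if (this.tokens < 1) return "rate_limited";
    this.tokens -= 1;
    return null;
  }

  // Record actual spend once the provider has responded.
  record(costUsd: number): void {
    this.spentUsd += costUsd;
  }

  // Fraction of the budget used, e.g. to fire spend alerts at chosen thresholds.
  spendFraction(): number {
    return this.budgetUsd > 0 ? this.spentUsd / this.budgetUsd : 1;
  }
}

const guard = new KeyGuard(1, 2, 1.0, Date.now()); // 1 request/s, bursts of 2, $1 budget
console.log(guard.check(Date.now(), 0.004) ?? "allowed"); // "allowed" on the first call
```

The budget check uses an estimated cost before the call, and `record` then adds the actual cost afterward. This means a single request can overshoot the cap slightly, but no new request is admitted once the cap is reached.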
Observability

See every request in production.

Trace every call: prompt, response, tokens, cost, latency, fallback chain.

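| Time | Key | Team | Model | Provider | Tokens | Cost | Latency | Status |
|---|---|---|---|---|---|---|---|---|
| 09:14:22 | sk_***a91 | Platform | openai/gpt-4.1 | OpenAI | 2,431 | $0.018 | 412ms | 200 |
| 09:14:18 | sk_***a91 | Platform | anthropic/claude-sonnet | Anthropic | 3,820 | $0.024 | 488ms | 200 |
| 09:14:11 | sk_***f24 | Research | groq/llama-3.3-70b | Groq | 812 | $0.0008 | 94ms | 200 |
| 09:13:58 | sk_***a91 | Platform | openai/gpt-4.1 | OpenAI | - | - | - | 429 |
| 09:13:58 | sk_***a91 | Platform | anthropic/claude-sonnet | Anthropic (fallback) | 1,914 | $0.012 | 521ms | 200 |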
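The p50/p95/p99 latency figures can be computed from traces like the rows above. This is a minimal sketch that uses the nearest-rank percentile definition. Other definitions interpolate between samples and give slightly different values. The `Trace` shape is hypothetical and mirrors only the table's Provider, Latency, and Status columns.

```typescript
interface Trace {
  provider: string;
  latencyMs: number | null; // null when no response came back (e.g. a 429)
  status: number;
}

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1]!;
}

type LatencySummary = { p50: number; p95: number; p99: number };

// Group successful requests by provider and report p50/p95/p99 latency.
function latencyByProvider(traces: Trace[]): Record<string, LatencySummary> {
  const byProvider: Record<string, number[]> = {};
  for (const t of traces) {
    const ms = t.latencyMs;
    if (t.status !== 200 || ms === null) continue; // skip failed requests
    if (!byProvider[t.provider]) byProvider[t.provider] = [];
    byProvider[t.provider]!.push(ms);
  }
  const out: Record<string, LatencySummary> = {};
  for (const provider of Object.keys(byProvider)) {
    const s = byProvider[provider]!;
    out[provider] = { p50: percentile(s, 50), p95: percentile(s, 95), p99: percentile(s, 99) };
  }
  return out;
}

// The five rows from the table above.
const traces: Trace[] = [
  { provider: "OpenAI", latencyMs: 412, status: 200 },
  { provider: "Anthropic", latencyMs: 488, status: 200 },
  { provider: "Groq", latencyMs: 94, status: 200 },
  { provider: "OpenAI", latencyMs: null, status: 429 },
  { provider: "Anthropic", latencyMs: 521, status: 200 },
];

console.log(latencyByProvider(traces)); // Anthropic: p50 488, p95 521
```

With only a handful of samples, p95 and p99 simply equal the slowest request. These percentiles become meaningful only over larger windows, such as the last hour of traffic.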
Developer experience

OpenAI-compatible. Migration in minutes.

Drop-in compatibility with the OpenAI SDK. Just change the base URL and API key.

index.ts
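import OpenAI from "openai"

// Use the standard OpenAI SDK, pointed at the Relay gateway
const client = new OpenAI({
  apiKey: process.env.RELAY_API_KEY, // Relay API key
  baseURL: "https://api.decks.ai/v1"
})

// "auto/best" lets Relay choose the model
const completion = await client.chat.completions.create({
  model: "auto/best",
  messages: [{ role: "user", content: "Write a launch plan" }]
})

console.log(completion.choices[0]?.message.content)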
Use cases

Built for every kind of AI team.

AI startups

Ship faster without locking into one provider.

SaaS companies

Add AI features with cost controls and team visibility.

Enterprises

Centralize governance, budgets, logs, and model access.

Agencies

Manage client usage, keys, and billing in one place.

Developers

Compare models and build with one API.

Loved by AI teams

From scrappy startups to enterprise platforms.

"We cut our LLM bill by 41% in the first week and our p95 latency dropped by 280ms."
Priya Shah · CTO, NebulaAI
"Relay is the layer we wished we'd built ourselves. Fallbacks alone are worth the price."
Marco Reyes · ML Platform Engineer, Lumen
"Finally one bill, one dashboard, one place to govern every model our product uses."
Aisha Khan · Product Lead, Northwind
"Migrated from our own routing layer in an afternoon. Just changed the base URL."
Daniel Cho · Founder, Builder Studio
Pricing

Simple, usage-aware pricing.

Start free. Scale with confidence.

Free
$0/month

For exploring Relay.

  • 10K requests/month
  • 3 connected providers
  • Basic analytics
  • Community support
Most popular
Pro
$29/month

For solo developers and small projects.

  • 1M requests/month
  • Unlimited providers
  • Smart routing
  • Fallback chains
  • API keys
  • Usage analytics
  • Email support
Team
$199/month

For growing engineering teams.

  • Team budgets
  • Role-based access
  • Advanced logs
  • Rate limits
  • Alerts
  • Priority support
Enterprise
Custom

For regulated and large-scale deployments.

  • SSO / SAML
  • Dedicated gateway
  • Private deployment
  • Custom routing policies
  • Audit logs
  • SLA
  • Security review

Start routing your AI traffic today.

No vendor lock-in. One bill. One API. Every model.

Built for production AI teams. Spend less. Ship faster. Fail less.