Relay v3 · Intelligent routing + AI observability, now in public beta

One API for every AI model.

Relay intelligently routes, monitors, and optimizes AI traffic across every major LLM provider — so your team ships reliable AI without managing infrastructure.

OpenAI-compatible

100+ models

Smart failovers

Real-time observability

Enterprise-ready

Requests / 24h: 1.24M (+18.2%)

Cost saved via routing: $3,482 (+9.1%)

p95 latency: 684 ms (−42 ms)

[Chart: Request volume (series: Requests, Fallbacks)]

[Chart: Latency by provider (p50)]

Live routing flow

healthy

Request

Classify task

Choose model

Send to GPT-4.1

Log + bill

Fallback 1: GPT-4.1

Fallback 2: Claude Sonnet

Fallback 3: Gemini Pro

Top models

last 24h

openai/gpt-4.1	$1,241
anthropic/claude-sonnet	$980
google/gemini-2.5-pro	$540
groq/llama-3.3-70b	$84

route.ts

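// Assumes `relay` is an already-initialized Relay client.
const response = await relay.chat.completions.create({
  model: "auto/best",
  messages: [{ role: "user", content: "Summarize this document" }],
  routing: {
    // Routing hint: trade off cost against quality
    optimize_for: "cost_quality",
    // Fallback chain, listed in order
    fallback: ["openai/gpt-4.1", "anthropic/claude", "google/gemini"]
  }
})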

Connect every major provider

OpenAI

Anthropic

Google

LLM infrastructure gets messy fast.

Every team building serious AI hits the same wall.

Too many provider APIs

Every provider has different pricing, endpoints, SDKs, limits, and reliability patterns.

Costs are hard to control

AI spend grows quickly without routing, budgets, alerts, and per-team visibility.

Reliability breaks in production

Rate limits, outages, and latency spikes require automatic fallback logic.

Model choice changes constantly

The best model for your task today may not be the best model next month.

The solution

Relay gives your team one control plane for AI.

A single gateway between your application and every model — with routing, governance, and observability built in.

Your app

App / SDK / Edge

Relay gateway

Routing · Auth · Budgets · Logs · Fallbacks

Providers

OpenAI

Anthropic

Google

Find the right model for every task.

Compare context windows, pricing, and latency across providers in one place.

Model	Provider	Context	Input	Output	Latency	Strength	Use case
GPT-4.1	OpenAI	1M	$5/M	$15/M	Fast	Reasoning + coding	General + code
Claude Sonnet 4	Anthropic	200K	$3/M	$15/M	Fast	Writing + agents	Agents + writing
Gemini 2.5 Pro	Google	1M	$1.25/M	$5/M	Fast	Multimodal + long context	Long context, vision
Llama 4 405B	Meta	256K	$0.6/M	$0.9/M	Medium	Open weights	Customizable, self-host
DeepSeek R1	DeepSeek	128K	$0.27/M	$1.1/M	Medium	Reasoning	Math, logic, planning
Mistral Large 2	Mistral	128K	$2/M	$6/M	Fast	Enterprise EU	EU deployment
Grok 3	xAI	128K	$5/M	$15/M	Fast	Realtime + tools	Search-grounded
Llama 3.3 70B	Groq	128K	$0.59/M	$0.79/M	Ultra-fast	Lowest latency	Realtime UX
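
To make the pricing columns concrete, here is a minimal sketch that estimates what a single request might cost at the per-million-token prices listed above. The `PRICES` map and the `estimateCostUsd` helper are illustrative names, not part of Relay's API, and the token counts are made-up example values.

```typescript
// Per-million-token prices copied from the comparison table (USD).
type Price = { inputPerM: number; outputPerM: number };

const PRICES: Record<string, Price> = {
  "GPT-4.1": { inputPerM: 5, outputPerM: 15 },
  "Claude Sonnet 4": { inputPerM: 3, outputPerM: 15 },
  "Gemini 2.5 Pro": { inputPerM: 1.25, outputPerM: 5 },
  "DeepSeek R1": { inputPerM: 0.27, outputPerM: 1.1 },
};

// Hypothetical helper: cost = (tokens / 1,000,000) * price per million tokens.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`No price listed for ${model}`);
  return (
    (inputTokens / 1_000_000) * price.inputPerM +
    (outputTokens / 1_000_000) * price.outputPerM
  );
}

// Example request: 2,000 input tokens and 500 output tokens (illustrative numbers).
for (const model of Object.keys(PRICES)) {
  console.log(model, estimateCostUsd(model, 2000, 500).toFixed(5));
}
```

The same prompt can differ in cost by an order of magnitude depending on the model, which is the gap that routing is meant to exploit.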

Smart routing

Route every request intelligently.

Send simple summaries to low-cost models, code generation to premium models, and long documents to long-context models — automatically.

Routing modes

Lowest cost

Lowest latency

Best quality

Balanced

Fallback chain

A/B testing

Task-based

Provider preference

Region-aware
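
As a rough illustration of how two of these modes could differ, the sketch below picks a model from a candidate list by either price or latency. This is a simplified assumption about the idea, not Relay's actual routing logic: `Candidate`, `Mode`, `LATENCY_RANK`, and `pickModel` are hypothetical names, and the ranking of the latency labels is assumed.

```typescript
type Candidate = {
  name: string;
  inputPerM: number;  // USD per million input tokens
  outputPerM: number; // USD per million output tokens
  latency: "Ultra-fast" | "Fast" | "Medium";
};

type Mode = "lowest_cost" | "lowest_latency";

// Assumed ordering of the qualitative latency labels (lower rank = faster).
const LATENCY_RANK: Record<Candidate["latency"], number> = {
  "Ultra-fast": 0,
  Fast: 1,
  Medium: 2,
};

// Hypothetical selector: returns the candidate with the lowest score for the mode.
function pickModel(candidates: Candidate[], mode: Mode): Candidate {
  if (candidates.length === 0) throw new Error("No candidates to choose from");
  const score = (c: Candidate): number =>
    mode === "lowest_cost" ? c.inputPerM + c.outputPerM : LATENCY_RANK[c.latency];
  return candidates.reduce((best, c) => (score(c) < score(best) ? c : best));
}

// Candidate values taken from the model comparison table.
const candidates: Candidate[] = [
  { name: "GPT-4.1", inputPerM: 5, outputPerM: 15, latency: "Fast" },
  { name: "DeepSeek R1", inputPerM: 0.27, outputPerM: 1.1, latency: "Medium" },
  { name: "Llama 3.3 70B", inputPerM: 0.59, outputPerM: 0.79, latency: "Ultra-fast" },
];

console.log(pickModel(candidates, "lowest_cost").name);
console.log(pickModel(candidates, "lowest_latency").name);
```

A production router would weigh more signals (quality, region, provider health), but the shape is the same: each mode is a different scoring function over the same candidate set.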

Decision flow

1. User request
2. Classify task
3. Choose model
4. Check budget
5. Send to provider
6. If fail, fallback
7. Log result
8. Optimize future route
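
The decision flow above can be sketched in a few lines. This is a toy version under stated assumptions, not Relay's implementation: the keyword classifier, the `chooseChain` mapping, and the injected `send` function are all hypothetical, and step 8 (optimizing future routes) is left out.

```typescript
// A provider call is injected so the sketch runs without network access.
type Provider = (model: string, prompt: string) => Promise<string>;
type RouteLog = { model: string; ok: boolean; error?: string };
type Task = "code" | "summary";

// Step 2: toy keyword classifier (a real classifier would be far richer).
function classifyTask(prompt: string): Task {
  return /\b(function|class|bug|code)\b/i.test(prompt) ? "code" : "summary";
}

// Step 3: assumed mapping from task type to an ordered fallback chain.
function chooseChain(task: Task): string[] {
  return task === "code"
    ? ["openai/gpt-4.1", "anthropic/claude-sonnet"]
    : ["groq/llama-3.3-70b", "openai/gpt-4.1"];
}

async function route(
  prompt: string,
  send: Provider,
  budgetRemainingUsd: number,
): Promise<{ output: string; logs: RouteLog[] }> {
  // Step 4: refuse to send when no budget is left.
  if (budgetRemainingUsd <= 0) throw new Error("Budget exhausted");

  const logs: RouteLog[] = [];
  for (const model of chooseChain(classifyTask(prompt))) {
    try {
      const output = await send(model, prompt); // Step 5: send to provider
      logs.push({ model, ok: true });           // Step 7: log result
      return { output, logs };
    } catch (err) {
      // Step 6: record the failure and fall through to the next model.
      const error = err instanceof Error ? err.message : String(err);
      logs.push({ model, ok: false, error });
    }
  }
  throw new Error("All models in the fallback chain failed");
}

// Fake provider that simulates a rate limit on one model.
const flaky: Provider = async (model) => {
  if (model === "openai/gpt-4.1") throw new Error("429 rate limited");
  return `[${model}] ok`;
};

route("Fix this bug in my code", flaky, 10).then((r) => console.log(r.output, r.logs));
```

Injecting `send` keeps the routing logic testable: the same function can be exercised against a fake provider in tests and a real client in production.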

Cost control

Stop surprise AI bills.

Per-key budgets, team caps, rate limits, and real-time alerts — built in.

Monthly spend: $4,820

Budget remaining: $2,180

Cost per request: $0.0041

Token usage: 182M

OpenAI: $1,820 / $3,000

Anthropic: $1,240 / $2,000

Google: $940 / $2,000

DeepSeek: $320 / $1,000

Built-in guardrails

Per-key budgets
Team budgets
Rate limits
Spend alerts
Model allowlists
Usage caps
Daily / monthly reporting
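
One way to picture a budget guardrail is a check that runs before each request. The sketch below is an assumption about how such a check could work, not Relay's actual behavior: `Budget`, `Decision`, `checkBudget`, and the 80% alert threshold are hypothetical.

```typescript
type Budget = { limitUsd: number; spentUsd: number };
type Decision = { allowed: boolean; alert: boolean; remainingUsd: number };

// Hypothetical guardrail: block a request that would exceed the limit, and
// raise an alert once spend reaches a fraction of the limit (default 80%).
function checkBudget(budget: Budget, requestCostUsd: number, alertAt = 0.8): Decision {
  const projected = budget.spentUsd + requestCostUsd;
  const allowed = projected <= budget.limitUsd;
  // A blocked request does not add to spend.
  const spentAfter = allowed ? projected : budget.spentUsd;
  return {
    allowed,
    alert: spentAfter / budget.limitUsd >= alertAt,
    remainingUsd: budget.limitUsd - spentAfter,
  };
}

// Using the OpenAI figures shown above ($1,820 of $3,000) and the
// dashboard's $0.0041 cost per request.
console.log(checkBudget({ limitUsd: 3000, spentUsd: 1820 }, 0.0041));

// An illustrative key that is nearly out of budget.
console.log(checkBudget({ limitUsd: 100, spentUsd: 99.999 }, 0.0041));
```

Running the check before the provider call means an over-budget request fails fast and costs nothing.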

Observability

See every request in production.

Trace every call: prompt, response, tokens, cost, latency, fallback chain.

Time	Key	Team	Model	Provider	Tokens	Cost	Latency	Status
09:14:22	sk_***a91	Platform	openai/gpt-4.1	OpenAI	2,431	$0.018	412ms	200
09:14:18	sk_***a91	Platform	anthropic/claude-sonnet	Anthropic	3,820	$0.024	488ms	200
09:14:11	sk_***f24	Research	groq/llama-3.3-70b	Groq	812	$0.0008	94ms	200
09:13:58	sk_***a91	Platform	openai/gpt-4.1	OpenAI	—	—	—	429
09:13:58	sk_***a91	Platform	anthropic/claude-sonnet	Anthropic (fallback)	1,914	$0.012	521ms	200
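
As an example of what such request logs make possible, the sketch below totals the five sample rows from the table. The `RequestLog` shape and the `summarize` helper are illustrative assumptions, not a documented Relay export format.

```typescript
type RequestLog = {
  time: string;
  model: string;
  tokens: number | null;    // null when the request failed before completion
  costUsd: number | null;
  latencyMs: number | null;
  status: number;
};

// The five rows from the sample trace table.
const logs: RequestLog[] = [
  { time: "09:14:22", model: "openai/gpt-4.1", tokens: 2431, costUsd: 0.018, latencyMs: 412, status: 200 },
  { time: "09:14:18", model: "anthropic/claude-sonnet", tokens: 3820, costUsd: 0.024, latencyMs: 488, status: 200 },
  { time: "09:14:11", model: "groq/llama-3.3-70b", tokens: 812, costUsd: 0.0008, latencyMs: 94, status: 200 },
  { time: "09:13:58", model: "openai/gpt-4.1", tokens: null, costUsd: null, latencyMs: null, status: 429 },
  { time: "09:13:58", model: "anthropic/claude-sonnet", tokens: 1914, costUsd: 0.012, latencyMs: 521, status: 200 },
];

// Hypothetical roll-up: counts errors and aggregates the successful calls.
function summarize(rows: RequestLog[]) {
  const ok = rows.filter((r) => r.status === 200);
  const totalTokens = ok.reduce((sum, r) => sum + (r.tokens ?? 0), 0);
  const totalCostUsd = ok.reduce((sum, r) => sum + (r.costUsd ?? 0), 0);
  const latencySum = ok.reduce((sum, r) => sum + (r.latencyMs ?? 0), 0);
  return {
    requests: rows.length,
    errors: rows.length - ok.length,
    totalTokens,
    totalCostUsd,
    avgLatencyMs: ok.length > 0 ? latencySum / ok.length : 0,
  };
}

console.log(summarize(logs));
```

The 429 row followed by a successful Anthropic call at the same timestamp is the fallback chain in action: the failed attempt counts as an error but adds no tokens or cost.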

Developer experience

OpenAI-compatible. Migration in minutes.

Drop-in compatibility with the OpenAI SDK. Just change the base URL.

index.ts

import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.RELAY_API_KEY,
  baseURL: "https://api.decks.ai/v1"
})

const completion = await client.chat.completions.create({
  model: "auto/best",
  messages: [{ role: "user", content: "Write a launch plan" }]
})

Use cases

Built for every kind of AI team.

AI startups

Ship faster without locking into one provider.

SaaS companies

Add AI features with cost controls and team visibility.

Enterprises

Centralize governance, budgets, logs, and model access.

Agencies

Manage client usage, keys, and billing in one place.

Developers

Compare models and build with one API.

Loved by AI teams

From scrappy startups to enterprise platforms.

"We cut our LLM bill by 41% in the first week and our p95 latency dropped by 280ms."

Priya Shah, CTO, NebulaAI

"Relay is the layer we wished we'd built ourselves. Fallbacks alone are worth the price."

Marco Reyes, ML Platform Engineer, Lumen

"Finally one bill, one dashboard, one place to govern every model our product uses."

Aisha Khan, Product Lead, Northwind

"Migrated from our own routing layer in an afternoon. Just changed the base URL."

Daniel Cho, Founder, Builder Studio

Pricing

Simple, usage-aware pricing.

Start free. Scale with confidence.

Free

$0/month

For exploring Relay.

10K requests/month
3 connected providers
Basic analytics
Community support

Start routing your AI traffic today.

No vendor lock-in. One bill. One API. Every model.

Built for production AI teams. Spend less. Ship faster. Fail less.