Open-source LLM router
One endpoint in front of every LLM provider
Pool your own API keys behind a single OpenAI- and Anthropic-compatible endpoint. Automatic failover, per-key rate limits, and real cost tracking, with under 2 ms of added latency at the median.
Free while in beta · bring your own provider keys · Apache-2.0
Routes to anything speaking OpenAI or Anthropic wire format
Drop-in
Change one line.
Keep your SDK.
Relay speaks the OpenAI and Anthropic wire formats natively and translates between them on the fly. Point your existing SDK at Relay, authenticate with a relay key, and route to any model on any provider — no rewrites, no vendor lock-in.
- Your provider keys never leave your control
- Streaming passes straight through
- Token counts and cost come from the provider, per request
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.r.wyolet.com/openai/v1",  # the only change
    api_key=os.environ["RELAY_KEY"],
)
resp = client.chat.completions.create(
    model="claude-sonnet-4-5",  # any model, any provider
    messages=[{"role": "user", "content": "hello"}],
)

Why relay
The routing layer your LLM stack is missing
Everything between your code and the providers — pooling, limits, failover, and observability — handled in one place.
Key pooling & failover
Combine many keys, accounts, or providers into one pool. Relay load-balances across them and fails over before the first byte — per-account rate limits stop being your ceiling.
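To make the pooling idea concrete, here is a minimal sketch of round-robin selection with failover. It is an illustration under stated assumptions, not Relay's actual implementation: `KeyPool`, `UpstreamError`, and `fake_upstream` are hypothetical names, and the "provider" is a local stand-in function rather than a network call.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable


class UpstreamError(Exception):
    """Raised by a send function when an upstream key cannot serve the request."""


@dataclass
class KeyPool:
    keys: list[str]
    _cursor: count = field(default_factory=count, repr=False)

    def candidates(self) -> list[str]:
        # Rotate the starting point on every call so load spreads across keys.
        start = next(self._cursor) % len(self.keys)
        return self.keys[start:] + self.keys[:start]

    def send(self, call: Callable[[str], str]) -> str:
        # Try each key in turn; the caller sees nothing until one key succeeds.
        last_error = None
        for key in self.candidates():
            try:
                return call(key)
            except UpstreamError as err:
                last_error = err
        raise UpstreamError("all keys in the pool failed") from last_error


def fake_upstream(key: str) -> str:
    # Hypothetical stand-in for a provider call: key-a is rate limited.
    if key == "key-a":
        raise UpstreamError("429 from key-a")
    return f"served by {key}"


pool = KeyPool(["key-a", "key-b", "key-c"])
first = pool.send(fake_upstream)   # starts at key-a, fails over to key-b
second = pool.send(fake_upstream)  # rotation starts at key-b this time
```

Because the failed attempt happens before any response is returned, the caller only ever sees the answer from the key that worked.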
Disposable relay keys
Mint keys scoped to exactly the models and limits you choose. Hand them out freely: if one leaks, the damage is capped and your real provider keys are never exposed.
Cross-shape translation
Call Claude with the OpenAI SDK or GPT with the Anthropic SDK. Relay translates between wire formats through one canonical protocol — streaming included.
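As a rough picture of what translating between wire formats involves, the sketch below maps an OpenAI chat-completions request body onto the Anthropic Messages shape. The function name `openai_to_anthropic` and the default of 1024 for `max_tokens` are assumptions for illustration; it handles plain text messages only and is not Relay's canonical protocol.

```python
def openai_to_anthropic(request: dict, default_max_tokens: int = 1024) -> dict:
    """Map an OpenAI chat-completions body onto an Anthropic Messages body."""
    system_parts = []
    messages = []
    for msg in request["messages"]:
        if msg["role"] == "system":
            # Anthropic takes the system prompt as a top-level field.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    body = {
        "model": request["model"],
        "messages": messages,
        # max_tokens is required by the Anthropic Messages API.
        "max_tokens": request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    # Copy options that exist under the same name in both formats.
    # Valid ranges can differ between providers; that is not handled here.
    for shared in ("temperature", "top_p", "stream"):
        if shared in request:
            body[shared] = request[shared]
    return body


translated = openai_to_anthropic({
    "model": "claude-sonnet-4-5",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "hello"},
    ],
})
```

A real translator also has to cover tool calls, multimodal content, and streaming events, which this sketch leaves out.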
Cost & usage, per request
Every request is metered with real provider token counts and priced from an open catalog. Slice usage by key, model, or provider — no more guessing what agents cost.
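The arithmetic behind per-request pricing is simple enough to sketch. The catalog entry, its prices, and the usage records below are made up for illustration and do not come from Relay's catalog; the point is only that cost follows from provider-reported token counts and a per-model price.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical catalog: USD per one million tokens. Not real prices.
CATALOG = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Price one request from its token counts and the catalog entry."""
    price = CATALOG[model]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / Decimal(1_000_000)


# Hypothetical usage records: (relay key, model, input tokens, output tokens).
records = [
    ("app", "example-model", 1200, 350),
    ("agent", "example-model", 1000, 0),
    ("app", "example-model", 0, 100),
]

# Slice spend by relay key.
by_key = defaultdict(Decimal)
for key, model, tokens_in, tokens_out in records:
    by_key[key] += request_cost(model, tokens_in, tokens_out)
```

Using `Decimal` rather than floats keeps small per-request amounts exact when they are summed across many requests.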
Circuit breakers
Per-key breakers track auth failures, rate limits, and outages separately. A misbehaving upstream is quarantined automatically and healed out-of-band.
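One way to picture per-key breakers that track failure kinds separately is the sketch below. The threshold of 3 consecutive failures, the 30 second cooldown, and the names `Breaker` and `KeyHealth` are assumptions chosen for the example, and the cooldown check stands in for whatever out-of-band healing Relay actually performs.

```python
from dataclasses import dataclass
from typing import Optional

KINDS = ("auth", "rate_limit", "outage")


@dataclass
class Breaker:
    threshold: int = 3        # consecutive failures before opening (assumed)
    cooldown: float = 30.0    # seconds the breaker stays open (assumed)
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # Once the cooldown has passed, let traffic probe the key again.
        return now - self.opened_at >= self.cooldown

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now


class KeyHealth:
    """One breaker per failure kind for a single upstream key."""

    def __init__(self) -> None:
        self.breakers = {kind: Breaker() for kind in KINDS}

    def healthy(self, now: float) -> bool:
        return all(b.allow(now) for b in self.breakers.values())


health = KeyHealth()
for t in (0.0, 1.0, 2.0):
    health.breakers["rate_limit"].record(ok=False, now=t)

quarantined = not health.healthy(now=3.0)   # breaker opened at t=2.0
recovered = health.healthy(now=40.0)        # cooldown has elapsed
```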
Built for the hot path
Go, allocation-conscious, one Redis round-trip per request. Sub-2 ms added latency at p50 and thousands of requests per second on a single pod.
How it works
From sign-in to first token in minutes
Add your provider keys
Sign in and add keys for the providers you already use: OpenAI, Anthropic, Bedrock, anything in the catalog. Keys are stored encrypted with AES-256-GCM, or referenced from your own environment. They are yours; Relay just routes with them.
Set a policy, mint a relay key
A policy decides which models a key may reach and how hard it may push — rate limits, model allowlists, provider scopes. Mint as many relay keys as you want against it: one per app, per agent, per teammate.
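The relationship between a policy and the relay keys minted against it can be sketched roughly as follows. The field names, the fixed one-minute window, and the `authorize` method are assumptions made for this example; Relay's real policy schema and rate limiter may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allowed_models: frozenset[str]
    allowed_providers: frozenset[str]
    requests_per_minute: int


@dataclass
class RelayKey:
    name: str
    policy: Policy
    window_start: float = 0.0
    used: int = 0

    def authorize(self, model: str, provider: str, now: float) -> tuple[bool, str]:
        if model not in self.policy.allowed_models:
            return False, "model not allowed"
        if provider not in self.policy.allowed_providers:
            return False, "provider not allowed"
        # Fixed one-minute window: reset the counter once the window has passed.
        if now - self.window_start >= 60:
            self.window_start, self.used = now, 0
        if self.used >= self.policy.requests_per_minute:
            return False, "rate limit exceeded"
        self.used += 1
        return True, "ok"


policy = Policy(
    allowed_models=frozenset({"claude-sonnet-4-5"}),
    allowed_providers=frozenset({"anthropic"}),
    requests_per_minute=2,
)
# Many keys can share one policy: one per app, per agent, per teammate.
agent_key = RelayKey("agent", policy)
app_key = RelayKey("app", policy)

results = [agent_key.authorize("claude-sonnet-4-5", "anthropic", now=0.0) for _ in range(3)]
```

Each key keeps its own counter, so one noisy agent exhausts only its own allowance while other keys under the same policy keep working.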
Point your SDK at relay
Swap the base URL, keep everything else. Relay authenticates the key, picks a healthy upstream from your pool, streams the response back, and meters exact tokens and cost, all with under 2 ms of added latency at the median.
Self-host
Your infrastructure, your rules
The exact same relay that powers our cloud, as a single container — API, admin UI, database, and a pre-seeded catalog of 400+ models. Apache-2.0, Kubernetes-native, nothing phones home.
> catalog seeded: 400+ models, 40+ hosts
Pricing
Free to route. Pay your providers, not us.
Relay never marks up tokens. Inference is billed by your providers on your keys; Relay is only the infrastructure in between.
Cloud
The hosted relay at r.wyolet.com, currently in beta. Bring your own provider keys.
- Your own scoped workspace
- Unlimited relay keys & policies
- Usage and cost dashboards
- Sign in with Google or email
Self-host
Run it yourself — one container or a full Kubernetes deployment.
- Every feature, no gates
- Your hardware, your data
- Helm chart & compose files
- Community support on Discord
Enterprise
Managed hosting, dedicated instances, or support for your own.
- Dedicated single-tenant relay
- SSO / SAML
- Priority support & SLAs
- Deployment engineering
FAQ
Questions, answered
What does "bring your own keys" mean?
You add your existing provider API keys (OpenAI, Anthropic, Bedrock, …) to relay. Inference is billed by those providers directly to you — relay routes, pools, meters, and protects the keys, and never marks up a token.
How are my provider keys stored?
Encrypted with AES-256-GCM under a master key before they touch the database. When self-hosting you can also reference keys from your own environment or an external secret manager, so they are never persisted at all.
What happens when a provider goes down or rate-limits me?
Per-key circuit breakers classify the failure — auth, rate limit, server error, unreachable — and relay fails over to the next healthy key or provider in your pool before the first byte is sent back. Broken keys heal automatically out-of-band.
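As an illustration of the classification step, the sketch below sorts an upstream result into the four failure classes named above based on its HTTP status. The exact status-to-class mapping and the extra `client_error` bucket are assumptions; Relay's real rules are not spelled out here.

```python
from typing import Optional


def classify(status: Optional[int]) -> str:
    """Return a failure class for an upstream HTTP status (None = no response)."""
    if status is None:
        return "unreachable"
    if status in (401, 403):
        return "auth"
    if status == 429:
        return "rate_limit"
    if status >= 500:
        return "server_error"
    # Anything else below 400 succeeded; other 4xx codes are the caller's problem
    # and would not be fixed by switching keys (assumed behavior).
    return "ok" if status < 400 else "client_error"


FAILOVER_CLASSES = {"unreachable", "auth", "rate_limit", "server_error"}


def should_fail_over(status: Optional[int]) -> bool:
    return classify(status) in FAILOVER_CLASSES
```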
Does routing add latency?
Under 2 ms at the median and under 15 ms at p99 in a live cluster deployment. The hot path is allocation-conscious Go with a single Redis round-trip per request, and responses stream straight through.
Is the hosted cloud different from the open-source relay?
Same binary, same features. The cloud at r.wyolet.com is our production deployment of the Apache-2.0 code with multi-user workspaces enabled. If you outgrow it — or want data on your own metal — self-host and take your config with you.
Stop babysitting provider keys.
Start routing.
Sign in, add a provider key, mint a relay key — your first request routes in minutes. Free while in beta.