Open-source LLM router

One endpoint in front of every LLM provider

Pool your own API keys behind a single OpenAI- and Anthropic-compatible endpoint. Automatic failover, per-key rate limits, and real cost tracking — with sub-2 ms overhead.

Start routing — it's free Self-host in one command

Free while in beta · bring your own provider keys · Apache-2.0

relay overhead p50 < 2 ms

< 0 ms

p50 added latency

measured pod-to-pod in a live cluster

requests/sec per pod

scale horizontally from there

models in the catalog

open, versioned, extensible

100%

open source

Apache-2.0, nothing phones home

Routes to anything speaking OpenAI or Anthropic wire format

OpenAIAnthropicGoogle GeminiAWS BedrockVertex AIAzure OpenAIGroqMistralDeepSeekxAIOllamaTogetherFireworksOpenRouterOpenAIAnthropicGoogle GeminiAWS BedrockVertex AIAzure OpenAIGroqMistralDeepSeekxAIOllamaTogetherFireworksOpenRouter

Drop-in

Change one line.
Keep your SDK.

Relay speaks the OpenAI and Anthropic wire formats natively and translates between them on the fly. Point your existing SDK at Relay, authenticate with a relay key, and route to any model on any provider — no rewrites, no vendor lock-in.

Your provider keys never leave your control
Streaming passes straight through
Token counts and cost come from the provider, per request

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.r.wyolet.com/openai/v1",  # the only change
    api_key=os.environ["RELAY_KEY"],
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-5",  # any model, any provider
    messages=[{"role": "user", "content": "hello"}],
)

Why relay

The routing layer your LLM stack is missing

Everything between your code and the providers — pooling, limits, failover, and observability — handled in one place.

Key pooling & failover

Combine many keys, accounts, or providers into one pool. Relay load-balances across them and fails over before the first byte — per-account rate limits stop being your ceiling.

Disposable relay keys

Mint keys scoped to exactly the models and limits you choose. Hand them out freely: if one leaks, the damage is capped and your real provider keys are never exposed.

Cross-shape translation

Call Claude with the OpenAI SDK or GPT with the Anthropic SDK. Relay translates between wire formats through one canonical protocol — streaming included.

Cost & usage, per request

Every request is metered with real provider token counts and priced from an open catalog. Slice usage by key, model, or provider — no more guessing what agents cost.

Circuit breakers

Per-key breakers track auth failures, rate limits, and outages separately. A misbehaving upstream is quarantined automatically and healed out-of-band.

Built for the hot path

Go, allocation-conscious, one Redis round-trip per request. Sub-2 ms added latency at p50 and thousands of requests per second on a single pod.

How it works

From sign-in to first token in minutes

Add your provider keys

Sign in and add keys for the providers you already use — OpenAI, Anthropic, Bedrock, anything in the catalog. Stored AES-256-GCM encrypted, or referenced from your own environment. They are yours; relay just routes with them.

Set a policy, mint a relay key

A policy decides which models a key may reach and how hard it may push — rate limits, model allowlists, provider scopes. Mint as many relay keys as you want against it: one per app, per agent, per teammate.

Point your SDK at relay

Swap the base URL, keep everything else. Relay authenticates the key, picks a healthy upstream from your pool, streams the response back, and meters exact tokens and cost — all in under 2 ms of overhead.

Read the quickstart →

Self-host

Your infrastructure, your rules

The exact same relay that powers our cloud, as a single container — API, admin UI, database, and a pre-seeded catalog of 400+ models. Apache-2.0, Kubernetes-native, nothing phones home.

Star on GitHub Deployment guides →

$ docker run -p 8080:8080 -p 8081:8081 wyolet/relay:standalone> relay listening — inference :8080 · console :8081
> catalog seeded: 400+ models, 40+ hosts

Pricing

Free to route. Pay your providers, not us.

Relay never marks up tokens. Inference is billed by your providers on your keys — relay is the infrastructure in between.

Cloud

beta

Freewhile in beta

The hosted relay at r.wyolet.com. Bring your own provider keys.

Your own scoped workspace
Unlimited relay keys & policies
Usage and cost dashboards
Sign in with Google or email

Start routing

Self-host

Freeforever, Apache-2.0

Run it yourself — one container or a full Kubernetes deployment.

Every feature, no gates
Your hardware, your data
Helm chart & compose files
Community support on Discord

Read the quickstart

Enterprise

Customtalk to us

Managed hosting, dedicated instances, or support for your own.

Dedicated single-tenant relay
SSO / SAML
Priority support & SLAs
Deployment engineering

FAQ

Questions, answered

What does "bring your own keys" mean?

You add your existing provider API keys (OpenAI, Anthropic, Bedrock, …) to relay. Inference is billed by those providers directly to you — relay routes, pools, meters, and protects the keys, and never marks up a token.

How are my provider keys stored?

Encrypted with AES-256-GCM under a master key before they touch the database. When self-hosting you can also reference keys from your own environment or an external secret manager, so they are never persisted at all.

What happens when a provider goes down or rate-limits me?

Per-key circuit breakers classify the failure — auth, rate limit, server error, unreachable — and relay fails over to the next healthy key or provider in your pool before the first byte is sent back. Broken keys heal automatically out-of-band.

Does routing add latency?

Under 2 ms at the median and under 15 ms at p99 in a live cluster deployment. The hot path is allocation-conscious Go with a single Redis round-trip per request, and responses stream straight through.

Is the hosted cloud different from the open-source relay?

Same binary, same features. The cloud at r.wyolet.com is our production deployment of the Apache-2.0 code with multi-user workspaces enabled. If you outgrow it — or want data on your own metal — self-host and take your config with you.

Stop babysitting provider keys.
Start routing.

Open the console Read the docs

One endpoint in front of every LLM provider

Change one line.Keep your SDK.

The routing layer your LLM stack is missing

Key pooling & failover

Disposable relay keys

Cross-shape translation

Cost & usage, per request

Circuit breakers

Built for the hot path

From sign-in to first token in minutes

Add your provider keys

Set a policy, mint a relay key

Point your SDK at relay

Your infrastructure, your rules

Free to route. Pay your providers, not us.

Cloud

Self-host

Enterprise

Questions, answered

Stop babysitting provider keys.Start routing.

Change one line.
Keep your SDK.

Stop babysitting provider keys.
Start routing.