For teams building on the Claude API

Cut your Claude bill — pay only for savings we prove.

A drop-in proxy routes each request to the cheapest model that's good enough and proves it on your own traffic with a held-out control arm. You pay only a share of the savings we actually deliver — and your prompts never leave your environment.

No card to start · pay only for realized savings · quality proven on your own traffic · prompts stay in your environment
# 1. drop the proxy in front of the Claude API
$ pip install modelpilot-client
$ export MODELPILOT_DEPLOYMENT_ID=dep_…
$ modelpilot-client            # your key stays local

# 2. point your SDK at it — that's the whole change
from anthropic import Anthropic
client = Anthropic(base_url="http://127.0.0.1:8400")
✓ routing live — guidance mode
Saved this month (example)
$1,240 · 31%
48,210 requests · 27,905 routed · 0 quality regressions
Baseline (all top-model): $4,000
With ModelPilot: $2,760
Works with the Claude API, no app rewrite · 20–40% typical savings · 0% false-downgrades on our golden set · 0 prompts stored by us
How it works

Live in five minutes. Savings on day one.

A drop-in proxy plus a hosted dashboard. Start in guidance mode to see recommendations, then flip to autopilot when you trust the numbers — all from the console, no redeploy.

1

Drop in the proxy

Install the client and point your SDK's base URL at it. Your Anthropic key never leaves your machine — we only ever see a task category and token counts.

# one line in your app
base_url="http://127.0.0.1:8400"
2

It routes, you stay in control

Each request is classified and sent to the cheapest model that's good enough. Choose guidance (recommend, you stay in control) or autopilot (auto-route) — toggle anytime.

# classify → floor → economics
opus → haiku (non-inferior)
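Here is a rough sketch of how a classify → floor → economics decision could look. It is illustrative only, not ModelPilot's actual routing logic: the ladder order, the `FLOORS` table, the category names, and the prices are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Capability ladder from cheapest to most capable (assumed ordering).
LADDER = ["haiku", "sonnet", "opus"]

# Hypothetical per-category floors: the cheapest model considered good enough.
FLOORS = {"short_qa": "haiku", "extraction": "haiku", "debugging": "opus"}

# Placeholder blended prices in USD per million tokens, for illustration only.
PRICE_PER_MTOK = {"haiku": 1.0, "sonnet": 3.0, "opus": 15.0}


@dataclass
class Request:
    category: str          # label produced by the local classifier
    requested_model: str   # model the app asked for
    uses_tools: bool = False
    structured_output: bool = False


def route(req: Request) -> str:
    """Return the cheapest model at or above the category floor."""
    # Universal guard: never downgrade tool-using or structured-output calls.
    if req.uses_tools or req.structured_output:
        return req.requested_model
    # Models outside the ladder are passed through untouched.
    if req.requested_model not in LADDER:
        return req.requested_model
    # Unknown categories keep the requested model (conservative default).
    floor = FLOORS.get(req.category, req.requested_model)
    ceiling = LADDER.index(req.requested_model)
    start = min(LADDER.index(floor), ceiling)  # never route up past the request
    candidates = LADDER[start:ceiling + 1]
    # Economics: among the models that clear the floor, pick the cheapest.
    return min(candidates, key=PRICE_PER_MTOK.__getitem__)
```

With these placeholder tables, a `short_qa` request sent to `opus` is routed to `haiku`, while a `debugging` request, or any request that uses tools, stays on `opus`.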
3

See the savings

Your dashboard shows realized savings by task type, baseline-vs-actual, and a non-inferiority proof rate — recalculated from the actual tokens each request used.

# billed only on this
realized savings × 20%
Proof, not promises

Savings you can audit — measured on your own traffic.

We don't ask you to trust a marketing number. Every dollar is recomputed from the real tokens each request consumed, with quality checked side-by-side.

Baseline vs. actual — last 30 days
Baseline (all top-model): $4,000
With ModelPilot: $2,760

Illustrative. Your dashboard shows your numbers, per prompt and per session.
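To make "recomputed from the real tokens" concrete, here is a minimal sketch of a baseline-vs-actual calculation. The prices in `PRICES`, the `top` and `cheap` model labels, and the record fields are placeholders chosen for illustration, not real list prices or ModelPilot's data model.

```python
# Placeholder USD prices per million tokens (input, output); not real list prices.
PRICES = {"top": (15.0, 75.0), "cheap": (1.0, 5.0)}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the placeholder prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def realized_savings(requests: list[dict]) -> dict:
    """Baseline prices every request on the top model; actual uses the model served."""
    baseline = sum(cost("top", r["input_tokens"], r["output_tokens"]) for r in requests)
    actual = sum(cost(r["model"], r["input_tokens"], r["output_tokens"]) for r in requests)
    saved = baseline - actual
    return {
        "baseline": baseline,
        "actual": actual,
        "saved": saved,
        "pct": saved / baseline if baseline else 0.0,
    }
```

Because both totals are built from the same per-request token counts, the savings figure can be audited request by request rather than taken on trust.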

20–40%
Typical bill reduction, measured on real customer traffic — not a best-case headline.
0%
False-downgrades on our golden set. Hard work (debugging, refactors, analysis) stays on the top model.
RCT holdout
A held-out control arm runs the baseline so savings are measured against a real counterfactual.
Side-by-side
A non-inferiority rate shows how often the cheaper model held up, judged on your own prompts.
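One common way to build a held-out control arm is to hash a request identifier into a stable bucket. The sketch below assumes that approach; the 10% holdout size and the `request_id` field are illustrative assumptions, not documented ModelPilot settings.

```python
import hashlib


def in_control_arm(request_id: str, holdout_fraction: float = 0.10) -> bool:
    """Deterministically place a stable fraction of requests in the control arm."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < holdout_fraction


def non_inferiority_rate(verdicts: list[bool]) -> float:
    """Share of side-by-side comparisons where the cheaper model held up."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```

Requests in the control arm run on the baseline model, so their cost gives a measured counterfactual to compare routed traffic against.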
Everything in one console

Built to earn a place in your request path.

Guidance → autopilot

Adopt at your own pace. Start in guidance — we recommend the cheaper-but-good-enough model — then flip to autopilot when you're convinced, ramping from a slice of traffic to all of it. Switched server-side, no redeploy.
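The difference between the two modes, and the ramp from a slice of traffic to all of it, can be pictured with a small sketch. The function name, the `ramp_percent` setting, and the CRC-based bucketing are assumptions made for illustration.

```python
import zlib


def model_to_send(mode: str, ramp_percent: int, request_id: str,
                  requested: str, recommended: str) -> str:
    """Guidance never changes traffic; autopilot applies the recommendation
    to a stable slice of requests sized by ramp_percent (0-100)."""
    if mode != "autopilot":
        return requested  # guidance: the recommendation is only reported
    bucket = zlib.crc32(request_id.encode("utf-8")) % 100
    return recommended if bucket < ramp_percent else requested
```

Raising `ramp_percent` from, say, 10 to 100 widens the routed slice without changing which requests were already in it.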

Live savings dashboard

Realized savings by task type, baseline-vs-actual, lifetime totals, and your projected bill — recomputed from real tokens.

Prompts never leave your system

Classification happens locally. Only a category label and numeric features reach us — never prompt text, outputs, or your API key.
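As a rough picture of what "a category label and numeric features" could mean in practice, here is a sketch of a metadata payload with an allowlist check. The field names and category labels are hypothetical; the real client's payload may differ.

```python
# Hypothetical allowlist of fields the client may send upstream.
ALLOWED_FIELDS = {"category", "input_tokens", "output_tokens", "uses_tools"}
KNOWN_CATEGORIES = {"short_qa", "extraction", "classification", "other"}


def build_metadata(category: str, input_tokens: int, output_tokens: int,
                   uses_tools: bool) -> dict:
    """Build the only payload that leaves the machine: a label plus numbers."""
    return {
        "category": category,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "uses_tools": uses_tools,
    }


def validate_metadata(payload: dict) -> None:
    """Reject anything that could carry prompt text, outputs, or keys."""
    extra = set(payload) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    if payload.get("category") not in KNOWN_CATEGORIES:
        raise ValueError("category must be a known label, not free text")
    for name, value in payload.items():
        if name != "category" and not isinstance(value, (bool, int, float)):
            raise ValueError(f"{name} must be numeric or boolean")
```

An allowlist like this leaves no free-text field in which prompt content could travel, even by accident.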

Quality, proven

Per-category floors, a universal guard that never downgrades structured-output or tool calls, and side-by-side non-inferiority checks.

Adapts to your traffic

It learns per-customer floors and rules from your own usage (judge-validated) and safely gets cheaper over time, with your approval.

Fails open, always

If our routing brain is ever unreachable, traffic passes straight through to Anthropic, unrouted. We can never block your requests.
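Fail-open behavior usually comes down to a small pattern: treat any routing error as "no recommendation". The sketch below shows that pattern under the assumption that the routing call is an injectable function; `get_recommendation` is a hypothetical stand-in, not a real ModelPilot API.

```python
from typing import Callable


def choose_model(requested_model: str,
                 get_recommendation: Callable[[], str]) -> str:
    """Use the router's recommendation, but fall back to the requested model
    if the routing call raises for any reason (timeout, network error, bug)."""
    try:
        return get_recommendation()
    except Exception:
        # Fail open: any routing failure means the request goes out unrouted.
        return requested_model
```

The request itself is never blocked; the worst case is that it runs on the model your app originally asked for.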

Privacy by architecture

Observability without shipping us your prompts.

Most gateways see everything you send. ModelPilot is built so the sensitive data physically can't reach us — purpose-built for teams that can't let prompts leave their environment: healthcare, legal, and financial services.

Your app: prompt + API key
Local proxy: classifies on your system
ModelPilot: category + token counts only
Anthropic: your key, your prompt
Prompt text, model outputs, and API keys never transit our servers — enforced, and rejected at our edge if present.
Savings are metered as aggregate dollars and counts, not content.
The thin client is publishable and inspectable — the routing IP stays server-side, your data stays with you.
Self-serve, per-deployment isolation, and you can leave anytime — your key, your traffic.
What we optimize

Where we cut your bill — by work type.

We route the high-volume routine work to a cheaper model and keep your hard reasoning on the top model. Measured on your own traffic; you pay only on what we save.

Work type | Cut vs Opus
Short Q&A / lookups | ~80%
Data / field extraction | ~71%
Classification / triage | ~68%
Rewrite / reformat | ~73%
Translation | ~70%
Summaries (short) | ~60%
Simple code / SQL | ~80%

Hard reasoning (complex coding, debugging, math, agents, analysis) stays on the top model, so quality is protected. The percentages above are illustrative at list prices; your actual savings are measured on your traffic. Full breakdown + which teams it saves for →

How we compare

Different tool for a different job.

Routing to a cheaper model isn't new — Martian and OpenRouter do it well, and they beat us on breadth and maturity. We're built for a narrower job: cut your Claude bill, prove the savings on your own traffic, and bill you only for what we actually deliver — with prompts that never leave your environment. Here's the honest split.

 | ModelPilot | OpenRouter | Martian | Build it yourself
Cuts your Claude bill (routes to a cheaper good-enough model) | ✓ | ~possible (pick a cheaper model / auto-router), not its focus |  | ~if you build it
You pay only on realized savings | 20% PAYG · 15% on subscription tiers | ~5% of spend | metered usage | no fee, but you build + maintain it
Quality proven on your own traffic (held-out control arm) | non-inferiority RCT, per customer |  | headline % claims, no per-customer control arm | you'd build the evals
Prompts never leave your environment | classifies locally; your key, your prompt | proxies your traffic | routes on the prompt | your infra
Keep your own provider account (BYOK) | ✓ | supported; fee above the free tier | ~ |
Fails open if the optimizer is unreachable | straight through to Anthropic | it's in your request path | it's in your request path | ~depends on your build
Model breadth across providers | Claude-focused by design | 400+ models, 60+ providers | broad multi-provider routing | ~whatever you integrate
Mature & battle-tested at scale | early-stage | ~$1.3B raised; ~25T tokens/wk | ~seed-stage (~$9M, NEA); Accenture-backed |

How to choose: pick OpenRouter or Martian if you need many models across providers or a battle-tested vendor at scale. Pick ModelPilot if prompt privacy, pay-only-for-savings, and quality proven on your own traffic matter most — that combination is our wedge. See the full, honest comparison →

Pricing

Pay for savings, not your data.

You only pay for the savings we deliver — 20% on Pay-as-you-go, or a subscription + 15% on the optimization tiers — and your prompt data never leaves your system, even on those tiers. Three ways to go:

Pay-as-you-go

Drop-in routing to the cheapest good-enough model, with a live savings dashboard. No subscription.

20% of realized savings · no subscription
  • Drop-in proxy + savings dashboard
  • Guidance & autopilot modes
  • Per-task savings, proof & live % bill cut
  • Prompts never leave your system
Start free trial
Self-optimize (popular)

Routing tuned to your own traffic — per-category floors learned from your usage patterns (metadata only; your content never leaves your system).

$99/mo + 15% of savings
  • Everything in Pay-as-you-go
  • Per-customer tuning to your own traffic
  • Per-category floors learned from your workload
  • Sharper savings — your content never leaves your system
Start free trial
Managed (done for you)

We continuously tune routing to your traffic for you — metadata only; your content never leaves your system.

Subscription + 15% of savings · pricing coming soon
  • Everything in Self-optimize
  • We continuously tune it for you
  • Maximum safe savings, judge-validated
  • Priority support
Talk to us
Which tier should you pick?

Start on Pay-as-you-go: there is no subscription fee, and it proves the savings on your own bill. Self-optimize adds two things: a lower rate (15% vs 20%) and routing tuned to your own traffic, which safely lowers the per-category floors that the global defaults keep conservative, typically for an estimated 15–35% more savings. Rule of thumb: it pays for itself once you're saving more than ~$300–$500/month. Your exact uplift is measured on your own traffic with a held-out control arm and shown in your dashboard; we never guess it after the fact.
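If you want to sanity-check the rule of thumb, the break-even point can be worked out from the published rates. This sketch assumes that "15–35% more savings" means your realized savings grow by that fraction on Self-optimize; that reading, and the function itself, are our illustration rather than a billing formula.

```python
def break_even_savings(uplift: float, payg_rate: float = 0.20,
                       sub_rate: float = 0.15, subscription: float = 99.0) -> float:
    """Monthly Pay-as-you-go savings S at which both tiers leave you equally well off.

    Net benefit on Pay-as-you-go:  (1 - payg_rate) * S
    Net benefit on Self-optimize:  (1 - sub_rate) * S * (1 + uplift) - subscription
    """
    gain_per_dollar = (1 - sub_rate) * (1 + uplift) - (1 - payg_rate)
    return subscription / gain_per_dollar


for uplift in (0.15, 0.35):
    print(f"uplift {uplift:.0%}: break-even at ${break_even_savings(uplift):,.0f}/month")
```

At the 15% and 35% ends of the uplift range this works out to roughly $558 and $285 a month, the same ballpark as the ~$300–$500 rule of thumb.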

Free 7-day trial · no card to start · your prompt data never leaves your system on any tier.

FAQ

Questions, answered.

Do you see my prompts or my customers' data?

No. Classification runs locally in the client; only a task category and numeric features (token estimates, flags) are sent to ModelPilot. Prompt text, model outputs, and your API key never reach our servers — and our endpoints reject any payload that contains them.

What happens if ModelPilot goes down?

The proxy fails open: if our routing brain is unreachable, your request is forwarded straight to the Claude API, unrouted. We can degrade your savings, never your uptime.

How does pricing work?

You pay a share of realized savings, metered from the real tokens each request used (baseline minus actual), with a line-item breakdown in the console. The rate is 20% on Pay-as-you-go, or a subscription + 15% on the optimization tiers (Self-optimize $99/mo; Managed pricing coming soon), which let us tune routing to your own traffic (metadata only; your content never leaves your system). No savings, no bill.
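As a worked example, here is the fee arithmetic applied to the illustrative $4,000 baseline and $2,760 actual shown earlier on this page. The function is a sketch of the stated rates only. How a subscription is treated in a month with no savings is not spelled out here, so adding it as a flat amount is an assumption.

```python
def monthly_fee(baseline: float, actual: float,
                rate: float = 0.20, subscription: float = 0.0) -> float:
    """Fee = flat subscription (if any) + rate × realized savings."""
    savings = max(baseline - actual, 0.0)  # realized savings never go negative
    return round(subscription + rate * savings, 2)


payg = monthly_fee(4000.0, 2760.0)                                  # 20% of $1,240
self_opt = monthly_fee(4000.0, 2760.0, rate=0.15, subscription=99.0)
print(payg, self_opt)
```

On these example numbers, Pay-as-you-go comes to $248 and Self-optimize to $285 before counting any extra savings from tuning.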

Will it downgrade my hard or quality-sensitive requests?

No. Routing only ever moves down to a model that's provably good enough for the task, with per-category floors and a universal guard that keeps structured-output and tool-using calls on a capable model. Our golden set shows 0% false-downgrades, and you can run side-by-side non-inferiority checks on your own prompts.

Which models does it route between?

The current Claude family — it moves a request down the capability ladder (e.g. Opus → Sonnet → Haiku) only when the cheaper model is good enough and the economics pay off after cache effects.

Can I keep full control?

Yes. Start in guidance mode (recommendations only, traffic unchanged), and flip to autopilot when you're ready — all from the dashboard, no redeploy. Set a minimum model and risk tolerance per deployment.

See what you'd save this week.

Start in guidance mode, watch the savings add up on real traffic, and only pay when we actually cut your bill.

No card to start · prompts never leave your system · cancel anytime