For teams building on the Claude API

Cut your Claude bill — typically 20–40%.

ModelPilot is a drop-in proxy that routes every request to the cheapest model that's provably good enough — proven on your own traffic, never downgrading hard work. You pay 20% of what we save you. Nothing saved, nothing owed.

No card to start · your prompts never leave your box · cancel anytime
# 1. drop the proxy in front of the Claude API
$ pip install modelpilot-client
$ export MODELPILOT_DEPLOYMENT_ID=dep_…
$ modelpilot-client            # your key stays local

# 2. point your SDK at it; that's the whole change
from anthropic import Anthropic
client = Anthropic(base_url="http://127.0.0.1:8400")
✓ routing live — guidance mode
Saved this month
$1,240 · 31%
48,210 requests · 27,905 routed · 0 quality regressions
Baseline (all top-model): $4,000
With ModelPilot: $2,760
Works with the Claude API, no app rewrite · 20–40% typical savings · 0% false-downgrades on our golden set · 0 prompts stored by us
How it works

Live in five minutes. Savings on day one.

A drop-in proxy plus a hosted dashboard. Start in shadow mode to measure, then flip to autopilot when you trust the numbers — all from the console, no redeploy.

1

Drop in the proxy

Install the client and point your SDK's base URL at it. Your Anthropic key never leaves your machine; we only ever see a task category and numeric features such as token counts.

# one line in your app
base_url="http://127.0.0.1:8400"
2

It routes, you stay in control

Each request is classified and sent to the cheapest model that's good enough. Choose shadow (measure only), guidance (recommend), or autopilot (auto-route) — toggle anytime.

# classify → floor → economics
opus → haiku (non-inferior)
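As a rough illustration of how classify → floor → economics could combine with the three modes, here is a minimal Python sketch. The category names, floors, relative costs, and the `route` function are hypothetical stand-ins chosen for this example, not ModelPilot's actual routing logic. The guard is also simplified: tool-using calls simply stay on the model the app requested.

```python
from dataclasses import dataclass

# Capability ladder, cheapest first. Relative costs are made-up placeholders.
LADDER = ["haiku", "sonnet", "opus"]
RELATIVE_COST = {"haiku": 1.0, "sonnet": 3.0, "opus": 15.0}  # illustrative only

# Hypothetical per-category floors: the cheapest model allowed per task type.
FLOORS = {"summarize": "haiku", "extract": "sonnet", "debug": "opus"}


@dataclass
class Decision:
    requested: str    # model the app asked for
    recommended: str  # cheapest model that clears the floor
    served: str       # model the request is actually sent to


def route(category: str, requested: str, mode: str, uses_tools: bool = False) -> Decision:
    """Classify -> floor -> economics, then apply the deployment mode."""
    # Guard: tool-using calls and unknown categories stay on the requested model.
    floor = requested if uses_tools else FLOORS.get(category, requested)
    # Never route up: the recommendation is capped at the requested model.
    recommended = LADDER[min(LADDER.index(floor), LADDER.index(requested))]
    # Economics: only move if the recommended model is actually cheaper.
    if RELATIVE_COST[recommended] >= RELATIVE_COST[requested]:
        recommended = requested
    # Shadow and guidance never change what is served; autopilot does.
    served = recommended if mode == "autopilot" else requested
    return Decision(requested, recommended, served)
```

In this sketch, `route("summarize", "opus", "shadow")` records a recommendation of `haiku` but still serves `opus`, which is what lets shadow mode measure savings without changing any response.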
3

See the savings

Your dashboard shows realized savings by task type, baseline-vs-actual, and a non-inferiority proof rate — recalculated from the actual tokens each request used.

# billed only on this
realized savings × 20%
Proof, not promises

Savings you can audit — measured on your own traffic.

We don't ask you to trust a marketing number. Every dollar is recomputed from the real tokens each request consumed, with quality checked side-by-side.

Baseline vs. actual — last 30 days
Baseline (all top-model): $4,000
With ModelPilot: $2,760

Illustrative. Your dashboard shows your numbers, per prompt and per session.

20–40%
Typical bill reduction, measured on real customer traffic — not a best-case headline.
0%
False-downgrades on our golden set. Hard work (debugging, refactors, analysis) stays on the top model.
RCT holdout
A held-out control arm runs the baseline so savings are measured against a real counterfactual.
Side-by-side
A non-inferiority rate shows how often the cheaper model held up, judged on your own prompts.
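For readers who want to picture the holdout arm and the non-inferiority rate, here is one way they could be computed, sketched in Python. The hash-based assignment, the 10% holdout fraction, and the list-of-booleans verdict format are assumptions made for illustration; this page does not specify how ModelPilot implements either.

```python
import hashlib


def in_holdout(request_id: str, holdout_fraction: float = 0.10) -> bool:
    """Deterministically assign a request to the control arm (baseline model)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < holdout_fraction


def non_inferiority_rate(verdicts: list[bool]) -> float:
    """Share of side-by-side comparisons where the cheaper model held up."""
    if not verdicts:
        raise ValueError("no comparisons to score")
    return sum(verdicts) / len(verdicts)
```

Hashing the request ID keeps the assignment stable across retries, so the same request always lands in the same arm.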
Everything in one console

Built to earn a place in your request path.

🎚️

Shadow → guidance → autopilot

Adopt at your own pace. Measure first, recommend next, auto-route when you're convinced — switched server-side, no redeploy.

📊

Live savings dashboard

Realized savings by task type, baseline-vs-actual, lifetime totals, and your projected bill — recomputed from real tokens.

🔒

Prompts never leave your box

Classification happens locally. Only a category label and numeric features reach us — never prompt text, outputs, or your API key.

Quality, proven

Per-category floors, a universal guard that never downgrades structured-output or tool calls, and side-by-side non-inferiority checks.

🧠

Adapts to your traffic

It learns per-customer floors and rules from your own usage (judge-validated) and safely routes more traffic to cheaper models over time, always with your approval.

🛟

Fails open, always

If our routing brain is ever unreachable, traffic passes straight through to Anthropic, unrouted. We can never block your requests.

Privacy by architecture

Observability without shipping us your prompts.

Most gateways see everything you send. ModelPilot is built so the sensitive data physically can't reach us.

Your app: prompt + API key
Local proxy: classifies on your box
ModelPilot: category + token counts only
Anthropic: your key, your prompt
Prompt text, model outputs, and API keys never transit our servers. Our edge rejects any payload that contains them.
Savings are metered as aggregate dollars and counts, not content.
The thin client is publishable and inspectable: the routing IP stays server-side, your data stays with you.
Self-serve setup, per-deployment isolation, and the freedom to leave anytime. Your key, your traffic.
Pricing

You only pay for savings we actually deliver.

No seats, no per-request fees, no token markup. The incentive is honest: we make money only when we cut your bill.

Free 7-day trial · no card
20% of realized savings

No savings, no bill. Cancel anytime.

  • Drop-in proxy + hosted savings dashboard
  • Shadow, guidance & autopilot modes
  • Per-task savings, proof, and your live bill
  • Prompts never leave your box
  • Per-customer tuning that improves with use
Start your free trial
FAQ

Questions, answered.

Do you see my prompts or my customers' data?

No. Classification runs locally in the client; only a task category and numeric features (token estimates, flags) are sent to ModelPilot. Prompt text, model outputs, and your API key never reach our servers — and our endpoints reject any payload that contains them.
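To make the "reject any payload that contains them" idea concrete, here is a minimal allowlist check in Python. The field names, the category set, and `validate_telemetry` are hypothetical and exist only for this example; the real wire schema is not described on this page.

```python
# Hypothetical schema: a closed set of category labels plus numeric features.
KNOWN_CATEGORIES = {"summarize", "extract", "debug"}
ALLOWED_FIELDS = {"category", "input_tokens", "output_tokens", "has_tools"}


def validate_telemetry(payload: dict) -> dict:
    """Accept only a known category label plus numeric or boolean features."""
    extra = set(payload) - ALLOWED_FIELDS
    if extra:
        # Anything outside the allowlist (e.g. a "prompt" field) is refused.
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    if payload.get("category") not in KNOWN_CATEGORIES:
        # A closed label set means free text cannot ride in this field.
        raise ValueError("category must be a known label")
    for key, value in payload.items():
        if key != "category" and not isinstance(value, (int, float, bool)):
            raise ValueError(f"{key} must be numeric or boolean")
    return payload
```

An allowlist is the safer design here: a new, unexpected field is rejected by default rather than silently accepted.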

What happens if ModelPilot goes down?

The proxy fails open: if our routing brain is unreachable, your request is forwarded straight to the Claude API, unrouted. We can degrade your savings, never your uptime.
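The fail-open behavior can be pictured with a few lines of Python. This is a sketch under stated assumptions: `choose_model` and the `get_route` callable are hypothetical names, and the real client may handle timeouts and retries differently.

```python
from typing import Callable


def choose_model(requested: str, get_route: Callable[[], str]) -> str:
    """Return the routed model, or the requested one if routing fails."""
    try:
        return get_route()
    except Exception:
        # Fail open: any routing error (timeout, connection refused, bad
        # response) falls back to the model the app originally asked for.
        return requested
```

Because the fallback is the original request, a routing outage costs savings for that call but does not change what your app receives.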

How is the 20% bill computed?

We meter the realized savings on each routed request — baseline cost (what the original model would have cost) minus the actual cost — as aggregate dollars. Your bill each cycle is 20% of that, with a line-item breakdown in the console. If we save you nothing, you owe nothing.
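As a worked example of that arithmetic, here is the illustrative $4,000 baseline and $2,760 actual from this page run through a small Python function. `monthly_fee` is a hypothetical helper that works on cycle totals; actual metering happens per routed request as described above.

```python
from decimal import Decimal

FEE_RATE = Decimal("0.20")  # 20% of realized savings, per the pricing section


def monthly_fee(baseline: Decimal, actual: Decimal) -> Decimal:
    """Fee is 20% of realized savings; zero or negative savings owe nothing."""
    savings = max(baseline - actual, Decimal("0"))
    return (savings * FEE_RATE).quantize(Decimal("0.01"))


fee = monthly_fee(Decimal("4000"), Decimal("2760"))
print(fee)  # 248.00
```

Savings of $1,240 produce a $248 fee, so the net reduction in this illustrative case is $992.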

Will it downgrade my hard or quality-sensitive requests?

No. Routing only ever moves down to a model that's provably good enough for the task, with per-category floors and a universal guard that keeps structured-output and tool-using calls on a capable model. Our golden set shows 0% false-downgrades, and you can run side-by-side non-inferiority checks on your own prompts.

Which models does it route between?

The current Claude family — it moves a request down the capability ladder (e.g. Opus → Sonnet → Haiku) only when the cheaper model is good enough and the economics pay off after cache effects.

Can I keep full control?

Yes. Start in shadow mode (measure, change nothing), move to guidance (recommendations only), and flip to autopilot when you're ready — all from the dashboard, no redeploy. Set a minimum model and risk tolerance per deployment.

See what you'd save this week.

Start in shadow mode, watch the savings add up on real traffic, and only pay when we actually cut your bill.

No card to start · prompts never leave your box · cancel anytime