ModelPilot is a drop-in proxy that routes every request to the cheapest model that's provably good enough — proven on your own traffic, never downgrading hard work. You pay 20% of what we save you. Nothing saved, nothing owed.
# 1. drop the proxy in front of the Claude API
$ pip install modelpilot-client
$ export MODELPILOT_DEPLOYMENT_ID=dep_…
$ modelpilot-client   # your key stays local
# 2. point your SDK at it; that's the whole change
client = Anthropic(base_url="http://127.0.0.1:8400")
✓ routing live: guidance mode
A drop-in proxy plus a hosted dashboard. Start in shadow mode to measure, then flip to autopilot when you trust the numbers — all from the console, no redeploy.
Install the client and point your SDK's base URL at it. Your Anthropic key never leaves your machine; we only ever see a task category and numeric features such as token counts.
# one line in your app
base_url="http://127.0.0.1:8400"
Each request is classified and sent to the cheapest model that's good enough. Choose shadow (measure only), guidance (recommend), or autopilot (auto-route) — toggle anytime.
# classify → floor → economics
opus → haiku (non-inferior)
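The classify, floor, economics steps can be sketched in a few lines of Python. This is an illustration only, not ModelPilot's actual router: the ladder order, the category names, the floor table, the prices, and the `choose_model` function are all assumptions invented for the example.

```python
from dataclasses import dataclass

# Capability ladder, cheapest first. Names are illustrative placeholders.
LADDER = ["haiku", "sonnet", "opus"]

# Hypothetical per-category floors: the cheapest model allowed per task type.
FLOORS = {"summarize": "haiku", "code_review": "sonnet", "planning": "opus"}

# Hypothetical prices in dollars per million input tokens (not real pricing).
PRICE_PER_MTOK = {"haiku": 1.0, "sonnet": 3.0, "opus": 15.0}


@dataclass
class Request:
    category: str          # label produced by the local classifier
    requested_model: str   # model the application asked for
    input_tokens: int
    uses_tools: bool = False
    structured_output: bool = False


def choose_model(req: Request) -> str:
    """Return the cheapest model that passes the guard and the category floor."""
    # Universal guard: never downgrade tool-using or structured-output calls.
    if req.uses_tools or req.structured_output:
        return req.requested_model

    # Unknown categories keep the requested model (no downgrade).
    floor = FLOORS.get(req.category, req.requested_model)

    # Routing only moves down: never pick a model above the requested one.
    requested_rank = LADDER.index(req.requested_model)
    candidate = LADDER[min(LADDER.index(floor), requested_rank)]

    # Economics: switch only if the candidate is actually cheaper.
    if PRICE_PER_MTOK[candidate] < PRICE_PER_MTOK[req.requested_model]:
        return candidate
    return req.requested_model
```

With these made-up tables, a `summarize` request sent to `opus` is routed to `haiku`, while the same request with `uses_tools=True` stays on `opus`.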
Your dashboard shows realized savings by task type, baseline-vs-actual, and a non-inferiority proof rate — recalculated from the actual tokens each request used.
# billed only on this
realized savings × 20%
We don't ask you to trust a marketing number. Every dollar is recomputed from the real tokens each request consumed, with quality checked side-by-side.
Illustrative. Your dashboard shows your numbers, per prompt and per session.
Adopt at your own pace. Measure first, recommend next, auto-route when you're convinced — switched server-side, no redeploy.
Realized savings by task type, baseline-vs-actual, lifetime totals, and your projected bill — recomputed from real tokens.
Classification happens locally. Only a category label and numeric features reach us — never prompt text, outputs, or your API key.
Per-category floors, a universal guard that never downgrades structured-output or tool calls, and side-by-side non-inferiority checks.
It learns per-customer floors and rules from your own usage (judge-validated) and, with your approval, safely routes more traffic to cheaper models over time.
If our brain is ever unreachable, traffic passes straight through to Anthropic, unrouted. We can never block your requests.
Most gateways see everything you send. ModelPilot is built so the sensitive data physically can't reach us.
No seats, no per-request fees, no token markup. The incentive is honest: we make money only when we cut your bill.
No savings, no bill. Cancel anytime.
Classification runs locally in the client; only a task category and numeric features (token estimates, flags) are sent to ModelPilot. Prompt text, model outputs, and your API key never reach our servers, and our endpoints reject any payload that contains them.
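A metadata-only payload of this kind might look like the sketch below. The field names, the category list, and both functions are hypothetical, chosen only to show the idea of an allowlist that leaves no room for free text.

```python
# Hypothetical allowlist of fields; the real wire format is not documented here.
ALLOWED_FIELDS = {"category", "input_tokens", "output_tokens", "uses_tools"}

# Hypothetical closed set of labels, so "category" cannot carry free text.
KNOWN_CATEGORIES = {"summarize", "code_review", "planning", "other"}


def build_payload(category: str, input_tokens: int,
                  output_tokens: int, uses_tools: bool) -> dict:
    """Build the metadata-only record that leaves the machine in this sketch."""
    return {
        "category": category,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "uses_tools": uses_tools,
    }


def validate_payload(payload: dict) -> None:
    """Raise ValueError for unknown fields, unknown labels, or non-numeric values."""
    unknown = set(payload) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    for key, value in payload.items():
        if key == "category":
            if value not in KNOWN_CATEGORIES:
                raise ValueError("category must be a known label")
        elif not isinstance(value, (bool, int, float)):
            raise ValueError(f"{key} must be numeric or boolean")
```

A payload carrying an extra `prompt` field, or a sentence in place of the category label, fails validation in this sketch.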
The proxy fails open: if our routing brain is unreachable, your request is forwarded straight to the Claude API, unrouted. We can degrade your savings, never your uptime.
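Fail-open behavior can be pictured as a fallback around the routing lookup. The sketch below is an assumption about the shape of that logic, not the client's real code; `route_with_fail_open` and the `get_recommendation` callback are hypothetical names.

```python
from typing import Callable


def route_with_fail_open(requested_model: str,
                         get_recommendation: Callable[[], str]) -> str:
    """Return the recommended model, or the requested model if routing fails."""
    try:
        return get_recommendation()
    except OSError:
        # OSError covers ConnectionError and TimeoutError in Python 3.
        # Fail open: the request proceeds unrouted on the model the app asked for.
        return requested_model
```

If the lookup raises `ConnectionError`, the caller simply gets back the model it originally requested, so the request is never blocked.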
We meter the realized savings on each routed request — baseline cost (what the original model would have cost) minus the actual cost — as aggregate dollars. Your bill each cycle is 20% of that, with a line-item breakdown in the console. If we save you nothing, you owe nothing.
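As a worked example of that arithmetic, here is a small Python sketch. The 20% rate comes from the text above; the dollar figures are invented, and summing savings across requests before clamping the total at zero is an assumption about how the aggregate is formed.

```python
from decimal import Decimal

FEE_RATE = Decimal("0.20")  # the 20% share stated above


def realized_savings(baseline_cost: Decimal, actual_cost: Decimal) -> Decimal:
    """Baseline cost (original model) minus actual cost (routed model)."""
    return baseline_cost - actual_cost


def cycle_bill(requests: list[tuple[Decimal, Decimal]]) -> Decimal:
    """Fee for a billing cycle: 20% of aggregate savings, never below zero."""
    total = sum(
        (realized_savings(baseline, actual) for baseline, actual in requests),
        Decimal("0"),
    )
    # Nothing saved, nothing owed (assumed to apply to the aggregate).
    return max(total, Decimal("0")) * FEE_RATE


# Invented figures: (baseline dollars, actual dollars) per routed request.
example = [
    (Decimal("0.150"), Decimal("0.030")),  # downgraded, saved 0.120
    (Decimal("0.090"), Decimal("0.090")),  # kept on the original model, saved 0
    (Decimal("0.060"), Decimal("0.010")),  # downgraded, saved 0.050
]
```

Here the aggregate savings are $0.170, so `cycle_bill(example)` comes to $0.034.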
Routing only ever moves down to a model that's provably good enough for the task, with per-category floors and a universal guard that keeps structured-output and tool-using calls on a capable model. Our golden set shows 0% false downgrades, and you can run side-by-side non-inferiority checks on your own prompts.
ModelPilot routes within the current Claude family: it moves a request down the capability ladder (e.g. Opus → Sonnet → Haiku) only when the cheaper model is good enough and the economics pay off after cache effects.
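To see why cache effects matter, consider a rough cost comparison in Python. Everything here is an assumption for illustration: the prices are invented, the 0.1 multiplier for cached input is a placeholder, cache-write costs are ignored, and the sketch assumes a prompt cache warmed on one model cannot be reused by another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float                # dollars per million input tokens (invented)
    output_per_mtok: float               # dollars per million output tokens (invented)
    cache_read_multiplier: float = 0.1   # assumed discount for cached input tokens


def request_cost(pricing: Pricing, input_tokens: int,
                 cached_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, with cached input billed at a discount."""
    uncached = input_tokens - cached_tokens
    cost = uncached * pricing.input_per_mtok
    cost += cached_tokens * pricing.input_per_mtok * pricing.cache_read_multiplier
    cost += output_tokens * pricing.output_per_mtok
    return cost / 1_000_000


def downgrade_pays_off(current: Pricing, cheaper: Pricing, input_tokens: int,
                       cached_tokens: int, output_tokens: int) -> bool:
    """Compare staying on a warm cache with switching to a cold, cheaper model."""
    stay = request_cost(current, input_tokens, cached_tokens, output_tokens)
    switch = request_cost(cheaper, input_tokens, 0, output_tokens)  # cold cache
    return switch < stay
```

With these invented prices, a request with 10,000 input tokens and no cache hit is cheaper on the smaller model, but the same request with 9,000 cached tokens and a short output can be cheaper to leave where it is.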
Start in shadow mode (measure, change nothing), move to guidance (recommendations only), and flip to autopilot when you're ready, all from the dashboard with no redeploy. Set a minimum model and risk tolerance per deployment.
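The difference between the three modes, and the effect of a per-deployment minimum model, can be sketched as follows. The `Mode` enum, the ladder, and `apply_mode` are hypothetical names for illustration; how guidance mode actually surfaces its recommendation is not specified here.

```python
from enum import Enum
from typing import Optional, Tuple

# Capability ladder, cheapest first. Names are illustrative placeholders.
LADDER = ["haiku", "sonnet", "opus"]


class Mode(Enum):
    SHADOW = "shadow"        # measure only, change nothing
    GUIDANCE = "guidance"    # recommend, but keep the requested model
    AUTOPILOT = "autopilot"  # route to the recommendation


def apply_mode(mode: Mode, requested_model: str, recommended_model: str,
               minimum_model: str) -> Tuple[str, Optional[str]]:
    """Return (model actually used, recommendation shown to the caller)."""
    # Respect the deployment's minimum model.
    rec = max(recommended_model, minimum_model, key=LADDER.index)
    # Routing only moves down, so never exceed the requested model.
    rec = min(rec, requested_model, key=LADDER.index)

    if mode is Mode.SHADOW:
        return requested_model, None
    if mode is Mode.GUIDANCE:
        return requested_model, rec
    return rec, rec
```

In this sketch, an `opus` request with a `haiku` recommendation and a `sonnet` minimum runs on `sonnet` in autopilot, and on `opus` in the other two modes.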
Start in shadow mode, watch the savings add up on real traffic, and only pay when we actually cut your bill.
No card to start · prompts never leave your box · cancel anytime