ModelPilot · free 7-day trial

Cut your Claude bill — typically 20–40%.
Measured on your own traffic, not promised.

A drop-in gateway that routes every request to the cheapest Claude model that's provably good enough — informed by the whole conversation, never downgrading hard work — and shows you exactly what you saved, per prompt and per session, recalculated from the actual tokens each request consumed.

Sign up, connect your gateway, and see savings in your dashboard. After the trial you pay just 20% of the savings we deliver — no savings, no bill.

# no API key, no spend — see it route in two minutes
$ pip install "modelpilot @ git+https://github.com/krethikram-sudo/modelpilot.git"
$ modelpilot demo --offline

  1 classification    SWITCH → claude-haiku-4-5   conf 0.85
  2 debugging         stay   → claude-opus-4-8    quality protected
  3 extraction        SWITCH → claude-haiku-4-5   conf 0.85
  ...
REALIZED savings: $0.07   POTENTIAL: $0.12  (48% of baseline)

Two modes. Start in guidance, switch when convinced.

Guidance · start here

Nothing about your traffic changes. ModelPilot scores every request and shows you, on the dashboard, exactly what autopilot would save — so you decide with evidence, at zero risk.

Autopilot · when ready

Confidence-gated routing with a built-in randomized holdout, so the savings are measured, not estimated — and an escalation valve whose costs count against us. Flip one flag when the dashboard convinces you.

The whole product is built to earn that switch: run in guidance for a day, open the dashboard, and you'll see the would-have-saved number, the side-by-side proof that the cheaper model's output holds up, and a one-flag path to capture it.

The recommended rollout

1 · Run in guidance

One base-URL change. Your app behaves exactly as before; ModelPilot just watches and scores. modelpilot gateway --mode guidance

2 · Read the dashboard

It shows what autopilot would save (per chat and cumulative), and a side-by-side of your real prompts — recommended model vs your standard model — so you can see the output is just as good.

3 · Switch to autopilot

When the evidence is convincing, flip to --mode autopilot. The randomized holdout keeps proving quality parity while you bank the savings.

Why the routing is trustworthy

The thing naive routers get wrong: prompt caches are model-scoped, so switching models mid-conversation can forfeit your cache discount and cost more. ModelPilot prices the cache break before every switch and vetoes any move that doesn't actually pay.

Hard problemHow ModelPilot handles it
Prompt caches are model-scoped — naive switching loses money Cache-aware economics price the rewrite before any mid-conversation switch and veto switches that don't pay
Does the router itself add latency? The decision is local heuristics: ~0.05 ms per request — negligible next to multi-second generation
Structured outputs / JSON schemas can break brittle parsers Requests with a tool or output-schema contract are never downgraded below Sonnet, so the response shape your code depends on is preserved
"Summarize" can be trivial or expert-level Calibrated on a golden dataset with a ≤2% false-downgrade target; when unsure, it doesn't touch your request
"why?" after a hard exchange looks trivial Session-context routing: follow-ups inherit the conversation's difficulty; mechanical tasks over existing content stay cheap
Vendors grade their own homework Randomized holdout + escalation costs deducted + side-by-side output comparisons you can run yourself: modelpilot compare --judge

It adapts to your traffic — and gets cheaper the more you use it

A generic router leaves money on the table on your workload. ModelPilot learns four things from your own traffic — every change earned by evidence on your prompts, never a blunt global default:

1 · Gates tune themselves

Each category's confidence gate loosens where routing has proven safe at volume and tightens the instant anything escalates — live, no restart.

2 · Floors drop when proven

It samples your own prompts, runs the next-cheaper model against the baseline, and lowers the floor only when the cheaper output is judged non-inferior on your data. modelpilot learn-floors

3 · Rules learn your domain

Domain phrasing ("redline this clause", "triage this ticket") a generic router dumps into a catch-all gets mapped to the right category, so it's actually optimized. modelpilot learn-rules

4 · Savings beyond routing

Finds uncached repeated context and oversized prompts in your ledger and tells you the dollars caching or trimming would save — it never rewrites your prompts. modelpilot prompt-audit

Every downgrade is gated by non-inferiority on your own prompts and protected by the structured-output guard, so "cheaper" never means "riskier". The longer it runs, the more it can safely route down — savings compound with use.

Day-one fit: starter packs for your segment

Don't wait for the learning loop to warm up. Drop in a starter pack for your space and routing fits your workload from the first request — then your own traffic refines it:

MODELPILOT_POLICY=packs/doc-extraction.json modelpilot gateway --mode guidance

Aggressive packs

doc-extraction, support, coding — route bulk extraction, classification, triage and simple codegen to the cheapest model that holds up (where the savings are largest).

Regulated packs

legal, healthcare — conservative by design: never below Sonnet, with a built-in compliance profile (model floor + cautious gate) for legal and clinical work.

Packs are a strong starting prior, not a black box: every guardrail still applies, and learn-floors confirms or pulls back each category on your own traffic over time.

Fits how your company actually buys

A deployment profile makes routing obey your constraints, so cost optimization never fights compliance or your contract:

Approved models only

Allow- and block-lists keep routing inside the models you've cleared. A blocked cheap model falls back to the next approved tier — never silently somewhere you didn't sanction.

Your quality floor

Set a min_model and nothing — not the heuristics, not the learned floors — ever routes below it.

Your negotiated rates

Drop in your committed-use / enterprise pricing and every number reflects your bill, not list price.

Your risk posture

One dial — conservative, balanced, or aggressive — sets how confident the router must be before it switches.

It runs in your infrastructure with your own API key: keys and prompt text stay on your side — the ledger stores token counts and routing metadata, never prompt content. Telemetry is opt-in and aggregate-only (modelpilot telemetry --preview shows the exact payload first — never prompts, outputs, or keys).

Already on Bedrock?

AWS Bedrock has a cost router — Intelligent Prompt Routing. If you're on the first-party Anthropic API it isn't available to you at all, and ModelPilot is the only cost router for your traffic. If you are on Bedrock, the honest comparison:

Bedrock IPRModelPilot
Models it routes Two older models per router (Claude 3.x era) The current lineup — Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5
Quality evidence Vendor prediction; no holdout, no output proof RCT holdout + both outputs side by side on your own prompts
Whose interest The vendor billing you also picks the model Customer-side; our only job is your bill going down
Lock-inAWS Bedrock only Your infra, first-party key, portable

Run the head-to-head on your own prompts: modelpilot compare --bedrock-router <your-router-arn> --judge.

The proof artifact — on your own traffic

This is what converts guidance users to autopilot. Run your captured prompts through both arms — ModelPilot's routing vs everything on the biggest model — and get one page with costs and outputs side by side, plus independent non-inferiority verdicts per prompt:

$ modelpilot compare --from-captures --judge
routed 14/20 prompts · saved 61% ($1.87) · non-inferior 100%
report: compare_report.html

Savings claims you can audit: actual token counts at list prices, the same prompts in both arms, and every output on the page for your own eyes — not just a judge's verdict. It gets better as you go: modelpilot tune learns from your traffic — loosening routing on categories that prove safe for your workload and backing off any that don't — so savings climb the more you use it.

Quickstart

# 1. Install
pip install "modelpilot @ git+https://github.com/krethikram-sudo/modelpilot.git"

# 2. Prove it to yourself (no key, no spend)
modelpilot demo --offline

# 3. Guidance mode: set your API key, point your app at it
#    (your free 7-day trial starts now — no license needed)
export ANTHROPIC_API_KEY=sk-ant-...
modelpilot gateway --mode guidance --port 8400
#    Python:  anthropic.Anthropic(base_url="http://localhost:8400")
#    any SDK: ANTHROPIC_BASE_URL=http://localhost:8400

# 4. Watch the dashboard — it shows what autopilot would save + the proof
open http://localhost:8400/modelpilot/dashboard

# 5. Convinced? One flag (free during your trial).
modelpilot gateway --mode autopilot --port 8400

# 6. After 7 days, keep it running with a license
export MODELPILOT_LICENSE="<your license token>"

Your first 7 days are free — the trial starts on first run, no key needed, full functionality; after that a license keeps it running. The offline demo is always free. Your API keys pass through and are never stored; the ledger holds token counts and routing metadata — no prompt text. Full details in the README.