Architecture & privacy
ModelPilot is split so the sensitive data physically can't reach us. A local proxy classifies on your box; a hosted brain returns the routing decision from a category and numbers.
Request flow
your app ──▶ local proxy ──▶ Anthropic API ──▶ your app
│ (classifies locally; forwards with YOUR key)
▼
ModelPilot brain ◀── category + token counts only
(floors · economics · gate · entitlement · mode)
- Your app calls the proxy instead of
api.anthropic.com(one line: the base URL). - The proxy classifies the request locally into a task category and extracts numeric features (token estimates, flags) — no prompt text leaves the process.
- It asks the brain for a decision, sending only the category + features. The brain applies the per-category floors, switch economics, the confidence gate, your entitlement, and your mode.
- The proxy forwards the (possibly cheaper) request to Anthropic with your key and returns the
response — adding
x-modelpilot-decision/x-modelpilot-routedheaders. - If the brain is unreachable, the proxy fails open: the request goes straight to Anthropic, unrouted.
What leaves your box — and what never does
| Stays local | Sent to ModelPilot |
|---|---|
| Prompt text & system prompts | Task category (e.g. classification) |
| Model outputs / completions | Token-count estimates & boolean flags (has-tools, etc.) |
| Your Anthropic API key | Requested model + your deployment id / API key |
| Any customer data | Aggregate savings dollars & counts (for billing) |
Metering & billing
Your local ledger records counts and dollars (never prompt text). The client periodically reports the delta of realized savings to the console as aggregates; your bill each cycle is 20% of that. You can audit every figure in the dashboard, recomputed from the actual tokens each request used.
How routing protects quality
- Floors: each task category has a cheapest tier we believe is non-inferior; routing only ever moves down to that floor, never above the model you requested.
- Guards: structured-output and tool-using calls are never routed below a capable model.
- Session awareness: a short follow-up inherits the session's difficulty, so "why?" after a hard debugging exchange isn't sent to a weak model.
- Economics: a switch is only applied when it actually pays after cache effects.
- Proof: a held-out control arm (RCT) runs the baseline, and side-by-side non-inferiority checks validate the cheaper model on your own prompts.
The client is inspectable
The proxy is a thin, publishable client: it contains the commodity classifier and the forwarding logic, but none of the routing IP (floors, price table, economics) — those stay server-side. You can read exactly what it sends.
© 2026 ModelPilot · krethikram@gmail.com