Reduce AI spend without sacrificing quality.

Varsten helps your team spend less on AI by reusing safe repeat answers, sending each request to the right model, and checking that cost savings do not hurt quality.

app.varsten.ai/command-center
Command Center

Drop-in proxy

One line to integrate.

Keep your provider SDK. Point its base URL at Varsten and swap the key. Streaming, tool calls, and your existing code stay exactly as they are.

client.py
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.varsten.ai/v1",  # was https://api.openai.com/v1
    api_key=os.environ["VARSTEN_API_KEY"],  # your Varsten key
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True,
)

Response reuse

Repeat work does not need a new model call.

When the same request shows up again, Varsten can serve the stored response instead of paying for another completion. Near-duplicate matching can be enabled only on routes where it is safe; otherwise the request streams straight through untouched.

Routing & evals

A cheaper model ships only when the numbers say it's safe.

Before a route moves to a smaller model, Varsten replays your real traffic through both and grades them with a position-swapped judge. Savings are measured as an A/B difference against a concurrent holdback and reported with confidence intervals — not an estimate. If quality slips past tolerance, the route rolls back.

Reliability

Inline, but it can't take you down.

The data plane fails open. If anything upstream is unreachable, requests pass through to your original provider unchanged — you stop saving, you never stop serving. Strict read and total timeouts mean a hung upstream can't pin a connection.

Pricing

Pay only when Varsten proves savings.

Start with Free to monitor AI spend month by month and review savings recommendations. Move to Performance when you want Varsten to optimize traffic directly: 25% of verified savings, with you keeping the other 75%.

Free
$0 /mo

For teams that need AI spend monitoring and savings recommendations.

  • Ongoing AI spend monitoring
  • Month-by-month spend trends
  • Savings recommendations by route, model, and workload
  • Pricing and catalog trust checks
  • Read-only Proof dashboard

Savings proof

See the split before you pay.

The calculator shows the shared-savings economics. The proof rules explain what Varsten can bill and what stays off the invoice.

Interactive calculator

Estimate the 75/25 split.

Uses a conservative 20% savings assumption. Real billing uses verified savings only, never projections.

$5k$100k+
Gross savings at 20%$5,000/mo
Varsten fee at 25%$1,250/mo
You keep 75%$3,750/mo
Annualized net savings$45,000/yr

Verified savings

What counts as billable proof?

Varsten only charges when attribution can defend the delta.

  • Cached repeat responses where the avoided model call cost is known.
  • Batch routing measured as sync price minus batch price.
  • Routing and model swaps measured against a live holdback or approved eval gate.
  • Quality guardrails and rollback history attached to each optimization.
  • Recommendations, estimates, and customer-side changes are not billed.