What is Vachi?
Vachi is an LLM gateway for AI agents. It sits between your AI agent (Claude Code, Cursor, OpenClaw, LangChain, custom scripts) and the model provider (Anthropic, OpenAI, Google), and applies token distillation to every request before it reaches the model.
Token distillation weighs how much each token contributes to the outcome and rebuilds the payload around what carries real weight. The model receives a leaner, sharper input and does the same work — for less. Then Vachi caches the distilled result so context your agent repeats across many calls isn't paid for twice.
The combined two-step mechanism is called adaptive context caching.
How it works
- Your agent points its base URL at Vachi and brings its own API key (BYOK).
- Vachi accepts the request, distills it and caches the distilled payload.
- Vachi forwards the distilled request to your chosen frontier model.
- The model responds; Vachi streams the response back to your agent.
- Net result: same model, leaner payload, smaller bill.
What Vachi supports
Frontier models
- Anthropic — Claude Opus 4.8 and all Anthropic models
- OpenAI — GPT 5.5 and all OpenAI models
- Google — Gemini 3.1 Pro and all Google models
AI tools
- Claude Code, Cursor, OpenClaw
- Cline, Aider, LangChain, LlamaIndex
- Any OpenAI-compatible client
Pricing
- Drop-in setup; bring your own model API key.
- Performance share — only pay when Vachi actually lowers your bill on a request.
- Early-stage credit available on signup.
Learn more