Optional · Never required
Vrge Managed AI
If you'd rather not juggle API keys, pay a flat monthly fee and we run the AI for you. Hard token quotas. Zero overage bills. Cancel anytime: access continues through the end of your current billing period and does not renew after that. The app works identically if you never subscribe.
Starter
Best for 1–2 users
For solo founders and partner duos testing managed AI.
500K tokens / month
Join the waitlist
- 500K AI tokens every month
- Redact-by-default on every cloud call
- Cancel anytime — no renewal, no retention nags
- Falls back to BYO or Ollama at cap
Pro
Best for 3–10 active users
For small agencies and boutique consultancies running observer daily.
2M tokens / month
Join the waitlist
- 2M AI tokens every month
- Per-user fair-use allocation for teams
- Admin dashboard: top consumers, projected usage
- Mid-cycle upgrade with prorated quota credit
Power
Best for 11–25 active users
Agency-scale for full observer deployment across the team.
8M tokens / month
Join the waitlist
- 8M AI tokens every month
- Per-user fair-use allocation + admin controls
- Priority routing to the fastest available model
More than 25 active users? Email us for custom pricing. Same flat-fee, hard-quota promise — sized for your team.
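The tier ladder above can be expressed as a small lookup. This is a sketch, not Vrge's billing code: the `Tier` labels, `TIER_QUOTAS`, and `suggestTier` are hypothetical names, and the user ranges mirror the "Best for" guidance rather than any enforced seat limit.

```typescript
// Hypothetical labels for the plans listed on this page.
type Tier = "starter" | "pro" | "power" | "custom";

// Monthly token quotas as published above. "custom" has no fixed quota.
const TIER_QUOTAS: Record<Exclude<Tier, "custom">, number> = {
  starter: 500_000,
  pro: 2_000_000,
  power: 8_000_000,
};

// Map an active-user count to the plan the "Best for" guidance points at.
function suggestTier(activeUsers: number): Tier {
  if (!Number.isInteger(activeUsers) || activeUsers < 1) {
    throw new RangeError("activeUsers must be a positive integer");
  }
  if (activeUsers <= 2) return "starter";
  if (activeUsers <= 10) return "pro";
  if (activeUsers <= 25) return "power";
  return "custom"; // more than 25 active users: email for custom pricing
}

console.log(suggestTier(6), TIER_QUOTAS.pro); // pro 2000000
```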
How it works
When you subscribe, the desktop app routes AI calls through a Cloudflare Worker we operate at ai.getvrge.com. The Worker holds the upstream provider keys (Anthropic today; OpenAI + Google in v1.1), enforces your monthly quota in a managed database, and forwards the redacted request.
The proxy is the source of truth. It checks your quota and subscription status before calling any upstream model. If you're over your cap, the Worker refuses the call with HTTP 429, and your app falls back to BYO keys or Ollama for the rest of the cycle. There is no “approximate” enforcement: the quota is a number in the database.
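A minimal sketch of that gate, assuming the Worker has already loaded the license's subscription state from its database. `SubscriptionState`, `gateRequest`, and `fallbackBackend` are hypothetical names. The page specifies HTTP 429 for an exhausted quota; the 403 for an inactive subscription and the BYO-before-Ollama fallback order are assumptions.

```typescript
// Hypothetical per-license state read from the quota database.
interface SubscriptionState {
  active: boolean;     // is the subscription currently paid up?
  quotaTokens: number; // monthly cap for the tier
  usedTokens: number;  // tokens consumed so far this billing cycle
}

type GateResult =
  | { allowed: true }
  | { allowed: false; status: 403 | 429; reason: string };

// Runs before any upstream model is called, so a refusal costs no provider tokens.
function gateRequest(state: SubscriptionState): GateResult {
  if (!state.active) {
    // Status code is an assumption; the page only names 429 for quota refusals.
    return { allowed: false, status: 403, reason: "subscription_inactive" };
  }
  if (state.usedTokens >= state.quotaTokens) {
    // Hard cap: at or over quota, refuse outright. There is no overage path.
    return { allowed: false, status: 429, reason: "rejected_quota" };
  }
  return { allowed: true };
}

// Client side: after a 429, route the rest of the cycle away from Managed AI.
// Preferring a BYO key over local Ollama is an assumed ordering.
function fallbackBackend(hasByoKey: boolean): "byo" | "ollama" {
  return hasByoKey ? "byo" : "ollama";
}
```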
Cheapest-capable routing. You see a flat token quota. Internally, the Worker picks the smallest model that can handle each job — Haiku for classification, Sonnet for extraction, Opus only for complex drafting. That spread between retail price and wholesale model cost is where the tier's gross margin lives, and it's why your quota goes further than if you'd just bought 2M tokens of Opus.
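The routing rule can be sketched as a lookup from task type to model tier. This assumes the four task types from the usage log below; `pickModel` and the tier labels are illustrative rather than real provider model IDs, and sending `summarize` and non-complex drafts to Sonnet is a guess the page does not state.

```typescript
// Task types as they appear in the usage log: classify / extract / summarize / draft.
type TaskType = "classify" | "extract" | "summarize" | "draft";

// Tier labels only; a real deployment would map these to concrete model IDs.
type ModelTier = "haiku" | "sonnet" | "opus";

// Cheapest tier assumed capable of each non-drafting task.
const ROUTES: Record<Exclude<TaskType, "draft">, ModelTier> = {
  classify: "haiku",
  extract: "sonnet",
  summarize: "sonnet", // assumption: the page does not say where summaries go
};

function pickModel(task: TaskType, isComplex = false): ModelTier {
  if (task === "draft") {
    // Opus is reserved for complex drafting; simpler drafts stay on Sonnet (assumed).
    return isComplex ? "opus" : "sonnet";
  }
  return ROUTES[task];
}
```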
What we log (metadata only)
The Managed AI proxy records one row per forwarded call with the following fields — and no others:
- Your Vrge license key
- Timestamp
- Upstream provider (e.g. Anthropic)
- Model selected (e.g. claude-haiku-4-5)
- Input + output token counts
- Task type (classify / extract / summarize / draft)
- Status (ok / rejected_quota / error)
- Redaction mode that applied
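One way to make the metadata-only rule structural is to give the log row a type with exactly these fields and to build it by naming each one. A sketch under assumptions: the field names, `ForwardedCall`, and `toUsageRow` are hypothetical, and the ISO 8601 timestamp format is a guess.

```typescript
type TaskType = "classify" | "extract" | "summarize" | "draft";
type CallStatus = "ok" | "rejected_quota" | "error";

// One row per forwarded call. Hypothetical names for the fields listed above
// (input and output token counts are two fields); nothing holds content.
interface UsageLogRow {
  licenseKey: string;
  timestamp: string; // ISO 8601 (format assumed)
  provider: string;  // e.g. "anthropic"
  model: string;     // e.g. "claude-haiku-4-5"
  inputTokens: number;
  outputTokens: number;
  taskType: TaskType;
  status: CallStatus;
  redactionMode: string;
}

// Hypothetical in-memory view of a call, which does carry content.
interface ForwardedCall {
  licenseKey: string;
  provider: string;
  model: string;
  taskType: TaskType;
  status: CallStatus;
  redactionMode: string;
  prompt: string;
  completion: string;
  usage: { inputTokens: number; outputTokens: number };
}

// Copy each metadata field by name; spreading `call` would drag prompt and completion along.
function toUsageRow(call: ForwardedCall, now: Date = new Date()): UsageLogRow {
  return {
    licenseKey: call.licenseKey,
    timestamp: now.toISOString(),
    provider: call.provider,
    model: call.model,
    inputTokens: call.usage.inputTokens,
    outputTokens: call.usage.outputTokens,
    taskType: call.taskType,
    status: call.status,
    redactionMode: call.redactionMode,
  };
}
```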
What we do not log: prompt bodies, completion text, schema definitions, or any user-identifying payload fields. This is a schema-level guarantee: the usage log table has no column for content, so no code path can leak it, even by accident. The privacy invariant is enforced in the proxy test suite: every row written to the log is checked to contain only the metadata fields listed above. If someone adds a content column, a test fails and the deploy is blocked.
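The privacy test can be sketched as an allowlist check over the rows the proxy wrote. An allowlist, rather than a search for a column named like "content", means any unexpected field fails. `ALLOWED_FIELDS`, `disallowedFields`, and `assertMetadataOnly` are hypothetical, and the field names stand in for the metadata fields listed above rather than Vrge's actual schema.

```typescript
// Hypothetical column names standing in for the metadata fields listed above.
const ALLOWED_FIELDS: ReadonlySet<string> = new Set([
  "licenseKey",
  "timestamp",
  "provider",
  "model",
  "inputTokens",
  "outputTokens",
  "taskType",
  "status",
  "redactionMode",
]);

// Names of any fields on a row that are not in the metadata allowlist.
function disallowedFields(row: Record<string, unknown>): string[] {
  return Object.keys(row).filter((key) => !ALLOWED_FIELDS.has(key));
}

// Throwing fails the test; per the page, a failing test blocks the deploy.
function assertMetadataOnly(rows: ReadonlyArray<Record<string, unknown>>): void {
  for (const row of rows) {
    const extra = disallowedFields(row);
    if (extra.length > 0) {
      throw new Error(`usage log row has non-metadata fields: ${extra.join(", ")}`);
    }
  }
}
```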
For teams: fair-use inside the quota
The team concern with any shared AI quota is one heavy user burning the month's budget in three days. The Pro and Power tiers solve this with a per-user allocation layer:
- Default fair share. Each user gets a soft cap of approximately (org_quota ÷ seats) × 1.5, i.e. 1.5× the even split, so a normal user has headroom.
- Admin override. The admin dashboard has a per-user slider. Boost your power user; throttle the intern.
- 80% warning. When a user approaches their allocation, they see a banner in their own app.
- Top-consumers view. The admin sees exactly which teammate and which feature has burned the most tokens this month. No guessing.
- Mid-cycle upgrade math. Upgrade Pro → Power on day 20 with 1.5M used and you get 8M − 1.5M = 6.5M remaining for the final 10 days, with prorated billing. No gaming, no accidental starvation.
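The fair-use arithmetic above, as a sketch. `defaultSoftCap`, `effectiveCap`, `shouldWarn`, and `remainingAfterUpgrade` are hypothetical names; flooring to whole tokens and treating an admin override as a straight replacement for the default cap are assumptions.

```typescript
// Default soft cap: 1.5x the even split of the org quota, floored to whole tokens.
function defaultSoftCap(orgQuota: number, seats: number): number {
  if (seats < 1) throw new RangeError("seats must be at least 1");
  return Math.floor((orgQuota / seats) * 1.5);
}

// An admin-set value (the per-user slider) replaces the default when present.
function effectiveCap(orgQuota: number, seats: number, adminOverride?: number): number {
  return adminOverride ?? defaultSoftCap(orgQuota, seats);
}

// 80% warning, compared as used * 5 >= cap * 4 to avoid floating-point rounding.
function shouldWarn(usedTokens: number, capTokens: number): boolean {
  return usedTokens * 5 >= capTokens * 4;
}

// Mid-cycle upgrade: the new tier's quota minus what was already used this cycle.
function remainingAfterUpgrade(newQuota: number, usedSoFar: number): number {
  return Math.max(0, newQuota - usedSoFar);
}

console.log(defaultSoftCap(2_000_000, 10));               // 300000
console.log(remainingAfterUpgrade(8_000_000, 1_500_000)); // 6500000
```

Because each default cap is 1.5× the even split, per-user caps can add up to more than the org quota: on Pro with 10 seats, ten 300K caps total 3M against a 2M quota. The org-level hard quota enforced by the proxy still applies.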
The six guardrails we won't break
These are public commitments. If a future feature violates one, that feature has a bug.
1. BYO keys stay free forever. Every feature in Vrge works with Ollama locally or with your own Anthropic/OpenAI/Google key. Managed AI is convenience, never a gate.
2. Hard quota, zero overage. When you hit your monthly cap, the proxy refuses the call. No 'reasonable usage' language. No surprise bill, ever.
3. Redact-by-default through our proxy. The client applies redaction before sending. The proxy verifies it was applied for non-manual sources and refuses the call otherwise.
4. No prompt/response logging. Metadata only. Schema-level invariant: there's no content column to leak.
5. Cancel anytime, no dark patterns. One click through the Lemon Squeezy customer portal. Access runs through the current billing period, with no auto-renewal after you cancel. No retention nags, pause-first flows, or win-back emails.
6. Live quota meter + cost preview. Settings → AI shows real-time usage. Manual actions whose estimated cost exceeds 5% of remaining quota prompt you first.
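Guardrails 3 and 6 reduce to two small predicates. This is a sketch: `OutboundCall`, `passesRedactionCheck`, and `needsCostPrompt` are hypothetical names, and `"automatic"` is an assumed label for any non-manual source.

```typescript
// Hypothetical metadata the client sends alongside each request.
interface OutboundCall {
  source: "manual" | "automatic"; // "automatic" stands in for any non-manual source
  redactionApplied: boolean;
}

// Guardrail 3: the proxy verifies redaction only for non-manual sources
// and refuses the call if it was not applied.
function passesRedactionCheck(call: OutboundCall): boolean {
  return call.source === "manual" || call.redactionApplied;
}

// Guardrail 6: a manual action prompts first when its estimate exceeds
// 5% of remaining quota (estimate * 20 > remaining, kept in integers).
function needsCostPrompt(isManual: boolean, estimatedTokens: number, remainingTokens: number): boolean {
  return isManual && estimatedTokens * 20 > remainingTokens;
}
```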
Privacy-max? Self-host the proxy.
The Managed AI proxy ships as a Docker image under the same license as the app. Legal, medical, airgapped, or regulated industries can run it on their own infrastructure with their own provider keys. Point the desktop client at your hostname instead of ai.getvrge.com and you have the same redaction layer, the same metadata-only logging, and zero dependency on us for the AI path.
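Pointing the client at a self-hosted proxy amounts to swapping one base URL. A minimal sketch, assuming a single optional override setting; `resolveProxyBase` and the way the override is supplied are hypothetical, and `ai.internal.example` is a placeholder hostname.

```typescript
// Hosted default named on this page.
const DEFAULT_PROXY_BASE = "https://ai.getvrge.com";

// Base URL for Managed AI calls: the override if set, else the hosted proxy.
function resolveProxyBase(override?: string): string {
  const raw = (override ?? "").trim();
  if (raw === "") return DEFAULT_PROXY_BASE;
  if (!/^https?:\/\//i.test(raw)) {
    throw new Error(`proxy override must start with http:// or https://: ${raw}`);
  }
  return raw.replace(/\/+$/, ""); // drop trailing slashes so request paths join cleanly
}

console.log(resolveProxyBase("https://ai.internal.example/")); // https://ai.internal.example
```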
See the self-hosting guide for the full runbook.
Questions?
Read the Managed AI FAQs, see the privacy policy for Managed AI, or email us.