Optional · Never required

Vrge Managed AI

If you'd rather not juggle API keys, pay a flat monthly fee and we run the AI for you. Hard token quotas. Zero overage bills. Cancel anytime — access runs through your current billing period, no renewal after that. The app works identically if you never subscribe.

Starter

Best for 1–2 users

For solo founders and partner duos testing managed AI.

$10 / month

500K tokens / month

Join the waitlist

500K AI tokens every month
Redact-by-default on every cloud call
Cancel anytime — no renewal, no retention nags
Falls back to BYO or Ollama at cap

Pro

Best for 3–10 active users

For small agencies and boutique consultancies running observer daily.

$25 / month

2M tokens / month

Join the waitlist

2M AI tokens every month
Per-user fair-use allocation for teams
Admin dashboard: top consumers, projected usage
Mid-cycle upgrade with prorated quota credit

Power

Best for 11–25 active users

Agency-scale for full observer deployment across the team.

$75 / month

8M tokens / month

Join the waitlist

8M AI tokens every month
Per-user fair-use allocation + admin controls
Priority routing to the fastest available model

More than 25 active users? Email us for custom pricing. Same flat-fee, hard-quota promise — sized for your team.

Hard quota · no overage
Redact-by-default
Cancel anytime · no renewal


How it works

When you subscribe, the desktop app routes AI calls through a Cloudflare Worker we operate at ai.getvrge.com. The Worker holds the upstream provider keys (Anthropic today; OpenAI + Google in v1.1), enforces your monthly quota in a managed database, and forwards the redacted request.

The proxy is the source of truth. It checks your quota and subscription status before calling any upstream model. If you're over your cap, the Worker refuses the call with HTTP 429 and your app falls back to BYO keys or Ollama for the rest of the cycle. There is no “approximate” enforcement; the quota is a number in the database.
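The check-then-forward order described above can be sketched roughly as follows. This is an illustration, not the Worker's actual code: the `Account` fields, the `gate` and `client_backend` helpers, and the `rejected_subscription` status are hypothetical names, and treating "used equals quota" as over the cap is an assumption. The sketch also assumes the client prefers BYO keys over Ollama when both are available.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical per-license record; field names are illustrative."""
    subscription_active: bool
    monthly_quota: int  # tokens allowed this billing cycle
    tokens_used: int    # tokens already consumed this cycle


def gate(account: Account) -> str:
    """Decide, before any upstream call, whether the request may be forwarded."""
    if not account.subscription_active:
        return "rejected_subscription"  # hypothetical status name
    if account.tokens_used >= account.monthly_quota:
        return "rejected_quota"  # the proxy would answer HTTP 429 here
    return "ok"


def client_backend(decision: str, has_byo_key: bool) -> str:
    """Client-side fallback: BYO key if one is configured, otherwise local Ollama."""
    if decision == "ok":
        return "managed"
    return "byo" if has_byo_key else "ollama"
```

Because the decision is made from the stored counters before anything is forwarded, a rejected call never reaches the upstream provider.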

Cheapest-capable routing. You see a flat token quota. Internally, the Worker picks the smallest model that can handle each job — Haiku for classification, Sonnet for extraction, Opus only for complex drafting. That spread between retail price and wholesale model cost is where the tier's gross margin lives, and it's why your quota goes further than if you'd just bought 2M tokens of Opus.
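As a rough picture of cheapest-capable routing, the mapping from task type to model tier might look like the sketch below. The tier names come from the paragraph above. The function name, the `complex_draft` flag, and the choices for `summarize` and for simple drafts are assumptions made for illustration, since the text does not say how those cases are routed.

```python
def pick_model_tier(task_type: str, complex_draft: bool = False) -> str:
    """Return the smallest model tier assumed able to handle the task."""
    if task_type == "classify":
        return "haiku"
    if task_type in ("extract", "summarize"):
        # "extract" -> Sonnet is from the text; "summarize" is an assumption.
        return "sonnet"
    if task_type == "draft":
        # Opus only for complex drafting; the simple-draft tier is an assumption.
        return "opus" if complex_draft else "sonnet"
    raise ValueError(f"unknown task type: {task_type!r}")
```

For example, `pick_model_tier("classify")` selects the Haiku tier, while `pick_model_tier("draft", complex_draft=True)` is the only path in this sketch that reaches Opus.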

What we log (metadata only)

The Managed AI proxy records one row per forwarded call with the following fields — and no others:

Your Vrge license key
Timestamp
Upstream provider (e.g. Anthropic)
Model selected (e.g. claude-haiku-4-5)
Input + output token counts
Task type (classify / extract / summarize / draft)
Status (ok / rejected_quota / error)
Redaction mode that applied

What we do not log: prompt bodies, completion text, schema definitions, or any user-identifying payload fields. This is a schema-level guarantee — the usage log table has no column for content, so no code path can leak it even by accident. The privacy invariant is enforced in the proxy test suite: every row written to the log is checked to have no content column. If someone adds one, a test fails and the deploy is blocked.
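A schema-level guarantee of this kind can be illustrated with a small sketch. SQLite stands in for the managed database here, and the table name, column names, and helper functions are hypothetical. Only the list of logged fields is taken from the text above.

```python
import sqlite3

# One column per field listed above; names are illustrative.
ALLOWED_COLUMNS = {
    "license_key", "timestamp", "provider", "model",
    "input_tokens", "output_tokens", "task_type", "status", "redaction_mode",
}


def create_usage_log(conn: sqlite3.Connection) -> None:
    """Create a metadata-only log table with no column that could hold content."""
    conn.execute(
        """
        CREATE TABLE usage_log (
            license_key    TEXT    NOT NULL,
            timestamp      TEXT    NOT NULL,
            provider       TEXT    NOT NULL,
            model          TEXT    NOT NULL,
            input_tokens   INTEGER NOT NULL,
            output_tokens  INTEGER NOT NULL,
            task_type      TEXT    NOT NULL,
            status         TEXT    NOT NULL,
            redaction_mode TEXT    NOT NULL
        )
        """
    )


def usage_log_columns(conn: sqlite3.Connection) -> set[str]:
    """Read the live schema; the second field of each PRAGMA row is the column name."""
    return {row[1] for row in conn.execute("PRAGMA table_info(usage_log)")}


def check_privacy_invariant(conn: sqlite3.Connection) -> None:
    """Fail if the table has gained any column beyond the allowed metadata set."""
    extra = usage_log_columns(conn) - ALLOWED_COLUMNS
    assert not extra, f"unexpected columns in usage_log: {sorted(extra)}"
```

A check like `check_privacy_invariant` could run in a test suite against the deployed schema, so that adding a `prompt` or `completion` column fails the build before it ships.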

For teams: fair-use inside the quota

The team concern with any shared AI quota is one heavy user burning the month's budget in three days. The Pro and Power tiers solve this with a per-user allocation layer:

Default fair share. Each user gets approximately (org_quota ÷ seats) × 1.5 as their soft cap — 1.5× the even split, so a normal user has headroom.
Admin override. The admin dashboard has a per-user slider. Boost your power user; throttle the intern.
80% warning. When a user reaches 80% of their allocation, they see a banner in their own app.
Top-consumers view. The admin sees exactly which teammate and which feature has burned the most tokens this month. No guessing.
Mid-cycle upgrade math. Upgrade Pro → Power on day 20 with 1.5M used and you get 8M − 1.5M = 6.5M remaining for the final 10 days, prorated billing. No gaming, no accidental starvation.
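The arithmetic behind the fair share, the 80% warning, and the mid-cycle upgrade can be worked through in a few lines. The function and parameter names are hypothetical, and integer percentages are used only to keep the math exact. The numbers in the usage note use the Pro and Power quotas from the tiers above with an assumed team of 10 seats.

```python
def soft_cap(org_quota: int, seats: int, headroom_percent: int = 150) -> int:
    """Per-user soft cap: the even split scaled by 1.5x (150%)."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    return org_quota * headroom_percent // (seats * 100)


def should_warn(used: int, cap: int, warn_percent: int = 80) -> bool:
    """True once a user's usage reaches 80% of their allocation."""
    return used * 100 >= cap * warn_percent


def quota_after_upgrade(new_quota: int, used_this_cycle: int) -> int:
    """Tokens left for the rest of the cycle after a mid-cycle upgrade."""
    return max(new_quota - used_this_cycle, 0)
```

With a 2M Pro quota and 10 seats, `soft_cap(2_000_000, 10)` gives 300,000 tokens per user, and the warning banner would appear at 240,000. For the upgrade example above, `quota_after_upgrade(8_000_000, 1_500_000)` gives 6,500,000.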

The six guardrails we won't break

These are public commitments. If a future feature violates one, that feature has a bug.

1. BYO keys stay free forever. Every feature in Vrge works with Ollama locally or with your own Anthropic/OpenAI/Google key. Managed AI is convenience, never a gate.
2. Hard quota, zero overage. When you hit your monthly cap, the proxy refuses the call. No 'reasonable usage' language. No surprise bill, ever.
3. Redact-by-default through our proxy. The client applies redaction before sending. The proxy verifies it was applied for non-manual sources and refuses the call otherwise.
4. No prompt/response logging. Metadata only. Schema-level invariant: there's no content column to leak.
5. Cancel anytime, no dark patterns. One click through the Lemon Squeezy customer portal. Access runs through the current billing period, no auto-renewal after cancel. No retention nags, pause-first flows, or win-back emails.
6. Live quota meter + cost preview. Settings → AI shows real-time usage. Manual actions whose estimated cost exceeds 5% of remaining quota prompt you first.
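The cost-preview rule in the last guardrail reduces to a single comparison. The sketch below is a guess at its shape: the function name and the way the cost estimate is obtained are assumptions, and "exceeds" is read as strictly greater than 5%.

```python
def needs_confirmation(estimated_tokens: int, remaining_quota: int,
                       threshold_percent: int = 5) -> bool:
    """True when a manual action's estimated cost exceeds 5% of the remaining quota."""
    # Integer math avoids float rounding right at the threshold.
    return estimated_tokens * 100 > remaining_quota * threshold_percent
```

With 1,000,000 tokens remaining, an action estimated at 50,000 tokens runs without a prompt under this reading, while one estimated at 50,001 tokens asks first.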

Privacy-max? Self-host the proxy.

The Managed AI proxy ships as a Docker image under the same license as the app. Legal, medical, airgapped, or regulated industries can run it on their own infrastructure with their own provider keys. Point the desktop client at your hostname instead of ai.getvrge.com and you have the same redaction layer, the same metadata-only logging, and zero dependency on us for the AI path.

See the self-hosting guide for the full runbook.

Questions?

Read the Managed AI FAQs, see the privacy policy for Managed AI, or email us.