Ben @ Grepture
Product Updates

Budgets: Hard Spend Caps on Your AI Traffic

Set per-key or per-label spend limits with email alerts at 50, 80, and 100 percent. The proxy rejects requests with HTTP 402 once a budget is exhausted.

What a budget is

A budget is a spend rule attached to a scope. The scope is either an API key or a label. The rule has a dollar limit and a window, either daily or monthly. As requests pass through the proxy, their cost is computed and added to the budget's running total. When that total reaches the limit, the proxy starts rejecting new matching requests with HTTP 402 Payment Required.

That's the whole feature. Two things happen as the cap fills up:

  1. Email alerts fire at 50%, 80%, and 100% of the limit, to the addresses you configure on the budget.
  2. Once the limit is hit, the proxy rejects new matching requests with a 402 response that names the budget, scope, and period.

Everything else — running totals, breakdowns, alert dedupe — is the dashboard wrapping around those two behaviors.
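On the client side, the practical consequence of a 402 is that retrying right away will not help: the budget stays exhausted until the window resets or someone raises the limit. Here is a minimal sketch of a retry filter that treats 402 as terminal. The function name and the set of retryable statuses are illustrative assumptions, not Grepture guidance.

```python
# Statuses where an immediate retry can plausibly succeed (an assumption for
# this sketch; tune it to your own client).
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def is_retryable(status: int) -> bool:
    """Return True only when retrying the same request might succeed."""
    # A 402 from the proxy means a budget is exhausted. Retrying cannot
    # succeed until the window resets or the limit is raised.
    if status == 402:
        return False
    return status in RETRYABLE_STATUSES
```

Surfacing the 402 to whoever owns the budget is usually more useful than backing off and trying again.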

Scoping by API key vs. label

Per-key budgets are the simple case: cap total spend on a specific Grepture API key. Useful if you have separate keys for production and staging, or one per customer.

Per-label budgets are the interesting case. If your application sends X-Grepture-Label: feature:summarizer on every summarizer request, you can cap that feature independently of the rest of your traffic. You're not limited by how many API keys you've issued — you're limited by how you classify your own requests. Labels you don't have a budget for are silently ignored. No implicit budget creation, no rules to debug.
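To make the label idea concrete, here is a rough sketch of tagging a request using only the Python standard library. The proxy URL is a placeholder and authentication headers are left out, since neither is specified here; only the two X-Grepture-* header names come from this post.

```python
import urllib.request

# Placeholder URL (assumption): use the proxy endpoint from your own setup.
PROXY_URL = "https://proxy.example.invalid/v1/messages"


def build_labeled_request(label: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST that carries a Grepture label."""
    return urllib.request.Request(
        PROXY_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Grepture-Target": "https://api.anthropic.com/v1/messages",
            # Every request with this label counts against the matching
            # per-label budget, if one exists.
            "X-Grepture-Label": label,
        },
    )


req = build_labeled_request("feature:summarizer", b"{}")
```

Setting the label in one shared helper like this keeps the classification consistent across call sites.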

You can run several budgets in parallel: a monthly cap on the whole API key, a daily cap on the runaway-summarizer feature, a tighter cap on the experimental endpoint a teammate is testing. The proxy checks every active budget that matches a request, and a single exhausted budget is enough to reject it.

How "exhausted" actually works

Two things to know about enforcement, both designed to keep the proxy fast:

The cap is enforced eventually-consistently. When a request comes in, the proxy checks Redis for the budget's current period spend. If it's already over, reject. If not, forward the request and increment after. So if you have many concurrent in-flight requests at the moment the cap is reached, a few may push the period total slightly over. The overshoot is bounded by concurrent_requests × max_single_request_cost — usually a couple of dollars on a serious workload, basically nothing.
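The overshoot bound is easier to see with numbers. The sketch below simulates a burst where every request is checked before any of them records its spend, using an in-memory counter as a stand-in for Redis. The class and function names are hypothetical, and amounts are in integer cents to avoid float rounding.

```python
class SpendCounter:
    """In-memory stand-in for the per-period Redis counter (assumption)."""

    def __init__(self, start_cents: int = 0) -> None:
        self.total_cents = start_cents

    def get(self) -> int:
        return self.total_cents

    def add(self, cost_cents: int) -> None:
        self.total_cents += cost_cents


def simulate_burst(limit_cents: int, spent_cents: int,
                   concurrent: int, cost_cents: int) -> int:
    """Return the period total after a burst of concurrent requests."""
    counter = SpendCounter(spent_cents)
    # Check phase: every in-flight request sees the same pre-burst total.
    admitted = sum(1 for _ in range(concurrent) if counter.get() < limit_cents)
    # Record phase: spend is added only after the requests were forwarded.
    for _ in range(admitted):
        counter.add(cost_cents)
    return counter.get()


# $100.00 cap, $99.50 already spent, 10 concurrent requests at $0.25 each.
final_total = simulate_burst(10_000, 9_950, concurrent=10, cost_cents=25)
overshoot = final_total - 10_000  # 200 cents, under the 10 x 25 = 250 bound
```

Once the counter is at or above the limit, the check phase admits nothing, so the overshoot does not compound across bursts.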

Budgets count spend from creation forward. If you create a $5 monthly cap on the 15th of the month, the budget tracks what gets spent from the 15th onward, not what you already burned earlier that month. This matches what you probably mean by "set a cap right now." Next calendar period, the budget resets and counts from the start of that period like you'd expect. The budget creation form shows you the period-to-date spend on the chosen scope, so you can size the cap with that context in hand.
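One way to picture the "from creation forward" rule: the counting window starts at whichever is later, the start of the current calendar period or the moment the budget was created. The sketch below assumes UTC calendar boundaries, which this post does not specify, and the function names are made up for illustration.

```python
from datetime import datetime, timezone


def period_start(now: datetime, window: str) -> datetime:
    """Start of the current calendar day or month (UTC is an assumption)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if window == "daily":
        return midnight
    if window == "monthly":
        return midnight.replace(day=1)
    raise ValueError(f"unknown window: {window}")


def counting_from(now: datetime, window: str, created_at: datetime) -> datetime:
    """Spend is counted from the later of period start and budget creation."""
    return max(period_start(now, window), created_at)


created = datetime(2025, 3, 15, 9, 30, tzinfo=timezone.utc)
mid_march = counting_from(datetime(2025, 3, 20, tzinfo=timezone.utc), "monthly", created)
early_april = counting_from(datetime(2025, 4, 3, 12, tzinfo=timezone.utc), "monthly", created)
```

In the first period the window starts at creation time (March 15); from April onward it starts on the first of the month.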

How "matching" actually works

The proxy keeps a 60-second cache of your team's active budgets and re-fetches when you create, edit, or delete one. On every request it:

  1. Loads your team's budgets from the cache.
  2. Filters to the ones that match this request — same api_settings_id for per-key budgets, same label for per-label budgets.
  3. For each match, checks the Redis spend counter against the limit.
  4. Rejects with 402 if any are exhausted; otherwise forwards and records spend after.
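The four steps above can be sketched roughly as follows. This is an illustration, not the proxy's actual code: the Budget shape, the field names other than api_settings_id, and the plain dict standing in for the Redis counters are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Budget:
    id: str
    scope: str        # "key" or "label"
    scope_value: str  # an api_settings_id, or a label string
    limit_cents: int


def matching_budgets(budgets: list[Budget], api_settings_id: str,
                     label: Optional[str]) -> list[Budget]:
    """Step 2: keep only budgets whose scope matches this request."""
    return [
        b for b in budgets
        if (b.scope == "key" and b.scope_value == api_settings_id)
        or (b.scope == "label" and label is not None and b.scope_value == label)
    ]


def first_exhausted(matches: list[Budget],
                    spend_cents: dict[str, int]) -> Optional[Budget]:
    """Steps 3 and 4: return an exhausted budget, or None to forward."""
    for b in matches:
        if spend_cents.get(b.id, 0) >= b.limit_cents:
            return b
    return None


budgets = [
    Budget("b1", "key", "key-prod", 200_000),
    Budget("b2", "label", "feature:experimental", 5_000),
]
matches = matching_budgets(budgets, "key-prod", "feature:experimental")
blocked = first_exhausted(matches, {"b1": 120_000, "b2": 5_000})
```

Here the per-key budget still has headroom, but the per-label budget is at its limit, so the request would be rejected.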

If Redis is unavailable, the proxy fails open. Your traffic flows; budgets stop enforcing temporarily; alerts continue to work on the next cron tick. Same behavior as the existing rate-limit and quota systems.
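Failing open comes down to treating "could not read the counter" as "not exhausted". A minimal sketch, in which the built-in ConnectionError stands in for whatever exception a real Redis client raises and the function names are hypothetical:

```python
from typing import Callable


def should_reject(read_spend_cents: Callable[[], int], limit_cents: int) -> bool:
    """Return True only when the counter is readable AND at or over the limit."""
    try:
        spent = read_spend_cents()
    except ConnectionError:
        # Fail open: an unreachable counter must not block traffic.
        return False
    return spent >= limit_cents


def redis_down() -> int:
    """Simulates an unreachable counter store."""
    raise ConnectionError("counter store unreachable")
```

The trade-off is deliberate: a counter outage costs you enforcement for a while, not availability.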

Alerts at 50%, 80%, 100%

An app-layer cron runs every five minutes. For each enabled budget, it computes the current period spend from your traffic logs (so cost includes provider-specific things like Anthropic prompt-cache reads at 10% and OpenAI cached input at 50%), checks which thresholds have been crossed, and sends one email per newly-crossed threshold.
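The threshold check itself is simple arithmetic. A sketch under stated assumptions: amounts are in integer cents, the helper names are invented for illustration, and the set of already-sent thresholds would come from wherever alert history is stored.

```python
THRESHOLDS = (50, 80, 100)  # percent of the limit, as described in this post


def crossed_thresholds(spent_cents: int, limit_cents: int) -> list[int]:
    """Thresholds the current period spend has reached or passed."""
    # Cross-multiply instead of dividing, so no float rounding is involved.
    return [t for t in THRESHOLDS if spent_cents * 100 >= t * limit_cents]


def newly_crossed(spent_cents: int, limit_cents: int,
                  already_sent: set[int]) -> list[int]:
    """Thresholds that still need an email this period."""
    return [t for t in crossed_thresholds(spent_cents, limit_cents)
            if t not in already_sent]


# $85 spent against a $100 limit, with the 50% email already sent.
pending = newly_crossed(8_500, 10_000, already_sent={50})
```

If spend jumps from below 50% to above 80% between two cron runs, both thresholds come back as pending in the same run.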

Dedupe happens at the database level, through a unique (budget_id, period_key, threshold) primary key. Cron re-runs, restarts, transient errors, and retried sends cannot produce a duplicate email. When the period rolls over (a new day, a new month), the budget resets and the threshold emails fire again as you cross them in the new period.
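Here is a small sketch of how a composite primary key gives you this guarantee, using SQLite in memory as a stand-in for the real database. The table name, the period_key format, and the helper function are assumptions; only the three key columns come from this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE budget_alerts (
        budget_id  TEXT    NOT NULL,
        period_key TEXT    NOT NULL,
        threshold  INTEGER NOT NULL,
        PRIMARY KEY (budget_id, period_key, threshold)
    )
    """
)


def claim_alert(db: sqlite3.Connection, budget_id: str,
                period_key: str, threshold: int) -> bool:
    """Return True if this call recorded the alert, i.e. the email should go out."""
    cur = db.execute(
        "INSERT OR IGNORE INTO budget_alerts VALUES (?, ?, ?)",
        (budget_id, period_key, threshold),
    )
    db.commit()
    # rowcount is 1 for a fresh row and 0 when the primary key already existed.
    return cur.rowcount == 1


first = claim_alert(conn, "b1", "2025-03", 80)        # new row: send the email
repeat = claim_alert(conn, "b1", "2025-03", 80)       # duplicate: skip
next_period = claim_alert(conn, "b1", "2025-04", 80)  # new period: send again
```

Because the database arbitrates the claim, two cron workers racing on the same threshold still produce at most one email.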

Email-only for now. Slack and webhook channels are queued for the next iteration.

A worked example

You're running a customer-facing chatbot. You issue one Grepture API key for production, and you tag specific features:

POST /v1/messages
X-Grepture-Target: https://api.anthropic.com/v1/messages
X-Grepture-Label: feature:onboarding-chat

You create three budgets:

  • Per-key, $2,000/month, alerts to alert@yourteam.com and oncall@yourteam.com. The all-up safety net. If anything anywhere goes wrong and burns spend, this is the upper bound.
  • Per-label feature:onboarding-chat, $400/month, alerts to product@yourteam.com. You expect this feature to cost about $300/month. If it runs hot, you want to know.
  • Per-label feature:experimental, $50/day, alerts to alice@yourteam.com. Alice is iterating on this all day. If she hits a runaway loop, the proxy stops her before it gets expensive.

Three scopes, three alert lists, three independent caps, on a mix of monthly and daily windows. None of them interferes with the others.

Where to find it

/budgets in the dashboard, under the Guardrails group in the nav. Create a budget, watch the spend bar fill as traffic comes in, get an email when you're getting close, and a 402 response in your application if anything ever blows past the cap.

See the pricing page for which plans include budgets, or jump straight in from the dashboard.
