Budgets
Hard spend caps on Grepture API keys and labels. Email alerts at 50/80/100% and HTTP 402 enforcement at the proxy once a cap is hit.
Overview
A budget is a spend rule with a scope, a window, and a limit. The proxy tracks how much that scope spends within the current period and rejects new matching requests with HTTP 402 Payment Required once the limit is reached. Email alerts fire at 50%, 80%, and 100% of the limit.
Find budgets at Guardrails → Budgets in the dashboard, or visit /budgets directly.
Scope
Every budget attaches to one of two scope types:
| Scope | Matches | Use when |
|---|---|---|
| API key | All requests authenticated with a given Grepture API key | You want a hard ceiling on total spend for an environment or customer |
| Label | Requests carrying X-Grepture-Label: <value> | You want to cap specific features, flows, or experiments independently |
Per-label budgets match only traffic whose request includes the matching label header. Unlabeled traffic and traffic with non-matching labels are silently ignored by the budget, and no budget is created implicitly for them.
Setting a label on a request is just one header:
POST /v1/messages
X-Grepture-Target: https://api.anthropic.com/v1/messages
X-Grepture-Label: feature:summarizer
You can run multiple budgets at the same time, for example an overall per-key cap plus individual per-label caps. The proxy checks every active budget on every request, and the request is rejected as soon as any matching budget is exhausted.
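The matching rules above can be sketched in a few lines. This is an illustration only, not the proxy's code: the `Budget` class, its field names, and the `spent_cents` mapping are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Budget:
    name: str
    scope_type: str   # "api_key" or "label"
    scope_value: str  # the key identifier or the label value
    limit_cents: int


def matching_budgets(budgets, api_key_id: str, label: Optional[str]):
    """Return the budgets whose scope matches this request."""
    matched = []
    for b in budgets:
        if b.scope_type == "api_key" and b.scope_value == api_key_id:
            matched.append(b)
        elif b.scope_type == "label" and label is not None and b.scope_value == label:
            # Unlabeled requests never match a label budget.
            matched.append(b)
    return matched


def is_allowed(budgets, spent_cents: dict, api_key_id: str, label: Optional[str]) -> bool:
    """Reject as soon as any matching budget has reached its limit."""
    return all(
        spent_cents.get(b.name, 0) < b.limit_cents
        for b in matching_budgets(budgets, api_key_id, label)
    )


budgets = [
    Budget("key-cap", "api_key", "key_prod", 200_000),
    Budget("summarizer", "label", "feature:summarizer", 1_000),
]
spent = {"key-cap": 50_000, "summarizer": 1_000}  # the label cap is exhausted
```

With these values, a request labeled feature:summarizer is rejected, while an unlabeled request on the same key is still allowed because only the per-key budget applies to it.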
Window
Two windows are available:
- Monthly — resets at 00:00 UTC on the 1st of the calendar month
- Daily — resets at 00:00 UTC
Each budget has exactly one window. Need both daily and monthly limits on the same scope? Create two budgets.
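As a rough sketch of the reset rules, the helpers below compute a period identifier and the next reset time in UTC. The monthly key format follows the period_key shown in the 402 body ("2026-05"); the daily format and both function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def period_key(now: datetime, window: str) -> str:
    """Identify the current period. `now` must be timezone-aware."""
    now = now.astimezone(timezone.utc)
    if window == "monthly":
        return now.strftime("%Y-%m")
    if window == "daily":
        return now.strftime("%Y-%m-%d")  # assumed format, not confirmed by the docs
    raise ValueError(f"unknown window: {window}")


def next_reset(now: datetime, window: str) -> datetime:
    """Return the next 00:00 UTC boundary for the given window."""
    midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    if window == "daily":
        return midnight + timedelta(days=1)
    if window == "monthly":
        first = midnight.replace(day=1)
        # Adding 32 days to the 1st always lands in the following month.
        return (first + timedelta(days=32)).replace(day=1)
    raise ValueError(f"unknown window: {window}")
```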
Creating a budget
- Go to Guardrails → Budgets in the dashboard
- Click New budget
- Pick a scope (API key or label)
- Pick a window (daily or monthly)
- Set the limit in dollars
- Add alert email addresses (comma-separated)
When creating a budget, the form shows you the period-to-date spend for the chosen scope. This is informational — it tells you what's already been spent so you can size the cap appropriately. Budgets count spend from the moment of creation, so a fresh budget always starts at $0.00 regardless of what came before.
You can disable, edit, or delete a budget at any time. Changes take effect within 60 seconds, because the proxy caches active budget definitions for up to that long.
The 402 response
When the cap is hit, the proxy returns HTTP 402 with a JSON body identifying the budget:
{
"error": "Budget exceeded",
"budget_id": "1932f5fe-8235-4c45-9588-1371b3f9c27b",
"scope": { "type": "label", "value": "feature:summarizer" },
"limit_cents": 1000,
"period": "monthly",
"period_key": "2026-05"
}
The response also carries an X-Grepture-Budget-Status: exceeded header so client SDKs can detect budget rejections without parsing the body.
Once the period rolls over, the budget resets and the proxy starts accepting requests again automatically. No action needed.
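A client can tell a budget rejection apart from other failures by checking the status code and the header before touching the body. The helper below is a minimal sketch of that check; its name and signature are hypothetical and not part of any Grepture SDK.

```python
import json
from typing import Mapping, Optional


def parse_budget_rejection(
    status: int, headers: Mapping[str, str], body: str
) -> Optional[dict]:
    """Return the budget details for a budget rejection, or None otherwise."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if status != 402 or normalized.get("x-grepture-budget-status") != "exceeded":
        return None
    # Only budget rejections reach this point, so the body is parsed lazily.
    return json.loads(body)


sample_body = """{
  "error": "Budget exceeded",
  "budget_id": "1932f5fe-8235-4c45-9588-1371b3f9c27b",
  "scope": { "type": "label", "value": "feature:summarizer" },
  "limit_cents": 1000,
  "period": "monthly",
  "period_key": "2026-05"
}"""
details = parse_budget_rejection(
    402, {"X-Grepture-Budget-Status": "exceeded"}, sample_body
)
```

Because the cap lifts on its own at the next period rollover, a caller would typically surface the rejection or skip the feature rather than retry in a tight loop.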
Alerts
An app-layer cron runs every five minutes. For every active budget, it computes the current period spend from your traffic logs (honoring per-team pricing overrides set under Settings → Models and provider-specific caching discounts like Anthropic cache reads at 10%). When the percentage crosses 50%, 80%, or 100% for the first time in the current period, an email goes out to the addresses listed on the budget.
Each threshold fires once per period. The dedupe is at the database level — a (budget_id, period_key, threshold) primary key — so re-runs of the cron, retries, or transient errors can't produce duplicate emails.
When the period rolls over, the threshold counters reset along with the budget.
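The once-per-period guarantee can be illustrated with a composite primary key and an insert that ignores conflicts. The sketch below uses an in-memory SQLite database as a stand-in; the table name, column types, and function names are assumptions, and the real service may use a different database and schema.

```python
import sqlite3

THRESHOLDS = (50, 80, 100)


def open_alert_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE budget_alerts ("
        " budget_id TEXT NOT NULL,"
        " period_key TEXT NOT NULL,"
        " threshold INTEGER NOT NULL,"
        " PRIMARY KEY (budget_id, period_key, threshold))"
    )
    return conn


def thresholds_to_send(conn, budget_id, period_key, spent_cents, limit_cents):
    """Record newly crossed thresholds and return those that still need an email."""
    to_send = []
    for threshold in THRESHOLDS:
        if spent_cents * 100 < limit_cents * threshold:
            continue  # this threshold has not been reached yet
        cur = conn.execute(
            "INSERT OR IGNORE INTO budget_alerts VALUES (?, ?, ?)",
            (budget_id, period_key, threshold),
        )
        # rowcount is 1 only when the row was new; a rerun inserts nothing.
        if cur.rowcount == 1:
            to_send.append(threshold)
    conn.commit()
    return to_send
```

Running the function twice with the same spend returns the crossed thresholds the first time and an empty list the second time. A new period_key starts with no rows, which is what resets the thresholds at rollover.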
Email-only for now. Slack and webhook channels are on the roadmap.
How enforcement works
Three details about how the proxy decides whether to allow a request are worth knowing:
Eventually consistent. The proxy keeps the running per-period spend in Redis as a micro-cent counter. On every request, it reads the counter, compares against the limit, and either allows or rejects. After a successful forward, it increments the counter by the actual cost of the request.
This means concurrent requests in flight at the moment the cap is reached can push the period total slightly over the limit. The overshoot is bounded by concurrent_requests × max_single_request_cost. For typical workloads it's a couple of dollars at most — vastly cheaper than the latency you'd pay for strict serialization on every request.
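The overshoot is easiest to see with a small simulation. The sketch below replaces Redis with an in-memory counter and assumes three requests are all checked before any of them is billed; the class and function names are hypothetical.

```python
MICRO_CENTS_PER_CENT = 1_000_000  # one micro-cent is a millionth of a cent


class SpendCounter:
    """In-memory stand-in for the Redis counter, holding micro-cents."""

    def __init__(self, start: int = 0) -> None:
        self.value = start

    def read(self) -> int:
        return self.value

    def add(self, amount: int) -> None:
        self.value += amount


def admit(counter: SpendCounter, limit_cents: int) -> bool:
    """Allow the request only while recorded spend is below the limit."""
    return counter.read() < limit_cents * MICRO_CENTS_PER_CENT


def max_overshoot_cents(concurrent_requests: int, max_single_request_cost_cents: int) -> int:
    """Upper bound on overshoot: concurrent_requests x max_single_request_cost."""
    return concurrent_requests * max_single_request_cost_cents


# Spend sits at 9.99 USD of a 10.00 USD cap when three requests arrive together.
counter = SpendCounter(start=999 * MICRO_CENTS_PER_CENT)
decisions = [admit(counter, limit_cents=1000) for _ in range(3)]  # all checked first
for _ in decisions:
    counter.add(5 * MICRO_CENTS_PER_CENT)  # each request turns out to cost 5 cents

overshoot_cents = counter.read() // MICRO_CENTS_PER_CENT - 1000
```

All three requests are admitted, the period ends up 14 cents over the cap, and that stays within the bound of 3 × 5 = 15 cents. The next request is rejected.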
Fail-open on Redis outage. If the proxy can't reach Redis, budgets stop enforcing and traffic continues to flow. This matches how the existing rate-limit and quota systems behave. Alerts continue to work on the next cron tick because they read from the database, not Redis.
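Fail-open behavior amounts to treating an unreachable counter as "allow". In this minimal sketch the built-in ConnectionError stands in for whatever error the proxy's Redis client raises, and read_spend is a hypothetical callable that returns the current counter value.

```python
from typing import Callable


def check_budget(read_spend: Callable[[], int], limit_micro_cents: int) -> bool:
    """Return True if the request may proceed."""
    try:
        spent = read_spend()
    except ConnectionError:
        # Counter store unreachable: stop enforcing rather than block traffic.
        return True
    return spent < limit_micro_cents


def unreachable() -> int:
    raise ConnectionError("counter store is down")
```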
Counts from creation forward. Budgets count spend from the moment they were created, not from the start of the calendar period. If you create a $5 monthly cap on the 15th of the month, the budget tracks what gets spent from the 15th onward — not what happened earlier that month. When the period rolls over, this clipping no longer applies and the budget tracks the full new period as expected.
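One way to picture the clipping is that the counting interval starts at the later of the period start and the budget's creation time. The helpers below sketch that rule with hypothetical names and made-up spend events.

```python
from datetime import datetime, timezone


def counting_start(period_start: datetime, created_at: datetime) -> datetime:
    """Spend counts from the later of period start and budget creation."""
    return max(period_start, created_at)


def counted_spend_cents(events, period_start, period_end, created_at) -> int:
    """Sum (timestamp, cents) events that fall inside the clipped period."""
    start = counting_start(period_start, created_at)
    return sum(cents for ts, cents in events if start <= ts < period_end)


utc = timezone.utc
created = datetime(2026, 5, 15, tzinfo=utc)  # budget created mid-month
events = [
    (datetime(2026, 5, 3, tzinfo=utc), 400),   # before creation, not counted
    (datetime(2026, 5, 20, tzinfo=utc), 150),
    (datetime(2026, 6, 2, tzinfo=utc), 70),
]
may_spend = counted_spend_cents(
    events, datetime(2026, 5, 1, tzinfo=utc), datetime(2026, 6, 1, tzinfo=utc), created
)
june_spend = counted_spend_cents(
    events, datetime(2026, 6, 1, tzinfo=utc), datetime(2026, 7, 1, tzinfo=utc), created
)
```

For May only the 150 cents spent after the 15th count. In June the creation time is already in the past, so the full period is tracked.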
Cost accuracy
The cost added to a budget per request is computed from the upstream provider's reported token counts:
- OpenAI: prompt_tokens and completion_tokens, with prompt_tokens_details.cached_tokens billed at 50% of the input rate.
- Anthropic: input_tokens at the regular input rate, cache_read_input_tokens at 10% of input, cache_creation_input_tokens at 125% of input, output_tokens at the output rate.
- Gemini: promptTokenCount, candidatesTokenCount, and cachedContentTokenCount at the discounted cache rate.
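As a worked example of the Anthropic rule, the function below prices one response from its usage block. The multipliers come from the list above; the per-token rates (300 and 1,500 micro-cents, that is $3 and $15 per million tokens) and the function name are placeholders, not actual pricing.

```python
def anthropic_cost_micro_cents(usage: dict, input_rate: int, output_rate: int) -> int:
    """Price one response. Rates are in micro-cents per token."""
    regular = usage.get("input_tokens", 0) * input_rate
    # Cache reads are billed at 10% of the input rate.
    cache_read = usage.get("cache_read_input_tokens", 0) * input_rate * 10 // 100
    # Cache writes are billed at 125% of the input rate.
    cache_write = usage.get("cache_creation_input_tokens", 0) * input_rate * 125 // 100
    output = usage.get("output_tokens", 0) * output_rate
    return regular + cache_read + cache_write + output


usage = {
    "input_tokens": 1000,
    "cache_read_input_tokens": 2000,
    "cache_creation_input_tokens": 400,
    "output_tokens": 500,
}
cost = anthropic_cost_micro_cents(usage, input_rate=300, output_rate=1500)
```

Here the four terms are 300,000 + 60,000 + 150,000 + 750,000 = 1,260,000 micro-cents, or 1.26 cents for the request.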
Per-team pricing overrides set in Settings → Models are applied by the alert cron, so if you've negotiated custom rates with a provider, alerts and dashboard spend reflect them.
Examples
A typical setup for a SaaS product:
- Account-wide safety net: per-key, $2,000/month, alerts to oncall and finance. Catches anything weird happening anywhere.
- Per-feature cap: per-label feature:onboarding-chat, $400/month, alerts to the product owner. Lets the team see when the feature is running hot.
- Per-experiment cap: per-label feature:experimental, $50/day, alerts to the engineer running it. Cheap insurance against runaway loops during development.
The three budgets don't interfere with each other. A request gets evaluated against every matching budget; the strictest one wins.
Related
- Configuration: proxy setup, X-Grepture-Label and other headers
- Dashboard: traffic logs, analytics, and where spend numbers come from