How to Track and Control AI API Costs Across Providers
Get per-request cost attribution across OpenAI, Anthropic, Google, and Azure. See where your tokens go, which models cost the most, and where to optimize — with a single proxy.
The problem: AI spend is invisible
You're calling OpenAI for chat, Anthropic for code generation, and Google for embeddings. Each provider has its own billing dashboard, its own token counting, and its own pricing model. By the time the monthly invoice arrives, you have no idea which feature, endpoint, or model is responsible for the cost.
// Three providers, three billing dashboards, zero unified view
await openai.chat.completions.create({ model: "gpt-4o", messages });
await anthropic.messages.create({ model: "claude-sonnet-4-5-20250929", max_tokens: 1024, messages });
await google.generateContent({ model: "gemini-2.0-flash", contents });
Provider dashboards show aggregate spend, not per-request cost. You can't answer basic questions: How much does the summarization feature cost per call? Which model is cheapest for this task? How much did AI cost us this week?
The solution: per-request cost tracking with Grepture
Grepture is an AI gateway that sits between your application and every LLM provider. Every request flowing through the proxy gets automatic token counting and cost attribution — per request, per model, per endpoint.
No billing API polling. No custom logging code. Route your traffic through the proxy and cost tracking is immediate.
Setup in 3 minutes
1. Install the SDK
npm install @grepture/sdk
2. Get your API key
Sign up at grepture.com/en/pricing — the free plan includes 1,000 requests/month. Copy your API key from the dashboard.
3. Route your AI traffic through the proxy
OpenAI
import OpenAI from "openai";
import { Grepture } from "@grepture/sdk";
const grepture = new Grepture({
apiKey: process.env.GREPTURE_API_KEY!,
proxyUrl: "https://proxy.grepture.com",
});
const openai = new OpenAI({
...grepture.clientOptions({
apiKey: process.env.OPENAI_API_KEY!,
baseURL: "https://api.openai.com/v1",
}),
});
// Every request now has cost tracking
const response = await openai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Summarize this document..." }],
});
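As a quick sanity check against the dashboard's numbers, you can also log the token counts the provider returns on each response. A minimal sketch, assuming the standard usage shape on OpenAI-style chat completions — the helper name formatUsage is this guide's own, not part of any SDK:

```typescript
// Shape of the usage object returned on OpenAI-style chat completions.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Format a one-line usage summary for local logs.
function formatUsage(model: string, usage: Usage): string {
  return `${model}: ${usage.prompt_tokens} in + ${usage.completion_tokens} out = ${usage.total_tokens} tokens`;
}
```

After the call above, something like console.log(formatUsage("gpt-4o", response.usage)) gives you a local record to compare against the per-request numbers in the dashboard.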
Anthropic
Anthropic exposes an OpenAI-compatible endpoint for its Messages API, so you can reuse the OpenAI SDK and point it at Anthropic through the proxy:
const anthropic = new OpenAI({
...grepture.clientOptions({
apiKey: process.env.ANTHROPIC_API_KEY!,
baseURL: "https://api.anthropic.com/v1",
}),
});
Google Gemini
const gemini = new OpenAI({
...grepture.clientOptions({
apiKey: process.env.GEMINI_API_KEY!,
baseURL: "https://generativelanguage.googleapis.com/v1beta/openai",
}),
});
Azure OpenAI
const azure = new OpenAI({
...grepture.clientOptions({
apiKey: process.env.AZURE_OPENAI_API_KEY!,
baseURL: "https://your-resource.openai.azure.com/openai/deployments/your-deployment",
}),
});
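If you wire up several providers, it can help to keep the base URLs from the steps above in one place. A sketch of a small factory for the options passed to grepture.clientOptions() — the providerConfig helper and its provider map are this guide's own convention, not part of the SDK (Azure is omitted because its base URL is per-resource):

```typescript
// OpenAI-compatible base URLs from the provider setup steps above.
const BASE_URLS: Record<string, string> = {
  openai: "https://api.openai.com/v1",
  anthropic: "https://api.anthropic.com/v1",
  gemini: "https://generativelanguage.googleapis.com/v1beta/openai",
};

// Resolve the { apiKey, baseURL } pair to pass to grepture.clientOptions().
function providerConfig(provider: string, apiKey: string): { apiKey: string; baseURL: string } {
  const baseURL = BASE_URLS[provider];
  if (!baseURL) throw new Error(`Unknown provider: ${provider}`);
  return { apiKey, baseURL };
}
```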
Any HTTP API
For providers without an OpenAI-compatible SDK, use grepture.fetch():
const response = await grepture.fetch("https://api.example.com/v1/generate", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${process.env.PROVIDER_API_KEY}`,
},
body: JSON.stringify({ prompt: "..." }),
});
What you get
Once traffic flows through the proxy, the dashboard shows:
- Per-request token counts — input tokens, output tokens, and total for every call
- Per-request cost — calculated using each provider's model-specific pricing
- Cost by model — see which models consume the most budget
- Cost by endpoint — attribute spend to specific features or services
- Spend trends over time — daily, weekly, and monthly views
- Filterable traffic log — search by model, cost range, status, or time window
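Under the hood, per-request cost is token counts multiplied by the model's per-token price. A sketch of that calculation — the prices below are illustrative placeholders, not Grepture's actual pricing table, and real provider prices change over time:

```typescript
// Illustrative per-million-token prices in USD (placeholders, not real pricing).
const PRICING: Record<string, { inputPerM: number; outputPerM: number }> = {
  "gpt-4o": { inputPerM: 2.5, outputPerM: 10 },
  "gpt-4o-mini": { inputPerM: 0.15, outputPerM: 0.6 },
};

// Estimate the cost of a single request from its token counts.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing for model: ${model}`);
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}
```

With these placeholder prices, a request with 1,000 input and 500 output tokens on gpt-4o works out to $0.0075; the same request on gpt-4o-mini is $0.00045.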
Tracing multi-step costs
AI agents and RAG pipelines make multiple LLM calls per user request. Use conversation tracing to group related calls and see the total cost of a workflow:
const grepture = new Grepture({
apiKey: process.env.GREPTURE_API_KEY!,
proxyUrl: "https://proxy.grepture.com",
traceId: `workflow-${crypto.randomUUID().slice(0, 12)}`,
});
// All calls under this trace are grouped in the dashboard
const plan = await openai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Plan the steps to..." }],
});
const result = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: `Execute step 1: ${plan.choices[0].message.content}` }],
});
In the dashboard's Traces tab, you'll see both requests grouped with a combined cost, token count, and step-by-step timeline. This tells you exactly how much a single user workflow costs end to end.
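If several services contribute calls to the same workflow, generating the trace ID in one place keeps the grouping consistent. A small helper using Node's built-in crypto.randomUUID(), mirroring the workflow- prefix from the example above — the helper itself is this guide's suggestion, not part of the SDK:

```typescript
import { randomUUID } from "node:crypto";

// Build a short, prefixed trace ID like "workflow-1a2b3c4d-5e6".
function newTraceId(prefix = "workflow"): string {
  return `${prefix}-${randomUUID().slice(0, 12)}`;
}
```

Pass the result as traceId when constructing the Grepture client, and use a distinct prefix per pipeline (for example "rag" or "agent") so traces are easy to filter.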
Cost optimization tips
Once you have visibility, you can act on it:
- Right-size your models — if gpt-4o and gpt-4o-mini produce similar quality for a task, switch to mini. The dashboard shows you where this trade-off makes sense.
- Spot runaway endpoints — a feature making 10x more calls than expected shows up immediately in per-endpoint cost views.
- Track cost per user action — use trace IDs to attribute cost to user-facing features, not just API calls.
- Compare providers — route the same task through different providers and compare cost/quality in the traffic log.
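The right-sizing decision in the first tip can be made concrete: given a task's typical token profile, compute what switching models would save. A sketch with the same illustrative placeholder prices as above, not real pricing:

```typescript
// Illustrative per-million-token prices in USD (placeholders, not real pricing).
const PRICES: Record<string, { inputPerM: number; outputPerM: number }> = {
  "gpt-4o": { inputPerM: 2.5, outputPerM: 10 },
  "gpt-4o-mini": { inputPerM: 0.15, outputPerM: 0.6 },
};

// Percentage saved by running a task on `candidate` instead of `baseline`,
// for a typical request with the given token counts.
function savingsPercent(
  baseline: string,
  candidate: string,
  inputTokens: number,
  outputTokens: number
): number {
  const cost = (m: string) =>
    (inputTokens * PRICES[m].inputPerM + outputTokens * PRICES[m].outputPerM) / 1_000_000;
  return (1 - cost(candidate) / cost(baseline)) * 100;
}
```

With these placeholder numbers, a 1,000-in / 500-out request moved from gpt-4o to gpt-4o-mini would cost about 94% less, which is why the mini switch is usually the first optimization to evaluate.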
Next steps
- View pricing — free for up to 1,000 requests/month
- Read the SDK docs — full reference for clientOptions() and grepture.fetch()
- Set up observability — log and inspect every AI request alongside cost data