How to Track and Control AI API Costs Across Providers
Get per-request cost attribution across OpenAI, Anthropic, Google, and Azure. See where your tokens go, which models cost the most, and where to optimize — with a single proxy.
The problem: AI spend is invisible
You're calling OpenAI for chat, Anthropic for code generation, and Google for embeddings. Each provider has its own billing dashboard, its own token counting, and its own pricing model. By the time the monthly invoice arrives, you have no idea which feature, endpoint, or model is responsible for the cost.
// Three providers, three billing dashboards, zero unified view
await openai.chat.completions.create({ model: "gpt-4o", messages });
await anthropic.messages.create({ model: "claude-sonnet-4-5-20250929", max_tokens: 1024, messages });
await google.generateContent({ model: "gemini-2.0-flash", contents });
Provider dashboards show aggregate spend, not per-request cost. You can't answer basic questions: How much does the summarization feature cost per call? Which model is cheapest for this task? How much did AI cost us this week?
The solution: per-request cost tracking with Grepture
Grepture is an AI gateway that sits between your application and every LLM provider. Every request flowing through the proxy gets automatic token counting and cost attribution — per request, per model, per endpoint.
No billing API polling. No custom logging code. Route your traffic through the proxy and cost tracking is immediate.
Setup in 3 minutes
1. Install the SDK
npm install @grepture/sdk
2. Get your API key
Sign up at grepture.com/en/pricing — the free plan includes 1,000 requests/month. Copy your API key from the dashboard.
3. Route your AI traffic through the proxy
OpenAI
import OpenAI from "openai";
import { Grepture } from "@grepture/sdk";
const grepture = new Grepture({
apiKey: process.env.GREPTURE_API_KEY!,
proxyUrl: "https://proxy.grepture.com",
});
const openai = new OpenAI({
...grepture.clientOptions({
apiKey: process.env.OPENAI_API_KEY!,
baseURL: "https://api.openai.com/v1",
}),
});
// Every request now has cost tracking
const response = await openai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Summarize this document..." }],
});
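As a quick sanity check against the dashboard's numbers, you can also log the token counts the provider returns on each response. A minimal sketch, assuming the standard usage shape on OpenAI-style chat completions — the helper name formatUsage is this guide's own, not part of any SDK:

```typescript
// Shape of the usage object returned on OpenAI-style chat completions.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Format a one-line usage summary for local logs.
function formatUsage(model: string, usage: Usage): string {
  return `${model}: ${usage.prompt_tokens} in + ${usage.completion_tokens} out = ${usage.total_tokens} tokens`;
}
```

After the call above, something like console.log(formatUsage("gpt-4o", response.usage)) gives you a local record to compare against the per-request numbers in the dashboard.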
Anthropic
Anthropic exposes an OpenAI-compatible endpoint for its Messages API, so you can reuse the OpenAI SDK and point it at Anthropic through the proxy:
const anthropic = new OpenAI({
...grepture.clientOptions({
apiKey: process.env.ANTHROPIC_API_KEY!,
baseURL: "https://api.anthropic.com/v1",
}),
});
Google Gemini
const gemini = new OpenAI({
...grepture.clientOptions({
apiKey: process.env.GEMINI_API_KEY!,
baseURL: "https://generativelanguage.googleapis.com/v1beta/openai",
}),
});
Azure OpenAI
const azure = new OpenAI({
...grepture.clientOptions({
apiKey: process.env.AZURE_OPENAI_API_KEY!,
baseURL: "https://your-resource.openai.azure.com/openai/deployments/your-deployment",
}),
});
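If you wire up several providers, it can help to keep the base URLs from the steps above in one place. A sketch of a small factory for the options passed to grepture.clientOptions() — the providerConfig helper and its provider map are this guide's own convention, not part of the SDK (Azure is omitted because its base URL is per-resource):

```typescript
// OpenAI-compatible base URLs from the provider setup steps above.
const BASE_URLS: Record<string, string> = {
  openai: "https://api.openai.com/v1",
  anthropic: "https://api.anthropic.com/v1",
  gemini: "https://generativelanguage.googleapis.com/v1beta/openai",
};

// Resolve the { apiKey, baseURL } pair to pass to grepture.clientOptions().
function providerConfig(provider: string, apiKey: string): { apiKey: string; baseURL: string } {
  const baseURL = BASE_URLS[provider];
  if (!baseURL) throw new Error(`Unknown provider: ${provider}`);
  return { apiKey, baseURL };
}
```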
Any HTTP API
For providers without an OpenAI-compatible SDK, use grepture.fetch():
const response = await grepture.fetch("https://api.example.com/v1/generate", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${process.env.PROVIDER_API_KEY}`,
},
body: JSON.stringify({ prompt: "..." }),
});
What you get
Once traffic flows through the proxy, the dashboard shows:
- Per-request token counts — input tokens, output tokens, and total for every call
- Per-request cost — calculated using each provider's model-specific pricing
- Cost by model — see which models consume the most budget
- Cost by endpoint — attribute spend to specific features or services
- Spend trends over time — daily, weekly, and monthly views
- Filterable traffic log — search by model, cost range, status, or time window
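Under the hood, per-request cost is token counts multiplied by the model's per-token price. A sketch of that calculation — the prices below are illustrative placeholders, not Grepture's actual pricing table, and real provider prices change over time:

```typescript
// Illustrative per-million-token prices in USD (placeholders, not real pricing).
const PRICING: Record<string, { inputPerM: number; outputPerM: number }> = {
  "gpt-4o": { inputPerM: 2.5, outputPerM: 10 },
  "gpt-4o-mini": { inputPerM: 0.15, outputPerM: 0.6 },
};

// Estimate the cost of a single request from its token counts.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing for model: ${model}`);
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}
```

With these placeholder prices, a request with 1,000 input and 500 output tokens on gpt-4o works out to $0.0075; the same request on gpt-4o-mini is $0.00045.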
Tracing multi-step costs
AI agents and RAG pipelines make multiple LLM calls per user request. Use conversation tracing to group related calls and see the total cost of a workflow:
const grepture = new Grepture({
apiKey: process.env.GREPTURE_API_KEY!,
proxyUrl: "https://proxy.grepture.com",
traceId: `workflow-${crypto.randomUUID().slice(0, 12)}`,
});
// All calls under this trace are grouped in the dashboard
const plan = await openai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Plan the steps to..." }],
});
const result = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: `Execute step 1: ${plan.choices[0].message.content}` }],
});
In the dashboard's Traces tab, you'll see both requests grouped with a combined cost, token count, and step-by-step timeline. This tells you exactly how much a single user workflow costs end to end.
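If several services contribute calls to the same workflow, generating the trace ID in one place keeps the grouping consistent. A small helper using Node's built-in crypto.randomUUID(), mirroring the workflow- prefix from the example above — the helper itself is this guide's suggestion, not part of the SDK:

```typescript
import { randomUUID } from "node:crypto";

// Build a short, prefixed trace ID like "workflow-1a2b3c4d-5e6".
function newTraceId(prefix = "workflow"): string {
  return `${prefix}-${randomUUID().slice(0, 12)}`;
}
```

Pass the result as traceId when constructing the Grepture client, and use a distinct prefix per pipeline (for example "rag" or "agent") so traces are easy to filter.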
Cost optimization tips
Once you have visibility, you can act on it:
- Right-size your models — if gpt-4o and gpt-4o-mini produce similar quality for a task, switch to mini. The dashboard shows you where this trade-off makes sense.
- Spot runaway endpoints — a feature making 10x more calls than expected shows up immediately in per-endpoint cost views.
- Track cost per user action — use trace IDs to attribute cost to user-facing features, not just API calls.
- Compare providers — route the same task through different providers and compare cost/quality in the traffic log.
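The right-sizing decision in the first tip can be made concrete: given a task's typical token profile, compute what switching models would save. A sketch with the same illustrative placeholder prices as above, not real pricing:

```typescript
// Illustrative per-million-token prices in USD (placeholders, not real pricing).
const PRICES: Record<string, { inputPerM: number; outputPerM: number }> = {
  "gpt-4o": { inputPerM: 2.5, outputPerM: 10 },
  "gpt-4o-mini": { inputPerM: 0.15, outputPerM: 0.6 },
};

// Percentage saved by running a task on `candidate` instead of `baseline`,
// for a typical request with the given token counts.
function savingsPercent(
  baseline: string,
  candidate: string,
  inputTokens: number,
  outputTokens: number
): number {
  const cost = (m: string) =>
    (inputTokens * PRICES[m].inputPerM + outputTokens * PRICES[m].outputPerM) / 1_000_000;
  return (1 - cost(candidate) / cost(baseline)) * 100;
}
```

With these placeholder numbers, a 1,000-in / 500-out request moved from gpt-4o to gpt-4o-mini would cost about 94% less, which is why the mini switch is usually the first optimization to evaluate.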
Next steps
- View pricing — free for up to 1,000 requests/month
- Read the SDK docs — full reference for clientOptions() and grepture.fetch()
- Set up observability — log and inspect every AI request alongside cost data