How to Track and Control AI API Costs Across Providers

Get per-request cost attribution across OpenAI, Anthropic, Google, and Azure. See where your tokens go, which models cost the most, and where to optimize — with a single proxy.

The problem: AI spend is invisible

You're calling OpenAI for chat, Anthropic for code generation, and Google for embeddings. Each provider has its own billing dashboard, its own token counting, and its own pricing model. By the time the monthly invoice arrives, you have no idea which feature, endpoint, or model is responsible for the cost.

// Three providers, three billing dashboards, zero unified view
await openai.chat.completions.create({ model: "gpt-4o", messages });
await anthropic.messages.create({ model: "claude-sonnet-4-5-20250929", max_tokens: 1024, messages });
await google.generateContent({ model: "gemini-2.0-flash", contents });

Provider dashboards show aggregate spend, not per-request cost. You can't answer basic questions: How much does the summarization feature cost per call? Which model is cheapest for this task? How much did AI cost us this week?
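Per-request cost is simple arithmetic in principle: tokens multiplied by the model's per-token rate. A minimal sketch in TypeScript, using illustrative per-million-token prices (placeholders, not current rates — check each provider's pricing page):

```typescript
// Illustrative per-million-token prices in USD. Providers change pricing;
// treat these numbers as placeholders, not a source of truth.
const PRICING: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Cost of one request = input tokens * input rate + output tokens * output rate.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing entry for ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// 1,000 input tokens + 500 output tokens on gpt-4o
console.log(estimateCost("gpt-4o", 1000, 500)); // 0.0075
```

Doing this bookkeeping by hand, per provider and per model, is exactly what a gateway automates.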

The solution: per-request cost tracking with Grepture

Grepture is an AI gateway that sits between your application and every LLM provider. Every request flowing through the proxy gets automatic token counting and cost attribution — per request, per model, per endpoint.

No billing API polling. No custom logging code. Route your traffic through the proxy and cost tracking is immediate.

Setup in 3 minutes

1. Install the SDK

npm install @grepture/sdk

2. Get your API key

Sign up at grepture.com/en/pricing — the free plan includes 1,000 requests/month. Copy your API key from the dashboard.

3. Route your AI traffic through the proxy

OpenAI

import OpenAI from "openai";
import { Grepture } from "@grepture/sdk";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY!,
  proxyUrl: "https://proxy.grepture.com",
});

const openai = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.OPENAI_API_KEY!,
    baseURL: "https://api.openai.com/v1",
  }),
});

// Every request now has cost tracking
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize this document..." }],
});
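The proxy forwards the provider's response unchanged, so the standard OpenAI `usage` block is still available if you also want token counts in your own code. A small sketch over that response shape (the sample numbers are made up):

```typescript
// Shape of the usage block on every OpenAI-style chat completion response.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Format token counts for a log line; e.g. formatUsage(response.usage!)
// after the request above.
function formatUsage(u: Usage): string {
  return `in=${u.prompt_tokens} out=${u.completion_tokens} total=${u.total_tokens}`;
}

console.log(formatUsage({ prompt_tokens: 42, completion_tokens: 128, total_tokens: 170 }));
// in=42 out=128 total=170
```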

Anthropic

Anthropic's API exposes an OpenAI-compatible chat completions endpoint, so you can reuse the OpenAI client through the proxy:

const anthropic = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.ANTHROPIC_API_KEY!,
    baseURL: "https://api.anthropic.com/v1",
  }),
});

Google Gemini

const gemini = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.GEMINI_API_KEY!,
    baseURL: "https://generativelanguage.googleapis.com/v1beta/openai",
  }),
});

Azure OpenAI

const azure = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.AZURE_OPENAI_API_KEY!,
    baseURL: "https://your-resource.openai.azure.com/openai/deployments/your-deployment",
  }),
});

Any HTTP API

For providers without an OpenAI-compatible SDK, use grepture.fetch():

const response = await grepture.fetch("https://api.example.com/v1/generate", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.PROVIDER_API_KEY}`,
  },
  body: JSON.stringify({ prompt: "..." }),
});

What you get

Once traffic flows through the proxy, the dashboard shows:

  • Per-request token counts — input tokens, output tokens, and total for every call
  • Per-request cost — calculated using each provider's model-specific pricing
  • Cost by model — see which models consume the most budget
  • Cost by endpoint — attribute spend to specific features or services
  • Spend trends over time — daily, weekly, and monthly views
  • Filterable traffic log — search by model, cost range, status, or time window
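The filters in the last bullet are just predicates over request records. If you export the log, or keep your own, the same slicing is a few lines of code — a sketch over an assumed record shape (the `LogEntry` fields here are illustrative, not the dashboard's export schema):

```typescript
// Assumed shape of one traffic-log record -- illustrative only.
interface LogEntry {
  model: string;
  cost: number;      // USD
  status: number;    // HTTP status
  timestamp: number; // ms since epoch
}

// Answer queries like "gpt-4o requests that cost more than a cent".
function filterLog(log: LogEntry[], opts: { model?: string; minCost?: number }): LogEntry[] {
  return log.filter(
    (e) =>
      (opts.model === undefined || e.model === opts.model) &&
      (opts.minCost === undefined || e.cost >= opts.minCost)
  );
}

const log: LogEntry[] = [
  { model: "gpt-4o", cost: 0.012, status: 200, timestamp: 1700000000000 },
  { model: "gpt-4o", cost: 0.002, status: 200, timestamp: 1700000001000 },
  { model: "gpt-4o-mini", cost: 0.0004, status: 200, timestamp: 1700000002000 },
];
console.log(filterLog(log, { model: "gpt-4o", minCost: 0.01 }).length); // 1
```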

Tracing multi-step costs

AI agents and RAG pipelines make multiple LLM calls per user request. Use conversation tracing to group related calls and see the total cost of a workflow:

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY!,
  proxyUrl: "https://proxy.grepture.com",
  traceId: `workflow-${crypto.randomUUID().slice(0, 12)}`,
});

// All calls under this trace are grouped in the dashboard
const plan = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Plan the steps to..." }],
});

const result = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: `Execute step 1: ${plan.choices[0].message.content}` }],
});

In the dashboard's Traces tab, you'll see both requests grouped with a combined cost, token count, and step-by-step timeline. This tells you exactly how much a single user workflow costs end to end.
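If you also want the workflow total in application code — say, to enforce a per-workflow budget — the same grouping is easy to reproduce locally. A sketch, assuming you collect one record per call (the `CallRecord` shape and the sample costs are illustrative):

```typescript
// One record per LLM call in a traced workflow -- illustrative shape.
interface CallRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cost: number; // USD, from your pricing table or the dashboard
}

// Roll up cost, tokens, and step count for one trace.
function summarizeTrace(calls: CallRecord[]) {
  return calls.reduce(
    (acc, c) => ({
      cost: acc.cost + c.cost,
      tokens: acc.tokens + c.inputTokens + c.outputTokens,
      steps: acc.steps + 1,
    }),
    { cost: 0, tokens: 0, steps: 0 }
  );
}

const summary = summarizeTrace([
  { model: "gpt-4o", inputTokens: 200, outputTokens: 300, cost: 0.0035 },
  { model: "gpt-4o-mini", inputTokens: 500, outputTokens: 400, cost: 0.000315 },
]);
console.log(summary.steps, summary.tokens); // 2 1400
```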

Cost optimization tips

Once you have visibility, you can act on it:

  1. Right-size your models — if gpt-4o and gpt-4o-mini produce similar quality for a task, switch to mini. The dashboard shows you where this trade-off makes sense.
  2. Spot runaway endpoints — a feature making 10x more calls than expected shows up immediately in per-endpoint cost views.
  3. Track cost per user action — use trace IDs to attribute cost to user-facing features, not just API calls.
  4. Compare providers — route the same task through different providers and compare cost/quality in the traffic log.
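For tip 1, the arithmetic behind the trade-off is worth seeing once. A sketch with illustrative per-million-token rates (placeholders, not current prices) for a hypothetical high-volume endpoint:

```typescript
// Illustrative per-million-token prices in USD -- placeholders only.
const RATES = {
  "gpt-4o": { input: 2.5, output: 10 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
} as const;

// Monthly cost of an endpoint given its traffic profile.
function monthlyCost(
  model: keyof typeof RATES,
  requests: number,
  avgInputTokens: number,
  avgOutputTokens: number
): number {
  const r = RATES[model];
  return (requests * (avgInputTokens * r.input + avgOutputTokens * r.output)) / 1_000_000;
}

// 100k requests/month, ~1,500 input and ~400 output tokens each:
// roughly $775 on gpt-4o vs $46.50 on gpt-4o-mini, a ~94% reduction.
const big = monthlyCost("gpt-4o", 100_000, 1500, 400);
const mini = monthlyCost("gpt-4o-mini", 100_000, 1500, 400);
console.log(big, mini, 1 - mini / big);
```

The dashboard's job is to tell you which endpoints have this profile — high volume, tolerant of the smaller model — so the switch is a data decision rather than a guess.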

Next steps