How to Monitor and Log All LLM API Calls in One Place

Get unified logging across OpenAI, Anthropic, Google, and Azure. See every request, response, token count, and latency — with a single proxy. No custom logging code.

The problem: scattered providers, scattered logs

You're calling OpenAI for chat, Anthropic for code review, Google for embeddings, and Azure for a fine-tuned model. Each provider has its own logging story — and most of them don't have one at all.

// Four providers. Zero unified view of what's happening.
await openai.chat.completions.create({ model: "gpt-4o", messages });
await anthropic.messages.create({ model: "claude-sonnet-4-5", messages });
await google.generateContent({ model: "gemini-2.0-flash", contents });
await azure.chat.completions.create({ model: "gpt-4o", messages });

When a prompt produces a bad response, you can't inspect what was sent. When latency spikes, you can't tell which model is slow. When a customer reports a bug, you're guessing — because the actual request and response aren't logged anywhere you can search.

Provider dashboards show aggregate metrics. They don't show you the prompt that caused the hallucination at 3:47 PM yesterday.

The solution: unified logging with Grepture

Grepture is an AI gateway that sits between your application and every LLM provider. Every request flowing through the proxy is automatically logged — request body, response body, token counts, latency, HTTP status, model, and which detection rules matched.

No custom logging middleware. No console.log(JSON.stringify(messages)). Route your traffic through the proxy and every call is captured in a single, searchable traffic log.

Setup in 3 minutes

1. Install the SDK

npm install @grepture/sdk

2. Get your API key

Sign up at grepture.com/en/pricing — the free plan includes 1,000 requests/month. Copy your API key from the dashboard.

3. Route your AI traffic through the proxy

OpenAI

import OpenAI from "openai";
import { Grepture } from "@grepture/sdk";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY!,
  proxyUrl: "https://proxy.grepture.com",
});

const openai = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.OPENAI_API_KEY!,
    baseURL: "https://api.openai.com/v1",
  }),
});

// Every request is now logged automatically
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize this document..." }],
});

Anthropic

// Anthropic exposes an OpenAI-compatible endpoint, so the same client works
const anthropic = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.ANTHROPIC_API_KEY!,
    baseURL: "https://api.anthropic.com/v1",
  }),
});

Google Gemini

// Gemini via Google's OpenAI-compatible endpoint
const gemini = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.GEMINI_API_KEY!,
    baseURL: "https://generativelanguage.googleapis.com/v1beta/openai",
  }),
});

Azure OpenAI

// Replace your-resource and your-deployment with your Azure values
const azure = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.AZURE_OPENAI_API_KEY!,
    baseURL: "https://your-resource.openai.azure.com/openai/deployments/your-deployment",
  }),
});

Any HTTP API

For providers without an OpenAI-compatible SDK, use grepture.fetch():

const response = await grepture.fetch("https://api.example.com/v1/generate", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.PROVIDER_API_KEY}`,
  },
  body: JSON.stringify({ prompt: "..." }),
});

What gets logged

Every request through the proxy captures:

  • Request and response bodies — the full prompt and completion, inspectable in the dashboard
  • Token counts — input tokens, output tokens, and total for every call
  • Latency — round-trip time from your app to the provider and back
  • HTTP status — success, rate limit, auth failure, timeout
  • Model — which model handled the request
  • Detection rules matched — which Grepture rules fired (PII, secrets, prompt injection)
  • Request ID — unique identifier for every call, available in code via requestId on the response
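
Taken together, a single traffic-log entry can be pictured roughly like this. The field names below are illustrative assumptions for this article, not the SDK's actual types:

```typescript
// Illustrative shape of one traffic-log entry.
// Field names are assumptions, not Grepture's published schema.
interface TrafficLogEntry {
  requestId: string;      // unique per call, e.g. "req_abc123"
  model: string;          // which model handled the request
  status: number;         // HTTP status from the provider
  latencyMs: number;      // round-trip time, app -> provider -> app
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  rulesApplied: string[]; // detection rules that fired
  requestBody: unknown;   // full prompt; omitted in zero-data mode
  responseBody: unknown;  // full completion; omitted in zero-data mode
}

// A sample entry, with made-up values
const example: TrafficLogEntry = {
  requestId: "req_abc123",
  model: "gpt-4o",
  status: 200,
  latencyMs: 842,
  inputTokens: 120,
  outputTokens: 56,
  totalTokens: 176,
  rulesApplied: ["pii-email"],
  requestBody: { messages: [{ role: "user", content: "..." }] },
  responseBody: { choices: [] },
};
```

Everything in this shape is filterable in the dashboard; the request ID is also the handle you'd quote in a support ticket or bug report.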

You can access detection metadata programmatically too:

const response = await grepture.fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
  body: JSON.stringify({ model: "gpt-4o", messages }),
});

console.log(response.requestId);    // "req_abc123..."
console.log(response.rulesApplied); // ["pii-email", "pii-phone"]

Using the traffic log

The dashboard's Traffic Log page is where you'll spend most of your time:

  • Filterable table — search by status code, HTTP method, model, URL, or time window
  • Request detail view — click any row to see the full request and response, including headers, body, token counts, latency, and which rules matched
  • 30-day traffic chart — spot trends, spikes, and anomalies at a glance

For organizations that need logging without storing prompt content, zero-data mode (Business+) captures operational metadata — status, tokens, latency, model — without persisting request or response bodies.

Conversation tracing

AI agents and multi-turn conversations make dozens of LLM calls per user interaction. Without grouping, they're just noise in a flat log. Use trace IDs to link related requests into a single timeline.

Set a trace ID at construction

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY!,
  proxyUrl: "https://proxy.grepture.com",
  traceId: `session-${crypto.randomUUID().slice(0, 12)}`,
});

const openai = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.OPENAI_API_KEY!,
    baseURL: "https://api.openai.com/v1",
  }),
});

// Both calls are grouped under the same trace
await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Plan the migration steps." }],
});

await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Execute step 1..." }],
});

Change trace mid-session

When a new conversation starts or an agent begins a separate run, switch the trace:

// New user session — new trace
grepture.setTraceId(`session-${crypto.randomUUID().slice(0, 12)}`);

// Stop tracing
grepture.setTraceId(undefined);
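
If you set trace IDs in more than one place, a small helper keeps the format consistent. The `session-` prefix mirrors the examples above; the helper itself is ours, not part of the Grepture SDK:

```typescript
import { randomUUID } from "node:crypto";

// Build the short "session-xxxxxxxx-xxx" IDs used in the examples above.
// Illustrative helper — not part of the Grepture SDK.
function newTraceId(prefix = "session"): string {
  return `${prefix}-${randomUUID().slice(0, 12)}`;
}

// e.g. grepture.setTraceId(newTraceId()) when a conversation starts,
// or newTraceId("agent") to distinguish agent runs from user sessions
```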

In the dashboard's Traces tab, you'll see all requests grouped by trace with a combined timeline, total token count, and aggregate latency. This turns a wall of individual requests into a readable conversation history.

Framework integration

LangChain

Pass the same client options to LangChain's ChatOpenAI so its underlying OpenAI client goes through the proxy:

import { ChatOpenAI } from "@langchain/openai";
import { Grepture } from "@grepture/sdk";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY!,
  proxyUrl: "https://proxy.grepture.com",
});

const model = new ChatOpenAI({
  modelName: "gpt-4o",
  ...grepture.clientOptions({
    apiKey: process.env.OPENAI_API_KEY!,
    baseURL: "https://api.openai.com/v1",
  }),
});

Every LangChain call — chains, agents, tools — now flows through the proxy and appears in your traffic log.

Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { Grepture } from "@grepture/sdk";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY!,
  proxyUrl: "https://proxy.grepture.com",
});

const openai = createOpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.OPENAI_API_KEY!,
    baseURL: "https://api.openai.com/v1",
  }),
});

Next steps