Trace Mode — Full Observability Without the Proxy Hop
Grepture now supports a trace-only mode where requests go directly to your LLM provider. The SDK captures tokens, costs, and latency asynchronously — same dashboard visibility, zero latency overhead.
The proxy tradeoff
Grepture's proxy gives you control. PII redaction, request blocking, prompt management — all handled before traffic reaches your LLM provider. But that control comes with a network hop. Your request goes to Grepture first, then to OpenAI or Anthropic, and the response comes back the same way.
For most workloads, the added latency is negligible. But we kept hearing from teams running latency-sensitive workloads — real-time agents, streaming UIs, high-throughput pipelines — who wanted the visibility without the detour. They didn't need rules or redaction. They just needed to see what was happening: which models, how many tokens, what it cost, how long it took.
So we built trace mode.
What trace mode is
One config change. Set mode: "trace" and requests go directly to your LLM provider — OpenAI, Anthropic, Google, whoever. No proxy in the hot path.
After the response arrives, the SDK captures metadata — model, tokens, latency, status code, cost — and batches it asynchronously to Grepture. Fire-and-forget. The trace call never blocks your application, and if it fails, your production code doesn't notice.
The dashboard stays the same. Traffic log, cost tracking, conversation tracing — all still work. You get the same visibility you'd get with proxy mode, minus the network hop.
The config change
The API surface is identical to proxy mode. You change one line:
import { Grepture } from "@grepture/sdk";
import OpenAI from "openai";
const grepture = new Grepture({
apiKey: process.env.GREPTURE_API_KEY!,
proxyUrl: "https://proxy.grepture.com",
mode: "trace", // ← direct to provider, traces sent async
});
const openai = new OpenAI({
...grepture.clientOptions({
apiKey: process.env.OPENAI_API_KEY!,
baseURL: "https://api.openai.com/v1",
}),
});
// This request goes directly to OpenAI — no proxy hop
const response = await openai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Summarize this quarter's results." }],
});
The clientOptions() call returns the same shape either way. In proxy mode, it rewrites the base URL to route through Grepture. In trace mode, it keeps the provider's URL and wraps fetch to capture metadata after the response.
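That after-the-fact capture can be sketched as a thin wrapper around fetch. Everything below is illustrative (wrapFetch, TraceEntry, and onTrace are made-up names, not SDK API); the point is that tracing happens strictly after the response is in hand, and a tracing failure never surfaces to application code:

```typescript
// Sketch of a trace-mode fetch wrapper (hypothetical internals).
// It forwards the request untouched, then records timing and status
// once the response arrives.
type TraceEntry = { url: string; status: number; durationMs: number };

function wrapFetch(
  baseFetch: typeof fetch,
  onTrace: (entry: TraceEntry) => void,
): typeof fetch {
  return async (input, init) => {
    const start = Date.now();
    const response = await baseFetch(input, init);
    // Capture happens after the response is obtained, so the caller
    // is never blocked by trace delivery.
    try {
      onTrace({
        url:
          typeof input === "string"
            ? input
            : input instanceof URL
              ? input.toString()
              : input.url,
        status: response.status,
        durationMs: Date.now() - start,
      });
    } catch {
      // Trace failures must never interrupt application traffic.
    }
    return response;
  };
}
```

Because the wrapper preserves fetch's signature, the provider SDK never knows it is being observed.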
If you're running in a serverless environment — Lambda, Vercel Functions, Cloudflare Workers — call flush() before the function exits to make sure pending traces are sent:
await grepture.flush();
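One way to make that hard to forget is a small wrapper that runs flush() in a finally block around your handler. This is a sketch under assumed names (withFlush and Flushable are illustrative, not part of the SDK):

```typescript
// Sketch: guarantee flush() runs before a serverless handler returns,
// even if the handler throws. Names here are illustrative.
type Flushable = { flush(): Promise<void> };

function withFlush<E, R>(
  client: Flushable,
  handler: (event: E) => Promise<R>,
): (event: E) => Promise<R> {
  return async (event) => {
    try {
      return await handler(event); // your LLM calls happen in here
    } finally {
      await client.flush(); // drain pending traces before the runtime freezes
    }
  };
}
```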
When to use which mode
Both modes exist for good reasons. The right choice depends on what you need from Grepture.
Trace mode is for
- Observability and cost tracking without any latency overhead
- Latency-sensitive production workloads — streaming responses, real-time agents, voice interfaces
- High-throughput pipelines where shaving milliseconds off every call matters at scale
- Getting started — you want visibility into your AI traffic before committing to rules and policies
Proxy mode is for
- PII redaction — catching sensitive data before it reaches the model, with 50+ built-in detection patterns
- Request blocking — stopping requests that match rules you define
- Prompt management — resolving prompts server-side with prompt.use() so your application doesn't need to bundle prompt templates
- Mask and restore — reversible redaction that keeps LLM responses useful while protecting PII
If you need to inspect or modify the request in flight, you need proxy mode. If you just need to see what happened after the fact, trace mode gives you the same dashboard with zero overhead.
How it works under the hood
The SDK handles two response types differently.
Buffered responses. The SDK clones the response object, reads the body in the background, and extracts usage metadata (model, tokens, status) from the JSON. Your application gets the original response immediately — the clone and extraction happen asynchronously after the response is returned to your code.
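A rough sketch of that buffered path, with hypothetical names (captureBuffered, Usage) and assuming an OpenAI-style usage.total_tokens field in the body:

```typescript
// Sketch of the buffered path (hypothetical internals): clone the
// response, hand the original back immediately, and read usage
// metadata from the clone in the background.
type Usage = { model?: string; totalTokens?: number; status: number };

function captureBuffered(
  response: Response,
  onUsage: (u: Usage) => void,
): Response {
  const clone = response.clone();
  // Background read: intentionally not awaited by the caller.
  clone
    .json()
    .then((body: any) =>
      onUsage({
        model: body?.model,
        totalTokens: body?.usage?.total_tokens,
        status: response.status,
      }),
    )
    .catch(() => {
      // Unparseable body: drop the trace, never the response.
    });
  return response;
}
```

Response.clone() is what lets both reads happen: the application consumes the original body while the clone is parsed on the side.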
Streaming responses. This is where it gets interesting. The SDK wraps the response stream with a TransformStream that passes every chunk through to your application immediately — zero buffering. In parallel, it captures only the last few SSE data: lines, which is where providers include token counts in their streaming protocol. When the stream ends with [DONE], the SDK extracts usage from those final chunks and sends the trace. Your application never waits for this.
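The streaming path can be sketched with the Web Streams API. The names (tapStream, onUsage) and the exact tail-parsing logic are illustrative, assuming SSE frames shaped like OpenAI's streaming protocol:

```typescript
// Sketch of the streaming path (hypothetical internals): a pass-through
// TransformStream that forwards every chunk immediately while keeping a
// small tail of SSE `data:` lines, where providers report token usage.
function tapStream(
  stream: ReadableStream<string>,
  onUsage: (usage: unknown) => void,
  tailSize = 5,
): ReadableStream<string> {
  const tail: string[] = [];
  return stream.pipeThrough(
    new TransformStream<string, string>({
      transform(chunk, controller) {
        controller.enqueue(chunk); // forward first: zero buffering
        for (const line of chunk.split("\n")) {
          if (!line.startsWith("data:")) continue;
          tail.push(line.slice(5).trim());
          if (tail.length > tailSize) tail.shift();
        }
      },
      flush() {
        // Runs when the source stream ends; the frames just before
        // [DONE] typically carry the usage object.
        for (const data of tail) {
          if (data === "[DONE]") continue;
          try {
            const parsed = JSON.parse(data);
            if (parsed.usage) onUsage(parsed.usage);
          } catch {
            // Ignore non-JSON frames.
          }
        }
      },
    }),
  );
}
```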
Under both paths, a TraceSender batches up to 25 entries and flushes every 2 seconds — or immediately when the batch is full. Calling flush() sends whatever is in the buffer right now. All of this is best-effort — if a trace request fails, it's silently dropped. Production traffic is never interrupted by trace delivery.
What you see in the dashboard
Trace entries show up in the same traffic log as proxy entries. You can tell them apart by the source indicator — a shield icon for proxy requests, an eye icon for trace requests.
The same columns are there: model, tokens, cost, duration, status. Trace entries work with conversation tracing (group related calls by trace ID), grep search across request and response bodies, and all existing filters.
The one difference: trace entries don't show rules_applied, because no rules ran. The request went straight to the provider.
Try it
One config change to get started:
const grepture = new Grepture({
apiKey: process.env.GREPTURE_API_KEY!,
proxyUrl: "https://proxy.grepture.com",
mode: "trace",
});
Start with trace mode for visibility. See which models your team is using, what they cost, and where latency is going. When you're ready to add PII redaction or request rules, switching to proxy mode is the same one-line change — remove mode: "trace" and traffic routes through the proxy automatically.
- How Grepture works — architecture overview of both modes
- Tracking AI API costs — set up cost tracking with trace IDs
- Getting started guide — full SDK configuration reference