## You can't fix what you can't see
You're iterating on a prompt. You change a tool definition. You swap models. The only feedback you get is whatever the model returns — and a billing surprise at the end of the month. You don't see the requests. You don't see redaction firing. You don't catch a runaway tool loop until it's already cost you $40.
The Grepture CLI fixes the dev loop with one command. It opens a local AI gateway on your machine, pipes your traffic through it, and tails every request live in your terminal. This post walks through that workflow end to end — log in, start a session, wire up your SDK, and watch traffic flow.
## Step 1: install and log in
Install the CLI from npm:
```bash
npm install -g @grepture/cli
```
Then authenticate:
```bash
grepture login
```
This kicks off a browser-based device-code flow. The CLI prints a short user code, opens your browser, and polls until you approve. Once you do, your token lands in `~/.grepture/credentials` (mode `0600`) and you're ready.
If you're on a headless box or in CI, skip the browser dance:
```bash
grepture login --token <your-api-token>
```
You only have to do this once per machine. The CLI keeps you signed in until you run `grepture logout`.
## Step 2: start a session
This is the headline:
```bash
grepture dev
```
That single command does three things at once: it creates a session record in Grepture Cloud (so the dashboard knows about it), boots a local proxy on `localhost:8787`, and starts polling for log entries to print into your terminal.
You'll see a header like this:
```text
Grepture Dev Session
─────────────────────────────────────────
Session:   d7150823...
Proxy:     http://localhost:8787
Target:    https://api.openai.com
Dashboard: https://app.grepture.com/sessions/d7150823...
Expires:   5/6/2026, 7:23:14 PM
─────────────────────────────────────────
Waiting for requests...
```
A few flags are worth knowing:
- `--port 9000` if `8787` is already taken.
- `--target https://api.anthropic.com` to default the upstream to Anthropic instead of OpenAI. You can also override per request by setting an `x-grepture-target` header.
- `--name "qa-agent"` to label the session in the dashboard so you can find it later.
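The per-request override is just a header on the call into the proxy. A minimal TypeScript sketch — the `x-grepture-target` header name comes from the post, while the helper itself and the idea of passing it through the OpenAI SDK's standard `defaultHeaders` option are illustrative, not part of the CLI:

```typescript
// Build proxy headers for one client. `x-grepture-target` tells the local
// proxy which upstream to hit for these requests, overriding the session
// default. The helper is illustrative, not part of the Grepture CLI.
function greptureHeaders(target?: string): Record<string, string> {
  return target ? { "x-grepture-target": target } : {};
}

// Usage sketch with the OpenAI SDK's `defaultHeaders` option:
//   new OpenAI({
//     baseURL: "http://localhost:8787/proxy",
//     defaultHeaders: greptureHeaders("https://api.anthropic.com"),
//   });
```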
Sessions are time-bounded on purpose. After 5 minutes with no traffic, the CLI marks the session as idle. After 15 minutes with no traffic, it auto-disconnects and cleans up. You can walk away from your laptop without leaving zombie sessions running.
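As a mental model, the lifecycle described above reduces to two thresholds. This is my sketch of the documented behavior, not the CLI's actual implementation:

```typescript
// Thresholds from the post: idle after 5 minutes without traffic,
// auto-disconnect and cleanup after 15. (Sketch only.)
type SessionState = "active" | "idle" | "disconnected";

function sessionState(minutesSinceLastRequest: number): SessionState {
  if (minutesSinceLastRequest >= 15) return "disconnected"; // cleaned up
  if (minutesSinceLastRequest >= 5) return "idle";          // marked idle
  return "active";
}
```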
## Step 3: wire up your SDK
Pointing your code at the local proxy is a one-line change. For the OpenAI SDK in TypeScript:
```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:8787/proxy",
  apiKey: process.env.OPENAI_API_KEY,
});

const res = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello, world." }],
});
```
Or via env var, no code change at all:
```bash
export OPENAI_BASE_URL=http://localhost:8787/proxy
```
Same idea for the Anthropic SDK in Python — start `grepture dev --target https://api.anthropic.com` and point your client base URL at `http://localhost:8787/proxy`:
```python
from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:8787/proxy")

msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello, world."}],
)
```
Your provider API key still travels with the request — the CLI forwards it as an `x-grepture-auth-forward` header so the upstream sees the same auth it always did. Grepture authenticates the proxy hop separately, with the API key tied to your account.
That's the entire wiring. No SDK rewrite, no separate test environment, no fork in your code.
## Step 4: tail traffic live
Now make a request from your app. The terminal running `grepture dev` will print a row for each one:
```text
9:47:30 PM  POST  200  gpt-4o  1,204ms  12,430 tok  /v1/chat/completions
9:47:45 PM  POST  403  gpt-4o     89ms     BLOCKED  /v1/chat/completions
9:48:01 PM  POST  200  gpt-4o  2,105ms  45,230 tok  /v1/chat/completions
```
Time, method, status, model, latency, tokens (or BLOCKED if a rule rejected the call), and the endpoint path — for every request, as it happens. Yellow status codes mean a rule fired but the request still went through (e.g. PII was redacted before the model saw it). Red codes mean an error or a block.
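Those color semantics reduce to a simple classification. The sketch below is my reading of the rules described above, not the CLI's source:

```typescript
// Green: clean pass-through. Yellow: a rule fired but the request still
// went upstream (e.g. PII redacted first). Red: a block or an error.
// Sketch of the feed's color logic as described in the post.
type RowColor = "green" | "yellow" | "red";

function rowColor(status: number, ruleFired: boolean): RowColor {
  if (status >= 400) return "red"; // blocked (403) or upstream error
  if (ruleFired) return "yellow";  // rule fired, request went through
  return "green";
}
```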
That live feed is the part that changes how you work. You stop guessing why the model returned something weird because you can see the request that produced it. You stop being surprised by token counts because they're right there next to every call. You catch an agent that's chosen the wrong model the moment it happens, not when the bill arrives.
For deeper inspection — full request and response bodies, redaction details, rule traces — the dashboard URL printed at startup goes straight to the session view at app.grepture.com. The terminal is for the at-a-glance picture. The dashboard is for the autopsy.
If you want to look at traffic without an active session running, `grepture logs` pulls recent entries from the cloud:
```bash
grepture logs --since 1h --status 4xx -n 50
```
It supports `--search`, `--method`, `--status` (`2xx`/`4xx`/`5xx`), `--since` (`30m`, `1h`, `1d`), and `-n` for the limit. Handy when you want to dig into something from yesterday's session without leaving the terminal.
For the broader observability picture and why we built the gateway this way, see trace mode and zero-latency observability.
## There's more in the CLI
This post stayed focused on the local session workflow because that's the one most people start with. The CLI does more: it scans your codebase for PII and hardcoded secrets, installs a pre-commit hook that blocks bad commits, outputs SARIF for GitHub Code Scanning in CI, and syncs detection rules between your team and the cloud. The post introducing the Grepture CLI covers all of that, and the docs quickstart has the reference. Source is on GitHub.
If you're routing a coding assistant through Grepture rather than your own app, the same `grepture dev` session works — see Claude Code and Cursor for the per-tool config.