Ben @ Grepture

Prompt Management: Version Control for Your LLM Prompts

Store, version, and serve prompts through Grepture — with variables, conditional logic, instant rollback, and full traffic visibility.

Prompts are a new primitive

Prompts aren't config. They aren't copy. They're a new kind of artifact that didn't exist before LLMs, and they need their own tooling. A prompt controls how your AI behaves — its tone, its reasoning, its guardrails. Change a prompt and you change the product. That makes prompts closer to code than to content, but they don't fit neatly into your existing development workflow either.

We built prompt management into Grepture because we think prompts deserve the same things code already has: versioning, rollback, visibility into what's running in production, and a way for your whole team to collaborate on them without going through a deploy cycle every time.

How Grepture prompt management works

Grepture's prompt management treats prompts as a managed resource — versioned, served through the proxy, and fully integrated with traffic logging. Here's the lifecycle:

  1. Create a prompt in the dashboard with a slug (e.g., standard-support-replies)
  2. Edit the draft — write your messages, define variables, test with the preview panel
  3. Publish the draft as an immutable numbered version (v1, v2, v3...)
  4. Activate the version you want the proxy to serve
  5. Resolve at request time — the proxy injects the right messages before forwarding to the LLM

No redeploy. No code change. Your application code references a slug, and the proxy handles the rest.

Drafts and versioning

The versioning model is deliberately simple — it mirrors how you'd want to work with any content that goes to production.

Drafts

Every prompt has exactly one draft. It's your working copy. Edit it as many times as you want, preview it with test variables, share it with teammates for review. A draft is never served in production unless you explicitly request it (useful for testing, which we'll cover later).

Publishing

When you're happy with a draft, publish it. This snapshots the current draft as an immutable numbered version — v1, v2, v3, and so on. Published versions can't be edited. This is intentional: you need to be able to trust that what you published is what's being served.

The first time you publish a prompt, that version is automatically activated. This is a convenience — you don't want to publish v1 and then wonder why nothing changed because you forgot to activate it.

Activating and rolling back

The active version is what the proxy serves by default. You can activate any published version at any time. This means rollback is instant: if v5 is causing problems, activate v4. No deploy, no revert commit, no waiting. Your next request gets the previous version.

This is the core value proposition of managed prompts. When you separate prompt content from application code, changing or reverting a prompt becomes a dashboard action instead of a deployment.

Variables and template logic

Static prompts are useful, but most real-world prompts need runtime data. Grepture uses Handlebars-style templates that support three patterns.

Variable interpolation

The simplest case — inject a value at request time:

You are a customer support agent for {{company_name}}.
The customer's name is {{customer_name}}.
Their issue: {{issue}}

Variables are passed when you use the prompt (via the SDK or headers), and the proxy replaces them before forwarding the request.
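Conceptually, interpolation is just a substitution over the variables map. Here's an illustrative sketch of the semantics (not Grepture's actual resolver — real Handlebars also handles escaping, helpers, and nested paths):

```typescript
// Illustrative sketch of {{variable}} interpolation — not Grepture's
// actual resolver, just the semantics of the simple case.
type Vars = Record<string, string>;

function interpolate(template: string, vars: Vars): string {
  // Replace each {{name}} with the matching variable's value;
  // unknown variables are left intact here (real behavior may differ).
  return template.replace(/\{\{(\w+)\}\}/g, (match, name: string) =>
    name in vars ? vars[name] : match,
  );
}

const resolved = interpolate(
  "You are a customer support agent for {{company_name}}.",
  { company_name: "Acme" },
);
// resolved === "You are a customer support agent for Acme."
```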

Conditional blocks

Use {{#if}} to include or exclude sections based on whether a variable is truthy:

{{#if premium}}
This customer is on a premium plan. Prioritize their request
and offer proactive solutions.
{{else}}
This customer is on the free tier. Be helpful but direct
them to documentation for complex issues.
{{/if}}

List iteration

Use {{#each}} to iterate over array variables:

The customer has the following open tickets:
{{#each open_tickets}}
- {{this}}
{{/each}}
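Putting the two block helpers together: the variables for the templates above are passed as an ordinary object, with a boolean driving the conditional and an array driving the iteration (the values here are invented for illustration):

```typescript
// Example variables for the {{#if}} and {{#each}} templates above —
// a boolean drives the conditional, an array drives the iteration.
const variables = {
  premium: true,
  open_tickets: [
    "Cannot log in after password reset",
    "Invoice PDF fails to download",
  ],
};
```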

Variable schema

Each prompt can define a variable schema — the name, type, and default value for each variable. This serves as documentation for anyone using the prompt and powers the test panel in the editor. You can preview exactly how your prompt will resolve with different variable combinations before publishing.
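As a rough sketch, a schema for the support prompt above might look like this (the field names and type strings are illustrative, not Grepture's exact schema format):

```typescript
// Hypothetical variable schema for the support prompt — field names
// and type strings are illustrative, not Grepture's exact format.
const variableSchema = [
  { name: "customer_name", type: "string", default: "there" },
  { name: "premium", type: "boolean", default: false },
  { name: "open_tickets", type: "string[]", default: [] as string[] },
];
```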

SDK integration

The Grepture SDK provides three patterns for using managed prompts, each giving you a different level of control.

Server-side resolution with prompt.use()

This is the simplest approach and what we recommend for most use cases. You reference the prompt by slug, the SDK sets the right headers, and the proxy resolves the template and injects the messages before forwarding to the LLM.

import Grepture from "@grepture/sdk";
import OpenAI from "openai";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY,
  proxyUrl: "https://proxy.grepture.com",
});

const openai = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.OPENAI_API_KEY,
    baseURL: "https://api.openai.com/v1",
  }),
});

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: grepture.prompt.use("standard-support-replies", {
    variables: {
      customer_name: ticket.customerName,
      issue: ticket.description,
      tone: "friendly",
    },
  }),
});

Under the hood, prompt.use() returns a special marker array. When the SDK's wrapped fetch detects this marker, it sets X-Grepture-Prompt and X-Grepture-Vars headers on the outgoing request. The proxy reads these headers, fetches the prompt template, resolves variables, and replaces the request's messages array with the resolved content. Your LLM provider receives a normal chat completion request — it has no idea a prompt management system is involved.
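You never need to set these headers yourself, but for illustration, what the wrapped fetch attaches looks roughly like this (a sketch — the SDK handles this for you):

```typescript
// Roughly the headers the SDK's wrapped fetch attaches before the
// request reaches the proxy (a sketch — the SDK does this for you).
const promptHeaders = {
  "X-Grepture-Prompt": "standard-support-replies", // slug, optionally "@3" or "@draft"
  "X-Grepture-Vars": JSON.stringify({
    customer_name: "Sarah",
    issue: "billing question",
    tone: "friendly",
  }),
};
```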

Client-side resolution with prompt.assemble()

Sometimes you want to inspect or modify the resolved messages before sending them to the LLM. prompt.assemble() fetches the prompt from the proxy, resolves variables, and returns the final messages array for you to use however you want.

const { messages, metadata } = await grepture.prompt.assemble(
  "standard-support-replies",
  {
    variables: {
      customer_name: "Sarah",
      issue: "billing question",
      tone: "formal",
    },
  },
);

// Append additional context before sending
messages.push({
  role: "user",
  content: additionalContext,
});

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages,
});

This is useful when you need to add extra messages (like retrieved context from a RAG pipeline), merge prompts from different sources, or log the exact messages being sent.

Fetch and reuse with prompt.get() and prompt.resolve()

For batch processing or high-throughput scenarios, you can fetch the raw template once and resolve it multiple times with different variable sets — avoiding a network call per resolution.

// Fetch the template once
const template = await grepture.prompt.get("standard-support-replies");

// Resolve for each ticket in a batch
const results = await Promise.all(
  tickets.map(async (ticket) => {
    const messages = grepture.prompt.resolve(template.messages, {
      customer_name: ticket.customerName,
      issue: ticket.description,
      tone: ticket.priority === "high" ? "empathetic" : "friendly",
    });

    return openai.chat.completions.create({
      model: "gpt-4o",
      messages,
    });
  }),
);

prompt.get() returns the messages with {{handlebars}} placeholders intact, plus the variable schema. prompt.resolve() is a pure function — no network calls, no side effects. This combination is ideal when you're processing hundreds of items and don't want the overhead of a proxy round-trip for each one.

Pinning a version

All three methods accept an optional version parameter. By default, the proxy serves the active version. But you can pin to a specific version or request the draft:

// Pin to version 3
grepture.prompt.use("standard-support-replies", { version: 3 });

// Use the draft (for testing)
grepture.prompt.use("standard-support-replies", { version: "draft" });

This is useful for testing a new version before activating it, or for keeping a specific workflow on a known-good version while you iterate on the active one.

How the proxy resolves prompts

If you're curious about the mechanics, here's what happens when a request with a managed prompt hits the proxy.

The proxy reads two headers:

  • X-Grepture-Prompt — the prompt slug, optionally with a version reference (e.g., standard-support-replies, standard-support-replies@3, or standard-support-replies@draft)
  • X-Grepture-Vars — a JSON string containing the variables to inject

The proxy fetches the prompt template (from Redis cache or the database), resolves the Handlebars template with the provided variables, and replaces the messages array in the request body. The resolved request is then forwarded to the target LLM provider.

Caching

Published prompt versions are cached in Redis with a 60-second TTL. Since published versions are immutable, the cache is always consistent — the only cost is a delay of up to 60 seconds after you activate a new version before every proxy instance picks it up. Draft requests bypass the cache entirely and always fetch from the database, so you see changes immediately when testing.

Skipping rules

Managed prompts can optionally set skip_rules to bypass the security pipeline (PII redaction, prompt injection detection, etc.) for their content. The rationale: if you wrote the prompt and control its content, running it through injection detection is unnecessary overhead. User-provided variables still flow through the normal pipeline. This is opt-in and off by default.
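As a sketch, a prompt's settings with skip rules enabled might look like the following. The rule identifiers here are hypothetical — the post only establishes that skip_rules exists, applies to prompt content (not user variables), and is off by default:

```typescript
// Hypothetical shape of a prompt's skip_rules setting — the rule
// identifiers are invented for illustration, not Grepture's real names.
const promptSettings = {
  slug: "standard-support-replies",
  skip_rules: ["pii_redaction", "injection_detection"], // off by default
};
```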

Traffic visibility

Every request that uses a managed prompt records the prompt_id and prompt_version in the traffic log. This gives you full traceability — for any logged request, you can see exactly which prompt version produced that response.

In the dashboard, this shows up in several places:

  • Traffic table — a prompt column shows the prompt name for requests that used managed prompts. You can filter the traffic log by prompt to see all requests for a specific prompt.
  • Traffic detail — the detail view for any request shows the prompt name and version, with a link directly to that version in the prompt editor.
  • Prompt list — the prompts overview page shows usage counts, so you can see at a glance how much traffic each prompt is handling.
  • Prompt editor — a "View traffic" link takes you to the traffic log filtered to that prompt, so you can jump from editing a prompt to seeing how it's performing in production.

This closes the loop that's missing when prompts live in code. You can trace a degraded response back to a specific prompt version, compare response quality across versions, and make informed decisions about whether to roll back or iterate.

Getting started

Setting up prompt management takes a few minutes.

1. Create a prompt — go to the Prompts page in the Grepture dashboard, create a new prompt with a slug, write your messages, and publish.

2. Install the SDK and point your LLM calls through the proxy:

npm install @grepture/sdk

import Grepture from "@grepture/sdk";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY,
  proxyUrl: "https://proxy.grepture.com",
});

// Use with any OpenAI-compatible client
const openai = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.OPENAI_API_KEY,
    baseURL: "https://api.openai.com/v1",
  }),
});

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: grepture.prompt.use("your-prompt-slug", {
    variables: { name: "World" },
  }),
});

3. Check the traffic log — your request will appear with the prompt name and version, so you can verify everything is wired up correctly.

If you're already using Grepture for PII redaction or traffic logging, prompt management works with your existing setup — same proxy, same SDK, same dashboard. And if you're new to Grepture, the quickstart guide will get you from zero to a working proxy in about five minutes.

Prompts are too important to live as strings in your codebase. Give them the same versioning, visibility, and control you give the rest of your production infrastructure. Try Grepture and see the difference.