How to Version and Manage LLM Prompts Server-Side

Stop hardcoding prompts. Store, version, and deploy prompt templates from a dashboard — resolve them at request time with zero redeploys. Handlebars templating, draft/publish workflow.

The problem: prompts buried in code

Your system prompts live in string literals scattered across your codebase. Changing a single word in a prompt means opening a PR, waiting for CI, and deploying. There's no version history, no rollback, and no way for a product manager or prompt engineer to iterate without developer involvement.

// Hardcoded in your codebase — changing this requires a deploy
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content: `You are a friendly support agent for Acme Corp.
      Respond in a helpful, concise tone. If the customer is upset,
      acknowledge their frustration before solving the problem.`,
    },
    {
      role: "user",
      content: ticket.text,
    },
  ],
});

Want to tweak the tone? A/B test a new instruction? Roll back after a bad change? You're stuck with code deploys for all of it.

The solution: prompt management with Grepture

Grepture is an AI gateway that lets you store prompt templates in a dashboard, version them with a draft/publish workflow, and resolve them at request time — either server-side through the proxy or client-side via the SDK.

Your prompts become a managed resource with Handlebars variables, immutable versions, and instant rollback. Your code just references a slug.

Setup in 3 minutes

1. Install the SDK

npm install @grepture/sdk

2. Get your API key

Sign up at grepture.com/en/pricing — the free plan includes 1,000 requests/month. Copy your API key from the dashboard.

3. Create a prompt in the dashboard

Go to Prompts and click New Prompt. Give it a name, a slug (e.g., support-reply), and start defining messages in the editor.

4. Use it in code

import OpenAI from "openai";
import { Grepture } from "@grepture/sdk";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY!,
  proxyUrl: "https://proxy.grepture.com",
});

const openai = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.OPENAI_API_KEY!,
    baseURL: "https://api.openai.com/v1",
  }),
});

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: grepture.prompt.use("support-reply", {
    variables: { issue: ticket.text, tone: "friendly", company: "Acme" },
  }),
});

The prompt is resolved server-side by the proxy. No extra roundtrips, no prompts in your codebase.

Creating prompts in the dashboard

Each prompt has three fields at creation time:

| Field | Description |
| --- | --- |
| name | Display name (e.g., "Support Reply") |
| slug | URL-safe identifier used in API calls (e.g., support-reply) |
| skip_rules | When enabled, the prompt bypasses the guardrail rule pipeline |

After creating, you land in the editor. Define a messages array with system, user, and assistant roles. Use the Variables panel to document each template variable with a name, type (string, number, boolean), and optional default value.

Here's a realistic example — a support reply prompt with three variables:

System message:

You are a {{tone}} support agent for {{company}}.

{{#if context}}
Here is the relevant context:
{{context}}
{{/if}}

Please respond to the following issue:
{{issue}}

Variables panel:

| Variable | Type | Default |
| --- | --- | --- |
| tone | string | "friendly" |
| company | string | |
| issue | string | |
| context | string | |

Handlebars templating

Prompt templates use Handlebars-style syntax for dynamic content.

Variable interpolation

Hello {{name}}, your order {{order_id}} is ready.

Missing variables resolve to an empty string.

Conditionals

{{#if premium}}
You are a premium customer and qualify for priority support.
{{else}}
Standard support response.
{{/if}}

A value is treated as falsy when it is missing, empty, the string "false", or the string "0".

Loops

{{#each items}}
- {{this}}
{{/each}}

Pass arrays as JSON strings in the variables object: { items: '["item1", "item2"]' }.
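To make the semantics above concrete, here is a minimal sketch of a resolver that behaves as documented — interpolation falls back to an empty string, #if uses the falsiness rules, and #each parses arrays from JSON strings. This is an illustration of the described behavior, not Grepture's actual implementation:

```typescript
type Vars = Record<string, string>;

// A value is falsy if missing, empty, "false", or "0".
const truthy = (v: string | undefined): boolean =>
  v !== undefined && v !== "" && v !== "false" && v !== "0";

function render(template: string, vars: Vars): string {
  let out = template;

  // {{#each items}}...{{/each}} — arrays arrive as JSON strings
  out = out.replace(
    /\{\{#each (\w+)\}\}([\s\S]*?)\{\{\/each\}\}/g,
    (_, name, body) => {
      const raw = vars[name];
      const items: unknown[] = raw ? JSON.parse(raw) : [];
      return items
        .map((item) => body.replace(/\{\{this\}\}/g, String(item)))
        .join("");
    },
  );

  // {{#if flag}}...{{else}}...{{/if}}
  out = out.replace(
    /\{\{#if (\w+)\}\}([\s\S]*?)(?:\{\{else\}\}([\s\S]*?))?\{\{\/if\}\}/g,
    (_, name, yes, no = "") => (truthy(vars[name]) ? yes : no),
  );

  // {{name}} — missing variables resolve to an empty string
  out = out.replace(/\{\{(\w+)\}\}/g, (_, name) => vars[name] ?? "");

  return out;
}
```

For example, `render("Hello {{name}}", { name: "Ada" })` produces `"Hello Ada"`, and a missing variable simply disappears from the output.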

Draft/publish workflow

Prompts follow a four-step lifecycle:

  1. Draft — edit freely, save as many times as you want. The draft is never served to production traffic unless explicitly requested with @draft.
  2. Publish — snapshots the current draft into an immutable version (v1, v2, v3...). Published versions cannot be edited.
  3. Activate — set which published version is "live." This is what production traffic resolves when no version is specified.
  4. Rollback — activate any older version to roll back instantly. No deploy, no downtime.

This means you can edit a draft, test it in the playground, publish it, and activate it — all without touching your codebase. If something goes wrong, activate the previous version and you're back to normal in seconds.

Using prompts via SDK

The SDK exposes a grepture.prompt namespace with methods for every level of control.

prompt.use() — server-side resolution

The simplest approach. Pass grepture.prompt.use() as the messages field. The proxy resolves the template server-side — zero extra roundtrips:

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: grepture.prompt.use("support-reply", {
    variables: { issue: ticket.text, tone: "friendly", company: "Acme" },
    // version: "draft",  // pin to draft
    // version: 3,        // pin to v3
    // omit for active (live) version
  }),
});

prompt.assemble() — client-side resolution

Fetches the template and resolves it locally. Useful when you want to inspect or modify messages before sending:

const { messages } = await grepture.prompt.assemble("support-reply", {
  variables: { issue: ticket.text, tone: "friendly", company: "Acme" },
});

// Append extra context before sending
messages.push({ role: "user", content: additionalContext });

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages,
});

prompt.get() + prompt.resolve() — fetch once, resolve many

Fetch the raw template once and resolve it multiple times with different variables. Useful for batch operations:

const template = await grepture.prompt.get("support-reply");

const resolved1 = grepture.prompt.resolve(template.messages, {
  issue: tickets[0].text, tone: "friendly", company: "Acme",
});
const resolved2 = grepture.prompt.resolve(template.messages, {
  issue: tickets[1].text, tone: "empathetic", company: "Acme",
});

prompt.list() — discover available prompts

const prompts = await grepture.prompt.list();
for (const p of prompts) {
  console.log(`${p.slug} (v${p.active_version})`);
}

SDK method reference

| Method | Returns | Network | Description |
| --- | --- | --- | --- |
| prompt.use(slug, opts?) | PromptMessages | None | Marker array for server-side resolution via clientOptions() |
| prompt.assemble(slug, opts?) | Promise<AssembledPrompt> | 1 call | Fetch + resolve client-side. Returns { messages, metadata } |
| prompt.get(slug, opts?) | Promise<PromptTemplate> | 1 call | Fetch raw template with {{handlebars}} intact |
| prompt.resolve(messages, vars) | PromptMessage[] | None | Pure function — resolve template messages locally |
| prompt.list() | Promise<PromptListItem[]> | 1 call | List all prompts for the team |

All methods accept an optional version option (number | "draft"). Omit for the active (live) version.

Using prompts via headers

If you don't want to use the SDK's prompt methods, you can resolve prompts with two headers on any OpenAI-compatible request:

const response = await openai.chat.completions.create(
  { model: "gpt-4o", messages: [] }, // ignored — the proxy replaces it with the resolved prompt
  {
    headers: {
      "X-Grepture-Prompt": "support-reply",
      "X-Grepture-Vars": JSON.stringify({
        issue: ticket.text,
        tone: "friendly",
        company: "Acme",
      }),
    },
  },
);

| Header | Description |
| --- | --- |
| X-Grepture-Prompt | Prompt slug. Append @draft or @v3 to pin a specific version. |
| X-Grepture-Vars | JSON object of template variables. |

The proxy resolves the template and replaces the request's messages array before forwarding. The messages array you pass in the request body is ignored.
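If you set these headers by hand in several places, a small helper keeps the slug suffix and JSON encoding consistent. buildPromptHeaders is a hypothetical convenience, not part of the Grepture SDK — it just constructs the two documented headers, including @draft / @vN version pinning:

```typescript
// Hypothetical helper (not in the SDK): builds the two Grepture prompt headers.
function buildPromptHeaders(
  slug: string,
  vars: Record<string, string>,
  version?: number | "draft",
): Record<string, string> {
  // Append @draft or @v<N> to pin a version; omit for the active version.
  const suffix =
    version === undefined ? "" : version === "draft" ? "@draft" : `@v${version}`;
  return {
    "X-Grepture-Prompt": `${slug}${suffix}`,
    "X-Grepture-Vars": JSON.stringify(vars),
  };
}
```

You could then pass `{ headers: buildPromptHeaders("support-reply", { issue: ticket.text }, 3) }` as the request options instead of spelling out both headers inline.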

Testing in the playground

The Test Panel in the prompt editor lets you fill in variable values and see the resolved output without making an API call. Use it to verify your templates before publishing — check that conditionals render correctly, loops expand as expected, and the final prompt reads the way you intend.

Rule pipeline integration

By default, resolved prompts flow through your guardrail rules like any other request. This means PII redaction, injection detection, and other rules apply to the final resolved messages.

If you control the content of a prompt and don't want rules interfering, enable Skip Rules on that prompt. Use this for system-only prompts where every variable is server-controlled:

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: grepture.prompt.use("internal-classifier", {
    variables: { text: document.body },
    // skip_rules is configured on the prompt in the dashboard, not in code
  }),
});

Next steps