How to Separate Prompts from Code with Server-Side Prompt Serving

Treat prompts like configuration, not code. Edit, version, and deploy prompt templates without redeploying your app — and let non-developers iterate on prompts safely.

The problem: prompts are trapped in your codebase

Every prompt change follows the same path: open a PR, wait for review, pass CI, deploy. A one-word tweak to a system prompt takes hours instead of seconds.

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content: `You are a friendly support agent for Acme Corp.
      Always greet the customer by name. Be concise but warm.
      If the issue involves billing, escalate to a human.
      Never mention competitors by name.`,
    },
    { role: "user", content: ticket.text },
  ],
});

This works — until it doesn't. Product wants to change "friendly" to "professional." A content writer needs to add a new instruction about refund policy. The team wants to A/B test two versions of the system prompt. Every change goes through the same bottleneck: a developer, a PR, a deploy.

The costs compound:

  • Slow iteration — prompt tuning should take minutes, not deploy cycles
  • No rollback — if a prompt change degrades quality, you're reverting commits and redeploying
  • Developer bottleneck — product managers, content writers, and designers can't contribute to prompt work without filing tickets
  • No audit trail — git blame tells you who changed the string, not why the prompt was v3 instead of v4

The "prompt as config" pattern

Prompts aren't application logic. They're configuration — closer to feature flags or copy than to business logic. The best teams treat them that way.

The pattern is simple: your application code defines the structure (which variables exist, what model to call, how to handle the response). The prompt content lives in a separate system that can be updated independently.

This is the same idea behind feature flags, CMS-driven content, and environment variables. Decouple what changes often from what changes rarely. Prompts change constantly during development and tuning. Application code changes when you ship features.
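To make the separation concrete, here is a minimal vendor-neutral sketch. A JSON string stands in for wherever the content lives (a config file, a CMS, a remote prompt store); the code owns only the structure. The names here (promptJson, buildRequest) are illustrative, not part of any API:

```typescript
// Content: lives outside the app and is editable without a deploy. A JSON
// string stands in here for a config file or remote prompt store.
const promptJson = `{
  "system": "You are a friendly support agent for Acme Corp. Be concise but warm.",
  "model": "gpt-4o"
}`;

const prompt = JSON.parse(promptJson) as { system: string; model: string };

// Structure: the code decides how the content is used — which model runs,
// how the messages are assembled, what happens with the response.
function buildRequest(issue: string) {
  return {
    model: prompt.model,
    messages: [
      { role: "system", content: prompt.system },
      { role: "user", content: issue },
    ],
  };
}
```

Swapping the JSON for new prompt text changes behavior without touching `buildRequest` — the same decoupling a feature-flag system gives you for code paths.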

Architecture: server-side prompt resolution

Here's how the flow works with Grepture:

  1. Your app makes an LLM call through the Grepture proxy, referencing a prompt by slug (e.g., support-reply)
  2. The proxy looks up the active version of that prompt template
  3. The proxy resolves the template with the variables your app provided
  4. The resolved messages are forwarded to the LLM provider (OpenAI, Anthropic, etc.)
  5. The response comes back through the proxy to your app

The key detail: resolution happens inside the proxy, on the same request path your traffic already takes. There's no extra roundtrip to a prompt service and no added network hop: the proxy looks up the template, resolves it, and forwards the request in a single hop.

Your code never contains the prompt text. It contains a reference to a prompt and the variables to fill in.
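Steps 2 through 4 of the flow can be sketched as a single function. This is a toy model of what the proxy does between receiving your request and forwarding it, not Grepture's actual implementation; the in-memory Map stands in for the dashboard-managed prompt store:

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Stand-in for the dashboard-managed store: slug → active template version.
const activeTemplates = new Map<string, Message[]>([
  ["support-reply", [
    { role: "system", content: "You are a {{tone}} support agent for {{company}}." },
    { role: "user", content: "{{issue}}" },
  ]],
]);

// Look up the active version of the template, fill in the caller's
// variables, and return the messages that get forwarded to the provider.
function resolvePrompt(slug: string, variables: Record<string, string>): Message[] {
  const template = activeTemplates.get(slug);
  if (!template) throw new Error(`no active version for prompt: ${slug}`);
  return template.map((m) => ({
    role: m.role,
    content: m.content.replace(/\{\{(\w+)\}\}/g, (_, key) => variables[key] ?? ""),
  }));
}
```

Because the store lives inside the proxy process, the lookup and fill happen on the request path itself rather than behind a separate service call.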

Setting up server-side prompt serving

1. Create a prompt in the dashboard

Go to Prompts in the Grepture dashboard and click New Prompt. Give it a slug (e.g., support-reply) and define your messages with Handlebars-style variables:

You are a {{tone}} support agent for {{company}}.

{{#if context}}
Here is the relevant context:
{{context}}
{{/if}}

Please respond to the following issue:
{{issue}}

Define the variables (tone, company, context, issue) in the Variables panel with types and optional defaults.
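To see how the {{#if}} block behaves, here is a minimal resolver sketch — not Grepture's template engine, just an illustration of the Handlebars-style semantics: plain {{var}} placeholders are substituted, and an {{#if var}}...{{/if}} block is kept only when the variable is provided:

```typescript
// Minimal Handlebars-style resolution: conditional blocks first, then
// plain variable substitution.
function renderTemplate(
  template: string,
  vars: Record<string, string | undefined>,
): string {
  return template
    // Keep the block body when the variable is truthy, drop it otherwise.
    .replace(/\{\{#if (\w+)\}\}([\s\S]*?)\{\{\/if\}\}/g, (_, key, body) =>
      vars[key] ? body : "",
    )
    // Then fill plain {{var}} placeholders.
    .replace(/\{\{(\w+)\}\}/g, (_, key) => vars[key] ?? "");
}

const template = [
  "You are a {{tone}} support agent for {{company}}.",
  "",
  "{{#if context}}Here is the relevant context:\n{{context}}{{/if}}",
  "",
  "Please respond to the following issue:",
  "{{issue}}",
].join("\n");

// With no context variable provided, the whole {{#if}} block disappears.
const withoutContext = renderTemplate(template, {
  tone: "friendly",
  company: "Acme Corp",
  issue: "I can't log in.",
});
```

This is why optional variables like context are safe to leave unset: the template degrades cleanly instead of rendering an empty "Here is the relevant context:" header.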

2. Install the SDK

npm install @grepture/sdk

3. Reference the prompt in your code

import OpenAI from "openai";
import { Grepture } from "@grepture/sdk";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY!,
  proxyUrl: "https://proxy.grepture.com",
});

const openai = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.OPENAI_API_KEY!,
    baseURL: "https://api.openai.com/v1",
  }),
});

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: grepture.prompt.use("support-reply", {
    variables: {
      issue: ticket.text,
      tone: "friendly",
      company: "Acme Corp",
    },
  }),
});

prompt.use() doesn't make a network call. It returns a marker array that tells the proxy to resolve the template server-side. The proxy looks up support-reply, fills in the variables, and forwards the resolved messages to OpenAI — all in one request.

Now anyone on your team can edit the prompt text in the dashboard without touching this code.

Empowering non-developers

Once prompts are in the dashboard, the audience for prompt editing expands:

  • Product managers adjust tone, add instructions, and tweak behavior based on user feedback
  • Content writers refine language, fix grammar, and align prompts with brand voice
  • Designers iterate on conversational UX without waiting for a developer

Developers define the contract: the template structure, the variable names, the model configuration. Non-developers fill in the content within that structure. The Variables panel documents what each variable does, so editors know what they're working with.

The Test Panel in the prompt editor lets anyone fill in sample variable values and preview the fully resolved output — no API call, no deployment, no risk. If the resolved output looks right, publish it. If not, keep editing.

HTTP header-based resolution

If you don't want to use the SDK, you can reference prompts with HTTP headers. This works with any OpenAI-compatible client in any language:

const response = await openai.chat.completions.create(
  { model: "gpt-4o", messages: [] },
  {
    headers: {
      "X-Grepture-Prompt": "support-reply",
      "X-Grepture-Vars": JSON.stringify({
        issue: ticket.text,
        tone: "friendly",
        company: "Acme Corp",
      }),
    },
  },
);

The messages array in the request body is ignored — the proxy replaces it with the resolved template. Append @draft or @v3 to the slug to pin a specific version: "support-reply@v3".

This is useful for teams using Python, Go, or any other language where installing the Grepture SDK isn't practical — any HTTP client that can set custom headers works.
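For clients without an OpenAI SDK at all, the same headers can be set on a raw HTTP request. The sketch below builds such a request; the buildPromptRequest helper is illustrative, and the /v1/chat/completions path is an assumption (an OpenAI-compatible path), so check your proxy configuration for the exact endpoint and auth headers:

```typescript
// Build the raw HTTP request for header-based resolution. Any client that
// can set custom headers works; no SDK required.
function buildPromptRequest(slug: string, vars: Record<string, string>) {
  return {
    url: "https://proxy.grepture.com/v1/chat/completions",
    method: "POST" as const,
    headers: {
      "Content-Type": "application/json",
      "X-Grepture-Prompt": slug,
      "X-Grepture-Vars": JSON.stringify(vars),
    },
    // messages is ignored by the proxy; it substitutes the resolved template.
    body: JSON.stringify({ model: "gpt-4o", messages: [] }),
  };
}

// Pin a specific version by appending it to the slug:
const req = buildPromptRequest("support-reply@v3", {
  issue: "I was charged twice.",
  tone: "friendly",
  company: "Acme Corp",
});
// await fetch(req.url, req);  // send with any HTTP client
```

The same shape translates directly to requests, net/http, or curl in other languages, since everything prompt-related travels in headers.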

Migration guide: moving hardcoded prompts to Grepture

Step 1: Identify prompts in your code

Search for messages: arrays in your LLM calls. Every hardcoded system prompt, user template, or few-shot example is a migration candidate.

Before:

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content: `You are a friendly support agent for Acme Corp.
      Always greet the customer by name. Be concise but warm.
      If the issue involves billing, escalate to a human.
      Never mention competitors by name.`,
    },
    {
      role: "user",
      content: `Customer issue: ${ticket.text}`,
    },
  ],
});

Step 2: Create the prompt in the dashboard

Copy the message content into a new prompt. Replace dynamic values with {{variables}}:

  • System message: You are a {{tone}} support agent for {{company}}...
  • User message: Customer issue: {{issue}}

Publish the first version.

Step 3: Replace hardcoded strings with prompt.use()

After:

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: grepture.prompt.use("support-reply", {
    variables: {
      issue: ticket.text,
      tone: "friendly",
      company: "Acme Corp",
    },
  }),
});

The code is shorter, the prompt is editable without deploys, and you get versioning and rollback for free.

Step 4: Repeat for each prompt

Start with high-churn prompts — the ones your team edits most often. Low-churn prompts (like the system prompt for a simple classification task) can stay hardcoded if nobody needs to touch them.

Testing before publishing

The versioning workflow prevents accidental production changes:

  1. Draft — edit freely in the dashboard. Drafts are never served to production traffic.
  2. Test — use the Test Panel to preview resolved output with sample variables. Verify the output looks correct.
  3. Publish — snapshot the draft into an immutable version (v1, v2, v3...).
  4. Activate — set the new version as "live." All production traffic using that prompt slug now resolves to the new version.
  5. Rollback — if quality degrades, activate a previous version. One click, instant rollback, no deploy.

To test a draft in your application before publishing, pin the draft version explicitly:

messages: grepture.prompt.use("support-reply", {
  variables: { issue: ticket.text, tone: "friendly" },
  version: "draft",
});

This lets you validate against real traffic in a staging environment without affecting production.
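One way to wire that up, assuming an environment variable distinguishes staging from production, is a small helper that pins the draft everywhere except production. The promptOptions helper below is illustrative, not part of the SDK:

```typescript
// Pin the draft outside production so staging exercises the unpublished
// prompt while live traffic keeps resolving the active version.
function promptOptions(
  issue: string,
  env: string,
): { variables: { issue: string; tone: string }; version?: "draft" } {
  return {
    variables: { issue, tone: "friendly" },
    // Omitting version means the proxy serves whatever version is active.
    ...(env === "production" ? {} : { version: "draft" as const }),
  };
}
```

In the application this would be called as, for example, grepture.prompt.use("support-reply", promptOptions(ticket.text, process.env.NODE_ENV ?? "development")).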

Next steps