Mask and Restore: Reversible Redaction That Keeps LLMs Useful
How reversible PII redaction works — mask sensitive data before it reaches the LLM, then restore original values in the response.
The redaction dilemma
You want to protect PII in your AI pipeline. So you add redaction. Problem solved, right?
Not quite. Traditional redaction breaks your application.
Consider a customer support AI. A user sends: "Hi, I'm Sarah Miller, my email is sarah@example.com and my order #45231 is missing."
With permanent redaction, the LLM receives: "Hi, I'm [REDACTED], my email is [REDACTED] and my order #45231 is missing."
The model does its best. It responds: "I'm sorry [REDACTED], let me look into order #45231. I'll send an update to [REDACTED]."
Your customer sees [REDACTED] in the response. The protection worked — and your app is broken. You've traded one problem for another.
With traditional redaction, you can protect the data or you can have a working app. Not both.
How mask and restore works
Mask and restore is Grepture's approach to reversible redaction. Instead of destroying PII, it temporarily replaces sensitive values with tokens, lets the LLM process the tokenized prompt, and then swaps the original values back into the response. The LLM never sees real PII. The user never sees tokens. The application works perfectly.
The flow:
Step 1: Detect
Grepture's detection engine identifies PII in the incoming request. This uses regex patterns for structured data (emails, phone numbers, credit cards) and local AI models for unstructured data (names, locations, organizations). All detection runs on Grepture's infrastructure — your data doesn't get forwarded anywhere else.
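The regex side of detection can be sketched in a few lines. These patterns are deliberately simplified for illustration — production detectors (and the AI models used for names and locations) are far stricter:

```typescript
// Simplified regex detection for structured PII. Illustrative only — not
// Grepture's actual patterns, which handle many more formats and edge cases.
const PATTERNS: Record<string, RegExp> = {
  EMAIL: /[\w.+-]+@[\w-]+\.[\w.-]+/g,
  PHONE: /\+?\d[\d\s().-]{7,}\d/g,
};

function detect(text: string): { type: string; value: string }[] {
  const hits: { type: string; value: string }[] = [];
  for (const [type, pattern] of Object.entries(PATTERNS)) {
    for (const match of text.matchAll(pattern)) {
      hits.push({ type, value: match[0] });
    }
  }
  return hits;
}
```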
Step 2: Replace with tokens
Each detected PII value gets replaced with a unique, type-preserving token. The token format encodes the entity type and a short hash:
- Sarah Miller becomes <<PERSON_a7f3>>
- sarah@example.com becomes <<EMAIL_b2e1>>
The type prefix matters. It tells the LLM what kind of entity it's working with, which helps the model maintain coherent responses. More on why this works in the next section.
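A minimal sketch of this tokenization step, assuming a short hash of the value as the token suffix (the `tokenize` function and `Detection` shape are illustrative, not Grepture's actual API):

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of type-preserving tokenization: each detected value
// becomes <<TYPE_hash>> and the mapping is kept for later restoration.
type Detection = { type: string; value: string };

function tokenize(
  text: string,
  detections: Detection[]
): { masked: string; vault: Map<string, string> } {
  const vault = new Map<string, string>();
  let masked = text;
  for (const { type, value } of detections) {
    const hash = createHash("sha256").update(value).digest("hex").slice(0, 4);
    const token = `<<${type}_${hash}>>`;
    vault.set(token, value);
    masked = masked.split(value).join(token); // replace every occurrence
  }
  return { masked, vault };
}
```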
Step 3: Store mapping in vault
The token-to-value mapping is stored in an encrypted vault with a configurable TTL (time-to-live). The vault holds the real values so they can be restored later.
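The TTL behavior can be illustrated with a toy in-memory vault (Grepture's real vault is encrypted and server-side; this only shows the expiry mechanics):

```typescript
// Toy TTL-bounded token vault. Entries expire after ttlMs and are lazily
// cleaned up on lookup. Illustrative only — no encryption, no persistence.
class TokenVault {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  put(token: string, value: string): void {
    this.entries.set(token, { value, expiresAt: Date.now() + this.ttlMs });
  }

  get(token: string): string | undefined {
    const entry = this.entries.get(token);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(token); // lazy cleanup on expiry
      return undefined;
    }
    return entry.value;
  }
}
```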
Step 4: Send tokenized request
The LLM receives the sanitized prompt:
"Hi, I'm <<PERSON_a7f3>>, my email is <<EMAIL_b2e1>> and my order #45231 is missing."
No real PII reaches the AI provider. If the provider logs the request, caches it, or uses it for training — doesn't matter. Nothing sensitive in there.
Step 5: LLM responds with tokens
The model maintains referential consistency. It tracks the tokens as entities and uses them correctly in its response:
"I'm sorry <<PERSON_a7f3>>, let me look into order #45231. I'll send an update to <<EMAIL_b2e1>>."
Step 6: Restore on response
Grepture intercepts the response, looks up each token in the vault, and swaps them back to the original values:
"I'm sorry Sarah Miller, let me look into order #45231. I'll send an update to sarah@example.com."
The user gets a natural response. The LLM never saw real PII. No code changes beyond the initial proxy setup. This is what preventing data leaks looks like when the application also needs to keep working.
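The restore step boils down to a single pattern substitution over the response — scan for `<<TYPE_hash>>` tokens and swap in the vault values (a sketch, not Grepture's implementation; unknown tokens are left untouched):

```typescript
// Replace every <<TYPE_hash>> token with its original value from the vault.
// Tokens with no vault entry (e.g. expired TTL) pass through unchanged.
function restore(response: string, vault: Map<string, string>): string {
  return response.replace(/<<[A-Z]+_[0-9a-f]+>>/g, (token) => vault.get(token) ?? token);
}
```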
Why this works with LLMs
Why would a language model handle made-up tokens correctly? It comes down to how transformers process text.
LLMs track entities by reference, not by name. When a model sees an entity in the prompt — real name or token like <<PERSON_a7f3>> — it tracks that entity through attention mechanisms across the context window. The model doesn't care whether the entity looks like a human name or an opaque identifier. It tracks relationships and references.
The token format is deliberate. The <<TYPE_hash>> pattern looks like an entity reference — something models handle naturally. It's distinct enough to avoid confusion with regular text, and the type prefix tells the model what the token represents.
It works with streaming. Grepture supports SSE (Server-Sent Events) streaming and detokenizes chunks in real time. No buffering — restored values appear as each chunk comes through.
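One subtlety a streaming restorer has to handle is a token split across chunk boundaries. A minimal client-side sketch (my assumption about the mechanics, not Grepture's code) holds back only a possible token prefix at the end of each chunk, so restored text still streams out immediately:

```typescript
// Chunk-wise detokenizer: emits restored text as chunks arrive, holding back
// only a trailing partial token (an unmatched "<<...") until the next chunk.
// Illustrative sketch — assumes restored values never contain "<<".
function makeStreamRestorer(vault: Map<string, string>) {
  let pending = "";
  return (chunk: string): string => {
    pending += chunk;
    // Restore any complete tokens accumulated so far.
    let out = pending.replace(/<<[A-Z]+_[0-9a-f]+>>/g, (t) => vault.get(t) ?? t);
    // If the output ends in an unterminated "<<...", hold it back.
    const cut = out.lastIndexOf("<<");
    if (cut !== -1 && !out.slice(cut).includes(">>")) {
      pending = out.slice(cut);
      out = out.slice(0, cut);
    } else {
      pending = "";
    }
    return out;
  };
}
```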
It works across conversation turns. As long as the vault TTL hasn't expired, tokens restore correctly in multi-turn conversations. The mapping persists between requests, so a chatbot conversation that spans multiple messages maintains consistent restoration throughout.
The attention mechanism treats tokens as entities regardless of their surface form — that's what makes mask and restore reliable.
Comparing approaches
Not every situation calls for mask and restore.
Permanent redaction
Replace PII with [REDACTED]. Simple, zero risk of exposure. But it breaks application logic — the LLM can't reference the original values, and neither can the user. Best for: audit logs, analytics pipelines, any context where you don't need the original values back.
Partial masking
Replace PII with partially obscured values like S**** M***** or s****@example.com. Preserves some structure but still loses the actual information. The LLM can't use the data meaningfully in its response. Better UX than [REDACTED], but the same functional limitation. Best for: display contexts where showing data type without full value is sufficient.
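A rough sketch of partial masking, for comparison (a simplified illustration — real implementations vary by entity type):

```typescript
// Partially mask a value: keep the first character of each word, star the
// rest. For emails, mask only the local part. Simplified for illustration.
function partialMask(value: string): string {
  const at = value.indexOf("@");
  if (at > 0) {
    // email: sarah@example.com -> s****@example.com
    return value[0] + "*".repeat(at - 1) + value.slice(at);
  }
  // name: Sarah Miller -> S**** M*****
  return value
    .split(" ")
    .map((w) => w[0] + "*".repeat(Math.max(w.length - 1, 0)))
    .join(" ");
}
```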
Client-side redaction
Application code strips PII before sending to the AI provider. Each service needs its own implementation, coverage is inconsistent, and it's hard to maintain as data patterns change. Best for: environments where you can't use a proxy layer (rare, but they exist).
Mask and restore
Full protection plus full functionality. Network-layer interception means zero code changes across all services. Token-based replacement with encrypted vault storage and configurable TTL. Best for: user-facing AI features where the LLM response goes back to the user — chatbots, support agents, content generation, anything where you need protection AND a working application.
Trade-offs to consider: the vault adds a dependency (mitigated by TTL expiry and zero-data mode options), tokens need to survive the model's context window (not an issue for typical conversations, but very long prompts could theoretically be affected), and long-running conversations may hit TTL limits if the session outlasts the configured expiry.
Vault design
The vault stores the encrypted mapping between tokens and original values.
- Encrypted storage — Token-to-value mappings are encrypted at rest. Even if the vault were somehow accessed directly, the raw data isn't readable.
- Configurable TTL — Values expire after a set time. This balances usability (tokens remain restorable for the duration of a conversation) with minimizing data retention (values don't persist indefinitely). You set the TTL in the configuration.
- Zero-data mode — On Business+ plans, vault storage is in-memory only. Tokens and values never hit disk. The entire mask-and-restore cycle completes within the request lifecycle, and nothing persists after the response is delivered. This is the strictest mode for environments where even temporary storage of PII is a concern.
- Isolated vault spaces — Each API key has its own vault namespace. There's no cross-contamination between different applications or environments using the same Grepture account.
- Automatic cleanup — Vault entries are created per-request and automatically cleaned up on TTL expiry. There's no manual maintenance — the vault is self-managing.
The TTL is the key design decision. Too short and multi-turn conversations break. Too long and you're storing PII longer than necessary. A quick Q&A bot might use a 5-minute TTL. A support conversation might need 30 minutes or more.
Implementation with Grepture
Getting mask and restore running takes minutes. The SDK wraps your existing AI provider client, and rules are configured in the dashboard.
```typescript
import Grepture from "@grepture/sdk";
import OpenAI from "openai";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY,
  proxyUrl: "https://proxy.grepture.com",
});

const openai = new OpenAI({
  ...grepture.clientOptions(),
  apiKey: process.env.OPENAI_API_KEY,
});

// With mask-and-restore rules configured in the dashboard:
// - PII detected in request → replaced with tokens
// - LLM processes tokenized prompt
// - Tokens in response → restored to original values
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful customer support agent." },
    { role: "user", content: "Hi, I'm Sarah Miller, my email is sarah@example.com" },
  ],
});

// response.choices[0].message.content contains restored values
// The LLM never saw "Sarah Miller" or "sarah@example.com"
console.log(response.choices[0].message.content);
// → "Hi Sarah Miller! How can I help you today? I have your email sarah@example.com on file."
```
That's it. The @grepture/sdk package routes traffic through the Grepture proxy, where mask-and-restore rules are applied automatically. The OpenAI client works exactly as it normally would — same API, same types, same streaming support.
In the dashboard, the traffic log shows the full flow: original request with real PII, tokenized version sent to the LLM, and restored response sent back to the user.
Rules are configured in the dashboard's rule builder. Select the detection type (email, name, phone, etc.), the action (mask and restore), and optionally a custom TTL. No code deploys needed to change your redaction policy. The quickstart guide walks through the full setup in about five minutes.
Grepture works with OpenAI, Anthropic, Google AI, Azure OpenAI, and any OpenAI-compatible provider. Mask-and-restore is provider-agnostic — it operates on request and response payloads regardless of which model is on the other end.
When to use which approach
Choosing the right action depends on the context:
- Mask and restore — The LLM response goes back to the user and needs the original values. Chatbots, support agents, content generation, email drafting.
- Permanent redaction — You don't need the original values back. Audit logs, analytics, model evaluation pipelines, internal monitoring.
- Block — The presence of certain data means something is wrong. API keys, secrets, medical records where they shouldn't appear. If the detection fires, the request shouldn't be happening at all.
- Log only — During initial deployment. Run in observe mode for a week, review what PII flows through your system, then set actions. Almost everyone finds more PII than they expected.
These approaches aren't mutually exclusive. A typical production setup might look like this:
- Names and emails: mask and restore (user-facing responses need them)
- SSNs and government IDs: permanent redaction (no legitimate reason for the LLM to have these)
- API keys and secrets: block (their presence indicates a prompt injection attempt or misconfigured input)
- Everything else: log (build visibility before enforcing)
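Written down as data, that policy might look something like this (a hypothetical shape for illustration — in practice these rules live in the Grepture dashboard, and this is not Grepture's actual schema):

```typescript
// Hypothetical encoding of the layered policy above. Field names and action
// strings are illustrative, not Grepture's real configuration format.
const rules = [
  { detect: ["name", "email"], action: "mask_and_restore", ttlMinutes: 30 },
  { detect: ["ssn", "government_id"], action: "redact" },
  { detect: ["api_key", "secret"], action: "block" },
  { detect: ["*"], action: "log" },
];
```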
Start broad with logging, narrow down to specific actions as you understand your data patterns, and adjust TTLs based on your conversation lengths.