How to Redact PII from LangChain Pipelines

Stop PII from leaking through LangChain chains and agents. Learn how to redact sensitive data from every LLM call in your LangChain pipeline using a proxy-level security layer.

The problem: PII leaking through LangChain pipelines

LangChain makes it easy to build complex AI workflows — chains, agents, RAG pipelines, tool-calling loops. But every invoke call eventually sends a prompt to an LLM provider. If your pipeline assembles user data from databases, documents, or APIs, that prompt almost certainly contains PII.

import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

const model = new ChatOpenAI({ model: "gpt-4o" });

const response = await model.invoke([
  new HumanMessage(
    `Analyze this customer's account history:

    Name: Jennifer Walsh
    Email: j.walsh@enterprise.com
    Phone: (415) 555-0288
    SSN: 552-31-8847
    Account: 4024-0071-4382-9956
    Address: 88 Mission St, San Francisco, CA 94105
    Slack webhook: https://hooks.slack.com/services/T00/B00/secret`
  ),
]);

That single call sent a name, email, phone number, SSN, credit card, address, and a webhook secret to OpenAI. In a real LangChain pipeline, data is often pulled automatically from retrievers, tools, and databases — making it even harder to audit what's being sent.

Why LangChain pipelines are especially risky

LangChain's power comes from composability — but that composability creates a larger attack surface for PII leakage:

  • Retrieval chains (RAG) pull documents from vector stores that may contain customer data, medical records, or internal documents
  • Agents and tools can call APIs, query databases, and scrape content — assembling PII from multiple sources into a single prompt
  • Memory and conversation history accumulate PII over multi-turn interactions
  • Output parsers may log intermediate results containing sensitive data
  • Chain callbacks can send prompt content to logging, monitoring, or tracing services

The problem isn't any single call — it's that LangChain orchestrates many calls, each potentially carrying PII from different sources.
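To make the aggregation problem concrete, here is a plain-TypeScript sketch (not LangChain's API) of how a RAG-style chain stuffs retrieved documents into one prompt. The document sources and text are invented for illustration:

```typescript
// Illustrative sketch: a RAG-style chain concatenates retrieved documents
// into one prompt. Any PII in the retrieved text lands in the final prompt
// without ever appearing in your application code.
interface RetrievedDoc {
  source: string;
  text: string;
}

function buildPrompt(question: string, docs: RetrievedDoc[]): string {
  const context = docs
    .map((d) => `[${d.source}]\n${d.text}`)
    .join("\n\n");
  return `Answer using the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const docs: RetrievedDoc[] = [
  { source: "crm.csv", text: "Customer: Jennifer Walsh, j.walsh@enterprise.com" },
  { source: "tickets.db", text: "Reported card 4024-0071-4382-9956 declined" },
];

const prompt = buildPrompt("Why did the payment fail?", docs);
// The assembled prompt now carries an email and a card number that the
// calling code never referenced directly.
```

This is why auditing individual `invoke` calls is not enough: the PII enters through the data path, not the code path.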

The solution: proxy-level redaction with Grepture

Grepture is an open-source security proxy that sits between your LangChain pipeline and any LLM provider. Every request is scanned for PII, secrets, and sensitive patterns before it leaves your infrastructure. Sensitive data is masked with reversible tokens — and restored in the response so your pipeline works normally.

One proxy protects every LLM call in your pipeline, regardless of which provider or model you're using.

Setup in 3 minutes

1. Install the SDK

npm install @grepture/sdk

2. Get your API key

Sign up at grepture.com/en/pricing — the free plan includes 1,000 requests/month. Copy your API key from the dashboard.

3. Wrap your LangChain model

LangChain models accept a configuration object where you can override the HTTP client. Use clientOptions() to route traffic through Grepture:

import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";
import { Grepture } from "@grepture/sdk";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY!,
  proxyUrl: "https://proxy.grepture.com",
});

const opts = grepture.clientOptions({
  apiKey: process.env.OPENAI_API_KEY!,
  baseURL: "https://api.openai.com/v1",
});

const model = new ChatOpenAI({
  model: "gpt-4o",
  configuration: {
    baseURL: opts.baseURL,
    fetch: opts.fetch,
  },
});

// Every chain, agent, and retriever call is now protected
const response = await model.invoke([
  new HumanMessage(userInput),
]);

Works with any LangChain-supported provider

The same approach works with Anthropic, Google, and any OpenAI-compatible provider:

import { ChatAnthropic } from "@langchain/anthropic";

const opts = grepture.clientOptions({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  baseURL: "https://api.anthropic.com",
});

const model = new ChatAnthropic({
  model: "claude-sonnet-4-5-20250929",
  clientOptions: {
    baseURL: opts.baseURL,
    fetch: opts.fetch,
  },
});

What gets detected

Grepture ships with 50+ detection patterns on the free tier and 80+ on Pro, covering:

Category             | Examples                                                  | Tier
---------------------|-----------------------------------------------------------|------------------------
Personal identifiers | Names, emails, phone numbers, SSNs, dates of birth        | Free (regex), Pro (AI)
Financial data       | Credit card numbers, IBANs, routing numbers               | Free
Credentials          | API keys, bearer tokens, passwords, connection strings    | Free
Network identifiers  | IP addresses, MAC addresses                               | Free
Freeform PII         | Names, organizations, and addresses in unstructured text  | Pro (local AI models)
Adversarial inputs   | Prompt injection attempts                                 | Business

All detection runs on Grepture infrastructure — no data is forwarded to additional third parties.
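To picture how regex-based detection works, here is a minimal sketch. These patterns are illustrative only, not Grepture's actual rule set, which is broader and more carefully tuned:

```typescript
// Illustrative detectors in the spirit of regex-based PII scanning.
// These simplified patterns are examples, not Grepture's real rules.
const detectors: Record<string, RegExp> = {
  EMAIL: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
  CREDIT_CARD: /\b\d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4}\b/g,
  IPV4: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
};

function detect(text: string): { type: string; value: string }[] {
  const hits: { type: string; value: string }[] = [];
  for (const [type, pattern] of Object.entries(detectors)) {
    for (const match of text.matchAll(pattern)) {
      hits.push({ type, value: match[0] });
    }
  }
  return hits;
}

const hits = detect("Contact j.walsh@enterprise.com, SSN 552-31-8847");
// hits contains an EMAIL entry and an SSN entry.
```

Regex covers well-structured identifiers like these; freeform PII (names and addresses in running text) is what the Pro tier's local AI models are for.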

Mask and restore: reversible redaction

Grepture doesn't just strip PII — it replaces sensitive values with tokens, sends the sanitized prompt to the LLM, and restores the original values in the response. This is critical for LangChain pipelines where downstream chains depend on the model's output containing the original data.

What the LLM sees:

Analyze this customer's account history:
Name: [PERSON_1]
Email: [EMAIL_1]
SSN: [SSN_1]
Account: [CREDIT_CARD_1]
...

What your pipeline gets back:

The customer Jennifer Walsh (j.walsh@enterprise.com)
has an active account at 88 Mission St, San Francisco.
No anomalies detected in recent transactions.

The model processes clean data. Your LangChain pipeline receives the full, personalized response — and downstream chains, tools, and output parsers work normally.
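Conceptually, the round trip can be sketched in a few lines. This is a simplified illustration of reversible tokenization, not Grepture's implementation, which runs at the proxy and covers many more entity types:

```typescript
// Simplified mask/restore round trip illustrating reversible tokens.
// Only emails are handled here; the real proxy covers many entity types.
function mask(text: string): { masked: string; vault: Map<string, string> } {
  const vault = new Map<string, string>();
  let i = 0;
  const masked = text.replace(
    /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
    (email) => {
      const token = `[EMAIL_${++i}]`;
      vault.set(token, email); // remember the original for restoration
      return token;
    }
  );
  return { masked, vault };
}

function restore(text: string, vault: Map<string, string>): string {
  let out = text;
  for (const [token, original] of vault) {
    out = out.split(token).join(original);
  }
  return out;
}

const { masked, vault } = mask("Reach Jennifer at j.walsh@enterprise.com");
// masked: "Reach Jennifer at [EMAIL_1]" is what the LLM would see.
const modelReply = "Sent a follow-up to [EMAIL_1].";
const restored = restore(modelReply, vault);
// restored: "Sent a follow-up to j.walsh@enterprise.com."
```

Because the tokens are stable within a request, the model can reason about "[EMAIL_1]" consistently, and the restored response is indistinguishable from an unredacted one to downstream chains.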

Protects the entire pipeline

Because Grepture operates at the HTTP level, it protects every LLM call in your pipeline:

  • Retrieval chains — PII in retrieved documents is redacted before reaching the model
  • Agent tool calls — data assembled by tools is scanned before each LLM invocation
  • Multi-turn conversations — accumulated PII in message history is caught on every turn
  • Parallel chains — all concurrent LLM calls flow through the same proxy

No changes to your chain logic, retrievers, or agents. Just wrap the model client once.
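The "wrap once" guarantee rests on the custom fetch: every request the client makes passes through one function before leaving the process. A hypothetical sketch of that interception point follows (the `makeProtectedFetch` name and scan callback are invented for illustration, not the SDK's internals):

```typescript
// Hypothetical sketch of HTTP-level interception, similar in spirit to
// the fetch returned by grepture.clientOptions(): one function sees
// every outgoing request, regardless of which chain or agent made it.
type Scan = (body: string) => string;

function makeProtectedFetch(scan: Scan, inner: typeof fetch): typeof fetch {
  return async (input, init) => {
    if (init && typeof init.body === "string") {
      // Redact the request body before it leaves the process.
      init = { ...init, body: scan(init.body) };
    }
    return inner(input, init);
  };
}

// Demo with a fake transport that records what it would have sent.
const seen: string[] = [];
const fakeTransport: typeof fetch = async (_input, init) => {
  seen.push(String(init?.body));
  return new Response("ok");
};

const protectedFetch = makeProtectedFetch(
  (body) => body.replace("552-31-8847", "[SSN_1]"),
  fakeTransport
);

await protectedFetch("https://api.example.com/v1/chat", {
  method: "POST",
  body: "SSN 552-31-8847",
});
// seen[0] is "SSN [SSN_1]": the PII never reached the transport.
```

This is also why the approach is provider-agnostic: the interception happens below the LangChain abstraction, at the same layer every provider client ultimately uses.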

Next steps