Ben @ Grepture

Why Presidio Isn't Enough for PII Redaction

Microsoft Presidio is a great PII detection toolkit — but production LLM redaction needs more than detection. Here's where it stops and what fills the gap.

Presidio gets you to the starting line, not the finish

If you've ever needed to strip personal data out of text before it leaves your systems, you've probably reached for Microsoft Presidio. It's the default open-source answer to PII detection, and deservedly so — it's well-engineered, MIT-licensed, and genuinely good at what it was built to do.

Then you try to put it in front of a live LLM workload — every prompt, every provider, every language your users actually type in — and you hit a wall. The wall isn't a bug. It's the gap between a detection library and a redaction system. Presidio is the former. Production PII redaction for AI traffic needs the latter.

This post is about where that gap is, and why closing it yourself is more work than it looks.

What Presidio actually is (and where it shines)

Presidio is a two-stage Python toolkit. The Analyzer finds PII; the Anonymizer transforms it. The Analyzer runs a registry of recognizers over your text and returns spans with confidence scores. Some recognizers are regex-and-checksum based (credit cards via Luhn, IBANs, emails, SSNs). Others lean on an NLP engine — spaCy by default, with Hugging Face transformers or Stanza as pluggable backends — to catch the things regex can't, like names. A context-enhancement step nudges confidence up when telltale words sit nearby.

For structured PII, this is excellent. A credit-card number has a checksum; an email has a grammar. Presidio confirms these with near-certainty, and you should absolutely use a tool like it for that job. None of what follows is a knock on Presidio's detection quality within its design envelope.
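To make the "regex plus checksum" idea concrete, here is a minimal sketch of that detection pattern using only the standard library. It is not Presidio's implementation: the candidate pattern, `luhn_valid`, and `find_card_numbers` are names made up for illustration. The point is that a loose pattern proposes candidates and the Luhn checksum confirms or rejects them.

```python
import re

# Loose candidate pattern (an assumption for this sketch): 13 to 19 digits,
# optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return (start, end, match) for candidates that also pass Luhn."""
    hits = []
    for m in CARD_CANDIDATE.finditer(text):
        if luhn_valid(m.group()):
            hits.append((m.start(), m.end(), m.group()))
    return hits


# 4111 1111 1111 1111 is a widely used test number that passes Luhn.
hits = find_card_numbers("Card: 4111 1111 1111 1111, ref 4111 1111 1111 1112")
```

Only the first number survives the checksum, which is why this category can be confirmed with much more confidence than a name.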

The problem is that "redact PII before it reaches an LLM provider" is a much bigger job than "detect PII in a string." And Presidio, by design, only does the second part.

Gap 1: Presidio tells you up front it won't catch everything

You don't have to take a competitor's word for the accuracy ceiling — Presidio's own documentation is refreshingly honest about it. From the official FAQ and docs:

  • "Because it is using automated detection mechanisms, there is no guarantee that Presidio will find all sensitive information," and so "additional systems and protections should be employed."
  • On commercial alternatives: SaaS PII offerings "often have better entity coverage or accuracy than Presidio."
  • The evaluation guide is blunt: "No de-identification system is perfect. It is important to evaluate the performance of a PII detection system for your specific use case." The project publishes no official accuracy benchmark — you're expected to measure it yourself.

Where does that uncertainty concentrate? On the unstructured categories — names, organizations, locations — that depend entirely on the NER model rather than a checksum. These are also the most common form of PII in real prompts. There's no regular expression for "this token is a person's name." Try to write one and you'll flag New York, Monday Morning, and half the capitalized words in any document. So the quality of your name detection is exactly the quality of the NER model you wired in — and the default spaCy model trades recall for speed.
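A quick way to see why names resist regular expressions is to try the obvious heuristic. The sketch below is deliberately naive, with a made-up `NAIVE_NAME` pattern, and treats any two consecutive capitalized words as a person's name.

```python
import re

# Naive heuristic: two consecutive capitalized words "look like" a full name.
NAIVE_NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

text = "Sarah Chen flew to New York on Monday Morning."
matches = NAIVE_NAME.findall(text)
print(matches)  # ['Sarah Chen', 'New York', 'Monday Morning']

# The same heuristic misses a real name typed in lowercase.
missed = NAIVE_NAME.findall("pls forward this to sarah chen")
print(missed)  # []
```

One true positive, two false positives, and a miss as soon as the user skips the shift key. Separating these cases takes context, which is what an NER model is for.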

A third-party benchmark published by a privacy vendor tested several general-purpose PII tools, Presidio among them, against ~45,000 words of messy real-world data (call transcripts, medical notes, chat logs) and found aggregate recall in the 57–73% range. Treat that as directional rather than gospel: it's a vendor's test and it doesn't isolate a Presidio-specific number. But the direction matches Presidio's own disclaimer. On clean structured data, detection is great. On the free-form text people actually send to LLMs, a meaningful slice of names slips through unless you invest heavily in model tuning.

Gap 2: The multilingual wall (ask anyone shipping in Europe)

Here's a detail that bites EU teams specifically. Presidio's docs state plainly: "While different detection mechanisms such as regular expressions are language agnostic, the context words used to increase the PII detection confidence aren't."

In practice that means one NER model per language, plus per-language context-word lists you have to translate and maintain yourself. The moment a prompt arrives in German instead of English, your carefully tuned English pipeline loses recall — and an EU-facing gateway sees German, French, and Dutch as a matter of routine. The community issues reflect this: custom recognizers misbehaving in German, friction building multi-language Docker images, default models that struggle with German organization names.
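The language dependence is easy to reproduce in miniature. The sketch below is not Presidio's context enhancer: the scores, the word lists, and `score_phone_matches` are assumptions chosen for illustration, and it checks for context words anywhere in the text rather than in a window around the match. It shows a language-agnostic pattern whose confidence boost depends on a per-language word list.

```python
import re

# Hypothetical values for illustration; not Presidio's defaults or word lists.
BASE_SCORE = 0.4
CONTEXT_BOOST = 0.35
CONTEXT_WORDS = {
    "en": {"phone", "call", "mobile"},
    "de": {"telefon", "anrufen", "handy"},
}

# The pattern itself does not care about language.
PHONE = re.compile(r"\+?\d[\d /-]{7,}\d")


def score_phone_matches(text: str, language: str) -> list:
    """Score phone-like matches; boost when a context word for `language` is present."""
    words = set(re.findall(r"\w+", text.lower()))
    has_context = bool(words & CONTEXT_WORDS.get(language, set()))
    score = round(BASE_SCORE + (CONTEXT_BOOST if has_context else 0.0), 2)
    return [(m.group(), score) for m in PHONE.finditer(text)]


english = "Call me on my mobile: +49 30 1234567"
german = "Mein Handy: +49 30 1234567"

en_on_en = score_phone_matches(english, "en")  # boosted
en_on_de = score_phone_matches(german, "en")   # same number, no boost
de_on_de = score_phone_matches(german, "de")   # boosted again with German words
```

The number is found in all three cases, but with English-only context words the German prompt stays at the base score. If your pipeline applies a confidence threshold between the two scores, that match is dropped.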

Multilingual coverage isn't a nice-to-have for a European product. It's the requirement. With Presidio, it's a project you own.

Gap 3: Redaction is one-way by default

This is the big one for LLM use cases, and it's where most teams discover Presidio's anonymizer wasn't built for their workflow.

The Anonymizer ships several operators: Replace, Redact, Mask, Hash, Encrypt, Keep. Only Encrypt ↔ Decrypt is reversible. Replace, Redact, Mask, and Hash are lossy by design: once Sarah Chen becomes <PERSON>, the original is gone.

For a lot of AI workloads, one-way redaction quietly breaks the product. Think customer-support summarization, personalized document generation, or any multi-turn assistant: you need to send the model sanitized text, but you need the response to come back referencing the real names, emails, and account numbers. That requires mask-and-restore — replace PII with stable tokens on the way out, map them back on the way in. Presidio gives you encryption primitives, but the full restore workflow — token storage, mapping, lifecycle — is yours to build and operate.
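For a sense of what "yours to build" means, here is a minimal mask-and-restore sketch. Everything in it is hypothetical: `TokenVault`, `mask`, and `restore` are illustrative names, the spans are hard-coded instead of coming from a detector, and the store is an in-memory dict with no expiry, encryption, or persistence.

```python
from dataclasses import dataclass, field


@dataclass
class TokenVault:
    """Hypothetical in-memory store mapping real values to placeholders."""
    forward: dict = field(default_factory=dict)   # (entity_type, value) -> token
    reverse: dict = field(default_factory=dict)   # token -> value
    counters: dict = field(default_factory=dict)  # entity_type -> last index used

    def token_for(self, entity_type: str, value: str) -> str:
        """Return the existing token for a value, or mint the next one."""
        key = (entity_type, value)
        if key not in self.forward:
            n = self.counters.get(entity_type, 0) + 1
            self.counters[entity_type] = n
            token = f"<{entity_type}_{n}>"
            self.forward[key] = token
            self.reverse[token] = value
        return self.forward[key]


def mask(text: str, spans: list, vault: TokenVault) -> str:
    """Replace (start, end, entity_type) spans with stable tokens."""
    ordered = sorted(spans)
    # Mint tokens left to right so numbering follows reading order.
    tokens = [vault.token_for(kind, text[s:e]) for s, e, kind in ordered]
    # Substitute right to left so earlier offsets stay valid.
    for (s, e, _), token in reversed(list(zip(ordered, tokens))):
        text = text[:s] + token + text[e:]
    return text


def restore(text: str, vault: TokenVault) -> str:
    """Swap tokens in a model response back to the original values."""
    for token, value in vault.reverse.items():
        text = text.replace(token, value)
    return text


vault = TokenVault()
prompt = "Sarah Chen (sarah@example.com) asked about her refund."
spans = [(0, 10, "PERSON"), (12, 29, "EMAIL")]  # would come from a detector
masked = mask(prompt, spans, vault)
# masked == "<PERSON_1> (<EMAIL_1>) asked about her refund."
reply = restore("Hi <PERSON_1>, a confirmation went to <EMAIL_1>.", vault)
# reply == "Hi Sarah Chen, a confirmation went to sarah@example.com."
```

The code is the easy part. The work sits in what the sketch leaves out: where the vault lives, who can read it, when entries expire, and what happens when the model paraphrases a token instead of echoing it.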

And it gets subtler. Presidio explicitly "does not store or maintain stateful sessions." So even consistency is on you: the same person should map to the same placeholder across every turn of a conversation, or the model gets confused about who's who. A recent change even made the Hash operator use random per-entity salts by default, so identical values hash differently unless you thread an explicit salt through yourself. Conversation-level entity consistency — table stakes for an LLM gateway — is something you assemble on top of Presidio, not something it provides.
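The salt issue is worth seeing in isolation. The sketch below uses plain `hashlib` and `hmac` rather than Presidio's Hash operator, and both function names are made up for illustration. It contrasts a fresh random salt per call with a key held for the lifetime of one conversation.

```python
import hashlib
import hmac
import os


def hash_with_random_salt(value: str) -> str:
    """Fresh salt per call: the same input yields a different digest each time."""
    salt = os.urandom(16)
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def hash_with_session_key(value: str, session_key: bytes) -> str:
    """Keyed hash: stable within one session, different across sessions."""
    return hmac.new(session_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


session_a = os.urandom(32)  # assumption: one key per conversation, stored by you
session_b = os.urandom(32)

turn_1 = hash_with_session_key("Sarah Chen", session_a)
turn_2 = hash_with_session_key("Sarah Chen", session_a)
other = hash_with_session_key("Sarah Chen", session_b)
```

Within session A, "Sarah Chen" maps to the same placeholder on every turn. A different session gets an unrelated value, so placeholders can't be joined across conversations. Someone still has to create, store, and retire those session keys, and that is exactly the state a stateless library leaves to you.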

Gap 4: It doesn't catch secrets

Presidio targets PII: names, emails, phone numbers, financial identifiers. It does not detect API keys, bearer tokens, AWS credentials, or database connection strings. This isn't an oversight you can config your way around — a maintainer confirmed in the project's own discussions that secret detection "is not currently supported."

For LLM traffic, that's a real exposure. Developers paste config into prompts. RAG pipelines slurp up .env files and internal docs. Coding assistants ship snippets that include live credentials. A redaction layer that's blind to secrets is missing one of the highest-severity leak categories in exactly the workloads where it matters most. You'd need a second tool, and a second integration, to cover it.
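To show what that second tool looks for, here is a small pattern-based sketch. The three patterns and the `find_secrets` helper are illustrative assumptions. Dedicated secret scanners use much larger rule sets, usually combined with entropy checks, and none of this is part of Presidio.

```python
import re

# Illustrative patterns only; real scanners cover far more formats.
SECRET_PATTERNS = {
    "AWS_ACCESS_KEY_ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "BEARER_TOKEN": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}=*"),
    "DB_CONNECTION_STRING": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?)://[^\s:@/]+:[^\s@/]+@[^\s/]+"
    ),
}


def find_secrets(text: str) -> list:
    """Return (pattern_name, matched_text) for every hit, in pattern order."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group()))
    return hits


# A pasted .env fragment; the key is AWS's documented example value.
pasted = (
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
    "DATABASE_URL=postgres://app:s3cr3t@db.internal:5432/prod\n"
)
found = find_secrets(pasted)
```

A PII-only pipeline would pass both lines through untouched. Neither contains a name, an email address, or a phone number.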

Gap 5: A library isn't on the network path

Even with perfect detection, a library has a structural problem: it only runs where you remember to call it.

Presidio runs as a Python library embedded in your code, or as a container you call over HTTP. Either way, every code path that sends data to a provider needs an explicit Analyzer + Anonymizer call. Miss one (a new endpoint, a background job, a third-party SDK, an autonomous agent making tool calls you didn't hand-write) and unredacted PII goes straight out. As your surface area grows, "did we wrap every egress point?" becomes a question you can't confidently answer.

A library also gives you nothing operational for free. There's no audit trail of what was detected and redacted, no policy management, no dashboard, no logging — and, as the FAQ notes, no SLA or warranty, because Presidio "is not an official product of any company." That's a perfectly reasonable stance for an open-source toolkit. It's just a lot of missing surface area between "Presidio detects PII in a string" and "we can prove to an auditor that no customer PII reached OpenAI last quarter."
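The usual in-code answer is a single egress function that redacts and writes an audit record before anything leaves. The sketch below is a hypothetical minimal version: `send_to_provider`, the email-only redactor, and the in-memory `audit_log` are stand-ins rather than a recommended design.

```python
import hashlib
import json
import re
from datetime import datetime, timezone
from typing import Callable

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

audit_log: list = []  # stand-in for an append-only store


def redact_emails(text: str) -> tuple:
    """Replace email addresses with a placeholder; return (new_text, hit_count)."""
    return EMAIL.subn("<EMAIL>", text)


def send_to_provider(prompt: str, provider: str, transport: Callable[[str], str]) -> str:
    """Single egress function: redact, record an audit entry, then call out."""
    clean, count = redact_emails(prompt)
    audit_log.append(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "provider": provider,
        "entities": {"EMAIL": count},
        # Hash of the redacted payload, so the log never holds raw PII.
        "payload_sha256": hashlib.sha256(clean.encode("utf-8")).hexdigest(),
    }))
    return transport(clean)


sent = []


def fake_transport(payload: str) -> str:
    """Test double standing in for a real provider client."""
    sent.append(payload)
    return "ok"


result = send_to_provider(
    "Reply to jane.doe@example.com please", "example-provider", fake_transport
)
```

This works for as long as every caller goes through `send_to_provider`. Nothing in the language enforces that. A new SDK or agent framework that opens its own HTTP connection bypasses it silently, which is the structural weakness described above.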

Detection is a feature. Redaction is a system.

Step back and the pattern is clear. Presidio solves one well-scoped problem — find PII in this text — and solves it well. But production PII redaction for AI traffic is a system with several more moving parts:

  • Coverage across every egress path, not just the ones you remembered to wrap.
  • High recall on unstructured, multilingual PII, because that's what real prompts contain.
  • Secret and credential detection, alongside personal data.
  • Reversible, consistent redaction so responses can carry real values back without the model ever seeing them.
  • Detection that runs before egress, so raw text never leaves your boundary in the first place.
  • An audit trail you can hand to a compliance team.

You can build every one of these on top of Presidio. Teams do. But you're then maintaining NER models, per-language context lists, a token store, a restore pipeline, a separate secrets scanner, a proxy layer, and an audit log — a standing system, not a library call.

How Grepture closes the gap

Grepture was built around the system, not the string. It's a security proxy that sits on the network path between your application and any AI provider, so redaction isn't something you remember to call — it's something every request flows through automatically, in any language, including calls from agents and third-party libraries you didn't write.

Under the hood, detection is layered the way the problem actually splits: deterministic validators for structured PII (the checksum-provable categories), and multilingual transformer NER for the unstructured kind (names, organizations, locations) across the European languages an EU gateway sees daily. All of it runs in-process at the proxy, so raw prompt text is never fanned out to a third-party detection API before it's cleaned. (We go deeper on that design in our post on how we redact PII before it reaches the LLM, without giving away the exact model tuning that does the heavy lifting.)

On top of detection, the parts Presidio leaves to you are built in:

  • Mask-and-restore — reversible redaction with consistent tokens, so responses come back personalized while the model only ever saw placeholders.
  • Secret scanning for API keys, tokens, and credentials, alongside PII — not a second tool to integrate.
  • A built-in audit trail and dashboard, so the compliance story is a report, not a research project — which matters under GDPR and the EU AI Act.

If you want the full feature-by-feature breakdown, we maintain a Grepture vs. Presidio comparison. And if your standards require it, the proxy is open source and self-hostable — the same pipeline on your own infrastructure.

Presidio is a fine choice when you need a detection toolkit and have the team to build a system around it. If what you actually need is the system — coverage everywhere, reversibility, secrets, multilingual recall, and an audit trail — that's a different shape of problem, and it's the one Grepture is built to solve.

Key takeaways

  • Presidio is a detection library, not a redaction system. It finds PII in a string well; production AI redaction needs much more around it.
  • Its own docs admit the accuracy ceiling — no guarantee of full coverage, weakest on the unstructured names/orgs/locations that dominate real prompts, and worse out-of-the-box in non-English languages.
  • Redaction is one-way by default. Reversible, consistent mask-and-restore — essential for multi-turn and personalized LLM workloads — is yours to build.
  • It doesn't detect secrets. API keys and credentials in prompts go straight through.
  • A library only runs where you call it. A proxy on the network path catches every egress route, with an audit trail to prove it.