The leak nobody talks about
Most teams shipping RAG today treat their vector store the same way they treat a search index: pour data in, query it back out, move on. The mental model is wrong, and it's actively dangerous.
When you embed "Please email me at john.doe@example.com about order #45192" and store the resulting 1536-dimensional vector in Pinecone, pgvector, Weaviate, or any other vector store, you've created a piece of data that:
- Persists indefinitely. Vector stores are designed for retrieval, not lifecycle management. The vector lives until someone deletes it — and in most deployments, nobody knows which vectors belong to which user.
- Cannot be selectively scrubbed. GDPR's right to erasure says you have to delete a person's data on request. Finding every vector derived from that person's text — including paraphrases, partial mentions, and downstream summaries — is effectively impossible.
- Is queryable. k-NN search means anyone who can issue a semantically similar query can pull back the stored text. Worse, recent embedding-inversion research has shown that surprisingly large fractions of the source text can be reconstructed from the embedding alone.
- Gets re-injected into prompts. That's the whole point of RAG. The PII you embedded today gets concatenated into a system prompt tomorrow and sent to whichever model your agent decided to call.
Chat completions are ephemeral. A bad logging decision there is a contained incident. Vector stores are permanent. A bad embedding decision is a structural problem.
Why the "redact your logs" pattern doesn't help
The standard PII story in 2026 looks like this: you proxy requests through a gateway, the gateway redacts emails/phones/SSNs from the request and response before writing to your traffic log, and your dashboard shows clean strings. Great for chat completions.
For embeddings, this pattern catches nothing useful. The leak isn't in the log — it's in the vector you sent to OpenAI and stored in Pinecone. Redacting the log entry while the upstream call goes through with the original PII is exactly the wrong order of operations.
What you actually need is to redact the input string before the embedding request goes to OpenAI. The vector that comes back is then derived from "Please email me at [EMAIL_REDACTED] about order #45192" — and that's the vector that ends up in your store.
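The ordering is the whole fix, so it's worth making concrete. A minimal sketch, assuming a regex-based email redactor; `embed_safely` and the `embed` callable are illustrative names, standing in for whatever client you use to call the embeddings API:

```python
import re

# Simplified email pattern for illustration; production detectors
# cover phones, SSNs, credit cards, etc.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(text: str) -> str:
    """Replace every email with a stable placeholder."""
    return EMAIL_RE.sub("[EMAIL_REDACTED]", text)

def embed_safely(text: str, embed) -> list[float]:
    # Redact FIRST, then send. The PII never reaches the
    # embeddings provider, so the stored vector is derived
    # only from the redacted string.
    return embed(redact(text))
```

The key property: `redact` runs in your process, before any network call, so the vector that comes back (and lands in your store) never saw the raw PII.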
The placeholder trick
There's a subtle reason placeholders are the right redaction strategy here. RAG depends on similar inputs producing similar vectors. If you redact emails by hashing them (a3f9c1...), every email becomes a unique random-looking token and breaks similarity — two support tickets about "email delivery problems" now embed to entirely different regions of vector space because the user's email is different.
If you redact with stable placeholders ([EMAIL_REDACTED]), every email becomes the same token. The vector for "my email user1@x.com isn't working" and "my email user2@y.com isn't working" end up nearly identical — which is exactly the clustering behavior you want for retrieval.
Same logic applies to phone numbers, SSNs, addresses, dates. Use placeholders, not hashes, when redacting embedding inputs.
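The difference is easy to demonstrate without embedding anything: placeholders make the two redacted tickets string-identical, while hashes keep them distinct. A sketch (the regex and function names are illustrative):

```python
import hashlib
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_placeholder(text: str) -> str:
    """Every email becomes the same stable token."""
    return EMAIL_RE.sub("[EMAIL_REDACTED]", text)

def redact_hash(text: str) -> str:
    """Every email becomes a unique hash token."""
    return EMAIL_RE.sub(
        lambda m: hashlib.sha256(m.group().encode()).hexdigest()[:8], text
    )

a = "my email user1@x.com isn't working"
b = "my email user2@y.com isn't working"

# Placeholders: identical redacted strings, so their embeddings
# will be identical too -- the tickets cluster together.
assert redact_placeholder(a) == redact_placeholder(b)

# Hashes: unique tokens, so the two tickets embed to different
# regions of vector space despite being the same complaint.
assert redact_hash(a) != redact_hash(b)
```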
What we built
Grepture's /v1/embeddings endpoint is an OpenAI-compatible passthrough that runs PII redaction on the input before forwarding. Free tier gets regex-based detection (email, phone, SSN, credit card, IP, address, DOB); pro+ adds NER-based detection for names, locations, and organizations.
```shell
curl -X POST https://proxy.grepture.com/v1/embeddings \
  -H "Authorization: Bearer grp_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "email me at john@example.com about order #12345"
  }'
```
Response: a normal OpenAI embeddings response, plus two headers — x-grepture-redactions: 1 and x-grepture-pii-categories: email — so you know what was caught.
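From application code, those headers are ordinary response headers. A sketch using only the standard library; `embed_via_grepture` and `parse_categories` are hypothetical helper names, not part of any SDK:

```python
import json
import urllib.request

def parse_categories(header: str) -> list[str]:
    """Split the x-grepture-pii-categories header into a list."""
    return [c for c in header.split(",") if c]

def embed_via_grepture(text: str, api_key: str):
    """POST to the passthrough endpoint; return the vector plus
    the redaction metadata. Sketch only -- not called here."""
    req = urllib.request.Request(
        "https://proxy.grepture.com/v1/embeddings",
        data=json.dumps({"model": "text-embedding-3-small",
                         "input": text}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        count = int(resp.headers.get("x-grepture-redactions", "0"))
        cats = parse_categories(
            resp.headers.get("x-grepture-pii-categories", ""))
    # body is a standard OpenAI embeddings response
    return body["data"][0]["embedding"], count, cats
```

Logging the count and categories alongside each indexed document gives you an audit trail of what was scrubbed, without ever logging the PII itself.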
We do not store the input text or the response vectors. The whole point is to keep PII out of vector storage; storing it on our side would defeat the feature.
When to use block mode
For workloads where any PII at all is unacceptable — regulated industries, internal documents, audit-sensitive data — pass x-grepture-on-pii: block. The endpoint returns 422 instead of redacting, with the categories caught, so your application can surface the problem to the user instead of silently transforming their input.
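Handling block mode amounts to branching on the status code. A sketch; the exact JSON shape of the 422 error body is an assumption for illustration (the source guarantees only that the caught categories are returned):

```python
def handle_block_mode(status_code: int, body: dict) -> list[float]:
    """Interpret a /v1/embeddings response made with
    x-grepture-on-pii: block. On 422, surface the caught
    categories instead of silently transforming input."""
    if status_code == 422:
        # Assumed error shape -- adjust to the actual 422 body.
        cats = body.get("error", {}).get("pii_categories", [])
        raise ValueError(
            f"input rejected, PII detected: {', '.join(cats)}")
    return body["data"][0]["embedding"]
```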
For most RAG workloads, the default redact-and-pass behavior is what you want. The vector store stays useful, retrieval still works, and the PII never leaves your network in a form that can be reconstructed.
What this doesn't solve
This handles the input side. It does not protect against:
- PII in the document corpus you're embedding at index time (separate problem, same redaction logic applies — point your indexer at /v1/embeddings too).
- PII the LLM generates in its responses and you then embed.
- Reconstruction attacks against vectors derived from redacted-but-still-identifying text ("the patient with the rare condition X who lives in zip code Y").
The vector store leak is fixable. Start with the inputs.