Embeddings
PII-redacting embeddings endpoint. OpenAI-compatible passthrough that strips sensitive data from the input before the vector is generated, so PII never lands in your vector store.
Overview
POST /v1/embeddings is an OpenAI-compatible passthrough that detects and redacts PII in the input before forwarding to OpenAI. The vector that comes back — and that you store in Pinecone, pgvector, Weaviate, or any other vector store — is derived from redacted text, so a PII leak into your vector store becomes structurally impossible.
Unlike chat completions, embeddings are persistent surface area. Once a vector with raw PII lives in your vector store, it cannot be selectively scrubbed, it gets queried by k-NN, and it gets re-injected into prompts via RAG. This endpoint solves that at the ingest side.
Endpoint
POST https://proxy.grepture.com/v1/embeddings
Same request and response shape as OpenAI's /v1/embeddings. You can drop it in anywhere an OpenAI embeddings client is used by pointing the base URL at Grepture.
Authentication
Two keys are involved:
- Grepture API key — pass in `Authorization: Bearer grp_...`. Identifies your team for logging and rate limiting.
- OpenAI API key — used for the upstream call. Resolved in this order:
  1. Caller-supplied (BYOK): `x-grepture-auth-forward: Bearer sk-...`. The caller pays OpenAI directly.
  2. Stored provider key: if no header is sent, Grepture uses your team's OpenAI key from Integrations → Provider Keys.
  3. Neither: returns `400 no_openai_key`.
Basic usage
curl -X POST https://proxy.grepture.com/v1/embeddings \
-H "Authorization: Bearer grp_live_..." \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": "email me at john.doe@example.com about order #45192"
}'
Response is the standard OpenAI shape, plus two headers:
x-grepture-redactions: 1
x-grepture-pii-categories: email
The embedding vector returned was computed from "email me at [EMAIL_REDACTED] about order #45192" — the placeholder, not the original email.
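If you call the endpoint with raw `fetch` rather than an SDK, the redaction metadata arrives only in those two response headers. A minimal sketch of parsing them — the header names come from this page, but the helper itself, and the assumption that multiple categories arrive comma-separated, are illustrative:

```typescript
// Sketch: read Grepture's redaction headers off a raw fetch Response.
interface RedactionSummary {
  count: number;
  categories: string[];
}

function parseRedactionHeaders(headers: Headers): RedactionSummary {
  const count = Number(headers.get("x-grepture-redactions") ?? "0");
  const raw = headers.get("x-grepture-pii-categories") ?? "";
  // Assumption: multiple categories are comma-separated, e.g. "email,phone".
  const categories = raw ? raw.split(",").map((c) => c.trim()) : [];
  return { count, categories };
}

// Usage against a live call (requires a real key):
// const res = await fetch("https://proxy.grepture.com/v1/embeddings", { /* ... */ });
// const { count, categories } = parseRedactionHeaders(res.headers);
```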
Request body
| Field | Type | Required | Notes |
|---|---|---|---|
| `model` | string | yes | Any OpenAI embeddings model (`text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`). |
| `input` | string or string[] | yes | Pre-tokenized arrays of integers are not supported — see Input shape. |
| `dimensions` | number | no | Forwarded to OpenAI. Lets you truncate `text-embedding-3` vectors. |
| `encoding_format` | `"float"` or `"base64"` | no | Forwarded to OpenAI. |
| `user` | string | no | Forwarded to OpenAI. |
Headers
| Header | Default | Values |
|---|---|---|
| `Authorization` | — | `Bearer grp_...` Grepture API key. |
| `x-grepture-auth-forward` | — | `Bearer sk-...` OpenAI key (BYOK). Optional. |
| `x-grepture-on-pii` | `redact` | `redact` or `block` — see Modes. |
| `x-grepture-redaction-strategy` | `placeholder` | `placeholder`, `hash`, or `mask` — see Redaction strategies. |
| `x-grepture-trace-id` | — | Trace ID for cross-request grouping in the dashboard. |
What gets detected
Two detection layers run on every input, gated by tier:
- Regex (all tiers) — email, phone, SSN, credit card, IP address, street address, date of birth.
- NER (Pro and above) — person names, locations, organizations. Layered on top of the regex pass; matches are merged and deduped by position.
Each detection has a category and a position span. All distinct categories caught across the request are returned in the x-grepture-pii-categories header.
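To make the merge-and-dedupe step concrete, here is an illustrative sketch of combining two detection passes by position. The field names (`category`, `start`, `end`) and the overlap rule are assumptions for illustration, not Grepture's internals:

```typescript
// Illustrative: merge regex and NER hits, dropping overlapping spans.
interface Detection {
  category: string;
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
}

function mergeDetections(regexHits: Detection[], nerHits: Detection[]): Detection[] {
  const all = [...regexHits, ...nerHits].sort((a, b) => a.start - b.start);
  const merged: Detection[] = [];
  for (const d of all) {
    const last = merged[merged.length - 1];
    // Drop a hit that overlaps the previous span; the span that starts first wins.
    if (last && d.start < last.end) continue;
    merged.push(d);
  }
  return merged;
}

// Distinct categories across all kept detections — what would populate
// the x-grepture-pii-categories header.
function distinctCategories(detections: Detection[]): string[] {
  return [...new Set(detections.map((d) => d.category))];
}
```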
Redaction strategies
The redaction strategy controls how matches are replaced before the request is forwarded.
placeholder (default, recommended) — replaces matches with stable strings like [EMAIL_REDACTED]. Every email becomes the same token, so two RAG documents about "email delivery problems" still embed to nearly identical vectors. This is the only strategy that preserves k-NN clustering. Use it unless you have a specific reason not to.
hash — replaces matches with a 12-character SHA-256 prefix. Every distinct value gets a distinct token, which breaks similarity-based retrieval — "email user1@x.com" and "email user2@x.com" will end up in different regions of vector space. Useful only if you also store the hash somewhere and need to correlate.
mask — replaces matches with first*last style masks (e.g., j****e@example.com). Partial signal survives but k-NN behavior is unpredictable. Rarely the right choice for embeddings.
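The three strategies can be sketched as a single substitution function. The token formats below are inferred from the examples above and simplified — in particular, this mask variant masks the whole value rather than preserving the email domain as Grepture's does:

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch of the three redaction strategies applied to one match.
function redact(
  match: string,
  category: string,
  strategy: "placeholder" | "hash" | "mask",
): string {
  switch (strategy) {
    case "placeholder":
      // Stable per-category token: every email becomes the same string,
      // which preserves k-NN clustering.
      return `[${category.toUpperCase()}_REDACTED]`;
    case "hash":
      // Distinct token per distinct value: 12-character SHA-256 prefix.
      return createHash("sha256").update(match).digest("hex").slice(0, 12);
    case "mask":
      // Simplified first*last mask over the whole value.
      return match[0] + "*".repeat(Math.max(match.length - 2, 1)) + match[match.length - 1];
  }
}
```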
Modes
redact (default) — detected PII is replaced and the request is forwarded. Best for RAG workloads where you want maximum recall.
block — if any PII is detected, returns 422 pii_detected and does not forward the request. Use it for regulated workloads where no PII should leave your application at all:
curl -X POST https://proxy.grepture.com/v1/embeddings \
-H "Authorization: Bearer grp_live_..." \
-H "x-grepture-on-pii: block" \
-H "Content-Type: application/json" \
-d '{"model":"text-embedding-3-small","input":"call me at 555-123-4567"}'
Response:
{
"error": "pii_detected",
"categories": ["phone"],
"count": 1
}
A row is still written to embedding_logs with blocked: true so the dashboard shows what was caught.
Input shape
input accepts either a single string or an array of strings:
{ "input": "one document to embed" }
{ "input": ["doc 1", "doc 2", "doc 3"] }
Pre-tokenized inputs (number[] or number[][] — what OpenAI calls "token arrays") are rejected with 400 tokenized_input_not_supported. The redaction pipeline operates on text; once a string has been tokenized client-side, there is no surface left to detect PII on.
If your client tokenizes by default (rare), switch it to pass strings.
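The shape check is straightforward to mirror client-side before sending the request. A minimal sketch — the error code matches this page, but the function itself is illustrative, not Grepture's implementation:

```typescript
// Illustrative: validate input shape the way the endpoint does.
type EmbeddingInput = string | string[] | number[] | number[][];

function normalizeInput(input: EmbeddingInput): string[] {
  const asArray = Array.isArray(input) ? input : [input];
  if (asArray.some((item) => typeof item !== "string")) {
    // Pre-tokenized integers carry no text, so there is nothing to redact.
    throw Object.assign(new Error("tokenized_input_not_supported"), { status: 400 });
  }
  return asArray as string[];
}
```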
SDK usage
The @grepture/sdk package exposes a typed wrapper:
import { Grepture } from "@grepture/sdk";
const grepture = new Grepture({
apiKey: process.env.GREPTURE_API_KEY!,
proxyUrl: "https://proxy.grepture.com",
});
const { data, redactions } = await grepture.embeddings.create({
model: "text-embedding-3-small",
input: "email me at john@example.com about order #12345",
});
console.log(redactions); // { count: 1, categories: ["email"] }
// `data` is the standard OpenAI embeddings array — pass it to Pinecone/pgvector/Weaviate as-is.
Options on embeddings.create():
| Option | Default | Notes |
|---|---|---|
| `model` | — | Required. |
| `input` | — | Required. String or string array. |
| `dimensions` | — | Optional. Truncates `text-embedding-3` vectors. |
| `encoding_format` | — | `"float"` or `"base64"`. |
| `user` | — | Forwarded to OpenAI. |
| `onPii` | `"redact"` | `"redact"` or `"block"`. |
| `strategy` | `"placeholder"` | `"placeholder"`, `"hash"`, or `"mask"`. |
| `openaiKey` | — | BYOK passthrough. Sent as `x-grepture-auth-forward`. |
| `traceId` | — | Trace ID for dashboard grouping. |
block mode throws an error with status, code, categories, and count fields attached when PII is detected.
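A hedged sketch of catching that error, based on the fields listed above (`status`, `code`, `categories`, `count`) — the type guard is illustrative, not part of `@grepture/sdk`:

```typescript
// Shape of the error thrown in block mode, per the fields documented above.
interface PiiBlockedError extends Error {
  status: number;
  code: string;
  categories: string[];
  count: number;
}

function isPiiBlocked(err: unknown): err is PiiBlockedError {
  return (
    err instanceof Error &&
    (err as PiiBlockedError).status === 422 &&
    (err as PiiBlockedError).code === "pii_detected"
  );
}

// Usage:
// try {
//   await grepture.embeddings.create({ model, input, onPii: "block" });
// } catch (err) {
//   if (isPiiBlocked(err)) console.warn("blocked:", err.categories);
//   else throw err;
// }
```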
OpenAI SDK drop-in
If you are already using openai, change the base URL and add the Grepture key — no other code changes:
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.GREPTURE_API_KEY!,
baseURL: "https://proxy.grepture.com/v1",
defaultHeaders: {
// Optional: pass your own OpenAI key per-request
"x-grepture-auth-forward": `Bearer ${process.env.OPENAI_API_KEY}`,
},
});
const { data } = await openai.embeddings.create({
model: "text-embedding-3-small",
input: "email me at john@example.com",
});
The x-grepture-redactions and x-grepture-pii-categories headers are still set, but you will need to read them off the raw HTTP response yourself — the typed redactions field is only available via @grepture/sdk.
Errors
| Status | Code | Cause |
|---|---|---|
| 400 | tokenized_input_not_supported | input is a number array. Pass strings. |
| 400 | no_openai_key | No OpenAI key found via header or provider_keys. |
| 400 | — | Missing or invalid model / input. |
| 401 | — | Missing or invalid Grepture API key. |
| 422 | pii_detected | Block mode triggered; request not forwarded. |
| 429 | — | Rate or quota limit hit. |
| 502 | — | Upstream OpenAI unreachable. |
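Of these, only 429 and 502 are worth retrying client-side; the 4xx codes are permanent for a given request. A minimal retry sketch — Grepture does not prescribe this policy, and the backoff values are arbitrary:

```typescript
// Illustrative client-side retry policy for the transient statuses above.
function isRetryable(status: number): boolean {
  return status === 429 || status === 502;
}

async function embedWithRetry(
  call: () => Promise<Response>,
  maxAttempts = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await call();
    if (!isRetryable(res.status) || attempt + 1 >= maxAttempts) return res;
    // Exponential backoff between attempts: 200 ms, 400 ms, ...
    await new Promise((resolve) => setTimeout(resolve, 200 * 2 ** attempt));
  }
}
```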
Observability
Every call writes a row to embedding_logs, viewable in the dashboard at Embeddings. The row records:
- Model, input count, total characters, token usage, duration
- Redaction count, categories caught, source (`regex` / `ai` / `both`)
- Redaction strategy, blocked flag, status code
- Trace ID, BYOK flag, provider key ID
Grepture does not store the input text or the response vectors. This is intentional: the point of the endpoint is to keep PII out of vector storage, and storing it on our side would defeat the feature. The dashboard shows counts and categories only.
Background reading
For the why behind this endpoint, see Your Vector Store Is a Permanent PII Leak.