Changelog

LLM Evals on Real Traffic

LLM-as-a-judge evaluations that run automatically on production traffic flowing through the gateway.

Launched LLM-as-a-judge evaluations on real production traffic. Create evaluators using built-in templates or custom judge prompts, and Grepture scores actual responses on a 0-to-1 scale. No synthetic datasets or separate eval pipelines needed.
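
As a rough illustration of the judge-and-score flow described above, here is a minimal Python sketch. It is not Grepture's API: `JUDGE_TEMPLATE`, `build_judge_prompt`, `parse_score`, and `evaluate` are hypothetical names, and the `judge` argument stands in for whatever model call produces the judge's reply. The sketch assumes the judge is asked to answer with a single number, which is then clamped to the 0-to-1 range.

```python
import re
from typing import Callable, Optional

# Hypothetical custom judge prompt; the wording is an assumption.
JUDGE_TEMPLATE = (
    "You are grading an assistant's response.\n"
    "Criterion: {criterion}\n"
    "Request: {request}\n"
    "Response: {response}\n"
    "Reply with a single number between 0 and 1."
)


def build_judge_prompt(criterion: str, request: str, response: str) -> str:
    """Fill the judge template with one request/response pair."""
    return JUDGE_TEMPLATE.format(
        criterion=criterion, request=request, response=response
    )


def parse_score(judge_output: str) -> Optional[float]:
    """Take the first number in the judge's reply and clamp it to [0, 1].

    Returns None when the reply contains no number at all.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", judge_output)
    if match is None:
        return None
    return min(1.0, max(0.0, float(match.group())))


def evaluate(
    request: str,
    response: str,
    criterion: str,
    judge: Callable[[str], str],
) -> Optional[float]:
    """Score one response; `judge` is any function mapping prompt -> reply."""
    return parse_score(judge(build_judge_prompt(criterion, request, response)))
```

With a stub judge such as `lambda prompt: "0.8"`, `evaluate("What is 2+2?", "4", "Is the answer correct?", judge)` returns `0.8`. Passing the judge in as a plain function keeps the sketch independent of any particular model provider.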