You can now replay any logged request directly from the traffic log. The Playground seeds the editable request from the original — using only the already-redacted body, so no PII is ever restored — and lets you adjust the prompt, model, or parameters and re-run it through the proxy.
Every re-run is scored by your evaluators and shown side by side with the original, with a per-metric delta so you can see exactly what improved or regressed. Happy with the result? Save the new request and response as a dataset case in one click, ready for your next experiment.
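To make the per-metric delta concrete, here is a minimal sketch of the arithmetic, not the Playground's actual implementation. The function name `metric_deltas`, the evaluator names, and the scores are all hypothetical, and the sketch assumes each evaluator returns a single number per run.

```python
from typing import Dict


def metric_deltas(original: Dict[str, float], rerun: Dict[str, float]) -> Dict[str, float]:
    """Return (rerun - original) for every metric scored on both runs."""
    # Only compare metrics that both runs were scored on.
    shared = original.keys() & rerun.keys()
    return {name: rerun[name] - original[name] for name in sorted(shared)}


# Hypothetical evaluator scores for the original request and its replay.
original_scores = {"relevance": 0.5, "toxicity": 0.25}
rerun_scores = {"relevance": 0.75, "toxicity": 0.25}

deltas = metric_deltas(original_scores, rerun_scores)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

A positive delta means the re-run scored higher on that metric. Whether higher is an improvement depends on the evaluator: for a metric like toxicity, a negative delta would be the better result.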
The full loop: spot something in production, replay it, compare the scores, and capture the fix as a test case — all without leaving the Playground.