02 · Evaluate & test · Coming soon

Evaluator Agent

Prove every change beats baseline before it ships

Evaluator Agent is the gate between the Mutation Agent's candidates and your repository. It re-runs the full eval suite, scores against held-out production traces, and runs adversarial probes for safety regressions.

A candidate that passes the Evaluator comes with a signed evidence pack — eval deltas, latency, cost, and a side-by-side diff — so the human reviewer sees the receipts, not just the prompt.
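
As a rough illustration of what "signed" could mean here, the sketch below attaches an HMAC-SHA256 signature to an evidence pack and verifies it later. The page does not say how Evaluator Agent signs packs, so the scheme, the function names (`sign_evidence_pack`, `verify_evidence_pack`), and the field names are all assumptions made for this example.

```python
import hashlib
import hmac
import json


def _canonical_bytes(pack: dict) -> bytes:
    # Sorted keys and fixed separators give the same bytes for the same content,
    # so the signature does not depend on dict ordering or whitespace.
    return json.dumps(pack, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_evidence_pack(pack: dict, key: bytes) -> dict:
    """Wrap a pack with an HMAC-SHA256 signature (hypothetical scheme)."""
    signature = hmac.new(key, _canonical_bytes(pack), hashlib.sha256).hexdigest()
    return {"pack": pack, "signature": signature}


def verify_evidence_pack(signed: dict, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(key, _canonical_bytes(signed["pack"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


if __name__ == "__main__":
    # Placeholder values for illustration only, not real results.
    pack = {
        "candidate_id": "example-candidate",
        "eval_delta": 0.0,
        "latency_ms": 0,
        "cost_usd": 0.0,
        "diff": "--- baseline\n+++ candidate\n",
    }
    signed = sign_evidence_pack(pack, key=b"demo-key")
    print(verify_evidence_pack(signed, key=b"demo-key"))
```

A reviewer-side check like `verify_evidence_pack` fails as soon as any field in the pack is edited after signing, which is the property that lets a reviewer trust the numbers in the PR.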

What it does

Re-runs the full eval suite against held-out traces
Probes adversarially for safety and prompt-injection regressions
Produces a side-by-side diff and score-delta report
Signs the evidence pack so the PR is reviewable in minutes
Auto-rejects any candidate that regresses on a blocked criterion
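
The gating behavior in the list above can be sketched as a small verdict function. This is a minimal sketch under stated assumptions, not the product's implementation: it assumes every criterion is a score where higher is better, that baseline and candidate report the same criteria, and that "beats baseline" means at least one criterion improves. The names `Verdict` and `evaluate_candidate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    passed: bool
    reasons: tuple  # human-readable reasons for a rejection; empty on pass


def evaluate_candidate(baseline: dict, candidate: dict, blocked: set) -> Verdict:
    """Compare candidate scores to baseline scores (assumed: higher is better)."""
    # Any regression on a blocked criterion rejects the candidate outright,
    # regardless of gains elsewhere.
    reasons = [
        f"regressed on blocked criterion: {name}"
        for name in sorted(blocked)
        if candidate[name] < baseline[name]
    ]
    if reasons:
        return Verdict(False, tuple(reasons))

    # Assumption: a candidate must improve at least one criterion to pass.
    if not any(candidate[name] > baseline[name] for name in baseline):
        return Verdict(False, ("no improvement over baseline",))

    return Verdict(True, ())


if __name__ == "__main__":
    # Placeholder scores for illustration only.
    baseline = {"task_success": 0.80, "safety": 0.95}
    candidate = {"task_success": 0.85, "safety": 0.90}
    print(evaluate_candidate(baseline, candidate, blocked={"safety"}))
```

Checking blocked criteria first keeps the rule absolute: a gain on one metric cannot offset a regression on a criterion the team has marked as blocking.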

Inputs

Mutation Agent candidates
Held-out eval suite
Safety probes

Outputs

Signed evidence pack
Pass/fail verdict per candidate
Pull request body

Works with

Braintrust
Phoenix
MLflow
GitHub

Get early access to Evaluator Agent

Join the early-access list and we will reach out the moment this agent ships.