02 · Evaluate & test · Coming soon
Evaluator Agent
Prove every change beats baseline before it ships
Evaluator Agent is the gate between the Mutation Agent's candidates and your repository. It re-runs the full eval suite, scores against held-out production traces, and runs adversarial probes for safety regressions.
A candidate that passes the Evaluator comes with a signed evidence pack — eval deltas, latency, cost, and a side-by-side diff — so the human reviewer sees the receipts, not just the prompt.
What it does
- Re-runs the full eval suite against held-out traces
- Runs adversarial probes for safety and prompt-injection regressions
- Produces a side-by-side diff and score-delta report
- Signs the evidence pack so the PR is reviewable in minutes
- Auto-rejects any candidate that regresses on a blocked criterion
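The gating behavior described above can be pictured with a minimal sketch. It is not the Evaluator Agent's actual implementation: the names `Verdict` and `evaluate_candidate`, the score dictionaries, and the rule that "beats baseline" means a positive mean score delta are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical pass/fail result for one candidate."""
    passed: bool
    reasons: tuple[str, ...]


def evaluate_candidate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    blocked: set[str],
) -> Verdict:
    """Compare candidate scores to baseline scores (higher is better).

    Assumption: any drop on a blocked criterion rejects the candidate
    outright, and otherwise the mean delta must be strictly positive.
    """
    reasons = []

    # Auto-reject on any regression against a blocked criterion.
    for criterion in sorted(blocked):
        if candidate[criterion] < baseline[criterion]:
            reasons.append(f"regressed on blocked criterion: {criterion}")

    # Only check the overall improvement if no blocked criterion regressed.
    if not reasons:
        deltas = [candidate[c] - baseline[c] for c in baseline]
        if sum(deltas) / len(deltas) <= 0:
            reasons.append("does not beat baseline")

    return Verdict(passed=not reasons, reasons=tuple(reasons))


# Illustrative scores only; these are not real eval results.
baseline = {"accuracy": 0.80, "safety": 0.95}
improved = {"accuracy": 0.85, "safety": 0.95}
unsafe = {"accuracy": 0.90, "safety": 0.90}

good_verdict = evaluate_candidate(baseline, improved, blocked={"safety"})
bad_verdict = evaluate_candidate(baseline, unsafe, blocked={"safety"})
```

In this sketch the `unsafe` candidate is rejected even though its accuracy improves, because the blocked `safety` criterion regressed. That mirrors the auto-reject rule in the list above.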
Inputs
- Mutation Agent candidates
- Held-out eval suite
- Safety probes
Outputs
- Signed evidence pack
- Pass/fail verdict per candidate
- Pull request body
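One way to picture a signed evidence pack is a canonical JSON payload plus a signature over it. The sketch below uses HMAC-SHA256 from the Python standard library purely as a stand-in. The page does not say which signing scheme the Evaluator Agent uses, and the field names, key handling, and function names here are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical_bytes(pack: dict) -> bytes:
    # Sort keys and fix separators so the same pack always serializes
    # to the same bytes, which keeps the signature stable.
    return json.dumps(pack, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_evidence_pack(pack: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature to the pack (assumed scheme)."""
    signature = hmac.new(key, _canonical_bytes(pack), hashlib.sha256).hexdigest()
    return {"pack": pack, "signature": signature}


def verify_evidence_pack(signed: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical_bytes(signed["pack"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


# Illustrative values only; field names are assumptions, not a real schema.
key = b"example-signing-key"
pack = {
    "candidate_id": "candidate-001",
    "verdict": "pass",
    "eval_deltas": {"accuracy": 0.05, "safety": 0.0},
    "latency_ms_delta": -12,
    "cost_delta_usd": 0.0,
    "diff": "- old prompt line\n+ new prompt line",
}
signed = sign_evidence_pack(pack, key)
```

A reviewer-side check would call `verify_evidence_pack(signed, key)` before trusting the verdict. Changing any field of the pack after signing makes verification fail.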
Get early access to Evaluator Agent
Join the early-access list and we will reach out the moment this agent ships.