
02 · Evaluate & test

Dataset Agent

Build the dataset your evals run against

Dataset Agent owns the data that everything else in the Mutagent loop scores against. It mines production traces for the hard cases, clusters them by failure mode, fills the long tail with synthetic edge cases, and routes whatever still needs a human to the right reviewer.

Because datasets drift the moment your traffic does, Dataset Agent keeps the set alive: new failure modes get pulled in continuously, stale items get retired, and every change is versioned alongside the prompt it was scored against. The Evaluator never runs against yesterday's reality.
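As a rough illustration of what versioning a dataset alongside its prompt could look like, here is a minimal sketch. It is not Mutagent's implementation: the function names (`content_hash`, `snapshot`, `diff`) and the record shape are hypothetical, and it assumes each eval case is a JSON-serializable dict with a unique `id`.

```python
import hashlib
import json


def content_hash(obj) -> str:
    """Stable short hash of any JSON-serializable object (hypothetical helper)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def snapshot(cases: list[dict], prompt: str) -> dict:
    """Record a dataset version together with the prompt it was scored against."""
    # Sort by id so the version does not depend on the order cases arrive in.
    ordered = sorted(cases, key=lambda c: c["id"])
    return {
        "dataset_version": content_hash(ordered),
        "prompt_version": content_hash(prompt),
        "case_ids": [c["id"] for c in ordered],
    }


def diff(old: dict, new: dict) -> dict:
    """List the case ids added and retired between two snapshots."""
    old_ids, new_ids = set(old["case_ids"]), set(new["case_ids"])
    return {"added": sorted(new_ids - old_ids), "retired": sorted(old_ids - new_ids)}
```

Hashing the sorted cases means any added, retired, or edited item yields a new dataset version, while the prompt hash ties each snapshot to the exact prompt text it was scored against.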

What it does

Mines production traces for hard cases and rare failure modes
Clusters traces by semantic similarity to find coverage gaps
Generates synthetic edge cases to fill the long tail
Routes ambiguous items to human reviewers with the right context
Versions every dataset change next to the prompts it scored against
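To make the idea of a coverage gap concrete, the sketch below compares how often each failure cluster shows up in production with how many eval cases cover it. It assumes the clustering has already happened upstream and that every trace and eval case carries a cluster label; the function name `coverage_gaps` and the `min_cases` threshold are hypothetical, not part of Dataset Agent.

```python
from collections import Counter


def coverage_gaps(trace_clusters, eval_clusters, min_cases=3):
    """Return clusters seen in production with fewer than min_cases eval cases.

    Results are ordered by production frequency, most common first.
    """
    seen = Counter(trace_clusters)      # cluster label -> number of production traces
    covered = Counter(eval_clusters)    # cluster label -> number of eval cases
    return [
        {"cluster": label, "traces": count, "eval_cases": covered[label]}
        for label, count in seen.most_common()
        if covered[label] < min_cases
    ]


# Illustrative labels only, not real data.
traces = ["timeout"] * 5 + ["bad_json"] * 3 + ["refusal"] * 1
evals = ["timeout"] * 4 + ["bad_json"] * 1
gaps = coverage_gaps(traces, evals)
```

In this toy run, `timeout` is already covered, so `gaps` lists `bad_json` and then `refusal`: the clusters where synthetic cases or human review would be aimed first.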

Inputs

Production traces
Existing eval cases
Reviewer feedback
Synthetic-generation policy

Outputs

Curated eval dataset
Coverage report
Per-cluster labelled samples

Works with

Langfuse
LangSmith
Braintrust
Phoenix

Try Dataset Agent today

Install the CLI and run this agent against your own evals in under five minutes.

