A team of agents for your AI agents.
Ten specialized Mutagent agents across four lifecycle phases. Run them à la carte, or let the full loop run end to end. Click any agent to see exactly what it does, what it consumes, and what it produces.
Build
Turn an idea into a running, instrumented agent.
Spec Agent
Describe what you need and get the right questions, a recommended architecture, a node graph, and a framework-agnostic spec with eval criteria baked in.
Build Agent
Install the CLI, log in, and map your existing prompts and traces automatically. Connect to Langfuse, LangSmith, or raw OpenTelemetry. No YAML, no infrastructure changes.
Evaluate & test
Build the eval set, score every candidate, ship the winners.
Dataset Agent
Curate dataset items from production traces, expert annotations, and synthetic generation. Get the hard cases labelled before the agent ships. Maintain the set as the agent and your traffic evolve. Stale datasets are why "tests pass but quality drops".
Evaluator Agent
Score every candidate prompt against your evals and held-out production traces. Only candidates that beat baseline without regressing reach the merge button.
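As a rough illustration of the "beat baseline without regressing" rule, here is a minimal sketch. The function name `passes_gate`, the metric names, and the `tolerance` parameter are assumptions made for this example, not Mutagent's actual API.

```python
def passes_gate(
    candidate: dict[str, float],
    baseline: dict[str, float],
    tolerance: float = 0.0,
) -> bool:
    """Return True if the candidate improves on at least one baseline
    metric and regresses on none (higher scores are better).

    Hypothetical helper: a metric missing from `candidate` counts as a
    regression.
    """
    missing = float("-inf")
    no_regression = all(
        candidate.get(metric, missing) >= score - tolerance
        for metric, score in baseline.items()
    )
    improved = any(
        candidate.get(metric, missing) > score
        for metric, score in baseline.items()
    )
    return no_regression and improved


baseline = {"eval_set": 0.80, "held_out_traces": 0.75}
better = {"eval_set": 0.85, "held_out_traces": 0.75}
regressed = {"eval_set": 0.90, "held_out_traces": 0.70}

print(passes_gate(better, baseline))     # improves one metric, holds the other
print(passes_gate(regressed, baseline))  # gains on one metric but drops another
```

Requiring a strict improvement on at least one metric keeps a candidate that merely ties baseline away from the merge button.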
Experiment Agent
Compare any prompt against any dataset with any model. Score Claude vs GPT vs Gemini side by side against your eval criteria. Know exactly where you stand before optimization changes anything.
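A side-by-side comparison can be pictured as the small loop below. This is a sketch under stated assumptions: `compare_models`, `exact_match`, and the stand-in "models" (plain Python callables) are hypothetical and do not call any real provider API.

```python
from typing import Callable


def exact_match(output: str, expected: str) -> float:
    """Toy eval criterion: 1.0 on an exact match (ignoring outer whitespace)."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def compare_models(
    models: dict[str, Callable[[str], str]],
    dataset: list[tuple[str, str]],
    score: Callable[[str, str], float] = exact_match,
) -> dict[str, float]:
    """Run every model over the same (prompt, expected) pairs and
    return the mean score per model."""
    results: dict[str, float] = {}
    for name, generate in models.items():
        scores = [score(generate(prompt), expected) for prompt, expected in dataset]
        results[name] = sum(scores) / len(scores) if scores else 0.0
    return results


# Stand-in callables; a real run would wrap actual model clients here.
models = {
    "model_a": lambda prompt: prompt.upper(),
    "model_b": lambda prompt: "unknown",
}
dataset = [("hi", "HI"), ("ok", "OK")]
print(compare_models(models, dataset))
```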
Deploy Agent
Stage rollouts, canary on a slice of traffic, watch the key evals live, and roll back automatically if quality slips.
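To make the canary-and-rollback idea concrete, here is a minimal sketch. The helper names (`in_canary`, `should_roll_back`) and the default values for `max_drop` and `min_samples` are assumptions chosen for illustration, not the product's real settings.

```python
import hashlib
from statistics import mean


def in_canary(request_id: str, fraction: float) -> bool:
    """Deterministically route a fixed slice of traffic to the canary.

    Hashing the request ID means the same ID always lands in the same bucket.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < fraction * 10_000


def should_roll_back(
    canary_scores: list[float],
    baseline_mean: float,
    max_drop: float = 0.05,
    min_samples: int = 20,
) -> bool:
    """Roll back once enough canary samples exist and their mean eval
    score falls more than `max_drop` below the baseline mean."""
    if len(canary_scores) < min_samples:
        return False  # not enough evidence yet
    return mean(canary_scores) < baseline_mean - max_drop


print(in_canary("request-123", 0.10))
print(should_roll_back([0.5] * 20, baseline_mean=0.8))
```

The `min_samples` guard is a design choice: it avoids rolling back on the first few noisy scores.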
Improve
Watch production, diagnose drift, mutate, validate, ship.
Incident Agent
Define acceptance thresholds once and monitor production automatically. Trigger optimization when drift is detected, validate candidates, and deploy improvements. You get a notification, not a ticket.
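One simple way to picture threshold-based drift detection is a rolling window over production eval scores. The sketch below is an assumption about how such a monitor could work; the `DriftMonitor` class and its parameters are hypothetical.

```python
from collections import deque


class DriftMonitor:
    """Flag drift when the mean of the last `window` scores drops
    below an acceptance threshold (hypothetical illustration)."""

    def __init__(self, threshold: float, window: int = 50) -> None:
        self.threshold = threshold
        self.scores: deque[float] = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Add one production score; return True if drift is detected."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        if not window_full:
            return False  # wait for a full window before judging
        return sum(self.scores) / len(self.scores) < self.threshold


monitor = DriftMonitor(threshold=0.8, window=3)
for score in [0.9, 0.9, 0.9, 0.5]:
    if monitor.record(score):
        print("drift detected: trigger optimization")
```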
Diagnose Agent
Run your entire eval set, group failures by semantic similarity, and trace each cluster back to a specific phrase or structural gap in the prompt. Ranked hypotheses, not a wall of logs.
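The grouping step can be sketched with a toy similarity measure. This example uses bag-of-words cosine similarity and greedy clustering purely as a stand-in; a production system would more likely use learned text embeddings, and none of the names below (`embed`, `cosine`, `cluster_failures`) come from Mutagent. It covers only the grouping and ranking, not the trace back to a prompt phrase.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_failures(failures: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedily assign each failure to the first cluster whose
    representative is similar enough; return clusters, largest first."""
    clusters: list[tuple[Counter, list[str]]] = []
    for text in failures:
        vec = embed(text)
        for representative, members in clusters:
            if cosine(vec, representative) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return sorted((members for _, members in clusters), key=len, reverse=True)


failures = [
    "date format wrong in output",
    "wrong date format in reply",
    "tool call timed out",
]
for group in cluster_failures(failures):
    print(len(group), group)
```

Sorting clusters by size is one plausible way to rank hypotheses: the largest cluster points at the most common failure mode.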
Mutation Agent
Given a strategy from Optimize Agent, instruct the AI coding agent to mutate the prompt, tool, code, or config. Validate each candidate against the Evaluator. Ship the candidates that beat baseline and abandon the rest.
Auto Engineer Agent
When Incident Agent flags drift, Auto Engineer kicks off Diagnose, Mutate, Evaluate, and Deploy automatically. You get notified, not paged.
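The Diagnose, Mutate, Evaluate, Deploy chain can be sketched as a plain function that wires four steps together. Everything here is hypothetical: the function name, the callables passed in, and the `max_candidates` limit are assumptions used to show the control flow, not the real orchestration.

```python
from typing import Callable, Optional


def run_improvement_loop(
    diagnose: Callable[[], list[str]],
    mutate: Callable[[str], str],
    evaluate: Callable[[str], float],
    deploy: Callable[[str], None],
    baseline_score: float,
    max_candidates: int = 3,
) -> Optional[str]:
    """Try the top-ranked hypotheses in order; deploy and return the
    first candidate that beats baseline, or None if none does."""
    for hypothesis in diagnose()[:max_candidates]:
        candidate = mutate(hypothesis)
        if evaluate(candidate) > baseline_score:
            deploy(candidate)
            return candidate
    return None  # every candidate was abandoned


# Stub steps standing in for the real agents.
deployed: list[str] = []
winner = run_improvement_loop(
    diagnose=lambda: ["ambiguous date instruction", "missing tool schema"],
    mutate=lambda hypothesis: f"prompt fixed for: {hypothesis}",
    evaluate=lambda candidate: 0.9 if "tool schema" in candidate else 0.6,
    deploy=deployed.append,
    baseline_score=0.8,
)
print(winner)
```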