Now in Beta

Let your new AI Engineer
optimize your prompts in 15 minutes.

terminal

> npm install -g @mutagent/cli, tell me what I can do and optimize my system

Works with
Cursor
Claude Code
VS Code
OpenAI Codex
Any coding agent

Setup. Optimize. Watch.

Three modes that make your agents reliable.

01

Setup

Mutagent installs into any agent repository in minutes. It connects to your existing observability tools, reads your traces, and maps your agent system — without manual configuration. Then it guides you to define evaluation criteria so optimization has a clear target.

terminal
>npm install -g @mutagent/cli and optimize my invoice automation system
Bash
✓ Installed @mutagent/cli@1.2.0
✓ Browser opened... Account created
mutagent explore
✓ Found LangGraph with 3 prompts, 1 dataset and 15 evaluation criteria
✓ Prompts uploaded. Dataset uploaded.
You already have evaluation criteria. Let's enhance them together to get great results.
mutagent evals
✓ 25 improved eval criteria defined
✓ Evals uploaded.
Your system is mapped and ready to optimize.
02

Optimize

Mutagent diagnoses failures across hundreds of traces, identifies why they happen, generates targeted fixes, and validates them against real production data. You review the results and approve what ships. First improvement in under 30 minutes.

terminal
>optimize my AI system
Analyzing 12,847 traces for failure patterns...
mutagent optimize
Found: ambiguous routing — 34 affected sessions
- You are a helpful assistant that answers questions.
- Try to be concise.
+ You are a senior support engineer for {{product}}.
+ Answer precisely using the knowledge base. If uncertain, escalate.
+ Always include the ticket ID in your response.
mutagent validate
47/47 evaluation cases passed · 0 regressions
Fix validated. Should the fix be applied? (y/n)
03

Watch

Mutagent continuously monitors production traces. When it detects degradation, emerging failure patterns, or drift, it automatically triggers an optimization cycle. Your agent learns from every interaction. The longer you use it, the smarter it gets.

terminal
>watch my agents and optimize when needed
Starting continuous monitoring on 3 agents.
mutagent watch
◈ Watching 3 agents · 12,847 traces analyzed
Performance drift detected on support-agent
mutagent optimize (auto-triggered)
Root cause: outdated few-shot examples
Fix generated and validated
✓ 52/52 cases passed · 0 regressions
Drift detected and a fix validated automatically. View report →
Should the fix be applied? (y/n)
Last cycle: 4m ago · Next check: 12m
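
As a rough illustration of what a drift check might look like, the sketch below compares the mean evaluation score of recent traces against a baseline and flags a drop beyond a threshold. This is not Mutagent's actual detection logic, which the page does not describe; `detect_drift`, `max_drop`, and the sample scores are all hypothetical.

```python
from statistics import mean


def detect_drift(baseline_scores, recent_scores, max_drop=0.05):
    """Return True when the recent mean score is more than max_drop below baseline.

    Hypothetical helper: scores are 1 (pass) or 0 (fail) per evaluated trace.
    """
    if not baseline_scores or not recent_scores:
        return False  # not enough data to make a call
    return mean(baseline_scores) - mean(recent_scores) > max_drop


# Made-up sample data: 90% pass rate at baseline, 70% in the recent window.
baseline = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
recent = [1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

if detect_drift(baseline, recent):
    print("drift detected: an optimization cycle would be triggered here")
```

A fixed threshold on the mean is the simplest possible choice; a production monitor would likely also account for sample size and noise.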

We help you develop what good looks like. That's the foundation.

Before Mutagent optimizes anything, it helps you build evaluation criteria and a test dataset from your production data. Then it takes it from there.

You stay in control.

Mutagent never changes your agents without your approval. Every optimization ends with a report: what was diagnosed, why this fix was chosen, and what the validation data shows. You can trace each change back to its root cause and the complete experiment outcome.

Approve to apply. Decline to discard.

Before any fix is proposed, Mutagent validates it against your evaluation dataset. If anything regresses, the change is blocked.

terminal
Optimization complete — support-agent
ambiguous routing fix · 34 affected sessions
mutagent diagnose
System prompt lacks domain-specific instructions. 34 sessions routed to generic fallback instead of specialist handler.
mutagent mutate
- You are a helpful assistant.
+ You are a senior support engineer for {{product}}.
+ If uncertain, escalate to specialist.
mutagent validate
Evaluation cases: 47/47 passed
Regressions: 0
Routing accuracy: 81% → 86.5%
Fix validated. View detailed report →
Should the fix be applied? (y/n)
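
The "block on regression" rule can be pictured as a simple gate: any evaluation case that passed before the change and fails after it counts as a regression, and one regression is enough to block the fix. The sketch below is an assumption about how such a gate might work, not Mutagent's implementation; `find_regressions`, `gate`, and the case IDs are hypothetical.

```python
def find_regressions(baseline, candidate):
    """Return IDs of cases that passed with the current config but fail with the fix.

    Both arguments map a case ID to True (passed) or False (failed).
    A case missing from the candidate results is treated as failed.
    """
    return sorted(
        case_id
        for case_id, passed in baseline.items()
        if passed and not candidate.get(case_id, False)
    )


def gate(baseline, candidate):
    """Block the proposed fix if it breaks anything that used to pass."""
    regressions = find_regressions(baseline, candidate)
    return {"blocked": bool(regressions), "regressions": regressions}


# Made-up results: the fix repairs case-2 and breaks nothing.
before = {"case-1": True, "case-2": False, "case-3": True}
after = {"case-1": True, "case-2": True, "case-3": True}
print(gate(before, after))
```

Only a fix that passes this gate would reach the approval prompt; the final decision still rests with the reviewer.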

What we mutate

Mutagent targets the configuration layer that controls how your agents behave.

Prompts

System prompts, task instructions, persona definitions. Where most agent failures originate. A single word change can shift routing accuracy by 25 percentage points.

Output descriptions

The structured definitions that tell agents what shape their responses should take. Vague output specs lead to inconsistent results and downstream parsing failures.
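
To make the parsing-failure point concrete, here is a minimal sketch of checking an agent response against an explicit output spec. The invoice fields, `INVOICE_SPEC`, and `check_output` are hypothetical and chosen only for illustration; they are not part of Mutagent.

```python
import json

# Hypothetical explicit spec: field name -> accepted Python type(s).
INVOICE_SPEC = {
    "invoice_id": str,
    "total": (int, float),
    "currency": str,
}


def check_output(raw, spec):
    """Return a list of problems; an empty list means the response matches the spec."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if not isinstance(data, dict):
        return ["response is not a JSON object"]

    problems = []
    for field, expected in spec.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"wrong type for {field}")
    return problems


# A response shaped by a vague spec: the total is a string and the currency is missing.
print(check_output('{"invoice_id": "INV-1", "total": "12.50"}', INVOICE_SPEC))
```

When the output description names each field and its type, a check like this has something definite to verify, and downstream code can parse the response without guessing.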

Tool descriptions

The natural language that tells agents when and how to use tools. Ambiguous descriptions are one of the most common causes of tool selection failures.

Few-shot examples

The worked examples embedded in your prompts. Mutagent pulls relevant cases from production traces, adds them where coverage is missing, and replaces examples that mislead.

Evaluation criteria

Scoring rules and test datasets that define what "good" looks like. Refined as Mutagent learns from production data.

FAQ

Frequently Asked Questions

Mutagent is your autonomous AI engineer, delivered as a CLI. It diagnoses what's going wrong with your AI agents, figures out why, proposes a precise fix, and proves it works — before you ever see it. You review the results and approve what ships.

Time to optimize your AI system.

Under 30 minutes from install to your first validated fix.