Use Cases

Mutagent in Action

See how Mutagent powers agent optimization across different workflows and scenarios.

Discover Prompts in Your Codebase

Your agent has prompts scattered across files, configs, and framework code. Mutagent scans your entire codebase, finds every prompt, and maps how they connect, so you know exactly what you're optimizing.

mutagent init
mutagent explore

Evaluate Prompt Quality

How good is your prompt, really? Define evaluation rubrics, run them against real datasets, and get a structured scorecard, not vibes.

mutagent prompts dataset add <id> ...
mutagent prompts evaluation create <id> --guided
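
As a rough illustration of what a rubric-based scorecard means, the sketch below computes a pass rate per criterion over a set of model outputs. It is not Mutagent's evaluation engine: `scorecard`, the `Rubric` type, both criteria, and the sample outputs are hypothetical names and data chosen for the example.

```python
from typing import Callable

# A rubric maps a criterion name to a pass/fail check on one model output.
Rubric = dict[str, Callable[[str], bool]]

def scorecard(outputs: list[str], rubric: Rubric) -> dict[str, float]:
    """Return the pass rate (0.0 to 1.0) of each criterion over the outputs."""
    if not outputs:
        raise ValueError("need at least one output to evaluate")
    return {
        name: sum(check(output) for output in outputs) / len(outputs)
        for name, check in rubric.items()
    }

# Hypothetical criteria and sample outputs, for illustration only.
rubric: Rubric = {
    "under_40_chars": lambda text: len(text) <= 40,
    "no_currency_symbol": lambda text: "$" not in text,
}
outputs = ["Total is 1234.56.", "Total is $1,234.56."]
print(scorecard(outputs, rubric))
```

Each criterion yields one number, so a regression on a single criterion stays visible instead of being averaged away.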

Optimize Prompts Automatically

Stop hand-tuning prompts. Mutagent runs optimization jobs that generate prompt mutations, test them against your evaluations, and surface the best-performing variant.

mutagent prompts optimize start <id> --dataset <d> --evaluation <e>
mutagent prompts optimize results <job-id>
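
The loop described above (generate mutations, score each one against an evaluation, keep the best) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not how Mutagent implements optimization: `optimize`, `toy_mutate`, and `toy_score` are hypothetical names, and the toy evaluation simply rewards prompts that mention two required phrases.

```python
from typing import Callable, Iterable

def optimize(
    base_prompt: str,
    mutate: Callable[[str], Iterable[str]],
    score: Callable[[str], float],
) -> tuple[str, float]:
    """Return the best-scoring prompt among the base prompt and its mutations."""
    best_prompt, best_score = base_prompt, score(base_prompt)
    for candidate in mutate(base_prompt):
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score

# Hypothetical mutation step: append candidate instructions to the prompt.
def toy_mutate(prompt: str) -> list[str]:
    additions = [
        "Return JSON.",
        "Use ISO 8601 dates.",
        "Return JSON. Use ISO 8601 dates.",
    ]
    return [f"{prompt} {extra}" for extra in additions]

# Hypothetical evaluation: fraction of required phrases present in the prompt.
def toy_score(prompt: str) -> float:
    required = ["JSON", "ISO 8601"]
    return sum(phrase in prompt for phrase in required) / len(required)

best, best_score = optimize("Extract the invoice date.", toy_mutate, toy_score)
print(best, best_score)
```

A real evaluation would score the model's outputs on a dataset rather than the prompt text itself; the selection logic stays the same.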

Test Prompts Interactively

Prototype fast. Run any prompt against live inputs in the playground, iterate on variables, and validate outputs, all from your terminal.

mutagent playground run <id> --input '{...}'

Integrate with Any Framework

LangGraph, Mastra, ADK, Claude Code: Mutagent generates framework-specific integration guides so your coding agent can wire optimized prompts into your stack automatically.

mutagent integrate langgraph
mutagent integrate mastra

Analyze Traces & Debug Failures

Your agent fails on edge cases but logs don't tell you why. Mutagent ingests traces, surfaces failure patterns, and connects them back to the prompts that caused them.

mutagent traces
mutagent prompts get <id> --json

Ready to optimize your agents?

Install Mutagent and start turning agent failures into improvements, in minutes.

We diagnose before we fix

While other optimizers run searches, we run diagnostics.

Instruction ambiguity

Unclear or conflicting instructions that the model can interpret in multiple ways. The default root cause for most prompt-level failures.

-Extract the total from the invoice if available.
+Extract the grand total. If multiple totals exist (subtotal, tax, shipping), return each as a separate field. Never sum them yourself.

Missing context

Required runtime context is not provided to the model. The input data is incomplete or absent; the prompt structure itself is not the issue.

-Classify the support ticket and suggest next steps.
+Classify the ticket. Product: {{product_name}}.
+Plan: {{plan_tier}}. Prior tickets: {{ticket_history}}.
+SLA deadline: {{sla_hours}}h.
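
One way to catch missing context before the model is called is to check that every placeholder in the template has a value. The sketch below assumes `{{name}}`-style placeholders as in the example above; `missing_variables` and the sample context values are hypothetical and not part of Mutagent.

```python
import re

# Matches {{name}} placeholders, allowing dotted names and inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")

def missing_variables(template: str, context: dict) -> list[str]:
    """List placeholders that have no non-empty value in the context."""
    missing = []
    for name in PLACEHOLDER.findall(template):
        value = context.get(name)
        if value in (None, "") and name not in missing:
            missing.append(name)
    return missing

template = (
    "Classify the ticket. Product: {{product_name}}.\n"
    "Plan: {{plan_tier}}. Prior tickets: {{ticket_history}}.\n"
    "SLA deadline: {{sla_hours}}h."
)
# Made-up sample values; ticket_history is deliberately left out.
context = {"product_name": "Acme CRM", "plan_tier": "pro", "sla_hours": 4}
print(missing_variables(template, context))  # ['ticket_history']
```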

Missing section

An entire prompt section is absent. The field or concept has zero presence in the prompt, not even a partial mention.

-(no multi-language handling anywhere in prompt)
+## Language handling
+Detect input language. Respond in same language.
+If mixed, prefer the language of the first user message.

Output format mismatch

The output is structurally valid, but its values are wrong: they violate format and type constraints in ways that break downstream consumers.

-date: "March 15th, 2026"
-amount: "$1,234.56"
+date: "2026-03-15" // ISO 8601
+amount: 1234.56 // numeric, no currency symbol
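
To show what the target formats mean for a downstream consumer, here is a minimal sketch that converts the two "before" values into the "after" forms. The function names are hypothetical, and the parser only handles the English "Month day, year" shape and dollar amounts shown above.

```python
import re
from datetime import datetime

def normalize_date(text: str) -> str:
    """Convert a date like 'March 15th, 2026' to ISO 8601 (YYYY-MM-DD)."""
    # Drop ordinal suffixes (1st, 2nd, 3rd, 15th) so strptime can parse the day.
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", text)
    return datetime.strptime(cleaned, "%B %d, %Y").date().isoformat()

def normalize_amount(text: str) -> float:
    """Convert a string like '$1,234.56' to a plain number."""
    return float(text.replace("$", "").replace(",", ""))

print(normalize_date("March 15th, 2026"))  # 2026-03-15
print(normalize_amount("$1,234.56"))       # 1234.56
```

For monetary values where rounding matters, `decimal.Decimal` may be a better fit than `float`.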

Constraint violation

Explicit guardrails in the prompt are not followed by the model, especially under pressure or with complex inputs.

-Keep responses short. Do not discuss competitors.
+Max 3 sentences per response.
+If the user mentions a competitor by name, reply:
+"I can only help with {{product_name}} questions."

Reasoning gap

The prompt lacks chain-of-thought guidance. The model jumps to conclusions without the intermediate reasoning steps needed for accuracy.

-Determine the refund eligibility.
+Step 1: Check if order is within 30-day window.
+Step 2: Verify item condition against policy.
+Step 3: Calculate refund amount. Then decide.

Edge case unhandled

A genuinely novel, unanticipated scenario the prompt was never designed for. Not common challenges, but truly unexpected input patterns.

-Parse the uploaded document and extract fields.
+Parse the document.
+If handwritten text detected, flag for human review.
+If scanned pages are rotated, attempt OCR correction.

Variable misuse

A template variable is referenced or applied incorrectly within the prompt. Wrong delimiters, wrong names, or mismatched interpolation.

-Use the customer data: {customer}
-Previous orders: {history}
+Customer name: {{customer.name}}
+Account ID: {{customer.id}}
+Orders (last 90d): {{customer.recent_orders}}
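
A small lint pass can flag both problems shown above: single-brace placeholders and names that do not match any known variable. This sketch assumes that `{{name}}` is the intended delimiter, as the corrected example suggests; `lint_template` and the `known` set are hypothetical.

```python
import re

# {name} with single braces, but not {{name}}.
SINGLE_BRACE = re.compile(r"(?<!\{)\{([A-Za-z_][\w.]*)\}(?!\})")
# {{name}} with double braces, allowing dotted names.
DOUBLE_BRACE = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")

def lint_template(template: str, known: set[str]) -> list[str]:
    """Report single-brace placeholders and unknown variable names."""
    problems = []
    for name in SINGLE_BRACE.findall(template):
        problems.append(f"single-brace placeholder: {{{name}}}")
    for name in DOUBLE_BRACE.findall(template):
        if name not in known:
            problems.append(f"unknown variable: {name}")
    return problems

known = {"customer.name", "customer.id", "customer.recent_orders"}
before = "Use the customer data: {customer}\nPrevious orders: {history}"
after = (
    "Customer name: {{customer.name}}\n"
    "Account ID: {{customer.id}}\n"
    "Orders (last 90d): {{customer.recent_orders}}"
)
print(lint_template(before, known))
print(lint_template(after, known))  # []
```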

Schema description weak

Schema field descriptions exist but are insufficient. They describe shape without intent, leaving the model to guess semantics.

-urgency: number
-sentiment: string
+urgency: number, 1 (can wait 7d) to 5 (revenue at risk).
+Used for queue priority.
+sentiment: "positive" | "neutral" | "negative"
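
Expressed as a JSON-Schema-style dictionary, the difference between the two versions is easy to check mechanically. The sketch below uses a simple heuristic of our own (a field is "weak" if it has neither a description nor an enum); `weak_fields` is a hypothetical helper, not a Mutagent API.

```python
def weak_fields(schema: dict) -> list[str]:
    """Return property names that have neither a description nor an enum."""
    weak = []
    for name, spec in schema.get("properties", {}).items():
        if not spec.get("description") and "enum" not in spec:
            weak.append(name)
    return weak

before = {
    "properties": {
        "urgency": {"type": "number"},
        "sentiment": {"type": "string"},
    }
}
after = {
    "properties": {
        "urgency": {
            "type": "number",
            "minimum": 1,
            "maximum": 5,
            "description": (
                "1 (can wait 7d) to 5 (revenue at risk). "
                "Used for queue priority."
            ),
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],
        },
    }
}
print(weak_fields(before))  # ['urgency', 'sentiment']
print(weak_fields(after))   # []
```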
