03 · Improve

Diagnose Agent

Find exactly what is failing

Diagnose Agent turns a sea of red eval results into a short, ranked list of root causes. It clusters failures by semantic similarity, traces each cluster back to a specific phrase or structural gap in the prompt, and ranks the resulting root causes by impact so you know what to fix first.

No more eyeballing logs at 2am. Every diagnosis comes with sample traces, the failing eval cases, and a confidence score.

What it does

Runs your full eval suite against any prompt version
Clusters failures by semantic similarity
Traces each cluster back to a specific prompt phrase or missing tool description
Ranks root causes by impact and confidence
Surfaces representative sample traces for each cluster
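
To make the cluster-then-rank idea concrete, here is a minimal sketch in Python. It is not Diagnose Agent's implementation: it assumes bag-of-words cosine similarity as a crude stand-in for real semantic embeddings, a greedy single-pass clustering rule, and "impact" defined simply as a cluster's share of all failures. Every function name, the threshold, and the sample failure messages are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_failures(messages, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster: {"seed": Counter, "members": [str]}
    for message in messages:
        vec = vectorize(message)
        for cluster in clusters:
            if cosine(vec, cluster["seed"]) >= threshold:
                cluster["members"].append(message)
                break
        else:
            # No existing cluster matched, so this message seeds a new one.
            clusters.append({"seed": vec, "members": [message]})
    return clusters


def rank_clusters(clusters, total):
    """Rank clusters by impact, assumed here to be the share of all failures."""
    ranked = [
        {"impact": len(c["members"]) / total, "samples": c["members"][:2]}
        for c in clusters
    ]
    return sorted(ranked, key=lambda r: r["impact"], reverse=True)


# Hypothetical failure messages, for illustration only.
failures = [
    "refund tool called with missing order id",
    "refund tool called with missing customer id",
    "answer ignored the requested JSON format",
    "refund tool called with missing order id twice",
]
ranked = rank_clusters(cluster_failures(failures), total=len(failures))
for row in ranked:
    print(f"{row['impact']:.0%}  {row['samples'][0]}")
```

In this toy run the three refund-tool failures land in one cluster and the formatting failure in another, so the refund cluster ranks first. A real pipeline would swap in an embedding model and a more robust clustering method.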

Inputs

Eval dataset
Production traces
Current prompts and tool definitions

Outputs

Ranked root-cause list
Failure clusters with sample traces
Impact estimates
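
One way to picture the ranked root-cause list is as a sorted collection of small records. The sketch below is an assumption for illustration, not the product's actual schema: the `RootCause` fields, the trace IDs, and the rule of sorting by impact first and confidence second are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RootCause:
    # All field names are hypothetical, chosen to mirror the outputs listed above.
    summary: str
    prompt_phrase: str
    impact: float      # assumed: estimated share of failures explained, 0.0 to 1.0
    confidence: float  # assumed: 0.0 to 1.0
    sample_trace_ids: list = field(default_factory=list)


def rank_root_causes(causes):
    """Order by impact first, then confidence, highest first (an assumed rule)."""
    return sorted(causes, key=lambda c: (c.impact, c.confidence), reverse=True)


# Made-up entries, for illustration only.
causes = [
    RootCause("Output format drift", "respond in JSON", 0.25, 0.9, ["t-17"]),
    RootCause("Refund tool misuse", "use the refund tool", 0.60, 0.7, ["t-03", "t-11"]),
    RootCause("Missing tool description", "(no description)", 0.25, 0.95, ["t-20"]),
]
report = rank_root_causes(causes)
for cause in report:
    print(f"{cause.impact:.0%} impact, {cause.confidence:.0%} confidence: {cause.summary}")
```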

Works with

Langfuse
LangSmith
Datadog
Braintrust

Try Diagnose Agent today

Install the CLI and run this agent against your own evals in under five minutes.

