A comprehensive, single-page guide to mcp-eval: concepts, setup, styles, assertions, metrics, CLI, and best practices.
What is mcp-eval?
Think of mcp-eval as your “flight simulator” for tool‑using LLMs. You plug in an agent, connect it to real MCP servers (tools), and run realistic scenarios. The framework captures OpenTelemetry (OTEL) traces as the single source of truth, turns them into metrics, and gives you expressive assertions for both content and behavior.
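The trace-to-metrics-to-assertions flow described above can be illustrated with a small, self-contained sketch. It does not use the mcp-eval API: the `Span` class, the `tool:` naming convention, and the helpers `metrics_from_spans` and `assert_tool_called` are all hypothetical names invented for this illustration, and real OTEL spans carry far more detail.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified stand-in for one recorded trace span."""
    name: str            # assumed convention: "tool:<name>" or "llm:<step>"
    duration_ms: float
    is_error: bool = False


def metrics_from_spans(spans):
    """Reduce a flat list of spans to a few summary metrics."""
    tool_calls = Counter(
        s.name.split(":", 1)[1] for s in spans if s.name.startswith("tool:")
    )
    return {
        "tool_calls": dict(tool_calls),
        "total_duration_ms": sum(s.duration_ms for s in spans),
        "error_count": sum(1 for s in spans if s.is_error),
    }


def assert_tool_called(metrics, tool, min_times=1):
    """Behavior assertion: the agent used `tool` at least `min_times` times."""
    actual = metrics["tool_calls"].get(tool, 0)
    if actual < min_times:
        raise AssertionError(
            f"expected {tool!r} to be called >= {min_times} time(s), got {actual}"
        )


# A made-up run: the agent thinks, calls a "fetch" tool, then answers.
spans = [
    Span("llm:completion", 420.0),
    Span("tool:fetch", 130.5),
    Span("llm:completion", 310.0),
]
response_text = "The page title is Example Domain."  # made-up final answer

metrics = metrics_from_spans(spans)
assert_tool_called(metrics, "fetch")   # behavior: which tools ran
assert "Example" in response_text      # content: what the agent said
```

The point of the sketch is the separation of concerns: the trace is recorded once, metrics are derived from it, and both content and behavior checks read from those derived values rather than from ad hoc logging.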
Install with uv tool install mcpevals (recommended) or pip install mcpevals.
mcp-eval init - interactive setup for API keys and configuration
mcp-eval server add - configure the server you want to test
mcp-eval run tests/ - execute your test suite
mcp-eval connects to your server via the MCP protocol, making testing completely language-agnostic.
Use mcp-eval generate to bootstrap comprehensive tests. We recommend Anthropic Sonnet/Opus. See Test Generation.
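As a convenience, the setup commands listed above can be wrapped in a small preflight helper that checks whether the CLI is installed before suggesting what to run next. This is only a sketch: the helper name `next_steps` and the `QUICKSTART_STEPS` list are made up for illustration, the command strings are copied from this guide, and nothing here executes mcp-eval itself.

```python
import shutil

# Commands and descriptions quoted from this guide, in the order they are listed.
QUICKSTART_STEPS = [
    ("mcp-eval init", "interactive setup for API keys and configuration"),
    ("mcp-eval server add", "configure the server you want to test"),
    ("mcp-eval run tests/", "execute your test suite"),
]

INSTALL_HINT = "uv tool install mcpevals  # or: pip install mcpevals"


def next_steps(cli_name="mcp-eval"):
    """Return the lines to show a user, depending on whether the CLI is on PATH."""
    if shutil.which(cli_name) is None:
        # CLI not found: point at the install commands first.
        return [INSTALL_HINT]
    return [f"{cmd}  # {desc}" for cmd, desc in QUICKSTART_STEPS]


if __name__ == "__main__":
    for line in next_steps():
        print(line)
```

The helper only prints suggestions; mcp-eval init is interactive, so it is better run directly in a terminal than from a script.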
Use mcp-eval doctor, validate, and issue for diagnosis. See Troubleshooting.
mcpeval.yaml for eval knobs