Your flight simulator for MCP servers and agents
Connect agents to real MCP servers, run realistic scenarios, and calculate metrics for tool calls and more.
Model Context Protocol standardizes how applications provide context to large language models (LLMs). Think of MCP like a USB-C port for AI applications.
mcp-eval ensures your MCP servers, and agents built with them, work reliably in production.
What mcp-eval Does for You
Test MCP Servers
Ensure your MCP servers respond correctly to agent requests and handle edge cases gracefully
Evaluate Agents
Measure how effectively agents use tools, follow instructions, and recover from errors
Track Performance
Monitor latency, token usage, cost, and success rates with OpenTelemetry-backed metrics
Assert Quality
Use structural checks, LLM judges, and path efficiency validators to ensure high quality
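To make these metrics concrete, here is a minimal sketch of how success rate, mean latency, and path efficiency might be computed from recorded tool calls. The `ToolCall` record and both helper functions are hypothetical illustrations, not mcp-eval's API, and path efficiency is assumed here to mean the length of the expected tool sequence divided by the number of calls actually made.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation (not an mcp-eval type)."""
    name: str
    latency_ms: float
    ok: bool


def summarize(calls: list[ToolCall]) -> dict[str, float]:
    """Return the success rate and mean latency over the recorded calls."""
    if not calls:
        return {"success_rate": 0.0, "mean_latency_ms": 0.0}
    return {
        "success_rate": sum(c.ok for c in calls) / len(calls),
        "mean_latency_ms": mean(c.latency_ms for c in calls),
    }


def path_efficiency(calls: list[ToolCall], expected_path: list[str]) -> float:
    """Assumed definition: expected step count over actual step count, capped at 1.0."""
    if not calls:
        return 0.0
    return min(1.0, len(expected_path) / len(calls))


# Illustrative data: the agent retried `fetch` twice before summarizing.
calls = [
    ToolCall("fetch", 120.0, True),
    ToolCall("fetch", 80.0, False),
    ToolCall("fetch", 100.0, True),
    ToolCall("summarize", 100.0, True),
]
stats = summarize(calls)
efficiency = path_efficiency(calls, ["fetch", "summarize"])
```

A performance gate is then a plain comparison against a threshold, for example failing the run when `stats["success_rate"]` drops below an agreed minimum.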
Get Started in 30 Seconds
We recommend using uv.
Test any MCP server: it doesn't matter what language your MCP server is written in, whether Python, TypeScript, Go, Rust, Java, or any other. As long as it implements MCP, mcp-eval can test it!
You're ready to start testing! Continue with the Quickstart →
🎮 Choose Your Testing Adventure
What are you evaluating today?
- I'm testing an MCP Server
- I'm testing an Agent
- I want both!
You built an MCP server (in any language!) and want to ensure it handles agent requests correctly.
Perfect! Let's test your server
mcp-eval will spin up an AI agent to test your server with realistic requests, edge cases, and error scenarios.
Your server could be:
- A streamable HTTP database connector
- An SSE API wrapper
- A stdio file system server
- Any server that speaks MCP!
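Speaking MCP means exchanging JSON-RPC 2.0 messages, which is why the server's implementation language does not matter. As a rough sketch, the snippet below builds a `tools/call` request and frames it as a single newline-terminated line, the way the stdio transport delimits messages. The `build_tool_call` helper, the tool name `read_file`, and its arguments are hypothetical, and this is not how mcp-eval itself is invoked.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as one newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps escapes embedded newlines, so the message stays on one line.
    return json.dumps(message) + "\n"


# Hypothetical tool name and arguments, for illustration only.
line = build_tool_call(1, "read_file", {"path": "README.md"})
decoded = json.loads(line)
```

Any server that can parse a line like this and answer with a matching JSON-RPC response is testable, whatever it is written in.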
Why Teams Choose mcp-eval
- Production-readiness: Built on OpenTelemetry for enterprise-grade observability
- Multiple test styles: Choose between decorators, pytest, or dataset-driven testing
- Rich assertions: Content checks, tool verification, performance gates, and LLM judges
- CI/CD friendly: GitHub Actions support, JSON/HTML reports, and regression detection
- Language agnostic: Test MCP servers written in any language
Quick Navigation
Quickstart
Get up and running in 5 minutes
Common Workflows
Step-by-step guides for typical tasks
API Reference
Complete assertion catalog and APIs
Learning Path
- Getting Started
- Writing Tests
- Evaluation Types
- Configuration
- Reference