Your flight simulator for MCP servers and agents: connect agents to real MCP servers, run realistic scenarios, and measure tool calls, latency, token usage, and cost.
The Model Context Protocol (MCP) standardizes how applications provide context to large language models (LLMs). Think of MCP as a USB-C port for AI applications. mcp-eval ensures your MCP servers, and the agents built with them, work reliably in production.

What mcp-eval Does for You

Test MCP Servers

Ensure your MCP servers respond correctly to agent requests and handle edge cases gracefully

Evaluate Agents

Measure how effectively agents use tools, follow instructions, and recover from errors

Track Performance

Monitor latency, token usage, cost, and success rates with OpenTelemetry-backed metrics

Assert Quality

Use structural checks, LLM judges, and path efficiency validators to ensure high quality
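
For example, these checks combine in a single test. The tools and performance assertions use the same API shown in the example near the end of this page; the LLM-judge assertion name (Expect.judge.llm) is an assumption here, so confirm it against your installed version:

from mcp_eval import task, Expect

@task("Check summary quality")
async def test_summary_quality(agent, session):
    # Drive the agent against the server under test
    response = await agent.generate_str("Fetch https://example.com and summarize it")

    # Structural check: the right tool was called
    await session.assert_that(Expect.tools.was_called("fetch"))

    # LLM judge (assertion name assumed -- verify against your mcp-eval version)
    await session.assert_that(
        Expect.judge.llm("The summary accurately describes the fetched page"),
        response=response,
    )

    # Performance gate: respond within 5 seconds
    await session.assert_that(Expect.performance.response_time_under(5000))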

Get Started in 30 Seconds

We recommend using uv:
# Install mcp-eval globally (for CLI)
uv tool install mcpevals

# Add mcp-eval dependency to your project
uv add mcpevals

# Initialize your project (interactive setup)
mcp-eval init

# Add your MCP server to test
mcp-eval server add

# Auto-generate tests with an LLM
mcp-eval generate

# Run decorator/dataset tests
mcp-eval run tests/

# Run pytest-style tests
uv run pytest -q tests
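
If you prefer pytest, the test body looks much like the decorator style shown later on this page. The sketch below assumes fixture names (mcp_agent, mcp_session) for illustration; check the pytest integration docs for the fixtures your version actually provides:

# tests/test_fetch_pytest.py -- sketch; fixture names are assumed
# Assumes pytest-asyncio (or an equivalent plugin) for async tests
import pytest
from mcp_eval import Expect

@pytest.mark.asyncio
async def test_fetch_with_pytest(mcp_agent, mcp_session):
    # Ask the agent to exercise the server under test
    response = await mcp_agent.generate_str("Fetch https://example.com and summarize it")

    # The same Expect-based assertions apply
    await mcp_session.assert_that(Expect.tools.was_called("fetch"))
    await mcp_session.assert_that(Expect.content.contains("Example Domain"), response=response)
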
Test any MCP server: It doesn't matter what language your MCP server is written in (Python, TypeScript, Go, Rust, Java, or any other). As long as it implements the MCP protocol, mcp-eval can test it!
You’re ready to start testing! Continue with the Quickstart →

🎮 Choose Your Testing Adventure

What are you evaluating today?
You built an MCP server (in any language!) and want to ensure it handles agent requests correctly.

Perfect! Let's test your server

mcp-eval will spin up an AI agent to test your server with realistic requests, edge cases, and error scenarios. Your server could be:
  • A streamable HTTP database connector
  • An SSE API wrapper
  • A stdio file system server
  • Any server that speaks MCP!

Why Teams Choose mcp-eval

  • Production-ready: Built on OpenTelemetry for enterprise-grade observability
  • Multiple test styles: Choose between decorators, pytest, or dataset-driven testing (sketched below)
  • Rich assertions: Content checks, tool verification, performance gates, and LLM judges
  • CI/CD friendly: GitHub Actions support, JSON/HTML reports, and regression detection
  • Language agnostic: Test MCP servers written in any language
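
Dataset-driven testing (the style referenced above) defines inputs and expectations once, then evaluates them as a batch. The sketch below assumes Dataset and Case names and their import location; treat them as illustrative and consult the dataset documentation for the exact interface:

# Sketch of a dataset-driven suite; Dataset/Case names and import path are assumed
from mcp_eval import Expect, Dataset, Case  # import location assumed

dataset = Dataset(
    name="Fetch server smoke tests",
    cases=[
        Case(
            name="fetch_example_domain",
            inputs="Fetch https://example.com and summarize it",
            evaluators=[
                # Reuse the same checks shown elsewhere on this page
                Expect.tools.was_called("fetch"),
                Expect.content.contains("Example Domain"),
            ],
        ),
    ],
)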

Quick Navigation

Learning Path

Example: Your First Test

from mcp_eval import task, Expect

@task("Verify fetch server works correctly")
async def test_fetch(agent, session):
    # Ask the agent to fetch a webpage
    response = await agent.generate_str("Fetch https://example.com and summarize it")
    
    # Assert the right tool was called
    await session.assert_that(Expect.tools.was_called("fetch"))
    
    # Verify the content is correct
    await session.assert_that(Expect.content.contains("Example Domain"), response=response)
    
    # Check performance
    await session.assert_that(Expect.performance.response_time_under(5000))
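
Save this in a file under tests/ (for example tests/test_fetch.py) and run it with mcp-eval run tests/, as in the quickstart above.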

Join the Community