Your flight simulator for MCP servers and agents: connect agents to real MCP servers, run realistic scenarios, and calculate metrics for tool calls and more.
The Model Context Protocol (MCP) standardizes how applications provide context to large language models (LLMs); think of MCP as a USB-C port for AI applications. mcp-eval ensures your MCP servers, and the agents built with them, work reliably in production.

What mcp-eval Does for You

Test MCP Servers

Ensure your MCP servers respond correctly to agent requests and handle edge cases gracefully

Evaluate Agents

Measure how effectively agents use tools, follow instructions, and recover from errors

Track Performance

Monitor latency, token usage, cost, and success rates with OpenTelemetry-backed metrics

Assert Quality

Use structural checks, LLM judges, and path efficiency validators to ensure high quality
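To make the idea of a path-efficiency validator concrete, here is a minimal, illustrative sketch (this is not mcp-eval's actual implementation): it scores how closely an agent's tool-call sequence follows an expected "golden" path, penalizing extra or out-of-order calls.

```python
def path_efficiency(actual_calls: list[str], golden_path: list[str]) -> float:
    """Score an agent's tool-call sequence against an expected golden path.

    Returns 1.0 for a perfect match; redundant or out-of-order calls
    lower the score. Illustrative sketch only.
    """
    if not golden_path:
        return 1.0 if not actual_calls else 0.0
    # Count golden-path steps matched in order within the actual sequence.
    matched, i = 0, 0
    for call in actual_calls:
        if i < len(golden_path) and call == golden_path[i]:
            matched += 1
            i += 1
    extra = len(actual_calls) - matched
    return matched / (len(golden_path) + extra)

print(path_efficiency(["fetch"], ["fetch"]))                     # perfect path
print(path_efficiency(["search", "fetch", "fetch"], ["fetch"]))  # redundant calls penalized
```

The same shape of check generalizes to the other assertion families: structural checks compare observed behavior against a deterministic expectation, while LLM judges handle the fuzzier cases.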

Get Started in 30 Seconds

We recommend using uv:
# Install mcp-eval globally (for CLI)
uv tool install mcpevals

# Add mcp-eval dependency to your project
uv add mcpevals

# Initialize your project (interactive setup)
mcp-eval init

# Add your MCP server to test
mcp-eval server add

# Auto-generate tests with an LLM
mcp-eval generate

# Run decorator/dataset tests
mcp-eval run tests/

# Run pytest-style tests
uv run pytest -q tests
Test any MCP server: It doesn’t matter what language your MCP server is written in: Python, TypeScript, Go, Rust, Java, or any other. As long as it implements the MCP protocol, mcp-eval can test it!
You’re ready to start testing! Continue with the Quickstart →

🎮 Choose Your Testing Adventure

What are you evaluating today?
You built an MCP server (in any language!) and want to ensure it handles agent requests correctly.

Perfect! Let's test your server

mcp-eval will spin up an AI agent to test your server with realistic requests, edge cases, and error scenarios. Your server could be:
  • A streamable HTTP database connector
  • An SSE API wrapper
  • A stdio file system server
  • Any server that speaks MCP!

Start Here

MCP Server Testing Guide

See Examples

Testing the Fetch Server

Why Teams Choose mcp-eval

  • Production-readiness: Built on OpenTelemetry for enterprise-grade observability
  • Multiple test styles: Choose between decorators, pytest, or dataset-driven testing
  • Rich assertions: Content checks, tool verification, performance gates, and LLM judges
  • CI/CD friendly: GitHub Actions support, JSON/HTML reports, and regression detection
  • Language agnostic: Test MCP servers written in any language
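Regression detection boils down to comparing the current run's metric report against a baseline. As a hedged illustration of the idea (not mcp-eval's actual report format or algorithm), a comparator might flag any metric that worsened beyond a tolerance; the metric names and the "success" naming convention below are assumptions for the example.

```python
def find_regressions(baseline: dict, current: dict, tolerance: float = 0.10) -> list[str]:
    """Flag metrics that worsened by more than `tolerance` (10% by default).

    In this sketch, metrics whose names start with 'success' are
    higher-is-better; everything else is lower-is-better.
    """
    flags = []
    for name, base in baseline.items():
        cur = current.get(name)
        if cur is None or base == 0:
            continue
        change = (cur - base) / base
        higher_is_better = name.startswith("success")
        if (higher_is_better and change < -tolerance) or (
            not higher_is_better and change > tolerance
        ):
            flags.append(f"{name}: {base} -> {cur}")
    return flags

print(find_regressions(
    {"latency_ms_p95": 900, "success_rate": 0.98, "cost_usd": 0.02},
    {"latency_ms_p95": 1400, "success_rate": 0.97, "cost_usd": 0.021},
))  # only the latency jump exceeds the 10% tolerance
```

Wiring a check like this into CI is what turns a one-off benchmark into an ongoing quality gate.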

Quick Navigation

Quickstart

Get up and running in 5 minutes

Common Workflows

Step-by-step guides for typical tasks

API Reference

Complete assertion catalog and APIs

Learning Path

Overview

Understand mcp-eval’s architecture and philosophy

Quickstart

Your first test in 5 minutes

Concepts

Core concepts and terminology

Example: Your First Test

from mcp_eval import task, Expect

@task("Verify fetch server works correctly")
async def test_fetch(agent, session):
    # Ask the agent to fetch a webpage
    response = await agent.generate_str("Fetch https://example.com and summarize it")
    
    # Assert the right tool was called
    await session.assert_that(Expect.tools.was_called("fetch"))
    
    # Verify the content is correct
    await session.assert_that(Expect.content.contains("Example Domain"), response=response)
    
    # Check performance
    await session.assert_that(Expect.performance.response_time_under(5000))

Join the Community

GitHub

Report issues and contribute

Discord

Get help and share experiences