Your flight simulator for MCP servers and agents: connect agents to real MCP servers, run realistic scenarios, and calculate metrics for tool calls and more.
The Model Context Protocol (MCP) standardizes how applications provide context to large language models (LLMs); think of MCP as a USB-C port for AI applications. mcp-eval ensures your MCP servers, and the agents built with them, work reliably in production.

What mcp-eval Does for You

Test MCP Servers

Ensure your MCP servers respond correctly to agent requests and handle edge cases gracefully

Evaluate Agents

Measure how effectively agents use tools, follow instructions, and recover from errors

Track Performance

Monitor latency, token usage, cost, and success rates with OpenTelemetry-backed metrics

Assert Quality

Use structural checks, LLM judges, and path efficiency validators to ensure high quality
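To make the idea of a path-efficiency validator concrete, here is a minimal, illustrative sketch (this is not mcp-eval's actual implementation): it scores how closely an agent's tool-call sequence follows an expected "golden" path, penalizing extra or out-of-order calls.

```python
def path_efficiency(actual_calls: list[str], golden_path: list[str]) -> float:
    """Score an agent's tool-call sequence against an expected golden path.

    Returns 1.0 for a perfect match; redundant or out-of-order calls
    lower the score. Illustrative sketch only.
    """
    if not golden_path:
        return 1.0 if not actual_calls else 0.0
    # Count golden-path steps matched in order within the actual sequence.
    matched, i = 0, 0
    for call in actual_calls:
        if i < len(golden_path) and call == golden_path[i]:
            matched += 1
            i += 1
    extra = len(actual_calls) - matched
    return matched / (len(golden_path) + extra)

print(path_efficiency(["fetch"], ["fetch"]))                     # perfect path
print(path_efficiency(["search", "fetch", "fetch"], ["fetch"]))  # redundant calls penalized
```

The same shape of check generalizes to the other assertion families: structural checks compare observed behavior against a deterministic expectation, while LLM judges handle the fuzzier cases.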

Get Started in 30 Seconds

We recommend using uv:
# Install mcp-eval globally (for CLI)
uv tool install mcpevals

# Add mcp-eval dependency to your project
uv add mcpevals

# Initialize your project (interactive setup)
mcp-eval init

# Add your MCP server to test
mcp-eval server add

# Auto-generate tests with an LLM
mcp-eval generate

# Run decorator/dataset tests
mcp-eval run tests/

# Run pytest-style tests
uv run pytest -q tests
Test any MCP server: It doesn’t matter what language your MCP server is written in: Python, TypeScript, Go, Rust, Java, or any other. As long as it implements the MCP protocol, mcp-eval can test it!
You’re ready to start testing! Continue with the Quickstart →

🎮 Choose Your Testing Adventure

What are you evaluating today?
You built an MCP server (in any language!) and want to ensure it handles agent requests correctly.

Perfect! Let's test your server

mcp-eval will spin up an AI agent to test your server with realistic requests, edge cases, and error scenarios. Your server could be:
  • A streamable HTTP database connector
  • An SSE API wrapper
  • A stdio file system server
  • Any server that speaks MCP!

Start Here

MCP Server Testing Guide

See Examples

Testing the Fetch Server

Why Teams Choose mcp-eval

  • Production-readiness: Built on OpenTelemetry for enterprise-grade observability
  • Multiple test styles: Choose between decorators, pytest, or dataset-driven testing
  • Rich assertions: Content checks, tool verification, performance gates, and LLM judges
  • CI/CD friendly: GitHub Actions support, JSON/HTML reports, and regression detection
  • Language agnostic: Test MCP servers written in any language
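Regression detection boils down to comparing the current run's metric report against a baseline. As a hedged illustration of the idea (not mcp-eval's actual report format or algorithm), a comparator might flag any metric that worsened beyond a tolerance; the metric names and the "success" naming convention below are assumptions for the example.

```python
def find_regressions(baseline: dict, current: dict, tolerance: float = 0.10) -> list[str]:
    """Flag metrics that worsened by more than `tolerance` (10% by default).

    In this sketch, metrics whose names start with 'success' are
    higher-is-better; everything else is lower-is-better.
    """
    flags = []
    for name, base in baseline.items():
        cur = current.get(name)
        if cur is None or base == 0:
            continue
        change = (cur - base) / base
        higher_is_better = name.startswith("success")
        if (higher_is_better and change < -tolerance) or (
            not higher_is_better and change > tolerance
        ):
            flags.append(f"{name}: {base} -> {cur}")
    return flags

print(find_regressions(
    {"latency_ms_p95": 900, "success_rate": 0.98, "cost_usd": 0.02},
    {"latency_ms_p95": 1400, "success_rate": 0.97, "cost_usd": 0.021},
))  # only the latency jump exceeds the 10% tolerance
```

Wiring a check like this into CI is what turns a one-off benchmark into an ongoing quality gate.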

Quick Navigation

Quickstart

Get up and running in 5 minutes

Common Workflows

Step-by-step guides for typical tasks

API Reference

Complete assertion catalog and APIs

Learning Path

Overview

Understand mcp-eval’s architecture and philosophy

Quickstart

Your first test in 5 minutes

Concepts

Core concepts and terminology

Example: Your First Test

from mcp_eval import task, Expect

@task("Verify fetch server works correctly")
async def test_fetch(agent, session):
    # Ask the agent to fetch a webpage
    response = await agent.generate_str("Fetch https://example.com and summarize it")
    
    # Assert the right tool was called
    await session.assert_that(Expect.tools.was_called("fetch"))
    
    # Verify the content is correct
    await session.assert_that(Expect.content.contains("Example Domain"), response=response)
    
    # Check performance
    await session.assert_that(Expect.performance.response_time_under(5000))

Join the Community

GitHub

Report issues and contribute

Discord

Get help and share experiences