Your flight simulator for MCP servers and agents: connect agents to real MCP servers, run realistic scenarios, and measure tool calls, latency, token usage, and cost.
The Model Context Protocol (MCP) standardizes how applications provide context to large language models (LLMs). Think of MCP as a USB-C port for AI applications. mcp-eval ensures your MCP servers, and the agents built with them, work reliably in production.

What mcp-eval Does for You

Test MCP Servers

Ensure your MCP servers respond correctly to agent requests and handle edge cases gracefully

Evaluate Agents

Measure how effectively agents use tools, follow instructions, and recover from errors

Track Performance

Monitor latency, token usage, cost, and success rates with OpenTelemetry-backed metrics

Assert Quality

Use structural checks, LLM judges, and path efficiency validators to ensure high quality
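
For example, these checks combine in a single test. The tools and performance assertions use the same API shown in the example near the end of this page; the LLM-judge assertion name (Expect.judge.llm) is an assumption here, so confirm it against your installed version:

from mcp_eval import task, Expect

@task("Check summary quality")
async def test_summary_quality(agent, session):
    # Drive the agent against the server under test
    response = await agent.generate_str("Fetch https://example.com and summarize it")

    # Structural check: the right tool was called
    await session.assert_that(Expect.tools.was_called("fetch"))

    # LLM judge (assertion name assumed -- verify against your mcp-eval version)
    await session.assert_that(
        Expect.judge.llm("The summary accurately describes the fetched page"),
        response=response,
    )

    # Performance gate: respond within 5 seconds
    await session.assert_that(Expect.performance.response_time_under(5000))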

Get Started in 30 Seconds

We recommend using uv:
# Install mcp-eval globally (for CLI)
uv tool install mcpevals

# Add mcp-eval dependency to your project
uv add mcpevals

# Initialize your project (interactive setup)
mcp-eval init

# Add your MCP server to test
mcp-eval server add

# Auto-generate tests with an LLM
mcp-eval generate

# Run decorator/dataset tests
mcp-eval run tests/

# Run pytest-style tests
uv run pytest -q tests
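
If you prefer pytest, the test body looks much like the decorator style shown later on this page. The sketch below assumes fixture names (mcp_agent, mcp_session) for illustration; check the pytest integration docs for the fixtures your version actually provides:

# tests/test_fetch_pytest.py -- sketch; fixture names are assumed
# Assumes pytest-asyncio (or an equivalent plugin) for async tests
import pytest
from mcp_eval import Expect

@pytest.mark.asyncio
async def test_fetch_with_pytest(mcp_agent, mcp_session):
    # Ask the agent to exercise the server under test
    response = await mcp_agent.generate_str("Fetch https://example.com and summarize it")

    # The same Expect-based assertions apply
    await mcp_session.assert_that(Expect.tools.was_called("fetch"))
    await mcp_session.assert_that(Expect.content.contains("Example Domain"), response=response)
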
Test any MCP server: It doesn't matter what language your MCP server is written in (Python, TypeScript, Go, Rust, Java, or any other). As long as it implements the MCP protocol, mcp-eval can test it!
You’re ready to start testing! Continue with the Quickstart →

🎮 Choose Your Testing Adventure

What are you evaluating today?
You built an MCP server (in any language!) and want to ensure it handles agent requests correctly.

Perfect! Let's test your server

mcp-eval will spin up an AI agent to test your server with realistic requests, edge cases, and error scenarios. Your server could be:
  • A streamable HTTP database connector
  • An SSE API wrapper
  • A stdio file system server
  • Any server that speaks MCP!

Why Teams Choose mcp-eval

  • Production-ready: Built on OpenTelemetry for enterprise-grade observability
  • Multiple test styles: Choose between decorators, pytest, or dataset-driven testing (sketched below)
  • Rich assertions: Content checks, tool verification, performance gates, and LLM judges
  • CI/CD friendly: GitHub Actions support, JSON/HTML reports, and regression detection
  • Language agnostic: Test MCP servers written in any language
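
Dataset-driven testing (the style referenced above) defines inputs and expectations once, then evaluates them as a batch. The sketch below assumes Dataset and Case names and their import location; treat them as illustrative and consult the dataset documentation for the exact interface:

# Sketch of a dataset-driven suite; Dataset/Case names and import path are assumed
from mcp_eval import Expect, Dataset, Case  # import location assumed

dataset = Dataset(
    name="Fetch server smoke tests",
    cases=[
        Case(
            name="fetch_example_domain",
            inputs="Fetch https://example.com and summarize it",
            evaluators=[
                # Reuse the same checks shown elsewhere on this page
                Expect.tools.was_called("fetch"),
                Expect.content.contains("Example Domain"),
            ],
        ),
    ],
)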

Quick Navigation

Learning Path

Example: Your First Test

from mcp_eval import task, Expect

@task("Verify fetch server works correctly")
async def test_fetch(agent, session):
    # Ask the agent to fetch a webpage
    response = await agent.generate_str("Fetch https://example.com and summarize it")
    
    # Assert the right tool was called
    await session.assert_that(Expect.tools.was_called("fetch"))
    
    # Verify the content is correct
    await session.assert_that(Expect.content.contains("Example Domain"), response=response)
    
    # Check performance
    await session.assert_that(Expect.performance.response_time_under(5000))
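
Save this in a file under tests/ (for example tests/test_fetch.py) and run it with mcp-eval run tests/, as in the quickstart above.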

Join the Community