🚀 Welcome to mcp-eval! You’re about to supercharge your MCP development with powerful testing capabilities. This guide will have you testing MCP servers and agents in just 5 minutes!

What you’ll learn

By the end of this quickstart, you’ll be able to:
  • ✅ Install and configure mcp-eval for your project
  • ✅ Connect your MCP servers for testing
  • ✅ Write and run your first test
  • ✅ Understand test reports and iterate on failures
  • ✅ Choose the right testing style for your needs
Time to complete: ~5 minutes

Before you begin

Let’s make sure you have everything ready:

System requirements

  • Python 3.10+: required for running mcp-eval
  • MCP server: any MCP-compatible server to test
  • API key: a Claude or OpenAI key for LLM features
New to MCP? No worries! Check out the MCP documentation to understand the basics of Model Context Protocol servers. You’ll be testing them like a pro in no time!

Your 5-minute journey to testing mastery

1. Install `mcp-eval` and configure API keys

First, let’s get mcp-eval installed for your project. We recommend using uv to install mcp-eval as a global tool:
uv tool install mcpevals
This makes the mcp-eval CLI available globally on your system.
Language-agnostic testing: mcp-eval can test MCP servers written in any language (Python, TypeScript, Go, Rust, Java, and more). As long as your server implements the MCP protocol, mcp-eval can test it!
Next, add mcp-eval as a dependency for your project:

Using uv in a project

uv add mcpevals
Alternatively, install with pip:
pip install mcpevals
Now set up your API key for the best experience:
# We recommend Claude for superior test generation and judging
export ANTHROPIC_API_KEY="sk-ant-..."

# Alternative: OpenAI
export OPENAI_API_KEY="sk-..."
Pro tip: Claude Sonnet or Opus models provide the best results for test generation and LLM judge evaluations!
2. Initialize your test project

Let’s set up your testing environment with our interactive wizard:
mcp-eval init
This friendly wizard will:
  • 🎯 Ask for your preferred LLM provider and model
  • 📝 Create mcpeval.yaml with your configuration
  • 🔐 Set up mcpeval.secrets.yaml for secure API key storage
  • 🤖 Help you define your first test agent
  • 🔧 Import any existing MCP servers
What happens during init:
? Select your LLM provider: Anthropic
? Select model: claude-3-5-sonnet-20241022
? Import servers from mcp.json? Yes
? Path to mcp.json: .cursor/mcp.json
✓ Found 2 servers: fetch, filesystem
? Create a default agent? Yes
? Agent name: TestBot
? Agent instruction: You test MCP servers thoroughly
✓ Configuration saved to mcpeval.yaml
✓ Secrets saved to mcpeval.secrets.yaml
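Wondering what those files contain? The snippet below is a rough sketch of the kind of thing you’ll find. The key names here are illustrative assumptions, not the authoritative schema, so open the generated mcpeval.yaml and mcpeval.secrets.yaml to see the real layout for your version.

# mcpeval.yaml (illustrative sketch only; your generated file is the source of truth)
provider: anthropic
model: claude-3-5-sonnet-20241022
agents:
  - name: TestBot
    instruction: You test MCP servers thoroughly

# mcpeval.secrets.yaml keeps keys out of version control, roughly:
# anthropic:
#   api_key: sk-ant-...

Keep mcpeval.secrets.yaml out of version control (add it to .gitignore) so API keys never end up in your repository.
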
3. Configure MCP servers

Before you can test an MCP server, you need to tell mcp-eval how to connect to it.
Connections work over any supported transport (stdio, websocket, sse, streamable_http). You can import server configurations from mcp.json or dxt files, or specify them interactively with the mcp-eval server add command.

Adding your MCP server

You have several ways to add a server to your configuration:
The easiest way - let mcp-eval guide you:
mcp-eval server add
This will prompt you for:
  • How to add (interactive, from-mcp-json, or from-dxt)
  • Server name (e.g., “fetch”)
  • Command to run (e.g., “uvx mcp-server-fetch”)
  • Any arguments or environment variables
Example interaction:
? How would you like to add the server? interactive
? Server name: fetch  
? Command: uvx mcp-server-fetch
? Add environment variables? No
✓ Added server 'fetch'

Common server examples

Here are some popular MCP servers you might want to test:
# Fetch server (web content)
uvx mcp-server-fetch
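The filesystem and git reference servers from the Model Context Protocol project are also popular targets. As a rough sketch of how several servers might sit together in mcpeval.yaml (the nesting and key names below are illustrative assumptions, and the launch commands are typical invocations for those servers; double-check both against your generated config and each server’s docs):

# Illustrative only; confirm the exact schema against your generated mcpeval.yaml
mcp:
  servers:
    fetch:
      command: uvx
      args: ["mcp-server-fetch"]
    filesystem:
      command: npx
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    git:
      command: uvx
      args: ["mcp-server-git", "--repository", "/path/to/repo"]

Whichever servers you pick, mcp-eval server add (or importing from mcp.json) will write the entries for you, so hand-editing is optional.
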
Verify your server configuration: after adding a server, check that it’s working:
# List all configured servers
mcp-eval server list

# Validate server connectivity
mcp-eval validate
4. Run your first test

Time for the exciting part - running your first test! We’ll use the included fetch server example to demonstrate.
Example structure: The examples assume you have the fetch server configured. If you’re testing a different server, you’ll need to adjust the test code accordingly.
First, make sure there’s an example test to run; if you used mcp-eval init, one may already be in place. Then run:
mcp-eval run examples/mcp_server_fetch/tests/test_decorator_style.py \
  -v \
  --markdown test-reports/results.md \
  --html test-reports/index.html
What’s happening:
  • 🏃 Running decorator-style tests from the example file
  • 📊 Verbose output (-v) shows test progress
  • 📝 Markdown report for documentation
  • 🌐 HTML report for interactive exploration
Expected output:
Running tests...
✓ test_basic_fetch_decorator - Test basic URL fetching [2.3s]
  ✓ fetch_tool_called: Tool 'fetch' was called
  ✓ contains_domain_text: Content contains "Example Domain"
  ✓ fetch_success_rate: Tool success rate 100%

✓ test_content_extraction_decorator - Test extraction quality [3.1s]
  ✓ fetch_called_for_extraction: Tool 'fetch' was called
  ✓ extraction_quality_assessment: LLM judge score 0.92

Results: 2 passed, 0 failed
Reports saved to test-reports/
5. Explore your test results

Open your shiny new test report to see the details:
# Open the HTML report in your browser
open test-reports/index.html

# Or view the markdown report
cat test-reports/results.md
Understanding the HTML report: the interactive report shows:
  • 📊 Overview dashboard - Pass/fail rates, performance metrics
  • 🔍 Test details - Each test with all assertions
  • 🛠️ Tool usage - What tools were called and when
  • 💭 LLM reasoning - The agent’s thought process
  • Performance - Response times and efficiency metrics
  • 🎯 Failed assertions - Detailed diffs and explanations
Common things to check:
  • Did the right tools get called?
  • Was the output accurate?
  • How efficient was the agent’s approach?
  • What was the LLM judge’s assessment?
Test failed? Don’t worry! Check the assertion details to understand why. Common issues:
  • Tool not found (check server configuration)
  • Content mismatch (adjust your assertions)
  • Timeout (increase timeout in config)

What’s next? Write your own test!

Now that you’ve run the example, let’s write your very first custom test:

Choose your testing style

The decorator style shown here is best for quick, readable tests:
from mcp_eval import task, Expect

@task("My first test")
async def test_my_server(agent, session):
    response = await agent.generate_str(
        "Use my tool to do something"
    )
    
    await session.assert_that(
        Expect.tools.was_called("my_tool"),
        response=response
    )

Your test file structure

Create a new test file tests/test_my_server.py:
"""Tests for my awesome MCP server."""

from mcp_eval import task, setup, Expect

@setup
def configure_tests():
    """Any setup needed before tests run."""
    print("🚀 Starting my server tests!")

@task("Test basic functionality")
async def test_basic_operation(agent, session):
    """Verify the server responds correctly to basic requests."""
    
    # 1. Send a prompt to the agent
    response = await agent.generate_str(
        "Please use the calculator to add 2 + 2"
    )
    
    # 2. Check that the right tool was called
    await session.assert_that(
        Expect.tools.was_called("calculate"),
        name="calculator_used"
    )
    
    # 3. Verify the response content
    await session.assert_that(
        Expect.content.contains("4"),
        name="correct_answer",
        response=response
    )
    
    # 4. Check efficiency (optional)
    await session.assert_that(
        Expect.performance.max_iterations(3),
        name="completed_efficiently"
    )

@task("Test error handling")
async def test_error_recovery(agent, session):
    """Verify graceful error handling."""
    
    response = await agent.generate_str(
        "Try to divide by zero, then recover"
    )
    
    # Use LLM judge for complex behavior
    await session.assert_that(
        Expect.judge.llm(
            rubric="Agent should handle error gracefully and provide helpful response",
            min_score=0.8
        ),
        name="error_handling_quality",
        response=response
    )
Run your new test:
mcp-eval run tests/test_my_server.py -v --html reports/my_test.html

Troubleshooting and going further

If something doesn’t work, start with the tips above: confirm your server is configured (mcp-eval server list), validate connectivity (mcp-eval validate), and read the failed-assertion details in the report. Ready to become an mcp-eval expert? The rest of the documentation is your learning path, and it’s also the place to go when you need help beyond this quickstart.

Congratulations! 🎉 You’ve successfully set up mcp-eval and run your first tests. You’re now ready to ensure your MCP servers and agents work flawlessly. Happy testing!