🚀 Welcome to mcp-eval! You’re about to supercharge your MCP development with powerful testing capabilities. This guide will have you testing MCP servers and agents in just 5 minutes!

What you’ll learn

By the end of this quickstart, you’ll be able to:
  • ✅ Install and configure mcp-eval for your project
  • ✅ Connect your MCP servers for testing
  • ✅ Write and run your first test
  • ✅ Understand test reports and iterate on failures
  • ✅ Choose the right testing style for your needs
Time to complete: ~5 minutes

Before you begin

Let’s make sure you have everything ready:

System requirements

  • Python 3.10+: required for running mcp-eval
  • MCP server: any MCP-compatible server to test
  • API key: a Claude or OpenAI key for LLM features
New to MCP? No worries! Check out the MCP documentation to understand the basics of Model Context Protocol servers. You’ll be testing them like a pro in no time!

Your 5-minute journey to testing mastery

1. Install `mcp-eval` and configure API keys

First, let’s get mcp-eval installed for your project. We recommend using uv to install mcp-eval as a global tool:
uv tool install mcpevals
This makes the mcp-eval CLI available globally on your system.
Language-agnostic testing: mcp-eval can test MCP servers written in any language (Python, TypeScript, Go, Rust, Java, and more). As long as your server implements the MCP protocol, mcp-eval can test it!
Next, add mcp-eval as a dependency for your project:

Using uv in a project

uv add mcpevals
Alternatively, install with pip:
pip install mcpevals
Now set up your API key for the best experience:
# We recommend Claude for superior test generation and judging
export ANTHROPIC_API_KEY="sk-ant-..."

# Alternative: OpenAI
export OPENAI_API_KEY="sk-..."
Pro tip: Claude Sonnet or Opus models provide the best results for test generation and LLM judge evaluations!
2. Initialize your test project

Let’s set up your testing environment with our interactive wizard:
mcp-eval init
This friendly wizard will:
  • 🎯 Ask for your preferred LLM provider and model
  • 📝 Create mcpeval.yaml with your configuration
  • 🔐 Set up mcpeval.secrets.yaml for secure API key storage
  • 🤖 Help you define your first test agent
  • 🔧 Import any existing MCP servers
What happens during init:
? Select your LLM provider: Anthropic
? Select model: claude-3-5-sonnet-20241022
? Import servers from mcp.json? Yes
? Path to mcp.json: .cursor/mcp.json
✓ Found 2 servers: fetch, filesystem
? Create a default agent? Yes
? Agent name: TestBot
? Agent instruction: You test MCP servers thoroughly
✓ Configuration saved to mcpeval.yaml
✓ Secrets saved to mcpeval.secrets.yaml
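Wondering what those files contain? The snippet below is a rough sketch of the kind of thing you’ll find. The key names here are illustrative assumptions, not the authoritative schema, so open the generated mcpeval.yaml and mcpeval.secrets.yaml to see the real layout for your version.

# mcpeval.yaml (illustrative sketch only; your generated file is the source of truth)
provider: anthropic
model: claude-3-5-sonnet-20241022
agents:
  - name: TestBot
    instruction: You test MCP servers thoroughly

# mcpeval.secrets.yaml keeps keys out of version control, roughly:
# anthropic:
#   api_key: sk-ant-...

Keep mcpeval.secrets.yaml out of version control (add it to .gitignore) so API keys never end up in your repository.
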
3. Configure MCP servers

Before you can test an MCP server, you need to tell mcp-eval how to connect to it.
Connections work over any supported transport (stdio, websocket, sse, streamable_http). You can import server configurations from mcp.json or dxt files, or specify them interactively with the mcp-eval server add command.

Adding your MCP server

You have several ways to add a server to your configuration:
The easiest way - let mcp-eval guide you:
mcp-eval server add
This will prompt you for:
  • How to add (interactive, from-mcp-json, or from-dxt)
  • Server name (e.g., “fetch”)
  • Command to run (e.g., “uvx mcp-server-fetch”)
  • Any arguments or environment variables
Example interaction:
? How would you like to add the server? interactive
? Server name: fetch  
? Command: uvx mcp-server-fetch
? Add environment variables? No
✓ Added server 'fetch'

Common server examples

Here are some popular MCP servers you might want to test:
# Fetch server (web content)
uvx mcp-server-fetch
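The filesystem and git reference servers from the Model Context Protocol project are also popular targets. As a rough sketch of how several servers might sit together in mcpeval.yaml (the nesting and key names below are illustrative assumptions, and the launch commands are typical invocations for those servers; double-check both against your generated config and each server’s docs):

# Illustrative only; confirm the exact schema against your generated mcpeval.yaml
mcp:
  servers:
    fetch:
      command: uvx
      args: ["mcp-server-fetch"]
    filesystem:
      command: npx
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    git:
      command: uvx
      args: ["mcp-server-git", "--repository", "/path/to/repo"]

Whichever servers you pick, mcp-eval server add (or importing from mcp.json) will write the entries for you, so hand-editing is optional.
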
Verify your server configuration: after adding a server, check that it’s working:
# List all configured servers
mcp-eval server list

# Validate server connectivity
mcp-eval validate
4. Run your first test

Time for the exciting part - running your first test! We’ll use the included fetch server example to demonstrate.
Example structure: The examples assume you have the fetch server configured. If you’re testing a different server, you’ll need to adjust the test code accordingly.
First, make sure there’s an example test to run; if you used mcp-eval init, one may already be in place. Then run:
mcp-eval run examples/mcp_server_fetch/tests/test_decorator_style.py \
  -v \
  --markdown test-reports/results.md \
  --html test-reports/index.html
What’s happening:
  • 🏃 Running decorator-style tests from the example file
  • 📊 Verbose output (-v) shows test progress
  • 📝 Markdown report for documentation
  • 🌐 HTML report for interactive exploration
Expected output:
Running tests...
✓ test_basic_fetch_decorator - Test basic URL fetching [2.3s]
  ✓ fetch_tool_called: Tool 'fetch' was called
  ✓ contains_domain_text: Content contains "Example Domain"
  ✓ fetch_success_rate: Tool success rate 100%

✓ test_content_extraction_decorator - Test extraction quality [3.1s]
  ✓ fetch_called_for_extraction: Tool 'fetch' was called
  ✓ extraction_quality_assessment: LLM judge score 0.92

Results: 2 passed, 0 failed
Reports saved to test-reports/
5. Explore your test results

Open your shiny new test report to see the details:
# Open the HTML report in your browser
open test-reports/index.html

# Or view the markdown report
cat test-reports/results.md
Understanding the HTML report: the interactive report shows:
  • 📊 Overview dashboard - Pass/fail rates, performance metrics
  • 🔍 Test details - Each test with all assertions
  • 🛠️ Tool usage - What tools were called and when
  • 💭 LLM reasoning - The agent’s thought process
  • Performance - Response times and efficiency metrics
  • 🎯 Failed assertions - Detailed diffs and explanations
Common things to check:
  • Did the right tools get called?
  • Was the output accurate?
  • How efficient was the agent’s approach?
  • What was the LLM judge’s assessment?
Test failed? Don’t worry! Check the assertion details to understand why. Common issues:
  • Tool not found (check server configuration)
  • Content mismatch (adjust your assertions)
  • Timeout (increase timeout in config)

What’s next? Write your own test!

Now that you’ve run the example, let’s write your very first custom test:

Choose your testing style

The decorator style shown here is best for quick, readable tests:
from mcp_eval import task, Expect

@task("My first test")
async def test_my_server(agent, session):
    response = await agent.generate_str(
        "Use my tool to do something"
    )
    
    await session.assert_that(
        Expect.tools.was_called("my_tool"),
        response=response
    )

Your test file structure

Create a new test file tests/test_my_server.py:
"""Tests for my awesome MCP server."""

from mcp_eval import task, setup, Expect

@setup
def configure_tests():
    """Any setup needed before tests run."""
    print("🚀 Starting my server tests!")

@task("Test basic functionality")
async def test_basic_operation(agent, session):
    """Verify the server responds correctly to basic requests."""
    
    # 1. Send a prompt to the agent
    response = await agent.generate_str(
        "Please use the calculator to add 2 + 2"
    )
    
    # 2. Check that the right tool was called
    await session.assert_that(
        Expect.tools.was_called("calculate"),
        name="calculator_used"
    )
    
    # 3. Verify the response content
    await session.assert_that(
        Expect.content.contains("4"),
        name="correct_answer",
        response=response
    )
    
    # 4. Check efficiency (optional)
    await session.assert_that(
        Expect.performance.max_iterations(3),
        name="completed_efficiently"
    )

@task("Test error handling")
async def test_error_recovery(agent, session):
    """Verify graceful error handling."""
    
    response = await agent.generate_str(
        "Try to divide by zero, then recover"
    )
    
    # Use LLM judge for complex behavior
    await session.assert_that(
        Expect.judge.llm(
            rubric="Agent should handle error gracefully and provide helpful response",
            min_score=0.8
        ),
        name="error_handling_quality",
        response=response
    )
Run your new test:
mcp-eval run tests/test_my_server.py -v --html reports/my_test.html

Troubleshooting and going further

If something doesn’t work, start with the tips above: confirm your server is configured (mcp-eval server list), validate connectivity (mcp-eval validate), and read the failed-assertion details in the report. Ready to become an mcp-eval expert? The rest of the documentation is your learning path, and it’s also the place to go when you need help beyond this quickstart.

Congratulations! 🎉 You’ve successfully set up mcp-eval and run your first tests. You’re now ready to ensure your MCP servers and agents work flawlessly. Happy testing!