
# mcp-eval Documentation

> The comprehensive testing framework for MCP servers and tool-using agents.

<Note>
  **Your flight simulator for MCP servers and agents.** Connect agents to real MCP servers, run realistic scenarios, and measure tool calls, latency, token usage, and cost.
</Note>

<Info>
  [Model Context Protocol](https://modelcontextprotocol.io/docs/getting-started/intro) (MCP) standardizes how applications provide context to large language models (LLMs). Think of MCP like a USB-C port for AI applications.

  **`mcp-eval`** ensures your MCP servers, and agents built with them, work reliably in production.
</Info>

## What `mcp-eval` Does for You

<Columns cols={2}>
  <Card title="Test MCP Servers" icon="server">
    Ensure your MCP servers respond correctly to agent requests and handle edge cases gracefully
  </Card>

  <Card title="Evaluate Agents" icon="robot">
    Measure how effectively agents use tools, follow instructions, and recover from errors
  </Card>

  <Card title="Track Performance" icon="chart-line">
    Monitor latency, token usage, cost, and success rates with OpenTelemetry-backed metrics
  </Card>

  <Card title="Assert Quality" icon="circle-check">
    Use structural checks, LLM judges, and path efficiency validators to ensure high quality
  </Card>
</Columns>
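
To make these metrics concrete, here is a minimal, framework-independent sketch of how a success rate, an average latency, and a path-efficiency score could be derived from a recorded list of tool calls. The `ToolCall` record, the `summarize` helper, and the efficiency formula (expected steps divided by actual steps) are illustrative assumptions, not `mcp-eval`'s API or its actual metric definitions.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation (not an mcp-eval type)."""
    name: str
    latency_ms: float
    ok: bool


def summarize(calls: list[ToolCall], expected_path: list[str]) -> dict[str, float]:
    """Compute simple aggregate metrics over a non-empty list of tool calls."""
    if not calls:
        raise ValueError("need at least one tool call")
    success_rate = sum(c.ok for c in calls) / len(calls)
    avg_latency_ms = sum(c.latency_ms for c in calls) / len(calls)
    # Assumed definition: ideal number of steps over actual steps, capped at 1.0.
    path_efficiency = min(1.0, len(expected_path) / len(calls))
    return {
        "success_rate": success_rate,
        "avg_latency_ms": avg_latency_ms,
        "path_efficiency": path_efficiency,
    }


calls = [
    ToolCall("fetch", 120.0, True),
    ToolCall("fetch", 80.0, False),  # a failed attempt the agent had to retry
    ToolCall("fetch", 100.0, True),
    ToolCall("summarize", 100.0, True),
]
metrics = summarize(calls, expected_path=["fetch", "summarize"])
print(metrics)
```

In this sketch the agent needed four calls for a task that ideally takes two, so the path-efficiency score is 0.5 even though the task eventually succeeded.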

## Get Started in 30 Seconds

We recommend using [uv](https://docs.astral.sh/uv/):

<CodeGroup>
  ```bash uv (recommended) theme={null}
  # Install mcp-eval globally (for CLI)
  uv tool install mcpevals

  # Add mcp-eval dependency to your project
  uv add mcpevals

  # Initialize your project (interactive setup)
  mcp-eval init

  # Add your MCP server to test
  mcp-eval server add

  # Auto-generate tests with an LLM
  mcp-eval generate

  # Run decorator/dataset tests
  mcp-eval run tests/

  # Run pytest-style tests with pytest
  uv run pytest -q tests
  ```

  ```bash pip theme={null}
  # Install mcp-eval
  pip install mcpevals

  # Initialize your project
  mcp-eval init

  # Add your MCP server
  mcp-eval server add

  # Run decorator/dataset tests
  mcp-eval run tests/

  # Run pytest-style tests with pytest
  pytest -q tests
  ```
</CodeGroup>

<Info>
  **Test any MCP server:** Your MCP server can be written in any language: Python, TypeScript, Go, Rust, Java, or anything else. As long as it implements MCP, `mcp-eval` can test it!
</Info>

<Check>You're ready to start testing! [Continue with the Quickstart →](./quickstart)</Check>

## 🎮 Choose Your Testing Adventure

What are you evaluating today?

<Tabs>
  <Tab title="I'm testing an MCP Server" icon="server">
    **You built an MCP server** (in any language!) and want to ensure it handles agent requests correctly.

    <Card title="Perfect! Let's test your server" icon="server">
      mcp-eval will spin up an AI agent to test your server with realistic requests, edge cases, and error scenarios.

      **Your server could be:**

      * A streamable HTTP database connector
      * An SSE API wrapper
      * A stdio file system server
      * Any server that speaks MCP!
    </Card>

    <Columns cols={2}>
      <Card title="Start Here" icon="rocket" href="./server-evaluation">
        MCP Server Testing Guide
      </Card>

      <Card title="See Examples" icon="code" href="./examples-mcp-server-fetch">
        Testing the Fetch Server
      </Card>
    </Columns>
  </Tab>

  <Tab title="I'm testing an Agent" icon="robot">
    **You built an AI agent** that uses MCP servers and want to ensure it uses tools effectively.

    <Card title="Perfect! Let's evaluate your agent" icon="robot">
      mcp-eval will connect your agent to MCP servers and verify it uses tools correctly, handles errors, and meets performance targets.

      **Your agent could be:**

      * A customer service bot
      * A coding assistant
      * A deep research agent
      * Any MCP agent!
    </Card>

    <Columns cols={2}>
      <Card title="Start Here" icon="rocket" href="./agent-evaluation">
        Agent Evaluation Guide
      </Card>

      <Card title="Learn Patterns" icon="graduation-cap" href="./common-workflows">
        Common Testing Patterns
      </Card>
    </Columns>
  </Tab>

  <Tab title="I want both!" icon="sparkles">
    **You're building a complete system** with both MCP servers and agents.

    <Card title="Awesome! Test the whole stack" icon="sparkles">
      `mcp-eval` can test your entire integration, ensuring that servers handle requests correctly and that agents use tools effectively.
    </Card>

    <Columns cols={3}>
      <Card title="Quick Start" icon="rocket" href="./quickstart">
        5-minute setup
      </Card>

      <Card title="Concepts" icon="lightbulb" href="./concepts">
        Core concepts
      </Card>

      <Card title="Examples" icon="code" href="./examples">
        Browse all examples
      </Card>
    </Columns>
  </Tab>
</Tabs>

## Why Teams Choose `mcp-eval`

* **Production-ready**: Built on OpenTelemetry for enterprise-grade observability
* **Multiple test styles**: Choose between decorators, pytest, or dataset-driven testing
* **Rich assertions**: Content checks, tool verification, performance gates, and LLM judges
* **CI/CD friendly**: GitHub Actions support, JSON/HTML reports, and regression detection
* **Language agnostic**: Test MCP servers written in any language
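
As an illustration of what regression detection in CI can look like, the sketch below compares a baseline set of metrics against the current run and flags any metric that got worse by more than a relative tolerance. The metric names, the `find_regressions` helper, and the 10% default tolerance are hypothetical; this is not `mcp-eval`'s report format or its built-in regression logic.

```python
# Direction of "better" for each hypothetical metric name.
HIGHER_IS_BETTER = {
    "success_rate": True,
    "avg_latency_ms": False,
    "cost_usd": False,
}


def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.10,
) -> list[str]:
    """Return the metrics that worsened by more than `tolerance` (relative)."""
    regressions = []
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        if name not in baseline or name not in current:
            continue  # skip metrics missing from either run
        old, new = baseline[name], current[name]
        if old == 0:
            continue  # relative change is undefined for a zero baseline
        change = (new - old) / abs(old)
        # Flip the sign so that a positive value always means "got worse".
        worse_by = -change if higher_is_better else change
        if worse_by > tolerance:
            regressions.append(name)
    return regressions


baseline = {"success_rate": 0.95, "avg_latency_ms": 800.0, "cost_usd": 0.020}
current = {"success_rate": 0.80, "avg_latency_ms": 820.0, "cost_usd": 0.021}
regressed = find_regressions(baseline, current)
print(regressed)
```

A CI job could exit with a non-zero status whenever the returned list is non-empty, which turns the comparison into a gate on the pull request.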

## Quick Navigation

<Columns cols={3}>
  <Card title="Quickstart" icon="rocket" href="./quickstart">
    Get up and running in 5 minutes
  </Card>

  <Card title="Common Workflows" icon="graduation-cap" href="./common-workflows">
    Step-by-step guides for typical tasks
  </Card>

  <Card title="API Reference" icon="code" href="./api-catalog">
    Complete assertion catalog and APIs
  </Card>
</Columns>

## Learning Path

<Tabs>
  <Tab title="Getting Started">
    <Columns cols={1}>
      <Card title="Overview" icon="map" href="./overview">
        Understand `mcp-eval`'s architecture and philosophy
      </Card>

      <Card title="Quickstart" icon="rocket" href="./quickstart">
        Your first test in 5 minutes
      </Card>

      <Card title="Concepts" icon="lightbulb" href="./concepts">
        Core concepts and terminology
      </Card>
    </Columns>
  </Tab>

  <Tab title="Writing Tests">
    <Columns cols={1}>
      <Card title="Assertions" icon="check" href="./assertions">
        The unified Expect API for all assertions
      </Card>

      <Card title="Common Workflows" icon="route" href="./common-workflows">
        Practical testing patterns
      </Card>

      <Card title="Test Generation" icon="wand-magic-sparkles" href="./test-generation">
        AI-powered test creation
      </Card>
    </Columns>
  </Tab>

  <Tab title="Evaluation Types">
    <Columns cols={1}>
      <Card title="Server Evaluation" icon="server" href="./server-evaluation">
        Testing MCP server implementations
      </Card>

      <Card title="Agent Evaluation" icon="robot" href="./agent-evaluation">
        Measuring agent effectiveness
      </Card>

      <Card title="Datasets" icon="database" href="./datasets">
        Systematic evaluation suites
      </Card>
    </Columns>
  </Tab>

  <Tab title="Configuration">
    <Columns cols={1}>
      <Card title="Configuration" icon="gear" href="./configuration">
        Settings and customization
      </Card>

      <Card title="CI/CD" icon="circle-play" href="./ci-cd">
        GitHub Actions and automation
      </Card>

      <Card title="Reports" icon="chart-bar" href="./reports">
        Understanding test outputs
      </Card>
    </Columns>
  </Tab>

  <Tab title="Reference">
    <Columns cols={1}>
      <Card title="CLI Reference" icon="terminal" href="./cli-reference">
        Complete command documentation
      </Card>

      <Card title="API Reference" icon="code" href="./api-catalog">
        Detailed API documentation
      </Card>

      <Card title="Troubleshooting" icon="wrench" href="./troubleshooting">
        Common issues and solutions
      </Card>

      <Card title="FAQ" icon="question-circle" href="./faq">
        Frequently asked questions
      </Card>
    </Columns>
  </Tab>
</Tabs>

## Example: Your First Test

<CodeGroup>
  ```python test_fetch.py theme={null}
  from mcp_eval import task, Expect

  @task("Verify fetch server works correctly")
  async def test_fetch(agent, session):
      # Ask the agent to fetch a webpage
      response = await agent.generate_str("Fetch https://example.com and summarize it")
      
      # Assert the right tool was called
      await session.assert_that(Expect.tools.was_called("fetch"))
      
      # Verify the content is correct
      await session.assert_that(Expect.content.contains("Example Domain"), response=response)
      
      # Check performance
      await session.assert_that(Expect.performance.response_time_under(5000))
  ```

  ```python pytest_style.py theme={null}
  import pytest
  from mcp_eval import create_agent, Expect

  @pytest.mark.asyncio
  async def test_fetch_with_pytest():
      agent = await create_agent("claude-3-5-sonnet")
      response = await agent.generate_str("Fetch https://example.com")
      
      assert "Example Domain" in response
      assert agent.tools_called == ["fetch"]
  ```
</CodeGroup>

<Tip>[See more examples →](./examples)</Tip>

## Join the Community

<Columns cols={2}>
  <Card title="GitHub" icon="github" href="https://github.com/lastmile-ai/mcp-eval">
    Report issues and contribute
  </Card>

  <Card title="Discord" icon="discord" href="https://lmai.link/discord/mcp-eval">
    Get help and share experiences
  </Card>
</Columns>
