Master these core concepts to write effective tests with mcp-eval. Each concept builds on the previous ones to create a complete testing framework.
Overview
mcp-eval orchestrates interactions between three key components:
- Agents - AI models that can use tools
- MCP Servers - Tool providers implementing the Model Context Protocol
- Test Sessions - Orchestrators that manage execution and collect metrics
Think of it like a stage play: The agent is the actor, MCP servers provide the props and scenery, and the test session is the director capturing everything for review.
TestSession (single source of truth)
`TestSession` is the orchestrator that manages the entire test lifecycle. It configures OpenTelemetry tracing, runs the agent, collects spans, computes metrics, and saves artifacts.
Key responsibilities
- Trace management: Configures and captures OTEL traces
- Metrics extraction: Converts traces into actionable metrics (tool calls, latency, token usage, costs)
- Assertion coordination: Manages immediate and deferred assertion evaluation
- Report generation: Creates JSON, HTML, and Markdown reports
Metrics derived from traces
From the OTEL traces, TestSession extracts:
- Tool invocation details (names, arguments, outputs, timing)
- Iteration counts and conversation turns
- Token usage and estimated costs
- Performance breakdowns (LLM time vs tool time)
- Error patterns and recovery sequences
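For orientation, here is a minimal sketch of reading those derived metrics at the end of a test. The `get_metrics()` call and the attribute names shown are assumptions about the session API and may differ in your installed version.

```python
# Hedged sketch: inspecting metrics the session derives from OTEL traces.
# get_metrics() and the attribute names below are assumptions, not confirmed API.
async def test_metrics_snapshot(agent, session):
    await agent.generate_str("Fetch https://example.com and summarize it")

    metrics = session.get_metrics()                       # assumed accessor
    tool_names = [call.name for call in metrics.tool_calls]  # invocation details
    print(tool_names, metrics.iteration_count)            # turns / iterations
    print(metrics.total_tokens, metrics.estimated_cost)   # usage and cost estimate
```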
TestAgent
`TestAgent` is a wrapper around the runtime agent that provides testing-specific functionality and connects assertions to the session.
Key features
- Simplified API: `generate_str()` for string responses
- Direct assertion access: `agent.assert_that()` shortcut
- Session integration: Automatically connected to TestSession's metrics
Unified assertion API
mcp-eval uses a single, discoverable API pattern for all assertions; a sketch follows the category list below.
Immediate vs deferred assertions
Understanding assertion timing is crucial for debugging test failures.
Assertion categories include:
- Content checks (`contains`, `regex`)
- LLM judges (quality evaluation)
- Tool usage (`was_called`, `count`, `sequence`)
- Performance (`response_time_under`, `max_iterations`)
- Path efficiency analysis
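Here is the sketch referenced above: one `assert_that()` entry point for every category, with content checks evaluated immediately against the response and tool-usage/performance checks deferred until the session closes and the full trace is available. The `Expect` catalog paths are assumptions based on the category names above.

```python
from mcp_eval import Expect  # import path is an assumption

async def test_unified_assertions(agent):
    response = await agent.generate_str("Fetch https://example.com")

    # Immediate: evaluated right away against the returned text
    await agent.assert_that(
        Expect.content.contains("Example Domain"), response=response
    )

    # Deferred: recorded now, evaluated from the trace when the session ends
    await agent.assert_that(Expect.tools.was_called("fetch"))
    await agent.assert_that(Expect.performance.max_iterations(3))
```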
Test styles
mcp-eval supports three testing approaches to fit different workflows.
Decorator style
Simple and expressive for quick tests: `@task`, `@setup`, `@teardown`, `@parametrize`.
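A decorator-style sketch using the decorators above; the `mcp_eval` import path, the `@task` description argument, the decorator ordering, and the injected `agent` parameter are assumptions.

```python
from mcp_eval import Expect, parametrize, setup, task  # import path is an assumption

@setup
def configure():
    # Run-level setup (e.g., selecting servers/agents); body is illustrative
    pass

@task("Agent fetches a URL and reports its title")
@parametrize("url", ["https://example.com", "https://example.org"])
async def test_fetch_title(agent, url):
    response = await agent.generate_str(f"Fetch {url} and tell me the page title")
    await agent.assert_that(Expect.content.contains("Example"), response=response)
    await agent.assert_that(Expect.tools.was_called("fetch"))
```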
Pytest integration
Familiar for teams already using pytest, with fixtures (`mcp_session`, `mcp_agent`) and markers (`@pytest.mark.mcp_agent`).
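A pytest-flavored sketch with the fixture and marker named above; the fixture's behavior, the marker's arguments (if any), and the async plugin in use are assumptions about your setup.

```python
import pytest

from mcp_eval import Expect  # import path is an assumption

@pytest.mark.asyncio            # or whichever async plugin your project uses
@pytest.mark.mcp_agent          # marker named above; arguments depend on your config
async def test_fetch_with_pytest(mcp_agent):
    response = await mcp_agent.generate_str("Fetch https://example.com")
    await mcp_agent.assert_that(
        Expect.content.contains("Example Domain"), response=response
    )
```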
Dataset style
Systematic evaluation with test matrices, as sketched below.
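A dataset-style sketch; the `Case`/`Dataset` class names, their fields, and the `evaluate()` task-function signature are assumptions about the dataset API.

```python
from mcp_eval import Case, Dataset, Expect  # names and import path are assumptions

dataset = Dataset(
    name="Fetch server smoke tests",
    cases=[
        Case(
            name="fetch_example",
            inputs="Fetch https://example.com and summarize it",
            evaluators=[Expect.content.contains("Example Domain")],
        ),
        Case(
            name="fetch_missing_page",
            inputs="Fetch https://example.com/does-not-exist",
            evaluators=[Expect.tools.was_called("fetch")],
        ),
    ],
)

async def fetch_task(inputs, agent, session):
    # Task-function signature expected by evaluate() is an assumption
    return await agent.generate_str(inputs)

# report = await dataset.evaluate(fetch_task)
```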
LLM judges
LLM-based evaluation for subjective quality assessment.
Single criterion
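A sketch of a single-criterion judge check; `Expect.judge.llm_judge`, its rubric/threshold parameters, and the scoring scale are assumptions.

```python
from mcp_eval import Expect  # import path is an assumption

async def test_summary_quality(agent):
    response = await agent.generate_str("Fetch https://example.com and summarize it")
    # Single criterion: one rubric with a minimum score (names are assumptions)
    await agent.assert_that(
        Expect.judge.llm_judge(
            rubric="The response accurately summarizes the fetched page",
            min_score=0.8,
        ),
        response=response,
    )
```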
Multi-criteria evaluation
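And a multi-criteria sketch; the `multi_criteria` name, the per-criterion mapping, and the aggregate threshold are likewise assumptions.

```python
from mcp_eval import Expect  # import path is an assumption

async def test_summary_multi_criteria(agent):
    response = await agent.generate_str("Fetch https://example.com and summarize it")
    await agent.assert_that(
        Expect.judge.multi_criteria(   # name is an assumption
            criteria={
                "accuracy": "The summary reflects the actual page content",
                "completeness": "Key sections of the page are mentioned",
                "clarity": "The summary is concise and well organized",
            },
            min_score=0.75,            # aggregate threshold; scale is an assumption
        ),
        response=response,
    )
```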
Use `MCPEvalSettings` for judge model/provider defaults.