# Configuration File Structure

## Complete mcpeval.yaml Reference
```yaml
# Schema reference (optional but recommended)
$schema: ../../schema/mcpeval.config.schema.json

# Project metadata
name: "My MCP Test Suite"
description: "Comprehensive testing for my MCP servers"

# Provider configuration (global defaults)
provider: anthropic                # anthropic|openai|google|bedrock|vertex_ai|azure_ai
model: claude-3-5-sonnet-20241022  # Or gpt-4-turbo-preview, gemini-pro, etc.

# MCP Server Configuration (REQUIRED for mcp-agent)
mcp:
  servers:
    # Each server needs a unique name
    my_server:
      command: "python"               # Command to run
      args: ["server.py", "--debug"]  # Arguments
      env:                            # Environment variables
        LOG_LEVEL: "info"
        API_KEY: "${MY_API_KEY}"      # Can reference env vars
      cwd: "./servers"                # Working directory (optional)
    fetch:
      command: "uvx"
      args: ["mcp-server-fetch"]
      env:
        UV_NO_PROGRESS: "1"
    filesystem:
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]

# Agent definitions
agents:
  definitions:
    - name: "default"
      instruction: |
        You are a helpful test agent.
        Be thorough in your responses.
        Handle errors gracefully.
      server_names: ["my_server", "fetch"]  # Which servers this agent can use
      max_iterations: 5
      model: claude-3-5-sonnet-20241022     # Override the global model
      provider: anthropic                   # Override the global provider
    - name: "advanced"
      instruction: |
        You are an expert agent with advanced capabilities.
        Use tools efficiently and minimize token usage.
      server_names: ["my_server", "fetch", "filesystem"]
      max_iterations: 10
      temperature: 0.7
      max_tokens: 4000

# Default agent for tests
default_agent: "default"  # References an agent definition above

# Judge configuration
judge:
  provider: anthropic                # Can differ from the main provider
  model: claude-3-5-sonnet-20241022  # Or use an Opus model for the highest-quality judging
  min_score: 0.8
  max_tokens: 2000
  temperature: 0.0                   # Deterministic judging
  system_prompt: |
    You are an expert evaluator of AI assistant responses.
    Consider accuracy, completeness, and appropriate tool usage.
    Be fair but thorough in your evaluation.

# Metrics collection
metrics:
  collect:
    - "response_time"
    - "tool_coverage"
    - "iteration_count"
    - "token_usage"
    - "cost_estimate"
    - "error_recovery"
    - "parallel_tool_calls"
    - "cache_hits"

# Reporting configuration
reporting:
  formats: ["json", "markdown", "html"]  # Output formats
  output_dir: "./test-reports"           # Where to save reports
  include_traces: true                   # Include OTEL traces
  include_conversation: true             # Include the full conversation
  timestamp_format: "%Y%m%d_%H%M%S"      # Report timestamp format

# Execution configuration
execution:
  max_concurrency: 5    # Parallel test execution
  timeout_seconds: 300  # Global timeout
  retry_failed: false   # Retry failed tests
  fail_fast: false      # Stop on first failure
  verbose: true         # Verbose output

# Test-specific servers (if not using agents)
default_servers: ["my_server"]  # Fallback if an agent doesn't specify servers

# Logger settings (from mcp-agent)
logger:
  type: console  # console|file|none
  level: info    # debug|info|warning|error
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  file_path: "./logs/mcp-eval.log"  # Used if type is file
```
## Secrets Configuration (mcpeval.secrets.yaml)
```yaml
# This file should be gitignored!
anthropic:
  api_key: "sk-ant-..."
  base_url: "https://api.anthropic.com"  # Optional

openai:
  api_key: "sk-..."
  org_id: "org-..."                      # Optional
  base_url: "https://api.openai.com/v1"  # Optional

google:
  api_key: "..."

# Server-specific secrets
servers:
  my_server:
    env:
      SECRET_TOKEN: "..."
      DATABASE_URL: "postgresql://..."
```
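The secrets file mirrors the structure of `mcpeval.yaml`. A reasonable mental model, assumed here based on the config/secrets convention in mcp-agent, is a deep merge with secrets layered over the main config, so a server's effective environment combines both files:

```yaml
# Effective env for my_server (assumes deep-merge of secrets over mcpeval.yaml)
LOG_LEVEL: "info"                 # from mcpeval.yaml
API_KEY: "${MY_API_KEY}"          # from mcpeval.yaml
SECRET_TOKEN: "..."               # from mcpeval.secrets.yaml
DATABASE_URL: "postgresql://..."  # from mcpeval.secrets.yaml
```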
## Environment Variables

### Provider API Keys
```bash
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
export ANTHROPIC_BASE_URL="https://api.anthropic.com"

# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_ORG_ID="org-..."

# Google
export GOOGLE_API_KEY="..."

# Azure
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://....openai.azure.com"
```
### mcp-eval Specific
```bash
# Override provider/model
export MCPEVAL_PROVIDER="openai"
export MCPEVAL_MODEL="gpt-4-turbo-preview"

# Execution settings
export MCPEVAL_TIMEOUT_SECONDS="600"
export MCPEVAL_MAX_CONCURRENCY="10"
export MCPEVAL_OUTPUT_DIR="/tmp/test-reports"

# Debug settings
export MCPEVAL_VERBOSE="true"
export MCPEVAL_DEBUG="true"
```
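The `MCPEVAL_*` names suggest a direct mapping onto config keys. Assuming that mapping holds, the overrides above correspond to the following `mcpeval.yaml` settings:

```yaml
# Assumed YAML equivalents of the MCPEVAL_* overrides above
provider: openai
model: gpt-4-turbo-preview
execution:
  timeout_seconds: 600
  max_concurrency: 10
reporting:
  output_dir: "/tmp/test-reports"
```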
## Configuration Patterns

### Pattern 1: Multi-Provider Setup
```yaml
# Use different providers for different purposes
provider: openai  # Fast/cheap for tests
model: gpt-4o-mini

judge:
  provider: anthropic  # High quality for judging
  model: claude-3-opus-20240229

agents:
  definitions:
    - name: "fast"
      provider: openai
      model: gpt-4o-mini
    - name: "accurate"
      provider: anthropic
      model: claude-3-5-sonnet-20241022
```
### Pattern 2: Environment-Specific Configs
```yaml
# Development config (mcpeval.dev.yaml)
provider: openai
model: gpt-4o-mini  # Cheap for development
execution:
  timeout_seconds: 60
  verbose: true
---
# Production config (mcpeval.prod.yaml)
provider: anthropic
model: claude-3-opus-20240229  # Best quality
execution:
  timeout_seconds: 600
  max_concurrency: 10
  retry_failed: true
```

Select the active file at runtime with the `MCPEVAL_CONFIG` environment variable (see Environment-Specific Loading below).
### Pattern 3: Server Groups
```yaml
mcp:
  servers:
    # Data servers
    database:
      command: "python"
      args: ["db_server.py"]
    cache:
      command: "node"
      args: ["cache_server.js"]
    # Utility servers
    fetch:
      command: "uvx"
      args: ["mcp-server-fetch"]
    filesystem:
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-filesystem"]

agents:
  definitions:
    - name: "data_agent"
      server_names: ["database", "cache"]
    - name: "utility_agent"
      server_names: ["fetch", "filesystem"]
    - name: "full_agent"
      server_names: ["database", "cache", "fetch", "filesystem"]
```
## Initialization Commands

### Quick Setup
```bash
# Initialize with the wizard
mcp-eval init

# This creates:
# - mcpeval.yaml (main config)
# - mcpeval.secrets.yaml (API keys)
```
### Import Existing Configurations
```bash
# From a Cursor/VS Code MCP config
mcp-eval server add --from-mcp-json .cursor/mcp.json

# From a DXT manifest
mcp-eval server add --from-dxt ~/Desktop/my-server.dxt

# Add a server interactively
mcp-eval server add
```
### Validate Configuration
```bash
# Check configuration validity
mcp-eval validate

# Full system check
mcp-eval doctor --full

# List configured servers
mcp-eval server list --verbose

# Test server connectivity
mcp-eval server test my_server
```
## Common Configuration Issues

### Issue 1: Server Not Found
```yaml
# Wrong - server not in mcp.servers
servers:
  my_server:
    command: "python"

# Correct - must be under mcp.servers
mcp:
  servers:
    my_server:
      command: "python"
```
### Issue 2: Agent Missing Servers
```yaml
# Wrong - agent has no servers
agents:
  definitions:
    - name: "test"
      instruction: "Test agent"
      # Missing server_names!

# Correct
agents:
  definitions:
    - name: "test"
      instruction: "Test agent"
      server_names: ["my_server"]
```
### Issue 3: Invalid Model Names
```yaml
# Wrong - incomplete model name
model: claude-3-sonnet

# Correct - full model identifier
model: claude-3-5-sonnet-20241022

# Model name reference:
# Anthropic: claude-3-5-sonnet-20241022, claude-3-opus-20240229
# OpenAI: gpt-4-turbo-preview, gpt-4o, gpt-4o-mini
# Google: gemini-pro, gemini-1.5-pro
```
### Issue 4: Environment Variable Issues
```yaml
# Wrong - secret hardcoded in the config
mcp:
  servers:
    my_server:
      env:
        API_KEY: "sk-actual-key-here"  # Never commit!

# Correct - use an env var reference
# (export MY_SERVER_API_KEY in your shell before running)
mcp:
  servers:
    my_server:
      env:
        API_KEY: "${MY_SERVER_API_KEY}"

# Or use the secrets file (mcpeval.secrets.yaml):
servers:
  my_server:
    env:
      API_KEY: "sk-actual-key-here"
```
## Configuration Best Practices

### 1. Use Schema Validation
```yaml
# Add a schema reference for IDE support
$schema: ../../schema/mcpeval.config.schema.json
```
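If your editor's YAML plugin doesn't honor the `$schema` key, the yaml-language-server modeline comment is a widely supported alternative (the path here is illustrative; point it at your local schema copy):

```yaml
# yaml-language-server: $schema=../../schema/mcpeval.config.schema.json
```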
### 2. Separate Secrets
```gitignore
# .gitignore
mcpeval.secrets.yaml
.env
*.secret
```
### 3. Use Descriptive Names
```yaml
agents:
  definitions:
    - name: "web_scraper"   # Clear purpose
      instruction: "..."
    - name: "data_analyst"  # Clear purpose
      instruction: "..."

# Not: "agent1", "test", "tmp"
```
### 4. Document Complex Configs
```yaml
judge:
  # Use Opus for the highest-quality evaluation; the higher
  # cost is justified because judging is critical
  provider: anthropic
  model: claude-3-opus-20240229
  # Deterministic for reproducibility
  temperature: 0.0
  # Higher token limit for detailed reasoning
  max_tokens: 3000
```
### 5. Use Provider Defaults
```yaml
# Instead of repeating provider/model everywhere
provider: anthropic
model: claude-3-5-sonnet-20241022

# Agents inherit these unless overridden
agents:
  definitions:
    - name: "default"  # Uses global provider/model
      instruction: "..."
    - name: "fast"
      provider: openai  # Override for this agent
      model: gpt-4o-mini
```
## Advanced Configuration

### Programmatic Configuration
```python
from mcp_eval.config import MCPEvalSettings, update_config

# Load and modify
settings = MCPEvalSettings(
    provider="anthropic",
    model="claude-3-5-sonnet-20241022",
    execution={"timeout_seconds": 600},
)

# Update at runtime
update_config({
    "execution": {"max_concurrency": 10},
    "reporting": {"output_dir": "/custom/path"},
})
```
### Configuration Discovery
```python
from pathlib import Path

from mcp_eval.config import find_eval_config, find_eval_secrets

# Find config files (searches up the directory tree)
config_path = find_eval_config()
secrets_path = find_eval_secrets()

# Typical lookup precedence when resolving a config manually
config_locations = [
    Path.cwd() / "mcpeval.yaml",
    Path.cwd() / ".mcp-eval" / "config.yaml",
    Path.home() / ".mcp-eval" / "config.yaml",
]
config_path = next((p for p in config_locations if p.exists()), None)
```
### Environment-Specific Loading
```bash
# Use different configs per environment
if [ "$ENV" = "production" ]; then
  export MCPEVAL_CONFIG="mcpeval.prod.yaml"
else
  export MCPEVAL_CONFIG="mcpeval.dev.yaml"
fi

mcp-eval run tests/
```
## Config Templates

### Minimal Config
```yaml
provider: anthropic
model: claude-3-5-sonnet-20241022

mcp:
  servers:
    my_server:
      command: "python"
      args: ["server.py"]
```
### Full-Featured Config

Copy the complete reference above and customize it.

### CI/CD Config
```yaml
provider: anthropic
model: claude-3-5-sonnet-20241022

execution:
  max_concurrency: 10
  timeout_seconds: 300
  fail_fast: true  # Stop on first failure in CI

reporting:
  formats: ["json", "junit"]  # Machine-readable
  output_dir: "./ci-reports"

logger:
  type: file
  level: debug  # Full logs for debugging
  file_path: "./ci-logs/mcp-eval.log"
```
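A CI config like this pairs with a workflow step that runs the suite. Below is a minimal GitHub Actions sketch; the pip package name and secret name are assumptions, so adjust them to your project:

```yaml
# .github/workflows/mcp-eval.yml - illustrative sketch, not an official template
name: mcp-eval
on: [push, pull_request]

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Package name assumed; install mcp-eval however your project provides it
      - run: pip install mcp-eval
      - run: mcp-eval run tests/
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```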