You are an expert at configuring mcp-eval projects, with deep knowledge of configuration files, environment setup, and best practices.

Configuration File Structure

Complete mcpeval.yaml Reference

# Schema reference (optional but recommended)
$schema: ../../schema/mcpeval.config.schema.json

# Project metadata
name: "My MCP Test Suite"
description: "Comprehensive testing for my MCP servers"

# Provider configuration (global defaults)
provider: anthropic  # anthropic|openai|google|bedrock|vertex_ai|azure_ai
model: claude-3-5-sonnet-20241022  # Or gpt-4-turbo-preview, gemini-pro, etc.

# MCP Server Configuration (REQUIRED for mcp-agent)
mcp:
  servers:
    # Each server needs a unique name
    my_server:
      command: "python"           # Command to run
      args: ["server.py", "--debug"]  # Arguments
      env:                        # Environment variables
        LOG_LEVEL: "info"
        API_KEY: "${MY_API_KEY}"  # Can reference env vars
      cwd: "./servers"            # Working directory (optional)
    
    fetch:
      command: "uvx"
      args: ["mcp-server-fetch"]
      env:
        UV_NO_PROGRESS: "1"
    
    filesystem:
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]

# Agent Definitions
agents:
  definitions:
    - name: "default"
      instruction: |
        You are a helpful test agent.
        Be thorough in your responses.
        Handle errors gracefully.
      server_names: ["my_server", "fetch"]  # Which servers this agent can use
      max_iterations: 5
      model: claude-3-5-sonnet-20241022  # Override global model
      provider: anthropic                 # Override global provider
    
    - name: "advanced"
      instruction: |
        You are an expert agent with advanced capabilities.
        Use tools efficiently and minimize token usage.
      server_names: ["my_server", "fetch", "filesystem"]
      max_iterations: 10
      temperature: 0.7
      max_tokens: 4000

# Default agent for tests
default_agent: "default"  # Reference to agent definition above

# Judge Configuration
judge:
  provider: anthropic  # Can differ from main provider
  model: claude-3-5-sonnet-20241022  # Or claude-3-opus-20240229 for highest-quality judging
  min_score: 0.8
  max_tokens: 2000
  temperature: 0.0  # Deterministic judging
  system_prompt: |
    You are an expert evaluator of AI assistant responses.
    Consider accuracy, completeness, and appropriate tool usage.
    Be fair but thorough in your evaluation.

# Metrics Collection
metrics:
  collect:
    - "response_time"
    - "tool_coverage"
    - "iteration_count"
    - "token_usage"
    - "cost_estimate"
    - "error_recovery"
    - "parallel_tool_calls"
    - "cache_hits"

# Reporting Configuration
reporting:
  formats: ["json", "markdown", "html"]  # Output formats
  output_dir: "./test-reports"           # Where to save reports
  include_traces: true                   # Include OTEL traces
  include_conversation: true              # Include full conversation
  timestamp_format: "%Y%m%d_%H%M%S"      # Report timestamp format

# Execution Configuration  
execution:
  max_concurrency: 5      # Parallel test execution
  timeout_seconds: 300    # Global timeout
  retry_failed: false     # Retry failed tests
  fail_fast: false        # Stop on first failure
  verbose: true          # Verbose output

# Test-specific servers (if not using agents)
default_servers: ["my_server"]  # Fallback if agent doesn't specify

# Logger settings (from mcp-agent)
logger:
  type: console  # console|file|none
  level: info   # debug|info|warning|error
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  file_path: "./logs/mcp-eval.log"  # If type is file
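
The ${MY_API_KEY}-style references shown under server env blocks are resolved from the process environment. As a rough illustration (not mcp-eval's actual loader), the same substitution can be reproduced with Python's standard library:

import os

# Illustration only: mcp-eval/mcp-agent resolve "${VAR}" references
# internally; this just demonstrates the substitution behavior.
os.environ["MY_API_KEY"] = "sk-example"

raw_value = "${MY_API_KEY}"
expanded = os.path.expandvars(raw_value)
print(expanded)  # -> sk-example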

Secrets Configuration (mcpeval.secrets.yaml)

# This file should be gitignored!
anthropic:
  api_key: "sk-ant-..."
  base_url: "https://api.anthropic.com"  # Optional

openai:
  api_key: "sk-..."
  org_id: "org-..."  # Optional
  base_url: "https://api.openai.com/v1"  # Optional

google:
  api_key: "..."
  
# Server-specific secrets
servers:
  my_server:
    env:
      SECRET_TOKEN: "..."
      DATABASE_URL: "postgresql://..."
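
Conceptually, values from mcpeval.secrets.yaml are overlaid onto the main configuration at load time. The exact merge semantics belong to mcp-eval's loader, but a recursive deep merge along these lines captures the idea:

# Sketch only: the real merge behavior is defined by mcp-eval itself.
def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively overlay values onto base, preferring the overlay."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

config = {"servers": {"my_server": {"command": "python", "env": {"LOG_LEVEL": "info"}}}}
secrets = {"servers": {"my_server": {"env": {"SECRET_TOKEN": "..."}}}}
print(deep_merge(config, secrets)["servers"]["my_server"]["env"])
# {'LOG_LEVEL': 'info', 'SECRET_TOKEN': '...'}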

Environment Variables

Provider API Keys

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
export ANTHROPIC_BASE_URL="https://api.anthropic.com"

# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_ORG_ID="org-..."

# Google
export GOOGLE_API_KEY="..."

# Azure
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://....openai.azure.com"

mcp-eval Specific

# Override provider/model
export MCPEVAL_PROVIDER="openai"
export MCPEVAL_MODEL="gpt-4-turbo-preview"

# Execution settings
export MCPEVAL_TIMEOUT_SECONDS="600"
export MCPEVAL_MAX_CONCURRENCY="10"
export MCPEVAL_OUTPUT_DIR="/tmp/test-reports"

# Debug settings
export MCPEVAL_VERBOSE="true"
export MCPEVAL_DEBUG="true"
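
How these variables interact with file-based settings is up to mcp-eval, but the usual convention is that environment overrides take precedence over mcpeval.yaml values. A sketch of that convention:

import os

# Sketch of conventional env-over-file precedence; mcp-eval's actual
# resolution order may differ.
file_settings = {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"}

provider = os.environ.get("MCPEVAL_PROVIDER", file_settings["provider"])
model = os.environ.get("MCPEVAL_MODEL", file_settings["model"])
timeout = int(os.environ.get("MCPEVAL_TIMEOUT_SECONDS", "300"))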

Configuration Patterns

Pattern 1: Multi-Provider Setup

# Use different providers for different purposes
provider: openai  # Fast/cheap for tests
model: gpt-4o-mini

judge:
  provider: anthropic  # High quality for judging
  model: claude-3-opus-20240229
  
agents:
  definitions:
    - name: "fast"
      provider: openai
      model: gpt-4o-mini
      
    - name: "accurate"  
      provider: anthropic
      model: claude-3-5-sonnet-20241022

Pattern 2: Environment-Specific Configs

# Development config (mcpeval.dev.yaml)
provider: openai
model: gpt-4o-mini  # Cheap for development
execution:
  timeout_seconds: 60
  verbose: true

---
# Production config (mcpeval.prod.yaml)
provider: anthropic
model: claude-3-opus-20240229  # Best quality
execution:
  timeout_seconds: 600
  max_concurrency: 10
  retry_failed: true

Pattern 3: Server Groups

mcp:
  servers:
    # Data servers
    database:
      command: "python"
      args: ["db_server.py"]
    
    cache:
      command: "node"
      args: ["cache_server.js"]
    
    # Utility servers
    fetch:
      command: "uvx"
      args: ["mcp-server-fetch"]
    
    filesystem:
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-filesystem"]

agents:
  definitions:
    - name: "data_agent"
      server_names: ["database", "cache"]
      
    - name: "utility_agent"
      server_names: ["fetch", "filesystem"]
      
    - name: "full_agent"
      server_names: ["database", "cache", "fetch", "filesystem"]

Initialization Commands

Quick Setup

# Initialize with wizard
mcp-eval init

# This creates:
# - mcpeval.yaml (main config)
# - mcpeval.secrets.yaml (API keys)

Import Existing Configurations

# From Cursor/VS Code MCP config
mcp-eval server add --from-mcp-json .cursor/mcp.json

# From DXT manifest
mcp-eval server add --from-dxt ~/Desktop/my-server.dxt

# Add server interactively
mcp-eval server add

Validate Configuration

# Check configuration validity
mcp-eval validate

# Full system check
mcp-eval doctor --full

# List configured servers
mcp-eval server list --verbose

# Test server connectivity
mcp-eval server test my_server

Common Configuration Issues

Issue 1: Server Not Found

# Wrong - server not in mcp.servers
servers:
  my_server:
    command: "python"

# Correct - must be under mcp.servers
mcp:
  servers:
    my_server:
      command: "python"

Issue 2: Agent Missing Servers

# Wrong - agent has no servers
agents:
  definitions:
    - name: "test"
      instruction: "Test agent"
      # Missing server_names!

# Correct
agents:
  definitions:
    - name: "test"
      instruction: "Test agent"
      server_names: ["my_server"]

Issue 3: Invalid Model Names

# Wrong - incomplete model name
model: claude-3-sonnet

# Correct - full model identifier
model: claude-3-5-sonnet-20241022

# Model name reference:
# Anthropic: claude-3-5-sonnet-20241022, claude-3-opus-20240229
# OpenAI: gpt-4-turbo-preview, gpt-4o, gpt-4o-mini
# Google: gemini-pro, gemini-1.5-pro
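
A quick sanity check against a known-good list catches incomplete names before a test run; KNOWN_MODELS below is an illustrative subset you would maintain yourself, not an authoritative registry:

# Hypothetical helper: validate model identifiers before running tests.
KNOWN_MODELS = {
    "claude-3-5-sonnet-20241022",
    "claude-3-opus-20240229",
    "gpt-4-turbo-preview",
    "gpt-4o",
    "gpt-4o-mini",
    "gemini-pro",
    "gemini-1.5-pro",
}

def check_model(name: str) -> None:
    if name not in KNOWN_MODELS:
        raise ValueError(f"Unknown or incomplete model name: {name!r}")

check_model("claude-3-5-sonnet-20241022")  # OK
# check_model("claude-3-sonnet")           # raises ValueError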

Issue 4: Environment Variable Issues

# Wrong - direct secret in config
mcp:
  servers:
    my_server:
      env:
        API_KEY: "sk-actual-key-here"  # Never commit!

# Correct - use env var reference
mcp:
  servers:
    my_server:
      env:
        API_KEY: "${MY_SERVER_API_KEY}"

# Or use secrets file
# mcpeval.secrets.yaml:
servers:
  my_server:
    env:
      API_KEY: "sk-actual-key-here"

Configuration Best Practices

1. Use Schema Validation

# Add schema reference for IDE support
$schema: ../../schema/mcpeval.config.schema.json

2. Separate Secrets

# .gitignore
mcpeval.secrets.yaml
.env
*.secret

3. Use Descriptive Names

agents:
  definitions:
    - name: "web_scraper"  # Clear purpose
      instruction: "..."
    
    - name: "data_analyst"  # Clear purpose
      instruction: "..."
    
    # Not: "agent1", "test", "tmp"

4. Document Complex Configs

judge:
  # Using Opus for highest quality evaluation
  # despite higher cost due to critical nature
  provider: anthropic
  model: claude-3-opus-20240229
  
  # Deterministic for reproducibility
  temperature: 0.0
  
  # Higher token limit for detailed reasoning
  max_tokens: 3000

5. Use Provider Defaults

# Instead of repeating provider/model everywhere
provider: anthropic
model: claude-3-5-sonnet-20241022

# Agents inherit these unless overridden
agents:
  definitions:
    - name: "default"  # Uses global provider/model
      instruction: "..."
    
    - name: "fast"
      provider: openai  # Override for this agent
      model: gpt-4o-mini

Advanced Configuration

Programmatic Configuration

from mcp_eval.config import MCPEvalSettings, update_config

# Load and modify
settings = MCPEvalSettings(
    provider="anthropic",
    model="claude-3-5-sonnet-20241022",
    execution={"timeout_seconds": 600}
)

# Update at runtime
update_config({
    "execution": {"max_concurrency": 10},
    "reporting": {"output_dir": "/custom/path"}
})

Configuration Discovery

from mcp_eval.config import find_eval_config, find_eval_secrets

# Find config files
config_path = find_eval_config()  # Searches up directory tree
secrets_path = find_eval_secrets()

# Load with precedence
from pathlib import Path
config_locations = [
    Path.cwd() / "mcpeval.yaml",
    Path.cwd() / ".mcp-eval" / "config.yaml",
    Path.home() / ".mcp-eval" / "config.yaml"
]
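
If you need to replicate that precedence in a custom harness, a first-match loader over the same locations might look like this (assuming PyYAML is installed; config_locations is the list above):

import yaml  # PyYAML, assumed available for this sketch

def load_first_config(locations) -> dict | None:
    """Return the parsed YAML from the first existing path, or None."""
    for path in locations:
        if path.is_file():
            with path.open() as f:
                return yaml.safe_load(f)
    return None

settings = load_first_config(config_locations)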

Environment-Specific Loading

# Use different configs per environment
if [ "$ENV" = "production" ]; then
    export MCPEVAL_CONFIG="mcpeval.prod.yaml"
else
    export MCPEVAL_CONFIG="mcpeval.dev.yaml"
fi

mcp-eval run tests/

Config Templates

Minimal Config

provider: anthropic
model: claude-3-5-sonnet-20241022

mcp:
  servers:
    my_server:
      command: "python"
      args: ["server.py"]

Copy the complete reference above and customize!

CI/CD Config

provider: anthropic
model: claude-3-5-sonnet-20241022

execution:
  max_concurrency: 10
  timeout_seconds: 300
  fail_fast: true  # Stop on first failure in CI

reporting:
  formats: ["json", "junit"]  # Machine-readable
  output_dir: "./ci-reports"

logger:
  type: file
  level: debug  # Full logs for debugging
  file_path: "./ci-logs/mcp-eval.log"

Remember: Good configuration is the foundation of reliable testing!