🔧 Having trouble? Don’t worry! This comprehensive guide will help you diagnose and fix common issues quickly. We’ve got your back!

Quick diagnostics

Before diving into specific issues, let’s run a quick health check:

  1. System check (comprehensive system diagnosis):
    mcp-eval doctor

  2. Validate config (verify configuration and API keys):
    mcp-eval validate

  3. Test connection (check server connectivity):
    mcp-eval validate --servers

Common error messages and solutions

🔑 Authentication errors

Symptoms:
anthropic.AuthenticationError: Invalid API Key
openai.error.AuthenticationError: Incorrect API key provided
Solutions:
  1. Check environment variables:
    # Verify keys are set
    echo $ANTHROPIC_API_KEY
    echo $OPENAI_API_KEY
    
    # Set if missing
    export ANTHROPIC_API_KEY="sk-ant-..."
    export OPENAI_API_KEY="sk-..."
    
  2. Use secrets file:
    # mcpeval.secrets.yaml
    anthropic:
      api_key: "sk-ant-..."
    openai:
      api_key: "sk-..."
    
  3. Validate configuration:
    mcp-eval validate
    
Pro tip: Never commit API keys to version control! Use .gitignore for secrets files.
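
If keys keep going missing in CI or fresh shells, you can script the checks above into a small preflight step. Below is a minimal sketch using only the Python standard library; the key names match the examples above, so adjust them to the providers you actually use.

import os
import sys

# Adjust to the providers your tests actually use.
REQUIRED_KEYS = ["ANTHROPIC_API_KEY"]
OPTIONAL_KEYS = ["OPENAI_API_KEY"]

def preflight() -> int:
    missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
    for key in OPTIONAL_KEYS:
        if not os.environ.get(key):
            print(f"note: {key} is not set (only needed if you use that provider)")
    if missing:
        print(f"error: missing required keys: {', '.join(missing)}", file=sys.stderr)
        return 1
    print("all required API keys are present")
    return 0

if __name__ == "__main__":
    sys.exit(preflight())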

Rate limit errors

Symptoms:
Rate limit reached for requests
Too many requests, please retry after X seconds
Solutions:
  1. Reduce concurrency:
    # mcpeval.yaml
    execution:
      max_concurrency: 2  # Lower from default 5
    
  2. Add retry logic:
    execution:
      retry_failed: true
      retry_delay: 5  # seconds between retries
    
  3. Use different models for testing vs judging:
    # Use cheaper model for generation
    provider: "anthropic"
    model: "claude-3-haiku-20240307"
    
    # But keep good model for judging
    judge:
      model: "claude-3-5-sonnet-20241022"
    

🔌 Server connection issues

Symptoms:
Server 'my_server' not found
Failed to start MCP server: Command not found
subprocess.CalledProcessError: returned non-zero exit status
Solutions:
  1. Verify server configuration:
    # mcpeval.yaml or mcp-agent.config.yaml
    mcp:
      servers:
        my_server:
          command: "python"  # Ensure command exists
          args: ["path/to/server.py"]  # Check path is correct
          env:
            PYTHONPATH: "."  # Add if needed
    
  2. Test server manually:
    # Run the server command directly
    python path/to/server.py
    
    # Check for errors or missing dependencies
    
  3. Debug with verbose output:
    mcp-eval run tests/ -vv
    
  4. Common fixes:
    • Install server dependencies: pip install -r requirements.txt
    • Use absolute paths: /full/path/to/server.py
    • Check file permissions: chmod +x server.py
    • Verify Python version compatibility
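
The manual checks above can be scripted so they are easy to rerun. The sketch below uses only the standard library; the command and path are placeholders that must match your mcpeval.yaml entry.

import shutil
import subprocess

COMMAND = "python"             # must match the command in your config
ARGS = ["path/to/server.py"]   # placeholder path; use your real one

# 1. Is the command on PATH at all?
resolved = shutil.which(COMMAND)
print(f"resolved command: {resolved}")

# 2. Does the server start and stay up for a few seconds?
if resolved:
    proc = subprocess.Popen(
        [resolved, *ARGS], stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    try:
        _, err = proc.communicate(timeout=5)
        print(f"server exited early with code {proc.returncode}")
        print(err.decode(errors="replace"))
    except subprocess.TimeoutExpired:
        print("server is still running after 5s (good sign)")
        proc.kill()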

Tools not found or not called

Symptoms:
No tools found for server 'my_server'
Tool 'my_tool' was not called (expected at least 1 call)
Solutions:
  1. Check server is listed in agent:
    from mcp_agent.agents.agent import Agent
    
    # Ensure server_names includes your server
    agent = Agent(
        name="test_agent",
        server_names=["my_server"]  # Must match config
    )
    
  2. Verify tool discovery:
    # List available tools
    mcp-eval server list --verbose
    
  3. Check MCP protocol implementation (a standalone check is sketched after this list):
    • Server must implement tools/list method
    • Tools must have proper schemas
    • Server must be running when agent connects
  4. Enable debug logging:
    # mcpeval.yaml
    logging:
      level: DEBUG
      show_mcp_messages: true
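
To confirm the server really answers tools/list, you can bypass mcp-eval and query it directly. A minimal sketch, assuming the official MCP Python SDK (the mcp package) is installed and your server runs over stdio; the command and path are placeholders.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python", args=["path/to/server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            for tool in result.tools:
                print(tool.name)

asyncio.run(main())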
    

⏱️ Timeout and performance issues

Symptoms:
TimeoutError: Test exceeded 300 seconds
asyncio.TimeoutError
Test killed due to timeout
Solutions:
  1. Increase timeout globally:
    # mcpeval.yaml
    execution:
      timeout_seconds: 600  # 10 minutes
    
  2. Set per-test timeout:
    @task("Long running test", timeout=600)
    async def test_complex_operation(agent, session):
        ...  # your test code
    
  3. Optimize test prompts:
    # Instead of vague prompts:
    # "Do something with the data"
    
    # Use specific prompts:
    "Fetch https://api.example.com/data and return the count"
    
  4. Add performance assertions:
    await session.assert_that(
        Expect.performance.response_time_under(5000),  # 5 seconds
        name="response_time_check"
    )
    
  5. Profile slow tests:
    # Increase verbosity and export HTML for manual review
    mcp-eval run tests/ -v --html reports/perf.html
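
If only one step of a test is slow, you can bound that step instead of raising the global timeout. A minimal sketch using plain asyncio; the prompt and the 60-second limit are illustrative.

import asyncio

async def test_fetch_is_fast(agent, session):
    # Bound just the slow step instead of the whole test.
    try:
        response = await asyncio.wait_for(
            agent.generate_str("Fetch https://example.com and return the title"),
            timeout=60,  # seconds allowed for this single step
        )
    except asyncio.TimeoutError:
        raise AssertionError("fetch step exceeded 60s")
    assert response  # continue with your normal assertions here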
    

Excessive token usage or cost

Symptoms:
Warning: Test consumed 10,000+ tokens
Estimated cost: $X.XX exceeds budget
Solutions:
  1. Use cheaper models for testing:
    # For basic tests
    provider: "anthropic"
    model: "claude-3-haiku-20240307"
    
  2. Limit response length:
    response = await agent.generate_str(
        "Summarize this in 50 words or less",
        max_tokens=200
    )
    
  3. Cache responses during development:
    development:
      cache_responses: true
      cache_ttl: 3600  # 1 hour
    
  4. Monitor token usage:
    metrics = session.get_metrics()
    print(f"Tokens used: {metrics.total_tokens}")
    print(f"Estimated cost: ${metrics.estimated_cost}")
    

🧪 Test execution problems

Symptoms:
AssertionError: Expected content to contain "example"
Content was: "This is an Example page"  # Note the capital E
Solutions:
  1. Check case sensitivity:
    # Case-insensitive matching
    await session.assert_that(
        Expect.content.contains("example", case_sensitive=False),
        response=response
    )
    
  2. Use regex for flexible matching:
    await session.assert_that(
        Expect.content.regex(r"exam\w+", case_sensitive=False),
        response=response
    )
    
  3. Debug actual output:
    # Temporarily add debug output
    print(f"Actual response: {response!r}")
    
    # Or save a JSON report
    mcp-eval run tests/ --json debug.json
    
  4. Use partial matching for tools:
    await session.assert_that(
        Expect.tools.output_matches(
            tool_name="fetch",
            expected_output="example",
            match_type="contains"  # Instead of "exact"
        )
    )
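
When a content assertion fails and the strings look almost identical, a quick diff of expected vs. actual often reveals the mismatch (case, whitespace, punctuation) immediately. A standard-library sketch; the two strings are examples.

import difflib

expected = "example"
actual = "This is an Example page"

# Compare after normalising case and whitespace.
norm_expected = " ".join(expected.lower().split())
norm_actual = " ".join(actual.lower().split())
print("normalised match:", norm_expected in norm_actual)

# Show exactly where the raw strings differ.
for line in difflib.ndiff([expected], [actual]):
    print(line)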
    

Flaky or non-deterministic tests

Symptoms:
Test passes sometimes, fails others
Different results on each run
Works locally but fails in CI
Solutions:
  1. Set deterministic model parameters:
    response = await agent.generate_str(
        prompt,
        temperature=0,  # Deterministic
        seed=42  # Fixed seed if supported
    )
    
  2. Use objective assertions:
    # Instead of LLM judge for deterministic checks
    await session.assert_that(
        Expect.tools.was_called("fetch"),
        Expect.tools.count("fetch", 1),
        Expect.content.contains("specific_string")
    )
    
  3. Add retry logic for network calls:
    @task("Network test", retry=3)
    async def test_external_api(agent, session):
        ...  # will retry up to 3 times on failure
    
  4. Isolate test environment:
    # CI-specific configuration
    execution:
      parallel: false  # Run tests sequentially
      reset_between_tests: true  # Clean state
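
To tell a genuinely flaky test apart from a broken one, it helps to run the same check repeatedly and look at the pass rate. A minimal sketch; check is a placeholder for your test body.

import asyncio

async def pass_rate(check, runs=10):
    # Run an async check repeatedly and report how often it passes.
    passed = 0
    for i in range(runs):
        try:
            await check()
            passed += 1
        except AssertionError as exc:
            print(f"run {i + 1} failed: {exc}")
    rate = passed / runs
    print(f"passed {passed}/{runs} ({rate:.0%})")
    return rate

# Example: asyncio.run(pass_rate(my_check)), where my_check wraps your test logic.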
    

Debug mode walkthrough

When tests fail mysteriously, enable debug mode for detailed insights:

Step 1: Enable debug output

# Maximum verbosity
mcp-eval run tests/ -vvv

# Or set in config
# mcpeval.yaml
debug:
  enabled: true
  log_level: DEBUG
  save_traces: true
  save_llm_calls: true

Step 2: Examine the debug output

Look for these key sections:
[DEBUG] Starting test: test_fetch_example
[DEBUG] Agent configuration: {name: "test_agent", servers: ["fetch"]}
[DEBUG] Sending prompt: "Fetch https://example.com"
[DEBUG] LLM Response: "I'll fetch that URL for you..."
[DEBUG] Tool call: fetch(url="https://example.com")
[DEBUG] Tool response: {"content": "Example Domain..."}
[DEBUG] Final response: "The page contains..."
[DEBUG] Assertion 'content_check' passed

Step 3: Inspect OTEL traces

# View trace for specific test
cat test-reports/traces/test_fetch_example.jsonl | jq '.'

# Or use the trace viewer
mcp-eval trace view test-reports/traces/test_fetch_example.jsonl
Key things to look for in traces:
  • Tool call sequences
  • Error spans
  • Timing information
  • Token usage per call
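
If jq is not available, the same trace file can be scanned with a few lines of Python. This sketch assumes nothing about the span schema beyond one JSON object per line; it simply surfaces records that mention errors or tool calls.

import json
from pathlib import Path

trace_path = Path("test-reports/traces/test_fetch_example.jsonl")

records = []
for line in trace_path.read_text().splitlines():
    if line.strip():
        records.append(json.loads(line))

print(f"{len(records)} span records")
for record in records:
    text = json.dumps(record)
    if "error" in text.lower() or "tool" in text.lower():
        print(json.dumps(record, indent=2)[:1000])  # truncate long spans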

Network and connectivity debugging

Testing behind a proxy

# mcpeval.yaml
network:
  proxy:
    http: "http://proxy.company.com:8080"
    https: "https://proxy.company.com:8080"
  timeout: 30
  retry_on_connection_error: true

Debugging SSL/TLS issues

# Disable SSL verification (development only!)
export CURL_CA_BUNDLE=""
export REQUESTS_CA_BUNDLE=""

# Or configure trusted certificates
export SSL_CERT_FILE="/path/to/cacert.pem"

Testing with local servers

# For localhost servers
mcp:
  servers:
    local_server:
      command: "python"
      args: ["server.py"]
      env:
        HOST: "127.0.0.1"
        PORT: "8080"
      startup_timeout: 10  # Wait for server to start
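
If the server listens on a TCP port (as in the HOST/PORT example above), you can confirm it actually comes up within the startup window before blaming the harness. A standard-library sketch; host, port, and timeout mirror the config values above.

import socket
import time

def wait_for_port(host="127.0.0.1", port=8080, timeout=10.0):
    # Poll until the server accepts TCP connections or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(0.5)
    return False

print("server is up" if wait_for_port() else "server never opened the port")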

Performance troubleshooting

Identifying bottlenecks

# Save a machine-readable report and analyze offline
mcp-eval run tests/ --json profile.json

# Analyze the report (custom scripts)
cat profile.json | jq '.' | less
Key metrics to watch:
  • llm_time_ms: Time spent in LLM calls
  • tool_time_ms: Time in tool execution
  • idle_time_ms: Wasted time between operations
  • max_concurrent_operations: Parallelism level
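
Those metrics can be pulled out of the JSON report without knowing its exact layout by walking the structure and collecting the keys by name. A standard-library sketch; it assumes only that the report is valid JSON containing those key names somewhere.

import json

KEYS = {"llm_time_ms", "tool_time_ms", "idle_time_ms", "max_concurrent_operations"}

def collect(node, found):
    # Recursively walk dicts/lists and record any of the known metric keys.
    if isinstance(node, dict):
        for key, value in node.items():
            if key in KEYS and isinstance(value, (int, float)):
                found.setdefault(key, []).append(value)
            collect(value, found)
    elif isinstance(node, list):
        for item in node:
            collect(item, found)

with open("profile.json") as fh:
    report = json.load(fh)

found = {}
collect(report, found)
for key, values in sorted(found.items()):
    print(f"{key}: total={sum(values)}, max={max(values)}, count={len(values)}")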

Optimization strategies

Reduce LLM calls

# Batch multiple checks
response = await agent.generate_str(
    "Fetch A, analyze it, then fetch B"
)

Parallel execution

# Run tests concurrently (parallel marker support depends on your pytest plugin)
@pytest.mark.parametrize("url", urls)
@pytest.mark.parallel
def test_fetch_url(url):
    ...

Cache results

cache:
  enabled: true
  ttl: 3600

Optimize prompts

# Be specific to reduce iterations
"Get the title from example.com"
# Not: "Tell me about example.com"

Platform-specific issues

macOS

# Add Python to PATH
export PATH="/usr/local/bin:$PATH"

# Or use full paths in config
command: "/usr/local/bin/python3"

Windows

# Use forward slashes or escaped backslashes
command: "python"
args: ["C:/path/to/server.py"]
# Or
args: ["C:\\path\\to\\server.py"]

# Set encoding
env:
  PYTHONIOENCODING: "utf-8"

Linux/Docker

# Fix permissions
chmod +x server.py

# For Docker
docker run --network=host mcp-eval

Getting help

Self-service debugging

  1. Run diagnostics:
    mcp-eval doctor --full > diagnosis.txt
    
  2. Check logs:
    # View recent test logs
    tail -f test-reports/logs/mcp-eval.log
    
  3. Validate everything:
    mcp-eval validate
    

Prepare an issue report

If you’re still stuck, let’s gather information for a bug report:
# Automatically collect diagnostics
mcp-eval issue

# This will:
# 1. Run system diagnostics
# 2. Collect configuration (sanitized)
# 3. Get recent error logs
# 4. Generate issue template
# 5. Open GitHub issue page

Quick reference: Error codes

Code        Meaning                  Quick Fix
AUTH001     Invalid API key          Check environment variables
SRV001      Server not found         Verify server name in config
SRV002      Server failed to start   Check command and dependencies
TOOL001     Tool not found           Verify server implements tool
TIMEOUT001  Test timeout             Increase timeout_seconds
ASSERT001   Assertion failed         Check expected vs actual values
NET001      Network error            Check connectivity and proxy
RATE001     Rate limited             Reduce concurrency or add delays

Still stuck? Don’t hesitate to reach out! We’re here to help you succeed with mcp-eval. Remember, every great developer has faced these issues - you’re in good company! 🚀