🔧 Having trouble? Don’t worry! This comprehensive guide will help you diagnose and fix common issues quickly. We’ve got your back!

Quick diagnostics

Before diving into specific issues, let’s run a quick health check:

  1. System check (comprehensive system diagnosis):
    mcp-eval doctor

  2. Validate config (verify configuration and API keys):
    mcp-eval validate

  3. Test connection (check server connectivity):
    mcp-eval validate --servers

Common error messages and solutions

🔑 Authentication errors

Symptoms:
anthropic.AuthenticationError: Invalid API Key
openai.error.AuthenticationError: Incorrect API key provided
Solutions:
  1. Check environment variables:
    # Verify keys are set
    echo $ANTHROPIC_API_KEY
    echo $OPENAI_API_KEY
    
    # Set if missing
    export ANTHROPIC_API_KEY="sk-ant-..."
    export OPENAI_API_KEY="sk-..."
    
  2. Use secrets file:
    # mcpeval.secrets.yaml
    anthropic:
      api_key: "sk-ant-..."
    openai:
      api_key: "sk-..."
    
  3. Validate configuration:
    mcp-eval validate
    
Pro tip: Never commit API keys to version control! Use .gitignore for secrets files.
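
If keys keep going missing in CI or fresh shells, you can script the checks above into a small preflight step. Below is a minimal sketch using only the Python standard library; the key names match the examples above, so adjust them to the providers you actually use.

import os
import sys

# Adjust to the providers your tests actually use.
REQUIRED_KEYS = ["ANTHROPIC_API_KEY"]
OPTIONAL_KEYS = ["OPENAI_API_KEY"]

def preflight() -> int:
    missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
    for key in OPTIONAL_KEYS:
        if not os.environ.get(key):
            print(f"note: {key} is not set (only needed if you use that provider)")
    if missing:
        print(f"error: missing required keys: {', '.join(missing)}", file=sys.stderr)
        return 1
    print("all required API keys are present")
    return 0

if __name__ == "__main__":
    sys.exit(preflight())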

Rate limit errors

Symptoms:
Rate limit reached for requests
Too many requests, please retry after X seconds
Solutions:
  1. Reduce concurrency:
    # mcpeval.yaml
    execution:
      max_concurrency: 2  # Lower from default 5
    
  2. Add retry logic:
    execution:
      retry_failed: true
      retry_delay: 5  # seconds between retries
    
  3. Use different models for testing vs judging:
    # Use cheaper model for generation
    provider: "anthropic"
    model: "claude-3-haiku-20240307"
    
    # But keep good model for judging
    judge:
      model: "claude-3-5-sonnet-20241022"
    

🔌 Server connection issues

Symptoms:
Server 'my_server' not found
Failed to start MCP server: Command not found
subprocess.CalledProcessError: returned non-zero exit status
Solutions:
  1. Verify server configuration:
    # mcpeval.yaml or mcp-agent.config.yaml
    mcp:
      servers:
        my_server:
          command: "python"  # Ensure command exists
          args: ["path/to/server.py"]  # Check path is correct
          env:
            PYTHONPATH: "."  # Add if needed
    
  2. Test server manually:
    # Run the server command directly
    python path/to/server.py
    
    # Check for errors or missing dependencies
    
  3. Debug with verbose output:
    mcp-eval run tests/ -vv
    
  4. Common fixes:
    • Install server dependencies: pip install -r requirements.txt
    • Use absolute paths: /full/path/to/server.py
    • Check file permissions: chmod +x server.py
    • Verify Python version compatibility
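
The manual checks above can be scripted so they are easy to rerun. The sketch below uses only the standard library; the command and path are placeholders that must match your mcpeval.yaml entry.

import shutil
import subprocess

COMMAND = "python"             # must match the command in your config
ARGS = ["path/to/server.py"]   # placeholder path; use your real one

# 1. Is the command on PATH at all?
resolved = shutil.which(COMMAND)
print(f"resolved command: {resolved}")

# 2. Does the server start and stay up for a few seconds?
if resolved:
    proc = subprocess.Popen(
        [resolved, *ARGS], stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    try:
        _, err = proc.communicate(timeout=5)
        print(f"server exited early with code {proc.returncode}")
        print(err.decode(errors="replace"))
    except subprocess.TimeoutExpired:
        print("server is still running after 5s (good sign)")
        proc.kill()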

Tools not found or not called

Symptoms:
No tools found for server 'my_server'
Tool 'my_tool' was not called (expected at least 1 call)
Solutions:
  1. Check server is listed in agent:
    from mcp_agent.agents.agent import Agent
    
    # Ensure server_names includes your server
    agent = Agent(
        name="test_agent",
        server_names=["my_server"]  # Must match config
    )
    
  2. Verify tool discovery:
    # List available tools
    mcp-eval server list --verbose
    
  3. Check MCP protocol implementation (a standalone check is sketched after this list):
    • Server must implement tools/list method
    • Tools must have proper schemas
    • Server must be running when agent connects
  4. Enable debug logging:
    # mcpeval.yaml
    logging:
      level: DEBUG
      show_mcp_messages: true
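
To confirm the server really answers tools/list, you can bypass mcp-eval and query it directly. A minimal sketch, assuming the official MCP Python SDK (the mcp package) is installed and your server runs over stdio; the command and path are placeholders.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python", args=["path/to/server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            for tool in result.tools:
                print(tool.name)

asyncio.run(main())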
    

⏱️ Timeout and performance issues

Symptoms:
TimeoutError: Test exceeded 300 seconds
asyncio.TimeoutError
Test killed due to timeout
Solutions:
  1. Increase timeout globally:
    # mcpeval.yaml
    execution:
      timeout_seconds: 600  # 10 minutes
    
  2. Set per-test timeout:
    @task("Long running test", timeout=600)
    async def test_complex_operation(agent, session):
        ...  # your test code
    
  3. Optimize test prompts:
    # Instead of vague prompts:
    # "Do something with the data"
    
    # Use specific prompts:
    "Fetch https://api.example.com/data and return the count"
    
  4. Add performance assertions:
    await session.assert_that(
        Expect.performance.response_time_under(5000),  # 5 seconds
        name="response_time_check"
    )
    
  5. Profile slow tests:
    # Increase verbosity and export HTML for manual review
    mcp-eval run tests/ -v --html reports/perf.html
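
If only one step of a test is slow, you can bound that step instead of raising the global timeout. A minimal sketch using plain asyncio; the prompt and the 60-second limit are illustrative.

import asyncio

async def test_fetch_is_fast(agent, session):
    # Bound just the slow step instead of the whole test.
    try:
        response = await asyncio.wait_for(
            agent.generate_str("Fetch https://example.com and return the title"),
            timeout=60,  # seconds allowed for this single step
        )
    except asyncio.TimeoutError:
        raise AssertionError("fetch step exceeded 60s")
    assert response  # continue with your normal assertions here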
    

Excessive token usage or cost

Symptoms:
Warning: Test consumed 10,000+ tokens
Estimated cost: $X.XX exceeds budget
Solutions:
  1. Use cheaper models for testing:
    # For basic tests
    provider: "anthropic"
    model: "claude-3-haiku-20240307"
    
  2. Limit response length:
    response = await agent.generate_str(
        "Summarize this in 50 words or less",
        max_tokens=200
    )
    
  3. Cache responses during development:
    development:
      cache_responses: true
      cache_ttl: 3600  # 1 hour
    
  4. Monitor token usage:
    metrics = session.get_metrics()
    print(f"Tokens used: {metrics.total_tokens}")
    print(f"Estimated cost: ${metrics.estimated_cost}")
    

🧪 Test execution problems

Symptoms:
AssertionError: Expected content to contain "example"
Content was: "This is an Example page"  # Note the capital E
Solutions:
  1. Check case sensitivity:
    # Case-insensitive matching
    await session.assert_that(
        Expect.content.contains("example", case_sensitive=False),
        response=response
    )
    
  2. Use regex for flexible matching:
    await session.assert_that(
        Expect.content.regex(r"exam\w+", case_sensitive=False),
        response=response
    )
    
  3. Debug actual output:
    # Temporarily add debug output
    print(f"Actual response: {response!r}")
    
    # Or save a JSON report
    mcp-eval run tests/ --json debug.json
    
  4. Use partial matching for tools:
    await session.assert_that(
        Expect.tools.output_matches(
            tool_name="fetch",
            expected_output="example",
            match_type="contains"  # Instead of "exact"
        )
    )
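
When a content assertion fails and the strings look almost identical, a quick diff of expected vs. actual often reveals the mismatch (case, whitespace, punctuation) immediately. A standard-library sketch; the two strings are examples.

import difflib

expected = "example"
actual = "This is an Example page"

# Compare after normalising case and whitespace.
norm_expected = " ".join(expected.lower().split())
norm_actual = " ".join(actual.lower().split())
print("normalised match:", norm_expected in norm_actual)

# Show exactly where the raw strings differ.
for line in difflib.ndiff([expected], [actual]):
    print(line)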
    

Flaky or non-deterministic tests

Symptoms:
Test passes sometimes, fails others
Different results on each run
Works locally but fails in CI
Solutions:
  1. Set deterministic model parameters:
    response = await agent.generate_str(
        prompt,
        temperature=0,  # Deterministic
        seed=42  # Fixed seed if supported
    )
    
  2. Use objective assertions:
    # Instead of LLM judge for deterministic checks
    await session.assert_that(
        Expect.tools.was_called("fetch"),
        Expect.tools.count("fetch", 1),
        Expect.content.contains("specific_string")
    )
    
  3. Add retry logic for network calls:
    @task("Network test", retry=3)
    async def test_external_api(agent, session):
        ...  # will retry up to 3 times on failure
    
  4. Isolate test environment:
    # CI-specific configuration
    execution:
      parallel: false  # Run tests sequentially
      reset_between_tests: true  # Clean state
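
To tell a genuinely flaky test apart from a broken one, it helps to run the same check repeatedly and look at the pass rate. A minimal sketch; check is a placeholder for your test body.

import asyncio

async def pass_rate(check, runs=10):
    # Run an async check repeatedly and report how often it passes.
    passed = 0
    for i in range(runs):
        try:
            await check()
            passed += 1
        except AssertionError as exc:
            print(f"run {i + 1} failed: {exc}")
    rate = passed / runs
    print(f"passed {passed}/{runs} ({rate:.0%})")
    return rate

# Example: asyncio.run(pass_rate(my_check)), where my_check wraps your test logic.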
    

Debug mode walkthrough

When tests fail mysteriously, enable debug mode for detailed insights:

Step 1: Enable debug output

# Maximum verbosity
mcp-eval run tests/ -vvv

# Or set in config
# mcpeval.yaml
debug:
  enabled: true
  log_level: DEBUG
  save_traces: true
  save_llm_calls: true

Step 2: Examine the debug output

Look for these key sections:
[DEBUG] Starting test: test_fetch_example
[DEBUG] Agent configuration: {name: "test_agent", servers: ["fetch"]}
[DEBUG] Sending prompt: "Fetch https://example.com"
[DEBUG] LLM Response: "I'll fetch that URL for you..."
[DEBUG] Tool call: fetch(url="https://example.com")
[DEBUG] Tool response: {"content": "Example Domain..."}
[DEBUG] Final response: "The page contains..."
[DEBUG] Assertion 'content_check' passed

Step 3: Inspect OTEL traces

# View trace for specific test
cat test-reports/traces/test_fetch_example.jsonl | jq '.'

# Or use the trace viewer
mcp-eval trace view test-reports/traces/test_fetch_example.jsonl
Key things to look for in traces:
  • Tool call sequences
  • Error spans
  • Timing information
  • Token usage per call
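
If jq is not available, the same trace file can be scanned with a few lines of Python. This sketch assumes nothing about the span schema beyond one JSON object per line; it simply surfaces records that mention errors or tool calls.

import json
from pathlib import Path

trace_path = Path("test-reports/traces/test_fetch_example.jsonl")

records = []
for line in trace_path.read_text().splitlines():
    if line.strip():
        records.append(json.loads(line))

print(f"{len(records)} span records")
for record in records:
    text = json.dumps(record)
    if "error" in text.lower() or "tool" in text.lower():
        print(json.dumps(record, indent=2)[:1000])  # truncate long spans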

Network and connectivity debugging

Testing behind a proxy

# mcpeval.yaml
network:
  proxy:
    http: "http://proxy.company.com:8080"
    https: "https://proxy.company.com:8080"
  timeout: 30
  retry_on_connection_error: true

Debugging SSL/TLS issues

# Disable SSL verification (development only!)
export CURL_CA_BUNDLE=""
export REQUESTS_CA_BUNDLE=""

# Or configure trusted certificates
export SSL_CERT_FILE="/path/to/cacert.pem"

Testing with local servers

# For localhost servers
mcp:
  servers:
    local_server:
      command: "python"
      args: ["server.py"]
      env:
        HOST: "127.0.0.1"
        PORT: "8080"
      startup_timeout: 10  # Wait for server to start
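
If the server listens on a TCP port (as in the HOST/PORT example above), you can confirm it actually comes up within the startup window before blaming the harness. A standard-library sketch; host, port, and timeout mirror the config values above.

import socket
import time

def wait_for_port(host="127.0.0.1", port=8080, timeout=10.0):
    # Poll until the server accepts TCP connections or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(0.5)
    return False

print("server is up" if wait_for_port() else "server never opened the port")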

Performance troubleshooting

Identifying bottlenecks

# Save a machine-readable report and analyze offline
mcp-eval run tests/ --json profile.json

# Analyze the report (custom scripts)
cat profile.json | jq '.' | less
Key metrics to watch:
  • llm_time_ms: Time spent in LLM calls
  • tool_time_ms: Time in tool execution
  • idle_time_ms: Wasted time between operations
  • max_concurrent_operations: Parallelism level
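
Those metrics can be pulled out of the JSON report without knowing its exact layout by walking the structure and collecting the keys by name. A standard-library sketch; it assumes only that the report is valid JSON containing those key names somewhere.

import json

KEYS = {"llm_time_ms", "tool_time_ms", "idle_time_ms", "max_concurrent_operations"}

def collect(node, found):
    # Recursively walk dicts/lists and record any of the known metric keys.
    if isinstance(node, dict):
        for key, value in node.items():
            if key in KEYS and isinstance(value, (int, float)):
                found.setdefault(key, []).append(value)
            collect(value, found)
    elif isinstance(node, list):
        for item in node:
            collect(item, found)

with open("profile.json") as fh:
    report = json.load(fh)

found = {}
collect(report, found)
for key, values in sorted(found.items()):
    print(f"{key}: total={sum(values)}, max={max(values)}, count={len(values)}")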

Optimization strategies

Reduce LLM calls

# Batch multiple checks
response = await agent.generate_str(
    "Fetch A, analyze it, then fetch B"
)

Parallel execution

# Run tests concurrently (parallel marker support depends on your pytest plugin)
@pytest.mark.parametrize("url", urls)
@pytest.mark.parallel
def test_fetch_url(url):
    ...

Cache results

cache:
  enabled: true
  ttl: 3600

Optimize prompts

# Be specific to reduce iterations
"Get the title from example.com"
# Not: "Tell me about example.com"

Platform-specific issues

macOS

# Add Python to PATH
export PATH="/usr/local/bin:$PATH"

# Or use full paths in config
command: "/usr/local/bin/python3"

Windows

# Use forward slashes or escaped backslashes
command: "python"
args: ["C:/path/to/server.py"]
# Or
args: ["C:\\path\\to\\server.py"]

# Set encoding
env:
  PYTHONIOENCODING: "utf-8"

Linux/Docker

# Fix permissions
chmod +x server.py

# For Docker
docker run --network=host mcp-eval

Getting help

Self-service debugging

  1. Run diagnostics:
    mcp-eval doctor --full > diagnosis.txt
    
  2. Check logs:
    # View recent test logs
    tail -f test-reports/logs/mcp-eval.log
    
  3. Validate everything:
    mcp-eval validate
    

Prepare an issue report

If you’re still stuck, let’s gather information for a bug report:
# Automatically collect diagnostics
mcp-eval issue

# This will:
# 1. Run system diagnostics
# 2. Collect configuration (sanitized)
# 3. Get recent error logs
# 4. Generate issue template
# 5. Open GitHub issue page

Quick reference: Error codes

Code        Meaning                  Quick Fix
AUTH001     Invalid API key          Check environment variables
SRV001      Server not found         Verify server name in config
SRV002      Server failed to start   Check command and dependencies
TOOL001     Tool not found           Verify server implements tool
TIMEOUT001  Test timeout             Increase timeout_seconds
ASSERT001   Assertion failed         Check expected vs actual values
NET001      Network error            Check connectivity and proxy
RATE001     Rate limited             Reduce concurrency or add delays

Still stuck? Don’t hesitate to reach out! We’re here to help you succeed with mcp-eval. Remember, every great developer has faced these issues - you’re in good company! 🚀