We recommend Anthropic's Claude Sonnet or Opus models for higher-quality generation and judging.

Generate

mcp-eval generate --style pytest --n-examples 8 --provider anthropic [--model ...]

What happens:
  • Detects credentials and writes or updates configs
  • Discovers the server's tools
  • Generates scenarios with an LLM
  • Refines assertions (tool, judge, and path checks)
  • Emits a single test file or dataset
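The invocation above can also be scripted, for example from CI. A minimal Python sketch, assuming only the flags shown above; the `build_generate_cmd` helper is illustrative and not part of mcp-eval:

```python
import os
import shlex


def build_generate_cmd(style="pytest", n_examples=8, provider="anthropic", model=None):
    # Assemble the `mcp-eval generate` invocation documented above.
    cmd = [
        "mcp-eval", "generate",
        "--style", style,
        "--n-examples", str(n_examples),
        "--provider", provider,
    ]
    if model:  # --model is optional, per the bracketed flag
        cmd += ["--model", model]
    return cmd


# Credential detection happens inside mcp-eval; checking up front
# gives a clearer error than a mid-run failure.
if not os.environ.get("ANTHROPIC_API_KEY"):
    print("warning: ANTHROPIC_API_KEY not set; credential detection may fail")

# Print the command for inspection (pass the list to subprocess.run to execute it).
print(shlex.join(build_generate_cmd()))
```

Passing the argument list to `subprocess.run` (rather than a shell string) avoids quoting issues with generated file paths.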

Update an existing file

mcp-eval generate --update tests/test_fetch_generated.py --style pytest --n-examples 4

Implementation