API

  • Case[Input, Output, Metadata]
  • Dataset[Input, Output, Metadata]
Source: datasets.py

Programmatic

from mcp_eval import Case, Dataset, ToolWasCalled, ResponseContains

cases = [
  Case(
    name="fetch_example",
    inputs="Fetch https://example.com",
    evaluators=[ToolWasCalled("fetch"), ResponseContains("Example Domain")],
  )
]

dataset = Dataset(name="Fetch Suite", cases=cases)
report = await dataset.evaluate(lambda inputs, agent, session: agent.generate_str(inputs))
report.print(include_input=True, include_output=True)

Parallel evaluation:

report = await dataset.evaluate(
  lambda inputs, agent, session: agent.generate_str(inputs),
  max_concurrency=4,
)

YAML/JSON

Save and load datasets via Dataset.to_file and Dataset.from_file. The file format is defined by mcpeval.config.schema.json. YAML example (from basic_fetch_dataset.yaml):

name: "Basic Fetch Dataset"
server_name: "fetch"
cases:
  - name: "simple_fetch"
    inputs: "Fetch https://example.com"
    expected_output: "Example Domain"
    evaluators:
      - ToolWasCalled:
          tool_name: "fetch"
      - ResponseContains:
          text: "Example Domain"
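
Since datasets also round-trip through JSON, the same case can be written as JSON. The snippet below is an illustrative, field-for-field translation of the YAML above (assumed mapping; validate against mcpeval.config.schema.json):

```python
import json

# Illustrative JSON equivalent of the YAML case above
# (assumes a direct field-for-field mapping; check the schema).
doc = json.loads("""
{
  "name": "Basic Fetch Dataset",
  "server_name": "fetch",
  "cases": [
    {
      "name": "simple_fetch",
      "inputs": "Fetch https://example.com",
      "expected_output": "Example Domain",
      "evaluators": [
        {"ToolWasCalled": {"tool_name": "fetch"}},
        {"ResponseContains": {"text": "Example Domain"}}
      ]
    }
  ]
}
""")
print(doc["cases"][0]["name"])  # simple_fetch
```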

Concurrency

Dataset.evaluate(..., max_concurrency=N) runs up to N cases concurrently.
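
Bounded parallelism like this is conventionally built on an asyncio semaphore; a minimal sketch of the pattern (not mcp_eval's actual implementation, and `evaluate_all`/`run_one` are hypothetical names):

```python
import asyncio

async def evaluate_all(cases, task, max_concurrency=4):
    """Run `task` over all cases, at most `max_concurrency` at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(case):
        async with sem:  # waits while max_concurrency tasks are in flight
            return await task(case)

    # gather preserves case order regardless of completion order
    return await asyncio.gather(*(run_one(c) for c in cases))

# Demo: 8 trivial "cases", at most 4 running concurrently
async def main():
    async def task(i):
        await asyncio.sleep(0)
        return i * 2
    return await evaluate_all(range(8), task, max_concurrency=4)

results = asyncio.run(main())
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```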

Examples