Define Cases and Datasets for systematic evaluation, then run them programmatically or load them from files.
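Case[Input, Output, Metadata] and Dataset[Input, Output, Metadata] are generic over three types: the case input, the expected output, and the metadata attached to each case.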
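To illustrate what the three type parameters mean, here is a minimal, self-contained sketch using plain dataclasses. SketchCase, SketchDataset and their fields (name, inputs, expected_output, metadata) are hypothetical stand-ins chosen for illustration, not the mcpeval classes themselves; check the library's own Case and Dataset definitions for the real field names.

```python
from dataclasses import dataclass, field
from typing import Generic, List, Optional, TypeVar

# Hypothetical type parameters mirroring Case[Input, Output, Metadata].
InputT = TypeVar("InputT")
OutputT = TypeVar("OutputT")
MetadataT = TypeVar("MetadataT")


@dataclass
class SketchCase(Generic[InputT, OutputT, MetadataT]):
    """One test case: what to send, what to expect, and free-form metadata."""
    name: str
    inputs: InputT
    expected_output: Optional[OutputT] = None
    metadata: Optional[MetadataT] = None


@dataclass
class SketchDataset(Generic[InputT, OutputT, MetadataT]):
    """A named collection of cases that all share the same three types."""
    name: str
    cases: List[SketchCase[InputT, OutputT, MetadataT]] = field(default_factory=list)


# Usage: string inputs and outputs, dict metadata.
dataset = SketchDataset[str, str, dict](
    name="fetch_sketch",
    cases=[
        SketchCase[str, str, dict](
            name="fetch_example",
            inputs="Fetch https://example.com and summarize it",
            expected_output="Example Domain",
            metadata={"difficulty": "easy"},
        )
    ],
)
print(len(dataset.cases), dataset.cases[0].expected_output)  # prints: 1 Example Domain
```

Parameterizing the dataset, rather than each case separately, keeps every case in one dataset consistent with the same input, output, and metadata shapes.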
Dataset.to_file and Dataset.from_file save a dataset to a file and load it back. Schema: mcpeval.config.schema.json.
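The save/load round trip can be sketched as follows. This is an illustration under stated assumptions: the classes and the file layout ({"name": ..., "cases": [...]}) are hypothetical, and JSON is used only so the example needs nothing beyond the standard library. It does not reproduce the real YAML format or validate against mcpeval.config.schema.json.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Optional


@dataclass
class SketchCase:
    # Hypothetical fields; not the mcpeval Case definition.
    name: str
    inputs: Any
    expected_output: Optional[Any] = None
    metadata: Optional[dict] = None


@dataclass
class SketchDataset:
    name: str
    cases: list = field(default_factory=list)

    def to_file(self, path: Path) -> None:
        # Serialize each case as a plain dict so the file stays human-editable.
        data = {"name": self.name, "cases": [asdict(c) for c in self.cases]}
        path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    @classmethod
    def from_file(cls, path: Path) -> "SketchDataset":
        # Rebuild each case from its dict; unknown keys would raise TypeError.
        data = json.loads(path.read_text(encoding="utf-8"))
        return cls(name=data["name"], cases=[SketchCase(**c) for c in data["cases"]])


# Usage: write a dataset to a temporary file and read it back.
dataset = SketchDataset(
    name="fetch_sketch",
    cases=[SketchCase("fetch_example", "Fetch https://example.com", "Example Domain", {"difficulty": "easy"})],
)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset.json"
    dataset.to_file(path)
    loaded = SketchDataset.from_file(path)

print(loaded == dataset)  # prints: True
```

Keeping cases in a file separates test data from test code, so new cases can be added without touching Python.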
YAML example (from basic_fetch_dataset.yaml):
Dataset.evaluate(..., max_concurrency=N) runs cases in parallel, with at most N cases in flight at once.
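A common way to get this bounded parallelism in asyncio is a semaphore with N slots around each case. The sketch below shows that technique in isolation; evaluate_with_limit and fake_run_case are hypothetical helpers, and this is not claimed to be how Dataset.evaluate is implemented internally.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

CaseT = TypeVar("CaseT")
ResultT = TypeVar("ResultT")


async def evaluate_with_limit(
    cases: Sequence[CaseT],
    run_case: Callable[[CaseT], Awaitable[ResultT]],
    max_concurrency: int,
) -> List[ResultT]:
    """Run every case concurrently, but never more than max_concurrency at once."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(case: CaseT) -> ResultT:
        # Each case waits here until one of the N slots is free.
        async with semaphore:
            return await run_case(case)

    # gather() schedules all cases at once and returns results in input order.
    return list(await asyncio.gather(*(run_one(c) for c in cases)))


# Usage: a fake case runner that records how many cases are in flight.
stats = {"active": 0, "peak": 0}


async def fake_run_case(case: str) -> str:
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # stand-in for calling an agent or MCP server
    stats["active"] -= 1
    return case.upper()


results = asyncio.run(
    evaluate_with_limit(["a", "b", "c", "d", "e"], fake_run_case, max_concurrency=2)
)
print(results, stats["peak"])  # prints: ['A', 'B', 'C', 'D', 'E'] 2
```

Capping concurrency this way keeps wall-clock time low while protecting rate-limited LLM APIs and MCP servers from a burst of simultaneous requests.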