Define the test agent
- Global default:
- Per‑test override with with_agent (place above @task):
- Factory for parallel safety:
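To illustrate why a factory helps with parallel safety, here is a minimal, library-independent sketch. `StubAgent`, `run_test`, and `make_agent` are hypothetical stand-ins invented for this example and are not part of the framework's API. The sketch assumes an agent accumulates per-test state (here, a prompt history), so two concurrent tests that share one instance see each other's state, while a factory gives each test a fresh instance.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubAgent:
    """Hypothetical stand-in for an agent that keeps per-test state."""
    name: str
    history: list = field(default_factory=list)

    async def run(self, prompt: str) -> None:
        self.history.append(prompt)
        await asyncio.sleep(0)  # yield control so concurrent tests interleave


async def run_test(agent: StubAgent, prompt: str) -> list:
    """Simulate one test: two agent turns, then return what the agent saw."""
    await agent.run(prompt)
    await agent.run(prompt + " (follow-up)")
    return list(agent.history)


def make_agent() -> StubAgent:
    # Factory: every call returns a fresh, isolated instance.
    return StubAgent(name="isolated")


async def main():
    # One shared instance used by two concurrent tests: state leaks across tests.
    shared = StubAgent(name="shared")
    shared_runs = await asyncio.gather(
        run_test(shared, "a"), run_test(shared, "b")
    )
    # One instance per test, produced by the factory: no leakage.
    isolated_runs = await asyncio.gather(
        run_test(make_agent(), "a"), run_test(make_agent(), "b")
    )
    return shared_runs, isolated_runs


shared_runs, isolated_runs = asyncio.run(main())
```

In the shared case, the history returned by the first test also contains prompts from the second; with the factory, each history contains only that test's own prompts.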
What to measure
- Tool behavior: Expect.tools.was_called, called_with, sequence, output_matches
- Efficiency and iterations: Expect.performance.max_iterations, Expect.path.efficiency
- Quality: Expect.judge.llm, Expect.judge.multi_criteria
- Performance: response times, concurrency (see metrics)
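As a rough picture of what the tool-behavior and efficiency checks above look at, the sketch below runs comparable checks over a hand-written trace of tool calls. Everything here is hypothetical: `ToolCall` and the helper functions are plain-Python stand-ins named after the categories in the list, not the library's implementation, and the efficiency formula (optimal steps divided by actual steps, capped at 1.0) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation captured during a test."""
    name: str
    arguments: dict
    output: str


def was_called(trace, name, min_times=1):
    """True if the tool was invoked at least min_times."""
    return sum(c.name == name for c in trace) >= min_times


def called_with(trace, name, expected_args):
    """True if some call to the tool included all expected argument values."""
    return any(
        c.name == name
        and all(c.arguments.get(k) == v for k, v in expected_args.items())
        for c in trace
    )


def sequence(trace, names):
    """True if the tool names occur in this order (gaps allowed)."""
    called = iter(c.name for c in trace)
    return all(n in called for n in names)


def output_matches(trace, name, substring):
    """True if some call to the tool produced output containing substring."""
    return any(c.name == name and substring in c.output for c in trace)


def path_efficiency(trace, optimal_steps):
    """Assumed metric: optimal step count over actual step count, capped at 1.0."""
    return min(1.0, optimal_steps / len(trace)) if trace else 0.0


# A hand-written trace standing in for what a real run would record.
trace = [
    ToolCall("fetch", {"url": "https://example.com"}, "Example Domain"),
    ToolCall("summarize", {"text": "Example Domain"}, "A short summary"),
]

checks = {
    "was_called": was_called(trace, "fetch"),
    "called_with": called_with(trace, "fetch", {"url": "https://example.com"}),
    "sequence": sequence(trace, ["fetch", "summarize"]),
    "output_matches": output_matches(trace, "fetch", "Example"),
    "efficiency": path_efficiency(trace, optimal_steps=2),
}
```

Quality checks that rely on an LLM judge are left out of the sketch because they need a model call rather than a deterministic comparison.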
Styles for agent evals
- Decorator tests: test_decorator_style.py
- Pytest style: test_pytest_style.py
- Datasets: test_dataset_style.py
Inspecting spans and metrics
- Session/agent: session.py
- Catalog: catalog.py
- Evaluators: evaluators/