Define the test agent
- Global default:
- Per‑test override with with_agent (place above @task):
- Factory for parallel safety:
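The three options can be modeled in a few lines. The sketch below is a minimal, self-contained illustration and not the library's implementation: `Agent`, `use_default`, and the bodies of `task` and `with_agent` are hypothetical stand-ins, and only the names with_agent and @task come from the text above. It assumes a per-test override takes precedence over the global default, and that a factory is called once per test run so that tests running in parallel never share mutable agent state.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union


@dataclass
class Agent:
    """Hypothetical stand-in for a real agent; carries mutable per-run state."""
    name: str
    history: List[str] = field(default_factory=list)


# An agent source is either a ready-made instance or a zero-argument factory.
AgentSource = Union[Agent, Callable[[], Agent]]
_default_source: Optional[AgentSource] = None


def use_default(source: AgentSource) -> None:
    """Hypothetical helper: register the global default agent."""
    global _default_source
    _default_source = source


def task(fn):
    """Stand-in for @task: resolve the agent at call time, then run the test."""
    def run():
        # A per-test override (set by with_agent) wins over the global default.
        source = getattr(run, "agent_source", None)
        if source is None:
            source = _default_source
        if source is None:
            raise RuntimeError("no agent configured")
        # A factory is called on every run, so each run gets a fresh agent.
        agent = source() if callable(source) else source
        return fn(agent)
    return run


def with_agent(source: AgentSource):
    """Stand-in for with_agent: attach an override to the wrapped task."""
    def decorate(run):
        run.agent_source = source
        return run
    return decorate


use_default(Agent("default"))


@task
def default_case(agent):
    return agent.name


@with_agent(Agent("override"))  # placed above @task
@task
def override_case(agent):
    return agent.name


@with_agent(lambda: Agent("fresh"))  # factory: new instance per run
@task
def factory_case(agent):
    agent.history.append("step")
    return agent
```

In this sketch the decorator order matters because `with_agent` attaches its override to the wrapper that `task` returns. Passing a shared instance is fine for sequential runs, while the factory form avoids two concurrent tests appending to the same `history`.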
What to measure
- Tool behavior: Expect.tools.was_called, called_with, sequence, output_matches
- Efficiency and iterations: Expect.performance.max_iterations, Expect.path.efficiency
- Quality: Expect.judge.llm, Expect.judge.multi_criteria
- Performance: response times, concurrency (see metrics)
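To make the tool-behavior and iteration checks concrete, here is a rough sketch of what such assertions might verify against a recorded run. It does not use the Expect API: `ToolCall` and the four helper functions are hypothetical, and the semantics are assumptions (in particular, that a sequence check allows unrelated calls in between and that argument matching is partial).

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation made by the agent."""
    name: str
    arguments: Dict[str, Any]


def was_called(trace: List[ToolCall], name: str, min_times: int = 1) -> bool:
    """True if the named tool appears at least min_times in the trace."""
    return sum(call.name == name for call in trace) >= min_times


def called_with(trace: List[ToolCall], name: str, expected: Dict[str, Any]) -> bool:
    """True if some call to the tool contains every expected argument."""
    return any(
        call.name == name
        and all(k in call.arguments and call.arguments[k] == v
                for k, v in expected.items())
        for call in trace
    )


def follows_sequence(trace: List[ToolCall], names: List[str]) -> bool:
    """True if names occur in this order; other calls may sit in between."""
    remaining = iter(call.name for call in trace)
    # Each membership test consumes the iterator up to the match,
    # so later names must appear after earlier ones.
    return all(name in remaining for name in names)


def within_max_iterations(iterations: int, limit: int) -> bool:
    """True if the agent finished in at most limit iterations."""
    return iterations <= limit


# Example trace: the agent fetched a page, then summarized it.
trace = [
    ToolCall("fetch", {"url": "https://example.com", "timeout": 10}),
    ToolCall("summarize", {"max_words": 50}),
]
```

Quality checks such as LLM judging depend on a model call and are not modeled here.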
Styles for agent evals
- Decorator tests: test_decorator_style.py
- Pytest style: test_pytest_style.py
- Datasets: test_dataset_style.py
Inspecting spans and metrics
- Session/agent: session.py
- Catalog: catalog.py
- Evaluators: evaluators/