Typical assertions
- Correctness: Expect.content.contains, Expect.tools.output_matches
- Tool usage: Expect.tools.was_called, called_with, count, success_rate, failed
- Efficiency: Expect.performance.max_iterations, Expect.path.efficiency, Expect.tools.sequence
- Quality: Expect.judge.llm (rubric-based)
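To make the tool-usage checks above concrete, here is a minimal, library-independent sketch of what such checks might compute over a recorded list of tool calls. The `ToolCall` record and the helper functions are hypothetical names invented for illustration; they are not the `Expect` API, whose signatures are not shown here.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation (not the library's type)."""
    name: str
    args: dict
    ok: bool


def was_called(calls, name):
    # True if the tool appears at least once in the recorded calls.
    return any(c.name == name for c in calls)


def call_count(calls, name):
    # Number of times the named tool was invoked.
    return sum(1 for c in calls if c.name == name)


def success_rate(calls, name=None):
    # Fraction of successful calls, optionally restricted to one tool.
    selected = [c for c in calls if name is None or c.name == name]
    if not selected:
        return 0.0
    return sum(c.ok for c in selected) / len(selected)


# Example data: one search, then a failed fetch followed by a retry.
calls = [
    ToolCall("search", {"q": "weather"}, True),
    ToolCall("fetch", {"url": "https://example.com"}, False),
    ToolCall("fetch", {"url": "https://example.com"}, True),
]
```

With this data, `was_called(calls, "search")` is true, `call_count(calls, "fetch")` is 2, and `success_rate(calls, "fetch")` is 0.5.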
Examples
- Decorators: test_decorator_style.py
- Pytest: test_pytest_style.py
- Dataset: test_dataset_style.py, datasets/
Golden paths & sequences
Use Expect.path.efficiency and Expect.tools.sequence to encode expected tool paths and detect backtracking/repeats.
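As a rough illustration of the idea, the sketch below compares an observed tool path against a golden path. It assumes one plausible definition of efficiency (golden length divided by actual length, capped at 1.0) and treats a sequence check as an in-order subsequence match; the library's actual definitions may differ, and all function names here are hypothetical.

```python
from collections import Counter


def follows_sequence(expected, actual):
    # True if `expected` occurs in order within `actual`
    # (other calls may be interleaved between the expected steps).
    remaining = iter(actual)
    return all(step in remaining for step in expected)


def path_efficiency(golden, actual):
    # Assumed definition: ratio of golden path length to actual path length.
    if not actual:
        return 0.0
    return min(1.0, len(golden) / len(actual))


def repeated_tools(actual):
    # Tools invoked more than once, a simple signal of backtracking.
    counts = Counter(actual)
    return sorted(name for name, n in counts.items() if n > 1)


golden = ["search", "fetch", "summarize"]
actual = ["search", "fetch", "search", "fetch", "summarize"]

in_order = follows_sequence(golden, actual)   # True
efficiency = path_efficiency(golden, actual)  # 0.6
repeats = repeated_tools(actual)              # ["fetch", "search"]
```

Here the agent reaches the goal in the right order but takes five steps instead of three, so the run passes the sequence check while scoring 0.6 on efficiency and flagging two repeated tools.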
Artifacts
- Per-test JSON and OTEL .jsonl traces in ./test-reports
- Combined JSON/Markdown/HTML via runner options
- Unified assertions: catalog.py
- Session/metrics: session.py, metrics.py, span_tree.py