Typical assertions
- Correctness: Expect.content.contains, Expect.tools.output_matches
- Tool usage: Expect.tools.was_called, called_with, count, success_rate, failed
- Efficiency: Expect.performance.max_iterations, Expect.path.efficiency, Expect.tools.sequence
- Quality: Expect.judge.llm (rubric-based)
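The tool-usage checks can be thought of as predicates over the tool calls recorded during a run. The sketch below is hypothetical and self-contained. The ToolCall record and the helper names only mirror the assertion names for readability; they are not the framework's implementation or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Hypothetical sketch: not the framework's Expect.tools implementation.


@dataclass
class ToolCall:
    """One recorded tool invocation (illustrative fields)."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)
    ok: bool = True  # False if the tool reported an error


def was_called(calls: list[ToolCall], name: str) -> bool:
    return any(c.name == name for c in calls)


def called_with(calls: list[ToolCall], name: str, **expected: Any) -> bool:
    # True if at least one call to `name` received every expected argument value.
    return any(
        c.name == name
        and all(k in c.args and c.args[k] == v for k, v in expected.items())
        for c in calls
    )


def count(calls: list[ToolCall], name: str) -> int:
    return sum(1 for c in calls if c.name == name)


def success_rate(calls: list[ToolCall], name: str) -> float:
    # Fraction of calls to `name` that succeeded; 0.0 if it was never called.
    relevant = [c for c in calls if c.name == name]
    if not relevant:
        return 0.0
    return sum(c.ok for c in relevant) / len(relevant)


def failed(calls: list[ToolCall], name: str) -> bool:
    # True if any call to `name` reported an error.
    return any(c.name == name and not c.ok for c in calls)


calls = [
    ToolCall("fetch", {"url": "https://example.com"}),
    ToolCall("fetch", {"url": "https://example.org"}, ok=False),
    ToolCall("summarize"),
]
print(count(calls, "fetch"), success_rate(calls, "fetch"))  # 2 0.5
```

Each helper returns a plain bool or number. A test can therefore combine the checks with ordinary assert statements or thresholds, for example requiring a success rate of at least 0.9.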
Examples
- Decorators: test_decorator_style.py
- Pytest: test_pytest_style.py
- Dataset: test_dataset_style.py, datasets/
Golden paths & sequences
Use Expect.path.efficiency and Expect.tools.sequence to encode expected tool paths and to detect backtracking and repeated calls.
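One way to picture these checks is to compare the recorded sequence of tool names against a golden path. This is a hypothetical sketch. It assumes efficiency is the ratio of golden-path length to actual path length. The framework's own Expect.path.efficiency scoring and Expect.tools.sequence matching rules may differ.

```python
from __future__ import annotations

# Hypothetical golden-path checks over a list of tool names; not the framework's API.


def path_efficiency(actual: list[str], golden: list[str]) -> float:
    """Golden-path length / actual length, capped at 1.0 (1.0 = no extra calls)."""
    if not actual:
        return 0.0
    return min(1.0, len(golden) / len(actual))


def follows_sequence(actual: list[str], expected: list[str]) -> bool:
    """True if `expected` occurs in `actual` in order (other calls may sit in between)."""
    it = iter(actual)
    # `step in it` advances the shared iterator, so order is enforced.
    return all(step in it for step in expected)


def repeats(actual: list[str]) -> list[int]:
    """Indices where a tool is called again immediately after itself."""
    return [i for i in range(1, len(actual)) if actual[i] == actual[i - 1]]


def backtracks(actual: list[str]) -> list[str]:
    """Tools revisited after a different tool ran in between.

    Consecutive duplicates are reported by repeats(), not here.
    """
    seen: set[str] = set()
    result: list[str] = []
    prev = None
    for name in actual:
        if name in seen and name != prev:
            result.append(name)
        seen.add(name)
        prev = name
    return result


golden = ["search", "fetch", "summarize"]
actual = ["search", "fetch", "search", "fetch", "fetch", "summarize"]
print(path_efficiency(actual, golden))  # 0.5
print(backtracks(actual))  # ['search', 'fetch']
```

Under this definition, the example run still visits the golden steps in order, so follows_sequence returns True. The run is nonetheless penalized for the extra search/fetch round trip and the duplicated fetch.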
Artifacts
- Per-test JSON and OTEL .jsonl traces in ./test-reports
- Combined JSON/Markdown/HTML reports via runner options
- Unified assertions: catalog.py
- Session/metrics: session.py, metrics.py, span_tree.py
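The per-test OTEL traces are JSON Lines files, so the standard library can load them for ad-hoc inspection. This is a minimal sketch that assumes only one JSON value per non-blank line. The record schema depends on the exporter, so no particular keys are assumed. find_trace_files is a hypothetical helper that assumes the traces carry a .jsonl extension somewhere under ./test-reports.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterable


def parse_jsonl(lines: Iterable[str]) -> list[Any]:
    """Parse JSON Lines: one JSON value per non-blank line."""
    records: list[Any] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON") from exc
    return records


def load_jsonl(path: str | Path) -> list[Any]:
    with Path(path).open(encoding="utf-8") as f:
        return parse_jsonl(f)


def find_trace_files(report_dir: str | Path = "./test-reports") -> list[Path]:
    # Hypothetical helper: assumes traces use the .jsonl extension under report_dir.
    root = Path(report_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.jsonl"))


if __name__ == "__main__":
    for trace in find_trace_files():
        print(f"{trace}: {len(load_jsonl(trace))} records")
```

Parsing is kept separate from file access so that parse_jsonl can be reused on any line source, such as a string split into lines.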