Connect your server to an agent and validate correctness, robustness, performance, and path efficiency.
Expect.content.contains, Expect.tools.output_matches
Expect.tools.was_called, called_with, count, success_rate, failed
Expect.performance.max_iterations, Expect.path.efficiency, Expect.tools.sequence
Expect.judge.llm (rubric-based)
Use Expect.path.efficiency and Expect.tools.sequence to encode expected tool paths and detect backtracking/repeats.
.jsonl traces in ./test-reports
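To make the path checks above concrete, here is a minimal standalone sketch of what "expected tool path", "efficiency", and "repeats" can mean for a list of tool names. It does not call the Expect API and is not the framework's own implementation: the helper names (`in_order`, `path_efficiency`, `repeated_tools`, `load_jsonl`) and the efficiency formula (expected steps divided by actual steps) are assumptions made for illustration. The trace schema is not described here, so the loader only parses each line of each `.jsonl` file into a Python object and leaves field names to the reader.

```python
import json
from collections import Counter
from pathlib import Path


def in_order(actual, expected):
    """True if `expected` appears in `actual` as an ordered subsequence."""
    remaining = iter(actual)
    # `tool in remaining` consumes the iterator up to the first match,
    # so each expected tool must occur after the previous one.
    return all(tool in remaining for tool in expected)


def path_efficiency(actual, expected):
    """Expected step count over actual step count, capped at 1.0 (assumed metric)."""
    if not actual:
        return 1.0 if not expected else 0.0
    return min(1.0, len(expected) / len(actual))


def repeated_tools(actual):
    """Tools called more than once, mapped to their call counts."""
    return {name: n for name, n in Counter(actual).items() if n > 1}


def load_jsonl(directory):
    """Parse every non-empty line of every .jsonl file directly under `directory`."""
    records = []
    for path in sorted(Path(directory).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    records.append(json.loads(line))
    return records


# Hypothetical run: the agent backtracked and repeated search and fetch.
expected_path = ["search", "fetch", "summarize"]
actual_path = ["search", "fetch", "search", "fetch", "summarize"]

print(in_order(actual_path, expected_path))         # True
print(path_efficiency(actual_path, expected_path))  # 0.6
print(repeated_tools(actual_path))                  # {'search': 2, 'fetch': 2}
```

Calling `load_jsonl("./test-reports")` would return one parsed object per trace line. Which keys hold the tool names depends on the trace format, so check a trace file before extracting a path from it.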