## Unified assertion API with the `Expect` catalog: content, tools, performance, judges, path
Assertions use a single entrypoint and an expressive catalog. Prefer structural checks for stability, and combine them with judges only when necessary.

Use one entrypoint:

```python
await session.assert_that(
    expectation,            # any Expect.* check from the catalog below
    name="readable_label",  # optional: labels the check in reports
    response=response,      # the response under test, for content checks
)
```

### Expect catalog

**Content**

- `Expect.content.contains(text, case_sensitive=False)`
- `Expect.content.not_contains(text, case_sensitive=False)`
- `Expect.content.regex(pattern, case_sensitive=False)`
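For example, a content check plugs into the entrypoint like this (a minimal sketch assuming an async test harness; `session`, `response`, the literal strings, and the `name` labels are placeholders, not part of the catalog):

```python
# Check that the final response mentions a confirmation, ignoring case.
await session.assert_that(
    Expect.content.contains("order confirmed", case_sensitive=False),
    name="confirmation_present",
    response=response,
)

# Enforce a format with a regex instead of an exact string.
await session.assert_that(
    Expect.content.regex(r"order #\d+"),
    name="order_number_format",
    response=response,
)
```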
**Tools**

- `Expect.tools.was_called(name, min_times=1)`
- `Expect.tools.called_with(name, {args})`
- `Expect.tools.count(name, expected_count)`
- `Expect.tools.success_rate(min_rate, tool_name=None)`
- `Expect.tools.failed(name)`
- `Expect.tools.output_matches(name, expected, field_path?, match_type?, case_sensitive?, call_index?)`
- `Expect.tools.sequence([names], allow_other_calls=False)`
`field_path` supports nested dict/list access, e.g. `content[0].text` or `content.0.text` (see the sketch below).
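A sketch of how the tool checks compose (the tool names `search` and `fetch`, the expected values, and the `match_type` value are assumptions for illustration):

```python
# The tool must have been called at least once.
await session.assert_that(
    Expect.tools.was_called("fetch", min_times=1),
    name="fetch_called",
)

# Match a nested field of the tool's output via field_path.
await session.assert_that(
    Expect.tools.output_matches(
        "fetch",
        expected="Example Domain",
        field_path="content[0].text",  # nested access, as noted above
        match_type="contains",         # assumed value for illustration
    ),
    name="fetch_output_text",
)

# Require a strict call order with no other tools in between.
await session.assert_that(
    Expect.tools.sequence(["search", "fetch"], allow_other_calls=False),
    name="search_then_fetch",
)
```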
**Performance**

- `Expect.performance.max_iterations(n)`
- `Expect.performance.response_time_under(ms)`
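For instance (a sketch; the limits are arbitrary and `session` comes from the surrounding harness):

```python
# Cap agent loop iterations and end-to-end latency.
await session.assert_that(
    Expect.performance.max_iterations(5),
    name="few_iterations",
)
await session.assert_that(
    Expect.performance.response_time_under(30_000),  # milliseconds
    name="under_30s",
)
```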
**Judges**

- `Expect.judge.llm(rubric, min_score=0.8, include_input=False, require_reasoning=True)`
- `Expect.judge.multi_criteria(criteria, aggregate_method="weighted", require_all_pass=False, include_confidence=True, use_cot=True, model=None)`
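A sketch of a single-rubric judge (the rubric text is a placeholder; `session` and `response` come from the harness):

```python
# Grade the response against a natural-language rubric, requiring a
# score of at least 0.8 and a written justification from the judge.
await session.assert_that(
    Expect.judge.llm(
        rubric="The response accurately summarizes the fetched page "
               "without inventing details.",
        min_score=0.8,
        require_reasoning=True,
    ),
    name="summary_quality",
    response=response,
)
```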
**Path**

- `Expect.path.efficiency(optimal_steps?, expected_tool_sequence?, allow_extra_steps=0, penalize_backtracking=True, penalize_repeated_tools=True, tool_usage_limits?, default_tool_limit=1)`
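For example (a sketch; the tool sequence and step allowance are placeholders for whatever the test considers optimal):

```python
# The agent should follow roughly the optimal tool path, tolerating
# at most one extra step and penalizing backtracking.
await session.assert_that(
    Expect.path.efficiency(
        expected_tool_sequence=["search", "fetch"],
        allow_extra_steps=1,
        penalize_backtracking=True,
    ),
    name="efficient_path",
)
```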
### Best practices

- Prefer exact checks (`Expect.tools.output_matches`) for tool outputs when possible for stability.
- Use `name` in `assert_that(..., name="...")` to label checks in reports.
- Combine a judge (e.g., `min_score=0.8`) with one or two structural checks (like `output_matches`) for resilient tests.
- Assertions evaluate immediately unless `when="end"` is used. If an assertion depends on final metrics (e.g., success rate), defer it (see the sketch below).
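A sketch of the deferred pattern (only `when="end"` is taken from the note above; the check, threshold, and label are placeholders):

```python
# Success rate is a final metric, so defer evaluation to end-of-session.
await session.assert_that(
    Expect.tools.success_rate(min_rate=0.9),
    name="tools_mostly_succeed",
    when="end",
)
```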