Test Runs
View execution results when you run a test set against an endpoint. Each test run contains all individual test results with detailed metrics, conversation history, and performance data.
What are Test Runs? A test run is created when you execute a test set against an endpoint. It captures all test results, execution metadata, and evaluation metrics for analysis.

By clicking on a test run, you’ll see the overview of how the test run went. From here you can:
- Review: Manually revise the automated test evaluation
- Compare: Compare against a baseline test run
- Re-run: Execute the test set again with the same configuration
- Download: Export test run data as CSV

Review Test Runs
Reviews allow human evaluators to validate or override automated test evaluations.
- Select a Test Run from the Test Runs overview page
- Click on the Reviews tab, then Add Your Review
- Select a Review Status:
- Pass: The test passes the metric
- Fail: The test fails the metric
- Add a comment explaining your review decision
Compare Test Runs
Compare test runs to identify regressions and improvements between executions.
How to Compare:
- Click the Compare button (top right, next to Download)
- Select a baseline test run to compare against
- View the test-by-test comparison of the test set
Comparison Filters:
Use filters to focus on specific changes:
- All Tests: Show all tests from both runs
- Improved: Tests that now pass but failed in baseline
- Regressed: Tests that now fail but passed in baseline
- Unchanged: Tests with the same pass/fail status
Next Steps - Review failed tests to understand issues - Compare against baseline runs to detect regressions - Add human reviews to test results - Export results for reporting or analysis