Skip to Content
PlatformTest Runs

Test Runs

View execution results when you run a test set against an endpoint. Each test run contains all individual test results with detailed metrics, conversation history, and performance data.

What are Test Runs? A test run is created when you execute a test set against an endpoint. It captures all test results, execution metadata, and evaluation metrics for analysis.

Test Runs Page

By clicking on a test run, you’ll see the overview of how the test run went. From here you can:

  • Review: Manually revise the automated test evaluation
  • Compare: Compare against a baseline test run
  • Re-run: Execute the test set again with the same configuration
  • Download: Export test run data as CSV

Test Run Detail Page

Review Test Runs

Reviews allow human evaluators to validate or override automated test evaluations.

  1. Select a Test Run from the Test Runs  overview page
  2. Click on the Reviews tab, then Add Your Review
  3. Select a Review Status:
    • Pass: The test passes the metric
    • Fail: The test fails the metric
  4. Add a comment explaining your review decision

Compare Test Runs

Compare test runs to identify regressions and improvements between executions.

How to Compare:

  1. Click the Compare button (top right, next to Download)
  2. Select a baseline test run to compare against
  3. View the test-by-test comparison of the test set

Comparison Filters:

Use filters to focus on specific changes:

  • All Tests: Show all tests from both runs
  • Improved: Tests that now pass but failed in baseline
  • Regressed: Tests that now fail but passed in baseline
  • Unchanged: Tests with the same pass/fail status

Next Steps - Review failed tests to understand issues - Compare against baseline runs to detect regressions - Add human reviews to test results - Export results for reporting or analysis