Test Execution
The SDK provides methods on TestSet for executing tests against endpoints, re-scoring existing
outputs with different metrics, and managing which metrics are assigned to a test set.
For background on execution concepts (metrics hierarchy, execution modes, output reuse), see the Platform Test Execution guide.
Executing Test Sets
Use execute() to run every test in a set against an endpoint. The endpoint is called for each
test and the responses are scored against the configured metrics.
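A minimal sketch of a basic execution, assuming a test_set object has already been retrieved and that execute() takes an endpoint identifier (parameter names may differ in your SDK version):

```python
# Sketch: run every test in the set against an endpoint.
# Assumes `test_set` was fetched earlier and that execute() accepts an
# endpoint identifier; adjust names to match your SDK version.
run = test_set.execute(endpoint="my-endpoint-id")

# The returned run object is assumed to carry status and per-test scores.
print(run)
```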
Execution Mode
Control whether tests are sent to the endpoint in parallel or one at a time. Use the
ExecutionMode enum or the strings "parallel" / "sequential".
- ExecutionMode.PARALLEL (default): Tests are dispatched concurrently for maximum throughput.
- ExecutionMode.SEQUENTIAL: Tests run one after another. Use this for endpoints with strict rate limits or when order matters.
Invalid mode values raise ValueError.
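A sketch of both ways to specify the mode, assuming execute() accepts an execution_mode keyword (hypothetical parameter name) and that ExecutionMode is importable from the SDK package:

```python
from my_sdk import ExecutionMode  # assumed import path; adjust to your package layout

# Enum form.
run = test_set.execute(endpoint="my-endpoint-id",
                       execution_mode=ExecutionMode.SEQUENTIAL)

# Equivalent string form.
run = test_set.execute(endpoint="my-endpoint-id", execution_mode="sequential")

# Unrecognized mode values are rejected.
try:
    test_set.execute(endpoint="my-endpoint-id", execution_mode="batch")
except ValueError as exc:
    print(exc)
```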
Custom Metrics
Pass a metrics list to override the test set-level and behavior-level metrics for a single execution.
Each item can be a dict (with at least an "id" key) or a metric name string (resolved
automatically via the /metrics API).
Execution-time metrics take the highest priority. When provided, they replace both the test set-level and behavior-level metrics for that run.
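A sketch of passing per-execution metrics, assuming execute() accepts a metrics keyword; the metric ID and name below are placeholders:

```python
# Override test set-level and behavior-level metrics for this run only.
run = test_set.execute(
    endpoint="my-endpoint-id",
    metrics=[
        {"id": "00000000-0000-0000-0000-000000000000"},  # dict with an "id" key
        "answer_relevance",                              # name resolved via the /metrics API
    ],
)
```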
Re-scoring Existing Outputs
rescore() re-evaluates metrics on outputs from a previous test run without calling the endpoint
again. This is useful when you want to apply new or different metrics to an existing set of
responses.
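A sketch of re-scoring the latest completed run with a different metric set, assuming rescore() mirrors execute()'s endpoint and metrics parameters:

```python
# Re-evaluate stored outputs with new metrics; the endpoint is not called again.
rescored = test_set.rescore(
    endpoint="my-endpoint-id",
    metrics=["faithfulness"],  # placeholder metric name
)
```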
Specifying Which Run to Re-score
By default, rescore() uses the latest completed run for the test set / endpoint combination.
You can also pass a specific run:
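A sketch, assuming rescore() accepts a run identifier or run object via a test_run keyword (hypothetical parameter name):

```python
# Re-score a specific earlier run instead of the latest completed one.
rescored = test_set.rescore(
    endpoint="my-endpoint-id",
    test_run="11111111-1111-1111-1111-111111111111",  # placeholder run ID
    metrics=["faithfulness"],
)
```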
If no completed run exists for the combination, rescore() raises a ValueError.
Last Completed Run
last_run() returns a summary of the most recent completed test run for a given test set and
endpoint. It returns None if no completed run exists.
Combine with rescore() for an inspect-then-rescore workflow:
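A sketch of that workflow, assuming last_run() returns a summary object (attribute names are illustrative):

```python
# Inspect the latest completed run, then re-score its outputs with more metrics.
last = test_set.last_run(endpoint="my-endpoint-id")
if last is None:
    print("No completed run yet")
else:
    print(last)  # summary of the most recent completed run
    rescored = test_set.rescore(
        endpoint="my-endpoint-id",
        metrics=["toxicity"],  # placeholder metric name
    )
```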
Managing Test Set Metrics
Manage which metrics are assigned to a test set. These metrics are used by default when a test set is executed without explicit per-execution metrics.
Get Current Metrics
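A sketch, assuming a get_metrics() accessor (hypothetical method name) that returns the metrics currently assigned to the test set:

```python
# List the metrics currently assigned to the test set.
for metric in test_set.get_metrics():  # hypothetical accessor name
    print(metric)
```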
Add Metrics
Add a single metric or a list. Each item can be a dict with an "id" key, a UUID string, or a
metric name string.
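A sketch, assuming an add_metrics() method (hypothetical name) that accepts a single item or a list in any of the supported forms:

```python
# Add one metric by name.
test_set.add_metrics("answer_relevance")  # hypothetical method name

# Add several metrics in mixed forms.
test_set.add_metrics([
    {"id": "00000000-0000-0000-0000-000000000000"},  # dict with an "id" key
    "22222222-2222-2222-2222-222222222222",          # UUID string
    "faithfulness",                                  # metric name
])
```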
Remove Metrics
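A sketch, assuming a remove_metrics() method (hypothetical name) that mirrors add_metrics():

```python
# Remove a single metric by name, or several at once.
test_set.remove_metrics("faithfulness")  # hypothetical method name
test_set.remove_metrics(["answer_relevance", "toxicity"])
```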
Complete Workflow
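A sketch of an end-to-end workflow combining the calls above; all method names, identifiers, and metric names are illustrative:

```python
from my_sdk import ExecutionMode  # assumed import path

# 1. Assign default metrics to the test set (hypothetical method name).
test_set.add_metrics(["answer_relevance", "faithfulness"])

# 2. Execute every test sequentially against an endpoint.
run = test_set.execute(endpoint="my-endpoint-id",
                       execution_mode=ExecutionMode.SEQUENTIAL)

# 3. Inspect the latest completed run.
last = test_set.last_run(endpoint="my-endpoint-id")
print(last)

# 4. Re-score the stored outputs with an additional metric, without
#    calling the endpoint again.
rescored = test_set.rescore(endpoint="my-endpoint-id", metrics=["toxicity"])
```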
Related — Test Sets | Test Runs | SDK Metrics (Evaluation Engine) | Platform Test Execution