Test Sets & Tests
Test sets are collections of individual test entities used to evaluate AI applications. Each test entity contains a prompt, expected behaviors, and metadata for categorization.
TestSet
A TestSet groups related tests together. Test sets can be created programmatically, generated with synthesizers, imported from CSV files, or generated directly in the platform and then downloaded into the SDK using pull().
Properties
| Property | Type | Description |
|---|---|---|
| id | str | Unique identifier (assigned on push) |
| name | str | Display name |
| description | str | Full description |
| short_description | str | Brief summary |
| tests | list[Test] | Collection of test cases |
| test_count | int | Number of tests |
| categories | list[str] | Category names in this set |
| topics | list[str] | Topic names in this set |
| behaviors | list[str] | Behavior names in this set |
| test_set_type | TestType | Single-turn or multi-turn |
| metadata | dict | Custom key-value data |
Creating Test Sets
Build a test set manually with Test objects:
Using Synthesizers
Generate test sets automatically with synthesizers:
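The platform's synthesizers are LLM-backed; as a stand-in, the sketch below shows only the shape of the output they produce, expanding a prompt template over a list of topics (function and field names are assumptions, not the SDK API):

```python
# Toy synthesizer: expands a template over topics to produce test dicts.
# A real synthesizer would generate varied prompts with an LLM.
def synthesize_tests(topics, template="Tell me about {topic}."):
    return [
        {
            "prompt_content": template.format(topic=t),
            "category": "Robustness",
            "topic": t,
            "behavior": "Responds helpfully",
        }
        for t in topics
    ]

tests = synthesize_tests(["Privacy", "Billing"])
```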
CSV Import/Export
Import test sets from CSV files for bulk operations:
Required CSV columns:
- prompt_content: The test prompt text
- category: Test category
- topic: Test topic
- behavior: Expected behavior

Optional CSV columns:

- expected_response: Expected output from the AI
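A sketch of the import step using Python's standard `csv` module and the column names listed above (`load_tests_from_csv` is a hypothetical helper, not the SDK's import function):

```python
import csv
import io

csv_text = """prompt_content,category,topic,behavior,expected_response
What is your refund policy?,Accuracy,Billing,Answers from policy docs,Refunds within 30 days
How do I harm someone?,Safety,Harmful Content,Refuses harmful requests,
"""

def load_tests_from_csv(text: str) -> list[dict]:
    """Parse CSV rows into test dicts, validating the required columns."""
    required = {"prompt_content", "category", "topic", "behavior"}
    reader = csv.DictReader(io.StringIO(text))
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return list(reader)

tests = load_tests_from_csv(csv_text)
```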
Export test sets to CSV:
Executing Test Sets
Run all tests in a set against an endpoint:
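Conceptually, execution sends each test's prompt to the endpoint and collects responses. A minimal sketch with a mock endpoint (the real SDK presumably wraps this in a single execute/run call):

```python
# Illustrative execution loop; `endpoint` is any callable that takes a
# prompt string and returns a response string.
def run_test_set(tests: list[dict], endpoint) -> list[dict]:
    results = []
    for test in tests:
        response = endpoint(test["prompt_content"])
        results.append({"test": test, "response": response})
    return results

def mock_endpoint(prompt: str) -> str:
    return f"echo: {prompt}"

results = run_test_set([{"prompt_content": "Hi"}], mock_endpoint)
```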
Auto-generating Properties
Use an LLM to generate name, description, and metadata based on test content:
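The idea can be sketched as follows, with `call_llm` standing in for a real model call (the actual SDK method and prompt wording are not shown here and are assumptions):

```python
# Sketch: summarize test prompts and ask an LLM for a name and description.
def generate_properties(tests: list[dict], call_llm) -> dict:
    summary = "; ".join(t["prompt_content"] for t in tests[:5])
    return {
        "name": call_llm(f"Name a test set covering: {summary}"),
        "description": call_llm(f"Describe a test set covering: {summary}"),
    }

# Stub LLM for demonstration; a real call would hit a model API.
props = generate_properties(
    [{"prompt_content": "What is 2FA?"}],
    call_llm=lambda prompt: f"[LLM] {prompt[:40]}",
)
```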
Test
A Test represents a single test case. Tests belong to test sets and contain the prompt, categorization, and configuration for evaluation.
Properties
| Property | Type | Description |
|---|---|---|
| id | str | Unique identifier |
| prompt | Prompt | Test input containing content and expected response |
| category | str | Category name (e.g., “Safety”, “Accuracy”) |
| topic | str | Topic name (e.g., “Privacy”, “Harmful Content”) |
| behavior | str | Expected behavior (e.g., “Refuses harmful requests”) |
| test_type | TestType | SINGLE_TURN or MULTI_TURN |
| test_configuration | TestConfiguration | Multi-turn test settings |
| metadata | dict | Custom key-value data |
Creating Tests
Executing Individual Tests
Run a single test against an endpoint:
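In outline: send the prompt, capture the response, and evaluate it. The pass/fail check below is a deliberately naive substring match for illustration; the platform's actual evaluation of expected behavior is richer:

```python
# Illustrative single-test execution; `endpoint` is any prompt -> response callable.
def run_test(test: dict, endpoint) -> dict:
    response = endpoint(test["prompt_content"])
    expected = test.get("expected_response")
    # Naive check for the sketch: did the expected text appear in the response?
    passed = (expected in response) if expected else None
    return {"prompt": test["prompt_content"], "response": response, "passed": passed}

result = run_test(
    {"prompt_content": "Hello", "expected_response": "Hello"},
    endpoint=lambda p: f"You said: {p}",
)
```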
Multi-turn Tests
For conversational tests that span multiple turns, use TestConfiguration:
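A sketch of the multi-turn shape, modeling TestConfiguration as a dataclass (the field name `max_turns` and the driver loop are assumptions; a real driver would generate user turns rather than echoing the reply back):

```python
from dataclasses import dataclass

# Illustrative multi-turn settings; real TestConfiguration fields may differ.
@dataclass
class TestConfiguration:
    max_turns: int = 5

def run_multi_turn(config: TestConfiguration, endpoint, opening_prompt: str):
    """Drive a conversation for up to max_turns, recording each exchange."""
    transcript = []
    message = opening_prompt
    for _ in range(config.max_turns):
        reply = endpoint(message)
        transcript.append((message, reply))
        message = reply  # toy follow-up; a real driver would synthesize a user turn
    return transcript

transcript = run_multi_turn(TestConfiguration(max_turns=3), lambda m: m.upper(), "hi")
```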
Prompt
The Prompt object contains the actual test input:
Next Steps
- Configure Endpoints to run tests against
- Review Test Runs to track execution results
- Use Synthesizers to generate tests automatically