Test Set Type
A required classification for a test set, either single-turn or multi-turn, that determines which tests it can contain and how they are executed.
Overview
Every test set in Rhesis has a type that determines which tests can be added to it and how those tests are run. The type must be specified when creating a test set and cannot be changed after creation.
Available Types
Single-Turn: The test set contains single-turn tests, where each test is a single input/output exchange. Execution sends each test's prompt to the endpoint and evaluates the response independently.
Multi-Turn: The test set contains multi-turn tests, where each test is a sequence of conversation turns. Execution uses Penelope to conduct a full conversation and evaluates the outcome at the conversation level.
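The difference between the two types can be sketched as a small enumeration with one execution strategy per value. This is only an illustrative model, not the Rhesis implementation: the names `SetType` and `execution_strategy` and the string values are assumptions made for this example.

```python
from enum import Enum


class SetType(Enum):
    """Hypothetical model of the two test set types described above."""

    SINGLE_TURN = "Single-Turn"
    MULTI_TURN = "Multi-Turn"


def execution_strategy(set_type: SetType) -> str:
    """Describe how a test set of the given type would be executed."""
    if set_type is SetType.SINGLE_TURN:
        # Each test is one prompt/response exchange, judged on its own.
        return "send each prompt to the endpoint and evaluate each response independently"
    if set_type is SetType.MULTI_TURN:
        # Each test is a whole conversation, judged by its overall outcome.
        return "conduct a full conversation and evaluate the outcome at the conversation level"
    raise ValueError(f"unknown test set type: {set_type!r}")


for set_type in SetType:
    print(f"{set_type.value}: {execution_strategy(set_type)}")
```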
Why Type Enforcement Matters
Mixing single-turn and multi-turn tests in the same set would cause execution mismatches, because the two kinds of test require fundamentally different execution strategies and a single runner cannot handle both. Type enforcement ensures:
- All tests in a set can be executed with the same runner
- Metrics and pass/fail thresholds are applied consistently
- Penelope is only invoked for multi-turn test sets
- Results are comparable across tests in the same set
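A minimal sketch of how such enforcement might look, assuming a check that runs whenever a test is added to a set. The classes `TypedTest` and `TypedTestSet` and the method `add_test` are hypothetical and exist only for this example; they are not part of the Rhesis SDK.

```python
from dataclasses import dataclass, field
from enum import Enum


class SetType(Enum):
    SINGLE_TURN = "Single-Turn"
    MULTI_TURN = "Multi-Turn"


@dataclass
class TypedTest:
    """Hypothetical test record that carries its own type."""

    name: str
    test_type: SetType


@dataclass
class TypedTestSet:
    """Hypothetical test set that only accepts tests of its own type."""

    name: str
    set_type: SetType
    tests: list = field(default_factory=list)

    def add_test(self, test: TypedTest) -> None:
        # Reject any test whose type differs from the set's type, so every
        # test in the set can be handled by the same runner.
        if test.test_type is not self.set_type:
            raise ValueError(
                f"cannot add {test.test_type.value} test {test.name!r} "
                f"to {self.set_type.value} test set {self.name!r}"
            )
        self.tests.append(test)


chat_set = TypedTestSet("support-bot conversations", SetType.MULTI_TURN)
chat_set.add_test(TypedTest("refund flow", SetType.MULTI_TURN))  # accepted

try:
    chat_set.add_test(TypedTest("one-shot FAQ", SetType.SINGLE_TURN))
except ValueError as err:
    print(f"validation error: {err}")
```

Because the check runs on every insert, a set can never reach execution with a mixture of types.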
Specifying the Type
The type must be provided when creating a test set, whether through the UI, the SDK, or the backend API.
Attempting to assign a single-turn test to a multi-turn test set (or vice versa) will result in a validation error.
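The two creation-time rules, that the type is required and that it cannot be changed afterwards, can be modelled with a frozen dataclass whose type field has no default. This is a sketch under assumed names (`SetSpec`, `set_type`, and the string values); the actual SDK call and API payload are not shown here and may look different.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum


class SetType(Enum):
    SINGLE_TURN = "Single-Turn"
    MULTI_TURN = "Multi-Turn"


@dataclass(frozen=True)
class SetSpec:
    """Hypothetical test set definition with a required, immutable type."""

    name: str
    set_type: SetType  # no default value, so callers must supply it


# The type is supplied at creation time.
spec = SetSpec(name="FAQ regression", set_type=SetType.SINGLE_TURN)
print(spec.set_type.value)

# Omitting the type fails immediately.
try:
    SetSpec(name="FAQ regression")
except TypeError as err:
    print(f"creation rejected: {err}")

# Changing the type after creation is not allowed.
try:
    spec.set_type = SetType.MULTI_TURN
except FrozenInstanceError:
    print("type cannot be changed after creation")

# An unrecognised type string is rejected when it is parsed.
try:
    SetType("Batch")
except ValueError as err:
    print(f"invalid type: {err}")
```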
Best Practices
- Choose the type based on the nature of your endpoint: single-turn for stateless APIs, multi-turn for conversational agents
- Create separate test sets for single-turn and multi-turn scenarios, even when testing the same endpoint
- Verify the type before adding tests, because it cannot be changed after the test set is created
- Align test set types with the execution runner that will be used so Penelope is only invoked when needed