Test Set Type

Back to Glossary Testing Fundamentals

A required classification for a test set that determines which tests it can contain and how execution is handled—either single-turn or multi-turn.

Also known as: test set type

Overview

Every test set in Rhesis has a type that determines which tests can be added to it and how those tests are run. The type must be specified when creating a test set and cannot be changed after creation.

Available Types

Single-Turn: The test set contains single-turn tests, where each test is a single input/output exchange. Execution sends each test's prompt to the endpoint and evaluates the response independently.

Multi-Turn: The test set contains multi-turn tests, where each test is a sequence of conversation turns. Execution uses Penelope to conduct a full conversation and evaluates the outcome at the conversation level.

Why Type Enforcement Matters

Mixing single-turn and multi-turn tests in the same set would lead to execution mismatches—the platform would not know how to handle tests that require fundamentally different execution strategies. Type enforcement ensures:

All tests in a set can be executed with the same runner
Metrics and pass/fail thresholds are applied consistently
Penelope is only invoked for multi-turn test sets
Results are comparable across tests in the same set

Specifying the Type

The type must be provided when creating a test set via the UI, SDK, or backend API:

python
from rhesis.sdk import RhesisClient

client = RhesisClient()
test_set = client.test_sets.create(
      name="Customer support - refund flows",
      test_set_type="multi_turn"
)

Attempting to assign a single-turn test to a multi-turn test set (or vice versa) will result in a validation error.

Best Practices

Choose the type based on the nature of your endpoint: single-turn for stateless APIs, multi-turn for conversational agents
Create separate test sets for single-turn and multi-turn scenarios, even when testing the same endpoint
Verify the type before adding tests—it cannot be changed after the test set is created
Align test set types with the execution runner that will be used so Penelope is only invoked when needed