Test Set Type

A required classification for a test set, either single-turn or multi-turn, that determines which tests the set can contain and how they are executed.

Also known as: test set type

Overview

Every test set in Rhesis has a type that determines which tests can be added to it and how those tests are run. The type must be specified when creating a test set and cannot be changed after creation.

Available Types

Single-Turn: The test set contains single-turn tests, where each test is a single input/output exchange. Execution sends each test's prompt to the endpoint and evaluates the response independently.

Multi-Turn: The test set contains multi-turn tests, where each test is a sequence of conversation turns. Execution uses Penelope to conduct a full conversation and evaluates the outcome at the conversation level.
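
The difference between the two execution styles can be sketched in plain Python. This is only an illustration under stated assumptions: `SetType`, `Endpoint`, `run_single_turn`, `run_multi_turn`, `execute`, and `echo_endpoint` are hypothetical names, not part of the Rhesis SDK, and the scripted loop below is a simplified stand-in for a conversation driver such as Penelope.

```python
from enum import Enum
from typing import Callable, List, Union

# Hypothetical names for illustration only; none of this is the Rhesis SDK.
class SetType(Enum):
    SINGLE_TURN = "single_turn"
    MULTI_TURN = "multi_turn"

# An endpoint receives the message history so far and returns one reply.
Endpoint = Callable[[List[str]], str]

def run_single_turn(prompt: str, endpoint: Endpoint) -> str:
    # One input/output exchange; the response is evaluated on its own.
    return endpoint([prompt])

def run_multi_turn(turns: List[str], endpoint: Endpoint) -> List[str]:
    # Scripted stand-in for a conversation driver: the history accumulates,
    # and the whole transcript is what gets evaluated.
    history: List[str] = []
    for user_message in turns:
        history.append(user_message)
        history.append(endpoint(history))
    return history

def execute(set_type: SetType, test: Union[str, List[str]], endpoint: Endpoint):
    # The set's type, not the individual test, selects the execution strategy.
    if set_type is SetType.SINGLE_TURN:
        return run_single_turn(test, endpoint)
    return run_multi_turn(test, endpoint)

def echo_endpoint(history: List[str]) -> str:
    # Toy endpoint that replies to the most recent message.
    return f"reply to: {history[-1]}"

single = execute(SetType.SINGLE_TURN, "Where is my refund?", echo_endpoint)
multi = execute(SetType.MULTI_TURN, ["Hi", "I want a refund"], echo_endpoint)
```

Here `single` is one response string, while `multi` is a full transcript, which is why the two kinds of test are scored at different levels.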

Why Type Enforcement Matters

Mixing single-turn and multi-turn tests in the same set would cause execution mismatches, because the two kinds of test require fundamentally different execution strategies and the platform could not run the whole set with one of them. Type enforcement ensures:

  • All tests in a set can be executed with the same runner
  • Metrics and pass/fail thresholds are applied consistently
  • Penelope is only invoked for multi-turn test sets
  • Results are comparable across tests in the same set

Specifying the Type

The type must be provided when creating a test set via the UI, SDK, or backend API:

python
from rhesis.sdk import RhesisClient

client = RhesisClient()

# The type is set once, at creation, and cannot be changed afterwards.
test_set = client.test_sets.create(
    name="Customer support - refund flows",
    test_set_type="multi_turn",
)

Attempting to assign a single-turn test to a multi-turn test set (or vice versa) will result in a validation error.
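
As a rough model of this behavior, the sketch below rejects mismatched tests and keeps the type read-only after creation. It is a minimal local illustration, not the platform's implementation: `Case`, `TypedSet`, and `TypeMismatchError` are hypothetical names, and the actual error raised by Rhesis may look different.

```python
from dataclasses import dataclass
from typing import List

VALID_TYPES = {"single_turn", "multi_turn"}

class TypeMismatchError(ValueError):
    """Hypothetical error for a test whose type differs from its set's type."""

@dataclass(frozen=True)
class Case:
    name: str
    kind: str  # "single_turn" or "multi_turn"

class TypedSet:
    def __init__(self, name: str, set_type: str) -> None:
        if set_type not in VALID_TYPES:
            raise ValueError(f"unknown test set type: {set_type!r}")
        self.name = name
        self._set_type = set_type
        self.cases: List[Case] = []

    @property
    def set_type(self) -> str:
        # No setter is defined, so the type is fixed at creation.
        return self._set_type

    def add(self, case: Case) -> None:
        # Reject any test whose type does not match the set's type.
        if case.kind != self._set_type:
            raise TypeMismatchError(
                f"cannot add {case.kind} test {case.name!r} "
                f"to {self._set_type} set {self.name!r}"
            )
        self.cases.append(case)

refunds = TypedSet("Customer support - refund flows", "multi_turn")
refunds.add(Case("refund after 30 days", "multi_turn"))  # accepted

try:
    refunds.add(Case("refund policy lookup", "single_turn"))
except TypeMismatchError as error:
    rejection = str(error)  # the mismatched test was not added
```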

Best Practices

  • Choose the type based on the nature of your endpoint: single-turn for stateless APIs, multi-turn for conversational agents
  • Create separate test sets for single-turn and multi-turn scenarios, even when testing the same endpoint
  • Verify the type before adding tests, because it cannot be changed after the test set is created
  • Align test set types with the execution runner that will be used so Penelope is only invoked when needed
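
When a backlog of tests for one endpoint contains both kinds, splitting it by type before creating any test sets follows the second practice above. The helper below is a small sketch that assumes each test is a dictionary with a `"type"` key; that shape and the name `group_by_type` are illustrative assumptions, not a Rhesis data format.

```python
from collections import defaultdict
from typing import Dict, List

def group_by_type(tests: List[dict]) -> Dict[str, List[dict]]:
    # Partition tests so that each group can become its own typed test set.
    groups: Dict[str, List[dict]] = defaultdict(list)
    for test in tests:
        groups[test["type"]].append(test)
    return dict(groups)

backlog = [
    {"name": "refund policy lookup", "type": "single_turn"},
    {"name": "refund negotiation", "type": "multi_turn"},
    {"name": "order status lookup", "type": "single_turn"},
]

groups = group_by_type(backlog)
# groups["single_turn"] and groups["multi_turn"] can now be used to create
# two separate test sets, one per type.
```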

Related Terms