Test Type

The classification of an individual test as either single-turn or multi-turn, determining how it is executed and which metrics can evaluate it.

Also known as: test classification

Overview

Every test in Rhesis has a type that defines how it is executed: as a single input/output exchange (single-turn) or as a multi-step conversation (multi-turn). The test type must match the type of the test set it belongs to and must be compatible with the scope of the metrics used to evaluate it.

Available Types

Single-Turn: A single prompt is sent to the endpoint and a single response is evaluated. Execution is stateless: each test runs independently with no conversation history.

Multi-Turn: A sequence of conversation turns is executed against the endpoint by Penelope, the agent that drives multi-turn tests. The test defines a goal, and Penelope adapts its strategy across turns until the goal is achieved or the maximum number of turns is reached.
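The difference between the two execution models can be sketched in plain Python. This is an illustrative sketch only, not how Rhesis or Penelope is implemented: the `Endpoint` type, `run_single_turn`, `run_multi_turn`, and the `next_message` and `goal_reached` callbacks are all hypothetical names standing in for the endpoint under test and for the agent's turn-by-turn strategy.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for the endpoint under test: it receives the
# conversation so far as (role, text) pairs and returns a reply.
History = List[Tuple[str, str]]
Endpoint = Callable[[History], str]


def run_single_turn(endpoint: Endpoint, prompt: str) -> str:
    """Send one prompt with no prior history and return the single response."""
    return endpoint([("user", prompt)])


def run_multi_turn(
    endpoint: Endpoint,
    next_message: Callable[[History], str],
    goal_reached: Callable[[History], bool],
    max_turns: int,
) -> Tuple[History, bool]:
    """Drive a conversation until the goal is reached or max_turns is used up."""
    history: History = []
    for _ in range(max_turns):
        # The (hypothetical) agent picks its next message from the history so far.
        history.append(("user", next_message(history)))
        history.append(("assistant", endpoint(history)))
        if goal_reached(history):
            return history, True
    # Turn budget exhausted without reaching the goal.
    return history, False
```

The single-turn function never carries history between calls, while the multi-turn loop accumulates it and stops on whichever comes first: the goal check succeeding or the turn limit.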

How Test Type Relates to Other Classifications

Test type is one of three parallel single/multi-turn classifications in Rhesis, each applied to a different entity:

Test Type (this term): classification on an individual test
Test Set Type: classification on a test set — tests can only be added to a set of the same type
Metric Scope: classification on a metric — metrics can only evaluate tests within their declared scope

All three must align for a test to execute and produce a valid result.
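As a rough illustration of what "align" means here, the following sketch checks one test against its test set and one metric. It is not part of the Rhesis SDK: `TurnType`, `alignment_problems`, and the idea of modelling metric scope as a set of turn types are assumptions made for the example, and the `"single_turn"` value is assumed by analogy with the `"multi_turn"` value used in the example on this page.

```python
from enum import Enum
from typing import FrozenSet, List


class TurnType(Enum):
    # "multi_turn" appears in the example on this page; "single_turn" is assumed.
    SINGLE_TURN = "single_turn"
    MULTI_TURN = "multi_turn"


def alignment_problems(
    test_type: TurnType,
    test_set_type: TurnType,
    metric_scope: FrozenSet[TurnType],
) -> List[str]:
    """Return a list of mismatches; an empty list means all three align."""
    problems: List[str] = []
    if test_type != test_set_type:
        problems.append(
            f"test is {test_type.value} but test set is {test_set_type.value}"
        )
    if test_type not in metric_scope:
        problems.append(f"metric scope does not include {test_type.value}")
    return problems
```

Returning the list of problems rather than a bare boolean makes it easier to report which of the two constraints was violated.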

Example

A multi-turn test for a customer support chatbot:

python
from rhesis.sdk import RhesisClient

client = RhesisClient()
# Create a multi-turn test; the goal describes the outcome
# the conversation should reach.
test = client.tests.create(
    prompt="I need help with my order",
    test_type="multi_turn",
    goal="User successfully gets the status of their most recent order",
)

Common Pitfalls

Mismatched test set type: Adding a multi-turn test to a single-turn test set will fail validation. Always verify the test set type before creating tests.

Metric scope mismatch: Assigning a single-turn-only metric to a multi-turn test set means that metric will not be evaluated. Check metric scope when configuring execution.
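One way to catch the metric scope pitfall before a run is to split the configured metrics into those that will be evaluated and those that will be skipped. The helper below is a hypothetical pre-flight check, not a Rhesis API: `split_metrics_by_scope`, the plain-string type values, and the metric names in the usage note are invented for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple


def split_metrics_by_scope(
    test_set_type: str,
    metric_scopes: Dict[str, FrozenSet[str]],
) -> Tuple[List[str], List[str]]:
    """Split metric names into (evaluated, skipped) for a given test set type.

    metric_scopes maps each metric name to the set of test types it can
    evaluate, e.g. frozenset({"single_turn"}) for a single-turn-only metric.
    """
    evaluated: List[str] = []
    skipped: List[str] = []
    for name, scope in metric_scopes.items():
        if test_set_type in scope:
            evaluated.append(name)
        else:
            skipped.append(name)
    return evaluated, skipped
```

For example, given a multi-turn test set and three hypothetical metrics where `answer_relevancy` is single-turn only, `goal_achievement` is multi-turn only, and `toxicity` covers both, the helper would report `answer_relevancy` as skipped, which is a cue to fix the configuration before executing.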

Best Practices

Set the test type explicitly rather than relying on defaults to avoid unexpected execution behavior
Create single-turn tests for stateless, fact-based queries and multi-turn tests for goal-oriented conversations
Verify that all metrics assigned to a test set are scoped to match the test type before running
Use separate test sets for single-turn and multi-turn tests of the same endpoint to keep results comparable