Multi-Turn Test

Goal-based conversation tests that evaluate your AI system across multiple turns, powered by Penelope.

Also known as: multi turn, conversational test

Overview

Multi-turn tests evaluate conversational AI systems through goal-oriented dialogues. Powered by Penelope, these tests adapt their strategy based on your AI's responses, testing complex scenarios that require multiple exchanges.

How It Works

Goal Definition: Define what the test should achieve
Adaptive Conversation: Penelope conducts a natural dialogue
Context Tracking: Maintains conversation state across turns
Goal Assessment: Evaluates if the objective was met

Use Cases

Customer Support:

Test problem resolution workflows
Verify information gathering
Check escalation handling

E-commerce:

Evaluate product discovery
Test personalization
Verify upsell appropriateness

Technical Assistance:

Multi-step troubleshooting
Iterative refinement
Context-dependent responses

Example with Penelope

python
from rhesis.penelope import PenelopeAgent, EndpointTarget

# Initialize Penelope
agent = PenelopeAgent()

# Create target (your AI endpoint)
target = EndpointTarget(endpoint_id="my-chatbot")

# Execute a multi-turn test
result = agent.execute_test(
      target=target,
      goal="Book a hotel room for 2 adults in Paris for 3 nights",
      max_iterations=10
)

print(f"Goal achieved: {result.goal_achieved}")
print(f"Turns used: {result.turns_used}")

Key Differences from Single-Turn

Aspect	Single-Turn	Multi-Turn
Conversation	One exchange	Multiple exchanges
Context	None	Maintained across turns
Complexity	Simple	Complex scenarios
Execution Time	Fast	Slower
Use Case	Quick checks	Workflow testing

Best Practices

Clear goals: Define specific measurable objectives
Reasonable scope: Limit turns to 5-15 for most tests
Edge cases: Test conversation recovery and clarification
Combine with single-turn: Use both types for comprehensive coverage

Documentation

/platform/tests /penelope