Skip to Content

Tests

Individual test cases that evaluate specific aspects of your AI system’s performance.

What are Tests? Tests are individual test cases that validate specific inputs and expected outputs for your AI system, evaluated using assigned metrics.

Test Types

There are two types of tests:

  • Single-turn tests check how the AI responds to a single prompt with no follow-up (Q&A).
  • Multi-turn tests check how the AI behaves over multiple messages in a conversation.

Single-Turn Tests

A single prompt sent to your AI system, evaluated against expected outputs and metrics.

**Properties: **

FieldDescription
Test PromptThe input text sent to your AI system
CategoryHigh-level classification (e.g., Harmful, Harmless)
TopicSpecific subject matter (e.g., healthcare, financial advice)
BehaviorType of behavior to validate (e.g., Compliance, Reliability, Robustness)
Expected Output(Optional) What the AI should respond with

Multi-Turn Tests

Goal-based conversations that test your AI system across multiple turns. Powered by Penelope, an autonomous testing agent that adapts its strategy based on responses. Ideal for testing conversational workflows.

Properties:

FieldDescription
GoalWhat the target should do - the success criteria for this test
Instructions(Optional) How to conduct the test - if not provided, the agent plans its own approach
Restrictions(Optional) What the target must not do - forbidden behaviors or boundaries
Scenario(Optional) Context and persona for the test - narrative setup or user role
Max. TurnsMaximum number of conversation turns allowed
CategoryHigh-level classification (e.g., Harmful, Harmless)
TopicSpecific subject matter (e.g., healthcare, financial advice)
BehaviorType of behavior to validate (e.g., Compliance, Reliability, Robustness)

Creating Tests

Create tests manually or generate them automatically from behaviors and requirements. See Generation for automated test generation.

Running Tests

Run individual tests from the Tests page  or execute multiple tests together using Test Sets.


Next Steps - Organize tests into Test Sets - Generate tests from Knowledge - View execution results in Test Runs