Tests

Individual test cases that evaluate specific aspects of your AI system’s performance.

What are Tests? Tests are individual test cases that validate specific inputs and expected outputs for your AI system, evaluated using assigned metrics.

Test Types

There are two types of tests:

Single-turn tests check how the AI responds to a single prompt with no follow-up (Q&A).
Multi-turn tests check how the AI behaves over multiple messages in a conversation.

Single-Turn Tests

A single prompt sent to your AI system, evaluated against expected outputs and metrics.

**Properties: **

Field	Description
Test Prompt	The input text sent to your AI system
Category	High-level classification (e.g., Harmful, Harmless)
Topic	Specific subject matter (e.g., healthcare, financial advice)
Behavior	Type of behavior to validate (e.g., Compliance, Reliability, Robustness)
Expected Output	(Optional) What the AI should respond with

Multi-Turn Tests

Goal-based conversations that test your AI system across multiple turns. Powered by Penelope, an autonomous testing agent that adapts its strategy based on responses. Ideal for testing conversational workflows.

Properties:

Field	Description
Goal	What the target should do - the success criteria for this test
Instructions	(Optional) How to conduct the test - if not provided, the agent plans its own approach
Restrictions	(Optional) What the target must not do - forbidden behaviors or boundaries
Scenario	(Optional) Context and persona for the test - narrative setup or user role
Max. Turns	Maximum number of conversation turns allowed
Category	High-level classification (e.g., Harmful, Harmless)
Topic	Specific subject matter (e.g., healthcare, financial advice)
Behavior	Type of behavior to validate (e.g., Compliance, Reliability, Robustness)

Creating Tests

Create tests manually or generate them automatically from behaviors and requirements. See Generation for automated test generation.

Running Tests

Run individual tests from the Tests page or execute multiple tests together using Test Sets.

Next Steps - Organize tests into Test Sets - Generate tests from Knowledge - View execution results in Test Runs