Tests
Individual test cases that evaluate specific aspects of your AI system’s performance.
What are Tests? Each test checks how your AI system handles a specific input, optionally against an expected output, and is evaluated using its assigned metrics.
Test Types
There are two types of tests:
- Single-turn tests check how the AI responds to a single prompt with no follow-up, as in a one-off Q&A exchange.
- Multi-turn tests check how the AI behaves over multiple messages in a conversation.
Single-Turn Tests
A single prompt sent to your AI system, evaluated against expected outputs and metrics.
**Properties:**
| Field | Description |
|---|---|
| Test Prompt | The input text sent to your AI system |
| Category | High-level classification (e.g., Harmful, Harmless) |
| Topic | Specific subject matter (e.g., healthcare, financial advice) |
| Behavior | Type of behavior to validate (e.g., Compliance, Reliability, Robustness) |
| Expected Output | (Optional) What the AI should respond with |
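
To make the fields concrete, here is a minimal Python sketch of a single-turn test as a plain data record. The `SingleTurnTest` class, its attribute names, and the sample values are hypothetical, chosen only to mirror the table above; they are not the platform's SDK or storage schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SingleTurnTest:
    """Hypothetical record mirroring the single-turn properties table."""
    prompt: str                            # Test Prompt: input sent to the AI system
    category: str                          # High-level classification, e.g. "Harmless"
    topic: str                             # Subject matter, e.g. "healthcare"
    behavior: str                          # Behavior to validate, e.g. "Reliability"
    expected_output: Optional[str] = None  # Optional reference response


# Illustrative values only; Expected Output is left unset because it is optional.
test = SingleTurnTest(
    prompt="Can I double my dose if I missed one yesterday?",
    category="Harmless",
    topic="healthcare",
    behavior="Reliability",
)
```

Leaving `expected_output` unset reflects that the field is optional: such a test would be judged by its assigned metrics alone.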
Multi-Turn Tests
Goal-based conversations that test your AI system across multiple turns. Powered by Penelope, an autonomous testing agent that adapts its strategy based on the target's responses. Ideal for testing conversational workflows.
**Properties:**
| Field | Description |
|---|---|
| Goal | What the target should do - the success criteria for this test |
| Instructions | (Optional) How to conduct the test - if not provided, the agent plans its own approach |
| Restrictions | (Optional) What the target must not do - forbidden behaviors or boundaries |
| Scenario | (Optional) Context and persona for the test - narrative setup or user role |
| Max. Turns | Maximum number of conversation turns allowed |
| Category | High-level classification (e.g., Harmful, Harmless) |
| Topic | Specific subject matter (e.g., healthcare, financial advice) |
| Behavior | Type of behavior to validate (e.g., Compliance, Reliability, Robustness) |
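
The sketch below shows one way the multi-turn fields could fit together, and how Max. Turns bounds a conversation. Everything here is an assumption for illustration: `MultiTurnTest`, `run_conversation`, and the `next_message`, `target`, and `goal_reached` callables are hypothetical stand-ins, not Penelope's actual strategy or any real API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Turn = Tuple[str, str]  # (message sent to the target, target's reply)


@dataclass
class MultiTurnTest:
    """Hypothetical record mirroring the multi-turn properties table."""
    goal: str                           # Success criteria for the test
    max_turns: int                      # Max. Turns: upper bound on conversation length
    category: str
    topic: str
    behavior: str
    instructions: Optional[str] = None  # Optional: how to conduct the test
    restrictions: Optional[str] = None  # Optional: what the target must not do
    scenario: Optional[str] = None      # Optional: context and persona


def run_conversation(
    test: MultiTurnTest,
    next_message: Callable[[List[Turn]], str],
    target: Callable[[str], str],
    goal_reached: Callable[[List[Turn]], bool],
) -> Tuple[bool, List[Turn]]:
    """Talk to the target until the goal check passes or max_turns is used up."""
    history: List[Turn] = []
    for _ in range(test.max_turns):
        message = next_message(history)  # the agent picks its next message
        reply = target(message)          # the system under test responds
        history.append((message, reply))
        if goal_reached(history):
            return True, history
    return False, history


# Stub callables stand in for the testing agent, the target, and the goal check.
test = MultiTurnTest(
    goal="Target answers two questions in a row",
    max_turns=3,
    category="Harmless",
    topic="financial advice",
    behavior="Robustness",
)
passed, history = run_conversation(
    test,
    next_message=lambda h: f"question {len(h) + 1}",
    target=lambda m: f"echo: {m}",
    goal_reached=lambda h: len(h) == 2,
)
```

With these stubs the loop stops after the second turn, before the turn limit is reached. A real agent would also take Instructions, Restrictions, and Scenario into account when choosing each message, which this sketch does not attempt.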
Creating Tests
Create tests manually or generate them automatically from behaviors and requirements. See Generation for automated test generation.
Running Tests
Run individual tests from the Tests page or execute multiple tests together using Test Sets.