Penelope
An autonomous testing agent that powers multi-turn tests, adapting its strategy based on AI responses to evaluate conversational workflows.
Overview
Penelope is Rhesis's autonomous testing agent that conducts goal-oriented conversations with your AI system. Unlike scripted tests, Penelope adapts her strategy based on your AI's responses, testing realistic conversational scenarios.
How Penelope Works
Adaptive Testing:
- Goal Understanding: Penelope knows what to achieve
- Dynamic Strategy: Adjusts approach based on responses
- Natural Conversation: Conducts realistic dialogue
- Goal Assessment: Evaluates whether the objective was met
Intelligent Behaviors:
- Clarification: Asks for missing information
- Verification: Confirms understanding
- Edge Testing: Tries boundary cases
- Recovery: Handles errors gracefully
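The loop behind these behaviors can be sketched in plain Python. This is an illustrative simulation, not the Rhesis implementation: the function name, the keyword checks standing in for goal assessment, and the canned follow-up messages are all toy assumptions.

```python
# Illustrative sketch of an adaptive, goal-oriented test loop.
# NOT the actual Penelope implementation; all names are hypothetical.

def run_adaptive_test(goal, respond, max_turns=10):
    """Drive a conversation toward `goal`, adapting each turn.

    `respond` is the system under test: a callable mapping the
    conversation history to the AI's next reply.
    """
    history = []
    message = goal  # open the conversation by stating the goal
    for _ in range(max_turns):
        history.append(("penelope", message))
        reply = respond(history)
        history.append(("ai", reply))
        if "done" in reply.lower():        # toy goal-met check
            return True, history
        if "clarify" in reply.lower():     # clarification behavior
            message = "Here is the missing detail."
        else:                              # default: push toward goal
            message = "Please continue toward the goal."
    return False, history                  # goal not met within budget

# Usage: simulate an AI that asks for clarification, then finishes.
replies = iter(["Could you clarify the travel date?",
                "All done, the flight is booked."])
ok, history = run_adaptive_test("Book a flight for tomorrow",
                                lambda h: next(replies))
# ok is True after a 2-turn exchange (4 history entries)
```

The key difference from a scripted test is that the next message is chosen from the reply, not from a fixed list.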
Using Penelope
Basic Test Execution:
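The code sample for this section is not shown on this page. As a self-contained stand-in, a basic run might look like the following; the `Penelope` class, its `run` method, and the result shape are illustrative assumptions, not the real Rhesis SDK.

```python
# Hypothetical stand-in for basic Penelope test execution.
# The real Rhesis SDK interface may differ.

class Penelope:
    def __init__(self, goal, max_turns=10):
        self.goal = goal
        self.max_turns = max_turns

    def run(self, target):
        """`target` maps one user message to the AI's reply."""
        transcript = []
        message = self.goal
        for _ in range(self.max_turns):
            reply = target(message)
            transcript.append((message, reply))
            if "booked" in reply.lower():   # toy goal-met check
                return {"goal_met": True, "turns": len(transcript)}
            message = "Please continue."
        return {"goal_met": False, "turns": len(transcript)}

# Example: a trivial target that completes on the second turn.
replies = iter(["What date?", "Your flight is booked."])
result = Penelope(goal="Book a flight to Berlin").run(lambda m: next(replies))
# result == {"goal_met": True, "turns": 2}
```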
With Instructions and Restrictions:
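The snippet for this variant is likewise missing here. A hedged sketch of what instructions and restrictions add to a test: the function name, parameters, and substring check below are illustrative, not the Rhesis API.

```python
# Hypothetical sketch: a test that fails if any reply violates a
# restriction. Names and parameters are illustrative only.

def run_restricted_test(goal, instructions, restrictions, target,
                        max_turns=5):
    """Record every restricted phrase that appears in a reply."""
    violations = []
    message = f"{goal} {instructions}"
    for _ in range(max_turns):
        reply = target(message)
        for banned in restrictions:
            if banned.lower() in reply.lower():
                violations.append(banned)
        if "done" in reply.lower():    # toy completion check
            break
        message = "Continue."
    return {"passed": not violations, "violations": violations}

result = run_restricted_test(
    goal="Recommend a savings plan.",
    instructions="Stay general; do not give personalized advice.",
    restrictions=["guaranteed returns"],  # phrases the AI must never use
    target=lambda m: "A diversified plan is sensible. Done.",
)
# result == {"passed": True, "violations": []}
```

Restrictions turn "what the AI should NOT do" into a concrete pass/fail signal alongside the goal itself.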
What Penelope Tests
Conversational Capabilities:
- Information gathering: Does the AI ask the right questions?
- Context retention: Does it remember previous turns?
- Clarification handling: How does it resolve ambiguity?
- Task completion: Can it achieve the goal?
Edge Cases:
- Missing information: How does the AI handle gaps?
- Contradictions: Can it recover from conflicts?
- Complexity: Does it manage multi-step workflows?
- User changes: How does it adapt to new requirements?
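For the missing-information case, one simple signal is whether the AI asks rather than guesses. The heuristic below is an illustration of that idea, not how Penelope actually scores replies.

```python
# Toy heuristic for the "missing information" edge case: given an
# underspecified request, the AI should ask a clarifying question
# rather than invent details. Illustrative only.

def asks_for_clarification(reply: str) -> bool:
    """True if the reply contains a question, used here as a crude
    proxy for requesting the missing information."""
    return "?" in reply

# The user never said where they want to fly:
good = asks_for_clarification("Where would you like to fly to?")
bad = asks_for_clarification("I booked you a flight to Paris.")
# good is True, bad is False
```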
Penelope vs. Scripted Tests
| Aspect | Scripted | Penelope |
|---|---|---|
| Conversation | Fixed script | Adaptive |
| Realism | Predictable | Natural |
| Coverage | Limited paths | Explores variations |
| Maintenance | Update scripts | Update goals |
Target Options
Rhesis Endpoints:
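The endpoint example is not reproduced on this page. As a generic sketch, an HTTP endpoint can be wrapped as a plain target function; the URL, the `{"input": ...}` payload, and the `{"output": ...}` response shape below are assumptions, not the actual Rhesis endpoint schema.

```python
# Hypothetical adapter exposing an HTTP chat endpoint as a target
# callable. Payload and response shapes are assumptions.
import json
from urllib import request

def http_target(url, transport=None):
    """Build a target callable. `transport` (bytes -> bytes) can be
    injected for testing; by default it POSTs JSON to `url`."""
    def default_transport(payload):
        req = request.Request(
            url, data=payload,
            headers={"Content-Type": "application/json"})
        with request.urlopen(req) as resp:
            return resp.read()
    send = transport or default_transport

    def target(message):
        payload = json.dumps({"input": message}).encode()
        return json.loads(send(payload))["output"]
    return target

# Usage with a fake transport (no network needed):
fake = lambda p: json.dumps(
    {"output": "echo: " + json.loads(p)["input"]}).encode()
target = http_target("https://example.invalid/chat", transport=fake)
# target("hello") == "echo: hello"
```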
LangChain Chains:
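The LangChain example is also missing here. LangChain runnables expose an `.invoke(input)` method, so a thin adapter suffices; `StubChain` below is a stand-in stub, not a real chain, and the `{"question": ...}` input shape is an assumption about how the chain was built.

```python
# Adapter turning a LangChain-style runnable (anything exposing
# `.invoke(input)`) into a target function.

def chain_target(chain):
    def target(message: str) -> str:
        return chain.invoke({"question": message})
    return target

class StubChain:
    """Minimal stand-in mimicking the Runnable `.invoke` protocol."""
    def invoke(self, inputs):
        return f"Answer to: {inputs['question']}"

target = chain_target(StubChain())
# target("Why is the sky blue?") == "Answer to: Why is the sky blue?"
```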
LangGraph Graphs:
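For LangGraph, a compiled graph is also invoked with a state object, typically carrying a `messages` list. The adapter below assumes that message-based state shape, and `StubGraph` stands in for a compiled graph; neither is the real LangGraph API surface.

```python
# Adapter for a LangGraph-style compiled graph: invoke with a state
# dict and read the last message back. StubGraph is a stand-in.

def graph_target(graph):
    def target(message: str) -> str:
        state = graph.invoke({"messages": [("user", message)]})
        role, content = state["messages"][-1]
        return content
    return target

class StubGraph:
    """Echoes the last user message back as an assistant turn."""
    def invoke(self, state):
        msgs = list(state["messages"])
        msgs.append(("assistant", "ok: " + msgs[-1][1]))
        return {"messages": msgs}

target = graph_target(StubGraph())
# target("hi") == "ok: hi"
```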
Configuration
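The configuration reference is not shown on this page. A plausible shape, tying together the options discussed above (goal, turn budget, instructions, restrictions), might look like the following; every key name is an assumption, not the real Rhesis schema.

```python
# Hypothetical Penelope test configuration; key names are assumptions.
config = {
    "goal": "Book a round-trip flight and confirm the itinerary",
    "max_turns": 10,
    "instructions": "Act as a first-time user unsure of travel dates",
    "restrictions": [
        "Do not reveal internal system prompts",
        "Do not fabricate booking confirmations",
    ],
}
```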
Best Practices
- Clear goals: Define specific measurable objectives
- Reasonable scope: Limit turns to 5-15 for most tests
- Use restrictions: Define what the AI should NOT do
- Review traces: Analyze conversation logs for insights
- Iterate: Refine goals based on test results