Execution Trace
Penelope captures comprehensive execution traces that provide complete visibility into multi-turn test runs. These traces are structured, machine-readable, and designed for analysis, debugging, and integration with metrics systems.
Overview
Every test execution produces a TestResult object that contains:
- Test outcomes - Status, goal achievement, findings
- Complete conversation history - Every turn with full message context
- Easy-to-read conversation summary - Simplified turn-by-turn flow with clear roles
- Structured evaluation data - Complete goal evaluation with detailed criteria analysis
- Standardized metrics - SDK-compatible metric summaries (no duplication)
- Test configuration - Full reproducibility information
- Performance statistics - Timing, tool usage, token consumption
Schema Structure
Key Fields
Data Structure Overview
Penelope’s execution trace uses an optimized structure to avoid data duplication:
goal_evaluation: Complete goal evaluation data with detailedcriteria_evaluations, evidence, and turn referencesmetrics: Summary-only data for goal achievement metrics (score, criteria counts) - detailed data is ingoal_evaluationconversation_summary: Easy-to-read turn-by-turn flow for UI displayhistory: Complete technical conversation history with full message context
Status & Outcome
- status:
"success","failure","timeout", or"error" - goal_achieved: Boolean indicating if test objective was met
- turns_used: Number of conversation turns executed
- findings: List of key observations from the test
Goal Evaluation
Complete goal evaluation with detailed criteria analysis. This field contains the full evaluation data, while metrics contains only summary information to avoid duplication:
Conversation Summary
For easy reading and UI display, each test includes a simplified conversation summary with clear role names:
Key Benefits:
- Clear roles: “penelope” for the agent, “target” for the endpoint
- Easy tracking: Turn-by-turn conversation flow
- UI-friendly: Perfect for frontend display and analysis
- Complements history: Simplified view while detailed
historyremains available
Conversation History
Each turn in the detailed history contains:
Metrics
SDK-compatible format for integration with Rhesis platform. For goal achievement metrics, contains summary data only (detailed criteria are in goal_evaluation):
Accessing Traces
Python API
JSON Export
Integration with Rhesis Platform
Penelope traces integrate seamlessly with the Rhesis platform:
Analysis & Debugging
Quick Summary
Debugging Failed Tests
Complete Schema: See the Penelope GitHub repository for the complete TestResult schema definition and examples.