Extending Penelope
Understand Penelope's architecture and learn how to extend it with custom tools for specialized testing needs.
Architecture Overview
Penelope follows a clean, modular architecture designed for extensibility and reliability.
Core Components
PenelopeAgent
The main orchestrator that coordinates test execution.
TurnExecutor
Handles individual turn execution - reasoning, tool selection, and tool invocation.
GoalEvaluator
LLM-based evaluation of goal achievement using structured output.
Targets
Abstraction for the systems under test.
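A target is typically a thin wrapper around the system under test. The sketch below is illustrative only - the class name and the send() method are assumptions, not Penelope's actual Target interface.

```python
# Illustrative target sketch - the class name and send() signature are
# assumptions; Penelope's real Target interface may differ.
import requests


class HTTPTarget:
    """Wraps an HTTP system under test behind a single send() call."""

    def __init__(self, base_url: str):
        self.base_url = base_url

    def send(self, message: str) -> str:
        # Forward the agent's message to the system under test and return
        # the raw response text for analysis.
        response = requests.post(f"{self.base_url}/chat", json={"message": message})
        response.raise_for_status()
        return response.text
```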
Built-in Tools
Penelope includes three core tools:
- Send Message to Target - Interacts with the system under test
- Analyze Response - Evaluates target responses for goal criteria
- Extract Information - Pulls specific data from responses
Execution Flow
- Initialize - Agent receives goal, instructions, and context
- Turn Loop - For each turn up to max_iterations:
  - Agent reasons about current state
  - Selects and executes tool
  - Processes result
  - Evaluates goal achievement
  - Checks stopping conditions
- Completion - Returns TestResult with full history
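In code, the loop looks roughly like the sketch below. The helper names (run_test, goal_achieved) and the TestResult fields are assumptions used only to illustrate the flow, not Penelope's internals.

```python
# Rough shape of the turn loop described above; run_test, goal_achieved, and
# the TestResult fields are illustrative assumptions, not Penelope's internals.
from dataclasses import dataclass, field


@dataclass
class TestResult:
    goal_achieved: bool
    turns: list = field(default_factory=list)


def goal_achieved(history: list) -> bool:
    # Stand-in for the LLM-based GoalEvaluator.
    return False


def run_test(goal: str, max_iterations: int = 10) -> TestResult:
    history = []
    for _ in range(max_iterations):
        # 1. Reason about the current state and select a tool (stubbed here).
        action = {"tool": "send_message_to_target", "input": goal}
        # 2. Execute the tool and record the outcome in the turn history.
        history.append({"action": action, "output": "..."})
        # 3. Evaluate goal achievement and check stopping conditions.
        if goal_achieved(history):
            return TestResult(goal_achieved=True, turns=history)
    # Max iterations reached without achieving the goal.
    return TestResult(goal_achieved=False, turns=history)
```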
Stopping Conditions
Tests stop as soon as any stopping condition is met - for example, the goal is achieved or the turn loop reaches max_iterations.
Custom Tools
Extend Penelope's capabilities by creating custom tools for specialized testing needs.
Tool Interface
All tools implement the Tool abstract base class:
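The exact interface is not reproduced here; the sketch below shows the shape such a base class typically takes, with all names and signatures assumed.

```python
# Sketch of a Tool-style abstract base class; attribute names and the execute
# signature are assumptions, not Penelope's exact interface.
from abc import ABC, abstractmethod
from typing import Type

from pydantic import BaseModel


class Tool(ABC):
    """Base class for tools: metadata plus an execute() method."""

    name: str
    description: str
    parameters: Type[BaseModel]  # Pydantic schema used to validate inputs

    @abstractmethod
    def execute(self, **kwargs) -> dict:
        """Run the tool with validated parameters and return a result dict."""
        ...
```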
Parameter Validation: Tool parameters are automatically validated via Pydantic schemas. Your execute method receives validated inputs.
Creating a Custom Tool
Example: Database verification tool for testing data persistence.
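Building on the Tool sketch above, a database verification tool might look like the following. The class, its parameters, and the SQLite backend are hypothetical; adapt them to your own schema and to the actual tool interface.

```python
# Hypothetical database verification tool built on the Tool sketch above.
import sqlite3

from pydantic import BaseModel


class DBCheckParams(BaseModel):
    table: str
    column: str
    expected_value: str


class DatabaseVerificationTool(Tool):
    """Checks that an expected value was persisted to a SQLite database."""

    name = "verify_database_record"
    description = (
        "Verify that a record with the expected value exists in the given "
        "table and column of the test database. Use after the target claims "
        "to have saved data; do not use for purely conversational checks."
    )
    parameters = DBCheckParams

    def __init__(self, db_path: str):
        self.db_path = db_path

    def execute(self, table: str, column: str, expected_value: str) -> dict:
        try:
            with sqlite3.connect(self.db_path) as conn:
                # Table and column names come from test configuration, so they
                # are interpolated directly; the value is parameterized.
                query = f"SELECT COUNT(*) FROM {table} WHERE {column} = ?"
                count = conn.execute(query, (expected_value,)).fetchone()[0]
            return {"found": count > 0, "matching_rows": count}
        except sqlite3.Error as exc:
            # Report failures as data so the agent can reason about them.
            return {"found": False, "error": str(exc)}
```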
Using Custom Tools
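However tools are registered in your version of Penelope, usage typically amounts to passing tool instances to the agent. The constructor arguments below (target=, tools=, max_iterations=) are assumptions for illustration; check the actual API before copying.

```python
# Illustrative only - the PenelopeAgent constructor arguments shown here are
# assumptions, not the documented API.
agent = PenelopeAgent(
    target=HTTPTarget(base_url="https://staging.example.com"),
    tools=[DatabaseVerificationTool(db_path="test.db")],
    max_iterations=15,
)

result = agent.run(
    goal="Create a user account and confirm it is persisted to the database"
)
```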
Writing Quality Tool Descriptions
Good descriptions help Penelope understand when and how to use your tool. Include:
- Purpose - What the tool does
- When to Use - Scenarios for using this tool
- When NOT to Use - Scenarios to avoid
- Parameters - Expected inputs with types
- Examples - Real usage examples
- Important Notes - Caveats and limitations
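Put together, a description covering these points might read like the example below; the content is illustrative.

```python
# Illustrative description covering the checklist above.
description = """\
Verify that a record exists in the test database.

When to use: after the target reports that data was created or updated.
When NOT to use: for checks answerable from the target's response alone.

Parameters:
  - table (str): table to query
  - column (str): column to match against
  - expected_value (str): value that should be present

Example: verify_database_record(table="users", column="email",
    expected_value="alice@example.com")

Important: this tool only reads data; it never modifies the database.
"""
```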
Multiple Custom Tools
Add multiple tools for comprehensive testing:
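For example, several tools can be passed together. The additional tool classes below are hypothetical and simply mirror the kinds of tools listed under Real-World Examples.

```python
# Hypothetical tool classes and constructor arguments, shown only to
# illustrate registering several tools at once.
agent = PenelopeAgent(
    target=HTTPTarget(base_url="https://staging.example.com"),
    tools=[
        DatabaseVerificationTool(db_path="test.db"),
        ApiMonitoringTool(endpoint="https://staging.example.com/health"),
        SecurityScanTool(),
    ],
)
```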
Best Practices
Clear Naming
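Give tools specific, action-oriented names so the agent can tell at a glance when each applies; the names below are illustrative.

```python
# Specific, action-oriented names are easier for the agent to choose between.
name = "verify_database_record"   # clear: says exactly what is checked
# name = "db_tool"                # vague: unclear when the agent should use it
```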
Handle Errors Gracefully
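Report failures as structured data rather than raising, so a single failed call does not abort the whole test run. A minimal sketch, assuming the Tool base class from above:

```python
class ResilientTool(Tool):
    # name, description, and parameters omitted for brevity.

    def execute(self, **kwargs) -> dict:
        try:
            return self._do_work(**kwargs)
        except Exception as exc:
            # Surface the failure as data so the agent can adjust its plan.
            return {"success": False, "error": str(exc)}

    def _do_work(self, **kwargs) -> dict:
        ...
```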
Provide Rich Output
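Return structured, self-describing results instead of bare booleans so the agent (and the final TestResult history) has context to reason with; the fields below are illustrative.

```python
# Rich, self-describing result (illustrative field names).
rich_result = {
    "success": True,
    "matching_rows": 1,
    "query": "SELECT COUNT(*) FROM users WHERE email = ?",
    "elapsed_ms": 12,
}

bare_result = True  # much harder for the agent to reason about
```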
Test Your Tools
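Unit-test tools in isolation before handing them to the agent. The sketch below uses pytest and the hypothetical DatabaseVerificationTool from earlier.

```python
# pytest-style test for the hypothetical DatabaseVerificationTool above.
import sqlite3


def test_database_verification_tool(tmp_path):
    db_path = tmp_path / "test.db"
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE users (email TEXT)")
        conn.execute("INSERT INTO users VALUES ('alice@example.com')")

    tool = DatabaseVerificationTool(db_path=str(db_path))
    result = tool.execute(
        table="users", column="email", expected_value="alice@example.com"
    )

    assert result["found"] is True
    assert result["matching_rows"] == 1
```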
Design Principles
- Modularity - Clear separation of concerns (agent, executor, evaluator, tools)
- Extensibility - Easy to add custom tools and targets
- Observability - Full transparency into reasoning and execution
- Type Safety - Pydantic validation throughout
- Provider Agnostic - Works with any LLM provider
Real-World Examples
See complete implementations in the examples directory:
- custom_tools.py - Database verification, API monitoring, security scanning
- batch_testing.py - Batch test runner tool
- platform_integration.py - TestSet loader tool
Next: Check out Examples to see custom tools in action, or learn about Configuration options.