Test Execution System
Overview
The Rhesis test execution system provides a flexible and extensible architecture for running tests against endpoints. It supports multiple test types and execution modes, enabling you to test everything from simple API responses to complex multi-turn conversations.
Architecture
The system uses the Strategy Pattern to route tests to appropriate executors based on their type:
Test → Factory → Executor → Results
↓
SingleTurnExecutor (traditional tests)
MultiTurnExecutor (Penelope-based)
[Future: ImageTestExecutor, etc.]Key Components
-
Test Execution Entry Point (
test_execution.py)- Main
execute_test()function - Routes to appropriate executor via factory pattern
- Maintains backward compatibility
- Main
-
Executors (
executors/)BaseTestExecutor- Abstract base classSingleTurnTestExecutor- Traditional request/response testsMultiTurnTestExecutor- Agentic multi-turn tests using Penelopefactory.py- Routing logic based on test type
-
Shared Utilities (
executors/shared.py)- Common helper functions
- Data retrieval and validation
- Result storage
Test Flow
Standard Execution Flow
Test Configuration
↓
Retrieve Test
↓
Determine Test Type
↓
├─→ Single-Turn → SingleTurnExecutor
│
└─→ Multi-Turn → MultiTurnExecutor
↓
Store Results
↓
Return SummaryParallel vs Sequential
Test configurations can execute multiple tests in two modes:
- Parallel (default): Tests run simultaneously using Celery workers
- Sequential: Tests run one after another
See Execution Modes for details.
Executor Pattern
Creating an Executor
All executors must implement the BaseTestExecutor interface:
from rhesis.backend.tasks.execution.executors.base import BaseTestExecutor
class CustomTestExecutor(BaseTestExecutor):
def execute(
self,
db: Session,
test_config_id: str,
test_run_id: str,
test_id: str,
endpoint_id: str,
organization_id: Optional[str] = None,
user_id: Optional[str] = None,
model: Optional[Any] = None,
) -> Dict[str, Any]:
# Your execution logic here
return {
"test_id": test_id,
"execution_time": execution_time_ms,
"metrics": metrics_dict,
}Return Format
All executors must return a dictionary with:
test_id(str): The test identifierexecution_time(float): Execution time in millisecondsmetrics(Dict[str, Any]): Metric evaluation results
Usage Examples
Running a Test
from rhesis.backend.tasks.execution.test_execution import execute_test
# Execute test (automatically routes to correct executor)
result = execute_test(
db=db,
test_config_id="config-uuid",
test_run_id="run-uuid",
test_id="test-uuid",
endpoint_id="endpoint-uuid",
organization_id="org-uuid",
user_id="user-uuid"
)
print(f"Execution time: {result['execution_time']}ms")
print(f"Metrics: {result['metrics']}")Using Executors Directly
from rhesis.backend.tasks.execution.executors import (
create_executor,
SingleTurnTestExecutor,
MultiTurnTestExecutor,
)
# Get test and create appropriate executor
test = get_test(db, test_id)
executor = create_executor(test)
# Or use a specific executor directly
executor = MultiTurnTestExecutor()
result = executor.execute(db, test_config_id, ...)Extending the System
Adding a New Test Type
- Create the Executor
# executors/image_test_executor.py
from rhesis.backend.tasks.execution.executors.base import BaseTestExecutor
class ImageTestExecutor(BaseTestExecutor):
def execute(self, ...):
# Image-specific test logic
pass- Update the Factory
# executors/factory.py
def create_executor(test: Test) -> BaseTestExecutor:
test_type = get_test_type(test)
if test_type == TestType.IMAGE:
from .image_test_executor import ImageTestExecutor
return ImageTestExecutor()
elif test_type == TestType.MULTI_TURN:
# ...existing code- Add Test Type Enum
# tasks/enums.py
class TestType(str, Enum):
SINGLE_TURN = "Single-Turn"
MULTI_TURN = "Multi-Turn"
IMAGE = "Image" # New typeDesign Principles
Modularity
Each executor is self-contained (~150-200 lines) and handles one test type.
Loose Coupling
- Executors don’t depend on each other
- Multi-turn executor preserves Penelope trace as-is
- No tight coupling to external frameworks
Backward Compatibility
- Public API (
execute_test()) unchanged - Helper functions re-exported for existing code
- All existing tests work without modifications
Extensibility (Open/Closed Principle)
- Add new test types without modifying existing code
- Factory pattern enables clean routing
- Easy to test executors independently
Performance Considerations
Execution Time Tracking
All executors track execution time in milliseconds for monitoring and optimization.
Caching
The system checks for existing results to avoid duplicate test execution.
Parallel Processing
Use parallel execution mode for independent tests to maximize throughput.
Module Location
The test execution code is located in the backend repository but is primarily executed by workers:
apps/backend/src/rhesis/backend/tasks/execution/
├── executors/
│ ├── __init__.py
│ ├── base.py # BaseTestExecutor ABC
│ ├── data.py # Data retrieval utilities
│ ├── metrics.py # Metrics processing
│ ├── results.py # Result storage
│ ├── runners.py # Core execution logic (shared)
│ ├── single_turn.py # Single-turn executor
│ ├── multi_turn.py # Multi-turn executor (Penelope)
│ └── factory.py # Routing logic
├── test_execution.py # Main entry point
├── modes.py # Execution mode utilities
├── orchestration.py # Task orchestration
├── parallel.py # Parallel execution
├── sequential.py # Sequential execution
└── penelope_target.py # Backend target for PenelopeRelated Documentation
- Test Types - Single-turn vs Multi-turn tests
- Execution Modes - Sequential vs Parallel
- Background Tasks - Celery task system
- Architecture - Worker system overview