Test Runs & Results

Test runs track the execution of tests against endpoints. Each run produces test results containing the endpoint’s response, evaluation metrics, and review status.

TestRun

A TestRun represents a batch execution of tests. When you execute a test set against an endpoint, a test run is created to track the execution.

Properties

| Property | Type | Description |
| --- | --- | --- |
| id | str | Unique identifier |
| name | str | Display name |
| test_configuration_id | str | Associated test configuration |
| status_id | str | Current execution status |
| user_id | str | User who initiated the run |
| organization_id | str | Organization ID |
| owner_id | str | Owner of the test run |
| assignee_id | str | Assigned reviewer |
| attributes | dict | Custom attributes |
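
These properties are exposed as attributes on a fetched run, as the examples below show for name and status_id. A minimal sketch that reads a few of the others (the ID is a placeholder, and the extra fields are assumed to follow the same attribute-access pattern):

inspect_run.py
from rhesis.sdk.entities import TestRuns

run = TestRuns.pull(id="run-123")  # placeholder ID

# Core identifiers and ownership
print(f"Run: {run.name} ({run.id})")
print(f"Status: {run.status_id}")
print(f"Owner: {run.owner_id}, Assignee: {run.assignee_id}")

# Custom attributes attached to the run, if any
if run.attributes:
    for key, value in run.attributes.items():
        print(f"  {key}: {value}")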

Fetching Test Runs

fetch_test_runs.py
from rhesis.sdk.entities import TestRuns

# List all test runs
for run in TestRuns.all():
    print(f"{run.name}: {run.status_id}")

# Get by ID
run = TestRuns.pull(id="run-123")

# Filter by status
completed_runs = TestRuns.all(filter="status_id eq 'completed'")
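
Any property from the table above can appear in a filter. For example, to list the runs a particular user started (the ID is a placeholder):

filter_by_user.py
from rhesis.sdk.entities import TestRuns

# Runs started by a specific user
my_runs = TestRuns.all(filter="user_id eq 'user-123'")

for run in my_runs:
    print(f"{run.name}: {run.status_id}")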

Getting Test Results

Retrieve all results for a test run:

get_results.py
from rhesis.sdk.entities import TestRuns

run = TestRuns.pull(id="run-123")
results = run.get_test_results()

for result in results:
    print(f"Test: {result['test_id']}")
    print(f"Output: {result['test_output']}")
    print(f"Metrics: {result['test_metrics']}")

TestResult

A TestResult contains the output and evaluation for a single test execution.

Properties

| Property | Type | Description |
| --- | --- | --- |
| id | str | Unique identifier |
| test_run_id | str | Parent test run |
| test_id | str | Executed test |
| prompt_id | str | Test prompt |
| status_id | str | Result status |
| status | Status | Status object with name/description |
| test_output | dict | Endpoint response data |
| test_metrics | dict | Evaluation metric scores |
| test_reviews | dict | Human review data |
| test_configuration_id | str | Test configuration used |

Fetching Test Results

fetch_results.py
from rhesis.sdk.entities import TestResults

# Get all results
all_results = TestResults.all()

# Get by ID
result = TestResults.pull(id="result-123")

# Filter by test run
run_results = TestResults.all(filter="test_run_id eq 'run-123'")

# Filter by status
failed_results = TestResults.all(filter="status_id eq 'failed'")
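
The two filters above can be combined to review only the failures from a single run. The combined clause below assumes the OData-style syntax supports joining conditions with and:

review_failures.py
from rhesis.sdk.entities import TestResults

# Failed results from one run (assumes 'and' can join filter clauses)
failed_in_run = TestResults.all(
    filter="test_run_id eq 'run-123' and status_id eq 'failed'"
)

for result in failed_in_run:
    print(f"Test {result.test_id} failed")
    if result.test_output:
        print(f"  Output: {result.test_output.get('output')}")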

Working with Results

analyze_results.py
from rhesis.sdk.entities import TestResults

result = TestResults.pull(id="result-123")

# Access output
if result.test_output:
    print(f"Response: {result.test_output.get('output')}")
    print(f"Session: {result.test_output.get('session_id')}")

# Access metrics
if result.test_metrics:
    for metric_name, score in result.test_metrics.items():
        print(f"{metric_name}: {score}")

# Check status
if result.status:
    print(f"Status: {result.status.name}")

TestConfiguration

A TestConfiguration defines the settings for test execution, linking test sets to endpoints with specific parameters.

Properties

| Property | Type | Description |
| --- | --- | --- |
| id | str | Unique identifier |
| endpoint_id | str | Target endpoint (required) |
| test_set_id | str | Test set to execute |
| category_id | str | Filter by category |
| topic_id | str | Filter by topic |
| prompt_id | str | Specific prompt |
| use_case_id | str | Associated use case |
| status_id | str | Configuration status |
| attributes | dict | Custom settings |

Creating Test Configurations

create_config.py
from rhesis.sdk.entities import TestConfiguration

config = TestConfiguration(
    endpoint_id="endpoint-123",
    test_set_id="test-set-456",
    attributes={
        "timeout": 30,
        "retry_count": 3,
    }
)

config.push()
print(f"Created configuration: {config.id}")

Getting Test Runs for a Configuration

config_runs.py
from rhesis.sdk.entities import TestConfigurations

config = TestConfigurations.pull(id="config-123")
runs = config.get_test_runs()

for run in runs:
    print(f"Run: {run['id']} - Status: {run['status_id']}")

Complete Workflow Example

complete_workflow.py
from rhesis.sdk.entities import TestSets, Endpoints, TestRuns, TestResults

# 1. Get test set and endpoint
test_set = TestSets.pull(name="Safety Evaluation")
endpoint = Endpoints.pull(name="Production Chatbot")

# 2. Execute tests (creates a test run)
execution = test_set.execute(endpoint)
print(f"Execution started: {execution}")

# 3. Later: fetch the test run
# (In practice, you'd get the run ID from the execution response or webhook)
runs = TestRuns.all(filter="contains(name, 'Safety')")
latest_run = runs[0] if runs else None

if latest_run:
    # 4. Get results
    results = latest_run.get_test_results()
    
    # 5. Analyze results
    passed = sum(1 for r in results if r.get('status_id') == 'passed')
    failed = sum(1 for r in results if r.get('status_id') == 'failed')
    
    print(f"Passed: {passed}, Failed: {failed}")

Next Steps

Create Test Sets for evaluation.