
Adaptive Testing

Adaptive Testing is a topic-based workflow for expanding and maintaining single-turn test sets over time. Instead of treating a test set as static, you can organize tests into topic trees, generate outputs in bulk, evaluate results with a selected metric, and iterate with AI-generated suggestions.

When to Use Adaptive Testing

Use Adaptive Testing when you need to:

  • Grow coverage across specific risk areas or product domains
  • Re-evaluate existing tests without regenerating all outputs
  • Curate suggestions before saving them to your test set
  • Keep one evolving test set instead of creating many one-off sets

End-to-End Workflow

Adaptive Testing in the UI follows this loop:

  1. Create an adaptive test set
  2. Organize tests in topics
  3. Generate outputs from a selected endpoint
  4. Evaluate tests with a selected metric
  5. Generate and review suggestions
  6. Accept selected suggestions into the test set

Topic operations are hierarchical: renaming a topic cascades to its child topics and their tests, and removing a topic removes its subtopics and moves the affected tests to the parent topic.
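Topics are addressed by "/"-separated paths (for example, Safety/Jailbreak in the API examples below). As a minimal local sketch of what hierarchical scoping implies, assuming descendant topics are matched by path prefix (an illustrative rule, not the documented implementation):

```shell
# Illustrative only: assumes descendant topics match by "/" path prefix.
topic_filter="Safety"
for t in "Safety" "Safety/Jailbreak" "Safety/PII" "Quality/Relevance"; do
  case "$t" in
    ("$topic_filter" | "$topic_filter"/*) echo "match: $t" ;;
    (*)                                   echo "skip:  $t" ;;
  esac
done
```

With this rule, a topic filter of Safety would cover Safety itself plus Safety/Jailbreak and Safety/PII, but not Quality/Relevance.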

Generate Outputs and Evaluate with Overwrite Control

Two actions drive most iteration cycles:

  • Generate outputs: invoke the selected endpoint for each test input and store the output in test metadata
  • Evaluate: run the selected metric against each test's input/output pair and store the label and score in test metadata

Both actions accept the same scoping and overwrite parameters:

| Parameter | Type | Default | Behavior |
|---|---|---|---|
| topic | string or null | null | Limits processing to a topic |
| include_subtopics | boolean | true | Includes descendant topics when topic is set |
| overwrite | boolean | false | Replaces existing outputs/results instead of skipping |
| test_ids | string[] or null | null | Optional explicit subset of tests |

With overwrite=false, tests that already have outputs or evaluation labels are skipped. The response reports a generated (or evaluated) count, along with skipped and failed counts.
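These counts can be pulled out of the response for scripting. The response body below is a hypothetical example; only the field names generated, skipped, and failed are taken from the description above:

```shell
# Hypothetical response body from generate_outputs with overwrite=false.
response='{"generated": 12, "skipped": 5, "failed": 0}'

# Extract one count with POSIX sed (jq would also work if available).
skipped=$(printf '%s' "$response" | sed 's/.*"skipped": *\([0-9]*\).*/\1/')
echo "skipped=$skipped"
```

A non-zero skipped count on a repeat run usually just means those tests already had outputs; rerun with overwrite=true to regenerate them.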

Generate Outputs API Example

generate_outputs.sh
curl -X POST "$RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/generate_outputs" \
  -H "Authorization: Bearer $RHESIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "endpoint_id": "00000000-0000-0000-0000-000000000000",
    "topic": "Safety/Jailbreak",
    "include_subtopics": true,
    "overwrite": false
  }'

Evaluate API Example

evaluate_tests.sh
curl -X POST "$RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/evaluate" \
  -H "Authorization: Bearer $RHESIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "metric_names": ["answer_relevancy"],
    "topic": "Safety/Jailbreak",
    "include_subtopics": true,
    "overwrite": true
  }'

Suggestions Workflow

Suggestion results are not persisted until you accept them in the UI.

  1. POST /adaptive_testing/{id}/generate_suggestions
  2. POST /adaptive_testing/{id}/generate_suggestion_outputs
  3. POST /adaptive_testing/{id}/evaluate_suggestions
  4. Accept selected suggestions to create real tests in the set

| Suggestion parameter | Type | Default | Notes |
|---|---|---|---|
| num_examples | int | 10 | Existing tests sampled as examples |
| num_suggestions | int | 20 | Requested number of suggestions |
| topic | string or null | null | Optional topic focus |

generate_suggestions.sh
curl -X POST "$RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/generate_suggestions" \
-H "Authorization: Bearer $RHESIS_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
    "topic": "Safety/Jailbreak",
    "num_examples": 10,
    "num_suggestions": 20
}'
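The three suggestion calls above run as one pass. The payload shapes for generate_suggestion_outputs and evaluate_suggestions are not shown on this page, so the sketch below only assembles and prints the request URLs as a dry run; substitute real curl calls and payloads once confirmed against the API reference. The base-URL and ID defaults are placeholders.

```shell
# Dry run: print the suggestion-phase requests in order.
# Defaults below are placeholders; export real values before running curl.
RHESIS_BASE_URL="${RHESIS_BASE_URL:-https://api.example.com}"
TEST_SET_ID="${TEST_SET_ID:-00000000-0000-0000-0000-000000000000}"

for action in generate_suggestions generate_suggestion_outputs evaluate_suggestions; do
  echo "POST $RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/$action"
done
```

After the third call, review the evaluated suggestions in the UI and accept the ones worth keeping; only accepted suggestions become tests in the set.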