
Adaptive Testing

Adaptive Testing is a topic-based workflow for expanding and maintaining single-turn test sets over time. Instead of treating a test set as static, you can organize tests into topic trees, generate outputs in bulk, evaluate results with a selected metric, and iterate with AI-generated suggestions.

When to Use Adaptive Testing

Use Adaptive Testing when you need to:

  • Grow coverage across specific risk areas or product domains
  • Re-evaluate existing tests without regenerating all outputs
  • Curate suggestions before saving them to your test set
  • Keep one evolving test set instead of creating many one-off sets

End-to-End Workflow

Adaptive Testing in the UI follows this loop:

  1. Create an adaptive test set
  2. Organize tests in topics
  3. Generate outputs from a selected endpoint
  4. Evaluate tests with a selected metric
  5. Generate and review suggestions
  6. Accept selected suggestions into the test set

Topic operations are hierarchical: renaming a topic cascades to its child topics and their tests, and removing a topic removes its subtopics and moves the affected tests to the parent topic.
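Topics are addressed by "/"-separated paths (for example, Safety/Jailbreak in the API examples below). As a minimal local sketch of what hierarchical scoping implies, assuming descendant topics are matched by path prefix (an illustrative rule, not the documented implementation):

```shell
# Illustrative only: assumes descendant topics match by "/" path prefix.
topic_filter="Safety"
for t in "Safety" "Safety/Jailbreak" "Safety/PII" "Quality/Relevance"; do
  case "$t" in
    ("$topic_filter" | "$topic_filter"/*) echo "match: $t" ;;
    (*)                                   echo "skip:  $t" ;;
  esac
done
```

With this rule, a topic filter of Safety would cover Safety itself plus Safety/Jailbreak and Safety/PII, but not Quality/Relevance.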

Generate Outputs and Evaluate with Overwrite Control

Two actions drive most iteration cycles:

  • Generate outputs: invoke the selected endpoint for each test input and store the output in test metadata
  • Evaluate: run the selected metric against each test's input/output pair and store the label and score in test metadata

Both actions accept the same scoping and overwrite parameters:

| Parameter | Type | Default | Behavior |
|---|---|---|---|
| topic | string or null | null | Limits processing to a topic |
| include_subtopics | boolean | true | Includes descendant topics when topic is set |
| overwrite | boolean | false | Replaces existing outputs/results instead of skipping |
| test_ids | string[] or null | null | Optional explicit subset of tests |

With overwrite=false, tests that already have outputs or evaluation labels are skipped. The response reports a generated (or evaluated) count, along with skipped and failed counts.
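These counts can be pulled out of the response for scripting. The response body below is a hypothetical example; only the field names generated, skipped, and failed are taken from the description above:

```shell
# Hypothetical response body from generate_outputs with overwrite=false.
response='{"generated": 12, "skipped": 5, "failed": 0}'

# Extract one count with POSIX sed (jq would also work if available).
skipped=$(printf '%s' "$response" | sed 's/.*"skipped": *\([0-9]*\).*/\1/')
echo "skipped=$skipped"
```

A non-zero skipped count on a repeat run usually just means those tests already had outputs; rerun with overwrite=true to regenerate them.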

Generate Outputs API Example

generate_outputs.sh
curl -X POST "$RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/generate_outputs" \
  -H "Authorization: Bearer $RHESIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "endpoint_id": "00000000-0000-0000-0000-000000000000",
    "topic": "Safety/Jailbreak",
    "include_subtopics": true,
    "overwrite": false
  }'

Evaluate API Example

evaluate_tests.sh
curl -X POST "$RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/evaluate" \
  -H "Authorization: Bearer $RHESIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "metric_names": ["answer_relevancy"],
    "topic": "Safety/Jailbreak",
    "include_subtopics": true,
    "overwrite": true
  }'

Suggestions Workflow

Suggestion results are not persisted until you accept them in the UI.

  1. POST /adaptive_testing/{id}/generate_suggestions
  2. POST /adaptive_testing/{id}/generate_suggestion_outputs
  3. POST /adaptive_testing/{id}/evaluate_suggestions
  4. Accept selected suggestions to create real tests in the set

| Suggestion parameter | Type | Default | Notes |
|---|---|---|---|
| num_examples | int | 10 | Existing tests sampled as examples |
| num_suggestions | int | 20 | Requested number of suggestions |
| topic | string or null | null | Optional topic focus |

generate_suggestions.sh
curl -X POST "$RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/generate_suggestions" \
-H "Authorization: Bearer $RHESIS_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
    "topic": "Safety/Jailbreak",
    "num_examples": 10,
    "num_suggestions": 20
}'
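The three suggestion calls above run as one pass. The payload shapes for generate_suggestion_outputs and evaluate_suggestions are not shown on this page, so the sketch below only assembles and prints the request URLs as a dry run; substitute real curl calls and payloads once confirmed against the API reference. The base-URL and ID defaults are placeholders.

```shell
# Dry run: print the suggestion-phase requests in order.
# Defaults below are placeholders; export real values before running curl.
RHESIS_BASE_URL="${RHESIS_BASE_URL:-https://api.example.com}"
TEST_SET_ID="${TEST_SET_ID:-00000000-0000-0000-0000-000000000000}"

for action in generate_suggestions generate_suggestion_outputs evaluate_suggestions; do
  echo "POST $RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/$action"
done
```

After the third call, review the evaluated suggestions in the UI and accept the ones worth keeping; only accepted suggestions become tests in the set.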