
Planning Test Suites

After exploration, Architect proposes a structured plan. You review it, ask for changes, and approve before anything is created.

Plan structure

A plan has five sections:

| Section | Required | What it contains |
| --- | --- | --- |
| Project | No | A name and description to group the test suite. Omit for ad-hoc tests against an existing endpoint. |
| Behaviors | Yes | What the endpoint should (or shouldn’t) do. Each test is tagged with a behavior. |
| Test sets | Yes | Named collections of tests, each targeting specific behaviors, categories, and topics. |
| Metrics | Yes | How each test is evaluated — the evaluation criteria and pass/fail threshold. |
| Behavior-metric mappings | Yes | Which metric evaluates which behavior. Every behavior needs at least one. |
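To make the shape concrete, here is a minimal sketch of a plan as a plain Python data structure. All field names and values are hypothetical illustrations, not the actual Architect plan format; the one rule it encodes from the table above is that every behavior needs at least one metric mapped to it.

```python
# Hypothetical plan sketch -- field names are assumptions, not the real format.
plan = {
    # Optional: omit "project" for ad-hoc tests against an existing endpoint.
    "project": {"name": "Support Bot", "description": "Customer-support suite"},
    "behaviors": ["Refuses Harmful Requests", "Provides Accurate Information"],
    "test_sets": [
        {"name": "Safety Probes", "behaviors": ["Refuses Harmful Requests"]},
    ],
    "metrics": [{"name": "Refusal Accuracy", "threshold": 0.9}],
    # Which metric evaluates which behavior.
    "behavior_metric_mappings": {
        "Refuses Harmful Requests": ["Refusal Accuracy"],
        "Provides Accurate Information": ["Refusal Accuracy"],
    },
}

def every_behavior_is_mapped(plan):
    """Check the one hard rule: every behavior has at least one metric."""
    mappings = plan["behavior_metric_mappings"]
    return all(mappings.get(b) for b in plan["behaviors"])
```

A plan failing this check would be rejected in review, since an unmapped behavior can never pass or fail.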

Reuse status

Architect checks what already exists on your platform before proposing new entities. Each behavior and metric in the plan is labelled:

| Label | Meaning |
| --- | --- |
| (reuse) | Already exists — Architect uses it as-is |
| (improve) | Exists but needs adjustment — Architect refines it in place |
| (new) | Doesn’t exist — Architect creates it |

This prevents duplicate behaviors and metrics from accumulating across sessions.
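The labelling step can be pictured as a comparison between the proposed names and what already exists. The sketch below uses deliberately simple matching rules (exact match means reuse, a case-insensitive match means improve, anything else is new); Architect's actual comparison is not documented here and is presumably richer.

```python
def label_entities(proposed, existing):
    """Label each proposed behavior/metric name as reuse, improve, or new.

    Hypothetical matching logic for illustration only:
    exact match -> reuse; case-insensitive match -> improve; else -> new.
    """
    existing_by_lower = {name.lower() for name in existing}
    labels = {}
    for name in proposed:
        if name in existing:
            labels[name] = "reuse"
        elif name.lower() in existing_by_lower:
            labels[name] = "improve"
        else:
            labels[name] = "new"
    return labels
```

Running the check before creation is what keeps near-duplicates ("Refuses Harmful Requests" vs. "refuses harmful requests") from piling up across sessions.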

Naming conventions

Architect uses Title Case for behavior and metric names — two to five words describing what is being measured or expected.

| Good | Avoid |
| --- | --- |
| Refuses Harmful Requests | refuses_harmful_requests |
| Provides Accurate Information | accuracy |
| Handles Ambiguous Queries | isHandlingAmbiguous |
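The convention above can be expressed as a small validator. This is a hypothetical check written to mirror the stated rule (Title Case, two to five words); the docs don't say whether Architect enforces it programmatically.

```python
import re

def is_valid_name(name):
    """Hypothetical check for the naming convention described above:
    two to five words, each starting with an uppercase letter."""
    words = name.split()
    if not 2 <= len(words) <= 5:
        return False
    # Each word: one uppercase letter followed by lowercase letters.
    return all(re.fullmatch(r"[A-Z][a-z]*", w) for w in words)
```

Note how the rule rejects all three "avoid" examples for different reasons: snake_case and camelCase names parse as a single word, and "accuracy" is both a single word and lowercase.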

Test generation vs. verbatim import

By default, Architect generates test content using the Rhesis synthesizer. You supply the goal (“what should the test probe for?”) and Architect writes the actual prompts. This gives varied, realistic test cases without you having to author each one.

If you have specific prompts that must appear verbatim — for example, prompts from a security audit or a bug report — tell Architect explicitly: “Use these exact prompts.” It will import them without modification.
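The two sourcing modes amount to a simple branch: verbatim prompts pass through untouched, everything else goes through the synthesizer. The interface below is a hypothetical sketch of that decision, not the Rhesis synthesizer API.

```python
def build_tests(goal=None, verbatim_prompts=None, synthesize=None):
    """Sketch of the two test-sourcing modes (hypothetical interface).

    - verbatim_prompts given: import them without modification, e.g.
      prompts from a security audit or a bug report.
    - otherwise: hand the goal to a synthesizer callable that writes
      varied, realistic prompts.
    """
    if verbatim_prompts is not None:
        # "Use these exact prompts" -- no rewording, no additions.
        return list(verbatim_prompts)
    return synthesize(goal)
```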

Knowledge sources

If your endpoint’s correct behavior depends on internal documentation — a product FAQ, an API spec, a policy document — you can ground the test generation in that content.

Just reference the document by name in your message:

"Use our product FAQ as the basis for the test prompts."

Architect looks up the matching source in your platform knowledge library, passes its ID to the synthesizer, and the generated tests are grounded in that content automatically.

Knowledge source grounding only applies to single-turn test sets. Multi-turn test generation does not consume sources.
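The lookup step described above can be sketched as a name match against the knowledge library. Everything here is assumed for illustration: the library structure, the case-insensitive matching, and the source ID format are not the platform's actual internals.

```python
def resolve_source(message, library):
    """Find a knowledge source referenced by name in a user message.

    Hypothetical lookup: `library` maps source names to IDs, and the
    match is case-insensitive substring search. Returns the source ID
    to pass to the synthesizer, or None if nothing matches (grounding
    only applies to single-turn test sets either way).
    """
    text = message.lower()
    for name, source_id in library.items():
        if name.lower() in text:
            return source_id
    return None
```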

Iterating on the plan

The plan is a conversation. After Architect presents it, you can:

  • Ask for more or fewer test sets
  • Swap a metric for an existing one
  • Add specific behaviors Architect didn’t include
  • Remove entities you don’t need

Architect updates the plan and presents it again before creating anything. There is no limit to how many rounds of revision you can make.

→ Once you approve, see Running and Analyzing for what happens next.
→ For confirmation controls, see Chat Features.