# Planning Test Suites
After exploration, Architect proposes a structured plan. You review it, ask for changes, and approve before anything is created.
## Plan structure
A plan has five sections:
| Section | Required | What it contains |
|---|---|---|
| Project | No | A name and description to group the test suite. Omit for ad-hoc tests against an existing endpoint. |
| Behaviors | Yes | What the endpoint should (or shouldn’t) do. Each test is tagged with a behavior. |
| Test sets | Yes | Named collections of tests, each targeting specific behaviors, categories, and topics. |
| Metrics | Yes | How each test is evaluated — the evaluation criteria and pass/fail threshold. |
| Behavior-metric mappings | Yes | Which metric evaluates which behavior. Every behavior needs at least one. |
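To make the structure concrete, a plan can be pictured as a plain data structure with a small validator enforcing the rules above. The field names and schema here are illustrative assumptions for this sketch, not the actual Rhesis plan format:

```python
def validate_plan(plan: dict) -> list[str]:
    """Check the structural rules described above (hypothetical schema)."""
    errors = []
    # Project is optional; the other four sections are required.
    for section in ("behaviors", "test_sets", "metrics", "behavior_metric_mappings"):
        if not plan.get(section):
            errors.append(f"missing required section: {section}")
    # Every behavior needs at least one metric mapped to it.
    mapped = {m["behavior"] for m in plan.get("behavior_metric_mappings", [])}
    for behavior in plan.get("behaviors", []):
        if behavior["name"] not in mapped:
            errors.append(f"behavior '{behavior['name']}' has no metric mapped")
    return errors

plan = {
    "project": {"name": "Support Bot Evals"},  # optional grouping
    "behaviors": [{"name": "Refuses Harmful Requests"}],
    "test_sets": [{"name": "Safety Probes", "behaviors": ["Refuses Harmful Requests"]}],
    "metrics": [{"name": "Refusal Accuracy", "threshold": 0.9}],
    "behavior_metric_mappings": [
        {"behavior": "Refuses Harmful Requests", "metric": "Refusal Accuracy"}
    ],
}
print(validate_plan(plan))  # → []
```

Dropping the mappings section, or leaving a behavior unmapped, would surface as an error — the same checks Architect applies when assembling the plan.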
## Reuse status
Architect checks what already exists on your platform before proposing new entities. Each behavior and metric in the plan is labelled:
| Label | Meaning |
|---|---|
| (reuse) | Already exists — Architect uses it as-is |
| (improve) | Exists but needs adjustment — Architect refines it in place |
| (new) | Doesn’t exist — Architect creates it |
This prevents duplicate behaviors and metrics from accumulating across sessions.
## Naming conventions

Architect uses Title Case for behavior and metric names — two to five words describing what is being measured or expected.

| Good | Avoid |
|---|---|
| Refuses Harmful Requests | `refuses_harmful_requests` |
| Provides Accurate Information | `accuracy` |
| Handles Ambiguous Queries | `isHandlingAmbiguous` |
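The convention is mechanical enough to express as a small normalizer. This helper is a hypothetical illustration of the rule, not part of Architect:

```python
import re

def to_behavior_name(raw: str) -> str:
    """Normalize a snake_case or camelCase identifier into Title Case."""
    words = raw.replace("_", " ")
    # Insert a space at each lower-to-upper camelCase boundary.
    words = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", words)
    return " ".join(w.capitalize() for w in words.split())

print(to_behavior_name("refuses_harmful_requests"))  # → Refuses Harmful Requests
print(to_behavior_name("isHandlingAmbiguous"))       # → Is Handling Ambiguous
```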
## Test generation vs. verbatim import
By default, Architect generates test content using the Rhesis synthesizer. You supply the goal (“what should the test probe for?”) and Architect writes the actual prompts. This gives varied, realistic test cases without you having to author each one.
If you have specific prompts that must appear verbatim — for example, prompts from a security audit or a bug report — tell Architect explicitly: “Use these exact prompts.” It will import them without modification.
## Knowledge sources
If your endpoint’s correct behavior depends on internal documentation — a product FAQ, an API spec, a policy document — you can ground the test generation in that content.
Just reference the document by name in your message:

> "Use our product FAQ as the basis for the test prompts."

Architect looks up the matching source in your platform knowledge library, passes its ID to the synthesizer, and the generated tests are grounded in that content automatically.
Knowledge source grounding only applies to single-turn test sets. Multi-turn test generation does not consume sources.
## Iterating on the plan
The plan is a conversation. After Architect presents it, you can:
- Ask for more or fewer test sets
- Swap a metric for an existing one
- Add specific behaviors Architect didn’t include
- Remove entities you don’t need
Architect updates the plan and presents it again before creating anything. There is no limit on the number of revision rounds you can request.
→ Once you approve, see Running and Analyzing for what happens next.

→ For confirmation controls, see Chat Features.