
Architect Agent (SDK)

The ArchitectAgent class lives in sdk/src/rhesis/sdk/agents/architect/. It drives the conversational test-suite design workflow — managing mode transitions, plan state, tool confirmation, and event emission across conversation turns.

Module layout

```
sdk/src/rhesis/sdk/agents/architect/
├── __init__.py               # Public exports
├── agent.py                  # ArchitectAgent class
├── config.py                 # ArchitectConfig dataclass
├── plan.py                   # ArchitectPlan Pydantic model + spec models
├── tool_registry.py          # TOOL_REGISTRY mapping tools → modes
└── prompt_templates/
    ├── system_prompt.j2      # Core behavior, rules, workflow phases
    ├── personality.j2        # Tone and persona shaping
    ├── streaming_response.j2 # Streaming acknowledgment format
    └── iteration_prompt.j2   # Injected between ReAct loop iterations
```

ArchitectAgent class

ArchitectAgent extends BaseAgent with multi-turn conversation state, plan tracking, mode management, and the two-layer write guard.

Constructor parameters:

| Parameter | Type | Default | Purpose |
| --- | --- | --- | --- |
| `model` | `str \| BaseLLM \| None` | `None` | LLM model identifier or instance |
| `tools` | `list[BaseTool \| MCPTool] \| None` | `None` | Tool list injected by caller |
| `config` | `ArchitectConfig \| None` | `None` | Config override (uses defaults if omitted) |
| `max_iterations` | `int \| None` | `15` | ReAct loop iteration cap |
| `max_tool_executions` | `int \| None` | `None` | Optional cap on tool calls per turn |
| `timeout_seconds` | `float \| None` | `None` | Per-turn timeout |
| `history_window` | `int \| None` | `None` | Conversation history window |
| `verbose` | `bool` | `False` | Print loop trace to stdout |
| `event_handlers` | `list[AgentEventHandler] \| None` | `None` | Real-time lifecycle callbacks |

Internal state:

| Attribute | Purpose |
| --- | --- |
| `_conversation_history` | Full turn history (user + assistant messages) |
| `_plan` | Current `ArchitectPlan` instance, set when the agent calls `save_plan` |
| `_mode` | Current `AgentMode` (DISCOVERY, PLANNING, CREATING, EXECUTING) |
| `_creation_approved` | True for the turn immediately after confirmation |
| `_confirming_tools` | Specific mutating tools blocked and awaiting user confirmation |
| `_mutating_tools` | Lazily built from tool `requires_confirmation` metadata |
| `_auto_approve_all` | When True, bypasses per-turn confirmation for all mutating tools |
| `_discovery_state` | Tracks endpoint ID, exploration status, observations, open questions |
| `_id_to_name` | UUID → entity name map used to resolve IDs in mapping tool calls |
| `_pending_tasks` | Async tasks submitted but not yet resolved (Celery background tasks) |
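For example, `_id_to_name` lets the agent substitute human-readable names when a mapping tool call arrives carrying UUIDs. A minimal sketch of that resolution (`resolve_entity_ref` and the sample UUID are illustrative, not the SDK API):

```python
# Illustrative UUID → entity name map, as _id_to_name might hold it
_id_to_name = {
    "3fa85f64-5717-4562-b3fc-2c963f66afa6": "Toxicity",
}

def resolve_entity_ref(ref: str) -> str:
    # Known UUIDs resolve to names; anything else passes through unchanged
    return _id_to_name.get(ref, ref)
```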

Public methods:

```python
# Synchronous — runs asyncio.run() internally
response = architect.chat("Test my chatbot", attachments={"files": [...]})

# Async — preferred when running inside an existing event loop
response = await architect.chat_async("Test my chatbot", attachments={"files": [...]})
```

ArchitectConfig

ArchitectConfig is a frozen dataclass. Pass a custom instance to the config parameter to override any value.

| Field | Default | Purpose |
| --- | --- | --- |
| `max_iterations` | `15` | Maximum ReAct loop iterations per turn |
| `max_payload_bytes` | `100_000` | Total argument payload size limit |
| `max_string_value_len` | `10_000` | Max length of any single string argument |
| `max_array_items` | `100` | Max items in any array argument |
| `max_attachment_chars` | `20_000` | Attachment text truncation limit (chars) |
| `recent_msg_limit` | `4` | Number of recent messages kept at full length |
| `recent_msg_max_chars` | `2_000` | Max chars per recent message |
| `older_msg_max_chars` | `500` | Max chars per older (compressed) message |
| `tool_result_preview_chars` | `4_000` | Tool result preview length for streaming |
| `reasoning_preview_chars` | `200` | Reasoning text preview length |
| `readonly_http_methods` | `{GET, HEAD, OPTIONS}` | Methods treated as non-mutating (no confirmation required) |
```python
from rhesis.sdk.agents.architect import ArchitectAgent, ArchitectConfig

config = ArchitectConfig(max_iterations=25, max_attachment_chars=50_000)
agent = ArchitectAgent(model="vertex_ai/gemini-2.0-flash", config=config)
```
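The message-compression fields combine per turn roughly like this: a minimal sketch, assuming each message is truncated to a budget based on its recency (`compress_history` is illustrative, not the SDK API):

```python
def compress_history(messages: list[str], recent_msg_limit: int = 4,
                     recent_msg_max_chars: int = 2_000,
                     older_msg_max_chars: int = 500) -> list[str]:
    # The last `recent_msg_limit` messages keep the larger budget;
    # everything older is compressed to the smaller one.
    cutoff = len(messages) - recent_msg_limit
    return [
        msg[:recent_msg_max_chars] if i >= cutoff else msg[:older_msg_max_chars]
        for i, msg in enumerate(messages)
    ]
```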

ArchitectPlan model

ArchitectPlan (in plan.py) is the Pydantic model that holds the agent’s structured test suite proposal. The agent populates it by calling the internal save_plan tool during the planning phase.

Spec models

| Model | Key fields |
| --- | --- |
| `ProjectSpec` | `name`, `description`, `completed` |
| `BehaviorSpec` | `name`, `description`, `reuse_status`, `existing_id`, `completed` |
| `TestSetSpec` | `name`, `description`, `num_tests`, `test_type`, `generation_prompt`, `behaviors`, `categories`, `topics`, `completed` |
| `MetricSpec` | `name`, `description`, `reuse_status`, `existing_id`, `evaluation_prompt`, `evaluation_steps`, `threshold`, `threshold_operator`, `completed` |
| `MappingSpec` | `behavior`, `metrics` (list of metric names), `completed` |

reuse_status is a Literal["reuse", "improve", "new"] on BehaviorSpec and MetricSpec.
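In plain Python terms the constraint looks like this. Pydantic enforces it automatically on the real spec models; `validate_reuse_status` is illustrative only:

```python
from typing import Literal, get_args

ReuseStatus = Literal["reuse", "improve", "new"]

def validate_reuse_status(value: str) -> str:
    # Pydantic raises a ValidationError for the same case on the real models
    if value not in get_args(ReuseStatus):
        raise ValueError(f"invalid reuse_status: {value!r}")
    return value
```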

save_plan auto-generation

The build_save_plan_tool() function in plan.py generates the save_plan JSON schema automatically from ArchitectPlan’s Pydantic field definitions. Internal fields (completed, existing_id) are stripped from the tool schema — the LLM never sees them; they are populated by the agent at creation time.

This means the plan schema stays in sync with the Pydantic model automatically. If you add a new field to a spec class, build_save_plan_tool() picks it up without any manual schema update — unless the field should be hidden from the LLM, in which case add it to the exclusion list in build_save_plan_tool().
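A simplified sketch of the exclusion step. The real `build_save_plan_tool()` derives the schema from the Pydantic field definitions; `strip_internal_fields` is a hypothetical stand-in that shows only the field-stripping idea:

```python
# Agent-internal fields that the LLM should never see in the tool schema
EXCLUDED_FIELDS = {"completed", "existing_id"}

def strip_internal_fields(schema: dict) -> dict:
    # Drop excluded names from both `properties` and `required`
    properties = {
        name: spec
        for name, spec in schema.get("properties", {}).items()
        if name not in EXCLUDED_FIELDS
    }
    required = [f for f in schema.get("required", []) if f not in EXCLUDED_FIELDS]
    return {**schema, "properties": properties, "required": required}
```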

Tool registry

TOOL_REGISTRY in tool_registry.py maps tool names to their AgentMode and optional PlanCategory. The agent uses this to:

  1. Determine mode transitions — when the agent calls a tool registered under CREATING, the mode switches to AgentMode.CREATING.
  2. Track plan progress — when a creating tool succeeds, its PlanCategory is used to tick the corresponding checkbox in the plan.
| Tool | Mode | Plan category |
| --- | --- | --- |
| `list_sources` | DISCOVERY | |
| `create_project` | CREATING | PROJECT |
| `create_behavior` | CREATING | BEHAVIOR |
| `generate_test_set` | CREATING | TEST_SET |
| `create_test_set_bulk` | CREATING | TEST_SET |
| `create_metric` | CREATING | METRIC |
| `generate_metric` | CREATING | METRIC |
| `improve_metric` | CREATING | METRIC |
| `add_behavior_to_metric` | CREATING | MAPPING |
| `execute_test_set` | EXECUTING | |
| `get_test_result_stats` | EXECUTING | |
| `get_test_run_stats` | EXECUTING | |

Tools not in the registry (read-only platform tools, explore tool) do not trigger mode changes.
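The mode-transition lookup can be sketched as follows. `ToolEntry`, `AgentMode`, and `PlanCategory` here are simplified stand-ins for the real definitions in tool_registry.py, and the registry holds only two sample entries:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class AgentMode(Enum):
    DISCOVERY = "discovery"
    PLANNING = "planning"
    CREATING = "creating"
    EXECUTING = "executing"

class PlanCategory(Enum):
    PROJECT = "project"
    BEHAVIOR = "behavior"

@dataclass
class ToolEntry:
    mode: AgentMode
    plan_category: Optional[PlanCategory] = None

TOOL_REGISTRY = {
    "list_sources": ToolEntry(mode=AgentMode.DISCOVERY),
    "create_project": ToolEntry(mode=AgentMode.CREATING,
                                plan_category=PlanCategory.PROJECT),
}

def next_mode(tool_name: str, current: AgentMode) -> AgentMode:
    # Tools absent from the registry leave the mode unchanged
    entry = TOOL_REGISTRY.get(tool_name)
    return entry.mode if entry else current
```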

Prompt templates

Templates are Jinja2 files in prompt_templates/. They are rendered at the start of each turn.

| Template | What it controls |
| --- | --- |
| `system_prompt.j2` | Core agent persona, workflow phases, tool usage rules, write guard instructions, security boundaries, and off-topic refusal rules. This is the primary source of truth for agent behavior. |
| `personality.j2` | Tone shaping — how Architect presents itself (direct, structured, no filler). Injected alongside the system prompt. |
| `streaming_response.j2` | Format for streaming acknowledgments shown to the user while tools execute. |
| `iteration_prompt.j2` | Injected between ReAct iterations to keep the agent on track and prevent loops. |

When modifying behavior, change system_prompt.j2 first. The other templates are secondary.
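Since the templates are standard Jinja2 files, rendering works like any other Jinja2 template. A minimal sketch using an inline template rather than the real `system_prompt.j2` (the variable names here are illustrative, not the actual template context):

```python
from jinja2 import Template

# Inline stand-in for a prompt_templates/*.j2 file
template = Template("You are {{ persona }}. Current mode: {{ mode }}.")
prompt = template.render(persona="Architect", mode="PLANNING")
```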

Write guard (two-layer safety)

The write guard prevents the agent from creating or modifying platform entities without user approval.

Layer 1 — prompt: system_prompt.j2 instructs the LLM to always present a plan and ask for confirmation before calling any mutating tool.

Layer 2 — structural: agent.py intercepts tool calls at execution time. If a mutating tool is called before _creation_approved is True, the agent blocks it, records the tool name in _confirming_tools, and presents a confirmation prompt to the user. On the next turn, if the user confirms, only the specific blocked tools are unlocked.

_auto_approve_all = True bypasses layer 2 for the session (set when the UI auto-approve toggle is on). Layer 1 (prompt) is always active.

A tool is considered mutating if its HTTP method is not in ArchitectConfig.readonly_http_methods (GET, HEAD, OPTIONS).
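That check reduces to a small predicate; a sketch with the default read-only set inlined (the real logic lives in agent.py and reads the method from tool metadata):

```python
READONLY_HTTP_METHODS = {"GET", "HEAD", "OPTIONS"}  # ArchitectConfig default

def is_mutating(http_method: str) -> bool:
    # Any method outside the read-only set requires user confirmation
    return http_method.upper() not in READONLY_HTTP_METHODS
```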

How to add a new MCP tool

1. Define the tool in `mcp_tools.yaml` (backend):

```yaml
- name: my_new_tool
  description: "What this tool does."
  method: POST
  path: /my-resource/
  requires_confirmation: true   # set true for mutating tools
```

2. Register the tool in `TOOL_REGISTRY` if it should trigger a mode change or track plan progress:

```python
# tool_registry.py
TOOL_REGISTRY["my_new_tool"] = ToolEntry(
    mode=AgentMode.CREATING,
    plan_category=PlanCategory.BEHAVIOR,  # or None
)
```

3. Update the system prompt (`system_prompt.j2`) if the tool requires specific usage guidance — when to call it, what arguments to pass, and how to interpret results.

4. Test with the playground scripts: `playground/telemachus/architect_e2e.py` and `tool_call_chain.py` are good starting points for exercising new tool integrations.