Architect Background Tasks
This page documents the worker-side runtime for Architect chat execution and async resume behavior.
The implementation lives in:
- apps/backend/src/rhesis/backend/tasks/architect.py
- apps/backend/src/rhesis/backend/tasks/architect_monitor.py
- apps/backend/src/rhesis/backend/tasks/endpoint/explore.py
- apps/backend/src/rhesis/backend/tasks/architect_progress.py
Overview
Architect chat runs through Celery so long-running planning and execution work does not block WebSocket handlers.
High-level flow:
- WebSocket handler receives architect.message
- architect_chat_task runs in Celery
- Agent streams lifecycle events through Redis pub/sub
- Final response and session state are persisted
- If background tasks are pending, session is auto-resumed when they complete
Primary task: architect_chat_task
Task definition:
What it restores before each turn
From architect_session + architect_message records, the task restores:
- mode
- plan data
- guard state
- discovery state
- id-to-name cache
- conversation history
This ensures the agent continues correctly across turns and worker boundaries.
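As a rough illustration of the restore step, the sketch below rebuilds per-turn state from plain dictionaries standing in for the architect_session and architect_message records. The RestoredState type, the restore_state function, and the dictionary layout (including nesting guard_state, discovery_state, and id_to_name under agent_state) are assumptions made for this example, not the actual model schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class RestoredState:
    """Hypothetical container for everything restored before a turn."""

    mode: Optional[str]
    plan_data: Optional[Dict[str, Any]]
    guard_state: Dict[str, Any]
    discovery_state: Dict[str, Any]
    id_to_name: Dict[str, str]
    history: List[Dict[str, str]]


def restore_state(
    session: Dict[str, Any], messages: List[Dict[str, str]]
) -> RestoredState:
    """Rebuild agent state from a session record and its ordered messages."""
    # Assumption: the per-agent pieces live under one "agent_state" blob.
    agent_state = session.get("agent_state") or {}
    return RestoredState(
        mode=session.get("mode"),
        plan_data=session.get("plan_data"),
        # Missing pieces fall back to empty dicts so a first turn still works.
        guard_state=dict(agent_state.get("guard_state") or {}),
        discovery_state=dict(agent_state.get("discovery_state") or {}),
        id_to_name=dict(agent_state.get("id_to_name") or {}),
        history=[{"role": m["role"], "content": m["content"]} for m in messages],
    )
```

Copying each piece into a fresh dict keeps the restored state independent of the loaded record, so a turn that fails midway cannot leave a half-mutated session behind.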
What it persists after each turn
After chat_async() returns, the task writes:
- assistant message
- updated mode
- serialized plan (plan_data)
- serialized agent_state including:
  - discovery_state
  - guard_state
  - pending_tasks
  - id_to_name
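One way to picture the agent_state payload is as a JSON document holding exactly the four keys listed above. The sketch below assumes JSON as the storage encoding; serialize_agent_state and deserialize_agent_state are hypothetical helper names, and the real task may store the state differently.

```python
import json
from typing import Any, Dict

# The four pieces the page lists as part of agent_state.
AGENT_STATE_KEYS = ("discovery_state", "guard_state", "pending_tasks", "id_to_name")


def serialize_agent_state(state: Dict[str, Any]) -> str:
    """Keep only the known keys and encode them as a JSON string."""
    payload = {key: state.get(key) for key in AGENT_STATE_KEYS}
    return json.dumps(payload, sort_keys=True)


def deserialize_agent_state(raw: str) -> Dict[str, Any]:
    """Decode a stored agent_state string back into a dictionary."""
    data = json.loads(raw)
    return {key: data.get(key) for key in AGENT_STATE_KEYS}
```

Restricting the payload to a fixed key list means transient in-memory fields never leak into the persisted session.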
Streaming bridge
WebSocketEventHandler maps agent events to Redis-published WebSocket events, including:
- architect.thinking
- architect.tool_start
- architect.tool_end
- architect.mode_change
- architect.plan_update
- architect.stream_start
- architect.text_chunk
- architect.stream_end
- architect.task_progress
- architect.error
This is how the frontend receives near real-time progress while the agent is running in Celery.
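The bridge pattern can be sketched as a small handler that turns agent callbacks into JSON messages on a pub/sub channel. This is not the actual WebSocketEventHandler: the EventBridge class, its callback names (on_tool_start, on_text_chunk), the channel name, and the message envelope are all assumptions for illustration. Only the event type strings come from the list above.

```python
import json
from typing import Any, Callable, Dict

# Event types taken from the list above.
KNOWN_EVENTS = {
    "architect.thinking", "architect.tool_start", "architect.tool_end",
    "architect.mode_change", "architect.plan_update", "architect.stream_start",
    "architect.text_chunk", "architect.stream_end", "architect.task_progress",
    "architect.error",
}

# Any callable taking (channel, message), injected so the sketch needs no Redis.
Publish = Callable[[str, str], Any]


class EventBridge:
    """Hypothetical handler that forwards agent events to a pub/sub channel."""

    def __init__(self, publish: Publish, channel: str) -> None:
        self._publish = publish
        self._channel = channel

    def emit(self, event_type: str, payload: Dict[str, Any]) -> None:
        if event_type not in KNOWN_EVENTS:
            raise ValueError(f"unknown event type: {event_type}")
        # Assumed envelope: a JSON object with "type" and "payload" fields.
        message = json.dumps({"type": event_type, "payload": payload})
        self._publish(self._channel, message)

    def on_tool_start(self, tool_name: str) -> None:
        self.emit("architect.tool_start", {"tool": tool_name})

    def on_text_chunk(self, text: str) -> None:
        self.emit("architect.text_chunk", {"text": text})
```

In a real worker the injected callable could be the publish method of a redis-py client, which takes a channel and a message in the same order.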
Live progress from awaited tasks
Some tasks continue after the Architect turn has returned an awaiting_task response. For endpoint exploration, run_exploration_task publishes architect.task_progress events while Penelope connects to the endpoint, runs the selected strategy, and probes individual turns.
Progress publication uses Redis task-session lookup keys set by register_awaiting_tasks. If a task is not currently awaited by an Architect session, progress publication is a no-op, so the same Celery task can serve external API callers that poll GET /jobs/{task_id}.
| Progress status | Meaning |
|---|---|
| started | The awaited task has begun. |
| progress | A task step or probing turn is in flight. |
| completed | The task finished and included any final duration. |
| failed | The task failed and the label contains a user-facing summary. |
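The no-op behavior for tasks that no session is awaiting can be sketched as a lookup followed by an early return. The example uses a plain dictionary in place of Redis, and it assumes the lookup key maps a task ID to a session ID. The publish_progress function and the event fields are hypothetical, chosen only to mirror the statuses in the table.

```python
from typing import Any, Callable, Dict

VALID_STATUSES = {"started", "progress", "completed", "failed"}


def publish_progress(
    lookup: Dict[str, str],
    task_id: str,
    status: str,
    label: str,
    publish: Callable[[str, Dict[str, Any]], None],
) -> bool:
    """Publish a progress event only if a session is awaiting this task.

    Returns True when an event was published, False for the no-op case.
    """
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown progress status: {status}")
    # Assumption: the lookup key stores the awaiting session's ID.
    session_id = lookup.get(f"arch:task:{task_id}")
    if session_id is None:
        # Not awaited by any Architect session, e.g. an external API caller
        # that polls for the result instead.
        return False
    publish(
        session_id,
        {
            "type": "architect.task_progress",
            "task_id": task_id,
            "status": status,
            "label": label,
        },
    )
    return True
```

Returning early rather than raising lets the same task body run unchanged whether it was started by an Architect session or by an external caller.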
Async waiting and auto-resume
When the agent uses internal await_task, architect_chat_task registers pending IDs with:
register_awaiting_tasks(session_id, task_ids, org_id, user_id, auto_approve)
The monitor (architect_monitor.py) stores this in Redis:
- arch:task:<id> for lookup
- arch:count:<session_id> as a countdown
- arch:result:<session_id>:<task_id> for completed task summaries
On Celery task_postrun, the monitor:
- checks whether the completed task is awaited
- stores summarized result
- decrements countdown
- when countdown reaches zero, dispatches a new architect_chat_task turn with a [TASK_COMPLETED] system message
This replaces polling loops with event-driven completion: the session resumes only once every awaited task has reported back.
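The register, countdown, and resume cycle can be sketched with an in-memory dictionary standing in for Redis. The key names follow the patterns described in this section. The AwaitMonitor class, the dispatch callback, the simplified register_awaiting_tasks signature, and the layout of the [TASK_COMPLETED] message body are assumptions for illustration; the real monitor hooks into Celery's task_postrun signal and is not reproduced here.

```python
from typing import Any, Callable, Dict, List


class AwaitMonitor:
    """In-memory sketch of the countdown logic (the real store is Redis)."""

    def __init__(self, dispatch: Callable[[str, str], None]) -> None:
        self._store: Dict[str, Any] = {}
        # Hypothetical hook that would enqueue a new architect_chat_task turn.
        self._dispatch = dispatch

    def register_awaiting_tasks(self, session_id: str, task_ids: List[str]) -> None:
        for task_id in task_ids:
            self._store[f"arch:task:{task_id}"] = session_id
        self._store[f"arch:count:{session_id}"] = len(task_ids)

    def on_task_postrun(self, task_id: str, summary: str) -> bool:
        """Handle one finished task; returns False if nobody awaited it."""
        session_id = self._store.pop(f"arch:task:{task_id}", None)
        if session_id is None:
            return False
        self._store[f"arch:result:{session_id}:{task_id}"] = summary
        count_key = f"arch:count:{session_id}"
        self._store[count_key] -= 1
        if self._store[count_key] == 0:
            self._resume(session_id)
        return True

    def _resume(self, session_id: str) -> None:
        prefix = f"arch:result:{session_id}:"
        lines = [
            f"{key[len(prefix):]}: {value}"
            for key, value in sorted(self._store.items())
            if key.startswith(prefix)
        ]
        # Assumed message layout: the marker followed by one line per task.
        self._dispatch(session_id, "[TASK_COMPLETED]\n" + "\n".join(lines))
```

With Redis as the store, the decrement would typically use the atomic DECR command so that two tasks finishing at the same moment cannot both observe a nonzero count, or both observe zero.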
Configuration and limits
Key defaults:
| Setting | Value | Source |
|---|---|---|
| Task soft limit | 300 seconds | architect_chat_task |
| Task hard limit | 360 seconds | architect_chat_task |
| Awaiting keys TTL | 7200 seconds | architect_monitor.py |
Common operational checks
- Confirm worker has Redis connectivity (await/resume depends on Redis keys and task_postrun).
- Confirm WebSocket Redis subscriber is running (for streamed event fan-out).
- Confirm delegation token auth is valid for local tool provider calls.
- Confirm session ownership checks pass before task dispatch.