Architect Background Tasks

This page documents the worker-side runtime for Architect chat execution and async resume behavior.

The implementation lives in:

  • apps/backend/src/rhesis/backend/tasks/architect.py
  • apps/backend/src/rhesis/backend/tasks/architect_monitor.py
  • apps/backend/src/rhesis/backend/tasks/endpoint/explore.py
  • apps/backend/src/rhesis/backend/tasks/architect_progress.py

Overview

Architect chat runs through Celery so long-running planning and execution work does not block WebSocket handlers.

High-level flow:

  1. WebSocket handler receives architect.message
  2. architect_chat_task runs in Celery
  3. Agent streams lifecycle events through Redis pub/sub
  4. Final response and session state are persisted
  5. If background tasks are pending, the session is auto-resumed when they complete

Primary task: architect_chat_task

Task definition:

architect_task_signature.py
@app.task(
    base=SilentTask,
    name="rhesis.backend.tasks.architect.architect_chat_task",
    bind=True,
    max_retries=1,
    soft_time_limit=300,
    time_limit=360,
)
def architect_chat_task(
    self,
    session_id: str,
    user_message: str,
    attachments: Optional[Dict[str, Any]] = None,
    auto_approve: Optional[bool] = None,
    **kwargs: Any,
) -> Dict[str, Any]: ...

What it restores before each turn

From architect_session + architect_message records, the task restores:

  • mode
  • plan data
  • guard state
  • discovery state
  • id-to-name cache
  • conversation history

This ensures the agent continues correctly across turns and worker boundaries.
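The restore step can be pictured with a minimal sketch. It assumes session and message records are available as plain dictionaries; the `RestoredState` container, the `restore_state` helper, the `created_at`, `role`, and `content` field names, and the `default_mode` fallback are all hypothetical and not taken from architect.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RestoredState:
    """Hypothetical container for the per-turn state listed above."""

    mode: str
    plan_data: Dict[str, Any]
    guard_state: Dict[str, Any]
    discovery_state: Dict[str, Any]
    id_to_name: Dict[str, str]
    history: List[Dict[str, str]] = field(default_factory=list)


def restore_state(
    session: Dict[str, Any],
    messages: List[Dict[str, Any]],
    default_mode: str = "default",  # hypothetical fallback, not from the source
) -> RestoredState:
    """Rebuild agent state from one session record and its message records."""
    agent_state = session.get("agent_state") or {}
    # Replay messages oldest-first so the history keeps its original order.
    ordered = sorted(messages, key=lambda m: m["created_at"])
    return RestoredState(
        mode=session.get("mode") or default_mode,
        plan_data=session.get("plan_data") or {},
        guard_state=agent_state.get("guard_state") or {},
        discovery_state=agent_state.get("discovery_state") or {},
        id_to_name=agent_state.get("id_to_name") or {},
        history=[{"role": m["role"], "content": m["content"]} for m in ordered],
    )
```

Falling back to empty dictionaries keeps a first turn, where nothing has been persisted yet, on the same code path as later turns.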

What it persists after each turn

After chat_async() returns, the task writes:

  • assistant message
  • updated mode
  • serialized plan (plan_data)
  • serialized agent_state including:
    • discovery_state
    • guard_state
    • pending_tasks
    • id_to_name
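As a rough illustration of the agent_state payload, the sketch below serializes the four fields listed above to JSON. The `serialize_agent_state` helper is hypothetical, and storing the state as a JSON string (rather than, say, a native JSON column) is an assumption.

```python
import json
from typing import Any, Dict, Iterable


def serialize_agent_state(
    discovery_state: Dict[str, Any],
    guard_state: Dict[str, Any],
    pending_tasks: Iterable[str],
    id_to_name: Dict[str, str],
) -> str:
    """Hypothetical helper: pack the persisted agent_state fields as JSON."""
    payload = {
        "discovery_state": discovery_state,
        "guard_state": guard_state,
        # Materialize as a list so sets or generators serialize cleanly.
        "pending_tasks": list(pending_tasks),
        "id_to_name": id_to_name,
    }
    # sort_keys gives a stable representation that is easy to diff.
    return json.dumps(payload, sort_keys=True)
```

Reading the value back with `json.loads` yields a dictionary with the same four keys, which is the shape a later turn would restore from.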

Streaming bridge

WebSocketEventHandler maps agent events to Redis-published WebSocket events, including:

  • architect.thinking
  • architect.tool_start
  • architect.tool_end
  • architect.mode_change
  • architect.plan_update
  • architect.stream_start
  • architect.text_chunk
  • architect.stream_end
  • architect.task_progress
  • architect.error

This is how the frontend receives near real-time progress while the agent is running in Celery.
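The bridge can be sketched as a small mapper that turns agent events into JSON messages and hands them to a publish callable. The event type strings come from the list above; the `EventBridge` class, the agent-side event keys, the channel name, and the message shape are hypothetical and do not describe the real WebSocketEventHandler.

```python
import json
from typing import Any, Callable, Dict

# WebSocket event types are from the list above; the keys are hypothetical.
AGENT_TO_WS_EVENT: Dict[str, str] = {
    "thinking": "architect.thinking",
    "tool_start": "architect.tool_start",
    "tool_end": "architect.tool_end",
    "mode_change": "architect.mode_change",
    "plan_update": "architect.plan_update",
    "stream_start": "architect.stream_start",
    "text_chunk": "architect.text_chunk",
    "stream_end": "architect.stream_end",
    "task_progress": "architect.task_progress",
    "error": "architect.error",
}


class EventBridge:
    """Hypothetical bridge: agent event in, published JSON message out."""

    def __init__(self, session_id: str, publish: Callable[[str, str], Any]) -> None:
        self._session_id = session_id
        self._publish = publish
        self._channel = f"architect:{session_id}"  # assumed channel name

    def emit(self, agent_event: str, payload: Dict[str, Any]) -> bool:
        ws_type = AGENT_TO_WS_EVENT.get(agent_event)
        if ws_type is None:
            return False  # unknown events are dropped rather than raised
        message = json.dumps(
            {"type": ws_type, "session_id": self._session_id, "payload": payload}
        )
        self._publish(self._channel, message)
        return True
```

Taking `publish` as a callable keeps the sketch testable without Redis; with redis-py, the bound `publish(channel, message)` method of a client could presumably be passed in directly.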

Live progress from awaited tasks

Some tasks continue after the Architect turn has returned an awaiting_task response. For endpoint exploration, run_exploration_task publishes architect.task_progress events while Penelope connects to the endpoint, runs the selected strategy, and probes individual turns.

Progress publication uses Redis task-session lookup keys set by register_awaiting_tasks. If a task is not currently awaited by an Architect session, progress publication is a no-op, so the same Celery task can serve external API callers that poll GET /jobs/{task_id}.

| Progress status | Meaning |
| --- | --- |
| started | The awaited task has begun. |
| progress | A task step or probing turn is in flight. |
| completed | The task finished and included any final duration. |
| failed | The task failed and the label contains a user-facing summary. |
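The no-op behavior for tasks that no session is awaiting can be sketched as follows. The `arch:task:<id>` key and the four status values come from this page; the `publish_task_progress` helper, its parameters, the event payload shape, and the assumption that the lookup key resolves to a session ID are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

VALID_STATUSES = {"started", "progress", "completed", "failed"}


def publish_task_progress(
    task_id: str,
    status: str,
    label: str,
    lookup: Callable[[str], Optional[str]],
    publish: Callable[[str, Dict[str, Any]], None],
) -> bool:
    """Publish a progress event only if a session is awaiting this task."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown progress status: {status!r}")
    # Assumption: the lookup key maps the task to the awaiting session ID.
    session_id = lookup(f"arch:task:{task_id}")
    if session_id is None:
        return False  # not awaited by any session: publishing is a no-op
    publish(
        session_id,
        {
            "type": "architect.task_progress",
            "task_id": task_id,
            "status": status,
            "label": label,
        },
    )
    return True
```

Because the function returns early when the lookup misses, a task started by an external API caller would run the same code path without emitting any Architect events.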

Async waiting and auto-resume

When the agent invokes its internal await_task, architect_chat_task registers the pending task IDs with:

register_awaiting_tasks(session_id, task_ids, org_id, user_id, auto_approve)

The monitor (architect_monitor.py) stores this in Redis:

  • arch:task:<id> for lookup
  • arch:count:<session_id> as a countdown
  • arch:result:<session_id>:<task_id> for completed task summaries

On Celery task_postrun, the monitor:

  1. checks whether the completed task is awaited
  2. stores summarized result
  3. decrements countdown
  4. when countdown reaches zero, dispatches a new architect_chat_task turn with a [TASK_COMPLETED] system message

This replaces polling loops with event-driven completion: the session resumes as soon as the last awaited task finishes.
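The countdown logic can be sketched with an in-memory stand-in for Redis. The key patterns and the [TASK_COMPLETED] marker come from this page; `InMemoryStore`, `register_awaiting_sketch`, `handle_task_postrun`, the stored values, and the resume message format are hypothetical and simplified (no TTLs, org or user IDs, or auto_approve handling).

```python
from typing import Any, Callable, Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for the Redis operations the monitor needs."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self.data.get(key)

    def delete(self, key: str) -> None:
        self.data.pop(key, None)

    def decr(self, key: str) -> int:
        self.data[key] = int(self.data.get(key, 0)) - 1
        return self.data[key]


def register_awaiting_sketch(store: InMemoryStore, session_id: str, task_ids: List[str]) -> None:
    for task_id in task_ids:
        store.set(f"arch:task:{task_id}", session_id)  # lookup key
    store.set(f"arch:count:{session_id}", len(task_ids))  # countdown


def handle_task_postrun(
    store: InMemoryStore,
    task_id: str,
    summary: str,
    dispatch: Callable[[str, str], None],
) -> bool:
    """Return True when this completion triggered a resume dispatch."""
    session_id = store.get(f"arch:task:{task_id}")
    if session_id is None:
        return False  # step 1: the task is not awaited by any session
    store.set(f"arch:result:{session_id}:{task_id}", summary)  # step 2
    store.delete(f"arch:task:{task_id}")
    if store.decr(f"arch:count:{session_id}") > 0:  # step 3
        return False
    # Step 4: collect all summaries and resume the session with one message.
    prefix = f"arch:result:{session_id}:"
    summaries = [v for k, v in sorted(store.data.items()) if k.startswith(prefix)]
    dispatch(session_id, "[TASK_COMPLETED] " + "; ".join(summaries))
    return True
```

With a real Redis backend, the decrement would rely on the atomicity of DECR so that two tasks finishing at the same time cannot both observe zero, and each key would also carry the TTL listed under Configuration and limits.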

Configuration and limits

Key defaults:

| Setting | Value | Source |
| --- | --- | --- |
| Task soft limit | 300 seconds | architect_chat_task |
| Task hard limit | 360 seconds | architect_chat_task |
| Awaiting keys TTL | 7200 seconds | architect_monitor.py |

Common operational checks

  • Confirm worker has Redis connectivity (await/resume depends on Redis keys and task_postrun).
  • Confirm WebSocket Redis subscriber is running (for streamed event fan-out).
  • Confirm delegation token auth is valid for local tool provider calls.
  • Confirm session ownership checks pass before task dispatch.