Rhesis Worker Documentation
This directory contains documentation related to the Rhesis worker system, which handles background processing, task queues, and asynchronous operations using Redis as the message broker.
Contents
- Background Tasks and Processing: Detailed information about Redis-based Celery configuration, task management, tenant context handling, error recovery, and troubleshooting.
- Troubleshooting Guide: Solutions for common issues with workers, tasks, and the Celery processing system.
- GKE Troubleshooting Guide: Comprehensive guide for diagnosing and fixing worker issues in Google Kubernetes Engine.
- Logging Guide: Complete guide to worker logging, log analysis, and monitoring.
- Trace Ingestion Pipeline: How traces flow through storage, enrichment, and metric evaluation after ingestion via POST /telemetry/traces.
- Architecture and Dependencies: Explanation of how the worker system integrates with the backend and SDK components.
Topics Covered
- Redis-based Celery configuration with TLS support
- Worker deployment and scaling in GKE (Google Kubernetes Engine)
- Task management and organization
- Async batch execution and cooperative cancellation
- Multi-tenancy in background tasks
- Error handling and recovery
- Task monitoring and observability
- GKE troubleshooting with kubectl
- Comprehensive logging and log analysis
- Redis connection diagnostics
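As an illustration of the Redis-with-TLS topic above, the sketch below builds Celery settings from a broker URL and enables certificate verification only when the URL uses the rediss:// scheme. The helper names build_celery_config and make_app are hypothetical and not part of Rhesis; the setting keys broker_url, result_backend, broker_use_ssl, and redis_backend_use_ssl are standard Celery options. Treat this as a sketch under those assumptions and check it against the worker's actual configuration described in Background Tasks and Processing.

```python
"""Hypothetical sketch: Celery settings for a Redis broker, with TLS for rediss:// URLs."""
import ssl
from typing import Optional
from urllib.parse import urlparse


def build_celery_config(broker_url: str, result_backend: Optional[str] = None) -> dict:
    """Return a dict suitable for Celery's app.conf.update(...).

    TLS options are added only for rediss:// URLs, the scheme Celery and
    redis-py use for Redis over TLS.
    """
    backend = result_backend or broker_url
    config = {"broker_url": broker_url, "result_backend": backend}
    if urlparse(broker_url).scheme == "rediss":
        # Require a valid server certificate on the broker connection.
        config["broker_use_ssl"] = {"ssl_cert_reqs": ssl.CERT_REQUIRED}
    if urlparse(backend).scheme == "rediss":
        # Same requirement for the Redis result backend.
        config["redis_backend_use_ssl"] = {"ssl_cert_reqs": ssl.CERT_REQUIRED}
    return config


def make_app(broker_url: str):
    # Lazy import so build_celery_config can be used without Celery installed.
    from celery import Celery

    app = Celery("rhesis_worker_sketch")  # app name is a placeholder
    app.conf.update(build_celery_config(broker_url))
    return app
```

Keeping the TLS decision tied to the URL scheme means the same code path serves a local redis:// broker and a managed rediss:// broker without a separate flag.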
Quick Start Guides
For Worker Registration Issues
If workers aren’t processing tasks or seem disconnected:
- Check Worker Status: Create and run check_workers.py (see Worker Registration)
- Quick Status:
- Test Redis Connection: Verify broker connectivity
- Scale Workers (for GKE):
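The registration checks above can be sketched with Celery's remote-control API. This is not the check_workers.py from Worker Registration; it is a hypothetical stand-in that assumes the celery and redis packages are installed and that you pass the same broker URL the workers use. It first pings Redis directly, then asks workers to answer a Celery ping and list their registered tasks. The helper names responding_workers and check_workers are illustrative.

```python
"""Hypothetical sketch of a worker-registration check (not the official check_workers.py)."""
from typing import Dict, List, Optional


def responding_workers(ping_reply: Optional[Dict[str, dict]]) -> List[str]:
    """Return the sorted names of workers that answered a Celery ping.

    inspect().ping() returns None when no worker replies within the timeout,
    otherwise a mapping such as {"celery@host": {"ok": "pong"}}.
    """
    if not ping_reply:
        return []
    return sorted(name for name, reply in ping_reply.items() if reply.get("ok") == "pong")


def check_workers(broker_url: str, timeout: float = 5.0) -> None:
    # Imported lazily so responding_workers can be used without these packages.
    import redis
    from celery import Celery

    # 1. Is the broker reachable at all? (rediss:// URLs connect over TLS.)
    redis.Redis.from_url(broker_url, socket_connect_timeout=timeout).ping()
    print("Redis broker: reachable")

    # 2. Which workers are registered and answering control commands?
    inspector = Celery(broker=broker_url).control.inspect(timeout=timeout)
    workers = responding_workers(inspector.ping())
    print(f"Responding workers ({len(workers)}): {', '.join(workers) or 'none'}")

    # 3. How many task names does each worker know about?
    for worker, tasks in (inspector.registered() or {}).items():
        print(f"{worker}: {len(tasks)} registered tasks")
```

If step 1 succeeds but step 2 reports no workers, the broker is healthy and the problem is on the worker side (not running, wrong broker URL, or wrong queue), which narrows where to look next.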
For GKE Worker Issues
If you’re experiencing deployment or connectivity issues:
- Connect to Cluster: Follow GKE Setup
- Check Pod Status:
- Check Logs:
- Test Health Endpoints:
- Check Worker Registration: Use Worker Registration Checking
- Full Diagnostics: See GKE Troubleshooting Guide and Logging Guide
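For the pod-status step, it can help to filter kubectl's JSON output down to the pods that actually need attention. The sketch below reads the standard Kubernetes Pod fields (status.phase, containerStatuses[].ready, containerStatuses[].restartCount). The namespace and label selector passed to fetch_pods are placeholders for whatever the Rhesis worker deployment uses, and the restart threshold of 3 is an arbitrary illustrative default; both helper names are hypothetical.

```python
"""Hypothetical sketch: flag worker pods that are not Running/Ready or keep restarting."""
import json
import subprocess
from typing import List, Tuple


def unhealthy_pods(pod_list: dict, max_restarts: int = 3) -> List[Tuple[str, str]]:
    """Return (pod name, reason) pairs from a `kubectl get pods -o json` document."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase != "Running":
            problems.append((name, f"phase={phase}"))
            continue
        for container in status.get("containerStatuses", []):
            cname = container["name"]
            if not container.get("ready", False):
                problems.append((name, f"container {cname} not ready"))
            restarts = container.get("restartCount", 0)
            if restarts > max_restarts:
                # Frequent restarts often indicate crash loops (e.g. failed broker connection).
                problems.append((name, f"container {cname} restarted {restarts} times"))
    return problems


def fetch_pods(namespace: str, selector: str) -> dict:
    # Both arguments are placeholders; use your deployment's namespace and labels.
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

A pod listed here is a good candidate for the log and health-endpoint checks above.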
For Current Test Execution Issues (Async Batch Engine)
If test runs are stuck or not progressing as expected:
- Inspect active tasks:
- Check worker logs for batch runner and cancellation-watchdog events.
- Review execution internals: Test Execution and Execution Modes
- Use troubleshooting playbooks: Troubleshooting Guide
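When a test run looks stuck, one quick question is whether any worker is still executing its task. The sketch below summarizes Celery's inspect().active() reply, a mapping of worker name to a list of task-info dicts that include "id" and "name". The helper names are hypothetical, and the idea that a test run maps to a single Celery task id is an assumption about how the async batch engine dispatches work; see Test Execution and Execution Modes for the actual flow.

```python
"""Hypothetical sketch: summarize running Celery tasks and locate a specific task."""
from collections import Counter
from typing import Dict, List, Optional

ActiveReply = Optional[Dict[str, List[dict]]]


def active_by_task_name(active_reply: ActiveReply) -> Counter:
    """Count running tasks per task name across all workers."""
    counts: Counter = Counter()
    for tasks in (active_reply or {}).values():
        counts.update(task["name"] for task in tasks)
    return counts


def find_task(active_reply: ActiveReply, task_id: str) -> Optional[str]:
    """Return the worker currently executing task_id, or None if no worker reports it."""
    for worker, tasks in (active_reply or {}).items():
        if any(task["id"] == task_id for task in tasks):
            return worker
    return None


def show_active(broker_url: str, timeout: float = 5.0) -> None:
    from celery import Celery  # lazy import; requires the celery package

    reply = Celery(broker=broker_url).control.inspect(timeout=timeout).active()
    for name, count in active_by_task_name(reply).most_common():
        print(f"{count:4d}  {name}")
```

If find_task returns None for a run the backend still reports as in progress, the task may be queued, reserved but not yet started (inspect().reserved() shows these), or no longer held by any worker; the worker logs for batch runner and cancellation-watchdog events should show which.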
Related Documentation
- Backend API Documentation: Information about the API services that queue background tasks