
Rhesis Worker Documentation

This directory contains documentation for the Rhesis worker system, which handles background processing, task queues, and asynchronous operations using Celery with Redis as the message broker.

Contents

  • Background Tasks and Processing: Detailed information about Redis-based Celery configuration, task management, tenant context handling, error recovery, and troubleshooting.
  • Troubleshooting Guide: Solutions for common issues with workers, tasks, and the Celery processing system.
  • GKE Troubleshooting Guide: Comprehensive guide for diagnosing and fixing worker issues in Google Kubernetes Engine.
  • Logging Guide: Complete guide to worker logging, log analysis, and monitoring.
  • Trace Ingestion Pipeline: How traces flow through storage, enrichment, and metric evaluation after ingestion via POST /telemetry/traces.
  • Architecture and Dependencies: Explanation of how the worker system integrates with the backend and SDK components.

Topics Covered

  • Redis-based Celery configuration with TLS support
  • Worker deployment and scaling in GKE (Google Kubernetes Engine)
  • Task management and organization
  • Async batch execution and cooperative cancellation
  • Multi-tenancy in background tasks
  • Error handling and recovery
  • Task monitoring and observability
  • GKE troubleshooting with kubectl
  • Comprehensive logging and log analysis
  • Redis connection diagnostics
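The Redis and TLS settings themselves are documented in Background Tasks and Processing. As a rough orientation only, here is a minimal sketch of how Celery settings for a Redis broker with optional TLS might be assembled. The setting names (`broker_url`, `result_backend`, `broker_use_ssl`, `redis_backend_use_ssl`) are standard Celery options. The helper `build_celery_settings()`, the choice to reuse one Redis URL for both broker and result backend, and the strict certificate check are assumptions for illustration, not the Rhesis configuration.

```python
# Hypothetical sketch: assemble Celery settings for a Redis broker, enabling
# TLS options when the URL uses the rediss:// scheme. build_celery_settings()
# is illustrative; only the setting names are Celery's.
import ssl
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def build_celery_settings(redis_url: str, ca_certs: Optional[str] = None) -> Dict[str, Any]:
    """Return a dict suitable for app.conf.update(...)."""
    settings: Dict[str, Any] = {
        "broker_url": redis_url,
        # Assumption: the same Redis instance also stores task results.
        "result_backend": redis_url,
    }
    # rediss:// is the TLS variant of the redis:// scheme.
    if urlparse(redis_url).scheme == "rediss":
        tls: Dict[str, Any] = {"ssl_cert_reqs": ssl.CERT_REQUIRED}  # verify the server certificate
        if ca_certs:
            tls["ssl_ca_certs"] = ca_certs  # path to a CA bundle for verification
        settings["broker_use_ssl"] = tls
        # Give the result backend its own copy so the two can diverge later.
        settings["redis_backend_use_ssl"] = dict(tls)
    return settings
```

With a Celery app in hand, the result would be applied with `app.conf.update(build_celery_settings(url))`. Keying TLS off the URL scheme keeps plain `redis://` URLs, such as a local development broker, free of TLS options.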

Quick Start Guides

For Worker Registration Issues

If workers aren’t processing tasks or seem disconnected:

  1. Check Worker Status: Create and run check_workers.py (see Worker Registration).
  2. Quick Status:
     python -c "from rhesis.backend.worker import app; active = app.control.inspect().active(); print('Workers:', list(active) if active else 'None')"
  3. Test Redis Connection: Verify that the workers can reach the broker.
  4. Scale Workers (for GKE): To restart the workers, scale the deployment down to 0 replicas and then back up to 2:
     kubectl scale deployment rhesis-worker --replicas=0 -n <namespace>
     kubectl scale deployment rhesis-worker --replicas=2 -n <namespace>
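The status and broker checks above can be combined into a small script. The following is a minimal sketch of what a check_workers.py could contain, assuming the Celery app is importable as `rhesis.backend.worker.app` (the module used in the Quick Status one-liner). The helpers `summarize_workers()` and `broker_reachable()` are hypothetical and not part of the Rhesis codebase. They only use Celery's `app.control.inspect()` queries (`ping`, `active`, `reserved`, `registered`) and the broker connection returned by `app.connection_for_write()`.

```python
# Hypothetical check_workers.py sketch. Assumes the Celery app is importable as
# rhesis.backend.worker.app; both helpers below are illustrative only.
from typing import Any, Dict


def broker_reachable(app: Any) -> bool:
    """Return True if a connection to the configured broker (Redis) can be opened."""
    try:
        # connection_for_write() returns a kombu Connection for the broker URL;
        # max_retries=1 retries at most once instead of forever (the default).
        with app.connection_for_write() as conn:
            conn.ensure_connection(max_retries=1)
        return True
    except Exception:
        return False


def summarize_workers(app: Any, timeout: float = 2.0) -> Dict[str, Dict[str, int]]:
    """Return {worker_name: {"active": n, "reserved": n, "registered": n}}."""
    # inspect() broadcasts each query to all workers and waits `timeout`
    # seconds for replies.
    inspector = app.control.inspect(timeout=timeout)

    # Each call returns a dict keyed by worker name, or None if nobody replied.
    ping = inspector.ping() or {}
    active = inspector.active() or {}
    reserved = inspector.reserved() or {}
    registered = inspector.registered() or {}

    summary: Dict[str, Dict[str, int]] = {}
    for worker in sorted(set(ping) | set(active) | set(reserved) | set(registered)):
        summary[worker] = {
            "active": len(active.get(worker, [])),          # tasks executing now
            "reserved": len(reserved.get(worker, [])),      # prefetched, not yet started
            "registered": len(registered.get(worker, [])),  # task names the worker knows
        }
    return summary
```

A script would import `app` from `rhesis.backend.worker` and print `broker_reachable(app)` and `summarize_workers(app)`. An empty summary while the broker is reachable suggests the problem lies with the worker processes rather than Redis.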

For GKE Worker Issues

If you’re experiencing deployment or connectivity issues:

  1. Connect to Cluster: Follow GKE Setup.
  2. Check Pod Status:
     kubectl get pods -n <namespace>
  3. Check Logs:
     kubectl logs <pod> -c worker -n <namespace> --tail=100
  4. Test Health Endpoints:
     kubectl exec -it <pod> -c worker -n <namespace> -- curl localhost:8080/debug
  5. Check Worker Registration: Use Worker Registration Checking.
  6. Full Diagnostics: See the GKE Troubleshooting Guide and the Logging Guide.
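When many pods are involved, the pod-status check can be scripted. This is a minimal sketch that reads the JSON form of `kubectl get pods -n <namespace> -o json` and flags pods that look unhealthy. The fields it reads (`metadata.name`, `status.phase`, and `status.containerStatuses[]` with `name`, `ready`, `restartCount`, `state.waiting.reason`) are standard Kubernetes Pod status fields. The helpers `find_unhealthy_pods()` and `load_pods()` and the restart threshold of 3 are hypothetical choices for illustration.

```python
# Hypothetical sketch: triage `kubectl get pods -n <namespace> -o json` output.
# Only the Pod JSON fields are standard; helper names and threshold are illustrative.
import json
import subprocess
from typing import Any, Dict, List


def find_unhealthy_pods(pods: Dict[str, Any], max_restarts: int = 3) -> List[str]:
    """Return one human-readable line per detected problem."""
    problems: List[str] = []
    for pod in pods.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            problems.append(f"{name}: phase={phase}")
        for cs in status.get("containerStatuses", []):
            # A waiting container reports a reason such as CrashLoopBackOff.
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                problems.append(f"{name}/{cs['name']}: waiting ({waiting.get('reason', 'unknown')})")
            elif phase == "Running" and not cs.get("ready", False):
                problems.append(f"{name}/{cs['name']}: not ready")
            if cs.get("restartCount", 0) > max_restarts:
                problems.append(f"{name}/{cs['name']}: {cs['restartCount']} restarts")
    return problems


def load_pods(namespace: str) -> Dict[str, Any]:
    """Run kubectl (installed and pointed at the cluster) and parse its JSON output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, `for line in find_unhealthy_pods(load_pods("<namespace>")): print(line)`. A worker container reported as waiting in CrashLoopBackOff is the natural next target for the `kubectl logs <pod> -c worker` step.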

For Current Test Execution Issues (Async Batch Engine)

If test runs are stuck or not progressing as expected:

  1. Inspect active tasks:
     celery -A rhesis.backend.worker inspect active
  2. Check worker logs for batch runner and cancellation-watchdog events.
  3. Review execution internals: Test Execution and Execution Modes.
  4. Use troubleshooting playbooks: Troubleshooting Guide.
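The output of `celery -A rhesis.backend.worker inspect active` can get long during large test runs. Here is a minimal sketch that condenses the equivalent Python reply, `app.control.inspect().active()`, into per-worker task counts. Only the reply shape (`{worker_name: [{"id": ..., "name": ..., ...}, ...]}`) comes from Celery. The helpers `group_active_tasks()` and `format_summary()` are hypothetical.

```python
# Hypothetical sketch: summarize app.control.inspect().active() per worker.
# Only the reply shape is Celery's; the helpers are illustrative.
from collections import Counter
from typing import Any, Dict, List, Optional


def group_active_tasks(
    active: Optional[Dict[str, List[Dict[str, Any]]]],
) -> Dict[str, Counter]:
    """Count currently executing tasks by task name for each worker."""
    summary: Dict[str, Counter] = {}
    # active() returns None when no worker replied before the timeout.
    for worker, tasks in (active or {}).items():
        summary[worker] = Counter(task.get("name", "<unknown>") for task in tasks)
    return summary


def format_summary(summary: Dict[str, Counter]) -> List[str]:
    """Render one line per (worker, task name), most frequent first."""
    lines: List[str] = []
    for worker in sorted(summary):
        counts = summary[worker]
        if not counts:
            lines.append(f"{worker}: idle")
        for name, n in counts.most_common():
            lines.append(f"{worker}: {n} x {name}")
    return lines
```

For example, `print("\n".join(format_summary(group_active_tasks(app.control.inspect().active()))))`. If the same workers keep showing the same test-execution tasks across repeated checks, those workers' logs are a sensible place to look for batch runner and cancellation-watchdog events.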