Chord Monitoring Quick Reference

This is a quick reference for chord monitoring commands. For detailed information, see Chord Management and Monitoring.

Quick Commands

🔍 Check Status

# Quick interactive check and fix
python fix_chords.py

# Show current chord status
python -m rhesis.backend.tasks.execution.chord_monitor status

# Check for stuck chords (>1 hour)
python -m rhesis.backend.tasks.execution.chord_monitor check --max-hours 1

🔧 Fix Issues

# Dry run - see what would be revoked
python -m rhesis.backend.tasks.execution.chord_monitor revoke --max-hours 1 --dry-run

# Actually revoke stuck chords
python -m rhesis.backend.tasks.execution.chord_monitor revoke --max-hours 1

# Emergency: purge all tasks (dangerous!)
python -m rhesis.backend.tasks.execution.chord_monitor clean --force

🔍 Inspect Specific Chord

# Get details about a specific chord
python -m rhesis.backend.tasks.execution.chord_monitor inspect <chord-id>

# Get verbose details with subtasks
python -m rhesis.backend.tasks.execution.chord_monitor inspect <chord-id> --verbose

Common Workflows

Daily Health Check

python fix_chords.py

When Tests are Stuck

# 1. Check status
python -m rhesis.backend.tasks.execution.chord_monitor status

# 2. Look for stuck chords
python -m rhesis.backend.tasks.execution.chord_monitor check --max-hours 0.5

# 3. Revoke if needed
python -m rhesis.backend.tasks.execution.chord_monitor revoke --max-hours 0.5
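
The check-then-revoke steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes `check` exits with 1 when it finds stuck chords and that `revoke` exits with 0 on success (both inferred from the Return Codes list, not confirmed behavior). The helper names `monitor_argv`, `run`, and `revoke_if_stuck` are hypothetical and not part of the project.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Module path taken from the commands in this reference.
MODULE = "rhesis.backend.tasks.execution.chord_monitor"


def monitor_argv(subcommand: str, max_hours: float, dry_run: bool = False) -> List[str]:
    """Build the argv for a chord_monitor subcommand (hypothetical helper)."""
    argv = [sys.executable, "-m", MODULE, subcommand, "--max-hours", str(max_hours)]
    if dry_run:
        argv.append("--dry-run")
    return argv


def run(argv: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(argv, check=False).returncode


def revoke_if_stuck(
    max_hours: float,
    runner: Callable[[Sequence[str]], int] = run,
) -> str:
    """Check for stuck chords and revoke them only when the check reports issues."""
    # Assumption: `check` exits with 1 when stuck chords are found.
    code = runner(monitor_argv("check", max_hours))
    if code == 0:
        return "healthy"
    if code != 1:
        # For example 130 (cancelled by user): do not revoke anything.
        return "aborted"
    # Preview what would be revoked before doing it for real.
    runner(monitor_argv("revoke", max_hours, dry_run=True))
    revoke_code = runner(monitor_argv("revoke", max_hours))
    return "revoked" if revoke_code == 0 else "revoke-failed"
```

Calling `revoke_if_stuck(0.5)` would mirror the three steps above with a dry run added before the real revoke. The `runner` parameter exists so the flow can be exercised without a live Celery setup.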

Emergency Recovery

# 1. Stop workers (pkill -f matches every process whose command line contains "celery")
pkill -f celery

# 2. Clean all tasks
python -m rhesis.backend.tasks.execution.chord_monitor clean --force

# 3. Restart workers
celery -A rhesis.backend.worker.app worker --loglevel=INFO &

# 4. Verify
python fix_chords.py

Log Monitoring

# Watch for chord issues
tail -f celery_worker.log | grep -E "(chord_unlock|MaxRetries|ERROR)"

# Count chord_unlock retry log lines (a rough indicator of stuck chords)
grep -c "chord_unlock.*retry" celery_worker.log
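
The grep count above tallies log lines, so one chord that retries many times inflates the number. A per-task tally can be more informative. The sketch below assumes worker log lines look like `Task celery.chord_unlock[<task-id>] retry: ...`. That shape is an assumption about the log format, and lines that do not fit it are skipped. `count_unlock_retries` is a hypothetical helper, not part of the monitoring suite.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log shape: "... chord_unlock[<task-id>] retry: ..."
RETRY_PATTERN = re.compile(r"chord_unlock\[(?P<task_id>[^\]]+)\].*retry")


def count_unlock_retries(lines: Iterable[str]) -> Counter:
    """Count chord_unlock retry lines per task id."""
    counts: Counter = Counter()
    for line in lines:
        match = RETRY_PATTERN.search(line)
        if match:
            counts[match.group("task_id")] += 1
    return counts


# Usage against the worker log named in this reference:
# with open("celery_worker.log", encoding="utf-8", errors="replace") as log:
#     for task_id, retries in count_unlock_retries(log).most_common(10):
#         print(task_id, retries)
```

The number of distinct keys approximates how many chords are retrying, while the per-key count shows which ones have been retrying the longest.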

Return Codes

  • 0: Success / No issues
  • 1: Issues found / Errors
  • 130: Cancelled by user
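
For scripts or cron jobs that wrap these commands, the return codes can be mapped to labels as in the sketch below. The labels and the helper names `classify_exit` and `run_and_classify` are hypothetical. Only the three codes listed above come from this reference, and any other code is reported as unknown.

```python
import subprocess
import sys
from typing import Sequence

# Base command taken from the examples in this reference.
CHORD_MONITOR = [sys.executable, "-m", "rhesis.backend.tasks.execution.chord_monitor"]

# Meanings copied from the Return Codes list; the label strings are illustrative.
EXIT_MEANINGS = {
    0: "ok",           # Success / No issues
    1: "issues",       # Issues found / Errors
    130: "cancelled",  # Cancelled by user
}


def classify_exit(code: int) -> str:
    """Map an exit code to a short label; unlisted codes become 'unknown'."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_and_classify(argv: Sequence[str]) -> str:
    """Run a command and classify its exit code without raising on failure."""
    completed = subprocess.run(argv, check=False)
    return classify_exit(completed.returncode)
```

For example, `run_and_classify(CHORD_MONITOR + ["status"])` returns `"ok"` when the status command exits with 0.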

Command Options

Option         Description
--max-hours N  Consider chords stuck after N hours
--dry-run      Show what would be done
--json         JSON output
--verbose      Detailed information
--force        Required for destructive operations

Files

  • fix_chords.py - Quick interactive script
  • src/rhesis/backend/tasks/execution/chord_monitor.py - Full monitoring suite
  • celery_worker.log - Worker logs
  • src/rhesis/backend/worker.py - Configuration