
Tracing System

Technical documentation for the Rhesis tracing system architecture and implementation.

For SDK Users: See the Tracing documentation for usage guides. This section covers the internal architecture and design for developers and contributors.

Overview

The tracing system captures OpenTelemetry-compliant traces from SDK-instrumented applications. It supports two operating modes:

  • Test Mode: Traces linked to test runs, test cases, and test results
  • Production Mode: Traces from live application monitoring
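The distinction between the two modes can be sketched as a check for attached test context. This is a minimal illustration, not the Rhesis implementation: `LinkedTestContext`, its field names, and `operating_mode` are hypothetical names chosen for this example, and the optional `test_result_id` is an assumption reflecting that a test result may not exist yet when spans arrive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkedTestContext:
    """Hypothetical identifiers that tie a trace to a test execution."""

    test_run_id: str
    test_id: str
    # Assumption: the result row may be created after the spans are ingested.
    test_result_id: Optional[str] = None


def operating_mode(context: Optional[LinkedTestContext]) -> str:
    """Return "test" when test context is attached, otherwise "production"."""
    return "test" if context is not None else "production"
```

A trace with no context is treated as production traffic; one carrying a run and test identifier is treated as a test-mode trace.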

High-Level Architecture

Key Technologies

| Component | Technology | Purpose |
|---|---|---|
| SDK Tracer | OpenTelemetry Python | Span creation with AI semantic conventions |
| Batch Processor | OTEL BatchSpanProcessor | Batches spans, exports every 5 seconds |
| Transport | OTLP/HTTP | JSON payload to /telemetry/traces |
| Storage | PostgreSQL + JSONB | Flexible span storage with full-text search |
| Enrichment | Celery + LiteLLM | Cost calculation, anomaly detection |
| Linking | Service Layer | Hybrid strategy for test context linking |
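To make the batching and transport rows concrete, here is a standard-library sketch of a buffer that releases spans after a delay and wraps them in the `resourceSpans` / `scopeSpans` envelope of the OTLP JSON encoding. It is not the SDK's code, which relies on OpenTelemetry's own `BatchSpanProcessor`. `SpanBatcher`, `to_otlp_json`, the `demo-app` service name, and the `demo.tracer` scope name are hypothetical, and the injectable clock exists only to make the timing easy to follow.

```python
import json
import time
from typing import Callable, Dict, List, Optional


def to_otlp_json(spans: List[Dict]) -> Dict:
    """Wrap spans in the resourceSpans / scopeSpans envelope used by OTLP JSON."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                # Hypothetical service name, for illustration only.
                {"key": "service.name", "value": {"stringValue": "demo-app"}}
            ]},
            "scopeSpans": [{"scope": {"name": "demo.tracer"}, "spans": spans}],
        }]
    }


class SpanBatcher:
    """Buffers finished spans and releases them once the delay has elapsed."""

    def __init__(self, delay_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._delay_s = delay_s
        self._clock = clock
        self._buffer: List[Dict] = []
        self._last_flush = clock()

    def add(self, span: Dict) -> None:
        self._buffer.append(span)

    def flush_if_due(self) -> Optional[str]:
        """Return a JSON payload if the interval has passed and spans are queued."""
        now = self._clock()
        if now - self._last_flush < self._delay_s:
            return None  # interval not over yet, keep buffering
        self._last_flush = now
        if not self._buffer:
            return None  # interval over, but nothing to send
        spans, self._buffer = self._buffer, []
        return json.dumps(to_otlp_json(spans))
```

In this sketch the string returned by `flush_if_due()` is what would be sent in the body of the HTTP POST to `/telemetry/traces`; the network call itself is left out.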

Communication Channels

The SDK uses two independent channels:

| Channel | Protocol | Purpose | Used By |
|---|---|---|---|
| Tracing | HTTP POST | Export OpenTelemetry spans | @observe, @endpoint |
| Testing | WebSocket | Remote test invocation | @endpoint only |
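The tracing channel is fed by decorators that wrap application functions. The sketch below shows the general pattern of a span-recording decorator using only the standard library. It is not the real `@observe`: `observe_sketch`, the `RECORDED_SPANS` list, and the span fields are hypothetical stand-ins, and a real tracer would hand finished spans to an exporter rather than to a list.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Stand-in for the export pipeline (assumption for this example).
RECORDED_SPANS: List[Dict[str, Any]] = []


def observe_sketch(name: str) -> Callable:
    """Hypothetical decorator that records one span per call."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # tracing must not swallow application errors
            finally:
                # Runs on success and on failure, so every call yields a span.
                RECORDED_SPANS.append({
                    "name": name,
                    "status": status,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                })

        return wrapper

    return decorator
```

Recording the span in `finally` keeps the wrapped function's return value and exceptions unchanged, so failures in the application still surface while being marked on the span.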

Design Principles

  1. OpenTelemetry Standard - Industry-standard OTLP protocol for interoperability
  2. Async-First with Sync Fallback - Optimal in production, works without workers in development
  3. Hybrid Linking - Two strategic linking points to handle race conditions
  4. Idempotent Operations - Safe to call linking multiple times
  5. Cache Enrichment - Compute once, query fast
  6. Graceful Degradation - System works even when components fail
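Principles 2 and 4 can be illustrated with a short sketch. It assumes that a missing worker surfaces as a `ConnectionError` when a task is enqueued, and that a link is a mapping from span ID to test result ID. Both are simplifications for illustration, and `enrich_async_first` and `link_test_result` are hypothetical names rather than functions from the codebase.

```python
from typing import Callable, Dict, Optional


def enrich_async_first(
    trace_id: str,
    enqueue: Callable[[str], None],
    enrich_sync: Callable[[str], None],
) -> str:
    """Try the background queue first; run inline when no worker is reachable."""
    try:
        enqueue(trace_id)
        return "async"
    except ConnectionError:
        # Assumption: an unreachable broker raises ConnectionError on enqueue.
        enrich_sync(trace_id)
        return "sync"


def link_test_result(
    links: Dict[str, Optional[str]], span_id: str, test_result_id: str
) -> bool:
    """Idempotent link: returns True only when the stored value changes."""
    if links.get(span_id) == test_result_id:
        return False  # already linked, repeating the call is a no-op
    links[span_id] = test_result_id
    return True
```

Because the second call to `link_test_result` with the same arguments changes nothing, the two linking points of the hybrid strategy can both run without coordinating over which one gets there first.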

Performance Characteristics

| Operation | Timing | Notes |
|---|---|---|
| Span creation (SDK) | ~0.1ms | Per span, negligible overhead |
| BatchProcessor delay | 5000ms | OpenTelemetry BatchSpanProcessor default schedule delay |
| Span export (OTLP) | ~10ms | Network call |
| Backend ingestion | ~10-20ms | With async enrichment |
| Enrichment calculation | ~50-100ms | Background (Celery) |
| Trace query | ~10ms | Cached enrichment |
| End-to-end | ~5 seconds | Test start to queryable trace |
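The end-to-end figure follows from adding up the stages on the path to a queryable trace. The arithmetic below uses the numbers from the table above, taking the upper bound where a range is given. It assumes the worst case, in which a span is created just after a flush and waits a full batch interval, and it leaves out enrichment because that runs in the background.

```python
# Figures from the table above, in milliseconds.
SPAN_CREATION_MS = 0.1
BATCH_DELAY_MS = 5000
EXPORT_MS = 10
INGESTION_MS = 20  # upper bound of the 10-20 ms range


def worst_case_queryable_ms() -> float:
    """Latency for a span that just missed a flush and waits a full interval."""
    return SPAN_CREATION_MS + BATCH_DELAY_MS + EXPORT_MS + INGESTION_MS


total_ms = worst_case_queryable_ms()
# Fraction of the total that is spent waiting in the batch processor.
batch_share = BATCH_DELAY_MS / total_ms
```

The total comes to roughly 5030 ms, with the batch delay accounting for more than 99% of it, which is why the end-to-end time is quoted as about 5 seconds.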

Key Files

SDK

| File | Purpose |
|---|---|
| sdk/src/rhesis/sdk/telemetry/tracer.py | Core Tracer class |
| sdk/src/rhesis/sdk/telemetry/exporter.py | OTLP HTTP exporter |
| sdk/src/rhesis/sdk/telemetry/attributes.py | AI semantic conventions |
| sdk/src/rhesis/sdk/decorators/observe.py | @observe decorator |
| sdk/src/rhesis/sdk/decorators/endpoint.py | @endpoint decorator |

Backend

| File | Purpose |
|---|---|
| apps/backend/.../routers/telemetry.py | Ingestion endpoint |
| apps/backend/.../services/telemetry/linking_service.py | Hybrid linking logic |
| apps/backend/.../services/telemetry/enricher.py | Cost/anomaly enrichment |
| apps/backend/.../tasks/execution/executors/results.py | Test result processing |
| apps/backend/.../crud.py | create_trace_spans(), update_traces_with_test_result_id() |

Next Steps