Trace Ingestion Pipeline
This document explains how traces are processed after arriving via POST /telemetry/traces, covering the full pipeline from storage through enrichment and metric evaluation.
Pipeline Overview
When traces are ingested, the backend stores spans immediately, then dispatches post-ingestion work (linking, enrichment, evaluation) either asynchronously via Celery or synchronously as a fallback.
Phase 1: Span Storage
The telemetry router receives an `OTELTraceBatch` payload containing one or more spans. Before storage, the backend injects any pending mapped output into span attributes (for SDK endpoints where output arrives asynchronously).
Spans are then stored in the `Trace` table via `crud.create_trace_spans()`. Each span record includes the OTEL trace ID, span ID, parent span ID, timing data, attributes, and tenant context (organization and project).
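The shape of a stored span record can be pictured roughly as follows. This is a minimal Python sketch: the field names beyond those listed above (trace/span IDs, timing, attributes, tenant context) are illustrative assumptions, not the actual ORM model.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TraceSpanRecord:
    """Illustrative span record; NOT the real Trace model."""
    trace_id: str                  # OTEL trace ID
    span_id: str                   # OTEL span ID
    parent_span_id: Optional[str]  # None for the root span
    start_time: float              # epoch seconds (assumed representation)
    end_time: float
    attributes: dict = field(default_factory=dict)
    organization_id: str = ""      # tenant context
    project_id: str = ""

def is_root(span: TraceSpanRecord) -> bool:
    """A root span is a span with no parent_span_id."""
    return span.parent_span_id is None
```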
Phase 2: Post-Ingestion Dispatch
After storage, the router checks for Celery worker availability using a TTL-cached ping (300-second cache to avoid repeated 3-second Celery inspect calls).
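The TTL-cached availability check can be sketched as below. The `ping` parameter stands in for Celery's `app.control.inspect().ping()` (which returns a dict of responding workers, or nothing when none respond); the cache shape and function name are assumptions.

```python
import time

_PING_TTL_SECONDS = 300  # cache result to avoid repeated ~3s inspect calls
_cache = {"checked_at": 0.0, "available": False}

def workers_available(ping, now=time.monotonic):
    """Return cached worker availability, refreshing at most every 300s.

    `ping` is a callable standing in for Celery's inspect ping; it returns
    a dict of responding workers, or None/{} when no workers respond.
    """
    if now() - _cache["checked_at"] < _PING_TTL_SECONDS:
        return _cache["available"]          # still within TTL: reuse result
    result = ping()                          # slow path: actually ping workers
    _cache["available"] = bool(result)
    _cache["checked_at"] = now()
    return _cache["available"]
```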
Async Path (Workers Available)
When workers are available, the router dispatches a single `post_ingest_link` Celery task that handles all post-ingestion work: linking, enrichment, and evaluation.
Sync Fallback (No Workers)
Without workers, the router runs linking and enrichment synchronously in the same request. Metric evaluation is skipped because it involves LLM calls that should not block API responses.
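The two dispatch paths can be sketched as follows; apart from the `post_ingest_link` task name, the function names and call shapes here are assumptions, not the real router API.

```python
def dispatch_post_ingestion(trace_ids, workers_up, enqueue,
                            run_linking, run_enrichment):
    """Illustrative dispatch branch for post-ingestion work.

    With workers: enqueue one post_ingest_link task for all follow-up work.
    Without workers: run linking + enrichment inline in the request, and
    skip metric evaluation (LLM calls must not block the API response).
    """
    if workers_up:
        enqueue("post_ingest_link", trace_ids)  # async path: one task does it all
        return "async"
    run_linking(trace_ids)       # sync fallback, inside the request
    run_enrichment(trace_ids)
    # metric evaluation is deliberately skipped on this path
    return "sync"
```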
Phase 3: Linking
The `post_ingest_link` task performs three types of linking:

- Test-result linking: Associates trace spans with test results when spans carry test execution context attributes (`rhesis.test.test_run_id`, `rhesis.test.test_result_id`, etc.).
- Conversation-id linking: Patches first-turn spans with conversation IDs that were not known at the time the span was stored. This happens when a stateful endpoint generates the conversation ID during invocation.
- Input file linking: Attaches pending file records (images, documents) to their corresponding trace spans.
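Test-result linking, for instance, amounts to collecting the spans that carry test-context attributes. A rough sketch, where the helper name, span dict shape, and output shape are assumptions (only the attribute keys come from the pipeline above):

```python
def link_test_results(spans):
    """Collect spans carrying test execution context attributes
    (rhesis.test.test_run_id / rhesis.test.test_result_id)."""
    links = []
    for span in spans:
        attrs = span.get("attributes", {})
        run_id = attrs.get("rhesis.test.test_run_id")
        result_id = attrs.get("rhesis.test.test_result_id")
        if run_id or result_id:
            links.append({
                "span_id": span["span_id"],
                "test_run_id": run_id,
                "test_result_id": result_id,
            })
    return links
```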
Phase 4: Enrichment
After linking, the pipeline dispatches an enrichment chain per unique trace ID. The first task in the chain is `enrich_trace_async`, which runs the `TraceEnricher` processor.
Enrichment calculates three things from the trace's spans:

- Token costs: Uses LiteLLM's pricing database to calculate USD and EUR costs for each LLM invocation span. Looks for spans with `ai.operation.type = "ai.llm.invoke"` and reads token counts from `ai.llm.tokens.input` / `ai.llm.tokens.output`.
- Anomaly detection: Flags slow spans (greater than 10 seconds), high token usage (greater than 10,000 tokens), and error spans.
- Metadata extraction: Collects unique models, tools, and operation types used across the trace.
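The anomaly-detection and metadata passes can be sketched with the thresholds stated above. The span dict shape, the `status` field, and the function name are assumptions; the thresholds and attribute keys come from the list.

```python
SLOW_SPAN_SECONDS = 10      # from the anomaly rules above
HIGH_TOKEN_COUNT = 10_000

def enrich(spans):
    """Flag anomalies and extract metadata across a trace's spans."""
    anomalies, models, operations = [], set(), set()
    for span in spans:
        attrs = span.get("attributes", {})
        duration = span["end_time"] - span["start_time"]
        tokens = (attrs.get("ai.llm.tokens.input", 0)
                  + attrs.get("ai.llm.tokens.output", 0))
        if duration > SLOW_SPAN_SECONDS:
            anomalies.append(("slow_span", span["span_id"]))
        if tokens > HIGH_TOKEN_COUNT:
            anomalies.append(("high_token_usage", span["span_id"]))
        if span.get("status") == "error":          # "error" status is assumed
            anomalies.append(("error", span["span_id"]))
        if "ai.model.name" in attrs:
            models.add(attrs["ai.model.name"])
        if "ai.operation.type" in attrs:
            operations.add(attrs["ai.operation.type"])
    return {"anomalies": anomalies,
            "models": sorted(models),
            "operations": sorted(operations)}
```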
The enriched data is stored in the `enriched_data` JSON column on the root span.
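Cost calculation for a single LLM span might look like the sketch below. The pricing table here is a hypothetical stand-in for LiteLLM's pricing database, and the prices shown are illustrative, not real rates.

```python
# Hypothetical (input, output) USD prices per million tokens; the real
# pipeline reads rates from LiteLLM's pricing database instead.
PRICES_PER_MTOK = {"gpt-4o": (2.50, 10.00)}

def span_cost_usd(span):
    """USD cost of one LLM invocation span, per the attribute keys above."""
    attrs = span["attributes"]
    if attrs.get("ai.operation.type") != "ai.llm.invoke":
        return 0.0  # not an LLM invocation span: no cost
    inp_rate, out_rate = PRICES_PER_MTOK.get(attrs.get("ai.model.name"),
                                             (0.0, 0.0))
    return (attrs.get("ai.llm.tokens.input", 0) * inp_rate
            + attrs.get("ai.llm.tokens.output", 0) * out_rate) / 1_000_000
```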
Phase 5: Trace Metrics Evaluation
The second task in the chain is `evaluate_turn_trace_metrics`, which runs LLM-based metric evaluation on the trace. This is the step that applies configured quality metrics (relevance, coherence, safety, etc.) to trace content.
Prerequisites for Evaluation
Evaluation requires all of the following:

- Celery workers running: evaluation never runs in the sync fallback path.
- Trace metrics enabled on the project: the project's `attributes.trace_metrics.enabled` must not be `false`.
- Trace-scoped metrics configured: at least one metric with `Trace` in its `metric_scope` must exist for the organization.
- Input/output attributes on the root span: the root span (a span with no `parent_span_id`) must include:
  - `rhesis.conversation.input`: the user's input text
  - `rhesis.conversation.output`: the system's response text
If any of these conditions is not met, evaluation is skipped, and a log message records the reason.
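The prerequisite checks can be sketched as a single gate. Apart from the `no_io` status (which the pipeline really uses when input/output is missing), the reason strings, data shapes, and function name are assumptions.

```python
def should_evaluate(workers_up, project_attrs, org_metrics, root_span):
    """Gate for trace metric evaluation; returns (ok, reason)."""
    if not workers_up:
        return False, "no_workers"            # reason string is illustrative
    trace_cfg = project_attrs.get("trace_metrics", {})
    if trace_cfg.get("enabled") is False:     # must not be false; absent passes
        return False, "disabled"              # reason string is illustrative
    if not any("Trace" in m.get("metric_scope", []) for m in org_metrics):
        return False, "no_trace_metrics"      # reason string is illustrative
    attrs = root_span.get("attributes", {})
    if not (attrs.get("rhesis.conversation.input")
            and attrs.get("rhesis.conversation.output")):
        return False, "no_io"                 # status used by the pipeline
    return True, "ok"
```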
Evaluation Flow
Multi-Turn (Conversation) Evaluation
For traces with a `conversation_id`, a second evaluation phase runs on a debounce timer. The `evaluate_conversation_trace_metrics` task:

- Loads all root spans sharing the same `trace_id`, ordered by `start_time`.
- Reconstructs the full conversation from `rhesis.conversation.input` / `rhesis.conversation.output` attributes across all turns.
- Evaluates Multi-Turn scoped metrics against the full conversation history.
- Derives a combined Pass/Fail status from both turn-level and conversation-level results.
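The reconstruction step can be sketched as follows; the span dict shape and function name are assumptions, while the ordering key and attribute names come from the steps above.

```python
def reconstruct_conversation(root_spans):
    """Rebuild the conversation, one turn per root span, ordered by start_time."""
    turns = []
    for span in sorted(root_spans, key=lambda s: s["start_time"]):
        attrs = span["attributes"]
        turns.append({
            "input": attrs.get("rhesis.conversation.input"),
            "output": attrs.get("rhesis.conversation.output"),
        })
    return turns
```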
External Trace Ingestion
For deployments where the customer generates traces externally (e.g., with SDK tracing disabled), the same pipeline applies as long as the published spans meet the requirements.
Required Span Attributes for Evaluation
When posting traces via `POST /telemetry/traces`, include these attributes on the root span to enable the full pipeline:
| Attribute | Required For | Description |
|---|---|---|
| `rhesis.conversation.input` | Evaluation | The user's input text for this turn |
| `rhesis.conversation.output` | Evaluation | The system's response text for this turn |
| `ai.operation.type` | Cost calculation | Set to `ai.llm.invoke` for LLM spans |
| `ai.model.name` | Cost calculation | Model identifier (e.g., `gpt-4o`, `claude-3-sonnet`) |
| `ai.llm.tokens.input` | Cost calculation | Number of input tokens |
| `ai.llm.tokens.output` | Cost calculation | Number of output tokens |
| `conversation_id` | Multi-turn eval | Shared conversation identifier across turns |
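A minimal set of root-span attributes satisfying the table might look like this; the values are illustrative, and the surrounding batch envelope (resource/scope span structure) is omitted.

```python
# Illustrative root-span attributes enabling evaluation, cost calculation,
# and multi-turn evaluation. Values are made up for the example.
root_span_attributes = {
    "rhesis.conversation.input": "What is the refund policy?",
    "rhesis.conversation.output": "Refunds are accepted within 30 days.",
    "ai.operation.type": "ai.llm.invoke",
    "ai.model.name": "gpt-4o",
    "ai.llm.tokens.input": 182,
    "ai.llm.tokens.output": 96,
    "conversation_id": "conv-123",   # shared across turns for multi-turn eval
}
```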
What Runs Without These Attributes
- Enrichment (Phase 4) always runs. Cost calculation skips spans that lack LLM-specific attributes; anomaly detection and metadata extraction still process all spans.
- Evaluation (Phase 5) requires `rhesis.conversation.input` and/or `rhesis.conversation.output`. Without them, evaluation returns early with status `no_io`.
Related Documentation
- Background Tasks: Celery configuration, task patterns, and tenant context
- Architecture: Component relationships and dependencies
- Troubleshooting: Common worker issues and fixes