Data Structures

Schemas, database design, and data formats for the tracing system.

Span Structure

OTLP Span Format

Spans sent from the SDK to the backend use a flattened, OTLP-style JSON format:

Span Payload
{
  "trace_id": "a1b2c3d4e5f6...",
  "span_id": "1234567890abcdef",
  "parent_span_id": null,
  "project_id": "my-project",
  "environment": "development",
  "span_name": "ai.llm.invoke",
  "span_kind": "CLIENT",
  "start_time": "2024-01-01T00:00:00.000000Z",
  "end_time": "2024-01-01T00:00:01.500000Z",
  "status_code": "OK",
  "status_message": null,
  "attributes": {
    "ai.model.name": "gpt-4",
    "ai.model.provider": "openai",
    "ai.llm.tokens.input": 10,
    "ai.llm.tokens.output": 25,
    "rhesis.test.run_id": "uuid",
    "rhesis.test.id": "uuid"
  },
  "events": [
    {
      "name": "ai.prompt",
      "timestamp": "2024-01-01T00:00:00.100000Z",
      "attributes": {
        "ai.prompt.role": "user",
        "ai.prompt.content": "Hello, world!"
      }
    },
    {
      "name": "ai.completion",
      "timestamp": "2024-01-01T00:00:01.400000Z",
      "attributes": {
        "ai.completion.content": "Hi there! How can I help?"
      }
    }
  ],
  "links": [],
  "resource": {
    "service.name": "my-service",
    "service.namespace": "rhesis",
    "deployment.environment": "development"
  }
}
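As an illustration, a payload of this shape could be assembled as follows. This is a minimal sketch, not the SDK's implementation: `build_span` and `format_ts` are hypothetical helper names, and `project_id`, `environment`, and `resource` are omitted for brevity. The ID lengths match the 32- and 16-character hex strings that OpenTelemetry uses for trace and span IDs.

```python
import json
import secrets
from datetime import datetime, timezone
from typing import Optional


def format_ts(ts: datetime) -> str:
    """Render a timestamp in the UTC microsecond form used in the payload."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def build_span(name: str, start: datetime, end: datetime, attributes: dict,
               trace_id: Optional[str] = None,
               parent_span_id: Optional[str] = None) -> dict:
    """Hypothetical helper: assemble a span dict with the fields shown above."""
    return {
        # A root span gets a fresh trace ID; child spans pass the parent's.
        "trace_id": trace_id or secrets.token_hex(16),  # 32 hex characters
        "span_id": secrets.token_hex(8),                # 16 hex characters
        "parent_span_id": parent_span_id,
        "span_name": name,
        "span_kind": "CLIENT",
        "start_time": format_ts(start),
        "end_time": format_ts(end),
        "status_code": "OK",
        "status_message": None,
        "attributes": attributes,
        "events": [],
        "links": [],
    }


span = build_span(
    "ai.llm.invoke",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 0, 0, 1, 500000, tzinfo=timezone.utc),
    {"ai.model.name": "gpt-4", "ai.model.provider": "openai"},
)
print(json.dumps(span, indent=2))
```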

Test Execution Context

Context attributes added to spans during test execution:

Test Context Attributes
test_execution_context = {
    "rhesis.test.run_id": "uuid",              # Which test run
    "rhesis.test.id": "uuid",                  # Which test definition
    "rhesis.test.configuration_id": "uuid",    # Which configuration
    # test_result_id is linked after creation
}
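One way these attributes might be attached to a span is sketched below. The function name `with_test_context` is an assumption for illustration, and so is the rule that only keys in the `rhesis.test.` namespace with a value set are copied.

```python
def with_test_context(attributes: dict, context: dict) -> dict:
    """Return a copy of the span attributes with test context keys added."""
    merged = dict(attributes)  # leave the caller's dict untouched
    for key, value in context.items():
        # Assumption: copy only rhesis.test.* keys that have a value
        if key.startswith("rhesis.test.") and value is not None:
            merged[key] = value
    return merged


attrs = with_test_context(
    {"ai.model.name": "gpt-4"},
    {"rhesis.test.run_id": "uuid", "rhesis.test.id": None},
)
print(attrs)
```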

Database Schema

traces Table

SQL Schema
CREATE TABLE traces (
    -- Identity
    id UUID PRIMARY KEY,
    trace_id VARCHAR(32) NOT NULL,      -- OTEL trace ID
    span_id VARCHAR(16) NOT NULL,       -- OTEL span ID
    parent_span_id VARCHAR(16),
    
    -- Span data
    span_name VARCHAR(255) NOT NULL,
    start_time TIMESTAMP WITH TIME ZONE NOT NULL,
    end_time TIMESTAMP WITH TIME ZONE NOT NULL,
    duration_ms FLOAT NOT NULL,
    status_code VARCHAR(50) NOT NULL,
    
    -- Multi-tenancy
    organization_id UUID NOT NULL REFERENCES organization(id),
    project_id UUID NOT NULL REFERENCES project(id),
    
    -- Test execution (FKs for linking)
    test_run_id UUID REFERENCES test_run(id) ON DELETE SET NULL,
    test_result_id UUID REFERENCES test_result(id) ON DELETE SET NULL,
    test_id UUID REFERENCES test(id) ON DELETE SET NULL,
    
    -- JSONB columns (flexible schema)
    attributes JSONB NOT NULL DEFAULT '{}',
    events JSONB NOT NULL DEFAULT '[]',
    enriched_data JSONB,                -- Cached enrichment
    
    -- Timestamps
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Column Details

Column	Type	Description
`id`	UUID	Primary key (internal)
`trace_id`	VARCHAR(32)	OpenTelemetry trace ID (groups spans)
`span_id`	VARCHAR(16)	OpenTelemetry span ID (unique per span)
`parent_span_id`	VARCHAR(16)	Parent span for hierarchy
`span_name`	VARCHAR(255)	Operation name (`ai.llm.invoke`)
`start_time`	TIMESTAMP WITH TIME ZONE	Span start time
`end_time`	TIMESTAMP WITH TIME ZONE	Span end time
`duration_ms`	FLOAT	Calculated duration in milliseconds
`status_code`	VARCHAR(50)	OK, ERROR, UNSET
`organization_id`	UUID	Multi-tenancy isolation
`project_id`	UUID	Project isolation
`test_run_id`	UUID	Linked test run
`test_result_id`	UUID	Linked test result
`test_id`	UUID	Linked test definition
`attributes`	JSONB	Span attributes
`events`	JSONB	Span events (prompts, completions)
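`enriched_data`	JSONB	Cached enrichment results
`created_at`	TIMESTAMP WITH TIME ZONE	Row creation time (defaults to `NOW()`)
`updated_at`	TIMESTAMP WITH TIME ZONE	Last update time (defaults to `NOW()`)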
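The source does not say how `duration_ms` is calculated. A plausible sketch, assuming it is simply `end_time` minus `start_time` from the span payload, is shown below. The helper names are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse a payload timestamp such as 2024-01-01T00:00:01.500000Z."""
    # Swap the trailing 'Z' so fromisoformat also works before Python 3.11
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def duration_ms(start_time: str, end_time: str) -> float:
    """Assumed calculation: elapsed time between start and end, in ms."""
    delta = parse_ts(end_time) - parse_ts(start_time)
    return delta.total_seconds() * 1000.0


print(duration_ms("2024-01-01T00:00:00.000000Z", "2024-01-01T00:00:01.500000Z"))
```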

Indexes

Critical Indexes
-- Get all spans for a trace (primary query)
CREATE INDEX idx_trace_trace_id ON traces(trace_id, start_time DESC);

-- Test execution queries
CREATE INDEX idx_trace_test_run ON traces(test_run_id, start_time DESC);
CREATE INDEX idx_trace_test_result ON traces(test_result_id);

-- JSONB attribute queries
CREATE INDEX idx_trace_attributes ON traces USING GIN(attributes jsonb_path_ops);

-- Organization/project filtering
CREATE INDEX idx_trace_org_project ON traces(organization_id, project_id, created_at DESC);

-- Status code filtering
CREATE INDEX idx_trace_status ON traces(status_code, created_at DESC);

Index Usage

Query	Index Used
Get spans by trace_id	`idx_trace_trace_id`
Get traces for test run	`idx_trace_test_run`
Query by attribute (model, provider)	`idx_trace_attributes`
Filter by organization	`idx_trace_org_project`
Find error traces	`idx_trace_status`

Enrichment Data

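The `enriched_data` JSONB column caches values computed from a trace's spans (costs, anomalies, and aggregate metadata) so they are not recomputed on every read: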

Enrichment Structure
{
  "costs": {
    "total_cost_usd": 0.023,
    "total_cost_eur": 0.021,
    "breakdown": [
      {
        "span_id": "1234567890abcdef",
        "model": "gpt-4",
        "tokens_input": 150,
        "tokens_output": 80,
        "cost_usd": 0.023,
        "cost_eur": 0.021
      }
    ]
  },
  "anomalies": [
    {
      "type": "high_latency",
      "span_id": "1234567890abcdef",
      "threshold_ms": 1000,
      "actual_ms": 2340,
      "severity": "warning"
    }
  ],
  "metadata": {
    "models_used": [
      "gpt-4"
    ],
    "total_tokens_input": 150,
    "total_tokens_output": 80,
    "total_tokens": 230,
    "span_count": 5,
    "llm_call_count": 2,
    "tool_call_count": 1
  },
  "enriched_at": "2025-01-01T10:00:00Z"
}
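An anomaly entry like the `high_latency` example above could be produced by a check along these lines. This is a sketch under assumptions: the function name, the fixed `"warning"` severity, and the input shape (dicts with `span_id` and `duration_ms`) are illustrative, and the 1000 ms default simply mirrors the example.

```python
def detect_high_latency(spans, threshold_ms=1000):
    """Return one anomaly record per span slower than threshold_ms."""
    anomalies = []
    for span in spans:
        if span["duration_ms"] > threshold_ms:
            anomalies.append({
                "type": "high_latency",
                "span_id": span["span_id"],
                "threshold_ms": threshold_ms,
                "actual_ms": span["duration_ms"],
                "severity": "warning",  # assumption: single severity level
            })
    return anomalies


found = detect_high_latency([
    {"span_id": "1234567890abcdef", "duration_ms": 2340},
    {"span_id": "fedcba0987654321", "duration_ms": 120},
])
print(found)
```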

Enrichment Fields

Field	Description
`costs.total_cost_usd`	Total cost in USD
`costs.total_cost_eur`	Total cost in EUR
`costs.breakdown`	Per-span cost breakdown
`anomalies`	Detected anomalies
`metadata.models_used`	Unique models in trace
`metadata.total_tokens`	Sum of all tokens
`metadata.span_count`	Number of spans
`enriched_at`	Enrichment timestamp
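The `metadata` aggregates can be derived from span attributes. The following sketch assumes token counts live under `ai.llm.tokens.input` and `ai.llm.tokens.output`, as in the span payload example. `summarize_metadata` is a hypothetical name, and the call counts are left out because the source does not define how LLM and tool calls are recognized.

```python
def summarize_metadata(spans):
    """Aggregate models and token counts across the spans of one trace."""
    models = []
    total_in = 0
    total_out = 0
    for span in spans:
        attrs = span.get("attributes", {})
        model = attrs.get("ai.model.name")
        if model and model not in models:
            models.append(model)  # keep first-seen order, no duplicates
        total_in += attrs.get("ai.llm.tokens.input", 0)
        total_out += attrs.get("ai.llm.tokens.output", 0)
    return {
        "models_used": models,
        "total_tokens_input": total_in,
        "total_tokens_output": total_out,
        "total_tokens": total_in + total_out,
        "span_count": len(spans),
    }


summary = summarize_metadata([
    {"attributes": {"ai.model.name": "gpt-4",
                    "ai.llm.tokens.input": 150,
                    "ai.llm.tokens.output": 80}},
    {"attributes": {}},  # a span with no LLM attributes still counts
])
print(summary)
```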

Common Query Patterns

Get Trace by ID

Query
SELECT * FROM traces
WHERE trace_id = 'abc123...'
ORDER BY start_time ASC;
-- Uses: idx_trace_trace_id

Get Traces for Test Run

Query
SELECT trace_id, MIN(start_time) AS trace_start
FROM traces
WHERE test_run_id = 'uuid'
GROUP BY trace_id
ORDER BY trace_start DESC;
-- Uses: idx_trace_test_run

Get LLM Calls with Specific Model

Query
SELECT * FROM traces
WHERE attributes @> '{"ai.model.name": "gpt-4"}'
AND project_id = 'uuid'
ORDER BY created_at DESC;
-- Uses: idx_trace_attributes (GIN index)
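When this query is issued from application code, the containment filter is best passed as a bound parameter rather than interpolated into the SQL string. The sketch below only builds the statement and its parameters. The function name is hypothetical, and the `%s` placeholders assume a driver with psycopg-style parameters.

```python
import json


def build_attribute_query(project_id, attributes):
    """Return (sql, params) for a JSONB containment query on attributes."""
    sql = (
        "SELECT * FROM traces "
        "WHERE attributes @> %s::jsonb "
        "AND project_id = %s "
        "ORDER BY created_at DESC"
    )
    # json.dumps produces the JSON document compared with the @> operator
    params = (json.dumps(attributes), project_id)
    return sql, params


sql, params = build_attribute_query("uuid", {"ai.model.name": "gpt-4"})
print(sql)
print(params)
```

The pair can then be handed to a cursor's `execute(sql, params)` call.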

Get Error Traces

Query
SELECT trace_id, span_name, status_code, created_at
FROM traces
WHERE status_code = 'ERROR'
AND project_id = 'uuid'
ORDER BY created_at DESC
LIMIT 100;
-- Uses: idx_trace_status

Get High-Cost Traces

Query
SELECT trace_id,
       (enriched_data->'costs'->>'total_cost_usd')::float AS cost_usd,
       enriched_data->'metadata'->'models_used' AS models
FROM traces
WHERE enriched_data IS NOT NULL
AND (enriched_data->'costs'->>'total_cost_usd')::float > 0.10
AND project_id = 'uuid'
ORDER BY (enriched_data->'costs'->>'total_cost_usd')::float DESC
LIMIT 50;

HTTP Request Format

Ingestion Endpoint

Endpoint: POST /telemetry/traces

Headers:

Headers
Authorization: Bearer <api_key>
Content-Type: application/json

Payload:

Request Body
{
  "spans": [
      { ... span 1 ... },
      { ... span 2 ... }
  ]
}
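A request with these headers and this body can be built with the Python standard library as sketched below. The base URL and API key are placeholders, and `build_ingest_request` is a hypothetical helper. The request is constructed but not sent.

```python
import json
import urllib.request


def build_ingest_request(base_url, api_key, spans):
    """Build (without sending) a POST /telemetry/traces request."""
    body = json.dumps({"spans": spans}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/telemetry/traces",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_ingest_request(
    "https://backend.example.com/",  # placeholder base URL
    "my-api-key",                    # placeholder API key
    [{"span_name": "ai.llm.invoke"}],
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would perform the actual call.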

Response Codes

Status	Meaning	Action
200	Success	Spans ingested
401	Unauthorized	Check API key
422	Validation Error	Fix span names/attributes
500	Server Error	Retry with backoff
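The actions in the table map onto a small decision function on the client side. The sketch below is one possible policy, not documented SDK behavior: it treats every 5xx status like 500, uses exponential backoff with an assumed base delay and cap, and treats all other non-200 codes (including 401 and 422) as non-retryable.

```python
def next_action(status, attempt, base_delay=0.5, max_delay=30.0):
    """Return (action, delay_seconds) for a response status and attempt number."""
    if status == 200:
        return ("done", 0.0)
    if status >= 500:
        # Exponential backoff: 0.5s, 1s, 2s, ... capped at max_delay
        delay = min(max_delay, base_delay * (2 ** attempt))
        return ("retry", delay)
    # 401 and 422 need a fix on the caller's side; retrying will not help
    return ("fail", 0.0)


for attempt in range(4):
    print(attempt, next_action(500, attempt))
```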

Validation Errors

Example 422 response for a rejected span name:

Validation Error
{
  "detail": [
    {
      "loc": [
        "spans",
        0,
        "span_name"
      ],
      "msg": "span_name cannot use framework concept 'agent'. Use primitive operations: llm, tool, retrieval, embedding",
      "type": "value_error"
    }
  ]
}
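A client can turn the `detail` list into readable messages by joining each `loc` path. The sketch below assumes the response shape shown above. `format_validation_errors` is a hypothetical helper, and the message text in the usage example is shortened.

```python
def format_validation_errors(response_body):
    """Turn a 422 response body into 'path: message' strings."""
    lines = []
    for err in response_body.get("detail", []):
        path = ""
        for part in err.get("loc", []):
            if isinstance(part, int):
                path += f"[{part}]"   # list index, e.g. spans[0]
            elif path:
                path += f".{part}"    # nested field
            else:
                path = str(part)      # first path segment
        lines.append(f"{path}: {err.get('msg', '')}")
    return lines


messages = format_validation_errors({
    "detail": [{
        "loc": ["spans", 0, "span_name"],
        "msg": "span_name cannot use framework concept 'agent'",
        "type": "value_error",
    }]
})
print(messages)
```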

Why PostgreSQL + JSONB?

Aspect	Benefit
Single Database	Simplifies operations, existing expertise
JSONB Flexibility	Attribute and event shapes can evolve without schema migrations
GIN Indexes	Fast attribute queries
ACID Compliance	Reliable linking operations
Familiar SQL	Easy debugging and ad-hoc queries

Future Scaling

If trace volume outgrows a single PostgreSQL table, the following options are available:

Partition by time - Monthly partitions for retention
TimescaleDB - Hypertable for time-series optimization
ClickHouse - Columnar store for analytics
Archive strategy - Move old traces to cold storage
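To make the first option concrete, the sketch below generates the DDL for one monthly partition. It assumes `traces` has been recreated as a range-partitioned table (`PARTITION BY RANGE (start_time)`), which the current schema is not, and the function name and partition naming scheme are illustrative. PostgreSQL also requires the partition key to be part of any primary key or unique constraint on the parent table, so the `id` primary key would need to change.

```python
from datetime import date


def monthly_partition_ddl(year, month, parent="traces"):
    """Return CREATE TABLE DDL for one monthly range partition."""
    start = date(year, month, 1)
    # The upper bound is exclusive: the first day of the following month
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


print(monthly_partition_ddl(2024, 12))
```

Dropping an expired month then becomes a `DROP TABLE` on its partition rather than a bulk `DELETE`.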