Skip to Content
ContributeBackendTest Result Statistics

Test Result Statistics API

/test_results/stats aggregates test_metrics for dashboards: multiple mode values, rich filters, and compact payloads.

Overview

Analytics over test results: per-metric and overall pass/fail derived from stored metrics, with filters for runs, entities, time ranges, and more.

Key Features

  • Modes to trim payloads (summary, metrics, etc., as implemented)
  • Multi-valued filter parameters where the API supports them

Authentication

All requests require Bearer token authentication:

authentication.sh
curl -H "Authorization: Bearer YOUR_API_KEY" "URL"

Quick Start

Basic Usage

basic-usage.sh
# Get complete statistics for last 6 months
curl 'http://localhost:8080/test_results/stats' -H 'Authorization: Bearer YOUR_TOKEN'

# Get lightweight summary
curl 'http://localhost:8080/test_results/stats?mode=summary' -H 'Authorization: Bearer YOUR_TOKEN'

# Get metrics analysis only
curl 'http://localhost:8080/test_results/stats?mode=metrics' -H 'Authorization: Bearer YOUR_TOKEN'

Basic Filtering

basic-filtering.sh
# Filter by specific test run
curl 'http://localhost:8080/test_results/stats?test_run_id=UUID' -H 'Authorization: Bearer YOUR_TOKEN'

# Filter by priority level
curl 'http://localhost:8080/test_results/stats?priority_min=3' -H 'Authorization: Bearer YOUR_TOKEN'

# Filter by date range
curl 'http://localhost:8080/test_results/stats?start_date=2024-01-01&end_date=2024-12-31' -H 'Authorization: Bearer YOUR_TOKEN'

Data Modes

The endpoint supports configurable data modes for performance optimization:

Performance-Optimized Modes

ModeData SizeResponse TimeUse Case
summary~5%~50msDashboard widgets, KPI tracking
metrics~20%~100msMetric-focused charts
behavior~15%~100msBehavior performance analysis
category~15%~100msCategory comparison
topic~15%~100msTopic performance insights
overall~10%~75msExecutive dashboards
timeline~40%~150msTrend analysis
test_runs~30%~125msTest run comparison
all100%~200-500msComprehensive analytics

Mode Examples

Summary Mode (Ultra-lightweight)

mode-summary.sh
curl 'http://localhost:8080/test_results/stats?mode=summary' -H 'Authorization: Bearer YOUR_TOKEN'

Returns only:

  • overall_pass_rates
  • metadata

Metrics Mode (Individual metric analysis)

mode-metrics.sh
curl 'http://localhost:8080/test_results/stats?mode=metrics&months=12' -H 'Authorization: Bearer YOUR_TOKEN'

Returns only:

  • metric_pass_rates (Answer Fluency, Answer Relevancy, etc.)
  • metadata

Timeline Mode (Trend analysis)

mode-timeline.sh
curl 'http://localhost:8080/test_results/stats?mode=timeline&months=6' -H 'Authorization: Bearer YOUR_TOKEN'

Returns only:

  • timeline (monthly pass/fail trends)
  • metadata

Filtering System

The endpoint supports comprehensive filtering across multiple dimensions:

Test-Level Filters

ParameterTypeDescriptionExample
test_set_idsUUID[]Filter by test sets?test_set_ids=uuid1&test_set_ids=uuid2
behavior_idsUUID[]Filter by behaviors?behavior_ids=uuid1&behavior_ids=uuid2
category_idsUUID[]Filter by categories?category_ids=uuid1&category_ids=uuid2
topic_idsUUID[]Filter by topics?topic_ids=uuid1&topic_ids=uuid2
status_idsUUID[]Filter by test statuses?status_ids=uuid1&status_ids=uuid2
test_idsUUID[]Filter specific tests?test_ids=uuid1&test_ids=uuid2
test_type_idsUUID[]Filter by test types?test_type_ids=uuid1&test_type_ids=uuid2

Test Run Filters

ParameterTypeDescriptionExample
test_run_idUUIDSingle test run (legacy)?test_run_id=uuid
test_run_idsUUID[]Multiple test runs?test_run_ids=uuid1&test_run_ids=uuid2
ParameterTypeDescriptionExample
user_idsUUID[]Filter by test creators?user_ids=uuid1&user_ids=uuid2
assignee_idsUUID[]Filter by assignees?assignee_ids=uuid1&assignee_ids=uuid2
owner_idsUUID[]Filter by test owners?owner_ids=uuid1&owner_ids=uuid2

Other Filters

ParameterTypeDescriptionExample
prompt_idsUUID[]Filter by prompts?prompt_ids=uuid1&prompt_ids=uuid2
priority_minintMinimum priority (inclusive)?priority_min=1
priority_maxintMaximum priority (inclusive)?priority_max=5
tagsstring[]Filter by tags (AND logic)?tags=urgent&tags=regression

Date Range Filters

ParameterTypeDescriptionExample
start_dateISO dateStart date (overrides months)?start_date=2024-01-01
end_dateISO dateEnd date (overrides months)?end_date=2024-12-31
monthsintHistorical months (default: 6)?months=12

Multiple Value Support

All list-based filters support multiple values using repeated parameters:

Multiple Test Runs

multiple-test-runs.sh
# Multiple test runs
curl 'http://localhost:8080/test_results/stats?test_run_ids=uuid1&test_run_ids=uuid2' -H 'Authorization: Bearer YOUR_TOKEN'

# Backward compatibility (combines both)
curl 'http://localhost:8080/test_results/stats?test_run_id=uuid1&test_run_ids=uuid2' -H 'Authorization: Bearer YOUR_TOKEN'

Multiple Behaviors/Categories/Topics

multiple-behaviors-categories.sh
# Multiple behaviors
curl 'http://localhost:8080/test_results/stats?behavior_ids=uuid1&behavior_ids=uuid2' -H 'Authorization: Bearer YOUR_TOKEN'

# Multiple categories
curl 'http://localhost:8080/test_results/stats?category_ids=uuid1&category_ids=uuid2' -H 'Authorization: Bearer YOUR_TOKEN'

Multiple Users/Teams

multiple-users-teams.sh
# Multiple team members
curl 'http://localhost:8080/test_results/stats?user_ids=dev1&user_ids=dev2&assignee_ids=lead1' -H 'Authorization: Bearer YOUR_TOKEN'

Multiple Tags

multiple-tags.sh
# Multiple tags (AND logic - tests must have ALL tags)
curl 'http://localhost:8080/test_results/stats?tags=urgent&tags=healthcare' -H 'Authorization: Bearer YOUR_TOKEN'

Response Structure

Summary Mode Response

response-summary-mode.json
{
  "overall_pass_rates": {
    "total": 150,
    "passed": 75,
    "failed": 75,
    "pass_rate": 50
  },
  "metadata": {
    "mode": "summary",
    "total_test_results": 150,
    "total_test_runs": 5,
    "start_date": "2024-06-01T00:00:00",
    "end_date": "2024-12-01T00:00:00",
    "organization_id": "org-uuid",
    "available_metrics": [
      "Answer Fluency",
      "Answer Relevancy"
    ],
    "available_behaviors": [
      "Factual Accuracy",
      "Reasoning"
    ],
    "available_categories": [
      "RAG Systems",
      "Chatbots"
    ],
    "available_topics": [
      "Healthcare",
      "Finance"
    ]
  }
}

Metrics Mode Response

response-metrics-mode.json
{
"metric_pass_rates": {
  "Answer Fluency": {
    "total": 150,
    "passed": 90,
    "failed": 60,
    "pass_rate": 60.0
  },
  "Answer Relevancy": {
    "total": 150,
    "passed": 135,
    "failed": 15,
    "pass_rate": 90.0
  }
},
"metadata": { "mode": "metrics", ... }
}

Complete Mode Sections

ModeResponse Sections
allAll sections below
summaryoverall_pass_rates + metadata
metricsmetric_pass_rates + metadata
behaviorbehavior_pass_rates + metadata
categorycategory_pass_rates + metadata
topictopic_pass_rates + metadata
overalloverall_pass_rates + metadata
timelinetimeline + metadata
test_runstest_run_summary + metadata

Performance Optimization

Choose the Right Mode

For Dashboard Widgets

dashboard-widgets.sh
# Ultra-fast, minimal data
curl 'http://localhost:8080/test_results/stats?mode=summary&months=1' -H 'Authorization: Bearer YOUR_TOKEN'

For Specific Analysis

specific-analysis.sh
# Only metrics data
curl 'http://localhost:8080/test_results/stats?mode=metrics&test_set_ids=uuid' -H 'Authorization: Bearer YOUR_TOKEN'

# Only behavior trends
curl 'http://localhost:8080/test_results/stats?mode=behavior&months=6' -H 'Authorization: Bearer YOUR_TOKEN'

For Time-based Charts

time-based-charts.sh
# Timeline data only
curl 'http://localhost:8080/test_results/stats?mode=timeline&start_date=2024-01-01&end_date=2024-06-30' -H 'Authorization: Bearer YOUR_TOKEN'

Use Targeted Filters

Reduce Dataset Size

reduce-dataset-size.sh
# Specific test set analysis
curl 'http://localhost:8080/test_results/stats?test_set_ids=uuid&mode=metrics' -H 'Authorization: Bearer YOUR_TOKEN'

# High-priority tests only
curl 'http://localhost:8080/test_results/stats?priority_min=3&mode=summary' -H 'Authorization: Bearer YOUR_TOKEN'

# Recent data only
curl 'http://localhost:8080/test_results/stats?months=1&mode=timeline' -H 'Authorization: Bearer YOUR_TOKEN'

Complete Examples

Dashboard Implementation

Executive Dashboard (Ultra-fast)

executive-dashboard.sh
curl 'http://localhost:8080/test_results/stats?mode=summary&months=1' -H 'Authorization: Bearer YOUR_TOKEN'
# Response time: ~50ms, Data size: ~5%

Team Performance Dashboard

team-performance-dashboard.sh
curl 'http://localhost:8080/test_results/stats?mode=test_runs&assignee_ids=lead1&assignee_ids=lead2&months=3' -H 'Authorization: Bearer YOUR_TOKEN'
# Shows test run performance for specific team leads

Analytics Use Cases

Metric Performance Analysis

metric-performance-analysis.sh
curl 'http://localhost:8080/test_results/stats?mode=metrics&test_set_ids=regression_suite&months=12' -H 'Authorization: Bearer YOUR_TOKEN'
# 12-month metric trends for regression test suite

Behavior Comparison

behavior-comparison.sh
curl 'http://localhost:8080/test_results/stats?mode=behavior&behavior_ids=uuid1&behavior_ids=uuid2&start_date=2024-01-01&end_date=2024-06-30' -H 'Authorization: Bearer YOUR_TOKEN'
# Compare specific behaviors over date range

Category Trends

category-trends.sh
curl 'http://localhost:8080/test_results/stats?mode=category&category_ids=chatbot&months=6' -H 'Authorization: Bearer YOUR_TOKEN'
# 6-month category performance trends

Advanced Filtering

Urgent Healthcare Tests

urgent-healthcare-tests.sh
curl 'http://localhost:8080/test_results/stats?behavior_ids=healthcare_uuid&tags=urgent&priority_min=4&mode=summary' -H 'Authorization: Bearer YOUR_TOKEN'
# High-priority urgent healthcare test performance

Team Regression Analysis

team-regression-analysis.sh
curl 'http://localhost:8080/test_results/stats?tags=regression&assignee_ids=dev1&assignee_ids=dev2&start_date=2024-01-01&mode=test_runs' -H 'Authorization: Bearer YOUR_TOKEN'
# Regression test performance by team members

Multi-dimensional Analysis

multi-dimensional-analysis.sh
curl 'http://localhost:8080/test_results/stats?test_set_ids=suite1&test_set_ids=suite2&behavior_ids=behavior1&category_ids=category1&priority_min=2&mode=all' -H 'Authorization: Bearer YOUR_TOKEN'
# Complete analysis across multiple dimensions

Best Practices

Performance Best Practices

  1. Use Specific Modes: Always use the most specific mode for your use case
  2. Apply Filters: Reduce dataset size with targeted filters
  3. Limit Time Ranges: Use shorter time periods when possible
  4. Cache Results: Cache responses for repeated queries

Query Optimization

Good Examples:

query-optimization-good.sh
# Lightweight dashboard widget
?mode=summary&months=1

# Focused metric analysis
?mode=metrics&test_set_ids=uuid&months=3

# Targeted behavior comparison
?mode=behavior&behavior_ids=uuid1&behavior_ids=uuid2&months=6

Avoid These:

query-optimization-avoid.sh
# Unnecessarily broad queries
?mode=all&months=24

# Unfocused large datasets
?mode=all  # without any filters

Error Handling

The API provides clear validation errors:

error-validation.json
{
  "detail": [
    {
      "type": "int_parsing",
      "loc": [
        "query",
        "priority_max"
      ],
      "msg": "Input should be a valid integer, unable to parse string as an integer",
      "input": "abc"
    }
  ]
}

Integration Tips

React/JavaScript Usage:

integration-javascript.js
// Dashboard widget
const summaryData = await fetch("/test_results/stats?mode=summary&months=1");

// Chart data
const metricsData = await fetch(
"/test_results/stats?mode=metrics&test_set_ids=" + testSetId,
);

// Timeline chart
const timelineData = await fetch("/test_results/stats?mode=timeline&months=6");

Python Usage:

integration-python.py
import requests

# Dashboard summary
response = requests.get(
    "http://localhost:8080/test_results/stats",
    headers={"Authorization": f"Bearer {token}"},
    params={"mode": "summary", "months": 1}
)

# Multi-filter analysis
response = requests.get(
    "http://localhost:8080/test_results/stats",
    headers={"Authorization": f"Bearer {token}"},
    params={
        "mode": "behavior",
        "behavior_ids": ["uuid1", "uuid2"],
        "priority_min": 3,
        "months": 6
    }
)

This endpoint provides enterprise-level analytics capabilities with the flexibility and performance needed for production dashboards and detailed analysis workflows.