Human Reviews for Test Results
Overview
Human reviews complement automated metrics by allowing human evaluators to review, adjust, or override automated test outcomes. Each test result can include one or more reviews, representing human judgments with structured metadata.
This schema separates automated metrics (test_metrics) from human-provided evaluations (test_reviews), enabling clearer data management, traceability, and aggregation.
Status: ✅ Fully Implemented and Tested
Design Goals
- Separation of concerns: Keep human feedback separate from machine metrics.
- Support multiple reviewers and rounds: Allow several humans to evaluate the same test.
- Granular scope: Reviews can target specific metrics or the overall test.
- Traceability: Include timestamps, reviewer identity, and status references.
- Efficient access: Include top-level metadata for quick lookups and summaries.
- Full CRUD operations: Create, read, update, and delete reviews via REST API.
- Automatic metadata management: Metadata updates automatically on all operations.
Top-Level Structure
The test_reviews field is stored as a JSON object with two parts:
- metadata: Summary information about the reviews collection.
- reviews: List of individual review objects.
JSON Schema
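As an illustration, a populated test_reviews object might look like the following. All IDs, names, and timestamps are placeholders, and the exact shape of target is an assumption (see the field reference below):

```json
{
  "metadata": {
    "last_updated_at": "2024-06-01T12:30:00Z",
    "last_updated_by": { "user_id": "a1b2c3d4-0000-0000-0000-000000000003", "name": "Jane Doe" },
    "total_reviews": 1,
    "latest_status": { "status_id": "c3d4e5f6-0000-0000-0000-000000000001", "name": "Passed" },
    "summary": "Overridden after manual verification"
  },
  "reviews": [
    {
      "review_id": "e5f6a7b8-0000-0000-0000-000000000002",
      "status": { "status_id": "c3d4e5f6-0000-0000-0000-000000000001", "name": "Passed" },
      "user": { "user_id": "a1b2c3d4-0000-0000-0000-000000000003", "name": "Jane Doe" },
      "comments": "The answer is acceptable despite the low automated score.",
      "created_at": "2024-06-01T12:30:00Z",
      "updated_at": "2024-06-01T12:30:00Z",
      "target": { "type": "test" }
    }
  ]
}
```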
Field Reference
metadata
| Field | Type | Description |
|---|---|---|
| last_updated_at | string (ISO 8601) | Timestamp when any review was last modified. |
| last_updated_by | object | Contains the user ID and name of the last editor. |
| total_reviews | integer | Total number of reviews in the list. |
| latest_status | object | Status of the most recent review (for quick summaries). |
| summary | string | Optional short description or aggregated summary. |
reviews[]
| Field | Type | Description |
|---|---|---|
| review_id | UUID | Unique ID for this review entry. |
| status | object | Contains a status_id (UUID) and name (string). |
| user | object | Contains a user_id (UUID) and name (string). |
| comments | string | Free-text explanation from the reviewer. |
| created_at | string (ISO 8601) | Timestamp when the review was first created. |
| updated_at | string (ISO 8601) | Timestamp when the review was last modified. |
| target | object | Defines whether the review applies to a metric or the whole test. |
Example target values
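The tables above do not fix the internal shape of target. As an illustration, a review scoped to the whole test versus a single metric could look like this (the type and metric_name field names are assumptions):

Review of the whole test:

```json
{ "type": "test" }
```

Review of a single metric (e.g., "Answer Relevancy"):

```json
{ "type": "metric", "metric_name": "Answer Relevancy" }
```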
Implementation Details
Database
Column: test_reviews (JSONB, nullable)
Table: test_result
The column stores the complete review structure as JSON, providing flexibility for complex review scenarios without additional tables.
Backend Models
File: apps/backend/src/rhesis/backend/app/models/test_result.py
Derived Properties:
- last_review: Returns the most recent review by updated_at timestamp.
- matches_review: Boolean indicating whether the test result's status_id matches the latest review's status_id.
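These are exposed as properties on the TestResult model. As a minimal standalone sketch of the underlying logic only (the helper names and plain-dict interface here are illustrative, not the model's actual code):

```python
from datetime import datetime
from typing import Any, Optional


def last_review(test_reviews: Optional[dict]) -> Optional[dict]:
    """Return the most recent review by updated_at timestamp, or None if there are none."""
    reviews = (test_reviews or {}).get("reviews", [])
    if not reviews:
        return None
    # Parse ISO 8601 timestamps ("Z" suffix normalized to an explicit UTC offset).
    return max(
        reviews,
        key=lambda r: datetime.fromisoformat(r["updated_at"].replace("Z", "+00:00")),
    )


def matches_review(status_id: Any, test_reviews: Optional[dict]) -> bool:
    """Return True if the test result's status_id matches the latest review's status_id."""
    review = last_review(test_reviews)
    return review is not None and str(status_id) == str(review["status"]["status_id"])
```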
Pydantic Schemas
File: apps/backend/src/rhesis/backend/app/schemas/test_result.py
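The actual schema classes are defined in that file. As a rough sketch only, request and review models mirroring the fields above might look like this (class and field names are assumptions, not the real definitions):

```python
from datetime import datetime
from typing import Literal, Optional
from uuid import UUID

from pydantic import BaseModel


class ReviewTarget(BaseModel):
    """Illustrative target: either the whole test or a named metric."""
    type: Literal["test", "metric"]
    metric_name: Optional[str] = None  # only meaningful when type == "metric"


class ReviewCreate(BaseModel):
    """Illustrative create payload; user and status details are resolved server-side."""
    status_id: UUID
    comments: Optional[str] = None
    target: ReviewTarget


class Review(BaseModel):
    """Illustrative stored review entry."""
    review_id: UUID
    status: dict
    user: dict
    comments: Optional[str] = None
    created_at: datetime
    updated_at: datetime
    target: ReviewTarget
```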
API Endpoints
All endpoints follow REST conventions and automatically manage metadata.
1. Create Review
Endpoint: POST /test_results/{test_result_id}/reviews
Request Body:
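An illustrative request body, assuming the reviewer supplies a status reference, comments, and a target (the exact field names, such as status_id, are assumptions based on the field reference above):

```json
{
  "status_id": "c3d4e5f6-0000-0000-0000-000000000001",
  "comments": "Manually verified; the response is acceptable.",
  "target": { "type": "test" }
}
```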
Response (201 Created):
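The created review is returned with the server-populated fields. An illustrative shape (all values are placeholders):

```json
{
  "review_id": "e5f6a7b8-0000-0000-0000-000000000002",
  "status": { "status_id": "c3d4e5f6-0000-0000-0000-000000000001", "name": "Passed" },
  "user": { "user_id": "a1b2c3d4-0000-0000-0000-000000000003", "name": "Jane Doe" },
  "comments": "Manually verified; the response is acceptable.",
  "created_at": "2024-06-01T12:30:00Z",
  "updated_at": "2024-06-01T12:30:00Z",
  "target": { "type": "test" }
}
```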
Features:
- Auto-generates a unique review_id
- Sets both created_at and updated_at to the current time
- Auto-populates user info from the authenticated user
- Fetches and embeds status details from the Status model
- Updates test_reviews metadata automatically
2. Update Review
Endpoint: PUT /test_results/{test_result_id}/reviews/{review_id}
Request Body (all fields optional):
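Because all fields are optional, an update can send only what changes. For example, revising the status and comments (field names as above, illustrative only):

```json
{
  "status_id": "b2c3d4e5-0000-0000-0000-000000000004",
  "comments": "Re-checked against the latest guidelines; downgrading to Failed."
}
```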
Response (200 OK):
Features:
- Preserves the created_at timestamp
- Updates updated_at to the current time
- Updates only the provided fields
- Updates metadata automatically
3. Delete Review
Endpoint: DELETE /test_results/{test_result_id}/reviews/{review_id}
Response (200 OK):
Features:
- Removes review from reviews array
- Updates metadata automatically
- Handles the empty state (when the last review is deleted, sets latest_status to null)
- Returns the deleted review data
4. Get Test Result with Reviews
Endpoint: GET /test_results/{test_result_id}
Response includes:
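An illustrative fragment of the response, trimmed to the review-related fields and derived properties (other test result fields omitted; this is a sketch, not the exact serialization):

```json
{
  "id": "f0e1d2c3-0000-0000-0000-000000000005",
  "status_id": "c3d4e5f6-0000-0000-0000-000000000001",
  "test_reviews": {
    "metadata": { "total_reviews": 1, "latest_status": { "status_id": "c3d4e5f6-0000-0000-0000-000000000001", "name": "Passed" } },
    "reviews": [ { "review_id": "e5f6a7b8-0000-0000-0000-000000000002", "status": { "status_id": "c3d4e5f6-0000-0000-0000-000000000001", "name": "Passed" } } ]
  },
  "last_review": { "review_id": "e5f6a7b8-0000-0000-0000-000000000002", "status": { "status_id": "c3d4e5f6-0000-0000-0000-000000000001", "name": "Passed" } },
  "matches_review": true
}
```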
Derived Properties:
- last_review: Most recent review by updated_at timestamp
- matches_review: Boolean indicating whether the test result status matches the latest review status
Use Cases
1. Override Automated Metrics
A human reviewer disagrees with an automated pass/fail and adds a review with a different status.
2. Multi-Reviewer Workflow
Multiple team members review the same test result, each adding their perspective.
3. Metric-Specific Reviews
Review specific metrics (e.g., “Answer Relevancy”) separately from the overall test.
4. Audit Trail
Track who reviewed what and when, with full edit history via timestamps.
5. Status Conflict Detection
Use matches_review to identify cases where human review disagrees with automated result.
Implementation Notes
Metadata Management
The metadata section is automatically updated on every review operation:
- Create: Initializes metadata with first review’s information
- Update: Updates last_updated_at, last_updated_by, and latest_status
- Delete: Updates metadata, or clears latest_status if no reviews remain
JSONB Column Updates
When modifying reviews, use SQLAlchemy’s flag_modified to ensure changes are detected:
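For example, after mutating the dictionary in place (a minimal sketch; the test_result, new_review, and db variable names are illustrative):

```python
from sqlalchemy.orm.attributes import flag_modified

# test_result is a loaded TestResult ORM instance; new_review is a review dict
# built as in the Create Review endpoint (both names are illustrative).
test_result.test_reviews["reviews"].append(new_review)
test_result.test_reviews["metadata"]["total_reviews"] += 1

# In-place mutation of a JSON/JSONB value is invisible to SQLAlchemy's change
# tracking, so mark the attribute as modified before committing.
flag_modified(test_result, "test_reviews")
db.commit()  # db: the active Session
```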
Empty State Handling
When the last review is deleted:
- reviews array becomes empty
- total_reviews = 0
- latest_status = null
- summary = “All reviews removed”
- last_review derived property returns None
- matches_review returns False
Testing
All functionality has been tested and verified:
✅ Create review with auto-generated ID and timestamps
✅ Update review with preserved created_at
✅ Delete review with metadata updates
✅ Multiple reviews support
✅ Both “test” and “metric” target types
✅ Empty state handling
✅ Derived properties (last_review, matches_review)
✅ Automatic metadata synchronization
Future Extensions
Potential enhancements:
- confidence: Reviewer confidence score (0–1)
- attachments: File attachments for evidence
- review_type: Categorize reviews (approval, rejection, follow-up)
- tags: Categorize reviews with tags
- Review workflows and approval chains