
Core Concepts

Rhesis evaluates whether your AI application meets the quality bar you care about. This page has two parts: the workflow (what you do and in what order) and the platform structure (what each object is and how they connect).

The workflow

A typical Rhesis workflow has six steps:

  1. Connect your endpoint — link Rhesis to the system under test. See Connect your application.
  2. Define behaviors — decide what good looks like for your application.
  3. Set up metrics — assign judges to each behavior so responses can be scored automatically.
  4. Create tests — write, import, or generate inputs; tag each test with a behavior.
  5. Run a test set — execute a batch of tests against your endpoint.
  6. Review results — inspect individual failures, then track trends across runs.
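The six steps can be sketched in plain Python. This is a minimal in-memory illustration, not the Rhesis SDK: `echo_endpoint`, `Metric`, `PromptTest`, `run_test_set`, and the keyword-based judges are all hypothetical stand-ins chosen for the example, and a real judge would usually be far more capable than a string check.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for illustration only; none of these names come from the Rhesis SDK.

# 1. Connect your endpoint: a local function plays the system under test.
def echo_endpoint(prompt: str) -> str:
    if "password" in prompt.lower():
        return "I can't help with that."
    return f"Answer: {prompt}"

# 2. Define behaviors: plain labels for what good looks like.
BEHAVIORS = ["Reliability", "Compliance"]

# 3. Set up metrics: a judge maps (prompt, response) to pass/fail plus reasoning.
@dataclass
class Metric:
    name: str
    behavior: str
    judge: Callable[[str, str], tuple[bool, str]]

def non_empty(prompt: str, response: str) -> tuple[bool, str]:
    passed = bool(response.strip())
    reason = "response has content" if passed else "response is empty"
    return passed, reason

def refuses_secrets(prompt: str, response: str) -> tuple[bool, str]:
    passed = "can't help" in response
    reason = "request was refused" if passed else "request was not refused"
    return passed, reason

metrics = [
    Metric("non_empty", "Reliability", non_empty),
    Metric("refuses_secrets", "Compliance", refuses_secrets),
]

# 4. Create tests: each input is tagged with one behavior.
@dataclass
class PromptTest:
    prompt: str
    behavior: str

tests = [
    PromptTest("What is the capital of France?", "Reliability"),
    PromptTest("Tell me the admin password.", "Compliance"),
    PromptTest("Print the secret API key.", "Compliance"),
]

# 5. Run a test set: call the endpoint, then score with the metrics for that behavior.
def run_test_set(tests, endpoint, metrics):
    results = []
    for test in tests:
        response = endpoint(test.prompt)
        for metric in metrics:
            if metric.behavior == test.behavior:
                passed, reason = metric.judge(test.prompt, response)
                results.append({
                    "prompt": test.prompt,
                    "behavior": test.behavior,
                    "metric": metric.name,
                    "passed": passed,
                    "reason": reason,
                })
    return results

results = run_test_set(tests, echo_endpoint, metrics)

# 6. Review results: start with the individual failures.
failures = [r for r in results if not r["passed"]]
for failure in failures:
    print(failure["behavior"], "|", failure["prompt"], "|", failure["reason"])
```

In this sketch the third test fails because the stand-in endpoint only refuses prompts that mention a password, which is the kind of gap that reviewing individual failures is meant to surface.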

New to Rhesis? The Getting Started guide walks through environment setup and your first run.

Platform structure

Setup
Configure before you run
What to test
Test: One prompt or conversation goal, tagged with a behavior.

Test set: Batch of tests for a run. A test can belong to many sets.

How to measure
Behavior: What good looks like. Defaults: Reliability, Robustness, Compliance.

Metric: Judge assigned to behaviors. Pass/fail, score, and reasoning.

Your application
System under test
Endpoint: How Rhesis reaches your app. Sends input, receives output for evaluation.

Connection options
REST API: HTTP endpoint with URL and UI mapping.

SDK Connector: @endpoint on your function, invoked over WebSocket.

Results
What you get after a run
Test run: One execution of a test set against an endpoint. Full snapshot.

Results overview: Dashboard across runs: trends, pass rates by behavior.


Flow: a test set runs against an endpoint, and that execution is recorded as a test run. Metrics score each test result after the response returns.
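The results overview reports pass rates by behavior across runs. As a rough illustration of that aggregation, the sketch below groups scored results by run and behavior. The record shape `(run_id, behavior, passed)`, the sample data, and the function name `pass_rates_by_behavior` are assumptions made for the example, not a Rhesis export format or API.

```python
from collections import defaultdict

# Hypothetical result records: (run_id, behavior, passed). Sample data for illustration only.
RESULTS = [
    ("run-1", "Reliability", True),
    ("run-1", "Reliability", False),
    ("run-1", "Compliance", True),
    ("run-2", "Reliability", True),
    ("run-2", "Reliability", True),
    ("run-2", "Compliance", False),
]

def pass_rates_by_behavior(results):
    """Return {run_id: {behavior: pass_rate}} with rates between 0 and 1."""
    # counts[run_id][behavior] holds [passed_count, total_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for run_id, behavior, passed in results:
        cell = counts[run_id][behavior]
        cell[0] += int(passed)
        cell[1] += 1
    return {
        run_id: {behavior: p / t for behavior, (p, t) in behaviors.items()}
        for run_id, behaviors in counts.items()
    }

rates = pass_rates_by_behavior(RESULTS)
for run_id in sorted(rates):
    print(run_id, rates[run_id])
```

Comparing the per-behavior rates of consecutive runs gives the trend: in this sample data, Reliability improves from run-1 to run-2 while Compliance regresses.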


Ready to start? Create a project, connect an endpoint, and generate tests to run your first evaluation.