Platform

The Rhesis platform provides comprehensive tools for testing and evaluating AI applications at scale.

Why Rhesis?

Testing AI applications is fundamentally different from traditional software testing. Rhesis is purpose-built for this challenge:

  • AI-Native Testing: Generate tests using AI, evaluate responses with LLMs
  • Scale: Test thousands of scenarios automatically
  • Insight: Track quality trends, catch regressions, compare models
  • Collaboration: Multi-user workflows with roles and permissions
  • Integration: Works with any AI model or framework

New to Rhesis? Start with Core Concepts to understand how everything fits together, or explore the platform locally by following the Getting Started guide.

Where to Start

If you're just getting started, follow this path:

  1. Create a Project - Organize your testing work
  2. Configure Endpoints - Connect to your AI application
  3. Generate Tests - Create test cases with AI assistance
  4. Define Metrics - Set up evaluation criteria
  5. Run and Analyze - Execute tests and review results

Already familiar? Jump to any feature below.

Core Features

Organizations & Team

Manage organization settings, invite team members, and configure contact information and preferences.

Projects

Organize testing work into projects with environment management, visual icons, and comprehensive project settings.

Endpoints

Configure API endpoints that your tests execute against, with support for REST and WebSocket protocols.

Tests

Create and manage test cases manually or generate them using AI with document context and iterative feedback.

Test Sets

Organize tests into collections and execute them against endpoints with parallel or sequential execution modes.

Test Runs

View execution results for individual test runs with filtering, comparison, and detailed metric analysis.

Test Results

Dashboard for analyzing test result trends, metrics performance, and historical data with advanced filtering.

Metrics

Define and manage LLM-based evaluation criteria with behaviors, scoring types, and model-driven grading.

Integrations

Connect with your existing development workflow and external services.

Advanced Capabilities

Once you’re up and running, explore these advanced features:

Test Runs

Deep analysis with comparison and filtering

Test Results

Aggregate analytics and trend visualization

Test Sets

Organize and execute test collections

API Tokens

Programmatic access via Python SDK
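
As an illustration of programmatic access, here is a minimal sketch of using an API token to fetch recent test runs and summarize their results. The client class, method names, attributes, and environment variable below are illustrative assumptions, not the SDK's confirmed interface; see the Development Guide for the actual Python SDK API.

```python
import os

# Hypothetical sketch: RhesisClient, list_test_runs, and get_results are
# assumed names for illustration, not the confirmed SDK interface.
from rhesis_sdk import RhesisClient  # assumed import path

# API token created in the platform, exposed via an environment variable
# (variable name is an assumption).
api_key = os.environ["RHESIS_API_KEY"]

client = RhesisClient(api_key=api_key)

# List recent test runs for a project and print a simple pass/fail summary.
for run in client.list_test_runs(project="my-project", limit=5):
    results = client.get_results(run.id)
    passed = sum(1 for r in results if r.passed)
    print(f"{run.id}: {passed}/{len(results)} tests passed")
```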

Organizations

Team management and access control


Need Help? Check out our Development Guide for SDK and API documentation, or visit Getting Started for initial setup.