
How to Start Testing LLM and Agentic Apps in 10 Minutes with Rhesis AI

Testing LLM and agentic apps is challenging: outputs are non-deterministic, edge cases are unpredictable, and manual testing doesn’t scale. This guide shows you how to get a complete, automated testing pipeline up and running with Rhesis in under 10 minutes.

What You’ll Get

  • Test generation: Generate hundreds of test scenarios from plain-language requirements
  • Single-turn and multi-turn testing: Test both simple Q&A responses and complex conversations (via Penelope)
  • LLM-based evaluation: Automated scoring of whether outputs meet your requirements
  • Full testing platform: UI, API, and SDK for running and managing tests

Prerequisites

  • Docker Desktop installed and running
  • Git (to clone the repository)
  • Ports 3000, 8080, 8081, 5432, and 6379 available on your system
  • An AI provider API key (Rhesis API, OpenAI, Azure OpenAI, or Google Gemini)
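Before starting, you can confirm the required ports are free with a small standard-library sketch (the port list mirrors the prerequisites above):

```python
import socket

REQUIRED_PORTS = (3000, 8080, 8081, 5432, 6379)

def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently listening on the given local port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        # connect_ex returns 0 only if something accepted the connection
        return s.connect_ex((host, port)) != 0

for port in REQUIRED_PORTS:
    print(f"port {port}: {'free' if port_is_free(port) else 'IN USE'}")
```

Any port reported as IN USE needs to be freed before `./rh start` (see Troubleshooting below).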

Step 1: Clone and Start (5 minutes)

Terminal
# Clone the repository
git clone https://github.com/rhesis-ai/rhesis.git
cd rhesis

# Start all services with one command
./rh start

The ./rh start command automatically:

  • Checks if Docker is running
  • Generates a secure database encryption key
  • Creates .env.docker.local with all required configuration
  • Enables local authentication bypass (auto-login)
  • Starts all services (backend, frontend, database, worker)
  • Creates the database and runs migrations
  • Creates the default admin user (Local Admin)
  • Loads example test data

Wait approximately 5-7 minutes for all services to start. You’ll see containers starting up in your terminal.
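Rather than watching the terminal, you can poll the backend until it answers. A minimal sketch — using the backend root path as a readiness probe is an assumption; any route that returns HTTP 200 once services are up will do:

```python
import time
import urllib.request
import urllib.error

def wait_until_up(url, deadline_s=600, interval_s=5):
    """Poll `url` until it returns HTTP 200 or the deadline passes."""
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        try:
            with urllib.request.urlopen(url, timeout=3) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Service not up yet; retry after a short pause
            pass
        time.sleep(interval_s)
    return False
```

For example, `wait_until_up("http://localhost:8080/")` blocks until the backend API responds, returning `False` if ten minutes pass without an answer.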

Step 2: Access the Platform (1 minute)

Once services are running, open http://localhost:3000 in your browser. With the local authentication bypass enabled, you are logged in automatically as Local Admin and land on the Rhesis AI dashboard.

Step 3: Configure AI Provider (1 minute)

Configure an AI provider to enable test generation:

Option 1: Use Your Rhesis API Key

  1. Get your API key from https://app.rhesis.ai/
  2. Edit .env.docker.local and add:
.env.docker.local
RHESIS_API_KEY=your-actual-rhesis-api-key-here

Option 2: Use Your Own AI Provider

Add your provider credentials to .env.docker.local:

.env.docker.local
# Google Gemini
GEMINI_API_KEY=your-gemini-api-key
GEMINI_MODEL_NAME=gemini-2.0-flash-001
GOOGLE_API_KEY=your-google-api-key

# Or Azure OpenAI
AZURE_OPENAI_ENDPOINT=your-endpoint
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o
AZURE_OPENAI_API_VERSION=your-version

# Or OpenAI
OPENAI_API_KEY=your-openai-key
OPENAI_MODEL_NAME=gpt-4o

After updating, restart services:

Terminal
./rh restart

Step 4: Start Testing Your LLM/Agentic App (3 minutes)

Via Web UI

  1. Open http://localhost:3000 
  2. Create an Endpoint: Add your LLM/agentic app’s API endpoint
  3. Define Requirements: Specify what your app should and shouldn’t do
  4. Generate Tests: Automatically generate hundreds of test scenarios
  5. Run Tests: Execute tests against your endpoint
  6. Review Results: View which outputs violate requirements

Via Python SDK

Installation

Terminal
pip install rhesis-sdk

Obtain an API key

  1. Visit http://localhost:3000 
  2. Navigate to API Tokens
  3. Generate a new API key

Your API key will be in the format rh-XXXXXXXXXXXXXXXXXXXX. Keep this key secure and never share it publicly.

Configure the SDK

You can configure the SDK either through environment variables or direct configuration:

Environment Variables:

Terminal
export RHESIS_API_KEY="your-api-key"
export RHESIS_BASE_URL="http://localhost:8080"

Direct Configuration:

config.py
import rhesis

rhesis.api_key = "your-api-key"
rhesis.base_url = "http://localhost:8080"
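A common pattern is to combine the two approaches: read the environment first and fail fast with a clear message if the key is missing. This helper is illustrative, not part of the SDK:

```python
import os

def load_rhesis_config():
    """Read SDK settings from the environment, defaulting to the local backend."""
    api_key = os.environ.get("RHESIS_API_KEY")
    if not api_key:
        raise RuntimeError(
            "RHESIS_API_KEY is not set; export it or set rhesis.api_key directly."
        )
    base_url = os.environ.get("RHESIS_BASE_URL", "http://localhost:8080")
    return api_key, base_url
```

Failing early here gives a clearer error than letting an unauthenticated request surface deep inside an SDK call later.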

Set up your REST endpoint

Set up your REST endpoint in the UI, or use the Rhesis connector. Then retrieve the endpoint by ID:

endpoint.py
import os
from rhesis.sdk.entities import Endpoint

os.environ["RHESIS_API_KEY"] = "rh-your-api-key"  # Get from http://localhost:3000 
os.environ["RHESIS_BASE_URL"] = "http://localhost:8080"

# Retrieve endpoint by ID
endpoint = Endpoint(id="your-endpoint-id")
endpoint.pull()

Create a test set

Generate tests with the PromptSynthesizer, then push the result to the platform:

create_test_set.py
import os
from rhesis.sdk.entities import TestSet
from rhesis.sdk.synthesizers import PromptSynthesizer

os.environ["RHESIS_API_KEY"] = "rh-your-api-key"  # Get from http://localhost:3000
os.environ["RHESIS_BASE_URL"] = "http://localhost:8080"

# Generate custom test scenarios
synthesizer = PromptSynthesizer(
    prompt="Generate tests for a medical chatbot that must never provide diagnosis",
)
test_set = synthesizer.generate(num_tests=10)

# Push test set to platform
test_set.push()
print(f"Created test set: {test_set.id}")

Run tests

Retrieve the endpoint and test set by ID, link them with a test configuration, then start a run:

run_tests.py
import os
from rhesis.sdk.entities import Endpoint, TestSet
from rhesis.sdk.entities.test_configuration import TestConfiguration

os.environ["RHESIS_API_KEY"] = "rh-your-api-key"  # Get from http://localhost:3000
os.environ["RHESIS_BASE_URL"] = "http://localhost:8080"

# Retrieve endpoint by ID
endpoint = Endpoint(id="your-endpoint-id")
endpoint.pull()

# Retrieve test set by ID
test_set = TestSet(id="your-test-set-id")
test_set.pull()

# Create test configuration linking test set and endpoint
test_config = TestConfiguration(
    endpoint_id=endpoint.id,
    test_set_id=test_set.id,
)
test_config.push()

# Execute test set against endpoint
result = test_set.execute(endpoint)
print(f"Test run started: {result}")

For complex conversations, use Penelope to simulate multi-turn interactions.

What’s Running

Your local infrastructure includes:

| Service | Port | Description |
| --- | --- | --- |
| Backend API | 8080 | FastAPI application handling test execution and evaluation |
| Frontend | 3000 | Next.js dashboard for managing tests and reviewing results |
| Worker | 8081 | Celery worker processing test runs and AI evaluations |
| PostgreSQL | 5432 | Database storing tests, results, and configurations |
| Redis | 6379 | Message broker for worker tasks |

Quick Commands

Terminal
# Stop all services
./rh stop

# View logs
./rh logs

# Restart services
./rh restart

# Delete everything (fresh start)
./rh delete

Troubleshooting

Docker not running?

Start Docker Desktop, then run:

Terminal
./rh start

Port already in use?

Terminal
lsof -i :3000  # Check what's using the port
kill -9 <PID>  # Kill the process

Services not starting?

Terminal
./rh logs  # Check logs for errors
./rh delete && ./rh start  # Fresh start

AI provider not working?

Verify your API key in .env.docker.local and restart with ./rh restart.
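If you're unsure which keys are actually set, this sketch parses .env.docker.local and reports which provider keys hold a real value. The key names match the examples in Step 3; treating values containing "your-" as placeholders is an assumption based on the templates above:

```python
from pathlib import Path

PROVIDER_KEYS = ("RHESIS_API_KEY", "GEMINI_API_KEY",
                 "AZURE_OPENAI_API_KEY", "OPENAI_API_KEY")

def configured_providers(path=".env.docker.local"):
    """Return provider keys in the env file that hold a non-placeholder value."""
    values = {}
    env_file = Path(path)
    if env_file.exists():
        for line in env_file.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return [k for k in PROVIDER_KEYS
            if values.get(k) and "your-" not in values[k]]
```

An empty result means no provider is configured and test generation will fail until one key is filled in.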

Need help? Join our Discord: https://discord.rhesis.ai 


You’re all set! You now have a full testing infrastructure for LLM and agentic applications. Generate tests, run them automatically, and catch issues before production.

For more details, see the full self-hosting guide or read about our Docker Compose journey .