
SDK


Software Development Kit - A Python library that provides programmatic access to Rhesis platform features for integration into your workflows.

Also known as: Software Development Kit, Python SDK

Overview

The Rhesis Python SDK provides programmatic access to platform features, enabling you to integrate AI testing into your development workflows, CI/CD pipelines, and custom tooling.

Installation

bash
pip install rhesis-sdk
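
To keep CI builds reproducible (see Best Practices below), pin the SDK to the release you have tested against; the version below is only a placeholder:

bash
# 1.2.3 is a placeholder -- substitute the release you have tested against
pip install "rhesis-sdk==1.2.3"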

Quick Start

python
import os
from rhesis.sdk.entities import TestSet
from rhesis.sdk.synthesizers import PromptSynthesizer

# Set API key
os.environ["RHESIS_API_KEY"] = "rh-your-api-key"
os.environ["RHESIS_BASE_URL"] = "https://api.rhesis.ai"  # optional

# Browse available test sets
for test_set in TestSet().all():
    print(test_set)

# Generate custom tests
synthesizer = PromptSynthesizer(
      prompt="Generate tests for a medical chatbot"
)
test_set = synthesizer.generate(num_tests=10)
print(test_set.tests)

Core Features

Test Generation:

python
from rhesis.sdk.synthesizers import PromptSynthesizer, Synthesizer

# Simple prompt-based generation
synthesizer = PromptSynthesizer(
      prompt="Generate tests for a customer support chatbot"
)
test_set = synthesizer.generate(num_tests=50)

# With behaviors and categories
synthesizer = Synthesizer(
      prompt="Test an insurance chatbot",
      behaviors=["helpful", "accurate", "refuses harmful requests"],
      categories=["claims", "policies", "quotes"]
)
test_set = synthesizer.generate(num_tests=100)

Evaluation with Metrics:

python
from rhesis.sdk.metrics import NumericJudge, DeepEvalAnswerRelevancy

# Custom numeric metric
metric = NumericJudge(
      name="answer_quality",
      evaluation_prompt="Rate the quality of this answer",
      min_score=0.0,
      max_score=10.0,
      threshold=7.0
)

result = metric.evaluate(
      input="What is the capital of France?",
      output="The capital of France is Paris"
)
print(f"Score: {result.score}")

# Pre-built metrics
metric = DeepEvalAnswerRelevancy(threshold=0.7)
result = metric.evaluate(
      input="What is photosynthesis?",
      output="Photosynthesis is how plants convert light into energy"
)

Endpoint Connector:

python
from rhesis.sdk import RhesisClient, endpoint

# Initialize client
client = RhesisClient(
      api_key="rh-your-api-key",
      project_id="your-project-id",
      environment="development"
)

# Register functions as endpoints
# (process_message is a placeholder for your application's response logic)
@endpoint()
def chat(input: str, session_id: str | None = None) -> dict:
    return {"output": process_message(input), "session_id": session_id}

CI/CD Integration

GitHub Actions:

yaml
name: AI Quality Tests
on: [push, pull_request]

jobs:
    test:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - name: Generate and Evaluate Tests
          env:
            RHESIS_API_KEY: ${{ secrets.RHESIS_API_KEY }}
          run: |
            pip install rhesis-sdk
            python test_runner.py

Example Test Runner:

python
import os
from rhesis.sdk.synthesizers import PromptSynthesizer
from rhesis.sdk.metrics import DeepEvalAnswerRelevancy

# RHESIS_API_KEY is read from the environment (set by the CI workflow above)
if not os.getenv("RHESIS_API_KEY"):
    raise SystemExit("RHESIS_API_KEY is not set")

# Generate tests
synthesizer = PromptSynthesizer(
      prompt="Generate regression tests for chatbot"
)
test_set = synthesizer.generate(num_tests=20)

# Evaluate responses
metric = DeepEvalAnswerRelevancy(threshold=0.7)
failed = 0

for test in test_set.tests:
    # your_chatbot is a placeholder for the application under test
    response = your_chatbot(test.prompt.content)
    result = metric.evaluate(
        input=test.prompt.content,
        output=response
    )
    if not result.details['is_successful']:
        failed += 1
        print(f"Failed: {test.prompt.content}")

if failed > 0:
    raise SystemExit(f"{failed} tests failed")  # non-zero exit fails the CI job

Working with Models

python
from rhesis.sdk.models import get_model

# Use default model
model = get_model()

# Use specific provider
model = get_model("gemini")

# Use in synthesizers
synthesizer = PromptSynthesizer(
      prompt="Generate tests",
      model=model
)

# Use in metrics
metric = NumericJudge(
      name="quality",
      evaluation_prompt="Rate quality",
      min_score=0.0,
      max_score=10.0,
      threshold=7.0,
      model="gemini"
)

Best Practices

  • Error handling: Wrap SDK calls in try-except blocks (see the sketch after this list)
  • Environment variables: Store API keys securely
  • Version pinning: Pin SDK version in requirements.txt
  • Review generated tests: Always review AI-generated content
  • Iterate: Refine prompts based on results
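
As a minimal sketch of the error-handling recommendation above (the SDK's concrete exception classes are not covered in this entry, so the example catches a broad Exception around the network-bound call):

python
import logging
from rhesis.sdk.synthesizers import PromptSynthesizer

# Sketch only: catch a broad Exception because the SDK's specific
# exception types are not documented in this glossary entry
try:
    synthesizer = PromptSynthesizer(prompt="Generate smoke tests for a chatbot")
    test_set = synthesizer.generate(num_tests=5)
except Exception as exc:
    logging.error("Test generation failed: %s", exc)
    raise  # re-raise so the calling script or CI job still fails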
