Overview
The Rhesis SDK provides a comprehensive metrics system for evaluating LLM-based systems. The metrics module supports multiple evaluation frameworks, lets you create custom metrics tailored to your specific use cases, and integrates with the Rhesis backend so you can work with metrics directly from the platform.
Metric Types
Rhesis SDK supports two types of metrics:
Single-Turn Metrics
Single-turn metrics evaluate a single exchange: one user input and the corresponding system output. These metrics are ideal for assessing:
- RAG Systems: Context relevance, faithfulness, and answer accuracy
- Response Quality: Clarity, relevance, and accuracy of individual responses
- Safety & Compliance: Bias, toxicity, PII leakage, and other safety concerns
- Custom Evaluations: Domain-specific quality assessments
View Single-Turn Metrics Documentation →
Conversational Metrics
Conversational metrics (multi-turn metrics) evaluate the quality of interactions across multiple conversation turns. These metrics are ideal for assessing:
- Conversation Flow: Turn relevancy and coherence across dialogue
- Goal Achievement: Whether objectives are met throughout the conversation
- Role Adherence: Consistency in maintaining assigned roles
- Knowledge Retention: Ability to recall and reference earlier conversation context
- Tool Usage: Appropriate selection and utilization of available tools
- Conversation Completeness: Whether conversations reach satisfactory conclusions
View Conversational Metrics Documentation →
Framework Integration
Rhesis integrates with the following open-source evaluation frameworks:
- DeepEval - Apache License 2.0 - "The LLM Evaluation Framework" by Confident AI
- DeepTeam - Apache License 2.0 - "The LLM Red Teaming Framework" by Confident AI
- Ragas - Apache License 2.0 - "Supercharge Your LLM Application Evaluations" by Exploding Gradients
These tools are used through their public APIs. The original licenses and copyright notices can be found in their respective repositories. Rhesis is not affiliated with these projects.
Quick Example
API Key Required: All examples require a valid Rhesis API key. Set your API key using:
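A minimal sketch, assuming the SDK reads the key from a `RHESIS_API_KEY` environment variable:

```python
import os

# Assumption: the SDK resolves the key from the RHESIS_API_KEY environment
# variable. You can also export it in your shell before running your script.
os.environ["RHESIS_API_KEY"] = "your-api-key"
```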
For more information, see the Installation & Setup guide.
Single-Turn Evaluation
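The general shape of a single-turn evaluation, sketched with the `NumericJudge` builder described under Custom Metrics below. The import path, constructor parameters, and `evaluate` signature are illustrative assumptions, not the definitive API:

```python
# Illustrative sketch: import path and signatures are assumptions.
from rhesis.sdk.metrics import NumericJudge

# A custom 0-10 judge for answer clarity (hypothetical parameters).
clarity = NumericJudge(
    name="clarity",
    evaluation_prompt="Rate how clearly the output answers the input.",
    min_score=0,
    max_score=10,
)

# Evaluate a single input/output exchange.
result = clarity.evaluate(
    input="What is retrieval-augmented generation?",
    output="RAG retrieves relevant documents and feeds them to the model "
           "as context before generating an answer.",
)
print(result)  # typically a score plus an explanation
```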
Conversational Evaluation
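An analogous sketch for a multi-turn transcript, using the `ConversationalJudge` builder named below; the transcript format and call signature are assumptions:

```python
# Illustrative sketch: import path, transcript format, and signatures
# are assumptions.
from rhesis.sdk.metrics import ConversationalJudge

coherence = ConversationalJudge(
    name="coherence",
    evaluation_prompt="Rate how coherent the assistant remains across turns.",
)

conversation = [
    {"role": "user", "content": "I need to reset my password."},
    {"role": "assistant", "content": "Sure. Which account is it for?"},
    {"role": "user", "content": "My billing account."},
    {"role": "assistant", "content": "For billing accounts, open Settings ..."},
]

result = coherence.evaluate(conversation=conversation)
print(result)
```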
Custom Metrics
In addition to framework-provided metrics, Rhesis offers custom metric builders:
For Single-Turn Evaluation
- NumericJudge: Create custom numeric scoring metrics (e.g., 0-10 scale)
- CategoricalJudge: Create custom categorical classification metrics (see the sketch below)
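A sketch of a `CategoricalJudge`, under the same assumptions as the quick examples above (hypothetical parameter names):

```python
from rhesis.sdk.metrics import CategoricalJudge  # assumed import path

# Classify each response into one of a fixed set of labels
# (hypothetical parameters).
tone = CategoricalJudge(
    name="tone",
    evaluation_prompt="Classify the tone of the output.",
    categories=["formal", "neutral", "casual"],
)

result = tone.evaluate(
    input="Can you explain the refund policy?",
    output="Of course! Refunds are processed within 5 business days.",
)
print(result)  # e.g., "casual" plus a rationale
```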
For Conversational Evaluation
- ConversationalJudge: Create custom conversational quality metrics
- GoalAchievementJudge: Evaluate goal achievement with custom criteria (see the sketch below)
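A sketch of a `GoalAchievementJudge`, again with assumed import path and parameters:

```python
from rhesis.sdk.metrics import GoalAchievementJudge  # assumed import path

# Checks a stated goal against the full transcript (hypothetical parameters).
booking = GoalAchievementJudge(
    name="booking_completed",
    goal="The user successfully books a table for two.",
)

conversation = [
    {"role": "user", "content": "Table for two tonight at 7pm, please."},
    {"role": "assistant", "content": "Booked: table for two at 7pm."},
]

result = booking.evaluate(conversation=conversation)
print(result)
```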
Platform Integration
Metrics can be managed both in the platform and in the SDK. The SDK provides push and pull methods to synchronize metrics with the platform.
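The `push` and `pull` method names come from this page; everything else in this sketch (import path, signatures, lookup-by-name) is an assumption:

```python
from rhesis.sdk.metrics import NumericJudge  # assumed import path

clarity = NumericJudge(
    name="clarity",
    evaluation_prompt="Rate how clearly the output answers the input.",
    min_score=0,
    max_score=10,
)

# Push the locally defined metric to the platform (assumed signature).
clarity.push()

# Pull a metric previously defined in the platform; lookup-by-name
# is an assumption for illustration.
remote_clarity = NumericJudge.pull(name="clarity")
```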
Next Steps
- Single-Turn Metrics - Learn about all available single-turn metrics
- Conversational Metrics - Learn about all available conversational metrics
- Models Documentation - Configure LLM models for evaluation
- Installation & Setup - Setup instructions
Need Help?
If any metrics are missing from the list, or you would like to use a different provider, please let us know by creating an issue on GitHub.