Single-Turn Metrics
Overview
Single-turn metrics evaluate individual exchanges between user input and system output. These metrics are ideal for assessing the quality of standalone responses, RAG systems, and classification tasks.
API Key Required: All examples in this documentation require a valid Rhesis API key. Set your API key using:
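One way to do this from Python is to set the environment variable before using the SDK. The variable name `RHESIS_API_KEY` is an assumption here; the Installation & Setup guide lists the exact configuration options.

```python
import os

# Assumed environment variable name; see Installation & Setup for the exact key.
os.environ["RHESIS_API_KEY"] = "your-api-key"
```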
For more information, see the Installation & Setup guide.
Rhesis integrates with the following open-source evaluation frameworks:
- DeepEval - Apache License 2.0 - The LLM Evaluation Framework by Confident AI
- DeepTeam - Apache License 2.0 - The LLM Red Teaming Framework by Confident AI
- Ragas - Apache License 2.0 - Supercharge Your LLM Application Evaluations by Exploding Gradients
These tools are used through their public APIs. The original licenses and copyright notices can be found in their respective repositories. Rhesis is not affiliated with these projects.
Supported Metrics
DeepEval Metrics
| Metric | Description | Requires Context | Requires Ground Truth | Reference |
|---|---|---|---|---|
| DeepEvalAnswerRelevancy | Measures answer relevance to the question | No | No | Docs |
| DeepEvalFaithfulness | Checks if answer is grounded in context | Yes | No | Docs |
| DeepEvalContextualRelevancy | Evaluates context relevance to question | Yes | No | Docs |
| DeepEvalContextualPrecision | Measures precision of retrieved context | Yes | Yes | Docs |
| DeepEvalContextualRecall | Measures recall of retrieved context | Yes | Yes | Docs |
| DeepEvalBias | Detects biased content in responses | No | No | Docs |
| DeepEvalToxicity | Detects toxic content in responses | No | No | Docs |
| DeepEvalPIILeakage | Detects personally identifiable information | No | No | Docs |
| DeepEvalRoleViolation | Detects when assistant violates assigned role | No | No | Docs |
| DeepEvalMisuse | Detects potential misuse of the system | No | No | Docs |
| DeepEvalNonAdvice | Ensures assistant doesn’t give restricted advice | No | No | Docs |
DeepTeam Metrics
| Metric | Description | Requires Context | Requires Ground Truth | Reference |
|---|---|---|---|---|
| DeepTeamSafety | Detects safety violations | No | No | Docs |
| DeepTeamIllegal | Detects illegal content or requests | No | No | Docs |
Ragas Metrics
| Metric | Description | Requires Context | Requires Ground Truth | Reference |
|---|---|---|---|---|
| RagasContextRelevance | Evaluates context relevance to question | Yes | No | Docs |
| RagasAnswerAccuracy | Measures answer accuracy against ground truth | No | Yes | Docs |
| RagasFaithfulness | Checks if answer is grounded in context | Yes | No | Docs |
| RagasAspectCritic | Custom aspect-based evaluation | No | No | Docs |
Rhesis Custom Metrics
| Metric | Description | Configuration |
|---|---|---|
| NumericJudge | LLM-based numeric scoring (e.g., 0-10 scale) | Min/max score, threshold, custom prompts |
| CategoricalJudge | LLM-based categorical classification | Categories, passing categories, custom prompts |
If any metrics are missing from this list, or you would like to use a different provider, please let us know by creating an issue on GitHub.
Quick Start
Using DeepEval Metrics
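The sketch below shows the general pattern for a DeepEval-backed metric. The import path and the `evaluate()` signature are assumptions; only the metric class names come from the table above.

```python
# Assumed import path; adjust to your installed SDK version.
from rhesis.sdk.metrics import DeepEvalAnswerRelevancy

metric = DeepEvalAnswerRelevancy()

# Assumed evaluate() signature for single-turn metrics: an input/output pair.
result = metric.evaluate(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
)
print(result)
```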
Using Ragas Metrics
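Ragas-backed metrics follow the same pattern; metrics that require context (see the table above) additionally take the retrieved passages. The `context` parameter name is an assumption.

```python
from rhesis.sdk.metrics import RagasFaithfulness  # assumed import path

metric = RagasFaithfulness()

# Faithfulness needs the retrieved context to check groundedness.
result = metric.evaluate(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
    context=["Paris is the capital and largest city of France."],
)
print(result)
```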
Creating Custom Metrics
You can create custom metrics using the NumericJudge and CategoricalJudge classes.
Numeric Judge
NumericJudge returns a numeric score (e.g., from 0 to 10) and requires four parameters: min_score, max_score, threshold, and threshold_operator.
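A minimal sketch using the four required parameters; the import path, the name and evaluation prompt arguments, and the evaluate() call are assumptions for illustration.

```python
from rhesis.sdk.metrics import NumericJudge  # assumed import path

helpfulness = NumericJudge(
    name="helpfulness",  # assumed argument: label for the metric
    evaluation_prompt="Rate how helpful the answer is, from 0 to 10.",  # assumed argument
    min_score=0,
    max_score=10,
    threshold=7,
    threshold_operator=">=",  # pass when the score is >= 7
)

result = helpfulness.evaluate(
    input="How do I reset my password?",
    output="Open Settings, choose Security, then select 'Reset password'.",
)
```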
Categorical Judge
CategoricalJudge returns a categorical value and requires you to specify categories and passing_categories.
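A parallel sketch for a categorical metric; only `categories` and `passing_categories` are documented above, and the remaining names are assumptions.

```python
from rhesis.sdk.metrics import CategoricalJudge  # assumed import path

tone = CategoricalJudge(
    name="tone",  # assumed argument
    evaluation_prompt="Classify the tone of the response.",  # assumed argument
    categories=["professional", "neutral", "unprofessional"],
    passing_categories=["professional", "neutral"],
)

result = tone.evaluate(
    input="Where is my order?",
    output="Your order shipped yesterday and should arrive within two days.",
)
```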
Understanding Results
All metrics return a MetricResult object:
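The attribute names shown below are assumptions for illustration; consult the SDK reference for the exact fields on MetricResult.

```python
result = metric.evaluate(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
)

# Assumed attributes; check your SDK version for the exact field names.
print(result.score)    # the numeric or categorical score
print(result.details)  # e.g. the judge's reasoning or threshold information
```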
Configuring Models
All metrics use an LLM to perform the evaluation. If no model is specified, the default model is used; to override it, pass the model argument.
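A sketch of overriding the model; the identifier string is a placeholder, and the accepted values (plain strings or model objects) are described in the Models Documentation.

```python
from rhesis.sdk.metrics import DeepEvalAnswerRelevancy  # assumed import path

# Placeholder model identifier; see the Models Documentation for supported values.
metric = DeepEvalAnswerRelevancy(model="your-provider/your-model")
```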
For more information about models, see the Models Documentation.
Advanced Configuration
Serialization
Custom metrics can be serialized and deserialized using the from_config/to_config or from_dict/to_dict methods.
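A round-trip sketch using the dictionary-based pair; the constructor arguments mirror the NumericJudge example above and carry the same assumptions, while from_config/to_config work analogously.

```python
from rhesis.sdk.metrics import NumericJudge  # assumed import path

judge = NumericJudge(
    name="helpfulness",
    evaluation_prompt="Rate how helpful the answer is, from 0 to 10.",
    min_score=0,
    max_score=10,
    threshold=7,
    threshold_operator=">=",
)

# Round-trip the metric through a plain dictionary.
config = judge.to_dict()
restored = NumericJudge.from_dict(config)
```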
Platform Integration
Metrics can be managed both in the platform and in the SDK. The SDK provides push and pull methods to synchronize metrics with the platform.
Pushing Metrics
To push a metric to the platform:
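A sketch, assuming push() is called on the metric instance:

```python
from rhesis.sdk.metrics import NumericJudge  # assumed import path

judge = NumericJudge(
    name="helpfulness",
    evaluation_prompt="Rate how helpful the answer is, from 0 to 10.",
    min_score=0,
    max_score=10,
    threshold=7,
    threshold_operator=">=",
)

# Assumed call: uploads the metric definition to the platform under its name.
judge.push()
```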
Pulling Metrics
To pull metrics from the platform, use the pull method and specify the metric name. If the name is not unique, you must also specify the metric ID.
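A sketch, assuming pull() is exposed as a class method and the ID is passed as a keyword argument (both names are assumptions):

```python
from rhesis.sdk.metrics import NumericJudge  # assumed import path

# Pull by name.
metric = NumericJudge.pull(name="helpfulness")

# If the name is not unique, also pass the metric ID (assumed parameter name).
metric = NumericJudge.pull(name="helpfulness", metric_id="your-metric-id")
```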
See Also
- Conversational Metrics - Multi-turn conversation evaluation
- Models Documentation - Configure LLM models for evaluation
- Installation & Setup - Setup instructions
- GitHub Repository - Source code and examples