# Conversational Metrics

## Overview

Conversational metrics (also called multi-turn metrics) evaluate the quality of interactions across multiple conversation turns. These metrics assess aspects such as coherence, goal achievement, role adherence, and tool usage in extended dialogues.
**API Key Required:** All examples in this documentation require a valid Rhesis API key. Set your API key using:
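The snippet below is a minimal sketch of one way to do this from Python; the environment variable name `RHESIS_API_KEY` is an assumption, so confirm it against the Installation & Setup guide.

```python
import os

# Environment variable name assumed; verify it in the Installation & Setup guide.
os.environ["RHESIS_API_KEY"] = "your-api-key"
```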
For more information, see the Installation & Setup guide.
Rhesis integrates with the following open-source evaluation frameworks:
- DeepEval (Apache License 2.0): The LLM Evaluation Framework by Confident AI
These tools are used through their public APIs. The original licenses and copyright notices can be found in their respective repositories. Rhesis is not affiliated with these projects.
## Supported Metrics

### DeepEval Conversational Metrics

| Metric | Description | Reference |
|---|---|---|
| `DeepEvalTurnRelevancy` | Evaluates relevance of assistant responses across conversation turns | Docs |
| `DeepEvalRoleAdherence` | Evaluates whether assistant maintains its assigned role throughout the conversation | Docs |
| `DeepEvalKnowledgeRetention` | Evaluates assistant’s ability to retain and recall facts from earlier in the conversation | Docs |
| `DeepEvalConversationCompleteness` | Evaluates whether conversation reaches a satisfactory conclusion | Docs |
| `DeepEvalGoalAccuracy` | Evaluates assistant’s ability to plan and execute tasks to achieve specific goals | Docs |
| `DeepEvalToolUse` | Evaluates assistant’s capability in selecting and using tools appropriately | Docs |
### Rhesis Conversational Metrics

| Metric | Description | Configuration |
|---|---|---|
| `ConversationalJudge` | Custom LLM-based evaluation for conversation quality | Custom prompts, evaluation criteria, scoring rubric |
| `GoalAchievementJudge` | Evaluates whether specific goals were achieved in the conversation | Goal criteria, achievement indicators, threshold |
If any metrics are missing from the list, or you would like to use a different provider, please let us know by creating an issue on GitHub.
## Conversation History

All conversational metrics require a `ConversationHistory` object that represents the multi-turn dialogue. Create one using the `from_messages` method:
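A minimal sketch, assuming `ConversationHistory` is importable from the SDK's metrics module and that `from_messages` accepts a list of role/content dictionaries; adjust the import to match your installed version. The examples below reuse this `conversation` object.

```python
from rhesis.sdk.metrics import ConversationHistory  # import path assumed

conversation = ConversationHistory.from_messages([
    {"role": "user", "content": "Hi, I need to reset my password."},
    {"role": "assistant", "content": "Of course. Which email address is on the account?"},
    {"role": "user", "content": "It's jane@example.com."},
    {"role": "assistant", "content": "Thanks, Jane. I've sent a reset link to jane@example.com."},
])
```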
## Quick Start

### Turn Relevancy
Evaluates whether assistant responses are relevant to the conversational context throughout the conversation.
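A sketch under assumed API conventions: the import path, the `evaluate` method name, and the result attributes are unverified, and `conversation` comes from the Conversation History example above.

```python
from rhesis.sdk.metrics import DeepEvalTurnRelevancy  # import path assumed

metric = DeepEvalTurnRelevancy()
result = metric.evaluate(conversation)  # `conversation` from the Conversation History example

print(result.score)    # how relevant the assistant's turns were to the dialogue so far
print(result.details)
```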
### Role Adherence
Evaluates whether the assistant maintains its assigned role throughout the conversation.
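A sketch: DeepEval's underlying metric is configured with a description of the chatbot's intended role, so a similar parameter is assumed here, though its exact name may differ.

```python
from rhesis.sdk.metrics import DeepEvalRoleAdherence  # import path assumed

# Parameter name assumed; the metric needs a description of the intended role.
metric = DeepEvalRoleAdherence(role="a polite customer-support agent for a bank")
result = metric.evaluate(conversation)
```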
### Knowledge Retention
Evaluates the assistant’s ability to retain and recall factual information introduced earlier in the conversation.
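A sketch reusing the earlier `conversation`, where the user's email address is stated in one turn and recalled in a later one; constructor and method names are assumptions.

```python
from rhesis.sdk.metrics import DeepEvalKnowledgeRetention  # import path assumed

# Checks that facts introduced earlier (e.g. the user's email) are recalled correctly.
metric = DeepEvalKnowledgeRetention()
result = metric.evaluate(conversation)
```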
### Conversation Completeness
Evaluates whether the conversation reaches a satisfactory conclusion where the user’s needs are met.
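A sketch under the same assumed API shape as the previous examples:

```python
from rhesis.sdk.metrics import DeepEvalConversationCompleteness  # import path assumed

# Judges whether the user's request was fully resolved by the end of the dialogue.
metric = DeepEvalConversationCompleteness()
result = metric.evaluate(conversation)
```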
### Goal Accuracy
Evaluates the assistant’s ability to plan and execute tasks to achieve specific goals.
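A sketch; whether the goal is inferred from the user's turns or passed in explicitly is not confirmed here, so no goal parameter is shown.

```python
from rhesis.sdk.metrics import DeepEvalGoalAccuracy  # import path assumed

# Assumed behavior: the goal is inferred from the user's turns in the history.
metric = DeepEvalGoalAccuracy()
result = metric.evaluate(conversation)
```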
### Tool Use
Evaluates the assistant’s capability in selecting and utilizing tools appropriately during conversations.
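A sketch; how tool calls are represented inside a `ConversationHistory` is an assumption, shown here as a `"tool"` role entry between ordinary turns.

```python
from rhesis.sdk.metrics import ConversationHistory, DeepEvalToolUse  # import path assumed

# Tool-call representation assumed: a "tool" role entry carrying the call and its result.
conversation_with_tools = ConversationHistory.from_messages([
    {"role": "user", "content": "What's the weather in Berlin right now?"},
    {"role": "tool", "content": '{"name": "get_weather", "args": {"city": "Berlin"}, "result": "18°C, cloudy"}'},
    {"role": "assistant", "content": "It's currently 18°C and cloudy in Berlin."},
])

metric = DeepEvalToolUse()
result = metric.evaluate(conversation_with_tools)
```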
## Creating Custom Conversational Metrics

### Conversational Judge

Create custom conversational evaluations using `ConversationalJudge`:
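A sketch whose constructor arguments mirror the configuration column above (prompts, criteria, rubric); the exact parameter names are assumptions, so check the SDK reference.

```python
from rhesis.sdk.metrics import ConversationalJudge  # import path assumed

# Parameter names are illustrative, mirroring the configuration column above.
metric = ConversationalJudge(
    criteria=(
        "The assistant is consistently polite, stays on topic, and resolves "
        "the user's request without asking for unnecessary information."
    ),
)
result = metric.evaluate(conversation)
print(result.score, result.details)
```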
### Goal Achievement Judge

Evaluate goal achievement with custom criteria using `GoalAchievementJudge`:
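A sketch along the same lines; `goal` and `threshold` mirror the configuration column above but are assumed parameter names.

```python
from rhesis.sdk.metrics import GoalAchievementJudge  # import path assumed

metric = GoalAchievementJudge(
    goal="Send the user a password reset link for the correct account",  # assumed parameter
    threshold=0.7,                                                       # assumed parameter
)
result = metric.evaluate(conversation)
```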
## Understanding Results

All conversational metrics return a `MetricResult` object:
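The attribute names below are illustrative rather than confirmed; inspect a real result (for example with `print(result)`) to see what your SDK version exposes.

```python
result = metric.evaluate(conversation)

# Attribute names assumed for illustration.
print(result.score)    # numeric score or verdict
print(result.details)  # the judge's explanation or reasoning
```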
## Configuring Models

All conversational metrics require an LLM model to perform the evaluation. If no model is specified, the default model will be used.
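A sketch of passing a model explicitly; the `model` parameter and the identifier format are assumptions, so see the Models Documentation for what your setup accepts.

```python
from rhesis.sdk.metrics import DeepEvalTurnRelevancy  # import path assumed

# `model` parameter and identifier format assumed; see the Models Documentation.
metric = DeepEvalTurnRelevancy(model="openai/gpt-4o-mini")
result = metric.evaluate(conversation)
```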
For more information about models, see the Models Documentation.
## See Also
- Single-Turn Metrics - Individual response evaluation
- Models Documentation - Configure LLM models for evaluation
- Installation & Setup - Setup instructions
- GitHub Repository - Source code and examples