Score Configuration

Settings that define how metrics score responses, using either numeric scales or categorical classifications.

Also known as: scoring config

Overview

Score configuration determines how the judge model assigns scores to AI responses. Choose between numeric scales or categorical classifications based on your evaluation needs.

Scoring Types

Numeric Scoring: Scale-based evaluation (e.g., 0-10, 0-100) with a pass threshold:

python
from rhesis.sdk.metrics import NumericJudge

metric = NumericJudge(
    name="helpfulness",
    evaluation_prompt="Rate response helpfulness",
    min_score=0.0,   # lower end of the scale
    max_score=10.0,  # upper end of the scale
    threshold=7.0,   # pass threshold
)

Categorical Scoring: Classify into predefined categories:

python
from rhesis.sdk.metrics import CategoricalJudge

metric = CategoricalJudge(
    name="quality_classifier",
    evaluation_prompt="Classify response quality",
    categories=["excellent", "good", "fair", "poor"],  # all allowed labels
    passing_categories=["excellent", "good"],          # labels that count as a pass
)
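
To make the pass/fail behaviour of the two scoring types concrete, here is a minimal sketch in plain Python. It does not call the Rhesis SDK: the helper names `numeric_passes` and `categorical_passes` are hypothetical, and the inclusive comparison (`score >= threshold`) is an assumption, since this page does not say how a score exactly at the threshold is treated.

```python
from typing import Sequence


def numeric_passes(score: float, min_score: float, max_score: float,
                   threshold: float) -> bool:
    """Return True if a numeric score meets the pass threshold.

    Assumption: the threshold is inclusive (score >= threshold).
    """
    if not min_score <= score <= max_score:
        raise ValueError(f"score {score} is outside [{min_score}, {max_score}]")
    return score >= threshold


def categorical_passes(category: str, categories: Sequence[str],
                       passing_categories: Sequence[str]) -> bool:
    """Return True if the assigned category is one of the passing categories."""
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")
    return category in passing_categories


# Same values as the two examples above.
numeric_result = numeric_passes(8.5, min_score=0.0, max_score=10.0, threshold=7.0)
categorical_result = categorical_passes(
    "fair",
    categories=["excellent", "good", "fair", "poor"],
    passing_categories=["excellent", "good"],
)
```

With these inputs the numeric check passes (8.5 is above 7.0) and the categorical check fails ("fair" is not a passing category).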

Choosing the Right Type

Use Binary (Two-Category) Scoring When:

Binary decision (safe/unsafe, correct/incorrect)
Clear yes/no criteria
Simple, fast evaluation needed
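
A binary check can be expressed as a categorical metric with exactly two categories. The sketch below only builds the keyword arguments, reusing the parameter names from the `CategoricalJudge` example above. The metric name, prompt, and labels are illustrative, and the variable `binary_config` is hypothetical.

```python
# Illustrative settings for a two-category (binary) safety check.
binary_config = {
    "name": "safety_check",
    "evaluation_prompt": "Classify whether the response is safe",
    "categories": ["safe", "unsafe"],
    "passing_categories": ["safe"],
}

# Sanity checks: exactly two labels, and every passing label is a known category.
is_binary = len(binary_config["categories"]) == 2
is_consistent = set(binary_config["passing_categories"]) <= set(binary_config["categories"])

# With the SDK installed, these settings could be passed on as:
#   from rhesis.sdk.metrics import CategoricalJudge
#   metric = CategoricalJudge(**binary_config)
```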

Use Numeric When:

Need granularity in scoring
Want to track incremental improvements
Comparing performance across versions

Use Categorical When:

Natural classifications exist
Multiple quality levels
Easier to interpret than numbers

Best Practices

Match criteria: Align scoring type with what you're evaluating
Clear thresholds: Define what constitutes "passing"
Consistent scales: Use the same scale across similar metrics
Document meanings: Explain what each score/category means
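
When similar metrics already use different scales (for example 0-10 and 0-100), one way to compare them is to rescale scores and thresholds to a common 0-1 range. The helper below is a sketch in plain Python. `normalize` is a hypothetical name and not part of the Rhesis SDK.

```python
def normalize(score: float, min_score: float, max_score: float) -> float:
    """Linearly rescale a score from [min_score, max_score] to [0, 1]."""
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    if not min_score <= score <= max_score:
        raise ValueError(f"score {score} is outside [{min_score}, {max_score}]")
    return (score - min_score) / (max_score - min_score)


# A threshold of 7 on a 0-10 scale and 70 on a 0-100 scale
# land at the same point on the common scale.
threshold_ten = normalize(7.0, 0.0, 10.0)
threshold_hundred = normalize(70.0, 0.0, 100.0)
```

Comparing normalized thresholds this way makes it easier to spot similar metrics whose pass criteria have drifted apart.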

Documentation

/platform/metrics
