Score Configuration

Settings that define how metrics score responses, using either numeric scales or categorical classifications.

Also known as: scoring config

Overview

Score configuration determines how the judge model assigns scores to AI responses and how those scores translate into a pass or fail result. Choose between numeric scales and categorical classifications based on your evaluation needs.

Scoring Types

Numeric Scoring: Scale-based evaluation (e.g., 0-10, 0-100) with a pass threshold:

python
from rhesis.sdk.metrics import NumericJudge

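metric = NumericJudge(
    name="helpfulness",
    evaluation_prompt="Rate response helpfulness",
    min_score=0.0,   # lower bound of the scale
    max_score=10.0,  # upper bound of the scale
    threshold=7.0,   # pass threshold, expressed on the same scale
)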
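
To make the relationship between the scale and the threshold concrete, here is a minimal sketch in plain Python. `NumericScoreConfig` is a hypothetical stand-in written for this example, not an SDK class, and it assumes that a score equal to the threshold counts as passing; the SDK's actual comparison may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericScoreConfig:
    """Hypothetical stand-in for a numeric score configuration (not an SDK class)."""

    min_score: float
    max_score: float
    threshold: float

    def __post_init__(self):
        # Reject configurations whose threshold could never be met or missed.
        if not self.min_score < self.max_score:
            raise ValueError("min_score must be below max_score")
        if not self.min_score <= self.threshold <= self.max_score:
            raise ValueError("threshold must lie within the scale")

    def passes(self, score: float) -> bool:
        # Assumption: a score equal to the threshold counts as passing.
        if not self.min_score <= score <= self.max_score:
            raise ValueError(f"score {score} is outside the scale")
        return score >= self.threshold


config = NumericScoreConfig(min_score=0.0, max_score=10.0, threshold=7.0)
print(config.passes(8.5))  # True
print(config.passes(6.9))  # False
```

Validating the threshold against the scale up front catches mistakes such as a threshold of 70 on a 0-10 scale before any response is scored.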

Categorical Scoring: Classify into predefined categories:

python
from rhesis.sdk.metrics import CategoricalJudge

metric = CategoricalJudge(
    name="quality_classifier",
    evaluation_prompt="Classify response quality",
    categories=["excellent", "good", "fair", "poor"],  # all labels the judge may assign
    passing_categories=["excellent", "good"],          # subset of labels that count as a pass
)
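
The pass/fail logic implied by `categories` and `passing_categories` can be sketched in plain Python. `category_passes` is a hypothetical helper written for this example, not part of the SDK, and it assumes that the passing categories must be a subset of the full category list.

```python
def category_passes(label, categories, passing_categories):
    """Return True when `label` is one of the passing categories.

    Hypothetical helper for illustration; not part of the SDK.
    """
    allowed = set(categories)
    # Assumption: every passing category must also appear in `categories`.
    unknown = set(passing_categories) - allowed
    if unknown:
        raise ValueError(f"passing categories not in categories: {sorted(unknown)}")
    if label not in allowed:
        raise ValueError(f"unexpected label: {label!r}")
    return label in passing_categories


categories = ["excellent", "good", "fair", "poor"]
passing = ["excellent", "good"]
print(category_passes("good", categories, passing))  # True
print(category_passes("fair", categories, passing))  # False
```

A binary decision is the two-category case of the same check: with categories `["safe", "unsafe"]` and passing categories `["safe"]`, the label `"unsafe"` fails.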

Choosing the Right Type

Use Binary (Two-Category Categorical) When:

  • The decision is binary (safe/unsafe, correct/incorrect)
  • The criteria are clear yes/no checks
  • A simple, fast evaluation is needed

Use Numeric When:

  • You need granularity in scoring
  • You want to track incremental improvements
  • You are comparing performance across versions
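
As a rough illustration of why numeric scores help when comparing versions, the sketch below summarizes two sets of scores by mean and pass rate. The `summarize` helper and the score lists are made up for this example, and the pass check assumes a score equal to the threshold passes.

```python
from statistics import mean


def summarize(scores, threshold):
    """Return the mean score and the fraction of scores at or above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    passed = sum(1 for s in scores if s >= threshold)
    return {"mean": mean(scores), "pass_rate": passed / len(scores)}


# Made-up scores on a 0-10 scale for two versions of the same system.
v1_scores = [6.0, 7.0, 8.0, 5.0]
v2_scores = [7.0, 7.5, 8.0, 6.5]

print(summarize(v1_scores, threshold=7.0))  # {'mean': 6.5, 'pass_rate': 0.5}
print(summarize(v2_scores, threshold=7.0))  # {'mean': 7.25, 'pass_rate': 0.75}
```

The mean moves whenever any individual score changes, while the pass rate only moves when a score crosses the threshold, so the numeric scale exposes smaller improvements.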

Use Categorical When:

  • Natural classifications exist
  • Responses fall into multiple quality levels
  • Categories are easier to interpret than numbers

Best Practices

  • Match criteria: Align scoring type with what you're evaluating
  • Clear thresholds: Define what constitutes "passing"
  • Consistent scales: Use the same scales across similar metrics
  • Document meanings: Explain what each score/category means
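
When existing metrics already use different scales, one way to compare them is to map each score onto a common 0-1 range. The `normalize` function below is a hypothetical helper for illustration, not an SDK feature, and it assumes a linear scale.

```python
def normalize(score, min_score, max_score):
    """Linearly map a score from [min_score, max_score] onto [0, 1]."""
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    if not min_score <= score <= max_score:
        raise ValueError(f"score {score} is outside the scale")
    return (score - min_score) / (max_score - min_score)


# The same relative position on a 0-10 scale and on a 0-100 scale.
print(normalize(7.0, 0.0, 10.0))    # 0.7
print(normalize(70.0, 0.0, 100.0))  # 0.7
```

Normalizing the threshold the same way shows whether two metrics are equally strict: a threshold of 7.0 on 0-10 and 70.0 on 0-100 both correspond to 0.7.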
