Score Configuration
Settings that define how metrics score responses, including numeric scales or categorical classifications.
Also known as: scoring config
Overview
Score configuration determines how the judge model assigns scores to AI responses. Choose between numeric scales or categorical classifications based on your evaluation needs.
Scoring Types
Numeric Scoring: Scale-based evaluation (e.g., 0-10, 0-100) with a pass threshold.
Categorical Scoring: Classification of each response into one of a set of predefined categories.
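The two scoring types can be sketched as small configuration objects. This is a minimal illustration, not the API of any particular tool: the class names, field names, and the `passed` method are hypothetical, as is the assumption that a numeric score passes when it is greater than or equal to the threshold and that a categorical result passes when it falls in an explicit set of passing categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericScoreConfig:
    """Hypothetical numeric scale with a pass threshold."""
    min_score: float
    max_score: float
    threshold: float

    def passed(self, score: float) -> bool:
        # Reject scores that fall outside the configured scale.
        if not self.min_score <= score <= self.max_score:
            raise ValueError(f"score {score} is outside the scale")
        # Assumption: meeting the threshold exactly counts as passing.
        return score >= self.threshold


@dataclass(frozen=True)
class CategoricalScoreConfig:
    """Hypothetical set of categories, some of which count as passing."""
    categories: tuple
    passing_categories: frozenset

    def passed(self, category: str) -> bool:
        # Reject labels that are not among the predefined categories.
        if category not in self.categories:
            raise ValueError(f"unknown category: {category!r}")
        return category in self.passing_categories


numeric = NumericScoreConfig(min_score=0, max_score=10, threshold=7)
categorical = CategoricalScoreConfig(
    categories=("poor", "fair", "good", "excellent"),
    passing_categories=frozenset({"good", "excellent"}),
)
```

With these example values, `numeric.passed(8)` and `categorical.passed("good")` are both true, while `numeric.passed(6.5)` and `categorical.passed("fair")` are both false.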
Choosing the Right Type
Use Binary When:
- Binary decision (safe/unsafe, correct/incorrect)
- Clear yes/no criteria
- Simple, fast evaluation needed
Use Numeric When:
- Need granularity in scoring
- Want to track incremental improvements
- Comparing performance across versions
Use Categorical When:
- Natural classifications exist
- Multiple quality levels
- Easier to interpret than numbers
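To illustrate why numeric scoring helps when tracking incremental improvements, the sketch below compares two sets of scores for the same test cases. The scores, the threshold of 7, and the `pass_rate` helper are made up for the example. Both versions have the same pass rate, so a binary view shows no change, while the mean numeric score reveals the improvement.

```python
from statistics import mean

THRESHOLD = 7  # hypothetical pass threshold on a 0-10 scale

# Made-up judge scores for the same five test cases on two model versions.
scores_v1 = [7, 7, 8, 5, 7]
scores_v2 = [9, 8, 9, 6, 10]


def pass_rate(scores, threshold=THRESHOLD):
    """Fraction of scores at or above the threshold (binary view)."""
    return sum(s >= threshold for s in scores) / len(scores)


# Binary view: both versions pass 4 of 5 cases.
print(pass_rate(scores_v1), pass_rate(scores_v2))

# Numeric view: the mean score shows that the second version improved.
print(mean(scores_v1), mean(scores_v2))
```

The reverse also holds: when only a yes/no decision matters, the pass rate alone is simpler to report and interpret.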
Best Practices
- Match criteria: Align the scoring type with what you're evaluating
- Clear thresholds: Define what constitutes "passing"
- Consistent scales: Use the same scale across similar metrics
- Document meanings: Explain what each score or category means
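Several of these practices can be checked mechanically. The following sketch assumes a hypothetical in-house layout in which each metric records its scale, its pass threshold, and a documented meaning for each score band (keyed by the band's lower bound). The metric names, the `find_problems` helper, and the `meaning` helper are all illustrative.

```python
# Hypothetical metric definitions; band keys are the lower bound of each band.
METRICS = {
    "helpfulness": {
        "scale": (0, 10),
        "threshold": 7,
        "bands": {0: "unhelpful", 4: "partially helpful",
                  7: "helpful (passing)", 9: "exceptionally helpful"},
    },
    "clarity": {
        "scale": (0, 10),
        "threshold": 7,
        "bands": {0: "confusing", 4: "understandable with effort",
                  7: "clear (passing)", 9: "exceptionally clear"},
    },
}


def find_problems(metrics):
    """Return a list of human-readable issues with the metric definitions."""
    problems = []
    # Consistent scales: similar metrics should share one scale.
    scales = {m["scale"] for m in metrics.values()}
    if len(scales) > 1:
        problems.append(f"inconsistent scales: {sorted(scales)}")
    for name, m in metrics.items():
        low, high = m["scale"]
        # Clear thresholds: the pass threshold must lie on the scale.
        if not low <= m["threshold"] <= high:
            problems.append(f"{name}: threshold is outside the scale")
        # Document meanings: the lowest scores need a description too.
        if low not in m["bands"]:
            problems.append(f"{name}: lowest scores are undocumented")
    return problems


def meaning(metric, score):
    """Look up the documented meaning of a score via its band."""
    lower_bound = max(b for b in metric["bands"] if b <= score)
    return metric["bands"][lower_bound]
```

For the definitions above, `find_problems(METRICS)` returns an empty list, and `meaning(METRICS["helpfulness"], 8)` returns `"helpful (passing)"`.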