Numeric Scoring
A metric scoring type that uses a numeric scale (e.g., 0-10) with a defined pass/fail threshold.
Overview
Numeric scoring provides granular evaluation on a scale, allowing you to track subtle improvements and set specific passing thresholds.
Common Scales
0-10 Scale:
Good for: General quality assessment.
1-5 Scale:
Good for: Quick evaluations, star ratings.
0-100 Scale:
Good for: Percentage-style scoring, fine-grained evaluation.
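If scores from these scales need to be compared side by side, one option is to rescale them onto a common 0-1 range. The sketch below uses plain min-max scaling; the `normalize` function and the `SCALES` table are hypothetical names for illustration, not part of any particular tool.

```python
def normalize(score: float, min_score: float, max_score: float) -> float:
    """Map a score from [min_score, max_score] onto [0, 1] (min-max scaling)."""
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    if not min_score <= score <= max_score:
        raise ValueError("score is outside the scale")
    return (score - min_score) / (max_score - min_score)


# The three scales listed above, as (min, max) pairs (hypothetical table).
SCALES = {"0-10": (0, 10), "1-5": (1, 5), "0-100": (0, 100)}

print(normalize(7, *SCALES["0-10"]))    # 0.7
print(normalize(4, *SCALES["1-5"]))     # 0.75
print(normalize(85, *SCALES["0-100"]))  # 0.85
```

Note that on the 1-5 scale the lowest score (1) maps to 0, so a normalized value is not the same as the raw score divided by the maximum.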
Setting Thresholds
- Strictness: A higher threshold is stricter
- Use case: Critical features need higher thresholds
- Baseline: Set the threshold based on current performance
Examples:
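As a minimal sketch of how a threshold turns a numeric score into a pass/fail result, consider the following. The `NumericMetric` class, the metric names, and the threshold values are all hypothetical, and the sketch assumes a score passes when it is greater than or equal to the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericMetric:
    """Hypothetical container for a numeric scale and its passing threshold."""

    name: str
    min_score: float
    max_score: float
    threshold: float

    def __post_init__(self) -> None:
        if not self.min_score <= self.threshold <= self.max_score:
            raise ValueError("threshold must lie within the scale")

    def passes(self, score: float) -> bool:
        """Assumption: a score equal to the threshold counts as a pass."""
        if not self.min_score <= score <= self.max_score:
            raise ValueError("score is outside the scale")
        return score >= self.threshold


# A general quality metric and a stricter one for a critical feature.
general = NumericMetric("answer_quality", 0, 10, threshold=7)
critical = NumericMetric("safety", 0, 10, threshold=9)

print(general.passes(8))   # True
print(critical.passes(8))  # False
```

The same score of 8 passes the general metric but fails the critical one, which is the effect of raising the threshold.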
Benefits
Numeric scoring provides granularity that lets you see small improvements over time. It offers flexibility to adjust thresholds as your system's quality improves. Scores are easily comparable across different tests, and you can track average scores over time to identify trends in performance.
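Tracking averages over time can be sketched as follows. The run labels and scores are made-up illustration data on a 0-10 scale, and `average_by_run` and `changes_between_runs` are hypothetical helper names.

```python
from statistics import mean


def average_by_run(runs: dict) -> dict:
    """Compute the average score for each test run, keeping run order."""
    return {label: mean(scores) for label, scores in runs.items()}


def changes_between_runs(averages: dict) -> list:
    """Return (run label, change in average versus the previous run)."""
    labels = list(averages)
    return [
        (labels[i], averages[labels[i]] - averages[labels[i - 1]])
        for i in range(1, len(labels))
    ]


# Made-up scores for three consecutive runs.
runs = {
    "run_1": [6, 7, 5, 6],
    "run_2": [7, 7, 6, 8],
    "run_3": [8, 7, 8, 9],
}

averages = average_by_run(runs)
for label, delta in changes_between_runs(averages):
    print(f"{label}: average {averages[label]} ({delta:+} vs previous run)")
```

A steadily positive change suggests the threshold could be raised; a negative change flags a regression even when every individual score still passes.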
Best Practices
- Anchor scores: Define what each score level means
- Avoid extremes: Reserve the lowest and highest scores (e.g., 0 and 10) for cases that truly warrant them
- Review distributions: Check if scores cluster or spread
- Adjust thresholds: Raise the bar as quality improves
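To review a score distribution as suggested above, a small summary can help. This is a sketch assuming integer scores on a 0-10 scale; `summarize_scores` and the sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def summarize_scores(scores: list, min_score: int = 0, max_score: int = 10) -> dict:
    """Summarize how integer scores cluster or spread across the scale."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(scores)
    extremes = sum(1 for s in scores if s in (min_score, max_score))
    return {
        "mean": mean(scores),
        "spread": pstdev(scores),  # population standard deviation
        "histogram": {
            level: counts.get(level, 0)
            for level in range(min_score, max_score + 1)
        },
        "extreme_share": extremes / len(scores),  # fraction at min or max
    }


# Made-up scores that cluster tightly around 7.
summary = summarize_scores([7, 7, 8, 7, 6, 7, 10, 7])
print(summary["mean"])           # 7.375
print(summary["histogram"][7])   # 5
print(summary["extreme_share"])  # 0.125
```

A narrow spread with most scores on one level may indicate that the score anchors are too coarse to separate good from mediocre outputs, while a high share of extreme scores suggests the anchors for 0 and 10 need tightening.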