Rubrics Based Scoring Evaluator
The Rubrics Based Scoring evaluator assesses responses against customized qualitative criteria defined as rubrics. Unlike fixed metrics, rubric-based evaluation allows users to define application-specific dimensions that matter for their use case.

Rubrics Based Scoring component interface and configuration
Evaluation Notice: The effectiveness of rubric-based evaluation depends on carefully crafted criteria that reflect your application's specific quality requirements. Consider involving domain experts in defining your evaluation rubrics.
Component Inputs
- User Input: The original question or query
Example: "What are the benefits and drawbacks of remote work?"
- Generated Output: The response generated by the RAG system
Example: "Remote work has several benefits including flexibility in schedule, no commute time, and improved work-life balance. On the other hand, it can cause feelings of isolation, make it difficult to separate work from personal life, and potentially reduce team cohesion due to limited face-to-face interaction."
- Expected Output: The reference or ground truth answer for comparison
Example: "Remote work offers advantages such as schedule flexibility, elimination of commuting, and better work-life balance. However, it presents challenges including social isolation, blurred boundaries between work and home, and reduced team collaboration."
Component Outputs
- Score: An aggregated numerical value between 0 and 1, representing the overall assessment across all rubrics
Example: 0.88 (indicating strong performance across defined rubrics)
- Evaluation Result: A detailed breakdown showing scores for each individual rubric criterion
Example: { "overall_score": 0.88, "rubric_scores": { "Is the answer comprehensive...": 0.92, "Is the answer concise...": 0.85, ... } }
Score Interpretation
Excellent Performance (0.7-1.0)
Response strongly meets most or all criteria defined in the rubrics
Example Score: 0.92
This indicates excellent performance against your custom evaluation criteria
Satisfactory Performance (0.3-0.7)
Response meets some but not all criteria, with room for improvement in certain dimensions
Example Score: 0.55
This indicates adequate performance with some weaknesses in specific rubric criteria
Poor Performance (0.0-0.3)
Response fails to meet most of the criteria defined in the rubrics
Example Score: 0.15
This indicates poor performance against your custom evaluation criteria
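The thresholds above translate directly into a small helper for reporting. This is only a sketch: the band names and cut-offs simply mirror the interpretation guide and are not part of the evaluator's API.
def interpret_score(score: float) -> str:
    """Map an aggregated rubric score to the performance bands described above."""
    if score >= 0.7:
        return "Excellent Performance"
    if score >= 0.3:
        return "Satisfactory Performance"
    return "Poor Performance"

print(interpret_score(0.92))  # Excellent Performance
print(interpret_score(0.55))  # Satisfactory Performance
print(interpret_score(0.15))  # Poor Performance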
Implementation Example
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import RubricScore

# Define custom rubrics
rubrics = [
    "Is the answer comprehensive, covering all aspects of the question?",
    "Is the answer concise, without unnecessary information?",
    "Is the answer well-structured and easy to understand?",
    "Does the answer address the core intent of the query?",
]

# Create the metric
rubric_score = RubricScore(rubrics=rubrics)

# Use in evaluation: build a small dataset with one sample
eval_dataset = Dataset.from_dict({
    "user_input": ["What are the benefits and drawbacks of remote work?"],
    "generated_output": [
        "Remote work has several benefits including flexibility in schedule, "
        "no commute time, and improved work-life balance. On the other hand, "
        "it can cause feelings of isolation, make it difficult to separate "
        "work from personal life, and potentially reduce team cohesion due to "
        "limited face-to-face interaction."
    ],
    "expected_output": [
        "Remote work offers advantages such as schedule flexibility, "
        "elimination of commuting, and better work-life balance. However, it "
        "presents challenges including social isolation, blurred boundaries "
        "between work and home, and reduced team collaboration."
    ],
})

result = evaluate(
    eval_dataset,
    metrics=[rubric_score],
)
print(result)
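For per-sample inspection, the result object returned by ragas' evaluate can typically be converted to a pandas DataFrame; the call below assumes that helper is available in your installed ragas version.
# Assumes the result object exposes to_pandas(); check your ragas version.
df = result.to_pandas()
print(df.head())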
Use Cases
- Custom Evaluation Criteria: Create domain-specific evaluation frameworks tailored to particular use cases
- Multi-dimensional Assessment: Evaluate responses across multiple quality dimensions simultaneously
- Educational Feedback: Provide structured feedback on responses for training or educational purposes
- Industry-Specific Evaluation: Assess responses against industry standards or regulatory requirements
- Brand Voice Alignment: Evaluate how well responses adhere to brand communication guidelines
Best Practices
- Keep rubric questions clear, specific, and objectively answerable
- Balance the number of rubrics to cover important dimensions without overwhelming the evaluation
- Periodically review and refine your rubrics based on changing requirements
- Consider weighting certain rubrics higher than others if they're more important for your use case (see the sketch after this list)
- Use the per-rubric scores to identify specific areas for improvement in your RAG system
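As a sketch of the weighting idea mentioned above, the snippet below combines per-rubric scores with user-chosen weights. The weights, scores, and the weighted_rubric_score helper are illustrative assumptions for your own post-processing, not part of the evaluator's API.
# Illustrative only: weighted aggregation of per-rubric scores.
def weighted_rubric_score(rubric_scores, weights):
    """Average per-rubric scores, giving unlisted rubrics a default weight of 1.0."""
    total_weight = sum(weights.get(name, 1.0) for name in rubric_scores)
    weighted_sum = sum(score * weights.get(name, 1.0)
                       for name, score in rubric_scores.items())
    return weighted_sum / total_weight

rubric_scores = {
    "Is the answer comprehensive, covering all aspects of the question?": 0.92,
    "Is the answer concise, without unnecessary information?": 0.85,
}
weights = {
    # Emphasize comprehensiveness twice as much as conciseness (hypothetical choice).
    "Is the answer comprehensive, covering all aspects of the question?": 2.0,
}
print(round(weighted_rubric_score(rubric_scores, weights), 2))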