
Response Relevancy Evaluator

The Response Relevancy Evaluator assesses how well a generated response addresses the user's original query. It helps identify responses that may be factually correct but fail to answer what the user actually asked.

Image: Response Relevancy Evaluator component interface and configuration

Evaluation Notice: Low relevancy scores indicate that responses are not addressing what users actually asked, which can lead to a poor user experience and frustration even when the responses are factually correct.

Component Inputs

  • Prompt / User Input: The original query or question posed by the user

    Example: "What are the health benefits of regular exercise?"

  • Generated Output: The response generated by the RAG system

    Example: "Regular exercise provides numerous health benefits, including improved cardiovascular health, better weight management, enhanced mental wellbeing, stronger muscles and bones, reduced risk of chronic diseases, and improved sleep quality."

Component Outputs

  • Evaluation Result: Qualitative assessment of the response's relevance to the original query

    Example: "The response directly addresses the health benefits of regular exercise as requested in the query."

Score Interpretation

High Relevance (0.7-1.0)

Response directly addresses the query and provides the information the user was seeking

Example Score: 0.95. This indicates an excellent response that precisely answers what was asked.

Moderate Relevance (0.3-0.7)

Response partially addresses the query but may include tangential information or miss some aspects

Example Score: 0.50. This indicates a response that addresses the query topic but may not fully answer what was asked.

Low Relevance (0.0-0.3)

Response fails to address the query or provides information on a different topic

Example Score: 0.15. This indicates a response that does not answer the question asked.
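
When triaging results programmatically, the bands above can be applied with a simple helper. The function below is a hypothetical sketch, not part of Ragas; the 0.3 and 0.7 cut-offs are the illustrative thresholds from this guide and should be tuned per use case.

def relevancy_band(score: float) -> str:
    """Map a relevancy score to the bands described above."""
    if score >= 0.7:
        return "high"
    if score >= 0.3:
        return "moderate"
    return "low"

print(relevancy_band(0.95))  # high
print(relevancy_band(0.50))  # moderate
print(relevancy_band(0.15))  # low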

Implementation Example

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import ResponseRelevancy

# Create the metric
response_relevancy = ResponseRelevancy()

# Use in evaluation
eval_dataset = Dataset.from_dict({
    "question": ["What are the health benefits of regular exercise?"],
    "contexts": [["Regular exercise improves cardiovascular health, helps with weight management, boosts mental health, strengthens muscles and bones, reduces risk of chronic diseases, and improves sleep quality."]],
    "answer": ["Regular exercise provides numerous health benefits, including improved cardiovascular health, better weight management, enhanced mental wellbeing, stronger muscles and bones, reduced risk of chronic diseases, and improved sleep quality."],
})

result = evaluate(
    eval_dataset,
    metrics=[response_relevancy],
)
print(result)
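
Continuing the example above, per-sample scores can be inspected with the result's to_pandas method. The score column is named after the metric (answer_relevancy in current Ragas releases), so reading it from the metric's name attribute avoids hardcoding a version-specific column name.

# Per-sample scores, continuing from the example above.
df = result.to_pandas()
score_col = response_relevancy.name  # "answer_relevancy" in current releases

# Flag rows in the low-relevance band (below 0.3) for manual review.
low_relevance = df[df[score_col] < 0.3]
print(low_relevance)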

Use Cases

  • Query Understanding: Evaluate how well your system interprets and responds to different query types
  • Response Quality Assurance: Ensure responses actually answer the questions users are asking
  • LLM Comparison: Compare different models' ability to generate relevant responses (see the sketch after this list)
  • Prompt Engineering: Refine prompts to improve response relevancy
  • User Satisfaction Prediction: Use relevancy scores as a predictor for potential user satisfaction
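
For the LLM Comparison use case, one approach is to score each model's answers against the same questions and contexts and compare mean relevancy. The sketch below reuses the implementation example's setup; the per-model answers are illustrative stand-ins for your own generations.

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import ResponseRelevancy

response_relevancy = ResponseRelevancy()

questions = ["What are the health benefits of regular exercise?"]
contexts = [["Regular exercise improves cardiovascular health, helps with weight management, boosts mental health, strengthens muscles and bones, reduces risk of chronic diseases, and improves sleep quality."]]

# Illustrative answers produced by the two systems being compared.
answers_by_model = {
    "model_a": ["Regular exercise improves cardiovascular health, supports weight management, and boosts mental wellbeing."],
    "model_b": ["Exercise equipment ranges from dumbbells to treadmills."],
}

mean_scores = {}
for model_name, answers in answers_by_model.items():
    ds = Dataset.from_dict({
        "question": questions,
        "contexts": contexts,
        "answer": answers,
    })
    df = evaluate(ds, metrics=[response_relevancy]).to_pandas()
    mean_scores[model_name] = df[response_relevancy.name].mean()

print(mean_scores)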

Best Practices

  • Use ResponseRelevancy in conjunction with complementary metrics such as Faithfulness for comprehensive evaluation (in Ragas, AnswerRelevancy is the older name for this same metric)
  • Set appropriate thresholds for different types of queries and use cases
  • Regularly audit responses with low relevancy scores to identify patterns and improve system performance
  • Consider the complexity of the original query when interpreting scores
  • Incorporate user feedback to verify and calibrate relevancy scores
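
For the last point, one way to check calibration is to correlate relevancy scores with user ratings collected for the same responses; the paired values below are illustrative, and scipy's spearmanr computes the rank correlation.

from scipy.stats import spearmanr

# Illustrative paired data: the metric's relevancy score and a 1-5 user
# rating for the same responses.
relevancy_scores = [0.95, 0.82, 0.40, 0.15, 0.67]
user_ratings = [5, 4, 3, 1, 4]

correlation, p_value = spearmanr(relevancy_scores, user_ratings)
print(f"Spearman correlation: {correlation:.2f} (p={p_value:.3f})")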