
Response Context Recall Evaluator

The Response Context Recall evaluator assesses how comprehensively the generated response utilizes the important information from the retrieved contexts. It helps identify situations where critical information in the contexts was overlooked during response generation.

Response Context Recall Component

[Image: Response Context Recall component interface and configuration]

Evaluation Notice: The quality of evaluation depends on correctly determining the important information in contexts. Configure your evaluation parameters carefully to ensure accurate assessment of information utilization.

Component Inputs

  • User Input: The original query or question posed by the user

    Example: "Explain the key factors in climate change."

  • Generated Output: The response generated by the RAG system

    Example: "Climate change is mainly caused by human activities that release greenhouse gases..."

  • Expected Output: The reference or ground truth response (if available)

    Example: "Climate change is driven by greenhouse gas emissions from several sources including fossil fuels, deforestation, and industrial processes..."

  • Retrieved Contexts: The collection of retrieved passages or documents used to generate the response

    Example: ["Climate change is primarily driven by greenhouse gas emissions...", "The effects of climate change include rising global temperatures..."]
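To make the input contract concrete, here is a minimal sketch of the four inputs assembled as a plain Python dict, using the examples above. The field names (`user_input`, `generated_output`, `expected_output`, `retrieved_contexts`) are illustrative labels for this sketch, not a documented schema.

```python
# Illustrative only: the four evaluator inputs as a plain dict.
# Field names are hypothetical, not a documented schema.
sample = {
    "user_input": "Explain the key factors in climate change.",
    "generated_output": (
        "Climate change is mainly caused by human activities that "
        "release greenhouse gases..."
    ),
    # The reference answer is optional ("if available") per the
    # description above.
    "expected_output": (
        "Climate change is driven by greenhouse gas emissions from "
        "several sources including fossil fuels, deforestation, and "
        "industrial processes..."
    ),
    # Retrieved contexts are a list of passages, not a single string.
    "retrieved_contexts": [
        "Climate change is primarily driven by greenhouse gas emissions...",
        "The effects of climate change include rising global temperatures...",
    ],
}

print(sorted(sample.keys()))
```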

Component Outputs

  • Evaluation Result: Qualitative assessment of how well the response utilized the available context information

    Example: "The response effectively captures and utilizes most of the key information from the relevant contexts..."

Score Interpretation

High Utilization (0.7-1.0)

Response effectively incorporates most or all of the important information from the contexts

Example Score: 0.92. This indicates excellent utilization of context information.

Moderate Utilization (0.3-0.7)

Response incorporates some but not all of the important information from the contexts

Example Score: 0.52. This indicates partial utilization of context information.

Low Utilization (0.0-0.3)

Response omits significant important information that was present in the contexts

Example Score: 0.15. This indicates poor utilization of context information.
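The bands above can be expressed as a small helper for labeling scores. This is a sketch, not part of the ragas API; since the documented ranges overlap at their edges, the boundary assignment here (0.7 maps to High, 0.3 maps to Moderate) is an assumption.

```python
def utilization_band(score: float) -> str:
    """Map a score in [0, 1] to the utilization bands described above.

    Boundary handling (0.7 -> High, 0.3 -> Moderate) is an assumption;
    the documented ranges overlap at their edges.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= 0.7:
        return "High"
    if score >= 0.3:
        return "Moderate"
    return "Low"

print(utilization_band(0.92))  # -> High
print(utilization_band(0.52))  # -> Moderate
print(utilization_band(0.15))  # -> Low
```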

Implementation Example

```python
from ragas.metrics import ResponseContextRecall
from ragas import evaluate
from datasets import Dataset

# Create the metric
response_recall = ResponseContextRecall()

# Build the evaluation dataset
eval_dataset = Dataset.from_dict({
    "question": ["Explain the key factors in climate change."],
    "contexts": [[
        "Climate change is primarily driven by greenhouse gas emissions "
        "from human activities. The main factors include burning fossil "
        "fuels, deforestation, industrial processes, and agriculture. "
        "These activities release carbon dioxide, methane, and other "
        "greenhouse gases that trap heat in the atmosphere.",
        "The effects of climate change include rising global temperatures, "
        "melting ice caps and glaciers, sea level rise, more frequent "
        "extreme weather events, and disruptions to ecosystems.",
    ]],
    "answer": [
        "Climate change is mainly caused by human activities that release "
        "greenhouse gases into the atmosphere. Key factors include burning "
        "fossil fuels, deforestation, industrial processes, and "
        "agricultural practices. These activities emit carbon dioxide, "
        "methane, and other gases that trap heat, leading to global "
        "warming. The consequences include rising temperatures, melting "
        "ice, sea level rise, extreme weather events, and ecosystem "
        "disruptions."
    ],
})

result = evaluate(eval_dataset, metrics=[response_recall])
print(result)
```

Use Cases

  • Response Generation Evaluation: Assess how effectively LLMs utilize available information when generating responses
  • Prompt Engineering: Refine prompts to encourage more comprehensive use of context information
  • Information Utilization Analysis: Identify patterns of information omission or underutilization in RAG systems
  • Model Comparison: Compare different LLMs' ability to utilize relevant contextual information
  • System Optimization: Identify areas for improvement in RAG pipelines based on information utilization metrics

Best Practices

  • Use in combination with Context Recall to distinguish between retrieval issues and generation issues
  • Consider domain-specific thresholds for what constitutes "important information"
  • Analyze low scores to identify patterns in what types of information are being omitted
  • Compare this metric across different prompt strategies to optimize information utilization
  • Combine with other metrics like Faithfulness to ensure comprehensive evaluation of response quality
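The first best practice above, pairing this metric with Context Recall to separate retrieval problems from generation problems, can be sketched as a simple triage heuristic. The threshold and labels below are illustrative assumptions for this sketch, not ragas defaults.

```python
def diagnose(context_recall: float,
             response_context_recall: float,
             threshold: float = 0.7) -> str:
    """Heuristic triage: is a weak answer a retrieval or a generation issue?

    The 0.7 threshold and the labels are illustrative assumptions,
    not part of the ragas API.
    """
    if context_recall < threshold:
        # The needed information never reached the generator.
        return "retrieval issue"
    if response_context_recall < threshold:
        # The information was retrieved but underused in the response.
        return "generation issue"
    return "ok"

print(diagnose(0.4, 0.9))   # -> retrieval issue
print(diagnose(0.95, 0.3))  # -> generation issue
print(diagnose(0.9, 0.85))  # -> ok
```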