Response Context Recall Evaluator
The Response Context Recall evaluator assesses how comprehensively the generated response utilizes the important information from the retrieved contexts. It helps identify situations where critical information in the contexts was overlooked during response generation.

Response Context Recall component interface and configuration
Evaluation Notice: The quality of this evaluation depends on correctly identifying the important information in the retrieved contexts. Configure your evaluation parameters carefully to ensure an accurate assessment of information utilization.
Component Inputs
- User Input: The original query or question posed by the user
Example: "Explain the key factors in climate change."
- Generated Output: The response generated by the RAG system
Example: "Climate change is mainly caused by human activities that release greenhouse gases..."
- Expected Output: The reference or ground truth response (if available)
Example: "Climate change is driven by greenhouse gas emissions from several sources including fossil fuels, deforestation, and industrial processes..."
- Retrieved Contexts: The collection of retrieved passages or documents used to generate the response
Example: ["Climate change is primarily driven by greenhouse gas emissions...", "The effects of climate change include rising global temperatures..."]
Component Outputs
- Evaluation Result: A score and accompanying qualitative assessment of how well the response utilized the available context information
Example: "The response effectively captures and utilizes most of the key information from the relevant contexts..."
Score Interpretation
High Utilization (0.7-1.0)
Response effectively incorporates most or all of the important information from the contexts
Example Score: 0.92
This indicates excellent utilization of context information
Moderate Utilization (0.3-0.7)
Response incorporates some but not all of the important information from the contexts
Example Score: 0.52
This indicates partial utilization of context information
Low Utilization (0.0-0.3)
Response omits significant important information that was present in the contexts
Example Score: 0.15
This indicates poor utilization of context information
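These bands are straightforward to encode for reporting. A minimal sketch, assuming the thresholds above (the helper name is ours, not part of any library):

def utilization_band(score: float) -> str:
    """Map a response context recall score to the bands defined above."""
    if score >= 0.7:
        return "high"      # most or all key context information incorporated
    if score >= 0.3:
        return "moderate"  # some but not all key information incorporated
    return "low"           # significant context information omitted

assert utilization_band(0.92) == "high"
assert utilization_band(0.52) == "moderate"
assert utilization_band(0.15) == "low"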
Implementation Example
from ragas.metrics import ResponseContextRecall
from ragas import evaluate
from datasets import Dataset

# Create the metric
response_recall = ResponseContextRecall()

# Build a one-sample evaluation dataset
eval_dataset = Dataset.from_dict({
    "question": ["Explain the key factors in climate change."],
    "contexts": [[
        "Climate change is primarily driven by greenhouse gas emissions "
        "from human activities. The main factors include burning fossil "
        "fuels, deforestation, industrial processes, and agriculture. "
        "These activities release carbon dioxide, methane, and other "
        "greenhouse gases that trap heat in the atmosphere.",
        "The effects of climate change include rising global temperatures, "
        "melting ice caps and glaciers, sea level rise, more frequent "
        "extreme weather events, and disruptions to ecosystems.",
    ]],
    "answer": [
        "Climate change is mainly caused by human activities that release "
        "greenhouse gases into the atmosphere. Key factors include burning "
        "fossil fuels, deforestation, industrial processes, and agricultural "
        "practices. These activities emit carbon dioxide, methane, and other "
        "gases that trap heat, leading to global warming. The consequences "
        "include rising temperatures, melting ice, sea level rise, extreme "
        "weather events, and ecosystem disruptions."
    ],
})

# Run the evaluation
result = evaluate(eval_dataset, metrics=[response_recall])
print(result)
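In recent ragas releases, the object returned by evaluate can be converted to a pandas DataFrame for per-sample inspection; assuming that interface (the exact score column name is derived from the metric's name attribute):

df = result.to_pandas()
print(df.columns)  # one column per input field plus one per metric score
print(df.head())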
Use Cases
- Response Generation Evaluation: Assess how effectively LLMs utilize available information when generating responses
- Prompt Engineering: Refine prompts to encourage more comprehensive use of context information
- Information Utilization Analysis: Identify patterns of information omission or underutilization in RAG systems
- Model Comparison: Compare different LLMs' ability to utilize relevant contextual information
- System Optimization: Identify areas for improvement in RAG pipelines based on information utilization metrics
Best Practices
- Use in combination with Context Recall to distinguish between retrieval issues and generation issues
- Consider domain-specific thresholds for what constitutes "important information"
- Analyze low scores to identify patterns in what types of information are being omitted
- Compare this metric across different prompt strategies to optimize information utilization
- Combine with other metrics like Faithfulness to ensure comprehensive evaluation of response quality (a combined run is sketched below)
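As a sketch of the first and last practices above, a single run can pair this metric with context recall and faithfulness to separate retrieval failures from generation failures. The class-style constructors below follow newer ragas releases; older releases expose lowercase singleton instances (faithfulness, context_recall) instead:

from ragas import evaluate
from ragas.metrics import ResponseContextRecall, ContextRecall, Faithfulness

metrics = [
    ResponseContextRecall(),  # generation: did the answer use the retrieved info?
    ContextRecall(),          # retrieval: did the contexts cover the reference answer?
    Faithfulness(),           # generation: is the answer grounded in the contexts?
]

result = evaluate(eval_dataset, metrics=metrics)  # eval_dataset from the example above
print(result)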