Response Context Recall Evaluator
The Response Context Recall evaluator assesses how comprehensively the generated response utilizes the important information from the retrieved contexts. It helps identify situations where critical information in the contexts was overlooked during response generation.

Response Context Recall component interface and configuration
Evaluation Notice: The quality of this evaluation depends on correctly identifying the important information in the retrieved contexts. Configure your evaluation parameters carefully to ensure an accurate assessment of information utilization.
Component Inputs
- User Input: The original query or question posed by the user
Example: "Explain the key factors in climate change."
- Generated Output: The response generated by the RAG system
Example: "Climate change is mainly caused by human activities that release greenhouse gases..."
- Expected Output: The reference or ground truth response (if available)
Example: "Climate change is driven by greenhouse gas emissions from several sources including fossil fuels, deforestation, and industrial processes..."
- Retrieved Contexts: The collection of retrieved passages or documents used to generate the response
Example: ["Climate change is primarily driven by greenhouse gas emissions...", "The effects of climate change include rising global temperatures..."]
Component Outputs
- Evaluation Result: A score and accompanying qualitative assessment of how well the response utilized the available context information
Example: "The response effectively captures and utilizes most of the key information from the relevant contexts..."
Score Interpretation
High Utilization (0.7-1.0)
Response effectively incorporates most or all of the important information from the contexts
Example Score: 0.92
This indicates excellent utilization of context information
Moderate Utilization (0.3-0.7)
Response incorporates some but not all of the important information from the contexts
Example Score: 0.52
This indicates partial utilization of context information
Low Utilization (0.0-0.3)
Response omits significant important information that was present in the contexts
Example Score: 0.15
This indicates poor utilization of context information
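These bands are straightforward to encode for reporting. A minimal sketch, assuming the thresholds above (the helper name is ours, not part of any library):

def utilization_band(score: float) -> str:
    """Map a response context recall score to the bands defined above."""
    if score >= 0.7:
        return "high"      # most or all key context information incorporated
    if score >= 0.3:
        return "moderate"  # some but not all key information incorporated
    return "low"           # significant context information omitted

assert utilization_band(0.92) == "high"
assert utilization_band(0.52) == "moderate"
assert utilization_band(0.15) == "low"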
Implementation Example
from ragas.metrics import ResponseContextRecall
from ragas import evaluate
from datasets import Dataset

# Create the metric
response_recall = ResponseContextRecall()

# Build a one-sample evaluation dataset
eval_dataset = Dataset.from_dict({
    "question": ["Explain the key factors in climate change."],
    "contexts": [[
        "Climate change is primarily driven by greenhouse gas emissions "
        "from human activities. The main factors include burning fossil "
        "fuels, deforestation, industrial processes, and agriculture. "
        "These activities release carbon dioxide, methane, and other "
        "greenhouse gases that trap heat in the atmosphere.",
        "The effects of climate change include rising global temperatures, "
        "melting ice caps and glaciers, sea level rise, more frequent "
        "extreme weather events, and disruptions to ecosystems.",
    ]],
    "answer": [
        "Climate change is mainly caused by human activities that release "
        "greenhouse gases into the atmosphere. Key factors include burning "
        "fossil fuels, deforestation, industrial processes, and agricultural "
        "practices. These activities emit carbon dioxide, methane, and other "
        "gases that trap heat, leading to global warming. The consequences "
        "include rising temperatures, melting ice, sea level rise, extreme "
        "weather events, and ecosystem disruptions."
    ],
})

# Run the evaluation
result = evaluate(eval_dataset, metrics=[response_recall])
print(result)
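In recent ragas releases, the object returned by evaluate can be converted to a pandas DataFrame for per-sample inspection; assuming that interface (the exact score column name is derived from the metric's name attribute):

df = result.to_pandas()
print(df.columns)  # one column per input field plus one per metric score
print(df.head())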
Use Cases
- Response Generation Evaluation: Assess how effectively LLMs utilize available information when generating responses
- Prompt Engineering: Refine prompts to encourage more comprehensive use of context information
- Information Utilization Analysis: Identify patterns of information omission or underutilization in RAG systems
- Model Comparison: Compare different LLMs' ability to utilize relevant contextual information
- System Optimization: Identify areas for improvement in RAG pipelines based on information utilization metrics
Best Practices
- Use in combination with Context Recall to distinguish between retrieval issues and generation issues
- Consider domain-specific thresholds for what constitutes "important information"
- Analyze low scores to identify patterns in what types of information are being omitted
- Compare this metric across different prompt strategies to optimize information utilization
- Combine with other metrics like Faithfulness to ensure comprehensive evaluation of response quality (a combined run is sketched below)
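As a sketch of the first and last practices above, a single run can pair this metric with context recall and faithfulness to separate retrieval failures from generation failures. The class-style constructors below follow newer ragas releases; older releases expose lowercase singleton instances (faithfulness, context_recall) instead:

from ragas import evaluate
from ragas.metrics import ResponseContextRecall, ContextRecall, Faithfulness

metrics = [
    ResponseContextRecall(),  # generation: did the answer use the retrieved info?
    ContextRecall(),          # retrieval: did the contexts cover the reference answer?
    Faithfulness(),           # generation: is the answer grounded in the contexts?
]

result = evaluate(eval_dataset, metrics=metrics)  # eval_dataset from the example above
print(result)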