Context Precision Evaluator
The Context Precision Evaluator measures how relevant the retrieved contexts are to the user's query. It helps identify cases where the retriever returns irrelevant or tangential information alongside the passages that actually answer the question.

Context Precision Evaluator component interface and configuration
Evaluation Notice: Low Context Precision scores indicate that your retrieval system is surfacing irrelevant information, which can lead to distraction, confusion, or injection of incorrect information into responses.
Component Inputs
- Retrieved Contexts: The collection of retrieved passages or documents used to generate the response
Example: ["Electric vehicles produce zero direct emissions, which improves air quality.", "The history of automobiles dates back to the late 19th century when the first gasoline cars were invented."]
- Expected Contexts: The reference or expected contexts that are considered relevant
Example: ["Electric vehicles produce zero direct emissions, which improves air quality.", "EVs have lower operating costs compared to conventional vehicles."]
- Distance Measure: The method used to calculate the relevance of retrieved contexts
Example: "Semantic similarity"
Component Outputs
- Evaluation Result: Qualitative assessment of the relevance of each retrieved context (a sketch of how these inputs and outputs fit together follows below)
Example: "Context #1 is highly relevant to the question about electric vehicle benefits. Context #2 about automobile history is tangential and less relevant."
Score Interpretation
High Context Precision (0.7-1.0)
Most or all of the retrieved contexts are relevant to the query
Example Score: 0.95
This indicates excellent retrieval precision with minimal irrelevant information
Moderate Context Precision (0.3-0.7)
Some retrieved contexts are relevant, but others contain off-topic or tangential information
Example Score: 0.50
This indicates a mix of relevant and irrelevant contexts
Low Context Precision (0.0-0.3)
Most retrieved contexts are irrelevant to the query
Example Score: 0.15
This indicates poor retrieval precision with mostly irrelevant information
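For reporting, a small helper can map a numeric score onto the bands above. This is only a convenience sketch; the function name is made up, and the thresholds simply follow the ranges listed in this section.

def interpret_context_precision(score: float) -> str:
    """Map a context precision score onto the qualitative bands above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("Context precision scores are expected in the range [0.0, 1.0]")
    if score >= 0.7:
        return "High: most or all retrieved contexts are relevant"
    if score >= 0.3:
        return "Moderate: a mix of relevant and irrelevant contexts"
    return "Low: most retrieved contexts are irrelevant"

print(interpret_context_precision(0.95))  # High: most or all retrieved contexts are relevant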
Implementation Example
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import ContextPrecision

# Create the metric
context_precision = ContextPrecision()

# Build a small evaluation dataset. ContextPrecision judges each retrieved
# context against the question and a reference answer, so a "ground_truth"
# column is included here (column names follow ragas v0.1-style datasets;
# adjust them to match your ragas version).
eval_dataset = Dataset.from_dict({
    "question": ["What are the benefits of electric vehicles?"],
    "contexts": [[
        "Electric vehicles produce zero direct emissions, "
        "which improves air quality.",
        "The history of automobiles dates back to the late "
        "19th century when the first gasoline cars were invented.",
    ]],
    "ground_truth": [
        "Electric vehicles produce zero direct emissions, which improves "
        "air quality, and they have lower operating costs than conventional vehicles."
    ],
})

# Run the evaluation (this calls the configured judge LLM; by default ragas
# uses OpenAI, so API credentials must be available).
result = evaluate(
    eval_dataset,
    metrics=[context_precision],
)
print(result)
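The printed result contains the aggregate context precision for the dataset as a score between 0.0 and 1.0; the exact output format and value depend on the judge LLM and the ragas version in use.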
Use Cases
- Retrieval Efficiency: Optimize retrieval systems to avoid wasting computational resources on irrelevant contexts
- Vector Search Tuning: Fine-tune vector search parameters (like similarity thresholds) to improve precision (see the threshold sweep sketch after this list)
- Query Parsing Improvement: Refine query parsing methods to better capture user intent
- Distraction Reduction: Prevent models from being distracted by irrelevant information that could lead to hallucinations
- Document Preprocessing: Evaluate different document chunking and preprocessing approaches
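To illustrate the vector search tuning point above, the sketch below sweeps a similarity threshold over a small set of labelled candidates and reports the precision at each cut-off. The similarity scores and relevance labels are invented example data, and precision_at_threshold is a made-up helper, not part of the evaluator.

from typing import Dict, List, Tuple

# Hypothetical (similarity, is_relevant) pairs for candidates returned by a vector search.
candidates: List[Tuple[float, bool]] = [
    (0.91, True), (0.84, True), (0.78, False), (0.66, True), (0.52, False), (0.40, False),
]

def precision_at_threshold(cands: List[Tuple[float, bool]], threshold: float) -> float:
    """Precision over the candidates whose similarity clears the threshold."""
    kept = [rel for sim, rel in cands if sim >= threshold]
    return sum(kept) / len(kept) if kept else 0.0

# Sweep thresholds to see how precision changes as the cut-off tightens.
sweep: Dict[float, float] = {
    t: precision_at_threshold(candidates, t) for t in (0.4, 0.5, 0.6, 0.7, 0.8)
}
for threshold, precision in sweep.items():
    print(f"threshold={threshold:.1f} -> precision={precision:.2f}")

In practice the relevance labels would come from the evaluator's per-context verdicts rather than hand annotation.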
Best Practices
- Balance Context Precision with Context Recall - improving precision may reduce recall
- Consider implementing a pre-filtering step to remove obviously irrelevant documents before more expensive processing (a sketch follows this list)
- Track precision metrics over time to detect drift in retrieval effectiveness
- Use domain-specific knowledge to define context relevance for specialized applications
- Combine with query-specific filters to improve precision for particular types of questions
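For the pre-filtering suggestion above, one simple option is to drop candidates whose embedding similarity to the query falls below a coarse threshold before running the more expensive LLM-based evaluation. The sketch below assumes embedding vectors are already available; prefilter_contexts and the 0.3 cut-off are illustrative, not part of any library API.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Standard cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def prefilter_contexts(query_embedding: np.ndarray,
                       contexts: list,
                       context_embeddings: list,
                       min_similarity: float = 0.3) -> list:
    """Keep only contexts whose embedding is at least loosely similar to the query."""
    return [
        ctx
        for ctx, emb in zip(contexts, context_embeddings)
        if cosine_similarity(query_embedding, emb) >= min_similarity
    ]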