Faithfulness Evaluator

The Faithfulness Evaluator assesses whether the generated response contains statements that contradict or misrepresent information in the retrieved contexts. It detects cases where a response is factually incorrect according to the provided source material.

Figure: Faithfulness Evaluator component interface and configuration

Evaluation Notice: Low faithfulness scores indicate misrepresentations or contradictions in the generated response. This can damage user trust and lead to the spread of inaccurate information.

Component Inputs

User Input: The question or prompt from the user
Example: "What is photosynthesis?"
Generated Output: The response generated by the RAG system
Example: "Photosynthesis is a biological process where plants convert light energy into chemical energy. During this process, plants take in water, carbon dioxide, and minerals, and produce oxygen and energy-rich compounds."
Retrieved Contexts: The collection of retrieved passages or documents used to generate the response
Example: ["Photosynthesis is the process by which green plants and certain other organisms transform light energy into chemical energy. During photosynthesis in green plants, light energy is captured and used to convert water, carbon dioxide, and minerals into oxygen and energy-rich organic compounds."]
Handling unknown answers: Configuration for how to handle information not present in the contexts
Example: "Strict - Only allow information explicitly stated in contexts"
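The three content inputs correspond to the `question`, `contexts`, and `answer` columns used in the Implementation Example further down. The following is a minimal sketch of collecting per-response records and converting them into that column layout. It assumes a hypothetical `EvalRecord` dataclass and `to_columns` helper, neither of which belongs to any library.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvalRecord:
    """One evaluation row (hypothetical helper type, not a library class)."""
    user_input: str                # the user's question or prompt
    generated_output: str          # the RAG system's response
    retrieved_contexts: List[str]  # passages the response was generated from


def to_columns(records: List[EvalRecord]) -> Dict[str, list]:
    """Convert records into the column dict passed to Dataset.from_dict."""
    return {
        "question": [r.user_input for r in records],
        "contexts": [list(r.retrieved_contexts) for r in records],
        "answer": [r.generated_output for r in records],
    }


record = EvalRecord(
    user_input="What is photosynthesis?",
    generated_output="Photosynthesis is a biological process where plants "
    "convert light energy into chemical energy.",
    retrieved_contexts=[
        "Photosynthesis is the process by which green plants and certain "
        "other organisms transform light energy into chemical energy."
    ],
)
columns = to_columns([record])
print(columns["question"])  # ['What is photosynthesis?']
```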

Component Outputs

Evaluation Result: A faithfulness score from 0.0 to 1.0 (see Score Interpretation) together with a qualitative assessment of the response's faithfulness to the contexts
Example: "The response accurately represents the information in the context about photosynthesis."
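The bands in Score Interpretation below are numeric. Ragas documents one way to derive such a number for its Faithfulness metric. The response is split into individual claims, an LLM judges whether each claim can be inferred from the retrieved contexts, and the score is the number of supported claims divided by the total number of claims. The sketch below hard-codes the verdicts an LLM judge would normally produce. The `Verdict` labels and the `strict` flag are illustrative assumptions, not the component's actual configuration. The flag is modelled loosely on the "Handling unknown answers" setting.

```python
from enum import Enum
from typing import List


class Verdict(Enum):
    SUPPORTED = "supported"            # claim follows from the contexts
    CONTRADICTED = "contradicted"      # claim conflicts with the contexts
    NOT_IN_CONTEXT = "not_in_context"  # contexts say nothing about the claim


def faithfulness_score(verdicts: List[Verdict], strict: bool = True) -> float:
    """Fraction of claims supported by the contexts.

    strict=True: claims absent from the contexts count as unfaithful.
    strict=False (hypothetical lenient mode): such claims are ignored.
    """
    if not strict:
        verdicts = [v for v in verdicts if v is not Verdict.NOT_IN_CONTEXT]
    if not verdicts:
        return 1.0  # assumption: no checkable claims means nothing unfaithful
    supported = sum(v is Verdict.SUPPORTED for v in verdicts)
    return supported / len(verdicts)


# Verdicts an LLM judge might return for a four-claim response
verdicts = [Verdict.SUPPORTED, Verdict.SUPPORTED,
            Verdict.CONTRADICTED, Verdict.NOT_IN_CONTEXT]
print(faithfulness_score(verdicts))                # 0.5
print(faithfulness_score(verdicts, strict=False))  # 0.6666666666666666
```

Under the strict setting, an unsupported but non-contradictory claim lowers the score just as a contradiction does. This matches the example configuration "Only allow information explicitly stated in contexts."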

Score Interpretation

High Faithfulness (0.7-1.0)

Response accurately represents information from the contexts with minimal to no contradictions

Example Score: 0.95
This indicates a highly faithful response with statements that closely align with context information

Moderate Faithfulness (0.3-0.7)

Response contains some accurate information but includes notable misrepresentations or contradictions

Example Score: 0.55
This indicates partial faithfulness with some problematic contradictions

Low Faithfulness (0.0-0.3)

Response significantly misrepresents or contradicts the information in the contexts

Example Score: 0.15
This indicates a response with substantial contradictions to the source material
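The three bands can be expressed as a small lookup. The listed ranges share their endpoints, so this sketch assumes a score exactly on a boundary belongs to the higher band (0.7 is high, 0.3 is moderate). `faithfulness_band` is a hypothetical helper.

```python
def faithfulness_band(score: float) -> str:
    """Map a faithfulness score in [0.0, 1.0] to a band name.

    Assumption: boundary scores (0.3, 0.7) fall into the higher band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    if score >= 0.7:
        return "high"
    if score >= 0.3:
        return "moderate"
    return "low"


for s in (0.95, 0.55, 0.15):
    print(s, faithfulness_band(s))
# 0.95 high
# 0.55 moderate
# 0.15 low
```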

Implementation Example

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import Faithfulness

# Create the metric. Faithfulness uses an LLM as a judge; unless one is
# passed explicitly, ragas falls back to a default OpenAI model, so an
# OPENAI_API_KEY environment variable typically needs to be set.
faithfulness = Faithfulness()

# Build a one-row evaluation dataset
eval_dataset = Dataset.from_dict({
    "question": ["What is photosynthesis?"],
    # Each row holds a list of retrieved context passages
    "contexts": [[
        "Photosynthesis is the process by which green plants and certain "
        "other organisms transform light energy into chemical energy. "
        "During photosynthesis in green plants, light energy is captured "
        "and used to convert water, carbon dioxide, and minerals into "
        "oxygen and energy-rich organic compounds."
    ]],
    "answer": [
        "Photosynthesis is a biological process where plants convert light "
        "energy into chemical energy. During this process, plants take in "
        "water, carbon dioxide, and minerals, and produce oxygen and "
        "energy-rich compounds."
    ],
})

# Run the evaluation and print the aggregate faithfulness score
result = evaluate(eval_dataset, metrics=[faithfulness])
print(result)
```

Use Cases

Factual Verification: Validate that responses don't contradict or misrepresent the source material
Model Comparison: Compare different LLMs for their tendencies to stay faithful to provided contexts
Critical Domain Applications: Ensure accuracy in high-stakes domains like healthcare, legal, and financial advice
Prompt Engineering: Improve prompting strategies to enhance the faithfulness of generated responses
Response Filtering: Implement quality gates to block responses that misrepresent source information

Useful Resources

Best Practices

Use Faithfulness in combination with Response Context Precision for a comprehensive assessment
Analyze contradictions to identify patterns in how the LLM misinterprets or misrepresents information
Consider implementing additional verification steps for responses with moderate faithfulness scores
Set domain-specific faithfulness thresholds based on the criticality of accurate information
Test faithfulness across various types of content (e.g., factual, numerical, procedural) to identify weak spots
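Several of these practices can be combined into a single quality gate. It routes each response using a domain-specific threshold and sends moderate scores to an additional verification step instead of blocking them outright. The domain names, threshold values, and `route_response` helper below are illustrative assumptions, not recommended settings.

```python
from typing import Dict, Tuple

# Hypothetical per-domain (block_below, verify_below) thresholds.
# Stricter domains require higher scores before a response is accepted.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "healthcare": (0.5, 0.9),
    "legal": (0.5, 0.9),
    "general": (0.3, 0.7),
}


def route_response(score: float, domain: str = "general") -> str:
    """Return 'block', 'verify', or 'accept' for a faithfulness score."""
    block_below, verify_below = THRESHOLDS.get(domain, THRESHOLDS["general"])
    if score < block_below:
        return "block"   # misrepresents the sources: do not show it
    if score < verify_below:
        return "verify"  # moderate: route to an additional check
    return "accept"


print(route_response(0.8))                # accept
print(route_response(0.8, "healthcare"))  # verify
print(route_response(0.2, "legal"))       # block
```

Unknown domains fall back to the general thresholds here. A stricter deployment might instead reject them.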
