Faithfulness Evaluator
The Faithfulness Evaluator assesses whether the generated response contains statements that contradict or misrepresent information in the retrieved contexts. It detects cases where a response is factually incorrect according to the provided source material.

Figure: Faithfulness Evaluator component interface and configuration
Evaluation Notice: Low faithfulness scores indicate misrepresentations or contradictions in the generated response. This can damage user trust and lead to the spread of inaccurate information.
Component Inputs
- User Input: The question or prompt from the user
Example: "What is photosynthesis?"
- Generated Output: The response generated by the RAG system
Example: "Photosynthesis is a biological process where plants convert light energy into chemical energy. During this process, plants take in water, carbon dioxide, and minerals, and produce oxygen and energy-rich compounds."
- Retrieved Contexts: The collection of retrieved passages or documents used to generate the response
Example: ["Photosynthesis is the process by which green plants and certain other organisms transform light energy into chemical energy. During photosynthesis in green plants, light energy is captured and used to convert water, carbon dioxide, and minerals into oxygen and energy-rich organic compounds."]
- Handling Unknown Answers: Configuration for how to handle information not present in the contexts
Example: "Strict - Only allow information explicitly stated in contexts"
Component Outputs
- Evaluation Result: Qualitative assessment of the response's faithfulness to the contexts
Example: "The response accurately represents the information in the context about photosynthesis."
Score Interpretation
High Faithfulness (0.7-1.0)
Response accurately represents information from the contexts, with few or no contradictions
Example Score: 0.95
This indicates a highly faithful response with statements that closely align with context information
Moderate Faithfulness (0.3-0.7)
Response contains some accurate information but includes notable misrepresentations or contradictions
Example Score: 0.55
This indicates partial faithfulness with some problematic contradictions
Low Faithfulness (0.0-0.3)
Response significantly misrepresents or contradicts the information in the contexts
Example Score: 0.15
This indicates a response with substantial contradictions to the source material
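The bands above can be turned into a small lookup. The following is a minimal sketch, not part of the evaluator itself: the function name `faithfulness_band` is hypothetical, and because the listed ranges share their endpoints (0.3 and 0.7), the sketch assumes a boundary value belongs to the higher band.

```python
def faithfulness_band(score: float) -> str:
    """Map a faithfulness score in [0.0, 1.0] to one of the band names above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    # Assumption: the boundary values 0.3 and 0.7 fall into the higher band.
    if score >= 0.7:
        return "High Faithfulness"
    if score >= 0.3:
        return "Moderate Faithfulness"
    return "Low Faithfulness"


# The three example scores from this section.
for example in (0.95, 0.55, 0.15):
    print(example, faithfulness_band(example))
```

If your deployment treats the boundaries differently, only the two comparisons need to change.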
Implementation Example
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import Faithfulness

# Create the metric
faithfulness = Faithfulness()

# Build a one-row dataset: the question, its retrieved contexts, and the generated answer
eval_dataset = Dataset.from_dict({
    "question": ["What is photosynthesis?"],
    "contexts": [[
        "Photosynthesis is the process by which green plants and certain other "
        "organisms transform light energy into chemical energy. During "
        "photosynthesis in green plants, light energy is captured and used to "
        "convert water, carbon dioxide, and minerals into oxygen and "
        "energy-rich organic compounds."
    ]],
    "answer": [
        "Photosynthesis is a biological process where plants convert light "
        "energy into chemical energy. During this process, plants take in "
        "water, carbon dioxide, and minerals, and produce oxygen and "
        "energy-rich compounds."
    ],
})

# Run the evaluation; ragas uses an LLM as the judge, so model credentials must be configured
result = evaluate(
    eval_dataset,
    metrics=[faithfulness],
)
print(result)
Use Cases
- Factual Verification: Validate that responses don't contradict or misrepresent the source material
- Model Comparison: Compare how consistently different LLMs stay faithful to the provided contexts
- Critical Domain Applications: Ensure accuracy in high-stakes domains like healthcare, legal, and financial advice
- Prompt Engineering: Improve prompting strategies to enhance the faithfulness of generated responses
- Response Filtering: Implement quality gates to block responses that misrepresent source information
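The Response Filtering use case can be sketched as a simple threshold check. This is an illustrative sketch only: `GateDecision` and `faithfulness_gate` are hypothetical names, and the default threshold of 0.7 is an assumption borrowed from the lower edge of the High Faithfulness band.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    reason: str


def faithfulness_gate(score: float, threshold: float = 0.7) -> GateDecision:
    """Decide whether a response may be shown, given its faithfulness score."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0.0 and 1.0, got {threshold}")
    if score >= threshold:
        return GateDecision(True, f"score {score:.2f} meets threshold {threshold:.2f}")
    return GateDecision(False, f"score {score:.2f} is below threshold {threshold:.2f}")


decision = faithfulness_gate(0.55)
print(decision.allowed, decision.reason)
```

A blocked response could then be regenerated, routed to review, or replaced with a fallback message, depending on the application.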
Best Practices
- Use Faithfulness in combination with Response Context Precision for a comprehensive assessment
- Analyze contradictions to identify patterns in how the LLM misinterprets or misrepresents information
- Consider implementing additional verification steps for responses with moderate faithfulness scores
- Set domain-specific faithfulness thresholds based on the criticality of accurate information
- Test faithfulness across various types of content (e.g., factual, numerical, procedural) to identify weak spots
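Two of the practices above, domain-specific thresholds and testing across content types, can be combined in a few lines. The sketch below is hypothetical throughout: the threshold values in `DOMAIN_THRESHOLDS`, the helper names, and the sample scores are illustrative assumptions, not recommended settings or real results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical minimum acceptable scores per domain (illustrative values only).
DOMAIN_THRESHOLDS = {"healthcare": 0.9, "legal": 0.9, "general": 0.7}


def mean_score_by_type(records):
    """Average faithfulness score per content type from (content_type, score) pairs."""
    grouped = defaultdict(list)
    for content_type, score in records:
        grouped[content_type].append(score)
    return {content_type: mean(scores) for content_type, scores in grouped.items()}


def weak_spots(records, domain):
    """Content types whose average score falls below the domain's threshold."""
    threshold = DOMAIN_THRESHOLDS[domain]
    averages = mean_score_by_type(records)
    return sorted(t for t, avg in averages.items() if avg < threshold)


# Made-up scores for illustration.
records = [
    ("factual", 0.95), ("factual", 0.9),
    ("numerical", 0.55), ("numerical", 0.65),
    ("procedural", 0.75),
]
print(weak_spots(records, "general"))  # ['numerical']
```

The same records checked against a stricter domain threshold would flag more content types, which is the point of setting thresholds per domain rather than globally.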