LangWatch Evaluator
The LangWatch Evaluator scores LLM outputs against expected results across a set of quality metrics, making it easier to keep response quality under control and to track model performance over time.

Figure: LangWatch evaluation workflow
Configuration
API Setup:
const evaluator = new LangWatchEvaluator({
  apiKey: "your-api-key",
  evaluatorName: "custom-evaluator-name"
});
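In a real project you would normally read the key from the environment rather than hardcoding it. A minimal sketch, assuming the same constructor and a LANGWATCH_API_KEY environment variable (the variable name is just an example):

const evaluator = new LangWatchEvaluator({
  // Read the key from the environment instead of committing it to source control
  apiKey: process.env.LANGWATCH_API_KEY,
  evaluatorName: "custom-evaluator-name"
});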
Input Parameters
- evaluatorName: Unique identifier for the evaluation configuration you want to run.
- apiKey: Authentication key for the LangWatch service.
- input: The original prompt or query sent to the LLM.
- output: The actual response generated by the LLM.
- expectedOutput: The reference output the response is compared against.
- contexts: Additional context information passed alongside the evaluation.
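Taken together, these parameters describe the payload passed to evaluate(). The JSDoc typedef below is an informal sketch of that shape, inferred from the examples on this page rather than taken from an official type definition:

/**
 * @typedef {Object} EvaluationRequest
 * @property {string} input           Original prompt or query sent to the LLM
 * @property {string} output          Actual response generated by the LLM
 * @property {string} expectedOutput  Reference output for comparison
 * @property {Object} [contexts]      Additional context information for evaluation
 */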
Example Usage
const result = await evaluator.evaluate({
  input: "Summarize the benefits of renewable energy",
  output: "Renewable energy provides sustainable power...",
  expectedOutput: "Key benefits of renewable energy include...",
  contexts: {
    temperature: 0.7,
    maxTokens: 150,
    domain: "environmental-science"
  }
});
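Because evaluate() returns a promise, failures such as an invalid API key or a network error surface as rejections. Wrapping the call is a reasonable precaution; this sketch assumes only the method shown above, and `request` stands in for the payload from the example:

try {
  const result = await evaluator.evaluate(request); // `request` is the payload shown above
  console.log(result.score);
} catch (err) {
  // Authentication, network, or quota problems end up here rather than in `result`
  console.error("Evaluation failed:", err);
}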
Evaluation Results
Sample Output:
{
  score: 0.85,
  metrics: {
    relevance: 0.9,
    accuracy: 0.8,
    completeness: 0.85,
    coherence: 0.85
  },
  feedback: {
    strengths: ["Comprehensive coverage", "Clear explanation"],
    improvements: ["Add specific examples", "Include statistics"]
  },
  metadata: {
    evaluationTime: "2024-01-20T10:30:00Z",
    modelVersion: "1.0.0"
  }
}
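A common way to consume this result is to gate on the overall score, for example in a CI check. The threshold below is an arbitrary illustration, not a recommended value:

const MIN_SCORE = 0.8; // example threshold, tune it to your own quality bar

if (result.score < MIN_SCORE) {
  // Surface the evaluator's feedback so the failure is actionable
  throw new Error(
    `Evaluation score ${result.score} below ${MIN_SCORE}: ` +
    result.feedback.improvements.join("; ")
  );
}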
Evaluation Metrics
- Semantic Similarity
- Response Accuracy
- Content Completeness
- Contextual Relevance
- Grammar and Coherence
- Task Alignment
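This page does not specify how the per-metric scores combine into the overall score. If you want your own composite score from the metrics object in the sample output, a simple weighted average is one plausible scheme, shown purely for illustration with made-up weights:

// Hypothetical weights; adjust them to reflect what matters for your use case
const weights = { relevance: 0.4, accuracy: 0.3, completeness: 0.2, coherence: 0.1 };

function compositeScore(metrics) {
  return Object.entries(weights)
    .reduce((sum, [name, weight]) => sum + weight * (metrics[name] ?? 0), 0);
}

// compositeScore(result.metrics) => 0.855 for the sample output above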
Use Cases
- Quality Assurance Testing
- Model Performance Monitoring
- Response Validation
- Regression Testing
- Content Generation Verification
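For regression testing in particular, a small fixed suite of prompts with known-good reference outputs can be re-evaluated after every model or prompt change. The sketch below assumes only the evaluate() call from this page; the test cases and threshold are placeholders, and generateWithYourModel stands in for however you call your own model:

const cases = [
  {
    input: "Summarize the benefits of renewable energy",
    expectedOutput: "Key benefits of renewable energy include..."
  }
  // ...add more known-good cases here
];

for (const testCase of cases) {
  const output = await generateWithYourModel(testCase.input); // your own LLM call
  const { score } = await evaluator.evaluate({ ...testCase, output });
  if (score < 0.8) {
    console.warn(`Regression suspected for: "${testCase.input}" (score ${score})`);
  }
}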
Note: Evaluation results may vary based on the complexity of the input and the specific metrics being measured. Consider running multiple evaluations for more reliable results.
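If you do run several evaluations, averaging the scores is a straightforward way to smooth out run-to-run variation; a minimal sketch, assuming the same evaluate() call as above:

const runs = 5;
let total = 0;
for (let i = 0; i < runs; i++) {
  const { score } = await evaluator.evaluate(request); // `request` is the payload shown earlier
  total += score;
}
const averageScore = total / runs;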
Tip: Regularly update your evaluation criteria and expected outputs to maintain alignment with evolving business needs and model capabilities.