LangWatch Evaluator

The LangWatch Evaluator scores LLM outputs against expected results. Use it to maintain quality control and to monitor model performance across several evaluation metrics.

[Figures: LangWatch Evaluator architecture; LangWatch evaluation workflow]

Configuration

API Setup:

const evaluator = new LangWatchEvaluator({
  apiKey: "your-api-key",
  evaluatorName: "custom-evaluator-name",
});

Input Parameters

  • evaluatorName:

    Unique identifier for your evaluation configuration

  • apiKey:

    Authentication key for LangWatch services

  • input:

    Original prompt or query sent to the LLM

  • output:

    Actual response generated by the LLM

  • expectedOutput:

    Reference output for comparison

  • contexts:

    Additional context for the evaluation, passed as an object (for example generation settings or the subject domain)
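
The evaluation parameters listed above can be modeled as a TypeScript type so that malformed requests are caught before they are sent. The sketch below is illustrative only: the `EvaluationInput` type name and the `findMissingFields` helper are hypothetical and not part of LangWatch, and treating `input` and `output` as required while the other fields are optional is an assumption.

```typescript
// Hypothetical type: only the field names come from the parameter list above.
interface EvaluationInput {
  input: string;                      // original prompt or query sent to the LLM
  output: string;                     // actual response generated by the LLM
  expectedOutput?: string;            // reference output (assumed optional)
  contexts?: Record<string, unknown>; // additional context (assumed optional)
}

// Returns the names of required fields that are absent or blank.
// Which fields are "required" is an assumption made for this sketch.
function findMissingFields(candidate: Partial<EvaluationInput>): string[] {
  const required: Array<"input" | "output"> = ["input", "output"];
  return required.filter((key) => {
    const value = candidate[key];
    return typeof value !== "string" || value.trim() === "";
  });
}

// A blank `output` is reported as missing.
console.log(findMissingFields({ input: "Summarize the benefits of renewable energy", output: "  " }));
```

Running a check like this before calling the evaluator keeps empty or whitespace-only responses from being scored at all.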

Example Usage

const result = await evaluator.evaluate({
  input: "Summarize the benefits of renewable energy",
  output: "Renewable energy provides sustainable power...",
  expectedOutput: "Key benefits of renewable energy include...",
  contexts: {
    temperature: 0.7,
    maxTokens: 150,
    domain: "environmental-science",
  },
});

Evaluation Results

Sample Output:

{
  score: 0.85,
  metrics: {
    relevance: 0.9,
    accuracy: 0.8,
    completeness: 0.85,
    coherence: 0.85,
  },
  feedback: {
    strengths: ["Comprehensive coverage", "Clear explanation"],
    improvements: ["Add specific examples", "Include statistics"],
  },
  metadata: {
    evaluationTime: "2024-01-20T10:30:00Z",
    modelVersion: "1.0.0",
  },
}
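
To act on a result like the sample above, it helps to give it a type and pick out the weakest metrics. The following is a minimal sketch: the `EvaluationResult` interface is inferred from the sample output rather than taken from an official type definition, and `metricsBelow` is a hypothetical helper.

```typescript
// Shape inferred from the sample output; not an official LangWatch type.
interface EvaluationResult {
  score: number;
  metrics: Record<string, number>;
  feedback: { strengths: string[]; improvements: string[] };
  metadata: { evaluationTime: string; modelVersion: string };
}

// Returns the names of metrics scoring strictly below `threshold`, lowest first.
function metricsBelow(result: EvaluationResult, threshold: number): string[] {
  return Object.entries(result.metrics)
    .filter(([, value]) => value < threshold)
    .sort(([, a], [, b]) => a - b)
    .map(([name]) => name);
}

// The sample output from this page, typed.
const sample: EvaluationResult = {
  score: 0.85,
  metrics: { relevance: 0.9, accuracy: 0.8, completeness: 0.85, coherence: 0.85 },
  feedback: {
    strengths: ["Comprehensive coverage", "Clear explanation"],
    improvements: ["Add specific examples", "Include statistics"],
  },
  metadata: { evaluationTime: "2024-01-20T10:30:00Z", modelVersion: "1.0.0" },
};

console.log(metricsBelow(sample, 0.85)); // [ 'accuracy' ]
```

With a threshold of 0.85, only `accuracy` (0.8) is flagged, because the comparison is strict and `completeness` and `coherence` sit exactly at the threshold.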

Evaluation Metrics

  • Semantic Similarity
  • Response Accuracy
  • Content Completeness
  • Contextual Relevance
  • Grammar and Coherence
  • Task Alignment

Use Cases

  • Quality Assurance Testing
  • Model Performance Monitoring
  • Response Validation
  • Regression Testing
  • Content Generation Verification
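
For the regression-testing use case, one possible approach is to store per-metric scores from a known-good run and compare later runs against them. The sketch below assumes scores are plain numbers keyed by metric name, as in the sample output; `findRegressions`, the `Regression` type, and the default tolerance of 0.05 are hypothetical choices for illustration, not LangWatch features.

```typescript
type MetricScores = Record<string, number>;

interface Regression {
  metric: string;
  baseline: number;
  current: number;
}

// Flags every metric whose score dropped by more than `tolerance`
// relative to the baseline run. The 0.05 default is an arbitrary choice.
function findRegressions(
  baseline: MetricScores,
  current: MetricScores,
  tolerance: number = 0.05
): Regression[] {
  const regressions: Regression[] = [];
  for (const [metric, baselineScore] of Object.entries(baseline)) {
    const currentScore = current[metric];
    // Metrics absent from the current run are skipped, not treated as failures.
    if (currentScore === undefined) continue;
    if (baselineScore - currentScore > tolerance) {
      regressions.push({ metric, baseline: baselineScore, current: currentScore });
    }
  }
  return regressions;
}

const previousRun: MetricScores = { relevance: 0.9, accuracy: 0.8 };
const latestRun: MetricScores = { relevance: 0.88, accuracy: 0.6 };

// Only `accuracy` dropped by more than the tolerance.
console.log(findRegressions(previousRun, latestRun));
```

A tolerance absorbs small run-to-run variation, so a test suite fails only on drops large enough to matter.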

Note: Evaluation results may vary based on the complexity of the input and the specific metrics being measured. Consider running multiple evaluations for more reliable results.
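
As a sketch of running multiple evaluations, the helper below repeats an evaluation and summarizes the spread of the overall scores. It assumes only that each run resolves to an object with a numeric `score`, as in the sample output; `summarizeScores`, `evaluateRepeatedly`, and the `ScoreSummary` type are hypothetical names, not part of LangWatch.

```typescript
interface ScoreSummary {
  runs: number;
  mean: number;
  min: number;
  max: number;
  stdDev: number; // population standard deviation
}

// Summarizes a non-empty list of scores.
function summarizeScores(scores: number[]): ScoreSummary {
  if (scores.length === 0) {
    throw new Error("summarizeScores needs at least one score");
  }
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const variance =
    scores.reduce((sum, s) => sum + (s - mean) * (s - mean), 0) / scores.length;
  return {
    runs: scores.length,
    mean,
    min: Math.min(...scores),
    max: Math.max(...scores),
    stdDev: Math.sqrt(variance),
  };
}

// Calls `evaluateOnce` sequentially `runs` times and summarizes the scores.
async function evaluateRepeatedly(
  evaluateOnce: () => Promise<{ score: number }>,
  runs: number
): Promise<ScoreSummary> {
  const scores: number[] = [];
  for (let i = 0; i < runs; i++) {
    const result = await evaluateOnce();
    scores.push(result.score);
  }
  return summarizeScores(scores);
}
```

Assuming the `evaluator` from the configuration section, a call could look like `await evaluateRepeatedly(() => evaluator.evaluate(request), 5)`, where `request` is the same object passed to `evaluate` in the usage example. A large `stdDev` relative to the mean suggests that a single evaluation is not a reliable signal for that input.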

Tip: Regularly update your evaluation criteria and expected outputs to maintain alignment with evolving business needs and model capabilities.