
Embedding Similarity

Calculate similarity between embedding vectors using various metrics. Supports cosine similarity, Euclidean distance, dot product, and more, with configurable options for different use cases.

Embedding Similarity Component

Embedding Similarity component interface and configuration

Dimension Matching: When comparing embedding vectors, ensure both vectors have the same dimensionality. Attempting to compare vectors of different dimensions will result in an error.
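
A lightweight dimension check before invoking the component catches this early. The helper below is a hypothetical illustration, not part of the component's API:

// Hypothetical pre-check: reject vectors with mismatched dimensionality
function assertSameDimensions(vector1, vector2) {
  if (vector1.length !== vector2.length) {
    throw new Error(`Dimension mismatch: ${vector1.length} vs ${vector2.length}`);
  }
}

assertSameDimensions([0.1, 0.2, 0.3], [0.2, 0.3, 0.4]); // OK: both vectors have 3 dimensions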

Component Inputs

  • Metric: The similarity metric to use for comparison

    Example: "cosine", "euclidean", "dot-product", "manhattan"

  • Vector1: First embedding vector for comparison

    Example: [0.1, 0.2, 0.3, 0.4, ...]

  • Vector2: Second embedding vector for comparison

    Example: [0.2, 0.3, 0.4, 0.5, ...]

  • Source Vector: For batch comparisons, the reference vector

    Example: [0.1, 0.2, 0.3, 0.4, ...]

  • Target Vectors: For batch comparisons, array of vectors to compare against

    Example: [[0.2, 0.3, 0.4], [0.3, 0.4, 0.5], [0.4, 0.5, 0.6]]
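
As a rough sketch of how these inputs fit together (field names follow the implementation example further down; the exact payload shape may differ):

// Single comparison: Metric, Vector1, Vector2
const singleInput = {
  metric: "cosine",
  vector1: [0.1, 0.2, 0.3, 0.4],
  vector2: [0.2, 0.3, 0.4, 0.5]
};

// Batch comparison: Metric, Source Vector, Target Vectors
const batchInput = {
  metric: "cosine",
  sourceVector: [0.1, 0.2, 0.3],
  targetVectors: [
    [0.2, 0.3, 0.4],
    [0.3, 0.4, 0.5],
    [0.4, 0.5, 0.6]
  ]
};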

Component Outputs

  • Similarity Score: Raw similarity value based on the selected metric

    Example: 0.95 for cosine similarity

  • Normalized Score: Normalized value between 0 and 1 (1 being most similar)

    Example: 0.97 (normalized from the raw score)

  • Metadata: Additional information about the comparison

    Example: vector_dimensions: 768, comparison_time: 0.023

  • Batch Results: For batch comparisons, array of similarity scores

    Example: [0.95, 0.87, 0.76]
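
Taken together, a single-comparison result might look like the sketch below. The score and normalized_score paths match the implementation example further down; the metadata and batch fields are illustrative assumptions:

// Illustrative single-comparison result (shape partly assumed)
const exampleResult = {
  similarity: {
    score: 0.95,             // Raw similarity value for the selected metric
    normalized_score: 0.97   // Normalized to the 0-1 range
  },
  metadata: {
    vector_dimensions: 768,  // Dimensionality of the compared vectors
    comparison_time: 0.023   // Time taken for the comparison
  }
};

// Illustrative batch result: one score per target vector (field name assumed)
const exampleBatchResult = { scores: [0.95, 0.87, 0.76] };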

Metric Comparison

Cosine Similarity

Measures the cosine of the angle between two vectors, focusing on direction rather than magnitude

  • Range: -1 to 1 (normalized to 0-1)
  • Interpretation: Higher value means greater similarity
  • Ideal for: Text embeddings, semantic search
  • Note: Ignores magnitude differences; best for normalized vectors
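
For reference, cosine similarity can be computed directly from two raw vectors. This is an illustrative sketch of the formula, not the component's internal implementation:

// Cosine similarity: dot(a, b) / (||a|| * ||b||)
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const magB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dot / (magA * magB);
}

cosineSimilarity([0.1, 0.2, 0.3], [0.2, 0.3, 0.4]); // ≈ 0.9926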

Euclidean Distance

Measures the straight-line distance between two points in Euclidean space

  • Range: 0 to ∞ (normalized to 0-1, where 1 is closer)
  • Interpretation: Lower value means greater similarity
  • Ideal for: Physical or spatial data, when magnitude matters
  • Note: Sensitive to scale; consider normalizing input vectors
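
A comparable sketch for Euclidean distance, including one common way of mapping the distance into a 0-1 similarity score (the component's own normalization may differ):

// Euclidean distance: square root of the summed squared element-wise differences
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

const distance = euclideanDistance([0.1, 0.2, 0.3], [0.2, 0.3, 0.4]); // ≈ 0.1732
const similarity = 1 / (1 + distance);                                // ≈ 0.8524, closer to 1 means more similar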

Dot Product

Simple multiplication and summation of corresponding elements

  • Range: -∞ to ∞ (normalized to 0-1)
  • Interpretation: Higher value means greater similarity
  • Ideal for: Quick calculations, pre-normalized vectors
  • Note: Affected by both direction and magnitude
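
And a sketch of the dot product, which coincides with cosine similarity when both vectors are already L2-normalized:

// Dot product: sum of element-wise products
function dotProduct(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

dotProduct([0.1, 0.2, 0.3], [0.2, 0.3, 0.4]); // 0.2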

Implementation Example

// Calculate cosine similarity
const cosineSim = new EmbeddingSimilarity({ metric: "cosine" });

// Compare two vectors
const result = await cosineSim.compare({
  vector1: [0.1, 0.2, 0.3],
  vector2: [0.2, 0.3, 0.4]
});

// Batch comparison
const batchResult = await cosineSim.compareBatch({
  sourceVector: [0.1, 0.2, 0.3],
  targetVectors: [
    [0.2, 0.3, 0.4],
    [0.3, 0.4, 0.5],
    [0.4, 0.5, 0.6]
  ]
});

// Using different metrics
const euclideanSim = new EmbeddingSimilarity({ metric: "euclidean" });
const dotProductSim = new EmbeddingSimilarity({ metric: "dot-product" });

console.log(result.similarity.score);            // Raw similarity score
console.log(result.similarity.normalized_score); // Normalized score between 0-1

Use Cases

  • Semantic Search: Find similar documents or content based on embedding similarity
  • Recommendation Systems: Compare user preference embeddings with content embeddings
  • Duplicate Detection: Identify similar or duplicate content
  • Clustering: Group similar items based on embedding similarity
  • Relevance Ranking: Sort search results by similarity to a query (see the sketch after this list)
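
As an illustration of the relevance-ranking use case, batch scores can be paired with document identifiers and sorted. The compareBatch call follows the implementation example above; the document IDs, embeddings, and the way scores are read out of the batch result are placeholders:

// Rank placeholder documents by similarity to a query embedding
const ranker = new EmbeddingSimilarity({ metric: "cosine" });

const documents = ["doc-a", "doc-b", "doc-c"]; // Placeholder document IDs
const batchResult = await ranker.compareBatch({
  sourceVector: [0.1, 0.2, 0.3],               // Query embedding (placeholder)
  targetVectors: [
    [0.2, 0.3, 0.4],
    [0.3, 0.4, 0.5],
    [0.4, 0.5, 0.6]
  ]
});

// Assuming the batch result yields an array of scores such as [0.95, 0.87, 0.76]
const scores = [0.95, 0.87, 0.76];             // Placeholder for scores read from batchResult
const ranked = documents
  .map((id, i) => ({ id, score: scores[i] }))
  .sort((a, b) => b.score - a.score);          // Highest similarity first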

Best Practices

  • Normalize vectors when using cosine similarity to ensure consistent results (see the normalization sketch after this list)
  • Select the appropriate metric based on your specific use case
  • For large-scale comparisons, consider a dedicated vector search library or database such as FAISS or Pinecone
  • Validate vector dimensions before comparisons to avoid runtime errors
  • Use batch processing for multiple comparisons to improve performance
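
For the normalization recommendation, a simple L2 normalization looks like the sketch below; production code may rely on an optimized math library instead:

// L2-normalize a vector so its magnitude is 1.
// After normalization, dot product and cosine similarity give the same result.
function l2Normalize(vector) {
  const magnitude = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return magnitude === 0 ? vector : vector.map((x) => x / magnitude);
}

l2Normalize([0.1, 0.2, 0.3]); // ≈ [0.267, 0.535, 0.802]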