
Embedding Similarity

Calculate similarity between embedding vectors using various metrics. Supports cosine similarity, Euclidean distance, dot product, and more, with configurable options for different use cases.

Embedding Similarity Component

Embedding Similarity component interface and configuration

Dimension Matching: When comparing embedding vectors, ensure both vectors have the same dimensionality. Attempting to compare vectors of different dimensions will result in an error.
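
A lightweight dimension check before invoking the component catches this early. The helper below is a hypothetical illustration, not part of the component's API:

// Hypothetical pre-check: reject vectors with mismatched dimensionality
function assertSameDimensions(vector1, vector2) {
  if (vector1.length !== vector2.length) {
    throw new Error(`Dimension mismatch: ${vector1.length} vs ${vector2.length}`);
  }
}

assertSameDimensions([0.1, 0.2, 0.3], [0.2, 0.3, 0.4]); // OK: both vectors have 3 dimensions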

Component Inputs

  • Metric: The similarity metric to use for comparison

    Example: "cosine", "euclidean", "dot-product", "manhattan"

  • Vector1: First embedding vector for comparison

    Example: [0.1, 0.2, 0.3, 0.4, ...]

  • Vector2: Second embedding vector for comparison

    Example: [0.2, 0.3, 0.4, 0.5, ...]

  • Source Vector: For batch comparisons, the reference vector

    Example: [0.1, 0.2, 0.3, 0.4, ...]

  • Target Vectors: For batch comparisons, array of vectors to compare against

    Example: [[0.2, 0.3, 0.4], [0.3, 0.4, 0.5], [0.4, 0.5, 0.6]]
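
As a rough sketch of how these inputs fit together (field names follow the implementation example further down; the exact payload shape may differ):

// Single comparison: Metric, Vector1, Vector2
const singleInput = {
  metric: "cosine",
  vector1: [0.1, 0.2, 0.3, 0.4],
  vector2: [0.2, 0.3, 0.4, 0.5]
};

// Batch comparison: Metric, Source Vector, Target Vectors
const batchInput = {
  metric: "cosine",
  sourceVector: [0.1, 0.2, 0.3],
  targetVectors: [
    [0.2, 0.3, 0.4],
    [0.3, 0.4, 0.5],
    [0.4, 0.5, 0.6]
  ]
};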

Component Outputs

  • Similarity Score: Raw similarity value based on the selected metric

    Example: 0.95 for cosine similarity

  • Normalized Score: Normalized value between 0 and 1 (1 being most similar)

    Example: 0.97 (normalized from the raw score)

  • Metadata: Additional information about the comparison

    Example: vector_dimensions: 768, comparison_time: 0.023

  • Batch Results: For batch comparisons, array of similarity scores

    Example: [0.95, 0.87, 0.76]
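
Taken together, a single-comparison result might look like the sketch below. The score and normalized_score paths match the implementation example further down; the metadata and batch fields are illustrative assumptions:

// Illustrative single-comparison result (shape partly assumed)
const exampleResult = {
  similarity: {
    score: 0.95,             // Raw similarity value for the selected metric
    normalized_score: 0.97   // Normalized to the 0-1 range
  },
  metadata: {
    vector_dimensions: 768,  // Dimensionality of the compared vectors
    comparison_time: 0.023   // Time taken for the comparison
  }
};

// Illustrative batch result: one score per target vector (field name assumed)
const exampleBatchResult = { scores: [0.95, 0.87, 0.76] };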

Metric Comparison

Cosine Similarity

Measures the cosine of the angle between two vectors, focusing on direction rather than magnitude

  • Range: -1 to 1 (normalized to 0-1)
  • Interpretation: Higher value means greater similarity
  • Ideal for: Text embeddings, semantic search
  • Note: Ignores magnitude differences; best for normalized vectors
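
For reference, cosine similarity can be computed directly from two raw vectors. This is an illustrative sketch of the formula, not the component's internal implementation:

// Cosine similarity: dot(a, b) / (||a|| * ||b||)
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const magB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dot / (magA * magB);
}

cosineSimilarity([0.1, 0.2, 0.3], [0.2, 0.3, 0.4]); // ≈ 0.9926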

Euclidean Distance

Measures the straight-line distance between two points in Euclidean space

  • Range: 0 to ∞ (normalized to 0-1, where 1 is closer)
  • Interpretation: Lower value means greater similarity
  • Ideal for: Physical or spatial data, when magnitude matters
  • Note: Sensitive to scale; consider normalizing input vectors
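
A comparable sketch for Euclidean distance, including one common way of mapping the distance into a 0-1 similarity score (the component's own normalization may differ):

// Euclidean distance: square root of the summed squared element-wise differences
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

const distance = euclideanDistance([0.1, 0.2, 0.3], [0.2, 0.3, 0.4]); // ≈ 0.1732
const similarity = 1 / (1 + distance);                                // ≈ 0.8524, closer to 1 means more similar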

Dot Product

Simple multiplication and summation of corresponding elements

  • Range: -∞ to ∞ (normalized to 0-1)
  • Interpretation: Higher value means greater similarity
  • Ideal for: Quick calculations, pre-normalized vectors
  • Note: Affected by both direction and magnitude
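
And a sketch of the dot product, which coincides with cosine similarity when both vectors are already L2-normalized:

// Dot product: sum of element-wise products
function dotProduct(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

dotProduct([0.1, 0.2, 0.3], [0.2, 0.3, 0.4]); // 0.2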

Implementation Example

// Calculate cosine similarity
const cosineSim = new EmbeddingSimilarity({ metric: "cosine" });

// Compare two vectors
const result = await cosineSim.compare({
  vector1: [0.1, 0.2, 0.3],
  vector2: [0.2, 0.3, 0.4]
});

// Batch comparison
const batchResult = await cosineSim.compareBatch({
  sourceVector: [0.1, 0.2, 0.3],
  targetVectors: [
    [0.2, 0.3, 0.4],
    [0.3, 0.4, 0.5],
    [0.4, 0.5, 0.6]
  ]
});

// Using different metrics
const euclideanSim = new EmbeddingSimilarity({ metric: "euclidean" });
const dotProductSim = new EmbeddingSimilarity({ metric: "dot-product" });

console.log(result.similarity.score);            // Raw similarity score
console.log(result.similarity.normalized_score); // Normalized score between 0-1

Use Cases

  • Semantic Search: Find similar documents or content based on embedding similarity
  • Recommendation Systems: Compare user preference embeddings with content embeddings
  • Duplicate Detection: Identify similar or duplicate content
  • Clustering: Group similar items based on embedding similarity
  • Relevance Ranking: Sort search results by similarity to a query (see the sketch after this list)
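
As an illustration of the relevance-ranking use case, batch scores can be paired with document identifiers and sorted. The compareBatch call follows the implementation example above; the document IDs, embeddings, and the way scores are read out of the batch result are placeholders:

// Rank placeholder documents by similarity to a query embedding
const ranker = new EmbeddingSimilarity({ metric: "cosine" });

const documents = ["doc-a", "doc-b", "doc-c"]; // Placeholder document IDs
const batchResult = await ranker.compareBatch({
  sourceVector: [0.1, 0.2, 0.3],               // Query embedding (placeholder)
  targetVectors: [
    [0.2, 0.3, 0.4],
    [0.3, 0.4, 0.5],
    [0.4, 0.5, 0.6]
  ]
});

// Assuming the batch result yields an array of scores such as [0.95, 0.87, 0.76]
const scores = [0.95, 0.87, 0.76];             // Placeholder for scores read from batchResult
const ranked = documents
  .map((id, i) => ({ id, score: scores[i] }))
  .sort((a, b) => b.score - a.score);          // Highest similarity first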

Best Practices

  • Normalize vectors when using cosine similarity to ensure consistent results (see the normalization sketch after this list)
  • Select the appropriate metric based on your specific use case
  • For large-scale comparisons, consider a dedicated vector search library or database such as FAISS or Pinecone
  • Validate vector dimensions before comparisons to avoid runtime errors
  • Use batch processing for multiple comparisons to improve performance
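
For the normalization recommendation, a simple L2 normalization looks like the sketch below; production code may rely on an optimized math library instead:

// L2-normalize a vector so its magnitude is 1.
// After normalization, dot product and cosine similarity give the same result.
function l2Normalize(vector) {
  const magnitude = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return magnitude === 0 ? vector : vector.map((x) => x / magnitude);
}

l2Normalize([0.1, 0.2, 0.3]); // ≈ [0.267, 0.535, 0.802]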