Documentation

HuggingFace Embeddings

Generate embeddings using HuggingFace's extensive collection of open-source models. Supports both hosted inference endpoints and local model deployment for maximum flexibility.

HuggingFace Embeddings Component

HuggingFace Embeddings component interface and configuration

API Key Notice: When using HuggingFace's hosted inference API, you'll need a valid API token with appropriate permissions. Some models have rate limits that may affect high-volume embedding generation.

Component Inputs

  • API Key: Your HuggingFace API token for authentication

    Example: "hf_abcdefghijklmnopqrstuvwxyz"

  • Model Name: The identifier of the embedding model to use

    Examples: "sentence-transformers/all-mpnet-base-v2", "BAAI/bge-large-en-v1.5"

  • Inference Endpoint: Optional custom endpoint for hosted models

    Example: "https://your-endpoint.huggingface.cloud"

  • Input Text: The text to convert to embeddings

    Example: "This is a sample text for embedding generation."

Component Outputs

  • Embeddings: Vector representation of the input text

    Example: [0.018, -0.032, 0.067, ...]

  • Dimensions: Size of the embedding vector

    Example: 384 for all-MiniLM-L6-v2, 768 for all-mpnet-base-v2, 1024 for bge-large-en

  • Model Info: Details about the model used

    Example: { name: "sentence-transformers/all-mpnet-base-v2", version: "1.0.0", type: "huggingface" }
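Embedding vectors like the ones above are typically compared with cosine similarity to measure how semantically close two texts are. A minimal sketch (the vectors below are illustrative values, not real model output):

```javascript
// Cosine similarity between two embedding vectors:
// dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have equal dimensions");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Illustrative vectors; real embeddings have 384-1024 dimensions.
const v1 = [0.018, -0.032, 0.067];
const v2 = [0.020, -0.030, 0.065];
console.log(cosineSimilarity(v1, v2)); // close to 1.0 for near-identical directions
```

Higher scores indicate closer semantic meaning; a threshold (often 0.7-0.8, tuned per model) decides what counts as a match.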

Model Comparison

sentence-transformers/all-mpnet-base-v2

High-quality general-purpose embedding model with strong performance across many tasks

  • Dimensions: 768
  • Performance: Excellent semantic understanding
  • Size: 420MB
  • Language Support: English-focused
  • Ideal for: High-quality general-purpose embeddings

sentence-transformers/all-MiniLM-L6-v2

Lightweight and efficient model with good performance-to-size ratio

  • Dimensions: 384
  • Performance: Good semantic understanding
  • Size: 80MB
  • Language Support: English-focused
  • Ideal for: Resource-constrained environments or applications with speed requirements

BAAI/bge-large-en-v1.5

State-of-the-art embedding model specifically optimized for retrieval tasks

  • Dimensions: 1024
  • Performance: Superior for retrieval and similarity
  • Size: 1.3GB
  • Language Support: English
  • Ideal for: Retrieval-augmented generation and semantic search applications

Implementation Example

// Using the hosted inference API
const embedder = new HuggingFaceEmbeddor({
  apiKey: process.env.HUGGINGFACE_API_KEY,
  modelName: "sentence-transformers/all-mpnet-base-v2"
});

// Using a custom inference endpoint
const customEmbedder = new HuggingFaceEmbeddor({
  apiKey: process.env.HUGGINGFACE_API_KEY,
  modelName: "BAAI/bge-large-en-v1.5",
  inferenceEndpoint: "https://your-endpoint.huggingface.cloud"
});

// Generate embeddings for a single input
const result = await embedder.embed({
  input: "Your text to embed"
});

// Batch processing for multiple inputs
const batchResult = await embedder.embedBatch({
  inputs: [
    "First text to embed",
    "Second text to embed"
  ]
});

console.log(result.embeddings);
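Because hosted models enforce rate limits, embedding calls can intermittently fail with 429 responses. One way to handle this is a retry wrapper with exponential backoff; the sketch below is a generic helper (the `withRetry` name and parameters are hypothetical, not part of the component's API):

```javascript
// Retry an async operation with exponential backoff.
// Hypothetical helper for wrapping rate-limited embedding calls.
async function withRetry(fn, maxAttempts = 3, baseDelayMs = 500) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts) throw err; // out of attempts: surface the error
      const delay = baseDelayMs * 2 ** (attempt - 1); // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: wrap the embed call so transient 429s are retried.
// const result = await withRetry(() => embedder.embed({ input: "Your text to embed" }));
```

In production you would typically retry only on retryable status codes (429, 5xx) and honor any Retry-After header the API returns.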

Use Cases

  • Open Source Deployments: Use high-quality embeddings with open source models
  • Domain-Specific Applications: Select from specialized models for particular domains
  • Resource-Constrained Environments: Choose smaller models for edge devices or limited compute
  • On-premise RAG Systems: Deploy HuggingFace models entirely within your infrastructure
  • Multi-language Support: Select from models trained on multiple languages

Best Practices

  • Choose appropriate model size based on your performance needs and resource constraints
  • Consider using Inference Endpoints for production deployments with high reliability requirements
  • Implement caching for frequently used embeddings to reduce API calls
  • Monitor API usage limits to avoid rate limiting in high-traffic applications
  • Test different models to find the best performing one for your specific use case
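The caching recommendation above can be as simple as an in-memory map keyed by input text. A minimal sketch (the `CachedEmbedder` wrapper and the embed-function interface are hypothetical; a production cache would also bound its size and key by model name):

```javascript
// In-memory cache wrapper around any async embed function.
// Repeated inputs are served from the cache instead of re-calling the API.
class CachedEmbedder {
  constructor(embedFn) {
    this.embedFn = embedFn; // async (text) => embedding vector
    this.cache = new Map();
  }

  async embed(text) {
    if (this.cache.has(text)) {
      return this.cache.get(text); // cache hit: no API call
    }
    const embedding = await this.embedFn(text);
    this.cache.set(text, embedding);
    return embedding;
  }
}

// Usage: wrap the component's embed call.
// const cached = new CachedEmbedder((text) => embedder.embed({ input: text }));
// const vector = await cached.embed("frequently used text");
```

For high-traffic deployments, an external store such as Redis with a TTL is a common step up from a per-process map.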