Text Embedder

A versatile text embedding component that supports multiple embedding models and provides standardized output format. Ideal for converting text into vector representations for semantic search, clustering, and similarity analysis.
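Embedding vectors are typically compared with cosine similarity: vectors pointing in the same direction score 1.0, orthogonal (unrelated) vectors score 0.0. A minimal, self-contained sketch of that comparison:

```javascript
// Cosine similarity between two embedding vectors. A score near 1.0 means
// the texts are semantically similar; near 0.0 means they are unrelated.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical vectors score exactly 1.
console.log(cosineSimilarity([1, 0, 0], [1, 0, 0])); // 1
```

This is the comparison that semantic search, clustering, and similarity analysis build on; in practice a vector store performs it over many stored embeddings at once.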

Text Embedder Component

Text Embedder component interface and configuration

Model Configuration Notice: When configuring this component, ensure that the embedding model type and dimensions match the expected output format for your downstream applications. Inconsistent dimensions can cause issues with vector stores and similarity searches.

Component Inputs

  • Embedding Model: Configuration object for the embedding model

    Example: modelType: openai, dimensions: 1536, options: {...}

  • Message: Text content to be embedded

    Example: "The quick brown fox jumps over the lazy dog"

  • Model Type: Type of embedding model to use

    Example: "openai", "huggingface", "custom"

  • Dimensions: Size of the embedding vector

    Example: 768, 1024, 1536, 3072

  • Model Options: Specific configuration for the selected model

    Example: model: text-embedding-ada-002, apiKey: your-api-key

Component Outputs

  • Embeddings: Vector representation of the input text

    Example: [0.021, -0.038, 0.075, ...]

  • Metadata: Information about the embedding process

    Example: model_type: openai, dimensions: 1536, processing_time_ms: 178

  • Status: Success or error information

    Example: success: true, error: null
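Given the output fields above (embeddings, status), a result can be validated before it is handed to a vector store. A minimal guard, using the field names shown in the examples; the exact result shape is an assumption based on those examples:

```javascript
// Checks that the embedding call succeeded and the vector has the
// dimensions the downstream application expects (see the Model
// Configuration Notice above). Field names follow the output examples.
function validateEmbeddingResult(result, expectedDimensions) {
  if (!result.status || result.status.success !== true) {
    return { ok: false, reason: result.status?.error ?? "unknown error" };
  }
  if (!Array.isArray(result.embeddings) ||
      result.embeddings.length !== expectedDimensions) {
    return { ok: false, reason: "dimension mismatch" };
  }
  return { ok: true, reason: null };
}
```

Rejecting dimension mismatches early is cheaper than discovering them as insert failures or nonsensical similarity scores inside the vector store.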

Model Type Comparison

OpenAI Models

High-quality embeddings from OpenAI's API service

modelType: 'openai'
dimensions: 1536 (text-embedding-3-small) or 3072 (text-embedding-3-large)
options: {
  model: 'text-embedding-3-small',
  apiKey: 'your-api-key'
}

Ideal for: High-quality embeddings when API access is available

Hugging Face Models

Local or hosted embeddings using Hugging Face Transformers

modelType: 'huggingface'
dimensions: varies by model (typically 384-1024)
options: {
  model: 'sentence-transformers/all-MiniLM-L6-v2',
  quantize: true
}

Ideal for: Local embedding generation without API dependencies

Custom Models

Integrate with your own embedding models or third-party services

modelType: 'custom'
dimensions: specified by implementation
options: {
  // Custom parameters for your model
}

Ideal for: Specialized embedding models or proprietary implementations

Implementation Example

const embedder = new TextEmbedder({
  embeddingModel: {
    modelType: "openai",
    dimensions: 1536,
    options: {
      model: "text-embedding-ada-002",
      apiKey: process.env.OPENAI_API_KEY
    }
  }
});

// Single text embedding
const result = await embedder.embed({
  message: "Your text to embed"
});

// Batch processing
const batchResult = await embedder.embedBatch({
  messages: [
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
  ]
});

// Custom model configuration
const customEmbedder = new TextEmbedder({
  embeddingModel: {
    modelType: "custom",
    dimensions: 768,
    options: {
      modelPath: "path/to/model",
      normalize: true,
      batchSize: 32
    }
  }
});

console.log(result.embeddings);

Use Cases

  • Model Abstraction: Standardize embedding interfaces across multiple providers
  • Flexible RAG Systems: Build retrieval systems that can switch between embedding models
  • Multi-Model Testing: Compare embedding quality across different providers
  • Hybrid Search: Combine results from multiple embedding models
  • Fallback Architecture: Implement model redundancy with automatic fallbacks
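The fallback architecture above can be sketched as a small wrapper that tries each configured embedder in order. The two embedders in the test below are hypothetical stand-ins exposing the same embed({ message }) method as the implementation example; they are not part of the component itself:

```javascript
// Try each embedder in priority order; return the first successful result.
// Each entry is expected to expose embed({ message }) as shown in the
// implementation example above.
async function embedWithFallback(embedders, message) {
  let lastError;
  for (const embedder of embedders) {
    try {
      return await embedder.embed({ message });
    } catch (err) {
      lastError = err; // remember the failure and move to the next model
    }
  }
  throw new Error(`All embedding models failed: ${lastError}`);
}
```

Note that fallback models should produce vectors of the same dimensions, or the downstream index must be partitioned per model: vectors from different embedding models are not directly comparable.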

Best Practices

  • Choose embedding dimensions appropriate for your application's needs
  • Implement proper error handling for model failures
  • Cache embeddings for frequently used text to reduce computation
  • Pre-process text appropriately for the chosen embedding model
  • Use batch processing for multiple texts to improve throughput
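The caching practice above can be implemented as a thin memoizing wrapper. A minimal in-memory sketch, assuming the wrapped function is any async (message) => vector implementation (a production cache would also bound its size and key on the model identity, since different models yield different vectors for the same text):

```javascript
// Memoize embeddings for repeated text so identical inputs hit the
// underlying model only once. Cache is keyed by the exact input string.
function withEmbeddingCache(embedFn) {
  const cache = new Map();
  return async function cachedEmbed(message) {
    if (cache.has(message)) return cache.get(message);
    const embedding = await embedFn(message);
    cache.set(message, embedding);
    return embedding;
  };
}
```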