OpenAI Embeddings

Generate high-quality embeddings using OpenAI's embedding models. The component offers extensive configuration options, including retries, request timeouts, and chunked batching, along with robust error handling.

OpenAI Embeddings component interface and configuration

API Key Notice: Ensure your API key has sufficient quota for your embedding needs. OpenAI rate limits API requests based on your tier, and embedding large volumes of text may require a higher tier subscription.
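
If a request is rejected for exceeding the rate limit, retrying with exponential backoff is usually enough. The sketch below assumes the embedder object from the implementation example further down and a 429 status on the thrown error; both are illustrative assumptions, not a documented contract.

// Sketch: retry an embedding call with exponential backoff on HTTP 429.
async function embedWithBackoff(embedder, input, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await embedder.embed({ input });
    } catch (err) {
      // Retry only when rate limited; re-throw anything else or the last failure
      if (!err || err.status !== 429 || attempt === maxAttempts - 1) throw err;
      // Back off: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
    }
  }
}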

Component Inputs

  • OpenAI API Key: Your OpenAI API authentication key

    Example: "sk-abcdefg123456789"

  • Model: The embedding model to use

    Example: "text-embedding-3-small", "text-embedding-3-large", "text-embedding-ada-002"

  • OpenAI API Base: Optional custom API endpoint

    Example: "https://api.openai.com/v1" or your custom endpoint

  • OpenAI API Type: Type of API service

    Example: "openai" or "azure"

  • OpenAI API Version: Version of the API

    Example: "2024-02-15"

  • OpenAI Organization: Organization ID for team accounts

    Example: "org-123456789"

Component Outputs

  • Embeddings: Vector representations of the input text

    Example: [0.012, -0.045, 0.067, ...]

  • Token Usage: Number of tokens used for the request

    Example: total_tokens: 125, prompt_tokens: 125
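
Assuming the component surfaces OpenAI's standard response shape, the two outputs can be read like this. The payload below is a truncated sample for illustration, not real output.

// Truncated sample payload in OpenAI's response shape, used to show how the
// two outputs are read.
const payload = {
  data: [{ object: "embedding", index: 0, embedding: [0.012, -0.045, 0.067] }],
  usage: { prompt_tokens: 125, total_tokens: 125 }
};
const embedding = payload.data[0].embedding; // Embeddings output
const usage = payload.usage;                 // Token Usage output
console.log(embedding.length, usage.total_tokens);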

Model Comparison

text-embedding-3-small

Efficient model with 1536 dimensions, offering a good balance between quality and cost

  • Dimensions: 1536
  • Contextual Understanding: Strong
  • Languages: Multilingual
  • Ideal for: Most general embedding tasks

text-embedding-3-large

Highest quality embeddings with 3072 dimensions, optimal for tasks requiring maximum accuracy

  • Dimensions: 3072
  • Contextual Understanding: Superior
  • Languages: Multilingual with enhanced capabilities
  • Ideal for: High-precision semantic search, knowledge retrieval, and nuanced text comparison

text-embedding-ada-002

Legacy model with 1536 dimensions, maintained for backward compatibility

  • Dimensions: 1536
  • Contextual Understanding: Good
  • Languages: Primarily optimized for English
  • Ideal for: Legacy systems or when compatibility is required
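
A quick way to sanity-check stored vectors against the table above is to compare their length with the expected model's dimension count; the lookup map below simply restates those numbers.

// Dimension counts from the comparison above, used to validate stored vectors.
const MODEL_DIMENSIONS = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
  "text-embedding-ada-002": 1536
};

function hasExpectedDimensions(vector, model) {
  return Array.isArray(vector) && vector.length === MODEL_DIMENSIONS[model];
}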

Implementation Example

// Basic configuration
const embedder = new OpenAIEmbeddor({
  openaiApiKey: process.env.OPENAI_API_KEY,
  model: "text-embedding-3-small"
});

// Advanced configuration
const advancedEmbedder = new OpenAIEmbeddor({
  openaiApiKey: process.env.OPENAI_API_KEY,
  model: "text-embedding-3-large",
  openaiApiBase: "https://custom-endpoint.com",
  openaiApiType: "azure",
  openaiApiVersion: "2024-02-15",
  openaiOrganization: "org-id",
  maxRetries: 5,
  requestTimeout: 30000,
  chunkSize: 1000,
  showProgressBar: true,
  skipEmpty: true,
  tiktokenEnable: true
});

// Generate embeddings
const result = await embedder.embed({ input: "Your text to embed" });

// The result contains the embedding vectors
console.log(result);

Use Cases

  • Semantic Search: Create a vector store for similarity search
  • RAG Applications: Enhance retrieval-augmented generation with high-quality embeddings
  • Document Clustering: Group similar documents based on semantic similarity
  • Recommendation Systems: Build content recommendation engines
  • Duplicate Content Detection: Identify similar or duplicate content
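
Most of these use cases reduce to comparing embedding vectors. Below is a minimal cosine-similarity sketch for the semantic search and duplicate-detection cases; the query and document vectors are assumed to come from the component above, and the `{ id, vector }` shape is a placeholder.

// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank pre-computed document embeddings against a query embedding.
function rankBySimilarity(queryVector, docEmbeddings) {
  return docEmbeddings
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score);
}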

Best Practices

  • Use environment variables for your API keys in production environments
  • Implement caching for embeddings to reduce API costs for repeated content
  • Use text-embedding-3-small for most use cases, and text-embedding-3-large when precision is critical
  • Pre-process text by removing unnecessary whitespace and formatting to reduce token usage
  • Batch similar-length texts together for more efficient processing
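
The sketch below combines the caching and pre-processing recommendations above, assuming an in-memory Map and the embedder interface from the implementation example; swap in a persistent store for real workloads.

import { createHash } from "node:crypto";

// Cache embeddings by a hash of the normalized text so repeated content
// never triggers a second API call.
const cache = new Map();

async function embedCached(embedder, text) {
  const normalized = text.trim().replace(/\s+/g, " "); // strip excess whitespace to save tokens
  const key = createHash("sha256").update(normalized).digest("hex");
  if (!cache.has(key)) {
    cache.set(key, await embedder.embed({ input: normalized }));
  }
  return cache.get(key);
}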