OpenAI Embeddings

Generate high-quality embeddings using OpenAI's embedding models. The component offers extensive configuration options, including retries, request timeouts, and chunked batching, along with robust error handling.

OpenAI Embeddings component interface and configuration

API Key Notice: Ensure your API key has sufficient quota for your embedding needs. OpenAI rate limits API requests based on your tier, and embedding large volumes of text may require a higher tier subscription.
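
If a request is rejected for exceeding the rate limit, retrying with exponential backoff is usually enough. The sketch below assumes the embedder object from the implementation example further down and a 429 status on the thrown error; both are illustrative assumptions, not a documented contract.

// Sketch: retry an embedding call with exponential backoff on HTTP 429.
async function embedWithBackoff(embedder, input, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await embedder.embed({ input });
    } catch (err) {
      // Retry only when rate limited; re-throw anything else or the last failure
      if (!err || err.status !== 429 || attempt === maxAttempts - 1) throw err;
      // Back off: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
    }
  }
}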

Component Inputs

  • OpenAI API Key: Your OpenAI API authentication key

    Example: "sk-abcdefg123456789"

  • Model: The embedding model to use

    Example: "text-embedding-3-small", "text-embedding-3-large", "text-embedding-ada-002"

  • OpenAI API Base: Optional custom API endpoint

    Example: "https://api.openai.com/v1" or your custom endpoint

  • OpenAI API Type: Type of API service

    Example: "openai" or "azure"

  • OpenAI API Version: Version of the API

    Example: "2024-02-15"

  • OpenAI Organization: Organization ID for team accounts

    Example: "org-123456789"

Component Outputs

  • Embeddings: Vector representations of the input text

    Example: [0.012, -0.045, 0.067, ...]

  • Token Usage: Number of tokens used for the request

    Example: total_tokens: 125, prompt_tokens: 125
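
Assuming the component surfaces OpenAI's standard response shape, the two outputs can be read like this. The payload below is a truncated sample for illustration, not real output.

// Truncated sample payload in OpenAI's response shape, used to show how the
// two outputs are read.
const payload = {
  data: [{ object: "embedding", index: 0, embedding: [0.012, -0.045, 0.067] }],
  usage: { prompt_tokens: 125, total_tokens: 125 }
};
const embedding = payload.data[0].embedding; // Embeddings output
const usage = payload.usage;                 // Token Usage output
console.log(embedding.length, usage.total_tokens);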

Model Comparison

text-embedding-3-small

Efficient model with 1536 dimensions, offering a good balance between quality and cost

  • Dimensions: 1536
  • Contextual Understanding: Strong
  • Languages: Multilingual
  • Ideal for: Most general embedding tasks

text-embedding-3-large

Highest quality embeddings with 3072 dimensions, optimal for tasks requiring maximum accuracy

  • Dimensions: 3072
  • Contextual Understanding: Superior
  • Languages: Multilingual with enhanced capabilities
  • Ideal for: High-precision semantic search, knowledge retrieval, and nuanced text comparison

text-embedding-ada-002

Legacy model with 1536 dimensions, maintained for backward compatibility

  • Dimensions: 1536
  • Contextual Understanding: Good
  • Languages: Primarily optimized for English
  • Ideal for: Legacy systems or when compatibility is required
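
A quick way to sanity-check stored vectors against the table above is to compare their length with the expected model's dimension count; the lookup map below simply restates those numbers.

// Dimension counts from the comparison above, used to validate stored vectors.
const MODEL_DIMENSIONS = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
  "text-embedding-ada-002": 1536
};

function hasExpectedDimensions(vector, model) {
  return Array.isArray(vector) && vector.length === MODEL_DIMENSIONS[model];
}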

Implementation Example

// Basic configuration
const embedder = new OpenAIEmbeddor({
  openaiApiKey: process.env.OPENAI_API_KEY,
  model: "text-embedding-3-small"
});

// Advanced configuration
const advancedEmbedder = new OpenAIEmbeddor({
  openaiApiKey: process.env.OPENAI_API_KEY,
  model: "text-embedding-3-large",
  openaiApiBase: "https://custom-endpoint.com",
  openaiApiType: "azure",
  openaiApiVersion: "2024-02-15",
  openaiOrganization: "org-id",
  maxRetries: 5,
  requestTimeout: 30000,
  chunkSize: 1000,
  showProgressBar: true,
  skipEmpty: true,
  tiktokenEnable: true
});

// Generate embeddings
const result = await embedder.embed({ input: "Your text to embed" });

// The result contains the embedding vectors
console.log(result);

Use Cases

  • Semantic Search: Create a vector store for similarity search
  • RAG Applications: Enhance retrieval-augmented generation with high-quality embeddings
  • Document Clustering: Group similar documents based on semantic similarity
  • Recommendation Systems: Build content recommendation engines
  • Duplicate Content Detection: Identify similar or duplicate content
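
Most of these use cases reduce to comparing embedding vectors. Below is a minimal cosine-similarity sketch for the semantic search and duplicate-detection cases; the query and document vectors are assumed to come from the component above, and the `{ id, vector }` shape is a placeholder.

// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank pre-computed document embeddings against a query embedding.
function rankBySimilarity(queryVector, docEmbeddings) {
  return docEmbeddings
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score);
}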

Best Practices

  • Use environment variables for your API keys in production environments
  • Implement caching for embeddings to reduce API costs for repeated content
  • Use text-embedding-3-small for most use cases, and text-embedding-3-large when precision is critical
  • Pre-process text by removing unnecessary whitespace and formatting to reduce token usage
  • Batch similar-length texts together for more efficient processing
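
The sketch below combines the caching and pre-processing recommendations above, assuming an in-memory Map and the embedder interface from the implementation example; swap in a persistent store for real workloads.

import { createHash } from "node:crypto";

// Cache embeddings by a hash of the normalized text so repeated content
// never triggers a second API call.
const cache = new Map();

async function embedCached(embedder, text) {
  const normalized = text.trim().replace(/\s+/g, " "); // strip excess whitespace to save tokens
  const key = createHash("sha256").update(normalized).digest("hex");
  if (!cache.has(key)) {
    cache.set(key, await embedder.embed({ input: normalized }));
  }
  return cache.get(key);
}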