Vertex AI Embeddor

Enterprise-Grade Embeddings
Vertex AI Embeddor Diagram

Overview

Generate high-quality embeddings using Google's Vertex AI platform. Features advanced parameter control, parallel processing, and enterprise-grade reliability.

Available Models

  • textembedding-gecko
  • textembedding-gecko-multilingual

Key Features

  • Parallel processing
  • Advanced parameter tuning
  • Streaming support
  • Enterprise security

Configuration

Required Parameters

  • credentials: Google Cloud credentials
  • location: GCP region
  • project: GCP project ID
  • modelName: Vertex AI model name

Optional Parameters

  • maxOutputTokens: Maximum output tokens (Default: 1024)
  • maxRetries: Maximum retry attempts (Default: 3)
  • n: Number of outputs (Default: 1)
  • requestParallelism: Maximum concurrent requests (Default: 5)
  • stop: Stop sequences
  • streaming: Stream responses (Default: false)
  • temperature: Sampling temperature (Default: 0.0)
  • topK: Top-K sampling (Default: 40)
  • topP: Top-P (nucleus) sampling (Default: 0.95)

Example Usage

// Basic configuration
const embedder = new VertexAIEmbeddor({
  credentials: {
    client_email: "your-service-account@project.iam.gserviceaccount.com",
    private_key: "your-private-key"
  },
  location: "us-central1",
  project: "your-project-id",
  modelName: "textembedding-gecko"
});

// Advanced configuration
const advancedEmbedder = new VertexAIEmbeddor({
  credentials: {
    client_email: "your-service-account@project.iam.gserviceaccount.com",
    private_key: "your-private-key"
  },
  location: "us-central1",
  project: "your-project-id",
  modelName: "textembedding-gecko-multilingual",
  maxOutputTokens: 2048,
  maxRetries: 5,
  n: 3,
  requestParallelism: 10,
  stop: ["END"],
  streaming: true,
  temperature: 0.7,
  topK: 50,
  topP: 0.8
});

// Generate embeddings
const result = await embedder.embed({
  input: "Your text to embed"
});

// Batch processing with streaming
const streamingResult = await advancedEmbedder.embedBatch({
  inputs: [
    "First text to embed",
    "Second text to embed"
  ],
  streaming: true
});
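If a call still fails after the built-in `maxRetries` attempts, the error surfaces to the caller, so it is worth wrapping calls in your own error handling. A minimal retry-with-backoff sketch for a generic async operation (the `withRetry` helper is an illustrative assumption, not part of the Vertex AI Embeddor API):

```typescript
// Retry an async operation up to `attempts` times with exponential backoff.
async function withRetry<T>(
  op: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Back off: 100ms, 200ms, 400ms, ... before the next attempt.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Any embedder call can then be wrapped, e.g. `withRetry(() => embedder.embed({ input: text }))`.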

Best Practices

  • Use service account authentication
  • Implement proper error handling
  • Monitor API quotas
  • Cache frequent embeddings
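The caching tip above can be sketched as a thin wrapper that memoizes vectors by input text. `EmbedFn` stands in for a call to the embedder (shown synchronous here for brevity); the wrapper is an assumption for illustration, not part of the Vertex AI Embeddor API:

```typescript
// Minimal in-memory cache for embeddings, keyed by input text.
type EmbedFn = (input: string) => number[];

function withCache(embedFn: EmbedFn): EmbedFn {
  const cache = new Map<string, number[]>();
  return (input: string): number[] => {
    const hit = cache.get(input);
    if (hit !== undefined) return hit; // cache hit: skip the API call
    const vector = embedFn(input);
    cache.set(input, vector);
    return vector;
  };
}

// Example with a fake embedder that counts how often it is invoked.
let calls = 0;
const fakeEmbed: EmbedFn = (input) => {
  calls++;
  return [input.length, 0.5];
};
const cachedEmbed = withCache(fakeEmbed);
cachedEmbed("hello");
cachedEmbed("hello"); // second call is served from the cache; calls stays at 1
```

For production use, a bounded or TTL-based cache is preferable to an unbounded `Map`.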

Performance Tips

  • Optimize request parallelism
  • Use appropriate batch sizes
  • Monitor token usage
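Choosing a batch size can be as simple as splitting inputs into fixed-size chunks before handing each chunk to `embedBatch`. The chunk size of 5 below mirrors the default `requestParallelism`; the `chunk` helper itself is an illustrative sketch, not part of the library:

```typescript
// Split an array of inputs into chunks of at most `size` items,
// so each chunk can be sent as one embedBatch call.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

const inputs = ["a", "b", "c", "d", "e", "f", "g"];
const batches = chunk(inputs, 5); // → [["a","b","c","d","e"], ["f","g"]]
```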

Response Format

{
  "embeddings": {
    "vectors": number[][],
    "dimensions": number,
    "model": string
  },
  "usage": {
    "total_tokens": number,
    "prompt_tokens": number,
    "completion_tokens": number
  },
  "metadata": {
    "project_id": string,
    "location": string,
    "model_version": string,
    "processing_time": number,
    "request_params": {
      "temperature": number,
      "topK": number,
      "topP": number,
      "n": number
    }
  },
  "status": {
    "success": boolean,
    "error": string | null
  }
}
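The shape above can be transcribed into a TypeScript interface, with a small guard that checks the status block before using the vectors. The interface follows the documented schema; the `unwrapVectors` helper is an illustrative assumption:

```typescript
// Transcription of the documented response format.
interface EmbedResponse {
  embeddings: { vectors: number[][]; dimensions: number; model: string };
  usage: { total_tokens: number; prompt_tokens: number; completion_tokens: number };
  metadata: {
    project_id: string;
    location: string;
    model_version: string;
    processing_time: number;
    request_params: { temperature: number; topK: number; topP: number; n: number };
  };
  status: { success: boolean; error: string | null };
}

// Throw early when the embedder reports a failure, otherwise return the vectors.
function unwrapVectors(res: EmbedResponse): number[][] {
  if (!res.status.success) {
    throw new Error(res.status.error ?? "embedding request failed");
  }
  return res.embeddings.vectors;
}
```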