Vertex AI Models
A drag-and-drop component for integrating Google Cloud's Vertex AI models into your workflow. Configure model parameters and connect inputs/outputs to other components.

Vertex AI component interface and configuration
GCP Setup Required: This component requires a Google Cloud Platform account with Vertex AI API enabled and appropriate service account credentials. Ensure you have set up a GCP project and configured billing before using this component.
Component Inputs
- Input: Text input for the model
Example: "Explain how multimodal large language models work."
- System Message: System prompt to guide model behavior
Example: "You are a helpful AI assistant with expertise in machine learning and AI technologies."
- Stream: Toggle for streaming responses
Example: true (for real-time token streaming) or false (for complete response)
- Model Name: The Vertex AI model to use
Example: "gemini-1.5-pro", "gemini-1.5-flash", "gemini-pro"
- Credentials: Google Cloud credentials file
Example: Path to service account key JSON file
- Project: Google Cloud project ID
Example: "my-vertex-project-123456"
- Location: Region where Vertex AI is deployed
Example: "us-central1", "europe-west4"
Component Outputs
- Text: Generated text output
Example: "Multimodal large language models are AI systems that can process and generate content across multiple types of data..."
- Language Model: Model information and metadata
Example: model: gemini-1.5-pro, usage: {prompt_tokens: 50, completion_tokens: 180, total_tokens: 230}
Model Parameters
Max Output Tokens
Maximum number of tokens to generate in the response
Default: Model-dependent
Range: 1 to model maximum (varies by model)
Recommendation: Set based on expected response length
Temperature
Controls randomness in the output - higher values increase creativity
Default: 0.0
Range: 0.0 to 1.0
Recommendation: Lower (0.0-0.3) for factual/consistent responses, Higher (0.7-1.0) for creative tasks
Top K
Limits vocabulary for each generation step to k most likely tokens
Default: 40
Range: 1 to any positive integer
Recommendation: Higher values allow for more diversity in responses
Top P
Nucleus sampling parameter - controls diversity of generated text
Default: 0.95
Range: 0.0 to 1.0
Recommendation: Lower values (e.g., 0.5) for more focused text generation
Max Retries
Number of retry attempts for failed requests
Default: 1
Range: 0 to any reasonable number
Recommendation: Increase for critical applications
Verbose
Toggle detailed output logging for debugging
Options: true/false
Default: false
Recommendation: Enable during development and testing
Supported Models
Gemini Models
Google's latest multimodal models
- gemini-1.5-pro: Most powerful model with 1M context window
- gemini-1.5-flash: Efficient model for faster responses
- gemini-pro: Earlier generation model
- gemini-ultra: Enterprise-focused model with advanced capabilities
PaLM Models
Text-only models (older generation)
- text-bison: General purpose text model
- chat-bison: Optimized for conversational applications
Implementation Example
// Basic configuration
const vertexAI = {
modelName: "gemini-1.5-pro",
project: "my-vertex-project-123456",
location: "us-central1",
credentials: process.env.GOOGLE_APPLICATION_CREDENTIALS
};
// Advanced configuration
const advancedVertexAI = {
modelName: "gemini-1.5-pro",
project: "my-vertex-project-123456",
location: "us-central1",
credentials: JSON.parse(process.env.GCP_SERVICE_ACCOUNT_KEY),
maxOutputTokens: 2000,
temperature: 0.2,
topK: 40,
topP: 0.95,
maxRetries: 3,
verbose: true,
stream: true
};
// Usage example
async function generateResponse(input) {
const response = await vertexAIComponent.generate({
input: input,
systemMessage: "You are an AI assistant specializing in technical explanations.",
modelName: "gemini-1.5-pro",
temperature: 0.1
});
return response.text;
}
Use Cases
- Enterprise Applications: Build AI solutions with Google Cloud security and compliance features
- Multimodal Processing: Create applications that can understand and generate content with text and images
- Content Generation: Generate articles, summaries, and creative content
- Conversational Agents: Build sophisticated chatbots with context awareness
- Google Cloud Integration: Integrate with other Google Cloud services in a unified environment
Best Practices
- Use service account credentials with least privilege access
- Set appropriate region for lower latency based on your user locations
- Enable streaming for real-time responses in interactive applications
- Monitor API quotas and usage through Google Cloud Console
- Implement proper error handling with appropriate retry mechanisms
- Test with small token limits during development
- Consider using environment variables for credentials and project settings