
Groq Models

A drag-and-drop component for integrating Groq's high-performance LLM inference into your workflow. Configure model parameters and connect inputs/outputs to other components.

Screenshot: Groq component interface and configuration

API Key Required: This component requires a valid Groq API key. Register for a Groq account and generate an API key before configuring the component.

Component Inputs

  • Input: Text input for the model

    Example: "Explain the advantages of Groq's LPU architecture for inference."

  • System Message: System prompt to guide model behavior

    Example: "You are a helpful AI assistant specializing in hardware acceleration and machine learning."

  • Stream: Toggle for streaming responses

    Example: true (for real-time token streaming) or false (for complete response)

  • Model: The Groq model to use

    Example: "llama3-8b-8192", "mixtral-8x7b-32768", "gemma-7b-it"

  • Groq API Key: Your API authentication key

    Example: "gsk_abc123def456..."

  • Groq API Base: API endpoint URL

    Example: "https://api.groq.com/openai/v1" (Default)

Component Outputs

  • Text: Generated text output

    Example: "Groq's LPU (Language Processing Unit) architecture is specifically designed for LLM inference, offering several advantages..."

  • Language Model: Model information and metadata

    Example: model: llama3-8b-8192, usage: {prompt_tokens: 35, completion_tokens: 150, total_tokens: 185}
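Assuming the component returns its two outputs as a text string plus a metadata object shaped like the example above, downstream code might consume them as follows (the generate call and field names are illustrative):

// Hypothetical handling of the component's outputs.
// `result.text` and `result.languageModel` mirror the output names above.
const result = await groqComponent.generate({ input: "...", model: "llama3-8b-8192" });

console.log(result.text);                          // generated answer
const { model, usage } = result.languageModel;     // assumed metadata shape
console.log(`${model}: ${usage.total_tokens} tokens ` +
            `(${usage.prompt_tokens} prompt + ${usage.completion_tokens} completion)`);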

Model Parameters

Max Output Tokens

Maximum number of tokens to generate in the response

Default: Model-dependent
Range: 1 to model maximum
Recommendation: Set based on expected response length

Temperature

Controls randomness in the output - higher values increase creativity

Default: 0.1
Range: 0.0 to 1.0
Recommendation: Lower values (0.0-0.3) for factual, consistent responses; higher values (0.7-1.0) for creative tasks

N

Number of completions to generate

Default: 1
Range: 1 to 5
Recommendation: Use 1 for most applications; higher values generate multiple response options to choose from
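As a hedged illustration, the configurations below combine these three parameters for two common profiles. Parameter names follow the implementation example later on this page; adjust them to match your integration.

// Factual Q&A profile: low temperature, bounded output, single completion.
const factualConfig = {
  model: "llama3-8b-8192",
  maxOutputTokens: 512,   // cap response length
  temperature: 0.1,       // favor consistent, factual answers
  n: 1                    // one completion is enough for most apps
};

// Creative profile: higher temperature, more room to generate.
const creativeConfig = {
  model: "llama3-70b-8192",
  maxOutputTokens: 1500,
  temperature: 0.8,
  n: 3                    // generate alternatives to pick from
};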

Available Models

Llama 3

Meta's Llama 3 models, running on Groq's LPU architecture

Models:
  • llama3-8b-8192 (8B parameters, 8K context window)
  • llama3-70b-8192 (70B parameters, 8K context window)

Mixtral

Mistral AI's mixture-of-experts models

Models:
  • mixtral-8x7b-32768 (8x7B parameters, 32K context window)

Gemma

Google's lightweight, open models

Models:
  • gemma-7b-it (7B parameters, instruction-tuned)
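As a rough illustration of the context-window tradeoff between the models above, the helper below (hypothetical, using a crude 4-characters-per-token estimate) falls back to the 32K-context Mixtral model when a prompt is unlikely to fit in an 8K window:

// Rough, assumption-laden model picker based on context window size.
function pickModel(promptText) {
  const approxTokens = Math.ceil(promptText.length / 4); // crude estimate
  if (approxTokens > 7000) {
    return "mixtral-8x7b-32768";   // 32K context for long prompts
  }
  return "llama3-8b-8192";         // 8K context, lower latency
}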

Implementation Example

// Basic configuration
const groqConfig = {
  model: "llama3-8b-8192",
  groqApiKey: process.env.GROQ_API_KEY,
  systemMessage: "You are a helpful assistant."
};

// Advanced configuration
const advancedGroqConfig = {
  model: "mixtral-8x7b-32768",
  groqApiKey: process.env.GROQ_API_KEY,
  groqApiBase: "https://api.groq.com/openai/v1",
  maxOutputTokens: 2000,
  temperature: 0.3,
  n: 1,
  stream: true
};

// Usage example
async function generateResponse(input) {
  const response = await groqComponent.generate({
    input: input,
    systemMessage: "You are an AI assistant that explains complex concepts clearly.",
    model: "llama3-8b-8192",
    temperature: 0.2
  });
  return response.text;
}
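The advanced configuration above enables stream: true. If you call Groq directly rather than through the component, a streaming loop with the official groq-sdk package might look like the sketch below (this assumes groq-sdk is installed and that the chunk shape follows Groq's OpenAI-compatible format):

// Streaming sketch with the groq-sdk package (assumed available).
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

async function streamResponse(input) {
  const stream = await groq.chat.completions.create({
    model: "llama3-8b-8192",
    stream: true,
    messages: [{ role: "user", content: input }]
  });

  let text = "";
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    process.stdout.write(token);   // render tokens as they arrive
    text += token;
  }
  return text;
}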

Use Cases

  • Real-time Applications: Leverage Groq's low-latency inference for chat and interactive systems
  • Content Generation: Create articles, summaries, and creative content quickly
  • Customer Support: Build responsive support bots with fast response times
  • Development Tools: Integrate into development workflows for code generation and documentation
  • Education: Create interactive learning experiences with minimal latency

Best Practices

  • Store your API key securely using environment variables
  • Enable streaming for faster perceived response times
  • Start with the default temperature (0.1) and adjust based on your use case
  • Use a single completion (n=1) for most applications
  • Monitor your API token usage through the Groq console
  • Implement proper error handling for API failures (see the sketch after this list)
  • Consider model size tradeoffs (larger models are more capable but may have higher latency)
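
A minimal sketch of the error-handling practice above, assuming the hypothetical groqComponent.generate call from the implementation example and a simple retry with exponential backoff on transient failures:

// Hypothetical retry wrapper around the component call shown earlier.
async function generateWithRetry(input, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await groqComponent.generate({
        input,
        model: "llama3-8b-8192"
      });
      return response.text;
    } catch (err) {
      if (attempt === maxAttempts) throw err;      // give up after the last attempt
      const delayMs = 500 * 2 ** (attempt - 1);    // exponential backoff
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}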