
Hugging Face Models

A drag-and-drop component for integrating Hugging Face's inference API. Configure model parameters and connect inputs/outputs to access thousands of open-source models.

[Image: Hugging Face component interface and configuration]

API Token Required: A valid Hugging Face API token is required to use this component with most models. Some models are rate-limited or require a Pro subscription for commercial use.

Component Inputs

  • Input: Text input for the model

    Example: "Explain the transformer architecture in simple terms."

  • System Message: System prompt to guide model behavior

    Example: "You are a helpful AI assistant that explains complex concepts clearly."

  • Stream: Toggle for streaming responses

    Example: true (for real-time token streaming) or false (for complete response)

  • Model ID: The Hugging Face model identifier

    Example: "meta-llama/Llama-3-8b-chat-hf", "google/flan-t5-xxl"

  • API Token: Your Hugging Face API token

    Example: "hf_abcdefghijklmnopqrstuvwxyz"

  • Inference Endpoint: API endpoint URL (optional)

    Example: "https://api-inference.huggingface.co/models/meta-llama/Llama-3-8b-chat-hf"

  • Task: Specific task for the model

    Example: "text-generation", "summarization", "question-answering"
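As a sketch of how the inputs above might be combined into a single request payload (the field names and wire format here are illustrative assumptions, not the component's actual internals):

```javascript
// Assemble the component's inputs into a request payload.
// Field names follow the inputs listed above; the exact wire
// format is an assumption for illustration.
function buildHuggingFaceRequest({ input, systemMessage, modelId, task, stream }) {
  if (!modelId) throw new Error("Model ID is required");
  return {
    model: modelId,
    task: task ?? "text-generation", // default task, as an assumption
    stream: Boolean(stream),
    // Prepend the system message to the user input when one is provided
    inputs: systemMessage ? `${systemMessage}\n\n${input}` : input,
  };
}

const payload = buildHuggingFaceRequest({
  input: "Explain the transformer architecture in simple terms.",
  systemMessage: "You are a helpful AI assistant that explains complex concepts clearly.",
  modelId: "meta-llama/Llama-3-8b-chat-hf",
  stream: false,
});
```

The API Token is deliberately absent from the payload: it belongs in an Authorization header (or environment variable), never in a request body or source control.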

Component Outputs

  • Text: Generated text output

    Example: "The transformer architecture consists of an encoder and a decoder, using self-attention mechanisms to process input sequences in parallel..."

  • Language Model: Model information and metadata

    Example: model_id: meta-llama/Llama-3-8b-chat-hf, task: text-generation

Generation Parameters

Max New Tokens

Maximum number of tokens to generate in the response

  • Default: 512
  • Range: 1 to model maximum
  • Recommendation: Set based on expected response length

Temperature

Controls randomness in the output - higher values increase creativity

  • Default: 0.8
  • Range: 0.0 to 2.0
  • Recommendation: Lower (0.1-0.3) for factual/consistent responses, higher (0.7-1.0) for creative tasks

Top K

Limits vocabulary for each generation step to k most likely tokens

  • Default: 50
  • Range: 0 (disabled) to any positive integer
  • Recommendation: 40-100 for balanced diversity

Top P

Nucleus sampling parameter - controls diversity of generated text

  • Default: 0.95
  • Range: 0.0 to 1.0
  • Recommendation: Lower values (e.g., 0.5) for more focused text generation

Typical P

Controls generation based on typical probability of tokens

  • Default: 0.95
  • Range: 0.0 to 1.0
  • Recommendation: Higher values for more diverse outputs

Repetition Penalty

Penalty for repeated token sequences

  • Default: 1.1
  • Range: 1.0 to 2.0
  • Recommendation: Higher values (1.2-1.5) to reduce repetition

Retry Attempts

Number of times to retry failed API calls

  • Default: 1
  • Range: 0 or any positive integer
  • Recommendation: 2-3 for better reliability with rate-limited models
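A minimal sketch of applying the documented defaults and clamping out-of-range values before a request is sent (the helper name and clamping behavior are assumptions; the component may instead reject invalid values):

```javascript
// Apply the documented defaults and clamp each parameter to its
// documented range. Clamping silently is a design assumption here;
// an alternative is to throw on out-of-range input.
function withGenerationDefaults(params = {}) {
  const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);
  return {
    maxNewTokens: params.maxNewTokens ?? 512,
    temperature: clamp(params.temperature ?? 0.8, 0.0, 2.0),
    topK: params.topK ?? 50,                     // 0 disables top-k
    topP: clamp(params.topP ?? 0.95, 0.0, 1.0),
    typicalP: clamp(params.typicalP ?? 0.95, 0.0, 1.0),
    repetitionPenalty: clamp(params.repetitionPenalty ?? 1.1, 1.0, 2.0),
    retryAttempts: params.retryAttempts ?? 1,
  };
}

// A temperature of 2.5 is clamped to the documented maximum of 2.0
const params = withGenerationDefaults({ temperature: 2.5, repetitionPenalty: 1.2 });
```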

Popular Model Categories

Open-Source LLMs

State-of-the-art open-source large language models

  • meta-llama/Llama-3-8b-chat-hf
  • mistralai/Mistral-7B-Instruct-v0.2
  • tiiuae/falcon-7b-instruct
  • codellama/CodeLlama-13b-Instruct-hf

Specialized Models

Models trained for specific tasks

  • facebook/bart-large-cnn (summarization)
  • distilbert-base-uncased-finetuned-sst-2-english (sentiment analysis)
  • xlm-roberta-large-xnli (language understanding)
  • t5-base (translation)

Multilingual Models

Models supporting multiple languages

  • facebook/mbart-large-50
  • xlm-roberta-base
  • cardiffnlp/twitter-xlm-roberta-base-sentiment
  • Helsinki-NLP/opus-mt-en-fr (translation)

Implementation Example

// Basic configuration
const huggingFaceConfig = {
  modelId: "meta-llama/Llama-3-8b-chat-hf",
  apiToken: process.env.HUGGINGFACE_API_TOKEN,
  task: "text-generation"
};

// Advanced configuration
const advancedHuggingFaceConfig = {
  modelId: "mistralai/Mistral-7B-Instruct-v0.2",
  apiToken: process.env.HUGGINGFACE_API_TOKEN,
  inferenceEndpoint: "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2",
  task: "text-generation",
  maxNewTokens: 1000,
  temperature: 0.7,
  topK: 50,
  topP: 0.9,
  typicalP: 0.95,
  repetitionPenalty: 1.2,
  retryAttempts: 3,
  stream: true
};

// Usage example
async function generateText(input) {
  const response = await huggingFaceComponent.generate({
    input: input,
    systemMessage: "You are an AI assistant that provides helpful information.",
    modelId: "meta-llama/Llama-3-8b-chat-hf",
    temperature: 0.5,
    maxNewTokens: 500
  });
  return response.text;
}

Use Cases

  • Open-Source AI: Build applications with open-source models
  • Specialized Tasks: Access models fine-tuned for specific domains or tasks
  • Multilingual Applications: Create solutions supporting multiple languages
  • Model Comparison: Benchmark performance across different model architectures
  • Educational Tools: Utilize smaller models for educational applications with lower costs

Best Practices

  • Choose models appropriate for your specific task
  • Balance generation parameters based on your use case
  • Use retry attempts for better reliability with rate-limited models
  • Monitor your API token usage and rate limits
  • Secure API token handling with environment variables
  • Test with smaller token limits during development
  • Consider Pro subscriptions for higher rate limits in production
  • Implement proper error handling for API failures
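The retry and error-handling practices above can be sketched as a small wrapper. This is an illustrative pattern, not the component's built-in behavior; the exponential-backoff timing is an assumption (the component only documents a retry count):

```javascript
// Retry a failing async call, e.g. a rate-limited or cold-starting
// model endpoint. retryAttempts matches the parameter documented
// above; the backoff schedule is an illustrative choice.
async function withRetries(fn, retryAttempts = 2, baseDelayMs = 500) {
  let lastError;
  for (let attempt = 0; attempt <= retryAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retryAttempts) {
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  // All attempts failed: surface the last error to the caller
  throw lastError;
}

// Usage: wrap the generation call so transient failures are retried
// withRetries(() => generateText("Explain transformers"), 3)
//   .then(console.log)
//   .catch((err) => console.error("Generation failed:", err.message));
```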