
Ollama Models

A drag-and-drop component for running local LLMs through Ollama. Configure model parameters and connect inputs/outputs to other components while keeping all processing on your machine.

Image: Ollama component interface and configuration

Local Setup Required: This component requires Ollama to be installed and running on your machine or a remote server you can access. Ensure you have downloaded the necessary models before using this component.

Component Inputs

  • Base URL: The URL where Ollama is running

    Example: "http://localhost:11434" (Default for local installation)

  • Template: Custom prompt template for model instructions

    Example: "[INST] input [/INST]" (For Llama-based models)

  • Format: Response format specification

    Example: "json" (To force JSON output from supported models)

  • System: System prompt to guide model behavior

    Example: "You are a helpful AI assistant that answers questions concisely."

  • Input: User input text

    Example: "Explain the concept of transfer learning in machine learning."

Component Outputs

  • Text: Generated text output

    Example: "Transfer learning is a machine learning technique where a model developed for one task is reused as the starting point for a model on a second task..."

  • Language Model: Model information and metadata

    Example: model: llama3:8b, created_at: 2024-07-07T12:34:56.789Z, done: true
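These outputs correspond directly to fields of Ollama's /api/generate response: the generated text arrives in the response field, and the remaining fields (model, created_at, done, and timing/token counts) make up the metadata. A minimal non-streaming sketch, assuming the request body from the previous example and the Base URL input:

// Sketch: sending the request and splitting the reply into the two component outputs
async function callOllama(baseUrl, requestBody) {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(requestBody)
  });
  const data = await res.json();
  return {
    text: data.response,          // Text output
    languageModel: {              // Language Model output (metadata)
      model: data.model,
      created_at: data.created_at,
      done: data.done
    }
  };
}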

Model Parameters

Temperature

Controls randomness in the output - higher values increase creativity

Default: 0.7
Range: 0.0 to 2.0
Recommendation: Lower (0.1-0.3) for factual/consistent responses, higher (0.7-1.0) for creative tasks

Context Window Size

Maximum number of tokens to use from the context

Default: Model-dependent
Range: 1 to model maximum (e.g., 8192 for Llama3 8B)
Recommendation: Adjust based on available system memory

Number of GPU

Number of GPUs to use for inference

Default: All available
Range: 0 to number of available GPUs
Recommendation: Use all available GPUs for optimal performance

Number of Threads

CPU threads to utilize

Default: Auto-detected
Range: 1 to available CPU threads
Recommendation: Set to the number of physical cores for a good balance of performance and system responsiveness
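When Ollama is called directly, these parameters are passed in the options object of the request, using Ollama's own option names (temperature, num_ctx, num_gpu, num_thread); the labels exposed by the component may differ slightly from the raw API names. A brief sketch:

// Sketch: core model parameters expressed as Ollama API options
const coreOptions = {
  temperature: 0.3,   // lower values for factual/consistent responses
  num_ctx: 8192,      // context window size in tokens
  num_gpu: 1,         // GPU usage (see Number of GPU above)
  num_thread: 8       // CPU threads, roughly the number of physical cores
};
// Passed as the options field of the /api/generate request body, e.g. { ...requestBody, options: coreOptions }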

Advanced Settings

Mirostat

Adaptive sampling algorithm for controlling perplexity

Options: Disabled (0), Mirostat (1), Mirostat 2.0 (2)
Default: Disabled (0)
Recommendation: Enable for more consistent output quality

Mirostat Eta

Learning rate for mirostat algorithm

Default: 0.1
Range: 0.0 to 1.0
Recommendation: Start with the default and adjust if needed

Mirostat Tau

Target entropy for mirostat algorithm

Default: 5.0
Range: 0.0 to 10.0
Recommendation: Lower values (3-5) for more focused text

Repeat Penalty

Penalty for repeated token sequences

Default: 1.1
Range: 1.0 to 2.0
Recommendation: Higher values (1.2-1.5) to reduce repetition

Top K

Limits the vocabulary at each generation step to the k most likely tokens

Default: 40
Range: 0 (disabled) to any positive integer
Recommendation: 40-100 for balanced diversity

Top P

Nucleus sampling - restricts sampling to the most likely tokens whose cumulative probability reaches p

Default: 0.9
Range: 0.0 to 1.0
Recommendation: 0.9-0.95 for most use cases
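These advanced settings also travel in the same options object, under Ollama's option names (mirostat, mirostat_eta, mirostat_tau, repeat_penalty, top_k, top_p). A sketch using the defaults listed above:

// Sketch: advanced sampling settings as Ollama API options (defaults from this section)
const samplingOptions = {
  mirostat: 0,        // 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0
  mirostat_eta: 0.1,  // learning rate, used only when Mirostat is enabled
  mirostat_tau: 5.0,  // target entropy, used only when Mirostat is enabled
  repeat_penalty: 1.1,
  top_k: 40,
  top_p: 0.9
};
// Merged into the request alongside the core parameters, e.g. options: { ...coreOptions, ...samplingOptions }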

Implementation Example

// Basic configuration
const ollamaConfig = {
  baseUrl: "http://localhost:11434",
  model: "llama3",
  system: "You are a helpful programming assistant."
};

// Advanced configuration
const advancedOllamaConfig = {
  baseUrl: "http://localhost:11434",
  model: "mistral",
  temperature: 0.5,
  numGpu: 1,
  numThread: 8,
  repeatPenalty: 1.2,
  topK: 50,
  topP: 0.9,
  mirostat: 2,
  mirostatEta: 0.1,
  mirostatTau: 5.0,
  template: "{{system}}\n\nUser: {{input}}\n\nAssistant:"
};

// Usage example
async function generateResponse(input) {
  const response = await ollamaComponent.generate({
    input: input,
    system: "You are an AI assistant that explains complex concepts simply.",
    temperature: 0.3
  });
  return response.text;
}

Use Cases

  • Privacy-Focused Applications: Use local LLMs for sensitive data that shouldn't leave your system
  • Offline Development: Create AI-powered applications that work without internet connectivity
  • Cost-Effective Solutions: Eliminate API costs by running models locally
  • Low-Latency Requirements: Reduce response time by eliminating network latency
  • Custom Model Integration: Run specialized or fine-tuned models not available on cloud services

Best Practices

  • Ensure Ollama is running before attempting to connect (see the health-check sketch after this list)
  • Adjust thread count based on your CPU capabilities
  • Configure GPU usage appropriately for your hardware
  • Test with different sampling methods to find the best for your use case
  • Use smaller models (7B-13B) for faster responses or on limited hardware
  • Consider using quantized models (Q4_K_M) to reduce memory requirements
  • Monitor system resource usage when running larger models
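For the first item above, a quick way to confirm that Ollama is reachable is to query its /api/tags endpoint, which lists the locally available models. A minimal health-check sketch assuming the default local URL:

// Sketch: check that Ollama is reachable before wiring up the component
async function ollamaIsRunning(baseUrl = "http://localhost:11434") {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);   // lists locally pulled models
    return res.ok;
  } catch {
    return false;                                     // connection refused: Ollama is not running
  }
}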