Ollama Models
A drag-and-drop component for running local LLMs through Ollama. Configure model parameters and connect inputs/outputs to other components while keeping all processing on your machine.

Ollama component interface and configuration
Local Setup Required: This component requires Ollama to be installed and running on your machine or a remote server you can access. Ensure you have downloaded the necessary models before using this component.
Component Inputs
- Base URL: The URL where Ollama is running
Example: "http://localhost:11434" (Default for local installation)
- Template: Custom prompt template for model instructions
Example: "[INST] input [/INST]" (For Llama-based models)
- Format: Response format specification
Example: "json" (To force JSON output from supported models)
- System: System prompt to guide model behavior
Example: "You are a helpful AI assistant that answers questions concisely."
- Input: User input text
Example: "Explain the concept of transfer learning in machine learning."
Component Outputs
- Text: Generated text output
Example: "Transfer learning is a machine learning technique where a model developed for one task is reused as the starting point for a model on a second task..."
- Language Model: Model information and metadata
Example: { model: "llama3:8b", created_at: "2024-07-07T12:34:56.789Z", done: true }
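Assuming the non-streaming request sketched above, the two outputs correspond roughly to the response text and the metadata fields of the JSON that Ollama returns:
// Sketch: splitting a non-streaming /api/generate result into the two outputs
callOllama("Explain the concept of transfer learning in machine learning.").then((data) => {
  const text = data.response;       // "Text" output: the generated answer
  const modelInfo = {               // "Language Model" output: model metadata
    model: data.model,              // e.g. "llama3:8b"
    created_at: data.created_at,
    done: data.done
  };
  console.log(text, modelInfo);
});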
Model Parameters
Temperature
Controls randomness in the output - higher values increase creativity
Default: 0.7
Range: 0.0 to 2.0
Recommendation: Lower (0.1-0.3) for factual/consistent responses, Higher (0.7-1.0) for creative tasks
Context Window Size
Maximum number of tokens the model can use as context
Default: Model-dependent
Range: 1 to model maximum (e.g., 8192 for Llama3 8B)
Recommendation: Adjust based on available system memory
Number of GPUs
Number of GPUs to use for inference
Default: All available
Range: 0 to number of available GPUs
Recommendation: Use all available GPUs for optimal performance
Number of Threads
CPU threads to utilize
Default: Auto-detected
Range: 1 to available CPU threads
Recommendation: Set to the number of physical cores for a balance of performance and system responsiveness
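In Ollama's API these parameters are passed in the options object of a request, as temperature, num_ctx, num_gpu, and num_thread; the component surfaces them as the fields above. A sketch with illustrative values (tune the GPU and thread numbers to your hardware):
// Sketch: model parameters expressed as Ollama "options"
const requestBody = {
  model: "llama3",
  prompt: "Summarize the idea of transfer learning in two sentences.",
  stream: false,
  options: {
    temperature: 0.3,   // lower for factual, consistent answers
    num_ctx: 8192,      // context window size in tokens
    num_gpu: 1,         // GPU setting (see "Number of GPUs" above)
    num_thread: 8       // CPU threads; the physical core count is a good starting point
  }
};
// send with fetch(`${baseUrl}/api/generate`, ...) exactly as in the earlier sketch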
Advanced Settings
Mirostat
Adaptive sampling algorithm for controlling perplexity
Options: Disabled (0), Mirostat (1), Mirostat 2.0 (2)
Default: Disabled
Recommendation: Enable for more consistent quality output
Mirostat Eta
Learning rate for mirostat algorithm
Default: 0.1
Range: 0.0 to 1.0
Recommendation: Start with default and adjust if needed
Mirostat Tau
Target entropy for mirostat algorithm
Default: 5.0
Range: 0.0 to 10.0
Recommendation: Lower values (3-5) for more focused text
Repeat Penalty
Penalty for repeated token sequences
Default: 1.1
Range: 1.0 to 2.0
Recommendation: Higher values (1.2-1.5) to reduce repetition
Top K
Limits the vocabulary at each generation step to the k most likely tokens
Default: 40
Range: 0 (disabled) to any positive integer
Recommendation: 40-100 for balanced diversity
Top P
Nucleus sampling - samples from the smallest set of tokens whose cumulative probability reaches p
Default: 0.9
Range: 0.0 to 1.0
Recommendation: 0.9-0.95 for most use cases
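The advanced settings travel in the same options object, as mirostat, mirostat_eta, mirostat_tau, repeat_penalty, top_k, and top_p. A sketch using the defaults and recommendations above; note that when Mirostat is enabled it generally takes over perplexity control from top-k/top-p, so in practice you tune one approach or the other:
// Sketch: advanced sampling settings as Ollama "options"
const advancedOptions = {
  mirostat: 2,          // 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0
  mirostat_eta: 0.1,    // learning rate
  mirostat_tau: 5.0,    // target entropy
  repeat_penalty: 1.2,  // discourage repeated token sequences
  top_k: 50,            // keep only the 50 most likely tokens at each step
  top_p: 0.9            // nucleus sampling threshold
};
// merge into a request, e.g. { ...requestBody, options: { ...requestBody.options, ...advancedOptions } }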
Implementation Example
// Basic configuration
const ollamaConfig = {
  baseUrl: "http://localhost:11434",
  model: "llama3",
  system: "You are a helpful programming assistant."
};

// Advanced configuration
const advancedOllamaConfig = {
  baseUrl: "http://localhost:11434",
  model: "mistral",
  temperature: 0.5,
  numGpu: 1,
  numThread: 8,
  repeatPenalty: 1.2,
  topK: 50,
  topP: 0.9,
  mirostat: 2,
  mirostatEta: 0.1,
  mirostatTau: 5.0,
  template: "{{system}}\n\nUser: {{input}}\n\nAssistant:"
};

// Usage example
async function generateResponse(input) {
  const response = await ollamaComponent.generate({
    input: input,
    system: "You are an AI assistant that explains complex concepts simply.",
    temperature: 0.3
  });
  return response.text;
}
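For interactive use, Ollama can also stream the answer as it is generated: with stream: true the server returns newline-delimited JSON chunks, each carrying a partial response field and a final chunk with done: true. A minimal streaming sketch against the raw API (the component's own streaming behavior, if any, may differ):
// Streaming sketch: accumulate newline-delimited JSON chunks from /api/generate
async function streamResponse(input) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3", prompt: input, stream: true })
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let text = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop();                 // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue;
      text += JSON.parse(line).response;  // append the partial response text
    }
  }
  return text;
}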
Use Cases
- Privacy-Focused Applications: Use local LLMs for sensitive data that shouldn't leave your system
- Offline Development: Create AI-powered applications that work without internet connectivity
- Cost-Effective Solutions: Eliminate API costs by running models locally
- Low-Latency Requirements: Reduce response time by eliminating network latency
- Custom Model Integration: Run specialized or fine-tuned models not available on cloud services
Best Practices
- Ensure Ollama is running before attempting to connect (a quick check is sketched after this list)
- Adjust thread count based on your CPU capabilities
- Configure GPU usage appropriately for your hardware
- Test with different sampling methods to find the best for your use case
- Use smaller models (7B-13B) for faster responses or on limited hardware
- Consider using quantized models (Q4_K_M) to reduce memory requirements
- Monitor system resource usage when running larger models
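The first practice above can be automated: Ollama's /api/tags endpoint lists the models that have been pulled locally, so one call can confirm both that the server is reachable and that the model you plan to use exists. A sketch (the model name "llama3" is illustrative):
// Sketch: verify that Ollama is running and that a model is available locally
async function checkOllama(baseUrl = "http://localhost:11434", model = "llama3") {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    if (!res.ok) throw new Error(`Ollama responded with status ${res.status}`);
    const { models } = await res.json();
    const available = models.some((m) => m.name.startsWith(model));
    if (!available) {
      console.warn(`Model "${model}" not found locally; run: ollama pull ${model}`);
    }
    return available;
  } catch (err) {
    console.error(`Ollama is not reachable at ${baseUrl}:`, err.message);
    return false;
  }
}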