LM Studio Models

A drag-and-drop component for integrating with LM Studio's local LLM server. Configure model parameters and connect inputs/outputs while keeping all processing on your local machine.

[Image: LM Studio component interface and configuration]

Local Setup Required: This component requires LM Studio to be installed and its local inference server to be running. Download the models you want to use and start the inference server in LM Studio before using this component.
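
To confirm the server is reachable before wiring up the component, you can list the models the server has loaded. This is a minimal sketch that assumes the default local base URL and LM Studio's OpenAI-compatible /v1/models endpoint (Node 18+ for the global fetch):

// Quick connectivity check against the local LM Studio server.
const BASE_URL = "http://localhost:1234/v1"; // default for a local installation

async function checkServer() {
  try {
    const res = await fetch(`${BASE_URL}/models`);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const { data } = await res.json();
    console.log("Available models:", data.map((m) => m.id));
  } catch (err) {
    console.error("LM Studio server is not reachable:", err.message);
  }
}

checkServer();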

Component Inputs

  • Input: Text input for the model

    Example: "Write a function in Python to calculate the Fibonacci sequence."

  • System Message: System prompt to guide model behavior

    Example: "You are a helpful programming assistant who writes efficient and well-documented code."

  • Stream: Toggle for streaming responses

    Example: true (for real-time token streaming) or false (for complete response)

  • Base URL: The URL where the LM Studio server is running

    Example: "http://localhost:1234/v1" (Default for local installation)

  • LM Studio API Key: Your API authentication key

    Example: "lmstudio-xxx" (if configured in LM Studio)

  • Model Kwargs: Additional model parameters

    Example: top_p: 0.9, frequency_penalty: 0.2

  • Model Name: Selected model identifier

    Example: "TinyLlama-1.1B" or "Llama-3-8B"

Component Outputs

  • Text: Generated text output

    Example: "```python\ndef fibonacci(n):\n a, b = 0, 1\n for _ in range(n):\n yield a\n a, b = b, a + b\n```"

  • Language Model: Model information and metadata

    Example: model: TinyLlama-1.1B, usage: {prompt_tokens: 42, completion_tokens: 78, total_tokens: 120}
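
In an OpenAI-style response body, the Text output corresponds to the generated message content and the Language Model output to the model and usage metadata. A sketch of pulling both out of a parsed response (the helper name is illustrative, not part of the component API):

// Sketch: splitting a parsed /chat/completions response into the two outputs.
function splitOutputs(response) {
  const text = response.choices?.[0]?.message?.content ?? ""; // Text output
  const languageModel = {                                     // Language Model output
    model: response.model,
    usage: response.usage, // { prompt_tokens, completion_tokens, total_tokens }
  };
  return { text, languageModel };
}

// With the example metadata shown above:
const { languageModel } = splitOutputs({
  model: "TinyLlama-1.1B",
  choices: [{ message: { content: "def fibonacci(n): ..." } }],
  usage: { prompt_tokens: 42, completion_tokens: 78, total_tokens: 120 },
});
console.log(languageModel.usage.total_tokens); // 120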

Generation Parameters

Max Tokens

Maximum number of tokens to generate in the response

Default: 2048
Range: 1 to model maximum
Recommendation: Set based on expected response length

Temperature

Controls randomness in the output - higher values increase creativity

Default: 0.1
Range: 0.0 to 2.0
Recommendation: Lower (0.0-0.3) for factual/consistent responses, higher (0.7-1.0) for creative tasks

Seed

Random seed for reproducible outputs

Default: 1
Range: Any integer
Recommendation: Set specific values for reproducible results
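
Together, these settings control how long and how deterministic a response is. A short sketch of a payload tuned for reproducible, factual output, using OpenAI-style parameter names (whether the seed is honored depends on the model backend):

// Sketch: generation settings for reproducible, low-variance output.
const generationSettings = {
  max_tokens: 512,  // cap the response well below the model maximum
  temperature: 0.1, // low randomness for factual/consistent answers
  seed: 42,         // fixed seed so repeated runs produce the same tokens
};

const payload = {
  model: "TinyLlama-1.1B",
  messages: [{ role: "user", content: "Explain the Fibonacci sequence in two sentences." }],
  ...generationSettings,
};
console.log(JSON.stringify(payload, null, 2));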

Advanced Parameters

Top P

Nucleus sampling parameter - controls diversity of generated text

Default: 0.9
Range: 0.0 to 1.0
Recommendation: Lower values (e.g., 0.5) for more focused text generation

Frequency Penalty

Reduces repetition by penalizing tokens based on their frequency

Default: 0.0
Range: 0.0 to 2.0
Recommendation: Higher values (0.5-1.0) to reduce repetition

Presence Penalty

Penalizes tokens that have already appeared in the text

Default: 0.0
Range: 0.0 to 2.0
Recommendation: Higher values (0.5-1.0) to encourage topic diversity
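
These three parameters are typically supplied through the Model Kwargs input. The sketch below contrasts a focused preset with a more diverse one, following the recommendations above; it assumes the kwargs are forwarded unchanged to the OpenAI-compatible request:

// Sketch: two Model Kwargs presets following the recommendations above.
const focusedKwargs = {
  top_p: 0.5,             // narrower nucleus sampling -> more focused text
  frequency_penalty: 0.0,
  presence_penalty: 0.0,
};

const diverseKwargs = {
  top_p: 0.9,             // wider nucleus sampling -> more varied wording
  frequency_penalty: 0.7, // discourage repeating frequent tokens
  presence_penalty: 0.7,  // encourage introducing new topics
};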

Implementation Example

// Basic configuration
const lmStudioConfig = {
  baseUrl: "http://localhost:1234/v1",
  modelName: "TinyLlama-1.1B",
  systemMessage: "You are a helpful assistant."
};

// Advanced configuration
const advancedLMStudioConfig = {
  baseUrl: "http://localhost:1234/v1",
  modelName: "Llama-3-8B",
  maxTokens: 1000,
  temperature: 0.5,
  stream: true,
  seed: 42,
  modelKwargs: {
    top_p: 0.9,
    frequency_penalty: 0.3,
    presence_penalty: 0.3
  }
};

// Usage example
async function generateCode(input) {
  const response = await lmStudioComponent.generate({
    input: input,
    systemMessage: "You are an expert programmer. Write clean, well-documented code.",
    temperature: 0.2,
    maxTokens: 500
  });
  return response.text;
}
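
When Stream is enabled, the response arrives as incremental chunks rather than a single JSON body. The sketch below consumes the stream directly from the OpenAI-compatible endpoint and assumes LM Studio uses the standard "data:"-prefixed chunk format terminated by a "[DONE]" sentinel (Node 18+ for the global fetch):

// Sketch: printing streamed tokens as they arrive.
async function streamChat(input) {
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "TinyLlama-1.1B",
      messages: [{ role: "user", content: input }],
      stream: true,
    }),
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice(6).trim();
      if (data === "[DONE]") return;
      const delta = JSON.parse(data).choices?.[0]?.delta?.content;
      if (delta) process.stdout.write(delta); // emit tokens incrementally
    }
  }
}

streamChat("Write a haiku about local inference.");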

Use Cases

  • Privacy-Focused Applications: Process sensitive data locally without sending to external APIs
  • Offline Development: Create AI-powered applications that work without internet connectivity
  • Cost-Effective Solutions: Eliminate API costs by running models locally
  • Low-Latency Applications: Reduce response time by eliminating network latency
  • Model Experimentation: Test different models and parameters in a consistent environment

Best Practices

  • Ensure LM Studio server is running before attempting to connect
  • Verify the correct base URL in your configuration
  • Use consistent seeds for reproducible results during testing
  • Start with low temperature values (0.1-0.3) for predictable outputs
  • Monitor system resources when running larger models
  • Set appropriate token limits based on your hardware capabilities
  • Test with streaming enabled for better user experience with long outputs
  • Implement proper error handling for cases where the server is unavailable (see the sketch below)
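
For the last point, a minimal error-handling sketch: detect an unreachable or unhealthy server, fail with a clear message, and avoid hanging indefinitely (the helper name and timeout value are illustrative, not part of the component API):

// Sketch: graceful failure when the LM Studio server is unavailable.
async function generateSafely(payload, baseUrl = "http://localhost:1234/v1") {
  try {
    const res = await fetch(`${baseUrl}/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
      signal: AbortSignal.timeout(30_000), // don't wait forever on a dead server
    });
    if (!res.ok) {
      throw new Error(`LM Studio returned HTTP ${res.status}`);
    }
    return await res.json();
  } catch (err) {
    // Typical failure mode: the server was never started -> connection refused.
    console.error(`Could not reach LM Studio at ${baseUrl}: ${err.message}`);
    return null;
  }
}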