Chunk Split Component
The Chunk Split component provides intelligent text splitting capabilities for processing large documents. It supports multiple splitting strategies, customizable chunk sizes, and advanced splitting configurations while maintaining context and semantic meaning.

Figure: Chunk Split workflow and architecture
Configuration Parameters
Required Parameters
- input: Text content to split
- splitterType: Type of splitting algorithm. One of:
  - character
  - token
  - sentence
  - paragraph
  - semantic
  - code
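The strategies differ mainly in the unit of text they break on. As an illustration, here is a minimal sketch of how the first few strategies could be dispatched; the `split` helper and its per-strategy rules are hypothetical, not the component's actual implementation:

```python
import re

def split(text: str, splitter_type: str) -> list[str]:
    """Illustrative dispatch on splitterType (hypothetical helper)."""
    if splitter_type == "paragraph":
        # Break on blank lines between paragraphs.
        return [p for p in text.split("\n\n") if p.strip()]
    if splitter_type == "sentence":
        # Break after sentence-ending punctuation.
        return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if splitter_type == "character":
        # Fixed-width windows of 1000 characters (the default chunkSize).
        return [text[i:i + 1000] for i in range(0, len(text), 1000)]
    raise ValueError(f"unsupported splitterType: {splitter_type}")
```

The token, semantic, and code strategies additionally need a tokenizer, an embedding model, or a language-aware parser, respectively, so they are omitted here.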
Optional Parameters
- chunkSize: Maximum size of each chunk (default: 1000)
- chunkOverlap: Amount of content shared between consecutive chunks (default: 200)
- separators: Custom text separators
- codeLanguage: Programming language for code splitting
- embeddings: Embedding configuration for semantic splitting
- breakpointThresholdType: Type of threshold used to decide split points. One of:
  - tokens
  - characters
  - sentences
  - semantic_similarity
- breakpointThresholdAmount: Numeric threshold value, interpreted according to breakpointThresholdType
- numberOfChunks: Target number of chunks
- sentenceSplitRegex: Custom regex for sentence splitting
- bufferSize: Memory buffer size for large texts
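To make chunkSize and chunkOverlap concrete, here is a hedged sketch of character-based splitting with overlap; the function name and stride logic are illustrative, not the component's internals:

```python
def split_with_overlap(text: str, chunk_size: int = 1000,
                       chunk_overlap: int = 200) -> list[str]:
    """Slide a chunk_size window forward by (chunk_size - chunk_overlap)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunkOverlap must be smaller than chunkSize")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks
```

For example, `split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)` yields four chunks, each sharing its last two characters with the start of the next.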
Output Format
```json
{
  "data": {
    "chunks": [
      {
        "text": string,
        "index": number,
        "metadata": {
          "start_char": number,
          "end_char": number,
          "tokens": number,
          "embedding": array (optional)
        }
      }
    ],
    "statistics": {
      "total_chunks": number,
      "average_chunk_size": number,
      "overlap_percentage": number,
      "processing_time": number
    },
    "analysis": {
      "semantic_coherence": number,
      "context_preservation": number,
      "chunk_distribution": {
        "min_size": number,
        "max_size": number,
        "std_dev": number
      }
    }
  }
}
```
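As a sketch of how the output fields relate to the chunks, the following hypothetical helper assembles a payload in the documented shape from fixed-stride character chunks; the token counts, embeddings, and the analysis block are omitted for brevity:

```python
def build_output(chunks: list[str], chunk_size: int, chunk_overlap: int) -> dict:
    """Assemble a minimal instance of the documented output format."""
    step = chunk_size - chunk_overlap
    sizes = [len(c) for c in chunks]
    return {
        "data": {
            "chunks": [
                {
                    "text": c,
                    "index": i,
                    # start/end offsets follow from the fixed stride
                    "metadata": {"start_char": i * step,
                                 "end_char": i * step + len(c)},
                }
                for i, c in enumerate(chunks)
            ],
            "statistics": {
                "total_chunks": len(chunks),
                "average_chunk_size": sum(sizes) / len(sizes),
                "overlap_percentage": 100 * chunk_overlap / chunk_size,
            },
        }
    }
```

With chunkSize 4 and chunkOverlap 2, for instance, the overlap_percentage comes out as 50.0 and each chunk's start_char advances by 2.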
Features
- Multiple splitting strategies
- Semantic preservation
- Code-aware splitting
- Custom separators
- Overlap control
- Memory efficiency
- Statistical analysis
- Embedding support
Note: Choose appropriate chunk sizes based on your embedding model's token limits. Consider memory usage for large documents.
Tip: Use semantic splitting for natural language content and code-aware splitting for source code files.
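The note above about token limits can be turned into a quick sizing check. A rough sketch, assuming the common heuristic of about 4 characters per token for English prose (real tokenizers vary, so a safety margin is kept):

```python
CHARS_PER_TOKEN = 4  # heuristic for English prose; real tokenizers vary

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def max_chunk_chars(model_token_limit: int, safety_margin: float = 0.1) -> int:
    """Largest chunkSize (in characters) likely to fit the embedding
    model's token limit, with a margin for tokenizer variance."""
    return int(model_token_limit * (1 - safety_margin) * CHARS_PER_TOKEN)
```

For a model with a 512-token limit, this suggests keeping chunkSize below roughly 1800 characters; verify against your actual tokenizer before relying on it.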