Chunk Split

The Chunk Split component divides text or data into manageable chunks using various splitting strategies, with configurable chunk sizes and overlap options.

Figure: Chunk Split interface and configuration

Component Inputs

  • Splitter Type: RecursiveCharacterTextSplitter

    The type of splitting algorithm to use

  • Chunk Size: 1000

    The maximum size of each chunk, in characters

  • Chunk Overlap: 200

    The number of characters shared between consecutive chunks

  • Input: Text or data to be split

    The content to be chunked

  • Separators: Custom separation points

    Define custom split points in the text

  • Code Language: python

    The programming language for code-aware splitting

  • Language: Additional language settings

    Language-specific configuration

  • Embeddings: Embedding configuration

    Settings for embedding-aware splitting
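
To make the relationship between Chunk Size and Chunk Overlap concrete, the sketch below slides a fixed-size window over the input. It is only an illustration: `splitWithOverlap` is a hypothetical helper, not the component's implementation, and it ignores separators entirely.

```javascript
// Hypothetical helper: fixed-size chunking with overlap (separators are ignored).
function splitWithOverlap(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be smaller than chunkSize");
  }
  // Each new chunk starts this many characters after the previous one.
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const alphabetChunks = splitWithOverlap("abcdefghijklmnopqrstuvwxyz", 10, 3);
console.log(alphabetChunks);
// [ 'abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz' ]
```

Each chunk repeats the last three characters of the previous one, which is what the overlap setting controls. A larger overlap preserves more context at the cost of more chunks.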

Additional Parameters

  • Breakpoint Threshold Type: percentile

    Method for determining split points

  • Breakpoint Threshold Amount: 0.5

    Threshold value for splitting

  • Number of Chunks: 5

    Target number of chunks to generate

  • Sentence Split Regex: Custom regex pattern

    Regular expression for sentence splitting

  • Buffer Size: 0

    Size of the processing buffer
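
The sketch below shows one way a percentile breakpoint threshold could work. It rests on several assumptions: the distances between neighbouring sentences are hard-coded rather than computed from embeddings, the threshold amount is read as a fraction between 0 and 1, and the helper names (`percentile`, `findBreakpoints`, `groupSentences`) are hypothetical. The component's actual behaviour may differ.

```javascript
// Percentile with linear interpolation; fraction is assumed to lie in [0, 1].
function percentile(values, fraction) {
  const sorted = [...values].sort((a, b) => a - b);
  const pos = (sorted.length - 1) * fraction;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// distances[i] is the (assumed) embedding distance between sentence i and i + 1.
// Returns the indexes i after which a new chunk should start.
function findBreakpoints(distances, fraction) {
  if (distances.length === 0) return [];
  const threshold = percentile(distances, fraction);
  return distances.flatMap((d, i) => (d > threshold ? [i] : []));
}

// Joins sentences into chunks, cutting after each breakpoint index.
function groupSentences(sentences, breakpoints) {
  const chunks = [];
  let start = 0;
  for (const bp of breakpoints) {
    chunks.push(sentences.slice(start, bp + 1).join(" "));
    start = bp + 1;
  }
  chunks.push(sentences.slice(start).join(" "));
  return chunks;
}

const distances = [0.1, 0.8, 0.2, 0.9, 0.15]; // made-up values for illustration
const breakpoints = findBreakpoints(distances, 0.5);
console.log(breakpoints); // [ 1, 3 ]

const semanticChunks = groupSentences(["A.", "B.", "C.", "D.", "E.", "F."], breakpoints);
console.log(semanticChunks); // [ 'A. B.', 'C. D.', 'E. F.' ]
```

Raising the threshold amount leaves fewer distances above the threshold, so the text is cut in fewer places and the chunks grow larger.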

Component Output

  • Data: Array of text chunks

    The resulting chunks after splitting

Implementation Example

    const chunkSplitter = {
      splitterType: "RecursiveCharacterTextSplitter",
      chunkSize: 1000,
      chunkOverlap: 200,
      input: "Long text content...",
      separators: ["\n\n", "\n", " ", ""],
      codeLanguage: "python",
      breakpointThresholdType: "percentile",
      breakpointThresholdAmount: 0.5,
      numberOfChunks: 5
    };

    // Output:
    // {
    //   chunks: [
    //     "First chunk of text...",
    //     "Second chunk with overlap...",
    //     "Third chunk with overlap..."
    //   ]
    // }

Best Practices

  • Choose a chunk size that suits the content type, and keep the chunk overlap smaller than the chunk size
  • Use enough overlap to preserve context across chunk boundaries
  • Consider language-specific splitting for better results
  • Test different separator patterns for optimal splitting
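
One way to compare separator patterns is to run the same text through a small recursive splitter and inspect the results. The sketch below is a simplified stand-in, not the component's algorithm: `recursiveSplit` and `hardCut` are hypothetical helpers, separators are discarded, and small pieces are not merged back together or overlapped.

```javascript
// Cuts text into consecutive pieces of at most chunkSize characters.
function hardCut(text, chunkSize) {
  const out = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    out.push(text.slice(i, i + chunkSize));
  }
  return out;
}

// Tries separators in order, recursing into any piece that is still too long.
function recursiveSplit(text, separators, chunkSize) {
  if (text.length === 0) return [];
  if (text.length <= chunkSize) return [text];
  // No usable separator left: fall back to cutting by character count.
  if (separators.length === 0 || separators[0] === "") {
    return hardCut(text, chunkSize);
  }
  const [separator, ...rest] = separators;
  return text
    .split(separator)
    .flatMap((piece) => recursiveSplit(piece, rest, chunkSize));
}

const sampleText =
  "Intro paragraph.\n\nSecond paragraph line one.\nSecond paragraph line two.";

// Paragraph and line breaks first: chunks follow the text's structure.
const byParagraph = recursiveSplit(sampleText, ["\n\n", "\n", " ", ""], 30);
console.log(byParagraph);
// [ 'Intro paragraph.', 'Second paragraph line one.', 'Second paragraph line two.' ]

// Spaces first: the same text falls apart into word-sized pieces.
const byWord = recursiveSplit(sampleText, [" ", ""], 30);
console.log(byWord.length); // 8
```

Listing coarse separators before fine ones keeps related text together. Starting with a fine separator such as a space fragments the input even when larger units would fit within the chunk size.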