Recursive text chunking strategy.

Splits text using a hierarchy of separators (paragraphs, lines, sentences, words) to maintain semantic coherence while respecting size constraints.

The algorithm tries the following separators in order, falling back to the next one only when a piece is still too large:

  1. Double newlines (paragraphs)
  2. Single newlines
  3. Sentences (. ! ?)
  4. Words (spaces)

Example

    const chunker = new RecursiveTextChunker({
      chunkSize: 1000,
      chunkOverlap: 200,
      minChunkSize: 100
    });

    const result = await chunker.chunk({
      id: 'doc1',
      content: largeText,
      metadata: { source: 'manual.pdf' }
    });
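
To make the separator hierarchy concrete, here is a minimal sketch of recursive splitting. It is an illustration under stated assumptions, not the implementation behind RecursiveTextChunker: the names `recursiveSplit` and `SEPARATORS` are hypothetical, sizes are measured in characters, sentence detection is reduced to splitting on ". ", and chunkOverlap and minChunkSize are ignored.

```typescript
// Illustrative sketch only; not the library's actual implementation.
// Separators in order of preference: paragraphs, lines, sentences, words.
// Assumption: sentences are approximated by ". " (the docs also list ! and ?).
const SEPARATORS: string[] = ["\n\n", "\n", ". ", " "];

function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = SEPARATORS
): string[] {
  // Base case: the text already fits in one chunk.
  if (text.length <= chunkSize) {
    return text.length > 0 ? [text] : [];
  }

  const [separator, ...finer] = separators;

  // No separators left: fall back to hard cuts of chunkSize characters.
  if (separator === undefined) {
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      pieces.push(text.slice(i, i + chunkSize));
    }
    return pieces;
  }

  const chunks: string[] = [];
  let current = "";

  for (const part of text.split(separator)) {
    if (part.length === 0) continue;

    // Keep merging neighbouring parts while they still fit in one chunk.
    const candidate = current.length > 0 ? current + separator + part : part;
    if (candidate.length <= chunkSize) {
      current = candidate;
      continue;
    }

    // The merged text would be too large: emit what has been collected.
    if (current.length > 0) {
      chunks.push(current);
      current = "";
    }

    if (part.length <= chunkSize) {
      current = part;
    } else {
      // This part alone is too large: retry it with the next, finer separator.
      chunks.push(...recursiveSplit(part, chunkSize, finer));
    }
  }

  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Calling `recursiveSplit(text, 1000)` returns an array of strings, each at most 1000 characters long. Coarse separators are preferred, so paragraphs stay intact whenever they fit, and word-level or hard splits happen only as a last resort.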

Implements

  • DocumentChunker

Constructors

Methods

  • Split a document into chunks

    Parameters

    • document: Document

      Document to chunk

    Returns Promise<ChunkResult>

    Chunked documents with metadata

  • Split multiple documents into chunks

    Parameters

    • documents: Document[]

      Documents to chunk

    Returns Promise<ChunkResult[]>

    Array of chunk results

  • Estimate the number of chunks that will be created

    Parameters

    • document: Document

      Document to estimate

    Returns number

    Estimated number of chunks
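
The documentation does not say how the estimate is computed. One plausible approach, sketched below, assumes sizes are measured in characters and that every chunk after the first contributes `chunkSize - chunkOverlap` new characters. The helper `estimateChunkCount` is hypothetical and not part of the documented API; the real method may use a different heuristic.

```typescript
// Hypothetical helper; not part of the documented API.
// Assumes character-based sizes and a fixed stride between chunk starts.
function estimateChunkCount(
  contentLength: number,
  chunkSize: number,
  chunkOverlap: number
): number {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("expected chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  if (contentLength <= 0) return 0;
  if (contentLength <= chunkSize) return 1;

  // The first chunk covers chunkSize characters; every later chunk adds
  // (chunkSize - chunkOverlap) characters that were not covered before.
  const stride = chunkSize - chunkOverlap;
  return 1 + Math.ceil((contentLength - chunkSize) / stride);
}
```

With chunkSize 1000 and chunkOverlap 200, a 5,000-character document gives an estimate of 6 under these assumptions. Because the chunker prefers to break on separators rather than at exact character offsets, the actual number of chunks may differ.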

Properties

config: ChunkerConfig

Configuration for this chunker