Chunking
Chunking splits a file into smaller pieces before embedding. This is useful when files are long, when you want late interaction over chunks, or when you want chunk-aware aggregation.
Parameters
- chunking_method
- chunk_size
- chunk_overlap
- max_chunks
- chunk_language
- chunker_options
- chunk_aggregation
Public Methods
- none: No chunking. The whole file is treated as one document.
- code: Code-aware chunking through Chonkie’s code chunker.
- chonkie_token: Token-based chunking.
- chonkie_sentence: Sentence/statement-oriented chunking.
- chonkie_recursive: Recursive chunking for progressively splitting larger text.
- chonkie_fast: A lightweight Chonkie path for simple chunking workflows.
Parameter Details
- chunk_size: Size hint passed to chunkers that support a target chunk length.
- chunk_overlap: Overlap hint between adjacent chunks.
- max_chunks: Maximum number of chunks kept per file. 0 means keep all chunks.
- chunk_language: Language hint used by language-aware chunkers, especially code chunking.
- chunker_options: Extra name=value options forwarded to the underlying chunker.
- chunk_aggregation: How chunk embeddings are reduced for single-vector scoring. Common choices: mean, max.
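To make the size, overlap, cap, and aggregation parameters concrete, here is a minimal sketch in plain Python. It is not Matheel's or Chonkie's implementation. The helper names window_chunks and aggregate are hypothetical, the window runs over a pre-split token list, overlap is assumed to mean a count of shared tokens, and a non-zero max_chunks is assumed to keep the first chunks in order.

```python
from typing import List, Sequence


def window_chunks(
    tokens: Sequence[str],
    chunk_size: int,
    chunk_overlap: int = 0,
    max_chunks: int = 0,
) -> List[List[str]]:
    """Hypothetical sliding-window chunker, for illustration only."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each window starts (chunk_size - chunk_overlap) tokens after the last.
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input

    # Assumed semantics: 0 keeps every chunk, N > 0 keeps the first N.
    return chunks if max_chunks == 0 else chunks[:max_chunks]


def aggregate(vectors: Sequence[Sequence[float]], how: str = "mean") -> List[float]:
    """Reduce per-chunk vectors to one vector, dimension by dimension."""
    if not vectors:
        raise ValueError("need at least one chunk vector")
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    if how == "mean":
        return [sum(column) / len(column) for column in columns]
    if how == "max":
        return [max(column) for column in columns]
    raise ValueError(f"unsupported aggregation: {how!r}")
```

With ten tokens, chunk_size=4 and chunk_overlap=1, this sketch yields three windows that each share one token with their neighbour. Setting max_chunks=2 would keep only the first two.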
Install
pip install "matheel[chunking]"
pip install "matheel[chunking_code]"
Behavior Notes
- Chunking is language-agnostic at the interface level.
- Code-aware chunkers become stronger when chunk_language matches the source language.
- If a Chonkie-backed method is selected and Chonkie is not installed, Matheel raises an import error instead of silently switching methods.
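The fail-loudly behavior in the last note follows a common pattern for optional dependencies. The sketch below shows that general pattern only. load_backend and its arguments are hypothetical names, not part of Matheel, and the error text is invented for illustration.

```python
import importlib
from types import ModuleType


def load_backend(module_name: str, install_hint: str) -> ModuleType:
    """Hypothetical loader: import an optional backend or fail loudly."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable message rather than falling back
        # to a different chunking method behind the caller's back.
        raise ImportError(
            f"Chunking backend '{module_name}' is not installed. "
            f"Try: {install_hint}"
        ) from exc
```

A caller might use it as load_backend("chonkie", 'pip install "matheel[chunking]"'). Raising here keeps results reproducible, because a silent fallback would change the chunk boundaries without any visible signal.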
CLI Example
python examples/sample_data.py --output sample_pairs.zip --overwrite
matheel compare sample_pairs.zip \
  --chunking-method code \
  --chunk-language java \
  --chunk-size 120 \
  --chunk-overlap 20 \
  --max-chunks 8 \
  --chunker-option include_line_numbers=true
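The --chunker-option flag takes a name=value string, while the Python API takes a chunker_options dict. One plausible mapping between the two is sketched below. How Matheel actually parses and coerces these values is not documented here, so parse_chunker_options, coerce, and the coercion rules are assumptions made for illustration.

```python
from typing import Dict, Iterable, Union

Value = Union[bool, int, float, str]


def coerce(raw: str) -> Value:
    """Assumed coercion: booleans first, then numbers, else the raw string."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def parse_chunker_options(pairs: Iterable[str]) -> Dict[str, Value]:
    """Hypothetical parser turning name=value strings into a dict."""
    options: Dict[str, Value] = {}
    for pair in pairs:
        name, separator, raw = pair.partition("=")
        if not separator or not name.strip():
            raise ValueError(f"expected name=value, got {pair!r}")
        options[name.strip()] = coerce(raw)
    return options
```

Under these assumptions, the string include_line_numbers=true maps to a dict with the key include_line_numbers and the boolean value True.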
Python Example
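from matheel.chunking import chunk_text

# Code-aware chunking of a small Java snippet.
chunks = chunk_text(
    "public class Demo { int add(int a, int b) { return a + b; } }",
    method="code",
    chunk_size=80,          # size hint for the chunker
    chunk_overlap=10,       # overlap hint between adjacent chunks
    max_chunks=4,           # keep at most 4 chunks (0 would keep all)
    chunk_language="java",  # language hint for the code chunker
    chunker_options={"include_line_numbers": True},
)
print(chunks)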