Question 1

What is RAG chunking and why does it matter?

Accepted Answer

Retrieval-Augmented Generation (RAG) pipelines split documents into chunks, embed each chunk as a vector, and retrieve the chunks most relevant to a query to ground an LLM's answer. Chunk quality directly affects retrieval quality: chunks that are too large dilute relevance, chunks that are too small lose context, and chunks that cut across topics retrieve poorly. This tool produces clean, semantically coherent chunks.
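To make the "retrieve the most relevant ones" step concrete, here is a minimal sketch of top-k retrieval by cosine similarity. The three-dimensional vectors are made-up stand-ins for real embeddings, and the names `cosine` and `top_k` are hypothetical. In practice this step usually runs inside a vector database over embeddings produced by a model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: list[list[float]], k: int = 2) -> list[int]:
    """Indices of the k vectors most similar to the query, best first."""
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings"; a real pipeline gets these from an embedding model.
chunks = ["Install steps", "Billing FAQ", "Install troubleshooting"]
vectors = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query = [1.0, 0.2, 0.0]
print([chunks[i] for i in top_k(query, vectors)])
# → ['Install steps', 'Install troubleshooting']
```

A chunk that mixed installation and billing text would land between both clusters and match neither query well, which is the "cut across topics" problem in miniature.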

Question 2

How do the chunking strategies differ?

Accepted Answer

"By heading" starts a new chunk at each heading of the chosen depth, keeping each section intact. "By token size" splits purely on a token budget, ignoring document structure. "Heading + token cap" (the recommended default) splits on headings but further divides any section that exceeds the token limit, so chunks stay semantically coherent while still respecting a maximum size.
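The three strategies can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual implementation: token counts use a rough 4-characters-per-token heuristic, only ATX (`#`) headings are recognized, "a heading of the chosen depth" is read as any heading at that level or shallower, and all function names are hypothetical.

```python
import math
import re

# Matches ATX headings such as "## Setup"; group 1 is the run of "#" characters.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token, a common rule of thumb.
    return math.ceil(len(text) / 4)


def split_by_heading(markdown: str, depth: int = 2) -> list[str]:
    """'By heading': start a new chunk at every heading of level <= depth."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match and len(match.group(1)) <= depth and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]  # drop sections that were only blank lines


def split_by_tokens(text: str, max_tokens: int) -> list[str]:
    """'By token size': pack whole words until the estimate would exceed max_tokens.

    A single word longer than the budget still becomes its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and estimate_tokens(candidate) > max_tokens:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def split_heading_with_cap(markdown: str, depth: int = 2, max_tokens: int = 400) -> list[str]:
    """'Heading + token cap': split on headings, then subdivide oversized sections."""
    result: list[str] = []
    for section in split_by_heading(markdown, depth):
        if estimate_tokens(section) <= max_tokens:
            result.append(section)
        else:
            result.extend(split_by_tokens(section, max_tokens))
    return result


doc = "# Intro\nHello world.\n## Setup\nInstall it.\n## Usage\nRun it."
print(split_heading_with_cap(doc, depth=2, max_tokens=100))
# → ['# Intro\nHello world.', '## Setup\nInstall it.', '## Usage\nRun it.']
```

The sketch keeps things short at some cost: it does not skip `#` lines inside fenced code blocks, and the word-packing fallback joins words with single spaces, discarding line breaks in an oversized section. A more careful splitter would prefer paragraph or sentence boundaries before falling back to words.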

Question 3

What does the overlap setting do?

Accepted Answer

Overlap copies the end of each chunk (whole sentences or paragraphs adding up to roughly the specified token count) into the start of the next chunk. This preserves context across boundaries, so a query whose answer straddles two chunks can still retrieve the full context. A small overlap (e.g. 50–100 tokens) is common.
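A minimal sketch of sentence-level overlap, assuming the same rough 4-characters-per-token estimate and a naive regex sentence splitter; `add_overlap` and `overlap_tokens` are hypothetical names, not the tool's settings API. It walks backwards through the previous chunk's sentences until the budget would be exceeded, and always carries at least one sentence, which is why the carried amount only approximates the requested count.

```python
import math
import re


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token.
    return math.ceil(len(text) / 4)


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def add_overlap(chunks: list[str], overlap_tokens: int = 60) -> list[str]:
    """Prefix each chunk with trailing sentences copied from the previous chunk."""
    result: list[str] = []
    for i, chunk in enumerate(chunks):
        if i == 0 or overlap_tokens <= 0:
            result.append(chunk)
            continue
        carried: list[str] = []
        total = 0
        # Walk backwards from the end of the previous (original, un-overlapped) chunk.
        for sentence in reversed(split_sentences(chunks[i - 1])):
            cost = estimate_tokens(sentence)
            # Stop once the next sentence would exceed the budget, but always keep one.
            if carried and total + cost > overlap_tokens:
                break
            carried.insert(0, sentence)
            total += cost
        result.append(" ".join(carried + [chunk]))
    return result


print(add_overlap(["One. Two. Three.", "Four."], overlap_tokens=3))
# → ['One. Two. Three.', 'Two. Three. Four.']
```

Copying whole sentences rather than an exact token slice avoids starting a chunk mid-sentence, at the price of a slightly uneven overlap size.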

Question 4

What metadata is attached to each chunk?

Accepted Answer

Each chunk gets a stable ID derived from the source name and the chunk's index, the source filename, the heading path (the chain of headings leading to that chunk, useful for context and filtering), and an estimated token count. This metadata is essential for citation, filtering, and debugging your retrieval pipeline.
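The metadata fields can be sketched as a small record type. The field names, the `slug-0000` ID format, and the 4-characters-per-token estimate are assumptions for illustration, not the tool's documented schema. The heading path comes from a stack of open headings: each new heading pops every open heading at its own level or deeper, then is pushed itself.

```python
import math
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    id: str                  # stable: derived only from source name + index
    source: str              # source filename
    heading_path: list[str]  # chain of headings leading to this chunk
    token_estimate: int      # rough estimate, ~4 characters per token (assumption)
    text: str


def make_id(source: str, index: int) -> str:
    # Hypothetical format: ("guide.md", 3) -> "guide-md-0003".
    slug = re.sub(r"[^a-z0-9]+", "-", source.lower()).strip("-")
    return f"{slug}-{index:04d}"


def chunks_with_metadata(markdown: str, source: str) -> list[Chunk]:
    """One chunk per heading section, each tagged with its heading path."""
    stack: list[tuple[int, str]] = []  # (level, title) of currently open headings
    sections: list[tuple[list[str], list[str]]] = []  # (heading path, lines)
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2).strip()
            # A new heading closes every open heading at its level or deeper.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, title))
            sections.append(([t for _, t in stack], [line]))
        elif sections:
            sections[-1][1].append(line)
        else:
            sections.append(([], [line]))  # text before the first heading
    chunks: list[Chunk] = []
    for path, lines in sections:
        text = "\n".join(lines).strip()
        if text:  # skip sections that contain only blank lines
            chunks.append(Chunk(make_id(source, len(chunks)), source, path,
                                math.ceil(len(text) / 4), text))
    return chunks


doc = "# Guide\nIntro.\n## Install\nSteps.\n### Linux\napt.\n## Usage\nRun."
for c in chunks_with_metadata(doc, "guide.md"):
    print(c.id, " > ".join(c.heading_path), c.token_estimate)
```

Because the ID depends only on the source name and position, re-chunking an unchanged document yields the same IDs, so existing vector-store entries can be updated in place rather than duplicated.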

Question 5

Which export format should I use?

Accepted Answer

Use JSON for a single array you can load programmatically, JSONL for streaming or bulk-loading into a vector database (one object per line), or Markdown with YAML front-matter if you prefer human-readable chunk files. All three contain the same chunk content and metadata.
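A sketch of the three export shapes using only Python's standard `json` module. The chunk dictionaries and their keys (`id`, `source`, `heading_path`, `token_estimate`, `text`) are hypothetical and may not match the tool's exact output. For the Markdown variant, front-matter values are written as JSON; since YAML 1.2 is designed to be a superset of JSON, this sidesteps hand-written YAML escaping.

```python
import json


def to_json(chunks: list[dict]) -> str:
    """JSON: one array holding every chunk."""
    return json.dumps(chunks, ensure_ascii=False, indent=2)


def to_jsonl(chunks: list[dict]) -> str:
    """JSONL: one compact JSON object per line, suited to streaming and bulk loads."""
    return "".join(json.dumps(c, ensure_ascii=False) + "\n" for c in chunks)


def to_markdown(chunk: dict) -> str:
    """Markdown: YAML front matter holding the metadata, then the chunk text."""
    meta = {k: v for k, v in chunk.items() if k != "text"}
    # JSON-encoded scalars and lists are also valid YAML 1.2 values.
    front = "\n".join(f"{k}: {json.dumps(v, ensure_ascii=False)}" for k, v in meta.items())
    return f"---\n{front}\n---\n\n{chunk['text']}\n"


chunk = {"id": "guide-md-0000", "source": "guide.md",
         "heading_path": ["Guide"], "token_estimate": 4, "text": "# Guide\nIntro."}
print(to_markdown(chunk))
```

JSONL's one-object-per-line layout lets a loader read and insert records incrementally without parsing the whole file into memory, which is why it suits bulk ingestion.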

Question 6

Is my document uploaded anywhere?

Accepted Answer

No. Chunking, token estimation, and export all run locally in your browser, so even proprietary knowledge-base content stays on your device.

Markdown RAG Formatter
