Class: Deepsearch::Engine::Steps::Rag::Chunker
- Inherits: Object
  - Object
  - Deepsearch::Engine::Steps::Rag::Chunker
- Defined in: lib/deepsearch/engine/steps/rag/chunker.rb
Overview
Splits a large piece of text into smaller, overlapping chunks of at most MAX_CHUNK_SIZE characters, where consecutive chunks share up to OVERLAP_SIZE characters. Chunking is a prerequisite for generating embeddings and performing similarity searches in a RAG pipeline.
Constant Summary
- MAX_CHUNK_SIZE = 7500
- OVERLAP_SIZE = 300
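To illustrate how the two constants interact, the sketch below computes the start offset and length of each chunk for a given content length. `chunk_offsets` is a hypothetical helper written for this example, not part of the library. It mirrors the stepping logic of #chunk and assumes lengths are counted in characters.

```ruby
MAX_CHUNK_SIZE = 7500
OVERLAP_SIZE = 300

# Hypothetical helper (not part of Deepsearch): returns [start, length]
# pairs for the chunks that the stepping logic would produce.
def chunk_offsets(content_length, max: MAX_CHUNK_SIZE, overlap: OVERLAP_SIZE)
  # Content that fits in one chunk is returned whole.
  return [[0, content_length]] if content_length <= max

  # Each chunk starts (max - overlap) characters after the previous one.
  step = max - overlap
  (0...content_length).step(step).map do |start|
    [start, [max, content_length - start].min]
  end
end

p chunk_offsets(16_000)
# prints [[0, 7500], [7200, 7500], [14400, 1600]]
```

With the default constants the step is 7500 - 300 = 7200, so content longer than MAX_CHUNK_SIZE yields one chunk per 7200 characters, rounded up.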
Instance Method Summary
- #chunk(content) ⇒ Object
Instance Method Details
#chunk(content) ⇒ Object
```ruby
# File 'lib/deepsearch/engine/steps/rag/chunker.rb', line 13

def chunk(content)
  return [Values::Chunk.new(text: content)] if content.length <= MAX_CHUNK_SIZE

  chunks = []
  step = MAX_CHUNK_SIZE - OVERLAP_SIZE
  i = 0
  while i < content.length
    chunk_text = content.slice(i, MAX_CHUNK_SIZE)
    chunks << Values::Chunk.new(text: chunk_text)
    i += step
  end
  chunks
end
```
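The behaviour of #chunk can be tried outside the gem with a minimal standalone sketch. `Chunk` and `SketchChunker` below are stand-ins invented for this example: the real `Values::Chunk` class is not shown on this page, so only the `text:` keyword used by #chunk is assumed.

```ruby
# Stand-in for Values::Chunk (assumption: it accepts a `text:` keyword).
Chunk = Struct.new(:text, keyword_init: true)

# Sketch that mirrors the documented #chunk body with the same constants.
class SketchChunker
  MAX_CHUNK_SIZE = 7500
  OVERLAP_SIZE = 300

  def chunk(content)
    return [Chunk.new(text: content)] if content.length <= MAX_CHUNK_SIZE

    step = MAX_CHUNK_SIZE - OVERLAP_SIZE
    (0...content.length).step(step).map do |i|
      Chunk.new(text: content.slice(i, MAX_CHUNK_SIZE))
    end
  end
end

# 16,000 characters of repeating alphabet as sample input.
content = ("a".."z").cycle.first(16_000).join
chunks = SketchChunker.new.chunk(content)

p chunks.map { |c| c.text.length }
# prints [7500, 7500, 1600]

# The last OVERLAP_SIZE characters of one chunk open the next chunk.
overlap = SketchChunker::OVERLAP_SIZE
p chunks[0].text[-overlap, overlap] == chunks[1].text[0, overlap]
# prints true
```

One consequence of this stepping: when the content length is only slightly above a multiple of the step, the final chunk can lie entirely inside the previous chunk's overlap. A 14,500-character input, for example, produces a last chunk of 100 characters that the second chunk already contains. Callers that embed every chunk may want to account for that.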