Class: Deepsearch::Engine::Steps::Rag::Chunker

Inherits:
Object
Defined in:
lib/deepsearch/engine/steps/rag/chunker.rb

Overview

Splits a large piece of text into smaller, overlapping chunks of at most MAX_CHUNK_SIZE characters, with consecutive chunks sharing OVERLAP_SIZE characters. Chunking is a prerequisite for generating embeddings and performing similarity searches in a RAG pipeline.

Constant Summary

MAX_CHUNK_SIZE = 7500
OVERLAP_SIZE = 300
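
With these values, consecutive chunks start 7500 - 300 = 7200 characters apart. The sketch below works through that arithmetic for a given input length. The helper `chunk_ranges` is hypothetical and not part of Deepsearch; it only mirrors the offsets that the loop in `#chunk` would visit.

```ruby
MAX_CHUNK_SIZE = 7500
OVERLAP_SIZE = 300

# Hypothetical helper (not part of Deepsearch): returns the [start, length]
# pair of every chunk that #chunk would produce for an input of `length`
# characters.
def chunk_ranges(length)
  return [[0, length]] if length <= MAX_CHUNK_SIZE

  step = MAX_CHUNK_SIZE - OVERLAP_SIZE
  (0...length).step(step).map do |start|
    [start, [MAX_CHUNK_SIZE, length - start].min]
  end
end

p chunk_ranges(16_000)
# => [[0, 7500], [7200, 7500], [14400, 1600]]
```

For inputs longer than MAX_CHUNK_SIZE, the number of chunks is the input length divided by the 7200-character step, rounded up.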

Instance Method Summary

Instance Method Details

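#chunk(content) ⇒ Object

Returns an Array of Values::Chunk objects. Content no longer than MAX_CHUNK_SIZE characters comes back as a single chunk.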



# File 'lib/deepsearch/engine/steps/rag/chunker.rb', line 13

def chunk(content)
  return [Values::Chunk.new(text: content)] if content.length <= MAX_CHUNK_SIZE

  chunks = []
  step = MAX_CHUNK_SIZE - OVERLAP_SIZE

  i = 0
  while i < content.length
    chunk_text = content.slice(i, MAX_CHUNK_SIZE)
    chunks << Values::Chunk.new(text: chunk_text)
    i += step
  end
  chunks
end
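
The behavior of the method can be checked in isolation with the sketch below. It copies the loop into a free-standing method and substitutes a stand-in `Chunk` struct for `Values::Chunk`. The stand-in is an assumption: the real class is only known from the listing to accept a `text:` keyword, and it may carry other fields.

```ruby
# Stand-in for Values::Chunk (assumption: only the `text` attribute is modeled).
Chunk = Struct.new(:text, keyword_init: true)

MAX_CHUNK_SIZE = 7500
OVERLAP_SIZE = 300

# Same loop as Chunker#chunk, copied into a free-standing method.
def chunk(content)
  return [Chunk.new(text: content)] if content.length <= MAX_CHUNK_SIZE

  chunks = []
  step = MAX_CHUNK_SIZE - OVERLAP_SIZE
  i = 0
  while i < content.length
    chunks << Chunk.new(text: content.slice(i, MAX_CHUNK_SIZE))
    i += step
  end
  chunks
end

# 14,700 characters of cycling digits, so overlapping regions can be compared.
content = Array.new(14_700) { |n| (n % 10).to_s }.join
chunks = chunk(content)

p chunks.map { |c| c.text.length }
# => [7500, 7500, 300]

# The tail of one chunk repeats as the head of the next.
p chunks[0].text[-OVERLAP_SIZE, OVERLAP_SIZE] == chunks[1].text[0, OVERLAP_SIZE]
# => true

# Edge case: the final chunk lies entirely inside the previous chunk.
p chunks[1].text.end_with?(chunks[2].text)
# => true
```

The last check shows a property of the loop worth knowing: when the remaining text after a full chunk is no longer than OVERLAP_SIZE, the final chunk adds no new text. Callers that embed every chunk may want to drop such a trailing chunk. Note also that sizes are measured in characters, not tokens.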