Class: Deepsearch::Engine::Steps::Rag::Similarity

Inherits:
Object
Defined in:
lib/deepsearch/engine/steps/rag/similarity.rb

Overview

Scores text chunks by their semantic similarity to a query and filters them. Each chunk embedding is compared to the query embedding with cosine similarity, and filtering happens in two steps: first, the TOP_K_CANDIDATES highest-scoring chunks are kept (top-k); second, a candidate survives only if its score is at least threshold times the best score, where threshold defaults to RELATIVE_SCORE_THRESHOLD.
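The two-step process described above can be sketched in a few lines of plain Ruby. This is a minimal illustration, not the class's actual code: the names cosine_similarity and sketch_find_relevant are hypothetical, the sketch works on raw embedding arrays rather than chunk objects, and it returns chunk indices instead of chunks.

```ruby
# Hypothetical helper: cosine similarity between two equal-length vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero? # avoid dividing by zero

  dot / (norm_a * norm_b)
end

# Hypothetical stand-in for the two-step filter; returns indices of kept chunks.
def sketch_find_relevant(query_embedding, chunk_embeddings, top_k: 75, threshold: 0.85)
  # Step 1: score every chunk and keep the top_k best [score, index] pairs.
  scored = chunk_embeddings.each_with_index.map do |embedding, index|
    [cosine_similarity(embedding, query_embedding), index]
  end
  top = scored.sort_by { |score, _| -score }.first(top_k)
  return [] if top.empty?

  # Step 2: keep candidates scoring at least threshold * best score.
  cutoff = top.first.first * threshold
  top.select { |score, _| score >= cutoff }.map { |_, index| index }
end

query = [1.0, 0.0]
chunks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 1.0]]
sketch_find_relevant(query, chunks) # => [0, 1]
```

In this example the best score is 1.0, so the cutoff is 0.85. The chunk at index 3 scores about 0.71 and is dropped even though it would fit within the top-k limit.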

Constant Summary

TOP_K_CANDIDATES = 75
RELATIVE_SCORE_THRESHOLD = 0.85

Instance Method Summary

Instance Method Details

#find_relevant(query, chunks, threshold: RELATIVE_SCORE_THRESHOLD) ⇒ Object



# File 'lib/deepsearch/engine/steps/rag/similarity.rb', line 15

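def find_relevant(query, chunks, threshold: RELATIVE_SCORE_THRESHOLD)
  return [] if chunks.empty?

  # Score every chunk embedding against the query embedding.
  similarities = calculate(chunks.map(&:embedding), query.embedding)

  # Step 1: keep at most TOP_K_CANDIDATES candidates as [score, index] pairs.
  top_candidates = top_k_with_scores(similarities, TOP_K_CANDIDATES)

  return [] if top_candidates.empty?

  # The first pair is treated as the best-scoring candidate.
  best_score = top_candidates.first.first
  cutoff_score = best_score * threshold

  # Step 2: keep candidates at or above the cutoff and map them back to chunks.
  top_candidates.select { |score, _| score >= cutoff_score }
                .map { |_, index| chunks[index] }
end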
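
To make the effect of the threshold: keyword concrete, the following sketch applies the same cutoff arithmetic (best score times threshold) to a hand-picked list of scores. The survivors helper and the score values are made up for illustration and are not part of the class.

```ruby
# Hypothetical helper mirroring the cutoff rule: keep scores >= best * threshold.
def survivors(scores, threshold)
  return [] if scores.empty?

  cutoff = scores.max * threshold
  scores.select { |score| score >= cutoff }
end

scores = [0.92, 0.81, 0.78, 0.60]

survivors(scores, 0.85) # cutoff is about 0.782 => [0.92, 0.81]
survivors(scores, 0.65) # cutoff is about 0.598 => [0.92, 0.81, 0.78, 0.60]

# With a negative best score the cutoff lies above the best score itself,
# so nothing passes the comparison.
survivors([-0.2, -0.5], 0.85) # => []
```

A higher threshold keeps only chunks that score close to the best match, while a lower one admits more of the top-k candidates. The last call shows a consequence of the multiplicative cutoff as written: when every similarity is negative, the result is empty.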