Class: Deepsearch::Engine::Steps::Rag::Similarity
- Inherits: Object
- Defined in: lib/deepsearch/engine/steps/rag/similarity.rb
Overview
Calculates and filters text chunks based on their semantic similarity to a query. It uses cosine similarity to score each chunk's embedding against the query embedding, then filters in two steps: first, it keeps a fixed number of top-scoring candidates (top-k); second, it keeps only those candidates whose score is at least a fixed fraction of the best candidate's score.
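As a rough illustration of the scoring step, the sketch below computes cosine similarity, dot(a, b) / (|a| * |b|), in plain Ruby. The names `cosine_similarity` and `score_all` are hypothetical: this page does not show how the class's own `calculate` method is implemented, and returning 0.0 for a zero-norm vector is an assumption.

```ruby
# Hypothetical helper: cosine similarity between two equal-length vectors.
# Returns 0.0 when either vector has zero norm (an assumption, not
# documented behaviour of Similarity#calculate).
def cosine_similarity(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Hypothetical helper: one score per chunk embedding, in input order.
def score_all(chunk_embeddings, query_embedding)
  chunk_embeddings.map { |e| cosine_similarity(e, query_embedding) }
end

scores = score_all([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 0.0])
p scores.map { |s| s.round(3) } # prints [1.0, 0.0, 0.707]
```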
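Constant Summary
- TOP_K_CANDIDATES = 75
  Number of top-scoring chunks retained before the relative filter is applied.
- RELATIVE_SCORE_THRESHOLD = 0.85
  Default `threshold` for #find_relevant: a candidate is kept only if its score is at least this fraction of the best candidate's score.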
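A minimal sketch of the relative cutoff with the documented default of 0.85. `relative_filter` is a hypothetical helper, and the input scores are made up and assumed to be sorted best first, as the top-k step would return them.

```ruby
RELATIVE_SCORE_THRESHOLD = 0.85 # value from the Constant Summary

# Hypothetical helper: keep scores that reach at least `threshold` times
# the best score. Assumes `scores` is non-empty and sorted best first.
def relative_filter(scores, threshold = RELATIVE_SCORE_THRESHOLD)
  cutoff = scores.first * threshold
  scores.select { |s| s >= cutoff }
end

p relative_filter([0.92, 0.88, 0.80, 0.77, 0.41]) # cutoff 0.782, prints [0.92, 0.88, 0.8]
p relative_filter([0.30, 0.27, 0.20])             # cutoff 0.255, prints [0.3, 0.27]
```

Because the cutoff is relative, the second call keeps 0.27 even though it is a weak absolute score: the filter measures how close each candidate comes to the best match, not whether the best match itself is good.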
Instance Method Summary
Instance Method Details
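#find_relevant(query, chunks, threshold: RELATIVE_SCORE_THRESHOLD) ⇒ Object
Returns the chunks that survive both filtering steps, best-scoring first, or an empty array if `chunks` is empty or no candidates are found. Passing `threshold:` overrides RELATIVE_SCORE_THRESHOLD.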
```ruby
# File 'lib/deepsearch/engine/steps/rag/similarity.rb', line 15

def find_relevant(query, chunks, threshold: RELATIVE_SCORE_THRESHOLD)
  return [] if chunks.empty?

  # Score every chunk embedding against the query.
  similarities = calculate(chunks.map(&:embedding), query.)
  # Keep the TOP_K_CANDIDATES best [score, index] pairs, best first.
  top_candidates = top_k_with_scores(similarities, TOP_K_CANDIDATES)
  return [] if top_candidates.empty?

  # Relative cutoff: a fraction of the best candidate's score.
  best_score = top_candidates.first.first
  cutoff_score = best_score * threshold

  top_candidates.select { |score, _| score >= cutoff_score }
                .map { |_, index| chunks[index] }
end
```
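`#find_relevant` relies on `top_k_with_scores` returning `[score, index]` pairs ordered best first, since it reads the best score from `top_candidates.first.first`. The sketch below assumes that contract; this `top_k_with_scores`, the `select_relevant` wrapper, the `Chunk` struct, and the similarity values are all hypothetical stand-ins, not the library's code.

```ruby
# Hypothetical chunk type; #find_relevant only needs chunks to expose #embedding.
Chunk = Struct.new(:text, :embedding)

# Hypothetical version of the helper: the k highest scores as
# [score, index] pairs, best first.
def top_k_with_scores(similarities, k)
  similarities.each_with_index
              .sort_by { |score, _index| -score }
              .first(k)
end

# Hypothetical wrapper: the selection step of #find_relevant applied to
# precomputed similarity scores.
def select_relevant(similarities, chunks, k:, threshold:)
  top_candidates = top_k_with_scores(similarities, k)
  return [] if top_candidates.empty?

  cutoff_score = top_candidates.first.first * threshold
  top_candidates.select { |score, _| score >= cutoff_score }
                .map { |_, index| chunks[index] }
end

chunks = %w[a b c d].map { |t| Chunk.new(t, []) }
similarities = [0.40, 0.90, 0.10, 0.80] # made-up cosine scores

p top_k_with_scores(similarities, 3) # prints [[0.9, 1], [0.8, 3], [0.4, 0]]
p select_relevant(similarities, chunks, k: 3, threshold: 0.85).map(&:text)
# prints ["b", "d"]
```

`sort_by` is not stable, so chunks with equal scores may come back in any order; `each_with_index.max_by(k) { |score, _| score }` would be an equivalent way to pick the top k in descending order.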