Class: Documentrix::Documents
- Inherits:
- Object
- Includes:
- Cache, Kramdown::ANSI::Width
- Defined in:
- lib/documentrix/documents.rb,
lib/documentrix/documents/cache/redis_backed_memory_cache.rb
Defined Under Namespace
Modules: Cache, Splitters
Classes: MemoryCache, RedisBackedMemoryCache, RedisCache
Constant Summary
- Record =
Class.new Documentrix::Documents::Cache::Records::Record
Instance Attribute Summary
-
#cache ⇒ Object
readonly
Returns the value of attribute cache.
-
#collection ⇒ Object
Returns the value of attribute collection.
-
#model ⇒ Object
readonly
Returns the value of attribute model.
-
#ollama ⇒ Object
readonly
Returns the value of attribute ollama.
Instance Method Summary
-
#[](text) ⇒ Object
The [] method retrieves the value associated with the given text from the cache.
-
#[]=(text, record) ⇒ Object
The []= method sets the value for a given text in the cache.
-
#add(texts, batch_size: nil, source: nil, tags: []) ⇒ Documentrix::Documents
(also: #<<)
The add method adds new texts to the documents collection, skipping texts that already exist and fetching embeddings for the rest in batches.
-
#clear(tags: nil) ⇒ Documentrix::Documents
The clear method clears all texts from the cache or, if tags are given, only the texts tagged with them.
-
#collections ⇒ Array
The collections method returns an array of unique collection names.
-
#default_collection ⇒ :default
The default_collection method returns the default collection name.
-
#delete(text) ⇒ FalseClass, TrueClass
The delete method removes the specified text from the cache by calling the delete method on the underlying cache object.
-
#exist?(text) ⇒ FalseClass, TrueClass
The exist? method checks if the given text exists in the cache.
-
#find(string, tags: nil, prompt: nil, max_records: nil) ⇒ Array<Documentrix::Documents::Record>
The find method converts the given string to a vector and searches the cache for records by computing similarity scores against it, optionally restricted to the given tags.
-
#find_where(string, text_size: nil, text_count: nil, **opts) ⇒ Array<Documentrix::Documents::Record>
The find_where method returns the leading records found by find, stopping once the cumulative text size exceeds text_size or the number of records exceeds text_count.
-
#initialize(ollama:, model:, model_options: nil, collection: nil, embedding_length: 1_024, cache: MemoryCache, database_filename: nil, redis_url: nil, debug: false) ⇒ Documents
constructor
The initialize method sets up the Documentrix::Documents instance: it stores ollama, model, and the model options, selects the collection, and connects the cache.
-
#size ⇒ Integer
The size method returns the number of texts stored in the cache of this Documentrix::Documents instance.
-
#tags ⇒ Documentrix::Utils::Tags
The tags method returns the unique tags from the cache.
Constructor Details
#initialize(ollama:, model:, model_options: nil, collection: nil, embedding_length: 1_024, cache: MemoryCache, database_filename: nil, redis_url: nil, debug: false) ⇒ Documents
The initialize method sets up the Documentrix::Documents instance: it stores ollama, model, and the model options, selects the collection (default_collection when none is given, converted to a symbol), defaults database_filename to ':memory:', and connects the cache.
# File 'lib/documentrix/documents.rb', line 37
def initialize(ollama:, model:, model_options: nil, collection: nil, embedding_length: 1_024, cache: MemoryCache, database_filename: nil, redis_url: nil, debug: false)
  collection ||= default_collection
  @ollama, @model, @model_options, @collection, @debug =
    ollama, model, model_options, collection.to_sym, debug
  database_filename ||= ':memory:'
  @cache = connect_cache(cache, redis_url, embedding_length, database_filename)
end
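The defaulting steps of the constructor can be tried in isolation. The following is a minimal sketch, not the library's code: DefaultsSketch is a hypothetical stand-in class that mirrors only the collection and database_filename handling shown above and leaves out the ollama client and the cache connection.

```ruby
# Hypothetical stand-in (not part of Documentrix) that mirrors only the
# defaulting steps of the constructor.
class DefaultsSketch
  attr_reader :collection, :database_filename

  def initialize(collection: nil, database_filename: nil)
    collection ||= default_collection        # fall back to :default
    @collection = collection.to_sym          # strings are normalized to symbols
    @database_filename = database_filename || ':memory:'
  end

  def default_collection
    :default
  end
end

plain  = DefaultsSketch.new
custom = DefaultsSketch.new(collection: 'notes', database_filename: 'docs.sqlite')
```

Here plain ends up in the :default collection with an in-memory database, while custom uses the :notes collection and the given file name.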
Instance Attribute Details
#cache ⇒ Object (readonly)
Returns the value of attribute cache.
# File 'lib/documentrix/documents.rb', line 52
def cache
  @cache
end
#collection ⇒ Object
Returns the value of attribute collection.
# File 'lib/documentrix/documents.rb', line 52
def collection
  @collection
end
#model ⇒ Object (readonly)
Returns the value of attribute model.
# File 'lib/documentrix/documents.rb', line 52
def model
  @model
end
#ollama ⇒ Object (readonly)
Returns the value of attribute ollama.
# File 'lib/documentrix/documents.rb', line 52
def ollama
  @ollama
end
Instance Method Details
#[](text) ⇒ Object
The [] method retrieves the value associated with the given text from the cache.
# File 'lib/documentrix/documents.rb', line 130
def [](text)
  @cache[key(text)]
end
#[]=(text, record) ⇒ Object
The []= method sets the value for a given text in the cache.
# File 'lib/documentrix/documents.rb', line 138
def []=(text, record)
  @cache[key(text)] = record
end
#add(texts, batch_size: nil, source: nil, tags: []) ⇒ Documentrix::Documents Also known as: <<
The add method adds new texts to the documents collection by
processing them through several stages. It first filters out existing texts
from the input array using the prepare_texts method, then fetches
embeddings for each text using the specified model and options. Each fetched
embedding is used to create a new record in the cache, which is
associated with the original text and tags (if any). The method processes
the texts in batches of size batch_size (10 by default), displaying progress
information in the console. It also accepts an optional source string to
associate with the added texts and an array of tags to attach to each record.
When source is given, its basename (without any query string) is added as a
tag as well. Once all texts have been processed, it returns the
Documentrix::Documents instance itself, allowing for method chaining.
# File 'lib/documentrix/documents.rb', line 100
def add(texts, batch_size: nil, source: nil, tags: [])
  texts = prepare_texts(texts) or return self
  tags = Documentrix::Utils::Tags.new(tags, source:)
  if source
    tags.add(File.basename(source).gsub(/\?.*/, ''), source:)
  end
  batches = texts.each_slice(batch_size || 10).
    with_infobar(
      label: "Add #{truncate(tags.to_s(link: false), percentage: 25)}",
      total: texts.size
    )
  batches.each do |batch|
    embeddings = fetch_embeddings(model:, options: @model_options, input: batch)
    batch.zip(embeddings) do |text, embedding|
      norm = @cache.norm(embedding)
      self[text] = Record[text:, embedding:, norm:, source:, tags: tags.to_a]
    end
    infobar.progress by: batch.size
  end
  infobar.newline
  self
end
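The batching flow of add can be sketched without an Ollama server. This is an illustration under stated assumptions, not the library's implementation: FakeRecord, fake_embed, and add_sketch are hypothetical names, the store is a plain Hash keyed by text, the embedder is a toy function, and the norm is assumed to be the Euclidean norm. Progress output is left out.

```ruby
# Hypothetical stand-ins: none of these names are Documentrix APIs.
FakeRecord = Struct.new(:text, :embedding, :norm, :source, :tags, keyword_init: true)

# Toy embedder: maps each text to a 2-d vector (length, vowel count).
def fake_embed(batch)
  batch.map { |text| [text.size.to_f, text.count('aeiou').to_f] }
end

def add_sketch(store, texts, batch_size: nil, source: nil, tags: [])
  texts = texts.reject { |text| store.key?(text) } # skip texts already stored
  return store if texts.empty?
  tags = tags.dup
  # Derive an extra tag from the source, dropping any query string.
  tags << File.basename(source).gsub(/\?.*/, '') if source
  texts.each_slice(batch_size || 10) do |batch|
    embeddings = fake_embed(batch)                 # one call per batch
    batch.zip(embeddings) do |text, embedding|
      norm = Math.sqrt(embedding.sum { |x| x * x }) # assumed Euclidean norm
      store[text] = FakeRecord.new(
        text: text, embedding: embedding, norm: norm, source: source, tags: tags
      )
    end
  end
  store
end

store = {}
add_sketch(store, %w[alpha beta gamma], batch_size: 2,
           source: 'https://example.com/docs/guide.html?v=2', tags: ['manual'])
add_sketch(store, %w[alpha]) # already present, so nothing is added
```

Precomputing the norm at insertion time means it does not have to be recomputed for every stored record on each search.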
#clear(tags: nil) ⇒ Documentrix::Documents
The clear method clears all texts from the cache or, if tags are given, only the texts tagged with them.
# File 'lib/documentrix/documents.rb', line 176
def clear(tags: nil)
  @cache.clear(tags:)
  self
end
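The two modes of clear can be illustrated with a plain Hash. This is a sketch only: clear_sketch is a hypothetical helper, the store maps each text to its array of tags, and the rule that a record matches when it carries any of the given tags is an assumption, since the source does not spell out the matching rule.

```ruby
# Hypothetical helper; the store maps text => array of tag strings.
def clear_sketch(store, tags: nil)
  if tags
    wanted = Array(tags).map(&:to_s)
    # Assumption: a record is removed if it carries at least one wanted tag.
    store.delete_if { |_text, record_tags| (record_tags & wanted).any? }
  else
    store.clear # no tags given: remove everything
  end
  store
end

store = { 'a' => ['x'], 'b' => ['y'], 'c' => %w[x y] }
after_tagged = clear_sketch(store, tags: ['x']).keys
after_all    = clear_sketch(store).keys
```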
#collections ⇒ Array
The collections method returns an array of unique collection names.
# File 'lib/documentrix/documents.rb', line 228
def collections
  ([ default_collection ] + @cache.collections('%s-' % class_prefix)).uniq
end
#default_collection ⇒ :default
The default_collection method returns the default collection name.
# File 'lib/documentrix/documents.rb', line 48
def default_collection
  :default
end
#delete(text) ⇒ FalseClass, TrueClass
The delete method removes the specified text from the cache by calling the delete method on the underlying cache object.
# File 'lib/documentrix/documents.rb', line 158
def delete(text)
  @cache.delete(key(text))
end
#exist?(text) ⇒ FalseClass, TrueClass
The exist? method checks if the given text exists in the cache.
# File 'lib/documentrix/documents.rb', line 147
def exist?(text)
  @cache.key?(key(text))
end
#find(string, tags: nil, prompt: nil, max_records: nil) ⇒ Array<Documentrix::Documents::Record>
The find method converts the given string to a vector and searches the cache for records by computing similarity scores against it, optionally restricted to the given tags.
# File 'lib/documentrix/documents.rb', line 193
def find(string, tags: nil, prompt: nil, max_records: nil)
  needle = convert_to_vector(string, prompt:)
  @cache.find_records(needle, tags:, max_records: nil)
end
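A similarity search of this kind can be sketched as follows. The source only states that similarity scores are computed; cosine similarity is an assumption here, suggested by the norm stored with each record. Entry, euclidean_norm, build_entry, and find_sketch are hypothetical names, and the vectors are hand-written rather than real embeddings.

```ruby
# Hypothetical stand-ins; not the library's record or cache classes.
Entry = Struct.new(:text, :embedding, :norm, :tags)

def euclidean_norm(vector)
  Math.sqrt(vector.sum { |x| x * x })
end

def build_entry(text, embedding, tags = [])
  Entry.new(text, embedding, euclidean_norm(embedding), tags)
end

# Ranks entries by cosine similarity to the needle vector (assumed metric).
def find_sketch(entries, needle, tags: nil, max_records: nil)
  needle_norm = euclidean_norm(needle)
  candidates = tags ? entries.select { |e| (e.tags & tags).any? } : entries
  scored = candidates.map do |entry|
    dot = entry.embedding.zip(needle).sum { |a, b| a * b }
    [dot / (entry.norm * needle_norm), entry]
  end
  ranked = scored.sort_by { |score, _entry| -score }.map { |_score, entry| entry }
  max_records ? ranked.first(max_records) : ranked
end

entries = [
  build_entry('x axis', [1.0, 0.0], ['axes']),
  build_entry('y axis', [0.0, 1.0], ['axes']),
  build_entry('diagonal', [1.0, 1.0], ['other'])
]
nearest = find_sketch(entries, [1.0, 0.1], max_records: 2).map(&:text)
tagged  = find_sketch(entries, [1.0, 0.1], tags: ['axes']).map(&:text)
```

With the needle pointing almost along the first axis, the 'x axis' entry ranks first, followed by 'diagonal'.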
#find_where(string, text_size: nil, text_count: nil, **opts) ⇒ Array<Documentrix::Documents::Record>
The find_where method returns the leading records found by find, stopping once the cumulative text size exceeds text_size or the number of records exceeds text_count.
# File 'lib/documentrix/documents.rb', line 208
def find_where(string, text_size: nil, text_count: nil, **opts)
  if text_count
    opts[:max_records] = text_count
  end
  records = find(string, **opts)
  size, count = 0, 0
  records.take_while do |record|
    if text_size and (size += record.text.size) > text_size
      next false
    end
    if text_count and (count += 1) > text_count
      next false
    end
    true
  end
end
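The cut-off logic can be exercised on its own, without a cache or embeddings. In this sketch Rec and take_within are hypothetical names; take_within applies the same take_while loop as find_where to an already ranked list of records.

```ruby
# Hypothetical record type with just the text attribute the loop needs.
Rec = Struct.new(:text)

# Keeps leading records until a size or count limit would be exceeded.
def take_within(records, text_size: nil, text_count: nil)
  size, count = 0, 0
  records.take_while do |record|
    next false if text_size && (size += record.text.size) > text_size
    next false if text_count && (count += 1) > text_count
    true
  end
end

records = %w[aaaa bbb cc d].map { |text| Rec.new(text) }

by_size  = take_within(records, text_size: 8).map(&:text)  # 4 + 3 fits, adding 2 more does not
by_count = take_within(records, text_count: 1).map(&:text)
no_limit = take_within(records).map(&:text)
```

Because take_while stops at the first rejected record, a long record early in the ranking ends the result even if later, shorter records would still fit within text_size.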
#size ⇒ Integer
The size method returns the number of texts stored in the cache of this Documentrix::Documents instance.
# File 'lib/documentrix/documents.rb', line 166
def size
  @cache.size
end
#tags ⇒ Documentrix::Utils::Tags
The tags method returns the unique tags from the cache.
# File 'lib/documentrix/documents.rb', line 235
def tags
  @cache.tags
end