Module: OllamaChat::SourceFetching
- Included in: Chat
- Defined in: lib/ollama_chat/source_fetching.rb
Instance Method Summary
- #add_image(images, source_io, source) ⇒ Object
  Adds an image to the images collection from the given source IO and source identifier.
- #embed(source) ⇒ String?
  Embeds content from the specified source.
- #embed_source(source_io, source, count: nil) ⇒ Array, ...
  Embeds content from the given source IO and source identifier.
- #fetch_source(source) {|tmp| ... } ⇒ Object
  The fetch_source method retrieves content from various source types including commands, URLs, and file paths.
- #http_options(url) ⇒ Hash
  The http_options method prepares HTTP options for requests based on configuration settings.
- #import(source) ⇒ String?
  Imports content from the specified source and processes it.
- #import_source(source_io, source) ⇒ String
  The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.
- #summarize(source, words: nil) ⇒ String?
  Summarizes content from the specified source.
- #summarize_source(source_io, source, words: nil) ⇒ String?
  Summarizes content from the given source IO and source identifier.
Instance Method Details
#add_image(images, source_io, source) ⇒ Object
Adds an image to the images collection from the given source IO and source identifier.
This method takes an IO object containing image data and associates it with a source, creating an Ollama::Image instance and adding it to the images array.
# File 'lib/ollama_chat/source_fetching.rb', line 72
def add_image(images, source_io, source)
  STDERR.puts "Adding #{source_io&.content_type} image #{source.to_s.inspect}."
  image = Ollama::Image.for_io(source_io, path: source.to_s)
  (images << image).uniq!
end
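Because the method ends with Array#uniq!, its return value depends on whether a duplicate was removed: uniq! returns nil when nothing changed and the array itself otherwise. The sketch below illustrates this with plain strings standing in for Ollama::Image objects. sketch_add is a hypothetical helper and not part of the library, and whether two Ollama::Image instances count as duplicates depends on that class's equality, which is not shown here.

```ruby
# Hypothetical stand-in for add_image: appends an item and removes duplicates in place.
def sketch_add(items, item)
  (items << item).uniq!
end

items = []
first  = sketch_add(items, "cat.png") # no duplicate removed, so uniq! returns nil
second = sketch_add(items, "cat.png") # duplicate removed, so uniq! returns the array

p first  # nil
p second # ["cat.png"]
p items  # ["cat.png"]
```

Callers should therefore rely on the mutated collection rather than on the return value.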
#embed(source) ⇒ String?
Embeds content from the specified source.
This method fetches content from a given source (command, URL, or file) and processes it for embedding using the embed_source method. If embedding is disabled, it falls back to generating a summary instead.
Parameters:
- source: the source to embed, which can be a command, URL, or file path
Returns:
- String, or nil if the operation fails
# File 'lib/ollama_chat/source_fetching.rb', line 220
def embed(source)
  if embedding.on?
    STDOUT.puts "Now embedding #{source.to_s.inspect}."
    fetch_source(source) do |source_io|
      content = parse_source(source_io)
      content.present? or return
      source_io.rewind
      embed_source(source_io, source)
    end
    config.prompts.embed % { source: }
  else
    STDOUT.puts "Embedding is off, so I will just give a small summary of this source."
    summarize(source)
  end
end
#embed_source(source_io, source, count: nil) ⇒ Array, ...
Embeds content from the given source IO and source identifier.
This method processes document content by splitting it into chunks using various splitting strategies (Character, RecursiveCharacter, Semantic) and adds the chunks to a document store for embedding.
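Returns:
- nil if embedding fails. When embedding is disabled, the method as listed returns the result of parse_source(source_io) rather than nil.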
# File 'lib/ollama_chat/source_fetching.rb', line 162
def embed_source(source_io, source, count: nil)
  embedding.on? or return parse_source(source_io)
  m = "Embedding #{italic { source_io&.content_type }} document #{source.to_s.inspect}."
  if count
    STDOUT.puts '%u. %s' % [ count, m ]
  else
    STDOUT.puts m
  end
  text = parse_source(source_io) or return
  text.downcase!
  splitter_config = config.embedding.splitter
  inputs = nil
  case splitter_config.name
  when 'Character'
    splitter = Documentrix::Documents::Splitters::Character.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'RecursiveCharacter'
    splitter = Documentrix::Documents::Splitters::RecursiveCharacter.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'Semantic'
    splitter = Documentrix::Documents::Splitters::Semantic.new(
      ollama:, model: config.embedding.model.name,
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(
      text,
      breakpoint: splitter_config.breakpoint.to_sym,
      percentage: splitter_config.percentage?,
      percentile: splitter_config.percentile?,
    )
  end
  inputs or return
  source = source.to_s
  if source.start_with?(?!)
    source = Kramdown::ANSI::Width.truncate(
      source[1..-1].gsub(/\W+/, ?_),
      length: 10
    )
  end
  @documents.add(inputs, source:, batch_size: config.embedding.batch_size?)
end
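The final step of the listing turns a command source (one starting with "!") into a short label before storing the chunks. A minimal sketch of that sanitizing step follows. sketch_source_label is a hypothetical name, and plain string slicing stands in for Kramdown::ANSI::Width.truncate, whose exact output (for example, whether it appends an ellipsis) is not shown in this document.

```ruby
# Hypothetical helper: derive a storage label from a source identifier.
# Command sources lose the leading "!" and have every run of non-word
# characters replaced by a single underscore.
def sketch_source_label(source, length: 10)
  source = source.to_s
  return source unless source.start_with?('!')
  sanitized = source[1..-1].gsub(/\W+/, '_')
  sanitized[0, length] # stand-in for the library's truncate call (assumption)
end

p sketch_source_label('!ls -la /tmp')        # "ls_la_tmp"
p sketch_source_label('!cat /etc/hosts')     # "cat_etc_ho"
p sketch_source_label('https://example.com') # "https://example.com"
```

URLs and file paths pass through unchanged; only command sources are shortened.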
#fetch_source(source) {|tmp| ... } ⇒ Object
The fetch_source method retrieves content from various source types including commands, URLs, and file paths. It processes the source based on its type and yields a temporary file handle for further processing.
# File 'lib/ollama_chat/source_fetching.rb', line 30
def fetch_source(source, &block)
  case source
  when %r(\A!(.*))
    command = $1
    OllamaChat::Utils::Fetcher.execute(command) do |tmp|
      block.(tmp)
    end
  when %r(\Ahttps?://\S+)
    links.add(source.to_s)
    OllamaChat::Utils::Fetcher.get(
      source,
      headers: config.request_headers?.to_h,
      cache: @cache,
      debug: config.debug,
      http_options: http_options(OllamaChat::Utils::Fetcher.normalize_url(source))
    ) do |tmp|
      block.(tmp)
    end
  when %r(\Afile://(/\S*?)#|\A((?:\.\.|[~.]?)/\S*))
    filename = $~.captures.compact.first
    filename = File.expand_path(filename)
    OllamaChat::Utils::Fetcher.read(filename) do |tmp|
      block.(tmp)
    end
  else
    raise "invalid source #{source.inspect}"
  end
rescue => e
  STDERR.puts "Cannot fetch source #{source.to_s.inspect}: #{e.class} #{e}\n#{e.backtrace * ?\n}"
end
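The dispatch in fetch_source can be tried in isolation. The sketch below reuses the three patterns from the listing but only classifies the source instead of fetching it. classify_source is a hypothetical helper introduced for illustration, and it raises ArgumentError where the listing raises a generic error and then rescues it.

```ruby
# Hypothetical helper: report which branch of fetch_source a source would take.
def classify_source(source)
  case source
  when %r(\A!(.*))
    [:command, $1]                      # text after the leading "!"
  when %r(\Ahttps?://\S+)
    [:url, source]
  when %r(\Afile://(/\S*?)#|\A((?:\.\.|[~.]?)/\S*))
    [:file, $~.captures.compact.first]  # whichever alternative matched
  else
    raise ArgumentError, "invalid source #{source.inspect}"
  end
end

p classify_source('!ls -l')               # [:command, "ls -l"]
p classify_source('https://example.com/') # [:url, "https://example.com/"]
p classify_source('./README.md')          # [:file, "./README.md"]
p classify_source('~/notes.txt')          # [:file, "~/notes.txt"]
p classify_source('file:///tmp/x#top')    # [:file, "/tmp/x"]
```

A bare relative name such as notes.txt matches none of the patterns, so file sources need a leading /, ./, ../, or ~/.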
#http_options(url) ⇒ Hash
The http_options method prepares HTTP options for requests based on configuration settings. It determines whether SSL peer verification should be disabled for a given URL and whether a proxy should be used, then returns a hash of options.
Returns:
- Hash of HTTP options containing the SSL verification and proxy settings
# File 'lib/ollama_chat/source_fetching.rb', line 11
def http_options(url)
  options = {}
  if ssl_no_verify = config.ssl_no_verify?
    hostname = URI.parse(url).hostname
    options |= { ssl_verify_peer: !ssl_no_verify.include?(hostname) }
  end
  if proxy = config.proxy?
    options |= { proxy: }
  end
  options
end
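The same logic can be sketched without the chat configuration object. In the version below the configuration values are passed in as keyword arguments, which is an assumption made for illustration, and Hash#merge replaces the |= operator, since core Ruby's Hash does not define | and the listing presumably relies on an extension. sketch_http_options is a hypothetical name.

```ruby
require 'uri'

# Hypothetical helper: build HTTP options from explicit settings.
# ssl_no_verify is a list of hostnames for which peer verification is skipped.
def sketch_http_options(url, ssl_no_verify: nil, proxy: nil)
  options = {}
  if ssl_no_verify
    hostname = URI.parse(url).hostname
    options = options.merge(ssl_verify_peer: !ssl_no_verify.include?(hostname))
  end
  options = options.merge(proxy: proxy) if proxy
  options
end

p sketch_http_options('https://localhost:8443/x', ssl_no_verify: ['localhost'])
# {:ssl_verify_peer=>false} (printed as {ssl_verify_peer: false} on newer Rubies)
p sketch_http_options('https://example.com/', ssl_no_verify: ['localhost'])
p sketch_http_options('https://example.com/', proxy: 'http://proxy.example:3128')
```

With neither setting configured, the result is an empty hash.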
#import(source) ⇒ String?
Imports content from the specified source and processes it.
This method fetches content from a given source (command, URL, or file) and passes the resulting IO object to the import_source method for processing.
Parameters:
- source: the source to import, which can be a command, URL, or file path
# File 'lib/ollama_chat/source_fetching.rb', line 105
def import(source)
  fetch_source(source) do |source_io|
    content = import_source(source_io, source) or return
    source_io.rewind
    content
  end
end
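The `or return` inside the block exits the enclosing method, not just the block, which is how import can yield nil. The sketch below shows the pattern with an in-memory StringIO in place of a real fetch. sketch_fetch and sketch_import are hypothetical stand-ins, and treating an empty document as a failure is an assumption made so the early return can be observed.

```ruby
require 'stringio'

# Hypothetical fetcher: yields an IO for the source and returns the block's value.
def sketch_fetch(source)
  body = source == 'empty' ? '' : "content of #{source}"
  yield StringIO.new(body)
end

# Hypothetical importer mirroring the structure of import.
def sketch_import(source)
  sketch_fetch(source) do |io|
    text = io.read
    # `return` here leaves sketch_import itself with nil.
    content = (text.empty? ? nil : "Imported #{source.inspect}:\n\n#{text}\n\n") or return
    io.rewind
    content
  end
end

p sketch_import('./a.txt') # "Imported \"./a.txt\":\n\ncontent of ./a.txt\n\n"
p sketch_import('empty')   # nil
```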
#import_source(source_io, source) ⇒ String
The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.
Returns:
- String containing the import message and the parsed content
# File 'lib/ollama_chat/source_fetching.rb', line 87
def import_source(source_io, source)
  source = source.to_s
  document_type = source_io&.content_type.full? { |ct| italic { ct } + ' ' }
  STDOUT.puts "Importing #{document_type}document #{source.to_s.inspect} now."
  source_content = parse_source(source_io)
  "Imported #{source.inspect}:\n\n#{source_content}\n\n"
end
#summarize(source, words: nil) ⇒ String?
Summarizes content from the specified source.
This method fetches content from a given source (command, URL, or file) and generates a summary using the summarize_source method.
# File 'lib/ollama_chat/source_fetching.rb', line 141
def summarize(source, words: nil)
  fetch_source(source) do |source_io|
    content = summarize_source(source_io, source, words:) or return
    source_io.rewind
    content
  end
end
#summarize_source(source_io, source, words: nil) ⇒ String?
Summarizes content from the given source IO and source identifier.
This method takes an IO object containing document content and generates a summary based on the configured prompt template and word count.
# File 'lib/ollama_chat/source_fetching.rb', line 123
def summarize_source(source_io, source, words: nil)
  STDOUT.puts "Summarizing #{italic { source_io&.content_type }} document #{source.to_s.inspect} now."
  words = words.to_i
  words < 1 and words = 100
  source_content = parse_source(source_io)
  source_content.present? or return
  config.prompts.summarize % { source_content:, words: }
end
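Two details of the listing are easy to miss: any word count below 1 (including nil) falls back to 100, and the prompt is filled in with String#% and named references. The sketch below isolates both. The template text and the name sketch_summary_prompt are hypothetical, since the configured summarize prompt is not shown in this document, and a plain emptiness check stands in for present?.

```ruby
# Hypothetical template; the real one comes from config.prompts.summarize.
TEMPLATE = "Summarize the following in %{words} words:\n\n%{source_content}"

# Hypothetical helper mirroring the word-count default and prompt formatting.
def sketch_summary_prompt(source_content, words: nil)
  words = words.to_i
  words < 1 and words = 100                        # nil, 0, and negatives become 100
  return if source_content.to_s.strip.empty?       # stand-in for present?
  TEMPLATE % { source_content: source_content, words: words }
end

puts sketch_summary_prompt('Ruby is a language.')
puts sketch_summary_prompt('Ruby is a language.', words: '25')
p sketch_summary_prompt('   ') # nil
```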