Module: OllamaChat::SourceFetching
- Included in: Chat
- Defined in: lib/ollama_chat/source_fetching.rb
Overview
A module that provides functionality for fetching and processing various types of content sources.
The SourceFetching module encapsulates methods for retrieving content from different source types including URLs, file paths, and shell commands. It handles the logic for determining the appropriate fetching method based on the source identifier and processes the retrieved content through specialized parsers depending on the content type. The module also manages image handling, document importing, summarizing, and embedding operations while providing error handling and debugging capabilities.
Instance Method Summary
- #add_image(images, source_io, source) ⇒ Object
  Adds an image to the images collection from the given source IO and source identifier.
- #embed(source) ⇒ String?
  Embeds content from the specified source.
- #embed_source(source_io, source, count: nil) ⇒ Array, ...
  Embeds content from the given source IO and source identifier.
- #fetch_source(source) {|tmp| ... } ⇒ Object
  The fetch_source method retrieves content from various source types including commands, URLs, and file paths.
- #http_options(url) ⇒ Hash
  The http_options method prepares HTTP options for requests based on configuration settings.
- #import(source) ⇒ String?
  Imports content from the specified source and processes it.
- #import_source(source_io, source) ⇒ String
  The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.
- #summarize(source, words: nil) ⇒ String?
  Summarizes content from the specified source.
- #summarize_source(source_io, source, words: nil) ⇒ String?
  Summarizes content from the given source IO and source identifier.
Instance Method Details
#add_image(images, source_io, source) ⇒ Object
Adds an image to the images collection from the given source IO and source identifier.
This method takes an IO object containing image data and associates it with a source, creating an Ollama::Image instance and adding it to the images array.
# File 'lib/ollama_chat/source_fetching.rb', line 97
def add_image(images, source_io, source)
  STDERR.puts "Adding #{source_io&.content_type} image #{source.to_s.inspect}."
  image = Ollama::Image.for_io(source_io, path: source.to_s)
  (images << image).uniq!
end
#embed(source) ⇒ String?
Embeds content from the specified source.
This method fetches content from a given source (a command, URL, or file path) and processes it for embedding using the embed_source method. If embedding is disabled, it falls back to generating a summary instead. It returns nil if the operation fails.
# File 'lib/ollama_chat/source_fetching.rb', line 243
def embed(source)
  if @embedding.on?
    STDOUT.puts "Now embedding #{source.to_s.inspect}."
    fetch_source(source) do |source_io|
      content = parse_source(source_io)
      content.present? or return
      source_io.rewind
      embed_source(source_io, source)
    end
    config.prompts.embed % { source: }
  else
    STDOUT.puts "Embedding is off, so I will just give a small summary of this source."
    summarize(source)
  end
end
#embed_source(source_io, source, count: nil) ⇒ Array, ...
Embeds content from the given source IO and source identifier.
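This method parses the document content, downcases it, splits it into chunks using the configured splitting strategy (Character, RecursiveCharacter, or Semantic), and adds the chunks to a document store for embedding. It returns nil if parsing or splitting yields nothing. When embedding is disabled, the listing below returns the parsed source content instead of embedding it.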
# File 'lib/ollama_chat/source_fetching.rb', line 185
def embed_source(source_io, source, count: nil)
  @embedding.on? or return parse_source(source_io)
  m = "Embedding #{italic { source_io&.content_type }} document #{source.to_s.inspect}."
  if count
    STDOUT.puts '%u. %s' % [ count, m ]
  else
    STDOUT.puts m
  end
  text = parse_source(source_io) or return
  text.downcase!
  splitter_config = config.embedding.splitter
  inputs = nil
  case splitter_config.name
  when 'Character'
    splitter = Documentrix::Documents::Splitters::Character.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'RecursiveCharacter'
    splitter = Documentrix::Documents::Splitters::RecursiveCharacter.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'Semantic'
    splitter = Documentrix::Documents::Splitters::Semantic.new(
      ollama:, model: config.embedding.model.name,
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(
      text,
      breakpoint: splitter_config.breakpoint.to_sym,
      percentage: splitter_config.percentage?,
      percentile: splitter_config.percentile?,
    )
  end
  inputs or return
  source = source.to_s
  if source.start_with?(?!)
    source = Kramdown::ANSI::Width.truncate(
      source[1..-1].gsub(/\W+/, ?_),
      length: 10
    )
  end
  @documents.add(inputs, source:, batch_size: config.embedding.batch_size?)
end
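The chunk-then-label flow of embed_source can be illustrated without Documentrix. What follows is a minimal sketch, assuming a naive whitespace-based chunker: `naive_chunks` and `source_label` are hypothetical names that do not exist in the library, the chunker is not Documentrix's algorithm, and the plain slice only stands in for Kramdown::ANSI::Width.truncate, whose exact output may differ.

```ruby
# Hypothetical stand-in for a character-based splitter (NOT Documentrix's code):
# packs whitespace-separated words into chunks of at most chunk_size characters.
def naive_chunks(text, chunk_size:)
  chunks  = []
  current = +''
  text.split.each do |word|
    # Start a new chunk when adding this word would exceed the limit.
    if !current.empty? && current.size + 1 + word.size > chunk_size
      chunks << current
      current = +''
    end
    current << ' ' unless current.empty?
    current << word
  end
  chunks << current unless current.empty?
  chunks
end

# Mirrors the labelling step for command sources ("!..."): drop the bang,
# replace runs of non-word characters with "_", and keep at most 10 characters.
# The slice is an assumption standing in for Kramdown::ANSI::Width.truncate.
def source_label(source)
  source = source.to_s
  return source unless source.start_with?('!')
  source[1..].gsub(/\W+/, '_')[0, 10]
end

p naive_chunks('the quick brown fox jumps', chunk_size: 10)
p source_label('!ls -la /tmp')
```

Only command sources are relabelled; URLs and file paths are stored under their original identifier.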
#fetch_source(source) {|tmp| ... } ⇒ Object
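The fetch_source method retrieves content from various source types including commands, URLs, and file paths. It processes the source based on its type and yields a temporary file handle for further processing. If fetching fails or the source matches none of the supported types, the error is printed to STDERR and the method returns nil.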
# File 'lib/ollama_chat/source_fetching.rb', line 55
def fetch_source(source, &block)
  case source
  when %r(\A!(.*))
    command = $1
    OllamaChat::Utils::Fetcher.execute(command) do |tmp|
      block.(tmp)
    end
  when %r(\Ahttps?://\S+)
    links.add(source.to_s)
    OllamaChat::Utils::Fetcher.get(
      source,
      headers: config.request_headers?.to_h,
      cache: @cache,
      debug: config.debug,
      http_options: http_options(OllamaChat::Utils::Fetcher.normalize_url(source))
    ) do |tmp|
      block.(tmp)
    end
  when %r(\Afile://(/\S*?)#|\A((?:\.\.|[~.]?)/\S*))
    filename = $~.captures.compact.first
    filename = File.expand_path(filename)
    OllamaChat::Utils::Fetcher.read(filename) do |tmp|
      block.(tmp)
    end
  else
    raise "invalid source #{source.inspect}"
  end
rescue => e
  STDERR.puts "Cannot fetch source #{source.to_s.inspect}: #{e.class} #{e}\n#{e.backtrace * ?\n}"
end
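The dispatch in fetch_source rests on three regular expressions tried in order. The sketch below isolates that classification step so it can be tried without a Chat instance. `classify_source` is a hypothetical helper, not part of OllamaChat; it reuses the patterns from the listing and assumes a Unix-like filesystem for File.expand_path.

```ruby
# Hypothetical helper (not part of OllamaChat): classifies a source string
# with the same patterns fetch_source uses, without fetching anything.
def classify_source(source)
  case source
  when %r(\A!(.*))
    # A leading "!" marks a shell command; $1 is the command text.
    [:command, $1]
  when %r(\Ahttps?://\S+)
    [:url, source]
  when %r(\Afile://(/\S*?)#|\A((?:\.\.|[~.]?)/\S*))
    # Exactly one of the two capture groups matched; take whichever it was.
    [:file, File.expand_path($~.captures.compact.first)]
  else
    [:invalid, source]
  end
end

p classify_source('!ls -la')
p classify_source('https://example.com/page')
p classify_source('/etc/hosts')
p classify_source('plain text')
```

Note that the first alternative of the file pattern only matches a file:// URL that is followed by a `#`, while bare paths must start with `/`, `./`, `../`, or `~/`.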
#http_options(url) ⇒ Hash
The http_options method prepares HTTP options for requests based on configuration settings. It determines whether SSL peer verification should be disabled for the host of the given URL and whether a proxy should be used, then returns a hash containing the SSL verification and proxy settings.
# File 'lib/ollama_chat/source_fetching.rb', line 36
def http_options(url)
  options = {}
  if ssl_no_verify = config.ssl_no_verify?
    hostname = URI.parse(url).hostname
    options |= { ssl_verify_peer: !ssl_no_verify.include?(hostname) }
  end
  if proxy = config.proxy?
    options |= { proxy: }
  end
  options
end
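The option-building logic can be sketched as a standalone function. This is an illustration under stated assumptions: `sketch_http_options` and its keyword arguments are hypothetical and replace the `config` lookups, and plain key assignment is used because core Ruby's Hash does not define `|`, so the `|=` in the listing presumably relies on an extension loaded elsewhere in the project.

```ruby
require 'uri'

# Hypothetical standalone version of the option-building logic.
# ssl_no_verify: list of hostnames for which peer verification is skipped.
# proxy:         proxy URL, or nil for a direct connection.
def sketch_http_options(url, ssl_no_verify: nil, proxy: nil)
  options = {}
  if ssl_no_verify
    hostname = URI.parse(url).hostname
    # Verify the peer unless the host is explicitly listed.
    options[:ssl_verify_peer] = !ssl_no_verify.include?(hostname)
  end
  options[:proxy] = proxy if proxy
  options
end

p sketch_http_options('https://internal.example/x', ssl_no_verify: ['internal.example'])
p sketch_http_options('https://example.com/', proxy: 'http://proxy.example:3128')
```

When neither setting is configured the hash stays empty, so the fetcher falls back to its own defaults.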
#import(source) ⇒ String?
Imports content from the specified source and processes it.
This method fetches content from a given source (a command, URL, or file path) and passes the resulting IO object to the import_source method for processing.
# File 'lib/ollama_chat/source_fetching.rb', line 130
def import(source)
  fetch_source(source) do |source_io|
    content = import_source(source_io, source) or return
    source_io.rewind
    content
  end
end
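import, summarize, and embed share one shape: hand a block to fetch_source, bail out with `or return` when nothing usable comes back, rewind the IO, and let the block's value become the method's result. The sketch below reproduces that shape with a StringIO. `with_io` and `import_sketch` are hypothetical names, and `with_io` only stands in for fetch_source.

```ruby
require 'stringio'

# Hypothetical stand-in for fetch_source: yields an IO and returns
# whatever the block returns.
def with_io(text)
  io = StringIO.new(text)
  yield io
end

def import_sketch(text)
  with_io(text) do |io|
    content = io.read
    # `return` inside the block leaves import_sketch itself, yielding nil.
    content.empty? and return
    io.rewind # leave the IO readable for any later consumer
    "Imported:\n\n#{content}\n\n"
  end
end

p import_sketch('hello')
p import_sketch('')
```

Because the block is not a lambda, the early `return` exits the enclosing method rather than just the block, which is how the String? return types in this module can end up nil.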
#import_source(source_io, source) ⇒ String
The import_source method parses the content of the given source IO, prints a message naming the document type, and returns a formatted string that indicates the import result followed by the parsed content.
# File 'lib/ollama_chat/source_fetching.rb', line 112
def import_source(source_io, source)
  source = source.to_s
  document_type = source_io&.content_type.full? { |ct| italic { ct } + ' ' }
  STDOUT.puts "Importing #{document_type}document #{source.to_s.inspect} now."
  source_content = parse_source(source_io)
  "Imported #{source.inspect}:\n\n#{source_content}\n\n"
end
#summarize(source, words: nil) ⇒ String?
Summarizes content from the specified source.
This method fetches content from a given source (command, URL, or file) and generates a summary using the summarize_source method.
# File 'lib/ollama_chat/source_fetching.rb', line 165
def summarize(source, words: nil)
  fetch_source(source) do |source_io|
    content = summarize_source(source_io, source, words:) or return
    source_io.rewind
    content
  end
end
#summarize_source(source_io, source, words: nil) ⇒ String?
Summarizes content from the given source IO and source identifier.
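This method takes an IO object containing document content and builds a summary prompt from the configured prompt template and the requested word count, which defaults to 100 when none is given or the value is less than 1. It returns nil if the parsed content is empty.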
# File 'lib/ollama_chat/source_fetching.rb', line 147
def summarize_source(source_io, source, words: nil)
  STDOUT.puts "Summarizing #{italic { source_io&.content_type }} document #{source.to_s.inspect} now."
  words = words.to_i
  words < 1 and words = 100
  source_content = parse_source(source_io)
  source_content.present? or return
  config.prompts.summarize % { source_content:, words: }
end
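The word-count default and the template substitution can be tried in isolation. The sketch below is a minimal illustration: `TEMPLATE` and `build_summary_prompt` are hypothetical, the template text is invented for the example and is not the project's configured prompt, and a plain emptiness check stands in for `present?`.

```ruby
# Hypothetical template; the real one comes from config.prompts.summarize.
TEMPLATE = "Summarize in about %{words} words:\n\n%{source_content}"

def build_summary_prompt(source_content, words: nil, template: TEMPLATE)
  words = words.to_i               # nil.to_i is 0
  words = 100 if words < 1         # fall back to the default word count
  return if source_content.nil? || source_content.strip.empty?
  # String#% with a hash fills the %{name} references in the template.
  template % { source_content: source_content, words: words }
end

puts build_summary_prompt('Some document text.')
puts build_summary_prompt('Some document text.', words: 50)
```

Passing a hash to `%` means the template can mention the two names in any order, or omit one, without changing the calling code.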