Module: OllamaChat::SourceFetching

Included in:
Chat
Defined in:
lib/ollama_chat/source_fetching.rb

Overview

A module that provides functionality for fetching and processing various types of content sources.

The SourceFetching module encapsulates methods for retrieving content from different source types including URLs, file paths, and shell commands. It handles the logic for determining the appropriate fetching method based on the source identifier and processes the retrieved content through specialized parsers depending on the content type. The module also manages image handling, document importing, summarizing, and embedding operations while providing error handling and debugging capabilities.

Examples:

Fetching content from a URL

chat.fetch_source('https://example.com/document.html') do |source_io|
  # Process the fetched content
end

Importing a local file

chat.fetch_source('/path/to/local/file.txt') do |source_io|
  # Process the imported file content
end

Executing a shell command

chat.fetch_source('!ls -la') do |source_io|
  # Process the command output
end


Instance Method Details

#add_image(images, source_io, source) ⇒ Object

Adds an image to the images collection from the given source IO and source identifier.

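This method takes an IO object containing image data, creates an Ollama::Image instance from it (using the source as the image path), and appends it to the images array, removing any duplicates. It also reports the image's content type and source on STDERR.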

Parameters:

  • images (Array)

    The collection of images to which the new image will be added

  • source_io (IO)

    The input stream containing the image data

  • source (String, #to_s)

    The identifier or path for the source of the image



# File 'lib/ollama_chat/source_fetching.rb', line 97

def add_image(images, source_io, source)
  STDERR.puts "Adding #{source_io&.content_type} image #{source.to_s.inspect}."
  image = Ollama::Image.for_io(source_io, path: source.to_s)
  (images << image).uniq!
end
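
The `(images << image).uniq!` idiom appends first and then drops duplicates in place. The following is a minimal sketch of that behavior. `FakeImage` and `add_fake_image` are hypothetical stand-ins for `Ollama::Image` and `add_image`. Because the equality semantics of `Ollama::Image` are not documented here, the sketch only shows how `uniq!` treats value-equal objects:

```ruby
# Hypothetical stand-in for Ollama::Image; Struct provides value-based ==, eql? and hash.
FakeImage = Struct.new(:path)

def add_fake_image(images, path)
  image = FakeImage.new(path)
  # The parentheses make uniq! apply to the array, not to image.
  # uniq! returns nil when nothing was removed, so return images explicitly.
  (images << image).uniq!
  images
end

images = []
add_fake_image(images, 'cat.png')
add_fake_image(images, 'cat.png') # the duplicate is dropped
add_fake_image(images, 'dog.png')
p images.map(&:path) # => ["cat.png", "dog.png"]
```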

#embed(source) ⇒ String?

Embeds content from the specified source.

This method fetches content from a given source (command, URL, or file) and processes it for embedding using the embed_source method. If embedding is disabled, it falls back to generating a summary instead.

Parameters:

  • source (String)

    The source identifier, which can be a command, URL, or file path

Returns:

  • (String, nil)

    The formatted embed prompt (from config.prompts.embed) or the summary message, or nil if the fetched content is empty



243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
# File 'lib/ollama_chat/source_fetching.rb', line 243

def embed(source)
  if @embedding.on?
    STDOUT.puts "Now embedding #{source.to_s.inspect}."
    fetch_source(source) do |source_io|
      content = parse_source(source_io)
      content.present? or return
      source_io.rewind
      embed_source(source_io, source)
    end
    config.prompts.embed % { source: }
  else
    STDOUT.puts "Embedding is off, so I will just give a small summary of this source."
    summarize(source)
  end
end
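
The bare `return` inside the `fetch_source` block is a non-local return. It exits `embed` itself with `nil`, so the embed prompt is only produced when the source actually had content. The sketch below shows that control flow. `fake_fetch` and `embed_like` are hypothetical helpers standing in for `fetch_source` and `embed`:

```ruby
require 'stringio'

# Hypothetical stand-in for fetch_source: yields an IO built from a string.
def fake_fetch(text)
  yield StringIO.new(text)
end

# Mirrors the shape of #embed: a bare `return` inside the block exits
# the whole method with nil and skips the final message.
def embed_like(text, embedding_on:)
  return "summary of #{text.inspect}" unless embedding_on
  fake_fetch(text) do |io|
    content = io.read
    content.strip.empty? and return # non-local return: embed_like yields nil
    io.rewind
  end
  "embedded #{text.inspect}"
end

p embed_like('hello', embedding_on: true)
p embed_like('   ', embedding_on: true)
p embed_like('hello', embedding_on: false)
```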

#embed_source(source_io, source, count: nil) ⇒ Array, ...

Embeds content from the given source IO and source identifier.

This method parses and lowercases the document content, splits it into chunks using the splitter selected in the embedding configuration (Character, RecursiveCharacter, or Semantic), and adds the chunks to the document store for embedding. If embedding is disabled, it returns the parsed content instead.

Parameters:

  • source_io (IO)

    The input stream containing the document content to embed

  • source (String, #to_s)

    The identifier or path for the source of the content

  • count (Integer, nil) (defaults to: nil)

    An optional sequence number printed as a prefix of the progress message

Returns:

  • (Array, String, nil)

    The result of adding the chunks to the document store, the parsed content if embedding is disabled, or nil if parsing fails or no chunks are produced (for example, with an unknown splitter name)



# File 'lib/ollama_chat/source_fetching.rb', line 185

def embed_source(source_io, source, count: nil)
  @embedding.on? or return parse_source(source_io)
  m = "Embedding #{italic { source_io&.content_type }} document #{source.to_s.inspect}."
  if count
    STDOUT.puts '%u. %s' % [ count, m ]
  else
    STDOUT.puts m
  end
  text = parse_source(source_io) or return
  text.downcase!
  splitter_config = config.embedding.splitter
  inputs = nil
  case splitter_config.name
  when 'Character'
    splitter = Documentrix::Documents::Splitters::Character.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'RecursiveCharacter'
    splitter = Documentrix::Documents::Splitters::RecursiveCharacter.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'Semantic'
    splitter = Documentrix::Documents::Splitters::Semantic.new(
      ollama:, model: config.embedding.model.name,
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(
      text,
      breakpoint: splitter_config.breakpoint.to_sym,
      percentage: splitter_config.percentage?,
      percentile: splitter_config.percentile?,
    )
  end
  inputs or return
  source = source.to_s
  if source.start_with?(?!)
    source = Kramdown::ANSI::Width.truncate(
      source[1..-1].gsub(/\W+/, ?_),
      length: 10
    )
  end
  @documents.add(inputs, source:, batch_size: config.embedding.batch_size?)
end
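
The method does two things before storing the chunks. It picks one splitter based on `config.embedding.splitter.name`. For command sources, it also shortens the `!…` string into a compact label. The sketch below illustrates both steps in plain Ruby. `naive_character_split` and `command_label` are hypothetical names. The chunker is a simplified fixed-width splitter, not Documentrix's implementation, and plain slicing stands in for `Kramdown::ANSI::Width.truncate`:

```ruby
# Naive fixed-width chunker: a simplified illustration only, NOT the
# Documentrix::Documents::Splitters::Character algorithm.
def naive_character_split(text, chunk_size:)
  raise ArgumentError, 'chunk_size must be positive' unless chunk_size.positive?
  text.each_char.each_slice(chunk_size).map(&:join)
end

# Label for command sources ("!ls -la"): drop the "!", collapse runs of
# non-word characters to "_", then cut to 10 characters. Plain slicing is
# an assumption; the real code uses Kramdown::ANSI::Width.truncate, which
# may handle overlong strings differently.
def command_label(source)
  source = source.to_s
  return source unless source.start_with?('!')
  source[1..].gsub(/\W+/, '_')[0, 10]
end

p naive_character_split('abcdefg', chunk_size: 3) # => ["abc", "def", "g"]
p command_label('!ls -la /tmp')                   # => "ls_la_tmp"
```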

#fetch_source(source) {|tmp| ... } ⇒ Object

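The fetch_source method retrieves content from various source types, including commands, URLs, and file paths. It handles each type as follows:

  • Sources starting with ! are run as shell commands.

  • http:// and https:// URLs are fetched and recorded in the links collection.

  • Paths beginning with /, ./, ../, or ~/ (or matching the file:// pattern) are read from disk.

In each case it yields a temporary file handle for further processing. Any other source is rejected as invalid. Errors, including that one, are caught and reported on STDERR, in which case the method returns nil.

Parameters:

  • source (String)

    the source identifier, which can be a command, URL, or file path

Yields:

  • (tmp)

    the temporary file handle containing the fetched content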


# File 'lib/ollama_chat/source_fetching.rb', line 55

def fetch_source(source, &block)
  case source
  when %r(\A!(.*))
    command = $1
    OllamaChat::Utils::Fetcher.execute(command) do |tmp|
      block.(tmp)
    end
  when %r(\Ahttps?://\S+)
    links.add(source.to_s)
    OllamaChat::Utils::Fetcher.get(
      source,
      headers:      config.request_headers?.to_h,
      cache:        @cache,
      debug:        config.debug,
      http_options: http_options(OllamaChat::Utils::Fetcher.normalize_url(source))
    ) do |tmp|
      block.(tmp)
    end
  when %r(\Afile://(/\S*?)#|\A((?:\.\.|[~.]?)/\S*))
    filename = $~.captures.compact.first
    filename = File.expand_path(filename)
    OllamaChat::Utils::Fetcher.read(filename) do |tmp|
      block.(tmp)
    end
  else
    raise "invalid source #{source.inspect}"
  end
rescue => e
  STDERR.puts "Cannot fetch source #{source.to_s.inspect}: #{e.class} #{e}\n#{e.backtrace * ?\n}"
end
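
The dispatch in `fetch_source` depends entirely on the shape of the source string. The sketch below reuses the same three patterns to report which branch a string would take. `classify_source` and its result symbols are hypothetical. Note that, as written, the `file://` alternative only matches when the path is followed by a `#`, so a bare `file:///tmp/notes.txt` falls through to the invalid branch:

```ruby
# Classifies a source string with the same patterns fetch_source uses.
# The returned symbols are hypothetical labels for illustration.
def classify_source(source)
  case source
  when %r(\A!(.*))
    [:command, $1]
  when %r(\Ahttps?://\S+)
    [:url, source]
  when %r(\Afile://(/\S*?)#|\A((?:\.\.|[~.]?)/\S*))
    # Exactly one of the two capture groups is non-nil.
    [:file, File.expand_path($~.captures.compact.first)]
  else
    [:invalid, source]
  end
end

p classify_source('!ls -la')            # => [:command, "ls -la"]
p classify_source('/tmp/notes.txt')     # => [:file, "/tmp/notes.txt"]
p classify_source('file:///tmp/a.txt#') # => [:file, "/tmp/a.txt"]
p classify_source('notes.txt')          # => [:invalid, "notes.txt"]
```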

#http_options(url) ⇒ Hash

The http_options method prepares HTTP options for requests based on configuration settings. It determines whether SSL peer verification should be disabled for a given URL and whether a proxy should be used, then returns a hash of options.

Parameters:

  • url (String)

    the URL for which HTTP options are being prepared

Returns:

  • (Hash)

    a hash containing HTTP options such as ssl_verify_peer and proxy settings



# File 'lib/ollama_chat/source_fetching.rb', line 36

def http_options(url)
  options = {}
  if ssl_no_verify = config.ssl_no_verify?
    hostname = URI.parse(url).hostname
    options |= { ssl_verify_peer: !ssl_no_verify.include?(hostname) }
  end
  if proxy = config.proxy?
    options |= { proxy: }
  end
  options
end
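
`Hash#|` is not part of core Ruby, so the code above relies on a core extension loaded elsewhere. The sketch below shows the same logic with plain `Hash#merge`. `sketch_http_options` and its keyword arguments are hypothetical stand-ins for `config.ssl_no_verify?` and `config.proxy?`:

```ruby
require 'uri'

# Hypothetical settings; in OllamaChat these come from config.ssl_no_verify?
# and config.proxy?. Plain Hash#merge replaces the `|` operator used above.
def sketch_http_options(url, ssl_no_verify: nil, proxy: nil)
  options = {}
  if ssl_no_verify
    hostname = URI.parse(url).hostname
    # Disable peer verification only for hosts listed in ssl_no_verify.
    options = options.merge(ssl_verify_peer: !ssl_no_verify.include?(hostname))
  end
  options = options.merge(proxy:) if proxy
  options
end

p sketch_http_options('https://intranet.local/x', ssl_no_verify: ['intranet.local'])
p sketch_http_options('https://example.com/', proxy: 'http://proxy:3128')
```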

#import(source) ⇒ String?

Imports content from the specified source and processes it.

This method fetches content from a given source (command, URL, or file) and passes the resulting IO object to the import_source method for processing.

Parameters:

  • source (String)

    The source identifier, which can be a command, URL, or file path

Returns:

  • (String, nil)

    A formatted message indicating the import result and parsed content, or nil if the operation fails



# File 'lib/ollama_chat/source_fetching.rb', line 130

def import(source)
  fetch_source(source) do |source_io|
    content = import_source(source_io, source) or return
    source_io.rewind
    content
  end
end
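
`import` (like `summarize`) calls `source_io.rewind` after the content has been produced. This moves the stream position back to the beginning, so any later reader of the same handle sees the full content again. The sketch below demonstrates that IO behavior, using a `StringIO` as a stand-in for the fetched temporary file:

```ruby
require 'stringio'

io = StringIO.new("line one\nline two\n")
first  = io.read # consumes the whole stream
second = io.read # already at EOF: returns ""
io.rewind        # move the position back to the start
third  = io.read # the full content again

p [first, second, third]
```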

#import_source(source_io, source) ⇒ String

The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.

Parameters:

  • source_io (IO)

    the input stream containing the document content

  • source (String)

    the source identifier or path

Returns:

  • (String)

    a formatted message indicating the import result and the parsed content



# File 'lib/ollama_chat/source_fetching.rb', line 112

def import_source(source_io, source)
  source        = source.to_s
  document_type = source_io&.content_type.full? { |ct| italic { ct } + ' ' }
  STDOUT.puts "Importing #{document_type}document #{source.to_s.inspect} now."
  source_content = parse_source(source_io)
  "Imported #{source.inspect}:\n\n#{source_content}\n\n"
end
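
The return value is a plain string that places the parsed content between blank lines after an "Imported …:" header. The content type is shown only when one is present. The sketch below reproduces that message shape. `fake_parse`, `type_prefix`, and `import_message` are hypothetical helpers, `type_prefix` is a plain-Ruby approximation of the `full?` guard, and italic formatting is omitted:

```ruby
require 'stringio'

# Hypothetical parser: the real parse_source chooses a parser by content type.
def fake_parse(io) = io.read

# Plain-Ruby approximation of the `full?` guard: prefix only when a content type exists.
def type_prefix(content_type)
  content_type.nil? || content_type.empty? ? '' : "#{content_type} "
end

def import_message(source_io, source)
  source  = source.to_s
  content = fake_parse(source_io)
  "Imported #{source.inspect}:\n\n#{content}\n\n"
end

msg = import_message(StringIO.new('Hello'), '/tmp/hello.txt')
p msg
puts "Importing #{type_prefix('text/plain')}document \"/tmp/hello.txt\" now."
```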

#summarize(source, words: nil) ⇒ String?

Summarizes content from the specified source.

This method fetches content from a given source (command, URL, or file) and generates a summary using the summarize_source method.

Parameters:

  • source (String)

    The source identifier which can be a command, URL, or file path

  • words (Integer, nil) (defaults to: nil)

    The target number of words for the summary (defaults to 100)

Returns:

  • (String, nil)

    The formatted summary message or nil if the operation fails



# File 'lib/ollama_chat/source_fetching.rb', line 165

def summarize(source, words: nil)
  fetch_source(source) do |source_io|
    content = summarize_source(source_io, source, words:) or return
    source_io.rewind
    content
  end
end

#summarize_source(source_io, source, words: nil) ⇒ String?

Summarizes content from the given source IO and source identifier.

This method takes an IO object containing document content and generates a summary based on the configured prompt template and word count.

Parameters:

  • source_io (IO)

    The input stream containing the document content to summarize

  • source (String, #to_s)

    The identifier or path for the source of the content

  • words (Integer, nil) (defaults to: nil)

    The target number of words for the summary (defaults to 100)

Returns:

  • (String, nil)

    The formatted summary message or nil if content is empty or cannot be processed



# File 'lib/ollama_chat/source_fetching.rb', line 147

def summarize_source(source_io, source, words: nil)
  STDOUT.puts "Summarizing #{italic { source_io&.content_type }} document #{source.to_s.inspect} now."
  words = words.to_i
  words < 1 and words = 100
  source_content = parse_source(source_io)
  source_content.present? or return
  config.prompts.summarize % { source_content:, words: }
end
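
The word count is normalized with `to_i`, so `nil`, zero, and negative values all fall back to 100 before the named references in the prompt template are filled in with `String#%`. The sketch below illustrates this. `SUMMARIZE_TEMPLATE` and `summary_prompt` are hypothetical; the real template comes from `config.prompts.summarize`:

```ruby
# Hypothetical template; the real one is config.prompts.summarize.
SUMMARIZE_TEMPLATE = "Summarize in %{words} words:\n\n%{source_content}"

def summary_prompt(source_content, words: nil)
  words = words.to_i        # nil.to_i == 0
  words < 1 and words = 100 # fall back to the 100-word default
  return nil if source_content.to_s.strip.empty?
  SUMMARIZE_TEMPLATE % { source_content:, words: }
end

p summary_prompt('Some text')           # uses the 100-word default
p summary_prompt('Some text', words: 30)
p summary_prompt('   ')                 # => nil
```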