Module: OllamaChat::SourceFetching

Included in: Chat
Defined in: lib/ollama_chat/source_fetching.rb

Instance Method Details

#add_image(images, source_io, source) ⇒ Object

Adds an image to the images collection from the given source IO and source identifier.

This method takes an IO object containing image data and associates it with a source, creating an Ollama::Image instance and adding it to the images array.

Parameters:

  • images (Array)

    The collection of images to which the new image will be added

  • source_io (IO)

    The input stream containing the image data

  • source (String, #to_s)

    The identifier or path for the source of the image



# File 'lib/ollama_chat/source_fetching.rb', line 72

def add_image(images, source_io, source)
  STDERR.puts "Adding #{source_io&.content_type} image #{source.to_s.inspect}."
  image = Ollama::Image.for_io(source_io, path: source.to_s)
  (images << image).uniq!
end
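
The last line of add_image appends and deduplicates in one step. A minimal sketch of that idiom follows, assuming images compare equal by value; FakeImage and add_to are hypothetical stand-ins, not part of Ollama::Image or this module. It also shows that Array#uniq! returns nil when nothing was removed, so the return value should not be relied on as the collection.

```ruby
# FakeImage is a hypothetical stand-in for Ollama::Image.
# Struct gives it value-based eql?/hash, which uniq! relies on.
FakeImage = Struct.new(:path, :data)

# Same idiom as the last line of add_image: append, then deduplicate in place.
def add_to(images, image)
  (images << image).uniq!
end

images = []
first  = add_to(images, FakeImage.new('a.png', 'x')) # nothing removed
second = add_to(images, FakeImage.new('a.png', 'x')) # duplicate removed

puts first.inspect   # uniq! returns nil when no duplicates were found
puts images.size     # the collection itself still holds one image
```

Callers that need the collection should read the images array they passed in rather than the method's return value.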

#embed(source) ⇒ String?

Embeds content from the specified source.

This method fetches content from a given source (command, URL, or file) and processes it for embedding using the embed_source method. If embedding is disabled, it falls back to generating a summary instead.

Parameters:

  • source (String)

    The source identifier, which can be a command, URL, or file path

Returns:

  • (String, nil)

    The formatted embedding result or summary message, or nil if the operation fails



# File 'lib/ollama_chat/source_fetching.rb', line 220

def embed(source)
  if @embedding.on?
    STDOUT.puts "Now embedding #{source.to_s.inspect}."
    fetch_source(source) do |source_io|
      content = parse_source(source_io)
      content.present? or return
      source_io.rewind
      embed_source(source_io, source)
    end
    config.prompts.embed % { source: }
  else
    STDOUT.puts "Embedding is off, so I will just give a small summary of this source."
    summarize(source)
  end
end
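
The result message of embed is built with String#% and a named reference. A small sketch of that formatting step is shown here; the template text is invented for illustration, since the real one comes from config.prompts.embed and is not shown on this page.

```ruby
# Builds a message from a template containing a %{source} reference.
# The template passed in below is hypothetical, not the gem's actual prompt.
def embed_message(template, source)
  # Equivalent to `template % { source: }` on Ruby 3.1+.
  template % { source: source }
end

puts embed_message("This source was now embedded: %{source}", "./README.md")
```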

#embed_source(source_io, source, count: nil) ⇒ Array, ...

Embeds content from the given source IO and source identifier.

This method processes document content by splitting it into chunks using various splitting strategies (Character, RecursiveCharacter, Semantic) and adds the chunks to a document store for embedding.

Parameters:

  • source_io (IO)

    The input stream containing the document content to embed

  • source (String, #to_s)

    The identifier or path for the source of the content

  • count (Integer, nil) (defaults to: nil)

    An optional counter for tracking processing order

Returns:

  • (Array, String, nil)

    The embedded chunks, or the parsed content when embedding is disabled; nil if the source cannot be parsed or no splitter matches



# File 'lib/ollama_chat/source_fetching.rb', line 162

def embed_source(source_io, source, count: nil)
  @embedding.on? or return parse_source(source_io)
  m = "Embedding #{italic { source_io&.content_type }} document #{source.to_s.inspect}."
  if count
    STDOUT.puts '%u. %s' % [ count, m ]
  else
    STDOUT.puts m
  end
  text = parse_source(source_io) or return
  text.downcase!
  splitter_config = config.embedding.splitter
  inputs = nil
  case splitter_config.name
  when 'Character'
    splitter = Documentrix::Documents::Splitters::Character.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'RecursiveCharacter'
    splitter = Documentrix::Documents::Splitters::RecursiveCharacter.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'Semantic'
    splitter = Documentrix::Documents::Splitters::Semantic.new(
      ollama:, model: config.embedding.model.name,
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(
      text,
      breakpoint: splitter_config.breakpoint.to_sym,
      percentage: splitter_config.percentage?,
      percentile: splitter_config.percentile?,
    )
  end
  inputs or return
  source = source.to_s
  if source.start_with?(?!)
    source = Kramdown::ANSI::Width.truncate(
      source[1..-1].gsub(/\W+/, ?_),
      length: 10
    )
  end
  @documents.add(inputs, source:, batch_size: config.embedding.batch_size?)
end
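
To make the chunking step concrete, here is a deliberately naive fixed-size splitter. It is only a sketch of the general idea behind a chunk_size setting; naive_chunks is a hypothetical helper and does not reproduce how the Documentrix splitters choose their boundaries.

```ruby
# Lowercases the text (as embed_source does) and cuts it into pieces
# of at most chunk_size characters. Hypothetical helper, not Documentrix.
def naive_chunks(text, chunk_size)
  text.downcase.scan(/.{1,#{chunk_size}}/m)
end

chunks = naive_chunks("Hello World, this is text", 10)
puts chunks.inspect
# => ["hello worl", "d, this is", " text"]
```

A real splitter would normally prefer to break on separators or semantic boundaries instead of cutting mid-word as this one does.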

#fetch_source(source) {|tmp| ... } ⇒ Object

The fetch_source method retrieves content from various source types including commands, URLs, and file paths. It processes the source based on its type and yields a temporary file handle for further processing.

Parameters:

  • source (String)

    the source identifier which can be a command, URL, or file path

Yields:

  • (tmp)


# File 'lib/ollama_chat/source_fetching.rb', line 30

def fetch_source(source, &block)
  case source
  when %r(\A!(.*))
    command = $1
    OllamaChat::Utils::Fetcher.execute(command) do |tmp|
      block.(tmp)
    end
  when %r(\Ahttps?://\S+)
    links.add(source.to_s)
    OllamaChat::Utils::Fetcher.get(
      source,
      headers:      config.request_headers?.to_h,
      cache:        @cache,
      debug:        config.debug,
      http_options: http_options(OllamaChat::Utils::Fetcher.normalize_url(source))
    ) do |tmp|
      block.(tmp)
    end
  when %r(\Afile://(/\S*?)#|\A((?:\.\.|[~.]?)/\S*))
    filename = $~.captures.compact.first
    filename = File.expand_path(filename)
    OllamaChat::Utils::Fetcher.read(filename) do |tmp|
      block.(tmp)
    end
  else
    raise "invalid source #{source.inspect}"
  end
rescue => e
  STDERR.puts "Cannot fetch source #{source.to_s.inspect}: #{e.class} #{e}\n#{e.backtrace * ?\n}"
end
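
The dispatch in fetch_source rests on three patterns. The sketch below reuses those patterns to classify a source string without fetching anything; classify_source is a hypothetical helper added for illustration and is not part of the module.

```ruby
# Classifies a source string using the same patterns as fetch_source.
# Returns [kind, value]; raises ArgumentError for anything unrecognized.
def classify_source(source)
  case source
  when %r(\A!(.*))
    [:command, $1]                       # text after the leading "!"
  when %r(\Ahttps?://\S+)
    [:url, source.to_s]
  when %r(\Afile://(/\S*?)#|\A((?:\.\.|[~.]?)/\S*))
    [:file, $~.captures.compact.first]   # whichever alternative matched
  else
    raise ArgumentError, "invalid source #{source.inspect}"
  end
end

puts classify_source("!ls -la").inspect      # => [:command, "ls -la"]
puts classify_source("./notes.txt").inspect  # => [:file, "./notes.txt"]
```

Note that in the original method the trailing rescue turns any error, including the invalid-source case, into a message on STDERR, and the method then returns nil.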

#http_options(url) ⇒ Hash

The http_options method prepares HTTP options for requests based on configuration settings. It determines whether SSL peer verification should be disabled for a given URL and whether a proxy should be used, then returns a hash of options.

Parameters:

  • url (String)

    the URL for which HTTP options are being prepared

Returns:

  • (Hash)

    a hash containing HTTP options such as ssl_verify_peer and proxy settings



# File 'lib/ollama_chat/source_fetching.rb', line 11

def http_options(url)
  options = {}
  if ssl_no_verify = config.ssl_no_verify?
    hostname = URI.parse(url).hostname
    options |= { ssl_verify_peer: !ssl_no_verify.include?(hostname) }
  end
  if proxy = config.proxy?
    options |= { proxy: }
  end
  options
end
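
Core Ruby does not define Hash#|, so the method above presumably relies on an extension loaded elsewhere in the gem. The following sketch expresses the same logic with plain Hash#merge; http_options_sketch and its keyword arguments are hypothetical and stand in for the values read from config.

```ruby
require 'uri'

# ssl_no_verify: list of hostnames for which peer verification is skipped.
# proxy: proxy URL, or nil. Both stand in for config.ssl_no_verify? / config.proxy?.
def http_options_sketch(url, ssl_no_verify: nil, proxy: nil)
  options = {}
  if ssl_no_verify
    hostname = URI.parse(url).hostname
    options = options.merge(ssl_verify_peer: !ssl_no_verify.include?(hostname))
  end
  options = options.merge(proxy: proxy) if proxy
  options
end

puts http_options_sketch("https://localhost:8443/x", ssl_no_verify: ["localhost"]).inspect
# => {ssl_verify_peer: false} on recent Rubies (older ones print {:ssl_verify_peer=>false})
```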

#import(source) ⇒ String?

Imports content from the specified source and processes it.

This method fetches content from a given source (command, URL, or file) and passes the resulting IO object to the import_source method for processing.

Parameters:

  • source (String)

    The source identifier, which can be a command, URL, or file path

Returns:

  • (String, nil)

    A formatted message indicating the import result and parsed content, or nil if the operation fails



# File 'lib/ollama_chat/source_fetching.rb', line 105

def import(source)
  fetch_source(source) do |source_io|
    content = import_source(source_io, source) or return
    source_io.rewind
    content
  end
end
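
The `or return` inside the block leaves the enclosing method, not just the block, which is how import ends up returning nil when nothing was imported. A self-contained sketch of that control flow follows; with_io and first_line are hypothetical helpers, with with_io standing in for fetch_source.

```ruby
require 'stringio'

# Hypothetical helper that yields an IO, standing in for fetch_source.
def with_io(text)
  yield StringIO.new(text)
end

def first_line(text)
  with_io(text) do |io|
    line = io.gets or return   # exits first_line itself with nil
    io.rewind
    line.chomp                 # block value becomes the method's result
  end
end

puts first_line("alpha\nbeta").inspect  # => "alpha"
puts first_line("").inspect             # => nil
```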

#import_source(source_io, source) ⇒ String

The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.

Parameters:

  • source_io (IO)

    the input stream containing the document content

  • source (String)

    the source identifier or path

Returns:

  • (String)

    a formatted message indicating the import result and the parsed content



# File 'lib/ollama_chat/source_fetching.rb', line 87

def import_source(source_io, source)
  source        = source.to_s
  document_type = source_io&.content_type.full? { |ct| italic { ct } + ' ' }
  STDOUT.puts "Importing #{document_type}document #{source.to_s.inspect} now."
  source_content = parse_source(source_io)
  "Imported #{source.inspect}:\n\n#{source_content}\n\n"
end
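
The shape of the returned string can be seen with an in-memory IO. In this sketch, import_message is a hypothetical helper and source_io.read is an assumed stand-in for parse_source, whose behavior is not documented on this page.

```ruby
require 'stringio'

# Mirrors the final line of import_source; reading the IO directly
# is a stand-in for parse_source.
def import_message(source_io, source)
  source  = source.to_s
  content = source_io.read
  "Imported #{source.inspect}:\n\n#{content}\n\n"
end

message = import_message(StringIO.new("hello"), "./notes.txt")
puts message
```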

#summarize(source, words: nil) ⇒ String?

Summarizes content from the specified source.

This method fetches content from a given source (command, URL, or file) and generates a summary using the summarize_source method.

Parameters:

  • source (String)

    The source identifier which can be a command, URL, or file path

  • words (Integer, nil) (defaults to: nil)

    The target number of words for the summary; 100 is used when nil or less than 1

Returns:

  • (String, nil)

    The formatted summary message or nil if the operation fails



# File 'lib/ollama_chat/source_fetching.rb', line 141

def summarize(source, words: nil)
  fetch_source(source) do |source_io|
    content = summarize_source(source_io, source, words:) or return
    source_io.rewind
    content
  end
end

#summarize_source(source_io, source, words: nil) ⇒ String?

Summarizes content from the given source IO and source identifier.

This method takes an IO object containing document content and generates a summary based on the configured prompt template and word count.

Parameters:

  • source_io (IO)

    The input stream containing the document content to summarize

  • source (String, #to_s)

    The identifier or path for the source of the content

  • words (Integer, nil) (defaults to: nil)

    The target number of words for the summary; 100 is used when nil or less than 1

Returns:

  • (String, nil)

    The formatted summary message or nil if content is empty or cannot be processed



# File 'lib/ollama_chat/source_fetching.rb', line 123

def summarize_source(source_io, source, words: nil)
  STDOUT.puts "Summarizing #{italic { source_io&.content_type }} document #{source.to_s.inspect} now."
  words = words.to_i
  words < 1 and words = 100
  source_content = parse_source(source_io)
  source_content.present? or return
  config.prompts.summarize % { source_content:, words: }
end
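
The word-count handling is easy to misread, so here it is in isolation: anything that converts to an integer below 1, including nil, falls back to 100. The helper names and the template text below are hypothetical; the real template comes from config.prompts.summarize, and the strip-based emptiness check is a plain-Ruby stand-in for present?.

```ruby
# Same normalization as summarize_source: nil, 0, negatives and
# non-numeric strings all become 100.
def normalize_words(words)
  words = words.to_i
  words < 1 and words = 100
  words
end

# Fills a hypothetical template; returns nil for blank content.
def summarize_prompt(template, source_content, words: nil)
  words = normalize_words(words)
  return if source_content.to_s.strip.empty?
  template % { source_content: source_content, words: words }
end

puts normalize_words(nil)   # 100
puts normalize_words("25")  # 25
puts summarize_prompt("Summarize in %{words} words: %{source_content}", "some text")
```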