Method: Wgit::DSL#index

Defined in:
lib/wgit/dsl.rb

#index(*urls, insert_externals: false) {|doc| ... } ⇒ Object

Also known as: index_url

Indexes one or more webpages using Wgit::Indexer#index_urls underneath.

Parameters:

  • urls (*Wgit::Url)

    The webpage URLs to crawl. Defaults to the start URL(s) when none are given.

  • insert_externals (Boolean) (defaults to: false)

    Whether to insert the website's external URLs into the database.

Yields:

  • (doc)

    The Wgit::Document of each crawled webpage, yielded before it is inserted into the database so that it can be manipulated first. Return nil or false from the block to prevent the document from being saved into the database.
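
The block's return value acts as a gate on the save. The following is a minimal sketch of that contract using stand-ins only: `Doc`, `save_if_allowed` and the `store` array are hypothetical names that are not part of Wgit, and saving unconditionally when no block is given is an assumption of this sketch.

```ruby
# Hypothetical stand-in for a crawled document; NOT Wgit::Document.
Doc = Struct.new(:url, :title)

# Hypothetical stand-in for the save step; NOT Wgit's implementation.
# Assumption: with no block given, the document is always saved.
def save_if_allowed(doc, store)
  keep = block_given? ? yield(doc) : true
  store << doc if keep # nil or false from the block skips the save
  store
end

store = []

# The block mutates the document and returns a truthy value, so it is saved.
save_if_allowed(Doc.new('http://example.com', 'Example'), store) do |doc|
  doc.title = doc.title.upcase
end

# The block returns false for a document without a title, so it is skipped.
save_if_allowed(Doc.new('http://example.com/empty', nil), store) do |doc|
  !doc.title.nil?
end
```

Because the last expression of the block decides the outcome, a block that ends in a statement returning nil (such as `puts`) would also skip the save under this contract.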

Raises:

  • (StandardError)

    If no urls are provided and no start URL has been set.
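
The way the urls parameter falls back to the start URL(s), and raises when neither is available, can be sketched in isolation. This is an illustration under stated assumptions: `resolve_urls` and the error message are hypothetical, and the start URLs are passed in explicitly rather than read from DSL state.

```ruby
# Hypothetical helper mirroring the fallback described above; NOT part of Wgit.
def resolve_urls(urls, start_urls)
  # Fall back to the start URL(s) only when no urls were passed.
  urls = (start_urls || []) if urls.empty?
  # The message text is an assumption made for this sketch.
  raise StandardError, 'No start URL provided' if urls.empty?

  urls
end

# Explicit urls take precedence over the start URL(s).
explicit = resolve_urls(['http://a.example'], ['http://start.example'])

# With no urls given, the start URL(s) are used instead.
fallback = resolve_urls([], ['http://start.example'])

# With neither available, a StandardError is raised.
error_message =
  begin
    resolve_urls([], nil)
    nil
  rescue StandardError => e
    e.message
  end
```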



# File 'lib/wgit/dsl.rb', line 238

def index(*urls, insert_externals: false, &block)
  urls = (@dsl_start || []) if urls.empty?
  raise DSL_ERROR__NO_START_URL if urls.empty?

  indexer = Wgit::Indexer.new(get_db, get_crawler)

  urls.map! { |url| Wgit::Url.parse(url) }
  indexer.index_urls(*urls, insert_externals:, &block)
end