Class: RightScraper::Scraper

Inherits: Object

Defined in: lib/right_scraper/scraper.rb

Overview

Library main entry point. Instantiate this class and call the scrape method to download or update a remote repository to the local disk and run a scraper on the resulting files.

Instance Attribute Summary

#resources ⇒ Object readonly
(Array)

Scraped resources.

Instance Method Summary

#errors ⇒ Object
(Array)

Error messages in case of failure.
#initialize(options = {}) ⇒ Scraper constructor

Initialize scrape destination directory.
#repo_dir(repo) ⇒ Object

Path to directory where given repo should be or was downloaded.
#scrape(repo, incremental = true, &callback) ⇒ Object

Scrape given repository, depositing files into the scrape directory.
#succeeded? ⇒ Boolean (also: #successful?)

Whether no errors have been recorded. If false, call errors to get the error messages.

Constructor Details

#initialize(options = {}) ⇒ `Scraper`

Initialize scrape destination directory

Options

:kind: Type of scraper that will traverse the directory for resources, one of :cookbook or :workflow
:basedir: Local directory where files are retrieved and scraped; a temporary directory is used if nil
:max_bytes: Maximum number of bytes to read from the remote repo; unlimited if nil
:max_seconds: Maximum number of seconds to spend reading from the remote repo; unlimited if nil

# File 'lib/right_scraper/scraper.rb', line 42

def initialize(options={})
  @temporary = !options.has_key?(:basedir)
  options[:basedir] ||= Dir.mktmpdir
  @logger = ScraperLogger.new
  @options = options.merge({:logger => @logger})
  @resources = []
end
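
The first two lines of the constructor decide both where files go and whether the directory counts as temporary. The following is a minimal sketch of that logic in isolation, not library code: resolve_basedir is a hypothetical helper and '/srv/scrapes' is a made-up path. It shows one consequence of the listing above: passing :basedir => nil still yields a fresh temporary directory, but because the key is present, the temporary flag stays false.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of RightScraper) that repeats the first two
# lines of Scraper#initialize and returns what they compute.
def resolve_basedir(options = {})
  temporary = !options.has_key?(:basedir)  # true only when the key is absent
  options[:basedir] ||= Dir.mktmpdir       # nil or missing => fresh temp dir
  [options[:basedir], temporary]
end

omitted_dir, omitted_temp   = resolve_basedir({})
nil_dir, nil_temp           = resolve_basedir({:basedir => nil})
explicit_dir, explicit_temp = resolve_basedir({:basedir => '/srv/scrapes'}) # made-up path

# Record whether mktmpdir really created the directories before cleaning up.
omitted_created = File.directory?(omitted_dir)
nil_created     = File.directory?(nil_dir)

puts "omitted:  temporary=#{omitted_temp}"
puts "nil:      temporary=#{nil_temp}"
puts "explicit: temporary=#{explicit_temp}, dir=#{explicit_dir}"

# Remove the directories this sketch created.
[omitted_dir, nil_dir].each { |dir| FileUtils.remove_entry(dir) }
```

Since #scrape only removes the base directory when the temporary flag is set, a caller who wants automatic cleanup would, under this reading of the source, omit :basedir entirely rather than pass nil.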

Instance Attribute Details

#resources ⇒ `Object` (readonly)

(Array): Scraped resources



# File 'lib/right_scraper/scraper.rb', line 33

def resources
  @resources
end

Instance Method Details

#errors ⇒ `Object`

(Array): Error messages in case of failure



# File 'lib/right_scraper/scraper.rb', line 121

def errors
  @logger.errors
end

#repo_dir(repo) ⇒ `Object`

Path to directory where given repo should be or was downloaded

Parameters

repo(Hash|RightScraper::Repositories::Base): Remote repository corresponding to local directory

Return

String: Path to local directory that corresponds to given repository



# File 'lib/right_scraper/scraper.rb', line 116

def repo_dir(repo)
  RightScraper::Retrievers::Base.repo_dir(@options[:basedir], repo)
end

#scrape(repo, incremental = true, &callback) ⇒ `Object`

Scrape given repository, depositing files into the scrape directory. Each repository gets its own directory; on further calls its content is updated incrementally when possible.

Parameters

repo(Hash|RightScraper::Repositories::Base): Repository to be scraped

Note: repo can either be a Hash or a RightScraper::Repositories::Base instance.
      See the RightScraper::Repositories::Base class for valid Hash keys.

Block

If a block is given, it is called back with progress information. The block should take four arguments:

first argument is one of :begin, :commit or :abort, signifying what the scraper is trying to do and how far it has got
second argument is a symbol describing the operation being performed in an easy-to-match way
third argument is an optional further explanation
fourth argument is the pending exception (only relevant for :abort)
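
The four-argument contract above can be sketched as a small callback. This is only an illustration: the calls at the bottom are simulated by hand to stand in for what the scraper would report, and the explanation strings and the "boom" error are invented. A Proc is used rather than a lambda so that any omitted trailing argument simply arrives as nil.

```ruby
# Collect one formatted line per progress event.
events = []
progress = proc do |phase, operation, explanation, exception|
  line = "#{phase} #{operation}"
  line += " (#{explanation})" if explanation
  # The fourth argument is only meaningful when the phase is :abort.
  line += " failed: #{exception.message}" if phase == :abort && exception
  events << line
end

# Simulated calls (assumed values, not real scraper output).
progress.call(:begin,  :retrieving, "from example repo", nil)
progress.call(:commit, :retrieving, "from example repo", nil)
progress.call(:begin,  :scraping,   "/tmp/example", nil)
progress.call(:abort,  :scraping,   "/tmp/example", RuntimeError.new("boom"))

puts events
```

Given the signature scrape(repo, incremental = true, &callback), such a Proc would be handed over as scraper.scrape(repo, &progress).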

Return

true: If scrape was successful
false: If scrape failed, call errors for information on failure

Raise

‘Invalid repository type’: If repository type is not known

# File 'lib/right_scraper/scraper.rb', line 76

def scrape(repo, incremental=true, &callback)
  errorlen = errors.size
  repo = RightScraper::Repositories::Base.from_hash(repo) if repo.is_a?(Hash)
  @logger.callback = callback
  begin
    # 1. Retrieve the files
    retriever = nil
    @logger.operation(:retrieving, "from #{repo}") do
      retriever = repo.retriever(@options)
      retriever.retrieve if retriever.available?
    end

    # 2. Now scrape if there is a scraper in the options
    @logger.operation(:scraping, retriever.repo_dir) do
      if @options[:kind]
        options = @options.merge({:ignorable_paths => retriever.ignorable_paths,
                                  :repo_dir        => retriever.repo_dir,
                                  :repository      => retriever.repository})
        scraper = RightScraper::Scrapers::Base.scraper(options)
        @resources += scraper.scrape
      end
    end

    # 3. Cleanup if temporary
    FileUtils.remove_entry_secure(@options[:basedir]) if @temporary
  rescue
    # logger handles communication with the end user and appending
    # to our error list, we just need to keep going.
  end
  @logger.callback = nil
  errors.size == errorlen
end

#succeeded? ⇒ `Boolean` Also known as: successful?

Whether no errors have been recorded. If false, call errors to get the error messages.

Return

Boolean: true if scrape finished with no error, false otherwise.

# File 'lib/right_scraper/scraper.rb', line 130

def succeeded?
  errors.empty?
end
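
Read side by side, the listings for #scrape and #succeeded? answer slightly different questions: #scrape compares the error count before and after one call, while #succeeded? checks whether the error list is empty at all. The sketch below reproduces that pattern with a stand-in class; ErrorTrackingSketch and its run method are hypothetical and exist only to show the difference. In RightScraper itself the logger records the errors, whereas here the rescue clause does so directly.

```ruby
# Stand-in for the parts of Scraper that track errors (illustrative only).
class ErrorTrackingSketch
  attr_reader :errors

  def initialize
    @errors = []
  end

  # Same shape as Scraper#scrape: remember the error count, do the work,
  # record any exception instead of re-raising, then compare the counts.
  def run
    errorlen = errors.size
    begin
      yield
    rescue => e
      @errors << e.message
    end
    errors.size == errorlen
  end

  # Same check as Scraper#succeeded?.
  def succeeded?
    errors.empty?
  end
end

sketch = ErrorTrackingSketch.new
first  = sketch.run { raise "network unreachable" }  # adds an error
second = sketch.run { :ok }                          # adds none

puts "first run ok?  #{first}"
puts "second run ok? #{second}"
puts "succeeded?     #{sketch.succeeded?}"
```

In this sketch the second run reports true while succeeded? stays false, because the error from the first run is still in the list. If the same holds for a reused Scraper instance, callers scraping several repositories may prefer the return value of each #scrape call over #succeeded?.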
