Class: RightScraper::Scraper

Inherits:
Object
Defined in:
lib/right_scraper/scraper.rb

Overview

Library main entry point. Instantiate this class and call the scrape method to download or update a remote repository to the local disk and run a scraper on the resulting files.

Constructor Details

#initialize(options = {}) ⇒ Scraper

Initialize scrape destination directory

Options

:kind

Type of scraper that will traverse directory for resources, one of :cookbook or :workflow

:basedir

Local directory where files are retrieved and scraped. A temporary directory is created if nil; it is removed after scraping only when the :basedir key is omitted altogether.

:max_bytes

Maximum number of bytes to read from remote repo, unlimited if nil

:max_seconds

Maximum number of seconds to spend reading from remote repo, unlimited if nil



# File 'lib/right_scraper/scraper.rb', line 42

def initialize(options={})
  @temporary = !options.has_key?(:basedir)
  options[:basedir] ||= Dir.mktmpdir
  @logger = ScraperLogger.new
  @options = options.merge({:logger => @logger})
  @resources = []
end
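
The :basedir handling above can be sketched in isolation. This is a minimal illustration rather than the library's code: `resolve_basedir` is a hypothetical helper that mirrors the first two lines of the constructor without mutating the caller's hash, and '/tmp/scrape' is a placeholder path. It shows that the temporary flag depends on whether the :basedir key is present, not on whether its value is nil.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of RightScraper) mirroring the
# constructor's :basedir logic. Returns [basedir, temporary].
def resolve_basedir(options = {})
  temporary = !options.has_key?(:basedir)      # flag depends on key presence
  basedir = options[:basedir] || Dir.mktmpdir  # fall back to a fresh temp dir
  [basedir, temporary]
end

default_dir, default_temp   = resolve_basedir
explicit_dir, explicit_temp = resolve_basedir(:basedir => '/tmp/scrape')
nil_dir, nil_temp           = resolve_basedir(:basedir => nil)

puts "no key:       temporary=#{default_temp}"
puts "explicit dir: temporary=#{explicit_temp} (#{explicit_dir})"
puts "nil value:    temporary=#{nil_temp}"

# Clean up the two directories created by Dir.mktmpdir in this sketch.
FileUtils.remove_entry_secure(default_dir)
FileUtils.remove_entry_secure(nil_dir)
```

Passing `:basedir => nil` therefore still yields a freshly created directory, but one that is not flagged as temporary.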

Instance Attribute Details

#resources ⇒ Object (readonly)

(Array)

Scraped resources



# File 'lib/right_scraper/scraper.rb', line 33

def resources
  @resources
end

Instance Method Details

#errors ⇒ Object

(Array)

Error messages in case of failure



# File 'lib/right_scraper/scraper.rb', line 121

def errors
  @logger.errors
end

#repo_dir(repo) ⇒ Object

Path to directory where given repo should be or was downloaded

Parameters

repo(Hash|RightScraper::Repositories::Base)

Remote repository corresponding to local directory

Return

String

Path to local directory that corresponds to given repository



# File 'lib/right_scraper/scraper.rb', line 116

def repo_dir(repo)
  RightScraper::Retrievers::Base.repo_dir(@options[:basedir], repo)
end

#scrape(repo, incremental = true, &callback) ⇒ Object

Scrape the given repository, depositing files into the scrape directory. Further calls update the content of the repository's unique directory incrementally when possible.

Parameters

repo(Hash|RightScraper::Repositories::Base)

Repository to be scraped

Note: repo can either be a Hash or a RightScraper::Repositories::Base instance.
      See the RightScraper::Repositories::Base class for valid Hash keys.

Block

If a block is given, it is called back with progress information. The block should take four arguments:

  • first argument is one of :begin, :commit, or :abort, indicating whether the current operation is starting, has finished, or has failed

  • second argument is a symbol describing the operation being performed in an easy-to-match way

  • third argument is optional further explanation

  • fourth argument is the exception pending (only relevant for :abort)
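
A block matching the four arguments listed above might look like the following sketch. `format_progress` is a hypothetical helper, and the events at the bottom are simulated by hand; the exact operation symbols and explanation strings passed by the library are assumptions here.

```ruby
# Hypothetical formatter for the four progress arguments described above.
def format_progress(phase, operation, explanation = nil, exception = nil)
  line = "[#{phase}] #{operation}"
  line += " (#{explanation})" if explanation && !explanation.to_s.empty?
  # The pending exception is only meaningful when the phase is :abort.
  line += ": #{exception.class}: #{exception.message}" if phase == :abort && exception
  line
end

# Simulated events; a real scrape would supply these through the block.
puts format_progress(:begin, :retrieving, "from example repo")
puts format_progress(:commit, :retrieving, "from example repo")
puts format_progress(:abort, :scraping, nil, RuntimeError.new("boom"))
```

With a scraper instance at hand, the same helper could be passed along as `scraper.scrape(repo) { |*args| puts format_progress(*args) }`, where `scraper` and `repo` are assumed to exist already.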

Return

true

If scrape was successful

false

If scrape failed, call errors for information on failure

Raise

‘Invalid repository type’

If repository type is not known



# File 'lib/right_scraper/scraper.rb', line 76

def scrape(repo, incremental=true, &callback)
  errorlen = errors.size
  repo = RightScraper::Repositories::Base.from_hash(repo) if repo.is_a?(Hash)
  @logger.callback = callback
  begin
    # 1. Retrieve the files
    retriever = nil
    @logger.operation(:retrieving, "from #{repo}") do
      retriever = repo.retriever(@options)
      retriever.retrieve if retriever.available?
    end

    # 2. Now scrape if there is a scraper in the options
    @logger.operation(:scraping, retriever.repo_dir) do
      if @options[:kind]
        options = @options.merge({:ignorable_paths => retriever.ignorable_paths,
                                  :repo_dir        => retriever.repo_dir,
                                  :repository      => retriever.repository})
        scraper = RightScraper::Scrapers::Base.scraper(options)
        @resources += scraper.scrape
      end
    end

    # 3. Cleanup if temporary
    FileUtils.remove_entry_secure(@options[:basedir]) if @temporary
  rescue
    # logger handles communication with the end user and appending
    # to our error list, we just need to keep going.
  end
  @logger.callback = nil
  errors.size == errorlen
end
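
The return value above is computed by comparing the error count before and after the call. The sketch below isolates that pattern. `SketchLogger` and `run_steps` are hypothetical stand-ins; in particular, the assumption that the logger records a failure and then re-raises it is inferred from the comment in the rescue clause, not from the ScraperLogger source.

```ruby
# Hypothetical stand-in for the logger: records a message when an
# operation raises, then re-raises so the caller can stop early.
class SketchLogger
  attr_reader :errors

  def initialize
    @errors = []
  end

  def operation(type)
    yield
  rescue StandardError => e
    @errors << "#{type}: #{e.message}"
    raise
  end
end

# Runs the steps in order; returns true when no new error was logged.
def run_steps(logger, steps)
  errorlen = logger.errors.size
  begin
    steps.each { |name, step| logger.operation(name) { step.call } }
  rescue StandardError
    # The logger already recorded the failure; just stop.
  end
  logger.errors.size == errorlen
end

logger = SketchLogger.new
failing = { :retrieving => lambda { raise 'network down' } }
passing = { :retrieving => lambda { :ok }, :scraping => lambda { :ok } }

puts run_steps(logger, failing)  # false: one new error was logged
puts run_steps(logger, passing)  # true: no new error during this call
puts logger.errors.empty?        # false: the earlier error is still recorded
```

The last line illustrates a consequence of this design: on a reused instance, scrape reports only on the current call, whereas a check based on `errors.empty?` (as in succeeded?) reflects every error accumulated so far.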

#succeeded? ⇒ Boolean

Also known as: successful?

Call errors to get error messages if false

Return

Boolean

true if scrape finished with no error, false otherwise.



# File 'lib/right_scraper/scraper.rb', line 130

def succeeded?
  errors.empty?
end