Class: RightScraper::Scraper
- Inherits: Object
  - Object
    - RightScraper::Scraper
- Defined in: lib/right_scraper/scraper.rb
Overview
Main entry point of the library. Instantiate this class and call #scrape to download (or update) a remote repository to the local disk and run a scraper on the resulting files.
Instance Attribute Summary
- #resources ⇒ Object (readonly)
  (Array) Scraped resources.
Instance Method Summary
- #errors ⇒ Object
  (Array) Error messages in case of failure.
- #initialize(options = {}) ⇒ Scraper (constructor)
  Initialize the scrape destination directory.
- #repo_dir(repo) ⇒ Object
  Path to the directory where the given repo should be or was downloaded.
- #scrape(repo, incremental = true, &callback) ⇒ Object
  Scrape the given repository, depositing files into the scrape directory.
- #succeeded? ⇒ Boolean (also: #successful?)
  Whether scraping finished with no error; call #errors to get error messages if false.
Constructor Details
#initialize(options = {}) ⇒ Scraper
Initialize the scrape destination directory, along with the logger and the (initially empty) list of scraped resources.
Options
- :kind: Type of scraper that will traverse the directory for resources; one of :cookbook or :workflow. If omitted, #scrape retrieves the repository but does not scrape it.
- :basedir: Local directory where files are retrieved and scraped; a temporary directory is created if this option is omitted or nil.
- :max_bytes: Maximum number of bytes to read from the remote repo; unlimited if nil.
- :max_seconds: Maximum number of seconds to spend reading from the remote repo; unlimited if nil.
```ruby
# File 'lib/right_scraper/scraper.rb', line 42
def initialize(options={})
  # Only a directory we create ourselves is treated as temporary.
  @temporary = !options.has_key?(:basedir)
  options[:basedir] ||= Dir.mktmpdir
  @logger = ScraperLogger.new
  @options = options.merge({:logger => @logger})
  @resources = []
end
```
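The option handling above can be exercised in isolation. The following is a minimal sketch, assuming a hypothetical OptionsSketch class that copies the constructor's first lines and the cleanup step of #scrape, with a plain Array standing in for ScraperLogger. It shows two consequences of the code as written. First, the caller's hash gains a :basedir entry. Second, an explicit `:basedir => nil` yields a fresh temporary directory that is not flagged as temporary, so the cleanup step leaves it in place.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in (not part of RightScraper) that copies the
# constructor's option handling; a plain Array replaces ScraperLogger.
class OptionsSketch
  attr_reader :options

  def initialize(options = {})
    # Flag the directory as temporary only when :basedir is absent.
    @temporary = !options.has_key?(:basedir)
    # ||= also fills in an explicit nil, and it mutates the caller's hash.
    options[:basedir] ||= Dir.mktmpdir
    @logger = []
    @options = options.merge({:logger => @logger})
  end

  def temporary?
    @temporary
  end

  # Mirrors step 3 of #scrape: delete the directory only if flagged temporary.
  def cleanup
    FileUtils.remove_entry_secure(@options[:basedir]) if @temporary
  end
end

opts = {}
implicit = OptionsSketch.new(opts)                  # no :basedir key at all
explicit_nil = OptionsSketch.new({:basedir => nil}) # key present, value nil

implicit.cleanup       # removes its temporary directory
explicit_nil.cleanup   # does nothing: the directory is not flagged temporary
```

In this sketch, callers who want automatic cleanup should leave :basedir out entirely rather than pass nil.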
Instance Attribute Details
#resources ⇒ Object (readonly)
(Array) Scraped resources, accumulated across all calls to #scrape.
```ruby
# File 'lib/right_scraper/scraper.rb', line 33
def resources
  @resources
end
```
Instance Method Details
#errors ⇒ Object
(Array) Error messages in case of failure.
```ruby
# File 'lib/right_scraper/scraper.rb', line 121
def errors
  # Errors are collected by the logger created in the constructor.
  @logger.errors
end
```
#repo_dir(repo) ⇒ Object
Path to the directory where the given repo should be or was downloaded.
Parameters
- repo (Hash|RightScraper::Repositories::Base): Remote repository corresponding to the local directory.
Return
- String: Path to the local directory that corresponds to the given repository.
```ruby
# File 'lib/right_scraper/scraper.rb', line 116
def repo_dir(repo)
  RightScraper::Retrievers::Base.repo_dir(@options[:basedir], repo)
end
```
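The directory naming is delegated to RightScraper::Retrievers::Base.repo_dir, whose implementation is not shown here. The sketch below is for illustration only and assumes one plausible scheme: hash a canonical description of the repository, so that the same repository always maps to the same unique directory under :basedir. The module name RepoDirSketch and the :repo_type/:url keys are hypothetical.

```ruby
require 'digest/sha1'

# Hypothetical scheme; RightScraper's real Retrievers::Base.repo_dir may differ.
module RepoDirSketch
  # Map a repository description (a Hash) to a stable directory under basedir.
  def self.repo_dir(basedir, repo)
    # Sort by key so that insertion order does not change the digest.
    canonical = repo.sort_by { |key, _| key.to_s }.map { |key, value| "#{key}=#{value}" }.join("\n")
    File.join(basedir, Digest::SHA1.hexdigest(canonical))
  end
end

repo  = { :repo_type => :git, :url => 'git://example.com/cookbooks.git' }
same  = { :url => 'git://example.com/cookbooks.git', :repo_type => :git }
other = repo.merge({ :url => 'git://example.com/other.git' })

dir = RepoDirSketch.repo_dir('/tmp/scrape', repo)
```

A deterministic mapping of this kind would let repeated #scrape calls find and incrementally update the same directory.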
#scrape(repo, incremental = true, &callback) ⇒ Object
Scrape the given repository, depositing files into the scrape directory. On further calls, update the content of the repository's unique directory incrementally when possible.
Parameters
- repo (Hash|RightScraper::Repositories::Base): Repository to be scraped.
Note: repo can be either a Hash or a RightScraper::Repositories::Base instance. See the RightScraper::Repositories::Base class for valid Hash keys.
Block
If a block is given, it is called back with progress information. The block should take four arguments:
- The first argument is one of :begin, :commit or :abort, which signifies what the scraper is trying to do and where it is when it does it.
- The second argument is a symbol describing the operation being performed, in an easy-to-match way.
- The third argument is an optional further explanation.
- The fourth argument is the pending exception (only relevant for :abort).
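The four-argument callback contract can be illustrated with a small, self-contained stand-in. This is a sketch under assumptions. ScraperLogger's internals are not shown, so ProgressSketch is hypothetical; it assumes only that each operation reports :begin, then either :commit or :abort together with the pending exception. The operation names :retrieving and :scraping are the ones #scrape passes to @logger.operation.

```ruby
# Hypothetical stand-in for the logger's callback handling (not RightScraper
# code). Assumed protocol: :begin, then :commit or :abort plus the exception.
class ProgressSketch
  attr_accessor :callback
  attr_reader :errors

  def initialize
    @errors = []
  end

  # Run the block as a named operation, reporting progress to the callback.
  def operation(type, explanation = nil)
    notify(:begin, type, explanation, nil)
    result = yield
    notify(:commit, type, explanation, nil)
    result
  rescue StandardError => e
    notify(:abort, type, explanation, e)
    @errors << "#{type}: #{e.message}"
    raise
  end

  private

  def notify(phase, type, explanation, exception)
    callback.call(phase, type, explanation, exception) if callback
  end
end

events = []
progress = ProgressSketch.new
progress.callback = ->(phase, operation, _explanation, exception) {
  events << [phase, operation, exception && exception.message]
}

progress.operation(:retrieving, "from example repo") { :done }
begin
  progress.operation(:scraping, "/tmp/example") { raise "bad metadata" }
rescue RuntimeError
  # #scrape would swallow this and report failure through #errors instead.
end
```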
Return
- true: If the scrape was successful, i.e. no new error was logged during this call.
- false: If the scrape failed; call #errors for information on the failure.
Raise
- 'Invalid repository type': If the repository type is not known.
```ruby
# File 'lib/right_scraper/scraper.rb', line 76
def scrape(repo, incremental=true, &callback)
  errorlen = errors.size
  repo = RightScraper::Repositories::Base.from_hash(repo) if repo.is_a?(Hash)
  @logger.callback = callback
  begin
    # 1. Retrieve the files
    retriever = nil
    @logger.operation(:retrieving, "from #{repo}") do
      retriever = repo.retriever(@options)
      retriever.retrieve if retriever.available?
    end
    # 2. Now scrape if there is a scraper in the options
    @logger.operation(:scraping, retriever.repo_dir) do
      if @options[:kind]
        options = @options.merge({:ignorable_paths => retriever.ignorable_paths,
                                  :repo_dir => retriever.repo_dir,
                                  :repository => retriever.repository})
        scraper = RightScraper::Scrapers::Base.scraper(options)
        @resources += scraper.scrape
      end
    end
    # 3. Cleanup if temporary
    FileUtils.remove_entry_secure(@options[:basedir]) if @temporary
  rescue
    # logger handles communication with the end user and appending
    # to our error list, we just need to keep going.
  end
  @logger.callback = nil
  # Success means no new errors were logged during this call.
  errors.size == errorlen
end
```
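The error-handling pattern of #scrape has three parts: remember the error count, swallow exceptions that the logger has already recorded, and compare the counts at the end. The following minimal sketch reproduces that pattern. It uses hypothetical lambdas in place of the retriever and scraper and a plain Array in place of the logger's error list; it is not RightScraper's code.

```ruby
# Hypothetical sketch of the control flow of #scrape. The retriever and the
# scraper are replaced by lambdas, and a plain Array stands in for the
# logger's error list; none of these names come from RightScraper.
class ScrapeFlowSketch
  attr_reader :errors, :resources

  def initialize
    @errors = []
    @resources = []
  end

  def scrape(retrieve, scan)
    errorlen = @errors.size          # errors seen before this call
    begin
      dir = operation(:retrieving) { retrieve.call }
      @resources += operation(:scraping) { scan.call(dir) }
    rescue StandardError
      # Already recorded by #operation; swallow it, as the original does.
    end
    @errors.size == errorlen         # true only if this call added no errors
  end

  private

  # Stand-in for @logger.operation: record the failure, then re-raise.
  def operation(name)
    yield
  rescue StandardError => e
    @errors << "#{name}: #{e.message}"
    raise
  end
end

flow = ScrapeFlowSketch.new
ok = flow.scrape(-> { "/tmp/repo" }, ->(dir) { ["#{dir}/metadata.json"] })
failed = flow.scrape(-> { raise "connection refused" }, ->(dir) { ["unused"] })
```

A failed retrieval skips the scraping step entirely, so the resources gathered by earlier successful calls are left unchanged.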
#succeeded? ⇒ Boolean
Also known as: successful?
Call #errors to get error messages if false.
Return
- Boolean: true if scraping finished with no error, false otherwise.
```ruby
# File 'lib/right_scraper/scraper.rb', line 130
def succeeded?
  errors.empty?
end
```
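The error list lives in the logger that the constructor creates once, and the code shown never clears it. As a result, #succeeded? and the return value of #scrape can disagree: #scrape reports only on its own call, while #succeeded? reflects every error recorded so far. Below is a minimal sketch of that difference using a hypothetical SuccessSketch class. It also shows one way to declare the successful? alias with alias_method; the source's actual alias declaration is not shown.

```ruby
# Hypothetical sketch (not RightScraper code): a plain Array stands in for
# the logger's error list and, as in the code shown, is never cleared.
class SuccessSketch
  attr_reader :errors

  def initialize
    @errors = []
  end

  # Simulated scrape: pass an error message to make this call fail.
  def scrape(error = nil)
    errorlen = @errors.size
    @errors << error if error
    @errors.size == errorlen   # per-call result, as in #scrape
  end

  def succeeded?
    @errors.empty?             # cumulative result, as in #succeeded?
  end
  alias_method :successful?, :succeeded?
end

scraper = SuccessSketch.new
first = scraper.scrape("retrieving: timeout")
second = scraper.scrape
```

Under this reading, a caller that reuses one scraper for several repositories would check the return value of each #scrape call rather than #succeeded?.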