Class: RightScraper::Main
- Inherits: Object
- Defined in: lib/right_scraper/main.rb
Overview
Library main entry point. Instantiate this class and call the scrape method to download or update a remote repository to the local disk and run a scraper on the resulting files.
Note that this class was named Scraper in v1-3, but that name was confusing because the Scrapers module performs only a subset of the main class's functionality.
Instance Attribute Summary
- #resources ⇒ Object (readonly)
  - (Array) Scraped resources.
Instance Method Summary
- #errors ⇒ Object
  - (Array) Error messages in case of failure.
- #initialize(options = {}) ⇒ Main (constructor)
  - Initialize scrape destination directory.
- #repo_dir(repo) ⇒ Object
  - Path to directory where the given repo should be or was downloaded.
- #scrape(repo, incremental = true, &callback) ⇒ Object
  - Scrape the given repository, depositing files into the scrape directory.
- #succeeded? ⇒ Boolean (also: #successful?)
  - Call #errors to get error messages if false.
- #warnings ⇒ Object
  - (Array) Warnings or empty.
Constructor Details
#initialize(options = {}) ⇒ Main
Initialize scrape destination directory.

Options
- :kind: Type of scraper that will traverse the directory for resources; one of :cookbook or :workflow.
- :basedir: Local directory where files are retrieved and scraped; a temporary directory is used if nil.
- :max_bytes: Maximum number of bytes to read from the remote repo; unlimited if nil.
- :max_seconds: Maximum number of seconds to spend reading from the remote repo; unlimited if nil.
    # File 'lib/right_scraper/main.rb', line 50

    def initialize(options = {})
      options = {
        :kind        => nil,
        :basedir     => nil,
        :max_bytes   => nil,
        :max_seconds => nil,
        :callback    => nil,
        :logger      => nil,
        :s3_key      => nil,
        :s3_secret   => nil,
        :s3_bucket   => nil,
        :errors      => nil,
        :warnings    => nil,
        :scanners    => nil,
        :builders    => nil,
      }.merge(options)
      @temporary = !options.has_key?(:basedir)
      options[:basedir] ||= Dir.mktmpdir
      options[:logger] ||= ::RightScraper::Loggers::Default.new
      @logger = options[:logger]
      @resources = []
      @options = options
    end
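The constructor's option handling follows a common Ruby idiom: remember whether the caller chose a :basedir, merge the caller's hash over nil defaults, then fill remaining nils with fallbacks. A minimal stand-alone sketch of that pattern (build_options is a hypothetical helper, not part of the gem):

```ruby
require 'tmpdir'

# Hypothetical stand-in mirroring the defaulting pattern used by
# Main#initialize: record whether the caller supplied :basedir, merge
# over nil defaults, then fall back to a temporary scratch directory.
def build_options(options = {})
  temporary = !options.has_key?(:basedir)    # caller did not pick a directory
  options = {
    :kind    => nil,
    :basedir => nil,
  }.merge(options)
  options[:basedir] ||= Dir.mktmpdir         # fall back to a scratch dir
  [options, temporary]
end

opts, temporary = build_options(:kind => :cookbook)
puts temporary        # prints "true": no :basedir was passed
puts opts[:kind]      # prints "cookbook"
```

Because `temporary` is computed from the caller's hash, the scraper later knows whether it owns the directory and may delete it after scraping.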
Instance Attribute Details
#resources ⇒ Object (readonly)
- (Array) Scraped resources.

    # File 'lib/right_scraper/main.rb', line 41

    def resources
      @resources
    end
Instance Method Details
#errors ⇒ Object
- (Array) Error messages in case of failure.

    # File 'lib/right_scraper/main.rb', line 166

    def errors
      @logger.errors
    end
#repo_dir(repo) ⇒ Object
Path to directory where the given repo should be or was downloaded.

Parameters
- repo (Hash|RightScraper::Repositories::Base): Remote repository corresponding to the local directory.

Return
- (String): Path to the local directory that corresponds to the given repository.

    # File 'lib/right_scraper/main.rb', line 161

    def repo_dir(repo)
      RightScraper::Retrievers::Base.repo_dir(@options[:basedir], repo)
    end
#scrape(repo, incremental = true, &callback) ⇒ Object
Scrape the given repository, depositing files into the scrape directory. Subsequent calls update the content of the directory incrementally when possible.
Parameters
- repo (Hash|RightScraper::Repositories::Base): Repository to be scraped. Note: repo can be either a Hash or a RightScraper::Repositories::Base instance; see the RightScraper::Repositories::Base class for valid Hash keys.
Block
If a block is given, it will be called back with progress information. The block should take four arguments:
- the first argument is one of :begin, :commit, :abort, which signifies what the scraper is trying to do and where it is when it does it
- the second argument is a symbol describing the operation being performed in an easy-to-match way
- the third argument is optional further explanation
- the fourth argument is the pending exception (only relevant for :abort)
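The four-argument callback shape described above can be sketched as a plain lambda (the names and formatting here are illustrative, not the gem's code):

```ruby
# Illustrative progress callback matching the documented four-argument
# shape: a phase (:begin/:commit/:abort), an operation symbol, an
# optional explanation, and the pending exception (nil unless aborting).
progress = lambda do |phase, operation, explanation, exception|
  line = "#{phase} #{operation}"
  line << " (#{explanation})" if explanation
  line << " failed: #{exception.message}" if phase == :abort && exception
  line
end

puts progress.call(:begin, :retrieving, "from git", nil)
# prints "begin retrieving (from git)"
```

Passing such a block to scrape lets a caller log or display each retrieval and scraping step as it happens.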
Return
- true: If the scrape was successful.
- false: If the scrape failed; call #errors for information on the failure.

Raise
- 'Invalid repository type': If the repository type is not known.
    # File 'lib/right_scraper/main.rb', line 100

    def scrape(repo, incremental = true, &callback)
      errorlen = errors.size
      repo = RightScraper::Repositories::Base.from_hash(repo) if repo.is_a?(Hash)
      @logger.callback = callback
      begin
        # 1. Retrieve the files
        retriever = nil
        repo_dir_changed = false
        @logger.operation(:retrieving, "from #{repo}") do
          # note that the retriever type may be unavailable but allow the
          # retrieve method to raise any such error.
          retriever = repo.retriever(@options)
          repo_dir_changed = retriever.retrieve
        end

        # TEAL FIX: Note that retrieve will now return true iff there has
        # been a change to the last scraped repository directory, for
        # efficiency reasons and only for retriever types that support this
        # behavior.
        #
        # Even if the retrieval is skipped due to already having the data on
        # disk, we still need to scrape its resources, but only because of
        # the case of the metadata scraper daemon, which updates multiple
        # repositories of similar criteria.
        #
        # The issue is that a new repo can appear later with the same
        # criteria as an already-scraped repo and will need its own copy of
        # the scraped resources. The easiest (but not most efficient) way to
        # deliver these is to rescrape the already-seen resources. This
        # becomes more expensive as we rely on generating "metadata.json"
        # from "metadata.rb" for cookbooks but is likely not expensive
        # enough to need to improve this logic.

        # 2. Now scrape if there is a scraper in the options
        @logger.operation(:scraping, retriever.repo_dir) do
          if @options[:kind]
            options = @options.merge(:ignorable_paths => retriever.ignorable_paths,
                                     :repo_dir        => retriever.repo_dir,
                                     :repository      => retriever.repository)
            scraper = RightScraper::Scrapers::Base.scraper(options)
            @resources += scraper.scrape
          end
        end
      rescue Exception
        # logger handles communication with the end user and appending
        # to our error list; we just need to keep going.
      ensure
        # ensure basedir is always removed if temporary (even with errors).
        ::FileUtils.remove_entry_secure(@options[:basedir]) rescue nil if @temporary
      end
      @logger.callback = nil
      errors.size == errorlen
    end
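The ensure clause's cleanup of a temporary basedir can be illustrated with stdlib calls alone; this is a sketch of the pattern, not the gem's code:

```ruby
require 'tmpdir'
require 'fileutils'

basedir = Dir.mktmpdir   # scratch directory, as when no :basedir is given
temporary = true
begin
  # ... retrieve and scrape into basedir ...
  File.write(File.join(basedir, "metadata.json"), "{}")
ensure
  # Remove the scratch directory even if scraping raised; swallow any
  # cleanup error, mirroring the "rescue nil" in Main#scrape above.
  (::FileUtils.remove_entry_secure(basedir) rescue nil) if temporary
end

puts File.exist?(basedir)  # prints "false"
```

FileUtils.remove_entry_secure is used rather than rm_rf because scratch directories typically live under a world-writable /tmp, where it guards against symlink races.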
#succeeded? ⇒ Boolean Also known as: successful?
Call #errors to get error messages if false.

Return
- (Boolean): true if the scrape finished with no error, false otherwise.

    # File 'lib/right_scraper/main.rb', line 180

    def succeeded?
      errors.empty?
    end
#warnings ⇒ Object
- (Array) Warnings or empty.

    # File 'lib/right_scraper/main.rb', line 171

    def warnings
      @logger.warnings
    end