Class: RightScraper::Retrievers::Base
- Inherits:
-
Object
- Object
- RightScraper::Retrievers::Base
- Defined in:
- lib/right_scraper/retrievers/base.rb
Overview
Base class for all retrievers.
Retrievers fetch remote repositories into a given path. They attempt to fetch incrementally when possible (e.g., by leveraging the incremental capabilities of the underlying source control management system).
Direct Known Subclasses
Defined Under Namespace
Classes: RetrieverError
Constant Summary
- @@types =
(Hash) Lookup table from textual description of scraper type (‘cookbook’ or ‘workflow’ currently) to the class that represents that scraper.
{}
Instance Attribute Summary
-
#logger ⇒ Object
readonly
Returns the value of attribute logger.
-
#max_bytes ⇒ Object
Returns the value of attribute max_bytes.
-
#max_seconds ⇒ Object
Returns the value of attribute max_seconds.
-
#repo_dir ⇒ Object
readonly
Returns the value of attribute repo_dir.
-
#repository ⇒ Object
readonly
Returns the value of attribute repository.
Class Method Summary
-
.repo_dir(root_dir, repo) ⇒ Object
Path to directory where given repo should be or was downloaded.
Instance Method Summary
-
#available? ⇒ Boolean
Determines if retriever is available (has required CLI tools, etc.).
-
#ignorable_paths ⇒ Object
Paths to ignore when traversing the filesystem.
-
#initialize(repository, options = {}) ⇒ Base
constructor
Create a new retriever for the given repository.
-
#retrieve ⇒ Object
Retrieve the repository; overridden in subclasses.
Constructor Details
#initialize(repository, options = {}) ⇒ Base
Create a new retriever for the given repository. This class recognizes several options, and subclasses may recognize additional options. Although they are passed as options, :basedir and :logger are required: the constructor raises if either is missing.
Options
:basedir-
Required, base directory where all files should be retrieved
:max_bytes-
Maximum number of bytes to read
:max_seconds-
Maximum number of seconds to spend reading
:logger-
Required, logger to use
Parameters
- repository(RightScraper::Repositories::Base)
-
repository to scrape
- options(Hash)
-
retriever options
Raise
- ‘Missing base directory’
-
if :basedir option is missing
- ArgumentError
-
if :logger option is missing
# File 'lib/right_scraper/retrievers/base.rb', line 61
def initialize(repository, options = {})
  raise 'Missing base directory' unless options[:basedir]
  @repository = repository
  @max_bytes = options[:max_bytes] || nil
  @max_seconds = options[:max_seconds] || nil
  @basedir = options[:basedir]
  @repo_dir = RightScraper::Retrievers::Base.repo_dir(@basedir, repository)
  unless @logger = options[:logger]
    raise ::ArgumentError, ':logger is required'
  end
  @logger.operation(:initialize, "setting up in #{@repo_dir}") do
    ::FileUtils.mkdir_p(@repo_dir)
  end
end
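The option handling above can be tried without the gem installed. What follows is a minimal sketch, not the library's code: `SketchRetriever`, `NullLogger`, and `FakeRepository` are hypothetical stand-ins, and the placeholder hash `'abc123'` is invented for illustration. Only the two calls visible in the listing (`logger.operation` with a block, and `repository_hash`) are modeled.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical logger: models only the #operation call used by the constructor.
class NullLogger
  def operation(_type, _explanation = '')
    yield
  end
end

# Hypothetical repository: models only #repository_hash.
FakeRepository = Struct.new(:repository_hash)

# Stand-in that mirrors the documented option handling (assumption: simplified).
class SketchRetriever
  attr_reader :repo_dir, :max_bytes, :max_seconds

  def initialize(repository, options = {})
    raise 'Missing base directory' unless options[:basedir]
    raise ::ArgumentError, ':logger is required' unless options[:logger]

    @logger      = options[:logger]
    @max_bytes   = options[:max_bytes]   # nil when not given
    @max_seconds = options[:max_seconds] # nil when not given
    @repo_dir    = ::File.join(options[:basedir], repository.repository_hash, 'repo')
    @logger.operation(:initialize, "setting up in #{@repo_dir}") do
      ::FileUtils.mkdir_p(@repo_dir)
    end
  end
end

Dir.mktmpdir do |basedir|
  retriever = SketchRetriever.new(FakeRepository.new('abc123'),
                                  basedir: basedir,
                                  logger: NullLogger.new,
                                  max_seconds: 60)
  puts retriever.repo_dir
  puts File.directory?(retriever.repo_dir)
end
```

Omitting `:basedir` raises a `RuntimeError` with the message shown in the Raise section, while omitting `:logger` raises `ArgumentError`, so callers that want to handle both cases need to rescue both classes.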
Instance Attribute Details
#logger ⇒ Object (readonly)
Returns the value of attribute logger.
# File 'lib/right_scraper/retrievers/base.rb', line 40
def logger
  @logger
end
#max_bytes ⇒ Object
Returns the value of attribute max_bytes.
# File 'lib/right_scraper/retrievers/base.rb', line 38
def max_bytes
  @max_bytes
end
#max_seconds ⇒ Object
Returns the value of attribute max_seconds.
# File 'lib/right_scraper/retrievers/base.rb', line 38
def max_seconds
  @max_seconds
end
#repo_dir ⇒ Object (readonly)
Returns the value of attribute repo_dir.
# File 'lib/right_scraper/retrievers/base.rb', line 40
def repo_dir
  @repo_dir
end
#repository ⇒ Object (readonly)
Returns the value of attribute repository.
# File 'lib/right_scraper/retrievers/base.rb', line 40
def repository
  @repository
end
Class Method Details
.repo_dir(root_dir, repo) ⇒ Object
Path to the directory where the given repo should be, or was, downloaded.
Parameters
- root_dir(String)
-
Path to directory containing all scraped repositories
- repo(Hash|RightScraper::Repositories::Base)
-
Remote repository corresponding to local directory
Return
- String
-
Path to local directory that corresponds to given repository
# File 'lib/right_scraper/retrievers/base.rb', line 103
def self.repo_dir(root_dir, repo)
  repo = ::RightScraper::Repositories::Base.from_hash(repo) if repo.is_a?(Hash)
  dir_name = repo.repository_hash
  dir_path = ::File.join(root_dir, dir_name)
  "#{dir_path}/repo"
end
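The resulting layout is `<root_dir>/<repository_hash>/repo`. The sketch below reproduces only the path arithmetic; `sketch_repo_dir` is a hypothetical helper, and `'abc123'` is a placeholder, since the actual format of `repository_hash` is not shown here.

```ruby
# Hypothetical helper mirroring the path arithmetic of .repo_dir.
# The real method obtains dir_name from repo.repository_hash; here it is passed in.
def sketch_repo_dir(root_dir, dir_name)
  dir_path = ::File.join(root_dir, dir_name)
  "#{dir_path}/repo"
end

puts sketch_repo_dir('/var/scrape', 'abc123') # prints /var/scrape/abc123/repo
```

Because each repository gets its own hash-named directory under the same root, several repositories can share one `root_dir` without their checkouts colliding, provided their hashes differ.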
Instance Method Details
#available? ⇒ Boolean
Determines if retriever is available (has required CLI tools, etc.)
# File 'lib/right_scraper/retrievers/base.rb', line 77
def available?
  raise ::NotImplementedError
end
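Since the base implementation only raises, each subclass has to supply its own check. One possible way to test for a required CLI tool is to search `PATH` for an executable of that name. The helper `tool_on_path?` and the class `SketchGitRetriever` below are hypothetical and do not come from the library.

```ruby
# Hypothetical helper: true if an executable file named +tool+ exists in one of
# the directories listed in +path+. Windows extensions (.exe, .bat) are not handled.
def tool_on_path?(tool, path = ENV['PATH'].to_s)
  path.split(::File::PATH_SEPARATOR).any? do |dir|
    candidate = ::File.join(dir, tool)
    ::File.file?(candidate) && ::File.executable?(candidate)
  end
end

# Hypothetical subclass-style override using the helper.
class SketchGitRetriever
  def available?
    tool_on_path?('git')
  end
end

puts SketchGitRetriever.new.available?
```

Checking for the file rather than running the tool avoids spawning a process, at the cost of not detecting a binary that is present but broken.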
#ignorable_paths ⇒ Object
Paths to ignore when traversing the filesystem. Mostly used for things like Git and Subversion version control directories.
Return
- list(Array)
-
list of filenames to ignore.
# File 'lib/right_scraper/retrievers/base.rb', line 86
def ignorable_paths
  []
end
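How the scraper applies this list is not shown on this page. As a rough sketch, assuming entries are matched against the basename of each path, a traversal could prune ignorable directories like this (`files_excluding` is a hypothetical helper):

```ruby
require 'find'

# Hypothetical traversal: lists regular files under +root+, skipping any entry
# whose basename appears in +ignorable+ (e.g. the result of #ignorable_paths).
def files_excluding(root, ignorable)
  found = []
  ::Find.find(root) do |path|
    if ignorable.include?(::File.basename(path))
      # Skip the whole subtree when the ignorable entry is a directory.
      ::Find.prune if ::File.directory?(path)
      next
    end
    found << path if ::File.file?(path)
  end
  found.sort
end
```

With the base class default of `[]`, nothing is skipped; a Git-backed retriever might return `['.git']` so that version control metadata is never visited.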
#retrieve ⇒ Object
Retrieve the repository; overridden in subclasses.
# File 'lib/right_scraper/retrievers/base.rb', line 91
def retrieve
  raise ::NotImplementedError
end
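To illustrate how a subclass might fill in the three overridable methods, here is a self-contained sketch. `SketchBase` is a stand-in that copies the defaults documented above, and `LocalCopyRetriever` is a hypothetical subclass that "retrieves" by copying a local directory; neither is part of the library, and the constructor signature is simplified for the example.

```ruby
require 'fileutils'

# Stand-in for the base class: same defaults as documented above.
class SketchBase
  attr_reader :repo_dir

  def initialize(repo_dir)
    @repo_dir = repo_dir
  end

  def available?
    raise ::NotImplementedError
  end

  def ignorable_paths
    []
  end

  def retrieve
    raise ::NotImplementedError
  end
end

# Hypothetical subclass that copies a local directory into repo_dir.
class LocalCopyRetriever < SketchBase
  def initialize(source_dir, repo_dir)
    super(repo_dir)
    @source_dir = source_dir
  end

  # No CLI tool is needed; the source directory just has to exist.
  def available?
    ::File.directory?(@source_dir)
  end

  def ignorable_paths
    ['.git', '.svn']
  end

  def retrieve
    ::FileUtils.mkdir_p(repo_dir)
    # "source/." copies the directory's contents rather than the directory itself.
    ::FileUtils.cp_r(::File.join(@source_dir, '.'), repo_dir)
  end
end
```

Note that `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` will not catch a call to an unimplemented `retrieve` or `available?`; callers have to name the class explicitly.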