Class: RightScraper::Retrievers::Base
- Inherits: Object
  - Object
    - RightScraper::Retrievers::Base
- Defined in: lib/right_scraper/retrievers/base.rb
Overview
Base class for all retrievers.
Retrievers fetch remote repositories into a given path. They attempt to fetch incrementally when possible (e.g. by leveraging the incremental capabilities of the underlying source control management system).
Defined Under Namespace
Classes: RetrieverError
Constant Summary
- @@types = {}
  (Hash) Lookup table from the textual description of a scraper type (‘cookbook’ or ‘workflow’ currently) to the class that represents that scraper.
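A lookup table of this kind is an ordinary Ruby Hash keyed by the type string. The sketch below shows how such a table can resolve a type to its class. `CookbookScraper`, `WorkflowScraper`, `SCRAPER_TYPES`, and `scraper_class_for` are hypothetical names, not RightScraper's API; the documented @@types is shown only as an empty Hash.

```ruby
# Hypothetical stand-ins for the classes that represent each scraper type.
class CookbookScraper; end
class WorkflowScraper; end

# Hypothetical lookup table from textual scraper type to class,
# in the spirit of @@types.
SCRAPER_TYPES = {
  'cookbook' => CookbookScraper,
  'workflow' => WorkflowScraper
}.freeze

# Hypothetical helper: resolve a type string (or symbol) to its class,
# failing loudly on unknown types instead of returning nil.
def scraper_class_for(type)
  SCRAPER_TYPES.fetch(type.to_s) do
    raise ArgumentError, "unknown scraper type: #{type.inspect}"
  end
end

puts scraper_class_for(:cookbook)  # prints "CookbookScraper"
```

Using `Hash#fetch` with a block makes an unknown type an explicit error rather than a silent `nil`.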
Instance Attribute Summary
- #max_bytes ⇒ Object
  (Integer) Optional maximum size permitted for repositories.
- #max_seconds ⇒ Object
  (Integer) Optional maximum number of seconds for any single retrieve operation.
- #repo_dir ⇒ Object (readonly)
  (String) Path to the directory where files are retrieved.
- #repository ⇒ Object (readonly)
  (RightScraper::Repositories::Base) Repository currently being retrieved.
Class Method Summary
- .repo_dir(root_dir, repo) ⇒ Object
  Path to the directory where the given repo should be, or was, downloaded.
Instance Method Summary
- #available? ⇒ Boolean
  Determines whether the retriever is available (has the required CLI tools, etc.).
- #ignorable_paths ⇒ Object
  Paths to ignore when traversing the filesystem.
- #initialize(repository, options = {}) ⇒ Base (constructor)
  Create a new retriever for the given repository.
- #retrieve ⇒ Object
  Retrieve the repository; overridden in subclasses.
Constructor Details
#initialize(repository, options = {}) ⇒ Base
Create a new retriever for the given repository. This class recognizes several options, and subclasses may recognize additional options. Apart from :basedir, options are never required.
Options
- :basedir: Required. Base directory where all files should be retrieved.
- :max_bytes: Maximum number of bytes to read.
- :max_seconds: Maximum number of seconds to spend reading.
- :logger: Logger to use (defaults to a new RightScraper::Logger).
Parameters
- repository (RightScraper::Repositories::Base): repository to scrape
- options (Hash): retriever options
Raise
- RuntimeError ('Missing base directory'): if the :basedir option is missing

    # File 'lib/right_scraper/retrievers/base.rb', line 65
    def initialize(repository, options={})
      raise 'Missing base directory' unless options[:basedir]
      @repository = repository
      @max_bytes = options[:max_bytes] || nil
      @max_seconds = options[:max_seconds] || nil
      @basedir = options[:basedir]
      @repo_dir = RightScraper::Retrievers::Base.repo_dir(@basedir, repository)
      @logger = options[:logger] || RightScraper::Logger.new
      @logger.repository = repository
      @logger.operation(:initialize, "setting up in #{@repo_dir}") do
        FileUtils.mkdir_p(@repo_dir)
      end
    end
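The constructor's option handling can be exercised without RightScraper itself. This sketch reproduces only that logic: a missing :basedir raises, :max_bytes and :max_seconds default to nil, and the repository directory is created eagerly. `RetrieverOptionsSketch` is a hypothetical class, and a plain string stands in for the repository hash that the real code obtains from the repository object.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in mirroring only the option handling of Base#initialize.
class RetrieverOptionsSketch
  attr_reader :max_bytes, :max_seconds, :repo_dir

  def initialize(repo_hash, options = {})
    # As in the original, a missing :basedir raises a RuntimeError.
    raise 'Missing base directory' unless options[:basedir]
    @max_bytes   = options[:max_bytes]    # nil means "no limit"
    @max_seconds = options[:max_seconds]  # nil means "no limit"
    # Same layout as Base.repo_dir: <basedir>/<repository hash>/repo
    @repo_dir = "#{File.join(options[:basedir], repo_hash)}/repo"
    FileUtils.mkdir_p(@repo_dir)
  end
end

Dir.mktmpdir do |base|
  r = RetrieverOptionsSketch.new('abc123', basedir: base, max_seconds: 30)
  puts r.max_seconds                # prints 30
  puts File.directory?(r.repo_dir)  # prints true
end
```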
Instance Attribute Details
#max_bytes ⇒ Object
(Integer) Optional maximum size permitted for repositories.

    # File 'lib/right_scraper/retrievers/base.rb', line 34
    def max_bytes
      @max_bytes
    end
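The reader above only exposes the stored limit; nothing in the base-class code shown here enforces it. A minimal sketch of one way to check a retrieved directory against max_bytes, assuming the limit applies to the total size of regular files on disk. `check_max_bytes` and `SizeLimitExceeded` are hypothetical names, not part of RightScraper.

```ruby
require 'find'
require 'tmpdir'

# Hypothetical error type used only by this sketch.
class SizeLimitExceeded < StandardError; end

# Sum the sizes of regular files under dir. Raise if max_bytes is set
# and the total exceeds it; a nil max_bytes means "no limit".
def check_max_bytes(dir, max_bytes)
  total = 0
  Find.find(dir) do |path|
    total += File.size(path) if File.file?(path)
  end
  if max_bytes && total > max_bytes
    raise SizeLimitExceeded, "#{total} bytes exceeds limit of #{max_bytes}"
  end
  total
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), 'x' * 10)
  puts check_max_bytes(dir, 100)  # prints 10
end
```

Checking after the fact is simple but lets an oversized download finish first; a real retriever might instead count bytes as they are read.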
#max_seconds ⇒ Object
(Integer) Optional maximum number of seconds for any single retrieve operation.

    # File 'lib/right_scraper/retrievers/base.rb', line 38
    def max_seconds
      @max_seconds
    end
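As with max_bytes, the reader only stores the limit. One way a caller might apply max_seconds to a single retrieve operation is Ruby's standard `Timeout` module; this is an assumed enforcement strategy, not necessarily RightScraper's, and `with_max_seconds` is a hypothetical helper.

```ruby
require 'timeout'

# Run the block under an optional limit. Timeout.timeout runs the block
# with no limit when max_seconds is nil, matching the optional attribute.
def with_max_seconds(max_seconds, &block)
  Timeout.timeout(max_seconds, &block)
end

puts with_max_seconds(nil) { 'no limit' }  # prints "no limit"

begin
  with_max_seconds(0.1) { sleep 1 }
rescue Timeout::Error
  puts 'retrieve took too long'
end
```

`Timeout` interrupts Ruby code from another thread, which can leave external resources in an odd state; for retrievals that shell out to SCM tools, a process-level timeout may be the safer choice.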
#repo_dir ⇒ Object (readonly)
(String) Path to the directory where files are retrieved.

    # File 'lib/right_scraper/retrievers/base.rb', line 44
    def repo_dir
      @repo_dir
    end

#repository ⇒ Object (readonly)
(RightScraper::Repositories::Base) Repository currently being retrieved.

    # File 'lib/right_scraper/retrievers/base.rb', line 41
    def repository
      @repository
    end
Class Method Details
.repo_dir(root_dir, repo) ⇒ Object
Path to the directory where the given repo should be, or was, downloaded.
Parameters
- root_dir (String): path to the directory containing all scraped repositories
- repo (Hash|RightScraper::Repositories::Base): remote repository corresponding to the local directory
Return
- String: path to the local directory that corresponds to the given repository

    # File 'lib/right_scraper/retrievers/base.rb', line 106
    def self.repo_dir(root_dir, repo)
      repo = RightScraper::Repositories::Base.from_hash(repo) if repo.is_a?(Hash)
      dir_name = repo.repository_hash
      dir_path = File.join(root_dir, dir_name)
      "#{dir_path}/repo"
    end
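The path is therefore `<root_dir>/<repository_hash>/repo`. The sketch below reproduces that computation with a hypothetical `FakeRepo` struct standing in for a repository object and a made-up hash value; it omits the `from_hash` conversion for Hash input.

```ruby
# Hypothetical stand-in: repo_dir only needs #repository_hash.
FakeRepo = Struct.new(:repository_hash)

# Mirrors the path logic of Base.repo_dir for an already-built repository
# object; the Hash-to-repository conversion (from_hash) is omitted.
def repo_dir_for(root_dir, repo)
  dir_path = File.join(root_dir, repo.repository_hash)
  "#{dir_path}/repo"
end

puts repo_dir_for('/var/scraper', FakeRepo.new('0123abcd'))
# prints "/var/scraper/0123abcd/repo"
```

Because the directory is derived only from the repository hash, the same repository maps to the same local path on every run, which is presumably what lets retrievers find an earlier checkout and update it incrementally.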
Instance Method Details
#available? ⇒ Boolean
Determines whether the retriever is available (has the required CLI tools, etc.). The base implementation raises NotImplementedError, so subclasses must override it.

    # File 'lib/right_scraper/retrievers/base.rb', line 80
    def available?
      raise NotImplementedError
    end
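A subclass that shells out to an SCM tool could implement `available?` by looking for that tool on the PATH. This is a sketch under that assumption; `command_on_path?` and `GitLikeRetriever` are hypothetical names, and the lookup is POSIX-style (on Windows you would also try the PATHEXT suffixes).

```ruby
# Hypothetical helper: true if an executable named tool is on PATH.
# POSIX-style lookup only; Windows would also need PATHEXT suffixes.
def command_on_path?(tool)
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, tool)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Hypothetical retriever whose availability depends on the git CLI.
class GitLikeRetriever
  def available?
    command_on_path?('git')
  end
end

puts GitLikeRetriever.new.available?  # true or false, depending on the host
```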
#ignorable_paths ⇒ Object
Paths to ignore when traversing the filesystem, mostly used for version control directories such as those of Git and Subversion.
Return
- list (Array): list of filenames to ignore; the base implementation returns an empty array

    # File 'lib/right_scraper/retrievers/base.rb', line 89
    def ignorable_paths
      []
    end
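A traversal that honors such a list can prune matching entries as it walks the tree. A minimal sketch, assuming the entries are plain basenames such as `'.git'` or `'.svn'` (the description says "filenames"); `scrape_file_list` is a hypothetical helper, not RightScraper's traversal code.

```ruby
require 'find'

# Collect files under root (as paths relative to root), skipping any entry
# whose basename is listed in ignorable, e.g. ['.git'] or ['.svn'].
def scrape_file_list(root, ignorable)
  files = []
  Find.find(root) do |path|
    next if path == root
    if ignorable.include?(File.basename(path))
      Find.prune  # skip this entry and, for a directory, everything below it
    elsif File.file?(path)
      files << path.delete_prefix("#{root}/")
    end
  end
  files.sort
end
```

For example, `scrape_file_list(retriever.repo_dir, retriever.ignorable_paths)` would list a checkout's files while skipping a retriever's version control metadata.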
#retrieve ⇒ Object
Retrieve the repository. Subclasses override this; the base implementation raises NotImplementedError.

    # File 'lib/right_scraper/retrievers/base.rb', line 94
    def retrieve
      raise NotImplementedError
    end
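Taken together, the base class defines an abstract interface: `available?` and `retrieve` raise `NotImplementedError`, while `ignorable_paths` defaults to an empty list. The sketch below illustrates that pattern with a dependency-free stand-in; `SketchRetrieverBase` and `LocalCopyRetriever` (which "retrieves" by copying a local directory) are hypothetical, not classes from RightScraper.

```ruby
require 'fileutils'

# Hypothetical stand-in for the abstract interface documented above.
class SketchRetrieverBase
  attr_reader :repo_dir

  def initialize(repo_dir)
    @repo_dir = repo_dir
    FileUtils.mkdir_p(@repo_dir)
  end

  def available?
    raise NotImplementedError
  end

  def ignorable_paths
    []
  end

  def retrieve
    raise NotImplementedError
  end
end

# Hypothetical subclass that "retrieves" by copying a local directory.
class LocalCopyRetriever < SketchRetrieverBase
  def initialize(source_dir, repo_dir)
    super(repo_dir)
    @source_dir = source_dir
  end

  # Needs only Ruby's standard library, so it is available if the source exists.
  def available?
    File.directory?(@source_dir)
  end

  # Copy the contents of source_dir into repo_dir, overwriting existing files.
  def retrieve
    raise "source not available: #{@source_dir}" unless available?
    FileUtils.cp_r(File.join(@source_dir, '.'), @repo_dir)
  end
end
```

The subclass inherits the empty `ignorable_paths`; a retriever backed by Git or Subversion would instead return that tool's metadata directory name.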