Class: RightScraper::Retrievers::Base

Inherits:

Object

Object
RightScraper::Retrievers::Base

Defined in: lib/right_scraper/retrievers/base.rb

Overview

Base class for all retrievers.

Retrievers fetch remote repositories into a given path. They attempt to fetch incrementally when possible (e.g. by leveraging the incremental capabilities of the underlying source control management system).

Direct Known Subclasses

CheckoutBasedRetriever, Download

Defined Under Namespace

Classes: RetrieverError

Constant Summary

@@types = (Hash) Lookup table from textual description of scraper type (‘cookbook’ or ‘workflow’ currently) to the class that represents that scraper.

{}

Instance Attribute Summary

#max_bytes ⇒ Object

Integer: optional maximum size permitted for repositories.

#max_seconds ⇒ Object

Integer: optional maximum number of seconds for any single retrieve operation.

#repo_dir ⇒ Object (readonly)

String: path to directory where files are retrieved.

#repository ⇒ Object (readonly)

RightScraper::Repositories::Base: repository currently being retrieved.

Class Method Summary

.repo_dir(root_dir, repo) ⇒ Object

Path to directory where given repo should be or was downloaded.

Instance Method Summary

#available? ⇒ Boolean

Determines if retriever is available (has required CLI tools, etc.).

#ignorable_paths ⇒ Object

Paths to ignore when traversing the filesystem.

#initialize(repository, options = {}) ⇒ Base (constructor)

Create a new retriever for the given repository.

#retrieve ⇒ Object

Retrieve repository; overridden in subclasses.

Constructor Details

#initialize(repository, options = {}) ⇒ `Base`

Create a new retriever for the given repository. This class recognizes several options, and subclasses may recognize additional options. Every option is optional except :basedir, which must be supplied.

Options

:basedir: Required, base directory where all files should be retrieved
:max_bytes: Maximum number of bytes to read
:max_seconds: Maximum number of seconds to spend reading
:logger: Logger to use

Parameters

repository(RightScraper::Repositories::Base): repository to scrape
options(Hash): retriever options

Raise

RuntimeError (‘Missing base directory’): if the :basedir option is missing

# File 'lib/right_scraper/retrievers/base.rb', line 65

def initialize(repository, options={})
  raise 'Missing base directory' unless options[:basedir]
  @repository = repository
  @max_bytes = options[:max_bytes] || nil
  @max_seconds = options[:max_seconds] || nil
  @basedir = options[:basedir]
  @repo_dir = RightScraper::Retrievers::Base.repo_dir(@basedir, repository)
  @logger = options[:logger] || RightScraper::Logger.new
  @logger.repository = repository
  @logger.operation(:initialize, "setting up in #{@repo_dir}") do
    FileUtils.mkdir_p(@repo_dir)
  end
end
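
The option handling in the constructor can be tried in isolation. The following is a minimal sketch, not the library's class: `SketchRetriever` and `StubRepo` are hypothetical stand-ins, the hash value 'abc123' is made up, and the logger and directory creation are left out so the snippet runs without the gem installed.

```ruby
# Hypothetical stand-in that mirrors only the option handling of
# RightScraper::Retrievers::Base#initialize (no logger, no mkdir_p).
StubRepo = Struct.new(:repository_hash)

class SketchRetriever
  attr_reader :repository, :repo_dir
  attr_accessor :max_bytes, :max_seconds

  def initialize(repository, options = {})
    # :basedir is the one mandatory option; raising a bare string
    # produces a RuntimeError.
    raise 'Missing base directory' unless options[:basedir]
    @repository  = repository
    @max_bytes   = options[:max_bytes]    # nil when no limit is given
    @max_seconds = options[:max_seconds]  # nil when no limit is given
    @repo_dir    = File.join(options[:basedir], repository.repository_hash, 'repo')
  end
end

repo = StubRepo.new('abc123')
retriever = SketchRetriever.new(repo, basedir: '/tmp/scraped', max_seconds: 60)
puts retriever.repo_dir
puts retriever.max_bytes.inspect

# Omitting :basedir fails before any other state is set up.
begin
  SketchRetriever.new(repo)
rescue RuntimeError => e
  puts e.message
end
```

Because the limits default to nil, a caller that wants a size or time cap has to pass :max_bytes or :max_seconds explicitly.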

Instance Attribute Details

#max_bytes ⇒ `Object`

Integer: optional maximum size permitted for repositories

# File 'lib/right_scraper/retrievers/base.rb', line 34

def max_bytes
  @max_bytes
end

#max_seconds ⇒ `Object`

Integer: optional maximum number of seconds for any single retrieve operation.

# File 'lib/right_scraper/retrievers/base.rb', line 38

def max_seconds
  @max_seconds
end

#repo_dir ⇒ `Object` (readonly)

String: Path to directory where files are retrieved

# File 'lib/right_scraper/retrievers/base.rb', line 44

def repo_dir
  @repo_dir
end

#repository ⇒ `Object` (readonly)

RightScraper::Repositories::Base: repository currently being retrieved

# File 'lib/right_scraper/retrievers/base.rb', line 41

def repository
  @repository
end

Class Method Details

.repo_dir(root_dir, repo) ⇒ `Object`

Path to directory where given repo should be or was downloaded

Parameters

root_dir(String): Path to directory containing all scraped repositories
repo(Hash|RightScraper::Repositories::Base): Remote repository corresponding to local directory

Return

String: Path to local directory that corresponds to given repository

# File 'lib/right_scraper/retrievers/base.rb', line 106

def self.repo_dir(root_dir, repo)
  repo = RightScraper::Repositories::Base.from_hash(repo) if repo.is_a?(Hash)
  dir_name  = repo.repository_hash
  dir_path  = File.join(root_dir, dir_name)
  "#{dir_path}/repo"
end
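
The resulting layout is `<root_dir>/<repository_hash>/repo`. A self-contained sketch of that composition follows; `FakeRepo` and `sketch_repo_dir` are hypothetical names, the hash value is made up, and the Hash-to-repository conversion done by `from_hash` is omitted.

```ruby
# Stand-in for a repository object; only #repository_hash is needed here.
FakeRepo = Struct.new(:repository_hash)

# Same composition as the listing above: <root_dir>/<repository_hash>/repo
def sketch_repo_dir(root_dir, repo)
  "#{File.join(root_dir, repo.repository_hash)}/repo"
end

puts sketch_repo_dir('/var/cache/scraper', FakeRepo.new('abc123'))
```

Keying the directory on the repository hash means two different repositories scraped under the same root should not share a working directory, assuming their hashes differ.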

Instance Method Details

#available? ⇒ `Boolean`

Determines if retriever is available (has required CLI tools, etc.)

Returns:

(Boolean)

Raises:

(NotImplementedError)

# File 'lib/right_scraper/retrievers/base.rb', line 80

def available?
  raise NotImplementedError
end
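
A subclass might implement `available?` by checking that a command-line tool can be found. The following is one possible sketch, not taken from the library: `tool_on_path?` is a hypothetical helper, and it ignores Windows executable extensions.

```ruby
# Hypothetical helper: true when an executable file named `tool`
# exists in one of the directories listed in PATH.
def tool_on_path?(tool)
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, tool)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# The result depends on the machine this runs on.
puts tool_on_path?('git')
```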

#ignorable_paths ⇒ `Object`

Paths to ignore when traversing the filesystem. Mostly used for things like Git and Subversion version control directories.

Return

list(Array): list of filenames to ignore.

# File 'lib/right_scraper/retrievers/base.rb', line 89

def ignorable_paths
  []
end
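
How the returned list might be applied during a directory walk can be sketched with the standard `find` library. This is an assumption about usage, not the library's own traversal code: `visible_files` is a hypothetical helper, and '.git' is used only as an example of a version control directory.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: lists files under root (relative paths, sorted),
# skipping any entry whose basename appears in ignorable.
def visible_files(root, ignorable)
  files = []
  Find.find(root) do |path|
    if ignorable.include?(File.basename(path))
      Find.prune # skip this entry and, for directories, everything below it
    elsif File.file?(path)
      files << path.delete_prefix("#{root}/")
    end
  end
  files.sort
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, '.git'))
  File.write(File.join(root, '.git', 'config'), '')
  File.write(File.join(root, 'metadata.rb'), '')
  p visible_files(root, ['.git']) # only metadata.rb remains
  p visible_files(root, [])       # nothing ignored, as with the base class default
end
```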

#retrieve ⇒ `Object`

Retrieve repository; overridden in subclasses

Raises:

(NotImplementedError)

# File 'lib/right_scraper/retrievers/base.rb', line 94

def retrieve
  raise NotImplementedError
end
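
The abstract-method pattern used by `available?` and `retrieve` can be illustrated with stand-ins. In this sketch `SketchBase` mimics the methods shown above and `EchoRetriever` is an invented subclass; neither is part of RightScraper, and the return values are placeholders.

```ruby
# Stand-in for the base class: abstract methods raise, ignorable_paths is empty.
class SketchBase
  def available?
    raise NotImplementedError
  end

  def retrieve
    raise NotImplementedError
  end

  def ignorable_paths
    []
  end
end

# Invented subclass that overrides all three methods.
class EchoRetriever < SketchBase
  def available?
    true # a real retriever might probe for a CLI tool here
  end

  def retrieve
    :retrieved # placeholder for the actual fetch
  end

  def ignorable_paths
    ['.echo']
  end
end

# NotImplementedError descends from ScriptError, not StandardError,
# so a bare `rescue` would not catch it; name the class explicitly.
begin
  SketchBase.new.retrieve
rescue NotImplementedError => e
  puts "abstract: #{e.class}"
end

puts EchoRetriever.new.retrieve.inspect
```

Callers that want to tolerate an unimplemented retriever therefore need `rescue NotImplementedError`, since `rescue => e` only covers StandardError and its descendants.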
