Class: RightScraper::Retrievers::Base

Inherits:
Object
  • Object
Defined in:
lib/right_scraper/retrievers/base.rb

Overview

Base class for all retrievers.

Retrievers fetch remote repositories into a given path. They attempt to fetch incrementally when possible (e.g. by leveraging the incremental capabilities of the underlying source control management system).

Direct Known Subclasses

CheckoutBasedRetriever, Download

Defined Under Namespace

Classes: RetrieverError

Constant Summary

@@types = {}

(Hash) Lookup table from textual description of scraper type (‘cookbook’ or ‘workflow’ currently) to the class that represents that scraper.

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(repository, options = {}) ⇒ Base

Create a new retriever for the given repository. This class recognizes the options listed below, and subclasses may recognize additional options. Apart from :basedir, no option is required.

Options

:basedir

Required, base directory where all files should be retrieved

:max_bytes

Maximum number of bytes to read

:max_seconds

Maximum number of seconds to spend reading

:logger

Logger to use

Parameters

repository(RightScraper::Repositories::Base)

repository to scrape

options(Hash)

retriever options

Raise

‘Missing base directory’

if :basedir option is missing



# File 'lib/right_scraper/retrievers/base.rb', line 65

def initialize(repository, options={})
  raise 'Missing base directory' unless options[:basedir]
  @repository = repository
  @max_bytes = options[:max_bytes] || nil
  @max_seconds = options[:max_seconds] || nil
  @basedir = options[:basedir]
  @repo_dir = RightScraper::Retrievers::Base.repo_dir(@basedir, repository)
  @logger = options[:logger] || RightScraper::Logger.new
  @logger.repository = repository
  @logger.operation(:initialize, "setting up in #{@repo_dir}") do
    FileUtils.mkdir_p(@repo_dir)
  end
end
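
The option handling above can be illustrated with a minimal sketch. `SketchRetriever` is a hypothetical stand-in written for this example, not the real RightScraper class: it copies only the `:basedir` check and the optional limits, and it skips the logger and the directory creation. The path `/tmp/scrape` and the symbol `:fake_repo` are likewise made up.

```ruby
# Hypothetical stand-in, NOT RightScraper::Retrievers::Base. It mirrors only
# the option handling shown in the constructor above.
class SketchRetriever
  attr_reader :max_bytes, :max_seconds, :basedir

  def initialize(repository, options = {})
    # :basedir is the only option that must be present.
    raise 'Missing base directory' unless options[:basedir]
    @repository  = repository
    @max_bytes   = options[:max_bytes]   # nil when not supplied
    @max_seconds = options[:max_seconds] # nil when not supplied
    @basedir     = options[:basedir]
  end
end

# Supplying :basedir and one optional limit.
retriever = SketchRetriever.new(:fake_repo, :basedir => '/tmp/scrape', :max_bytes => 1024)
puts retriever.max_bytes.inspect
puts retriever.max_seconds.inspect

# Omitting :basedir raises a RuntimeError with the documented message.
begin
  SketchRetriever.new(:fake_repo)
rescue RuntimeError => e
  puts e.message
end
```

Because the limits default to nil, a caller can presumably treat nil as "no limit"; how the real subclasses enforce the limits is not shown in this section.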

Instance Attribute Details

#max_bytes ⇒ Object

Integer

optional maximum size permitted for repositories



# File 'lib/right_scraper/retrievers/base.rb', line 34

def max_bytes
  @max_bytes
end

#max_seconds ⇒ Object

Integer

optional maximum number of seconds for any single retrieve operation



# File 'lib/right_scraper/retrievers/base.rb', line 38

def max_seconds
  @max_seconds
end

#repo_dir ⇒ Object (readonly)

String

Path to directory where files are retrieved



# File 'lib/right_scraper/retrievers/base.rb', line 44

def repo_dir
  @repo_dir
end

#repository ⇒ Object (readonly)

RightScraper::Repositories::Base

repository currently being retrieved



# File 'lib/right_scraper/retrievers/base.rb', line 41

def repository
  @repository
end

Class Method Details

.repo_dir(root_dir, repo) ⇒ Object

Path to directory where given repo should be or was downloaded

Parameters

root_dir(String)

Path to directory containing all scraped repositories

repo(Hash|RightScraper::Repositories::Base)

Remote repository corresponding to local directory

Return

String

Path to local directory that corresponds to given repository



# File 'lib/right_scraper/retrievers/base.rb', line 106

def self.repo_dir(root_dir, repo)
  repo = RightScraper::Repositories::Base.from_hash(repo) if repo.is_a?(Hash)
  dir_name  = repo.repository_hash
  dir_path  = File.join(root_dir, dir_name)
  "#{dir_path}/repo"
end
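
The path composition above can be sketched in isolation. `FakeRepo` and `sketch_repo_dir` are hypothetical names introduced for this example, and the hash value `abc123` and root `/var/scrape` are made up; the real method also accepts a Hash and converts it with `from_hash`, which this sketch omits.

```ruby
# Hypothetical stand-in for a repository object. The real repository class
# computes repository_hash itself; here it is just a stored string.
FakeRepo = Struct.new(:repository_hash)

# Mirrors the path composition of Base.repo_dir shown above, without the
# Hash-to-repository conversion.
def sketch_repo_dir(root_dir, repo)
  File.join(root_dir, repo.repository_hash) + '/repo'
end

puts sketch_repo_dir('/var/scrape', FakeRepo.new('abc123'))
# prints /var/scrape/abc123/repo
```

Each repository therefore gets its own directory named after its hash, with the retrieved files placed in a `repo` subdirectory beneath it.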

Instance Method Details

#available? ⇒ Boolean

Determines if retriever is available (has required CLI tools, etc.)

Returns:

  • (Boolean)

Raises:

  • (NotImplementedError)


# File 'lib/right_scraper/retrievers/base.rb', line 80

def available?
  raise NotImplementedError
end

#ignorable_paths ⇒ Object

Paths to ignore when traversing the filesystem. Mostly used for things like Git and Subversion version control directories.

Return

list(Array)

list of filenames to ignore.



# File 'lib/right_scraper/retrievers/base.rb', line 89

def ignorable_paths
  []
end

#retrieve ⇒ Object

Retrieve the repository. Subclasses must override this method.

Raises:

  • (NotImplementedError)


# File 'lib/right_scraper/retrievers/base.rb', line 94

def retrieve
  raise NotImplementedError
end
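
The three instance methods in this section form the contract that subclasses fill in. The sketch below shows one way a subclass might do so. `SketchBase` and `SketchGitRetriever` are hypothetical classes written for this example, not part of RightScraper, and the method bodies are placeholders rather than the library's actual behavior.

```ruby
# Hypothetical stand-in for the base class, reduced to the three instance
# methods documented above. It is not RightScraper::Retrievers::Base.
class SketchBase
  def available?
    raise NotImplementedError
  end

  def ignorable_paths
    []
  end

  def retrieve
    raise NotImplementedError
  end
end

# Hypothetical subclass for a Git-like tool.
class SketchGitRetriever < SketchBase
  def available?
    true # a real retriever would probe for its CLI tool here
  end

  def ignorable_paths
    ['.git'] # version control directory to skip during traversal
  end

  def retrieve
    :retrieved # placeholder for the actual fetch
  end
end

retriever = SketchGitRetriever.new
puts retriever.available?
puts retriever.ignorable_paths.inspect

# Calling the abstract method on the base raises NotImplementedError.
begin
  SketchBase.new.retrieve
rescue NotImplementedError
  puts 'retrieve is abstract on the base class'
end
```

Note that Ruby's `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` does not catch it; callers need to name the class explicitly, as the last block does.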