Class: RightScraper::Repositories::Base

Inherits:
RightScraper::RegisteredBase
Defined in:
lib/right_scraper/repositories/base.rb

Overview

Description of a remote repository that needs to be scraped.

Repository definitions inherit from this base class. A repository must register its #repo_type in @@types so that it can be used with Repositories::Base::from_hash, as follows:

class Foo < ::RightScraper::Repositories::Base
  ...

  # self-register
  register_self
  register_url_schemas('foo')
end

Subclasses should override #repo_type, #retriever and #to_url; when sensible, #revision should also be overridden. The most important methods are #to_url, which returns a URI that completely characterizes the repository, and #retriever, which returns the appropriate RightScraper::Retrievers::Base to scan that repository.
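
The override contract described above can be sketched without the gem installed. SketchRepository and FooRepository below are hypothetical stand-ins, not classes from right_scraper, and carrying the tag in the URI fragment is an assumption made for illustration only; the real subclasses may characterize a checkout differently.

```ruby
require 'uri'

# Hypothetical stand-in for RightScraper::Repositories::Base, reduced to the
# methods named above so that the sketch runs without the gem.
class SketchRepository
  attr_accessor :url

  def repo_type
    raise NotImplementedError
  end

  def retriever(options)
    raise NotImplementedError
  end

  def revision
    nil
  end

  def to_url
    URI.parse(url)
  end
end

# Hypothetical repository type that pins a tag.
class FooRepository < SketchRepository
  attr_accessor :tag

  def repo_type
    'foo'
  end

  def revision
    tag
  end

  # Placeholder only: a real subclass would return a retriever object here.
  def retriever(options)
    { :repository => self, :options => options }
  end

  # Assumption: the tag is carried in the URI fragment so that the URI
  # describes the checkout as well as the location.
  def to_url
    uri = URI.parse(url)
    uri.fragment = tag if tag
    uri
  end
end

foo = FooRepository.new
foo.url = 'foo://example.com/project'
foo.tag = 'v1'
puts foo.repo_type
puts foo.to_url
```

Calling #repo_type or #retriever on the stand-in base raises NotImplementedError, which mirrors the base class behavior shown in the method details.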

Direct Known Subclasses

Download, Git, Svn

Defined Under Namespace

Modules: PATTERN
Classes: RepositoryError

Methods inherited from RightScraper::RegisteredBase

query_registered_type, register_class, register_self, registered_types

Instance Attribute Details

#display_name ⇒ Object

(String) Human-readable repository name used for progress reports



# File 'lib/right_scraper/repositories/base.rb', line 110

def display_name
  @display_name
end

#resources_path ⇒ Object

(Array of String) Subdirectories in the repository to search for resources



# File 'lib/right_scraper/repositories/base.rb', line 113

def resources_path
  @resources_path
end

#url ⇒ Object

(String) URL to repository (e.g. ‘git://github.com/rightscale/right_scraper.git’)



# File 'lib/right_scraper/repositories/base.rb', line 116

def url
  @url
end

Class Method Details

.from_hash(repo_hash) ⇒ RightScraper::Repositories::Base

Factory method for a new repository.

Parameters:

  • repo_hash (Hash)

    describing repository to create

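Returns:

  • (RightScraper::Repositories::Base)

    repository instance populated from repo_hash

Raises:

  • (::ArgumentError)

    when :repo_type is missing or empty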


# File 'lib/right_scraper/repositories/base.rb', line 92

def self.from_hash(repo_hash)
  repo_type = repo_hash[:repo_type].to_s
  raise ::ArgumentError, ':repo_type is required' if repo_type.empty?
  repo_class = query_registered_type(repo_type)
  repo = repo_class.new
  validate_uri(repo_hash[:url]) unless ENV['VALIDATE_URI'].to_s == 'false'
  repo_hash.each do |k, v|
    k = k.to_sym
    next if k == :repo_type
    if [:first_credential, :second_credential].include?(k) && is_useful?(v)
      v = useful_part(v)
    end
    repo.__send__("#{k.to_s}=".to_sym, v)
  end
  repo
end
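
The dispatch performed by from_hash can be sketched in isolation. FactorySketch, GitSketch and the TYPES hash below are hypothetical stand-ins for the registry that RegisteredBase maintains, and the sketch leaves out URI validation and credential handling.

```ruby
# Hypothetical stand-in for the registered base class.
class FactorySketch
  TYPES = {} # assumption: a plain Hash keyed by type name

  attr_accessor :url, :display_name

  def self.register(type)
    TYPES[type.to_s] = self
  end

  def self.from_hash(repo_hash)
    repo_type = repo_hash[:repo_type].to_s
    raise ::ArgumentError, ':repo_type is required' if repo_type.empty?
    repo_class = TYPES.fetch(repo_type) do
      raise ::ArgumentError, "unknown repo type: #{repo_type}"
    end
    repo = repo_class.new
    # Every remaining key is applied through the matching attribute writer.
    repo_hash.each do |key, value|
      key = key.to_sym
      next if key == :repo_type
      repo.__send__("#{key}=", value)
    end
    repo
  end
end

# Hypothetical repository type with one extra attribute.
class GitSketch < FactorySketch
  attr_accessor :tag
  register('git')
end

repo = FactorySketch.from_hash(
  :repo_type => :git,
  'url'      => 'git://github.com/rightscale/right_scraper.git',
  :tag       => 'master'
)
puts repo.class
puts repo.url
```

Two details carry over from the listing above: the type is read with the symbol key :repo_type, while the remaining keys may be strings or symbols because they are converted with to_sym; and a key without a matching writer on the target class reaches __send__ unchanged, so it would normally surface as a NoMethodError.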

.register_url_schemas(*args) ⇒ TrueClass

Registers any unknown URL schemas for validation.

Parameters:

  • args (Array)

    to register as URL schema(s)

Returns:

  • (TrueClass)

    always true



# File 'lib/right_scraper/repositories/base.rb', line 77

def self.register_url_schemas(*args)
  # note that set += blah seems to be badly implemented as set = set + blah
  # for the Set class, which leaves the original set object unchanged and
  # will return a new set object with the new data. only use the << operator
  # to update an existing set object.
  schemas = registered_url_schemas
  Array(args).flatten.each { |schema| schemas << schema }
  true
end
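
The comment in the listing above concerns a general Ruby rule rather than a quirk of Set: `a += b` always expands to `a = a + b`, so it rebinds the variable to a new object and leaves the original untouched. A minimal sketch, with a hypothetical helper name, shows the difference for a shared Set:

```ruby
require 'set'

# Reports what happens to a shared Set under << versus +=.
def set_update_demo
  shared = Set.new(['http', 'https', 'ftp'])

  same_object = shared
  same_object << 'git'   # mutates the Set that `shared` also refers to

  rebound = shared
  rebound += ['svn']     # builds a new Set and rebinds only `rebound`

  {
    :shared_has_git    => shared.include?('git'),   # true
    :shared_has_svn    => shared.include?('svn'),   # false
    :rebound_is_shared => rebound.equal?(shared)    # false
  }
end

p set_update_demo
```

Because the schema set is stored once and handed out by reference, only an in-place update such as `<<` is visible to every caller.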

.registered_url_schemas ⇒ Set

Returns set of registered repo url schemas.

Returns:

  • (Set)

    set of registered repo url schemas



# File 'lib/right_scraper/repositories/base.rb', line 64

def self.registered_url_schemas
  unless schemas = registration_module.instance_variable_get(:@registered_url_schemas)
    schemas = ::Set.new(['http', 'https', 'ftp'])
    registration_module.instance_variable_set(:@registered_url_schemas, schemas)
  end
  schemas
end
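
Storing the set on the registration module, rather than on each class, means the base class and all subclasses see one shared collection. The sketch below reproduces that pattern with hypothetical stand-in names (SchemaRegistrySketch, SchemaBaseSketch, SchemaGitSketch) so that it runs without the gem.

```ruby
require 'set'

# Hypothetical stand-in for ::RightScraper::Repositories.
module SchemaRegistrySketch; end

class SchemaBaseSketch
  def self.registration_module
    SchemaRegistrySketch
  end

  # Lazily creates the set on first use and stores it on the module.
  def self.registered_url_schemas
    schemas = registration_module.instance_variable_get(:@registered_url_schemas)
    unless schemas
      schemas = ::Set.new(['http', 'https', 'ftp'])
      registration_module.instance_variable_set(:@registered_url_schemas, schemas)
    end
    schemas
  end

  def self.register_url_schemas(*args)
    schemas = registered_url_schemas
    Array(args).flatten.each { |schema| schemas << schema }
    true
  end
end

class SchemaGitSketch < SchemaBaseSketch
  register_url_schemas('git', ['ssh'])
end

# A registration made in the subclass is visible through the base class.
p SchemaBaseSketch.registered_url_schemas.include?('git')
p SchemaGitSketch.registered_url_schemas.equal?(SchemaBaseSketch.registered_url_schemas)
```

Both lines print true: the subclass registration mutates the single Set held by the module, and every class resolves to that same object.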

.registration_module ⇒ Module

Returns module for registered repository types.

Returns:

  • (Module)

    module for registered repository types



# File 'lib/right_scraper/repositories/base.rb', line 59

def self.registration_module
  ::RightScraper::Repositories
end

Instance Method Details

#==(other) ⇒ Object

Return true if this repository and other represent the same repository including the same checkout tag.

Parameters:

  • other (Repositories::Base)

    repository to compare with

Returns:

  • (Boolean)

    true iff this repository and other are the same, including the checkout tag



# File 'lib/right_scraper/repositories/base.rb', line 190

def ==(other)
  if other.is_a?(RightScraper::Repositories::Base)
    checkout_hash == other.checkout_hash
  else
    false
  end
end

#checkout_hash ⇒ Object

Return a unique identifier for this revision in this repository.

Returns:

  • (String)

    opaque unique ID for this revision in this repository



# File 'lib/right_scraper/repositories/base.rb', line 161

def checkout_hash
  repository_hash
end

#equal_repo?(other) ⇒ Boolean

Return true if this repository and other represent the same repository, excluding the checkout tag.

Parameters:

  • other (Repositories::Base)

    repository to compare with

Returns:

  • (Boolean)

    true iff this repository and other are the same, ignoring the checkout tag


# File 'lib/right_scraper/repositories/base.rb', line 206

def equal_repo?(other)
  if other.is_a?(RightScraper::Repositories::Base)
    repository_hash == other.repository_hash
  else
    false
  end
end

#repo_type ⇒ Object

(String) Type of the repository. Currently one of ‘git’, ‘svn’ or ‘download’, implemented by the appropriate subclass. Needs to be overridden by subclasses.

Raises:

  • (NotImplementedError)


# File 'lib/right_scraper/repositories/base.rb', line 121

def repo_type
  raise NotImplementedError
end

#repository_hash ⇒ Object

Return a unique identifier for this repository ignoring the tags to check out.

Returns:

  • (String)

    opaque unique ID for this repository



# File 'lib/right_scraper/repositories/base.rb', line 153

def repository_hash
  digest("#{::RightScraper::PROTOCOL_VERSION}\000#{repo_type}\000#{url}")
end
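
The two identity levels used by #== and #equal_repo? can be sketched as follows. The digest helper is not shown on this page, so SHA-1 is an assumption here, as are SKETCH_PROTOCOL_VERSION and the idea that a subclass would append its tag to the digest input; all three method names are hypothetical.

```ruby
require 'digest/sha1'

SKETCH_PROTOCOL_VERSION = 1 # hypothetical value

# Assumption: the real digest helper may use a different algorithm.
def sketch_digest(string)
  Digest::SHA1.hexdigest(string)
end

# Fields are joined with NUL bytes so that adjacent values cannot run together.
def sketch_repository_hash(repo_type, url)
  sketch_digest("#{SKETCH_PROTOCOL_VERSION}\000#{repo_type}\000#{url}")
end

# Assumption: a subclass with a tag adds it as one more NUL-separated field.
def sketch_checkout_hash(repo_type, url, tag)
  sketch_digest("#{SKETCH_PROTOCOL_VERSION}\000#{repo_type}\000#{url}\000#{tag}")
end

url = 'git://github.com/rightscale/right_scraper.git'

# Same repository, different tags: equal at the repository level only.
puts sketch_repository_hash('git', url) == sketch_repository_hash('git', url)
puts sketch_checkout_hash('git', url, 'master') == sketch_checkout_hash('git', url, 'v1')
```

In the base class shown on this page, checkout_hash simply returns repository_hash, so the two comparisons only diverge once a subclass overrides checkout_hash.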

#retriever(options) ⇒ Object

(RightScraper::Retrievers::Base class) Appropriate class for retrieving this sort of repository. Needs to be overridden appropriately by subclasses.

Options:

  • :max_bytes

    Maximum number of bytes to read

  • :max_seconds

    Maximum number of seconds to spend reading

  • :basedir

    Destination directory; a temporary directory is used if not specified

  • :logger

    Logger to use

Returns:

  • (Retrievers::Base)

    Corresponding retriever instance

Raises:

  • (NotImplementedError)


# File 'lib/right_scraper/repositories/base.rb', line 136

def retriever(options)
  raise NotImplementedError
end

#revision ⇒ Object

Return the revision this repository is currently looking at.

Returns:

  • (String)

    opaque revision type



# File 'lib/right_scraper/repositories/base.rb', line 144

def revision
  nil
end

#to_s ⇒ Object

Unique representation for this repo. It should resolve to the same string for repos that should be cloned into the same directory.

Returns:

  • (String)

    unique representation for this repo



# File 'lib/right_scraper/repositories/base.rb', line 170

def to_s
  res = "#{repo_type} #{url}"
end

#to_url ⇒ Object

Convert this repository to a URL in the style of resource URLs.

Returns:

  • (URI)

    URL representing this repository



# File 'lib/right_scraper/repositories/base.rb', line 178

def to_url
  URI.parse(url)
end
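
Since the base implementation delegates to URI.parse from Ruby's standard library, the result for a typical repository URL can be inspected directly. The helper name below is hypothetical and exists only to make the sketch self-contained.

```ruby
require 'uri'

# Breaks a repository URL into the parts URI.parse exposes.
def describe_repo_url(url)
  uri = URI.parse(url)
  { :scheme => uri.scheme, :host => uri.host, :path => uri.path }
end

p describe_repo_url('git://github.com/rightscale/right_scraper.git')
```

For this input the scheme is "git", the host is "github.com" and the path is "/rightscale/right_scraper.git". Schemes without a dedicated URI class, such as git, parse to a generic URI object.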