Class: DataKitten::Dataset

Inherits:
Object
Includes:
Hosts, Origins, PublishingFormats
Defined in:
lib/data_kitten/dataset.rb

Overview

Represents a single dataset from some origin (see dcat:Dataset for relevant vocabulary).

Designed to be created with a URI to the dataset, and then to work out metadata from there.

Currently supports datasets hosted in Git repositories (optionally on GitHub) that use the Datapackage metadata format.

Examples:

Load a Dataset from a git repository

dataset = Dataset.new(access_url: 'git://github.com/theodi/dataset-metadata-survey.git')
dataset.supported?         # => true
dataset.origin             # => :git
dataset.host               # => :github
dataset.publishing_format  # => :datapackage

Constructor Details

#initialize(options) ⇒ Dataset

Create a new Dataset object

Parameters:

  • options (Hash)

    the details of the Dataset.

Options Hash (options):

  • :access_url (String)

    A URL that can be used to access the Dataset. The class will attempt to auto-load metadata from this URL.



# File 'lib/data_kitten/dataset.rb', line 38

def initialize(options)
  @access_url = DataKitten::Fetcher.wrap(options[:access_url])
  detect_origin
  detect_host
  detect_publishing_format
end
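
The constructor runs three detection steps in sequence, but this page does not show how they work. The sketch below assumes, purely for illustration, that each step extends the instance with the first module that claims the URL, so that the module's method overrides the nil-returning default. Every name in it (SketchDataset, GitOrigin, GitHubHost, detect) is hypothetical and not part of DataKitten.

```ruby
# Hypothetical stand-ins; none of these names come from DataKitten.
module GitOrigin
  # Claims URLs that use the git scheme or end in ".git".
  def self.supported?(url)
    url.start_with?('git://') || url.end_with?('.git')
  end

  def origin
    :git
  end
end

module GitHubHost
  # Claims URLs that mention github.com.
  def self.supported?(url)
    url.include?('github.com')
  end

  def host
    :github
  end
end

class SketchDataset
  attr_reader :access_url

  def initialize(options)
    @access_url = options[:access_url].to_s
    detect([GitOrigin])
    detect([GitHubHost])
  end

  # Defaults, mirroring the nil-returning defaults of Dataset#origin and #host.
  def origin
    nil
  end

  def host
    nil
  end

  private

  # Extend this instance with the first module that supports the URL.
  def detect(candidates)
    mod = candidates.find { |m| m.supported?(access_url) }
    extend(mod) if mod
  end
end

git = SketchDataset.new(access_url: 'git://github.com/theodi/dataset-metadata-survey.git')
other = SketchDataset.new(access_url: 'http://example.com/data.csv')

git.origin    # => :git
git.host      # => :github
other.origin  # => nil
```

Because `extend` places the module ahead of the class in method lookup, an unrecognised URL simply leaves the defaults in place.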

Instance Attribute Details

#access_url ⇒ String

Returns the URL that gives access to the dataset.

Returns:

  • (String)

    the URL that gives access to the dataset



# File 'lib/data_kitten/dataset.rb', line 30

def access_url
  @access_url
end

Instance Method Details

#change_history ⇒ Array

A history of changes to the Dataset

Returns:

  • (Array)

    An array of changes. Exact format depends on the origin and publishing format.



# File 'lib/data_kitten/dataset.rb', line 228

def change_history
  []
end

#contributor_agreement_url ⇒ String

The URL of the contributor license agreement

Returns:

  • (String)

    A URL for the agreement that contributors accept.



# File 'lib/data_kitten/dataset.rb', line 205

def contributor_agreement_url
  nil
end

#contributors ⇒ Array<Agent>

A list of contributors

Returns:

  • (Array<Agent>)

    An array of contributors to the dataset, each as an Agent object.



# File 'lib/data_kitten/dataset.rb', line 191

def contributors
  []
end

#crowdsourced? ⇒ Boolean

Has the data been crowdsourced?

Returns:

  • (Boolean)

    Whether the data has been crowdsourced or not.



# File 'lib/data_kitten/dataset.rb', line 198

def crowdsourced?
  false
end

#data_title ⇒ String

The human-readable title of the dataset.

Returns:

  • (String)

    the title of the dataset.



# File 'lib/data_kitten/dataset.rb', line 82

def data_title
  nil
end

#description ⇒ String

A brief description of the dataset

Returns:

  • (String)

    the description of the dataset.



# File 'lib/data_kitten/dataset.rb', line 89

def description
  nil
end

#distributions ⇒ Array<Distribution> Also known as: files, resources

A list of distributions. Has aliases for popular alternative vocabularies.

Returns:

  • (Array<Distribution>)

    An array of Distribution objects.



# File 'lib/data_kitten/dataset.rb', line 212

def distributions
  []
end
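
The aliases let callers read the same list under any of the three names. A minimal sketch of how such aliases behave in Ruby, using a hypothetical AliasSketch class rather than DataKitten itself:

```ruby
# Hypothetical class for illustration; not part of DataKitten.
class AliasSketch
  def initialize(distributions)
    @distributions = distributions
  end

  def distributions
    @distributions
  end

  # Expose the same method under two alternative names.
  alias_method :files, :distributions
  alias_method :resources, :distributions
end

sketch = AliasSketch.new(['data.csv'])
sketch.files      # => ["data.csv"]
sketch.resources  # => ["data.csv"]
```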

#documentation_url ⇒ String

Human-readable documentation for the dataset.

Returns:

  • (String)

    the URL of the documentation.



# File 'lib/data_kitten/dataset.rb', line 103

def documentation_url
  nil
end

#host ⇒ Symbol

Where the dataset is hosted.

Returns:

  • (Symbol)

    The host. For instance, data loaded from GitHub repositories returns :github. This can be used to control extra host-specific behaviour if required. Returns nil if no host type is identified.



# File 'lib/data_kitten/dataset.rb', line 75

def host
  nil
end
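
One way a caller might branch on the host symbol is sketched below. HostSketch and host_label are hypothetical names used only for this example; the stand-in struct exposes nothing but #host.

```ruby
# Hypothetical stand-in exposing only #host; not the real Dataset class.
HostSketch = Struct.new(:host)

# Pick a label based on the host symbol, with a fallback for nil.
def host_label(dataset)
  case dataset.host
  when :github then 'Hosted on GitHub'
  when nil     then 'Unknown host'
  else              "Hosted on #{dataset.host}"
  end
end

host_label(HostSketch.new(:github))  # => "Hosted on GitHub"
host_label(HostSketch.new(nil))      # => "Unknown host"
```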

#issued ⇒ Date Also known as: release_date

Date the dataset was released

Returns:

  • (Date)

    the release date of the dataset



# File 'lib/data_kitten/dataset.rb', line 118

def issued
  nil
end

#keywords ⇒ Array<String>

Keywords for the dataset

Returns:

  • (Array<String>)

    an array of keywords



# File 'lib/data_kitten/dataset.rb', line 96

def keywords
  []
end

#licenses ⇒ Array<License>

A list of licenses

Returns:

  • (Array<License>)

    An array of licenses, each as a License object.



# File 'lib/data_kitten/dataset.rb', line 177

def licenses
  []
end

#maintainers ⇒ Array<Agent>

A list of maintainers

Returns:

  • (Array<Agent>)

    An array of maintainers, each as an Agent object.



# File 'lib/data_kitten/dataset.rb', line 163

def maintainers
  []
end

#modified ⇒ Date

Date the dataset was last modified

Returns:

  • (Date)

    the dataset’s last modified date



# File 'lib/data_kitten/dataset.rb', line 126

def modified
  nil
end

#origin ⇒ Symbol

The origin type of the dataset.

Returns:

  • (Symbol)

    The origin type. For instance, datasets loaded from git repositories return :git. Returns nil if no origin type is identified.



# File 'lib/data_kitten/dataset.rb', line 66

def origin
  nil
end

#publishers ⇒ Array<Agent>

A list of publishers

Returns:

  • (Array<Agent>)

    An array of publishers, each as an Agent object.



# File 'lib/data_kitten/dataset.rb', line 170

def publishers
  []
end

#publishing_format ⇒ Symbol

The publishing format for the dataset.

Returns:

  • (Symbol)

    The format. For instance, datasets that publish metadata in Datapackage format return :datapackage. Returns nil if no format is identified.



# File 'lib/data_kitten/dataset.rb', line 156

def publishing_format
  nil
end

#release_type ⇒ Symbol

What type of dataset is this? Options are: :web_service for API-accessible data, or :one_off for downloadable data dumps.

Returns:

  • (Symbol)

    the release type.



# File 'lib/data_kitten/dataset.rb', line 111

def release_type
  false
end

#rights ⇒ Object<Rights>

The rights statement for the data

Returns:

  • (Object<Rights>)

    How the content and data can be used, together with the copyright notice and attribution URL.



# File 'lib/data_kitten/dataset.rb', line 184

def rights
  nil
end

#sources ⇒ Array<Source>

Where the data is sourced from

Returns:

  • (Array<Source>)

    the sources of the data, each as a Source object.



# File 'lib/data_kitten/dataset.rb', line 140

def sources
  []
end

#supported? ⇒ Boolean

Can metadata be loaded for this Dataset?

Returns:

  • (Boolean)

    true if metadata can be loaded; false if the origin type or the metadata format is unknown.



# File 'lib/data_kitten/dataset.rb', line 57

def supported?
  !(origin.nil? || publishing_format.nil?)
end
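
The check depends only on origin and publishing_format; host plays no part. The sketch below reproduces the same expression on a hypothetical SupportSketch struct (not part of DataKitten) to show the outcome for each combination.

```ruby
# Hypothetical stand-in carrying only the two values the check reads.
SupportSketch = Struct.new(:origin, :publishing_format) do
  # Same expression as Dataset#supported? shown above.
  def supported?
    !(origin.nil? || publishing_format.nil?)
  end
end

SupportSketch.new(:git, :datapackage).supported?  # => true
SupportSketch.new(:git, nil).supported?           # => false
SupportSketch.new(nil, :datapackage).supported?   # => false
SupportSketch.new(nil, nil).supported?            # => false
```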

#temporal ⇒ Object<Temporal>

The temporal coverage of the dataset

Returns:

  • (Object<Temporal>)

    the start and end dates of the dataset’s temporal coverage



# File 'lib/data_kitten/dataset.rb', line 133

def temporal
  nil
end

#time_sensitive? ⇒ Boolean

Is the information time-sensitive?

Returns:

  • (Boolean)

    whether the information will go out of date.



# File 'lib/data_kitten/dataset.rb', line 147

def time_sensitive?
  false
end

#update_frequency ⇒ String

How frequently the data is updated.

Returns:

  • (String)

    The frequency of update expressed as a dct:Frequency.



# File 'lib/data_kitten/dataset.rb', line 221

def update_frequency
  nil
end

#uri ⇒ Object

Returns the access URL parsed into a URI object.



# File 'lib/data_kitten/dataset.rb', line 45

def uri
  URI(@access_url.to_s)
end

#url ⇒ Object

Returns the access URL as a string.



# File 'lib/data_kitten/dataset.rb', line 49

def url
  @access_url.to_s
end
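
Both #uri and #url derive from the same access URL string. The sketch below uses only Ruby's standard URI library on a plain string (no DataKitten objects) to show what parsing the example repository URL yields.

```ruby
require 'uri'

# The example access URL from the class overview, as a plain string.
access_url = 'git://github.com/theodi/dataset-metadata-survey.git'

# Kernel#URI parses the string into a URI object, as #uri does.
parsed = URI(access_url)

parsed.scheme  # => "git"
parsed.host    # => "github.com"
parsed.path    # => "/theodi/dataset-metadata-survey.git"

# Converting back to a string gives the original URL, as #url returns.
parsed.to_s == access_url  # => true
```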