Class: DataKitten::Dataset

Inherits:
Object
Includes:
Hosts, Origins, PublishingFormats
Defined in:
lib/data_kitten/dataset.rb

Overview

Represents a single dataset from some origin (see dcat:Dataset for relevant vocabulary).

Designed to be created with a URI pointing to the dataset, from which it then works out the metadata.

Currently supports Datasets that are hosted in Git (optionally on GitHub) and that use the Datapackage metadata format.

Examples:

Load a Dataset from a git repository

dataset = DataKitten::Dataset.new(access_url: 'git://github.com/theodi/dataset-metadata-survey.git')
dataset.supported?         # => true
dataset.origin             # => :git
dataset.host               # => :github
dataset.publishing_format  # => :datapackage

Constructor Details

#initialize(options) ⇒ Dataset

Create a new Dataset object

Parameters:

  • options (Hash)

    the details of the Dataset.

Options Hash (options):

  • :access_url (String)

    A URL that can be used to access the Dataset. The class will attempt to auto-load metadata from this URL.



# File 'lib/data_kitten/dataset.rb', line 38

def initialize(options)
  @access_url = DataKitten::Fetcher.wrap(options[:access_url])
  detect_origin
  detect_host
  detect_publishing_format
end
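The constructor wraps the access URL and then runs three detection steps. The detection code itself is not shown on this page. What follows is a minimal sketch of one way such a step can work. It assumes that each candidate module exposes a module-level `supported?` check and is mixed into the instance with `extend`, so that its methods take precedence over the class defaults documented below. `GitOrigin` and `DetectingDataset` are hypothetical names, not part of DataKitten's API.

```ruby
# Hypothetical origin module: overrides the default #origin once mixed in.
module GitOrigin
  # Hypothetical module-level check: does this module handle the URL?
  def self.supported?(url)
    url.end_with?(".git")
  end

  def origin
    :git
  end
end

# Hypothetical stand-in for Dataset, showing a detect-then-extend pattern.
class DetectingDataset
  ORIGINS = [GitOrigin].freeze

  def initialize(access_url)
    @access_url = access_url
    detect_origin
  end

  # Default used when no origin module matches the URL.
  def origin
    nil
  end

  private

  # Mix the first matching module into this one instance only.
  def detect_origin
    mod = ORIGINS.find { |m| m.supported?(@access_url) }
    extend(mod) if mod
  end
end

DetectingDataset.new("git://github.com/theodi/dataset-metadata-survey.git").origin # => :git
DetectingDataset.new("http://example.com/data.csv").origin                           # => nil
```

Because `extend` changes only the singleton class of a single object, datasets with different origins can coexist without affecting one another.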

Instance Attribute Details

#access_url ⇒ String

Returns the URL that gives access to the dataset.

Returns:

  • (String)

    the URL that gives access to the dataset



# File 'lib/data_kitten/dataset.rb', line 30

def access_url
  @access_url
end

Instance Method Details

#change_history ⇒ Array

A history of changes to the Dataset

Returns:

  • (Array)

    An array of changes. Exact format depends on the origin and publishing format.



# File 'lib/data_kitten/dataset.rb', line 261

def change_history
  []
end

#contributor_agreement_url ⇒ String

The URL of the contributor license agreement

Returns:

  • (String)

    A URL for the agreement that contributors accept.



# File 'lib/data_kitten/dataset.rb', line 238

def contributor_agreement_url
  nil
end

#contributors ⇒ Array<Agent>

A list of contributors

Returns:

  • (Array<Agent>)

    An array of contributors to the dataset, each as an Agent object.



# File 'lib/data_kitten/dataset.rb', line 210

def contributors
  []
end

#crowdsourced? ⇒ Boolean

Has the data been crowdsourced?

Returns:

  • (Boolean)

    Whether the data has been crowdsourced or not.



# File 'lib/data_kitten/dataset.rb', line 231

def crowdsourced?
  false
end

#data_title ⇒ String

The human-readable title of the dataset.

Returns:

  • (String)

    the title of the dataset.



# File 'lib/data_kitten/dataset.rb', line 94

def data_title
  nil
end

#description ⇒ String

A brief description of the dataset

Returns:

  • (String)

    the description of the dataset.



# File 'lib/data_kitten/dataset.rb', line 101

def description
  nil
end

#distributions ⇒ Array<Distribution> Also known as: files, resources

A list of distributions. Aliased as #files and #resources to match popular alternative vocabularies.

Returns:

  • (Array<Distribution>)

    An array of Distribution objects.



# File 'lib/data_kitten/dataset.rb', line 245

def distributions
  []
end
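YARD lists #files and #resources as aliases of #distributions. Below is a minimal sketch of how such aliases behave in plain Ruby. It uses a hypothetical `AliasedDataset` class, with file-name strings standing in for Distribution objects.

```ruby
# Hypothetical stand-in showing alias names for #distributions.
class AliasedDataset
  def distributions
    ["data.csv", "data.json"]
  end

  # alias_method copies the current method body under a new name.
  alias_method :files, :distributions
  alias_method :resources, :distributions
end

ds = AliasedDataset.new
ds.files     # => ["data.csv", "data.json"]
ds.resources # => ["data.csv", "data.json"]
```

One design consequence is worth knowing. `alias_method` captures the method body that exists at the moment of aliasing. If a module mixed in later with `extend` overrides `distributions`, `files` keeps returning the old result unless the aliases are redefined too. A delegating method such as `def files; distributions; end` avoids that problem.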

#documentation_url ⇒ String

Human-readable documentation for the dataset.

Returns:

  • (String)

    the URL of the documentation.



# File 'lib/data_kitten/dataset.rb', line 115

def documentation_url
  nil
end

#host ⇒ Symbol

Where the dataset is hosted.

Returns:

  • (Symbol)

    The host. For instance, data loaded from GitHub repositories will return :github. This can be used to control extra host-specific behaviour if required. If no host type is identified, will return nil.



# File 'lib/data_kitten/dataset.rb', line 79

def host
  nil
end
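The host is documented as a Symbol such as :github, or nil when no host is identified. Below is a minimal sketch of deriving such a symbol from the access URL's hostname with Ruby's standard `URI` library. The `HOSTS` table and `detect_host_symbol` helper are hypothetical; this is not DataKitten's own detection logic.

```ruby
require "uri"

# Hypothetical mapping from hostnames to host symbols.
HOSTS = { "github.com" => :github }.freeze

# Hypothetical helper: returns a host symbol for the URL, or nil when the
# host is missing or unknown (mirroring the documented nil fallback).
def detect_host_symbol(access_url)
  hostname = URI(access_url).host
  hostname && HOSTS[hostname.downcase]
end

detect_host_symbol("git://github.com/theodi/dataset-metadata-survey.git") # => :github
detect_host_symbol("http://example.com/data.csv")                          # => nil
```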

#identifier ⇒ String

A unique identifier of the dataset.

Returns:

  • (String)

    the identifier of the dataset



# File 'lib/data_kitten/dataset.rb', line 87

def identifier
  nil
end

#issued ⇒ Date Also known as: release_date

Date the dataset was released

Returns:

  • (Date)

    the release date of the dataset



# File 'lib/data_kitten/dataset.rb', line 130

def issued
  nil
end

#keywords ⇒ Array<String>

Keywords for the dataset

Returns:

  • (Array<String>)

    an array of keywords



# File 'lib/data_kitten/dataset.rb', line 108

def keywords
  []
end

#landing_page ⇒ String

A web page that can be used to gain access to the dataset, its distributions and/or additional information.

Returns:

  • (String)

    The URL to the dataset



# File 'lib/data_kitten/dataset.rb', line 145

def landing_page
  nil
end

#language ⇒ String

The language of the dataset.

Returns:

  • (String)

    the language of the dataset



# File 'lib/data_kitten/dataset.rb', line 217

def language
  nil
end

#licenses ⇒ Array<License>

A list of licenses

Returns:

  • (Array<License>)

    An array of licenses, each as a License object.



# File 'lib/data_kitten/dataset.rb', line 196

def licenses
  []
end

#maintainers ⇒ Array<Agent>

A list of maintainers

Returns:

  • (Array<Agent>)

    An array of maintainers, each as an Agent object.



# File 'lib/data_kitten/dataset.rb', line 182

def maintainers
  []
end

#modified ⇒ Date

Date the dataset was last modified

Returns:

  • (Date)

    the dataset’s last modified date



# File 'lib/data_kitten/dataset.rb', line 138

def modified
  nil
end

#origin ⇒ Symbol

The origin type of the dataset.

Returns:

  • (Symbol)

    The origin type. For instance, datasets loaded from git repositories will return :git. If no origin type is identified, will return nil.



# File 'lib/data_kitten/dataset.rb', line 70

def origin
  nil
end

#publishers ⇒ Array<Agent>

A list of publishers

Returns:

  • (Array<Agent>)

    An array of publishers, each as an Agent object.



# File 'lib/data_kitten/dataset.rb', line 189

def publishers
  []
end

#publishing_format ⇒ Symbol

The publishing format for the dataset.

Returns:

  • (Symbol)

    The format. For instance, datasets that publish metadata in Datapackage format will return :datapackage. If no format is identified, will return nil.



# File 'lib/data_kitten/dataset.rb', line 175

def publishing_format
  nil
end

#release_type ⇒ Symbol

What type of dataset is this? Options are :web_service for API-accessible data, or :one_off for downloadable data dumps.

Returns:

  • (Symbol)

    the release type. Note that the default implementation returns false rather than a Symbol.



# File 'lib/data_kitten/dataset.rb', line 123

def release_type
  false
end
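The documented release types are :web_service and :one_off, but the default implementation above returns false. Below is a small sketch of how a caller might normalize the value before using it. `describe_release` is a hypothetical helper, not part of DataKitten.

```ruby
# Hypothetical helper: maps a #release_type value to a label, treating the
# default false (or nil, or anything else) as an unknown release type.
def describe_release(release_type)
  case release_type
  when :web_service then "API-accessible data"
  when :one_off     then "downloadable data dump"
  else "unknown release type"
  end
end

describe_release(:web_service) # => "API-accessible data"
describe_release(false)        # => "unknown release type"
```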

#rights ⇒ Object<Rights>

The rights statement for the data

Returns:

  • (Object<Rights>)

    How the content and data can be used, along with the copyright notice and attribution URL.



# File 'lib/data_kitten/dataset.rb', line 203

def rights
  nil
end

#source ⇒ Object



# File 'lib/data_kitten/dataset.rb', line 53

def source
  @access_url.as_json if @access_url.ok?
end
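#source returns the wrapped access URL's JSON representation only when the fetch succeeded. Otherwise it returns nil, because a trailing `if` modifier evaluates to nil when its condition is false. Below is a sketch using a hypothetical `StubFetcher` in place of the object returned by `DataKitten::Fetcher.wrap`. Only the `ok?` and `as_json` method names are taken from the source above.

```ruby
# Hypothetical stand-in for the wrapped access URL.
StubFetcher = Struct.new(:ok, :body) do
  def ok?
    ok
  end

  def as_json
    body
  end
end

# Same shape as Dataset#source: the modifier-if yields nil when ok? is false.
def source_for(access_url)
  access_url.as_json if access_url.ok?
end

source_for(StubFetcher.new(true, { "name" => "survey" }))  # returns the hash
source_for(StubFetcher.new(false, { "name" => "survey" })) # => nil
```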

#sources ⇒ Array<Source>

Where the data is sourced from

Returns:

  • (Array<Source>)

    the sources of the data, each as a Source object.



# File 'lib/data_kitten/dataset.rb', line 159

def sources
  []
end

#spatial ⇒ GeoJSON Geometry

Spatial coverage of the dataset

Returns:

  • (GeoJSON Geometry)

    A GeoJSON geometry object of the spatial coverage



# File 'lib/data_kitten/dataset.rb', line 268

def spatial
  nil
end
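A GeoJSON geometry (RFC 7946) is an object with a `type` and `coordinates`, and its positions are ordered longitude first, then latitude. Below is a sketch of reading such a value. It assumes that #spatial returns the geometry as a parsed Ruby Hash with string keys, which this page does not state. `bounding_box` is a hypothetical helper for Polygon geometries.

```ruby
# Assumed shape: a parsed GeoJSON Polygon with string keys.
# Each ring is closed (first position equals last).
spatial = {
  "type" => "Polygon",
  "coordinates" => [[[-0.5, 51.3], [0.3, 51.3], [0.3, 51.7], [-0.5, 51.7], [-0.5, 51.3]]]
}

# Hypothetical helper: returns [min_lon, min_lat, max_lon, max_lat]
# over the positions of every ring in a Polygon.
def bounding_box(geometry)
  raise ArgumentError, "expected a Polygon" unless geometry["type"] == "Polygon"

  positions = geometry["coordinates"].flatten(1) # rings -> list of [lon, lat]
  lons = positions.map(&:first)
  lats = positions.map { |pos| pos[1] }
  [lons.min, lats.min, lons.max, lats.max]
end

bounding_box(spatial) # => [-0.5, 51.3, 0.3, 51.7]
```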

#supported? ⇒ Boolean

Can metadata be loaded for this Dataset?

Returns:

  • (Boolean)

    true if metadata can be loaded; false if the origin type or the metadata format is unknown.



# File 'lib/data_kitten/dataset.rb', line 61

def supported?
  !(origin.nil? || publishing_format.nil?)
end
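#supported? is true only when both an origin and a publishing format were detected. The host plays no part, which matches GitHub hosting being optional. Below is the same predicate, extracted into a hypothetical standalone method so that its truth table can be checked without loading DataKitten.

```ruby
# Mirrors the body of Dataset#supported?: both values must be non-nil.
# Hypothetical standalone form taking the two values as arguments.
def supported?(origin, publishing_format)
  !(origin.nil? || publishing_format.nil?)
end

supported?(:git, :datapackage) # => true
supported?(:git, nil)          # => false
supported?(nil, :datapackage)  # => false
```

By De Morgan's law this is equivalent to `!origin.nil? && !publishing_format.nil?`.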

#temporal ⇒ Object<Temporal>

The temporal coverage of the dataset

Returns:

  • (Object<Temporal>)

    the start and end dates of the dataset’s temporal coverage



# File 'lib/data_kitten/dataset.rb', line 152

def temporal
  nil
end
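#temporal is documented as an object carrying the start and end dates of the dataset's coverage. Below is a sketch that checks whether a given date falls within the coverage. It assumes a hypothetical `Temporal` Struct with `start_date` and `end_date` members; DataKitten's actual class and member names are not shown on this page.

```ruby
require "date"

# Hypothetical stand-in for the Temporal object; either bound may be nil.
Temporal = Struct.new(:start_date, :end_date) do
  # True when date lies within the coverage; a nil bound is treated as open.
  def covers?(date)
    (start_date.nil? || date >= start_date) && (end_date.nil? || date <= end_date)
  end
end

coverage = Temporal.new(Date.new(2010, 1, 1), Date.new(2012, 12, 31))
coverage.covers?(Date.new(2011, 6, 15)) # => true
coverage.covers?(Date.new(2013, 1, 1))  # => false
```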

#theme ⇒ String

The main category the dataset belongs to.

Returns:

  • (String)


# File 'lib/data_kitten/dataset.rb', line 224

def theme
  nil
end

#time_sensitive? ⇒ Boolean

Is the information time-sensitive?

Returns:

  • (Boolean)

    whether the information will go out of date.



# File 'lib/data_kitten/dataset.rb', line 166

def time_sensitive?
  false
end

#update_frequency ⇒ String

How frequently the data is updated.

Returns:

  • (String)

    The frequency of update expressed as a dct:Frequency.



# File 'lib/data_kitten/dataset.rb', line 254

def update_frequency
  nil
end

#uri ⇒ Object



# File 'lib/data_kitten/dataset.rb', line 45

def uri
  URI(@access_url.to_s)
end
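#uri parses the string form of the access URL with Ruby's `Kernel#URI`, while #url returns that string unparsed. For the git URL from the overview example, standard-library parsing gives the components below. A plain string stands in for the wrapped fetcher object here.

```ruby
require "uri"

access_url = "git://github.com/theodi/dataset-metadata-survey.git"

uri = URI(access_url) # what Dataset#uri does with @access_url.to_s
uri.scheme             # => "git"
uri.host               # => "github.com"
uri.path               # => "/theodi/dataset-metadata-survey.git"
uri.to_s == access_url # => true
```

`URI()` raises `URI::InvalidURIError` for strings it cannot parse, so #uri can raise in cases where #url cannot.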

#url ⇒ Object



# File 'lib/data_kitten/dataset.rb', line 49

def url
  @access_url.to_s
end