Class: DataKitten::Dataset

Inherits:
Object
Includes:
Hosts, Origins, PublishingFormats
Defined in:
lib/data_kitten/dataset.rb

Overview

Represents a single dataset from some origin (see dcat:Dataset for relevant vocabulary).

Designed to be created with a URI to the dataset, and then to work out metadata from there.

Currently supports datasets hosted in Git (optionally on GitHub) that use the Datapackage metadata format.

Examples:

Load a Dataset from a git repository

dataset = Dataset.new('git://github.com/theodi/dataset-metadata-survey.git')
dataset.supported?         # => true
dataset.origin             # => :git
dataset.host               # => :github
dataset.publishing_format  # => :datapackage

Constructor Details

#new(url) ⇒ Dataset
#new(options) ⇒ Dataset

Create a new Dataset object

The class will attempt to auto-load metadata from this URL.

Overloads:

  • #new(url) ⇒ Dataset

    Parameters:

    • url (String)

      A URL that can be used to access the Dataset

  • #new(options) ⇒ Dataset

    Parameters:

    • options (Hash)

      the details of the Dataset.

    Options Hash (options):

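    • :access_url (String)

      A URL that can be used to access the Dataset.

    • :base_url (String)

      Optional. A base URL for the Dataset; when given, #base_uri returns it instead of the root of the access URL.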



# File 'lib/data_kitten/dataset.rb', line 43

def initialize(url_or_options, base_url=nil)
  url = case url_or_options
  when Hash
    base_url ||= url_or_options[:base_url]
    url_or_options[:access_url]
  else
    url_or_options
  end
  @access_url = DataKitten::Fetcher.wrap(url)
  @base_uri = URI(base_url) if base_url

  detect_origin
  detect_host
  detect_publishing_format
end
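
The two call styles accepted by the constructor can be sketched without loading the library. This is a minimal illustration only: normalize_dataset_args is a hypothetical helper that copies the case expression above, the example.org URLs are placeholders, and the Fetcher wrapping and detection steps are left out.

```ruby
require "uri"

# Hypothetical helper, not part of DataKitten. It repeats the argument
# handling from Dataset#initialize so the two call styles can be compared.
def normalize_dataset_args(url_or_options, base_url = nil)
  url = case url_or_options
        when Hash
          # An explicit second argument wins over the :base_url option.
          base_url ||= url_or_options[:base_url]
          url_or_options[:access_url]
        else
          url_or_options
        end
  { access_url: url, base_uri: base_url && URI(base_url) }
end

# Called with a plain URL string.
from_string = normalize_dataset_args("http://example.org/data/datapackage.json")

# Called with an options Hash (placeholder URLs).
from_hash = normalize_dataset_args({
  access_url: "http://example.org/data/datapackage.json",
  base_url: "http://example.org/data/"
})
```

Both forms end up with the same access URL; only the Hash form (or a second positional argument) supplies a base URL.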

Instance Attribute Details

#access_url ⇒ String

Returns the URL that gives access to the dataset.

Returns:

  • (String)

    the URL that gives access to the dataset



# File 'lib/data_kitten/dataset.rb', line 30

def access_url
  @access_url
end

#identifier ⇒ String

A unique identifier of the dataset.

Returns:

  • (String)

    the identifier of the dataset



# File 'lib/data_kitten/dataset.rb', line 107

def identifier
  @identifier
end

#metadata ⇒ Object

Returns the value of attribute metadata.



# File 'lib/data_kitten/dataset.rb', line 290

def metadata
  @metadata
end

#source ⇒ Object



# File 'lib/data_kitten/dataset.rb', line 71

def source
  @source ||= @access_url.as_json if @access_url.ok?
end
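
The one-line body of #source combines memoization with a guard: the parsed JSON is cached in @source, and nil is returned whenever the access URL is not ok. The sketch below reproduces that pattern with stand-in classes. SketchFetcher and SketchDataset are hypothetical; only the method names ok? and as_json come from the source above, and their behaviour here is assumed.

```ruby
require "json"

# Hypothetical stand-in for the wrapped access URL.
class SketchFetcher
  attr_reader :calls

  def initialize(body, ok)
    @body = body
    @ok = ok
    @calls = 0
  end

  def ok?
    @ok
  end

  # Counts calls so the caching can be observed.
  def as_json
    @calls += 1
    JSON.parse(@body)
  end
end

# Hypothetical stand-in that keeps only the #source logic.
class SketchDataset
  def initialize(fetcher)
    @access_url = fetcher
  end

  # Parsed as: (@source ||= @access_url.as_json) if @access_url.ok?
  def source
    @source ||= @access_url.as_json if @access_url.ok?
  end
end

good = SketchFetcher.new('{"name": "survey"}', true)
dataset = SketchDataset.new(good)
first = dataset.source
second = dataset.source   # served from @source, no second parse

failed = SketchDataset.new(SketchFetcher.new("", false))
```

In this sketch the fetcher is asked for JSON only once, and a dataset whose fetcher reports a failure returns nil from source.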

Instance Method Details

#base_uri ⇒ Object



# File 'lib/data_kitten/dataset.rb', line 63

def base_uri
  @base_uri || uri.merge("/")
end
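
When no base URL was supplied, the fallback uri.merge("/") resolves "/" against the access URL, which yields the root of the same host. A small sketch using only Ruby's standard uri library shows both branches; base_uri_for is a hypothetical helper and the example.org URLs are placeholders.

```ruby
require "uri"

# Hypothetical helper mirroring `@base_uri || uri.merge("/")`.
def base_uri_for(access_url, base_url = nil)
  explicit = URI(base_url) if base_url
  explicit || URI(access_url).merge("/")
end

# No base URL given: falls back to the root of the access URL's host.
default_base = base_uri_for("http://example.org/data/survey/datapackage.json")

# Base URL given: it is returned unchanged.
explicit_base = base_uri_for("http://example.org/data/survey/datapackage.json",
                             "http://example.org/data/survey/")
```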

#change_history ⇒ Array

A history of changes to the Dataset

Returns:

  • (Array)

    An array of changes. Exact format depends on the origin and publishing format.



# File 'lib/data_kitten/dataset.rb', line 279

def change_history
  []
end

#contributor_agreement_url ⇒ String

The URL of the contributor license agreement

Returns:

  • (String)

    A URL for the agreement that contributors accept.



# File 'lib/data_kitten/dataset.rb', line 256

def contributor_agreement_url
  nil
end

#contributors ⇒ Array<Agent>

A list of contributors

Returns:

  • (Array<Agent>)

    An array of contributors to the dataset, each as an Agent object.



# File 'lib/data_kitten/dataset.rb', line 228

def contributors
  []
end

#crowdsourced? ⇒ Boolean

Has the data been crowdsourced?

Returns:

  • (Boolean)

    Whether the data has been crowdsourced or not.



# File 'lib/data_kitten/dataset.rb', line 249

def crowdsourced?
  false
end

#data_title ⇒ String

The human-readable title of the dataset.

Returns:

  • (String)

    the title of the dataset.



# File 'lib/data_kitten/dataset.rb', line 112

def data_title
  nil
end

#description ⇒ String

A brief description of the dataset

Returns:

  • (String)

    the description of the dataset.



# File 'lib/data_kitten/dataset.rb', line 119

def description
  nil
end

#distributions ⇒ Array<Distribution> Also known as: files, resources

A list of distributions. Has aliases for popular alternative vocabularies.

Returns:

  • (Array<Distribution>)

    An array of Distribution objects.



# File 'lib/data_kitten/dataset.rb', line 263

def distributions
  []
end

#documentation_url ⇒ String

Human-readable documentation for the dataset.

Returns:

  • (String)

    the URL of the documentation.



# File 'lib/data_kitten/dataset.rb', line 133

def documentation_url
  nil
end

#host ⇒ Symbol

Where the dataset is hosted.

Returns:

  • (Symbol)

    The host. For instance, data loaded from GitHub repositories will return :github. This can be used to control extra host-specific behaviour if required. If no host type is identified, returns nil.



# File 'lib/data_kitten/dataset.rb', line 99

def host
  nil
end
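
Because #host returns either a Symbol or nil, host-specific behaviour can be selected with a plain case expression. The sketch below assumes nothing beyond the :github and nil values mentioned above; describe_host is a hypothetical helper and the returned strings are illustrative.

```ruby
# Hypothetical helper: branches on the value returned by Dataset#host.
def describe_host(host)
  case host
  when :github
    "hosted on GitHub"
  when nil
    "host not identified"
  else
    # Any other symbol is reported as-is.
    "hosted on #{host}"
  end
end

puts describe_host(:github)
puts describe_host(nil)
```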

#issued ⇒ Date Also known as: release_date

Date the dataset was released

Returns:

  • (Date)

    the release date of the dataset



# File 'lib/data_kitten/dataset.rb', line 148

def issued
  nil
end

#keywords ⇒ Array<String>

Keywords for the dataset

Returns:

  • (Array<String>)

    an array of keywords



# File 'lib/data_kitten/dataset.rb', line 126

def keywords
  []
end

#landing_page ⇒ String

A web page that can be used to gain access to the dataset, its distributions and/or additional information.

Returns:

  • (String)

    The URL to the dataset



# File 'lib/data_kitten/dataset.rb', line 163

def landing_page
  nil
end

#language ⇒ String

The language of the dataset.

Returns:

  • (String)

    the language of the dataset



# File 'lib/data_kitten/dataset.rb', line 235

def language
  nil
end

#licenses ⇒ Array<License>

A list of licenses

Returns:

  • (Array<License>)

    An array of licenses, each as a License object.



# File 'lib/data_kitten/dataset.rb', line 214

def licenses
  []
end

#maintainers ⇒ Array<Agent>

A list of maintainers

Returns:

  • (Array<Agent>)

    An array of maintainers, each as an Agent object.



# File 'lib/data_kitten/dataset.rb', line 200

def maintainers
  []
end

#modified ⇒ Date

Date the dataset was last modified

Returns:

  • (Date)

    the dataset’s last modified date



# File 'lib/data_kitten/dataset.rb', line 156

def modified
  nil
end

#origin ⇒ Symbol

The origin type of the dataset.

Returns:

  • (Symbol)

    The origin type. For instance, datasets loaded from git repositories will return :git. If no origin type is identified, will return nil.



# File 'lib/data_kitten/dataset.rb', line 90

def origin
  nil
end

#publishers ⇒ Array<Agent>

A list of publishers

Returns:

  • (Array<Agent>)

    An array of publishers, each as an Agent object.



# File 'lib/data_kitten/dataset.rb', line 207

def publishers
  []
end

#publishing_format ⇒ Symbol

The publishing format for the dataset.

Returns:

  • (Symbol)

    The format. For instance, datasets that publish metadata in Datapackage format will return :datapackage. If no format is identified, will return nil.



# File 'lib/data_kitten/dataset.rb', line 193

def publishing_format
  nil
end

#release_type ⇒ Symbol

What type of dataset is this? Options are: :web_service for API-accessible data, or :one_off for downloadable data dumps.

Returns:

  • (Symbol)

    the release type.



# File 'lib/data_kitten/dataset.rb', line 141

def release_type
  false
end

#rights ⇒ Object<Rights>

The rights statement for the data

Returns:

  • (Object<Rights>)

    How the content and data can be used, as well as copyright notice and attribution URL



# File 'lib/data_kitten/dataset.rb', line 221

def rights
  nil
end

#sources ⇒ Array<Source>

Where the data is sourced from

Returns:

  • (Array<Source>)

    the sources of the data, each as a Source object.



# File 'lib/data_kitten/dataset.rb', line 177

def sources
  []
end

#spatial ⇒ GeoJSON Geometry

Spatial coverage of the dataset

Returns:

  • (GeoJSON Geometry)

    A GeoJSON geometry object of the spatial coverage



# File 'lib/data_kitten/dataset.rb', line 286

def spatial
  nil
end

#supported? ⇒ Boolean

Can metadata be loaded for this Dataset?

Returns:

  • (Boolean)

    true if metadata can be loaded, false if it’s an unknown origin type, or has an unknown metadata format.



# File 'lib/data_kitten/dataset.rb', line 81

def supported?
  !(origin.nil? || publishing_format.nil?)
end
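
Since the base #origin and #publishing_format both return nil, a dataset is only reported as supported once both have been overridden. The sketch below illustrates that with stand-in classes. How the library actually supplies the overrides is not shown in this reference; extending the instance with modules is an assumption made for the example, and every Sketch* name is hypothetical.

```ruby
# Stand-in for the base class: the defaults mirror the nil-returning
# #origin and #publishing_format documented in this reference.
class SketchDataset
  def origin
    nil
  end

  def publishing_format
    nil
  end

  def supported?
    !(origin.nil? || publishing_format.nil?)
  end
end

# Hypothetical modules standing in for whatever detection provides.
module SketchGitOrigin
  def origin
    :git
  end
end

module SketchDatapackageFormat
  def publishing_format
    :datapackage
  end
end

bare = SketchDataset.new                                   # nothing detected
only_origin = SketchDataset.new.extend(SketchGitOrigin)    # format still nil
detected = SketchDataset.new.extend(SketchGitOrigin, SketchDatapackageFormat)
```

Only the last object has both an origin and a publishing format, so it is the only one for which supported? is true in this sketch.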

#temporal ⇒ Object<Temporal>

The temporal coverage of the dataset

Returns:

  • (Object<Temporal>)

    the start and end dates of the dataset’s temporal coverage



# File 'lib/data_kitten/dataset.rb', line 170

def temporal
  nil
end

#theme ⇒ String

The main category the dataset belongs to.

Returns:

  • (String)


# File 'lib/data_kitten/dataset.rb', line 242

def theme
  nil
end

#time_sensitive? ⇒ Boolean

Is the information time-sensitive?

Returns:

  • (Boolean)

    whether the information will go out of date.



# File 'lib/data_kitten/dataset.rb', line 184

def time_sensitive?
  false
end

#update_frequency ⇒ String

How frequently the data is updated.

Returns:

  • (String)

    The frequency of update expressed as a dct:Frequency.



# File 'lib/data_kitten/dataset.rb', line 272

def update_frequency
  nil
end

#uri ⇒ Object



# File 'lib/data_kitten/dataset.rb', line 59

def uri
  URI(@access_url.to_s)
end

#url ⇒ Object



# File 'lib/data_kitten/dataset.rb', line 67

def url
  @access_url.to_s
end