Class: DataKitten::Distribution

Inherits:
Object
Defined in:
lib/data_kitten/distribution.rb

Overview

A specific available form of a dataset, such as a CSV file, an API, or an RSS feed.

Based on dcat:Distribution, but with useful aliases for other vocabularies.

Constructor Details

#initialize(dataset, options) ⇒ Distribution

Create a new Distribution. Only the Datapackage resource option is documented here; the source listing below also reads :dcat_resource and :ckan_resource hashes.

Parameters:

  • dataset (Dataset)

    the DataKitten::Dataset that this is a part of.

  • options (Hash)

    A set of options with which to initialise the distribution.

Options Hash (options):

  • :datapackage_resource (Hash)

    the resource section of a Datapackage representation to load information from.



# File 'lib/data_kitten/distribution.rb', line 66

def initialize(dataset, options)
  # Store dataset
  @dataset = dataset
  # Parse datapackage
  if r = options[:datapackage_resource]
    # Load basics
    @description = r['description']
    # Work out format
    @format = begin
      @extension = r['format']
      if @extension.nil?
        @extension = r['path'].is_a?(String) ? r['path'].split('.').last.upcase : nil
      end
      @extension ? DistributionFormat.new(self) : nil
    end
    # Get CSV dialect
    @dialect = r['dialect']
    # Extract schema
    @schema = r['schema']
    # Get path
    @path = r['path']
    @download_url = r['url']
    # Set title
    @title = @path || @uri
  elsif r = options[:dcat_resource]
    @title       = r[:title]
    @description = r[:title]
    @access_url  = r[:accessURL]
  elsif r = options[:ckan_resource]
    @title        = r[:title]
    @description  = r[:title]
    @issued       = r[:issued]
    @modified     = r[:modified]
    @access_url   = r[:accessURL]
    @download_url = r[:downloadURL]
    @byte_size    = r[:byteSize]
    @media_type   = r[:mediaType]
    @extension    = r[:format]
    # Load HTTP Response for further use
    @format = r[:format] ? DistributionFormat.new(self) : nil
  end
  # Set default CSV dialect
  @dialect ||= {
    "delimiter" => ","
  }

  @download = Fetcher.wrap(@download_url)
end
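
The way the constructor works out the extension for a Datapackage resource can be tried in isolation. The following is a minimal sketch, not part of DataKitten: `infer_extension` is a hypothetical helper that copies the logic from the listing above, and the resource hashes are made-up examples.

```ruby
# Hypothetical helper (not part of DataKitten) mirroring the extension
# logic in #initialize: prefer an explicit 'format' value, otherwise
# fall back to the upper-cased suffix of 'path'.
def infer_extension(resource)
  extension = resource['format']
  if extension.nil?
    path = resource['path']
    extension = path.is_a?(String) ? path.split('.').last.upcase : nil
  end
  extension
end

# Made-up resource hashes for illustration.
explicit = infer_extension('format' => 'csv', 'path' => 'data/prices.txt')
inferred = infer_extension('path' => 'data/prices.csv')
missing  = infer_extension('description' => 'no path or format')

p explicit # "csv" (taken from 'format' as given, not upper-cased)
p inferred # "CSV" (derived from the path suffix)
p missing  # nil
```

Note that an explicit 'format' is stored unchanged, while a value derived from 'path' is upper-cased, so the two routes can yield differently cased extensions for the same file type.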

Instance Attribute Details

#access_url ⇒ String

Returns a URL to access the distribution.

Returns:

  • (String)

    a URL to access the distribution.



# File 'lib/data_kitten/distribution.rb', line 16

def access_url
  @access_url
end

#byte_size ⇒ Integer

Returns size of file in bytes.

Returns:

  • (Integer)

    size of file in bytes



# File 'lib/data_kitten/distribution.rb', line 45

def byte_size
  @byte_size
end

#description ⇒ String

Returns a textual description.

Returns:

  • (String)

    a textual description



# File 'lib/data_kitten/distribution.rb', line 33

def description
  @description
end

#download_url ⇒ String Also known as: uri

Returns a URL to the file of the distribution.

Returns:

  • (String)

    a URL to the file of the distribution.



# File 'lib/data_kitten/distribution.rb', line 20

def download_url
  @download_url
end

#extension ⇒ String

Returns the file extension of the distribution.

Returns:

  • (String)

    the file extension of the distribution



# File 'lib/data_kitten/distribution.rb', line 58

def extension
  @extension
end

#format ⇒ DistributionFormat

Returns the file format of the distribution.

Returns:

  • (DistributionFormat)

    the file format of the distribution



# File 'lib/data_kitten/distribution.rb', line 12

def format
  @format
end

#issued ⇒ Date

Returns date created.

Returns:

  • (Date)

    date created



# File 'lib/data_kitten/distribution.rb', line 37

def issued
  @issued
end

#media_type ⇒ String

Returns the IANA media type (MIME type) of the distribution.

Returns:

  • (String)

    the IANA media type (MIME type) of the distribution



# File 'lib/data_kitten/distribution.rb', line 49

def media_type
  @media_type
end

#modified ⇒ Date

Returns date modified.

Returns:

  • (Date)

    date modified



# File 'lib/data_kitten/distribution.rb', line 41

def modified
  @modified
end

#path ⇒ String

Returns the path of the distribution within the source, if appropriate.

Returns:

  • (String)

    the path of the distribution within the source, if appropriate



# File 'lib/data_kitten/distribution.rb', line 25

def path
  @path
end

#schema ⇒ Hash

Returns a hash representing the schema of the data within the distribution. Will change to a more structured object later.

Returns:

  • (Hash)

    a hash representing the schema of the data within the distribution. Will change to a more structured object later.



# File 'lib/data_kitten/distribution.rb', line 54

def schema
  @schema
end

#title ⇒ String Also known as: name

A usable name for the distribution, unique within the DataKitten::Dataset.

Returns:

  • (String)

    a locally unique name



# File 'lib/data_kitten/distribution.rb', line 29

def title
  @title
end

Instance Method Details

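#data ⇒ Array<Array<String>>

The loaded data, parsed as CSV. Per the source listing below, this is nil when no file can be loaded, when the format is not CSV, or when loading raises an error.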

Returns:

  • (Array<Array<String>>)

    an array of arrays of strings, representing each row.



# File 'lib/data_kitten/distribution.rb', line 147

def data
  @data ||= begin
    if @path
      datafile = @dataset.send(:load_file, @path)
    elsif @download.ok?
      datafile = @download.body
    end
    if datafile
      case format.extension
      when :csv
        CSV.parse(
          datafile,
          :headers => true,
          :col_sep => @dialect["delimiter"]
        )
      else
        nil
      end
    else
      nil
    end
  rescue
    nil
  end
end
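
The parsing step in the listing above can be reproduced with Ruby's standard `csv` library alone. This is a minimal sketch under stated assumptions: the semicolon dialect and the sample rows are invented for illustration, and no DataKitten classes are involved.

```ruby
require 'csv'

# Invented dialect and file contents for illustration only.
dialect  = { 'delimiter' => ';' }
datafile = "name;price\nwidget;1.50\ngadget;2.25\n"

# Same call shape as in #data: treat the first row as headers and
# split columns on the dialect's delimiter.
table = CSV.parse(datafile, headers: true, col_sep: dialect['delimiter'])

header_names = table.headers       # column names from the first row
first_price  = table[0]['price']   # a row can be indexed by header name
row_count    = table.length        # header row is not counted

p table.class   # CSV::Table
p header_names  # ["name", "price"]
p first_price   # "1.50"
p row_count     # 2
```

With `headers: true`, `CSV.parse` returns a `CSV::Table` rather than a plain array of arrays, which is why `#headers` can call `data.headers` when no schema is present.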

#exists? ⇒ Boolean

Whether the file that the distribution represents actually exists.

Returns:

  • (Boolean)

    whether the HTTP response returns a success code or not



# File 'lib/data_kitten/distribution.rb', line 140

def exists?
  @download.exists?
end

#headers ⇒ Array<String>

An array of column headers for the distribution. Loaded from the schema, or from the file directly if no schema is present.

Returns:

  • (Array<String>)

    an array of column headers, as strings.



# File 'lib/data_kitten/distribution.rb', line 127

def headers
  @headers ||= begin
    if @schema
      @schema['fields'].map{|x| x['id']}
    else
      data.headers
    end
  end
end
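
The schema-first lookup can be illustrated outside the class. The sketch below is an assumption-laden stand-in: `column_headers` is a hypothetical function that takes the schema and parsed table as arguments instead of reading instance variables, and the schema and CSV text are made up.

```ruby
require 'csv'

# Hypothetical stand-alone version of the #headers logic: use the 'id'
# of each schema field when a schema exists, otherwise read the header
# row of the parsed CSV table.
def column_headers(schema, table)
  if schema
    schema['fields'].map { |field| field['id'] }
  else
    table.headers
  end
end

# Made-up schema and data; the names deliberately differ so the source
# of each result is visible.
schema = { 'fields' => [{ 'id' => 'name' }, { 'id' => 'price' }] }
table  = CSV.parse("a,b\n1,2\n", headers: true)

from_schema = column_headers(schema, table)
from_file   = column_headers(nil, table)

p from_schema # ["name", "price"]
p from_file   # ["a", "b"]
```

In the listing above, the fallback branch calls `data.headers`, and `#data` can return nil, so callers working with non-CSV distributions that lack a schema may want to guard against a `NoMethodError`.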