Class: DataKitten::Distribution

Inherits:
Object
Defined in:
lib/data_kitten/distribution.rb

Overview

A specific available form of a dataset, such as a CSV file, an API, or an RSS feed.

Based on dcat:Distribution, but with useful aliases for other vocabularies.

Constructor Details

#initialize(dataset, options) ⇒ Distribution

Create a new Distribution from a Datapackage, DCAT, or CKAN resource hash.

Parameters:

  • dataset (Dataset)

    the DataKitten::Dataset that this is a part of.

  • options (Hash)

    A set of options with which to initialise the distribution.

Options Hash (options):

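  • :datapackage_resource (Hash)

    the resource section of a Datapackage representation to load information from.

  • :dcat_resource (Hash)

    a DCAT resource to load the title and access URL from.

  • :ckan_resource (Hash)

    a CKAN resource to load the title, access URL, and format from.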



# File 'lib/data_kitten/distribution.rb', line 47

def initialize(dataset, options)
  # Store dataset
  @dataset = dataset
  # Parse datapackage
  if r = options[:datapackage_resource]
    # Load basics
    @description = r['description']
    # Work out format
    @format = begin
      @extension = r['format']
      if @extension.nil?
        @extension = r['path'].is_a?(String) ? r['path'].split('.').last.upcase : nil
      end
      @extension ? DistributionFormat.new(self) : nil
    end
    # Get CSV dialect
    @dialect = r['dialect']
    # Extract schema
    @schema = r['schema']
    # Get path
    @path = r['path']
    @access_url = r['url']
    # Set title
    @title = @path || @access_url
  elsif r = options[:dcat_resource]
    @title       = r[:title]
    @description = r[:title]
    @access_url  = r[:accessURL]
  elsif r = options[:ckan_resource]
    @title       = r[:title]
    @description = r[:title]
    @access_url  = r[:accessURL]
    @extension   = r[:format]
    # Work out format
    @format = r[:format] ? DistributionFormat.new(self) : nil
  end
  # Set default CSV dialect
  @dialect ||= {
    "delimiter" => ","
  }
end
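
The extension fallback in the constructor can be tried in isolation. The following is a minimal sketch, not part of DataKitten: `infer_extension` is a hypothetical helper that copies only the `format`/`path` logic shown above and works on a plain Hash.

```ruby
# Hypothetical helper (not part of DataKitten): mirrors the extension
# fallback used in the constructor for Datapackage resource hashes.
def infer_extension(resource)
  # An explicit 'format' entry wins and is returned unchanged.
  ext = resource['format']
  if ext.nil?
    # Otherwise take the text after the last dot in 'path', upcased.
    path = resource['path']
    ext = path.is_a?(String) ? path.split('.').last.upcase : nil
  end
  ext
end

infer_extension({ 'format' => 'csv', 'path' => 'data/a.txt' }) # => "csv"
infer_extension({ 'path' => 'data/prices.csv' })               # => "CSV"
infer_extension({})                                            # => nil
```

Note that a path without a dot, such as `README`, yields the whole path upcased, because `split('.')` returns a single element.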

Instance Attribute Details

#access_url ⇒ String

Also known as: uri, download_url

Returns a URL to access the distribution.

Returns:

  • (String)

    a URL to access the distribution.



# File 'lib/data_kitten/distribution.rb', line 16

def access_url
  @access_url
end

#description ⇒ String

Returns a textual description.

Returns:

  • (String)

    a textual description



# File 'lib/data_kitten/distribution.rb', line 30

def description
  @description
end

#extension ⇒ String

Returns the file extension of the distribution.

Returns:

  • (String)

    the file extension of the distribution



# File 'lib/data_kitten/distribution.rb', line 39

def extension
  @extension
end

#format ⇒ DistributionFormat

Returns the file format of the distribution.

Returns:

  • (DistributionFormat)

    the file format of the distribution


# File 'lib/data_kitten/distribution.rb', line 12

def format
  @format
end

#path ⇒ String

Returns the path of the distribution within the source, if appropriate.

Returns:

  • (String)

    the path of the distribution within the source, if appropriate



# File 'lib/data_kitten/distribution.rb', line 22

def path
  @path
end

#schema ⇒ Hash

Returns a hash representing the schema of the data within the distribution. Will change to a more structured object later.

Returns:

  • (Hash)

    a hash representing the schema of the data within the distribution. Will change to a more structured object later.



# File 'lib/data_kitten/distribution.rb', line 35

def schema
  @schema
end

#title ⇒ String

Also known as: name

A usable name for the distribution, unique within the DataKitten::Dataset.

Returns:

  • (String)

    a locally unique name



# File 'lib/data_kitten/distribution.rb', line 26

def title
  @title
end

Instance Method Details

#data ⇒ Array<Array<String>>

A CSV object representing the loaded data, or nil if the file cannot be loaded or is not in CSV format.

Returns:

  • (Array<Array<String>>)

    an array of arrays of strings, representing each row.



# File 'lib/data_kitten/distribution.rb', line 123

def data
  @data ||= begin
    if @path
      datafile = @dataset.send(:load_file, @path)
    elsif @access_url
      datafile = RestClient.get @access_url rescue nil
    end
    if datafile
      case format.extension
      when :csv
        CSV.parse(
          datafile,
          :headers => true,
          :col_sep => @dialect["delimiter"]
        )
      else
        nil
      end
    else
      nil
    end
  rescue
    nil
  end
end
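
The CSV branch above relies on Ruby's standard `CSV.parse` with `headers: true` and a delimiter taken from the dialect. The following is a minimal sketch of that call on its own: `parse_with_dialect` is a hypothetical helper, not part of DataKitten, and the sample text is made up for illustration.

```ruby
require 'csv'

# Hypothetical helper (not part of DataKitten): parse raw text using a
# Datapackage-style dialect hash, defaulting to a comma delimiter in the
# same way the constructor does.
def parse_with_dialect(raw, dialect = nil)
  dialect ||= { "delimiter" => "," }
  CSV.parse(raw, headers: true, col_sep: dialect["delimiter"])
end

table = parse_with_dialect("name;price\nwidget;3\n", { "delimiter" => ";" })
table.class        # => CSV::Table
table.headers      # => ["name", "price"]
table[0]["price"]  # => "3"
```

Because `headers: true` is passed, the result is a `CSV::Table` rather than a plain array of arrays; `table.to_a` converts it to nested arrays with the header row first.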

#exists? ⇒ Boolean

Whether the file that the distribution represents actually exists.

Returns:

  • (Boolean)

    whether the HTTP HEAD response has any status other than 404; nil if the distribution has no access URL



# File 'lib/data_kitten/distribution.rb', line 114

def exists?
  if @access_url
    http_head.response_code != 404
  end
end

#headers ⇒ Array<String>

An array of column headers for the distribution. Loaded from the schema, or from the file directly if no schema is present.

Returns:

  • (Array<String>)

    an array of column headers, as strings.



# File 'lib/data_kitten/distribution.rb', line 101

def headers
  @headers ||= begin
    if @schema
      @schema['fields'].map{|x| x['id']}
    else
      data.headers
    end
  end
end
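
The schema-first lookup can be illustrated without a Distribution object. The following is a minimal sketch under stated assumptions: `headers_for` is a hypothetical helper, not part of DataKitten, and the schema hash is an invented example using the `fields`/`id` keys that the method above reads.

```ruby
require 'csv'

# Hypothetical helper (not part of DataKitten): prefer field ids from the
# schema, and fall back to the header row of the parsed CSV table.
def headers_for(schema, table)
  if schema
    schema['fields'].map { |field| field['id'] }
  else
    table.headers
  end
end

schema = { 'fields' => [{ 'id' => 'name' }, { 'id' => 'price' }] }
table  = CSV.parse("item,cost\nwidget,3\n", headers: true)

headers_for(schema, table) # => ["name", "price"]
headers_for(nil, table)    # => ["item", "cost"]
```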

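#http_head ⇒ Object

Sends an HTTP HEAD request to the access URL, following redirects, and caches the response. Returns nil if the distribution has no access URL.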



# File 'lib/data_kitten/distribution.rb', line 149

def http_head
  if @access_url
    @http_head ||= begin
      Curl::Easy.http_head(@access_url) do |c|
        c.follow_location = true
        c.useragent = "curb"
      end
    end
  end
end