Class: DataKitten::Distribution
- Inherits: Object
- Hierarchy: Object, DataKitten::Distribution
- Defined in: lib/data_kitten/distribution.rb
Overview
A specific available form of a dataset, such as a CSV file, an API, or an RSS feed.
Based on dcat:Distribution, but with useful aliases for other vocabularies.
Instance Attribute Summary
- #access_url ⇒ String (also: #uri, #download_url)
  A URL to access the distribution.
- #description ⇒ String
  A textual description.
- #extension ⇒ String
  The file extension of the distribution.
- #format ⇒ DistributionFormat
  The file format of the distribution.
- #path ⇒ String
  The path of the distribution within the source, if appropriate.
- #schema ⇒ Hash
  A hash representing the schema of the data within the distribution.
- #title ⇒ String (also: #name)
  A usable name for the distribution, unique within the Dataset.
Instance Method Summary
- #data ⇒ Array<Array<String>>
  A CSV object representing the loaded data.
- #exists? ⇒ Boolean
  Whether the file that the distribution represents actually exists.
- #headers ⇒ Array<String>
  An array of column headers for the distribution.
- #http_head ⇒ Object
- #initialize(dataset, options) ⇒ Distribution (constructor)
  Create a new Distribution.
Constructor Details
#initialize(dataset, options) ⇒ Distribution
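Create a new Distribution from a resource hash supplied in options under :datapackage_resource, :dcat_resource, or :ckan_resource. Only the Datapackage branch loads the dialect, schema, and path.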
# File 'lib/data_kitten/distribution.rb', line 47
def initialize(dataset, options)
  # Store dataset
  @dataset = dataset
  # Parse datapackage
  if r = options[:datapackage_resource]
    # Load basics
    @description = r['description']
    # Work out format
    @format = begin
      @extension = r['format']
      if @extension.nil?
        @extension = r['path'].is_a?(String) ? r['path'].split('.').last.upcase : nil
      end
      @extension ? DistributionFormat.new(self) : nil
    end
    # Get CSV dialect
    @dialect = r['dialect']
    # Extract schema
    @schema = r['schema']
    # Get path
    @path = r['path']
    @access_url = r['url']
    # Set title
    @title = @path || @uri
  elsif r = options[:dcat_resource]
    @title = r[:title]
    @description = r[:title]
    @access_url = r[:accessURL]
  elsif r = options[:ckan_resource]
    @title = r[:title]
    @description = r[:title]
    @access_url = r[:accessURL]
    @extension = r[:format]
    # Load HTTP Response for further use
    @format = r[:format] ? DistributionFormat.new(self) : nil
  end
  # Set default CSV dialect
  @dialect ||= {
    "delimiter" => ","
  }
end
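The extension fallback in the constructor can be tried in isolation. The following is a minimal sketch and not part of DataKitten: extension_for is a hypothetical helper that mirrors the 'format' and 'path' handling shown above on a plain Datapackage-style resource hash, and the sample paths are invented.

```ruby
# Hypothetical helper (not part of DataKitten) mirroring the constructor's
# extension logic: prefer the explicit 'format', else derive it from 'path'.
def extension_for(resource)
  extension = resource['format']
  if extension.nil?
    path = resource['path']
    extension = path.is_a?(String) ? path.split('.').last.upcase : nil
  end
  extension
end

p extension_for({ 'path' => 'data/cities.csv' })                      # "CSV"
p extension_for({ 'format' => 'json', 'path' => 'data/cities.csv' })  # "json"
p extension_for({})                                                   # nil
```

An explicit 'format' is passed through unchanged, while a derived extension is upcased. A path with no dot, such as 'README', yields the whole upcased path.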
Instance Attribute Details
#access_url ⇒ String Also known as: uri, download_url
Returns a URL to access the distribution.
# File 'lib/data_kitten/distribution.rb', line 16
def access_url
  @access_url
end
#description ⇒ String
Returns a textual description.
# File 'lib/data_kitten/distribution.rb', line 30
def description
  @description
end
#extension ⇒ String
Returns the file extension of the distribution.
# File 'lib/data_kitten/distribution.rb', line 39
def extension
  @extension
end
#format ⇒ DistributionFormat
Returns the file format of the distribution.
# File 'lib/data_kitten/distribution.rb', line 12
def format
  @format
end
#path ⇒ String
Returns the path of the distribution within the source, if appropriate.
# File 'lib/data_kitten/distribution.rb', line 22
def path
  @path
end
#schema ⇒ Hash
Returns a hash representing the schema of the data within the distribution. Will change to a more structured object later.
# File 'lib/data_kitten/distribution.rb', line 35
def schema
  @schema
end
#title ⇒ String Also known as: name
A usable name for the distribution, unique within the DataKitten::Dataset.
# File 'lib/data_kitten/distribution.rb', line 26
def title
  @title
end
Instance Method Details
#data ⇒ Array<Array<String>>
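A CSV object representing the loaded data. Returns nil if no file could be loaded, if the format is not CSV, or if loading or parsing raises an error.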
# File 'lib/data_kitten/distribution.rb', line 123
def data
  @data ||= begin
    if @path
      datafile = @dataset.send(:load_file, @path)
    elsif @access_url
      datafile = RestClient.get @access_url rescue nil
    end
    if datafile
      case format.extension
      when :csv
        CSV.parse(
          datafile,
          :headers => true,
          :col_sep => @dialect["delimiter"]
        )
      else
        nil
      end
    else
      nil
    end
  rescue
    nil
  end
end
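The CSV branch relies on Ruby's standard CSV.parse with :headers => true, which returns a CSV::Table rather than a plain array of arrays. A minimal sketch of that call, using an inline payload and a semicolon dialect that are both invented for illustration:

```ruby
require 'csv'

# Sample payload and dialect are made up for illustration; in DataKitten the
# dialect comes from the resource hash and defaults to a comma delimiter.
datafile = "id;city\n1;Leeds\n2;York\n"
dialect  = { "delimiter" => ";" }

table = CSV.parse(
  datafile,
  headers: true,
  col_sep: dialect["delimiter"]
)

p table.class        # CSV::Table
p table.headers      # ["id", "city"]
p table[0]["city"]   # "Leeds"
p table.map(&:to_h)  # one hash per data row, keyed by header
```

Because the first line is consumed as headers, the table holds only the data rows, and each row can be addressed by column name.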
#exists? ⇒ Boolean
Whether the file that the distribution represents actually exists. Any response code other than 404 counts as existing. Returns nil if no access URL is set.
# File 'lib/data_kitten/distribution.rb', line 114
def exists?
  if @access_url
    http_head.response_code != 404
  end
end
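The method has three possible results: nil with no URL, false on a 404, and true for any other response code, including server errors. A minimal sketch of that decision follows. It runs offline by replacing the HEAD request with a hypothetical lookup table, so STATUSES, fake_head_status, and exists_like? are illustrative names and not part of DataKitten.

```ruby
# Hypothetical stand-in for the HEAD request: a fixed status code per URL.
STATUSES = {
  "http://example.org/ok.csv"     => 200,
  "http://example.org/gone.csv"   => 404,
  "http://example.org/broken.csv" => 500
}.freeze

def fake_head_status(url)
  STATUSES.fetch(url)
end

# Mirrors the shape of exists?: no URL gives nil, otherwise "not a 404".
def exists_like?(access_url)
  if access_url
    fake_head_status(access_url) != 404
  end
end

p exists_like?(nil)                              # nil
p exists_like?("http://example.org/ok.csv")      # true
p exists_like?("http://example.org/gone.csv")    # false
p exists_like?("http://example.org/broken.csv")  # true
```

Callers that need a strict Boolean may want to treat nil as a separate "unknown" case rather than as false.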
#headers ⇒ Array<String>
An array of column headers for the distribution. Loaded from the schema, or from the file directly if no schema is present.
# File 'lib/data_kitten/distribution.rb', line 101
def headers
  @headers ||= begin
    if @schema
      @schema['fields'].map{|x| x['id']}
    else
      data.headers
    end
  end
end
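The schema branch expects a hash with a 'fields' array whose entries carry an 'id' key. A minimal sketch with an invented schema and inline CSV, where headers_for is a hypothetical helper that mirrors the method's two branches:

```ruby
require 'csv'

# Hypothetical helper mirroring #headers: schema first, parsed data second.
def headers_for(schema, table)
  if schema
    schema['fields'].map { |field| field['id'] }
  else
    table.headers
  end
end

schema = { 'fields' => [{ 'id' => 'id' }, { 'id' => 'city' }] }
table  = CSV.parse("a,b\n1,2\n", headers: true)

p headers_for(schema, table)  # ["id", "city"]
p headers_for(nil, table)     # ["a", "b"]
```

In the method as shown, there is no guard on the fallback: if no schema is present and #data returns nil, calling headers on nil raises NoMethodError.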
#http_head ⇒ Object
# File 'lib/data_kitten/distribution.rb', line 149
def http_head
  if @access_url
    @http_head ||= begin
      Curl::Easy.http_head(@access_url) do |c|
        c.follow_location = true
        c.useragent = "curb"
      end
    end
  end
end