Class: DataKitten::Distribution
- Inherits: Object
  - Object
  - DataKitten::Distribution
- Defined in: lib/data_kitten/distribution.rb
Overview
A specific available form of a dataset, such as a CSV file, an API, or an RSS feed.
Based on dcat:Distribution, but with useful aliases for other vocabularies.
Instance Attribute Summary
- #access_url ⇒ String
  A URL to access the distribution.
- #byte_size ⇒ Integer
  Size of file in bytes.
- #description ⇒ String
  A textual description.
- #download_url ⇒ String (also: #uri)
  A URL to the file of the distribution.
- #extension ⇒ String
  The file extension of the distribution.
- #format ⇒ DistributionFormat
  The file format of the distribution.
- #issued ⇒ Date
  Date created.
- #media_type ⇒ String
  The IANA media type (MIME type) of the distribution.
- #modified ⇒ Date
  Date modified.
- #path ⇒ String
  The path of the distribution within the source, if appropriate.
- #schema ⇒ Hash
  A hash representing the schema of the data within the distribution.
- #title ⇒ String (also: #name)
  A usable name for the distribution, unique within the Dataset.
Instance Method Summary
- #data ⇒ Array<Array<String>>
  A CSV object representing the loaded data.
- #exists? ⇒ Boolean
  Whether the file that the distribution represents actually exists.
- #headers ⇒ Array<String>
  An array of column headers for the distribution.
- #initialize(dataset, options) ⇒ Distribution (constructor)
  Create a new Distribution.
Constructor Details
#initialize(dataset, options) ⇒ Distribution
Create a new Distribution from a resource hash passed in options under one of
the keys :datapackage_resource, :dcat_resource, or :ckan_resource.
# File 'lib/data_kitten/distribution.rb', line 66
def initialize(dataset, options)
  # Store dataset
  @dataset = dataset
  # Parse datapackage
  if r = options[:datapackage_resource]
    # Load basics
    @description = r['description']
    # Work out format
    @format = begin
      @extension = r['format']
      if @extension.nil?
        @extension = r['path'].is_a?(String) ? r['path'].split('.').last.upcase : nil
      end
      @extension ? DistributionFormat.new(self) : nil
    end
    # Get CSV dialect
    @dialect = r['dialect']
    # Extract schema
    @schema = r['schema']
    # Get path
    @path = r['path']
    @download_url = r['url']
    # Set title
    @title = @path || @uri
  elsif r = options[:dcat_resource]
    @title = r[:title]
    @description = r[:title]
    @access_url = r[:accessURL]
  elsif r = options[:ckan_resource]
    @title = r[:title]
    @description = r[:title]
    @issued = r[:issued]
    @modified = r[:modified]
    @access_url = r[:accessURL]
    @download_url = r[:downloadURL]
    @byte_size = r[:byteSize]
    @media_type = r[:mediaType]
    @extension = r[:format]
    # Load HTTP Response for further use
    @format = r[:format] ? DistributionFormat.new(self) : nil
  end
  # Set default CSV dialect
  @dialect ||= {
    "delimiter" => ","
  }
  @download = Fetcher.wrap(@download_url)
end
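For a Datapackage resource, the constructor takes the extension from the resource's 'format' entry and falls back to the text after the last dot in 'path'. The following is a minimal standalone sketch of that fallback only. The helper name extension_for is hypothetical and is not part of DataKitten.

```ruby
# Hypothetical helper (not part of DataKitten) mirroring the extension
# fallback used for Datapackage resources in the constructor.
def extension_for(resource)
  # An explicit 'format' entry takes precedence and is returned unchanged.
  explicit = resource['format']
  return explicit unless explicit.nil?

  # Otherwise use the text after the last dot in 'path', upper-cased.
  path = resource['path']
  path.is_a?(String) ? path.split('.').last.upcase : nil
end

puts extension_for({ 'path' => 'data/airports.csv' })
puts extension_for({ 'format' => 'json', 'path' => 'data/airports.csv' })
```

Two details follow from this logic: an explicit 'format' value is not upper-cased, and a path without a dot (for example 'README') yields the whole path, upper-cased, as the extension.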
Instance Attribute Details
#access_url ⇒ String
Returns a URL to access the distribution.
# File 'lib/data_kitten/distribution.rb', line 16
def access_url
  @access_url
end
#byte_size ⇒ Integer
Returns size of file in bytes.
# File 'lib/data_kitten/distribution.rb', line 45
def byte_size
  @byte_size
end
#description ⇒ String
Returns a textual description.
# File 'lib/data_kitten/distribution.rb', line 33
def description
  @description
end
#download_url ⇒ String Also known as: uri
Returns a URL to the file of the distribution.
# File 'lib/data_kitten/distribution.rb', line 20
def download_url
  @download_url
end
#extension ⇒ String
Returns the file extension of the distribution.
# File 'lib/data_kitten/distribution.rb', line 58
def extension
  @extension
end
#format ⇒ DistributionFormat
Returns the file format of the distribution.
# File 'lib/data_kitten/distribution.rb', line 12
def format
  @format
end
#issued ⇒ Date
Returns date created.
# File 'lib/data_kitten/distribution.rb', line 37
def issued
  @issued
end
#media_type ⇒ String
Returns the IANA media type (MIME type) of the distribution.
# File 'lib/data_kitten/distribution.rb', line 49
def media_type
  @media_type
end
#modified ⇒ Date
Returns date modified.
# File 'lib/data_kitten/distribution.rb', line 41
def modified
  @modified
end
#path ⇒ String
Returns the path of the distribution within the source, if appropriate.
# File 'lib/data_kitten/distribution.rb', line 25
def path
  @path
end
#schema ⇒ Hash
Returns a hash representing the schema of the data within the distribution. Will change to a more structured object later.
# File 'lib/data_kitten/distribution.rb', line 54
def schema
  @schema
end
#title ⇒ String Also known as: name
Returns a usable name for the distribution, unique within the DataKitten::Dataset.
# File 'lib/data_kitten/distribution.rb', line 29
def title
  @title
end
Instance Method Details
#data ⇒ Array<Array<String>>
A CSV object representing the loaded data. Returns nil if the file cannot be loaded, if parsing raises an error, or if the format is not CSV.
# File 'lib/data_kitten/distribution.rb', line 147
def data
  @data ||= begin
    if @path
      datafile = @dataset.send(:load_file, @path)
    elsif @download.ok?
      datafile = @download.body
    end
    if datafile
      case format.extension
      when :csv
        CSV.parse(
          datafile,
          :headers => true,
          :col_sep => @dialect["delimiter"]
        )
      else
        nil
      end
    else
      nil
    end
  rescue
    nil
  end
end
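The parsing step in #data passes the dialect's "delimiter" to Ruby's standard CSV library as the column separator, with the first row treated as headers. The sketch below isolates that step so it can be run without DataKitten. The helper name parse_with_dialect and the sample text are assumptions made for illustration.

```ruby
require 'csv'

# Hypothetical stand-in for the CSV branch of #data (not part of DataKitten).
def parse_with_dialect(text, dialect = nil)
  # Same default as the constructor: comma-delimited.
  dialect ||= { "delimiter" => "," }
  # headers: true makes the first row the header row; the result is a CSV::Table.
  CSV.parse(text, headers: true, col_sep: dialect["delimiter"])
end

# Illustrative semicolon-delimited input.
table = parse_with_dialect("id;name\n1;Ada\n2;Grace\n", { "delimiter" => ";" })
p table.headers
table.each { |row| puts row["name"] }
```

Because the first row is consumed as headers, each remaining row can be indexed by column name.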
#exists? ⇒ Boolean
Whether the file that the distribution represents actually exists.
# File 'lib/data_kitten/distribution.rb', line 140
def exists?
  @download.exists?
end
#headers ⇒ Array<String>
An array of column headers for the distribution. Loaded from the schema, or from the file directly if no schema is present.
# File 'lib/data_kitten/distribution.rb', line 127
def headers
  @headers ||= begin
    if @schema
      @schema['fields'].map{|x| x['id']}
    else
      data.headers
    end
  end
end
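The schema-first lookup in #headers can be sketched on its own, as below. The helper name headers_for and the sample schema are hypothetical. The schema shape (a 'fields' array whose entries carry an 'id') is taken from the method body above.

```ruby
require 'csv'

# Hypothetical helper (not part of DataKitten) mirroring #headers:
# prefer the schema's field ids, otherwise read the parsed CSV headers.
def headers_for(schema, table)
  if schema
    schema['fields'].map { |field| field['id'] }
  else
    table.headers
  end
end

# Illustrative data: the schema names differ from the file's header row.
schema = { 'fields' => [{ 'id' => 'code' }, { 'id' => 'label' }] }
table  = CSV.parse("a,b\n1,2\n", headers: true)

p headers_for(schema, table)
p headers_for(nil, table)
```

As the sketch suggests, a schema takes precedence even when its ids differ from the header row in the file. In the method shown above, the fallback calls data.headers, so it would raise NoMethodError whenever #data returns nil.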