Class: Importer::DataReader
- Inherits: Object
  - Object
  - Importer::DataReader
- Defined in:
- lib/iron/import/data_reader.rb
Overview
Base class for our input reading - dealing with the raw file/stream, and extracting raw values. In addition, we provide the base data coercion/parsing for our derived classes.
Direct Known Subclasses
Instance Attribute Summary
-
#format ⇒ Object
readonly
Attributes.
Class Method Summary
-
.for_format(importer, format) ⇒ Object
Factory method to build a reader from an explicit format selector.
-
.for_path(importer, path) ⇒ Object
Figure out which format to use for a given path based on file name.
-
.for_source(importer, source) ⇒ Object
Implement our automatic reader selection, based on the import source.
-
.for_stream(importer, stream) ⇒ Object
Figure out which format to use based on a stream’s source file info.
-
.is_stream?(source) ⇒ Boolean
Attempt to determine if the given source is a stream.
-
.path_from_stream(stream) ⇒ Object
Try to find the original file name for the given stream, as in the case where a file is uploaded to Rails and we’re dealing with an ActionDispatch::Http::UploadedFile.
- .verify_nokogiri! ⇒ Object
- .verify_roo! ⇒ Object
Instance Method Summary
- #add_error(*args) ⇒ Object
- #add_exception(*args) ⇒ Object
-
#init_source(mode, source) ⇒ Object
Override this method in derived classes to set up the given source in the given mode.
-
#initialize(importer, format) ⇒ DataReader
constructor
A new instance of DataReader.
-
#load(path_or_stream, scopes = nil, &block) ⇒ Object
Core data reader method.
-
#load_each(mode, source, scopes, &block) ⇒ Object
Load up the sheet in the correct mode.
-
#load_raw(scopes, &block) ⇒ Object
Override this method in derived classes to take the given sheet definition, find that sheet in the input source, and read out the raw (unparsed) rows as an array of arrays.
-
#parse_value(val, type) ⇒ Object
Provides default value parsing/coercion for all derived data readers.
- #supports?(mode) ⇒ Boolean
- #supports_file! ⇒ Object
- #supports_file? ⇒ Boolean
- #supports_stream! ⇒ Object
- #supports_stream? ⇒ Boolean
Constructor Details
#initialize(importer, format) ⇒ DataReader
Returns a new instance of DataReader.
# File 'lib/iron/import/data_reader.rb', line 103
def initialize(importer, format)
  @importer = importer
  @format = format
  @supports = []
end
Instance Attribute Details
#format ⇒ Object (readonly)
Attributes
# File 'lib/iron/import/data_reader.rb', line 9
def format
  @format
end
Class Method Details
.for_format(importer, format) ⇒ Object
Factory method to build a reader from an explicit format selector
# File 'lib/iron/import/data_reader.rb', line 45
def self.for_format(importer, format)
  case format
  when :csv
    CsvReader.new(importer)
  when :xls
    verify_roo!
    XlsReader.new(importer)
  when :xlsx
    verify_roo!
    XlsxReader.new(importer)
  when :html
    verify_nokogiri!
    HtmlReader.new(importer)
  else
    nil
  end
end
.for_path(importer, path) ⇒ Object
Figure out which format to use for a given path based on file name
# File 'lib/iron/import/data_reader.rb', line 64
def self.for_path(importer, path)
  format = path.to_s.extract(/\.(csv|tsv|html?|xlsx?)\z/i)
  if format
    format = format.downcase
    format = 'html' if format == 'htm'
    format = 'csv' if format == 'tsv'
    format = format.to_sym
    for_format(importer, format)
  else
    nil
  end
end
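The extension mapping in .for_path relies on String#extract, which is not a core Ruby method and presumably comes from a support library (an assumption; this page does not say where it is defined). As a rough sketch of the same mapping in plain Ruby, using a hypothetical helper name format_for_path and assuming extract returns the first capture group:

```ruby
# Hypothetical stand-alone helper mirroring the extension mapping in .for_path.
# It uses a core-Ruby regex capture instead of String#extract.
def format_for_path(path)
  ext = path.to_s[/\.(csv|tsv|html?|xlsx?)\z/i, 1]
  return nil unless ext

  ext = ext.downcase
  ext = 'html' if ext == 'htm'   # .htm and .html share one reader
  ext = 'csv'  if ext == 'tsv'   # tab-separated files go to the CSV reader
  ext.to_sym
end

format_for_path('report.XLSX')   # => :xlsx
format_for_path('page.htm')      # => :html
format_for_path('data.tsv')      # => :csv
format_for_path('notes.txt')     # => nil
```

Because the pattern is anchored with \z, only the final extension counts, so a name like archive.csv.bak yields nil.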
.for_source(importer, source) ⇒ Object
Implement our automatic reader selection, based on the import source
# File 'lib/iron/import/data_reader.rb', line 28
def self.for_source(importer, source)
  data = nil
  if is_stream?(source)
    data = DataReader::for_stream(importer, source)
    unless data
      importer.add_error("Unable to find format handler for stream")
    end
  else
    data = DataReader::for_path(importer, source)
    unless data
      importer.add_error("Unable to find format handler for file #{source}")
    end
  end
  data
end
.for_stream(importer, stream) ⇒ Object
Figure out which format to use based on a stream’s source file info
# File 'lib/iron/import/data_reader.rb', line 78
def self.for_stream(importer, stream)
  path = path_from_stream(stream)
  for_path(importer, path)
end
.is_stream?(source) ⇒ Boolean
Attempt to determine if the given source is a stream
# File 'lib/iron/import/data_reader.rb', line 84
def self.is_stream?(source)
  # For now, just assume anything that has a #read method is a stream, in
  # duck-type fashion
  source.respond_to?(:read)
end
.path_from_stream(stream) ⇒ Object
Try to find the original file name for the given stream, as in the case where a file is uploaded to Rails and we’re dealing with an ActionDispatch::Http::UploadedFile.
# File 'lib/iron/import/data_reader.rb', line 93
def self.path_from_stream(stream)
  if stream.respond_to?(:original_filename)
    stream.original_filename
  elsif stream.respond_to?(:path)
    stream.path
  else
    nil
  end
end
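The duck-typing checks in .is_stream? and .path_from_stream can be tried without Rails. The sketch below is only an illustration: FakeUpload is a hypothetical stand-in that loosely imitates an uploaded-file object, and stream? and name_for are hypothetical helper names that repeat the same respond_to? tests.

```ruby
require 'stringio'

# Hypothetical stand-in for an upload object that exposes #read and
# #original_filename, loosely like ActionDispatch::Http::UploadedFile.
FakeUpload = Struct.new(:io, :original_filename) do
  def read(*args)
    io.read(*args)
  end
end

# Same duck-typing test as .is_stream?: anything with #read is a stream.
def stream?(source)
  source.respond_to?(:read)
end

# Same lookup order as .path_from_stream: original filename, then path.
def name_for(stream)
  if stream.respond_to?(:original_filename)
    stream.original_filename
  elsif stream.respond_to?(:path)
    stream.path
  end
end

upload = FakeUpload.new(StringIO.new("a,b\n1,2\n"), 'contacts.csv')
stream?(upload)               # => true
stream?('contacts.csv')       # => false, a String has no #read
name_for(upload)              # => "contacts.csv"
name_for(StringIO.new('x'))   # => nil, there is no name to inspect
```

One consequence worth noting: a stream with no recoverable name gives .for_path nothing to match against, so .for_source would report that it cannot find a format handler for the stream.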
.verify_nokogiri! ⇒ Object
# File 'lib/iron/import/data_reader.rb', line 19
def self.verify_nokogiri!
  if Gem::Specification.find_all_by_name('nokogiri', '>= 1.6.0').empty?
    raise "You are attempting to use the iron-import gem to import an HTML file. Doing so requires installing the nokogiri gem, version 1.6.0 or later."
  else
    require 'nokogiri'
  end
end
.verify_roo! ⇒ Object
# File 'lib/iron/import/data_reader.rb', line 11
def self.verify_roo!
  if Gem::Specification.find_all_by_name('roo', '>= 1.13.0').empty?
    raise "You are attempting to use the iron-import gem to import an Excel file. Doing so requires installing the roo gem, version 1.13.0 or later."
  else
    require 'roo'
  end
end
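Both verify methods follow the same optional-dependency pattern: ask RubyGems whether a matching version is installed, and only then require it. A minimal sketch of that check on its own, with gem_available? as a hypothetical helper name that is not part of this class:

```ruby
# Hypothetical helper: true when an installed gem satisfies the requirement.
# Gem::Specification.find_all_by_name returns an array of matching specs.
def gem_available?(name, requirement = '>= 0')
  !Gem::Specification.find_all_by_name(name, requirement).empty?
end

gem_available?('no-such-gem-for-this-sketch')   # => false
gem_available?('roo', '>= 1.13.0')              # depends on what is installed
```

Checking before requiring lets the library raise a message that names the missing gem and the minimum version, rather than surfacing a bare LoadError.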
Instance Method Details
#add_error(*args) ⇒ Object
# File 'lib/iron/import/data_reader.rb', line 303
def add_error(*args)
  @importer.add_error(*args)
end
#add_exception(*args) ⇒ Object
# File 'lib/iron/import/data_reader.rb', line 307
def add_exception(*args)
  @importer.add_exception(*args)
end
#init_source(mode, source) ⇒ Object
Override this method in derived classes to set up the given source in the given mode
# File 'lib/iron/import/data_reader.rb', line 205
def init_source(mode, source)
  raise "Unimplemented method #init_source in data reader #{self.class.name}"
end
#load(path_or_stream, scopes = nil, &block) ⇒ Object
Core data reader method. Takes a given input source (either a stream or a file path) and attempts to load it. Returns true if successful, false if not. If false, there will be one or more errors explaining what went wrong.
Each derived class interprets the passed scopes as makes sense for its format. Generally they narrow the search in multi-block formats such as Excel spreadsheets (sheet name/index) or HTML documents (CSS selectors, XPath selectors). If scopes is nil, all possible blocks will be checked.
Each block is read in as raw data from the source, and passed to the given block as an array of arrays. If the block returns true, processing is stopped and no further blocks will be checked.
# File 'lib/iron/import/data_reader.rb', line 142
def load(path_or_stream, scopes = nil, &block)
  # Figure out what we've been passed, and handle it
  if self.class.is_stream?(path_or_stream)
    # We have a stream (open file, upload, whatever)
    if supports_stream?
      # Stream loader defined, run it
      load_each(:stream, path_or_stream, scopes, &block)
    else
      # Write to temp file, as some of our readers only read physical files, annoyingly
      file = Tempfile.new(['importer', ".#{format}"])
      file.binmode
      begin
        file.write path_or_stream.read
        file.close
        load_each(:file, file.path, scopes, &block)
      ensure
        file.close
        file.unlink
      end
    end

  elsif path_or_stream.is_a?(String)
    # Assume it's a path
    is_path = File.exist?(path_or_stream) rescue false
    if is_path
      if supports_file?
        # We're all set, load up the given path
        load_each(:file, path_or_stream, scopes, &block)
      else
        # No file handler, so open the file and run the stream processor
        file = File.open(path_or_stream, 'rb')
        load_each(:stream, file, scopes, &block)
      end
    else
      add_error("Unable to locate source file with path #{path_or_stream.slice(0,200)}")
    end

  else
    add_error("Unable to load data source - not a file path or stream: #{path_or_stream.inspect}")
  end

  # Return our status
  !@importer.has_errors?
end
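When a reader only supports file mode, #load copies the stream into a temporary file whose extension matches the reader's format, then removes it afterwards. The sketch below isolates that step under a hypothetical helper name, with_tempfile_for, and uses a trivial comma split as a stand-in for a real reader:

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: copy a stream to a temp file, yield its path, and
# always remove the file afterwards, as the fallback in #load does.
def with_tempfile_for(stream, format)
  file = Tempfile.new(['importer', ".#{format}"])
  file.binmode
  begin
    file.write(stream.read)
    file.close                 # flush to disk before anyone reads the path
    yield file.path
  ensure
    file.close
    file.unlink
  end
end

rows = with_tempfile_for(StringIO.new("a,b\n1,2\n"), :csv) do |path|
  # Stand-in for a file-only reader: naive comma split, no quoting rules.
  File.readlines(path, chomp: true).map { |line| line.split(',') }
end
rows   # => [["a", "b"], ["1", "2"]]
```

Keeping the cleanup in ensure means the temporary file is removed even if the reader raises part way through.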
#load_each(mode, source, scopes, &block) ⇒ Object
Load up the sheet in the correct mode
# File 'lib/iron/import/data_reader.rb', line 188
def load_each(mode, source, scopes, &block)
  # Handle some common error cases centrally
  if mode == :file && !File.exist?(source)
    add_error("File not found: #{source}")
    return
  end

  # Let our derived classes open the file, etc. as they need
  if init_source(mode, source)
    # Once the source is set, run through each defined sheet, pass it to
    # our sheet loader, and have the sheet parse it out.
    load_raw(scopes, &block)
  end
end
#load_raw(scopes, &block) ⇒ Object
Override this method in derived classes to take the given sheet definition, find that sheet in the input source, and read out the raw (unparsed) rows as an array of arrays. Return false if the sheet cannot be loaded.
# File 'lib/iron/import/data_reader.rb', line 212
def load_raw(scopes, &block)
  raise "Unimplemented method #load_raw in data reader #{self.class.name}"
end
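The two override hooks, #init_source and #load_raw, form a small template-method contract: #load_each calls #init_source first and only calls #load_raw when it returns a truthy value. The sketch below illustrates that flow without the gem. SketchBaseReader is a stand-in that reproduces only the pieces shown on this page, and PipeReader is a hypothetical pipe-delimited reader invented for the example.

```ruby
require 'stringio'

# Stand-in for the base class so the sketch runs without the gem; only the
# two hook methods and the core of the load_each flow are reproduced.
class SketchBaseReader
  def load_each(mode, source, scopes, &block)
    load_raw(scopes, &block) if init_source(mode, source)
  end

  def init_source(mode, source)
    raise "Unimplemented method #init_source in data reader #{self.class.name}"
  end

  def load_raw(scopes, &block)
    raise "Unimplemented method #load_raw in data reader #{self.class.name}"
  end
end

# Hypothetical reader for pipe-delimited text, stream mode only.
class PipeReader < SketchBaseReader
  def init_source(mode, source)
    return false unless mode == :stream

    @lines = source.read.lines.map(&:chomp)
    true
  end

  # Scopes are ignored here because the format has a single block of rows.
  def load_raw(_scopes)
    yield @lines.map { |line| line.split('|') }
  end
end

seen = nil
PipeReader.new.load_each(:stream, StringIO.new("id|name\n1|Ada\n"), nil) do |rows|
  seen = rows
end
seen   # => [["id", "name"], ["1", "Ada"]]
```

A real subclass would presumably also call supports_stream! or supports_file! in its constructor so that #load knows which mode to use; that wiring is omitted here.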
#parse_value(val, type) ⇒ Object
Provides default value parsing/coercion for all derived data readers. Attempts to be clever and handle edge cases like converting ‘5.00’ to 5 when in integer mode, etc. If you find your inputs aren’t being parsed correctly, add a custom #parse block on your Column definition.
# File 'lib/iron/import/data_reader.rb', line 219
def parse_value(val, type)
  return nil if val.nil? || val.to_s.strip == ''

  case type
  when :raw then val

  when :string then
    if val.is_a?(Float)
      # Sometimes float values come in for "integer" columns from Excel,
      # so if the user asks for a string, strip off that ".0" if present
      val.to_s.gsub(/\.0+$/, '')
    else
      # Strip whitespace and we're good to go
      val.to_s.strip
    end

  when :integer, :int then
    if val.class < Numeric
      # If numeric, verify that there's no decimal places to worry about
      if (val.to_f % 1.0 == 0.0)
        val.to_i
      else
        nil
      end
    else
      # Convert to string, strip off trailing decimal zeros
      val = val.to_s.strip.gsub(/\.0*$/, '')
      if val.integer?
        val.to_i
      else
        nil
      end
    end

  when :float then
    if val.class < Numeric
      val.to_f
    else
      # Clean up then verify it matches a valid float format & convert
      val = val.to_s.strip
      if val.match(/\A-?[0-9]+(?:\.[0-9]+)?\z/)
        val.to_f
      else
        nil
      end
    end

  when :cents then
    if val.is_a?(String)
      val = val.gsub(/\s*\$\s*/, '')
    end
    intval = parse_value(val, :integer)
    if !val.is_a?(Float) && intval
      intval * 100
    else
      floatval = parse_value(val, :float)
      if floatval
        (floatval * 100).round
      else
        nil
      end
    end

  when :date then
    # Pull out the date part of the string and convert
    date_str = val.to_s.extract(/[0-9]+[\-\/][0-9]+[\-\/][0-9]+/)
    date_str.to_date rescue nil

  when :bool then
    val_str = parse_value(val, :string).to_s.downcase
    if ['true','yes','y','t','1'].include?(val_str)
      return true
    elsif ['false','no','n','f','0'].include?(val_str)
      return false
    else
      nil
    end

  else
    raise "Unknown column type #{type.inspect} - unimplemented?"
  end
end
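To see how the :integer and :cents rules behave on concrete inputs, the sketch below restates them in plain Ruby. It is a simplified illustration, not the method itself: to_integer and to_cents are hypothetical names, and String#integer? (which is not core Ruby) is assumed to mean "optional minus sign followed by digits".

```ruby
# Hypothetical restatement of the :integer rule.
def to_integer(val)
  if val.is_a?(Numeric)
    # Numeric input only converts when there is no fractional part
    (val.to_f % 1.0).zero? ? val.to_i : nil
  else
    # Strip a trailing ".0", ".00", ... and accept only whole numbers
    str = val.to_s.strip.sub(/\.0*\z/, '')
    str.match?(/\A-?\d+\z/) ? str.to_i : nil
  end
end

# Hypothetical restatement of the :cents rule.
def to_cents(val)
  val = val.gsub(/\s*\$\s*/, '') if val.is_a?(String)   # drop "$" and padding
  int = to_integer(val)
  return int * 100 if int && !val.is_a?(Float)

  float =
    if val.is_a?(Numeric)
      val.to_f
    elsif val.to_s.strip.match?(/\A-?[0-9]+(?:\.[0-9]+)?\z/)
      val.to_s.strip.to_f
    end
  float && (float * 100).round
end

to_integer('5.00')    # => 5
to_integer('5.25')    # => nil
to_integer(7.0)       # => 7
to_cents('$ 12.34')   # => 1234
to_cents('$5')        # => 500
to_cents(19.99)       # => 1999
to_cents('abc')       # => nil
```

The final round in the cents path matters because 19.99 * 100 is slightly below 1999 in floating point, so truncating instead would lose a cent.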
#supports?(mode) ⇒ Boolean
# File 'lib/iron/import/data_reader.rb', line 109
def supports?(mode)
  @supports.include?(mode)
end
#supports_file! ⇒ Object
# File 'lib/iron/import/data_reader.rb', line 117
def supports_file!
  @supports << :file
end
#supports_file? ⇒ Boolean
# File 'lib/iron/import/data_reader.rb', line 121
def supports_file?
  supports?(:file)
end
#supports_stream! ⇒ Object
# File 'lib/iron/import/data_reader.rb', line 113
def supports_stream!
  @supports << :stream
end
#supports_stream? ⇒ Boolean
# File 'lib/iron/import/data_reader.rb', line 125
def supports_stream?
  supports?(:stream)
end