Class: CsvReader::ParserStd
- Inherits: Object
  - Object
    - CsvReader::ParserStd
- Defined in: lib/csvreader/parser_std.rb
Constant Summary
- DOUBLE_QUOTE =
char constants
"\""
- SINGLE_QUOTE =
"'"
- BACKSLASH =
use BACKSLASH_ESCAPE ??
"\\"
- COMMENT1 =
use COMMENT_HASH or HASH or ??
"#"
- COMMENT2 =
use COMMENT_PERCENT or PERCENT or ??
"%"
- DIRECTIVE =
use a different name e.g. AT or ??
"@"
- SPACE =
\s == ASCII 32 (dec) = (Space)
" "
- TAB =
\t == ASCII 0x09 (hex) = HT (Tab/horizontal tab)
"\t"
- LF =
\n == ASCII 0x0A (hex) 10 (dec) = LF (Newline/line feed)
"\n"
- CR =
\r == ASCII 0x0D (hex) 13 (dec) = CR (Carriage return)
"\r"
- SEPARATORS =
",;|^:"
Instance Attribute Summary
-
#config ⇒ Object
readonly
todo/fix: change config to proper dialect class/struct - why? why not?.
-
#meta ⇒ Object
readonly
Returns the value of attribute meta.
Class Method Summary
-
.build_logger ⇒ Object
add simple logger with debug flag/switch.
- .logger ⇒ Object
Instance Method Summary
- #check_sep(sep) ⇒ Object
-
#initialize(sep: ',', null: ['\N', 'NA'], numeric: false, nan: nil, space: nil) ⇒ ParserStd
constructor
todo/check: null values - include NA - why? why not? make null values case sensitive or add an option for case sensitive or better allow a proc as option for checking too!!!.
- #logger ⇒ Object
- #nan=(value) ⇒ Object
- #null=(value) ⇒ Object
- #numeric=(value) ⇒ Object
- #parse(str_or_readable, sep: config[:sep], &block) ⇒ Object
-
#sep=(value) ⇒ Object
config convenience helpers, e.g. Csv.default.null = '\N'.
- #space=(value) ⇒ Object
Constructor Details
#initialize(sep: ',', null: ['\N', 'NA'], numeric: false, nan: nil, space: nil) ⇒ ParserStd
todo/check:
null values - include NA - why? why not?
make null values case sensitive or add an option for case sensitive
or better allow a proc as option for checking too!!!
# File 'lib/csvreader/parser_std.rb', line 52

def initialize( sep:     ',',
                null:    ['\N', 'NA'],   ## note: set to nil for no null values / not available (na)
                numeric: false,          ## (auto-)convert all non-quoted values to float
                nan:     nil,            ## note: only if numeric - set mappings for Float::NAN (not a number) values
                space:   nil
              )
  @config = {}   ## todo/fix: change config to proper dialect class/struct - why? why not?

  check_sep( sep )
  @config[:sep] = sep

  ## note: null values must get handled by parser
  ##   only get checked for unquoted strings (and NOT for quoted strings)
  ##   "higher-level" code only knows about strings and has no longer any info if string was quoted or unquoted
  @config[:null]    = null      ## null values
  @config[:numeric] = numeric
  @config[:nan]     = nan       # not a number (NaN) e.g. Float::NAN

  ## e.g. treat/convert char to space e.g. _-+• etc
  ##   Man_Utd => Man Utd
  ##   or use it for leading and trailing spaces without quotes
  ##   todo/check: only use for unquoted values? why? why not?
  @config[:space] = space

  @meta = nil   ## no meta data block (use empty hash {} - why? why not?)
end
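The comments in the constructor describe how the null, numeric and nan options apply to unquoted values only. The following is a minimal sketch of that rule, not the gem's actual conversion code: `convert_field` and `CONFIG` are hypothetical names introduced for illustration, and falling back to the original string when a value is not numeric is an assumption.

```ruby
# Hypothetical helper (NOT part of csvreader): applies the null / numeric / nan
# options to a single field, following the comments in #initialize.
def convert_field( value, quoted:, config: )
  return value if quoted                                   # quoted strings are never checked

  return nil if Array( config[:null] ).include?( value )   # null values (unquoted only)

  if config[:numeric]
    return Float::NAN if Array( config[:nan] ).include?( value )
    number = Float( value, exception: false )              # nil if value is not a valid float
    return number unless number.nil?
  end

  value                                                    # assumption: keep the string as-is
end

# Illustrative settings; the nan mapping is an assumed example value.
CONFIG = { null: ['\N', 'NA'], numeric: true, nan: ['NaN'] }

p convert_field( 'NA',  quoted: false, config: CONFIG )   # => nil
p convert_field( 'NA',  quoted: true,  config: CONFIG )   # => "NA"
p convert_field( '1.5', quoted: false, config: CONFIG )   # => 1.5
p convert_field( 'NaN', quoted: false, config: CONFIG )   # => NaN
```

Passing `null: nil` disables the null check in this sketch, because `Array( nil )` is an empty list.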
Instance Attribute Details
#config ⇒ Object (readonly)
todo/fix: change config to proper dialect class/struct - why? why not?
# File 'lib/csvreader/parser_std.rb', line 44

def config
  @config
end
#meta ⇒ Object (readonly)
Returns the value of attribute meta.
# File 'lib/csvreader/parser_std.rb', line 45

def meta
  @meta
end
Class Method Details
.build_logger ⇒ Object
add simple logger with debug flag/switch
use Parser.debug = true # to turn on
todo/fix: use logutils instead of std logger - why? why not?
# File 'lib/csvreader/parser_std.rb', line 33

def self.build_logger()
  l = Logger.new( STDOUT )
  l.level = :info    ## set to :info on start; note: is 0 (debug) by default
  l
end
.logger ⇒ Object
# File 'lib/csvreader/parser_std.rb', line 38

def self.logger() @@logger ||= build_logger; end
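The two class methods above build one logger lazily and share it across all instances. A self-contained sketch of that pattern with the standard library Logger follows; `LoggerDemo` is a hypothetical stand-in class, not part of csvreader.

```ruby
require 'logger'

# Hypothetical stand-in class that mirrors the logger methods shown above.
class LoggerDemo
  def self.build_logger()
    l = Logger.new( STDOUT )
    l.level = :info          # Logger starts at 0 (debug) unless changed
    l
  end

  # built once on first use, then memoized in a class variable
  def self.logger() @@logger ||= build_logger; end

  # instances delegate to the shared class-level logger
  def logger() self.class.logger; end
end

a = LoggerDemo.new
b = LoggerDemo.new
puts a.logger.equal?( b.logger )    # both instances see the same Logger object
a.logger.debug( "hidden at :info level" )
a.logger.info( "shown at :info level" )
```

Because the logger is shared, changing its level through one instance changes it for every instance of the class.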
Instance Method Details
#check_sep(sep) ⇒ Object
# File 'lib/csvreader/parser_std.rb', line 83

def check_sep( sep )
  ## note: parse does NOT support space or tab as separator!!
  ##   leading and trailing space or tab (whitespace) gets by default trimmed
  ##   unless quoted (or alternative space char used e.g. _-+ if configured)

  if SEPARATORS.include?( sep )
    ## everything ok
  else
    raise ArgumentError, "invalid/unsupported sep >#{sep}< - for now only >#{SEPARATORS}< allowed; sorry"
  end
end
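The check above can be tried in isolation. The sketch below copies the constant and the method body into a standalone script. Because `String#include?` is a substring test, an empty string or a run such as ",;" also passes. `strict_sep?` is a hypothetical stricter variant added for comparison, not something the gem provides.

```ruby
SEPARATORS = ",;|^:"

# Same logic as #check_sep above, as a top-level method.
def check_sep( sep )
  return if SEPARATORS.include?( sep )
  raise ArgumentError, "invalid/unsupported sep >#{sep}< - for now only >#{SEPARATORS}< allowed; sorry"
end

# Hypothetical stricter variant: exactly one character from SEPARATORS.
def strict_sep?( sep )
  sep.is_a?( String ) && sep.length == 1 && SEPARATORS.include?( sep )
end

check_sep( ";" )                 # passes silently

begin
  check_sep( "\t" )              # tab is not supported as separator
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end

check_sep( ",;" )                # also passes: substring of SEPARATORS
puts strict_sep?( ",;" )         # => false
puts strict_sep?( "|" )          # => true
```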
#logger ⇒ Object
# File 'lib/csvreader/parser_std.rb', line 39

def logger() self.class.logger; end
#nan=(value) ⇒ Object
# File 'lib/csvreader/parser_std.rb', line 104

def nan=( value ) @config[:nan]=value; end
#null=(value) ⇒ Object
# File 'lib/csvreader/parser_std.rb', line 102

def null=( value ) @config[:null]=value; end
#numeric=(value) ⇒ Object
# File 'lib/csvreader/parser_std.rb', line 103

def numeric=( value ) @config[:numeric]=value; end
#parse(str_or_readable, sep: config[:sep], &block) ⇒ Object
# File 'lib/csvreader/parser_std.rb', line 110

def parse( str_or_readable, sep: config[:sep], &block )

  check_sep( sep )

  ## note: data - will wrap either a String or IO object passed in data
  ## note: kwargs NOT used for now (but required for "protocol/interface" by other parsers)

  ##   make sure data (string or io) is wrapped into Buffer!!!!!!
  if str_or_readable.is_a?( Buffer )    ### allow (re)use of Buffer if managed from "outside"
    input = str_or_readable
  else
    input = Buffer.new( str_or_readable )
  end

  if block_given?
    parse_lines( input, sep: sep, &block )
  else
    records = []

    parse_lines( input, sep: sep ) do |record|
      records << record
    end

    records
  end
end
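The method above has two calling styles: with a block it yields each record as it is parsed, and without a block it collects all records into an array and returns it. The sketch below reproduces only that dispatch. Its `parse_lines` is a naive stand-in (plain split and strip, no quoting, no Buffer) and does not reflect the gem's real line parser.

```ruby
# Naive stand-in for the parser's internal line loop (assumption: no quoting,
# no comments, no Buffer wrapper); yields one array of fields per line.
def parse_lines( input, sep: )
  input.each_line do |line|
    yield line.chomp.split( sep ).map( &:strip )
  end
end

# Same block-or-array dispatch as #parse above.
def parse( input, sep: ",", &block )
  if block_given?
    parse_lines( input, sep: sep, &block )
  else
    records = []
    parse_lines( input, sep: sep ) { |record| records << record }
    records
  end
end

# without a block: all records are returned at once
p parse( "a, b, c\n1, 2, 3\n" )      # => [["a", "b", "c"], ["1", "2", "3"]]

# with a block: records are handed over one by one
parse( "x;y\n", sep: ";" ) { |record| p record }   # prints ["x", "y"]
```

The block form avoids building the full array, which matters for large inputs.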
#sep=(value) ⇒ Object
config convenience helpers
e.g. use like Csv.default.null = '\N' etc. instead of
Csv.default.config[:null] = '\N'
# File 'lib/csvreader/parser_std.rb', line 100

def sep=( value ) check_sep( value ); @config[:sep]=value; end
#space=(value) ⇒ Object
# File 'lib/csvreader/parser_std.rb', line 105

def space=( value ) @config[:space]=value; end
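The convenience setters above write into the same hash that `#config` exposes, but only `sep=` validates its argument. The sketch below uses `MiniParser`, a hypothetical cut-down class that copies just the relevant methods, to show the difference between the setter and a direct hash write.

```ruby
# Hypothetical cut-down class (NOT the gem's ParserStd); copies only the
# config hash, check_sep and two of the setters shown above.
class MiniParser
  SEPARATORS = ",;|^:"

  attr_reader :config

  def initialize( sep: ',' )
    @config = {}
    check_sep( sep )
    @config[:sep] = sep
  end

  def check_sep( sep )
    return if SEPARATORS.include?( sep )
    raise ArgumentError, "invalid/unsupported sep >#{sep}<"
  end

  def sep=( value )  check_sep( value ); @config[:sep]=value; end
  def null=( value ) @config[:null]=value; end
end

parser = MiniParser.new
parser.sep  = ';'              # validated, then stored
parser.null = '\N'             # stored without validation
p parser.config

begin
  parser.sep = "\t"            # rejected by check_sep
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end

parser.config[:sep] = "\t"     # direct hash write skips the check
p parser.config[:sep]
```

In the class documented here, `#parse` calls `check_sep` again on its `sep:` argument, which defaults to `config[:sep]`, so an invalid separator written directly into the hash would still be rejected at parse time.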