Class: SportDb::Import::TeamReader
- Inherits: Object
  - Object
    - SportDb::Import::TeamReader
- Defined in: lib/sportdb/config/team_reader.rb
Defined Under Namespace
Classes: Team
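This page does not show the nested Team class itself, but TeamReader.parse shows which attributes it reads and writes. Below is a minimal sketch under two assumptions: Team is a plain class with accessors, and alt_names starts out as an empty array. The second assumption follows from parse calling `<<` and `+=` on alt_names without a nil check. The actual definition in the library may differ.

```ruby
module SportDb
  module Import
    class TeamReader
      # Hypothetical sketch of the nested Team record. The attribute names are the
      # ones TeamReader.parse assigns; the library's actual definition may differ.
      class Team
        attr_accessor :name,       # canonical name, e.g. "New York FC (2011-)"
                      :alt_names,  # alternative names (Array of String)
                      :year,       # founding year, e.g. 2011
                      :year_end,   # last year for defunct clubs, e.g. 1914
                      :ground,     # e.g. "Anfield" (from "@ Anfield")
                      :city,       # e.g. "London"
                      :district,   # e.g. "Fulham" (from "London (Fulham)")
                      :geos        # rest of the geo tree after the city

        def initialize
          # parse appends with << and += without nil checks, so start empty
          @alt_names = []
        end
      end
    end
  end
end

team = SportDb::Import::TeamReader::Team.new
team.name = 'New York FC (2011-)'
team.alt_names << 'New York FC'
team.year = 2011
```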
Class Method Summary
- .parse(txt) ⇒ Object
- .read(path) ⇒ Object
  use - rename to read_file or from_file etc.
Class Method Details
.parse(txt) ⇒ Object
```ruby
# File 'lib/sportdb/config/team_reader.rb', line 42

def self.parse( txt )
  recs = []
  last_rec = nil

  txt.each_line do |line|
    line = line.strip

    next if line.empty?
    next if line.start_with?( '#' )   ## skip comments too

    ## strip inline (until end-of-line) comments too
    ##  e.g Eupen => KAS Eupen,    ## [de]
    ##   => Eupen => KAS Eupen,
    line = line.sub( /#.*/, '' ).strip
    pp line

    next if line =~ /^={1,}$/  ## skip "decorative" only heading e.g. ========

    ## note: like in wikimedia markup (and markdown) all optional trailing ==== too
    ##  todo/check: allow === Text =-=-=-=-=-= too - why? why not?
    if line =~ /^(={1,})       ## leading ======
                 ([^=]+?)      ## text (note: for now no "inline" = allowed)
                 =*            ## (optional) trailing ====
                 $/x
      heading_marker = $1
      heading_level  = $1.length   ## count number of = for heading level
      heading        = $2.strip

      puts "heading #{heading_level} >#{heading}<"
      ## skip heading for now
    elsif line.start_with?( '|' )
      ## assume continuation with line of alternative names
      ##  note: skip leading pipe
      values = line[1..-1].split( '|' )   # team names - allow/use pipe(|)
      ## strip and squish (white)spaces
      #   e.g. New York FC      (2011-)  => New York FC (2011-)
      values = values.map { |value| value.strip.gsub( /[ \t]+/, ' ' ) }
      last_rec.alt_names += values

      ## check for duplicates - simple check for now - fix/improve
      ## todo/fix: (auto)remove duplicates - why? why not?
      ## todo/fix: add canonical name too!! might get duplicated in alt names!!!
      count      = last_rec.alt_names.size
      count_uniq = last_rec.alt_names.uniq.size
      if count != count_uniq
        puts
        puts "*** !!! WARN !!! - #{count-count_uniq} duplicate alt name(s):"
        pp last_rec
        ## exit 1
      end
    else
      values = line.split( ',' )

      rec = Team.new
      value = values.shift    ## get first item
      ## strip and squish (white)spaces
      #   e.g. New York FC      (2011-)  => New York FC (2011-)
      value = value.strip.gsub( /[ \t]+/, ' ' )
      rec.name = value   # canonical_name

      ## note:
      ##   check/todo!!!!!!!!!!!!!!!!!-
      ##   strip year if to present e.g. (2011-)
      ##
      ##  do NOT strip for defunct / historic clubs e.g.
      ##    (1899-1910)
      ##   or (-1914) or (-2011) etc.

      ###
      ## todo: move year out of canonical team name - why? why not?

      ## check if canonical name include (2011-) or similar in name
      ##   if yes, remove (2011-) and add to (alt) names
      ## e.g. New York FC (2011) => New York FC
      if rec.name =~ /\(.+?\)/   ## note: use non-greedy (?) match
        name = rec.name.gsub( /\(.+?\)/, '' ).strip
        rec.alt_names << name

        if rec.name =~ /\(([0-9]{4})-\)/   ## e.g. (2014-)
          rec.year = $1.to_i
        elsif rec.name =~ /\(-([0-9]{4})\)/   ## e.g. (-2014)
          rec.year_end = $1.to_i
        elsif rec.name =~ /\(([0-9]{4})-([0-9]{4})\)/   ## e.g. (2011-2014)
          rec.year     = $1.to_i
          rec.year_end = $2.to_i
        else
          ## todo/check: warn about unknown year format
        end
      end

      ## todo/check - check for unknown format values
      ##   e.g. too many values, duplicate years, etc.
      ##     check for overwriting, etc.
      while values.size > 0
        value = values.shift
        ## strip and squish (white)spaces
        #   e.g. León     › Guanajuato  => León › Guanajuato
        value = value.strip.gsub( /[ \t]+/, ' ' )
        if value =~ /^\d{4}$/  # e.g 1904
          ## todo/check: issue warning if year is already set!!!!!!!
          if rec.year
            puts "!!! error - year already set to #{rec.year} - CANNOT overwrite with #{value}:"
            pp rec
            exit 1
          end
          rec.year = value.to_i
        elsif value.start_with?( '@' )  # e.g. @ Anfield
          ## cut-off leading @ and spaces
          rec.ground = value[1..-1].strip
        else
          ## assume city / geo tree
          ## split into geo tree
          geos = value.split( /[<>‹›]/ )  ## note: allow > < or › ‹
          geos = geos.map { |geo| geo.strip }   ## remove all whitespaces
          city = geos[0]
          ## check for "embedded" district e.g. London (Fulham) or Hamburg (St. Pauli) etc.
          if city =~ /\((.+?)\)/  ## note: use non-greedy (?) match
            rec.district = $1.strip
            city = city.gsub( /\(.+?\)/, '' ).strip
          end
          rec.city = city

          if geos.size > 1
            ## cut-off city and keep the rest (of geo tree)
            rec.geos = geos[1..-1]
          end
        end
      end

      last_rec = rec

      ### todo/fix:
      ## auto-add alt name with dots stripped - why? why not?
      ##   e.g. D.C. United => DC United
      ##   e.g. Liverpool F.C. => Liverpool FC
      ##   e.g. St. Albin => St Albin etc.
      ##   e.g. 1. FC Köln => 1 FC Köln  -- make special case for 1. - why? why not?

      ##
      ## todo/fix: unify mapping entries
      ##   always lowercase !!!! (case insensitive)
      ##   always strip (2011-) !!!
      ##   always strip dots (e.g. St., F.C, etc.)

      recs << rec
    end
  end # each_line
  recs
end
```
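To make the line format that parse accepts easier to see, here is a condensed, self-contained sketch of the same rules. It is not the library code. Team is a stand-in Struct, and parse_teams and squish are hypothetical names. Headings are simply skipped instead of being matched with the heading regex. The duplicate-name warning, the "year already set" exit, and the debug output are left out.

```ruby
# Condensed sketch of the line rules in TeamReader.parse (not the library code).
Team = Struct.new(:name, :alt_names, :year, :year_end, :ground, :city, :district, :geos)

# strip and squish runs of spaces/tabs into a single space
def squish(str)
  str.strip.gsub(/[ \t]+/, ' ')
end

def parse_teams(txt)
  recs = []
  txt.each_line do |line|
    line = line.sub(/#.*/, '').strip                # drop whole-line and inline comments
    next if line.empty? || line.start_with?('=')    # simplification: skip all heading lines

    if line.start_with?('|')                        # continuation: more alternative names
      raise ArgumentError, "continuation line before any team: #{line}" if recs.empty?
      recs.last.alt_names += line[1..-1].split('|').map { |v| squish(v) }
      next
    end

    values = line.split(',')
    rec = Team.new(squish(values.shift), [])
    if rec.name =~ /\(.+?\)/                        # year qualifier in the canonical name
      rec.alt_names << rec.name.gsub(/\(.+?\)/, '').strip
      if rec.name =~ /\((\d{4})-\)/                 # (2011-): founding year only
        rec.year = $1.to_i
      elsif rec.name =~ /\(-(\d{4})\)/              # (-1914): last year only
        rec.year_end = $1.to_i
      elsif rec.name =~ /\((\d{4})-(\d{4})\)/       # (1899-1910): both
        rec.year, rec.year_end = $1.to_i, $2.to_i
      end
    end

    values.each do |value|
      value = squish(value)
      if value =~ /\A\d{4}\z/                       # founding year, e.g. 1904
        rec.year = value.to_i
      elsif value.start_with?('@')                  # ground, e.g. @ Anfield
        rec.ground = value[1..-1].strip
      else                                          # geo tree, first element is the city
        geos = value.split(/[<>‹›]/).map(&:strip)
        city = geos[0]
        if city =~ /\((.+?)\)/                      # embedded district, e.g. London (Fulham)
          rec.district = $1.strip
          city = city.gsub(/\(.+?\)/, '').strip
        end
        rec.city = city
        rec.geos = geos[1..-1] if geos.size > 1
      end
    end
    recs << rec
  end
  recs
end

txt = <<~TXT
  = England =
  Fulham FC, 1879, @ Craven Cottage, London (Fulham) › England
    | Fulham | FC Fulham
  New York FC (2011-), New York › NY › United States   # inline comment
TXT

teams = parse_teams(txt)
teams.each { |t| p t.to_h }
```

Each comma-separated value after the name is classified by its shape. Four digits are a founding year, and a leading `@` marks the ground. Anything else is treated as a geo tree whose first element is the city, optionally with a district in parentheses.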
.read(path) ⇒ Object
use - rename to read_file or from_file etc. - why? why not?
```ruby
# File 'lib/sportdb/config/team_reader.rb', line 36

def self.read( path )   ## use - rename to read_file or from_file etc. - why? why not?
  ## note: block form closes the file handle once the text has been read
  txt = File.open( path, 'r:utf-8' ) { |f| f.read }
  parse( txt )
end
```
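read only decodes the file as UTF-8 and passes the text to parse. Decoding explicitly matters here because the geo separators › and ‹ and many club names (e.g. 1. FC Köln) are non-ASCII. Below is a minimal sketch of that step, using a hypothetical helper named read_text and a throwaway file in a temporary directory. `File.read` with the `encoding:` option is used as a one-call equivalent of `File.open( path, 'r:utf-8' ) { |f| f.read }`.

```ruby
require 'tmpdir'

# Hypothetical helper: opens, reads, and closes the file in one call,
# decoding the bytes as UTF-8.
def read_text(path)
  File.read(path, encoding: 'utf-8')
end

txt, geo = Dir.mktmpdir do |dir|
  path = File.join(dir, 'clubs.txt')
  File.write(path, "1. FC Köln, Köln › Nordrhein-Westfalen\n")

  text = read_text(path)
  # split the second value on the same separator class that parse uses
  [text, text.split(',')[1].split(/[<>‹›]/).map(&:strip)]
end

puts txt.encoding   # UTF-8
p geo               # ["Köln", "Nordrhein-Westfalen"]
```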