Module: Krikri::Util::ExtendedDateParser
Defined in: lib/krikri/util/extended_date_parser.rb
Overview
Utilities to parse string values into EDTF dates or Intervals.
Class Method Summary
- .circa(str) ⇒ Date?
  Removes 'circa', 'about', or variations and returns an uncertain EDTF date.
- .decade_hyphen(str) ⇒ Object
  e.g. 199-
- .decade_s(str) ⇒ Object
  e.g. 1990s
- .hyphenated_partial_range(str) ⇒ Object
  e.g. 1990-92
- .month_year(str) ⇒ Object
  e.g. 01-2045
- .parse(date_str, allow_interval = false) ⇒ Date, ...
  Attempts to parse a string into a valid EDTF or `Date` format.
- .parse_date(*args) ⇒ Date?
  Runs `Date.parse`; returns `nil` if the arguments are invalid (as with an invalid date string).
- .parse_interval(str) ⇒ EDTF::Interval?
  Creates an EDTF::Interval from a string.
- .parse_m_d_y(value) ⇒ Date?
  Runs `Date.strptime` with '%m-%d-%Y'; returns `nil` if the arguments are invalid (as with an invalid date string).
- .partial_edtf(str) ⇒ Object
  e.g. 1970-08-01/02 or 1970-12/10
- .preprocess(str) ⇒ Object
  Preprocesses the date string to remove extra whitespace and convert ad hoc formatting to equivalent EDTF.
- .range_match(str) ⇒ Array(String)
  Matches a wide variety of date ranges separated by '..' or '-'.
Class Method Details
.circa(str) ⇒ Date?
Removes 'circa', 'about', or variations and returns an uncertain EDTF date.
    # File 'lib/krikri/util/extended_date_parser.rb', line 86
    def circa(str)
      run = str.gsub!(/.*c[irca\.]*/i, '')
      run ||= str.gsub!(/.*about/i, '')
      date = parse(str) if run

      return nil if date.nil?

      # The EDTF grammar does not support uncertainty on masked precision dates
      if date.respond_to? :uncertain!
        date.uncertain!
      elsif date.is_a? EDTF::Interval
        # Interval uncertainty is scoped to the begin and end dates;
        # to be safe, we mark both.
        date.from = date.from.uncertain! if date.from.respond_to? :uncertain!
        date.to = date.to.uncertain! if date.to.respond_to? :uncertain!
      end

      date
    end
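The prefix-stripping step of `circa` can be tried in isolation. The following is a minimal sketch using only core Ruby; `strip_circa` is a hypothetical helper name, it returns a new string instead of mutating its argument, and it does not build EDTF objects or mark anything uncertain.

```ruby
# Hypothetical helper (not part of Krikri): strips a leading "circa", "ca.",
# "c." or "about" marker and reports whether one was found.
def strip_circa(str)
  # Same patterns as the listing, anchored and applied non-destructively.
  stripped = str.sub(/\A.*c[irca.]*/i, '')
  stripped = str.sub(/\A.*about/i, '') if stripped == str
  found = stripped != str
  [stripped.strip, found]
end

strip_circa('circa 1920') # => ["1920", true]
strip_circa('c. 1900')    # => ["1900", true]
strip_circa('1990')       # => ["1990", false]
```

Because the first pattern matches any `c` in the string, everything up to the last `c` is discarded, which is worth keeping in mind for inputs that contain other words.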
.decade_hyphen(str) ⇒ Object
e.g. 199-
    # File 'lib/krikri/util/extended_date_parser.rb', line 189
    def decade_hyphen(str)
      /^(\d{3})-$/.match(str) do |m|
        Date.edtf("#{m[1]}x")
      end
    end
.decade_s(str) ⇒ Object
e.g. 1990s
    # File 'lib/krikri/util/extended_date_parser.rb', line 181
    def decade_s(str)
      /^(\d{3})0s$/.match(str) do |m|
        Date.edtf("#{m[1]}x")
      end
    end
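Both `decade_hyphen` and `decade_s` reduce a decade to the same masked-precision string before handing it to `Date.edtf`. As a rough illustration, the sketch below builds only that string; `decade_edtf_string` is a hypothetical name, and the EDTF gem is deliberately not involved.

```ruby
# Hypothetical helper (not part of Krikri): returns the masked-precision
# string for a decade written as "1990s" or "199-", or nil otherwise.
def decade_edtf_string(str)
  match = /\A(\d{3})(?:0s|-)\z/.match(str)
  match && "#{match[1]}x"
end

decade_edtf_string('1990s') # => "199x"
decade_edtf_string('199-')  # => "199x"
decade_edtf_string('1990')  # => nil
```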
.hyphenated_partial_range(str) ⇒ Object
e.g. 1990-92
    # File 'lib/krikri/util/extended_date_parser.rb', line 165
    def hyphenated_partial_range(str)
      /^(\d{2})(\d{2})-(\d{2})$/.match(str) do |m|
        Date.edtf("#{m[1]}#{m[2]}/#{m[1]}#{m[3]}")
      end
    end
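To see which interval string `hyphenated_partial_range` passes to `Date.edtf`, the expansion can be sketched on its own. `expand_partial_range` is a hypothetical name; the sketch returns the string and leaves EDTF parsing out.

```ruby
# Hypothetical helper (not part of Krikri): expands "1990-92" into the
# EDTF interval string "1990/1992" by reusing the first two digits.
def expand_partial_range(str)
  match = /\A(\d{2})(\d{2})-(\d{2})\z/.match(str)
  match && "#{match[1]}#{match[2]}/#{match[1]}#{match[3]}"
end

expand_partial_range('1990-92') # => "1990/1992"
expand_partial_range('1998-02') # => "1998/1902"
```

The century digits are copied verbatim, so this pattern on its own does not handle a range that crosses a century boundary.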
.month_year(str) ⇒ Object
e.g. 01-2045
    # File 'lib/krikri/util/extended_date_parser.rb', line 157
    def month_year(str)
      /^(\d{2})-(\d{4})$/.match(str) do |m|
        Date.edtf("#{m[2]}-#{m[1]}")
      end
    end
.parse(date_str, allow_interval = false) ⇒ Date, ...
Attempts to parse a string into a valid EDTF or `Date` format.
- Attempts to split `#providedLabel` on '-', '/', '..', 'to', 'until', and
looks for EDTF and `Date.parse` patterns on either side, setting them to
`#begin` and `#end`. Both split and unsplit dates are parsed as follows:
- Attempts to parse `#providedLabel` as an EDTF interval and populates
begin and end with their respective values.
- Attempts to match to a number of regular expressions which specify
ranges informally.
- Attempts to parse `#providedLabel` as a single date value with
`Date.parse` and enters that value to both `#begin` and `#end`.
    # File 'lib/krikri/util/extended_date_parser.rb', line 26
    def parse(date_str, allow_interval = false)
      str = preprocess(date_str.dup)
      date = parse_interval(str) if allow_interval
      date ||= parse_m_d_y(str)
      date ||= Date.edtf(str.gsub('.', '-'))
      date ||= partial_edtf(str)
      date ||= decade_hyphen(str)
      date ||= month_year(str)
      date ||= decade_s(str)
      date ||= hyphenated_partial_range(str)
      date ||= parse_date(str)
      # Only do this if certain letters are present to avoid infinite loops.
      date ||= circa(str) if str.match(/[circabout]/i)
      date = date.first if date.is_a? EDTF::Set
      date || nil
    end
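The `||=` chain in `parse` works because every helper returns `nil` on failure, so the first parser that succeeds wins and the rest are skipped. The sketch below shows the same pattern with two standard-library parsers only; `try_m_d_y`, `try_iso`, and `first_successful_parse` are hypothetical names, and the real method tries many more formats, including EDTF.

```ruby
require 'date'

# Hypothetical stand-ins for the real helpers; each returns a Date or nil.
def try_m_d_y(str)
  Date.strptime(str, '%m-%d-%Y')
rescue ArgumentError
  nil
end

def try_iso(str)
  Date.iso8601(str)
rescue ArgumentError
  nil
end

# Tries each parser in order and keeps the first non-nil result.
def first_successful_parse(str)
  date = try_m_d_y(str)
  date ||= try_iso(str)
  date
end

first_successful_parse('02-03-2001').to_s # => "2001-02-03"
first_successful_parse('2001-02-03').to_s # => "2001-02-03"
first_successful_parse('nonsense')        # => nil
```

Ordering matters in such a chain: stricter formats go first so that a lenient parser does not claim a string that a more specific one would have read differently.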
.parse_date(*args) ⇒ Date?
Runs `Date.parse`; returns `nil` if the arguments are invalid (as with an invalid date string).
    # File 'lib/krikri/util/extended_date_parser.rb', line 132
    def parse_date(*args)
      begin
        Date.parse(*args)
      rescue ArgumentError
        nil
      end
    end
.parse_interval(str) ⇒ EDTF::Interval?
Creates an EDTF::Interval from a string.
    # File 'lib/krikri/util/extended_date_parser.rb', line 114
    def parse_interval(str)
      match = range_match(str)
      return nil if match.nil?

      begin_date, end_date = match.map { |date| parse(date) || :unknown }

      begin_date = begin_date.first if begin_date.respond_to? :first
      end_date = end_date.last if end_date.respond_to? :last

      EDTF::Interval.new(begin_date, end_date)
    end
.parse_m_d_y(value) ⇒ Date?
Runs `Date.strptime` with '%m-%d-%Y'; returns `nil` if the arguments are invalid (as with an invalid date string).
    # File 'lib/krikri/util/extended_date_parser.rb', line 147
    def parse_m_d_y(value)
      begin
        Date.strptime(value.gsub(/[^0-9]/, '-'), '%m-%d-%Y')
      rescue ArgumentError
        nil
      end
    end
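The `gsub` in `parse_m_d_y` turns every non-digit into a hyphen, which is what lets a single format string accept slashes, dots, or hyphens as separators. A small standalone sketch, using the hypothetical name `m_d_y_to_date` and only the standard `date` library:

```ruby
require 'date'

# Hypothetical helper (not part of Krikri): normalizes separators to "-"
# and parses month-day-year input, returning nil on failure.
def m_d_y_to_date(value)
  normalized = value.gsub(/[^0-9]/, '-')
  Date.strptime(normalized, '%m-%d-%Y')
rescue ArgumentError
  nil
end

m_d_y_to_date('12/31/1999').to_s # => "1999-12-31"
m_d_y_to_date('12.31.1999').to_s # => "1999-12-31"
m_d_y_to_date('1999-12-31')      # => nil
```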
.partial_edtf(str) ⇒ Object
e.g. 1970-08-01/02 or 1970-12/10
    # File 'lib/krikri/util/extended_date_parser.rb', line 173
    def partial_edtf(str)
      /^(\d{4}(-\d{2})*)-(\d{2})\/(\d{2})$/.match(str) do |m|
        Date.edtf("#{m[1]}-#{m[3]}/#{m[1]}-#{m[4]}")
      end
    end
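`partial_edtf` repeats the shared prefix on both sides of the slash to form a full EDTF interval string. The sketch below reproduces only that string expansion; `expand_partial_edtf` is a hypothetical name, it uses a non-capturing group so the match indices differ from the listing, and it does not call `Date.edtf`.

```ruby
# Hypothetical helper (not part of Krikri): expands an abbreviated interval
# such as "1970-08-01/02" by copying the shared prefix to the end value.
def expand_partial_edtf(str)
  match = %r{\A(\d{4}(?:-\d{2})*)-(\d{2})/(\d{2})\z}.match(str)
  match && "#{match[1]}-#{match[2]}/#{match[1]}-#{match[3]}"
end

expand_partial_edtf('1970-08-01/02') # => "1970-08-01/1970-08-02"
expand_partial_edtf('1970-10/12')    # => "1970-10/1970-12"
expand_partial_edtf('1970')          # => nil
```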
.preprocess(str) ⇒ Object
Open question: should `-` be interpreted as 'x' or '?'?
Preprocesses the date string to remove extra whitespace and convert ad hoc formatting to equivalent EDTF.
    # File 'lib/krikri/util/extended_date_parser.rb', line 69
    def preprocess(str)
      str.gsub!(/late/i, '')
      str.gsub!(/early/i, '')
      str.strip!
      str.gsub!(/\s+/, ' ')
      str.gsub!('0s', 'x') if str.match(/^[1-9]+0s$/)
      str.gsub!('-', 'x') if str.match(/^[1-9]+\-+$/)
      str
    end
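The effect of `preprocess` on a few inputs is easier to follow with a non-mutating version. This is a sketch under that assumption; `normalize_date_string` is a hypothetical name, and unlike the listing it returns a new string and leaves its argument untouched.

```ruby
# Hypothetical helper (not part of Krikri): applies the same substitutions
# as the listing, but without modifying the caller's string.
def normalize_date_string(str)
  result = str.gsub(/late/i, '').gsub(/early/i, '').strip.gsub(/\s+/, ' ')
  result = result.gsub('0s', 'x') if result.match?(/\A[1-9]+0s\z/)
  result = result.gsub('-', 'x') if result.match?(/\A[1-9]+-+\z/)
  result
end

normalize_date_string('late 1990s')          # => "199x"
normalize_date_string('  early   May 1920 ') # => "May 1920"
normalize_date_string('199-')                # => "199x"
normalize_date_string('1900s')               # => "1900s"
```

The digit class `[1-9]` excludes zero, so a string such as "1900s" passes through unchanged at this stage; in `parse` it is picked up later by `decade_s`.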
.range_match(str) ⇒ Array(String)
Matches a wide variety of date ranges separated by '..' or '-'.
    # File 'lib/krikri/util/extended_date_parser.rb', line 49
    def range_match(str)
      str = str.gsub('to', '-').gsub('until', '-')
      regexp = %r{
        ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
        \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
        \s*[-\.]+\s*
        ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
        \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
      }x
      regexp.match(str) do |m|
        [m[1], m[2]]
      end
    end
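A few simple inputs show how the range pattern splits a string into its two endpoints. The sketch copies the pattern from the listing into a constant and wraps it in a standalone method; `RANGE_PATTERN` and `split_range` are hypothetical names used here for illustration.

```ruby
# Pattern copied from the listing above; the names are illustrative only.
RANGE_PATTERN = %r{
  ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
  \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
  \s*[-\.]+\s*
  ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
  \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
}x

# Rewrites "to"/"until" as "-" and returns the two endpoints, or nil.
def split_range(str)
  normalized = str.gsub('to', '-').gsub('until', '-')
  match = RANGE_PATTERN.match(normalized)
  match && [match[1], match[2]]
end

split_range('1990 - 1999')     # => ["1990", "1999"]
split_range('1990 to 1999')    # => ["1990", "1999"]
split_range('1990 until 1999') # => ["1990", "1999"]
split_range('undated')         # => nil
```

The 'to' substitution is a plain substring replacement, so it also rewrites words that merely contain those letters: `'October'.gsub('to', '-')` yields `"Oc-ber"`.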