Module: Krikri::Util::ExtendedDateParser
Defined in: lib/krikri/util/extended_date_parser.rb
Overview
Utilities to parse string values into EDTF dates or Intervals.
Class Method Summary
- .circa(str) ⇒ Date?
  Removes ‘circa’, ‘about’, or variations and returns an uncertain EDTF date.
- .decade_hyphen(str) ⇒ Object
  e.g. 199-
- .decade_s(str) ⇒ Object
  e.g. 1990s
- .hyphenated_partial_range(str) ⇒ Object
  e.g. 1990-92
- .month_year(str) ⇒ Object
  e.g. 01-2045
- .parse(date_str, allow_interval = false) ⇒ Date, ...
  Attempts to parse a string into a valid EDTF or `Date` format.
- .parse_date(*args) ⇒ Date?
  Runs `Date.parse`; if the arguments are invalid (as with an invalid date string), returns `nil`.
- .parse_interval(str) ⇒ EDTF::Interval?
  Creates an EDTF::Interval from a string.
- .parse_m_d_y(value) ⇒ Date?
  Runs `Date.strptime` with '%m-%d-%Y'; if the arguments are invalid (as with an invalid date string), returns `nil`.
- .partial_edtf(str) ⇒ Object
  e.g. 1970-08-01/02 or 1970-12/10
- .preprocess(str) ⇒ Object
  Preprocesses the date string to remove extra whitespace and convert ad hoc formatting to equivalent EDTF.
- .range_match(str) ⇒ Array(String)
  Matches a wide variety of date ranges separated by ‘..’ or ‘-’.
Class Method Details
.circa(str) ⇒ Date?
Removes ‘circa’, ‘about’, or variations and returns an uncertain EDTF date.
    # File 'lib/krikri/util/extended_date_parser.rb', line 86
    def circa(str)
      run = str.gsub!(/.*c[irca\.]*/i, '')
      run ||= str.gsub!(/.*about/i, '')
      date = parse(str) if run
      date.nil? ? nil : date.uncertain!
    end
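The prefix stripping in `.circa` can be tried in isolation. The following is a minimal sketch using only core Ruby; `strip_circa_marker` is a hypothetical helper, not part of Krikri, and it stops before the call to `parse` and `uncertain!`.

```ruby
# Hypothetical helper that mirrors only the prefix-stripping step of .circa.
# Returns the remaining text, or nil when no marker was found.
def strip_circa_marker(input)
  str = input.dup
  # gsub! returns nil when nothing was replaced, so the result doubles as a flag.
  stripped = str.gsub!(/.*c[irca\.]*/i, '')
  stripped ||= str.gsub!(/.*about/i, '')
  stripped ? str.strip : nil
end

strip_circa_marker('circa 1920') # => "1920"
strip_circa_marker('c. 1920')    # => "1920"
strip_circa_marker('about 1920') # => "1920"
strip_circa_marker('1920')       # => nil
```

Because both patterns begin with a greedy `.*`, everything before the marker is discarded along with the marker itself.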
.decade_hyphen(str) ⇒ Object
e.g. 199-
    # File 'lib/krikri/util/extended_date_parser.rb', line 176
    def decade_hyphen(str)
      /^(\d{3})-$/.match(str) do |m|
        Date.edtf("#{m[1]}x")
      end
    end
.decade_s(str) ⇒ Object
e.g. 1990s
    # File 'lib/krikri/util/extended_date_parser.rb', line 168
    def decade_s(str)
      /^(\d{3})0s$/.match(str) do |m|
        Date.edtf("#{m[1]}x")
      end
    end
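Both `.decade_hyphen` and `.decade_s` build the same kind of EDTF string before handing it to `Date.edtf`. The sketch below shows only that string rewriting, so it runs without the edtf gem; `decade_edtf_string` is a hypothetical name used for illustration.

```ruby
# Hypothetical helper: returns the string that would be passed to Date.edtf,
# or nil when neither decade pattern matches.
def decade_edtf_string(str)
  if (m = /^(\d{3})-$/.match(str))      # decade_hyphen form, e.g. "199-"
    "#{m[1]}x"
  elsif (m = /^(\d{3})0s$/.match(str))  # decade_s form, e.g. "1990s"
    "#{m[1]}x"
  end
end

decade_edtf_string('199-')  # => "199x"
decade_edtf_string('1990s') # => "199x"
decade_edtf_string('1995s') # => nil
```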
.hyphenated_partial_range(str) ⇒ Object
e.g. 1990-92
    # File 'lib/krikri/util/extended_date_parser.rb', line 152
    def hyphenated_partial_range(str)
      /^(\d{2})(\d{2})-(\d{2})$/.match(str) do |m|
        Date.edtf("#{m[1]}#{m[2]}/#{m[1]}#{m[3]}")
      end
    end
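To see how the abbreviated end year is expanded, the sketch below reproduces the capture-group interpolation without calling `Date.edtf`. `expand_partial_range` is a hypothetical helper, not part of the module.

```ruby
# Hypothetical helper: builds the EDTF interval string from a "YYYY-YY" input
# by reusing the century digits (first capture) for the end year.
def expand_partial_range(str)
  /^(\d{2})(\d{2})-(\d{2})$/.match(str) do |m|
    "#{m[1]}#{m[2]}/#{m[1]}#{m[3]}"
  end
end

expand_partial_range('1990-92')   # => "1990/1992"
expand_partial_range('1990-1992') # => nil
```

`Regexp#match` with a block returns the block's value on a match and `nil` otherwise, which is what lets the method slot into a `||=` fallback chain.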
.month_year(str) ⇒ Object
e.g. 01-2045
    # File 'lib/krikri/util/extended_date_parser.rb', line 144
    def month_year(str)
      /^(\d{2})-(\d{4})$/.match(str) do |m|
        Date.edtf("#{m[2]}-#{m[1]}")
      end
    end
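The reordering from month-first to year-first can be sketched without the edtf gem. `month_year_edtf_string` is a hypothetical helper that returns the string `.month_year` would pass to `Date.edtf`.

```ruby
# Hypothetical helper: swaps "MM-YYYY" into the "YYYY-MM" order EDTF expects.
def month_year_edtf_string(str)
  /^(\d{2})-(\d{4})$/.match(str) do |m|
    "#{m[2]}-#{m[1]}"
  end
end

month_year_edtf_string('01-2045') # => "2045-01"
month_year_edtf_string('2045-01') # => nil
```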
.parse(date_str, allow_interval = false) ⇒ Date, ...
Attempts to parse a string into a valid EDTF or `Date` format.
- Attempts to split `#providedLabel` on '-', '/', '..', 'to', 'until', and
  looks for EDTF and `Date.parse` patterns on either side, setting them to
  `#begin` and `#end`. Both split and unsplit dates are parsed as follows:
  - Attempts to parse `#providedLabel` as an EDTF interval and populates
    begin and end with their respective values.
  - Attempts to match a number of regular expressions that specify
    ranges informally.
  - Attempts to parse `#providedLabel` as a single date value with
    `Date.parse` and enters that value to both `#begin` and `#end`.
    # File 'lib/krikri/util/extended_date_parser.rb', line 26
    def parse(date_str, allow_interval = false)
      str = preprocess(date_str.dup)
      date = parse_interval(str) if allow_interval
      date ||= parse_m_d_y(str)
      date ||= Date.edtf(str.gsub('.', '-'))
      date ||= partial_edtf(str)
      date ||= decade_hyphen(str)
      date ||= month_year(str)
      date ||= decade_s(str)
      date ||= hyphenated_partial_range(str)
      date ||= parse_date(str)
      # Only do this if certain letters are present to avoid infinite loops.
      date ||= circa(str) if str.match(/[circabout]/i)
      date = date.first if date.is_a? EDTF::Set
      date || nil
    end
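One detail of the listing is easy to misread: the guard in front of `circa` is a character class, so it fires when the string contains any one of the letters c, i, r, a, b, o, u, or t, not only the words ‘circa’ or ‘about’. A small core-Ruby check (the constant name `CIRCA_GUARD` is made up for this sketch):

```ruby
# The same pattern used in the guard, bound to a hypothetical constant.
CIRCA_GUARD = /[circabout]/i

CIRCA_GUARD.match?('circa 1920') # => true
CIRCA_GUARD.match?('Oct 1920')   # => true  (contains "O", "c" and "t")
CIRCA_GUARD.match?('1920-05')    # => false (no letters at all)
```

Purely numeric strings therefore never reach `circa`, while strings containing one of those letters do once every earlier strategy has returned `nil`.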
.parse_date(*args) ⇒ Date?
Runs `Date.parse`; if the arguments are invalid (as with an invalid date string), returns `nil`.
    # File 'lib/krikri/util/extended_date_parser.rb', line 119
    def parse_date(*args)
      begin
        Date.parse(*args)
      rescue ArgumentError
        nil
      end
    end
.parse_interval(str) ⇒ ETDF::Interval?
Creates an EDTF::Interval from a string.
    # File 'lib/krikri/util/extended_date_parser.rb', line 101
    def parse_interval(str)
      match = range_match(str)
      return nil if match.nil?

      begin_date, end_date = match.map { |date| parse(date) || :unknown }

      begin_date = begin_date.first if begin_date.respond_to? :first
      end_date = end_date.last if end_date.respond_to? :last

      EDTF::Interval.new(begin_date, end_date)
    end
.parse_m_d_y(value) ⇒ Date?
Runs `Date.strptime` with '%m-%d-%Y'; if the arguments are invalid (as with an invalid date string), returns `nil`.
    # File 'lib/krikri/util/extended_date_parser.rb', line 134
    def parse_m_d_y(value)
      begin
        Date.strptime(value.gsub(/[^0-9]/, '-'), '%m-%d-%Y')
      rescue ArgumentError
        nil
      end
    end
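Because every non-digit character is rewritten to `-` before `Date.strptime` runs, the separator in the input does not matter. The sketch below uses only the standard `date` library; `parse_m_d_y_sketch` is a hypothetical stand-in with the same body as the method above.

```ruby
require 'date'

# Hypothetical stand-in: normalises separators, then parses month-day-year.
def parse_m_d_y_sketch(value)
  Date.strptime(value.gsub(/[^0-9]/, '-'), '%m-%d-%Y')
rescue ArgumentError
  # An invalid date string is reported as nil instead of raising.
  nil
end

parse_m_d_y_sketch('12/25/2001') # parses as 25 December 2001
parse_m_d_y_sketch('12.25.2001') # same date, different separator
parse_m_d_y_sketch('25/12/2001') # => nil, since 25 is not a valid month
```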
.partial_edtf(str) ⇒ Object
e.g. 1970-08-01/02 or 1970-12/10
    # File 'lib/krikri/util/extended_date_parser.rb', line 160
    def partial_edtf(str)
      /^(\d{4}(-\d{2})*)-(\d{2})\/(\d{2})$/.match(str) do |m|
        Date.edtf("#{m[1]}-#{m[3]}/#{m[1]}-#{m[4]}")
      end
    end
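The pattern treats everything before the last hyphenated component as a shared prefix and repeats it on both sides of the slash. A sketch of that expansion, returning the string rather than calling `Date.edtf` (`expand_partial_edtf` is a hypothetical name):

```ruby
# Hypothetical helper: m[1] is the shared prefix, m[3] and m[4] are the
# differing final components on each side of the slash.
def expand_partial_edtf(str)
  %r{^(\d{4}(-\d{2})*)-(\d{2})/(\d{2})$}.match(str) do |m|
    "#{m[1]}-#{m[3]}/#{m[1]}-#{m[4]}"
  end
end

expand_partial_edtf('1970-08-01/02') # => "1970-08-01/1970-08-02"
expand_partial_edtf('1970-12/10')    # => "1970-12/1970-10"
expand_partial_edtf('1970-08')       # => nil
```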
.preprocess(str) ⇒ Object
Should `-` be interpreted as 'x' or '?'?
Preprocess the date string to remove extra whitespace and convert ad hoc formatting to equivalent EDTF.
    # File 'lib/krikri/util/extended_date_parser.rb', line 69
    def preprocess(str)
      str.gsub!(/late/i, '')
      str.gsub!(/early/i, '')
      str.strip!
      str.gsub!(/\s+/, ' ')
      str.gsub!('0s', 'x') if str.match(/^[1-9]+0s$/)
      str.gsub!('-', 'x') if str.match(/^[1-9]+\-+$/)
      str
    end
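The method uses only core `String` operations, so its effect on sample inputs can be checked directly. The sketch below copies the body into a hypothetical `preprocess_sketch` that works on a duplicate instead of mutating its argument.

```ruby
# Hypothetical non-mutating copy of the preprocessing steps.
def preprocess_sketch(input)
  str = input.dup
  str.gsub!(/late/i, '')
  str.gsub!(/early/i, '')
  str.strip!
  str.gsub!(/\s+/, ' ')
  str.gsub!('0s', 'x') if str.match(/^[1-9]+0s$/)
  str.gsub!('-', 'x') if str.match(/^[1-9]+\-+$/)
  str
end

preprocess_sketch('  early   1990s ') # => "199x"
preprocess_sketch('late May  1920')   # => "May 1920"
preprocess_sketch('199-')             # => "199x"
preprocess_sketch('19--')             # => "19xx"
preprocess_sketch('2000s')            # => "2000s"
```

The last call is left unchanged because `[1-9]+` does not match a zero, so a decade such as "2000s" is not rewritten at this stage.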
.range_match(str) ⇒ Array(String)
Matches a wide variety of date ranges separated by ‘..’ or ‘-’.
    # File 'lib/krikri/util/extended_date_parser.rb', line 49
    def range_match(str)
      str = str.gsub('to', '-').gsub('until', '-')
      regexp = %r{
        ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
        \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
        \s*[-\.]+\s*
        ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
        \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
      }x
      regexp.match(str) do |m|
        [m[1], m[2]]
      end
    end
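Since the method relies only on `String#gsub` and a regular expression, it can be exercised without the rest of Krikri. The sketch below copies it into a hypothetical `range_match_sketch` and tries a few simple inputs.

```ruby
# Hypothetical standalone copy of the range-splitting logic.
def range_match_sketch(str)
  # 'to' and 'until' are rewritten to '-' before matching.
  str = str.gsub('to', '-').gsub('until', '-')
  regexp = %r{
    ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
    \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
    \s*[-\.]+\s*
    ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
    \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
  }x
  regexp.match(str) { |m| [m[1], m[2]] }
end

range_match_sketch('1990 - 1995')  # => ["1990", "1995"]
range_match_sketch('1990 to 1995') # => ["1990", "1995"]
range_match_sketch('1990-1995')    # => ["1990", "1995"]
range_match_sketch('1920')         # => nil

# The 'to' substitution is a plain substring replacement:
'October'.gsub('to', '-')          # => "Oc-ber"
```

The final line shows that the substitution also applies inside longer words, which is worth keeping in mind when the input contains month names.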