Module: Krikri::Util::ExtendedDateParser

Defined in:
lib/krikri/util/extended_date_parser.rb

Overview

Utilities to parse string values into EDTF dates or Intervals.

Class Method Details

.circa(str) ⇒ Date?

Removes ‘circa’, ‘about’, or variations from the string and returns an uncertain EDTF date.

Parameters:

  • str (String)

Returns:

  • (Date, nil)

    an EDTF date, marked uncertain; or `nil`

See Also:

  • #parse


# File 'lib/krikri/util/extended_date_parser.rb', line 86

def circa(str)
  run = str.gsub!(/.*c[irca\.]*/i, '')
  run ||= str.gsub!(/.*about/i, '')
  date = parse(str) if run

  return nil if date.nil?

  # The EDTF grammar does not support uncertainty on masked precision dates
  if date.respond_to? :uncertain!
    date.uncertain!
  elsif date.is_a? EDTF::Interval
    # Interval uncertainty is scoped to the begin and end dates;
    # to be safe, we mark both.
    date.from = date.from.uncertain! if date.from.respond_to? :uncertain!
    date.to   = date.to.uncertain!   if date.to.respond_to? :uncertain!
  end

  date
end
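
The prefix stripping can be tried in isolation. The following is a minimal sketch that reuses only the two regular expressions from the method above; `strip_circa` is a hypothetical helper name, and the EDTF parsing and uncertainty marking are deliberately left out.

```ruby
# Hypothetical helper (not part of Krikri): applies the same two substitutions
# as .circa and returns the remaining text, or nil when nothing was stripped.
def strip_circa(str)
  copy = str.dup # .circa mutates its argument; this sketch works on a copy
  stripped = copy.gsub!(/.*c[irca\.]*/i, '') # gsub! returns nil if no match
  stripped ||= copy.gsub!(/.*about/i, '')
  stripped ? copy.strip : nil
end

strip_circa('circa 1990') # => "1990"
strip_circa('ca. 1920')   # => "1920"
strip_circa('about 1850') # => "1850"
strip_circa('1990')       # => nil
```

Both patterns begin with a greedy `.*`, so any text before the keyword is discarded along with it.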

.decade_hyphen(str) ⇒ Object

Parses a decade written with a trailing hyphen, e.g. 199-, into the masked-precision EDTF date 199x.



# File 'lib/krikri/util/extended_date_parser.rb', line 189

def decade_hyphen(str)
  /^(\d{3})-$/.match(str) do |m|
    Date.edtf("#{m[1]}x")
  end
end

.decade_s(str) ⇒ Object

Parses a decade written with a trailing ‘s’, e.g. 1990s, into the masked-precision EDTF date 199x.



# File 'lib/krikri/util/extended_date_parser.rb', line 181

def decade_s(str)
  /^(\d{3})0s$/.match(str) do |m|
    Date.edtf("#{m[1]}x")
  end
end
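
Both decade helpers (.decade_hyphen and .decade_s) build the same masked string before handing it to `Date.edtf`. As a rough illustration that runs without the EDTF gem, the sketch below returns that string instead of a date; `decade_edtf_string` is a hypothetical name.

```ruby
# Hypothetical helper: returns the string that the decade helpers would pass
# to Date.edtf, or nil when neither pattern matches.
def decade_edtf_string(str)
  match = /^(\d{3})0s$/.match(str) || /^(\d{3})-$/.match(str)
  match && "#{match[1]}x"
end

decade_edtf_string('1990s') # => "199x"
decade_edtf_string('190-')  # => "190x"
decade_edtf_string('1990')  # => nil
```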

.hyphenated_partial_range(str) ⇒ Object

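Expands a year range whose end year is abbreviated to two digits, e.g. 1990-92, into the EDTF interval 1990/1992.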



# File 'lib/krikri/util/extended_date_parser.rb', line 165

def hyphenated_partial_range(str)
  /^(\d{2})(\d{2})-(\d{2})$/.match(str) do |m|
    Date.edtf("#{m[1]}#{m[2]}/#{m[1]}#{m[3]}")
  end
end
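
To see what the pattern produces, here is a small sketch that returns the expanded interval string rather than calling `Date.edtf`; `expand_partial_range` is a hypothetical name used only for illustration.

```ruby
# Hypothetical helper: builds the interval string from the same regexp.
def expand_partial_range(str)
  /^(\d{2})(\d{2})-(\d{2})$/.match(str) do |m|
    # m[1] is the century, reused for both endpoints
    "#{m[1]}#{m[2]}/#{m[1]}#{m[3]}"
  end
end

expand_partial_range('1990-92')   # => "1990/1992"
expand_partial_range('1990-1992') # => nil
```

Because the century is copied from the start year, a range that crosses a century boundary (for example 1998-02) would come out as 1998/1902 in this sketch.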

.month_year(str) ⇒ Object

Parses a month-year string, e.g. 01-2045, into the EDTF year-month 2045-01.



# File 'lib/krikri/util/extended_date_parser.rb', line 157

def month_year(str)
  /^(\d{2})-(\d{4})$/.match(str) do |m|
    Date.edtf("#{m[2]}-#{m[1]}")
  end
end
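
The reordering itself can be checked without the EDTF gem. This sketch returns the reordered string only; `month_year_edtf_string` is a hypothetical name.

```ruby
# Hypothetical helper: swaps "MM-YYYY" into "YYYY-MM" using the same regexp.
def month_year_edtf_string(str)
  /^(\d{2})-(\d{4})$/.match(str) { |m| "#{m[2]}-#{m[1]}" }
end

month_year_edtf_string('01-2045') # => "2045-01"
month_year_edtf_string('2045-01') # => nil
```

The pattern does not check that the month is in range; in the real method that is left to `Date.edtf`.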

.parse(date_str, allow_interval = false) ⇒ Date, ...

Attempts to parse a string into a valid EDTF or `Date` format.

- Attempts to split `#providedLabel` on '-', '/', '..', 'to', 'until', and
  looks for EDTF and `Date.parse` patterns on either side, setting them to
  `#begin` and `#end`. Both split and unsplit dates are parsed as follows:
- Attempts to parse `#providedLabel` as an EDTF interval and populates
  begin and end with their respective values.
- Attempts to match to a number of regular expressions which specify
  ranges informally.
- Attempts to parse `#providedLabel` as a single date value with
  `Date.parse` and enters that value to both `#begin` and `#end`.

Parameters:

  • date_str (String)

    a string which may contain a date range

  • allow_interval (Boolean) (defaults to: false)

    a flag specifying whether to use #range_match to look for range values.

Returns:

  • (Date, EDTF::Epoch, EDTF::Interval, nil)

    the date parsed or nil



# File 'lib/krikri/util/extended_date_parser.rb', line 26

def parse(date_str, allow_interval = false)
  str = preprocess(date_str.dup)
  date = parse_interval(str) if allow_interval
  date ||= parse_m_d_y(str)
  date ||= Date.edtf(str.gsub('.', '-'))
  date ||= partial_edtf(str)
  date ||= decade_hyphen(str)
  date ||= month_year(str)
  date ||= decade_s(str)
  date ||= hyphenated_partial_range(str)
  date ||= parse_date(str)
  # Only do this if certain letters are present to avoid infinite loops.
  date ||= circa(str) if str.match(/[circabout]/i)
  date = date.first if date.is_a? EDTF::Set
  date || nil
end
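
The method is a chain of `||=` assignments: each strategy runs only if every earlier one returned `nil`, so the first successful parser wins. A minimal sketch of that pattern, assuming only Ruby's standard `Date` methods in place of the EDTF-based strategies, might look like this; `try_parse` and `first_match` are hypothetical names.

```ruby
require 'date'

# Hypothetical helper: runs the block and maps a parse failure to nil,
# so the result can take part in a ||= chain.
def try_parse
  yield
rescue ArgumentError # Date::Error is a subclass on current Rubies
  nil
end

# Tries stricter formats first and falls back to the lenient Date.parse.
def first_match(str)
  date = try_parse { Date.strptime(str, '%m-%d-%Y') }
  date ||= try_parse { Date.iso8601(str) }
  date ||= try_parse { Date.parse(str) }
  date
end

first_match('01-31-2015')      # parsed by strptime
first_match('2015-01-31')      # parsed by iso8601
first_match('31 January 2015') # parsed by Date.parse
first_match('???')             # => nil
```

Ordering matters in such a chain: a lenient parser placed early would shadow the stricter ones behind it.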

.parse_date(*args) ⇒ Date?

Runs ‘Date#parse`; if arguments are invalid (as with an invalid date string) returns `nil`.

Returns:

  • (Date, nil)

    the parsed date or nil

See Also:

  • Date#parse


# File 'lib/krikri/util/extended_date_parser.rb', line 132

def parse_date(*args)
  begin
    Date.parse(*args)
  rescue ArgumentError
    nil
  end
end

.parse_interval(str) ⇒ EDTF::Interval?

Creates an EDTF::Interval from a string

Parameters:

  • str (String)

    a string which may contain a date range

Returns:

  • (EDTF::Interval, nil)

    an EDTF object representing a date range or nil if none can be found

See Also:

  • #range_match


# File 'lib/krikri/util/extended_date_parser.rb', line 114

def parse_interval(str)
  match = range_match(str)
  return nil if match.nil?

  begin_date, end_date = match.map { |date| parse(date) || :unknown }

  begin_date = begin_date.first if begin_date.respond_to? :first
  end_date = end_date.last if end_date.respond_to? :last

  EDTF::Interval.new(begin_date, end_date)
end
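
The key idea is that each side of the range is parsed independently and an unparseable side becomes `:unknown` rather than failing the whole interval. The sketch below illustrates that with standard-library calls only. It is a simplification, not the Krikri implementation: it splits on a whitespace-delimited separator instead of using #range_match, parses with `Date.iso8601` instead of .parse, and returns a two-element array instead of an `EDTF::Interval`. `endpoint` and `interval_endpoints` are hypothetical names.

```ruby
require 'date'

# Hypothetical helper: parses one side of a range, or returns :unknown.
def endpoint(str)
  Date.iso8601(str)
rescue ArgumentError
  :unknown
end

# Hypothetical helper: returns [begin, end], or nil if no range is found.
def interval_endpoints(str)
  parts = str.split(/\s+(?:-|to|until)\s+/, 2)
  return nil unless parts.size == 2

  parts.map { |part| endpoint(part.strip) }
end

interval_endpoints('2015-01-01 - 2015-12-31')      # two Date objects
interval_endpoints('2015-01-01 to sometime later') # second element is :unknown
interval_endpoints('2015-01-01')                   # => nil
```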

.parse_m_d_y(value) ⇒ Date?

Runs `Date#strptime` with '%m-%d-%Y' after replacing every non-digit character with '-'; if arguments are invalid (as with an invalid date string) returns `nil`.

Parameters:

  • value (String)

    the string to parse

Returns:

  • (Date, nil)

    the parsed date or nil

See Also:

  • Date#strptime


# File 'lib/krikri/util/extended_date_parser.rb', line 147

def parse_m_d_y(value)
  begin
    Date.strptime(value.gsub(/[^0-9]/, '-'), '%m-%d-%Y')
  rescue ArgumentError
    nil
  end
end
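
Since the method depends only on Ruby's standard library, its behavior can be tried outside Krikri. The sketch below restates the body as a top-level method under the hypothetical name `m_d_y`.

```ruby
require 'date'

# Hypothetical standalone copy of the logic: every non-digit becomes '-',
# so '/', '.' and ' ' separators are all accepted.
def m_d_y(value)
  Date.strptime(value.gsub(/[^0-9]/, '-'), '%m-%d-%Y')
rescue ArgumentError
  nil
end

m_d_y('01/31/2015') # January 31, 2015
m_d_y('01.31.2015') # January 31, 2015
m_d_y('31/01/2015') # => nil (31 is not a valid month)
m_d_y('2015-01-31') # => nil (year-first input does not fit the format)
```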

.partial_edtf(str) ⇒ Object

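Expands an interval whose end gives only the final date component, e.g. 1970-08-01/02 or 1970-12/10, into a full EDTF interval such as 1970-08-01/1970-08-02.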



# File 'lib/krikri/util/extended_date_parser.rb', line 173

def partial_edtf(str)
  /^(\d{4}(-\d{2})*)-(\d{2})\/(\d{2})$/.match(str) do |m|
    Date.edtf("#{m[1]}-#{m[3]}/#{m[1]}-#{m[4]}")
  end
end
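
The capture groups are easier to follow with concrete input. This sketch returns the expanded interval string instead of calling `Date.edtf`; `expand_partial_edtf` is a hypothetical name.

```ruby
# Hypothetical helper: m[1] is the shared prefix, m[3] and m[4] are the
# differing final components of the start and end dates.
def expand_partial_edtf(str)
  %r{^(\d{4}(-\d{2})*)-(\d{2})/(\d{2})$}.match(str) do |m|
    "#{m[1]}-#{m[3]}/#{m[1]}-#{m[4]}"
  end
end

expand_partial_edtf('1970-08-01/02') # => "1970-08-01/1970-08-02"
expand_partial_edtf('1970-10/12')    # => "1970-10/1970-12"
expand_partial_edtf('1970-08-01')    # => nil
```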

.preprocess(str) ⇒ Object

TODO:

should `-` be interpreted as 'x' or '?'

Preprocess the date string to remove extra whitespace and convert ad hoc formatting to equivalent EDTF.



# File 'lib/krikri/util/extended_date_parser.rb', line 69

def preprocess(str)
  str.gsub!(/late/i, '')
  str.gsub!(/early/i, '')
  str.strip!
  str.gsub!(/\s+/, ' ')
  str.gsub!('0s', 'x') if str.match(/^[1-9]+0s$/)
  str.gsub!('-', 'x') if str.match(/^[1-9]+\-+$/)
  str
end
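
The effect of each step is easiest to see on sample input. The following is a non-mutating sketch of the same substitutions (the real method modifies its argument in place); `preprocess_copy` is a hypothetical name.

```ruby
# Hypothetical non-mutating variant of the steps above.
def preprocess_copy(str)
  out = str.gsub(/late/i, '').gsub(/early/i, '').strip.gsub(/\s+/, ' ')
  out = out.gsub('0s', 'x') if out.match(/^[1-9]+0s$/)
  out = out.gsub('-', 'x')  if out.match(/^[1-9]+\-+$/)
  out
end

preprocess_copy(' late 1990s ')   # => "199x"
preprocess_copy('early 199-')     # => "199x"
preprocess_copy('Jan   1,  1990') # => "Jan 1, 1990"
preprocess_copy('1900s')          # => "1900s"
```

The last call is unchanged because `[1-9]+` cannot match the zero in the tens position; in .parse such values fall through to .decade_s.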

.range_match(str) ⇒ Array(String)

Matches a wide variety of date ranges separated by ‘..’ or ‘-’

Parameters:

  • str (String)

    a string which may contain a date range

Returns:

  • (Array(String))

    the beginning and ending dates of an identified range



# File 'lib/krikri/util/extended_date_parser.rb', line 49

def range_match(str)
  str = str.gsub('to', '-').gsub('until', '-')
  regexp = %r{
    ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
    \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
    \s*[-\.]+\s*
    ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
    \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
  }x
  regexp.match(str) do |m|
    [m[1], m[2]]
  end
end
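
Because the method uses only string and regexp operations, it can be exercised without Krikri or the EDTF gem. The sketch below restates it as a top-level method under the hypothetical name `range_pair`, with the regular expression copied unchanged from the listing above.

```ruby
# Hypothetical standalone copy of the matching logic.
def range_pair(str)
  # 'to' and 'until' are normalized to '-' before matching
  str = str.gsub('to', '-').gsub('until', '-')
  regexp = %r{
    ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
    \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
    \s*[-\.]+\s*
    ([a-zA-Z]{0,3}\s?[\d\-\/\.xu\?\~a-zA-Z]*,?\s?
    \d{3}[\d\-xs][s\d\-\.xu\?\~]*)
  }x
  regexp.match(str) { |m| [m[1], m[2]] }
end

range_pair('1990 - 1995')  # => ["1990", "1995"]
range_pair('1990 to 1995') # => ["1990", "1995"]
range_pair('1990')         # => nil
```

Note that the plain `gsub('to', '-')` also rewrites the letters "to" inside longer words, so input containing a month name such as "October" is altered before the pattern is applied.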