Module: Utilities::Strings

Defined in:
lib/utilities/strings.rb

Overview

Methods that receive or generate a String. The methods in this library should be completely independent of TaxonWorks (i.e. ultimately gemifiable).

Class Method Details

.a_label(string) ⇒ Object

Returns the string preceded by “a” or “an”, or nil if the string is empty.

Returns:

  • (String, nil) the string preceded by “a” or “an”



# File 'lib/utilities/strings.rb', line 22

def self.a_label(string)
  return nil if string.to_s.length == 0
  (string =~ /\A[aeiou]/i ? 'an ' : 'a ') + string
end
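
A minimal sketch of how the article is chosen, assuming the method body from the listing above is copied into a stand-in module (`StringsSketch` is a hypothetical name used only for illustration). The check looks at the first letter only, not at pronunciation.

```ruby
# Hypothetical stand-in module; the method body mirrors the listing above.
module StringsSketch
  def self.a_label(string)
    return nil if string.to_s.length == 0
    (string =~ /\A[aeiou]/i ? 'an ' : 'a ') + string
  end
end

p StringsSketch.a_label('apple') # starts with a vowel letter
p StringsSketch.a_label('Pear')  # starts with a consonant letter
p StringsSketch.a_label('hour')  # silent "h" is still treated as a consonant
p StringsSketch.a_label('')      # empty input
```

Because only the first letter is inspected, words such as "hour" receive "a" rather than "an".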

.alphabetic_strings(string) ⇒ Array

Splits a string on special characters, returning an array of the strings that do not contain digits.

It splits on accent characters, and does not split on underscores. The method is used for building wildcard searches, so splitting on accents creates pseudo accent insensitivity in searches.

See .alphanumeric_strings for a variant that keeps tokens containing digits, which allows searches by page number, year, etc.

Parameters:

  • string (String)

Returns:

  • (Array)

    the string split on whitespace and special characters, with any token containing a digit removed



# File 'lib/utilities/strings.rb', line 136

def self.alphabetic_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
end

.alphanumeric_strings(string) ⇒ Object

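Splits a string on special characters in the same way as .alphabetic_strings, but keeps tokens that contain digits, which allows searches by page number, year, etc.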



# File 'lib/utilities/strings.rb', line 142

def self.alphanumeric_strings(string)
  return [] if string.nil? || string.length == 0
  string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
end
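
The difference between the two tokenizers can be sketched as follows, assuming both method bodies from the listings above are copied into a stand-in module (`StringsSketch` and the sample query are hypothetical, chosen only for illustration).

```ruby
# Hypothetical stand-in module; method bodies mirror the listings above.
module StringsSketch
  def self.alphabetic_strings(string)
    return [] if string.nil? || string.length == 0
    string.split(/[^[[:word:]]]+/).select { |b| !(b =~ /\d/) }.reject { |b| b.empty? }
  end

  def self.alphanumeric_strings(string)
    return [] if string.nil? || string.length == 0
    string.split(/[^[[:word:]]]+/).reject { |b| b.empty? }
  end
end

query = 'Smith & Jones 1920, p. 22'

# Tokens containing a digit are dropped.
p StringsSketch.alphabetic_strings(query)

# Tokens containing a digit are kept.
p StringsSketch.alphanumeric_strings(query)

# Underscores do not split; a leading separator yields an empty token that is rejected.
p StringsSketch.alphanumeric_strings('(1920) foo_bar')
```

The trailing `reject` matters when the string starts with a separator, because `split` then produces an empty first element.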

.asciify(string) ⇒ Object

Replaces <br>, <i>, and <b> tags with their AsciiDoc equivalents.

Returns:

  • (String, nil)

    the string with <br>, <i>, and <b> tags replaced by their AsciiDoc equivalents



# File 'lib/utilities/strings.rb', line 6

def self.asciify(string)
  return nil if string.to_s.length == 0

  string.gsub!(/<br>/, "\n")
  string.gsub!(/<i>|<\/i>/, '_')
  string.gsub!(/<b>|<\/b>/, '**')
  string
end
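
A short sketch of the tag replacement, assuming the method body from the listing above is copied into a stand-in module (`StringsSketch` and the sample markup are hypothetical). It also illustrates that the listing uses `gsub!`, so the argument itself is modified and returned.

```ruby
# Hypothetical stand-in module; the method body mirrors the listing above.
module StringsSketch
  def self.asciify(string)
    return nil if string.to_s.length == 0

    string.gsub!(/<br>/, "\n")
    string.gsub!(/<i>|<\/i>/, '_')
    string.gsub!(/<b>|<\/b>/, '**')
    string
  end
end

# Unary plus returns an unfrozen string, so the in-place gsub! calls can modify it.
markup = +'<i>Aus bus</i> is <b>new</b><br>See notes'
result = StringsSketch.asciify(markup)

puts result
p result.equal?(markup) # the very same object is returned
```

Callers that need the original markup afterwards would have to pass a copy, for example `asciify(markup.dup)`.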

.authorship_sentence(last_names = []) ⇒ String?

TODO: DEPRECATE (doesn’t belong here because to_sentence is Rails?)

Parameters:

  • last_names (Array) (defaults to: [])

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 122

def self.authorship_sentence(last_names = [])
  return nil if last_names.empty?
  last_names.to_sentence(two_words_connector: ' & ', last_word_connector: ' & ')
end

.encode_with_utf8(string) ⇒ String, false

!! Needing this method is a bad sign: you should know your encoding before it gets this far.

Parameters:

  • string (String)

Returns:

  • (String, false)

    the string forced to UTF-8 when it is encoding-compatible with UTF-8, otherwise false



# File 'lib/utilities/strings.rb', line 150

def self.encode_with_utf8(string)
  return false if string.nil?
  if Encoding.compatible?('test'.encode(Encoding::UTF_8), string)
    string.force_encoding(Encoding::UTF_8)
  else
    false
  end
end

.escape_single_quote(string) ⇒ String

Adds a second single quote to escape each apostrophe in SQL query strings.

Parameters:

  • string (String)

Returns:

  • (String, nil)

    the escaped string, or nil if the string is blank


# File 'lib/utilities/strings.rb', line 83

def self.escape_single_quote(string)
  return nil if string.blank?
  string.gsub("'", "''")
end

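.generate_md5(text) ⇒ String?

Parameters:

  • text (String)

Returns:

  • (String, nil)

    the MD5 hex digest of the downcased text with whitespace and the punctuation . , ; : ? ! removed, or nil if the text is blank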


# File 'lib/utilities/strings.rb', line 62

def self.generate_md5(text)
  return nil if text.blank?
  text = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
  Digest::MD5.hexdigest(text)
end
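
The normalization step means that texts differing only in case, whitespace, or the listed punctuation share a digest. A minimal sketch, assuming a stand-in module (`StringsSketch` is hypothetical) and a plain-Ruby substitute for the Rails `blank?` call used in the listing:

```ruby
require 'digest/md5'

# Hypothetical stand-in module for illustration only.
module StringsSketch
  def self.generate_md5(text)
    # Assumption: plain-Ruby replacement for Rails' `blank?`.
    return nil if text.nil? || text.strip.empty?
    # Same normalization as the listing: downcase, then drop whitespace and . , ; : ? !
    text = text.downcase.gsub(/[\s\.,;:\?!]*/, '')
    Digest::MD5.hexdigest(text)
  end
end

a = StringsSketch.generate_md5('Hello, World!')
b = StringsSketch.generate_md5('hello world')

p a.class  # hexdigest returns a String, not a Digest::MD5 instance
p a == b   # both inputs normalize to "helloworld"
```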

.increment_contained_integer(string) ⇒ String, Boolean

Increments the first integer encountered in the string, keeping only the non-integer text immediately before and after it (see tests). Returns false if no number is found.

Parameters:

  • string (String)

Returns:

  • (String, Boolean)

    the string with its first integer incremented, or false if no number is found



# File 'lib/utilities/strings.rb', line 73

def self.increment_contained_integer(string)
  string =~ /([^\d]*)(\d+)([^\d]*)/
  a, b, c = $1, $2, $3
  return false if b.nil?
  [a, (b.to_i + 1), c].compact.join
end
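
A sketch of the behavior described above, assuming the method body from the listing is copied into a stand-in module (`StringsSketch` and the sample inputs are hypothetical). It shows that anything after the second run of digits is dropped and that leading zeros are not preserved.

```ruby
# Hypothetical stand-in module; the method body mirrors the listing above.
module StringsSketch
  def self.increment_contained_integer(string)
    string =~ /([^\d]*)(\d+)([^\d]*)/
    a, b, c = $1, $2, $3
    return false if b.nil?
    [a, (b.to_i + 1), c].compact.join
  end
end

p StringsSketch.increment_contained_integer('abc123def') # digits wrapped by text
p StringsSketch.increment_contained_integer('a1b2')      # the trailing "2" falls outside the match
p StringsSketch.increment_contained_integer('page 009')  # to_i discards leading zeros
p StringsSketch.increment_contained_integer('none')      # no digits at all
```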

.integers(string) ⇒ Array<String>

Get numbers separated by spaces from a string

Parameters:

  • string (String)

Returns:

  • (Array<String>)

    of strings representing integers



# File 'lib/utilities/strings.rb', line 176

def self.integers(string)
  return [] if string.nil? || string.length == 0
  string.split(/\s+/).select { |t| is_i?(t) }
end

.is_i?(string) ⇒ Boolean

See stackoverflow.com/questions/1235863/test-if-a-string-is-basically-an-integer-in-quotes-using-ruby.

Note: might check out the CSV::Converters constants to see how they handle this.

Allows ‘02’, which is treated as OK because ‘02’.to_i returns 2.

Parameters:

  • string (String)

Returns:

  • (Boolean)

    whether the string is an integer (positive or negative)



# File 'lib/utilities/strings.rb', line 94

def self.is_i?(string)
  /\A[-+]?\d+\z/ === string
end

.linearize(string, separator = ' | ') ⇒ Object

Replaces each line break in the string with the separator. Returns nil if the string is empty.



# File 'lib/utilities/strings.rb', line 15

def self.linearize(string, separator = ' | ')
  return nil if string.to_s.length == 0
  string.gsub(/\n|(\r\n)/, separator)
end

.nil_squish_strip(string) ⇒ String?

Strips leading and trailing whitespace and condenses internal whitespace, but returns nil (not an empty string) if nothing is left.

Parameters:

  • string (String)

Returns:

  • (String, nil)

    the stripped and condensed string, or nil (not an empty string) if nothing is left



# File 'lib/utilities/strings.rb', line 50

def self.nil_squish_strip(string)
  a = string.dup
  if !a.nil?
    a.delete!("\u0000") # remove NUL characters in place; non-bang delete would discard its result
    a.squish!
    a = nil if a == ''
  end
  a
end

.nil_strip(string) ⇒ String?

Strips leading and trailing whitespace, leaves internal whitespace as is, and returns nil if nothing is left.

Parameters:

  • string (String)

Returns:

  • (String, nil)

    the stripped string with internal whitespace left as is, or nil if nothing is left



# File 'lib/utilities/strings.rb', line 38

def self.nil_strip(string) # string should have content or be empty
  a = string.dup
  if !a.nil?
    a.strip!
    a = nil if a == ''
  end
  a
end
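
The contrast between stripping and squishing can be sketched in plain Ruby. `StringsSketch` is a hypothetical stand-in module; `nil_strip` mirrors the listing above, while the `nil_squish_strip` body here is only an approximation that assumes Rails' `squish!` behaves like collapsing whitespace runs to a single space and then stripping.

```ruby
# Hypothetical stand-in module for illustration only.
module StringsSketch
  # Mirrors the nil_strip listing: trims the ends, keeps internal whitespace.
  def self.nil_strip(string)
    a = string.dup
    if !a.nil?
      a.strip!
      a = nil if a == ''
    end
    a
  end

  # Approximation of nil_squish_strip without ActiveSupport (assumption:
  # squish is roughly "collapse whitespace runs to one space, then strip").
  def self.nil_squish_strip(string)
    a = string.dup
    if !a.nil?
      a = a.gsub(/[[:space:]]+/, ' ').strip
      a = nil if a == ''
    end
    a
  end
end

input = "  a   b \n c  "
p StringsSketch.nil_strip(input)        # ends trimmed, interior untouched
p StringsSketch.nil_squish_strip(input) # interior runs collapsed as well
p StringsSketch.nil_strip('   ')        # nothing left
p input                                 # the argument is not modified (work is done on a dup)
```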

.nil_wrap(pre = nil, content = nil, post = nil) ⇒ String?

Returns nil if content is blank; otherwise returns content wrapped in pre and post, where provided.

Parameters:

  • pre (String) (defaults to: nil)
  • content (String) (defaults to: nil)
  • post (String) (defaults to: nil)

Returns:

  • (String, nil)

    nil if content is blank; otherwise content wrapped in pre and post, where provided



# File 'lib/utilities/strings.rb', line 114

def self.nil_wrap(pre = nil, content = nil, post = nil)
  return nil if content.blank?
  [pre, content, post].compact.join
end

.only_integer(string) ⇒ Integer?

Return an integer if and only if the string is a single integer, otherwise nil

Parameters:

  • string (String)

Returns:

  • (Integer, nil)

    return an integer if and only if the string is a single integer, otherwise nil



# File 'lib/utilities/strings.rb', line 185

def self.only_integer(string)
  if is_i?(string)
    string.to_i
  else
    nil
  end
end

.only_integers?(string) ⇒ Boolean

Returns true if the query string only contains integers separated by whitespace.

Returns:

  • (Boolean)

    true if the query string only contains integers separated by whitespace



# File 'lib/utilities/strings.rb', line 195

def self.only_integers?(string)
  !(string =~ /[^\d\s]/i) && !integers(string).empty?
end
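
The four integer helpers build on one another, which can be sketched as follows. The method bodies mirror the listings above; `StringsSketch` and the sample inputs are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in module; method bodies mirror the listings above.
module StringsSketch
  def self.is_i?(string)
    /\A[-+]?\d+\z/ === string
  end

  def self.integers(string)
    return [] if string.nil? || string.length == 0
    string.split(/\s+/).select { |t| is_i?(t) }
  end

  def self.only_integer(string)
    is_i?(string) ? string.to_i : nil
  end

  def self.only_integers?(string)
    !(string =~ /[^\d\s]/i) && !integers(string).empty?
  end
end

p StringsSketch.is_i?('02')                  # leading zero is accepted
p StringsSketch.is_i?('1.5')                 # decimals are not integers
p StringsSketch.integers('1 a 22 3b -4')     # keeps only integer-like tokens, as strings
p StringsSketch.only_integer('42')           # a single integer is converted
p StringsSketch.only_integer('42 43')        # more than one token gives nil
p StringsSketch.only_integers?('1 22 3')     # digits and whitespace only
p StringsSketch.only_integers?('-4 5')       # a sign character makes this false
```

Note the asymmetry in this sketch: `is_i?` accepts a leading sign, while `only_integers?` rejects any character that is not a digit or whitespace.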

.parse_authorship(authorship) ⇒ Array

Parse a scientificAuthorship field to extract author and year information.

If the format matches ICZN, adds parentheses around author name (if detected)

Parameters:

  • authorship (String)

Returns:

  • (Array)
    author_name, year


# File 'lib/utilities/strings.rb', line 204

def self.parse_authorship(authorship)
  return [] if (authorship = authorship.to_s.strip).empty?

  year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
  author_name = "#{authorship[..(year_match&.offset(0)&.first || 0)-1]}#{year_match&.[](:paren)}"

  [author_name, year_match&.[](:year)]
end
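
A sketch of the parsing, assuming the method body from the listing above is copied into a stand-in module (`StringsSketch` and the sample authorship strings are hypothetical). The beginless range `[..n]` requires Ruby 2.7 or later.

```ruby
# Hypothetical stand-in module; the method body mirrors the listing above.
module StringsSketch
  def self.parse_authorship(authorship)
    return [] if (authorship = authorship.to_s.strip).empty?

    # A comma or whitespace, then the year, then an optional closing parenthesis at the end.
    year_match = /(,|\s)\s*(?<year>\d+)(?<paren>\))?$/.match(authorship)
    # Everything before the match is the author; a captured ")" is re-attached to it.
    author_name = "#{authorship[..(year_match&.offset(0)&.first || 0)-1]}#{year_match&.[](:paren)}"

    [author_name, year_match&.[](:year)]
  end
end

p StringsSketch.parse_authorship('Smith, 1920')          # plain author and year
p StringsSketch.parse_authorship('(Smith & Jones, 1920)') # parentheses are closed around the author
p StringsSketch.parse_authorship('Smith')                # no year detected
p StringsSketch.parse_authorship(nil)                    # blank input
```

When no year is found, the whole string is returned as the author name and the year is nil.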

.random_string(string_length) ⇒ String?

Returns a stub string of the given length.

Parameters:

  • string_length (Integer)

Returns:

  • (String, nil)

    a stub string of the given length, or nil if the length is 0



# File 'lib/utilities/strings.rb', line 30

def self.random_string(string_length)
  return nil if string_length.to_i == 0
  ('a'..'z').to_a.shuffle[0, string_length].join
end
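
Because the listing shuffles the 26 lowercase letters and takes a slice, the result never repeats a letter and cannot exceed 26 characters. A small sketch, assuming the method body is copied into a stand-in module (`StringsSketch` is a hypothetical name):

```ruby
# Hypothetical stand-in module; the method body mirrors the listing above.
module StringsSketch
  def self.random_string(string_length)
    return nil if string_length.to_i == 0
    ('a'..'z').to_a.shuffle[0, string_length].join
  end
end

short = StringsSketch.random_string(5)
long  = StringsSketch.random_string(40)

p short.length               # the requested length
p short.chars.uniq.length    # no letter appears twice
p long.length                # capped by the size of the alphabet
```

Callers needing longer stubs or repeated characters would need a different generator, for example one sampling letters with replacement.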

.sanitize_for_csv(string) ⇒ String, param

Sanitizes an individual string so that it is usable in a TAB-delimited, UTF-8 column. See Download.

TODO: Likely need to handle quotes, and write better UTF compliance tests.

~~ Technically \n is allowed!

Parameters:

  • string (String)

Returns:

  • (String, param)

    the string with newlines and tabs replaced by spaces, or the (duplicated) param unchanged if it is blank



# File 'lib/utilities/strings.rb', line 103

def self.sanitize_for_csv(string)
  a = string.dup
  return a if a.blank? # TODO: .blank is Rails, not OK here
  a.to_s.gsub(/\n|\t/, ' ')
end

.verbatim_author(author_year_string) ⇒ String?

Parameters:

  • author_year_string (String)

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 226

def self.verbatim_author(author_year_string)
  return nil if author_year_string.to_s.strip.empty?  # alternative to .blank?
  author_end_index = author_year_string.rindex(' ')
  author_end_index ||= author_year_string.length
  author_year_string[0...author_end_index]
end
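
A sketch of where the cut is made, assuming the method body from the listing above is copied into a stand-in module (`StringsSketch` and the sample strings are hypothetical). Everything before the last space is returned, so a comma directly before that space is kept.

```ruby
# Hypothetical stand-in module; the method body mirrors the listing above.
module StringsSketch
  def self.verbatim_author(author_year_string)
    return nil if author_year_string.to_s.strip.empty?
    author_end_index = author_year_string.rindex(' ')
    author_end_index ||= author_year_string.length
    author_year_string[0...author_end_index]
  end
end

p StringsSketch.verbatim_author('Smith & Jones 1920') # cut at the last space
p StringsSketch.verbatim_author('Smith, 1920')        # the comma stays with the author
p StringsSketch.verbatim_author('Smith')              # no space, whole string returned
p StringsSketch.verbatim_author('  ')                 # blank input
```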

.year_letter(string) ⇒ String?

Returns the letter immediately following the first year that is followed by a letter, e.g. `Smith, 1920a. ... ` returns `a`.

Returns:

  • (String, nil)

    the letter immediately following the year, or nil if there is none


# File 'lib/utilities/strings.rb', line 168

def self.year_letter(string)
  string.match(/\d{4}([a-zA-Z]+)/).to_a.last
end

.year_of_publication(author_year) ⇒ String?

Parameters:

  • author_year (String)

Returns:

  • (String, nil)


# File 'lib/utilities/strings.rb', line 215

def self.year_of_publication(author_year)
  return nil if author_year.to_s.strip.empty?   # alternative to .blank?
  split_author_year = author_year.split(' ')
  year = split_author_year[split_author_year.length - 1]
  # try matching last element first, otherwise scan entire string for year
  # Maybe we don't need regex match and can use years(author_year) exclusively?
  year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
end

.years(string) ⇒ Array

Returns:

  • (Array)

    the unique four-digit sequences found in the string, as strings


# File 'lib/utilities/strings.rb', line 160

def self.years(string)
  return [] if string.nil?
  string.scan(/\d{4}/).to_a.uniq
end
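
How .year_of_publication falls back on .years can be sketched as follows, assuming both method bodies from the listings above are copied into a stand-in module (`StringsSketch` and the sample strings are hypothetical).

```ruby
# Hypothetical stand-in module; method bodies mirror the listings above.
module StringsSketch
  def self.years(string)
    return [] if string.nil?
    string.scan(/\d{4}/).to_a.uniq
  end

  def self.year_of_publication(author_year)
    return nil if author_year.to_s.strip.empty?
    split_author_year = author_year.split(' ')
    year = split_author_year[split_author_year.length - 1]
    # Use the last token if it is all digits, otherwise scan the whole string.
    year =~ /\A\d+\z/ ? year : years(author_year).last.to_s
  end
end

p StringsSketch.years('Smith 1920, 1921 and 1920')  # duplicates are removed
p StringsSketch.year_of_publication('Smith, 1920')  # last token is the year
p StringsSketch.year_of_publication('Smith, 1920a') # falls back to scanning for four digits
p StringsSketch.year_of_publication('Smith')        # no year: empty string, not nil
p StringsSketch.year_of_publication('')             # blank input: nil
```

In this sketch a non-blank string without any year yields an empty string, because `nil.to_s` is `""`; only blank input yields nil.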