Class: String
Instance Method Summary
- #blank_to_nil ⇒ String?
  Convert blank strings to nil.
- #cleanup ⇒ String
  Fix messy oddities such as the use of two apostrophes instead of a quote.
- #compact ⇒ String
  Strip and collapse unnecessary whitespace.
- #correlate(other, synonyms = []) ⇒ Integer
  Calculate the correlation of two strings by counting mutual words.
- #full_strip ⇒ Object
  Similar to strip, but remove any leading or trailing non-letters/numbers, which includes whitespace.
- #to_ff ⇒ Float
  Same as to_f, but accept both dot and comma as decimal separator.
- #unglue ⇒ String
  Add spaces between obviously glued words: camel glued words, and three-or-more-letter and number-only words.
Instance Method Details
#blank_to_nil ⇒ String?
Convert blank strings to nil.

    # File 'lib/core_ext/string.rb', line 12
    def blank_to_nil
      self if present?
    end
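A typical use is falling back to a default with `||`. The sketch below is a plain-Ruby approximation that runs without ActiveSupport; `nil_if_blank` is a hypothetical stand-in for the method above, and treating "blank" as "empty or whitespace only" is an assumption about what `present?` checks.

```ruby
# Hypothetical stand-in for String#blank_to_nil without ActiveSupport.
# Assumption: a string is blank when it is empty or whitespace only.
def nil_if_blank(text)
  text unless text.match?(/\A[[:space:]]*\z/)
end

label = nil_if_blank("   \n") || 'n/a'   # blank input falls through to the default
puts label                               # n/a
puts nil_if_blank(' TWR ').inspect       # " TWR "
```

Non-blank strings are returned unchanged, including any surrounding whitespace.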
#cleanup ⇒ String
Fix messy oddities such as the use of two apostrophes instead of a quote.

    # File 'lib/core_ext/string.rb', line 22
    def cleanup
      gsub(/[#{AIXM::MIN}]{2}|[#{AIXM::SEC}]/, '"').   # unify quotes
        gsub(/[#{AIXM::MIN}]/, "'").   # unify apostrophes
        gsub(/"[[:blank:]]*(.*?)[[:blank:]]*"/m, '"\1"').   # remove whitespace within quotes
        split(/\r?\n/).map { |s| s.strip.blank_to_nil }.compact.join("\n")   # remove blank lines
    end
#compact ⇒ String
Strip and collapse unnecessary whitespace.
While similar to String#squish from ActiveSupport, newlines (\n) are preserved and not collapsed into one space.

    # File 'lib/core_ext/string.rb', line 38
    def compact
      split("\n").map { |s| s.squish.blank_to_nil }.compact.join("\n")
    end
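The difference from `squish` is easiest to see on a multi-line string. The following is a minimal plain-Ruby sketch that runs without ActiveSupport; `compact_lines` is a hypothetical stand-in for the method above, and `squish` is only approximated by collapsing whitespace runs and stripping.

```ruby
# Hypothetical stand-in for String#compact without ActiveSupport.
# Assumption: squish is approximated by collapsing whitespace runs and stripping.
def compact_lines(text)
  text.split("\n")
      .map { |line| line.gsub(/[[:space:]]+/, ' ').strip }  # squish each line
      .reject(&:empty?)                                      # drop blank lines
      .join("\n")                                            # keep line structure
end

messy = "  foo   bar \n\n   \n baz\tqux  "
puts compact_lines(messy)
# foo bar
# baz qux
```

Blank lines disappear, but the two remaining lines stay separate rather than being joined by a space.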
#correlate(other, synonyms = []) ⇒ Integer
Calculate the correlation of two strings by counting mutual words.
Both strings are normalized as follows:
- remove accents, umlauts etc.
- remove everything but members of the \w class
- downcase
The normalized strings are split into words. Only words fulfilling at least one of the following conditions are taken into consideration:
- words present in and translated by the synonyms map
- words of at least 5 characters in length
- words consisting of exactly one letter followed by one or more digits (an optional whitespace between the two is ignored, e.g. “D 25” is the same as “D25”)
The synonyms map is an array where terms in even positions map to their synonym in the following (odd) position:
SYNONYMS = ['term1', 'synonym1', 'term2', 'synonym2']

    # File 'lib/core_ext/string.rb', line 73
    def correlate(other, synonyms=[])
      self_words, other_words = [self, other].map do |string|
        string.
          unicode_normalize(:nfd).
          downcase.gsub(/[-\u2013]/, ' ').
          remove(/[^\w\s]/).
          gsub(/\b(\w)\s?(\d+)\b/, '\1\2').
          compact.
          split(/\W+/).
          map { |w| (i = synonyms.index(w)).nil? ? w : (i.odd? ? w : synonyms[i + 1]).upcase }.
          keep_if { |w| w.match?(/\w{5,}|\w\d+|[[:upper:]]/) }.
          uniq
      end
      (self_words & other_words).count
    end
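To trace how the normalization, synonym translation and filtering interact, here is a plain-Ruby sketch of the same steps that runs without ActiveSupport. `normalized_words` and `correlation` are hypothetical names, the sample `SYNONYMS` map is invented for illustration, and `remove` and `compact` are replaced by `gsub`, `split` and `reject` equivalents. Since the input is downcased first, the sketch assumes the synonyms map contains lowercase terms.

```ruby
# Plain-Ruby sketch of the normalization and counting steps.
# Assumptions: hypothetical method names; lowercase terms in the synonyms map.
def normalized_words(string, synonyms = [])
  string
    .unicode_normalize(:nfd)            # split accents off their base letters
    .downcase
    .gsub(/[-\u2013]/, ' ')             # hyphens and en dashes separate words
    .gsub(/[^\w\s]/, '')                # drop combining marks and punctuation
    .gsub(/\b(\w)\s?(\d+)\b/, '\1\2')   # "d 25" becomes "d25"
    .split(/\s+/)
    .reject(&:empty?)
    .map do |word|
      index = synonyms.index(word)
      next word if index.nil?
      # Synonym hits are upcased so that they pass the filter below.
      (index.odd? ? word : synonyms[index + 1]).upcase
    end
    .select { |word| word.match?(/\w{5,}|\w\d+|[[:upper:]]/) }
    .uniq
end

def correlation(a, b, synonyms = [])
  (normalized_words(a, synonyms) & normalized_words(b, synonyms)).count
end

SYNONYMS = ['ad', 'aerodrome'].freeze   # sample map, invented for this example

puts correlation('Zürich Kloten AD', 'Aerodrome Zurich (Kloten)', SYNONYMS)  # 3
puts correlation('Sector D 25', 'D25 north')                                # 1
```

In the first call, "ad" is translated to its synonym and both sides share "zurich", "kloten" and "AERODROME". In the second call only "d25" is mutual, because "D 25" and "D25" normalize to the same word.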
#full_strip ⇒ Object
Similar to strip, but remove any leading or trailing non-letters/numbers, which includes whitespace.

    # File 'lib/core_ext/string.rb', line 91
    def full_strip
      remove(/\A[^\p{L}\p{N}]*|[^\p{L}\p{N}]*\z/)
    end
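As a quick illustration, the same regular expression can be tried in plain Ruby. `strip_non_alnum` is a hypothetical stand-in that replaces ActiveSupport's `remove` with `gsub`.

```ruby
# Hypothetical stand-in for String#full_strip without ActiveSupport's #remove.
def strip_non_alnum(text)
  # Delete leading and trailing runs of anything that is not a letter or number.
  text.gsub(/\A[^\p{L}\p{N}]*|[^\p{L}\p{N}]*\z/, '')
end

puts strip_non_alnum('  -- (Église 12)!  ')   # Église 12
```

Because the character classes are `\p{L}` and `\p{N}`, accented letters such as "É" count as letters and are kept, and characters inside the string are never touched.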
#to_ff ⇒ Float
Same as to_f, but accept both dot and comma as decimal separator.

    # File 'lib/core_ext/string.rb', line 103
    def to_ff
      sub(/,/, '.').to_f
    end
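The following sketch shows the conversion on a few inputs. `parse_decimal` is a hypothetical stand-in with the same body as the method above, written as a standalone function so the example runs without patching String.

```ruby
# Hypothetical stand-in for String#to_ff.
def parse_decimal(text)
  text.sub(/,/, '.').to_f   # only the first comma is replaced
end

puts parse_decimal('5,25')      # 5.25
puts parse_decimal('5.25')      # 5.25
puts parse_decimal('1.234,5')   # 1.234
```

The last line is a caveat rather than a feature: only the first comma is replaced and `to_f` stops parsing at the second dot, so input with thousands separators is not handled.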
#unglue ⇒ String
Add spaces between obviously glued words:
- camel glued words
- three-or-more-letter and number-only words

    # File 'lib/core_ext/string.rb', line 116
    def unglue
      self.dup.tap do |string|
        [/([[:lower:]])([[:upper:]])/, /([[:alpha:]]{3,})(\d)/, /(\d)([[:alpha:]]{3,})/].freeze.each do |regexp|
          string.gsub!(regexp, '\1 \2')
        end
      end
    end
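The three patterns can be tried in isolation with the sketch below. `split_glued` and `UNGLUE_PATTERNS` are hypothetical names; the regular expressions are the ones from the method above, applied in the same order.

```ruby
# Hypothetical stand-in for String#unglue, using the same three patterns.
UNGLUE_PATTERNS = [
  /([[:lower:]])([[:upper:]])/,   # camel case: "fooBar"
  /([[:alpha:]]{3,})(\d)/,        # word followed by a number: "abc123"
  /(\d)([[:alpha:]]{3,})/         # number followed by a word: "123abc"
].freeze

def split_glued(text)
  # Apply each pattern in turn, inserting a space between the two captures.
  UNGLUE_PATTERNS.reduce(text) { |result, pattern| result.gsub(pattern, '\1 \2') }
end

puts split_glued('TowerFrequency118')   # Tower Frequency 118
puts split_glued('D25')                 # D25
```

Short designators such as "D25" are left alone because the letter part has fewer than three letters.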