Module: Nomener::Cleaner
- Included in: Parser
- Defined in: lib/nomener/cleaner.rb
Overview
Module with helper functions for cleaning strings.
Currently exposes:
- .reformat
- .cleanup!
- .dustoff
Constant Summary
- TRAILER_TRASH =
Regex for trailing characters (commas, pipes, whitespace) that should be stripped from the end of a string
/[,|\s]+$/
- DIRTY_STUFF =
Regex for characters that are not used in names
/[^,'\-(?:\p{Alpha}(?<\.))\p{Alpha}\p{Blank}]/
- @@allowable =
Allowable characters in a name after quotes have been reduced
nil
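As a small illustration of what TRAILER_TRASH matches, the sketch below copies the regex value from the constant summary and applies it to two made-up strings. The sample inputs and variable names are assumptions for demonstration only; they do not come from the library.

```ruby
# Value copied from the constant summary; the sample strings are made up.
TRAILER_TRASH = /[,|\s]+$/

# Inside a character class, | is a literal pipe, not alternation,
# so trailing pipes are stripped along with commas and whitespace.
trimmed_commas = "Doe, John ,, ".sub(TRAILER_TRASH, '')
trimmed_pipe   = "Doe |\t".sub(TRAILER_TRASH, '')

puts trimmed_commas # Doe, John
puts trimmed_pipe   # Doe
```

Only the run at the end of the string is removed; the comma inside "Doe, John" is left alone.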
Class Method Summary
- .cleanup!(*args) ⇒ Object
Internal: Clean up a string where there are numerous consecutive and trailing non-name characters.
- .reformat(name) ⇒ Object
Internal: Clean up a given string.
Class Method Details
.cleanup!(*args) ⇒ Object
Internal: Clean up a string where there are numerous consecutive and
trailing non-name characters.
Modifies given string in place.
args - strings to clean up
Returns nothing
# File 'lib/nomener/cleaner.rb', line 55
def self.cleanup!(*args)
  args.each do |dirty|
    next unless dirty.is_a?(String)

    dirty.gsub! DIRTY_STUFF, ' '
    dirty.squeeze! ' '
    # remove any trailing commas or whitespace
    dirty.gsub! TRAILER_TRASH, ''
    dirty.strip!
  end
end
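The in-place behaviour can be tried without the gem using a standalone sketch such as the one below. CleanerSketch is a hypothetical stand-in module, and its DIRTY_STUFF is a simplified assumption rather than the library's own pattern; TRAILER_TRASH is copied from the constant summary. The sample input is made up.

```ruby
# Hypothetical stand-in for Nomener::Cleaner, for illustration only.
module CleanerSketch
  TRAILER_TRASH = /[,|\s]+$/                 # copied from the constant summary
  DIRTY_STUFF = /[^,'\-.\p{Alpha}\p{Blank}]/ # simplified assumption, not the gem's regex

  def self.cleanup!(*args)
    args.each do |dirty|
      next unless dirty.is_a?(String) # non-strings are skipped silently

      dirty.gsub!(DIRTY_STUFF, ' ')   # blank out non-name characters
      dirty.squeeze!(' ')             # collapse runs of spaces
      dirty.gsub!(TRAILER_TRASH, '')  # drop trailing commas, pipes, whitespace
      dirty.strip!
    end
  end
end

name = +"Dr. Jane   Doe!! ,, " # unary plus yields an unfrozen string
count = 42
CleanerSketch.cleanup!(name, count, nil)

puts name.inspect # "Dr. Jane Doe"
```

Because every step uses a bang method, the caller's string is changed directly and the return value is not meant to be used. Arguments therefore need to be mutable strings; anything that is not a String is passed over.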
.reformat(name) ⇒ Object
Internal: Clean up a given string. Quote characters are taken from en.wikipedia.org/wiki/Quotation_mark
Still needs to be fixed up for matching quotes and non-English quotes.
name - the string to clean
Returns a string which is (ideally) pretty much the same as it was given.
# File 'lib/nomener/cleaner.rb', line 31
def self.reformat(name)
  # build the regex once and cache it; it matches every character that is NOT allowed
  @@allowable = %r![^\p{Alpha}\-&\/\ \.\,\'\"\(\)
    #{Nomener.config.left}
    #{Nomener.config.right}
    #{Nomener.config.single}
  ] !x unless @@allowable

  # remove illegal characters, translate fullwidth down
  nomen = name.dup.scrub.tr("\uFF02\uFF07", "\u0022\u0027")

  nomen = replace_doubles(nomen)
  # pass the Regexp object itself: /@@allowable/ would only match the literal text "@@allowable"
  replace_singles(nomen)
    .gsub(@@allowable, ' ')
    .squeeze(' ')
    .strip
end
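The overall flow of .reformat (scrub, translate fullwidth quotes, blank out disallowed characters, squeeze, strip) can be sketched in isolation as follows. ALLOWABLE_SKETCH and reformat_sketch are hypothetical names, the allow-list leaves out the configured quote characters from Nomener.config, and the replace_doubles and replace_singles steps are omitted because their source is not shown here.

```ruby
# Hypothetical allow-list: matches any character that is NOT permitted.
# The gem also interpolates configured quote characters, omitted here.
ALLOWABLE_SKETCH = /[^\p{Alpha}\-&\/ .,'"()]/

def reformat_sketch(name)
  name.dup
      .scrub                              # replace invalid byte sequences
      .tr("\uFF02\uFF07", "\u0022\u0027") # fullwidth " and ' to ASCII
      .gsub(ALLOWABLE_SKETCH, ' ')        # blank out anything not allowed
      .squeeze(' ')                       # collapse runs of spaces
      .strip
end

original = "\uFF02Bob\uFF02  O\uFF07Neil #42"
cleaned = reformat_sketch(original)

puts cleaned # "Bob" O'Neil
```

Unlike .cleanup!, this works on a copy, so the original string is left unchanged and the cleaned value is returned.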