Module: Nomener::Cleaner

Included in:
Parser
Defined in:
lib/nomener/cleaner.rb

Overview

Module with helper functions to clean strings

Currently exposes
.reformat
.cleanup!
.dustoff

Constant Summary

TRAILER_TRASH =

Regex matching the run of commas, pipes, and whitespace at the end of a string that we want to strip

/[,|\s]+$/
DIRTY_STUFF =

Regex matching characters we aren't going to keep in a name

/[^,'\-(?:\p{Alpha}(?<\.))\p{Alpha}\p{Blank}]/
@@allowable =

Regex matching characters that are not allowable in a name after quotes have been reduced. Starts out as nil and is built on the first call to .reformat

nil

Class Method Summary

Class Method Details

.cleanup!(*args) ⇒ Object

Internal: Clean up a string where there are numerous consecutive and trailing non-name characters.
Modifies the given string in place.

args - strings to clean up

Returns nothing



# File 'lib/nomener/cleaner.rb', line 55

def self.cleanup!(*args)
  args.each do |dirty|
    next unless dirty.is_a?(String)

    dirty.gsub! DIRTY_STUFF, ' '
    dirty.squeeze! ' '
    # remove any trailing commas or whitespace
    dirty.gsub! TRAILER_TRASH, ''
    dirty.strip!
  end
end
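
The steps in .cleanup! can be tried in isolation with a minimal sketch like the one below. It assumes nothing from the gem itself: CleanerSketch is a hypothetical stand-in module, and the two constants are copied from the Constant Summary above, so behavior in the real library may differ.

```ruby
# Hypothetical stand-in for Nomener::Cleaner (not the gem itself).
# The constants are copied verbatim from the Constant Summary.
module CleanerSketch
  TRAILER_TRASH = /[,|\s]+$/
  DIRTY_STUFF = /[^,'\-(?:\p{Alpha}(?<\.))\p{Alpha}\p{Blank}]/

  def self.cleanup!(*args)
    args.each do |dirty|
      # Non-string arguments (nil, numbers, ...) are skipped silently.
      next unless dirty.is_a?(String)

      dirty.gsub! DIRTY_STUFF, ' '   # unwanted characters become spaces
      dirty.squeeze! ' '             # collapse runs of spaces into one
      dirty.gsub! TRAILER_TRASH, ''  # drop trailing commas, pipes, whitespace
      dirty.strip!
    end
  end
end

# .dup guarantees a mutable string even under frozen_string_literal.
name = "Doe, John 42!!, ".dup
CleanerSketch.cleanup!(name, nil)
puts name  # prints "Doe, John"
```

Because the method mutates its arguments and its return value is not meant to be used, callers read the cleaned text from the variable they passed in. Frozen strings would raise a FrozenError.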

.reformat(name) ⇒ Object

Internal: Clean up a given string. Quote characters are taken from en.wikipedia.org/wiki/Quotation_mark

Still needs to be fixed up for matching and non-English quotes

name - the string to clean

Returns a string which is (ideally) pretty much the same as it was given.



# File 'lib/nomener/cleaner.rb', line 31

def self.reformat(name)
  @@allowable = %r![^\p{Alpha}\-&\/\ \.\,\'\"\(\)
    #{Nomener.config.left}
    #{Nomener.config.right}
    #{Nomener.config.single}
    ] !x unless @@allowable

  # remove illegal characters, translate fullwidth down
  nomen = name.dup.scrub.tr("\uFF02\uFF07", "\u0022\u0027")

  nomen = replace_doubles(nomen)
  replace_singles(nomen)
    .gsub(@@allowable, ' ')
    .squeeze(' ')
    .strip
end
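
The overall shape of .reformat (translate fullwidth quotes, blank out disallowed characters, collapse and trim spaces) can be illustrated with a simplified sketch. It makes several assumptions: ALLOWABLE_SKETCH and reformat_sketch are hypothetical names, the pattern leaves out the configurable quote characters from Nomener.config, and the replace_doubles and replace_singles helpers are skipped because their source is not shown here.

```ruby
# Hypothetical, simplified version of the disallowed-character pattern.
# The real one also interpolates Nomener.config.left, .right and .single.
ALLOWABLE_SKETCH = /[^\p{Alpha}\-&\/ .,'"()]/

def reformat_sketch(name)
  name.dup.scrub                          # replace invalid byte sequences
      .tr("\uFF02\uFF07", "\u0022\u0027") # fullwidth quotes to ASCII quotes
      .gsub(ALLOWABLE_SKETCH, ' ')        # disallowed characters become spaces
      .squeeze(' ')                       # collapse runs of spaces
      .strip
end

original = "  John \uFF02Jack\uFF02 O\uFF07Neil #7  "
result = reformat_sketch(original)
puts result  # prints: John "Jack" O'Neil
```

Unlike .cleanup!, this style works on a copy and returns the cleaned string, so the caller's original string is left untouched.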