Module: Stringex::StringExtensions

Defined in: lib/stringex/string_extensions.rb, lib/stringex/unidecoder.rb

Overview

These methods are all added to the String class.

Defined Under Namespace

Modules: ClassMethods

Class Method Summary

.included(base) ⇒ Object

:nodoc:.

Instance Method Summary

#collapse(character = " ") ⇒ Object

Removes specified character from the beginning and/or end of the string and then performs String#squeeze(character), condensing runs of the character within the string.
#convert_accented_entities ⇒ Object

Converts HTML entities into the respective non-accented letters.
#convert_misc_characters(options = {}) ⇒ Object

Converts various common plaintext characters to a more URI-friendly representation.
#convert_misc_entities ⇒ Object

Converts HTML entities (taken from common Textile/RedCloth formattings) into plain text formats.
#convert_smart_punctuation ⇒ Object

Converts MS Word ‘smart punctuation’ to ASCII.
#convert_vulgar_fractions ⇒ Object

Converts vulgar fractions from supported HTML entities and Unicode characters to plain text formats.
#limit(limit = nil) ⇒ Object

Returns the string limited in size to the value of limit.
#remove_formatting(options = {}) ⇒ Object

Performs multiple text manipulations.
#replace_whitespace(replace = " ") ⇒ Object

Replaces runs of whitespace in the string.
#strip_html_tags(leave_whitespace = false) ⇒ Object

Removes HTML tags from text.
#to_ascii ⇒ Object

Returns string with its UTF-8 characters transliterated to ASCII ones.
#to_html(lite_mode = false) ⇒ Object

Returns the string converted (via Textile/RedCloth) to HTML format, or self (with a friendly warning) if RedCloth is not available.
#to_url(options = {}) ⇒ Object

Create a URI-friendly representation of the string.

Class Method Details

.included(base) ⇒ `Object`

:nodoc:



# File 'lib/stringex/string_extensions.rb', line 6

def self.included(base) # :nodoc:
  base.extend(ClassMethods)
end

Instance Method Details

#collapse(character = " ") ⇒ `Object`

Removes specified character from the beginning and/or end of the string and then performs String#squeeze(character), condensing runs of the character within the string.

Note: This method has been superseded by ActiveSupport’s squish method.



# File 'lib/stringex/string_extensions.rb', line 229

def collapse(character = " ")
  sub(/^#{character}*/, "").sub(/#{character}*$/, "").squeeze(character)
end
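
As a rough illustration of the behavior described above, the sketch below mirrors the quoted one-liner in a standalone helper so that it runs without the gem. The name collapse_sketch is hypothetical and not part of Stringex.

```ruby
# Hypothetical standalone helper mirroring the quoted String#collapse source.
def collapse_sketch(text, character = " ")
  # Trim leading and trailing runs, then squeeze interior runs to one character.
  text.sub(/^#{character}*/, "").sub(/#{character}*$/, "").squeeze(character)
end

puts collapse_sketch("  too   much   space  ")  # prints "too much space"
puts collapse_sketch("--foo---bar--", "-")      # prints "foo-bar"
```

Reading the quoted source, the character is interpolated into the regular expressions unescaped, so a character with special regex meaning (such as "." or "*") would presumably need escaping before being passed in.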

#convert_accented_entities ⇒ `Object`

Converts HTML entities into the respective non-accented letters. Examples:

"&aacute;".convert_accented_entities # => "a"
"&ccedil;".convert_accented_entities # => "c"
"&egrave;".convert_accented_entities # => "e"
"&icirc;".convert_accented_entities # => "i"
"&oslash;".convert_accented_entities # => "o"
"&uuml;".convert_accented_entities # => "u"

Note: This only converts HTML entities. It does not convert literal accented Unicode characters such as é; for that, use to_ascii.



# File 'lib/stringex/string_extensions.rb', line 84

def convert_accented_entities
  gsub(/&([A-Za-z])(grave|acute|circ|tilde|uml|ring|cedil|slash);/, '\1').strip
end
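
To make the distinction between entities and literal accented characters concrete, here is a minimal sketch that applies the same regular expression as the quoted source in a standalone helper. The names ACCENT_ENTITY and strip_entity_accents are hypothetical.

```ruby
# Same pattern as the quoted source: a letter followed by an accent name.
ACCENT_ENTITY = /&([A-Za-z])(grave|acute|circ|tilde|uml|ring|cedil|slash);/

# Hypothetical helper; keeps only the base letter of each accent entity.
def strip_entity_accents(text)
  text.gsub(ACCENT_ENTITY, '\1').strip
end

puts strip_entity_accents("Caf&eacute; cr&egrave;me")  # prints "Cafe creme"
puts strip_entity_accents("Café")                       # prints "Café" (literal accent is untouched)
puts strip_entity_accents("Tom &amp; Jerry")            # prints "Tom &amp; Jerry" (not an accent entity)
```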

#convert_misc_characters(options = {}) ⇒ `Object`

Converts various common plaintext characters to a more URI-friendly representation. Examples:

"foo & bar".convert_misc_characters # => "foo and bar"
"Chanel #9".convert_misc_characters # => "Chanel number 9"
"user@host".convert_misc_characters # => "user at host"
"google.com".convert_misc_characters # => "google dot com"
"$10".convert_misc_characters # => "10 dollars"
"*69".convert_misc_characters # => "star 69"
"100%".convert_misc_characters # => "100 percent"
"windows/mac/linux".convert_misc_characters # => "windows slash mac slash linux"

Note: Because this method converts every & symbol to the string “and”, run any methods that convert HTML entities (convert_accented_entities, convert_vulgar_fractions and convert_misc_entities) before running this method.

# File 'lib/stringex/string_extensions.rb', line 177

def convert_misc_characters(options = {})
  dummy = dup.gsub(/\.{3,}/, " dot dot dot ") # Catch ellipses before single dot rule!
  # Special rules for money
  {
    /(\s|^)\$(\d+)\.(\d+)(\s|$)/ => '\2 dollars \3 cents',
    /(\s|^)£(\d+)\.(\d+)(\s|$)/u => '\2 pounds \3 pence',
  }.each do |found, replaced|
    replaced = " #{replaced} " unless replaced =~ /\\1/
    dummy.gsub!(found, replaced)
  end
  # Special rules for abbreviations
  dummy.gsub!(/(\s|^)([[:alpha:]](\.[[:alpha:]])+(\.?)[[:alpha:]]*(\s|$))/) do |x|
    x.gsub(".", "")
  end
  # Back to normal rules
  misc_characters =
  {
    /\s*&\s*/ => "and",
    /\s*#/ => "number",
    /\s*@\s*/ => "at",
    /(\S|^)\.(\S)/ => '\1 dot \2',
    /(\s|^)\$(\d*)(\s|$)/ => '\2 dollars',
    /(\s|^)£(\d*)(\s|$)/u => '\2 pounds',
    /(\s|^)¥(\d*)(\s|$)/u => '\2 yen',
    /\s*\*\s*/ => "star",
    /\s*%\s*/ => "percent",
    /(\s*=\s*)/ => " equals ",
    /\s*\+\s*/ => "plus",
    /\s*÷\s*/ => "divide",
    /\s*°\s*/ => "degrees"
  }
  misc_characters[/\s*(\\|\/|／)\s*/] = 'slash' unless options[:allow_slash]
  misc_characters.each do |found, replaced|
    replaced = " #{replaced} " unless replaced =~ /\\1/
    dummy.gsub!(found, replaced)
  end
  dummy = dummy.gsub(/(^|[[:alpha:]])'|`([[:alpha:]]|$)/, '\1\2').gsub(/[\.,:;()\[\]\/\?!\^'ʼ"_\|]/, " ").strip
end
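
The rule-application loop in the source above is easier to follow in isolation. The sketch below reduces it to the two money rules only; MONEY_RULES and apply_money_rules are hypothetical names, and the real method goes on to apply many more rules and a final punctuation cleanup.

```ruby
# The two "special rules for money" copied from the quoted source.
MONEY_RULES = {
  /(\s|^)\$(\d+)\.(\d+)(\s|$)/ => '\2 dollars \3 cents',
  /(\s|^)£(\d+)\.(\d+)(\s|$)/u => '\2 pounds \3 pence'
}.freeze

# Hypothetical helper showing how each rule is padded and applied.
def apply_money_rules(text)
  result = text.dup
  MONEY_RULES.each do |pattern, replacement|
    # Pad with spaces unless the replacement re-emits capture group 1 itself.
    replacement = " #{replacement} " unless replacement =~ /\\1/
    result.gsub!(pattern, replacement)
  end
  result.strip
end

puts apply_money_rules("$12.50")             # prints "12 dollars 50 cents"
puts apply_money_rules("costs £3.99 today")  # prints "costs 3 pounds 99 pence today"
```

The padding matters because each pattern consumes the whitespace around the amount; without it the words would run into their neighbors.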

#convert_misc_entities ⇒ `Object`

Converts HTML entities (taken from common Textile/RedCloth formattings) into plain text formats.

Note: This isn’t an attempt at complete conversion of HTML entities, just those most likely to be generated by Textile.

# File 'lib/stringex/string_extensions.rb', line 92

def convert_misc_entities
  dummy = dup
  {
    "#822[01]" => "\"",
    "#821[67]" => "'",
    "#8230" => "...",
    "#8211" => "-",
    "#8212" => "--",
    "#215" => "x",
    "gt" => ">",
    "lt" => "<",
    "(#8482|trade)" => "(tm)",
    "(#174|reg)" => "(r)",
    "(#169|copy)" => "(c)",
    "(#38|amp)" => "and",
    "nbsp" => " ",
    "(#162|cent)" => " cent",
    "(#163|pound)" => " pound",
    "(#188|frac14)" => "one fourth",
    "(#189|frac12)" => "half",
    "(#190|frac34)" => "three fourths",
    "(#247|divide)" => "divide",
    "(#176|deg)" => " degrees "
  }.each do |textiled, normal|
    dummy.gsub!(/&#{textiled};/, normal)
  end
  dummy.gsub(/&[^;]+;/, "").strip
end
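
As a sketch of how the table-driven conversion and the final catch-all interact, the following standalone example uses three entries copied from the table above. ENTITY_SUBSET and convert_entities_sketch are hypothetical names; the real method covers the full table.

```ruby
# Three entries copied from the quoted table; the full method has many more.
ENTITY_SUBSET = {
  "#822[01]"    => "\"",
  "(#38|amp)"   => "and",
  "(#169|copy)" => "(c)"
}.freeze

# Hypothetical helper following the same two steps as the quoted source.
def convert_entities_sketch(text)
  result = text.dup
  ENTITY_SUBSET.each do |entity, plain|
    result.gsub!(/&#{entity};/, plain)
  end
  # Catch-all from the quoted source: drop anything that still looks like an entity.
  result.gsub(/&[^;]+;/, "").strip
end

puts convert_entities_sketch("&#8220;Tom &amp; Jerry&#8221; &copy; &hearts;")
# prints "Tom and Jerry" (c), including the straight double quotes
puts convert_entities_sketch("R&D budget; final")
# prints "R final"
```

The second call shows a side effect worth knowing about: the catch-all pattern treats any bare & followed later by a semicolon as an entity, so ordinary text between the two can be removed.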

#convert_smart_punctuation ⇒ `Object`

Converts MS Word ‘smart punctuation’ to ASCII.

# File 'lib/stringex/string_extensions.rb', line 149

def convert_smart_punctuation
  dummy = dup
  {

    "(“|”|\302\223|\302\224|\303\222|\303\223)" => '"',
    "(‘|’|\302\221|\302\222|\303\225)" => "'",
    "…" => "...",
  }.each do |smart, normal|
    dummy.gsub!(/#{smart}/, normal)
  end
  dummy.strip
end
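
A minimal sketch of the same substitution, assuming UTF-8 input and covering only the Unicode characters from the table above (the legacy byte-sequence alternatives are left out). SMART_TO_ASCII and ascii_punctuation_sketch are hypothetical names.

```ruby
# Unicode smart punctuation mapped to ASCII equivalents.
SMART_TO_ASCII = {
  /[“”]/ => '"',
  /[‘’]/ => "'",
  /…/    => "..."
}.freeze

# Hypothetical helper: apply each substitution in turn, then strip.
def ascii_punctuation_sketch(text)
  SMART_TO_ASCII.reduce(text) { |acc, (smart, plain)| acc.gsub(smart, plain) }.strip
end

puts ascii_punctuation_sketch("“It’s done…”")
# prints "It's done..." wrapped in straight double quotes
```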

#convert_vulgar_fractions ⇒ `Object`

Converts vulgar fractions from supported HTML entities and Unicode characters to plain text formats.

# File 'lib/stringex/string_extensions.rb', line 123

def convert_vulgar_fractions
  dummy = dup
  {
    "(&#188;|&frac14;|¼)" => "one fourth",
    "(&#189;|&frac12;|½)" => "half",
    "(&#190;|&frac34;|¾)" => "three fourths",
    "(&#8531;|⅓)" => "one third",
    "(&#8532;|⅔)" => "two thirds",
    "(&#8533;|⅕)" => "one fifth",
    "(&#8534;|⅖)" => "two fifths",
    "(&#8535;|⅗)" => "three fifths",
    "(&#8536;|⅘)" => "four fifths",
    "(&#8537;|⅙)" => "one sixth",
    "(&#8538;|⅚)" => "five sixths",
    "(&#8539;|⅛)" => "one eighth",
    "(&#8540;|⅜)" => "three eighths",
    "(&#8541;|⅝)" => "five eighths",
    "(&#8542;|⅞)" => "seven eighths"
  }.each do |textiled, normal|
    dummy.gsub!(/#{textiled}/, normal)
  end
  dummy
end
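
The sketch below runs three rows of the table above through the same loop, to show that both entity and Unicode forms are handled. FRACTION_SUBSET and convert_fractions_sketch are hypothetical names.

```ruby
# Three rows copied from the quoted table.
FRACTION_SUBSET = {
  "(&#189;|&frac12;|½)" => "half",
  "(&#190;|&frac34;|¾)" => "three fourths",
  "(&#8531;|⅓)"         => "one third"
}.freeze

# Hypothetical helper applying each pattern as a regular expression.
def convert_fractions_sketch(text)
  result = text.dup
  FRACTION_SUBSET.each { |pattern, words| result.gsub!(/#{pattern}/, words) }
  result
end

puts convert_fractions_sketch("&frac12; cup sugar, ¾ tsp salt")
# prints "half cup sugar, three fourths tsp salt"
puts convert_fractions_sketch("1½ cups")
# prints "1half cups"
```

As the second call suggests, the replacement text carries no surrounding spaces, so a fraction written directly after a digit is joined to it.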

#limit(limit = nil) ⇒ `Object`

Returns the string limited in size to the value of limit.



# File 'lib/stringex/string_extensions.rb', line 44

def limit(limit = nil)
  limit.nil? ? self : self[0...limit]
end
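
A short sketch of the slicing behavior, using a hypothetical standalone helper named limit_sketch. It assumes a Ruby version where string indexing counts characters, not bytes.

```ruby
# Hypothetical helper mirroring the quoted String#limit source.
def limit_sketch(text, limit = nil)
  limit.nil? ? text : text[0...limit]
end

puts limit_sketch("héllo wörld", 5)  # prints "héllo"
puts limit_sketch("short", 50)       # prints "short"
puts limit_sketch("unchanged")       # prints "unchanged"
```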

#remove_formatting(options = {}) ⇒ `Object`

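Performs multiple text manipulations in one call. It chains strip_html_tags, convert_smart_punctuation, convert_accented_entities, convert_vulgar_fractions, convert_misc_entities, convert_misc_characters, to_ascii, convert_misc_characters again, and collapse; see the source below for the exact order.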

# File 'lib/stringex/string_extensions.rb', line 50

def remove_formatting(options = {})
  strip_html_tags.
    convert_smart_punctuation.
    convert_accented_entities.
    convert_vulgar_fractions.
    convert_misc_entities.
    convert_misc_characters(options).
    to_ascii.
    # NOTE: String#to_ascii may convert some Unicode characters to ascii we'd already transliterated
    # so we need to do it again just to be safe
    convert_misc_characters(options).
    collapse
end

#replace_whitespace(replace = " ") ⇒ `Object`

Replaces runs of whitespace in the string. Defaults to a single space, but any replacement string may be specified as an argument. Examples:

"Foo       bar".replace_whitespace # => "Foo bar"
"Foo       bar".replace_whitespace("-") # => "Foo-bar"



# File 'lib/stringex/string_extensions.rb', line 221

def replace_whitespace(replace = " ")
  gsub(/\s+/, replace)
end

#strip_html_tags(leave_whitespace = false) ⇒ `Object`

Removes HTML tags from text. This code is simplified from Tobias Luettke’s regular expression in Typo.

# File 'lib/stringex/string_extensions.rb', line 66

def strip_html_tags(leave_whitespace = false)
  name = /[\w:_-]+/
  value = /([A-Za-z0-9]+|('[^']*?'|"[^"]*?"))/
  attr = /(#{name}(\s*=\s*#{value})?)/
  rx = /<[!\/?\[]?(#{name}|--)(\s+(#{attr}(\s+#{attr})*))?\s*([!\/?\]]+|--)?>/
  (leave_whitespace) ?  gsub(rx, "").strip : gsub(rx, "").gsub(/\s+/, " ").strip
end
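
To show what the leave_whitespace flag changes, here is a sketch that wraps the same regular expressions in a standalone helper. The name strip_tags_sketch is hypothetical; the patterns are copied from the source above.

```ruby
# Hypothetical standalone helper using the patterns from the quoted source.
def strip_tags_sketch(text, leave_whitespace = false)
  name  = /[\w:_-]+/
  value = /([A-Za-z0-9]+|('[^']*?'|"[^"]*?"))/
  attr  = /(#{name}(\s*=\s*#{value})?)/
  rx    = /<[!\/?\[]?(#{name}|--)(\s+(#{attr}(\s+#{attr})*))?\s*([!\/?\]]+|--)?>/
  stripped = text.gsub(rx, "")
  # Unless asked to keep it, squash every whitespace run to a single space.
  leave_whitespace ? stripped.strip : stripped.gsub(/\s+/, " ").strip
end

html = "<p>Hello <a href=\"x.html\">world</a></p>\n\n<p>Bye</p>"
puts strip_tags_sketch(html)                # prints "Hello world Bye"
puts strip_tags_sketch(html, true).inspect  # prints "Hello world\n\nBye"
```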

#to_ascii ⇒ `Object`

Returns string with its UTF-8 characters transliterated to ASCII ones. Example:

"⠋⠗⠁⠝⠉⠑".to_ascii # => "france"



# File 'lib/stringex/unidecoder.rb', line 167

def to_ascii
  Stringex::Unidecoder.decode(self)
end

#to_html(lite_mode = false) ⇒ `Object`

Returns the string converted (via Textile/RedCloth) to HTML format, or self (with a friendly warning) if RedCloth is not available.

Passing a truthy lite_mode argument (for example :lite) makes RedCloth skip the enclosing P element, which is useful when generating header element text and similar fragments. This is roughly equivalent to ActionView’s textilize_without_paragraph, except that RedCloth does all the work instead of the result being post-processed with gsub.

# File 'lib/stringex/string_extensions.rb', line 18

def to_html(lite_mode = false)
  if defined?(RedCloth)
    if lite_mode
      RedCloth.new(self, [:lite_mode]).to_html
    else
      if self =~ /<pre>/
        RedCloth.new(self).to_html.tr("\t", "")
      else
        RedCloth.new(self).to_html.tr("\t", "").gsub(/\n\n/, "")
      end
    end
  else
    warn "String#to_html was called without RedCloth being successfully required"
    self
  end
end
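
The fallback branch above is an instance of an optional-dependency guard. The sketch below isolates that pattern in a hypothetical helper, to_html_sketch, and leaves out the lite mode and whitespace handling; it assumes only that RedCloth, when loaded, responds to RedCloth.new(text).to_html as in the quoted source.

```ruby
# Hypothetical helper: use RedCloth when it is loaded, otherwise warn and
# hand back the original text unchanged.
def to_html_sketch(text)
  if defined?(RedCloth)
    RedCloth.new(text).to_html
  else
    warn "RedCloth is not loaded; returning the text unchanged"
    text
  end
end

result = to_html_sketch("h1. Title")
puts result  # HTML when RedCloth is loaded; otherwise the input, "h1. Title"
```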

#to_url(options = {}) ⇒ `Object`

Creates a URI-friendly representation of the string. This is used internally by acts_as_url but can be called manually to generate a URI-friendly version of any string.

# File 'lib/stringex/string_extensions.rb', line 38

def to_url(options = {})
  return self if options[:exclude] && options[:exclude].include?(self)
  remove_formatting(options).downcase.replace_whitespace("-").collapse("-").limit(options[:limit])
end
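
The last steps of the chain above can be sketched without the gem. The helper below, slug_tail_sketch (a hypothetical name), assumes its input has already been through remove_formatting and reproduces only the :exclude check and the downcase, replace_whitespace, collapse and limit steps.

```ruby
# Hypothetical helper covering only the tail of the to_url chain.
def slug_tail_sketch(cleaned, options = {})
  return cleaned if options[:exclude] && options[:exclude].include?(cleaned)
  slug = cleaned.downcase.gsub(/\s+/, "-")                  # replace_whitespace("-")
  slug = slug.sub(/^-*/, "").sub(/-*$/, "").squeeze("-")    # collapse("-")
  options[:limit].nil? ? slug : slug[0...options[:limit]]   # limit(options[:limit])
end

puts slug_tail_sketch("  Hello   World  ")              # prints "hello-world"
puts slug_tail_sketch("Hello World", limit: 7)          # prints "hello-w"
puts slug_tail_sketch("Keep Me", exclude: ["Keep Me"])  # prints "Keep Me"
```

Because the limit is applied last, a slug can be cut in the middle of a word, as the second call shows.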
