Class: Babosa::Identifier
- Inherits: Object (Object > Babosa::Identifier)
- Defined in: lib/babosa/identifier.rb
Overview
This class provides some string-manipulation methods specific to slugs.
Note that this class includes many “bang methods”, such as #clean! and #normalize!, that modify the string in place. Each of these has a corresponding “bangless” method (for example, #clean! has a counterpart #clean), which does not appear in this documentation because it is generated dynamically.
All of the bang methods return an instance of String, while the bangless versions return an instance of Babosa::Identifier, so that calls to methods specific to this class can be chained:
string = Identifier.new("hello world")
string.with_separators! # => "hello-world"
string.with_separators # => #<Babosa::Identifier:0x000001013e1590 @wrapped_string="hello-world">
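The code that generates the bangless methods is not shown on this page. The sketch below shows one plausible way to produce such twins with define_method; MiniIdentifier is a hypothetical stand-in for illustration, not Babosa's class, and Babosa's actual generation code may differ.

```ruby
# Hypothetical stand-in for Babosa::Identifier, for illustration only.
class MiniIdentifier
  attr_reader :wrapped_string
  alias_method :to_s, :wrapped_string

  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Bang methods: replace the wrapped string and return the new String.
  def with_separators!(char = "-")
    @wrapped_string = @wrapped_string.gsub(/\s/, char)
  end

  def downcase!
    @wrapped_string = @wrapped_string.downcase
  end

  # For every bang method defined above, define a bangless twin that runs the
  # bang method on a copy and returns that copy, so calls can be chained.
  instance_methods(false).select { |m| m.to_s.end_with?("!") }.each do |bang|
    define_method(bang.to_s.chomp("!")) do |*args|
      copy = self.class.new(@wrapped_string.dup)
      copy.public_send(bang, *args)
      copy
    end
  end
end

id = MiniIdentifier.new("Hello World")
id.with_separators.downcase.to_s # => "hello-world"
id.to_s                          # => "Hello World" (receiver unchanged)
```

Because each bangless twin works on a copy, the receiver is left untouched, which is what makes chains such as with_separators.downcase safe.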
Constant Summary
- @@utf8_proxy =
    if Babosa.jruby15?
      UTF8::JavaProxy
    elsif defined? Unicode
      UTF8::UnicodeProxy
    elsif defined? ActiveSupport
      UTF8::ActiveSupportProxy
    else
      UTF8::DumbProxy
    end
Instance Attribute Summary
- #wrapped_string ⇒ Object (also: #to_s) (readonly)
  Returns the value of attribute wrapped_string.
Class Method Summary
- .utf8_proxy ⇒ Object
  Return the proxy used for UTF-8 support.
- .utf8_proxy=(obj) ⇒ Object
  Set a proxy object used for UTF-8 support.
Instance Method Summary
- #clean! ⇒ Object
  Converts dashes to spaces, collapses each run of consecutive spaces into a single space, and removes leading and trailing whitespace.
- #default_normalize_options ⇒ Object
  The default options for #normalize!.
- #downcase! ⇒ Object
  Perform UTF-8-sensitive downcasing.
- #initialize(string) ⇒ Identifier (constructor)
  A new instance of Identifier.
- #method_missing(symbol, *args, &block) ⇒ Object
- #normalize!(options = nil) ⇒ Object
  Normalize the string for use as a URL slug.
- #normalize_utf8! ⇒ Object
  Perform Unicode composition on the wrapped string.
- #tidy_bytes! ⇒ Object
  Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
- #to_ascii! ⇒ Object
  Delete any non-ASCII characters.
- #to_identifier ⇒ Object (also: #to_slug)
- #to_ruby_method!(allow_bangs = true) ⇒ Object
  Normalize a string so that it can safely be used as a Ruby method name.
- #transliterate!(transliterations = {}) ⇒ Object (also: #approximate_ascii!)
  Approximate an ASCII string.
- #truncate!(max) ⇒ Object
  Truncate the string to max characters.
- #truncate_bytes!(max) ⇒ Object
  Truncate the string to max bytes.
- #upcase! ⇒ Object
  Perform UTF-8-sensitive upcasing.
- #with_separators!(char = "-") ⇒ Object (also: #with_dashes!)
  Replaces each whitespace character with char, a dash (“-”) by default.
- #word_chars! ⇒ Object
  Remove any non-word characters.
Constructor Details
#initialize(string) ⇒ Identifier
Returns a new instance of Identifier.
# File 'lib/babosa/identifier.rb', line 53
def initialize(string)
  @wrapped_string = string.to_s
  tidy_bytes!
  normalize_utf8!
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method, which forwards any method it does not define to the wrapped string.
#method_missing(symbol, *args, &block) ⇒ Object
# File 'lib/babosa/identifier.rb', line 48
def method_missing(symbol, *args, &block)
  @wrapped_string.__send__(symbol, *args, &block)
end
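Because method_missing forwards every unknown call to the wrapped String, an Identifier can be used much like a String. The sketch below repeats that delegation pattern in a hypothetical StringWrapper class, with two additions that are common Ruby practice but are not part of the listing above: it falls back to super for methods the String does not support, and it defines respond_to_missing? so that respond_to? agrees with method_missing.

```ruby
# Hypothetical wrapper illustrating delegation through method_missing.
class StringWrapper
  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Forward calls the wrapped String understands; otherwise raise NoMethodError.
  def method_missing(symbol, *args, &block)
    if @wrapped_string.respond_to?(symbol)
      @wrapped_string.__send__(symbol, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with what method_missing will forward.
  def respond_to_missing?(symbol, include_private = false)
    @wrapped_string.respond_to?(symbol) || super
  end
end

w = StringWrapper.new("hello")
w.length               # => 5 (forwarded to String#length)
w.upcase               # => "HELLO"
w.respond_to?(:upcase) # => true
```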
Instance Attribute Details
#wrapped_string ⇒ Object (readonly) Also known as: to_s
Returns the value of attribute wrapped_string.
# File 'lib/babosa/identifier.rb', line 23
def wrapped_string
  @wrapped_string
end
Class Method Details
.utf8_proxy ⇒ Object
Return the proxy used for UTF-8 support.
# File 'lib/babosa/identifier.rb', line 38
def self.utf8_proxy
  @@utf8_proxy
end
.utf8_proxy=(obj) ⇒ Object
Set a proxy object used for UTF-8 support.
# File 'lib/babosa/identifier.rb', line 44
def self.utf8_proxy=(obj)
  @@utf8_proxy = obj
end
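Judging from the calls in the method listings on this page, a UTF-8 proxy needs to respond to downcase, upcase, normalize_utf8 and tidy_bytes, each taking a String. The sketch below is a hypothetical proxy, StdlibProxy, built only on the Ruby standard library. It assumes Ruby 2.4 or later for Unicode-aware case mapping, and its tidy_bytes is a deliberate simplification rather than a real CP1252 repair.

```ruby
# Hypothetical UTF-8 proxy using only the Ruby standard library (Ruby 2.4+).
module StdlibProxy
  module_function

  # String#downcase and #upcase apply full Unicode case mapping since Ruby 2.4.
  def downcase(string)
    string.downcase
  end

  def upcase(string)
    string.upcase
  end

  # Unicode composition (NFC).
  def normalize_utf8(string)
    string.unicode_normalize(:nfc)
  end

  # Simplification: replaces invalid byte sequences with U+FFFD instead of
  # reinterpreting them as CP1252 or ISO-8859-1.
  def tidy_bytes(string)
    string.scrub
  end
end

StdlibProxy.downcase("ŁÓDŹ") # => "łódź"
```

Assuming Babosa is loaded, such an object could then be installed with Babosa::Identifier.utf8_proxy = StdlibProxy.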
Instance Method Details
#clean! ⇒ Object
Converts dashes to spaces, collapses each run of consecutive spaces into a single space, and removes leading and trailing whitespace.
# File 'lib/babosa/identifier.rb', line 103
def clean!
  @wrapped_string = @wrapped_string.gsub("-", " ").squeeze(" ").strip
end
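To make the three steps of #clean! concrete, the snippet below applies the same String calls to an ordinary Ruby String. Note that squeeze(" ") collapses only runs of the space character, so other whitespace such as tabs is left alone.

```ruby
# Each step of the #clean! expression, applied to a plain String.
input = "--foo  bar-"
step1 = input.gsub("-", " ") # "  foo  bar "  (dashes become spaces)
step2 = step1.squeeze(" ")   # " foo bar "    (runs of spaces collapsed)
step3 = step2.strip          # "foo bar"      (outer whitespace removed)

# squeeze(" ") does not touch tabs, so they survive.
"a\t\tb".gsub("-", " ").squeeze(" ").strip # => "a\t\tb"
```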
#default_normalize_options ⇒ Object
The default options for #normalize!. Override to set your own defaults.
# File 'lib/babosa/identifier.rb', line 238
def default_normalize_options
  {:transliterate => true, :max_length => 255, :separator => "-"}
end
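The documentation suggests overriding #default_normalize_options to change the defaults. The sketch below illustrates that pattern with hypothetical classes, SlugBase and UnderscoreSlug, which mirror only the option-merging step of #normalize!. With Babosa itself, the override would presumably go in a subclass of Babosa::Identifier.

```ruby
# Hypothetical base class mirroring how #normalize! merges its options.
class SlugBase
  def default_normalize_options
    { :transliterate => true, :max_length => 255, :separator => "-" }
  end

  # Explicit options win over the defaults; nil means "defaults only".
  def effective_options(options = nil)
    default_normalize_options.merge(options || {})
  end
end

# Changing the defaults only requires overriding one method.
class UnderscoreSlug < SlugBase
  def default_normalize_options
    super.merge(:separator => "_", :max_length => 100)
  end
end

opts = UnderscoreSlug.new.effective_options
opts[:separator]  # => "_"
opts[:max_length] # => 100
UnderscoreSlug.new.effective_options(:separator => ".")[:separator] # => "."
```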
#downcase! ⇒ Object
Perform UTF-8 sensitive downcasing.
# File 'lib/babosa/identifier.rb', line 207
def downcase!
  @wrapped_string = @@utf8_proxy.downcase(@wrapped_string)
end
#normalize!(options = nil) ⇒ Object
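Normalize the string for use as a URL slug. In this context, “normalize” means transliterating (by default), stripping, removing non-letters/numbers, downcasing, truncating to 255 bytes, and converting whitespace to dashes. The length limit and separator come from #default_normalize_options unless overridden by the options argument.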
# File 'lib/babosa/identifier.rb', line 119
def normalize!(options = nil)
  # Handle deprecated usage
  if options == true
    warn "#normalize! now takes a hash of options rather than a boolean"
    options = default_normalize_options.merge(:to_ascii => true)
  else
    options = default_normalize_options.merge(options || {})
  end
  if options[:transliterate]
    transliterate!(*options[:transliterations])
  end
  to_ascii! if options[:to_ascii]
  clean!
  word_chars!
  clean!
  downcase!
  truncate_bytes!(options[:max_length])
  with_separators!(options[:separator])
end
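The order of operations in #normalize! can be approximated on plain Strings. The slugify helper below is hypothetical and ASCII-oriented: it skips transliteration and #to_ascii!, uses a regular expression as a rough stand-in for #word_chars!, and truncates with byteslice, which is only safe for single-byte text.

```ruby
# Hypothetical, ASCII-oriented approximation of the #normalize! pipeline.
def slugify(str, max_length: 255, separator: "-")
  s = str.gsub("-", " ").squeeze(" ").strip # clean!
  s = s.gsub(/[^[:alnum:]\s]/, "")          # rough stand-in for word_chars!
  s = s.gsub("-", " ").squeeze(" ").strip   # clean! again
  s = s.downcase                            # downcase!
  s = s.byteslice(0, max_length)            # truncate_bytes! (single-byte text only)
  s.gsub(/\s/, separator)                   # with_separators!
end

slugify("  Hello, World!  ")                 # => "hello-world"
slugify("Ruby -- on Rails", separator: "_")  # => "ruby_on_rails"
```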
#normalize_utf8! ⇒ Object
Perform Unicode composition on the wrapped string.
# File 'lib/babosa/identifier.rb', line 213
def normalize_utf8!
  @wrapped_string = @@utf8_proxy.normalize_utf8(@wrapped_string)
end
#tidy_bytes! ⇒ Object
Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
# File 'lib/babosa/identifier.rb', line 220
def tidy_bytes!
  @wrapped_string = @@utf8_proxy.tidy_bytes(@wrapped_string)
end
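The idea behind #tidy_bytes! is to reinterpret bytes that are invalid as UTF-8 under a legacy encoding. The snippet below does this with Ruby's built-in String#force_encoding and String#encode. It assumes the whole string is CP1252 and does not attempt to handle text that mixes valid UTF-8 with stray CP1252 bytes.

```ruby
# Bytes 0x93 and 0x94 are curly double quotes in Windows-1252 (CP1252),
# but on their own they are not valid UTF-8.
raw = "\x93quoted\x94".b # binary String, e.g. read from a legacy file

raw.dup.force_encoding("UTF-8").valid_encoding? # => false

# Reinterpret the bytes as CP1252, then transcode to UTF-8.
fixed = raw.dup.force_encoding("Windows-1252").encode("UTF-8")
fixed # => "“quoted”"
```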
#to_ascii! ⇒ Object
Delete any non-ASCII characters.
# File 'lib/babosa/identifier.rb', line 159
def to_ascii!
  @wrapped_string = @wrapped_string.gsub(/[^\x00-\x7f]/u, '')
end
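Because #to_ascii! simply deletes non-ASCII characters, applying it to text that has not been transliterated loses letters, not just punctuation. The snippet applies the same regular expression to plain Strings to show the difference.

```ruby
# The character class used by #to_ascii!: anything outside 0x00-0x7F.
NON_ASCII = /[^\x00-\x7f]/

"¡Feliz anio!".gsub(NON_ASCII, "") # => "Feliz anio!"  (only "¡" removed)
"Łódź, Poland".gsub(NON_ASCII, "") # => "d, Poland"    (letters lost)
```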
#to_identifier ⇒ Object Also known as: to_slug
# File 'lib/babosa/identifier.rb', line 233
def to_identifier
  self
end
#to_ruby_method!(allow_bangs = true) ⇒ Object
Normalize a string so that it can safely be used as a Ruby method name.
# File 'lib/babosa/identifier.rb', line 140
def to_ruby_method!(allow_bangs = true)
  leader, trailer = @wrapped_string.strip.scan(/\A(.+)(.)\z/).flatten
  # Keep only characters that may end a Ruby method name. Assign the result:
  # downcase returns a copy, so gsub! on that copy alone would be discarded.
  if allow_bangs
    trailer = trailer.downcase.gsub(/[^a-z0-9!=?]/, '')
  else
    trailer = trailer.downcase.gsub(/[^a-z0-9]/, '')
  end
  id = leader.to_identifier
  id.transliterate!
  id.to_ascii!
  id.clean!
  id.word_chars!
  id.clean!
  @wrapped_string = id.to_s + trailer
  with_separators!("_")
end
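The hypothetical ruby_method_name helper below sketches the logic of #to_ruby_method! on plain ASCII Strings: it splits off the last character, filters that character against the allowed set, cleans the rest, and joins the parts with underscores. It assigns the filtered trailer back to a variable, because String#downcase returns a new String and calling gsub! on that copy alone would leave trailer unchanged.

```ruby
# Hypothetical, ASCII-only sketch of the to_ruby_method! logic.
def ruby_method_name(str, allow_bangs = true)
  leader, trailer = str.strip.scan(/\A(.+)(.)\z/).flatten
  return "" if leader.nil? # fewer than two characters: not handled here

  # Only "!", "=" and "?" (plus letters/digits) may end the name.
  pattern = allow_bangs ? /[^a-z0-9!=?]/ : /[^a-z0-9]/
  trailer = trailer.downcase.gsub(pattern, "")

  # Rough stand-in for clean!, word_chars!, clean! on the leader.
  leader = leader.gsub("-", " ").squeeze(" ").strip
  leader = leader.gsub(/[^[:alnum:]\s]/, "").squeeze(" ").strip

  (leader + trailer).gsub(/\s/, "_") # with_separators!("_")
end

ruby_method_name("hello world!")        # => "hello_world!"
ruby_method_name("hello world!", false) # => "hello_world"
ruby_method_name("foo bar.")            # => "foo_bar"
```

Without the reassignment, the final "." in "foo bar." would survive and produce the invalid name "foo_bar.".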
#transliterate!(transliterations = {}) ⇒ Object Also known as: approximate_ascii!
Approximate an ASCII string. This works only for Western strings made up of Roman-alphabet characters and diacritics. Non-letter characters are left unmodified.
string = Identifier.new "Łódź, Poland"
string.transliterate # => "Lodz, Poland"
string = Identifier.new "日本"
string.transliterate # => "日本"
You can pass a key from Characters.approximations (as a Symbol) as the argument. This allows for contextual approximations; Danish, German, Serbian and Spanish are currently supported.
string = Identifier.new "Jürgen Müller"
string.transliterate # => "Jurgen Muller"
string.transliterate :german # => "Juergen Mueller"
string = Identifier.new "¡Feliz año!"
string.transliterate # => "¡Feliz ano!"
string.transliterate :spanish # => "¡Feliz anio!"
You can modify the built-in approximations, or add your own:
# Make Spanish use "nh" rather than "ni"
Babosa::Characters.add_approximations(:spanish, "ñ" => "nh")
Notice that this method does not simply convert to ASCII; if you want to remove non-ASCII characters such as “¡” and “¿”, use #to_ascii!:
string.transliterate!(:spanish) # => "¡Feliz anio!"
string.to_ascii! # => "Feliz anio!"
# File 'lib/babosa/identifier.rb', line 91
def transliterate!(transliterations = {})
  if transliterations.kind_of? Symbol
    transliterations = Characters.approximations[transliterations]
  else
    transliterations ||= {}
  end
  @wrapped_string = unpack("U*").map { |char| approx_char(char, transliterations) }.flatten.pack("U*")
end
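Transliteration here amounts to a per-character table lookup in which a contextual table takes precedence over the defaults. The sketch below uses a tiny, hypothetical subset of approximations (DEFAULT_APPROX, CONTEXTUAL_APPROX); Babosa's real tables live in Babosa::Characters, and the listing above works on codepoints via unpack("U*") rather than each_char. The lookup relies on precomposed characters, which Identifier's constructor attempts to ensure by calling #normalize_utf8!.

```ruby
# Hypothetical, tiny subset of approximation tables.
DEFAULT_APPROX = {
  "Ł" => "L", "ł" => "l", "ó" => "o", "ź" => "z", "ü" => "u", "ñ" => "n"
}.freeze

CONTEXTUAL_APPROX = {
  german:  { "ü" => "ue" },
  spanish: { "ñ" => "ni" }
}.freeze

# Contextual entries override defaults; unknown characters pass through.
def transliterate(str, context = nil)
  table = DEFAULT_APPROX.merge(CONTEXTUAL_APPROX.fetch(context, {}))
  str.each_char.map { |ch| table.fetch(ch, ch) }.join
end

transliterate("Łódź, Poland")           # => "Lodz, Poland"
transliterate("Jürgen Müller", :german) # => "Juergen Mueller"
transliterate("¡Feliz año!", :spanish)  # => "¡Feliz anio!"
transliterate("日本")                   # => "日本"
```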
#truncate!(max) ⇒ Object
Truncate the string to max characters.
# File 'lib/babosa/identifier.rb', line 167
def truncate!(max)
  @wrapped_string = unpack("U*")[0...max].pack("U*")
end
#truncate_bytes!(max) ⇒ Object
Truncate the string to max bytes. This can be useful for ensuring that a UTF-8 string will always fit into a database column with a certain maximum byte length. The resulting string may be shorter than max bytes if the cut would otherwise fall inside a multibyte character.
# File 'lib/babosa/identifier.rb', line 178
def truncate_bytes!(max)
  return @wrapped_string if @wrapped_string.bytesize <= max
  curr = 0
  new = []
  unpack("U*").each do |char|
    break if curr > max
    char = [char].pack("U")
    curr += char.bytesize
    if curr <= max
      new << char
    end
  end
  @wrapped_string = new.join
end
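The difference between #truncate! (characters) and #truncate_bytes! (bytes) shows up as soon as a string contains multibyte characters. The hypothetical truncate_bytes helper below follows the same rule as the listing: it keeps whole characters only while the running byte count stays within max, so the result can be shorter than max. A naive byteslice, by contrast, can cut a character in half.

```ruby
# Keep whole characters while the running byte count stays within max.
def truncate_bytes(str, max)
  return str if str.bytesize <= max
  out = +""
  str.each_char do |ch|
    break if out.bytesize + ch.bytesize > max
    out << ch
  end
  out
end

s = "Łódź"                          # 7 bytes: Ł(2) ó(2) d(1) ź(2)
s[0, 3]                             # => "Łód" (3 characters, like truncate!)
truncate_bytes(s, 3)                # => "Ł"   (2 bytes; "ó" would not fit)
s.byteslice(0, 3).valid_encoding?   # => false (splits "ó")
```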
#upcase! ⇒ Object
Perform UTF-8 sensitive upcasing.
# File 'lib/babosa/identifier.rb', line 201
def upcase!
  @wrapped_string = @@utf8_proxy.upcase(@wrapped_string)
end
#with_separators!(char = "-") ⇒ Object Also known as: with_dashes!
Replaces each whitespace character with char, a dash (“-”) by default.
# File 'lib/babosa/identifier.rb', line 195
def with_separators!(char = "-")
  @wrapped_string = @wrapped_string.gsub(/\s/u, char)
end
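Because #with_separators! replaces each whitespace character individually, a run of spaces becomes a run of separators. This is presumably why #normalize! calls #clean! before #with_separators!, as the plain-String snippet below illustrates.

```ruby
# Each whitespace character is replaced on its own; runs are not merged.
"a b\tc".gsub(/\s/, "-") # => "a-b-c"
"a  b".gsub(/\s/, "-")   # => "a--b"

# Collapsing spaces first (as #clean! does) yields a single separator.
"a  b".squeeze(" ").gsub(/\s/, "_") # => "a_b"
```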
#word_chars! ⇒ Object
Remove any non-word characters. For this library’s purposes, this means anything other than letters, numbers, spaces, newlines and linefeeds.
# File 'lib/babosa/identifier.rb', line 110
def word_chars!
  @wrapped_string = (unpack("U*") - Characters.strippable).pack("U*")
end
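#word_chars! works on codepoints: it unpacks the string, removes every codepoint listed in Characters.strippable with Array#-, and packs the rest back into a string. The sketch below repeats that technique with a tiny, hypothetical STRIPPABLE list; the real list is much larger.

```ruby
# Hypothetical strippable list: the codepoints of ",", "!" and "?".
STRIPPABLE = [",", "!", "?"].map(&:ord)

def word_chars(str)
  # Array#- removes every occurrence of each listed codepoint.
  (str.unpack("U*") - STRIPPABLE).pack("U*")
end

word_chars("Hello, world!") # => "Hello world"
word_chars("¿Qué?")         # => "¿Qué" ("¿" is not in this toy list)
```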