Class: Babosa::Identifier
- Inherits: Object (Object → Babosa::Identifier)
- Defined in: lib/babosa/identifier.rb
Overview
This class provides some string-manipulation methods specific to slugs.
Note that this class includes many “bang methods”, such as #clean! and #normalize!, that modify the string in place. Each of these methods has a corresponding “bangless” method (e.g., Identifier#clean for Identifier#clean!) that does not appear in the documentation because it is generated dynamically.
All of the bang methods return an instance of String, while the bangless versions return an instance of Babosa::Identifier, so that calls to methods specific to this class can be chained:
string = Identifier.new("hello world")
string.with_separators! # => "hello-world"
string.with_separators # => <Babosa::Identifier:0x000001013e1590 @wrapped_string="hello-world">
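The split between bang and bangless methods can be illustrated without Babosa itself. The following is a minimal sketch, assuming a hypothetical SketchIdentifier class (not part of the library) that generates each bangless method with define_method; the library's own mechanism may differ.

```ruby
# Hypothetical stand-in for Babosa::Identifier; illustrates the pattern only.
class SketchIdentifier
  attr_reader :wrapped_string
  alias to_s wrapped_string

  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Bang method: mutates the receiver and returns the new String.
  def with_separators!(char = "-")
    @wrapped_string = @wrapped_string.gsub(/\s/, char)
  end

  # Bang method: mutates the receiver and returns the new String.
  def upcase!
    @wrapped_string = @wrapped_string.upcase
  end

  # Generate a bangless twin for every public method ending in "!".
  # Each twin works on a copy and returns that copy, so calls can be chained.
  public_instance_methods(false).grep(/!\z/).each do |bang|
    define_method(bang.to_s.chomp("!")) do |*args|
      copy = self.class.new(@wrapped_string)
      copy.send(bang, *args)
      copy
    end
  end
end

id = SketchIdentifier.new("hello world")
chained = id.with_separators.upcase   # both calls return a SketchIdentifier
bang_result = id.with_separators!     # returns a String and mutates id
```

Because the bangless twins return a wrapper rather than a String, further class-specific calls remain available on the result.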
Constant Summary
- @@utf8_proxy =
if Babosa.jruby15?
  UTF8::JavaProxy
elsif defined? Unicode
  UTF8::UnicodeProxy
elsif defined? ActiveSupport
  UTF8::ActiveSupportProxy
else
  UTF8::DumbProxy
end
Instance Attribute Summary
-
#wrapped_string ⇒ Object
(also: #to_s)
readonly
Returns the value of attribute wrapped_string.
Class Method Summary
-
.utf8_proxy ⇒ Object
Return the proxy used for UTF-8 support.
-
.utf8_proxy=(obj) ⇒ Object
Set a proxy object used for UTF-8 support.
Instance Method Summary
- #==(value) ⇒ Object
-
#clean! ⇒ Object
Converts dashes to spaces, removes leading and trailing spaces, and replaces multiple whitespace characters with a single space.
-
#default_normalize_options ⇒ Object
The default options for #normalize!.
-
#downcase! ⇒ Object
Perform UTF-8 sensitive downcasing.
- #empty? ⇒ Boolean
- #eql?(value) ⇒ Boolean
-
#initialize(string) ⇒ Identifier
constructor
A new instance of Identifier.
- #method_missing(symbol, *args, &block) ⇒ Object
-
#normalize!(options = nil) ⇒ Object
Normalize the string for use as a URL slug.
-
#normalize_utf8! ⇒ Object
Perform Unicode composition on the wrapped string.
-
#tidy_bytes! ⇒ Object
Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
-
#to_ascii! ⇒ Object
Delete any non-ASCII characters.
- #to_identifier ⇒ Object (also: #to_slug)
-
#to_ruby_method!(allow_bangs = true) ⇒ Object
Normalize a string so that it can safely be used as a Ruby method name.
-
#transliterate!(*kinds) ⇒ Object
(also: #approximate_ascii!)
Approximate an ASCII string.
-
#truncate!(max) ⇒ Object
Truncate the string to max characters.
-
#truncate_bytes!(max) ⇒ Object
Truncate the string to max bytes.
-
#upcase! ⇒ Object
Perform UTF-8 sensitive upcasing.
-
#with_separators!(char = "-") ⇒ Object
(also: #with_dashes!)
Replaces each whitespace character with the separator char (a dash, “-”, by default).
-
#word_chars! ⇒ Object
Remove any non-word characters.
Constructor Details
#initialize(string) ⇒ Identifier
Returns a new instance of Identifier.
# File 'lib/babosa/identifier.rb', line 63
def initialize(string)
  @wrapped_string = string.to_s
  tidy_bytes!
  normalize_utf8!
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method, which forwards calls it does not recognize to the wrapped string.
#method_missing(symbol, *args, &block) ⇒ Object
# File 'lib/babosa/identifier.rb', line 58
def method_missing(symbol, *args, &block)
  @wrapped_string.__send__(symbol, *args, &block)
end
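The forwarding in the listing above can be tried in isolation. This is a minimal sketch using a hypothetical SketchWrapper class; the respond_to_missing? method is an addition for illustration and is not shown in the listing.

```ruby
# Hypothetical wrapper; mirrors the forwarding shown in the listing above.
class SketchWrapper
  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Forward anything this class does not define to the wrapped String.
  def method_missing(symbol, *args, &block)
    @wrapped_string.__send__(symbol, *args, &block)
  end

  # Not part of the listing; added so respond_to? agrees with the forwarding.
  def respond_to_missing?(symbol, include_private = false)
    @wrapped_string.respond_to?(symbol, include_private) || super
  end
end

w = SketchWrapper.new("hello world")
length = w.length      # String#length, reached through method_missing
parts = w.split(" ")   # String#split, reached through method_missing
```

Methods that Object already defines, such as #== and #to_s, never reach method_missing, which is presumably why the class defines them explicitly.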
Instance Attribute Details
#wrapped_string ⇒ Object (readonly) Also known as: to_s
Returns the value of attribute wrapped_string.
# File 'lib/babosa/identifier.rb', line 33
def wrapped_string
  @wrapped_string
end
Class Method Details
.utf8_proxy ⇒ Object
Return the proxy used for UTF-8 support.
# File 'lib/babosa/identifier.rb', line 48
def self.utf8_proxy
  @@utf8_proxy
end
.utf8_proxy=(obj) ⇒ Object
Set a proxy object used for UTF-8 support.
# File 'lib/babosa/identifier.rb', line 54
def self.utf8_proxy=(obj)
  @@utf8_proxy = obj
end
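A custom proxy might look like the following sketch. It assumes that a proxy only needs to respond to the four methods called on @@utf8_proxy in the listings on this page (downcase, upcase, normalize_utf8, tidy_bytes); SketchProxy is a hypothetical name, and its tidy_bytes is a stand-in rather than a real CP1252 conversion.

```ruby
# Hypothetical proxy built on Ruby's own String methods.
# Assumption: a proxy only needs the four methods called in the listings on this page.
module SketchProxy
  module_function

  def downcase(string)
    string.downcase
  end

  def upcase(string)
    string.upcase
  end

  # Unicode composition (NFC).
  def normalize_utf8(string)
    string.unicode_normalize(:nfc)
  end

  # Stand-in only: drops invalid byte sequences instead of re-encoding them.
  def tidy_bytes(string)
    string.scrub("")
  end
end

# With Babosa loaded, the proxy would be installed like this:
#   Babosa::Identifier.utf8_proxy = SketchProxy
upper = SketchProxy.upcase("łódź")
composed = SketchProxy.normalize_utf8("e\u0301")
```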
Instance Method Details
#==(value) ⇒ Object
# File 'lib/babosa/identifier.rb', line 69
def ==(value)
  @wrapped_string.to_s == value.to_s
end
#clean! ⇒ Object
Converts dashes to spaces, removes leading and trailing spaces, and replaces multiple whitespace characters with a single space.
# File 'lib/babosa/identifier.rb', line 128
def clean!
  @wrapped_string = @wrapped_string.gsub("-", " ").squeeze(" ").strip
end
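The chain in the listing above uses only core String methods, so its effect can be checked on a plain string. A short sketch with made-up input:

```ruby
raw = "  hello--world  - foo "
cleaned = raw.gsub("-", " ").squeeze(" ").strip
# cleaned is "hello world foo"

# squeeze(" ") collapses runs of spaces only; other whitespace is kept.
tabbed = "a\t\tb".gsub("-", " ").squeeze(" ").strip
# tabbed is still "a\t\tb"
```

As the second example suggests, runs of tabs or newlines are not collapsed by this chain.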
#default_normalize_options ⇒ Object
The default options for #normalize!. Override to set your own defaults.
# File 'lib/babosa/identifier.rb', line 268
def default_normalize_options
  {:transliterate => true, :max_length => 255, :separator => "-"}
end
#downcase! ⇒ Object
Perform UTF-8 sensitive downcasing.
# File 'lib/babosa/identifier.rb', line 236
def downcase!
  @wrapped_string = @@utf8_proxy.downcase(@wrapped_string)
end
#empty? ⇒ Boolean
# File 'lib/babosa/identifier.rb', line 77
def empty?
  # included to make this class :respond_to? :empty for compatibility with Active Support's
  # #blank?
  @wrapped_string.empty?
end
#eql?(value) ⇒ Boolean
# File 'lib/babosa/identifier.rb', line 73
def eql?(value)
  @wrapped_string == value
end
#normalize!(options = nil) ⇒ Object
Normalize the string for use as a URL slug. In this context, “normalize” means stripping leading and trailing whitespace, removing characters other than letters and numbers, downcasing, truncating to 255 bytes (by default), and converting whitespace to dashes (by default).
# File 'lib/babosa/identifier.rb', line 144
def normalize!(options = nil)
  # Handle deprecated usage
  if options == true
    warn "#normalize! now takes a hash of options rather than a boolean"
    options = default_normalize_options.merge(:to_ascii => true)
  else
    options = default_normalize_options.merge(options || {})
  end
  if translit_option = options[:transliterate]
    if translit_option != true
      transliterate!(*translit_option)
    else
      transliterate!(*options[:transliterations])
    end
  end
  to_ascii! if options[:to_ascii]
  clean!
  word_chars!
  clean!
  downcase!
  truncate_bytes!(options[:max_length])
  with_separators!(options[:separator])
end
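The order of the steps in the listing above can be followed with a standard-library approximation. This is a sketch only: sketch_slug is a hypothetical function, transliteration is approximated by decomposing characters and dropping combining marks, and word_chars! is approximated by keeping letters, digits, and whitespace.

```ruby
# Hypothetical, stdlib-only approximation of the normalize! pipeline.
def sketch_slug(string, max_length: 255, separator: "-")
  s = string.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")   # rough transliterate!
  s = s.gsub("-", " ").squeeze(" ").strip                 # clean!
  s = s.gsub(/[^\p{L}\p{N}\s]/, "")                       # rough word_chars!
  s = s.gsub("-", " ").squeeze(" ").strip                 # clean! again
  s = s.downcase                                          # downcase!
  s = s.byteslice(0, max_length).scrub("")                # rough truncate_bytes!
  s.gsub(/\s/, separator)                                 # with_separators!
end

slug = sketch_slug("  Jürgen Müller -- Hello, World!  ")
# slug is "jurgen-muller-hello-world"
```

The second clean! pass matters because removing punctuation can leave doubled spaces behind, and with_separators! would turn each of them into its own separator.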
#normalize_utf8! ⇒ Object
Perform Unicode composition on the wrapped string.
# File 'lib/babosa/identifier.rb', line 242
def normalize_utf8!
  @wrapped_string = @@utf8_proxy.normalize_utf8(@wrapped_string)
end
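Unicode composition merges a base character and its combining marks into a single precomposed code point where one exists. A minimal sketch using Ruby's built-in String#unicode_normalize, assuming NFC is the intended form; the proxy selected by Babosa may use a different backend.

```ruby
decomposed = "e\u0301"                          # "e" followed by a combining acute accent
composed = decomposed.unicode_normalize(:nfc)   # single code point U+00E9
# decomposed.length is 2, composed.length is 1
```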
#tidy_bytes! ⇒ Object
Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
# File 'lib/babosa/identifier.rb', line 249
def tidy_bytes!
  @wrapped_string = @@utf8_proxy.tidy_bytes(@wrapped_string)
end
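The idea behind tidying bytes can be sketched with Ruby's encoding API. The hypothetical sketch_tidy below is a simplified stand-in, not the proxy's algorithm: it reinterprets the whole string as Windows-1252 whenever it is not valid UTF-8, and it would raise for the few byte values that Windows-1252 leaves undefined.

```ruby
# Simplified stand-in: reinterpret the whole string as Windows-1252
# when it is not valid UTF-8; otherwise return it unchanged.
def sketch_tidy(string)
  return string if string.dup.force_encoding("UTF-8").valid_encoding?
  string.dup.force_encoding("Windows-1252").encode("UTF-8")
end

fixed = sketch_tidy("caf\xE9")   # 0xE9 is "é" in both CP1252 and ISO-8859-1
# fixed is "café"
```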
#to_ascii! ⇒ Object
Delete any non-ASCII characters.
# File 'lib/babosa/identifier.rb', line 188
def to_ascii!
  @wrapped_string = @wrapped_string.gsub(/[^\x00-\x7f]/u, '')
end
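Because the substitution in the listing above deletes rather than approximates, it is usually worth transliterating first. A short sketch on plain strings, using the same character class:

```ruby
spanish = "¡Feliz anio!".gsub(/[^\x00-\x7f]/, "")   # "Feliz anio!"
polish  = "Łódź".gsub(/[^\x00-\x7f]/, "")           # "d"
```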
#to_identifier ⇒ Object Also known as: to_slug
# File 'lib/babosa/identifier.rb', line 263
def to_identifier
  self
end
#to_ruby_method!(allow_bangs = true) ⇒ Object
Normalize a string so that it can safely be used as a Ruby method name.
# File 'lib/babosa/identifier.rb', line 169
def to_ruby_method!(allow_bangs = true)
  leader, trailer = @wrapped_string.strip.scan(/\A(.+)(.)\z/).flatten
  if allow_bangs
    trailer.downcase.gsub!(/[^a-z0-9!=\\\\?]/, '')
  else
    trailer.downcase.gsub!(/[^a-z0-9]/, '')
  end
  id = leader.to_identifier
  id.transliterate!
  id.to_ascii!
  id.clean!
  id.word_chars!
  id.clean!
  @wrapped_string = id.to_s + trailer
  with_separators!("_")
end
#transliterate!(*kinds) ⇒ Object Also known as: approximate_ascii!
Approximate an ASCII string. This works only for strings written in the Roman alphabet, with or without diacritics. Non-letter characters are left unmodified.
string = Identifier.new "Łódź, Poland"
string.transliterate # => "Lodz, Poland"
string = Identifier.new "日本"
string.transliterate # => "日本"
You can pass any key(s) from Characters.approximations as arguments. This allows for contextual approximations. Various languages are supported; to see which ones, look at the source of Transliterator::Base.
string = Identifier.new "Jürgen Müller"
string.transliterate # => "Jurgen Muller"
string.transliterate :german # => "Juergen Mueller"
string = Identifier.new "¡Feliz año!"
string.transliterate # => "¡Feliz ano!"
string.transliterate :spanish # => "¡Feliz anio!"
You can modify the built-in approximations, or add your own:
# Make Spanish use "nh" rather than "ni"
Babosa::Characters.add_approximations(:spanish, "ñ" => "nh")
Notice that this method does not simply convert to ASCII; if you want to remove non-ASCII characters such as “¡” and “¿”, use #to_ascii!:
string.transliterate!(:spanish) # => "¡Feliz anio!"
string.transliterate! # => "¡Feliz anio!"
# File 'lib/babosa/identifier.rb', line 115
def transliterate!(*kinds)
  kinds.compact!
  kinds = [:latin] if kinds.empty?
  kinds.each do |kind|
    transliterator = Transliterator.get(kind).instance
    @wrapped_string = transliterator.transliterate(@wrapped_string)
  end
  @wrapped_string
end
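Contextual approximation can be pictured as a generic table with language-specific overrides. The sketch below uses hypothetical tables and a hypothetical sketch_transliterate function; Babosa's own tables are far larger, and the listing above applies one transliterator per kind rather than merging tables.

```ruby
# Hypothetical tables, for illustration only.
SKETCH_APPROXIMATIONS = {
  latin:  { "ü" => "u", "ñ" => "n", "Ł" => "L", "ó" => "o", "ź" => "z" },
  german: { "ü" => "ue" }
}

# Language-specific entries override the generic Latin ones.
def sketch_transliterate(string, *kinds)
  table = kinds.compact.reduce(SKETCH_APPROXIMATIONS[:latin]) do |merged, kind|
    merged.merge(SKETCH_APPROXIMATIONS.fetch(kind))
  end
  # Characters without an entry, such as "日", are left unchanged.
  string.gsub(/[^\x00-\x7f]/) { |char| table.fetch(char, char) }
end

plain  = sketch_transliterate("Jürgen Müller")            # "Jurgen Muller"
german = sketch_transliterate("Jürgen Müller", :german)   # "Juergen Mueller"
kept   = sketch_transliterate("日本")                      # "日本"
```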
#truncate!(max) ⇒ Object
Truncate the string to max characters.
# File 'lib/babosa/identifier.rb', line 196
def truncate!(max)
  @wrapped_string = unpack("U*")[0...max].pack("U*")
end
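The unpack/pack round trip in the listing above cuts on code-point boundaries, which a byte slice does not. A short sketch on a plain string (in the listing, the bare unpack call is presumably forwarded to the wrapped string through method_missing):

```ruby
text = "日本語テキスト"
by_chars = text.unpack("U*")[0...2].pack("U*")   # first two code points: "日本"
by_bytes = text.byteslice(0, 4)                   # cuts the second character in half
# by_chars.valid_encoding? is true, by_bytes.valid_encoding? is false
```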
#truncate_bytes!(max) ⇒ Object
Truncate the string to max bytes. This can be useful for ensuring that a UTF-8 string will always fit into a database column with a certain max byte length. The resulting string may be shorter than max bytes if the cut would otherwise fall inside a multibyte character.
# File 'lib/babosa/identifier.rb', line 207
def truncate_bytes!(max)
  return @wrapped_string if @wrapped_string.bytesize <= max
  curr = 0
  new = []
  unpack("U*").each do |char|
    break if curr > max
    char = [char].pack("U")
    curr += char.bytesize
    if curr <= max
      new << char
    end
  end
  @wrapped_string = new.join
end
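The byte-budget loop can be exercised on its own. The following sketch restates the logic of the listing above as a hypothetical standalone function, sketch_truncate_bytes, using each_char instead of unpack:

```ruby
# Keep whole characters while the running byte count stays within max.
def sketch_truncate_bytes(string, max)
  return string if string.bytesize <= max
  curr = 0
  kept = []
  string.each_char do |char|
    curr += char.bytesize
    break if curr > max
    kept << char
  end
  kept.join
end

short = sketch_truncate_bytes("日本語", 4)   # each character is 3 bytes, so only "日" fits
exact = sketch_truncate_bytes("日本語", 6)   # "日本"
```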
#upcase! ⇒ Object
Perform UTF-8 sensitive upcasing.
# File 'lib/babosa/identifier.rb', line 230
def upcase!
  @wrapped_string = @@utf8_proxy.upcase(@wrapped_string)
end
#with_separators!(char = "-") ⇒ Object Also known as: with_dashes!
Replaces each whitespace character with the separator char (a dash, “-”, by default).
# File 'lib/babosa/identifier.rb', line 224
def with_separators!(char = "-")
  @wrapped_string = @wrapped_string.gsub(/\s/u, char)
end
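Since the pattern matches one whitespace character at a time, consecutive whitespace produces consecutive separators. A short sketch on plain strings:

```ruby
single = "hello world".gsub(/\s/, "-")          # "hello-world"
double = "hello  world".gsub(/\s/, "-")         # two spaces give "hello--world"
custom = "hello world\tagain".gsub(/\s/, "_")   # "hello_world_again"
```

Calling #clean! first, as #normalize! does, avoids most doubled separators.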
#word_chars! ⇒ Object
Remove any non-word characters. For this library’s purposes, this means anything other than letters, numbers, spaces, newlines and linefeeds.
# File 'lib/babosa/identifier.rb', line 135
def word_chars!
  @wrapped_string = (unpack("U*") - Babosa::STRIPPABLE).pack("U*")
end
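The listing above removes characters by subtracting one array of code points from another. A minimal sketch, assuming Babosa::STRIPPABLE is an array of integer code points and using a small hypothetical subset in its place:

```ruby
# Hypothetical subset; the library's list is assumed to be much longer.
SKETCH_STRIPPABLE = [33, 44, 46, 63]   # "!", ",", ".", "?"

text = "Hello, world! How are you?"
stripped = (text.unpack("U*") - SKETCH_STRIPPABLE).pack("U*")
# stripped is "Hello world How are you"
```

Array#- removes every occurrence of each listed code point, so no regular expression is needed.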