Class: Babosa::Identifier
- Inherits: Object
  - Object
  - Babosa::Identifier
- Defined in: lib/babosa/identifier.rb
Overview
This class provides some string-manipulation methods specific to slugs.
Note that this class includes many “bang methods” such as #clean! and #normalize! that perform actions on the string in-place. Each of these methods has a corresponding “bangless” method (e.g., Identifier#clean! and Identifier#clean), which does not appear in the documentation because it is generated dynamically.
All of the bang methods return an instance of String, while the bangless versions return an instance of Babosa::Identifier, so that calls to methods specific to this class can be chained:
string = Identifier.new("hello world")
string.with_separators! # => "hello-world"
string.with_separators # => <Babosa::Identifier:0x000001013e1590 @wrapped_string="hello-world">
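The bang/bangless pairing can be sketched without the library. The class below is a hypothetical stand-in (SlugSketch is not part of Babosa, and how Babosa actually generates its bangless methods is an assumption here); it only illustrates one way a bangless twin could be derived from each bang method so that calls can be chained:

```ruby
# Hypothetical stand-in for illustration; not Babosa's actual source.
class SlugSketch
  attr_reader :wrapped_string
  alias to_s wrapped_string

  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Bang methods change the wrapper in place and return the raw String.
  def clean!
    @wrapped_string = @wrapped_string.gsub("-", " ").squeeze(" ").strip
  end

  def with_separators!(char = "-")
    @wrapped_string = @wrapped_string.gsub(/\s/, char)
  end

  # Generate one bangless twin per bang method. Each twin works on a
  # copy and returns that copy, so further calls can be chained.
  instance_methods(false).grep(/!\z/).each do |bang|
    define_method(bang.to_s.chomp("!")) do |*args|
      copy = self.class.new(@wrapped_string)
      copy.send(bang, *args)
      copy
    end
  end
end

slug = SlugSketch.new("  hello   world ")
chained = slug.clean.with_separators
puts chained.to_s        # hello-world
puts slug.to_s.inspect   # the original wrapper is left untouched
```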
Constant Summary
- Error = Class.new(StandardError)
- @@utf8_proxy =
  if Babosa.jruby15?
    UTF8::JavaProxy
  elsif defined? Unicode
    UTF8::UnicodeProxy
  elsif defined? ActiveSupport
    UTF8::ActiveSupportProxy
  else
    UTF8::DumbProxy
  end
Instance Attribute Summary
- #wrapped_string ⇒ Object (also: #to_s) readonly
  Returns the value of attribute wrapped_string.
Class Method Summary
- .utf8_proxy ⇒ Object
  Return the proxy used for UTF-8 support.
- .utf8_proxy=(obj) ⇒ Object
  Set a proxy object used for UTF-8 support.
Instance Method Summary
- #==(value) ⇒ Object
- #clean! ⇒ Object
  Converts dashes to spaces, removes leading and trailing spaces, and collapses runs of spaces into a single space.
- #default_normalize_options ⇒ Object
  The default options for #normalize!.
- #downcase! ⇒ Object
  Perform UTF-8 sensitive downcasing.
- #empty? ⇒ Boolean
- #eql?(value) ⇒ Boolean
- #initialize(string) ⇒ Identifier constructor
  A new instance of Identifier.
- #method_missing(symbol, *args, &block) ⇒ Object
- #normalize!(options = nil) ⇒ Object
  Normalize the string for use as a URL slug.
- #normalize_utf8! ⇒ Object
  Perform Unicode composition on the wrapped string.
- #tidy_bytes! ⇒ Object
  Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
- #to_ascii! ⇒ Object
  Delete any non-ASCII characters.
- #to_identifier ⇒ Object (also: #to_slug)
- #to_ruby_method!(allow_bangs = true) ⇒ Object
  Normalize a string so that it can safely be used as a Ruby method name.
- #transliterate!(*kinds) ⇒ Object (also: #approximate_ascii!)
  Approximate an ASCII string.
- #truncate!(max) ⇒ Object
  Truncate the string to max characters.
- #truncate_bytes!(max) ⇒ Object
  Truncate the string to max bytes.
- #upcase! ⇒ Object
  Perform UTF-8 sensitive upcasing.
- #with_separators!(char = "-") ⇒ Object (also: #with_dashes!)
  Replaces whitespace with dashes (“-”).
- #word_chars! ⇒ Object
  Remove any non-word characters.
Constructor Details
#initialize(string) ⇒ Identifier
Returns a new instance of Identifier.
# File 'lib/babosa/identifier.rb', line 65
def initialize(string)
  @wrapped_string = string.to_s
  tidy_bytes!
  normalize_utf8!
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method, which forwards any call it does not define itself to the wrapped string.
#method_missing(symbol, *args, &block) ⇒ Object
# File 'lib/babosa/identifier.rb', line 60
def method_missing(symbol, *args, &block)
  @wrapped_string.__send__(symbol, *args, &block)
end
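This delegation is why calls such as unpack("U*") can appear without a receiver in the method bodies further down: they fall through to the wrapped String. A minimal sketch of the pattern, using a hypothetical WrapperSketch class; unlike the listing above, it also guards with respond_to? and defines respond_to_missing?, which is an addition for illustration rather than something the source does:

```ruby
# Hypothetical wrapper, for illustration only.
class WrapperSketch
  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Forward any unknown method to the wrapped String.
  def method_missing(symbol, *args, &block)
    if @wrapped_string.respond_to?(symbol)
      @wrapped_string.__send__(symbol, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(symbol, include_private = false)
    @wrapped_string.respond_to?(symbol, include_private) || super
  end
end

wrapper = WrapperSketch.new("héllo")
puts wrapper.length                   # 5
puts wrapper.unpack("U*").inspect     # [104, 233, 108, 108, 111]
puts wrapper.respond_to?(:bytesize)   # true
```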
Instance Attribute Details
#wrapped_string ⇒ Object (readonly) Also known as: to_s
Returns the value of attribute wrapped_string.
# File 'lib/babosa/identifier.rb', line 35
def wrapped_string
  @wrapped_string
end
Class Method Details
.utf8_proxy ⇒ Object
Return the proxy used for UTF-8 support.
# File 'lib/babosa/identifier.rb', line 50
def self.utf8_proxy
  @@utf8_proxy
end
.utf8_proxy=(obj) ⇒ Object
Set a proxy object used for UTF-8 support.
# File 'lib/babosa/identifier.rb', line 56
def self.utf8_proxy=(obj)
  @@utf8_proxy = obj
end
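Judging from the method bodies on this page, a proxy is called with downcase, upcase, normalize_utf8 and tidy_bytes, each taking a string. The module below is a hypothetical proxy (CoreStringProxy is not part of Babosa) built only on Ruby's core String methods, assuming Ruby 2.4 or later. Its tidy_bytes is a deliberate simplification: it drops invalid bytes instead of re-encoding CP1252 or ISO-8859-1 input.

```ruby
# Hypothetical proxy built on Ruby's core String methods (Ruby 2.4+).
module CoreStringProxy
  module_function

  def downcase(string)
    string.downcase
  end

  def upcase(string)
    string.upcase
  end

  # Unicode composition (NFC).
  def normalize_utf8(string)
    string.unicode_normalize(:nfc)
  end

  # Simplification: drop invalid byte sequences rather than convert them.
  def tidy_bytes(string)
    string.scrub("")
  end
end

puts CoreStringProxy.upcase("łódź")                     # ŁÓDŹ
puts CoreStringProxy.normalize_utf8("e\u0301").length   # 1

# With Babosa loaded, such a proxy could then be installed with:
#   Babosa::Identifier.utf8_proxy = CoreStringProxy
```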
Instance Method Details
#==(value) ⇒ Object
# File 'lib/babosa/identifier.rb', line 71
def ==(value)
  @wrapped_string.to_s == value.to_s
end
#clean! ⇒ Object
Converts dashes to spaces, removes leading and trailing spaces, and collapses runs of spaces into a single space.
# File 'lib/babosa/identifier.rb', line 131
def clean!
  @wrapped_string = @wrapped_string.gsub("-", " ").squeeze(" ").strip
end
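The same chain of String calls can be tried on a plain string. This short sketch uses only core Ruby and shows one detail worth knowing: squeeze(" ") collapses runs of the space character only, so repeated tabs survive.

```ruby
raw = "  my--favorite   slug-- "
cleaned = raw.gsub("-", " ").squeeze(" ").strip
puts cleaned           # my favorite slug

# Runs of other whitespace, such as tabs, are not collapsed.
tabbed = "a\t\tb".gsub("-", " ").squeeze(" ").strip
puts tabbed.inspect    # "a\t\tb"
```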
#default_normalize_options ⇒ Object
The default options for #normalize!. Override to set your own defaults.
# File 'lib/babosa/identifier.rb', line 273
def default_normalize_options
  {:transliterate => true, :max_length => 255, :separator => "-"}
end
#downcase! ⇒ Object
Perform UTF-8 sensitive downcasing.
# File 'lib/babosa/identifier.rb', line 241
def downcase!
  @wrapped_string = @@utf8_proxy.downcase(@wrapped_string)
end
#empty? ⇒ Boolean
# File 'lib/babosa/identifier.rb', line 79
def empty?
  # included to make this class :respond_to? :empty for compatibility with Active Support's
  # #blank?
  @wrapped_string.empty?
end
#eql?(value) ⇒ Boolean
# File 'lib/babosa/identifier.rb', line 75
def eql?(value)
  @wrapped_string == value
end
#normalize!(options = nil) ⇒ Object
Normalize the string for use as a URL slug. In this context, “normalize” means stripping the string, removing non-letters/numbers, downcasing, truncating to 255 bytes, and converting whitespace to dashes.
# File 'lib/babosa/identifier.rb', line 147
def normalize!(options = nil)
  options = default_normalize_options.merge(options || {})
  if translit_option = options[:transliterate]
    if translit_option != true
      transliterate!(*translit_option)
    else
      transliterate!(*options[:transliterations])
    end
  end
  to_ascii! if options[:to_ascii]
  clean!
  word_chars!
  clean!
  downcase!
  truncate_bytes!(options[:max_length])
  with_separators!(options[:separator])
end
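The order of the steps matters: cleaning happens both before and after the word-character filter, and truncation happens before separators are inserted. The function below is a simplified stand-in for that pipeline on plain strings. It is a sketch under stated assumptions: normalize_sketch is a hypothetical name, transliteration is omitted, the word-character filter is an assumed regex rather than Babosa::STRIPPABLE, and byte truncation uses byteslice plus scrub instead of the library's loop.

```ruby
# Simplified stand-in for the pipeline; transliteration is omitted and
# the word-character filter is an assumed regex, not Babosa::STRIPPABLE.
def normalize_sketch(string, max_length: 255, separator: "-")
  clean = ->(s) { s.gsub("-", " ").squeeze(" ").strip }

  result = clean.call(string)                          # clean!
  result = result.gsub(/[^\p{L}\p{N}\s]/, "")          # word_chars! (assumed filter)
  result = clean.call(result)                          # clean! again
  result = result.downcase                             # downcase!
  result = result.byteslice(0, max_length).scrub("")   # truncate_bytes!
  result.gsub(/\s/, separator)                         # with_separators!
end

puts normalize_sketch("  Hello, World! -- It's 2024 ")   # hello-world-its-2024
puts normalize_sketch("ñandú grande", max_length: 6)     # ñand
```

In the second call the six-byte limit falls in the middle of “ú”, so the partial character is dropped and the result is five bytes long.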
#normalize_utf8! ⇒ Object
Perform Unicode composition on the wrapped string.
# File 'lib/babosa/identifier.rb', line 247
def normalize_utf8!
  @wrapped_string = @@utf8_proxy.normalize_utf8(@wrapped_string)
end
#tidy_bytes! ⇒ Object
Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
# File 'lib/babosa/identifier.rb', line 254
def tidy_bytes!
  @wrapped_string = @@utf8_proxy.tidy_bytes(@wrapped_string)
end
#to_ascii! ⇒ Object
Delete any non-ascii characters.
# File 'lib/babosa/identifier.rb', line 193
def to_ascii!
  @wrapped_string = @wrapped_string.gsub(/[^\x00-\x7f]/u, '')
end
#to_identifier ⇒ Object Also known as: to_slug
# File 'lib/babosa/identifier.rb', line 268
def to_identifier
  self
end
#to_ruby_method!(allow_bangs = true) ⇒ Object
Normalize a string so that it can safely be used as a Ruby method name.
# File 'lib/babosa/identifier.rb', line 167
def to_ruby_method!(allow_bangs = true)
  leader, trailer = @wrapped_string.strip.scan(/\A(.+)(.)\z/).flatten
  leader = leader.to_s
  trailer = trailer.to_s
  if allow_bangs
    trailer.downcase!
    trailer.gsub!(/[^a-z0-9!=\\?]/, '')
  else
    trailer.downcase!
    trailer.gsub!(/[^a-z0-9]/, '')
  end
  id = leader.to_identifier
  id.transliterate!
  id.to_ascii!
  id.clean!
  id.word_chars!
  id.clean!
  @wrapped_string = id.to_s + trailer
  if @wrapped_string == ""
    raise Error, "Input generates impossible Ruby method name"
  end
  with_separators!("_")
end
#transliterate!(*kinds) ⇒ Object Also known as: approximate_ascii!
Approximate an ASCII string. This works only for Western strings using characters that are Roman-alphabet characters + diacritics. Non-letter characters are left unmodified.
string = Identifier.new "Łódź
string.transliterate # => "Lodz, Poland"
string = Identifier.new "日本"
string.transliterate # => "日本"
You can pass any key(s) from Characters.approximations
as arguments. This allows for contextual approximations. Various languages are supported, you can see which ones by looking at the source of Transliterator::Base.
string = Identifier.new "Jürgen Müller"
string.transliterate # => "Jurgen Muller"
string.transliterate :german # => "Juergen Mueller"
string = Identifier.new "¡Feliz año!"
string.transliterate # => "¡Feliz ano!"
string.transliterate :spanish # => "¡Feliz anio!"
The approximations are stored in a hash, which you can modify if you choose:
# Make Spanish use "nh" rather than the default approximation
Babosa::Transliterator::Spanish::APPROXIMATIONS["ñ"] = "nh"
Notice that this method does not simply convert to ASCII; if you want to remove non-ASCII characters such as “¡” and “¿”, use #to_ascii!:
string.transliterate!(:spanish) # => "¡Feliz anio!"
string.transliterate! # => "¡Feliz anio!"
# File 'lib/babosa/identifier.rb', line 118
def transliterate!(*kinds)
  kinds.compact!
  kinds = [:latin] if kinds.empty?
  kinds.each do |kind|
    transliterator = Transliterator.get(kind).instance
    @wrapped_string = transliterator.transliterate(@wrapped_string)
  end
  @wrapped_string
end
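The difference between approximating and deleting non-ASCII characters can be sketched with small lookup tables. Everything named here is hypothetical: the LATIN, GERMAN and SPANISH tables and both helper functions are illustrative stand-ins, not Babosa's transliterator classes, and they cover only the characters used in the examples above.

```ruby
# Hypothetical approximation tables; the real ones live in Babosa's
# transliterator classes and are far more complete.
LATIN   = { "ü" => "u", "ñ" => "n", "Ł" => "L", "ó" => "o", "ź" => "z" }.freeze
GERMAN  = LATIN.merge("ü" => "ue").freeze
SPANISH = LATIN.merge("ñ" => "ni").freeze

def transliterate_sketch(string, table = LATIN)
  # Characters without an entry (such as "¡" or "日") pass through unchanged.
  string.gsub(/[^\x00-\x7f]/) { |char| table.fetch(char, char) }
end

def to_ascii_sketch(string)
  # Deletes whatever non-ASCII characters remain.
  string.gsub(/[^\x00-\x7f]/, "")
end

puts transliterate_sketch("Jürgen Müller")           # Jurgen Muller
puts transliterate_sketch("Jürgen Müller", GERMAN)   # Juergen Mueller
spanish = transliterate_sketch("¡Feliz año!", SPANISH)
puts spanish                                         # ¡Feliz anio!
puts to_ascii_sketch(spanish)                        # Feliz anio!
```

Running the approximation first and the deletion second keeps as much of the original text as possible, which matches the order used by #normalize!.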
#truncate!(max) ⇒ Object
Truncate the string to max characters.
# File 'lib/babosa/identifier.rb', line 201
def truncate!(max)
  @wrapped_string = unpack("U*")[0...max].pack("U*")
end
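The unpack/pack round trip works on code points rather than bytes, which is what keeps multibyte characters intact. A short sketch on a plain string, using only core Ruby:

```ruby
text = "Łódź, Poland"
codepoints = text.unpack("U*")               # one Integer per character
first_four = codepoints[0...4].pack("U*")
puts first_four                              # Łódź
puts first_four.bytesize                     # 7

# Slicing by bytes instead can cut a character in half.
puts text.byteslice(0, 3).valid_encoding?    # false
```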
#truncate_bytes!(max) ⇒ Object
Truncate the string to max bytes. This can be useful for ensuring that a UTF-8 string will always fit into a database column with a certain max byte length. The resulting string may be shorter than max bytes if the limit falls in the middle of a multibyte character, because that character is dropped whole.
# File 'lib/babosa/identifier.rb', line 212
def truncate_bytes!(max)
  return @wrapped_string if @wrapped_string.bytesize <= max
  curr = 0
  new = []
  unpack("U*").each do |char|
    break if curr > max
    char = [char].pack("U")
    curr += char.bytesize
    if curr <= max
      new << char
    end
  end
  @wrapped_string = new.join
end
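The boundary behavior is easiest to see with two-byte characters. The function below is a hypothetical equivalent (truncate_bytes_sketch is not part of Babosa) written with each_char instead of unpack; it keeps whole characters until the next one would exceed the limit.

```ruby
def truncate_bytes_sketch(string, max)
  return string if string.bytesize <= max

  kept = +""
  string.each_char do |char|
    # Stop before a character that would push the result past max bytes.
    break if kept.bytesize + char.bytesize > max
    kept << char
  end
  kept
end

short = truncate_bytes_sketch("üüü", 5)
puts short            # üü
puts short.bytesize   # 4
```

Each “ü” occupies two bytes in UTF-8, so a five-byte limit leaves room for only two of them and the result is four bytes long.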
#upcase! ⇒ Object
Perform UTF-8 sensitive upcasing.
# File 'lib/babosa/identifier.rb', line 235
def upcase!
  @wrapped_string = @@utf8_proxy.upcase(@wrapped_string)
end
#with_separators!(char = "-") ⇒ Object Also known as: with_dashes!
Replaces each whitespace character with char, which defaults to a dash (“-”).
# File 'lib/babosa/identifier.rb', line 229
def with_separators!(char = "-")
  @wrapped_string = @wrapped_string.gsub(/\s/u, char)
end
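Because the pattern matches one whitespace character at a time, consecutive whitespace yields consecutive separators. This sketch on plain strings shows why the string is usually cleaned first:

```ruby
title = "hello big\tworld"
dashed = title.gsub(/\s/, "-")
underscored = title.gsub(/\s/, "_")
puts dashed        # hello-big-world
puts underscored   # hello_big_world

# Two spaces become two separators.
doubled = "a  b".gsub(/\s/, "-")
puts doubled       # a--b
```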
#word_chars! ⇒ Object
Remove any non-word characters. For this library’s purposes, this means anything other than letters, numbers, spaces, newlines and linefeeds.
# File 'lib/babosa/identifier.rb', line 138
def word_chars!
  @wrapped_string = (unpack("U*") - Babosa::STRIPPABLE).pack("U*")
end
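The filter is an array subtraction on code points: every code point that appears in the strippable list is removed, and the rest are packed back into a string. In the sketch below, STRIPPABLE_SKETCH is a hypothetical six-character list used only for illustration; the contents of the real Babosa::STRIPPABLE are not shown on this page.

```ruby
# Hypothetical subset; the real list is Babosa::STRIPPABLE.
STRIPPABLE_SKETCH = "!?,.'\"".unpack("U*").freeze

text = "Hello, world! It's here."
stripped = (text.unpack("U*") - STRIPPABLE_SKETCH).pack("U*")
puts stripped   # Hello world Its here
```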