Class: Kebab::Identifier
- Inherits: Object
- Defined in: lib/kebab/identifier.rb
Overview
This class provides some string-manipulation methods specific to slugs.
Note that this class includes many “bang methods”, such as #clean! and #normalize!, that modify the string in place. Each of these methods has a corresponding “bangless” method (e.g., Identifier#clean! and Identifier#clean) that does not appear in this documentation because it is generated dynamically.
All of the bang methods return an instance of String, while the bangless versions return an instance of Kebab::Identifier, so that calls to methods specific to this class can be chained:
string = Identifier.new("hello world")
string.with_separators! # => "hello-world"
string.with_separators # => <Kebab::Identifier:0x000001013e1590 @wrapped_string="hello-world">
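This page does not show how the bangless variants are generated. A minimal sketch of one way to do it, assuming each bangless method copies the receiver, calls the bang method on the copy, and returns the copy. `MiniIdentifier` is a hypothetical stand-in, not the library's class:

```ruby
# Hypothetical stand-in for Kebab::Identifier, used only to illustrate
# how bangless methods could be derived from bang methods.
class MiniIdentifier
  attr_reader :wrapped_string
  alias_method :to_s, :wrapped_string

  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Bang methods: replace the wrapped string and return a String.
  def with_separators!(char = "-")
    @wrapped_string = @wrapped_string.gsub(/\s/u, char)
  end

  def downcase!
    @wrapped_string = @wrapped_string.downcase
  end

  # For every bang method defined above, define a bangless twin that
  # works on a copy and returns that copy, so calls can be chained.
  instance_methods(false).grep(/!\z/).each do |bang|
    define_method(bang.to_s.chomp("!")) do |*args, &block|
      copy = dup
      copy.send(bang, *args, &block)
      copy
    end
  end
end

id = MiniIdentifier.new("Hello World")
p id.with_separators.downcase.to_s   # => "hello-world"
p id.to_s                            # => "Hello World" (original untouched)
```

Returning a copy rather than a String is what makes the chaining in the example above possible.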
Constant Summary
- Error =
Class.new(StandardError)
- @@utf8_proxy =
  if Kebab.jruby15?
    UTF8::JavaProxy
  elsif defined? Unicode::VERSION
    UTF8::UnicodeProxy
  elsif defined? ActiveSupport
    UTF8::ActiveSupportProxy
  else
    UTF8::DumbProxy
  end
Instance Attribute Summary
-
#wrapped_string ⇒ Object
(also: #to_s)
readonly
Returns the value of attribute wrapped_string.
Class Method Summary
-
.utf8_proxy ⇒ Object
Return the proxy used for UTF-8 support.
-
.utf8_proxy=(obj) ⇒ Object
Set a proxy object used for UTF-8 support.
Instance Method Summary
- #==(value) ⇒ Object
-
#clean! ⇒ Object
Converts dashes to spaces, collapses runs of spaces into a single space, and removes leading and trailing whitespace.
-
#default_normalize_options ⇒ Object
The default options for #normalize!.
-
#downcase! ⇒ Object
Perform UTF-8 sensitive downcasing.
- #empty? ⇒ Boolean
- #eql?(value) ⇒ Boolean
-
#initialize(string) ⇒ Identifier
constructor
A new instance of Identifier.
- #method_missing(symbol, *args, &block) ⇒ Object
-
#normalize!(options = nil) ⇒ Object
Normalize the string for use as a URL slug.
-
#normalize_utf8! ⇒ Object
Perform Unicode composition on the wrapped string.
-
#tidy_bytes! ⇒ Object
Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
-
#to_ascii! ⇒ Object
Delete any non-ASCII characters.
- #to_identifier ⇒ Object (also: #to_slug)
-
#to_ruby_method!(allow_bangs = true) ⇒ Object
Normalize a string so that it can safely be used as a Ruby method name.
-
#transliterate!(*kinds) ⇒ Object
(also: #approximate_ascii!)
Approximate an ASCII string.
-
#truncate!(max) ⇒ Object
Truncate the string to max characters.
-
#truncate_bytes!(max) ⇒ Object
Truncate the string to max bytes.
-
#upcase! ⇒ Object
Perform UTF-8 sensitive upcasing.
-
#with_separators!(char = "-") ⇒ Object
(also: #with_dashes!)
Replaces each whitespace character with the separator char (a dash, “-”, by default).
-
#word_chars! ⇒ Object
Remove any non-word characters.
Constructor Details
#initialize(string) ⇒ Identifier
Returns a new instance of Identifier.
# File 'lib/kebab/identifier.rb', line 66
def initialize(string)
  @wrapped_string = string.to_s
  tidy_bytes!
  normalize_utf8!
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method, which forwards any call to a method it does not define to the wrapped string.
#method_missing(symbol, *args, &block) ⇒ Object
# File 'lib/kebab/identifier.rb', line 61
def method_missing(symbol, *args, &block)
  @wrapped_string.__send__(symbol, *args, &block)
end
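Because method_missing forwards unknown calls to the wrapped string, String methods such as `unpack` (used by #truncate! and #word_chars!) can be called directly on an Identifier. The listing above does not define `respond_to_missing?`, so `respond_to?` would report false for those forwarded methods. A minimal sketch of the same delegation with `respond_to_missing?` added, using a hypothetical `Wrapper` class; the `respond_to?` guard inside method_missing is also an addition of this sketch:

```ruby
# Hypothetical wrapper that forwards unknown messages to a String,
# in the same way as Identifier#method_missing.
class Wrapper
  def initialize(string)
    @wrapped_string = string.to_s
  end

  def method_missing(symbol, *args, &block)
    if @wrapped_string.respond_to?(symbol)
      @wrapped_string.__send__(symbol, *args, &block)
    else
      super # raises NoMethodError as usual
    end
  end

  # Keeps respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(symbol, include_private = false)
    @wrapped_string.respond_to?(symbol, include_private) || super
  end
end

w = Wrapper.new("h\u00e9llo")
p w.unpack("U*").length   # => 5
p w.respond_to?(:unpack)  # => true
```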
Instance Attribute Details
#wrapped_string ⇒ Object (readonly) Also known as: to_s
Returns the value of attribute wrapped_string.
# File 'lib/kebab/identifier.rb', line 36
def wrapped_string
  @wrapped_string
end
Class Method Details
.utf8_proxy ⇒ Object
Return the proxy used for UTF-8 support.
# File 'lib/kebab/identifier.rb', line 51
def self.utf8_proxy
  @@utf8_proxy
end
.utf8_proxy=(obj) ⇒ Object
Set a proxy object used for UTF-8 support.
# File 'lib/kebab/identifier.rb', line 57
def self.utf8_proxy=(obj)
  @@utf8_proxy = obj
end
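Judging from the calls in #downcase!, #upcase!, #normalize_utf8! and #tidy_bytes!, an object passed to .utf8_proxy= needs to respond to `downcase`, `upcase`, `normalize_utf8` and `tidy_bytes`, each taking and returning a String. A minimal sketch of such a proxy built on core Ruby, assuming Ruby 2.4 or later (for full Unicode case mapping). `StdlibProxy` is hypothetical, and its `tidy_bytes` only drops invalid bytes instead of re-decoding CP1252:

```ruby
# Hypothetical UTF-8 proxy using only core Ruby (2.4+).
module StdlibProxy
  module_function

  def downcase(string)
    string.downcase                 # Unicode-aware since Ruby 2.4
  end

  def upcase(string)
    string.upcase
  end

  def normalize_utf8(string)
    string.unicode_normalize(:nfc)  # canonical composition
  end

  # Simplification: removes invalid byte sequences rather than
  # reinterpreting them as CP1252 / ISO-8859-1.
  def tidy_bytes(string)
    string.scrub("")
  end
end

# With the library loaded, it would be installed with:
#   Kebab::Identifier.utf8_proxy = StdlibProxy
p StdlibProxy.downcase("ŁÓDŹ")                  # => "łódź"
p StdlibProxy.normalize_utf8("e\u0301").length  # => 1
```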
Instance Method Details
#==(value) ⇒ Object
# File 'lib/kebab/identifier.rb', line 72
def ==(value)
  @wrapped_string.to_s == value.to_s
end
#clean! ⇒ Object
Converts dashes to spaces, collapses runs of spaces into a single space, and removes leading and trailing whitespace.
# File 'lib/kebab/identifier.rb', line 132
def clean!
  @wrapped_string = @wrapped_string.gsub("-", " ").squeeze(" ").strip
end
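The chain in #clean! can be tried on a plain String. Note that `squeeze(" ")` collapses only runs of the space character, so a run of tabs inside the string survives; `strip` removes other whitespace only at the ends. A small illustration with a hypothetical helper `clean`:

```ruby
# Same gsub/squeeze/strip chain as Identifier#clean!, applied to a String.
def clean(string)
  string.gsub("-", " ").squeeze(" ").strip
end

p clean("  foo--bar   baz ")   # => "foo bar baz"
p clean("foo\t\tbar")          # => "foo\t\tbar" (tabs are not squeezed)
```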
#default_normalize_options ⇒ Object
The default options for #normalize!. Override to set your own defaults.
# File 'lib/kebab/identifier.rb', line 274
def default_normalize_options
  {:transliterate => true, :max_length => 255, :separator => "-"}
end
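“Override to set your own defaults” suggests subclassing. A minimal sketch of that pattern, using hypothetical classes `BaseSlug` and `ShortSlug` in place of the library so the block runs on its own. As in #normalize!, caller options are merged over the defaults with `Hash#merge`, so they always win:

```ruby
# Hypothetical stand-ins showing how overriding default_normalize_options
# changes the effective options once caller options are merged in.
class BaseSlug
  def default_normalize_options
    {:transliterate => true, :max_length => 255, :separator => "-"}
  end

  # Mirrors the first line of #normalize!: caller options win over defaults.
  def effective_options(options = nil)
    default_normalize_options.merge(options || {})
  end
end

class ShortSlug < BaseSlug
  def default_normalize_options
    super.merge(:max_length => 50, :separator => "_")
  end
end

p ShortSlug.new.effective_options[:max_length]                  # => 50
p ShortSlug.new.effective_options(:separator => ".")[:separator] # => "."
```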
#downcase! ⇒ Object
Perform UTF-8 sensitive downcasing.
# File 'lib/kebab/identifier.rb', line 242
def downcase!
  @wrapped_string = @@utf8_proxy.downcase(@wrapped_string)
end
#empty? ⇒ Boolean
# File 'lib/kebab/identifier.rb', line 80
def empty?
  # included to make this class :respond_to? :empty? for compatibility with Active Support's
  # #blank?
  @wrapped_string.empty?
end
#eql?(value) ⇒ Boolean
# File 'lib/kebab/identifier.rb', line 76
def eql?(value)
  @wrapped_string == value
end
#normalize!(options = nil) ⇒ Object
Normalize the string for use as a URL slug. Note that in this context, with the default options, normalizing means transliterating, stripping, removing anything other than letters and numbers, downcasing, truncating to 255 bytes, and converting whitespace to dashes.
# File 'lib/kebab/identifier.rb', line 148
def normalize!(options = nil)
  options = default_normalize_options.merge(options || {})

  if translit_option = options[:transliterate]
    if translit_option != true
      transliterate!(*translit_option)
    else
      transliterate!(*options[:transliterations])
    end
  end
  to_ascii! if options[:to_ascii]
  clean!
  word_chars!
  clean!
  downcase!
  truncate_bytes!(options[:max_length])
  with_separators!(options[:separator])
end
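The order of steps in #normalize! matters: punctuation is removed between two #clean! passes, and whitespace becomes separators only at the very end, after truncation. A simplified sketch of the same order on a plain String, with transliteration omitted. Assumptions: word_chars! is approximated by a regular expression that keeps letters, digits and whitespace (the library uses its own Kebab::STRIPPABLE list), and truncate_bytes! is approximated with `byteslice` plus `scrub`:

```ruby
# Simplified, hypothetical re-creation of the #normalize! pipeline.
def normalize(string, max_length: 255, separator: "-", to_ascii: false)
  clean = ->(s) { s.gsub("-", " ").squeeze(" ").strip }
  s = string.dup
  s = s.gsub(/[^\x00-\x7f]/u, "") if to_ascii   # to_ascii!
  s = clean.(s)                                 # clean!
  s = s.gsub(/[^\p{L}\p{N}\s]/u, "")            # approximates word_chars!
  s = clean.(s)                                 # clean!
  s = s.downcase                                # downcase!
  s = s.byteslice(0, max_length).scrub("")      # approximates truncate_bytes!
  s.gsub(/\s/u, separator)                      # with_separators!
end

p normalize("  Hello, World -- 2024!  ")        # => "hello-world-2024"
p normalize("Łódź", max_length: 3)              # => "ł"
```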
#normalize_utf8! ⇒ Object
Perform Unicode composition on the wrapped string.
# File 'lib/kebab/identifier.rb', line 248
def normalize_utf8!
  @wrapped_string = @@utf8_proxy.normalize_utf8(@wrapped_string)
end
#tidy_bytes! ⇒ Object
Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
# File 'lib/kebab/identifier.rb', line 255
def tidy_bytes!
  @wrapped_string = @@utf8_proxy.tidy_bytes(@wrapped_string)
end
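The situation #tidy_bytes! is meant to repair is text whose bytes are really CP1252 (or ISO-8859-1) but which is being treated as UTF-8. When a string is known to be entirely CP1252, core Ruby can re-decode it directly; the sketch below assumes exactly that and does not handle strings that mix valid UTF-8 with stray CP1252 bytes:

```ruby
# Bytes 0x93 and 0x94 are curly quotes in Windows-1252 but invalid in UTF-8.
raw = "\x93caf\xE9\x94".b                          # binary copy standing in for mislabelled input
p raw.dup.force_encoding("UTF-8").valid_encoding?  # => false

fixed = raw.force_encoding("Windows-1252").encode("UTF-8")
p fixed                                            # => "“café”"
```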
#to_ascii! ⇒ Object
Delete any non-ASCII characters.
# File 'lib/kebab/identifier.rb', line 194
def to_ascii!
  @wrapped_string = @wrapped_string.gsub(/[^\x00-\x7f]/u, '')
end
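Because #to_ascii! deletes non-ASCII characters outright instead of approximating them, running it before transliteration loses letters; #normalize! transliterates first for this reason. A plain-String illustration using the same regular expression (the helper name is hypothetical):

```ruby
# Same pattern as Identifier#to_ascii!: drop every character above U+007F.
def to_ascii(string)
  string.gsub(/[^\x00-\x7f]/u, "")
end

p to_ascii("¡Feliz año!")    # => "Feliz ao!"   ("ñ" is lost, not approximated)
p to_ascii("¡Feliz anio!")   # => "Feliz anio!" (after transliteration)
```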
#to_identifier ⇒ Object Also known as: to_slug
# File 'lib/kebab/identifier.rb', line 269
def to_identifier
  self
end
#to_ruby_method!(allow_bangs = true) ⇒ Object
Normalize a string so that it can safely be used as a Ruby method name.
# File 'lib/kebab/identifier.rb', line 168
def to_ruby_method!(allow_bangs = true)
  leader, trailer = @wrapped_string.strip.scan(/\A(.+)(.)\z/).flatten
  leader = leader.to_s
  trailer = trailer.to_s
  if allow_bangs
    trailer.downcase!
    trailer.gsub!(/[^a-z0-9!=\\?]/, '')
  else
    trailer.downcase!
    trailer.gsub!(/[^a-z0-9]/, '')
  end
  id = leader.to_identifier
  id.transliterate!
  id.to_ascii!
  id.clean!
  id.word_chars!
  id.clean!
  @wrapped_string = id.to_s + trailer
  if @wrapped_string == ""
    raise Error, "Input generates impossible Ruby method name"
  end
  with_separators!("_")
end
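#to_ruby_method! treats the last character separately so that a trailing “!”, “?” or “=” can survive. The split relies on `scan(/\A(.+)(.)\z/)`, which needs at least two characters; a one-character input yields empty parts and therefore raises Error. A plain-Ruby look at the splitting and trailer filtering, with a hypothetical helper `split_method_name`. This sketch uses `[^a-z0-9!=?]`; the character class in the listing above also lets a backslash through because of its `\\`:

```ruby
# Splits a candidate method name the way #to_ruby_method! does and
# filters the final character. split_method_name is hypothetical.
def split_method_name(string, allow_bangs = true)
  leader, trailer = string.strip.scan(/\A(.+)(.)\z/).flatten
  leader = leader.to_s
  trailer = trailer.to_s.downcase
  trailer = trailer.gsub(allow_bangs ? /[^a-z0-9!=?]/ : /[^a-z0-9]/, "")
  [leader, trailer]
end

p split_method_name("valid?")         # => ["valid", "?"]
p split_method_name("valid?", false)  # => ["valid", ""]
p split_method_name("x")              # => ["", ""]
```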
#transliterate!(*kinds) ⇒ Object Also known as: approximate_ascii!
Approximate an ASCII string. This works only for Western strings using characters that are Roman-alphabet characters + diacritics. Non-letter characters are left unmodified.
string = Identifier.new "Łódź, Poland"
string.transliterate # => "Lodz, Poland"
string = Identifier.new "
You can pass any key(s) from Characters.approximations as arguments. This allows for contextual approximations. Various languages are supported; to see which ones, look at the source of Transliterator::Base.
string = Identifier.new "Jürgen Müller"
string.transliterate # => "Jurgen Muller"
string.transliterate :german # => "Juergen Mueller"
string = Identifier.new "¡Feliz año!"
string.transliterate # => "¡Feliz ano!"
string.transliterate :spanish # => "¡Feliz anio!"
The approximations are a hash, which you can modify if you choose:
# Make Spanish use "nh" rather than "ni"
Kebab::Transliterator::Spanish::APPROXIMATIONS["ñ"] = "nh"
Notice that this method does not simply convert to ASCII; if you want to remove non-ASCII characters such as “¡” and “¿”, use #to_ascii!:
string.transliterate!(:spanish) # => "¡Feliz anio!"
string.transliterate! # => "¡Feliz anio!"
string.to_ascii! # => "Feliz anio!"
# File 'lib/kebab/identifier.rb', line 119
def transliterate!(*kinds)
  kinds.compact!
  kinds = [:latin] if kinds.empty?
  kinds.each do |kind|
    transliterator = Transliterator.get(kind).instance
    @wrapped_string = transliterator.transliterate(@wrapped_string)
  end
  @wrapped_string
end
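#transliterate! looks up one transliterator per kind, applies them in order, and falls back to :latin when no kind is given. A self-contained sketch of that idea, with small hypothetical lookup tables standing in for the library's Transliterator classes (whose real tables are much larger):

```ruby
# Hypothetical per-language approximation tables.
TABLES = {
  :latin   => {"á" => "a", "ñ" => "n", "ó" => "o", "ü" => "u"},
  :german  => {"ü" => "ue"},
  :spanish => {"ñ" => "ni"}
}.freeze

# Apply each requested table in order, defaulting to :latin like #transliterate!.
def transliterate(string, *kinds)
  kinds.compact!
  kinds = [:latin] if kinds.empty?
  kinds.reduce(string) do |str, kind|
    table = TABLES.fetch(kind)
    str.each_char.map { |c| table.fetch(c, c) }.join
  end
end

p transliterate("¡Feliz año!")             # => "¡Feliz ano!"
p transliterate("¡Feliz año!", :spanish)   # => "¡Feliz anio!"
p transliterate("Jürgen Müller", :german)  # => "Juergen Mueller"
```

Characters missing from a table, such as “¡”, pass through unchanged, which matches the note that non-letter characters are left unmodified.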
#truncate!(max) ⇒ Object
Truncate the string to max characters.
# File 'lib/kebab/identifier.rb', line 202
def truncate!(max)
  @wrapped_string = unpack("U*")[0...max].pack("U*")
end
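#truncate! counts code points, not bytes, by unpacking the string into an array of code points and packing the first max back into UTF-8. Since the constructor composes the string (#normalize_utf8!), an accented letter normally counts as one code point. A plain-String illustration of the same expression (hypothetical helper name):

```ruby
# Same expression as Identifier#truncate!, applied to a String.
def truncate_chars(string, max)
  string.unpack("U*")[0...max].pack("U*")
end

p truncate_chars("Łódź, Poland", 4)            # => "Łódź"
p truncate_chars("Łódź, Poland", 4).bytesize   # => 7
```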
#truncate_bytes!(max) ⇒ Object
Truncate the string to max bytes. This can be useful for ensuring that a UTF-8 string will always fit into a database column with a certain maximum byte length. Multibyte characters are never split, so the result may be shorter than max bytes when the next character would cross the limit.
# File 'lib/kebab/identifier.rb', line 213
def truncate_bytes!(max)
  return @wrapped_string if @wrapped_string.bytesize <= max
  curr = 0
  new = []
  unpack("U*").each do |char|
    break if curr > max
    char = [char].pack("U")
    curr += char.bytesize
    if curr <= max
      new << char
    end
  end
  @wrapped_string = new.join
end
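The loop in #truncate_bytes! keeps whole characters while the running byte count stays within the limit, so a multibyte character that would straddle the limit is dropped entirely. A minimal re-creation on a plain String; it is an equivalent formulation of that loop, not the library's exact code:

```ruby
# Keep whole characters until adding the next one would exceed max bytes.
def truncate_bytes(string, max)
  return string if string.bytesize <= max
  out = +""
  string.each_char do |char|
    break if out.bytesize + char.bytesize > max
    out << char
  end
  out
end

p truncate_bytes("Łódź", 3)   # => "Ł"   (2 bytes; adding "ó" would make 4)
p truncate_bytes("Łódź", 5)   # => "Łód"
```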
#upcase! ⇒ Object
Perform UTF-8 sensitive upcasing.
# File 'lib/kebab/identifier.rb', line 236
def upcase!
  @wrapped_string = @@utf8_proxy.upcase(@wrapped_string)
end
#with_separators!(char = "-") ⇒ Object Also known as: with_dashes!
Replaces each whitespace character with the separator char (a dash, “-”, by default).
# File 'lib/kebab/identifier.rb', line 230
def with_separators!(char = "-")
  @wrapped_string = @wrapped_string.gsub(/\s/u, char)
end
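#with_separators! substitutes each whitespace character individually, so a run of whitespace becomes a run of separators unless #clean! has collapsed it first, as #normalize! does. A plain-String illustration (hypothetical helper name):

```ruby
# Same substitution as Identifier#with_separators!, applied to a String.
def with_separators(string, char = "-")
  string.gsub(/\s/u, char)
end

p with_separators("hello  world")               # => "hello--world"
p with_separators("hello  world".squeeze(" "))  # => "hello-world"
p with_separators("snake case", "_")            # => "snake_case"
```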
#word_chars! ⇒ Object
Remove any non-word characters. For this library’s purposes, this means anything other than letters, numbers, spaces, newlines and linefeeds.
# File 'lib/kebab/identifier.rb', line 139
def word_chars!
  @wrapped_string = (unpack("U*") - Kebab::STRIPPABLE).pack("U*")
end
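#word_chars! removes characters by array difference on code points: every code point listed in Kebab::STRIPPABLE is dropped wherever it occurs. The contents of that constant are not shown on this page, so the sketch below uses a small hypothetical list of punctuation in its place:

```ruby
# Hypothetical stand-in for Kebab::STRIPPABLE: code points to delete.
STRIPPABLE = ",.!?¡¿".unpack("U*").freeze

def word_chars(string)
  (string.unpack("U*") - STRIPPABLE).pack("U*")
end

p word_chars("¡Hola, mundo!")   # => "Hola mundo"
```

`Array#-` removes every occurrence of each listed code point, not just the first, which is why repeated punctuation disappears in one pass.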