Method: Addressable::URI.normalize_component

Defined in: lib/addressable/uri.rb

.normalize_component(component, character_class = CharacterClassesRegexps::RESERVED_AND_UNRESERVED, leave_encoded = '') ⇒ String

Normalizes the encoding of a URI component.

Examples:

Addressable::URI.normalize_component("simpl%65/%65xampl%65", "b-zB-Z")
=> "simple%2Fex%61mple"
Addressable::URI.normalize_component(
  "simpl%65/%65xampl%65", /[^b-zB-Z]/
)
=> "simple%2Fex%61mple"
Addressable::URI.normalize_component(
  "simpl%65/%65xampl%65",
  Addressable::URI::CharacterClasses::UNRESERVED
)
=> "simple%2Fexample"
Addressable::URI.normalize_component(
  "one%20two%2fthree%26four",
  "0-9a-zA-Z &/",
  "/"
)
=> "one two%2Fthree&four"

Parameters:

component (String, #to_str) —

The URI component to encode.
character_class (String, Regexp) (defaults to: CharacterClassesRegexps::RESERVED_AND_UNRESERVED) —

The characters which are not percent encoded. If a String is passed, it must be formatted as a regular expression character class, without the surrounding square brackets. For example, "b-zB-Z0-9" would cause everything but the letters ‘b’ through ‘z’ (in either case) and the digits ‘0’ through ‘9’ to be percent encoded. If a Regexp is passed, the value /[^b-zB-Z0-9]/ would have the same effect. A set of useful String values may be found in the Addressable::URI::CharacterClasses module. The default value is the reserved plus unreserved character classes specified in RFC 3986 (www.ietf.org/rfc/rfc3986.txt).
leave_encoded (String) (defaults to: '') —

When character_class is a String then leave_encoded is a string of characters that should remain percent encoded while normalizing the component; if they appear percent encoded in the original component, then they will be upcased (“%2f” normalized to “%2F”) but otherwise left alone.
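The relationship between a String character_class, a Regexp character_class, and leave_encoded can be sketched in plain Ruby. This is a simplified stand-in modeled on the source listing for this method, not the library's implementation: build_encode_pattern and percent_encode are hypothetical helper names, and format('%02x', b) is assumed to stand in for the library's SEQUENCE_ENCODING_TABLE lookup.

```ruby
# Hypothetical helper (not part of Addressable): turns a String character
# class into the negated Regexp of characters that must be percent encoded.
def build_encode_pattern(character_class, leave_encoded = '')
  return character_class if character_class.is_a?(Regexp)
  return /[^#{character_class}]/ if leave_encoded.empty?

  # Let '%' pass the character class, then match only a '%' that does not
  # begin one of the sequences that should stay encoded.
  character_class = "#{character_class}%" unless character_class.include?('%')
  hex = leave_encoded.bytes.map { |b| format('%02x', b) }.join('|')
  /[^#{character_class}]|%(?!#{hex}|#{hex.upcase})/
end

# Hypothetical helper: percent encodes every match of the pattern.
def percent_encode(text, pattern)
  text.gsub(pattern) { |match| match.bytes.map { |b| format('%%%02X', b) }.join }
end

# A String class and the equivalent negated Regexp select the same characters.
from_string = percent_encode('simple/example', build_encode_pattern('b-zB-Z'))
from_regexp = percent_encode('simple/example', /[^b-zB-Z]/)

# With leave_encoded = '/', an existing "%2F" is skipped instead of having
# its '%' encoded again.
kept = percent_encode('one two%2Fthree&four',
                      build_encode_pattern('0-9a-zA-Z &/', '/'))
```

In this sketch the negative lookahead is what protects an already encoded "%2F": any other '%' in the input still matches the pattern and is encoded.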

Returns:

(String) —

The normalized component, or nil if component is nil.

# File 'lib/addressable/uri.rb', line 552

def self.normalize_component(component, character_class=
    CharacterClassesRegexps::RESERVED_AND_UNRESERVED,
    leave_encoded='')
  return nil if component.nil?

  begin
    component = component.to_str
  rescue NoMethodError, TypeError
    raise TypeError, "Can't convert #{component.class} into String."
  end if !component.is_a? String

  if ![String, Regexp].include?(character_class.class)
    raise TypeError,
      "Expected String or Regexp, got #{character_class.inspect}"
  end
  if character_class.kind_of?(String)
    leave_re = if leave_encoded.length > 0
      character_class = "#{character_class}%" unless character_class.include?('%')

      bytes = leave_encoded.bytes
      leave_encoded_pattern = bytes.map { |b| SEQUENCE_ENCODING_TABLE[b] }.join('|')
      "|%(?!#{leave_encoded_pattern}|#{leave_encoded_pattern.upcase})"
    end

    character_class = if leave_re
                        /[^#{character_class}]#{leave_re}/
                      else
                        /[^#{character_class}]/
                      end
  end
  # We can't perform regexps on invalid UTF sequences, but
  # here we need to, so switch to ASCII.
  component = component.dup
  component.force_encoding(Encoding::ASCII_8BIT)
  unencoded = self.unencode_component(component, String, leave_encoded)
  begin
    encoded = self.encode_component(
      unencoded.unicode_normalize(:nfc),
      character_class,
      leave_encoded
    )
  rescue ArgumentError
    encoded = self.encode_component(unencoded)
  end
  encoded.force_encoding(Encoding::UTF_8)
  return encoded
end
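The overall flow of the method (decode the percent sequences on a binary copy, apply NFC Unicode normalization, then re-encode) can be illustrated with a small standalone sketch. It is an approximation under stated assumptions, not the library code: sketch_normalize and its keep parameter are hypothetical, the default keep set is assumed to be the RFC 3986 unreserved characters, and leave_encoded handling is omitted.

```ruby
# Hypothetical, simplified stand-in for the decode / NFC-normalize /
# re-encode flow. It is not Addressable's implementation.
def sketch_normalize(component, keep = 'A-Za-z0-9\-._~')
  return nil if component.nil?

  # Work on a binary copy so malformed byte sequences cannot break gsub.
  bytes = component.dup.force_encoding(Encoding::ASCII_8BIT)
  decoded = bytes.gsub(/%[0-9A-Fa-f]{2}/) { |seq| [seq[1, 2].hex].pack('C') }

  # Normalize only when the decoded bytes are valid UTF-8; otherwise keep
  # the raw bytes, loosely mirroring the rescue branch in the listing.
  decoded.force_encoding(Encoding::UTF_8)
  decoded = decoded.unicode_normalize(:nfc) if decoded.valid_encoding?

  # Re-encode every byte outside the keep set with upper-case hex digits.
  encoded = decoded.b.gsub(/[^#{keep}]/n) { |ch| format('%%%02X', ch.ord) }
  encoded.force_encoding(Encoding::UTF_8)
end

plain    = sketch_normalize('simpl%65/%65xampl%65')
composed = sketch_normalize('e%CC%81') # 'e' followed by U+0301
upcased  = sketch_normalize('%2f')
```

Here the decomposed sequence "e%CC%81" is composed to U+00E9 before re-encoding, which is why two spellings of the same character end up with the same normalized form.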
