Module: RubyGit::CommandLine::EncodingNormalizer

Defined in:
lib/ruby_git/command_line/encoding_normalizer.rb

Overview

Utility to normalize string encoding

Class Method Details

.detect_encoding(str) ⇒ String

Detects the character encoding of a string or binary data. Returns the name of the binary encoding if no encoding can be detected.

Examples:

EncodingNormalizer.detect_encoding("Hello, world!") #=> "ascii"
EncodingNormalizer.detect_encoding("\xCB\xEF\xF1\xE5\xEC") #=> "ISO-8859-7"
EncodingNormalizer.detect_encoding("\xC0\xCC\xB0\xCD\xC0\xBA") #=> "EUC-KR"

Parameters:

  • str (String)

    the string to detect the encoding of

Returns:

  • (String)

    the name of the detected encoding, or the name of Encoding::BINARY if detection fails



# File 'lib/ruby_git/command_line/encoding_normalizer.rb', line 22

def self.detect_encoding(str)
  # Fall back to the binary encoding's name when no encoding is reported
  CharDet.detect(str)&.dig('encoding') || Encoding::BINARY.name
end
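
The fallback in this one-line body can be tried without the detection library. The following is a minimal sketch, not part of RubyGit: `encoding_name_from` is a hypothetical helper, and the hashes stand in for what `CharDet.detect` is assumed to return (a hash with an 'encoding' key, or nil).

```ruby
# Hypothetical helper mirroring the fallback expression in detect_encoding.
# `result` stands in for the detector's return value (assumed: Hash or nil).
def encoding_name_from(result)
  result&.dig('encoding') || Encoding::BINARY.name
end

# A detector result that names an encoding is passed through unchanged.
detected = encoding_name_from({ 'encoding' => 'EUC-KR', 'confidence' => 0.99 })

# A result with no encoding, or no result at all, falls back to the binary name.
undetected = encoding_name_from({ 'encoding' => nil, 'confidence' => 0.0 })
missing    = encoding_name_from(nil)

puts detected
puts undetected == Encoding::BINARY.name
puts missing == Encoding::BINARY.name
```

The safe-navigation operator (`&.`) covers a nil result, and `||` covers a result whose 'encoding' value is nil, so both cases end at the same fallback.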

.normalize(str, normalize_to: Encoding::UTF_8.name) ⇒ String

Converts str to the encoding named by normalize_to, detecting the source encoding first

Examples:

EncodingNormalizer.normalize("Hello, world!") #=> "Hello, world!"
EncodingNormalizer.normalize("\xCB\xEF\xF1\xE5\xEC") #=> "Λορεμ"
EncodingNormalizer.normalize("\xC0\xCC\xB0\xCD\xC0\xBA") #=> "이것은"

Parameters:

  • str (String)

    the string to normalize

  • normalize_to (String) (defaults to: Encoding::UTF_8.name)

    the name of the encoding to normalize to

Returns:

  • (String)

    the string with encoding converted to normalize_to

Raises:

  • (Encoding::UndefinedConversionError)

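    if the string cannot be converted to normalize_to (the implementation below passes invalid: :replace and undef: :replace to String#encode, which substitutes a replacement character rather than raising for invalid or undefined characters)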
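
The difference between raising and replacing can be seen with plain `String#encode`. This is a small sketch using only core Ruby; the sample text and the US-ASCII target are chosen for illustration and do not come from RubyGit.

```ruby
text = "h\u00E9llo" # "héllo": the accented character has no US-ASCII equivalent

# Without replacement options, the unconvertible character raises.
begin
  text.encode('US-ASCII')
  raised = false
rescue Encoding::UndefinedConversionError
  raised = true
end

# With the replacement options, the character is substituted instead.
# For a non-Unicode target such as US-ASCII the substitute is "?".
replaced = text.encode('US-ASCII', invalid: :replace, undef: :replace)

puts raised
puts replaced
```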



# File 'lib/ruby_git/command_line/encoding_normalizer.rb', line 40

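def self.normalize(str, normalize_to: Encoding::UTF_8.name)
  # Replace invalid byte sequences and unconvertible characters instead of raising
  encoding_options = { invalid: :replace, undef: :replace }

  detected_encoding = detect_encoding(str)

  # Return early if the string is already valid and detected as the target encoding
  return str if str.valid_encoding? && detected_encoding == normalize_to

  str.encode(normalize_to, detected_encoding, **encoding_options)
end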
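
The conversion step on the last line can be reproduced with core Ruby alone. The sketch below is an illustration under one stated assumption: the source encodings are hard-coded, whereas normalize obtains them from detect_encoding. The byte sequences are the ones used in the examples above.

```ruby
# Raw bytes tagged as binary, as undecoded command output might be.
greek_bytes  = "\xCB\xEF\xF1\xE5\xEC".b
korean_bytes = "\xC0\xCC\xB0\xCD\xC0\xBA".b

# Assumption: source encodings are given explicitly here instead of detected.
options = { invalid: :replace, undef: :replace }
greek   = greek_bytes.encode('UTF-8', 'ISO-8859-7', **options)
korean  = korean_bytes.encode('UTF-8', 'EUC-KR', **options)

puts greek
puts korean
puts greek.encoding
```

Passing the source encoding as the second argument tells `String#encode` how to interpret the bytes, regardless of the encoding the string is currently tagged with.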