Method: RubyParser::Legacy::RubyParserStuff#handle_encoding

Defined in:: lib/ruby_parser/legacy/ruby_parser_extras.rb

#handle_encoding(str) ⇒ `Object`

Returns a UTF-8 encoded string after processing BOMs and magic encoding comments.

Holy crap… ok. Here goes:

Ruby’s file handling and encoding support is insane. We need to be able to lex a file. The lexer file is explicitly UTF-8 to make things cleaner. This allows us to deal with extended chars in class and method names. In order to do this, we need to encode all input source files as UTF-8. First, we look for a UTF-8 BOM by looking at the first line while forcing its encoding to ASCII-8BIT. If we find a BOM, we strip it and set the expected encoding to UTF-8. Then, we search for a magic encoding comment. If found, it overrides the BOM. Finally, we force the encoding of the input string to whatever was found, and then encode that to UTF-8 for compatibility with the lexer.

# File 'lib/ruby_parser/legacy/ruby_parser_extras.rb', line 1023

def handle_encoding str
  str = str.dup
  has_enc = str.respond_to? :encoding
  encoding = nil

  header = str.each_line.first(2)
  header.map! { |s| s.force_encoding "ASCII-8BIT" } if has_enc

  first = header.first || ""
  encoding, str = "utf-8", str[3..-1] if first =~ /\A\xEF\xBB\xBF/

  encoding = $1.strip if header.find { |s|
    s[/^#.*?-\*-.*?coding:\s*([^ ;]+).*?-\*-/, 1] ||
    s[/^#.*(?:en)?coding(?:\s*[:=])\s*([\w-]+)/, 1]
  }

  if encoding then
    if has_enc then
      encoding.sub!(/utf-8-.+$/, 'utf-8') # HACK for stupid emacs formats
      hack_encoding str, encoding
    else
      warn "Skipping magic encoding comment"
    end
  else
    # nothing specified... ugh. try to encode as utf-8
    hack_encoding str if has_enc
  end

  str
end

Method: RubyParser::Legacy::RubyParserStuff#handle_encoding

#handle_encoding(str) ⇒ Object

#handle_encoding(str) ⇒ `Object`