Module: Rfc2047

Defined in:
lib/sup/rfc2047.rb

Overview

$Id: rfc2047.rb,v 1.4 2003/04/18 20:55:56 sam Exp $ MODIFIED slightly by William Morgan

An implementation of RFC 2047 decoding.

This module depends on the iconv library by Nobuyoshi Nakada, which I’ve heard may be distributed as a standard part of Ruby 1.8. Many thanks to him for helping with building and using iconv.

Thanks to “Josef ‘Jupp’ Schugt” <jupp / gmx.de> for pointing out an error with stateful character sets.

Copyright © Sam Roberts <sroberts / uniserve.com> 2004

This file is distributed under the same terms as Ruby.

Constant Summary

WORD =
%r{=\?([!\#$%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~ ]+)\?=}

WORDSEQ =
%r{(#{WORD.source})\s+(?=#{WORD.source})}
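
As a rough illustration of what the two patterns do, the sketch below copies them into local variables (word and wordseq are names chosen for this example, not part of the module) and applies them to sample strings. WORD captures the charset, the encoding letter, and the encoded text; WORDSEQ matches an encoded word plus the whitespace that separates it from a following encoded word, so that the whitespace can be dropped.

```ruby
# Local copies of WORD and WORDSEQ so the snippet runs on its own.
word = %r{=\?([!\#$%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~ ]+)\?=}
wordseq = %r{(#{word.source})\s+(?=#{word.source})}

# The three capture groups: charset, encoding letter, encoded text.
captures = word.match("Subject: =?ISO-8859-1?Q?caf=E9?=").captures

# Whitespace between two adjacent encoded words is removed...
joined = "=?UTF-8?B?SGVs?= =?UTF-8?B?bG8=?=".gsub(wordseq, '\1')

# ...but whitespace between an encoded word and plain text is kept.
mixed = "=?UTF-8?B?SGVs?= world".gsub(wordseq, '\1')

p captures
p joined
p mixed
```

The lookahead in WORDSEQ is what keeps the second word available for the next match, so a run of three or more encoded words collapses in a single gsub pass.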

Class Method Details

.decode_to(target, from) ⇒ Object

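Decodes a string, from, containing RFC 2047 encoded words into a target character set, target. The listing below converts with String#encode, so target should be an encoding name that Ruby recognizes; the original note pointing to iconv_open(3) for the supported target encodings appears to date from the iconv-based version. If one of the encoded words cannot be converted to the target encoding, it is left in its encoded form.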

# File 'lib/sup/rfc2047.rb', line 29

def Rfc2047.decode_to(target, from)
  from = from.gsub(WORDSEQ, '\1')
  from.gsub(WORD) do
    |word|
    charset, encoding, text = $1, $2, $3

    # B64 or QP decode, as necessary:
    case encoding
      when 'b', 'B'
        ## Padding is optional in RFC 2047 words. Add some extra padding
        ## before decoding the base64, otherwise on Ruby 2.0 the final byte
        ## might be discarded.
        text = (text + '===').unpack('m*')[0]

      when 'q', 'Q'
        # RFC 2047 has a variant of quoted printable where a ' ' character
        # can be represented as an '_', rather than =20, so convert
        # any of these that we find before doing the QP decoding.
        text = text.tr("_", " ")
        text = text.unpack('M*')[0]

      # Don't need an else, because no other values can be matched in a
      # WORD.
    end

    # Handle UTF-7 specially because Ruby doesn't actually support it as
    # a normal character encoding.
    if charset == 'UTF-7'
      begin
        next text.decode_utf7.encode(target)
      rescue ArgumentError, EncodingError
        next word
      end
    end

    begin
      text.force_encoding(charset).encode(target)
    rescue ArgumentError, EncodingError
      word
    end
  end
end

.is_encoded?(s) ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/sup/rfc2047.rb', line 23

def Rfc2047.is_encoded? s; s =~ WORD end
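
Because the method body is a bare =~, the value it returns is a match offset or nil rather than a strict true or false, which is fine in conditionals. A small sketch with a local copy of WORD (the variable word is named for this example only):

```ruby
# Local copy of WORD so the snippet runs on its own.
word = %r{=\?([!\#$%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~ ]+)\?=}

plain   = "Hello" =~ word                      # no encoded word present
encoded = "Re: =?UTF-8?B?SGVsbG8=?=" =~ word   # offset of the first encoded word

p plain
p encoded
```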