Module: Msg::RTF

Defined in:
lib/msg/rtf.rb

Overview

Introduction

The RTF module contains two helper functions for dealing with RTF in msgs: rtfdecompr and rtf2html.

Both were ported from their original C versions for simplicity’s sake.

Constant Summary

RTF_PREBUF =
"{\\rtf1\\ansi\\mac\\deff0\\deftab720{\\fonttbl;}" \
"{\\f0\\fnil \\froman \\fswiss \\fmodern \\fscript " \
"\\fdecor MS Sans SerifSymbolArialTimes New RomanCourier" \
"{\\colortbl\\red0\\green0\\blue0\n\r\\par " \
"\\pard\\plain\\f0\\fs20\\b\\i\\u\\tab\\tx"
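As a rough illustration of how this constant is used, the sketch below seeds a 4096-byte dictionary with the prebuffer and places the write position just past it, mirroring the first lines of rtfdecompr. The names PREBUF, DICT_SIZE and seed_dictionary are hypothetical and exist only for this example; the string is copied verbatim from RTF_PREBUF above.

```ruby
# Copy of RTF_PREBUF under a hypothetical name, so the example is self-contained.
PREBUF =
  "{\\rtf1\\ansi\\mac\\deff0\\deftab720{\\fonttbl;}" \
  "{\\f0\\fnil \\froman \\fswiss \\fmodern \\fscript " \
  "\\fdecor MS Sans SerifSymbolArialTimes New RomanCourier" \
  "{\\colortbl\\red0\\green0\\blue0\n\r\\par " \
  "\\pard\\plain\\f0\\fs20\\b\\i\\u\\tab\\tx"

DICT_SIZE = 4096 # assumed dictionary size, matching the 4096 used in rtfdecompr

# Hypothetical helper: returns the seeded dictionary and the initial write position.
def seed_dictionary
  buf = PREBUF.b + ("\x00".b * (DICT_SIZE - PREBUF.bytesize))
  [buf, PREBUF.bytesize]
end

buf, wp = seed_dictionary
puts "dictionary bytes: #{buf.bytesize}, first write position: #{wp}"
```

Because the dictionary starts out filled with common RTF control words, a compressed stream can emit back references into it before writing any literal bytes of its own.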

Class Method Summary

Class Method Details

.rtf2html(rtf) ⇒ Object

Substandard conversion of the original C code. It still needs testing and refactoring, and some inaccuracies need correcting. Returns nil if the input doesn’t look like RTF-encapsulated HTML.

Code is a hack, but it works.
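The general approach can be sketched on a much smaller scale. The example below is an assumption-laden simplification, not the library's method: it keeps only the \fromhtml check, the \par, \tab and \'hh rules, and the "throw away unknown control words" fallback, and it ignores the htmltag handling entirely. The name tiny_rtf_to_text is hypothetical.

```ruby
require 'strscan'

# Hypothetical, heavily reduced variant of the scanning loop, for illustration only.
def tiny_rtf_to_text(rtf)
  # Same precondition as rtf2html: give up unless the RTF claims to wrap HTML.
  return nil unless rtf.include?("\\fromhtml")

  scan = StringScanner.new(rtf)
  out = +''
  until scan.eos?
    if scan.scan(/\\par ?/)
      out << "\r\n"                 # paragraph break becomes CRLF
    elsif scan.scan(/\\tab ?/)
      out << "\t"
    elsif scan.scan(/\\'([0-9A-Fa-f]{2})/)
      out << scan[1].hex.chr        # \'hh is a hex-escaped byte
    elsif scan.scan(/\\([{}\\])/)
      out << scan[1]                # escaped brace or backslash
    elsif scan.scan(/\\[a-z-]+(\d+)? ?/)
      # unknown control word: discard
    elsif scan.scan(/[{}\r\n]/)
      # group delimiters and raw newlines carry no text: discard
    else
      out << scan.getch             # ordinary character
    end
  end
  out
end

sample = "{\\rtf1\\ansi\\fromhtml1 Hello\\tab World\\'21\\par}"
p tiny_rtf_to_text(sample)
```

The order of the branches matters: specific control words are tried before the generic discard rule, which is the same design the full method relies on.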



# File 'lib/msg/rtf.rb', line 187

def rtf2html rtf
  scan = StringScanner.new rtf
  # require \fromhtml. is this worth keeping?
  return nil unless rtf["\\fromhtml"]
  html = ''
  ignore_tag = nil
  # skip up to the first htmltag. return nil if we don't ever find one
  return nil unless scan.scan_until /(?=\{\\\*\\htmltag)/
  until scan.empty?
    if scan.scan /\{/
    elsif scan.scan /\}/
    elsif scan.scan /\\\*\\htmltag(\d+) ?/
      #p scan[1]
      if ignore_tag == scan[1]
        scan.scan_until /\}/
        ignore_tag = nil
      end
    elsif scan.scan /\\\*\\mhtmltag(\d+) ?/
      ignore_tag = scan[1]
    elsif scan.scan /\\par ?/
      html << "\r\n"
    elsif scan.scan /\\tab ?/
      html << "\t"
    elsif scan.scan /\\'([0-9A-Za-z]{2})/
      html << scan[1].hex.chr
    elsif scan.scan /\\pntext/
      scan.scan_until /\}/
    elsif scan.scan /\\htmlrtf/
      scan.scan_until /\\htmlrtf0 ?/
    # a generic throw away unknown tags thing.
    # the above 2 however, are handled specially
    elsif scan.scan /\\[a-z-]+(\d+)? ?/
    #elsif scan.scan /\\li(\d+) ?/
    #elsif scan.scan /\\fi-(\d+) ?/
    elsif scan.scan /[\r\n]/
    elsif scan.scan /\\([{}\\])/
      html << scan[1]
    elsif scan.scan /(.)/
      html << scan[1]
    else
      p :wtf
    end
  end
  html.strip.empty? ? nil : html
end

.rtfdecompr(data) ⇒ Object

Decompresses compressed rtf data, as found in the mapi property PR_RTF_COMPRESSED. Code converted from my C version, which in turn was ported from Java source, in JTNEF I believe.

The C version was modified to use a circular buffer for back references, instead of indexing directly into the output buffer as the optimized Java version does. This was in preparation for supporting streaming in a read/write-neutral fashion.
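To make the circular-buffer idea concrete, here is a minimal sketch of decoding one back reference and copying it through a ring buffer, assuming the 12-bit offset and 4-bit length layout used in the method. The names RING_SIZE, decode_reference and copy_reference are hypothetical, and the sketch reads bytes as integers with getbyte/setbyte rather than following the original code line for line.

```ruby
RING_SIZE = 4096 # assumed ring size: a 12-bit offset addresses 2**12 positions

# Hypothetical helper: split the two reference bytes into offset and length.
def decode_reference(hi, lo)
  offset = (hi << 4) | (lo >> 4) # upper 12 bits: position in the ring
  length = (lo & 0x0f) + 2       # lower 4 bits: copy length minus 2
  [offset, length]
end

# Hypothetical helper: copy length bytes from offset to the write position,
# appending each byte to out. Returns the new write position.
def copy_reference(ring, wp, offset, length, out)
  rp = offset
  length.times do
    byte = ring.getbyte(rp)
    ring.setbyte(wp, byte)
    out << byte.chr
    wp = (wp + 1) % RING_SIZE    # both positions wrap around the ring
    rp = (rp + 1) % RING_SIZE
  end
  wp
end

ring = "\x00".b * RING_SIZE
ring[0, 2] = "ab".b              # pretend "ab" has already been written
out = ''.b
offset, length = decode_reference(0x00, 0x04)
wp = copy_reference(ring, 2, offset, length, out)
p [offset, length, out, wp]
```

Copying one byte at a time lets the read position run into bytes the same reference has just written, so a short pattern can repeat itself. Since the ring has a fixed size, the decoder never needs the whole output in memory to resolve a reference.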



# File 'lib/msg/rtf.rb', line 31

def rtfdecompr data
  io  = StringIO.new data
  buf = RTF_PREBUF + "\x00" * (4096 - RTF_PREBUF.length)
  wp  = RTF_PREBUF.length
  rtf = ''

  # get header fields (as defined in RTFLIB.H)
  compr_size, uncompr_size, magic, crc32 = io.read(16).unpack 'L*'
  #warn "compressed-RTF data size mismatch" unless io.size == data.compr_size + 4

  # process the data
  case magic
  when 0x414c454d # magic number that identifies the stream as an uncompressed stream
    rtf = io.read uncompr_size
  when 0x75465a4c # magic number that identifies the stream as a compressed stream
    flag_count = -1
    flags = nil
    while rtf.length < uncompr_size and !io.eof?
      #p [rtf.length, uncompr_size]
      # each flag byte flags 8 literals/references, 1 per bit
      flags = ((flag_count += 1) % 8 == 0) ? io.getc : flags >> 1
      if 1 == (flags & 1) # each flag bit is 1 for reference, 0 for literal
        rp, l = io.getc, io.getc
        # offset is a 12 bit number. 2^12 is 4096, so that's fine
        rp = (rp << 4) | (l >> 4) # the offset relative to block start
        l = (l & 0xf) + 2 # the number of bytes to copy
        l.times do
          rtf << (buf[wp] = buf[rp])
          wp = (wp + 1) % 4096
          rp = (rp + 1) % 4096
        end
      else
        rtf << (buf[wp] = io.getc)
        wp = (wp + 1) % 4096
      end
    end
  else # unknown magic number
    raise "Unknown compression type (magic number 0x%08x)" % magic
  end
  rtf
end
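The header handling at the top of the method can be tried in isolation. The sketch below builds a tiny stream with the uncompressed magic number and reads its four header fields back. RtfHeader, parse_rtf_header and the two MAGIC_* constants are hypothetical names for this example. It also assumes the fields are little-endian and therefore unpacks with 'V4', whereas the method above uses the native-endian 'L*'.

```ruby
# Hypothetical container for the four 32-bit header fields.
RtfHeader = Struct.new(:compr_size, :uncompr_size, :magic, :crc32)

MAGIC_UNCOMPRESSED = 0x414c454d # bytes "MELA" when stored little-endian
MAGIC_COMPRESSED   = 0x75465a4c # bytes "LZFu" when stored little-endian

# Hypothetical helper: parse the 16-byte header from the start of the data.
def parse_rtf_header(data)
  raise ArgumentError, 'header needs 16 bytes' if data.bytesize < 16
  RtfHeader.new(*data.byteslice(0, 16).unpack('V4'))
end

body = "{\\rtf1 hi}".b
# compr_size counts everything after its own 4-byte field:
# the remaining 12 header bytes plus the body.
stream = [body.bytesize + 12, body.bytesize, MAGIC_UNCOMPRESSED, 0].pack('V4') + body

header = parse_rtf_header(stream)
rtf = header.magic == MAGIC_UNCOMPRESSED ? stream.byteslice(16, header.uncompr_size) : nil
p header.to_a, rtf
```

For an uncompressed stream the body is simply the uncompr_size bytes following the header, which matches the first branch of the case statement above.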