Module: Msg::RTF
- Defined in: lib/msg/rtf.rb
Overview
Introduction
The RTF module contains two helper functions for dealing with RTF in msgs: rtfdecompr and rtf2html.
Both were ported from their original C versions for simplicity’s sake.
Constant Summary
- RTF_PREBUF =
    "{\\rtf1\\ansi\\mac\\deff0\\deftab720{\\fonttbl;}" \
    "{\\f0\\fnil \\froman \\fswiss \\fmodern \\fscript " \
    "\\fdecor MS Sans SerifSymbolArialTimes New RomanCourier" \
    "{\\colortbl\\red0\\green0\\blue0\n\r\\par " \
    "\\pard\\plain\\f0\\fs20\\b\\i\\u\\tab\\tx"
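In the rtfdecompr listing, RTF_PREBUF is used to pre-fill a 4096-byte buffer, and writing starts right after it. The following is a minimal sketch of that setup, with the constant's value copied into a local variable; the names `prebuf`, `window`, and `write_pos` are illustrative and do not appear in the library.

```ruby
# Value copied from the RTF_PREBUF constant shown above.
prebuf = "{\\rtf1\\ansi\\mac\\deff0\\deftab720{\\fonttbl;}" \
         "{\\f0\\fnil \\froman \\fswiss \\fmodern \\fscript " \
         "\\fdecor MS Sans SerifSymbolArialTimes New RomanCourier" \
         "{\\colortbl\\red0\\green0\\blue0\n\r\\par " \
         "\\pard\\plain\\f0\\fs20\\b\\i\\u\\tab\\tx"

window_size = 4096
# The rest of the window is zero-filled until decompression overwrites it.
window = prebuf + "\x00" * (window_size - prebuf.bytesize)
# The first decompressed byte is stored immediately after the preloaded text.
write_pos = prebuf.bytesize

puts write_pos        # 207
puts window.bytesize  # 4096
```

Because the window is preloaded with common RTF boilerplate, a compressed stream can reference those bytes from its very first token.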
Class Method Summary
- .rtf2html(rtf) ⇒ Object
  Substandard conversion of the original C code.
- .rtfdecompr(data) ⇒ Object
  Decompresses compressed RTF data, as found in the MAPI property PR_RTF_COMPRESSED.
Class Method Details
.rtf2html(rtf) ⇒ Object
Substandard conversion of the original C code. Still to do: test, refactor, and correct some inaccuracies. Returns nil if the input doesn’t look like RTF-encapsulated HTML, that is, if it lacks the \fromhtml control word or contains no \*\htmltag group.
Code is a hack, but it works.
    # File 'lib/msg/rtf.rb', line 187

    def rtf2html rtf
      scan = StringScanner.new rtf
      # require \fromhtml. is this worth keeping?
      return nil unless rtf["\\fromhtml"]
      html = ''
      ignore_tag = nil
      # skip up to the first htmltag. return nil if we don't ever find one
      return nil unless scan.scan_until /(?=\{\\\*\\htmltag)/
      until scan.empty?
        if scan.scan /\{/
        elsif scan.scan /\}/
        elsif scan.scan /\\\*\\htmltag(\d+) ?/
          #p scan[1]
          if ignore_tag == scan[1]
            scan.scan_until /\}/
            ignore_tag = nil
          end
        elsif scan.scan /\\\*\\mhtmltag(\d+) ?/
          ignore_tag = scan[1]
        elsif scan.scan /\\par ?/
          html << "\r\n"
        elsif scan.scan /\\tab ?/
          html << "\t"
        elsif scan.scan /\\'([0-9A-Za-z]{2})/
          html << scan[1].hex.chr
        elsif scan.scan /\\pntext/
          scan.scan_until /\}/
        elsif scan.scan /\\htmlrtf/
          scan.scan_until /\\htmlrtf0 ?/
        # a generic throw away unknown tags thing.
        # the above 2 however, are handled specially
        elsif scan.scan /\\[a-z-]+(\d+)? ?/
        #elsif scan.scan /\\li(\d+) ?/
        #elsif scan.scan /\\fi-(\d+) ?/
        elsif scan.scan /[\r\n]/
        elsif scan.scan /\\([{}\\])/
          html << scan[1]
        elsif scan.scan /(.)/
          html << scan[1]
        else
          p :wtf
        end
      end
      html.strip.empty? ? nil : html
    end
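To illustrate the scanner dispatch used in rtf2html, here is a reduced sketch that handles only a few of the same escapes (\par, \tab, \'hh, and escaped braces) and discards other control words. The method name `decode_rtf_escapes` and the sample input are hypothetical and not part of Msg::RTF; the \htmltag, \mhtmltag, \pntext, and \htmlrtf handling is deliberately left out.

```ruby
require 'strscan'

# Hypothetical helper, not part of Msg::RTF: decodes a handful of RTF
# escapes with the same "try each pattern in turn" style as rtf2html.
def decode_rtf_escapes(rtf)
  scan = StringScanner.new(rtf)
  out = String.new # mutable binary buffer, since \'hh yields raw bytes
  until scan.eos?
    if scan.scan(/\\par ?/)
      out << "\r\n"
    elsif scan.scan(/\\tab ?/)
      out << "\t"
    elsif scan.scan(/\\'([0-9A-Fa-f]{2})/)
      out << scan[1].hex.chr # \'hh encodes a single byte in hex
    elsif scan.scan(/\\([{}\\])/)
      out << scan[1]         # escaped brace or backslash
    elsif scan.scan(/\\[a-z-]+(\d+)? ?/)
      # unknown control word: throw it away
    elsif scan.scan(/[{}\r\n]/)
      # group delimiters and raw line breaks carry no text
    else
      out << scan.getch      # ordinary character, copied as-is
    end
  end
  out
end

sample = "{\\b <p>caf\\'e9\\par \\{ok\\}</p>}"
p decode_rtf_escapes(sample)
```

The order of the branches matters: specific control words have to be tried before the generic catch-all pattern, otherwise \par and \tab would be discarded along with the unknown ones.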
.rtfdecompr(data) ⇒ Object
Decompresses compressed rtf data, as found in the mapi property PR_RTF_COMPRESSED. Code converted from my C version, which in turn was ported from Java source, in JTNEF I believe.
The C version was modified to use a circular buffer for back references, instead of the Java version's optimization of indexing directly into the output buffer. This was in preparation for supporting streaming in a read/write-neutral fashion.
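The circular-buffer approach can be sketched in isolation. The helper below is hypothetical (the name `copy_back_reference` and the 8-byte window are chosen only for illustration; the real window is 4096 bytes). It copies one byte at a time so that a back reference may overlap the bytes it is currently writing.

```ruby
# Hypothetical helper: resolve one back reference inside a circular window.
# Reads `length` bytes starting at `rp`, writes each at `wp`, and returns
# the copied bytes together with the new write position.
def copy_back_reference(window, wp, rp, length)
  out = String.new
  length.times do
    byte = window.getbyte(rp)
    window.setbyte(wp, byte)   # the byte just written may be read again later
    out << byte.chr
    wp = (wp + 1) % window.bytesize
    rp = (rp + 1) % window.bytesize
  end
  [out, wp]
end

window = "ab".b + "\x00".b * 6  # 8-byte window with "ab" already written
copied, wp = copy_back_reference(window, 2, 0, 5)
p copied  # "ababa"
p wp      # 7
```

A reference of length 5 to a 2-byte history yields "ababa" because the copy reads bytes that were written earlier in the same loop. Since only the window is consulted, the output already produced never has to be kept in memory, which is what makes the scheme suitable for streaming.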
    # File 'lib/msg/rtf.rb', line 31

    def rtfdecompr data
      io = StringIO.new data
      buf = RTF_PREBUF + "\x00" * (4096 - RTF_PREBUF.length)
      wp = RTF_PREBUF.length
      rtf = ''

      # get header fields (as defined in RTFLIB.H)
      compr_size, uncompr_size, magic, crc32 = io.read(16).unpack 'L*'
      #warn "compressed-RTF data size mismatch" unless io.size == data.compr_size + 4

      # process the data
      case magic
      when 0x414c454d # magic number that identifies the stream as an uncompressed stream
        rtf = io.read uncompr_size
      when 0x75465a4c # magic number that identifies the stream as a compressed stream
        flag_count = -1
        flags = nil
        while rtf.length < uncompr_size and !io.eof?
          #p [rtf.length, uncompr_size]
          # each flag byte flags 8 literals/references, 1 per bit
          flags = ((flag_count += 1) % 8 == 0) ? io.getc : flags >> 1
          if 1 == (flags & 1) # each flag bit is 1 for reference, 0 for literal
            rp, l = io.getc, io.getc
            # offset is a 12 bit number. 2^12 is 4096, so that's fine
            rp = (rp << 4) | (l >> 4) # the offset relative to block start
            l = (l & 0xf) + 2 # the number of bytes to copy
            l.times do
              rtf << (buf[wp] = buf[rp])
              wp = (wp + 1) % 4096
              rp = (rp + 1) % 4096
            end
          else
            rtf << (buf[wp] = io.getc)
            wp = (wp + 1) % 4096
          end
        end
      else # unknown magic number
        raise "Unknown compression type (magic number 0x%08x)" % magic
      end
      rtf
    end
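The listing above relies on `io.getc` and `buf[wp]` yielding integers, which was the behaviour in Ruby 1.8; from Ruby 1.9 on both return strings. The following is a sketch of how the same algorithm might look on a current Ruby, using `getbyte` and `setbyte`. The method name `rtf_decompress_sketch` is hypothetical, and reading the header as little-endian (`'V4'`) is an assumption; the original uses the native-endian `'L*'`. The CRC field is read but not verified, as in the original.

```ruby
require 'stringio'

# Hypothetical modern-Ruby variant of rtfdecompr; not the library's code.
def rtf_decompress_sketch(data)
  # Preloaded dictionary, copied from the RTF_PREBUF constant.
  prebuf = ("{\\rtf1\\ansi\\mac\\deff0\\deftab720{\\fonttbl;}" \
            "{\\f0\\fnil \\froman \\fswiss \\fmodern \\fscript " \
            "\\fdecor MS Sans SerifSymbolArialTimes New RomanCourier" \
            "{\\colortbl\\red0\\green0\\blue0\n\r\\par " \
            "\\pard\\plain\\f0\\fs20\\b\\i\\u\\tab\\tx").b
  window_size = 4096

  io = StringIO.new(data.b)
  # Assumption: four little-endian 32-bit header fields.
  _compr_size, raw_size, magic, _crc32 = io.read(16).unpack('V4')

  case magic
  when 0x414c454d # "MELA": payload is stored uncompressed
    io.read(raw_size)
  when 0x75465a4c # "LZFu": payload is compressed
    window = prebuf + "\x00".b * (window_size - prebuf.bytesize)
    wp = prebuf.bytesize
    out = String.new
    flags = 0
    token = 0
    while out.bytesize < raw_size && !io.eof?
      # A fresh flag byte precedes every group of 8 tokens.
      flags = (token % 8).zero? ? io.getbyte : flags >> 1
      token += 1
      if (flags & 1) == 1 # reference: 12-bit offset, 4-bit length
        hi = io.getbyte
        lo = io.getbyte
        break if lo.nil? # truncated stream
        rp = (hi << 4) | (lo >> 4)
        length = (lo & 0x0f) + 2
        length.times do
          byte = window.getbyte(rp)
          window.setbyte(wp, byte)
          out << byte.chr
          wp = (wp + 1) % window_size
          rp = (rp + 1) % window_size
        end
      else # literal byte
        byte = io.getbyte
        break if byte.nil? # truncated stream
        window.setbyte(wp, byte)
        out << byte.chr
        wp = (wp + 1) % window_size
      end
    end
    out
  else
    raise format('Unknown compression type (magic number 0x%08x)', magic)
  end
end

# Hand-built stream: three literals "abc", then a reference to window
# offset 207 (where "abc" was just written) with length 5.
payload = [0x08].pack('C') + 'abc' + [0x0c, 0xf3].pack('C2')
stream = [payload.bytesize + 12, 8, 0x75465a4c, 0].pack('V4') + payload
p rtf_decompress_sketch(stream) # "abcabcab"
```

The flag byte 0x08 marks the fourth token as a reference; its two bytes 0x0c 0xf3 decode to offset 0x0cf (207) and length 3 + 2 = 5, so the copy overlaps its own output and produces "abcab".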