Class: Mapi::Pst::BlockParser

Inherits:
Object
  • Object
Includes:
Types::Constants
Defined in:
lib/mapi/pst.rb

Overview

This class takes a desc record and enumerates the MAPI properties of the item associated with it.

Corresponds to the following libpst functions:

  • _pst_parse_block

  • _pst_process (in some ways, although perhaps that's more Item::Properties#add_property)

Constant Summary

TYPES =
{
  0xbc => 1,
  0x7c => 2,
  # type 3 is removed. an artifact of not handling the indirect blocks properly in libpst.
}
PR_SUBJECT =
PropertySet::TAGS.find { |num, (name, type)| name == 'PR_SUBJECT' }.first.hex
PR_BODY_HTML =
PropertySet::TAGS.find { |num, (name, type)| name == 'PR_BODY_HTML' }.first.hex
IMMEDIATE_TYPES =
[
  PT_SHORT, PT_LONG, PT_BOOLEAN
]
INDIRECT_TYPES =
[
  PT_DOUBLE, PT_OBJECT,
  0x0014, # what's this? probably something like PT_LONGLONG, given the correspondence with the
          # OLE variant types (= VT_I8)
  PT_STRING8, PT_UNICODE, # unicode isn't in libpst, but added here for outlook 2003 down the track
  PT_SYSTIME,
  0x0048, # another unknown
  0x0102, # this is PT_BINARY vs PT_CLSID
  #0x1003, # these are vector types, but they're commented out for now because i'd expect that
  #0x1014, # there's extra decoding needed that i'm not doing. (probably just need a simple
  #        # PT_* => unpack string mapping for the immediate types, and just do unpack('V*') etc
  #0x101e,
  #0x1102
]
ID2_ATTACHMENTS =
0x671
ID2_RECIPIENTS =
0x692
USE_MAIN_DATA =

Sentinel value for local_node_id: read the node's main data array rather than a sub-node's data array.

-1
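As a rough illustration of how the two type lists above are meant to be used, the sketch below classifies a property type the same way the `case type when *IMMEDIATE_TYPES ... when *INDIRECT_TYPES` dispatch in #handle_indirect_values does. The numeric values are the standard MAPI property type codes, written out here only so the snippet runs on its own; `IMMEDIATE`, `INDIRECT` and `storage_class` are hypothetical names, not part of this library.

```ruby
# Standard MAPI property type codes, inlined so the sketch is self-contained.
# In the library these come from Types::Constants.
PT_SHORT, PT_LONG, PT_BOOLEAN = 0x0002, 0x0003, 0x000b
PT_DOUBLE, PT_OBJECT = 0x0005, 0x000d
PT_STRING8, PT_UNICODE, PT_SYSTIME = 0x001e, 0x001f, 0x0040

IMMEDIATE = [PT_SHORT, PT_LONG, PT_BOOLEAN].freeze
INDIRECT  = [PT_DOUBLE, PT_OBJECT, 0x0014, PT_STRING8, PT_UNICODE,
             PT_SYSTIME, 0x0048, 0x0102].freeze

# Hypothetical helper: reports how a value of the given type is handled.
def storage_class(type)
  case type
  when *IMMEDIATE then :immediate # the value is used as-is
  when *INDIRECT  then :indirect  # the value is a pointer that must be resolved
  else :other                     # multi-value or unsupported types
  end
end

storage_class(PT_LONG)    # => :immediate
storage_class(PT_UNICODE) # => :indirect
```

The splat in `when *IMMEDIATE` expands the array into individual comparison values, so adding a type to one of the lists is enough to change how it is dispatched.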

Constructor Details

#initialize(node, local_node_id = USE_MAIN_DATA) ⇒ BlockParser

Returns a new instance of BlockParser.

Parameters:

  • node (NodePtr)
  • local_node_id (Integer) (defaults to: USE_MAIN_DATA)


# File 'lib/mapi/pst.rb', line 1002

def initialize node, local_node_id = USE_MAIN_DATA
  #raise FormatError, "unable to get associated index record for #{node.inspect}" unless node.block
  @node = node
  @data_chunks = {}

  data_array = (local_node_id == USE_MAIN_DATA) ? node.read_main_array : (node.read_sub_array local_node_id)

  data_array.each_with_index { |data, index|
    # see https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst/a3fa280c-eba3-434f-86e4-b95141b3c7b1
    if index == 0
      load_root_header data
    else
      load_page_header data, index
    end
  }

  # now, we may have multiple different blocks
end

Instance Attribute Details

#data_chunks ⇒ Hash<Integer, String> (readonly)

Returns a map from HID to data block.

Returns:

  • (Hash<Integer, String>)

    Map from HID to data block



# File 'lib/mapi/pst.rb', line 998

def data_chunks
  @data_chunks
end

#node ⇒ NodePtr (readonly)

Returns:

  • (NodePtr)

# File 'lib/mapi/pst.rb', line 994

def node
  @node
end

Instance Method Details

#get_data_array(offset) ⇒ Array<String>

Parameters:

  • offset (Integer)

Returns:

  • (Array<String>)


# File 'lib/mapi/pst.rb', line 1107

def get_data_array offset
  raise "offset must be Integer" unless Integer === offset

  if offset == 0
    nil
  elsif (offset & 0x1f) != 0
    # this is NID (node)
    node.read_sub_array(offset)
  else
    # this is HID (heap)
    [data_chunks[offset]]
  end
end
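The branch on `offset & 0x1f` can be tried in isolation. The sketch below is a minimal stand-alone version of that test, plus a helper that splits a HID back into the page index and slot index used when the keys of data_chunks are built (`0x20 * (1 + index) + 65536 * page_index`). Both `hnid_kind` and `split_hid` are hypothetical names, and the 11-bit mask for the slot index is an assumption based on that key formula and the MS-PST HID layout.

```ruby
# Hypothetical helper mirroring the branch structure of get_data_array.
def hnid_kind(hnid)
  raise ArgumentError, "hnid must be Integer" unless hnid.is_a?(Integer)
  return :none if hnid.zero?

  # The low five bits carry the type: zero means heap (HID), non-zero means node (NID).
  (hnid & 0x1f).zero? ? :hid : :nid
end

# Hypothetical helper: split a HID into [page_index, slot_index].
# slot_index is 1-based, matching the (1 + index) term in the key formula.
def split_hid(hid)
  [hid >> 16, (hid >> 5) & 0x7ff]
end

hnid_kind(0x20)    # => :hid
hnid_kind(0x671)   # => :nid (ID2_ATTACHMENTS)
split_hid(0x10040) # => [1, 2]
```

Both ID2_ATTACHMENTS (0x671) and ID2_RECIPIENTS (0x692) have non-zero low bits, so they take the node branch and are read with read_sub_array.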

#get_data_indirect(offset) ⇒ String

Based on the value of offset, returns either some data from buf or some data from the id2 chain, where offset is a key into a lookup table that is stored as the id2 chain. I think I may need to create a BlockParser class that wraps up all this mess.

corresponds to:

  • _pst_getBlockOffsetPointer

  • _pst_getBlockOffset

Parameters:

  • offset (Integer)

Returns:

  • (String)


# File 'lib/mapi/pst.rb', line 1072

def get_data_indirect offset
  raise "offset must be Integer" unless Integer === offset

  return get_data_indirect_io(offset).read
end

#get_data_indirect_io(offset) ⇒ StringIO

Resolves the data pointed to by an HNID.



# File 'lib/mapi/pst.rb', line 1084

def get_data_indirect_io offset
  raise "offset must be Integer" unless Integer === offset

  if offset == 0
    nil
  elsif (offset & 0x1f) != 0
    # this is NID (node)
    data_array = node.read_sub_array(offset)
    raise "local node id #{offset} points multi page count #{data_array.count}, use get_data_array() instead" if data_array.count >= 2
    if data_array.empty?
      StringIO.new ""
    else
      StringIO.new data_array.first
    end
  else
    # this is HID (heap)
    StringIO.new data_chunks[offset]
  end
end

#handle_indirect_values(key, type, value) ⇒ Object



# File 'lib/mapi/pst.rb', line 1121

def handle_indirect_values key, type, value
  case type
  when PT_BOOLEAN
    value = value != 0
  when *IMMEDIATE_TYPES # not including PT_BOOLEAN which we just did above
    # no processing currently applied (needed?).
  when *INDIRECT_TYPES
    # the value is a pointer
    if String === value # ie, value size > 4 above
      value = StringIO.new value
    else
      value = get_data_array(value)
      if value
        value = StringIO.new value.join("")
      end
    end
    # keep strings as immediate values for now, for compatibility with how i set up
    # Msg::Properties::ENCODINGS
    if value
      if type == PT_STRING8
        value = node.pst.helper.convert_ansi_str value.read
      elsif type == PT_UNICODE
        value = Ole::Types::FROM_UTF16.iconv value.read
      end
    end
    # special subject handling
    if key == PR_BODY_HTML and value
      # to keep the msg code happy, which thinks body_html will be an io
      # although, in 2003 version, they are 0102 already
      value = StringIO.new value unless value.respond_to?(:read)
    end
    if key == PR_SUBJECT and String === value and value.length >= 2
      if value[0].ord == 1
        # This two-character header tells us how to omit subject prefixes like `Yes: `, `Re: `, etc.
        # We do not want to omit them, so only the header itself is stripped.
        value = value[2..-1]
      end
=begin
      index = value =~ /^[A-Z]*:/ ? $~[0].length - 1 : nil
      unless ignore == 1 and offset == index
        warn 'something wrong with subject hack' 
        $x = [ignore, offset, value]
        require 'irb'
        IRB.start
        exit
      end
=end

=begin
new idea:

making sense of the \001\00[156] i've seen prefixing subject. i think its to do with the placement
of the ':', or the ' '. And perhaps an optimization to do with thread topic, and ignoring the prefixes
added by mailers. thread topic is equal to subject with all that crap removed.

can test by creating some mails with bizarre subjects.

subject="\001\005RE: blah blah"
subject="\001\001blah blah"
subject="\001\032Out of Office AutoReply: blah blah"
subject="\001\020Undeliverable: blah blah"

looks like it

=end


      # now what i think, is that perhaps, value[offset..-1] ...
      # or something like that should be stored as a special tag. ie, do a double yield
      # for this case. probably PR_CONVERSATION_TOPIC, in which case i'd write instead:
      # yield [PR_SUBJECT, ref_type, value]
      # yield [PR_CONVERSATION_TOPIC, ref_type, value[offset..-1]
      # next # to skip the yield.
    end

    # special handling for embedded objects
    # used for attach_data for attached messages. in which case attach_method should == 5,
    # for embedded object.
    if type == PT_OBJECT and value
      value = value.read if value.respond_to?(:read)
      id2, unknown = value.unpack 'V2'
      io = get_data_indirect_io id2

      # hacky
      #desc2 = OpenStruct.new(:node => io, :pst => node.pst, :sub_block => node.sub_block, :children => [])
      # put nil instead of desc.list_index, otherwise the attachment is attached to itself ad infinitum.
      # should try and fix that FIXME
      # this shouldn't be done always. for an attached message, yes, but for an attached
      # meta file, for example, it shouldn't. difference between embedded_ole vs embedded_msg
      # really.
      # note that in the case where its a embedded ole, you actually get a regular serialized ole
      # object, so i need to create an ole storage object on a rangesioidxchain!
      # eg:
=begin
att.props.display_name # => "Picture (Metafile)"
io = att.props.attach_data
io.read(32).unpack('H*') # => ["d0cf11e0a1b11ae100000.... note the docfile signature.
# plug some missing rangesio holes:
def io.rewind; seek 0; end
def io.flush; raise IOError; end
ole = Ole::Storage.open io
puts ole.root.to_tree

- #<Dirent:"Root Entry">
|- #<Dirent:"\001Ole" size=20 data="\001\000\000\002\000...">
|- #<Dirent:"CONTENTS" size=65696 data="\327\315\306\232\000...">
\- #<Dirent:"\003MailStream" size=12 data="\001\000\000\000[...">
=end

      # until properly fixed, i have disabled this code here, so this will break
      # nested messages temporarily.
      #value = Item.new desc2, RawPropertyStore.new(desc2).to_a
      #desc2.list_index = nil
      value = io
    end
  # this is PT_MV_STRING8, i guess.
  # should probably have the 0x1000 flag, and do the or-ring.
  # example of 0x1102 is PR_OUTLOOK_2003_ENTRYIDS. less sure about that one.
  when 0x101e, 0x1102
    # example data:
    # 0x802b "\003\000\000\000\020\000\000\000\030\000\000\000#\000\000\000BusinessCompetitionFavorites"
    # this 0x802b would be an extended attribute for categories / keywords.
    value = get_data_indirect_io(value).read unless String === value
    num = value.unpack('V')[0]
    offsets = value[4, 4 * num].unpack("V#{num}")
    value = (offsets + [value.length]).to_enum(:each_cons, 2).map { |from, to| value[from...to] }
    value.map! { |str| StringIO.new str } if type == 0x1102
  when 0x101f
    value = get_data_indirect_io(value).read unless String === value
    num = value.unpack('V')[0]
    offsets = value[4, 4 * num].unpack("V#{num}")
    value = (offsets + [value.length]).to_enum(:each_cons, 2).map { |from, to| value[from...to] }
    value.map! { |str| Ole::Types::FROM_UTF16.iconv str }
  when 0x1003 # uint32 array
    value = get_data_indirect_io(value).read unless String === value
    # there is no count field
    value = value.unpack("V#{(value.length / 4)}")
  else
    name = Mapi::Types::DATA[type].first rescue nil
    warn '0x%04x %p' % [key, get_data_indirect_io(value).read]
    raise NotImplementedError, 'unsupported mapi property type - 0x%04x (%p)' % [type, name]
  end
  [key, type, value]
end
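Two of the decoding steps in this method are easy to exercise on their own: the multi-value layout handled in the `0x101e, 0x1102` branch (a uint32 count, that many uint32 offsets, then the concatenated values) and the two-byte subject header. The sketch below reimplements both as stand-alone helpers; `split_multi_value` and `strip_subject_header` are hypothetical names, and the sample inputs are the ones quoted in the comments above. It uses byte-based slicing, which matches the original `value[from...to]` only for binary strings.

```ruby
# Hypothetical helper: split a blob laid out as
# [count: uint32 LE][count offsets: uint32 LE each][concatenated values].
def split_multi_value(blob)
  count = blob.unpack1('V')
  offsets = blob[4, 4 * count].unpack("V#{count}")
  # Each value runs from its own offset to the next one; the last runs to the end.
  (offsets + [blob.bytesize]).each_cons(2).map do |from, to|
    blob.byteslice(from, to - from)
  end
end

# Hypothetical helper: drop the two-byte "\x01<n>" header when it is present,
# leaving any "RE: "-style prefix in place.
def strip_subject_header(subject)
  subject.length >= 2 && subject[0].ord == 1 ? subject[2..-1] : subject
end

blob = [3, 16, 24, 35].pack('V4') + 'BusinessCompetitionFavorites'
split_multi_value(blob)                       # => ["Business", "Competition", "Favorites"]
strip_subject_header("\001\005RE: blah blah") # => "RE: blah blah"
```

For type 0x101f the same split applies, followed by a UTF-16 conversion of each element; that step is left out here because it depends on Ole::Types.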

#load_page_header(data, page_index) ⇒ Object

Parse HNPAGEHDR / HNBITMAPHDR



# File 'lib/mapi/pst.rb', line 1028

def load_page_header data, page_index
  page_map = data.unpack('v').first

  # read HNPAGEMAP
  offsets_count = data[page_map, 2].unpack("v").first + 1
  offset_tables = data[page_map + 4, 2 * offsets_count].unpack("v#{offsets_count}")

  offset_tables.each_cons(2).to_a.each_with_index do |(from, to), index|
    # convert to HID
    @data_chunks[0x20 * (1 + index) + 65536 * page_index] = data[from, to - from]
  end
end
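To make the page map layout concrete, here is a small stand-alone sketch that builds a synthetic page and parses it with the same arithmetic as the method above. The page is invented for illustration (two items, "abc" and "hello", with the page map placed directly after them), and `parse_page_map` is a hypothetical name; real pages may contain padding and fill-level data that this sketch ignores.

```ruby
# Hypothetical stand-alone version of the HNPAGEMAP walk.
# Returns a Hash of HID => item bytes for one page.
def parse_page_map(data, page_index)
  page_map = data.unpack1('v')                    # offset of the page map
  count = data[page_map, 2].unpack1('v') + 1      # allocation count + 1 offsets
  offsets = data[page_map + 4, 2 * count].unpack("v#{count}")

  chunks = {}
  offsets.each_cons(2).each_with_index do |(from, to), index|
    # Slot index goes into bits 5 and up, page index into bits 16 and up.
    chunks[0x20 * (1 + index) + 0x10000 * page_index] = data[from, to - from]
  end
  chunks
end

# Layout: [page map offset = 10]["abc"]["hello"][count = 2][free = 0][offsets 2, 5, 10]
page = [10].pack('v') + 'abc' + 'hello' + [2, 0, 2, 5, 10].pack('v5')

parse_page_map(page, 0) # keys 0x20 and 0x40
parse_page_map(page, 1) # keys 0x10020 and 0x10040
```

The only difference between pages is the `0x10000 * page_index` term, which keeps HIDs from different pages from colliding in data_chunks.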

#load_root_header(data) ⇒ Object

Parse HNHDR



# File 'lib/mapi/pst.rb', line 1045

def load_root_header data
  page_map, sig, @heap_type, @offset1 = data.unpack 'vCCVV'
  raise FormatError, 'invalid signature 0x%02x' % sig unless sig == 0xec
  raise FormatError, 'unknown block type signature 0x%02x' % @heap_type unless TYPES[@heap_type]
  @type = TYPES[@heap_type]

  # read HNPAGEMAP
  offsets_count = data[page_map, 2].unpack("v").first + 1
  offset_tables = data[page_map + 4, 2 * offsets_count].unpack("v#{offsets_count}")

  offset_tables.each_cons(2).to_a.each_with_index do |(from, to), index|
    # convert to HID
    @data_chunks[0x20 * (1 + index)] = data[from, to - from]
  end
end
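The header checks can be illustrated with a hand-built header. The sketch below unpacks the same leading fields and applies the same two validations; `parse_root_header` and `HEAP_TYPES` are hypothetical names, `user_root` is simply a label for the field the method above stores in @offset1, and ArgumentError stands in for the library's FormatError so the snippet runs without it.

```ruby
# Same mapping as TYPES above, renamed so the sketch is self-contained.
HEAP_TYPES = { 0xbc => 1, 0x7c => 2 }.freeze

# Hypothetical stand-alone version of the HNHDR field checks.
def parse_root_header(data)
  page_map, sig, client_sig, user_root = data.unpack('vCCV')
  raise ArgumentError, format('invalid signature 0x%02x', sig) unless sig == 0xec

  type = HEAP_TYPES[client_sig]
  raise ArgumentError, format('unknown block type signature 0x%02x', client_sig) unless type

  { page_map: page_map, type: type, user_root: user_root }
end

# Invented header: page map at offset 12, signature 0xec, client signature 0xbc,
# user root 0x20, followed by four fill bytes.
header = [12, 0xec, 0xbc, 0x20, 0].pack('vCCVV')
info = parse_root_header(header)
info[:type]      # => 1
info[:user_root] # => 32
```

After these checks the root block is walked exactly like any other page, except that its HIDs carry no page offset.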