Class: Mapi::Pst

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/mapi/pst.rb

Defined Under Namespace

Modules: Desc2, Index2
Classes: Attachment, AttachmentTable, BlockParser, CompressibleEncryption, Desc, Desc64, FormatError, Header, ID2Assoc, ID2Assoc64, ID2Mapping, Index, Index64, Item, RangesIOEncryptable, RangesIOID2, RangesIOIdxChain, RawPropertyStore, RawPropertyStoreTable, Recipient, RecipientTable, TablePtr

Constant Summary

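ToTree = Module.new

This is the index and desc record loading code.

ITEM_COUNT_OFFSET = 0x1f0

More constants from libpst.c; these relate to the index block. This is the offset of the item count byte.

LEVEL_INDICATOR_OFFSET = 0x1f3

Offset of the level byte: zero for a leaf block, nonzero for a node block.

BACKLINK_OFFSET = 0x1f8

Offset of the backlink value, which is checked against the linku1 argument of load_idx_rec and load_desc_rec.

ITEM_COUNT_OFFSET_64 = 0x1e8

Mostly guesses.

LEVEL_INDICATOR_OFFSET_64 = 0x1eb

As above, there is a difference of 3 between these two offsets.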


Constructor Details

#initialize(io) ⇒ Pst

corresponds to

  • pst_open

  • pst_load_index

Raises:

  • (FormatError) if the size field in the header does not match the size of io
# File 'lib/mapi/pst.rb', line 265

def initialize io
  @io = io
  io.pos = 0
  @header = Header.new io.read(Header::SIZE)

  # would prefer this to be in Header#validate, but it doesn't have the io size.
  # should perhaps downgrade this to just be a warning...
  raise FormatError, "header size field invalid (#{header.size} != #{io.size})" unless header.size == io.size

  load_idx
  load_desc
  load_xattrib

  @special_folder_ids = {}
end

Instance Attribute Details

#desc ⇒ Object (readonly)

Returns the value of attribute desc.



# File 'lib/mapi/pst.rb', line 260

def desc
  @desc
end

#header ⇒ Object (readonly)

Returns the value of attribute header.



# File 'lib/mapi/pst.rb', line 260

def header
  @header
end

#idx ⇒ Object (readonly)

Returns the value of attribute idx.



# File 'lib/mapi/pst.rb', line 260

def idx
  @idx
end

#io ⇒ Object (readonly)

Returns the value of attribute io.



# File 'lib/mapi/pst.rb', line 260

def io
  @io
end

#special_folder_ids ⇒ Object (readonly)

Returns the value of attribute special_folder_ids.



# File 'lib/mapi/pst.rb', line 260

def special_folder_ids
  @special_folder_ids
end

Class Method Details

.make_property_set(property_list) ⇒ Object

Higher-level item code. Wraps up the raw properties above and gives nice objects to work with; handles item relationships too.




# File 'lib/mapi/pst.rb', line 1502

def self.make_property_set property_list
  # property_list holds [key, type, value] triples; only key and value are used here
  hash = property_list.each_with_object({}) do |(key, _type, value), props|
    props[PropertySet::Key.new(key)] = value
  end
  PropertySet.new hash
end

.unpack(str, unpack_spec) ⇒ Object

Unfortunately there is no analogue of Q that is always little-endian. This method translates T as an unsigned quad word in little-endian byte order, so as not to pollute the rest of the code.

I didn't want to override String#unpack, because that would be too hacky and incomplete.



# File 'lib/mapi/pst.rb', line 74

def self.unpack str, unpack_spec
  return str.unpack(unpack_spec) unless unpack_spec['T']
  @unpack_cache ||= {}
  t_offsets, new_spec = @unpack_cache[unpack_spec]
  unless t_offsets
    t_offsets = []
    offset = 0
    new_spec = ''
    unpack_spec.scan(/([^\d])_?(\*|\d+)?/o) do
      num_elems = $1.downcase == 'a' ? 1 : ($2 || 1).to_i
      if $1 == 'T'
        num_elems.times { |i| t_offsets << offset + i }
        new_spec << "V#{num_elems * 2}"
      else
        new_spec << $~[0]
      end
      offset += num_elems
    end
    @unpack_cache[unpack_spec] = [t_offsets, new_spec]
  end
  a = str.unpack(new_spec)
  t_offsets.each do |offset|
    low, high = a[offset, 2]
    a[offset, 2] = low && high ? low + (high << 32) : nil
  end
  a
end
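The T translation boils down to reading two 32-bit little-endian words ('V') and combining them as low + (high << 32). The standalone sketch below shows just that combination; the helper name unpack_quads is hypothetical and not part of Mapi::Pst. It also compares the result against the 'Q<' directive, which Ruby 1.9.3 and later provide for little-endian unsigned 64-bit values on any platform.

```ruby
# Hypothetical helper (not part of Mapi::Pst): decode +count+ little-endian
# unsigned 64-bit integers by pairing up 32-bit 'V' words, as Pst.unpack does
# for each 'T' directive.
def unpack_quads(str, count)
  words = str.unpack("V#{count * 2}")
  words.each_slice(2).map { |low, high| low + (high << 32) }
end

data = [1, 2**40 + 5].pack('Q<2')
p unpack_quads(data, 2)
p data.unpack('Q<2')
```

Both calls should print the same pair of integers. Pst.unpack additionally caches the rewritten spec per format string and handles T mixed in with other directives.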

Instance Method Details

#desc_from_id(id) ⇒ Object

As for idx_from_id: looks up a desc record by its id in the cache built by load_desc.

corresponds to:

  • _pst_getDptr



# File 'lib/mapi/pst.rb', line 748

def desc_from_id id
  @desc_from_id[id]
end

#dump_debug_info ⇒ Object

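Other miscellaneous code. Prints debugging information about the file: the header, file range usage, the idx entries, the desc tree and the item tree.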




# File 'lib/mapi/pst.rb', line 1689

def dump_debug_info
  puts "* pst header"
  p header

=begin
Looking at the output of this, for blank-o1997.pst, i see this part:
...
- (26624,516) desc block data (overlap of 4 bytes)
- (27136,516) desc block data (gap of 508 bytes)
- (28160,516) desc block data (gap of 2620 bytes)
...

which confirms my belief that the block size for idx and desc is more likely 512
=end

  if 0 + 0 == 0
    puts '* file range usage'
    file_ranges =
      # these 3 things, should account for most of the data in the file.
      [[0, Header::SIZE, 'pst file header']] +
      @idx_offsets.map { |offset| [offset, Index::BLOCK_SIZE, 'idx block data'] } +
      @desc_offsets.map { |offset| [offset, Desc::BLOCK_SIZE, 'desc block data'] } +
      @idx.map { |idx| [idx.offset, idx.size, 'idx id=0x%x (%s)' % [idx.id, idx.type]] }
    (file_ranges.sort_by { |range| range.first } + [nil]).each_cons(2) do |(offset, size, name), next_record|
      # i think there is a padding of the size out to 64 bytes,
      # which is equivalent to padding out the final offset, because i think the offset is
      # similarly aligned
      pad_amount = 64
      warn 'i am wrong about the offset padding' if offset % pad_amount != 0
      # so, assuming i'm not wrong about that, then we can calculate how much padding is needed.
      pad = pad_amount - (size % pad_amount)
      pad = 0 if pad == pad_amount
      gap = next_record ? next_record.first - (offset + size + pad) : 0
      extra = case gap <=> 0
        when -1; ["overlap of #{gap.abs} bytes"]
        when  0; []
        when +1; ["gap of #{gap} bytes"]
      end
      # how about we check that padding
      @io.pos = offset + size
      pad_bytes = @io.read(pad)
      extra += ["padding not all zero"] unless pad_bytes == 0.chr * pad
      puts "- #{offset}:#{size}+#{pad} #{name.inspect}" + (extra.empty? ? '' : ' [' + extra * ', ' + ']')
    end
  end

  # i think the idea of the idx, and indeed the idx2, is just to be able to
  # refer to data indirectly, which means it can get moved around, and you just update
  # the idx table. it is simply a list of file offsets and sizes.
  # not sure i get how id2 plays into it though....
  # the sizes seem to be all even. is that a coincidence? and the ids are all even. that
  # seems to be related to something else (see the (id & 0x2) check in id2_block_idx_chain)
  puts '* idx entries'
  @idx.each { |idx| puts "- #{idx.inspect}" }

  # if you look at the desc tree, you notice a few things:
  # 1. there is a desc that seems to be the parent of all the folders, messages etc.
  #    it is the one whose parent is itself.
  #    one of its children is referenced as the subtree_entryid of the first desc item,
  #    the root.
  # 2. typically only 2 types of desc records have idx2_id != 0. messages themselves,
  #    and the desc with id = 0x61 - the xattrib container. everything else uses the
  #    regular ids to find its data. i think it should be reframed as small blocks and
  #    big blocks, but i'll look into it more.
  #
  # idx_id and idx2_id are for getting to the data. desc_id and parent_desc_id just define
  # the parent <-> child relationship, and the desc_ids are how the items are referred to in
  # entryids.
  # note that these aren't unique! eg for 0, 4 etc. i expect these'd never change, as the ids
  # are stored in entryids. whereas the idx and idx2 could be a bit more volatile.
  puts '* desc tree'
  # make a dummy root hold everything just for convenience
  root = Desc.new ''
  def root.inspect; "#<Pst::Root>"; end
  root.children.replace @orphans
  # this still loads the whole thing as a string for gsub. should use the direct output io
  # version.
  puts root.to_tree.gsub(/, (parent_desc_id|idx2_id)=0x0(?!\d)/, '')

  # this is fairly easy to understand: it's just an attempt to display the pst items in a tree form
  # which resembles what you'd see in Outlook.
  puts '* item tree'
  # now streams directly
  root_item.to_tree STDOUT
end
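The file range check above rests on one piece of arithmetic: pad each range's size up to a multiple of 64 bytes (the author's working assumption), then compare the padded end with the start of the next range. The standalone sketch below isolates that calculation; the method name range_report and the [name, pad, note] result shape are hypothetical, and it leaves out the offset-alignment warning and the check that padding bytes are zero.

```ruby
# Sketch of the gap/overlap arithmetic from dump_debug_info. Each range is
# [offset, size, name]; sizes are assumed to be padded out to 64 bytes.
PAD_AMOUNT = 64

def range_report(ranges)
  sorted = ranges.sort_by(&:first)
  (sorted + [nil]).each_cons(2).map do |(offset, size, name), next_range|
    pad = (PAD_AMOUNT - size % PAD_AMOUNT) % PAD_AMOUNT  # 0 when already aligned
    gap = next_range ? next_range.first - (offset + size + pad) : 0
    note = if gap < 0 then "overlap of #{-gap} bytes"
           elsif gap > 0 then "gap of #{gap} bytes"
           end
    [name, pad, note]
  end
end

range_report([[0, 512, 'header'], [512, 100, 'a'], [640, 64, 'b'], [700, 10, 'c']])
```

Writing the padding as (PAD_AMOUNT - size % PAD_AMOUNT) % PAD_AMOUNT folds the original's two-step "pad = 0 if pad == pad_amount" into one expression.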

#each(&block) ⇒ Object



# File 'lib/mapi/pst.rb', line 1791

def each(&block)
  root = self.root
  block[root]
  root.each_recursive(&block)
end
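Because Pst includes Enumerable and #each yields the root item and then every descendant, methods such as map, select and count work over the whole item tree. The sketch below shows the same pattern on hypothetical TreeNode and Tree classes; the pre-order order of each_recursive is an assumption, and the enum_for line is an addition that Pst#each does not have (without it, calling each with no block fails).

```ruby
# Hypothetical node class with an each_recursive that yields descendants
# depth-first (pre-order).
class TreeNode
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  def each_recursive(&block)
    children.each do |child|
      block.call(child)
      child.each_recursive(&block)
    end
  end
end

# Mirrors Pst#each: yield the root first, then everything below it.
class Tree
  include Enumerable

  def initialize(root)
    @root = root
  end

  def each(&block)
    return enum_for(:each) unless block  # addition: return an Enumerator without a block
    block.call(@root)
    @root.each_recursive(&block)
  end
end

tree = Tree.new(TreeNode.new('root', [TreeNode.new('Inbox', [TreeNode.new('msg')]), TreeNode.new('Sent')]))
p tree.map(&:name)
```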

#encrypted? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/mapi/pst.rb', line 281

def encrypted?
  @header.encrypted?
end

#id2_block_idx_chain(idx) ⇒ Object

corresponds to:

  • _pst_ff_getID2block

  • _pst_ff_getID2data

  • _pst_ff_compile_ID



# File 'lib/mapi/pst.rb', line 911

def id2_block_idx_chain idx
  if (idx.id & 0x2) == 0
    [idx]
  else
    buf = idx.read
    type, fdepth, count = buf[0, 4].unpack 'CCv'
    unless type == 1 # libpst.c:3958
      warn 'Error in idx_chain - %p, %p, %p - attempting to ignore' % [type, fdepth, count]
      return [idx]
    end
    # there are 4 unaccounted for bytes here, 4...8
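    if header.version_2003?
      # String#unpack has no 'T' directive, so go through Pst.unpack, which
      # translates T into a little-endian unsigned 64-bit value
      ids = Pst.unpack(buf[8, count * 8], "T#{count}")
    else
      ids = buf[8, count * 4].unpack('V*')
    end
    if fdepth == 1
      ids.map { |id| idx_from_id id }
    else
      ids.flat_map { |id| id2_block_idx_chain idx_from_id(id) }
    end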
  end
end
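id2_block_idx_chain treats an idx whose id has bit 0x2 set as an indirect block: a one-byte type (expected to be 1), a one-byte depth, a 16-bit count, 4 unexplained bytes, then an array of child ids, resolved recursively until depth 1. The sketch below replays that walk over a hypothetical in-memory hash of raw blocks (the block_chain name and the hash are illustrative, not library API) and uses the pre-2003 32-bit id layout.

```ruby
# Hypothetical sketch: resolve a block id into the list of leaf block ids.
# +blocks+ maps an id to the raw bytes of that block.
def block_chain(blocks, id)
  return [id] if (id & 0x2) == 0          # direct block: no indirection
  buf = blocks.fetch(id)
  type, depth, count = buf[0, 4].unpack('CCv')
  unless type == 1
    warn format('unexpected chain block type %p - ignoring', type)
    return [id]
  end
  child_ids = buf[8, count * 4].unpack('V*')  # bytes 4...8 are skipped
  return child_ids if depth == 1
  child_ids.flat_map { |child| block_chain(blocks, child) }
end

blocks = {
  0x6 => [1, 1, 2, 0].pack('CCvV') + [0x4, 0x8].pack('V2'),  # depth 1 -> ids 4, 8
  0xA => [1, 2, 1, 0].pack('CCvV') + [0x6].pack('V')          # depth 2 -> block 6
}
p block_chain(blocks, 0xA)
```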

#idx_from_id(id) ⇒ Object

Most access to idx objects goes through this method.

corresponds to

  • _pst_getID



# File 'lib/mapi/pst.rb', line 652

def idx_from_id id
  @idx_from_id[id]
end

#inspect ⇒ Object



# File 'lib/mapi/pst.rb', line 1801

def inspect
  "#<Pst name=#{name.inspect} io=#{io.inspect}>"
end

#load_desc ⇒ Object

corresponds to

  • _pst_build_desc_ptr

  • record_descriptor



# File 'lib/mapi/pst.rb', line 659

def load_desc
  @desc = []
  @desc_offsets = []
  if header.version_2003?
    @desc = Desc64.load_chain io, header
    @desc.each { |desc| desc.pst = self }
  else
    load_desc_rec header.index2, header.index2_count, 0x21
  end

  # first create a lookup cache
  @desc_from_id = {}
  @desc.each do |desc|
    desc.pst = self
    warn "there are duplicate desc records with id #{desc.desc_id}" if @desc_from_id[desc.desc_id]
    @desc_from_id[desc.desc_id] = desc
  end

  # now turn the flat list of loaded desc records into a tree

  # well, they have no parent, so they're more like the toplevel descs.
  @orphans = []
  # now assign each node to its parent's child array, putting the orphans in the above
  @desc.each do |desc|
    parent = @desc_from_id[desc.parent_desc_id]
    # note, besides this, it's possible to create other circular structures.
    if parent == desc
      # this actually happens usually, for the root_item it appears.
      #warn "desc record's parent is itself (#{desc.inspect})"
    # maybe add some more checks in here for circular structures
    elsif parent
      parent.children << desc
      next
    end
    @orphans << desc
  end

  # maybe change this to some sort of sanity check. orphans are expected
  # warn "have #{@orphans.length} orphan desc record(s)." unless @orphans.empty?
end
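load_desc builds a lookup hash by id, then attaches each record to its parent's children. Records whose parent is missing, or whose parent is themselves (as the root usually is), become orphans, the toplevel nodes. The sketch below applies the same two passes to a hypothetical Record struct rather than the library's Desc class.

```ruby
# Hypothetical record type standing in for Desc: id, parent id, child list.
Record = Struct.new(:id, :parent_id, :children)

# Returns the toplevel records; everything else is attached to its parent.
def build_tree(records)
  by_id = {}
  records.each do |rec|
    warn "duplicate record id #{rec.id}" if by_id.key?(rec.id)
    by_id[rec.id] = rec
  end

  orphans = []
  records.each do |rec|
    parent = by_id[rec.parent_id]
    if parent && !parent.equal?(rec)   # identity check: a self-parented record is toplevel
      parent.children << rec
    else
      orphans << rec
    end
  end
  orphans
end

records = [
  Record.new(0x21, 0x21, []),   # parent is itself -> toplevel
  Record.new(0x61, 0x0, []),    # unknown parent   -> toplevel
  Record.new(0x122, 0x21, [])   # child of 0x21
]
roots = build_tree(records)
p roots.map(&:id)
```

The sketch uses equal? (object identity) instead of ==, because Struct#== compares member values rather than identity.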

#load_desc_rec(offset, linku1, start_val) ⇒ Object

Load the flat list of desc records recursively.

corresponds to

  • _pst_build_desc_ptr

  • record_descriptor



# File 'lib/mapi/pst.rb', line 705

def load_desc_rec offset, linku1, start_val
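  @desc_offsets << offset

  buf = pst_read_block_size offset, Desc::BLOCK_SIZE, false
  # String#[] with an integer index returns a 1-character String on Ruby 1.9+,
  # so read the single-byte count and level fields with getbyte
  item_count = buf.getbyte(ITEM_COUNT_OFFSET)

  # not real desc
  desc = Desc.new buf[BACKLINK_OFFSET, 4]
  raise 'blah 1' unless desc.desc_id == linku1

  if buf.getbyte(LEVEL_INDICATOR_OFFSET) == 0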
    # leaf pointers
    raise "have too many active items in index (#{item_count})" if item_count > Desc::COUNT_MAX
    # split the data into item_count desc objects
    buf[0, Desc::SIZE * item_count].scan(/.{#{Desc::SIZE}}/mo).each_with_index do |data, i|
      desc = Desc.new data
      # first entry
      raise 'blah 3' if i == 0 and start_val != 0 and desc.desc_id != start_val
      # this shouldn't really happen i'd imagine
      break if desc.desc_id == 0
      @desc << desc
    end
  else
    # node pointers
    raise "have too many active items in index (#{item_count})" if item_count > Index::COUNT_MAX
    # split the data into item_count table pointers
    buf[0, TablePtr::SIZE * item_count].scan(/.{#{TablePtr::SIZE}}/mo).each_with_index do |data, i|
      table = TablePtr.new data
      # for the first value, we expect the start to be equal to start_val. note the -1 check,
      # so even for the first we expect it to be equal. that's the 0x21 (dec 33) desc record,
      # which means we assert that the first desc record is always 33...
      raise 'blah 3' if i == 0 and start_val != -1 and table.start != start_val
      # this shouldn't really happen i'd imagine
      break if table.start == 0
      load_desc_rec table.offset, table.u1, table.start
    end
  end
end

#load_idx ⇒ Object

corresponds to

  • _pst_build_id_ptr



# File 'lib/mapi/pst.rb', line 588

def load_idx
  @idx = []
  @idx_offsets = []
  if header.version_2003?
    @idx = Index64.load_chain io, header
    @idx.each { |idx| idx.pst = self }
  else
    load_idx_rec header.index1, header.index1_count, 0
  end

  # we'll typically be accessing by id, so create a hash as a lookup cache
  @idx_from_id = {}
  @idx.each do |idx|
    warn "there are duplicate idx records with id #{idx.id}" if @idx_from_id[idx.id]
    @idx_from_id[idx.id] = idx
  end
end

#load_idx2(idx) ⇒ Object



# File 'lib/mapi/pst.rb', line 856

def load_idx2 idx
  if header.version_2003?
    id2 = ID2Assoc64.load_chain idx
  else
    id2 = load_idx2_rec idx
  end
  ID2Mapping.new self, id2
end

#load_idx2_rec(idx) ⇒ Object

corresponds to

  • _pst_build_id2



# File 'lib/mapi/pst.rb', line 867

def load_idx2_rec idx
  # i should perhaps use an idx chain style read here?
  buf = pst_read_block_size idx.offset, idx.size, false
  type, count = buf.unpack 'v2'
  unless type == 0x0002
    raise 'unknown id2 type 0x%04x' % type
    #return
  end
  id2 = []
  count.times do |i|
    assoc = ID2Assoc.new buf[4 + ID2Assoc::SIZE * i, ID2Assoc::SIZE]
    id2 << assoc
    if assoc.table2 != 0
      id2 += load_idx2_rec idx_from_id(assoc.table2)
    end
  end
  id2
end
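load_idx2_rec reads a table that starts with a 16-bit type (which must be 0x0002) and a 16-bit count, followed by association entries; any entry with a nonzero table2 points at a further table whose entries are appended right after it. The sketch below replays that walk over a hypothetical hash of raw tables. The 12-byte [id2, id, table2] entry layout of three 32-bit words is an assumption for illustration, not a statement about ID2Assoc::SIZE.

```ruby
# Hypothetical sketch of the recursive id2 table walk.
module Id2Sketch
  Assoc = Struct.new(:id2, :id, :table2)
  ENTRY_SIZE = 12  # assumed: three little-endian 32-bit words

  # +tables+ maps a table id to its raw bytes.
  def self.read_table(tables, table_id)
    buf = tables.fetch(table_id)
    type, count = buf.unpack('v2')
    raise format('unknown id2 type 0x%04x', type) unless type == 0x0002
    count.times.flat_map do |i|
      assoc = Assoc.new(*buf[4 + ENTRY_SIZE * i, ENTRY_SIZE].unpack('V3'))
      nested = assoc.table2 != 0 ? read_table(tables, assoc.table2) : []
      [assoc] + nested   # the entry first, then anything its sub-table contributes
    end
  end
end

tables = {
  1 => [2, 2].pack('v2') + [0x671, 0x100, 0].pack('V3') + [0x692, 0x104, 2].pack('V3'),
  2 => [2, 1].pack('v2') + [0x8001, 0x108, 0].pack('V3')
}
p Id2Sketch.read_table(tables, 1).map(&:id2)
```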

#load_idx_rec(offset, linku1, start_val) ⇒ Object

Load the flat idx table, which maps ids to file ranges. This is the recursive helper.

corresponds to

  • _pst_build_id_ptr



# File 'lib/mapi/pst.rb', line 610

def load_idx_rec offset, linku1, start_val
  @idx_offsets << offset

  #_pst_read_block_size(pf, offset, BLOCK_SIZE, &buf, 0, 0) < BLOCK_SIZE)
  buf = pst_read_block_size offset, Index::BLOCK_SIZE, false

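  # String#[] with an integer index returns a 1-character String on Ruby 1.9+,
  # so read the single-byte count and level fields with getbyte
  item_count = buf.getbyte(ITEM_COUNT_OFFSET)
  raise "have too many active items in index (#{item_count})" if item_count > Index::COUNT_MAX

  idx = Index.new buf[BACKLINK_OFFSET, Index::SIZE]
  raise 'blah 1' unless idx.id == linku1

  if buf.getbyte(LEVEL_INDICATOR_OFFSET) == 0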
    # leaf pointers
    # split the data into item_count index objects
    buf[0, Index::SIZE * item_count].scan(/.{#{Index::SIZE}}/mo).each_with_index do |data, i|
      idx = Index.new data
      # first entry
      raise 'blah 3' if i == 0 and start_val != 0 and idx.id != start_val
      idx.pst = self
      # this shouldn't really happen i'd imagine
      break if idx.id == 0
      @idx << idx
    end
  else
    # node pointers
    # split the data into item_count table pointers
    buf[0, TablePtr::SIZE * item_count].scan(/.{#{TablePtr::SIZE}}/mo).each_with_index do |data, i|
      table = TablePtr.new data
      # for the first value, we expect the start to be equal
      raise 'blah 3' if i == 0 and start_val != 0 and table.start != start_val
      # this shouldn't really happen i'd imagine
      break if table.start == 0
      load_idx_rec table.offset, table.u1, table.start
    end
  end
end
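Each index block stores its entry count at ITEM_COUNT_OFFSET and a level byte at LEVEL_INDICATOR_OFFSET. A zero level means the entries are leaf records, and anything else means they are table pointers to child blocks, which load_idx_rec (and likewise load_desc_rec) follows recursively. The sketch below walks fake in-memory blocks to show that recursion. The 512-byte block size, the 12-byte entries, and the [start, u1, offset] and [id, offset, size] layouts are assumptions for illustration, and the backlink and start_val checks are omitted.

```ruby
# Hypothetical sketch of the recursive index walk; not the library's classes.
module IndexWalkSketch
  ITEM_COUNT_OFFSET      = 0x1f0
  LEVEL_INDICATOR_OFFSET = 0x1f3
  BLOCK_SIZE             = 512  # assumed block size
  ENTRY_SIZE             = 12   # assumed: three little-endian 32-bit words

  # Build a fake block: packed entries, zero padding, then count and level bytes.
  def self.make_block(entries, level)
    buf = entries.map { |e| e.pack('V3') }.join.b.ljust(BLOCK_SIZE, "\0")
    buf.setbyte(ITEM_COUNT_OFFSET, entries.size)
    buf.setbyte(LEVEL_INDICATOR_OFFSET, level)
    buf
  end

  # Collect leaf entries, in order, from the block tree rooted at +offset+.
  def self.walk(blocks, offset, out = [])
    buf   = blocks.fetch(offset)
    count = buf.getbyte(ITEM_COUNT_OFFSET)
    leaf  = buf.getbyte(LEVEL_INDICATOR_OFFSET) == 0
    count.times do |i|
      first, second, third = buf[i * ENTRY_SIZE, ENTRY_SIZE].unpack('V3')
      break if first == 0                  # a zero id/start ends the used entries
      if leaf
        out << { id: first, offset: second, size: third }
      else
        walk(blocks, third, out)           # node entry: recurse into the child block
      end
    end
    out
  end
end

blocks = {
  0   => IndexWalkSketch.make_block([[0x20, 0, 512]], 1),
  512 => IndexWalkSketch.make_block([[0x20, 4096, 100], [0x24, 8192, 64]], 0)
}
p IndexWalkSketch.walk(blocks, 0)
```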

#load_xattrib ⇒ Object

corresponds to

  • pst_load_extended_attributes



# File 'lib/mapi/pst.rb', line 754

def load_xattrib
  unless desc = desc_from_id(0x61)
    warn "no extended attributes desc record found"
    return
  end
  unless desc.desc
    warn "no desc idx for extended attributes"
    return
  end
  if desc.list_index
  end
  #warn "skipping loading xattribs"
  # FIXME implement loading xattribs
end

#name ⇒ Object



# File 'lib/mapi/pst.rb', line 1797

def name
  @name ||= root_item.props.display_name
end

#pst_parse_item(desc) ⇒ Object

corresponds to

  • _pst_parse_item



# File 'lib/mapi/pst.rb', line 1680

def pst_parse_item desc
  Item.new desc, RawPropertyStore.new(desc).to_a
end

#pst_read_block_size(offset, size, decrypt = true) ⇒ Object

corresponds to:

  • _pst_read_block_size

  • _pst_read_block ??

  • _pst_ff_getIDblock_dec ??

  • _pst_ff_getIDblock ??



# File 'lib/mapi/pst.rb', line 774

def pst_read_block_size offset, size, decrypt=true
  io.seek offset
  # IO#read returns nil at end of file, so fall back to an empty string
  buf = io.read(size) || ''
  warn "tried to read #{size} bytes but only got #{buf.length}" if buf.length != size
  encrypted? && decrypt ? CompressibleEncryption.decrypt(buf) : buf
end

#root ⇒ Object



# File 'lib/mapi/pst.rb', line 1784

def root
  root_item
end

#root_desc ⇒ Object



# File 'lib/mapi/pst.rb', line 1774

def root_desc
  @desc.first
end

#root_item ⇒ Object



# File 'lib/mapi/pst.rb', line 1778

def root_item
  item = pst_parse_item root_desc
  item.type = :root
  item
end

#warn(s) ⇒ Object

Routes warnings through Mapi::Log until I properly fix logging…



# File 'lib/mapi/pst.rb', line 286

def warn s
  Mapi::Log.warn s
end