Class: Mapi::Pst

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/mapi/pst.rb

Defined Under Namespace

Modules: Desc2, Index2

Classes: Attachment, AttachmentTable, BlockParser, CompressibleEncryption, Desc, Desc64, FormatError, Header, ID2Assoc, ID2Assoc64, ID2Mapping, Index, Index64, Item, RangesIOEncryptable, RangesIOID2, RangesIOIdxChain, RawPropertyStore, RawPropertyStoreTable, Recipient, RecipientTable, TablePtr

Constant Summary

ToTree = Module.new

this is the index and desc record loading code

ITEM_COUNT_OFFSET = 0x1f0

more constants from libpst.c. these relate to the index block. this one is the offset of the count byte.

LEVEL_INDICATOR_OFFSET = 0x1f3

BACKLINK_OFFSET = 0x1f8

read by load_idx_rec and load_desc_rec to check a block against the value passed down from its parent.

ITEM_COUNT_OFFSET_64 = 0x1e8

mostly guesses.

LEVEL_INDICATOR_OFFSET_64 = 0x1eb

diff of 3 between these 2 as above…

Constructor Details

#initialize(io) ⇒ Pst

corresponds to

  • pst_open

  • pst_load_index

Raises:

  • (FormatError) if the size field in the header does not match the size of io

# File 'lib/mapi/pst.rb', line 265

def initialize io
	@io = io
	io.pos = 0
	@header = Header.new io.read(Header::SIZE)

	# would prefer this to be in Header#validate, but it doesn't have the io size.
	# should perhaps downgrade this to just be a warning...
	raise FormatError, "header size field invalid (#{header.size} != #{io.size})" unless header.size == io.size

	load_idx
	load_desc
	load_xattrib

	@special_folder_ids = {}
end

Instance Attribute Details

#desc ⇒ Object (readonly)

Returns the value of attribute desc.



# File 'lib/mapi/pst.rb', line 260

def desc
  @desc
end

#header ⇒ Object (readonly)

Returns the value of attribute header.



# File 'lib/mapi/pst.rb', line 260

def header
  @header
end

#idx ⇒ Object (readonly)

Returns the value of attribute idx.



# File 'lib/mapi/pst.rb', line 260

def idx
  @idx
end

#io ⇒ Object (readonly)

Returns the value of attribute io.



# File 'lib/mapi/pst.rb', line 260

def io
  @io
end

#special_folder_ids ⇒ Object (readonly)

Returns the value of attribute special_folder_ids.



# File 'lib/mapi/pst.rb', line 260

def special_folder_ids
  @special_folder_ids
end

Class Method Details

.make_property_set(property_list) ⇒ Object

higher level item code. wraps up the raw properties, and gives nice objects to work with. handles item relationships too. this method builds a PropertySet from a list of [key, type, value] triples; the type is dropped.




# File 'lib/mapi/pst.rb', line 1502

def self.make_property_set property_list
	hash = property_list.inject({}) do |hash, (key, type, value)|
		hash.update PropertySet::Key.new(key) => value
	end
	PropertySet.new hash
end
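
The folding step can be tried without the library. In this minimal sketch, `triples_to_hash` is a hypothetical name and plain symbols stand in for PropertySet::Key objects; the sample triples are made up.

```ruby
# Illustrative only: fold [key, type, value] triples into a hash,
# ignoring the type, as make_property_set does before wrapping the result.
def triples_to_hash(property_list)
  property_list.each_with_object({}) do |(key, _type, value), hash|
    hash[key] = value
  end
end

# Made-up sample data; the middle element plays the role of the type.
triples = [[:subject, 0x001f, 'hello'], [:importance, 0x0003, 1]]
p triples_to_hash(triples)
```

If the same key appears twice, the later value wins, which matches the behaviour of Hash#update in the listing above.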

.unpack(str, unpack_spec) ⇒ Object

unfortunately there is no analogue of Q which is little endian only. this treats T as an unsigned quad word in little endian byte order, so that the workaround does not pollute the rest of the code.

didn't want to override String#unpack, because that is too hacky, and incomplete.



# File 'lib/mapi/pst.rb', line 74

def self.unpack str, unpack_spec
	return str.unpack(unpack_spec) unless unpack_spec['T']
	@unpack_cache ||= {}
	t_offsets, new_spec = @unpack_cache[unpack_spec]
	unless t_offsets
		t_offsets = []
		offset = 0
		new_spec = ''
		unpack_spec.scan(/([^\d])_?(\*|\d+)?/o) do
			num_elems = $1.downcase == 'a' ? 1 : ($2 || 1).to_i
			if $1 == 'T'
				num_elems.times { |i| t_offsets << offset + i }
				new_spec << "V#{num_elems * 2}"
			else
				new_spec << $~[0]
			end
			offset += num_elems
		end
		@unpack_cache[unpack_spec] = [t_offsets, new_spec]
	end
	a = str.unpack(new_spec)
	t_offsets.each do |offset|
		low, high = a[offset, 2]
		a[offset, 2] = low && high ? low + (high << 32) : nil
	end
	a
end
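
The idea behind the T directive can be shown in isolation. The sketch below is not the library's code: `unpack_le64` is a hypothetical helper that reads pairs of little-endian 32-bit words with V and joins them, which is what the listing above does for each T. The sample data is built with the Q< pack directive, which newer Ruby versions provide, so the two approaches can be compared.

```ruby
# Hypothetical helper, for illustration only: decode +count+ unsigned
# 64-bit little-endian integers by reading pairs of 32-bit words.
def unpack_le64(str, count)
  words = str.unpack("V#{count * 2}")
  words.each_slice(2).map do |low, high|
    # nil if the string was too short to hold both halves
    low && high ? low + (high << 32) : nil
  end
end

data = [0x1122334455667788, 42].pack('Q<2')
values = unpack_le64(data, 2)
puts values.map { |v| format('0x%x', v) }.join(' ')
```

A short input yields nil for the affected value rather than raising, mirroring the `low && high` guard in the method above.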

Instance Method Details

#desc_from_id(id) ⇒ Object

as for idx_from_id: most access to desc objects will use this function

corresponds to:

  • _pst_getDptr



# File 'lib/mapi/pst.rb', line 748

def desc_from_id id
	@desc_from_id[id]
end

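#dump_debug_info ⇒ Object

miscellaneous code. prints debugging information to stdout: the pst header, file range usage, the idx entries, the desc tree and the item tree.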




# File 'lib/mapi/pst.rb', line 1689

def dump_debug_info
	puts "* pst header"
	p header

=begin
Looking at the output of this, for blank-o1997.pst, i see this part:
...
- (26624,516) desc block data (overlap of 4 bytes)
- (27136,516) desc block data (gap of 508 bytes)
- (28160,516) desc block data (gap of 2620 bytes)
...

which confirms my belief that the block size for idx and desc is more likely 512
=end
	if 0 + 0 == 0
		puts '* file range usage'
		file_ranges =
			# these 3 things, should account for most of the data in the file.
			[[0, Header::SIZE, 'pst file header']] +
			@idx_offsets.map { |offset| [offset, Index::BLOCK_SIZE, 'idx block data'] } +
			@desc_offsets.map { |offset| [offset, Desc::BLOCK_SIZE, 'desc block data'] } +
			@idx.map { |idx| [idx.offset, idx.size, 'idx id=0x%x (%s)' % [idx.id, idx.type]] }
		(file_ranges.sort_by { |idx| idx.first } + [nil]).to_enum(:each_cons, 2).each do |(offset, size, name), next_record|
			# i think the size is padded out to a multiple of 64 bytes,
			# which is equivalent to padding out the final offset, because i think offsets are
			# similarly aligned
			pad_amount = 64
			warn 'i am wrong about the offset padding' if offset % pad_amount != 0
			# so, assuming i'm not wrong about that, then we can calculate how much padding is needed.
			pad = pad_amount - (size % pad_amount)
			pad = 0 if pad == pad_amount
			gap = next_record ? next_record.first - (offset + size + pad) : 0
			extra = case gap <=> 0
				when -1; ["overlap of #{gap.abs} bytes"]
				when  0; []
				when +1; ["gap of #{gap} bytes"]
			end
			# how about we check that padding
			@io.pos = offset + size
			pad_bytes = @io.read(pad)
			extra += ["padding not all zero"] unless pad_bytes == 0.chr * pad
			puts "- #{offset}:#{size}+#{pad} #{name.inspect}" + (extra.empty? ? '' : ' [' + extra * ', ' + ']')
		end
	end

	# i think the idea of the idx, and indeed the idx2, is just to be able to
	# refer to data indirectly, which means it can get moved around, and you just update
	# the idx table. it is simply a list of file offsets and sizes.
	# not sure i get how id2 plays into it though....
	# the sizes seem to be all even. is that a coincidence? and the ids are all even. that
	# seems to be related to something else (see the (idx.id & 0x2) test in id2_block_idx_chain)
	puts '* idx entries'
	@idx.each { |idx| puts "- #{idx.inspect}" }

	# if you look at the desc tree, you notice a few things:
	# 1. there is a desc that seems to be the parent of all the folders, messages etc.
	#    it is the one whose parent is itself.
	#    one of its children is referenced as the subtree_entryid of the first desc item,
	#    the root.
	# 2. typically only 2 types of desc records have idx2_id != 0. messages themselves,
	#    and the desc with id = 0x61 - the xattrib container. everything else uses the
	#    regular ids to find its data. i think it should be reframed as small blocks and
	#    big blocks, but i'll look into it more.
	#
	# idx_id and idx2_id are for getting to the data. desc_id and parent_desc_id just define
	# the parent <-> child relationship, and the desc_ids are how the items are referred to in
	# entryids.
	# note that these aren't unique! eg for 0, 4 etc. i expect these'd never change, as the ids
	# are stored in entryids. whereas the idx and idx2 could be a bit more volatile.
	puts '* desc tree'
	# make a dummy root hold everything just for convenience
	root = Desc.new ''
	def root.inspect; "#<Pst::Root>"; end
	root.children.replace @orphans
	# this still loads the whole thing as a string for gsub. should use the direct output io
	# version.
	puts root.to_tree.gsub(/, (parent_desc_id|idx2_id)=0x0(?!\d)/, '')

	# this is fairly easy to understand, its just an attempt to display the pst items in a tree form
	# which resembles what you'd see in outlook.
	puts '* item tree'
	# now streams directly
	root_item.to_tree STDOUT
end
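
The gap and overlap arithmetic in the file range report can be exercised on its own. This is a minimal sketch, assuming (as the comments in the listing guess) that every range is padded out to a multiple of 64 bytes; `describe_ranges` and the sample ranges are hypothetical and not taken from a real file.

```ruby
# Illustrative only: report gaps and overlaps between [offset, size, name]
# ranges, assuming each range is padded to a multiple of PAD_AMOUNT bytes.
PAD_AMOUNT = 64

def describe_ranges(ranges)
  sorted = ranges.sort_by(&:first)
  sorted.each_with_index.map do |(offset, size, name), i|
    # bytes needed to round size up to the next multiple of PAD_AMOUNT
    pad = (PAD_AMOUNT - size % PAD_AMOUNT) % PAD_AMOUNT
    following = sorted[i + 1]
    gap = following ? following.first - (offset + size + pad) : 0
    note =
      if gap < 0
        " [overlap of #{-gap} bytes]"
      elsif gap > 0
        " [gap of #{gap} bytes]"
      else
        ''
      end
    "- #{offset}:#{size}+#{pad} #{name.inspect}#{note}"
  end
end

# Made-up ranges for demonstration.
ranges = [
  [0,    512, 'header'],
  [512,  516, 'block a'],
  [1024, 100, 'block b'],
  [1280, 64,  'block c']
]
puts describe_ranges(ranges)
```

Here 'block a' is reported as overlapping its successor, the same pattern as the 516-byte desc blocks quoted in the comment block of the listing.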

#each(&block) ⇒ Object



# File 'lib/mapi/pst.rb', line 1791

def each(&block)
	root = self.root
	block[root]
	root.each_recursive(&block)
end
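
Because the class includes Enumerable, defining each this way makes map, find, count and the rest work over the whole item tree. The sketch below shows the same shape with hypothetical stand-ins (`Item` and `Store` here are not the library's classes): the root is yielded first, then every descendant depth first.

```ruby
# Illustrative stand-ins: Item and Store are hypothetical, not library classes.
class Item
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  # Yield every descendant, depth first.
  def each_recursive(&block)
    children.each do |child|
      block[child]
      child.each_recursive(&block)
    end
  end
end

class Store
  include Enumerable

  def initialize(root)
    @root = root
  end

  # Same shape as the method above: the root first, then everything below it.
  def each(&block)
    block[@root]
    @root.each_recursive(&block)
  end
end

inbox = Item.new('Inbox', [Item.new('hello'), Item.new('re: hello')])
store = Store.new(Item.new('Root', [inbox, Item.new('Sent')]))
puts store.map(&:name).join(' > ')
```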

#encrypted? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/mapi/pst.rb', line 281

def encrypted?
	@header.encrypted?
end

#id2_block_idx_chain(idx) ⇒ Object

corresponds to:

  • _pst_ff_getID2block

  • _pst_ff_getID2data

  • _pst_ff_compile_ID



# File 'lib/mapi/pst.rb', line 911

def id2_block_idx_chain idx
	if (idx.id & 0x2) == 0
		[idx]
	else
		buf = idx.read
		type, fdepth, count = buf[0, 4].unpack 'CCv'
		unless type == 1 # libpst.c:3958
			warn 'Error in idx_chain - %p, %p, %p - attempting to ignore' % [type, fdepth, count]
			return [idx]
		end
		# there are 4 unaccounted for bytes here, 4...8
		if header.version_2003?
			# String#unpack has no T directive, so go through Pst.unpack
			ids = Pst.unpack buf[8, count * 8], "T#{count}"
		else
			ids = buf[8, count * 4].unpack('V*')
		end
		if fdepth == 1
			ids.map { |id| idx_from_id id }
		else
			ids.map { |id| id2_block_idx_chain idx_from_id(id) }.flatten
		end
	end
end
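
The block layout this method reads can be reproduced with pack and unpack. This is a sketch under the layout implied by the listing (a type byte, a depth byte, a 16-bit count, 4 unaccounted-for bytes, then the ids), using the 32-bit ids of the pre-2003 branch; `parse_chain_block` and the sample ids are made up.

```ruby
# Illustrative only: parse the header of a chain block as the method above
# reads it, then the +count+ 32-bit little-endian ids that follow at byte 8.
def parse_chain_block(buf)
  type, fdepth, count = buf[0, 4].unpack('CCv')
  ids = buf[8, count * 4].unpack('V*')
  { type: type, fdepth: fdepth, ids: ids }
end

# Build a made-up block: type 1, depth 1, three ids.
sample_ids = [0x24, 0x28, 0x2c]
buf = [1, 1, sample_ids.length].pack('CCv') + ("\0" * 4) + sample_ids.pack('V*')
p parse_chain_block(buf)
```

With a depth of 1 the ids refer directly to data blocks; a greater depth means each id names another block of the same layout, which is why the method recurses and flattens the result.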

#idx_from_id(id) ⇒ Object

most access to idx objects will use this function

corresponds to

  • _pst_getID



# File 'lib/mapi/pst.rb', line 652

def idx_from_id id
	@idx_from_id[id]
end

#inspect ⇒ Object



# File 'lib/mapi/pst.rb', line 1801

def inspect
	"#<Pst name=#{name.inspect} io=#{io.inspect}>"
end

#load_desc ⇒ Object

corresponds to

  • _pst_build_desc_ptr

  • record_descriptor



# File 'lib/mapi/pst.rb', line 659

def load_desc
	@desc = []
	@desc_offsets = []
	if header.version_2003?
		@desc = Desc64.load_chain io, header
		@desc.each { |desc| desc.pst = self }
	else
		load_desc_rec header.index2, header.index2_count, 0x21
	end

	# first create a lookup cache
	@desc_from_id = {}
	@desc.each do |desc|
		desc.pst = self
		warn "there are duplicate desc records with id #{desc.desc_id}" if @desc_from_id[desc.desc_id]
		@desc_from_id[desc.desc_id] = desc
	end

	# now turn the flat list of loaded desc records into a tree

	# well, they have no parent, so they're more like, the toplevel descs.
	@orphans = []
	# now assign each node to the parents child array, putting the orphans in the above
	@desc.each do |desc|
		parent = @desc_from_id[desc.parent_desc_id]
		# note, besides this, its possible to create other circular structures.
		if parent == desc
			# this actually happens usually, for the root_item it appears.
			#warn "desc record's parent is itself (#{desc.inspect})"
		# maybe add some more checks in here for circular structures
		elsif parent
			parent.children << desc
			next
		end
		@orphans << desc
	end

	# maybe change this to some sort of sanity check. orphans are expected
	# warn "have #{@orphans.length} orphan desc record(s)." unless @orphans.empty?
end
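
The flat-list-to-tree step can be seen with a small stand-in. In this minimal sketch `Node` is a hypothetical Struct, not the library's Desc class, and the ids are made up apart from 0x21 and 0x61, which the listings mention. A record whose parent is unknown, or whose parent is itself, ends up in the top-level list.

```ruby
# Illustrative stand-in for a desc record; the real class has more fields.
Node = Struct.new(:id, :parent_id, :children)

def build_tree(records)
  # first create a lookup cache, as load_desc does
  by_id = {}
  records.each { |node| by_id[node.id] = node }

  orphans = []
  records.each do |node|
    parent = by_id[node.parent_id]
    if parent && !parent.equal?(node)
      parent.children << node
    else
      # no known parent, or the node is its own parent (as the root is)
      orphans << node
    end
  end
  orphans
end

pairs = [[0x21, 0x21], [0x122, 0x21], [0x2223, 0x122], [0x61, 0]]
records = pairs.map { |id, parent_id| Node.new(id, parent_id, []) }
tops = build_tree(records)
puts tops.map { |n| format('0x%x', n.id) }.join(', ')
```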

#load_desc_rec(offset, linku1, start_val) ⇒ Object

load the flat list of desc records recursively

corresponds to

  • _pst_build_desc_ptr

  • record_descriptor



# File 'lib/mapi/pst.rb', line 705

def load_desc_rec offset, linku1, start_val
	@desc_offsets << offset
	
	buf = pst_read_block_size offset, Desc::BLOCK_SIZE, false
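	# String#[] returns a one-character string on ruby 1.9+, so read the byte with getbyte
	item_count = buf.getbyte ITEM_COUNT_OFFSET

	# not a real desc, just a way to decode the backlink value
	desc = Desc.new buf[BACKLINK_OFFSET, 4]
	raise 'blah 1' unless desc.desc_id == linku1

	if buf.getbyte(LEVEL_INDICATOR_OFFSET) == 0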
		# leaf pointers
		raise "have too many active items in index (#{item_count})" if item_count > Desc::COUNT_MAX
		# split the data into item_count desc objects
		buf[0, Desc::SIZE * item_count].scan(/.{#{Desc::SIZE}}/mo).each_with_index do |data, i|
			desc = Desc.new data
			# first entry
			raise 'blah 3' if i == 0 and start_val != 0 and desc.desc_id != start_val
			# this shouldn't really happen i'd imagine
			break if desc.desc_id == 0
			@desc << desc
		end
	else
		# node pointers
		raise "have too many active items in index (#{item_count})" if item_count > Index::COUNT_MAX
		# split the data into item_count table pointers
		buf[0, TablePtr::SIZE * item_count].scan(/.{#{TablePtr::SIZE}}/mo).each_with_index do |data, i|
			table = TablePtr.new data
			# for the first value, we expect the start to be equal to start_val. note that the check
			# is only skipped when start_val is -1, and the top level call passes 0x21 (dec 33), so even
			# for the first table we expect it to be equal. this means we assert that the first desc
			# record is always 33...
			raise 'blah 3' if i == 0 and start_val != -1 and table.start != start_val
			# this shouldn't really happen i'd imagine
			break if table.start == 0
			load_desc_rec table.offset, table.u1, table.start
		end
	end
end

#load_idx ⇒ Object

corresponds to

  • _pst_build_id_ptr



# File 'lib/mapi/pst.rb', line 588

def load_idx
	@idx = []
	@idx_offsets = []
	if header.version_2003?
		@idx = Index64.load_chain io, header
		@idx.each { |idx| idx.pst = self }
	else
		load_idx_rec header.index1, header.index1_count, 0
	end

	# we'll typically be accessing by id, so create a hash as a lookup cache
	@idx_from_id = {}
	@idx.each do |idx|
		warn "there are duplicate idx records with id #{idx.id}" if @idx_from_id[idx.id]
		@idx_from_id[idx.id] = idx
	end
end
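
The lookup cache built at the end of this method is a plain hash keyed by id. A minimal sketch with a hypothetical `Record` struct and made-up values shows the behaviour when ids collide: a warning is noted and the later record replaces the earlier one.

```ruby
# Illustrative stand-in for an idx record: an id mapped to a file range.
Record = Struct.new(:id, :offset, :size)

# Returns the lookup hash and the list of warnings it would have emitted.
def index_by_id(records)
  warnings = []
  lookup = {}
  records.each do |rec|
    warnings << "there are duplicate idx records with id #{rec.id}" if lookup.key?(rec.id)
    lookup[rec.id] = rec
  end
  [lookup, warnings]
end

# Made-up records; id 4 appears twice.
recs = [Record.new(4, 0x4400, 128), Record.new(8, 0x4600, 64), Record.new(4, 0x4800, 32)]
lookup, warnings = index_by_id(recs)
puts lookup[4].offset.to_s(16)
puts warnings
```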

#load_idx2(idx) ⇒ Object



# File 'lib/mapi/pst.rb', line 856

def load_idx2 idx
	if header.version_2003?
		id2 = ID2Assoc64.load_chain idx
	else
		id2 = load_idx2_rec idx
	end
	ID2Mapping.new self, id2
end

#load_idx2_rec(idx) ⇒ Object

corresponds to

  • _pst_build_id2



# File 'lib/mapi/pst.rb', line 867

def load_idx2_rec idx
	# i should perhaps use a idx chain style read here?
	buf = pst_read_block_size idx.offset, idx.size, false
	type, count = buf.unpack 'v2'
	unless type == 0x0002
		raise 'unknown id2 type 0x%04x' % type
		#return
	end
	id2 = []
	count.times do |i|
		assoc = ID2Assoc.new buf[4 + ID2Assoc::SIZE * i, ID2Assoc::SIZE]
		id2 << assoc
		if assoc.table2 != 0
			id2 += load_idx2_rec idx_from_id(assoc.table2)
		end
	end
	id2
end
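
The recursion here flattens a table of associations, following each non-zero table2 pointer into a sub-table. The sketch below models that with hypothetical stand-ins: `Assoc` is a Struct rather than the library's ID2Assoc, and a hash of arrays replaces the blocks read from the file.

```ruby
# Illustrative only: Assoc and the +tables+ hash are hypothetical stand-ins
# for ID2Assoc records and the blocks they live in.
Assoc = Struct.new(:id2, :id, :table2)

# Collect the associations in table +key+, descending into each sub-table
# right after the record that names it.
def flatten_assocs(tables, key)
  tables.fetch(key).flat_map do |assoc|
    assoc.table2.zero? ? [assoc] : [assoc] + flatten_assocs(tables, assoc.table2)
  end
end

# Made-up tables: table 10 points at table 20 through its second record.
tables = {
  10 => [Assoc.new(1, 100, 0), Assoc.new(2, 101, 20)],
  20 => [Assoc.new(3, 102, 0)]
}
puts flatten_assocs(tables, 10).map(&:id2).inspect
```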

#load_idx_rec(offset, linku1, start_val) ⇒ Object

load the flat idx table, which maps ids to file ranges. this is the recursive helper

corresponds to

  • _pst_build_id_ptr



# File 'lib/mapi/pst.rb', line 610

def load_idx_rec offset, linku1, start_val
	@idx_offsets << offset

	#_pst_read_block_size(pf, offset, BLOCK_SIZE, &buf, 0, 0) < BLOCK_SIZE)
	buf = pst_read_block_size offset, Index::BLOCK_SIZE, false

	# String#[] returns a one-character string on ruby 1.9+, so read the byte with getbyte
	item_count = buf.getbyte ITEM_COUNT_OFFSET
	raise "have too many active items in index (#{item_count})" if item_count > Index::COUNT_MAX

	idx = Index.new buf[BACKLINK_OFFSET, Index::SIZE]
	raise 'blah 1' unless idx.id == linku1

	if buf.getbyte(LEVEL_INDICATOR_OFFSET) == 0
		# leaf pointers
		# split the data into item_count index objects
		buf[0, Index::SIZE * item_count].scan(/.{#{Index::SIZE}}/mo).each_with_index do |data, i|
			idx = Index.new data
			# first entry
			raise 'blah 3' if i == 0 and start_val != 0 and idx.id != start_val
			idx.pst = self
			# this shouldn't really happen i'd imagine
			break if idx.id == 0
			@idx << idx
		end
	else
		# node pointers
		# split the data into item_count table pointers
		buf[0, TablePtr::SIZE * item_count].scan(/.{#{TablePtr::SIZE}}/mo).each_with_index do |data, i|
			table = TablePtr.new data
			# for the first value, we expect the start to be equal
			raise 'blah 3' if i == 0 and start_val != 0 and table.start != start_val
			# this shouldn't really happen i'd imagine
			break if table.start == 0
			load_idx_rec table.offset, table.u1, table.start
		end
	end
end
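
How a block is taken apart can be tried on a hand-built buffer. The two offsets below are the ones listed under Constant Summary; everything else in this sketch is an assumption made for illustration, including `RECORD_SIZE`, `split_block` and the 512-byte sample block. It reads the bytes with String#getbyte and slices fixed-size records instead of scanning with a regexp.

```ruby
# Illustrative only. RECORD_SIZE is an assumption made for this sketch,
# not a value taken from the library.
ITEM_COUNT_OFFSET      = 0x1f0
LEVEL_INDICATOR_OFFSET = 0x1f3
RECORD_SIZE            = 12

# Returns :leaf or :node plus the raw bytes of each active record.
def split_block(buf)
  count = buf.getbyte(ITEM_COUNT_OFFSET)
  level = buf.getbyte(LEVEL_INDICATOR_OFFSET)
  records = Array.new(count) { |i| buf[i * RECORD_SIZE, RECORD_SIZE] }
  [level.zero? ? :leaf : :node, records]
end

# Build a made-up 512-byte leaf block holding two records.
block = "\0".b * 512
block[0, RECORD_SIZE] = 'a'.b * RECORD_SIZE
block[RECORD_SIZE, RECORD_SIZE] = 'b'.b * RECORD_SIZE
block.setbyte(ITEM_COUNT_OFFSET, 2)
block.setbyte(LEVEL_INDICATOR_OFFSET, 0)

kind, records = split_block(block)
puts "#{kind}: #{records.length} records"
```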

#load_xattrib ⇒ Object

corresponds to

  • pst_load_extended_attributes



# File 'lib/mapi/pst.rb', line 754

def load_xattrib
	unless desc = desc_from_id(0x61)
		warn "no extended attributes desc record found"
		return
	end
	unless desc.desc
		warn "no desc idx for extended attributes"
		return
	end
	if desc.list_index
	end
	#warn "skipping loading xattribs"
	# FIXME implement loading xattribs
end

#name ⇒ Object



# File 'lib/mapi/pst.rb', line 1797

def name
	@name ||= root_item.props.display_name
end

#pst_parse_item(desc) ⇒ Object

corresponds to

  • _pst_parse_item



# File 'lib/mapi/pst.rb', line 1680

def pst_parse_item desc
	Item.new desc, RawPropertyStore.new(desc).to_a
end

#pst_read_block_size(offset, size, decrypt = true) ⇒ Object

corresponds to:

  • _pst_read_block_size

  • _pst_read_block ??

  • _pst_ff_getIDblock_dec ??

  • _pst_ff_getIDblock ??



# File 'lib/mapi/pst.rb', line 774

def pst_read_block_size offset, size, decrypt=true
	io.seek offset
	buf = io.read size
	warn "tried to read #{size} bytes but only got #{buf.length}" if buf.length != size
	encrypted? && decrypt ? CompressibleEncryption.decrypt(buf) : buf
end
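
The seek, read and short-read check can be tried against a StringIO. This is a sketch, not the library's method: `read_block` is a hypothetical name, it returns the warning text instead of logging it, leaves out decryption, and treats a nil result from read (which happens at end of file) as an empty string.

```ruby
require 'stringio'

# Illustrative only: seek, read, and report a short read. Returns the bytes
# and a warning string (nil when the full size was read).
def read_block(io, offset, size)
  io.seek(offset)
  buf = io.read(size) || ''
  warning = "tried to read #{size} bytes but only got #{buf.length}" if buf.length != size
  [buf, warning]
end

io = StringIO.new('0123456789')
p read_block(io, 2, 4)
p read_block(io, 8, 4)
```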

#root ⇒ Object



# File 'lib/mapi/pst.rb', line 1784

def root
	root_item
end

#root_desc ⇒ Object



# File 'lib/mapi/pst.rb', line 1774

def root_desc
	@desc.first
end

#root_item ⇒ Object



# File 'lib/mapi/pst.rb', line 1778

def root_item
	item = pst_parse_item root_desc
	item.type = :root
	item
end

#warn(s) ⇒ Object

forwards to Mapi::Log.warn, until i properly fix logging…



# File 'lib/mapi/pst.rb', line 286

def warn s
	Mapi::Log.warn s
end