Class: SimpleCfb

Inherits:

Object

Defined in: lib/simple_cfb/simple_cfb.rb, lib/simple_cfb/version.rb

Overview

File data is added with #add; when finished, the entire blob of CFB data is generated in one go with #write. Progressive creation is impossible because the CFB format needs information about file sizes and directory entries at the start of the output, so all of it must be known beforehand.

Files can be parsed into a new object with #parse!, after which #file_index and #full_paths can be examined to extract the parsed CFB container components.

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb/

This Ruby port tries to be equivalent to the JavaScript original, but the porting process has likely introduced additional bugs, and I’ve omitted anything that wasn’t needed for reading and writing encrypted OOXML files.
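To make the point about sizes concrete: before emitting a single header byte, #write must know how many directory, FAT, mini FAT and DIFAT sectors the output will need. The following is a minimal sketch that reproduces that arithmetic from the start of #write as a standalone function. The name cfb_sector_counts is hypothetical (it is not part of SimpleCfb), and it assumes the same 512-byte sectors, 64-byte mini sectors and 4096-byte mini stream cutoff that #write uses.

```ruby
# Hypothetical standalone helper (not part of SimpleCfb) reproducing the
# sector-count arithmetic at the start of SimpleCfb#write, assuming 512-byte
# sectors, 64-byte mini sectors and the 4096-byte mini stream cutoff.
def cfb_sector_counts(file_sizes, path_count)
  mini_size = 0 # mini sectors needed by files smaller than the cutoff
  fat_size  = 0 # regular sectors needed by all other files

  file_sizes.each do |flen|
    next if flen.nil? || flen.zero?

    if flen < 0x1000
      mini_size += (flen + 0x3F) >> 6   # round up to whole 64-byte mini sectors
    else
      fat_size  += (flen + 0x01FF) >> 9 # round up to whole 512-byte sectors
    end
  end

  dir_cnt  = (path_count + 3) >> 2      # four 128-byte directory entries per sector
  mini_cnt = (mini_size + 7) >> 3       # eight mini sectors per regular sector
  mfat_cnt = (mini_size + 0x7F) >> 7    # 128 mini FAT entries per sector
  fat_base = mini_cnt + fat_size + dir_cnt + mfat_cnt
  fat_cnt  = (fat_base + 0x7F) >> 7     # 128 FAT entries per sector

  # The header lists 109 FAT sectors; more need DIFAT sectors (127 entries each).
  difat_for = ->(fat) { fat <= 109 ? 0 : ((fat - 109).to_f / 0x7F).ceil }
  difat_cnt = difat_for.call(fat_cnt)

  # FAT and DIFAT sectors occupy FAT entries too, so grow until the count is stable.
  while ((fat_base + fat_cnt + difat_cnt + 0x7F) >> 7) > fat_cnt
    fat_cnt  += 1
    difat_cnt = difat_for.call(fat_cnt)
  end

  { dir: dir_cnt, mini: mini_size, minifat: mfat_cnt, fat: fat_cnt, difat: difat_cnt, large: fat_size }
end

p cfb_sector_counts([0, 100, 5000], 3)
p cfb_sector_counts([127 * 512], 1)
```

In the second call, 127 data sectors plus one directory sector already fill a FAT sector's 128 entries, so the FAT cannot also map itself and the loop raises the FAT count from 1 to 2.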

Defined Under Namespace

Classes: SectorList

Constant Summary

VERSION = '0.3.0'
Gem version. If this changes, be sure to re-run “bundle install” or “bundle update”.

DATE = '2024-10-22'
Date for VERSION. If this changes, be sure to re-run “bundle install” or “bundle update”.

MSSZ
CFB miscellaneous.

MSCSZ
Mini Sector Size = 1<<6

NUL = String.new("\x00", encoding: 'ASCII-8BIT')
Convenience accessor to binary-encoded NUL byte.

FREESECT = -1
[MS-CFB] 2.1 Compound File Sector Numbers and Types.

ENDOFCHAIN = -2

FATSECT = -3

DIFSECT = -4

MAXREGSECT = -6

HEADER_SIGNATURE = String.new("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", encoding: 'ASCII-8BIT')
[MS-CFB] 2.2 Compound File Header.

HEADER_CLSID = String.new("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", encoding: 'ASCII-8BIT')

HEADER_MINOR_VERSION = String.new("\x3e\x00", encoding: 'ASCII-8BIT')

MAXREGSID = -6

NOSTREAM = -1

STREAM

ENTRY_TYPES = ['unknown', 'storage', 'stream', 'lockbytes', 'property', 'root']
[MS-CFB] 2.6.1 Compound File Directory Entry.

SEED_FILENAME = "\u0001Sh33tJ5"
Initial seed filename.
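[MS-CFB] describes the special sector numbers as unsigned 32-bit values (FREESECT is 0xFFFFFFFF, ENDOFCHAIN is 0xFFFFFFFE, and so on), whereas the constants above hold their signed equivalents. The sketch below shows the correspondence using Ruby's explicit little-endian pack directives. SPECIAL_SECTORS and on_disk_unsigned are illustrative names, not part of SimpleCfb, and the hash is a local copy of the documented values.

```ruby
# Local copies of the signed values documented above for SimpleCfb's
# FREESECT, ENDOFCHAIN, FATSECT, DIFSECT and MAXREGSECT constants.
SPECIAL_SECTORS = {
  FREESECT:   -1,
  ENDOFCHAIN: -2,
  FATSECT:    -3,
  DIFSECT:    -4,
  MAXREGSECT: -6
}.freeze

# Pack a signed value as 4 little-endian bytes, then read the same bytes
# back as an unsigned 32-bit integer: the value as it appears in the file.
def on_disk_unsigned(value)
  [value].pack('l<').unpack1('L<')
end

SPECIAL_SECTORS.each do |name, value|
  printf("%-10s %3d => 0x%08X\n", name, value, on_disk_unsigned(value))
end
```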

Instance Attribute Summary

#file_index ⇒ Object

Array of file entries, index-aligned with #full_paths.
#full_paths ⇒ Object

Array of full path strings, index-aligned with #file_index.

Class Method Summary

.get_int32le(input, index = 0) ⇒ Object

Parse the 4 bytes of an ASCII-8BIT encoded string starting at index as a signed 32-bit little-endian integer, and return it.
.get_time(data) ⇒ Object

Parse an 8-byte ctime/mtime value (a Windows FILETIME, read as two 32-bit little-endian halves, low half first) into a UTC Ruby Time object, or return nil if all bytes are zero.
.get_uint32le(input, index = 0) ⇒ Object

Parse the 4 bytes of an ASCII-8BIT encoded string starting at index as an unsigned 32-bit little-endian integer, and return it.
.host_is_little_endian? ⇒ Boolean

Returns true if the executing computer is natively little-endian, else false.

Instance Method Summary

#add(name, content) ⇒ Object

Add a file entry.
#initialize ⇒ SimpleCfb constructor

A new instance of SimpleCfb.
#parse!(file) ⇒ Object

Parses an input file into this object, allowing you to extract individual files thereafter via #file_index and #full_paths.
#write ⇒ Object

Compile and return the CFB file data as an ASCII-8BIT encoded String.

Constructor Details

#initialize ⇒ `SimpleCfb`

Returns a new instance of SimpleCfb.

# File 'lib/simple_cfb/simple_cfb.rb', line 128

def initialize
  self.reinit()
end

Instance Attribute Details

#file_index ⇒ `Object`

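Array of file entries, index-aligned with #full_paths. Entry 0 is the root entry; entries created by #add are OpenStruct instances with name, type, content and size fields.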

# File 'lib/simple_cfb/simple_cfb.rb', line 126

def file_index
  @file_index
end

#full_paths ⇒ `Object`

Array of full path strings, index-aligned with #file_index. Element 0 is the root path, which #add uses as the prefix for new filenames.

# File 'lib/simple_cfb/simple_cfb.rb', line 126

def full_paths
  @full_paths
end

Class Method Details

.get_int32le(input, index = 0) ⇒ `Object`

Parse the 4 bytes of input starting at index as a signed 32-bit little-endian integer, and return it.

input: ASCII-8BIT encoded string including 4 byte sequence
index: Index into input to start reading bytes (default 0)

# File 'lib/simple_cfb/simple_cfb.rb', line 98

def self.get_int32le(input, index = 0)
  data = input.slice(index, 4)
  data = data.reverse() unless self.host_is_little_endian?

  data.unpack('l').first
end

.get_time(data) ⇒ `Object`

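Parse an 8-byte ctime/mtime value into a UTC Ruby Time object, or return nil if all bytes are zero. The value is a Windows FILETIME (a count of 100-nanosecond intervals since 1601-01-01 UTC), read here as two 32-bit little-endian halves with the low half first.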

data: ASCII-8BIT encoded string, 8 bytes long.

# File 'lib/simple_cfb/simple_cfb.rb', line 110

def self.get_time(data)
  high = self.get_uint32le(data, 4)
  low  = self.get_uint32le(data, 0)

  return nil if high.zero? && low.zero?

  high = (high / 1e7) * 2.pow(32)
  low  = (low  / 1e7)

  return Time.at(high + low - 11644473600).utc
end
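A round trip makes the FILETIME layout easier to check. The sketch below encodes a Time into the 8-byte form and decodes it with the same low-half-first reading that get_time uses, but with Rational arithmetic instead of Float, so the conversion is exact. time_to_filetime, filetime_to_time and both constants are hypothetical names, not part of SimpleCfb.

```ruby
# Hypothetical helpers (not part of SimpleCfb). A FILETIME counts
# 100-nanosecond intervals since 1601-01-01 UTC.
EPOCH_DELTA_SECONDS = 11_644_473_600 # 1601-01-01 to 1970-01-01, in seconds
TICKS_PER_SECOND    = 10_000_000

def time_to_filetime(time)
  ticks = (time.to_r * TICKS_PER_SECOND).to_i + EPOCH_DELTA_SECONDS * TICKS_PER_SECOND
  [ticks & 0xFFFF_FFFF, ticks >> 32].pack('L<L<') # low half first, then high half
end

def filetime_to_time(data)
  low, high = data.unpack('L<L<')
  return nil if low.zero? && high.zero? # same "unset" rule as get_time

  ticks = (high << 32) | low
  Time.at(Rational(ticks, TICKS_PER_SECOND) - EPOCH_DELTA_SECONDS).utc
end

stamp = Time.utc(2024, 10, 22, 12, 30, 0)
raw   = time_to_filetime(stamp)
puts raw.bytesize          # 8
puts filetime_to_time(raw) # 2024-10-22 12:30:00 UTC
```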

.get_uint32le(input, index = 0) ⇒ `Object`

Parse the 4 bytes of input starting at index as an unsigned 32-bit little-endian integer, and return it.

input: ASCII-8BIT encoded string including 4 byte sequence
index: Index into input to start reading bytes (default 0)

# File 'lib/simple_cfb/simple_cfb.rb', line 85

def self.get_uint32le(input, index = 0)
  data = input.slice(index, 4)
  data = data.reverse() unless self.host_is_little_endian?

  data.unpack('L').first
end

.host_is_little_endian? ⇒ `Boolean`

Returns true if the executing computer is little-endian natively, else false.

Returns:

(Boolean)

# File 'lib/simple_cfb/simple_cfb.rb', line 75

def self.host_is_little_endian?
  [42].pack('l').bytes[0] == 42
end
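get_int32le and get_uint32le reverse the bytes on big-endian hosts and then unpack with native-order directives. Ruby's pack/unpack directives also accept a `<` modifier that forces little-endian order, which removes the host check altogether. The following is a sketch of that alternative; int32le and uint32le are hypothetical names, not SimpleCfb's API.

```ruby
# Hypothetical helpers (not SimpleCfb's API) equivalent to get_int32le and
# get_uint32le. The "<" modifier forces little-endian byte order, so no
# host_is_little_endian? check or byte reversal is needed.
def int32le(input, index = 0)
  input.byteslice(index, 4).unpack1('l<')
end

def uint32le(input, index = 0)
  input.byteslice(index, 4).unpack1('L<')
end

# The same test SimpleCfb.host_is_little_endian? performs, for comparison:
# native 'l' packing puts the low byte first only on little-endian hosts.
puts "host is little-endian: #{[42].pack('l').bytes[0] == 42}"

bytes = [0x00, 0x00, 0xFE, 0xFF, 0xFF, 0xFF].pack('C*') # ASCII-8BIT string
puts int32le(bytes, 2)  # -2 (ENDOFCHAIN)
puts uint32le(bytes, 2) # 4294967294 (0xFFFFFFFE)
```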

Instance Method Details

#add(name, content) ⇒ `Object`

Add a file entry. Only root-level filenames are supported, and the file must not already have been added.

name: Filename, e.g. “Foo”, in your preferred string encoding
content: Mandatory ASCII-8BIT encoded string containing file data

# File 'lib/simple_cfb/simple_cfb.rb', line 138

def add(name, content)
  self.reinit()

  fpath = self.full_paths[0]

  if name.slice(0, fpath.size) == fpath
    fpath = name
  else
    fpath += '/' unless fpath.end_with?('/')
    fpath  = (fpath + name).gsub('//', '/')
  end

  file = OpenStruct.new({name: filename(name), type: 2, content: content, size: content.bytesize})

  self.file_index << file
  self.full_paths << fpath

  rebuild(force_gc: true)

  return file
end
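The path-building branch in #add is easy to misread, so here is a hedged standalone sketch of the same rule. cfb_full_path is a hypothetical helper; `root` stands in for full_paths[0], and "Root Entry/" is an assumed example value, not necessarily the exact string SimpleCfb stores. The stream names are the ones used by encrypted OOXML files, chosen only as examples.

```ruby
# Hypothetical standalone version of the path logic in SimpleCfb#add.
# `root` stands in for full_paths[0]; "Root Entry/" below is an assumed value.
def cfb_full_path(root, name)
  return name if name.start_with?(root) # already a full path under the root

  prefix = root.end_with?('/') ? root : root + '/'
  (prefix + name).gsub('//', '/')       # collapse accidental double slashes
end

puts cfb_full_path('Root Entry/', 'EncryptionInfo')            # Root Entry/EncryptionInfo
puts cfb_full_path('Root Entry',  'EncryptedPackage')          # Root Entry/EncryptedPackage
puts cfb_full_path('Root Entry/', 'Root Entry/EncryptionInfo') # Root Entry/EncryptionInfo
```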

#parse!(file) ⇒ `Object`

Parses an input file into this object, allowing you to extract individual files thereafter via #file_index and #full_paths.

file: Source I/O stream. Sectors are located with absolute seeks (for example file.seek(ssz)), so the CFB data should start at offset 0. The stream is closed when the method returns, including when an exception is raised.

# File 'lib/simple_cfb/simple_cfb.rb', line 427

def parse!(file)
  raise "CFB corrupt - file size < 512 bytes" if file.size < 512

  mver          = 3
  ssz           = 512
  nmfs          = 0 # number of mini FAT sectors
  difat_sec_cnt = 0
  dir_start     = 0
  minifat_start = 0
  difat_start   = 0
  fat_addrs     = [] # locations of FAT sectors

  # [MS-CFB] 2.2 Compound File Header
  # Check major version
  #
  major, minor = self.check_get_mver(file)

  if major == 3
    ssz = 512
  elsif major == 4
    ssz = 4096
  elsif major == 0 && minor == 0
    raise 'Zip contents are not supported'
  else
    raise "Major version: Only 3 or 4 is supported; #{major} encountered"
  end

  self.check_shifts(file, major)

  # Number of Directory Sectors
  #
  dir_cnt = self.read_shift(file, 4, 'i')
  raise "Directory sectors: Expected 0, saw #{dir_cnt}" if major == 3 && dir_cnt != 0

  # Number of FAT Sectors
  #
  file.seek(file.pos + 4)

  # First Directory Sector Location
  #
  dir_start = self.read_shift(file, 4, 'i')

  # Transaction Signature
  #
  file.seek(file.pos + 4)

  # Mini Stream Cutoff Size
  #
  self.check_field(file, "\x00\x10\x00\x00", 'Mini stream cutoff size')

  # First Mini FAT Sector Location
  #
  minifat_start = self.read_shift(file, 4, 'i')

  # Number of Mini FAT Sectors
  #
  nmfs = self.read_shift(file, 4, 'i')

  # First DIFAT sector location
  #
  difat_start = self.read_shift(file, 4, 'i')

  # Number of DIFAT Sectors
  #
  difat_sec_cnt = self.read_shift(file, 4, 'i')

  # Grab FAT Sector Locations
  #
  q = -1
  j = 0

  while (j < 109) # 109 = (512 - file.pos) >> 2
    q = self.read_shift(file, 4, 'i')
    break if q < 0
    fat_addrs[j] = q
    j += 1
  end

  # Break the file up into sectors, skipping the file header of 'ssz' size.
  #
  sectors = []
  file.seek(ssz)

  while ! file.eof?
    sectors << file.read(ssz)
  end

  self.sleuth_fat(difat_start, difat_sec_cnt, sectors, ssz, fat_addrs)

  # Chains
  #
  sector_list = self.make_sector_list(sectors, dir_start, fat_addrs, ssz)
  sector_list[dir_start].name = '!Directory'

  if nmfs > 0 && minifat_start != ENDOFCHAIN
    sector_list[minifat_start].name = '!MiniFAT'
  end

  sector_list[fat_addrs[0]].name = '!FAT'
  sector_list.fat_addrs          = fat_addrs
  sector_list.ssz                = ssz

  # [MS-CFB] 2.6.1 Compound File Directory Entry
  #
  files = {}
  paths = []

  self.full_paths = []
  self.file_index = []
  self.read_directory(
    dir_start,
    sector_list,
    sectors,
    paths,
    nmfs,
    files,
    minifat_start
  )

  self.build_full_paths(paths)
ensure
  file.close() unless file.nil?
end
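The first half of #parse! walks the fixed-size [MS-CFB] 2.2 header field by field. For inspecting a header without the class, here is a standalone sketch. parse_cfb_header and CFB_SIGNATURE are illustrative names, and the field offsets come from the specification rather than from SimpleCfb's own helpers (check_get_mver, check_shifts, read_shift), whose internals are not shown on this page.

```ruby
# Hypothetical helper (not SimpleCfb's API): decode the fixed 512-byte
# [MS-CFB] 2.2 header with explicit little-endian unpack directives.
CFB_SIGNATURE = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1].pack('C*')

def parse_cfb_header(header)
  raise 'CFB corrupt - header < 512 bytes' if header.bytesize < 512
  raise 'Not a CFB file' unless header.byteslice(0, 8) == CFB_SIGNATURE

  # Offsets 24..33: minor version, major version, byte order, sector shifts.
  minor, major, byte_order, sector_shift, mini_shift = header.byteslice(24, 10).unpack('v5')
  raise "Major version: Only 3 or 4 is supported; #{major} encountered" unless [3, 4].include?(major)

  # Offsets 40..75: nine 32-bit fields (offsets 34..39 are reserved).
  f = header.byteslice(40, 36).unpack('l<9')

  {
    minor: minor, major: major, byte_order: byte_order,
    sector_size: 1 << sector_shift, mini_sector_size: 1 << mini_shift,
    dir_sectors: f[0], fat_sectors: f[1], dir_start: f[2],
    mini_cutoff: f[4], minifat_start: f[5], minifat_sectors: f[6],
    difat_start: f[7], difat_sectors: f[8],
    # Offsets 76..511: the first 109 FAT sector locations; stop at the
    # first negative (unused) entry, as parse! does.
    fat_addrs: header.byteslice(76, 436).unpack('l<109').take_while { |s| s >= 0 }
  }
end

# Build a minimal version 3 header in memory to exercise the parser.
header = CFB_SIGNATURE + ("\x00".b * 16) +
         [0x3E, 3, 0xFFFE, 9, 6].pack('v5') + ("\x00".b * 6) +
         [0, 1, 1, 0, 0x1000, -2, 0, -2, 0].pack('l<9') +
         [0].pack('l<') + ([-1] * 108).pack('l<108')

info = parse_cfb_header(header)
puts info[:sector_size] # 512
p info[:fat_addrs]      # [0]
```

Unlike #parse!, this sketch works on an in-memory String and closes nothing; reading the remaining sectors would still require following the FAT and DIFAT chains as #parse! does.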

#write ⇒ `Object`

Compile and return the CFB file data as an ASCII-8BIT encoded String.

# File 'lib/simple_cfb/simple_cfb.rb', line 162

def write

  # Commented out for now, because we prefer parity with the JS code for
  # test verification purposes. The overhead seems minimal.
  #
  # # Get rid of the seed file if it's still present and we seem to have
  # # more file entries than the root directory and seed entry.
  # #
  # seed_leaf  = "/#{SEED_FILENAME}"
  # seed_index = self.full_paths.find_index do | path |
  #   path.end_with?(seed_leaf)
  # end
  #
  # unless seed_index.nil? || self.file_index.size < 3
  #   self.file_index.delete_at(seed_index)
  #   self.full_paths.delete_at(seed_index)
  # end
  #
  # self.rebuild(force_gc: true)
  self.rebuild(force_gc: false)

  mini_size = 0
  fat_size  = 0

  0.upto(self.file_index.size - 1) do | i |
    flen = self.file_index[i]&.content&.bytesize
    next if flen.nil? || flen.zero?

    if flen < 0x1000
      mini_size += (flen + 0x3F) >> 6
    else
      fat_size  += (flen + 0x01FF) >> 9
    end
  end

  dir_cnt   = (self.full_paths.size + 3) >> 2
  mini_cnt  = (mini_size + 7) >> 3
  mfat_cnt  = (mini_size + 0x7F) >> 7
  fat_base  = mini_cnt + fat_size + dir_cnt + mfat_cnt
  fat_cnt   = (fat_base + 0x7F) >> 7
  difat_cnt = fat_cnt <= 109 ? 0 : ((fat_cnt - 109).to_f / 0x7F).ceil()

  while (((fat_base + fat_cnt + difat_cnt + 0x7F) >> 7) > fat_cnt)
    fat_cnt += 1
    difat_cnt = fat_cnt <= 109 ? 0 : ((fat_cnt - 109).to_f / 0x7F).ceil()
  end

  el = [1, difat_cnt, fat_cnt, mfat_cnt, dir_cnt, fat_size, mini_size, 0]

  self.file_index[0].size  = mini_size << 6
  self.file_index[0].start = el[0] + el[1] + el[2] + el[3] + el[4] + el[5]

  el[7] = el[0] + el[1] + el[2] + el[3] + el[4] + el[5] + ((el[6] + 7) >> 3)

  o = String.new(encoding: 'ASCII-8BIT')

  o << HEADER_SIGNATURE
  o << NUL * 2 * 8
  o << write_shift(2, 0x003E)
  o << write_shift(2, 0x0003)
  o << write_shift(2, 0xFFFE)
  o << write_shift(2, 0x0009)
  o << write_shift(2, 0x0006)
  o << NUL * 2 * 3

  o << write_shift( 4, 0)
  o << write_shift( 4, el[2])
  o << write_shift( 4, el[0] + el[1] + el[2] + el[3] - 1)
  o << write_shift( 4, 0)
  o << write_shift( 4, 1<<12)
  o << write_shift( 4, (el[3].blank? || el[3].zero?) ? ENDOFCHAIN : el[0] + el[1] + el[2] - 1)
  o << write_shift( 4, el[3])
  o << write_shift(-4, (el[1].blank? || el[1].zero?) ? ENDOFCHAIN : el[0] - 1)
  o << write_shift( 4, el[1])

  i = 0
  t = 0

  while i < 109
    o << write_shift(-4, i < el[2] ? el[1] + i : -1)
    i += 1
  end

  unless el[1].blank? || el[1].zero?
    t = 0
    while t < el[1]
      while i < 236 + t * 127
        o << write_shift(-4, i < el[2] ? el[1] + i : -1)
        i += 1
      end

      o << write_shift(-4, t == el[1] - 1 ? ENDOFCHAIN : t + 1)
      t += 1
    end
  end

  chainit = Proc.new do | w |
    t += w

    while i < t - 1
      o << write_shift(-4, i + 1)
      i += 1
    end

    unless w.blank? || w.zero?
      i += 1
      o << write_shift(-4, ENDOFCHAIN)
    end
  end

  i = 0
  t = el[1]

  while i < t
    o << write_shift(-4, DIFSECT)
    i += 1
  end

  t += el[2]

  while i < t
    o << write_shift(-4, FATSECT)
    i += 1
  end

  chainit.call(el[3])
  chainit.call(el[4])

  j    = 0
  flen = 0
  file = self.file_index[0]

  while j < self.file_index.size
    file = self.file_index[j]
    j   += 1

    next if file.content.nil?

    flen = file.content.bytesize
    next if flen < 0x1000

    file.start = t
    chainit.call((flen + 0x01FF) >> 9)
  end

  chainit.call((el[6] + 7) >> 3)

  while o.size & 0x1FF != 0
    o << write_shift(-4, ENDOFCHAIN)
  end

  t = i = j = 0

  while j < self.file_index.size do
    file = self.file_index[j]
    j   += 1

    next if file.content.nil?

    flen = file.content.bytesize
    next if flen == 0 || flen >= 0x1000

    file.start = t
    chainit.call((flen + 0x3F) >> 6)
  end

  while o.size & 0x1FF != 0
    o << write_shift(-4, ENDOFCHAIN)
  end

  i = 0

  while i < (el[4] << 2) do
    nm = self.full_paths[i]

    if nm.blank?
      0.upto(16) { o << write_shift(4,  0) } # Remember, #upto is inclusive -> *17* words
      0.upto(2 ) { o << write_shift(4, -1) }
      0.upto(11) { o << write_shift(4,  0) }

      i += 1
      next # NOTE EARLY LOOP RESTART
    end

    file = self.file_index[i]

    if i.zero?
      file.start = file.size.blank? || file.size.zero? ? ENDOFCHAIN : file.start - 1;
    end

    u_nm = file.name
    u_nm = u_nm[0...32] if u_nm.size > 32

    flen = 2 * (u_nm.size + 1)

    o << write_shift(64, u_nm, 'utf16le')
    o << write_shift(2, flen)
    o << write_shift(1, file.type)
    o << write_shift(1, file.color)
    o << write_shift(-4, file.L)
    o << write_shift(-4, file.R)
    o << write_shift(-4, file.C)

    if file.clsid.blank?
      j = 0
      while j < 4
        o << write_shift(4, 0)
        j += 1
      end
    else
      o << file.clsid
    end

    o << write_shift(4, file.state.blank? || file.state.zero? ? 0 : file.state)
    o << write_shift(4, 0)
    o << write_shift(4, 0)
    o << write_shift(4, 0)
    o << write_shift(4, 0)
    o << write_shift(4, file.start)
    o << write_shift(4, file.size)
    o << write_shift(4, 0)

    i += 1
  end

  i = 1

  while i < self.file_index.size do
    file = self.file_index[i]

    if file.size.present? && file.size >= 0x1000
      aligned_size = (file.start + 1) << 9
      while (o.size < aligned_size) do; o << 0x00; end

      o << file.content
      while (o.size % 512 != 0) do; o << 0x00; end
    end

    i += 1
  end

  i = 1

  while i < self.file_index.size do
    file = self.file_index[i]

    if file.size.present? && file.size > 0 && file.size < 0x1000
      o << file.content
      while (o.size % 64 != 0) do; o << 0x00; end
    end

    i += 1
  end

  while (o.size < el[7] << 9) do; o << 0x00; end

  return o
end
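The chainit Proc in #write produces most of the FAT: for each run of consecutive sectors it writes a "next sector" pointer per sector and terminates the run with ENDOFCHAIN. Below is a standalone sketch of the entries it produces for a single run. fat_chain is a hypothetical name, and unlike #write it returns the entries instead of appending them to the output buffer.

```ruby
# Hypothetical helper mirroring the chainit Proc in SimpleCfb#write.
# A stream stored in `count` consecutive sectors starting at `start` gets
# FAT entries for sectors start..start+count-1, each pointing at the next
# sector, with ENDOFCHAIN in the last one.
ENDOFCHAIN = -2

def fat_chain(start, count)
  return [] if count.zero? # chainit writes nothing for an empty run

  ((start + 1)...(start + count)).to_a << ENDOFCHAIN
end

p fat_chain(5, 3)                      # [6, 7, -2]
p fat_chain(9, 1)                      # [-2]
p fat_chain(5, 3).pack('l<*').bytesize # 12
```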
