Class: SimpleCfb

Inherits:
Object
Defined in:
lib/simple_cfb/simple_cfb.rb,
lib/simple_cfb/version.rb

Overview

Ported from github.com/SheetJS/js-cfb.

Add file data with #add; once every file has been added, #write generates the entire blob of CFB data in one go. Progressive creation is impossible because a CFB file records file sizes and directory entries at the start of its output, so all of that must be known beforehand.

To read an existing container, parse it into a new object with #parse!, then examine #file_index and #full_paths to extract the parsed CFB container components.

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb/

This Ruby port tries to be equivalent to the JavaScript original, but the porting has likely introduced additional bugs, and I’ve omitted anything that wasn’t needed for writing and reading encrypted OOXML.

Defined Under Namespace

Classes: SectorList

Constant Summary

VERSION =

Gem version. If this changes, be sure to re-run “bundle install” or “bundle update”.

'0.3.0'
DATE =

Date for VERSION. If this changes, be sure to re-run “bundle install” or “bundle update”.

'2024-10-22'
MSSZ =

CFB miscellaneous. Mini Sector Size = 1<<6

64
MSCSZ =

Mini Stream Cutoff Size = 1<<12

4096
NUL =

Convenience accessor to binary-encoded NUL byte.

String.new("\x00", encoding: 'ASCII-8BIT')
FREESECT =

2.1 Compound File Sector Numbers and Types

-1
ENDOFCHAIN =
-2
FATSECT =
-3
DIFSECT =
-4
MAXREGSECT =
-6
HEADER_SIGNATURE =

Compound File Header

String.new("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", encoding: 'ASCII-8BIT')
HEADER_CLSID =
String.new("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", encoding: 'ASCII-8BIT')
HEADER_MINOR_VERSION =
String.new("\x3e\x00", encoding: 'ASCII-8BIT')
MAXREGSID =
-6
NOSTREAM =
-1
STREAM =
2
ENTRY_TYPES =

2.6.1 Compound File Directory Entry

['unknown', 'storage', 'stream', 'lockbytes', 'property', 'root']
SEED_FILENAME =

Initial seed filename

"\u0001Sh33tJ5"

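The sector-number constants above are stored as signed integers, whereas [MS-CFB] lists the same markers as unsigned 32-bit values (FREESECT is 0xFFFFFFFF, for example). As a rough illustration that is not taken from the gem's source, the sketch below shows that both views describe the same four bytes on disk. The constant table is redeclared locally so that the snippet runs without the gem, and `as_unsigned32` is a hypothetical helper name.

```ruby
# Local copies of the documented values; SimpleCfb itself is not loaded here.
SECTOR_MARKERS = {
  FREESECT:   -1,
  ENDOFCHAIN: -2,
  FATSECT:    -3,
  DIFSECT:    -4,
  MAXREGSECT: -6
}.freeze

# Encode a signed marker as 4 little-endian bytes, then re-read those bytes
# as an unsigned 32-bit integer, which is how [MS-CFB] writes the value.
def as_unsigned32(value)
  [value].pack('l<').unpack1('L<')
end

SECTOR_MARKERS.each do |name, signed|
  puts format('%-10s %3d  0x%08X', name, signed, as_unsigned32(signed))
end
```

Keeping the markers negative lets code such as the `break if q < 0` test in #parse! separate markers from regular sector numbers with a single comparison.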
Constructor Details

#initialize ⇒ SimpleCfb

Returns a new instance of SimpleCfb.



# File 'lib/simple_cfb/simple_cfb.rb', line 128

def initialize
  self.reinit()
end

Instance Attribute Details

#file_index ⇒ Object

PUBLIC INSTANCE INTERFACE



# File 'lib/simple_cfb/simple_cfb.rb', line 126

def file_index
  @file_index
end

#full_paths ⇒ Object

PUBLIC INSTANCE INTERFACE



# File 'lib/simple_cfb/simple_cfb.rb', line 126

def full_paths
  @full_paths
end

Class Method Details

.get_int32le(input, index = 0) ⇒ Object

Treat an input ASCII-8BIT encoded string as 4 bytes and from this parse and return a signed 32-bit little-endian integer.

input

ASCII-8BIT encoded string including 4 byte sequence

index

Index into input to start reading bytes (default 0)



# File 'lib/simple_cfb/simple_cfb.rb', line 98

def self.get_int32le(input, index = 0)
  data = input.slice(index, 4)
  data = data.reverse() unless self.host_is_little_endian?

  data.unpack('l').first
end
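
For comparison, Ruby's pack directives can name the byte order explicitly, which makes the host-endianness check unnecessary. The sketch below is an alternative formulation under that assumption, not the gem's implementation, and `read_int32le` is a hypothetical name.

```ruby
# Hypothetical stand-alone equivalent of .get_int32le. The 'l<' directive
# always decodes a signed 32-bit little-endian integer, whatever the host's
# native byte order.
def read_int32le(input, index = 0)
  input.byteslice(index, 4).unpack1('l<')
end

bytes = "\x2A\x00\x00\x00\xFE\xFF\xFF\xFF".b

read_int32le(bytes)     # 42
read_int32le(bytes, 4)  # -2, the ENDOFCHAIN marker
```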

.get_time(data) ⇒ Object

Parse a ctime/mtime 8-byte sequence (read as two 32-bit little-endian words) into a Ruby Time object in UTC. Returns nil if the values are all zero.

data

ASCII-8BIT encoded string, 8 bytes long.



# File 'lib/simple_cfb/simple_cfb.rb', line 110

def self.get_time(data)
  high = self.get_uint32le(data, 4)
  low  = self.get_uint32le(data, 0)

  return nil if high.zero? && low.zero?

  high = (high / 1e7) * 2.pow(32)
  low  = (low  / 1e7)

  return Time.at(high + low - 11644473600).utc
end
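
The eight bytes form a Windows FILETIME: a 64-bit count of 100-nanosecond intervals since 1601-01-01 UTC. The constant 11644473600 is the number of seconds between that date and the Unix epoch. The sketch below performs the same conversion with exact integer and rational arithmetic instead of floats; `filetime_to_time` and `EPOCH_OFFSET_SECONDS` are hypothetical names that do not exist in the gem.

```ruby
# Seconds between 1601-01-01 and 1970-01-01 (both UTC).
EPOCH_OFFSET_SECONDS = 11_644_473_600

# Hypothetical helper mirroring .get_time with exact arithmetic.
def filetime_to_time(data)
  low, high = data.unpack('V2') # two unsigned 32-bit little-endian words
  return nil if low.zero? && high.zero?

  ticks = (high << 32) | low    # 100-nanosecond intervals since 1601
  Time.at(Rational(ticks, 10_000_000) - EPOCH_OFFSET_SECONDS).utc
end

# The FILETIME that corresponds to the Unix epoch, packed as 8 bytes.
unix_epoch = [EPOCH_OFFSET_SECONDS * 10_000_000].pack('Q<')
filetime_to_time(unix_epoch) # 1970-01-01 00:00:00 UTC
```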

.get_uint32le(input, index = 0) ⇒ Object

Treat an input ASCII-8BIT encoded string as 4 bytes and from this parse and return an unsigned 32-bit little-endian integer.

input

ASCII-8BIT encoded string including 4 byte sequence

index

Index into input to start reading bytes (default 0)



# File 'lib/simple_cfb/simple_cfb.rb', line 85

def self.get_uint32le(input, index = 0)
  data = input.slice(index, 4)
  data = data.reverse() unless self.host_is_little_endian?

  data.unpack('L').first
end

.host_is_little_endian? ⇒ Boolean

Returns true if the executing computer is little-endian natively, else false.

Returns:

  • (Boolean)


# File 'lib/simple_cfb/simple_cfb.rb', line 75

def self.host_is_little_endian?
  [42].pack('l').bytes[0] == 42
end

Instance Method Details

#add(name, content) ⇒ Object

Add a file entry. Supports root filenames only. The file must not have been added already.

name

Filename, e.g. “Foo”, in your preferred string encoding

content

Mandatory ASCII-8BIT encoded string containing file data



# File 'lib/simple_cfb/simple_cfb.rb', line 138

def add(name, content)
  self.reinit()

  fpath = self.full_paths[0]

  if name.slice(0, fpath.size) == fpath
    fpath = name
  else
    fpath += '/' unless fpath.end_with?('/')
    fpath  = (fpath + name).gsub('//', '/')
  end

  file = OpenStruct.new({name: filename(name), type: 2, content: content, size: content.bytesize})

  self.file_index << file
  self.full_paths << fpath

  rebuild(force_gc: true)

  return file
end
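
The path handling at the top of #add can be tried in isolation. The sketch below reproduces only that logic; `entry_path` is a hypothetical helper, and the root value 'Root Entry/' is an assumption about what full_paths[0] holds, which this page does not confirm.

```ruby
# Hypothetical helper reproducing the path handling in #add.
def entry_path(root, name)
  # A name that already starts with the root path is used unchanged.
  return name if name.start_with?(root)

  root += '/' unless root.end_with?('/')
  (root + name).gsub('//', '/')
end

root = 'Root Entry/' # assumed value of full_paths[0]

entry_path(root, 'Foo')            # "Root Entry/Foo"
entry_path(root, 'Root Entry/Foo') # "Root Entry/Foo"
entry_path(root, '/Foo')           # "Root Entry/Foo"
```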

#parse!(file) ⇒ Object

Parses an input file into this object, allowing you to extract individual files thereafter via #file_index and #full_paths.

file

Source I/O stream. Header data is read from the current file pointer. The stream is closed before the method returns.



# File 'lib/simple_cfb/simple_cfb.rb', line 427

def parse!(file)
  raise "CFB corrupt - file size < 512 bytes" if file.size < 512

  mver          = 3
  ssz           = 512
  nmfs          = 0 # number of mini FAT sectors
  difat_sec_cnt = 0
  dir_start     = 0
  minifat_start = 0
  difat_start   = 0
  fat_addrs     = [] # locations of FAT sectors

  # [MS-CFB] 2.2 Compound File Header
  # Check major version
  #
  major, minor = self.check_get_mver(file)

  if major == 3
    ssz = 512
  elsif major == 4
    ssz = 4096
  elsif major == 0 && minor == 0
    raise 'Zip contents are not supported'
  else
    raise "Major version: Only 3 or 4 is supported; #{major} encountered"
  end

  self.check_shifts(file, major)

  # Number of Directory Sectors
  #
  dir_cnt = self.read_shift(file, 4, 'i')
  raise "Directory sectors: Expected 0, saw #{dir_cnt}" if major == 3 && dir_cnt != 0

  # Number of FAT Sectors
  #
  file.seek(file.pos + 4)

  # First Directory Sector Location
  #
  dir_start = self.read_shift(file, 4, 'i')

  # Transaction Signature
  #
  file.seek(file.pos + 4)

  # Mini Stream Cutoff Size
  #
  self.check_field(file, "\x00\x10\x00\x00", 'Mini stream cutoff size')

  # First Mini FAT Sector Location
  #
  minifat_start = self.read_shift(file, 4, 'i')

  # Number of Mini FAT Sectors
  #
  nmfs = self.read_shift(file, 4, 'i')

  # First DIFAT sector location
  #
  difat_start = self.read_shift(file, 4, 'i')

  # Number of DIFAT Sectors
  #
  difat_sec_cnt = self.read_shift(file, 4, 'i')

  # Grab FAT Sector Locations
  #
  q = -1
  j = 0

  while (j < 109) # 109 = (512 - file.pos) >> 2
    q = self.read_shift(file, 4, 'i')
    break if q < 0
    fat_addrs[j] = q
    j += 1
  end

  # Break the file up into sectors, skipping the file header of 'ssz' size.
  #
  sectors = []
  file.seek(ssz)

  while ! file.eof?
    sectors << file.read(ssz)
  end

  self.sleuth_fat(difat_start, difat_sec_cnt, sectors, ssz, fat_addrs)

  # Chains
  #
  sector_list = self.make_sector_list(sectors, dir_start, fat_addrs, ssz)
  sector_list[dir_start].name = '!Directory'

  if nmfs > 0 && minifat_start != ENDOFCHAIN
    sector_list[minifat_start].name = '!MiniFAT'
  end

  sector_list[fat_addrs[0]].name = '!FAT'
  sector_list.fat_addrs          = fat_addrs
  sector_list.ssz                = ssz

  # [MS-CFB] 2.6.1 Compound File Directory Entry
  #
  files = {}
  paths = []

  self.full_paths = []
  self.file_index = []
  self.read_directory(
    dir_start,
    sector_list,
    sectors,
    paths,
    nmfs,
    files,
    minifat_start
  )

  self.build_full_paths(paths)
ensure
  file.close() unless file.nil?
end
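
#parse! walks the header one field at a time. As a compact cross-check of the field order in [MS-CFB] 2.2, the sketch below decodes the same 512-byte header with a single unpack call. It is not how the gem does it: `parse_cfb_header`, `HEADER_FORMAT` and `CFB_SIGNATURE` are hypothetical names, and the sample header is synthetic.

```ruby
CFB_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b

# Signature, CLSID, five 16-bit fields, reserved bytes, nine 32-bit fields,
# then the first 109 DIFAT entries: 512 bytes in total.
HEADER_FORMAT = 'a8 a16 v5 a6 V2 l< V2 l< V l< V l<109'

# Hypothetical helper, not part of SimpleCfb.
def parse_cfb_header(bytes)
  raise ArgumentError, 'CFB header needs 512 bytes' if bytes.bytesize < 512

  signature, _clsid, _minor, major, _byte_order, sector_shift, _mini_shift, _reserved,
    dir_cnt, _fat_cnt, dir_start, _txn, _cutoff, minifat_start, minifat_cnt,
    difat_start, difat_cnt, *difat = bytes.byteslice(0, 512).unpack(HEADER_FORMAT)

  raise ArgumentError, 'not a CFB container' unless signature == CFB_SIGNATURE

  {
    major: major,
    sector_size: 1 << sector_shift,
    dir_cnt: dir_cnt, dir_start: dir_start,
    minifat_start: minifat_start, minifat_cnt: minifat_cnt,
    difat_start: difat_start, difat_cnt: difat_cnt,
    # Negative entries are markers such as FREESECT, so stop at the first one.
    fat_addrs: difat.take_while { |sector| sector >= 0 }
  }
end

# A synthetic version 3 header with a single FAT sector at sector 0.
sample = [
  CFB_SIGNATURE, '', 0x3E, 3, 0xFFFE, 9, 6, '',
  0, 1, 1, 0, 4096, -2, 0, -2, 0,
  0, *Array.new(108, -1)
].pack(HEADER_FORMAT)

header = parse_cfb_header(sample)
```

The location fields are decoded as signed integers so that ENDOFCHAIN appears as -2, matching the constants on this page.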

#write ⇒ Object

Compile and return the CFB file data.



# File 'lib/simple_cfb/simple_cfb.rb', line 162

def write

  # Commented out for now, because we prefer parity with the JS code for
  # test verification purposes. The overhead seems minimal.
  #
  # # Get rid of the seed file if it's still present and we seem to have
  # # more file entries than the root directory and seed entry.
  # #
  # seed_leaf  = "/#{SEED_FILENAME}"
  # seed_index = self.full_paths.find_index do | path |
  #   path.end_with?(seed_leaf)
  # end
  #
  # unless seed_index.nil? || self.file_index.size < 3
  #   self.file_index.delete_at(seed_index)
  #   self.full_paths.delete_at(seed_index)
  # end
  #
  # self.rebuild(force_gc: true)
  self.rebuild(force_gc: false)

  mini_size = 0
  fat_size  = 0

  0.upto(self.file_index.size - 1) do | i |
    flen = self.file_index[i]&.content&.bytesize
    next if flen.nil? || flen.zero?

    if flen < 0x1000
      mini_size += (flen + 0x3F) >> 6
    else
      fat_size  += (flen + 0x01FF) >> 9
    end
  end

  dir_cnt   = (self.full_paths.size + 3) >> 2
  mini_cnt  = (mini_size + 7) >> 3
  mfat_cnt  = (mini_size + 0x7F) >> 7
  fat_base  = mini_cnt + fat_size + dir_cnt + mfat_cnt
  fat_cnt   = (fat_base + 0x7F) >> 7
  difat_cnt = fat_cnt <= 109 ? 0 : ((fat_cnt - 109).to_f / 0x7F).ceil()

  while (((fat_base + fat_cnt + difat_cnt + 0x7F) >> 7) > fat_cnt)
    fat_cnt += 1
    difat_cnt = fat_cnt <= 109 ? 0 : ((fat_cnt - 109).to_f / 0x7F).ceil()
  end

  el = [1, difat_cnt, fat_cnt, mfat_cnt, dir_cnt, fat_size, mini_size, 0]

  self.file_index[0].size  = mini_size << 6
  self.file_index[0].start = el[0] + el[1] + el[2] + el[3] + el[4] + el[5]

  el[7] = el[0] + el[1] + el[2] + el[3] + el[4] + el[5] + ((el[6] + 7) >> 3)

  o = String.new(encoding: 'ASCII-8BIT')

  o << HEADER_SIGNATURE
  o << NUL * 2 * 8
  o << write_shift(2, 0x003E)
  o << write_shift(2, 0x0003)
  o << write_shift(2, 0xFFFE)
  o << write_shift(2, 0x0009)
  o << write_shift(2, 0x0006)
  o << NUL * 2 * 3

  o << write_shift( 4, 0)
  o << write_shift( 4, el[2])
  o << write_shift( 4, el[0] + el[1] + el[2] + el[3] - 1)
  o << write_shift( 4, 0)
  o << write_shift( 4, 1<<12)
  o << write_shift( 4, (el[3].blank? || el[3].zero?) ? ENDOFCHAIN : el[0] + el[1] + el[2] - 1)
  o << write_shift( 4, el[3])
  o << write_shift(-4, (el[1].blank? || el[1].zero?) ? ENDOFCHAIN : el[0] - 1)
  o << write_shift( 4, el[1])

  i = 0
  t = 0

  while i < 109
    o << write_shift(-4, i < el[2] ? el[1] + i : -1)
    i += 1
  end

  unless el[1].blank? || el[1].zero?
    t = 0
    while t < el[1]
      while i < 236 + t * 127
        o << write_shift(-4, i < el[2] ? el[1] + i : -1)
        i += 1
      end

      o << write_shift(-4, t == el[1] - 1 ? ENDOFCHAIN : t + 1)
      t += 1
    end
  end

  chainit = Proc.new do | w |
    t += w

    while i < t - 1
      o << write_shift(-4, i + 1)
      i += 1
    end

    unless w.blank? || w.zero?
      i += 1
      o << write_shift(-4, ENDOFCHAIN)
    end
  end

  i = 0
  t = el[1]

  while i < t
    o << write_shift(-4, DIFSECT)
    i += 1
  end

  t += el[2]

  while i < t
    o << write_shift(-4, FATSECT)
    i += 1
  end

  chainit.call(el[3])
  chainit.call(el[4])

  j    = 0
  flen = 0
  file = self.file_index[0]

  while j < self.file_index.size
    file = self.file_index[j]
    j   += 1

    next if file.content.nil?

    flen = file.content.bytesize
    next if flen < 0x1000

    file.start = t
    chainit.call((flen + 0x01FF) >> 9)
  end

  chainit.call((el[6] + 7) >> 3)

  while o.size & 0x1FF != 0
    o << write_shift(-4, ENDOFCHAIN)
  end

  t = i = j = 0

  while j < self.file_index.size do
    file = self.file_index[j]
    j   += 1

    next if file.content.nil?

    flen = file.content.bytesize
    next if flen == 0 || flen >= 0x1000

    file.start = t
    chainit.call((flen + 0x3F) >> 6)
  end

  while o.size & 0x1FF != 0
    o << write_shift(-4, ENDOFCHAIN)
  end

  i = 0

  while i < (el[4] << 2) do
    nm = self.full_paths[i]

    if nm.blank?
      0.upto(16) { o << write_shift(4,  0) } # Remember, #upto is inclusive -> *17* words
      0.upto(2 ) { o << write_shift(4, -1) }
      0.upto(11) { o << write_shift(4,  0) }

      i += 1
      next # NOTE EARLY LOOP RESTART
    end

    file = self.file_index[i]

    if i.zero?
      file.start = file.size.blank? || file.size.zero? ? ENDOFCHAIN : file.start - 1
    end

    u_nm = file.name
    u_nm = u_nm[0...32] if u_nm.size > 32

    flen = 2 * (u_nm.size + 1)

    o << write_shift(64, u_nm, 'utf16le')
    o << write_shift(2, flen)
    o << write_shift(1, file.type)
    o << write_shift(1, file.color)
    o << write_shift(-4, file.L)
    o << write_shift(-4, file.R)
    o << write_shift(-4, file.C)

    if file.clsid.blank?
      j = 0
      while j < 4
        o << write_shift(4, 0)
        j += 1
      end
    else
      o << file.clsid
    end

    o << write_shift(4, file.state.blank? || file.state.zero? ? 0 : file.state)
    o << write_shift(4, 0)
    o << write_shift(4, 0)
    o << write_shift(4, 0)
    o << write_shift(4, 0)
    o << write_shift(4, file.start)
    o << write_shift(4, file.size)
    o << write_shift(4, 0)

    i += 1
  end

  i = 1

  while i < self.file_index.size do
    file = self.file_index[i]

    if file.size.present? && file.size >= 0x1000
      aligned_size = (file.start + 1) << 9
      while (o.size < aligned_size) do; o << 0x00; end

      o << file.content
      while (o.size % 512 != 0) do; o << 0x00; end
    end

    i += 1
  end

  i = 1

  while i < self.file_index.size do
    file = self.file_index[i]

    if file.size.present? && file.size > 0 && file.size < 0x1000
      o << file.content
      while (o.size % 64 != 0) do; o << 0x00; end
    end

    i += 1
  end

  while (o.size < el[7] << 9) do; o << 0x00; end

  return o
end
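
The sizing arithmetic at the top of #write shows why every file must be known before any output is produced: the header and FAT depend on the totals. The sketch below extracts that arithmetic into a stand-alone function so that the numbers can be inspected. `cfb_layout` is a hypothetical helper, and the example sizes are made up.

```ruby
# Hypothetical helper: reproduces the sizing arithmetic at the top of #write.
# sizes      - byte sizes of every stored stream
# path_count - number of directory entries (full_paths.size in the gem)
def cfb_layout(sizes, path_count)
  mini_size = 0 # 64-byte mini sectors, for streams under 4096 bytes
  fat_size  = 0 # 512-byte sectors, for streams of 4096 bytes or more

  sizes.each do |flen|
    next if flen.zero?

    if flen < 0x1000
      mini_size += (flen + 0x3F) >> 6
    else
      fat_size += (flen + 0x1FF) >> 9
    end
  end

  dir_cnt  = (path_count + 3) >> 2   # 4 directory entries of 128 bytes per sector
  mini_cnt = (mini_size + 7) >> 3    # 8 mini sectors per 512-byte sector
  mfat_cnt = (mini_size + 0x7F) >> 7 # 128 mini FAT entries per sector
  fat_base = mini_cnt + fat_size + dir_cnt + mfat_cnt

  difat_for = ->(fat) { fat <= 109 ? 0 : ((fat - 109) / 127.0).ceil }
  fat_cnt   = (fat_base + 0x7F) >> 7
  difat_cnt = difat_for.call(fat_cnt)

  # The FAT must also describe the FAT and DIFAT sectors themselves, so grow
  # it until it covers everything.
  while ((fat_base + fat_cnt + difat_cnt + 0x7F) >> 7) > fat_cnt
    fat_cnt  += 1
    difat_cnt = difat_for.call(fat_cnt)
  end

  sectors = difat_cnt + fat_cnt + mfat_cnt + dir_cnt + fat_size + mini_cnt
  { mini_size: mini_size, fat_size: fat_size, dir_cnt: dir_cnt, mfat_cnt: mfat_cnt,
    fat_cnt: fat_cnt, difat_cnt: difat_cnt, total_bytes: (1 + sectors) << 9 }
end

# One 10-byte stream, one 5000-byte stream and three directory entries.
layout = cfb_layout([10, 5000], 3)
```

For this input the 10-byte stream takes one mini sector and the 5000-byte stream takes ten regular sectors. Together with one sector each for the FAT, mini FAT, directory and mini stream, plus the header, the container comes to 15 sectors, or 7680 bytes.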