Class: PDF::Reader::LZW

Inherits:
Object
Defined in:
lib/pdf/reader/lzw.rb

Overview

A general class for decoding LZW-compressed data. PDF files can use LZW to compress streams, most often image data that originated in a TIFF file.

See the following links for more information:

ref http://www.fileformat.info/format/tiff/corion-lzw.htm
ref http://marknelson.us/1989/10/01/lzw-data-compression/

The PDF specification also describes the algorithm in its section on the LZWDecode filter.
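The decoding loop described in these references can be sketched in Ruby as follows. This is a simplified illustration, not the code in lib/pdf/reader/lzw.rb: it assumes the packed bit stream has already been split into integer codes, it does not enforce the 4096-entry table limit, and the method name `lzw_decode_codes` is hypothetical.

```ruby
# Control codes, matching the values documented for this class.
CLEAR = 256
EOD   = 257

# Hypothetical helper: decodes a list of already-unpacked LZW codes.
def lzw_decode_codes(codes)
  table = nil
  prev = nil
  out = "".b

  codes.each do |code|
    break if code == EOD

    if code == CLEAR || table.nil?
      # (Re)initialise the table: one entry per byte value, plus the two
      # reserved slots for the clear and end-of-data codes.
      table = (0..255).map { |b| b.chr(Encoding::BINARY) }
      table.push(nil, nil)
      prev = nil
      next if code == CLEAR
    end

    entry =
      if code < table.size
        table[code]
      elsif code == table.size && prev
        # The code refers to the entry that is being built right now
        # (the "KwKwK" case): previous string plus its own first byte.
        prev + prev[0]
      else
        raise ArgumentError, "invalid LZW code #{code}"
      end

    out << entry
    # Each code after the first adds: previous string + first byte of this one.
    table << (prev + entry[0]) if prev
    prev = entry
  end

  out
end
```

The `prev + prev[0]` branch is the same special case that `.decode` handles by calling `create_new_string(string_table, old_code, old_code)` when a code is not yet in the string table.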

Defined Under Namespace

Classes: BitStream, StringTable
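`.decode` relies on `BitStream` to hand out codes from the packed data, 9 to 12 bits at a time. A minimal stand-in might look like the class below. It assumes the high-order-bit-first packing that the PDF specification prescribes for LZWDecode; `CodeReader` and its `width` accessor are hypothetical names, not the API of `PDF::Reader::LZW::BitStream`.

```ruby
# Hypothetical reader that pulls variable-width codes out of a byte
# string, most significant bit first.
class CodeReader
  # Current code width in bits; can be changed between reads.
  attr_accessor :width

  def initialize(data, width = 9)
    @bytes = data.b.bytes
    @width = width
    @bit_pos = 0 # absolute bit offset into @bytes
  end

  # Returns the next code, or nil when fewer than +width+ bits remain.
  def read
    return nil if @bit_pos + @width > @bytes.size * 8

    code = 0
    @width.times do
      byte = @bytes[@bit_pos / 8]
      bit = (byte >> (7 - (@bit_pos % 8))) & 1
      code = (code << 1) | bit
      @bit_pos += 1
    end
    code
  end
end
```

Changing `width` between reads plays the role that `set_bits_in_chunk` plays in `.decode`.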

Constant Summary

CODE_EOD = 257
End of data.

CODE_CLEAR_TABLE = 256
Clear table.
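Together with the 256 single-byte codes, these two constants fix the layout of the code space. The first code available for new table entries is therefore 258, and with a 12-bit maximum width the largest is 4095; these two bounds come from the LZW scheme in the PDF specification, not from constants defined in this class. The hypothetical helper `describe_code` below sketches that layout.

```ruby
CODE_CLEAR_TABLE = 256 # clear table
CODE_EOD         = 257 # end of data

# Hypothetical helper: classifies a code by where it falls in the
# 12-bit LZW code space.
def describe_code(code)
  case code
  when 0..255           then format("literal byte 0x%02X", code)
  when CODE_CLEAR_TABLE then "clear table"
  when CODE_EOD         then "end of data"
  when 258..4095        then "string-table entry built while decoding"
  else raise ArgumentError, "#{code} is not a valid 12-bit LZW code"
  end
end
```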

Class Method Summary

Class Method Details

.decode(data) ⇒ Object

Decompresses an LZW-compressed string and returns the decoded data as a String.



# File 'lib/pdf/reader/lzw.rb', line 91

def self.decode(data)
  stream = BitStream.new(data.to_s, 9) # size of codes between 9 and 12 bits
  string_table = StringTable.new
  result = "".dup
  until (code = stream.read) == CODE_EOD
    if code == CODE_CLEAR_TABLE
      stream.set_bits_in_chunk(9)
      string_table = StringTable.new
      code = stream.read
      break if code == CODE_EOD
      result << string_table[code]
      old_code = code
    else
      string = string_table[code]
      if string
        result << string
        string_table.add create_new_string(string_table, old_code, code)
        old_code = code
      else
        new_string = create_new_string(string_table, old_code, old_code)
        result << new_string
        string_table.add new_string
        old_code = code
      end
      # increase the size of the codes when the limit is reached
      if string_table.string_table_pos == 511
        stream.set_bits_in_chunk(10)
      elsif string_table.string_table_pos == 1023
        stream.set_bits_in_chunk(11)
      elsif string_table.string_table_pos == 2047
        stream.set_bits_in_chunk(12)
      end
    end
  end
  result
end
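The width checks at the end of the loop can be read as a schedule from table position to code width. The sketch below restates them as ranges, assuming `string_table_pos` counts the next free slot in the table; `code_width_for` is a hypothetical name. Because the switch happens at 511, 1023 and 2047 rather than 512, 1024 and 2048, the width grows one code early, which appears to match the default `EarlyChange` value of 1 for PDF's LZWDecode filter.

```ruby
# Hypothetical helper: code width in bits for the next code, given the
# string-table position (assumed to be the next free slot).
def code_width_for(table_pos)
  if table_pos >= 2047
    12
  elsif table_pos >= 1023
    11
  elsif table_pos >= 511
    10
  else
    9
  end
end
```

`.decode` uses equality checks instead of ranges; that works because the position grows by exactly one per code and is reset, together with the width, whenever the table is cleared.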