Class: Oxcelix::Workbook

Inherits:
Object
  • Object
show all
Includes:
Cellhelper, Numformats, Workbookhelper
Defined in:
lib/oxcelix/workbook.rb

Overview

A class that represents an Excel workbook. By default, it will open the excel file, and convert it to a collection of Matrix objects

Direct Known Subclasses

FormattedWorkbook, RawWorkbook, RuValueWorkbook

Constant Summary

Constants included from Numformats

Numformats::Dtmap, Numformats::Formatarray

Instance Attribute Summary collapse

Instance Method Summary collapse

Methods included from Numformats

#add_custom_formats, #datetime, #numeric

Methods included from Workbookhelper

#[]

Methods included from Cellhelper

#x, #y

Constructor Details

#initialize(filename = nil, options = {}) ⇒ Workbook

Create a new Oxcelix::Workbook object.

filename is the name of the Excel 2007/2010 file (xlsx) to be opened (Optional)

options is a collection of options that can be passed to Workbook. Options may include:

  • :copymerge (=> true/false) - Copy and repeat the content of the merged cells into the whole group, e.g.

the group of three merged cells | a | will become: |a|a|a|

  • :include (Array) - an array of sheet names to be included

  • :exclude (Array) - an array of sheet names not to be processed

  • :paginate (Array) - an array that defines the number of lines to be included in the pagination and the page to be parsed

  • :cellrange (Range) - the range of cells to be included in parsing

If a filename gets passed, the excel file is first getting unzipped, then the workbook.xml file gets processed. This file stores sheet metadata, which will be filtered (by including and excluding sheets from further processing)

The next stage is building sheets. This includes:

  • Parsing the XML files representing the sheets

  • Interpolation of the shared strings

  • adding comments to the cells

  • Converting each sheet to a Matrix object

  • Deleting the temporary directory that stores the XML files.


53
54
55
56
57
58
59
60
61
62
63
# File 'lib/oxcelix/workbook.rb', line 53

def initialize(filename=nil, options={})
  @sheets=[]
  @sheetbase={}
  @sharedstrings=[]
  unless filename.nil?
    unpack filename
    open options
    parse options
    FileUtils.remove_dir(@destination, true)
  end
end

Instance Attribute Details

#sheetsArray


19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
# File 'lib/oxcelix/workbook.rb', line 19

class Workbook
  include Cellhelper
  include Workbookhelper
  include Numformats

  attr_accessor :sheets

  ##
  # Create a new {Workbook} object.
  #
  # filename is the name of the Excel 2007/2010 file (xlsx) to be opened (Optional)
  #
  # options is a collection of options that can be passed to Workbook.
  # Options may include:
  # * :copymerge (=> true/false) - Copy and repeat the content of the merged cells into the whole group, e.g.
  # the group of three merged cells <tt>|   a   |</tt>
  # will become: <tt>|a|a|a|</tt>
  # * :include (Array) - an array of sheet names to be included
  # * :exclude (Array) - an array of sheet names not to be processed
  # * :paginate (Array) - an array that defines the number of lines to be included in the pagination and the page to be parsed
  # * :cellrange (Range) - the range of cells to be included in parsing
  #
  # If a filename gets passed, the excel file is first getting unzipped, then
  # the workbook.xml file gets processed.
  # This file stores sheet metadata, which will be filtered (by including
  # and excluding sheets from further processing)
  #
  # The next stage is building sheets.
  # This includes:
  # * Parsing the XML files representing the sheets
  # * Interpolation of the shared strings
  # * adding comments to the cells
  # * Converting each sheet to a Matrix object
  # * Deleting the temporary directory that stores the XML files.
  def initialize(filename=nil, options={})
    @sheets=[]
    @sheetbase={}
    @sharedstrings=[]
    unless filename.nil?
      unpack filename
      open options
      parse options
      FileUtils.remove_dir(@destination, true)
    end
  end

  at_exit do
    FileUtils.remove_dir(@destination, true)
  end

  # Unzips the excel file to a temporary directory. The directory will be removed at the end of the parsing stage when invoked
  # by initialize, otherwise at exit.
  # @param [String] filename the name of the Excel file to be unpacked
  def unpack(filename)
    @destination = Dir.mktmpdir
    Zip::File.open(filename){ |zip_file|
      zip_file.each{ |f|
        f_path=File.join(@destination, f.name)
        FileUtils.mkdir_p(File.dirname(f_path))
        zip_file.extract(f, f_path) unless File.exists?(f_path)
      }
    }
  end

  # Parses workbook metadata (sheet data, comments, shared strings)
  # @param [Hash] options Options affecting file opening, metadata collection and processing.
  def open(options={})
    f=IO.read(@destination + '/xl/workbook.xml')
    a=Ox::load(f)

    sheetdata(a, options); commentsrel; shstrings;

    @styles = Styles.new()
    File.open(@destination + '/xl/styles.xml', 'r') do |f|
      Ox.sax_parse(@styles, f)
    end

    @styles.temparray.sort_by!{|st| st[:numFmtId].to_i}
    add_custom_formats @styles.temparray
    @styles.styleary.map!{|s| Numformats::Formatarray[s.to_i][:id].to_i}
  end

  # Parses sheet data by feeding the output of the Xlsheet SAX parser into the arrays representing the sheets.
  # @param [Hash] options Options that affect the parser.
  def parse(options={})
    @sheets.each do |x|
      if !options[:paginate].nil?
        lines = options[:paginate][0]; page = options[:paginate][1]
        sheet = PagSheet.new(lines, page)
      elsif !options[:cellrange].nil?
        range = options[:cellrange]
        sheet = Cellrange.new(range)
      else
        sheet = Xlsheet.new()
      end

      File.open(@destination+"/xl/#{x[:filename]}", 'r') do |f|
        Ox.sax_parse(sheet, f)
      end
      comments = mkcomments(x[:comments])
      sheet.cellarray.each do |sh|
        sh.numformat = @styles.styleary[sh.style.to_i]
        if sh.type=="s"
          sh.value = @sharedstrings[sh.value.to_i]
        end
        if !comments.nil?
          comm=comments.select {|c| c[:ref]==(sh.xlcoords)}
          if comm.size > 0
            sh.comment=comm[0][:comment]
          end
          comments.delete_if{|c| c[:ref]==(sh.xlcoords)}
        end
      end
      x[:cells] = sheet.cellarray
      x[:mergedcells] = sheet.mergedcells
    end
    matrixto options
  end

  private
  # @private
  # Given the data found in workbook.xml, create a hash and push it to the sheets
  # array.
  #
  # The hash will not be pushed into the array if the sheet name is blacklisted
  # (it appears in the *excluded_sheets* array) or does not appear in the list of
  # included sheets.
  #
  # If *included_sheets* (the array of whitelisted sheets) is *nil*, the hash is added.
  def sheetdata(wb_file, options)
    wb_file.locate("workbook/sheets/*").each do |x|
      @sheetbase[:name] = x[:name]
      @sheetbase[:sheetId] = x[:sheetId]
      @sheetbase[:relationId] = x[:"r:id"]

      relationshipfile=nil
      fname=nil
      unless Dir[@destination + '/xl/_rels'].empty?
        Find.find(@destination + '/xl/_rels') do |path|
          if File.basename(path).split(".").last=='rels'
            g=IO.read(path)
            relationshipfile=Ox::load(g)
          end
        end
      end
      relationshipfile.locate("Relationships/*").each do |rship|
        if rship[:Id] == x[:"r:id"]
          @sheetbase[:filename]=rship[:Target]
        end
      end


      @sheets << @sheetbase
      @sheetbase=Hash.new
    end
    sheetarr=@sheets.map{|i| i[:name]}
    sheet_collection(sheetarr, options)
  end

  # Build the array of working sheets based on the :include and :exclude parameters.
  # @param[sheetarr, options]
  def sheet_collection(sheetarr, options)
    if options[:include].nil?; options[:include]=[]; end
    if options[:include].to_a.size>0
      sheetarr.keep_if{|item| options[:include].to_a.detect{|d| d==item}}
    end
    sheetarr=sheetarr-options[:exclude].to_a
    @sheets.keep_if{|item| sheetarr.detect{|d| d==item[:name]}}
    @sheets.uniq!
  end

  # Build the relationship between sheets and the XML files storing the comments
  # to the actual sheet.
  def commentsrel
   unless Dir[@destination + '/xl/worksheets/_rels'].empty?
    Find.find(@destination + '/xl/worksheets/_rels') do |path|
      if File.basename(path).split(".").last=='rels'
        a=IO.read(path)
        f=Ox::load(a)
        f.locate("Relationships/*").each do |x|
          if x[:Target].include?"comments"
            @sheets.each do |s|
              if "worksheets/" + File.basename(path,".rels")==s[:filename]
                s[:comments]=x[:Target]
              end
            end
          end
        end
      end
    end
   else
     @sheets.each do |s|
       s[:comments]=nil
     end
   end
  end

  # Invokes the Sharedstrings helper class
  def shstrings
    strings = Sharedstrings.new()
    File.open(@destination + '/xl/sharedStrings.xml', 'r') do |f|
      Ox.sax_parse(strings, f)
    end
    @sharedstrings=strings.stringarray
  end

  # Parses the comments related to the actual sheet.
  # @param [String] commentfile
  # @return [Array] a collection of comments relative to the Excel sheet currently processed
  def mkcomments(commentfile)
    unless commentfile.nil?
      comms = Comments.new()
      File.open(@destination + '/xl/'+commentfile.gsub('../', ''), 'r') do |f|
        Ox.sax_parse(comms, f)
      end
      return comms.commarray
    end
  end

  # Returns an array of Matrix objects.
  # For each sheet, matrixto first checks the address (xlcoords) of the
  # last cell in the cellarray, then builds a *nil*-filled Matrix object of
  # size *xlcoords.x, xlcoords.y*.
  #
  # The matrix will then be filled with Cell objects according to their coordinates.
  #
  # If the *copymerge* parameter is *true*, it creates a submatrix (minor)
  # of every mergegroup (based on the mergedcells array relative to the actual
  # sheet), and after the only meaningful cell of the minor is found, it is
  # copied back to the remaining cells of the group. The coordinates (xlcoords)
  # of each copied cell is changed to reflect the actual Excel coordinate.
  #
  # The matrix will replace the array of cells in the actual sheet.
  # @param [Hash] options
  # @return [Matrix] a Matrix object that stores the cell values, and, depending on the copymerge parameter, will copy the merged value
  #  into every merged cell
  def matrixto(options)
    @sheets.each_with_index do |sheet, i|
      if sheet[:cells].empty?
        m=Sheet.build(0,0)
      else
        m=buildsheet(sheet, options)
        if options[:copymerge]==true
          sheet[:mergedcells].each do |mc|
            a = mc.split(':')
            x1=x(a[0])
            y1=y(a[0])
            x2=x(a[1])
            y2=y(a[1])
            mrange=m.minor(y1..y2, x1..x2)
            valuecell=mrange.to_a.flatten.compact[0]
            (x1..x2).each do |col|
              (y1..y2).each do |row|
                m, valuecell = mergevalues(m, col, row, valuecell)
              end
            end
          end
        end
        m.name=@sheets[i][:name]; m.sheetId=@sheets[i][:sheetId]; m.relationId=@sheets[i][:relationId]
        @sheets[i]=m
      end
    end
  end

  # buildsheet creates a matrix of the needed size and fills it with the cells. Mainly for internal use only.
  # When paginating or parsing only a range of cells, the size of the matrix will be adjusted (no nil values
  # will be left at the beginning of the sheet), to preserve memory.
  # @param [Sheet] sheet the actual sheetarray.
  # @param [Hash] options :paginate or :cellrange will affect the size of the matrix
  # @return [Sheet] a Sheet object that stores the cell values.
  def buildsheet(sheet, options)
    ydiff, xdiff = 0,0
    if !options[:paginate].nil?
      ydiff = options[:paginate][0] * (options[:paginate][1]-1)
    elsif !options[:cellrange].nil?
      xdiff = x(options[:cellrange].begin)
      ydiff = y(options[:cellrange].begin)
    end

    m=Sheet.build(sheet[:cells].last.y+1-ydiff, sheet[:cells].last.x+1-xdiff) {nil}
    sheet[:cells].each do |c|
      m[c.y-ydiff, c.x-xdiff] = c
    end
    return m
  end

  # Replace the empty values of the mergegroup with cell values or nil.
  # @param [Matrix] m the Sheet object
  # @param [Integer] col Column of the actual cell
  # @param [Integer] row Row of the actual cell
  # @param [Cell] valuecell A Cell containing the value to be copied over the mergegroup
  # @return [Matrix, Cell] the sheet and the new (empty) cell or nil.
  def mergevalues(m, col, row, valuecell)
    if valuecell != nil
      valuecell.xlcoords=(col.col_name)+(row+1).to_s
      m[row, col]=valuecell
      return m, valuecell
    else
      valuecell=Cell.new
      valuecell.xlcoords=(col.col_name)+(row+1).to_s
      m[row, col]=valuecell
      return m, valuecell
    end
  end
end

Instance Method Details

#buildsheet(sheet, options) ⇒ Sheet (private)

buildsheet creates a matrix of the needed size and fills it with the cells. Mainly for internal use only. When paginating or parsing only a range of cells, the size of the matrix will be adjusted (no nil values will be left at the beginning of the sheet), to preserve memory.


289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
# File 'lib/oxcelix/workbook.rb', line 289

def buildsheet(sheet, options)
  ydiff, xdiff = 0,0
  if !options[:paginate].nil?
    ydiff = options[:paginate][0] * (options[:paginate][1]-1)
  elsif !options[:cellrange].nil?
    xdiff = x(options[:cellrange].begin)
    ydiff = y(options[:cellrange].begin)
  end

  m=Sheet.build(sheet[:cells].last.y+1-ydiff, sheet[:cells].last.x+1-xdiff) {nil}
  sheet[:cells].each do |c|
    m[c.y-ydiff, c.x-xdiff] = c
  end
  return m
end

#commentsrelObject (private)

Build the relationship between sheets and the XML files storing the comments to the actual sheet.


192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
# File 'lib/oxcelix/workbook.rb', line 192

def commentsrel
 unless Dir[@destination + '/xl/worksheets/_rels'].empty?
  Find.find(@destination + '/xl/worksheets/_rels') do |path|
    if File.basename(path).split(".").last=='rels'
      a=IO.read(path)
      f=Ox::load(a)
      f.locate("Relationships/*").each do |x|
        if x[:Target].include?"comments"
          @sheets.each do |s|
            if "worksheets/" + File.basename(path,".rels")==s[:filename]
              s[:comments]=x[:Target]
            end
          end
        end
      end
    end
  end
 else
   @sheets.each do |s|
     s[:comments]=nil
   end
 end
end

#matrixto(options) ⇒ Matrix (private)

Returns an array of Matrix objects. For each sheet, matrixto first checks the address (xlcoords) of the last cell in the cellarray, then builds a nil-filled Matrix object of size *xlcoords.x, xlcoords.y*.

The matrix will then be filled with Cell objects according to their coordinates.

If the copymerge parameter is true, it creates a submatrix (minor) of every mergegroup (based on the mergedcells array relative to the actual sheet), and after the only meaningful cell of the minor is found, it is copied back to the remaining cells of the group. The coordinates (xlcoords) of each copied cell is changed to reflect the actual Excel coordinate.

The matrix will replace the array of cells in the actual sheet.


255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
# File 'lib/oxcelix/workbook.rb', line 255

def matrixto(options)
  @sheets.each_with_index do |sheet, i|
    if sheet[:cells].empty?
      m=Sheet.build(0,0)
    else
      m=buildsheet(sheet, options)
      if options[:copymerge]==true
        sheet[:mergedcells].each do |mc|
          a = mc.split(':')
          x1=x(a[0])
          y1=y(a[0])
          x2=x(a[1])
          y2=y(a[1])
          mrange=m.minor(y1..y2, x1..x2)
          valuecell=mrange.to_a.flatten.compact[0]
          (x1..x2).each do |col|
            (y1..y2).each do |row|
              m, valuecell = mergevalues(m, col, row, valuecell)
            end
          end
        end
      end
      m.name=@sheets[i][:name]; m.sheetId=@sheets[i][:sheetId]; m.relationId=@sheets[i][:relationId]
      @sheets[i]=m
    end
  end
end

#mergevalues(m, col, row, valuecell) ⇒ Matrix, Cell (private)

Replace the empty values of the mergegroup with cell values or nil.


311
312
313
314
315
316
317
318
319
320
321
322
# File 'lib/oxcelix/workbook.rb', line 311

def mergevalues(m, col, row, valuecell)
  if valuecell != nil
    valuecell.xlcoords=(col.col_name)+(row+1).to_s
    m[row, col]=valuecell
    return m, valuecell
  else
    valuecell=Cell.new
    valuecell.xlcoords=(col.col_name)+(row+1).to_s
    m[row, col]=valuecell
    return m, valuecell
  end
end

#mkcomments(commentfile) ⇒ Array (private)

Parses the comments related to the actual sheet.


228
229
230
231
232
233
234
235
236
# File 'lib/oxcelix/workbook.rb', line 228

def mkcomments(commentfile)
  unless commentfile.nil?
    comms = Comments.new()
    File.open(@destination + '/xl/'+commentfile.gsub('../', ''), 'r') do |f|
      Ox.sax_parse(comms, f)
    end
    return comms.commarray
  end
end

#open(options = {}) ⇒ Object

Parses workbook metadata (sheet data, comments, shared strings)


85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
# File 'lib/oxcelix/workbook.rb', line 85

def open(options={})
  f=IO.read(@destination + '/xl/workbook.xml')
  a=Ox::load(f)

  sheetdata(a, options); commentsrel; shstrings;

  @styles = Styles.new()
  File.open(@destination + '/xl/styles.xml', 'r') do |f|
    Ox.sax_parse(@styles, f)
  end

  @styles.temparray.sort_by!{|st| st[:numFmtId].to_i}
  add_custom_formats @styles.temparray
  @styles.styleary.map!{|s| Numformats::Formatarray[s.to_i][:id].to_i}
end

#parse(options = {}) ⇒ Object

Parses sheet data by feeding the output of the Xlsheet SAX parser into the arrays representing the sheets.


103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
# File 'lib/oxcelix/workbook.rb', line 103

def parse(options={})
  @sheets.each do |x|
    if !options[:paginate].nil?
      lines = options[:paginate][0]; page = options[:paginate][1]
      sheet = PagSheet.new(lines, page)
    elsif !options[:cellrange].nil?
      range = options[:cellrange]
      sheet = Cellrange.new(range)
    else
      sheet = Xlsheet.new()
    end

    File.open(@destination+"/xl/#{x[:filename]}", 'r') do |f|
      Ox.sax_parse(sheet, f)
    end
    comments = mkcomments(x[:comments])
    sheet.cellarray.each do |sh|
      sh.numformat = @styles.styleary[sh.style.to_i]
      if sh.type=="s"
        sh.value = @sharedstrings[sh.value.to_i]
      end
      if !comments.nil?
        comm=comments.select {|c| c[:ref]==(sh.xlcoords)}
        if comm.size > 0
          sh.comment=comm[0][:comment]
        end
        comments.delete_if{|c| c[:ref]==(sh.xlcoords)}
      end
    end
    x[:cells] = sheet.cellarray
    x[:mergedcells] = sheet.mergedcells
  end
  matrixto options
end

#sheet_collection(sheetarr, options) ⇒ Object (private)

Build the array of working sheets based on the :include and :exclude parameters. @param[sheetarr, options]


180
181
182
183
184
185
186
187
188
# File 'lib/oxcelix/workbook.rb', line 180

def sheet_collection(sheetarr, options)
  if options[:include].nil?; options[:include]=[]; end
  if options[:include].to_a.size>0
    sheetarr.keep_if{|item| options[:include].to_a.detect{|d| d==item}}
  end
  sheetarr=sheetarr-options[:exclude].to_a
  @sheets.keep_if{|item| sheetarr.detect{|d| d==item[:name]}}
  @sheets.uniq!
end

#sheetdata(wb_file, options) ⇒ Object (private)

Given the data found in workbook.xml, create a hash and push it to the sheets array.

The hash will not be pushed into the array if the sheet name is blacklisted (it appears in the excluded_sheets array) or does not appear in the list of included sheets.

If included_sheets (the array of whitelisted sheets) is nil, the hash is added.


148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
# File 'lib/oxcelix/workbook.rb', line 148

def sheetdata(wb_file, options)
  wb_file.locate("workbook/sheets/*").each do |x|
    @sheetbase[:name] = x[:name]
    @sheetbase[:sheetId] = x[:sheetId]
    @sheetbase[:relationId] = x[:"r:id"]

    relationshipfile=nil
    fname=nil
    unless Dir[@destination + '/xl/_rels'].empty?
      Find.find(@destination + '/xl/_rels') do |path|
        if File.basename(path).split(".").last=='rels'
          g=IO.read(path)
          relationshipfile=Ox::load(g)
        end
      end
    end
    relationshipfile.locate("Relationships/*").each do |rship|
      if rship[:Id] == x[:"r:id"]
        @sheetbase[:filename]=rship[:Target]
      end
    end


    @sheets << @sheetbase
    @sheetbase=Hash.new
  end
  sheetarr=@sheets.map{|i| i[:name]}
  sheet_collection(sheetarr, options)
end

#shstringsObject (private)

Invokes the Sharedstrings helper class


217
218
219
220
221
222
223
# File 'lib/oxcelix/workbook.rb', line 217

def shstrings
  strings = Sharedstrings.new()
  File.open(@destination + '/xl/sharedStrings.xml', 'r') do |f|
    Ox.sax_parse(strings, f)
  end
  @sharedstrings=strings.stringarray
end

#unpack(filename) ⇒ Object

Unzips the excel file to a temporary directory. The directory will be removed at the end of the parsing stage when invoked by initialize, otherwise at exit.


72
73
74
75
76
77
78
79
80
81
# File 'lib/oxcelix/workbook.rb', line 72

def unpack(filename)
  @destination = Dir.mktmpdir
  Zip::File.open(filename){ |zip_file|
    zip_file.each{ |f|
      f_path=File.join(@destination, f.name)
      FileUtils.mkdir_p(File.dirname(f_path))
      zip_file.extract(f, f_path) unless File.exists?(f_path)
    }
  }
end