Simple Spreadsheet Extractor

Authors

Stuart Owen, Finn Bacall

Version

0.15.0

Contact

[email protected]

Licence

BSD (See LICENCE or www.opensource.org/licenses/bsd-license.php)

Copyright

© 2010-2015 The University of Manchester, UK

Synopsis

This is a simple gem that provides a facility to read an XLS or XLSX Excel spreadsheet document and produce an XML representation of its content.

CSV output can also be generated for a single sheet.

Internally it uses Apache POI, via the sister tool github.com/myGrid/simple-spreadsheet-extractor.

It was developed for use within SysMO-DB.

Installation

Java 1.7 (JRE) is required.

gem install simple-spreadsheet-extractor

*Note that Windows is no longer supported* (since version 0.7.2.1)

Usage

  • require 'simple-spreadsheet-extractor'

  • include the module SysMODB::SpreadsheetExtractor

  • pass an IO object to the method spreadsheet_to_xml, which returns the XML for the contents of the spreadsheet. Alternatively, use spreadsheet_to_csv for CSV.

  • if something goes wrong with the extraction, a SysMODB::SpreadsheetExtractionException is raised

  • by default the JVM is allocated 512M of memory; you can override this by passing a string (e.g. "1024M") as the last argument, which is passed to -Xmx in the java command.

e.g.

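# examples/example.rb - takes a path, e.g. ruby example.rb /tmp/spreadsheet.xls
require 'simple-spreadsheet-extractor'

include SysMODB::SpreadsheetExtractor

path = ARGV.first
abort "Usage: ruby example.rb <spreadsheet>" if path.nil?

# Open in binary mode so the spreadsheet bytes are passed through unchanged;
# the block form closes the file when done.
File.open(path, 'rb') do |f|
  begin
    puts spreadsheet_to_xml(f)
  rescue SysMODB::SpreadsheetExtractionException => e
    puts "Something went wrong: #{e.message}"
  end
end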

Formulas are evaluated and the result is placed in the XML produced for that cell; the original formula is included as the formula attribute.
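
As an illustration of reading both pieces of information back, here is a minimal sketch that parses a single cell element with REXML. The helper name value_and_formula is hypothetical and not part of the gem; the cell is copied in shape from the example XML in this README.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of the gem): return [evaluated_value, formula]
# for one <cell> element given as XML text. formula is nil when the cell
# carries no formula attribute.
def value_and_formula(cell_xml)
  cell = REXML::Document.new(cell_xml).root
  [cell.text, cell.attributes['formula']]
end

# A cell in the shape used by the example XML.
cell_xml = '<cell column="1" column_alpha="A" row="4" type="numeric" formula="A1+1">14.0</cell>'
value, formula = value_and_formula(cell_xml)
puts "value=#{value} formula=#{formula}"
```

The element text holds the evaluated result, so a consumer that only needs values can ignore the attribute entirely.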

Row and column indexes start at 1 rather than 0, for consistency with how cells are named in Excel.
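
The relationship between the 1-based column attribute and the column_alpha attribute can be sketched as follows. These two helpers are illustrative only and are not part of the gem; they simply reproduce the mapping visible in the example XML (27 is AA, 53 is BA, 55 is BC).

```ruby
# Illustrative helper: convert a 1-based column index to Excel-style letters.
def column_alpha(index)
  raise ArgumentError, "column index must be an integer >= 1" unless index.is_a?(Integer) && index >= 1

  letters = +""
  while index > 0
    # Shift to 0-based before dividing, because there is no "zero" letter.
    index, remainder = (index - 1).divmod(26)
    letters.prepend((65 + remainder).chr) # 65 is "A"
  end
  letters
end

# Illustrative helper: convert Excel-style letters back to a 1-based index.
def column_index(alpha)
  alpha.upcase.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 64) }
end

puts column_alpha(27)   # AA
puts column_index("BC") # 55
```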

An XSD schema for the XML is available in doc/schema-v1.xsd

The desired spreadsheet extractor jar can be specified by defining SPREADSHEET_EXTRACTOR_JAR_PATH in a config file (e.g. environment.rb)
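
A minimal sketch of such a definition is shown here. It assumes that SPREADSHEET_EXTRACTOR_JAR_PATH is read as a top-level Ruby constant and that the file defining it is loaded before the extractor is used; the jar location is a hypothetical path chosen for illustration.

```ruby
# config/environment.rb (or another file loaded before the extractor runs)
# ASSUMPTION: the gem looks for a top-level constant with this name.
# The path is hypothetical; point it at the jar you actually want to use.
SPREADSHEET_EXTRACTOR_JAR_PATH = File.join(Dir.pwd, 'vendor', 'jars', 'simple-spreadsheet-extractor.jar')
```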

CSV can be generated in a similar way with spreadsheet_to_csv, which also takes an optional sheet number. If the sheet number is omitted, the first sheet is used.

Note that sheet numbers start at 1, and the number can be given as either a string or an integer.

e.g.

puts spreadsheet_to_csv(f,"1")
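
If the sheet number comes from user input, a caller may want to check it before invoking the extractor. The following is a sketch of one way to do that under the rules stated above (default to the first sheet, accept a string or an integer, count from 1). The helper normalize_sheet_number is hypothetical and not part of the gem.

```ruby
# Hypothetical caller-side helper: turn nil, "2" or 2 into a validated
# 1-based sheet number before passing it to spreadsheet_to_csv.
def normalize_sheet_number(sheet = nil)
  return 1 if sheet.nil? # a missing sheet number means the first sheet

  number = Integer(sheet.to_s, 10) # raises ArgumentError for input such as "abc"
  raise ArgumentError, "sheet numbers start at 1" if number < 1

  number
end

puts normalize_sheet_number("1")
puts normalize_sheet_number(3)
```

The result could then be passed on, for example as spreadsheet_to_csv(f, normalize_sheet_number(user_input)).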

Example XML

<?xml version="1.0" encoding="UTF-8"?>
<workbook xmlns="http://www.sysmo-db.org/2010/xml/spreadsheet">
  <named_ranges/>
  <styles>
    <style id="style21">
      <border-top>dotted 1pt</border-top>
      <border-bottom>dotted 1pt</border-bottom>
      <border-left>dotted 1pt</border-left>
      <border-right>dotted 1pt</border-right>
      <background-color>#ffcc99</background-color>
    </style>
    <style id="style22">
      <border-top>dotted 1pt</border-top>
      <border-bottom>dotted 1pt</border-bottom>
      <border-left>dotted 1pt</border-left>
      <border-right>dotted 1pt</border-right>
      <font-weight>bold</font-weight>
      <font-style>italics</font-style>
      <text-decoration>underline</text-decoration>
    </style>
    <style id="style23">
      <color>#2323dc</color>
    </style>
  </styles>
  <sheet name="Sheet1" index="1" hidden="false" very_hidden="false">
    <data_validations/>
    <columns first_column="1" last_column="3">
      <column index="1" column_alpha="A" width="2048"/>
      <column index="2" column_alpha="B" width="2048"/>
      <column index="3" column_alpha="C" width="2048"/>
    </columns>
    <rows first_row="1" last_row="6">
      <row index="1">
        <cell column="1" column_alpha="A" row="1" type="numeric">13.0</cell>
        <cell column="2" column_alpha="B" row="1" type="numeric">654153.0</cell>
        <cell column="27" column_alpha="AA" row="1" type="string">AA</cell>
        <cell column="28" column_alpha="AB" row="1" type="string">AB</cell>
        <cell column="53" column_alpha="BA" row="1" type="string">BA</cell>
        <cell column="54" column_alpha="BB" row="1" type="string">BB</cell>
        <cell column="55" column_alpha="BC" row="1" type="string">BC</cell>
      </row>
      <row index="2">
        <cell column="1" column_alpha="A" row="2" type="numeric" style="style21">547654.0</cell>
      </row>
      <row index="3">
        <cell column="1" column_alpha="A" row="3" type="numeric">45465.0</cell>
      </row>
      <row index="4" height="14.05pt">
        <cell column="1" column_alpha="A" row="4" type="numeric" style="style22" formula="A1+1">14.0</cell>
      </row>
      <row index="6">
        <cell column="1" column_alpha="A" row="6" type="datetime" style="style23" formula="DATE(2009,6,15)">2009-06-15T0:0:0+0100</cell>
      </row>
    </rows>
  </sheet>
  <sheet name="Sheet2" index="2" hidden="false" very_hidden="false">
    <data_validations/>
    <columns first_column="1" last_column="1">
      <column index="1" column_alpha="A" width="2048"/>
    </columns>
    <rows first_row="1" last_row="1"/>
  </sheet>
  <sheet name="Sheet3" index="3" hidden="false" very_hidden="false">
    <columns first_column="1" last_column="1">
      <column index="1" column_alpha="A" width="2048"/>
    </columns>
    <rows first_row="1" last_row="1"/>
  </sheet>
</workbook>
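
XML in this shape can be consumed with any XML parser. As a rough sketch, the following uses REXML to collect cell values per sheet, keyed by Excel-style references built from the column_alpha and row attributes. The method name cells_by_sheet is hypothetical, and the sample document is a shortened version of the example above.

```ruby
require 'rexml/document'

# Hypothetical helper: build { "Sheet1" => { "A1" => "13.0", ... }, ... }
# from the extractor's XML. Element names are compared directly, so the
# default namespace needs no XPath namespace setup.
def cells_by_sheet(xml)
  workbook = REXML::Document.new(xml).root
  result = {}
  workbook.each_element do |sheet|
    next unless sheet.name == 'sheet'

    cells = {}
    sheet.each_recursive do |el|
      next unless el.name == 'cell'

      cells["#{el.attributes['column_alpha']}#{el.attributes['row']}"] = el.text
    end
    result[sheet.attributes['name']] = cells
  end
  result
end

sample = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <workbook xmlns="http://www.sysmo-db.org/2010/xml/spreadsheet">
    <sheet name="Sheet1" index="1" hidden="false" very_hidden="false">
      <rows first_row="1" last_row="2">
        <row index="1">
          <cell column="1" column_alpha="A" row="1" type="numeric">13.0</cell>
        </row>
        <row index="2">
          <cell column="1" column_alpha="A" row="2" type="numeric" style="style21">547654.0</cell>
        </row>
      </rows>
    </sheet>
  </workbook>
XML

cells_by_sheet(sample).each do |sheet, cells|
  puts "#{sheet}: #{cells.keys.join(', ')}"
end
```

Values are returned as strings exactly as they appear in the XML; the type attribute on each cell indicates how a consumer might convert them.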