Class: ModsulatorSheet

Inherits:
Object
Defined in:
app/models/modsulator_sheet.rb

Overview

This class provides methods to parse Stanford’s MODS spreadsheets into either an array of hashes or a JSON string.

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(file, filename) ⇒ ModsulatorSheet

Creates a new ModsulatorSheet. When the input is a temporary file, the original filename must be passed separately as the second argument, because its extension determines how the spreadsheet is opened.

Parameters:

  • file (File)

    The input spreadsheet.

  • filename (String)

    The filename of the input spreadsheet.



# File 'app/models/modsulator_sheet.rb', line 15

def initialize(file, filename)
  @file = file
  @filename = filename
end
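
The need for the second argument can be illustrated with Ruby's standard Tempfile. This is a minimal sketch, not code from the application: the name `manifest.xlsx` and the `modsulator` prefix are made up, and the final constructor call is shown only as a comment because the class itself is not loaded here.

```ruby
require 'tempfile'

# Hypothetical name the user gave the uploaded file; not taken from the source.
original_name = 'manifest.xlsx'

tmp = Tempfile.new('modsulator')
begin
  # The temp file's path is generated, so its extension says nothing
  # reliable about the spreadsheet type.
  temp_ext = File.extname(tmp.path)

  # The original name is what carries the usable extension.
  original_ext = File.extname(original_name)

  puts "temp path extension: #{temp_ext.inspect}"
  puts "original extension:  #{original_ext.inspect}"

  # With the application loaded, the call would look like:
  #   sheet = ModsulatorSheet.new(tmp, original_name)
ensure
  tmp.close
  tmp.unlink
end
```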

Instance Attribute Details

#file ⇒ Object (readonly)

Returns the value of attribute file.



# File 'app/models/modsulator_sheet.rb', line 9

def file
  @file
end

#filename ⇒ Object (readonly)

Returns the value of attribute filename.



# File 'app/models/modsulator_sheet.rb', line 9

def filename
  @filename
end

Instance Method Details

#headers ⇒ Object

Returns the headers used in the spreadsheet, taken from the keys of the first data row.



# File 'app/models/modsulator_sheet.rb', line 48

def headers
  rows.first.keys
end

#rows ⇒ Array<Hash>

Loads the input spreadsheet into an array of hashes. The spreadsheet should conform to the Stanford MODS template format, which has three header rows: the first is a kind of “super header”, the second is an intermediate header, and the third names the fields. Data rows start at the fourth row. The result is memoized, so the spreadsheet is parsed only once.

Returns:

  • (Array<Hash>)

    An array with one entry per data row in the spreadsheet. Each entry is a hash, indexed by the spreadsheet headers.



# File 'app/models/modsulator_sheet.rb', line 27

def rows
  # Parse the spreadsheet, finding the header row automatically by looking for "druid" and "sourceId", and leave
  # the header row itself out of the resulting array. Everything preceding the header row is discarded.
  @rows ||= spreadsheet.parse(header_search: ['druid', 'sourceId'], clean: true)
end
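
The header search described in the comment above can be sketched with the standard library alone. This is an illustrative stand-in that does not call Roo, and the sample data (including the `ti1:title` column) is made up; it only shows the idea of locating the header row, discarding what precedes it, and keying each data row by header.

```ruby
require 'csv'

# Made-up sample in the three-header-row layout; not real template content.
data = <<~SHEET
  Identifiers,,Title
  ,,ti1
  druid,sourceId,ti1:title
  aa111bb2222,src:1,First title
  cc333dd4444,src:2,Second title
SHEET

table = CSV.parse(data)

# The header row is the first row that contains every search term.
header_index = table.index { |row| (%w[druid sourceId] - row).empty? }
header = table[header_index]

# Rows before the header are discarded; each later row becomes a hash
# keyed by the header names.
rows = table.drop(header_index + 1).map { |row| header.zip(row).to_h }

puts rows.first.keys.inspect
puts rows.length
```

With rows in this shape, `rows.first.keys` yields the same list that `#headers` is documented to return.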

#spreadsheet ⇒ Roo::CSV, ...

Opens a spreadsheet based on its filename extension.

Returns:

  • (Roo::CSV, Roo::Excel, Roo::Excelx)

    A Roo object, whose type depends on the extension of the given filename.



# File 'app/models/modsulator_sheet.rb', line 37

def spreadsheet
  @spreadsheet ||= case File.extname(@filename)
                   when '.csv' then Roo::Spreadsheet.open(@file, extension: :csv)
                   when '.xls' then Roo::Spreadsheet.open(@file, extension: :xls)
                   when '.xlsx' then Roo::Spreadsheet.open(@file, extension: :xlsx)
                   else fail "Unknown file type: #{@filename}"
                   end
end
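
The extension dispatch can also be read as a lookup table. The sketch below is an assumption-labeled alternative, not the class's implementation: `roo_extension_for` is a hypothetical helper that returns the symbol passed as the `extension:` option, and it raises `ArgumentError` where the original raises a `RuntimeError` via `fail`.

```ruby
# Hypothetical helper, not part of ModsulatorSheet: maps a filename to the
# symbol used for the `extension:` option in the three branches above.
def roo_extension_for(filename)
  known = { '.csv' => :csv, '.xls' => :xls, '.xlsx' => :xlsx }

  # File.extname keeps the original case, so the match is case-sensitive.
  known.fetch(File.extname(filename)) do
    raise ArgumentError, "Unknown file type: #{filename}"
  end
end

puts roo_extension_for('manifest.xlsx').inspect

begin
  roo_extension_for('notes.txt')
rescue ArgumentError => e
  puts e.message
end
```

As in the original `case` expression, the comparison is case-sensitive, so a name such as `MANIFEST.XLSX` would be rejected as an unknown file type.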

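#to_json ⇒ String

Converts the loaded spreadsheet to a JSON string with two keys: filename, holding the base name of the input file, and rows, holding the parsed data rows.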

Returns:

  • (String)

    A JSON string.



# File 'app/models/modsulator_sheet.rb', line 55

def to_json
  json_hash = {}
  json_hash['filename'] = File.basename(filename)
  json_hash['rows'] = rows
  json_hash.to_json
end
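
The shape of the resulting JSON can be seen by building the same hash by hand. This sketch uses made-up stand-ins for `filename` and `rows` rather than a real spreadsheet.

```ruby
require 'json'

# Made-up stand-ins for the values of `filename` and `rows`.
filename = '/tmp/uploads/manifest.xlsx'
rows = [{ 'druid' => 'aa111bb2222', 'sourceId' => 'src:1' }]

# Same structure as in to_json: the directory part of the filename is
# dropped, and the rows are embedded as an array of objects.
payload = { 'filename' => File.basename(filename), 'rows' => rows }
json = payload.to_json

puts json
```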