Class: Importu::Sources::XML

Inherits:
Object
Defined in:
lib/importu/sources/xml.rb

Overview

Note:

Requires the nokogiri gem.

Parses XML files as import source data.

Requires an XPath expression to identify which elements represent records. Each matching element becomes a row, with child elements and attributes as fields.

Field Extraction

For each matching element:

  • XML attributes become fields (e.g., `<book id="123">` → `{ "id" => "123" }`)

  • Child element text becomes fields (e.g., `<title>Ruby</title>` → `{ "title" => "Ruby" }`)

Examples:

Basic usage

source = Importu::Sources::XML.new("data.xml", records_xpath: "//book")
source.rows.each { |row| puts row["title"] }

Expected XML format

# data.xml
<library>
  <book id="1">
    <title>The Ruby Way</title>
    <author>Hal Fulton</author>
  </book>
  <book id="2">
    <title>Programming Ruby</title>
    <author>Dave Thomas</author>
  </book>
</library>

Resulting rows

# With records_xpath: "//book"
{ "id" => "1", "title" => "The Ruby Way", "author" => "Hal Fulton" }
{ "id" => "2", "title" => "Programming Ruby", "author" => "Dave Thomas" }

Configure in importer

class BookImporter < Importu::Importer
  source :xml, records_xpath: "//book"
end

Constructor Details

#initialize(infile, records_xpath:) ⇒ XML

Creates a new XML source.

Parameters:

  • infile (String, IO)

    file path or IO object to read from

  • records_xpath (String)

    XPath expression to select record elements

Raises:

  • (Importu::InvalidInput)

    if the document is empty or contains parse errors

# File 'lib/importu/sources/xml.rb', line 54

def initialize(infile, records_xpath:, **)
  @owns_handle = !infile.respond_to?(:readline)
  @infile = @owns_handle ? File.open(infile, "rb") : infile
  @records_xpath = records_xpath

  if reader.root.nil?
    raise Importu::InvalidInput, "Empty document"
  elsif reader.errors.any?
    raise Importu::InvalidInput, reader.errors.join("\n")
  end
rescue StandardError
  close
  raise
end

Instance Method Details

#close ⇒ void

This method returns an undefined value.

Closes the underlying file handle if opened by this source.

Safe to call multiple times. Only closes handles that were opened by this source (not IO objects passed in).



# File 'lib/importu/sources/xml.rb', line 75

def close
  return unless @owns_handle && @infile && !@infile.closed?
  @infile.close
end
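
The ownership rule can be tried in isolation. The following is a minimal sketch, not the library's code path: `OwnedHandleSketch` and `ownership_demo` are hypothetical names, and the class copies only the handle logic shown in the constructor and in #close, with no XML parsing.

```ruby
require "stringio"
require "tempfile"

# Hypothetical stand-in that reproduces only the handle-ownership logic.
class OwnedHandleSketch
  attr_reader :io

  def initialize(infile)
    # Anything that responds to #readline is treated as a caller-owned IO.
    @owns_handle = !infile.respond_to?(:readline)
    @io = @owns_handle ? File.open(infile, "rb") : infile
  end

  def close
    return unless @owns_handle && @io && !@io.closed?
    @io.close
  end
end

def ownership_demo
  # Case 1: an IO object is passed in, so #close leaves it open.
  passed_in = StringIO.new("<library/>")
  OwnedHandleSketch.new(passed_in).close

  # Case 2: a path is passed in, so the sketch opens and later closes the file.
  Tempfile.create("sketch") do |tmp|
    tmp.write("<library/>")
    tmp.flush
    owned = OwnedHandleSketch.new(tmp.path)
    owned.close
    owned.close # a second call is a no-op
    { passed_in_closed: passed_in.closed?, owned_closed: owned.io.closed? }
  end
end
```

In this sketch, a caller who passes an IO object remains responsible for closing it.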

#rows ⇒ Enumerator<Hash>

Returns an enumerator that yields each matching element as a hash.

Attribute names and child element names become hash keys; attribute values and child element text content become the corresponding values.

Returns:

  • (Enumerator<Hash>)

    rows from matching XML elements



# File 'lib/importu/sources/xml.rb', line 85

def rows
  Enumerator.new do |yielder|
    reader.xpath(@records_xpath).each do |xml|
      data = [
        *xml.attribute_nodes.map {|a| [a.node_name, a.content] },
        *xml.elements.map {|e| [e.name, e.content]},
      ].to_h
      yielder.yield(data)
    end
  end
end
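
One consequence of building the row with `to_h` is worth spelling out: when two pairs share a name, the later pair wins. The sketch below shows this with plain arrays instead of parsed XML. `pairs_to_row` is a hypothetical helper, and the sample values are made up for illustration.

```ruby
# Hypothetical helper: merges pairs in the same order #rows does,
# attributes first, then child elements.
def pairs_to_row(attribute_pairs, element_pairs)
  [*attribute_pairs, *element_pairs].to_h
end

row = pairs_to_row(
  [["id", "1"], ["title", "from attribute"]],
  [["title", "The Ruby Way"], ["author", "Hal Fulton"], ["author", "Second Author"]]
)

# A child element overrides an attribute with the same name, and a repeated
# child element keeps only its last occurrence:
# row == { "id" => "1", "title" => "The Ruby Way", "author" => "Second Author" }
```

If records can contain repeated child elements, only the last value survives in the row, so such data may need a narrower XPath or preprocessing.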

#write_errors(summary, only_errors: false) ⇒ Tempfile?

Generates an XML file with error information appended.

Creates a copy of the original data with an “_errors” child element containing any validation errors for each record.

Parameters:

  • summary (Importu::Summary)

    the import summary containing errors

  • only_errors (Boolean) (defaults to: false)

    if true, only include records that had errors

Returns:

  • (Tempfile, nil)

    temp file with error data, or nil if no errors



# File 'lib/importu/sources/xml.rb', line 105

def write_errors(summary, only_errors: false)
  return unless summary.itemized_errors.any?

  @infile.rewind
  writer = Nokogiri::XML(@infile, &:nonet)
  writer.xpath("//_errors").remove

  itemized_errors = summary.itemized_errors
  writer.xpath(@records_xpath).each_with_index do |xml, index|
    if itemized_errors.key?(index)
      node = Nokogiri::XML::Node.new "_errors", writer
      node.content = itemized_errors[index].join(", ")
      xml.add_child(node)
    elsif only_errors
      xml.remove
    end
  end

  Tempfile.new("import").tap do |file|
    file.write(writer)
    file.rewind
  end
end
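
The effect of `only_errors` can be modelled without any XML. The following is a rough sketch under stated assumptions: `annotate_records` is a hypothetical helper, records are plain strings, and the error messages are invented. It mirrors only the selection and joining logic shown above, where errors are keyed by the record's zero-based position among the XPath matches.

```ruby
# Hypothetical model of the record selection in #write_errors.
# itemized_errors maps a record index to an array of error messages.
def annotate_records(records, itemized_errors, only_errors: false)
  annotated = records.each_with_index.map do |record, index|
    if itemized_errors.key?(index)
      # Messages for one record are joined into a single "_errors" value.
      { record: record, _errors: itemized_errors[index].join(", ") }
    elsif !only_errors
      # Clean records are kept unless only_errors is set.
      { record: record }
    end
  end
  annotated.compact
end

SAMPLE_ERRORS = { 1 => ["title is required", "author is too long"] }.freeze

ALL_RECORDS  = annotate_records(%w[book0 book1 book2], SAMPLE_ERRORS)
ONLY_FAILING = annotate_records(%w[book0 book1 book2], SAMPLE_ERRORS, only_errors: true)
```

Here `ALL_RECORDS` keeps all three records and annotates the second, while `ONLY_FAILING` contains just the annotated second record.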