Class: CSVDiff::XMLSource

Inherits:
Source
  • Object
Defined in:
lib/csv-diff/xml_source.rb

Overview

Converts XML content to CSV format, using XPath selectors to identify the rows and field values in an XML document.

Instance Attribute Summary

Attributes inherited from Source

#case_sensitive, #child_field_indexes, #child_fields, #data, #dup_count, #field_names, #index, #key_field_indexes, #key_fields, #line_count, #lines, #parent_field_indexes, #parent_fields, #path, #skip_count, #trim_whitespace, #warnings

Instance Method Summary

Methods inherited from Source

#[], #index_source, #path?, #save_csv, #to_hash

Constructor Details

#initialize(path, options = {}) ⇒ XMLSource

Create a new XMLSource, identified by path. Normally this is a path to the XML document, but any value is fine, as it is just a label to identify this data set.

Parameters:

  • path (String)

    A label for this data set (often a path to the XML document used as the source).

  • options (Hash) (defaults to: {})

    An options hash.

Options Hash (options):

  • :field_names (Array<String>)

    The names of each of the fields in source.

  • :ignore_header (Boolean)

    If true, and :field_names has been specified, then the first row of the file is ignored.

  • :key_field (String)

    The name of the field that uniquely identifies each row.

  • :key_fields (Array<String>)

    The names of the fields that uniquely identify each row.

  • :parent_field (String)

    The name of the field(s) that identify a parent within which sibling order should be checked.

  • :child_field (String)

    The name of the field(s) that uniquely identify a child of a parent.

  • :case_sensitive (Boolean)

    If true (the default), keys are indexed as-is; if false, the index is built in upper-case for case-insensitive comparisons.

  • :include (Hash)

    A hash of field name(s) or index(es) to regular expression(s). Only source rows whose field values satisfy the regular expressions will be indexed and included in the diff process.

  • :exclude (Hash)

    A hash of field name(s) or index(es) to regular expression(s). Source rows with a field value that satisfies the regular expressions will be excluded from the diff process.

  • :context (String)

    A context value from which fields can be populated using a Regexp.
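The :include and :exclude options can be pictured as a row filter. The following is a minimal sketch of that idea in plain Ruby, not the library's implementation: the helper name row_selected? is hypothetical, rows are modelled as hashes keyed by field name, and it assumes that every :include pattern must match while any single :exclude match drops the row.

```ruby
# Hypothetical helper (not part of csv-diff): decide whether a row survives
# the filters. Assumes all include patterns must match and any exclude
# pattern that matches removes the row.
def row_selected?(row, include_filters: {}, exclude_filters: {})
  included = include_filters.all? { |field, re| re.match?(row[field].to_s) }
  excluded = exclude_filters.any? { |field, re| re.match?(row[field].to_s) }
  included && !excluded
end

rows = [
  { 'ID' => '1',  'Status' => 'active' },
  { 'ID' => '2',  'Status' => 'deleted' },
  { 'ID' => 'X9', 'Status' => 'active' }
]

selected = rows.select do |row|
  row_selected?(row,
                include_filters: { 'ID' => /\A\d+\z/ },
                exclude_filters: { 'Status' => /deleted/ })
end
selected_ids = selected.map { |row| row['ID'] }
```

Here only the row with ID 1 is kept: row 2 is dropped by the exclude pattern, and row X9 fails the include pattern.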



# File 'lib/csv-diff/xml_source.rb', line 43

def initialize(path, options = {})
    super(options)
    @path = path
    @context = options[:context]
    @data = []
end

Instance Attribute Details

#context ⇒ Object

Returns the value of attribute context.



# File 'lib/csv-diff/xml_source.rb', line 11

def context
  @context
end

Instance Method Details

#process(source, rec_xpath, field_maps, context = nil) ⇒ Object

Process a source, converting the XML into a table of data, using rec_xpath to identify the nodes that correspond to each record that should appear in the output, and field_maps to populate each field in each row.
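The relationship between rec_xpath and field_maps may be easier to see in a small stand-alone sketch. XMLSource itself works with Nokogiri; the example below uses REXML as a stand-in so that it has no gem dependency, and it is only an illustration of the row/field idea, not the library's code. The sample XML, field names, and variable names are all made up for the example, and only XPath field expressions are shown.

```ruby
require 'rexml/document'

# Sample document (invented for illustration).
xml = <<~XML
  <?xml version="1.0"?>
  <customers>
    <customer><id>1</id><name>Ada</name></customer>
    <customer><id>2</id><name>Grace</name></customer>
  </customers>
XML

doc = REXML::Document.new(xml)

rec_xpath  = '//customer'                       # selects one node per output row
field_maps = { 'ID' => 'id', 'Name' => 'name' } # XPath evaluated relative to each row node

# For every record node, evaluate each field expression in that node's context.
rows = REXML::XPath.match(doc, rec_xpath).map do |node|
  field_maps.values.map { |expr| REXML::XPath.first(node, expr)&.text }
end

header = field_maps.keys
```

The record nodes themselves never appear in the output; they only provide the context in which each field expression is evaluated.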

Parameters:

  • source (String|Array)

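    May be a String containing XML content, an Array of paths to files containing XML content, or a path to a single file. A String is treated as XML content only if it contains an XML declaration (<?xml); any other String is treated as a file path. The method body also accepts an already-parsed Nokogiri::XML::Document.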

  • rec_xpath (String)

    An XPath expression that selects all the items in the XML document that are to be converted into new rows. The returned items are not directly used to populate the fields, but provide a context for the field XPath expressions that populate each field’s content.

  • field_maps (Hash<String, String>)

    A map of field names to expressions that are evaluated in the context of each row node selected by rec_xpath. The field expressions are typically XPath expressions evaluated in the context of the nodes returned by the rec_xpath. Alternatively, a String that is not an XPath expression is used as a literal value for a field, while a Regexp can also be used to pull a value from any context specified in the options hash. The Regexp should include a single grouping, as the value used will be the result in $1 after the match is performed.

  • context (String) (defaults to: nil)

    An optional context for the XML to be processed. The value passed here can be referenced in field map expressions using a Regexp, with the value of the first grouping in the regex being the value returned for the field.
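As a rough illustration of the Regexp case described above, the sketch below pulls a field value out of a context string using the first capture group. The helper name value_from_context and the sample path are hypothetical; this shows the general Ruby technique rather than how XMLSource is implemented internally.

```ruby
# Hypothetical helper: return the first capture group of pattern when it
# matches the context string, or nil when there is no match.
def value_from_context(context, pattern)
  match = pattern.match(context.to_s)
  match && match[1]
end

# Invented context value, e.g. the path a document was loaded from.
context = 'exports/2024-03-15/customers.xml'

run_date = value_from_context(context, %r{exports/(\d{4}-\d{2}-\d{2})/})
missing  = value_from_context(context, /batch-(\d+)/)
```

With this input, run_date is "2024-03-15" and missing is nil, since the second pattern does not occur in the context. In the method source, an explicit context argument takes precedence over the :context option given to the constructor (context || @context).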



# File 'lib/csv-diff/xml_source.rb', line 77

def process(source, rec_xpath, field_maps, context = nil)
    @field_names = field_maps.keys unless @field_names
    case source
    when Nokogiri::XML::Document
        add_data(source, rec_xpath, field_maps, context || @context)
    when /<\?xml/
        doc = Nokogiri::XML(source)
        add_data(doc, rec_xpath, field_maps, context || @context)
    when Array
        source.each{ |f| process_file(f, rec_xpath, field_maps) }
    when String
        process_file(source, rec_xpath, field_maps)
    else
        raise ArgumentError, "Unhandled source type #{source.class.name}"
    end
    @data
end
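The order of the when clauses above matters, because a String holding XML content would also match the later String branch. The sketch below reproduces just that dispatch in plain Ruby so the behaviour can be tried in isolation. The helper name classify_source and the returned symbols are hypothetical, and the Nokogiri::XML::Document branch is omitted so the example needs no gems.

```ruby
# Hypothetical helper mirroring the clause order used in #process
# (minus the Nokogiri::XML::Document branch).
def classify_source(source)
  case source
  when /<\?xml/ then :xml_content # a String containing an XML declaration
  when Array    then :file_list   # each element is handled as a file path
  when String   then :file_path   # any other String is handled as a path
  else
    raise ArgumentError, "Unhandled source type #{source.class.name}"
  end
end

from_content = classify_source('<?xml version="1.0"?><customers/>')
from_list    = classify_source(['a.xml', 'b.xml'])
from_path    = classify_source('data/customers.xml')
no_decl      = classify_source('<customers/>')
```

Note the last case: XML text without an XML declaration falls through to the String branch and would be treated as a file path, so content passed as a String should include the declaration.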