Module: XML

Defined in: lib/webget_ruby_ramp/xml.rb

Overview

XML extensions

Class Method Summary

.load_attributes(dirpath, xpath) ⇒ Object

Sugar to load attributes from a file.

.load_attributes_hash(dirpath, xpath) ⇒ Object

Sugar to load an attributes hash from a file.

.load_dir(*dirpaths) ⇒ Object

Specify one or more directory patterns and pass each XML file in the matching directories to a block.

.load_elements(dirpath, xpath) ⇒ Object

Sugar to load elements from a file.

.strip_all(xml_text) ⇒ Object

Sanitize dirty XML by removing unprintables, bad tags, comments, and generally anything else that would prevent the XML parser from handling a dirty document.

.strip_attributes(xml_text) ⇒ Object

Strip out all attributes from the xml text’s tags.

.strip_comments(xml_text) ⇒ Object

Strip out all comments from the xml text.

.strip_microsoft(xml_text) ⇒ Object

Strip out all Microsoft proprietary codes.

.strip_unprintables(xml_text) ⇒ Object

Strip out all unprintable characters from the input string.

Class Method Details

.load_attributes(dirpath, xpath) ⇒ `Object`

Sugar to load attributes from a file.

Example

XML.load_attributes('config.xml','userlist/user'){|attributes| pp attributes['first_name'] }

# File 'lib/webget_ruby_ramp/xml.rb', line 56

def XML.load_attributes(dirpath,xpath)
  XML.load_elements(dirpath,xpath){|elem|
    yield elem.attributes
  }
end

.load_attributes_hash(dirpath, xpath) ⇒ `Object`

Sugar to load an attributes hash from a file.

Example

XML.load_attributes_hash('config.xml','userlist/user'){|attributes| pp attributes['first_name'] }

# File 'lib/webget_ruby_ramp/xml.rb', line 67

def XML.load_attributes_hash(dirpath,xpath)
  XML.load_elements(dirpath,xpath){|elem|
    yield elem.attributes.to_hash
  }
end

.load_dir(*dirpaths) ⇒ `Object`

Specify one or more directory patterns and pass each XML file in the matching directories to a block.

See [Dir#glob](www.ruby-doc.org/core/classes/Dir.html#M002347) for pattern details.

Example

XML.load_dir('/tmp/*.xml'){|xml_document|
  #...whatever you want to do with each xml document
}

Example to load XML documents from files whose names begin with “foo” or “bar”

XML.load_dir('/tmp/foo*.xml','/tmp/bar*.xml'){|xml_document|
  #...whatever you want to do with the xml document
}

# File 'lib/webget_ruby_ramp/xml.rb', line 24

def XML.load_dir(*dirpaths)
  dirpaths=[*dirpaths.flatten]
  dirpaths.each do |dirpath|
    Dir[dirpath].sort.each do |filename|
      File.open(filename) do |file|
        doc = REXML::Document.new file
        yield doc
      end #file
    end #dir
  end #each
end
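
To try the glob-and-yield pattern without installing the gem, a minimal sketch follows. It copies the logic of the listing above into a stand-in `XML` module and runs it against a throwaway directory. The file names `a.xml` and `b.xml`, their contents, and the `roots` variable are assumptions made up for the illustration, and the sketch assumes REXML is available to `require`.

```ruby
require 'rexml/document'
require 'tmpdir'

# Stand-in for the gem's module, reduced to the one method shown above.
module XML
  def self.load_dir(*dirpaths)
    dirpaths.flatten.each do |pattern|
      # Dir[] expands the glob; sorting gives a stable processing order.
      Dir[pattern].sort.each do |filename|
        File.open(filename) { |file| yield REXML::Document.new(file) }
      end
    end
  end
end

roots = []
Dir.mktmpdir do |dir|
  # Hypothetical sample files, written only for this demonstration.
  File.write(File.join(dir, 'b.xml'), '<beta/>')
  File.write(File.join(dir, 'a.xml'), '<alpha/>')

  XML.load_dir(File.join(dir, '*.xml')) { |doc| roots << doc.root.name }
end

p roots
```

Because each pattern's matches are sorted before they are opened, `a.xml` is yielded before `b.xml` even though it was written second.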

.load_elements(dirpath, xpath) ⇒ `Object`

Sugar to load elements from a file.

Example

XML.load_elements('config.xml','userlist/user'){|element| pp element.attributes['first_name'] }

# File 'lib/webget_ruby_ramp/xml.rb', line 42

def XML.load_elements(dirpath,xpath)
  XML.load_dir(dirpath){|doc|
    doc.elements.each(xpath){|elem|
      yield elem
    }
  }
end
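
The XPath traversal that `load_elements` applies to each loaded document can be tried on an in-memory string. This is only a sketch: the `userlist` document below is a hypothetical stand-in for `config.xml`, and `first_names` is a name introduced for the illustration.

```ruby
require 'rexml/document'

# Hypothetical document standing in for config.xml.
xml = <<~XML_TEXT
  <userlist>
    <user first_name="Alice" last_name="Adams"/>
    <user first_name="Bob" last_name="Brown"/>
  </userlist>
XML_TEXT

doc = REXML::Document.new(xml)

first_names = []
# Same call that load_elements makes on every document it loads.
doc.elements.each('userlist/user') do |element|
  first_names << element.attributes['first_name']
end

p first_names
```

Each `user` element is yielded in document order, and `element.attributes['first_name']` returns the attribute's string value.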

.strip_all(xml_text) ⇒ `Object`

Sanitize dirty XML by removing unprintables, bad tags, comments, and generally anything else that would prevent the XML parser from handling a dirty document.

Example

# This example shows curly braces instead of angle braces because of HTML formatting
s="{foo a=b c=d}{!--comment--}Hello{!-[if bar]}Microsoft{![endif]}World{/foo}"
XML.strip_all(s) => "{foo}HelloWorld{/foo}"

This method calls these in order:

- XML.strip_unprintables
- XML.strip_microsoft
- XML.strip_comments
- XML.strip_attributes



# File 'lib/webget_ruby_ramp/xml.rb', line 89

def XML.strip_all(xml_text)
  return XML.strip_attributes(XML.strip_comments(XML.strip_microsoft(XML.strip_unprintables(xml_text))))
end
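
The call order can be followed step by step with real angle brackets. The sketch below copies the four regular expressions from the listings on this page into a stand-in `XML` module so that it runs without the gem. The sample string and the `step1` to `step4` and `wrong_order` variables are assumptions added for the illustration.

```ruby
# Stand-in module: the four patterns are copied from the listings on this page.
module XML
  def self.strip_unprintables(xml_text)
    xml_text.gsub(/[^[:print:]]/, '')
  end

  def self.strip_microsoft(xml_text)
    xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, '')
  end

  def self.strip_comments(xml_text)
    xml_text.gsub(/<!.*?>/im, '')
  end

  def self.strip_attributes(xml_text)
    xml_text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
  end
end

dirty = '<foo a=b c=d><!--comment-->Hello<!-[if bar]>Microsoft<![endif]->World</foo>'

step1 = XML.strip_unprintables(dirty) # nothing to remove in this sample
step2 = XML.strip_microsoft(step1)    # drops the [if]...[endif] block and its text
step3 = XML.strip_comments(step2)     # drops <!--comment-->
step4 = XML.strip_attributes(step3)   # drops a=b c=d

puts step2, step3, step4

# Running strip_comments first removes only the two markers and keeps the text.
wrong_order = XML.strip_comments(dirty)
puts wrong_order
```

The order appears to matter: `strip_comments` also matches the `[if ...]` and `[endif]` markers on their own, so running it before `strip_microsoft` would leave the word "Microsoft" in the output.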

.strip_attributes(xml_text) ⇒ `Object`

Strip out all attributes from the xml text’s tags.

Example

s="<foo a=b c=d e=f>Hello</foo>"
XML.strip_attributes(s) => "<foo>Hello</foo>"



# File 'lib/webget_ruby_ramp/xml.rb', line 100

def XML.strip_attributes(xml_text)
  return xml_text.gsub(/<(\/?\w+).*?>/im){"<#{$1}>"}  # delete attributes
end
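
A few edge cases are worth checking before relying on this pattern. The sketch below copies the regular expression from the listing into a stand-in module; the sample inputs and variable names are assumptions chosen for the illustration.

```ruby
# Stand-in module with the pattern copied from the listing above.
module XML
  def self.strip_attributes(xml_text)
    xml_text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
  end
end

# Ordinary case: quoted attributes are removed, the closing tag is kept.
plain = XML.strip_attributes('<foo a="1" b="2">Hello</foo>')

# A self-closing tag loses its trailing slash, because everything between
# the tag name and the next ">" is discarded.
self_closing = XML.strip_attributes('<br/>')

# A ">" inside an attribute value ends the match early.
embedded_gt = XML.strip_attributes('<a title="x>y">text</a>')

puts plain, self_closing, embedded_gt
```

Since the pattern works on text rather than on parsed XML, the last two results are no longer well-formed, which may matter if the output is handed straight to a parser.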

.strip_comments(xml_text) ⇒ `Object`

Strip out all comments from the xml text.

Example

# This example shows curly braces instead of angle braces because of HTML formatting
s="Hello{!--comment--}World"
XML.strip_comments(s) => "HelloWorld"



# File 'lib/webget_ruby_ramp/xml.rb', line 112

def XML.strip_comments(xml_text)
  return xml_text.gsub(/<!.*?>/im,'')  
end
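
The pattern matches anything that starts with `<!` and runs to the next `>`, which is broader than comments alone. The following sketch copies the regular expression from the listing into a stand-in module; the sample inputs and variable names are assumptions for the illustration.

```ruby
# Stand-in module with the pattern copied from the listing above.
module XML
  def self.strip_comments(xml_text)
    xml_text.gsub(/<!.*?>/im, '')
  end
end

# Ordinary comment, written with real angle brackets.
simple = XML.strip_comments('Hello<!--comment-->World')

# The m flag lets "." cross newlines, so multi-line comments are removed too.
multiline = XML.strip_comments("a<!--\nline\n-->b")

# Any "<!...>" construct matches, including a DOCTYPE declaration.
doctype = XML.strip_comments('<!DOCTYPE note><note/>')

# A ">" inside a comment ends the match early and leaves the tail behind.
embedded_gt = XML.strip_comments('a<!-- x > y -->b')

p simple, multiline, doctype, embedded_gt
```

The last case leaves `" y -->b"` behind, so comments containing `>` are only partly removed.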

.strip_microsoft(xml_text) ⇒ `Object`

Strip out all Microsoft proprietary codes.

Example

s="Hello<!-[if foo]>Microsoft<![endif]->World"
XML.strip_microsoft(s) => "HelloWorld"



# File 'lib/webget_ruby_ramp/xml.rb', line 123

def XML.strip_microsoft(xml_text)
  return xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im,'')
end
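
Judging by the regular expression, what gets removed is a conditional block that opens with an `[if ...]` marker and closes with `[endif]`, along with everything between them. The sketch below copies the pattern into a stand-in module. The sample inputs, including the `gte mso 9` condition, are assumptions typical of exported office markup and are not taken from the source.

```ruby
# Stand-in module with the pattern copied from the listing above.
module XML
  def self.strip_microsoft(xml_text)
    xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, '')
  end
end

# Multi-line conditional block with double dashes on both markers.
block = XML.strip_microsoft("Hello<!--[if gte mso 9]>\n<xml><o:p/></xml>\n<![endif]-->World")

# The i flag makes the match case-insensitive; the dashes are optional.
upper = XML.strip_microsoft('A<![IF !IE]>x<![ENDIF]>B')

# An ordinary comment has no [if marker, so it is left alone.
comment = XML.strip_microsoft('A<!-- note -->B')

p block, upper, comment
```

Unlike `strip_comments`, this method removes the text enclosed by the markers as well as the markers themselves.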

.strip_unprintables(xml_text) ⇒ `Object`

Strip out all unprintable characters from the input string.

Example

s="Hello\XXXWorld" # where XXX is unprintable
XML.strip_unprintables(s) => "HelloWorld"



# File 'lib/webget_ruby_ramp/xml.rb', line 134

def XML.strip_unprintables(xml_text)
  return xml_text.gsub(/[^[:print:]]/, "")
end
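
As a concrete instance, the sketch below applies the same `[^[:print:]]` pattern to strings that contain control characters. The sample strings and variable names are assumptions for the illustration, with the bell character `\x07` standing in for "XXX".

```ruby
# Stand-in module with the pattern copied from the listing above.
module XML
  def self.strip_unprintables(xml_text)
    xml_text.gsub(/[^[:print:]]/, '')
  end
end

# \x07 (bell) is a control character and is removed.
bell = XML.strip_unprintables("Hello\x07World")

# Tabs and newlines are control characters too, so they are removed as well.
whitespace = XML.strip_unprintables("a\tb\nc")

# An ordinary space counts as printable and is kept.
spaced = XML.strip_unprintables('two words')

p bell, whitespace, spaced
```

Because tabs and newlines are stripped along with other control characters, multi-line input comes back as a single line.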
