Module: XML
- Defined in: lib/webget_ruby_ramp/xml.rb
Overview
XML extensions: helpers for loading XML files with REXML and for sanitizing dirty XML text.
Class Method Summary
- .load_attributes(dirpath, xpath) ⇒ Object
  Sugar to load attributes from a file.
- .load_attributes_hash(dirpath, xpath) ⇒ Object
  Sugar to load an attributes hash from a file.
- .load_dir(*dirpaths) ⇒ Object
  Specify one or more file glob patterns and pass each matching XML file, parsed as a REXML document, to a block.
- .load_elements(dirpath, xpath) ⇒ Object
  Sugar to load elements from a file.
- .strip_all(xml_text) ⇒ Object
  Sanitize dirty XML by removing unprintable characters, Microsoft conditional blocks, comments, and tag attributes, so that the XML parser can handle a dirty document.
- .strip_attributes(xml_text) ⇒ Object
  Strip out all attributes from the XML text’s tags.
- .strip_comments(xml_text) ⇒ Object
  Strip out all comments from the XML text.
- .strip_microsoft(xml_text) ⇒ Object
  Strip out all Microsoft proprietary conditional blocks.
- .strip_unprintables(xml_text) ⇒ Object
  Strip out all unprintable characters from the input string.
Class Method Details
.load_attributes(dirpath, xpath) ⇒ Object
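Sugar to load attributes from a file: for each element matching xpath in each file matching the dirpath glob pattern, yield the element's attributes (a REXML::Attributes object).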
Example
XML.load_attributes('config.xml','userlist/user'){|attributes| pp attributes['first_name'] }
# File 'lib/webget_ruby_ramp/xml.rb', line 56
def XML.load_attributes(dirpath,xpath)
  XML.load_elements(dirpath,xpath){|elem| yield elem.attributes }
end
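The sugar expands to a short REXML loop. A minimal sketch of that loop written directly against REXML, assuming a hypothetical config.xml whose root is userlist and whose user elements carry a first_name attribute; the file is written to a temporary directory so the snippet runs without the webget_ruby_ramp gem.

```ruby
require 'rexml/document'
require 'tmpdir'

# Hypothetical sample data standing in for config.xml.
SAMPLE_XML = <<~EOS
  <userlist>
    <user first_name="Ada" last_name="Lovelace"/>
    <user first_name="Alan" last_name="Turing"/>
  </userlist>
EOS

first_names = []
Dir.mktmpdir do |dir|
  path = File.join(dir, 'config.xml')
  File.write(path, SAMPLE_XML)

  # Roughly what XML.load_attributes(path, 'userlist/user') does:
  # glob the path, parse each matching file with REXML, then hand each
  # matching element's attributes to the block.
  Dir[path].sort.each do |filename|
    File.open(filename) do |file|
      doc = REXML::Document.new(file)
      doc.elements.each('userlist/user') do |elem|
        first_names << elem.attributes['first_name']
      end
    end
  end
end

p first_names  # => ["Ada", "Alan"]
```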
.load_attributes_hash(dirpath, xpath) ⇒ Object
Sugar to load an attributes hash from a file: like load_attributes, but yields elem.attributes.to_hash for each matching element.
Example
XML.load_attributes_hash('config.xml','userlist/user'){|attributes| pp attributes['first_name'] }
# File 'lib/webget_ruby_ramp/xml.rb', line 67
def XML.load_attributes_hash(dirpath,xpath)
  XML.load_elements(dirpath,xpath){|elem| yield elem.attributes.to_hash }
end
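REXML::Attributes is itself a Hash subclass, and this page does not say exactly what to_hash returns for it. If you need a plain Hash of attribute-name strings to value strings, one way to build it explicitly with REXML alone is sketched below; the sample document is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical sample element.
doc = REXML::Document.new('<userlist><user first_name="Ada" last_name="Lovelace"/></userlist>')
user = doc.elements['userlist/user']  # first element matching the xpath

# Build a plain Hash of attribute name => value strings.
attrs = {}
user.attributes.each_attribute { |attr| attrs[attr.expanded_name] = attr.value }

p attrs.class               # => Hash
p attrs.fetch('first_name') # => "Ada"
```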
.load_dir(*dirpaths) ⇒ Object
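Specify one or more file glob patterns and pass each matching XML file, parsed as a REXML document, to a block.
Each pattern is expanded with Dir[] (equivalent to Dir.glob), and the matching files are processed in sorted order. See [Dir.glob](www.ruby-doc.org/core/classes/Dir.html#M002347) for pattern details.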
Example
XML.load_dir('/tmp/*.xml'){|xml_document|
#...whatever you want to do with each xml document
}
Example: load the XML documents in files whose names begin with “foo” or “bar”
XML.load_dir('/tmp/foo*.xml','/tmp/bar*.xml'){|xml_document|
#...whatever you want to do with the xml document
}
# File 'lib/webget_ruby_ramp/xml.rb', line 24
def XML.load_dir(*dirpaths)
  dirpaths=[*dirpaths.flatten]
  dirpaths.each do |dirpath|
    Dir[dirpath].sort.each do |filename|
      File.open(filename) do |file|
        doc = REXML::Document.new file
        yield doc
      end #file
    end #dir
  end #each
end
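The file-selection rules in load_dir (flatten the arguments, expand each pattern, sort the matches) determine the order in which documents reach the block. A minimal sketch that mirrors just that loop, using a hypothetical helper each_matching_file that yields file names instead of parsed REXML documents; the sample files are written to a temporary directory.

```ruby
require 'tmpdir'

# Hypothetical helper mirroring XML.load_dir's iteration, minus the parsing.
def each_matching_file(*dirpaths)
  dirpaths.flatten.each do |dirpath|
    Dir[dirpath].sort.each { |filename| yield filename }
  end
end

seen = []
overlap = []
Dir.mktmpdir do |dir|
  %w[foo2.xml foo1.xml bar1.xml other.xml].each do |name|
    File.write(File.join(dir, name), '<root/>')
  end

  # An array argument is flattened; patterns run in argument order,
  # and matches are sorted within each pattern.
  each_matching_file([File.join(dir, 'foo*.xml')], File.join(dir, 'bar*.xml')) do |filename|
    seen << File.basename(filename)
  end

  # A file matched by more than one pattern is visited once per pattern.
  each_matching_file(File.join(dir, 'foo1.xml'), File.join(dir, '*1.xml')) do |filename|
    overlap << File.basename(filename)
  end
end

p seen     # => ["foo1.xml", "foo2.xml", "bar1.xml"]
p overlap  # => ["foo1.xml", "bar1.xml", "foo1.xml"]
```

Because there is no de-duplication step, overlapping patterns cause the same document to be parsed and yielded more than once.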
.load_elements(dirpath, xpath) ⇒ Object
Sugar to load elements from a file: for each file matching the dirpath glob pattern, yield each REXML element that matches xpath.
Example
XML.load_elements('config.xml','userlist/user'){|element| pp element.attributes['first_name'] }
# File 'lib/webget_ruby_ramp/xml.rb', line 42
def XML.load_elements(dirpath,xpath)
  XML.load_dir(dirpath){|doc|
    doc.elements.each(xpath){|elem|
      yield elem
    }
  }
end
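Since load_elements passes xpath to doc.elements.each, the path is evaluated relative to the document node. A minimal REXML sketch, assuming a hypothetical document whose root is config rather than userlist, showing which paths select the user elements.

```ruby
require 'rexml/document'

# Hypothetical document whose root element is <config>, not <userlist>.
doc = REXML::Document.new(
  '<config><userlist><user name="ada"/><user name="alan"/></userlist></config>'
)

# A relative path must start at the root element's name.
relative = []
doc.elements.each('userlist/user') { |elem| relative << elem }

rooted = []
doc.elements.each('config/userlist/user') { |elem| rooted << elem }

# A descendant path finds the elements wherever they are.
anywhere = []
doc.elements.each('//user') { |elem| anywhere << elem }

p [relative.size, rooted.size, anywhere.size]  # => [0, 2, 2]
```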
.strip_all(xml_text) ⇒ Object
Sanitize dirty XML by removing unprintable characters, Microsoft conditional blocks, comments, and tag attributes, so that the XML parser can handle a dirty document.
Example
s="<foo a=b c=d><!--comment-->Hello<!-[if bar]>Microsoft<![endif]>World</foo>"
XML.strip_all(s) => "<foo>HelloWorld</foo>"
This method calls these in order:
- XML.strip_unprintables
- XML.strip_microsoft
- XML.strip_comments
- XML.strip_attributes
# File 'lib/webget_ruby_ramp/xml.rb', line 89
def XML.strip_all(xml_text)
  return XML.strip_attributes(XML.strip_comments(XML.strip_microsoft(XML.strip_unprintables(xml_text))))
end
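To try the pipeline without the gem, the four regular expressions shown in the method listings can be copied into a local module. A minimal sketch; the module name XMLStripSketch is hypothetical. It also suggests why the order matters: running the generic comment stripper before strip_microsoft removes only the conditional markers and leaves the Microsoft-only text behind.

```ruby
# Hypothetical local copy of the strip helpers, using the regexes shown on this page.
module XMLStripSketch
  def self.strip_unprintables(text)
    text.gsub(/[^[:print:]]/, '')
  end

  def self.strip_microsoft(text)
    text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, '')
  end

  def self.strip_comments(text)
    text.gsub(/<!.*?>/im, '')
  end

  def self.strip_attributes(text)
    text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
  end

  # Same order as XML.strip_all: unprintables, Microsoft blocks, comments, attributes.
  def self.strip_all(text)
    strip_attributes(strip_comments(strip_microsoft(strip_unprintables(text))))
  end
end

documented = XMLStripSketch.strip_all(
  '<foo a=b c=d><!--comment-->Hello<!-[if bar]>Microsoft<![endif]>World</foo>'
)
p documented  # => "<foo>HelloWorld</foo>"

# Newlines are not printable, so the first step removes them.
multiline = XMLStripSketch.strip_all("<foo a=\"1\">\n<!-- note -->\n<bar>x</bar>\n</foo>")
p multiline  # => "<foo><bar>x</bar></foo>"

# Comments first, Microsoft blocks second: only the markers are removed.
wrong_order = XMLStripSketch.strip_microsoft(
  XMLStripSketch.strip_comments('Hello<!-[if bar]>Microsoft<![endif]>World')
)
p wrong_order  # => "HelloMicrosoftWorld"
```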
.strip_attributes(xml_text) ⇒ Object
Strip out all attributes from the XML text’s tags.
Example
s="<foo a=b c=d e=f>Hello</foo>"
XML.strip_attributes(s) => "<foo>Hello</foo>"
# File 'lib/webget_ruby_ramp/xml.rb', line 100
def XML.strip_attributes(xml_text)
  return xml_text.gsub(/<(\/?\w+).*?>/im){"<#{$1}>"} # delete attributes
end
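The regex keeps only the tag name and drops everything up to the first >. A minimal sketch using a local copy of that regex (not the library method itself) shows one side effect: the slash of a self-closing tag is dropped too, so the output is no longer well-formed XML. Markup that does not start with a word character after < or </, such as a processing instruction, is left alone.

```ruby
# Local copy of the regex shown for XML.strip_attributes (sketch only).
def strip_attributes(xml_text)
  xml_text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
end

p strip_attributes('<foo a=b c=d e=f>Hello</foo>')  # => "<foo>Hello</foo>"

# Self-closing tags lose their "/".
p strip_attributes('<a href="x">t</a><br/>')        # => "<a>t</a><br>"

# "<?" is not followed by a word character, so the declaration survives.
p strip_attributes('<?xml version="1.0"?><r/>')     # => "<?xml version=\"1.0\"?><r>"
```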
.strip_comments(xml_text) ⇒ Object
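Strip out all comments from the XML text. The pattern removes any markup that begins with <! up to the next >, so it also removes declarations such as <!DOCTYPE ...>, and a comment that itself contains > is only partly removed.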
Example
s="Hello<!--comment-->World"
XML.strip_comments(s) => "HelloWorld"
# File 'lib/webget_ruby_ramp/xml.rb', line 112
def XML.strip_comments(xml_text)
  return xml_text.gsub(/<!.*?>/im,'')
end
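A minimal sketch with a local copy of the regex above (not the library method itself), showing the documented case and the broader reach of the /<!.*?>/ pattern: a DOCTYPE declaration disappears as well, and a comment containing > is cut at that character.

```ruby
# Local copy of the regex shown for XML.strip_comments (sketch only).
def strip_comments(xml_text)
  xml_text.gsub(/<!.*?>/im, '')
end

p strip_comments('Hello<!--comment-->World')  # => "HelloWorld"

# Any "<!" construct is removed up to the first ">".
p strip_comments('<!DOCTYPE note><note>Hi<!-- a > b --></note>')
# => "<note>Hi b --></note>"
```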
.strip_microsoft(xml_text) ⇒ Object
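Strip out all Microsoft proprietary conditional blocks: everything from <!-[if ...]> (with any number of hyphens after the <!) through the next <![endif]> (with optional trailing hyphens), including the text between them. Matching is case-insensitive and spans line breaks.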
Example
s="Hello<!-[if foo]>Microsoft<![endif]->World"
XML.strip_microsoft(s) => "HelloWorld"
# File 'lib/webget_ruby_ramp/xml.rb', line 123
def XML.strip_microsoft(xml_text)
  return xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im,'')
end
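A minimal sketch with a local copy of the regex above (not the library method itself), covering the documented example plus the common <!--[if ...]> form spread over several lines and the hyphen-free <![if ...]> form.

```ruby
# Local copy of the regex shown for XML.strip_microsoft (sketch only).
def strip_microsoft(xml_text)
  xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, '')
end

p strip_microsoft('Hello<!-[if foo]>Microsoft<![endif]->World')  # => "HelloWorld"

# Usual conditional-comment form; the /m flag lets "." cross newlines.
p strip_microsoft("a<!--[if IE]>\n<p>IE only</p>\n<![endif]-->b")  # => "ab"

# Form with no hyphens at all.
p strip_microsoft('x<![if !IE]>y<![endif]>z')  # => "xz"
```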
.strip_unprintables(xml_text) ⇒ Object
Strip out all unprintable characters from the input string. Any character outside the POSIX [:print:] class is removed, which includes tabs and newlines.
Example
s="Hello\x07World" # \x07 (BEL) is an unprintable control character
XML.strip_unprintables(s) => "HelloWorld"
# File 'lib/webget_ruby_ramp/xml.rb', line 134
def XML.strip_unprintables(xml_text)
  return xml_text.gsub(/[^[:print:]]/, "")
end
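A minimal sketch with a local copy of the regex above (not the library method itself). Whitespace other than the plain space falls outside [:print:], so tabs and newlines are removed, while printable non-ASCII characters in a UTF-8 string are kept.

```ruby
# Local copy of the regex shown for XML.strip_unprintables (sketch only).
def strip_unprintables(xml_text)
  xml_text.gsub(/[^[:print:]]/, '')
end

p strip_unprintables("Hello\x07World")  # => "HelloWorld"

# Tabs and newlines are removed as well.
p strip_unprintables("Hello\tWorld\n")  # => "HelloWorld"

# Printable non-ASCII characters survive.
p strip_unprintables("caf\u00e9\x07") == "caf\u00e9"  # => true
```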