Module: XML
- Defined in: lib/sixarm_ruby_xml_strip.rb
Class Method Summary
- .strip_all(xml_text) ⇒ String
  Sanitize dirty XML by removing unprintable characters, Microsoft proprietary codes, comments, and attributes, so that an XML parser can handle a dirty document.
- .strip_attributes(xml_text) ⇒ String
  Strip out all attributes from the XML/HTML input string.
- .strip_comments(xml_text) ⇒ String
  Strip out all XML/HTML comments from the XML/HTML input string.
- .strip_microsoft(xml_text) ⇒ String
  Strip out all Microsoft proprietary codes from the XML/HTML input string.
- .strip_unprintables(xml_text) ⇒ String
  Strip out all unprintable characters from the XML/HTML input string.
Class Method Details
.strip_all(xml_text) ⇒ String
Sanitize dirty XML by removing unprintable characters, Microsoft proprietary codes, comments, and attributes, so that an XML parser can handle a dirty document.
This method calls these in order:
- XML.strip_unprintables
- XML.strip_microsoft
- XML.strip_comments
- XML.strip_attributes
# File 'lib/sixarm_ruby_xml_strip.rb', line 27
def XML.strip_all(xml_text)
  return XML.strip_attributes(XML.strip_comments(XML.strip_microsoft(XML.strip_unprintables(xml_text))))
end
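The call order listed above can be illustrated with a small standalone sketch. The module is re-declared here from the listings on this page so the snippet runs without the gem installed; the `dirty` input string is a made-up example, not taken from the library's tests.

```ruby
# Re-declared from the listings on this page so the snippet runs standalone.
module XML
  def self.strip_unprintables(xml_text)
    xml_text.gsub(/[^[:print:]]/, "")
  end

  def self.strip_microsoft(xml_text)
    xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, '')
  end

  def self.strip_comments(xml_text)
    xml_text.gsub(/<!.*?>/im, '')
  end

  def self.strip_attributes(xml_text)
    xml_text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
  end

  # Same nesting as the original: unprintables, microsoft, comments, attributes.
  def self.strip_all(xml_text)
    strip_attributes(strip_comments(strip_microsoft(strip_unprintables(xml_text))))
  end
end

# Hypothetical dirty input: a conditional block, an attribute,
# a comment, a NUL character, and a trailing newline.
dirty = "<!--[if mso]><o:p>x</o:p><![endif]--><p class=\"a\">Hi<!-- c -->\u0000</p>\n"
puts XML.strip_all(dirty)  # => <p>Hi</p>
```

The order appears to matter: if `strip_comments` ran before `strip_microsoft`, its pattern would remove only the `<!--[if mso]>` and `<![endif]-->` markers and leave the enclosed `<o:p>x</o:p>` in the output.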
.strip_attributes(xml_text) ⇒ String
Strip out all attributes from the XML/HTML input string.
# File 'lib/sixarm_ruby_xml_strip.rb', line 40
def XML.strip_attributes(xml_text)
  return xml_text.gsub(/<(\/?\w+).*?>/im){"<#{$1}>"} # delete attributes
end
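A minimal usage sketch, assuming the method behaves exactly as the listing shows. The method is re-declared so the snippet is standalone, and the input strings are invented for illustration.

```ruby
# Re-declared from the listing above so the snippet runs standalone.
module XML
  def self.strip_attributes(xml_text)
    # Keep only the optional slash and the tag name; drop everything up to the next ">".
    xml_text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
  end
end

puts XML.strip_attributes('<a href="x" class="y">link</a>')  # => <a>link</a>
puts XML.strip_attributes('<br/>')                           # => <br>
```

As the second call suggests, the pattern also drops the trailing slash of a self-closing tag, so the result may no longer be well-formed XML when the input contains empty elements.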
.strip_comments(xml_text) ⇒ String
Strip out all XML/HTML comments from the XML/HTML input string.
# File 'lib/sixarm_ruby_xml_strip.rb', line 54
def XML.strip_comments(xml_text)
  return xml_text.gsub(/<!.*?>/im,'')
end
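A short sketch of how this pattern behaves on a few invented inputs. The method is re-declared from the listing so the snippet runs on its own.

```ruby
# Re-declared from the listing above so the snippet runs standalone.
module XML
  def self.strip_comments(xml_text)
    # Remove everything from "<!" up to the first following ">".
    xml_text.gsub(/<!.*?>/im, '')
  end
end

p XML.strip_comments('<p>a<!-- note -->b</p>')   # => "<p>ab</p>"
p XML.strip_comments('<!DOCTYPE html><p>x</p>')  # => "<p>x</p>"
p XML.strip_comments('<!-- a > b -->x')          # => " b -->x"
```

Because the pattern matches any `<!...>` construct, it removes declarations such as `<!DOCTYPE ...>` as well as comments, and a comment whose text contains `>` is only removed up to that character.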
.strip_microsoft(xml_text) ⇒ String
Strip out all Microsoft proprietary codes from the XML/HTML input string.
# File 'lib/sixarm_ruby_xml_strip.rb', line 67
def XML.strip_microsoft(xml_text)
  return xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im,'')
end
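Judging from the pattern, the "proprietary codes" targeted here are conditional blocks of the form `[if ...] ... [endif]`. The sketch below re-declares the method so it runs standalone; the sample markup is hypothetical.

```ruby
# Re-declared from the listing above so the snippet runs standalone.
module XML
  def self.strip_microsoft(xml_text)
    # Remove a conditional block, markers and enclosed content together.
    xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, '')
  end
end

p XML.strip_microsoft('<p>a</p><!--[if gte mso 9]><xml>x</xml><![endif]--><p>b</p>')
# => "<p>a</p><p>b</p>"
p XML.strip_microsoft('<![if !IE]><p>c</p><![endif]>')
# => ""
```

The second call shows that the form without comment dashes is matched too, so content that other browsers would normally render is removed along with the markers.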
.strip_unprintables(xml_text) ⇒ String
Strip out all unprintable characters from the XML/HTML input string.
# File 'lib/sixarm_ruby_xml_strip.rb', line 80
def XML.strip_unprintables(xml_text)
  return xml_text.gsub(/[^[:print:]]/, "")
end
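A brief sketch of what counts as unprintable under this pattern, using a made-up input. The method is re-declared from the listing so the snippet is self-contained.

```ruby
# Re-declared from the listing above so the snippet runs standalone.
module XML
  def self.strip_unprintables(xml_text)
    # Delete every character outside the POSIX [:print:] class.
    xml_text.gsub(/[^[:print:]]/, "")
  end
end

# BEL (\u0007), tab, and newline are all outside [:print:]; the space is kept.
p XML.strip_unprintables("<a>\u0007hi\tthere \n</a>")  # => "<a>hithere </a>"
```

Tabs and newlines are removed along with control characters, so words separated only by that whitespace end up joined in the output.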