Module: XML

Defined in:
lib/sixarm_ruby_xml_strip.rb

Class Method Summary

Class Method Details

.strip_all(xml_text) ⇒ String

Sanitize dirty XML by removing unprintable characters, Microsoft conditional blocks, comments, and tag attributes, so that an XML parser can handle a dirty document.

This method calls these in order:

- XML.strip_unprintables
- XML.strip_microsoft
- XML.strip_comments
- XML.strip_attributes

Examples:

# This example shows curly braces instead of angle braces because of HTML formatting
s="{foo a=b c=d}{!--comment--}Hello{!-[if bar]}Microsoft{![endif]}World{/foo}"
XML.strip_all(s) => "{foo}HelloWorld{/foo}"


# File 'lib/sixarm_ruby_xml_strip.rb', line 27

def XML.strip_all(xml_text)
  return XML.strip_attributes(XML.strip_comments(XML.strip_microsoft(XML.strip_unprintables(xml_text))))
end
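
The call order described above can be tried with a small self-contained sketch. The four helper methods are copied from the source listings on this page so that the example runs without the gem; the input string and the `wrong_order` variable are illustrative assumptions, not part of the library.

```ruby
# Sketch only: method bodies copied from the listings on this page.
module XML
  def self.strip_unprintables(xml_text)
    xml_text.gsub(/[^[:print:]]/, "")
  end

  def self.strip_microsoft(xml_text)
    xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, "")
  end

  def self.strip_comments(xml_text)
    xml_text.gsub(/<!.*?>/im, "")
  end

  def self.strip_attributes(xml_text)
    xml_text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
  end

  def self.strip_all(xml_text)
    strip_attributes(strip_comments(strip_microsoft(strip_unprintables(xml_text))))
  end
end

# Hypothetical input, written with real angle brackets.
dirty = "<foo a=b c=d><!--comment-->Hello<!--[if bar]>Microsoft<![endif]-->World</foo>"

puts XML.strip_all(dirty)
# prints: <foo>HelloWorld</foo>

# Skipping strip_microsoft leaves the text between the conditional markers,
# because strip_comments removes each <!...> marker on its own.
wrong_order = XML.strip_attributes(XML.strip_comments(dirty))
puts wrong_order
# prints: <foo>HelloMicrosoftWorld</foo>
```

The second call suggests why strip_microsoft appears to run before strip_comments: once the markers are gone, the enclosed text can no longer be identified.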

.strip_attributes(xml_text) ⇒ String

Strip out all attributes from the XML/HTML input string.

Examples:

s="<foo a=b c=d e=f>Hello</foo>"
XML.strip_attributes(s) => "<foo>Hello</foo>"


# File 'lib/sixarm_ruby_xml_strip.rb', line 40

def XML.strip_attributes(xml_text)
  return xml_text.gsub(/<(\/?\w+).*?>/im){"<#{$1}>"}  # delete attributes
end
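
A minimal sketch of how the pattern above behaves on a few edge cases. The method body is copied from the listing; the sample inputs are assumptions chosen for illustration. Because the pattern stops at the first `>`, a self-closing slash is dropped and a `>` inside an attribute value ends the match early.

```ruby
# Sketch only: method body copied from the listing on this page.
module XML
  def self.strip_attributes(xml_text)
    xml_text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
  end
end

# Ordinary case: attributes are removed, the tag name is kept.
puts XML.strip_attributes('<p class="x">Hi</p>')
# prints: <p>Hi</p>

# Self-closing tag: the trailing slash is treated like an attribute.
puts XML.strip_attributes('<br/>')
# prints: <br>

# A ">" inside an attribute value ends the match early.
puts XML.strip_attributes('<a title="x>y">Link</a>')
# prints: <a>y">Link</a>
```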

.strip_comments(xml_text) ⇒ String

Strip out all XML/HTML comments from the XML/HTML input string.

Examples:

# This example shows curly braces instead of angle braces because of HTML formatting
s="Hello{!--comment--}World"
XML.strip_comments(s) => "HelloWorld"


# File 'lib/sixarm_ruby_xml_strip.rb', line 54

def XML.strip_comments(xml_text)
  return xml_text.gsub(/<!.*?>/im,'')  
end
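
A minimal sketch of what the pattern above matches, with the method body copied from the listing and sample inputs that are assumptions for illustration. The pattern matches any `<!` up to the next `>`, so it also removes declarations such as a DOCTYPE, and a comment that contains `>` is only partly removed.

```ruby
# Sketch only: method body copied from the listing on this page.
module XML
  def self.strip_comments(xml_text)
    xml_text.gsub(/<!.*?>/im, "")
  end
end

# Single-line and multi-line comments (the /m flag lets "." match newlines).
puts XML.strip_comments("Hello<!--comment-->World")
# prints: HelloWorld
puts XML.strip_comments("a<!-- line one\nline two -->b")
# prints: ab

# Anything starting with "<!" is removed, including a DOCTYPE.
puts XML.strip_comments("<!DOCTYPE html><p>Hi</p>")
# prints: <p>Hi</p>

# A ">" inside the comment ends the match early and leaves a remainder.
puts XML.strip_comments("x<!-- 1 > 0 -->y")
# prints: x 0 -->y
```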

.strip_microsoft(xml_text) ⇒ String

Strip out all Microsoft proprietary codes from the XML/HTML input string.

Examples:

s="Hello<!-[if foo]>Microsoft<![endif]->World"
XML.strip_microsoft(s) => "HelloWorld"


# File 'lib/sixarm_ruby_xml_strip.rb', line 67

def XML.strip_microsoft(xml_text)
  return xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im,'')
end
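
A minimal sketch of the pattern above applied to conditional blocks, with the method body copied from the listing. The sample inputs are assumptions modeled on the conditional-comment markup that Microsoft tools commonly emit. The pattern accepts any number of dashes around the markers and ignores case; an opening marker with no matching `endif` is left untouched.

```ruby
# Sketch only: method body copied from the listing on this page.
module XML
  def self.strip_microsoft(xml_text)
    xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, "")
  end
end

# Conditional block with double dashes, including everything inside it.
puts XML.strip_microsoft("Hello<!--[if gte mso 9]><xml>Office</xml><![endif]-->World")
# prints: HelloWorld

# No dashes and upper-case keywords also match, because of -* and the /i flag.
puts XML.strip_microsoft("A<![IF !IE]>B<![ENDIF]>C")
# prints: AC

# Without a closing endif marker nothing is removed.
puts XML.strip_microsoft("A<!--[if x]>B")
# prints: A<!--[if x]>B
```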

.strip_unprintables(xml_text) ⇒ String

Strip out all unprintable characters from the XML/HTML input string.

Examples:

s="Hello\XXXWorld" # where XXX is unprintable
XML.strip_unprintables(s) => "HelloWorld"


# File 'lib/sixarm_ruby_xml_strip.rb', line 80

def XML.strip_unprintables(xml_text)
  return xml_text.gsub(/[^[:print:]]/, "")
end
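
A minimal sketch of which characters the pattern above removes, with the method body copied from the listing and sample inputs that are assumptions for illustration. In Ruby, `[[:print:]]` covers visible characters and the space character, so control characters are removed, and so are tabs and newlines.

```ruby
# Sketch only: method body copied from the listing on this page.
module XML
  def self.strip_unprintables(xml_text)
    xml_text.gsub(/[^[:print:]]/, "")
  end
end

# A control character (here the bell character) is removed.
puts XML.strip_unprintables("Hello\u0007World")
# prints: HelloWorld

# Tabs and newlines are not printable in this sense, so they are removed too.
puts XML.strip_unprintables("a\tb\nc")
# prints: abc

# Ordinary spaces are kept.
puts XML.strip_unprintables("a b")
# prints: a b
```

Since line breaks are removed, a multi-line document comes back as a single line, which may matter if whitespace inside elements is significant.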