Class: Loofah::Scrubbers::Whitewash

Inherits:
Loofah::Scrubber
Defined in:
lib/loofah/scrubbers.rb

Overview

scrub!(:whitewash)

+:whitewash+ removes all comments, styling and attributes in
addition to doing markup-fixer-uppery and pruning unsafe tags. I
like to call this "whitewashing", since it's like putting a new
layer of paint on top of the HTML input to make it look nice.

   messy_markup = "ohai! <div id='foo' class='bar' style='margin: 10px'>div with attributes</div>"
   Loofah.html5_fragment(messy_markup).scrub!(:whitewash)
   => "ohai! <div>div with attributes</div>"

One use case for this scrubber is to clean up HTML that was
cut-and-pasted from Microsoft Word into a WYSIWYG editor or a
rich text editor. Microsoft's software is famous for injecting
all kinds of cruft into its HTML output. Who needs that crap?
Certainly not me.

Constant Summary

Constants inherited from Loofah::Scrubber

Loofah::Scrubber::CONTINUE, Loofah::Scrubber::STOP

Instance Attribute Summary

Attributes inherited from Loofah::Scrubber

#block, #direction

Instance Method Summary

Methods inherited from Loofah::Scrubber

#append_attribute, #traverse

Constructor Details

#initialize ⇒ Whitewash

# File 'lib/loofah/scrubbers.rb', line 192

def initialize # rubocop:disable Lint/MissingSuper
  @direction = :top_down
end

Instance Method Details

#scrub(node) ⇒ Object



# File 'lib/loofah/scrubbers.rb', line 196

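def scrub(node)
  case node.type
  when Nokogiri::XML::Node::ELEMENT_NODE
    if HTML5::Scrub.allowed_element?(node.name)
      # Allowed element: strip every attribute, then keep it and descend
      # into its children, but only if it has no namespaces.
      node.attributes.each { |attr| node.remove_attribute(attr.first) }
      return CONTINUE if node.namespaces.empty?
    end
  when Nokogiri::XML::Node::TEXT_NODE, Nokogiri::XML::Node::CDATA_SECTION_NODE
    # Text and CDATA are kept untouched.
    return CONTINUE
  end
  # Everything else (comments, disallowed elements, namespaced elements)
  # is removed together with its subtree; STOP skips the children.
  node.remove
  STOP
end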
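
The interplay between the :top_down direction and the CONTINUE/STOP return values can be illustrated without Loofah or Nokogiri. The following is a minimal sketch under stated assumptions, not the library's implementation: ToyNode, ALLOWED, toy_scrub and toy_traverse are all hypothetical names invented for this example, the allowlist is made up, and the namespace check from #scrub is left out for brevity.

```ruby
# Toy stand-ins only: none of these names come from Loofah or Nokogiri.
CONTINUE = :continue
STOP     = :stop
ALLOWED  = %w[div p b].freeze # hypothetical allowlist for this sketch

class ToyNode
  attr_reader :kind, :value, :attrs, :children
  attr_accessor :parent

  # kind is :element, :text or :comment; value is the tag name or the text.
  def initialize(kind, value, attrs: {}, children: [])
    @kind = kind
    @value = value
    @attrs = attrs
    @children = children
    children.each { |child| child.parent = self }
  end

  # Detach this node (and therefore its whole subtree) from its parent.
  def remove
    parent&.children&.delete_if { |child| child.equal?(self) }
  end

  def to_s
    case kind
    when :text then value
    when :comment then "<!--#{value}-->"
    else
      attr_text = attrs.map { |k, v| " #{k}='#{v}'" }.join
      "<#{value}#{attr_text}>#{children.map(&:to_s).join}</#{value}>"
    end
  end
end

# Same shape as Whitewash#scrub: keep allowed elements (minus attributes)
# and text, remove everything else.
def toy_scrub(node)
  case node.kind
  when :element
    if ALLOWED.include?(node.value)
      node.attrs.clear
      return CONTINUE
    end
  when :text
    return CONTINUE
  end
  node.remove
  STOP
end

# Top-down traversal: scrub the parent first, and skip the children
# entirely when the scrubber answers STOP.
def toy_traverse(node)
  return if toy_scrub(node) == STOP

  # Iterate over a copy, because toy_scrub may remove children as we go.
  node.children.dup.each { |child| toy_traverse(child) }
end

tree = ToyNode.new(:element, "div", attrs: { "id" => "foo" }, children: [
  ToyNode.new(:text, "hi "),
  ToyNode.new(:comment, " note "),
  ToyNode.new(:element, "script", children: [ToyNode.new(:text, "alert(1)")]),
  ToyNode.new(:element, "b", attrs: { "style" => "color: red" },
                             children: [ToyNode.new(:text, "bold")])
])

toy_traverse(tree)
puts tree.to_s # prints "<div>hi <b>bold</b></div>"
```

In this sketch the text inside the removed script element is never visited, because STOP ends the walk for that subtree. That is the practical reason a removing scrubber pairs naturally with a top-down direction: a parent is judged before any work is spent on its children.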