Class: HTML::Pipeline::SanitizationFilter

Inherits:
Filter
  • Object
Defined in:
lib/html/pipeline/sanitization_filter.rb

Overview

HTML filter with sanitization routines and whitelists. This class defines what HTML is allowed in user-provided content and fixes up issues such as unbalanced tags.

See the Sanitize docs for more information on the underlying library:

github.com/rgrove/sanitize/#readme

Context options:

:whitelist - The sanitizer whitelist configuration to use. This can be one
             of the options constants defined in this class or a custom
             sanitize options hash.

This filter does not write additional information to the context.
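
As an illustration of the :whitelist option, a custom options hash can be derived from an existing one and handed to the filter through the context. The sketch below is only an assumption-laden example: BASE_WHITELIST is a trimmed, hypothetical stand-in for one of the constants in this class (so the snippet runs without the gem), and the extra 'rel' attribute is chosen purely for demonstration.

```ruby
# Trimmed stand-in for SanitizationFilter::WHITELIST (hypothetical; the real
# constant has the same shape but many more entries).
BASE_WHITELIST = {
  :elements   => %w(a b em p),
  :attributes => { 'a' => ['href', 'name'] },
  :protocols  => { 'a' => { 'href' => ['http', 'https', :relative] } }
}.freeze

# Custom options hash: same elements and protocols, but `rel` is also allowed
# on links. Hash#merge is shallow, so the nested :attributes hash is rebuilt
# explicitly rather than mutated in place.
CUSTOM_WHITELIST = BASE_WHITELIST.merge(
  :attributes => BASE_WHITELIST[:attributes].merge('a' => ['href', 'name', 'rel'])
)

# The custom hash travels to the filter in the context under :whitelist.
CONTEXT = { :whitelist => CUSTOM_WHITELIST }
```

Building a new hash instead of editing the base one keeps the default configuration unchanged for other pipelines that share it.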

Constant Summary

WHITELIST =

The main sanitization whitelist. Only these elements and attributes are allowed through by default.

{
  :elements => %w(a abbr b blockquote br cite code dd del dfn div dl dt em
                  h1 h2 h3 h4 h5 h6 hr i img ins kbd li mark meter ol p pre
                  q s samp small source span strong sub sup table tbody td
                  tfooter th thead tr time ul var video wbr),
  :remove_contents => ['script'],
  :attributes => {
    :all         => ['data-after', 'data-id', 'id', 'title', 'class'],
    'a'          => ['href', 'name'],
    'blockquote' => ['cite'],
    'img'        => ['alt', 'height', 'src', 'width'],
    'q'          => ['cite'],
    'source'     => ['src', 'type', 'media'],
    'time'       => ['datetime'],
    'video'      => ['src', 'controls']
  },
  :protocols => {
    'a'          => {'href' => ['ftp', 'http', 'https', 'irc', 'mailto', 'xmpp', 'ed2k', 'magnet', 'tel', :relative]},
    'blockquote' => {'cite' => ['http', 'https', :relative]},
    'img'        => {'src'  => ['http', 'https', :relative]},
    'q'          => {'cite' => ['http', 'https', :relative]}
  }
}
LIMITED =

A more limited sanitization whitelist. It includes all attributes, protocols, and transformers from WHITELIST, but replaces the element list with a more locked-down set of allowed elements.

WHITELIST.merge(
  elements: %w[b i strong em a pre code img ins del sup sub p ol ul li]
)
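
The way LIMITED reuses WHITELIST follows from how Hash#merge works: only the keys passed to merge are replaced, and everything else is carried over. The sketch below shows this with a trimmed stand-in hash; SAMPLE_WHITELIST and SAMPLE_LIMITED are hypothetical names used only for this example.

```ruby
# Trimmed stand-in for WHITELIST (hypothetical; same structure, fewer entries).
SAMPLE_WHITELIST = {
  :elements   => %w(a b div table),
  :attributes => { 'a' => ['href', 'name'] },
  :protocols  => { 'a' => { 'href' => ['http', 'https'] } }
}.freeze

# `elements:` and `:elements =>` denote the same Symbol key, so merge replaces
# the element list and leaves every other key as it was.
SAMPLE_LIMITED = SAMPLE_WHITELIST.merge(elements: %w[b i a])

# The merge is shallow: the nested hashes are the same objects in both
# configurations, not copies.
SHARED_ATTRIBUTES = SAMPLE_LIMITED[:attributes].equal?(SAMPLE_WHITELIST[:attributes])
```

Because the nested hashes are shared, mutating them through one configuration would also change the other; building a fresh nested hash avoids that.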
FULL =

Strip all HTML tags from the document.

{ elements: [] }.freeze

Instance Attribute Summary

Attributes inherited from Filter

#context, #result

Instance Method Summary

Methods inherited from Filter

#base_url, call, #current_user, #doc, #has_ancestor?, #html, #initialize, #needs, #parse_html, #repository, to_document, to_html, #validate

Constructor Details

This class inherits a constructor from HTML::Pipeline::Filter

Instance Method Details

#call ⇒ Object

Sanitize markup using the Sanitize library.



# File 'lib/html/pipeline/sanitization_filter.rb', line 60

def call
  Sanitize.clean_node!(doc, whitelist)
end

#whitelist ⇒ Object

The whitelist to use when sanitizing. This can be passed to the filter in the context hash; if it is not given, it defaults to the WHITELIST constant above.



# File 'lib/html/pipeline/sanitization_filter.rb', line 66

def whitelist
  context[:whitelist] || WHITELIST
end
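
The fallback in #whitelist can be tried in isolation. The sketch below uses a hypothetical stand-in class (SketchFilter, with a shortened DEFAULT hash) rather than the real filter, and assumes only that the context is a plain Hash.

```ruby
# Minimal stand-in for the filter (hypothetical class, not part of the library).
class SketchFilter
  # Shortened placeholder for the WHITELIST constant.
  DEFAULT = { :elements => %w(a b) }.freeze

  attr_reader :context

  def initialize(context = {})
    @context = context || {}
  end

  # Same lookup as above: prefer the context value, else the default.
  def whitelist
    context[:whitelist] || DEFAULT
  end
end

# No :whitelist in the context: the default is used.
default_choice = SketchFilter.new.whitelist

# An explicit hash in the context takes precedence.
custom = { :elements => [] }
custom_choice = SketchFilter.new(:whitelist => custom).whitelist
```

Since the lookup uses `||`, a context value of nil or false also falls back to the default.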