Class: HTML::Pipeline::Filter

Inherits:
Object
Defined in:
lib/html/pipeline/filter.rb

Overview

Base class for user content HTML filters. Each filter takes an HTML string or Nokogiri::HTML::DocumentFragment, performs modifications and/or writes information to the result hash. Filters must return a DocumentFragment (typically the same instance provided to the call method) or a String with HTML markup.

Example filter that replaces all images with trollface:

class FuuuFilter < HTML::Pipeline::Filter
  def call
    doc.search('img').each do |img|
      img['src'] = "http://paradoxdgn.com/junk/avatars/trollface.jpg"
    end
    doc # return the modified fragment; #each alone would return the matched NodeSet
  end
end

The context Hash passes options to filters and should not be changed in place. A Result Hash allows filters to make extracted information available to the caller and is mutable.

Common context options:

:base_url   - The site's base URL
:repository - A Repository providing context for the HTML being processed

Each filter may define additional options and output values. See the class docs for more info.
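
The split between the read-only context and the mutable result can be sketched without the gem installed. Everything below is an assumption made for illustration: SketchFilter is a stand-in that only mirrors the constructor and .call shown on this page, MentionSketchFilter and the :mentioned_usernames key are hypothetical, and the filter works on a plain String rather than a parsed fragment.

```ruby
# Stand-in base class mirroring the constructor and .call documented here.
# It is NOT the real HTML::Pipeline::Filter; it only lets the sketch run
# without the html-pipeline gem.
class SketchFilter
  attr_reader :html, :context, :result

  def initialize(html, context = nil, result = nil)
    @html = html
    @context = context || {}
    @result = result || {}
  end

  def self.call(html, context = nil, result = nil)
    new(html, context, result).call
  end
end

# Hypothetical filter: links @mentions and records the usernames it saw.
class MentionSketchFilter < SketchFilter
  def call
    base = context[:base_url] || '/'      # options are only read from the context
    result[:mentioned_usernames] ||= []   # extracted data is written to the result
    html.gsub(/@(\w+)/) do
      result[:mentioned_usernames] << $1
      %(<a href="#{base}#{$1}">@#{$1}</a>)
    end
  end
end

context = { base_url: 'https://example.test/' }.freeze # frozen: never mutated
result = {}
output = MentionSketchFilter.call('Hi @alice and @bob', context, result)
puts output
p result[:mentioned_usernames]
```

Freezing the context in the caller is one way to catch a filter that tries to change it in place; the result Hash is passed in unfrozen so the caller can read what the filter collected.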

Defined Under Namespace

Classes: InvalidDocumentException


Constructor Details

#initialize(doc, context = nil, result = nil) ⇒ Filter



# File 'lib/html/pipeline/filter.rb', line 32

def initialize(doc, context = nil, result = nil)
  if doc.kind_of?(String)
    @html = doc.to_str
    @doc = nil
  else
    @doc = doc
    @html = nil
  end
  @context = context || {}
  @result = result || {}
  validate
end

Instance Attribute Details

#context ⇒ Object (readonly)

Public: Returns a simple Hash used to pass extra information (options) into filters. As noted in the overview, it should not be changed in place; information extracted by a filter is made available to the caller through the result Hash.



# File 'lib/html/pipeline/filter.rb', line 48

def context
  @context
end

#result ⇒ Object (readonly)

Public: Returns a Hash used to allow filters to pass back information to callers of the various Pipelines. This can be used for #mentioned_users, for example.



# File 'lib/html/pipeline/filter.rb', line 53

def result
  @result
end

Class Method Details

.call(doc, context = nil, result = nil) ⇒ Object

Perform a filter on doc with the given context.

Returns a HTML::Pipeline::DocumentFragment or a String containing HTML markup.



# File 'lib/html/pipeline/filter.rb', line 136

def self.call(doc, context = nil, result = nil)
  new(doc, context, result).call
end

.to_document(input, context = nil) ⇒ Object

Like call but guarantees that a DocumentFragment is returned, even when the last filter returns a String.



# File 'lib/html/pipeline/filter.rb', line 142

def self.to_document(input, context = nil)
  html = call(input, context)
  HTML::Pipeline::parse(html)
end

.to_html(input, context = nil) ⇒ Object

Like call but guarantees that a string of HTML markup is returned.



# File 'lib/html/pipeline/filter.rb', line 148

def self.to_html(input, context = nil)
  output = call(input, context)
  if output.respond_to?(:to_html)
    output.to_html
  else
    output.to_s
  end
end
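
The respond_to? check above can be tried in isolation. The following is a stand-in sketch, not the library code: FragmentLike and normalize_to_html are hypothetical names, and FragmentLike imitates only the one method (to_html) that a parsed fragment would provide.

```ruby
# Hypothetical stand-in for a parsed fragment; it only implements to_html.
FragmentLike = Struct.new(:markup) do
  def to_html
    markup
  end
end

# Same branching as .to_html: serialize fragments, stringify everything else.
def normalize_to_html(output)
  if output.respond_to?(:to_html)
    output.to_html
  else
    output.to_s
  end
end

from_fragment = normalize_to_html(FragmentLike.new('<p>a</p>'))
from_string   = normalize_to_html('<p>b</p>')
puts from_fragment
puts from_string
```

Either way the caller ends up with a String, which is the guarantee .to_html adds on top of .call.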

Instance Method Details

#base_url ⇒ Object

The site's base URL provided in the context hash, or '/' when no base URL was specified.



# File 'lib/html/pipeline/filter.rb', line 106

def base_url
  context[:base_url] || '/'
end

#call ⇒ Object

The main filter entry point. The doc attribute is guaranteed to be a Nokogiri::HTML::DocumentFragment when invoked. Subclasses should modify this document in place or extract information and add it to the result hash.

Raises:

  • (NotImplementedError)


# File 'lib/html/pipeline/filter.rb', line 81

def call
  raise NotImplementedError
end

#current_user ⇒ Object

The User object provided in the context hash, or nil when no user was specified.



# File 'lib/html/pipeline/filter.rb', line 100

def current_user
  context[:current_user]
end

#doc ⇒ Object

The Nokogiri::HTML::DocumentFragment to be manipulated. If the filter was provided a String, parse into a DocumentFragment the first time this method is called.



# File 'lib/html/pipeline/filter.rb', line 58

def doc
  @doc ||= parse_html(html)
end

#has_ancestor?(node, tags) ⇒ Boolean

Helper method for filter subclasses used to determine if any of a node's ancestors have one of the tag names specified.

node - The Node object to check.
tags - An array of tag name strings to check. These should be lowercase.

Returns true when the node has a matching ancestor.



# File 'lib/html/pipeline/filter.rb', line 124

def has_ancestor?(node, tags)
  while node = node.parent
    if tags.include?(node.name.downcase)
      break true
    end
  end
end
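
The loop's return values are easy to misread, so here is a small sketch of how it behaves. FakeNode is a hypothetical stand-in exposing only name and parent (the two methods the helper calls); the method body is copied from the source above and defined at top level purely for illustration.

```ruby
# Hypothetical stand-in for a DOM node: just a name and a parent link.
FakeNode = Struct.new(:name, :parent)

def has_ancestor?(node, tags)
  while node = node.parent
    if tags.include?(node.name.downcase)
      break true
    end
  end
end

pre  = FakeNode.new('PRE', nil)    # root of the chain, upper-case on purpose
code = FakeNode.new('code', pre)
text = FakeNode.new('text', code)

inside_pre  = has_ancestor?(text, %w[pre]) # node names are downcased before comparing
inside_link = has_ancestor?(text, %w[a])   # loop runs out of parents
wrong_case  = has_ancestor?(text, %w[PRE]) # tag list itself is not downcased

p inside_pre
p inside_link
p wrong_case
```

Note that a miss yields nil rather than false, because a while loop that finishes without break evaluates to nil. That is still falsy, so the helper works as expected in conditionals.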

#html ⇒ Object

The String representation of the document. If a DocumentFragment was provided to the Filter, it is serialized into a String when this method is called.



# File 'lib/html/pipeline/filter.rb', line 72

def html
  raise InvalidDocumentException if @html.nil? && @doc.nil?
  @html || doc.to_html
end
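
How doc and html convert lazily between the two representations can be sketched with a stand-in parser. LazySketch, FakeFragment, and parse_count are hypothetical names introduced for this illustration; the branching in initialize, doc, and html is copied from the source on this page, while parse_html is replaced by a counter so the number of parses is observable.

```ruby
# Hypothetical stand-in for a parsed fragment.
FakeFragment = Struct.new(:source) do
  def to_html
    source
  end
end

class LazySketch
  class InvalidDocumentException < StandardError; end

  attr_reader :parse_count

  def initialize(doc)
    @parse_count = 0
    if doc.kind_of?(String)
      @html = doc.to_str
      @doc = nil
    else
      @doc = doc
      @html = nil
    end
  end

  # Parses at most once; later calls reuse the memoized fragment.
  def doc
    @doc ||= parse_html(html)
  end

  # Returns the original String, or serializes the fragment on demand.
  def html
    raise InvalidDocumentException if @html.nil? && @doc.nil?
    @html || doc.to_html
  end

  private

  # Stand-in for HTML::Pipeline.parse that counts its invocations.
  def parse_html(html)
    @parse_count += 1
    FakeFragment.new(html)
  end
end

from_string = LazySketch.new('<p>hi</p>')
2.times { from_string.doc }
puts from_string.parse_count

from_fragment = LazySketch.new(FakeFragment.new('<b>x</b>'))
puts from_fragment.html
puts from_fragment.parse_count
```

Passing nil leaves both instance variables unset, which is the case the InvalidDocumentException guard in html covers.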

#needs(*keys) ⇒ Object

Validator for required context. Checks that every key passed in keys exists in the context Hash.

If any keys are missing, an ArgumentError is raised with a message naming the filter class and listing all the missing context keys.



# File 'lib/html/pipeline/filter.rb', line 163

def needs(*keys)
  missing = keys.reject { |key| context.include? key }

  if missing.any?
    raise ArgumentError,
      "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join ', '}"
  end
end
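
A typical use is to call needs from an overridden validate, which the constructor runs. The sketch below assumes that pattern and uses stand-ins throughout: NeedsSketch copies only the relevant methods from this page, and AssetSketchFilter and the :asset_root key are hypothetical.

```ruby
# Stand-in carrying just the constructor hook, validate, and needs.
class NeedsSketch
  attr_reader :context

  def initialize(context = nil)
    @context = context || {}
    validate
  end

  # No-op by default; subclasses override it.
  def validate
  end

  def needs(*keys)
    missing = keys.reject { |key| context.include? key }

    if missing.any?
      raise ArgumentError,
        "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join ', '}"
    end
  end
end

# Hypothetical filter that cannot work without two context keys.
class AssetSketchFilter < NeedsSketch
  def validate
    needs :asset_root, :base_url
  end
end

message = nil
begin
  AssetSketchFilter.new({ base_url: '/' }) # :asset_root is absent
rescue ArgumentError => e
  message = e.message
end
puts message
```

Because the check runs at construction time, a missing key surfaces before any document is parsed or modified.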

#parse_html(html) ⇒ Object

Ensure the passed argument is a DocumentFragment. When a string is provided, it is parsed and returned; otherwise, the DocumentFragment is returned unmodified.



# File 'lib/html/pipeline/filter.rb', line 113

def parse_html(html)
  HTML::Pipeline.parse(html)
end

#repository ⇒ Object

The Repository object provided in the context hash, or nil when no :repository was specified.

It's assumed that the repository context has already been checked for permissions.



# File 'lib/html/pipeline/filter.rb', line 94

def repository
  context[:repository]
end

#search_text_nodes(doc) ⇒ Object

Searches a Nokogiri::HTML::DocumentFragment for text nodes. If no elements are found, a second search without root tags is invoked.



# File 'lib/html/pipeline/filter.rb', line 64

def search_text_nodes(doc)
  nodes = doc.xpath('.//text()')
  nodes.empty? ? doc.xpath('text()') : nodes
end

#validate ⇒ Object

Make sure the context has everything we need. Noop: Subclasses can override.



# File 'lib/html/pipeline/filter.rb', line 86

def validate
end