Class: HTML::Pipeline::Filter

Inherits:
Object
Defined in:
lib/html/pipeline/filter.rb

Overview

Base class for user content HTML filters. Each filter takes an HTML string or Nokogiri::HTML::DocumentFragment, performs modifications and/or writes information to the result hash. Filters must return a DocumentFragment (typically the same instance provided to the call method) or a String with HTML markup.

Example filter that replaces all images with trollface:

class FuuuFilter < HTML::Pipeline::Filter
  def call
    doc.search('img').each do |img|
      img['src'] = "http://paradoxdgn.com/junk/avatars/trollface.jpg"
    end
    doc # return the modified fragment, as filters are required to
  end
end

The context Hash passes options to filters and should not be changed in place. A Result Hash allows filters to make extracted information available to the caller and is mutable.

Common context options:

:base_url   - The site's base URL
:repository - A Repository providing context for the HTML being processed

Each filter may define additional options and output values. See the class docs for more info.

Defined Under Namespace

Classes: InvalidDocumentException

Constructor Details

#initialize(doc, context = nil, result = nil) ⇒ Filter


# File 'lib/html/pipeline/filter.rb', line 34

def initialize(doc, context = nil, result = nil)
  if doc.is_a?(String)
    @html = doc.to_str
    @doc = nil
  else
    @doc = doc
    @html = nil
  end
  @context = context || {}
  @result = result || {}
  validate
end
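
The constructor keeps the caller's result Hash rather than copying it, so anything a filter records is visible to the caller afterwards. The sketch below illustrates this with a stand-in class that copies the constructor logic above. ConstructorSketch and LengthFilter are hypothetical names, not part of the library, and the stand-in omits validate and parsing so that it runs without any gems.

```ruby
# Stand-in that copies the constructor logic shown above. It is NOT the real
# HTML::Pipeline::Filter: validation and HTML parsing are left out.
class ConstructorSketch
  attr_reader :context, :result

  def initialize(doc, context = nil, result = nil)
    if doc.is_a?(String)
      @html = doc.to_str
      @doc = nil
    else
      @doc = doc
      @html = nil
    end
    @context = context || {}
    @result = result || {}
  end
end

# Hypothetical subclass that records the input length in the result hash.
class LengthFilter < ConstructorSketch
  def call
    result[:length] = @html.length
    @html
  end
end

shared = {}
output = LengthFilter.new("<p>hi</p>", { base_url: "/" }, shared).call

# `shared` now holds what the filter recorded, because the constructor kept
# the caller's Hash instead of copying it.
defaults = LengthFilter.new("x")
```

When context or result is omitted, each defaults to a fresh empty Hash, so `defaults.context` and `defaults.result` are both empty here.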

Instance Attribute Details

#context ⇒ Object (readonly)

Public: Returns a simple Hash used to pass extra information (options) into filters. It should not be changed in place; use the result Hash to make extracted information available to the caller.


# File 'lib/html/pipeline/filter.rb', line 50

def context
  @context
end

#result ⇒ Object (readonly)

Public: Returns a Hash used to allow filters to pass back information to callers of the various Pipelines. This can be used for #mentioned_users, for example.


# File 'lib/html/pipeline/filter.rb', line 55

def result
  @result
end

Class Method Details

.call(doc, context = nil, result = nil) ⇒ Object

Perform a filter on doc with the given context.

Returns an HTML::Pipeline::DocumentFragment or a String containing HTML markup.


# File 'lib/html/pipeline/filter.rb', line 128

def self.call(doc, context = nil, result = nil)
  new(doc, context, result).call
end

.to_document(input, context = nil) ⇒ Object

Like call but guarantees that a DocumentFragment is returned, even when the last filter returns a String.


# File 'lib/html/pipeline/filter.rb', line 134

def self.to_document(input, context = nil)
  html = call(input, context)
  HTML::Pipeline.parse(html)
end

.to_html(input, context = nil) ⇒ Object

Like call but guarantees that a string of HTML markup is returned.


# File 'lib/html/pipeline/filter.rb', line 140

def self.to_html(input, context = nil)
  output = call(input, context)
  if output.respond_to?(:to_html)
    output.to_html
  else
    output.to_s
  end
end
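
The normalization step in to_html relies only on duck typing: anything that responds to to_html is serialized through it, and everything else falls back to to_s. A minimal sketch of that dispatch follows. FakeFragment and normalize_to_html are hypothetical stand-ins so the example runs without Nokogiri; they are not library names.

```ruby
# Hypothetical stand-in for a DocumentFragment: it only knows how to
# serialize itself through to_html.
FakeFragment = Struct.new(:markup) do
  def to_html
    markup
  end
end

# Same branch as in to_html above, extracted into a helper for illustration.
def normalize_to_html(output)
  if output.respond_to?(:to_html)
    output.to_html
  else
    output.to_s
  end
end

from_fragment = normalize_to_html(FakeFragment.new("<p>hi</p>"))
from_string   = normalize_to_html("<p>hi</p>")
```

Both calls produce the same String, which is why callers of to_html do not need to know what the filter returned.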

Instance Method Details

#base_url ⇒ Object

The site's base URL provided in the context hash, or '/' when no base URL was specified.


# File 'lib/html/pipeline/filter.rb', line 100

def base_url
  context[:base_url] || '/'
end

#call ⇒ Object

The main filter entry point. The doc attribute is guaranteed to be a Nokogiri::HTML::DocumentFragment when invoked. Subclasses should modify this document in place or extract information and add it to the result hash.

Raises:

  • (NotImplementedError)

# File 'lib/html/pipeline/filter.rb', line 76

def call
  raise NotImplementedError
end

#current_user ⇒ Object

The User object provided in the context hash, or nil when no user was specified.


# File 'lib/html/pipeline/filter.rb', line 94

def current_user
  context[:current_user]
end

#doc ⇒ Object

The Nokogiri::HTML::DocumentFragment to be manipulated. If the filter was provided a String, it is parsed into a DocumentFragment the first time this method is called, and the parsed fragment is reused on later calls.


# File 'lib/html/pipeline/filter.rb', line 60

def doc
  @doc ||= parse_html(html)
end

#has_ancestor?(node, tags) ⇒ Boolean

Helper method for filter subclasses used to determine if any of a node's ancestors have one of the tag names specified.

node - The Node object to check.
tags - An array of tag name strings to check. These should be lowercase.

Returns true when the node has a matching ancestor.


# File 'lib/html/pipeline/filter.rb', line 118

def has_ancestor?(node, tags)
  while node = node.parent
    break true if tags.include?(node.name.downcase)
  end
end
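
To illustrate the ancestor walk, the sketch below runs the same method body against a hypothetical Node struct instead of real Nokogiri nodes, so it needs no gems. The struct and the three sample nodes are assumptions made for the example. Note that the method returns nil, not false, when no ancestor matches.

```ruby
# Hypothetical stand-in for a DOM node: just a tag name and a parent link.
Node = Struct.new(:name, :parent)

# Method body copied from the listing above.
def has_ancestor?(node, tags)
  while node = node.parent
    break true if tags.include?(node.name.downcase)
  end
end

pre  = Node.new("PRE", nil)    # root of the sample chain
code = Node.new("code", pre)
text = Node.new("text", code)

# Node names are downcased before the comparison, so "PRE" matches "pre".
inside_pre = has_ancestor?(text, %w[pre])

# No <a> ancestor: the loop runs off the top of the chain and yields nil.
inside_link = has_ancestor?(text, %w[a])
```

Because only the node names are downcased, the entries in tags must already be lowercase for a match to succeed.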

#html ⇒ Object

The String representation of the document. If a DocumentFragment was provided to the Filter, it is serialized into a String when this method is called.


# File 'lib/html/pipeline/filter.rb', line 67

def html
  raise InvalidDocumentException if @html.nil? && @doc.nil?
  @html || doc.to_html
end

#needs(*keys) ⇒ Object

Validator for required context. Checks that every key passed in keys is present in the context hash.

If any keys are missing, an ArgumentError is raised with a message naming the filter class and listing the missing keys.


# File 'lib/html/pipeline/filter.rb', line 155

def needs(*keys)
  missing = keys.reject { |key| context.include? key }

  if missing.any?
    raise ArgumentError,
          "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join ', '}"
  end
end
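
A typical use of needs is to call it from an overridden validate, which the constructor invokes. The sketch below shows that pattern on a stand-in class that copies needs and the validate hook from this page. NeedsSketch, AssetFilter, and the :asset_root key are hypothetical names chosen for illustration, and the stand-in skips document handling so it runs without any gems.

```ruby
# Stand-in that mirrors the validate hook and needs helper documented here.
# It is NOT the real HTML::Pipeline::Filter.
class NeedsSketch
  attr_reader :context

  def initialize(context = nil)
    @context = context || {}
    validate
  end

  # No-op by default; subclasses override it.
  def validate; end

  def needs(*keys)
    missing = keys.reject { |key| context.include? key }

    if missing.any?
      raise ArgumentError,
            "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join ', '}"
    end
  end
end

# Hypothetical filter that requires two context keys.
class AssetFilter < NeedsSketch
  def validate
    needs :asset_root, :base_url
  end
end

message =
  begin
    AssetFilter.new({ base_url: "/" })
    nil
  rescue ArgumentError => e
    e.message
  end
```

Here only :asset_root is missing, so the error fires at construction time and names that key alone. Supplying both keys lets the constructor finish normally.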

#parse_html(html) ⇒ Object

Ensure the passed argument is a DocumentFragment. When a string is provided, it is parsed and returned; otherwise, the DocumentFragment is returned unmodified.


# File 'lib/html/pipeline/filter.rb', line 107

def parse_html(html)
  HTML::Pipeline.parse(html)
end

#repository ⇒ Object

The Repository object provided in the context hash, or nil when no :repository was specified.

It's assumed that the repository context has already been checked for permissions.


# File 'lib/html/pipeline/filter.rb', line 88

def repository
  context[:repository]
end

#validate ⇒ Object

Make sure the context has everything we need. No-op by default; subclasses can override.


# File 'lib/html/pipeline/filter.rb', line 81

def validate; end