Class: HTML::Pipeline::Filter

Inherits:
Object
Defined in:
lib/html/pipeline/filter.rb

Overview

Base class for user content HTML filters. Each filter takes an HTML string or Nokogiri::HTML::DocumentFragment, performs modifications and/or writes information to the result hash. Filters must return a DocumentFragment (typically the same instance provided to the call method) or a String with HTML markup.

Example filter that replaces all images with trollface:

class FuuuFilter < HTML::Pipeline::Filter
  def call
    doc.search('img').each do |img|
      img['src'] = "http://paradoxdgn.com/junk/avatars/trollface.jpg"
    end
    doc # return the modified fragment, as filters are required to
  end
end

The context Hash passes options to filters and should not be changed in place. A Result Hash allows filters to make extracted information available to the caller and is mutable.

Common context options:

:base_url   - The site's base URL
:repository - A Repository providing context for the HTML being processed

Each filter may define additional options and output values. See the class docs for more info.
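
The split between the read-only context and the mutable result can be sketched without the gem. The snippet below is a minimal stand-in, not the library itself: MiniFilter is a hypothetical class that copies only the constructor, the class-level call, and base_url from the listings further down this page, while RootLinkFilter and the :rewritten_links result key are invented for illustration. It works on a plain String so that it runs with the standard library alone.

```ruby
# Hypothetical stand-in for HTML::Pipeline::Filter; String input only.
class MiniFilter
  attr_reader :html, :context, :result

  def initialize(html, context = nil, result = nil)
    @html = html
    @context = context || {}
    @result = result || {}
  end

  def self.call(html, context = nil, result = nil)
    new(html, context, result).call
  end

  def base_url
    context[:base_url] || '/'
  end
end

# Hypothetical filter: prefixes root-relative hrefs with the base URL
# (read from context) and records how many links it rewrote (in result).
class RootLinkFilter < MiniFilter
  def call
    count = 0
    rewritten = html.gsub(/href="\/(?!\/)/) do
      count += 1
      "href=\"#{base_url.chomp('/')}/"
    end
    result[:rewritten_links] = count
    rewritten # a String of HTML markup is a valid return value
  end
end

context = { base_url: 'https://example.com/' }.freeze # never mutated
result = {}                                            # filled in by the filter
output = RootLinkFilter.call('<a href="/docs">Docs</a>', context, result)
```

Freezing the context in the caller is one way to enforce the "should not be changed in place" rule; the caller then reads extracted data back out of the result hash it passed in.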

Defined Under Namespace

Classes: InvalidDocumentException

Constructor Details

#initialize(doc, context = nil, result = nil) ⇒ Filter

Returns a new instance of Filter.



# File 'lib/html/pipeline/filter.rb', line 32

def initialize(doc, context = nil, result = nil)
  if doc.is_a?(String)
    @html = doc.to_str
    @doc = nil
  else
    @doc = doc
    @html = nil
  end
  @context = context || {}
  @result = result || {}
  validate
end

Instance Attribute Details

#context ⇒ Object (readonly)

Public: Returns a simple Hash used to pass extra information into filters. Treat it as read-only; information extracted by a filter belongs in the result Hash.



# File 'lib/html/pipeline/filter.rb', line 48

def context
  @context
end

#result ⇒ Object (readonly)

Public: Returns a Hash used to allow filters to pass back information to callers of the various Pipelines. This can be used for #mentioned_users, for example.



# File 'lib/html/pipeline/filter.rb', line 53

def result
  @result
end

Class Method Details

.call(doc, context = nil, result = nil) ⇒ Object

Perform a filter on doc with the given context.

Returns an HTML::Pipeline::DocumentFragment or a String containing HTML markup.



# File 'lib/html/pipeline/filter.rb', line 126

def self.call(doc, context = nil, result = nil)
  new(doc, context, result).call
end

.to_document(input, context = nil) ⇒ Object

Like call but guarantees that a DocumentFragment is returned, even when the last filter returns a String.



# File 'lib/html/pipeline/filter.rb', line 132

def self.to_document(input, context = nil)
  html = call(input, context)
  HTML::Pipeline.parse(html)
end

.to_html(input, context = nil) ⇒ Object

Like call but guarantees that a string of HTML markup is returned.



# File 'lib/html/pipeline/filter.rb', line 138

def self.to_html(input, context = nil)
  output = call(input, context)
  if output.respond_to?(:to_html)
    output.to_html
  else
    output.to_s
  end
end
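
The respond_to?(:to_html) check is what lets to_html accept either return type. A minimal sketch of that dispatch follows, using a hypothetical FakeFragment struct in place of a Nokogiri fragment and a free-standing normalize_to_html helper; neither name exists in the library.

```ruby
# Hypothetical stand-in for a parsed fragment: anything with #to_html.
FakeFragment = Struct.new(:markup) do
  def to_html
    markup
  end
end

# Mirrors the branch in Filter.to_html from the listing above.
def normalize_to_html(output)
  if output.respond_to?(:to_html)
    output.to_html # fragment-like objects serialize themselves
  else
    output.to_s    # Strings (and anything else) fall back to #to_s
  end
end

from_fragment = normalize_to_html(FakeFragment.new('<p>hi</p>'))
from_string   = normalize_to_html('<p>hi</p>')
```

Both calls yield the same String, which is why callers of to_html do not need to know what the filter returned.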

Instance Method Details

#base_url ⇒ Object

The site’s base URL provided in the context hash, or ‘/’ when no base URL was specified.



# File 'lib/html/pipeline/filter.rb', line 98

def base_url
  context[:base_url] || '/'
end

#call ⇒ Object

The main filter entry point. The doc attribute is guaranteed to be a Nokogiri::HTML::DocumentFragment when invoked. Subclasses should modify this document in place or extract information and add it to the result hash.

Raises:

  • (NotImplementedError)


# File 'lib/html/pipeline/filter.rb', line 74

def call
  raise NotImplementedError
end

#current_user ⇒ Object

The User object provided in the context hash, or nil when no user was specified.



# File 'lib/html/pipeline/filter.rb', line 92

def current_user
  context[:current_user]
end

#doc ⇒ Object

The Nokogiri::HTML::DocumentFragment to be manipulated. If the filter was provided a String, it is parsed into a DocumentFragment the first time this method is called.



# File 'lib/html/pipeline/filter.rb', line 58

def doc
  @doc ||= parse_html(html)
end

#has_ancestor?(node, tags) ⇒ Boolean

Helper method for filter subclasses used to determine if any of a node’s ancestors have one of the tag names specified.

node - The Node object to check.
tags - An array of tag name strings to check. These should be lowercase.

Returns true when the node has a matching ancestor, and nil otherwise.

Returns:

  • (Boolean)


# File 'lib/html/pipeline/filter.rb', line 116

def has_ancestor?(node, tags)
  while node = node.parent
    break true if tags.include?(node.name.downcase)
  end
end
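
The loop above walks parent links only, so the starting node is never tested against tags. A small sketch of that behavior, assuming a hypothetical FakeNode struct that stands in for a Nokogiri node and exposes just name and parent:

```ruby
# Hypothetical stand-in for a Nokogiri node: only #name and #parent.
FakeNode = Struct.new(:name, :parent)

# Copied from the listing above so the sketch runs on its own.
def has_ancestor?(node, tags)
  while node = node.parent
    break true if tags.include?(node.name.downcase)
  end
end

pre  = FakeNode.new('PRE', nil)   # upper case on purpose: names are downcased
code = FakeNode.new('code', pre)
text = FakeNode.new('text', code)

inside_pre  = has_ancestor?(text, %w[pre])   # true: <pre> is a grandparent
inside_link = has_ancestor?(text, %w[a])     # nil: the walk ran past the root
self_match  = has_ancestor?(code, %w[code])  # nil: the node itself is not checked
```

Because a while loop that finishes without break evaluates to nil, the miss case is nil rather than false; use the result in a conditional instead of comparing it to false.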

#html ⇒ Object

The String representation of the document. If a DocumentFragment was provided to the Filter, it is serialized into a String when this method is called.



# File 'lib/html/pipeline/filter.rb', line 65

def html
  raise InvalidDocumentException if @html.nil? && @doc.nil?
  @html || doc.to_html
end

#needs(*keys) ⇒ Object

Validator for required context. Checks that every key passed in keys is present in the context hash.

If any keys are missing, an ArgumentError is raised with a message naming the filter class and listing all of the missing keys.



# File 'lib/html/pipeline/filter.rb', line 153

def needs(*keys)
  missing = keys.reject { |key| context.include? key }

  if missing.any?
    raise ArgumentError,
          "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join ', '}"
  end
end
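
Since the constructor calls validate, a subclass can override validate to call needs and so fail fast at construction time. The sketch below assumes that pairing; MiniFilter is a hypothetical stand-in reduced from the listings on this page, and AssetFilter and the :asset_root key are invented for illustration.

```ruby
# Hypothetical stand-in for HTML::Pipeline::Filter: the constructor,
# validate hook, and needs are reduced from the listings on this page.
class MiniFilter
  attr_reader :context

  def initialize(html, context = nil)
    @html = html
    @context = context || {}
    validate
  end

  # No-op by default, as in the base class.
  def validate; end

  def needs(*keys)
    missing = keys.reject { |key| context.include? key }

    if missing.any?
      raise ArgumentError,
            "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join ', '}"
    end
  end
end

# Hypothetical filter that cannot work without two context options.
class AssetFilter < MiniFilter
  def validate
    needs :asset_root, :base_url
  end
end

# :base_url is supplied, :asset_root is not, so construction raises.
message =
  begin
    AssetFilter.new('<p>hi</p>', { base_url: '/' })
    nil
  rescue ArgumentError => e
    e.message
  end

# With both keys present the filter is built normally.
ok = AssetFilter.new('<p>hi</p>', { base_url: '/', asset_root: '/assets' })
```

Here message names only :asset_root, because needs reports the keys that are absent rather than every key it was asked about.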

#parse_html(html) ⇒ Object

Ensure the passed argument is a DocumentFragment. When a string is provided, it is parsed and returned; otherwise, the DocumentFragment is returned unmodified.



# File 'lib/html/pipeline/filter.rb', line 105

def parse_html(html)
  HTML::Pipeline.parse(html)
end

#repository ⇒ Object

The Repository object provided in the context hash, or nil when no :repository was specified.

It’s assumed that the repository context has already been checked for permissions.



# File 'lib/html/pipeline/filter.rb', line 86

def repository
  context[:repository]
end

#validate ⇒ Object

Make sure the context has everything we need. No-op by default; subclasses can override.



# File 'lib/html/pipeline/filter.rb', line 79

def validate; end