Class: HTML::Pipeline

Inherits:
Object
Defined in:
lib/html/pipeline.rb,
lib/html/pipeline/filter.rb,
lib/html/pipeline/version.rb,
lib/html/pipeline/toc_filter.rb,
lib/html/pipeline/camo_filter.rb,
lib/html/pipeline/text_filter.rb,
lib/html/pipeline/body_content.rb,
lib/html/pipeline/emoji_filter.rb,
lib/html/pipeline/https_filter.rb,
lib/html/pipeline/image_filter.rb,
lib/html/pipeline/textile_filter.rb,
lib/html/pipeline/@mention_filter.rb,
lib/html/pipeline/autolink_filter.rb,
lib/html/pipeline/markdown_filter.rb,
lib/html/pipeline/email_reply_filter.rb,
lib/html/pipeline/sanitization_filter.rb,
lib/html/pipeline/@team_mention_filter.rb,
lib/html/pipeline/absolute_source_filter.rb,
lib/html/pipeline/image_max_width_filter.rb,
lib/html/pipeline/plain_text_input_filter.rb,
lib/html/pipeline/syntax_highlight_filter.rb

Overview

GitHub HTML processing filters and utilities. This module includes a small framework for defining DOM-based content filters and applying them to user-provided content.

See HTML::Pipeline::Filter for information on building filters.

Construct a Pipeline for running multiple HTML filters. A pipeline is created once with one or more filters, and can then be `call`ed with input many times over the course of its lifetime.

filters - Array of Filter objects. Each must respond to call(doc, context, result) and return the modified DocumentFragment or a String containing HTML markup. Filters are performed in the order provided.

default_context - The default context hash. Values specified here are merged with the context passed to each individual pipeline run; values from the run take precedence. Can NOT be nil. Default: empty Hash.

result_class - The default Class of the result object for individual calls. Default: Hash. Protip: Pass in a Struct to get some semblance of type safety.
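
The Struct suggestion can be illustrated without loading the gem. This is a minimal sketch: `PipelineResult` and the `mentioned_usernames` member are hypothetical names chosen for illustration, and only `:output` is a key the pipeline itself writes. Since the pipeline stores its output with `result[:output] = ...`, a Struct works as a result class and rejects keys it does not declare.

```ruby
# Hypothetical result class; :mentioned_usernames is an illustrative member.
PipelineResult = Struct.new(:output, :mentioned_usernames)

result = PipelineResult.new
result[:output] = '<p>hi</p>'              # same access style as a Hash
result[:mentioned_usernames] = ['defunkt']

rejected = begin
  result[:typo_key] = 1                    # not a declared member
  nil
rescue NameError => e
  e.class                                  # Struct raises NameError for unknown members
end
```

With a plain Hash the misspelled key would be stored silently; with the Struct it fails at the point of the write.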

Defined Under Namespace

Classes: AbsoluteSourceFilter, AutolinkFilter, BodyContent, CamoFilter, EmailReplyFilter, EmojiFilter, Filter, HttpsFilter, ImageFilter, ImageMaxWidthFilter, MarkdownFilter, MentionFilter, MissingDependencyError, PlainTextInputFilter, SanitizationFilter, SyntaxHighlightFilter, TableOfContentsFilter, TeamMentionFilter, TextFilter, TextileFilter

Constant Summary

DocumentFragment = Nokogiri::HTML::DocumentFragment

Our DOM implementation.

VERSION = '2.12.0'.freeze


Constructor Details

#initialize(filters, default_context = {}, result_class = nil) ⇒ Pipeline

Returns a new instance of Pipeline

Raises:

  • (ArgumentError)

# File 'lib/html/pipeline.rb', line 90

def initialize(filters, default_context = {}, result_class = nil)
  raise ArgumentError, 'default_context cannot be nil' if default_context.nil?
  @filters = filters.flatten.freeze
  @default_context = default_context.freeze
  @result_class = result_class || Hash
  @instrumentation_service = self.class.default_instrumentation_service
end
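
The two transformations the constructor applies, `flatten.freeze` on the filter list and `freeze` on the default context, can be tried in isolation. This is a sketch under stated assumptions: the symbols stand in for filter classes, and the variable names are illustrative.

```ruby
# Placeholder symbols stand in for filter classes.
text_filters = [:markdown, :autolink]
safety_filters = [:sanitize]

# Nested groups collapse into one flat, frozen list.
filters = [text_filters, safety_filters, :emoji].flatten.freeze

# freeze returns the receiver, so the caller's own hash becomes frozen too.
caller_context = { base_url: 'https://example.com' }
stored = caller_context.freeze
same_object = stored.equal?(caller_context)

write_error = begin
  caller_context[:base_url] = 'changed'    # writes to a frozen Hash fail
  nil
rescue RuntimeError => e                   # FrozenError on Ruby 2.5+, a RuntimeError subclass
  e.class
end
```

One consequence worth noting: a hash passed as `default_context` cannot be mutated by the caller afterwards, so pass a copy if the original must stay writable.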

Class Attribute Details

.default_instrumentation_service ⇒ Object

Public: Default instrumentation service for new pipeline objects.


# File 'lib/html/pipeline.rb', line 87

def default_instrumentation_service
  @default_instrumentation_service
end

Instance Attribute Details

#filters ⇒ Object (readonly)

Public: Returns an Array of Filter objects for this Pipeline.


# File 'lib/html/pipeline.rb', line 72

def filters
  @filters
end

#instrumentation_name ⇒ Object


# File 'lib/html/pipeline.rb', line 80

def instrumentation_name
  return @instrumentation_name if defined?(@instrumentation_name)
  @instrumentation_name = self.class.name
end

#instrumentation_service ⇒ Object

Public: Instrumentation service for the pipeline. Set an ActiveSupport::Notifications compatible object to enable.


# File 'lib/html/pipeline.rb', line 76

def instrumentation_service
  @instrumentation_service
end

Class Method Details

.parse(document_or_html) ⇒ Object

Parse a String into a DocumentFragment object. When a DocumentFragment is provided, return it verbatim.


# File 'lib/html/pipeline.rb', line 62

def self.parse(document_or_html)
  document_or_html ||= ''
  if document_or_html.is_a?(String)
    DocumentFragment.parse(document_or_html)
  else
    document_or_html
  end
end

.require_dependency(name, requirer) ⇒ Object


# File 'lib/html/pipeline.rb', line 50

def self.require_dependency(name, requirer)
  require name
rescue LoadError => e
  raise MissingDependencyError,
        "Missing dependency '#{name}' for #{requirer}. See README.md for details.\n#{e.class.name}: #{e}"
end
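
The rescue-and-re-raise pattern above can be exercised on its own. This is a minimal sketch, not the gem's code path: the error class below is a stand-in for HTML::Pipeline::MissingDependencyError, and both the library name and `DemoFilter` are made-up values assumed not to exist.

```ruby
# Stand-in for HTML::Pipeline::MissingDependencyError.
class MissingDependencyError < RuntimeError; end

# Same shape as the method above: try the require, re-raise with guidance.
def require_dependency(name, requirer)
  require name
rescue LoadError => e
  raise MissingDependencyError,
        "Missing dependency '#{name}' for #{requirer}. See README.md for details.\n#{e.class.name}: #{e}"
end

message = begin
  require_dependency('surely_not_an_installed_gem_xyz', 'DemoFilter')
  nil
rescue MissingDependencyError => e
  e.message
end
```

The re-raised message names both the missing library and the filter that needed it, and keeps the original LoadError text on a second line.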

Instance Method Details

#call(html, context = {}, result = nil) ⇒ Object

Apply all filters in the pipeline to the given HTML.

html - A String containing HTML or a DocumentFragment object.

context - The context hash passed to each filter. See the Filter docs for more info on possible values. This object MUST NOT be modified in place by filters. Use the Result for passing state back.

result - The result Hash passed to each filter for modification. This is where Filters store extracted information from the content.

Returns the result Hash after being filtered by this Pipeline. Contains an :output key with the DocumentFragment or String HTML markup based on the output of the last filter in the pipeline.


# File 'lib/html/pipeline.rb', line 110

def call(html, context = {}, result = nil)
  context = @default_context.merge(context)
  context = context.freeze
  result ||= @result_class.new
  payload = default_payload filters: @filters.map(&:name),
                            context: context, result: result
  instrument 'call_pipeline.html_pipeline', payload do
    result[:output] =
      @filters.inject(html) do |doc, filter|
        perform_filter(filter, doc, context, result)
      end
  end
  result
end
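
The core of `call` is a fold over the filter list with a merged, frozen context and a shared result object. The sketch below reproduces only that mechanic in plain Ruby. `MiniPipeline` is a hypothetical stand-in, not the gem's class: it omits instrumentation, and it uses lambdas as filters, which the real pipeline would not accept as-is because it also calls `name` on each filter.

```ruby
# Simplified stand-in for the pipeline; no instrumentation, no result_class.
class MiniPipeline
  def initialize(filters, default_context = {})
    @filters = filters.flatten.freeze
    @default_context = default_context.freeze
  end

  def call(input, context = {}, result = nil)
    context = @default_context.merge(context).freeze   # per-call keys win
    result ||= {}
    # Each filter receives the previous filter's return value.
    result[:output] = @filters.inject(input) do |doc, filter|
      filter.call(doc, context, result)
    end
    result
  end
end

upcase = ->(doc, _context, _result) { doc.upcase }
wrap = lambda do |doc, context, result|
  result[:wrapped] = true                  # state goes back through the result
  "<#{context[:tag]}>#{doc}</#{context[:tag]}>"
end

pipeline = MiniPipeline.new([upcase, wrap], { tag: 'p' })
first = pipeline.call('hi')
second = pipeline.call('hi', { tag: 'em' })
```

Here the first call uses the default `tag` and the second overrides it for that run only, while the `wrapped` flag shows a filter passing state back through the result rather than through the frozen context.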

#default_payload(payload = {}) ⇒ Object

Internal: Default payload for instrumentation.

Accepts a Hash of additional payload data to be merged.

Returns a Hash.


# File 'lib/html/pipeline.rb', line 183

def default_payload(payload = {})
  { pipeline: instrumentation_name }.merge(payload)
end

#instrument(event, payload = nil) ⇒ Object

Internal: If the `instrumentation_service` object is set, instruments the block; otherwise the block is run without instrumentation.

Returns the result of the provided block.


# File 'lib/html/pipeline.rb', line 170

def instrument(event, payload = nil)
  payload ||= default_payload
  return yield(payload) unless instrumentation_service
  instrumentation_service.instrument event, payload do |payload|
    yield payload
  end
end
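
Any object that responds to `instrument(event, payload)` and yields the payload to a block satisfies what the method above needs from a service. The following is a sketch under that assumption: `RecordingService` is a hypothetical stand-in for an ActiveSupport::Notifications-compatible object, and the free-standing `instrument` function mirrors the control flow above with the service passed explicitly.

```ruby
# Hypothetical service that records each event with its elapsed time.
class RecordingService
  attr_reader :events

  def initialize
    @events = []
  end

  def instrument(event, payload)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    value = yield payload
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @events << [event, payload, elapsed]
    value                                  # hand back the block's result
  end
end

# Same branching as the method above, with the service as a parameter.
def instrument(service, event, payload)
  return yield(payload) unless service
  service.instrument(event, payload) { |p| yield p }
end

service = RecordingService.new
with_service = instrument(service, 'call_filter.html_pipeline', { filter: 'Demo' }) { |p| p[:filter].upcase }
without_service = instrument(nil, 'call_filter.html_pipeline', { filter: 'Demo' }) { |p| p[:filter].downcase }
```

Either way the caller gets the block's return value; the only difference is whether an event is recorded.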

#perform_filter(filter, doc, context, result) ⇒ Object

Internal: Applies a specific filter to the supplied doc.

The filter is instrumented.

Returns the result of the filter.


# File 'lib/html/pipeline.rb', line 130

def perform_filter(filter, doc, context, result)
  payload = default_payload filter: filter.name,
                            context: context, result: result
  instrument 'call_filter.html_pipeline', payload do
    filter.call(doc, context, result)
  end
end

#setup_instrumentation(name = nil, service = nil) ⇒ Object

Public: Set up instrumentation for this pipeline.

Returns nothing.


# File 'lib/html/pipeline.rb', line 160

def setup_instrumentation(name = nil, service = nil)
  self.instrumentation_name = name
  self.instrumentation_service =
    service || self.class.default_instrumentation_service
end

#to_document(input, context = {}, result = nil) ⇒ Object

Like call, but guarantees that the value returned is a DocumentFragment. Pipelines may return a DocumentFragment or a String. Callers that need a DocumentFragment should use this method.


# File 'lib/html/pipeline.rb', line 141

def to_document(input, context = {}, result = nil)
  result = call(input, context, result)
  HTML::Pipeline.parse(result[:output])
end

#to_html(input, context = {}, result = nil) ⇒ Object

Like call, but guarantees that the value returned is a String of HTML markup.


# File 'lib/html/pipeline.rb', line 147

def to_html(input, context = {}, result = nil)
  result = call(input, context, result)
  output = result[:output]
  if output.respond_to?(:to_html)
    output.to_html
  else
    output.to_s
  end
end
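
The final branch relies on duck typing: anything that responds to `to_html` is serialized that way, and everything else falls back to `to_s`. A minimal sketch of that check, where `FakeFragment` is a hypothetical stand-in for a DocumentFragment and `output_to_html` is an illustrative helper name:

```ruby
# Hypothetical stand-in for a DocumentFragment: it only needs to_html.
FakeFragment = Struct.new(:markup) do
  def to_html
    markup
  end
end

# Same check as the method above, isolated from the pipeline.
def output_to_html(output)
  output.respond_to?(:to_html) ? output.to_html : output.to_s
end

from_fragment = output_to_html(FakeFragment.new('<p>hi</p>'))
from_string = output_to_html('<p>hi</p>')   # plain Strings take the to_s branch
```

Both inputs end up as the same String, which is what lets callers ignore whether the last filter returned markup or a DOM object.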