Class: Burly::Parsers::HtmlParser Private

Inherits:
Burly::Parser
Defined in:
lib/burly/parsers/html_parser.rb

This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.

Constant Summary

SRCSET_ATTRIBUTES_MAP =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

A map of HTML srcset attributes and their associated element names.

{
  "imagesrcset" => ["link"],
  "srcset"      => ["img", "source"],
}.freeze
URL_ATTRIBUTES_MAP =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

A map of HTML URL attributes and their associated element names.

{
  "action"     => ["form"],
  "cite"       => ["blockquote", "del", "ins", "q"],
  "data"       => ["object"],
  "formaction" => ["button", "input"],
  "href"       => ["a", "area", "base", "link"],
  "ping"       => ["a", "area"],
  "poster"     => ["video"],
  "src"        => ["audio", "embed", "iframe", "img", "input", "script", "source", "track", "video"],
}.freeze
ATTRIBUTES_XPATHS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

URL_ATTRIBUTES_MAP.merge(SRCSET_ATTRIBUTES_MAP).flat_map do |attribute, names|
  names.map { |name| ".//#{name} / @#{attribute}" }
end
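
As a rough illustration of what this expression produces, the sketch below rebuilds the list from local, abbreviated copies of the two maps (the `SKETCH_`-prefixed constant names are made up for this example and are not part of Burly). Each resulting string is an XPath expression that selects one attribute on one element name.

```ruby
# Local, abbreviated copies of the documented maps so the sketch runs standalone.
# The SKETCH_ names are hypothetical and exist only for this example.
SKETCH_SRCSET_ATTRIBUTES_MAP = {
  "imagesrcset" => ["link"],
  "srcset"      => ["img", "source"],
}.freeze

SKETCH_URL_ATTRIBUTES_MAP = {
  "action" => ["form"],
  "href"   => ["a", "area", "base", "link"],
}.freeze

# Same expression as ATTRIBUTES_XPATHS, applied to the local copies:
# one XPath per (element name, attribute) pair.
SKETCH_ATTRIBUTES_XPATHS =
  SKETCH_URL_ATTRIBUTES_MAP.merge(SKETCH_SRCSET_ATTRIBUTES_MAP).flat_map do |attribute, names|
    names.map { |name| ".//#{name} / @#{attribute}" }
  end

puts SKETCH_ATTRIBUTES_XPATHS
```

With these abbreviated maps the list contains eight expressions, starting with `.//form / @action` and ending with `.//source / @srcset`.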

Constants inherited from Burly::Parser

Burly::Parser::URI_PARSER, Burly::Parser::URI_REGEXP

Constructor Details

#initialize(document, context: nil) ⇒ HtmlParser

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns a new instance of HtmlParser.

Parameters:

  • context (String, Array<String>) (defaults to: nil)
  • document (String, #to_s)

    The document to parse for URLs.



# File 'lib/burly/parsers/html_parser.rb', line 41

def initialize(document, context: nil)
  @context = context

  super
end

Instance Method Details

#parse ⇒ Array<String>

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Parse an HTML document for absolute or relative URLs.

Returns:

  • (Array<String>)


# File 'lib/burly/parsers/html_parser.rb', line 50

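def parse
  attr_nodes.flat_map do |attr_node|
    if SRCSET_ATTRIBUTES_MAP.key?(attr_node.name)
      # srcset-style attributes hold a comma-separated list of image candidates.
      urls_from_candidate_strings(attr_node.value.split(/\s*,\s*/))
    else
      # Every other mapped attribute holds a single URL.
      attr_node.value.strip
    end
  end
end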
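
The `urls_from_candidate_strings` helper is not shown on this page. The sketch below is one plausible way to handle the srcset branch, assuming each candidate string has the form "URL descriptor" (for example `large.png 800w`) and that the URL is the first whitespace-separated token. The method names `sketch_urls_from_candidate_strings` and `sketch_urls_from_srcset` are hypothetical and are not Burly's API.

```ruby
# Hypothetical stand-in for the private helper: keep the first
# whitespace-separated token of each candidate and drop empty candidates.
def sketch_urls_from_candidate_strings(candidate_strings)
  candidate_strings.map { |candidate| candidate.strip.split(/\s+/, 2).first }.compact
end

# Split a srcset value on commas (same regular expression as #parse),
# then reduce each candidate to its URL.
def sketch_urls_from_srcset(value)
  sketch_urls_from_candidate_strings(value.split(/\s*,\s*/))
end

p sketch_urls_from_srcset("small.png 480w, large.png 800w")
# => ["small.png", "large.png"]
```

Splitting on every comma is simple but treats a comma inside a URL (as in some `data:` URLs) as a candidate separator, so this approach may mis-split such values.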