Module: Newstile::Parser::Html::Constants

Included in:
ElementConverter, Parser
Defined in:
lib/newstile/parser/html.rb

Overview

Contains all constants that are used when parsing.

Constant Summary

HTML_DOCTYPE_RE =

The following regexps are based on the ones used by REXML, with some slight modifications.

/<!DOCTYPE.*?>/m
HTML_COMMENT_RE =
/<!--(.*?)-->/m
HTML_INSTRUCTION_RE =
/<\?(.*?)\?>/m
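
As a rough illustration of how the comment and instruction patterns behave, the sketch below uses local copies of the two regexps (named COMMENT_RE and INSTRUCTION_RE here purely for the example) so that it runs without Newstile loaded. The input strings are made up. The `m` flag lets `.` match newlines, and the non-greedy `.*?` stops at the first closing delimiter.

```ruby
# Local copies of the patterns listed above (names are illustrative only).
COMMENT_RE     = /<!--(.*?)-->/m
INSTRUCTION_RE = /<\?(.*?)\?>/m

# A made-up input with one multi-line comment and one single-line comment.
text = "<!-- first\nline --> text <!-- second -->"

# Non-greedy matching yields two separate comments rather than one long match.
comments = text.scan(COMMENT_RE).flatten
# comments == [" first\nline ", " second "]

# String#[] with a regexp and a group index returns the captured body.
instruction = "<?php echo 1 ?>"[INSTRUCTION_RE, 1]
# instruction == "php echo 1 "
```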
HTML_ATTRIBUTE_RE =
/\s*(#{REXML::Parsers::BaseParser::UNAME_STR})\s*=\s*(["'])(.*?)\2/m
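
The attribute pattern captures the name, the quote character, and the value, and uses the backreference `\2` to require the same quote at both ends. The sketch below is an assumption-laden illustration: NAME_STR is a simplified stand-in for `REXML::Parsers::BaseParser::UNAME_STR` (it is not the real definition), and `parse_attributes` is a hypothetical helper, not part of Newstile.

```ruby
# Simplified stand-in for REXML's UNAME_STR (an assumption for this sketch).
NAME_STR = '[A-Za-z_][-\w.]*'

# Same shape as HTML_ATTRIBUTE_RE above, built on the stand-in name pattern.
ATTRIBUTE_RE = /\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\2/m

# Hypothetical helper: collect attribute name/value pairs into a Hash.
def parse_attributes(str)
  str.scan(ATTRIBUTE_RE).map { |name, _quote, value| [name, value] }.to_h
end

attrs = parse_attributes(%q{class="a b" id='x'})
# attrs == {"class" => "a b", "id" => "x"}
```

Because the closing quote must equal the opening one, a value such as `title="it's"` keeps its embedded apostrophe.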
HTML_TAG_RE =
/<((?>#{REXML::Parsers::BaseParser::UNAME_STR}))\s*((?>\s+#{REXML::Parsers::BaseParser::UNAME_STR}\s*=\s*(["']).*?\3)*)\s*(\/)?>/m
HTML_TAG_CLOSE_RE =
/<\/(#{REXML::Parsers::BaseParser::UNAME_STR})\s*>/m
HTML_ENTITY_RE =
/&([\w:][\-\w\.:]*);|&#(\d+);|&\#x([0-9a-fA-F]+);/
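
The entity pattern has three alternatives, one each for named, decimal, and hexadecimal references, and each fills a different capture group. A minimal sketch of how a caller might tell them apart follows; ENTITY_RE is a local copy of the pattern and `classify_entity` is a hypothetical helper introduced only for this example.

```ruby
# Local copy of the pattern listed above, so the sketch runs on its own.
ENTITY_RE = /&([\w:][\-\w\.:]*);|&#(\d+);|&\#x([0-9a-fA-F]+);/

# Hypothetical helper: report which alternative matched and its value.
def classify_entity(str)
  m = ENTITY_RE.match(str)
  return nil unless m
  if m[1]
    [:named, m[1]]            # e.g. &amp;
  elsif m[2]
    [:decimal, m[2].to_i]     # e.g. &#169;
  else
    [:hex, m[3].to_i(16)]     # e.g. &#xA9;
  end
end

classify_entity("&amp;")    # => [:named, "amp"]
classify_entity("&#169;")   # => [:decimal, 169]
classify_entity("&#xA9;")   # => [:hex, 169]
```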
HTML_PARSE_AS_BLOCK =
%w{applet button blockquote body colgroup dd div dl fieldset form iframe li
map noscript object ol table tbody thead tfoot tr td ul}
HTML_PARSE_AS_SPAN =
%w{a abbr acronym address b bdo big cite caption del dfn dt em
h1 h2 h3 h4 h5 h6 i ins kbd label legend optgroup p q rb rbc
rp rt rtc ruby samp select small span strong sub sup th tt var}
HTML_PARSE_AS_RAW =
%w{script math option textarea pre code}
HTML_PARSE_AS =
Hash.new {|h,k| h[k] = :raw}
HTML_SPAN_ELEMENTS =

Some HTML elements, such as script, belong to both categories (i.e. they are valid in both block and span HTML) and therefore do not appear in either list.

%w{a abbr acronym b big bdo br button cite code del dfn em i img input
ins kbd label option q rb rbc rp rt rtc ruby samp select small span
strong sub sup textarea tt var}
HTML_BLOCK_ELEMENTS =
%w{address article aside applet body button blockquote caption col colgroup dd div dl dt fieldset
figcaption footer form h1 h2 h3 h4 h5 h6 header hgroup hr html head iframe legend listing menu
li map nav ol optgroup p pre section summary table tbody td th thead tfoot tr ul}
HTML_ELEMENTS_WITHOUT_BODY =
%w{area base br col command embed hr img input keygen link meta param source track wbr}