Module: Newstile::Parser::Html::Constants
- Included in:
- ElementConverter, Parser
- Defined in:
- lib/newstile/parser/html.rb
Overview
Contains all constants that are used when parsing HTML.
Constant Summary
- HTML_DOCTYPE_RE =
The following regexps are based on the ones used by REXML, with some slight modifications.
/<!DOCTYPE.*?>/m
- HTML_COMMENT_RE =
/<!--(.*?)-->/m
- HTML_INSTRUCTION_RE =
/<\?(.*?)\?>/m
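The DOCTYPE, comment, and processing-instruction regexps do not depend on REXML, so they can be tried directly. The sketch below copies them verbatim and applies them to a small, made-up HTML fragment. `String#[]` with a capture index returns that group of the first match, and the `m` flag lets `.` span newlines, so multi-line comments are captured whole.

```ruby
# Verbatim copies of the three constants above.
HTML_DOCTYPE_RE     = /<!DOCTYPE.*?>/m
HTML_COMMENT_RE     = /<!--(.*?)-->/m
HTML_INSTRUCTION_RE = /<\?(.*?)\?>/m

# Illustrative input only.
html = <<~HTML
  <!DOCTYPE html>
  <?xml-stylesheet href="style.css"?>
  <!-- a comment
       spanning two lines -->
  <p>text</p>
HTML

doctype     = html[HTML_DOCTYPE_RE]        # whole match: the DOCTYPE declaration
comment     = html[HTML_COMMENT_RE, 1]     # text between <!-- and -->
instruction = html[HTML_INSTRUCTION_RE, 1] # text between <? and ?>

puts doctype
puts comment.strip
puts instruction
```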
- HTML_ATTRIBUTE_RE =
/\s*(#{REXML::Parsers::BaseParser::UNAME_STR})\s*=\s*(["'])(.*?)\2/m
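HTML_ATTRIBUTE_RE captures an attribute name (group 1), the opening quote (group 2), and the value (group 3). The backreference `\2` makes the value end at the same quote character it started with, so `title="it's ok"` keeps its apostrophe. The following is a minimal sketch of collecting attributes into a hash with `String#scan`. It assumes the `rexml` library can be loaded to supply `UNAME_STR`; otherwise it falls back to a simplified name pattern of our own, which is an assumption and not REXML's exact definition.

```ruby
begin
  require 'rexml/parsers/baseparser'
  UNAME_STR = REXML::Parsers::BaseParser::UNAME_STR
rescue LoadError, NameError
  # Assumed, simplified stand-in: an optional "prefix:" followed by a name.
  UNAME_STR = '(?:[A-Za-z_][-\w.]*:)?[A-Za-z_][-\w.]*'
end

HTML_ATTRIBUTE_RE = /\s*(#{UNAME_STR})\s*=\s*(["'])(.*?)\2/m

attrs = {}
# Each match yields [name, quote, value]; the quote character is discarded.
%q{ class="note" data-id='42' title="it's ok"}.scan(HTML_ATTRIBUTE_RE) do |name, _quote, value|
  attrs[name] = value
end
p attrs
```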
- HTML_TAG_RE =
/<((?>#{REXML::Parsers::BaseParser::UNAME_STR}))\s*((?>\s+#{REXML::Parsers::BaseParser::UNAME_STR}\s*=\s*(["']).*?\3)*)\s*(\/)?>/m
- HTML_TAG_CLOSE_RE =
/<\/(#{REXML::Parsers::BaseParser::UNAME_STR})\s*>/m
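HTML_TAG_RE yields the tag name (group 1), the raw, still unparsed attribute string (group 2) and, for a self-closing tag, the trailing slash (group 4); group 3 only holds the quote character that the `\3` backreference needs. HTML_TAG_CLOSE_RE captures the name of an end tag. The sketch below assumes `rexml` can be loaded for `UNAME_STR` and otherwise falls back to a simplified name pattern of our own (an assumption, not REXML's definition).

```ruby
begin
  require 'rexml/parsers/baseparser'
  UNAME_STR = REXML::Parsers::BaseParser::UNAME_STR
rescue LoadError, NameError
  # Assumed, simplified stand-in: an optional "prefix:" followed by a name.
  UNAME_STR = '(?:[A-Za-z_][-\w.]*:)?[A-Za-z_][-\w.]*'
end

HTML_TAG_RE = /<((?>#{UNAME_STR}))\s*((?>\s+#{UNAME_STR}\s*=\s*(["']).*?\3)*)\s*(\/)?>/m
HTML_TAG_CLOSE_RE = /<\/(#{UNAME_STR})\s*>/m

start = HTML_TAG_RE.match('<img src="a.png" alt="logo" />')
tag_name     = start[1]        # "img"
attr_string  = start[2].strip  # 'src="a.png" alt="logo"', left for HTML_ATTRIBUTE_RE
self_closing = !start[4].nil?  # true: the tag ends in "/>"

plain = HTML_TAG_RE.match('<div class="note">')
plain_self_closing = !plain[4].nil? # false: no trailing "/"

close_name = HTML_TAG_CLOSE_RE.match('</div >')[1] # "div"; whitespace before ">" is allowed

p [tag_name, attr_string, self_closing, plain_self_closing, close_name]
```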
- HTML_ENTITY_RE =
/&([\w:][\-\w\.:]*);|&#(\d+);|&\#x([0-9a-fA-F]+);/
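HTML_ENTITY_RE has three alternatives, and exactly one capture group is set per match: a named entity (group 1), a decimal character reference (group 2), or a hexadecimal one (group 3). The sketch below decodes all three with `String#gsub`. The `NAMED` table and the `decode_entities` helper are hypothetical and cover only a handful of names; unknown named entities are left untouched.

```ruby
HTML_ENTITY_RE = /&([\w:][\-\w\.:]*);|&#(\d+);|&\#x([0-9a-fA-F]+);/

# Hypothetical, deliberately tiny table; a real decoder needs the full entity list.
NAMED = { 'amp' => '&', 'lt' => '<', 'gt' => '>', 'copy' => "\u00A9" }

def decode_entities(text)
  text.gsub(HTML_ENTITY_RE) do
    name, dec, hex = $1, $2, $3
    if name
      NAMED.fetch(name, $&)      # unknown name: keep the original "&...;" text
    elsif dec
      [dec.to_i].pack('U')       # decimal code point, e.g. &#65; -> "A"
    else
      [hex.to_i(16)].pack('U')   # hexadecimal code point, e.g. &#x42; -> "B"
    end
  end
end

puts decode_entities('&lt;b&gt; &#65;&#x42; &copy; &unknown;')
```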
- HTML_PARSE_AS_BLOCK =
%w{applet button blockquote body colgroup dd div dl fieldset form iframe li map noscript object ol table tbody thead tfoot tr td ul}
- HTML_PARSE_AS_SPAN =
%w{a abbr acronym address b bdo big cite caption del dfn dt em h1 h2 h3 h4 h5 h6 i ins kbd label legend optgroup p q rb rbc rp rt rtc ruby samp select small span strong sub sup th tt var}
- HTML_PARSE_AS_RAW =
%w{script math option textarea pre code}
- HTML_PARSE_AS =
Hash.new {|h,k| h[k] = :raw}
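The summary only shows how HTML_PARSE_AS is created: a Hash whose default block maps any element name not yet present to `:raw` and also stores that key. Presumably the three HTML_PARSE_AS_* lists are then folded into it; the sketch below assumes that, using abbreviated copies of the lists. Note the side effect of the default block: looking up an unknown tag adds it to the hash.

```ruby
# Abbreviated copies of the lists above, for illustration.
HTML_PARSE_AS_BLOCK = %w{div li table ul}
HTML_PARSE_AS_SPAN  = %w{a em p span}
HTML_PARSE_AS_RAW   = %w{script pre code}

HTML_PARSE_AS = Hash.new {|h,k| h[k] = :raw}
# Assumption: each list is mapped onto its parsing mode like this.
HTML_PARSE_AS_BLOCK.each {|el| HTML_PARSE_AS[el] = :block}
HTML_PARSE_AS_SPAN.each  {|el| HTML_PARSE_AS[el] = :span}
HTML_PARSE_AS_RAW.each   {|el| HTML_PARSE_AS[el] = :raw}

p HTML_PARSE_AS['div']         # => :block
p HTML_PARSE_AS['em']          # => :span
p HTML_PARSE_AS['video']       # => :raw (unknown tag, handled by the default block)
p HTML_PARSE_AS.key?('video')  # => true (the default block stored it)
```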
- HTML_SPAN_ELEMENTS =
Some HTML elements, such as script, belong to both categories (i.e. they are valid in both block and span HTML) and therefore do not appear in either list.
%w{a abbr acronym b big bdo br button cite code del dfn em i img input ins kbd label option q rb rbc rp rt rtc ruby samp select small span strong sub sup textarea tt var}
- HTML_BLOCK_ELEMENTS =
%w{address article aside applet body button blockquote caption col colgroup dd div dl dt fieldset figcaption footer form h1 h2 h3 h4 h5 h6 header hgroup hr html head iframe legend listing menu li map nav ol optgroup p pre section summary table tbody td th thead tfoot tr ul}
- HTML_ELEMENTS_WITHOUT_BODY =
%w{area base br col command embed hr img input keygen link meta param source track wbr}
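HTML_ELEMENTS_WITHOUT_BODY lists elements that never have content or an end tag. Since this module is included in ElementConverter, a converter writing HTML could plausibly consult such a list to choose between `<br />` and `<div></div>`. The `render_empty_element` helper below is hypothetical, not Newstile's API, and does not escape attribute values.

```ruby
# Copied from the constant above.
HTML_ELEMENTS_WITHOUT_BODY = %w{area base br col command embed hr img input keygen link meta param source track wbr}

# Hypothetical helper: render an element that has no content.
# Attribute values are NOT escaped; illustration only.
def render_empty_element(name, attrs = {})
  attr_str = attrs.map { |key, value| %( #{key}="#{value}") }.join
  if HTML_ELEMENTS_WITHOUT_BODY.include?(name)
    "<#{name}#{attr_str} />"           # bodiless element: self-closing form, no end tag
  else
    "<#{name}#{attr_str}></#{name}>"   # any other element needs an explicit end tag
  end
end

puts render_empty_element('br')
puts render_empty_element('img', { 'src' => 'logo.png' })
puts render_empty_element('div', { 'class' => 'note' })
```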