Class: HTMLEntities

Inherits:
Object
Defined in:
lib/htmlentities.rb,
lib/htmlentities/decoder.rb,
lib/htmlentities/encoder.rb,
lib/htmlentities/flavors.rb,
lib/htmlentities/version.rb,
lib/htmlentities/mappings/html4.rb,
lib/htmlentities/mappings/xhtml1.rb,
lib/htmlentities/mappings/expanded.rb

Overview

HTML entity encoding and decoding for Ruby

Defined Under Namespace

Modules: VERSION
Classes: Decoder, Encoder

Constant Summary

UnknownFlavor = Class.new(RuntimeError)
InstructionError = Class.new(RuntimeError)
FLAVORS = %w[html4 xhtml1 expanded]
MAPPINGS = {}
SKIP_DUP_ENCODINGS = {}

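Instance Method Summary

  • #decode(source) ⇒ Object: Decode entities in a string into their UTF-8 equivalents.
  • #encode(source, *instructions) ⇒ Object: Encode codepoints into their corresponding entities.
  • #initialize(flavor = 'xhtml1') ⇒ HTMLEntities: Create a new HTMLEntities coder for the specified flavor.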

Constructor Details

#initialize(flavor = 'xhtml1') ⇒ HTMLEntities

Create a new HTMLEntities coder for the specified flavor. Available flavors are ‘html4’, ‘expanded’ and ‘xhtml1’ (the default).

The only difference in functionality between html4 and xhtml1 is in the handling of the apos (apostrophe) named entity, which is not defined in HTML4.

‘expanded’ includes a large number of additional SGML entities drawn from ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT, which “maps SGML character entities from various public sets (namely, ISOamsa, ISOamsb, ISOamsc, ISOamsn, ISOamso, ISOamsr, ISObox, ISOcyr1, ISOcyr2, ISOdia, ISOgrk1, ISOgrk2, ISOgrk3, ISOgrk4, ISOlat1, ISOlat2, ISOnum, ISOpub, ISOtech, HTMLspecial, HTMLsymbol) to corresponding Unicode characters.” (sgml.txt).

‘expanded’ is a strict superset of the XHTML entities: every xhtml1 named entity encodes and decodes the same way under :expanded as under :xhtml1.

Raises:

  • UnknownFlavor: if the flavor is not one of FLAVORS

# File 'lib/htmlentities.rb', line 32

def initialize(flavor='xhtml1')
  @flavor = flavor.to_s.downcase
  raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
end
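
The constructor normalises its argument with to_s.downcase before checking it against FLAVORS, so symbols and mixed-case strings are accepted. The following is a minimal stand-alone sketch of that check, written so it runs without the gem; SKETCH_FLAVORS, SketchUnknownFlavor and normalize_flavor are hypothetical names introduced only for illustration.

```ruby
# Stand-alone illustration of the constructor's flavor check.
# All names here are hypothetical stand-ins, not part of the library.
SKETCH_FLAVORS = %w[html4 xhtml1 expanded].freeze
SketchUnknownFlavor = Class.new(RuntimeError)

def normalize_flavor(flavor = 'xhtml1')
  normalized = flavor.to_s.downcase
  unless SKETCH_FLAVORS.include?(normalized)
    raise SketchUnknownFlavor, "Unknown flavor #{flavor}"
  end
  normalized
end

normalize_flavor             # default flavor
normalize_flavor(:expanded)  # symbols work because of to_s
normalize_flavor('HTML4')    # case is ignored because of downcase

begin
  normalize_flavor('html5')  # not listed, so the check raises
rescue SketchUnknownFlavor => e
  warn e.message
end
```

With the real class, the equivalent calls would presumably be HTMLEntities.new, HTMLEntities.new(:expanded) and so on, with UnknownFlavor raised for anything outside FLAVORS.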

Instance Method Details

#decode(source) ⇒ Object

Decode entities in a string into their UTF-8 equivalents. The string should already be in UTF-8 encoding.

Unknown named entities are left unchanged.



# File 'lib/htmlentities.rb', line 43

def decode(source)
  (@decoder ||= Decoder.new(@flavor)).decode(source)
end
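
To make the decoding rules concrete, here is a rough stand-alone sketch of what turning decimal, hexadecimal and named entities into UTF-8 involves. It is not the library's Decoder: sketch_decode and the tiny SKETCH_NAMED table are hypothetical, and the real flavors define far more names.

```ruby
# Hypothetical, deliberately tiny name table (entity name => codepoint).
SKETCH_NAMED = {
  'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34, 'eacute' => 233
}.freeze

# Matches &#123; (decimal), &#x7b; (hexadecimal) or &name; (named).
SKETCH_ENTITY = /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));/

def sketch_decode(source)
  source.gsub(SKETCH_ENTITY) do
    match = Regexp.last_match
    if match[1]
      [match[1].to_i].pack('U')           # decimal codepoint to UTF-8
    elsif match[2]
      [match[2].to_i(16)].pack('U')       # hexadecimal codepoint to UTF-8
    elsif SKETCH_NAMED.key?(match[3])
      [SKETCH_NAMED[match[3]]].pack('U')  # known name to UTF-8
    else
      match[0]                            # unknown name: leave untouched
    end
  end
end

puts sketch_decode('&lt;&#233;&#xe9;&eacute;&bogus;')
# prints "<ééé&bogus;"
```

The last branch mirrors the documented behaviour that unknown named entities pass through unconverted.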

#encode(source, *instructions) ⇒ Object

Encode codepoints into their corresponding entities. Various operations are possible, and may be specified in order:

:basic

Convert the five XML entities ('"<>&)

:named

Convert non-ASCII characters to their named HTML 4.01 equivalent

:decimal

Convert non-ASCII characters to decimal entities (e.g. &#1234;)

:hexadecimal

Convert non-ASCII characters to hexadecimal entities (e.g. &#x12ab;)

You can specify the commands in any order, but they will be executed in the order listed above to ensure that entity ampersands are not clobbered and that named entities are replaced before numeric ones.

If no instructions are specified, :basic will be used.

Examples:

encode(str) - XML-safe
encode(str, :basic, :decimal) - XML-safe and 7-bit clean
encode(str, :basic, :named, :decimal) - 7-bit clean, with all non-ASCII characters replaced with their named entity where possible, and decimal equivalents otherwise.

Note: It is the program’s responsibility to ensure that the source contains valid UTF-8 before calling this method.



# File 'lib/htmlentities.rb', line 72

def encode(source, *instructions)
  Encoder.new(@flavor, instructions).encode(source)
end
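
The fixed execution order described for encode can be illustrated with a small stand-alone sketch. It is not the library's Encoder: sketch_encode, SKETCH_ORDER and both lookup tables are hypothetical, the name table holds a single entry, and the exact replacement strings the real encoder emits (for the apostrophe in particular) may differ.

```ruby
# Steps always run in this order, whatever order the caller lists them in.
SKETCH_ORDER = %i[basic named decimal hexadecimal].freeze

# Hypothetical replacement tables, kept tiny for illustration.
SKETCH_BASIC = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze
SKETCH_NAMED_ENTITIES = { "\u00e9" => 'eacute' }.freeze
SKETCH_NON_ASCII = /[^\x00-\x7f]/

def sketch_encode(source, *instructions)
  instructions = [:basic] if instructions.empty?  # documented default
  steps = SKETCH_ORDER.select { |step| instructions.include?(step) }
  steps.reduce(source) do |text, step|
    case step
    when :basic
      text.gsub(/[&<>"']/, SKETCH_BASIC)
    when :named
      text.gsub(SKETCH_NON_ASCII) do |ch|
        SKETCH_NAMED_ENTITIES.key?(ch) ? "&#{SKETCH_NAMED_ENTITIES[ch]};" : ch
      end
    when :decimal
      text.gsub(SKETCH_NON_ASCII) { |ch| format('&#%d;', ch.ord) }
    when :hexadecimal
      text.gsub(SKETCH_NON_ASCII) { |ch| format('&#x%x;', ch.ord) }
    end
  end
end

puts sketch_encode("<caf\u00e9>")
# prints "&lt;café&gt;"
puts sketch_encode("<caf\u00e9 \u2603>", :decimal, :named, :basic)
# prints "&lt;caf&eacute; &#9731;&gt;"
```

Because :basic runs first, the ampersands introduced by the later steps are never re-escaped, and because :named runs before :decimal, a character with a known name gets the named form while everything else falls through to the numeric one.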