Class: HTMLEntities
- Inherits: Object
- Defined in:
- lib/htmlentities.rb,
lib/htmlentities/legacy.rb,
lib/htmlentities/decoder.rb,
lib/htmlentities/encoder.rb,
lib/htmlentities/flavors.rb,
lib/htmlentities/version.rb,
lib/htmlentities/mappings/html4.rb,
lib/htmlentities/mappings/xhtml1.rb,
lib/htmlentities/mappings/expanded.rb
Overview
HTML entity encoding and decoding for Ruby
Defined Under Namespace
Modules: VERSION
Classes: Decoder, Encoder
Constant Summary
- UnknownFlavor = Class.new(RuntimeError)
- InstructionError = Class.new(RuntimeError)
- FLAVORS = %w[html4 xhtml1 expanded]
- MAPPINGS = {}
- SKIP_DUP_ENCODINGS = {}
Class Method Summary
-
.decode_entities(*args) ⇒ Object
Legacy compatibility class method allowing direct decoding of XHTML1 entities.
-
.encode_entities(*args) ⇒ Object
Legacy compatibility class method allowing direct encoding of XHTML1 entities.
Instance Method Summary
-
#decode(source) ⇒ Object
Decode entities in a string into their UTF-8 equivalents.
-
#encode(source, *instructions) ⇒ Object
Encode codepoints into their corresponding entities.
-
#initialize(flavor = 'xhtml1') ⇒ HTMLEntities
constructor
Create a new HTMLEntities coder for the specified flavor.
Constructor Details
#initialize(flavor = 'xhtml1') ⇒ HTMLEntities
Create a new HTMLEntities coder for the specified flavor. Available flavors are ‘html4’, ‘expanded’ and ‘xhtml1’ (the default).
The only difference in functionality between html4 and xhtml1 is in the handling of the apos (apostrophe) named entity, which is not defined in HTML4.
‘expanded’ includes a large number of additional SGML entities drawn from ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT, which “maps SGML character entities from various public sets (namely, ISOamsa, ISOamsb, ISOamsc, ISOamsn, ISOamso, ISOamsr, ISObox, ISOcyr1, ISOcyr2, ISOdia, ISOgrk1, ISOgrk2, ISOgrk3, ISOgrk4, ISOlat1, ISOlat2, ISOnum, ISOpub, ISOtech, HTMLspecial, HTMLsymbol) to corresponding Unicode characters” (sgml.txt).
‘expanded’ is a strict superset of the XHTML entities: every XHTML named entity encodes and decodes the same way under :expanded as under :xhtml1.
# File 'lib/htmlentities.rb', line 33
def initialize(flavor='xhtml1')
  @flavor = flavor.to_s.downcase
  raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
end
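To illustrate the flavor check performed by the constructor, here is a minimal self-contained sketch that mirrors the listing in a stand-in class. `FlavorSketch` is a hypothetical name used only for this example and is not part of the gem. It shows that the argument is normalized with `to_s.downcase` before validation, so symbols and mixed-case strings are accepted.

```ruby
# Hypothetical stand-in that mirrors the constructor listing; not the gem itself.
class FlavorSketch
  UnknownFlavor = Class.new(RuntimeError)
  FLAVORS = %w[html4 xhtml1 expanded].freeze

  attr_reader :flavor

  def initialize(flavor = 'xhtml1')
    # Normalize first, so :Expanded and 'EXPANDED' both become 'expanded'.
    @flavor = flavor.to_s.downcase
    raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
  end
end

puts FlavorSketch.new.flavor            # default flavor
puts FlavorSketch.new(:Expanded).flavor # symbol, mixed case

begin
  FlavorSketch.new('html5')             # not in FLAVORS
rescue FlavorSketch::UnknownFlavor => e
  puts e.message
end
```

Because the error message interpolates the original argument rather than the normalized one, the caller sees exactly what was passed in.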
Class Method Details
.decode_entities(*args) ⇒ Object
Legacy compatibility class method allowing direct decoding of XHTML1 entities. See HTMLEntities#decode for description of parameters.
Deprecated.
# File 'lib/htmlentities/legacy.rb', line 20
def decode_entities(*args)
  xhtml1_entities.decode(*args)
end
.encode_entities(*args) ⇒ Object
Legacy compatibility class method allowing direct encoding of XHTML1 entities. See HTMLEntities#encode for description of parameters.
Deprecated.
# File 'lib/htmlentities/legacy.rb', line 10
def encode_entities(*args)
  xhtml1_entities.encode(*args)
end
Instance Method Details
#decode(source) ⇒ Object
Decode entities in a string into their UTF-8 equivalents. The string should already be in UTF-8 encoding.
Unknown named entities are not converted; they are left in the output unchanged.
# File 'lib/htmlentities.rb', line 44
def decode(source)
  (@decoder ||= Decoder.new(@flavor)).decode(source)
end
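The decoding behavior described here (numeric and named entities become UTF-8 characters, unknown names pass through) can be sketched without the gem. The helper `decode_sketch` and the three-entry table `DECODE_TABLE_SKETCH` below are hypothetical and exist only for illustration. The gem's own Decoder and mapping tables are not shown on this page and may work differently.

```ruby
# Hypothetical three-entry table; a real flavor mapping is far larger.
DECODE_TABLE_SKETCH = { 'amp' => 38, 'lt' => 60, 'eacute' => 233 }.freeze

# Matches decimal (&#233;), hexadecimal (&#xe9;) and named (&eacute;) entities.
ENTITY_SKETCH = /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));/

def decode_sketch(source)
  source.gsub(ENTITY_SKETCH) do
    match = Regexp.last_match
    if match[1]
      [match[1].to_i].pack('U')                 # decimal codepoint to UTF-8
    elsif match[2]
      [match[2].hex].pack('U')                  # hexadecimal codepoint to UTF-8
    elsif DECODE_TABLE_SKETCH.key?(match[3])
      [DECODE_TABLE_SKETCH[match[3]]].pack('U') # known named entity
    else
      match[0]                                  # unknown name: leave untouched
    end
  end
end

puts decode_sketch('caf&eacute; &#233; &#xe9; &bogus;')
```

The sketch does not guard against codepoints outside the Unicode range, which `pack('U')` rejects; production code would need to handle that case.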
#encode(source, *instructions) ⇒ Object
Encode codepoints into their corresponding entities. Various operations are possible, and may be specified in order:
- :basic
-
Convert the five XML entities ('"<>&)
- :named
-
Convert non-ASCII characters to their named HTML 4.01 equivalent
- :decimal
-
Convert non-ASCII characters to decimal entities (e.g. &#1234;)
- :hexadecimal
-
Convert non-ASCII characters to hexadecimal entities (e.g. &#x12ab;)
You can specify the commands in any order, but they will be executed in the order listed above to ensure that entity ampersands are not clobbered and that named entities are replaced before numeric ones.
If no instructions are specified, :basic will be used.
Examples:
encode_entities(str) - XML-safe
encode_entities(str, :basic, :decimal) - XML-safe and 7-bit clean
encode_entities(str, :basic, :named, :decimal) - 7-bit clean, with all
non-ASCII characters replaced with their named entity where possible, and
decimal equivalents otherwise.
Note: It is the program’s responsibility to ensure that the source contains valid UTF-8 before calling this method.
# File 'lib/htmlentities.rb', line 73
def encode(source, *instructions)
  Encoder.new(@flavor, instructions).encode(source)
end
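The ordering rule described for the instructions (any order in the call, fixed order of execution, :basic as the default) can be sketched as follows. This is an illustration under stated assumptions, not the gem's Encoder: `encode_sketch`, the `SKETCH_*` constants, and the one-entry named table are hypothetical, and raising `ArgumentError` for unrecognized instructions is a choice made for this example only.

```ruby
# Illustrative sketch only; not the gem's Encoder.
SKETCH_ORDER = %i[basic named decimal hexadecimal].freeze
SKETCH_BASIC = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&apos;'
}.freeze
# Hypothetical one-entry table standing in for a flavor's named mappings.
SKETCH_NAMED = { "\u00e9" => '&eacute;' }.freeze
SKETCH_NON_ASCII = /[^\x00-\x7F]/

def encode_sketch(source, *instructions)
  instructions = [:basic] if instructions.empty?
  unknown = instructions - SKETCH_ORDER
  raise ArgumentError, "unknown instruction(s): #{unknown.inspect}" unless unknown.empty?

  # Run the requested steps in the fixed order, whatever order the caller used.
  steps = SKETCH_ORDER.select { |step| instructions.include?(step) }
  steps.reduce(source) do |text, step|
    case step
    when :basic       then text.gsub(/[&<>"']/, SKETCH_BASIC)
    when :named       then text.gsub(SKETCH_NON_ASCII) { |ch| SKETCH_NAMED.fetch(ch, ch) }
    when :decimal     then text.gsub(SKETCH_NON_ASCII) { |ch| format('&#%d;', ch.ord) }
    when :hexadecimal then text.gsub(SKETCH_NON_ASCII) { |ch| format('&#x%x;', ch.ord) }
    end
  end
end

puts encode_sketch("<caf\u00e9 \u2603>", :decimal, :named, :basic)
```

Running :basic first means the ampersands produced by later steps are never re-escaped, and running :named before the numeric steps means a character with a name keeps it, while the rest fall back to numeric entities.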