Class: HTMLEntities

Inherits:
Object
Defined in:
lib/htmlentities.rb,
lib/htmlentities/html4.rb,
lib/htmlentities/legacy.rb,
lib/htmlentities/xhtml1.rb

Overview

HTML entity encoding and decoding for Ruby

Defined Under Namespace

Classes: InstructionError, UnknownFlavor

Constant Summary

VERSION = '4.0.0'
FLAVORS = %w[html4 xhtml1]
INSTRUCTIONS = [:basic, :named, :decimal, :hexadecimal]
MAPPINGS = {}

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(flavor = 'xhtml1') ⇒ HTMLEntities

Create a new HTMLEntities coder for the specified flavor. Available flavors are ‘html4’ and ‘xhtml1’ (the default). The only difference in functionality between the two is in the handling of the apos (apostrophe) named entity, which is not defined in HTML4.

Raises:

UnknownFlavor if the given flavor is not listed in FLAVORS



# File 'lib/htmlentities.rb', line 24

def initialize(flavor='xhtml1')
  @flavor = flavor.to_s.downcase
  raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
end
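
To illustrate the flavor difference described above, here is a minimal standalone sketch. The two small tables and the `sketch_decode_named` helper are hypothetical stand-ins, not the library's real MAPPINGS data or API; the only point is that `apos` resolves under 'xhtml1' and is left alone under 'html4'.

```ruby
# Illustrative sketch only: these tiny tables are hypothetical stand-ins,
# not the library's real MAPPINGS data.
SKETCH_MAPPINGS = {
  'html4'  => { 'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34 },
  'xhtml1' => { 'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34, 'apos' => 39 }
}.freeze

# Decode named entities using the table for the given flavor.
# Names missing from the table are left untouched.
def sketch_decode_named(source, flavor = 'xhtml1')
  map = SKETCH_MAPPINGS.fetch(flavor.to_s.downcase) do
    raise ArgumentError, "Unknown flavor #{flavor}"
  end
  source.gsub(/&([a-z0-9]+);/i) do
    codepoint = map[Regexp.last_match(1)]
    codepoint ? [codepoint].pack('U') : Regexp.last_match(0)
  end
end

puts sketch_decode_named('it&apos;s', 'xhtml1')
puts sketch_decode_named('it&apos;s', 'html4')
```

In this sketch an unrecognized flavor raises ArgumentError; the library itself raises UnknownFlavor, as the constructor source shows.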

Class Method Details

.decode_entities(*args) ⇒ Object

Legacy compatibility class method allowing direct decoding of XHTML1 entities. See HTMLEntities#decode for description of parameters.



# File 'lib/htmlentities/legacy.rb', line 16

def decode_entities(*args)
  xhtml1_entities.decode(*args)
end

.encode_entities(*args) ⇒ Object

Legacy compatibility class method allowing direct encoding of XHTML1 entities. See HTMLEntities#encode for description of parameters.



# File 'lib/htmlentities/legacy.rb', line 8

def encode_entities(*args)
  xhtml1_entities.encode(*args)
end
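
Both legacy class methods forward to `xhtml1_entities`, whose definition is not shown in this section. One plausible shape, assuming it returns a memoized XHTML1 coder, is sketched below. `SketchCoder` and its simplified `encode` are hypothetical and exist only to make the delegation pattern runnable.

```ruby
# Hypothetical stand-in class; only the delegation pattern is the point.
class SketchCoder
  attr_reader :flavor

  def initialize(flavor = 'xhtml1')
    @flavor = flavor
  end

  # Simplified encoder: escapes only &, < and >.
  def encode(source)
    source.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
  end

  class << self
    # Class-level convenience method that forwards to a shared instance.
    def encode_entities(*args)
      xhtml1_entities.encode(*args)
    end

    private

    # Assumption: build the shared XHTML1 coder once and reuse it afterwards.
    def xhtml1_entities
      @xhtml1_entities ||= new('xhtml1')
    end
  end
end

puts SketchCoder.encode_entities('a < b & c')
```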

Instance Method Details

#decode(source) ⇒ Object

Decode entities in a string into their UTF-8 equivalents. The string must already be UTF-8; if it is not, convert it before calling this method, or the output will mix encodings.

Unknown named entities are left unchanged.



# File 'lib/htmlentities.rb', line 37

def decode(source)
  return source.to_s.gsub(named_entity_regexp) {
    (cp = map[$1]) ? [cp].pack('U') : $&
  }.gsub(/&#([0-9]{1,7});|&#x([0-9a-f]{1,6});/i) {
    $1 ? [$1.to_i].pack('U') : [$2.to_i(16)].pack('U')
  }
end
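
The numeric half of the method above can be tried in isolation. The following is a standalone sketch, not the library method: `sketch_decode_numeric` and `NUMERIC_ENTITY` are hypothetical names, and the pattern is copied from the source shown above. It relies on `Array#pack` with the 'U' directive, which emits a codepoint as a UTF-8 character.

```ruby
# Standalone sketch of the numeric-entity step only (not the library method).
# The pattern mirrors the one in the source above: up to 7 decimal digits
# or up to 6 hexadecimal digits, matched case-insensitively.
NUMERIC_ENTITY = /&#([0-9]{1,7});|&#x([0-9a-f]{1,6});/i

def sketch_decode_numeric(source)
  source.to_s.gsub(NUMERIC_ENTITY) do
    decimal = Regexp.last_match(1)
    hex = Regexp.last_match(2)
    codepoint = decimal ? decimal.to_i : hex.to_i(16)
    [codepoint].pack('U') # 'U' packs a codepoint as a UTF-8 character
  end
end

puts sketch_decode_numeric('caf&#233; &#x3C;b&#x3e;')
```

Text that does not match the pattern, including named entities such as `&nbsp;`, passes through this sketch untouched.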

#encode(source, *instructions) ⇒ Object

Encode codepoints into their corresponding entities. Various operations are possible, and may be specified in order:

:basic

Convert the five XML entities ('"<>&)

:named

Convert non-ASCII characters to their named HTML 4.01 equivalent

:decimal

Convert non-ASCII characters to decimal entities (e.g. &#1234;)

:hexadecimal

Convert non-ASCII characters to hexadecimal entities (e.g. &#x12ab;)

You can specify the instructions in any order, but they are always executed in the order listed above. This ensures that the ampersands of generated entities are not clobbered and that named entities are used in preference to numeric ones.

If no instructions are specified, :basic will be used.

Examples:

encode_entities(str) - XML-safe
encode_entities(str, :basic, :decimal) - XML-safe and 7-bit clean
encode_entities(str, :basic, :named, :decimal) - 7-bit clean, with all non-ASCII characters replaced with their named entity where possible, and decimal equivalents otherwise.

Note: It is the program’s responsibility to ensure that the source contains valid UTF-8 before calling this method.
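
The ordering rule described above can be sketched as follows. This is a simplified standalone illustration, not the library's implementation: every `sketch_`/`SKETCH_` name is hypothetical, the `:named` step is a no-op, the apostrophe is mapped to `&apos;` on the assumption of the XHTML1 flavor, and ArgumentError stands in for InstructionError.

```ruby
# Standalone sketch, not the library code: all names here are hypothetical.
SKETCH_ORDER = [:basic, :named, :decimal, :hexadecimal].freeze
SKETCH_BASIC = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&apos;' # &apos; assumes the XHTML1 flavor
}.freeze

def sketch_encode(source, *instructions)
  instructions = [:basic] if instructions.empty?
  unknown = instructions - SKETCH_ORDER
  raise ArgumentError, "unknown instruction(s): #{unknown.inspect}" unless unknown.empty?

  # Run in canonical order regardless of how the caller listed them, so the
  # ampersands produced by later steps are never re-escaped by :basic.
  ordered = SKETCH_ORDER & instructions
  ordered.reduce(source.to_s.dup) do |string, instruction|
    case instruction
    when :basic       then string.gsub(/[&<>"']/) { |c| SKETCH_BASIC[c] }
    when :named       then string # named-entity lookup omitted in this sketch
    when :decimal     then string.gsub(/[^\x00-\x7F]/) { |c| "&##{c.ord};" }
    when :hexadecimal then string.gsub(/[^\x00-\x7F]/) { |c| "&#x#{c.ord.to_s(16)};" }
    end
  end
end

puts sketch_encode("<caf\u00e9>", :decimal, :basic)
```

Because :basic always runs first, passing `:decimal, :basic` gives the same result as `:basic, :decimal`. When both numeric instructions are given, the decimal step leaves no non-ASCII characters for the hexadecimal step, which matches the precedence visible in the source below.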



# File 'lib/htmlentities.rb', line 70

def encode(source, *instructions)
  string = source.to_s.dup
  if (instructions.empty?)
    instructions = [:basic]
  elsif (unknown_instructions = instructions - INSTRUCTIONS) != []
    raise InstructionError,
    "unknown encode_entities command(s): #{unknown_instructions.inspect}"
  end
  
  basic_entity_encoder =
  if instructions.include?(:basic) || instructions.include?(:named)
    :encode_named
  elsif instructions.include?(:decimal)
    :encode_decimal
  else # after validation, only :hexadecimal can remain
    :encode_hexadecimal
  end
  string.gsub!(basic_entity_regexp){ __send__(basic_entity_encoder, $&) }
  
  extended_entity_encoders = []
  if instructions.include?(:named)
    extended_entity_encoders << :encode_named
  end
  if instructions.include?(:decimal)
    extended_entity_encoders << :encode_decimal
  elsif instructions.include?(:hexadecimal)
    extended_entity_encoders << :encode_hexadecimal
  end
  unless extended_entity_encoders.empty?
    string.gsub!(extended_entity_regexp){
      encode_extended(extended_entity_encoders, $&)
    }
  end
  
  return string
end