Module: Escape

Defined in:
lib/escape.rb

Overview

The Escape module provides escape functions for:

  • URI

  • HTML

  • shell command

Constant Summary

HTML_TEXT_ESCAPE_HASH =

{
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
}
HTML_ATTR_ESCAPE_HASH =

{
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
}

Class Method Details

.html_attr(str) ⇒ Object

Escape.html_attr encodes a string as a double-quoted HTML attribute using character references.

Escape.html_attr("abc") #=> "\"abc\""
Escape.html_attr("a&b") #=> "\"a&amp;b\""
Escape.html_attr("ab&<>\"c") #=> "\"ab&amp;&lt;&gt;&quot;c\""
Escape.html_attr("a'c") #=> "\"a'c\""

It escapes 4 characters:

  • ‘&’ to ‘&amp;’

  • ‘<’ to ‘&lt;’

  • ‘>’ to ‘&gt;’

  • ‘"’ to ‘&quot;’



# File 'lib/escape.rb', line 244

def html_attr(str)
  '"' + str.gsub(/[&<>"]/) {|ch| HTML_ATTR_ESCAPE_HASH[ch] } + '"'
end
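
As a usage sketch, the quoted result can be interpolated straight into a start tag. The stand-in html_attr below copies the method body shown above so the snippet runs without loading lib/escape.rb; link_start_tag is a hypothetical helper and is not part of the Escape module.

```ruby
# Stand-in copying the html_attr body above, so this runs without lib/escape.rb.
def html_attr(str)
  table = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }
  '"' + str.gsub(/[&<>"]/) { |ch| table[ch] } + '"'
end

# Hypothetical helper (not part of Escape): html_attr already adds the
# surrounding double quotes, so the template must not add its own.
def link_start_tag(href, title)
  "<a href=#{html_attr(href)} title=#{html_attr(title)}>"
end

puts link_start_tag('/search?q=a&b', 'say "hi"')
# prints: <a href="/search?q=a&amp;b" title="say &quot;hi&quot;">
```

Because the quotes are part of the return value, wrapping the interpolation in another pair of quotes would produce a malformed attribute.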

.html_form(pairs, sep = '&') ⇒ Object

Escape.html_form composes HTML form key-value pairs into an x-www-form-urlencoded string.

Escape.html_form takes an array of pairs of strings or a hash from string to string.

Escape.html_form([["a","b"], ["c","d"]]) #=> "a=b&c=d"
Escape.html_form({"a"=>"b", "c"=>"d"}) #=> "a=b&c=d"

In the array form, the same key can be used more than once. (This is required for an HTML form that contains checkboxes or a select element with the multiple attribute.)

Escape.html_form([["k","1"], ["k","2"]]) #=> "k=1&k=2"

If the strings contain characters that must be escaped in x-www-form-urlencoded, they are escaped using %-encoding.

Escape.html_form([["k=","&;="]]) #=> "k%3D=%26%3B%3D"

The separator can be specified by the optional second argument.

Escape.html_form([["a","b"], ["c","d"]], ";") #=> "a=b;c=d"

See HTML 4.01 for details.



# File 'lib/escape.rb', line 164

def html_form(pairs, sep='&')
  r = ''
  first = true
  pairs.each {|k, v|
    # query-chars - pct-encoded - x-www-form-urlencoded-delimiters =
    #   unreserved / "!" / "$" / "'" / "(" / ")" / "*" / "," / ":" / "@" / "/" / "?"
    # query-char - pct-encoded = unreserved / sub-delims / ":" / "@" / "/" / "?"
    # query-char = pchar / "/" / "?" = unreserved / pct-encoded / sub-delims / ":" / "@" / "/" / "?"
    # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    # sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    # x-www-form-urlencoded-delimiters = "&" / "+" / ";" / "="
    r << sep if !first
    first = false
    k.each_byte {|byte|
      ch = byte.chr
      if %r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n =~ ch
        r << "%" << ch.unpack("H2")[0].upcase
      else
        r << ch
      end
    }
    r << '='
    v.each_byte {|byte|
      ch = byte.chr
      if %r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n =~ ch
        r << "%" << ch.unpack("H2")[0].upcase
      else
        r << ch
      end
    }
  }
  r
end
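
One way to check the encoding is to decode the result with the standard library and compare it with the input. This is a minimal sketch: form_encode is a hypothetical stand-in that applies the same character rule as the source above (it is not the library method), and the String#b call is an addition here so that matching is done byte by byte.

```ruby
require 'uri'

# Hypothetical stand-in applying the same escaping rule as html_form above.
# String#b (an addition in this sketch) forces byte-wise matching.
def form_encode(pairs, sep = '&')
  unsafe = %r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n
  pairs.map { |pair|
    pair.map { |s| s.b.gsub(unsafe) { |ch| '%' + ch.unpack1('H2').upcase } }.join('=')
  }.join(sep)
end

pairs = [['k', '1'], ['k', '2'], ['q=', 'a b+c']]
encoded = form_encode(pairs)
puts encoded                             # prints: k=1&k=2&q%3D=a%20b%2Bc
p URI.decode_www_form(encoded) == pairs  # prints: true
```

Note that a space is written as %20 and a literal "+" as %2B under this rule, so a decoder that treats "+" as a space still recovers the original strings.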

.html_form_fast(pairs, sep = ';') ⇒ Object

# File 'lib/escape.rb', line 120

def html_form_fast(pairs, sep=';')
  pairs.map {|k, v|
    # query-chars - pct-encoded - x-www-form-urlencoded-delimiters =
    #   unreserved / "!" / "$" / "'" / "(" / ")" / "*" / "," / ":" / "@" / "/" / "?"
    # query-char - pct-encoded = unreserved / sub-delims / ":" / "@" / "/" / "?"
    # query-char = pchar / "/" / "?" = unreserved / pct-encoded / sub-delims / ":" / "@" / "/" / "?"
    # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    # sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    # x-www-form-urlencoded-delimiters = "&" / "+" / ";" / "="
    k = k.gsub(%r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n) {
      '%' + $&.unpack("H2")[0].upcase
    }
    v = v.gsub(%r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n) {
      '%' + $&.unpack("H2")[0].upcase
    }
    "#{k}=#{v}"
  }.join(sep)
end

.html_text(str) ⇒ Object

Escape.html_text escapes a string for use as HTML text, using character references.

It escapes 3 characters:

  • ‘&’ to ‘&amp;’

  • ‘<’ to ‘&lt;’

  • ‘>’ to ‘&gt;’

Escape.html_text("abc") #=> "abc"
Escape.html_text("a & b < c > d") #=> "a &amp; b &lt; c &gt; d"

This function is not appropriate for escaping HTML element attributes because quotes are not escaped.



# File 'lib/escape.rb', line 218

def html_text(str)
  str.gsub(/[&<>]/) {|ch| HTML_TEXT_ESCAPE_HASH[ch] }
end
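
The attribute caveat can be illustrated with a value that contains a double quote. This is a sketch only: the html_text stand-in copies the method body above so the snippet runs without lib/escape.rb, and the sample value is made up for illustration.

```ruby
# Stand-in copying the html_text body above, so this runs without lib/escape.rb.
def html_text(str)
  table = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }
  str.gsub(/[&<>]/) { |ch| table[ch] }
end

value = 'x" onclick="alert(1)'

# Fine as element content: a double quote has no special meaning there.
puts "<p>#{html_text(value)}</p>"
# prints: <p>x" onclick="alert(1)</p>

# Unsafe inside a quoted attribute: the unescaped quote closes the value early.
puts %(<input value="#{html_text(value)}">)
# prints: <input value="x" onclick="alert(1)">
```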

.shell_command(command) ⇒ Object

Escape.shell_command composes a sequence of words into a single shell command line. All shell metacharacters are quoted and the words are joined with single spaces.

Escape.shell_command(["ls", "/"]) #=> "ls /"
Escape.shell_command(["echo", "*"]) #=> "echo '*'"

Note that system(*command) and system(Escape.shell_command(command)) are roughly the same. There are two exceptions:

  • The first is that the latter may invoke /bin/sh.

  • The second is the interpretation of an array with only one element: the element is parsed by the shell with the former, but it is recognized as a single word with the latter. For example, system(*["echo foo"]) invokes the echo command with the argument "foo", but system(Escape.shell_command(["echo foo"])) invokes a command named "echo foo" without arguments (and it probably fails).



# File 'lib/escape.rb', line 52

def shell_command(command)
  command.map {|word| shell_single_word(word) }.join(' ')
end

.shell_single_word(str) ⇒ Object

Escape.shell_single_word quotes shell meta characters.

The result string is always a single shell word, even if the argument is "". Escape.shell_single_word("") returns "''".

Escape.shell_single_word("") #=> "''"
Escape.shell_single_word("foo") #=> "foo"
Escape.shell_single_word("*") #=> "'*'"


# File 'lib/escape.rb', line 65

def shell_single_word(str)
  if str.empty?
    "''"
  elsif %r{\A[0-9A-Za-z+,./:=@_-]+\z} =~ str
    str
  else
    result = ''
    str.scan(/('+)|[^']+/) {
      if $1
        result << %q{\'} * $1.length
      else
        result << "'#{$&}'"
      end
    }
    result
  end
end
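
The quoting rules can be checked by parsing the composed line back into words with the standard Shellwords library. This is a minimal sketch: the two methods below are compact stand-ins that follow the same rules as the sources above (they are not the library code), and the word list is made up for illustration.

```ruby
require 'shellwords'

# Compact stand-ins following the same quoting rules as the sources above.
def shell_single_word(str)
  return "''" if str.empty?
  return str if %r{\A[0-9A-Za-z+,./:=@_-]+\z}.match?(str)
  # Each run of single quotes becomes \' per quote; any other run is wrapped in '...'.
  str.gsub(/'+|[^']+/) { |run| run.start_with?("'") ? "\\'" * run.length : "'#{run}'" }
end

def shell_command(words)
  words.map { |word| shell_single_word(word) }.join(' ')
end

words = ['echo', "it's", 'a b', '*', '$HOME']
line = shell_command(words)
puts line                          # prints: echo 'it'\''s' 'a b' '*' '$HOME'
p Shellwords.split(line) == words  # prints: true
```

Single quotes cannot appear inside a single-quoted shell string, which is why each one is closed out, written as \', and the quoted run reopened.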

.uri_path(str) ⇒ Object

Escape.uri_path escapes a URI path using percent-encoding. The given path should be a sequence of (non-escaped) segments separated by "/". The segments cannot contain "/".

Escape.uri_path("a/b/c") #=> "a/b/c"
Escape.uri_path("a?b/c?d/e?f") #=> "a%3Fb/c%3Fd/e%3Ff"

The path is the part after the authority and before the query in a URI, as follows.

scheme://authority/path?query#fragment

See RFC 3986 for details of URI.

Note that this function is not appropriate for converting an OS path to a URI.



# File 'lib/escape.rb', line 115

def uri_path(str)
  str.gsub(%r{[^/]+}n) { uri_segment($&) }
end

.uri_segment(str) ⇒ Object

Escape.uri_segment escapes a URI segment using percent-encoding.

Escape.uri_segment("a/b") #=> "a%2Fb"

A segment is one of the "/"-separated elements after the authority and before the query in a URI, as follows.

scheme://authority/segment1/segment2/.../segmentN?query#fragment

See RFC 3986 for details of URI.



# File 'lib/escape.rb', line 92

def uri_segment(str)
  # pchar - pct-encoded = unreserved / sub-delims / ":" / "@"
  # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
  # sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
  str.gsub(%r{[^A-Za-z0-9\-._~!$&'()*+,;=:@]}n) {
    '%' + $&.unpack("H2")[0].upcase
  }
end
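
When segments may themselves contain "/", escaping each segment and then joining keeps them distinct, which a single pass over an already-joined path cannot do. This is a sketch under stated assumptions: uri_segment below is a stand-in with the same character rule as the source above, String#b is an addition here so that each byte is escaped separately, and path_from_segments is a hypothetical helper that is not part of the Escape module.

```ruby
# Stand-in applying the same rule as uri_segment above; String#b is an
# addition in this sketch so that each byte is matched and escaped on its own.
def uri_segment(str)
  str.b.gsub(%r{[^A-Za-z0-9\-._~!$&'()*+,;=:@]}n) { |ch| '%' + ch.unpack1('H2').upcase }
end

# Hypothetical helper (not part of Escape): join segments that are already split.
def path_from_segments(segments)
  segments.map { |seg| uri_segment(seg) }.join('/')
end

puts path_from_segments(['docs', 'a/b', '50% off?'])
# prints: docs/a%2Fb/50%25%20off%3F
```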