Module: SanitizeUrl
- Defined in:
- lib/sanitize-url.rb
Overview
Helper methods in this module are module methods so that they won’t pollute the namespace into which the module is mixed in.
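The pattern described above can be sketched with a small stand-in module. The names `Shouter`, `emphasize`, `shout`, and `Greeter` are hypothetical and exist only for this illustration; the point is that a `def self.` helper stays on the module, while a plain `def` is what an including class receives.

```ruby
# Hypothetical module mirroring the layout described above.
module Shouter
  # Module-level helper: reachable as Shouter.emphasize,
  # but not copied into classes that include Shouter.
  def self.emphasize(text)
    text.upcase + '!'
  end

  # Instance method: the only method an including class gains.
  def shout(text)
    Shouter.emphasize(text)
  end
end

class Greeter
  include Shouter
end

greeter = Greeter.new
puts greeter.shout('hello')            # the mixed-in method works
puts greeter.respond_to?(:emphasize)   # the helper was not mixed in
```

In the same way, a class that includes SanitizeUrl would be expected to gain only `sanitize_url`, with the helpers remaining available as `SanitizeUrl.url_encode?` and so on.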
Constant Summary
- ALPHANUMERIC_CHAR_CODES =
(48..57).to_a + (65..90).to_a + (97..122).to_a
- VALID_OPAQUE_SPECIAL_CHARS =
['!', '*', "'", '(', ')', ';', ':', '@', '&', '=', '+', '$', ',', '/', '?', '%', '#', '[', ']', '-', '_', '.', '~']
- VALID_OPAQUE_SPECIAL_CHAR_CODES =
VALID_OPAQUE_SPECIAL_CHARS.collect { |c| c[0].is_a?(String) ? c.ord : c[0] }
- VALID_OPAQUE_CHAR_CODES =
ALPHANUMERIC_CHAR_CODES + VALID_OPAQUE_SPECIAL_CHAR_CODES
- VALID_SCHEME_SPECIAL_CHARS =
['+', '.', '-']
- VALID_SCHEME_SPECIAL_CHAR_CODES =
VALID_SCHEME_SPECIAL_CHARS.collect { |c| c[0].is_a?(String) ? c.ord : c[0] }
- VALID_SCHEME_CHAR_CODES =
ALPHANUMERIC_CHAR_CODES + VALID_SCHEME_SPECIAL_CHAR_CODES
- HTTP_STYLE_SCHEMES =
Common schemes whose format should be “scheme://” instead of “scheme:”
['http', 'https', 'ftp', 'ftps', 'svn', 'svn+ssh', 'git']
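As a rough illustration of how these tables are put together and consulted, the sketch below rebuilds the scheme table and the "scheme://" rule in a stand-alone module. `CharTableSketch`, `valid_scheme_char?`, and `separator_for` are hypothetical names, not part of SanitizeUrl. The `c[0].is_a?(String)` test in the constants above looks like a compatibility shim for Ruby 1.8, where indexing a string returned an integer; the sketch assumes a newer Ruby and calls `ord` directly.

```ruby
# Hypothetical stand-alone module; it copies the values listed above.
module CharTableSketch
  # Digits, uppercase letters, lowercase letters as integer codes.
  ALPHANUMERIC = (48..57).to_a + (65..90).to_a + (97..122).to_a
  # Characters allowed in a scheme besides alphanumerics.
  SCHEME_CODES = ALPHANUMERIC + ['+', '.', '-'].map(&:ord)
  HTTP_STYLE   = ['http', 'https', 'ftp', 'ftps', 'svn', 'svn+ssh', 'git']

  # True when the single character may appear in a scheme.
  def self.valid_scheme_char?(char)
    SCHEME_CODES.include?(char.ord)
  end

  # "://" for HTTP-style schemes, ":" for everything else.
  def self.separator_for(scheme)
    HTTP_STYLE.include?(scheme.downcase) ? '://' : ':'
  end
end

puts CharTableSketch.valid_scheme_char?('+')
puts CharTableSketch.separator_for('mailto')
```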
Class Method Summary
-
.char_or_url_encoded(code) ⇒ Object
Return either the literal char or the URL-encoded equivalent, depending on our normalization rules.
-
.dereference_numerics(str) ⇒ Object
:nodoc:.
-
.url_encode?(code) ⇒ Boolean
Should we URL-encode the byte? Must receive an integer code point.
Instance Method Summary
-
#sanitize_url(url, options = {}) ⇒ Object
Sanitize the URL.
Class Method Details
.char_or_url_encoded(code) ⇒ Object
Return either the literal char or the URL-encoded equivalent, depending on our normalization rules. Requires a decimal code point. Code point can be outside the single-byte range.
# File 'lib/sanitize-url.rb', line 94
def self.char_or_url_encoded(code) #:nodoc:
  if url_encode?(code)
    utf_8_str = ([code.to_i].pack('U'))
    length = utf_8_str.respond_to?(:bytes) ? utf_8_str.bytes.to_a.length : utf_8_str.length
    '%' + utf_8_str.unpack('H2' * length).join('%').upcase
  else
    code.chr
  end
end
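The encoding branch of the listing above turns a code point into its UTF-8 bytes and percent-encodes each byte. A minimal sketch of that step, assuming a Ruby version where `String#bytes` returns an array, is shown here; `percent_encode_code_point` is a hypothetical helper and it always encodes, whereas the module method first consults `url_encode?`.

```ruby
# Hypothetical helper: percent-encode every UTF-8 byte of a code point.
def percent_encode_code_point(code)
  utf8 = [code].pack('U')                           # code point -> UTF-8 string
  utf8.bytes.map { |b| format('%%%02X', b) }.join   # each byte -> "%XX"
end

puts percent_encode_code_point(0x3C)    # "<"  -> %3C
puts percent_encode_code_point(0x20AC)  # euro -> %E2%82%AC
```

A code point outside the single-byte range therefore yields several `%XX` groups, one per UTF-8 byte.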
.dereference_numerics(str) ⇒ Object
:nodoc:
# File 'lib/sanitize-url.rb', line 80
def self.dereference_numerics(str) #:nodoc:
  # Decimal code points, e.g. &#106; &#106 &#0000106; &#0000106
  str = str.gsub(/&#([a-fA-f0-9]+);?/) do
    char_or_url_encoded($1.to_i)
  end
  # Hex code points, e.g. &#x6A; &#x6A
  str.gsub(/&#[xX]([a-fA-f0-9]+);?/) do
    char_or_url_encoded($1.to_i(16))
  end
end
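The effect of dereferencing can be seen with a simplified stand-in. `decode_numeric_entities` is a hypothetical helper, not the module's method: it always returns the literal character rather than applying the encode-or-keep rule, and it uses stricter digit classes (`\d` and `[0-9a-fA-F]`) than the listing above.

```ruby
# Hypothetical simplified decoder for numeric character references.
def decode_numeric_entities(str)
  # Decimal references such as &#106; (trailing semicolon optional).
  str = str.gsub(/&#(\d+);?/) { [$1.to_i].pack('U') }
  # Hex references such as &#x6A; (trailing semicolon optional).
  str.gsub(/&#[xX]([0-9a-fA-F]+);?/) { [$1.to_i(16)].pack('U') }
end

puts decode_numeric_entities('&#106;avascript:alert(1)')   # javascript:alert(1)
puts decode_numeric_entities('&#x6A;avascript:alert(1)')   # javascript:alert(1)
```

Decoding before the scheme check matters because a reference such as `&#106;` would otherwise hide the leading "j" of a scheme from the later whitelist comparison.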
.url_encode?(code) ⇒ Boolean
Should we URL-encode the byte? Must receive an integer code point.
# File 'lib/sanitize-url.rb', line 106
def self.url_encode?(code) #:nodoc:
  !(
    (code >= 48 and code <= 57) or # Numbers
    (code >= 65 and code <= 90) or # Uppercase
    (code >= 97 and code <= 122) or # Lowercase
    VALID_OPAQUE_CHAR_CODES.include?(code)
  )
end
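The predicate above amounts to "encode anything that is neither alphanumeric nor one of the allowed special characters". A stand-alone sketch of the same test follows; `ALLOWED_SPECIAL_CODES` and `needs_encoding?` are hypothetical names, and the character list is copied from VALID_OPAQUE_SPECIAL_CHARS.

```ruby
# Byte values of the special characters allowed in the opaque part.
ALLOWED_SPECIAL_CODES = '!*\'();:@&=+$,/?%#[]-_.~'.bytes

# Hypothetical equivalent of the predicate: true when the byte must be encoded.
def needs_encoding?(code)
  alphanumeric = (48..57).cover?(code) ||   # digits
                 (65..90).cover?(code) ||   # uppercase letters
                 (97..122).cover?(code)     # lowercase letters
  !(alphanumeric || ALLOWED_SPECIAL_CODES.include?(code))
end

puts needs_encoding?(' '.ord)   # true: a space is not allowed
puts needs_encoding?('/'.ord)   # false: a slash is allowed
```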
Instance Method Details
#sanitize_url(url, options = {}) ⇒ Object
Sanitize the URL. Example usage:
sanitize_url('javascript:alert("XSS")')
sanitize_url('javascript:alert("XSS")', :replace_evil_with => 'Replaced')
sanitize_url('ftp://example.com', :schemes => ['http', 'https'])
# File 'lib/sanitize-url.rb', line 20
def sanitize_url(url, options = {})
  raise(ArgumentError, 'options[:schemes] must be an array') if options.has_key?(:schemes) and !options[:schemes].is_a?(Array)
  options = {
    :replace_evil_with => '',
    :schemes => ['http', 'https', 'ftp', 'ftps', 'mailto', 'svn', 'svn+ssh', 'git']
  }.merge(options)

  url = SanitizeUrl.dereference_numerics(url)

  # Schemes can consist of letters, digits, or any of the following special chars: + . -
  # The scheme must begin with a letter and be terminated by a colon.
  # Everything after the scheme is opaque for our purposes. (See http://www.w3.org/DesignIssues/Axioms.html#opaque)

  # Try to match a URI with a scheme. We check for percent-encoded characters in the scheme.
  url.match(/^(.+?)(:|%3A)(.*)$/)
  dirty_scheme = $1
  if dirty_scheme
    unescaped_opaque = $3
    return options[:replace_evil_with] if unescaped_opaque.nil? or unescaped_opaque.empty? or unescaped_opaque.match(/^\/+$/)
  else
    # Use http as the best guess, and the rest of the URL will be considered opaque
    dirty_scheme = 'http'
    unescaped_opaque = url
  end

  # Remove URL encoding from the scheme
  dirty_scheme.gsub!(/%([a-zA-Z0-9]{2})/) do
    code = $1.to_i(16)
    VALID_SCHEME_CHAR_CODES.include?(code) ? code.chr : ''
  end

  # Clean the scheme by removing invalid characters
  scheme = ''
  dirty_scheme.each_byte do |code|
    scheme << code.chr if VALID_SCHEME_CHAR_CODES.include?(code)
  end

  # URL-encode the opaque portion as necessary. Only encode those bytes that are absolutely not allowed in URLs.
  opaque = ''
  unescaped_opaque.each_byte do |code|
    if SanitizeUrl.url_encode?(code)
      opaque << '%' << code.to_s(16).upcase
    else
      opaque << code.chr
    end
  end

  if options[:schemes].collect { |s| s.to_s }.include?(scheme.downcase)
    if HTTP_STYLE_SCHEMES.include?(scheme.downcase) and !opaque.match(/^\/\//)
      # It's an HTTP-like scheme, but the two slashes are missing. We'll fix that as a courtesy.
      url = scheme + '://' + opaque
    else
      # Either the scheme doesn't need the two slashes, or the opaque portion already has them.
      url = scheme + ':' + opaque
    end
    return url
  else
    return options[:replace_evil_with]
  end
end
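The first decision in the listing above, splitting off a scheme and falling back to `http` when none is found, can be tried in isolation. The sketch below reuses the same regular expression; `split_scheme` and `scheme_allowed?` are hypothetical helpers, and the sketch skips entity dereferencing, scheme cleaning, and percent-encoding of the opaque part.

```ruby
# Hypothetical helper: return [scheme, opaque_part] for a URL string.
def split_scheme(url)
  match = url.match(/^(.+?)(:|%3A)(.*)$/)
  # No scheme found: guess http and treat the whole string as opaque.
  match ? [match[1], match[3]] : ['http', url]
end

# Hypothetical helper: whitelist check on the (case-insensitive) scheme.
def scheme_allowed?(url, schemes = ['http', 'https'])
  scheme, _opaque = split_scheme(url)
  schemes.include?(scheme.downcase)
end

p split_scheme('javascript:alert(1)')        # ["javascript", "alert(1)"]
p split_scheme('example.com/path')           # ["http", "example.com/path"]
puts scheme_allowed?('javascript:alert(1)')  # false
puts scheme_allowed?('HTTP://example.com')   # true
```

Because the check is a whitelist, an unknown scheme is rejected by default, which in the full method means the value of `:replace_evil_with` is returned.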