Class: Gitlab::UrlSanitizer
- Inherits: Object
  - Object
  - Gitlab::UrlSanitizer
- Includes:
- Gitlab::Utils::StrongMemoize
- Defined in:
- lib/gitlab/url_sanitizer.rb
Constant Summary
- MASK =
'*****'
- ALLOWED_SCHEMES =
%w[http https ssh git].freeze
- ALLOWED_WEB_SCHEMES =
%w[http https].freeze
- SCHEMIFIED_SCHEME =
'glschemelessuri'
- SCHEMIFY_PLACEHOLDER =
"#{SCHEMIFIED_SCHEME}://".freeze
- URI_REGEXP =
URI::DEFAULT_PARSER.make_regexp only matches URLs with schemes or relative URLs. This section matches schemeless URIs with userinfo (e.g. user:[email protected]) but does not match scp-style URIs (e.g. user@server:path/to/file).
The userinfo part is very loose compared to URI's implementation, so it also matches non-escaped userinfo, e.g. foo:[email protected], which should be encoded as foo:b%[email protected].
%r{
  (?# negative lookahead for masked userinfo *****, *****:, *****:*****, or :*****)
  (?!.*?(\*{5}$|\*{5}:$|\*{5}:\*{5}|:\*{5}))
  #{URI::REGEXP::PATTERN::USERINFO}@
  (?# negative lookahead to ensure this isn't an SCP-style URL)
  (?!#{URI::REGEXP::PATTERN::HOST}:(?!\b\d+\b))
  #{URI::REGEXP::PATTERN::HOSTPORT}
}x
- MASKED_USERINFO_REGEX =
This expression is derived from `URI::REGEXP::PATTERN::USERINFO`, with `{` and `}` added to the list of allowed characters so that the userinfo portion of a URL may contain masked segments, e.g. myuser:{masked_password}@{masked_domain}.com/{masked_hook}
%r{(?:[\\-_.!~*'()a-zA-Z\d;:&=+$,{}]|%[a-fA-F\d]{2})*}
Class Method Summary
- .sanitize(content, user: nil, password: nil) ⇒ Object
- .sanitize_masked_url(url) ⇒ Object
The URL associated with records like `WebHookLog` may contain masked portions represented by paired curly brackets in the URL.
- .valid?(url, allowed_schemes: ALLOWED_SCHEMES) ⇒ Boolean
- .valid_web?(url) ⇒ Boolean
Instance Method Summary
- #credentials ⇒ Object
- #full_url ⇒ Object
- #initialize(url, credentials: nil) ⇒ UrlSanitizer (constructor)
A new instance of UrlSanitizer.
- #masked_url ⇒ Object
- #sanitized_url ⇒ Object
- #user ⇒ Object
Constructor Details
#initialize(url, credentials: nil) ⇒ UrlSanitizer
Returns a new instance of UrlSanitizer.
# File 'lib/gitlab/url_sanitizer.rb', line 77
def initialize(url, credentials: nil)
  # Normalize blank user/password values to nil when the keys are given
  %i[user password].each do |symbol|
    credentials[symbol] = credentials[symbol].presence if credentials&.key?(symbol)
  end

  @credentials = credentials
  @url = parse_url(url)
end
Class Method Details
.sanitize(content, user: nil, password: nil) ⇒ Object
# File 'lib/gitlab/url_sanitizer.rb', line 35
def self.sanitize(content, user: nil, password: nil)
  content = sanitize_unencoded(content, user: user, password: password)
  content.gsub(URI_REGEXP) do |url|
    new(url).masked_url
  rescue Addressable::URI::InvalidURIError
    ''
  end
end
.sanitize_masked_url(url) ⇒ Object
The URL associated with records like `WebHookLog` may contain masked portions represented by paired curly brackets in the URL. As this prohibits straightforward parsing of the URL, we can use a variation of the existing USERINFO regex for these cases.
# File 'lib/gitlab/url_sanitizer.rb', line 73
def self.sanitize_masked_url(url)
  url.gsub(%r{//#{MASKED_USERINFO_REGEX}@}o, '//*****:*****@')
end
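The substitution above can be tried outside the application with a minimal standalone sketch. It copies the MASKED_USERINFO_REGEX value listed in the constant summary into a top-level constant; the `mask_userinfo` helper name and the sample URLs are hypothetical and exist only for illustration.

```ruby
# Standalone sketch: the pattern is copied from the constant summary above.
# `mask_userinfo` is a hypothetical helper, not part of Gitlab::UrlSanitizer.
MASKED_USERINFO_REGEX = %r{(?:[\\-_.!~*'()a-zA-Z\d;:&=+$,{}]|%[a-fA-F\d]{2})*}

def mask_userinfo(url)
  # Replace everything between "//" and the first "@" with a fixed mask.
  url.gsub(%r{//#{MASKED_USERINFO_REGEX}@}, '//*****:*****@')
end

# Userinfo containing curly-bracket placeholders is still recognized.
puts mask_userinfo('https://myuser:{masked_password}@{masked_domain}.com/{masked_hook}')

# A URL without userinfo is left untouched.
puts mask_userinfo('https://example.com/{masked_hook}')
```

Because the character class contains neither `/` nor `@`, the match stops at the first `@` after the `//`, so the host and path are left as they were.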
.valid?(url, allowed_schemes: ALLOWED_SCHEMES) ⇒ Boolean
# File 'lib/gitlab/url_sanitizer.rb', line 54
def self.valid?(url, allowed_schemes: ALLOWED_SCHEMES)
  return false unless url.present?
  return false unless url.is_a?(String)

  uri = Addressable::URI.parse(url.strip)

  allowed_schemes.include?(uri.scheme)
rescue Addressable::URI::InvalidURIError
  false
end
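The scheme check can be approximated without Rails or Addressable. The sketch below is an assumption-laden stand-in: it swaps in Ruby's standard `URI` library for `Addressable::URI` and a plain emptiness check for `present?`, so edge cases may differ from the real method. The helper name `scheme_allowed?` and the sample URLs are hypothetical.

```ruby
require 'uri'

ALLOWED_SCHEMES = %w[http https ssh git].freeze
ALLOWED_WEB_SCHEMES = %w[http https].freeze

# Hypothetical stand-in for the check in .valid?, using stdlib URI
# instead of Addressable::URI (parsing rules are not identical).
def scheme_allowed?(url, allowed_schemes: ALLOWED_SCHEMES)
  return false unless url.is_a?(String)
  return false if url.strip.empty?

  allowed_schemes.include?(URI.parse(url.strip).scheme)
rescue URI::InvalidURIError
  # Unparseable input is treated as invalid rather than raising.
  false
end

puts scheme_allowed?('https://example.com/repo.git')
puts scheme_allowed?('ftp://example.com/file')
puts scheme_allowed?('ssh://git@example.com/repo.git', allowed_schemes: ALLOWED_WEB_SCHEMES)
```

Passing `ALLOWED_WEB_SCHEMES` mirrors what `.valid_web?` does: the same check, restricted to `http` and `https`.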
.valid_web?(url) ⇒ Boolean
# File 'lib/gitlab/url_sanitizer.rb', line 65
def self.valid_web?(url)
  valid?(url, allowed_schemes: ALLOWED_WEB_SCHEMES)
end
Instance Method Details
#credentials ⇒ Object
# File 'lib/gitlab/url_sanitizer.rb', line 86
def credentials
  @credentials ||= { user: @url.user.presence, password: @url.password.presence }
end
#full_url ⇒ Object
# File 'lib/gitlab/url_sanitizer.rb', line 110
def full_url
  return reverse_schemify(@url.to_s) unless valid_credentials?

  url = @url.dup
  url.password = encode_percent(credentials[:password]) if credentials[:password].present?
  url.user = encode_percent(credentials[:user]) if credentials[:user].present?
  reverse_schemify(url.to_s)
end
#masked_url ⇒ Object
# File 'lib/gitlab/url_sanitizer.rb', line 102
def masked_url
  url = @url.dup
  url.password = MASK if url.password.present?
  url.user = MASK if url.user.present?
  reverse_schemify(url.to_s)
end
#sanitized_url ⇒ Object
# File 'lib/gitlab/url_sanitizer.rb', line 94
def sanitized_url
  safe_url = @url.dup
  safe_url.password = nil
  safe_url.user = nil
  reverse_schemify(safe_url.to_s)
end
#user ⇒ Object
# File 'lib/gitlab/url_sanitizer.rb', line 90
def user
  credentials[:user]
end