Class: Gitlab::UrlSanitizer

Inherits:
Object
Includes:
Gitlab::Utils::StrongMemoize
Defined in:
lib/gitlab/url_sanitizer.rb

Constant Summary

MASK =
'*****'
ALLOWED_SCHEMES =
%w[http https ssh git].freeze
ALLOWED_WEB_SCHEMES =
%w[http https].freeze
SCHEMIFIED_SCHEME =
'glschemelessuri'
SCHEMIFY_PLACEHOLDER =
"#{SCHEMIFIED_SCHEME}://".freeze
URI_REGEXP =

URI::DEFAULT_PARSER.make_regexp matches only URLs with schemes or relative URLs. This expression also matches schemeless URIs with userinfo (e.g. user:[email protected]) but does not match SCP-style URIs (e.g. user@server:path/to/file).

The userinfo part is very loose compared to URI’s implementation, so it also matches non-escaped userinfo, e.g. foo:[email protected], which should be encoded as foo:b%[email protected]

%r{
  (?# negative lookahead for masked userinfo *****, *****:, *****:*****, or :*****)
  (?!.*?(\*{5}$|\*{5}:$|\*{5}:\*{5}|:\*{5}))
  #{URI::REGEXP::PATTERN::USERINFO}@
  (?# negative lookahead to ensure this isn't an SCP-style URL)
  (?!#{URI::REGEXP::PATTERN::HOST}:(?!\b\d+\b))
  #{URI::REGEXP::PATTERN::HOSTPORT}
}x
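
The SCP-style distinction described above can be illustrated with a much smaller pattern. This is only a sketch: `SCP_STYLE` and `scp_style?` are hypothetical names, the pattern is hand-written rather than built from `URI::REGEXP::PATTERN`, and the host names are made up. The idea it shows is the same negative lookahead: a colon after the host counts as a port only when it is followed by digits alone.

```ruby
# Hypothetical, simplified pattern; not the URI_REGEXP used by the class.
# Matches "userinfo@host:" where the text after the colon is NOT a bare port.
SCP_STYLE = %r{\A[^@\s]+@[^:\s/]+:(?!\d+(?:/|\z))}

# Returns true when the string looks like an SCP-style remote.
def scp_style?(str)
  SCP_STYLE.match?(str)
end

puts scp_style?('git@gitlab.example.com:group/project.git')
puts scp_style?('alice:s3cret@gitlab.example.com:8080/repo.git')
```

The first call reports an SCP-style remote; the second is treated as a schemeless URL with a port, which is the kind of string URI_REGEXP is meant to catch.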
MASKED_USERINFO_REGEX =

This expression is derived from `URI::REGEXP::PATTERN::USERINFO` but with the addition of `{` and `}` in the list of allowed characters, to account for the userinfo portion of a URL containing masked segments, e.g. myuser:masked_password@masked_domain.com/masked_hook

%r{(?:[\\-_.!~*'()a-zA-Z\d;:&=+$,{}]|%[a-fA-F\d]{2})*}

Constructor Details

#initialize(url, credentials: nil) ⇒ UrlSanitizer

Returns a new instance of UrlSanitizer.

# File 'lib/gitlab/url_sanitizer.rb', line 77

def initialize(url, credentials: nil)
  %i[user password].each do |symbol|
    credentials[symbol] = credentials[symbol].presence if credentials&.key?(symbol)
  end

  @credentials = credentials
  @url = parse_url(url)
end
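
The constructor relies on ActiveSupport's `presence` to turn blank credential values into `nil`. A minimal plain-Ruby sketch of that normalization, assuming a hypothetical helper named `normalize_credentials` that returns a new hash instead of mutating its argument:

```ruby
# Hypothetical helper; the real constructor uses ActiveSupport's `presence`.
# Blank :user / :password values become nil; other keys are left untouched.
def normalize_credentials(credentials)
  return nil if credentials.nil?

  credentials.to_h do |key, value|
    blank = value.nil? || value.to_s.strip.empty?
    [key, %i[user password].include?(key) && blank ? nil : value]
  end
end

p normalize_credentials({ user: '  ', password: 'secret' })
```

Only keys already present in the hash are touched, which mirrors the `credentials&.key?(symbol)` guard in the constructor.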

Class Method Details

.sanitize(content, user: nil, password: nil) ⇒ Object

# File 'lib/gitlab/url_sanitizer.rb', line 35

def self.sanitize(content, user: nil, password: nil)
  content = sanitize_unencoded(content, user: user, password: password)
  content.gsub(URI_REGEXP) do |url|
    new(url).masked_url
  rescue Addressable::URI::InvalidURIError
    ''
  end
end
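
The `gsub`-with-block approach above can be tried in isolation with a simplified pattern. The sketch below is an assumption-laden stand-in: `USERINFO_IN_URL` and `mask_credentials_in_text` are hypothetical names, the pattern only handles URLs that carry a scheme, and unlike the real method it treats an empty password as present.

```ruby
MASK = '*****'

# Hypothetical, simplified pattern: userinfo directly after "://".
# Group 1 is the user, optional group 2 is ":password".
USERINFO_IN_URL = %r{(?<=://)([^\s/@:]+)(:[^\s/@]*)?@}

# Masks credentials embedded in URLs anywhere inside free-form text.
def mask_credentials_in_text(text)
  text.gsub(USERINFO_IN_URL) do
    Regexp.last_match(2) ? "#{MASK}:#{MASK}@" : "#{MASK}@"
  end
end

puts mask_credentials_in_text(
  "unable to access 'https://alice:s3cret@gitlab.example.com/group/project.git/'"
)
```

URLs without userinfo, including those with a port, are left unchanged because the pattern requires an `@` before the first `/`.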

.sanitize_masked_url(url) ⇒ Object

The URL associated with records like `WebHookLog` may contain masked portions represented by paired curly brackets. Because this prevents straightforward parsing of the URL, a variation of the existing USERINFO regex is used for these cases.

# File 'lib/gitlab/url_sanitizer.rb', line 73

def self.sanitize_masked_url(url)
  url.gsub(%r{//#{MASKED_USERINFO_REGEX}@}o, '//*****:*****@')
end
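
A standalone sketch of the same substitution, with the pattern copied from MASKED_USERINFO_REGEX above. The constant and method names here are hypothetical, and the sample URL with curly-bracket placeholders is an assumed illustration of a masked webhook URL.

```ruby
# Copy of the documented pattern, under a hypothetical name.
MASKED_USERINFO = %r{(?:[\\-_.!~*'()a-zA-Z\d;:&=+$,{}]|%[a-fA-F\d]{2})*}

# Replaces whatever userinfo follows "//" with fully masked credentials.
def mask_userinfo_in_masked_url(url)
  url.gsub(%r{//#{MASKED_USERINFO}@}, '//*****:*****@')
end

puts mask_userinfo_in_masked_url(
  'https://myuser:{masked_password}@{masked_domain}.com/{masked_hook}'
)
```

Because the substitution works on the raw string, it never has to parse the URL, so placeholders elsewhere in the host or path pass through untouched.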

.valid?(url, allowed_schemes: ALLOWED_SCHEMES) ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/gitlab/url_sanitizer.rb', line 54

def self.valid?(url, allowed_schemes: ALLOWED_SCHEMES)
  return false unless url.present?
  return false unless url.is_a?(String)

  uri = Addressable::URI.parse(url.strip)

  allowed_schemes.include?(uri.scheme)
rescue Addressable::URI::InvalidURIError
  false
end

.valid_web?(url) ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/gitlab/url_sanitizer.rb', line 65

def self.valid_web?(url)
  valid?(url, allowed_schemes: ALLOWED_WEB_SCHEMES)
end
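
The scheme allow-list check behind `.valid?` and `.valid_web?` can be sketched with Ruby's standard `URI` library standing in for `Addressable::URI`. That swap is an assumption made to keep the example dependency-free; the two parsers do not accept exactly the same inputs. The method names `valid_url?` and `valid_web_url?` are hypothetical.

```ruby
require 'uri'

ALLOWED_SCHEMES = %w[http https ssh git].freeze
ALLOWED_WEB_SCHEMES = %w[http https].freeze

# Returns true when the string parses and its scheme is in the allow-list.
# Uses stdlib URI here, not Addressable::URI as the documented class does.
def valid_url?(url, allowed_schemes: ALLOWED_SCHEMES)
  return false unless url.is_a?(String) && !url.strip.empty?

  allowed_schemes.include?(URI.parse(url.strip).scheme)
rescue URI::InvalidURIError
  false
end

# Web variant: only http and https are accepted.
def valid_web_url?(url)
  valid_url?(url, allowed_schemes: ALLOWED_WEB_SCHEMES)
end

puts valid_url?('ssh://git@gitlab.example.com/group/project.git')
puts valid_web_url?('ssh://git@gitlab.example.com/group/project.git')
```

The same URL passes the general check and fails the web check, since `ssh` appears only in the broader allow-list.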

Instance Method Details

#credentials ⇒ Object

# File 'lib/gitlab/url_sanitizer.rb', line 86

def credentials
  @credentials ||= { user: @url.user.presence, password: @url.password.presence }
end

#full_url ⇒ Object

# File 'lib/gitlab/url_sanitizer.rb', line 110

def full_url
  return reverse_schemify(@url.to_s) unless valid_credentials?

  url = @url.dup
  url.password = encode_percent(credentials[:password]) if credentials[:password].present?
  url.user = encode_percent(credentials[:user]) if credentials[:user].present?
  reverse_schemify(url.to_s)
end
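
The `encode_percent` helper used above is private and not shown on this page, so the following is only a sketch of the general idea: percent-encode each credential, then place it into the URL. It assumes `ERB::Util.url_encode` as the encoder and stdlib `URI` in place of Addressable; `url_with_credentials` is a hypothetical name.

```ruby
require 'erb'
require 'uri'

# Hypothetical helper: injects percent-encoded credentials into a URL.
# Stdlib URI requires a user before a password can be set, so the user
# is assigned first (the documented method assigns the password first).
def url_with_credentials(url, user: nil, password: nil)
  uri = URI.parse(url)
  return uri.to_s if user.to_s.empty? && password.to_s.empty?

  uri.user = ERB::Util.url_encode(user) unless user.to_s.empty?
  uri.password = ERB::Util.url_encode(password) unless password.to_s.empty?
  uri.to_s
end

puts url_with_credentials(
  'https://gitlab.example.com/group/project.git',
  user: 'alice', password: 'p@ss:w/rd'
)
```

Encoding matters because characters such as `@`, `:` and `/` inside a password would otherwise be read as URL delimiters.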

#masked_url ⇒ Object

# File 'lib/gitlab/url_sanitizer.rb', line 102

def masked_url
  url = @url.dup
  url.password = MASK if url.password.present?
  url.user = MASK if url.user.present?
  reverse_schemify(url.to_s)
end

#sanitized_url ⇒ Object

# File 'lib/gitlab/url_sanitizer.rb', line 94

def sanitized_url
  safe_url = @url.dup
  safe_url.password = nil
  safe_url.user = nil
  reverse_schemify(safe_url.to_s)
end
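
Where `masked_url` hides credentials, `sanitized_url` removes them. A minimal sketch of the removal, again assuming stdlib `URI` instead of Addressable and a hypothetical method name `strip_credentials`:

```ruby
require 'uri'

# Drops user and password entirely, leaving the rest of the URL intact.
def strip_credentials(url)
  uri = URI.parse(url)
  uri.password = nil
  uri.user = nil
  uri.to_s
end

puts strip_credentials('https://alice:s3cret@gitlab.example.com/group/project.git')
```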

#user ⇒ Object

# File 'lib/gitlab/url_sanitizer.rb', line 90

def user
  credentials[:user]
end