Class: Gitlab::UrlSanitizer

Inherits:
Object
Includes:
Gitlab::Utils::StrongMemoize
Defined in:
lib/gitlab/url_sanitizer.rb

Constant Summary

ALLOWED_SCHEMES =
%w[http https ssh git].freeze
ALLOWED_WEB_SCHEMES =
%w[http https].freeze
SCHEMIFIED_SCHEME =
'glschemelessuri'
SCHEMIFY_PLACEHOLDER =
"#{SCHEMIFIED_SCHEME}://".freeze
URI_REGEXP =

URI::DEFAULT_PARSER.make_regexp only matches URLs with schemes or relative URLs. The second alternative in this expression also matches schemeless URIs with userinfo (for example, user:password@host), but it does not match scp-style URIs (for example, user@server:path/to/file).

The userinfo part is much looser than URI's implementation, so it also matches userinfo containing unescaped characters that should have been percent-encoded.

%r{
(?:
   #{URI::DEFAULT_PARSER.make_regexp(ALLOWED_SCHEMES)}
 |
   (?:(?:(?!@)[%#{URI::REGEXP::PATTERN::UNRESERVED}#{URI::REGEXP::PATTERN::RESERVED}])+(?:@))
   (?# negative lookahead ensures this isn't an SCP-style URL: [host]:[rel_path|abs_path] server:path/to/file)
   (?!#{URI::REGEXP::PATTERN::HOST}:(?:#{URI::REGEXP::PATTERN::REL_PATH}|#{URI::REGEXP::PATTERN::ABS_PATH}))
   #{URI::REGEXP::PATTERN::HOSTPORT}
)
}x
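The role of the negative lookahead can be illustrated with a much smaller pattern. The sketch below is a simplified stand-in, not GitLab's URI_REGEXP: the constant name and the character classes are assumptions chosen for readability, and it covers only the schemeless branch.

```ruby
# Simplified stand-in for the schemeless branch of URI_REGEXP.
# SCHEMELESS_WITH_USERINFO is a hypothetical name; the character classes
# are far looser than the URI::REGEXP::PATTERN fragments used above.
SCHEMELESS_WITH_USERINFO = %r{
  [^\s@/]+@              # userinfo: anything up to the first "@"
  (?![\w.-]+:[^\d\s])    # reject host:path, which would be scp-style
  [\w.-]+(?::\d+)?       # host with an optional numeric port
}x

# Schemeless URI with userinfo: matched.
puts SCHEMELESS_WITH_USERINFO.match?('user:pass@gitlab.com')      # => true

# scp-style URI: rejected by the negative lookahead.
puts SCHEMELESS_WITH_USERINFO.match?('user@server:path/to/file')  # => false
```

Because the lookahead only rejects a colon followed by a non-digit, this simplified version still accepts a numeric port after the host.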
MASKED_USERINFO_REGEX =

This expression is derived from `URI::REGEXP::PATTERN::USERINFO`, with `{` and `}` added to the list of allowed characters so that the userinfo portion of a URL may contain masked segments, e.g. myuser:{masked_password}@{masked_domain}.com/{masked_hook}

%r{(?:[\\-_.!~*'()a-zA-Z\d;:&=+$,{}]|%[a-fA-F\d]{2})*}


Constructor Details

#initialize(url, credentials: nil) ⇒ UrlSanitizer

Returns a new instance of UrlSanitizer.



# File 'lib/gitlab/url_sanitizer.rb', line 67

def initialize(url, credentials: nil)
  %i[user password].each do |symbol|
    credentials[symbol] = credentials[symbol].presence if credentials&.key?(symbol)
  end

  @credentials = credentials
  @url = parse_url(url)
end

Class Method Details

.sanitize(content) ⇒ Object



# File 'lib/gitlab/url_sanitizer.rb', line 36

def self.sanitize(content)
  content.gsub(URI_REGEXP) do |url|
    new(url).masked_url
  rescue Addressable::URI::InvalidURIError
    ''
  end
end

.sanitize_masked_url(url) ⇒ Object

The URL associated with records like `WebHookLog` may contain masked portions, represented by paired curly brackets in the URL. Because these prevent straightforward parsing of the URL, a variation of the existing USERINFO regex is used for these cases.



# File 'lib/gitlab/url_sanitizer.rb', line 63

def self.sanitize_masked_url(url)
  url.gsub(%r{//#{MASKED_USERINFO_REGEX}@}o, '//*****:*****@')
end
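Since this method relies only on a regular expression, it can be reproduced in isolation. The sketch below copies the constant and the substitution from the source above into a free-standing method; the sample URLs are assumptions made up for illustration.

```ruby
# Copied from the constant above so the example is self-contained.
MASKED_USERINFO_REGEX = %r{(?:[\\-_.!~*'()a-zA-Z\d;:&=+$,{}]|%[a-fA-F\d]{2})*}

# Free-standing copy of .sanitize_masked_url.
def sanitize_masked_url(url)
  url.gsub(%r{//#{MASKED_USERINFO_REGEX}@}o, '//*****:*****@')
end

# Userinfo containing a masked segment is replaced as a whole.
puts sanitize_masked_url('http://myuser:{masked_password}@{masked_domain}.com/{masked_hook}')
# => http://*****:*****@{masked_domain}.com/{masked_hook}

# A URL without userinfo is returned unchanged.
puts sanitize_masked_url('https://example.com/{masked_hook}')
# => https://example.com/{masked_hook}
```

Masked segments outside the userinfo, such as the host or path placeholders, are left as they are.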

.valid?(url, allowed_schemes: ALLOWED_SCHEMES) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/gitlab/url_sanitizer.rb', line 44

def self.valid?(url, allowed_schemes: ALLOWED_SCHEMES)
  return false unless url.present?
  return false unless url.is_a?(String)

  uri = Addressable::URI.parse(url.strip)

  allowed_schemes.include?(uri.scheme)
rescue Addressable::URI::InvalidURIError
  false
end
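The scheme check can be approximated without Rails or Addressable. This is a rough sketch, assuming Ruby's standard `uri` parser as a substitute; it is stricter than Addressable about what it accepts, and `scheme_allowed?` is a hypothetical name.

```ruby
require 'uri'

ALLOWED_SCHEMES = %w[http https ssh git].freeze

# Returns true only for non-blank strings whose scheme is in the allowed list.
def scheme_allowed?(url, allowed_schemes: ALLOWED_SCHEMES)
  return false unless url.is_a?(String)
  return false if url.strip.empty?

  allowed_schemes.include?(URI.parse(url.strip).scheme)
rescue URI::InvalidURIError
  false
end

puts scheme_allowed?('https://gitlab.com/group/project.git')                  # => true
puts scheme_allowed?('ftp://example.com/file')                                # => false
puts scheme_allowed?('ssh://git@example.com/a.git', allowed_schemes: %w[http https])  # => false
puts scheme_allowed?(nil)                                                     # => false
```

Passing a narrower `allowed_schemes` list is how a web-only check in the style of `.valid_web?` can be expressed.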

.valid_web?(url) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/gitlab/url_sanitizer.rb', line 55

def self.valid_web?(url)
  valid?(url, allowed_schemes: ALLOWED_WEB_SCHEMES)
end

Instance Method Details

#credentials ⇒ Object



# File 'lib/gitlab/url_sanitizer.rb', line 76

def credentials
  @credentials ||= { user: @url.user.presence, password: @url.password.presence }
end

#full_url ⇒ Object



# File 'lib/gitlab/url_sanitizer.rb', line 100

def full_url
  return reverse_schemify(@url.to_s) unless valid_credentials?

  url = @url.dup
  url.password = encode_percent(credentials[:password]) if credentials[:password].present?
  url.user = encode_percent(credentials[:user]) if credentials[:user].present?
  reverse_schemify(url.to_s)
end
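Re-attaching credentials with percent-encoding might look like the sketch below. It rests on several assumptions: the private `encode_percent` and `valid_credentials?` methods are not shown in this document, so `percent_encode` and `url_with_credentials` are hypothetical stand-ins, and Ruby's standard `uri` library replaces Addressable. Unlike the method above, the standard library requires a user before a password can be set, so the sketch assigns the user first.

```ruby
require 'uri'

# Hypothetical stand-in for the private encode_percent method.
def percent_encode(value)
  # encode_www_form_component writes spaces as "+"; use "%20" instead.
  URI.encode_www_form_component(value).gsub('+', '%20')
end

# Returns the URL with percent-encoded credentials embedded in it.
def url_with_credentials(url, user: nil, password: nil)
  uri = URI.parse(url)
  return uri.to_s if user.to_s.empty?

  uri.user = percent_encode(user)
  uri.password = percent_encode(password) unless password.to_s.empty?
  uri.to_s
end

puts url_with_credentials('https://example.com/repo.git', user: 'alice', password: 'p@ss word')
# => https://alice:p%40ss%20word@example.com/repo.git
```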

#masked_url ⇒ Object



# File 'lib/gitlab/url_sanitizer.rb', line 92

def masked_url
  url = @url.dup
  url.password = "*****" if url.password.present?
  url.user = "*****" if url.user.present?
  reverse_schemify(url.to_s)
end

#sanitized_url ⇒ Object



# File 'lib/gitlab/url_sanitizer.rb', line 84

def sanitized_url
  safe_url = @url.dup
  safe_url.password = nil
  safe_url.user = nil
  reverse_schemify(safe_url.to_s)
end
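Taken together with #credentials, this method separates a URL into a credential-free URL and the credentials it carried. A minimal sketch of that split, assuming Ruby's standard `uri` library in place of Addressable and a hypothetical `split_credentials` helper:

```ruby
require 'uri'

# Returns the URL without userinfo, plus the credentials that were removed.
def split_credentials(url)
  uri = URI.parse(url)
  credentials = { user: uri.user, password: uri.password }

  # Clear the password before the user, as in the method above.
  uri.password = nil
  uri.user = nil

  [uri.to_s, credentials]
end

safe_url, credentials = split_credentials('https://alice:s3cret@example.com/repo.git')
puts safe_url               # => https://example.com/repo.git
puts credentials[:user]     # => alice
puts credentials[:password] # => s3cret
```

Keeping the two parts separate lets the credential-free URL be logged or displayed while the credentials are stored elsewhere.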

#user ⇒ Object



# File 'lib/gitlab/url_sanitizer.rb', line 80

def user
  credentials[:user]
end