Class: Gitlab::UntrustedRegexp

Inherits:
Object
Defined in:
lib/gitlab/untrusted_regexp.rb,
lib/gitlab/untrusted_regexp/ruby_syntax.rb

Overview

An untrusted regular expression is any regexp containing patterns sourced from user input.

Ruby’s built-in regular expression library allows patterns which complete in exponential time, permitting denial-of-service attacks.

Not all regular expression features are available in untrusted regexes, and there is a strict limit on total execution time. See the RE2 documentation at github.com/google/re2/wiki/Syntax for more details.

This class doesn’t change any instance variables after initialization, which allows instances to be frozen and set up in constants.

Defined Under Namespace

Classes: RubySyntax

Constant Summary

BACKSLASH_R =
'(\n|\v|\f|\r|\x{0085}|\x{2028}|\x{2029}|\r\n)'

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(pattern, multiline: false) ⇒ UntrustedRegexp

Returns a new instance of UntrustedRegexp.

Raises:

  • (RegexpError)


# File 'lib/gitlab/untrusted_regexp.rb', line 25

def initialize(pattern, multiline: false)
  if multiline
    pattern = "(?m)#{pattern}"
  end

  @regexp = RE2::Regexp.new(pattern, log_errors: false)
  @scan_regexp = initialize_scan_regexp

  raise RegexpError, regexp.error unless regexp.ok?
end
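
As a rough illustration of the `multiline:` option, the sketch below reproduces only the pattern-prefixing step of the constructor in plain Ruby. `effective_pattern` is a hypothetical helper, not part of this class. In RE2 syntax the `m` flag makes `^` and `$` match at line boundaries as well as at the start and end of the text.

```ruby
# Hypothetical helper mirroring the prefixing step in #initialize.
# It only builds the pattern string; it does not compile anything.
def effective_pattern(pattern, multiline: false)
  multiline ? "(?m)#{pattern}" : pattern
end

puts effective_pattern('^foo$')
puts effective_pattern('^foo$', multiline: true)
```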

Class Method Details

.with_fallback(pattern, multiline: false) ⇒ Object

Handles regular expressions with the preferred RE2 library where possible via UntrustedRegexp. Falls back to Ruby’s built-in regular expression library when the syntax would be invalid in RE2.

One difference between the two engines is `(?m)` multi-line mode. Ruby’s regexp engine enables this behavior by default, but it also handles `^` and `$` differently. See: www.regular-expressions.info/modifiers.html



# File 'lib/gitlab/untrusted_regexp.rb', line 111

def self.with_fallback(pattern, multiline: false)
  UntrustedRegexp.new(pattern, multiline: multiline)
rescue RegexpError
  raise if Feature.enabled?(:disable_unsafe_regexp)

  if Feature.enabled?(:ci_unsafe_regexp_logger, type: :ops)
    Gitlab::AppJsonLogger.info(
      class: self.name,
      regexp: pattern.to_s,
      fabricated: 'unsafe ruby regexp'
    )
  end

  Regexp.new(pattern)
end
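
The fallback flow can be sketched in plain Ruby without RE2 or GitLab's feature flags. Everything named here is an assumption made for illustration: `strict_compile` is a hypothetical stand-in for the strict engine that rejects lookarounds and backreferences (two features RE2 does not support), and `fallback_allowed:` loosely plays the role of the `disable_unsafe_regexp` flag check.

```ruby
# Hypothetical error type for the stand-in strict engine.
class StrictEngineError < StandardError; end

# Stand-in for the strict engine: refuse lookarounds and backreferences,
# otherwise compile with Ruby's built-in Regexp.
def strict_compile(pattern)
  if pattern.match?(/\(\?<?[=!]|\\[1-9]/)
    raise StrictEngineError, "unsupported syntax: #{pattern}"
  end

  Regexp.new(pattern)
end

# Try the strict engine first; fall back to the permissive one only when allowed.
def compile_with_fallback(pattern, fallback_allowed: true)
  [:strict, strict_compile(pattern)]
rescue StrictEngineError
  raise unless fallback_allowed

  [:fallback, Regexp.new(pattern)]
end

kind, = compile_with_fallback('foo(?=bar)')
puts kind
```

Returning a tag alongside the compiled object is a choice made for this sketch only, so that the caller can see which path was taken.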

Instance Method Details

#==(other) ⇒ Object



# File 'lib/gitlab/untrusted_regexp.rb', line 100

def ==(other)
  self.source == other.source
end

#extract_named_group(name, match) ⇒ Object

#scan returns an array of the captured groups rather than MatchData. Use this method to look up the value for a given capture group name in one of those match arrays.

Raises:

  • (RegexpError)


# File 'lib/gitlab/untrusted_regexp.rb', line 91

def extract_named_group(name, match)
  return unless match

  match_position = regexp.named_capturing_groups[name.to_s]
  raise RegexpError, "Invalid named capture group: #{name}" unless match_position

  match[match_position - 1]
end
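
A minimal sketch of the same lookup, assuming Ruby's built-in Regexp as a stand-in for RE2. `named_positions` is a hypothetical helper that builds a name-to-position map with 1-based positions, which is the shape the method above reads from `named_capturing_groups`.

```ruby
# Hypothetical helper: map each group name to its 1-based position.
def named_positions(regexp)
  regexp.names.each_with_index.to_h { |name, index| [name, index + 1] }
end

# Same lookup logic as the method above, with the position map passed in.
def extract_named_group(positions, name, match)
  return unless match

  position = positions[name.to_s]
  raise RegexpError, "Invalid named capture group: #{name}" unless position

  # Scan rows are 0-indexed arrays, so shift the 1-based position down by one.
  match[position - 1]
end

pair = /(?<key>\w+)=(?<value>\w+)/
positions = named_positions(pair)
rows = 'a=1 b=2'.scan(pair)
puts rows.map { |row| extract_named_group(positions, :value, row) }.inspect
```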

#match(text) ⇒ Object



# File 'lib/gitlab/untrusted_regexp.rb', line 75

def match(text)
  scan_regexp.match(text)
end

#match?(text, allow_empty_string: false) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/gitlab/untrusted_regexp.rb', line 79

def match?(text, allow_empty_string: false)
  return false if text.nil?

  (allow_empty_string || text.present?) && scan(text).present?
end
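
The guard logic can be illustrated without ActiveSupport or RE2. In this sketch `present_text?` is a hypothetical approximation of `String#present?` (false for empty or whitespace-only strings), and Ruby's `String#scan` stands in for the class's own `#scan`.

```ruby
# Approximation of ActiveSupport's String#present?.
def present_text?(text)
  !text.match?(/\A[[:space:]]*\z/)
end

# Mirrors the guard order of #match?: nil never matches, and blank text
# only reaches the scan when allow_empty_string is set.
def match_sketch?(regexp, text, allow_empty_string: false)
  return false if text.nil?

  (allow_empty_string || present_text?(text)) && !text.scan(regexp).empty?
end

puts match_sketch?(/\d*/, '')
puts match_sketch?(/\d*/, '', allow_empty_string: true)
```

The practical consequence is that a pattern able to match the empty string still reports no match for blank input unless the caller opts in.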

#replace(text, rewrite) ⇒ Object



# File 'lib/gitlab/untrusted_regexp.rb', line 85

def replace(text, rewrite)
  RE2.Replace(text, regexp, rewrite)
end

#replace_all(text, rewrite) ⇒ Object



# File 'lib/gitlab/untrusted_regexp.rb', line 36

def replace_all(text, rewrite)
  RE2.GlobalReplace(text, regexp, rewrite)
end
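
`RE2.Replace` rewrites only the first match, while `RE2.GlobalReplace` rewrites every match. The closest built-in analogues are `String#sub` and `String#gsub`, which the sketch below uses as stand-ins; `replace_first` and `replace_every` are hypothetical names chosen for illustration.

```ruby
# Stand-in for #replace: rewrite the first match only.
def replace_first(text, regexp, rewrite)
  text.sub(regexp, rewrite)
end

# Stand-in for #replace_all: rewrite every match.
def replace_every(text, regexp, rewrite)
  text.gsub(regexp, rewrite)
end

puts replace_first('one-two-three', /-/, '+')
puts replace_every('one-two-three', /-/, '+')
# Numbered groups can be referenced from the rewrite string.
puts replace_first('key=value', /(\w+)=(\w+)/, '\2=\1')
```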

#replace_gsub(text, limit: 0) ⇒ Object

There is no built-in replace with block support (like `gsub`). We can accomplish the same thing by parsing and rebuilding the string with the substitutions.



# File 'lib/gitlab/untrusted_regexp.rb', line 42

def replace_gsub(text, limit: 0)
  return enum_for(:replace_gsub, text, limit:) unless block_given?

  new_text = +''
  remainder = text
  count = 0

  matched = match(remainder)

  until matched.nil? || matched.to_a.compact.empty?
    partitioned = remainder.partition(matched.to_s)
    new_text << partitioned.first
    remainder = partitioned.last

    new_text << yield(matched)

    if limit > 0
      count += 1
      break if count >= limit
    end

    matched = match(remainder)
  end

  new_text << remainder
end
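
The partition-and-rebuild loop can be tried in isolation with Ruby's built-in Regexp standing in for RE2. `replace_each` is a hypothetical name, and the empty-match guard in the loop condition is an addition in this sketch (it keeps a pattern that matches the empty string from looping forever); it is not taken from the method above.

```ruby
# Rebuild the string piece by piece, yielding each match to the block.
def replace_each(regexp, text, limit: 0)
  new_text = +''
  remainder = text
  count = 0

  matched = regexp.match(remainder)

  # Stop on no match, or on an empty match (guard added for this sketch).
  until matched.nil? || matched.to_s.empty?
    before, _, after = remainder.partition(matched.to_s)
    new_text << before          # text preceding the match is copied as-is
    new_text << yield(matched)  # the block decides the substitution
    remainder = after

    if limit > 0
      count += 1
      break if count >= limit
    end

    matched = regexp.match(remainder)
  end

  new_text << remainder
end

puts replace_each(/\d+/, 'foo 12 bar 345') { |m| "<#{m[0]}>" }
puts replace_each(/\d+/, 'foo 12 bar 345', limit: 1) { |m| "<#{m[0]}>" }
```

A `limit` of 0 means no limit, matching the default in the method above.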

#scan(text) ⇒ Object



# File 'lib/gitlab/untrusted_regexp.rb', line 69

def scan(text)
  matches = scan_regexp.scan(text).to_a
  matches.map!(&:first) if regexp.number_of_capturing_groups == 0
  matches
end
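
The flattening step can be shown on plain arrays. This sketch assumes, based on the `map!(&:first)` call above, that each raw scan row is a one-element array when the pattern has no capture groups. `flatten_scan_rows` is a hypothetical helper and the input rows are made-up sample data.

```ruby
# Unwrap one-element rows when the pattern had no capture groups,
# so the result is a flat list of matched strings.
def flatten_scan_rows(rows, capturing_groups)
  rows = rows.map(&:first) if capturing_groups == 0
  rows
end

puts flatten_scan_rows([['a'], ['b']], 0).inspect
puts flatten_scan_rows([['a', '1'], ['b', '2']], 2).inspect
```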