Class: Gitlab::UntrustedRegexp

Inherits:
Object
  • Object
Defined in:
lib/gitlab/untrusted_regexp.rb,
lib/gitlab/untrusted_regexp/ruby_syntax.rb

Overview

An untrusted regular expression is any regexp containing patterns sourced from user input.

Ruby’s built-in regular expression library allows patterns which complete in exponential time, permitting denial-of-service attacks.

Not all regular expression features are available in untrusted regexes, and there is a strict limit on total execution time. See the RE2 documentation at github.com/google/re2/wiki/Syntax for more details.

This class never mutates its instance variables after construction, so instances can be frozen and assigned to constants.

Defined Under Namespace

Classes: RubySyntax

Constant Summary

BACKSLASH_R =
'(\n|\v|\f|\r|\x{0085}|\x{2028}|\x{2029}|\r\n)'

Constructor Details

#initialize(pattern, multiline: false) ⇒ UntrustedRegexp

Returns a new instance of UntrustedRegexp.

Raises:

  • (RegexpError)


# File 'lib/gitlab/untrusted_regexp.rb', line 25

def initialize(pattern, multiline: false)
  if multiline
    pattern = "(?m)#{pattern}"
  end

  @regexp = RE2::Regexp.new(pattern, log_errors: false)
  @scan_regexp = initialize_scan_regexp

  raise RegexpError, regexp.error unless regexp.ok?
end

Class Method Details

.with_fallback(pattern, multiline: false) ⇒ Object

Handles regular expressions with the preferred RE2 library where possible, via UntrustedRegexp. Falls back to Ruby’s built-in regular expression library when the syntax would be invalid in RE2.

One difference between the two engines is `(?m)` multi-line mode. In Ruby, `^` and `$` always match at line boundaries, which other engines enable only with `(?m)`; Ruby’s own `(?m)` instead makes `.` match newlines. See: www.regular-expressions.info/modifiers.html



# File 'lib/gitlab/untrusted_regexp.rb', line 101

def self.with_fallback(pattern, multiline: false)
  UntrustedRegexp.new(pattern, multiline: multiline)
rescue RegexpError
  raise if Feature.enabled?(:disable_unsafe_regexp)

  if Feature.enabled?(:ci_unsafe_regexp_logger, type: :ops)
    Gitlab::AppJsonLogger.info(
      class: self.name,
      regexp: pattern.to_s,
      fabricated: 'unsafe ruby regexp'
    )
  end

  Regexp.new(pattern)
end
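
The try-strict-then-fall-back flow above can be tried without GitLab or the `re2` gem. The following is a minimal sketch under stated assumptions: `strict_compile`, `compile_with_fallback`, `UNSUPPORTED_IN_RE2_SKETCH` and the `allow_unsafe:` flag are hypothetical names, the strict engine is simulated by rejecting lookaround and backreferences (two features RE2 does not support), and the flag stands in for the `:disable_unsafe_regexp` feature check.

```ruby
# Hypothetical stand-in for the RE2-backed constructor. It only simulates
# strictness by rejecting lookaround and backreferences, which RE2 lacks.
UNSUPPORTED_IN_RE2_SKETCH = /\(\?<?[=!]|\\[1-9]/

def strict_compile(pattern)
  if pattern.match?(UNSUPPORTED_IN_RE2_SKETCH)
    raise RegexpError, "unsupported syntax: #{pattern}"
  end

  Regexp.new(pattern)
end

# Try the strict engine first. On failure, re-raise when the fallback is
# disabled; otherwise compile with Ruby's built-in engine.
def compile_with_fallback(pattern, allow_unsafe: true)
  strict_compile(pattern)
rescue RegexpError
  raise unless allow_unsafe

  Regexp.new(pattern)
end

safe = compile_with_fallback('ab+c')           # accepted by the strict path
unsafe = compile_with_fallback('foo(?=bar)')   # lookahead, uses the fallback
```

As in the method above, the fallback branch in this sketch compiles the pattern with plain `Regexp.new` and does not forward any multiline option.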

Instance Method Details

#==(other) ⇒ Object



# File 'lib/gitlab/untrusted_regexp.rb', line 90

def ==(other)
  self.source == other.source
end

#extract_named_group(name, match) ⇒ Object

`#scan` returns arrays of captured groups rather than MatchData. Pass this method a capture group name and one scan result to get the value captured by that group.

Raises:

  • (RegexpError)


# File 'lib/gitlab/untrusted_regexp.rb', line 81

def extract_named_group(name, match)
  return unless match

  match_position = regexp.named_capturing_groups[name.to_s]
  raise RegexpError, "Invalid named capture group: #{name}" unless match_position

  match[match_position - 1]
end
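
The name-to-position lookup can be explored without the `re2` gem. This is a minimal sketch that uses Ruby's built-in `Regexp` as a stand-in: `String#scan` likewise returns arrays of captured groups, and `Regexp#names` lists the named groups in order. The helper `extract_named_group_sketch` is hypothetical, and its zero-based `names.index` lookup replaces the one-based position that the method above reads from `named_capturing_groups` (hence the `- 1` there).

```ruby
# Stand-in sketch: Ruby's built-in Regexp instead of RE2 (assumption).
# `extract_named_group_sketch` is a hypothetical helper, not GitLab code.
def extract_named_group_sketch(regexp, name, match)
  return unless match

  # Regexp#names lists named groups in order, so the index is zero-based.
  position = regexp.names.index(name.to_s)
  raise RegexpError, "Invalid named capture group: #{name}" unless position

  match[position]
end

pattern = /(?<key>\w+)=(?<value>\w+)/
rows = 'a=1 b=2'.scan(pattern) # each row is an array of captured groups

values = rows.map { |row| extract_named_group_sketch(pattern, :value, row) }
# values == ["1", "2"]
```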

#match(text) ⇒ Object



# File 'lib/gitlab/untrusted_regexp.rb', line 67

def match(text)
  scan_regexp.match(text)
end

#match?(text) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/gitlab/untrusted_regexp.rb', line 71

def match?(text)
  text.present? && scan(text).present?
end

#replace(text, rewrite) ⇒ Object



# File 'lib/gitlab/untrusted_regexp.rb', line 75

def replace(text, rewrite)
  RE2.Replace(text, regexp, rewrite)
end

#replace_all(text, rewrite) ⇒ Object



# File 'lib/gitlab/untrusted_regexp.rb', line 36

def replace_all(text, rewrite)
  RE2.GlobalReplace(text, regexp, rewrite)
end

#replace_gsub(text) ⇒ Object

There is no built-in replace with block support (like `gsub`). We can accomplish the same thing by parsing and rebuilding the string with the substitutions.



# File 'lib/gitlab/untrusted_regexp.rb', line 42

def replace_gsub(text)
  new_text = +''
  remainder = text

  matched = match(remainder)

  until matched.nil? || matched.to_a.compact.empty?
    partitioned = remainder.partition(matched.to_s)
    new_text << partitioned.first
    remainder = partitioned.last

    new_text << yield(matched)

    matched = match(remainder)
  end

  new_text << remainder
end
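
The partition-and-rebuild loop can be run on its own. Below is a minimal sketch, assuming Ruby's built-in `Regexp#match` as a stand-in for the RE2-backed `#match`; the helper name `gsub_by_partition` is hypothetical.

```ruby
# Stand-in sketch of the partition-and-rebuild loop, using Ruby's built-in
# Regexp in place of RE2 (assumption). `gsub_by_partition` is hypothetical.
def gsub_by_partition(regexp, text)
  new_text = +''
  remainder = text
  matched = regexp.match(remainder)

  until matched.nil? || matched.to_a.compact.empty?
    # Split around the first occurrence of the matched substring.
    before, _hit, after = remainder.partition(matched.to_s)
    new_text << before
    new_text << yield(matched) # the block supplies the replacement text
    remainder = after
    matched = regexp.match(remainder)
  end

  new_text << remainder
end

result = gsub_by_partition(/\d+/, 'a1b22c') { |m| "<#{m[0]}>" }
# result == "a<1>b<22>c"
```

In this stand-in, a pattern that can match the empty string never shrinks the remainder, so the loop would not terminate; the example sticks to patterns that always consume at least one character.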

#scan(text) ⇒ Object



# File 'lib/gitlab/untrusted_regexp.rb', line 61

def scan(text)
  matches = scan_regexp.scan(text).to_a
  matches.map!(&:first) if regexp.number_of_capturing_groups == 0
  matches
end
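
The `map!(&:first)` step only makes sense if scanning a group-free pattern yields one-element arrays. The sketch below shows that shape with Ruby's built-in `Regexp`. It assumes that `initialize_scan_regexp` (not shown on this page) wraps a pattern without capturing groups in a single capture group; `capture_count` and `scan_sketch` are hypothetical helpers.

```ruby
# Stand-in sketch with Ruby's built-in Regexp instead of RE2 (assumption).
# `capture_count` and `scan_sketch` are hypothetical helpers.
def capture_count(source)
  # Appending an empty alternative lets the pattern match '' so that
  # MatchData#captures reports one slot per capturing group.
  Regexp.new("#{source}|").match('').captures.size
end

def scan_sketch(source, text)
  groups = capture_count(source)
  # Assumed behaviour: a group-free pattern is wrapped in one group.
  scan_source = groups.zero? ? "(#{source})" : source

  matches = text.scan(Regexp.new(scan_source))
  # One-element rows are flattened to plain strings.
  matches.map!(&:first) if groups.zero?
  matches
end

plain = scan_sketch('\d+', 'a1b22')            # ["1", "22"]
grouped = scan_sketch('(\w)=(\d)', 'a=1 b=2')  # [["a", "1"], ["b", "2"]]
```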