Class: Gitlab::UntrustedRegexp
- Inherits: Object
  - Object
    - Gitlab::UntrustedRegexp
- Defined in: lib/gitlab/untrusted_regexp.rb, lib/gitlab/untrusted_regexp/ruby_syntax.rb
Overview
An untrusted regular expression is any regexp containing patterns sourced from user input.
Ruby’s built-in regular expression library allows patterns which complete in exponential time, permitting denial-of-service attacks.
Not all regular expression features are available in untrusted regexes, and there is a strict limit on total execution time. See the RE2 documentation at github.com/google/re2/wiki/Syntax for more details.
This class doesn’t change any instance variables after initialization, so instances can be frozen and set up in constants.
Defined Under Namespace
Classes: RubySyntax
Constant Summary
- BACKSLASH_R =
  Recreates Ruby’s \R metacharacter. See ruby-doc.org/3.2.2/Regexp.html#class-Regexp-label-Character+Classes
  '(\n|\v|\f|\r|\x{0085}|\x{2028}|\x{2029}|\r\n)'
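As a point of reference, the following sketch shows what \R does in Ruby's own engine, which is the behavior this constant aims to recreate. It uses only Ruby's built-in Regexp, not the class documented here, and the variable names are illustrative.

```ruby
# \R treats "\r\n" as a single line break rather than two.
crlf_parts = "one\r\ntwo\nthree".split(/\R/)

# Vertical tab and form feed also count as line breaks for \R.
vt_ff_parts = "a\vb\fc".split(/\R/)

# The Ruby docs also list U+0085, U+2028 and U+2029 as \R line breaks,
# which is why the constant includes those code points.
unicode_parts = "x\u2028y".split(/\R/)
```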
Class Method Summary
- .with_fallback(pattern, multiline: false) ⇒ Object
  Handles regular expressions with the preferred RE2 library where possible via UntrustedRegexp.
Instance Method Summary
- #==(other) ⇒ Object
- #extract_named_group(name, match) ⇒ Object
  #scan returns an array of the groups captured, rather than MatchData.
- #initialize(pattern, multiline: false) ⇒ UntrustedRegexp (constructor)
  A new instance of UntrustedRegexp.
- #match(text) ⇒ Object
- #match?(text) ⇒ Boolean
- #replace(text, rewrite) ⇒ Object
- #replace_all(text, rewrite) ⇒ Object
- #replace_gsub(text, limit: 0) ⇒ Object
  There is no built-in replace with block support (like `gsub`).
- #scan(text) ⇒ Object
Constructor Details
#initialize(pattern, multiline: false) ⇒ UntrustedRegexp
Returns a new instance of UntrustedRegexp.
    # File 'lib/gitlab/untrusted_regexp.rb', line 25
    def initialize(pattern, multiline: false)
      if multiline
        pattern = "(?m)#{pattern}"
      end

      @regexp = RE2::Regexp.new(pattern, log_errors: false)
      @scan_regexp = initialize_scan_regexp

      raise RegexpError, regexp.error unless regexp.ok?
    end
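The `(?m)` prefix means different things in the two engines. In RE2's documented syntax, the `m` flag makes `^` and `$` match at line boundaries as well as at the start and end of the text, and it is off by default. The sketch below uses only Ruby's built-in Regexp to show the Ruby side of that difference. `re2_source_sketch` is a hypothetical helper that mirrors how the constructor builds the pattern string, not part of the class.

```ruby
text = "first\nsecond"

# Ruby: ^ and $ already match at line boundaries without any flag.
ruby_line_anchor = /^second$/.match?(text)

# Ruby: \A and \z anchor to the whole string instead.
ruby_text_anchor = /\Asecond\z/.match?(text)

# Ruby: (?m) changes what "." matches (it then includes "\n").
dot_without_m = Regexp.new('first.second').match?(text)
dot_with_m = Regexp.new('(?m)first.second').match?(text)

# Hypothetical helper mirroring the constructor's prefixing step.
def re2_source_sketch(pattern, multiline: false)
  multiline ? "(?m)#{pattern}" : pattern
end

prefixed = re2_source_sketch('^second$', multiline: true)
```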
Class Method Details
.with_fallback(pattern, multiline: false) ⇒ Object
Handles regular expressions with the preferred RE2 library where possible via UntrustedRegexp. Falls back to Ruby’s built-in regular expression library when the syntax would be invalid in RE2.
One difference between the two engines is `(?m)` multi-line mode. Ruby regex enables this behavior by default, and it also handles `^` and `$` differently. See: www.regular-expressions.info/modifiers.html
    # File 'lib/gitlab/untrusted_regexp.rb', line 107
    def self.with_fallback(pattern, multiline: false)
      UntrustedRegexp.new(pattern, multiline: multiline)
    rescue RegexpError
      raise if Feature.enabled?(:disable_unsafe_regexp)

      if Feature.enabled?(:ci_unsafe_regexp_logger, type: :ops)
        Gitlab::AppJsonLogger.info(
          class: self.name,
          regexp: pattern.to_s,
          fabricated: 'unsafe ruby regexp'
        )
      end

      Regexp.new(pattern)
    end
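The rescue-and-fall-back flow above can be sketched without the RE2 dependency. Everything in this block is a stand-in: `StrictPatternSketch` is a hypothetical class that crudely rejects lookaround and backreferences (features RE2 does not support), and `allow_fallback` is a hypothetical parameter playing the role of the `:disable_unsafe_regexp` feature flag.

```ruby
# Hypothetical stand-in for an RE2-backed wrapper. The check is deliberately
# crude: it only looks for lookaround groups and numbered backreferences.
class StrictPatternSketch
  UNSUPPORTED = /\(\?<?[=!]|\\[1-9]/

  attr_reader :source

  def initialize(source)
    raise RegexpError, "unsupported syntax in #{source.inspect}" if UNSUPPORTED.match?(source)

    @source = source
  end
end

# Try the strict engine first; on RegexpError either re-raise or fall back
# to Ruby's built-in Regexp, depending on the (hypothetical) switch.
def compile_with_fallback_sketch(source, allow_fallback: true)
  StrictPatternSketch.new(source)
rescue RegexpError
  raise unless allow_fallback

  Regexp.new(source)
end

safe = compile_with_fallback_sketch('ab+c')
unsafe = compile_with_fallback_sketch('foo(?=bar)')
```

In this sketch the caller receives either the wrapper or a plain Regexp, so calling code should rely only on methods both objects respond to.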
Instance Method Details
#==(other) ⇒ Object
    # File 'lib/gitlab/untrusted_regexp.rb', line 96
    def ==(other)
      self.source == other.source
    end
#extract_named_group(name, match) ⇒ Object
#scan returns an array of the groups captured, rather than MatchData. Use this method to pass a capture group name and get the matching value from one of those arrays.
    # File 'lib/gitlab/untrusted_regexp.rb', line 87
    def extract_named_group(name, match)
      return unless match

      match_position = regexp.named_capturing_groups[name.to_s]
      raise RegexpError, "Invalid named capture group: #{name}" unless match_position

      match[match_position - 1]
    end
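The lookup above converts a 1-based group position into a 0-based array index. A minimal sketch of the same idea using only Ruby's built-in Regexp follows. `extract_named_group_sketch` and `positions` are illustrative names, and `Regexp#named_captures` stands in for `named_capturing_groups`.

```ruby
pattern = /(?<year>\d{4})-(?<month>\d{2})/

# String#scan yields one array of captures per match, without group names.
rows = '2024-05 2023-11'.scan(pattern)

# Regexp#named_captures maps each name to a list of 1-based group numbers;
# keep the first number for each name.
positions = pattern.named_captures.transform_values(&:first)

# Illustrative counterpart of #extract_named_group for plain arrays.
def extract_named_group_sketch(positions, name, row)
  return unless row

  position = positions[name.to_s]
  raise RegexpError, "Invalid named capture group: #{name}" unless position

  row[position - 1]
end

first_month = extract_named_group_sketch(positions, :month, rows.first)
second_year = extract_named_group_sketch(positions, 'year', rows.last)
```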
#match(text) ⇒ Object
    # File 'lib/gitlab/untrusted_regexp.rb', line 73
    def match(text)
      scan_regexp.match(text)
    end
#match?(text) ⇒ Boolean
    # File 'lib/gitlab/untrusted_regexp.rb', line 77
    def match?(text)
      text.present? && scan(text).present?
    end
#replace(text, rewrite) ⇒ Object
    # File 'lib/gitlab/untrusted_regexp.rb', line 81
    def replace(text, rewrite)
      RE2.Replace(text, regexp, rewrite)
    end
#replace_all(text, rewrite) ⇒ Object
    # File 'lib/gitlab/untrusted_regexp.rb', line 36
    def replace_all(text, rewrite)
      RE2.GlobalReplace(text, regexp, rewrite)
    end
#replace_gsub(text, limit: 0) ⇒ Object
There is no built-in replace with block support (like `gsub`). We can accomplish the same thing by parsing and rebuilding the string with the substitutions.
    # File 'lib/gitlab/untrusted_regexp.rb', line 42
    def replace_gsub(text, limit: 0)
      new_text = +''
      remainder = text
      count = 0

      matched = match(remainder)

      until matched.nil? || matched.to_a.compact.empty?
        partitioned = remainder.partition(matched.to_s)
        new_text << partitioned.first
        remainder = partitioned.last

        new_text << yield(matched)

        if limit > 0
          count += 1
          break if count >= limit
        end

        matched = match(remainder)
      end

      new_text << remainder
    end
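To see the parse-and-rebuild loop in action without RE2, the sketch below copies the same steps into a standalone function and substitutes Ruby's built-in Regexp for the wrapped pattern. `replace_gsub_sketch` is an illustrative name, and passing the regexp as an argument is an assumption made to keep the example self-contained.

```ruby
# Same loop as #replace_gsub, with a plain Regexp passed in explicitly.
def replace_gsub_sketch(regexp, text, limit: 0)
  new_text = +''
  remainder = text
  count = 0

  matched = regexp.match(remainder)

  until matched.nil? || matched.to_a.compact.empty?
    # Split around the matched text: [before, match, after].
    partitioned = remainder.partition(matched.to_s)
    new_text << partitioned.first
    remainder = partitioned.last

    # The block decides what replaces this match.
    new_text << yield(matched)

    if limit > 0
      count += 1
      break if count >= limit
    end

    matched = regexp.match(remainder)
  end

  new_text << remainder
end

all_replaced = replace_gsub_sketch(/\d+/, 'a1b22c333') { |m| "<#{m[0]}>" }
first_two = replace_gsub_sketch(/\d+/, 'a1b22c333', limit: 2) { |m| "<#{m[0]}>" }
```

In this sketch, a `limit` of 0 means no limit. A pattern that can match the empty string would never shorten `remainder`, so the sketch assumes patterns that always consume at least one character.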
#scan(text) ⇒ Object
    # File 'lib/gitlab/untrusted_regexp.rb', line 67
    def scan(text)
      matches = scan_regexp.scan(text).to_a
      matches.map!(&:first) if regexp.number_of_capturing_groups == 0

      matches
    end
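The `map!(&:first)` step determines the shape of the result. The sketch below isolates that step on plain arrays. It assumes, as the code above suggests, that the underlying scanner yields one array per match even when the pattern has no capturing groups. `normalize_scan_rows_sketch`, `rows` and `group_count` are illustrative names.

```ruby
# `rows` stands in for the raw scanner output and `group_count` for
# regexp.number_of_capturing_groups.
def normalize_scan_rows_sketch(rows, group_count)
  rows = rows.map(&:first) if group_count == 0
  rows
end

# No capturing groups: each one-element row collapses to a plain string.
without_groups = normalize_scan_rows_sketch([['foo'], ['bar']], 0)

# With capturing groups: rows are returned unchanged, one array per match.
with_groups = normalize_scan_rows_sketch([%w[k1 v1], %w[k2 v2]], 2)
```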