Module: SchemaGraphy::RegexpUtils
- Defined in: lib/schemagraphy/regexp_utils.rb
Overview
A utility module for robustly parsing and using regular expressions. It handles various formats, including literals and plain strings, and provides helpers for extracting captured content.
Class Method Summary
- .create_regexp(pattern, flags = '') ⇒ Regexp
  Create a Regexp object from a pattern string and explicit flags.
- .extract_all_captures(text, pattern_info) ⇒ Hash, ...
  Extract all named capture groups as a hash or positional captures as an array.
- .extract_capture(text, pattern_info, capture_name = nil) ⇒ String?
  Extract content using named or positional capture groups.
- .extract_flags_from_regexp(regexp) ⇒ String
  Extract a flags string from a compiled Regexp object.
- .flags_to_options(flags) ⇒ Integer
  Convert a flags string (ex: “im”) to a Regexp options integer.
- .parse_and_extract(text, pattern_input, capture_name = nil, default_flags = '') ⇒ String?
  A convenience method that combines parsing and a single extraction.
- .parse_and_extract_all(text, pattern_input, default_flags = '') ⇒ Hash, ...
  A convenience method that combines parsing and extraction of all captures.
- .parse_pattern(input, default_flags = '') ⇒ Hash?
  Parse a regex pattern string using the to_regexp gem for robust parsing.
- .parse_structured_pattern(pattern_hash) ⇒ Object
  Future enhancement to parse structured pattern definitions from a Hash.
- .parse_tagged_pattern(tagged_input, tag_type) ⇒ Object
  Future enhancement to parse custom YAML tags for regular expressions.
Class Method Details
.create_regexp(pattern, flags = '') ⇒ Regexp
Create a Regexp object from a pattern string and explicit flags.
# File 'lib/schemagraphy/regexp_utils.rb', line 157
def create_regexp pattern, flags = ''
  options = flags_to_options(flags)
  Regexp.new(pattern, options)
end
.extract_all_captures(text, pattern_info) ⇒ Hash, ...
Extract all named capture groups as a hash or positional captures as an array.
# File 'lib/schemagraphy/regexp_utils.rb', line 193
def extract_all_captures text, pattern_info
  return nil unless text && pattern_info

  regexp = pattern_info[:regexp]
  match = text.match(regexp)
  return nil unless match

  if match.names.any?
    # Return hash of named captures
    match.names.each_with_object({}) do |name, captures|
      captures[name] = match[name]
    end
  else
    # Return array of positional captures
    match.captures
  end
end
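The named-versus-positional branch rests on standard MatchData behaviour. The following is a minimal sketch in plain Ruby, not the module's own code: `all_captures` is a hypothetical stand-in that takes a Regexp directly instead of a `pattern_info` hash, and it uses `MatchData#named_captures` as a shortcut for the hash-building loop.

```ruby
# Hypothetical stand-in for extract_all_captures (assumption: accepts a
# Regexp directly rather than the pattern_info hash used by the module).
def all_captures(text, regexp)
  match = text.match(regexp)
  return nil unless match

  # Named groups take priority; otherwise fall back to positional captures.
  match.names.any? ? match.named_captures : match.captures
end

named      = all_captures('v1.2', /v(?<major>\d+)\.(?<minor>\d+)/)
positional = all_captures('v1.2', /v(\d+)\.(\d+)/)
no_match   = all_captures('none', /v(\d+)/)

# named      is a Hash with string keys: 'major' => '1', 'minor' => '2'
# positional is an Array: ['1', '2']
# no_match   is nil
```

Note that the hash keys are strings, not symbols, which matches what `match.names` yields in the method above.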
.extract_capture(text, pattern_info, capture_name = nil) ⇒ String?
Extract content using named or positional capture groups.
# File 'lib/schemagraphy/regexp_utils.rb', line 168
def extract_capture text, pattern_info, capture_name = nil
  return nil unless text && pattern_info

  regexp = pattern_info[:regexp]
  match = text.match(regexp)
  return nil unless match

  if capture_name && match.names.include?(capture_name.to_s)
    # Extract named capture group
    match[capture_name.to_s]
  elsif match.captures.any?
    # Extract first capture group
    match[1]
  else
    # Return the entire match
    match[0]
  end
end
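The fallback order (requested named group, then first capture group, then the whole match) can be illustrated with a small sketch in plain Ruby. `first_capture` is a hypothetical stand-in that mirrors the logic above but takes a Regexp directly instead of a `pattern_info` hash.

```ruby
# Hypothetical stand-in for extract_capture (assumption: accepts a Regexp
# directly rather than the pattern_info hash used by the module).
def first_capture(text, regexp, capture_name = nil)
  match = text.match(regexp)
  return nil unless match

  if capture_name && match.names.include?(capture_name.to_s)
    match[capture_name.to_s] # 1. the requested named group
  elsif match.captures.any?
    match[1]                 # 2. the first capture group
  else
    match[0]                 # 3. the entire match
  end
end

text = 'release-2024.06'

by_name  = first_capture(text, /(?<year>\d{4})\.(?<month>\d{2})/, :month) # "06"
by_index = first_capture(text, /(\d{4})\.(\d{2})/)                        # "2024"
whole    = first_capture(text, /\d{4}\.\d{2}/)                            # "2024.06"
unknown  = first_capture(text, /(?<year>\d{4})/, :missing)                # "2024"
```

An unknown capture name does not return nil: it falls through to the first group. Because `any?` ignores nil elements, a match whose groups all failed to participate also falls through to the whole match.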
.extract_flags_from_regexp(regexp) ⇒ String
Extract a flags string from a compiled Regexp object.
# File 'lib/schemagraphy/regexp_utils.rb', line 144
def extract_flags_from_regexp regexp
  flags = ''
  flags += 'i' if regexp.options.anybits?(Regexp::IGNORECASE)
  flags += 'm' if regexp.options.anybits?(Regexp::MULTILINE)
  flags += 'x' if regexp.options.anybits?(Regexp::EXTENDED)
  flags
end
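As a rough illustration of the bitmask test, the sketch below reads flags back from compiled regexps using only core Ruby (`Regexp#options` and `Integer#anybits?`). The name `flags_of` is hypothetical and stands in for the module method.

```ruby
# Hypothetical stand-in for extract_flags_from_regexp.
# Regexp#options returns an Integer bitmask; Integer#anybits? checks
# whether any bit of the given mask is set.
def flags_of(regexp)
  flags = ''
  flags += 'i' if regexp.options.anybits?(Regexp::IGNORECASE)
  flags += 'm' if regexp.options.anybits?(Regexp::MULTILINE)
  flags += 'x' if regexp.options.anybits?(Regexp::EXTENDED)
  flags
end

from_literal = flags_of(/a/mi)                                              # "im"
from_new     = flags_of(Regexp.new('a', Regexp::EXTENDED | Regexp::IGNORECASE)) # "ix"
no_flags     = flags_of(/a/)                                                # ""
```

The output order is always i, m, x, regardless of the order in which the flags were written. Any other bits in the mask, such as encoding bits, are ignored.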
.flags_to_options(flags) ⇒ Integer
Convert a flags string (ex: “im”) to a Regexp options integer.
# File 'lib/schemagraphy/regexp_utils.rb', line 126
def flags_to_options flags
  options = 0
  flags = flags.to_s

  options |= Regexp::IGNORECASE if flags.include?('i')
  options |= Regexp::MULTILINE if flags.include?('m')
  options |= Regexp::EXTENDED if flags.include?('x')
  # NOTE: 'g' (global) and 'o' (once) are not standard Ruby flags
  # encoding flags ('n', 'e', 's', 'u') are handled by to_regexp

  options
end
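The same mapping can be written as a lookup table. This is a sketch of an equivalent formulation in plain Ruby, not the module's implementation; `options_for` is a hypothetical name.

```ruby
# Hypothetical table-driven equivalent of flags_to_options.
def options_for(flags)
  flag_bits = {
    'i' => Regexp::IGNORECASE,
    'm' => Regexp::MULTILINE,
    'x' => Regexp::EXTENDED
  }
  flags = flags.to_s

  # OR together the bit for every recognised flag character; ignore the rest.
  flag_bits.reduce(0) do |options, (char, bit)|
    flags.include?(char) ? options | bit : options
  end
end

im_options = options_for('im') # Regexp::IGNORECASE | Regexp::MULTILINE
unknown    = options_for('g')  # 0, unrecognised flags contribute nothing
from_nil   = options_for(nil)  # 0, nil.to_s is the empty string

# The integer can be passed straight to Regexp.new.
regexp  = Regexp.new('hello.world', im_options)
matched = regexp.match?("HELLO\nWORLD") # true: case-insensitive, '.' matches newline
```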
.parse_and_extract(text, pattern_input, capture_name = nil, default_flags = '') ⇒ String?
A convenience method that combines parsing and a single extraction.
# File 'lib/schemagraphy/regexp_utils.rb', line 219
def parse_and_extract text, pattern_input, capture_name = nil, default_flags = ''
  pattern_info = parse_pattern(pattern_input, default_flags)
  extract_capture(text, pattern_info, capture_name)
end
.parse_and_extract_all(text, pattern_input, default_flags = '') ⇒ Hash, ...
A convenience method that combines parsing and extraction of all captures.
# File 'lib/schemagraphy/regexp_utils.rb', line 230
def parse_and_extract_all text, pattern_input, default_flags = ''
  pattern_info = parse_pattern(pattern_input, default_flags)
  extract_all_captures(text, pattern_info)
end
.parse_pattern(input, default_flags = '') ⇒ Hash?
Parse a regex pattern string using the to_regexp gem for robust parsing. Handles /pattern/flags, %r{pattern}flags, and plain text formats.
# File 'lib/schemagraphy/regexp_utils.rb', line 30
def parse_pattern input, default_flags = ''
  return nil if input.nil? || input.to_s.strip.empty?

  input_str = input.to_s.strip

  # Remove surrounding quotes that might come from YAML parsing
  clean_input = input_str.gsub(/^["']|["']$/, '')

  # Manual parsing for /pattern/flags format (common in YAML configs)
  if clean_input =~ %r{^/(.+)/([a-z]*)$}
    pattern_str = Regexp.last_match(1)
    flags_str = Regexp.last_match(2)
    options = flags_to_options(flags_str)

    begin
      regexp_obj = Regexp.new(pattern_str, options)
      return {
        pattern: pattern_str,
        flags: flags_str,
        regexp: regexp_obj,
        options: options
      }
    rescue RegexpError => e
      raise RegexpError, "Invalid regex pattern '#{input}': #{e.message}"
    end
  end

  # Heuristic to detect if it's a Regexp literal
  is_literal = clean_input.start_with?('%r{')

  if is_literal
    # Try to parse as regex literal using to_regexp
    begin
      regexp_obj = clean_input.to_regexp(detect: true)

      # Extract pattern and flags from the compiled regexp
      pattern_str = regexp_obj.source
      flags_str = extract_flags_from_regexp(regexp_obj)

      {
        pattern: pattern_str,
        flags: flags_str,
        regexp: regexp_obj,
        options: regexp_obj.options
      }
    rescue RegexpError => e
      # Malformed literal is an error
      raise RegexpError, "Invalid regex literal '#{input}': #{e.message}"
    end
  else
    # Treat as plain pattern string with default flags
    flags_str = default_flags.to_s
    options = flags_to_options(flags_str)

    begin
      regexp_obj = Regexp.new(clean_input, options)
      {
        pattern: clean_input,
        flags: flags_str,
        regexp: regexp_obj,
        options: options
      }
    rescue RegexpError => e
      raise RegexpError, "Invalid regex pattern '#{input}': #{e.message}"
    end
  end
end
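The three input shapes can be told apart with two anchored patterns and a default branch. The sketch below is a simplified, hypothetical classifier in plain Ruby: `classify_pattern` is not part of the module, it skips the quote stripping, and it parses the `%r{...}` form by hand where the method above delegates to the to_regexp gem.

```ruby
# Hypothetical classifier for the three input shapes (assumptions: no quote
# stripping, and the %r{...} form is parsed manually instead of via to_regexp).
def classify_pattern(input, default_flags = '')
  str = input.to_s.strip
  return nil if str.empty?

  case str
  when %r{\A/(.+)/([a-z]*)\z}m
    # /pattern/flags: greedy (.+) keeps inner slashes in the pattern
    { format: :slash, pattern: Regexp.last_match(1), flags: Regexp.last_match(2) }
  when /\A%r\{(.+)\}([a-z]*)\z/m
    # %r{pattern}flags
    { format: :percent_r, pattern: Regexp.last_match(1), flags: Regexp.last_match(2) }
  else
    # Plain text: the caller's default flags apply
    { format: :plain, pattern: str, flags: default_flags.to_s }
  end
end

slash     = classify_pattern('/^v(\d+)/i')
percent_r = classify_pattern('%r{a/b}x')
plain     = classify_pattern('plain text', 'i')
blank     = classify_pattern('   ')

# slash     has format :slash,     pattern '^v(\d+)',   flags 'i'
# percent_r has format :percent_r, pattern 'a/b',       flags 'x'
# plain     has format :plain,     pattern 'plain text', flags 'i'
# blank     is nil
```

Flags embedded in a literal take precedence in this sketch, and `default_flags` only reaches the plain-text branch, which is also how the method above applies them.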
.parse_structured_pattern(pattern_hash) ⇒ Object
Not yet implemented.
Future enhancement to parse structured pattern definitions from a Hash.
# File 'lib/schemagraphy/regexp_utils.rb', line 104
def parse_structured_pattern pattern_hash
  # TODO: Implement structured pattern parsing
  # pattern_hash should have 'pattern' and 'flags' keys
  # flags can be string or array
  raise NotImplementedError, 'Structured pattern parsing not yet implemented'
end
.parse_tagged_pattern(tagged_input, tag_type) ⇒ Object
Not yet implemented.
Future enhancement to parse custom YAML tags for regular expressions.
# File 'lib/schemagraphy/regexp_utils.rb', line 116
def parse_tagged_pattern tagged_input, tag_type
  # TODO: Implement custom YAML tag parsing
  # tag_type would be :literal or :pattern
  raise NotImplementedError, 'Tagged pattern parsing not yet implemented'
end