Module: SchemaGraphy::RegexpUtils

Defined in:
lib/schemagraphy/regexp_utils.rb

Overview

A utility module for parsing regular expressions supplied as strings and for extracting captured content from matches. It accepts /pattern/flags strings, %r{...} literals, and plain pattern strings.

Class Method Details

.create_regexp(pattern, flags = '') ⇒ Regexp

Create a Regexp object from a pattern string and explicit flags.

Parameters:

  • pattern (String)

    The regex pattern (without delimiters).

  • flags (String) (defaults to: '')

    The flags string (e.g., “im”).

Returns:

  • (Regexp)

    The compiled Regexp object.



# File 'lib/schemagraphy/regexp_utils.rb', line 157

def create_regexp pattern, flags = ''
  options = flags_to_options(flags)
  Regexp.new(pattern, options)
end

.extract_all_captures(text, pattern_info) ⇒ Hash, ...

Extract all named capture groups as a hash or positional captures as an array.

Parameters:

  • text (String)

    The text to match against.

  • pattern_info (Hash)

    The hash result from parse_pattern.

Returns:

  • (Hash, Array, nil)

    A hash of named captures, an array of positional captures, or nil.



# File 'lib/schemagraphy/regexp_utils.rb', line 193

def extract_all_captures text, pattern_info
  return nil unless text && pattern_info

  regexp = pattern_info[:regexp]
  match = text.match(regexp)

  return nil unless match

  if match.names.any?
    # Return hash of named captures
    match.names.each_with_object({}) do |name, captures|
      captures[name] = match[name]
    end
  else
    # Return array of positional captures
    match.captures
  end
end
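
The two return shapes are easiest to see side by side. The following is a minimal sketch, not the module's own code: `all_captures` is a hypothetical stand-in that follows the listing above so the example runs without loading SchemaGraphy, and the `pattern_info` hashes are built by hand with only the `:regexp` key the method reads.

```ruby
# Hypothetical stand-in that follows the listing above.
def all_captures(text, pattern_info)
  return nil unless text && pattern_info

  match = text.match(pattern_info[:regexp])
  return nil unless match

  # MatchData#named_captures builds the same name => value hash as the listing.
  match.names.any? ? match.named_captures : match.captures
end

named      = { regexp: /(?<major>\d+)\.(?<minor>\d+)/ }
positional = { regexp: /(\d+)\.(\d+)/ }

all_captures('v3.14', named)             # => {"major"=>"3", "minor"=>"14"}
all_captures('v3.14', positional)        # => ["3", "14"]
all_captures('v3.14', { regexp: /\d+/ }) # => [] (a match with no groups)
all_captures('no digits', named)         # => nil
```

Note that the hash keys are Strings, not Symbols, and that a successful match against a pattern without groups yields an empty array rather than nil.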

.extract_capture(text, pattern_info, capture_name = nil) ⇒ String?

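Extract content using named or positional capture groups. Returns the named group when capture_name is given and exists in the pattern; otherwise the first capture group; otherwise the entire match.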

Parameters:

  • text (String)

    The text to match against.

  • pattern_info (Hash)

    The hash result from parse_pattern.

  • capture_name (String) (defaults to: nil)

    The name of the capture group to extract (optional).

Returns:

  • (String, nil)

    The extracted text, or nil if no match is found.



# File 'lib/schemagraphy/regexp_utils.rb', line 168

def extract_capture text, pattern_info, capture_name = nil
  return nil unless text && pattern_info

  regexp = pattern_info[:regexp]
  match = text.match(regexp)

  return nil unless match

  if capture_name && match.names.include?(capture_name.to_s)
    # Extract named capture group
    match[capture_name.to_s]
  elsif match.captures.any?
    # Extract first capture group
    match[1]
  else
    # Return the entire match
    match[0]
  end
end
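
The fallback order (named group, then first group, then whole match) can be exercised with a small sketch. `first_capture` is a hypothetical stand-in that mirrors the listing above so the example is self-contained; the sample text and patterns are invented for illustration.

```ruby
# Hypothetical stand-in mirroring the listing above.
def first_capture(text, pattern_info, capture_name = nil)
  return nil unless text && pattern_info

  match = text.match(pattern_info[:regexp])
  return nil unless match

  if capture_name && match.names.include?(capture_name.to_s)
    match[capture_name.to_s] # requested named group
  elsif match.captures.any?
    match[1]                 # first capture group
  else
    match[0]                 # entire match
  end
end

text = 'release-2.7.1 (stable)'
info = { regexp: /release-(?<version>[\d.]+) \((?<channel>\w+)\)/ }

first_capture(text, info, :channel)               # => "stable"
first_capture(text, info, 'missing')              # => "2.7.1" (falls back to group 1)
first_capture(text, info)                         # => "2.7.1"
first_capture(text, { regexp: /release-[\d.]+/ }) # => "release-2.7.1"
first_capture('b', { regexp: /(a)?(b)/ })         # => nil (group 1 did not participate)
```

Because the name is passed through `to_s`, a Symbol works as well as a String. An unknown name does not produce nil; it silently falls back to the first group. The last call shows that nil can also be returned on a successful match, when the first group is optional and did not participate.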

.extract_flags_from_regexp(regexp) ⇒ String

Extract a flags string from a compiled Regexp object.

Parameters:

  • regexp (Regexp)

    A compiled regexp object.

Returns:

  • (String)

    String representation of the flags (e.g., “im”).



# File 'lib/schemagraphy/regexp_utils.rb', line 144

def extract_flags_from_regexp regexp
  flags = ''
  flags += 'i' if regexp.options.anybits?(Regexp::IGNORECASE)
  flags += 'm' if regexp.options.anybits?(Regexp::MULTILINE)
  flags += 'x' if regexp.options.anybits?(Regexp::EXTENDED)
  flags
end
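
As an illustration of the output format, the sketch below reads the same three option bits from a few Regexp objects. `flags_of` is a hypothetical stand-in written for this example, not the module's method; it assumes Ruby 2.5 or later for `Integer#anybits?`.

```ruby
# Hypothetical stand-in mirroring the listing above.
def flags_of(regexp)
  { 'i' => Regexp::IGNORECASE, 'm' => Regexp::MULTILINE, 'x' => Regexp::EXTENDED }
    .select { |_flag, bit| regexp.options.anybits?(bit) }
    .keys
    .join
end

flags_of(/hello/)                             # => ""
flags_of(/hello/mi)                           # => "im"
flags_of(Regexp.new('a b', Regexp::EXTENDED)) # => "x"
```

The result is always ordered i, m, x, regardless of the order in which the flags were written on the original literal.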

.flags_to_options(flags) ⇒ Integer

Convert a flags string (e.g., “im”) to a Regexp options integer.

Parameters:

  • flags (String)

    String containing regex flags.

Returns:

  • (Integer)

    Regexp options integer.



# File 'lib/schemagraphy/regexp_utils.rb', line 126

def flags_to_options flags
  options = 0
  flags = flags.to_s

  options |= Regexp::IGNORECASE if flags.include?('i')
  options |= Regexp::MULTILINE if flags.include?('m')
  options |= Regexp::EXTENDED if flags.include?('x')

  # NOTE: 'g' (global) and 'o' (once) are not standard Ruby flags
  # encoding flags ('n', 'e', 's', 'u') are handled by to_regexp

  options
end
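
To make the returned integer concrete, the sketch below reproduces the flag-to-bit mapping with Ruby's standard constants (IGNORECASE is 1, EXTENDED is 2, MULTILINE is 4). `flags_to_bits` is a hypothetical stand-in that follows the listing above so the example runs on its own.

```ruby
# Hypothetical stand-in following the listing above.
def flags_to_bits(flags)
  table = {
    'i' => Regexp::IGNORECASE, # 1
    'm' => Regexp::MULTILINE,  # 4
    'x' => Regexp::EXTENDED    # 2
  }
  # OR the bits together; characters outside the table contribute nothing.
  flags.to_s.each_char.reduce(0) { |bits, flag| bits | table.fetch(flag, 0) }
end

flags_to_bits('im')  # => 5
flags_to_bits('x')   # => 2
flags_to_bits('gi')  # => 1 ('g' is ignored)
flags_to_bits(nil)   # => 0 (nil.to_s is "")
```

Unrecognized characters are skipped without raising, so a typo in a flags string goes unnoticed rather than failing loudly.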

.parse_and_extract(text, pattern_input, capture_name = nil, default_flags = '') ⇒ String?

A convenience method that combines parsing and a single extraction.

Parameters:

  • text (String)

    The text to match against.

  • pattern_input (String)

    The pattern string, either in /pattern/flags form or as a plain pattern.

  • capture_name (String) (defaults to: nil)

    Name of the capture group to extract (optional).

  • default_flags (String) (defaults to: '')

    Flags applied when the input is a plain pattern without delimiters.

Returns:

  • (String, nil)

    The extracted text, or nil if no match is found.



# File 'lib/schemagraphy/regexp_utils.rb', line 219

def parse_and_extract text, pattern_input, capture_name = nil, default_flags = ''
  pattern_info = parse_pattern(pattern_input, default_flags)
  extract_capture(text, pattern_info, capture_name)
end

.parse_and_extract_all(text, pattern_input, default_flags = '') ⇒ Hash, ...

A convenience method that combines parsing and extraction of all captures.

Parameters:

  • text (String)

    The text to match against.

  • pattern_input (String)

    The pattern string, either in /pattern/flags form or as a plain pattern.

  • default_flags (String) (defaults to: '')

    Flags applied when the input is a plain pattern without delimiters.

Returns:

  • (Hash, Array, nil)

    All captured content, or nil if no match is found.



# File 'lib/schemagraphy/regexp_utils.rb', line 230

def parse_and_extract_all text, pattern_input, default_flags = ''
  pattern_info = parse_pattern(pattern_input, default_flags)
  extract_all_captures(text, pattern_info)
end

.parse_pattern(input, default_flags = '') ⇒ Hash?

Parse a regex pattern string into a hash describing the compiled Regexp. Handles /pattern/flags (parsed manually), %r{pattern}flags (parsed with the to_regexp gem), and plain text formats.

Examples:

parse_pattern("/^hello.*$/im")
# => { pattern: "^hello.*$", flags: "im", regexp: /^hello.*$/im, options: 5 }
parse_pattern("hello world")
# => { pattern: "hello world", flags: "", regexp: /hello world/, options: 0 }
parse_pattern("hello world", "i")
# => { pattern: "hello world", flags: "i", regexp: /hello world/i, options: 1 }

Parameters:

  • input (String)

    The input string, e.g., “/pattern/flags” or “plain pattern”.

  • default_flags (String) (defaults to: '')

    Flags applied when the input is a plain pattern without delimiters (default: “”).

Returns:

  • (Hash, nil)

    A hash with :pattern, :flags, :regexp, and :options, or nil if the input is nil or blank.



# File 'lib/schemagraphy/regexp_utils.rb', line 30

def parse_pattern input, default_flags = ''
  return nil if input.nil? || input.to_s.strip.empty?

  input_str = input.to_s.strip

  # Remove surrounding quotes that might come from YAML parsing
  clean_input = input_str.gsub(/^["']|["']$/, '')

  # Manual parsing for /pattern/flags format (common in YAML configs)
  if clean_input =~ %r{^/(.+)/([a-z]*)$}
    pattern_str = Regexp.last_match(1)
    flags_str = Regexp.last_match(2)
    options = flags_to_options(flags_str)

    begin
      regexp_obj = Regexp.new(pattern_str, options)

      return {
        pattern: pattern_str,
        flags: flags_str,
        regexp: regexp_obj,
        options: options
      }
    rescue RegexpError => e
      raise RegexpError, "Invalid regex pattern '#{input}': #{e.message}"
    end
  end

  # Heuristic to detect if it's a Regexp literal
  is_literal = clean_input.start_with?('%r{')

  if is_literal
    # Try to parse as regex literal using to_regexp
    begin
      regexp_obj = clean_input.to_regexp(detect: true)

      # Extract pattern and flags from the compiled regexp
      pattern_str = regexp_obj.source
      flags_str = extract_flags_from_regexp(regexp_obj)

      {
        pattern: pattern_str,
        flags: flags_str,
        regexp: regexp_obj,
        options: regexp_obj.options
      }
    rescue RegexpError => e
      # Malformed literal is an error
      raise RegexpError, "Invalid regex literal '#{input}': #{e.message}"
    end
  else
    # Treat as plain pattern string with default flags
    flags_str = default_flags.to_s
    options = flags_to_options(flags_str)

    begin
      regexp_obj = Regexp.new(clean_input, options)

      {
        pattern: clean_input,
        flags: flags_str,
        regexp: regexp_obj,
        options: options
      }
    rescue RegexpError => e
      raise RegexpError, "Invalid regex pattern '#{input}': #{e.message}"
    end
  end
end
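
The first branch decides whether an input counts as /pattern/flags. The sketch below isolates that check using the same delimiter expression as the listing above; `SLASH_FORM` and `split_slash_form` are hypothetical names introduced only for this illustration.

```ruby
# The delimiter check from the listing above, isolated for illustration.
SLASH_FORM = %r{^/(.+)/([a-z]*)$}

# Returns [pattern, flags] for /pattern/flags input, or nil otherwise.
def split_slash_form(input)
  match = SLASH_FORM.match(input)
  match && [match[1], match[2]]
end

split_slash_form('/^hello.*$/im')   # => ["^hello.*$", "im"]
split_slash_form('/a/b/i')          # => ["a/b", "i"]
split_slash_form('hello world')     # => nil
split_slash_form('/usr/local/bin')  # => ["usr/local", "bin"]
```

Because the pattern group is greedy, the split happens at the last slash, so patterns may contain slashes of their own. The last call shows a consequence worth keeping in mind: a string that looks like an absolute path also matches this form, and its final segment is then read as flags ("bin" contains "i", which would enable case-insensitive matching).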

.parse_structured_pattern(pattern_hash) ⇒ Object

Note:

Not yet implemented.

Future enhancement to parse structured pattern definitions from a Hash.

Parameters:

  • pattern_hash (Hash)

    A hash with ‘pattern’ and ‘flags’ keys.

Raises:

  • (NotImplementedError)

    Always raises this error.



# File 'lib/schemagraphy/regexp_utils.rb', line 104

def parse_structured_pattern pattern_hash
  # TODO: Implement structured pattern parsing
  # pattern_hash should have 'pattern' and 'flags' keys
  # flags can be string or array
  raise NotImplementedError, 'Structured pattern parsing not yet implemented'
end

.parse_tagged_pattern(tagged_input, tag_type) ⇒ Object

Note:

Not yet implemented.

Future enhancement to parse custom YAML tags for regular expressions.

Parameters:

  • tagged_input (String)

    The input string with a YAML tag.

  • tag_type (Symbol)

    The type of tag, e.g., :literal or :pattern.

Raises:

  • (NotImplementedError)

    Always raises this error.



# File 'lib/schemagraphy/regexp_utils.rb', line 116

def parse_tagged_pattern tagged_input, tag_type
  # TODO: Implement custom YAML tag parsing
  # tag_type would be :literal or :pattern
  raise NotImplementedError, 'Tagged pattern parsing not yet implemented'
end
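
Callers that want to guard against the two placeholder methods should be aware that NotImplementedError descends from ScriptError, not StandardError, so a bare `rescue` does not catch it. The sketch below shows one way to handle it; `placeholder_parser` and `parse_or_fallback` are hypothetical names standing in for a call to either unimplemented method.

```ruby
# Hypothetical stand-in for either placeholder method.
def placeholder_parser(_input)
  raise NotImplementedError, 'Structured pattern parsing not yet implemented'
end

# Name the exception class explicitly; a bare `rescue` only covers StandardError.
def parse_or_fallback(input)
  placeholder_parser(input)
rescue NotImplementedError => e
  "fallback: #{e.message}"
end

parse_or_fallback({ 'pattern' => 'abc', 'flags' => 'i' })
# => "fallback: Structured pattern parsing not yet implemented"

NotImplementedError.ancestors.include?(StandardError) # => false
```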