Class: Oniguruma::ORegexp

Inherits:
Object
Defined in:
lib/oniguruma.rb,
ext/oregexp.c

Constructor Details

#initialize(pattern, options = {}) ⇒ ORegexp

Constructs a new regular expression from pattern, which is a String. The parameter options is a Hash of the form:

{ :options => option_value, :encoding => encoding_value, :syntax => syntax_value }

Where option_value is a bitwise OR of Oniguruma::OPTION_XXX constants; encoding_value is one of Oniguruma::ENCODING_XXX constants; and syntax_value is one of Oniguruma::SYNTAX_XXX constants.

r1 = ORegexp.new('^a-z+:\\s+\w+')                                            #=> /^a-z+:\s+\w+/
r2 = ORegexp.new('cat', :options => OPTION_IGNORECASE )                      #=> /cat/i
r3 = ORegexp.new('dog', :options => OPTION_EXTEND )                          #=> /dog/x

# Accept Java syntax with SJIS encoding:
r4 = ORegexp.new('ape', :syntax  => SYNTAX_JAVA, :encoding => ENCODING_SJIS) #=> /ape/


# File 'lib/oniguruma.rb', line 139

def initialize( pattern, options = {} )
   defaults = { :options => OPTION_DEFAULT, :encoding => ENCODING_ASCII, :syntax => SYNTAX_DEFAULT}
   old_initialize( pattern,  defaults.merge( options ).freeze )
end
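
The way caller-supplied options override the defaults can be sketched in plain Ruby without the extension loaded. The constant values below are placeholders for illustration only (the real Oniguruma values may differ), and build_options is a hypothetical helper name, not part of the library.

```ruby
# Stand-in constants; placeholders, not the library's real values.
module OptionDefaultsSketch
  OPTION_DEFAULT    = 0
  OPTION_IGNORECASE = 1
  ENCODING_ASCII    = :ascii
  ENCODING_SJIS     = :sjis
  SYNTAX_DEFAULT    = :default
end

# Hypothetical helper mirroring the merge performed in #initialize:
# keys given by the caller win, missing keys fall back to the defaults.
def build_options(options = {})
  defaults = {
    :options  => OptionDefaultsSketch::OPTION_DEFAULT,
    :encoding => OptionDefaultsSketch::ENCODING_ASCII,
    :syntax   => OptionDefaultsSketch::SYNTAX_DEFAULT
  }
  defaults.merge(options).freeze
end

merged = build_options(:encoding => OptionDefaultsSketch::ENCODING_SJIS)
puts merged.inspect
```

Because the merged Hash is frozen, later attempts to modify it raise an error, which keeps the options of an existing object stable.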

Class Method Details

.escape(*args) ⇒ Object Also known as: quote

call-seq:

ORegexp.escape(str)   => a_str
ORegexp.quote(str)    => a_str

Escapes any characters that would have special meaning in a regular expression. Returns a new escaped string, or self if no characters are escaped. For any string, Regexp.escape(str)=~str will be true.

ORegexp.escape('\\*?{}.')   #=> \\\\\*\?\{\}\.


# File 'lib/oniguruma.rb', line 86

def escape( *args )
   Regexp.escape( *args )
end
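
Since the listing above delegates to the built-in Regexp.escape, the round-trip property can be checked with core Ruby alone. This is a minimal sketch; escape_round_trips? is a hypothetical helper name.

```ruby
# Hypothetical helper: true when the escaped form of +str+, compiled as a
# pattern, matches +str+ itself starting at offset 0.
def escape_round_trips?(str)
  pattern = Regexp.new(Regexp.escape(str))
  (pattern =~ str) == 0
end

puts escape_round_trips?('\\*?{}.')
puts Regexp.escape('a.b')
```

An escaped pattern matches only the literal text: the pattern built from 'a.b' does not match 'axb', whereas the unescaped pattern would.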

.last_match(index = nil) ⇒ Object

call-seq:

ORegexp.last_match           => matchdata
ORegexp.last_match(fixnum)   => str

The first form returns the MatchData object generated by the last successful pattern match. The second form returns the nth field in this MatchData object.

ORegexp.new( 'c(.)t' ) =~ 'cat'       #=> 0
ORegexp.last_match                    #=> #<MatchData:0x401b3d30>
ORegexp.last_match(0)                 #=> "cat"
ORegexp.last_match(1)                 #=> "a"
ORegexp.last_match(2)                 #=> nil


# File 'lib/oniguruma.rb', line 107

def last_match( index = nil)
   if index
      @@last_match[index]
   else
      @@last_match
   end
end
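
The class-variable approach in the listing above can be sketched with a small wrapper around the built-in Regexp. TrackingRegexp is a hypothetical class, not part of Oniguruma, and two details are assumptions: the stored value is updated only on a successful match, and a nil guard is added so that calling last_match before any match returns nil.

```ruby
# Hypothetical wrapper illustrating a class-level "last match" store.
class TrackingRegexp
  @@last_match = nil

  # Without an index, returns the stored MatchData (or nil).
  # With an index, returns that field of the stored MatchData.
  def self.last_match(index = nil)
    return nil if @@last_match.nil?
    index ? @@last_match[index] : @@last_match
  end

  def initialize(pattern)
    @regexp = Regexp.new(pattern)
  end

  # Records the MatchData only when the match succeeds (assumption).
  def match(string)
    m = @regexp.match(string)
    @@last_match = m if m
    m
  end
end

TrackingRegexp.new('c(.)t').match('cat')
puts TrackingRegexp.last_match(1)
```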

Instance Method Details

#==(regexp) ⇒ Object Also known as: eql?

call-seq:

rxp == other_rxp      => true or false
rxp.eql?(other_rxp)   => true or false

Equality—Two regexps are equal if their patterns are identical, they have the same character set code, and their #casefold? values are the same.



# File 'lib/oniguruma.rb', line 152

def == regexp
   @pattern == regexp.source && kcode == regexp.kcode && casefold? == regexp.casefold?
end
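
The three-part comparison can be illustrated with built-in Regexp objects, which expose comparable accessors. This is a sketch only: same_regexp? is a hypothetical helper, and Regexp#encoding stands in for the kcode value used by ORegexp.

```ruby
# Hypothetical helper: compares pattern text, encoding and case-insensitivity,
# mirroring the three conditions checked by #==.
def same_regexp?(a, b)
  a.source == b.source &&
    a.encoding == b.encoding &&
    a.casefold? == b.casefold?
end

puts same_regexp?(/cat/i, Regexp.new('cat', Regexp::IGNORECASE))
puts same_regexp?(/cat/, /cat/i)
```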

#=~(string) ⇒ Object Also known as: ===

call-seq:

rxp =~ string  => int or nil

Matches rxp against string, returning the offset of the start of the match or nil if the match failed. Sets $~ to the corresponding MatchData or nil.

ORegexp.new( 'SIT' ) =~ "insensitive"                                 #=>    nil
ORegexp.new( 'SIT', :options => OPTION_IGNORECASE ) =~ "insensitive"  #=>    5


# File 'lib/oniguruma.rb', line 253

def =~ string
   return nil unless string
   m = match( string )
   return nil unless m
   m.begin(0)
end
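
The same control flow can be tried with a built-in Regexp. A minimal sketch; match_offset is a hypothetical helper name.

```ruby
# Hypothetical helper mirroring the logic of #=~:
# nil for a nil subject or a failed match, otherwise the match start offset.
def match_offset(regexp, string)
  return nil unless string
  m = regexp.match(string)
  return nil unless m
  m.begin(0)
end

puts match_offset(/SIT/i, "insensitive").inspect
puts match_offset(/SIT/, "insensitive").inspect
```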

#casefold? ⇒ Boolean

call-seq:

rxp.casefold?   => true or false

Returns the value of the case-insensitive flag.



# File 'lib/oniguruma.rb', line 162

def casefold?
   (@options[:options] & OPTION_IGNORECASE) > 0
end
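
The flag test is a plain bitmask check. The sketch below uses stand-in constants whose values follow those shown under #options (1, 2, 4); FlagSketch and set? are hypothetical names.

```ruby
# Stand-in flag values, assumed from the values listed under #options.
module FlagSketch
  IGNORECASE = 1
  EXTEND     = 2
  MULTILINE  = 4

  # True when +flag+ is set in the bitwise-OR'ed +options+ value.
  def self.set?(options, flag)
    (options & flag) != 0
  end
end

combined = FlagSketch::IGNORECASE | FlagSketch::MULTILINE
puts FlagSketch.set?(combined, FlagSketch::IGNORECASE)
puts FlagSketch.set?(combined, FlagSketch::EXTEND)
```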

#gsub(*args) ⇒ Object



# File 'ext/oregexp.c', line 459

static VALUE oregexp_m_gsub(int argc, VALUE *argv, VALUE self) {
  return oregexp_safe_gsub(self, argc, argv, 0, 0);
}

#gsub!(*args) ⇒ Object



# File 'ext/oregexp.c', line 466

static VALUE oregexp_m_gsub_bang(int argc, VALUE *argv, VALUE self) {
  return oregexp_safe_gsub(self, argc, argv, 1, 0);
}

#inspect ⇒ Object

call-seq:

rxp.inspect   => string

Returns a readable version of rxp.

ORegexp.new( 'cat', :options => OPTION_MULTILINE | OPTION_IGNORECASE ).inspect  => /cat/im
ORegexp.new( 'cat', :options => OPTION_MULTILINE | OPTION_IGNORECASE ).to_s     => (?im-x)cat


# File 'lib/oniguruma.rb', line 235

def inspect
   opt_str = ""
   opt_str += "i" if (@options[:options] & OPTION_IGNORECASE) > 0
   opt_str += "m" if (@options[:options] & OPTION_MULTILINE) > 0
   opt_str += "x" if (@options[:options] & OPTION_EXTEND) > 0
   "/" + ORegexp.escape( @pattern ) + "/" + opt_str
end

#kcode ⇒ Object

call-seq:

rxp.kcode       => int

Returns the character set code for the regexp.



# File 'lib/oniguruma.rb', line 170

def kcode
   @options[:encoding]
end

#match(str) ⇒ MatchData?

Returns a MatchData object describing the match, or nil if there was no match. This is equivalent to retrieving the value of the special variable $~ following a normal match.

/(.)(.)(.)/.match("abc")[2]   #=> "b"


# File 'ext/oregexp.c', line 194

static VALUE oregexp_match( VALUE self, VALUE string ) {
   ORegexp *oregexp;
   Data_Get_Struct( self, ORegexp, oregexp );

   VALUE string_str = StringValue( string );
   UChar* str_ptr = RSTRING(string_str)->ptr;
   int str_len = RSTRING(string_str)->len;

   OnigRegion *region = onig_region_new();
   int r = onig_search(oregexp->reg, str_ptr, str_ptr + str_len, str_ptr, str_ptr + str_len, region, ONIG_OPTION_NONE);
   if (r >= 0) {
      VALUE matchData = oregexp_make_match_data( oregexp, region, string_str);
      onig_region_free(region, 1 );
      return matchData;
   } else if (r == ONIG_MISMATCH) {
      onig_region_free(region, 1 );
      return Qnil;
   } else {
      onig_region_free(region, 1 );
      char s[ONIG_MAX_ERROR_MESSAGE_LEN];
      onig_error_code_to_str(s, r);
      rb_raise(rb_eException, "Oniguruma Error: %s", s);
   }

}

#match_all(string) ⇒ Object



# File 'lib/oniguruma.rb', line 282

def match_all string
   matches = []
   positions = []
   position = 0
   tmp_string = string
   while tmp_string != ""
      if m = match( tmp_string )
         matches << m
         positions << position
         tmp_string = m.post_match
         position += m.end(0)
         #if m.end == m.begin
         #   tmp_string = tmp_string[1..-1]
         #   position += 1
         #end
      else
         break
      end
   end
   if matches.size > 0
      MultiMatchData.new( string, matches, positions )
   else
      nil
   end
end
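
The repeated-match loop can be sketched with a built-in Regexp. This is not the library's implementation: scan_with_offsets is a hypothetical helper, it returns plain [offset, text] pairs instead of a MultiMatchData, and it adds an assumed guard that advances one character after an empty match so the loop always terminates.

```ruby
# Hypothetical helper: collects [absolute_offset, matched_text] pairs by
# matching repeatedly against the remainder of the string.
def scan_with_offsets(regexp, string)
  results = []
  base = 0          # offset of +rest+ within the original string
  rest = string
  until rest.empty?
    m = regexp.match(rest)
    break unless m
    results << [base + m.begin(0), m[0]]
    advance = m.end(0)
    advance = 1 if advance.zero?   # assumed guard against empty matches
    base += advance
    rest = rest[advance..-1]
  end
  results
end

puts scan_with_offsets(/a+/, "caat a").inspect
```

Returning an empty Array rather than nil when nothing matches is a design choice of this sketch; the listing above returns nil in that case.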

#old_initialize ⇒ Object

# File 'lib/oniguruma.rb', line 117

alias old_initialize initialize

#options ⇒ Object

call-seq:

rxp.options   => fixnum

Returns the set of bits corresponding to the options used when creating this ORegexp (see ORegexp::new for details). Note that additional bits may be set in the returned options: these are used internally by the regular expression code. These extra bits are ignored if the options are passed to ORegexp::new.

Oniguruma::OPTION_IGNORECASE                                 #=> 1
Oniguruma::OPTION_EXTEND                                     #=> 2
Oniguruma::OPTION_MULTILINE                                  #=> 4

ORegexp.new( 'cat', :options => Oniguruma::OPTION_EXTEND ).options  #=> 2


# File 'lib/oniguruma.rb', line 189

def options
   @options[:options]
end

#source ⇒ Object



# File 'lib/oniguruma.rb', line 278

def source
   @pattern.freeze
end

#sub(*args) ⇒ Object



# File 'ext/oregexp.c', line 462

static VALUE oregexp_m_sub(int argc, VALUE *argv, VALUE self) {
  return oregexp_safe_gsub(self, argc, argv, 0, 1);
}

#sub!(*args) ⇒ Object



# File 'ext/oregexp.c', line 469

static VALUE oregexp_m_sub_bang(int argc, VALUE *argv, VALUE self) {
  return oregexp_safe_gsub(self, argc, argv, 1, 1);
}

#to_s ⇒ Object

call-seq:

rxp.to_s   => str

Returns a string containing the regular expression and its options (using the (?xxx:yyy) notation). This string can be fed back into ORegexp::new to create a regular expression with the same semantics as the original. (However, ORegexp#== may not return true when comparing the two, as the source of the regular expression itself may differ, as the example shows.) ORegexp#inspect produces a generally more readable version of rxp.

r1 = ORegexp.new( 'ab+c', :options => OPTION_IGNORECASE | OPTION_EXTEND ) #=> /ab+c/ix
s1 = r1.to_s                                                           #=> "(?ix-m:ab+c)"
r2 = ORegexp.new(s1)                                                   #=> /(?ix-m:ab+c)/
r1 == r2                                                               #=> false
r1.source                                                              #=> "ab+c"
r2.source                                                              #=> "(?ix-m:ab+c)"


# File 'lib/oniguruma.rb', line 211

def to_s
   opt_str = "(?"
   opt_str += "i" if (@options[:options] & OPTION_IGNORECASE) > 0
   opt_str += "m" if (@options[:options] & OPTION_MULTILINE) > 0
   opt_str += "x" if (@options[:options] & OPTION_EXTEND) > 0
   unless opt_str == "(?imx"
      opt_str += "-"
      opt_str += "i" if (@options[:options] & OPTION_IGNORECASE) == 0
      opt_str += "m" if (@options[:options] & OPTION_MULTILINE) == 0
      opt_str += "x" if (@options[:options] & OPTION_EXTEND) == 0
   end
   opt_str += ")"
   opt_str + ORegexp.escape( @pattern )
end
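
The option prefix built by the listing above can be reproduced in isolation. A minimal sketch, assuming the flag values shown under #options (IGNORECASE = 1, EXTEND = 2, MULTILINE = 4); PrefixSketch and option_prefix are hypothetical names.

```ruby
# Builds the "(?on-off)" prefix from a bitmask, listing flags in i, m, x order.
module PrefixSketch
  # [letter, bit] pairs; bit values are assumptions taken from #options.
  FLAGS = [["i", 1], ["m", 4], ["x", 2]].freeze

  def self.option_prefix(options)
    on  = FLAGS.select { |_, bit| (options & bit) != 0 }.map(&:first).join
    off = FLAGS.reject { |_, bit| (options & bit) != 0 }.map(&:first).join
    # The "-" separator is omitted only when every flag is enabled.
    off.empty? ? "(?#{on})" : "(?#{on}-#{off})"
  end
end

puts PrefixSketch.option_prefix(1 | 2)
```

Enabled flags appear before the hyphen and disabled flags after it, so a pattern with no flags set yields a prefix that explicitly turns all three off.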