Class: Scanf::FormatSpecifier

Inherits:
Object
Defined in:
lib/scanf.rb

Overview

Technical notes

Rationale behind scanf for Ruby

The impetus for a scanf implementation in Ruby comes chiefly from the fact that existing pattern matching operations, such as Regexp#match and String#scan, return all results as strings, which have to be converted to integers or floats explicitly in cases where what's ultimately wanted are integer or float values.

Design of scanf for Ruby

scanf for Ruby is essentially a <format string>-to-<regular expression> converter.

When scanf is called, a FormatString object is generated from the format string (“%d%s…”) argument. The FormatString object breaks the format string down into atoms (“%d”, “%5f”, “blah”, etc.), and from each atom it creates a FormatSpecifier object, which it saves.
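The splitting step can be pictured with a small sketch. The pattern and the method name `atoms` below are assumptions made for illustration; they are a simplified stand-in, not the expression lib/scanf.rb actually uses.

```ruby
# Hypothetical, simplified tokenizer: split a format string into atoms.
# The pattern recognizes "%%", conversions such as "%d", "%5f" or "%[a-z]",
# and runs of literal text. It is not the pattern used by lib/scanf.rb.
def atoms(format)
  format.scan(/%%|%\*?\d*(?:\[[^\]]*\]|[a-zA-Z])|[^%]+/)
end

p atoms("%d%5f blah %s")
# prints ["%d", "%5f", " blah ", "%s"]
```

In this sketch the literal atom keeps its surrounding spaces; how the real FormatString treats whitespace around literals is not shown here.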

Each FormatSpecifier has a regular expression fragment and a “handler” associated with it. For example, the regular expression fragment associated with the format “%d” is “([-+]?\d+)”, and the handler associated with it is a wrapper around String#to_i. scanf itself calls FormatString#match, passing in the input string. FormatString#match iterates through its FormatSpecifiers; for each one, it matches the corresponding regular expression fragment against the string. If there's a match, it sends the matched string to the handler associated with the FormatSpecifier.

Thus, to follow up the “%d” example: if “123” occurs in the input string when a FormatSpecifier consisting of “%d” is reached, the “123” will be matched against “([-+]?\d+)”, and the matched string will be rendered into an integer by a call to to_i.
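The “%d” case can be reproduced outside the library in a few lines. This is only a sketch: the lambda stands in for the real handler method, and the variable names are illustrative.

```ruby
# Fragment and handler for "%d", following the description above.
fragment = '([-+]?\d+)'
handler  = ->(text) { text.to_i }   # stand-in for the wrapper around String#to_i

re = Regexp.new('\A' + fragment)    # anchored at the start of the input
m  = re.match("123 apples")

value = m ? handler.call(m[1]) : nil
p value
# prints 123
```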

The rendered match is then saved to an accumulator array, and the input string is reduced to the post-match substring. Thus the string is “eaten” from the left as the FormatSpecifiers are applied in sequence. (This is done to a duplicate string; the original string is not altered.)

As soon as a regular expression fragment fails to match the string, or when the FormatString object runs out of FormatSpecifiers, scanning stops and results accumulated so far are returned in an array.
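The whole loop (match, convert, accumulate, consume, stop on the first failure) can be sketched as follows. `mini_scanf` and its three-entry table are hypothetical and far simpler than the library: only “%d”, “%f” and “%s” are handled, and leading whitespace is always skipped.

```ruby
# A toy version of the scanning loop; not the library's implementation.
def mini_scanf(input, format)
  # Illustrative table: conversion letter => [regex fragment, handler].
  specs = {
    "d" => ['([-+]?\d+)',       ->(s) { s.to_i }],
    "f" => ['([-+]?\d*\.?\d+)', ->(s) { s.to_f }],
    "s" => ['(\S+)',            ->(s) { s }],
  }

  rest    = input.dup   # work on a duplicate; the original is not altered
  results = []

  format.scan(/%([dfs])/).flatten.each do |letter|
    fragment, handler = specs.fetch(letter)
    rest = rest.lstrip                         # skip leading whitespace
    m = Regexp.new('\A' + fragment).match(rest)
    break unless m                             # first failure stops scanning
    results << handler.call(m[1])              # save the converted match
    rest = m.post_match                        # "eat" the string from the left
  end

  results
end

p mini_scanf("42 3.5 foo", "%d%f%s")   # prints [42, 3.5, "foo"]
p mini_scanf("12 abc", "%d%d%s")       # prints [12]
```

The second call shows the early exit: the second “%d” cannot match “abc”, so only the results accumulated up to that point are returned.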

Constructor Details

#initialize(str) ⇒ FormatSpecifier

Returns a new instance of FormatSpecifier


# File 'lib/scanf.rb', line 331

def initialize(str)
  @spec_string = str
  h = '[A-Fa-f0-9]'

  @re_string, @handler =
    case @spec_string

      # %[[:...:]]
    when /%\*?(\[\[:[a-z]+:\]\])/
      [ "(#{$1}+)", :extract_plain ]

      # %5[[:...:]]
    when /%\*?(\d+)(\[\[:[a-z]+:\]\])/
      [ "(#{$2}{1,#{$1}})", :extract_plain ]

      # %[...]
    when /%\*?\[([^\]]*)\]/
      yes = $1
      if /^\^/.match(yes) then no = yes[1..-1] else no = '^' + yes end
      [ "([#{yes}]+)(?=[#{no}]|\\z)", :extract_plain ]

      # %5[...]
    when /%\*?(\d+)\[([^\]]*)\]/
      yes = $2
      w = $1
      [ "([#{yes}]{1,#{w}})", :extract_plain ]

      # %i
    when /%\*?i/
      [ "([-+]?(?:(?:0[0-7]+)|(?:0[Xx]#{h}+)|(?:[1-9]\\d*)))", :extract_integer ]

      # %5i
    when /%\*?(\d+)i/
      n = $1.to_i
      s = "("
      if n > 1 then s += "[1-9]\\d{1,#{n-1}}|" end
      if n > 1 then s += "0[0-7]{1,#{n-1}}|" end
      if n > 2 then s += "[-+]0[0-7]{1,#{n-2}}|" end
      if n > 2 then s += "[-+][1-9]\\d{1,#{n-2}}|" end
      if n > 2 then s += "0[Xx]#{h}{1,#{n-2}}|" end
      if n > 3 then s += "[-+]0[Xx]#{h}{1,#{n-3}}|" end
      s += "\\d"
      s += ")"
      [ s, :extract_integer ]

      # %d, %u
    when /%\*?[du]/
      [ '([-+]?\d+)', :extract_decimal ]

      # %5d, %5u
    when /%\*?(\d+)[du]/
      n = $1.to_i
      s = "("
      if n > 1 then s += "[-+]\\d{1,#{n-1}}|" end
      s += "\\d{1,#{$1}})"
      [ s, :extract_decimal ]

      # %x
    when /%\*?[Xx]/
      [ "([-+]?(?:0[Xx])?#{h}+)", :extract_hex ]

      # %5x
    when /%\*?(\d+)[Xx]/
      n = $1.to_i
      s = "("
      if n > 3 then s += "[-+]0[Xx]#{h}{1,#{n-3}}|" end
      if n > 2 then s += "0[Xx]#{h}{1,#{n-2}}|" end
      if n > 1 then s += "[-+]#{h}{1,#{n-1}}|" end
      s += "#{h}{1,#{n}}"
      s += ")"
      [ s, :extract_hex ]

      # %o
    when /%\*?o/
      [ '([-+]?[0-7]+)', :extract_octal ]

      # %5o
    when /%\*?(\d+)o/
      [ "([-+][0-7]{1,#{$1.to_i-1}}|[0-7]{1,#{$1}})", :extract_octal ]

      # %f (also %a, %e, %g and their uppercase forms)
    when /%\*?[aefgAEFG]/
      [ '([-+]?(?:0[xX](?:\.\h+|\h+(?:\.\h*)?)[pP][-+]\d+|\d+(?![\d.])|\d*\.\d*(?:[eE][-+]?\d+)?))', :extract_float ]

      # %5f (also %5a, %5e, %5g and their uppercase forms)
    when /%\*?(\d+)[aefgAEFG]/
      [ '(?=[-+]?(?:0[xX](?:\.\h+|\h+(?:\.\h*)?)[pP][-+]\d+|\d+(?![\d.])|\d*\.\d*(?:[eE][-+]?\d+)?))' +
        "(\\S{1,#{$1}})", :extract_float ]

      # %5s
    when /%\*?(\d+)s/
      [ "(\\S{1,#{$1}})", :extract_plain ]

      # %s
    when /%\*?s/
      [ '(\S+)', :extract_plain ]

      # %c preceded by whitespace
    when /\s%\*?c/
      [ "\\s*(.)", :extract_plain ]

      # %c
    when /%\*?c/
      [ "(.)", :extract_plain ]

      # %5c (whitespace issues are handled by the count_*_space? methods)
    when /%\*?(\d+)c/
      [ "(.{1,#{$1}})", :extract_plain ]

      # %%
    when /%%/
      [ '(\s*%)', :nil_proc ]

      # literal characters
    else
      [ "(#{Regexp.escape(@spec_string)})", :nil_proc ]
    end

  @re_string = '\A' + @re_string
end
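To see what one of these branches produces, the width-limited decimal case (“%5d”, “%5u”) can be lifted out and run on its own. The method name `width_limited_decimal_re` is hypothetical; the string-building steps mirror that branch, including the final '\A' anchor.

```ruby
# Rebuilds the regex string that the "%Nd" branch produces for a given width.
def width_limited_decimal_re(n)
  s = "("
  s += "[-+]\\d{1,#{n - 1}}|" if n > 1   # a sign counts against the width
  s += "\\d{1,#{n}})"
  '\A' + s
end

re_string = width_limited_decimal_re(3)
puts re_string
# prints \A([-+]\d{1,2}|\d{1,3})

re = Regexp.new(re_string)
p re.match("12345")[1]    # prints "123"
p re.match("-12345")[1]   # prints "-12"
```

With a width of 3, a signed number keeps only two digits, because the sign character is counted as part of the field.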

Instance Attribute Details

#conversion ⇒ Object (readonly)

Returns the value of attribute conversion


# File 'lib/scanf.rb', line 289

def conversion
  @conversion
end

#matched ⇒ Object (readonly)

Returns the value of attribute matched


# File 'lib/scanf.rb', line 289

def matched
  @matched
end

#matched_string ⇒ Object (readonly)

Returns the value of attribute matched_string


# File 'lib/scanf.rb', line 289

def matched_string
  @matched_string
end

#re_string ⇒ Object (readonly)

Returns the value of attribute re_string


# File 'lib/scanf.rb', line 289

def re_string
  @re_string
end

Instance Method Details

#count_space? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/scanf.rb', line 327

def count_space?
  /(?:\A|\S)%\*?\d*c|%\d*\[/.match(@spec_string)
end
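A quick way to see which specifiers this pattern selects is to apply it to a few spec strings directly. The helper name `counts_space?` is hypothetical; the regular expression is the one shown above.

```ruby
# Applies the count_space? pattern to a spec string passed as an argument.
def counts_space?(spec)
  !!(/(?:\A|\S)%\*?\d*c|%\d*\[/ =~ spec)
end

["%c", " %c", "%5c", "%[a-z]", "%d"].each do |spec|
  puts "#{spec.inspect}: #{counts_space?(spec)}"
end
# prints:
# "%c": true
# " %c": false
# "%5c": true
# "%[a-z]": true
# "%d": false
```

A “%c” that is preceded by whitespace in the format does not count space, which matches the separate whitespace-prefixed “%c” branch in the constructor.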

#letter ⇒ Object


# File 'lib/scanf.rb', line 469

def letter
  @spec_string[/%\*?\d*([a-z\[])/, 1]
end
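The same String#[] call with a regular expression and a capture-group index can be tried on plain strings. `letter_of` is a hypothetical stand-alone version of the method above.

```ruby
# Extracts the conversion letter (or "[" for a character class) from a spec.
def letter_of(spec)
  spec[/%\*?\d*([a-z\[])/, 1]
end

p letter_of("%d")       # prints "d"
p letter_of("%*5s")     # prints "s"
p letter_of("%[a-z]")   # prints "["
p letter_of("blah")     # prints nil
```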

#match(str) ⇒ Object


# File 'lib/scanf.rb', line 456

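def match(str)
  @matched = false
  s = str.dup                             # work on a copy of the input
  s.sub!(/\A\s+/,'') unless count_space?  # drop leading whitespace unless it counts
  res = to_re.match(s)
  if res
    @conversion = send(@handler, res[1])  # convert the captured text
    @matched_string = @conversion.to_s
    @matched = true
  end
  res                                     # MatchData on success, nil otherwise
end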

#mid_match? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/scanf.rb', line 478

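def mid_match?
  return false unless @matched
  # character class with no width, e.g. %[a-z]
  cc_no_width    = letter == '[' && !width
  # %c or character class with an explicit width, e.g. %5c or %5[a-z]
  c_or_cc_width  = (letter == 'c' || letter == '[') && width
  # the match so far is shorter than the requested width
  width_left     = c_or_cc_width && (matched_string.size < width)

  return width_left || cc_no_width
end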

#to_re ⇒ Object


# File 'lib/scanf.rb', line 452

def to_re
  Regexp.new(@re_string,Regexp::MULTILINE)
end

#to_s ⇒ Object


# File 'lib/scanf.rb', line 323

def to_s
  @spec_string
end

#width ⇒ Object


# File 'lib/scanf.rb', line 473

def width
  w = @spec_string[/%\*?(\d+)/, 1]
  w && w.to_i
end
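The width extraction can likewise be exercised on its own. `width_of` is a hypothetical stand-alone copy of the method above, taking the spec string as an argument.

```ruby
# Returns the field width as an Integer, or nil when the spec has none.
def width_of(spec)
  w = spec[/%\*?(\d+)/, 1]
  w && w.to_i
end

p width_of("%5d")     # prints 5
p width_of("%*10s")   # prints 10
p width_of("%d")      # prints nil
```

Returning nil rather than 0 for a missing width lets callers such as mid_match? use the result directly in a boolean test.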