Method: ParsingMachine#captures

Defined in:
lib/rpeg/parsing_machine.rb

#captures ⇒ Object

Returns the captures obtained when we ran the machine.

If there are no captures we return the final index into the subject string. This is typically one past the matched section. If there is exactly one capture we return it. If there are multiple captures we return them in an array.
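The return convention above can be sketched in isolation. This is only an illustration: `shape_result` is a hypothetical helper name, not part of rpeg, and the final index is passed in as a plain argument rather than read from machine state.

```ruby
# Hypothetical helper (not part of rpeg) that mirrors the documented
# return convention: index, single value, or array of values.
def shape_result(captures, final_index)
  # No captures: report where matching stopped in the subject.
  return final_index if captures.empty?
  # Exactly one capture: return it unwrapped.
  return captures.first if captures.size == 1

  # Several captures: return them all, in order.
  captures
end

no_captures   = shape_result([], 5)
one_capture   = shape_result(["abc"], 3)
many_captures = shape_result(["a", "b"], 2)
```

One consequence of this convention, at least in this sketch, is that a single capture whose value is itself an array cannot be told apart from several separate captures by looking at the return value alone.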

The capture code in LPEG (mostly in lpcap.c) looks complicated at first, but it is made up of pieces that each do one thing and coordinate well together. Some extra complexity comes from the manual memory management required in C and the need to interact with Lua values. This appears to be especially true of the Runtime capture code, which is bewildering at first view. Porting it one capture kind at a time let me understand it at some level as I went.

Basic model:

  • We push Breadcrumb objects onto the stack as we run the VM, based on the instructions generated from the patterns. We never pop anything from the stack: the Captures are breadcrumbs that let us work out after the fact what happened. Things do get removed from the Capture stack, but only at backtrack points, because a match has failed.

  • The End instruction tacks on an unbalanced CloseCapture. This appears to be simply an end marker, like the null string terminator. We don’t do this.

  • After the VM runs we analyze the Breadcrumbs to calculate the captures. We go back and forth through the data, so it is not really a stack but an array.

This method plays the same role as LPEG’s getcaptures (lpcap.c).
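As a rough illustration of the basic model, the sketch below appends breadcrumbs to an array, truncates the array when an alternative fails, and only afterwards walks the array to compute captured substrings. Everything here is an assumption made for illustration: the `Breadcrumb` struct with `kind` and `subject_index` fields and the `simple_captures` helper are hypothetical, and the real rpeg classes handle many more capture kinds.

```ruby
# Hypothetical breadcrumb record; the real rpeg class likely carries more data.
Breadcrumb = Struct.new(:kind, :subject_index)

subject = "hello world"
breadcrumbs = []

# The VM appends breadcrumbs as it executes capture instructions.
breadcrumbs << Breadcrumb.new(:open, 0)
breadcrumbs << Breadcrumb.new(:close, 5)

# A choice point remembers how many breadcrumbs existed when it was created.
saved_size = breadcrumbs.size
breadcrumbs << Breadcrumb.new(:open, 6) # speculative, inside an alternative

# The alternative fails: backtracking discards everything added since.
breadcrumbs.slice!(saved_size..-1)

# After the run, scan the array and pair each close with the latest open.
def simple_captures(subject, breadcrumbs)
  results = []
  open_positions = []
  breadcrumbs.each do |crumb|
    case crumb.kind
    when :open
      open_positions.push(crumb.subject_index)
    when :close
      start = open_positions.pop
      results << subject[start...crumb.subject_index]
    end
  end
  results
end

captured = simple_captures(subject, breadcrumbs)
```

The point of the sketch is only that nothing is interpreted while matching: the array is built up, occasionally truncated, and analyzed once the match has succeeded.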



# File 'lib/rpeg/parsing_machine.rb', line 428

def captures
  raise "Cannot call #captures unless machine ran successfully" unless done? && success?

  @capture_state = new_capture_state
  @capture_state.capture_all

  result = @capture_state.captures

  return @subject_index if result.empty?
  return result.first if result.size == 1

  result
end