ebnf

EBNF parser and generic parser generator.

Description

This is a Ruby implementation of an EBNF and [BNF][] parser and parser generator. It parses EBNF grammars to [BNF][], generates First/Follow and Branch tables for [LL(1)][] grammars, which can be used with the stream [Tokenizer][] and LL(1) Parser.

Of note in this implementation is that the tokenizer and parser are streaming, so that they can process inputs of arbitrary size.

Usage

Parsing an LL(1) Grammar

require 'ebnf'

ebnf = EBNF.parse(File.open(./etc/ebnf.bnf))

Output rules and terminals as S-Expressions, Turtle or EBNF

puts ebnf.to_sxp
puts ebnf.to_ttl
puts ebnf.to_ebnf

Transform EBNF to BNF (generates alt or seq from plus, star or opt)

ebnf.make_bnf

Generate First/Follow rules for BNF grammars

ebnf.first_follow(start_tokens)

Generate Branch, terminal and First/Follow tables as Ruby for parsing grammars

ebnf.to_ruby

Creating terminal definitions and parser rules to parse generated grammars

The parser is initialized to callbacks invoked on entry and exit to each terminal and production. A trivial parser loop can be described as follows:

require 'ebnf/ll1/parser'
require 'meta'

class Parser
  include Meta

  terminal(:SYMBOL, /([a-z]|[A-Z]|[0-9]|_)+/) do |parser, prod, token, input|
    # Add data based on scanned token to input
    input[:symbol] = token.value
  end

  production(:rule) do |parser, phase, input, current, callback|
    # Process on start of production when phase == :start
    # Set state for entry into recursed rules through current

    # Process on end of production when phase == :finish
    # return results in input, retrieve results from recursed rules in current

    # Callback to parser loop with callback
  end

  def initialize(input)
    parser_options = {
      :branch => BRANCH,
      :first => FIRST,
      :follow => FOLLOW
    }
    parse(input, start_symbol, parser_options) do |context, *data|
      # Process calls from callback from productions

    rescue ArgumentError, RDF::LL1::Parser::Error => e
      progress("Parsing completed with errors:\n\t#{e.message}")
      raise RDF::ReaderError, e.message if validate?
    end

EBNF Grammar

The EBNF variant used here is based on [W3C][] as defined in the XML 1.0 recommendation, with minor extensions.

/* An EBNF grammar for EBNF */
[1] ebnf        ::= (declaration | rule)*

[2] declaration ::= '@terminals' | '@pass'

[3] rule        ::= lhs '::=' expression

[4] lhs         ::= '[' (SYMBOL | '.')+ ']' SYMBOL

[5] expression  ::= alt

[6] alt         ::= seq ('|' seq)*

[7] seq         ::= diff+

[8] diff        ::= postfix ('-' postfix)*

[9] postfix     ::= primary ( [?*+] )?

[10] primary    ::= HEX
                |   RANGE
                |   ENUM 
                |   O_RANGE
                |   O_ENUM
                |   STRING1
                |   STRING2
                |   '(' expression ')'

@terminals

[11] SYMBOL     ::= ([a-z] | [A-Z] | [0-9] | "_")+

[12] HEX        ::= '#x' ([0-9] | [a-f] | [A-F])+

[13] RANGE      ::= '[' CHAR '-' CHAR ']'

[14] ENUM       ::= '[' CHAR+ ']'

[15] O_RANGE    ::= '[^' CHAR '-' CHAR ']'

[16] OENUM      ::= '[^' CHAR+ ']'

[17] STRING1    ::= '"' (CHAR - '"')* '"'

[18] STRING2    ::= "'" (CHAR - "'")* "'"

[19] CHAR       ::= HEX
                |   ('\\' [\\trn'"])
                |   [^\t\r\n'"]

@pass           ::= (
                      [#x20\t\r\n]
                    |  
                    )+

Documentation

Full documentation available on [Rubydoc.info][EBNF doc].

Author

Contributing

  • Do your best to adhere to the existing coding conventions and idioms.
  • Don't use hard tabs, and don't leave trailing whitespace on any line.
  • Do document every method you add using YARD annotations. Read the tutorial or just look at the existing code for examples.
  • Don't touch the .gemspec, VERSION or AUTHORS files. If you need to change them, do so on your private branch only.
  • Do feel free to add yourself to the CREDITS file and the corresponding list in the the README. Alphabetical order applies.
  • Do note that in order for us to merge any non-trivial changes (as a rule of thumb, additions larger than about 15 lines of code), we need an explicit public domain dedication on record from you.

License

This is free and unencumbered public domain software. For more information, see http://unlicense.org/ or the accompanying UNLICENSE file.

[EBNF doc]:

[Tokenizer]: