EBNF Build Status

EBNF parser and generic parser generator.

Description

This is a Ruby implementation of an EBNF and [BNF][] parser and parser generator. It parses EBNF grammars to [BNF][], generates [First/Follow and Branch][] tables for LL(1) grammars, which can be used with the stream Tokenizer and LL(1) Parser.

Of note in this implementation is that the tokenizer and parser are streaming, so that they can process inputs of arbitrary size.

Usage

Parsing an LL(1) Grammar

require 'ebnf'

ebnf = EBNF.parse(File.open(./etc/ebnf.bnf))

Output rules and terminals as S-Expressions, Turtle or EBNF

puts ebnf.to_sxp
puts ebnf.to_ttl
puts ebnf.to_ebnf

Transform EBNF to BNF (generates alt or seq from plus, star or opt)

ebnf.make_bnf

Generate First/Follow rules for BNF grammars

ebnf.first_follow(start_tokens)

Generate Terminal, [First/Follow and Branch][] tables as Ruby for parsing grammars

ebnf.to_ruby

Creating terminal definitions and parser rules to parse generated grammars

The parser is initialized to callbacks invoked on entry and exit to each terminal and production. A trivial parser loop can be described as follows:

require 'ebnf/ll1/parser'
require 'meta'

class Parser
  include Meta

  terminal(:SYMBOL, /([a-z]|[A-Z]|[0-9]|_)+/) do |prod, token, input|
    # Add data based on scanned token to input
    input[:symbol] = token.value
  end

  start_production(:rule) do |input, current, callback|
    # Process on start of production
    # Set state for entry into recursed rules through current

    # Callback to parser loop with callback
  end

  production(:rule) do |input, current, callback|
    # Process on end of production
    # return results in input, retrieve results from recursed rules in current

    # Callback to parser loop with callback
  end

  def initialize(input)
    parser_options = {
      :branch => BRANCH,
      :first => FIRST,
      :follow => FOLLOW
    }
    parse(input, start_symbol, parser_options) do |context, *data|
      # Process calls from callback from productions

    rescue ArgumentError, RDF::LL1::Parser::Error => e
      progress("Parsing completed with errors:\n\t#{e.message}")
      raise RDF::ReaderError, e.message if validate?
    end

EBNF Grammar

The EBNF variant used here is based on [W3C][] as defined in the XML 1.0 recommendation, with minor extensions.

/* An EBNF grammar for EBNF */
[1] ebnf        ::= (declaration | rule)*

[2] declaration ::= '@terminals' | '@pass'

[3] rule        ::= lhs '::=' expression

[4] lhs         ::= '[' (SYMBOL | '.')+ ']' SYMBOL

[5] expression  ::= alt

[6] alt         ::= seq ('|' seq)*

[7] seq         ::= diff+

[8] diff        ::= postfix ('-' postfix)*

[9] postfix     ::= primary ( [?*+] )?

[10] primary    ::= HEX
                |   RANGE
                |   ENUM 
                |   O_RANGE
                |   O_ENUM
                |   STRING1
                |   STRING2
                |   '(' expression ')'

@terminals

[11] SYMBOL     ::= ([a-z] | [A-Z] | [0-9] | "_")+

[12] HEX        ::= '#x' ([0-9] | [a-f] | [A-F])+

[13] RANGE      ::= '[' CHAR '-' CHAR ']'

[14] ENUM       ::= '[' CHAR+ ']'

[15] O_RANGE    ::= '[^' CHAR '-' CHAR ']'

[16] OENUM      ::= '[^' CHAR+ ']'

[17] STRING1    ::= '"' (CHAR - '"')* '"'

[18] STRING2    ::= "'" (CHAR - "'")* "'"

[19] CHAR       ::= HEX
                |   ('\\' [\\trn'"])
                |   [^\t\r\n'"]

@pass           ::= (
                      [#x20\t\r\n]
                    |  
                    )+

Acknowledgements

Much of this work, particularly the generic parser, is inspired by work originally done by Tim Berners-Lee's Python predictive parser.

The EBNF parser was inspired by Dan Connolly's EBNF to Turtle processor, EBNF to BNF Notation-3 rules, and First Follow Notation-3 rules.

Documentation

Full documentation available on Rubydoc.info.

Future Work

  • Detect FIRST/FOLLOW and left-recursion conflicts.
  • Generate HTML output of parser results.
  • Better LL(1) parser tests

Author

Contributing

  • Do your best to adhere to the existing coding conventions and idioms.
  • Don't use hard tabs, and don't leave trailing whitespace on any line.
  • Do document every method you add using YARD annotations. Read the tutorial or just look at the existing code for examples.
  • Don't touch the .gemspec, VERSION or AUTHORS files. If you need to change them, do so on your private branch only.
  • Do feel free to add yourself to the CREDITS file and the corresponding list in the the README. Alphabetical order applies.
  • Do note that in order for us to merge any non-trivial changes (as a rule of thumb, additions larger than about 15 lines of code), we need an explicit public domain dedication on record from you.

License

This is free and unencumbered public domain software. For more information, see http://unlicense.org/ or the accompanying UNLICENSE file.