panini

Panini is a flexible toolkit that enables you to generate sentences from a context-free grammar, also known as a CFG.

CFG Background

Informally, a context-free grammar consists of a set of productions rules where a nonterminal on the right hand side of the produces a string of terminals and nonterminals on the right hand side. Like this:

S -> AB
A -> a
B -> b

I the above example, S, A, and B are all nonterminals. a and b are terminals. Furthermore, the nonterminal S is the start symbol for this CFG. By applying the productions as follows:

S     (start symbol)
AB    (apply S -> AB)
aB    (apply A -> a)
ab    (apply B -> b)

The sentence ab is generated. In fact, this is the only sentence this grammar can produce! By adding one additional production to the grammar:

S -> ASB

The grammar may now potentially create an infinite number of sentences. They will all have the form of a<sup>i</sup>b<sup>i</sup> where i > 1. Here is one more example derivation:

S      (start symbol)
ASB    (apply S -> ASB)
aSB    (apply A -> a)
aSb    (apply B -> b)
aaSbb  (apply S -> ASB)
aaABbb (apply S -> AB)
aaaBbb (apply A -> a)
aaabbb (apply B -> b)

You learn more about CFGs, you can reference the CFG article on Wikipedia</a>.

Getting Started With Panini

Defining a Grammar

Defining a grammar is easy. Create a grammar object, add some nonterminals and then add the productions to those nonterminals.

Here’s how the grammar from above is defined:

grammar = Panini::Grammar.new

nt_s = grammar.add_nonterminal
nt_a = grammar.add_nonterminal
nt_b = grammar.add_nonterminal

n_s.add_production([n_a, n_b])        # S -> AB
n_s.add_production([n_a, n_s, n_b])   # S -> ASB
n_a.add_production(['a'])             # A -> 'a'
n_b.add_production(['b'])             # A -> 'b'

Start Symbols

In order to derive sentences, the grammar needs a start symbol. Any nonterminal in the grammar can be used as the start symbol. If a start symbol is not explicitly set, then the first nonterminal added to the grammar is used.

grammar = Panini::Grammar.new

nt_0 = grammar.add_nonterminal
nt_1 = grammar.add_nonterminal

grammar.start = nt_1

Derivators

Derivators are objects that take a Panini::Grammar and then apply the rules to generate a sentence. Creating the sentences from the grammar can be tricky, and certain derivation strategies may be better for some grammars. There are currently two main derivators.

Panini::DerivationStrategy::RandomDampened

This strategy creates random sentences given a grammar. It employs a dampening factor to keep the computation of the sentence from blowing up.

derivator = Panini::DerivationStrategy::RandomDampened.new(grammar)

This will return a different sentence each time it is called. It may (and probably will) return the same sentence multiple times.

Panini::DerivationStrategy::Exhaustive

This strategy is used to exhaustively create all of the sentences that may be created by a grammar.

derivator = Panini::DerivationStrategy::RandomDampened.new(grammar, length_limit)

This will return a new sentence with each call. If there are no additional sentences to be created it will return nil. The same sentence may be returned multiple times if the grammar can derive the sentence in multiple ways.

You can optionally pass a limit for the size of sentences to be generated.

Generating a Sentence

To generate a sentence, call the derivator’s sentence method like thus:

derivator.sentence -> ['a', 'a', 'b', 'b']

You will get a new sentence (depending on the grammar) with every call:

derivator.sentence -> ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

Example

In this example, we create a grammar that generates mathematical expressions.

# ================
# = Nonterminals =
# ================
expression = grammar.add_nonterminal("EXPR")
term = grammar.add_nonterminal("TERM")
factor = grammar.add_nonterminal("FACT")
identifier = grammar.add_nonterminal("ID")
number = grammar.add_nonterminal("NUM")

# =============
# = Terminals =
# =============
expression.add_production([term, '+', term])
expression.add_production([term, '-', term])
expression.add_production([term])

term.add_production([factor, '*', term])
term.add_production([factor, '/', term])
term.add_production([factor])

factor.add_production([identifier])
factor.add_production([number])
factor.add_production(['(', expression, ')'])

('a'..'z').each do |v|
  identifier.add_production([v])
end

(0..100).each do |n|
  number.add_production([n])
end

# ===============================================
# = Choose a strategy and create some sentences =
# ===============================================
deriver = Panini::DerivationStrategy::RandomDampened.new(grammar)
10.times do
  puts "#{deriver.sentence.join(' ')}"
end

Contributing to panini

Contributions to this gem are appreciated!

  • Check out the latest master to make sure the feature hasn’t been implemented or the bug hasn’t been fixed yet

  • Check out the issue tracker to make sure someone already hasn’t requested it and/or contributed it

  • Fork the project

  • Start a feature/bugfix branch

  • Commit and push until you are happy with your contribution

  • Make sure to add tests for it. This is important so I don’t break it in a future version unintentionally.

  • Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.

To Do

Features

  • Detect invalid grammars

  • Weighted productions.

  • Arbitrary start symbol.

  • Support Enumerator!

  • DSL or string based grammar definitions

  • Purdom Derivator?

  • Actions

Examples

  • Natural language

  • XML

  • JSON

  • Address

  • Tree/Flower (PS?)

  • Simulated user actions

Copyright © 2011 Matthew Bellantoni. See LICENSE.txt for further details.