Zenlish

What is Zenlish?

Zenlish = Zen + English

Zenlish will be a Controlled Natural Language based on English.
A Controlled Natural Language is a subset of a natural language (here, English) that is restricted to specific problem domains.

What is the purpose of Zenlish?

The goal of this project is to implement a toolkit for a subset of the English language. With Zenlish, it should be possible for a Ruby application to interact with users in a language that is close enough to English.

Project status

The project is still in its inception phase. Currently, Zenlish is able to parse all sentences from lessons 1-A through 3-G of Learn These Words First.
The parser copes with syntactic ambiguities by generating parse forests instead of parse trees.
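
To illustrate why a single parse tree is not always enough, here is a minimal sketch that counts how many distinct parse trees a sentence has. It is not Zenlish's parser: it uses a plain CYK chart, and the grammar, lexicon and method name (`count_parses`) are hypothetical toy examples invented for this illustration.

```ruby
# Toy grammar in Chomsky normal form (hypothetical, NOT the Zenlish grammar).
# Each rule is [left-hand side, first child, second child].
BINARY_RULES = [
  %i[S NP VP],
  %i[VP V NP],
  %i[VP VP PP],
  %i[NP NP PP],
  %i[NP Det N],
  %i[PP P NP]
].freeze

# Toy lexicon (hypothetical): word => possible grammar symbols.
LEXICAL_RULES = {
  'someone' => [:NP],
  'sees' => [:V],
  'the' => [:Det],
  'person' => [:N],
  'thing' => [:N],
  'with' => [:P]
}.freeze

# Count the distinct parse trees of the word list with a CYK chart.
def count_parses(words, start_symbol = :S)
  n = words.size
  return 0 if n.zero?

  # chart[i][len] maps a symbol to the number of trees spanning words[i, len].
  chart = Array.new(n) { Array.new(n + 1) { Hash.new(0) } }
  words.each_with_index do |word, i|
    LEXICAL_RULES.fetch(word, []).each { |symbol| chart[i][1][symbol] += 1 }
  end

  (2..n).each do |len|
    (0..n - len).each do |i|
      (1...len).each do |split|
        BINARY_RULES.each do |lhs, left, right|
          left_count = chart[i][split][left]
          right_count = chart[i + split][len - split][right]
          chart[i][len][lhs] += left_count * right_count
        end
      end
    end
  end
  chart[0][n][start_symbol]
end

sentence = %w[someone sees the person with the thing]
puts count_parses(sentence) # prints 2
```

The two readings differ in what "with the thing" attaches to: the seeing or the person. A parse forest stores both readings in one shared structure instead of forcing an early choice.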

The intent is to deliver gem versions in small increments.

Zenlish as a library (gem)

Over time, the zenlish gem will contain:

  • A tokenizer (tagging, lemmatizer) [TODO]
  • A lexicon [STARTED]
  • A context-free grammar [STARTED]
  • A parser [STARTED]
  • Feature unification (for number, gender agreement, ...) [STARTED]
  • A simplified ontology [TODO]

Some project metrics (v. 0.2.00)

Metric                                          Value
Number of lemmas in dictionary                  141
Coverage of the 100 most common English words   61%
Number of production rules in grammar           185
Number of lessons covered                       23
Number of sentences in spec files               352
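
A coverage figure like the one above could be computed along the following lines. This is only a sketch: the method name `coverage_percent` and both word lists are made up for illustration and are not taken from the Zenlish dictionary or from any particular frequency list.

```ruby
require 'set'

# Percentage of the reference words that have an entry among the lemmas.
def coverage_percent(lemmas, reference_words)
  return 0.0 if reference_words.empty?

  known = reference_words.count { |word| lemmas.include?(word.downcase) }
  (100.0 * known / reference_words.size).round(1)
end

# Made-up sample data, NOT the actual Zenlish dictionary or word ranking.
sample_lemmas = Set.new(%w[the be to person move])
sample_reference = %w[the be to of and]

puts coverage_percent(sample_lemmas, sample_reference) # prints 60.0
```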

Installation...

...with RubyGems

Install the gem yourself with:

$ gem install zenlish

...with Bundler

Add this line to your application's Gemfile:

gem 'zenlish'

And then execute:

$ bundle

Some code snippets

Interacting with the dictionary:

require 'zenlish'

# Retrieving a "word" (more precisely, a lexeme) from the dictionary.
lexeme = Zenlish::Lang::Dictionary.get_lexeme('move')

# What is the Ruby class of a lexeme?
p lexeme.class # => Zenlish::Lex::Lexeme

# What is the word class of verb 'move'?
p lexeme.wclass.class # => Zenlish::WClasses::RegularVerb

# Here is some Zenlish text to analyze:
some_text = 'one person can move to the same place.'
p some_text

some_text.scan(/(?:\w+)|[\.,:"]/).each do |entry|
  lexeme = Zenlish::Lang::Dictionary.get_lexeme(entry)
  p lexeme.wclass.class
end

# Loop result should be:
# Zenlish::WClasses::Cardinal
# Zenlish::WClasses::CommonNoun
# Zenlish::WClasses::ModalVerbCan
# Zenlish::WClasses::RegularVerb
# Zenlish::WClasses::Preposition
# Zenlish::WClasses::DefiniteArticle
# Zenlish::WClasses::Adjective
# Zenlish::WClasses::CommonNoun
# Rley::Syntax::Terminal
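
The `scan` call in the snippet above acts as a very small tokenizer. Since the gem's own tokenizer is still marked TODO, here is a standalone sketch of that step that also records where each token starts. The constant `TOKEN_PATTERN` and the method `tokenize` are hypothetical names, not part of the Zenlish API.

```ruby
# Same idea as the pattern used above: a run of word characters,
# or one of the punctuation marks . , : "
TOKEN_PATTERN = /\w+|[.,:"]/

# Split the text into tokens and record the offset where each one starts.
def tokenize(text)
  tokens = []
  text.scan(TOKEN_PATTERN) do |match|
    tokens << { text: match, offset: Regexp.last_match.begin(0) }
  end
  tokens
end

tokenize('one person can move.').each do |token|
  puts "#{token[:offset]}: #{token[:text]}"
end
```

Note that characters outside the pattern (for instance a question mark) are silently skipped by `scan`; a real tokenizer would probably want to report them instead.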

Demo of lexeme inflections

# Demo inflection (aka declension, conjugation)
require 'zenlish'

# The Zenlish dictionary is more than a list of words...
dict = Zenlish::Lang::Dictionary

# What are the spellings of a given common noun?
noun_body = dict.get_lexeme('body')
p noun_body.all_inflections # => ["body", "bodies"]

# What are the word forms of a personal pronoun (3rd person)?
p_3rd_pn = dict.get_lexeme('it')
p p_3rd_pn.all_inflections # => ["she", "her", "he", "him", "it", "they", "them"]

# What are the distinct forms of a regular verb?
vb_touch = dict.get_lexeme('touch')
p vb_touch.all_inflections # => ["touch", "touching", "touched", "touches"]

# What are the forms of the (highly) irregular verb be?
vb_be = dict.get_lexeme('be', Zenlish::WClasses::IrregularVerbBe)
p vb_be.all_inflections # => ["am", "being", "was", "been", "are", "were", "is"]
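
To give an idea of what deriving regular inflections involves, here is a standalone sketch based on common English spelling rules. It does not show how Zenlish implements inflection: the method names `plural` and `regular_verb_forms` are hypothetical, and the rules cover only a few frequent patterns (for example, verbs ending in a consonant plus "y" or in "ee" are not handled).

```ruby
# Plural of a regular common noun, using a few common spelling rules.
def plural(noun)
  if noun.match?(/[^aeiou]y\z/)
    noun.sub(/y\z/, 'ies')       # body -> bodies
  elsif noun.match?(/(s|x|z|ch|sh)\z/)
    "#{noun}es"                  # box -> boxes
  else
    "#{noun}s"                   # thing -> things
  end
end

# Forms of a regular verb: base, present participle, past, 3rd person singular.
def regular_verb_forms(base)
  stem = base.end_with?('e') ? base.chomp('e') : base # move -> mov
  third = base.match?(/(s|x|z|ch|sh)\z/) ? "#{base}es" : "#{base}s"
  [base, "#{stem}ing", "#{stem}ed", third]
end

p plural('body')              # => "bodies"
p regular_verb_forms('touch') # => ["touch", "touching", "touched", "touches"]
p regular_verb_forms('move')  # => ["move", "moving", "moved", "moves"]
```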

More to come...

Principles behind the Zenlish language

Minimalism

The name of the language is a combination of 'Zen' and 'English'.
It reflects a desire to make Zenlish a simple language:

  • The focus is on a simplified syntax.
  • The lexicon is limited, with priority given to the most commonly used words.

Expressiveness

Zenlish should be rich enough to express ideas and facts in a fluid way (as opposed to a contrived, artificial way). Litmus test: a Zenlish text should be easy to read for an English-reading person.

Roadmap

Here is a tentative roadmap:

A) Ability to parse sentences from Learn These Words First

STARTED. 24% complete
This website advocates the idea of a multi-layered dictionary. At the core, there are about 300 essential words.
The choice of these words is inspired by the semantic primitives of NSM (Natural Semantic Metalanguage).
The essential words are introduced in twelve lessons. Each lesson puts the words in exemplar sentences and pictures.

The milestone sub-goals are:

  • To inject the 300 core words into the Zenlish lexicon,
  • To enable Zenlish to parse all the example sentences.

B) Associate lexical features to terms in lexicon

STARTED. The sub-goals are:

  • To enrich the lexicon entries with lexical and syntactical features.
  • Zenlish should be able to derive the declensions of nouns, conjugation of verbs,
  • Also Zenlish should detect agreement errors
  • Ideally, Zenlish should have a lemmatizer
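
Agreement checking is commonly built on feature unification: two feature sets are merged, and the merge fails when they disagree on a shared feature. The sketch below shows the idea on flat Ruby hashes. The method name `unify` and the feature names (`number`, `person`) are assumptions made for this example, not the gem's actual representation.

```ruby
# Merge two flat feature sets. Return nil when they clash on a shared feature.
def unify(left, right)
  right.each_with_object(left.dup) do |(name, value), merged|
    return nil if merged.key?(name) && merged[name] != value

    merged[name] = value
  end
end

# Hypothetical feature sets for a subject and two verb forms.
subject_features = { number: :singular, person: 3 } # e.g. "one person"
verb_moves = { number: :singular, person: 3 }       # e.g. "moves"
verb_move = { number: :plural }                     # e.g. "move" (plural reading)

p unify(subject_features, verb_moves) # => {:number=>:singular, :person=>3} (hash formatting varies by Ruby version)
p unify(subject_features, verb_move)  # => nil, i.e. an agreement error
```

Returning `nil` on a clash keeps the caller simple: a failed unification between subject and verb features can be reported directly as an agreement error.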

C) Enrich lexicon entries with semantical features and relationships

The sub-goals are:

  • To enrich the lexicon entries with lexical and syntactical features.
  • Zenlish should be able to derive the declensions of nouns, conjugation of verbs,
  • Also Zenlish should detect agreement errors

D) Build a generic ontology and map Zenlish text to it.

The sub-goals are:

  • To have a simplified ontology that covers the concepts found in the lesson sentences.
  • Hopefully, Zenlish should be able to answer queries related to the lesson sentences.

E) Capability to parse a complete book

A good candidate book is "The Edge of the Sky" by Roberto Trotta (ISBN 978-0-465-04471-9 : hardcover, ISBN 978-0-465-04490-0 : ebook).
Professor Trotta challenged himself to write a book on cosmology using only the 1000 most used words. More details here.

In order to achieve this goal, Zenlish should:

  • Incorporate the 1000 words in its lexicon
  • Have a grammar that allows the parsing of the sentences in the book.

F) Capability to interpret the meaning of a complete book

Probably far-fetched, but it would be nice to be able to query Zenlish in order to check whether it has some understanding of the text it reads (i.e. whether it has built a semantic representation).

Usage

TODO: Write usage instructions here

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/famished-tiger/Zenlish. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Zenlish project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.