Linguistics

Authors

Requirements

  • Ruby >= 1.8.6

Optional

General Information

Linguistics is a framework for building linguistic utilities for Ruby objects in any language. It includes a generic language-independent front end, a module for mapping language codes into language names, and a module that contains various English-language utilities.

Method Interface

The Linguistics module comes with a language-independent mechanism for extending core Ruby classes with linguistic methods.

It consists of three parts: a core linguistics module which contains the class-extension framework for languages, a generic inflector class that serves as a delegator for linguistic methods on Ruby objects, and one or more language-specific modules which contain the actual linguistic functions.

The module works by adding a single instance method for each language named after the language's two-letter code (or three-letter code, if no two-letter code is defined by ISO639) to various Ruby classes. This allows many language-specific methods to be added to objects without cluttering up the interface or risking collision between them, albeit at the cost of three or four more characters per method invocation.

If you don't like extending core Ruby classes, the language modules should also be usable as a plain function library.

For example, the English-language module contains a #plural function which can be accessed via a method on a core class:

Linguistics::use( :en )
"goose".en.plural
# => "geese"

or via the Linguistics::EN::plural function directly:

include Linguistics::EN
plural( "goose" )
# => "geese"

The class-extension mechanism actually uses the functional interface behind the scenes.
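
The relationship between the two interfaces can be sketched in a few lines of self-contained Ruby. This is only an illustration of the idea, not the library's actual source: the names MiniLinguistics and Inflector, the toy plural rule, and the signature of the use method are all assumptions made for the example, and it targets a current Ruby rather than 1.8.6.

```ruby
# Hypothetical sketch, NOT the Linguistics source: a functional language
# module plus a delegator exposed through a language-code method.
module MiniLinguistics
  module EN
    module_function

    # Toy pluralizer covering a handful of cases, for illustration only.
    def plural(word)
      irregular = { "goose" => "geese", "mouse" => "mice" }
      irregular.fetch(word.to_s) { "#{word}s" }
    end
  end

  # Wraps a receiver and forwards calls to the language module's
  # functions, passing the receiver as the first argument.
  class Inflector
    def initialize(lang_module, receiver)
      @lang_module = lang_module
      @receiver = receiver
    end

    def method_missing(name, *args, &block)
      if @lang_module.singleton_methods.include?(name)
        @lang_module.public_send(name, @receiver, *args, &block)
      else
        super
      end
    end

    def respond_to_missing?(name, include_private = false)
      @lang_module.singleton_methods.include?(name) || super
    end
  end

  # Adds one instance method, named after the language code, to each class.
  def self.use(code, classes = [String])
    lang_module = const_get(code.to_s.upcase)
    classes.each do |klass|
      klass.send(:define_method, code) { Inflector.new(lang_module, self) }
    end
  end
end

MiniLinguistics.use(:en)
puts "goose".en.plural                     # prints "geese"
puts MiniLinguistics::EN.plural("goose")   # prints "geese"
```

In this sketch the core class gains only one method (en), and every linguistic call goes through the same module function that the functional interface uses.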

New in the 0.02 release: you can omit the language-code method for unambiguous methods by calling Linguistics::use with the :installProxy configuration key, set to the language code of the language module whose methods you wish to be available. For example, instead of having to call:

"goose".en.plural

from the example above, you can now do this:

Linguistics::use( :en, :installProxy => :en )
"goose".plural
# => "geese"

More about how this works in the documentation for Linguistics::use.
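
One way such a proxy could work is to catch otherwise-undefined methods and forward them to the chosen language module. The following is a minimal sketch under that assumption; ToyEN and ToyProxy are hypothetical names, and the real behavior is the one described in the documentation for Linguistics::use.

```ruby
# Hypothetical sketch of a proxy, NOT the library's actual code: methods
# String does not define fall through to a language module's functions.
module ToyEN
  module_function

  # Toy pluralizer, for illustration only.
  def plural(word)
    word.to_s == "goose" ? "geese" : "#{word}s"
  end
end

module ToyProxy
  def method_missing(name, *args, &block)
    if ToyEN.singleton_methods.include?(name)
      # Forward to the module function, passing the receiver first.
      ToyEN.public_send(name, self, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    ToyEN.singleton_methods.include?(name) || super
  end
end

String.send(:include, ToyProxy)

puts "goose".plural   # prints "geese"
```

Because the fallback is only consulted for methods the class does not already define, existing String methods keep their normal behavior in this sketch.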

Adding Language Modules

To add a new language to the framework, create a file named the same as the ISO639 2- or 3-letter language code for the language you're adding. It must be placed under lib/linguistics/ to be recognized by the linguistics module, but you can also just require it yourself prior to calling Linguistics::use(). This file should define a module under Linguistics that is an all-caps version of the code used in the filename. Any methods you wish to be exposed to users should be declared as module functions (i.e., using Module#module_function).

You may also wish to add your module to the list of default languages by adding the appropriate symbol to the Linguistics::DefaultLanguages array.

For example, to create a Portuguese-language module, create a file called 'lib/linguistics/pt.rb' which contains the following:

module Linguistics
  module PT
    Linguistics::DefaultLanguages << :pt

    module_function

<language methods here>

  end
end

See the English language module (lib/linguistics/en.rb) for an example.
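
For a feel of what a filled-in skeleton might look like, here is a small runnable sketch. The plural rule is a hypothetical toy that is nowhere near complete Portuguese morphology, and the first module block is only a stand-in so the example runs without the framework loaded; with the library present, Linguistics::DefaultLanguages would already exist.

```ruby
# Stand-in for the real framework (an assumption for this example only),
# so the file can be run by itself.
module Linguistics
  DefaultLanguages = [] unless const_defined?(:DefaultLanguages)
end

module Linguistics
  module PT
    Linguistics::DefaultLanguages << :pt

    module_function

    # Hypothetical toy rule: add "es" after r or z, otherwise add "s".
    def plural(word)
      word = word.to_s
      word =~ /[rz]\z/ ? "#{word}es" : "#{word}s"
    end
  end
end

puts Linguistics::PT.plural("gato")   # prints "gatos"
puts Linguistics::PT.plural("flor")   # prints "flores"
```

Declaring the methods after module_function is what makes them callable directly on the module, as in the last two lines.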

English Language Module

See the README.english file for a synopsis.

The English-language module currently contains linguistic functions ported from a few excellent Perl modules:

Lingua::EN::Inflect
Lingua::Conjunction
Lingua::EN::Infinitive

See the lib/linguistics/en.rb file for specific attributions.

New with version 0.02: integration with the Ruby WordNet and LinkParser modules (which must be installed separately).

To Do

  • I am planning on improving the results from the infinitive functions, which currently return useful results only part of the time. Investigations into additional stemming functions and some other strategies are ongoing.

  • Martin Chase <stillflame at FaerieMUD dot org> is working on an integration module for his excellent work on a Ruby interface to the CMU Link Grammar (an English-sentence parser). This will make writing fairly accurate natural language parsers in Ruby much easier.

  • Suggestions (and patches) for any of these items or additional features are welcomed.

This module is Open Source Software which is Copyright (c) 2003 by The FaerieMUD Consortium. All rights reserved.

You may use, modify, and/or redistribute this software under the terms of the Perl Artistic License, a copy of which should have been included in this distribution (see the file Artistic). If it was not, a copy of it may be obtained from language.perl.com/misc/Artistic.html or www.faeriemud.org/artistic.html.

THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

$Id$