Tagmemics

Description

English is full of words that can serve as more than one part of speech. This makes natural language processing difficult: it is hard to tell whether a word is a noun when it could just as well be a verb or an adjective.

The purpose of this project is to develop an algorithm that, given a sentence string, uses a ranking system to detect the part of speech of each word.

Why is this useful? Because identifying the correct parts of speech in a sentence is the first step toward teaching a robot how to read.

The Goal

The end state is to have usage like this:


Tagmemics.parse('I am the best thing since sliced bread and binary numbers')

# =>
# <ParsedSentence:0x007fc7ebba47e8
# @adjectives=["best", "binary", "sliced"],
# @articles=["the"],
# @conjunctions=["and"],
# @nouns=["bread", "numbers", "thing"],
# @prepositions=["since"],
# @pronouns=["I"],
# @str="I am the best thing since sliced bread and binary numbers",
# @verbs=["am"]
# >

Notice that sliced is an adjective here, but could also be a past-tense verb. Also, binary is an adjective, but could also be a noun.

This throws the possibility of having a simple hash of words out the window. Instead, the goal is to leverage the WordNet database to list the many possibilities of a given word and rank the possibilities by the part of speech of the word's neighbors.

For example, we know sliced and binary are both adjectives because each directly precedes a noun.
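
The idea of listing a word's possible parts of speech and ranking them by its neighbor can be sketched roughly as follows. This is only an illustration under stated assumptions: `CANDIDATES` is a hand-written stand-in for a WordNet lookup, `rank_candidates` is a hypothetical name, and the one-point bonus is an arbitrary weight, not the project's actual algorithm.

```ruby
# Hypothetical stand-in for a WordNet lookup: every part of speech a word
# could take. These entries are hand-written for illustration only.
CANDIDATES = {
  'sliced'  => %i[verb adjective],
  'binary'  => %i[noun adjective],
  'bread'   => %i[noun],
  'numbers' => %i[noun verb],
  'the'     => %i[article]
}.freeze

# Returns the candidate parts of speech for `word`, best guess first.
# The adjective reading gets a bonus when the next word can be a noun.
def rank_candidates(word, next_word)
  candidates = CANDIDATES.fetch(word.downcase, [])
  following  = CANDIDATES.fetch(next_word.to_s.downcase, [])
  scored = candidates.each_with_index.map do |pos, index|
    score = 1
    score += 1 if pos == :adjective && following.include?(:noun)
    [pos, score, index]
  end
  # Highest score first; the original order breaks ties.
  scored.sort_by { |_pos, score, index| [-score, index] }.map(&:first)
end

rank_candidates('sliced', 'bread')   # => [:adjective, :verb]
rank_candidates('binary', 'numbers') # => [:adjective, :noun]
```

A neighbor counts as a noun here as soon as noun is among its candidates, which sidesteps the question of which word to resolve first.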

The algorithm that handles this ranking is the dream behind this project.

Current Thought Process

Note: this is informal knowledge of grammar and most likely needs improvement.

Cheat Sheet

  • Nouns (including pronouns) name a person, place, or thing.

  • Verbs are the action.

  • Adjectives describe the what of a noun or pronoun.

  • Adverbs describe the how of a verb, adjective, or another adverb.

  • Articles are adjectives but have little meaning: "the", "a", "an" (zero probability of confusion).

  • Prepositions add context to a noun or verb in the form of a prepositional phrase (low probability of confusion).

  • Conjunctions combine words or phrases together (low probability of confusion).
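
Because articles, prepositions, and conjunctions are rarely ambiguous, they could plausibly be settled with a plain word list before any ranking happens. The sketch below assumes that approach; the `CLOSED_CLASSES` lists are short, hand-picked samples and `closed_class` is a hypothetical helper name.

```ruby
# Illustrative, incomplete word lists for the low-confusion classes.
CLOSED_CLASSES = {
  article:     %w[the a an],
  conjunction: %w[and but or nor],
  preposition: %w[since across over at in around]
}.freeze

# Returns the class symbol for `word`, or nil when the word is not listed
# and therefore still needs to be ranked by its neighbors.
def closed_class(word)
  match = CLOSED_CLASSES.find { |_pos, words| words.include?(word.downcase) }
  match && match.first
end

closed_class('The')   # => :article
closed_class('since') # => :preposition
closed_class('bread') # => nil
```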

A noun appears:

  • after an adjective (including articles)

    • The red fox jumped the fence.
  • before a verb

    • The bank robber stole the money.
    • Mary likes strawberries.
  • at the end of a prepositional phrase (as the object)

    • I went across town.
    • The red fox jumped over the fence.
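
The three noun contexts above could be counted as evidence for a word whose part of speech is still unknown. This is a minimal sketch, assuming the neighbors are already tagged with symbols (`nil` where unknown); `noun_evidence` and `in_prepositional_phrase?` are hypothetical names, and "object of a preposition" is approximated very loosely.

```ruby
# Tags that can sit directly before a noun. Articles count as adjectives
# in this cheat sheet.
NOUN_PRECEDERS = %i[adjective article].freeze

# True when a preposition occurs earlier and only adjectives or articles
# sit between it and `index`.
def in_prepositional_phrase?(tags, index)
  (index - 1).downto(0) do |i|
    return true if tags[i] == :preposition
    return false unless NOUN_PRECEDERS.include?(tags[i])
  end
  false
end

# Counts how many of the listed noun contexts hold at `index`.
def noun_evidence(tags, index)
  before = index.zero? ? nil : tags[index - 1]
  score = 0
  score += 1 if NOUN_PRECEDERS.include?(before)       # after an adjective or article
  score += 1 if tags[index + 1] == :verb              # before a verb
  score += 1 if in_prepositional_phrase?(tags, index) # object of a preposition
  score
end

# "The red fox jumped over the fence" with fox and fence left unknown.
tags = [:article, :adjective, nil, :verb, :preposition, :article, nil]
noun_evidence(tags, 2) # => 2 (after an adjective, before a verb)
noun_evidence(tags, 6) # => 2 (after an article, object of "over")
```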

An adjective appears:

  • before a noun

    • The red fox jumped the tall fence.
    • The tasty food got eaten.
  • after a linking verb (predicate adjective)

    • The food tasted great.
    • I am tired.
    • Nancy is thoughtful.
    • That looked amazing.
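
Spotting a predicate adjective requires knowing which verbs are linking verbs, which is not something position alone reveals. The sketch below assumes a small hand-written list for that; `LINKING_VERBS` is deliberately incomplete and `adjective_evidence` is a hypothetical name.

```ruby
# Partial, illustrative list of linking-verb forms (an assumption).
LINKING_VERBS = %w[am is are was were tasted looked seemed felt became].freeze

# Counts how many of the two adjective contexts hold at `index`.
# `words` are the sentence tokens; `tags` holds known tags or nil.
def adjective_evidence(words, tags, index)
  before = index.zero? ? nil : words[index - 1].downcase
  score = 0
  score += 1 if tags[index + 1] == :noun       # before a noun
  score += 1 if LINKING_VERBS.include?(before) # after a linking verb
  score
end

adjective_evidence(%w[The food tasted great], [:article, :noun, :verb, nil], 3)
# => 1
adjective_evidence(%w[The tasty food got eaten], [:article, nil, :noun, :verb, :verb], 1)
# => 1
```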

A verb appears:

  • directly after a noun

    • The red fox jumped the tall fence.
    • The tasty food got eaten.
  • directly after a pronoun

    • The man who stole it is Bob.
    • They said that maybe he stole it.
    • Bob is a thief that had a bad childhood.
    • I know he needs to learn some Ruby.
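
The verb rule can be checked directly against the previous token. In this rough sketch, `PRONOUNS` is a short sample list that includes the relative pronouns used in the examples above, and `verb_after_subject?` is a hypothetical name. Treating every "that" as a pronoun is a simplifying assumption.

```ruby
# Sample pronoun list (an assumption, not exhaustive).
PRONOUNS = %w[i you he she it we they who that].freeze

# True when the word at `index` directly follows a noun or a pronoun.
def verb_after_subject?(words, tags, index)
  return false if index.zero?

  tags[index - 1] == :noun || PRONOUNS.include?(words[index - 1].downcase)
end

words = %w[They said that maybe he stole it]
tags  = Array.new(words.length) # nothing tagged yet
verb_after_subject?(words, tags, 1) # => true ("said" follows "They")
verb_after_subject?(words, tags, 5) # => true ("stole" follows "he")
```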

An adverb appears:

  • directly after a verb

    • He walked quickly to the store.
    • Run as fast as you can!
    • Mary ate the cheeseburger ridiculously quickly.
  • before a verb

    • He quickly walked to the store.
    • Mary ridiculously ate the cheeseburger.
    • She sometimes takes medication.
  • before an adjective

    • Mary is really fat.
    • Bob is especially clever.
  • before another adverb

    • He speaks very slowly.
    • He exercises remarkably well.
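
The four adverb contexts reduce to one check on the previous tag and one on the next tag. A minimal sketch, assuming neighbors are tagged with symbols (`nil` where unknown) and using the hypothetical name `adverb_evidence`:

```ruby
# Tags an adverb can directly precede, per the list above.
ADVERB_FOLLOWERS = %i[verb adjective adverb].freeze

# Counts how many adverb contexts hold at `index`.
def adverb_evidence(tags, index)
  before = index.zero? ? nil : tags[index - 1]
  score = 0
  score += 1 if before == :verb                            # directly after a verb
  score += 1 if ADVERB_FOLLOWERS.include?(tags[index + 1]) # before verb, adjective, or adverb
  score
end

# "He speaks very slowly" with "very" left unknown.
adverb_evidence([:pronoun, :verb, nil, :adverb], 2) # => 2
# "He quickly walked to the store" with "quickly" left unknown.
adverb_evidence([:pronoun, nil, :verb, :preposition, :article, :noun], 1) # => 1
```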

A preposition appears:

  • directly after a verb

    • He walked across the room.
    • I jumped over the rope.
    • The ball is over there.
    • She will arrive at noon.
  • at the beginning of a sentence

    • In the morning, I usually drink coffee.
    • Around the mountain, here she comes!
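
The two preposition contexts can be reported by name, which keeps it visible why a word was considered a preposition. This is a sketch only: `preposition_contexts` is a hypothetical name, and sentence position by itself is weak evidence that would need to be combined with a word list or other rules.

```ruby
# Returns the names of the preposition contexts that hold at `index`.
# `tags` holds known part-of-speech symbols, or nil where unknown.
def preposition_contexts(tags, index)
  contexts = []
  contexts << :sentence_start if index.zero?
  contexts << :after_verb if index > 0 && tags[index - 1] == :verb
  contexts
end

# "He walked across the room" with "across" left unknown.
preposition_contexts([:pronoun, :verb, nil, :article, :noun], 2) # => [:after_verb]
# "In the morning, ..." with "In" left unknown.
preposition_contexts([nil, :article, :noun], 0) # => [:sentence_start]
```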