Class: WordNet::Lemma

Inherits: Object
Defined in: lib/rwordnet/lemma.rb

Overview

Represents a single word in the WordNet lexicon, which can be used to look up a set of synsets.

Constant Summary

SPACE = ' '
POS_SHORTHAND = {:v => :verb, :n => :noun, :a => :adj, :r => :adv}
@@cache = {}


Constructor Details

#initialize(lexicon_line, id) ⇒ Lemma

Create a lemma from a line in a lexicon file. You should not create Lemmas by hand; instead, use the WordNet::Lemma.find and WordNet::Lemma.find_all methods to find the Lemma for a word.



# File 'lib/rwordnet/lemma.rb', line 28

def initialize(lexicon_line, id)
  @id = id
  line = lexicon_line.split(" ")

  @word = line.shift
  @pos = line.shift
  synset_count = line.shift.to_i
  @pointer_symbols = line.slice!(0, line.shift.to_i)
  line.shift # Throw away redundant sense_cnt
  @tagsense_count = line.shift.to_i
  @synset_offsets = line.slice!(0, synset_count).map(&:to_i)
end
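
The field order the constructor consumes can be easier to follow on a standalone example. The following is a minimal sketch, assuming the index-line layout implied by the constructor above (word, part of speech, synset count, pointer count, pointer symbols, sense count, tagsense count, synset offsets). The helper name parse_index_line and the sample line are hypothetical and are not part of rwordnet or real WordNet data.

```ruby
# Illustrative parser for one lexicon line; mirrors the constructor's field order.
# parse_index_line is a hypothetical helper, not part of rwordnet.
def parse_index_line(line)
  fields = line.split(" ")

  word            = fields.shift
  pos             = fields.shift
  synset_count    = fields.shift.to_i
  pointer_count   = fields.shift.to_i
  pointer_symbols = fields.slice!(0, pointer_count)
  fields.shift                        # discard the redundant sense count
  tagsense_count  = fields.shift.to_i
  synset_offsets  = fields.slice!(0, synset_count).map(&:to_i)

  { word: word, pos: pos, pointer_symbols: pointer_symbols,
    tagsense_count: tagsense_count, synset_offsets: synset_offsets }
end

# Made-up sample line for illustration only (not real WordNet data).
sample = "example n 2 3 @ ~ + 2 1 00001234 00005678"
parsed = parse_index_line(sample)
```

Note that the offsets are converted with to_i, so the zero padding used in the file is dropped once they are stored as integers.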

Instance Attribute Details

#id ⇒ Object

A unique integer id that references this lemma. Used internally within WordNet’s database.



# File 'lib/rwordnet/lemma.rb', line 20

def id
  @id
end

#pointer_symbols ⇒ Object

An array of valid pointer symbols for this lemma. The list of all valid pointer symbols is defined in pointers.rb.



# File 'lib/rwordnet/lemma.rb', line 24

def pointer_symbols
  @pointer_symbols
end

#pos ⇒ Object

The part of speech of this lemma. One of ‘n’ (noun), ‘v’ (verb), ‘a’ (adjective), or ‘r’ (adverb).



# File 'lib/rwordnet/lemma.rb', line 11

def pos
  @pos
end

#synset_offsets ⇒ Object

The offsets, in bytes, at which the synsets contained in this lemma are stored in WordNet’s internal database.



# File 'lib/rwordnet/lemma.rb', line 17

def synset_offsets
  @synset_offsets
end

#tagsense_count ⇒ Object

The number of times the sense is tagged in various semantic concordance texts. A tagsense_count of 0 indicates that the sense has not been semantically tagged.



# File 'lib/rwordnet/lemma.rb', line 14

def tagsense_count
  @tagsense_count
end

#word ⇒ Object

The word this lemma represents.



# File 'lib/rwordnet/lemma.rb', line 8

def word
  @word
end

Class Method Details

.find(word, pos) ⇒ Object

Find a lemma for a given word and part of speech. Valid parts of speech are: ‘adj’, ‘adv’, ‘noun’, ‘verb’. Additionally, you can use the shorthand forms of each of these (‘a’, ‘r’, ‘n’, ‘v’). Returns nil if no matching lemma is found.



# File 'lib/rwordnet/lemma.rb', line 65

def find(word, pos)
  # Map shorthand POS to full forms
  pos = POS_SHORTHAND[pos] || pos

  cache = @@cache[pos] ||= build_cache(pos)
  if found = cache[word]
    Lemma.new(*found)
  end
end
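
The lookup combines two steps: shorthand expansion through POS_SHORTHAND and a per-part-of-speech cache that is built on first use. The following is a rough sketch of that pattern only. The module LemmaLookupSketch, its build_cache stub, the builds counter, and the sample entry are all hypothetical; the real build_cache is not shown on this page and presumably reads the WordNet index files.

```ruby
# Hypothetical stand-in for the class-level lookup; not the rwordnet implementation.
module LemmaLookupSketch
  POS_SHORTHAND = { v: :verb, n: :noun, a: :adj, r: :adv }.freeze

  @cache  = {}
  @builds = Hash.new(0) # counts how often each part-of-speech index is built

  class << self
    attr_reader :builds

    # Hypothetical index builder mapping word => [lexicon_line, id].
    # The sample entry is made up, not real WordNet data.
    def build_cache(pos)
      @builds[pos] += 1
      pos == :verb ? { "fall" => ["fall v 1 0 1 0 00000001", 1] } : {}
    end

    def find(word, pos)
      pos = POS_SHORTHAND[pos] || pos           # :v becomes :verb
      cache = @cache[pos] ||= build_cache(pos)  # built once per part of speech
      cache[word]                               # nil when the word is unknown
    end
  end
end

by_shorthand = LemmaLookupSketch.find("fall", :v)
by_full_name = LemmaLookupSketch.find("fall", :verb)
missing      = LemmaLookupSketch.find("nonexistent", :noun)
```

In this sketch the shorthand keys are symbols, matching the POS_SHORTHAND constant listed above, so :v and :verb resolve to the same cached index.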

.find_all(word) ⇒ Object

Find all lemmas for this word across all known parts of speech.



# File 'lib/rwordnet/lemma.rb', line 56

def find_all(word)
  [:noun, :verb, :adj, :adv].flat_map do |pos|
    find(word, pos) || []
  end
end
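
The `|| []` inside flat_map is what keeps missing parts of speech out of the result: a nil lookup is replaced by an empty array, which flat_map flattens away. A small sketch of that behaviour, using a hypothetical results hash in place of real lookups (find_all_sketch and the string values are illustrative only):

```ruby
# Hypothetical per-part-of-speech results: a found entry, or nil as find returns.
results = { noun: "fall,n", verb: "fall,v", adj: nil, adv: nil }

# find_all_sketch is an illustrative helper, not part of rwordnet.
def find_all_sketch(results)
  [:noun, :verb, :adj, :adv].flat_map do |pos|
    results[pos] || [] # an empty array contributes nothing once flattened
  end
end

found = find_all_sketch(results)
```

With this pattern the caller always receives an array, which is empty when the word is unknown for every part of speech.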

Instance Method Details

#synsets ⇒ Object

Return a list of synsets for this Lemma. Each synset represents a different sense, or meaning, of the word.



# File 'lib/rwordnet/lemma.rb', line 42

def synsets
  @synset_offsets.map { |offset| Synset.new(@pos, offset) }
end

#to_s ⇒ Object

Returns a compact string representation of this lemma, e.g. “fall,v” for the verb form of the word “fall”.



# File 'lib/rwordnet/lemma.rb', line 48

def to_s
  [@word, @pos].join(",")
end