Class: WordNet::Synset

Inherits:
Object
Defined in:
lib/rwordnet/synset.rb

Overview

Represents a synset (a group of synonymous words) in WordNet. Synsets are related to each other by various (and numerous!) relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a hyponym of y <=> x is a child of y).

Constructor Details

#initialize(pos, offset) ⇒ Synset

Create a new synset by reading from the data file specified by pos, at offset bytes into the file. This is how the WordNet database is organized. You shouldn’t be creating Synsets directly; instead, use Lemma#synsets.



# File 'lib/rwordnet/synset.rb', line 51

def initialize(pos, offset)
  data_line = DB.open(File.join("dict", "data.#{SYNSET_TYPES.fetch(pos)}")) do |f|
    f.seek(offset)
    f.readline.strip
  end

  info_line, @gloss = data_line.split(" | ", 2)
  line = info_line.split(" ")

  @pos = pos
  @pos_offset = offset
  @synset_offset = line.shift
  @lex_filenum = line.shift
  @synset_type = line.shift

  @word_counts = {}
  word_count = line.shift.to_i
  word_count.times do
    @word_counts[line.shift] = line.shift.to_i
  end

  pointer_count = line.shift.to_i
  @pointers = Array.new(pointer_count).map do
    Pointer.new(
      symbol: line.shift[0],
      offset: line.shift.to_i,
      pos: line.shift,
      source: line.shift
    )
  end
end
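
The parsing steps in the constructor can be followed on a single line of input. The following is a minimal sketch, assuming a hand-written sample line in the layout the constructor expects; the line is illustrative rather than copied from a real data file, `parse_data_line` is a hypothetical helper, and a plain Hash stands in for Pointer.

```ruby
# Illustrative data line (hand-written for this sketch, not taken from WordNet).
# Layout: offset lex_filenum type word_count [word n]... pointer_count
#         [symbol offset pos source]... | gloss
SAMPLE_LINE = '00001740 29 v 02 breathe 0 respire 3 02 ' \
              '@ 00002325 v 0000 ~ 00004923 v 0000 ' \
              '| draw air into, and expel out of, the lungs'

# Hypothetical helper mirroring the field order read by Synset#initialize.
def parse_data_line(data_line)
  info_line, gloss = data_line.split(' | ', 2)
  fields = info_line.split(' ')

  parsed = {
    synset_offset: fields.shift,
    lex_filenum: fields.shift,
    synset_type: fields.shift,
    gloss: gloss
  }

  # Each word is followed by one integer field.
  word_count = fields.shift.to_i
  parsed[:word_counts] = {}
  word_count.times { parsed[:word_counts][fields.shift] = fields.shift.to_i }

  # Each pointer consumes four fields: symbol, offset, pos, source.
  pointer_count = fields.shift.to_i
  parsed[:pointers] = Array.new(pointer_count) do
    { symbol: fields.shift[0], offset: fields.shift.to_i,
      pos: fields.shift, source: fields.shift }
  end

  parsed
end

parsed = parse_data_line(SAMPLE_LINE)
puts parsed[:word_counts].keys.join(', ')  # breathe, respire
```

Because every field is consumed with `shift`, the order of reads is the file format: changing the order of the reads changes the meaning of the result.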

Instance Attribute Details

#gloss ⇒ Object (readonly)

Get a string representation of this synset’s gloss. “Gloss” is a human-readable description of this concept, often with example usage, e.g.:

move upward; "The fog lifted"; "The smoke arose from the forest fire"; "The mist uprose from the meadows"

for the second sense of the verb “fall”



# File 'lib/rwordnet/synset.rb', line 47

def gloss
  @gloss
end

#lex_filenum ⇒ Object (readonly)

A two digit decimal integer representing the name of the lexicographer file containing the synset for the sense. Probably only of interest if you’re using a wordnet database marked up with custom attributes, and you want to ensure that you’re using your own additions.



# File 'lib/rwordnet/synset.rb', line 25

def lex_filenum
  @lex_filenum
end

#pos ⇒ Object (readonly)

Get a shorthand representation of the part of speech this synset represents, e.g. “v” for verbs.



# File 'lib/rwordnet/synset.rb', line 39

def pos
  @pos
end

#pos_offset ⇒ Object (readonly)

Get the offset, in bytes, at which this synset’s POS information is stored in WordNet’s internal DB. You almost certainly don’t care about this.



# File 'lib/rwordnet/synset.rb', line 36

def pos_offset
  @pos_offset
end

#synset_offset ⇒ Object (readonly)

Get the offset, in bytes, at which this synset’s information is stored in WordNet’s internal DB. You almost certainly don’t care about this.



# File 'lib/rwordnet/synset.rb', line 20

def synset_offset
  @synset_offset
end

#synset_type ⇒ Object (readonly)

Get the part of speech type of this synset. One of ‘n’ (noun), ‘v’ (verb), ‘a’ (adjective), or ‘r’ (adverb).



# File 'lib/rwordnet/synset.rb', line 32

def synset_type
  @synset_type
end

#word_counts ⇒ Object (readonly)

Get a hash mapping each word contained in this Synset to its count (its frequency within the WordNet graph).



# File 'lib/rwordnet/synset.rb', line 29

def word_counts
  @word_counts
end

Class Method Details

._apply_rules(forms, pos) ⇒ Object



# File 'lib/rwordnet/synset.rb', line 109

def self._apply_rules(forms, pos)
    substitutions = MORPHOLOGICAL_SUBSTITUTIONS[pos]
    out = []
    forms.each do |form|
        substitutions.each do |old, new|
            if form.end_with? old
                out.push form[0...-old.length] + new
            end
        end
    end
    return out
end
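
The suffix substitution performed by `_apply_rules` can be tried in isolation. This is a sketch under stated assumptions: the contents of MORPHOLOGICAL_SUBSTITUTIONS are not shown in this document, so `NOUN_SUBSTITUTIONS` below is a hypothetical table modelled on the noun suffix rules used by NLTK's morphy, and `apply_rules` is a stand-in that takes the table as an argument.

```ruby
# Hypothetical noun rules (assumption): [suffix to remove, replacement].
# The library's real MORPHOLOGICAL_SUBSTITUTIONS may differ.
NOUN_SUBSTITUTIONS = [
  ['s', ''], ['ses', 's'], ['ves', 'f'], ['xes', 'x'], ['zes', 'z'],
  ['ches', 'ch'], ['shes', 'sh'], ['men', 'man'], ['ies', 'y']
].freeze

# For every form, apply every rule whose suffix matches and collect all
# resulting candidates. Candidates are not checked against any dictionary.
def apply_rules(forms, substitutions)
  forms.flat_map do |form|
    substitutions
      .select { |old, _new| form.end_with?(old) }
      .map { |old, new| form[0...-old.length] + new }
  end
end

p apply_rules(['churches'], NOUN_SUBSTITUTIONS)  # ["churche", "church"]
p apply_rules(['ponies'], NOUN_SUBSTITUTIONS)    # ["ponie", "pony"]
```

Note that the output contains non-words such as "churche"; weeding those out is the job of `_filter_forms`.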

._filter_forms(forms, pos) ⇒ Object



# File 'lib/rwordnet/synset.rb', line 122

def self._filter_forms(forms, pos)
    forms.reject{|form| Lemma.find(form, pos).nil?}.uniq
end

.find(word, pos) ⇒ Object

Ported from Python NLTK. Load all synsets with a given lemma and part-of-speech tag. The word is downcased and reduced to its base forms with .morphy before lookup. Unlike the NLTK original, pos is required here; to load synsets for all parts of speech, use .find_all.



# File 'lib/rwordnet/synset.rb', line 89

def self.find(word, pos)
    word = word.downcase
    lemmas = self.morphy(word, pos).map{|form| WordNet::Lemma.find(form, pos)}
    lemmas.map{|lemma| lemma.synsets}.flatten
end

.find_all(word) ⇒ Object



# File 'lib/rwordnet/synset.rb', line 95

def self.find_all(word)
    SYNSET_TYPES.values.map{|pos| self.find(word, pos)}.flatten
end

.load_exception_map ⇒ Object



# File 'lib/rwordnet/synset.rb', line 99

def self.load_exception_map
    SYNSET_TYPES.each do |_, pos|
        @exception_map[pos] = {}
        File.open(File.join(@morphy_path, 'exceptions', "#{pos}.exc"), 'r').each_line do |line|
            line = line.split
            @exception_map[pos][line[0]] = line[1..-1]
        end
    end
end
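
The per-line parsing in `load_exception_map` treats the first whitespace-separated token as the inflected form and the rest as its base forms. A minimal sketch of that step, assuming illustrative file contents (`SAMPLE_EXC` is made up for this example) and a hypothetical helper `parse_exceptions` that reads from any IO object:

```ruby
require 'stringio'

# Illustrative contents in the "inflected base [base...]" layout; a stand-in
# for a real exceptions file, which is not reproduced here.
SAMPLE_EXC = "geese goose\nmice mouse\naxes ax axis\n"

# Build { inflected_form => [base_form, ...] } from an IO of such lines.
def parse_exceptions(io)
  io.each_line.with_object({}) do |line, map|
    inflected, *bases = line.split
    map[inflected] = bases
  end
end

map = parse_exceptions(StringIO.new(SAMPLE_EXC))
p map['axes']  # ["ax", "axis"]
```

When reading from disk, `File.foreach(path)` with a block yields the same lines and closes the file handle once iteration finishes.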

.morphy(form, pos) ⇒ Object

Ported from Python NLTK (from jordanbg). Given an original string x:

  1. Apply rules once to the input to get y1, y2, y3, etc.

  2. Return all that are in the database

  3. If there are no matches, keep applying rules until you either find a match or you can’t go any further



# File 'lib/rwordnet/synset.rb', line 133

def self.morphy(form, pos)
    if @exception_map == {}
        self.load_exception_map
    end
    exceptions = @exception_map[pos]

    # 0. Check the exception lists
    if exceptions.has_key? form
        return self._filter_forms([form] + exceptions[form], pos)
    end

    # 1. Apply rules once to the input to get y1, y2, y3, etc.
    forms = self._apply_rules([form], pos)

    # 2. Return all that are in the database (and check the original too)
    results = self._filter_forms([form] + forms, pos)
    if results != []
        return results
    end

    # 3. If there are no matches, keep applying rules until we find a match
    while forms.length > 0
        forms = self._apply_rules(forms, pos)
        results = self._filter_forms(forms, pos)
        if results != []
            return results
        end
    end

    # Return an empty list if we can't find anything
    return []
end
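
The control flow of `morphy` can be exercised without a WordNet database. The sketch below is self-contained and makes several assumptions: a small in-memory `LEXICON` replaces Lemma.find, `EXCEPTIONS` holds a single made-up entry, and `RULES` is a shortened, hypothetical rule table. All names here are illustrative and not part of the library.

```ruby
require 'set'

# Toy stand-ins (assumptions for this sketch).
LEXICON = Set['goose', 'church', 'dog'].freeze
EXCEPTIONS = { 'geese' => ['goose'] }.freeze
RULES = [['s', ''], ['ches', 'ch'], ['ies', 'y']].freeze

# Keep only forms the lexicon knows, without duplicates.
def known_forms(forms)
  forms.select { |form| LEXICON.include?(form) }.uniq
end

# Apply every matching suffix rule to every form.
def strip_suffixes(forms)
  forms.flat_map do |form|
    RULES.select { |old, _new| form.end_with?(old) }
         .map { |old, new| form[0...-old.length] + new }
  end
end

def toy_morphy(form)
  # 0. The exception list takes priority.
  return known_forms([form] + EXCEPTIONS[form]) if EXCEPTIONS.key?(form)

  # 1-2. Apply the rules once; keep whatever the lexicon knows,
  #      checking the original form too.
  forms = strip_suffixes([form])
  results = known_forms([form] + forms)
  return results unless results.empty?

  # 3. Keep applying rules until something matches or nothing is left.
  until forms.empty?
    forms = strip_suffixes(forms)
    results = known_forms(forms)
    return results unless results.empty?
  end
  []
end

p toy_morphy('geese')     # ["goose"]
p toy_morphy('churches')  # ["church"]
p toy_morphy('dogss')     # ["dog"]
```

The loop in step 3 terminates here because every rule in `RULES` shortens the form it is applied to, so the candidate list eventually becomes empty.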

.morphy_all(form) ⇒ Object



# File 'lib/rwordnet/synset.rb', line 166

def self.morphy_all(form)
    SYNSET_TYPES.values.map{|pos| self.morphy(form, pos)}.flatten
end

Instance Method Details

#antonyms ⇒ Object

Get the Synsets of this sense’s antonyms.



# File 'lib/rwordnet/synset.rb', line 192

def antonyms
  relation(ANTONYM)
end

#expanded_first_hypernyms ⇒ Object

Get the chain of first hypernyms (following only the first hypernym at each level, from this synset all the way up to entity) as an array.



# File 'lib/rwordnet/synset.rb', line 213

def expanded_first_hypernyms
  parent = hypernym
  list = []
  return list unless parent

  while parent
    break if list.include? parent.pos_offset
    list.push parent.pos_offset
    parent = parent.hypernym
  end

  list.flatten!
  list.map! { |offset| Synset.new(@pos, offset)}
end

#expanded_hypernyms ⇒ Object

Get the entire hypernym tree (from this synset all the way up to entity) as an array.



# File 'lib/rwordnet/synset.rb', line 229

def expanded_hypernyms
  parents = hypernyms
  list = []
  return list unless parents

  while parents.length > 0
    parent = parents.pop
    next if list.include? parent.pos_offset
    list.push parent.pos_offset
    parents.push *parent.hypernyms
  end

  list.flatten!
  list.map! { |offset| Synset.new(@pos, offset)}
end
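
The difference between following only the first hypernym and following all of them is easiest to see on a small graph. The sketch below uses a hypothetical hypernym graph stored in a Hash (`HYPERNYMS`, with made-up relationships) and two illustrative helpers; it reproduces the traversal order of the two methods, not their Synset return values.

```ruby
# Toy hypernym graph (hypothetical): each key maps to its direct parents.
HYPERNYMS = {
  'dog' => ['canine', 'domestic_animal'],
  'canine' => ['carnivore'],
  'domestic_animal' => ['animal'],
  'carnivore' => ['animal'],
  'animal' => []
}.freeze

# Follow only the first parent at each level, stopping on a repeat.
def first_hypernym_chain(node)
  chain = []
  parent = HYPERNYMS.fetch(node).first
  while parent && !chain.include?(parent)
    chain << parent
    parent = HYPERNYMS.fetch(parent).first
  end
  chain
end

# Visit every ancestor exactly once, using an explicit stack.
def all_hypernyms(node)
  stack = HYPERNYMS.fetch(node).dup
  seen = []
  until stack.empty?
    parent = stack.pop
    next if seen.include?(parent)
    seen << parent
    stack.push(*HYPERNYMS.fetch(parent))
  end
  seen
end

p first_hypernym_chain('dog')  # ["canine", "carnivore", "animal"]
p all_hypernyms('dog')         # ["domestic_animal", "animal", "canine", "carnivore"]
```

Because the stack is popped from the end, the full traversal explores the last-listed parent first, so the result is not ordered by distance from the starting node.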

#expanded_hypernyms_depth ⇒ Object



# File 'lib/rwordnet/synset.rb', line 245

def expanded_hypernyms_depth
  parents = hypernyms.map{|hypernym| [hypernym, 1]}
  list = []
  out = []
  return list unless parents

  max_depth = 1
  while parents.length > 0
    parent, depth = parents.pop
    next if list.include? parent.pos_offset
    list.push parent.pos_offset
    out.push [Synset.new(@pos, parent.pos_offset), depth]
    parents.push *(parent.hypernyms.map{|hypernym| [hypernym, depth + 1]})
    max_depth = [max_depth, depth].max
  end
  return [out, max_depth]
end
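
Judging from the listing above, this method returns a two-element array: a list of [synset, depth] pairs and the largest depth recorded. A minimal sketch of the same bookkeeping on a hypothetical graph (`PARENTS` and `hypernyms_with_depth` are illustrative names, and plain strings stand in for Synsets):

```ruby
# Toy hypernym graph (hypothetical): each key maps to its direct parents.
PARENTS = {
  'dog' => ['canine', 'domestic_animal'],
  'canine' => ['carnivore'],
  'domestic_animal' => ['animal'],
  'carnivore' => ['animal'],
  'animal' => []
}.freeze

# Returns [[[ancestor, depth], ...], max_depth], visiting each ancestor once.
def hypernyms_with_depth(node)
  stack = PARENTS.fetch(node).map { |parent| [parent, 1] }
  seen = []
  out = []
  max_depth = 1

  until stack.empty?
    parent, depth = stack.pop
    next if seen.include?(parent)
    seen << parent
    out << [parent, depth]
    stack.push(*PARENTS.fetch(parent).map { |grandparent| [grandparent, depth + 1] })
    max_depth = [max_depth, depth].max
  end

  [out, max_depth]
end

pairs, max_depth = hypernyms_with_depth('dog')
p pairs      # [["domestic_animal", 1], ["animal", 2], ["canine", 1], ["carnivore", 2]]
p max_depth  # 2
```

Each ancestor keeps the depth of the first path that reaches it. In this graph "animal" is recorded at depth 2 via "domestic_animal", even though the path through "canine" and "carnivore" would reach it at depth 3.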

#hypernym ⇒ Object Also known as: parent

Get the parent synset (higher-level category, e.g. fruit -> reproductive_structure). If there are several hypernyms, only the first is returned.



# File 'lib/rwordnet/synset.rb', line 197

def hypernym
  relation(HYPERNYM)[0]
end

#hypernyms ⇒ Object Also known as: parents

Get the parent synsets (higher-level categories, e.g. fruit -> reproductive_structure) as an array.



# File 'lib/rwordnet/synset.rb', line 203

def hypernyms
  relation(HYPERNYM)
end

#hyponyms ⇒ Object Also known as: children

Get the child synsets (lower-level categories, e.g. fruit -> edible_fruit) as an array.



# File 'lib/rwordnet/synset.rb', line 208

def hyponyms
  relation(HYPONYM)
end

#relation(pointer_symbol) ⇒ Object

Get an array of Synsets with the relation `pointer_symbol` relative to this Synset. Mostly, this is an internal method used by convenience methods (e.g. Synset#antonyms), but it can take any valid pointer_symbol defined in pointers.rb.

Example (get the gloss of an antonym for ‘fall’):

WordNet::Lemma.find("fall", :verb).synsets[1].relation("!")[0].gloss


# File 'lib/rwordnet/synset.rb', line 186

def relation(pointer_symbol)
  @pointers.select { |pointer| pointer.symbol == pointer_symbol }.
    map! { |pointer| Synset.new(@synset_type, pointer.offset) }
end
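
The selection step in `relation` is a filter on pointer symbols followed by a lookup of each matching offset. A minimal sketch of the filter alone, assuming a hypothetical `ToyPointer` struct in place of the library's Pointer and made-up offsets; "!" is the antonym symbol used in the example above, while "@" and "~" are assumed here to be the hypernym and hyponym symbols.

```ruby
# Stand-in for the library's Pointer (assumption: only these fields matter here).
ToyPointer = Struct.new(:symbol, :offset)

# Made-up pointers for illustration.
POINTERS = [
  ToyPointer.new('@', 2325),
  ToyPointer.new('!', 1970),
  ToyPointer.new('~', 4923)
].freeze

# Offsets of all pointers that carry the given symbol.
def related_offsets(pointers, symbol)
  pointers.select { |pointer| pointer.symbol == symbol }.map(&:offset)
end

p related_offsets(POINTERS, '!')  # [1970]
p related_offsets(POINTERS, '+')  # []
```

An unknown or unused symbol simply yields an empty array, which is why callers such as `hypernym` can index the result with `[0]` and get nil when no relation exists.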

#to_s ⇒ Object Also known as: to_str

Returns a compact, human-readable form of this synset, e.g.

(v) fall (descend in free fall under the influence of gravity; "The branch fell from the tree"; "The unfortunate hiker fell into a crevasse")

for the second meaning of the verb “fall.”



# File 'lib/rwordnet/synset.rb', line 268

def to_s
  "(#{@synset_type}) #{words.map { |x| x.tr('_',' ') }.join(', ')} (#{@gloss})"
end
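
The formatting rule is small enough to try on its own: underscores in multi-word entries become spaces, words are joined with commas, and the gloss is wrapped in parentheses. A sketch with a hypothetical free function `format_synset` and made-up inputs:

```ruby
# Same string template as Synset#to_s, with the instance state passed in.
def format_synset(type, words, gloss)
  "(#{type}) #{words.map { |word| word.tr('_', ' ') }.join(', ')} (#{gloss})"
end

puts format_synset('n', ['edible_fruit', 'fruit'], 'an illustrative gloss')
# (n) edible fruit, fruit (an illustrative gloss)
```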

#word_count ⇒ Object Also known as: size

How many words does this Synset include?



# File 'lib/rwordnet/synset.rb', line 171

def word_count
  @word_counts.size
end

#words ⇒ Object

Get a list of the words included in this Synset.



# File 'lib/rwordnet/synset.rb', line 176

def words
  @word_counts.keys
end