Class: WordNet::Synset
- Inherits: Object
  - Object
  - WordNet::Synset
- Defined in: lib/rwordnet/synset.rb
Overview
Represents a synset (a group of synonymous words) in WordNet. Synsets are related to each other by numerous relationships, including Hypernym (x is a hypernym of y <=> x is a parent of y) and Hyponym (x is a hyponym of y <=> x is a child of y).
Instance Attribute Summary
-
#gloss ⇒ Object
readonly
Get a string representation of this synset’s gloss.
-
#lex_filenum ⇒ Object
readonly
A two digit decimal integer representing the name of the lexicographer file containing the synset for the sense.
-
#pos ⇒ Object
readonly
Get a shorthand representation of the part of speech this synset represents, e.g. “v” for verbs.
-
#pos_offset ⇒ Object
readonly
Get the offset, in bytes, at which this synset’s POS information is stored in WordNet’s internal DB.
-
#synset_offset ⇒ Object
readonly
Get the offset, in bytes, at which this synset’s information is stored in WordNet’s internal DB.
-
#synset_type ⇒ Object
readonly
Get the part of speech type of this synset.
-
#word_counts ⇒ Object
readonly
Get the list of words (and their frequencies within the WordNet graph) contained in this Synset.
Class Method Summary
- ._apply_rules(forms, pos) ⇒ Object
- ._filter_forms(forms, pos) ⇒ Object
-
.find(word, pos) ⇒ Object
Ported from Python NLTK. Load all synsets with a given lemma and part of speech tag.
- .find_all(word) ⇒ Object
- .load_exception_map ⇒ Object
-
.morphy(form, pos) ⇒ Object
Ported from NLTK (Python), from jordanbg: given an original string x, apply morphological rules and return the resulting forms that are in the database.
- .morphy_all(form) ⇒ Object
Instance Method Summary
-
#antonyms ⇒ Object
Get the Synsets of this sense’s antonym.
-
#expanded_first_hypernyms ⇒ Object
Get the entire hypernym tree (from this synset all the way up to entity) as an array.
-
#expanded_hypernyms ⇒ Object
Get the entire hypernym tree (from this synset all the way up to entity) as an array.
- #expanded_hypernyms_depth ⇒ Object
-
#hypernym ⇒ Object
(also: #parent)
Get the parent synset (higher-level category, e.g. fruit -> reproductive_structure).
-
#hypernyms ⇒ Object
(also: #parents)
Get the parent synsets (higher-level categories, e.g. fruit -> reproductive_structure) as an array.
-
#hyponyms ⇒ Object
(also: #children)
Get the child synset(s) (lower-level categories, e.g. fruit -> edible_fruit).
-
#initialize(pos, offset) ⇒ Synset
constructor
Create a new synset by reading from the data file specified by pos, at offset bytes into the file.
-
#relation(pointer_symbol) ⇒ Object
Get an array of Synsets with the relation `pointer_symbol` relative to this Synset.
-
#to_s ⇒ Object
(also: #to_str)
Returns a compact, human-readable form of this synset.
-
#word_count ⇒ Object
(also: #size)
How many words does this Synset include?
-
#words ⇒ Object
Get a list of words included in this Synset.
Constructor Details
#initialize(pos, offset) ⇒ Synset
Create a new synset by reading from the data file specified by pos, at offset bytes into the file. This is how the WordNet database is organized. You shouldn’t be creating Synsets directly; instead, use Lemma#synsets.
# File 'lib/rwordnet/synset.rb', line 51
def initialize(pos, offset)
  data_line = DB.open(File.join("dict", "data.#{SYNSET_TYPES.fetch(pos)}")) do |f|
    f.seek(offset)
    f.readline.strip
  end

  info_line, @gloss = data_line.split(" | ", 2)
  line = info_line.split(" ")

  @pos = pos
  @pos_offset = offset
  @synset_offset = line.shift
  @lex_filenum = line.shift
  @synset_type = line.shift

  @word_counts = {}
  word_count = line.shift.to_i
  word_count.times do
    @word_counts[line.shift] = line.shift.to_i
  end

  pointer_count = line.shift.to_i
  @pointers = Array.new(pointer_count).map do
    Pointer.new(
      symbol: line.shift[0],
      offset: line.shift.to_i,
      pos: line.shift,
      source: line.shift
    )
  end
end
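To make the parsing steps in the constructor easier to follow, here is a minimal, self-contained sketch that applies the same field-by-field splitting to a single data line held in a string. `SketchPointer`, `parse_data_line`, and the sample line are all assumptions made for illustration: the line is invented in the layout the constructor expects and is not copied from the WordNet database, and the struct is only a stand-in for the library's `Pointer` class.

```ruby
# Hypothetical stand-in for the library's Pointer class (illustration only).
SketchPointer = Struct.new(:symbol, :offset, :pos, :source, keyword_init: true)

# Parse one data line the same way the constructor does:
# header fields, then word/number pairs, then pointers, then the gloss.
def parse_data_line(data_line)
  info_line, gloss = data_line.split(" | ", 2)
  fields = info_line.split(" ")

  synset_offset = fields.shift
  lex_filenum   = fields.shift
  synset_type   = fields.shift

  # The word count says how many (word, number) pairs follow.
  word_counts = {}
  fields.shift.to_i.times do
    word = fields.shift
    word_counts[word] = fields.shift.to_i
  end

  # The pointer count says how many four-field pointer records follow.
  pointers = Array.new(fields.shift.to_i) do
    SketchPointer.new(
      symbol: fields.shift[0],
      offset: fields.shift.to_i,
      pos:    fields.shift,
      source: fields.shift
    )
  end

  { synset_offset: synset_offset, lex_filenum: lex_filenum,
    synset_type: synset_type, word_counts: word_counts,
    pointers: pointers, gloss: gloss }
end

# Invented sample line (not real database content).
sample = '01970826 38 v 02 rise 0 lift 0 002 @ 01835496 v 0000 ! 01970272 v 0101 | move upward; "The fog lifted"'
parsed = parse_data_line(sample)
puts parsed[:word_counts].keys.inspect
puts parsed[:pointers].map(&:symbol).inspect
```

Like the constructor, the sketch consumes the token list destructively with `shift`, so the order of the calls is what gives each token its meaning.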
Instance Attribute Details
#gloss ⇒ Object (readonly)
Get a string representation of this synset’s gloss. The gloss is a human-readable description of this concept, often with example usage, e.g.:
move upward; "The fog lifted"; "The smoke arose from the forest fire"; "The mist uprose from the meadows"
for the second sense of the verb “fall”
# File 'lib/rwordnet/synset.rb', line 47
def gloss
  @gloss
end
#lex_filenum ⇒ Object (readonly)
A two digit decimal integer representing the name of the lexicographer file containing the synset for the sense. Probably only of interest if you’re using a wordnet database marked up with custom attributes, and you want to ensure that you’re using your own additions.
# File 'lib/rwordnet/synset.rb', line 25
def lex_filenum
  @lex_filenum
end
#pos ⇒ Object (readonly)
Get a shorthand representation of the part of speech this synset represents, e.g. “v” for verbs.
# File 'lib/rwordnet/synset.rb', line 39
def pos
  @pos
end
#pos_offset ⇒ Object (readonly)
Get the offset, in bytes, at which this synset’s POS information is stored in WordNet’s internal DB. You almost certainly don’t care about this.
# File 'lib/rwordnet/synset.rb', line 36
def pos_offset
  @pos_offset
end
#synset_offset ⇒ Object (readonly)
Get the offset, in bytes, at which this synset’s information is stored in WordNet’s internal DB. You almost certainly don’t care about this.
# File 'lib/rwordnet/synset.rb', line 20
def synset_offset
  @synset_offset
end
#synset_type ⇒ Object (readonly)
Get the part of speech type of this synset. One of ‘n’ (noun), ‘v’ (verb), ‘a’ (adjective), or ‘r’ (adverb).
# File 'lib/rwordnet/synset.rb', line 32
def synset_type
  @synset_type
end
#word_counts ⇒ Object (readonly)
Get the list of words (and their frequencies within the WordNet graph) contained in this Synset.
# File 'lib/rwordnet/synset.rb', line 29
def word_counts
  @word_counts
end
Class Method Details
._apply_rules(forms, pos) ⇒ Object
# File 'lib/rwordnet/synset.rb', line 109
def self._apply_rules(forms, pos)
  substitutions = MORPHOLOGICAL_SUBSTITUTIONS[pos]
  out = []
  forms.each do |form|
    substitutions.each do |old, new|
      if form.end_with? old
        out.push form[0...-old.length] + new
      end
    end
  end
  return out
end
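As a rough illustration of what `._apply_rules` does, the sketch below applies suffix substitutions to a list of forms. The contents of `MORPHOLOGICAL_SUBSTITUTIONS` are not shown on this page, so `SKETCH_SUBSTITUTIONS` is a made-up table and `apply_rules_sketch` is a hypothetical name; only the suffix-replacement logic mirrors the listing above.

```ruby
# Made-up rule table keyed by part of speech; each rule is [old_suffix, new_suffix].
# The library's real MORPHOLOGICAL_SUBSTITUTIONS may differ.
SKETCH_SUBSTITUTIONS = {
  "n" => [["s", ""], ["ses", "s"], ["ies", "y"]],
  "v" => [["s", ""], ["ies", "y"], ["ed", "e"], ["ed", ""], ["ing", "e"], ["ing", ""]]
}.freeze

# For every form, emit one candidate per rule whose old suffix matches.
def apply_rules_sketch(forms, pos)
  substitutions = SKETCH_SUBSTITUTIONS.fetch(pos)
  forms.flat_map do |form|
    substitutions
      .select { |old, _new| form.end_with?(old) }
      .map { |old, new| form[0...-old.length] + new }
  end
end

puts apply_rules_sketch(["berries"], "n").inspect
puts apply_rules_sketch(["running"], "v").inspect
```

The candidates are not checked against any dictionary at this stage, so non-words are expected in the output; filtering is a separate step.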
._filter_forms(forms, pos) ⇒ Object
# File 'lib/rwordnet/synset.rb', line 122
def self._filter_forms(forms, pos)
  forms.reject{|form| Lemma.find(form, pos).nil?}.uniq
end
.find(word, pos) ⇒ Object
Ported from Python NLTK. Load all synsets with a given lemma and part of speech tag. Both arguments are required in this port: to load synsets for all parts of speech, use .find_all. Unlike the NLTK original, this method takes no lang parameter.
# File 'lib/rwordnet/synset.rb', line 89
def self.find(word, pos)
  word = word.downcase
  lemmas = self.morphy(word, pos).map{|form| WordNet::Lemma.find(form, pos)}
  lemmas.map{|lemma| lemma.synsets}.flatten
end
.find_all(word) ⇒ Object
# File 'lib/rwordnet/synset.rb', line 95
def self.find_all(word)
  SYNSET_TYPES.values.map{|pos| self.find(word, pos)}.flatten
end
.load_exception_map ⇒ Object
# File 'lib/rwordnet/synset.rb', line 99
def self.load_exception_map
  SYNSET_TYPES.each do |_, pos|
    @exception_map[pos] = {}
    File.open(File.join(@morphy_path, 'exceptions', "#{pos}.exc"), 'r').each_line do |line|
      line = line.split
      @exception_map[pos][line[0]] = line[1..-1]
    end
  end
end
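The listing above implies that each line of an exception file holds an inflected form followed by one or more base forms, separated by whitespace. The following sketch parses that layout from any IO-like object, which keeps it runnable without the real `.exc` files. `parse_exceptions` and the sample entries are assumptions for illustration, not part of the library.

```ruby
require "stringio"

# Build a hash of inflected form => array of base forms from
# whitespace-separated lines, skipping blank lines.
def parse_exceptions(io)
  io.each_line.with_object({}) do |line, map|
    inflected, *base_forms = line.split
    next if inflected.nil?
    map[inflected] = base_forms
  end
end

# Illustrative entries in the assumed "inflected base [base ...]" layout.
sample = StringIO.new("geese goose\nmice mouse\n\naxes axis ax\n")
puts parse_exceptions(sample).inspect
```

When reading from disk, passing a block to `File.open` (or using `File.foreach`) would also close the file handle once the lines have been read.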
.morphy(form, pos) ⇒ Object
Ported from NLTK (Python), from jordanbg. Given an original string x:
1. Apply rules once to the input to get y1, y2, y3, etc.
2. Return all that are in the database.
3. If there are no matches, keep applying rules until you either find a match or you can’t go any further.
# File 'lib/rwordnet/synset.rb', line 133
def self.morphy(form, pos)
  if @exception_map == {}
    self.load_exception_map
  end
  exceptions = @exception_map[pos]

  # 0. Check the exception lists
  if exceptions.has_key? form
    return self._filter_forms([form] + exceptions[form], pos)
  end

  # 1. Apply rules once to the input to get y1, y2, y3, etc.
  forms = self._apply_rules([form], pos)

  # 2. Return all that are in the database (and check the original too)
  results = self._filter_forms([form] + forms, pos)
  if results != []
    return results
  end

  # 3. If there are no matches, keep applying rules until we find a match
  while forms.length > 0
    forms = self._apply_rules(forms, pos)
    results = self._filter_forms(forms, pos)
    if results != []
      return results
    end
  end

  # Return an empty list if we can't find anything
  return []
end
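The control flow of `.morphy` can be tried out without a WordNet database by swapping the lemma lookup for a small in-memory set. Everything prefixed `TOY_` or `toy_` below is an assumption for illustration: the lemma set, the rule table, and the exception list are invented, and only the order of the steps follows the listing above.

```ruby
require "set"

# Invented stand-ins for the database, the rule table, and the exception list.
TOY_LEMMAS     = Set["dog", "church", "berry", "goose"].freeze
TOY_RULES      = [["s", ""], ["es", ""], ["ies", "y"]].freeze
TOY_EXCEPTIONS = { "geese" => ["goose"] }.freeze

# One round of suffix substitution over all candidate forms.
def toy_apply_rules(forms)
  forms.flat_map do |form|
    TOY_RULES.select { |old, _new| form.end_with?(old) }
             .map { |old, new| form[0...-old.length] + new }
  end
end

# Keep only candidates that exist in the toy lemma set.
def toy_filter(forms)
  forms.select { |form| TOY_LEMMAS.include?(form) }.uniq
end

def toy_morphy(form)
  # 0. Exceptions take priority over the rules.
  return toy_filter([form] + TOY_EXCEPTIONS[form]) if TOY_EXCEPTIONS.key?(form)

  # 1-2. Apply the rules once and return any known forms (original included).
  forms = toy_apply_rules([form])
  results = toy_filter([form] + forms)
  return results unless results.empty?

  # 3. Keep applying rules until something matches or no candidates remain.
  until forms.empty?
    forms = toy_apply_rules(forms)
    results = toy_filter(forms)
    return results unless results.empty?
  end
  []
end

puts toy_morphy("churches").inspect
puts toy_morphy("dogses").inspect
```

The loop in step 3 terminates here because every toy rule produces a strictly shorter string, so the candidate list eventually becomes empty.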
.morphy_all(form) ⇒ Object
# File 'lib/rwordnet/synset.rb', line 166
def self.morphy_all(form)
  SYNSET_TYPES.values.map{|pos| self.morphy(form, pos)}.flatten
end
Instance Method Details
#antonyms ⇒ Object
Get the Synsets of this sense’s antonym.
# File 'lib/rwordnet/synset.rb', line 192
def antonyms
  relation(ANTONYM)
end
#expanded_first_hypernyms ⇒ Object
Get the hypernym chain (from this synset all the way up to entity) as an array, following only the first hypernym at each level.
# File 'lib/rwordnet/synset.rb', line 213
def expanded_first_hypernyms
  parent = hypernym
  list = []
  return list unless parent

  while parent
    break if list.include? parent.pos_offset
    list.push parent.pos_offset
    parent = parent.hypernym
  end

  list.flatten!
  list.map! { |offset| Synset.new(@pos, offset)}
end
#expanded_hypernyms ⇒ Object
Get the entire hypernym tree (from this synset all the way up to entity) as an array, following every hypernym of every ancestor.
# File 'lib/rwordnet/synset.rb', line 229
def expanded_hypernyms
  parents = hypernyms
  list = []
  return list unless parents

  while parents.length > 0
    parent = parents.pop
    next if list.include? parent.pos_offset
    list.push parent.pos_offset
    parents.push *parent.hypernyms
  end

  list.flatten!
  list.map! { |offset| Synset.new(@pos, offset)}
end
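The difference between the two traversals is easier to see on a small graph. The sketch below uses a plain hash in place of the database; the graph contents and the names `TOY_HYPERNYMS`, `first_hypernym_chain`, and `all_hypernyms` are invented for illustration and do not reflect real WordNet relations.

```ruby
# Invented graph: each name maps to its list of hypernyms (parents).
TOY_HYPERNYMS = {
  "edible_fruit"           => ["fruit", "produce"],
  "fruit"                  => ["reproductive_structure"],
  "produce"                => ["food"],
  "reproductive_structure" => ["entity"],
  "food"                   => ["entity"],
  "entity"                 => []
}.freeze

# Follow only the first parent at each level, stopping at the root or on a cycle.
def first_hypernym_chain(graph, start)
  chain = []
  parent = graph.fetch(start).first
  while parent && !chain.include?(parent)
    chain << parent
    parent = graph.fetch(parent).first
  end
  chain
end

# Visit every ancestor once, using an explicit stack (depth-first order).
def all_hypernyms(graph, start)
  seen = []
  stack = graph.fetch(start).dup
  until stack.empty?
    parent = stack.pop
    next if seen.include?(parent)
    seen << parent
    stack.push(*graph.fetch(parent))
  end
  seen
end

puts first_hypernym_chain(TOY_HYPERNYMS, "edible_fruit").inspect
puts all_hypernyms(TOY_HYPERNYMS, "edible_fruit").inspect
```

In this toy graph the chain contains three ancestors while the full traversal finds five, because the second parent of the starting node opens a separate branch.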
#expanded_hypernyms_depth ⇒ Object
# File 'lib/rwordnet/synset.rb', line 245
def expanded_hypernyms_depth
  parents = hypernyms.map{|hypernym| [hypernym, 1]}
  list = []
  out = []
  return list unless parents

  max_depth = 1
  while parents.length > 0
    parent, depth = parents.pop
    next if list.include? parent.pos_offset
    list.push parent.pos_offset
    out.push [Synset.new(@pos, parent.pos_offset), depth]
    parents.push *(parent.hypernyms.map{|hypernym| [hypernym, depth + 1]})
    max_depth = [max_depth, depth].max
  end
  return [out, max_depth]
end
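Judging from the listing, `#expanded_hypernyms_depth` returns a pair: an array of `[synset, depth]` entries and the largest depth seen. A minimal sketch of the same bookkeeping on an invented hash-based graph follows; `DEPTH_GRAPH` and `hypernyms_with_depth` are hypothetical names, and plain strings stand in for Synset objects.

```ruby
# Invented graph: each name maps to its list of hypernyms (parents).
DEPTH_GRAPH = {
  "apple"  => ["fruit", "produce"],
  "fruit"  => ["entity"],
  "produce" => ["food"],
  "food"   => ["entity"],
  "entity" => []
}.freeze

# Depth-first walk that records the depth at which each ancestor is first reached.
def hypernyms_with_depth(graph, start)
  stack = graph.fetch(start).map { |name| [name, 1] }
  seen = []
  out = []
  max_depth = 1
  until stack.empty?
    name, depth = stack.pop
    next if seen.include?(name)
    seen << name
    out << [name, depth]
    stack.push(*graph.fetch(name).map { |parent| [parent, depth + 1] })
    max_depth = [max_depth, depth].max
  end
  [out, max_depth]
end

pairs, max_depth = hypernyms_with_depth(DEPTH_GRAPH, "apple")
puts pairs.inspect
puts max_depth
```

Because each ancestor is recorded only on its first visit, the stored depth is the depth along whichever path reached it first, which is not necessarily the shortest one.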
#hypernym ⇒ Object Also known as: parent
Get the parent synset (higher-level category, e.g. fruit -> reproductive_structure).
# File 'lib/rwordnet/synset.rb', line 197
def hypernym
  relation(HYPERNYM)[0]
end
#hypernyms ⇒ Object Also known as: parents
Get the parent synsets (higher-level categories, e.g. fruit -> reproductive_structure) as an array.
# File 'lib/rwordnet/synset.rb', line 203
def hypernyms
  relation(HYPERNYM)
end
#hyponyms ⇒ Object Also known as: children
Get the child synset(s) (lower-level categories, e.g. fruit -> edible_fruit).
# File 'lib/rwordnet/synset.rb', line 208
def hyponyms
  relation(HYPONYM)
end
#relation(pointer_symbol) ⇒ Object
Get an array of Synsets with the relation `pointer_symbol` relative to this Synset. Mostly, this is an internal method used by convenience methods (e.g. Synset#antonyms), but it can take any valid pointer_symbol defined in pointers.rb.
Example (get the gloss of an antonym for ‘fall’):
WordNet::Lemma.find("fall", :verb).synsets[1].relation("!")[0].gloss
# File 'lib/rwordnet/synset.rb', line 186
def relation(pointer_symbol)
  @pointers.select { |pointer| pointer.symbol == pointer_symbol }.
    map! { |pointer| Synset.new(@synset_type, pointer.offset) }
end
#to_s ⇒ Object Also known as: to_str
Returns a compact, human-readable form of this synset, e.g.
(v) fall (descend in free fall under the influence of gravity; "The branch fell from the tree"; "The unfortunate hiker fell into a crevasse")
for the second meaning of the verb “fall.”
# File 'lib/rwordnet/synset.rb', line 268
def to_s
  "(#{@synset_type}) #{words.map { |x| x.tr('_',' ') }.join(', ')} (#{@gloss})"
end
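The format produced by `#to_s` can be reproduced with a standalone helper, which may help when checking output in tests. `format_synset` is a hypothetical function written for this illustration; it takes the three values the method reads from instance state as plain arguments, and the sample words and gloss are made up.

```ruby
# Build "(type) word, word (gloss)", replacing underscores in words with spaces.
def format_synset(synset_type, words, gloss)
  "(#{synset_type}) #{words.map { |word| word.tr('_', ' ') }.join(', ')} (#{gloss})"
end

puts format_synset("v", ["fall", "come_down"], "descend in free fall")
```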
#word_count ⇒ Object Also known as: size
How many words does this Synset include?
# File 'lib/rwordnet/synset.rb', line 171
def word_count
  @word_counts.size
end
#words ⇒ Object
Get a list of words included in this Synset.
# File 'lib/rwordnet/synset.rb', line 176
def words
  @word_counts.keys
end