Class: LinkParser::Dictionary

Inherits:
Object
Extended by:
Forwardable
Defined in:
lib/linkparser/dictionary.rb

Constant Summary

DefaultDataDir =
Config::CONFIG['datadir']

Constants for handling the different dictionary files.

Dict =
1
Affix =
2
Post =
3
Constituent =
4
DefaultDict =
"4.0.dict"
DefaultAffix =
"4.0.affix"
DefaultPost =
"4.0.knowledge"
DefaultConstituent =
"4.0.constituent-knowledge"


Constructor Details

#initialize(dict_opts) ⇒ Dictionary

Initializes a new Dictionary object. Takes a hash as its argument, with the following entries:

datadir - the directory where the dictionary files are located
dict - the main dictionary file
affix - the affix dictionary file
knowledge - the post-processing dictionary file
constituent-knowledge - the constituent knowledge dictionary file

Setting a value to an empty string prevents that file from being used, which works for every entry except datadir and the main dict. Setting a value to nil selects the default setting.



# File 'lib/linkparser/dictionary.rb', line 143

def initialize( dict_opts )
	@datadir = dict_opts['datadir'] || DefaultDataDir
	
	@dict =			Dictionary::open_read_dict( Dict,
									dict_opts['dict'], @datadir )
	@affix =		Dictionary::open_read_dict( Affix,
									dict_opts['affix'], @datadir )
	@post =			Dictionary::open_read_dict( Post,
									dict_opts['knowledge'], @datadir )
	@constituent =	Dictionary::open_read_dict( Constituent,
									dict_opts['constituent-knowledge'], @datadir )
end
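
The nil and empty-string rules described above can be sketched in isolation. This is a minimal illustration, not part of LinkParser: the `DEFAULT_FILES` table and the `resolve_dict_files` helper are hypothetical names, and the default file names are copied from the constants listed on this page.

```ruby
# Hypothetical helper, not part of LinkParser: mirrors the nil / empty-string
# rules that the constructor's options hash follows.
DEFAULT_FILES = {
  'dict'                  => '4.0.dict',
  'affix'                 => '4.0.affix',
  'knowledge'             => '4.0.knowledge',
  'constituent-knowledge' => '4.0.constituent-knowledge'
}.freeze

def resolve_dict_files(dict_opts)
  DEFAULT_FILES.each_with_object({}) do |(key, default), resolved|
    value = dict_opts[key]
    resolved[key] =
      if value.nil?
        default   # nil (or missing): fall back to the default file
      elsif value.empty?
        nil       # empty string: skip this dictionary entirely
      else
        value     # anything else: use the name as given
      end
  end
end

opts = resolve_dict_files('dict' => 'tiny.dict', 'affix' => '')
puts opts.inspect
```

Here the affix dictionary is skipped, the main dictionary uses the supplied name, and the two remaining entries fall back to their defaults.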

Instance Attribute Details

#constituent ⇒ Object (readonly)

The constituent-knowledge dictionary hash.



# File 'lib/linkparser/dictionary.rb', line 186

def constituent
  @constituent
end

#datadir ⇒ Object (readonly)

The directory where the dictionary files are located.



# File 'lib/linkparser/dictionary.rb', line 157

def datadir
  @datadir
end

#dict ⇒ Object (readonly)

The main dictionary hash.



# File 'lib/linkparser/dictionary.rb', line 160

def dict
  @dict
end

#post ⇒ Object (readonly)

The post-processing dictionary hash.



# File 'lib/linkparser/dictionary.rb', line 183

def post
  @post
end

Class Method Details

.open_read_dict(dicttype, dictname, datadir = DefaultDataDir) ⇒ Object

Takes the filename of a dictionary and reads it into a word-keyed hash. An empty dictname returns nil; a nil dictname selects the default file for the given dicttype.



# File 'lib/linkparser/dictionary.rb', line 104

def open_read_dict( dicttype, dictname, datadir = DefaultDataDir )
	if dictname and dictname.empty?
		# do nothing
		return nil
	else
		if dictname
			f = File.open( File.join(datadir, dictname) )
		else
			default = case dicttype
					  when Dict
						  DefaultDict
					  when Affix
						  DefaultAffix
					  when Post
						  DefaultPost
					  when Constituent
						  DefaultConstituent
					  end
			f = File.open( File.join(datadir, default) )
		end
		return read_dict(f.read(f.stat.size), datadir)
	end
end
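
The listing above opens the file with `File.open` and never closes it explicitly. As a minimal sketch of the same path-building and read step, the hypothetical `read_dictionary_text` helper below (not part of LinkParser) uses `File.read`, which closes the file once the contents are returned. The temporary directory and the `tiny.dict` file exist only for the demonstration.

```ruby
require 'tmpdir'

# Hypothetical helper, not part of LinkParser: returns the raw text of a
# dictionary file, or nil when the name is an empty string.
def read_dictionary_text(datadir, dictname)
  return nil if dictname.empty?
  # File.read opens the file, reads all of it, and closes the handle.
  File.read(File.join(datadir, dictname))
end

text = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'tiny.dict'), "cat dog: S+;\n")
  read_dictionary_text(dir, 'tiny.dict')
end
puts text
```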

.read_dict(dict, datadir = DefaultDataDir) ⇒ Object

Parses a dictionary string for its words and their definitions, returning a hash keyed by word whose values are LinkParser::Definition objects. Returns nil if the string contains no statements.



# File 'lib/linkparser/dictionary.rb', line 60

def read_dict( dict, datadir = DefaultDataDir )
	wordHash = {} # Hash#[] is faster than BinarySearchTree#[]
	macros = [] # Array#each is faster than Hash#each
	
	# read the dictionary file into an array of words:definition
	# "statements", excluding comments.
	statements = dict.gsub(/\n+|\s*%(?!\").*?\n/, " ").split(/\s*;(?!\")\s*/).compact
	return nil if statements.empty?
	statements.each {|statement|
		words, definition = statement.split(/\s*:(?!\")\s*/)
		# expand known macros; definition is nil for a malformed statement
		macros.each {|macro|
			definition.gsub!(macro[0], macro[1]) if definition
		}
		if(words =~ /<.*>/)
			words.strip!
			macros << [Regexp::new(words), definition]
		elsif(!words or !definition)
			$stderr.print "dict error #{statement}"
			# raise ParseError, "Dictionary outta whack: '#{statement}'"
		else
			if words =~ /^\// #/
				# then it's a filename, not a word, and the file will
				# contain a list of words.
				# Log.info("Reading in words from %s." % datadir + words)
				$stderr.print "Reading in words \n"
				words = File.open(datadir + words) {|f| f.read(f.stat.size)}
			end
			# so now we have a bunch of words and their shared
			# definition.  put each word into the hash with a value of
			# the definition data structure.
			words.gsub!(/"([^ ]+?)"/, '\1') # punctuation marks are in double-quotes
			words = words.split(/\s+/)
			definition = Definition::new(definition)
			words.each {|word|
				wordHash[word] = definition unless word.empty?
			}
		end
	}
	return wordHash
end
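
The statement-splitting step can be tried on a small dictionary string. The sketch below is a simplification under stated assumptions: `parse_statements` is a hypothetical name, it reuses the comment and separator patterns from the listing above, it ignores macros and "/file" word lists, and it stores plain definition strings where the library builds LinkParser::Definition objects.

```ruby
# Simplified sketch, not the library's implementation: no macros, no
# "/file" word lists, and plain strings instead of Definition objects.
def parse_statements(text)
  word_hash = {}
  # Replace newlines and %-comments with spaces, then split on ";".
  statements = text.gsub(/\n+|\s*%(?!").*?\n/, ' ').split(/\s*;(?!")\s*/)
  statements.each do |statement|
    words, definition = statement.split(/\s*:(?!")\s*/)
    next unless words && definition # skip malformed statements
    # Splitting on whitespace can yield a leading empty string.
    words.split(/\s+/).each do |word|
      word_hash[word] = definition unless word.empty?
    end
  end
  word_hash
end

sample = <<~DICT
  % a comment line
  cat dog: D- & S+;
  runs: S-;
DICT

parsed = parse_statements(sample)
puts parsed.inspect
```

In the result, "cat" and "dog" share one definition because they appear in the same statement, and the comment line contributes nothing.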

Instance Method Details

#affix(words) ⇒ Object

Performs affix processing on the words, which just separates conjunctions and punctuation from the words they are attached to. Returns the words unchanged if no affix dictionary was loaded.



# File 'lib/linkparser/dictionary.rb', line 166

def affix( words )
	return words unless @affix
	@affix.each {|punct,move|
		words = words.inject([]) {|arr,ele|
			if /RPUNC/.match(move.inspect) && /(.*)(#{Regexp.escape(punct)}.*)$/.match(ele)
				arr << $1 << $2
			elsif /LPUNC/.match(move.inspect) && /^(#{Regexp.escape(punct)})(.*)/.match(ele)
				arr << $1 << $2
			else
				arr << ele
			end
		}
	}
	words
end
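
The splitting performed by #affix can be illustrated without loading a dictionary. In this sketch the `AFFIXES` table and the `split_affixes` helper are hypothetical: the symbols `:rpunc` and `:lpunc` stand in for the RPUNC and LPUNC markers that the listing above looks for in each affix definition. Unlike the listing, the patterns here require at least one character on the word side, so a token that is only punctuation is left alone instead of producing an empty string.

```ruby
# Hypothetical affix table, not read from any dictionary file.
AFFIXES = { ',' => :rpunc, '.' => :rpunc, '(' => :lpunc }.freeze

def split_affixes(words, affixes = AFFIXES)
  affixes.each do |punct, side|
    escaped = Regexp.escape(punct)
    words = words.flat_map do |word|
      if side == :rpunc && (m = /\A(.+)(#{escaped}.*)\z/.match(word))
        [m[1], m[2]] # word, then trailing punctuation
      elsif side == :lpunc && (m = /\A(#{escaped})(.+)\z/.match(word))
        [m[1], m[2]] # leading punctuation, then word
      else
        [word]
      end
    end
  end
  words
end

tokens = split_affixes(['(hello,', 'world.'])
puts tokens.inspect
```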