Class: LinkParser::Dictionary
- Extended by:
- Forwardable
- Defined in:
- lib/linkparser/dictionary.rb
Constant Summary
- DefaultDataDir =
Config::CONFIG['datadir']
- Dict =
Type codes identifying the different dictionary files (Dict, Affix, Post, Constituent), followed by the default filename for each.
1
- Affix =
2
- Post =
3
- Constituent =
4
- DefaultDict =
"4.0.dict"
- DefaultAffix =
"4.0.affix"
- DefaultPost =
"4.0.knowledge"
- DefaultConstituent =
"4.0.constituent-knowledge"
Instance Attribute Summary
-
#constituent ⇒ Object
readonly
The constituent-knowledge dictionary hash.
-
#datadir ⇒ Object
readonly
The directory in which the dictionary files are located.
-
#dict ⇒ Object
readonly
The main dictionary hash.
-
#post ⇒ Object
readonly
The post-processing dictionary hash.
Class Method Summary
-
.open_read_dict(dicttype, dictname, datadir = DefaultDataDir) ⇒ Object
This takes a filename of a dictionary, and reads it into the word-keyed hash.
-
.read_dict(dict, datadir = DefaultDataDir) ⇒ Object
This parses a dictionary string/file for its words and their definitions, returning a hash keyed by word with values being LinkParser::Definition objects.
Instance Method Summary
-
#affix(words) ⇒ Object
Does affix processing on the words, which amounts to separating conjunctions and punctuation from the words they are attached to.
-
#initialize(dict_opts) ⇒ Dictionary
constructor
Initializes a new Dictionary object.
Constructor Details
#initialize(dict_opts) ⇒ Dictionary
Initializes a new Dictionary object. Takes a hash as its argument, with the following entries:
- datadir - the directory where the dictionary files are located
- dict - the main dictionary file
- affix - the affix dictionary file
- knowledge - the post-processing dictionary file
- constituent-knowledge - the constituent knowledge dictionary file
Setting a value to an empty string prevents that entry from being used, which works for every entry except datadir and the main dict. Setting a value to nil means the default setting is used.
    # File 'lib/linkparser/dictionary.rb', line 143
    def initialize( dict_opts )
      @datadir = dict_opts['datadir'] || DefaultDataDir
      @dict = Dictionary::open_read_dict( Dict, dict_opts['dict'], @datadir )
      @affix = Dictionary::open_read_dict( Affix, dict_opts['affix'], @datadir )
      @post = Dictionary::open_read_dict( Post, dict_opts['knowledge'], @datadir )
      @constituent = Dictionary::open_read_dict( Constituent, dict_opts['constituent-knowledge'], @datadir )
    end
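The nil / empty-string / filename rules for the options hash can be illustrated with a small standalone sketch. It does not call LinkParser itself: `DEFAULT_FILES`, `resolve_dict_path`, and the `/usr/share/link` directory are hypothetical names chosen for illustration, and only the default filenames are taken from the constants listed above.

```ruby
# Hypothetical lookup table mirroring the documented default filenames.
DEFAULT_FILES = {
  "dict"                  => "4.0.dict",
  "affix"                 => "4.0.affix",
  "knowledge"             => "4.0.knowledge",
  "constituent-knowledge" => "4.0.constituent-knowledge"
}.freeze

# Returns the path that would be loaded for +key+, or nil when the
# entry has been disabled with an empty string.
def resolve_dict_path(opts, key, datadir)
  name = opts[key]
  return nil if name && name.empty?                     # "" disables the entry
  File.join(datadir, name || DEFAULT_FILES.fetch(key))  # nil falls back to the default
end

# Assumed example options: default main dict, no affix file, custom knowledge file.
opts = { "dict" => nil, "affix" => "", "knowledge" => "custom.knowledge" }
paths = DEFAULT_FILES.keys.map { |key|
  [key, resolve_dict_path(opts, key, "/usr/share/link")]
}.to_h
```

A key that is absent from the hash behaves like nil, so `constituent-knowledge` resolves to its default file here.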
Instance Attribute Details
#constituent ⇒ Object (readonly)
The constituent-knowledge dictionary hash.
    # File 'lib/linkparser/dictionary.rb', line 186
    def constituent
      @constituent
    end
#datadir ⇒ Object (readonly)
The directory in which the dictionary files are located.
    # File 'lib/linkparser/dictionary.rb', line 157
    def datadir
      @datadir
    end
#dict ⇒ Object (readonly)
The main dictionary hash.
    # File 'lib/linkparser/dictionary.rb', line 160
    def dict
      @dict
    end
#post ⇒ Object (readonly)
The post-processing dictionary hash.
    # File 'lib/linkparser/dictionary.rb', line 183
    def post
      @post
    end
Class Method Details
.open_read_dict(dicttype, dictname, datadir = DefaultDataDir) ⇒ Object
This takes a filename of a dictionary, and reads it into the word-keyed hash.
    # File 'lib/linkparser/dictionary.rb', line 104
    def open_read_dict( dicttype, dictname, datadir = DefaultDataDir )
      if dictname and dictname.empty?
        # do nothing
        return nil
      else
        if dictname
          f = File.open( File.join(datadir, dictname) )
        else
          default = case dicttype
                    when Dict
                      DefaultDict
                    when Affix
                      DefaultAffix
                    when Post
                      DefaultPost
                    when Constituent
                      DefaultConstituent
                    end
          f = File.open( File.join(datadir, default) )
        end
        return read_dict(f.read(f.stat.size), datadir)
      end
    end
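The listing opens the file with `File.open` without a block and never closes it explicitly, so the handle stays open until it is garbage collected. As a hedged alternative, not the library's code, the read step could be sketched with `File.read`, which opens, reads, and closes in one call. The helper name `read_dictionary_text` and the `tiny.dict` file are hypothetical.

```ruby
require "tmpdir"

# Hypothetical helper: returns the full text of a dictionary file.
# File.read closes the file before returning.
def read_dictionary_text(datadir, filename)
  File.read(File.join(datadir, filename))
end

# Demonstration against a throwaway directory so the sketch is self-contained.
text = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "tiny.dict"), "dog: S+;\n")
  read_dictionary_text(dir, "tiny.dict")
end
```

The returned string is what would then be handed to a parser such as `read_dict`.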
.read_dict(dict, datadir = DefaultDataDir) ⇒ Object
This parses a dictionary string/file for its words and their definitions, returning a hash keyed by word with values being LinkParser::Definition objects.
    # File 'lib/linkparser/dictionary.rb', line 60
    def read_dict( dict, datadir = DefaultDataDir )
      wordHash = {} # Hash#[] is faster than BinarySearchTree#[]
      macros = [] # Array#each is faster than Hash#each

      # read the dictionary file into an array of words:definition
      # "statements", excluding comments.
      statements = dict.gsub(/\n+|\s*%(?!\").*?\n/, " ").split(/\s*;(?!\")\s*/).compact
      return nil if statements.empty?

      statements.each {|statement|
        words, definition = statement.split(/\s*:(?!\")\s*/)
        macros.each {|macro|
          definition.gsub!(macro[0], macro[1])
        }
        if(words =~ /<.*>/)
          words.strip!
          macros << [Regexp::new(words), definition]
        elsif(!words or !definition)
          $stderr.print "dict error #{statement}"
          # raise ParseError, "Dictionary outta whack: '#{statement}'"
        else
          if words =~ /^\// #/
            # then it's a filename, not a word, and the file will
            # contain a list of words.
            # Log.info("Reading in words from %s." % datadir + words)
            $stderr.print "Reading in words \n"
            words = File.open(datadir + words) {|f| f.read(f.stat.size)}
          end

          # so now we have a bunch of words and their shared
          # definition. put each word into the hash with a value of
          # the definition data structure.
          words.gsub!(/"([^ ]+?)"/, '\1') # punctuation marks are in double-quotes
          words = words.split(/\s+/)
          definition = Definition::new(definition)
          words.each {|word|
            wordHash[word] = definition unless word.empty?
          }
        end
      }
      return wordHash
    end
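The core of the parsing step (drop `%` comments, split on `;` into statements, split each statement on `:` into words and a shared definition) can be shown with a deliberately simplified sketch. It is an illustration under assumptions, not the library's parser: it skips macros, word-list files, and quoted punctuation, it stores plain strings where `read_dict` stores LinkParser::Definition objects, and `parse_statements` and `SAMPLE` are hypothetical names.

```ruby
# Assumed sample input in the "words: definition;" statement format.
SAMPLE = <<~DICT
  % a comment line
  dog cat: D+ & S-;
  runs: S+;
DICT

# Simplified sketch: returns a hash keyed by word, each value being the
# definition string shared by every word in the same statement.
def parse_statements(text)
  entries = {}
  # Strip %-comments, then split into ';'-terminated statements.
  statements = text.gsub(/%.*$/, "").split(";").map(&:strip).reject(&:empty?)
  statements.each do |statement|
    words, definition = statement.split(":", 2).map(&:strip)
    next if words.nil? || definition.nil?   # skip malformed statements
    words.split.each { |word| entries[word] = definition }
  end
  entries
end

entries = parse_statements(SAMPLE)
```

Because `dog` and `cat` appear in the same statement, they map to the same definition text, which mirrors how the real method shares one definition object among all words of a statement.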
Instance Method Details
#affix(words) ⇒ Object
Does affix processing on the words, which amounts to separating conjunctions and punctuation from the words they are attached to.
    # File 'lib/linkparser/dictionary.rb', line 166
    def affix( words )
      return words unless @affix
      @affix.each {|punct,move|
        words = words.inject([]) {|arr,ele|
          if /RPUNC/.match(move.inspect) && /(.*)(#{Regexp.escape(punct)}.*)$/.match(ele)
            arr << $1 << $2
          elsif /LPUNC/.match(move.inspect) && /^(#{Regexp.escape(punct)})(.*)/.match(ele)
            arr << $1 << $2
          else
            arr << ele
          end
        }
      }
      words
    end
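A standalone sketch of the same idea follows, assuming the affix data can be reduced to a hash from punctuation mark to the string "RPUNC" (strip from the right of a word) or "LPUNC" (strip from the left). That representation, and the name `split_punctuation`, are assumptions made for illustration. Unlike the listing, this variant only splits when something is left over on the other side, so it never emits empty strings.

```ruby
# Hypothetical sketch: splits leading/trailing punctuation off each word.
# +rules+ maps a punctuation string to "RPUNC" or "LPUNC".
def split_punctuation(words, rules)
  rules.reduce(words) do |current, (punct, role)|
    escaped = Regexp.escape(punct)
    current.flat_map do |word|
      if role == "RPUNC" && (m = /\A(.+)(#{escaped})\z/.match(word))
        [m[1], m[2]]          # "world," becomes "world", ","
      elsif role == "LPUNC" && (m = /\A(#{escaped})(.+)\z/.match(word))
        [m[1], m[2]]          # "(hello" becomes "(", "hello"
      else
        [word]                # no rule applies; keep the word as-is
      end
    end
  end
end

tokens = split_punctuation(%w[(hello world, again], { "," => "RPUNC", "(" => "LPUNC" })
```

Each rule is applied to the output of the previous one, which matches the way the listing reassigns `words` inside the outer loop.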