Module: TextStat::DictionaryManager
Included in: Main
Defined in: lib/textstat/dictionary_manager.rb
Overview
Dictionary management with high-performance caching
This module handles loading and caching of language-specific dictionaries used for identifying difficult words. The caching system provides a 36x performance improvement over reading dictionaries from disk on every call.
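To get a feel for where the speedup comes from, the following sketch times repeated disk reads against a memoized lookup. It does not use the gem: `read_from_disk`, `CACHE`, and `cached_lookup` are hypothetical stand-ins, and the word list is generated. The measured ratio depends on the machine and the dictionary size, so it will not necessarily reproduce the 36x figure.

```ruby
require 'benchmark'
require 'set'
require 'tmpdir'

# Hypothetical stand-in for an uncached lookup: re-read the file on every call.
def read_from_disk(path)
  Set.new(File.foreach(path, chomp: true))
end

# Hypothetical stand-in for the cached lookup: read once, then reuse the Set.
CACHE = {}
def cached_lookup(path)
  CACHE[path] ||= read_from_disk(path)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'en_us.txt')
  # Generated word list; a real dictionary file holds one word per line.
  File.write(path, (1..5_000).map { |i| "word#{i}" }.join("\n"))

  uncached = Benchmark.realtime { 200.times { read_from_disk(path) } }
  cached   = Benchmark.realtime { 200.times { cached_lookup(path) } }

  puts format('uncached: %.4fs, cached: %.4fs', uncached, cached)
end
```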
Class Attribute Summary
- .dictionary_cache ⇒ Object
Class Method Summary
- .cache_size ⇒ Integer
  Get number of cached dictionaries.
- .cached_languages ⇒ Array<String>
  Get list of cached languages.
- .clear_cache ⇒ Hash
  Clear all cached dictionaries.
- .dictionary_path ⇒ String
  Get path to dictionary files.
- .dictionary_path=(path) ⇒ String
  Set dictionary path.
- .load_dictionary(language) ⇒ Set
  Load dictionary with automatic caching.
Instance Method Summary
- #difficult_words(text, language = 'en_us', return_words = false) ⇒ Integer, Set
  Count difficult words in text.
Class Attribute Details
.dictionary_cache ⇒ Object
    # File 'lib/textstat/dictionary_manager.rb', line 34
    def dictionary_cache
      @dictionary_cache
    end
Class Method Details
.cache_size ⇒ Integer
Get number of cached dictionaries
    # File 'lib/textstat/dictionary_manager.rb', line 107
    def cache_size
      @dictionary_cache.size
    end
.cached_languages ⇒ Array<String>
Get list of cached languages
    # File 'lib/textstat/dictionary_manager.rb', line 98
    def cached_languages
      @dictionary_cache.keys
    end
.clear_cache ⇒ Hash
Clear all cached dictionaries
Removes all dictionaries from memory cache. Useful for memory management in long-running applications or when switching between different sets of languages.
    # File 'lib/textstat/dictionary_manager.rb', line 89
    def clear_cache
      @dictionary_cache.clear
    end
.dictionary_path ⇒ String
Get path to dictionary files
    # File 'lib/textstat/dictionary_manager.rb', line 117
    def dictionary_path
      @dictionary_path ||= File.join(TextStat::GEM_PATH, 'lib', 'dictionaries')
    end
.dictionary_path=(path) ⇒ String
Set dictionary path
    # File 'lib/textstat/dictionary_manager.rb', line 40
    def dictionary_path=(path)
      @dictionary_path = path
    end
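Judging from the source listed on this page, assigning a new path only changes where later cache misses look; nothing in the setter clears dictionaries that are already cached. The sketch below illustrates that interaction with a hypothetical `PathSketch` module (not the gem's code) and two temporary directories holding made-up word lists.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical module mirroring the path accessor and cache lookup pattern.
module PathSketch
  @cache = {}

  class << self
    attr_writer :path # like dictionary_path=: stores the path, nothing else

    def clear_cache
      @cache.clear
    end

    def load(language)
      @cache[language] ||= begin
        file = File.join(@path, "#{language}.txt")
        File.exist?(file) ? Set.new(File.foreach(file, chomp: true)) : Set.new
      end
    end
  end
end

Dir.mktmpdir do |default_dir|
  Dir.mktmpdir do |custom_dir|
    File.write(File.join(default_dir, 'en_us.txt'), "cat\n")
    File.write(File.join(custom_dir, 'en_us.txt'), "dog\n")

    PathSketch.path = default_dir
    before = PathSketch.load('en_us')

    PathSketch.path = custom_dir
    stale = PathSketch.load('en_us') # cache hit: still the Set read from default_dir

    PathSketch.clear_cache
    fresh = PathSketch.load('en_us') # cache miss: re-read, now from custom_dir

    puts before.to_a.inspect, stale.to_a.inspect, fresh.to_a.inspect
  end
end
```

With the real module, the equivalent precaution would be to call `clear_cache` after assigning `dictionary_path=`. This is inferred from the listings here rather than from separate documentation.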
.load_dictionary(language) ⇒ Set
Load dictionary with automatic caching
Loads a language-specific dictionary from disk and caches it in memory, so later calls for the same language return the cached Set without touching the file system. The file is read line by line with File.foreach rather than loaded into memory at once. If no dictionary file exists for the language, an empty Set is cached and returned.
    # File 'lib/textstat/dictionary_manager.rb', line 58
    def load_dictionary(language)
      # Return cached dictionary if available
      return @dictionary_cache[language] if @dictionary_cache[language]

      # Load dictionary from file
      dictionary_file = File.join(dictionary_path, "#{language}.txt")
      easy_words = Set.new

      if File.exist?(dictionary_file)
        # Use foreach for streaming - efficient and memory-friendly for large files
        File.foreach(dictionary_file, chomp: true) do |line|
          easy_words << line
        end
      end

      # Cache the loaded dictionary
      @dictionary_cache[language] = easy_words
      easy_words
    end
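The caching behaviour can be tried without the gem installed. The sketch below copies the lookup logic shown above into a hypothetical `DictionarySketch` module and points it at a temporary directory. The module name, the `path` and `cache` accessors, and the sample words are assumptions made for illustration.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical stand-in that mirrors the load_dictionary logic shown above.
module DictionarySketch
  @cache = {}

  class << self
    attr_accessor :path
    attr_reader :cache

    def load_dictionary(language)
      # Cache hit: return the Set loaded earlier.
      return @cache[language] if @cache[language]

      file = File.join(path, "#{language}.txt")
      words = Set.new
      # Stream the file line by line; a missing file leaves the Set empty.
      File.foreach(file, chomp: true) { |line| words << line } if File.exist?(file)

      @cache[language] = words
    end
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'en_us.txt'), "the\ncat\nsat\n")
  DictionarySketch.path = dir

  first  = DictionarySketch.load_dictionary('en_us')
  second = DictionarySketch.load_dictionary('en_us')
  puts first.equal?(second)                          # same object on a cache hit
  puts DictionarySketch.load_dictionary('xx').empty? # no file: empty Set, still cached
  puts DictionarySketch.cache.keys.inspect
end
```

Because a missing file yields an empty Set that is cached like any other, a misspelled language code does not raise an error. Every lookup against that empty Set then reports the word as absent from the easy-word list.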
Instance Method Details
#difficult_words(text, language = 'en_us', return_words = false) ⇒ Integer, Set
Count difficult words in text
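A word is considered difficult when both of the following hold:
- It is not in the language’s easy words dictionary
- It has more than one syllable
Each distinct word is counted once, because the matches are collected in a Set. When return_words is true, the method returns that Set instead of the count. The dictionary and the hyphenator are both taken from their caches, so repeated calls avoid reloading them.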
    # File 'lib/textstat/dictionary_manager.rb', line 144
    def difficult_words(text, language = 'en_us', return_words = false)
      easy_words = DictionaryManager.load_dictionary(language)

      # Clean and split text once
      text_list = text.downcase.gsub(/[^0-9a-z ]/i, '').split
      return return_words ? Set.new : 0 if text_list.empty?

      # Get cached hyphenator for syllable counting
      hyphenator = BasicStats.get_hyphenator(language)
      diff_words_set = Set.new

      # Process each word once
      text_list.each do |word|
        next if easy_words.include?(word)

        # Count syllables inline using cached hyphenator
        word_hyphenated = hyphenator.visualise(word)
        syllables = word_hyphenated.count('-') + 1
        diff_words_set.add(word) if syllables > 1
      end

      return_words ? diff_words_set : diff_words_set.length
    end
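The filtering steps can be followed on a small input with the sketch below. It is not the gem's method: `difficult_words_sketch` takes the easy-word Set and the hyphenator as arguments, and `StubHyphenator` is a hypothetical stand-in with hand-written break points, assuming only that the real hyphenator's `visualise` returns the word with `-` between syllables, as the listing above implies.

```ruby
require 'set'

# Hypothetical stand-in for the hyphenator: returns the word with '-' at
# hand-written break points. A real hyphenator derives them from patterns.
class StubHyphenator
  BREAKS = { 'difficult' => 'dif-fi-cult', 'sentence' => 'sen-tence' }.freeze

  def visualise(word)
    BREAKS.fetch(word, word)
  end
end

# Same filtering steps as the method above, with the dictionary and
# hyphenator passed in instead of looked up from caches.
def difficult_words_sketch(text, easy_words, hyphenator, return_words = false)
  words = text.downcase.gsub(/[^0-9a-z ]/i, '').split
  return return_words ? Set.new : 0 if words.empty?

  difficult = Set.new
  words.each do |word|
    next if easy_words.include?(word)

    # Syllable count is the number of hyphenation points plus one.
    syllables = hyphenator.visualise(word).count('-') + 1
    difficult.add(word) if syllables > 1
  end

  return_words ? difficult : difficult.length
end

easy = Set['this', 'is', 'a']
text = 'This is a difficult sentence. A difficult one!'

puts difficult_words_sketch(text, easy, StubHyphenator.new)
puts difficult_words_sketch(text, easy, StubHyphenator.new, true).to_a.inspect
```

In this input "difficult" appears twice but is counted once, and "one" is skipped because the stub reports a single syllable even though it is not in the easy-word Set.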