Class: SpellChecker
- Inherits: Object
- Inheritance chain: Object, SpellChecker
- Defined in: lib/crypto-toolbox/spell_checker.rb
Instance Method Summary
- #human_language?(str) ⇒ Boolean
- #initialize(dict_lang = "en_GB") ⇒ SpellChecker (constructor)
  A new instance of SpellChecker.
- #known_words(str) ⇒ Object
  NOTE: About spelling error rates and language detection.
- #suggest(str) ⇒ Object
Constructor Details
#initialize(dict_lang = "en_GB") ⇒ SpellChecker
Returns a new instance of SpellChecker.
    # File 'lib/crypto-toolbox/spell_checker.rb', line 3
    def initialize(dict_lang="en_GB")
      @dict = FFI::Hunspell.dict(dict_lang)
    end
Instance Method Details
#human_language?(str) ⇒ Boolean
    # File 'lib/crypto-toolbox/spell_checker.rb', line 29
    def human_language?(str)
      words  = str.split(" ").length
      errors = str.split(" ").map{|e| @dict.check?(e) }.count{|e| e == false}
      # using shell instead of hunspell ffi causes lots of escaping errors, even with shellwords.escape
      #errors = Float(`echo '#{Shellwords.escape(str)}' |hunspell -l |wc -l `.split.first)
      error_rate = errors.to_f/words
      $stderr.puts error_rate.round(4) if ENV["CRYPTO_TOOBOX_PRINT_ERROR_RATES"]
      if error_rate < 0.5
        puts "[Success] Found valid result (spell error_rate: #{error_rate*100}% is below threshold: 50%)"
        return true
      else
        return false
      end
    end
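The error-rate check above can be tried without a Hunspell installation. The following is a minimal sketch, not the library's code: `StubDict`, `spell_error_rate`, `plausible_text?` and `THRESHOLD` are hypothetical names, and the stub does an exact, case-insensitive lookup in a small word list instead of calling `FFI::Hunspell`. The guard for empty input is also an assumption of this sketch.

```ruby
require "set"

# Hypothetical stand-in for the Hunspell dictionary (exact, case-insensitive lookup).
class StubDict
  def initialize(words)
    @words = Set.new(words.map(&:downcase))
  end

  def check?(word)
    @words.include?(word.downcase)
  end
end

# Same comparison value as in human_language? above.
THRESHOLD = 0.5

# Fraction of whitespace-separated tokens the dictionary does not recognise.
# Assumption of this sketch: empty input counts as fully misspelled (1.0).
def spell_error_rate(str, dict)
  tokens = str.split(" ")
  return 1.0 if tokens.empty?

  errors = tokens.count { |t| !dict.check?(t) }
  errors.to_f / tokens.length
end

# True when the error rate is strictly below the threshold.
def plausible_text?(str, dict, threshold = THRESHOLD)
  spell_error_rate(str, dict) < threshold
end

dict = StubDict.new(%w[the quick brown fox jumps over lazy dog])
puts spell_error_rate("the quick brown fox", dict)  # prints 0.0
puts spell_error_rate("the qu1ck br0wn fox", dict)  # prints 0.5
```

Because the comparison is strict, a text with exactly half of its tokens unknown is rejected.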
#known_words(str) ⇒ Object
NOTE: About spelling error rates and language detection:
Missing punctuation support may lead to > 2% errors on valid texts, so a high threshold is used. Invalid decryptions tend to have spell error rates > 70%.
Some statistics about it:

> summary(invalids)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.6000  1.0000  1.0000  0.9878  1.0000  1.0000

> summary(cut(invalids,10))
 (0.6,0.64] (0.64,0.68] (0.68,0.72] (0.72,0.76]  (0.76,0.8]  (0.8,0.84]
          8          13           9         534        1319        2809
(0.84,0.88] (0.88,0.92] (0.92,0.96]    (0.96,1]
      10581       46598      198477     1440651
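The bucket counts above can be turned into cumulative shares to see how far a candidate threshold sits from the invalid decryptions. This is a small arithmetic sketch: the counts are copied from the table, while `INVALID_BUCKETS` and `share_at_or_below` are hypothetical names introduced only for illustration.

```ruby
# Upper bucket bound => number of invalid decryptions, copied from the table above.
INVALID_BUCKETS = {
  0.64 => 8,      0.68 => 13,     0.72 => 9,       0.76 => 534,  0.80 => 1_319,
  0.84 => 2_809,  0.88 => 10_581, 0.92 => 46_598,  0.96 => 198_477,
  1.00 => 1_440_651
}.freeze

# Share of invalid samples whose bucket lies entirely at or below the given limit.
def share_at_or_below(limit)
  total = INVALID_BUCKETS.values.sum
  hits  = INVALID_BUCKETS.select { |upper, _count| upper <= limit }.values.sum
  hits.to_f / total
end

puts INVALID_BUCKETS.values.sum   # prints 1700999
puts share_at_or_below(0.5)       # prints 0.0
puts share_at_or_below(0.8)       # about 0.0011
```

In this sample no invalid decryption has an error rate at or below 0.5, and only about 0.1% fall at or below 0.8.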
    # File 'lib/crypto-toolbox/spell_checker.rb', line 21
    def known_words(str)
      words = str.split(" ").select{|w| @dict.check?(w) }
    end
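The note above attributes some errors on valid texts to missing punctuation support. The sketch below illustrates that effect and one possible mitigation, stripping leading and trailing punctuation before the lookup. It is not part of the library: `KNOWN`, `known_words_raw` and `known_words_normalized` are hypothetical names, and the exact-match word list stands in for the Hunspell dictionary, whose own handling of punctuation may differ.

```ruby
require "set"

# Hypothetical exact-match word list standing in for the Hunspell dictionary.
KNOWN = Set.new(%w[hello world this is plain text]).freeze

# Mirrors known_words above: split on spaces, keep tokens the dictionary accepts.
def known_words_raw(str)
  str.split(" ").select { |w| KNOWN.include?(w.downcase) }
end

# Assumed variant: strip leading/trailing punctuation from each token first.
def known_words_normalized(str)
  str.split(" ")
     .map { |w| w.gsub(/\A[[:punct:]]+|[[:punct:]]+\z/, "") }
     .reject(&:empty?)
     .select { |w| KNOWN.include?(w.downcase) }
end

text = "Hello, world. This is plain text!"
p known_words_raw(text)         # ["This", "is", "plain"]
p known_words_normalized(text)  # ["Hello", "world", "This", "is", "plain", "text"]
```

With the exact-match stub, three of six tokens are lost to attached punctuation, which would put this valid sentence right at the 0.5 error-rate boundary.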
#suggest(str) ⇒ Object
    # File 'lib/crypto-toolbox/spell_checker.rb', line 25
    def suggest(str)
      @dict.suggest(str)
    end