Class: Analyzers::Utils::SpellChecker

Inherits:
Object
Defined in:
lib/crypto-toolbox/analyzers/utils/spell_checker.rb

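Instance Method Summary

  • #initialize(dict_lang = "en_GB") ⇒ SpellChecker
  • #human_language?(str) ⇒ Boolean
  • #known_words(str) ⇒ Object
  • #suggest(str) ⇒ Object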

Constructor Details

#initialize(dict_lang = "en_GB") ⇒ SpellChecker

Returns a new instance of SpellChecker backed by the Hunspell dictionary for dict_lang (default "en_GB").



# File 'lib/crypto-toolbox/analyzers/utils/spell_checker.rb', line 9

def initialize(dict_lang="en_GB")
  @dict = FFI::Hunspell.dict(dict_lang)
end

Instance Method Details

#human_language?(str) ⇒ Boolean

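Checks whether the given string appears to be human-language text. It computes the fraction of whitespace-separated words that the dictionary does not recognize and passes that error rate to error_rate_sufficient?.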

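NOTE: Calling the hunspell command-line tool through the shell instead of using the Hunspell FFI bindings causes many escaping errors, even with Shellwords.escape. The shell-based alternative looked like this:

errors = Float(`echo '#{Shellwords.escape(str)}' |hunspell -l |wc -l`.split.first)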

Returns:

  • (Boolean)


# File 'lib/crypto-toolbox/analyzers/utils/spell_checker.rb', line 40

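def human_language?(str)
  words = str.split(" ")
  # A string without words cannot be classified; without this guard the
  # division below would be 0.0 / 0, i.e. NaN.
  return false if words.empty?

  # Count the words the Hunspell dictionary does not recognize
  errors     = words.count { |w| !@dict.check?(w) }
  error_rate = errors.to_f / words.length

  $stderr.puts error_rate.round(4) if ENV["CRYPTO_TOOBOX_PRINT_ERROR_RATES"]

  error_rate_sufficient?(error_rate)
end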
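The error-rate logic of human_language? can be tried out without Hunspell. The following is a minimal sketch under stated assumptions: StubDictionary is a hypothetical stand-in that implements only check? over a fixed word list, and MAX_ERROR_RATE (0.5) is a hypothetical threshold replacing the private error_rate_sufficient? method, whose real cutoff is not documented on this page.

```ruby
require "set"

# Hypothetical stand-in for the Hunspell dictionary: only #check? is needed here.
class StubDictionary
  def initialize(words)
    @words = words.to_set
  end

  def check?(word)
    @words.include?(word)
  end
end

# Hypothetical threshold; the real one lives in the private
# error_rate_sufficient? method, which is not shown in this documentation.
MAX_ERROR_RATE = 0.5

# Fraction of whitespace-separated words the dictionary rejects.
# Returns nil when the input contains no words, avoiding a 0/0 division.
def spelling_error_rate(dict, str)
  words = str.split(" ")
  return nil if words.empty?

  errors = words.count { |w| !dict.check?(w) }
  errors.to_f / words.length
end

# Treat the text as human language when its error rate is below the threshold.
def human_language?(dict, str)
  rate = spelling_error_rate(dict, str)
  !rate.nil? && rate < MAX_ERROR_RATE
end

dict = StubDictionary.new(%w[the quick brown fox jumps over lazy dog])
puts spelling_error_rate(dict, "the quick brown fox") # prints 0.0
puts spelling_error_rate(dict, "the qzx brown vbn")   # prints 0.5
puts human_language?(dict, "xq zzv the")              # prints false
```

Returning nil rather than a rate for word-less input keeps the empty-string case explicit instead of letting NaN reach the threshold comparison.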

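#known_words(str) ⇒ Object

Returns the whitespace-separated words of str that the dictionary recognizes, as an Array of Strings.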

NOTE: About spelling error rates and language detection:

Because punctuation is not handled (it stays attached to the words), valid plaintexts can show spelling error rates above 2%, so a high error-rate threshold is used. Invalid decryptions tend to have spelling error rates above 70%. Some statistics on the error rates of invalid decryptions (R output):

> summary(invalids)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.6000  1.0000  1.0000  0.9878  1.0000  1.0000

> summary(cut(invalids,10))
 (0.6,0.64] (0.64,0.68] (0.68,0.72] (0.72,0.76]  (0.76,0.8]  (0.8,0.84]
          8          13           9         534        1319        2809
(0.84,0.88] (0.88,0.92] (0.92,0.96]    (0.96,1]
      10581       46598      198477     1440651
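The binned counts above make it possible to estimate how many of these invalid decryptions a given error-rate threshold would wrongly accept. The sketch below copies the counts from the R output and reports the cumulative count up to selected bin edges; it only re-derives numbers from the table and assumes nothing about the threshold the class actually uses. The bin edges are the rounded labels R printed.

```ruby
# Upper edge of each bin from summary(cut(invalids,10)) mapped to its count.
BINS = {
  0.64 => 8, 0.68 => 13, 0.72 => 9, 0.76 => 534, 0.80 => 1319,
  0.84 => 2809, 0.88 => 10_581, 0.92 => 46_598, 0.96 => 198_477, 1.00 => 1_440_651
}.freeze

# Total number of invalid decryptions in the sample.
TOTAL = BINS.values.sum

# Number of invalid decryptions whose error rate is at most `bound`,
# where `bound` is expected to be one of the bin edges above.
def invalids_at_most(bound)
  BINS.select { |upper, _count| upper <= bound }.values.sum
end

puts TOTAL # prints 1700999
[0.72, 0.80, 0.88].each do |bound|
  n = invalids_at_most(bound)
  printf("<= %.2f: %d (%.4f%%)\n", bound, n, 100.0 * n / TOTAL)
end
```

For example, only the 30 samples in the first three bins have an error rate of 0.72 or lower, which is why a threshold well above the 2% seen on valid texts still separates the two groups.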


# File 'lib/crypto-toolbox/analyzers/utils/spell_checker.rb', line 27

def known_words(str)
  # Keep only the whitespace-separated words the dictionary recognizes
  str.split(" ").select { |w| @dict.check?(w) }
end
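Because known_words splits on spaces only, punctuation stays attached to words and makes otherwise valid words fail the dictionary check, which is the effect described in the note above. The sketch below contrasts that behavior with a hypothetical variant that trims leading and trailing non-letter characters before checking. DICT, check? (which lowercases words, unlike Hunspell) and known_words_stripped are illustrative assumptions, not part of the library.

```ruby
require "set"

# Hypothetical stand-in for the Hunspell dictionary used by SpellChecker.
DICT = Set.new(%w[hello world this is a test])

# Case-insensitive lookup in the stub dictionary.
def check?(word)
  DICT.include?(word.downcase)
end

# Mirrors SpellChecker#known_words: split on spaces, keep recognized words.
def known_words(str)
  str.split(" ").select { |w| check?(w) }
end

# Hypothetical variant: trim non-letters from both ends of each word,
# drop words that become empty, then keep the recognized ones.
def known_words_stripped(str)
  str.split(" ")
     .map { |w| w.gsub(/\A[^[:alpha:]]+|[^[:alpha:]]+\z/, "") }
     .reject(&:empty?)
     .select { |w| check?(w) }
end

text = "Hello, world! This is a test."
p known_words(text)          # prints ["This", "is", "a"]
p known_words_stripped(text) # prints ["Hello", "world", "This", "is", "a", "test"]
```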

#suggest(str) ⇒ Object

Delegates to the Hunspell dictionary's suggest and returns its spelling suggestions for str.



# File 'lib/crypto-toolbox/analyzers/utils/spell_checker.rb', line 31

def suggest(str)
  @dict.suggest(str)
end