Class: EncodingEstimator::Detector

Inherits:
Object
Defined in:
lib/encoding_estimator/detector.rb

Overview

Performs encoding detection on strings.

Constructor Details

#initialize(conversions, languages, penalty = 0.01, num_processes = nil) ⇒ Detector

Create a new instance from a configuration consisting of a list of conversions, a list of languages, a base penalty, and an optional number of processes.

Parameters:

  • conversions (Array<EncodingEstimator::Conversion>)

    Conversions to perform/test on the inputs.

  • languages (Array<EncodingEstimator::LanguageModel>)

    Languages to consider when evaluating the input. Array of two-letter codes.

  • penalty (Float) (defaults to: 0.01)

    Base penalty subtracted from each char’s score

  • num_processes (Integer) (defaults to: nil)

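    Number of processes the detection will run on. When nil, or when parallel execution is not supported, detection runs in a single process; otherwise the work is spread across processes through the parallel gem.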
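
The effect of num_processes can be sketched as a small decision function. This is a minimal illustration based on the condition visible in the #detect source further down; detection_strategy and its return symbols are hypothetical names, not part of the gem.

```ruby
# Hypothetical helper mirroring the condition used in Detector#detect:
# without a process count, or without parallel support, detection
# falls back to the single-process path.
def detection_strategy(num_processes, parallel_supported)
  return :single_process if num_processes.nil? || !parallel_supported

  :multi_process
end

default_choice     = detection_strategy(nil, true)
unsupported_choice = detection_strategy(4, false)
parallel_choice    = detection_strategy(4, true)
```

Only the last call selects the multi-process path, since it is the only one with both a process count and parallel support.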



# File 'lib/encoding_estimator/detector.rb', line 80

def initialize( conversions, languages, penalty = 0.01, num_processes = nil )
  @conversions   = conversions
  @languages     = languages
  @num_processes = num_processes
  @penalty       = penalty
end
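
To show how the positional defaults behave, here is a minimal sketch using a stand-in class. DetectorSketch copies the constructor and readers documented on this page but is not the gem's class, and the symbols passed in are placeholders for real Conversion and LanguageModel objects.

```ruby
# Stand-in that mirrors the documented constructor and readers.
# It is NOT EncodingEstimator::Detector and performs no detection.
class DetectorSketch
  attr_reader :conversions, :languages, :num_processes, :penalty

  def initialize(conversions, languages, penalty = 0.01, num_processes = nil)
    @conversions   = conversions
    @languages     = languages
    @num_processes = num_processes
    @penalty       = penalty
  end
end

# Placeholder symbols stand in for Conversion and LanguageModel objects.
defaults = DetectorSketch.new([:conversion_a], [:language_a])
tuned    = DetectorSketch.new([:conversion_a], [:language_a], 0.05, 4)
```

Because penalty precedes num_processes in the argument list, a caller who wants to set the process count has to pass a penalty as well.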

Instance Attribute Details

#conversions ⇒ Object (readonly)

Returns the value of attribute conversions.



# File 'lib/encoding_estimator/detector.rb', line 66

def conversions
  @conversions
end

#languages ⇒ Object (readonly)

Returns the value of attribute languages.



# File 'lib/encoding_estimator/detector.rb', line 67

def languages
  @languages
end

#num_processes ⇒ Object (readonly)

Returns the value of attribute num_processes.



# File 'lib/encoding_estimator/detector.rb', line 68

def num_processes
  @num_processes
end

#penalty ⇒ Object (readonly)

Returns the value of attribute penalty.



# File 'lib/encoding_estimator/detector.rb', line 69

def penalty
  @penalty
end

Instance Method Details

#detect(str) ⇒ EncodingEstimator::Detection

Detect the encoding of an input string using the current configuration.

Parameters:

  • str (String)

    Input string the detection will be performed on

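Returns:

  • (EncodingEstimator::Detection)

    Detection built from the scaled scores and the configured conversions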



# File 'lib/encoding_estimator/detector.rb', line 92

def detect( str )
  # Run in parallel only if a process count is set and parallel execution is supported
  results = (num_processes.nil? || !EncodingEstimator::ParallelSupport.supported?) ?
                detect_st( str, combinations ) : detect_mt( str, combinations )

  # Sum the scores of all results that share a key
  sums = {}
  results.each do |result|
    sums[result.key] = sums.fetch(result.key, 0.0) + result.score
  end

  # Scale each sum using the range between the smallest and largest sum
  range = EncodingEstimator::RangeScale.new( sums.values.min, sums.values.max )

  scaled_scores = {}
  sums.each do |k,s|
    scaled_scores[ k ] = range.scale s
  end

  EncodingEstimator::Detection.new( scaled_scores, @conversions )
end
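
The sum-then-scale step in the method above can be tried in isolation with the sketch below. It assumes that RangeScale performs a linear min-max mapping onto 0.0..1.0, which this page does not confirm. ResultSketch, min_max_scale, aggregate, and the keys are hypothetical names chosen for illustration.

```ruby
# Hypothetical result record; the listing only shows that results
# respond to key and score.
ResultSketch = Struct.new(:key, :score)

# ASSUMPTION: RangeScale maps linearly from [min, max] onto [0.0, 1.0].
def min_max_scale(value, min, max)
  return 0.0 if max == min # avoid dividing by zero when all sums are equal

  (value - min) / (max - min)
end

# Sum scores per key, then scale every sum against the smallest and largest sum.
def aggregate(results)
  sums = Hash.new(0.0)
  results.each { |r| sums[r.key] += r.score }

  min, max = sums.values.minmax
  sums.transform_values { |s| min_max_scale(s, min, max) }
end

results = [
  ResultSketch.new(:utf8, 2.0),
  ResultSketch.new(:utf8, 1.0),
  ResultSketch.new(:latin1, 1.0),
  ResultSketch.new(:cp1252, 2.0)
]
scaled = aggregate(results)
```

Under that assumption, the key with the largest summed score ends up at 1.0 and the one with the smallest at 0.0, so the scaled scores rank candidates relative to each other and do not act as absolute confidence values.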