Module: EncodingEstimator
Defined in:
- lib/encoding_estimator.rb
- lib/encoding_estimator/version.rb
- lib/encoding_estimator/detector.rb
- lib/encoding_estimator/detection.rb
- lib/encoding_estimator/conversion.rb
- lib/encoding_estimator/distribution.rb
- lib/encoding_estimator/language_model.rb
- lib/encoding_estimator/parallel_support.rb
- lib/encoding_estimator/builder/model_builder.rb
- lib/encoding_estimator/builder/parallel_model_builder.rb
Defined Under Namespace
Classes: CDCombination, Conversion, Detection, Detector, Distribution, LanguageModel, ModelBuilder, ParallelModelBuilder, ParallelSupport, RangeScale, SingleDetectionResult
Constant Summary
- VERSION = '0.2.0'
Class Method Summary
- .detect(data, config) ⇒ EncodingEstimator::Detection
  Let the EncodingEstimator detect how the input string is encoded.
- .ensure_utf8(data, config = {}) ⇒ String
  Convert a string to a UTF-8 string by performing the conversion that is automatically detected by EncodingEstimator.
Class Method Details
.detect(data, config) ⇒ EncodingEstimator::Detection
Let the EncodingEstimator detect how the input string is encoded
# File 'lib/encoding_estimator.rb', line 50

def EncodingEstimator.detect( data, config )
  params = {
    languages: [ :de, :en ],
    encodings: %w(iso-8859-1 utf-16le windows-1251),
    operations: [Conversion::Operation::DECODE],
    include_default: true,
    penalty: 0.01,
    num_cores: nil
  }.merge config

  Detector.new(
    Conversion.generate( params[ :encodings ], params[ :operations ], params[ :include_default ] ),
    params[ :languages ].map { |l| EncodingEstimator::LanguageModel.new( l ) },
    params[ :penalty ],
    params[:num_cores]
  ).detect data
end
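Both methods build their effective configuration by merging the caller's hash over a set of defaults. The following is a minimal sketch of that merge step only, using plain Ruby and no gem code. The names `defaults` and `effective_params` are hypothetical, and the `operations` entry is left out because it refers to a constant defined inside the gem.

```ruby
# Defaults copied from the documented source of EncodingEstimator.detect
# (the operations key is omitted: it needs the gem's Conversion constant).
defaults = {
  languages: [:de, :en],
  encodings: %w(iso-8859-1 utf-16le windows-1251),
  include_default: true,
  penalty: 0.01,
  num_cores: nil
}

# Hypothetical helper mirroring the `{ ... }.merge config` step:
# keys given by the caller win, all other keys keep their default.
effective_params = ->(config = {}) { defaults.merge(config) }

params = effective_params.call({ languages: [:en], penalty: 0.05 })

puts params[:languages].inspect  # caller-supplied value
puts params[:encodings].inspect  # untouched default
```

Because `Hash#merge` returns a new hash, the defaults themselves stay unchanged, so a caller only needs to pass the keys that should differ.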
.ensure_utf8(data, config = {}) ⇒ String
Convert a string to a UTF-8 string by performing the conversion that is automatically detected by EncodingEstimator
# File 'lib/encoding_estimator.rb', line 23

def EncodingEstimator.ensure_utf8( data, config = {} )
  params = {
    languages: [ :de, :en ],
    encodings: %w(iso-8859-1 utf-16le windows-1251),
    operations: [Conversion::Operation::DECODE],
    include_default: true,
    penalty: 0.01,
    num_cores: nil
  }.merge config

  EncodingEstimator.detect( data, params ).result.perform( data )
end
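To illustrate the kind of conversion that ends in a UTF-8 string, here is a rough sketch using only Ruby's standard `String` encoding methods. It assumes that a decode step amounts to reinterpreting the raw bytes in one candidate encoding (here iso-8859-1, one of the defaults above) and transcoding them to UTF-8. This is an assumption for illustration, not the gem's actual implementation, and it skips the detection step entirely.

```ruby
# The word "Grüße" as ISO-8859-1 bytes: ü is 0xFC, ß is 0xDF.
raw = [0x47, 0x72, 0xFC, 0xDF, 0x65].pack("C*")

# Read as UTF-8, these bytes do not form a valid string.
looks_like_utf8 = raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# Reinterpret the bytes as ISO-8859-1, then transcode to UTF-8.
decoded = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts looks_like_utf8   # false
puts decoded.encoding  # UTF-8
puts decoded
```

In the gem, choosing which candidate conversion to apply is the job of `EncodingEstimator.detect`; the sketch hard-codes that choice.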