Class: Opener::LanguageIdentifier::Backend::LanguageDetection

Inherits:
Object
Defined in:
lib/opener/language_identifier/backend/language_detection.rb

Constant Summary

DEFAULT_PROFILES_PATH =

Path to the directory containing the default profiles.

Returns:

  • (String)
File.expand_path(
  '../../../../../core/target/classes/profiles',
  __FILE__
)
DEFAULT_SHORT_PROFILES_PATH =

Path to the directory containing the default short profiles.

Returns:

  • (String)
File.expand_path(
  '../../../../../core/target/classes/short_profiles',
  __FILE__
)
PRIORITIES =

Prioritize OpeNER languages over the rest. Languages not covered by this list are automatically given a default priority.

Returns:

  • (Hash)
{
  'en' => 1.0,
  'es' => 0.9,
  'it' => 0.9,
  'fr' => 0.9,
  'de' => 0.9,
  'nl' => 0.9,

  # These languages are disabled (for the time being) due to conflicting
  # with other (OpeNER) languages too often.
  'af' => 0.0, # conflicts with Dutch
}
DEFAULT_PRIORITY =

The default priority for non-OpeNER languages.

Returns:

  • (Float)
0.5
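
As a rough illustration of how PRIORITIES and DEFAULT_PRIORITY could combine, the sketch below builds a priority map for a list of language codes. The `priority_map` helper is hypothetical; it is not the class's own `build_priorities` method, whose source is not shown on this page. The constant values are copied from the definitions above.

```ruby
# Values copied from the constants documented above.
PRIORITIES = {
  'en' => 1.0,
  'es' => 0.9,
  'it' => 0.9,
  'fr' => 0.9,
  'de' => 0.9,
  'nl' => 0.9,
  'af' => 0.0 # disabled: conflicts with Dutch
}.freeze

DEFAULT_PRIORITY = 0.5

# Hypothetical helper (an assumption, not part of the documented class):
# maps every given language code to its priority, falling back to
# DEFAULT_PRIORITY for codes that PRIORITIES does not list.
def priority_map(languages)
  languages.each_with_object({}) do |lang, map|
    map[lang] = PRIORITIES.fetch(lang, DEFAULT_PRIORITY)
  end
end

map = priority_map(%w[en nl af ja])
# 'ja' is not listed in PRIORITIES, so it receives DEFAULT_PRIORITY.
puts map['ja']
```

Using `Hash#fetch` with a default keeps the explicit 0.0 for 'af' intact, which a `||` fallback would also do here but which `fetch` makes explicit.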
SHORT_THRESHOLD =

The number of characters after which the detector switches to the longer profile set.

Returns:

  • (Fixnum)
15
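
A minimal sketch of how such a threshold might select a profile set is shown below. The `profile_set_for` helper is hypothetical, and treating an input of exactly 15 characters as short is an assumption; the class's own `determine_profiles` method is not shown on this page and may draw the boundary differently.

```ruby
SHORT_THRESHOLD = 15 # value copied from the constant documented above

# Hypothetical helper (an assumption, not part of the documented class):
# returns :short for inputs of up to SHORT_THRESHOLD characters and
# :long for anything longer.
def profile_set_for(input)
  input.length <= SHORT_THRESHOLD ? :short : :long
end

puts profile_set_for('Hola mundo')                # 10 characters
puts profile_set_for('The quick brown fox jumps') # 25 characters
```

In the real class the two outcomes would presumably correspond to DEFAULT_SHORT_PROFILES_PATH and DEFAULT_PROFILES_PATH.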

Instance Method Summary

Constructor Details

#initialize ⇒ LanguageDetection

Returns a new instance of LanguageDetection.



# File 'lib/opener/language_identifier/backend/language_detection.rb', line 62

def initialize
  @factory = com.cybozu.labs.langdetect.DetectorFactory.new
end

Instance Method Details

#detect(input) ⇒ String

Returns:

  • (String) the detected language, or 'unknown' when detection fails


# File 'lib/opener/language_identifier/backend/language_detection.rb', line 81

def detect input
  detector = new_detector input
  detector.detect

# The core Java code raises an exception when it can't detect a language.
# Since this isn't fatal, we capture the exception and return "unknown"
# instead.
rescue com.cybozu.labs.langdetect.LangDetectException
  return 'unknown'
end
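
The rescue-and-fall-back pattern used by #detect can be tried without JRuby using stand-ins. In the sketch below, `FakeDetectException` and `FakeDetector` are hypothetical doubles for the Java exception and detector, and the stub always answers 'en' for non-blank text; only the method-level `rescue` mirrors the code above.

```ruby
# Hypothetical stand-in for com.cybozu.labs.langdetect.LangDetectException.
class FakeDetectException < StandardError; end

# Hypothetical detector double: raises when the text has no usable content,
# otherwise returns a fixed language code.
class FakeDetector
  def initialize(text)
    @text = text
  end

  def detect
    raise FakeDetectException, 'no features in text' if @text.strip.empty?

    'en'
  end
end

# Same shape as #detect: a failed detection is not fatal, so the
# exception is turned into the string 'unknown'.
def detect_with_fallback(input)
  FakeDetector.new(input).detect
rescue FakeDetectException
  'unknown'
end

puts detect_with_fallback('hello world')
puts detect_with_fallback('   ')
```

Callers therefore always receive a String and need to compare against 'unknown' rather than handle an exception.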

#new_detector(input) ⇒ Object



# File 'lib/opener/language_identifier/backend/language_detection.rb', line 66

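def new_detector input
  # Load the profiles selected for this input and fix the seed so that
  # detection results are reproducible.
  @factory.load_profile determine_profiles input
  @factory.set_seed 1

  priorities = build_priorities input, @factory.langlist
  detector   = com.cybozu.labs.langdetect.Detector.new @factory

  # Apply the language priorities, then feed in the lowercased input.
  detector.set_prior_map priorities
  detector.append input.downcase
  detector
end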