Class: CorpusProcessor::Categories

Inherits:
Object
Defined in:
lib/corpus-processor/categories.rb

Overview

Helper for loading category definitions.

Category definitions are a Hash with two keys, :input and :output.

The :input hash has String keys that match the categories found in the original corpus. Its values are Symbols that represent each category internally.

The :output hash has Symbol keys that represent each category internally and should match the values from the :input hash. Its values are the Strings that represent each category in the final converted corpus.

An optional :default key is allowed in the :output hash. If present, its value becomes the default value of the loaded :output hash (the value returned for any key that is not defined), and the :default key itself is removed.

Examples:

YAML file defining categories.

---
:input:
  PESSOA: :person
  LOCAL: :location
  ORGANIZACAO: :organization
:output:
  :default: O
  :person: PERSON
  :location: LOCATION
  :organization: ORGANIZATION
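
The sketch below illustrates how such definitions could be used once parsed. It is not part of CorpusProcessor: it parses the same YAML inline with `YAML.safe_load` (permitting Symbols explicitly, which recent Psych versions require), applies the :default rule by hand, and uses a hypothetical `translate` helper to map an original category to its output string.

```ruby
require 'yaml'

# Same definitions as the YAML example, inlined for this sketch.
definitions = <<~DEFINITIONS
  ---
  :input:
    PESSOA: :person
    LOCAL: :location
    ORGANIZACAO: :organization
  :output:
    :default: O
    :person: PERSON
    :location: LOCATION
    :organization: ORGANIZATION
DEFINITIONS

# Assumption: Symbols are permitted explicitly so the sketch also runs
# on Psych versions where plain YAML.load rejects them.
categories = YAML.safe_load(definitions, permitted_classes: [Symbol])

# Promote the optional :default entry to the Hash's default value.
output  = categories[:output]
default = output[:default]
if default
  output.default = default
  output.delete(:default)
end

# Hypothetical helper (not part of CorpusProcessor): original corpus
# category -> internal Symbol -> output String.
def translate(categories, original)
  internal = categories[:input][original] # nil for unknown categories
  categories[:output][internal]           # falls back to the default
end

puts translate(categories, 'PESSOA') # prints "PERSON"
puts translate(categories, 'OBRA')   # prints "O" (unknown category)
```

An unknown category yields `nil` from the :input hash, and looking up `nil` in the :output hash returns the promoted default.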

Class Method Details

.default ⇒ Hash

The default set of category definitions.

They are loaded from the bundled YAML file categories/default.yml, located next to categories.rb.

# File 'lib/corpus-processor/categories.rb', line 50

def self.default
  self.load(File.expand_path(File.join('..', 'categories', 'default.yml'),
                             __FILE__))
end
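
To see how the path above resolves, the following sketch runs the same `File.expand_path` call against a made-up location for categories.rb. The `/gems/...` path is purely hypothetical and stands in for `__FILE__`.

```ruby
# Hypothetical install location of categories.rb, standing in for __FILE__.
source_file = '/gems/corpus-processor/lib/corpus-processor/categories.rb'

# '..' steps from the file itself up to its containing directory, so the
# result points at a 'categories' directory beside categories.rb.
relative = File.join('..', 'categories', 'default.yml')
resolved = File.expand_path(relative, source_file)

puts resolved
# On a POSIX system this prints:
# /gems/corpus-processor/lib/corpus-processor/categories/default.yml
```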

.load(path) ⇒ Hash

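Loads a set of category definitions from the YAML file at path. Results are cached per path, so loading the same path again returns the same Hash.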

# File 'lib/corpus-processor/categories.rb', line 33

def self.load path
  # Cache one parsed definitions Hash per path.
  @@instances[path] ||= YAML.load(File.read(path)).tap { |categories|
    # Promote the optional :default entry to the Hash's default value.
    default = categories[:output] && categories[:output][:default]
    if default
      categories[:output].default = default
      categories[:output].delete :default
    end
  }
end
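
The caching and default handling above can be tried in isolation with the sketch below. It is an approximation, not the library's code: `load_categories` and `cache` are hypothetical names, a temporary file stands in for a real definitions file, and `YAML.safe_load` with Symbols permitted replaces `YAML.load` so the sketch also runs on recent Psych versions.

```ruby
require 'yaml'
require 'tempfile'

# Hypothetical stand-in for the class-level @@instances cache.
cache = {}

# Hypothetical loader mirroring the logic of the listing above.
load_categories = lambda do |path|
  cache[path] ||= YAML.safe_load(File.read(path),
                                 permitted_classes: [Symbol]).tap do |categories|
    default = categories[:output] && categories[:output][:default]
    if default
      categories[:output].default = default
      categories[:output].delete(:default)
    end
  end
end

# Write a small definitions file to a temporary location.
file = Tempfile.new(['categories', '.yml'])
file.write(<<~DEFINITIONS)
  ---
  :input:
    LOCAL: :location
  :output:
    :default: O
    :location: LOCATION
DEFINITIONS
file.flush

first  = load_categories.call(file.path)
second = load_categories.call(file.path)

puts first.equal?(second)          # prints "true": the same cached Hash
puts first[:output][:unknown]      # prints "O": the promoted default
puts first[:output].key?(:default) # prints "false": the key was removed

file.close! # close and delete the temporary file
```

Because the cache is keyed by path, edits made to a definitions file after its first load are not picked up within the same process.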