Class: CorpusProcessor::Cli

Inherits:
Thor
  • Object
Defined in:
lib/corpus-processor/cli.rb

Overview

The operations available to users from the CLI.

Instance Method Summary

Instance Method Details

#process(input_file = STDIN, output_file = STDOUT) ⇒ void

This method returns an undefined value.

Converts a given corpus from one format to another.

By default, the input format is LâMPADA and the output format is the one used by Stanford NER in training.



# File 'lib/corpus-processor/cli.rb', line 23

def process input_file = STDIN, output_file = STDOUT
  input_file  = File.open( input_file, 'r') if  input_file.is_a? String
  output_file = File.open(output_file, 'w') if output_file.is_a? String
  categories  = if options[:categories]
                  CorpusProcessor::Categories.load(options[:categories])
                else
                  CorpusProcessor::Categories.default
                end

  output_file.puts CorpusProcessor::Processor.new(categories: categories)
                                             .process(input_file.read)

  output_file.close
end
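
The String-or-IO handling in #process can be tried in isolation with in-memory streams. The following is a minimal sketch, not the gem's code: `convert` and `fake_process` are hypothetical names, and `fake_process` merely upcases the text as a stand-in for CorpusProcessor::Processor#process so that the example runs without the gem installed. Unlike the listing above, the sketch closes only the files it opened itself, which is an assumption about the desired behavior rather than something the source does.

```ruby
require 'stringio'

# Hypothetical stand-in for CorpusProcessor::Processor#process; it only
# upcases the text so the sketch runs without the gem installed.
def fake_process(text)
  text.upcase
end

# Mirrors the String-or-IO handling of #process: a String is treated as a
# path and opened, anything else is assumed to be an IO-like object.
def convert(input = $stdin, output = $stdout)
  opened = []
  if input.is_a?(String)
    input = File.open(input, 'r')
    opened << input
  end
  if output.is_a?(String)
    output = File.open(output, 'w')
    opened << output
  end

  output.puts fake_process(input.read)
ensure
  # Close only the files opened here, so a caller's stream stays usable.
  opened.each(&:close)
end

source = StringIO.new("some corpus text")
target = StringIO.new
convert(source, target)
target.string # => "SOME CORPUS TEXT\n"
```

Passing StringIO objects instead of paths keeps the example free of filesystem side effects; passing two path strings would exercise the File.open branches instead.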