Class: CorpusProcessor::Cli

Inherits:
Thor
  • Object
Defined in:
lib/corpus-processor/cli.rb

Overview

The operations available to users from the command-line interface (CLI).

Instance Method Summary

Instance Method Details

#process(input_file = STDIN, output_file = STDOUT) ⇒ void

This method returns an undefined value.

Converts a given corpus from one format to another.

By default, the input format is LâMPADA and the output format is the one Stanford NER uses for training.

Parameters:

  • input_file (String, IO) (defaults to: STDIN)

    the file that contains the original corpus.

  • output_file (String, IO) (defaults to: STDOUT)

    the file in which the converted corpus is written.
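
The String-or-IO convention described by these parameters can be sketched in isolation. The helper name `open_if_path` is hypothetical and is not part of CorpusProcessor; it only mirrors the documented behavior that a String is treated as a file path, while any other object is used directly as an IO.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper (not part of CorpusProcessor): treat a String as a
# file path and open it; pass any other object through unchanged as an IO.
def open_if_path(target, mode)
  target.is_a?(String) ? File.open(target, mode) : target
end

# An IO-like object is returned unchanged.
buffer = StringIO.new('sample corpus')
same   = open_if_path(buffer, 'r')

# A String is opened as a file in the requested mode.
tmp = Tempfile.new('corpus')
tmp.write('sample corpus')
tmp.close
opened  = open_if_path(tmp.path, 'r')
content = opened.read
opened.close
tmp.unlink
```

Accepting either type lets the same method serve shell pipelines (STDIN and STDOUT) and explicit file names without separate code paths.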



# File 'lib/corpus-processor/cli.rb', line 23

def process input_file = STDIN, output_file = STDOUT
  input_file  = File.open( input_file, 'r') if  input_file.is_a? String
  output_file = File.open(output_file, 'w') if output_file.is_a? String
  categories  = if options[:categories]
                  CorpusProcessor::Categories.load(options[:categories])
                else
                  CorpusProcessor::Categories.default
                end

  output_file.puts CorpusProcessor::Processor.new(categories: categories)
                                             .process(input_file.read)

  output_file.close
end
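
The read, process, and write flow of the listing above can be tried without the gem installed. In this minimal sketch, `FakeProcessor` and `convert` are hypothetical names: the stand-in processor merely upcases text instead of converting corpus formats. As a deliberate difference from the listing, the sketch closes only the handles it opened itself, which is an assumption about what a caller might prefer and not the library's behavior.

```ruby
require 'stringio'

# Hypothetical stand-in for CorpusProcessor::Processor; it upcases the
# text rather than converting between corpus formats.
class FakeProcessor
  def process(text)
    text.upcase
  end
end

# Sketch of the read -> process -> write flow. Handles passed in by the
# caller (for example STDIN, STDOUT, or a StringIO) are left open; only
# files opened here from String paths are closed.
def convert(input, output, processor: FakeProcessor.new)
  opened = []
  if input.is_a?(String)
    input = File.open(input, 'r')
    opened << input
  end
  if output.is_a?(String)
    output = File.open(output, 'w')
    opened << output
  end
  output.puts processor.process(input.read)
ensure
  opened.each(&:close)
end

source = StringIO.new("some text\n")
sink   = StringIO.new
convert(source, sink)
result = sink.string
```

Because the caller supplied `sink`, it remains open after `convert` returns and its contents can still be inspected.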