Class: CorpusProcessor::Processor

Inherits:
Object
Defined in:
lib/corpus-processor/processor.rb

Overview

The entry point for processing a corpus.

Examples:

Simple use with default configuration.

CorpusProcessor::Processor.new.process('<P>Some text</P>')
# => "Some\tO\ntext\tO\n.\tO\n"

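Instance Method Summary

  • #initialize(categories:, parser:, generator:) ⇒ Processor (constructor)

    Returns a new instance of Processor.

  • #process(corpus) ⇒ String

    Processes the corpus.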

Constructor Details

#initialize(categories: CorpusProcessor::Categories.default, parser: CorpusProcessor::Parsers::Lampada.new(categories), generator: CorpusProcessor::Generators::StanfordNer.new(categories)) ⇒ Processor

Returns a new instance of Processor.

Parameters:

  • categories (Hash) (defaults to: CorpusProcessor::Categories.default)

    the categories extracted with Categories.

  • parser (#parse) (defaults to: CorpusProcessor::Parsers::Lampada.new(categories))

    the parser for the original corpus.

  • generator (#generate) (defaults to: CorpusProcessor::Generators::StanfordNer.new(categories))

    the generator that converts the parsed tokens into the transformed corpus.



# File 'lib/corpus-processor/processor.rb', line 12

def initialize(
  categories: CorpusProcessor::Categories.default,
  parser:     CorpusProcessor::Parsers::Lampada.new(categories),
  generator:  CorpusProcessor::Generators::StanfordNer.new(categories))
  @parser    = parser
  @generator = generator
end
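
The constructor above relies on a Ruby feature worth spelling out: keyword defaults are evaluated left to right at call time, so the defaults for parser and generator are built from whatever categories value the caller supplied. The sketch below illustrates only that mechanism. DefaultsSketch, Collaborator, and the category hashes are hypothetical stand-ins, not part of corpus-processor, and the real shape of the categories hash is not shown on this page.

```ruby
# Hypothetical collaborator that only records the categories it was built with.
Collaborator = Struct.new(:categories)

# Hypothetical class mirroring the keyword-default pattern of the constructor.
class DefaultsSketch
  attr_reader :parser, :generator

  # Defaults are evaluated left to right when the method is called, so the
  # defaults for parser and generator see the final value of categories.
  def initialize(
    categories: { person: 'PERSON' },
    parser:     Collaborator.new(categories),
    generator:  Collaborator.new(categories))
    @parser    = parser
    @generator = generator
  end
end

custom = { place: 'LOCATION' } # made-up categories, for illustration only
sketch = DefaultsSketch.new(categories: custom)

puts sketch.parser.categories == custom    # both defaults were built
puts sketch.generator.categories == custom # from the caller's categories
```

As in the source listing, categories is not stored on the instance; it only feeds the two defaults. Passing categories together with an explicit parser and an explicit generator therefore has no effect.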

Instance Method Details

#process(corpus) ⇒ String

Processes the corpus: parses it with the configured parser and passes the result to the generator.

Returns:

  • (String)

    the converted corpus.



# File 'lib/corpus-processor/processor.rb', line 23

def process corpus
  @generator.generate @parser.parse(corpus)
end
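
Because #process only calls #parse on one collaborator and #generate on the other, any pair of objects responding to those methods can be plugged in. The following is a minimal sketch of that idea. SketchProcessor is a stand-in that copies the two methods listed on this page so the example runs without the gem installed. WhitespaceParser, TabbedGenerator, and the token shape (a plain array of words) are assumptions made for illustration and do not describe the library's own parsers, generators, or tokens.

```ruby
# Stand-in mirroring the constructor and #process shown on this page,
# so the sketch runs without the corpus-processor gem installed.
class SketchProcessor
  def initialize(parser:, generator:)
    @parser    = parser
    @generator = generator
  end

  def process(corpus)
    @generator.generate(@parser.parse(corpus))
  end
end

# Hypothetical parser: anything that responds to #parse qualifies.
# Returning an array of words is an assumption of this sketch.
class WhitespaceParser
  def parse(corpus)
    corpus.split
  end
end

# Hypothetical generator: anything that responds to #generate qualifies.
# It labels every word "O" and emits one "word<TAB>label" line per word.
class TabbedGenerator
  def generate(tokens)
    tokens.map { |word| "#{word}\tO\n" }.join
  end
end

processor = SketchProcessor.new(parser:    WhitespaceParser.new,
                                generator: TabbedGenerator.new)
result = processor.process('Some text')
puts result.inspect # => "Some\tO\ntext\tO\n"
```

With the real class, the same objects would presumably be passed as `CorpusProcessor::Processor.new(parser: ..., generator: ...)`, provided they accept and return whatever token structure the other collaborator expects.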