Class: ChupaText::Extractor

Inherits:
Object
Includes:
Loggable
Defined in:
lib/chupa-text/extractor.rb

Constructor Details

#initialize ⇒ Extractor

Returns a new instance of Extractor.


# File 'lib/chupa-text/extractor.rb', line 25

def initialize
  @decomposers = []
end

Instance Method Details

#add_decomposer(decomposer) ⇒ Object


# File 'lib/chupa-text/extractor.rb', line 44

def add_decomposer(decomposer)
  @decomposers << decomposer
end
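
Because Array#<< returns its receiver, add_decomposer returns the internal list of decomposers, and decomposers are kept in the order in which they were added. The sketch below illustrates this with a stand-in class (RegistrationSketch, a hypothetical name) that copies only the two method bodies shown above, so it runs without the chupa-text gem. The symbols stand in for real decomposer objects.

```ruby
# Stand-in that copies only the initialize and add_decomposer bodies shown
# above. RegistrationSketch is a hypothetical name, not part of chupa-text.
class RegistrationSketch
  def initialize
    @decomposers = []
  end

  def add_decomposer(decomposer)
    @decomposers << decomposer
  end
end

extractor = RegistrationSketch.new
extractor.add_decomposer(:first)               # symbols stand in for decomposers
registered = extractor.add_decomposer(:second) # Array#<< returns the array itself
p registered                                   # prints [:first, :second]
```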

#apply_configuration(configuration) ⇒ void

This method returns an undefined value.

Sets up the extractor according to the configuration by adding every decomposer that the configuration enables.

Parameters:

  • configuration (Configuration)

    The configuration to be applied.


# File 'lib/chupa-text/extractor.rb', line 36

def apply_configuration(configuration)
  decomposers = Decomposers.create(Decomposer.registry,
                                   configuration.decomposer)
  decomposers.each do |decomposer|
    add_decomposer(decomposer)
  end
end
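
The create-then-register pattern above can be sketched without the gem. Everything in this example is an assumption made for illustration: the registry is modelled as a plain Hash from name to class, the configuration as a Struct holding the enabled names, and all Sketch-prefixed names are hypothetical. The real Decomposers.create, Decomposer.registry, and Configuration may behave differently.

```ruby
# Hypothetical stand-ins; none of these names or shapes come from chupa-text.
SketchConfiguration = Struct.new(:enabled_names)

class SketchTarDecomposer; end
class SketchGzipDecomposer; end

# Assumed registry shape: decomposer name => decomposer class.
SKETCH_REGISTRY = {
  "tar"  => SketchTarDecomposer,
  "gzip" => SketchGzipDecomposer,
}

class ConfiguredSketch
  attr_reader :decomposers

  def initialize
    @decomposers = []
  end

  def add_decomposer(decomposer)
    @decomposers << decomposer
  end

  # Same shape as apply_configuration above: build the enabled decomposers,
  # then register each one.
  def apply_configuration(configuration)
    enabled = configuration.enabled_names.map do |name|
      SKETCH_REGISTRY.fetch(name).new
    end
    enabled.each do |decomposer|
      add_decomposer(decomposer)
    end
  end
end

configured = ConfiguredSketch.new
configured.apply_configuration(SketchConfiguration.new(["gzip"]))
p configured.decomposers.map(&:class) # prints [SketchGzipDecomposer]
```

Only the decomposers named in the configuration end up registered, which is the behaviour the description above attributes to apply_configuration.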

#extract(input) {|text_data| ... } ⇒ void

This method returns an undefined value.

Extracts texts from input. Each extracted text is passed to the given block.

Parameters:

  • input (Data, String)

    The input from which texts are extracted. If input is a String, it is treated as the local file path or URI of the input data.

Yields:

  • (text_data)

    Gives extracted text data to the block. The block may be called zero or more times.

Yield Parameters:

  • text_data (Data)

    The extracted text data. The text itself is available as text_data.body.


# File 'lib/chupa-text/extractor.rb', line 61

def extract(input, &block)
  extract_recursive(ensure_data(input), &block)
end
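
The block contract ("zero or more times") is easier to see with a runnable stand-in. The sketch below does not use chupa-text: SketchData, SketchArchiveDecomposer, and RecursiveSketch are hypothetical names, the decomposer interface (target? and a yielding decompose) is an assumption, and ensure_data is omitted. The recursive step is one plausible reading of extract_recursive, whose source is not shown here.

```ruby
# Hypothetical stand-ins; the real Data class, ensure_data, and
# extract_recursive are not shown above and may differ.
SketchData = Struct.new(:body, :kind)

# Assumed decomposer shape: target?(data) and decompose(data) { |part| ... }.
class SketchArchiveDecomposer
  def target?(data)
    data.kind == :archive
  end

  # Treats the body as an array of entries and yields each as text data.
  def decompose(data)
    data.body.each do |entry|
      yield SketchData.new(entry, :text)
    end
  end
end

class RecursiveSketch
  def initialize
    @decomposers = []
  end

  def add_decomposer(decomposer)
    @decomposers << decomposer
  end

  def extract(input, &block)
    extract_recursive(input, &block)
  end

  private

  # Decomposes data while some decomposer accepts it; plain text is yielded.
  def extract_recursive(data, &block)
    decomposer = @decomposers.find { |candidate| candidate.target?(data) }
    if decomposer
      decomposer.decompose(data) do |part|
        extract_recursive(part, &block)
      end
    elsif data.kind == :text
      yield data
    end
  end
end

sketch = RecursiveSketch.new
sketch.add_decomposer(SketchArchiveDecomposer.new)

bodies = []
archive = SketchData.new(["first entry", "second entry"], :archive)
sketch.extract(archive) { |text_data| bodies << text_data.body }
p bodies # prints ["first entry", "second entry"]

# An empty archive: the block is never called.
calls = 0
sketch.extract(SketchData.new([], :archive)) { |_text_data| calls += 1 }
p calls # prints 0
```

Since the block may never run, callers that need to know whether anything was extracted can count calls or collect results, as the two cases above do.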