Class: Treat::Workers::Formatters::Readers::Document

Inherits:
Object
Defined in:
lib/treat/workers/formatters/readers/document.rb

Overview

This class wraps Yomu, a library that extracts text and metadata from files and documents using the Apache Tika content analysis toolkit.

Class Method Details

.read(document, options = {}) ⇒ Object

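Extracts the readable text from the document's file and stores it as the document's value. The document's :format feature is set to the first file extension associated with the MIME type that Yomu detects. Returns the document.

Options: none (the options hash is not used).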



# File 'lib/treat/workers/formatters/readers/document.rb', line 10

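def self.read(document, options = {})
  # Hand the document's file to Yomu (backed by Apache Tika).
  yomu = Yomu.new(document.file)

  # Store the extracted text as the document's value.
  document.value = yomu.text
  # Record the first file extension of the detected MIME type as the format.
  document.set :format, yomu.mimetype.extensions.first
  document
end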
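
The three steps in .read (extract the text, record the format, return the document) can be sketched without Yomu, Java, or Treat installed. This is a minimal illustration under stated assumptions, not the library's implementation: StubExtractor, StubMimeType, StubDocument, and read_with are hypothetical names invented here, and the stub derives its "MIME type" from the file extension rather than inspecting file contents as Tika does.

```ruby
# Hypothetical stand-ins so the sketch runs without Yomu, Java, or Treat.
# None of these classes exist in Treat or Yomu.
StubMimeType = Struct.new(:extensions)

class StubExtractor
  def initialize(path)
    @path = path
  end

  # Plays the role of Yomu#text: returns extracted plain text.
  def text
    "Hello from #{File.basename(@path)}"
  end

  # Plays the role of Yomu#mimetype: an object that lists file extensions.
  # Assumption: the extension is taken from the path, not detected from content.
  def mimetype
    StubMimeType.new([File.extname(@path).delete('.')])
  end
end

class StubDocument
  attr_reader :file, :features
  attr_accessor :value

  def initialize(file)
    @file = file
    @features = {}
  end

  # Same call shape as document.set in the reader above.
  def set(feature, value)
    @features[feature] = value
  end
end

# Same three steps as .read, with the extractor class passed in.
def read_with(extractor_class, document)
  extractor = extractor_class.new(document.file)
  document.value = extractor.text
  document.set :format, extractor.mimetype.extensions.first
  document
end

doc = read_with(StubExtractor, StubDocument.new('reports/summary.pdf'))
puts doc.value
puts doc.features[:format]
```

Passing the extractor class as an argument is a choice made for this sketch only; it lets the flow be exercised with a stub, whereas the reader above always instantiates Yomu directly.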