TikaMasala

A simple wrapper around Apache Tika for parsing documents.

Usage

parser = TikaMasala::Parser.new('/path/to/tika/jar/file')
parser.parse('/path/to/pdf')

Behind the scenes, every method uses options to pass arguments to Tika. You can also call it directly:

parser.options('--detect', '/path/to/file')
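For illustration, here is roughly what such a call might turn into on the command line. This is only a sketch, not TikaMasala's actual code: `tika_command` is a hypothetical helper, and it assumes the Tika jar is started with `java -jar` and that the arguments are passed through unchanged.

```ruby
# Hypothetical helper, not part of TikaMasala: builds the argument
# vector for running the Tika jar with the given options.
def tika_command(jar_path, *args)
  ['java', '-jar', jar_path, *args]
end

tika_command('/path/to/tika/jar/file', '--detect', '/path/to/file')
# => ["java", "-jar", "/path/to/tika/jar/file", "--detect", "/path/to/file"]
```

Keeping the command as an array rather than a single string means paths with spaces need no shell quoting.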

What if tika can't parse a document?

The exception that is raised contains both the stdout and the stderr output, which means you can still retrieve the part of the content that was extracted before the failure.

parser = TikaMasala::Parser.new('/path/to/tika/jar/file')

begin
  parser.parse('/path/to/pdf')
rescue TikaMasala::TikaError => e
  # Let's say it produced an error
  e.stdout # contains the parsed text until it reached an error
  e.stderr # contains the exception raised by Java
end
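To make the idea concrete, here is a minimal sketch of how a wrapper could attach both output streams to an exception. It is not TikaMasala's implementation: `CommandError` and `run_command` are hypothetical names, and a small Ruby child process stands in for a Tika run that fails part-way through. It relies only on `Open3.capture3` from the standard library.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical error class that carries whatever the child process printed.
class CommandError < StandardError
  attr_reader :stdout, :stderr

  def initialize(message, stdout, stderr)
    super(message)
    @stdout = stdout
    @stderr = stderr
  end
end

# Runs a command, returning its stdout on success and raising
# CommandError (with both streams attached) on a non-zero exit.
def run_command(*command)
  stdout, stderr, status = Open3.capture3(*command)
  return stdout if status.success?

  raise CommandError.new("command exited with status #{status.exitstatus}", stdout, stderr)
end

# Stand-in for a failing Tika run: prints some text, writes an error,
# then exits with status 1.
script = '$stdout.print "partial text"; $stderr.print "boom"; exit 1'

begin
  run_command(RbConfig.ruby, '-e', script)
rescue CommandError => e
  e.stdout # the text printed before the failure
  e.stderr # the error output
end
```

Because the streams are captured separately, the partial text stays usable even when the error output is long or noisy.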

Versions

The version number matches the version of Tika that is distributed with it.