TikaMasala
A simple wrapper around Tika to parse documents.
Usage
parser = TikaMasala::Parser.new('/path/to/tika/jar/file')
parser.parse('/path/to/pdf')
Everything uses options behind the scenes to pass arguments:
parser.('--detect', '/path/to/file')
What if Tika can't parse a document?
The exception thrown contains both the stdout and stderr output, which means you can still retrieve part of the content you were trying to extract.
parser = TikaMasala::Parser.new('/path/to/tika/jar/file')
begin
parser.parse('/path/to/pdf')
rescue TikaMasala::TikaError => e
# Let's say it produced an error
e.stdout # contains the parsed text until it reached an error
e.stderr # contains the exception raised by Java
end
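If you want the partial text instead of a failure, the rescue pattern above can be wrapped in a small helper. This is only a sketch: `parse_with_fallback` is a hypothetical name and not part of TikaMasala, and it assumes the gem is already loaded and that `parse` returns the extracted text.

```ruby
# Hypothetical helper, not part of TikaMasala.
# Assumes the gem is already loaded and that parser.parse returns the extracted text.
# Returns a hash with the text Tika managed to extract and, if parsing
# failed, the error output reported by Java.
def parse_with_fallback(parser, path)
  { text: parser.parse(path), error: nil }
rescue TikaMasala::TikaError => e
  # e.stdout holds the text parsed before the failure;
  # e.stderr holds the exception raised by Java.
  { text: e.stdout.to_s, error: e.stderr.to_s }
end
```

A caller could then use `result = parse_with_fallback(parser, '/path/to/pdf')` and check `result[:error]` to decide whether the text is complete.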
Versions
The version matches the version of Tika that is distributed.