Ddr::Extraction

Pluggable file text and metadata extraction service.

Installation

Add this line to your application's Gemfile:

gem 'ddr-extraction'

And then execute:

$ bundle

Or install it yourself as:

$ gem install ddr-extraction

Dependencies

The gem has no external dependencies of its own. Consult the documentation for each extraction tool your configuration uses.

Configuration

Ddr::Extraction includes default configurations for Apache Tika (text and metadata extraction) and FITS (metadata only). Tika is used as the default adapter when none is specified to the builder.

require "ddr-extraction"
Ddr::Extraction.load_defaults!

Rake tasks are provided for downloading Tika and FITS to the locations the default configuration expects.

rake tika:download
rake fits:download

Configuration Example

Ddr::Extraction.configure do |config|
  config.adapters.default = :tika # Use Tika as the default adapter
  config.adapters.tika.path = "/path/to/tika-app.jar"
  config.adapters.fits.path = "/path/to/fits.sh"
end
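
The fallback described above (use the default adapter when none is named) can be illustrated with a small stand-alone sketch. It does not use or reproduce the gem's internals: the SketchAdapters class and its register and resolve methods are hypothetical names invented for this example, and only the adapter names and placeholder paths come from the configuration above.

```ruby
# Stand-alone illustration only; none of these names come from ddr-extraction.
class SketchAdapters
  attr_accessor :default

  def initialize
    @paths = {}
  end

  # Record the path to an adapter's executable or jar.
  def register(name, path)
    @paths[name] = path
  end

  # Return [name, path] for the requested adapter,
  # or for the default adapter when no name is given.
  def resolve(name = nil)
    chosen = name || default
    path = @paths.fetch(chosen) do
      raise ArgumentError, "unknown adapter: #{chosen.inspect}"
    end
    [chosen, path]
  end
end

adapters = SketchAdapters.new
adapters.default = :tika
adapters.register(:tika, "/path/to/tika-app.jar")
adapters.register(:fits, "/path/to/fits.sh")

p adapters.resolve        # falls back to the default adapter
p adapters.resolve(:fits) # an explicit choice overrides the default
```

Keeping the default as a plain setting means callers that do not care which tool runs never have to name one, while callers that need FITS metadata can still ask for it explicitly.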

Usage

>> extractor = Ddr::Extraction.build_extractor
>> text = extractor.extract(:text, "spec/fixtures/sample.docx")
>> puts text.read
This is a sample document.
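
When extracting from many files, it may help to wrap the call shown above in a small helper. The following is a minimal sketch, assuming only what the session above shows: an object that responds to extract(type, path) and returns something readable. The extract_texts helper and the FakeExtractor stand-in are hypothetical and not part of the gem; which exceptions a real extractor raises is not documented here, so rescuing StandardError is an assumption.

```ruby
require "stringio"

# Extract text from each path using any object that responds to
# extract(type, path) and returns something that responds to read.
# Paths that fail are collected instead of aborting the whole batch.
def extract_texts(extractor, paths)
  texts = {}
  errors = {}
  paths.each do |path|
    begin
      texts[path] = extractor.extract(:text, path).read
    rescue StandardError => e # assumption: real failures surface as StandardError
      errors[path] = e.message
    end
  end
  [texts, errors]
end

# Hypothetical stand-in so the sketch runs without Tika installed.
class FakeExtractor
  def extract(type, path)
    raise IOError, "cannot read #{path}" unless path.end_with?(".docx")
    StringIO.new("#{type} of #{File.basename(path)}")
  end
end

texts, errors = extract_texts(FakeExtractor.new, ["a/sample.docx", "b/broken.bin"])
p texts
p errors
```

With a real extractor, pass the object returned by Ddr::Extraction.build_extractor in place of FakeExtractor.new.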

Contributing

  1. Fork it ( https://github.com/[my-github-username]/ddr_extractor/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request