Ddr::Extraction
Pluggable file text and metadata extraction service.
Installation
Add this line to your application's Gemfile:
gem 'ddr-extraction'
And then execute:
$ bundle
Or install it yourself as:
$ gem install ddr-extraction
Dependencies
The gem has no external dependencies of its own. Consult the documentation for each extraction tool used by your configuration.
Configuration
Ddr::Extraction includes default configurations for Apache Tika (text and metadata extraction) and FITS (metadata only). Tika is used as the default adapter when none is specified to the builder.
require "ddr-extraction"
Ddr::Extraction.load_defaults!
There are rake tasks for downloading Tika and FITS to expected locations.
rake tika:download
rake fits:download
Configuration Example
Ddr::Extraction.configure do |config|
  config.adapters.default = :tika # Use Tika as the default adapter
  config.adapters.tika.path = "/path/to/tika-app.jar"
  config.adapters.fits.path = "/path/to/fits.sh"
end
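Both adapter settings point at files on disk, so it can help to confirm that the files exist before configuring. The following is a minimal sketch using only the Ruby standard library; `missing_tool_paths` is a hypothetical helper, not part of the gem, and the paths are the placeholders from the configuration example.

```ruby
# Hypothetical helper (not part of ddr-extraction): given a hash of
# { adapter_name => path }, returns the entries whose path is not an
# existing regular file.
def missing_tool_paths(paths)
  paths.reject { |_name, path| File.file?(path) }
end

tool_paths = {
  tika: "/path/to/tika-app.jar", # placeholder path, replace with your own
  fits: "/path/to/fits.sh"       # placeholder path, replace with your own
}

# Print one warning to stderr for each tool that could not be found.
missing_tool_paths(tool_paths).each do |name, path|
  warn "#{name} not found at #{path}"
end
```

An empty result means every listed path exists; the check says nothing about whether the tools themselves run correctly.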
Usage
>> extractor = Ddr::Extraction.build_extractor
>> text = extractor.extract(:text, "spec/fixtures/sample.docx")
>> puts text.read
This is a sample document.
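The session above suggests that the value returned by `extract` responds to `read`. Assuming that holds, the extracted text could be copied to a file in chunks rather than loaded into a single string. In this sketch `save_extracted_text` is a hypothetical helper, and a `StringIO` stands in for the extractor's return value so the snippet runs without the gem or Tika installed.

```ruby
require "stringio"
require "tmpdir"

# Hypothetical helper (not part of ddr-extraction): copies any readable
# IO-like object to dest_path and returns the number of bytes copied.
def save_extracted_text(io, dest_path)
  IO.copy_stream(io, dest_path)
end

# StringIO is a stand-in for the object returned by
# extractor.extract(:text, "spec/fixtures/sample.docx").
text = StringIO.new("This is a sample document.\n")

Dir.mktmpdir do |dir|
  dest = File.join(dir, "sample.txt")
  bytes = save_extracted_text(text, dest)
  puts "wrote #{bytes} bytes to #{dest}"
end
```

With the real gem, the `StringIO` would be replaced by the result of `extractor.extract`, provided that object behaves like an IO.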
Contributing
- Fork it ( https://github.com/[my-github-username]/ddr_extractor/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request