Ddr::Extractor
Generic file text and metadata extraction service.
Installation
Add this line to your application's Gemfile:
gem 'ddr-extractor'
And then execute:
$ bundle
Or install it yourself as:
$ gem install ddr-extractor
Usage
>> extractor = Ddr::Extraction::Extractor.new
=> #<Ddr::Extraction::Extractor:0x007fc2851dcfa0>
>> text = extractor.extract(:text, "spec/fixtures/sample.docx")
=> #<IO:fd 11>
>> puts text.read
This is a sample document.
>> metadata = extractor.extract(:metadata, "spec/fixtures/blue-devil.png")
=> #<IO:fd 12>
>> puts metadata.read
<?xml version="1.0" encoding="UTF-8"?>
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="0.8.3" timestamp="11/12/14 12:36 PM">
<identification>
<identity format="Portable Network Graphics" mimetype="image/png" toolname="FITS" toolversion="0.8.3">
<tool toolname="Exiftool" toolversion="9.13" />
<tool toolname="Droid" toolversion="6.1.3" />
<tool toolname="ffident" toolversion="0.2" />
<tool toolname="Tika" toolversion="1.3" />
<version toolname="Droid" toolversion="6.1.3">1.0</version>
<externalIdentifier toolname="Droid" toolversion="6.1.3" type="puid">fmt/11</externalIdentifier>
</identity>
</identification>
<fileinfo>
<lastmodified toolname="Exiftool" toolversion="9.13" status="SINGLE_RESULT">2014:11:12 12:24:18-05:00</lastmodified>
<filepath toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">/path/to/spec/fixtures/blue-devil.png</filepath>
<filename toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">blue-devil.png</filename>
<size toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">75005</size>
<md5checksum toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">e6a5d16da2fbe65311952e2d8b04f069</md5checksum>
<fslastmodified toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">1415813058000</fslastmodified>
</fileinfo>
<filestatus />
<metadata>
<image>
<compressionScheme toolname="Exiftool" toolversion="9.13" status="CONFLICT">Deflate/Inflate</compressionScheme>
<compressionScheme toolname="Tika" toolversion="1.3" status="CONFLICT">Deflate</compressionScheme>
<imageWidth toolname="Exiftool" toolversion="9.13">200</imageWidth>
<imageHeight toolname="Exiftool" toolversion="9.13">200</imageHeight>
<orientation toolname="Tika" toolversion="1.3" status="SINGLE_RESULT">normal*</orientation>
</image>
</metadata>
<statistics fitsExecutionTime="791">
<tool toolname="OIS Audio Information" toolversion="0.1" status="did not run" />
<tool toolname="ADL Tool" toolversion="0.1" status="did not run" />
<tool toolname="Jhove" toolversion="1.5" executionTime="556" />
<tool toolname="file utility" toolversion="5.04" executionTime="623" />
<tool toolname="Exiftool" toolversion="9.13" executionTime="664" />
<tool toolname="Droid" toolversion="6.1.3" executionTime="147" />
<tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA" executionTime="366" />
<tool toolname="OIS File Information" toolversion="0.2" executionTime="142" />
<tool toolname="OIS XML Metadata" toolversion="0.2" status="did not run" />
<tool toolname="ffident" toolversion="0.2" executionTime="369" />
<tool toolname="Tika" toolversion="1.3" executionTime="356" />
</statistics>
</fits>
Contributing
- Fork it ( https://github.com/[my-github-username]/ddr_extractor/fork )
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create a new Pull Request