Raw Object Format


This is a pilot project to produce an intermediate data format that simplifies the bulk ingest of data into the Fedora Commons repository software. While the goal is to keep the format as simple as possible, some affordances are made for defining standard datastreams used by Hydra project front-ends, such as the rightsMetadata datastream.

See spec/fixtures/vecnet-citation.json for a sample two-object model. An overview of the format is in bulk-ingest.md.

Sample command line usage:

$ bin/rof ingest --fedora 'http://localhost:8983/fedora' --user fedoraAdmin:fedoraAdmin spec/fixtures/vecnet-citation.json
1. Ingesting vecnet:d217qs82g ...ok. 0.882s
2. Ingesting vecnet:h415pf50x ...ok. 0.283s
Total time 1.165s
0 errors

ROF does more than ingest new objects. If an object already exists in Fedora, it is updated to match what is provided in the source file. This applies only to datastreams mentioned in the source file; unmentioned datastreams are left untouched.
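The update rule can be pictured with plain Ruby hashes. This is only a sketch of the semantics described above, not ROF's implementation: the `apply_update` helper is hypothetical, and the datastream names `descMetadata` and `content` are assumed for illustration (only `rightsMetadata` is named in this document).

```ruby
# Hypothetical helper: models an object as a hash of datastream name => content.
# This is NOT ROF's code, only an illustration of the update rule.
def apply_update(existing, source)
  # Hash#merge returns a new hash: datastreams named in `source` replace
  # the stored ones, datastreams found only in `existing` are carried over.
  existing.merge(source)
end

# Stand-in for an object already stored in Fedora (names are assumptions).
existing = {
  "rightsMetadata" => "old rights",
  "descMetadata"   => "old description",
  "content"        => "original bytes"
}

# The source file mentions only two of the three datastreams.
source = {
  "rightsMetadata" => "new rights",
  "descMetadata"   => "new description"
}

updated = apply_update(existing, source)
updated.each { |name, value| puts "#{name}: #{value}" }
```

In this model the `content` datastream keeps its stored value because the source file never mentions it.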

If the Fedora URL and user are omitted, rof only lints the JSON file:

$ bin/rof ingest spec/fixtures/vecnet-citation.json
1. Verifying vecnet:d217qs82g ...ok. 0.108s
2. Verifying vecnet:h415pf50x ...ok. 0.002s
Total time 0.111s
0 errors

There is also a filter that assigns identifiers to objects. It requires an external noids service to supply the identifiers. See labels.md.

$ bin/rof filter label spec/fixtures/label.json --noids localhost:13001:test-pool --prefix temp
[
  {
    "type": "fobject",
    "pid": "temp:0k225999n60"
  },
  {
    "type": "fobject",
    "rels-ext": {
      "partOf": [
        "temp:0k225999n60"
      ],
      "refines": [
        "temp:0r96736668t"
      ]
    },
    "pid": "temp:0p096682x75"
  },
  {
    "type": "fobject",
    "pid": "temp:0r96736668t",
    "rels-ext": {
      "partOf": [
        "temp:0r96736668t",
        "temp:0k225999n60",
        "another"
      ]
    }
  }
]
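Because the filter output is ordinary JSON, it can be checked with a few lines of Ruby. The sketch below lists every rels-ext target that is not the pid of an object in the same file. The `unresolved_targets` helper is hypothetical and not part of rof, and whether an unmatched target such as "another" is a problem depends on the conventions described in labels.md.

```ruby
require "json"

# Hypothetical helper (not part of rof): returns each rels-ext target
# that does not match the pid of any object in the list.
def unresolved_targets(objects)
  pids = objects.map { |obj| obj["pid"] }.compact
  targets = objects.flat_map do |obj|
    # Objects without a rels-ext section contribute no targets.
    (obj["rels-ext"] || {}).values.flatten
  end
  (targets - pids).uniq
end

# The filter output shown above, copied inline so the sketch is self-contained.
output = <<~JSON
  [
    {"type": "fobject", "pid": "temp:0k225999n60"},
    {"type": "fobject",
     "rels-ext": {"partOf": ["temp:0k225999n60"], "refines": ["temp:0r96736668t"]},
     "pid": "temp:0p096682x75"},
    {"type": "fobject", "pid": "temp:0r96736668t",
     "rels-ext": {"partOf": ["temp:0r96736668t", "temp:0k225999n60", "another"]}}
  ]
JSON

puts unresolved_targets(JSON.parse(output)).inspect
```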

The intent is that higher-level objects could be supported in the future, with the Fedora ingest performed by this utility being only the last of several processing steps. Other ideas for transformations:

  • A service to expand higher-level objects, say an image-collection, into a sequence of fobjects.
  • The ability to run file characterizations and create derivatives before ingest.

Other

Since the files are JSON, any tool for working with JSON will work with them. For example, the jq tool makes it easy to extract the pid field from every object in a file and return the values as a JSON array:

jq '[.[]|.pid]' < spec/fixtures/vecnet-citation.json
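The same extraction can be done with Ruby's standard json library. This is a minimal sketch: the `pids` helper is a hypothetical name, and the inline sample is an abbreviated stand-in for the fixture that reuses the two pids from the ingest output above.

```ruby
require "json"

# Hypothetical helper: the Ruby counterpart of jq '[.[]|.pid]'.
# Collects the pid of every object in the top-level JSON array.
def pids(json_text)
  JSON.parse(json_text).map { |obj| obj["pid"] }
end

# Abbreviated stand-in for spec/fixtures/vecnet-citation.json.
sample = <<~JSON
  [
    {"type": "fobject", "pid": "vecnet:d217qs82g"},
    {"type": "fobject", "pid": "vecnet:h415pf50x"}
  ]
JSON

puts JSON.generate(pids(sample))
```

To run it against the real fixture, pass `File.read("spec/fixtures/vecnet-citation.json")` instead of `sample`.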