Curate::Indexer

License: Apache 2

The Curate::Indexer gem is responsible for indexing the graph relationships of objects. It maps a PreservationDocument to an IndexDocument by expanding the PreservationDocument's direct parents into the full paths that lead from each root document to that PreservationDocument.

Background

This is a sandbox for working through the reindexing strategy as it relates to CurateND Collections. For now the code lives in its own gem to allow for rapid testing and prototyping (there is no sense in spinning up SOLR and Fedora just to walk an arbitrary graph).

Concepts

As we are indexing objects, we have two types of documents:

  1. PreservationDocument - a light-weight representation of a Fedora object
  2. IndexDocument - a light-weight representation of a SOLR document object

We have four attributes to consider for indexing the graph:

  1. pid - the unique identifier for a document
  2. parent_pids - the pids for all of the parents of a given document
  3. pathnames - the paths to traverse from a root document to the given document
  4. ancestors - the pathnames of each of the ancestors

See Curate::Indexer::Documents::IndexDocument for further discussion.

To reindex a single document, we leverage the Curate::Indexer.reindex_relationships method.

Examples

Given the following PreservationDocuments:

PID  Parents
A    -
B    -
C    A
D    A, B
E    C

If we reindex the above PreservationDocuments, we generate the following IndexDocuments:

PID  Parents  Pathnames   Ancestors
A    -        [A]         []
B    -        [B]         []
C    A        [A/C]       [A]
D    A, B     [A/D, B/D]  [A, B]
E    C        [A/C/E]     [A, A/C]
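
The derivation of pathnames and ancestors can be sketched in a few lines of Ruby. This is a minimal illustration, not the gem's implementation: SketchIndexDocument, resolve_index_document, and build_index_documents are hypothetical names, the input is assumed to be an acyclic hash of pid to parent pids, and ancestors are taken (per the Concepts section) to be the pathnames of every ancestor.

```ruby
# Hypothetical stand-in for an IndexDocument; not the gem's own class.
SketchIndexDocument = Struct.new(:pid, :parent_pids, :pathnames, :ancestors)

# Resolve one pid, memoizing results in `index` so shared parents are
# only computed once. Assumes the parent graph contains no cycles.
def resolve_index_document(pid, parents_by_pid, index)
  return index[pid] if index.key?(pid)

  parent_pids = parents_by_pid.fetch(pid)
  if parent_pids.empty?
    # A root document's only path is itself, and it has no ancestors.
    pathnames = [pid]
    ancestors = []
  else
    parents = parent_pids.map do |parent_pid|
      resolve_index_document(parent_pid, parents_by_pid, index)
    end
    # Every path to a parent, extended by this pid, is a path to this document.
    pathnames = parents.flat_map do |parent|
      parent.pathnames.map { |path| "#{path}/#{pid}" }
    end
    # Ancestors are each parent's ancestors plus each parent's own pathnames.
    ancestors = parents.flat_map { |parent| parent.ancestors + parent.pathnames }.uniq
  end

  index[pid] = SketchIndexDocument.new(pid, parent_pids, pathnames, ancestors)
end

def build_index_documents(parents_by_pid)
  index = {}
  parents_by_pid.each_key { |pid| resolve_index_document(pid, parents_by_pid, index) }
  index
end

parents_by_pid = {
  'A' => [], 'B' => [], 'C' => ['A'], 'D' => ['A', 'B'], 'E' => ['C']
}
index = build_index_documents(parents_by_pid)
index.each_value do |doc|
  puts "#{doc.pid}: pathnames=#{doc.pathnames.inspect} ancestors=#{doc.ancestors.inspect}"
end
```

Memoizing on pid keeps the walk linear in the number of parent links for this small example; a real reindex would also need to guard against cycles and very deep graphs, which this sketch does not attempt.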

For more scenarios, look at the Reindex PID and Descendants specs.

Adapters

An AbstractAdapter provides the method interface for others to build against.

The InMemory adapter is a reference implementation (and used to ease testing overhead).
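
As a rough picture of how an abstract adapter and an in-memory implementation might relate, consider the sketch below. Everything in it is an assumption made for illustration: the class names SketchAbstractAdapter and SketchInMemoryAdapter and the three method names are hypothetical and are not taken from the gem's AbstractAdapter, whose actual interface should be read from its source.

```ruby
# Hypothetical interface: the method names are illustrative assumptions,
# not the gem's actual AbstractAdapter API.
class SketchAbstractAdapter
  def find_preservation_document(pid)
    raise NotImplementedError, "#{self.class} must implement #{__method__}"
  end

  def find_index_document(pid)
    raise NotImplementedError, "#{self.class} must implement #{__method__}"
  end

  def write_index_document(pid, attributes)
    raise NotImplementedError, "#{self.class} must implement #{__method__}"
  end
end

# Hypothetical in-memory implementation backed by plain hashes, so that
# tests need neither Fedora nor SOLR.
class SketchInMemoryAdapter < SketchAbstractAdapter
  def initialize(preservation_documents = {})
    @preservation_documents = preservation_documents
    @index_documents = {}
  end

  def find_preservation_document(pid)
    @preservation_documents.fetch(pid)
  end

  def find_index_document(pid)
    @index_documents.fetch(pid)
  end

  def write_index_document(pid, attributes)
    @index_documents[pid] = attributes
  end
end

adapter = SketchInMemoryAdapter.new({ 'A' => { parent_pids: [] } })
adapter.write_index_document('A', { pathnames: ['A'], ancestors: [] })
```

Keeping persistence behind a small interface like this is what lets the graph-walking logic be exercised without a running repository or search index.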

CurateND has implemented its own adapter for its LibraryCollection indexing.