Curate::Indexer
The Curate::Indexer gem indexes the graph relationships between objects. It maps a PreservationDocument to an IndexDocument by expanding the PreservationDocument's direct parents into the full paths that lead from a root document to that PreservationDocument.
Background
This is a sandbox to work through the reindexing strategy as it relates to CurateND Collections. For now, the code is kept separate to allow rapid testing and prototyping; there is no sense spinning up SOLR and Fedora just to walk an arbitrary graph.
Concepts
As we are indexing objects, we have two types of documents:
- PreservationDocument - a light-weight representation of a Fedora object
- IndexDocument - a light-weight representation of a SOLR document object
We have four attributes to consider for indexing the graph:
- pid - the unique identifier for a document
- parent_pids - the pids for all of the parents of a given document
- pathnames - the paths to traverse from a root document to the given document
- ancestors - the pathnames of each of the given document's ancestors
See Curate::Indexer::Documents::IndexDocument for further discussion.
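As a rough illustration of how the two document types and four attributes relate, here is a minimal sketch using plain Structs. The Struct definitions are hypothetical stand-ins, not the gem's actual classes; the real Curate::Indexer::Documents classes may differ in name and shape.

```ruby
# Hypothetical stand-ins for the gem's document classes (assumption:
# a PreservationDocument only needs a pid and its direct parents).
PreservationDocument = Struct.new(:pid, :parent_pids, keyword_init: true)
IndexDocument = Struct.new(:pid, :parent_pids, :pathnames, :ancestors, keyword_init: true)

# A document "C" whose only parent is the root document "A".
preserved = PreservationDocument.new(pid: "C", parent_pids: ["A"])

# The indexed form carries the same pid and parents, plus the derived
# pathnames and ancestors.
indexed = IndexDocument.new(
  pid: preserved.pid,
  parent_pids: preserved.parent_pids,
  pathnames: ["A/C"],
  ancestors: ["A"]
)
```

The PreservationDocument holds only what is stored directly (pid and parent_pids); pathnames and ancestors exist only on the IndexDocument because they are derived by walking the graph.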
To reindex a single document, use the Curate::Indexer.reindex_relationships method.
Examples
Given the following PreservationDocuments:
PID | Parents |
---|---|
A | - |
B | - |
C | A |
D | A, B |
E | C |
Reindexing the above PreservationDocuments generates the following IndexDocuments:
PID | Parents | Pathnames | Ancestors |
---|---|---|---|
A | - | [A] | [] |
B | - | [B] | [] |
C | A | [A/C] | [A] |
D | A, B | [A/D, B/D] | [A, B] |
E | C | [A/C/E] | [A/C] |
For more scenarios, look at the Reindex PID and Descendants specs.
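The tables above can be reproduced with a short recursive walk. This is a sketch only, not the gem's implementation: `PARENTS`, `pathnames_for`, and `ancestors_for` are hypothetical names, the graph is assumed to be acyclic, and ancestors are derived here by dropping the last segment of each pathname, which happens to match the table but may not be how the gem computes them.

```ruby
# Map of pid => parent pids, copied from the PreservationDocuments table.
PARENTS = {
  "A" => [],
  "B" => [],
  "C" => ["A"],
  "D" => ["A", "B"],
  "E" => ["C"]
}.freeze

# Every path from a root document down to +pid+, joined with "/".
# Assumes the graph is acyclic; a cycle would recurse without end.
def pathnames_for(pid, parents)
  parent_pids = parents.fetch(pid)
  return [pid] if parent_pids.empty?

  parent_pids.flat_map do |parent_pid|
    pathnames_for(parent_pid, parents).map { |path| "#{path}/#{pid}" }
  end
end

# Drop the final segment of each pathname. Root documents yield an
# empty string, which is discarded, so they have no ancestors.
def ancestors_for(pid, parents)
  pathnames_for(pid, parents)
    .map { |path| path.split("/")[0...-1].join("/") }
    .reject(&:empty?)
    .uniq
end

# Build a pid => attributes hash for every document in the graph.
index = PARENTS.keys.map do |pid|
  [pid, { pathnames: pathnames_for(pid, PARENTS), ancestors: ancestors_for(pid, PARENTS) }]
end.to_h
```

Because D has two parents, the walk fans out and produces one pathname per route from a root, which is why D ends up with both A/D and B/D.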
Adapters
An AbstractAdapter provides the method interface for others to build against.
The InMemory adapter is a reference implementation (and used to ease testing overhead).
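To make the adapter idea concrete, here is a minimal sketch of the pattern: an abstract interface whose methods raise until overridden, and an in-memory implementation backed by hashes. All names here (`AbstractAdapterSketch`, `InMemoryAdapterSketch`, and their methods) are hypothetical and chosen for illustration; consult the gem's AbstractAdapter for the actual method interface.

```ruby
# Hypothetical interface; these method names are illustrative and are
# not taken from the gem's AbstractAdapter.
module AbstractAdapterSketch
  def find_preservation_document(pid)
    raise NotImplementedError, "#{self.class} must implement #{__method__}"
  end

  def write_index_document(pid, attributes)
    raise NotImplementedError, "#{self.class} must implement #{__method__}"
  end

  def find_index_document(pid)
    raise NotImplementedError, "#{self.class} must implement #{__method__}"
  end
end

# Keeps both layers in plain hashes so a graph can be walked in tests
# without a running SOLR or Fedora instance.
class InMemoryAdapterSketch
  include AbstractAdapterSketch

  def initialize(preservation_documents = {})
    @preservation_documents = preservation_documents
    @index_documents = {}
  end

  def find_preservation_document(pid)
    @preservation_documents.fetch(pid)
  end

  def write_index_document(pid, attributes)
    @index_documents[pid] = attributes
  end

  def find_index_document(pid)
    @index_documents.fetch(pid)
  end
end

adapter = InMemoryAdapterSketch.new("C" => { parent_pids: ["A"] })
adapter.write_index_document("C", pathnames: ["A/C"], ancestors: ["A"])
```

An application-specific adapter would implement the same methods against its own persistence and index layers, leaving the graph-walking logic untouched.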
CurateND has implemented the following adapter for its LibraryCollection indexing.