FedoraFS -- FUSE Filesystem for Fedora Commons Repositories

Introduction

FedoraFS is a Ruby class that creates a FUSE (Filesystem in USErspace) filesystem on top of a Fedora Commons repository. Features include:

  • Resource Index (risearch) discovery of repository objects
  • Read/Write access to datastream content
  • Ability to expose object/datastream attributes as datastream.profile.xml files
  • Namespace-specific PID splitting for creation of manageable directory hierarchies

Usage

mount_fedora [options]
    -C, --config-file FILE           Load defaults from FILE
    -a, --attribute-xml              Include object/datastream attribute XML files
                                     in directory listings
    -c, --cert-file FILE             Use client certificate from FILE
    -D, --no-daemon                  Run in the foreground (for debugging)
    -f, --fedora-url URL             Use Fedora instance at URL
    -k, --key-file FILE              Use client key from FILE
        --log-file FILE              Send logging output to FILE
        --log-level LEVEL            Set the logging level (0-5; 0 = most verbose)
    -m, --mount-point DIR            Mount filesystem on DIR
    -p, --key-pass STRING            Password for client key
    -R, --read-only                  Don't allow editing of datastream content
    -r, --refresh SECONDS            Refresh directory structure every SECONDS seconds
    -u, --user USER                  Authenticate to Fedora as USER
    -v, --volname NAME               Mount the volume as NAME
    -w, --password PASS              Authenticate to Fedora using PASS
    -z, --cache-size                 Number of objects to hold in memory
    -s, --save FILE                  Save options to FILE
    -h, --help                       Show this help message

Example:

mount_fedora -f http://localhost:8983/fedora -m /Volumes/fedora -u fedoraAdmin -w fedoraAdmin
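When the mount is started from a Ruby script rather than typed by hand, the same command line can be assembled from a hash of settings. The following is only a sketch: the mount_fedora_argv helper and the FLAGS table are hypothetical names, and only short flags documented in the option list above are used.

```ruby
require 'shellwords'

# Hypothetical helper, not part of FedoraFS: maps a few setting names to the
# short flags documented in the option list above.
FLAGS = {
  fedora_url:  '-f',
  mount_point: '-m',
  user:        '-u',
  password:    '-w'
}.freeze

def mount_fedora_argv(options)
  options.each_with_object(['mount_fedora']) do |(name, value), argv|
    # fetch raises KeyError for any setting this sketch does not know about
    argv << FLAGS.fetch(name) << value.to_s
  end
end

argv = mount_fedora_argv(
  fedora_url:  'http://localhost:8983/fedora',
  mount_point: '/Volumes/fedora',
  user:        'fedoraAdmin',
  password:    'fedoraAdmin'
)

# Print the command line in a shell-safe form for inspection.
puts Shellwords.join(argv)
```

To actually run the command, the array can be handed to Kernel#system as system(*argv), which avoids shell quoting problems with passwords or paths that contain spaces.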

Volume Layout

Directories

The root directory of the mounted volume contains all of the PID namespaces that exist within the repository (e.g., a default Fedora installation with the demo objects loaded will have "demo" and "fedora-system" directories, and possibly "changeme"). Each namespace directory contains a number of PID trees pointing to the actual objects. By default, the directory structure is flat -- each namespace directory simply contains one directory for each PID within its namespace. However, to make large numbers of PIDs more manageable, the configuration file can define a splitter for each namespace.

Splitters

A splitter is simply a regular expression that defines how to turn a PID into a directory layout.

For example, the sample_config.yml file defines splitters for the local:* and druid:* namespaces:

local: /(..?)/
druid: /([a-z]{2})([0-9]{3})([a-z]{2})([0-9]{4})/

The directories under local, therefore, will be split into groups of two characters, while the druid PIDs will be grouped into directories of two lowercase letters, followed by three digits, then two more lowercase letters, and finally four digits. As a result, the content for the object with the PID local:abcdefg will be in local/ab/cd/ef/g, while druid:ab123cd4567 will be in druid/ab/123/cd/4567.
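The two layouts described above can be reproduced in a few lines of Ruby. This is a minimal sketch, assuming the splitter is applied repeatedly to the part of the PID after the colon (here via String#scan); split_pid is a hypothetical helper, not FedoraFS's own code.

```ruby
# Hypothetical helper, not part of FedoraFS. Assumes the splitter regex is
# applied repeatedly (String#scan) to the part of the PID after the colon.
def split_pid(pid, splitter)
  namespace, id = pid.split(':', 2)
  # scan returns one array of captures per match; flatten keeps them in order
  parts = id.scan(splitter).flatten
  File.join(namespace, *parts)
end

LOCAL_SPLITTER = /(..?)/
DRUID_SPLITTER = /([a-z]{2})([0-9]{3})([a-z]{2})([0-9]{4})/

puts split_pid('local:abcdefg', LOCAL_SPLITTER)      # local/ab/cd/ef/g
puts split_pid('druid:ab123cd4567', DRUID_SPLITTER)  # druid/ab/123/cd/4567
```

The optional second character in the local pattern is what allows the odd trailing "g" to form a final directory of its own.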

FedoraFS itself defines a splitter for the fedora-system:* namespace, as well as one called :default. Either of these can be overridden by a configuration file. The :default splitter is used when no other splitter is defined for a given PID's namespace.
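The fallback to :default amounts to a keyed lookup with a default. The sketch below is illustrative only: the SPLITTERS table, the splitter_for and path_for helpers, and in particular the /(.+)/ pattern used for :default are assumptions chosen to produce the flat layout described earlier, not the patterns FedoraFS actually ships with.

```ruby
# Hypothetical splitter table, not FedoraFS's own. The :default pattern is an
# assumption that yields a flat layout (one directory per PID).
SPLITTERS = {
  :default => /(.+)/,
  'local'  => /(..?)/
}.freeze

# Return the splitter for the PID's namespace, or :default if none is defined.
def splitter_for(pid, splitters = SPLITTERS)
  namespace = pid.split(':', 2).first
  splitters.fetch(namespace) { splitters[:default] }
end

# Build the relative directory path for a PID using the selected splitter.
def path_for(pid, splitters = SPLITTERS)
  namespace, id = pid.split(':', 2)
  File.join(namespace, *id.scan(splitter_for(pid, splitters)).flatten)
end

puts path_for('local:abcdefg')  # uses the 'local' splitter
puts path_for('demo:5')         # no 'demo' entry, so :default applies
```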

Object Files

Each object's directory contains a file called foxml.xml, containing the full FOXML representation of the object. In addition, there is one file for each datastream, with an extension derived from the datastream's MIME type. If the --attribute-xml switch is specified on the command line or in a loaded configuration file, there is also a read-only datastream.profile.xml file containing the datastream's attributes, as well as a profile.xml file containing the object's attributes.

Note: Because the filesystem has to compute size information for each of these files, the --attribute-xml option slows down directory listings considerably.

Special Files

The root directory of the volume contains a number of hidden special files that show (or change) information about the FedoraFS filesystem:

last_refresh.txt  (ro) - DateTime of last PID tree refresh from the Fedora repository
next_refresh.txt  (ro) - DateTime of next PID tree refresh
refresh_time.txt  (rw) - Time (in seconds) between PID tree refreshes
object_cache.txt  (ro) - A dump of the LRU cache used to keep the most recently accessed Fedora
                         objects in memory
object_count.txt  (ro) - How many objects are in the PID tree
log_level.txt     (rw) - The current logging level, from 0 to 5 (0 = most verbose)
read_only.txt     (rw) - Contains "true" if datastreams are read-only; "false" if they can be written
attribute_xml.txt (rw) - Contains "true" if *.profile.xml files should be included in directory
                         listings; "false" if they should be hidden
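Because the special files are ordinary files from the point of view of any program, they can be read and written with plain file I/O. The helpers below are a hypothetical sketch, not part of FedoraFS: they assume the file names appear exactly as listed above and that refresh_time.txt holds a whole number of seconds. Adjust the names if your mount shows them differently.

```ruby
# Hypothetical helpers for the special files listed above. File names are
# used exactly as listed; the mount point is supplied by the caller.

# Read the current refresh interval (seconds) from refresh_time.txt.
def refresh_interval(mount_point)
  File.read(File.join(mount_point, 'refresh_time.txt')).strip.to_i
end

# Change the refresh interval by writing to the (rw) refresh_time.txt file.
def set_refresh_interval(mount_point, seconds)
  File.write(File.join(mount_point, 'refresh_time.txt'), seconds.to_s)
end

# True when read_only.txt contains "true", i.e. datastreams cannot be written.
def read_only?(mount_point)
  File.read(File.join(mount_point, 'read_only.txt')).strip == 'true'
end

# Example (assuming the volume is mounted on /Volumes/fedora):
#   set_refresh_interval('/Volumes/fedora', 120)
#   puts refresh_interval('/Volumes/fedora')
#   puts read_only?('/Volumes/fedora')
```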

History

  • 0.3.2 - Remove non-sample config files from gemspec
  • 0.3.1 - Add tests; fix write_to() datastream bug
  • 0.3.0 - Initial documented release

Copyright (c) 2011 Michael B. Klein. See LICENSE.txt for further details.