FedoraFS -- FUSE Filesystem for Fedora Commons Repositories
Introduction
FedoraFS is a Ruby class that creates a FUSE (Filesystem in USErspace) filesystem on top of a Fedora Commons repository. Features include:
- Resource Index (risearch) discovery of repository objects
- Read/Write access to datastream content
- Ability to expose object/datastream attributes as datastream.profile.xml files
- Namespace-specific PID splitting for creation of manageable directory hierarchies
Usage
Usage:
mount_fedora [options]
-C, --config-file FILE Load defaults from FILE
-a, --attribute-xml Include object/datastream attribute XML files
in directory listings
-c, --cert-file FILE Use client certificate from FILE
-D, --no-daemon Run in the foreground (for debugging)
-f, --fedora-url URL Use Fedora instance at URL
-k, --key-file FILE Use client key from FILE
--log-file FILE Send logging output to FILE
--log-level LEVEL Set the logging level (0-5; 0 = most verbose)
-m, --mount-point DIR Mount filesystem on DIR
-p, --key-pass STRING Password for client key
-R, --read-only Don't allow editing of datastream content
-r, --refresh SECONDS Refresh directory structure every SECONDS seconds
-u, --user USER Authenticate to Fedora as USER
-v, --volname NAME Mount the volume as NAME
-w, --password PASS Authenticate to Fedora using PASS
-z, --cache-size Number of objects to hold in memory
-s, --save FILE Save options to FILE
-h, --help Show this help message
Example:
mount_fedora -f http://localhost:8983/fedora -m /Volumes/fedora -u fedoraAdmin -w fedoraAdmin
Volume Layout
Directories
The root directory of the mounted volume contains all of the PID namespaces that exist within the repository (e.g., a default Fedora installation with the demo objects loaded will have "demo" and "fedora-system" directories, and possibly "changeme"). Each namespace directory contains a number of PID trees pointing to the actual objects. By default, the directory structure is flat -- each namespace directory simply contains one directory for each PID within its namespace. However, to make large numbers of PIDs more manageable, the configuration file can define a splitter for each namespace.
Splitters
A splitter is simply a regular expression that defines how to turn a PID into a directory layout.
For example, the sample_config.yml file defines splitters for the local:* and druid:* namespaces:
local: /(..?)/
druid: /([a-z]{2})([0-9]{3})([a-z]{2})([0-9]{4})/
The directories under local, therefore, will be split into groups of two characters, while the druid PIDs will be grouped into directories of two lowercase letters, followed by three digits, then two more lowercase letters, and finally four digits. As a result, the content for the object with the PID local:abcdefg will be in local/ab/cd/ef/g, while druid:ab123cd4567 will be in druid/ab/123/cd/4567.
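The mapping from PID to directory path can be illustrated with a short Ruby sketch. This is not FedoraFS's own code: the pid_to_path helper and the SPLITTERS hash are hypothetical names, and the use of String#scan is an assumption about how a splitter might be applied. It only reproduces the two examples given in this section.

```ruby
# Hypothetical helper (not part of the FedoraFS API): applies a
# namespace-specific splitter regex to a PID and builds a relative path.
def pid_to_path(pid, splitters)
  namespace, id = pid.split(":", 2)
  # Assumption: fall back to a :default entry when the namespace has none.
  splitter = splitters[namespace] || splitters[:default]
  # Each capture group of each match becomes one directory level.
  parts = splitter ? id.scan(splitter).flatten.compact : []
  # With no splitter (or no match), keep the flat one-directory-per-PID layout.
  parts = [id] if parts.empty?
  File.join(namespace, *parts)
end

# Splitters copied from the sample configuration shown above.
SPLITTERS = {
  "local" => /(..?)/,
  "druid" => /([a-z]{2})([0-9]{3})([a-z]{2})([0-9]{4})/
}

puts pid_to_path("local:abcdefg", SPLITTERS)      # local/ab/cd/ef/g
puts pid_to_path("druid:ab123cd4567", SPLITTERS)  # druid/ab/123/cd/4567
puts pid_to_path("demo:1", SPLITTERS)             # demo/1 (flat layout)
```

In this sketch a namespace without a splitter falls back to the flat layout, which matches the default behavior described under Directories.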
FedoraFS itself defines a splitter for the fedora-system:* namespace, as well as one called :default. Either of these can be overridden by a configuration file. The :default splitter is used when no other splitter is defined for a given PID's namespace.
Object Files
Each object's directory contains a file called foxml.xml, containing the full FOXML representation of the object. In addition, there will be one file for each datastream, with an extension generated by the datastream's MIME type. If the --attribute-xml switch was specified on the command line or in a loaded configuration file, there is also a read-only datastream.profile.xml file containing the datastream's attributes, as well as a profile.xml file containing the object's attributes. (Note: Because the filesystem has to compute size information for each of these files, the --attribute-xml option will slow down directory listings considerably.)
Special Files
The root directory of the volume contains a number of hidden special files that show (or change) information about the FedoraFS filesystem:
last_refresh.txt (ro) - DateTime of last PID tree refresh from the Fedora repository
next_refresh.txt (ro) - DateTime of next PID tree refresh
refresh_time.txt (rw) - Time (in seconds) between PID tree refreshes
object_cache.txt (ro) - A dump of the LRU cache used to keep the most recently accessed Fedora objects in memory
object_count.txt (ro) - How many objects are in the PID tree
log_level.txt (rw) - The current logging level, from 0 to 5 (0 = most verbose)
read_only.txt (rw) - Contains "true" if datastreams are read-only; "false" if they can be written
attribute_xml.txt (rw) - Contains "true" if *.profile.xml files should be included in directory listings; "false" if they should be hidden
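Because the special files are ordinary files from the point of view of a client, they can be read and written with standard file I/O. The Ruby sketch below is only an illustration: read_special and write_special are hypothetical helpers, a temporary directory stands in for a real mount point, and the exact on-disk file names (the files are hidden, so they may carry a leading dot) are an assumption to check against a mounted volume.

```ruby
require "tmpdir"

# Hypothetical helper: return the contents of a special file, trimmed.
def read_special(mount_point, name)
  File.read(File.join(mount_point, name)).strip
end

# Hypothetical helper: write a new value to a read/write (rw) special file.
def write_special(mount_point, name, value)
  File.write(File.join(mount_point, name), value.to_s)
end

# Demonstration against a temporary directory standing in for the mount
# point; on a real volume, pass the directory given to --mount-point.
Dir.mktmpdir do |fake_mount|
  File.write(File.join(fake_mount, "refresh_time.txt"), "300\n")
  write_special(fake_mount, "refresh_time.txt", 60)
  puts read_special(fake_mount, "refresh_time.txt")
end
```

Writing to a file marked (ro) in the list above would be expected to fail on a real mount.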
History
- 0.3.2 - Remove non-sample config files from gemspec
- 0.3.1 - Add tests; fix write_to() datastream bug
- 0.3.0 - Initial documented release
Copyright
Copyright (c) 2011 Michael B. Klein. See LICENSE.txt for further details.