Sufia

What is Sufia?

Sufia is a component that adds self-deposit institutional repository features to a Rails app. Sufia builds on the Hydra framework.

Sufia has the following features:

  • Multiple file, or folder, upload
  • Flexible user- and group-based access controls
  • Transcoding of audio and video files
  • Generation and validation of identifiers
  • Fixity checking
  • Version control
  • Characterization of uploaded files
  • Forms for batch editing metadata
  • Faceted search and browse
  • Social media interaction
  • User profiles
  • User dashboard for file management
  • Highlighted files on profile
  • Sharing with groups and users
  • User notifications
  • Activity streams
  • Background jobs
  • Single-use links
  • Google Analytics for usage statistics
  • Integration with cloud storage providers
  • Google Scholar-specific metadata embedding
  • Schema.org microdata, Open Graph meta tags, and Twitter cards for rich snippets
  • User-managed collections for grouping files
  • Full-text indexing & searching
  • Responsive, fluid, Bootstrap 3-based UI
  • Dynamically configurable featured works and researchers on homepage
  • Proxy deposit and transfers of ownership
  • Integration with Zotero for automatic population of user content

Help

If you have questions or need help, please email the Hydra community tech list or stop by the Hydra community IRC channel.

Creating a Sufia-based app

This document contains instructions specific to setting up an app with Sufia v6.6.0. If you are looking for instructions on installing a different version, be sure to select the appropriate branch or tag from the drop-down menu above.

Prerequisites

Sufia requires the following software to work:

  1. Solr (tested with Solr 4.x)
  2. Fedora 4 repository
  3. A SQL RDBMS (MySQL, PostgreSQL), though note that SQLite will be used by default if you're looking to get up and running quickly
  4. Redis key-value store
  5. ImageMagick
  6. FITS 0.6.2

NOTE: If you do not already have Solr and Fedora instances you can use in your development environment, you may use hydra-jetty (instructions are provided below to get you up and running quickly and with minimal hassle).

Characterization

  1. Go to http://projects.iq.harvard.edu/fits/downloads, download a copy of FITS 0.6.2, and unpack it somewhere on your machine. On OSX you can instead install FITS with Homebrew by running brew install fits (you may then also have to create a symlink from fits.sh -> fits in the next step).
  2. Mark fits.sh as executable (chmod a+x fits.sh)
  3. Run fits.sh -h from the command line; a help message confirms that FITS is properly installed
  4. Give your Sufia app access to FITS by:
    1. Adding the full fits.sh path to your PATH (e.g., in your .bash_profile), OR
    2. Changing config/initializers/sufia.rb to point to your FITS location: config.fits_path = "/<your full path>/fits.sh"

NOTE: Sufia is not compatible with later versions of FITS, so be sure you are using the latest 0.6.x version.
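Before choosing between the two options in step 4, it can help to check whether fits.sh is already reachable through PATH. The following is a minimal sketch in plain Ruby; find_on_path is a hypothetical helper written for this example and is not part of Sufia.

```ruby
# Illustrative helper (not part of Sufia): search a PATH-style string for an
# executable file with the given name and return its full path, or nil.
def find_on_path(name, path_env = ENV.fetch('PATH', ''))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

fits = find_on_path('fits.sh')
if fits
  puts "FITS found at #{fits}"
else
  puts 'fits.sh is not on PATH; set config.fits_path in config/initializers/sufia.rb instead'
end
```

If the script reports that fits.sh is missing, either extend PATH or set config.fits_path as described in step 4.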

Environments

The following commands assume you're setting up Sufia in a development environment (the Rails default). If you're setting up a production or production-like environment, tell Rails so by prepending RAILS_ENV=production to the commands that follow (rails, rake, bundle, and so on).

Ruby

First, you'll need a working Ruby installation. You can install this via your operating system's package manager -- you are likely to get farther with OSX, Linux, or UNIX than Windows but your mileage may vary -- but we recommend using a Ruby version manager such as RVM or rbenv.

We recommend either Ruby 2.2 or the latest 2.1 version.

Rails

Generate a new Rails application. Sufia > 6.1.0 requires Rails 4.2.

gem install rails -v 4.2
rails new my_app

Sufia-related dependencies

Add the following lines to your application's Gemfile.

gem 'sufia', '6.6.0'
gem 'kaminari', github: 'jcoyne/kaminari', branch: 'sufia'  # required to handle pagination properly in dashboard. See https://github.com/amatsuda/kaminari/pull/322

Then install Sufia as a dependency of your app by running bundle install.

Pagination

The line with kaminari -- a Ruby library that helps build pagination into applications -- listed as a dependency in the Gemfile is a temporary fix to address a known problem in the current release of kaminari.

Install Sufia

Install Sufia into your app using its built-in install generator. This step adds a number of files that Sufia requires within your Rails app, including several database migrations.

rails generate sufia:install -f

Database tables and indexes

Now that Sufia's required database migrations have been generated into your app, you'll need to load them into your application's database.

rake db:migrate

Solr and Fedora

If you already have instances of Solr and Fedora 4 that you would like to use, you may skip this step. Otherwise feel free to use hydra-jetty, the bundled copy of Jetty, a Java servlet container that is configured to run versions of Solr and Fedora that are known to work with Sufia. Hydra-jetty (since v8.4.0) requires Java 8.

The following rake tasks will install hydra-jetty and start up Jetty with Solr and Fedora.

rake jetty:clean
rake sufia:jetty:config
rake jetty:start

Background Workers

Reference: https://github.com/resque/resque

Resque Terminology

resque
resque is a message queue that is used by Sufia to manage long-running or slow processes.
pools
resque-pool is a tool for managing (starting, stopping) and configuring a bunch of resque worker processes. See Configure Resque below for more information.
workers
Workers run the background jobs. Each worker has a copy of the rails-app which has the jobs code in `app/jobs` or `sufia-models/app/jobs`. The workers listen to queues (by polling redis) and pull off messages waiting on the queue. Once a worker pulls a message, it will execute the background job encoded in the json message. A worker can be dedicated to a single queue or may listen to multiple queues. Multiple workers can listen to the same queue.
queue
Messages are sent to a queue where they wait until a worker is ready to process the message. Sufia defines a number of queues for processing different background jobs (e.g. batch_update, characterize, etc.). There could have been one queue to process all messages, but Sufia decided to have specialized queues.
message
Messages encode in json what job should be run. It includes the name of the method to execute in background and any parameters to pass to the method.
redis
Redis is a key-value store. Resque uses redis to track messages in the queue.
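To make the terminology concrete, here is a toy, in-memory sketch of the enqueue and work cycle. It is not Resque's implementation: a plain Array stands in for the Redis list behind each queue, and CharacterizeJob, enqueue, and work_one are hypothetical names chosen for this example. The message shape (a job class name plus its arguments, encoded as JSON) mirrors the kind of payload described above.

```ruby
require 'json'

# Hypothetical job class for illustration; real Sufia jobs live in app/jobs.
class CharacterizeJob
  def self.perform(file_id)
    "characterized #{file_id}"
  end
end

# A plain Array stands in for the Redis list that backs each queue.
QUEUES = Hash.new { |hash, name| hash[name] = [] }

# Encode the job name and its parameters as JSON and append to the queue.
def enqueue(queue, job_class, *args)
  QUEUES[queue] << JSON.generate('class' => job_class.name, 'args' => args)
end

# Pull the oldest message off the queue, decode it, and run the job.
# Returns nil when the queue is empty.
def work_one(queue)
  raw = QUEUES[queue].shift
  return nil if raw.nil?
  message = JSON.parse(raw)
  Object.const_get(message['class']).perform(*message['args'])
end

enqueue('characterize', CharacterizeJob, 'file-123')
puts QUEUES['characterize'].first # the JSON message waiting on the queue
puts work_one('characterize')     # a "worker" processes it
```

A real worker repeats the work_one step in a loop, polling Redis rather than an Array.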

Configure Resque

Configuration File: config/resque-pool.yml.

Minimal configuration: Create 1 worker and all queues are processed by this worker.

 "*": 1

Typical configuration: (Scholarsphere example)

batch_update: 3
derivatives: 1
resolrize: 1
audit: 2
event: 5
import_url: 3
sufia: 1
"*": 1

Each line defines a queue name and how many workers should be created to process that queue.
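As a quick sanity check of a pool configuration, the sketch below parses the typical configuration shown above and totals the workers it would start. It uses only Ruby's standard YAML library; worker_counts and POOL_CONFIG are hypothetical names, and the YAML is inlined so the example runs without a config/resque-pool.yml on disk.

```ruby
require 'yaml'

# The "typical" pool configuration from above, inlined for the example.
POOL_CONFIG = <<~YAML
  batch_update: 3
  derivatives: 1
  resolrize: 1
  audit: 2
  event: 5
  import_url: 3
  sufia: 1
  "*": 1
YAML

# Returns a Hash mapping queue name to worker count ({} for an empty file).
def worker_counts(yaml_text)
  YAML.safe_load(yaml_text) || {}
end

counts = worker_counts(POOL_CONFIG)
counts.each { |queue, workers| puts format('%-14s %d', queue, workers) }
puts "total workers: #{counts.values.sum}"
```

Each worker holds its own copy of the Rails app in memory, so the total is worth knowing before deploying a configuration.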

Start Worker Pool

Prerequisite: Redis must be installed and running on your system in order for background workers to pick up jobs. Unless Redis has already been started, you will want to start it up. You can do this either by calling the redis-server command, or if you're on certain Linuxes, you can do this via sudo service redis-server start.

OPTION 1: Start 1 worker (ignores the configuration file). The following command runs until you stop it, so you may want to run it in a dedicated terminal. It is typically used during development only.

IMPORTANT: Change directories to the root of the rails app before executing.

RUN_AT_EXIT_HOOKS=true TERM_CHILD=1 QUEUE=* rake environment resque:work

OPTION 2: Start workers based on configuration. Typically used for production, and can be used for development. See configuration examples above.

IMPORTANT: Change directories to the root of the rails app before executing.

RUN_AT_EXIT_HOOKS=true TERM_CHILD=1 bundle exec resque-pool --daemon --environment development start

Occasionally, Resque may not give background jobs a chance to clean up temporary files. The RUN_AT_EXIT_HOOKS variable allows Resque to do so. The TERM_CHILD variable allows workers to terminate gracefully rather than interrupting currently running workers. For more information on the signals that Resque responds to, see the resque-pool documentation.

Restarting Worker Pool

IMPORTANT: Change directories to the root of the rails app before executing.

Script to restart worker pool: restart-pool

You may want to adjust it to meet your needs.

NOTE: The default location for the pid file is the tmp directory. You will want to update RESQUE_POOL_PIDFILE to point to the location where your pid file is generated if you use a location other than the default.
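Before restarting, it can be useful to know whether the pid file still points at a live process. The sketch below is one way to check this in plain Ruby. Note the assumptions: the file name tmp/resque-pool.pid is hypothetical (only the tmp directory default comes from the note above), reading RESQUE_POOL_PIDFILE from the environment is a choice made for this example, and running_pool_pid is not part of Sufia or resque-pool.

```ruby
# Returns the pid recorded in +pidfile+ if that process is still alive,
# or nil when the file is missing, empty, or stale.
def running_pool_pid(pidfile)
  return nil unless File.file?(pidfile)
  pid = File.read(pidfile).strip.to_i
  return nil unless pid.positive?
  Process.kill(0, pid) # signal 0 only tests for existence, nothing is sent
  pid
rescue Errno::ESRCH
  nil # no such process: the pid file is stale
rescue Errno::EPERM
  pid # the process exists but belongs to another user
end

# Assumption: hypothetical file name; adjust to wherever your pid file lives.
pidfile = ENV.fetch('RESQUE_POOL_PIDFILE', 'tmp/resque-pool.pid')
pid = running_pool_pid(pidfile)
puts pid ? "resque-pool running with pid #{pid}" : "no running pool found via #{pidfile}"
```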

Expected processes for Resque -- Resolving Unexpected Behaviors and Failures

The code executed by workers should stay in sync with changes you make to your rails app. If the code isn't staying in sync, you may have more than one resque-pool process running. You can also see unusual behaviors if more than one redis-server is running. The information below describes the processes that will be running when everything is operating correctly.

You should see 1 redis-server process.

$ ps -ef | grep redis-server
user1 7982 7882 0 01:26 pts/3 00:00:00 grep redis-server
root 8398 1 0 00:08 ? 00:00:04 /usr/local/bin/redis-server 0.0.0.0:6379

NOTE: If you see multiple redis-server processes running, kill each and start redis-server again. You should only have one redis-server process running.

You should see 1 resque-pool process.

$ ps -ef | grep resque-pool
user1 8059 7882 0 01:27 pts/3 00:00:00 grep resque-pool
root 8416 1 0 00:08 ? 00:00:08 resque-pool-master[agriknowledge]: managing [8653]

You should see at least one worker process waiting.

$ ps -ef | grep resque | grep Waiting
root 8653 8416 0 00:08 ? 00:00:01 resque-1.25.2: Waiting for *

NOTE: If you see multiple resque-pool processes running, kill each AND all the resque Waiting processes as well. Start resque-pool again. You should only have one resque-pool process running, but you may have multiple worker processes running.
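The checks above can also be scripted. The following sketch counts matching processes in ps -ef output while ignoring the grep process itself. It runs against sample output copied from the examples above; count_processes and SAMPLE are hypothetical names, and in practice you would feed it the live output of ps -ef.

```ruby
# Count lines of `ps -ef` output that mention +pattern+, ignoring the grep
# process that the pipeline itself spawns.
def count_processes(ps_output, pattern)
  ps_output.each_line.count do |line|
    line.include?(pattern) && !line.include?("grep #{pattern}")
  end
end

# Sample output copied from the examples above.
SAMPLE = <<~PS
  user1 7982 7882 0 01:26 pts/3 00:00:00 grep redis-server
  root 8398 1 0 00:08 ? 00:00:04 /usr/local/bin/redis-server 0.0.0.0:6379
  user1 8059 7882 0 01:27 pts/3 00:00:00 grep resque-pool
  root 8416 1 0 00:08 ? 00:00:08 resque-pool-master[agriknowledge]: managing [8653]
  root 8653 8416 0 00:08 ? 00:00:01 resque-1.25.2: Waiting for *
PS

%w[redis-server resque-pool].each do |name|
  found = count_processes(SAMPLE, name)
  status = found == 1 ? 'ok' : 'unexpected'
  puts "#{name}: #{found} process(es) (#{status})"
end
```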

Monitor background workers

Edit config/initializers/resque_admin.rb so that ResqueAdmin#matches? returns a true value for the user/s who should be able to access this page. One fast way to do this is to return current_user.admin? and add an admin? method to your user model which checks for specific emails or the admin role. See Admin Users for information on how to add users with the admin role.

Then you can view jobs at the admin/queues route.
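A minimal sketch of such a constraint follows. It assumes authentication is handled by Devise, which exposes the signed-in user through request.env['warden']; if your app authenticates differently, look up the user another way. User, FakeWarden, FakeRequest, and ADMIN_EMAILS are stand-ins so the example runs outside Rails; in a real app, User is your own model and the request comes from the Rails router.

```ruby
# Hypothetical allowlist; a role-based check would work as well.
ADMIN_EMAILS = ['admin@example.org'].freeze

# Stand-in for your user model, with the admin? method described above.
User = Struct.new(:email) do
  def admin?
    ADMIN_EMAILS.include?(email)
  end
end

class ResqueAdmin
  # Returns true only when a signed-in user is an admin.
  def self.matches?(request)
    warden = request.env['warden']
    current_user = warden && warden.user
    !current_user.nil? && current_user.admin?
  end
end

# Stand-ins for the objects Rails and Devise would provide.
FakeWarden = Struct.new(:user)
FakeRequest = Struct.new(:env)

request = FakeRequest.new({ 'warden' => FakeWarden.new(User.new('admin@example.org')) })
puts ResqueAdmin.matches?(request)
```

Returning false for anonymous requests keeps the queue page closed by default.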

Audiovisual transcoding

Sufia includes support for transcoding audio and video files. To enable this, make sure to have ffmpeg > 1.0 installed.

On OSX, you can use homebrew for this.

brew install ffmpeg --with-fdk-aac --with-libvpx --with-libvorbis

To compile ffmpeg yourself, see https://trac.ffmpeg.org/wiki/CompilationGuide

User interface

Remove turbolinks support from app/assets/javascripts/application.js, if present, by deleting the following line:

//= require turbolinks

Turbolinks causes the dynamic content editor not to load.

Integration with Dropbox, Box, etc.

Sufia provides built-in support for the browse-everything gem, which provides a consolidated file picker experience for selecting files from DropBox, Skydrive, Google Drive, Box, and a server-side directory share.

To activate browse-everything in your sufia app, run the browse-everything config generator

rails g browse_everything:config

This will generate a file at config/browse_everything_providers.yml. Open that file and enter the API keys for the providers that you want to support in your app. For more info on configuring browse-everything, go to the project page on github.

After running the browse-everything config generator and setting the API keys for the desired providers, an extra tab will appear in your app's Upload page allowing users to pick files from those providers and submit them into your app's repository.

If your config/initializers/sufia.rb was generated with sufia 3.7.2 or earlier, then you need to add this line to an initializer (probably config/initializers/sufia.rb):

config.browse_everything = BrowseEverything.config

Analytics and usage statistics

Sufia provides support for capturing usage information via Google Analytics and for displaying usage stats in the UI.

Capturing usage

To enable the Google Analytics javascript snippet, make sure that config.google_analytics_id is set in your app within the config/initializers/sufia.rb file. A Google Analytics ID typically looks like UA-99999999-1.

Displaying usage in the UI

To display data from Google Analytics in the UI, first head to the Google Developers Console and create a new project:

https://console.developers.google.com/project

Let's assume for now Google assigns it a project ID of foo-bar-123. It may take a few seconds for this to complete (watch the Activities bar near the bottom of the browser). Once it's complete, enable the Google+ and Google Analytics APIs here (note: this is an example URL -- you'll have to change the project ID to match yours):

https://console.developers.google.com/project/apps~foo-bar-123/apiui/api

Finally, head to this URL (note: this is an example URL -- you'll have to change the project ID to match yours):

https://console.developers.google.com/project/apps~foo-bar-123/apiui/credential

There, create a new OAuth client ID. When prompted for the type, use the "Service Account" type. This will give you the OAuth client ID, a client email address, a private key file, and a private key secret/password, which you will need in the next step.

Then run this generator:

rails g sufia:models:usagestats

The generator will create a configuration file at config/analytics.yml. Edit that file to reflect the information that the Google Developer Console gave you earlier, namely you'll need to provide it:

  • The path to the private key
  • The password/secret for the private key
  • The OAuth client email
  • An application name (you can make this up)
  • An application version (you can make this up)

Lastly, you will need to set config.analytics = true and config.analytic_start_date in config/initializers/sufia.rb and ensure that the OAuth client email has the proper access within your Google Analytics account. To do so, go to the Admin tab for your Google Analytics account. Click on User Management in the Account column, and add "Read & Analyze" permissions for the OAuth client email address.

Zotero integration

Integration with Zotero-managed publications is possible using Arkivo. Arkivo is a Node-based Zotero subscription service that monitors Zotero for changes and will feed those changes to your Sufia-based app. Read more about this work.

To enable Zotero integration, first register an OAuth client with Zotero, then install and start Arkivo-Sufia and then generate the Arkivo API in your Sufia-based application:

rails g sufia:models:arkivo_api

The generator does the following:

  • Enables the API in the Sufia initializer
  • Adds a database migration
  • Creates a routing constraint that allows you to control what clients can access the API
  • Copies a config file that allows you to specify the host and port Arkivo is running on
  • Copies a config file for your Zotero OAuth client credentials

Update your database schema with rake db:migrate.

Add unique Arkivo tokens for each of your existing user accounts with rake sufia:user:tokens. (New users will have tokens created as part of the account creation process.)

Edit the routing constraint in config/initializers/arkivo_constraint.rb so that your Sufia-based app will allow connections from Arkivo. Make sure this is restrictive as you are allowing access to an API that allows creates, updates and deletes.
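One restrictive policy is to accept requests only from the hosts that run Arkivo. The sketch below illustrates that idea with an IP allowlist. It is an assumption-laden example: the class name Sufia::ArkivoConstraint and the TRUSTED_IPS list are hypothetical (keep whatever class the generator produced), and FakeRequest stands in for the Rails request object, which exposes the client address as remote_ip.

```ruby
module Sufia
  class ArkivoConstraint
    # Hypothetical allowlist: addresses of the hosts running Arkivo.
    TRUSTED_IPS = ['127.0.0.1', '::1'].freeze

    # Allow the API only for requests coming from a trusted address.
    def self.matches?(request)
      TRUSTED_IPS.include?(request.remote_ip)
    end
  end
end

# Stand-in for the Rails request so the sketch runs outside Rails.
FakeRequest = Struct.new(:remote_ip)

puts Sufia::ArkivoConstraint.matches?(FakeRequest.new('127.0.0.1'))
puts Sufia::ArkivoConstraint.matches?(FakeRequest.new('203.0.113.7'))
```

An allowlist fails closed: any address not listed is refused, which suits an API that can create, update, and delete content.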

Tweak config/arkivo.yml to point at the host and port your instance of Arkivo is running on.

Tweak config/zotero.yml to hold your Zotero OAuth client key and secret. Alternatively, if you'd rather not paste these into a file, you may use the environment variables ZOTERO_CLIENT_KEY and ZOTERO_CLIENT_SECRET.
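The precedence between the two sources can be sketched as follows. This is only an illustration of "environment first, file second": zotero_credentials is a hypothetical helper, and the file keys client_key and client_secret are assumed names, so check the generated config/zotero.yml for the actual ones.

```ruby
# Prefer ZOTERO_CLIENT_KEY / ZOTERO_CLIENT_SECRET from the environment and
# fall back to values read from a config file (passed in here as a Hash).
def zotero_credentials(env, file_config)
  {
    key: env['ZOTERO_CLIENT_KEY'] || file_config['client_key'],
    secret: env['ZOTERO_CLIENT_SECRET'] || file_config['client_secret']
  }
end

# Assumed key names; placeholders instead of real credentials.
file_config = { 'client_key' => 'key-from-file', 'client_secret' => 'secret-from-file' }

creds = zotero_credentials(ENV, file_config)
puts "using client key: #{creds[:key]}"
```

Keeping secrets in environment variables avoids committing them to version control along with the rest of the config directory.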

Restart your app and it should now be able to pull in Zotero-managed publications on behalf of your users. Each user will need to link their Sufia app account with their Zotero accounts, which can be done in the "Edit Profile" page. After the accounts are linked, Arkivo will create a subscription to that user's Zotero-hosted "My Publications" collection. When users add items to their "My Publications" collection via the Zotero client, they will automatically be pushed into the Sufia-based repository application. Updates to these items will trigger updates to item metadata in your app, and deletes will delete the files from your app.

Tag Cloud

Sufia provides a tag cloud on the home page. To change which field is displayed in that cloud, change the value of config.tag_cloud_field_name in the blacklight_config section of your CatalogController. For example:

configure_blacklight do |config|
  ...

  # Specify which field to use in the tag cloud on the homepage.
  # To disable the tag cloud, comment out this line.
  config.tag_cloud_field_name = Solrizer.solr_name("tag", :facetable)
end

If your CatalogController was generated by a version of Sufia older than 3.7.3, you need to add that line to the Blacklight configuration in order to make the tag cloud appear.

The contents of the cloud are retrieved as JSON from Blacklight's CatalogController#facet method. If you need to change how that content is returned (e.g., to limit the number of results), override the render_facet_list_as_json method in your CatalogController.

Customizing metadata

Chances are you will want to customize the default metadata provided by Sufia. Here's a guide to help you with that.

Admin Users

One time setup for first admin

Follow the directions for installing hydra-role-management.

Add the following gem to your Sufia-based app's Gemfile:

gem "hydra-role-management"

Then install the gem, run the generator, and run the database migrations (each of these commands will produce some output):

bundle install
rails generate roles
rake db:migrate

Adding an admin user

In rails console, run the following commands to create the admin role.

r = Role.create name: "admin"

Add a user as the admin.

r.users << User.find_by_user_key( "your_admin_users_email@fake.email.org" )
r.save

Confirm user was made an admin.

u = User.find_by_user_key( "your_admin_users_email@fake.email.org" )
u.admin?
# shows SELECT statement
# => true

If u.admin? returns true, the user was successfully made an admin.

Confirm in browser

  • go to your Sufia install
  • login as the admin user
  • add /roles to the end of the main URL

SUCCESS will look like...

  • you don't get an error on the /roles page
  • you see a button labeled "Create a new role"

License

Sufia is available under the Apache 2.0 license.

Contributing

We'd love to accept your contributions. Please see our guide to contributing to Sufia.

Development

This information is for people who want to modify the engine itself, not an application that uses the engine:

Regenerating the README TOC

Install the gh-md-toc tool, then ensure your README changes are up on GitHub, and then run:

gh-md-toc https://github.com/USERNAME/sufia/blob/BRANCH/README.md

That will print to stdout the new TOC, which you can copy into README.md, commit, and push.

Run the test suite

rake jetty:start
redis-server
rake engine_cart:clean
rake engine_cart:generate
rake spec

Change validation behavior

To change what happens to files that fail validation, add an after_validation hook:

after_validation :dump_infected_files

def dump_infected_files
  if Array(errors.get(:content)).any? { |msg| msg =~ /A virus was found/ }
    content.content = errors.get(:content)
    save
  end
end

Acknowledgments

This software has been developed by and is brought to you by the Hydra community. Learn more at the Project Hydra website (http://projecthydra.org).