hash-joiner Gem


Performs pruning or one-level promotion of Hash attributes (typically labeled private:), and deep merges and joins of Hash objects. Works on Array objects containing Hash objects as well.

Downloads and API docs are available on the hash-joiner RubyGems page. API documentation is written using YARD markup.

Contributed by the 18F team, part of the United States General Services Administration: https://18f.gsa.gov/

Motivation

This gem was extracted from the 18F Hub Joiner plugin. That plugin manipulates Jekyll-imported data by removing or promoting private data, building indices, and performing joins between different data files so that the results appear as unified collections in Jekyll's site.data object. It serves as the first stage in a pipeline that also builds cross-references and canonicalizes data before generating static HTML pages and other artifacts.

Installation

$ gem install hash-joiner

Usage

The typical use case is to have a YAML file containing both public and private data, with all private data nested within private: properties:

> require 'hash-joiner'
> my_data_collection = {
    'name' => 'mbland', 'full_name' => 'Mike Bland',
    'private' => {
      'email' => '[email protected]', 'location' => 'DCA',
    },
  }

The following examples, except for "Join an Array of Hash values", all begin with my_data_collection in the above state (the second deep merge continues from the result of the first). Further examples can be found in the test/ directory.

Strip private data

# Everything within the `private:` property will be deleted.
> HashJoiner.remove_data my_data_collection, "private"
=> {"name"=>"mbland", "full_name"=>"Mike Bland"}
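
Since the gem also works on Array objects containing Hash objects, it may help to see how such pruning can descend through nested data. The following is a minimal plain-Ruby sketch of that idea, not the gem's implementation; `strip_key` and the sample data are hypothetical names introduced for illustration only.

```ruby
# Hypothetical helper, NOT part of the hash-joiner API: removes every
# occurrence of `key` from nested Hash and Array values, in place.
def strip_key(collection, key)
  case collection
  when Hash
    collection.delete(key)
    collection.each_value { |value| strip_key(value, key) }
  when Array
    collection.each { |item| strip_key(item, key) }
  end
  collection
end

# Illustrative data: `private` appears at two different depths.
team = [
  {'name' => 'mbland', 'private' => {'location' => 'DCA'}},
  {'name' => 'foobar',
   'projects' => [{'name' => 'hub', 'private' => {'status' => 'draft'}}]},
]

stripped = strip_key(team, 'private')
```

The sketch mutates its argument, as the irb session above suggests the gem does; whether the real method handles every edge case the same way is an assumption.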

Promote private data

This will render private: data at the same level as other, nonprivate data:

# Everything within the `private:` property will be
# promoted up one level.
> HashJoiner.promote_data my_data_collection, "private"
=> {"name"=>"mbland", "full_name"=>"Mike Bland",
    "email"=>"[email protected]", "location"=>"DCA"}
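
One-level promotion can be pictured as deleting the property and merging its contents into the enclosing Hash. The sketch below is a plain-Ruby illustration under that assumption; `promote_key` is a hypothetical name, and letting promoted values overwrite same-named keys in the parent is a choice made for this sketch, not documented behavior of the gem.

```ruby
# Hypothetical helper, NOT part of the hash-joiner API: moves the contents
# of every `key` property up one level, in place.
def promote_key(collection, key)
  case collection
  when Hash
    promoted = collection.delete(key)
    # Assumption for this sketch: promoted values win on key conflicts.
    collection.merge!(promoted) if promoted.is_a?(Hash)
    collection.each_value { |value| promote_key(value, key) }
  when Array
    collection.each { |item| promote_key(item, key) }
  end
  collection
end

# Illustrative data only.
profile = {
  'name' => 'mbland',
  'private' => {'location' => 'DCA'},
  'projects' => [{'name' => 'hub', 'private' => {'status' => 'draft'}}],
}

promoted_profile = promote_key(profile, 'private')
```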

Perform a deep merge with other Hash values

> extra_info = {
    'languages' => ['C++', 'Python'], 'full_name' => 'Michael S. Bland',
    'private' => {
      'location' => 'Alexandria, VA', 'previous_companies' => ['Google'],
    },
  }

# The original Hash will have `full_name` and `private => location`
# replaced, and `languages` and `private => previous_companies` added.
> HashJoiner.deep_merge my_data_collection, extra_info
=> {"name"=>"mbland", "full_name"=>"Michael S. Bland",
    "private"=>{
      "email"=>"[email protected]", "location"=>"Alexandria, VA",
      "previous_companies"=>["Google"]},
    "languages"=>["C++", "Python"]}

> extra_info = {
    'languages' => ['Ruby'],
    'private' => {
      'previous_companies' => ['Northrop Grumman'],
    },
  }

# The Hash will now have added information for
# `languages` and `private => previous_companies`.
> HashJoiner.deep_merge my_data_collection, extra_info
=> {"name"=>"mbland", "full_name"=>"Michael S. Bland",
    "private"=>{
      "email"=>"[email protected]", "location"=>"Alexandria, VA",
      "previous_companies"=>["Google", "Northrop Grumman"]},
    "languages"=>["C++", "Python", "Ruby"]}
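
The two results above suggest three merge rules: nested Hash values are merged recursively, Array values are concatenated, and any other value is replaced. The sketch below reproduces those rules in plain Ruby. `deep_merge_sketch` is a hypothetical name, and the rules are inferred from the example output rather than taken from the gem's source.

```ruby
# Hypothetical helper, NOT the gem's implementation: merges `rhs` into `lhs`
# in place, using rules inferred from the example output above.
def deep_merge_sketch(lhs, rhs)
  rhs.each do |key, rhs_value|
    lhs_value = lhs[key]
    if lhs_value.is_a?(Hash) && rhs_value.is_a?(Hash)
      deep_merge_sketch(lhs_value, rhs_value)  # recurse into nested Hashes
    elsif lhs_value.is_a?(Array) && rhs_value.is_a?(Array)
      lhs_value.concat(rhs_value)              # append Array elements
    else
      lhs[key] = rhs_value                     # add or replace; value is shared, not copied
    end
  end
  lhs
end

# Illustrative data modeled on the example above.
profile = {
  'name' => 'mbland', 'full_name' => 'Mike Bland',
  'private' => {'location' => 'DCA'},
}

deep_merge_sketch(profile, {
  'languages' => ['C++', 'Python'], 'full_name' => 'Michael S. Bland',
  'private' => {
    'location' => 'Alexandria, VA', 'previous_companies' => ['Google'],
  },
})

deep_merge_sketch(profile, {
  'languages' => ['Ruby'],
  'private' => {'previous_companies' => ['Northrop Grumman']},
})
```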

Join an Array of Hash values

This corresponds to the process of joining different collections of Jekyll-imported data within the 18F Hub, such as joining site.data['private']['team'] into site.data['team'].

# This defines a fake object emulating a Jekyll::Site.
> class DummySite
    attr_accessor :data
    def initialize
      @data = {'private' => {}}
    end
  end

> site = DummySite.new

# This data would correspond to _data/team.yml
# in a Jekyll project.
> site.data['team'] = [
    {'name' => 'mbland', 'languages' => ['C++']},
    {'name' => 'foobar', 'full_name' => 'Foo Bar'},
  ]

# This data would correspond to _data/private/team.yml
# in a Jekyll project.
> site.data['private']['team'] = [
    {'name' => 'mbland', 'languages' => ['Python', 'Ruby']},
    {'name' => 'foobar', 'email' => '[email protected]'},
    {'name' => 'bazquux', 'email' => '[email protected]'},
  ]

# After joining, each element of `site.data['team']` contains
# the union of the original element and the corresponding
# element in `site.data['private']['team']`.
#
# `site.data['private']` can now be safely discarded.
> HashJoiner.join_data 'team', 'name', site.data, site.data['private']
=> {"private"=>{
      "team"=>[
        {"name"=>"mbland", "languages"=>["Python", "Ruby"]},
        {"name"=>"foobar", "email"=>"[email protected]"},
        {"name"=>"bazquux", "email"=>"[email protected]"}]},
    "team"=>[
      {"name"=>"mbland", "languages"=>["C++", "Python", "Ruby"]},
      {"name"=>"foobar", "full_name"=>"Foo Bar", "email"=>"[email protected]"},
      {"name"=>"bazquux", "email"=>"[email protected]"}]}
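
Conceptually, the join matches elements by their key field, merges matching pairs, and appends elements that have no counterpart. The plain-Ruby sketch below illustrates that process. `join_by_key` is a hypothetical name, and the merge here is only one level deep (Arrays concatenated, other values replaced), which is a simplification assumed for brevity.

```ruby
# Hypothetical helper, NOT the gem's implementation: joins the Hashes in
# `rhs` into `lhs`, matching elements on `key_field`.
def join_by_key(lhs, rhs, key_field)
  # Index the left-hand elements by their key for direct lookup.
  index = lhs.each_with_object({}) { |item, memo| memo[item[key_field]] = item }

  rhs.each do |item|
    existing = index[item[key_field]]
    if existing
      # One-level merge: concatenate Arrays, otherwise take the new value.
      existing.merge!(item) do |_key, old, new|
        old.is_a?(Array) && new.is_a?(Array) ? old + new : new
      end
    else
      lhs << item
      index[item[key_field]] = item
    end
  end
  lhs
end

# Illustrative data modeled on the example above.
public_team = [
  {'name' => 'mbland', 'languages' => ['C++']},
  {'name' => 'foobar', 'full_name' => 'Foo Bar'},
]
private_team = [
  {'name' => 'mbland', 'languages' => ['Python', 'Ruby']},
  {'name' => 'foobar', 'location' => 'DCA'},
  {'name' => 'bazquux', 'location' => 'SFO'},
]

joined = join_by_key(public_team, private_team, 'name')
```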

Running filter-yaml-files

The filter-yaml-files program can be used to generate "public" versions of YAML files containing "private" data. For example:

$ export HUB_DATA_DIR=../hub/_data

$ filter-yaml-files ${HUB_DATA_DIR}/private/{team,projects}.yml -o ${HUB_DATA_DIR}/public
../hub/_data/private/team.yml => ../hub/_data/public/team.yml
../hub/_data/private/projects.yml => ../hub/_data/public/projects.yml

The filter-yaml-files program can also strip other properties besides private:, and can promote data contained within a property rather than strip it. Run filter-yaml-files -h to see the options that allow this.
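
To make the idea of a "public" version concrete, the sketch below loads a YAML document, drops every private: property, and serializes the result using only Ruby's standard yaml library. It is not how filter-yaml-files is implemented; `without_key` and the inline YAML are hypothetical and stand in for real files.

```ruby
require 'yaml'

# Hypothetical helper, NOT part of hash-joiner: returns a copy of `node`
# with every `key` property removed, leaving the input untouched.
def without_key(node, key)
  case node
  when Hash
    node.reject { |k, _| k == key }
        .transform_values { |value| without_key(value, key) }
  when Array
    node.map { |item| without_key(item, key) }
  else
    node
  end
end

# Illustrative stand-in for a private YAML data file.
private_yaml = <<~YAML
  - name: mbland
    full_name: Mike Bland
    private:
      location: DCA
  - name: foobar
    private:
      location: SFO
YAML

public_yaml = without_key(YAML.safe_load(private_yaml), 'private').to_yaml
```

Returning a copy rather than mutating in place keeps the private data available for later steps; that is a design choice of this sketch only.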

Contributing

Just fork 18F/hash-joiner and start sending pull requests! Feel free to ping @mbland with any questions you may have, especially if the current documentation should have addressed your needs but didn't.

Public domain

This project is in the worldwide public domain. As stated in CONTRIBUTING:

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.