Blockify

The blockify gem solves some of the problems associated with complex, hierarchical, nested Arrays and Hashes. It is possible to represent an HTML file as a series of Arrays and Hashes, with Hashes nested very deeply within Arrays within Hashes. Traversing such a tree is tricky at best. The methods of the blockify gem are included in Array and Hash using a naming convention that should not collide with existing methods. We don't need to concern ourselves with the complexity of the structure and can instead focus on the base elements wherever they may be.

Installation

Add this line to your application's Gemfile:

gem 'blockify'

And then execute:

$ bundle

Or install it yourself as:

$ gem install blockify

Usage

Blockify includes a module that inserts itself into the Array and Hash classes. The same module can also service custom container classes, so long as they look Array-like or Hash-like. As a practical illustration, examine the following code:

require 'blockify'

nested_thingee = [0,1,2,[3,4,5,[6,7,8,[9,10,11,12,[13,{:a=>[16,17],:b=>[18,19,20]}]]]],14,15]
ary = nested_thingee.stringify_elements
# ary == ["0","1","2",["3","4","5",["6","7","8",["9","10","11","12",["13",{:a=>["16","17"], :b=>["18","19","20"]}]]]],"14","15"]

We don't need to worry about what the structure is; we can traverse the entire system with a single call. Note that the returned structure has the same shape as the original. The blockify gem uses recursion to step through the entire structure. If we peek inside the #stringify_elements method, we see this:

  def stringify_elements
    blockify_elements {|elm| elm.to_s}
  end

The #blockify_elements method travels through the structure and calls our block whenever it encounters an element that is not an Array or a Hash. Whatever the block returns replaces the element in question in the returned structure. In this case, each element is converted to a string. All the recursion magic is performed by the gem. There are many other tools as well, which are detailed in the following sections.
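To make the recursion concrete, here is a minimal sketch of the same idea in plain Ruby. The helper name map_leaves is hypothetical and is not part of the gem; the gem's actual implementation may differ.

```ruby
# Hypothetical helper, not part of the blockify gem: rebuilds a nested
# Hash/Array structure, passing every non-container element to the block.
def map_leaves(node, &block)
  case node
  when Array then node.map { |item| map_leaves(item, &block) }
  when Hash  then node.transform_values { |value| map_leaves(value, &block) }
  else block.call(node)
  end
end

nested  = [0, [1, { a: [2, 3], b: nil }]]
strings = map_leaves(nested) { |elm| elm.to_s }
# strings == ["0", ["1", { a: ["2", "3"], b: "" }]]
# nested is left untouched
```

The block only ever sees leaf elements, which is why the caller never has to know how deeply they are nested.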

instance.blockify_elements { |element| expression_to_replace_element }

The #blockify_elements method is included in both Hash and Array. This method returns a new structure with the same nesting of Hashes and Arrays and allows each element to be modified on an element-by-element basis. The following example searches the structure and replaces nil elements with empty strings:

require 'blockify'

my_structure  = [0,nil,nil,"string",{a:nil, b:"a string", c:[nil,nil,"good",nil,"bad"]}]
new_structure = my_structure.blockify_elements { |elm| elm.nil? ? "" : elm }
# new_structure == [0, "", "", "string", {:a=>"", :b=>"a string", :c=>["", "", "good", "", "bad"]}]

instance.blockify_elements! { |element| expression_to_replace_element }

This method works identically to #blockify_elements except that it modifies the receiver in place. It also returns the modified structure.

instance.scan_elements { |key,obj| statements }

This method scans the entire Hash-Array structure and provides a means of examining elements in each substructure. Unlike the #blockify_elements method, #scan_elements does not attempt to create a new structure; instead, you have the opportunity to read or update each element in place.

In the passed block, the parameter key is either the Hash key or the Array index, depending on the type of obj. The expression obj[key] gives us the element. We can also update the element by assignment, such as obj[key] = expr, and find out what type the current substructure is with obj.class. In our block, we neither know nor care about the overall structure; we just want to work with each element in the current substructure. We can modify the element in place, extract its content into some external object, or do both. Although we have access to the substructure, we should not add or delete elements on it, as unexpected things might happen. We could implement a self-modifying version of stringify_elements using scan_elements this way:

require 'blockify'

my_structure  = [0,1,[{a:{a:88}},[6,{a:[1,2,3]},[4,5],4],3],2,1]
my_structure.scan_elements { |key,obj| obj[key] = obj[key].to_s  }
# my_structure == ["0", "1", [{:a=>{:a=>"88"}}, ["6", {:a=>["1", "2", "3"]}, ["4", "5"], "4"], "3"], "2", "1"]

If we wanted to create a non-destructive version of stringify using the #scan_elements method, we would first have to duplicate the structure and return the modified duplicate.
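The copy-then-scan approach can be sketched in plain Ruby as follows. The helpers deep_copy, scan_leaves, and stringify_copy are hypothetical stand-ins written for this illustration; they are not the gem's methods and only assume the (key, obj) block convention described above.

```ruby
# Hypothetical stand-in for a deep copy of the containers (leaves are shared).
def deep_copy(node)
  case node
  when Array then node.map { |item| deep_copy(item) }
  when Hash  then node.transform_values { |value| deep_copy(value) }
  else node
  end
end

# Hypothetical stand-in for a scan: yields (key, container) for every
# element that is not itself an Array or a Hash.
def scan_leaves(node, &block)
  keys = node.is_a?(Hash) ? node.keys : (0...node.size).to_a
  keys.each do |key|
    child = node[key]
    if child.is_a?(Array) || child.is_a?(Hash)
      scan_leaves(child, &block)
    else
      block.call(key, node)
    end
  end
end

# Non-destructive stringify: duplicate first, then update the copy in place.
def stringify_copy(structure)
  copy = deep_copy(structure)
  scan_leaves(copy) { |key, obj| obj[key] = obj[key].to_s }
  copy
end

original = [0, 1, [{ a: { a: 88 } }, [6, 4]], 2]
strings  = stringify_copy(original)
# strings  == ["0", "1", [{ a: { a: "88" } }, ["6", "4"]], "2"]
# original == [0, 1, [{ a: { a: 88 } }, [6, 4]], 2]
```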

instance.find_element_path { |elm| boolean_statement }

This method returns the location path of the first element for which the block returns true. A path is defined as a sequence of indexes that access the underlying Hash-Array structure. This is best seen by example:

require 'blockify'

nested_thingee = [0,1,2,[3,4,5,[6,7,8,[9,10,11,12,[13,{:a=>[16,17],:b=>[18,19,20]}]]]],14,15]
path = nested_thingee.find_element_path { |t| t == 17 } # path == [3, 3, 3, 4, 1, :a, 1]
elm = nested_thingee[3][3][3][4][1][:a][1]  # elm == 17

# or easier ...
elm = nested_thingee.path_get path          # elm == 17

# and we can change it too!
old = nested_thingee.path_put "Fred", path  # old == 17
elm = nested_thingee.path_get path          # elm == "Fred"
# nested_thingee == [0,1,2,[3,4,5,[6,7,8,[9,10,11,12,[13,{:a=>[16,"Fred"],:b=>[18,19,20]}]]]],14,15]

It is up to your imagination what you put in the block. When the block returns true, the recursion is done, and you get the path. The example also previews two path-related access methods, #path_get and #path_put, which are explained later.
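For readers curious how such a search can be written, the following is a minimal depth-first sketch. The name first_path is hypothetical, and returning nil when nothing matches is an assumption made for this sketch rather than documented gem behavior.

```ruby
# Hypothetical helper: depth-first search returning the index path of the
# first element for which the block is truthy, or nil when nothing matches
# (the nil result is an assumption of this sketch).
def first_path(node, path = [], &block)
  pairs = node.is_a?(Hash) ? node.to_a : node.each_with_index.map { |item, i| [i, item] }
  pairs.each do |key, child|
    here = path + [key]
    if child.is_a?(Array) || child.is_a?(Hash)
      found = first_path(child, here, &block)
      return found if found
    elsif block.call(child)
      return here
    end
  end
  nil
end

nested  = [0, [3, [13, { a: [16, 17], b: [18] }]], 14]
path    = first_path(nested) { |elm| elm == 17 }
# path == [1, 1, 1, :a, 1]
missing = first_path(nested) { |elm| elm == 99 }
# missing == nil
```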

instance.find_element_paths { |elm| boolean_statement }

This method works like #find_element_path except that it returns an array of paths. Instead of stopping at the first match, the search considers the entire structure. See the example below:

require 'blockify'

nested_thingee = [0,1,2,[13,4,15,[6,7,8,[9,10,11,12,[3,{:a=>[16,17],:b=>[18,19,20]}]]]],14,5]
paths = nested_thingee.find_element_paths { |t| (5..15).include? t }
paths.first  # [3,0]
paths.last   # [5]
paths[5]     # [3, 3, 3, 0]
nested_thingee.path_get paths[5]  #  9

# access elements from all paths that match the criteria:
nested_thingee.paths_get paths    #  [13, 15, 6, 7, 8, 9, 10, 11, 12, 14, 5]

The example above also previews the #paths_get method, which is described later.

instance.path_get(index_list)

This method is a shorthand way of accessing the Hash-Array structure with a list of indexes rather than a chain of access operators. This is typically used to access an element from the #find_element_path method. It can also be used to return part of the structure itself. If we pass an empty array, we get the entire structure. The index_list is simply an array of indexes where an index is either an Integer or a Hash key.
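The lookup described here amounts to applying the access operator once per index. A minimal sketch in plain Ruby, using the hypothetical helper name fetch_path (not a gem method):

```ruby
# Hypothetical helper: follows a list of indexes one step at a time.
# With an empty list, reduce simply returns the starting structure.
def fetch_path(structure, index_list)
  index_list.reduce(structure) { |node, index| node[index] }
end

nested = [0, [3, { a: [16, 17] }], 14]
leaf   = fetch_path(nested, [1, 1, :a, 1])  # leaf == 17
part   = fetch_path(nested, [1, 1])         # part == { a: [16, 17] }
whole  = fetch_path(nested, [])             # whole is nested itself
```

For a non-empty path, Ruby's built-in #dig gives the same result: nested.dig(1, 1, :a, 1) is also 17.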

instance.path_put(data, index_list)

This method replaces the element found at index_list with new data. It can also be used to replace a Hash-Array substructure with either data or another substructure. Unlike #path_get, passing an empty array as the index_list is not legal.
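A sketch of the same idea in plain Ruby: walk to the parent container, then assign. The helper name store_path is hypothetical, and raising ArgumentError for an empty index_list is an assumption of this sketch; the text above only says that an empty list is not legal.

```ruby
# Hypothetical helper: replaces the element at index_list and returns the
# value that was there before.
def store_path(structure, data, index_list)
  # Assumption: an empty path is rejected with ArgumentError.
  raise ArgumentError, "index_list must not be empty" if index_list.empty?

  *parents, last = index_list
  container = parents.reduce(structure) { |node, index| node[index] }
  old = container[last]
  container[last] = data
  old
end

nested = [0, [3, { a: [16, 17] }], 14]
old    = store_path(nested, "Fred", [1, 1, :a, 1])
# old    == 17
# nested == [0, [3, { a: [16, "Fred"] }], 14]
```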

instance.paths_get(index_lists)

This method returns an element or a substructure for each index_list. The parameter index_lists is defined as an array of path indexes. Structurally, this is an array of arrays of indexes. It is typically used in conjunction with the #find_element_paths method. See the example below:

nested_thingee = [0,1,2,[13,4,15,[6,7,8,[9,10,11,12,[3,{:a=>[16,17],:b=>[18,19,20]}]]]],14,5]
index_lists = []
index_lists.push [3,0]
index_lists.push [5]
index_lists.push [3, 3, 3, 0]
index_lists.push [3, 3, 3, 4, 1, :b, 2]
data = nested_thingee.paths_get index_lists  # data == [13, 5, 9, 20]

instance.circular?(path=[])

A circular Hash-Array structure contains paths that point back to itself in a never-ending loop. Using such a structure with some of the methods above will cause a stack overflow. This method detects such anomalies. You can also obtain the path of the first violation by passing an empty array, which the method fills in. If there are multiple violations, you must remove them one by one before calling any of the recursive methods above. The example below demonstrates this:


abc = [:a, :b, :c]
xyz = [1,2,3]
example = [0, {first: abc, second: xyz }]

example.circular?  # false

abc.push example   # create a circular structure

example.circular?  # true
abc.circular?      # true

path = []
abc.circular? path      #  true ... path == [3, 1]
path = []
example.circular? path  #  true ... path == [1, :first, 3]

xyz.push abc   # now a second violation 
xyz.circular?  # true
path = []
xyz.circular? path  # true ... path == [3, 3, 1]

example[0] = [[xyz,abc]]  # third violation
path = []
example.circular? path  # true ... path == [3, 3, 0, 0, 0]
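One common way to detect such loops is to remember the containers on the current path and compare by identity. The following is a sketch under that assumption; the name loops? is hypothetical, it does not report a path, and the gem may detect loops differently.

```ruby
# Hypothetical helper: true when following Array/Hash children ever leads
# back to a container that is already on the current path.
def loops?(node, ancestors = [])
  return false unless node.is_a?(Array) || node.is_a?(Hash)
  return true if ancestors.any? { |seen| seen.equal?(node) }

  children = node.is_a?(Hash) ? node.values : node
  children.any? { |child| loops?(child, ancestors + [node]) }
end

abc     = [:a, :b, :c]
example = [0, { first: abc }]
before  = loops?(example)  # before == false

abc.push example           # create a circular structure
after   = loops?(example)  # after == true

shared  = [abc.first(3), abc.first(3)]
# A structure that merely repeats equal content is not circular:
# loops?(shared) == false
```

Comparing with equal? (object identity) rather than == matters here, because two distinct containers with equal content do not form a loop.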

instance.unloopify

This method repeatedly calls #circular? and replaces each offending substructure with a container object that resembles it. This container is called CircularObjectContainer. If the contained element happens to be a Hash object, then the container will appear Hash-like and will include methods such as #each_pair and #each. The Array-like instance will implement the method #each. All instances of CircularObjectContainer are frozen, read-only entities. All container elements of a CircularObjectContainer instance will be empty; this effectively breaks the loop. Also, the #[] access operator is available, but #[]= is forbidden. This object is detailed later in this document.

instance.stringify_elements

This method creates a new structure with every element converted to strings. It is the same as: instance.blockify_elements {|elm| elm.to_s}.

instance.stringify_elements!

This bang version of #stringify_elements self-modifies the current structure; additionally, it returns the modified structure.

instance.inspectify_elements

This method creates a new structure with every element replaced with #inspect called on that element. It is the same as: instance.blockify_elements {|elm| elm.inspect}.

instance.inspectify_elements!

This is a self-modifying version of #inspectify_elements which also returns the modified self.

instance.deep_duplify

This creates an identical deep copy of the original Hash-Array structure where each substructure is reconstructed from a new Hash or Array. Each element in the structure additionally calls #dup if permitted. This will convert internal CircularObjectContainer objects back into Hash or Array objects. It is a good idea to call this method after you call #unloopify because CircularObjectContainer objects cannot be modified. This method internally calls #blockify_elements.

instance.flattenify

This method extracts every element from the Hash-Array structure and places each element in a one-level deep array. See the example below:

structure = ["zero",{m:[{t:["one", "two"],v:["three","four"]}],z:"five"}]
flat = structure.flattenify  # flat == ["zero", "one", "two", "three", "four", "five"]
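Unlike Ruby's built-in Array#flatten, this kind of flattening also has to descend into Hash values. A minimal sketch of the idea, using the hypothetical helper name leaves (not a gem method):

```ruby
# Hypothetical helper: collects every non-container element, in traversal
# order, into a single one-level array.
def leaves(node)
  case node
  when Array then node.flat_map { |item| leaves(item) }
  when Hash  then node.values.flat_map { |value| leaves(value) }
  else [node]
  end
end

structure = ["zero", { m: [{ t: ["one", "two"], v: ["three", "four"] }], z: "five" }]
flat = leaves(structure)
# flat == ["zero", "one", "two", "three", "four", "five"]
```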

instance.extractify(cls)

This method extracts the elements whose immediate parent container is of the given type; the parameter cls is either Array or Hash. See the example below:

structure = [1,3,5,{a:2,b:4,c:6,d:[55,777],e:8},{t:[999],a:"even"}]
even = structure.extractify(Hash)  # even == {:a=>"even", :b=>4, :c=>6, :e=>8}
odd  = structure.extractify(Array) # odd == [999, 777, 5]

Note that if a Hash key or an Array index is repeated, a later element replaces any earlier one found at that key or index. The returned structure is always flat.

instance.extractifirstify(cls)

Unlike #extractify, which keeps the last element found at each key or index of the returned flattened structure, #extractifirstify keeps the first element found. See the example below:

structure = [1,3,5,{a:2,b:4,c:6,d:[55,777],e:8},{t:[999],a:"even"}]
even = structure.extractifirstify(Hash)  # even == {:a=>2, :b=>4, :c=>6, :e=>8}
odd  = structure.extractifirstify(Array) # odd == [1, 3, 5]

instance.extractistacktify(cls)

This method returns a two-level structure and places all compatible data into the second-level structure. The secondary structure is always an array. The primary structure is either Array or Hash, depending on the parameter cls. This is best seen by example:

structure = [1,3,5,{a:2,b:4,c:6,d:[55,777],e:8},{t:[999],a:"even"}]
even = structure.extractistacktify(Hash)  # even == {:a=>[2, "even"], :b=>[4], :c=>[6], :e=>[8]}
odd  = structure.extractistacktify(Array) # odd == [[1, 55, 999], [3, 777], [5]]

From the example above, we have effectively captured and reorganized all the elements into two bins.
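The binning can be sketched in plain Ruby by grouping each element under the key or index it occupies in its immediate parent. The helper name stack_leaves is hypothetical; the sketch reproduces the results shown in the examples above but is not the gem's implementation.

```ruby
# Hypothetical helper: groups every leaf by the key (Hash) or index (Array)
# it sits at, considering only leaves whose parent is of class cls.
def stack_leaves(node, cls, bins = cls.new)
  pairs = node.is_a?(Hash) ? node.to_a : node.each_with_index.map { |item, i| [i, item] }
  pairs.each do |key, child|
    if child.is_a?(Array) || child.is_a?(Hash)
      stack_leaves(child, cls, bins)
    elsif node.is_a?(cls)
      (bins[key] ||= []) << child
    end
  end
  bins
end

structure = [1, 3, 5, { a: 2, b: 4, c: 6, d: [55, 777], e: 8 }, { t: [999], a: "even" }]
even = stack_leaves(structure, Hash)
# even == { a: [2, "even"], b: [4], c: [6], e: [8] }
odd  = stack_leaves(structure, Array)
# odd  == [[1, 55, 999], [3, 777], [5]]
```

Taking the last entry of each bin (odd.map(&:last), even.transform_values(&:last)) gives the results shown for #extractify, and taking the first entry gives those shown for #extractifirstify.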

instance.includify?(search_string)

This method searches the elements of the Hash-Array structure for a matching substring. Each element is converted to a string before it is checked. See the example below:

haystack = [555,"123",{a:"apple", b:"banana", c:[567,"needle",:fred]}]
haystack.includify? "needle"  # true
haystack.includify? "need"    # true
haystack.includify? "6"       # true
haystack.includify? "red"     # true
haystack.includify? "24"      # false
haystack.includify? ":"       # false  ... :fred.to_s == "fred"
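The search can be sketched as a recursive any? over the structure. The helper name any_leaf_includes? is hypothetical and not a gem method; like the examples above, it checks Hash values only, not Hash keys.

```ruby
# Hypothetical helper: true when any non-container element, converted with
# #to_s, contains search_string.
def any_leaf_includes?(node, search_string)
  case node
  when Array then node.any? { |item| any_leaf_includes?(item, search_string) }
  when Hash  then node.values.any? { |value| any_leaf_includes?(value, search_string) }
  else node.to_s.include?(search_string)
  end
end

haystack = [555, "123", { a: "apple", b: "banana", c: [567, "needle", :fred] }]
any_leaf_includes?(haystack, "need")  # true
any_leaf_includes?(haystack, "6")     # true, from 567.to_s
any_leaf_includes?(haystack, "24")    # false
```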

CircularObjectContainer

CircularObjectContainer instances are created automatically within the Hash-Array structure when #unloopify encounters a circular construct. This object is further detailed in the following subsections:

construction

CircularObjectContainer can contain any object, but its purpose is to contain Hash-like or Array-like entities. It takes this form: CircularObjectContainer.new(item_to_be_contained). The constructor then exposes methods such as #each, #each_pair, and #[] depending on the contained type. The original object is not stored; instead, the container stores a reconstructed version of it. Only the original object_id is preserved. Non-container objects will use the original if #dup is not permitted. Both the internal reconstructed object and the newly constructed container are frozen.

[] array access postfix operator

This method is added as a singleton method if and only if the contained entity is Hash-like or Array-like. Note that :[]= is never implemented. This method calls the internal access operator of the contained object.

each

This method is added as a singleton method if and only if the contained entity implements #each. This method calls the contained object's #each method.

each_pair

This method is added as a singleton method if and only if the contained entity implements #each_pair. This method calls the contained object's #each_pair method.

id

This method returns the original object_id captured during construction.

type

This method returns the class of the contained reconstructed entity.

dup

Instead of duplicating a CircularObjectContainer instance, the internal element is duplicated and returned. This will most likely be a Hash or an Array.

taboo?

This always returns true.

circular?

This returns false and allows CircularObjectContainer objects to be used within a structure and still be able to call #circular?.

blockify_elements

This calls #blockify_elements from the contained entity.

find_element_path

This calls #find_element_path from the contained entity.

find_element_paths

This calls #find_element_paths from the contained entity.

scan_elements

This calls #scan_elements from the contained entity.

Incidental Methods Added to Standard Objects

The main class Object has the method #taboo? added, which returns false. The Hash class has two methods added: incrementify(key_name) and decrementify(key_name). These add 1 to, or subtract 1 from, the element stored at key_name.

Adding blockify to custom objects

If your custom object is Hash-like or Array-like, it can have the methods of blockify included. The following requirements must be met:

  1. Must be able to be constructed as a blank slate (#new without parameters).
  2. Must implement :[] and :[]= access operators.
  3. Must implement #each.
  4. If #each_pair is implemented, #each must behave the way Hash behaves.
  5. If Array-like (#each_pair not implemented), must implement #push and #pop.
  6. Call this: MyCustomClass.include Blockify
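As an illustration of requirements 1 to 5, here is a minimal Array-like container in plain Ruby. The class name Shelf is hypothetical, and the sketch has not been checked against the gem itself; the final include step is left as a comment because it needs the gem to be loaded.

```ruby
# Hypothetical Array-like container meeting requirements 1 to 5 above.
class Shelf
  def initialize            # 1. blank-slate construction
    @items = []
  end

  def [](index)             # 2. read access
    @items[index]
  end

  def []=(index, value)     # 2. write access
    @items[index] = value
  end

  def each(&block)          # 3. iteration (no #each_pair, so Array-like)
    @items.each(&block)
    self
  end

  def push(item)            # 5. Array-like growth
    @items.push(item)
    self
  end

  def pop                   # 5. Array-like shrink
    @items.pop
  end
end

# 6. With the gem loaded, the final step from the list would be:
# Shelf.include Blockify

shelf = Shelf.new
shelf.push(1).push(2)
shelf[0] = 5
# shelf[0] == 5
```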

Development

I need to control this for the time being, so stay tuned! I will add more goodies in later releases.

License

The gem is available as open source under the terms of the MIT License.