Clusta

Clusta is a Ruby gem for network analysis built on top of Wukong.

Wukong lets you write Ruby scripts that run on your laptop as well as on a Hadoop cluster.

Clusta consists of:

  • classes that make it easy to describe the geometry of networks

  • network algorithms written with these classes that run on Wukong

  • a shim command-line program for running these algorithms

Start with a file containing edges, one per line:

Edge  1  2
Edge  2  3
Edge  1  4
Edge  4  5
Edge  5  6
Edge  5  7
Edge  6  8
Edge  7  8
Edge  8  9

Run it through the edges_to_degrees transformation, which outputs the degree of each vertex:

$ clusta --transform=edges_to_degrees /local/edges.tsv -
Degree  1  2
Degree  2  2
Degree  3  1
Degree  4  2
Degree  5  3
Degree  6  2
Degree  7  2
Degree  8  3
Degree  9  1
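
To make the result above concrete, here is a minimal plain-Ruby sketch of the same calculation. It does not use Clusta's classes or Wukong; the helper name degrees_from_edges is hypothetical, and the sketch assumes each edge is undirected and unweighted, so it adds one to the degree of both endpoints.

```ruby
# Hypothetical helper (not part of Clusta or Wukong): tally how many edges
# touch each vertex, given "Edge <source> <target>" records.
def degrees_from_edges(lines)
  degrees = Hash.new(0)
  lines.each do |line|
    label, source, target = line.split
    # Skip blank lines and any record that is not a complete edge.
    next unless label == "Edge" && source && target
    degrees[source] += 1
    degrees[target] += 1
  end
  degrees
end

# The sample edge list from above, rebuilt as tab-separated records.
edge_lines = [[1, 2], [2, 3], [1, 4], [4, 5], [5, 6], [5, 7], [6, 8], [7, 8], [8, 9]]
             .map { |source, target| "Edge\t#{source}\t#{target}" }

# Print one "Degree <vertex> <degree>" record per vertex, in numeric order.
degrees_from_edges(edge_lines).sort_by { |vertex, _| vertex.to_i }.each do |vertex, degree|
  puts ["Degree", vertex, degree].join("\t")
end
```

For the sample edges this yields the same vertex and degree pairs as the clusta output shown above.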

Chain transformations together with pipes, using - for standard input and standard output:

$ clusta --transform=edges_to_neighborhoods /local/edges.tsv - \
    | clusta --transform=neighborhoods_to_degree_pairs - - \
    | clusta --transform=degree_pairs_to_assortativities - -
Assortativity 1 2 1
Assortativity 1 3 1
Assortativity 2 1 1
Assortativity 2 2 4
Assortativity 2 3 5
Assortativity 3 1 1
Assortativity 3 2 5
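
The three chained steps can be sketched in plain Ruby as follows. This is an illustration only, not Clusta's implementation: the method name assortativity_counts is hypothetical, and reading the three output columns as "degree of a vertex, degree of its neighbor, number of such directed pairs" is an assumption, although it does reproduce the counts above for the sample edges.

```ruby
# Hypothetical helper (not part of Clusta or Wukong). Takes undirected edges
# as [a, b] pairs and counts how often a vertex of degree d1 is adjacent to
# a vertex of degree d2. Each undirected edge is seen from both endpoints.
def assortativity_counts(edges)
  # Step 1: edges -> neighborhoods (each vertex with its adjacent vertices).
  neighbors = Hash.new { |hash, vertex| hash[vertex] = [] }
  edges.each do |a, b|
    neighbors[a] << b
    neighbors[b] << a
  end

  # Steps 2 and 3: neighborhoods -> degree pairs -> counts per degree pair.
  counts = Hash.new(0)
  neighbors.each do |_vertex, adjacent|
    adjacent.each do |other|
      counts[[adjacent.size, neighbors[other].size]] += 1
    end
  end
  counts
end

sample_edges = [[1, 2], [2, 3], [1, 4], [4, 5], [5, 6], [5, 7], [6, 8], [7, 8], [8, 9]]

# Print one "Assortativity <d1> <d2> <count>" record per degree pair.
assortativity_counts(sample_edges).sort.each do |(source_degree, target_degree), count|
  puts ["Assortativity", source_degree, target_degree, count].join("\t")
end
```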

When you're ready, let Wukong run the same kind of transformation on a Hadoop cluster:

$ clusta --run=hadoop --transform=edges_to_neighborhoods /hdfs/edges.tsv /hdfs/neighborhoods.tsv
I, [2012-03-03T21:00:39.992750 #25835]  INFO -- :   Launching hadoop!
I, [2012-03-03T21:00:39.992979 #25835]  INFO -- : Running

/usr/lib/hadoop/bin/hadoop  \
  jar /usr/lib/hadoop/contrib/streaming/hadoop-*streaming*.jar  \
  -D mapred.job.name='clusta---spec/data/edges/undirected.unweighted.tsv----'   \
  -mapper  '/usr/bin/ruby1.9.1 clusta --map --log_interval=10000 --log_seconds=30 --transform=edges_to_degrees'   \
  -reducer '/usr/bin/ruby1.9.1 clusta --reduce --log_interval=10000 --log_seconds=30 --transform=edges_to_degrees'  \
  -input   'spec/data/edges/undirected.unweighted.tsv'  \
  -output  '-'  \
  -file    '/home/user/projects/networks/clusta/bin/clusta'
...
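
The streaming command above runs the same script once as a mapper and once as a reducer. As a rough sketch of how a degree count could split into those two phases, here is a plain-Ruby illustration. It is not Clusta's or Wukong's actual code: map_edge and reduce_degrees are hypothetical names, and the sort in between stands in for Hadoop's shuffle, which delivers records to the reducer grouped by key.

```ruby
# Hypothetical map step: each edge record emits one [vertex, 1] pair
# per endpoint. Records that are not complete edges emit nothing.
def map_edge(line)
  label, source, target = line.split
  return [] unless label == "Edge" && source && target
  [[source, 1], [target, 1]]
end

# Hypothetical reduce step: given pairs already sorted by key, sum the
# counts within each run of equal keys to get one degree per vertex.
def reduce_degrees(sorted_pairs)
  sorted_pairs
    .chunk_while { |left, right| left[0] == right[0] }
    .map { |group| [group[0][0], group.sum { |_key, count| count }] }
end

records = ["Edge\t1\t2", "Edge\t2\t3", "Edge\t1\t4"]

mapped   = records.flat_map { |line| map_edge(line) }
shuffled = mapped.sort_by { |key, _count| key } # stands in for Hadoop's shuffle

reduce_degrees(shuffled).each do |vertex, degree|
  puts ["Degree", vertex, degree].join("\t")
end
```

Because each step only needs one record (map) or one key's worth of records (reduce) at a time, the same logic can run locally over a pipe or be spread across a cluster.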