Method: Spark::RDD#aggregate

Defined in:
lib/spark/rdd.rb

#aggregate(zero_value, seq_op, comb_op) ⇒ Object

Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral “zero value”.

This function can return a different result type. We need one operation for merging.

Result must be an Array otherwise Serializer Array’s zero value will be send as multiple values and not just one.

Example:

# 1 2 3 4 5  => 15 + 1 = 16
# 6 7 8 9 10 => 40 + 1 = 41
# 16 * 41 = 656

seq = lambda{|x,y| x+y}
com = lambda{|x,y| x*y}

rdd = $sc.parallelize(1..10, 2)
rdd.aggregate(1, seq, com)
# => 656


342
343
344
# File 'lib/spark/rdd.rb', line 342

def aggregate(zero_value, seq_op, comb_op)
  _reduce(Spark::Command::Aggregate, seq_op, comb_op, zero_value)
end