Method: Spark::RDD#aggregate
- Defined in:
- lib/spark/rdd.rb
#aggregate(zero_value, seq_op, comb_op) ⇒ Object
Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral “zero value”.
This function can return a different result type. We need one operation for merging.
Result must be an Array otherwise Serializer Array’s zero value will be send as multiple values and not just one.
Example:
# 1 2 3 4 5 => 15 + 1 = 16
# 6 7 8 9 10 => 40 + 1 = 41
# 16 * 41 = 656
seq = lambda{|x,y| x+y}
com = lambda{|x,y| x*y}
rdd = $sc.parallelize(1..10, 2)
rdd.aggregate(1, seq, com)
# => 656
342 343 344 |
# File 'lib/spark/rdd.rb', line 342 def aggregate(zero_value, seq_op, comb_op) _reduce(Spark::Command::Aggregate, seq_op, comb_op, zero_value) end |