Class: Ai4r::Clusterers::Diana

Inherits:
Clusterer
Defined in:
lib/ai4r/clusterers/diana.rb

Overview

DIANA (Divisive ANAlysis) (Kaufman and Rousseeuw, 1990; Macnaughton-Smith et al., 1964) is a divisive hierarchical clusterer. It starts with a single cluster containing all data items and repeatedly splits clusters until the desired number of clusters is reached.

Instance Attribute Summary

Instance Method Summary

Methods included from Data::Parameterizable

#get_parameters, included, #set_parameters

Constructor Details

#initialize ⇒ Diana

Returns a new instance of Diana.



# File 'lib/ai4r/clusterers/diana.rb', line 31

def initialize
  # Default distance: squared Euclidean over the numeric attributes only.
  @distance_function = lambda do |a, b|
    Ai4r::Data::Proximity.squared_euclidean_distance(
      a.select { |att_a| att_a.is_a? Numeric },
      b.select { |att_b| att_b.is_a? Numeric }
    )
  end
end
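
To illustrate what the default distance function computes, here is a minimal, self-contained sketch that does not depend on Ai4r. The `squared_euclidean` helper and the `NUMERIC_ONLY_DISTANCE` constant are hypothetical names introduced for this example; the helper is an assumed stand-in for `Ai4r::Data::Proximity.squared_euclidean_distance`, not the library's code.

```ruby
# Hypothetical stand-in for Ai4r::Data::Proximity.squared_euclidean_distance:
# the sum of squared differences between paired values.
def squared_euclidean(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Same shape as the lambda in #initialize: non-numeric attributes
# (labels, category names) are dropped before the distance is computed.
NUMERIC_ONLY_DISTANCE = lambda do |a, b|
  squared_euclidean(
    a.select { |value| value.is_a?(Numeric) },
    b.select { |value| value.is_a?(Numeric) }
  )
end

# (1 - 4)**2 + (2 - 6)**2 = 9 + 16
puts NUMERIC_ONLY_DISTANCE.call([1, 2, "red"], [4, 6, "blue"]) # prints 25
```

Because each item is filtered independently, this approach only pairs values correctly when both items carry their numeric attributes in the same positions.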

Instance Attribute Details

#clusters ⇒ Object (readonly)

Returns the value of attribute clusters.



# File 'lib/ai4r/clusterers/diana.rb', line 23

def clusters
  @clusters
end

#data_set ⇒ Object (readonly)

Returns the value of attribute data_set.



# File 'lib/ai4r/clusterers/diana.rb', line 23

def data_set
  @data_set
end

#number_of_clusters ⇒ Object (readonly)

Returns the value of attribute number_of_clusters.



# File 'lib/ai4r/clusterers/diana.rb', line 23

def number_of_clusters
  @number_of_clusters
end

Instance Method Details

#build(data_set, number_of_clusters) ⇒ Object

Builds the clusterer from data_set using divisive analysis (the DIANA algorithm), splitting clusters until number_of_clusters is reached. Returns self.



# File 'lib/ai4r/clusterers/diana.rb', line 40

def build(data_set, number_of_clusters)
  @data_set = data_set
  @number_of_clusters = number_of_clusters
  # Start with a single cluster holding every data item.
  @clusters = [@data_set[0..-1]]

  while @clusters.length < @number_of_clusters
    # Pick the cluster to split and seed a splinter cluster from it.
    cluster_index_to_split = max_diameter_cluster(@clusters)
    cluster_to_split = @clusters[cluster_index_to_split]
    splinter_cluster = init_splinter_cluster(cluster_to_split)
    # Keep moving items to the splinter cluster until the best
    # distance difference turns negative.
    loop do
      dist_diff, index = max_distance_difference(cluster_to_split, splinter_cluster)
      break if dist_diff < 0
      splinter_cluster << cluster_to_split.data_items[index]
      cluster_to_split.data_items.delete_at(index)
    end
    @clusters << splinter_cluster
  end

  self
end
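
The private helpers called by #build (max_diameter_cluster, init_splinter_cluster, max_distance_difference) are not shown on this page. The following self-contained sketch works on plain arrays of numeric points and assumes the usual DIANA definitions: the diameter is the largest pairwise distance, the splinter is seeded with the item that has the largest average distance to the rest, and an item moves when it is on average closer to the splinter than to the remaining items. All method names here are hypothetical, and the guard against splitting single-item clusters is an addition of this sketch.

```ruby
# Squared Euclidean distance between two numeric points.
def sq_dist(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Average distance from item to every member of group (0.0 if group is empty).
def avg_dist(item, group)
  return 0.0 if group.empty?
  group.sum { |other| sq_dist(item, other) } / group.length.to_f
end

# Assumed diameter definition: largest pairwise distance in the cluster.
def diameter(cluster)
  cluster.combination(2).map { |a, b| sq_dist(a, b) }.max || 0
end

# Returns the cluster without the element at position index.
def without(cluster, index)
  cluster[0...index] + cluster[(index + 1)..-1]
end

# Splits one cluster in two and returns [remaining, splinter].
def split_cluster(cluster)
  remaining = cluster.dup
  # Seed the splinter with the item farthest, on average, from the others.
  seed = remaining.each_index.max_by { |i| avg_dist(remaining[i], without(remaining, i)) }
  splinter = [remaining.delete_at(seed)]

  while remaining.length > 1
    # Positive difference: the item is closer to the splinter than to the rest.
    diffs = remaining.each_index.map do |i|
      avg_dist(remaining[i], without(remaining, i)) - avg_dist(remaining[i], splinter)
    end
    best = diffs.each_index.max_by { |i| diffs[i] }
    break if diffs[best] < 0
    splinter << remaining.delete_at(best)
  end
  [remaining, splinter]
end

# Repeatedly splits the widest cluster until k clusters exist.
def diana(points, k)
  clusters = [points.dup]
  while clusters.length < k
    index = clusters.each_index.max_by { |i| diameter(clusters[i]) }
    break if clusters[index].length < 2 # nothing left to split
    clusters[index], splinter = split_cluster(clusters[index])
    clusters << splinter
  end
  clusters
end

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
p diana(points, 2)
```

With these six points, the three items near the origin end up in the splinter cluster and the three items near (10, 10) stay in the original cluster.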

#eval(data_item) ⇒ Object

Classifies the given data item, returning the 0-based index of the cluster it belongs to. The chosen cluster is the one with the smallest average distance to the item.



# File 'lib/ai4r/clusterers/diana.rb', line 63

def eval(data_item)
  # Average distance from data_item to each cluster; pick the smallest.
  get_min_index(@clusters.collect do |cluster|
    distance_sum(data_item, cluster) / cluster.data_items.length
  end)
end
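
The classification rule used by #eval can be sketched without Ai4r as follows. This is an illustration under assumptions rather than the library's implementation: clusters are plain arrays of numeric points, `sq_dist` and `nearest_cluster_index` are hypothetical names, and the division is forced to floating point.

```ruby
# Squared Euclidean distance between two numeric points.
def sq_dist(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Returns the 0-based index of the cluster whose members are, on average,
# closest to item.
def nearest_cluster_index(item, clusters)
  averages = clusters.map do |cluster|
    cluster.sum { |member| sq_dist(item, member) } / cluster.length.to_f
  end
  averages.index(averages.min)
end

clusters = [
  [[0, 0], [0, 1]],
  [[10, 10], [10, 11]]
]
puts nearest_cluster_index([9, 9], clusters) # prints 1
puts nearest_cluster_index([1, 1], clusters) # prints 0
```

Averaging over cluster size, rather than comparing raw distance sums, keeps a large cluster from being penalized simply for having more members.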