Class: Ai4r::Clusterers::Diana

Inherits:
Clusterer
Defined in:
lib/ai4r/clusterers/diana.rb

Overview

DIANA (Divisive ANAlysis) (Kaufman and Rousseeuw, 1990; Macnaughton-Smith et al., 1964) is a divisive hierarchical clusterer. It starts with a single cluster containing all data items and keeps splitting clusters until the desired number of clusters is reached.
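The top-down loop described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the ai4r code: `sq_dist`, `diameter`, and `divide` are hypothetical helper names, points are plain arrays of numbers, and the splitting rule is left to a block so the outer loop stays visible. The splitter used in the example (cut a 1-D cluster at its mean) is a placeholder, not the DIANA splinter procedure.

```ruby
# Hypothetical helpers for illustration; none of these are part of ai4r.

# Squared Euclidean distance between two numeric points.
def sq_dist(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Diameter of a cluster: the largest pairwise distance between its points.
def diameter(cluster)
  cluster.combination(2).map { |a, b| sq_dist(a, b) }.max || 0
end

# Split the widest cluster until k clusters exist.
# The block receives one cluster and must return its two halves.
def divide(points, k)
  clusters = [points.dup]
  while clusters.length < k
    widest = clusters.max_by { |c| diameter(c) }
    break if widest.length < 2 # nothing left to split

    clusters.delete_at(clusters.index(widest))
    clusters.concat(yield(widest))
  end
  clusters
end

points = [[1], [2], [3], [10], [11], [20]]

# Placeholder splitter (an assumption): cut a 1-D cluster at its mean value.
result = divide(points, 3) do |cluster|
  mean = cluster.sum { |p| p[0] } / cluster.length.to_f
  cluster.partition { |p| p[0] <= mean }
end
# result holds three clusters: [[1], [2], [3]], [[10], [11]] and [[20]]
```

Passing the splitter as a block keeps the "pick the widest cluster, then split it" structure separate from the choice of splitting rule.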

Methods inherited from Clusterer

#supports_eval?

Methods included from Data::Parameterizable

#get_parameters, included, #set_parameters

Constructor Details

#initialize ⇒ Object

# File 'lib/ai4r/clusterers/diana.rb', line 32

def initialize
  super()
  @distance_function = lambda do |a, b|
    Ai4r::Data::Proximity.squared_euclidean_distance(
      a.select { |att_a| att_a.is_a? Numeric },
      b.select { |att_b| att_b.is_a? Numeric }
    )
  end
end
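The default distance function above ignores non-numeric attributes and compares the rest with the squared Euclidean distance. A standalone sketch of the same idea, without the ai4r dependency, might look like this. The lambda name `numeric_sq_distance` is hypothetical, and the sketch assumes both items carry the same number of numeric attributes in the same order.

```ruby
# Squared Euclidean distance over the numeric attributes only.
# Non-numeric entries (labels, for example) are dropped before comparing.
# Assumes both items have the same count of numeric values, in the same order.
numeric_sq_distance = lambda do |a, b|
  xs = a.select { |v| v.is_a?(Numeric) }
  ys = b.select { |v| v.is_a?(Numeric) }
  xs.zip(ys).sum { |x, y| (x - y)**2 }
end

d = numeric_sq_distance.call([1, 2, 'red'], [4, 6, 'blue'])
# (1 - 4)**2 + (2 - 6)**2 = 9 + 16 = 25
```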

Instance Attribute Details

#clusters ⇒ Object (readonly)

Returns the value of attribute clusters.

# File 'lib/ai4r/clusterers/diana.rb', line 23

def clusters
  @clusters
end

#data_set ⇒ Object (readonly)

Returns the value of attribute data_set.

# File 'lib/ai4r/clusterers/diana.rb', line 23

def data_set
  @data_set
end

#number_of_clusters ⇒ Object (readonly)

Returns the value of attribute number_of_clusters.

# File 'lib/ai4r/clusterers/diana.rb', line 23

def number_of_clusters
  @number_of_clusters
end

Instance Method Details

#build(data_set, number_of_clusters) ⇒ Object

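Builds the clusterer using divisive analysis (the DIANA algorithm). Starting from a single cluster that holds the whole data set, it repeatedly splits the cluster with the largest diameter until number_of_clusters clusters exist, then returns self.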

Parameters:

  • data_set (Object)
  • number_of_clusters (Object)

Returns:

  • (Object)


# File 'lib/ai4r/clusterers/diana.rb', line 46

def build(data_set, number_of_clusters)
  @data_set = data_set
  @number_of_clusters = number_of_clusters
  @clusters = [@data_set]

  while @clusters.length < @number_of_clusters
    cluster_index_to_split = max_diameter_cluster(@clusters)
    cluster_to_split = @clusters[cluster_index_to_split]
    splinter_cluster = init_splinter_cluster(cluster_to_split)
    loop do
      dist_diff, index = max_distance_difference(cluster_to_split, splinter_cluster)
      break if dist_diff.negative?

      splinter_cluster << cluster_to_split.data_items[index]
      cluster_to_split.data_items.delete_at(index)
    end
    @clusters << splinter_cluster
  end

  self
end
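The helpers called inside the loop (init_splinter_cluster and max_distance_difference) are not shown on this page, so the following is a sketch of one split as DIANA is usually described, not a copy of the library's behavior. It assumes the splinter group is seeded with the point that is farthest, on average, from the rest, and that points keep moving to the splinter group while the best distance difference is not negative. The names `sq_dist`, `avg_dist`, and `split_cluster` are hypothetical.

```ruby
# Squared Euclidean distance between two numeric points.
def sq_dist(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Average distance from point to the other members of group (0.0 if none).
def avg_dist(point, group)
  others = group.reject { |q| q.equal?(point) }
  return 0.0 if others.empty?

  others.sum { |q| sq_dist(point, q) } / others.length.to_f
end

# One divisive split: returns [remaining, splinter].
def split_cluster(cluster)
  remaining = cluster.dup
  # Seed the splinter group with the point farthest, on average, from the rest.
  seed = remaining.max_by { |p| avg_dist(p, remaining) }
  remaining.delete_at(remaining.index { |p| p.equal?(seed) })
  splinter = [seed]

  while remaining.length > 1
    # How much closer is each point to the splinter group than to its own group?
    diffs = remaining.map { |p| avg_dist(p, remaining) - avg_dist(p, splinter) }
    best = diffs.max
    break if best.negative?

    splinter << remaining.delete_at(diffs.index(best))
  end
  [remaining, splinter]
end

remaining, splinter = split_cluster([[1, 1], [1, 2], [2, 1], [9, 9], [10, 9]])
# remaining: [[1, 1], [1, 2], [2, 1]]
# splinter:  [[10, 9], [9, 9]]
```

The `remaining.length > 1` guard is a choice made for this sketch so that a split never empties the original cluster.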

#eval(data_item) ⇒ Object

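Classifies the given data item, returning the 0-based index of the cluster it belongs to. The chosen cluster is the one with the smallest average distance between the item and the cluster's members.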

Parameters:

  • data_item (Object)

Returns:

  • (Object)


# File 'lib/ai4r/clusterers/diana.rb', line 72

def eval(data_item)
  get_min_index(@clusters.collect do |cluster|
    distance_sum(data_item, cluster) / cluster.data_items.length
  end)
end
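The classification rule above, picking the cluster whose members are closest on average, can be tried out without the library. In this sketch, `sq_dist` and `nearest_cluster_index` are hypothetical names, clusters are plain arrays of numeric points, and `fdiv` is used so that integer distances are averaged as floats; that last detail is a choice made here, not a statement about the ai4r source.

```ruby
# Squared Euclidean distance between two numeric points.
def sq_dist(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Index of the cluster with the smallest mean distance to item.
def nearest_cluster_index(item, clusters)
  averages = clusters.map do |members|
    members.sum { |m| sq_dist(item, m) }.fdiv(members.length)
  end
  averages.index(averages.min)
end

clusters = [[[1, 1], [1, 2], [2, 1]], [[9, 9], [10, 9]]]

near_second = nearest_cluster_index([8, 8], clusters) # index 1
near_first  = nearest_cluster_index([0, 0], clusters) # index 0
```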