Class: Ai4r::Clusterers::SingleLinkage

Inherits:
Clusterer
  • Object
show all
Includes:
ClusterTree
Defined in:
lib/ai4r/clusterers/single_linkage.rb

Overview

Implementation of a Hierarchical clusterer with single linkage (Everitt et al., 2001 ; Johnson, 1967 ; Jain and Dubes, 1988 ; Sneath, 1957 ) Hierarchical clusterer create one cluster per element, and then progressively merge clusters, until the required number of clusters is reached. With single linkage, the distance between two clusters is computed as the distance between the two closest elements in the two clusters.

D(cx, (ci U cj) = min(D(cx, ci), D(cx, cj))

Instance Attribute Summary collapse

Attributes included from ClusterTree

#cluster_tree

Instance Method Summary collapse

Methods inherited from Clusterer

#supports_eval?

Methods included from Data::Parameterizable

#get_parameters, included, #set_parameters

Constructor Details

#initialize(*args) ⇒ Object



40
41
42
43
44
45
46
47
48
# File 'lib/ai4r/clusterers/single_linkage.rb', line 40

def initialize(*args)
  super(*args)
  @distance_function = lambda do |a, b|
    Ai4r::Data::Proximity.squared_euclidean_distance(
      a.select { |att_a| att_a.is_a? Numeric },
      b.select { |att_b| att_b.is_a? Numeric }
    )
  end
end

Instance Attribute Details

#clustersObject (readonly)

Returns the value of attribute clusters.



31
32
33
# File 'lib/ai4r/clusterers/single_linkage.rb', line 31

def clusters
  @clusters
end

#data_setObject (readonly)

Returns the value of attribute data_set.



31
32
33
# File 'lib/ai4r/clusterers/single_linkage.rb', line 31

def data_set
  @data_set
end

#number_of_clustersObject (readonly)

Returns the value of attribute number_of_clusters.



31
32
33
# File 'lib/ai4r/clusterers/single_linkage.rb', line 31

def number_of_clusters
  @number_of_clusters
end

Instance Method Details

#build(data_set, number_of_clusters = 1, **options) ⇒ Object

Build a new clusterer, using data examples found in data_set. Items will be clustered in “number_of_clusters” different clusters.

If you specify :distance options, it will stop whether number_of_clusters are reached or no distance among clusters are below :distance

Parameters:

  • data_set (Object)
  • number_of_clusters (Object) (defaults to: 1)
  • *options (Object)

Returns:

  • (Object)


60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
# File 'lib/ai4r/clusterers/single_linkage.rb', line 60

def build(data_set, number_of_clusters = 1, **options)
  @data_set = data_set
  distance = options[:distance] || Float::INFINITY

  @index_clusters = create_initial_index_clusters
  create_distance_matrix(data_set)
  while @index_clusters.length > number_of_clusters
    ci, cj = get_closest_clusters(@index_clusters)
    break if read_distance_matrix(ci, cj) > distance

    update_distance_matrix(ci, cj)
    merge_clusters(ci, cj, @index_clusters)
  end

  @number_of_clusters = @index_clusters.length
  @distance_matrix = nil
  @clusters = build_clusters_from_index_clusters @index_clusters
  self
end

#draw_map(clusters) ⇒ Object

Parameters:

  • clusters (Object)

Returns:

  • (Object)


82
83
84
85
86
87
88
89
90
# File 'lib/ai4r/clusterers/single_linkage.rb', line 82

def draw_map(clusters)
  map = Array.new(11) { Array.new(11, 0) }
  clusters.each_index do |i|
    clusters[i].data_items.each do |point|
      map[point.first][point.last] = (i + 1)
    end
  end
  map
end

#eval(data_item) ⇒ Object

Classifies the given data item, returning the cluster index it belongs to (0-based).

Parameters:

  • data_item (Object)

Returns:

  • (Object)


96
97
98
99
100
# File 'lib/ai4r/clusterers/single_linkage.rb', line 96

def eval(data_item)
  get_min_index(@clusters.collect do |cluster|
    distance_between_item_and_cluster(data_item, cluster)
  end)
end

#silhouetteObject

Compute mean silhouette coefficient of the clustering result. Returns a float between -1 and 1. Only valid after build.

Returns:

  • (Object)


116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
# File 'lib/ai4r/clusterers/single_linkage.rb', line 116

def silhouette
  return nil unless @index_clusters && @data_set

  total = 0.0
  count = @data_set.data_items.length

  @index_clusters.each_with_index do |cluster, ci|
    cluster.each do |index|
      a = 0.0
      if cluster.length > 1
        cluster.each do |j|
          next if j == index

          a += distance_between_indexes(index, j)
        end
        a /= (cluster.length - 1)
      end

      b = nil
      @index_clusters.each_with_index do |other_cluster, cj|
        next if ci == cj

        dist = 0.0
        other_cluster.each do |j|
          dist += distance_between_indexes(index, j)
        end
        dist /= other_cluster.length
        b = dist if b.nil? || dist < b
      end
      s = b&.positive? ? (b - a) / [a, b].max : 0.0
      total += s
    end
  end

  total / count
end