Class: Ai4r::Clusterers::SingleLinkage
- Includes:
- ClusterTree
- Defined in:
- lib/ai4r/clusterers/single_linkage.rb
Overview
Implementation of a Hierarchical clusterer with single linkage (Everitt et al., 2001 ; Johnson, 1967 ; Jain and Dubes, 1988 ; Sneath, 1957 ) Hierarchical clusterer create one cluster per element, and then progressively merge clusters, until the required number of clusters is reached. With single linkage, the distance between two clusters is computed as the distance between the two closest elements in the two clusters.
D(cx, (ci U cj) = min(D(cx, ci), D(cx, cj))
Direct Known Subclasses
AverageLinkage, CentroidLinkage, CompleteLinkage, MedianLinkage, WardLinkage, WeightedAverageLinkage
Instance Attribute Summary collapse
-
#clusters ⇒ Object
readonly
Returns the value of attribute clusters.
-
#data_set ⇒ Object
readonly
Returns the value of attribute data_set.
-
#number_of_clusters ⇒ Object
readonly
Returns the value of attribute number_of_clusters.
Attributes included from ClusterTree
Instance Method Summary collapse
-
#build(data_set, number_of_clusters = 1, **options) ⇒ Object
Build a new clusterer, using data examples found in data_set.
- #draw_map(clusters) ⇒ Object
-
#eval(data_item) ⇒ Object
Classifies the given data item, returning the cluster index it belongs to (0-based).
- #initialize(*args) ⇒ Object constructor
-
#silhouette ⇒ Object
Compute mean silhouette coefficient of the clustering result.
Methods inherited from Clusterer
Methods included from Data::Parameterizable
#get_parameters, included, #set_parameters
Constructor Details
#initialize(*args) ⇒ Object
40 41 42 43 44 45 46 47 48 |
# File 'lib/ai4r/clusterers/single_linkage.rb', line 40 def initialize(*args) super(*args) @distance_function = lambda do |a, b| Ai4r::Data::Proximity.squared_euclidean_distance( a.select { |att_a| att_a.is_a? Numeric }, b.select { |att_b| att_b.is_a? Numeric } ) end end |
Instance Attribute Details
#clusters ⇒ Object (readonly)
Returns the value of attribute clusters.
31 32 33 |
# File 'lib/ai4r/clusterers/single_linkage.rb', line 31 def clusters @clusters end |
#data_set ⇒ Object (readonly)
Returns the value of attribute data_set.
31 32 33 |
# File 'lib/ai4r/clusterers/single_linkage.rb', line 31 def data_set @data_set end |
#number_of_clusters ⇒ Object (readonly)
Returns the value of attribute number_of_clusters.
31 32 33 |
# File 'lib/ai4r/clusterers/single_linkage.rb', line 31 def number_of_clusters @number_of_clusters end |
Instance Method Details
#build(data_set, number_of_clusters = 1, **options) ⇒ Object
Build a new clusterer, using data examples found in data_set. Items will be clustered in “number_of_clusters” different clusters.
If you specify :distance options, it will stop whether number_of_clusters are reached or no distance among clusters are below :distance
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 |
# File 'lib/ai4r/clusterers/single_linkage.rb', line 60 def build(data_set, number_of_clusters = 1, **) @data_set = data_set distance = [:distance] || Float::INFINITY @index_clusters = create_initial_index_clusters create_distance_matrix(data_set) while @index_clusters.length > number_of_clusters ci, cj = get_closest_clusters(@index_clusters) break if read_distance_matrix(ci, cj) > distance update_distance_matrix(ci, cj) merge_clusters(ci, cj, @index_clusters) end @number_of_clusters = @index_clusters.length @distance_matrix = nil @clusters = build_clusters_from_index_clusters @index_clusters self end |
#draw_map(clusters) ⇒ Object
82 83 84 85 86 87 88 89 90 |
# File 'lib/ai4r/clusterers/single_linkage.rb', line 82 def draw_map(clusters) map = Array.new(11) { Array.new(11, 0) } clusters.each_index do |i| clusters[i].data_items.each do |point| map[point.first][point.last] = (i + 1) end end map end |
#eval(data_item) ⇒ Object
Classifies the given data item, returning the cluster index it belongs to (0-based).
96 97 98 99 100 |
# File 'lib/ai4r/clusterers/single_linkage.rb', line 96 def eval(data_item) get_min_index(@clusters.collect do |cluster| distance_between_item_and_cluster(data_item, cluster) end) end |
#silhouette ⇒ Object
Compute mean silhouette coefficient of the clustering result. Returns a float between -1 and 1. Only valid after build.
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 |
# File 'lib/ai4r/clusterers/single_linkage.rb', line 116 def silhouette return nil unless @index_clusters && @data_set total = 0.0 count = @data_set.data_items.length @index_clusters.each_with_index do |cluster, ci| cluster.each do |index| a = 0.0 if cluster.length > 1 cluster.each do |j| next if j == index a += distance_between_indexes(index, j) end a /= (cluster.length - 1) end b = nil @index_clusters.each_with_index do |other_cluster, cj| next if ci == cj dist = 0.0 other_cluster.each do |j| dist += distance_between_indexes(index, j) end dist /= other_cluster.length b = dist if b.nil? || dist < b end s = b&.positive? ? (b - a) / [a, b].max : 0.0 total += s end end total / count end |