Class: Gitlab::Database::PostgresHll::Buckets

Inherits:
Object
Defined in:
lib/gitlab/database/postgres_hll/buckets.rb

Overview

Note:

HyperLogLog is a PROBABILISTIC algorithm that ESTIMATES the distinct count of a given attribute's values for a supplied relation. Like all probabilistic algorithms, it has an ERROR RATE margin that can affect the reported values. For this implementation, no error rate higher than 5.3% was reported (gitlab.com/gitlab-org/gitlab/-/merge_requests/45673#accuracy-estimation), and in most cases the error is lower. However, if an exact value is necessary, other tools have to be used.

Constant Summary

TOTAL_BUCKETS = 512

Instance Method Summary

Constructor Details

#initialize(buckets = {}) ⇒ Buckets



# File 'lib/gitlab/database/postgres_hll/buckets.rb', line 26

def initialize(buckets = {})
  @buckets = buckets
end

Instance Method Details

#estimated_distinct_count ⇒ Float

Estimates the number of unique elements in the analysed set, based on the HyperLogLog structure. The result is memoized after the first call.



# File 'lib/gitlab/database/postgres_hll/buckets.rb', line 33

def estimated_distinct_count
  @estimated_distinct_count ||= estimate_cardinality
end
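
The body of estimate_cardinality is not shown on this page. As a rough illustration of how an estimate can be derived from a bucket hash, the sketch below applies the textbook HyperLogLog estimator with the small-range (linear counting) correction. It is not necessarily what this class does: the names sketch_estimate and SKETCH_TOTAL_BUCKETS are hypothetical, and it assumes the hash maps a bucket index to a positive register value, with absent buckets counting as zero.

```ruby
# Hypothetical constant mirroring TOTAL_BUCKETS from this class.
SKETCH_TOTAL_BUCKETS = 512

# Textbook HyperLogLog estimator (an assumption, not the GitLab source).
# buckets: Hash of bucket index => register value; missing buckets are zero.
def sketch_estimate(buckets, total_buckets = SKETCH_TOTAL_BUCKETS)
  # Bias-correction constant, valid for 128 or more buckets.
  alpha = 0.7213 / (1 + 1.079 / total_buckets)

  non_zero = buckets.values.count { |value| value > 0 }
  zero_buckets = total_buckets - non_zero

  # Each zero register contributes 2**0 = 1 to the harmonic sum.
  harmonic_sum = zero_buckets + buckets.values.sum { |value| value > 0 ? 2.0**-value : 0.0 }
  raw = alpha * total_buckets**2 / harmonic_sum

  # Small-range correction: fall back to linear counting while some
  # registers are still empty and the raw estimate is low.
  if zero_buckets > 0 && raw < 2.5 * total_buckets
    total_buckets * Math.log(total_buckets.to_f / zero_buckets)
  else
    raw
  end
end
```

With an empty hash every register is zero, so the linear counting branch returns 0.0. Once all registers are populated, the raw harmonic-mean estimate is used instead.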

#merge_hash!(other_buckets_hash) ⇒ Object

Updates the instance's underlying HyperLogLog structure by merging it with another HyperLogLog structure. For buckets present in both structures, the higher value is kept.



# File 'lib/gitlab/database/postgres_hll/buckets.rb', line 40

def merge_hash!(other_buckets_hash)
  buckets.merge!(other_buckets_hash) { |_key, old, new| new > old ? new : old }
end
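
To make the merge rule concrete, here is a minimal sketch that applies the same block to plain hashes. The helper merge_buckets and the sample data are hypothetical, and the sketch uses the non-destructive Hash#merge so the inputs stay untouched; merge_hash! itself mutates the instance's buckets in place.

```ruby
# Hypothetical helper: keeps the larger value whenever both hashes
# contain the same bucket index, mirroring the block used by merge_hash!.
def merge_buckets(buckets, other_buckets)
  buckets.merge(other_buckets) { |_key, old, new| new > old ? new : old }
end

# Sample data (made up): bucket index => highest value seen for that bucket.
first_batch  = { 1 => 3, 2 => 5 }
second_batch = { 2 => 4, 3 => 7 }

merged = merge_buckets(first_batch, second_batch)
# Bucket 2 appears in both batches, so the larger value (5) is kept.
```

Because taking the per-bucket maximum does not depend on order, batches can be merged in any sequence and yield the same structure.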

#to_json(_ = nil) ⇒ String

Serializes the instance's underlying HyperLogLog structure to JSON, which can be stored in various persistence layers.



# File 'lib/gitlab/database/postgres_hll/buckets.rb', line 47

def to_json(_ = nil)
  buckets.to_json
end
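
One thing to keep in mind when reading the JSON back is that JSON object keys are always strings. The sketch below shows a round trip with a plain hash, assuming integer bucket indexes; the sample data and the key conversion step are illustrative and not taken from this class.

```ruby
require 'json'

# Sample bucket hash (made up): bucket index => value.
buckets = { 1 => 3, 2 => 5 }

# Hash#to_json turns integer keys into JSON strings.
serialized = buckets.to_json

# JSON.parse returns string keys, so convert them back before
# comparing or merging with integer-keyed hashes.
restored = JSON.parse(serialized).transform_keys(&:to_i)
```

Without the transform_keys step, merging a parsed hash with an integer-keyed one would treat "1" and 1 as different buckets.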