Module: MoreMath::Entropy

Included in:
Functions
Defined in:
lib/more_math/entropy.rb

Overview

Provides entropy calculation utilities for measuring information content and randomness in text data.

This module implements Shannon entropy calculations to quantify the unpredictability or information content of text strings. It’s commonly used in cryptography, data compression, and information theory applications.

The entropy measures help determine how “random” or “predictable” a text is, which can be useful for:

  • Password strength analysis

  • Data compression efficiency estimation

  • Cryptographic security assessment

  • Text analysis and classification

Examples:

Basic usage

require 'more_math'
include MoreMath::Functions

text = "hello world"
puts entropy(text)                 # => 2.8453...
puts entropy_ratio(text, size: 8)  # => 0.9484... ("hello world" has 8 distinct characters)

Using with different text samples

entropy("aaaa")           # => 0.0 (no entropy)
entropy("abcd")           # => 2.0 (actual entropy)

Instance Method Summary

  • #entropy(text) ⇒ Float

  • #entropy_ideal(size) ⇒ Float

  • #entropy_maximum(text, size:) ⇒ Integer

  • #entropy_ratio(text, size:) ⇒ Float

  • #entropy_ratio_minimum(text, size:, alpha: 0.05) ⇒ Float

Instance Method Details

#entropy(text) ⇒ Float

Calculates the Shannon entropy in bits of a text string.

Shannon entropy measures the average amount of information (in bits) needed to encode characters in the text based on their frequencies.

Examples:

entropy("hello") # => 2.3219280948873626
entropy("aaaa")  # => 0.0


# File 'lib/more_math/entropy.rb', line 39

def entropy(text)
  # Accept either a String (split into characters) or any enumerable of tokens.
  chars = text.respond_to?(:chars) ? text.chars : text
  size  = chars.size

  # Tally counts per token (Float counts keep the division below fractional),
  # then sum frequency * log2(frequency) and take the absolute value.
  chars.each_with_object(Hash.new(0.0)) { |c, h| h[c] += 1 }.
    each_value.reduce(0.0) do |entropy, count|
      frequency = count / size
      entropy + frequency * Math.log2(frequency)
    end.abs
end
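
Because of the respond_to?(:chars) guard, the method also accepts any enumerable of tokens, not just strings:

entropy(%w[a a b b])  # => 1.0 (two tokens, equal frequency)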

#entropy_ideal(size) ⇒ Float

Calculates the ideal (maximum) entropy for a given character set size.

This represents the maximum possible entropy when all characters in the alphabet have equal probability of occurrence.

Examples:

entropy_ideal(2)  # => 1.0
entropy_ideal(256) # => 8.0
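
Because all size symbols share the frequency 1/size, the product size * frequency * log2(frequency) reduces to log2(size). A quick check (a power of two keeps the floating-point result exact):

entropy_ideal(64)  # => 6.0
Math.log2(64)      # => 6.0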


# File 'lib/more_math/entropy.rb', line 66

def entropy_ideal(size)
  # A 0- or 1-symbol alphabet carries no information.
  size <= 1 and return 0.0
  # With a uniform distribution this reduces to log2(size).
  frequency = 1.0 / size
  -1.0 * size * frequency * Math.log2(frequency)
end

#entropy_maximum(text, size:) ⇒ Integer

Calculates the maximum possible entropy for a given text and alphabet size.

This represents the theoretical maximum entropy that could be achieved if all characters in the text were chosen uniformly at random from the alphabet. It’s used to determine the upper bound of security strength for tokens.

Examples:

entropy_maximum("hello", size: 26)  # => 23
entropy_maximum("abc123", size: 64) # => 36


# File 'lib/more_math/entropy.rb', line 146

def entropy_maximum(text, size:)
  size > 1 or return 0
  # Each character can contribute at most log2(size) bits; round down to whole bits.
  (text.size * Math.log2(size)).floor
end

#entropy_ratio(text, size:) ⇒ Float

Calculates the normalized entropy ratio of a text string.

The ratio is calculated as actual entropy divided by ideal entropy, giving a value between 0 and 1 where:

  • 0 indicates no entropy (all characters are identical)

  • 1 indicates maximum entropy (uniform distribution across the alphabet)

The normalization uses the specified alphabet size to calculate the theoretical maximum entropy for that character set.

Examples:

entropy_ratio("hello")     # => 0.6834
entropy_ratio("aaaaa")     # => 0.0
entropy_ratio("abcde")     # => 1.0

With custom alphabet size

# Normalizing against a 26-letter alphabet (English)
entropy_ratio("hello", size: 26) # => 0.394...


# File 'lib/more_math/entropy.rb', line 94

def entropy_ratio(text, size:)
  size <= 1 and return 0.0
  # Observed entropy normalized by the ideal entropy of the given alphabet.
  entropy(text) / entropy_ideal(size)
end

#entropy_ratio_minimum(text, size:, alpha: 0.05) ⇒ Float

Calculates the minimum entropy ratio with confidence interval adjustment.

This method computes an adjusted entropy ratio that accounts for statistical uncertainty by incorporating the standard error and a confidence level.

Raises:

  • (ArgumentError)

    When alphabet size is less than 2

  • (ArgumentError)

    When text is empty



# File 'lib/more_math/entropy.rb', line 112

def entropy_ratio_minimum(text, size:, alpha: 0.05)
  raise ArgumentError, 'alphabet size must be ≥ 2' if size < 2
  raise ArgumentError, 'text must not be empty'    if text.empty?

  n = text.size
  k = size

  ratio = MoreMath::Functions.entropy_ratio(text, size: k)

  logk = Math.log2(k)
  diff = logk - 1.0 / Math.log(2)
  var  = (diff ** 2) / (logk ** 2) * (1.0 - 1.0 / k) / n
  se   = Math.sqrt(var)          # standard error of the ratio estimate

  # Two-sided critical value for the requested confidence level
  # (e.g. ~1.96 when alpha = 0.05).
  z = STD_NORMAL_DISTRIBUTION.inverse_probability(1.0 - alpha / 2.0)

  # Lower confidence bound, clamped to the valid ratio range [0, 1].
  (ratio - z * se).clamp(0, 1)
end
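
A usage sketch (the token below is a made-up illustration; size: 62 assumes a mixed-case alphanumeric alphabet):

include MoreMath::Functions

token = "x7Kp2mQ9vL4w"                               # hypothetical token
entropy_ratio_minimum(token, size: 62)               # default 95% confidence (alpha: 0.05)
entropy_ratio_minimum(token, size: 62, alpha: 0.01)  # stricter 99% confidence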