Module: MoreMath::Entropy
Included in: Functions
Defined in: lib/more_math/entropy.rb
Overview
Provides entropy calculation utilities for measuring information content and randomness in text data.
This module implements Shannon entropy calculations to quantify the unpredictability or information content of text strings. It’s commonly used in cryptography, data compression, and information theory applications.
The entropy measures help determine how "random" or "predictable" a text is (see the usage sketch after this list), which can be useful for:
- Password strength analysis
- Data compression efficiency estimation
- Cryptographic security assessment
- Text analysis and classification
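A minimal usage sketch, assuming the gem is loaded via require 'more_math'; the same methods are also reachable through MoreMath::Functions, which includes this module:

require 'more_math'
include MoreMath::Entropy

entropy("aaaa")       # => 0.0  one repeated character, fully predictable
entropy("abab")       # => 1.0  two equiprobable characters, 1 bit each
entropy_ratio("abab") # => 0.5  1 bit out of the 2 bits possible for an alphabet of 4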
Instance Method Summary
- #entropy(text) ⇒ Float
  Calculates the Shannon entropy of a text string.
- #entropy_ideal(size) ⇒ Float
  Calculates the ideal (maximum) entropy for a given character set size.
- #entropy_ratio(text, size: text.size) ⇒ Float
  Calculates the normalized entropy ratio of a text string.
- #entropy_ratio_minimum(text, size: text.size, alpha: 0.05) ⇒ Float
  Calculates the minimum entropy ratio with confidence interval adjustment.
Instance Method Details
#entropy(text) ⇒ Float
Calculates the Shannon entropy of a text string.
Shannon entropy measures the average amount of information (in bits) needed to encode characters in the text based on their frequencies.
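Formally, if p(c) is the relative frequency of character c in the text, the entropy is H = -Σ p(c) · log2 p(c), summed over the distinct characters.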
# File 'lib/more_math/entropy.rb', line 39

def entropy(text)
  chars = nil
  if text.respond_to?(:chars)
    chars = text.chars
  else
    chars = text
  end
  size = chars.size
  chars.each_with_object(Hash.new(0.0)) { |c, h| h[c] += 1 }.
    each_value.reduce(0.0) do |entropy, count|
      frequency = count / size
      entropy + frequency * Math.log2(frequency)
    end.abs
end
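A few illustrative calls; the values follow directly from the formula above:

entropy("aaaa")  # => 0.0     all characters identical
entropy("abcd")  # => 2.0     four equiprobable characters: log2(4) bits
entropy("aab")   # => ~0.918  mixed frequencies of 2/3 and 1/3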
#entropy_ideal(size) ⇒ Float
Calculates the ideal (maximum) entropy for a given character set size.
This represents the maximum possible entropy when all characters in the alphabet have equal probability of occurrence.
# File 'lib/more_math/entropy.rb', line 66

def entropy_ideal(size)
  size <= 1 and return 0.0
  frequency = 1.0 / size
  -1.0 * size * frequency * Math.log2(frequency)
end
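Because every character is assigned probability 1/size, the expression simplifies to log2(size). For example:

entropy_ideal(2)    # => 1.0  a binary alphabet carries 1 bit per character
entropy_ideal(256)  # => 8.0  log2(256), e.g. arbitrary bytes
entropy_ideal(1)    # => 0.0  guard clause: a one-symbol alphabet is fully predictable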
#entropy_ratio(text, size: text.size) ⇒ Float
Calculates the normalized entropy ratio of a text string.
The ratio is calculated as actual entropy divided by ideal entropy, giving a value between 0 and 1 where:
- 0 indicates no entropy (all characters are identical)
- 1 indicates maximum entropy (uniform distribution across the alphabet)
The normalization uses the specified alphabet size to calculate the theoretical maximum entropy for that character set.
# File 'lib/more_math/entropy.rb', line 97

def entropy_ratio(text, size: text.size)
  size <= 1 and return 0.0
  entropy(text) / entropy_ideal(size)
end
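For example (the last call assumes a byte-sized alphabet instead of the default size: text.size):

entropy_ratio("aaaa")           # => 0.0    no entropy
entropy_ratio("abcd")           # => 1.0    uniform over its own length
entropy_ratio("ab", size: 256)  # => 0.125  1 bit out of the 8 possible for 256 symbols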
#entropy_ratio_minimum(text, size: text.size, alpha: 0.05) ⇒ Float
Calculates the minimum entropy ratio with confidence interval adjustment.
This method computes an adjusted entropy ratio that accounts for statistical uncertainty by incorporating the standard error and a confidence level.
# File 'lib/more_math/entropy.rb', line 114

def entropy_ratio_minimum(text, size: text.size, alpha: 0.05)
  raise ArgumentError, 'alphabet size must be ≥ 2' if size < 2
  raise ArgumentError, 'text must not be empty' if text.empty?

  n = text.size
  k = size

  ratio = MoreMath::Functions.entropy_ratio(text, size: k)

  logk = Math.log2(k)
  diff = logk - 1.0 / Math.log(2)
  var  = (diff ** 2) / (logk ** 2) * (1.0 - 1.0 / k) / n
  se   = Math.sqrt(var) # standard error

  z = STD_NORMAL_DISTRIBUTION.inverse_probability(1.0 - alpha / 2.0)

  (ratio - z * se).clamp(0, 1)
end
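A hedged sketch of how the bound behaves. The values are approximate, assuming the usual z ≈ 1.96 for alpha = 0.05, and the module-level calls go through MoreMath::Functions, which the source above already uses for entropy_ratio:

short = "abcd" * 2    # n = 8
long  = "abcd" * 250  # n = 1000

MoreMath::Functions.entropy_ratio_minimum(short, size: 4)  # => ~0.83
MoreMath::Functions.entropy_ratio_minimum(long,  size: 4)  # => ~0.985

Both strings have a point-estimate ratio of 1.0; the larger sample shrinks the standard error (which scales with 1/√n), so the lower confidence bound moves toward the estimate.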