Class: ClassicBandit::Softmax

Inherits:
Object
Includes:
ArmUpdatable
Defined in:
lib/classic_bandit/softmax.rb

Overview

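Implements the Softmax algorithm for multi-armed bandit problems. The algorithm selects arms according to a Boltzmann distribution, in which a temperature parameter controls the balance between exploration and exploitation: a high temperature makes the selection close to uniform, while a low temperature favors the arms with the best observed rewards.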
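As an illustration of the Boltzmann distribution mentioned above, the following is a minimal sketch and not the library's own code. The helper name boltzmann_probabilities and the sample reward values are assumptions made for this example; the class computes its scores through a private method that is not shown on this page.

```ruby
# Hypothetical helper, not part of ClassicBandit: turns estimated mean
# rewards into Boltzmann (softmax) selection probabilities.
def boltzmann_probabilities(mean_rewards, temperature)
  raise ArgumentError, "temperature must be positive" unless temperature.positive?

  # Subtract the maximum before exponentiating for numerical stability;
  # this does not change the resulting probabilities.
  max_reward = mean_rewards.max
  weights = mean_rewards.map { |reward| Math.exp((reward - max_reward) / temperature) }
  total = weights.sum
  weights.map { |weight| weight / total }
end

means = [0.2, 0.5]                           # assumed sample values
high = boltzmann_probabilities(means, 10.0)  # near-uniform: mostly exploring
low = boltzmann_probabilities(means, 0.05)   # concentrated on the better arm
```

With a high temperature the two probabilities stay close to each other; with a low temperature almost all of the probability mass moves to the arm with the higher mean reward.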

Examples:

Create and use a Softmax bandit

arms = [
  ClassicBandit::Arm.new(id: 1, name: "banner_a"),
  ClassicBandit::Arm.new(id: 2, name: "banner_b")
]
bandit = ClassicBandit::Softmax.new(
  arms: arms,
  initial_temperature: 1.0,
  k: 0.5
)

Instance Attribute Summary

Instance Method Summary

Methods included from ArmUpdatable

#update

Constructor Details

#initialize(arms:, initial_temperature:, k:) ⇒ Softmax




# File 'lib/classic_bandit/softmax.rb', line 23

def initialize(arms:, initial_temperature:, k:) # rubocop:disable Naming/MethodParameterName
  @arms = arms
  @initial_temperature = initial_temperature
  @k = k

  validate_parameters!
end

Instance Attribute Details

#arms ⇒ Object (readonly)

Returns the value of attribute arms.



# File 'lib/classic_bandit/softmax.rb', line 21

def arms
  @arms
end

Instance Method Details

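#select_arm ⇒ Object

Selects an arm. If no arm has been tried yet, returns an arm chosen uniformly at random; otherwise samples an arm according to the softmax scores computed at the current temperature.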



# File 'lib/classic_bandit/softmax.rb', line 31

def select_arm
  return @arms.sample if @arms.all? { |arm| arm.trials.zero? }

  probabilities = @arms.map { |arm| softmax_score(arm, temperature) }
  cumulative_prob = 0
  random_value = rand

  @arms.each_with_index do |arm, i|
    cumulative_prob += probabilities[i]
    return arm if random_value <= cumulative_prob
  end

  @arms.last
end
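
The loop in select_arm is a cumulative-sum (roulette-wheel) draw. The sketch below isolates that step under stated assumptions: sample_index is a hypothetical helper, not part of ClassicBandit, and it accepts the random draw as an argument only so that the behavior can be checked deterministically.

```ruby
# Hypothetical helper, not part of ClassicBandit: picks an index by walking
# the cumulative distribution until it reaches the random draw.
def sample_index(probabilities, random_value = rand)
  cumulative = 0.0
  probabilities.each_with_index do |probability, index|
    cumulative += probability
    return index if random_value <= cumulative
  end
  # Floating-point rounding can leave the total slightly below 1.0,
  # so fall back to the last index.
  probabilities.length - 1
end

probabilities = [0.2, 0.5, 0.3]           # assumed sample values
first = sample_index(probabilities, 0.1)  # draw falls in the first interval
middle = sample_index(probabilities, 0.65) # draw falls in the second interval
last = sample_index(probabilities, 0.95)  # draw falls in the third interval
```

The final fallback plays the same role as the trailing @arms.last in the listing above: it guarantees a result even when rounding keeps the cumulative sum from reaching the drawn value.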