Class: ClassicBandit::Softmax
- Inherits: Object
  - Object
    - ClassicBandit::Softmax
- Includes: ArmUpdatable
- Defined in: lib/classic_bandit/softmax.rb
Overview
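Implements the Softmax algorithm for multi-armed bandit problems. The algorithm selects arms according to a Boltzmann distribution over their estimated rewards, with a temperature parameter controlling the balance between exploration and exploitation: higher temperatures make selection closer to uniform, while lower temperatures concentrate it on the arm with the best estimate.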
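As a rough illustration, assume the standard Boltzmann form P(i) = exp(μ_i / τ) / Σ_j exp(μ_j / τ), where μ_i is arm i's estimated mean reward and τ is the temperature. The helper boltzmann_probabilities below is hypothetical and not part of ClassicBandit; the gem's own per-arm scoring lives in the private softmax_score method, which this page does not show.

```ruby
# Hypothetical helper (not part of ClassicBandit): turns estimated mean
# rewards into Boltzmann/softmax selection probabilities at temperature tau.
def boltzmann_probabilities(mean_rewards, tau)
  raise ArgumentError, "temperature must be positive" unless tau.positive?

  # Subtracting the largest reward before exponentiating keeps Math.exp from
  # overflowing; the shift cancels out when the weights are normalized.
  best = mean_rewards.max
  weights = mean_rewards.map { |mu| Math.exp((mu - best).fdiv(tau)) }
  total = weights.sum
  weights.map { |w| w / total }
end

rewards = [0.2, 0.5, 0.8]
hot  = boltzmann_probabilities(rewards, 10.0) # high temperature: nearly uniform
cold = boltzmann_probabilities(rewards, 0.05) # low temperature: mass on the best arm
```

With τ = 10 the three probabilities stay within a few hundredths of each other (exploration); with τ = 0.05 almost all of the probability goes to the 0.8 arm (exploitation).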
Instance Attribute Summary
- #arms ⇒ Object (readonly)
  Returns the value of attribute arms.
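Instance Method Summary
- #initialize(arms:, initial_temperature:, k:) ⇒ Softmax
  constructor
  Returns a new instance of Softmax.
- #select_arm ⇒ Object
  Selects the next arm: uniformly at random while no arm has been tried, otherwise by sampling from the softmax probabilities.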
Methods included from ArmUpdatable
Constructor Details
#initialize(arms:, initial_temperature:, k:) ⇒ Softmax
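Returns a new instance of Softmax. The constructor stores arms, initial_temperature and k, then calls the private validate_parameters! method (not documented on this page). The trailing rubocop:disable Naming/MethodParameterName comment exempts the single-letter parameter k from that RuboCop cop, which by default requires parameter names of at least three characters.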
# File 'lib/classic_bandit/softmax.rb', line 23
def initialize(arms:, initial_temperature:, k:) # rubocop:disable Naming/MethodParameterName
  @arms = arms
  @initial_temperature = initial_temperature
  @k = k
  validate_parameters!
end
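The constructor only stores its keyword arguments and delegates checking to validate_parameters!. The sketch below reproduces that shape in a hypothetical SoftmaxSketch class. The validation rules it applies (non-empty arms, positive numeric initial_temperature and k) are assumptions for illustration, not the gem's documented behavior. It uses attr_reader :arms, which defines the same read-only getter as the hand-written arms method.

```ruby
# Hypothetical sketch mirroring the documented constructor. The checks in
# validate_parameters! are ASSUMPTIONS; the gem's real method is not shown.
class SoftmaxSketch
  # Equivalent to `def arms; @arms; end`: a getter with no matching setter.
  attr_reader :arms

  def initialize(arms:, initial_temperature:, k:) # rubocop:disable Naming/MethodParameterName
    @arms = arms
    @initial_temperature = initial_temperature
    @k = k
    validate_parameters!
  end

  private

  # Assumed rules: at least one arm, strictly positive numeric parameters.
  def validate_parameters!
    raise ArgumentError, "arms must not be empty" if @arms.nil? || @arms.empty?
    unless @initial_temperature.is_a?(Numeric) && @initial_temperature.positive?
      raise ArgumentError, "initial_temperature must be positive"
    end
    raise ArgumentError, "k must be positive" unless @k.is_a?(Numeric) && @k.positive?
  end
end

bandit = SoftmaxSketch.new(arms: %i[a b], initial_temperature: 1.0, k: 0.5)
bandit.arms # => [:a, :b]
```

Because all three parameters are required keywords, omitting any of them raises ArgumentError before validate_parameters! runs.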
Instance Attribute Details
#arms ⇒ Object (readonly)
Returns the value of attribute arms.
# File 'lib/classic_bandit/softmax.rb', line 21
def arms
  @arms
end
Instance Method Details
#select_arm ⇒ Object
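# File 'lib/classic_bandit/softmax.rb', line 31
def select_arm
  # Cold start: no arm has been tried yet, so pick one uniformly at random.
  return @arms.sample if @arms.all? { |arm| arm.trials.zero? }

  # Per-arm selection probabilities at the current temperature.
  probabilities = @arms.map { |arm| softmax_score(arm, temperature) }
  cumulative_prob = 0
  random_value = rand

  # Roulette-wheel sampling: return the first arm whose cumulative
  # probability reaches the uniform draw.
  @arms.each_with_index do |arm, i|
    cumulative_prob += probabilities[i]
    return arm if random_value <= cumulative_prob
  end

  # Fallback for floating-point rounding that leaves the total just below 1.
  @arms.last
end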
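The selection step is roulette-wheel (inverse-CDF) sampling. The sketch below restates it as a standalone method so the random source can be injected and the behavior checked. SketchArm, sketch_select_arm and the rng parameter are hypothetical, and the probabilities are passed in directly rather than computed by the private softmax_score and temperature helpers, which this page does not show.

```ruby
# Hypothetical stand-in for ClassicBandit's arm objects; only #trials is
# needed, matching the cold-start check in #select_arm.
SketchArm = Struct.new(:name, :trials)

# Mirrors the logic of #select_arm, with probabilities and the random
# source passed in explicitly so results are reproducible.
def sketch_select_arm(arms, probabilities, rng = Random.new)
  # Cold start: no arm has been tried yet, so choose uniformly at random.
  return arms.sample(random: rng) if arms.all? { |arm| arm.trials.zero? }

  # Walk the running total of the probabilities until it reaches a
  # uniform draw in [0, 1).
  random_value = rng.rand
  cumulative = 0.0
  arms.each_with_index do |arm, i|
    cumulative += probabilities[i]
    return arm if random_value <= cumulative
  end

  # Reached only if rounding leaves the total slightly below random_value.
  arms.last
end

arms = [SketchArm.new(:a, 3), SketchArm.new(:b, 5), SketchArm.new(:c, 2)]
rng = Random.new(42)
counts = Hash.new(0)
10_000.times { counts[sketch_select_arm(arms, [0.2, 0.3, 0.5], rng).name] += 1 }
```

Over many draws each arm's share of counts approaches its probability, so :c should account for roughly half of them. Injecting rng is a testability choice; the documented method calls Kernel#rand and Array#sample directly.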