Class: ClassicBandit::EpsilonGreedy

Inherits:
Object
Includes:
ArmUpdatable
Defined in:
lib/classic_bandit/epsilon_greedy.rb

Overview

Implements the Epsilon-Greedy algorithm for multi-armed bandit problems. With probability epsilon it picks an arm uniformly at random (exploration); with probability 1 - epsilon it picks the arm with the highest mean reward (exploitation). If no arm has recorded any trials yet, the arm is always picked at random.
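
To make the exploration/exploitation split concrete: because the random choice is drawn from all arms, it can also land on the current best arm. With K arms and a single best arm, that arm is therefore picked with probability (1 - epsilon) + epsilon / K once at least one arm has trials. The helper below is a hypothetical illustration and is not part of ClassicBandit.

```ruby
# Hypothetical helper, not part of ClassicBandit.
# Probability that one epsilon-greedy draw returns the current best arm,
# assuming a single best arm and uniform exploration over all arms.
def best_arm_probability(epsilon:, arm_count:)
  (1.0 - epsilon) + epsilon.to_f / arm_count
end

# Two arms with epsilon 0.1: roughly 0.95 (0.9 from exploitation, 0.05 from exploration).
puts best_arm_probability(epsilon: 0.1, arm_count: 2).round(4)
```

With epsilon set to 0 the best arm is always chosen; with epsilon set to 1 every arm is equally likely.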

Examples:

Create and use an epsilon-greedy bandit

arms = [
  ClassicBandit::Arm.new(id: 1, name: "banner_a", trials: 100, successes: 10),
  ClassicBandit::Arm.new(id: 2, name: "banner_b", trials: 150, successes: 14)
]
bandit = ClassicBandit::EpsilonGreedy.new(arms: arms, epsilon: 0.1)
selected_arm = bandit.select_arm
bandit.update(selected_arm, reward: 1)

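Instance Attribute Summary

  • #arms ⇒ Object (readonly)
  • #epsilon ⇒ Object (readonly)

Instance Method Summary

  • #initialize(arms:, epsilon: 0.1) ⇒ EpsilonGreedy (constructor)
  • #select_arm ⇒ Object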

Methods included from ArmUpdatable

#update

Constructor Details

#initialize(arms:, epsilon: 0.1) ⇒ EpsilonGreedy

Returns a new instance of EpsilonGreedy.



# File 'lib/classic_bandit/epsilon_greedy.rb', line 21

def initialize(arms:, epsilon: 0.1)
  @arms = arms
  @epsilon = epsilon

  validate_epsilon!
end
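
The body of validate_epsilon! is not shown on this page. The following is a minimal sketch of what such a check might look like, assuming epsilon is meant to be a probability in the range 0 to 1 and that an invalid value raises ArgumentError. The class name, error type, and message are hypothetical and may not match the library's actual behavior.

```ruby
# Hypothetical stand-in class; the real validate_epsilon! may differ.
class EpsilonCheckSketch
  attr_reader :epsilon

  def initialize(epsilon: 0.1)
    @epsilon = epsilon
    validate_epsilon!
  end

  private

  # Assumption: epsilon is a probability, so it must be a number in 0..1.
  def validate_epsilon!
    return if @epsilon.is_a?(Numeric) && (0..1).cover?(@epsilon)

    raise ArgumentError, "epsilon must be between 0 and 1, got #{@epsilon.inspect}"
  end
end

EpsilonCheckSketch.new(epsilon: 0.2) # accepted
begin
  EpsilonCheckSketch.new(epsilon: 1.5)
rescue ArgumentError => e
  puts e.message
end
```

Validating in the constructor means a bad value fails immediately instead of silently skewing every later call to select_arm.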

Instance Attribute Details

#arms ⇒ Object (readonly)

Returns the value of attribute arms.



# File 'lib/classic_bandit/epsilon_greedy.rb', line 19

def arms
  @arms
end

#epsilon ⇒ Object (readonly)

Returns the value of attribute epsilon.



# File 'lib/classic_bandit/epsilon_greedy.rb', line 19

def epsilon
  @epsilon
end

Instance Method Details

#select_arm ⇒ Object



# File 'lib/classic_bandit/epsilon_greedy.rb', line 28

def select_arm
  # If no arms have been tried, do random selection
  return @arms.sample if @arms.all? { |arm| arm.trials.zero? }

  if rand < @epsilon
    # Exploration: random selection
    @arms.sample
  else
    # Exploitation: select arm with highest mean reward
    @arms.max_by(&:mean_reward)
  end
end
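
The decision rule above can be exercised in a small, self-contained simulation. This is a sketch only: SketchArm, sketch_select, the manual counter updates, and the click rates are all made up for illustration and stand in for ClassicBandit::Arm, #select_arm, and #update. It also assumes an untried arm reports a mean reward of 0.0, which this page does not confirm for the real Arm class.

```ruby
# Stand-ins for illustration only; these are not the ClassicBandit classes.
SketchArm = Struct.new(:name, :trials, :successes) do
  # Assumption: an untried arm reports a mean reward of 0.0.
  def mean_reward
    trials.zero? ? 0.0 : successes.to_f / trials
  end
end

# Same decision rule as select_arm, with an injectable RNG for repeatable runs.
def sketch_select(arms, epsilon, rng)
  # No arm has been tried yet: pick at random.
  return arms.sample(random: rng) if arms.all? { |arm| arm.trials.zero? }

  # Explore with probability epsilon, otherwise exploit the best mean reward.
  rng.rand < epsilon ? arms.sample(random: rng) : arms.max_by(&:mean_reward)
end

rng = Random.new(42)
arms = [SketchArm.new("banner_a", 0, 0), SketchArm.new("banner_b", 0, 0)]
true_rates = { "banner_a" => 0.05, "banner_b" => 0.20 } # made-up click rates

1_000.times do
  arm = sketch_select(arms, 0.1, rng)
  arm.trials += 1                                        # record the pull
  arm.successes += 1 if rng.rand < true_rates[arm.name]  # simulated reward
end

arms.each do |arm|
  puts format("%s: trials=%d mean=%.3f", arm.name, arm.trials, arm.mean_reward)
end
```

Passing the random number generator in explicitly keeps the run reproducible, which is convenient when testing selection logic that otherwise depends on the global rand.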