Class: Ai4r::Reinforcement::QLearning
- Inherits: Object
- Includes:
- Data::Parameterizable
- Defined in:
- lib/ai4r/reinforcement/q_learning.rb
Overview
Simple Q-learning agent storing Q-values in a Hash.
Instance Attribute Summary
- #q ⇒ Object (readonly)
  Direct access to learned Q-values.
Instance Method Summary
- #choose_action(state) ⇒ Object
  Choose an action using an ε-greedy strategy.
- #initialize ⇒ QLearning (constructor)
  A new instance of QLearning.
- #update(state, action, reward, next_state) ⇒ Object
  Update Q(s,a) from an observed transition.
Methods included from Data::Parameterizable
#get_parameters, included, #set_parameters
Constructor Details
#initialize ⇒ QLearning
Returns a new instance of QLearning.
    # File 'lib/ai4r/reinforcement/q_learning.rb', line 21

    def initialize
      @learning_rate = 0.1
      @discount = 0.9
      @exploration = 0.1
      @q = Hash.new { |h, k| h[k] = Hash.new(0.0) }
    end
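The nested `Hash.new` with a block is what lets the agent read or update any `(state, action)` pair without explicit initialization: an unseen state gets a fresh inner hash whose values default to 0.0. A standalone sketch of that behavior (plain Ruby, no gem required):

```ruby
# Same Q-table construction as the constructor above.
q = Hash.new { |h, k| h[k] = Hash.new(0.0) }

q[:unseen_state][:unseen_action]  # reads as 0.0, no KeyError
q[:s][:a] += 1.0                  # in-place update works immediately
puts q[:s][:a]                    # prints 1.0
```

This default-valued layout is why `update` below can accumulate into `@q[state][action]` on the first visit to a state without any bootstrapping code.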
Instance Attribute Details
#q ⇒ Object (readonly)
Direct access to learned Q-values.
    # File 'lib/ai4r/reinforcement/q_learning.rb', line 48

    def q
      @q
    end
Instance Method Details
#choose_action(state) ⇒ Object
Choose an action using an ε-greedy strategy.
    # File 'lib/ai4r/reinforcement/q_learning.rb', line 37

    def choose_action(state)
      return nil if @q[state].empty?
      if rand < @exploration
        @q[state].keys.sample
      else
        @q[state].max_by { |_, v| v }.first
      end
    end
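The ε-greedy rule means: with probability `@exploration` pick a uniformly random known action, otherwise pick the action with the highest Q-value. A minimal standalone sketch of the same branch, using a hypothetical Q-row for one state (values are made up for illustration):

```ruby
exploration = 0.1  # same default as the constructor
q_row = { left: 0.2, right: 0.7 }  # hypothetical learned values for one state

action =
  if rand < exploration
    q_row.keys.sample                # explore: uniform random known action
  else
    q_row.max_by { |_, v| v }.first  # exploit: greedy (here :right)
  end
```

Note that the method only samples among actions already present in the Q-row, and returns `nil` for a state with no recorded actions, so callers typically seed `q` (via `update` or direct writes) before relying on `choose_action`.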
#update(state, action, reward, next_state) ⇒ Object
Update Q(s,a) from an observed transition.
    # File 'lib/ai4r/reinforcement/q_learning.rb', line 29

    def update(state, action, reward, next_state)
      best_next = @q[next_state].values.max || 0.0
      @q[state][action] += @learning_rate * (
        reward + @discount * best_next - @q[state][action]
      )
    end
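This is the standard tabular Q-learning rule, Q(s,a) ← Q(s,a) + α·[r + γ·max over a' of Q(s',a') − Q(s,a)], with α = `@learning_rate` and γ = `@discount`; the `|| 0.0` handles a `next_state` with no recorded actions. A sketch applying one transition by hand, mirroring the method body above (plain Ruby, no gem required; state and action names are illustrative):

```ruby
# Mirror of the documented defaults.
learning_rate = 0.1
discount = 0.9

# Q-table defaulting to 0.0, as in the constructor.
q = Hash.new { |h, k| h[k] = Hash.new(0.0) }

# One observed transition: (state, action, reward, next_state).
state, action, reward, next_state = :s1, :right, 1.0, :s2

best_next = q[next_state].values.max || 0.0  # 0.0: :s2 has no actions yet
q[state][action] += learning_rate * (reward + discount * best_next - q[state][action])

puts q[:s1][:right]  # prints 0.1, i.e. 0.1 * (1.0 + 0.9 * 0.0 - 0.0)
```

Repeated calls over many transitions move each entry toward the reward-plus-discounted-best-next-value target, which is how the table converges to useful action values.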