Class: Ai4r::Reinforcement::QLearning
- Inherits: Object
- Includes:
- Data::Parameterizable
- Defined in:
- lib/ai4r/reinforcement/q_learning.rb
Overview
Simple Q-learning agent storing Q-values in a Hash.
Instance Attribute Summary
- #q ⇒ Object (readonly)
  Direct access to learned Q-values.
Instance Method Summary
- #choose_action(state) ⇒ Object
  Choose an action using an ε-greedy strategy.
- #initialize ⇒ QLearning (constructor)
  A new instance of QLearning.
- #update(state, action, reward, next_state) ⇒ Object
  Update Q(s,a) from an observed transition.
Methods included from Data::Parameterizable
#get_parameters, included, #set_parameters
Constructor Details
#initialize ⇒ QLearning
Returns a new instance of QLearning.
    # File 'lib/ai4r/reinforcement/q_learning.rb', line 21

    def initialize
      @learning_rate = 0.1
      @discount = 0.9
      @exploration = 0.1
      @q = Hash.new { |h, k| h[k] = Hash.new(0.0) }
    end
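The nested `Hash.new` with a block is what lets the agent read or update any `(state, action)` pair without explicit initialization: an unseen state gets a fresh inner hash whose values default to 0.0. A standalone sketch of that behavior (plain Ruby, no gem required):

```ruby
# Same Q-table construction as the constructor above.
q = Hash.new { |h, k| h[k] = Hash.new(0.0) }

q[:unseen_state][:unseen_action]  # reads as 0.0, no KeyError
q[:s][:a] += 1.0                  # in-place update works immediately
puts q[:s][:a]                    # prints 1.0
```

This default-valued layout is why `update` below can accumulate into `@q[state][action]` on the first visit to a state without any bootstrapping code.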
Instance Attribute Details
#q ⇒ Object (readonly)
Direct access to learned Q-values.
    # File 'lib/ai4r/reinforcement/q_learning.rb', line 48

    def q
      @q
    end
Instance Method Details
#choose_action(state) ⇒ Object
Choose an action using an ε-greedy strategy.
    # File 'lib/ai4r/reinforcement/q_learning.rb', line 37

    def choose_action(state)
      return nil if @q[state].empty?
      if rand < @exploration
        @q[state].keys.sample
      else
        @q[state].max_by { |_, v| v }.first
      end
    end
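The ε-greedy rule means: with probability `@exploration` pick a uniformly random known action, otherwise pick the action with the highest Q-value. A minimal standalone sketch of the same branch, using a hypothetical Q-row for one state (values are made up for illustration):

```ruby
exploration = 0.1  # same default as the constructor
q_row = { left: 0.2, right: 0.7 }  # hypothetical learned values for one state

action =
  if rand < exploration
    q_row.keys.sample                # explore: uniform random known action
  else
    q_row.max_by { |_, v| v }.first  # exploit: greedy (here :right)
  end
```

Note that the method only samples among actions already present in the Q-row, and returns `nil` for a state with no recorded actions, so callers typically seed `q` (via `update` or direct writes) before relying on `choose_action`.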
#update(state, action, reward, next_state) ⇒ Object
Update Q(s,a) from an observed transition.
    # File 'lib/ai4r/reinforcement/q_learning.rb', line 29

    def update(state, action, reward, next_state)
      best_next = @q[next_state].values.max || 0.0
      @q[state][action] += @learning_rate * (
        reward + @discount * best_next - @q[state][action]
      )
    end
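This is the standard tabular Q-learning rule, Q(s,a) ← Q(s,a) + α·[r + γ·max over a' of Q(s',a') − Q(s,a)], with α = `@learning_rate` and γ = `@discount`; the `|| 0.0` handles a `next_state` with no recorded actions. A sketch applying one transition by hand, mirroring the method body above (plain Ruby, no gem required; state and action names are illustrative):

```ruby
# Mirror of the documented defaults.
learning_rate = 0.1
discount = 0.9

# Q-table defaulting to 0.0, as in the constructor.
q = Hash.new { |h, k| h[k] = Hash.new(0.0) }

# One observed transition: (state, action, reward, next_state).
state, action, reward, next_state = :s1, :right, 1.0, :s2

best_next = q[next_state].values.max || 0.0  # 0.0: :s2 has no actions yet
q[state][action] += learning_rate * (reward + discount * best_next - q[state][action])

puts q[:s1][:right]  # prints 0.1, i.e. 0.1 * (1.0 + 0.9 * 0.0 - 0.0)
```

Repeated calls over many transitions move each entry toward the reward-plus-discounted-best-next-value target, which is how the table converges to useful action values.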