Module: FiniteMDP::Model
- Included in:
- ArrayModel, HashModel, TableModel
- Defined in:
- lib/finite_mdp/model.rb
Overview
Interface that defines a finite markov decision process model.
There are several approaches to describing the state, action, transition probability and reward data for use with this library.
-
Write the data directly into a TableModel or HashModel. This is usually the way to go for small models, such as examples from text books.
-
Write a procedure that generates the data and stores them in a TableModel or HashModel. This gives the most flexibility in how the data are generated.
-
Write a class that implements the methods in this module. The methods in this module are a fairly close approximation to the usual way of defining an MDP mathematically, so it can be a useful way of structuring the definition. It can then be converted to one of the other representations (see TableModel.from_model) or passed directly to a Solver.
The discussion below applies to all of these approaches.
Note that there is no special treatment for terminal states, but they can be modeled by including a dummy state (a state with zero reward and one action that brings the process back to the dummy state with probability 1).
The states and actions can be arbitrary objects. The only requirement is that they support hashing and equality (in the sense of eql?
), which all ruby objects do. Built-in types, such as symbols, arrays and Structs, will work as expected. Note, however, that the default hashing and equality semantics for custom classes may not be what you want. The following example illustrates this:
class BadGridState
def initialize x, y
@x, @y = x, y
end
attr_accessor :x, :y
end
BadGridState.new(1, 1) == BadGridState.new(1, 2) #=> false
BadGridState.new(1, 1) == BadGridState.new(1, 1) #=> false (!!!)
This is because, by default, hashing and equality are defined in terms of object identifiers, not the ‘content’ of the objects. The preferred solution is to define the state as a Struct
:
GoodGridState = Struct.new(:x, :y)
GoodGridState.new(1, 1) == GoodGridState.new(1, 2) #=> false
GoodGridState.new(1, 1) == GoodGridState.new(1, 1) #=> true
Struct
is part of the ruby standard library, and it implements hashing and equality based on object content rather than identity.
Alternatively, if you cannot derive your state class from Struct
, you can define your own hash code and equality check. An easy way to do this is to include the VectorValued mix-in. It is also notable that you can make the default semantics work; you just have to make sure that there is only one instance of your state class per state, as in the following example:
g11 = BadGridState.new(1, 1)
g12 = BadGridState.new(1, 2)
g21 = BadGridState.new(2, 1)
model = FiniteMDP::TableModel.new([
[g11, :up, g12, 0, 0.9],
[g11, :up, g21, 0, 0.1],
[g11, :right, g21, 0, 0.9],
# ...
]) # this will work as expected
Note that the Solver will convert the model to its own internal representation. The efficiency of the methods that define the model is important while the solver is building its internal representation, but it does not affect the performance of the iterative algorithm used after that. Also note that the solver handles state and action numbering internally, so it is not necessary to use numbers for the states.
Instance Method Summary collapse
-
#actions(_state) ⇒ Array<action>
abstract
Actions that are valid for the given state.
-
#check_transition_probabilities_sum(tol = 1e-6) ⇒ nil
Raise an error if the sum of the transition probabilities for any (state, action) pair is not sufficiently close to 1.
-
#next_states(_state, _action) ⇒ Array<state>
Successor states after taking the given action in the given state.
-
#reward(_state, _action, _next_state) ⇒ Float?
abstract
Reward for a given transition.
-
#states ⇒ Array<state>
abstract
States in this model.
-
#terminal_states ⇒ Set
Set of states that have no transitions out.
-
#transition_probability(_state, _action, _next_state) ⇒ Float
abstract
Probability of the given transition.
-
#transition_probability_sums ⇒ Hash<[State, Action], Float>
Sum of the transition probabilities for each (state, action) pair; the sums should be one in a valid model.
Instance Method Details
#actions(_state) ⇒ Array<action>
Actions that are valid for the given state.
All states must have at least one valid action; see notes for FiniteMDP::Model regarding how to encode a terminal state.
104 105 106 |
# File 'lib/finite_mdp/model.rb', line 104 def actions(_state) raise NotImplementedError end |
#check_transition_probabilities_sum(tol = 1e-6) ⇒ nil
Raise an error if the sum of the transition probabilities for any (state, action) pair is not sufficiently close to 1.
204 205 206 207 208 209 210 211 |
# File 'lib/finite_mdp/model.rb', line 204 def check_transition_probabilities_sum(tol = 1e-6) transition_probability_sums.each do |(state, action), pr| next if (1 - pr).abs <= tol raise "transition probabilities for state #{state.inspect} and action " \ "#{action.inspect} sum to #{pr}" end nil end |
#next_states(_state, _action) ⇒ Array<state>
Successor states after taking the given action in the given state. Note that the returned states may occur with zero probability.
The default behavior is to return all states as candidate successor states and let #transition_probability determine which ones are possible. It can be overridden in sparse models to avoid storing or computing lots of zeros. Also note that TableModel.from_model and HashModel.from_model can be told to ignore transitions with zero probability, and that the Solver ignores them in its internal representation, so you can usually forget about this method.
126 127 128 |
# File 'lib/finite_mdp/model.rb', line 126 def next_states(_state, _action) states end |
#reward(_state, _action, _next_state) ⇒ Float?
Reward for a given transition.
If the transition is not in the model, in the sense that it would never arise from #states, #actions and #next_states, the result is undefined. Note that HashModel#reward and TableModel#reward return nil
in this case, but this is not part of the contract.
173 174 175 |
# File 'lib/finite_mdp/model.rb', line 173 def reward(_state, _action, _next_state) raise NotImplementedError end |
#states ⇒ Array<state>
States in this model.
88 89 90 |
# File 'lib/finite_mdp/model.rb', line 88 def states raise NotImplementedError end |
#terminal_states ⇒ Set
Set of states that have no transitions out.
At present, this library can’t solve a model with terminal states. However, you can add a dummy state (e.g. :stop
) with zero reward that transitions back to itself with probability one.
Note that if a state has transitions out, but all of them have probability zero, this method does not detect it as a terminal state. You can check for these using #transition_probability_sums instead.
226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 |
# File 'lib/finite_mdp/model.rb', line 226 def terminal_states all_states = Set[] out_states = Set[] states.each do |state| all_states << state any_out_transitions = false actions(state).each do |action| ns = next_states(state, action) all_states.merge ns any_out_transitions ||= !ns.empty? end out_states << state if any_out_transitions end all_states - out_states end |
#transition_probability(_state, _action, _next_state) ⇒ Float
Probability of the given transition.
If the transition is not in the model, in the sense that it would never arise from #states, #actions and #next_states, the result is undefined. Note that HashModel#transition_probability and TableModel#transition_probability return zero in this case, but this is not part of the contract.
150 151 152 |
# File 'lib/finite_mdp/model.rb', line 150 def transition_probability(_state, _action, _next_state) raise NotImplementedError end |
#transition_probability_sums ⇒ Hash<[State, Action], Float>
Sum of the transition probabilities for each (state, action) pair; the sums should be one in a valid model.
183 184 185 186 187 188 189 190 191 192 193 194 |
# File 'lib/finite_mdp/model.rb', line 183 def transition_probability_sums prs = [] states.each do |state| actions(state).each do |action| pr = next_states(state, action).map do |next_state| transition_probability(state, action, next_state) end.inject(:+) prs << [[state, action], pr] end end Hash[prs] end |