Markov Decision Processes
Introduction
Markov Decision Processes (MDPs) provide a mathematical framework for modeling decision-making in environments where outcomes are partly random and partly under the control of a decision maker. This framework is fundamental in fields such as reinforcement learning, operations research, and economics.
This document explores key concepts of MDPs, starting with Markov Processes (MP), then Markov Reward Processes (MRP), and finally MDPs. Each section provides definitions and the corresponding Bellman equations.
Markov Processes
A Markov Process (MP) is a stochastic process characterized by the Markov property, which states that the future state depends only on the current state and not on the sequence of states that preceded it. A state is Markov (or with Markov property) if and only if
Definition
Note
A Markov Process (or Markov Chain) is a tuple
is a (finite) set of states.
is a state transition probability matrix,
Bellman Equation
Tip
For a Markov Process, the Bellman equation describes the recursive relationship of state transition probabilities,
Markov Reward Processes
A Markov Reward Process (MRP) extends an MP by associating rewards (values) with state transitions.
Definition
Note
An MRP is a tuple
is a (finite) set of states.
is a state transition probability matrix,
is a reward function.
is a discount factor.
Bellman Equation
The state value function of an MRP is the expected return starting from state
,
The Bellman equation for the value function is,
Tip
Markov Decision Processes
Definition
A Markov Decision Process (MDP) introduces decision-making into an MRP. It is defined by the tuple where,
is a finite set of states.
is a finite set of actions,
is the transition probability given action
,
is the expected reward for taking action
in state
.
is the discount factor.
Bellman Expectation Equation
The Bellman expectation equation for the state value function and action value function
are,
Tip
Bellman Optimality Equation
For the optimal state value function and action value function
are,
and we have,
Tip
Important
In sum, the Bellman equation recursively defines a value function as the sum of the immediate reward and the discounted expected value of future states.