RL-Math
0.1.0
Contents
Notations
Bellman Expectation Equation
References
Markov Decision Processes
Introduction
Markov Processes
Definition
Bellman Equation
Markov Reward Processes
Definition
Bellman Equation
Markov Decision Processes
Definition
Bellman Expectation Equation
Bellman Optimality Equation
References
Monte Carlo Methods
Introduction
Monte Carlo Prediction
Monte Carlo Control
On-Policy Monte Carlo Control
Off-Policy Monte Carlo Control
Importance Sampling
Incremental Implementation
Conclusion
References
Temporal Difference Learning
Introduction
TD Prediction
TD Control
Eligibility Traces
Conclusion
References
Policy Gradient
Policy Gradient Theorem
Proof of Policy Gradient Theorem
References
Soft Actor-Critic (SAC) Algorithm
Introduction
Theoretical Derivation
Soft Q-Function and Soft Value Function
Automating Entropy Adjustment
Algorithmic Flow
References
Bayes' Theorem
Introduction
Mathematical Formulation
Proof
Intuitive Explanation
Key Insights
Conclusion
Attention Is All You Need
Introduction
Architecture Overview
Self-Attention Mechanism
Multi-Head Attention
Code Implementation
Conclusion
References
Understanding GPT as an Attention-Driven Decoder
Introduction
Decoder-Only Architecture
Principles
Mathematical Formulation
Code Implementation
Conclusion
References
Index