RL-Math
0.1.0

Contents

  • Notations
    • Bellman Expectation Equation
    • References
  • Markov Decision Processes
    • Introduction
    • Markov Processes
      • Definition
      • Bellman Equation
    • Markov Reward Processes
      • Definition
      • Bellman Equation
    • Markov Decision Processes
      • Definition
      • Bellman Expectation Equation
      • Bellman Optimality Equation
    • References
  • Monte Carlo Methods
    • Introduction
    • Monte Carlo Prediction
    • Monte Carlo Control
      • On-Policy Monte Carlo Control
      • Off-Policy Monte Carlo Control
    • Importance Sampling
    • Incremental Implementation
    • Conclusion
    • References
  • Temporal Difference Learning
    • Introduction
    • TD Prediction
    • TD Control
    • Eligibility Traces
    • Conclusion
    • References
  • Policy Gradient
    • Policy Gradient Theorem
    • Proof of Policy Gradient Theorem
    • References
  • Soft Actor-Critic (SAC) Algorithm
    • Introduction
    • Theoretical Derivation
      • Soft Q-Function and Soft Value Function
      • Automating Entropy Adjustment
    • Algorithmic Flow
    • References
  • Bayes Theorem
    • Introduction
    • Mathematical Formulation
    • Proof
    • Intuitive Explanation
    • Key Insights
    • Conclusion
  • Attention Is All You Need
    • Introduction
    • Architecture Overview
    • Self-Attention Mechanism
    • Multi-Head Attention
    • Code Implementation
    • Conclusion
    • References
  • Understanding GPT as an Attention-Driven Decoder
    • Introduction
    • Decoder-Only Architecture
    • Principles
    • Mathematical Formulation
    • Code Implementation
    • Conclusion
    • References

© Copyright 2024, Borg.
