Transition to Approximate Q-Learning

Deep Reinforcement Learning

Last updated: December 31, 2024

1. Introduction

Tabular Q-learning becomes infeasible for high-dimensional or continuous state spaces. Instead of storing a Q-table with an entry for every $(s,a)$ pair, we use a parameterized function $Q(s,a;\theta)$. The agent learns parameters $\theta$ (e.g., weights of a neural network) that can *generalize* from a limited set of experiences to unseen states.

2. Limitations of Tabular Q-Learning

  • Scalability: Tabular methods work for Grid World and other small tasks, but problems such as Lunar Lander, Tetris, and Atari games have enormous state spaces.
  • Continuous State Spaces: Many real-world tasks (robotics, self-driving cars) have continuous states that can’t be enumerated in a table.
  • Memory Constraints: Storing $|S| \times |A|$ Q-values is infeasible when $|S|$ is huge (see the quick calculation below).
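
To get a rough sense of scale (the exact numbers depend on the state encoding, which is an assumption here), consider a single $84 \times 84$ grayscale frame with 256 intensity levels, a common Atari preprocessing choice: there are $256^{84 \times 84} = 2^{56448} \approx 10^{16992}$ possible frames, so no table could enumerate even a vanishingly small fraction of the state space.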

3. Approximate Q-Learning

3.1 Parameterized Functions

We replace the Q-table with a function $Q(s,a;\theta)$. Examples include:

  1. Linear Models

    • $Q(s,a;\theta) = \sum_i \theta_i \,\phi_i(s,a)$.
    • Works for simpler tasks or hand-crafted feature sets (a minimal sketch follows this list).
  2. Neural Networks

    • $Q(s,a;\theta)$ is represented by a deep neural network that can capture non-linear relationships.
    • More flexible, can handle raw images or complex states.
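
As a concrete illustration of the linear case above, here is a minimal sketch in Python/NumPy. The `featurize` function, the `LinearQ` class, and the toy feature map are illustrative assumptions rather than part of any particular library; a real application would substitute hand-crafted features $\phi(s,a)$.

```python
import numpy as np

def featurize(state, action, num_actions):
    """Toy feature map phi(s, a): copies the state vector into the block for `action`.
    Real tasks would use hand-crafted features (distances, velocities, indicators)."""
    state = np.asarray(state, dtype=float)
    phi = np.zeros(state.size * num_actions)
    phi[action * state.size:(action + 1) * state.size] = state
    return phi

class LinearQ:
    """Q(s, a; theta) = sum_i theta_i * phi_i(s, a)."""
    def __init__(self, state_dim, num_actions):
        self.num_actions = num_actions
        self.theta = np.zeros(state_dim * num_actions)

    def value(self, state, action):
        return float(self.theta @ featurize(state, action, self.num_actions))

    def greedy_action(self, state):
        # argmax_a Q(s, a; theta): no table lookup, just one dot product per action
        return max(range(self.num_actions), key=lambda a: self.value(state, a))

q = LinearQ(state_dim=2, num_actions=4)
print(q.greedy_action([0.5, -1.0]))  # all Q-values are 0 before training, so any action ties
```

Because the same parameter vector $\theta$ scores every state, experience with one state updates the values of all states that share features with it; this is the generalization a table cannot provide.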

3.2 Training via Gradient Descent

Instead of directly updating a table entry, we sample transitions $(s,a,r,s')$, compute a target $r + \gamma \max_{a'} Q(s',a';\theta)$ (as in tabular Q-learning), and use backpropagation to update $\theta$:

$$\theta \leftarrow \theta - \alpha \,\nabla_{\theta} \Bigl[Q(s,a;\theta) - \text{target}\Bigr]^2.$$
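
Below is a minimal sketch of one such update using a small fully connected network in PyTorch. The `QNetwork` name, the layer sizes, and the single hard-coded transition are illustrative assumptions; the target is detached so the gradient flows only through $Q(s,a;\theta)$, in the spirit of the semi-gradient update above.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action."""
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)

state_dim, num_actions, gamma = 8, 4, 0.99
q_net = QNetwork(state_dim, num_actions)
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

# One sampled transition (s, a, r, s'); in practice it comes from interacting with the env.
s = torch.randn(1, state_dim)
a = torch.tensor([2])
r = torch.tensor([1.0])
s_next = torch.randn(1, state_dim)

# Target r + gamma * max_a' Q(s', a'; theta), treated as a constant (no gradient).
with torch.no_grad():
    target = r + gamma * q_net(s_next).max(dim=1).values

# Prediction Q(s, a; theta) for the action actually taken.
q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

# Squared-error loss and one gradient step: theta <- theta - alpha * grad [Q - target]^2
loss = ((q_sa - target) ** 2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```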

4. Advantages

  • Memory Efficiency: A parameterized model (especially a neural network) can generalize across states, avoiding the need to store explicit Q-values for every $(s,a)$.
  • Expressive Power: Neural networks can learn complex state representations from raw inputs (e.g., images in Atari games).
  • Scalability to Large or Continuous State Spaces: Extends Q-learning to environments previously considered unmanageable.

5. Summary

Approximate Q-learning bridges the gap between tabular Q-learning and real-world applications by leveraging parameterized functions. This sets the stage for Deep Q-Learning (DQN), which employs deep neural networks to handle complex tasks such as image-based Atari games, Lunar Lander, and advanced robotics.
