1. Introduction
Tabular Q-learning becomes infeasible for high-dimensional or continuous state spaces. Instead of storing a Q-table with an entry for every $(s,a)$ pair, we use a parameterized function $Q(s,a;\theta)$. The agent learns parameters $\theta$ (e.g., weights of a neural network) that can *generalize* from a limited set of experiences to unseen states.
2. Limitations of Tabular Q-Learning
- Scalability: Tabular methods are fine for Grid World and other small tasks, but problems such as Lunar Lander, Tetris, or Atari games have enormous state spaces.
- Continuous State Spaces: Many real-world tasks (robotics, self-driving cars) have continuous states that can’t be enumerated in a table.
- Memory Constraints: Storing $|S| \times |A|$ Q-values is infeasible when $|S|$ is huge.
3. Approximate Q-Learning
3.1 Parameterized Functions
We replace the Q-table with a function $Q(s,a;\theta)$. Examples include:
- Linear Models
  - $Q(s,a;\theta) = \sum_i \theta_i \,\phi_i(s,a)$.
  - Works well for simpler tasks or hand-crafted feature sets.
- Neural Networks
  - $Q(s,a;\theta)$ is represented by a deep neural network that can capture non-linear relationships.
  - More flexible; can handle raw images or other complex states.
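As a concrete illustration, here is a minimal sketch of a linear Q-function approximator. The class name `LinearQ` and the feature map `phi` are hypothetical placeholders; in practice the features are designed per task.

```python
import numpy as np

class LinearQ:
    """Linear Q-function: Q(s, a; theta) = theta . phi(s, a)."""

    def __init__(self, num_features):
        self.theta = np.zeros(num_features)  # one weight per feature

    def phi(self, state, action):
        # Hypothetical hand-crafted feature map phi(s, a); a real task would
        # supply meaningful features (distances, indicators, polynomial terms, ...).
        return np.asarray(state, dtype=float) * (action + 1)

    def q_value(self, state, action):
        # Q(s, a; theta) = sum_i theta_i * phi_i(s, a)
        return self.theta @ self.phi(state, action)

    def greedy_action(self, state, actions):
        # Pick the action with the highest approximate Q-value.
        return max(actions, key=lambda a: self.q_value(state, a))
```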
3.2 Training via Gradient Descent
Instead of directly updating a table entry, we sample transitions $(s,a,r,s')$, compute a target $r + \gamma \max_{a'} Q(s',a';\theta)$ as in standard Q-learning, and use gradient descent (backpropagation for neural networks) to update $\theta$:
$$\theta \leftarrow \theta - \alpha \nabla_{\theta} \Bigl[Q(s,a;\theta) - \text{target}\Bigr]^2.$$
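For the linear model sketched above, $\nabla_{\theta} Q(s,a;\theta)$ is simply the feature vector $\phi(s,a)$, so the update has a closed form. The sketch below is a semi-gradient Q-learning step under that assumption (the target is treated as a constant, and the factor of 2 from the squared error is absorbed into $\alpha$); the hyperparameters `alpha` and `gamma` are illustrative values, not prescribed by the text.

```python
def q_learning_update(model, transition, actions, alpha=0.01, gamma=0.99):
    """One semi-gradient Q-learning step on a sampled transition (s, a, r, s')."""
    s, a, r, s_next, done = transition  # 'done' flags terminal states

    # Target: r + gamma * max_a' Q(s', a'; theta); no bootstrap at terminal states.
    target = r
    if not done:
        target += gamma * max(model.q_value(s_next, a2) for a2 in actions)

    # TD error and gradient step; for a linear model, grad_theta Q(s,a;theta) = phi(s,a).
    td_error = model.q_value(s, a) - target
    model.theta -= alpha * td_error * model.phi(s, a)
    return td_error
```

For a neural network, the same target is used, but the gradient step is performed by backpropagation through the network instead of the closed-form linear update.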
4. Advantages
- Memory Efficiency: A parameterized model (especially a neural network) can generalize across states, avoiding the need to store explicit Q-values for every $(s,a)$.
- Expressive Power: Neural networks can learn complex state representations from raw inputs (e.g., images in Atari games).
- Scalability to Large or Continuous Spaces: Extends Q-learning to environments previously considered unmanageable.
5. Summary
Approximate Q-learning bridges the gap between tabular Q-learning and real-world applications by leveraging parameterized functions. This sets the stage for Deep Q-Learning (DQN), which employs deep neural networks to handle complex tasks such as image-based Atari games, Lunar Lander, or advanced robotics.