1. Introduction
Gymnasium is a powerful toolkit for developing and testing reinforcement learning (RL) algorithms. It provides a diverse set of environments that simulate challenges where RL agents interact, learn from observations, and make decisions to achieve goals. Each environment offers a structured framework to train and test algorithms, making Gymnasium an essential tool for researchers and practitioners.
Environments like CartPole, LunarLander, and many others help bridge theoretical RL concepts with practical implementation, enabling agents to perform tasks, receive feedback through rewards, and improve their decision-making strategies over time.
2. What is CartPole?
The CartPole environment is a classic control problem and one of the most widely used benchmarks in reinforcement learning. The challenge involves balancing a pole upright on a moving cart. CartPole is simple yet effective, making it a popular starting point for learners and a testing ground for RL models.
The goal is to prevent the pole from falling over by moving the cart left or right, based on observations of the system's state.
2a. Goal of the Environment
The objective of the CartPole environment is:
- To keep the pole balanced upright for as long as possible.
- To ensure the cart does not move out of the bounds of the track.
The episode ends if:
- The pole tilts more than ±12 degrees from vertical.
- The cart moves outside the bounds of the track (±2.4 units from the center).
- The episode reaches 500 time steps, in which case it is truncated rather than terminated (the pole was balanced for the full duration).
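In Gymnasium's API, the first two conditions set the terminated flag returned by env.step(), while the 500-step limit sets the truncated flag. A minimal sketch with a random policy (for illustration only) shows which condition ends an episode:

import gymnasium as gym

env = gym.make("CartPole-v1")
state, info = env.reset(seed=42)
steps = 0
while True:
    action = env.action_space.sample()  # random action, for illustration only
    state, reward, terminated, truncated, info = env.step(action)
    steps += 1
    if terminated:  # pole fell past ±12 degrees or cart left the ±2.4 bounds
        print(f"Terminated after {steps} steps")
        break
    if truncated:  # 500-step time limit reached
        print(f"Truncated (balanced for the full episode) after {steps} steps")
        break
env.close()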
2b. Action and Observation Space
- Action Space: The environment has two discrete actions:
  - 0: Push the cart to the left.
  - 1: Push the cart to the right.
- These actions are the agent's choices and directly influence the motion of the cart and the pole's stability.
- Observation Space: The state observation is a 4-dimensional vector containing:
  - Cart Position: Position of the cart on the track.
  - Cart Velocity: Speed and direction of the cart’s motion.
  - Pole Angle: Angle of the pole relative to the vertical.
  - Pole Angular Velocity: Rate of change of the pole's angle.
- These values form the input to the agent's policy or decision-making system.
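A quick way to confirm both spaces is to inspect them directly. This sketch prints the space objects and draws a random sample from each (output values will vary):

import gymnasium as gym

env = gym.make("CartPole-v1")
print(env.action_space)           # Discrete(2)
print(env.action_space.sample())  # a random action: 0 or 1
print(env.observation_space)      # Box of shape (4,): position, velocity, angle, angular velocity
print(env.observation_space.low)  # lower bounds (velocities are unbounded)
print(env.observation_space.high) # upper bounds
env.close()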
2c. Rewards
In CartPole, the agent receives:
- A reward of +1 for every time step that the pole remains upright.
This simple reward structure encourages the agent to balance the pole for as long as possible.
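Because every surviving step is worth +1, an episode's total return equals the number of steps the pole stayed up. A minimal sketch (again with random actions) makes this explicit:

import gymnasium as gym

env = gym.make("CartPole-v1")
state, info = env.reset()
total_reward, steps = 0.0, 0
done = False
while not done:
    state, reward, terminated, truncated, info = env.step(env.action_space.sample())
    total_reward += reward  # +1 per step survived
    steps += 1
    done = terminated or truncated
print(f"Episode return {total_reward} over {steps} steps")  # the two numbers match
env.close()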
2d. Parameters Affecting the Environment
The dynamics of the environment depend on:
- Gravity: The downward force acting on the pole.
- Mass of the Pole and Cart: Influences the system's inertia and response to actions.
- Force Magnitude: Determines the effectiveness of the push action.
- Length of the Pole: Affects the pole's balance and angular velocity.
These parameters define the physical simulation and make the environment challenging to solve optimally.
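These quantities are attributes of the underlying environment, reachable through env.unwrapped. The values shown below reflect the defaults in Gymnasium's implementation; treat the attribute names as implementation details that could change between versions:

import gymnasium as gym

env = gym.make("CartPole-v1")
cartpole = env.unwrapped       # bypass wrappers to reach the raw physics parameters
print(cartpole.gravity)        # 9.8
print(cartpole.masscart)       # 1.0
print(cartpole.masspole)       # 0.1
print(cartpole.length)         # 0.5 (half the pole's length)
print(cartpole.force_mag)      # 10.0
env.close()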
2e. Solving the Environment
The environment is considered "solved" when an agent achieves an average return of at least 475 over 100 consecutive episodes (the reward threshold registered for CartPole-v1; the older CartPole-v0 used 195). Each episode can last up to 500 time steps if the pole remains upright.
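The threshold is recorded in the environment's spec, and a rolling average over the last 100 episode returns is a common way to check it. A sketch, assuming episode_returns is a list you append to after each training episode:

import gymnasium as gym
import numpy as np

print(gym.spec("CartPole-v1").reward_threshold)  # 475.0

episode_returns = []  # your training loop appends each episode's total reward here
if len(episode_returns) >= 100 and np.mean(episode_returns[-100:]) >= 475.0:
    print("Solved!")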
3. Getting Started with CartPole
Here's how to initialize the environment and retrieve basic information:
import gymnasium as gym
# Initialize the CartPole environment
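# render_mode="rgb_array" renders off-screen and returns frames as arrays,
# which is convenient in notebooks; use render_mode="human" for a live window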
env = gym.make("CartPole-v1", render_mode="rgb_array")
# Get the number of actions
n_actions = env.action_space.n
print(f"Number of possible actions: {n_actions}")
print("""
Actions:
0: Push cart to the left
1: Push cart to the right
""")
# Get the number of state observations
state, info = env.reset()
n_observations = len(state)
print(f"Number of state observations: {n_observations}")
print("""
State (Observation Space):
[Cart Position, Cart Velocity, Pole Angle, Pole Angular Velocity]
""")
print("Current state: ", state)
4. Practical Application
The simplicity of the CartPole environment makes it ideal for:
- Learning basic RL algorithms like Deep Q-Networks (DQN).
- Exploring policy-based approaches (e.g., REINFORCE).
- Testing reward functions, exploration strategies, and neural network architectures.
By mastering CartPole, you gain foundational skills to tackle more complex environments, making it an essential step in your reinforcement learning journey.
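As one concrete example of a policy-based approach, here is a minimal REINFORCE sketch in PyTorch. It is a bare-bones illustration, not a tuned implementation: the network size, learning rate, discount factor, and episode count are arbitrary choices, and no baseline or early stopping is used.

import gymnasium as gym
import torch
import torch.nn as nn

# A tiny two-layer policy network: 4 state inputs -> 2 action logits.
# Sizes and hyperparameters are illustrative choices only.
policy = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99  # discount factor

env = gym.make("CartPole-v1")
for episode in range(500):
    state, info = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, terminated, truncated, info = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated
    # Discounted returns, computed backwards from the end of the episode
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize for stability
    loss = -(torch.stack(log_probs) * returns).sum()  # policy-gradient loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (episode + 1) % 50 == 0:
        print(f"Episode {episode + 1}: return {sum(rewards)}")
env.close()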
Source: https://gymnasium.farama.org/environments/classic_control/cart_pole/