1. What is Lunar Lander v3?
The Lunar Lander v3 environment, part of Gymnasium (the maintained successor to OpenAI Gym), simulates the challenge of controlling a lunar landing module so that it touches down safely on a designated landing pad. It provides a robust benchmark for training and evaluating reinforcement learning (RL) algorithms.
The environment is physics-based: it captures the dynamics of the lander under gravity and thruster forces, which makes it a popular choice for testing RL models on both discrete and continuous control tasks.
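A minimal setup looks like the sketch below. It assumes Gymnasium is installed with the Box2D extra (e.g. `pip install "gymnasium[box2d]"`); the environment id `LunarLander-v3` is the Gymnasium registration name.

```python
import gymnasium as gym

# Discrete action space (the default)
env = gym.make("LunarLander-v3")
print(env.action_space)        # Discrete(4)
print(env.observation_space)   # Box with shape (8,), fields described in section 1b

# Continuous action space
env_cont = gym.make("LunarLander-v3", continuous=True)
print(env_cont.action_space)   # Box with shape (2,)

obs, info = env.reset(seed=42)
env.close()
env_cont.close()
```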
1a. Goal of the Environment
The primary goal in Lunar Lander v3 is to land the lunar module:
- Safely: The module should land upright on the designated platform.
- Efficiently: Fuel usage should be minimized.
Points are awarded based on landing accuracy, and penalties are given for crashing or using excessive fuel.
1b. Action and Observation Space
- Action Space:
  - Discrete mode (the default): there are 4 discrete actions:
    - 0: do nothing.
    - 1: fire the left orientation engine.
    - 2: fire the main engine (upward thrust).
    - 3: fire the right orientation engine.
  - Continuous mode: the action is a 2-dimensional vector of continuous values, one controlling the main engine throttle and the other controlling the lateral (side) thrusters.
- Observation Space: The observation is an 8-dimensional vector that includes:
  - x: horizontal position of the lander (meters).
  - y: vertical position of the lander (meters).
  - vx: horizontal velocity (meters/second).
  - vy: vertical velocity (meters/second).
  - θ: angle of the lander (radians).
  - ω: angular velocity (radians/second).
  - leg1_contact: boolean indicating whether the left leg has touched the ground.
  - leg2_contact: boolean indicating whether the right leg has touched the ground.
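The observation list above maps directly onto the array returned by the environment. The following sketch (again assuming `gymnasium[box2d]` is installed) unpacks one observation and samples a discrete action:

```python
import gymnasium as gym

env = gym.make("LunarLander-v3")
obs, info = env.reset(seed=0)

# Unpack the 8 fields in the order listed above
x, y, vx, vy, theta, omega, leg1, leg2 = obs
print(f"position=({x:.2f}, {y:.2f})  velocity=({vx:.2f}, {vy:.2f})")
print(f"angle={theta:.2f} rad  angular velocity={omega:.2f} rad/s")
print(f"leg contacts: left={bool(leg1)}, right={bool(leg2)}")

action = env.action_space.sample()  # random integer in {0, 1, 2, 3}
obs, reward, terminated, truncated, info = env.step(action)
env.close()
```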
1c. Rewards
The reward system in Lunar Lander v3 is designed to encourage safe and accurate landings:
- Positive Rewards:
- Landing close to the target platform.
- Coming to rest with minimal velocity.
- Keeping the module upright.
- Negative Rewards:
- Crashing the lander.
- Drifting too far from the platform.
- Excessive use of fuel: each frame an engine fires incurs a small penalty, larger for the main engine than for the side engines.
Coming to rest safely on the landing pad yields a large terminal bonus, while crashing incurs a correspondingly large penalty.
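The reward arrives incrementally at every step, so the quantity an agent maximizes is the total return of an episode. A minimal sketch of accumulating it with a random policy (which will typically score poorly) looks like this:

```python
import gymnasium as gym

env = gym.make("LunarLander-v3")
obs, info = env.reset(seed=1)

total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random actions: expect a low episode return
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # shaping terms, fuel penalties, and the
                                        # terminal landing bonus / crash penalty

print(f"episode return: {total_reward:.1f}")
env.close()
```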
1d. Parameters Affecting the Lander
The behavior of the lander is influenced by several parameters, most of which can be configured when the environment is created:
- Gravity: Constant downward force simulating the Moon's gravitational pull.
- Thruster Force: Determines how much acceleration the engines produce.
- Fuel Consumption: Actions consume fuel, penalizing inefficient strategies.
- Wind (Optional): Random lateral forces can be added for additional complexity.
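These physics settings are exposed as keyword arguments to `gym.make` in recent Gymnasium releases; the names below reflect that API, but it is worth verifying them against the documentation for your installed version.

```python
import gymnasium as gym

env = gym.make(
    "LunarLander-v3",
    gravity=-10.0,          # downward acceleration (default -10.0)
    enable_wind=True,       # optional random lateral forces
    wind_power=15.0,        # magnitude of the linear wind effect
    turbulence_power=1.5,   # magnitude of the rotational wind effect
)
obs, info = env.reset(seed=3)
env.close()
```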
1e. Solving the Environment
The environment is considered "solved" when an RL agent achieves an average return of at least 200 over 100 consecutive episodes.
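That criterion is straightforward to check once you have a policy. In the sketch below, `policy` is a hypothetical placeholder for any function mapping an observation to an action:

```python
import gymnasium as gym
import numpy as np

def evaluate(policy, n_episodes=100):
    """Average return of `policy` over n_episodes episodes of LunarLander-v3."""
    env = gym.make("LunarLander-v3")
    returns = []
    for ep in range(n_episodes):
        obs, info = env.reset(seed=ep)
        total, terminated, truncated = 0.0, False, False
        while not (terminated or truncated):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            total += reward
        returns.append(total)
    env.close()
    return np.mean(returns)

# solved = evaluate(my_policy) >= 200
```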
2. Use in Reinforcement Learning
Lunar Lander v3 is an excellent platform for testing RL techniques like:
- Policy Gradient Methods (e.g., PPO, A3C).
- Value-Based Methods (e.g., DQN), which rely on the discrete action space.
- Actor-Critic Algorithms for continuous control (e.g., SAC, DDPG), which use the continuous action space.
The availability of both discrete and continuous action spaces makes it a versatile benchmark for both families of algorithms.
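As one concrete illustration, the sketch below trains PPO on the discrete environment using stable-baselines3, a third-party library (version 2.0 or later supports Gymnasium). The hyperparameters are illustrative, not tuned; reaching the solved threshold typically needs more training steps.

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v3")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200_000)   # illustrative budget; tune for better scores

# Greedy rollout with the trained policy
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```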