Deep Q-Network for Autonomous Lunar Landing

Teaching an RL agent to land on the Moon, from 2D prototypes to 3D physics simulations

The Problem

Autonomous lunar landing is one of the harder control problems in aerospace: the agent must manage fuel, attitude, and descent rate simultaneously, and with no atmosphere there is no aerodynamic braking, so all deceleration has to come from the engines. I wanted to see whether a reinforcement learning agent could learn this from scratch, with no pre-programmed knowledge of flight dynamics.

Phase 1: 2D Prototype

I started with Gymnasium's LunarLander-v2 environment (Gymnasium is the maintained successor to OpenAI Gym), a simplified 2D problem with discrete thrust actions. I implemented a Dueling Double Deep Q-Network (D3QN) in PyTorch that reached a landing success rate above 95% after training.
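The "Double" part of D3QN concerns how the bootstrap target is built: the online network picks the next action and the target network scores it, which reduces the overestimation bias of plain Q-learning. Below is a minimal sketch in dependency-free Python rather than PyTorch. The function name, the discount factor of 0.99, and the example numbers are illustrative assumptions, not values from this project.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Bootstrap target for one transition using the Double DQN rule.

    q_online_next / q_target_next hold the next state's Q-values for every
    discrete action, from the online and target networks respectively.
    gamma=0.99 is an assumed discount factor.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # The online network selects the action...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ...and the target network evaluates it.
    return reward + gamma * q_target_next[best_action]


# Illustrative numbers for an environment with four discrete actions.
q_online = [0.2, 1.5, 0.9, -0.3]
q_target = [0.1, 1.0, 2.0, 0.0]
target = double_dqn_target(1.0, False, q_online, q_target)
terminal_target = double_dqn_target(-100.0, True, q_online, q_target)
```

In this example a vanilla DQN target would bootstrap from the largest target-network value (2.0), while the Double DQN rule uses the target network's value for the action the online network prefers (1.0).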

Key insight: The dueling architecture, splitting the Q-value into state-value and advantage streams, dramatically improved learning stability compared to vanilla DQN.
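A dueling head recombines its two streams as Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)). Subtracting the mean advantage is the commonly used aggregation; whether this project used the mean or another variant is an assumption here. The sketch below shows only the aggregation step in plain Python, with made-up stream outputs.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value with per-action advantages.

    Subtracting the mean advantage pins down the split between the two
    streams: adding a constant to every advantage leaves Q unchanged.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


# Hypothetical outputs of the value and advantage streams for one state.
q = dueling_q_values(2.0, [1.0, -1.0, 0.5, -0.5])

# Shifting all advantages by the same amount gives the same Q-values.
q_shifted = dueling_q_values(2.0, [11.0, 9.0, 10.5, 9.5])
```

Because the mean is removed, the average of the resulting Q-values equals the state-value stream's output, which lets the network learn how good a state is independently of which action is best in it.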

Phase 2: Custom 3D MuJoCo Environment

The 2D environment was a great proof of concept, but real lunar landers operate in 3D with continuous thrust control. I built a custom rigid-body simulation in MuJoCo with accurate 1.62 m/s² lunar gravity, modeling a lander with 5 individually controllable thrusters.

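The state space expanded to 14 dimensions: position (x, y, z), linear velocity (3), orientation quaternion (4), angular velocity (3), and remaining fuel mass (1). More importantly, thrust was now continuous: discretizing five thrusters finely enough would make the number of joint actions grow exponentially, so a Q-network over discrete actions was no longer practical and I switched architectures.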
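As a rough illustration of how such an observation could be assembled, the sketch below concatenates the listed quantities into one flat vector and renormalizes the quaternion first. The function name, argument order, and example values are hypothetical, and nothing here uses the MuJoCo API.

```python
import math


def build_state(position, velocity, quaternion, angular_velocity, fuel_mass):
    """Concatenate lander quantities into one flat observation vector."""
    # Renormalize the quaternion so it stays a valid unit rotation even
    # after numerical drift.
    norm = math.sqrt(sum(q * q for q in quaternion))
    unit_quat = [q / norm for q in quaternion]
    return [*position, *velocity, *unit_quat, *angular_velocity, fuel_mass]


# Hypothetical snapshot: 100 m altitude, descending at 5 m/s, upright.
state = build_state(
    position=(0.0, 0.0, 100.0),
    velocity=(0.0, 0.0, -5.0),
    quaternion=(2.0, 0.0, 0.0, 0.0),  # deliberately unnormalized
    angular_velocity=(0.0, 0.0, 0.0),
    fuel_mass=50.0,
)
```

In practice the components would usually also be scaled to comparable ranges before being fed to the networks, since altitude in metres and quaternion components differ by orders of magnitude.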

Phase 3: DDPG with Hybrid Control

I upgraded to a Deep Deterministic Policy Gradient (DDPG) agent to handle the continuous action space. The key innovation was a hybrid PD-RL attitude controller: the RL agent outputs high-level 3D thrust and torque commands, while a PD controller handles low-level attitude stabilization across the 5 thrusters.
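One way to picture the hybrid scheme is a per-axis PD law whose correction is added to the agent's torque command. How the two signals are actually combined in this project is not specified, so the summation, the gains, the torque limit, and the small-angle error representation below are all assumptions. The mapping from body torque to the 5 individual thrusters depends on the lander geometry and is left out.

```python
def pd_torque(angle_error, angular_rate, kp=4.0, kd=1.5):
    """PD law for one body axis: drive tilt error and its rate toward zero.

    kp and kd are illustrative gains, not tuned values.
    """
    return -kp * angle_error - kd * angular_rate


def hybrid_torque(rl_torque_cmd, angle_errors, angular_rates, max_torque=10.0):
    """Add the RL torque command to the PD correction per axis, then clamp."""
    total = []
    for cmd, err, rate in zip(rl_torque_cmd, angle_errors, angular_rates):
        raw = cmd + pd_torque(err, rate)
        total.append(max(-max_torque, min(max_torque, raw)))
    return total


torque = hybrid_torque(
    rl_torque_cmd=(0.5, 0.0, 0.0),
    angle_errors=(0.1, -0.2, 0.0),  # rad, small tilt about x, y, z
    angular_rates=(0.0, 0.1, 0.0),  # rad/s
)
```

The appeal of this split is that the PD term keeps the vehicle near upright even when the policy is still poor early in training, so the agent mostly has to learn the trajectory-level decisions.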

Training tricks that mattered:

Prioritized Experience Replay (PER) to focus on rare failure modes
Soft target updates (tau = 0.005) instead of hard copies
Gradient clipping to prevent catastrophic policy updates
Shaped reward: fuel efficiency + slow terminal descent + upright stability
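The four items above can each be written in a few lines. The sketch below uses plain Python lists in place of PyTorch tensors so it runs without dependencies. Only tau = 0.005 comes from the list; the PER exponent, the clipping threshold, the reward weights, and all function names are assumptions, and the reward here penalizes descent speed at every step as a simplification of the terminal-descent term.

```python
import math


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: move each target weight a fraction tau toward online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: sampling probability grows with |TD error|.

    alpha=0.6 is an assumed exponent; eps keeps zero-error samples reachable.
    """
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def clip_by_norm(grads, max_norm=1.0):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]


def shaped_reward(fuel_used, vertical_speed, tilt_angle,
                  w_fuel=0.1, w_speed=1.0, w_tilt=0.5):
    """Penalize fuel burn, fast descent, and tilt away from upright.

    The weights are placeholders, not the project's values.
    """
    return -(w_fuel * fuel_used + w_speed * abs(vertical_speed)
             + w_tilt * abs(tilt_angle))


new_target = soft_update([0.0, 1.0], [1.0, 1.0])
probs = per_probabilities([0.1, 2.0])
clipped = clip_by_norm([3.0, 4.0])
gentle = shaped_reward(fuel_used=1.0, vertical_speed=-0.5, tilt_angle=0.0)
harsh = shaped_reward(fuel_used=1.0, vertical_speed=-5.0, tilt_angle=0.3)
```

A full PER implementation would also apply importance-sampling weights to correct for the non-uniform sampling; that step is omitted here for brevity.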

Results & Takeaways

The final DDPG agent achieves reliable soft landings with fuel-efficient trajectories. The hybrid controller approach was essential: pure RL struggled with attitude control, while the PD layer provided the stability the RL agent could build on top of.

This project deepened my understanding of sim-to-real transfer challenges and reinforced my interest in autonomous spacecraft systems. The next step would be domain randomization to prepare for transfer to actual flight hardware.
