Reinforcement Learning

Here are some of my results from a Stanford class (CS234, taught by Emma Brunskill) that I took in 2023-24. All of these use test RL environments from Gymnasium, the maintained successor to OpenAI's Gym.

RL-pong-dqn.mp4

Pong (DQN)

Learning Pong with DeepMind's deep Q-learning (DQN). My model's paddle is on the right. The cool thing about this is that it's learning directly from pixels! The input is the last four video frames of gameplay after preprocessing: max-pooling between adjacent frames, converting to grayscale, cropping, rescaling, and stacking, so 210x160x3x4 becomes an 80x80x4 input.
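To make the preprocessing steps concrete, here is a rough NumPy sketch of that pipeline. It is not the class code. The crop rows (34 to 193), the 2x2 block averaging used for rescaling, the luma weights, and the choice to take five raw frames so that each of the four stacked frames has a predecessor to pool with are all my assumptions for illustration, and the function names are made up.

```python
import numpy as np

# Standard luma weights for RGB -> grayscale (an assumed choice).
LUMA = np.array([0.299, 0.587, 0.114], dtype=np.float32)

def preprocess_frame(frame, prev_frame):
    """Turn two consecutive 210x160x3 RGB frames into one 80x80 grayscale image."""
    # Max-pool over the two adjacent frames to remove sprite flicker.
    pooled = np.maximum(frame, prev_frame)
    # Convert to grayscale: (210, 160, 3) @ (3,) -> (210, 160).
    gray = pooled.astype(np.float32) @ LUMA
    # Crop to a 160x160 region (the row range is an assumption).
    cropped = gray[34:194, :]
    # Rescale 160x160 -> 80x80 by averaging each 2x2 block.
    small = cropped.reshape(80, 2, 80, 2).mean(axis=(1, 3))
    return np.clip(np.rint(small), 0, 255).astype(np.uint8)

def stack_frames(raw_frames):
    """Build an 80x80x4 observation from five consecutive raw RGB frames."""
    processed = [
        preprocess_frame(raw_frames[i], raw_frames[i - 1]) for i in range(1, 5)
    ]
    # Stack along a new last axis so the network sees short-term motion.
    return np.stack(processed, axis=-1)
```

Stacking four frames is what lets the network infer the ball's direction and speed, which a single still frame cannot show.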

The following use the REINFORCE policy gradient algorithm, with an optional neural network baseline to reduce variance:

RL-HalfCheetah-no-baseline.mp4

Half-Cheetah

Learns locomotion in a most unlikely way.

RL-CartPole-baseline.mp4

Cart Pole

Learns a balancing act, by applying forces horizontally to the cart.

RL-InvertedPendulum-baseline.mp4

Inverted Pendulum

Similar to the last one, but powered by the MuJoCo physics simulator.
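For the REINFORCE runs above, the core arithmetic is small enough to sketch in plain Python. This is an illustration under my own assumptions, not the class implementation: the function names are hypothetical, the log-probabilities and baseline values are passed in as plain floats, and in a real run they would be autodiff tensors produced by the policy and baseline networks.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def advantages(returns, baseline_values):
    """Subtract the baseline's prediction from each return to reduce variance."""
    return [g - b for g, b in zip(returns, baseline_values)]

def reinforce_loss(log_probs, weights):
    """Surrogate loss: minimizing it follows the policy gradient estimate.

    weights are the returns (no baseline) or the advantages (with baseline).
    """
    return -sum(lp * w for lp, w in zip(log_probs, weights)) / len(log_probs)

# Example: three steps with reward 1 each and gamma = 0.5.
example_returns = discounted_returns([1.0, 1.0, 1.0], 0.5)
example_advantages = advantages(example_returns, [1.0, 1.0, 1.0])
```

Without the baseline, the returns themselves are used as the weights. With it, the baseline network is typically fit by regressing its predictions onto the observed returns, which leaves the gradient estimate unbiased while lowering its variance.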
