Asynchronous RL in TensorFlow + Keras + OpenAI’s Gym
This is a TensorFlow + Keras implementation of asynchronous 1-step Q-learning as described in "Asynchronous Methods for Deep Reinforcement Learning".
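For readers new to the method, here is a minimal NumPy sketch of the 1-step Q-learning target, y = r + gamma * max over a' of Q(s', a'), with no bootstrapping from terminal states. The function name `one_step_q_targets`, its arguments, and the default `gamma` are illustrative assumptions and are not taken from async_dqn.py.

```python
import numpy as np


def one_step_q_targets(rewards, terminals, next_q_values, gamma=0.99):
    """Compute 1-step Q-learning targets for a batch of transitions.

    rewards:       shape (batch,), reward received after each action
    terminals:     shape (batch,), True where the episode ended
    next_q_values: shape (batch, num_actions), Q(s', a') estimates
                   (in DQN-style methods these come from a target network)
    gamma:         discount factor (0.99 is an assumed default)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    terminals = np.asarray(terminals, dtype=bool)
    max_next_q = np.max(np.asarray(next_q_values, dtype=np.float64), axis=1)
    # For terminal transitions the target is just the reward.
    return rewards + gamma * max_next_q * (~terminals)


targets = one_step_q_targets(
    rewards=[1.0, 0.0],
    terminals=[False, True],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    gamma=0.9,
)
```

The network is then trained to move Q(s, a) for the action actually taken toward these targets.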
Since we’re using multiple actor-learner threads to stabilize learning in place of experience replay (which is super memory intensive), this runs comfortably on a MacBook with 4 GB of RAM.
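To make the actor-learner idea concrete, here is a toy sketch of several threads updating one shared parameter vector. It is not the code in async_dqn.py: the regression problem, the function name `run_actor_learners`, and the explicit lock are all assumptions made to keep the example small and deterministic enough to check. The paper's method applies updates without locking.

```python
import threading

import numpy as np


def run_actor_learners(num_threads=4, steps_per_thread=500, lr=0.01):
    """Run several learner threads against one shared weight vector."""
    true_w = np.array([1.0, -2.0])  # toy target the learners should recover
    shared_w = np.zeros(2)          # parameters shared by all threads
    lock = threading.Lock()         # simplification for this sketch

    def actor_learner(thread_id):
        # Each thread draws its own data, loosely analogous to each
        # actor-learner exploring its own copy of the environment.
        rng = np.random.default_rng(thread_id)
        for _ in range(steps_per_thread):
            x = rng.normal(size=2)
            with lock:
                error = shared_w @ x - true_w @ x
                # Gradient step on squared error, written in place so
                # every thread sees the updated parameters.
                shared_w[:] = shared_w - lr * 2.0 * error * x

    threads = [
        threading.Thread(target=actor_learner, args=(i,))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_w


learned_w = run_actor_learners()
```

Because each thread sees different data at any given moment, consecutive updates to the shared parameters are less correlated, which is the role experience replay plays in standard DQN.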
It uses Keras to define the deep Q-network (see model.py), OpenAI’s gym library to interact with the Arcade Learning Environment (see atari_environment.py), and TensorFlow for optimization/execution (see async_dqn.py).
To kick off training, run:
python async_dqn.py --experiment breakout --game "Breakout-v0" --num_concurrent 8
Here we’re organizing the outputs for the current experiment under a folder called ‘breakout’, choosing "Breakout-v0" as our gym environment, and running 8 actor-learner threads concurrently.
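As an illustration of how these three flags fit together, here is a hypothetical parser written with `argparse`. The flag names come from the command above, but the defaults, help strings, and the use of `argparse` itself are assumptions; async_dqn.py may define its flags differently.

```python
import argparse


def build_parser():
    """Build a parser for the three training flags shown above (illustrative)."""
    parser = argparse.ArgumentParser(description="Hypothetical async DQN flags")
    parser.add_argument("--experiment", default="breakout",
                        help="Name of the folder for this experiment's outputs")
    parser.add_argument("--game", default="Breakout-v0",
                        help="Gym environment id")
    parser.add_argument("--num_concurrent", type=int, default=8,
                        help="Number of actor-learner threads")
    return parser


# Parse the same arguments as the training command above.
args = build_parser().parse_args(
    ["--experiment", "breakout", "--game", "Breakout-v0", "--num_concurrent", "8"]
)
```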
Visualizing training with tensorboard
We collect episode reward stats and max Q-values that can be visualized with tensorboard by running the following:
tensorboard --logdir /tmp/summaries/breakout
This is what my per-episode reward and average max q value curves looked like over the training period:
To run a gym evaluation, set the testing flag to True and pass in a checkpoint file:
python async_dqn.py --experiment breakout --testing True --checkpoint_path /tmp/breakout.ckpt-2690000 --num_eval_episodes 100
After completing the eval, we can upload our eval file to OpenAI’s site as follows:
import gym
gym.upload('/tmp/breakout/eval', api_key='YOUR_API_KEY')
Now we can find the eval at https://gym.openai.com/evaluations/eval_uwwAN0U3SKSkocC0PJEwQ
See a3c.py for a work-in-progress asynchronous advantage actor-critic (A3C) implementation.
I found these super helpful as general background materials for deep RL:
- David Silver’s "Deep Reinforcement Learning" lecture
- Nervana’s Demystifying Deep Reinforcement Learning blog post
This has no affiliation with DeepMind or the authors; it is just a simple project I was using to learn TensorFlow. Feedback is highly appreciated.