Clipped Action Policy Gradient
Model-Based Reinforcement Learning @NIPS2017
Safe and Efficient Off-Policy Reinforcement Learning
Learning Continuous Control Policies by Stochastic Value Gradients
Trust Region Policy Optimization
Effective Modern C++ Item 24: Distinguish universal references from rvalue references.
"Playing Atari with Deep Reinforcement Learning"