mooopan

Clipped Action Policy Gradient
Model-Based Reinforcement Learning @NIPS2017
Introduction to ChainerRL
Safe and Efficient Off-Policy Reinforcement Learning
Playing Around with A3C, a Reinforcement Learning Algorithm
Recent DQN
Learning Continuous Control Policies by Stochastic Value Gradients
Trust Region Policy Optimization
Effective Modern C++ Item 24: Distinguish universal references from rvalue references.
"Playing Atari with Deep Reinforcement Learning"