References | 313
[44] Azalia Mirhoseini, et al. "Chip placement with deep reinforcement learning." arXiv preprint arXiv:2004.10746 (2020).
[45] Ilge Akkaya, et al. "Solving Rubik's cube with a robot hand." arXiv preprint arXiv:1910.07113 (2019).
[46] Sergey Levine, et al. "Offline reinforcement learning: Tutorial, review, and perspectives on open problems." arXiv preprint arXiv:2005.01643 (2020).
[47] Todd Hester, et al. "Deep Q-learning from Demonstrations." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. No. 1. 2018.
[48] David Silver, et al. "Reward is enough." Artificial Intelligence 299 (2021): 103535.
Appendix C
[49] horomary. 「DQNの進化史② Double-DQN, Dueling-network, Noisy network」 (The Evolution of DQN, Part 2: Double-DQN, Dueling-network, Noisy network).