Analysis of temporal-diffference learning with function approximation

J Tsitsiklis, B Van Roy - Advances in neural information …, 1996 - proceedings.neurips.cc
Advances in neural information processing systems, 1996proceedings.neurips.cc
We present new results about the temporal-difference learning al (cid: 173) gorithm, as
applied to approximating the cost-to-go function of a Markov chain using linear function
approximators. The algo (cid: 173) rithm we analyze performs on-line updating of a
parameter vector during a single endless trajectory of an aperiodic irreducible finite state
Markov chain. Results include convergence (with probability 1), a characterization of the
limit of convergence, and a bound on the resulting approximation error. In addition to …
Abstract
We present new results about the temporal-difference learning al (cid: 173) gorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algo (cid: 173) rithm we analyze performs on-line updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counter-examples with regards to the Significance of on-line updating and linearly parameterized function approximators.
proceedings.neurips.cc
Showing the best result for this search. See all results