Analysis of temporal-difference learning with function approximation

J Tsitsiklis, B Van Roy - Advances in neural information processing systems, 1996 - proceedings.neurips.cc

Abstract

We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we analyze performs on-line updating of a parameter vector during a single endless trajectory of an aperiodic, irreducible, finite-state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counter-examples with regard to the significance of on-line updating and linearly parameterized function approximators.
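The on-line update the abstract describes can be illustrated with a minimal sketch. It assumes the discounted-cost form of TD(lambda): a discount factor `alpha`, a transition cost g(i, j), a feature vector phi(i) per state so the approximation is phi(i)'r, and an eligibility trace z. The function name `td_lambda_linear`, its parameters, and the step-size schedule are hypothetical illustrations, not the authors' code.

```python
import random
from typing import Callable, List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def td_lambda_linear(
    P: Sequence[Sequence[float]],         # transition matrix, row i = P(i, .)
    cost: Callable[[int, int], float],    # g(i, j): cost of moving i -> j
    features: Sequence[Sequence[float]],  # phi(i): feature vector of state i
    alpha: float,                         # discount factor, 0 <= alpha < 1 (assumed)
    lam: float,                           # trace parameter lambda in [0, 1]
    num_steps: int,
    step_size: Callable[[int], float],    # gamma_t, hypothetical schedule
    start_state: int = 0,
    seed: int = 0,
) -> List[float]:
    """Sketch of on-line TD(lambda) with linear features along one trajectory."""
    rng = random.Random(seed)
    states = range(len(P))
    k = len(features[0])
    r = [0.0] * k  # parameter vector r_t
    z = [0.0] * k  # eligibility trace z_t
    i = start_state
    for t in range(num_steps):
        # Simulate one transition of the Markov chain from state i.
        j = rng.choices(states, weights=P[i])[0]
        phi_i, phi_j = features[i], features[j]
        # Temporal difference: d_t = g(i, j) + alpha * phi(j)'r - phi(i)'r
        d = cost(i, j) + alpha * _dot(phi_j, r) - _dot(phi_i, r)
        # Trace update: z_t = alpha * lam * z_{t-1} + phi(i_t)
        z = [alpha * lam * zk + fk for zk, fk in zip(z, phi_i)]
        # Parameter update: r_{t+1} = r_t + gamma_t * d_t * z_t
        gamma = step_size(t)
        r = [rk + gamma * d * zk for rk, zk in zip(r, z)]
        i = j
    return r
```

As a usage example, with one-hot (tabular) features on a two-state chain `P = [[0.5, 0.5], [0.2, 0.8]]`, costs 1 and 2 for leaving states 0 and 1, and `alpha = 0.9`, the returned `r` should approach the exact cost-to-go, about `[16.16, 17.53]`. The schedule `0.5 / (1 + t / 1000)` is chosen so the step sizes sum to infinity while their squares do not, the usual stochastic-approximation condition.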

Cited by 2447 · All 28 versions
