Reinforcement
Reinforcement
and determining from these what it should do. This is the problem
of reinforcement learning. This chapter only considers fully observable,
single-agent reinforcement learning [although Section 10.4.2 considered a
simple form of multiagent reinforcement learning].
We can formalize reinforcement learning in terms of Markov decision
processes, but in which the agent, initially, only knows the set of possible
states and the set of possible actions. Thus, the dynamics, P(s'|a,s), and the
reward function, R(s,a,s'), are initially unknown. An agent can act in a world
and, after each step, it can observe the state of the world and observe what
reward it obtained.