Reinforcement

Reinforcement learning involves an agent receiving rewards and punishments to determine its actions in an environment. The agent initially only knows the possible states and actions, not the dynamics or reward function. It can act and observe the state, then receive a reward. Reinforcement learning is difficult because the agent must determine which past actions led to a reward or punishment, as the responsible actions may have occurred long before and involved a combination of circumstances.

Uploaded by

Ninni Singh
Copyright
© All Rights Reserved

Imagine a robot that can act in a world, receiving rewards and punishments
and determining from these what it should do. This is the problem
of reinforcement learning. This chapter only considers fully observable,
single-agent reinforcement learning [although Section 10.4.2 considered a
simple form of multiagent reinforcement learning].
We can formalize reinforcement learning in terms of Markov decision
processes in which the agent initially knows only the set of possible
states and the set of possible actions. Thus, the dynamics, P(s'|a,s), and the
reward function, R(s,a,s'), are initially unknown. An agent can act in a world
and, after each step, it can observe the state of the world and observe what
reward it obtained.
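
The interaction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state world, its 0.9 transition probability, its reward rule, and the names TwoStateWorld and run are all hypothetical and do not come from the text. The point is that the agent sees only the sets of states and actions, plus the state and reward returned after each step; the dynamics and reward function stay hidden inside the environment.

```python
import random


class TwoStateWorld:
    """Hypothetical environment; its dynamics and rewards are hidden from the agent."""

    states = ["s0", "s1"]        # known to the agent
    actions = ["stay", "move"]   # known to the agent

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.state = "s0"

    def step(self, action):
        # Hidden dynamics P(s'|a,s): "move" switches state with probability 0.9
        # (an assumed value); "stay" never changes the state.
        if action == "move" and self._rng.random() < 0.9:
            next_state = "s1" if self.state == "s0" else "s0"
        else:
            next_state = self.state
        # Hidden reward R(s,a,s'): 1 for going from s0 to s1, otherwise 0 (assumed).
        reward = 1.0 if (self.state == "s0" and next_state == "s1") else 0.0
        self.state = next_state
        return next_state, reward


def run(env, num_steps, seed=1):
    """Act at random, recording each experience as (s, a, r, s')."""
    rng = random.Random(seed)
    history = []
    state = env.state
    for _ in range(num_steps):
        action = rng.choice(env.actions)
        next_state, reward = env.step(action)   # observe state and reward
        history.append((state, action, reward, next_state))
        state = next_state
    return history


history = run(TwoStateWorld(), num_steps=20)
```

Everything a learner could use is in history: it would have to estimate the dynamics and rewards, or a good policy directly, from such experiences alone.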

Reinforcement learning is difficult for a number of reasons:


The blame attribution problem is the problem of determining
which action was responsible for a reward or punishment. The
responsible action may have occurred a long time before the reward
was received. Moreover, not a single action but rather a combination
of actions carried out in the appropriate circumstances may be
responsible for the reward. For example, you could teach an agent to
play a game by rewarding it when it wins and punishing it when it loses;
it must determine the brilliant moves that were needed to win. You may
try to train a dog by saying "bad dog" when you come home and find a mess.
The dog has to determine, out of all of the actions it did, which were
responsible for the reprimand.
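
A toy sketch may make the difficulty concrete. The game below is entirely hypothetical (the function play_episode, the rule that a win needs good moves at steps 2 and 5, and the reward values are assumptions made for illustration). The agent receives a reward only at the end of the episode, so the reward sequence by itself says nothing about which earlier moves mattered.

```python
import random


def play_episode(num_moves=10, seed=0):
    """Hypothetical game: a win requires 'good' moves at steps 2 and 5.

    The agent is not told this rule; it sees only its moves and the rewards.
    """
    rng = random.Random(seed)
    moves = [rng.choice(["good", "bad"]) for _ in range(num_moves)]
    won = moves[2] == "good" and moves[5] == "good"
    # No feedback until the final step: +1 for a win, -1 for a loss (assumed values).
    rewards = [0.0] * (num_moves - 1) + [1.0 if won else -1.0]
    return moves, rewards


moves, rewards = play_episode()
for step, (move, reward) in enumerate(zip(moves, rewards)):
    print(step, move, reward)
```

Every step but the last shows a reward of zero, and the final reward arrives together with the last move, which in this game has no influence on the outcome. The responsible moves occurred earlier and only in combination, which is the situation the text describes.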
