Reinforcement Learning

Definition of Intelligence: To be able to learn to make decisions to achieve goals.

"Learning, decisions, and goals" - are all central concepts

What is RL?

People and animals learn by interacting with their environment

This differs from certain other types of learning:


- It is "active" rather than "passive"
- Interactions are often "sequential" - future interactions can depend on earlier ones

We are "goal-directed" - we do things with a purpose


We can learn "without examples" of optimal behavior. (Nobody gives you the exact
low-level actions required to execute the thing you want to do.)

We may interpret something we see as an example in some sense, but typically at a
much higher level of abstraction; we still have to fill in the low-level details
ourselves in order to execute what we want to mimic.

Instead, we optimize some "reward signal"

An agent interacting with the environment:

agent ----- action -----> environment
agent <-- observation --- environment

The main purpose of this course is then to go inside that agent and figure out how
we can build learning algorithms that help the agent learn to interact better. What
does "better" mean here? The agent is going to try to optimize some "reward signal".

Goal: Optimize the sum of rewards through repeated interaction (not the immediate reward)
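
Writing R_t for the reward received at step t, the quantity being optimized is the cumulative reward up to some final step T (a standard formalization, not stated explicitly in these notes):

G_t = R_{t+1} + R_{t+2} + ... + R_T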

If no goal is specified, then it's unclear what we're actually optimizing and what
the agent will actually learn to do, so we need some mechanism to specify that goal.
In many cases people put the reward next to the observation, and that's a useful way
to think about it: you take an action, and then the environment gives you an
observation and a reward.
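
A minimal Python sketch of that interaction loop, assuming a hypothetical environment object with reset() and step(action) methods that return an observation, a reward, and a done flag (in the style of common RL toolkits; this interface is an assumption, not something specified in the notes):

def run_episode(agent, env, max_steps=1000):
    """Run one episode of interaction and return the sum of rewards collected."""
    observation = env.reset()  # first observation from the environment
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.select_action(observation)            # agent acts on what it observed
        next_observation, reward, done = env.step(action)    # environment returns observation and reward
        agent.update(observation, action, reward, next_observation)  # agent can learn from this step
        total_reward += reward
        observation = next_observation
        if done:
            break
    return total_reward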

Definition of RL:

Reinforcement Learning is based on the "reward hypothesis": "Any goal can be
formalized as the outcome of maximizing a cumulative reward."

It can be hard to specify your goal precisely, or to specify a reward that is easy
to optimize.
