Reinforcement Learning
Definition of Intelligence: the ability to learn to make decisions to achieve goals.
"Learning", "decisions", and "goals" are all central concepts
What is RL?
People and animals learn by interacting with their environment
This differs from certain other types of learning:
- It is "active" rather than "passive"
- Interaction is often "sequential" - future interactions can depend on earlier
ones
We are "goal-directed" - we do things with a purpose
We can learn "without examples" of optimal behavior. (Nobody gives you exactly
the low-level actions required to execute the thing you want to do.)
Maybe we do interpret something we see, in some sense, as an example, but
typically at a much higher level of abstraction; we still have to fill in the
details ourselves in order to execute what we want to mimic.
Instead, we optimise some "reward signal"
An agent interacting with the environment:

    agent --action--> environment
    agent <--observation-- environment
The main purpose of this course is to go inside that agent and figure out how we
could build learning algorithms that help the agent learn to interact better.
What does "better" mean here? The agent is going to try to optimize some
"reward signal".
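The interaction loop above can be sketched in code. This is a minimal, hypothetical example (the `Environment` and `Agent` classes, the goal position, and the reward function are all made up for illustration, not from the lecture): the agent picks an action, the environment responds with an observation and a reward, and the cycle repeats.

```python
import random

class Environment:
    """A toy 1-D environment: the agent moves along a line and gets a
    higher (less negative) reward the closer it is to a goal position.
    Hypothetical example for illustration only."""
    def __init__(self, goal=5):
        self.goal = goal
        self.position = 0

    def step(self, action):
        # action is -1 or +1; the environment returns (observation, reward)
        self.position += action
        observation = self.position
        reward = -abs(self.goal - self.position)  # closer to goal = higher reward
        return observation, reward

class Agent:
    """A trivially simple agent that acts at random; a learning agent
    would instead adapt its behaviour based on the rewards it receives."""
    def select_action(self, observation):
        return random.choice([-1, 1])

# The interaction loop: act, observe, receive reward, repeat.
env = Environment()
agent = Agent()
observation = 0
for t in range(10):
    action = agent.select_action(observation)
    observation, reward = env.step(action)
```

A real RL algorithm would replace the random `select_action` with a policy that is updated from the stream of rewards.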
Goal: Optimize the sum of rewards through repeated
interaction (not the immediate reward)
If no goal is specified, then it is unclear what we are actually optimizing and
what the agent will learn to do, so we need some mechanism to specify that goal.
In many cases people put the reward next to the observation, and that is one
useful way to think about this: you take an action, and then the environment
gives you an observation and a reward.
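The distinction between the sum of rewards and the immediate reward can be made concrete with a made-up pair of reward sequences: an agent that always grabs the best immediate reward can end up with a worse total than one that waits for a bigger payoff.

```python
# Two hypothetical reward sequences over a four-step interaction.
rewards_greedy  = [1, 0, 0, 0]   # grabs +1 immediately, nothing after
rewards_patient = [0, 0, 0, 3]   # forgoes the +1, reaches a bigger payoff

# The quantity the agent should optimize is the cumulative reward,
# not the first (immediate) reward.
return_greedy = sum(rewards_greedy)    # 1
return_patient = sum(rewards_patient)  # 3
```

The greedy sequence wins on the immediate reward (1 vs 0) but loses on the cumulative reward (1 vs 3), which is why the goal is stated in terms of the sum.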
Definition of RL:
Reinforcement Learning is based on the "reward hypothesis": "Any goal can be
formalized as the outcome of maximizing a cumulative reward"
It can be hard to specify your goal precisely, or to specify a reward that is
easy to optimize.
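As a small illustration of the reward hypothesis, a goal like "reach position 5" (a hypothetical goal, continuing the toy 1-D setting) can be formalized as a reward that is 1 when the goal is reached and 0 otherwise; maximizing cumulative reward then amounts to achieving the goal.

```python
# Hypothetical example: formalizing the goal "reach position 5"
# as a reward signal, per the reward hypothesis.
def reward(position, goal=5):
    return 1 if position == goal else 0

episode = [0, 1, 2, 3, 4, 5]                 # a trajectory of positions
cumulative = sum(reward(p) for p in episode) # 1: the goal was reached once
```

Note how much easier it is to state the goal this way than to hand the agent the exact low-level actions needed to reach it.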