Reinforcement Learning
Geoff Hulten
Reinforcement Learning
• Learning to interact with an environment
  • Robots, games, process control
  • With limited human training
  • Where the ‘right thing’ isn’t obvious
[Diagram: the Agent takes an Action in the Environment; the Environment returns the new State and a Reward]
• Supervised Learning:
  • Goal: learn a function that predicts the label from the features
  • Data: labeled examples <x, y>
• Reinforcement Learning:
  • Goal: Maximize the (discounted) sum of rewards the agent collects
  • Data: <state, action, reward> sequences from interacting with the environment
TD-Gammon – Tesauro ~1995
State: Board State
Actions: Valid Moves
Reward: Win or Lose
• Net with 80 hidden units, initialized to random weights
• Select move based on the network’s estimate of P(win) & a shallow search
• Learn by playing against itself
• 1.5 million games of training -> competitive with world-class players
Atari 2600 games
State: Raw Pixels
Actions: Valid Moves
Reward: Game Score
• Same model/parameters for ~50 games
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
Robotics and Locomotion
State: Joint States/Velocities, Accelerometer/Gyroscope, Terrain
Actions: Apply Torque to Joints
Reward: Velocity – { stuff }
https://youtu.be/hx_bgoTF7bs
2017 paper https://arxiv.org/pdf/1707.02286.pdf
AlphaGo
State: Board State
Actions: Valid Moves
Reward: Win or Lose
• Learning how to beat humans at ‘hard’ games (search space too big)
• Far surpasses (human) supervised learning
• The algorithm learned to outplay humans at chess in 24 hours
https://deepmind.com/documents/119/agz_unformatted_nature.pdf
How Reinforcement Learning is Different
• Delayed Reward
• Agent chooses training data
• Explore vs Exploit (lifelong learning)
• Very different terminology (can be confusing)
Setup for Reinforcement Learning

Markov Decision Process (environment):
• Discrete-time stochastic control process
• Each time step, t:
  • Agent chooses action a_t from the set of available actions
  • Moves to new state s_{t+1} with probability P(s_{t+1} | s_t, a_t) – the probability of moving to each state
  • Receives reward R(s_t, a_t) – the reward for making that move
• Every outcome depends on s_t and a_t
• Nothing depends on previous states/actions (the Markov property)

Policy (agent’s behavior):
• π(s) – the action to take in state s
• Goal: maximize the expected discounted reward, E[ Σ_t γ^t r_t ]
  • γ – tradeoff between immediate and future reward
• V^π(s) – the value of being in state s when following π
Simple Example of Agent in an Environment

State: Map locations <0,0> through <2,2> (a 3x3 grid)
Actions: Move within the map; reaching the chest ends the episode
Reward: 100 at the chest, 0 for all other moves
[Figure: the 3x3 grid of cells <0,0>–<2,2>, with the chest (reward 100) in the corner cell <2,0>]
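The grid world is small enough to write down directly. A minimal sketch (my own illustration, not code from the slides), assuming the chest sits at <2,0> and moves are clipped at the map edges:

# Hypothetical sketch of the slides' 3x3 grid world.
# Entering the chest cell ends the episode with reward 100; everything else is reward 0.
CHEST = (2, 0)
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    """Apply a move, clipping to the 3x3 map. Returns (new_state, reward, done)."""
    x, y = state
    dx, dy = MOVES[action]
    new_state = (min(max(x + dx, 0), 2), min(max(y + dy, 0), 2))
    if new_state == CHEST:
        return new_state, 100, True   # reaching the chest ends the episode
    return new_state, 0, False

print(step((1, 0), "right"))   # -> ((2, 0), 100, True)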
Policies

A policy π specifies the action to take in each state, e.g. “Move to <1,1>”, “Move to <0,1>”, “Move to <1,0>”, “Move to <2,0>”.

Evaluating a policy: V^π(s) is the discounted reward earned by following π starting from state s.
[Figure: the 3x3 grid annotated with one policy’s moves and the resulting values, e.g. 100 next to the chest, 50 one step further back, 12.5 further away]

This policy could be better.
Q learning

Learn a policy that optimizes V for all states, using:
• No prior knowledge of the state transition probabilities: P(s' | s, a)
• No prior knowledge of the reward function: R(s, a)

Approach:
• Initialize the estimate of discounted reward for every state/action pair: Q̂(s, a) = 0
• Repeat (for a while):
  • Take a random action a from state s (see the exploration policy below)
  • Receive s' and r from the environment
  • Update: Q̂(s, a) = r + γ * max_a' Q̂(s', a')
  • Random restart if in a terminal state
• Exploration policy: choose action a_i in state s with probability proportional to 1 / (1 + visits(s, a_i))
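A minimal tabular Q-learning sketch of this loop (my own illustration, building on the grid-world sketch above; the episode count and the deterministic update are assumptions matching the slides’ simple example):

import random
from collections import defaultdict

GAMMA = 0.5                      # discount factor, as in the slides' example
ACTIONS = list(MOVES)            # reuses the grid-world sketch above
Q = defaultdict(float)           # Q[(state, action)], initialized to 0
visits = defaultdict(int)

def choose_action(state):
    """Exploration policy: P(a) proportional to 1 / (1 + visits(s, a))."""
    weights = [1.0 / (1 + visits[(state, a)]) for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights)[0]

for episode in range(5000):
    state = (random.randint(0, 2), random.randint(0, 2))   # random restart
    while state != CHEST:
        action = choose_action(state)
        visits[(state, action)] += 1
        new_state, reward, done = step(state, action)
        # Deterministic update: reward plus discounted value of the best next action
        Q[(state, action)] = reward + GAMMA * max(Q[(new_state, a)] for a in ACTIONS)
        state = new_state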
Example of Q learning (round 1)
• Initialize Q̂(s, a) to 0 for every state/action pair
• Start in a random initial state
• Take a random action; the reward is 0 and the next state’s best Q̂ is 0, so the update leaves Q̂(s, a) = 0
• Take another random action and update again; values stay 0 until a move reaches the chest, which sets that move’s Q̂ to 100
• No more moves possible (the episode ended at the chest), start again…
[Figure: the 3x3 grid showing the Q̂ value of every move, all still 0 except the move into the chest]
Example of Q learning (round 2), γ = 0.5
• Start in a new random initial state
• Take a random action; where no reward has propagated yet, the update still leaves Q̂ = 0
• Take a random action that lands next to the chest and update: Q̂(s, a) = 0 + 0.5 * 100 = 50
• No more moves possible, start again…
[Figure: the 3x3 grid; the move into the square adjacent to the chest now has Q̂ = 50]
Example of Q learning (some acceleration…), γ = 0.5
• Keep starting in random states and taking exploratory actions
• Whenever a move lands in a state whose best action already has a nonzero Q̂, the update propagates value backward: 0 + 0.5 * 100 = 50, then 0 + 0.5 * 50 = 25, …
[Figure: the 3x3 grid; Q̂ values of 100, 50, and 25 spreading outward from the chest]
Example of Q learning (some acceleration…), γ = 0.5
• More runs push the values further from the chest: moves two steps away reach Q̂ = 25, the next ring reaches 12.5, and so on
[Figure: the 3x3 grid; Q̂ values of 100, 50, and 25 now cover most of the map]
Example of Q learning (after many, many runs…)
• Q̂ has converged: with γ = 0.5 the values halve with each step away from the chest (100, 50, 25, 12.5, 6.25)
• The learned policy is π(s) = argmax_a Q̂(s, a): in every state, take the highest-valued action, which always moves toward the chest
[Figure: the 3x3 grid with the converged Q̂ value for every state/action pair]
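Reading the policy out of the converged table is just an argmax. A small follow-on sketch (same assumptions as the Q-learning sketch above):

def greedy_policy(state):
    """pi(s) = argmax_a Q_hat(s, a): act greedily with respect to the learned values."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

print(greedy_policy((0, 2)))   # after training, expected to head toward the chest at (2, 0)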
Challenges for Reinforcement Learning
• When there are many states and actions
• When the episode can end without reward
• When there is a ‘narrow’ path to reward
[Figure: an agent crossing a rope with 15 turns remaining; each step has ~50% probability of falling off the rope, and random exploring goes the wrong way ~97% of the time, so P(reaching goal) ~ 0.01%]
Reward Shaping
• Hand craft intermediate objectives that yield reward
• Encourage the right type of exploration
• Requires custom human work
• Risk of learning to game the rewards
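A hypothetical illustration in the grid world above (my own example, not from the slides): wrap the environment’s step with a small bonus for moving closer to the chest, so exploration gets intermediate feedback.

def shaped_step(state, action):
    """Wrap step() with a hand-crafted shaping bonus (illustrative only)."""
    new_state, reward, done = step(state, action)
    # Manhattan distance to the chest before and after the move
    old_dist = abs(state[0] - CHEST[0]) + abs(state[1] - CHEST[1])
    new_dist = abs(new_state[0] - CHEST[0]) + abs(new_state[1] - CHEST[1])
    bonus = 1 if new_dist < old_dist else 0   # small reward for progress
    return new_state, reward + bonus, done

The gaming risk is visible even here: an agent could learn to oscillate back and forth to keep collecting the bonus instead of ending the episode at the chest, so the bonus has to stay small relative to the real reward.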
Memory
• Retrain on previous explorations: replay an exploration, replay a different exploration, do it a bunch of times
• Maintain samples of past transitions: <s, a, r, s'>
• Useful when it is cheaper to use some RAM/CPU than to run more simulations
• It is hard to get to reward, so you want to leverage it as much as possible when it happens
[Figure: the grid world with stored explorations being replayed to update Q̂]
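A minimal sketch of this idea (my own illustration, reusing the Q-learning sketch above): keep the observed transitions and re-apply the update to random samples of them between episodes.

replay_buffer = []   # stored <s, a, r, s'> samples

def remember(state, action, reward, new_state):
    replay_buffer.append((state, action, reward, new_state))

def replay(num_samples=32):
    """Re-apply the Q update to randomly chosen stored transitions."""
    for state, action, reward, new_state in random.sample(
            replay_buffer, min(num_samples, len(replay_buffer))):
        Q[(state, action)] = reward + GAMMA * max(Q[(new_state, a)] for a in ACTIONS)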
Gym – toolkit for reinforcement learning

CartPole: reward is +1 per step the pole remains up
MountainCar: reward is 200 at the flag, -1 per step

import gym
import random
import QLearning # Your implementation goes here...
import Assignment7Support

env = gym.make('CartPole-v0')
trainingIterations = 20000

qlearner = QLearning.QLearning(<Parameters>)

for trialNumber in range(trainingIterations):
    observation = env.reset()
    reward = 0
    for i in range(300):
        env.render() # Comment out to make much faster...
        currentState = ObservationToStateSpace(observation)
        action = qlearner.GetAction(currentState, <Parameters>)

        oldState = ObservationToStateSpace(observation)
        observation, reward, isDone, info = env.step(action)
        newState = ObservationToStateSpace(observation)

        qlearner.ObserveAction(oldState, action, newState, reward, …)

        if isDone:
            if (trialNumber % 1000) == 0:
                print(trialNumber, i, reward)
            break

# Now you have a policy in qlearner – use it...

https://gym.openai.com/docs/
Some Problems with QLearning
• State space is continuous
  • Must approximate by discretizing
• Treats states as identities
  • No knowledge of how states relate
  • Requires many iterations to fill in
• Converging can be difficult with randomized transitions/rewards

CartPole’s observation ranges:
print(env.observation_space.high)
#> array([ 2.4 , inf, 0.20943951, inf])
print(env.observation_space.low)
#> array([-2.4 , -inf, -0.20943951, -inf])
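One common way to handle the continuous state space is to clip and bucket each observation dimension. A sketch of what an ObservationToStateSpace-style helper might do (the bounds and bin count here are illustrative assumptions, not the assignment’s values):

import numpy as np

# Illustrative bounds for CartPole's 4 observations:
# cart position, cart velocity, pole angle, pole angular velocity.
LOW  = np.array([-2.4, -3.0, -0.20943951, -3.5])
HIGH = np.array([ 2.4,  3.0,  0.20943951,  3.5])
BINS = 10

def observation_to_state(observation):
    """Discretize a continuous observation into a tuple of bucket indices."""
    clipped = np.clip(observation, LOW, HIGH)
    scaled = (clipped - LOW) / (HIGH - LOW)               # each dimension mapped to [0, 1]
    buckets = np.minimum((scaled * BINS).astype(int), BINS - 1)
    return tuple(buckets)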
Policy Gradients
• Q-learning -> learn a value function
  • Q̂(s, a) = an estimate of the expected discounted reward of taking action a from state s
  • Performance time: take the action that has the highest estimated value
• Policy Gradient -> learn the policy directly
  • A probability distribution over actions, π(a | s)
  • Performance time: choose an action by sampling from the distribution
Example from: https://www.youtube.com/watch?v=tqrcjHuNdmQ
Policy Gradients
• Receive a frame
• Forward propagate to get the probability of each action
• Select an action by sampling from that distribution
• Find the gradient that makes the selected action more likely – store it (one gradient per action taken)
• Play the rest of the game
• If won, take a step in the direction of the stored gradients; if lost, take a step in the opposite direction
• Sum the gradients and step in the correct direction
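A bare-bones sketch of that loop (my own illustration with a tiny logistic policy over made-up features; env_step is a hypothetical callback, not the Pong setup from the video):

import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.01, size=4)   # tiny linear policy over a 4-feature state

def p_action_1(state):
    """Probability of choosing action 1 under a logistic policy."""
    return 1.0 / (1.0 + np.exp(-np.dot(weights, state)))

def play_episode(env_step, initial_state, learning_rate=0.01):
    """One REINFORCE-style episode: store per-action gradients, then step by the outcome."""
    global weights
    grads, state, done = [], initial_state, False
    while not done:
        p1 = p_action_1(state)
        action = 1 if rng.random() < p1 else 0
        grads.append((action - p1) * state)      # gradient of log pi(action | state)
        state, reward, done = env_step(state, action)
    outcome = 1.0 if reward > 0 else -1.0        # won vs lost
    weights += learning_rate * outcome * np.sum(grads, axis=0)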
Policy Gradients – reward shaping
Not relevant to outcome(?)
Less important to outcome More important to outcome
Summary

Reinforcement Learning:
• Goal: Maximize the discounted sum of rewards
• Data: <state, action, reward> sequences from interacting with the environment
[Diagram: the Agent/Environment loop of Action, State, and Reward]

Many (awesome) recent successes:
• Robotics
• Surpassing humans at difficult games
• Doing it with (essentially) zero human knowledge

(Simple) Approaches:
• Q-Learning -> learn the discounted reward of each action, Q̂(s, a)
• Policy Gradients -> learn a probability distribution over actions, π(a | s)

Challenges:
• When the episode can end without reward
• When there is a ‘narrow’ path to reward
• When there are many states and actions

Things that help:
• Reward Shaping
• Memory
• Lots of parameter tweaking…