RL Cheat Sheet

Definitions
• State (S): the current condition of the environment
• Reward (R): the instant return from the environment, used to appraise the last action
• Value (V): the expected long-term reward with discount, as opposed to the short-term reward R
• Action-Value (Q): similar to Value, except that it takes an extra parameter, the action A
• Policy (π): the agent's approach to determining the next action based on the current state
• Exploitation: using already known information to maximize reward
• Exploration: exploring the environment to capture more information

Discount Factor (γ)
• Varies between 0 and 1
• Closer to 0 → the agent tends to favour the immediate reward
• Closer to 1 → the agent gives future rewards greater weight

Value Function V(s)
• The long-term value of state s
• The state value function V(s) of an MRP is the expected return from state s
• V(s) = E[G_t | S_t = s]

Action Value Function q(s, a)
• q_π(s, a) = E_π[G_t | S_t = s, A_t = a]

Bellman Equation
• V(s) = R(s) + γ E_{s'}[V(s')]
• V(s) = R(s) + γ Σ_{s'∈S} P_ss' V(s')
• In matrix form: V = R + γPV, so V = (I − γP)⁻¹ R
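A minimal numpy sketch of the matrix form above, assuming a made-up 3-state MRP (the transition matrix P, reward vector R and γ value are illustrative, not from this sheet):

import numpy as np

# Illustrative 3-state MRP: P is the state transition matrix (rows sum to 1),
# R is the reward vector, gamma the discount factor (all values are made up).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.0, 0.8],
              [0.0, 0.0, 1.0]])
R = np.array([1.0, -2.0, 0.0])
gamma = 0.9

# Closed-form solution of the Bellman equation: V = (I - gamma * P)^-1 R
V = np.linalg.solve(np.eye(3) - gamma * P, R)
print(V)   # long-term value of each state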
Markov Process
• Consists of a ⟨S, P⟩ tuple, where S is the set of states and P is the state transition matrix
• P_ss' = P(S_{t+1} = s' | S_t = s)
• The state distribution evolves as μ_{t+1} = Pᵀ μ_t, where μ_t = [μ_{t,1} … μ_{t,n}]ᵀ
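A tiny sketch of the distribution update above; the 3-state transition matrix and starting distribution are made-up assumptions:

import numpy as np

# Made-up 3-state transition matrix; row s holds P(s' | s).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.0, 0.8],
              [1.0, 0.0, 0.0]])

mu = np.array([1.0, 0.0, 0.0])   # start in state 0 with probability 1
for t in range(5):
    mu = P.T @ mu                # mu_{t+1} = P^T mu_t
print(mu)                        # state distribution after 5 steps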
Markov Reward Process
• Consists of a ⟨S, P, R, γ⟩ tuple, where R is the reward function and γ is the discount factor
• R(s) = E[R_{t+1} | S_t = s]
• G_t = R_{t+1} + γR_{t+2} + … = Σ_{k=0}^∞ γ^k R_{t+k+1} is the total discounted return
• Why discount: uncertainty about the future may not be fully represented, immediate rewards are valued more than delayed ones, and it avoids infinite returns in cyclic processes
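As a quick illustration of the return G_t, a few lines summing a finite list of example rewards with discounting (the reward values and γ are arbitrary):

# G_t = R_{t+1} + gamma * R_{t+2} + ... for a finite reward sequence
rewards = [1.0, 0.0, -1.0, 2.0]          # example rewards R_{t+1}, R_{t+2}, ...
gamma = 0.9
G_t = sum(gamma**k * r for k, r in enumerate(rewards))
print(G_t)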
Markov Decision Process
• Consists of a ⟨S, A, P, R, γ⟩ tuple, where A is the set of actions
• P^a_ss' = P(S_{t+1} = s' | S_t = s, A_t = a)

Q-Learning
1. Create the reward matrix R, where R_sa is the reward for taking action a in state s, and set the γ parameter.
2. Initialize the Q matrix to 0.
3. Pick a random initial state and assign it as the current state.
4. Select one among all possible actions of the current state:
   a. Use this action to reach the new state s'.
   b. Get the maximum Q value of s' over all its possible actions a'.
   c. Update the Q matrix using
      Q_sa = R_sa + γ × max[Q_s'a']
      ∀ s' accessible from s, ∀ a' available in s'
5. Repeat step 4 until the current state is the goal state.
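A minimal sketch of the tabular procedure above. The environment is a made-up 3-state example in which taking action a from state s simply moves the agent to state a, R[s, a] holds the reward and NaN marks unavailable actions; none of these specifics come from the sheet.

import numpy as np

R = np.array([[np.nan, 0.0,    np.nan],
              [0.0,    np.nan, 100.0 ],
              [np.nan, 0.0,    np.nan]])
gamma, goal = 0.8, 2
n_states = R.shape[0]

Q = np.zeros_like(R)                               # step 2: Q matrix of zeros
rng = np.random.default_rng(0)

for _ in range(200):                               # run many episodes
    s = rng.integers(n_states)                     # step 3: random initial state
    while s != goal:                               # step 5: repeat until goal
        actions = np.where(~np.isnan(R[s]))[0]     # step 4: actions available in s
        a = rng.choice(actions)
        s_next = a                                 # 4a: action a leads to state a
        next_actions = np.where(~np.isnan(R[s_next]))[0]
        max_q = Q[s_next, next_actions].max()      # 4b: max Q over actions in s'
        Q[s, a] = R[s, a] + gamma * max_q          # 4c: update rule
        s = s_next

print(np.round(Q, 1))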

Policy
• π(a|s) = P(A_t = a | S_t = s)
• Either deterministic or stochastic; a deterministic policy puts probability 1 on a single action a_t
• Induced transition kernel: P_π(s'|s) = Σ_a π(a|s) P(s'|s, a)
• One-step expected reward: r_π(s) = Σ_a π(a|s) r(s, a)
• If the reward is a function of the transition: r_π(s) = Σ_a π(a|s) Σ_{s'} P(s'|s, a) r(s, a, s')

Monte Carlo Policy Evaluation
1. Goal: estimate the value function V_π(s).
2. At every time step t at which state s is visited in an episode:
   a. Increment the counter N(s) ← N(s) + 1
   b. Increment the total return S(s) ← S(s) + G_t
3. The value estimate is the mean V(s) = S(s)/N(s).
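A minimal sketch of every-visit Monte Carlo evaluation following the steps above; the two hard-coded episodes stand in for trajectories sampled under π and are purely illustrative:

import numpy as np
from collections import defaultdict

# Each episode is a list of (state, reward) pairs collected while following pi.
episodes = [[(0, 1.0), (1, 0.0), (2, 5.0)],
            [(0, 0.0), (2, 5.0)]]
gamma = 0.9

N = defaultdict(int)     # visit counter N(s)
S = defaultdict(float)   # total return  S(s)

for episode in episodes:
    rewards = [r for _, r in episode]
    for t, (s, _) in enumerate(episode):
        # G_t: discounted return from time t to the end of the episode
        G_t = sum(gamma**k * r for k, r in enumerate(rewards[t:]))
        N[s] += 1                      # step 2a
        S[s] += G_t                    # step 2b

V = {s: S[s] / N[s] for s in N}        # step 3: V(s) = S(s)/N(s)
print(V)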
Relation Between V_π and q_π
• V_π(s) = Σ_{a∈A} π(a|s) q_π(s, a)
• V_π(s) = Σ_{a∈A} π(a|s) { r(s, a) + γ Σ_{s'∈S} P(s'|s, a) V_π(s') }
• When the reward depends only on the state: V_π(s) = r(s) + γ Σ_{a∈A} π(a|s) Σ_{s'} P(s'|s, a) V_π(s')
• q_π(s, a) = r(s, a) + γ Σ_{s'∈S} P(s'|s, a) V_π(s')
• q_π(s, a) = r(s, a) + γ Σ_{s'∈S} P(s'|s, a) Σ_{a'∈A} π(a'|s') q_π(s', a')
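A small numpy sketch of the backups above for a made-up 2-state, 2-action MDP; P, r, π and the assumed V_π values are illustrative only:

import numpy as np

# P[s, a, s'] = P(s'|s, a); r[s, a] = r(s, a)  (all values are made up)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.6, 0.4], [0.0, 1.0]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9
V_pi = np.array([3.0, 4.0])          # assume V_pi is already known/estimated

# q_pi(s, a) = r(s, a) + gamma * sum_s' P(s'|s, a) * V_pi(s')
q_pi = r + gamma * P @ V_pi          # shape (states, actions)

# and back: V_pi(s) = sum_a pi(a|s) q_pi(s, a) for a given policy pi
pi = np.array([[0.5, 0.5],
               [0.1, 0.9]])
V_check = (pi * q_pi).sum(axis=1)
print(q_pi, V_check)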
Optimality Condition
• V_π*(s) = max_π V_π(s) ∀ s ∈ S; similarly for q_π*(s, a)

Policy Gradients
• Trajectory probability: p_θ(s_1, a_1, s_2, a_2, …) = p(s_1) Π_{t=1}^T π_θ(a_t|s_t) p(s_{t+1}|s_t, a_t)
• Goal: θ* = argmax_θ E_{τ~p_θ(τ)}[Σ_t r(s_t, a_t)] = argmax_θ J(θ)
• Sample estimate: J(θ) ≈ (1/N) Σ_i Σ_t r(s_{i,t}, a_{i,t})
• Log-derivative trick: ∇_θ p_θ(τ) = p_θ(τ) ∇_θ log p_θ(τ)
• ∇_θ J(θ) ≈ (1/N) Σ_{i=1}^N [Σ_{t=1}^T ∇_θ log π_θ(a_{i,t}|s_{i,t})] [Σ_{t=1}^T r(s_{i,t}, a_{i,t})]
• Gradient ascent: θ ← θ + α ∇_θ J(θ)
• log π_θ(a_{i,t}|s_{i,t}) is the log probability of the action, i.e. how likely we are to see a_{i,t} as the action in state s_{i,t}
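A sketch of the REINFORCE-style estimator above with a tabular softmax policy π_θ(a|s); the state/action sizes, trajectories and learning rate are placeholder assumptions:

import numpy as np

n_states, n_actions = 3, 2
theta = np.zeros((n_states, n_actions))   # pi_theta(a|s) = softmax(theta[s])_a
alpha = 0.1                               # learning rate

def grad_log_pi(theta, s, a):
    """Gradient of log pi_theta(a|s) w.r.t. theta for a softmax policy."""
    probs = np.exp(theta[s]) / np.exp(theta[s]).sum()
    g = np.zeros_like(theta)
    g[s] = -probs
    g[s, a] += 1.0
    return g

# Each trajectory is a list of (state, action, reward) tuples (made-up rollouts).
trajectories = [[(0, 1, 1.0), (2, 0, 0.0)],
                [(1, 0, 2.0), (0, 1, 1.0)]]

grad_J = np.zeros_like(theta)
for tau in trajectories:                       # (1/N) sum over sampled trajectories
    total_r = sum(r for _, _, r in tau)        # sum_t r(s_t, a_t)
    sum_grad = sum(grad_log_pi(theta, s, a) for s, a, _ in tau)
    grad_J += sum_grad * total_r
grad_J /= len(trajectories)

theta += alpha * grad_J                        # gradient ascent step on J(theta)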
Actor Critic
• Advantage = Q − V
• Q^π(s_t, a_t) = Σ_{t'=t}^T E_{π_θ}[r(s_{t'}, a_{t'}) | s_t, a_t] is the expected reward of taking action a_t in s_t
• V^π(s_t) = E_{a_t~π_θ(a_t|s_t)}[Q^π(s_t, a_t)] is the total expected reward from s_t
• A^π(s_t, a_t) = Q^π(s_t, a_t) − V^π(s_t) measures how much better a_t is than the average action
• ∇_θ J(θ) ≈ (1/N) Σ_{i=1}^N Σ_{t=1}^T ∇_θ log π_θ(a_{i,t}|s_{i,t}) A^π(s_{i,t}, a_{i,t})
• Q^π(s_t, a_t) = r(s_t, a_t) + Σ_{t'=t+1}^T E_{π_θ}[r(s_{t'}, a_{t'}) | s_t, a_t] = r(s_t, a_t) + E_{s_{t+1}~p(s_{t+1}|s_t, a_t)}[V^π(s_{t+1})]
• A^π(s_t, a_t) ≈ r(s_t, a_t) + V^π(s_{t+1}) − V^π(s_t)
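A minimal sketch of the one-step advantage estimate above with a tabular critic; the value estimates and the sampled transition are made-up numbers (with γ = 1 this matches the finite-horizon form on the sheet):

import numpy as np

V = np.array([2.0, 3.5, 0.0])        # critic's value estimates per state (assumed)
gamma = 1.0                          # gamma = 1 gives the undiscounted form above

# A made-up transition (s_t, a_t, r_t, s_{t+1}) from a rollout.
s, a, r, s_next = 0, 1, 1.0, 1

# A(s_t, a_t) ~ r_t + gamma * V(s_{t+1}) - V(s_t)
advantage = r + gamma * V[s_next] - V[s]
print(advantage)

# The actor step then scales the score function by this advantage:
# theta <- theta + alpha * grad_log_pi(theta, s, a) * advantage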
