Assignment 5
Reinforcement Learning
Prof. B. Ravindran
1. For a particular finite MDP with bounded rewards, let V be the space of bounded functions
on S, the state space of the MDP. Let Π be the set of all policies, and let vπ be the value
function corresponding to policy π, where π ∈ Π. Is it true that vπ ∈ V, ∀π ∈ Π?
(a) no
(b) yes
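(One standard bound worth recalling here, assuming the discounted setting used throughout the course and writing Rmax for a bound on the magnitude of the rewards and γ < 1 for the discount factor, neither of which is named explicitly in the question: for any policy π and any state s, |vπ(s)| ≤ Rmax (1 + γ + γ^2 + ...) = Rmax/(1 − γ).)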
2. In the proof of the value iteration theorem, we saw that Lvn+1 = Lπ vn+1. Is it true, in general,
that for an arbitrary bounded function v, Lv = Lπ v (disregarding any special conditions that
may exist in the aforementioned proof)?
(a) no
(b) yes
3. Continuing with the previous question, why is it the case that Lvn+1 = Lπ vn+1 in the proof
of the value iteration theorem?
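(For reference, one common way of writing these two operators, using γ for the discount factor and p(s′|s, a) for the transition probabilities (notation the questions above leave implicit), is
(Lv)(s) = max_a [ r(s, a) + γ Σ_s′ p(s′|s, a) v(s′) ]  and
(Lπ v)(s) = Σ_a π(a|s) [ r(s, a) + γ Σ_s′ p(s′|s, a) v(s′) ].)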
5. Recall the problem described in the first question of the previous assignment. Using the MDP
formulation arrived at in that question, and starting with the policy π(laughing) = π(silent) =
(incense, no organ), perform a couple of policy iterations or value iterations (by hand!) until
you find an optimal policy (if you are taking a lot of iterations, stop and reconsider your
formulation!). What are the resulting optimal state-action values for all state-action pairs?
(A small computational sketch for checking your answer is given after the options below.)
(a) q∗(s, a) = 8, ∀a
(b) q∗(s, a) = 10, ∀a
(c) q∗(s, a∗) = 10, q∗(s, a) = −10, ∀a ≠ a∗
(d) q∗(s, a∗) = 10, q∗(s, a) = 8, ∀a ≠ a∗
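If you want to check your hand calculations, here is a minimal policy-iteration sketch in Python for a generic two-state, two-action MDP; the transition probabilities, rewards, and discount factor below are placeholders only, not the formulation from the previous assignment, so substitute your own numbers.

import numpy as np

# Placeholder MDP: 2 states, 2 actions. Replace P, R, gamma with your own formulation.
n_states, n_actions = 2, 2
gamma = 0.9                                    # placeholder discount factor
P = np.array([[[0.8, 0.2], [0.2, 0.8]],        # P[s, a, s'] = transition probability
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[1.0, 0.0],                      # R[s, a] = expected immediate reward
              [0.0, 2.0]])

policy = np.zeros(n_states, dtype=int)         # start from an arbitrary deterministic policy
while True:
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly
    P_pi = P[np.arange(n_states), policy]
    r_pi = R[np.arange(n_states), policy]
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Policy improvement: act greedily with respect to the resulting q-values
    q = R + gamma * P @ v                      # q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] v[s']
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("greedy (optimal) policy:", policy)
print("state-action values q*(s, a):")
print(q)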
6. In the previous question, what does the state value function converge to for the policy we
started off with?
7. In solving an episodic problem, we observe that all trajectories from the start state to the goal
state pass through a particular state exactly twice. In such a scenario, is it preferable to use
first-visit or every-visit MC for evaluating the policy? (A short illustrative sketch is given after
the options below.)
(a) first-visit MC
(b) every-visit MC
(c) every-visit MC with exploring starts
(d) neither, as there are issues with the problem itself
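To make the distinction concrete, here is a small Python sketch of how the two estimators accumulate returns from a single episode in which one state is visited twice; the states, rewards, and discount factor used are invented purely for illustration.

from collections import defaultdict

gamma = 1.0
# One recorded episode as (state, reward received on leaving that state);
# note that s1 is visited twice, as in the scenario described above.
episode = [("s1", 0.0), ("s2", 1.0), ("s1", 5.0)]

# Compute the return G_t following every time step
G, returns = 0.0, [0.0] * len(episode)
for t in range(len(episode) - 1, -1, -1):
    G = episode[t][1] + gamma * G
    returns[t] = G

first_visit, every_visit = defaultdict(list), defaultdict(list)
seen = set()
for t, (s, _) in enumerate(episode):
    every_visit[s].append(returns[t])    # every occurrence of s contributes a return
    if s not in seen:                    # only the first occurrence of s contributes
        first_visit[s].append(returns[t])
        seen.add(s)

# Averaging these lists over many episodes gives the two Monte Carlo estimates of v_pi
print("first-visit returns:", dict(first_visit))
print("every-visit returns:", dict(every_visit))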
8. Which of the following are advantages of Monte Carlo methods over dynamic programming
techniques?
9. For a specific MDP, suppose we have a policy that we want to evaluate using Monte Carlo
methods, relying on actual experience in the environment alone. We decide to use the first-visit
approach along with the technique of always picking the start state at random from the
available set of states. Will this approach ensure complete evaluation of the action-value
function corresponding to the policy? (A sketch of this setup is given after the options below.)
(a) no
(b) yes
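The following Python sketch simply mirrors the setup described in the question: first-visit Monte Carlo estimation of action values where only the start state is randomised and every action thereafter comes from the policy being evaluated. The toy dynamics, policy, and state/action names are invented for illustration and are not part of the question.

import random
from collections import defaultdict

states, actions, gamma = ["s1", "s2"], ["a1", "a2"], 0.9
policy = {"s1": "a1", "s2": "a2"}               # made-up deterministic policy to evaluate

def step(s, a):
    """Toy dynamics: returns (next_state, reward, done); purely illustrative."""
    if s == "s1":
        return ("s2", 1.0, False) if a == "a1" else ("s1", 0.0, False)
    return (None, 5.0, True)                    # from s2 the episode terminates in one step

returns = defaultdict(list)
for _ in range(1000):
    s = random.choice(states)                   # random start STATE; the start action comes from pi
    episode, done = [], False
    while not done:
        a = policy[s]
        s_next, r, done = step(s, a)
        episode.append((s, a, r))
        s = s_next
    G = 0.0
    for t in range(len(episode) - 1, -1, -1):
        s_t, a_t, r_t = episode[t]
        G = r_t + gamma * G
        if (s_t, a_t) not in [(x[0], x[1]) for x in episode[:t]]:   # first visit of (s, a)
            returns[(s_t, a_t)].append(G)

print("(state, action) pairs for which estimates were obtained:", sorted(returns))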
10. Assuming an MDP where there are n actions a ∈ A, each of which is applicable in each state
s ∈ S, if π is an ε-soft policy for some ε > 0, then