Reinforcement Learning - Unit 6 - Week 3

Week 3: Assignment 3

Your last recorded submission was on 2024-08-14, 16:22 IST. Due date: 2024-08-14, 23:59 IST.
1) The baseline in the REINFORCE update should not depend on which of the following (without voiding any of the steps in the proof of REINFORCE)? (1 point)

$r_{n-1}$
$r_n$
Action taken ($a_n$)
None of the above
2) Which of the following statements is true about the RL problem? (1 point)

Our main aim is to maximize the cumulative reward.
The agent always performs the actions in a deterministic fashion.
We assume that the agent determines the next state based on the current state and action.
It is impossible to have zero rewards.
3) Let us say we are taking actions according to a Gaussian distribution with parameters $\mu$ and $\sigma$. We update the parameters according to REINFORCE, and let $a_t$ denote the action taken at step $t$. (1 point)

(i) $\mu_{t+1} = \mu_t + \alpha r_t \frac{\mu_t - a_t}{\sigma_t^2}$

(ii) $\sigma_{t+1} = \sigma_t + \alpha r_t \left( \frac{(a_t - \mu_t)^2}{\sigma_t^3} - \frac{1}{\sigma_t} \right)$

(iii) $\sigma_{t+1} = \sigma_t + \alpha r_t \frac{(a_t - \mu_t)^2}{\sigma_t^3}$

(iv) $\mu_{t+1} = \mu_t + \alpha r_t \frac{a_t - \mu_t}{\sigma_t^2}$

Which of the above updates are correct?

(i), (iii)
(i), (iv)
(ii), (iv)
(ii), (iii)
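For a Gaussian policy, the REINFORCE score functions are $\frac{\partial \ln \mathcal{N}(a;\mu,\sigma)}{\partial \mu} = \frac{a - \mu}{\sigma^2}$ and $\frac{\partial \ln \mathcal{N}(a;\mu,\sigma)}{\partial \sigma} = \frac{(a - \mu)^2}{\sigma^3} - \frac{1}{\sigma}$. The sketch below plugs these into the REINFORCE rule on a one-step problem; the reward function, learning rate, iteration count, and lower clamp on $\sigma$ are all made-up assumptions for illustration, not part of the assignment.

```python
import numpy as np

# Minimal REINFORCE sketch for a 1-D Gaussian policy (illustrative only).
rng = np.random.default_rng(0)

def reward(a):
    return np.exp(-(a - 2.0) ** 2)  # hypothetical reward, peaked at a = 2

mu, sigma, alpha = 0.0, 1.0, 0.01
for t in range(20000):
    a = rng.normal(mu, sigma)                            # a_t ~ N(mu_t, sigma_t)
    r = reward(a)
    grad_mu = (a - mu) / sigma ** 2                      # d ln pi / d mu
    grad_sigma = (a - mu) ** 2 / sigma ** 3 - 1 / sigma  # d ln pi / d sigma
    mu += alpha * r * grad_mu
    sigma += alpha * r * grad_sigma
    sigma = max(sigma, 0.2)  # practical guard: keep sigma positive

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")  # mu should drift toward 2
```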
4) The update in REINFORCE is given by $\theta_{t+1} = \theta_t + \alpha r_t \frac{\partial \ln \pi(a_t;\theta_t)}{\partial \theta_t}$, where $r_t \frac{\partial \ln \pi(a_t;\theta_t)}{\partial \theta_t}$ is an unbiased estimator of the true gradient of the performance function. However, there is another variant of REINFORCE in which a baseline $b$ that is independent of the action taken is subtracted from the obtained reward, i.e., the update is given by $\theta_{t+1} = \theta_t + \alpha (r_t - b) \frac{\partial \ln \pi(a_t;\theta_t)}{\partial \theta_t}$. How are $E\!\left[(r_t - b) \frac{\partial \ln \pi(a_t;\theta_t)}{\partial \theta_t}\right]$ and $E\!\left[r_t \frac{\partial \ln \pi(a_t;\theta_t)}{\partial \theta_t}\right]$ related? (1 point)

$E\!\left[(r_t - b) \frac{\partial \ln \pi(a_t;\theta_t)}{\partial \theta_t}\right] > E\!\left[r_t \frac{\partial \ln \pi(a_t;\theta_t)}{\partial \theta_t}\right]$
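The comparison hinges on the fact that, for an action-independent baseline, $E\!\left[b \frac{\partial \ln \pi(a_t;\theta_t)}{\partial \theta_t}\right] = b \sum_a \pi(a;\theta_t) \frac{\partial \ln \pi(a;\theta_t)}{\partial \theta_t} = b \frac{\partial}{\partial \theta_t} \sum_a \pi(a;\theta_t) = 0$. Here is a small exact check with a three-arm softmax policy; the preferences, rewards, and baseline value are made-up numbers for illustration.

```python
import numpy as np

# Exact check that an action-independent baseline b leaves the expected
# REINFORCE update unchanged. All numbers below are made up for the demo.
theta = np.array([0.5, -0.3, 0.1])         # hypothetical softmax preferences
pi = np.exp(theta) / np.exp(theta).sum()   # pi(a) = softmax(theta)
r = np.array([1.0, 0.0, 2.0])              # hypothetical per-action rewards
b = 0.7                                    # any action-independent baseline

# For a softmax policy, d ln pi(a) / d theta = one_hot(a) - pi.
score = np.eye(3) - pi                     # row a holds d ln pi(a)/d theta
expected_plain = pi @ (r[:, None] * score)            # E[r * d ln pi/d theta]
expected_baselined = pi @ ((r - b)[:, None] * score)  # E[(r - b) * d ln pi/d theta]
print(expected_plain)
print(expected_baselined)  # identical: the baseline term has expectation zero
```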
5) where the indicator term is 1 if $a = a_t$ and 0 otherwise. Which of the following is true for the above algorithm? (1 point)

It is the $L_{R-I}$ algorithm.
6) Assertion: Contextual bandits can be modeled as a full reinforcement learning problem. (1 point)

Reason: We can define an MDP with $n$ states, where $n$ is the number of bandits. The number of actions from each state corresponds to the arms in each bandit, with every action leading to termination of the episode and giving a reward according to the corresponding bandit and arm.

Assertion and Reason are both true and Reason is a correct explanation of Assertion
Assertion and Reason are both true and Reason is not a correct explanation of Assertion
Assertion is true and Reason is false
Both Assertion and Reason are false
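For intuition, here is a minimal sketch of the construction described in the Reason: each episode starts in one of $n$ context states, and any arm pull pays out and ends the episode immediately. The context count, arm count, and reward means below are made-up numbers.

```python
import numpy as np

# Contextual bandit viewed as an MDP with one-step episodes (illustrative).
rng = np.random.default_rng(0)
n_contexts, n_arms = 3, 4
mean_reward = rng.uniform(0, 1, size=(n_contexts, n_arms))  # made-up bandit means

def reset():
    return rng.integers(n_contexts)  # a fresh episode starts in a random context

def step(state, action):
    # Every action yields a reward and terminates the episode, as in the Reason.
    r = rng.normal(mean_reward[state, action], 0.1)
    return r, True                   # (reward, done)

s = reset()
r, done = step(s, action=2)
print(s, r, done)                    # one-step episode
```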
7) Let us assume that for some full RL problem we are acting according to a policy $\pi$. At some time $t$, we are in a state $s$ where we took action $a_1$. After a few time steps, at time $t'$, the same state $s$ was reached, where we performed an action $a_2$ ($\neq a_1$). Which of the following statements is true? (1 point)
8) Stochastic gradient ascent/descent updates occur in the right direction at every step. (1 point)

True
False
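As a quick numeric illustration of what "at every step" means here: on a made-up linear-regression problem, individual per-sample gradients can point opposite the full-batch gradient even though their average equals it.

```python
import numpy as np

# Per-sample gradients vs. the full-batch gradient for least squares
# (made-up data; model y ~ w * x, loss (y - w*x)^2, evaluated at w = 0).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(size=100)  # true slope is 3

w = 0.0
full_grad = -2 * np.mean(x * (y - w * x))  # full-batch gradient at w
wrong_way = sum(
    1 for xi, yi in zip(x, y)
    if (-2 * xi * (yi - w * xi)) * full_grad < 0  # opposite sign to full_grad
)
print(f"{wrong_way} of {len(x)} per-sample gradients point the wrong way")
```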
$\Pr(s_{t+1}, r_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, s_{t-2}, a_{t-2}, \ldots, s_0, a_0) = \Pr(s_{t+1}, r_{t+1} \mid s_t, a_t)$
$G_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \ldots$

where $\gamma$ is a discount factor. Which of the following best explains what happens when $\gamma > 1$ (say $\gamma = 5$)?
The agent will learn that delayed rewards will always be beneficial and so will not learn properly.

None of the above is true.
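A quick numeric check, assuming a constant reward of 1 per step purely for illustration, of how the truncated return behaves for $\gamma < 1$ versus $\gamma > 1$:

```python
# Truncated 50-step return with a constant reward of 1 (made-up setting).
for gamma in (0.9, 1.0, 5.0):
    G = sum(gamma ** k for k in range(50))
    print(f"gamma = {gamma}: 50-step return = {G:.3g}")
# gamma = 0.9 converges (~10); gamma = 5 blows up (~2e34), so later rewards
# dominate and the infinite-horizon return is unbounded.
```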
You may submit any number of times before the due date. The final submission will be considered
for grading.
Submit Answers