Markov Decision Processes Overview

The document discusses Markov Decision Processes (MDPs), which provide a framework for sequential decision making in uncertain environments. An MDP is modeled as a 4-tuple of states, actions, transition probabilities, and rewards. Transition probabilities define the likelihood of moving between states based on actions. The goal is to find an optimal policy that maximizes expected rewards over time. MDPs can be used for problems involving uncertainty like robotics, resource allocation, and more. A policy maps each state to an action, and the value is the expected utility or reward when following that policy over time.

Markov Decision Processes (MDP)

Sudeshna Sarkar
Department of Computer Science & Engineering
IIT Kharagpur
6-7 Sep 2017
How would you get to the airport in the
least amount of time?
 Metro
 Uber
 Taxi
 Airport Express

2
Uncertainty in the real world
 Randomness shows up in many places.
 Could be caused by limitations of the sensors and actuators of the
robot
 Could be caused by market forces or nature, which we have no
control over.

 Taking action a in state s can lead to any of several next states s1’, s2’, …

 How can we hope to act optimally in the face of randomness?


 Certainly we can't just have a single deterministic plan, and
talking about a minimum cost path doesn't make sense.

3
Applications
 Robotics: decide where to move, but actuators can fail, hit unseen obstacles, etc.

 Resource allocation: decide what to produce, but don't know the customer demand for various products.

 Agriculture: decide what to plant, but don't know the weather and thus the crop yield.

4
Volcano crossing

5
Dice Game
For each round r = 1, 2, …
 You choose stay or quit.
 If quit, you get $10 and we end the game.
 If stay, you get $4 and then I roll a 6-sided die.
 If the die shows 1 or 2, we end the game.
 Otherwise, you continue to the next round.

6
MDP for Dice Game
For each round r = 1, 2, …
 You choose stay or quit.
 If quit, you get $10 and we end the game.
 If stay, you get $4 and then I roll a 6-sided die.
 If the die shows 1 or 2, we end the game.
 Otherwise, you continue to the next round.

7
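To make the game above concrete, here is a minimal Python sketch of one play; the function name play_dice_game and its structure are illustrative, not part of the original slides.

```python
import random

def play_dice_game(action: str) -> int:
    """Simulate one play of the game, always choosing `action` ('stay' or 'quit')."""
    total = 0
    while True:
        if action == "quit":
            return total + 10            # quit: collect $10 and the game ends
        total += 4                       # stay: collect $4, then the die is rolled
        if random.randint(1, 6) <= 2:    # a roll of 1 or 2 ends the game
            return total

print(play_dice_game("quit"))   # always 10
print(play_dice_game("stay"))   # 4, 8, 12, ... depending on the die
```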
MDP
Markov Decision Processes

Decision Theoretic Planning

 Markov Property: The transition probabilities depend only on the current state, not on previous history (how that state was reached).

8
MDP Model
MDP Model <S, A, T, R>
 State set S
 Action set A
 Markov transition function T(s,a,s’) = Pr(s’|s,a)
 Bounded real-valued reward function R(s)
  • Can be generalized to include action costs: R(s,a)
  • Can be generalized to be a stochastic function

[Diagram: the Agent sends an Action to the Environment and receives back a State and a Reward; a sample trajectory s0 →a0→ s1 →a1→ s2 →a2→ s3 with rewards r0, r1, r2.]

Process:
• Observe state st ∈ S
• Choose action at ∈ At
• Receive immediate reward rt
• State changes to st+1
9
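A sketch of how the <S, A, T, R> model and the observe/act/reward loop on this slide might be represented in Python; the MDP dataclass and run_episode function are illustrative names, not a standard API.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class MDP:
    states: List[str]                                    # state set S
    actions: Dict[str, List[str]]                        # actions available in each state
    T: Dict[Tuple[str, str], List[Tuple[str, float]]]    # (s, a) -> [(s', Pr(s'|s,a)), ...]
    R: Callable[[str], float]                            # bounded real-valued reward R(s)

def run_episode(mdp: MDP, policy: Callable[[str], str], s: str, horizon: int) -> float:
    """Follow the observe / act / reward / transition loop from the slide above."""
    total = 0.0
    for _ in range(horizon):
        if not mdp.actions.get(s):                       # no actions left: terminal state
            break
        a = policy(s)                                    # choose action a_t
        total += mdp.R(s)                                # receive immediate reward r_t
        successors, probs = zip(*mdp.T[(s, a)])
        s = random.choices(successors, weights=probs)[0] # state changes to s_{t+1}
    return total
```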
Similarities of MDP with Search?

10
Transitions
 The transition probability T(s, a, s’) specifies the probability of ending up in state s’ if action a is taken in state s.
s a s’ T(s,a,s’)
in quit end 1
in stay in 2/3
in stay end 1/3
 For each state s and action a:

Σ_{s’∈S} T(s, a, s’) = 1

Successors: s’ such that T(s, a, s’) > 0
11
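The transition table for the dice game can be written down directly as data, and the normalization condition checked programmatically; the dictionary layout below is an illustrative choice, not from the slides.

```python
# T[(s, a)] maps each successor s' to T(s, a, s') for the dice game above.
T = {
    ("in", "quit"): {"end": 1.0},
    ("in", "stay"): {"in": 2 / 3, "end": 1 / 3},
}

# For each state s and action a, the probabilities over successors must sum to 1.
for (s, a), successors in T.items():
    assert abs(sum(successors.values()) - 1.0) < 1e-9, (s, a)

# Successors of (s, a) are exactly the states with positive probability.
print([s2 for s2, p in T[("in", "stay")].items() if p > 0])   # ['in', 'end']
```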
Exercise: Transportation problem
 Street with blocks numbered 1 to n.
 Walking from s to s + 1 takes 1 minute.
 Taking a magic tram from s to 2s takes 2 minutes.
 How to travel from 1 to n in the least time?
 Tram fails with probability 0.5.

12
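One possible encoding of the exercise as an MDP; the slide does not say what happens when the tram fails, so this sketch assumes a failed tram leaves you at the current block.

```python
def tram_mdp(n: int):
    """Transitions and costs (in minutes) for the street-and-tram exercise."""
    T = {}      # (s, action) -> list of (s', probability)
    cost = {}   # (s, action) -> minutes spent
    for s in range(1, n):
        T[(s, "walk")] = [(s + 1, 1.0)]                 # walking always reaches s + 1
        cost[(s, "walk")] = 1
        if 2 * s <= n:
            T[(s, "tram")] = [(2 * s, 0.5), (s, 0.5)]   # tram fails with probability 0.5
            cost[(s, "tram")] = 2
    return T, cost

T, cost = tram_mdp(10)
print(T[(3, "tram")])   # [(6, 0.5), (3, 0.5)]
```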
What is a solution?
 Search problem: path (sequence of actions)
 MDP: ??
 MDP: Policy
 A Policy π is a mapping from each state s ∈ States to an action a ∈ Actions(s)

13
Evaluating a policy
 Following a policy yields a random path.
 The utility of a policy is the (discounted) sum of the
rewards on the path (this is a random quantity).
Path                                                        Utility
[in; stay, 4, end]                                          4
[in; stay, 4, in; stay, 4, in; stay, 4, end]                12
[in; stay, 4, in; stay, 4, end]                             8
[in; stay, 4, in; stay, 4, in; stay, 4, in; stay, 4, end]   16
...
The value of a policy is the expected utility.

14
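For the dice game, the value of the "always stay" policy satisfies V = 4 + (2/3)·V, so V = 12. The Monte Carlo sketch below (illustrative, not from the slides) estimates the same value by averaging the utilities of sampled paths.

```python
import random

# Dice-game model and the "always stay" policy (rewards and transitions as on the slides).
policy = {"in": "stay"}
reward = {("in", "stay"): 4, ("in", "quit"): 10}
T = {("in", "stay"): [("in", 2 / 3), ("end", 1 / 3)],
     ("in", "quit"): [("end", 1.0)]}

def sample_utility(start: str = "in") -> float:
    """Utility (undiscounted sum of rewards) of one random path under the policy."""
    s, utility = start, 0.0
    while s != "end":
        a = policy[s]
        utility += reward[(s, a)]
        successors, probs = zip(*T[(s, a)])
        s = random.choices(successors, weights=probs)[0]
    return utility

# The value of the policy is the expected utility; the estimate should come out near 12.
print(sum(sample_utility() for _ in range(100_000)) / 100_000)
```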