Markov Chains: Introduction
Josep Díaz Maria J. Serna Conrado Martínez
U. Politècnica de Catalunya
RA-MIRI 2024–2025
Stochastic Process
A stochastic process is a sequence of random variables {X_t}_{t=0}^n.
Usually the subindex t refers to time steps; if t ∈ N, the stochastic process is said to be discrete.
The random variable X_t is called the state at time t.
If n < ∞ the process is said to be finite; otherwise it is said to be infinite.
A stochastic process is used as a model to study the probability of events associated with a random phenomenon.
An example: Gambler’s Ruin
Model used to evaluate insurance risks.
You place bets of 1€. With probability p, you gain 1€, and with probability q = 1 − p you lose your 1€ bet.
You start with an initial amount of 100€.
You keep playing until you lose all your money or you reach 1000€.
One goal is finding the probability of winning, i.e., of reaching the 1000€.
Notice that in this process, once we get to 0€ or 1000€, the process stops.
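To get a feel for the process, here is a minimal Python simulation sketch (not part of the original slides; the parameter values and function name are ours, and the run below scales the amounts down so the simulation finishes quickly):

    import random

    def gamblers_ruin(p=0.5, start=100, goal=1000):
        """Simulate one play: True if we reach the goal, False if ruined."""
        money = start
        while 0 < money < goal:
            money += 1 if random.random() < p else -1
        return money == goal

    # Scaled-down instance: start with 10, play until 0 or 100.
    wins = sum(gamblers_ruin(start=10, goal=100) for _ in range(2000))
    print(wins / 2000)   # for p = 1/2 this should be close to 10/100 = 0.1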
Markov Chain
One simple model of stochastic process is the Markov Chain:
Markov Chains are defined on a finite set of states S, where at time t, X_t can be any state in S, together with a matrix of transition probabilities for going from each state in S to any other state in S, including the case that the state at time t + 1 remains the same as X_t.
In a Markov Chain, at any given time t, the state X_t is determined only by X_{t−1}:
memoryless: the process does not remember the history of past events.
Other memoryless stochastic processes are also said to be Markovian.
An example: Gambler’s Ruin
You place bets of 1€. With probability p, you gain 1€, and with probability q = 1 − p you lose your 1€ bet.
You start with an initial amount of 100€.
You keep playing until you lose all your money or you reach 1000€.
We have a state for each possible amount of money you can accumulate: S = {0, 1, . . . , 1000}.
The probability of losing/winning is independent of the state and the time, so this process is a Markov chain.
Observe that the number of states is finite.
Markov Chains: An important tool for CS
One of the simplest forms of stochastic dynamics.
Allows us to model stochastic temporal dependencies.
Applications in many areas:
Surfing the web
Design of randomized algorithms
Random walks
Machine Learning (Markov Decision Processes)
Computer Vision (Markov Random Fields)
etc.
Formal definition of Markov Chains
Definition
A finite, time-discrete Markov Chain, with finite state space S = {1, 2, . . . , k}, is a stochastic process {X_t} s.t. for all i, j ∈ S and for all t ≥ 0,

    P[X_{t+1} = j | X_0 = i_0, X_1 = i_1, . . . , X_{t−1} = i_{t−1}, X_t = i] = P[X_{t+1} = j | X_t = i].

We can abstract the time and consider only the probability of moving from state i to state j, P[X_{t+1} = j | X_t = i].
MC: Transition probability matrix
For u, v ∈ S, let p_{u,v} be the probability of going from u to v in one step, i.e., p_{u,v} = P[X_{s+1} = v | X_s = u].
P = (p_{u,v})_{u,v∈S} is a matrix describing the transition probabilities of the MC.
P is called the transition matrix.
P also defines a digraph, possibly with self-loops.
[Digraph on S = {A, B, C}: A → B with prob. 2/3, A → C with 1/3; B → A with 1/2, B → C with 1/2; C → A with 1/2, C → C (self-loop) with 1/2.]
Gambler’s Ruin: MC digraph
You place bets of 1€. With probability p, you gain 1€, and with probability q = 1 − p you lose your 1€ bet.
You start with an initial amount of i € and keep playing until you lose all your money or you reach n €.
We have a state for each possible amount of money you can accumulate: S = {0, 1, . . . , n}.
[Digraph: states 0, 1, . . . , n in a line; from each state 1 ≤ k ≤ n − 1, go to k + 1 with prob. p and to k − 1 with prob. q; states 0 and n have self-loops with prob. 1. State i is the initial state; 0 and n are absorbing states.]
Transition matrix: Example
For the digraph above on S = {A, B, C}, the transition matrix is

          A    B    C
    A     0   2/3  1/3
    B    1/2   0   1/2    = P
    C    1/2   0   1/2

Notice the entry (u, v) in P denotes the probability of going from u to v in one step.
Notice that in a MC the transition matrix is stochastic: the sum of the transition probabilities out of any state must be 1, i.e., the sum of the elements of any row of the transition matrix must be 1.
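As a quick illustration (a minimal numpy sketch, not part of the original slides), we can store P as an array and check that it is stochastic:

    import numpy as np

    # Transition matrix of the A, B, C chain (rows/columns in order A, B, C).
    P = np.array([[0,   2/3, 1/3],
                  [1/2, 0,   1/2],
                  [1/2, 0,   1/2]])

    # Stochastic: every row must sum to 1.
    assert np.allclose(P.sum(axis=1), 1.0)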
Longer transition probabilities
For u, v ∈ S, let p_{u,v}^{(t)} be the probability of going from u to v in exactly t steps. Formally, for s ≥ 0 and t ≥ 1, p_{u,v}^{(t)} = P[X_{s+t} = v | X_s = u].
Notice that p_{u,v}^{(1)} = p_{u,v}; we shall use P^{(t)} for the matrix whose entries are the values p_{u,v}^{(t)}, and P^{(1)} = P.
How can we relate P^{(t)} with P?
The powers of the transition matrix
For the A, B, C chain above, with transition matrix P:

    P[X_1 = C | X_0 = A] = p_{A,C}^{(1)} = 1/3.
    P[X_2 = C | X_0 = A] = p_{A,B} p_{B,C} + p_{A,C} p_{C,C} = 1/3 + 1/6 = 1/2 = p_{A,C}^{(2)}.

In general, assume a MC with k states and transition matrix P, and let u, v ∈ S:
• What is P[X_1 = v | X_0 = u], i.e., p_{u,v}?
• What is P[X_2 = v | X_0 = u] = p_{u,v}^{(2)}?
The powers of the transition matrix
Use the law of total probability + the Markov property:

    p_{u,v}^{(2)} = P[X_2 = v | X_0 = u]
                  = Σ_{w∈S} P[X_1 = w | X_0 = u] · P[X_2 = v | X_1 = w]
                  = Σ_{w∈S} p_{u,w} p_{w,v}.
The powers of the transition matrix
In general,

    p_{u,v}^{(t)} = P[X_t = v | X_0 = u]
                  = Σ_{w∈S} P[X_{t−1} = w | X_0 = u] · P[X_t = v | X_{t−1} = w]
                  = Σ_{w∈S} p_{u,w}^{(t−1)} p_{w,v}.

Lemma
Given the transition matrix P of a MC, for any t ≥ 1,

    P^{(t)} = P^{(t−1)} · P.

With the convention P^{(0)} = I (the identity matrix), we have

    P^{(t)} = P^t,

for any t ≥ 0.
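To see the lemma in action (a minimal numpy sketch, not from the slides), we can square the transition matrix of the A, B, C chain and read off p_{A,C}^{(2)}:

    import numpy as np

    P = np.array([[0,   2/3, 1/3],
                  [1/2, 0,   1/2],
                  [1/2, 0,   1/2]])

    # P^(2) = P^2; the entry (A, C) is row 0, column 2.
    P2 = np.linalg.matrix_power(P, 2)
    print(P2[0, 2])   # 0.5, matching 1/3 + 1/6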
Distributions at time t
To fix the initial state, we consider a random variable X_0, assigning to S an initial distribution π_0, a row vector indicating, at t = 0, the probability of being in the corresponding state.
For example, in the MC on S = {A, B, C} above, we may consider

    π_0 = (0, 0.4, 0.6)    over (A, B, C).
Distributions at time t
Starting with an initial distribution π_0, we can compute the state distribution π_t (on S) at time t. For a state v,

    π_t[v] = P[X_t = v]
           = Σ_{u∈S} P[X_0 = u] · P[X_t = v | X_0 = u]
           = Σ_{u∈S} π_0[u] p_{u,v}^{(t)},

where π_t[v] is the probability that the system is in state v at time t.
Therefore, π_t = π_0 P^t and π_{s+t} = π_s P^t.
Gambler’s Ruin: Exercise
You place bets of 1€. With probability p, you gain 1€, and with probability q = 1 − p you lose your 1€ bet.
You start with an initial amount of i € and keep playing until you lose all your money or you reach n €.
We have a state for each possible amount of money you can accumulate: S = {0, 1, . . . , n}.
What is the initial distribution π_0?
And what is the state distribution at time t = 3?
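One way to check an answer numerically (a sketch with toy values n = 10, i = 5, p = 1/2 of our choosing, not from the slides): π_0 is the indicator vector of state i, and π_3 = π_0 P³.

    import numpy as np

    n, i, p = 10, 5, 0.5          # assumed toy values
    q = 1 - p

    # Build the (n+1) x (n+1) transition matrix of the Gambler's Ruin chain.
    P = np.zeros((n + 1, n + 1))
    P[0, 0] = P[n, n] = 1.0       # absorbing states 0 and n
    for k in range(1, n):
        P[k, k - 1] = q
        P[k, k + 1] = p

    pi0 = np.zeros(n + 1)
    pi0[i] = 1.0                  # start with i euros, with certainty

    pi3 = pi0 @ np.linalg.matrix_power(P, 3)
    print(pi3)                    # mass only on states i-3, i-1, i+1, i+3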
Example MC: Writing a research paper
Recall that Markov Chains are given either by a weighted digraph, where the edge weights are the transition probabilities, or by the |S| × |S| transition probability matrix P.
Example: writing a paper, with S = {k, w, e, s} (Think, Write, e-mail, Surf):

          k    w    e    s
    k    0.5  0.3   0   0.2
    w    0.2  0.5  0.1  0.2    = P
    e    0.1  0.3  0.3  0.3
    s     0   0.2  0.3  0.5

[Digraph on states Think (k), Write (w), e-mail (e), Surf (s), with edge probabilities as given by P.]
More on the Markovian property
Notice the memoryless property does not mean that X_{t+1} is independent of X_0, X_1, . . . , X_{t−1}.
(For instance, in the paper-writing chain we intuitively have P[thinking at t + 1] < P[thinking at t + 1 | thinking at t − 1].)
But the dependencies of X_t on X_0, . . . , X_{t−1} are all captured by X_{t−1}.
Example of writing a paper
P[X_2 = k | X_0 = s] is the probability that, at t = 2, we are in state k (Think), starting in state s (Surf). Squaring P:

    ( 0.5  0.3  0    0.2 )   ( 0.5  0.3  0    0.2 )   ( 0.31  0.34  0.09  0.26 )
    ( 0.2  0.5  0.1  0.2 ) · ( 0.2  0.5  0.1  0.2 ) = ( 0.21  0.38  0.14  0.27 )
    ( 0.1  0.3  0.3  0.3 )   ( 0.1  0.3  0.3  0.3 )   ( 0.14  0.33  0.21  0.32 )
    ( 0    0.2  0.3  0.5 )   ( 0    0.2  0.3  0.5 )   ( 0.07  0.29  0.26  0.38 )

P[X_2 = k | X_0 = s] = 0.07.
Distribution on states
Recall π_t is the probability distribution at time t over S.
For our example of writing a paper, if at t = 0 (after waking up)

    π_0 = (0.2, 0, 0.3, 0.5)    over (k, w, e, s),

then

    π_1 = π_0 P = (0.13, 0.25, 0.24, 0.38).

Therefore, we have π_t = π_0 P^t and π_{s+t} = π_s P^t.
Notice π_t = (π_t[k], π_t[w], π_t[e], π_t[s]).
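Both computations can be checked with a few lines of numpy (a sketch, not part of the slides):

    import numpy as np

    # Transition matrix of the paper-writing chain, rows/columns in order k, w, e, s.
    P = np.array([[0.5, 0.3, 0.0, 0.2],
                  [0.2, 0.5, 0.1, 0.2],
                  [0.1, 0.3, 0.3, 0.3],
                  [0.0, 0.2, 0.3, 0.5]])

    P2 = np.linalg.matrix_power(P, 2)
    print(P2[3, 0])               # P[X_2 = k | X_0 = s] = 0.07

    pi0 = np.array([0.2, 0.0, 0.3, 0.5])
    print(pi0 @ P)                # pi_1 = (0.13, 0.25, 0.24, 0.38)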
An Example of MC analysis: The 2-SAT problem
Section 7.1 of [MU].
Given a Boolean formula φ on
• a set X of n Boolean variables,
• defined by m clauses C_1, . . . , C_m, where each clause is the disjunction of exactly 2 literals (x_i or x̄_i) on different variables,
• φ = conjunction of the m clauses.
The 2-SAT problem is to find an assignment A∗ : X → {0, 1} which satisfies φ, i.e., to find an A∗ s.t. A∗(φ) = 1.
Notice that if |X| = n, then m ≤ (2n choose 2) = O(n²).
In general, k-SAT is NP-complete for k ≥ 3, but 2-SAT ∈ P.
A randomized algorithm for 2-SAT
Given an n-variable 2-SAT formula φ with clauses {C_j}_{j=1}^m:

    for 1 ≤ i ≤ n do
        A(x_i) := 1
    end for
    t := 0
    while t ≤ 2cn² and some clause is unsatisfied do
        Pick an unsatisfied clause C_j
        Choose u.a.r. one of the 2 variables in C_j and flip its value
        t := t + 1
        if φ is satisfied then
            return A
        end if
    end while
    return "φ is unsatisfiable"
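A runnable Python version of this procedure (a sketch under the assumption that clauses are encoded as pairs of nonzero integers, DIMACS-style: literal v means x_v and −v means x̄_v; the function and helper names are ours, not from the slides):

    import random

    def random_2sat(n, clauses, c=1):
        """Randomized 2-SAT: return a satisfying assignment, or None after 2cn^2 flips."""
        A = {v: True for v in range(1, n + 1)}        # all variables start at 1

        def sat(lit):                                 # is this literal true under A?
            return A[abs(lit)] == (lit > 0)

        for _ in range(2 * c * n * n):
            unsat = [cl for cl in clauses if not (sat(cl[0]) or sat(cl[1]))]
            if not unsat:
                return A                              # phi is satisfied
            a, b = random.choice(unsat)               # pick an unsatisfied clause
            v = abs(random.choice((a, b)))            # choose one of its 2 variables u.a.r.
            A[v] = not A[v]                           # and flip its value
        return None                                   # answer: phi is unsatisfiable

    # The satisfiable formula from the slides below.
    phi = [(1, -2), (-1, -3), (-1, 2), (-4, -3), (4, -1)]
    print(random_2sat(4, phi))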
An example: unsat formula
If φ = (x1 ∨ x2) ∧ (x̄1 ∨ x̄2) ∧ (x̄1 ∨ x2) ∧ (x1 ∨ x̄2), then φ has no satisfying assignment A∗.

    t    x1   x2   selected clause
    1    1    1    2
    2    1    0    3
    3    0    0    1
    ...

Since φ is unsat, the algorithm will eventually stop after reaching the maximum number of steps.
An example: sat formula
If φ = (x1 ∨ x̄2) ∧ (x̄1 ∨ x̄3) ∧ (x̄1 ∨ x2) ∧ (x̄4 ∨ x̄3) ∧ (x4 ∨ x̄1):

    t    x1   x2   x3   x4   selected clause
    1    1    1    1    1    2
    2    0    1    1    1    1
    3    0    0    1    1    4
    4    0    0    1    0    –

(0, 0, 1, 0) satisfies φ.
Analysis for 2-SAT algorithm
Given φ with |X| = n and clauses {C_j}_{j=1}^m, assume that there is A∗ such that φ(A∗) = 1.
Let A_i be the assignment at the i-th iteration.
Let X_i = |{x_j ∈ X | A_i(x_j) = A∗(x_j)}|.
Notice 0 ≤ X_i ≤ n. Moreover, when X_i = n, we have found A∗.
Analysis: starting from X_i < n, how long does it take to get X_i = n?
Note that P[X_{i+1} = 1 | X_i = 0] = 1.
Analysis for 2-SAT algorithm
As A∗ satisfies φ and A_i does not, there is a clause C_j that A∗ satisfies but A_i does not.
So A∗ and A_i disagree on the value of at least one of the two variables of C_j.
It is also possible that we flip the value of a variable of C_j on which A_i and A∗ agree.
Therefore, for 1 ≤ k ≤ n − 1,

    P[X_{i+1} = k + 1 | X_i = k] ≥ 1/2 and P[X_{i+1} = k − 1 | X_i = k] ≤ 1/2.
Analysis for 2-SAT
The process X_0, X_1, . . . is not necessarily a MC:
The probability that X_{i+1} > X_i depends on whether A_i and A∗ disagree on 1 or 2 variables of the selected unsatisfied clause C.
If A∗ makes both literals of C true, then P[X_{i+1} = k + 1 | X_i = k] = 1; otherwise P[X_{i+1} = k + 1 | X_i = k] = 1/2.
This difference might depend on the clauses and variables selected in the past, so the transition probabilities are not memoryless.
X_t is not (necessarily) a Markov chain. Can we bound the process by a MC?
Analysis for 2-SAT
Define a MC Y_0, Y_1, Y_2, . . . which is a pessimistic version of the process X_0, X_1, . . ., in the sense that Y_i measures exactly the same quantity as X_i, but the probability of change (up or down) is exactly 1/2:
Y_0 = X_0 and P[Y_{i+1} = 1 | Y_i = 0] = 1;
For 1 ≤ k ≤ n − 1, P[Y_{i+1} = k + 1 | Y_i = k] = 1/2 and P[Y_{i+1} = k − 1 | Y_i = k] = 1/2.

[Digraph (MC for 2-SAT): states 0, 1, . . . , n; from 0 go to 1 with prob. 1; from each 1 ≤ j ≤ n − 1 go to j + 1 with prob. 1/2 and to j − 1 with prob. 1/2; n has a self-loop with prob. 1.]

The time to reach n from any state j in {Y_i}_{i≥0} is at least that in {X_i}_{i≥0}.
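A quick sanity check by simulation (our sketch, not from the slides): the expected time for Y to reach n from j works out to n² − j², consistent with the recurrence solved below.

    import random

    def hitting_time(n, j):
        """One run of the chain Y: steps until state n is reached from state j."""
        t = 0
        while j < n:
            j = j + 1 if (j == 0 or random.random() < 0.5) else j - 1
            t += 1
        return t

    n, j, runs = 20, 5, 20000
    est = sum(hitting_time(n, j) for _ in range(runs)) / runs
    print(est, n * n - j * j)     # estimate should be close to n^2 - j^2 = 375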
Upper Bound on the time to arrive state n
Lemma
If a 2-CNF φ on n variables has a satisfying assignment A∗, the 2-SAT algorithm finds one in expected time ≤ n².

Proof
Let h_j be the expected time, for the process Y, to go from state j to state n.
It suffices to prove that, when Y starts in state j, the expected time to arrive at n is ≤ n².
We devise a recurrence to bound h_j.
Upper Bound on the time to arrive state n
Proof (cont'd)
h_n = 0 and h_0 = h_1 + 1.
We want a general recurrence on h_j, for 1 ≤ j < n.
Define a r.v. Z_j counting the steps to go from state j to state n in Y, so that h_j = E[Z_j].
With probability 1/2, Z_j = Z_{j−1} + 1 and, with probability 1/2, Z_j = Z_{j+1} + 1. Hence

    E[Z_j] = (E[Z_{j−1}] + 1)/2 + (E[Z_{j+1}] + 1)/2.

So h_j = h_{j−1}/2 + h_{j+1}/2 + 1.
Upper Bound on the time to arrive state n
Proof (cont'd)
From the previous bound we get h_j = h_{j−1}/2 + h_{j+1}/2 + 1.
The recurrence gives the n + 1 equations

    h_n = 0
    h_0 = h_1 + 1
    h_j = h_{j−1}/2 + h_{j+1}/2 + 1,    for 1 ≤ j ≤ n − 1.

Let us prove, by induction, that h_j = h_{j+1} + 2j + 1.
Upper Bound on the time to arrive state n
Proposition
For 0 ≤ j ≤ n − 1, h_j = h_{j+1} + 2j + 1.

Proof (of Proposition)
Base case: if j = 0, then 2j + 1 = 1, and we were given h_0 = h_1 + 1.
Upper Bound on the time to arrive state n
Proposition
For 0 ≤ j ≤ n − 1, h_j = h_{j+1} + 2j + 1.

Proof of Proposition (cont'd)
IH: for j = k − 1, h_{k−1} = h_k + 2(k − 1) + 1.
Now consider j = k. By the "middle case" of our system of equations,

    h_k = (h_{k−1} + h_{k+1})/2 + 1
        = (h_k + 2(k − 1) + 1)/2 + h_{k+1}/2 + 1        (by IH)
        = h_k/2 + h_{k+1}/2 + (2k + 1)/2.

Subtracting h_k/2 from each side and multiplying by 2, we get the result.
Upper Bound on the time to arrive state n
Proof (cont'd)
As h_j = h_{j+1} + 2j + 1,

    h_0 = h_1 + 1 = h_2 + 3 + 1 = h_3 + 5 + 3 + 1 = · · ·
        = h_n + Σ_{i=0}^{n−1} (2i + 1) = 0 + n² = n²,

using h_n = 0.
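As a numeric cross-check (our sketch, not from the slides), the n + 1 linear equations can be solved directly, confirming h_0 = n²:

    import numpy as np

    n = 10
    # System M h = b encoding: h_0 - h_1 = 1,
    # h_j - h_{j-1}/2 - h_{j+1}/2 = 1 for 1 <= j <= n-1, and h_n = 0.
    M = np.zeros((n + 1, n + 1))
    b = np.ones(n + 1)
    M[0, 0], M[0, 1] = 1.0, -1.0
    for j in range(1, n):
        M[j, j - 1], M[j, j], M[j, j + 1] = -0.5, 1.0, -0.5
    M[n, n], b[n] = 1.0, 0.0

    h = np.linalg.solve(M, b)
    print(h[0])                   # 100.0 = n^2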
Error probability for 2-SAT algorithm
Theorem
The 2-SAT algorithm gives the correct answer NO if φ is not satisfiable. Otherwise, with probability ≥ 1 − 1/2^c, the algorithm returns a satisfying assignment.
Error probability for 2-SAT algorithm
Proof
Let φ be satisfiable (otherwise the theorem holds trivially).
Break the 2cn² iterations into c blocks of 2n² iterations each.
For each block i, define a r.v. Z = number of iterations from the start of the i-th block until a solution is found. By the Lemma, E[Z] ≤ n².
Using Markov's inequality,

    P[Z > 2n²] ≤ n² / (2n²) = 1/2.

Therefore, the probability that the algorithm fails to find a satisfying assignment after c blocks (no block finds a solution) is at most 1/2^c.
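For instance (our arithmetic, applying the theorem's bound), taking c = 10 already gives failure probability at most 2^{−10} = 1/1024 < 0.1%, at the cost of running at most 20n² iterations.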