
Scandinavian Journal of Statistics, Vol. 34, No. 4 (December 2007), pp. 768-780
Published by Wiley on behalf of the Board of the Foundation of the Scandinavian Journal of Statistics
doi: 10.1111/j.1467-9469.2007.00561.x

Problem Solving is Often a Matter of Cooking Up an Appropriate Markov Chain*

OLLE HÄGGSTRÖM

Mathematical Sciences, Chalmers University of Technology, Göteborg, Sweden

ABSTRACT. By means of a series of examples, from classic contributions to probability theory as well as the author's own, an attempt is made to convince the reader that problem solving is often a matter of cooking up an appropriate Markov chain. Topics touched upon along the way include coupling, correlation inequalities, and percolation.

Key words: correlation inequality, coupling, Markov chain, Markov chain Monte Carlo, percolation, stochastic domination

1. Introduction

A century has passed since the introduction by A. A. Markov of what we now know as Markov chains; see Markov (1906) and Basharin et al. (2004). During this period, Markov chains have turned out to be not only a rich source of beautiful mathematics but also immensely useful in a variety of applied areas such as statistical mechanics, queueing theory, information theory, statistics, speech recognition and bioinformatics, to name just a few. The most common way to use Markov chains in these and other areas is as ingredients in the modelling of one kind or another of time dynamics. A completely different use of Markov chains is the so-called Markov chain Monte Carlo (MCMC) method, pioneered by Metropolis et al. (1953), Hastings (1970), Geman & Geman (1984) and others. Here, Markov chains are applied to situations that in themselves need not involve any time dynamics at all. The problem is to generate computer samples with some prescribed but typically very complicated distribution π on some large state space S, and the idea of the MCMC method is that, in situations where it appears practically impossible to sample directly from π, it may be easy to sample from the transition kernel of some irreducible and aperiodic Markov chain X = {X(0), X(1), ...} on S whose unique stationary distribution is precisely π. If the chain has the property of rapid convergence to stationarity (as hopefully it has), then an easy way to generate an S-valued random object whose distribution is close to π is to start the chain with X(0) chosen arbitrarily, run it for a while (say, until time n) and output X(n). See, for example, Gilks et al. (1996) or Häggström (2002) for introductions to the theory and practice of MCMC.
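To fix ideas, here is a minimal sketch (my own illustration, not part of the original article) of the MCMC recipe just described, using a Metropolis chain on a small finite state space; the target weights and all parameter choices are arbitrary.

```python
import random

def mcmc_sample(weights, n, rng):
    """Metropolis chain on {0, ..., m-1} whose unique stationary distribution
    is proportional to `weights`; a uniformly random state is proposed and
    accepted with probability min(1, pi(proposal)/pi(current))."""
    m = len(weights)
    x = 0  # X(0) chosen arbitrarily
    for _ in range(n):
        y = rng.randrange(m)
        # the acceptance step makes the chain reversible with respect to pi,
        # so pi is its stationary distribution
        if rng.random() < min(1.0, weights[y] / weights[x]):
            x = y
    return x  # X(n), approximately pi-distributed when n is large

# sanity check: empirical frequencies should be roughly proportional to 1:2:3:4
counts = [0] * 4
for seed in range(10000):
    counts[mcmc_sample([1.0, 2.0, 3.0, 4.0], 200, random.Random(seed))] += 1
print(counts)
```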
The purpose of the present article is to elaborate on the somewhat less well-known idea that the central ingredient in the MCMC method - namely, the introduction of a Markov chain designed to have a prescribed stationary distribution π - is useful in a variety of contexts that do not involve computer simulations of any sort. Rather, in the kind of applications I have in mind, it is not necessary to implement and run the Markov chains: it will suffice to think about them on a more abstract level. Every mathematician needs to have a toolbox of devices and tricks to use in various situations, and I hope to convince readers that the readiness to try out such Markov chain ideas is a useful enough device that they will want to include it in their own toolboxes.

*This paper was presented at the 21st Nordic Conference on Mathematical Statistics, Rebild, Denmark, 2006 (NORDSTAT 2006).


At this point, a line from Lindvall's (1992) influential introduction to coupling seems apt: "To know a method is to have learned how it works. What [...] us is a collection of applications of a few basic ideas" (p. 6). In the following, I will focus on three basic examples. In section 2, I will discuss the use of Markov chains in proving the very useful correlation inequality of Harris (1960). Then, I will consider two examples from my own practice - a domination inequality needed in a survey sampling context in section 3 and a conditional correlation inequality for percolation in section 4.

One aspect of my Lindvallian approach in this article is that I have no pretensions of providing an exhaustive survey of the topic. For a particular subtopic that is left out of the discussion but which I recommend to the ambitious reader, let me mention how ideas from the coupling-from-the-past approach of Propp & Wilson (1996), from so-called perfect MCMC, have been exploited in the rigorous analysis of ergodic properties of Gibbsian random fields; this idea was first conceived by van den Berg & Steif (1998) and then further exploited by Häggström & Steif (2000) and Häggström et al. (2000, 2002).

2. Harris' inequality

Harris (1960) is a classic paper. Way ahead of its time, it contains a number of ideas that have influenced percolation theory for decades and one - a correlation inequality now known as Harris' inequality - whose influence has extended far beyond that field.

Percolation theory (see Grimmett (1999) for an introduction) deals with connectivity properties of random media, and the basic mathematical model is as follows. Let G = (V, E) be a (finite or countably infinite) connected graph with vertex set V and edge set E, where each edge e ∈ E links two of the vertices. When G is infinite, it is customary to also impose the condition of local finiteness, meaning that each x ∈ V is incident to only finitely many edges. A standard example is to let G be the infinite square lattice, denoted Z^2, which arises by letting V consist of all integer points in the Euclidean plane and having edges between vertices at Euclidean distance 1 from each other. The so-called retention parameter p ∈ [0, 1] is fixed, and each edge in G is removed independently with probability 1 - p. This produces a random subgraph of G, and the percolation-theoretic challenge is to say something about the connected components of this subgraph. For instance, if G is infinite, one may ask whether an infinite connected component occurs. The probability of getting an infinite connected component is easily shown to be 0 or 1, depending on whether p is above or below a critical value p_c = p_c(G) ∈ [0, 1]. The main result of Harris (1960) was that, for the square lattice, p_c(Z^2) ≥ 1/2. This inequality was conjectured to, in fact, be an equality, but it took another 20 years before that was rigorously established by Kesten (1980).

What I have described here is so-called bond percolation, as opposed to site percolation, which has much the same flavour but where it is vertices rather than edges that are removed at random.

Let me give a vague motivation for why correlation inequalities are important in percolation theory. Establishing connectivities over long distances often proceeds through a kind of concatenation procedure. For two vertices x, y ∈ V, an obvious sufficient condition for the existence of a path between x and y - an event that we denote by {x ↔ y} - is that both of them have paths to some third vertex z ∈ V. Thus,

P(x ↔ y) ≥ P(x ↔ z, z ↔ y).

Here it is typically useful to be able to conclude that

P(x ↔ y) ≥ P(x ↔ z) P(z ↔ y),


an argument that, however, requires the events {x ↔ z} and {z ↔ y} to be positively correlated in the sense that

P(x ↔ z, z ↔ y) ≥ P(x ↔ z) P(z ↔ y). (1)

Are they? The answer is yes, by an application of Harris' inequality.

To set the stage for this result, we need some definitions. Consider a collection {X_i}_{i∈I} of real-valued random variables, where the index set I may be finite or countably infinite. For x, x' ∈ R^I, we write x ≤ x' if x_i ≤ x'_i for all i ∈ I, and we say a function f: R^I → R is increasing if f(x) ≤ f(x') whenever x ≤ x'.
Theorem 1 (Harris' inequality)
Let X = {X_i}_{i∈I} be a collection of independent real-valued random variables, and let f, g: R^I → R be two bounded and increasing functions. Then,

E[f(X)g(X)] ≥ E[f(X)] E[g(X)]. (2)


The significant condition here on f and g is that they are increasing; boundedness is just a convenient way of making sure that the expectations are well defined. When (2) holds for all bounded and increasing f and g, this is known as the positive associations property of {X_i}_{i∈I}, and so with this terminology Harris' inequality says that any collection of independent real-valued random variables has the positive associations property.

To see how (1) follows from Harris' inequality, associate with each edge e ∈ E the random variable X_e that takes the value 1 or 0 depending on whether e is retained or not after random thinning of the graph G. That makes {X_e}_{e∈E} a collection of i.i.d. Bernoulli(p) random variables. The indicator function 1_{x↔z} is increasing, because increasing the X_e's means inserting edges, and inserting edges cannot destroy the event {x ↔ z}. The same goes for the indicator 1_{z↔y}, and Harris' inequality gives in particular that

E[1_{x↔z} 1_{z↔y}] ≥ E[1_{x↔z}] E[1_{z↔y}],

which is just another way of expressing (1).
There are various ways to prove theorem 1 - see Harris (1960) or Grimmett (1999) - apart from the Markov chain approach employed here, which I personally find the most illuminating. This approach goes back to Holley (1974). The core of the matter lies in proving the following special case consisting of a finite collection of i.i.d. {0, 1}-valued random variables; once that is done, the general case follows, as we shall see, in a fairly straightforward manner.

Proposition 1
Let X_1, ..., X_n be i.i.d. Bernoulli(p) random variables, and let f, g: {0, 1}^n → R be increasing functions. Then,

E[f(X_1, ..., X_n) g(X_1, ..., X_n)] ≥ E[f(X_1, ..., X_n)] E[g(X_1, ..., X_n)]. (3)
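As a quick sanity check of proposition 1 (my own illustration; the increasing functions f and g are arbitrary choices), both sides of (3) can be computed exactly by enumerating {0, 1}^n:

```python
import itertools

def check_proposition_1(n=3, p=0.3):
    """Exact evaluation of both sides of (3) for two increasing functions."""
    f = lambda x: max(x)  # increasing: turning 0s into 1s cannot decrease it
    g = lambda x: sum(x)  # increasing for the same reason
    Efg = Ef = Eg = 0.0
    for x in itertools.product([0, 1], repeat=n):
        prob = 1.0
        for xi in x:
            prob *= p if xi == 1 else 1 - p
        Efg += prob * f(x) * g(x)
        Ef += prob * f(x)
        Eg += prob * g(x)
    return Efg, Ef * Eg

lhs, rhs = check_proposition_1()
print(lhs >= rhs, lhs, rhs)  # E[fg] >= E[f]E[g], as (3) asserts
```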

A key ingredient in the preferred proof of this result, besides Markov chains, is the notion of a coupling. A coupling of two probability distributions μ and μ' is a joint construction on the same probability space of two random objects with respective distributions μ and μ', done with the explicit purpose of drawing conclusions about (and sometimes comparing) these distributions. The coupling idea is best explained via examples, as will be done in the following text, but see also Lindvall (1992) and Thorisson (1995, 2000) for introductions to coupling methods.

Here, let μ denote the probability distribution on {0, 1}^n of (X_1, ..., X_n), as above i.i.d. Bernoulli(p). Furthermore, let μ_g be the so-called g-biased perturbation of μ, defined by setting

μ_g(ω) = Z^{-1} μ(ω) g(ω)

for each ω ∈ {0, 1}^n, where

Z = Σ_{ω∈{0,1}^n} μ(ω) g(ω)

is a normalizing constant. Of course, this makes μ_g a probability measure only if g is nonnegative (and not identically zero). But, because adding a constant to g affects both sides of (3) equally, whether (3) holds or not does not change, and there is no loss of generality in assuming that g(ω) > 0 for all ω ∈ {0, 1}^n. So let us assume that.

An intermediate step in proving proposition 1 is the following lemma.

Lemma 1
It is possible to couple two {0, 1}^n-valued random variables X and Y with respective distributions μ and μ_g, such that

P(X ≤ Y) = 1. (4)

Note in particular that (4) implies E[f(X)] ≤ E[f(Y)] for any increasing f. To see that proposition 1 follows from the lemma, put μ(h) = E[h(X_1, ..., X_n)] for functions h on {0, 1}^n, so that the desired inequality (3) reads

μ(fg) ≥ μ(f) μ(g), (5)

and Z = μ(g). We get, with X and Y coupled as in the lemma,

μ(f) = E[f(X)] ≤ E[f(Y)] = μ_g(f) = Σ_{ω∈{0,1}^n} f(ω) μ_g(ω) = (Σ_{ω∈{0,1}^n} f(ω) μ(ω) g(ω)) / Z = μ(fg) / μ(g),

and multiplying by μ(g) yields (5). Thus, in order to prove proposition 1, it only remains to prove lemma 1.

Proof of Lemma 1. Here is where the long-awaited Markov chains enter our game. We will begin by defining two {0, 1}^n-valued Markov chains (X(0), X(1), X(2), ...) and (Y(0), Y(1), Y(2), ...) designed to have μ and μ_g as their respective unique stationary distributions.

The transition mechanism for (X(0), X(1), X(2), ...) is as follows. Given X(k), set i = k (mod n) + 1 and, independently of everything else, set

X_i(k+1) = 1 w.p. p, and X_i(k+1) = 0 w.p. 1 - p,

while setting X_j(k+1) = X_j(k) for all j ≠ i. (Note that this makes the chain time-inhomogeneous with a transition kernel that repeats itself every n time units.)

It is obvious that, if X(k) has distribution μ, then so has X(k+1). So, μ is a stationary distribution for the chain. And it is equally obvious that the chain is irreducible and aperiodic, and so X(k) converges in distribution to μ as k → ∞ no matter how it is started.


(Note: while it is true that, in the time-inhomogeneous case, irreducibility and aperiodicity are not in general sufficient for a finite-state Markov chain to converge in distribution, this is not a problem here, because sampling X at every nth time unit gives a time-homogeneous chain with the corresponding properties.) Readers familiar with MCMC will recognize that this Markov chain is precisely the so-called systematic sweep Gibbs sampler.
Let us construct (Y(0), Y(1), Y(2), ...) in the same manner. For i ∈ {1, ..., n} and ξ ∈ {0, 1}^{{1,...,n}\{i}}, define γ_{i,ξ} to be the conditional probability that a {0, 1}^n-valued random object with distribution μ_g takes the value 1 at coordinate i, given that its other coordinates are given by ξ. Let the transition mechanism for the Y chain be as follows. Given Y(k), we set i = k (mod n) + 1 and set

Y_i(k+1) = 1 w.p. γ_{i,ξ}, and Y_i(k+1) = 0 w.p. 1 - γ_{i,ξ},

where ξ is given by the values of Y(k) on {1, ..., n}\{i}; and finally we set Y_j(k+1) = Y_j(k) for all j ≠ i.

This makes (Y(0), Y(1), Y(2), ...) another instance of the Gibbs sampler, irreducible and aperiodic, with Y(k) converging in distribution to the chain's unique stationary distribution μ_g.
Next, we specify how to run the two chains simultaneously on the same probability space. We start the chains by picking X(0) and Y(0) independently according to their respective stationary distributions μ and μ_g. Let (U_0, U_1, U_2, ...) be a sequence of i.i.d. random variables uniformly distributed on [0, 1]. To go from time k to time k + 1, with i = k (mod n) + 1 as before, set

X_i(k+1) = 1 if U_k < p, and X_i(k+1) = 0 otherwise, (6)

and

Y_i(k+1) = 1 if U_k < γ_{i,ξ}, and Y_i(k+1) = 0 otherwise, (7)

where ξ is given by the values of Y(k) on {1, ..., n}\{i}.
I now claim, crucially, that

γ_{i,ξ} ≥ p (8)

regardless of i and ξ. Accepting (8) for the moment, the updates (6) and (7) guarantee that X_i(k) ≤ Y_i(k) whenever coordinate i has been updated at least once, and hence that X(k) ≤ Y(k) for all k ≥ n, because by time n every coordinate has been visited. Since the chains were started in stationarity, X(n) and Y(n) have distributions μ and μ_g, so taking (X, Y) according to the joint distribution of (X(n), Y(n)) gives a coupling satisfying (4), and that establishes the lemma.
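The coupled updates (6) and (7) are easy to implement; the following sketch (my own, with an arbitrarily chosen strictly positive increasing g) computes γ_{i,ξ} by brute force and verifies that one full sweep produces X ≤ Y coordinatewise, exactly as claim (8) predicts:

```python
import random

def gamma(i, xi, p, g):
    """Conditional mu_g-probability that coordinate i equals 1 given the
    other coordinates xi (the entry xi[i] is ignored); since mu is i.i.d.
    Bernoulli(p), gamma = p*g(xi v 1) / (p*g(xi v 1) + (1-p)*g(xi v 0))."""
    hi = list(xi); hi[i] = 1
    lo = list(xi); lo[i] = 0
    num = p * g(hi)
    return num / (num + (1 - p) * g(lo))

def coupled_sweep(x, y, p, g, rng):
    """One full sweep of updates (6) and (7) driven by shared uniforms U_k.
    Claim (8) gives gamma >= p, so U_k < p implies U_k < gamma: each update
    enforces x[i] <= y[i]."""
    for i in range(len(x)):
        u = rng.random()
        x[i] = 1 if u < p else 0
        y[i] = 1 if u < gamma(i, y, p, g) else 0

g = lambda omega: 1.0 + sum(omega)  # increasing and strictly positive
rng = random.Random(1)
x = [1 if rng.random() < 0.4 else 0 for _ in range(4)]  # X(0) ~ mu
y = [0, 1, 0, 1]  # any starting Y(0); the ordering holds regardless
coupled_sweep(x, y, 0.4, g, rng)
print(all(a <= b for a, b in zip(x, y)))  # True after one full sweep
```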

It only remains to prove the claim (8). Write ξ∨0 (resp. ξ∨1) for the element of {0, 1}^n that equals 0 (resp. 1) at the ith coordinate, and agrees with ξ elsewhere. Showing (8) is the same as showing that

γ_{i,ξ} / (1 - γ_{i,ξ}) ≥ p / (1 - p). (9)

We get

γ_{i,ξ} / (1 - γ_{i,ξ}) = μ_g(ξ∨1) / μ_g(ξ∨0) = (μ(ξ∨1) g(ξ∨1)) / (μ(ξ∨0) g(ξ∨0)) ≥ μ(ξ∨1) / μ(ξ∨0) = p / (1 - p),

where the inequality is due to g being increasing.


Lemma 1 and, consequently, proposition 1 are thus established.

It is a slightly unusual feature of this particular Markov chain proof that we ended up looking at the chains at a fixed finite time n; in the following two sections, we will instead consider asymptotics as time tends to infinity.

Equipped with proposition 1, we are now in a position to obtain theorem 1 in full generality.

Proof of Theorem 1. We proceed by extending proposition 1 in a couple of steps to increasing levels of generality. As a first step, consider the case in which we allow an infinite sequence X = (X_1, X_2, ...) of i.i.d. variables, but still insist that they are binary. Define

f_n(X) = E[f(X) | X_1, ..., X_n]

and g_n(X) analogously. Both f_n and g_n are increasing, and so proposition 1 gives

E[f_n(X) g_n(X)] ≥ E[f_n(X)] E[g_n(X)]. (10)

Furthermore, a standard application (to be found, for example, in Kallenberg, 1997) of the martingale convergence theorem tells us that f_n(X) → f(X) and g_n(X) → g(X) almost surely as n → ∞. Thus, we may take limits in (10) to conclude that E[f(X)g(X)] ≥ E[f(X)] E[g(X)].

As a next step, note that we can go from the case of binary variables to that of finitely or infinitely many variables uniformly distributed on [0, 1], simply by representing the latter by their binary expansions. (If f is increasing in the usual sense, then f(x) is also an increasing function of the binary digits; note that the converse is not true.)

Finally, to go from uniform [0, 1] variables to arbitrary real-valued random variables, it suffices to recall the inverse probability transform, by which any real-valued random variable can be obtained as an increasing transformation of a uniform [0, 1] variable, while noting that the composition of two increasing functions is increasing. Theorem 1 is therefore established.
Extensions of Harris' inequality to certain classes of dependent random variables have been made. One contribution worth mentioning in this context is Esary et al. (1967). Arguably the most famous extension is the so-called Fortuin-Kasteleyn-Ginibre (FKG) inequality of Fortuin et al. (1971); see also Holley (1974) and Georgii et al. (2001). Here I feel compelled to point out that it is fairly common in the literature that alleged applications of the FKG inequality concern i.i.d. systems, so that a lot of credit that should rightfully go to Harris ends up instead with the FKG trio.
Inspecting the Markov chain argument in the proof of proposition 1 to see what assumptions on μ are really needed, and considering the asymptotic joint distribution of (X(k), Y(k)) as k → ∞, leads to the variation of the FKG inequality that appears, for example, in theorem 4.11 of Georgii et al. (2001). Besides a technical assumption such as requiring that μ assigns positive probability to all ω ∈ {0, 1}^n (this may be weakened), the crucial assumption is that, for any i, the conditional μ-probability of seeing a 1 at coordinate i, given that the other coordinates take values according to ξ ∈ {0, 1}^{{1,...,n}\{i}}, is increasing as a function of ξ. This turns out to hold for many important examples, such as the ferromagnetic Ising model and the most relevant parts of the parameter space of the so-called random-cluster model; see Georgii et al. (2001).
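The crucial monotonicity assumption is easy to check by brute force for small examples; the following sketch (my own; the triangle graph and inverse temperature are arbitrary choices) confirms it for a tiny ferromagnetic Ising model:

```python
import itertools, math

edges = [(0, 1), (1, 2), (0, 2)]  # a triangle; ferromagnetic coupling beta
beta, n = 0.7, 3

def weight(omega):
    """Unnormalized Ising weight; omega in {0,1}^n encodes spins -1/+1."""
    s = [2 * v - 1 for v in omega]
    return math.exp(beta * sum(s[a] * s[b] for a, b in edges))

def cond_prob_one(i, xi):
    """mu-probability that coordinate i is 1 given the others (xi[i] ignored)."""
    hi = list(xi); hi[i] = 1
    lo = list(xi); lo[i] = 0
    return weight(hi) / (weight(hi) + weight(lo))

ok = True
for i in range(n):
    for xi in itertools.product([0, 1], repeat=n):
        for j in range(n):
            if j == i or xi[j] == 1:
                continue
            raised = list(xi); raised[j] = 1  # raise one other coordinate
            ok = ok and cond_prob_one(i, list(xi)) <= cond_prob_one(i, raised)
print(ok)  # True: the conditional probability is increasing in xi
```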

3. A domination result for sampling

In 1995, I was approached by two of my local colleagues at Chalmers, Johan Jonasson and Olle Nerman, who were stuck on a seemingly obvious inequality they needed in the context of survey sampling with unequal probabilities. Fresh from my Markov chain approach to Harris' inequality, I was quickly able to help them out.

Fix n and p_1, ..., p_n ∈ [0, 1], and let X_1, ..., X_n be independent (but not necessarily identically distributed) Bernoulli variables with respective parameters p_1, ..., p_n, and let S = X_1 + ... + X_n denote their sum. The question Jonasson and Nerman asked was whether, for any i ∈ {1, ..., n} and s ∈ {0, ..., n-1}, it is the case that P(X_i = 1 | S = s) ≤ P(X_i = 1 | S = s+1). The inequality is highly plausible: the larger S is, the more likely X_i should be to take the value 1. Indeed:

Proposition 2
With X_1, ..., X_n and S as defined earlier we have, for any i ∈ {1, ..., n} and s ∈ {0, ..., n-1},

P(X_i = 1 | S = s) ≤ P(X_i = 1 | S = s+1). (11)


The similar inequality P(X_i = 1 | S ≤ s) ≤ P(X_i = 1 | S > s) follows immediately from Harris' inequality (with f(X) = X_i and g(X) = 1_{S>s}), but (11) requires a different argument. The following is how I argued using Markov chains.

Proof. For s = 0, ..., n, let μ_s denote the probability measure on {0, 1}^n obtained by conditioning (X_1, ..., X_n) on the event {S = s}. For each μ_s, we would like to find a Markov chain with μ_s as its stationary distribution. Directly copying the Gibbs sampler approach in section 2 will not do, because the μ_s-conditional probability of having a 1 at coordinate i given the values at all other coordinates is always degenerate, thus producing a Markov chain that maps any state onto itself with probability 1.

Instead, let us try a variant of the Gibbs sampler where we update two coordinates at a time. For fixed distinct i, j ∈ {1, ..., n}, the conditional distribution of (X_i, X_j), given that X_i + X_j = 1, is

(X_i, X_j) = (1, 0) w.p. p_i(1 - p_j) / (p_i(1 - p_j) + p_j(1 - p_i)), and
(X_i, X_j) = (0, 1) w.p. p_j(1 - p_i) / (p_i(1 - p_j) + p_j(1 - p_i)),
and this conditional distribution is unaffected by further conditioning on S. Therefore, μ_0, μ_1, ..., μ_n are all stationary distributions for the {0, 1}^n-valued Markov chain (X(0), X(1), ...) with the transition mechanism where, at each time k, we do the following:

1. Pick two indices i, j ∈ {1, ..., n} at random according to the uniform distribution without replacement.
2. If X_i(k) = X_j(k), then set X_i(k+1) = X_j(k+1) = X_i(k). Otherwise, set

(X_i(k+1), X_j(k+1)) = (1, 0) w.p. p_i(1 - p_j) / (p_i(1 - p_j) + p_j(1 - p_i)), and
(X_i(k+1), X_j(k+1)) = (0, 1) w.p. p_j(1 - p_i) / (p_i(1 - p_j) + p_j(1 - p_i)). (12)

3. Set X_h(k+1) = X_h(k) for all h ∉ {i, j}.

Now let us run (X(0), X(1), ...) together with a second {0, 1}^n-valued Markov chain (Y(0), Y(1), ...) with exactly the same transition kernel. We 'synchronize' the transitions by

(a) always picking the same coordinates i and j in the Y chain as in the X chain;
(b) whenever X_i(k) + X_j(k) = Y_i(k) + Y_j(k) = 1, taking (X_i(k+1), X_j(k+1)) and (Y_i(k+1), Y_j(k+1)) to be equal (and equal to (1, 0) or (0, 1) with the probabilities prescribed in (12)).


Now start the chains with X(0) chosen according to μ_s and, independently, Y(0) according to μ_{s+1}.

Define, for each k, Z(k) as the number of coordinates in which X(k) and Y(k) differ. Note that Z(k) ≥ 1, with equality if and only if X(k) ≤ Y(k). Furthermore, with the aforementioned synchronization of the two chains, we see that (Z(0), Z(1), Z(2), ...) is a decreasing process, because Z(k+1) will equal Z(k) in all cases except when i and j happen to be chosen in such a way that (X_i(k), X_j(k)) = (1, 0) and (Y_i(k), Y_j(k)) = (0, 1) or vice versa, in which case we get Z(k+1) = Z(k) - 2. Whenever Z(k) > 1, there is a positive probability (bounded below by 2/(n(n-1))) that such i and j are chosen, and repeated application of the Borel-Cantelli lemma implies that a.s. the Z process keeps decreasing until eventually it reaches the absorbing level 1.

Hence, from some (random) time and onwards, we will have X(k) ≤ Y(k), and in particular X_i(k) ≤ Y_i(k). By picking two {0, 1}^n-valued random objects X and Y according to the asymptotic distribution as k → ∞ of (X(k), Y(k)) (after passing to a subsequence if necessary - which incidentally it is not), and recalling that X(k) and Y(k) have distributions μ_s and μ_{s+1} for each k (and thus also in the limit), we obtain (11).
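A small simulation illustrates the proof; in the sketch below (my own, with arbitrary parameters, and with fixed starting configurations standing in for draws from μ_s and μ_{s+1}), the synchronized pair-update chains are run until the disagreement count Z(k) has dropped to its absorbing level 1:

```python
import random

def coupled_step(x, y, p, rng):
    """One synchronized transition: the same pair (i, j) is used in both
    chains, and the same uniform decides the (1,0)-versus-(0,1) outcome
    in (12), implementing rules (a) and (b)."""
    i, j = rng.sample(range(len(x)), 2)  # uniform pair, without replacement
    u = rng.random()
    q = p[i] * (1 - p[j]) / (p[i] * (1 - p[j]) + p[j] * (1 - p[i]))
    for c in (x, y):
        if c[i] != c[j]:
            c[i], c[j] = (1, 0) if u < q else (0, 1)

rng = random.Random(2)
p = [0.2, 0.5, 0.7, 0.9]
x = [1, 0, 1, 0]  # sum s = 2
y = [0, 1, 1, 1]  # sum s + 1 = 3
for _ in range(500):
    coupled_step(x, y, p, rng)
z = sum(a != b for a, b in zip(x, y))
print(z, all(a <= b for a, b in zip(x, y)))  # typically 1 and True
```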
This argument has to my knowledge previously appeared only in the preprint by Jonasson & Nerman (1996). Publication of their paper was delayed, and when eventually its descendant, Aires et al. (2002), appeared, the work had evolved to the point where proposition 2 was no longer needed.

Some years after Jonasson and Nerman's original query, Yuval Peres and, independently, Tue Tjur explained to me, in response to my Markov chain argument, that proposition 2 is in fact intimately related to an inequality of no less a soul than Isaac Newton.
In its simplest form, Newton's inequality states that, if a polynomial

P(x) = a_0 + a_1 x + ... + a_n x^n

with real coefficients has only real roots, then the coefficients satisfy the log-concavity relation

a_{j-1} a_{j+1} ≤ a_j^2 (13)

for every j; see Newton (1707) or Niculescu (2000). The connection to sums of Bernoulli variables is as follows. With X_1, ..., X_n independent Bernoulli variables with parameters p_1, ..., p_n, the probability P(S = j) arises as the coefficient a_j in the polynomial

∏_{j=1}^n (1 - p_j + p_j x),

all of whose roots are real.

Now, for i ∈ {1, ..., n}, define S_i = S - X_i. Since S_i is a sum of n - 1 independent Bernoulli variables, we get from (13) that

P(S_i = s - 1) P(S_i = s + 1) ≤ P(S_i = s)^2 (14)

for any s. The inequality (11) in proposition 2 is the same as

P(X_i = 1 | S = s) / P(X_i = 0 | S = s) ≤ P(X_i = 1 | S = s + 1) / P(X_i = 0 | S = s + 1)

(because t ↦ t/(1 - t) is increasing), which, by rewriting the two ratios, we see is equivalent to

P(X_i = 1, S_i = s - 1) / P(X_i = 0, S_i = s) ≤ P(X_i = 1, S_i = s) / P(X_i = 0, S_i = s + 1). (15)

The probability in the numerator of the left-hand side of (15) equals p_i P(S_i = s - 1), and similarly for the three other probabilities; so (15) is equivalent to

p_i P(S_i = s - 1) / ((1 - p_i) P(S_i = s)) ≤ p_i P(S_i = s) / ((1 - p_i) P(S_i = s + 1)),

which of course follows via (14) from Newton's inequality.
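Both the identification of P(S_i = s) with polynomial coefficients and the log-concavity (13) are easy to confirm numerically; a brief sketch (my own, with arbitrary p_i):

```python
def sum_distribution(ps):
    """Coefficients a_j of prod_j (1 - p_j + p_j x), i.e. P(S = j) for a sum
    of independent Bernoulli(p_j) variables, computed by iterated convolution."""
    a = [1.0]
    for p in ps:
        new = [0.0] * (len(a) + 1)
        for j, c in enumerate(a):
            new[j] += (1 - p) * c  # the new variable equals 0
            new[j + 1] += p * c    # the new variable equals 1
        a = new
    return a

a = sum_distribution([0.2, 0.5, 0.7, 0.9])
# Newton's log-concavity (13): a_{j-1} a_{j+1} <= a_j^2 for every j
print(all(a[j - 1] * a[j + 1] <= a[j] ** 2 for j in range(1, len(a) - 1)))
```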
Note that this reasoning can be turned around to derive Newton's inequality - at least in the case where all roots are negative - from proposition 2. At one point, I therefore toyed with the idea of publishing a note with a title like 'A probabilistic proof of an inequality of Newton', but decided against it, as (13) admits a relatively straightforward proof by induction in n (I leave this to the reader).

4. Conditioning and correlation in percolation

Harris' inequality and related results have proved extremely useful in percolation theory and related topics - see, for example, Grimmett (1999) and Georgii et al. (2001) - a fact that motivates considerable interest in trying to come up with new correlation inequalities. An important example is the so-called BK inequality of van den Berg & Kesten (1985) for 'disjoint occurrence' of increasing events, and the extension of this by Reimer (2000) to arbitrary events. Here let us look in another direction, namely, that of whether (variants of) the Harris and FKG inequalities are preserved under various kinds of conditioning.

Let us focus on the standard bond percolation model on a finite or infinite but locally finite graph G = (V, E) with retention parameter p, where each edge is independently deleted with probability 1 - p, as in section 2. Write X ∈ {0, 1}^E for the resulting random subgraph, as represented by the indicator variables X_e = 1_{e is retained} for each e ∈ E. Harris' inequality tells us that, for any two bounded and increasing functions f, g: {0, 1}^E → R, we have

E[f(X)g(X)] ≥ E[f(X)] E[g(X)].


Suppose now that we condition on an event A ⊆ {0, 1}^E. Does the inequality

E[f(X)g(X) | A] ≥ E[f(X) | A] E[g(X) | A] (16)

then hold? It is easy to devise examples showing that the answer in general is no; nor can we recover (16) by requiring that (the indicator function of) A be increasing or decreasing.

But all is not lost. Restricting to certain 'connectivity events' does turn out to yield certain correlation inequalities. For a vertex x ∈ V, write T_x for the class of events whose occurrence or non-occurrence can be determined from knowing which vertices and edges are part of the connected component of X containing x. Examples of events in T_x are

A = {the connected component containing x has at least 10 edges}

and, for fixed y ∈ V, the event B = {x ↔ y} that y is in the same connected component of X as x is. We write {x ↮ y} for the complement of the latter event, and note that this complement is also in T_x.
The following conditional correlation inequality was recently established by van den Berg
et al. (2006a).

Theorem 2
Consider bond percolation on a locally finite graph G = (V, E) with retention parameter p ∈ [0, 1]. Then, for any two vertices x, y ∈ V and any two increasing events A ∈ T_x and B ∈ T_y, we have


P(A ∩ B | {x ↮ y}) ≤ P(A | {x ↮ y}) P(B | {x ↮ y}). (17)


Note the reversal of the inequality compared with Harris' inequality. The result is intuitively plausible, because if we condition on {x ↮ y}, then further conditioning on the connected component C_x of x being large (in some sense) restricts the room available for the component of y, and should therefore tend to make it smaller. It appears, however, to be a substantial challenge to find a more direct proof than the Markov chain argument given in the following. An alternative proof, based on induction in the size of the graph G, can be extracted from van den Berg & Kahn (2001), proof of theorem 1.2, and van den Berg et al. (2006a), proof of theorem 1.5.
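On a very small graph, the inequality (17) can be confirmed by exact enumeration; in the following sketch (my own; the 4-cycle, the two events and the parameter are arbitrary choices), A and B are the events that the components of x and y, respectively, contain at least one edge:

```python
import itertools

edges = [(0, 1), (1, 3), (3, 2), (2, 0)]  # a 4-cycle; take x = 0, y = 3
p, x, y = 0.5, 0, 3

def component(v, config):
    """Vertex set of v's connected component in the retained subgraph."""
    comp, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for (a, b), keep in zip(edges, config):
            if keep and u in (a, b):
                w = b if u == a else a
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
    return comp

tot = pAB = pA = pB = 0.0
for config in itertools.product([0, 1], repeat=len(edges)):
    prob = 1.0
    for keep in config:
        prob *= p if keep else 1 - p
    cx = component(x, config)
    if y in cx:
        continue  # condition on the event that x and y are not connected
    a = len(cx) >= 2  # A: x's component contains at least one edge
    b = len(component(y, config)) >= 2  # B: likewise for y
    tot += prob
    pAB += prob * (a and b)
    pA += prob * a
    pB += prob * b
print(pAB / tot <= (pA / tot) * (pB / tot))  # True, in line with (17)
```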

In van den Berg et al. (2006a), theorem 2 is proved in the greater generality of the random-cluster model with clustering parameter q ≥ 1 (the case q = 1 corresponds to the ordinary bond percolation setup considered here) using an extension of the Markov chain argument; the induction-based alternative proof seems not to work in this setting. Applications of theorem 2 to settle certain open problems concerning the equilibrium behaviour of an interacting particle system known as the contact process appear in van den Berg et al. (2006a, b).

Proof. It suffices to prove the theorem for the case where G is finite, as the infinite case follows from standard limiting arguments similar to those discussed in the proof of theorem 1. Consider the {0, 1}^E-valued Markov chain (X(0), X(1), ...) with the following transition mechanism, where, to go from time k to time k + 1, the edge configuration X(k) is modified to X(k+1) via an intermediate configuration X'(k):

1. For each edge e ∈ E that either is in the connected component of X(k) containing x or has a vertex in this connected component as an endpoint, set X'_e(k) = X_e(k), whereas for all other edges set X'_e(k) = 1 (resp. 0) with probability p (resp. 1 - p), independently for different edges.
2. For each edge e ∈ E that either is in the connected component of X'(k) containing y or has a vertex in this connected component as an endpoint, set X_e(k+1) = X'_e(k), whereas for all other edges set X_e(k+1) = 1 (resp. 0) with probability p (resp. 1 - p), independently for different edges.
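A direct implementation of this two-step transition may clarify the construction; the sketch below (my own, on the same illustrative 4-cycle as in the previous sketch) also verifies that the chain never connects x to y:

```python
import random

edges = [(0, 1), (1, 3), (3, 2), (2, 0)]  # a 4-cycle; take x = 0, y = 3

def component(v, config):
    comp, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for (a, b), keep in zip(edges, config):
            if keep and u in (a, b):
                w = b if u == a else a
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
    return comp

def frozen_edges(v, config):
    """Indices of edges in, or sharing an endpoint with, v's component."""
    comp = component(v, config)
    return {k for k, (a, b) in enumerate(edges) if a in comp or b in comp}

def transition(config, p, rng, x=0, y=3):
    """Step 1: resample every edge away from x's component; step 2: resample
    every edge away from y's component of the intermediate configuration."""
    keep1 = frozen_edges(x, config)
    mid = [config[k] if k in keep1 else int(rng.random() < p)
           for k in range(len(edges))]
    keep2 = frozen_edges(y, mid)
    return [mid[k] if k in keep2 else int(rng.random() < p)
            for k in range(len(edges))]

rng = random.Random(3)
config = [0, 0, 0, 0]  # a state in which x and y are not connected
for _ in range(1000):
    config = transition(config, 0.5, rng)
print(3 in component(0, config))  # False: the chain preserves {x <-/-> y}
```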

Write μ for the probability measure on {0, 1}^E corresponding to conditioning percolation with parameter p on the event {x ↮ y}. If we condition μ on the connected component C_x containing x, then clearly the conditional distribution of the rest of the configuration is i.i.d.(p) percolation on the edges that are neither in C_x nor adjacent to a vertex in C_x. Viewing the aforementioned transition kernel as the composition of two kernels (steps 1 and 2), we thus see immediately that μ is invariant under step 1, and similarly under step 2, and therefore also under the full kernel for (X(0), X(1), ...). Furthermore, the chain is easily seen to be irreducible (within the set of states satisfying x ↮ y) and aperiodic; so no matter how the initial state X(0) is chosen, we know that the distribution of X(k) tends to μ as k → ∞.
Now, if we were to imitate the approach of the previous two sections, we would look for some suitable second Markov chain to couple (X(0), X(1), ...) with. We will not do so here, but will instead do something similar in spirit, namely, specify in more detail how the randomization in the transition mechanism is carried out. To this end, we introduce an array {U_e(k)}_{e∈E, k=0,1,2,...} of i.i.d. Bernoulli(p) random variables, and an array {U*_e(k)}_{e∈E, k=0,1,2,...} of i.i.d. Bernoulli(1 - p) random variables, independent of the first array. (Coupling aficionados may view the following as a coupling of the Markov chain and these arrays.) The random parts of steps 1 and 2 are implemented as follows:


1. For each edge e ∈ E that is neither in the connected component of X(k) containing x nor has a vertex of that component incident to it, set X'_e(k) = U_e(k).
2. For each edge e ∈ E that is neither in the connected component of X'(k) containing y nor has a vertex of that component incident to it, set X_e(k+1) = 1 - U*_e(k).

Now, start the Markov chain in some fixed state X(0) in which x ↮ y. With the chosen transition mechanism, the set of edges in the connected component of y in X(1) becomes an increasing function of the variables {U_e(0)}_{e∈E}, while the set of edges in the connected component of x in X(1) becomes a decreasing function of the variables {U_e(0)}_{e∈E} and {U*_e(0)}_{e∈E}. And proceeding by induction over k, we find that the set of edges in the connected component of y in X(k) is an increasing function of the variables {U_e(i)}_{e∈E, i=0,...,k-1} and {U*_e(i)}_{e∈E, i=0,...,k-2}, and that the set of edges in the connected component of x in X(k) is a decreasing function of the variables {U_e(i)}_{e∈E, i=0,...,k-1} and {U*_e(i)}_{e∈E, i=0,...,k-1}.
Constructing the Markov chain as an (in parts) monotone function of i.i.d. random variables puts us in an ideal position for exploiting Harris' inequality. Write μ_k for the distribution on {0, 1}^E of X(k), and fix two increasing events A ∈ T_x and B ∈ T_y. Then 1_B(X(k)) is an increasing, and 1_A(X(k)) a decreasing, function of the variables in the two arrays; so, recalling that the composition of two increasing functions is increasing, we get from Harris' inequality (applied to 1 - 1_A(X(k)) and 1_B(X(k))) that

μ_k(A ∩ B) ≤ μ_k(A) μ_k(B).

As k → ∞, we get

μ(A ∩ B) ≤ μ(A) μ(B),

which proves theorem 2.


Note that this proof shows a bit more than theorem 2: it applies to any bounded f and g such that f is increasing in the set of edges in the connected component containing x as well as decreasing in the set of edges in the connected component containing y, and vice versa for g, in which case we get μ(fg) ≤ μ(f)μ(g). If we restrict to the special case of events depending only on the connected component containing x, then we recover van den Berg & Kahn (2001), theorem 1.5.
Another aspect of the proof worth noting is that we could have avoided the appeal at the start of the proof to 'standard limiting arguments' by running the Markov chain directly on an infinite graph. The fact that X(k) then converges in distribution to μ is slightly less elementary than in the finite case, but the following coupling argument will do: run the chain (X(0), X(1), ...) started from an arbitrary configuration, together with another chain (Y(0), Y(1), ...) evolving according to the same rules, with Y(0) chosen at random according to μ, and use the same arrays {U_e(k)}_{e∈E, k=0,1,2,...} and {U*_e(k)}_{e∈E, k=0,1,2,...} in both chains. If y is incident to exactly d edges e_1, ..., e_d, then this guarantees that the two chains agree from time k + 1 and onwards as soon as U_{e_1}(k) = ... = U_{e_d}(k) = 0, so that for any event A we have

|μ_k(A) - μ(A)| ≤ (1 - (1 - p)^d)^k, (18)

which thus is a bound on the so-called total variation distance between μ_k and μ. The key here is of course that this bound tends to 0 as k → ∞.

5. Concluding remarks

The examples of Markov chain arguments in sections 2 to 4 have a number of common features. One such feature is that each of them is used to derive one inequality or another. This raises the question of whether the circle of ideas that I have tried to vaguely delineate by means of these examples is limited to establishing inequalities. The answer is no, and the reader may, for example, turn to the proof of lemma 2.4 in Häggström (1996) for an argument that will immediately be recognized as belonging to the same circle, but which is used to prove an exact identity. Other examples of results of kinds other than inequalities obtained in this fashion are the ergodic-theoretic results of van den Berg & Steif (1998), Häggström & Steif (2000) and Häggström et al. (2000, 2002) mentioned at the end of section 1.

Another noteworthy feature is the following. As in the MCMC method, most of the examples exploit the asymptotic behaviour of their respective Markov chains. But unlike in the MCMC method, where it is of crucial importance that the convergence to stationarity happens relatively fast, the rate of convergence to equilibrium is unimportant in the examples considered here. The difference is, of course, that in the MCMC method we need to implement and run the chains in computer simulations, whereas in the examples considered here it suffices to run them 'in our heads', where it only takes a split second to imagine a chain running until we are close to equilibrium.

Acknowledgement

I am grateful to Yuval Peres and to Tue Tjur for pointing out the connection between proposition 2 and Newton's inequality. Thanks also to Jeff Steif and to a referee for comments and corrections.

References

Aires, N., Jonasson, J. & Nerman, O. (2002). Order sampling with prescribed inclusion probabilities.
Scand. J. Statist. 29, 183-187.
Basharin, G. P., Langville, A. N. & Naumov, V. A. (2004). The life and work of A. A. Markov. Linear Algebra Appl. 386, 3-26.
van den Berg, J. & Kahn, J. (2001). A correlation inequality for connection events in percolation. Ann.
Probab. 29, 123-126.
van den Berg, J. & Kesten, H. (1985). Inequalities with applications to percolation and reliability. J. Appl. Probab. 22, 556-569.
van den Berg, J. & Steif, J. E. (1998). On the existence and non-existence of finitary codings for a class of random fields. Ann. Probab. 27, 1501-1522.
van den Berg, J., Häggström, O. & Kahn, J. (2006a). Some conditional correlation inequalities for percolation and related processes. Random Structures Algorithms 29, 417-435.
van den Berg, J., Häggström, O. & Kahn, J. (2006b). Proof of a conjecture of N. Konno for the 1D contact process. In Dynamics and stochastics: Festschrift in honor of Michael Keane (eds D. Denteneer, F. den Hollander & E. Verbitskiy), 16-23. IMS Lecture Notes-Monograph Series. Institute of Mathematical Statistics, Beachwood, Ohio.
Esary, J. D., Proschan, F. & Walkup, D. W. (1967). Association of random variables, with applications.
Ann. Math. Statist. 38, 1466-1474.
Fortuin, C. M., Kasteleyn, P. W. & Ginibre, J. (1971). Correlation inequalities on some partially ordered
sets. Comm. Math. Phys. 22, 89-103.
Geman, S. & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration
of images. IEEE Trans. Pattern Anal. Machine Intell. 6, 721-741.
Georgii, H.-O., Häggström, O. & Maes, C. (2001). The random geometry of equilibrium phases. In Phase transitions and critical phenomena, Vol. 18 (eds C. Domb & J. L. Lebowitz), 1-142. Academic Press, London.
Gilks, W., Richardson, S. & Spiegelhalter, D. (1996). Markov chain Monte Carlo in practice. Chapman
& Hall, London.
Grimmett, G. R. (1999). Percolation, 2nd edn. Springer, New York.
Häggström, O. (1996). The random-cluster model on a homogeneous tree. Probab. Theory Related Fields 104, 231-253.


Häggström, O. (2002). Finite Markov chains and algorithmic applications. Cambridge University Press, Cambridge.
Häggström, O., Jonasson, J. & Lyons, R. (2002). Coupling and Bernoullicity in random-cluster and Potts models. Bernoulli 8, 275-294.
Häggström, O., Schonmann, R. & Steif, J. E. (2000). The Ising model on diluted graphs and strong amenability. Ann. Probab. 28, 1111-1137.
Häggström, O. & Steif, J. E. (2000). Propp-Wilson algorithms and finitary codings for high noise Markov random fields. Combin. Probab. Comput. 9, 425-439.
Harris, T. E. (1960). Lower bound for the critical probability in a certain percolation process. Proc. Cambridge Philos. Soc. 56, 13-20.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications.
Biometrika 57, 97-109.
Holley, R. (1974). Remarks on the FKG inequalities. Comm. Math. Phys. 36, 227-231.
Jonasson, J. & Nerman, O. (1996). On maximum entropy πps-sampling with fixed sample size. Technical report, Chalmers and Göteborg University. http://www.math.chalmers.se/Stat/Research/Preprints/index.cgi.
Kallenberg, O. (1997). Foundations of modern probability. Springer, New York.
Kesten, H. (1980). The critical probability of bond percolation on the square lattice equals 1/2. Comm. Math. Phys. 74, 41-59.
Lindvall, T. (1992). Lectures on the coupling method. Wiley, New York.
Markov, A. A. (1906). Rasprostranenie zakona bol'shih chisel na velichiny, zavisyaschie drug ot druga. Izvestiya Fiziko-matematicheskogo obschestva pri Kazanskom universitete, 2-ya seriya, 15, 135-156.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087-1092.
Newton, I. (1707). Arithmetica Universalis: Sive de Compositione et Resolutione Arithmetica Liber.
Niculescu, C. (2000). A new look at Newton's inequalities. J. Inequalities Pure Appl. Math. 1, Issue 1,
Article 17, 1-14.
Propp, J. & Wilson, D. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms 9, 223-252.
Reimer, D. (2000). Proof of the van den Berg-Kesten conjecture. Combin. Probab. Comput. 9, 27-32.
Thorisson, H. (1995). Coupling methods in probability theory. Scand. J. Statist. 22, 159-182.
Thorisson, H. (2000). Coupling, stationarity, and regeneration. Springer, New York.

Received July 2006, in final form January 2007

Olle Häggström, Mathematical Sciences, Chalmers University of Technology, 412 96 Göteborg, Sweden.
E-mail: [email protected]

