
SIAM J. IMAGING SCIENCES © 2022 Society for Industrial and Applied Mathematics
Vol. 15, No. 2, pp. 701–737

Bayesian Imaging Using Plug & Play Priors: When Langevin Meets Tweedie∗

Rémi Laumont†, Valentin De Bortoli‡, Andrés Almansa†, Julie Delon§, Alain Durmus¶, and Marcelo Pereyra∥

Abstract. Since the seminal work of Venkatakrishnan, Bouman, and Wohlberg [Proceedings of the Global Con-
ference on Signal and Information Processing, IEEE, 2013, pp. 945–948] in 2013, Plug & Play (PnP)
methods have become ubiquitous in Bayesian imaging. These methods derive estimators for inverse
problems in imaging by combining an explicit likelihood function with a prior that is implicitly
defined by an image denoising algorithm. In the case of optimization schemes, some recent works
guarantee the convergence to a fixed point, albeit not necessarily a maximum a posteriori Bayesian
estimate. In the case of Monte Carlo sampling schemes for general Bayesian computation, to the
best of our knowledge there is no known proof of convergence. Algorithm convergence issues aside,
there are important open questions regarding whether the underlying Bayesian models and estima-
tors are well defined, are well posed, and have the basic regularity properties required to support
efficient Bayesian computation schemes. This paper develops theory for Bayesian analysis and com-
putation with PnP priors. We introduce PnP-ULA (Plug & Play unadjusted Langevin algorithm)
for Monte Carlo sampling and minimum mean square error estimation. Using recent results on the
quantitative convergence of Markov chains, we establish detailed convergence guarantees for this
algorithm under realistic assumptions on the denoising operators used, with special attention to
denoisers based on deep neural networks. We also show that these algorithms approximately target
a decision-theoretically optimal Bayesian model that is well posed and meaningful from a frequen-
tist viewpoint. PnP-ULA is demonstrated on several canonical problems such as image deblurring
and inpainting, where it is used for point estimation as well as for uncertainty visualization and
quantification.

Key words. Plug & Play, inverse problems, deblurring, inpainting, Markov chain Monte Carlo, Langevin
algorithm


Received by the editors March 22, 2021; accepted for publication (in revised form) December 28, 2021; published
electronically May 31, 2022. The first two authors contributed equally to this work.
https://doi.org/10.1137/21M1406349
Funding: The work of the first author was partially supported by grants from Région Ile-De-France. The work of
the second author was partially supported by EPSRC grant EP/R034710/1. The work of the third and fourth authors
was supported by the French Research Agency through the PostProdLEAP project ANR-19-CE23-0027-01. The work
of the fifth author was supported by the Lagrange Mathematical and Computing Research Center. The work of
the sixth author was supported by EPSRC grants EP/T007346/1 and EP/W007681/1. Computer experiments for
this work ran on a Titan Xp GPU donated by NVIDIA, as well as on HPC resources from GENCI-IDRIS (grants
2020-AD011011641, 2021-AD011011641R1).

†Université Paris Cité, CNRS, MAP5 UMR 8145, F-75006 Paris, France (remi.laumont@parisdescartes.fr, andres.almansa@parisdescartes.fr).
‡Department of Statistics, University of Oxford, 24-29 St Giles, OX1 3LB, Oxford, United Kingdom (valentin.debortoli@gmail.com).
§Université Paris Cité, CNRS, MAP5 UMR 8145, F-75006 Paris, France, and Institut Universitaire de France (IUF), 75231 Paris Cedex 05, France (julie.delon@parisdescartes.fr).
¶Centre Borelli, UMR 9010, École Normale Supérieure Paris-Saclay, 91190 Gif-sur-Yvette, France (alain.durmus@cmla.ens-cachan.fr).
∥School of Mathematical and Computer Sciences, Heriot-Watt University and Maxwell Institute for Mathematical Sciences, Edinburgh, United Kingdom (m.pereyra@hw.ac.uk).


AMS subject classifications. 65J22, 68U10, 62F15, 65C40, 65C60, 65J20, 65D18, 90C26, 65K05, 68Q25

DOI. 10.1137/21M1406349

1. Introduction.
1.1. Bayesian inference in imaging inverse problems. Most inverse problems in imaging
aim at reconstructing an unknown image x ∈ Rd from a degraded observation y ∈ Rm un-
der some assumptions on their relationship. For example, many works consider observation
models of the form y = A(x) + n, where A : Rd → Rm is a degradation operator modeling de-
terministic instrumental aspects of the observation process, and n is an unknown (stochastic)
noise term taking values in Rm . The operator A can be known or not and is usually assumed
to be linear (e.g., A can represent blur, missing pixels, a projection, etc.).
The estimation of x from y is usually ill posed or ill conditioned1 and additional assump-
tions on the unknown x are required in order to deliver meaningful estimates. The Bayesian
statistical paradigm provides a natural framework to regularize such estimation problems.
The relationship between x and y is described by a statistical model with likelihood function
p(y|x), and the knowledge about x is encoded by the prior distribution for x, typically speci-
fied via a density function p(x) or by its potential U (x) = − log p(x). Similarly, in some cases
the likelihood p(y|x) is specified via the potential F (x, y) = − log p(y|x). The likelihood and
prior define the joint distribution with density p(x, y) = p(y|x)p(x), from which we derive the
posterior distribution with density p(x|y), where for any x ∈ Rd, y ∈ Rm

p(x|y) = p(y|x)p(x) / ∫_{Rd} p(y|x̃)p(x̃)dx̃ ,

which underpins all inference about x given the observation y. Most imaging methods seek to
derive estimators reaching some kind of consensus between prior and likelihood, as for instance
the minimum mean square error (MMSE) or maximum a posteriori (MAP) estimators
(1.1)    x̂map = arg max_{x∈Rd} p(x|y) = arg min_{x∈Rd} {F(x, y) + U(x)} ,

(1.2)    x̂mmse = arg min_{u∈Rd} E[∥x − u∥² | y] = E[x|y] = ∫_{Rd} x̃ p(x̃|y)dx̃ .

The quality of the inference about x given y depends on how accurately the specified prior
represents the true marginal distribution for x. Most works in the Bayesian imaging literature
consider relatively simple priors promoting sparsity in transformed domains or piecewise reg-
ularity (e.g., involving the ℓ1 norm or the total variation pseudonorm [64, 18, 51, 56]), Markov
random fields [13], or learning-based priors like patch-based Gaussian or Gaussian mixture
models [82, 78, 1, 71, 40]. Special attention is given in the literature to models that have
specific factorization structures or that are log-concave, as this enables the use of Bayesian
computation algorithms that scale efficiently to high dimensions and which have detailed
convergence guarantees [56, 29, 61, 35, 21].
1 That is, either the estimation problem does not admit a unique solution, or there exists a unique solution but it is not Lipschitz continuous w.r.t. perturbations in the data y.


1.2. Bayesian computation in imaging inverse problems. There is a vast literature on


Bayesian computation methodology for models related to imaging sciences (see, e.g., [58]).
Here, we briefly summarize efficient high-dimensional Bayesian computation strategies derived
from the Langevin stochastic differential equation (SDE)
(1.3)    dXt = ∇ log p(Xt|y)dt + √2 dBt = [∇ log p(y|Xt) + ∇ log p(Xt)]dt + √2 dBt ,

where (Bt )t⩾0 is a d-dimensional Brownian motion. When p(x|y) is proper and smooth, with
x 7→ ∇ log p(x|y) Lipschitz continuous,2 then, for any initial condition X0 ∈ Rd , the SDE (1.3)
has a unique strong solution (Xt )t⩾0 that admits the posterior of interest p(x|y) as unique
stationary density [63]. In addition, for any initial condition X0 ∈ Rd the distribution of
Xt converges toward the posterior distribution in total variation. Although solving (1.3) in
continuous time is generally not possible, we can use discrete time approximations of (1.3)
to generate samples that are approximately distributed according to p(x|y). A natural choice
is the unadjusted Langevin algorithm (ULA) Markov chain (Xk )k⩾0 obtained from an Euler–
Maruyama discretization of (1.3), given by X0 ∈ Rd and the recursion for all k ∈ N

(1.4)    Xk+1 = Xk + δ∇ log p(y|Xk) + δ∇ log p(Xk) + √(2δ) Zk+1 ,

where {Zk : k ∈ N} is a family of independent and identically distributed (i.i.d.) Gaussian


random variables with zero mean and identity covariance matrix and δ > 0 is a step-size
which controls a trade-off between asymptotic accuracy and convergence speed [24, 28]. The
approximation error involved in discretizing (1.3) can be asymptotically removed at the ex-
pense of additional computation by combining (1.4) with a Metropolis–Hastings correction
step, leading to the so-called Metropolis-adjusted Langevin algorithm (MALA) [63].
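For illustration only, the following minimal Python sketch (ours, not the authors' code) implements the recursion (1.4) on a toy Gaussian model for which the posterior mean is known in closed form; the function and parameter names are placeholders.

```python
import numpy as np

def ula_sample(grad_log_lik, grad_log_prior, x0, delta, n_iter, rng=None):
    """ULA recursion (1.4): X_{k+1} = X_k + delta*grad log p(y|X_k)
    + delta*grad log p(X_k) + sqrt(2*delta)*Z_{k+1}."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_iter):
        x = x + delta * grad_log_lik(x) + delta * grad_log_prior(x) \
            + np.sqrt(2.0 * delta) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Toy check: y = x + noise with Gaussian prior N(0, s^2 I); the posterior is
# Gaussian with mean s^2 y / (s^2 + sigma^2), so the chain average can be
# compared against this closed-form value.
d, sigma, s = 10, 0.5, 1.0
rng = np.random.default_rng(0)
y = rng.standard_normal(d)
grad_ll = lambda x: (y - x) / sigma**2      # gradient of log p(y|x)
grad_pr = lambda x: -x / s**2               # gradient of log p(x)
chain = ula_sample(grad_ll, grad_pr, np.zeros(d), delta=1e-2, n_iter=20000, rng=rng)
print(np.max(np.abs(chain[5000:].mean(axis=0) - s**2 * y / (s**2 + sigma**2))))
```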
When the prior density p(x) is log-concave but not smooth, one can still use ULA by
approximating the gradient of U(x) = − log p(x) in (1.4) by the gradient of the smooth Moreau–
Yosida envelope Uλ(x), given for any x ∈ Rd and λ > 0 by ∇Uλ(x) = (1/λ)(x − proxλU(x)).3 For
example, one could use the Moreau–Yosida ULA [29], given by X0 ∈ Rd and the following
recursion for all k ∈ N:

(1.5)    Xk+1 = Xk + δ∇ log p(y|Xk) + (δ/λ)[proxλU(Xk) − Xk] + √(2δ) Zk+1 .
Notice that proxλU is equivalent to MAP denoising under the prior p(x) for additive white
Gaussian noise with noise variance λ. The Plug & Play ULA (PnP-ULA) methods studied
in this paper are closely related to (1.5), with a state-of-the-art Gaussian denoiser “plugged”
in lieu of proxλU . However, instead of approximating ∇U via a Moreau–Yosida envelope as
above, we use Tweedie’s identity (2.2) relating ∇U to an MMSE denoiser (see subsection 2.1).
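As a concrete instance of the quantities in (1.5) (an illustration of ours), take the sparsity-promoting potential U(x) = ∥x∥1, a prior of the type mentioned in subsection 1.1. The proximal operator is then elementwise soft-thresholding and the Moreau–Yosida gradient is a clipped linear map,

proxλU(x)i = sign(xi) max(|xi| − λ, 0) ,    ∇Uλ(x)i = (xi − proxλU(x)i)/λ = sign(xi) min(|xi|/λ, 1) ,

so the prior term of the drift in (1.5) is (1/λ)-Lipschitz even though ∇U is not defined at the origin.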
1.3. Machine learning and Plug & Play approaches in imaging inverse problems. In
an apparently different direction, machine learning approaches have recently gained consider-
able importance in the field of imaging inverse problems, particularly strategies based on deep
2 That is, there exists L ⩾ 0 such that for any x1, x2 ∈ Rd, ∥∇ log p(x1|y) − ∇ log p(x2|y)∥ ⩽ L∥x1 − x2∥.
3 Recall that the Moreau–Yosida envelope is defined as Uλ(x) = inf_{x̃} {U(x̃) + (1/(2λ))∥x − x̃∥²} and the proximal operator is defined as proxλU(x) = arg min_{x̃∈Rd} {U(x̃) + (1/(2λ))∥x − x̃∥²}.


neural networks. Indeed, neural networks can be trained as regressors to learn the function
y 7→ x̂mmse empirically from a huge dataset of examples {(x′i, y′i)}_{i=1}^N, where N ∈ N is the size
of the training dataset. Many recent works on the topic report unprecedented accuracy. This
training can be agnostic [26, 79, 81, 33, 66, 32] or exploit the knowledge of A in the network
architecture via unrolled optimization techniques [37, 22, 25, 34]. However, solutions encoded
by end-to-end neural networks are mostly problem specific and not easily adapted to reflect
changes in the problem (e.g., in instrumental settings). There also exist concerns regarding
the stability of such approaches for the general reconstruction problem [5, 4].
A natural strategy to reconcile the strengths of the Bayesian paradigm and neural networks
is provided by Plug & Play approaches. These data-driven regularization approaches learn
an implicit representation of the prior density p(x) (or its potential U (x) = − log p(x)) while
keeping an explicit likelihood density, which is usually assumed to be known and calibrated
[6]. More precisely, using a denoising algorithm Dε , Plug & Play approaches seek to derive an
approximation of the gradient ∇U (called the Stein score) [11, 12] or proxU [52, 80, 19, 44, 65],
which can, for instance, be used within an iterative minimization scheme to approximate x̂MAP ,
or within a Monte Carlo sampling scheme to approximate x̂mmse [3, 38, 43]. To the best of our
knowledge, the idea of leveraging a denoising algorithm to approximate the score ∇U within
an iterative Monte Carlo scheme was first proposed in the seminal paper [3] in the context of
generative modeling with denoising autoencoders, where the authors present a Monte Carlo
scheme that can be viewed as an approximate Plug & Play MALA. This scheme was recently
combined with an expectation maximization approach and applied to Bayesian inference for
inverse problems in imaging in [38]. Similarly, the recent work [43] proposes to solve imaging
inverse problems by using a Plug & Play stochastic gradient strategy that has close connections
to an unadjusted version of the MALA scheme of [3]. While these approaches have shown
some remarkable empirical performance, they rely on hybrid algorithms that are not always
well understood and that in some cases fail to converge. Indeed, their convergence properties
remain an important open question, especially when Dε is implemented as a neural network
that is not a gradient mapping. These algorithms are better understood when interpreted as
fixed point algorithms seeking to reach a set of equilibrium equations between the denoiser
and the data fidelity term [16]. Our understanding of the convergence properties of hybrid
optimization methods has advanced significantly recently [65, 77, 70, 41], but these questions
remain largely unexplored in the context of stochastic Bayesian algorithms, to compute x̂mmse
or perform other forms of statistical inference.
The use of Plug & Play operators has also been investigated in the context of approximate
message passing (AMP) computation methods (see [27] for an introduction to AMP focused
on compressed sensing and [2] for a survey on PnP-AMP in the context of magnetic resonance
imaging), particularly for applications involving randomized forward operators where it is
possible to characterize AMP schemes in detail (see, e.g., [9, 42, 53, 20]). This is an active
area of research, and recent works have extended the approach to vector AMP strategies and
characterized their behavior for a wider class of problems [31].
Approaches based on score matching techniques [68, 39] have also shown promising results
recently [46, 45]. These methods are linked with Plug & Play approaches as they also estimate
a Stein score. However, they do not rely on the asymptotic convergence of a diffusion, but
instead aim at inverting a noising process stemming from an optimal transport problem [15].


The recent work [45] is particularly relevant in this context as it considers a range of imaging
inverse problems, where it exploits the structure of the forward operator to perform posterior
sampling in a coarse-to-fine manner. This also allows the use of multivariate step-sizes that
are specific to each scale and ensure stability. However, to the best of our knowledge, the
convergence properties of [45] have not been studied yet.

1.4. Contributions summary. This paper presents a formal framework for Bayesian analy-
sis and computation with Plug & Play priors. We propose two Plug & Play ULAs, with
detailed convergence guarantees under realistic assumptions on the denoiser used. We also
study important questions regarding whether the underlying Bayesian models and estimators
are well defined, are well posed, and have the basic regularity properties required to sup-
port efficient Bayesian computation schemes. We pay particular attention to denoisers based
on deep neural networks and report extensive numerical experiments with a specific neural
network denoiser [65] shown to satisfy our convergence guarantees.
The remainder of the paper is organized as follows. Section 2 defines notation, introduces
our framework for studying Bayesian inference methods with Plug & Play priors, and presents
two Plug & Play ULAs for Bayesian computation in imaging problems. This is then followed
by a detailed theoretical analysis of Plug & Play Bayesian models and algorithms in section 3.
Section 4 demonstrates the proposed approach with experiments related to nonblind image
deblurring and image inpainting, where we perform point estimation and uncertainty visual-
ization analyses, and report comparisons with the Plug & Play stochastic gradient descent
method of [49]. Conclusions and perspectives for future work are finally reported in section 5.

2. Bayesian inference with Plug & Play priors: Theory, methods, and algorithms.

2.1. Bayesian modeling and analysis with Plug & Play priors. This section presents
a formal framework for Bayesian analysis and computation with Plug & Play priors. As
explained previously, we are interested in the estimation of the unknown image x from an
observation y when the problem is ill conditioned or ill posed, resulting in significant uncer-
tainty about the value of x. The Bayesian framework addresses this difficulty by using prior
knowledge about the marginal distribution of x in order to reduce the uncertainty about x|y
and make the estimation problem well posed. In the Bayesian Plug & Play approach, instead
of explicitly specifying the marginal distribution of x, we introduce prior knowledge about
x by specifying an image denoising operator Dε for recovering x from a noisy observation
xε ∼ N (x, ε Id) with noise variance ε > 0. A case of particular relevance in this context is
when Dε is implemented by a neural network, trained by using a set of clean images {x′i}_{i=1}^N.
A central challenge in the formalization of Bayesian inference with Plug & Play priors is
that the denoiser Dε used is generally not directly related to a marginal distribution for x, so
it is not possible to derive an explicit posterior for x|y from Dε . As a result, it is not clear that
plugging Dε into gradient-based algorithms such as ULA leads to a well-defined or convergent
scheme that is targeting a meaningful Bayesian model.
To overcome this difficulty, in this paper we analyze Plug & Play Bayesian models through
the prism of M-complete Bayesian modeling [10]. Accordingly, there exists a true—albeit
unknown and intractable—marginal distribution for x and posterior distribution for x|y. If it
were possible, basing inferences on these true marginal and posterior distributions would be


optimal both in terms of point estimation and in terms of delivering Bayesian probabilities
that are valid from a frequentist viewpoint. We henceforth use µ to denote this optimal prior
distribution for x on (Rd , B(Rd ))—where B(Rd ) denotes the Borel σ-field of Rd , and when µ
admits a density w.r.t. the Lebesgue measure on Rd , we denote it by p⋆ . In the latter case,
the posterior distribution for x|y associated with the marginal µ also admits a density that is
given for any x ∈ Rd and y ∈ Rm by4

(2.1)    p⋆(x|y) = p(y|x)p⋆(x) / ∫_{Rd} p(y|x̃)p⋆(x̃)dx̃ .

Unlike most Bayesian imaging approaches that operate implicitly in an M-closed manner and
treat their postulated Bayesian models as true models (see [10] for more details), we explicitly
regard p⋆ (or more precisely µ) as a fundamental property of the unknown x, and models
used for inference as operational approximations of p⋆ specified by the practitioner (either
analytically, algorithmically, or from training data). This distinction will be useful for using
the oracle posterior (2.1) as a reference, and Plug & Play Bayesian algorithms based on a
denoiser Dε as approximations to reference algorithms to perform inference w.r.t. p⋆ . The
accuracy of the Plug & Play approximations will depend chiefly on the closeness between Dε
and an optimal denoiser Dε⋆ derived from p⋆ that we define shortly.
In this conceptual construction, the marginal µ naturally depends on the imaging appli-
cation considered. It could be the distribution of natural images of the size and resolution of
x, or that of a class of images related to a specific application. And in problems where there
is training data {x′i}_{i=1}^N available, we regard {x′i}_{i=1}^N as samples from µ. Last, we note that
the posterior for x|y remains well defined when µ does not admit a density; this is important
to provide robustness to situations where p⋆ is nearly degenerate or improper. For clarity, our
presentation assumes that p⋆ exists, although this is not strictly required.5
Notice that because µ is unknown, we cannot verify that p⋆ (x|y) satisfies the basic desider-
ata for gradient-based Bayesian computation: i.e., p⋆ (x|y) need not be proper and differen-
tiable, with ∇ log p⋆ (x|y) Lipschitz continuous. To guarantee that gradient-based algorithms
that target approximations of p⋆ (x|y) are well defined by construction, we introduce a reg-
ularized oracle µε obtained via the convolution of µ with a Gaussian smoothing kernel with
bandwidth ε > 0. Indeed, by construction, µε has a smooth proper density p⋆ε given for any
x ∈ Rd and ε > 0 by

p⋆ε(x) = (2πε)^{−d/2} ∫_{Rd} exp[−∥x − x̃∥²/(2ε)] p⋆(x̃)dx̃ .

Equipped with this regularized marginal distribution, we use Bayes’ theorem to involve the
likelihood p(y|x) and derive the posterior density p⋆ε (x|y), given for any ε > 0 and x ∈ Rd by

4 Strictly speaking, the true likelihood p⋆(y|x) may also be unknown; this is particularly relevant in the case of blind or myopic inverse imaging problems. For simplicity, we restrict our experiments and theoretical development to the case where p(y|x) represents the true likelihood. Generalizations of our approach to the blind or semiblind setting are discussed, e.g., by [38]—formalizing these generalizations is an important perspective for future work.
5 Operating without densities requires measure disintegration concepts that are technical [67].


p⋆ε(x|y) = p(y|x)p⋆ε(x) / ∫_{Rd} p(y|x̃)p⋆ε(x̃)dx̃ ,

which inherits the regularity properties required for gradient-based Bayesian computation
when the likelihood satisfies the following standard conditions:


H1. For any y ∈ Rm , supx∈Rd p(y|x) < +∞, p(y|·) ∈ C1 (Rd , (0, +∞)) and there exists
Ly > 0 such that ∇ log(p(y|·)) is Ly Lipschitz continuous.
More precisely, Proposition 2.1 below establishes that the regularized prior p⋆ε (x) and
posterior p⋆ε (x|y) are proper, are smooth, and can be made arbitrarily close to the original
oracle models p⋆ (x) and p⋆ (x|y) by reducing ε, with the approximation error vanishing as
ε → 0.
Proposition 2.1. Assume H1. Then, for any ε > 0 and y ∈ Rm , the following hold:
(a) p⋆ε and p⋆ε (·|y) are proper.
(b) For any k ∈ N, p⋆ε ∈ Ck(Rd). In addition, if p(y|·) ∈ Ck(Rd), then p⋆ε(·|y) ∈ Ck(Rd, R).
(c) Let k ∈ N. If ∫_{Rd} ∥x̃∥^k p⋆(x̃)dx̃ < +∞, then ∫_{Rd} ∥x̃∥^k p⋆ε(x̃|y)dx̃ < +∞.
(d) lim_{ε→0} ∥p⋆ε(·|y) − p⋆(·|y)∥1 = 0.
(e) In addition, if there exist κ, β ⩾ 0 such that for any x ∈ Rd, ∥p⋆ − p⋆(· − x)∥1 ⩽ κ∥x∥^β, then there exists C ⩾ 0 such that ∥p⋆ε(·|y) − p⋆(·|y)∥1 ⩽ Cε^{β/2}.
Proof. The proof is postponed to subsection SM8.2.
Under H1 and p(y|·) ∈ C1 (Rd ), x 7→ ∇ log p⋆ε (x|y) is well defined and continuous. However,
x 7→ ∇ log p⋆ε (x|y) might not be Lipschitz continuous and hence the Langevin SDE (1.3) might
not have a strong solution. This requires an additional assumption on µ.
To study the Lipschitz continuity of x 7→ ∇ log p⋆ε (x|y), as well as to set the grounds for
Plug & Play methods that define priors implicitly through a denoising algorithm, we introduce
the oracle MMSE denoiser Dε⋆ defined for any x ∈ Rd and ε > 0 by
Dε⋆(x) = (2πε)^{−d/2} ∫_{Rd} x̃ exp[−∥x − x̃∥²/(2ε)] p⋆(x̃)dx̃ / p⋆ε(x) .

Under the assumption that the expected mean square error (MSE) is finite, Dε⋆ is the MMSE
estimator to recover an image x ∼ µ from a noisy observation xε ∼ N (x, ε Id) [62]. Again, this
optimal denoiser is a fundamental property of x and it is generally intractable. Motivated by
the fact that state-of-the-art image denoisers are close to optimal in terms of MSE, in subsec-
tion 2.3 we will characterize the accuracy of Plug & Play Bayesian methods for approximate
inference w.r.t. p⋆ε (x|y) and p⋆ (x|y) as a function of the closeness between the denoiser Dϵ
used and the reference Dε⋆ .
To relate the gradient x 7→ ∇ log p⋆ε (x) and Dε⋆ , we use Tweedie’s identity [30], which
states that for all x ∈ Rd

(2.2) ε∇ log p⋆ε (x) = Dε⋆ (x) − x ,

and hence x 7→ ∇ log p⋆ε (x|y) is Lipschitz continuous if and only if Dε⋆ has this property. We
argue that this is a natural assumption on Dε⋆ , as it is essentially equivalent to assuming that


the denoising problem underpinning Dε⋆ is well posed in the sense of Hadamard (recall that
an inverse problem is said to be well posed if its solution is unique and Lipschitz continuous
w.r.t. the observation [69]). As established in Proposition 2.2 below, this happens when the
expected MSE involved in using Dε⋆ to recover x from xε ∼ N (x, ε Id), where x has marginal
µ, is finite and uniformly upper bounded for all xε ∈ Rd .


Proposition 2.2. Assume H1. Let ε > 0. ∇ log p⋆ε is Lipschitz continuous if and only if
there exists C ⩾ 0 such that for any xε ∈ Rd
∫_{Rd} ∥x − Dε⋆(xε)∥² gε(x|xε)dx ⩽ C ,

where gε (·|xε ) is the density of the conditional distribution of the unknown image x ∈ Rd with
marginal µ, given a noisy observation xε ∼ N (x, ε Id). See subsection 3.2 for details.
Proof. The proof is postponed to Lemma SM6.2.
These results can be generalized to hold under the weaker assumption that the expected
MSE for Dε⋆ is finite but not uniformly bounded, as in this case x 7→ ∇ log p⋆ε (x|y) is locally
instead of globally Lipschitz continuous (we postpone this technical extension to future work).
The pathological case where Dε⋆ does not have a finite MSE arises when µ is such that the
denoising problem does not admit a Bayesian estimator w.r.t. the MSE loss. In summary,
the gradient x 7→ ∇ log p⋆ε (x|y) is Lipschitz continuous when µ carries enough information to
make the problem of Bayesian image denoising under Gaussian additive noise well posed.
Notice that by using Tweedie’s identity, we can express a ULA recursion for sampling
approximately from p⋆ε (x|y) as follows:

(2.3)    Xk+1 = Xk + δ∇ log p(y|Xk) + (δ/ε)(Dε⋆(Xk) − Xk) + √(2δ) Zk+1 ,

where we recall that {Zk : k ∈ N} are i.i.d. standard Gaussian random variables on Rd
and δ > 0 is a positive step-size. Under standard assumptions on δ, the sequence generated
by (2.3) is a Markov chain which admits an invariant probability distribution whose density
is provably close to p⋆ε (x|y), with δ controlling a trade-off between asymptotic accuracy and
convergence speed. In the following section we present Plug & Play ULAs that arise from
replacing Dε⋆ in (2.3) with a denoiser Dε that is tractable.
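To make Tweedie's identity (2.2) concrete, the following toy check (ours, not from the paper) uses a one-dimensional Gaussian prior p⋆ = N(0, s²), for which p⋆ε = N(0, s² + ε) and the oracle denoiser Dε⋆(x) = s²x/(s² + ε) are available in closed form, and verifies ε∇ log p⋆ε(x) = Dε⋆(x) − x numerically.

```python
import numpy as np

s2, eps = 2.0, 0.3          # prior variance s^2 and smoothing level epsilon
x = np.linspace(-3.0, 3.0, 7)

# For a Gaussian prior N(0, s^2): p*_eps = N(0, s^2 + eps), so
# grad log p*_eps(x) = -x / (s^2 + eps), and the oracle MMSE denoiser is
# D*_eps(x) = E[x0 | x0 + sqrt(eps) z = x] = s^2 * x / (s^2 + eps).
grad_log_p_eps = -x / (s2 + eps)
D_star = s2 * x / (s2 + eps)

# Tweedie's identity (2.2): eps * grad log p*_eps(x) = D*_eps(x) - x
print(np.allclose(eps * grad_log_p_eps, D_star - x))   # True
```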
Before concluding this section, we study whether the oracle p⋆ (x|y) is itself well posed,
i.e., if p⋆ (x|y) changes continuously w.r.t. y under a suitable probability metric (see [48]).
We answer positively to this question in Proposition 2.3, which states that, under mild as-
sumptions on the likelihood, p⋆ (x|y) is locally Lipschitz continuous w.r.t. y for an appropriate
metric. This stability result implies, for example, that the MMSE estimator derived from
p⋆ (x|y) is locally Lipschitz continuous w.r.t. y, and hence stable w.r.t. small perturbations
of y. Note that a similar property holds for the regularized posterior p⋆ε (x|y). In particular,
Proposition 2.3 holds for Gaussian likelihoods (see section 3 for details).
Proposition 2.3. Assume that there exist Φ1 : Rd → [0, +∞) and Φ2 : Rm → [0, +∞)
such that for any x ∈ Rd and y1 , y2 ∈ Rm

∥log(p(y1 |x)) − log(p(y2 |x))∥ ⩽ (Φ1 (x) + Φ2 (y1 ) + Φ2 (y2 )) ∥y1 − y2 ∥ ,


and for any c > 0, ∫_{Rd} (1 + Φ1(x̃)) exp[cΦ1(x̃)] p⋆(x̃)dx̃ < +∞. Then y 7→ p⋆(·|y) is locally

Lipschitz w.r.t. ∥ · ∥1 , i.e., for any compact set K there exists CK ⩾ 0 such that for any
y1 , y2 ∈ K , ∥p⋆ (·|y1 ) − p⋆ (·|y2 )∥1 ⩽ CK ∥y1 − y2 ∥.
Proof. The proof is a straightforward application of Proposition SM5.3.
To conclude, starting from the decision-theoretically optimal model p⋆ (x|y), we have con-
structed a regularized approximation p⋆ε (x|y) that is proper and smooth by construction, with
gradients that are explicitly related to denoising operators by Tweedie’s formula. Under mild
assumptions on p(y|x), the approximation p⋆ε (x|y) is well posed and can be made arbitrarily
close to the oracle p⋆ (x|y) by controlling ε. Moreover, we established that x 7→ ∇ log p⋆ε (x)
is Lipschitz continuous when the problem of Gaussian image denoising for µ under the MSE
loss is well posed. This allows imagining convergent gradient-based algorithms for performing
Bayesian computation for p⋆ε (x|y), setting the basis for Plug & Play ULA schemes that mimic
these idealized algorithms by using a tractable denoiser Dε such as a neural network, trained to
optimize MSE performance and hence to approximate the oracle MMSE denoiser Dε⋆.
2.2. Bayesian computation with Plug & Play priors. We are now ready to study Plug
& Play ULA schemes to perform approximate inference w.r.t. p⋆ε (x|y) (and hence indirectly
w.r.t. p⋆ (x|y)). We use (2.3) as starting point, with Dε⋆ replaced by a surrogate denoiser
Dε , but also modify (2.3) to guarantee geometrically fast convergence6 to a neighborhood of
p⋆ε (x|y). In particular, geometrically fast convergence is achieved here by modifying far-tail
probabilities to prevent the Markov chain from becoming too diffusive as it explores the tails
of p⋆ε (x|y). We consider two alternatives to guarantee geometric convergence with markedly
different bias-variance trade-offs: one with excellent accuracy guarantees but that requires
using a small step-size δ and hence has a higher computational cost, and another one that
allows taking a larger step-size δ to improve convergence speed at the expense of weaker
guarantees in terms of estimation bias.
First, in the spirit of Moreau–Yosida regularized ULA [29], we define PnP-ULA as the
following recursion: given an initial state X0 ∈ Rd and for any k ∈ N,

(PnP-ULA)    Xk+1 = Xk + δ∇ log p(y|Xk) + (δ/ε)(Dε(Xk) − Xk) + (δ/λ)(ΠC(Xk) − Xk) + √(2δ) Zk+1 ,

where C ⊂ Rd is some large compact convex set that contains most of the prior probability
mass of x, ΠC is the projection operator onto C w.r.t. the Euclidean scalar product on Rd , and
λ > 0 is a tail regularization parameter that is set such that the drift in PnP-ULA satisfies a
certain growth condition as ∥x∥ → ∞ (see section 3 for details).
An alternative strategy (which we call projected PnP-ULA, i.e., PPnP-ULA; see Algo-
rithm 2.2) is to modify PnP-ULA to include a hard projection onto C, i.e., (Xk )k∈N is defined
by X0 ∈ C and the recursion for any k ∈ N
Xk+1 = ΠC[Xk + δ∇ log p(y|Xk) + (δ/ε)(Dε(Xk) − Xk) + √(2δ) Zk+1] ,

6 Geometric convergence is a highly desirable property in large-scale problems and guarantees that the generated Markov chains can be used for Monte Carlo integration.


where we notice that, by construction, the chain cannot exit C because of the action of
the projection operator ΠC . The hard projection guarantees geometric convergence with
weaker restrictions on δ and hence PPnP-ULA can be tuned to converge significantly faster
than PnP-ULA, albeit with a potentially larger bias. These two schemes are summarized in
Algorithms 2.1 and 2.2 below. Note the presence of a regularization parameter α in these
algorithms, which permits us to balance the weights between the prior and data terms. For
the sake of simplicity, this parameter is set to α = 1 in sections 3 and 4 but will be taken into
account in the supplementary material. Subsections 3.2
and 3.3 present detailed convergence results for PnP-ULA and PPnP-ULA. Implementation
guidelines, including suggestions for how to set the algorithm parameters of PnP-ULA and
PPnP-ULA, are provided in section 4.
Last, it is worth mentioning that Algorithms 2.1 and 2.2 can be straightforwardly modified
to incorporate additional regularization terms. More precisely, one could consider a prior
defined as the (normalized) product of a Plug & Play term and an explicit analytical term.
In that case, one should simply modify the recursion defining the Markov chain by adding the
gradient associated with the analytical term. In a manner akin to [29], analytical terms that
are not smooth are involved via their proximal operator.
Before concluding this section, it is worth emphasizing that, in addition to being important
in their own right, Algorithms 2.1 and 2.2 and the associated theoretical results set the
grounds for analyzing more advanced stochastic simulation and optimization schemes for
performing Bayesian inference with Plug & Play priors, in particular accelerated optimization
and sampling algorithms [59]. This is an important perspective for future work.

Algorithm 2.1. PnP-ULA

Require: N ∈ N, y ∈ Rm, ε, λ, α, δ > 0, C ⊂ Rd convex and compact
Ensure: 2λ(2Ly + αL/ε) ⩽ 1 and δ < (1/3)(Ly + 1/λ + αL/ε)⁻¹
Initialization: Set X0 ∈ Rd and k = 0.
for k = 0 : N do
    Zk+1 ∼ N(0, Id)
    Xk+1 = Xk + δ∇ log(p(y|Xk)) + (αδ/ε)(Dε(Xk) − Xk) + (δ/λ)(ΠC(Xk) − Xk) + √(2δ) Zk+1
end for
return {Xk : k ∈ {0, . . . , N + 1}}

Algorithm 2.2. PPnP-ULA

Require: N ∈ N, y ∈ Rm, ε, λ, α, δ > 0, C ⊂ Rd convex and compact
Initialization: Set X0 ∈ C and k = 0.
for k = 0 : N do
    Zk+1 ∼ N(0, Id)
    Xk+1 = ΠC[Xk + δ∇ log(p(y|Xk)) + (αδ/ε)(Dε(Xk) − Xk) + √(2δ) Zk+1]
end for
return {Xk : k ∈ {0, . . . , N + 1}}
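A minimal Python sketch of Algorithms 2.1 and 2.2 is given below for illustration only; the denoiser, likelihood gradient, and parameter values are placeholders to be supplied by the user (e.g., the network of [65] and a Gaussian likelihood), and this is not the authors' released code.

```python
import numpy as np

def pnp_ula(x0, grad_log_lik, denoiser, eps, lam, alpha, delta, n_iter,
            box=(-1.0, 2.0), projected=False, rng=None):
    """Sketch of PnP-ULA (Algorithm 2.1) and, with projected=True, PPnP-ULA
    (Algorithm 2.2). Here C is taken as the hypercube [box[0], box[1]]^d."""
    rng = np.random.default_rng() if rng is None else rng
    proj_C = lambda u: np.clip(u, box[0], box[1])          # projection Pi_C onto C
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_iter):
        step = delta * grad_log_lik(x) + (alpha * delta / eps) * (denoiser(x) - x)
        if not projected:
            step += (delta / lam) * (proj_C(x) - x)        # soft tail regularization (PnP-ULA)
        x = x + step + np.sqrt(2.0 * delta) * rng.standard_normal(x.shape)
        if projected:
            x = proj_C(x)                                  # hard projection (PPnP-ULA)
        samples.append(x.copy())
    return samples

# Usage sketch with placeholders (identity forward model and a toy linear denoiser
# whose residual is 0.2-Lipschitz); replace these with the actual likelihood and
# the trained network D_eps of a real inverse problem.
sigma2, eps = 0.1, 0.05
y = np.random.default_rng(1).random((8, 8))
grad_ll = lambda x: (y - x) / sigma2                       # Gaussian likelihood, A = Id
toy_denoiser = lambda x: 0.8 * x + 0.1
chain = pnp_ula(y, grad_ll, toy_denoiser, eps=eps, lam=0.02, alpha=1.0,
                delta=1e-3, n_iter=2000)
x_mmse = np.mean(chain[500:], axis=0)                      # Monte Carlo MMSE estimate
```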


3. Theoretical analysis. In this section, we provide a theoretical study of the long-time


behavior of PnP-ULA (see Algorithm 2.1) and PPnP-ULA (see Algorithm 2.2). For any ε > 0
we recall that p⋆ε is given by the Gaussian smoothing of p⋆ with level ε, for any x ∈ Rd, by

p⋆ε(x) = (2πε)^{−d/2} ∫_{Rd} exp[−∥x − x̃∥²/(2ε)] p⋆(x̃)dx̃ .
One typical example of likelihood function that we consider in our numerical illustration (see
section 4) is p(y|x) ∝ exp[− ∥Ax − y∥2 /(2σ 2 )] for any x ∈ Rd with σ > 0 and A ∈ Rm×d .
We define π the target posterior distribution given for any x ∈ Rd by (dπ/dLeb)(x) = p⋆(x|y).
We also consider the family of probability distributions {πε : ε > 0} given for any ε > 0 and
x ∈ Rd by

(dπε/dLeb)(x) = p(y|x)p⋆ε(x) / ∫_{Rd} p(y|x̃)p⋆ε(x̃)dx̃ .
Note that in the supplementary material we investigate
the general setting where p⋆ε is replaced by (p⋆ε )α for some α > 0 that acts as a regularization
parameter. We divide our study into two parts. We recall that πε is well defined for any
ε > 0 under H1; see Proposition 2.1. We start with some notation in subsection 3.1. We then
establish nonasymptotic bounds between the iterates of PnP-ULA and πε with respect to the
total variation distance for any ε > 0, in subsection 3.2. Finally, in subsection 3.3 we establish
similar results for PPnP-ULA.
3.1. Notation. Denote by B(Rd ) the Borel σ-field of Rd , and for f : Rd → R measurable,
∥f ∥∞ = supx̃∈Rd |f (x̃)|. For µ a probability measure on (Rd , B(Rd )) and f a µ-integrable
function, denote by µ(f ) the integral of f w.r.t. µ. For f : Rd → R measurable and V :
Rd → [1, ∞) measurable, the V-norm of f is given by ∥f∥V = sup_{x̃∈Rd} |f(x̃)|/V(x̃). Let ξ be a
finite signed measure on (Rd , B(Rd )). The V -total variation distance of ξ is defined as
∥ξ∥V = sup_{∥f∥V ⩽ 1} ∫_{Rd} f(x̃)dξ(x̃) .

If V = 1, then ∥·∥V is the total variation denoted by ∥·∥TV . Let U be an open set of Rd . For
any pair of measurable spaces (X, X ) and (Y, Y), measurable function f : (X, X ) → (Y, Y),
and measure µ on (X, X ) we denote by f# µ the pushforward measure of µ on (Y, Y) given for
any A ∈ Y by f#µ(A) = µ(f⁻¹(A)). We denote P(Rd) the set of probability measures over
(Rd, B(Rd)) and for any m ∈ N, Pm(Rd) = {ν ∈ P(Rd) : ∫_{Rd} ∥x̃∥^m dν(x̃) < +∞}.
We denote by Ck (U, Rm ) and Ckc (U, Rm ) the set of Rm -valued k-differentiable functions,
respectively, the set of compactly supported Rm -valued and k-differentiable functions. Letting
f : U → R, we denote by ∇f the gradient of f if it exists. f is said to be m-convex with m ⩾ 0
if for all x1 , x2 ∈ Rd and t ∈ [0, 1],
f (tx1 + (1 − t)x2 ) ⩽ tf (x1 ) + (1 − t)f (x2 ) − mt(1 − t) ∥x1 − x2 ∥2 /2 .
For any a ∈ Rd and R > 0, denote B(a, R) the open ball centered at a with radius R. Let (X, X )
and (Y, Y) be two measurable spaces. A Markov kernel P is a mapping P : X × Y → [0, 1] such
that for any x̃ ∈ X, P(x̃, ·) is a probability measure and for any A ∈ Y, P(·, A) is measurable.


For any probability measure µ on (X, X) and measurable function f : Y → R+ we denote
µP = ∫_X P(x, ·)dµ(x) and Pf = ∫_Y f(y)P(·, dy). In what follows the Dirac mass at x̃ ∈ Rd is
denoted by δx̃ . For any x̃ ∈ Rd , we denote τx̃ : Rd → Rd the translation operator given for any
x̃′ ∈ Rd by τx̃ (x̃′ ) = x̃′ − x̃. The complement of a set A ⊂ Rd is denoted by Ac . All densities
are w.r.t. the Lebesgue measure (denoted Leb) unless stated otherwise. For all convex and
closed sets C ⊂ Rd , we define ΠC the projection operator onto C w.r.t. the Euclidean scalar
product on Rd . For any matrix a ∈ Rd1 ×d2 with d1 , d2 ∈ N, we denote a⊤ ∈ Rd2 ×d1 its adjoint.
3.2. Convergence of PnP-ULA. In this section, we fix ε > 0 and derive quantitative
bounds between the iterates of PnP-ULA and πε with respect to the total variation distance.
To address this issue, we first show that PnP-ULA is geometrically ergodic and establish non-
asymptotic bounds between the corresponding Markov kernel and its invariant distribution.
Second, we analyze the distance between this stationary distribution and πε .
For any ε > 0 we define gε : Rd × Rd → [0, +∞) for any x1 , x2 ∈ Rd by
(3.1)    gε(x1|x2) = p⋆(x1) exp[−∥x2 − x1∥²/(2ε)] / ∫_{Rd} p⋆(x̃) exp[−∥x2 − x̃∥²/(2ε)]dx̃ .

Note that gε(·|Xε) is the density with respect to the Lebesgue measure of the distribution
of X given Xε , where X is sampled according to the prior distribution µ (with density p⋆ )
and Xε = X + ε1/2 Z where Z is a Gaussian random variable with zero mean and identity
covariance matrix. Throughout this section, we consider the following assumption on the
family of denoising operators {Dε : ε > 0} which will ensure that PnP-ULA approximately
targets πε .
H2(R). We have that ∫_{Rd} ∥x̃∥² p⋆(x̃)dx̃ < +∞. In addition, there exist ε0 > 0, MR ⩾ 0,
and L ⩾ 0 such that for any ε ∈ (0, ε0], x1, x2 ∈ Rd and x ∈ B(0, R) we have

(3.2) ∥(Id −Dε )(x1 ) − (Id −Dε )(x2 )∥ ⩽ L ∥x1 − x2 ∥ , ∥Dε (x) − Dε⋆ (x)∥ ⩽ MR ,

where we recall that


(3.3)    Dε⋆(x1) = ∫_{Rd} x̃ gε(x̃|x1)dx̃ .

The Lipschitz continuity condition in (3.2) will be useful for establishing the stability and
geometric convergence of the Markov chain generated by PnP-ULA. This condition can be
explicitly enforced during training by using an appropriate regularization of the neural network
weights [65, 54]. Regarding the second condition in (3.2), MR is a bound on the error involved
in using Dε as an approximation of Dε⋆ for images of magnitude R (i.e., for any x ∈ B(0, R)),
and it will be useful for bounding the bias resulting from using PnP-ULA for inference w.r.t.
πε (recall that the bias vanishes as MR → 0 and δ → 0). For denoisers represented by neural
networks, one can promote a small value of MR during training by using an appropriate loss
function. More precisely, consider a neural network fw : Rd → Rd , parameterized by its
weights and bias gathered in w ∈ W, where W is some measurable space; for any ε > 0,
one could target an empirical approximation of a loss of the form ℓε : W → [0, +∞) given for
any w ∈ W by ℓε(w) = ∫_{Rd×Rd} ∥x − fw(xε)∥² p⋆ε(xε)gε(x|xε)dxε dx. Note that such a loss is
considered in the Noise2Noise network introduced in [50].
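As an illustration of how such a loss can be approximated empirically, the sketch below (ours, not the training code of [65] or [50]) minimizes a sample version of ℓε, i.e., an MSE between clean images and the network output on noisy inputs xε = x + √ε z; spectral normalization is one way to softly encourage the Lipschitz control needed for (3.2), although it does not by itself certify it.

```python
import torch
import torch.nn as nn

eps = 0.05                                   # denoising level: x_eps = x + sqrt(eps) * z

# Small residual CNN; spectral normalization constrains each layer's Lipschitz
# constant, which encourages (but does not guarantee) the control on Id - D_eps
# required by H2.
sn = nn.utils.spectral_norm
net = nn.Sequential(
    sn(nn.Conv2d(1, 32, 3, padding=1)), nn.ReLU(),
    sn(nn.Conv2d(32, 32, 3, padding=1)), nn.ReLU(),
    sn(nn.Conv2d(32, 1, 3, padding=1)),
)
denoiser = lambda x_eps: x_eps - net(x_eps)  # D_eps(x_eps) = x_eps - predicted residual

opt = torch.optim.Adam(net.parameters(), lr=1e-4)
for step in range(100):
    x = torch.rand(16, 1, 64, 64)            # placeholder for clean samples x ~ mu
    x_eps = x + (eps ** 0.5) * torch.randn_like(x)
    loss = ((denoiser(x_eps) - x) ** 2).mean()   # empirical counterpart of ell_eps(w)
    opt.zero_grad()
    loss.backward()
    opt.step()
```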


With regard to the theoretical limitations stemming from representing Dε by a deep neural
network, universal approximation theorems (see, e.g., [7, subsection 4.7]) suggest that MR could
be arbitrarily low in principle. For a given architecture and training strategy, and if there exists
M̃R ⩾ 0 such that inf_{w∈W} sup_{x∈B(0,R)} M̃R⁻¹∥fw(x) − Dε⋆(x)∥ ⩽ 1, then the second condition in
(3.2) holds upon letting Dε = fw† for an appropriate choice of weights w† ∈ W. This last
inequality can be established using universal approximation theorems such as [7, subsection
4.7]. Moreover, for any other w ∈ W, ℓε(w) ⩾ ∫_{Rd×Rd} ∥x − Dε⋆(xε)∥² p⋆ε(xε)gε(x|xε)dxdxε = ℓ⋆ε,
since for any xε ∈ Rd, Dε⋆(xε) = ∫_{Rd} x̃ gε(x̃|xε)dx̃; see (3.3). Consider w† ∈ W obtained
after numerically minimizing ℓε and satisfying ℓε(w†) ⩽ ℓ⋆ε + η with η > 0. In this case, the
following result ensures that (3.2) is satisfied with MR of order η^{1/(2d+2)} for any R > 0 and
letting Dε = fw†.
Proposition 3.1. Assume that for any w ∈ W
(3.4)    ∫_{Rd×Rd} (∥x∥² + ∥fw(xε)∥²) p⋆ε(xε)gε(x|xε)dxdxε < +∞ .

Let R, η > 0 and w† ∈ W such that ℓε (w† ) ⩽ ℓ⋆ε + η. In addition, assume that
sup_{x1,x2∈B(0,2R)} { ∥x2 − x1∥⁻¹ (∥fw†(x2) − fw†(x1)∥ + ∥Dε⋆(x2) − Dε⋆(x1)∥) } < +∞ ,

where Dε⋆ is given in (3.3). Then there exist CR, η̄R ⩾ 0 such that if η ∈ (0, η̄R], then for any
x̃ ∈ B(0, R), ∥fw†(x̃) − Dε⋆(x̃)∥ ⩽ CR η^{1/(2d+2)}.
Proof. The proof is postponed to subsection SM6.1.
Note that (3.4) is satisfied if for any w ∈ W, supx∈Rd ∥fw (x)∥(1 + ∥x∥)−1 < +∞ and H2
holds.
We recall that PnP-ULA (see Algorithm 2.1) is given by the following recursion: X0 ∈ Rd
and for any k ∈ N

(3.5)    Xk+1 = Xk + δbε(Xk) + √(2δ) Zk+1 ,
         bε(x) = ∇ log p(y|x) + Pε(x) + (proxλ(ιC)(x) − x)/λ ,    Pε(x) = (Dε(x) − x)/ε ,

where δ > 0 is a step-size, ε, λ > 0 are hyperparameters of the algorithm, C ⊂ Rd is a closed


convex set, {Zk : k ∈ N} is a family of i.i.d. Gaussian random variables with zero mean and
identity covariance matrix, and proxλ (ιC ) is the proximal operator of ιC with step-size λ (see
[8, Definition 12.23]), where ιC is the convex indicator of C defined for x ∈ Rd by ιC = +∞ if
x∈ / C and 0 if x ∈ C. Note that for any x ∈ Rd we have proxλ (ιC )(x) = ΠC (x), where ΠC is
the projection onto C.
In what follows, for any δ > 0 and C ⊂ Rd closed and convex, we denote by Rε,δ :
Rd × B(Rd ) → [0, 1] the Markov kernel associated with the recursion (3.5) and given for any
x ∈ Rd and A ∈ B(Rd ) by
Rε,δ(x, A) = (2π)^{−d/2} ∫_{Rd} 1A(x + δbε(x) + √(2δ) z) exp[−∥z∥²/2]dz .


Note that for ease of notation, we do not explicitly highlight the dependency of Rε,δ and bε
with respect to the hyperparameter λ > 0 and C.
Here we consider the case where x 7→ log p(y|x) satisfies a one-sided Lipschitz condition,
i.e., we consider the following condition.
H3. There exists m ∈ R such that for any x1 , x2 ∈ Rd we have

⟨∇ log p(y|x2 ) − ∇ log p(y|x1 ), x2 − x1 ⟩ ⩽ −m ∥x2 − x1 ∥2 .

We refer to the supplementary material section SM3 for refined convergence rates in the
case where x 7→ log p(y|x) is strongly m-concave. Note that if H3 is satisfied with m > 0, then
x 7→ log p(y|x) is m-concave. Assume H1, then H3 holds for m = −Ly . However, it is possible
that m > −Ly , which leads to better convergence rates for PnP-ULA. As a result even when
H1 holds we still consider H3. In order to deal with H3 in the case where m ⩽ 0, we set
C ⊂ Rd to be some convex compact set fixed by the user. Doing so, we ensure the stability
of the Markov chain. The choice of C in practice is discussed in section 4. In our imaging
experiments, we recall that for any x ∈ Rd we have p(y|x) ∝ exp[− ∥Ax − y∥2 /(2σ 2 )]. If A
is not invertible, then x 7→ log p(y|x) is not m-concave with m > 0. This is the case in our
deblurring experiment when the convolution kernel has zeros in the Fourier domain.
We start with the following result, which ensures that the Markov chain (3.5) is geometri-
cally ergodic under H2 for the Wasserstein metric W1 and in V -norm for V : Rd → [1, +∞)
given for any x ∈ Rd by

(3.6) V (x) = 1 + ∥x∥2 .

Proposition 3.2. Assume H1, H2(R) for some R > 0, and H3. Let λ > 0, ε ∈ (0, ε0 ] such
that 2λ(Ly + L/ε − min(m, 0)) ⩽ 1 and δ̄ = (1/3)(Ly + L/ε + 1/λ)−1 . Then for any C ⊂ Rd
convex and compact with 0 ∈ C, there exist A1,C ⩾ 0 and ρ1,C ∈ [0, 1) such that for any
δ ∈ (0, δ̄], x1 , x2 ∈ Rd , and k ∈ N we have

∥δ_{x1}R^k_{ε,δ} − δ_{x2}R^k_{ε,δ}∥_V ⩽ A_{1,C} ρ_{1,C}^{kδ} (V²(x1) + V²(x2)) ,
W1(δ_{x1}R^k_{ε,δ}, δ_{x2}R^k_{ε,δ}) ⩽ A_{1,C} ρ_{1,C}^{kδ} ∥x1 − x2∥ ,

where V is given in (3.6).


Proof. The proof is postponed to subsection SM6.2.
The constants A1,C and ρ1,C do not depend on the dimension d but only on the parameters
m, L, Ly , ε, and C. Note that a similar result can be established for Wp for any p ∈ N∗ instead
of W1 . Under the conditions of Proposition 3.2 we have for any ν1 , ν2 ∈ P1 (Rd )
(3.7)    ∥ν1 R^k_{ε,δ} − ν2 R^k_{ε,δ}∥_V ⩽ A_{1,C} ρ_{1,C}^{kδ} ( ∫_{Rd} V²(x̃)dν1(x̃) + ∫_{Rd} V²(x̃)dν2(x̃) ) ,
         W1(ν1 R^k_{ε,δ}, ν2 R^k_{ε,δ}) ⩽ A_{1,C} ρ_{1,C}^{kδ} ( ∫_{Rd} ∥x̃∥ dν1(x̃) + ∫_{Rd} ∥x̃∥ dν2(x̃) ) .

First, (P1 (Rd ), W1 ) is a complete metric space [73, Theorem 6.18]. Second, for any
δ ∈ (0, δ̄], there exists m ∈ N∗ such that f m is contractive with f : P1 (Rd ) → P1 (Rd ) given


for any ν ∈ P1 (Rd ) by f(ν) = νRε,δ using Proposition 3.2. Therefore we can apply the
Picard fixed point theorem and we obtain that Rε,δ admits an invariant probability measure
πε,δ ∈ P1 (Rd ).
Therefore, since πε,δ is an invariant probability measure for Rε,δ and πε,δ ∈ P1 (Rd ), using
(3.7), we have for any ν ∈ P1 (Rd )


∥νR^k_{ε,δ} − π_{ε,δ}∥_V ⩽ A_{1,C} ρ_{1,C}^{kδ} ( ∫_{Rd} V²(x̃)dν(x̃) + ∫_{Rd} V²(x̃)dπ_{ε,δ}(x̃) ) ,
W1(νR^k_{ε,δ}, π_{ε,δ}) ⩽ A_{1,C} ρ_{1,C}^{kδ} ( ∫_{Rd} ∥x̃∥ dν(x̃) + ∫_{Rd} ∥x̃∥ dπ_{ε,δ}(x̃) ) .

Combining this result with the fact that for any t ⩾ 0, (1 − e−t )−1 ⩽ 1 + t−1 , we get that for
any n ∈ N∗ and h : Rd → R measurable such that supx∈Rd {(1 + ∥x∥2 )−1 |h(x)|} < +∞
| n⁻¹ ∑_{k=1}^{n} E[h(Xk)] − ∫_{Rd} h(x̃)dπ_{ε,δ}(x̃) |
    ⩽ A_{1,C} (δ̄ + log⁻¹(1/ρ_{1,C})) ( V²(x) + ∫_{Rd} V²(x̃)dπ_{ε,δ}(x̃) ) (nδ)⁻¹ ,

where (Xk )k∈N is the Markov chain given by (3.5) with starting point X0 = x ∈ Rd .
In the rest of this section we evaluate how close the invariant measure πε,δ is to πε . Our
proof will rely on the following assumption, which is necessary to ensure that x 7→ log p⋆ε (x)
has Lipschitz gradients; see Proposition 2.2.
H4. For any ε > 0, there exists Kε ⩾ 0 such that for any x ∈ Rd ,
∫_{Rd} ∥ x̃ − ∫_{Rd} x̃′ gε(x̃′|x)dx̃′ ∥² gε(x̃|x)dx̃ ⩽ Kε ,

with gε given in (3.1).


We emphasize that H4 is not needed to establish the convergence of the Markov chain.
However, we impose it in order to compare the stationary distribution of PnP-ULA with the
target distribution πε . Depending on the prior distribution density p⋆ , H4 may be checked by
hand. Finally, note that H4 can be extended to cover the case where the prior distribution µ
does not admit a density with respect to the Lebesgue measure.
In the following proposition, we show that we can control the distance between πε,δ and
πε based on the previous observations.
Proposition 3.3. Assume H1, H2(R) for some R > 0, H3, and H4. Moreover, let ε ∈ (0, ε0]
and assume that ∫_{Rd} (1 + ∥x̃∥⁴)p⋆ε(x̃)dx̃ < +∞. Let λ > 0 such that 2λ(Ly + (1/ε) max(L, 1 +
Kε/ε) − min(m, 0)) ⩽ 1 and δ̄ = (1/3)(Ly + L/ε + 1/λ)⁻¹. Then for any δ ∈ (0, δ̄] and C convex
and compact with 0 ∈ C, Rε,δ admits an invariant probability measure πε,δ. In addition, there
exists B0 ⩾ 0 such that for any C convex compact with B(0, RC) ⊂ C and RC > 0, there exists
B1,C ⩾ 0 such that for any δ ∈ (0, δ̄]

∥πε,δ − πε∥V ⩽ B0 RC⁻¹ + B1,C (δ^{1/2} + MR + exp[−R]) ,
where V is given in (3.6).


Proof. The proof is postponed to subsection SM6.3.


We now combine Propositions 3.2 and 3.3 in order to control the bias of the Monte
Carlo estimator obtained using PnP-ULA. In the supplementary material section SM4 we
also provide bounds on |n⁻¹ ∑_{k=1}^{n} E[h(Xk)] − ∫_{Rd} h(x̃)dπ(x̃)| by controlling ∥π − πε∥V.

Proposition 3.4. Assume H1, H2(R) for some R > 0, H3, and H4. Moreover, let ε ∈ (0, ε0]
and assume that ∫_{Rd} (1 + ∥x̃∥⁴)p⋆ε(x̃)dx̃ < +∞. Let λ > 0 such that 2λ(Ly + (1/ε) max(L, 1 +
Kε/ε) − min(m, 0)) ⩽ 1 and δ̄ = (1/3)(Ly + L/ε + 1/λ)⁻¹. Then there exists C1,ε > 0 such that
for any C convex compact with B(0, RC ) ⊂ C and RC > 0 there exists C2,ε,C such that for any
h : Rd → R measurable with supx∈Rd {|h(x)| (1 + ∥x∥2 )−1 } ⩽ 1, n ∈ N∗ , δ ∈ (0, δ̄] we have

(3.8)    | n⁻¹ ∑_{k=1}^{n} E[h(Xk)] − ∫_{Rd} h(x̃)dπε(x̃) |
             ⩽ C1,ε RC⁻¹ + C2,ε,C { δ^{1/2} + MR + exp[−R] + (nδ)⁻¹ (1 + ∥x∥⁴) } .

Proof. The proof is straightforward combining Propositions 3.2 and 3.3.

3.3. Convergence guarantees for PPnP-ULA. We now study PPnP-ULA. It is given by


the following recursion: X0 ∈ C and for any k ∈ N

(3.9)    Xk+1 = ΠC(Xk + δbε(Xk) + √(2δ) Zk+1) ,
         bε(x) = ∇ log p(y|x) + Pε(x) ,    Pε(x) = (Dε(x) − x)/ε ,

where δ > 0 is a step-size, ε > 0 is a hyperparameter of the algorithm, C ⊂ Rd is a closed


convex set, {Zk : k ∈ N} is a family of i.i.d. Gaussian random variables with zero mean and
identity covariance matrix, and ΠC is the projection onto C. In what follows, for any δ > 0
and C ⊂ Rd closed and convex, we denote by Qε,δ : Rd × B(Rd ) → [0, 1] the Markov kernel
associated with the recursion (3.9) and given for any x ∈ Rd and A ∈ B(Rd ) by
Qε,δ(x, A) = (2π)^{−d/2} ∫_{Rd} 1_{ΠC⁻¹(A)}(x + δbε(x) + √(2δ) z) exp[−∥z∥²/2]dz .

Note that for ease of notation, we do not explicitly highlight the dependency of Qε,δ and bε
with respect to the hyperparameter C.
First, we have the following result, which ensures that PPnP-ULA is geometrically ergodic
for all step-sizes.
Proposition 3.5. Assume H1, H2(R) for some R > 0. Let λ, ε, δ̄ > 0. Then for any C ⊂ Rd
convex and compact with 0 ∈ C, there exist ÃC ⩾ 0 and ρ̃C ∈ [0, 1) such that for any δ ∈ (0, δ̄],
x1 , x2 ∈ C, and k ∈ N we have

∥δ_{x1}Q^k_{ε,δ} − δ_{x2}Q^k_{ε,δ}∥_TV ⩽ ÃC ρ̃C^{kδ} .


Proof. The proof is postponed to subsection SM7.1.


In particular Qε,δ admits an invariant probability measure π^C_{ε,δ}. The next proposition

ensures that for small enough step-size δ the invariant measures of PnP-ULA and PPnP-ULA
are close if the compact convex set C has a large diameter.


Proposition 3.6. Assume H1, H2(R) for some R > 0, and H3. In addition, assume that
there exist m̃, c > 0 such that for C = Rd and for any ε > 0 and x ∈ Rd, ⟨bε(x), x⟩ ⩽
−m̃ ∥x∥2 + c. Let λ > 0, ε ∈ (0, ε0 ] such that 2λ(Ly + L/ε − min(m, 0)) ⩽ 1. Then there
exist Ā ⩾ 0 and η, δ̄ > 0 such that for any C ⊂ Rd convex and compact with 0 ∈ C and
B(0, RC /2) ⊂ C ⊂ B(0, RC ) and δ ∈ (0, δ̄] we have

∥πε,δ − π^C_{ε,δ}∥_TV ⩽ Ā exp[−ηRC] ,

where πε,δ is the invariant measure of Rε,δ and π^C_{ε,δ} is the invariant measure of Qε,δ.

Proof. The proof is postponed to subsection SM7.2.


It is worth mentioning at this point that in our experiments (see section 4) the probability
of the iterates (Xn )n∈N leaving C with PnP-ULA or with PPnP-ULA is so low that the
projection constraint is not activated. As a result, if implemented with the same step-size both
algorithms produce the same results. We do not suggest completely removing the constraints
as this is important to theoretically guarantee the geometric ergodicity of the algorithms.
Regarding the choice of the step-size, we observe that the bound δ̄ = (1/3)(Ly + L/ε +
1/λ)−1 used in PnP-ULA is conservative and our experiments suggest that PnP-ULA is stable
for larger step-sizes.

4. Experimental study. This section illustrates the behavior of PnP-ULA and PPnP-ULA
with two classical imaging inverse problems: nonblind image deblurring and inpainting. For
these two problems, we first analyze in detail the convergence of the Markov chain generated
by PnP-ULA for different test images. This is then followed by a comparison between the
MMSE Bayesian point estimator, as calculated by using PnP-ULA and PPnP-ULA, and the
MAP estimator provided by the recent PnP-SGD method [49]. We refer the reader to [49]
for comparisons with PnP-ADMM [65]. To simplify comparisons, for all experiments and
algorithms, the operator Dε is chosen as the pretrained denoising neural network introduced
in [65], for which (Dε − Id) is L-Lipschitz with L < 1.
For the deblurring experiments, the observation model takes the form

y = Ax + n ,

where x ∈ R^d is the unknown original image, y ∈ R^m is the observed image, n is a realization of i.i.d. centered Gaussian noise with covariance σ²Id (with σ² = (1/255)²), and A is a 9 × 9 box blur operator. The log-likelihood for this case reads log p(y|x) = −∥Ax − y∥²/(2σ²), up to an additive constant.
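For illustration, a possible NumPy/SciPy implementation of this likelihood gradient is sketched below. The periodic boundary handling is an assumption made so that the box-blur operator is self-adjoint (A = A⊤); the exact boundary convention of our experiments may differ.

```python
import numpy as np
from scipy.ndimage import uniform_filter

SIGMA = 1.0 / 255.0  # noise standard deviation used in the deblurring experiments

def box_blur(x):
    # 9x9 box blur; with a symmetric kernel and periodic boundaries, A is self-adjoint
    return uniform_filter(x, size=9, mode="wrap")

def grad_log_likelihood(x, y, sigma=SIGMA):
    # gradient of log p(y|x) = -||Ax - y||^2 / (2 sigma^2), i.e. A^T (y - Ax) / sigma^2
    return box_blur(y - box_blur(x)) / sigma**2
```

The Lipschitz constant of this gradient is ∥A⊤A∥/σ², which is the quantity Ly used below when setting the step-size.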
In the inpainting experiments, we seek to recover x ∈ Rd from y = Ax where the matrix
A is an m × d matrix containing m randomly selected rows of the d × d identity matrix. We
focus on a case where 80% of the image pixels are hidden and the observed pixels are measured


without any noise. Because the posterior density for x|y is degenerate, we run PnP-ULA on
the posterior x̃|y where x̃ := Px ∈ Rn denotes the vector of n = d − m unobserved pixels of
x, and map samples to the pixel space by using the affine mapping fy : Rn → Rd defined for
any x̃ ∈ Rn and y ∈ Rm by
fy (x̃) = P⊤ x̃ + A⊤ y.

Note that we can write the negative log-posterior Ũε(x̃) = − log pε(x̃|y) on the set R^n of hidden pixels in terms of fy and the negative log-prior Uε(x) = − log pε(x) on R^d:

Ũε = Uε ◦ fy .

Using the chain rule and Tweedie's formula, we have that for any x̃ ∈ R^n and y ∈ R^m

bε(x̃) = −∇Ũε(x̃) = −P∇Uε(fy(x̃)) = (1/ε)P(Dε − Id)(fy(x̃)) .

Since P and fy are 1-Lipschitz, bε = −∇Ũε is also Lipschitz with constant L̃ ⩽ (L/ε).
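A possible implementation of fy and of the resulting drift on the hidden pixels is sketched below; the boolean mask, the helper names, and the handling of the observed values are illustrative assumptions rather than the exact code of our experiments.

```python
import numpy as np

def make_inpainting_drift(y_obs, observed_mask, denoiser, eps):
    """Return f_y and the drift b_eps acting on the vector of hidden pixels (sketch).

    y_obs: values of the observed pixels (1-D array of length m).
    observed_mask: boolean image, True where the pixel is observed.
    denoiser: the plug-and-play denoiser D_eps acting on full images.
    """
    def f_y(x_tilde):
        x = np.zeros(observed_mask.shape)
        x[observed_mask] = y_obs          # observed pixels are fixed by the data
        x[~observed_mask] = x_tilde       # hidden pixels come from the chain state
        return x

    def drift(x_tilde):
        x = f_y(x_tilde)
        residual = denoiser(x) - x        # (D_eps - Id)(f_y(x_tilde))
        return residual[~observed_mask] / eps   # P applied to the full-image residual

    return f_y, drift
```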
Figure 1 shows the six test images of size 256×256 pixels that were used in the experiments.
We have selected these six images for their diversity in composition, content, and level of detail
(some images are predominantly composed of piecewise constant regions, whereas others are
rich in complex textures). This diversity will highlight strengths and limitations of the chosen
denoiser as an image prior. Figure 2 depicts the corresponding blurred images and Figure 3
the images to inpaint.

Figure 1. Original images used for the deblurring and inpainting experiments: Cameraman, Simpson, Traffic, Alley, Bridge, and Goldhill.


Figure 2. Images of Figure 1, blurred using a 9 × 9 box-filter operator and corrupted by additive Gaussian white noise with standard deviation σ = 1/255 (PSNR/SSIM of the degraded images: 20.30/0.70, 22.44/0.66, 20.34/0.49, 22.64/0.46, 21.84/0.49, 22.61/0.45).

Figure 3. Images of Figure 1, with 80% missing pixels (PSNR/SSIM of the degraded images: 6.69/0.11, 7.43/0.04, 8.35/0.09, 8.27/0.004, 5.71/0.004, 6.61/0.03).


4.1. Implementation guidelines and parameter setting. In the following, we provide some simple and robust rules in order to set the parameters of the different algorithms, in particular the discretization step-size δ and the tail regularization parameter λ.
Choice of the denoiser. The theory presented in section 3 requires that Dε satisfies
H2(R). As default choice, we recommend using a pretrained denoising neural network such
as the one described in [65]. The Lipschitz constant of the network is controlled during
training by using spectral normalization and therefore the first condition of H2(R) holds.
Moreover, the loss function used to train the network is given by ℓε as introduced in subsec-
tion 3.2. Therefore, under the conditions of Proposition 3.1, we get that the second condition of
H2(R) holds.
Step-size δ. The parameter δ controls the asymptotic accuracy of PnP-ULA and PPnP-
ULA, as well as the speed of convergence to stationarity. This leads to the following bias-
variance trade-off. For large values of δ, the Markov chain has low autocorrelation and con-
verges quickly to its stationary regime. Consequently, the Monte Carlo estimates computed
from the chain exhibit low asymptotic variance, at the expense of some asymptotic bias. On
the contrary, small values of δ produce a Markov chain that explores the parameter space less
efficiently, but more accurately. As a result, the asymptotic bias is smaller, but the variance
is larger. In the context of inverse problems that are high-dimensional and ill posed, properly
exploring the solution space can take a large number of iterations. For this reason, we recom-
mend using large values of δ, at the expense of some bias. In addition, in PnP-ULA, δ is also
subject to a numerical stability constraint related to the inverse of the Lipschitz constant of
bε(x) = ∇ log pε(x|y); namely, we require δ < (1/3) Lip(bε)^{−1}, where

Lip(bε) = αL/ε + 1/λ for the inpainting problem, and Lip(bε) = αL/ε + Ly + 1/λ otherwise,

with L and Ly respectively the Lipschitz constants of the denoiser residual (Dε − Id) and of the log-likelihood gradient. In our experiments, L = 1 and Ly = ∥A⊤A∥/σ², where A⊤ is the adjoint of A, so we choose δ just below the upper bound δth = (1/3)(Lip(bε))^{−1}. For PPnP-ULA, we set δ < (L/ε + Ly)^{−1} (resp., δ < (L/ε)^{−1} for inpainting) to prevent excessive bias.
Parameter λ. The parameter λ controls the tail behavior of the target density. As previously explained, it must be set so that the tails of the target density decay sufficiently fast to ensure convergence at a geometric rate, a key property for guaranteeing that the Monte Carlo estimates computed from the chain are consistent and subject to a central limit theorem with the standard O(√k) rate. More precisely, we require λ ∈ (0, (2L/ε + 4Ly)^{−1}). Within this admissible range, if λ is too small this limits the maximal δ and leads to a slow Markov chain. For this reason, we recommend setting λ as large as possible below (2L/ε + 4Ly)^{−1}.
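The two rules above can be combined into a small helper, as in the sketch below. This is only an illustration under the stated bounds; the 0.99 safety factors are our own choice and not part of the theory.

```python
def pnp_ula_hyperparameters(L, Ly, eps, alpha=1.0):
    """Default tail parameter and step-size following the rules above (sketch).

    L: Lipschitz constant of the denoiser residual (D_eps - Id).
    Ly: Lipschitz constant of the log-likelihood gradient (set to 0 for
        noiseless inpainting, where the likelihood term is absent).
    """
    lam = 0.99 / (2.0 * L / eps + 4.0 * Ly)      # just below (2L/eps + 4Ly)^{-1}
    lip_b = alpha * L / eps + Ly + 1.0 / lam     # Lipschitz constant of the drift
    delta_th = 0.99 / (3.0 * lip_b)              # just below (1/3) Lip(b_eps)^{-1}
    return lam, delta_th
```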
Other parameters. The compact set C is defined as C = [−1, 2]^d, although in practice no samples were generated outside of C in any of our experiments, which suggests that the tail decay conditions hold without explicitly enforcing them. In all our experiments, we set the noise level of the denoiser Dε to ε = (5/255)². The initialization X0 can be set to a random vector.
In our experiments (where m = d), we chose X0 = y in order to reduce the number of burn-in iterations. For m ̸= d we could use X0 = A⊤y instead. Concerning the regularization parameter α, by default we set α = 1, but in some cases it is possible to marginally improve
the results by fine tuning it.
All algorithms are implemented using Python and the PyTorch library, and run on an Intel Xeon CPU E5-2609 server with an Nvidia Titan XP graphics card or on Idris' Jean-Zay servers featuring Intel Cascade Lake 6248 CPUs with a single Nvidia Tesla V100 SXM2 GPU. Reported running times correspond to the Xeon + Titan XP configuration.

4.2. Convergence analysis of PnP-ULA in nonblind image deblurring and inpainting.


When using a sampling algorithm such as PnP-ULA on a new problem, it is essential to
check that the state space is correctly explored. In order to provide a thorough convergence
study, we first run the algorithm for 25 × 106 iterations. We use a burn-in period of 2.5 × 106
iterations, and consider only the samples computed after this burn-in period to study the
Markov chain in a close-to-stationary regime. In subsection 4.3, we will see that many fewer
iterations are required if the goal is only to compute point estimators with PnP-ULA. For
simplicity, the algorithm is always initialized with the observation y in our experiments with
PnP-ULA (for inpainting, this means that unknown pixels are initialized to the value 0).
There is no fully comprehensive way to empirically characterize the convergence prop-
erties of a high-dimensional Markov chain, as different statistics computed from the same
chain align differently with the eigenfunctions of the Markov kernel and hence exhibit differ-
ent convergence speeds. In problems of small dimension, we would calculate and analyze the
d-dimensional multivariate autocorrelation function (ACF) of the Markov chain, but this is
not feasible in imaging problems. In problems of moderate dimension, one could characterize
the range of convergence speeds by first estimating the posterior covariance matrix (which, for
256 × 256 images, would be a 256² × 256² matrix) and then performing a principal component
analysis on this matrix to identify the directions with smallest and largest uncertainty, as
these would provide a good indication of the subspaces where the chain converges the fastest
and the slowest. However, computing the posterior covariance matrix is also not possible in
imaging problems because of the dimensionality involved. Here we focus on approximations
of the posterior covariance which make sense for the particular inverse problem we study.
More precisely, we use the diagonalization basis of the inverse operator, i.e., the Fourier basis
for the deblurring experiments, and the basis formed by the unknown pixels for the inpaint-
ing experiments. Under the assumption that the posterior covariance is mostly determined
by the likelihood, this strategy allows broadly identifying the linear statistics that converge
fastest and slowest, without requiring the estimation and manipulation of prohibitively large
matrices.
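As an illustration, the per-frequency spread used for the deblurring chains can be estimated from a stack of stored samples as sketched below. Treating the 2-D DFT as the diagonalization basis of the blur operator is the assumption discussed above, and the function name is our own.

```python
import numpy as np

def fourier_uncertainty(samples):
    """Per-frequency standard deviation of the chain in the Fourier domain (sketch).

    samples: array of shape (n_samples, H, W) holding thinned chain samples.
    Returns the std map and the indices of the most and least uncertain frequencies.
    """
    coeffs = np.fft.fft2(np.asarray(samples, dtype=float), axes=(-2, -1))
    std_map = np.maximum(coeffs.real.std(axis=0), coeffs.imag.std(axis=0))
    slowest = np.unravel_index(np.argmax(std_map), std_map.shape)
    fastest = np.unravel_index(np.argmin(std_map), std_map.shape)
    return std_map, slowest, fastest
```

Displaying std_map on a logarithmic scale, as in Figure 7, makes the structure around the kernel zeros easier to see.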
Inpainting. We first focus on the inpainting problem. Figure 4 shows a map of the pixel-
wise marginal standard deviations for all images. We observe that pixels in homogeneous
regions have low uncertainty, while pixels on textured regions, edges, or complex structures
(a reflection on the window shutter in the Alley image, for instance) are the most uncertain.
For the same experiments, Figure 5 shows the Euclidean distance between the final MMSE
estimate (computed using all samples) and the samples of the chain, every 2500 samples
(after the burn-in period, and hence in what is considered to be a close-to-stationary regime).
Fluctuations around the posterior mean and the absence of temporal structure in the plots of Alley or Goldhill are a first indication that the chain explores the solution space with ease. However, in some other cases such as the Simpson image, we observe meta-stability, where the chain stays in a region of the space for millions of iterations and then jumps to a different region, again for millions of iterations. This is one of the drawbacks of operating with a posterior distribution that is not log-concave and that may exhibit several modes.

Figure 4. Marginal posterior standard deviation of the unobserved pixels for the inpainting problem. Uncertainty is located around edges and in textured areas.
Last, Figure 6 displays the sample ACFs of the fastest and slowest converging statistics
associated with the inpainting experiments (as estimated by identifying, for each image, the
unknown pixels with lowest and highest uncertainty). These ACF plots measure how fast
samples become uncorrelated. A fast decay of the ACF is associated with good Markov chain
mixing, which in turn implies accurate Monte Carlo estimates. On the contrary, a slow decay
of the ACF indicates that the Markov chain is moving slowly, which leads to Monte Carlo
estimates with high variance. As mentioned previously, because computing and visualizing a
multivariate ACF is difficult, here we show the ACF of the chain along the slowest and the
fastest directions in the spatial domain (for completeness, we also show the ACF for a pixel
with median uncertainty). We see that independence is reached very fast in the subspaces of
low or median uncertainty and is much slower for the few very uncertain pixels.
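For completeness, the sample ACF of a scalar chain statistic (e.g., the trace of one hidden pixel) can be computed as in the sketch below; this is a standard estimator, not the exact diagnostic code used for the figures.

```python
import numpy as np

def sample_acf(trace, max_lag):
    """Sample autocorrelation function of a 1-D chain statistic up to max_lag."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    # lag-k autocorrelation normalized so that acf[0] = 1
    return np.array([np.dot(x[: n - k], x[k:]) / (n * var) for k in range(max_lag + 1)])
```

In practice we apply it to the pixels with the lowest, median, and highest marginal standard deviation identified from the uncertainty maps.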
Figure 5. Evolution of the L2 distance between the final MMSE estimate and the samples generated by PnP-ULA for the inpainting problem after the burn-in phase (images Cameraman, Simpson, Traffic, Alley, Bridge, and Goldhill). Samples oscillate randomly around the MMSE, which indicates that they are weakly correlated. For the images Cameraman, Simpson, and Bridge, we note a change of range in the L2 distance, which could be interpreted as mode switching, as the posterior is likely not log-concave.

Figure 6. ACF for the inpainting problem, shown along the fastest, median, and slowest directions in the pixel domain for lags up to 5e5, for all images. After 5e5 iterations, sample pixels are nearly uncorrelated in all spatial directions for the images Traffic, Alley, Bridge, and Goldhill. For the images Cameraman and Simpson, samples along the slowest direction need more iterations to become uncorrelated.

Deblurring. We now focus on the nonblind image deblurring experiments, where, as explained previously, we perform our convergence analysis by using statistics associated with the Fourier domain. Figure 7 depicts the marginal standard deviation of the Fourier coefficients (in absolute value) for all images. For the three images Cameraman, Simpson, and Traffic, all the standard deviations have a similar range of values, and the largest values are observed around frequencies in the kernel of the blur filter (shown on the right of Figure 7) and for high frequencies. Conversely, for the three images Alley, Bridge, and Goldhill,
very high uncertainty is observed in the vicinity of four specific frequencies. This suggests that the denoiser used is struggling to regularize these specific frequencies, and consequently the posterior distribution is very spread out along these directions and difficult to explore by Markov chain sampling as a result. Interestingly, this phenomenon is only observed in the images that are rich in texture content.

Figure 7. Log-standard deviation maps in the Fourier domain for the Markov chains defined by PnP-ULA for the deblurring problem. First line: images Cameraman, Simpson, Traffic. Second line: images Alley, Bridge, and Goldhill. Right: Fourier transform of the deblurring kernel. For the first three images, we clearly see that uncertainty is observed on frequencies that are near the kernel of the blur filter (shown on the right) and is also higher around high frequencies (i.e., around edges and textured areas in images). For the last three images, very high uncertainty is observed around some specific frequencies. In the direction of these frequencies, the Markov chain moves very slowly and its mixing is particularly slow, as shown in Figure 9.
Moreover, Figure 8 depicts the Euclidean distance between the MMSE estimator com-
puted from the entire chain (i.e., all samples) and each sample (we show one point every
2500 samples). We notice that many of the images exhibit some degree of meta-stability or
slow convergence because of the presence of directions in the solution space with very high
uncertainty. Again, this is consistent with our convergence theory, which identifies posterior
multimodality and anisotropy as key challenges that future work should seek to overcome.
Last, we show in Figure 9 the sample ACFs for the slowest and the fastest directions
in the Fourier domain.7 Again, in all experiments, independence is achieved quickly in the
fastest direction. The behaviors of the slowest direction for the three images Alley, Bridge,
and Goldhill suggest that the Markov chain is close to the stability limit and exhibits highly
oscillatory behavior as well as poor mixing.

7. The slowest direction corresponds to the Fourier coefficient with the highest (real or imaginary) variance.

Figure 8. Evolution of the L2 distance between the final MMSE estimate and the samples generated by PnP-ULA for the deblurring problem after the burn-in phase (images Cameraman, Simpson, Traffic, Alley, Bridge, and Goldhill). For images such as Cameraman or Simpson, samples randomly oscillate around the MMSE. On the contrary, for images such as Bridge or Goldhill, the plot is structured, meaning that samples are still correlated.

Figure 9. ACF for the deblurring problem, shown along the fastest and slowest directions in the Fourier domain. The ACFs are shown for lags up to 1.75e5 for the three images Cameraman, Simpson, and Traffic (the two plots on the left), for which independence seems to be achieved in all directions. For the three other images, independence is not achieved in the slowest direction (corresponding to the most uncertain frequency of the samples in the Fourier domain) even after 1e6 iterations.

4.3. Point estimation for nonblind image deblurring and inpainting. We are now ready
to study the quality of the MMSE estimators delivered by PnP-ULA and PPnP-ULA and
report comparisons with MAP estimation by PnP-SGD [49].


Quantitative results. Figure 10 illustrates the evolution of the peak signal-to-noise ratio (PSNR) of the mean of the Markov chain (the Monte Carlo estimate of the MMSE solution), as a function of the number of iterations, for the six images of Figure 1. These plots have been computed by using a step-size δ = δth that is just below the stability limit and a 1-in-2500 thinning. We observe that the PSNR between the MMSE solution as computed by the Markov chain and the truth stabilizes in approximately 10⁵ iterations in the experiments where the chain exhibits fast convergence, whereas over 10⁶ iterations are required in experiments that suffer from slow convergence (e.g., deblurring of Alley, Bridge, and Goldhill). Moreover, we observe that using PPnP-ULA with a larger step-size can noticeably reduce the number of iterations required to obtain a stable estimate of the posterior mean, particularly in the image deblurring experiments.
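The running Monte Carlo estimate of the posterior mean behind these PSNR curves can be computed online, as in the sketch below. The step function, the thinning value, and the PSNR peak are placeholders; this is an illustration of the bookkeeping, not the exact experimental code.

```python
import numpy as np

def psnr(x, ref, peak=1.0):
    return 10.0 * np.log10(peak**2 / np.mean((x - ref) ** 2))

def running_mmse(step, x0, n_iter, thin=2500, ref=None):
    """Online average of thinned chain samples (Monte Carlo estimate of the MMSE).

    step(x): one PnP-ULA or PPnP-ULA iteration; ref: ground truth for PSNR tracking.
    """
    x = x0.copy()
    mean = np.zeros_like(x0, dtype=float)
    count, curve = 0, []
    for k in range(n_iter):
        x = step(x)
        if k % thin == 0:
            count += 1
            mean += (x - mean) / count        # incremental running mean
            if ref is not None:
                curve.append(psnr(mean, ref))
    return mean, curve
```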
Visual results. Figures 11 to 14 show the MMSE estimate computed by PnP-ULA on the
whole chain including the burn-in for the six images, for the inpainting and deblurring experi-
ments. We also provide the MAP estimation results computed by using PnP-SGD [49], which
targets the same posterior distributions. We report the PSNR and the structural similarity
index (SSIM) [74, 75] for all these experiments.
Figure 10. Left: PSNR evolution of the estimated MMSE for the inpainting problem. After 5e5 iterations, the convergence of the first order moment of the posterior distribution seems to be achieved for all images. Middle and right: PSNR evolution of the estimated MMSE for the deblurring problem (PnP-ULA with δ = δth and PPnP-ULA with δ = 6δth). The convergence of the posterior mean can be fast for simple images such as Cameraman, Simpson, and Traffic (for these images the PSNR evolution is shown for the first 5e5 iterations). Increasing δ increases the convergence speed for these images by a factor close to 2. For more complex images, such as Alley or Goldhill, the convergence is much slower and is still not achieved after 3e6 iterations with PPnP-ULA for δ = 6δth.

Figure 11. Results comparison for the inpainting task of the images presented in Figure 3 using PnP-ULA (first row; PSNR/SSIM: 25.06/0.89, 30.62/0.93, 26.90/0.85) and PnP-SGD initialized with a TVL2 restoration (second row; PSNR/SSIM: 23.94/0.88, 28.90/0.90, 24.20/0.81).
Figure 12. Results comparison for the inpainting task of the images presented in Figure 3 using PnP-ULA (first row; PSNR/SSIM: 27.74/0.79, 26.16/0.80, 26.76/0.74) and PnP-SGD initialized with a TVL2 restoration (second row; PSNR/SSIM: 26.45/0.75, 24.71/0.77, 25.96/0.72).

Figure 13. Results comparison for the deblurring task of the images presented in Figure 2 using PnP-ULA with α = 1 (first row; PSNR/SSIM: 30.50/0.93, 34.26/0.94, 29.90/0.90), PnP-SGD with α = 0.3 (second row; PSNR/SSIM: 30.73/0.92, 33.52/0.92, 29.42/0.88), and PnP-SGD with α = 1 (third row; PSNR/SSIM: 29.39/0.93, 33.33/0.93, 28.13/0.85). PnP-ULA was initialized with the observation y (see Figure 2), whereas PnP-SGD was initialized with a TVL2 restoration.

For the inpainting experiments, PnP-SGD struggles to converge when initialized with the
observed image (see [49]). For this reason, we warm start PnP-SGD by using an estimate of
x obtained by minimizing the total variation pseudonorm under the constraint of the known
pixels. For simplicity, PnP-ULA is initialized with the observation y. We observe in Figures 11
and 12 that the results obtained by computing the MMSE Bayesian estimator with PnP-ULA
are visually and quantitatively superior to the ones delivered by MAP estimation with PnP-
SGD. In particular, the sampling approach seems to better recover the continuity of fine
structures and lines in the different images.
For the deblurring experiments, the results of PnP-SGD are provided by using a regular-
ization parameter α = 0.3 (which was shown to yield optimal results on this set of images

in [49]) and for α = 1, which recovers the model used by PnP-ULA. Observe that for the first three images (shown in Figure 13), the MMSE result is much sharper than the best MAP result, and the PSNR/SSIM results also show a clear advantage for the MMSE. For the other three images (results are shown in Figure 14), the quality of the MMSE solutions delivered is slightly deteriorated by the slow convergence of the Markov chain and the poor regularization of some specific frequencies, which leads to a common visual artifact (a rotated rectangular pattern). Using a different denoiser more suitable for handling textures, or combining a learned denoiser with an analytic regularization term, might correct this behavior and will be the topic of future work.

Figure 14. Results comparison for the deblurring task of the images presented in Figure 2 using PnP-ULA with α = 1 (first row; PSNR/SSIM: 28.98/0.80, 28.28/0.84, 27.72/0.73), PnP-SGD with α = 0.3 (second row; PSNR/SSIM: 29.26/0.82, 28.04/0.84, 28.27/0.76), and PnP-SGD with α = 1 (third row; PSNR/SSIM: 28.28/0.76, 27.14/0.79, 27.42/0.70). PnP-ULA was initialized with the observation y (see Figure 2), whereas PnP-SGD was initialized with a TVL2 restoration.
A partial conclusion from this set of comparisons is that the sampling approach of PnP-
ULA, when it samples the space correctly, seems to provide much better results than the MAP

estimator for the same posterior. Of course, this increase in quality comes at the cost of a much higher computation time.

Figure 15. Marginal posterior standard deviation for the deblurring problem. On simple images such as Simpson (see Figure 1), most of the uncertainty is located around the edges. For the images Alley, Bridge, and Goldhill, whose Markov chains are highly correlated in some directions, some areas are very uncertain. They correspond to the zones where the rotated rectangular pattern appears in the MMSE estimate.

4.4. Deblurring and inpainting: Uncertainty visualization study. One of the benefits of
sampling from the posterior distribution with PnP-ULA is that we can probe the uncertainty
in the delivered solutions. In the following, we present an uncertainty visualization analysis
that is useful for displaying the uncertainty related to image structures of different sizes and
located in different regions of the image (see [17] for more details). The analysis proceeds
as follows. First, Figures 4 and 15 show the marginal posterior standard deviation associ-
ated with each image pixel, as computed by PnP-ULA over all samples, for the inpainting
and deblurring problems. As could be expected, we observe for both problems that highly
uncertain pixels are concentrated around the edges of the reconstructed images, but also on
textured areas. The dynamic range of the pixel standard deviations is larger for the inpainting
problem than for deblurring, which suggests that the problem has a higher level of intrinsic
uncertainty.
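The pixelwise standard deviation maps can be accumulated in a streaming fashion without storing the full chain, for instance with Welford's algorithm as sketched below; this is an illustrative implementation, not necessarily the one used to produce the figures.

```python
import numpy as np

class RunningMoments:
    """Streaming mean and per-pixel standard deviation of the chain samples."""

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)

    def update(self, x):
        # Welford's update: numerically stable first and second moments
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return np.sqrt(self.m2 / max(self.n - 1, 1))
```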
Figure 16 shows the evolution of the RMSE between the standard deviation computed
along the samples and its asymptotic value, respectively, for the inpainting and deblurring
problems. Estimating these standard deviation maps necessitates running the chain for longer than estimating the MMSE, as could be expected for a second-order statistical moment.

Figure 16. Evolution of the RMSE between the final standard deviation and the estimated current standard deviation for the inpainting and deblurring problems.

Figure 17. Marginal posterior standard deviation of the Alley and Simpson images for the inpainting problem at different scales (scale 1 to scale 4). The scale i corresponds to a downsampling by a factor 2^i of the original sample size.

Following on from this, to explore the uncertainty for structures that are larger than one pixel, Figures 17 and 18 report the marginal standard deviation associated with higher scales. More precisely, for different values of the scale i, we downsample the stored samples by a factor 2^i before computing the standard deviation. This downsampling step permits quantifying the uncertainty of larger or lower-frequency structures, such as the bottom of the glass in Simpson for the deblurring experiment. At each scale, we see that the uncertainty of the estimate is much more localized for the inpainting problem (resulting in higher uncertainty values in some specific regions) and more spread out for deblurring, certainly because of the different nature of the degradations involved.
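A possible implementation of this multiscale analysis is sketched below. Downsampling is performed here by block averaging, which is one reasonable reading of the procedure; decimation could be used instead.

```python
import numpy as np

def multiscale_std(samples, scales=(1, 2, 3, 4)):
    """Marginal posterior standard deviation after downsampling by 2**i (sketch).

    samples: array of shape (n_samples, H, W); returns a dict {scale: std map}.
    """
    samples = np.asarray(samples, dtype=float)
    n, h, w = samples.shape
    maps = {}
    for i in scales:
        f = 2 ** i
        hh, ww = h - h % f, w - w % f                      # crop to a multiple of f
        blocks = samples[:, :hh, :ww].reshape(n, hh // f, f, ww // f, f)
        coarse = blocks.mean(axis=(2, 4))                  # block-average downsampling
        maps[i] = coarse.std(axis=0)                       # per-pixel std at this scale
    return maps
```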

Figure 18. Marginal posterior standard deviation of the images Alley and Simpson for the deblurring problem at different scales (scale 1 to scale 4). The scale i corresponds to a downsampling by a factor 2^i of the original sample size.

5. Conclusion. This paper presented theory, methods, and computation algorithms for
performing Bayesian inference with Plug & Play priors. This mathematical and computa-
tional framework is rooted in the Bayesian M-complete paradigm and adopts the view that
Plug & Play models approximate a regularized oracle model. We established clear conditions
ensuring that the involved models and quantities of interest are well defined and well posed.
Following on from this, we studied three Bayesian computation algorithms related to biased
approximations of a Langevin diffusion process, for which we provide detailed convergence
guarantees under easily verifiable and realistic conditions. For example, our theory does not
require the denoising algorithms representing the prior to be gradient or proximal operators.
We also studied the estimation error involved in using these algorithms and models instead of
the oracle model, which is decision-theoretically optimal but intractable. To the best of our
knowledge, this is the first Bayesian Plug & Play framework with this level of insight and guar-
antees on the delivered solutions. We illustrated the proposed framework with two Bayesian
image restoration experiments—deblurring and inpainting—where we computed point esti-
mates as well as uncertainty visualization and quantification analyses and highlighted how
the limitations of the chosen denoiser manifest in the resulting Bayesian model and estimates.
In future work, we would like to continue our theoretical and empirical investigation of
Bayesian Plug & Play models, methods, and algorithms. From a modeling viewpoint, it
would be interesting to consider priors that combine a denoiser with an analytic regulariza-
tion term, and other neural network–based priors such as the generative ones used in [14]
or the autoencoder-based priors in [36], as well as to generalize the Gaussian smoothing to
other smoothings and investigate their properties in the context of Bayesian inverse problems.
We are also very interested in strategies for training denoisers that automatically verify the


conditions required for exponentially fast convergence of the Langevin SDE, for example, by
using the framework recently proposed in [60] to learn maximally monotone operators, or
the data-driven regularizers described in [47, 55]. In addition, we would like to understand
when the projected RED estimator [23]—or its relaxed variant—is the MAP estimator for
well-defined Bayesian models, as well as to study the interplay between the geometric as-
pects of the loss defining this estimator [57] and the geometry of the set of fixed points of
the denoiser defining the model. With regard to Bayesian analysis, it would be important
to investigate the frequentist accuracy of Plug & Play models, as well as the adoption of
robust Bayesian techniques in order to perform inference directly w.r.t. to the oracle model
[76]. From a Bayesian computation viewpoint, a priority is to develop accelerated algorithms
similar to [59]. Last, with regards to experimental work, we intend to study the applica-
tion of this framework to uncertainty quantification problems, e.g., in the context of medical
imaging.

REFERENCES

[1] C. Aguerrebere, A. Almansa, J. Delon, Y. Gousseau, and P. Muse, A Bayesian hyperprior approach for joint image denoising and interpolation, with an application to HDR imaging, IEEE Trans. Comput. Imaging, 3 (2017), pp. 633–646, https://doi.org/10.1109/TCI.2017.2704439.
[2] R. Ahmad, C. A. Bouman, G. T. Buzzard, S. Chan, S. Liu, E. T. Reehorst, and P. Schniter,
Plug-and-play methods for magnetic resonance imaging: Using denoisers for image recovery, IEEE
Signal Processing Magazine, 37 (2020), pp. 105–116, https://doi.org/10.1109/MSP.2019.2949470.
[3] G. Alain and Y. Bengio, What regularized auto-encoders learn from the data-generating distribution,
J. Mach. Learn. Res., 15 (2014), pp. 3743–3773, http://jmlr.org/papers/v15/alain14a.html.
[4] V. Antun, M. J. Colbrook, and A. C. Hansen, Can Stable and Accurate Neural Networks be
Computed?–On the Barriers of Deep Learning and Smale’s 18th Problem, preprint, arXiv:2101.08286,
2021.
[5] V. Antun, F. Renna, C. Poon, B. Adcock, and A. C. Hansen, On instabilities of deep learning in
image reconstruction and the potential costs of AI, Proc. Natl. Acad. Sci. USA, 117 (2020), pp. 30088–
30095, https://doi.org/10.1073/pnas.1907377117.
[6] S. Arridge, P. Maass, O. Öktem, and C.-B. Schönlieb, Solving inverse problems using data-driven
models, Acta Numer., 28 (2019), p. 1–174, https://doi.org/10.1017/S0962492919000059.
[7] F. Bach, Breaking the Curse of Dimensionality with Convex Neural Networks, J. Mach. Learn. Res., 18
(2017), pp. 629–681, https://jmlr.org/papers/volume18/14-546/14-546.pdf.
[8] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spa-
ces, CMS Books Math. 408, Springer, New York, 2011, https://doi.org/10.1007/978-3-319-48311-5.
[9] M. Bayati and A. Montanari, The dynamics of message passing on dense graphs, with applications to
compressed sensing, IEEE Trans. Inform. Theory, 57 (2011), pp. 764–785, https://doi.org/10.1109/
TIT.2010.2094817.
[10] J. Bernardo and A. Smith, Bayesian Theory, Wiley Ser. Probab. Stat. 15, Wiley, New York, 2000.
[11] S. A. Bigdeli, M. Jin, P. Favaro, and M. Zwicker, Deep mean-shift priors for image restoration, in
Advances in Neural Information Processing Systems, Vol. 30, Curran Associates, 2017, pp. 763–772,
http://papers.nips.cc/paper/6678-deep-mean-shift-priors-for-image-restoration.
[12] S. A. Bigdeli and M. Zwicker, Image Restoration Using Autoencoding Priors, Tech. report, https:
//arxiv.org/abs/1703.09964, 2017.
[13] A. Blake, P. Kohli, and C. Rother, Markov Random Fields for Vision and Image Processing, MIT
Press, Cambridge, MA, 2011, https://doi.org/10.7551/mitpress/8579.003.0001.
[14] A. Bora, A. Jalal, E. Price, and A. G. Dimakis, Compressed sensing using generative models,
in Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine
Learning Research 70, PMLR, 2017, pp. 537–546, https://proceedings.mlr.press/v70/bora17a.html.


[15] V. D. Bortoli and A. Durmus, Convergence of Diffusions and Their Discretizations: From Continuous
to Discrete Processes and Back, https://arxiv.org/abs/1904.09808, 2020.
[16] G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, Plug-and-Play unplugged:
Optimization-free reconstruction using consensus equilibrium, SIAM J. Imaging Sci., 11 (2018),
pp. 2001–2020, https://doi.org/10.1137/17M1122451.
[17] X. Cai, M. Pereyra, and J. D. McEwen, Uncertainty quantification for radio interferometric imaging
I. Proximal MCMC methods, Monthly Not. Roy. Astronom. Soc., 480 (2018), pp. 4154–4169, https:
//doi.org/10.1093/mnras/sty2004.
[18] A. Chambolle, An algorithm for Total Variation Minimization and Applications, J. Math. Imaging
Vision, 20 (2004), pp. 89–97, https://doi.org/10.1023/B:JMIV.0000011325.36760.1e.
[19] S. H. Chan, X. Wang, and O. A. Elgendy, Plug-and-Play ADMM for image restoration: Fixed-point
convergence and applications, IEEE Trans. Comput. Imaging, 3 (2017), pp. 84–98, https://doi.org/
10.1109/TCI.2016.2629286.
[20] S. Chen, C. Luo, B. Deng, Y. Qin, H. Wang, and Z. Zhuang, BM3D vector approximate message
passing for radar coded-aperture imaging, in Proceedings of Progress in Electromagnetics Research
Symposium-Fall, IEEE, 2017, pp. 2035–2038, https://doi.org/10.1109/PIERS-FALL.2017.8293472.
[21] T. Chen, E. Fox, and C. Guestrin, Stochastic gradient Hamiltonian Monte Carlo, in Proceedings of
the 31st International Conference on Machine Learning, Proceedings of Machine Learning Research
32, PMLR, 2014, pp. 1683–1691, https://proceedings.mlr.press/v32/cheni14.html.
[22] Y. Chen and T. Pock, Trainable nonlinear reaction diffusion: A flexible framework for fast and effective
image restoration, IEEE Trans. Pattern Analysis Machine Intelligence, 39 (2017), pp. 1256–1272,
https://doi.org/10.1109/TPAMI.2016.2596743.
[23] R. Cohen, M. Elad, and P. Milanfar, Regularization by denoising via fixed-point projection (RED-
PRO), SIAM J. Imaging Sci., 14 (2021), pp. 1374–1406, https://doi.org/10.1137/20M1337168.
[24] A. S. Dalalyan, Theoretical guarantees for approximate sampling from smooth and log-concave densities,
J. R. Stat. Soc. Ser. B. Stat. Methodol., 79 (2017), pp. 651–676, https://doi.org/10.1111/rssb.12183.
[25] S. Diamond, V. Sitzmann, F. Heide, and G. Wetzstein, Unrolled Optimization with Deep Priors,
https://arxiv.org/abs/1705.08041, 2017.
[26] C. Dong, C. C. Loy, K. He, and X. Tang, Learning a deep convolutional network for image super-
resolution, in Computer Vision – ECCV 2014, Springer, New York, 2014, pp. 184–199.
[27] D. L. Donoho, A. Maleki, and A. Montanari, Message-passing algorithms for compressed sensing,
Proc. Natl. Acad. Sci. USA, 106 (2009), pp. 18914–18919, https://doi.org/10.1073/pnas.0909892106.
[28] A. Durmus and E. Moulines, Nonasymptotic convergence analysis for the unadjusted Langevin algo-
rithm, Ann. Appl. Probab., 27 (2017), pp. 1551–1587, https://doi.org/10.1214/16-AAP1238.
[29] A. Durmus, E. Moulines, and M. Pereyra, Efficient Bayesian computation by proximal Markov
chain Monte Carlo: When Langevin meets Moreau, SIAM J. Imaging Sci., 11 (2018), pp. 473–506,
https://doi.org/10.1137/16M1108340.
[30] B. Efron, Tweedie’s formula and selection bias, J. Amer. Statist. Assoc., 106 (2011), pp. 1602–1614,
https://doi.org/10.1198/jasa.2011.tm11181.
[31] A. K. Fletcher, P. Pandit, S. Rangan, S. Sarkar, and P. Schniter, Plug in estimation in high
dimensional linear inverse problems a rigorous analysis, J. Stat. Mech. Theory Exp., 2019 (2019),
124021, https://doi.org/10.1088/1742-5468/ab321a.
[32] H. Gao, X. Tao, X. Shen, and J. Jia, Dynamic scene deblurring with parameter selective sharing
and nested skip connections, in Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2019, pp. 3843–3851, https://doi.org/10.1109/CVPR.2019.00397.
[33] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, Deep joint demosaicking and denoising, ACM
Trans. Graphics, 35 (2016), 191, https://doi.org/10.1145/2980179.2982399.
[34] D. Gilton, G. Ongie, and R. Willett, Neumann Networks for Inverse Problems in Imaging, https:
//arxiv.org/abs/1901.03707, 2019.
[35] M. Girolami and B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods,
J. R. Stat. Soc. Ser. B Stat. Methodol., 73 (2011), pp. 123–214, https://doi.org/10.1111/j.1467-9868.
2010.00765.x.
[36] M. González, A. Almansa, and P. Tan, Solving Inverse Problems by Joint Posterior Maximization
with Autoencoding Prior, https://arxiv.org/abs/2103.01648, 2021.


[37] K. Gregor and Y. LeCun, Learning fast approximations of sparse coding, in Proceedings of the 27th In-
ternational Conference on International Conference on Machine Learning, Omnipress, 2010, pp. 399–
406, https://icml.cc/Conferences/2010/papers/449.pdf.
[38] B. Guo, Y. Han, and J. Wen, AGEM: Solving linear inverse problems via deep priors and sampling,
in Advances in Neural Information Processing Systems, Curran Associates, 2019, pp. 547–558, https:
//proceedings.neurips.cc/paper/2019/file/49182f81e6a13cf5eaa496d51fea6406-Paper.pdf.
[39] J. Ho, A. Jain, and P. Abbeel, Denoising diffusion probabilistic models, Advances in Neural Informa-
tion Processing Systems, 33 (2020), pp. 6840–6851, https://proceedings.neurips.cc/paper/2020/file/
4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf.
[40] A. Houdard, C. Bouveyron, and J. Delon, High-dimensional mixture models for unsupervised im-
age denoising (HDMI), SIAM J. Imaging Sci., 11 (2018), pp. 2815–2846, https://doi.org/10.1137/
17M1135694.
[41] S. Hurault, A. Leclaire, and N. Papadakis, Gradient Step Denoiser for Convergent Plug-and-Play,
preprint, arXiv:2110.03220, 2021.
[42] A. Javanmard and A. Montanari, State evolution for general approximate message passing algorithms,
with applications to spatial coupling, Inf. Inference, 2 (2013), pp. 115–144, https://doi.org/10.1093/
imaiai/iat004.
[43] Z. Kadkhodaie and E. P. Simoncelli, Stochastic solutions for linear inverse problems using the prior
implicit in a denoiser, Advances in Neural Information Processing Systems, 2021, https://proceedings.
neurips.cc/paper/2021/file/6e28943943dbed3c7f82fc05f269947a-Paper.pdf.
[44] U. S. Kamilov, H. Mansour, and B. Wohlberg, A plug-and-play priors approach for solving nonlinear
imaging inverse problems, IEEE Signal Processing Letters, 24 (2017), pp. 1872–1876, https://doi.org/
10.1109/LSP.2017.2763583.
[45] B. Kawar, G. Vaksman, and M. Elad, SNIPS: Solving Noisy Inverse Problems Stochastically, preprint,
arXiv:2105.14951, 2021.
[46] B. Kawar, G. Vaksman, and M. Elad, Stochastic image denoising by sampling from the posterior
distribution, in Proceedings of the IEEE/CVF International Conference on Computer Vision
Workshops, 2021, pp. 1866–1875, https://openaccess.thecvf.com/content/ICCV2021W/AIM/html/
Kawar Stochastic Image Denoising by Sampling From the Posterior Distribution ICCVW 2021
paper.html.
[47] E. Kobler, A. Effland, K. Kunisch, and T. Pock, Total deep variation for linear inverse problems,
in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, https://doi.org/
10.1109/CVPR42600.2020.00757.
[48] J. Latz, On the well-posedness of Bayesian inverse problems, SIAM/ASA J. Uncertain. Quantif., 8 (2020),
pp. 451–482, https://doi.org/10.1137/19M1247176.
[49] R. Laumont, V. de Bortoli, A. Almansa, J. Delon, A. Durmus, and M. Pereyra, On
Maximum-a-Posteriori Estimation with Plug & Play Priors and Stochastic Gradient Descent, https:
//hal.archives-ouvertes.fr/hal-03348735/document, 2021.
[50] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila,
Noise2Noise: Learning image restoration without clean data, in Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research 80, PMLR, 2018, pp. 2965–2974, https://proceedings.mlr.press/v80/lehtinen18a.html.
[51] C. Louchet and L. Moisan, Posterior expectation of the total variation model: Properties and experi-
ments, SIAM J. Imaging Sci., 6 (2013), pp. 2640–2684, https://doi.org/10.1137/120902276.
[52] T. Meinhardt, M. Moller, C. Hazirbas, and D. Cremers, Learning proximal operators: Using
denoising networks for regularizing inverse imaging problems, in Proceedings of the International
Conference on Computer Vision, 2017, pp. 1781–1790, https://doi.org/10.1109/ICCV.2017.198.
[53] C. A. Metzler, A. Maleki, and R. G. Baraniuk, From denoising to compressed sensing, IEEE Trans.
Inform. Theory, 62 (2016), pp. 5117–5144, https://doi.org/10.1109/TIT.2016.2556683.
[54] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, Spectral Normalization for Generative Ad-
versarial Networks, https://openreview.net/forum?id=B1QRgziT-, 2018.
[55] S. Mukherjee, S. Dittmer, Z. Shumaylov, S. Lunz, O. Öktem, and C.-B. Schönlieb, Learned
Convex Regularizers for Inverse Problems, https://arxiv.org/abs/2008.02839, 2021.
[56] M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Statist. Comput., 26 (2016), pp. 745–760,
https://doi.org/10.1007/s11222-015-9567-4.


[57] M. Pereyra, Revisiting maximum-a-posteriori estimation in log-concave models, SIAM J. Imaging Sci.,
12 (2019), pp. 650–670, https://doi.org/10.1137/18M1174076.
[58] M. Pereyra, P. Schniter, E. Chouzenoux, J.-C. Pesquet, J.-Y. Tourneret, A. O. Hero, and
S. McLaughlin, A survey of stochastic simulation and optimization methods in signal processing,
IEEE J. Selected Topics Signal Processing, 10 (2015), pp. 224–241, https://doi.org/10.1109/JSTSP.
2015.2496908.
[59] M. Pereyra, L. Vargas Mieles, and K. C. Zygalakis, Accelerating proximal Markov chain Monte
Carlo by using an explicit stabilized method, SIAM J. Imaging Sci., 13 (2020), pp. 905–935, https:
//doi.org/10.1137/19M1283719.
[60] J.-C. Pesquet, A. Repetti, M. Terris, and Y. Wiaux, Learning maximally monotone operators for
image recovery, SIAM J. Imaging Sci. 14 (2020), https://doi.org/10.1137/20M1387961.
[61] A. Repetti, M. Pereyra, and Y. Wiaux, Scalable Bayesian uncertainty quantification in imaging
inverse problems via convex optimization, SIAM J. Imaging Sci., 12 (2019), pp. 87–118, https://doi.
org/10.1137/18M1173629.
[62] C. Robert, The Bayesian Choice: From Decision-Theoretic Foundations to Computational Imple-
mentation, Springer Texts Statist., Springer, New York, 2007, https://books.google.fr/books?id=
6oQ4s8Pq9pYC.
[63] G. O. Roberts and R. L. Tweedie, Exponential convergence of Langevin distributions and their discrete
approximations, Bernoulli, 2 (1996), pp. 341–363, https://doi.org/10.2307/3318418.
[64] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Phys.
D, 60 (1992), pp. 259–268, https://doi.org/10.1016/0167-2789(92)90242-F.
[65] E. K. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, and W. Yin, Plug-and-Play methods provably converge
with properly trained denoisers, in Proceedings of the 36th International Conference on Machine
Learning, Long Beach, CA, 2019, pp. 5546–5557, http://proceedings.mlr.press/v97/ryu19a.html.
[66] E. Schwartz, R. Giryes, and A. M. Bronstein, DeepISP: Toward learning an end-to-end image
processing pipeline, IEEE Trans. Image Process., 28 (2018), pp. 912–923, https://doi.org/10.1109/
TIP.2018.2872858.
[67] L. Schwartz, Désintégration d’une mesure, Séminaire Équations aux dérivées partielles (Polytechnique),
http://eudml.org/doc/111551.
[68] Y. Song and S. Ermon, Generative modeling by estimating gradients of the data distribution, in Advances
in Neural Information Processing Systems, Vol. 32, 2019, https://proceedings.neurips.cc/paper/2019/
file/3001ef257407d5a371a96dcd947c7d93-Paper.pdf.
[69] A. M. Stuart, Inverse problems: A Bayesian perspective, Acta Numer., 19 (2010), pp. 451–559, https:
//doi.org/10.1017/S0962492910000061.
[70] Y. Sun, Z. Wu, B. Wohlberg, and U. S. Kamilov, Scalable Plug-and-Play ADMM with convergence
guarantees, IEEE Trans. Comput. Imaging, 7 (2021), pp. 849–863, https://doi.org/10.1109/TCI.2021.
3094062.
[71] A. M. Teodoro, J. M. Bioucas-Dias, and M. A. T. Figueiredo, Scene-Adapted Plug-and-Play
Algorithm with Guaranteed Convergence: Applications to Data Fusion in Imaging, https://arxiv.org/
abs/1801.00605, 2018.
[72] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, Plug-and-Play priors for model based
reconstruction, in Proceedings of the Global Conference on Signal and Information Processing, IEEE,
2013, pp. 945–948, https://doi.org/10.1109/GlobalSIP.2013.6737048.
[73] C. Villani, Optimal Transport: Old and New, Grundlehren Math. Wiss. 338, Springer-Verlag, Berlin,
2009, https://doi.org/10.1007/978-3-540-71050-9.
[74] Z. Wang and A. C. Bovik, Mean squared error: Love it or leave it? A new look at signal fidelity
measures, IEEE Signal Processing Magazine, 26 (2009), pp. 98–117, https://doi.org/10.1109/MSP.
2008.930649.
[75] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, Image quality assessment: From error
visibility to structural similarity, IEEE Trans. Image Processing, 13 (2004), pp. 600–612.
[76] J. Watson and C. Holmes, Approximate models and robust decisions, Statist. Sci., 31 (2016), pp. 465–
489, https://doi.org/10.1214/16-sts592.
[77] X. Xu, Y. Sun, J. Liu, B. Wohlberg, and U. S. Kamilov, Provable convergence of Plug-and-Play
priors with MMSE denoisers, IEEE Signal Processing Letters, 27 (2020), pp. 1–10, https://doi.org/
10.1109/LSP.2020.3006390.


[78] G. Yu, G. Sapiro, and S. Mallat, Solving inverse problems with piecewise linear estimators: From
Gaussian mixture models to structured sparsity, IEEE Trans. Image Processing, 21 (2011), pp. 2481–
2499, https://doi.org/10.1109/TIP.2011.2176743.
[79] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, Beyond a Gaussian denoiser: Residual
learning of deep CNN for image denoising, IEEE Trans. Image Processing, 26 (2017), pp. 3142–3155,
https://doi.org/10.1109/TIP.2017.2662206.
[80] K. Zhang, W. Zuo, S. Gu, and L. Zhang, Learning Deep CNN Denoiser Prior for Image Restoration,
in Proceedings of the Conference on Computer Vision and Pattern Recognition, IEEE, 2017, pp. 2808–
2817, https://doi.org/10.1109/CVPR.2017.300.
[81] K. Zhang, W. Zuo, and L. Zhang, FFDNet: Toward a fast and flexible solution for CNN-based image
denoising, IEEE Trans. Image Processing, 27 (2018), pp. 4608–4622, https://doi.org/10.1109/TIP.
2018.2839891.
[82] D. Zoran and Y. Weiss, From learning models of natural image patches to whole image restoration, in
Proceedings of the International Conference on Computer Vision, IEEE, 2011, pp. 479–486, https:
//doi.org/10.1109/ICCV.2011.6126278.
