Bayesian Imaging Using Plug & Play Priors: When Langevin Meets Tweedie∗
Rémi Laumont† , Valentin De Bortoli‡ , Andrés Almansa† , Julie Delon§ , Alain Durmus¶, and
Marcelo Pereyra∥
Abstract. Since the seminal work of Venkatakrishnan, Bouman, and Wohlberg [Proceedings of the Global Con-
ference on Signal and Information Processing, IEEE, 2013, pp. 945–948] in 2013, Plug & Play (PnP)
methods have become ubiquitous in Bayesian imaging. These methods derive estimators for inverse
problems in imaging by combining an explicit likelihood function with a prior that is implicitly
defined by an image denoising algorithm. In the case of optimization schemes, some recent works
guarantee the convergence to a fixed point, albeit not necessarily a maximum a posteriori Bayesian
estimate. In the case of Monte Carlo sampling schemes for general Bayesian computation, to the
best of our knowledge there is no known proof of convergence. Algorithm convergence issues aside,
there are important open questions regarding whether the underlying Bayesian models and estima-
tors are well defined, are well posed, and have the basic regularity properties required to support
efficient Bayesian computation schemes. This paper develops theory for Bayesian analysis and com-
putation with PnP priors. We introduce PnP-ULA (Plug & Play unadjusted Langevin algorithm)
for Monte Carlo sampling and minimum mean square error estimation. Using recent results on the
quantitative convergence of Markov chains, we establish detailed convergence guarantees for this
algorithm under realistic assumptions on the denoising operators used, with special attention to
denoisers based on deep neural networks. We also show that these algorithms approximately target
a decision-theoretically optimal Bayesian model that is well posed and meaningful from a frequen-
tist viewpoint. PnP-ULA is demonstrated on several canonical problems such as image deblurring
and inpainting, where it is used for point estimation as well as for uncertainty visualization and
quantification.
Key words. Plug & Play, inverse problems, deblurring, inpainting, Markov chain Monte Carlo, Langevin
algorithm
∗Received by the editors March 22, 2021; accepted for publication (in revised form) December 28, 2021; published electronically May 31, 2022. The first two authors contributed equally to this work.
https://doi.org/10.1137/21M1406349
Funding: The work of the first author was partially supported by grants from Région Ile-De-France. The work of
the second author was partially supported by EPSRC grant EP/R034710/1. The work of the third and fourth authors
was supported by the French Research Agency through the PostProdLEAP project ANR-19-CE23-0027-01. The work
of the fifth author was supported by the Lagrange Mathematical and Computing Research Center. The work of
the sixth author was supported by EPSRC grants EP/T007346/1 and EP/W007681/1. Computer experiments for
this work ran on a Titan Xp GPU donated by NVIDIA, as well as on HPC resources from GENCI-IDRIS (grants
2020-AD011011641, 2021-AD011011641R1).
†Université Paris Cité, CNRS, MAP5 UMR 8145, F-75006 Paris, France (remi.laumont@parisdescartes.fr, andres.almansa@parisdescartes.fr).
‡Department of Statistics, University of Oxford, 24-29 St Giles, OX1 3LB, Oxford, United Kingdom (valentin.debortoli@gmail.com).
§Université Paris Cité, CNRS, MAP5 UMR 8145, F-75006 Paris, France, and Institut Universitaire de France (IUF), 75231 Paris Cedex 05, France (julie.delon@parisdescartes.fr).
¶Centre Borelli, UMR 9010, École Normale Supérieure Paris-Saclay, 91190 Gif-sur-Yvette, France (alain.durmus@cmla.ens-cachan.fr).
∥School of Mathematical and Computer Sciences, Heriot-Watt University and Maxwell Institute for Mathematical Sciences, Edinburgh, United Kingdom (m.pereyra@hw.ac.uk).
AMS subject classifications. 65J22, 68U10, 62F15, 65C40, 65C60, 65J20, 65D18, 90C26, 65K05, 68Q25
DOI. 10.1137/21M1406349
1. Introduction.
1.1. Bayesian inference in imaging inverse problems. Most inverse problems in imaging
aim at reconstructing an unknown image x ∈ Rd from a degraded observation y ∈ Rm un-
der some assumptions on their relationship. For example, many works consider observation
models of the form y = A(x) + n, where A : Rd → Rm is a degradation operator modeling de-
terministic instrumental aspects of the observation process, and n is an unknown (stochastic)
noise term taking values in Rm. The operator A may or may not be known and is usually assumed
to be linear (e.g., A can represent blur, missing pixels, a projection, etc.).
The estimation of x from y is usually ill posed or ill conditioned1 and additional assump-
tions on the unknown x are required in order to deliver meaningful estimates. The Bayesian
statistical paradigm provides a natural framework to regularize such estimation problems.
The relationship between x and y is described by a statistical model with likelihood function
p(y|x), and the knowledge about x is encoded by the prior distribution for x, typically speci-
fied via a density function p(x) or by its potential U (x) = − log p(x). Similarly, in some cases
the likelihood p(y|x) is specified via the potential F (x, y) = − log p(y|x). The likelihood and
prior define the joint distribution with density p(x, y) = p(y|x)p(x), from which we derive the
posterior distribution with density p(x|y) where for any x ∈ Rd , y ∈ Rm
p(x|y) = p(y|x)p(x) / ∫_{Rd} p(y|x̃)p(x̃)dx̃ ,
which underpins all inference about x given the observation y. Most imaging methods seek to
derive estimators reaching some kind of consensus between prior and likelihood, as for instance
the minimum mean square error (MMSE) or maximum a posteriori (MAP) estimators
(1.1) x̂map = arg max_{x∈Rd} p(x|y) = arg min_{x∈Rd} {F(x, y) + U(x)} ,
(1.2) x̂mmse = arg min_{u∈Rd} E[∥x − u∥² | y] = E[x|y] = ∫_{Rd} x̃ p(x̃|y)dx̃ .
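As a concrete illustration (not taken from the paper), in the scalar conjugate Gaussian model y = x + n with x ∼ N(0, s²) and n ∼ N(0, σ²) the two estimators coincide and admit the closed form E[x|y] = s²y/(s² + σ²). The sketch below, with hypothetical values of s, σ, and y, checks this against a self-normalized Monte Carlo approximation of (1.2).

```python
import numpy as np

# Scalar Gaussian toy model: y = x + n, x ~ N(0, s^2), n ~ N(0, sigma^2).
# For this conjugate model the posterior x|y is Gaussian, so the MAP and MMSE
# estimators coincide: E[x|y] = s^2 / (s^2 + sigma^2) * y.
rng = np.random.default_rng(0)
s, sigma, y = 2.0, 0.5, 1.3                      # hypothetical prior std, noise std, observation

closed_form = s**2 / (s**2 + sigma**2) * y

# Monte Carlo check of the MMSE estimator (1.2) via self-normalized importance
# sampling: draw from the prior and weight by the likelihood p(y|x).
x = rng.normal(0.0, s, size=1_000_000)           # samples from the prior
w = np.exp(-0.5 * (y - x) ** 2 / sigma**2)       # unnormalized likelihood weights
mc_estimate = np.sum(w * x) / np.sum(w)

print(f"closed form : {closed_form:.4f}")
print(f"Monte Carlo : {mc_estimate:.4f}")
```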
The quality of the inference about x given y depends on how accurately the specified prior
represents the true marginal distribution for x. Most works in the Bayesian imaging literature
consider relatively simple priors promoting sparsity in transformed domains or piecewise reg-
ularity (e.g., involving the ℓ1 norm or the total variation pseudonorm [64, 18, 51, 56]), Markov
random fields [13], or learning-based priors like patch-based Gaussian or Gaussian mixture
models [82, 78, 1, 71, 40]. Special attention is given in the literature to models that have
specific factorization structures or that are log-concave, as this enables the use of Bayesian
computation algorithms that scale efficiently to high dimensions and which have detailed
convergence guarantees [56, 29, 61, 35, 21].
¹That is, either the estimation problem does not admit a unique solution, or there exists a unique solution but it is not Lipschitz continuous w.r.t. perturbations in the data y.
(1.3) dXt = ∇ log p(Xt|y)dt + √2 dBt = [∇ log p(y|Xt) + ∇ log p(Xt)]dt + √2 dBt ,
where (Bt )t⩾0 is a d-dimensional Brownian motion. When p(x|y) is proper and smooth, with
x 7→ ∇ log p(x|y) Lipschitz continuous,2 then, for any initial condition X0 ∈ Rd , the SDE (1.3)
has a unique strong solution (Xt )t⩾0 that admits the posterior of interest p(x|y) as unique
stationary density [63]. In addition, for any initial condition X0 ∈ Rd the distribution of
Xt converges toward the posterior distribution in total variation. Although solving (1.3) in
continuous time is generally not possible, we can use discrete time approximations of (1.3)
to generate samples that are approximately distributed according to p(x|y). A natural choice
is the unadjusted Langevin algorithm (ULA) Markov chain (Xk )k⩾0 obtained from an Euler–
Maruyama discretization of (1.3), given by X0 ∈ Rd and the recursion for all k ∈ N
(1.4) Xk+1 = Xk + δ∇ log p(y|Xk) + δ∇ log p(Xk) + √(2δ) Zk+1 ,
where δ > 0 is a step-size and {Zk : k ∈ N} is a sequence of i.i.d. standard Gaussian random variables on Rd.
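For illustration only, the sketch below implements the recursion (1.4) on a toy two-dimensional Gaussian model; the quadratic potentials are hypothetical stand-ins for − log p(y|x) and − log p(x).

```python
import numpy as np

def ula(grad_log_likelihood, grad_log_prior, x0, delta, n_iter, seed=0):
    """Unadjusted Langevin algorithm (1.4): Euler-Maruyama discretization of (1.3)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_iter):
        x = (x + delta * (grad_log_likelihood(x) + grad_log_prior(x))
             + np.sqrt(2.0 * delta) * rng.standard_normal(x.shape))
        samples.append(x.copy())
    return np.array(samples)

# Hypothetical Gaussian example: p(y|x) ∝ exp(-||x - y||^2 / 2), p(x) ∝ exp(-||x||^2 / 8),
# so the posterior mean is 0.8 * y.
y = np.array([1.0, -2.0])
samples = ula(lambda x: y - x, lambda x: -x / 4.0, x0=np.zeros(2), delta=1e-2, n_iter=50_000)
print("empirical posterior mean:", samples[10_000:].mean(axis=0))
```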
neural networks. Indeed, neural networks can be trained as regressors to learn the function
y 7→ x̂mmse empirically from a huge dataset of examples {(x′i, y′i)}_{i=1}^N, where N ∈ N is the size
of the training dataset. Many recent works on the topic report unprecedented accuracy. This
training can be agnostic [26, 79, 81, 33, 66, 32] or exploit the knowledge of A in the network
architecture via unrolled optimization techniques [37, 22, 25, 34]. However, solutions encoded
by end-to-end neural networks are mostly problem specific and not easily adapted to reflect
changes in the problem (e.g., in instrumental settings). There also exist concerns regarding
the stability of such approaches for the general reconstruction problem [5, 4].
A natural strategy to reconcile the strengths of the Bayesian paradigm and neural networks
is provided by Plug & Play approaches. These data-driven regularization approaches learn
an implicit representation of the prior density p(x) (or its potential U (x) = − log p(x)) while
keeping an explicit likelihood density, which is usually assumed to be known and calibrated
[6]. More precisely, using a denoising algorithm Dε , Plug & Play approaches seek to derive an
approximation of the gradient ∇U (called the Stein score) [11, 12] or proxU [52, 80, 19, 44, 65],
which can, for instance, be used within an iterative minimization scheme to approximate x̂MAP ,
or within a Monte Carlo sampling scheme to approximate x̂mmse [3, 38, 43]. To the best of our
knowledge, the idea of leveraging a denoising algorithm to approximate the score ∇U within
an iterative Monte Carlo scheme was first proposed in the seminal paper [3] in the context of
generative modeling with denoising autoencoders, where the authors present a Monte Carlo
scheme that can be viewed as an approximate Plug & Play MALA. This scheme was recently
combined with an expectation maximization approach and applied to Bayesian inference for
inverse problems in imaging in [38]. Similarly, the recent work [43] proposes to solve imaging
inverse problems by using a Plug & Play stochastic gradient strategy that has close connections
to an unadjusted version of the MALA scheme of [3]. While these approaches have shown
some remarkable empirical performance, they rely on hybrid algorithms that are not always
well understood and that in some cases fail to converge. Indeed, their convergence properties
remain an important open question, especially when Dε is implemented as a neural network
that is not a gradient mapping. These algorithms are better understood when interpreted as
fixed point algorithms seeking to reach a set of equilibrium equations between the denoiser
and the data fidelity term [16]. Our understanding of the convergence properties of hybrid
optimization methods has advanced significantly recently [65, 77, 70, 41], but these questions
remain largely unexplored in the context of stochastic Bayesian algorithms used to compute x̂mmse
or perform other forms of statistical inference.
The use of Plug & Play operators has also been investigated in the context of approximate
message passing (AMP) computation methods (see [27] for an introduction to AMP focused
on compressed sensing and [2] for a survey on PnP-AMP in the context of magnetic resonance
imaging), particularly for applications involving randomized forward operators where it is
possible to characterize AMP schemes in detail (see, e.g., [9, 42, 53, 20]). This is an active
area of research, and recent works have extended the approach to vector AMP strategies and
characterized their behavior for a wider class of problems [31].
Approaches based on score matching techniques [68, 39] have also shown promising results
recently [46, 45]. These methods are linked with Plug & Play approaches as they also estimate
a Stein score. However, they do not rely on the asymptotic convergence of a diffusion, but
instead aim at inverting a noising process stemming from an optimal transport problem [15].
The recent work [45] is particularly relevant in this context as it considers a range of imaging
inverse problems, where it exploits the structure of the forward operator to perform posterior
sampling in a coarse-to-fine manner. This also allows the use of multivariate step-sizes that
are specific to each scale and ensure stability. However, to the best of our knowledge, the convergence properties of these schemes have not been formally established.
1.4. Contributions summary. This paper presents a formal framework for Bayesian analy-
sis and computation with Plug & Play priors. We propose two Plug & Play ULAs, with
detailed convergence guarantees under realistic assumptions on the denoiser used. We also
study important questions regarding whether the underlying Bayesian models and estimators
are well defined, are well posed, and have the basic regularity properties required to sup-
port efficient Bayesian computation schemes. We pay particular attention to denoisers based
on deep neural networks and report extensive numerical experiments with a specific neural
network denoiser [65] shown to satisfy our convergence guarantees.
The remainder of the paper is organized as follows. Section 2 defines notation, introduces
our framework for studying Bayesian inference methods with Plug & Play priors, and presents
two Plug & Play ULAs for Bayesian computation in imaging problems. This is then followed
by a detailed theoretical analysis of Plug & Play Bayesian models and algorithms in section 3.
Section 4 demonstrates the proposed approach with experiments related to nonblind image
deblurring and image inpainting, where we perform point estimation and uncertainty visual-
ization analyses, and report comparisons with the Plug & Play stochastic gradient descent
method of [49]. Conclusions and perspectives for future work are finally reported in section 5.
2. Bayesian inference with Plug & Play priors: Theory, methods, and algorithms.
2.1. Bayesian modeling and analysis with Plug & Play priors. This section presents
a formal framework for Bayesian analysis and computation with Plug & Play priors. As
explained previously, we are interested in the estimation of the unknown image x from an
observation y when the problem is ill conditioned or ill posed, resulting in significant uncer-
tainty about the value of x. The Bayesian framework addresses this difficulty by using prior
knowledge about the marginal distribution of x in order to reduce the uncertainty about x|y
and make the estimation problem well posed. In the Bayesian Plug & Play approach, instead
of explicitly specifying the marginal distribution of x, we introduce prior knowledge about
x by specifying an image denoising operator Dε for recovering x from a noisy observation
xε ∼ N (x, ε Id) with noise variance ε > 0. A case of particular relevance in this context is
when Dε is implemented by a neural network, trained by using a set of clean images {x′i}_{i=1}^N.
A central challenge in the formalization of Bayesian inference with Plug & Play priors is
that the denoiser Dε used is generally not directly related to a marginal distribution for x, so
it is not possible to derive an explicit posterior for x|y from Dε . As a result, it is not clear that
plugging Dε into gradient-based algorithms such as ULA leads to a well-defined or convergent
scheme that is targeting a meaningful Bayesian model.
To overcome this difficulty, in this paper we analyze Plug & Play Bayesian models through
the prism of M-complete Bayesian modeling [10]. Accordingly, there exists a true—albeit
unknown and intractable—marginal distribution for x and posterior distribution for x|y. If it
were possible, basing inferences on these true marginal and posterior distributions would be
optimal both in terms of point estimation and in terms of delivering Bayesian probabilities
that are valid from a frequentist viewpoint. We henceforth use µ to denote this optimal prior
distribution for x on (Rd , B(Rd ))—where B(Rd ) denotes the Borel σ-field of Rd , and when µ
admits a density w.r.t. the Lebesgue measure on Rd , we denote it by p⋆ . In the latter case,
the posterior distribution for x|y associated with the marginal µ also admits a density that is
given for any x ∈ Rd and y ∈ Rm by⁴
(2.1) p⋆(x|y) = p(y|x)p⋆(x) / ∫_{Rd} p(y|x̃)p⋆(x̃)dx̃ .
Unlike most Bayesian imaging approaches that operate implicitly in an M-closed manner and
treat their postulated Bayesian models as true models (see [10] for more details), we explicitly
regard p⋆ (or more precisely µ) as a fundamental property of the unknown x, and models
used for inference as operational approximations of p⋆ specified by the practitioner (either
analytically, algorithmically, or from training data). This distinction will be useful for using
the oracle posterior (2.1) as a reference, and Plug & Play Bayesian algorithms based on a
denoiser Dε as approximations to reference algorithms to perform inference w.r.t. p⋆ . The
accuracy of the Plug & Play approximations will depend chiefly on the closeness between Dε
and an optimal denoiser Dε⋆ derived from p⋆ that we define shortly.
In this conceptual construction, the marginal µ naturally depends on the imaging appli-
cation considered. It could be the distribution of natural images of the size and resolution of
x, or that of a class of images related to a specific application. And in problems where there
is training data {x′i}_{i=1}^N available, we regard {x′i}_{i=1}^N as samples from µ. Last, we note that
the posterior for x|y remains well defined when µ does not admit a density; this is important
to provide robustness to situations where p⋆ is nearly degenerate or improper. For clarity, our
presentation assumes that p⋆ exists, although this is not strictly required.5
Notice that because µ is unknown, we cannot verify that p⋆ (x|y) satisfies the basic desider-
ata for gradient-based Bayesian computation: i.e., p⋆ (x|y) need not be proper and differen-
tiable, with ∇ log p⋆ (x|y) Lipschitz continuous. To guarantee that gradient-based algorithms
that target approximations of p⋆ (x|y) are well defined by construction, we introduce a reg-
ularized oracle µε obtained via the convolution of µ with a Gaussian smoothing kernel with
bandwidth ε > 0. Indeed, by construction, µε has a smooth proper density pε given for any
x ∈ Rd and ε > 0 by
p⋆ε(x) = (2πε)^{−d/2} ∫_{Rd} exp[−∥x − x̃∥²/(2ε)] p⋆(x̃)dx̃ .
Equipped with this regularized marginal distribution, we use Bayes’ theorem to involve the
likelihood p(y|x) and derive the posterior density p⋆ε (x|y), given for any ε > 0 and x ∈ Rd by
⁴Strictly speaking, the true likelihood p⋆(y|x) may also be unknown; this is particularly relevant in the case of blind or myopic inverse imaging problems. For simplicity, we restrict our experiments and theoretical development to the case where p(y|x) represents the true likelihood. Generalizations of our approach to the blind or semiblind setting are discussed, e.g., by [38]; formalizing these generalizations is an important perspective for future work.
⁵Operating without densities requires measure disintegration concepts that are technical [67].
p⋆ε(x|y) = p(y|x)p⋆ε(x) / ∫_{Rd} p(y|x̃)p⋆ε(x̃)dx̃ ,
which inherits the regularity properties required for gradient-based Bayesian computation.
Under the assumption that the expected mean square error (MSE) is finite, Dε⋆ is the MMSE
estimator to recover an image x ∼ µ from a noisy observation xε ∼ N (x, ε Id) [62]. Again, this
optimal denoiser is a fundamental property of x and it is generally intractable. Motivated by
the fact that state-of-the-art image denoisers are close to optimal in terms of MSE, in subsec-
tion 2.3 we will characterize the accuracy of Plug & Play Bayesian methods for approximate
inference w.r.t. p⋆ε(x|y) and p⋆(x|y) as a function of the closeness between the denoiser Dε
used and the reference Dε⋆ .
To relate the gradient x 7→ ∇ log p⋆ε(x) and Dε⋆, we use Tweedie's identity [30], which states that for all x ∈ Rd
(2.2) ∇ log p⋆ε(x) = (Dε⋆(x) − x)/ε ,
and hence x 7→ ∇ log p⋆ε(x|y) is Lipschitz continuous if and only if Dε⋆ has this property. We
argue that this is a natural assumption on Dε⋆ , as it is essentially equivalent to assuming that
the denoising problem underpinning Dε⋆ is well posed in the sense of Hadamard (recall that
an inverse problem is said to be well posed if its solution is unique and Lipschitz continuous
w.r.t. the observation [69]). As established in Proposition 2.2 below, this happens when the expected MSE involved in using Dε⋆ to recover x from xε ∼ N(x, ε Id), where x has marginal µ, is uniformly bounded.
where gε (·|xε ) is the density of the conditional distribution of the unknown image x ∈ Rd with
marginal µ, given a noisy observation xε ∼ N (x, ε Id). See subsection 3.2 for details.
Proof. The proof is postponed to Lemma SM6.2.
These results can be generalized to hold under the weaker assumption that the expected
MSE for Dε⋆ is finite but not uniformly bounded, as in this case x 7→ ∇ log p⋆ε (x|y) is locally
instead of globally Lipschitz continuous (we postpone this technical extension to future work).
The pathological case where Dε⋆ does not have a finite MSE arises when µ is such that the
denoising problem does not admit a Bayesian estimator w.r.t. the MSE loss. In summary,
the gradient x 7→ ∇ log p⋆ε (x|y) is Lipschitz continuous when µ carries enough information to
make the problem of Bayesian image denoising under Gaussian additive noise well posed.
Notice that by using Tweedie’s identity, we can express a ULA recursion for sampling
approximately from p⋆ε (x|y) as follows:
(2.3) Xk+1 = Xk + δ∇ log p(y|Xk) + (δ/ε)(Dε⋆(Xk) − Xk) + √(2δ) Zk+1 ,
where we recall that {Zk : k ∈ N} are i.i.d. standard Gaussian random variables on Rd
and δ > 0 is a positive step-size. Under standard assumptions on δ, the sequence generated
by (2.3) is a Markov chain which admits an invariant probability distribution whose density
is provably close to p⋆ε (x|y), with δ controlling a trade-off between asymptotic accuracy and
convergence speed. In the following section we present Plug & Play ULAs that arise from
replacing Dε⋆ in (2.3) with a denoiser Dε that is tractable.
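As an illustration of the identity behind (2.3) (not taken from the paper), for a scalar Gaussian prior x ∼ N(0, s²) both p⋆ε and Dε⋆ are available in closed form, and the sketch below checks numerically that ∇ log p⋆ε(x) = (Dε⋆(x) − x)/ε for hypothetical values of s and ε.

```python
import numpy as np

# Scalar Gaussian prior x ~ N(0, s^2). Then x_eps = x + sqrt(eps) z has density
# p_eps = N(0, s^2 + eps) and the MMSE denoiser is D_eps(x) = s^2 / (s^2 + eps) * x.
s, eps = 1.5, 0.1                        # hypothetical prior std and smoothing level
x = np.linspace(-3.0, 3.0, 7)

score_direct = -x / (s**2 + eps)         # d/dx log p_eps(x) in the Gaussian case
D_eps = s**2 / (s**2 + eps) * x          # closed-form MMSE denoiser
score_tweedie = (D_eps - x) / eps        # Tweedie: score = (D_eps(x) - x) / eps

print(np.allclose(score_direct, score_tweedie))   # True
```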
Before concluding this section, we study whether the oracle p⋆ (x|y) is itself well posed,
i.e., if p⋆ (x|y) changes continuously w.r.t. y under a suitable probability metric (see [48]).
We answer this question positively in Proposition 2.3, which states that, under mild as-
sumptions on the likelihood, p⋆ (x|y) is locally Lipschitz continuous w.r.t. y for an appropriate
metric. This stability result implies, for example, that the MMSE estimator derived from
p⋆ (x|y) is locally Lipschitz continuous w.r.t. y, and hence stable w.r.t. small perturbations
of y. Note that a similar property holds for the regularized posterior p⋆ε (x|y). In particular,
Proposition 2.3 holds for Gaussian likelihoods (see section 3 for details).
Proposition 2.3. Assume that there exist Φ1 : Rd → [0, +∞) and Φ2 : Rm → [0, +∞)
such that for any x ∈ Rd and y1 , y2 ∈ Rm
and for any c > 0, ∫_{Rd} (1 + Φ1(x̃)) exp[cΦ1(x̃)]p⋆(x̃)dx̃ < +∞. Then y 7→ p⋆(·|y) is locally
Lipschitz w.r.t. ∥ · ∥1 , i.e., for any compact set K there exists CK ⩾ 0 such that for any
y1 , y2 ∈ K , ∥p⋆ (·|y1 ) − p⋆ (·|y2 )∥1 ⩽ CK ∥y1 − y2 ∥.
Proof. The proof is a straightforward application of Proposition SM5.3.
To conclude, starting from the decision-theoretically optimal model p⋆ (x|y), we have con-
structed a regularized approximation p⋆ε (x|y) that is proper and smooth by construction, with
gradients that are explicitly related to denoising operators by Tweedie's formula. Under mild
assumptions on p(y|x), the approximation p⋆ε (x|y) is well posed and can be made arbitrarily
close to the oracle p⋆ (x|y) by controlling ε. Moreover, we established that x 7→ ∇ log p⋆ε (x)
is Lipschitz continuous when the problem of Gaussian image denoising for µ under the MSE
loss is well posed. This allows imagining convergent gradient-based algorithms for performing
Bayesian computation for p⋆ε(x|y), setting the basis for Plug & Play ULA schemes that mimic these idealized algorithms by using a tractable denoiser Dε such as a neural network, trained to optimize MSE performance and hence to approximate the oracle MSE denoiser Dε⋆.
2.2. Bayesian computation with Plug & Play priors. We are now ready to study Plug
& Play ULA schemes to perform approximate inference w.r.t. p⋆ε (x|y) (and hence indirectly
w.r.t. p⋆ (x|y)). We use (2.3) as starting point, with Dε⋆ replaced by a surrogate denoiser
Dε , but also modify (2.3) to guarantee geometrically fast convergence6 to a neighborhood of
p⋆ε (x|y). In particular, geometrically fast convergence is achieved here by modifying far-tail
probabilities to prevent the Markov chain from becoming too diffusive as it explores the tails
of p⋆ε (x|y). We consider two alternatives to guarantee geometric convergence with markedly
different bias-variance trade-offs: one with excellent accuracy guarantees but that requires
using a small step-size δ and hence has a higher computational cost, and another one that
allows taking a larger step-size δ to improve convergence speed at the expense of weaker
guarantees in terms of estimation bias.
First, in the spirit of Moreau–Yosida regularized ULA [29], we define PnP-ULA as the
following recursion: given an initial state X0 ∈ Rd and for any k ∈ N,
Xk+1 = Xk + δ∇ log p(y|Xk) + (δ/ε)(Dε(Xk) − Xk) + (δ/λ)(ΠC(Xk) − Xk) + √(2δ) Zk+1 ,
where C ⊂ Rd is some large compact convex set that contains most of the prior probability
mass of x, ΠC is the projection operator onto C w.r.t. the Euclidean scalar product on Rd , and
λ > 0 is a tail regularization parameter that is set such that the drift in PnP-ULA satisfies a
certain growth condition as ∥x∥ → ∞ (see section 3 for details).
An alternative strategy (which we call projected PnP-ULA, i.e., PPnP-ULA; see Algo-
rithm 2.2) is to modify PnP-ULA to include a hard projection onto C, i.e., (Xk )k∈N is defined
by X0 ∈ C and the recursion for any k ∈ N
Xk+1 = ΠC[Xk + δ∇ log p(y|Xk) + (δ/ε)(Dε(Xk) − Xk) + √(2δ) Zk+1] ,
⁶Geometric convergence is a highly desirable property in large-scale problems and guarantees that the generated Markov chains can be used for Monte Carlo integration.
where we notice that, by construction, the chain cannot exit C because of the action of
the projection operator ΠC . The hard projection guarantees geometric convergence with
weaker restrictions on δ and hence PPnP-ULA can be tuned to converge significantly faster
than PnP-ULA, albeit with a potentially larger bias. These two schemes are summarized in
Algorithms 2.1 and 2.2 below. Note the presence of a regularization parameter α in these
algorithms, which permits us to balance the weights between the prior and data terms. For
the sake of simplicity, this parameter is set to α = 1 in sections 3 and 4 but will be taken into
account in the supplementary material. Subsections 3.2
and 3.3 present detailed convergence results for PnP-ULA and PPnP-ULA. Implementation
guidelines, including suggestions for how to set the algorithm parameters of PnP-ULA and
PPnP-ULA, are provided in section 4.
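For readers who prefer code, a minimal sketch of Algorithms 2.1 and 2.2 for a linear-Gaussian likelihood is given below; the denoiser callable, the box set C, and all numerical values are illustrative assumptions, not the reference implementation accompanying the paper.

```python
import numpy as np

def pnp_ula(y, A, sigma, denoiser, eps, delta, lam, C_min, C_max,
            n_iter, x0, projected=False, seed=0):
    """Sketch of PnP-ULA (Algorithm 2.1) and, with projected=True, PPnP-ULA (Algorithm 2.2).

    Likelihood: p(y|x) proportional to exp(-||Ax - y||^2 / (2 sigma^2)); prior score
    approximated by (D_eps(x) - x)/eps (Tweedie); C = [C_min, C_max]^d is the compact box."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    running_mean = np.zeros_like(x)
    for k in range(n_iter):
        grad_lik = A.T @ (y - A @ x) / sigma**2        # gradient of log p(y|x)
        prior_drift = (denoiser(x, eps) - x) / eps     # Plug & Play score
        proj = np.clip(x, C_min, C_max)                # projection onto the box C
        drift = grad_lik + prior_drift + (0.0 if projected else (proj - x) / lam)
        x = x + delta * drift + np.sqrt(2.0 * delta) * rng.standard_normal(x.shape)
        if projected:
            x = np.clip(x, C_min, C_max)               # hard projection (PPnP-ULA)
        running_mean += (x - running_mean) / (k + 1)   # online MMSE estimate
    return running_mean

# Toy usage with a hypothetical "denoiser": the MMSE denoiser under a N(0, Id) prior.
d = 16
A, y = np.eye(d), np.ones(d)
denoiser = lambda x, eps: x / (1.0 + eps)
x_mmse = pnp_ula(y, A, sigma=0.3, denoiser=denoiser, eps=0.05, delta=1e-3,
                 lam=0.1, C_min=-1.0, C_max=2.0, n_iter=20_000, x0=np.zeros(d))
print(x_mmse[:4])
```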
Last, it is worth mentioning that Algorithms 2.1 and 2.2 can be straightforwardly modified
to incorporate additional regularization terms. More precisely, one could consider a prior
defined as the (normalized) product of a Plug & Play term and an explicit analytical term.
In that case, one should simply modify the recursion defining the Markov chain by adding the
gradient associated with the analytical term. In a manner akin to [29], analytical terms that are not smooth can be incorporated via their proximal operator.
Before concluding this section, it is worth emphasizing that, in addition to being important
in their own right, Algorithms 2.1 and 2.2 and the associated theoretical results set the
grounds for analyzing more advanced stochastic simulation and optimization schemes for
performing Bayesian inference with Plug & Play priors, in particular accelerated optimization
and sampling algorithms [59]. This is an important perspective for future work.
p⋆ε(x) = (2πε)^{−d/2} ∫_{Rd} exp[−∥x − x̃∥²/(2ε)] p⋆(x̃)dx̃ .
One typical example of likelihood function that we consider in our numerical illustration (see
section 4) is p(y|x) ∝ exp[−∥Ax − y∥²/(2σ²)] for any x ∈ Rd with σ > 0 and A ∈ Rm×d.
We define π as the target posterior distribution, given for any x ∈ Rd by (dπ/dLeb)(x) = p⋆(x|y). We also consider the family of probability distributions {πε : ε > 0} given for any ε > 0 and x ∈ Rd by
(dπε/dLeb)(x) = p(y|x)p⋆ε(x) / ∫_{Rd} p(y|x̃)p⋆ε(x̃)dx̃ .
Note that in the supplementary material we investigate the general setting where p⋆ε is replaced by (p⋆ε)^α for some α > 0 that acts as a regularization
parameter. We divide our study into two parts. We recall that πε is well defined for any
ε > 0 under H1; see Proposition 2.1. We start with some notation in subsection 3.1. We then
establish nonasymptotic bounds between the iterates of PnP-ULA and πε with respect to the
total variation distance for any ε > 0, in subsection 3.2. Finally, in subsection 3.3 we establish
similar results for PPnP-ULA.
3.1. Notation. Denote by B(Rd ) the Borel σ-field of Rd , and for f : Rd → R measurable,
∥f ∥∞ = supx̃∈Rd |f (x̃)|. For µ a probability measure on (Rd , B(Rd )) and f a µ-integrable
function, denote by µ(f ) the integral of f w.r.t. µ. For f : Rd → R measurable and V :
Rd → [1, ∞) measurable, the V-norm of f is given by ∥f∥V = sup_{x̃∈Rd} |f(x̃)|/V(x̃). Let ξ be a
finite signed measure on (Rd, B(Rd)). The V-total variation distance of ξ is defined as
∥ξ∥V = sup_{∥f∥V ⩽ 1} ∫_{Rd} f(x̃)dξ(x̃) .
If V = 1, then ∥·∥V is the total variation denoted by ∥·∥TV . Let U be an open set of Rd . For
any pair of measurable spaces (X, X ) and (Y, Y), measurable function f : (X, X ) → (Y, Y),
and measure µ on (X, X ) we denote by f# µ the pushforward measure of µ on (Y, Y) given for
any A ∈ Y by f#µ(A) = µ(f⁻¹(A)). We denote by P(Rd) the set of probability measures over (Rd, B(Rd)) and for any m ∈ N, Pm(Rd) = {ν ∈ P(Rd) : ∫_{Rd} ∥x̃∥^m dν(x̃) < +∞}.
We denote by Ck(U, Rm) and Ckc(U, Rm) the set of Rm-valued k-times differentiable functions and the set of compactly supported Rm-valued k-times differentiable functions, respectively. Letting
f : U → R, we denote by ∇f the gradient of f if it exists. f is said to be m-convex with m ⩾ 0
if for all x1 , x2 ∈ Rd and t ∈ [0, 1],
f (tx1 + (1 − t)x2 ) ⩽ tf (x1 ) + (1 − t)f (x2 ) − mt(1 − t) ∥x1 − x2 ∥2 /2 .
For any a ∈ Rd and R > 0, denote by B(a, R) the open ball centered at a with radius R. Let (X, X) and (Y, Y) be two measurable spaces. A Markov kernel P is a mapping P : X × Y → [0, 1] such that for any x̃ ∈ X, P(x̃, ·) is a probability measure and for any A ∈ Y, P(·, A) is measurable.
All densities are w.r.t. the Lebesgue measure (denoted Leb) unless stated otherwise. For all convex and closed sets C ⊂ Rd, we denote by ΠC the projection operator onto C w.r.t. the Euclidean scalar
product on Rd . For any matrix a ∈ Rd1 ×d2 with d1 , d2 ∈ N, we denote a⊤ ∈ Rd2 ×d1 its adjoint.
3.2. Convergence of PnP-ULA. In this section, we fix ε > 0 and derive quantitative
bounds between the iterates of PnP-ULA and πε with respect to the total variation distance.
To address this issue, we first show that PnP-ULA is geometrically ergodic and establish non-
asymptotic bounds between the corresponding Markov kernel and its invariant distribution.
Second, we analyze the distance between this stationary distribution and πε .
For any ε > 0 we define gε : Rd × Rd → [0, +∞) for any x1 , x2 ∈ Rd by
(3.1) gε(x1|x2) = p⋆(x1) exp[−∥x2 − x1∥²/(2ε)] / ∫_{Rd} p⋆(x̃) exp[−∥x2 − x̃∥²/(2ε)]dx̃ .
Note that gε(·|Xε) is the density with respect to the Lebesgue measure of the distribution
of X given Xε , where X is sampled according to the prior distribution µ (with density p⋆ )
and Xε = X + ε1/2 Z where Z is a Gaussian random variable with zero mean and identity
covariance matrix. Throughout this section, we consider the following assumption on the
family of denoising operators {Dε : ε > 0} which will ensure that PnP-ULA approximately
targets πε .
H2(R). We have that ∫_{Rd} ∥x̃∥²p⋆(x̃)dx̃ < +∞. In addition, there exist ε0 > 0, MR ⩾ 0, and L ⩾ 0 such that for any ε ∈ (0, ε0], x1, x2 ∈ Rd, and x ∈ B(0, R),
(3.2) ∥(Id −Dε)(x1) − (Id −Dε)(x2)∥ ⩽ L∥x1 − x2∥ , ∥Dε(x) − Dε⋆(x)∥ ⩽ MR .
The Lipschitz continuity condition in (3.2) will be useful for establishing the stability and
geometric convergence of the Markov chain generated by PnP-ULA. This condition can be
explicitly enforced during training by using an appropriate regularization of the neural network
weights [65, 54]. Regarding the second condition in (3.2), MR is a bound on the error involved
in using Dε as an approximation of Dε⋆ for images of magnitude R (i.e., for any x ∈ B(0, R)),
and it will be useful for bounding the bias resulting from using PnP-ULA for inference w.r.t.
πε (recall that the bias vanishes as MR → 0 and δ → 0). For denoisers represented by neural
networks, one can promote a small value of MR during training by using an appropriate loss
function. More precisely, consider a neural network fw : Rd → Rd, parameterized by its weights and biases gathered in w ∈ W, where W is some measurable space. For any ε > 0, one could target an empirical approximation of a loss of the form ℓε : W → [0, +∞) given for any w ∈ W by ℓε(w) = ∫_{Rd×Rd} ∥x − fw(xε)∥² p⋆ε(xε)gε(x|xε)dxε dx. Note that such a loss is
considered in the Noise2Noise network introduced in [50].
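As a sketch of this training strategy (under the assumption of a small residual architecture that is not the network of [65]), the following PyTorch fragment uses spectral normalization to promote the Lipschitz condition in (3.2) and performs one step of minimizing an empirical surrogate of ℓε on a placeholder batch.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SmallResidualDenoiser(nn.Module):
    """Tiny residual denoiser D_eps(x) = x + f_w(x); spectral normalization makes each
    convolution 1-Lipschitz, which helps control the Lipschitz constant of (D_eps - Id)."""
    def __init__(self, channels=1, width=32):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(channels, width, 3, padding=1)), nn.ReLU(),
            spectral_norm(nn.Conv2d(width, width, 3, padding=1)), nn.ReLU(),
            spectral_norm(nn.Conv2d(width, channels, 3, padding=1)),
        )

    def forward(self, x):
        return x + self.net(x)

eps = (5.0 / 255.0) ** 2                              # denoiser noise level used in section 4
model = SmallResidualDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

clean = torch.rand(8, 1, 64, 64)                      # placeholder batch of clean patches x ~ mu
noisy = clean + eps**0.5 * torch.randn_like(clean)    # x_eps ~ N(x, eps Id)

loss = ((model(noisy) - clean) ** 2).mean()           # empirical surrogate of the loss l_eps(w)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```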
With regard to the theoretical limitations stemming from representing Dε by a deep neural
network, universal approximation theorems (see, e.g., [7, subsection 4.7]) suggest that MR could
be arbitrarily small in principle. For a given architecture and training strategy, if there exists M̃R ⩾ 0 such that inf_{w∈W} sup_{x∈B(0,R)} {M̃R⁻¹ ∥fw(x) − Dε⋆(x)∥} ⩽ 1, then the second condition in
(3.2) holds upon letting Dε = fw† for an appropriate choice of weights w† ∈ W. This last
inequality can be established using universal approximation theorems such as [7, subsection
4.7]. Moreover, for any other w ∈ W, ℓε(w) ⩾ ∫_{Rd×Rd} ∥x − Dε⋆(xε)∥² p⋆ε(xε)gε(x|xε)dx dxε = ℓ⋆ε, since for any xε ∈ Rd, Dε⋆(xε) = ∫_{Rd} x̃ gε(x̃|xε)dx̃; see (3.3). Consider w† ∈ W obtained
after numerically minimizing ℓε and satisfying ℓε (w† ) ⩽ ℓ⋆ε + η with η > 0. In this case, the
following result ensures that (3.2) is satisfied with MR of order η 1/(2d+2) for any R > 0 and
letting Dε = fw† .
Proposition 3.1. Assume that for any w ∈ W
(3.4) ∫_{Rd×Rd} (∥x∥² + ∥fw(xε)∥²) p⋆ε(xε)gε(x|xε)dx dxε < +∞ .
Let R, η > 0 and w† ∈ W such that ℓε (w† ) ⩽ ℓ⋆ε + η. In addition, assume that
sup_{x1,x2∈B(0,2R)} { ∥x2 − x1∥⁻¹ (∥fw†(x2) − fw†(x1)∥ + ∥Dε⋆(x2) − Dε⋆(x1)∥) } < +∞ ,
where Dε⋆ is given in (3.3). Then there exist CR, η̄R ⩾ 0 such that if η ∈ (0, η̄R], then for any
x̃ ∈ B(0, R), ∥fw† (x̃) − Dε⋆ (x̃)∥ ⩽ CR η 1/(2d+2) .
Proof. The proof is postponed to subsection SM6.1.
Note that (3.4) is satisfied if for any w ∈ W, sup_{x∈Rd} ∥fw(x)∥(1 + ∥x∥)⁻¹ < +∞ and H2
holds.
We recall that PnP-ULA (see Algorithm 2.1) is given by the following recursion: X0 ∈ Rd
and for any k ∈ N
(3.5) Xk+1 = Xk + δ bε(Xk) + √(2δ) Zk+1 , where for any x ∈ Rd,
bε(x) = ∇ log p(y|x) + Pε(x) + (proxλ(ιC)(x) − x)/λ , Pε(x) = (Dε(x) − x)/ε .
Note that for ease of notation, we do not explicitly highlight the dependency of Rε,δ and bε
with respect to the hyperparameter λ > 0 and C.
Here we consider the case where x 7→ log p(y|x) satisfies a one-sided Lipschitz condition,
i.e., we consider the following condition.
We refer to the supplementary material section SM3 for refined convergence rates in the
case where x 7→ log p(y|x) is strongly m-concave. Note that if H3 is satisfied with m > 0, then
x 7→ log p(y|x) is m-concave. If H1 holds, then H3 holds with m = −Ly. However, it is possible that m > −Ly, which leads to better convergence rates for PnP-ULA. As a result, even when H1 holds we still consider H3. In order to deal with H3 in the case where m ⩽ 0, we set
C ⊂ Rd to be some convex compact set fixed by the user. Doing so, we ensure the stability
of the Markov chain. The choice of C in practice is discussed in section 4. In our imaging
experiments, we recall that for any x ∈ Rd we have p(y|x) ∝ exp[− ∥Ax − y∥2 /(2σ 2 )]. If A
is not invertible, then x 7→ log p(y|x) is not m-concave with m > 0. This is the case in our
deblurring experiment when the convolution kernel has zeros in the Fourier domain.
We start with the following result, which ensures that the Markov chain (3.5) is geometri-
cally ergodic under H2 for the Wasserstein metric W1 and in V -norm for V : Rd → [1, +∞)
given for any x ∈ Rd by
Proposition 3.2. Assume H1, H2(R) for some R > 0, and H3. Let λ > 0, ε ∈ (0, ε0 ] such
that 2λ(Ly + L/ε − min(m, 0)) ⩽ 1 and δ̄ = (1/3)(Ly + L/ε + 1/λ)−1 . Then for any C ⊂ Rd
convex and compact with 0 ∈ C, there exist A1,C ⩾ 0 and ρ1,C ∈ [0, 1) such that for any
δ ∈ (0, δ̄], x1 , x2 ∈ Rd , and k ∈ N we have
First, (P1 (Rd ), W1 ) is a complete metric space [73, Theorem 6.18]. Second, for any
δ ∈ (0, δ̄], there exists m ∈ N∗ such that f^m is contractive, where f : P1(Rd) → P1(Rd) is given for any ν ∈ P1(Rd) by f(ν) = νRε,δ, using Proposition 3.2. Therefore we can apply the
Picard fixed point theorem and we obtain that Rε,δ admits an invariant probability measure
πε,δ ∈ P1 (Rd ).
Therefore, since πε,δ is an invariant probability measure for Rε,δ and πε,δ ∈ P1 (Rd ), using
Combining this result with the fact that for any t ⩾ 0, (1 − e−t )−1 ⩽ 1 + t−1 , we get that for
any n ∈ N∗ and h : Rd → R measurable such that supx∈Rd {(1 + ∥x∥2 )−1 |h(x)|} < +∞
| n⁻¹ ∑_{k=1}^{n} E[h(Xk)] − ∫_{Rd} h(x̃)dπε,δ(x̃) | ⩽ A1,C (δ̄ + log⁻¹(1/ρ1,C)) { V²(x) + ∫_{Rd} V²(x̃)dπε,δ(x̃) } (nδ)⁻¹ ,
where (Xk )k∈N is the Markov chain given by (3.5) with starting point X0 = x ∈ Rd .
In the rest of this section we evaluate how close the invariant measure πε,δ is to πε . Our
proof will rely on the following assumption, which is necessary to ensure that x 7→ log p⋆ε (x)
has Lipschitz gradients; see Proposition 2.2.
H4. For any ε > 0, there exists Kε ⩾ 0 such that for any x ∈ Rd ,
∫_{Rd} ∥ x̃ − ∫_{Rd} x̃′ gε(x̃′|x)dx̃′ ∥² gε(x̃|x)dx̃ ⩽ Kε ,
Proposition 3.4. Assume H1, H2(R) for some R > 0, H3, and H4. Moreover, let ε ∈ (0, ε0] and assume that ∫_{Rd} (1 + ∥x̃∥⁴)p⋆ε(x̃)dx̃ < +∞. Let λ > 0 such that 2λ(Ly + (1/ε) max(L, 1 +
Kε /ε) − min(m, 0)) ⩽ 1 and δ̄ = (1/3)(Ly + L/ε + 1/λ)−1 . Then there exists C1,ε > 0 such that
for any C convex compact with B(0, RC ) ⊂ C and RC > 0 there exists C2,ε,C such that for any
h : Rd → R measurable with supx∈Rd {|h(x)| (1 + ∥x∥2 )−1 } ⩽ 1, n ∈ N∗ , δ ∈ (0, δ̄] we have
(3.8) | n⁻¹ ∑_{k=1}^{n} E[h(Xk)] − ∫_{Rd} h(x̃)dπε(x̃) | ⩽ C1,ε RC⁻¹ + C2,ε,C { δ^{1/2} + MR + exp[−R] + (nδ)⁻¹(1 + ∥x∥⁴) } .
Note that for ease of notation, we do not explicitly highlight the dependency of Qε,δ and bε
with respect to the hyperparameter C.
First, we have the following result, which ensures that PPnP-ULA is geometrically ergodic
for all step-sizes.
Proposition 3.5. Assume H1, H2(R) for some R > 0. Let λ, ε, δ̄ > 0. Then for any C ⊂ Rd
convex and compact with 0 ∈ C, there exist ÃC ⩾ 0 and ρ̃C ∈ [0, 1) such that for any δ ∈ (0, δ̄],
x1 , x2 ∈ C, and k ∈ N we have
ensures that for small enough step-size δ the invariant measures of PnP-ULA and PPnP-ULA satisfy
∥π^C_{ε,δ} − πε,δ∥TV ⩽ Ā exp[−ηRC] for some Ā ⩾ 0 and η > 0.
4. Experimental study. This section illustrates the behavior of PnP-ULA and PPnP-ULA
with two classical imaging inverse problems: nonblind image deblurring and inpainting. For
these two problems, we first analyze in detail the convergence of the Markov chain generated
by PnP-ULA for different test images. This is then followed by a comparison between the
MMSE Bayesian point estimator, as calculated by using PnP-ULA and PPnP-ULA, and the
MAP estimator provided by the recent PnP-SGD method [49]. We refer the reader to [49]
for comparisons with PnP-ADMM [65]. To simplify comparisons, for all experiments and
algorithms, the operator Dε is chosen as the pretrained denoising neural network introduced
in [65], for which (Dε − Id) is L-Lipschitz with L < 1.
For the deblurring experiments, the observation model takes the form
y = Ax + n ,
where A models the blur operator and n is additive Gaussian noise with standard deviation σ (see Figure 2). For the inpainting experiments, the observation model is y = Ax, where A ∈ Rm×d selects the m observed pixels, without any noise. Because the posterior density for x|y is then degenerate, we run PnP-ULA on
the posterior x̃|y where x̃ := Px ∈ Rn denotes the vector of n = d − m unobserved pixels of
x, and map samples to the pixel space by using the affine mapping fy : Rn → Rd defined for
any x̃ ∈ Rn and y ∈ Rm by
fy (x̃) = P⊤ x̃ + A⊤ y.
Note that we can write the negative log-posterior Ũε(x̃) = − log pε(x̃|y) on the set Rn of hidden pixels in terms of fy and the negative log-prior Uε(x) = − log pε(x) on Rd:
Ũε = Uε ◦ fy .
Using the chain rule and Tweedie's formula, we have for any x̃ ∈ Rn and y ∈ Rm
∇Ũε(x̃) = P ∇Uε(fy(x̃)) = (1/ε) P (fy(x̃) − Dε(fy(x̃))) .
Since P and fy are 1-Lipschitz, bε = −∇Ũε is also Lipschitz with constant L̃ ⩽ L/ε.
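A small sketch of this reparametrization is given below for a boolean inpainting mask; the denoiser callable and the toy data are illustrative assumptions.

```python
import numpy as np

def inpainting_drift(x_hidden, y_obs, mask, denoiser, eps):
    """Drift on the hidden pixels for noiseless inpainting.

    mask:     boolean array, True where the pixel is observed;
    y_obs:    observed pixel values (at mask == True);
    x_hidden: current values of the unobserved pixels (at mask == False).
    f_y(x_hidden) fills observed pixels with y_obs and hidden pixels with x_hidden;
    the score (D_eps(f_y) - f_y)/eps is then restricted to the hidden coordinates."""
    full = np.empty(mask.shape)
    full[mask] = y_obs                   # f_y(x~) = P^T x~ + A^T y for a pixel-selection A
    full[~mask] = x_hidden
    score_full = (denoiser(full, eps) - full) / eps
    return score_full[~mask]             # equals -grad of U~_eps(x~) under the PnP approximation

# Toy usage with a hypothetical denoiser (the MMSE denoiser under a N(0, Id) prior).
rng = np.random.default_rng(1)
mask = rng.random((8, 8)) < 0.6          # 60% of pixels observed
y_obs = rng.random(int(mask.sum()))
x_hidden = np.zeros(int((~mask).sum()))
denoiser = lambda x, eps: x / (1.0 + eps)
print(inpainting_drift(x_hidden, y_obs, mask, denoiser, eps=(5 / 255) ** 2).shape)
```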
Figure 1 shows the six test images of size 256×256 pixels that were used in the experiments.
We have selected these six images for their diversity in composition, content, and level of detail
(some images are predominantly composed of piecewise constant regions, whereas others are
rich in complex textures). This diversity will highlight strengths and limitations of the chosen
denoiser as an image prior. Figure 2 depicts the corresponding blurred images and Figure 3
the images to inpaint.
Figure 2. Images of Figure 1, blurred using a 9×9-box-filter operator and corrupted by an additive Gaussian
white noise with standard deviation σ = 1/255.
Denoiser Dε satisfying H2(R). As default choice, we recommend using a pretrained denoising neural network such
as the one described in [65]. The Lipschitz constant of the network is controlled during
training by using spectral normalization and therefore the first condition of H2(R) holds.
Moreover, the loss function used to train the network is given by ℓε as introduced in subsec-
tion 3.2. Therefore, under the conditions of Proposition 3.1, we get that the second condition of
H2(R) holds.
Step-size δ. The parameter δ controls the asymptotic accuracy of PnP-ULA and PPnP-
ULA, as well as the speed of convergence to stationarity. This leads to the following bias-
variance trade-off. For large values of δ, the Markov chain has low autocorrelation and con-
verges quickly to its stationary regime. Consequently, the Monte Carlo estimates computed
from the chain exhibit low asymptotic variance, at the expense of some asymptotic bias. On
the contrary, small values of δ produce a Markov chain that explores the parameter space less
efficiently, but more accurately. As a result, the asymptotic bias is smaller, but the variance
is larger. In the context of inverse problems that are high-dimensional and ill posed, properly
exploring the solution space can take a large number of iterations. For this reason, we recom-
mend using large values of δ, at the expense of some bias. In addition, in PnP-ULA, δ is also
subject to a numerical stability constraint related to the inverse of the Lipschitz constant of
bε (x) = ∇ log pε (x|y); namely, we require δ < (1/3) Lip(bε )−1 where
Lip(bε) = αL/ε + 1/λ for the inpainting problem, and Lip(bε) = αL/ε + Ly + 1/λ otherwise,
where L and Ly are respectively the Lipschitz constant of the denoiser residual (Dε − Id)
and the Lipschitz constant of the log-likelihood gradient. In our experiments, L = 1 and
Ly = ∥A⊤A∥/σ², so we choose δ just below the upper bound δth = (1/3) Lip(bε)⁻¹, where
A⊤ is the adjoint of A. For PPnP-ULA, we set δ < (L/ε + Ly )−1 (resp., δ < (L/ε)−1 for
inpainting) to prevent excessive bias.
Parameter λ. The parameter λ controls the tail behavior of the target density. As previ-
ously explained, it must be set so that the tails of the target density decay sufficiently fast
to ensure convergence at a geometric rate, a key property for guaranteeing that the Monte
Carlo estimates computed from the chain are consistent and subject to a central limit theorem with the standard O(√k) rate. More precisely, we require λ ∈ (0, 1/(2(L/ε + 2Ly))).
Within this admissible range, if λ is too small this limits the maximal δ and leads to a
slow Markov chain. For this reason, we recommend setting λ as large as possible below
(2L/ε + 4Ly )−1 .
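The sketch below collects these guidelines into a small helper computing Ly, an admissible λ, and the step-size bounds for PnP-ULA and PPnP-ULA; the operator A and all numerical values are hypothetical, and the operator norm is evaluated numerically.

```python
import numpy as np

def pnp_ula_hyperparameters(A, sigma, L=1.0, eps=(5 / 255) ** 2, alpha=1.0, inpainting=False):
    """Guideline values: Ly, tail parameter lambda, and step-size bounds."""
    # Lipschitz constant of the log-likelihood gradient (0 for noiseless inpainting).
    Ly = 0.0 if inpainting else np.linalg.norm(A.T @ A, 2) / sigma**2
    lam = 0.99 / (2.0 * (L / eps + 2.0 * Ly))      # just below the admissible bound on lambda
    lip_b = alpha * L / eps + Ly + 1.0 / lam       # Lipschitz constant of the drift b_eps
    delta_th = (1.0 / 3.0) / lip_b                 # PnP-ULA stability bound on the step-size
    delta_ppnp = 1.0 / (L / eps + Ly)              # looser guideline used for PPnP-ULA
    return Ly, lam, delta_th, delta_ppnp

# Hypothetical deblurring-type operator, just to illustrate the computation.
A = 0.5 * np.eye(32)
print(pnp_ula_hyperparameters(A, sigma=1 / 255))
```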
Other parameters. The compact set C is defined as C = [−1, 2]^d, although in practice no samples were generated outside of C in any of our experiments, which suggests that the tail decay conditions hold without explicitly enforcing them. In all our experiments, we set the noise level of the denoiser Dε to ε = (5/255)². The initialization X0 can be set to a random vector. In our experiments (where m = d), we chose X0 = y in order to reduce the number of burn-in iterations.
Computations were run either on an Intel Xeon CPU E5-2609 server with an Nvidia Titan Xp graphics card or on Idris' Jean-Zay servers featuring Intel Cascade Lake 6248 CPUs with a single Nvidia Tesla V100 SXM2 GPU. Reported running times correspond to the Xeon + Titan Xp configuration.
Figure 4. Marginal posterior standard deviation of the unobserved pixels for the inpainting problem. Un-
certainty is located around edges and in textured areas.
of Alley or Goldhill are a first indication that the chain explores the solution space with
ease. However, in some other cases such as the Simpson image, we observe meta-stability,
where the chain stays in a region of the space for millions of iterations and then jumps to
a different region, again for millions of iterations. This is one of the drawbacks of operating
with a posterior distribution that is not log-concave and that may exhibit several modes.
Last, Figure 6 displays the sample ACFs of the fastest and slowest converging statistics
associated with the inpainting experiments (as estimated by identifying, for each image, the
unknown pixels with lowest and highest uncertainty). These ACF plots measure how fast
samples become uncorrelated. A fast decay of the ACF is associated with good Markov chain
mixing, which in turn implies accurate Monte Carlo estimates. On the contrary, a slow decay
of the ACF indicates that the Markov chain is moving slowly, which leads to Monte Carlo
estimates with high variance. As mentioned previously, because computing and visualizing a
multivariate ACF is difficult, here we show the ACF of the chain along the slowest and the
fastest directions in the spatial domain (for completeness, we also show the ACF for a pixel
with median uncertainty). We see that independence is reached very fast in the subspaces of
low or median uncertainty and is much slower for the few very uncertain pixels.
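For completeness, the sample ACF of a scalar pixel trace can be computed as in the sketch below; the AR(1) trace is a hypothetical stand-in for a stored pixel trajectory.

```python
import numpy as np

def sample_acf(trace, max_lag):
    """Sample autocorrelation function of a scalar Markov chain trace."""
    centered = np.asarray(trace, dtype=float) - np.mean(trace)
    var = centered @ centered
    return np.array([centered[: centered.size - k] @ centered[k:] / var
                     for k in range(max_lag + 1)])

# Hypothetical pixel trace: an AR(1) chain mimicking a slowly mixing direction.
rng = np.random.default_rng(0)
trace = np.zeros(50_000)
for t in range(1, trace.size):
    trace[t] = 0.99 * trace[t - 1] + rng.standard_normal()
print(sample_acf(trace, max_lag=5).round(3))   # slow decay => poor mixing, high-variance estimates
```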
Deblurring. We now focus on the nonblind image deblurring experiments, where, as ex-
plained previously, we perform our convergence analysis by using statistics associated with the
Fourier domain. Figure 7 depicts the marginal standard deviation of the Fourier coefficients
(in absolute value) for all images. For the three images Cameraman, Simpson, and Traffic,
all the standard deviations have a similar range of values, and the largest values are observed
around frequencies in the kernel of the blur filter (shown on the right of the same figure)
and for high frequencies. Conversely, for the three images Alley, Bridge, and Goldhill,
Fourier transform of the deblurring kernel.
Figure 7. Log-standard deviation maps in the Fourier domain for the Markov chains defined by PnP-ULA
for the deblurring problem. First line: images Cameraman, Simpson, Traffic. Second line: images Alley,
Bridge, and Goldhill. For the first three images, we clearly see that uncertainty is observed on frequencies
that are near the kernel of the blur filter (shown on the right) and is also higher around high frequencies (i.e.,
around edges and textured areas in images). For the last three images, very high uncertainty is observed around
some specific frequencies. In the direction of these frequencies, the Markov chain is moving very slowly and the
mixing time of the chain is particularly slow, as shown on Figure 9.
very high uncertainty is observed in the vicinity of four specific frequencies. This suggests
that the denoiser used is struggling to regularize these specific frequencies, and consequently
the posterior distribution is very spread out along these directions and difficult to explore by
Markov chain sampling as a result. Interestingly, this phenomenon is only observed in the
images that are rich in texture content.
Moreover, Figure 8 depicts the Euclidean distance between the MMSE estimator com-
puted from the entire chain (i.e., all samples) and each sample (we show one point every
2500 samples). We notice that many of the images exhibit some degree of meta-stability or
slow convergence because of the presence of directions in the solution space with very high
uncertainty. Again, this is consistent with our convergence theory, which identifies posterior
multimodality and anisotropy as key challenges that future work should seek to overcome.
Last, we show in Figure 9 the sample ACFs for the slowest and the fastest directions
in the Fourier domain.7 Again, in all experiments, independence is achieved quickly in the
fastest direction. The behaviors of the slowest direction for the three images Alley, Bridge,
and Goldhill suggest that the Markov chain is close to the stability limit and exhibits highly
oscillatory behavior as well as poor mixing.
⁷The slowest direction corresponds to the Fourier coefficient with the highest (real or imaginary) variance.
4.3. Point estimation for nonblind image deblurring and inpainting. We are now ready
to study the quality of the MMSE estimators delivered by PnP-ULA and PPnP-ULA and
report comparisons with MAP estimation by PnP-SGD [49].
Quantitative results. Figure 10 illustrates the evolution of the peak signal-to-noise ratio
(PSNR) of the mean of the Markov chain (the Monte Carlo estimate of the MMSE solution),
as a function of the number of iterations, for the six images of Figure 1. These plots have
been computed by using a step-size δ = δth that is just below the stability limit and a 1-in-2500 thinning. We observe that the PSNR between the MMSE solution as computed by
the Markov chain and the truth stabilizes in approximately 10⁵ iterations in the experiments where the chain exhibits fast convergence, whereas over 10⁶ iterations are required in experiments that suffer from slow convergence (e.g., deblurring of Alley, Bridge, and Goldhill). Moreover,
we observe that using PPnP-ULA with a larger step-size can noticeably reduce the number
of iterations required to obtain a stable estimate of the posterior mean, particularly in the
image deblurring experiments.
Visual results. Figures 11 to 14 show the MMSE estimate computed by PnP-ULA on the
whole chain including the burn-in for the six images, for the inpainting and deblurring experi-
ments. We also provide the MAP estimation results computed by using PnP-SGD [49], which
targets the same posterior distributions. We report the PSNR and the structural similarity
index (SSIM) [74, 75] for all these experiments.
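For reference, PSNR follows directly from its definition and SSIM is available in scikit-image; the short sketch below applies both to placeholder images.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 log10(data_range^2 / MSE)."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(data_range**2 / mse)

rng = np.random.default_rng(0)
x_true = rng.random((256, 256))                                # placeholder ground truth in [0, 1]
x_est = np.clip(x_true + 0.05 * rng.standard_normal(x_true.shape), 0.0, 1.0)   # placeholder estimate
print(psnr(x_true, x_est), structural_similarity(x_true, x_est, data_range=1.0))
```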
Panel labels: PnP-ULA, δ = δth; PPnP-ULA, δ = 6δth.
Figure 10. Left: PSNR evolution of the estimated MMSE for the inpainting problem. After 5e5 iter-
ations, the convergence of the first order moment of the posterior distribution seems to be achieved for all
images. Middle and right: PSNR evolution of the estimated MMSE for the deblurring problem. The con-
vergence for the posterior mean can be fast for simple images such as Cameraman, Simpson, and Traffic
(for these images the PSNR evolution is shown for the first 5e5 iterations). Increasing δ increases the
convergence speed for these images by a factor close to 2. For more complex images, such as Alley or
Goldhill, the convergence is much slower and is still not achieved after 3e6 iterations with PPnP-ULA for
δ = 6δth .
Figure 11. Results comparison for the inpainting task of the images presented in Figure 3 using PnP-ULA
(first row) and PnP-SGD initialized with a TVL2 restoration (second row).
Figure 12. Results comparison for the inpainting task of the images presented in Figure 3 using PnP-ULA
(first row) and PnP-SGD initialized with a TVL2 restoration (second row).
Figure 13. Results comparison for the deblurring task of the images presented in Figure 2 using PnP-ULA
with α = 1 (first row), PnP-SGD with α = 0.3 (second row) and α = 1 (third row). PnP-ULA was initialized
with the observation y (see Figure 2), whereas PnP-SGD was initialized with a TVL2 restoration.
For the inpainting experiments, PnP-SGD struggles to converge when initialized with the
observed image (see [49]). For this reason, we warm start PnP-SGD by using an estimate of
x obtained by minimizing the total variation pseudonorm under the constraint of the known
pixels. For simplicity, PnP-ULA is initialized with the observation y. We observe in Figures 11
and 12 that the results obtained by computing the MMSE Bayesian estimator with PnP-ULA
are visually and quantitatively superior to the ones delivered by MAP estimation with PnP-
SGD. In particular, the sampling approach seems to better recover the continuity of fine
structures and lines in the different images.
For the deblurring experiments, the results of PnP-SGD are provided by using a regular-
ization parameter α = 0.3 (which was shown to yield optimal results on this set of images
Figure 14. Results comparison for the deblurring task of the images presented in Figure 2 using PnP-ULA
with α = 1 (first row), PnP-SGD with α = 0.3 (second row) and α = 1 (third row). PnP-ULA was initialized
with the observation y (see Figure 2), whereas PnP-SGD was initialized with a TVL2 restoration.
in [49]) and for α = 1, which recovers the model used by PnP-ULA. Observe that for the
first three images (shown in Figure 13), the MMSE result is much sharper than the best
MAP result, and the PSNR/SSIM results also show a clear advantage for the MMSE. For
the other three images (results are shown on Figure 14), the quality of the MMSE solutions
delivered is slightly deteriorated by the slow convergence of the Markov chain and the poor
regularization of some specific frequencies, which leads to a common visual artifact (a rotated
rectangular pattern). Using a different denoiser more suitable for handling textures, or com-
bining a learned denoiser with an analytic regularization term, might correct this behavior
and will be the topic of future work.
Figure 15. Marginal posterior standard deviation for the deblurring problem. On simple images such as
Simpson (see Figure 1), most of the uncertainty is located around the edges. For the images Alley, Bridge, and
Goldhill, associated with a highly correlated Markov chain in some directions, some areas are very uncertain.
They correspond to the zones where the rotated rectangular pattern appears in the MMSE estimate.
A partial conclusion from this set of comparisons is that the sampling approach of PnP-ULA, when it samples the space correctly, seems to provide much better results than the MAP estimator for the same posterior. Of course, this increase in quality comes at the cost of a much higher computation time.
4.4. Deblurring and inpainting: Uncertainty visualization study. One of the benefits of
sampling from the posterior distribution with PnP-ULA is that we can probe the uncertainty
in the delivered solutions. In the following, we present an uncertainty visualization analysis
that is useful for displaying the uncertainty related to image structures of different sizes and
located in different regions of the image (see [17] for more details). The analysis proceeds
as follows. First, Figures 4 and 15 show the marginal posterior standard deviation associ-
ated with each image pixel, as computed by PnP-ULA over all samples, for the inpainting
and deblurring problems. As could be expected, we observe for both problems that highly
uncertain pixels are concentrated around the edges of the reconstructed images, but also on
textured areas. The dynamic range of the pixel standard deviations is larger for the inpainting problem than for deblurring, which suggests that inpainting has a higher level of intrinsic uncertainty.
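These marginal standard deviation maps can be accumulated online rather than computed from stored samples. A minimal sketch (Python; illustrative only) based on Welford's algorithm is as follows.

    import numpy as np

    class RunningStd:
        # Online pixel-wise mean and standard deviation of the chain samples.
        def __init__(self, shape):
            self.n = 0
            self.mean = np.zeros(shape)
            self.m2 = np.zeros(shape)     # sum of squared deviations from the mean

        def update(self, sample):
            self.n += 1
            delta = sample - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (sample - self.mean)

        def std(self):
            return np.sqrt(self.m2 / max(self.n - 1, 1))

Recording intermediate snapshots of std() and comparing them with the final map yields RMSE convergence curves of the kind reported in Figure 16.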
Figure 16 shows the evolution of the RMSE between the standard deviation computed along the samples and its asymptotic value, for the inpainting and deblurring problems, respectively. Estimating these standard deviation maps requires running the chain for longer than estimating the MMSE, as could be expected for a second-order statistical moment.
Figure 16. Evolution of the RMSE between the final standard deviation and the current estimate of the standard deviation for the inpainting and deblurring problems.
Following on from this, to explore the uncertainty for structures that are larger than one pixel, Figures 17 and 18 report the marginal standard deviation associated with higher scales. More precisely, for different values of the scale i, we downsample the stored samples by a factor 2^i before computing the standard deviation. This downsampling step permits quantifying the uncertainty of larger or lower-frequency structures, such as the bottom of the glass in Simpson for the deblurring experiment. At each scale, we see that the uncertainty of the estimate is much more localized for the inpainting problem (resulting in higher uncertainty values in some specific regions) and more spread out for deblurring, most likely because of the different nature of the degradations involved.
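A minimal sketch of this multiscale computation is given below (Python; block averaging over 2^i × 2^i patches is used here as one simple choice of downsampling, which may differ from the filter used for Figures 17 and 18).

    import numpy as np

    def downsample(x, i):
        # Average the image over non-overlapping 2^i x 2^i blocks.
        f = 2 ** i
        h, w = x.shape[0] - x.shape[0] % f, x.shape[1] - x.shape[1] % f
        return x[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

    def multiscale_std(samples, i):
        # samples: array of stored chain samples with shape (n_samples, H, W).
        coarse = np.stack([downsample(s, i) for s in samples])
        return coarse.std(axis=0)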
5. Conclusion. This paper presented theory, methods, and computation algorithms for
performing Bayesian inference with Plug & Play priors. This mathematical and computa-
tional framework is rooted in the Bayesian M-complete paradigm and adopts the view that
Plug & Play models approximate a regularized oracle model. We established clear conditions
ensuring that the involved models and quantities of interest are well defined and well posed.
Following on from this, we studied three Bayesian computation algorithms related to biased
approximations of a Langevin diffusion process, for which we provided detailed convergence
guarantees under easily verifiable and realistic conditions. For example, our theory does not
require the denoising algorithms representing the prior to be gradient or proximal operators.
We also studied the estimation error involved in using these algorithms and models instead of
the oracle model, which is decision-theoretically optimal but intractable. To the best of our
knowledge, this is the first Bayesian Plug & Play framework with this level of insight and guar-
antees on the delivered solutions. We illustrated the proposed framework with two Bayesian
image restoration experiments—deblurring and inpainting—where we computed point esti-
mates as well as uncertainty visualization and quantification analyses and highlighted how
the limitations of the chosen denoiser manifest in the resulting Bayesian model and estimates.
In future work, we would like to continue our theoretical and empirical investigation of
Bayesian Plug & Play models, methods, and algorithms. From a modeling viewpoint, it
would be interesting to consider priors that combine a denoiser with an analytic regulariza-
tion term, and other neural network–based priors such as the generative ones used in [14]
or the autoencoder-based priors in [36], as well as to generalize the Gaussian smoothing to
other smoothings and investigate their properties in the context of Bayesian inverse problems.
We are also very interested in strategies for training denoisers that automatically verify the
conditions required for exponentially fast convergence of the Langevin SDE, for example, by
using the framework recently proposed in [60] to learn maximally monotone operators, or
the data-driven regularizers described in [47, 55]. In addition, we would like to understand
when the projected RED estimator [23]—or its relaxed variant—is the MAP estimator for
well-defined Bayesian models, as well as to study the interplay between the geometric as-
pects of the loss defining this estimator [57] and the geometry of the set of fixed points of
the denoiser defining the model. With regard to Bayesian analysis, it would be important
to investigate the frequentist accuracy of Plug & Play models, as well as the adoption of
robust Bayesian techniques in order to perform inference directly w.r.t. the oracle model
[76]. From a Bayesian computation viewpoint, a priority is to develop accelerated algorithms
similar to [59]. Last, with regard to experimental work, we intend to study the applica-
tion of this framework to uncertainty quantification problems, e.g., in the context of medical
imaging.
REFERENCES
[15] V. De Bortoli and A. Durmus, Convergence of Diffusions and Their Discretizations: From Continuous
to Discrete Processes and Back, https://arxiv.org/abs/1904.09808, 2020.
[16] G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, Plug-and-Play unplugged:
Optimization-free reconstruction using consensus equilibrium, SIAM J. Imaging Sci., 11 (2018),
pp. 2001–2020, https://doi.org/10.1137/17M1122451.
[17] X. Cai, M. Pereyra, and J. D. McEwen, Uncertainty quantification for radio interferometric imaging
I. Proximal MCMC methods, Monthly Not. Roy. Astronom. Soc., 480 (2018), pp. 4154–4169, https:
//doi.org/10.1093/mnras/sty2004.
[18] A. Chambolle, An algorithm for Total Variation Minimization and Applications, J. Math. Imaging
Vision, 20 (2004), pp. 89–97, https://doi.org/10.1023/B:JMIV.0000011325.36760.1e.
[19] S. H. Chan, X. Wang, and O. A. Elgendy, Plug-and-Play ADMM for image restoration: Fixed-point
convergence and applications, IEEE Trans. Comput. Imaging, 3 (2017), pp. 84–98, https://doi.org/
10.1109/TCI.2016.2629286.
[20] S. Chen, C. Luo, B. Deng, Y. Qin, H. Wang, and Z. Zhuang, BM3D vector approximate message
passing for radar coded-aperture imaging, in Proceedings of Progress in Electromagnetics Research
Symposium-Fall, IEEE, 2017, pp. 2035–2038, https://doi.org/10.1109/PIERS-FALL.2017.8293472.
[21] T. Chen, E. Fox, and C. Guestrin, Stochastic gradient Hamiltonian Monte Carlo, in Proceedings of
the 31st International Conference on Machine Learning, Proceedings of Machine Learning Research
32, PMLR, 2014, pp. 1683–1691, https://proceedings.mlr.press/v32/cheni14.html.
[22] Y. Chen and T. Pock, Trainable nonlinear reaction diffusion: A flexible framework for fast and effective
image restoration, IEEE Trans. Pattern Analysis Machine Intelligence, 39 (2017), pp. 1256–1272,
https://doi.org/10.1109/TPAMI.2016.2596743.
[23] R. Cohen, M. Elad, and P. Milanfar, Regularization by denoising via fixed-point projection (RED-
PRO), SIAM J. Imaging Sci., 14 (2021), pp. 1374–1406, https://doi.org/10.1137/20M1337168.
[24] A. S. Dalalyan, Theoretical guarantees for approximate sampling from smooth and log-concave densities,
J. R. Stat. Soc. Ser. B. Stat. Methodol., 79 (2017), pp. 651–676, https://doi.org/10.1111/rssb.12183.
[25] S. Diamond, V. Sitzmann, F. Heide, and G. Wetzstein, Unrolled Optimization with Deep Priors,
https://arxiv.org/abs/1705.08041, 2017.
[26] C. Dong, C. C. Loy, K. He, and X. Tang, Learning a deep convolutional network for image super-
resolution, in Computer Vision – ECCV 2014, Springer, New York, 2014, pp. 184–199.
[27] D. L. Donoho, A. Maleki, and A. Montanari, Message-passing algorithms for compressed sensing,
Proc. Natl. Acad. Sci. USA, 106 (2009), pp. 18914–18919, https://doi.org/10.1073/pnas.0909892106.
[28] A. Durmus and E. Moulines, Nonasymptotic convergence analysis for the unadjusted Langevin algo-
rithm, Ann. Appl. Probab., 27 (2017), pp. 1551–1587, https://doi.org/10.1214/16-AAP1238.
[29] A. Durmus, E. Moulines, and M. Pereyra, Efficient Bayesian computation by proximal Markov
chain Monte Carlo: When Langevin meets Moreau, SIAM J. Imaging Sci., 11 (2018), pp. 473–506,
https://doi.org/10.1137/16M1108340.
[30] B. Efron, Tweedie’s formula and selection bias, J. Amer. Statist. Assoc., 106 (2011), pp. 1602–1614,
https://doi.org/10.1198/jasa.2011.tm11181.
[31] A. K. Fletcher, P. Pandit, S. Rangan, S. Sarkar, and P. Schniter, Plug-in estimation in high-
dimensional linear inverse problems: A rigorous analysis, J. Stat. Mech. Theory Exp., 2019 (2019),
124021, https://doi.org/10.1088/1742-5468/ab321a.
[32] H. Gao, X. Tao, X. Shen, and J. Jia, Dynamic scene deblurring with parameter selective sharing
and nested skip connections, in Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2019, pp. 3843–3851, https://doi.org/10.1109/CVPR.2019.00397.
[33] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, Deep joint demosaicking and denoising, ACM
Trans. Graphics, 35 (2016), 191, https://doi.org/10.1145/2980179.2982399.
[34] D. Gilton, G. Ongie, and R. Willett, Neumann Networks for Inverse Problems in Imaging, https:
//arxiv.org/abs/1901.03707, 2019.
[35] M. Girolami and B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods,
J. R. Stat. Soc. Ser. B Stat. Methodol., 73 (2011), pp. 123–214, https://doi.org/10.1111/j.1467-9868.
2010.00765.x.
[36] M. González, A. Almansa, and P. Tan, Solving Inverse Problems by Joint Posterior Maximization
with Autoencoding Prior, https://arxiv.org/abs/2103.01648, 2021.
[37] K. Gregor and Y. LeCun, Learning fast approximations of sparse coding, in Proceedings of the 27th In-
ternational Conference on International Conference on Machine Learning, Omnipress, 2010, pp. 399–
406, https://icml.cc/Conferences/2010/papers/449.pdf.
[38] B. Guo, Y. Han, and J. Wen, AGEM: Solving linear inverse problems via deep priors and sampling,
in Advances in Neural Information Processing Systems, Curran Associates, 2019, pp. 547–558, https:
//proceedings.neurips.cc/paper/2019/file/49182f81e6a13cf5eaa496d51fea6406-Paper.pdf.
[39] J. Ho, A. Jain, and P. Abbeel, Denoising diffusion probabilistic models, Advances in Neural Informa-
tion Processing Systems, 33 (2020), pp. 6840–6851, https://proceedings.neurips.cc/paper/2020/file/
4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf.
[40] A. Houdard, C. Bouveyron, and J. Delon, High-dimensional mixture models for unsupervised im-
age denoising (HDMI), SIAM J. Imaging Sci., 11 (2018), pp. 2815–2846, https://doi.org/10.1137/
17M1135694.
[41] S. Hurault, A. Leclaire, and N. Papadakis, Gradient Step Denoiser for Convergent Plug-and-Play,
https://arxiv.org/abs/2110.03220, 2021.
[42] A. Javanmard and A. Montanari, State evolution for general approximate message passing algorithms,
with applications to spatial coupling, Inf. Inference, 2 (2013), pp. 115–144, https://doi.org/10.1093/
imaiai/iat004.
[43] Z. Kadkhodaie and E. P. Simoncelli, Stochastic solutions for linear inverse problems using the prior
implicit in a denoiser, Advances in Neural Information Processing Systems, 2021, https://proceedings.
neurips.cc/paper/2021/file/6e28943943dbed3c7f82fc05f269947a-Paper.pdf.
[44] U. S. Kamilov, H. Mansour, and B. Wohlberg, A plug-and-play priors approach for solving nonlinear
imaging inverse problems, IEEE Signal Processing Letters, 24 (2017), pp. 1872–1876, https://doi.org/
10.1109/LSP.2017.2763583.
[45] B. Kawar, G. Vaksman, and M. Elad, SNIPS: Solving Noisy Inverse Problems Stochastically,
https://arxiv.org/abs/2105.14951, 2021.
[46] B. Kawar, G. Vaksman, and M. Elad, Stochastic image denoising by sampling from the posterior
distribution, in Proceedings of the IEEE/CVF International Conference on Computer Vision
Workshops, 2021, pp. 1866–1875, https://openaccess.thecvf.com/content/ICCV2021W/AIM/html/
Kawar Stochastic Image Denoising by Sampling From the Posterior Distribution ICCVW 2021
paper.html.
[47] E. Kobler, A. Effland, K. Kunisch, and T. Pock, Total deep variation for linear inverse problems,
in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, https://doi.org/
10.1109/CVPR42600.2020.00757.
[48] J. Latz, On the well-posedness of Bayesian inverse problems, SIAM/ASA J. Uncertain. Quantif., 8 (2020),
pp. 451–482, https://doi.org/10.1137/19M1247176.
[49] R. Laumont, V. de Bortoli, A. Almansa, J. Delon, A. Durmus, and M. Pereyra, On
Maximum-a-Posteriori Estimation with Plug & Play Priors and Stochastic Gradient Descent, https:
//hal.archives-ouvertes.fr/hal-03348735/document, 2021.
[50] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila,
Noise2Noise: Learning image restoration without clean data, in Proceedings of the 35th International
Conference on Machine Learning, Proceedings of Machine Learning Research 80, PMLR, 2018,
pp. 2965–2974, https://proceedings.mlr.press/v80/lehtinen18a.html.
[51] C. Louchet and L. Moisan, Posterior expectation of the total variation model: Properties and experi-
ments, SIAM J. Imaging Sci., 6 (2013), pp. 2640–2684, https://doi.org/10.1137/120902276.
[52] T. Meinhardt, M. Moller, C. Hazirbas, and D. Cremers, Learning proximal operators: Using
denoising networks for regularizing inverse imaging problems, in Proceedings of the International
Conference on Computer Vision, 2017, pp. 1781–1790, https://doi.org/10.1109/ICCV.2017.198.
[53] C. A. Metzler, A. Maleki, and R. G. Baraniuk, From denoising to compressed sensing, IEEE Trans.
Inform. Theory, 62 (2016), pp. 5117–5144, https://doi.org/10.1109/TIT.2016.2556683.
[54] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, Spectral Normalization for Generative Ad-
versarial Networks, https://openreview.net/forum?id=B1QRgziT-, 2018.
[55] S. Mukherjee, S. Dittmer, Z. Shumaylov, S. Lunz, O. Öktem, and C.-B. Schönlieb, Learned
Convex Regularizers for Inverse Problems, https://arxiv.org/abs/2008.02839, 2021.
[56] M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Statist. Comput., 26 (2016), pp. 745–760,
https://doi.org/10.1007/s11222-015-9567-4.
[57] M. Pereyra, Revisiting maximum-a-posteriori estimation in log-concave models, SIAM J. Imaging Sci.,
12 (2019), pp. 650–670, https://doi.org/10.1137/18M1174076.
[58] M. Pereyra, P. Schniter, E. Chouzenoux, J.-C. Pesquet, J.-Y. Tourneret, A. O. Hero, and
S. McLaughlin, A survey of stochastic simulation and optimization methods in signal processing,
IEEE J. Selected Topics Signal Processing, 10 (2015), pp. 224–241, https://doi.org/10.1109/JSTSP.
2015.2496908.
[59] M. Pereyra, L. Vargas Mieles, and K. C. Zygalakis, Accelerating proximal Markov chain Monte
Carlo by using an explicit stabilized method, SIAM J. Imaging Sci., 13 (2020), pp. 905–935, https:
//doi.org/10.1137/19M1283719.
[60] J.-C. Pesquet, A. Repetti, M. Terris, and Y. Wiaux, Learning maximally monotone operators for
image recovery, SIAM J. Imaging Sci., 14 (2021), pp. 1206–1237, https://doi.org/10.1137/20M1387961.
[61] A. Repetti, M. Pereyra, and Y. Wiaux, Scalable Bayesian uncertainty quantification in imaging
inverse problems via convex optimization, SIAM J. Imaging Sci., 12 (2019), pp. 87–118, https://doi.
org/10.1137/18M1173629.
[62] C. Robert, The Bayesian Choice: From Decision-Theoretic Foundations to Computational Imple-
mentation, Springer Texts Statist., Springer, New York, 2007, https://books.google.fr/books?id=
6oQ4s8Pq9pYC.
[63] G. O. Roberts and R. L. Tweedie, Exponential convergence of Langevin distributions and their discrete
approximations, Bernoulli, 2 (1996), pp. 341–363, https://doi.org/10.2307/3318418.
[64] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Phys.
D, 60 (1992), pp. 259–268, https://doi.org/10.1016/0167-2789(92)90242-F.
[65] E. K. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, and W. Yin, Plug-and-Play methods provably converge
with properly trained denoisers, in Proceedings of the 36th International Conference on Machine
Learning, Long Beach, CA, 2019, pp. 5546–5557, http://proceedings.mlr.press/v97/ryu19a.html.
[66] E. Schwartz, R. Giryes, and A. M. Bronstein, DeepISP: Toward learning an end-to-end image
processing pipeline, IEEE Trans. Image Process., 28 (2018), pp. 912–923, https://doi.org/10.1109/
TIP.2018.2872858.
[67] L. Schwartz, Désintégration d’une mesure, Séminaire Équations aux dérivées partielles (Polytechnique),
http://eudml.org/doc/111551.
[68] Y. Song and S. Ermon, Generative modeling by estimating gradients of the data distribution, in Advances
in Neural Information Processing Systems, Vol. 32, 2019, https://proceedings.neurips.cc/paper/2019/
file/3001ef257407d5a371a96dcd947c7d93-Paper.pdf.
[69] A. M. Stuart, Inverse problems: A Bayesian perspective, Acta Numer., 19 (2010), pp. 451–559, https:
//doi.org/10.1017/S0962492910000061.
[70] Y. Sun, Z. Wu, B. Wohlberg, and U. S. Kamilov, Scalable Plug-and-Play ADMM with convergence
guarantees, IEEE Trans. Comput. Imaging, 7 (2021), pp. 849–863, https://doi.org/10.1109/TCI.2021.
3094062.
[71] A. M. Teodoro, J. M. Bioucas-Dias, and M. A. T. Figueiredo, Scene-Adapted Plug-and-Play
Algorithm with Guaranteed Convergence: Applications to Data Fusion in Imaging, https://arxiv.org/
abs/1801.00605, 2018.
[72] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, Plug-and-Play priors for model based
reconstruction, in Proceedings of the Global Conference on Signal and Information Processing, IEEE,
2013, pp. 945–948, https://doi.org/10.1109/GlobalSIP.2013.6737048.
[73] C. Villani, Optimal Transport: Old and New, Grundlehren Math. Wiss. 338, Springer-Verlag, Berlin,
2009, https://doi.org/10.1007/978-3-540-71050-9.
[74] Z. Wang and A. C. Bovik, Mean squared error: Love it or leave it? A new look at signal fidelity
measures, IEEE Signal Processing Magazine, 26 (2009), pp. 98–117, https://doi.org/10.1109/MSP.
2008.930649.
[75] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, Image quality assessment: From error
visibility to structural similarity, IEEE Trans. Image Processing, 13 (2004), pp. 600–612.
[76] J. Watson and C. Holmes, Approximate models and robust decisions, Statist. Sci., 31 (2016), pp. 465–
489, https://doi.org/10.1214/16-sts592.
[77] X. Xu, Y. Sun, J. Liu, B. Wohlberg, and U. S. Kamilov, Provable convergence of Plug-and-Play
priors with MMSE denoisers, IEEE Signal Processing Letters, 27 (2020), pp. 1–10, https://doi.org/
10.1109/LSP.2020.3006390.
[78] G. Yu, G. Sapiro, and S. Mallat, Solving inverse problems with piecewise linear estimators: From
Gaussian mixture models to structured sparsity, IEEE Trans. Image Processing, 21 (2011), pp. 2481–
2499, https://doi.org/10.1109/TIP.2011.2176743.
[79] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, Beyond a Gaussian denoiser: Residual
learning of deep CNN for image denoising, IEEE Trans. Image Processing, 26 (2017), pp. 3142–3155,
https://doi.org/10.1109/TIP.2017.2662206.
[80] K. Zhang, W. Zuo, S. Gu, and L. Zhang, Learning Deep CNN Denoiser Prior for Image Restoration,
in Proceedings of the Conference on Computer Vision and Pattern Recognition, IEEE, 2017, pp. 2808–
2817, https://doi.org/10.1109/CVPR.2017.300.
[81] K. Zhang, W. Zuo, and L. Zhang, FFDNet: Toward a fast and flexible solution for CNN-based image
denoising, IEEE Trans. Image Processing, 27 (2018), pp. 4608–4622, https://doi.org/10.1109/TIP.
2018.2839891.
[82] D. Zoran and Y. Weiss, From learning models of natural image patches to whole image restoration, in
Proceedings of the International Conference on Computer Vision, IEEE, 2011, pp. 479–486, https:
//doi.org/10.1109/ICCV.2011.6126278.