Parameter-Free Plug-and-Play ADMM for Image Restoration
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907.
ABSTRACT
Plug-and-Play ADMM is a recently developed variation of the classical ADMM algorithm that replaces one of the subproblems with an off-the-shelf image denoiser. Despite its apparently ad hoc nature, Plug-and-Play ADMM produces surprisingly good image recovery results. However, since the denoiser in Plug-and-Play ADMM is treated as a black box, the behavior of the overall algorithm is largely unknown. In particular, the internal parameter that controls the rate of convergence of the algorithm has to be adjusted by the user, and a bad choice of the parameter can lead to severe degradation of the result. In this paper, we present a parameter-free Plug-and-Play ADMM in which the internal parameters are updated as part of the optimization. Our algorithm is derived from generalized approximate message passing, with several essential modifications. Experimentally, we find that the new algorithm produces solutions along a reliable and fast-converging path.
Index Terms— Plug-and-Play, ADMM, image restoration,
parameter-free, generalized approximate message passing
1. INTRODUCTION
tuning. It does not, however, mean that the regularization parameter λ is automatically tuned. If λ is to be tuned automatically, a few existing methods can be considered, e.g., SURE [14] and cross-validation [15].

Our key idea behind the parameter-free Plug-and-Play ADMM is the Generalized Approximate Message Passing (GAMP) in the compressive sensing literature [16-20]. GAMP is a generic algorithm that solves problems in the form of (1), typically for f(x) = ‖Ax − y‖² with a random matrix A and a regularization function g(x) = ‖x‖₁. In GAMP, the internal parameters are self-updated, which is a feature this paper attempts to obtain. Another piece of related work is the Denoiser-AMP by Metzler et al. [21], where a denoiser was used to replace the shrinkage step in the classical AMP. Convergence of Denoiser-AMP is known for i.i.d. Gaussian matrices A but not for general matrices.

The goal of this paper is to derive a parameter-free Plug-and-Play ADMM from GAMP. In Section 2 we provide a brief introduction to the GAMP algorithm. Then in Section 3, we show how the GAMP algorithm can be modified into a parameter-free Plug-and-Play ADMM. Experimental results are shown in Section 4.

2. GENERALIZED APPROXIMATE MESSAGE PASSING

In this section we provide a brief introduction to the generalized approximate message passing (GAMP) algorithm. For full discussions of GAMP, we refer the readers to [18]. Among all image restoration ...

The output-node computation concludes with

    τ_v^{(k+1)} = (τ_x^{(k)} − π_x^{(k+1)}) ./ (τ_x^{(k)})²,                       (10)
    u^{(k+1)} = (x^{(k+1)} − x̃^{(k+1)}) ./ τ_x^{(k)}.                              (11)

In this set of equations, the function prox_{τf} is the proximal operator, defined as

    prox_{τf}(x̃) = argmin_x (1/2)‖Ax − y‖²_τ + (1/2)‖x − x̃‖²,                     (12)

where the norm ‖·‖_τ is a weighted norm given by ‖x‖²_τ = Σ_{i=1}^{n} τ_i x_i². The variable τ_x can be regarded as a vector version of the internal parameter ρ in ADMM, and π_x can be regarded as a measure of the variance of x conditioned on u.

The computation on the input node involves

    ṽ^{(k+1)} = v^{(k)} + τ_v^{(k+1)} · u^{(k+1)},                                 (13)
    v^{(k+1)} = prox_{τ_v^{(k+1)} λg}(ṽ^{(k+1)}),                                  (14)
    π_v^{(k+1)} = τ_v^{(k+1)} · (∂/∂ṽ) prox_{τ_v^{(k+1)} λg}(ṽ) |_{ṽ = ṽ^{(k+1)}}, (15)
    τ_x^{(k+1)} = π_v^{(k+1)}.                                                     (16)

For separable g(v) = Σ_{i=1}^{n} g_i(v_i), the proximal operator prox_{τλg}(ṽ) reads as

    prox_{τ_i λg}(ṽ_i) = argmin_v τ_i λ g_i(v) + (1/2)(v − ṽ_i)².                  (17)
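As an illustration of (17): when g is the ℓ₁ norm, i.e., g_i(v_i) = |v_i|, the minimizer is the familiar soft-thresholding rule. A minimal NumPy sketch (the function name is ours, and the ℓ₁ choice is only an example, not the denoiser used later in the paper):

```python
import numpy as np

def prox_l1(v_tilde, tau_lambda):
    """Proximal operator of g(v) = |v| with parameter tau*lambda, cf. Eq. (17).

    Solves argmin_v tau*lambda*|v| + 0.5*(v - v_tilde)^2 elementwise,
    which is soft-thresholding at level tau*lambda.
    """
    return np.sign(v_tilde) * np.maximum(np.abs(v_tilde) - tau_lambda, 0.0)
```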
2.2. Equivalence between GAMP and ADMM

In the above input and output computations, if we ignore Equations (9)-(10) and (15)-(16), we arrive at an ADMM algorithm with vector-valued parameters τ_x and τ_v instead of a common scalar parameter ρ. More specifically, GAMP and ADMM are related according to the following theorem [17]:

Theorem 1. The iterates of GAMP satisfy

    v^{(k+1)} = argmin_v L(x^{(k)}, v, u^{(k)}) + (1/2)‖v − v^{(k)}‖²_{τ_v^{(k)}},
    x^{(k+1)} = argmin_x L(x, v^{(k+1)}, u^{(k)}) + (1/2)‖x − v^{(k+1)}‖²_{τ_x^{(k)}},
    u^{(k+1)} = u^{(k)} + (1/τ_x^{(k)}) (x^{(k+1)} − v^{(k+1)}),

where L(x, v, u) = f(x) + λg(v) + u^T(x − v) is the Lagrangian function.

Therefore, the key difference between GAMP and ADMM is the parameters τ_v^{(k)} and τ_x^{(k)}. This suggests that if we want to derive a Plug-and-Play ADMM from GAMP, we must first define the parameters τ_v^{(k)} and τ_x^{(k)}.

3. PARAMETER-FREE PLUG-AND-PLAY

We now derive the parameter-free Plug-and-Play using the GAMP formulation above. There are two major modifications we need for the derivation:

• The vector-valued parameters τ_x and τ_v should become scalars τ_x and τ_v. This allows us to consider arbitrary denoisers, which are not necessarily separable.

• The proximal operator in (14) is replaced by an off-the-shelf denoiser as defined in (5) so that it fits the Plug-and-Play framework.

With these two modifications we can consider the output and the input nodes. We also note that among the equations (7)-(16), the biggest challenges are the proximal operators. The following two subsections address these operators.

3.1. Output Node

For a convolution matrix A, the proximal operator has the closed form

    prox_{τ_x f}(x̃) = argmin_x (τ_x/2)‖Ax − y‖² + (1/2)‖x − x̃‖²
                    = (τ_x A^T A + I)^{-1} (τ_x A^T y + x̃).                       (18)

Taking the derivative with respect to x̃ yields

    (∂/∂x̃) prox_{τ_x f}(x̃) = (τ_x A^T A + I)^{-1} 1.                              (19)

Note that this is a vector of gradients. Since we are looking for a scalar, one option is to consider the divergence. This yields

    div{prox_{τ_x f}(x̃)} = (1/n) 1^T (τ_x A^T A + I)^{-1} 1
                 (a)      = (1/n) 1^T F^T (τ_x |Λ|² + I)^{-1} F 1
                 (b)      = (τ_x |λ_{1,1}|² + 1)^{-1} = (τ_x + 1)^{-1},            (20)

where in (a) we used the fact that a convolution matrix A is diagonalizable by the Fourier transform matrix F to yield A = F^T Λ F, and in (b) we observe that F1 = √n [1, 0, ..., 0]^T. The scalar λ_{1,1} is the first entry of the eigenvalue matrix Λ, which is 1 for convolution matrices. Substituting this result into (9), we have

    π_x^{(k+1)} = τ_x^{(k)} / (τ_x^{(k)} + 1).                                     (21)
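In practice, the inverse in (18) is applied in the Fourier domain, so the x-update costs only O(n log n). A sketch under periodic boundary assumptions (NumPy; the helper names are ours, and kernel_fft denotes the eigenvalues Λ of the blur kernel):

```python
import numpy as np

def prox_f_fft(x_tilde, y, kernel_fft, tau_x):
    """Closed-form prox (18) for a convolutional A diagonalized by the FFT.

    Computes (tau_x*A^T*A + I)^{-1} (tau_x*A^T*y + x_tilde) as a
    pointwise division in the Fourier domain.
    """
    num = tau_x * np.conj(kernel_fft) * np.fft.fft2(y) + np.fft.fft2(x_tilde)
    den = tau_x * np.abs(kernel_fft) ** 2 + 1.0
    return np.real(np.fft.ifft2(num / den))

def div_prox_f(tau_x):
    """Normalized divergence (20): (tau_x + 1)^{-1}, using |lambda_{1,1}| = 1."""
    return 1.0 / (tau_x + 1.0)
```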
3.2. Input Node

On the input node, we have to replace the proximal operator by a denoiser D_σ. Here, the noise level of the denoiser, σ, should be defined as σ = √(τ_v λ). This explains (14).

For the derivative in (15), we note that since τ_v is now a scalar, we have to replace the derivative by its divergence (the sum of the partial derivatives). This gives

    π_v^{(k+1)} = τ_v^{(k+1)} div D_σ(ṽ).                                          (22)

Calculating the divergence can be performed numerically using a Monte Carlo scheme. More specifically, the divergence of the denoiser at ṽ can be approximated by

    div D_σ(ṽ) = lim_{ε→0} E_b { b^T (D_σ(ṽ + εb) − D_σ(ṽ)) / (nε) }
               ≈ (1/(nε)) b^T (D_σ(ṽ + εb) − D_σ(ṽ)),                              (23)

where b ~ N(0, I) is a random vector and ε ≪ 1 is a small constant (typically ε = 10⁻³). The approximation of the expectation generally holds for large n due to concentration of measure [14]. Numerically, computing (23) only requires evaluating the denoiser twice: once for D_σ(ṽ), and a second time for D_σ(ṽ + εb).
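A sketch of the estimator (23) (NumPy; `denoiser` stands for any black-box D_σ that takes an image and a noise level — this interface is an assumption, not a prescribed API):

```python
import numpy as np

def mc_divergence(denoiser, v_tilde, sigma, eps=1e-3, rng=None):
    """Monte Carlo estimate of the normalized divergence of a denoiser, Eq. (23).

    Uses one random probe b ~ N(0, I) and exactly two denoiser calls.
    """
    rng = np.random.default_rng() if rng is None else rng
    b = rng.standard_normal(v_tilde.shape)
    diff = denoiser(v_tilde + eps * b, sigma) - denoiser(v_tilde, sigma)
    return float(b.ravel() @ diff.ravel()) / (v_tilde.size * eps)
```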
3.3. Final Algorithm

The final algorithm can be derived by substituting (18) into (8) and (21) into (9) for the output node, and (5) into (14) and (22) into (15) for the input node. Moreover, we can simplify the steps by defining

    ρ_x = 1/τ_x,    ρ_v = 1/τ_v.                                                   (24)

Then we can show that the GAMP algorithm simplifies to Algorithm 1. We call the resulting algorithm Plug-and-Play generalized approximate message passing (PAMP).

As shown in Algorithm 1, the difference between PAMP and Plug-and-Play ADMM is the parameters ρ_x and ρ_v. In Plug-and-Play ADMM, the parameters share the same value ρ, which is fixed throughout the iterations. In PAMP, ρ_x and ρ_v are automatically updated as part of the algorithm. Therefore, PAMP is a parameter-free algorithm.

4. EXPERIMENTAL RESULTS

In this section we present experimental results to evaluate the performance of the proposed PAMP algorithm. We focus on the image deblurring problem, although the method can easily be extended to image interpolation and image super-resolution.

We test the algorithm using 10 standard gray-scale images. Each image is blurred by a spatially invariant Gaussian blur kernel of size 9 × 9 and standard deviation 1. Additive i.i.d. Gaussian noise of zero mean and standard deviation σ = 5/255 is added to the blurred image. Two denoisers D_σ are considered in this experiment: total variation denoising [22] and BM3D [13].
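For concreteness, a sketch of this degradation model, with the kernel built so that its DC eigenvalue λ_{1,1} is 1 as assumed in (20) (NumPy; the periodic boundaries and the helper name are our assumptions):

```python
import numpy as np

def gaussian_kernel_fft(shape, size=9, std=1.0):
    """Eigenvalues (2-D FFT) of a size x size Gaussian blur, sum-normalized
    so that the DC eigenvalue lambda_{1,1} equals 1, as assumed in (20)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * std ** 2))
    k /= k.sum()
    psf = np.zeros(shape)
    psf[:size, :size] = k
    # center the kernel at the origin for a circulant (periodic) model
    psf = np.roll(psf, shift=(-(size // 2), -(size // 2)), axis=(0, 1))
    return np.fft.fft2(psf)

# Degradation: blur the ground truth x, then add noise with sigma = 5/255.
# H = gaussian_kernel_fft(x.shape)
# y = np.real(np.fft.ifft2(H * np.fft.fft2(x)))
# y += (5.0 / 255.0) * np.random.default_rng(0).standard_normal(x.shape)
```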
Fig. 2: PSNR of PAMP compared to Plug-and-Play ADMM using a fixed ρ. (a) using total variation denoising as the denoiser; (b) using
BM3D as the denoiser. In this figure, all PSNR values are averaged over 10 testing images.
Algorithm 1 Proposed Algorithm: PAMP
 1: Initialize u^{(0)} = x^{(0)} = 0, ρ_v^{(0)} = 1.
 2: for k = 0, 1, ..., k_max do
 3:    % (v-subproblem)
 4:    ṽ = x^{(k)} + (1/ρ_v^{(k)}) u^{(k)}
 5:    v^{(k+1)} = D_σ(ṽ), where σ = √(λ/ρ_v^{(k)})
 6:    ρ_x^{(k+1)} = ρ_v^{(k)} / div D_σ(ṽ)
 7:
 8:    % (x-subproblem)
 9:    x̃ = v^{(k+1)} − (1/ρ_x^{(k+1)}) u^{(k)}
10:    x^{(k+1)} = (A^T A + ρ_x^{(k+1)} I)^{-1} (A^T y + ρ_x^{(k+1)} x̃)
11:    ρ_v^{(k+1)} = ρ_x^{(k+1)} / (ρ_x^{(k+1)} + 1)
12:
13:    % (Multiplier update)
14:    u^{(k+1)} = u^{(k)} + ρ_x^{(k+1)} (x^{(k+1)} − v^{(k+1)})
15: end for
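For reference, a compact sketch of Algorithm 1 for deblurring. It reuses the illustrative helpers prox_f_fft and mc_divergence sketched in Sections 3.1 and 3.2; `denoiser` is any off-the-shelf D_σ:

```python
import numpy as np

def pamp_deblur(y, kernel_fft, denoiser, lam, max_iter=50):
    """Sketch of Algorithm 1 (PAMP). rho_x and rho_v are updated internally;
    the only remaining user parameter is the regularization weight lam."""
    x = np.zeros_like(y)
    u = np.zeros_like(y)
    rho_v = 1.0
    for _ in range(max_iter):
        # v-subproblem: denoise at noise level sigma = sqrt(lam / rho_v)
        v_tilde = x + u / rho_v
        sigma = np.sqrt(lam / rho_v)
        v = denoiser(v_tilde, sigma)
        rho_x = rho_v / mc_divergence(denoiser, v_tilde, sigma)   # line 6
        # x-subproblem: FFT inversion, Eq. (18) with tau_x = 1/rho_x
        x_tilde = v - u / rho_x
        x = prox_f_fft(x_tilde, y, kernel_fft, 1.0 / rho_x)       # line 10
        rho_v = rho_x / (rho_x + 1.0)                             # line 11
        u = u + rho_x * (x - v)                                   # line 14
    return x
```

For example, `denoiser` could be a thin wrapper around skimage.restoration.denoise_tv_chambolle; how to map σ to that function's TV weight is a heuristic choice left to the user.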
For total variation, we use the MATLAB implementation in [4], whereas for BM3D, we use the code available on the authors' website. When total variation is used, we set the regularization parameter λ = 10⁻². When BM3D is used, we set λ = 10⁻³. These values are selected because they produce the best overall results for the entire dataset.

Since the objective of PAMP is to automatically select ρ, we compare PAMP with Plug-and-Play ADMM using a fixed ρ. We select 10 values of ρ from 10⁻² to 10² on a logarithmic scale. For each ρ, we run Plug-and-Play ADMM for 50 iterations and record the PSNR values. For PAMP, we initialize the algorithm with ρ_v^{(0)} = 1 and let the algorithm update ρ_x and ρ_v internally.

The results are shown in Figure 2, where the PSNR values are averaged over the 10 testing images. As can be observed, the parameter ρ has an important influence on the Plug-and-Play ADMM algorithm: a large ρ tends to converge more slowly and approaches a solution with a low PSNR. If ρ is too small, e.g., ρ = 0.01 in the BM3D case, the PSNR actually drops rapidly after the first iteration. As for PAMP, we observe that with both denoisers the solution follows a rapid convergence path with almost the highest PSNR. Additional results, including other types of blur and other noise levels, can be found at https://engineering.purdue.edu/ChanGroup/.

5. DISCUSSION AND CONCLUSION

Why does it work? Lines 6 and 11 of Algorithm 1 reveal that there are two opposite forces in updating ρ_v and ρ_x:

    ρ_x ← ρ_v / div D_σ(ṽ),                                                        (25)
    ρ_v ← ρ_x / (ρ_x + 1) = ρ_x div{prox_{(1/ρ_x) f}(x̃)}.                          (26)

The divergence of a function is an indicator of its sensitivity with respect to the input. When div D_σ is large, the denoiser D_σ behaves sensitively at ṽ, and so the denoised output is less reliable. Thus, PAMP makes ρ_x small to attenuate the influence of the denoiser when solving the inversion. Now, since ρ_x becomes small, the inversion is weak, and so in the next iteration a strong denoiser is needed. This is achieved by decreasing ρ_v in (26). These two opposite forces form a trajectory of the pair (ρ_x, ρ_v). As k → ∞, one can show that (ρ_x, ρ_v) approaches a steady state where the two divergence terms coincide: div D_σ(ṽ) = div{prox_{(1/ρ_x) f}(x̃)}.
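To visualize such a trajectory, one can iterate the two updates in isolation. A toy sketch under the strong simplifying assumption that the denoiser divergence stays constant across iterations:

```python
def rho_trajectory(div_d, k_max=50, rho_v=1.0):
    """Iterate (25)-(26) with a frozen denoiser divergence div_d to see
    how the pair (rho_x, rho_v) settles down."""
    pairs = []
    for _ in range(k_max):
        rho_x = rho_v / div_d            # Eq. (25)
        rho_v = rho_x / (rho_x + 1.0)    # Eq. (26)
        pairs.append((rho_x, rho_v))
    return pairs

# Example: rho_trajectory(0.5)[-1] settles near (1.0, 0.5).
```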
Convergence? Convergence of GAMP is an open problem. The best result we know of so far is that, with appropriately designed damping strategies, GAMP converges for strictly convex functions f and g [23]. However, when damping is not used, there are examples where GAMP diverges [24]. Moving from an explicit g to an implicit denoiser D_σ will cause additional challenges that are yet to be studied.

Conclusion. Plug-and-Play generalized approximate message passing (PAMP) is a new algorithm that automatically updates the internal parameters of a conventional Plug-and-Play ADMM. The update rules are based on the divergences of the subproblems and are derived from the generalized approximate message passing (GAMP) framework. At the current stage, numerical results show that PAMP is a promising algorithm, as it generates solutions along a rapid convergence path. Intuitive arguments about why the algorithm works have been given. Future work should focus on the convergence analysis.
6. REFERENCES

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1-122, Jan. 2011.
[2] J. Yang, Y. Zhang, and W. Yin, "An efficient TVL1 algorithm for deblurring multichannel images corrupted by impulsive noise," SIAM J. Sci. Comput., vol. 31, no. 4, pp. 2842-2865, Jul. 2009.
[3] M. Afonso, J. Bioucas-Dias, and M. Figueiredo, "Fast image recovery using variable splitting and constrained optimization," IEEE Trans. Image Process., vol. 19, no. 9, pp. 2345-2356, Apr. 2010.
[4] S. H. Chan, R. Khoshabeh, K. B. Gibson, P. E. Gill, and T. Q. Nguyen, "An augmented Lagrangian method for total variation video restoration," IEEE Trans. Image Process., vol. 20, no. 11, pp. 3097-3111, May 2011.
[5] C. A. Bouman, "Model-based image processing," available online at https://engineering.purdue.edu/~bouman/publications/pdf/MBIP-book.pdf, 2015.
[6] S. H. Chan, X. Wang, and O. A. Elgendy, "Plug-and-Play ADMM for image restoration: Fixed point convergence and applications," IEEE Trans. Computational Imaging, in press. Available online at http://arxiv.org/abs/1605.01710.
[7] S. Venkatakrishnan, C. Bouman, and B. Wohlberg, "Plug-and-play priors for model based reconstruction," in Proc. IEEE Global Conference on Signal and Information Processing, 2013, pp. 945-948.
[8] S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, "Plug-and-play priors for bright field electron tomography and sparse interpolation," IEEE Trans. Computational Imaging, vol. 2, no. 4, pp. 408-423, Dec. 2016.
[9] S. H. Chan, "Algorithm-induced prior for image restoration," available online at http://arxiv.org/abs/1602.00715, Feb. 2016.
[10] A. Brifman, Y. Romano, and M. Elad, "Turning a denoiser into a super-resolver using plug and play priors," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 1404-1408.
[11] A. Rond, R. Giryes, and M. Elad, "Poisson inverse problems by the plug-and-play scheme," Journal of Visual Communication and Image Representation, vol. 41, pp. 96-108, Nov. 2015.
[12] F. Heide, M. Steinberger, Y.-T. Tsai, M. Rouf, D. Pajak, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian, J. Kautz, and K. Pulli, "FlexISP: A flexible camera image processing framework," ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2014), vol. 33, no. 6, Dec. 2014.
[13] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image denoising by sparse 3D transform-domain collaborative filtering," IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080-2095, Aug. 2007.
[14] S. Ramani, T. Blu, and M. Unser, "Monte-Carlo SURE: A black-box optimization of regularization parameters for general denoising algorithms," IEEE Trans. Image Process., vol. 17, no. 9, pp. 1540-1554, 2008.
[15] N. Nguyen, P. Milanfar, and G. Golub, "Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement," IEEE Trans. Image Process., vol. 10, no. 9, pp. 1299-1308, Sep. 2001.
[16] M. Borgerding and P. Schniter, "Generalized approximate message passing for the cosparse analysis model," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), 2015, pp. 3756-3760.
[17] S. Rangan, P. Schniter, E. Riegler, A. Fletcher, and V. Cevher, "Fixed points of generalized approximate message passing with arbitrary matrices," in Proc. IEEE Int. Symp. Information Theory, 2013, pp. 664-668.
[18] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," in Proc. IEEE Int. Symp. Information Theory, 2011, pp. 2168-2172.
[19] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing," Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914-18919, 2009.
[20] D. L. Donoho, A. Maleki, and A. Montanari, "How to design message passing algorithms for compressed sensing," Tech. Rep., Rice University, 2011. Available online at http://www.ece.rice.edu/~mam15/bpist.pdf.
[21] C. A. Metzler, A. Maleki, and R. G. Baraniuk, "From denoising to compressed sensing," IEEE Trans. Information Theory, vol. 62, no. 9, pp. 5117-5144, Sep. 2016.
[22] L. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, pp. 259-268, 1992.
[23] J. Vila, P. Schniter, S. Rangan, F. Krzakala, and L. Zdeborova, "Adaptive damping and mean removal for the generalized approximate message passing algorithm," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), 2015, pp. 2021-2025.
[24] S. Rangan, P. Schniter, and A. K. Fletcher, "On the convergence of approximate message passing with arbitrary matrices," in Proc. IEEE Int. Symp. Information Theory, 2014, pp. 236-240.