4 Proximal Methods and ADMM
Non-smooth functions

Non-smooth function: the l1 norm

In 1D:  f(x) = |x| = abs(x)

In n dimensions:  f(x) = ||x||_1 = sum_{i=1}^{n} |x_i| = abs(x_1) + abs(x_2) + ... + abs(x_n)
Non-smooth + smooth

f(x) = ||x||_1 + ||x - v||_2^2

where f_1(x) = ||x||_1 is non-smooth and f_2(x) = ||x - v||_2^2 is smooth.

Non-smooth + smooth = well behaved
Finding the optimal point (1D)

x* = argmin_x f(x) = argmin_x  |x| + (x - c)^2

∇f(x) = sign(x) + 2(x - c) = 0,   using  d/dx |x| = sign(x) for x ≠ 0

Algorithm for 1D:

if |c| > 1/2:  x* = c - (1/2) sign(c)
else:          x* = 0
In 2D, with x = [x1, x2]^T and c = [c1, c2]^T:

f(x) = ||x||_1 + ||x - c||_2^2 = |x1| + |x2| + (x1 - c1)^2 + (x2 - c2)^2

The problem separates coordinate-wise:

x1* = c1 - (1/2) sign(c1),  if |c1| > 1/2;  otherwise x1* = 0
x2* = c2 - (1/2) sign(c2),  if |c2| > 1/2;  otherwise x2* = 0
Exercise

min_x f(x) = ||x||_1 + ||x - c||_2^2,   c = [5, 3, -7, 1]^T

Here |c_i| > 1/2 for every i, so x_i* = c_i - (1/2) sign(c_i):

x1* = c1 - (1/2) sign(c1) = 5 - 1/2 = 9/2
x2* = c2 - (1/2) sign(c2) = 3 - 1/2 = 5/2
x3* = c3 - (1/2) sign(c3) = -7 + 1/2 = -13/2
x4* = c4 - (1/2) sign(c4) = 1 - 1/2 = 1/2

x* = [9/2, 5/2, -13/2, 1/2]^T
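The coordinate-wise shrinkage can be checked numerically. A minimal sketch, assuming the exercise data c = (5, 3, -7, 1) and the threshold 1/2 derived above:

```python
import numpy as np

def shrink(c, t):
    """Coordinate-wise soft threshold: move each c_i towards 0 by t, without crossing 0."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

# For min ||x||_1 + ||x - c||_2^2, the condition sign(x) + 2(x - c) = 0
# gives x_i* = c_i - (1/2) sign(c_i) when |c_i| > 1/2, else 0.
c = np.array([5.0, 3.0, -7.0, 1.0])
x_star = shrink(c, 0.5)
print(x_star)  # [ 4.5  2.5 -6.5  0.5]
```

Every coordinate is simply pulled towards zero by half the threshold's width, which is why the solution is exact in one shot.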
Another equivalent form

min_x f(x) = λ ||x||_1 + (1/2) ||x - c||_2^2

λ sign(x) + (x - c) = 0   ⟹   x* = c - λ sign(c)

Algorithm (soft thresholding)

x* = S_λ(c) = { c - λ sign(c),  if |c| > λ
             { 0,              if |c| ≤ λ

Equivalently:

S_λ(c) = { c - λ,  if c > λ
         { c + λ,  if c < -λ
         { 0,      if |c| ≤ λ

Shrink towards 0 by λ; while shrinking, do not cross 0.
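A minimal sketch of the soft-thresholding operator S_λ, cross-checked against a brute-force grid search of the scalar objective (the values of λ and c are illustrative):

```python
import numpy as np

def soft_threshold(c, lam):
    """S_lam(c): shrink c towards 0 by lam; clamp to 0 if it would cross."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

# Cross-check against a grid search of  lam*|x| + (1/2)(x - c)^2
lam, c = 0.7, -1.5
xs = np.linspace(-5, 5, 200001)
x_grid = xs[np.argmin(lam * np.abs(xs) + 0.5 * (xs - c) ** 2)]
print(soft_threshold(c, lam), x_grid)  # both ≈ -0.8
```

The closed form and the grid search agree: the minimizer is c shrunk towards zero by λ.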
Proximal methods and proximal operators

The proximal operator of a function f:

prox_f(v) = argmin_x  f(x) + (1/2) ||x - v||_2^2

The minimizer satisfies  ∇f(x) + (x - v) = 0  (for non-smooth f, the subgradient inclusion 0 ∈ ∂f(x) + x - v).
ADMM - Alternating Direction Method of Multipliers

Why ADMM?
ADMM is a simple and powerful iterative algorithm for convex optimization problems.

x^{k+1} = argmin_x  f(x) + (ρ/2) ||x - z^k + u^k||_2^2
(x^{k+1} is the solution of  ∇f(x) + ρ(x - z^k + u^k) = 0 vector)

z^{k+1} = argmin_z  g(z) + (ρ/2) ||x^{k+1} - z + u^k||_2^2      (a high-dimensional parabola added to g)
(z^{k+1} is the solution of  ∇g(z) - ρ(x^{k+1} - z + u^k) = 0 vector)

u^{k+1} = u^k + x^{k+1} - z^{k+1}
ADMM - Philosophy

Increase the number of variables from one set to three sets, and solve one set at a time.

min  f(x) + g(z)
subject to  x - z = 0 vector,   i.e.  x_1 - z_1 = 0,  x_2 - z_2 = 0, ...,  x_n - z_n = 0

Augmented Lagrangian:

L(x, z, y) = f(x) + g(z) + y^T (x - z) + (ρ/2) ||x - z||_2^2

Lagrange multiplier term:  y^T (x - z) = y_1 (x_1 - z_1) + y_2 (x_2 - z_2) + ... + y_n (x_n - z_n)
Augmented Lagrangian term:  (ρ/2) ||x - z||_2^2

x^{k+1} := argmin_x  L(x, z^k, y^k)
z^{k+1} := argmin_z  L(x^{k+1}, z, y^k)
y^{k+1} := y^k + ρ (x^{k+1} - z^{k+1})
min  f(x) + g(z)  subject to  x - z = 0     ⟺     min_x  f(x) + g(x)

KKT condition for the right-hand problem:  ∇f(x) + ∇g(x) = 0 vector

Lagrangian:  L(x, z, y) = f(x) + g(z) + y^T (x - z)

KKT conditions:

∂L/∂x = ∇f(x) + y = 0          (1)
∂L/∂z = ∇g(z) - y = 0          (2)
(1) + (2):  ∇f(x) + ∇g(z) = 0 vector      (3)
∂L/∂y = x - z = 0 vector  ⟹  x = z        (4)
(3) and (4):  ∇f(x) + ∇g(x) = 0 vector
With the augmented term:

L(x, z, y) = f(x) + g(z) + y^T (x - z) + (ρ/2) ||x - z||_2^2

KKT conditions:

∂L/∂x = ∇f(x) + y + ρ(x - z) = 0          (1)
∂L/∂z = ∇g(z) - y - ρ(x - z) = 0          (2)
(1) + (2):  ∇f(x) + ∇g(z) = 0 vector      (3)
∂L/∂y = x - z = 0 vector  ⟹  x = z        (4)
(3) and (4):  ∇f(x) + ∇g(x) = 0 vector
Update of x in the (k+1)th iteration

The Lagrangian is solved for 'one set' of variables at a time.
In the (k+1)th iteration, for computing x we assume y^k and z^k are known
(x^k, y^k, z^k are the values of x, y and z obtained in the kth iteration).

L(x) = f(x) + g(z^k) + ⟨y^k, x⟩ - ⟨y^k, z^k⟩ + (ρ/2) ||x - z^k||_2^2
     = f(x) + ⟨y^k, x⟩ + (ρ/2) ||x - z^k||_2^2 + constant

x^{k+1} = argmin_x  f(x) + ⟨y^k, x⟩ + (ρ/2) ||x - z^k||_2^2

You can think that you obtained x^{k+1} by solving  ∂L(x)/∂x = 0 vector.
Update of z in the (k+1)th iteration

We assume the x and y vectors are known (x^{k+1} and y^k). We rewrite the Lagrangian as follows:

L(z) = f(x^{k+1}) + g(z) + ⟨y^k, x^{k+1}⟩ - ⟨y^k, z⟩ + (ρ/2) (x^{k+1} - z)^T (x^{k+1} - z)
     = g(z) - ⟨y^k, z⟩ + (ρ/2) ||x^{k+1} - z||_2^2 + constant

z^{k+1} = argmin_z  g(z) - ⟨y^k, z⟩ + (ρ/2) ||x^{k+1} - z||_2^2

You can think that you obtained z^{k+1} by solving  ∂L(z)/∂z = 0 vector.
Update of y in the (k+1)th iteration

The y vector elements are Lagrange multipliers.
The iterative algorithm proceeds towards (x*, z*, y*), which is a saddle point of L.
This means that, w.r.t. the x, z variables, the Lagrangian L is minimum at (x*, z*, y*).
Also, w.r.t. the y variables, the Lagrangian L is maximum at (x*, z*, y*).
So the y vector is updated along the gradient of L with respect to y (gradient ascent).
This direction is given by  ∂L/∂y = x - z.

L(y) = f(x^{k+1}) + g(z^{k+1}) + y^T (x^{k+1} - z^{k+1}) + (ρ/2) ||x^{k+1} - z^{k+1}||_2^2

y^{k+1} := y^k + ρ (x^{k+1} - z^{k+1})
ADMM iterations (unscaled form):

x^{k+1} = argmin_x  f(x) + (ρ/2) ||x - z^k + (1/ρ) y^k||_2^2
z^{k+1} = argmin_z  g(z) + (ρ/2) ||x^{k+1} - z + (1/ρ) y^k||_2^2
y^{k+1} = y^k + ρ (x^{k+1} - z^{k+1})
We verify it by expansion. Consider the first optimization.

Let  L(x) = f(x) + (ρ/2) ||x - z^k + (1/ρ) y^k||_2^2
          = f(x) + (ρ/2) (x - z^k + (1/ρ) y^k)^T (x - z^k + (1/ρ) y^k)
          = f(x) + (ρ/2) [ (x - z^k)^T (x - z^k) + (2/ρ)(x - z^k)^T y^k + (1/ρ^2)(y^k)^T y^k ]
          = f(x) + (ρ/2) ||x - z^k||_2^2 + x^T y^k - (z^k)^T y^k + constant

On rearranging (dropping terms that do not depend on x):

L(x) = f(x) + x^T y^k + (ρ/2) ||x - z^k||_2^2 + constant

So,  x^{k+1} = argmin_x  f(x) + ⟨x, y^k⟩ + (ρ/2) ||x - z^k||_2^2
We verify it by expansion. Consider the second optimization term.

Let  L(z) = g(z) + (ρ/2) ||x^{k+1} - z + (1/ρ) y^k||_2^2
          = g(z) + (ρ/2) (x^{k+1} - z + (1/ρ) y^k)^T (x^{k+1} - z + (1/ρ) y^k)
          = g(z) + (ρ/2) ||x^{k+1} - z||_2^2 + (x^{k+1} - z)^T y^k + constant
          = g(z) - z^T y^k + (ρ/2) ||x^{k+1} - z||_2^2 + constant

On rearranging (dropping terms that do not depend on z):

L(z) = g(z) - z^T y^k + (ρ/2) ||x^{k+1} - z||_2^2 + constant

So,  z^{k+1} = argmin_z  g(z) - ⟨z, y^k⟩ + (ρ/2) ||x^{k+1} - z||_2^2
Simplify

With  u^k = (1/ρ) y^k:

x^{k+1} = argmin_x  f(x) + (ρ/2) ||x - z^k + (1/ρ) y^k||_2^2   =   argmin_x  f(x) + (ρ/2) ||x - (z^k - u^k)||_2^2
z^{k+1} = argmin_z  g(z) + (ρ/2) ||x^{k+1} - z + (1/ρ) y^k||_2^2   =   argmin_z  g(z) + (ρ/2) ||z - (x^{k+1} + u^k)||_2^2
y^{k+1} = y^k + ρ (x^{k+1} - z^{k+1})   ⟹   u^{k+1} = u^k + x^{k+1} - z^{k+1}

In terms of proximal operators:

x^{k+1} := prox_f (z^k - u^k)
z^{k+1} := prox_g (x^{k+1} + u^k)
u^{k+1} := u^k + x^{k+1} - z^{k+1}
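The three updates above can be sketched as a generic loop. This is a minimal sketch: `prox_f` and `prox_g` are caller-supplied functions (hypothetical names), and the toy example assumes ρ = 1 so that the prox of each quadratic is a simple average:

```python
import numpy as np

def admm(prox_f, prox_g, n, iters=100):
    """Scaled-form ADMM: x <- prox_f(z - u), z <- prox_g(x + u), u <- u + x - z."""
    x = z = u = np.zeros(n)
    for _ in range(iters):
        x = prox_f(z - u)
        z = prox_g(x + u)
        u = u + x - z
    return x, z, u

# Toy consensus example: f(x) = (1/2)||x - a||^2, g(z) = (1/2)||z - b||^2 (rho = 1),
# whose minimizer subject to x = z is the average (a + b)/2.
a, b = np.array([1.0, 3.0]), np.array([3.0, 5.0])
x, z, u = admm(lambda v: (a + v) / 2, lambda v: (b + v) / 2, 2)
print(x)  # ≈ [2. 4.]
```

Each prox here solves argmin (1/2)||w - a||^2 + (1/2)||w - v||^2 = (a + v)/2, so the loop is exactly the three-line iteration with closed-form subproblems.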
Let us be more wise after the event

Start with the augmented Lagrangian function in the following (scaled) form:

L(x, z, u) = f(x) + g(z) + (ρ/2) ||x - z + u||_2^2

instead of

L(x, y, z) = f(x) + g(z) + y^T (x - z) + (ρ/2) ||x - z||_2^2
Indicator function:  g(z) = 0 if z ∈ C,  ∞ otherwise

x^{k+1} = argmin_x  f(x) + (ρ/2) ||x - z^k + u^k||_2^2
z^{k+1} = Π_C (x^{k+1} + u^k)          (Π_C stands for projection onto the convex set C)
u^{k+1} = u^k + x^{k+1} - z^{k+1}
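When g is an indicator function, the z-update is just a Euclidean projection. As one illustrative choice (not from the slides), for the box C = [0, 1]^n the projection is coordinate-wise clipping:

```python
import numpy as np

def project_box(w, lo=0.0, hi=1.0):
    """Euclidean projection onto the box C = [lo, hi]^n: clip each coordinate."""
    return np.clip(w, lo, hi)

w = np.array([-0.3, 0.4, 1.7])
print(project_box(w))
```

Coordinates already inside C are untouched; coordinates outside are moved to the nearest face, which is exactly the closest point in C in the Euclidean norm.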
Explanation for projection:  z^{k+1} = Π_C (x^{k+1} + u^k)

Consider the augmented Lagrangian

L(x, z, u) = f(x) + g(z) + (ρ/2) ||x - z + u||_2^2

Assume x^{k+1} and u^k are known. Then

L(x^{k+1}, z, u^k) = f(x^{k+1}) + g(z) + (ρ/2) ||x^{k+1} - z + u^k||_2^2

Minimizing over z with g the indicator of C forces z ∈ C while making z as close as possible to x^{k+1} + u^k, i.e. the projection of x^{k+1} + u^k onto C.
f(x) = (1/2) x^T P x + q^T x,    dom f = {x | Ax = b}
g(z) = indicator function for z ≥ 0

L(x, z, u) = f(x) + g(z) + (ρ/2) ||x - z + u||_2^2

x^{k+1} = argmin_{x: Ax = b}  f(x) + (ρ/2) ||x - z^k + u^k||_2^2
LP and QP

x^{k+1} = argmin_{x: Ax = b}  (1/2) x^T P x + q^T x + (ρ/2) ||x - z^k + u^k||_2^2

Introduce a multiplier v for the constraint Ax = b:

L(x, v) = (1/2) x^T P x + q^T x + (ρ/2) ||x - z^k + u^k||_2^2 + v^T (Ax - b)

∂L/∂x = 0  ⟹  P x^{k+1} + q + ρ (x^{k+1} - z^k + u^k) + A^T v = 0  ⟹  (P + ρI) x^{k+1} + A^T v = ρ (z^k - u^k) - q
∂L/∂v = 0  ⟹  A x^{k+1} - b = 0

In matrix form:

[ P + ρI   A^T ] [ x^{k+1} ]   [ ρ (z^k - u^k) - q ]
[   A       0  ] [    v    ] = [         b         ]           (1)

z^{k+1} = ( x^{k+1} + u^k )_+        (projection onto z ≥ 0)   (2)
u^{k+1} = u^k + x^{k+1} - z^{k+1}                              (3)
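A sketch of these three updates in NumPy, solving the KKT system of the x-update at every iteration (a factor-once implementation would be more efficient). The tiny QP at the end is a hypothetical instance:

```python
import numpy as np

def qp_admm(P, q, A, b, rho=1.0, iters=200):
    """ADMM sketch for min (1/2)x'Px + q'x  s.t. Ax = b, x >= 0."""
    n, m = P.shape[0], A.shape[0]
    # KKT matrix of the x-update; it is fixed across iterations.
    K = np.block([[P + rho * np.eye(n), A.T], [A, np.zeros((m, m))]])
    x = z = u = np.zeros(n)
    for _ in range(iters):
        rhs = np.concatenate([rho * (z - u) - q, b])
        x = np.linalg.solve(K, rhs)[:n]
        z = np.maximum(x + u, 0.0)   # projection onto the nonnegative orthant
        u = u + x - z
    return z                          # z is the nonnegative iterate

# Tiny hypothetical QP: min (1/2)||x||^2 s.t. x1 + x2 = 1, x >= 0  ->  x = (1/2, 1/2)
P, q = np.eye(2), np.zeros(2)
A, b = np.array([[1.0, 1.0]]), np.array([1.0])
print(qp_admm(P, q, A, b))  # ≈ [0.5 0.5]
```

Returning z rather than x gives an iterate that satisfies the nonnegativity constraint exactly at every step.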
LP

An LP is the special case P = 0; the same KKT system (1) applies.

Exercise: choose any three from the 16 problems given in the ADMM web site by Boyd:
https://web.stanford.edu/~boyd/papers/admm/
ADMM for two more optimization problems
LP,QP,SVM,LASSO,BP,LAD
LASSO (Least Absolute Shrinkage and Selection Operator)

The following l1-regularized least-squares problem:

minimize_x  (1/2) ||Ax - b||_2^2 + λ ||x||_1 ;   or   minimize_x  (1/2) (Ax - b)^T (Ax - b) + λ Σ_i |x_i|

Taking  f(x) = (1/2) ||Ax - b||_2^2  and  g(z) = λ ||z||_1, the ADMM formulation is:

minimize_{x,z}  f(x) + g(z)   subject to  x - z = 0 vector

The augmented Lagrangian is

L(x, z, y) = f(x) + g(z) + y^T (x - z) + (ρ/2) (x - z)^T (x - z)
           = (1/2) (Ax - b)^T (Ax - b) + λ ||z||_1 + y^T (x - z) + (ρ/2) (x - z)^T (x - z)

The reduced equivalent form is (y is replaced by the equivalent u = (1/ρ) y):

L(x, z, u) = (1/2) (Ax - b)^T (Ax - b) + λ ||z||_1 + (ρ/2) ||x - z + u||_2^2
x^{k+1} = argmin_x  (1/2) (Ax - b)^T (Ax - b) + (ρ/2) ||x - z^k + u^k||_2^2

Setting the gradient to zero:

A^T A x - A^T b + ρ (x - z^k + u^k) = 0
(A^T A + ρ I) x = A^T b + ρ (z^k - u^k)
x^{k+1} = (A^T A + ρ I)^{-1} ( A^T b + ρ (z^k - u^k) )
The update of z is:

z^{k+1} = argmin_z  λ ||z||_1 + (ρ/2) ||x^{k+1} - z + u^k||_2^2 = argmin_z  λ ||z||_1 + (ρ/2) ||z - (x^{k+1} + u^k)||_2^2

z^{k+1} = S_{λ/ρ} ( x^{k+1} + u^k )

Remember:

min_x  λ ||x||_1 + (1/2) ||x - c||_2^2
λ sign(x) + (x - c) = 0   ⟹   x* = c - λ sign(c)

x* = S_λ(c) = { c - λ sign(c),  if |c| > λ
             { 0,              if |c| ≤ λ
To Summarise

x^{k+1} = (A^T A + ρ I)^{-1} ( A^T b + ρ (z^k - u^k) )
z^{k+1} = S_{λ/ρ} ( x^{k+1} + u^k )
u^{k+1} = u^k + ( x^{k+1} - z^{k+1} )
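A sketch of this summary in NumPy. The instance at the end is hypothetical, and ρ, λ and the iteration count are arbitrary choices:

```python
import numpy as np

def lasso_admm(A, b, lam, rho=1.0, iters=500):
    """ADMM sketch for min (1/2)||Ax - b||^2 + lam*||x||_1."""
    n = A.shape[1]
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))   # fixed across iterations
    Atb = A.T @ b
    x = z = u = np.zeros(n)
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # S_{lam/rho}
        u = u + x - z
    return z

# Hypothetical small instance: with lam large enough, the recovered x is sparse.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + 0.01 * rng.standard_normal(20)
x = lasso_admm(A, b, lam=1.0)
print(np.round(x, 2))
```

The x-update is the cached ridge solve, the z-update is the soft threshold with parameter λ/ρ, and u accumulates the running constraint violation.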
Basis Pursuit

A good proxy for finding the sparsest solution to an underdetermined system of equations Ax = b is to solve

minimize  ||x||_1   subject to  Ax = b

ADMM formulation:

minimize  f(x) + g(z)   subject to  x - z = 0 vector

where
f(x) = ||x||_1
g(z) = 0 if Az = b,  ∞ otherwise

Projection onto Az = b

Suppose in an iteration we got an approximate z = z_app with A z_app ≠ b.
What we need is A(z_app + e) = b:

A e = b - A z_app   ⟹   e = pinv(A) ( b - A z_app )
z^{k+1} = z_app + e = z_app + pinv(A) ( b - A z_app )
The augmented Lagrangian is

L(x, z, u) = ||x||_1 + g(z) + (ρ/2) ||x - z + u||_2^2

x^{k+1} = argmin_x  ||x||_1 + (ρ/2) ||x - z^k + u^k||_2^2  =  S_{1/ρ} ( z^k - u^k )

z^{k+1} = argmin_z  g(z) + (ρ/2) ||x^{k+1} - z + u^k||_2^2
        = Π_{Az=b} ( x^{k+1} + u^k ) = ( x^{k+1} + u^k ) + pinv(A) ( b - A ( x^{k+1} + u^k ) )

u^{k+1} = u^k + x^{k+1} - z^{k+1}
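The basis-pursuit updates can be sketched likewise (hypothetical random instance; ρ and the iteration count are arbitrary choices):

```python
import numpy as np

def basis_pursuit_admm(A, b, rho=1.0, iters=500):
    """ADMM sketch for min ||x||_1 s.t. Ax = b: soft threshold + affine projection."""
    n = A.shape[1]
    Apinv = np.linalg.pinv(A)
    x = z = u = np.zeros(n)
    for _ in range(iters):
        v = z - u
        x = np.sign(v) * np.maximum(np.abs(v) - 1.0 / rho, 0.0)   # S_{1/rho}(z - u)
        w = x + u
        z = w + Apinv @ (b - A @ w)                               # projection onto Az = b
        u = u + x - z
    return z

# Hypothetical underdetermined system with a sparse solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 20))
x_true = np.zeros(20); x_true[[3, 11]] = [1.5, -2.0]
b = A @ x_true
z = basis_pursuit_admm(A, b)
print(np.linalg.norm(A @ z - b))   # ≈ 0: z is feasible by construction
```

Because the z-update ends each iteration with a projection onto {z | Az = b}, the returned iterate satisfies the equality constraint exactly, while the soft threshold drives it towards small l1 norm.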
Compressed sensing
Single pixel camera
Multispectral camera
Sampling theorem