
Proximal Methods and ADMM
Non-smooth functions

Non-smooth function: the l1 norm

For a scalar:  f(x) = |x| = \mathrm{abs}(x)

For a vector:  f(x) = \|x\|_1 = \sum_{i=1}^{n} |x_i| = \sum_{i=1}^{n} \mathrm{abs}(x_i)

(Figure: the absolute-value function in 1D and the contours of \|x\|_1 in the (x_1, x_2) plane.)
Non-smooth + Smooth

f(x) = \|x\|_1 + \|x - v\|^2, \qquad f_1(x) = \|x\|_1, \quad f_2(x) = \|x - v\|^2

(Figure: the l1 term, the quadratic term, and their sum.)

Non-smooth + smooth = well behaved
Finding the optimal point

x^* = \arg\min_x f(x) = \lambda|x| + (x - c)^2

\partial f(x) = \lambda\,\mathrm{sign}(x) + 2(x - c) \ni 0, \qquad \frac{d}{dx}|x| = \mathrm{sign}(x), \; x \neq 0

x^* = c - \frac{\lambda}{2}\,\mathrm{sign}(x^*) = c - \frac{\lambda}{2}\,\mathrm{sign}(c) \quad (\text{Why?})

The solution point always lies between 0 and the location of the vertex of the parabola, which is c.
Sign inversion

f(x) = \lambda|x| + (x - c)^2, \qquad \lambda = 1, \; c = \frac{1}{4}

The solution is expected between 0 and c = 1/4.

But as per the formula, x^* = c - \frac{\lambda}{2}\,\mathrm{sign}(c) = \frac{1}{4} - \frac{1}{2} = -\frac{1}{4},

i.e. the "solution" does not have the sign of c, so the formula cannot be applied blindly.
Algorithm for 1D

If |c| \ge \frac{\lambda}{2}: \quad x^* = c - \frac{\lambda}{2}\,\mathrm{sign}(c)

Else: \quad x^* = 0

Shrink towards zero, but do not cross it.
2D

f(x) = \lambda\|x\|_1 + \|x - c\|_2^2 = \lambda\left(|x_1| + |x_2|\right) + (x - c)^T(x - c),
\qquad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \; c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}

\partial f(x) = \lambda \begin{bmatrix} \mathrm{sign}(x_1) \\ \mathrm{sign}(x_2) \end{bmatrix} + 2\begin{bmatrix} x_1 - c_1 \\ x_2 - c_2 \end{bmatrix} \ni 0

x_1^* = c_1 - \frac{\lambda}{2}\,\mathrm{sign}(x_1^*) = c_1 - \frac{\lambda}{2}\,\mathrm{sign}(c_1) \;\text{ if } |c_1| \ge \frac{\lambda}{2}, \text{ otherwise } x_1^* = 0

x_2^* = c_2 - \frac{\lambda}{2}\,\mathrm{sign}(x_2^*) = c_2 - \frac{\lambda}{2}\,\mathrm{sign}(c_2) \;\text{ if } |c_2| \ge \frac{\lambda}{2}, \text{ otherwise } x_2^* = 0

The problem separates across coordinates, so each component is shrunk independently.
Exercise

\min_x f(x) = \lambda\|x\|_1 + \|x - c\|_2^2, \qquad \lambda = 1, \quad c = \begin{bmatrix} 5 \\ 3 \\ -7 \\ -1 \end{bmatrix}

Here |c_i| > \frac{\lambda}{2} = \frac{1}{2} for all i, so x_i^* = c_i - \frac{\lambda}{2}\,\mathrm{sign}(c_i):

x^* = \begin{bmatrix} 5 \\ 3 \\ -7 \\ -1 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}
    = \begin{bmatrix} 9/2 \\ 5/2 \\ -13/2 \\ -1/2 \end{bmatrix}
Another equivalent form

\min_x f(x) = \lambda\|x\|_1 + \frac{1}{2}\|x - c\|^2

\lambda\,\mathrm{sign}(x) + (x - c) \ni 0 \;\Rightarrow\; x^* = c - \lambda\,\mathrm{sign}(c)
Algorithm

x^* = S_\lambda(c) = \begin{cases} c - \lambda\,\mathrm{sign}(c), & \text{if } \lambda < |c| \\ 0, & \text{if } \lambda \ge |c| \end{cases}

Equivalently,

x^* = S_\lambda(c) = \begin{cases} c - \lambda, & \text{if } c > \lambda \\ c + \lambda, & \text{if } c < -\lambda \\ 0, & \text{if } |c| \le \lambda \end{cases}

Shrink towards 0 by \lambda; while shrinking, do not cross 0.
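The following is a minimal NumPy sketch of this soft-thresholding (shrinkage) operator; the function name soft_threshold and the vectorized form are illustrative choices, not from the slides.

```python
import numpy as np

def soft_threshold(c, lam):
    """Elementwise soft-thresholding S_lambda(c):
    shrink each entry of c towards 0 by lam, without crossing 0."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

# Shrinkage by lambda/2 = 0.5, as in the exercise slide with c = [5, 3, -7, -1]
print(soft_threshold(np.array([5.0, 3.0, -7.0, -1.0]), 0.5))  # -> [ 4.5  2.5 -6.5 -0.5]
```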
Proximal methods and proximal operators

Notation used in ADMM:

x, v \in \mathbb{R}^n, \qquad \lambda \in \mathbb{R}_{+} \text{ (a hyperparameter, } \lambda > 0\text{)}

Proximal operator of a function f:

\mathrm{prox}_f(v) = \arg\min_x \left( f(x) + \frac{1}{2\lambda}\|x - v\|_2^2 \right)
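As a sanity check (not part of the slides), a prox can also be evaluated numerically and compared against the closed-form soft threshold; prox_numeric and the choice of the Powell solver below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def prox_numeric(f, v, lam):
    """Numerically evaluate prox_f(v) = argmin_x f(x) + (1/(2*lam)) * ||x - v||^2."""
    obj = lambda x: f(x) + np.sum((x - v) ** 2) / (2.0 * lam)
    return minimize(obj, x0=np.zeros_like(v), method="Powell").x

v, lam = np.array([2.0, -0.3, 0.7]), 0.5
print(prox_numeric(lambda x: np.sum(np.abs(x)), v, lam))  # approximately the soft threshold
print(np.sign(v) * np.maximum(np.abs(v) - lam, 0.0))      # closed form: [ 1.5  0.   0.2]
```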
ADMM - Alternating Direction Method of Multipliers

Why ADMM?

ADMM is a simple and powerful iterative algorithm for convex optimization problems.

It is almost 80 times faster than conventional methods for multivariable problems.

ADMM puts linear and quadratic programming in a single framework.
ADMM Form-1

\min_x f(x) + g(x), \qquad x, u, z \in \mathbb{R}^n, \quad \lambda \in \mathbb{R}_{+} \text{ (a hyperparameter)}

x^{k+1} := \mathrm{prox}_f\left(z^k - u^k\right)
z^{k+1} := \mathrm{prox}_g\left(x^{k+1} + u^k\right)
u^{k+1} := u^k + x^{k+1} - z^{k+1}

z: surrogate variable for x;  u: (scaled) Lagrange multipliers.
If f(x) and g(z) are differentiable:

x^{k+1} = \arg\min_x \; f(x) + \frac{1}{2\lambda}\left\|x - z^k + u^k\right\|_2^2 \quad (\text{the second term is a high-dimensional parabola})

\Rightarrow x^{k+1} \text{ is the solution of } \nabla f(x) + \frac{1}{\lambda}\left(x - z^k + u^k\right) = 0 \text{ (vector)}

z^{k+1} = \arg\min_z \; g(z) + \frac{1}{2\lambda}\left\|z - x^{k+1} - u^k\right\|_2^2 \quad (\text{again a high-dimensional parabola})

\Rightarrow z^{k+1} \text{ is the solution of } \nabla g(z) + \frac{1}{\lambda}\left(z - x^{k+1} - u^k\right) = 0 \text{ (vector)}

u^{k+1} = u^k + x^{k+1} - z^{k+1}
ADMM - Philosophy

Increase the number of variables from one set to three sets, and solve one set at a time.

\min f(x) + g(z) \quad \text{subject to } x - z = 0 \text{ (vector)}, \text{ i.e. } x_i - z_i = 0 \text{ for } i = 1, \dots, n
Lagrange multiplier term:

y_1(x_1 - z_1) + y_2(x_2 - z_2) + \dots + y_n(x_n - z_n) = y^T(x - z)

Augmented Lagrangian (the quadratic penalty is the augmentation term):

L(x, z, y) = f(x) + g(z) + y^T(x - z) + \frac{\rho}{2}\|x - z\|_2^2

x^{k+1} := \arg\min_x L(x, z^k, y^k)
z^{k+1} := \arg\min_z L(x^{k+1}, z, y^k)
y^{k+1} := y^k + \rho\left(x^{k+1} - z^{k+1}\right)
KKT conditions: original vs. split problem

Original problem: \min_x f(x) + g(x), with KKT condition \nabla f(x) + \nabla g(x) = 0 \text{ (vector)}.

Split problem: \min f(x) + g(z) \text{ subject to } x - z = 0, with

L(x, z, y) = f(x) + g(z) + y^T(x - z)

KKT conditions:

\frac{\partial L}{\partial x} = \nabla f(x) + y = 0 \quad (1)
\frac{\partial L}{\partial z} = \nabla g(z) - y = 0 \quad (2)
(1) + (2) \Rightarrow \nabla f(x) + \nabla g(z) = 0 \text{ (vector)} \quad (3)
\frac{\partial L}{\partial y} = x - z = 0 \text{ (vector)} \Rightarrow x = z \quad (4)
(3) \text{ and } (4) \Rightarrow \nabla f(x) + \nabla g(x) = 0 \text{ (vector)}

The KKT conditions are the same.


Now with the augmented Lagrangian:

L(x, z, y) = f(x) + g(z) + y^T(x - z) + \frac{\rho}{2}\|x - z\|_2^2

KKT conditions:

\frac{\partial L}{\partial x} = \nabla f(x) + y + \rho(x - z) = 0 \quad (1)
\frac{\partial L}{\partial z} = \nabla g(z) - y - \rho(x - z) = 0 \quad (2)
(1) + (2) \Rightarrow \nabla f(x) + \nabla g(z) = 0 \text{ (vector)} \quad (3)
\frac{\partial L}{\partial y} = x - z = 0 \text{ (vector)} \Rightarrow x = z \quad (4)
(3) \text{ and } (4) \Rightarrow \nabla f(x) + \nabla g(x) = 0 \text{ (vector)}

The KKT conditions are again the same, so the quadratic penalty does not change the optimum.


ADMM

L(x, z, y) = f(x) + g(z) + y^T(x - z) + \frac{\rho}{2}\|x - z\|_2^2
           = f(x) + g(z) + \langle y, x \rangle - \langle y, z \rangle + \frac{\rho}{2}(x - z)^T(x - z)

The Lagrangian function is solved for one set of variables at a time.
In the (k+1)-th iteration, for computing x we assume y^k and z^k are known.
x^k, y^k, z^k are the values of x, y and z obtained in the k-th iteration.
Update of x in the (k+1)-th iteration

Consider minimization of the Lagrangian L w.r.t. x, assuming y = y^k and z = z^k are known. The Lagrangian is:

L(x) = f(x) + \underbrace{g(z^k)}_{\text{constant}} + \langle y^k, x \rangle - \underbrace{\langle y^k, z^k \rangle}_{\text{constant}} + \frac{\rho}{2}(x - z^k)^T(x - z^k)

Omitting the constant terms from the objective, we get

L(x) = f(x) + \langle y^k, x \rangle + \frac{\rho}{2}(x - z^k)^T(x - z^k)

So the update for x can be written as

x^{k+1} = \arg\min_x \left\{ f(x) + \langle y^k, x \rangle + \frac{\rho}{2}\|x - z^k\|_2^2 \right\}

You can think of x^{k+1} as being obtained by solving \frac{\partial L(x)}{\partial x} = 0 \text{ (vector)}.
Update of z in the (k+1)-th iteration

We assume the x and y vectors are known (x = x^{k+1}, y = y^k). We rewrite the Lagrangian as

L(z) = \underbrace{f(x^{k+1})}_{\text{constant}} + g(z) + \underbrace{\langle y^k, x^{k+1} \rangle}_{\text{constant}} - \langle y^k, z \rangle + \frac{\rho}{2}(x^{k+1} - z)^T(x^{k+1} - z)

Omitting the constant terms from the objective, we get

L(z) = g(z) - \langle y^k, z \rangle + \frac{\rho}{2}(x^{k+1} - z)^T(x^{k+1} - z)

z^{k+1} = \arg\min_z \left\{ g(z) - \langle y^k, z \rangle + \frac{\rho}{2}\|x^{k+1} - z\|_2^2 \right\}

You can think of z^{k+1} as being obtained by solving \frac{\partial L(z)}{\partial z} = 0 \text{ (vector)}.
Update of y in the (k+1)-th iteration

The elements of the y vector are the Lagrange multipliers.
The iterative algorithm proceeds towards (x^*, z^*, y^*), which is a saddle point of L:
w.r.t. the x, z variables, L is minimized at (x^*, z^*, y^*);
w.r.t. the y variables, L is maximized at (x^*, z^*, y^*).
So the y vector is updated along the direction of maximum increase of L with respect to y.

L(y) = \underbrace{f(x^{k+1}) + g(z^{k+1})}_{\text{constant}} + y^T\left(x^{k+1} - z^{k+1}\right) + \underbrace{\frac{\rho}{2}\|x^{k+1} - z^{k+1}\|_2^2}_{\text{constant}}

Omitting constant terms, L(y) = y^T\left(x^{k+1} - z^{k+1}\right), so

\frac{\partial L(y)}{\partial y} = x^{k+1} - z^{k+1} \quad (\text{gradient of } L \text{ w.r.t. } y)

To move towards the maximum, y is moved along the gradient direction from the current position:

y^{k+1} = y^k + (\text{step size}) \times (\text{gradient of } L \text{ w.r.t. } y)

y^{k+1} = y^k + \rho\left(x^{k+1} - z^{k+1}\right), \quad \text{taking the step size to be } \rho.
We combine the 2nd and 3rd terms of the first two optimizations into a single quadratic term.

x^{k+1} = \arg\min_x \left\{ f(x) + \langle y^k, x \rangle + \frac{\rho}{2}\|x - z^k\|_2^2 \right\}
z^{k+1} = \arg\min_z \left\{ g(z) - \langle y^k, z \rangle + \frac{\rho}{2}\|x^{k+1} - z\|_2^2 \right\}
y^{k+1} := y^k + \rho\left(x^{k+1} - z^{k+1}\right)

become

x^{k+1} = \arg\min_x \left\{ f(x) + \frac{\rho}{2}\left\|x - z^k + \frac{1}{\rho}y^k\right\|_2^2 \right\}
z^{k+1} = \arg\min_z \left\{ g(z) + \frac{\rho}{2}\left\|x^{k+1} - z + \frac{1}{\rho}y^k\right\|_2^2 \right\}
y^{k+1} = y^k + \rho\left(x^{k+1} - z^{k+1}\right)
We verify this by expansion. Consider the first optimization:

Let L(x) = f(x) + \frac{\rho}{2}\left\|x - z^k + \frac{1}{\rho}y^k\right\|_2^2
         = f(x) + \frac{\rho}{2}\left(x - z^k + \frac{1}{\rho}y^k\right)^T\left(x - z^k + \frac{1}{\rho}y^k\right)
         = f(x) + \frac{\rho}{2}\left[(x - z^k)^T(x - z^k) + \frac{1}{\rho^2}(y^k)^T y^k + \frac{2}{\rho}(x - z^k)^T y^k\right]
         = f(x) + \frac{\rho}{2}(x - z^k)^T(x - z^k) + x^T y^k + \text{constant}

On rearranging,

L(x) = f(x) + x^T y^k + \frac{\rho}{2}\|x - z^k\|_2^2 + \text{constant}

So, x^{k+1} = \arg\min_x \; f(x) + \langle x, y^k \rangle + \frac{\rho}{2}\|x - z^k\|_2^2
We verify the second optimization by expansion:

Let L(z) = g(z) + \frac{\rho}{2}\left\|x^{k+1} - z + \frac{1}{\rho}y^k\right\|_2^2
         = g(z) + \frac{\rho}{2}\left(x^{k+1} - z + \frac{1}{\rho}y^k\right)^T\left(x^{k+1} - z + \frac{1}{\rho}y^k\right)
         = g(z) + \frac{\rho}{2}\left[(x^{k+1} - z)^T(x^{k+1} - z) + \frac{1}{\rho^2}(y^k)^T y^k + \frac{2}{\rho}(x^{k+1} - z)^T y^k\right]
         = g(z) + \frac{\rho}{2}(x^{k+1} - z)^T(x^{k+1} - z) - z^T y^k + \text{constant}

On rearranging,

L(z) = g(z) - z^T y^k + \frac{\rho}{2}\|x^{k+1} - z\|_2^2 + \text{constant}

So, z^{k+1} = \arg\min_z \; g(z) - \langle z, y^k \rangle + \frac{\rho}{2}\|x^{k+1} - z\|_2^2
Simplify

With u^k = \frac{1}{\rho} y^k and \lambda = \frac{1}{\rho}:

x^{k+1} = \arg\min_x \left\{ f(x) + \frac{\rho}{2}\left\|x - z^k + \frac{1}{\rho}y^k\right\|_2^2 \right\}
        = \arg\min_x \left\{ f(x) + \frac{1}{2\lambda}\left\|x - (z^k - u^k)\right\|_2^2 \right\}

z^{k+1} = \arg\min_z \left\{ g(z) + \frac{\rho}{2}\left\|x^{k+1} - z + \frac{1}{\rho}y^k\right\|_2^2 \right\}
        = \arg\min_z \left\{ g(z) + \frac{1}{2\lambda}\left\|z - (x^{k+1} + u^k)\right\|_2^2 \right\}

y^{k+1} = y^k + \rho\left(x^{k+1} - z^{k+1}\right) \;\Leftrightarrow\; u^{k+1} = u^k + x^{k+1} - z^{k+1}

That is,

x^{k+1} := \mathrm{prox}_f\left(z^k - u^k\right)
z^{k+1} := \mathrm{prox}_g\left(x^{k+1} + u^k\right)
u^{k+1} := u^k + x^{k+1} - z^{k+1}
Let us be wiser after the event

Start with the augmented Lagrangian in the following (scaled) form:

L(x, z, u) = f(x) + g(z) + \frac{1}{2\lambda}\|x - z + u\|_2^2

instead of

L(x, z, y) = f(x) + g(z) + y^T(x - z) + \frac{\rho}{2}\|x - z\|_2^2

If you remember the scaled form above, the first two optimizations can be written down immediately.
ADMM - Form II

\min f(x) \text{ subject to } x \in C \quad\text{converts into}\quad \min f(x) + g(z) \text{ subject to } x - z = 0,

with the indicator function

g(z) = \begin{cases} 0 & \text{if } z \in C \\ \infty & \text{otherwise} \end{cases}

The augmented Lagrangian can be written as

L(x, z, u) = f(x) + g(z) + \frac{1}{2\lambda}\|x - z + u\|_2^2

x^{k+1} = \arg\min_x \left\{ f(x) + \frac{1}{2\lambda}\|x - z^k + u^k\|_2^2 \right\}
z^{k+1} = \Pi_C\left(x^{k+1} + u^k\right) \quad (\Pi_C \text{ stands for projection onto the convex set } C)
u^{k+1} = u^k + x^{k+1} - z^{k+1}

Compare with the unscaled (y-form) updates:

x^{k+1} = \arg\min_x \left\{ f(x) + \frac{\rho}{2}\left\|x - z^k + \frac{1}{\rho}y^k\right\|_2^2 \right\}
z^{k+1} = \arg\min_z \left\{ g(z) + \frac{\rho}{2}\left\|x^{k+1} - z + \frac{1}{\rho}y^k\right\|_2^2 \right\}
y^{k+1} = y^k + \rho\left(x^{k+1} - z^{k+1}\right)
Explanation for the projection  z^{k+1} = \Pi_C\left(x^{k+1} + u^k\right)

Consider the augmented Lagrangian

L(x, z, u) = f(x) + g(z) + \frac{1}{2\lambda}\|x - z + u\|_2^2

Assume x^{k+1} and u^k are known. Then

L(x^{k+1}, z, u^k) = \underbrace{f(x^{k+1})}_{\text{constant}} + g(z) + \frac{1}{2\lambda}\left\|x^{k+1} - z + u^k\right\|_2^2

Omitting constant terms,

L(x^{k+1}, z, u^k) = g(z) + \frac{1}{2\lambda}\left\|x^{k+1} - z + u^k\right\|_2^2

But g(z) is an indicator function: it is minimal (zero) when z \in C.
The second term is minimal when z = x^{k+1} + u^k.
However, z = x^{k+1} + u^k may not lie in C; if it does not, g(z) is infinite.
So we project x^{k+1} + u^k onto C.
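For a concrete case, here is a minimal sketch (my own illustration, not from the slides) of this z-update when C is the nonnegative orthant, where the projection is simply elementwise clipping:

```python
import numpy as np

def project_nonneg(v):
    """Projection onto C = {z : z >= 0}: clip negative entries to zero."""
    return np.maximum(v, 0.0)

# Form-II z-update for this choice of C:  z_next = project_nonneg(x_next + u)
print(project_nonneg(np.array([1.2, -0.4, 0.0, 3.0])))  # -> [1.2 0.  0.  3. ]
```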
Linear and Quadratic Programming

\min \; \frac{1}{2}x^T P x + q^T x \quad \text{subject to } Ax = b, \; x \ge 0

reduces to

\min f(x) + g(z) \quad \text{subject to } x - z = 0,

with

f(x) = \frac{1}{2}x^T P x + q^T x, \quad \mathrm{dom}\, f = \{x \mid Ax = b\}, \qquad g(z) = \text{indicator function for } z \ge 0

L(x, z, u) = f(x) + g(z) + \frac{1}{2\lambda}\|x - z + u\|_2^2

x^{k+1} = \arg\min_{x:\, Ax = b} \left\{ f(x) + \frac{1}{2\lambda}\|x - z^k + u^k\|_2^2 \right\}
LP and QP

The x-update is an equality-constrained quadratic minimization. Introduce a multiplier v for Ax = b:

L(x, v) = \frac{1}{2}x^T P x + q^T x + \frac{1}{2\lambda}\|x - z^k + u^k\|_2^2 + v^T(Ax - b)

\frac{\partial L}{\partial x} = Px + q + \frac{1}{\lambda}\left(x - z^k + u^k\right) + A^T v = 0
\;\Rightarrow\; \left(P + \frac{1}{\lambda}I\right)x^{k+1} + A^T v = \frac{1}{\lambda}\left(z^k - u^k\right) - q

\frac{\partial L}{\partial v} = Ax^{k+1} - b = 0

Together, in matrix form:

\begin{bmatrix} P + \frac{1}{\lambda}I & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} x^{k+1} \\ v \end{bmatrix}
=
\begin{bmatrix} \frac{1}{\lambda}\left(z^k - u^k\right) - q \\ b \end{bmatrix}
\quad (1)

z^{k+1} = \Pi_{z \ge 0}\left(x^{k+1} + u^k\right) = \left(x^{k+1} + u^k\right)_{+} \quad (2)

u^{k+1} = u^k + x^{k+1} - z^{k+1} \quad (3)
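A minimal NumPy sketch of this ADMM for the QP, solving the KKT system (1) at each iteration; the data P, q, A, b at the bottom are made-up illustrative values, not from the slides:

```python
import numpy as np

def admm_qp(P, q, A, b, lam=1.0, num_iters=200):
    """ADMM sketch for min 0.5 x'Px + q'x  subject to  Ax = b, x >= 0."""
    n, m = P.shape[0], A.shape[0]
    z = np.zeros(n)
    u = np.zeros(n)
    # KKT matrix of the x-update, equation (1); it is fixed, so build it once
    K = np.block([[P + np.eye(n) / lam, A.T],
                  [A, np.zeros((m, m))]])
    for _ in range(num_iters):
        rhs = np.concatenate([(z - u) / lam - q, b])
        x = np.linalg.solve(K, rhs)[:n]      # x-update: solve the KKT system
        z = np.maximum(x + u, 0.0)           # z-update: project onto z >= 0
        u = u + x - z                        # dual update
    return x

# Tiny assumed example: min 0.5||x||^2 - x1 - 2 x2  s.t.  x1 + x2 = 1, x >= 0
P = np.eye(2); q = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
print(admm_qp(P, q, A, b))   # converges towards [0, 1]
```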
LP

Exercise: choose any three of the 16 example problems given on the ADMM web page by Boyd:

https://web.stanford.edu/~boyd/papers/admm/
ADMM for two more optimization problems

(LP, QP, SVM, LASSO, BP, LAD)
LASSO (Least Absolute Shrinkage and Selection Operator)

The following l1-regularized least-squares problem:

\min_x \; \frac{1}{2}\|Ax - b\|_2^2 + \gamma\|x\|_1, \quad \text{or equivalently} \quad \min_x \; \frac{1}{2}(Ax - b)^T(Ax - b) + \gamma\sum_i |x_i|,

is called the LASSO.

It is prevalent all across machine learning, model selection in statistics, and compressed sensing in signal processing.

The \gamma > 0 above is a user-defined "smoothing" (regularization) parameter.

Taking f(x) = \frac{1}{2}\|Ax - b\|_2^2 and g(z) = \gamma\|z\|_1,
the ADMM formulation is:

\min_{x,z} \; f(x) + g(z) \quad \text{subject to } x - z = 0 \text{ (vector)}

The augmented Lagrangian is

L(x, z, y) = f(x) + g(z) + y^T(x - z) + \frac{\rho}{2}(x - z)^T(x - z)
           = \frac{1}{2}(Ax - b)^T(Ax - b) + \gamma\|z\|_1 + y^T(x - z) + \frac{\rho}{2}(x - z)^T(x - z)

The reduced (scaled) equivalent form is (y is replaced by the equivalent u):

L(x, z, u) = \frac{1}{2}(Ax - b)^T(Ax - b) + \gamma\|z\|_1 + \frac{1}{2\lambda}\|x - z + u\|_2^2

x-update:

x^{k+1} = \arg\min_x \; \frac{1}{2}(Ax - b)^T(Ax - b) + \frac{1}{2\lambda}\left\|x - z^k + u^k\right\|_2^2

Differentiating and setting equal to the 0 vector:

A^T(Ax - b) + \frac{1}{\lambda}\left(x - z^k + u^k\right) = 0

A^T A x + \frac{1}{\lambda}x = A^T b + \frac{1}{\lambda}\left(z^k - u^k\right)

\left(A^T A + \frac{1}{\lambda}I\right)x = A^T b + \frac{1}{\lambda}\left(z^k - u^k\right)

x^{k+1} = \left(A^T A + \frac{1}{\lambda}I\right)^{-1}\left(A^T b + \frac{1}{\lambda}\left(z^k - u^k\right)\right)
The update of z is:

z^{k+1} = \arg\min_z \; \gamma\|z\|_1 + \frac{1}{2\lambda}\left\|x^{k+1} - z + u^k\right\|_2^2 = S_{\gamma\lambda}\left(x^{k+1} + u^k\right)

Remember:

\min_x \; \lambda\|x\|_1 + \frac{1}{2}\|x - c\|^2: \quad \lambda\,\mathrm{sign}(x) + (x - c) \ni 0 \;\Rightarrow\; x^* = c - \lambda\,\mathrm{sign}(c)

x^* = S_\lambda(c) = \begin{cases} c - \lambda\,\mathrm{sign}(c), & \text{if } \lambda < |c| \\ 0, & \text{if } \lambda \ge |c| \end{cases}
To summarise (LASSO via ADMM):

x^{k+1} = \left(A^T A + \frac{1}{\lambda}I\right)^{-1}\left(A^T b + \frac{1}{\lambda}\left(z^k - u^k\right)\right)
z^{k+1} = S_{\gamma\lambda}\left(x^{k+1} + u^k\right)
u^{k+1} = u^k + \left(x^{k+1} - z^{k+1}\right)
Basis Pursuit

A good proxy for finding the sparsest solution to an underdetermined system of equations Ax = b is to solve

\min \|x\|_1 \quad \text{subject to } Ax = b

ADMM formulation:

\min f(x) + g(z) \quad \text{subject to } x - z = 0 \text{ (vector)},

where

f(x) = \|x\|_1, \qquad g(z) = \begin{cases} 0, & Az = b \\ \infty, & \text{otherwise} \end{cases}
Projection onto Az = b

Suppose in some iteration we have an approximate z = z_{app} with Az_{app} \ne b.
What we need is A\left(z_{app} + e\right) = b, i.e.

Ae = b - Az_{app} \;\Rightarrow\; e = \mathrm{pinv}(A)\left(b - Az_{app}\right)

\Rightarrow z^{k+1} = z_{app} + e = z_{app} + \mathrm{pinv}(A)\left(b - Az_{app}\right)

The augmented Lagrangian is

L(x, z, u) = \|x\|_1 + g(z) + \frac{1}{2\lambda}\|x - z + u\|_2^2

x^{k+1} = \arg\min_x \; \|x\|_1 + \frac{1}{2\lambda}\left\|x - z^k + u^k\right\|_2^2 = S_\lambda\left(z^k - u^k\right)

z^{k+1} = \arg\min_z \; g(z) + \frac{1}{2\lambda}\left\|x^{k+1} - z + u^k\right\|_2^2
        = \Pi_{Az=b}\left(x^{k+1} + u^k\right)
        = \left(x^{k+1} + u^k\right) + \mathrm{pinv}(A)\left(b - A\left(x^{k+1} + u^k\right)\right)

u^{k+1} = u^k + \left(x^{k+1} - z^{k+1}\right)

Compressed sensing
Single pixel camera
Multispectral camera
Sampling theorem
