LASSO by Coordinate Descent Method
Preparation:
from itertools import cycle
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import lasso_path, enet_path
from sklearn import datasets
from copy import deepcopy
# Synthetic data: 100 samples, 10 features, noiseless linear target
X = np.random.randn(100, 10)
y = np.dot(X, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
The code below is a simplified version of `_cd_fast.enet_coordinate_descent()` with `beta=0` and `l1_ratio=1` from scikit-learn (source code: lasso coordinate descent source). The original is implemented in Cython; the code here is pure Python for convenience: easier to understand, but much slower.
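Since `lasso_path` is already imported above, scikit-learn's own solver can serve as a reference to check the pure-Python version against. A minimal sketch (the regularization strength 0.1 is an arbitrary choice for illustration):

# Reference coefficients from scikit-learn's Cython coordinate descent
alphas, coefs, _ = lasso_path(X, y, alphas=[0.1])
print(coefs[:, 0])  # fitted coefficients at alpha = 0.1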
Coordinate Descent Method Framework
- Randomly initialize $\beta^{(0)}$ for iteration $0$
- For the $k$-th iteration:
    - For $j = 1$ to $p$:
        - $\beta^{(k)}_j = \operatorname{argmin}_{\beta_j} \mathcal{L}_{l1}(\beta) = \operatorname{argmin}_{\beta_j} \mathcal{L}_{l1}\left(\beta_1^{(k)}, \beta_2^{(k)}, \ldots, \beta_{j-1}^{(k)}, \beta_j, \beta_{j+1}^{(k-1)}, \ldots, \beta_p^{(k-1)}\right)$
    - Endfor
    - Check convergence: if converged, end the algorithm; otherwise continue updating (see the Python sketch below)
- Endfor
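A pure-Python sketch of this framework follows. The names `soft_threshold` and `coordinate_descent_lasso` are illustrative, not scikit-learn's API; the soft-thresholding form of the per-coordinate argmin is the standard closed-form solution, which the derivation below arrives at.

def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0)
    return np.sign(z) * max(abs(z) - t, 0.0)

def coordinate_descent_lasso(X, y, lam, n_iters=100, tol=1e-6):
    N, p = X.shape
    beta = np.zeros(p)  # beta^(0): start from all zeros
    for k in range(n_iters):
        beta_old = beta.copy()
        for j in range(p):
            # residual with the contribution of the j-th coordinate removed
            r_j = y - X @ beta + X[:, j] * beta[j]
            # 1-D argmin of L_l1 over beta_j: a soft-thresholding step
            beta[j] = soft_threshold(X[:, j] @ r_j / N, lam) / (X[:, j] @ X[:, j] / N)
        # convergence check: largest coordinate change below tol
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta

On the synthetic data above, `coordinate_descent_lasso(X, y, lam=0.1)` should recover coefficients close to `[1, 2, ..., 10]`, slightly shrunk toward zero by the penalty.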
Here the objective function is
$$\mathcal{L}_{l1} = \frac{1}{2N}(Y - X\beta)^T (Y - X\beta) + \lambda \left\lVert \beta \right\rVert_1$$

where the sizes of $X$, $Y$, $\beta$ are $N \times p$, $N \times 1$, $p \times 1$, which means $N$ samples and $p$ features.
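For reference, the objective maps directly to a few lines of NumPy. This small helper is written here for illustration and is not part of scikit-learn:

def lasso_objective(X, y, beta, lam):
    N = X.shape[0]
    residual = y - X @ beta
    # (1/2N) * ||y - X beta||^2 + lambda * ||beta||_1
    return residual @ residual / (2 * N) + lam * np.sum(np.abs(beta))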
Coordinate Descent Method Update Details
To update $\beta_j$, we need to find $\beta_j^*$ such that $\frac{\partial \mathcal{L}_{l1}(\beta)}{\partial \beta_j} = 0$.
Given