Modified projected Gauss-Newton method for constrained nonlinear least-squares: application to power flow analysis

Yassine Nabou$^{1}$, Lucian Toma$^{2}$ and Ion Necoara$^{1,3}$

$^{1}$Automatic Control and Systems Engineering Department, University Politehnica Bucharest, 060042 Bucharest, Romania. [email protected]; [email protected]

$^{2}$Electrical Power Systems Department, University Politehnica Bucharest, 060042 Bucharest, Romania. [email protected].

$^{3}$Gheorghe Mihoc-Caius Iacob Institute of Mathematical Statistics and Applied Mathematics of the Romanian Academy, 050711 Bucharest, Romania.

Abstract

In this paper, we consider a modified projected Gauss-Newton method for solving constrained nonlinear least-squares problems. We assume that the functional constraints are smooth and that the other constraints are represented by a simple closed convex set. We formulate the nonlinear least-squares problem as an optimization problem using the Euclidean norm as a merit function. At each iteration, our method linearizes the functional constraints inside the merit function at the current point and adds a quadratic regularization, yielding a strongly convex subproblem that is easy to solve and whose solution is the next iterate. We present global convergence guarantees for the proposed method under mild assumptions. In particular, we prove convergence to stationary points and, under the Kurdyka-Lojasiewicz (KL) property of the objective function, we derive convergence rates depending on the KL parameter. Finally, we show the efficiency of this method on the power flow analysis problem using several IEEE bus test cases.

I Introduction

In many areas of engineering, such as maximum likelihood estimation, nonlinear data fitting, parameter estimation or power flow analysis, one finds applications that can be recast as nonlinear least-squares problems of the form [11, 8, 6]:

$$\min\ \|F(x)\| \quad \text{s.t.} \quad x\in\mathbf{C}\subseteq\mathbb{R}^n, \tag{1}$$

where $\mathbf{C}$ is a closed convex set and $F=(F_1,\cdots,F_m)$, with $F_i:\mathbb{R}^n\to\mathbb{R}$ for $i=1:m$, are nonlinear differentiable functions. When $\mathbf{C}=\mathbb{R}^n$ and $m=n$, problem (1) is equivalent to a square system of nonlinear equations. Several algorithms have been proposed for solving this problem, the most popular being the Newton-Raphson (NR) method [22]. The NR method uses the inverse of the Jacobian matrix to update the iterates, i.e., the iterations have the following form:

$$x^{+}=x-\nabla F(x)^{-1}F(x),$$

where $x$ is the current iterate and $\nabla F(x)$ is the Jacobian matrix of $F$ at $x$. Although NR converges fast, it has several drawbacks. First, the Jacobian may be degenerate at the current point, in which case the method is not well defined. Second, convergence is not guaranteed when the initial point $x_0$ is far from the optimum [13]. Many approaches have been proposed to deal with these challenges, e.g., improving the starting point [3] or using different approximations of the Jacobian [4, 1]. In [17], Nesterov proposed a modified Gauss-Newton scheme (M-GN) for solving unconstrained nonlinear least-squares problems. The M-GN method constructs a convex model by linearizing the nonlinear function $F$ inside a sharp merit function and adding a quadratic regularization term, i.e.:

$$x^{+}=\arg\min_{y\in\mathbb{R}^n}\|F(x)+\nabla F(x)(y-x)\|+\frac{M}{2}\|y-x\|^2.$$

When $M=0$ and the Jacobian is invertible, we recover the NR method described above. In [17] it was proved that, under a nondegeneracy assumption (i.e., $\sigma_{\min}(\nabla F(x))>0$ for all $x$ in the level set of $\|F(x_0)\|$, where $x_0$ is the starting point and $\sigma_{\min}$ denotes the smallest singular value), this scheme converges globally. Moreover, the solution of each subproblem can be computed with a standard convex optimization solver. Further, problem (1) is equivalent to the following composite optimization problem:

$$\min_{x\in\mathbb{R}^n}\|F(x)\|^2+I_{\mathbf{C}}(x), \tag{2}$$

where $I_{\mathbf{C}}$ is the indicator function of the convex set $\mathbf{C}$. Note that using the norm $\|\cdot\|$ as the merit function is more beneficial than using $\|\cdot\|^2$, since in the latter case the condition number is doubled. Another possible algorithm for solving this problem is Projected Gradient Descent (PGD) [18, 9, 20, 21]. The standard PGD algorithm is given by:

$$x^{+}=\Pi_{\mathbf{C}}\left(x-\alpha\nabla F(x)^{T}F(x)\right),$$

where $\Pi_{\mathbf{C}}$ is the projection operator onto $\mathbf{C}$ (see Section II) and $\alpha$ is a step size. PGD is simple and easy to implement, but its main drawback is slow convergence.
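As a rough illustration of the PGD update, the following sketch applies it to a toy problem that is not taken from the paper. The residual $F(x)=(x_1^2+x_2^2-1,\;x_1-x_2)$, the box $\mathbf{C}=[0,2]^2$, the step size $\alpha=0.1$ and the iteration count are all assumptions chosen for illustration. The step direction is the gradient of $\frac{1}{2}\|F(x)\|^2$, i.e., the transposed Jacobian applied to $F(x)$.

```python
import math

def F(x):
    # Toy residual (an assumption for illustration): unit circle and the line x1 = x2.
    x1, x2 = x
    return [x1 * x1 + x2 * x2 - 1.0, x1 - x2]

def jacobian(x):
    # Jacobian of the toy residual above, one row per component of F.
    x1, x2 = x
    return [[2.0 * x1, 2.0 * x2], [1.0, -1.0]]

def project_box(x, lo, hi):
    # Euclidean projection onto the box [lo, hi]^n is a componentwise clip.
    return [min(max(xi, lo), hi) for xi in x]

def pgd(x0, alpha=0.1, iters=300, lo=0.0, hi=2.0):
    x = list(x0)
    for _ in range(iters):
        r = F(x)
        J = jacobian(x)
        # Gradient of 0.5 * ||F(x)||^2 is J(x)^T F(x).
        g = [J[0][j] * r[0] + J[1][j] * r[1] for j in range(2)]
        x = project_box([x[j] - alpha * g[j] for j in range(2)], lo, hi)
    return x

x_pgd = pgd([1.5, 0.2])
res_pgd = math.hypot(*F(x_pgd))  # Euclidean norm of the final residual
print(x_pgd, res_pgd)
```

On this toy instance the iterates approach the point $x_1=x_2=1/\sqrt{2}$, but only at a linear rate, which is consistent with the slow convergence mentioned above.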

A natural question is whether one can prove global convergence of the M-GN method without the nondegeneracy assumption on the Jacobian $\nabla F(x)$, i.e., without assuming $\sigma_{\min}(\nabla F(x))>0$ for all $x$ in the level set of $\|F(x_0)\|$ (see (5)). Such a condition is conservative and is not always satisfied in practice. In this paper we answer this question positively: we consider a Modified Projected Gauss-Newton method (MPG-N) for solving problem (1), where $\mathbf{C}$ is a simple closed convex set. At each iteration, MPG-N solves the following strongly convex subproblem:

$$x_{k+1}=\arg\min_{x\in\mathbf{C}}\|F(x_k)+\nabla F(x_k)(x-x_k)\|+\frac{M}{2}\|x-x_k\|^2, \tag{3}$$

which is a slightly modified version of the scheme in [17], as it takes into account the constraints $x\in\mathbf{C}$. We prove, under mild assumptions, that this scheme achieves global convergence without any nondegeneracy assumption on the Jacobian matrix. More precisely, we prove that any limit point of the sequence generated by MPG-N is a stationary point and, under the Kurdyka-Lojasiewicz (KL) property, we derive convergence rates in function value depending on the KL parameter. Finally, we consider a power flow analysis problem whose functional constraints usually do not satisfy the nondegeneracy assumption, while the problem satisfies the KL property. We compare the performance of MPG-N with the projected gradient scheme and demonstrate the efficiency of the proposed method on several IEEE bus test cases.

Content. The rest of the paper is organized as follows: Section II provides some notations and preliminaries, Section III presents the new algorithm and the convergence results, Section IV describes the power flow analysis problem and numerical results on several IEEE bus test cases.

II Notations and preliminaries

We denote a finite-dimensional real vector space by $\mathbb{E}$ and by $\mathbb{E}^*$ its dual space, composed of linear functions on $\mathbb{E}$. Using a self-adjoint positive-definite operator $D:\mathbb{E}\rightarrow\mathbb{E}^*$ (notation $D=D^*\succ 0$), we can endow these spaces with conjugate Euclidean norms:

$$\lVert x\rVert=\langle Dx,x\rangle^{\frac{1}{2}},\quad x\in\mathbb{E},\qquad \lVert g\rVert_*=\langle g,D^{-1}g\rangle^{\frac{1}{2}},\quad g\in\mathbb{E}^*.$$

For simplicity, in the following we consider $\mathbb{E}=\mathbb{R}^n$ and $D$ the identity matrix. Let $F=(F_1,\cdots,F_m)$, where the $F_i$'s, $i=1:m$, are differentiable functions and the Jacobian is Lipschitz continuous, i.e.:

$$\|\nabla F(x)-\nabla F(y)\|\leq L_F\|x-y\|\quad\forall x,y\in\mathbb{R}^n.$$

It follows that [17]:

$$\|F(x)\|-\|F(y)+\nabla F(y)(x-y)\|\leq\frac{L_F}{2}\|x-y\|^2\quad\forall x,y\in\mathbb{R}^n. \tag{4}$$
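For completeness, inequality (4) can be recovered in two standard steps, sketched here as a reading aid rather than as the argument of [17]: the reverse triangle inequality, followed by the integral form of the first-order Taylor remainder combined with the Lipschitz continuity of the Jacobian.

```latex
\begin{aligned}
\|F(x)\| - \|F(y) + \nabla F(y)(x-y)\|
  &\le \|F(x) - F(y) - \nabla F(y)(x-y)\| \\
  &= \Big\| \int_0^1 \big[\nabla F(y + t(x-y)) - \nabla F(y)\big](x-y)\,dt \Big\| \\
  &\le \int_0^1 L_F\, t\, \|x-y\|^2 \,dt
   = \frac{L_F}{2}\|x-y\|^2 .
\end{aligned}
```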

Let $h$ be a proper lower semicontinuous function and $\mu>0$. Then, the proximal operator with respect to $h$ is:

$$\text{prox}_{\mu h}(x)=\arg\min_{y}\; h(y)+\frac{\mu}{2}\|y-x\|^2,$$

and the Moreau envelope is defined as:

$$h_{\mu}(x)=\min_{y}\; h(y)+\frac{\mu}{2}\|y-x\|^2.$$
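To make these two definitions concrete, here is a minimal sketch for the scalar function $h(y)=|y|$, which is an example chosen for illustration and not one used in the paper. With the convention above, where $\mu$ multiplies the quadratic term, the proximal operator is soft-thresholding with threshold $1/\mu$ and the Moreau envelope is a Huber-type function. The function names are hypothetical.

```python
def prox_abs(x, mu):
    """Minimizer of |y| + (mu/2)*(y - x)**2: soft-thresholding with threshold 1/mu."""
    t = 1.0 / mu
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def moreau_abs(x, mu):
    """Moreau envelope of |.| at x, evaluated at the proximal point."""
    y = prox_abs(x, mu)
    return abs(y) + 0.5 * mu * (y - x) ** 2

def huber(x, mu):
    """Closed form of the same envelope: quadratic near 0, shifted |x| elsewhere."""
    if abs(x) <= 1.0 / mu:
        return 0.5 * mu * x * x
    return abs(x) - 0.5 / mu

for x in (-3.0, -0.2, 0.0, 0.4, 2.5):
    print(x, prox_abs(x, 2.0), moreau_abs(x, 2.0), huber(x, 2.0))
```

The envelope is smooth even though $|y|$ is not, which is the usual reason for introducing it.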

When $h$ is the indicator function $I_{\mathbf{C}}$ of a convex set $\mathbf{C}$, the proximal operator is the projection:

$$\text{prox}_{\mu I_{\mathbf{C}}}(x)=\Pi_{\mathbf{C}}(x)=\arg\min_{y\in\mathbf{C}}\|y-x\|^2.$$
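For sets that are "simple" in the sense used in this paper, the projection has a closed form. The sketch below shows two common cases, a box and a Euclidean ball centered at the origin. Both sets and the function names are illustrative assumptions, not the constraint sets of the power flow application.

```python
import math

def project_box(x, lower, upper):
    # Projection onto {y : lower <= y <= upper} is a componentwise clip.
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

def project_ball(x, radius=1.0):
    # Projection onto {y : ||y|| <= radius}: rescale only if x lies outside the ball.
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= radius:
        return list(x)
    return [radius * xi / norm for xi in x]

print(project_box([2.0, -3.0, 0.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
print(project_ball([3.0, 4.0]))
```

In both cases the cost is linear in the dimension, so the projection step does not dominate the cost of an iteration.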

We say that $h$ is $\mu$-weakly convex if the function

$$x\mapsto h(x)+\frac{\mu}{2}\|x\|^2$$

is convex. The level set of $h$ at $x_0$ is defined as:

$$\mathcal{L}(h(x_0)):=\{x\in\mathbb{R}^n: h(x)\leq h(x_0)\}. \tag{5}$$

Next, we provide a few definitions and properties concerning subdifferential calculus (see also [14, 19]).

Definition 1

(Subdifferential): Let $f:\mathbb{R}^n\to\bar{\mathbb{R}}$ be a proper lower semicontinuous function. For a given $x\in\text{dom}\,f$, the Fréchet subdifferential of $f$ at $x$, written $\widehat{\partial}f(x)$, is the set of all vectors $g_x\in\mathbb{R}^n$ satisfying:

$$\liminf_{y\neq x,\,y\to x}\frac{f(y)-f(x)-\langle g_x,y-x\rangle}{\lVert x-y\rVert}\geq 0.$$

When $x\notin\text{dom}\,f$, we set $\widehat{\partial}f(x)=\emptyset$. The limiting subdifferential, or simply the subdifferential, of $f$ at $x\in\text{dom}\,f$, written $\partial f(x)$, is defined through the following closure process [14]:

$$\partial f(x):=\left\{g_x\in\mathbb{E}^*:\ \exists\, x^k\to x\ \text{with}\ f(x^k)\to f(x)\ \text{and}\ \exists\, g_x^k\in\widehat{\partial}f(x^k)\ \text{with}\ g_x^k\to g_x\right\}.$$
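Two scalar examples, added here for illustration and not taken from the paper, show how the two subdifferentials can agree or differ at a nonsmooth point. For the convex function $|x|$ they coincide at the origin, whereas for $-|x|$ the Fréchet subdifferential is empty there while the limiting one collects the limits of the nearby gradients.

```latex
\begin{aligned}
f(x) &= |x|:  & \widehat{\partial} f(0) &= [-1,1], & \partial f(0) &= [-1,1], \\
f(x) &= -|x|: & \widehat{\partial} f(0) &= \emptyset, & \partial f(0) &= \{-1,\,1\}.
\end{aligned}
```

The second case also shows that the limiting subdifferential need not be convex.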

Note that we have $\widehat{\partial}f(x)\subseteq\partial f(x)$ for each $x\in\text{dom}\,f$. In this inclusion, the first set is closed and convex, while the second one is closed, see e.g., [19] (Theorem 8.6). For any $x\in\text{dom}\,f$ let us define:

$$S_f(x)=\text{dist}\big(0,\partial f(x)\big):=\inf_{g_x\in\partial f(x)}\lVert g_x\rVert.$$

If $\partial f(x)=\emptyset$, we set $S_f(x)=\infty$. Let us also recall the definition of a function satisfying the Kurdyka-Lojasiewicz (KL) property (see [2] for more details).

Definition 2

A proper lower semicontinuous function $f:\mathbb{R}^n\rightarrow\bar{\mathbb{R}}$ satisfies the Kurdyka-Lojasiewicz (KL) property on the compact set $\Omega\subseteq\text{dom}\,f$, on which $f$ takes a constant value $f_*$, if there exist $\delta,\epsilon,q>0$ and a constant $\sigma_q>0$ such that:

$$f(x)-f_*\leq\sigma_q\,S_f(x)^q\quad\forall x:\ \text{dist}(x,\Omega)\leq\delta,\ f_*<f(x)<f_*+\epsilon. \tag{6}$$

Note that the relevant aspect of the KL property is when $\Omega$ is a subset of critical points of $f$, i.e., $\Omega\subseteq\{x:0\in\partial f(x)\}$, since it is easy to establish the KL property when $\Omega$ is not related to critical points. The KL property holds for a large class of functions, including semi-algebraic functions (e.g., real polynomial functions), vector or matrix (semi)norms (e.g., $\|\cdot\|_p$ with $p\geq 0$ a rational number), trigonometric functions, logarithm functions, exponential functions and uniformly convex functions; see [2] for a comprehensive list.
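As a small worked illustration of the KL inequality (6), consider two scalar functions with $\Omega=\{0\}$ and $f_*=0$. These examples are assumptions chosen for clarity and are not the power flow objective. The quadratic satisfies (6) with exponent $q=2$, while the absolute value satisfies it for any $q>0$ on the region where $f(x)<\epsilon$.

```latex
\begin{aligned}
f(x) &= x^2: & S_f(x) &= 2|x|, &
  f(x) - f_* &= x^2 = \tfrac{1}{4}\, S_f(x)^2
  && (q = 2,\ \sigma_q = \tfrac{1}{4}), \\
f(x) &= |x|: & S_f(x) &= 1 \ \ (x \neq 0), &
  f(x) - f_* &= |x| < \epsilon = \epsilon\, S_f(x)^q
  && (\text{any } q > 0,\ \sigma_q = \epsilon).
\end{aligned}
```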

III Modified Projected Gauss-Newton method

In this section, we present the Modified Projected Gauss-Newton (MPG-N) method and then derive convergence results. Recall that the problem of interest is:

$$\min_{x\in\mathbb{R}^n} f(x):=\|F(x)\|+I_{\mathbf{C}}(x). \tag{7}$$

We consider the following assumption:

Assumption 1
  1. $F$ is differentiable and its Jacobian is Lipschitz continuous:

    $$\|\nabla F(x)-\nabla F(y)\|\leq L_F\|x-y\|,\quad\forall x,y\in\mathbf{C}.$$

  2. Problem (7) has a solution, i.e., there exists $x^*\in\mathbf{C}$ such that $f(x^*)>-\infty$.

An immediate consequence of (4) is:

$$f(x)\leq\|F(y)+\nabla F(y)(x-y)\|+\frac{L_F}{2}\|x-y\|^2\quad\forall x,y\in\mathbf{C}.$$

Then, for $M>0$, we define the modified projected Gauss-Newton iterate at a point $x\in\mathbf{C}$ as follows:

$$\begin{aligned}T_M(x)&=\arg\min_{y\in\mathbf{C}}\Psi_M(y;x)\\&:=\arg\min_{y\in\mathbf{C}}\|F(x)+\nabla F(x)(y-x)\|+\frac{M}{2}\|y-x\|^2.\end{aligned}\tag{8}$$

Note that this subproblem is strongly convex, hence $T_M(x)$ is well defined and unique. Finally, the modified projected Gauss-Newton algorithm is as follows:

MPG-N algorithm
Choose $x_0\in\mathbf{C}$ and $L_0,\delta>0$. For $k\geq 0$ do:
1. Find $L_0\leq M_k\leq 2L_F$ such that:
$$\frac{\delta}{2}\|T_{M_k}(x_k)-x_k\|^2\leq\Psi_{M_k}\left(T_{M_k}(x_k);x_k\right)-f\left(T_{M_k}(x_k)\right). \tag{9}$$
2. Update $x_{k+1}=T_{M_k}(x_k)$.

The first step of the MPG-N algorithm consists of finding a constant $M_k>0$ such that inequality (9) holds. If the constant $L_F$ is known, we can take $M_k=L_F+\delta$. Otherwise, we can apply the following line search procedure [16]:

$$\begin{aligned}&\textbf{While}\ (9)\ \text{is not satisfied}\ \textbf{do}\ \ M_k=2M_k,\\&M_{k+1}=\max\left(\frac{M_k}{2},L_0\right).\end{aligned}$$
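The algorithm and its line search can be sketched in a few lines for the scalar case $n=m=1$ with $\mathbf{C}=[\text{lo},\text{hi}]$. This is a minimal illustration under stated assumptions, not the authors' implementation. The closed-form subproblem solution used in `gn_step` is specific to one dimension, the stopping test on $|F(x)|$ assumes a zero-residual problem, and the cap of 60 doublings is a floating-point safeguard. The test function $F(x)=x^2-2$ and all parameter values are likewise illustrative.

```python
def clip(v, lo, hi):
    return min(max(v, lo), hi)

def gn_step(x, r, J, M, lo, hi):
    """Minimizer over y in [lo, hi] of |r + J*(y - x)| + (M/2)*(y - x)**2 (scalar case)."""
    if abs(r) * M >= J * J:
        s = 1.0 if r > 0 else -1.0   # linearized residual keeps its sign
    else:
        s = r * M / (J * J)          # step lands on the kink r + J*d = 0
    d = -s * J / M
    # In one dimension the objective is convex, so clipping the
    # unconstrained minimizer onto the interval gives the constrained one.
    return clip(x + d, lo, hi)

def mpgn_scalar(F, dF, x0, lo, hi, L0=0.5, delta=0.1, tol=1e-10, max_iter=100):
    x, M = x0, L0
    for _ in range(max_iter):
        r, J = F(x), dF(x)
        if abs(r) <= tol:            # assumed stopping rule (zero-residual problem)
            break
        for _ in range(60):          # safeguard: at most 60 doublings of M
            y = gn_step(x, r, J, M, lo, hi)
            psi = abs(r + J * (y - x)) + 0.5 * M * (y - x) ** 2
            if 0.5 * delta * (y - x) ** 2 <= psi - abs(F(y)):   # test (9)
                break
            M *= 2.0
        x = y
        M = max(M / 2.0, L0)         # relax M for the next iteration
    return x

x_star = mpgn_scalar(lambda x: x * x - 2.0, lambda x: 2.0 * x, 3.0, 0.0, 3.0)
print(x_star)
```

For this test function $L_F=2$, so the acceptance test passes once $M_k\geq 2+\delta$, and the accepted steps coincide with Newton steps near the root.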

The next lemma shows that this process is well defined.

Lemma 1

Let Assumption 1 hold. At the $k$th iteration of the MPG-N algorithm, if $M_k-L_F\geq\delta$, then inequality (9) holds.

Proof:

We have from inequality (4) that:

$$\begin{aligned}&\frac{M_k-L_F}{2}\|T_{M_k}(x_k)-x_k\|^2+f(T_{M_k}(x_k))\\&\quad\leq\|F(x_k)+\nabla F(x_k)(T_{M_k}(x_k)-x_k)\|+\frac{M_k}{2}\|T_{M_k}(x_k)-x_k\|^2\\&\quad=\Psi_{M_k}\left(T_{M_k}(x_k);x_k\right).\end{aligned}$$

Since $M_k-L_F\geq\delta$, it follows immediately that:

$$\frac{\delta}{2}\|T_{M_k}(x_k)-x_k\|^2\leq\Psi_{M_k}\left(T_{M_k}(x_k);x_k\right)-f(T_{M_k}(x_k)).$$

This proves the statement of the lemma. ∎

Note that Lemma 1 ensures that (9) always holds, provided that $M_k \geq L_F + \delta$. In practice, however, the line search procedure allows us to work with small values of $M_k$ (i.e., $M_k \leq L_F$) for which condition (9) still holds. Next, let us discuss the solution of the subproblem (8). Following [18], we have:

$$\begin{aligned}
&\min_{y\in\mathbf{C}} \|F(x) + \nabla F(x)(y-x)\| + \frac{M}{2}\|y-x\|^2 \\
&= \min_{y\in\mathbf{C}}\max_{\|s\|\leq 1} \langle s, F(x) + \nabla F(x)(y-x)\rangle + \frac{M}{2}\|y-x\|^2 \\
&= \max_{\|s\|\leq 1}\min_{y\in\mathbf{C}} \langle s, F(x) + \nabla F(x)(y-x)\rangle + \frac{M}{2}\|y-x\|^2 \\
&= \max_{\|s\|\leq 1}\min_{y\in\mathbf{C}} \langle \nabla F(x)^T s, y-x\rangle + \frac{M}{2}\|y-x\|^2 + \langle s, F(x)\rangle \\
&= \max_{\|s\|\leq 1}\min_{y\in\mathbf{C}} \frac{M}{2}\left\|y - x + \frac{1}{M}\nabla F(x)^T s\right\|^2 - \frac{1}{2M}\|\nabla F(x)^T s\|^2 + \langle s, F(x)\rangle \\
&= \max_{\|s\|\leq 1} \frac{M}{2}\left\|\Pi_{\mathbf{C}}\left(x - \frac{1}{M}\nabla F(x)^T s\right) - x + \frac{1}{M}\nabla F(x)^T s\right\|^2 - \frac{1}{2M}\|\nabla F(x)^T s\|^2 + \langle s, F(x)\rangle,
\end{aligned}$$

which is a concave maximization problem in $s$ over the unit ball and can be solved with standard convex optimization tools, such as trust-region methods [5].
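The dual reformulation above lends itself to a simple first-order scheme. The following is a minimal sketch, not the solver used in the paper: it assumes that $\mathbf{C}$ is a box (so that $\Pi_{\mathbf{C}}$ is a componentwise clip) and it replaces the trust-region method of [5] by projected gradient ascent on the unit ball. The names `solve_subproblem`, `primal_point`, `primal_value` and `dual_value` are hypothetical. The gradient of the dual function with respect to $s$ is $F(x) + \nabla F(x)(y(s) - x)$, where $y(s) = \Pi_{\mathbf{C}}(x - \frac{1}{M}\nabla F(x)^T s)$ is also the primal point recovered from $s$.

```python
import numpy as np

def project_ball(s):
    """Euclidean projection onto the unit ball {s : ||s|| <= 1}."""
    nrm = np.linalg.norm(s)
    return s if nrm <= 1.0 else s / nrm

def primal_point(s, x, J, M, lo, hi):
    """Inner minimizer y(s) = Pi_C(x - J^T s / M) for a box C = [lo, hi]."""
    return np.clip(x - (J.T @ s) / M, lo, hi)

def primal_value(y, x, Fx, J, M):
    """Subproblem objective ||F(x) + J (y - x)|| + (M/2) ||y - x||^2."""
    return np.linalg.norm(Fx + J @ (y - x)) + 0.5 * M * np.sum((y - x) ** 2)

def dual_value(s, x, Fx, J, M, lo, hi):
    """Dual function of s, as in the last line of the derivation."""
    g = J.T @ s
    z = x - g / M
    y = np.clip(z, lo, hi)
    return 0.5 * M * np.sum((y - z) ** 2) - np.sum(g ** 2) / (2.0 * M) + s @ Fx

def solve_subproblem(x, Fx, J, M, lo, hi, iters=500):
    """Projected gradient ascent on the concave dual; returns (y, s)."""
    s = np.zeros(Fx.shape[0])
    # The dual gradient is Lipschitz with constant ||J||_2^2 / M.
    step = M / max(np.linalg.norm(J, 2) ** 2, 1e-12)
    for _ in range(iters):
        y = primal_point(s, x, J, M, lo, hi)
        s = project_ball(s + step * (Fx + J @ (y - x)))
    return primal_point(s, x, J, M, lo, hi), s

# Toy data (assumed for illustration): F(x) = [3, 4], identity Jacobian,
# M = 1, and a box that is not active at the solution.
x = np.zeros(2)
Fx = np.array([3.0, 4.0])
J = np.eye(2)
lo, hi = -10.0 * np.ones(2), 10.0 * np.ones(2)
y, s = solve_subproblem(x, Fx, J, 1.0, lo, hi)
```

For this toy data the subproblem can be solved by hand: the minimizer is $y = (-0.6, -0.8)$ and both the primal and the dual values equal $4.5$, which is what the sketch returns. Weak duality guarantees that `dual_value(s, ...)` never exceeds `primal_value(y, ...)` for feasible $s$ and $y$, so the difference of the two can serve as a stopping criterion.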

III-A Convergence analysis

In this section we derive convergence results for the MPG-N algorithm. First, we prove the following descent property:

Lemma 2

Let Assumption 1 hold. Let $(x_k)_{k\geq 0}$ be generated by the MPG-N algorithm. Then, we have:

  1. The sequence $(f(x_k))_{k\geq 0}$ is nonincreasing and satisfies:

    $$\frac{\delta}{2}\|x_{k+1} - x_k\|^2 \leq f(x_k) - f(x_{k+1}). \tag{10}$$

  2. The sequence $(x_k)_{k\geq 0}$ satisfies:

    $$\sum_{k=1}^{\infty}\|x_{k+1} - x_k\|^2 < \infty, \quad \lim_{k\to\infty}\|x_{k+1} - x_k\|^2 = 0.$$

Proof:

We have:

$$\Psi_{M_k}\left(T_{M_k}(x_k); x_k\right) \leq \min_{x\in\mathbf{C}}\Psi_{M_k}\left(x; x_k\right) \leq \Psi_{M_k}\left(x_k; x_k\right) = f(x_k).$$

Then, combining this inequality with inequality (9) we get the first statement. Further, summing up inequality (10) over $k = 0, \dots, N-1$ and using that $f$ is bounded from below by $f^*$, we get:

$$\sum_{k=0}^{N-1}\frac{\delta}{2}\|x_{k+1} - x_k\|^2 \leq f(x_0) - f(x_N) \leq f(x_0) - f^*,$$

and the second statement follows by letting $N \to \infty$. ∎
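The descent inequality (10) can be observed on a scalar toy problem. The sketch below is illustrative only: it assumes $\mathbf{C} = \mathbb{R}$ and $F(x) = x^2 - 2$ (so that $L_F = 2$), an acceptance test of the form $\frac{\delta}{2}|T_M(x) - x|^2 \leq \Psi_M(T_M(x); x) - f(T_M(x))$ as in Lemma 1, and a doubling rule for $M$. The function names are hypothetical and the exact line search of MPG-N may differ. In the scalar case the subproblem has the closed-form solution implemented in `step`.

```python
def F(x):
    return x * x - 2.0

def dF(x):
    return 2.0 * x

def step(a, b, M):
    """Minimizer d of |a + b*d| + (M/2)*d**2 for scalars a = F(x), b = F'(x)."""
    if a == 0.0:
        return 0.0
    if abs(a) > b * b / M:
        # The linearized residual keeps the sign of a: d = -sign(a) * b / M.
        return -(1.0 if a > 0 else -1.0) * b / M
    # Otherwise the linearized residual is driven to zero (Newton-like step).
    return -a / b

def mpgn_1d(x0, delta=0.1, M0=0.5, iters=30, tol=1e-10):
    xs, M = [x0], M0
    for _ in range(iters):
        x = xs[-1]
        a, b = F(x), dF(x)
        if abs(a) < tol:                  # residual is numerically zero
            break
        while True:
            d = step(a, b, M)
            psi = abs(a + b * d) + 0.5 * M * d * d   # model value Psi_M(x + d; x)
            # Accept if the test holds; the cap on M is only a safeguard.
            if 0.5 * delta * d * d <= psi - abs(F(x + d)) or M > 1e12:
                break
            M *= 2.0
        xs.append(x + d)
    return xs

xs = mpgn_1d(3.0)
# Check inequality (10) along the iterates, up to rounding.
descent_ok = all(
    0.5 * 0.1 * (xs[k + 1] - xs[k]) ** 2
    <= abs(F(xs[k])) - abs(F(xs[k + 1])) + 1e-12
    for k in range(len(xs) - 1)
)
```

Starting from $x_0 = 3$, the test is first accepted with $M = 4 \geq L_F + \delta$, after which the iterates coincide with Newton steps and approach $\sqrt{2}$ while satisfying (10) at every step.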

In [7, 15], the authors prove that for the composite problem (3), the quantity $\text{dist}(0, \partial f(x_{k+1}))$ does not always tend to zero in the limit, even if $\|x_{k+1} - x_k\|$ goes to zero. Thus, we must look elsewhere for a connection between $\text{dist}(0, \partial f(\cdot))$ and $\|x_{k+1} - x_k\|$. Let us start with the following observation, whose proof can be found in Lemma 4.2 of [7]: the function $f(x) := \|F(x)\| + I_{\mathbf{C}}(x)$ is $L_F$-weakly convex. Weak convexity of $f$ has an immediate consequence for the Moreau envelope, denoted $f_\mu$:

Lemma 3

(Lemma 4.3 [7]) Let $\mu > L_F$. Then, the proximal map $\text{prox}_{\mu f}$ is well-defined and single-valued. The Moreau envelope $f_\mu$ is smooth with gradient given by:

$$\nabla f_\mu(x) = \mu\left(x - \text{prox}_{\mu f}(x)\right).$$
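As a sanity check of this gradient formula, consider the scalar case $F(x) = x$ and $\mathbf{C} = \mathbb{R}$, so that $f(x) = |x|$. The sketch below assumes the convention $f_\mu(x) = \min_y \{ f(y) + \frac{\mu}{2}(y - x)^2 \}$, with $\text{prox}_{\mu f}(x)$ the corresponding minimizer, which is the convention consistent with the formula in Lemma 3. Under this assumption the proximal map of $|\cdot|$ is soft-thresholding with threshold $1/\mu$. The function names are hypothetical.

```python
def prox_abs(x, mu):
    """argmin_y |y| + (mu/2)*(y - x)**2, i.e. soft-thresholding at 1/mu."""
    t = 1.0 / mu
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def moreau_abs(x, mu):
    """Moreau envelope of |.| at x under the convention stated above."""
    y = prox_abs(x, mu)
    return abs(y) + 0.5 * mu * (y - x) ** 2

def grad_moreau_abs(x, mu):
    """Gradient predicted by Lemma 3: mu * (x - prox(x))."""
    return mu * (x - prox_abs(x, mu))

def finite_diff(x, mu, h=1e-6):
    """Central finite-difference approximation of the envelope's derivative."""
    return (moreau_abs(x + h, mu) - moreau_abs(x - h, mu)) / (2.0 * h)

mu = 2.0
errors = [
    abs(grad_moreau_abs(x, mu) - finite_diff(x, mu))
    for x in (-2.0, -0.3, 0.3, 2.0)
]
```

The envelope here is the Huber function: quadratic with slope $\mu x$ for $|x| \leq 1/\mu$ and linear with slope $\pm 1$ outside, so the formula and the finite differences agree at all four test points.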

Further, we have the following lemma, whose proof is similar to the proof of Lemma 5 in [15].

Lemma 4

Let Assumption 1 hold. Let $(x_k)_{k\geq 0}$ be generated by the MPG-N method and consider $y_{k+1} = \text{prox}_{\mu f}(x_k)$, where $\frac{1}{\mu} \in \left(0, \frac{1}{3L_F}\right)$. Then, we have the following relations:

  1. $\|y_{k+1} - x_k\|^2 \leq \frac{\mu}{\mu - 3L_F}\|x_{k+1} - x_k\|^2$.

  2. $\text{dist}(0, \partial f(y_{k+1})) \leq \mu\|y_{k+1} - x_k\|$.

Proof:

Let us prove the first statement. Since $f$ is $L_F$-weakly convex and $\mu > L_F$, the point $y_{k+1}$ is well-defined and unique. Since $y_{k+1}$ minimizes $f(y) + \frac{\mu}{2}\|y - x_k\|^2$, comparing its value with the value at $x_{k+1}$ gives:

$$f(y_{k+1}) + \frac{\mu}{2}\|y_{k+1} - x_k\|^2 \leq f(x_{k+1}) + \frac{\mu}{2}\|x_{k+1} - x_k\|^2. \tag{11}$$

Further, from the definition of $x_{k+1}$, we have:

$$\begin{aligned}
f(x_{k+1}) &\stackrel{(9)}{\leq} \|F(x_k) + \nabla F(x_k)(x_{k+1} - x_k)\| + \frac{M_k}{2}\|x_{k+1} - x_k\|^2 \\
&\stackrel{(8)}{\leq} \min_{x\in\mathbf{C}} \|F(x_k) + \nabla F(x_k)(x - x_k)\| + \frac{M_k}{2}\|x - x_k\|^2 \\
&\stackrel{(9)}{\leq} \min_{x\in\mathbf{C}} f(x) + \frac{M_k + L_F}{2}\|x - x_k\|^2 \\
&\leq f(y_{k+1}) + \frac{M_k + L_F}{2}\|y_{k+1} - x_k\|^2,
\end{aligned}$$

where the last inequality follows by taking $x = y_{k+1}$. Thus, we have:

$$f(x_{k+1}) \leq f(y_{k+1}) + \frac{3L_F}{2}\|y_{k+1} - x_k\|^2. \tag{12}$$

Finally, combining this inequality with (11), we get:

$$\|y_{k+1} - x_k\|^2 \leq \frac{\mu}{\mu - 3L_F}\|x_{k+1} - x_k\|^2,$$

which proves the first statement. Further, from the optimality conditions of $y_{k+1}$, we get:

$$-\mu(y_{k+1} - x_k) \in \partial f(y_{k+1}). \tag{13}$$

Thus, the second statement follows. ∎
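For readers who want the intermediate algebra, the combination of (11) and (12) used for the first statement can be written out as follows. This is only a worked restatement in terms of the quantities already defined in the proof.

```latex
\begin{aligned}
\frac{\mu}{2}\|y_{k+1} - x_k\|^2
  &\leq f(x_{k+1}) - f(y_{k+1}) + \frac{\mu}{2}\|x_{k+1} - x_k\|^2
  && \text{by (11)} \\
  &\leq \frac{3L_F}{2}\|y_{k+1} - x_k\|^2 + \frac{\mu}{2}\|x_{k+1} - x_k\|^2
  && \text{by (12)},
\end{aligned}
\qquad\Longrightarrow\qquad
\frac{\mu - 3L_F}{2}\|y_{k+1} - x_k\|^2 \leq \frac{\mu}{2}\|x_{k+1} - x_k\|^2 .
```

Dividing by $\frac{\mu - 3L_F}{2}$, which is positive because $\frac{1}{\mu} < \frac{1}{3L_F}$, gives the bound in the first statement. The second statement follows from (13) because the distance from $0$ to $\partial f(y_{k+1})$ is at most the norm of any element of that set.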

Using the descent property (10) and Lemma 4, we obtain the following global convergence rate:

Theorem 1

Let the assumptions of Lemma 4 hold. Then:

$$\min_{j=1:k}\text{dist}(0, \partial f(y_j)) \leq \mathcal{O}\left(\frac{1}{k^{1/2}}\right).$$

Moreover, any limit point of the sequence $(x_k)_{k\geq 0}$ is a stationary point of problem (7).

Proof:

From Lemma 4, we have:

$$\text{dist}(0, \partial f(y_{k+1}))^2 \leq \frac{\mu^3}{\mu - 3L_F}\|x_{k+1} - x_k\|^2.$$

Further, combining this inequality with (10), we get:

$$\text{dist}(0, \partial f(y_{k+1}))^2 \leq \frac{6\mu^3}{\delta(\mu - 3L_F)}\left(f(x_k) - f(x_{k+1})\right).$$

Summing up this inequality over $j = 0, \dots, k$, using that $f$ is bounded from below by $f^*$, and taking the minimum, we get:

$$\min_{j=0:k}\text{dist}(0, \partial f(y_{j+1})) \leq \sqrt{\frac{6\mu^3\left(f(x_0) - f^*\right)}{\delta(\mu - 3L_F)}}\,\frac{1}{k^{1/2}},$$

which proves our first statement. Further, let $x^*$ be a limit point of $(x_k)_{k\geq 0}$. Since $\|y_{k+1} - x_k\| \to 0$ by Lemma 4 and Lemma 2, $x^*$ is also a limit point of the sequence $(y_k)_{k\geq 0}$. This means that there exists a subsequence $(y_{k_j})_{j\geq 0}$ such that $y_{k_j} \to x^*$. Since $F$ is continuous, $f(y_{k_j}) \to f(x^*)$.
Note that we have $-\mu(y_{k_j} - x_{k_j-1}) \in \partial f(y_{k_j})$ and $(y_{k_j} - x_{k_j-1}) \to 0$. Then, we conclude from the definition of the generalized subgradient that $0 \in \partial f(x^*)$ and hence $x^*$ is a stationary point. ∎

III-B Better rates under KL

In [17], the authors impose a nondegeneracy assumption on the Jacobian, that is, $\sigma_{\min}(\nabla F(x)) > 0$ for all $x$ in the level set of $\|F(x_0)\|$, in order to prove a global convergence rate for the MG-N method. Such a condition is not always valid in practice. In this section, we derive improved convergence rates for the MPG-N method provided that the objective function satisfies the KL property. In general, the KL condition is less conservative than the nondegeneracy condition (see Section IV). Let us denote the set of limit points of $(x_k)_{k\geq 0}$ by:

$$\Omega(x_0) = \left\{\bar{x}\in\mathbb{E} : \exists \text{ an increasing sequence of integers } (k_t)_{t\geq 0} \text{ such that } x_{k_t} \to \bar{x} \text{ as } t\to\infty\right\}.$$

We have the following convergence rate:

Theorem 2

Let the assumptions of Lemma 4 hold. Additionally, assume that $f$ satisfies the KL property (6) on $\Omega(x_0)$. Then, the following convergence rates in function values hold for the sequence $(x_k)_{k\geq 0}$ generated by the MPG-N algorithm:

  • If $q \geq 2$, then $f(x_k)$ converges to $f_*$ linearly for $k$ sufficiently large.

  • If $q < 2$, then $f(x_k)$ converges to $f_*$ at a sublinear rate of order $\mathcal{O}\left(\frac{1}{k^{\frac{q}{2-q}}}\right)$ for $k$ sufficiently large.
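The two regimes can be illustrated numerically. The sketch below is not part of the analysis: it takes a simplified worst-case sequence satisfying $\delta_{k+1} = C(\delta_k - \delta_{k+1})^{q/2}$, where $\delta_k$ stands for the gap $f(x_k) - f_*$, keeping only the term with exponent $q/2$ from the recursion obtained in the proof. The constant $C = 1$, the starting gap and the function names are arbitrary assumptions. Each $\delta_{k+1}$ is found by bisection.

```python
def next_gap(gap, q, C=1.0, bisect_iters=200):
    """Solve t = C * (gap - t)**(q/2) for t in [0, gap] by bisection."""
    lo, hi = 0.0, gap
    for _ in range(bisect_iters):
        mid = 0.5 * (lo + hi)
        # h(t) = t - C*(gap - t)**(q/2) is increasing in t on [0, gap].
        if mid - C * (gap - mid) ** (q / 2.0) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gaps(q, n, gap0=1.0, C=1.0):
    """First n+1 terms of the worst-case gap sequence for a given q."""
    out = [gap0]
    for _ in range(n):
        out.append(next_gap(out[-1], q, C))
    return out

linear = gaps(q=2.0, n=20)        # constant contraction factor C / (1 + C)
sublinear = gaps(q=1.0, n=2000)   # gap of order 1/k

ratio = linear[-1] / linear[-2]   # contraction factor in the case q = 2
scaled = 2000 * sublinear[-1]     # k * gap_k in the case q = 1
```

For $q = 2$ the recursion reduces to $\delta_{k+1} = \frac{C}{1+C}\delta_k$, a linear rate. For $q = 1$ the product $k\,\delta_k$ stays close to $C^2$, which matches the order $k^{-q/(2-q)} = k^{-1}$ stated in the theorem.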

Proof:

We have:

f(xk+1)f𝑓subscript𝑥𝑘1subscript𝑓\displaystyle f(x_{k+1})-f_{*}italic_f ( italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) - italic_f start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT (12)f(yk+1)f+3LF2yk+1xk2superscriptitalic-(12italic-)absent𝑓subscript𝑦𝑘1subscript𝑓3subscript𝐿𝐹2superscriptnormsubscript𝑦𝑘1subscript𝑥𝑘2\displaystyle\stackrel{{\scriptstyle{\color[rgb]{0,0,0}\eqref{eq:01}}}}{{\leq}% }f(y_{k+1})-f_{*}+\frac{3L_{F}}{2}\|y_{k+1}-x_{k}\|^{2}start_RELOP SUPERSCRIPTOP start_ARG ≤ end_ARG start_ARG italic_( italic_) end_ARG end_RELOP italic_f ( italic_y start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) - italic_f start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT + divide start_ARG 3 italic_L start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT end_ARG start_ARG 2 end_ARG ∥ italic_y start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
$$
\begin{aligned}
&\stackrel{(6)}{\leq} \sigma_q S_f(y_{k+1})^q + \frac{3L_F}{2}\|y_{k+1}-x_k\|^2 \\
&\leq \sigma_q \mu^q \|y_{k+1}-x_k\|^q + \frac{3L_F}{2}\|y_{k+1}-x_k\|^2 \\
&\leq \sigma_q \mu^q \left(\frac{\mu}{\mu-3L_F}\right)^{q/2}\|x_{k+1}-x_k\|^q + \frac{2\mu}{3L_F(\mu-3L_F)}\|x_{k+1}-x_k\|^2 \\
&\leq C_1 \left(f(x_k)-f(x_{k+1})\right)^{\frac{q}{2}} + C_2 \left(f(x_k)-f(x_{k+1})\right),
\end{aligned}
$$

where the third and the fourth inequalities follow from Lemma 4, the last inequality follows from the descent (10), $C_1 = \sigma_q \mu^q \left(\frac{\mu}{\mu-3L_F}\right)^{q/2} (2/\delta)^{q/2}$ and $C_2 = \frac{4\mu}{\delta (3L_F)(\mu-3L_F)}$. Denote $\delta_k = f(x_k) - f_*$, then we get:

$$\delta_{k+1} \leq C_1 (\delta_k - \delta_{k+1})^{\frac{q}{2}} + C_2 (\delta_k - \delta_{k+1}).$$

Using Lemma 2 in [15] with $\theta = \frac{2}{q}$ we get our statement. ∎
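To get a feeling for the recurrence above, the following minimal sketch iterates its worst case, i.e., it takes at every step the largest $\delta_{k+1} \in [0, \delta_k]$ that still satisfies the inequality. The function names and the numerical values of $C_1$, $C_2$ and $q$ are illustrative assumptions, not quantities from the paper. For $q = 2$ the recurrence is linear in $\delta_k - \delta_{k+1}$ and reduces to $\delta_{k+1} \leq \frac{C_1 + C_2}{1 + C_1 + C_2}\,\delta_k$, which the sketch reproduces numerically.

```python
def next_gap(delta_k, c1, c2, q, tol=1e-14):
    """Largest d in [0, delta_k] with d <= c1*(delta_k - d)**(q/2) + c2*(delta_k - d).

    The right-hand side decreases in d while the left-hand side increases,
    so the boundary point is found by bisection.
    """
    lo, hi = 0.0, delta_k
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        rhs = c1 * (delta_k - mid) ** (q / 2) + c2 * (delta_k - mid)
        if mid <= rhs:
            lo = mid   # inequality still holds, move right
        else:
            hi = mid   # inequality violated, move left
        if hi - lo < tol:
            break
    return lo


def worst_case_gaps(delta0, c1, c2, q, n_steps):
    """Sequence delta_0, ..., delta_n obtained by saturating the recurrence."""
    gaps = [delta0]
    for _ in range(n_steps):
        gaps.append(next_gap(gaps[-1], c1, c2, q))
    return gaps


if __name__ == "__main__":
    # Hypothetical constants, chosen only for illustration.
    for q in (2.0, 1.5, 1.0):
        print(q, worst_case_gaps(1.0, 0.5, 0.5, q, 5))
```

Other values of $q$ can be explored the same way; the sketch only illustrates the recurrence and does not replace the rates given by Lemma 2 in [15].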

IV Power flow analysis

Power flow problems are among the most studied in power systems, being an important tool for the planning and operation of the electric grid. In this section we consider the particular problem of power flow analysis, defined as follows. Consider a power system with $N$ buses (see, e.g., Figure 1 for the IEEE 14-bus system). We denote by $v_i$, $p_i$ and $q_i$ the complex voltage, active power and reactive power at bus $i$, respectively. Let $Y := G + jB$ be the admittance matrix and denote $p = (p_1, \cdots, p_N)$, $q = (q_1, \cdots, q_N)$ and $v = (v_1, \cdots, v_N)$. Given a complex load vector $s := s_R + j s_I$, the power flow analysis problem is to find $v$ such that [6]:

$$F(v) = s; \quad F(v) = p + jq = \text{diag}(v v^H Y^H), \tag{14}$$

where $(\cdot)^H$ is the Hermitian transpose. Writing each voltage in polar form, $v_i = u_i e^{j\theta_i}$, with magnitude $u_i$ and phase angle $\theta_i$, this problem is equivalent to the following optimization problem:

$$\min_{v=(u,\theta)} \|F(v) - s\| \quad \text{s.t.} \quad u \in [u_{\min}, u_{\max}], \quad \theta \in [-\pi, \pi].$$

Figure 1: Representation of the IEEE 14-bus system [10].

In [6], the authors provide a similar formulation for the power flow analysis problem, but use $\|\cdot\|^2$ as the merit function to measure the distance between the function $F(\cdot)$ and the desired complex load $s$. As we have mentioned earlier, it is beneficial to use $\|\cdot\|$ itself as the merit function. Further, since we have (see e.g., [12]):

$$
\begin{aligned}
p_i(u,\theta) &= \sum_{k=1}^{N} u_i u_k \left(G(i,k)\cos(\theta_i - \theta_k) + B(i,k)\sin(\theta_i - \theta_k)\right), \\
q_i(u,\theta) &= -\sum_{k=1}^{N} u_i u_k \left(B(i,k)\cos(\theta_i - \theta_k) - G(i,k)\sin(\theta_i - \theta_k)\right),
\end{aligned}
$$

and denote:

$$\textbf{C} = \{(u,\theta) : u \in [u_{\min}, u_{\max}], \quad \theta \in [-\pi, \pi]\},$$

then, the previous optimization problem is equivalent to the following optimization problem:

$$\min_{x=(u;\theta)\in\textbf{C}} f(x) = \begin{Vmatrix} p(x) - s_R \\ q(x) - s_I \end{Vmatrix}. \tag{15}$$
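As a sanity check on the change of variables, the following minimal NumPy sketch evaluates $p$ and $q$ in polar coordinates and compares them with the complex expression $\text{diag}(vv^HY^H)$ from (14). The polar expressions coded here are the real and imaginary parts obtained by expanding (14) with $v_i = u_i e^{j\theta_i}$. The function names and the random test data are assumptions made for illustration only.

```python
import numpy as np


def power_injections(u, theta, G, B):
    """Active and reactive injections in polar form (real/imag parts of (14))."""
    d = theta[:, None] - theta[None, :]      # d[i, k] = theta_i - theta_k
    uu = np.outer(u, u)                      # uu[i, k] = u_i * u_k
    p = np.sum(uu * (G * np.cos(d) + B * np.sin(d)), axis=1)
    q = np.sum(uu * (G * np.sin(d) - B * np.cos(d)), axis=1)
    return p, q


def complex_injections(u, theta, G, B):
    """Reference value diag(v v^H Y^H) computed with complex arithmetic."""
    v = u * np.exp(1j * theta)
    Y = G + 1j * B
    return np.diag(np.outer(v, v.conj()) @ Y.conj().T)


def merit(x, G, B, s_R, s_I):
    """Objective f(x) of (15) for the stacked variable x = (u; theta)."""
    n = G.shape[0]
    p, q = power_injections(x[:n], x[n:], G, B)
    return np.linalg.norm(np.concatenate([p - s_R, q - s_I]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)           # hypothetical random data
    n = 4
    G, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))
    u, theta = rng.uniform(0.9, 1.1, n), rng.uniform(-0.3, 0.3, n)
    p, q = power_injections(u, theta, G, B)
    s = complex_injections(u, theta, G, B)
    print(np.allclose(p, s.real), np.allclose(q, s.imag))
```

Evaluating `merit` at the point used to generate `s_R` and `s_I` returns zero up to rounding, which mirrors the way the test loads are generated in the numerical simulations.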

The most efficient algorithm for solving the (unconstrained) power flow analysis problem is the Newton-Raphson (NR) method [22]. However, it may perform poorly when the initial point is far from the optimum or the system is stressed (i.e., the problem is ill-conditioned). In a recent paper [6], the authors proposed a hybrid method that combines stochastic gradient descent (SGD) and the NR method to overcome the numerical challenges of this problem. The iterative process starts with the NR algorithm and, if the method detects divergence (e.g., when the condition number of the Jacobian deteriorates), switches to the SGD algorithm. After a few SGD steps, it switches back to the NR iterations and repeats the process until an (approximate) optimal solution is found. Since this hybrid algorithm cannot deal with (simple) constraints as in (15), we propose to use our new method, modified projected Gauss-Newton (MPG-N), and compare its performance with the projected gradient descent (PGD) method applied to problem (2), where $F$ is given in (14). In order to apply both methods, one needs to evaluate the gradients of the functions $p(x)$ and $q(x)$. We have the following expressions for the derivatives of the $p_i$'s and $q_i$'s:

$$
\begin{aligned}
\frac{\partial p_i}{\partial u_i} &= 2 u_i G(i,i) + \sum_{\substack{k=1 \\ k\neq i}}^{N} u_k \left(G(i,k)\cos(\theta_i - \theta_k) + B(i,k)\sin(\theta_i - \theta_k)\right), \\
\frac{\partial p_i}{\partial u_k} &= u_i \left(G(i,k)\cos(\theta_i - \theta_k) + B(i,k)\sin(\theta_i - \theta_k)\right), \quad \forall k \neq i, \\
\frac{\partial p_i}{\partial \theta_i} &= \sum_{\substack{k=1 \\ k\neq i}}^{N} u_k u_i \left(-G(i,k)\sin(\theta_i - \theta_k) + B(i,k)\cos(\theta_i - \theta_k)\right), \\
\frac{\partial p_i}{\partial \theta_k} &= -u_i u_k \left(B(i,k)\cos(\theta_i - \theta_k) - G(i,k)\sin(\theta_i - \theta_k)\right), \quad \forall k \neq i, \\
\frac{\partial q_i}{\partial u_i} &= -2 u_i B(i,i) - \sum_{\substack{k=1 \\ k\neq i}}^{N} u_k \left(B(i,k)\cos(\theta_i - \theta_k) - G(i,k)\sin(\theta_i - \theta_k)\right), \\
\frac{\partial q_i}{\partial u_k} &= -u_i \left(B(i,k)\cos(\theta_i - \theta_k) - G(i,k)\sin(\theta_i - \theta_k)\right), \quad \forall k \neq i, \\
\frac{\partial q_i}{\partial \theta_i} &= \sum_{\substack{k=1 \\ k\neq i}}^{N} u_k u_i \left(B(i,k)\sin(\theta_i - \theta_k) + G(i,k)\cos(\theta_i - \theta_k)\right), \\
\frac{\partial q_i}{\partial \theta_k} &= -u_k u_i \left(G(i,k)\cos(\theta_i - \theta_k) + B(i,k)\sin(\theta_i - \theta_k)\right), \quad \forall k \neq i.
\end{aligned}
$$

Hence, $\nabla f(x) \in \mathbb{R}^{2N}$ and we have:

$$\nabla f(x) = \sum_{i=1}^{N} \frac{\partial p_i(x)}{\partial x}\left(p_i(x) - (s_R)_i\right) + \frac{\partial q_i(x)}{\partial x}\left(q_i(x) - (s_I)_i\right),$$

where $\frac{\partial p_i(x)}{\partial x} = \left(\frac{\partial p_i(x)}{\partial u_1}; \cdots; \frac{\partial p_i(x)}{\partial u_N}; \frac{\partial p_i(x)}{\partial \theta_1}; \cdots; \frac{\partial p_i(x)}{\partial \theta_N}\right)$ and $\frac{\partial q_i(x)}{\partial x} = \left(\frac{\partial q_i(x)}{\partial u_1}; \cdots; \frac{\partial q_i(x)}{\partial u_N}; \frac{\partial q_i(x)}{\partial \theta_1}; \cdots; \frac{\partial q_i(x)}{\partial \theta_N}\right)$ for $i = 1:N$. Note that the Jacobian $\nabla F$ may be ill-conditioned, but the objective function $f$ may still satisfy the KL inequality.
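Sign slips are easy to make in these partial derivatives, so it can help to check them numerically. The sketch below assembles the $2N \times 2N$ Jacobian of $(p; q)$ with respect to $x = (u; \theta)$ by differentiating the polar form of (14) directly, compares it with central finite differences, and forms the vector $J^T r$, which is the sum displayed above with $r = (p - s_R; q - s_I)$. All function names, the step size `h` and the test data are illustrative assumptions.

```python
import numpy as np


def injections(x, G, B):
    """Stacked (p(x); q(x)) for x = (u; theta), polar form of (14)."""
    n = G.shape[0]
    u, th = x[:n], x[n:]
    d = th[:, None] - th[None, :]
    uu = np.outer(u, u)
    p = np.sum(uu * (G * np.cos(d) + B * np.sin(d)), axis=1)
    q = np.sum(uu * (G * np.sin(d) - B * np.cos(d)), axis=1)
    return np.concatenate([p, q])


def jacobian(x, G, B):
    """Analytic Jacobian [[dp/du, dp/dtheta], [dq/du, dq/dtheta]]."""
    n = G.shape[0]
    u, th = x[:n], x[n:]
    d = th[:, None] - th[None, :]
    a = G * np.cos(d) + B * np.sin(d)    # coefficient of u_i u_k in p_i
    b = G * np.sin(d) - B * np.cos(d)    # coefficient of u_i u_k in q_i
    uu = np.outer(u, u)
    # Entry (i, k) is u_i * a_ik; the diagonal also collects sum_k u_k * a_ik.
    dp_du = u[:, None] * a + np.diag(a @ u)
    dq_du = u[:, None] * b + np.diag(b @ u)
    # d/dtheta_k of cos is +sin, of sin is -cos; the diagonal is minus the
    # sum of the off-diagonal entries in the same row.
    dp_dth = uu * b - np.diag((uu * b).sum(axis=1))
    dq_dth = -uu * a + np.diag((uu * a).sum(axis=1))
    return np.block([[dp_du, dp_dth], [dq_du, dq_dth]])


def fd_jacobian(x, G, B, h=1e-6):
    """Central finite-difference approximation, used only as a check."""
    J = np.zeros((x.size, x.size))
    for k in range(x.size):
        e = np.zeros(x.size)
        e[k] = h
        J[:, k] = (injections(x + e, G, B) - injections(x - e, G, B)) / (2 * h)
    return J


def jt_residual(x, G, B, s_R, s_I):
    """J(x)^T r(x), i.e. the gradient of 0.5 * ||r(x)||^2."""
    r = injections(x, G, B) - np.concatenate([s_R, s_I])
    return jacobian(x, G, B).T @ r


if __name__ == "__main__":
    rng = np.random.default_rng(0)       # hypothetical random data
    n = 4
    G, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))
    x = np.concatenate([rng.uniform(0.9, 1.1, n), rng.uniform(-1, 1, n)])
    print(np.max(np.abs(jacobian(x, G, B) - fd_jacobian(x, G, B))))
```

The diagonal entries of the $u$-blocks carry the factor $2u_i$ that comes from differentiating $u_i^2$, which is the term most easily lost when coding these formulas by hand.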

IV-A Numerical simulations

In this subsection, we demonstrate the efficiency of the modified projected Gauss-Newton (MPG-N) method using several IEEE bus test cases from [10] (IEEE 14-bus, IEEE 39-bus, IEEE 57-bus and IEEE 118-bus). We choose an optimal point $x^* \in \textbf{C}$, then we generate $s_R = p(x^*)$ and $s_I = q(x^*)$ (see also [6]). We apply the MPG-N method to problem (15) and the PGD method to problem (2), where $F$ is given in (14), and test whether the algorithms can reach $x^*$ from a random feasible starting point. The stopping criterion for both algorithms is $\|F(x_k)\| \leq 10^{-3}$. The results are given in Figure 2, where we plot the evolution of the function value $\|F(x_k)\|$ along the iterations. From this figure one can observe that, in the beginning, PGD performs better than the MPG-N method. However, the MPG-N method requires fewer iterations (up to 5 times fewer) than PGD to achieve the desired accuracy.

Figure 2: Comparison between MPG-N and PGD methods in terms of $\|F(x)\|$ along iterations on several IEEE bus systems.
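The experimental protocol described above (pick $x^*$ in the box, generate the load from it, start from a random feasible point, stop at residual $10^{-3}$) can be sketched for the PGD baseline as follows. This is a minimal illustration under several assumptions: the 3-bus network and its line impedances are hypothetical and not one of the IEEE cases, the bounds $u_{\min} = 0.9$, $u_{\max} = 1.1$ are chosen for the example, PGD is applied to $\frac{1}{2}\|F(x) - s\|^2$ with a backtracking step size, and the MPG-N subproblem is not implemented here. The Jacobian is written in complex form, which is equivalent to the polar expressions of the previous section.

```python
import numpy as np


def flow(x, Y):
    """Stacked (p; q) from diag(v v^H Y^H) with v = u * exp(j*theta)."""
    n = Y.shape[0]
    v = x[:n] * np.exp(1j * x[n:])
    s = v * np.conj(Y @ v)
    return np.concatenate([s.real, s.imag])


def flow_jacobian(x, Y):
    """Jacobian of flow with respect to x = (u; theta), complex-form derivation."""
    n = Y.shape[0]
    e = np.exp(1j * x[n:])
    v = x[:n] * e
    i_inj = Y @ v                                    # bus current injections
    dS_du = np.diag(np.conj(i_inj) * e) + v[:, None] * np.conj(Y * e[None, :])
    dS_dth = 1j * v[:, None] * np.conj(np.diag(i_inj) - Y * v[None, :])
    return np.block([[dS_du.real, dS_dth.real], [dS_du.imag, dS_dth.imag]])


def pgd(x0, s, Y, lo, hi, iters=200, tol=1e-3):
    """Projected gradient descent on 0.5*||flow(x) - s||^2 over the box [lo, hi]."""
    x = np.clip(x0, lo, hi)
    r = flow(x, Y) - s
    history = [np.linalg.norm(r)]
    for _ in range(iters):
        if history[-1] <= tol:
            break
        g = flow_jacobian(x, Y).T @ r
        t, y = 1.0, x
        while t > 1e-12:                             # backtracking on the step size
            cand = np.clip(x - t * g, lo, hi)        # projection onto the box C
            step = cand - x
            r_c = flow(cand, Y) - s
            if 0.5 * r_c @ r_c <= 0.5 * r @ r + g @ step + (0.5 / t) * (step @ step):
                y = cand
                break
            t *= 0.5
        x = y
        r = flow(x, Y) - s
        history.append(np.linalg.norm(r))
    return x, history


# Hypothetical 3-bus ring network (line impedances are illustrative values).
y12, y13, y23 = 1 / (0.02 + 0.06j), 1 / (0.08 + 0.24j), 1 / (0.06 + 0.18j)
Y = np.array([[y12 + y13, -y12, -y13],
              [-y12, y12 + y23, -y23],
              [-y13, -y23, y13 + y23]])
n = 3
lo = np.concatenate([np.full(n, 0.9), np.full(n, -np.pi)])
hi = np.concatenate([np.full(n, 1.1), np.full(n, np.pi)])
rng = np.random.default_rng(0)
x_star = np.concatenate([rng.uniform(0.95, 1.05, n), rng.uniform(-0.2, 0.2, n)])
s = flow(x_star, Y)                                  # load generated from x_star
x0 = rng.uniform(lo, hi)                             # random feasible start
x_hat, history = pgd(x0, s, Y, lo, hi)
print(history[0], history[-1], len(history) - 1)
```

The backtracking test guarantees that the recorded residual norms never increase, but it gives no guarantee that the tolerance is reached within the iteration budget, so the printed values should be read as a qualitative illustration only.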

V Conclusion

In this paper, we have proposed a modified projected Gauss-Newton (MPG-N) method for solving constrained least-squares problems. Under mild assumptions, we have proved global convergence results for the iterates. More precisely, we have proved that any limit point of the sequence generated by the MPG-N algorithm is a stationary point and, under the KL property, we have derived convergence rates in function values depending on the KL parameter. Finally, we have considered solving a power flow problem and compared the performance of our scheme with the projected gradient method, showing the efficiency of the proposed method on several IEEE bus test cases.

ACKNOWLEDGMENT

The research leading to these results has received funding from: ITN-ETN project TraDE-OPT funded by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement No. 861137; NO Grants 2014-2021, RO-NO-2019-0184, under project ELO-Hyp, contract no. 24/2020; UEFISCDI PN-III-P4-PCE-2021-0720, under project L2O-MOC, nr. 70/2022.

References

  • [1] S. Abhyankar, Q. Cui, and A. J. Flueck, Fast power flow analysis using a hybrid current-power balance formulation in rectangular coordinates, in IEEE PES TD Conference and Exposition, 1–5, 2014.
  • [2] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota, Clarke subgradients of stratifiable functions, SIAM Journal on Optimization, 18(2): 556–572, 2007.
  • [3] L. M. Braz, C. A. Castro, and C. Murati, A critical evaluation of step size optimization based load flow methods, IEEE Transactions on Power Systems, 15(1): 202–207, 2000.
  • [4] Y. Chen and C. Shen, A Jacobian-free Newton method with adaptive preconditioner and its application for power flow calculations, IEEE Transactions on Power Systems, 21(3): 1096–1103, 2006.
  • [5] A.R. Conn, N.I.M. Gould, and Ph. L. Toint, Trust region methods, Society for Industrial and Applied Mathematics, 2000.
  • [6] N. Costilla-Enriquez, Y. Weng, and B. Zhang, Combining Newton-Raphson and Stochastic Gradient Descent for Power Flow Analysis, IEEE Transactions on Power Systems, 36: 514–517, 2021.
  • [7] D. Drusvyatskiy and C. Paquette, Efficiency of minimizing compositions of convex functions and smooth maps, Mathematical Programming, 178(1-2): 503–558, 2019.
  • [8] H. O. Hartley, The modified Gauss-Newton method for the fitting of non-linear regression functions by least squares, Technometrics 3(2): 269–280, 1961.
  • [9] A. Hauswirth, S. Bolognani, G. Hug and F. Dorfler, Projected gradient descent on Riemannian manifolds with applications to online power system optimization, 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), IEEE, 2016.
  • [10] Illinois Center for a Smarter Electric Grid. Available: http://publish.illinois.edu/smartergrid/, 2013.
  • [11] R.I. Jennrich, and S. M. Robinson, A Newton-Raphson algorithm for maximum likelihood factor analysis, Psychometrika 34(1): 111–123, 1969.
  • [12] H. Le Nguyen, Newton-Raphson method in complex form, IEEE Transactions on Power Systems, 12(3), 1997.
  • [13] F. Milano, Continuous Newton’s method for power flow analysis, IEEE Transactions on Power Systems, 24(1): 50–57, 2009.
  • [14] B. Mordukhovich, Variational analysis and generalized differentiation. Basic theory, Springer, 2006.
  • [15] Y. Nabou and I. Necoara, Efficiency of higher-order algorithms for minimizing general composite functions, arXiv preprint arXiv:2203.13367, 2022.
  • [16] Yu. Nesterov and B.T. Polyak, Cubic regularization of Newton method and its global performance, Mathematical Programming, 108(1): 177–205, 2006.
  • [17] Y. Nesterov, Modified Gauss-Newton scheme with worst case guarantees for global performance, Optimization Methods and Software, 22(3): 469–483, 2007.
  • [18] Y. Nesterov, Gradient methods for minimizing composite functions, Mathematical Programming, 140: 125–161, 2013.
  • [19] R.T. Rockafellar and R. Wets, Variational Analysis, Springer, 1998.
  • [20] R. Salgado, A. Brameller, and P. Aitchison, Optimal power flow solutions using the gradient projection method. Part 1: Theoretical basis, IEE Proceedings C (Generation, Transmission and Distribution). Vol. 137. No. 6. IET Digital Library, 1990.
  • [21] R. Salgado, A. Brameller, and P. Aitchison, Optimal power flow solutions using the gradient projection method. Part 2: Modelling of the power system equations, IEE Proceedings C (Generation, Transmission and Distribution). Vol. 137. No. 6. IET Digital Library, 1990.
  • [22] B. Stott, Review of load-flow calculation methods, Proceedings of the IEEE, 62(7): 916–929, 1974.