Modified projected Gauss-Newton method for constrained nonlinear least-squares: application to power flow analysis

Yassine Nabou¹, Lucian Toma² and Ion Necoara¹,³

¹Automatic Control and Systems Engineering Department, University Politehnica Bucharest, 060042 Bucharest, Romania. [email protected]; [email protected]
²Electrical Power Systems Department, University Politehnica Bucharest, 060042 Bucharest, Romania. [email protected].
³Gheorghe Mihoc-Caius Iacob Institute of Mathematical Statistics and Applied Mathematics of the Romanian Academy, 050711 Bucharest, Romania.

Abstract

In this paper, we consider a modified projected Gauss-Newton method for solving constrained nonlinear least-squares problems. We assume that the functional constraints are smooth and that the other constraints are represented by a simple closed convex set. We formulate the nonlinear least-squares problem as an optimization problem using the Euclidean norm as a merit function. At each iteration, our method linearizes the functional constraints inside the merit function at the current point and adds a quadratic regularization, yielding a strongly convex subproblem that is easy to solve and whose solution is the next iterate. We present global convergence guarantees for the proposed method under mild assumptions. In particular, we prove convergence to stationary points and, under the Kurdyka-Lojasiewicz (KL) property for the objective function, we derive convergence rates depending on the KL parameter. Finally, we show the efficiency of this method on the power flow analysis problem using several IEEE bus test cases.

I INTRODUCTION

In many areas of engineering, such as maximum likelihood estimation, nonlinear data fitting, parameter estimation or power flow analysis, one finds applications that can be recast as nonlinear least-squares problems of the form [11, 8, 6]:

		$\displaystyle\min\|F(x)\|$		(1)
		$\displaystyle s.t.\;x\in\mathbf{C}\subseteq\mathbb{R}^{n},$

where $\mathbf{C}$ is a closed convex set and $F=(F_{1},\cdots,F_{m})$, with $F_{i}:\mathbb{R}^{n}\to\mathbb{R}$ for $i=1:m$, are nonlinear differentiable functions. When $\mathbf{C}=\mathbb{R}^{n}$ and $m=n$, problem (1) is equivalent to a square system of nonlinear equations. Several algorithms have been proposed for solving this problem, the most popular being the Newton-Raphson (NR) method [22]. The NR method uses the inverse of the Jacobian matrix to update the iterates, i.e., the iterations have the following form:

\displaystyle x^{+}=x-\nabla F(x)^{-1}F(x),

where $x$ is the current iterate and $\nabla F(x)$ is the Jacobian matrix of $F$ at $x$. Although NR converges fast, it has several drawbacks. First, the Jacobian may be degenerate at the current point, in which case the method is not well defined. Second, convergence is not guaranteed when the initial point $x_{0}$ is far from the optimum [13]. Many approaches have been proposed to deal with these challenges, e.g., improving the starting point [3] or using different approximations of the Jacobian [4, 1]. In [17], Nesterov proposed a modified Gauss-Newton scheme (M-GN) for solving unconstrained nonlinear least-squares problems. The M-GN method constructs a convex model by linearizing the nonlinear function $F$ inside a sharp merit function and adding a quadratic regularization term, i.e.:

\displaystyle x^{+}=\arg\min\limits_{y\in\mathbb{R}^{n}}\|F(x)+\nabla F(x)(y-x)\|+\frac{M}{2}\|y-x\|^{2}.

When $M=0$ and the Jacobian is square and invertible, we recover the NR method described above. In [17] it was proved that, under a nondegeneracy assumption (i.e., $\sigma_{\text{min}}(\nabla F(x))>0$ for all $x$ in the level set of $\|F(x_{0})\|$, where $x_{0}$ is the starting point and $\sigma_{\text{min}}$ denotes the smallest singular value), this scheme converges globally. Moreover, the solution of each subproblem can be computed with a standard convex optimization solver. Further, problem (1) is equivalent to the following composite optimization problem:

\displaystyle\min\limits_{x\in\mathbb{R}^{n}}\|F(x)\|^{2}+I_{\mathbf{C}}(x),

(2)

where $I_{\mathbf{C}}$ is the indicator function of the convex set $\mathbf{C}$. Note that using the norm $\|\cdot\|$ itself as the merit function is more beneficial than using $\|\cdot\|^{2}$, since in the latter case the condition number is doubled. Another possible algorithm for solving this problem is Projected Gradient Descent (PGD) [18, 9, 20, 21]. The standard PGD iteration is given by:

\displaystyle x^{+}=\Pi_{\mathbf{C}}\left(x-\alpha\nabla F(x)^{T}F(x)\right),

where $\Pi_{\mathbf{C}}$ is the projection operator (see Section II) and $\alpha$ is a step size. PGD is simple and easy to implement, but its main drawback is slow convergence.
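As a rough illustration of the PGD update above, the following sketch assumes that $\mathbf{C}$ is a box, so that the projection reduces to componentwise clipping, and that the Jacobian is stored as an $m\times n$ array, so that the descent direction is $\nabla F(x)^{T}F(x)$. The names `project_box` and `pgd_step` and the toy residual are hypothetical and only serve the example.

```python
import numpy as np

def project_box(z, lo, hi):
    """Euclidean projection onto the box [lo, hi] (componentwise clipping)."""
    return np.clip(z, lo, hi)

def pgd_step(x, F, jac, alpha, lo, hi):
    """One PGD step x+ = Pi_C(x - alpha * J(x)^T F(x)) for a box C."""
    return project_box(x - alpha * jac(x).T @ F(x), lo, hi)

# Toy residual (assumption for illustration): F(x) = x - a, with identity Jacobian.
a = np.array([1.0, 1.0])
F_toy = lambda x: x - a
jac_toy = lambda x: np.eye(2)

# One step from (2, 0) with alpha = 0.5 on the box [0, 1.5]^2.
x_next = pgd_step(np.array([2.0, 0.0]), F_toy, jac_toy, alpha=0.5, lo=0.0, hi=1.5)
```

With a smaller step size the unprojected point can leave the box, in which case the clipping is what keeps the iterate feasible.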

A natural question is whether we can prove global convergence of the M-GN method without the nondegeneracy assumption on the Jacobian $\nabla F(x)$, i.e., without assuming $\sigma_{\text{min}}(\nabla F(x))>0$ for all $x$ in the level set of $\|F(x_{0})\|$ (see (5)). Such a condition is conservative and is not always satisfied in practice. In this paper we answer this question positively, i.e., we consider a Modified Projected Gauss-Newton method (MPG-N) for solving problem (1), where $\mathbf{C}$ is a simple closed convex set. At each iteration, MPG-N solves the following strongly convex subproblem:

\displaystyle x_{k+1}=\arg\min\limits_{x\in\mathbf{C}}\|F(x_{k})+\nabla F(x_{k})(x-x_{k})\|+\frac{M}{2}\|x-x_{k}\|^{2},

(3)

which is a slightly modified version of the scheme in [17], as it also handles the constraints $x\in\mathbf{C}$. We prove, under mild assumptions, that this scheme achieves global convergence without any assumption on the Jacobian matrix. More precisely, we prove that any limit point of the sequence generated by MPG-N is a stationary point and, under the Kurdyka-Lojasiewicz (KL) property, we derive convergence rates in function value depending on the KL parameter. Finally, we consider a power flow analysis problem, whose functional constraints usually do not satisfy the nondegeneracy assumption, while the objective satisfies the KL property. We compare the performance of our scheme with the projected gradient scheme and demonstrate the efficiency of the proposed method on several IEEE bus test cases.

Content. The rest of the paper is organized as follows: Section II provides some notations and preliminaries, Section III presents the new algorithm and the convergence results, Section IV describes the power flow analysis problem and numerical results on several IEEE bus test cases.

II Notations and preliminaries

We denote by $\mathbb{E}$ a finite-dimensional real vector space and by $\mathbb{E}^{*}$ its dual space, composed of linear functions on $\mathbb{E}$. Using a self-adjoint positive-definite operator $D:\mathbb{E}\rightarrow\mathbb{E}^{*}$ (notation $D=D^{*}\succ 0$), we can endow these spaces with conjugate Euclidean norms:

\displaystyle\lVert x\rVert=\langle Dx,x\rangle^{\frac{1}{2}},\quad x\in\mathbb{E},\qquad\lVert g\rVert_{*}=\langle g,D^{-1}g\rangle^{\frac{1}{2}},\quad g\in\mathbb{E}^{*}.

For simplicity, we consider in the following $\mathbb{E}=\mathbb{R}^{n}$ and take $D$ to be the identity matrix. Let $F=(F_{1},\cdots,F_{m})$, where the functions $F_{i}$, $i=1:m$, are differentiable and the Jacobian is Lipschitz continuous, i.e.:

\displaystyle\|\nabla F(x)-\nabla F(y)\|\leq L_{F}\|x-y\|\;\;\forall x,y\in\mathbb{R}^{n}.

It follows that [17]:

\displaystyle\|F(x)\|-\|F(y)+\nabla F(y)(x-y)\|\leq\frac{L_{F}}{2}\|x-y\|^{2}\;\;\forall x,y\in\mathbb{R}^{n}.

(4)
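Inequality (4) can be checked numerically on a small example. The sketch below uses a hypothetical map $F(x)=(x_{1}^{2}+x_{2}^{2}-1,\;x_{1}-x_{2})$, whose Jacobian is Lipschitz continuous with $L_{F}=2$; the map, the helper name `gap_in_4` and the random sampling are assumptions made only for illustration.

```python
import numpy as np

L_F = 2.0  # Lipschitz constant of the Jacobian of the toy map below

def F(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])

def jac(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

def gap_in_4(x, y):
    """Return rhs - lhs of (4); a nonnegative value means (4) holds at (x, y)."""
    lhs = np.linalg.norm(F(x)) - np.linalg.norm(F(y) + jac(y) @ (x - y))
    rhs = 0.5 * L_F * np.dot(x - y, x - y)
    return rhs - lhs

# Evaluate the gap on random pairs of points.
rng = np.random.default_rng(0)
gaps = [gap_in_4(rng.normal(size=2), rng.normal(size=2)) for _ in range(1000)]
```

For this particular map the linearization error equals $(\|x-y\|^{2},0)$, so the bound in (4) is tight up to the triangle inequality.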

Let $h$ be a proper lower semicontinuous function and $\mu>0$. Then, the proximal operator with respect to $h$ is:

\displaystyle\text{prox}_{\mu h}(x)=\arg\min_{y}h(y)+\frac{\mu}{2}\|y-x\|^{2},

and the Moreau envelope is defined as:

\displaystyle h_{\mu}(x)=\min_{y}\;h(y)+\frac{\mu}{2}\|y-x\|^{2}.

When $h$ is the indicator function $I_{\mathbf{C}}$ of a closed convex set $\mathbf{C}$, the proximal operator is the projection:

\displaystyle\text{prox}_{\mu I_{\mathbf{C}}}(x)=\Pi_{\mathbf{C}}(x)=\arg\min_{y\in\mathbf{C}}\|y-x\|^{2}.
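For a box $\mathbf{C}=[l,u]$ these objects have closed forms, which the following sketch evaluates: the projection is componentwise clipping, and the Moreau envelope of $I_{\mathbf{C}}$ equals $\frac{\mu}{2}\,\text{dist}(x,\mathbf{C})^{2}$. The box and the function names are assumptions chosen for illustration.

```python
import numpy as np

def prox_indicator_box(x, lo, hi):
    """prox of the indicator of [lo, hi]: the Euclidean projection (independent of mu)."""
    return np.clip(x, lo, hi)

def moreau_envelope_box(x, mu, lo, hi):
    """Moreau envelope of the box indicator: (mu / 2) * dist(x, C)^2."""
    r = x - prox_indicator_box(x, lo, hi)
    return 0.5 * mu * np.dot(r, r)

x = np.array([2.0, -1.0, 0.5])
proj = prox_indicator_box(x, 0.0, 1.0)       # projection onto [0, 1]^3
env = moreau_envelope_box(x, 3.0, 0.0, 1.0)  # envelope value with mu = 3
```

Here the first two coordinates are clipped to the box and the third is left unchanged, so the squared distance to the box is 2.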

We say that $h$ is $\mu$ -weakly convex if the function

x\mapsto h(x)+\frac{\mu}{2}\|x\|^{2}

is convex. The level set of $h$ at $x_{0}$ is defined as:

\displaystyle\mathcal{L}(h(x_{0})):=\{x\in\mathbb{R}^{n}:h(x)\leq h(x_{0})\}.

(5)

Next, we provide a few definitions and properties concerning subdifferential calculus (see also [14, 19]).

Definition 1

(Subdifferential): Let $f:\mathbb{R}^{n}\to\bar{\mathbb{R}}$ be a proper lower semicontinuous function. For a given $x\in\text{dom}\;f$, the Fréchet subdifferential of $f$ at $x$, written $\widehat{\partial}f(x)$, is the set of all vectors $g_{x}\in\mathbb{R}^{n}$ satisfying:

\liminf\limits_{x\neq y,y\to x}\frac{f(y)-f(x)-\langle g_{x},y-x\rangle}{\lVert x-y\rVert}\geq 0.

When $x\notin\text{dom}\;f$ , we set $\widehat{\partial}f(x)=\emptyset$ . The limiting-subdifferential, or simply the subdifferential, of $f$ at $x\in\text{dom}\,f$ , written $\partial f(x)$ , is defined through the following closure process [14]:

	$\displaystyle\partial f(x):=$	$\displaystyle\left\{g_{x}\in\mathbb{E}^{*}:\exists x^{k}\to x\;\text{with}\;f(x^{k})\to f(x)\;\right.$
		$\displaystyle\quad\left.\text{and}\;\exists g_{x}^{k}\in\widehat{\partial}f(x^{k})\;\;\text{with}\;\;g_{x}^{k}\to g_{x}\right\}.$

Note that we have $\widehat{\partial}f(x)\subseteq\partial f(x)$ for each $x\in\text{dom}\,f$ . In the previous inclusion, the first set is closed and convex while the second one is closed, see e.g., [19](Theorem 8.6). For any $x\in\text{dom}\;f$ let us define:

\displaystyle S_{f}(x)=\text{dist}\big(0,\partial f(x)\big):=\inf\limits_{g_{x}\in\partial f(x)}\lVert g_{x}\rVert.

If $\partial f(x)=\emptyset$ , we set $S_{f}(x)=\infty$ . Let us also recall the definition of a function satisfying the Kurdyka-Lojasiewicz (KL) property (see [2] for more details).

Definition 2

A proper lower semicontinuous function $f:\mathbb{R}^{n}\rightarrow\bar{\mathbb{R}}$ satisfies the Kurdyka-Lojasiewicz (KL) property on the compact set $\Omega\subseteq\text{dom}\;f$, on which $f$ takes a constant value $f_{*}$, if there exist $\delta,\epsilon,q,\sigma_{q}>0$ such that one has:

		$\displaystyle f(x)-f_{*}\leq\sigma_{q}\;S_{f}(x)^{q}$		(6)
		$\displaystyle\forall x:\;\text{dist}(x,\Omega)\leq\delta,\;f_{*}<f(x)<f_{*}+\epsilon.$

Note that the relevant aspect of the KL property is when $\Omega$ is a subset of critical points for $f$ , i.e. $\Omega\subseteq\{x:0\in\partial f(x)\}$ , since it is easy to establish the KL property when $\Omega$ is not related to critical points. The KL property holds for a large class of functions including semi-algebraic functions (e.g., real polynomial functions), vector or matrix (semi)norms (e.g., $\|\cdot\|_{p}$ with $p\geq 0$ rational number), trigonometric functions, logarithm functions, exponential functions and uniformly convex functions, see [2] for a comprehensive list.
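As a simple illustration of Definition 2 (a one-dimensional example that is not taken from the applications considered later), take $f(x)=x^{2}$ with $\Omega=\{0\}$ and $f_{*}=0$. Then $S_{f}(x)=2|x|$ and (6) holds with $q=2$ and $\sigma_{q}=1/4$:

\displaystyle f(x)-f_{*}=x^{2}=\frac{1}{4}\big(2|x|\big)^{2}=\sigma_{q}\,S_{f}(x)^{q}.

For $f(x)=x^{4}$ one has $S_{f}(x)=4|x|^{3}$, and (6) holds with the smaller exponent $q=4/3$ and $\sigma_{q}=4^{-4/3}$:

\displaystyle f(x)-f_{*}=x^{4}=4^{-4/3}\big(4|x|^{3}\big)^{4/3}=\sigma_{q}\,S_{f}(x)^{q}.

Functions that are flatter around their minimizers thus correspond to smaller values of $q$.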

III Modified Projected Gauss-Newton method

In this section, we present the Modified Projected Gauss-Newton (MPG-N) method and then derive convergence results. Recall that the problem of interest is:

\displaystyle\min\limits_{x\in\mathbb{R}^{n}}f(x):=\|F(x)\|+I_{\mathbf{C}}(x).

(7)

We consider the following assumption:

Assumption 1

1. $F$ is differentiable and the Jacobian is Lipschitz continuous:

\displaystyle\|\nabla F(x)-\nabla F(y)\|\leq L_{F}\|x-y\|,\quad\forall x,y\in\mathbf{C}.

2. Problem (7) has a solution, i.e., there exists $x^{*}\in\mathbf{C}$ such that $f^{*}:=f(x^{*})=\min_{x\in\mathbf{C}}f(x)>-\infty$.

An immediate consequence of (4) is:

\displaystyle f(x)\leq\|F(y)+\nabla F(y)(x-y)\|+\frac{L_{F}}{2}\|x-y\|^{2}\quad\forall x,y\in\mathbf{C}.

Then, for $M>0$ , we define the modified projected Gauss-Newton iterate at a point $x\in\mathbf{C}$ as follows:

	$\displaystyle T_{M}(x)$	$\displaystyle=\arg\min_{y\in\mathbf{C}}\Psi_{M}(y;x)$		(8)
		$\displaystyle:=\arg\min_{y\in\mathbf{C}}\|F(x)+\nabla F(x)(y-x)\|+\frac{M}{2}\|y-x\|^{2}.$

Note that this subproblem is strongly convex, hence $T_{M}(x)$ is well defined and unique. Finally, the modified projected Gauss-Newton algorithm is as follows:

MPG-N algorithm
1. Choose $x_{0}\in\mathbf{C}$ and $L_{0},\delta>0$.
2. For $k\geq 0$ do:
   (a) Find $L_{0}\leq M_{k}\leq 2L_{F}$ such that:

\displaystyle\frac{\delta}{2}\|T_{M_{k}}(x_{k})-x_{k}\|^{2}\leq\Psi_{M_{k}}\left(T_{M_{k}}(x_{k});x_{k}\right)-f\left(T_{M_{k}}(x_{k})\right)

(9)

   (b) Update $x_{k+1}=T_{M_{k}}(x_{k})$.

The first step of MPG-N algorithm consists of finding a constant $M_{k}>0$ such that inequality (9) holds. If the constant $L_{F}$ is known, we can take $M_{k}=L_{F}+\delta$ . Otherwise, we can apply the following line search procedure [16]:

	$\displaystyle\textbf{While}\;(9)\;\text{is not satisfied}\;\textbf{do}\;\;M_{k}=2M_{k}$
	$\displaystyle M_{k+1}=\text{max}\left(\frac{M_{k}}{2},L_{0}\right).$

The next lemma shows that this process is well defined.

Lemma 1

Let Assumption 1 hold. At the $k$-th iteration of the MPG-N algorithm, if $M_{k}-L_{F}\geq\delta$, then inequality (9) holds.

Proof:

We have from inequality (4) that:

	$\displaystyle\frac{M_{k}-L_{F}}{2}\|T_{M_{k}}(x_{k})-x_{k}\|^{2}+f(T_{M_{k}}(x_{k}))$
	$\displaystyle\leq\|F(x_{k})+\nabla F(x_{k})(T_{M_{k}}(x_{k})-x_{k})\|+\frac{M_{k}}{2}\|T_{M_{k}}(x_{k})-x_{k}\|^{2}$
	$\displaystyle=\Psi_{M_{k}}\left(T_{M_{k}}(x_{k});x_{k}\right).$

Since $M_{k}-L_{F}\geq\delta$, it follows immediately that:

\displaystyle\frac{\delta}{2}\|T_{M_{k}}(x_{k})-x_{k}\|^{2}\leq\Psi_{M_{k}}\left(T_{M_{k}}(x_{k});x_{k}\right)-f(T_{M_{k}}(x_{k})),

which is the statement of the lemma. ∎

Note that Lemma 1 ensures that (9) always holds, provided that $M_{k}\geq L_{F}+\delta$. However, in practice, the line search procedure allows us to work with smaller values of $M_{k}$ (possibly $M_{k}\leq L_{F}$) for which condition (9) still holds. Next, let us discuss the solution of the subproblem (8). Following [18], we have:

	$\displaystyle\min_{y\in\mathbf{C}}\|F(x)+\nabla F(x)(y-x)\|+\frac{M}{2}\|y-x\|^{2}$
	$\displaystyle=\min_{y\in\mathbf{C}}\max_{\|s\|\leq 1}\langle s,F(x)+\nabla F(x)(y-x)\rangle+\frac{M}{2}\|y-x\|^{2}$
	$\displaystyle=\max_{\|s\|\leq 1}\min_{y\in\mathbf{C}}\langle s,F(x)+\nabla F(x)(y-x)\rangle+\frac{M}{2}\|y-x\|^{2}$
	$\displaystyle=\max_{\|s\|\leq 1}\min_{y\in\mathbf{C}}\langle\nabla F(x)^{T}s,y-x\rangle+\frac{M}{2}\|y-x\|^{2}+\langle s,F(x)\rangle$
	$\displaystyle=\max_{\|s\|\leq 1}\min_{y\in\mathbf{C}}\frac{M}{2}\|y-x+\frac{1}{M}\nabla F(x)^{T}s\|^{2}-\frac{1}{2M}\|\nabla F(x)^{T}s\|^{2}$
	$\displaystyle\quad+\langle s,F(x)\rangle$
	$\displaystyle=\max_{\|s\|\leq 1}\frac{M}{2}\|\Pi_{\mathbf{C}}\left(x-\frac{1}{M}\nabla F(x)^{T}s\right)-x+\frac{1}{M}\nabla F(x)^{T}s\|^{2}$
	$\displaystyle\quad-\frac{1}{2M}\|\nabla F(x)^{T}s\|^{2}+\langle s,F(x)\rangle,$

which can be solved with standard convex optimization tools, such as trust-region methods [5].
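To make the scheme concrete, here is a minimal sketch of the MPG-N iteration with the doubling line search, under several assumptions that are not part of the text above: $\mathbf{C}$ is a box, the dual problem in $s$ is solved approximately by projected gradient ascent on the unit ball (a simple stand-in for the trust-region solvers mentioned above), and a cap `M_max` is added as a numerical safeguard. All function names and the small test problem are hypothetical.

```python
import numpy as np

def project_box(z, lo, hi):
    """Projection onto the box C = [lo, hi]."""
    return np.clip(z, lo, hi)

def solve_subproblem(Fx, J, x, M, lo, hi, iters=500):
    """Approximate T_M(x) via projected gradient ascent on the dual variable s."""
    s = np.zeros_like(Fx)
    step = M / max(np.linalg.norm(J, 2) ** 2, 1e-12)  # 1 / Lipschitz const. of dual gradient
    for _ in range(iters):
        y = project_box(x - J.T @ s / M, lo, hi)       # inner minimizer for the current s
        s = s + step * (Fx + J @ (y - x))              # ascent step along the dual gradient
        s /= max(1.0, np.linalg.norm(s))               # project back onto the unit ball
    return project_box(x - J.T @ s / M, lo, hi)

def mpgn(F, jac, x0, lo, hi, L0=1.0, delta=0.1, tol=1e-8, max_iter=100, M_max=1e8):
    """MPG-N with doubling line search on M_k; returns last iterate and the values ||F(x_k)||."""
    x = project_box(np.asarray(x0, dtype=float), lo, hi)
    M = L0
    history = [np.linalg.norm(F(x))]
    for _ in range(max_iter):
        if history[-1] <= tol:
            break
        Fx, J = F(x), jac(x)
        while True:
            y = solve_subproblem(Fx, J, x, M, lo, hi)
            d = y - x
            psi = np.linalg.norm(Fx + J @ d) + 0.5 * M * np.dot(d, d)
            # Test (9); M_max is a safeguard added here, not part of the algorithm above.
            if 0.5 * delta * np.dot(d, d) <= psi - np.linalg.norm(F(y)) or M >= M_max:
                break
            M *= 2.0
        x = y
        history.append(np.linalg.norm(F(x)))
        M = max(M / 2.0, L0)
    return x, history

# Hypothetical test problem: unit circle intersected with the line x1 = x2, on [0, 2]^2.
def F_demo(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])

def jac_demo(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
```

A call such as `mpgn(F_demo, jac_demo, [1.5, 0.5], 0.0, 2.0)` runs the sketch from a feasible starting point. Since the bound behind Lemma 1 holds for any trial point, the line search in this sketch terminates once $M_{k}\geq L_{F}+\delta$, even though the subproblem is solved only approximately.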

III-A Convergence analysis

In this section we derive convergence results for MPG-N algorithm. First, we can prove the following descent:

Lemma 2

Let Assumption 1 hold. Let $(x_{k})_{k\geq 0}$ be generated by MPG-N algorithm. Then, we have:

1. The sequence $(f(x_{k}))_{k\geq 0}$ is nonincreasing and satisfies:

\displaystyle\frac{\delta}{2}\|x_{k+1}-x_{k}\|^{2}\leq f(x_{k})-f(x_{k+1}).

(10)

2. The sequence $(x_{k})_{k\geq 0}$ satisfies:

\displaystyle\sum_{k=0}^{\infty}\|x_{k+1}-x_{k}\|^{2}<\infty,\quad\lim_{k\to\infty}\|x_{k+1}-x_{k}\|^{2}=0.

Proof:

We have:

\displaystyle\Psi_{M_{k}}\left(T_{M_{k}}(x_{k});x_{k}\right)\leq\min\limits_{x\in\mathbf{C}}\Psi_{M_{k}}\left(x;x_{k}\right)\leq\Psi_{M_{k}}\left(x_{k};x_{k}\right)=f(x_{k}).

Then, combining this inequality with (9), we get the first statement. Further, summing up inequality (10) and using that $f$ is bounded from below by $f^{*}$, we get:

\displaystyle\sum_{k=0}^{N}\frac{\delta}{2}\|x_{k+1}-x_{k}\|^{2}\leq f(x_{0})-f(x_{N+1})\leq f(x_{0})-f^{*},

and the second statement follows. ∎

In [7, 15], the authors prove that for the composite problem (7), the quantity $\text{dist}(0,\partial f(x_{k+1}))$ does not always tend to zero in the limit, even if $\|x_{k+1}-x_{k}\|$ goes to zero. Thus, we must look elsewhere for a connection between $\text{dist}(0,\partial f(\cdot))$ and $\|x_{k+1}-x_{k}\|$. Let us start with the following observation, whose proof can be found in Lemma 4.2 of [7]: the function $f(x):=\|F(x)\|+I_{\mathbf{C}}(x)$ is $L_{F}$-weakly convex. Weak convexity of $f$ has an immediate consequence on the Moreau envelope, denoted $f_{\mu}$:

Lemma 3

(Lemma 4.3 [7]) Let $\mu>L_{F}$ . Then, the proximal map $\text{prox}_{\mu f}$ is well-defined and single-valued. The Moreau envelope $f_{\mu}$ is smooth with gradient given by:

\displaystyle\nabla f_{\mu}(x)=\mu(x-\text{prox}_{\mu f}(x)).

Further, we have the following lemma, whose proof is similar to the proof of Lemma 5 in [15].

Lemma 4

Let Assumption 1 hold. Let $(x_{k})_{k\geq 0}$ be generated by the MPG-N method and consider $y_{k+1}=\text{prox}_{\mu f}(x_{k})$, where $\frac{1}{\mu}\in(0,\frac{1}{3L_{F}})$, i.e., $\mu>3L_{F}$. Then, we have the following relations:

1. $\|y_{k+1}-x_{k}\|^{2}\leq\frac{\mu}{\mu-3L_{F}}\|x_{k+1}-x_{k}\|^{2}$.
2. $\text{dist}(0,\partial f(y_{k+1}))\leq\mu\|y_{k+1}-x_{k}\|$.

Proof:

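Let us prove the first statement. Since $f$ is $L_{F}$-weakly convex and $\mu>3L_{F}$, the point $y_{k+1}$ is well-defined and unique. Since $y_{k+1}$ minimizes $y\mapsto f(y)+\frac{\mu}{2}\|y-x_{k}\|^{2}$, comparing its value with the value at $x_{k+1}$ gives:

\displaystyle f(y_{k+1})+\frac{\mu}{2}\|y_{k+1}-x_{k}\|^{2}\leq f(x_{k+1})+\frac{\mu}{2}\|x_{k+1}-x_{k}\|^{2}.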

(11)

Further, from the definition of $x_{k+1}$, we have:

	$\displaystyle f(x_{k+1})$	$\displaystyle\overset{(9)}{\leq}\|F(x_{k})+\nabla F(x_{k})(x_{k+1}-x_{k})\|+\frac{M_{k}}{2}\|x_{k+1}-x_{k}\|^{2}$
		$\displaystyle\overset{(8)}{\leq}\min_{x\in\mathbf{C}}\|F(x_{k})+\nabla F(x_{k})(x-x_{k})\|+\frac{M_{k}}{2}\|x-x_{k}\|^{2}$
		$\displaystyle\leq\min_{x\in\mathbf{C}}f(x)+\frac{M_{k}+L_{F}}{2}\|x-x_{k}\|^{2}$
		$\displaystyle\leq f(y_{k+1})+\frac{M_{k}+L_{F}}{2}\|y_{k+1}-x_{k}\|^{2},$

where the third inequality follows from the Lipschitz continuity of the Jacobian (Assumption 1, as in (4)) and the last inequality follows by taking $x=y_{k+1}$. Since $M_{k}\leq 2L_{F}$, we thus have:

\displaystyle f(x_{k+1})\leq f(y_{k+1})+\frac{3L_{F}}{2}\|y_{k+1}-x_{k}\|^{2}.

(12)

Finally, combining this inequality with (11), we get:

\displaystyle\|y_{k+1}-x_{k}\|^{2}\leq\frac{\mu}{\mu-3L_{F}}\|x_{k+1}-x_{k}\|^{2},

which proves the first statement. Further, from the optimality conditions of $y_{k+1}$ , we get:

\displaystyle-\mu(y_{k+1}-x_{k})\in\partial f(y_{k+1}).

(13)

Thus, the second statement follows. ∎
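For completeness, the combination of (11) and (12) used in the proof can be spelled out as follows. Bounding $f(x_{k+1})$ on the right-hand side of (11) by means of (12) gives:

\displaystyle f(y_{k+1})+\frac{\mu}{2}\|y_{k+1}-x_{k}\|^{2}\leq f(y_{k+1})+\frac{3L_{F}}{2}\|y_{k+1}-x_{k}\|^{2}+\frac{\mu}{2}\|x_{k+1}-x_{k}\|^{2},

and cancelling $f(y_{k+1})$ and rearranging yields:

\displaystyle\frac{\mu-3L_{F}}{2}\|y_{k+1}-x_{k}\|^{2}\leq\frac{\mu}{2}\|x_{k+1}-x_{k}\|^{2}.

Dividing by $(\mu-3L_{F})/2$, which is positive because $\mu>3L_{F}$, gives the first statement of Lemma 4.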

Using the strict descent and Lemma 4, we can conclude the following global convergence rate:

Theorem 1

Let the assumptions of Lemma 4 hold. Then:

\displaystyle\min\limits_{j=1:k}\text{dist}(0,\partial f(y_{j}))\leq\mathcal{O}\left(\frac{1}{k^{1/2}}\right).

Moreover, any limit point of the sequence $(x_{k})_{k\geq 0}$ is a stationary point of problem (7).

Proof:

From Lemma 4, we have:

\displaystyle\text{dist}(0,\partial f(y_{k+1}))^{2}\leq\frac{\mu^{3}}{\mu-3L_{F}}\|x_{k+1}-x_{k}\|^{2}.

Further, combining this inequality with (10), we get:

\displaystyle\text{dist}(0,\partial f(y_{k+1}))^{2}\leq\frac{6\mu^{3}}{\delta(\mu-3L_{F})}\left(f(x_{k})-f(x_{k+1})\right).

Summing up this inequality, using that $f$ is bounded from below by $f^{*}$, and taking the minimum, we get:

\displaystyle\min\limits_{j=0:k}\text{dist}(0,\partial f(y_{j+1}))\leq\sqrt{\frac{6\mu^{3}(f(x_{0})-f^{*})}{\delta(\mu-3L_{F})}}\,\frac{1}{k^{1/2}},

which proves our first statement. Further, let $x^{*}$ be a limit point of $(x_{k})_{k\geq 0}$. Since $\|y_{k+1}-x_{k}\|\to 0$ by Lemma 4 and Lemma 2, $x^{*}$ is also a limit point of the sequence $(y_{k})_{k\geq 0}$. This means that there exists a subsequence $(y_{k_{j}})_{j\geq 0}$ such that $y_{k_{j}}\to x^{*}$. Since $F$ is continuous and $y_{k_{j}}\in\mathbf{C}$, we have $f(y_{k_{j}})\to f(x^{*})$. Note that, by (13), we have $\mu(x_{k_{j}-1}-y_{k_{j}})\in\partial f(y_{k_{j}})$ and $(y_{k_{j}}-x_{k_{j}-1})\to 0$. Then, we conclude from the definition of the limiting subdifferential that $0\in\partial f(x^{*})$ and hence $x^{*}$ is a stationary point. ∎

III-B Better rates under KL

In [17], the author imposes a nondegeneracy assumption on the Jacobian, that is, $\sigma_{\text{min}}(\nabla F(x))>0$ for all $x$ in the level set of $\|F(x_{0})\|$, in order to prove a global convergence rate for the M-GN method. Such a condition is not always valid in practice. In this section, we derive improved convergence rates for the MPG-N method provided that the objective function satisfies the KL property. In general, the KL condition is less conservative than the nondegeneracy condition (see Section IV). Let us denote the set of limit points of $(x_{k})_{k\geq 0}$ by:

	$\displaystyle\Omega(x_{0})=$	$\displaystyle\{\bar{x}\in\mathbb{E}:\exists\text{ an increasing sequence of integers }$
		$\displaystyle(k_{t})_{t\geq 0},\text{ such that }x_{k_{t}}\to\bar{x}\text{ as }t\to\infty\}.$

We have the following convergence rate:

Theorem 2

Let the assumptions of Lemma 4 hold. Additionally, assume that $f$ satisfies the KL property (6) on $\Omega(x_{0})$. Then, the following convergence rates in function values hold for the sequence $(x_{k})_{k\geq 0}$ generated by the MPG-N algorithm:

$\bullet$ If $q\geq 2$, then $f(x_{k})$ converges to $f_{*}$ linearly for $k$ sufficiently large.
$\bullet$ If $q<2$, then $f(x_{k})$ converges to $f_{*}$ at a sublinear rate of order $\mathcal{O}\left(\frac{1}{k^{\frac{q}{2-q}}}\right)$ for $k$ sufficiently large.

Proof:

We have:

	$\displaystyle f(x_{k+1})-f_{*}$	$\displaystyle\overset{(12)}{\leq}f(y_{k+1})-f_{*}+\frac{3L_{F}}{2}\|y_{k+1}-x_{k}\|^{2}$
		$\displaystyle\overset{(6)}{\leq}\sigma_{q}S_{f}(y_{k+1})^{q}+\frac{3L_{F}}{2}\|y_{k+1}-x_{k}\|^{2}$
		$\displaystyle\leq\sigma_{q}\mu^{q}\|y_{k+1}-x_{k}\|^{q}+\frac{3L_{F}}{2}\|y_{k+1}-x_{k}\|^{2}$
		$\displaystyle\leq\sigma_{q}\mu^{q}\left(\frac{\mu}{\mu-3L_{F}}\right)^{q/2}\|x_{k+1}-x_{k}\|^{q}$
		$\displaystyle\quad+\frac{3L_{F}\mu}{2(\mu-3L_{F})}\|x_{k+1}-x_{k}\|^{2}$
		$\displaystyle\leq C_{1}(f(x_{k})-f(x_{k+1}))^{\frac{q}{2}}+C_{2}(f(x_{k})-f(x_{k+1})),$

where the third and the fourth inequalities follow from Lemma 4, the last inequality follows from the descent (10), $C_{1}=\sigma_{q}\mu^{q}\left(\frac{\mu}{\mu-3L_{F}}\right)^{q/2}(2/\delta)^{q/2}$ and $C_{2}=\frac{3L_{F}\mu}{\delta(\mu-3L_{F})}$. Denote $\delta_{k}=f(x_{k})-f_{*}$, then we get:

\displaystyle\delta_{k+1}\leq C_{1}(\delta_{k}-\delta_{k+1})^{\frac{q}{2}}+C_{2}(\delta_{k}-\delta_{k+1}).

Using Lemma 2 in [15] with $\theta=\frac{2}{q}$ we get our statement. ∎
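The two regimes of Theorem 2 can be visualized by iterating the recursion above in its worst case, i.e., by taking at each step the largest $\delta_{k+1}$ that still satisfies it. The sketch below does this by bisection; the constants $C_{1}=C_{2}=1$, the initial gap and the function names are assumptions chosen only for illustration.

```python
def next_gap(gap, q, C1=1.0, C2=1.0, iters=200):
    """Largest t in [0, gap] with t <= C1*(gap - t)**(q/2) + C2*(gap - t), by bisection."""
    lo, hi = 0.0, gap
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= C1 * (gap - mid) ** (q / 2.0) + C2 * (gap - mid):
            lo = mid      # recursion still satisfied: the threshold is to the right
        else:
            hi = mid      # recursion violated: the threshold is to the left
    return lo

def simulate(q, n_steps, gap0=1.0):
    """Worst-case sequence of gaps delta_k allowed by the recursion."""
    gaps = [gap0]
    for _ in range(n_steps):
        gaps.append(next_gap(gaps[-1], q))
    return gaps

linear = simulate(q=2.0, n_steps=20)       # constant contraction factor between gaps
sublinear = simulate(q=1.0, n_steps=1000)  # gaps decay roughly like 1/k
```

For $q=2$ and $C_{1}=C_{2}=1$ the recursion reads $\delta_{k+1}\leq 2(\delta_{k}-\delta_{k+1})$, so the worst-case contraction factor is $2/3$; for $q=1$ the exponent $\frac{q}{2-q}$ equals $1$, which matches the $1/k$ decay observed in the simulation.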

IV Power flow analysis

Power flow problems are among the most studied problems in power systems, being an important tool for planning and operation of the electric grid. In this section we consider the particular problem of power flow analysis, defined as follows. Consider a power system with $N$ buses (see e.g., Figure 1 for the IEEE 14-bus system). We denote by $v_{i}$, $p_{i}$ and $q_{i}$ the complex voltage, active power and reactive power at the $i$-th bus, respectively. Let $Y:=G+jB$ be the admittance matrix and denote $p=(p_{1},\cdots,p_{N})$, $q=(q_{1},\cdots,q_{N})$ and $v=(v_{1},\cdots,v_{N})$. Given a complex load vector $s:=s_{R}+js_{I}$, the power flow analysis problem is to find $v=(v_{1},\cdots,v_{N})$ such that [6]:

\displaystyle F(v)=s\;;\quad F(v)=p+jq=\text{diag}(vv^{H}Y^{H}),

(14)

where $(.)^{H}$ is the Hermitian transpose. This problem is equivalent to the following optimization problem:

Figure 1: Representation of the IEEE 14-bus system [10].

	$\displaystyle\min_{v=(u,\theta)}\|F(v)-s\|$
	$\displaystyle s.t.\quad u\in[u_{\text{min}},u_{\text{max}}],\quad\theta\in[-\pi,\pi].$

In [6], the authors provide a similar formulation for the power flow analysis problem, but using $\|\cdot\|^{2}$ as the merit function to measure the distance between the objective function $F(\cdot)$ and the desired complex load $s$ . As we have mentioned earlier, it is beneficial to use only $\|\cdot\|$ as the merit function. Further, since we have (see e.g., [12]):

	$\displaystyle p_{i}(u,\theta)=\sum_{k=1}^{N}u_{i}u_{k}\left(G(i,k)\text{cos}(\theta_{i}-\theta_{k})+B(i,k)\text{sin}(\theta_{i}-\theta_{k})\right),$
	$\displaystyle q_{i}(u,\theta)=-\sum_{k=1}^{N}u_{i}u_{k}\left(B(i,k)\text{cos}(\theta_{i}-\theta_{k})-G(i,k)\text{sin}(\theta_{i}-\theta_{k})\right),$

and denote:

\textbf{C}=\{(u,\theta):u\in[u_{\text{min}},u_{\text{max}}],\quad\theta\in[-\pi,\pi]\},

then, the previous optimization problem is equivalent to the following optimization problem:

\displaystyle\min\limits_{x=(u;\theta)\in\textbf{C}}f(x)=\begin{Vmatrix}p(x)-s_{R}\\ q(x)-s_{I}\end{Vmatrix}.

(15)

The most efficient algorithm for solving the (unconstrained) power flow analysis problem is the Newton-Raphson (NR) method [22]. However, it may perform poorly when the initial point is far from the optimum or when the system is stressed (i.e., the problem is ill-conditioned). In a recent paper [6], the authors proposed a hybrid method that combines the stochastic gradient descent (SGD) and NR methods to overcome the numerical challenges of this problem. The iterative process starts with the NR algorithm and, if the method detects divergence (e.g., when the condition number of the Jacobian deteriorates), it switches to the SGD algorithm. After a few SGD steps, it switches back to the NR iterations, and the process is repeated until an (approximate) optimal solution is found. Since this hybrid algorithm cannot deal with (simple) constraints as in (15), we propose to use our new method, modified projected Gauss-Newton (MPG-N), and compare its performance with the projected gradient descent (PGD) method applied to problem (2), where $F$ is given in (14). In order to apply both methods, one needs to evaluate the gradients of the functions $p(x)$ and $q(x)$. We have the following expressions for the derivatives of the functions $p_{i}$ and $q_{i}$:

	$\displaystyle\frac{\partial p_{i}}{\partial u_{i}}=2u_{i}G(i,i)+\sum_{\begin{subarray}{c}k=1\\ k\neq i\end{subarray}}^{N}u_{k}\left(G(i,k)\text{cos}(\theta_{i}-\theta_{k})+B(i,k)\text{sin}(\theta_{i}-\theta_{k})\right),$
	$\displaystyle\frac{\partial p_{i}}{\partial u_{k}}=u_{i}\left(G(i,k)\text{cos}(\theta_{i}-\theta_{k})+B(i,k)\text{sin}(\theta_{i}-\theta_{k})\right),\forall k\neq i,$
	$\displaystyle\frac{\partial p_{i}}{\partial\theta_{i}}=\sum_{\begin{subarray}{c}k=1\\ k\neq i\end{subarray}}^{N}u_{k}u_{i}\left(-G(i,k)\text{sin}(\theta_{i}-\theta_{k})+B(i,k)\text{cos}(\theta_{i}-\theta_{k})\right),$
	$\displaystyle\frac{\partial p_{i}}{\partial\theta_{k}}=u_{i}u_{k}\left(G(i,k)\text{sin}(\theta_{i}-\theta_{k})-B(i,k)\text{cos}(\theta_{i}-\theta_{k})\right),\forall k\neq i,$
	$\displaystyle\frac{\partial q_{i}}{\partial u_{i}}=-2u_{i}B(i,i)-\sum_{\begin{subarray}{c}k=1\\ k\neq i\end{subarray}}^{N}u_{k}\left(B(i,k)\text{cos}(\theta_{i}-\theta_{k})-G(i,k)\text{sin}(\theta_{i}-\theta_{k})\right),$
	$\displaystyle\frac{\partial q_{i}}{\partial u_{k}}=-u_{i}\left(B(i,k)\text{cos}(\theta_{i}-\theta_{k})-G(i,k)\text{sin}(\theta_{i}-\theta_{k})\right),\forall k\neq i,$
	$\displaystyle\frac{\partial q_{i}}{\partial\theta_{i}}=\sum_{\begin{subarray}{c}k=1\\ k\neq i\end{subarray}}^{N}u_{k}u_{i}\left(B(i,k)\text{sin}(\theta_{i}-\theta_{k})+G(i,k)\text{cos}(\theta_{i}-\theta_{k})\right),$
	$\displaystyle\frac{\partial q_{i}}{\partial\theta_{k}}=-u_{k}u_{i}\left(G(i,k)\text{cos}(\theta_{i}-\theta_{k})+B(i,k)\text{sin}(\theta_{i}-\theta_{k})\right),\forall k\neq i.$

Hence, the gradient of the squared merit function $\frac{1}{2}f(x)^{2}$, which is the one used by PGD, belongs to $\mathbb{R}^{2N}$ and we have:

\displaystyle\nabla\left(\tfrac{1}{2}f(x)^{2}\right)=\sum_{i=1}^{N}\frac{\partial p_{i}(x)}{\partial x}\left(p_{i}(x)-(s_{R})_{i}\right)+\frac{\partial q_{i}(x)}{\partial x}\left(q_{i}(x)-(s_{I})_{i}\right),

where $\frac{\partial p_{i}(x)}{\partial x}=\left(\frac{\partial p_{i}(x)}{\partial u_{1}};\cdots;\frac{\partial p_{i}(x)}{\partial u_{N}};\frac{\partial p_{i}(x)}{\partial\theta_{1}};\cdots;\frac{\partial p_{i}(x)}{\partial\theta_{N}}\right)$ and $\frac{\partial q_{i}(x)}{\partial x}=\left(\frac{\partial q_{i}(x)}{\partial u_{1}};\cdots;\frac{\partial q_{i}(x)}{\partial u_{N}};\frac{\partial q_{i}(x)}{\partial\theta_{1}};\cdots;\frac{\partial q_{i}(x)}{\partial\theta_{N}}\right)$ for $i=1:N$. Note that the Jacobian $\nabla F$ may be ill-conditioned, while the objective function $f$ may still satisfy the KL inequality.
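The power injections and their Jacobian can be assembled in a vectorized way, as in the following sketch. It assumes that $G$ and $B$ are dense $N\times N$ arrays, that $x=(u;\theta)$, and that $p+jq=\text{diag}(vv^{H}Y^{H})$ as in (14); the function names `power_injections` and `power_jacobian` are hypothetical.

```python
import numpy as np

def power_injections(u, theta, G, B):
    """Active and reactive injections p(u, theta), q(u, theta) in polar coordinates."""
    dth = theta[:, None] - theta[None, :]            # dth[i, k] = theta_i - theta_k
    A = G * np.cos(dth) + B * np.sin(dth)            # kernel of p_i
    C = G * np.sin(dth) - B * np.cos(dth)            # kernel of q_i
    uu = np.outer(u, u)
    return (uu * A).sum(axis=1), (uu * C).sum(axis=1)

def power_jacobian(u, theta, G, B):
    """Jacobian of (p; q) with respect to x = (u; theta), as a 2N x 2N array."""
    dth = theta[:, None] - theta[None, :]
    A = G * np.cos(dth) + B * np.sin(dth)
    C = G * np.sin(dth) - B * np.cos(dth)
    uu = np.outer(u, u)
    dp_du = u[:, None] * A + np.diag(A @ u)          # diagonal picks up the extra u_i term
    dq_du = u[:, None] * C + np.diag(C @ u)
    offC = uu * C - np.diag(np.diag(uu * C))         # off-diagonal part only
    offA = uu * A - np.diag(np.diag(uu * A))
    dp_dth = offC - np.diag(offC.sum(axis=1))        # d p_i / d theta_k and d p_i / d theta_i
    dq_dth = -offA + np.diag(offA.sum(axis=1))       # d q_i / d theta_k and d q_i / d theta_i
    return np.block([[dp_du, dp_dth], [dq_du, dq_dth]])
```

A finite-difference comparison on random data is a cheap way to validate such a Jacobian before plugging it into MPG-N or PGD.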

IV-A Numerical simulations

In this subsection, we demonstrate the efficiency of the modified projected Gauss-Newton (MPG-N) method using several IEEE bus test cases from [10] (IEEE 14 bus, IEEE 39 bus, IEEE 57 bus and IEEE 118 bus). We choose an optimal point $x^{*}\in\textbf{C}$ and then generate $s_{R}=p(x^{*})$ and $s_{I}=q(x^{*})$ (see also [6]). We apply the MPG-N method to problem (15) and the PGD method to problem (2), where $F$ is given in (14), and test whether the algorithms can reach $x^{*}$ from a random feasible starting point. The stopping criterion for both algorithms is $\|F(x_{k})\|\leq 10^{-3}$. The results are given in Figure 2, where we plot the evolution of the function value $\|F(x_{k})\|$ along the iterations. From this figure one can observe that, in the beginning, PGD performs better than the MPG-N method. However, the MPG-N method requires a smaller number of iterations (up to 5 times fewer) than PGD in order to achieve the desired accuracy.

V Conclusion

In this paper, we have proposed a modified projected Gauss-Newton (MPG-N) method for solving constrained least-squares problems. Under mild assumptions, we have proved global convergence results for the iterates. More precisely, we have proved that any limit point of the sequence generated by MPG-N algorithm is a stationary point and under the KL property, we have derived convergence rates in function values depending on the KL parameter. Finally, we have considered solving a power flow problem and compared the performance of our scheme with the projected gradient method, showing the efficiency of the proposed method on several IEEE bus test cases.

ACKNOWLEDGMENT

The research leading to these results has received funding from: ITN-ETN project TraDE-OPT funded by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement No. 861137; NO Grants 2014-2021, RO-NO-2019-0184, under project ELO-Hyp, contract no. 24/2020; UEFISCDI PN-III-P4-PCE-2021-0720, under project L2O-MOC, nr. 70/2022.

References

[1] S. Abhyankar, Q. Cui, and A. J. Flueck, Fast power flow analysis using a hybrid current-power balance formulation in rectangular coordinates, in IEEE PES TD Conference and Exposition, 1–5, 2014.
[2] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota, Clarke subgradients of stratifiable functions, SIAM Journal on Optimization, 18(2): 556–572, 2007.
[3] L. M. Braz, C. A. Castro, and C. Murati, A critical evaluation of step size optimization based load flow methods, IEEE Transactions on Power Systems, 15(1): 202–207, 2000.
[4] Y. Chen and C. Shen, A Jacobian-free Newton method with adaptive preconditioner and its application for power flow calculations, IEEE Transactions on Power Systems, 21(3): 1096–1103, 2006.
[5] A.R. Conn, N.I.M. Gould, and Ph. L. Toint, Trust region methods, Society for Industrial and Applied Mathematics, 2000.
[6] N. Costilla-Enriquez, Y. Weng, and B. Zhang, Combining Newton-Raphson and stochastic gradient descent for power flow analysis, IEEE Transactions on Power Systems, 36: 514–517, 2021.
[7] D. Drusvyatskiy and C. Paquette, Efficiency of minimizing compositions of convex functions and smooth maps, Mathematical Programming, 178(1-2): 503–558, 2019.
[8] H. O. Hartley, The modified Gauss-Newton method for the fitting of non-linear regression functions by least squares, Technometrics 3(2): 269–280, 1961.
[9] A. Hauswirth, S. Bolognani, G. Hug and F. Dörfler, Projected gradient descent on Riemannian manifolds with applications to online power system optimization, $54^{th}$ Annual Allerton Conference on Communication, Control, and Computing (Allerton), IEEE, 2016.
[10] Illinois Center for a Smarter Electric Grid. Available: http://publish.illinois.edu/smartergrid/, 2013.
[11] R.I. Jennrich and S. M. Robinson, A Newton-Raphson algorithm for maximum likelihood factor analysis, Psychometrika, 34(1): 111–123, 1969.
[12] H. Le Nguyen, Newton-Raphson method in complex form, IEEE Transactions on Power Systems, 12(3), 1997.
[13] F. Milano, Continuous Newton’s method for power flow analysis, IEEE Transactions on Power Systems, 24(1): 50–57, 2009.
[14] B. Mordukhovich, Variational analysis and generalized differentiation. Basic theory, Springer, 2006.
[15] Y. Nabou and I. Necoara, Efficiency of higher-order algorithms for minimizing general composite functions, arXiv preprint arXiv:2203.13367, 2022.
[16] Yu. Nesterov and B.T. Polyak, Cubic regularization of Newton method and its global performance, Mathematical Programming, 108(1): 177–205, 2006.
[17] Y. Nesterov, Modified Gauss-Newton scheme with worst case guarantees for global performance, Optimization Methods and Software, 22(3): 469–483, 2007.
[18] Y. Nesterov, Gradient methods for minimizing composite functions, Mathematical Programming, 140: 125–161, 2013.
[19] R.T. Rockafellar and R. Wets, Variational Analysis, Springer, 1998.
[20] R. Salgado, A. Brameller, and P. Aitchison, Optimal power flow solutions using the gradient projection method. Part 1: Theoretical basis, IEE Proceedings C (Generation, Transmission and Distribution). Vol. 137. No. 6. IET Digital Library, 1990.
[21] R. Salgado, A. Brameller, and P. Aitchison, Optimal power flow solutions using the gradient projection method. Part 2: Modelling of the power system equations, IEE Proceedings C (Generation, Transmission and Distribution). Vol. 137. No. 6. IET Digital Library, 1990.
[22] B. Stott, Review of load-flow calculation methods, Proceedings of the IEEE, 62(7): 916–929, 1974.

	$\displaystyle\min_{y\in\mathbf{C}}\|F(x)+\nabla F(x)(y-x)\|+\frac{M}{2}\|y-x\|^{2}$
	$\displaystyle=\min_{y\in\mathbf{C}}\max_{\|s\|\leq 1}\langle s,F(x)+\nabla F(x)(y-x)\rangle+\frac{M}{2}\|y-x\|^{2}$
	$\displaystyle=\max_{\|s\|\leq 1}\min_{y\in\mathbf{C}}\langle s,F(x)+\nabla F(x)(y-x)\rangle+\frac{M}{2}\|y-x\|^{2}$
	$\displaystyle=\max_{\|s\|\leq 1}\min_{y\in\mathbf{C}}\langle\nabla F(x)^{T}s,y-x\rangle+\frac{M}{2}\|y-x\|^{2}+\langle s,F(x)\rangle$
	$\displaystyle=\max_{\|s\|\leq 1}\min_{y\in\mathbf{C}}\frac{M}{2}\left\|y-x+\frac{1}{M}\nabla F(x)^{T}s\right\|^{2}-\frac{1}{2M}\|\nabla F(x)^{T}s\|^{2}$
	$\displaystyle\quad+\langle s,F(x)\rangle$
	$\displaystyle=\max_{\|s\|\leq 1}\frac{M}{2}\left\|\Pi_{\textbf{C}}\left(x-\frac{1}{M}\nabla F(x)^{T}s\right)-x+\frac{1}{M}\nabla F(x)^{T}s\right\|^{2}$
	$\displaystyle\quad-\frac{1}{2M}\|\nabla F(x)^{T}s\|^{2}+\langle s,F(x)\rangle,$
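
The dual form derived above suggests one possible way to compute the subproblem solution numerically: maximize the concave dual function over the unit ball $\|s\|\leq 1$ and recover $y$ through a single projection onto $\textbf{C}$. The following is a minimal sketch of this idea, assuming that $\textbf{C}$ is a box, that the Jacobian is nonzero, and that plain projected gradient ascent with step $M/\|\nabla F(x)\|^{2}$ is used on the dual. The function names and the small test instance are hypothetical and are not taken from the experiments in this paper.

```python
import numpy as np

def project_box(z, lo, hi):
    # Euclidean projection onto the box C = [lo, hi]^n (assumed form of C)
    return np.clip(z, lo, hi)

def project_ball(s):
    # Euclidean projection onto the unit ball {s : ||s|| <= 1}
    nrm = np.linalg.norm(s)
    return s if nrm <= 1.0 else s / nrm

def solve_subproblem(Fx, J, x, M, lo, hi, iters=500):
    """Projected gradient ascent on the dual of
    min_{y in C} ||Fx + J (y - x)|| + (M/2) ||y - x||^2."""
    step = M / np.linalg.norm(J, 2) ** 2      # 1/L, with L = ||J||_2^2 / M
    s = np.zeros_like(Fx)
    for _ in range(iters):
        y = project_box(x - J.T @ s / M, lo, hi)   # inner minimizer y(s)
        grad = Fx + J @ (y - x)                    # gradient of the dual at s
        s = project_ball(s + step * grad)
    y = project_box(x - J.T @ s / M, lo, hi)
    quad = 0.5 * M * np.dot(y - x, y - x)
    primal = np.linalg.norm(Fx + J @ (y - x)) + quad   # subproblem value at y
    dual = s @ (Fx + J @ (y - x)) + quad               # dual value at s
    return y, s, primal, dual

# Small hypothetical instance: F(x), Jacobian, current point, and box bounds.
Fx = np.array([1.0, -0.5])
J = np.array([[2.0, 0.0], [0.0, 1.0]])
x = np.zeros(2)
y, s, primal, dual = solve_subproblem(Fx, J, x, M=1.0, lo=-0.3, hi=0.3)
```

By weak duality, `primal` is never smaller than `dual`, so their difference gives a computable optimality gap for the returned `y`.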

	$\displaystyle f(x_{k+1})$	$\displaystyle\stackrel{\eqref{eq:alg_desc}}{\leq}\|F(x_{k})+\nabla F(x_{k})(x_{k+1}-x_{k})\|+\frac{M_{k}}{2}\|x_{k+1}-x_{k}\|^{2}$
		$\displaystyle\stackrel{\eqref{eq:iter}}{\leq}\min_{x\in\mathbf{C}}\|F(x_{k})+\nabla F(x_{k})(x-x_{k})\|+\frac{M_{k}}{2}\|x-x_{k}\|^{2}$
		$\displaystyle\stackrel{\eqref{eq:alg_desc}}{\leq}\min_{x\in\mathbf{C}}f(x)+\frac{M_{k}+L_{F}}{2}\|x-x_{k}\|^{2}$
		$\displaystyle\leq f(y_{k+1})+\frac{M_{k}+L_{F}}{2}\|y_{k+1}-x_{k}\|^{2},$

	$\displaystyle f(x_{k+1})-f_{*}$	$\displaystyle\stackrel{\eqref{eq:01}}{\leq}f(y_{k+1})-f_{*}+\frac{3L_{F}}{2}\|y_{k+1}-x_{k}\|^{2}$
		$\displaystyle\stackrel{\eqref{eq:kl}}{\leq}\sigma_{q}S_{f}(y_{k+1})^{q}+\frac{3L_{F}}{2}\|y_{k+1}-x_{k}\|^{2}$
		$\displaystyle\leq\sigma_{q}\mu^{q}\|y_{k+1}-x_{k}\|^{q}+\frac{3L_{F}}{2}\|y_{k+1}-x_{k}\|^{2}$
		$\displaystyle\leq\sigma_{q}\mu^{q}\left(\frac{\mu}{\mu-3L_{F}}\right)^{q/2}\|x_{k+1}-x_{k}\|^{q}$
		$\displaystyle\quad+\frac{2\mu}{3L_{F}(\mu-3L_{F})}\|x_{k+1}-x_{k}\|^{2}$
		$\displaystyle\leq C_{1}(f(x_{k})-f(x_{k+1}))^{\frac{q}{2}}+C_{2}(f(x_{k})-f(x_{k+1})).$