Modified projected Gauss-Newton method for constrained nonlinear least-squares: application to power flow analysis
Abstract
In this paper, we consider a modified projected Gauss-Newton method for solving constrained nonlinear least-squares problems. We assume that the functional constraints are smooth and the other constraints are represented by a simple closed convex set. We formulate the nonlinear least-squares problem as an optimization problem using the Euclidean norm as a merit function. In our method, at each iteration we linearize the functional constraints inside the merit function at the current point and add a quadratic regularization, yielding a strongly convex subproblem that is easy to solve and whose solution is the next iterate. We present global convergence guarantees for the proposed method under mild assumptions. In particular, we prove stationary point convergence guarantees and, under the Kurdyka-Lojasiewicz (KL) property for the objective function, we derive convergence rates depending on the KL parameter. Finally, we show the efficiency of this method on the power flow analysis problem using several IEEE bus test cases.
I INTRODUCTION
In many areas of engineering, such as maximum likelihood estimations, non-linear data fitting, parameter estimation or power flow analysis, one finds applications that can be recast as nonlinear least-squares problems of the form [11, 8, 6]:
(1) | ||||
where $Q \subseteq \mathbb{R}^n$ is a closed convex set and $F_i$, for $i = 1, \dots, m$, are nonlinear differentiable functions. When $Q = \mathbb{R}^n$ and $m = n$, problem (1) is equivalent to a square system of nonlinear equations. Several algorithms have been proposed for solving this problem, the most popular being the Newton-Raphson (NR) method [22]. The NR method uses the inverse of the Jacobian matrix to update the iterates, i.e., the iterations are of the following form:
$$x_{k+1} = x_k - \nabla F(x_k)^{-1} F(x_k),$$
where $x_k$ is the current iterate and $\nabla F(x_k)$ is the Jacobian matrix of $F$ at $x_k$. Although NR converges fast, it has several drawbacks. First, the Jacobian may be degenerate at the current point, in which case the method is not well defined. Second, convergence is not guaranteed when the initial point is far from the optimum [13]. Many approaches have been proposed to deal with these challenges, e.g., improving the starting point [3], or using different approximations for the Jacobian [4, 1]. In [17], Nesterov proposed a modified Gauss-Newton scheme (M-GN) for solving unconstrained nonlinear least-squares problems. The M-GN method constructs a convex model by linearizing the nonlinear function inside a sharp merit function and adding a quadratic regularization term, i.e.:
$$x_{k+1} = \arg\min_{y \in \mathbb{R}^n} \left\{ \|F(x_k) + \nabla F(x_k)(y - x_k)\| + \frac{M}{2} \|y - x_k\|^2 \right\}.$$
When $M = 0$, we recover the NR method described above. In [17] it was proved that, under a nondegeneracy assumption (i.e., $\sigma_{\min}(\nabla F(x)) \geq \sigma > 0$ for all $x$ in the level set $\{x : \|F(x)\| \leq \|F(x_0)\|\}$, where $x_0$ is the starting point and $\sigma_{\min}$ denotes the smallest singular value), this scheme has global convergence. Moreover, the solution of each subproblem can be computed with a standard convex optimization solver. Further, problem (1) is equivalent to the following composite optimization problem:
$$\min_{x \in \mathbb{R}^n} f(x) := \|F(x)\| + \mathbf{1}_Q(x), \tag{2}$$
where $\mathbf{1}_Q$ is the indicator function of the convex set $Q$. Note that using the norm $\|F(x)\|$ as the merit function is more beneficial than using the squared norm $\|F(x)\|^2$, since in the latter case the condition number is doubled. Another possible algorithm for solving this problem is the Projected Gradient Descent (PGD) [18, 9, 20, 21]. The standard PGD algorithm is given by:
where is the projection operator (see Section II) and is a step size. PGD is simple and easy to implement, but its main drawback is slow convergence.
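To make the PGD iteration concrete, here is a minimal sketch under stated assumptions. The two-equation test system, the box $Q = [0, 2]^2$, the fixed step size and the function names are all hypothetical and not taken from the paper. The smooth objective used here is $\frac{1}{2}\|F(x)\|^2$, chosen only so that the gradient $\nabla F(x)^T F(x)$ is defined everywhere; the paper may apply PGD to a different smooth part.

```python
import numpy as np

def residual(x):
    # Hypothetical test system: unit circle intersected with the line x0 = x1.
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])

def jacobian(x):
    # Jacobian of the hypothetical residual above.
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

def project_box(x, lower, upper):
    # Euclidean projection onto the box Q = [lower, upper]^n.
    return np.clip(x, lower, upper)

def pgd(x0, step=0.05, lower=0.0, upper=2.0, iters=500):
    # Projected gradient descent on g(x) = 0.5 * ||F(x)||^2 (an assumed surrogate).
    x = project_box(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(iters):
        grad = jacobian(x).T @ residual(x)  # gradient of 0.5 * ||F(x)||^2
        x = project_box(x - step * grad, lower, upper)
    return x

x_pgd = pgd([1.5, 0.5])
print(x_pgd, np.linalg.norm(residual(x_pgd)))
```

Each iteration costs one Jacobian-vector product and one projection, which is why the method is cheap per step but needs many steps on ill-conditioned problems.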
A natural question is whether we can prove global convergence of the M-GN method without the nondegeneracy assumption on the Jacobian $\nabla F$, i.e., without assuming $\sigma_{\min}(\nabla F(x)) \geq \sigma > 0$ for all $x$ in the level set of $f$ (see (5)). Such a condition is conservative and is not always satisfied in practice. In this paper we answer this question positively, i.e., we consider a Modified Projected Gauss-Newton method (MPG-N) for solving problem (1), where $Q$ is a simple closed convex set. At each iteration, MPG-N solves the following strongly convex subproblem:
$$x_{k+1} = \arg\min_{y \in Q} \left\{ \|F(x_k) + \nabla F(x_k)(y - x_k)\| + \frac{M}{2} \|y - x_k\|^2 \right\}, \tag{3}$$
which is a slightly modified version of the scheme in [17], as it includes the constraint $y \in Q$. We prove, under mild assumptions, that this scheme achieves global convergence without any assumption on the Jacobian matrix. More precisely, we prove that any limit point of the sequence generated by MPG-N is a stationary point and, under the Kurdyka-Lojasiewicz (KL) property, we derive convergence rates in function value depending on the KL parameter. Finally, we consider solving a power flow analysis problem, whose functional constraints do not usually satisfy the nondegeneracy assumption, while the problem satisfies the KL property. We compare the performance of this scheme with the projected gradient scheme and demonstrate the efficiency of the proposed method on several IEEE bus test cases.
II Notations and preliminaries
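We denote a finite-dimensional real vector space by $\mathbb{E}$ and by $\mathbb{E}^*$ its dual space, composed of linear functions on $\mathbb{E}$. Using a self-adjoint positive-definite operator $B: \mathbb{E} \to \mathbb{E}^*$ (notation $B = B^* \succ 0$), we can endow these spaces with conjugate Euclidean norms:
$$\|x\| = \langle Bx, x \rangle^{1/2}, \quad x \in \mathbb{E}, \qquad \|g\|_* = \langle g, B^{-1} g \rangle^{1/2}, \quad g \in \mathbb{E}^*.$$
For simplicity, we consider in the following $\mathbb{E} = \mathbb{R}^n$ and $B$ the identity matrix. Let $F = (F_1, \dots, F_m)$, where the $F_i$'s, $i = 1, \dots, m$, are differentiable functions and the Jacobian $\nabla F$ is Lipschitz continuous with constant $L > 0$, i.e.:
$$\|\nabla F(x) - \nabla F(y)\| \leq L \|x - y\| \quad \forall x, y \in \mathbb{R}^n.$$
It follows that [17]:
$$\|F(y) - F(x) - \nabla F(x)(y - x)\| \leq \frac{L}{2} \|y - x\|^2 \quad \forall x, y \in \mathbb{R}^n. \tag{4}$$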
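Let $\phi$ be a proper lower semicontinuous function and $\lambda > 0$. Then, the proximal operator with respect to $\phi$ is:
$$\text{prox}_{\lambda \phi}(x) = \arg\min_{y} \left\{ \phi(y) + \frac{1}{2\lambda} \|y - x\|^2 \right\},$$
and the Moreau envelope is defined as:
$$\phi_{\lambda}(x) = \min_{y} \left\{ \phi(y) + \frac{1}{2\lambda} \|y - x\|^2 \right\}.$$
When $\phi$ is the indicator function of a convex set $Q$, $\phi = \mathbf{1}_Q$, then the proximal operator is the projection:
$$\text{prox}_{\lambda \mathbf{1}_Q}(x) = \Pi_Q(x) := \arg\min_{y \in Q} \|y - x\|.$$
We say that $\phi$ is $\rho$-weakly convex if the function
$$x \mapsto \phi(x) + \frac{\rho}{2} \|x\|^2$$
is convex. The level set of $\phi$ at $x_0$ is defined as:
$$\mathcal{L}(x_0) = \{x : \phi(x) \leq \phi(x_0)\}. \tag{5}$$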
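The proximal operator, the projection and the Moreau envelope can be illustrated numerically for a box. The following is a minimal sketch, assuming $Q = [0, 1]^3$ and a parameter $\lambda = 0.5$; the function names are hypothetical. For the indicator function of $Q$, the Moreau envelope reduces to the scaled squared distance $\text{dist}(x, Q)^2 / (2\lambda)$, and its gradient is $(x - \Pi_Q(x)) / \lambda$.

```python
import numpy as np

def project_box(x, lower, upper):
    # Proximal operator of the indicator of the box Q, i.e. the projection onto Q.
    return np.clip(x, lower, upper)

def moreau_envelope_indicator(x, lower, upper, lam):
    # Moreau envelope of the indicator of Q: dist(x, Q)^2 / (2 * lam).
    p = project_box(x, lower, upper)
    return np.sum((x - p) ** 2) / (2.0 * lam), p

lam = 0.5
x = np.array([1.5, -0.2, 0.3])
env, p = moreau_envelope_indicator(x, 0.0, 1.0, lam)
grad = (x - p) / lam  # gradient of the envelope at x
print(p, env, grad)
```

The envelope is zero exactly on $Q$ and grows quadratically with the distance to $Q$, which is what makes it a smooth surrogate for the nonsmooth indicator.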
Next, we provide a few definitions and properties concerning subdifferential calculus (see also [14, 19]).
Definition 1
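(Subdifferential): Let $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper lower semicontinuous function. For a given $x \in \text{dom}\, f$, the Fréchet subdifferential of $f$ at $x$, written $\hat{\partial} f(x)$, is the set of all vectors $g_x \in \mathbb{R}^n$ satisfying:
$$\liminf_{y \to x,\; y \neq x} \frac{f(y) - f(x) - \langle g_x, y - x \rangle}{\|y - x\|} \geq 0.$$
When $x \notin \text{dom}\, f$, we set $\hat{\partial} f(x) = \emptyset$. The limiting-subdifferential, or simply the subdifferential, of $f$ at $x \in \text{dom}\, f$, written $\partial f(x)$, is defined through the following closure process [14]:
$$\partial f(x) := \left\{ g_x \in \mathbb{R}^n : \exists\, x^k \to x,\ f(x^k) \to f(x),\ \exists\, g^k \in \hat{\partial} f(x^k) \text{ with } g^k \to g_x \right\}.$$
Note that we have $\hat{\partial} f(x) \subseteq \partial f(x)$ for each $x \in \text{dom}\, f$. In the previous inclusion, the first set is closed and convex while the second one is closed, see e.g., [19] (Theorem 8.6). For any $x \in \text{dom}\, f$ let us define:
$$S_f(x) = \text{dist}(0, \partial f(x)) := \inf_{g_x \in \partial f(x)} \|g_x\|.$$
If $\partial f(x) = \emptyset$, we set $S_f(x) = \infty$. Let us also recall the definition of a function satisfying the Kurdyka-Lojasiewicz (KL) property (see [2] for more details).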
Definition 2
A proper lower semicontinuous function satisfies Kurdyka-Lojasiewicz (KL) property on the compact set on which takes a constant value if there exist such that one has:
(6) | ||||
Note that the relevant aspect of the KL property is when is a subset of critical points for , i.e. , since it is easy to establish the KL property when is not related to critical points. The KL property holds for a large class of functions including semi-algebraic functions (e.g., real polynomial functions), vector or matrix (semi)norms (e.g., with rational number), trigonometric functions, logarithm functions, exponential functions and uniformly convex functions, see [2] for a comprehensive list.
III Modified Projected Gauss-Newton method
In this section, we present the Modified Projected Gauss-Newton (MPG-N) method and then derive convergence results. We recall that the problem of interest is:
(7) |
We consider the following assumption:
Assumption 1
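1. $F$ is differentiable and the Jacobian $\nabla F$ is Lipschitz continuous with constant $L > 0$:
$$\|\nabla F(x) - \nabla F(y)\| \leq L \|x - y\| \quad \forall x, y \in \mathbb{R}^n.$$
2. Problem (3) has a solution, i.e., there exist such that .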
An immediate consequence of (4) is:
Then, for , we define the modified projected Gauss-Newton iterate at a point as follows:
(8) | ||||
Note that this subproblem is strongly convex, hence is well defined and unique. Finally, the modified projected Gauss-Newton algorithm is as follows:
MPG-N algorithm
Choose and . For do:
Find such that:
(9)
Update .
The first step of MPG-N algorithm consists of finding a constant such that inequality (9) holds. If the constant is known, we can take . Otherwise, we can apply the following line search procedure [16]:
The next lemma shows that this process is well defined.
Proof:
We have from inequality (4) that:
Since , it follows immediately that:
This proves the statement of the lemma. ∎
Note that Lemma 1 ensures that (9) always holds, provided that . However, in practice, using the line search procedure allows us to work with small (i.e., ) such that condition (9) holds. Next, let us discuss the solution of the subproblem (8). Following [18], we have:
which can be solved with standard convex optimization tools, such as trust-region methods [5].
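The overall scheme can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: it treats only the unconstrained case $Q = \mathbb{R}^n$, it assumes that condition (9) is the upper-model inequality $\|F(x_{k+1})\| \leq \|F(x_k) + \nabla F(x_k)(x_{k+1} - x_k)\| + \frac{M_k}{2}\|x_{k+1} - x_k\|^2$, and it solves the subproblem through the identity $\|r\| = \min_{\tau > 0} \left( \frac{\tau}{2} + \frac{\|r\|^2}{2\tau} \right)$ combined with a golden-section search over $\tau$ (a swapped-in technique, not a trust-region solver). The test system and all function names are hypothetical.

```python
import numpy as np

def residual(x):
    # Hypothetical test system: unit circle intersected with the line x0 = x1.
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])

def jacobian(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

def model_value(Fx, Jx, d, M):
    # Linearized merit function plus quadratic regularization.
    return np.linalg.norm(Fx + Jx @ d) + 0.5 * M * (d @ d)

def solve_subproblem(Fx, Jx, M, iters=80):
    # Approximately minimizes ||Fx + Jx d|| + (M/2) ||d||^2 over d (case Q = R^n).
    n = Jx.shape[1]
    JtJ, JtF = Jx.T @ Jx, Jx.T @ Fx

    def step(tau):
        # Minimizer in d for a fixed tau: a regularized Gauss-Newton step.
        return -np.linalg.solve(JtJ + tau * M * np.eye(n), JtF)

    def phi(tau):
        d = step(tau)
        r = Fx + Jx @ d
        return 0.5 * tau + (r @ r) / (2.0 * tau) + 0.5 * M * (d @ d)

    # The optimal tau equals the optimal residual norm, which is at most ||Fx||.
    hi = np.linalg.norm(Fx)
    a, b = 1e-8 * hi, hi
    gr = (np.sqrt(5.0) - 1.0) / 2.0
    t1, t2 = b - gr * (b - a), a + gr * (b - a)
    f1, f2 = phi(t1), phi(t2)
    for _ in range(iters):  # golden-section search on the convex function phi
        if f1 < f2:
            b, t2, f2 = t2, t1, f1
            t1 = b - gr * (b - a)
            f1 = phi(t1)
        else:
            a, t1, f1 = t1, t2, f2
            t2 = a + gr * (b - a)
            f2 = phi(t2)
    return step(0.5 * (a + b))

def mpgn(x0, M0=1.0, tol=1e-10, max_iter=100):
    x, M = np.asarray(x0, dtype=float), M0
    for k in range(max_iter):
        Fx, Jx = residual(x), jacobian(x)
        if np.linalg.norm(Fx) <= tol:
            return x, k
        while True:  # line search: double M until the assumed condition (9) holds
            d = solve_subproblem(Fx, Jx, M)
            if np.linalg.norm(residual(x + d)) <= model_value(Fx, Jx, d, M) + 1e-12:
                break
            M *= 2.0
        x = x + d
        M = max(M0, M / 2.0)  # allow smaller regularization at the next iteration
    return x, max_iter

x_sol, n_iter = mpgn([1.5, 0.5])
print(x_sol, n_iter, np.linalg.norm(residual(x_sol)))
```

Halving $M$ after each accepted step mirrors the remark that the line search lets the method work with a regularization parameter smaller than the Lipschitz constant. For a constrained set $Q$, the inner step for fixed $\tau$ would become a constrained quadratic program rather than a linear solve.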
III-A Convergence analysis
In this section we derive convergence results for MPG-N algorithm. First, we can prove the following descent:
Lemma 2
Let Assumption 1 hold. Let be generated by MPG-N algorithm. Then, we have:
1. The sequence is nonincreasing and satisfies:
(10)
2. The sequence satisfies:
Proof:
In [7, 15], the authors prove that for the composite problem (3), the quantity does not always tend to zero in the limit, even if goes to zero. Thus, we must look elsewhere for a connection between and . Let us start with the following observation, whose proof can be found in Lemma 4.2 [7]: the function is -weakly convex. Weak convexity of has an immediate consequence on the Moreau envelope, denoted :
Lemma 3
(Lemma 4.3 [7]) Let . Then, the proximal map is well-defined and single-valued. The Moreau envelope is smooth with gradient given by:
Further, we have the following lemma, whose proof is similar to the proof of Lemma in [15].
Lemma 4
Let Assumption 1 hold. Let be generated by MPG-N method and consider , where . Then, we have the following relations:
-
1.
-
2.
.
Proof:
Let us prove the first statement. Since is weakly convex, then is well-defined and unique. Thus:
(11) |
Further, from the definition of , we have:
where the last inequality follows by taking . Thus, we have:
(12) |
Finally, combining this inequality with (11), we get:
which proves the first statement. Further, from the optimality conditions of , we get:
(13) |
Thus, the second statement follows. ∎
Using the strict descent and Lemma 4, we can conclude the following global convergence rate:
Theorem 1
Proof:
From Lemma 4, we have:
Further, combining this inequality with (10), we get:
Summing up this inequality and taking the minimum we get:
which proves our first statement. Further, let be a limit point of , then one can notice that it is also a limit point of the sequence . This means that there exists a subsequence such that . Since is continuous, then . Note that we have and . Then, we conclude from the definition of the generalized subgradient that and hence is a stationary point. ∎
III-B Better rates under KL
In [17], the author imposes a nondegeneracy assumption on the Jacobian, that is, $\sigma_{\min}(\nabla F(x)) \geq \sigma > 0$ for all $x$ in the level set of $f$ at the starting point, in order to prove a global convergence rate for the M-GN method. Such a condition is not always valid in practice. In this section, we derive improved convergence rates for the MPG-N method provided that the objective function satisfies the KL property. In general, the KL condition is less conservative than the nondegeneracy condition (see Section IV). Let us denote the set of limit points of $(x_k)_{k \geq 0}$ by:
We have the following convergence rate:
Theorem 2
Let the assumptions of Lemma 4 hold. Additionally, assume that satisfies the KL property (6) on . Then, the following convergence rates hold for the sequence generated by MPG-N algorithm in function values:
- If , then converges to linearly for sufficiently large.
- If , then converges to at a sublinear rate of order for sufficiently large.
IV Power flow analysis
Power flow problems are among the most studied in power systems, being an important tool for planning and operating the electric grid. In this section we consider the particular problem of power flow analysis, defined as follows. Consider a power system with $N$ buses (see e.g., Figure 1 for the IEEE 14 bus system). We denote by $V_i$, $P_i$ and $Q_i$ the complex voltage, active power and reactive power at the $i$-th bus, respectively. Let $Y$ be the admittance matrix and denote $V = (V_1, \dots, V_N)$, $P = (P_1, \dots, P_N)$ and $Q = (Q_1, \dots, Q_N)$. Given a complex load vector $S$, the power flow analysis problem is to find $V$ such that [6]:
(14) |
where is the Hermitian transpose. This problem is equivalent to the following optimization problem:

In [6], the authors provide a similar formulation for the power flow analysis problem, but use the squared norm $\|\cdot\|^2$ as the merit function to measure the distance between the objective function and the desired complex load $S$. As mentioned earlier, it is beneficial to use only the norm $\|\cdot\|$ as the merit function. Further, since we have (see e.g., [12]):
and denote:
then, the previous optimization problem is equivalent to the following optimization problem:
(15) |
The most efficient algorithm for solving the (unconstrained) power flow analysis problem is the Newton-Raphson (NR) method [22]. However, it may perform poorly when the initial point is far from the optimum or the system is stressed (i.e., the problem is ill-conditioned). In a recent paper [6], the authors proposed a hybrid method that combines stochastic gradient descent (SGD) and the NR method to overcome the numerical challenges in this problem. The iterative process starts with the NR algorithm and, if the method detects divergence (e.g., when the condition number of the Jacobian deteriorates), it switches to the SGD algorithm. After a few SGD steps, it switches back to the NR iterates, and the process is repeated until an (approximate) optimal solution is found. Since this hybrid algorithm cannot deal with (simple) constraints as in (15), we propose to use our new method, modified projected Gauss-Newton (MPG-N), and compare its performance with the projected gradient descent (PGD) method applied to problem (2), where $F$ is given in (14). In order to apply both methods, one needs to evaluate the gradients of the functions $P_i$ and $Q_i$. We have the following expressions for the derivatives of the $P_i$'s and $Q_i$'s:
Hence, and we have:
where and for . Note that the Jacobian may be ill-conditioned, but the objective function may still satisfy the KL inequality.
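As an illustration of how the power flow residual can be evaluated, here is a minimal sketch. It assumes the standard complex power balance $S_i = V_i \overline{(YV)_i}$ and a hypothetical two-bus network with a single line; the line impedance, the voltages and the function names are invented for the example and are not the paper's test data. The unknown vector stacks the real and imaginary parts of the bus voltages, so that the residual is a real-valued map.

```python
import numpy as np

def power_injection(V, Y):
    # Complex power injected at each bus: S_i = V_i * conj((Y V)_i).
    return V * np.conj(Y @ V)

def residual(x, Y, S_target):
    # x stacks the real parts and then the imaginary parts of the bus voltages.
    n = Y.shape[0]
    V = x[:n] + 1j * x[n:]
    mismatch = power_injection(V, Y) - S_target
    # Active power mismatches first, then reactive power mismatches.
    return np.concatenate([mismatch.real, mismatch.imag])

# Hypothetical two-bus network: one line with series impedance 0.01 + 0.1j.
y_line = 1.0 / (0.01 + 0.1j)
Y = np.array([[y_line, -y_line], [-y_line, y_line]])

# Pick a voltage profile, then generate the loads it produces.
V_star = np.array([1.0 + 0.0j, 0.95 - 0.05j])
S_target = power_injection(V_star, Y)

x_star = np.concatenate([V_star.real, V_star.imag])
x_flat = np.array([1.0, 1.0, 0.0, 0.0])  # flat start: all voltages equal to 1

print(np.linalg.norm(residual(x_star, Y, S_target)))
print(np.linalg.norm(residual(x_flat, Y, S_target)))
```

Generating the loads from a chosen voltage profile guarantees that the residual has a zero, which makes it easy to check whether a solver recovers a solution from a different starting point.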
IV-A Numerical simulations
In this subsection, we demonstrate the efficiency of the modified projected Gauss-Newton (MPG-N) method using several IEEE bus test cases from [10] (IEEE 14 bus, IEEE 39 bus, IEEE 57 bus and IEEE 118 bus). We choose an optimal point, then we generate the corresponding active and reactive powers (see also [6]). We apply the MPG-N method to problem (15) and the PGD method to problem (2), where $F$ is given in (14), and test whether the algorithms can reach this optimal point from a random feasible starting point. The stopping criterion for both algorithms is . The results are given in Figure 2, where we plot the evolution of the function value along iterations. From this figure one can observe that, in the beginning, PGD performs better than the MPG-N method. However, MPG-N requires fewer iterations (up to 5 times fewer) than PGD to achieve the desired accuracy.




V Conclusion
In this paper, we have proposed a modified projected Gauss-Newton (MPG-N) method for solving constrained least-squares problems. Under mild assumptions, we have proved global convergence results for the iterates. More precisely, we have proved that any limit point of the sequence generated by MPG-N algorithm is a stationary point and under the KL property, we have derived convergence rates in function values depending on the KL parameter. Finally, we have considered solving a power flow problem and compared the performance of our scheme with the projected gradient method, showing the efficiency of the proposed method on several IEEE bus test cases.
ACKNOWLEDGMENT
The research leading to these results has received funding from: ITN-ETN project TraDE-OPT funded by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement No. 861137; NO Grants 2014-2021, RO-NO-2019-0184, under project ELO-Hyp, contract no. 24/2020; UEFISCDI PN-III-P4-PCE-2021-0720, under project L2O-MOC, nr. 70/2022.
References
- [1] S. Abhyankar, Q. Cui, and A. J. Flueck, Fast power flow analysis using a hybrid current-power balance formulation in rectangular coordinates, in IEEE PES TD Conference and Exposition, 1–5, 2014.
- [2] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota, Clarke subgradients of stratifiable functions, SIAM Journal on Optimization, 18(2): 556–572, 2007.
- [3] L. M. Braz, C. A. Castro, and C. Murati, A critical evaluation of step size optimization based load flow methods, IEEE Transactions on Power Systems, 15(1): 202–207, 2000.
- [4] Y. Chen and C. Shen, A Jacobian-free Newton method with adaptive preconditioner and its application for power flow calculations, IEEE Transactions on Power Systems, 21(3): 1096–1103, 2006.
- [5] A.R. Conn, N.I.M. Gould, and Ph.L. Toint, Trust region methods, Society for Industrial and Applied Mathematics, 2000.
- [6] N. Costilla-Enriquez, Y. Weng, and B. Zhang, Combining Newton-Raphson and stochastic gradient descent for power flow analysis, IEEE Transactions on Power Systems, 36: 514–517, 2021.
- [7] D. Drusvyatskiy and C. Paquette, Efficiency of minimizing compositions of convex functions and smooth maps, Mathematical Programming, 178(1-2): 503–558, 2019.
- [8] H. O. Hartley, The modified Gauss-Newton method for the fitting of non-linear regression functions by least squares, Technometrics 3(2): 269–280, 1961.
- [9] A. Hauswirth, S. Bolognani, G. Hug, and F. Dörfler, Projected gradient descent on Riemannian manifolds with applications to online power system optimization, Annual Allerton Conference on Communication, Control, and Computing (Allerton), IEEE, 2016.
- [10] Illinois Center for a Smarter Electric Grid. Available: http://publish.illinois.edu/smartergrid/, 2013.
- [11] R.I. Jennrich, and S. M. Robinson, A Newton-Raphson algorithm for maximum likelihood factor analysis, Psychometrika 34(1): 111–123, 1969.
- [12] H. Le Nguyen, Newton-Raphson method in complex form, IEEE Transactions on Power Systems, 12(3), 1997.
- [13] F. Milano, Continuous Newton’s method for power flow analysis, IEEE Transactions on Power Systems, 24(1): 50–57, 2009.
- [14] B. Mordukhovich, Variational analysis and generalized differentiation. Basic theory, Springer, 2006.
- [15] Y. Nabou and I. Necoara, Efficiency of higher-order algorithms for minimizing general composite functions, arXiv preprint arXiv:2203.13367, 2022.
- [16] Yu. Nesterov and B.T. Polyak, Cubic regularization of Newton method and its global performance, Mathematical Programming, 108(1): 177–205, 2006.
- [17] Y. Nesterov, Modified Gauss-Newton scheme with worst case guarantees for global performance, Optimization Methods and Software, 22(3): 469–483, 2007.
- [18] Y. Nesterov, Gradient methods for minimizing composite functions, Math. Program. 140, 125–161, 2013.
- [19] R.T. Rockafellar and R. Wets, Variational Analysis, Springer, 1998.
- [20] R. Salgado, A. Brameller, and P. Aitchison, Optimal power flow solutions using the gradient projection method. Part 1: Theoretical basis, IEE Proceedings C (Generation, Transmission and Distribution). Vol. 137. No. 6. IET Digital Library, 1990.
- [21] R. Salgado, A. Brameller, and P. Aitchison, Optimal power flow solutions using the gradient projection method. Part 2: Modelling of the power system equations, IEE Proceedings C (Generation, Transmission and Distribution). Vol. 137. No. 6. IET Digital Library, 1990.
- [22] B. Stott, Review of load-flow calculation methods, Proceedings of the IEEE, 62(7): 916–929, 1974.