OSQP: An Operator Splitting Solver for Quadratic Programs

Bartolomeo Stellato, Goran Banjac, Paul Goulart, Alberto Bemporad, and Stephen Boyd

February 13, 2020

arXiv:1711.08013v4 [math.OC] 12 Feb 2020

Abstract

We present a general-purpose solver for convex quadratic programs based on the
alternating direction method of multipliers, employing a novel operator splitting
technique that requires the solution of a quasi-definite linear system with the same
coefficient matrix at almost every iteration. Our algorithm is very robust, placing no
requirements on the problem data such as positive definiteness of the objective func-
tion or linear independence of the constraint functions. It can be configured to be
division-free once an initial matrix factorization is carried out, making it suitable for
real-time applications in embedded systems. In addition, our technique is the first op-
erator splitting method for quadratic programs able to reliably detect primal and dual
infeasible problems from the algorithm iterates. The method also supports factorization
caching and warm starting, making it particularly efficient when solving parametrized
problems arising in finance, control, and machine learning. Our open-source C imple-
mentation OSQP has a small footprint, is library-free, and has been extensively tested
on many problem instances from a wide variety of application areas. It is typically ten
times faster than competing interior-point methods, and sometimes much more when
factorization caching or warm start is used. OSQP has already shown a large impact
with tens of thousands of users both in academia and in large corporations.

1 Introduction
1.1 The problem
Consider the following optimization problem
\[
\begin{array}{ll}
\text{minimize} & (1/2)x^T P x + q^T x \\
\text{subject to} & Ax \in \mathcal{C},
\end{array}
\tag{1}
\]
where x ∈ R^n is the decision variable. The objective function is defined by a positive
semidefinite matrix P ∈ S^n_+ and a vector q ∈ R^n, and the constraints by a matrix A ∈ R^{m×n}
and a nonempty, closed and convex set C ⊆ R^m. We will refer to it as a general (convex)
quadratic program.
If the set C takes the form

C = [l, u] = {z ∈ Rm | li ≤ zi ≤ ui , i = 1, . . . , m} ,

with li ∈ {−∞} ∪ R and ui ∈ R ∪ {+∞}, we can write problem (1) as

\[
\begin{array}{ll}
\text{minimize} & (1/2)x^T P x + q^T x \\
\text{subject to} & l \le Ax \le u,
\end{array}
\tag{2}
\]

which we will refer to as a quadratic program (QP). Linear equality constraints can be
encoded in this way by setting li = ui for some or all of the elements in (l, u). Note that any
linear program (LP) can be written in this form by setting P = 0. We will characterize the
size of (2) with the tuple (n, m, N ) where N is the sum of the number of nonzero entries in
P and A, i.e., N = nnz(P ) + nnz(A).
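
As a small illustration of form (2), the sketch below stores a QP as the tuple (P, q, A, l, u), encodes one equality constraint by setting l_i = u_i, and computes the size tuple (n, m, N). The data and the helper `is_feasible` are hypothetical and chosen only for illustration; N is counted here over all stored nonzeros of P and A.

```python
import numpy as np

# Hypothetical toy data: minimize (1/2) x^T P x + q^T x
# subject to x1 + x2 = 1 and 0 <= x1, x2 <= 0.7.
P = np.array([[4.0, 1.0],
              [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])   # l[0] == u[0] encodes the equality row
u = np.array([1.0, 0.7, 0.7])

n = P.shape[0]                  # number of variables
m = A.shape[0]                  # number of constraints
N = int(np.count_nonzero(P) + np.count_nonzero(A))   # N = nnz(P) + nnz(A)
size = (n, m, N)

def is_feasible(x, tol=1e-9):
    """Check l <= A x <= u up to a small tolerance (illustrative helper)."""
    z = A @ x
    return bool(np.all(z >= l - tol) and np.all(z <= u + tol))
```

Setting P to the zero matrix in the same structure would give an LP, as noted above.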

Applications. Optimization problems of the form (1) arise in a huge variety of applica-
tions in engineering, finance, operations research and many other fields. Applications in
machine learning include support vector machines (SVM) [CV95], Lasso [Tib96, CWB08]
and Huber fitting [Hub64, Hub81]. Financial applications of (1) include portfolio opti-
mization [CT06, Mar52, BMOW14, BBD+ 17] [BV04, §4.4.1]. In the field of control engi-
neering, model predictive control (MPC) [RM09, GPM89] and moving horizon estimation
(MHE) [ABQ+ 99] techniques require the solution of a QP at each time instant. Several
signal processing problems also fall into the same class [BV04, §6.3.3][MB10]. In addition,
the numerical solution of QP subproblems is an essential component in nonconvex opti-
mization methods such as sequential quadratic programming (SQP) [NW06, Chap. 18] and
mixed-integer optimization using branch-and-bound algorithms [BKL+ 13, FL98].

1.2 Solution methods

Convex QPs have been studied since the 1950s [FW56], following from the seminal work on
LPs started by Kantorovich [Kan60]. Several solution methods for both LPs and QPs have
been proposed and improved upon throughout the years.

Active-set methods. Active-set methods were the first algorithms popularized as solution
methods for QPs [Wol59], and were obtained from an extension of Dantzig’s simplex method
for solving LPs [Dan63]. Active-set algorithms select an active-set (i.e., a set of binding
constraints) and then iteratively adapt it by adding and dropping constraints from the index
of active ones [NW06, §16.5]. New active constraints are added based on the cost function
gradient and the current dual variables. Active-set methods for QPs differ from the simplex
method for LPs because the iterates are not necessarily vertices of the feasible region. These
methods can easily be warm started to reduce the number of active-set recalculations
required. However, the major drawback of active-set methods is that the worst-case complexity
grows exponentially with the number of constraints, since it may be necessary to investigate
all possible active-sets before reaching the optimal one [KM70]. Modern implementations of
active-set methods for the solution of QPs can be found in many commercial solvers, such as
MOSEK [MOS17] and GUROBI [Gur16], and in the open-source solver qpOASES [FKP+ 14].

Interior-point methods. Interior-point algorithms gained popularity in the 1980s as a
method for solving LPs in polynomial time [Kar84, GMS+ 86]. In the 1990s these techniques
were extended to general convex optimization problems, including QPs [NN94]. Interior-
point methods model the problem constraints as parametrized penalty functions, also re-
ferred to as barrier functions. At each iteration an unconstrained optimization problem is
solved for varying barrier function parameters until the optimum is achieved; see [BV04,
Chap. 11] and [NW06, §16.6] for details. Primal-dual interior-point methods, in particu-
lar the Mehrotra predictor-corrector [Meh92] method, became the algorithms of choice for
practical implementation [Wri97] because of their good performance across a wide range of
problems. However, interior-point methods are not easily warm started and do not scale well
for very large problems. Interior-point methods are currently the default algorithms in the
commercial solvers MOSEK [MOS17], GUROBI [Gur16] and CVXGEN [MB12] and in the
open-source solver OOQP [GW03].

First-order methods. First-order optimization methods for solving quadratic programs
date to the 1950s [FW56]. These methods iteratively compute an optimal solution using only
first-order information about the cost function. Operator splitting techniques such as the
Douglas-Rachford splitting [LM79, DR56] are a particular class of first-order methods which
model the optimization problem as the problem of finding a zero of the sum of monotone
operators.
In recent years, the operator splitting method known as the alternating direction method
of multipliers (ADMM) [GM76, GM75] has received particular attention because of its very
good practical convergence behavior; see [BPC+ 11] for a survey. ADMM can be seen as a
variant of the classical alternating projections algorithm [BB96] for finding a point in the
intersection of two convex sets, and can also be shown to be equivalent to the Douglas-
Rachford splitting [Gab83]. ADMM has been shown to reliably provide modest accuracy
solutions to QPs in a relatively small number of computationally inexpensive iterations. It
is therefore well suited to applications such as embedded optimization or large-scale opti-
mization, wherein high accuracy solutions are typically not required due to noise in the data
and arbitrariness of the cost function. ADMM steps are computationally very cheap and sim-
ple to implement, and thus ideal for embedded processors with limited computing resources
such as those found in embedded control systems [JGR+ 14, OSB13, SSS+ 16]. ADMM is
also compatible with distributed optimization architectures enabling the solution of very
large-scale problems [BPC+ 11].
A drawback of first-order methods is that they are typically unable to detect primal
and/or dual infeasibility. In order to address this shortcoming, a homogeneous self-dual
embedding has been proposed in conjunction with ADMM for solving conic optimization
problems and implemented in the open-source solver SCS [OCPB16]. Although every QP can
be reformulated as a conic program, this reformulation is not efficient from a computational
point of view. A further drawback of ADMM is that the number of iterations required to converge
is highly dependent on the problem data and on the user’s choice of the algorithm’s step-
size parameters. Despite some recent theoretical results [GB17, BG18], it remains unclear
how to select those parameters to optimize the algorithm convergence rate. For this reason,
even though there are several benefits in using ADMM techniques for solving optimization
problems, there exists no reliable general-purpose QP solver based on operator splitting
methods.

1.3 Our approach

In this work we present a new general-purpose QP solver based on ADMM that is able
to provide high accuracy solutions. The proposed algorithm is based on a novel splitting
requiring the solution of a quasi-definite linear system that is always solvable for any choice of
problem data. We therefore impose no constraints such as strict convexity of the cost function
or linear independence of the constraints. Since the linear system’s matrix coefficients remain
the same at every iteration when ρ is fixed, our algorithm requires only a single factorization
to solve the QP (2). Once this initial factorization is computed, we can fix the linear system
matrix coefficients to make the algorithm division-free. If we allow divisions, then we can
make occasional updates to the term ρ in this linear system to improve our algorithm’s
convergence. We find that our algorithm typically updates these coefficients very few times,
e.g., 1 or 2 in our experiments. In contrast to other first-order methods, our approach is able
to return primal and dual solutions when the problem is solvable or to provide certificates
of primal and dual infeasibility without resorting to the homogeneous self-dual embedding.
To obtain high accuracy solutions, we perform solution polishing on the iterates obtained
from ADMM. By identifying the active constraints from the final dual variable iterates, we
construct an ancillary equality-constrained QP whose solution is equivalent to that of the
original QP (1). This ancillary problem is then solved by computing the solution of a single
linear system of typically much lower dimensions than the one solved during the ADMM
iterations. If we identify the active constraints correctly, then the resulting solution of our
method has accuracy equal to or even better than interior-point methods.
Our algorithm can be efficiently warm started to reduce the number of iterations. More-
over, if the problem matrices do not change then the quasi-definite system factorization can
be reused across multiple solves greatly improving the computation time. This feature is
particularly useful when solving multiple instances of parametric QPs where only a few el-
ements of the problem data change. Examples illustrating the effectiveness of the proposed
algorithm in parametric programs arising in embedded applications appear in [BSM+ 17].
We implemented our method in the open-source “Operator Splitting Quadratic Program”
(OSQP) solver. OSQP is written in C and can be compiled to be library free. OSQP is
robust against noisy and unreliable problem data, has a very small code footprint, and is
suitable for both embedded and large-scale applications. We have extensively tested our code
and carefully tuned its parameters by solving millions of QPs. We benchmarked our solver
against state-of-the-art interior-point and active-set solvers over a benchmark library of 1400
problems from 7 different classes and over the hard QPs Maros-Mészáros test set [MM99].
Numerical results show that our algorithm is able to provide up to an order of magnitude
computational time improvements over existing commercial and open-source solvers in a
wide variety of applications. We also showed further time reductions from warm starting
and factorization caching.

2 Optimality conditions
We will find it convenient to rewrite problem (1) by introducing an additional decision
variable z ∈ Rm , to obtain the equivalent problem
\[
\begin{array}{ll}
\text{minimize} & (1/2)x^T P x + q^T x \\
\text{subject to} & Ax = z \\
& z \in \mathcal{C}.
\end{array}
\tag{3}
\]
We can write the optimality conditions of problem (3) as [BGSB19, Lem. A.1] [RW98, Thm.
6.12]
\begin{align}
Ax &= z, \tag{4} \\
Px + q + A^T y &= 0, \tag{5} \\
z \in \mathcal{C}, \quad y &\in N_{\mathcal{C}}(z), \tag{6}
\end{align}
where y ∈ Rm is the Lagrange multiplier associated with the constraint Ax = z and NC (z)
denotes the normal cone of C at z. If there exist x ∈ Rn , z ∈ Rm and y ∈ Rm that satisfy the
conditions above, then we say that (x, z) is a primal and y is a dual solution to problem (3).
We define the primal and dual residuals of problem (1) as
\begin{align}
r_{\rm prim} &= Ax - z, \tag{7} \\
r_{\rm dual} &= Px + q + A^T y. \tag{8}
\end{align}

Quadratic programs. In case of QPs of the form (2), condition (6) reduces to
\[
l \le z \le u, \qquad y_+^T (z - u) = 0, \qquad y_-^T (z - l) = 0,
\tag{9}
\]
where y_+ = max(y, 0) and y_− = min(y, 0).
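
A minimal sketch of how the residuals (7)-(8) and the conditions (9) could be evaluated for a candidate primal-dual point. The function name `qp_optimality_report` and the one-dimensional example are hypothetical, and the sketch assumes finite bounds so that the products in (9) are well defined.

```python
import numpy as np

def qp_optimality_report(P, q, A, l, u, x, z, y):
    """Evaluate residuals (7)-(8) and the box conditions (9).

    Assumes finite l and u (illustrative restriction).
    """
    y_plus = np.maximum(y, 0.0)    # y_+ = max(y, 0)
    y_minus = np.minimum(y, 0.0)   # y_- = min(y, 0)
    return {
        "r_prim": A @ x - z,                        # (7)
        "r_dual": P @ x + q + A.T @ y,              # (8)
        "in_box": bool(np.all(l <= z) and np.all(z <= u)),
        "comp_upper": float(y_plus @ (z - u)),      # y_+^T (z - u)
        "comp_lower": float(y_minus @ (z - l)),     # y_-^T (z - l)
    }

# Hypothetical example: minimize (1/2) x^2 - 2 x subject to 0 <= x <= 1.
# The unconstrained minimizer is x = 2, so the upper bound is active.
P = np.array([[1.0]])
q = np.array([-2.0])
A = np.array([[1.0]])
l = np.array([0.0])
u = np.array([1.0])
x = np.array([1.0])
z = A @ x
y = np.array([1.0])   # positive multiplier on the active upper bound
report = qp_optimality_report(P, q, A, l, u, x, z, y)
```

In this example the multiplier is positive exactly where z sits on its upper bound, which is the sign convention expressed by (9).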

2.1 Certificates of primal and dual infeasibility

From the theorem of strong alternatives [BV04, §5.8], [BGSB19, Prop. 3.1], exactly one of
the following sets is nonempty
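\begin{align}
\mathcal{P} &= \{ x \in \mathbf{R}^n \mid Ax \in \mathcal{C} \}, \tag{10} \\
\mathcal{D} &= \{ y \in \mathbf{R}^m \mid A^T y = 0, \; S_{\mathcal{C}}(y) < 0 \}, \tag{11}
\end{align}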
where SC is the support function of C, provided that some type of constraint qualification
holds [BV04]. In other words, any variable y ∈ D serves as a certificate that problem (1) is
primal infeasible.

Quadratic programs. In case C = [l, u], certifying primal infeasibility of (2) amounts to
finding a vector y ∈ Rm such that
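\[
A^T y = 0, \qquad u^T y_+ + l^T y_- < 0.
\tag{12}
\]
Similarly, it can be shown that a vector x ∈ R^n satisfying
\[
Px = 0, \qquad q^T x < 0, \qquad
(Ax)_i \;
\begin{cases}
= 0 & l_i, u_i \in \mathbf{R} \\
\ge 0 & u_i = +\infty, \; l_i \in \mathbf{R} \\
\le 0 & l_i = -\infty, \; u_i \in \mathbf{R}
\end{cases}
\tag{13}
\]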

is a certificate of dual infeasibility for problem (2); see [BGSB19, Prop. 3.1] for more details.
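
The two certificate conditions can be checked numerically as in the following sketch. The function names, the tolerance `tol`, and both toy problems are hypothetical; infinite bounds are handled with the convention 0 · ∞ = 0, which is an assumption of this sketch.

```python
import numpy as np

def certifies_primal_infeasibility(A, l, u, y, tol=1e-9):
    """Check (12): A^T y = 0 and u^T y_+ + l^T y_- < 0."""
    if np.linalg.norm(A.T @ y, np.inf) > tol:
        return False
    pos, neg = y > 0, y < 0
    # Only entries with y_i != 0 contribute, so infinite bounds paired
    # with y_i = 0 are skipped (convention 0 * inf = 0).
    support = np.sum(u[pos] * y[pos]) + np.sum(l[neg] * y[neg])
    return bool(support < 0)

def certifies_dual_infeasibility(P, q, A, l, u, x, tol=1e-9):
    """Check (13): P x = 0, q^T x < 0, and the sign pattern of A x."""
    if np.linalg.norm(P @ x, np.inf) > tol or not (q @ x < 0):
        return False
    for ax_i, l_i, u_i in zip(A @ x, l, u):
        lo, hi = np.isfinite(l_i), np.isfinite(u_i)
        if lo and hi and abs(ax_i) > tol:
            return False          # both bounds finite: (Ax)_i = 0
        if lo and not hi and ax_i < -tol:
            return False          # u_i = +inf: (Ax)_i >= 0
        if hi and not lo and ax_i > tol:
            return False          # l_i = -inf: (Ax)_i <= 0
    return True

inf = np.inf
# Hypothetical primal infeasible constraints: x >= 1 and x <= 0.
A1 = np.array([[1.0], [1.0]])
l1 = np.array([1.0, -inf])
u1 = np.array([inf, 0.0])
y_cert = np.array([-1.0, 1.0])

# Hypothetical dual infeasible (unbounded) LP: minimize -x subject to x >= 0.
P2 = np.array([[0.0]])
q2 = np.array([-1.0])
A2 = np.array([[1.0]])
l2 = np.array([0.0])
u2 = np.array([inf])
x_cert = np.array([1.0])
```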

3 Solution with ADMM

Our method solves problem (3) using ADMM [BPC+ 11]. By introducing auxiliary variables
x̃ ∈ Rn and z̃ ∈ Rm , we can rewrite problem (3) as
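\[
\begin{array}{ll}
\text{minimize} & (1/2)\tilde{x}^T P \tilde{x} + q^T \tilde{x} + \mathcal{I}_{Ax=z}(\tilde{x}, \tilde{z}) + \mathcal{I}_{\mathcal{C}}(z) \\
\text{subject to} & (\tilde{x}, \tilde{z}) = (x, z),
\end{array}
\tag{14}
\]
where I_{Ax=z} and I_C are the indicator functions given by
\[
\mathcal{I}_{Ax=z}(x, z) =
\begin{cases} 0 & Ax = z \\ +\infty & \text{otherwise} \end{cases},
\qquad
\mathcal{I}_{\mathcal{C}}(z) =
\begin{cases} 0 & z \in \mathcal{C} \\ +\infty & \text{otherwise} \end{cases}.
\]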
An iteration of ADMM for solving problem (14) consists of the following steps:
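\begin{align}
(\tilde{x}^{k+1}, \tilde{z}^{k+1}) &\leftarrow \operatorname*{argmin}_{(\tilde{x},\tilde{z}) : A\tilde{x} = \tilde{z}} \; (1/2)\tilde{x}^T P \tilde{x} + q^T \tilde{x} + (\sigma/2)\|\tilde{x} - x^k + \sigma^{-1} w^k\|_2^2 \notag \\
&\qquad\qquad + (\rho/2)\|\tilde{z} - z^k + \rho^{-1} y^k\|_2^2 \tag{15} \\
x^{k+1} &\leftarrow \alpha \tilde{x}^{k+1} + (1-\alpha) x^k + \sigma^{-1} w^k \tag{16} \\
z^{k+1} &\leftarrow \Pi\left( \alpha \tilde{z}^{k+1} + (1-\alpha) z^k + \rho^{-1} y^k \right) \tag{17} \\
w^{k+1} &\leftarrow w^k + \sigma\left( \alpha \tilde{x}^{k+1} + (1-\alpha) x^k - x^{k+1} \right) \tag{18} \\
y^{k+1} &\leftarrow y^k + \rho\left( \alpha \tilde{z}^{k+1} + (1-\alpha) z^k - z^{k+1} \right) \tag{19}
\end{align}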
where σ > 0 and ρ > 0 are the step-size parameters, α ∈ (0, 2) is the relaxation parameter,
and Π denotes the Euclidean projection onto C. The introduction of the splitting variable x̃
ensures that the subproblem in (15) is always solvable for any P ∈ Sn+ which can also be 0
for LPs. Note that all the derivations hold also for σ and ρ being positive definite diagonal
matrices. The iterates wk and y k are associated with the dual variables of the equality
constraints x̃ = x and z̃ = z, respectively. Observe from steps (16) and (18) that wk+1 = 0
for all k ≥ 0, and consequently the w-iterate and the step (18) can be disregarded.
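
The cancellation of the w-iterate can be checked in one step. Substituting x^{k+1} from (16) into (18) gives

\begin{align*}
w^{k+1} &= w^k + \sigma\left( \alpha \tilde{x}^{k+1} + (1-\alpha) x^k - x^{k+1} \right) \\
&= w^k + \sigma\left( -\sigma^{-1} w^k \right) = 0,
\end{align*}

regardless of the value of w^k. Once w^k = 0, step (16) reduces to x^{k+1} = α x̃^{k+1} + (1 − α) x^k, and the σ-term in (15) reduces to (σ/2)‖x̃ − x^k‖_2^2.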

3.1 Solving the linear system
Evaluating the ADMM step (15) involves solving the equality constrained QP

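\[
\begin{array}{ll}
\text{minimize} & (1/2)\tilde{x}^T P \tilde{x} + q^T \tilde{x} + (\sigma/2)\|\tilde{x} - x^k\|_2^2 + (\rho/2)\|\tilde{z} - z^k + \rho^{-1} y^k\|_2^2 \\
\text{subject to} & A\tilde{x} = \tilde{z}.
\end{array}
\tag{20}
\]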

The optimality conditions for this equality constrained QP are

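\begin{align}
P\tilde{x}^{k+1} + q + \sigma(\tilde{x}^{k+1} - x^k) + A^T \nu^{k+1} &= 0, \tag{21} \\
\rho(\tilde{z}^{k+1} - z^k) + y^k - \nu^{k+1} &= 0, \tag{22} \\
A\tilde{x}^{k+1} - \tilde{z}^{k+1} &= 0, \tag{23}
\end{align}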

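where ν^{k+1} ∈ R^m is the Lagrange multiplier associated with the constraint Ax = z. By
eliminating the variable z̃^{k+1} from (22), the above linear system reduces to
\[
\begin{bmatrix} P + \sigma I & A^T \\ A & -\rho^{-1} I \end{bmatrix}
\begin{bmatrix} \tilde{x}^{k+1} \\ \nu^{k+1} \end{bmatrix}
=
\begin{bmatrix} \sigma x^k - q \\ z^k - \rho^{-1} y^k \end{bmatrix},
\tag{24}
\]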

with z̃ k+1 recoverable as

\[
\tilde{z}^{k+1} = z^k + \rho^{-1}(\nu^{k+1} - y^k).
\]
We will refer to the coefficient matrix in (24) as the KKT matrix. This matrix always has
full rank thanks to the positive parameters σ and ρ introduced in our splitting, so (24)
always has a unique solution for any matrices P ∈ Sn+ and A ∈ Rm×n . In other words, we
do not impose any additional assumptions on the problem data such as strong convexity of
the objective function or linear independence of the constraints as was done in [GTSJ15,
RDC14b, RDC14a].
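
The following sketch assembles the KKT matrix of (24) densely, solves it, and recovers z̃ from ν. It deliberately uses P = 0 and a rank-deficient A to illustrate that the system remains solvable. The function name `solve_kkt` and all numerical values are hypothetical, and a dense `numpy.linalg.solve` stands in for a sparse factorization.

```python
import numpy as np

def solve_kkt(P, q, A, x, z, y, rho, sigma):
    """Solve the linear system (24) and recover z_tilde (dense sketch)."""
    n, m = P.shape[0], A.shape[0]
    K = np.block([[P + sigma * np.eye(n), A.T],
                  [A, -np.eye(m) / rho]])
    rhs = np.concatenate([sigma * x - q, z - y / rho])
    sol = np.linalg.solve(K, rhs)
    x_tilde, nu = sol[:n], sol[n:]
    z_tilde = z + (nu - y) / rho          # recovery formula after (24)
    return x_tilde, nu, z_tilde

# Hypothetical data: an LP (P = 0) with two identical constraint rows,
# so neither strict convexity nor linear independence holds.
P = np.zeros((2, 2))
q = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
x = np.array([0.5, -0.5])
z = np.array([0.0, 0.0])
y = np.array([0.1, -0.2])
rho, sigma = 0.1, 1e-3                    # illustrative values only

x_tilde, nu, z_tilde = solve_kkt(P, q, A, x, z, y, rho, sigma)

# Residuals of the optimality conditions (21)-(23).
res21 = P @ x_tilde + q + sigma * (x_tilde - x) + A.T @ nu
res22 = rho * (z_tilde - z) + y - nu
res23 = A @ x_tilde - z_tilde
```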

Direct method. A direct method for solving the linear system (24) computes its solution
by first factoring the KKT matrix and then performing forward and backward substitution.
Since the KKT matrix remains the same for every iteration of ADMM, we only need to
perform the factorization once prior to the first iteration and cache the factors so that we can
reuse them in subsequent iterations. This approach is very efficient when the factorization
cost is considerably higher than the cost of forward and backward substitutions, so that
each iteration is computed quickly. Note that if ρ or σ change, the KKT matrix needs to be
factored again.
Our particular choice of splitting results in a KKT matrix that is quasi-definite, i.e., it can
be written as a 2-by-2 block-symmetric matrix where the (1, 1)-block is positive definite, and
the (2, 2)-block is negative definite. It therefore always has a well defined LDL^T factorization,
with L being a lower triangular matrix with unit diagonal elements and D a diagonal matrix
with nonzero diagonal elements [Van95]. Note that once the factorization is carried out,
computing the solution of (24) can be made division-free by storing D^{-1} instead of D.
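
To make the division-free claim concrete, here is a minimal dense sketch of an LDL^T factorization without pivoting that stores D^{-1}, followed by a solve routine that uses only multiplications, additions and subtractions. The functions `ldl_factor` and `ldl_solve` and the data are hypothetical; a practical implementation would work on sparse matrices with a fill-reducing permutation.

```python
import numpy as np

def ldl_factor(K):
    """Dense LDL^T without pivoting; returns unit-lower L and 1/diag(D).

    Divisions happen only here, once per factorization.
    """
    n = K.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = K[j, j] - (L[j, :j] ** 2) @ d[:j]
        for i in range(j + 1, n):
            L[i, j] = (K[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, 1.0 / d

def ldl_solve(L, d_inv, b):
    """Solve L D L^T t = b using the cached factors, without any division."""
    n = b.size
    t = np.array(b, dtype=float)
    for i in range(n):                    # forward substitution with L
        t[i] -= L[i, :i] @ t[:i]
    t *= d_inv                            # diagonal solve as a multiplication
    for i in range(n - 1, -1, -1):        # backward substitution with L^T
        t[i] -= L[i + 1:, i] @ t[i + 1:]
    return t

# Hypothetical data: P is only positive semidefinite and A has one row.
P = np.array([[2.0, 0.0],
              [0.0, 0.0]])
A = np.array([[1.0, 1.0]])
rho, sigma = 0.1, 1e-6                    # illustrative values only
n, m = 2, 1
K = np.block([[P + sigma * np.eye(n), A.T],
              [A, -np.eye(m) / rho]])

L, d_inv = ldl_factor(K)                  # factor once
b1 = np.array([1.0, -1.0, 0.5])
b2 = np.array([0.0, 2.0, -3.0])
t1 = ldl_solve(L, d_inv, b1)              # reuse the factors for each new
t2 = ldl_solve(L, d_inv, b2)              # right-hand side
```

In this example the first n entries of D are positive and the remaining m are negative, matching the sign structure of the two diagonal blocks.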
When the KKT matrix is sparse and quasi-definite, efficient algorithms can be used for
computing a suitable permutation matrix P (not to be confused with the cost matrix) for which
the factorization of P K P^T, with K the KKT matrix, results in
a sparse factor L [ADD04, Dav06] without regard for the actual nonzero values appearing in
the KKT matrix. The LDL^T factorization consists of two steps. In the first step we compute
the sparsity pattern of the factor L. This step is referred to as the symbolic factorization
and requires only the sparsity pattern of the KKT matrix. In the second step, referred to as
the numerical factorization, we determine the values of nonzero elements in L and D. Note
that we do not need to update the symbolic factorization if the nonzero entries of the KKT
matrix change but the sparsity pattern remains the same.

Indirect method. With large-scale QPs, factoring linear system (24) might be prohibitive.
In these cases it might be more convenient to use an indirect method by solving instead the
linear system
\[
(P + \sigma I + \rho A^T A)\,\tilde{x}^{k+1} = \sigma x^k - q + A^T(\rho z^k - y^k)
\]

obtained by eliminating ν k+1 from (24). We then compute z̃ k+1 as z̃ k+1 = Ax̃k+1 . Note
that the coefficient matrix in the above linear system is always positive definite. The linear
system can therefore be solved with an iterative scheme such as the conjugate gradient
method [GVL96, NW06]. When the linear system is solved up to some predefined accuracy,
we terminate the method. We can also warm start the method using the linear system
solution at the previous iteration of ADMM to speed up its convergence. In contrast to
direct methods, the complexity of indirect methods does not change if we update ρ and σ
since there is no factorization required. This allows for more updates to take place without
any overhead.
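
A sketch of the indirect approach: a plain conjugate gradient loop applied to the reduced system, using only matrix-vector products and an optional warm start. The functions `conjugate_gradient` and `indirect_step`, the tolerance, and the data are hypothetical illustrations, not the solver's actual routines.

```python
import numpy as np

def conjugate_gradient(apply_H, b, x0, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient for a positive definite operator."""
    x = np.array(x0, dtype=float)
    r = b - apply_H(x)
    p = r.copy()
    rs = r @ r
    iters = 0
    while np.sqrt(rs) > tol and iters < max_iter:
        Hp = apply_H(p)
        step = rs / (p @ Hp)
        x += step * p
        r -= step * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        iters += 1
    return x, iters

def indirect_step(P, q, A, x, z, y, rho, sigma, x_warm):
    """Solve (P + sigma I + rho A^T A) x_tilde = rhs without forming the matrix."""
    apply_H = lambda v: P @ v + sigma * v + rho * (A.T @ (A @ v))
    b = sigma * x - q + A.T @ (rho * z - y)
    x_tilde, iters = conjugate_gradient(apply_H, b, x_warm)
    return x_tilde, A @ x_tilde, iters    # z_tilde = A x_tilde

# Hypothetical data.
P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
x = np.array([0.1, 0.2])
z = np.array([0.5, 0.0, 0.3])
y = np.array([0.2, -0.1, 0.0])
rho, sigma = 0.1, 1e-6

x_tilde, z_tilde, iters_cold = indirect_step(P, q, A, x, z, y, rho, sigma, np.zeros(2))
# Warm starting from an already accurate point needs no further iterations.
_, _, iters_warm = indirect_step(P, q, A, x, z, y, rho, sigma, x_tilde)
```

Changing ρ or σ only changes `apply_H`, so no refactorization cost is incurred in this variant.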

3.2 Final algorithm

By simplifying the ADMM iterations according to the previous discussion, we obtain Algo-
rithm 1. Steps 4, 5, 6 and 7 of Algorithm 1 are very easy to evaluate since they involve
only vector addition and subtraction, scalar-vector multiplication and projection onto a box.
Moreover, they are component-wise separable and can be easily parallelized. The most com-
putationally expensive part is solving the linear system in Step 3, which can be performed
as discussed in Section 3.1.

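Algorithm 1
1: given initial values $x^0$, $z^0$, $y^0$ and parameters $\rho > 0$, $\sigma > 0$, $\alpha \in (0, 2)$
2: repeat
3:   $(\tilde{x}^{k+1}, \nu^{k+1}) \leftarrow$ solve linear system $\begin{bmatrix} P + \sigma I & A^T \\ A & -\rho^{-1} I \end{bmatrix} \begin{bmatrix} \tilde{x}^{k+1} \\ \nu^{k+1} \end{bmatrix} = \begin{bmatrix} \sigma x^k - q \\ z^k - \rho^{-1} y^k \end{bmatrix}$
4:   $\tilde{z}^{k+1} \leftarrow z^k + \rho^{-1}(\nu^{k+1} - y^k)$
5:   $x^{k+1} \leftarrow \alpha \tilde{x}^{k+1} + (1 - \alpha) x^k$
6:   $z^{k+1} \leftarrow \Pi\left( \alpha \tilde{z}^{k+1} + (1 - \alpha) z^k + \rho^{-1} y^k \right)$
7:   $y^{k+1} \leftarrow y^k + \rho\left( \alpha \tilde{z}^{k+1} + (1 - \alpha) z^k - z^{k+1} \right)$
8: until termination criterion is satisfied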
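
The steps of Algorithm 1 can be sketched in a few lines for the box-constrained case C = [l, u], where the projection Π is an element-wise clip. This is a dense, unscaled illustration with a fixed scalar ρ; the function name `admm_qp`, the parameter values, the zero initial point and the toy problem are assumptions of the sketch, and the stopping test uses the residual norms and relative tolerances described in the text.

```python
import numpy as np

def admm_qp(P, q, A, l, u, rho=1.0, sigma=1e-6, alpha=1.6,
            eps_abs=1e-6, eps_rel=1e-6, max_iter=10000):
    """Dense sketch of Algorithm 1 for l <= A x <= u."""
    n, m = P.shape[0], A.shape[0]
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    # The KKT matrix is fixed; a real implementation would factor it once
    # and cache the factors instead of calling a dense solve every time.
    K = np.block([[P + sigma * np.eye(n), A.T],
                  [A, -np.eye(m) / rho]])
    inf_norm = lambda v: np.linalg.norm(v, np.inf)
    for k in range(1, max_iter + 1):
        rhs = np.concatenate([sigma * x - q, z - y / rho])
        sol = np.linalg.solve(K, rhs)                 # step 3
        x_tilde, nu = sol[:n], sol[n:]
        z_tilde = z + (nu - y) / rho                  # step 4
        x_new = alpha * x_tilde + (1 - alpha) * x     # step 5
        z_relaxed = alpha * z_tilde + (1 - alpha) * z
        z_new = np.clip(z_relaxed + y / rho, l, u)    # step 6, projection onto [l, u]
        y = y + rho * (z_relaxed - z_new)             # step 7
        x, z = x_new, z_new
        # Step 8: residual-based stopping test.
        eps_prim = eps_abs + eps_rel * max(inf_norm(A @ x), inf_norm(z))
        eps_dual = eps_abs + eps_rel * max(inf_norm(P @ x), inf_norm(A.T @ y), inf_norm(q))
        if inf_norm(A @ x - z) <= eps_prim and inf_norm(P @ x + q + A.T @ y) <= eps_dual:
            break
    return x, z, y, k

# Hypothetical toy QP: x1 + x2 = 1, 0 <= x1, x2 <= 0.7.
P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])
x_sol, z_sol, y_sol, n_iter = admm_qp(P, q, A, l, u)
```

For this toy problem the minimizer can be worked out by hand as x = (0.3, 0.7) with multipliers y = (−2.9, 0, 0.2), which gives a reference point for the iterates.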

3.3 Convergence and infeasibility detection
We show in this section that the proposed algorithm generates a sequence of iterates
(xk , z k , y k ) that in the limit satisfy the optimality conditions (4)–(6) when problem (1) is
solvable, or provides a certificate of primal or dual infeasibility otherwise.
If we denote the argument of the projection operator in step 6 of Algorithm 1 by v k+1 ,
then we can express z k and y k as

\[
z^k = \Pi(v^k) \quad \text{and} \quad y^k = \rho\left( v^k - \Pi(v^k) \right).
\tag{25}
\]

Observe from (25) that iterates z k and y k satisfy optimality condition (6) for all k > 0
by construction [BC11, Prop. 6.46]. Therefore, it only remains to show that optimality
conditions (4)–(5) are satisfied in the limit.
As shown in [BGSB19, Prop. 5.3], if problem (2) is solvable, then Algorithm 1 produces
a convergent sequence of iterates (xk , z k , y k ) so that
\[
\lim_{k \to \infty} r_{\rm prim}^k = 0, \qquad \lim_{k \to \infty} r_{\rm dual}^k = 0,
\]
where r_prim^k and r_dual^k correspond to the residuals defined in (7) and (8) respectively.
On the other hand, if problem (2) is primal and/or dual infeasible, then the sequence of
iterates (xk , z k , y k ) generated by Algorithm 1 does not converge. However, the sequence

\[
(\delta x^k, \delta z^k, \delta y^k) = (x^k - x^{k-1}, \; z^k - z^{k-1}, \; y^k - y^{k-1})
\]

always converges and can be used to certify infeasibility of the problem. According to
[BGSB19, Thm. 5.1], if the problem is primal infeasible, then δy = limk→∞ δy k satisfies
conditions (12), whereas δx = limk→∞ δxk satisfies conditions (13) if it is dual infeasible.

3.4 Termination criteria

We can define termination criteria for Algorithm 1 so that the iterations stop when either
a primal-dual solution or a certificate of primal or dual infeasibility is found up to some
predefined accuracy.
A reasonable termination criterion for detecting optimality is that the norms of the
residuals r_prim^k and r_dual^k are smaller than some tolerance levels ε_prim > 0 and ε_dual > 0
[BPC+ 11], i.e.,
\[
\|r_{\rm prim}^k\|_\infty \le \varepsilon_{\rm prim}, \qquad \|r_{\rm dual}^k\|_\infty \le \varepsilon_{\rm dual}.
\tag{26}
\]
We set the tolerance levels as

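\begin{align*}
\varepsilon_{\rm prim} &= \varepsilon_{\rm abs} + \varepsilon_{\rm rel} \max\{ \|Ax^k\|_\infty, \|z^k\|_\infty \} \\
\varepsilon_{\rm dual} &= \varepsilon_{\rm abs} + \varepsilon_{\rm rel} \max\{ \|Px^k\|_\infty, \|A^T y^k\|_\infty, \|q\|_\infty \},
\end{align*}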

where εabs > 0 and εrel > 0 are absolute and relative tolerances, respectively.

Quadratic programs infeasibility. If C = [l, u], we check the following conditions for primal
infeasibility

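\[
\|A^T \delta y^k\|_\infty \le \varepsilon_{\rm pinf} \|\delta y^k\|_\infty, \qquad
u^T (\delta y^k)_+ + l^T (\delta y^k)_- \le \varepsilon_{\rm pinf} \|\delta y^k\|_\infty,
\]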

where εpinf > 0 is some tolerance level. Similarly, we define the following criterion for
detecting dual infeasibility

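\[
\|P \delta x^k\|_\infty \le \varepsilon_{\rm dinf} \|\delta x^k\|_\infty, \qquad
q^T \delta x^k \le \varepsilon_{\rm dinf} \|\delta x^k\|_\infty,
\]
\[
(A \delta x^k)_i \;
\begin{cases}
\in [-\varepsilon_{\rm dinf}, \varepsilon_{\rm dinf}] \, \|\delta x^k\|_\infty & u_i, l_i \in \mathbf{R} \\
\ge -\varepsilon_{\rm dinf} \|\delta x^k\|_\infty & u_i = +\infty \\
\le \varepsilon_{\rm dinf} \|\delta x^k\|_\infty & l_i = -\infty,
\end{cases}
\]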


for i = 1, . . . , m where ε_dinf > 0 is some tolerance level. Note that ‖δx^k‖∞ and ‖δy^k‖∞
appear in the right-hand sides to avoid division when considering normalized vectors δx^k
and δy^k in the termination criteria.

4 Solution polishing
Operator splitting methods are typically used to obtain solutions of an optimization problem
with low or medium accuracy. However, even if a solution is not very accurate we can
often guess which constraints are active from an approximate primal-dual solution. When
dealing with QPs of the form (2), we can obtain high accuracy solutions from the final
ADMM iterates by solving one additional system of equations.
Given a dual solution y of the problem, we define the sets of lower- and upper-active
constraints

L = {i ∈ {1, . . . , m} | yi < 0} ,
U = {i ∈ {1, . . . , m} | yi > 0} .

According to (9) we have that zL = lL and zU = uU , where lL denotes the vector composed
of elements of l corresponding to the indices in L. Similarly, we will denote by AL the matrix
composed of rows of A corresponding to the indices in L.
If the sets of active constraints are known a priori, then a primal-dual solution (x, y, z)
can be found by solving the following linear system
    
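\begin{align}
\begin{bmatrix} P & A_{\mathcal{L}}^T & A_{\mathcal{U}}^T \\ A_{\mathcal{L}} & & \\ A_{\mathcal{U}} & & \end{bmatrix}
\begin{bmatrix} x \\ y_{\mathcal{L}} \\ y_{\mathcal{U}} \end{bmatrix}
&=
\begin{bmatrix} -q \\ l_{\mathcal{L}} \\ u_{\mathcal{U}} \end{bmatrix}, \tag{27} \\
y_i &= 0, \quad i \notin (\mathcal{L} \cup \mathcal{U}), \tag{28} \\
z &= Ax. \tag{29}
\end{align}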
We can then apply the aforementioned procedure to obtain a candidate solution (x, y, z).
If (x, y, z) satisfies the optimality conditions (4)–(6), then our guess is correct and (x, y, z)
is a primal-dual solution of problem (3). This approach is referred to as solution polishing.
Note that the dimension of the linear system (27) is usually much smaller than the KKT
system in Section 3.1 because the number of active constraints at optimality is less than or
equal to n for non-degenerate QPs.
However, the linear system (27) is not necessarily solvable even if the sets of active
constraints L and U have been correctly identified. This can happen, e.g., if the solution is
degenerate, i.e., if it has one or more redundant active constraints. We make the solution
polishing procedure more robust by solving instead the following linear system
    
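\[
\begin{bmatrix} P + \delta I & A_{\mathcal{L}}^T & A_{\mathcal{U}}^T \\ A_{\mathcal{L}} & -\delta I & \\ A_{\mathcal{U}} & & -\delta I \end{bmatrix}
\begin{bmatrix} \hat{x} \\ \hat{y}_{\mathcal{L}} \\ \hat{y}_{\mathcal{U}} \end{bmatrix}
=
\begin{bmatrix} -q \\ l_{\mathcal{L}} \\ u_{\mathcal{U}} \end{bmatrix},
\tag{30}
\]
where δ > 0 is a regularization parameter with value δ ≈ 10^{-6}. Since the regularized matrix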
in (30) is quasi-definite, the linear system (30) is always solvable.
By using regularization, we actually solve a perturbed linear system and thus introduce
a small error to the polished solution. If we denote by K and (K + ∆K) the coefficient
matrices in (27) and (30), respectively, then we can represent the two linear systems as
Kt = g and (K + ∆K)t̂ = g. To compensate for this error, we apply an iterative refinement
procedure [Wil63], i.e., we iteratively solve
\[
(K + \Delta K)\,\Delta\hat{t}^k = g - K \hat{t}^k
\tag{31}
\]
and update t̂k+1 = t̂k + ∆t̂k . The sequence {t̂k } converges to the true solution t, provided
that it exists. Observe that, compared to solving the linear system (30), iterative refine-
ment requires only a backward- and a forward-solve, and does not require another matrix
factorization. Since the iterative refinement iterations converge very quickly in practice, we
just run them for a fixed number of passes without imposing any termination condition to
satisfy. Note that this is the same strategy used in commercial linear system solvers using
iterative refinement [Int17].
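
The polishing step can be sketched as follows: guess the active sets from the signs of an approximate dual vector, solve the regularized system (30), and apply a fixed number of refinement passes (31). The function name `polish`, the activity threshold `act_tol`, and the data are hypothetical. The threshold is an assumption of this sketch, needed because approximate multipliers of inactive constraints are rarely exactly zero.

```python
import numpy as np

def polish(P, q, A, l, u, y_approx, delta=1e-6, refine_iters=3, act_tol=1e-3):
    """Dense sketch of solution polishing with iterative refinement."""
    n = P.shape[0]
    low = np.flatnonzero(y_approx < -act_tol)     # guessed lower-active set
    upp = np.flatnonzero(y_approx > act_tol)      # guessed upper-active set
    A_act = np.vstack([A[low], A[upp]])
    k = A_act.shape[0]
    K = np.block([[P, A_act.T],
                  [A_act, np.zeros((k, k))]])     # matrix of (27)
    reg = np.concatenate([np.full(n, delta), np.full(k, -delta)])
    K_reg = K + np.diag(reg)                      # matrix of (30)
    g = np.concatenate([-q, l[low], u[upp]])
    # A real implementation would factor K_reg once and reuse the factors;
    # the repeated dense solves below are for brevity only.
    t = np.linalg.solve(K_reg, g)
    for _ in range(refine_iters):                 # refinement passes (31)
        t = t + np.linalg.solve(K_reg, g - K @ t)
    x = t[:n]
    y = np.zeros(A.shape[0])                      # y_i = 0 off the active sets (28)
    y[low] = t[n:n + low.size]
    y[upp] = t[n + low.size:]
    return x, y, A @ x                            # z = A x (29)

# Hypothetical toy QP and a deliberately inexact dual estimate.
P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])
y_approx = np.array([-2.9003, 1e-5, 0.2002])
x_pol, y_pol, z_pol = polish(P, q, A, l, u, y_approx)
```

The candidate returned by such a routine would still need to be checked against the optimality conditions (4)-(6) before being accepted, as described above.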

5 Preconditioning and parameter selection

A known weakness of first-order methods is their inability to deal effectively with ill-
conditioned problems, and their convergence rate can vary significantly when data are badly
scaled. In this section we describe how to precondition the data and choose the optimal
parameters to speed up the convergence of our algorithm.

5.1 Preconditioning
Preconditioning is a common heuristic aiming to reduce the number of iterations in first-order
methods [NW06, Chap. 5], [GTSJ15, Ben02, PC11, GB15, GB17]. The optimal choice
of preconditioners has been studied for at least two decades and remains an active area of
research [Kel95, Chap. 2],[Gre97, Chap. 10]. For example, the optimal diagonal precon-
ditioner required to minimize the condition number of a matrix can be found exactly by
solving a semidefinite program [BEGFB94]. However, this computation is typically more
complicated than solving the original QP, and is therefore unlikely to be worth the effort
since preconditioning is only a heuristic to minimize the number of iterations.
In order to keep the preconditioning procedure simple, we instead make use of a simple
heuristic called matrix equilibration [Bra10, TJ14, FB18, DB17]. Our goal is to rescale
the problem data to reduce the condition number of the symmetric matrix M ∈ Sn+m
representing the problem data, defined as

\[
M = \begin{bmatrix} P & A^T \\ A & 0 \end{bmatrix}.
\tag{32}
\]

In particular, we use symmetric matrix equilibration by computing the diagonal matrix
S ∈ S^{n+m}_{++} to decrease the condition number of SMS. We can write matrix S as
\[
S = \begin{bmatrix} D & \\ & E \end{bmatrix},
\tag{33}
\]
where D ∈ S^n_{++} and E ∈ S^m_{++} are both diagonal. In addition, we would like to normalize
the cost function to prevent the dual variables from being too large. We can achieve this by
multiplying the cost function by the scalar c > 0.
Preconditioning effectively modifies problem (1) into the following

\[
\begin{array}{ll}
\text{minimize} & (1/2)\bar{x}^T \bar{P} \bar{x} + \bar{q}^T \bar{x} \\
\text{subject to} & \bar{A}\bar{x} \in \bar{\mathcal{C}},
\end{array}
\tag{34}
\]
where x̄ = D^{-1}x, P̄ = cDPD, q̄ = cDq, Ā = EAD and C̄ = {Ez ∈ R^m | z ∈ C}. The
dual variables of the new problem are ȳ = cE^{-1}y. Note that when C = [l, u] the Euclidean
projection onto C̄ = [El, Eu] is as easy to evaluate as the projection onto C.
The main idea of the equilibration procedure is to scale the rows of matrix M so that
they all have equal ℓ_p norm. It is possible to show that finding such a scaling matrix S can
be cast as a convex optimization problem [BHT04]. However, it is computationally more
convenient to solve this problem with heuristic iterative methods, rather than continuous
optimization algorithms such as interior-point methods. We refer the reader to [Bra10] for
more details on matrix equilibration.

Ruiz equilibration. In this work we apply a variation of the Ruiz equilibration [Rui01].
This technique was originally proposed to equilibrate square matrices showing fast linear
convergence superior to other methods such as the Sinkhorn-Knopp equilibration [SK67].
Ruiz equilibration converges in few tens of iterations even in cases when Sinkhorn-Knopp
equilibration takes thousands of iterations [KRU14]. The steps are outlined in Algorithm 2
and differ from the original Ruiz algorithm by adding a cost scaling step that takes into
account very large values of the cost. The first part is the usual Ruiz equilibration step.
Since M is symmetric, we focus only on the columns M_i and apply the scaling to both sides
of M. At each iteration, we compute the ∞-norm of each column and normalize that column
by the inverse of its square root. The second part is a cost scaling step. The scalar γ is the
current cost normalization coefficient taking into account the maximum between the average
norm of the columns of P̄ and the norm of q̄. We normalize problem data P̄, q̄, Ā, l̄, ū in
place at each iteration using the current values of δ and γ.

Algorithm 2 Modified Ruiz equilibration
initialize $c = 1$, $S = I$, $\delta = 0$, $\bar{P} = P$, $\bar{q} = q$, $\bar{A} = A$, $\bar{\mathcal{C}} = \mathcal{C}$
while $\|1 - \delta\|_\infty > \varepsilon_{\rm equil}$ do
    for $i = 1, \ldots, n + m$ do
        $\delta_i \leftarrow 1/\sqrt{\|M_i\|_\infty}$    ▷ M equilibration
    $\bar{P}, \bar{q}, \bar{A}, \bar{\mathcal{C}} \leftarrow$ Scale $\bar{P}, \bar{q}, \bar{A}, \bar{\mathcal{C}}$ using $\mathrm{diag}(\delta)$
    $\gamma \leftarrow 1/\max\{\mathrm{mean}(\|\bar{P}_i\|_\infty), \|\bar{q}\|_\infty\}$    ▷ Cost scaling
    $\bar{P} \leftarrow \gamma\bar{P}$, $\bar{q} \leftarrow \gamma\bar{q}$
    $S \leftarrow \mathrm{diag}(\delta)S$, $c \leftarrow \gamma c$
return $S$, $c$
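
A dense sketch of the modified Ruiz equilibration loop for C = [l, u]. The function name `modified_ruiz`, the iteration cap `max_iter`, and the guards for zero columns and zero cost are assumptions of this sketch and are not stated in Algorithm 2. The scaled matrix M is rebuilt from the current P̄ and Ā at every pass.

```python
import numpy as np

def modified_ruiz(P, q, A, l, u, eps_equil=1e-3, max_iter=25):
    """Return scaled data and the diagonals of D, E plus the cost scaling c."""
    n, m = P.shape[0], A.shape[0]
    P_bar, q_bar, A_bar = P.astype(float), q.astype(float), A.astype(float)
    s = np.ones(n + m)                     # diagonal of S = diag(D, E)
    c = 1.0
    for _ in range(max_iter):
        M = np.block([[P_bar, A_bar.T],
                      [A_bar, np.zeros((m, m))]])
        col_norms = np.abs(M).max(axis=0)  # infinity norm of each column M_i
        col_norms[col_norms == 0.0] = 1.0  # assumption: leave zero columns unscaled
        delta = 1.0 / np.sqrt(col_norms)
        d, e = delta[:n], delta[n:]
        P_bar = d[:, None] * P_bar * d[None, :]
        q_bar = d * q_bar
        A_bar = e[:, None] * A_bar * d[None, :]
        # Cost scaling: mean column norm of P_bar versus the norm of q_bar.
        cost_norm = max(np.abs(P_bar).max(axis=0).mean(), np.abs(q_bar).max())
        gamma = 1.0 / cost_norm if cost_norm > 0.0 else 1.0
        P_bar, q_bar = gamma * P_bar, gamma * q_bar
        s, c = s * delta, c * gamma
        if np.abs(1.0 - delta).max() <= eps_equil:
            break
    D, E = s[:n], s[n:]
    return P_bar, q_bar, A_bar, E * l, E * u, D, E, c

# Hypothetical badly scaled data.
P = np.array([[4e3, 1.0], [1.0, 2e-2]])
q = np.array([1e2, 1.0])
A = np.array([[1e3, 1e3], [1.0, 0.0], [0.0, 1e-2]])
l = np.array([1.0, 0.0, -np.inf])
u = np.array([1.0, 0.7, 0.7])
P_bar, q_bar, A_bar, l_bar, u_bar, D, E, c = modified_ruiz(P, q, A, l, u)
```

Whatever the number of passes, the returned data satisfy P̄ = cDPD, q̄ = cDq and Ā = EAD by construction, which is the relation used when mapping iterates back to the original problem.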

Unscaled termination criteria. Although we rescale our problem in the form (34), we
would still like to apply the stopping criteria defined in Section 3.4 to an unscaled version of
our problem. The primal and dual residuals in (26) can be rewritten in terms of the scaled
problem as
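\begin{align*}
r_{\rm prim}^k &= E^{-1} \bar{r}_{\rm prim}^k = E^{-1}(\bar{A}\bar{x}^k - \bar{z}^k), \\
r_{\rm dual}^k &= c^{-1} D^{-1} \bar{r}_{\rm dual}^k = c^{-1} D^{-1}(\bar{P}\bar{x}^k + \bar{q} + \bar{A}^T \bar{y}^k),
\end{align*}
and the tolerance levels as
\begin{align*}
\varepsilon_{\rm prim} &= \varepsilon_{\rm abs} + \varepsilon_{\rm rel} \max\{ \|E^{-1}\bar{A}\bar{x}^k\|_\infty, \|E^{-1}\bar{z}^k\|_\infty \} \\
\varepsilon_{\rm dual} &= \varepsilon_{\rm abs} + \varepsilon_{\rm rel}\, c^{-1} \max\{ \|D^{-1}\bar{P}\bar{x}^k\|_\infty, \|D^{-1}\bar{A}^T\bar{y}^k\|_\infty, \|D^{-1}\bar{q}\|_\infty \}.
\end{align*}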

Quadratic programs infeasibility. When C = [l, u], the primal infeasibility conditions be-
come
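\[
\|D^{-1}\bar{A}^T \delta\bar{y}^k\|_\infty \le \varepsilon_{\rm pinf} \|E\delta\bar{y}^k\|_\infty, \qquad
\bar{u}^T (\delta\bar{y}^k)_+ + \bar{l}^T (\delta\bar{y}^k)_- \le \varepsilon_{\rm pinf} \|E\delta\bar{y}^k\|_\infty,
\]
where the primal infeasibility certificate is c^{-1}Eδȳ^k. The dual infeasibility criteria are
\[
\|D^{-1}\bar{P}\delta\bar{x}^k\|_\infty \le c\,\varepsilon_{\rm dinf} \|D\delta\bar{x}^k\|_\infty, \qquad
\bar{q}^T \delta\bar{x}^k \le c\,\varepsilon_{\rm dinf} \|D\delta\bar{x}^k\|_\infty,
\]
\[
(E^{-1}\bar{A}\delta\bar{x}^k)_i \;
\begin{cases}
\in [-\varepsilon_{\rm dinf}, \varepsilon_{\rm dinf}] \, \|D\delta\bar{x}^k\|_\infty & u_i, l_i \in \mathbf{R} \\
\ge -\varepsilon_{\rm dinf} \|D\delta\bar{x}^k\|_\infty & u_i = +\infty \\
\le \varepsilon_{\rm dinf} \|D\delta\bar{x}^k\|_\infty & l_i = -\infty,
\end{cases}
\]
where the dual infeasibility certificate is Dδx̄^k.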

5.2 Parameter selection
The choice of parameters (ρ, σ, α) in Algorithm 1 is a key factor in determining the number
of iterations required to find an optimal solution. Unfortunately, it is still an open research
question how to select the optimal ADMM parameters, see [GTSJ15, NLR+ 15, GB17]. After
extensive numerical testing on millions of problem instances and a wide range of dimensions,
we chose the algorithm parameters as follows for QPs.

Choosing σ and α. The parameter σ is a regularization term which is used to ensure that
a unique solution of (15) will always exist, even when P has one or more zero eigenvalues.
After scaling P in order to minimize its condition number, we choose σ as small as possible
to preserve numerical stability without slowing down the algorithm. We set the default value
as $\sigma = 10^{-6}$. A relaxation parameter α in the range [1.5, 1.8] has been shown empirically to
improve the convergence rate [Eck94, EF98]. In the proposed method, we set the default
value to α = 1.6.

Choosing ρ. The most crucial parameter is the step-size ρ. Numerical testing showed that
having different values of ρ for different constraints can greatly improve the performance.
For this reason, without altering the algorithm steps, we chose $\rho \in \mathbf{S}^m_{++}$ to be a positive
definite diagonal matrix with different elements $\rho_i$.
For a specific QP, if we know the active and inactive constraints, then we can rewrite it
simply as an equality constrained QP. In this case the optimal ρ is defined as ρi = ∞ for
the active constraints and ρi = 0 for the inactive constraints, therefore reducing the linear
system (24) to the optimality conditions of the equivalent equality constrained QP (after
setting σ = 0). Unfortunately, it is impossible to know a priori whether any given constraint
is active or inactive at optimality, so we must instead adopt some heuristics. We define ρ as
follows
$$\rho = \mathrm{diag}(\rho_1, \dots, \rho_m), \qquad \rho_i = \begin{cases} \bar\rho & l_i \ne u_i \\ 10^3\bar\rho & l_i = u_i, \end{cases}$$
where ρ̄ > 0. In this way we assign a high value to the step-size related to the equality
constraints since they will be active at the optimum. Having a fixed value of ρ̄ cannot
provide fast convergence for different kinds of problems since the optimal solution and the
active constraints vary greatly. To compensate for this issue, we adopt an adaptive scheme
which updates ρ̄ during the iterations based on the ratio between primal and dual residuals.
The idea of introducing “feedback” in the algorithm steps makes ADMM more robust to bad
scaling in the data; see [HYW00, BPC+ 11, Woh17]. Contrary to the adaptation approaches
in the literature where the update increases or decreases the value of the step-size by a fixed
amount, we adopt the following rule
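$$\bar\rho^{k+1} \leftarrow \bar\rho^k \sqrt{\frac{\|\bar{r}_{\rm prim}^k\|_\infty / \max\{\|\bar{A}\bar{x}^k\|_\infty, \|\bar{z}^k\|_\infty\}}{\|\bar{r}_{\rm dual}^k\|_\infty / \max\{\|\bar{P}\bar{x}^k\|_\infty, \|\bar{A}^T\bar{y}^k\|_\infty, \|\bar{q}\|_\infty\}}}.$$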

In other words we update $\bar\rho^k$ using the square root of the ratio between the scaled residuals
normalized by the magnitudes of the relative part of the tolerances. We set the initial value as
$\bar\rho^0 = 0.1$. In our benchmarks, if $\bar\rho^0$ does not already give a low number of ADMM iterations,
it is usually tuned with at most 1 or 2 updates. The adaptation causes the KKT
matrix in (24) to change and, if the linear system solution method is direct, it requires
a new numerical factorization. We do not require a new symbolic factorization because
the sparsity pattern of the KKT matrix does not change. Since the numerical factorization
can be costly, we perform the adaptation only when it is really necessary. In particular, we
allow an update if the accumulated iterations time is greater than a certain percentage of the
factorization time (nominally 40%) and if the new parameter is sufficiently different from the
current one, i.e., 5 times larger or smaller. Note that in the case of an indirect method this
rule allows for more frequent changes of ρ since there is no need to factor the KKT matrix
and the update is numerically much cheaper. Note that the convergence of the ADMM
algorithm is hard to prove in general if the ρ updates happen at each iteration. However, if
we assume that the updates stop after a fixed number of iterations the convergence results
hold [BPC+ 11, Section 3.4.1].
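
The heuristics of this section can be summarized in a short Python sketch. It is only an illustration under stated assumptions: the function names, the small constant `tiny` that guards against division by zero, and the argument layout are hypothetical and are not taken from the OSQP code. The numerical values (factor $10^3$ for equality constraints, 40% of the factorization time, factor 5) are the nominal ones quoted above.

```python
import math

def rho_vector(l, u, rho_bar, eq_factor=1e3):
    """Per-constraint step sizes: larger value for equality constraints (l_i == u_i)."""
    return [eq_factor * rho_bar if li == ui else rho_bar for li, ui in zip(l, u)]

def propose_rho_bar(rho_bar, r_prim, r_dual, prim_scale, dual_scale, tiny=1e-10):
    """Square-root rule on the normalized scaled residuals.

    r_prim, r_dual: infinity norms of the scaled primal and dual residuals.
    prim_scale: max(||A x||, ||z||); dual_scale: max(||P x||, ||A^T y||, ||q||).
    The `tiny` guard is an assumption added for numerical safety.
    """
    prim = r_prim / max(prim_scale, tiny)
    dual = r_dual / max(dual_scale, tiny)
    return rho_bar * math.sqrt(prim / max(dual, tiny))

def should_refactor(rho_bar, rho_new, iter_time, factor_time,
                    time_frac=0.4, ratio=5.0):
    """Accept the update only if enough iteration time has accumulated
    and the new value differs from the current one by at least `ratio`."""
    enough_time = iter_time > time_frac * factor_time
    big_change = rho_new > ratio * rho_bar or rho_new < rho_bar / ratio
    return enough_time and big_change
```

For example, a primal residual that is 100 times larger than the dual residual (after normalization) leads to a proposed $\bar\rho$ that is 10 times larger, which passes the factor-5 test.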

6 Parametric programs
In application domains such as control, statistics, finance, and SQP, problem (1) is solved
repeatedly for varying data. For these problems, usually referred to as parametric programs,
we can speed up the repeated OSQP calls by re-using the computations across multiple
solves.
We make the distinction between cases in which only the vectors or all data in (1) change
between subsequent problem instances. We assume that the problem dimensions n and m
and the sparsity patterns of P and A are fixed.

Vectors as parameters. If the vectors q, l, and u are the only parameters that vary, then
the KKT coefficient matrix in Algorithm 1 does not change across different instances of
the parametric program. Thus, if a direct method is used, we perform and store its fac-
torization only once before the first solution and reuse it across all subsequent iterations.
Since the matrix factorization is the computationally most expensive step of the algorithm,
this approach reduces significantly the amount of time OSQP takes to solve subsequent
problems. This class of problems arises very frequently in many applications including
linear MPC and MHE [RM09, ABQ+ 99], Lasso [Tib96, CWB08], and portfolio optimiza-
tion [BMOW14, Mar52].

Matrices and vectors as parameters. We separately consider the case in which the values
(but not the locations) of the nonzero entries of matrices P and A are updated. In this
case, in a direct method, we need to refactor the matrix in Algorithm 1. However, since
the sparsity pattern does not change we need only to recompute the numerical factorization
while reusing the symbolic factorization from the previous solution. This results in a modest

reduction in the computation time. This class of problems encompasses several applications
such as nonlinear MPC and MHE [DFH09] and sequential quadratic programming [NW06].

Warm starting. In contrast to interior-point methods, OSQP is easily initialized by
providing an initial guess of both the primal and dual solutions to the QP. This approach
is known as warm starting and is particularly effective when the subsequent QP solutions
do not vary significantly, which is the case for most parametric program applications. We
can warm start the ADMM iterates from the previous OSQP solution $(x^\star, y^\star)$ by setting
$(x^0, z^0, y^0) \leftarrow (x^\star, Ax^\star, y^\star)$. Note that we can also warm start the ρ estimation described in
Section 5.2 to exploit the ratio between the primal and dual residuals to speed up convergence in
subsequent solves.
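
To make the effect of warm starting concrete, the following toy sketch runs the ADMM steps on a scalar QP, minimize $(1/2)px^2 + qx$ subject to $l \le x \le u$ (so $A = 1$). It is an illustrative assumption-laden example, not the OSQP code: the function name `admm_scalar_qp`, the simple absolute stopping test, and the fixed $\rho$ are our own choices. In the scalar case the cached factorization reduces to the number `k_inv`, which depends only on $p$, $\sigma$ and $\rho$ and not on $q$, $l$ or $u$.

```python
def admm_scalar_qp(p, q, l, u, x=0.0, z=0.0, y=0.0,
                   rho=0.1, sigma=1e-6, alpha=1.6, eps=1e-6, max_iter=10000):
    """ADMM for: minimize (1/2) p x^2 + q x  subject to l <= x <= u."""
    k_inv = 1.0 / (p + sigma + rho)  # scalar analogue of the cached factorization
    for k in range(1, max_iter + 1):
        # Linear system step (reduced form for A = 1).
        x_t = k_inv * (sigma * x - q + rho * z - y)
        z_t = x_t
        # Relaxation, projection onto [l, u], dual update.
        x = alpha * x_t + (1.0 - alpha) * x
        z_relaxed = alpha * z_t + (1.0 - alpha) * z
        z_new = min(max(z_relaxed + y / rho, l), u)
        y = y + rho * (z_relaxed - z_new)
        z = z_new
        # Simplified stopping test on primal and dual residuals.
        if abs(x - z) <= eps and abs(p * x + q + y) <= eps:
            return x, z, y, k
    return x, z, y, max_iter

# Cold start, then a warm-started solve after a small change in q.
x1, z1, y1, k_cold = admm_scalar_qp(1.0, -3.0, 0.0, 2.0)
x2, z2, y2, k_warm = admm_scalar_qp(1.0, -3.1, 0.0, 2.0, x=x1, z=z1, y=y1)
```

In this example the upper bound is active in both problems, and the warm-started solve needs fewer iterations because only the dual variable has to move from its previous value to the new one.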

7 OSQP
We have implemented our proposed approach in the “Operator Splitting Quadratic Program”
(OSQP) solver, an open-source software package in the C language. OSQP can solve any
QP of the form (2) and makes no assumptions about the problem data other than convexity.
OSQP is available online at
https://osqp.org.
Users can call OSQP from C, C++, Fortran, Python, Matlab, R, Julia, Ruby and Rust, and
via parsers such as CVXPY [DB16, AVDB18], JuMP [DHL17], and YALMIP [Löf04].
To exploit the data sparsity pattern, OSQP accepts matrices in Compressed-Sparse-
Column (CSC) format [Dav06]. We implemented the linear system solution described in
Section 3.1 as an object-oriented interface to easily switch between efficient algorithms. At
present, OSQP ships with the open-source QDLDL direct solver which is our independent
implementation based on [Dav05], and also supports dynamic loading of more advanced
algorithms such as the MKL Pardiso direct solver [Int17]. We plan to add iterative indirect
solvers and other direct solvers in future versions.
The default values for the OSQP termination tolerances described in Section 3.4 are
$\varepsilon_{\rm abs} = \varepsilon_{\rm rel} = 10^{-3}$, $\varepsilon_{\rm pinf} = \varepsilon_{\rm dinf} = 10^{-4}$.
The default step-size parameter σ and the relaxation parameter α are set to
$\sigma = 10^{-6}$, $\alpha = 1.6$,
while ρ is automatically chosen by default as described in Section 5.2, with optional user
override. We set the default fixed number of iterative refinement steps to 3.
OSQP reports the total computation time divided into the time required to perform
preprocessing operations such as scaling or matrix factorization and the time to carry out the
ADMM iterations. If the solver is called multiple times reusing the same matrix
factorization, it will report only the ADMM solve time as total computation time. For more details
we refer the reader to the solver documentation on the OSQP project website.

8 Numerical examples
We benchmarked OSQP against the open-source interior-point solver ECOS [DCB13], the
open-source active-set solver qpOASES [FKP+ 14], and the commercial interior-point solvers
GUROBI [Gur16] and MOSEK [MOS17]. We executed every benchmark comparing different
solvers with both low accuracy, i.e., $\varepsilon_{\rm abs} = \varepsilon_{\rm rel} = 10^{-3}$, and high accuracy, i.e.,
$\varepsilon_{\rm abs} = \varepsilon_{\rm rel} = 10^{-5}$. We set GUROBI, ECOS, MOSEK and OSQP primal and dual feasibility tolerances
to our low and high accuracy tolerances. Since qpOASES is an active-set method and does
not allow the user to tune primal or dual feasibility tolerances, we set it to its default
termination settings. In addition, we allow each solver to run for at most 1000 sec
and set no limit on the number of iterations. Note that the use of maximum time
limits with no bounds on the number of iterations is the default setting in commercial solvers
such as MOSEK. For every solver we leave all the other settings to the internal defaults.
In general it is hard to compare the solution accuracies because all the solvers, especially
commercial ones, use an internal problem scaling and verify that the termination conditions
are satisfied against their scaled version of the problem. In contrast, OSQP allows the option
to check the termination conditions against the internally scaled or the original problem.
Therefore, to make the benchmark fair, we say that the primal-dual solution $(x^\star, y^\star)$ returned
by each solver is optimal if the following optimality conditions are satisfied with tolerances
defined above with low and high accuracy modes,

$$\|(Ax^\star - u)_+ + (Ax^\star - l)_-\|_\infty \le \varepsilon_{\rm prim}, \qquad \|Px^\star + q + A^T y^\star\|_\infty \le \varepsilon_{\rm dual},$$

where εprim and εdual are defined in Section 3.4. If the primal-dual solution returned by a
solver does not satisfy the optimality conditions defined above, we consider it a failure. Note
that we decided not to include checks on the complementary slackness satisfaction because
interior-point solvers satisfied them with different metrics and scalings, therefore failing very
often. In contrast OSQP always satisfies complementary slackness conditions with machine
precision by construction.
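
The optimality test above can be sketched for small dense problems as follows. This is a hedged illustration rather than the benchmark code: the helper names are hypothetical, and we assume $\varepsilon_{\rm prim} = \varepsilon_{\rm abs} + \varepsilon_{\rm rel}\|Ax^\star\|_\infty$ and $\varepsilon_{\rm dual} = \varepsilon_{\rm abs} + \varepsilon_{\rm rel}\max\{\|Px^\star\|_\infty, \|A^Ty^\star\|_\infty, \|q\|_\infty\}$, since a solver-independent check has no auxiliary variable $z$ available.

```python
def inf_norm(v):
    """Infinity norm of a list of floats (0 for an empty list)."""
    return max((abs(t) for t in v), default=0.0)

def matvec(M, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def is_optimal(P, q, A, l, u, x, y, eps_abs=1e-3, eps_rel=1e-3):
    """Check primal and dual residuals of (x, y) against the tolerances."""
    Ax = matvec(A, x)
    Px = matvec(P, x)
    ATy = [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(x))]
    # (Ax - u)_+ + (Ax - l)_- measures the violation of l <= Ax <= u.
    prim_res = inf_norm([max(a - ui, 0.0) + min(a - li, 0.0)
                         for a, li, ui in zip(Ax, l, u)])
    dual_res = inf_norm([a + b + c for a, b, c in zip(Px, q, ATy)])
    eps_prim = eps_abs + eps_rel * inf_norm(Ax)  # assumed form of the tolerance
    eps_dual = eps_abs + eps_rel * max(inf_norm(Px), inf_norm(ATy), inf_norm(q))
    return prim_res <= eps_prim and dual_res <= eps_dual
```

Because the check uses only the original problem data, it can be applied unchanged to the output of any solver, regardless of its internal scaling.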
In addition, we used the direct single-threaded linear system solver QDLDL [GSB18]
based on [ADD04, Dav05] and very simple linear algebra where other solvers such as
GUROBI and MOSEK use advanced multi-threaded linear system solvers and custom linear
algebra.
All the experiments were carried out on the MIT SuperCloud facility in collaboration
with the Lincoln Laboratory [RKB+ 18] with 16 Intel Xeon E5-2650 cores. The code for all
the numerical examples is available online at [SB19].

Shifted geometric mean. As in most common benchmarks [Mit], we make use of the
normalized shifted geometric mean to compare the timings of the various solvers. Given the
time $t_{p,s}$ required by solver $s$ to solve problem $p$, we define the shifted geometric mean as
$$g_s = \sqrt[n]{\prod_p (t_{p,s} + k)} - k,$$

where n is the number of problem instances considered and k = 1 is the shift [Mit]. The
normalized shifted geometric mean is therefore

$$r_s = g_s / \min_s g_s.$$

This value shows the factor at which a specific solver is slower than the fastest one with
scaled value of 1.00. If solver s fails at solving problem p, we set the time as the maximum
allowed, i.e., $t_{p,s} = 1000$ sec. Note that to avoid numerical overflow in the product, we
compute in practice the shifted geometric mean as $e^{\ln g_s}$.
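
A minimal Python sketch of this computation is given below. It evaluates the product through a mean of logarithms, which is one way to avoid overflow. The function names and the dictionary layout of the timings are illustrative assumptions.

```python
import math

def shifted_geometric_mean(times, shift=1.0):
    """Shifted geometric mean of a list of timings, computed through logarithms."""
    n = len(times)
    return math.exp(sum(math.log(t + shift) for t in times) / n) - shift

def normalized_sgm(times_by_solver, shift=1.0):
    """Normalize each solver's shifted geometric mean by the best (smallest) one."""
    g = {s: shifted_geometric_mean(t, shift) for s, t in times_by_solver.items()}
    best = min(g.values())
    return {s: v / best for s, v in g.items()}
```

Failed instances would enter `times` with the maximum allowed time, i.e., 1000 sec in these benchmarks.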

Performance profiles. We also make use of the performance profiles [DM02] to compare
the solver timings. We define the performance ratio

$$u_{p,s} = t_{p,s} / \min_s t_{p,s}.$$

The performance profile plots the function $f_s : \mathbf{R} \mapsto [0, 1]$ defined as

$$f_s(\tau) = \frac{1}{n}\sum_p \mathcal{I}_{\le\tau}(u_{p,s}),$$

where $\mathcal{I}_{\le\tau}(u_{p,s}) = 1$ if $u_{p,s} \le \tau$ or 0 otherwise. The value $f_s(\tau)$ corresponds to the fraction of
problems solved within τ times from the best solver. Note that while we cannot necessarily
assess the performance of one solver relative to another with performance profiles, they still
represent a viable choice to benchmark the performance of a solver with respect to the best
one [GS16].
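
The two definitions above translate directly into code. The following sketch assumes that the timings are stored as a dictionary mapping each solver name to a list of times over the same ordered set of problems. The function names are hypothetical.

```python
def performance_ratios(times_by_solver):
    """Ratio of each solver's time to the best time on every problem."""
    solvers = list(times_by_solver)
    n = len(times_by_solver[solvers[0]])
    best = [min(times_by_solver[s][p] for s in solvers) for p in range(n)]
    return {s: [times_by_solver[s][p] / best[p] for p in range(n)] for s in solvers}

def performance_profile(ratios, tau):
    """Fraction of problems solved within a factor tau of the best solver."""
    return sum(1 for r in ratios if r <= tau) / len(ratios)
```

Evaluating `performance_profile` on a grid of values of $\tau$ gives the curve $f_s(\tau)$ for each solver.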

8.1 Benchmark problems

We considered QPs in the form (2) from 7 problem classes ranging from standard random
programs to applications in the areas of control, portfolio optimization and machine learning.
For each problem class, we generated 10 different instances for 20 dimensions giving a total of
1400 problem instances. All instances were obtained from either real data or from non-trivial
random data. Note that the random QPs and random equality constrained QPs problem
classes might not closely correspond to a real-world application. However, they have a typical
number of nonzero elements appearing in practice. We describe the generation of each class in
Appendix A. Throughout all the problem classes, $n$ ranges between $10^1$ and $10^4$, $m$ between
$10^2$ and $10^5$, and the number of nonzeros $N$ between $10^2$ and $10^8$.

Results. We show in Figures 1 and 2 the OSQP and GUROBI computation times across all
the problem classes for low and high accuracy solutions respectively. OSQP is competitive
with or even faster than GUROBI for several problem classes. Results are shown in Table 1
and Figure 3. OSQP shows the best performance across these benchmarks with MOSEK
performing better at lower accuracy and GUROBI at higher accuracy. ECOS is generally

Table 1: Benchmark problems comparison with timings as shifted geometric mean and
failure rates.

                                          OSQP    GUROBI   MOSEK    ECOS     qpOASES
Shifted geometric means   Low accuracy    1.000   4.285    2.522    28.847   149.932
                          High accuracy   1.000   1.886    6.234    52.718   66.254
Failure rates [%]         Low accuracy    0.000   1.429    0.071    20.714   31.857
                          High accuracy   0.000   1.429    11.000   45.571   31.714

Table 2: Benchmark problems OSQP statistics.

                                           Median   Max
Setup/solve time [%]       Low accuracy    60.23    1550.19
                           High accuracy   29.65    1373.18
Polish time increase [%]   Low accuracy    19.20    876.80
                           High accuracy   10.63    1408.83
Number of ρ updates        Low accuracy    1        3
                           High accuracy   1        5

                                           Mean
Polish success [%]         Low accuracy    42.79
                           High accuracy   83.21

slower than the other interior-point solvers but faster than qpOASES that shows issues with
many constraints. Table 2 contains the OSQP statistics for this benchmark class. Because
of the good convergence behavior of OSQP on these problems, the setup time is significant
compared to the solve time, especially at low accuracy. Solution polishing increases the
solution time by a median of 10 to 20 percent due to the additional factorization used. The
worst-case time increase is very high and happens for the problems that converge in very few
iterations. Note that with high accuracy, polishing succeeds in 83% of test cases while at
low accuracy it succeeds in only 42% of cases. The number of ρ updates is in general very
low, usually requiring just one more matrix factorization to adjust, with up to 5 refactorizations
used in the worst case when solving with high accuracy.

8.2 SuiteSparse matrix collection least squares problems

We considered 30 least squares problems in the form Ax ≈ b from the SuiteSparse Matrix
Collection library [DH11]. Using the Lasso and Huber problem setups from Appendix A we
formulate 60 QPs that we solve with OSQP, GUROBI and MOSEK. We excluded ECOS
because its interior-point algorithm showed numerical issues for several problems of the test
set. We also excluded qpOASES because it is not designed for large linear systems.

Table 3: SuiteSparse matrix problems comparison with timings as shifted geometric mean
and failure rates.

                                          OSQP    GUROBI   MOSEK
Shifted geometric means   Low accuracy    1.000   1.630    1.745
                          High accuracy   1.000   1.489    4.498
Failure rates [%]         Low accuracy    0.000   14.286   12.500
                          High accuracy   1.786   16.071   33.929

Table 4: SuiteSparse problems OSQP statistics.

                                           Median   Max
Setup/solve time [%]       Low accuracy    71.37    2910.37
                           High accuracy   48.03    1451.56
Polish time increase [%]   Low accuracy    32.27    178.23
                           High accuracy   22.68    115.77
Number of ρ updates        Low accuracy    0        2
                           High accuracy   1        3

                                           Mean
Polish success [%]         Low accuracy    67.86
                           High accuracy   78.18

Results. Results are shown in Table 3 and Figure 4. OSQP shows the best performance
with GUROBI slightly slower and MOSEK third. The failure rates for GUROBI and MOSEK
are higher because the reported solution does not satisfy the optimality conditions of the
original problem. We display the OSQP statistics in Table 4. The setup phase takes a
significant amount of time compared to the solve phase, especially when OSQP converges
in a few iterations. This happens because the large problem dimensions result in a large
initial factorization time. Polish time is in general 22 to 32% of the total solution time.
However, polishing is usually reliable, succeeding 78% of the time with very high quality
solutions. The number of matrix refactorizations required due to ρ updates is very low in
these examples, with a maximum of 2 or 3 even for high accuracy.

8.3 Maros-Mészáros problems

We considered the Maros-Mészáros test set [MM99] of hard QPs. We compared the OSQP
solver against GUROBI and MOSEK on all the problems in the set. We decided to
exclude ECOS because its interior-point algorithm showed numerical issues for several prob-
lems of the test set. We also excluded qpOASES because it could not solve most of the
problems since it is not suited for large QPs – it is based on an active-set method with dense

Table 5: Maros-Mészáros problems comparison with timings as shifted geometric mean
and failure rates.

                                          OSQP     GUROBI   MOSEK
Shifted geometric means   Low accuracy    1.464    1.000    6.121
                          High accuracy   5.247    1.000    14.897
Failure rates [%]         Low accuracy    1.449    2.174    14.493
                          High accuracy   10.145   2.899    30.435

Table 6: Maros-Mészáros problems OSQP statistics.

                                           Median   Max
Setup/solve time [%]       Low accuracy    31.59    643.29
                           High accuracy   2.89     326.11
Polish time increase [%]   Low accuracy    9.49     127.55
                           High accuracy   1.55     76.36
Number of ρ updates        Low accuracy    1        70
                           High accuracy   2        2498

                                           Mean
Polish success [%]         Low accuracy    30.15
                           High accuracy   37.90

linear algebra.

Results. Results are shown in Table 5 and Figure 5. GUROBI shows the best performance
and OSQP, while slower, is still competitive on both low and high accuracy tests. MOSEK
remains the slowest in every case. Table 6 shows the statistics relative to OSQP. Since these
hard problems require a larger number of iterations to converge, the setup time overhead
compared to the solution time is in general lower than in the other benchmark sets. Moreover,
since the problems are badly scaled and degenerate, the polishing strategy rarely succeeds.
However, the median time increase from the polish step is less than 10% of the total
computation time for both low and high accuracy modes. Note that the number of ρ updates
is usually very low with a median of 1 or 2. There are some worst-case problems
where it is very high because the bad scaling causes issues in our ρ estimation. Nevertheless,
from our data we have seen that in more than 95% of the cases the number of ρ updates is
less than 5.

8.4 Warm start and factorization caching
To show the benefits of warm starting and factorization caching, we solved a sequence of
QPs using OSQP with the data varying according to some parameters. Since we are not
comparing OSQP with other high accuracy solvers in these benchmarks, we use its default
settings with accuracy $10^{-3}$.

Lasso regularization path. We solved a Lasso problem described in Appendix A.5 with
varying λ in order to choose a regressor with good validation set performance. We solved
one problem instance with n = 50, 100, 150, 200 features, m = 100n data points, and λ
logarithmically spaced taking 100 values between $\lambda_{\max} = \|A^T b\|_\infty$ and $0.01\lambda_{\max}$.
Since the parameters only enter linearly in the cost, we can reuse the matrix factorization
and enable warm starting to reduce the computation time as discussed in Section 6.
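
As a small illustration, the grid of regularization weights described above can be generated as follows. The function names are hypothetical and the matrix is stored as a dense list of rows only to keep the sketch self-contained.

```python
import math

def lambda_max(A, b):
    """Infinity norm of A^T b for a dense list-of-rows matrix A."""
    n = len(A[0])
    return max(abs(sum(A[i][j] * b[i] for i in range(len(A)))) for j in range(n))

def lasso_lambda_path(lam_max, num=100, ratio=0.01):
    """num values, logarithmically spaced from lam_max down to ratio * lam_max."""
    log_hi = math.log10(lam_max)
    log_lo = math.log10(ratio * lam_max)
    step = (log_lo - log_hi) / (num - 1)
    return [10.0 ** (log_hi + i * step) for i in range(num)]
```

Solving the problems in the order of this path, from the largest to the smallest λ, means that consecutive problems differ only slightly, which is the situation in which warm starting is expected to help.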

Model predictive control. In MPC, we solve the optimal control problem described in
Appendix A.3 at each time step to compute an optimal input sequence over the horizon.
Then, we apply only the first input to the system and propagate the state to the next time
step. The whole procedure is repeated with an updated initial state $x_{\rm init}$. We solved the
control problem with $n_x = 20, 40, 60, 80$ states, $n_u = n_x/2$ inputs, horizon $T = 10$ and 100
simulation steps. The initial state of the simulation is uniformly distributed and constrained
to be within the feasible region, i.e., $x_{\rm init} \sim \mathcal{U}(-0.5\bar{x}, 0.5\bar{x})$.
Since the parameters only enter linearly in the constraint bounds, we can reuse the
matrix factorization and enable warm starting to reduce the computation time as discussed
in Section 6.

Portfolio back test. Consider the portfolio optimization problem in Appendix A.4 with
n = 10k assets and k = 100, 200, 300, 400 factors.
We ran a 4-year back test to compute the optimal asset investment depending on
varying expected returns and factor models [BBD+ 17]. We solved 240 QPs per year giving
a total of 960 QPs. Each month we solved 20 QPs corresponding to the trading days.
Every day, we updated the expected returns µ by randomly generating another vector with
$\mu_i \sim 0.9\hat\mu_i + \mathcal{N}(0, 0.1)$, where $\hat\mu_i$ comes from the previous expected returns. The risk model
was updated every month by updating the nonzero elements of D and F according to
$D_{ii} \sim 0.9\hat{D}_{ii} + \mathcal{U}[0, 0.1\sqrt{k}]$ and $F_{ij} \sim 0.9\hat{F}_{ij} + \mathcal{N}(0, 0.1)$ where $\hat{D}_{ii}$ and $\hat{F}_{ij}$ come from the previous
risk model.
As discussed in Section 6, we exploited the following computations during the QP updates
to reduce the computation times. Since µ only enters in the linear part of the objective, we
can reuse the matrix factorization and enable warm starting. Since the sparsity patterns of
D and F do not change during the monthly updates, we can reuse the symbolic factorization
and exploit warm starting to speed up the computations.

Results. We show the results in Table 7. For the Lasso problem we see more than 10-
fold improvement in time and between 8 and 11 times reduction in number of iterations

Table 7: OSQP parametric problem results with warm start (ws) and without warm start
(no ws) in terms of time in seconds and number of iterations for different leading problem
dimensions of Lasso, MPC and Portfolio classes.

Problem     dim.   Time     Time    Time      Iter      Iter     Iter
                   no ws    ws      improv.   no ws     ws       improv.
Lasso       50     0.225    0.012   19.353    210.250   25.750   8.165
            100    0.423    0.040   10.556    224.000   25.750   8.699
            150    1.022    0.086   11.886    235.500   25.750   9.146
            200    2.089    0.149   13.986    281.750   26.000   10.837
MPC         20     0.007    0.002   4.021     89.500    32.750   2.733
            40     0.014    0.005   2.691     29.000    27.250   1.064
            60     0.035    0.013   2.673     33.750    33.000   1.023
            80     0.067    0.022   3.079     32.000    31.750   1.008
Portfolio   100    0.177    0.030   5.817     93.333    25.417   3.672
            200    0.416    0.061   6.871     86.875    25.391   3.422
            300    0.646    0.097   6.635     80.521    25.521   3.155
            400    0.976    0.139   7.003     76.458    26.094   2.930

depending on the dimension. For the MPC problem the number of iterations does not
significantly decrease because the number of iterations is already low in cold-start. However
we get from 2.6 to 4-fold time improvement from factorization caching. OSQP shows from
5.8 to 7 times reduction in time for the portfolio problem and from 2.9 to 3.6 times reduction
in number of iterations.

9 Conclusions
We presented a novel general-purpose QP solver based on ADMM. Our method uses a
new splitting requiring the solution of a quasi-definite linear system that is always solvable
independently from the problem data. We impose no assumptions on the problem data other
than convexity, resulting in a general-purpose and very robust algorithm.
For the first time, we propose a first-order QP solution method able to provide primal and
dual infeasibility certificates if the problem is unsolvable without resorting to homogeneous
self-dual embedding or additional complexity in the iterations.
In contrast to other first-order methods, our solver can provide high-quality solutions by
performing solution polishing. After guessing which constraints are active, we compute the
solutions of an additional small equality constrained QP by solving a linear system. If the
constraints are identified correctly, the returned solution has accuracy equal to or higher than
that of interior-point methods.
The proposed method is easily warm started to reduce the number of iterations. If the
problem matrices do not change, the linear system matrix factorization can be cached and

reused across multiple solves greatly improving the computation time. This technique can be
extremely effective, especially when solving parametric QPs where only part of the problem
data change.
We have implemented our algorithm in the open-source OSQP solver written in C and
interfaced with multiple other languages and parsers. OSQP is based on sparse linear algebra
and is able to exploit the structure of QPs arising in different application areas. OSQP is
robust against noisy and unreliable data and, after the first factorization is computed, can be
compiled to be library-free and division-free, making it suitable for embedded applications.
Thanks to its simple and parallelizable iterations, OSQP can handle large-scale problems
with millions of nonzeros.
We extensively benchmarked the OSQP solver with problems arising in several appli-
cation domains including finance, control and machine learning. In addition, we bench-
marked it against the hard problems from the Maros-Mészáros test set [MM99] and Lasso
and Huber fitting problems generated with sparse matrices from the SuiteSparse Matrix
Collection [DH11]. Timing and failure rate results showed great improvements over state-
of-the-art academic and commercial QP solvers.
OSQP already has a large user base, with tens of thousands of users from both top academic
institutions and large corporations.

A Problem classes
In this section we describe the random problem classes used in the benchmarks and derive
formulations with explicit linear equalities and inequalities that can be directly written in
the form Ax ∈ C with C = [l, u].

A.1 Random QP
Consider the following QP

$$\begin{array}{ll} \text{minimize} & (1/2)x^T P x + q^T x \\ \text{subject to} & l \le Ax \le u. \end{array}$$

Problem instances. The number of variables and constraints in our problem instances are
$n$ and $m = 10n$. We generated a random matrix $P = MM^T + \alpha I$ where $M \in \mathbf{R}^{n\times n}$ has 15%
nonzero elements $M_{ij} \sim \mathcal{N}(0, 1)$. We add the regularization $\alpha I$ with $\alpha = 10^{-2}$ to ensure that
the problem is not unbounded. We set the elements of $A \in \mathbf{R}^{m\times n}$ as $A_{ij} \sim \mathcal{N}(0, 1)$ with only
15% being nonzero. The linear part of the cost is normally distributed, i.e., $q_i \sim \mathcal{N}(0, 1)$.
We generated the constraint bounds as $u_i \sim \mathcal{U}(0, 1)$, $l_i \sim -\mathcal{U}(0, 1)$.

A.2 Equality constrained QP
Consider the following equality constrained QP
$$\begin{array}{ll} \text{minimize} & (1/2)x^T P x + q^T x \\ \text{subject to} & Ax = b. \end{array}$$

This problem can be rewritten as (1) by setting l = u = b.

Problem instances. The number of variables and constraints in our problem instances are
$n$ and $m = \lfloor n/2 \rfloor$.
We generated a random matrix $P = MM^T + \alpha I$ where $M \in \mathbf{R}^{n\times n}$ has 15% nonzero
elements $M_{ij} \sim \mathcal{N}(0, 1)$. We add the regularization $\alpha I$ with $\alpha = 10^{-2}$ to ensure that the
problem is not unbounded. We set the elements of $A \in \mathbf{R}^{m\times n}$ as $A_{ij} \sim \mathcal{N}(0, 1)$ with only
15% being nonzero. The vectors are all normally distributed, i.e., $q_i, b_i \sim \mathcal{N}(0, 1)$.

Iterative refinement interpretation. Solution of the above problem can be found directly
by solving the following linear system

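$$\begin{bmatrix} P & A^T \\ A & 0 \end{bmatrix}\begin{bmatrix} x \\ \nu \end{bmatrix} = \begin{bmatrix} -q \\ b \end{bmatrix}. \qquad (35)$$

If we apply the ADMM iterations (15)–(19) to solve the above problem with
$\alpha = 1$ and $y^0 = b$, the algorithm boils down to the following iteration
$$\begin{bmatrix} x^{k+1} \\ \nu^{k+1} \end{bmatrix} = \begin{bmatrix} x^k \\ \nu^k \end{bmatrix} + \begin{bmatrix} P + \sigma I & A^T \\ A & -\rho^{-1} I \end{bmatrix}^{-1}\left(\begin{bmatrix} -q \\ b \end{bmatrix} - \begin{bmatrix} P & A^T \\ A & 0 \end{bmatrix}\begin{bmatrix} x^k \\ \nu^k \end{bmatrix}\right),$$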

which is equivalent to (31) with g = (−q, b) and t̂k = (xk , ν k ). This means that Algo-
rithm 1 applied to solve an equality constrained QP is equivalent to applying iterative re-
finement [Wil63, DER89] to solve the KKT system (35). Note that the perturbation matrix
in this case is
$$\Delta K = \begin{bmatrix} \sigma I & \\ & -\rho^{-1} I \end{bmatrix},$$
which justifies using a low value of σ and a high value of ρ for equality constraints.
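
The iterative refinement interpretation can be checked numerically on the smallest possible example, in which $P$, $A$, $q$ and $b$ are scalars. The sketch below is an illustration under that assumption. The function name and the default values of `sigma`, `rho` and `iters` are illustrative choices, and the $2 \times 2$ perturbed matrix is inverted explicitly instead of being factored.

```python
def refine_kkt_scalar(P, A, q, b, sigma=1e-6, rho=1e3, iters=20):
    """Iterative refinement for [[P, A], [A, 0]] [x, nu] = [-q, b] (scalar blocks)."""
    # Perturbed matrix K + dK = [[P + sigma, A], [A, -1/rho]].
    a11, a12, a21, a22 = P + sigma, A, A, -1.0 / rho
    det = a11 * a22 - a12 * a21
    x, nu = 0.0, 0.0
    for _ in range(iters):
        # Residual of the unperturbed KKT system at the current iterate.
        r1 = -q - (P * x + A * nu)
        r2 = b - A * x
        # Correction obtained by solving with the perturbed matrix.
        x += (a22 * r1 - a12 * r2) / det
        nu += (-a21 * r1 + a11 * r2) / det
    return x, nu
```

With a small σ and a large ρ the perturbation is small compared to the KKT matrix, so each pass reduces the error by a large factor, which is consistent with the remark above on the choice of σ and ρ for equality constraints.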

A.3 Optimal control

We consider the problem of controlling a constrained linear time-invariant dynamical system.
To achieve this, we formulate the following optimization problem [BBM17]
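$$\begin{array}{ll} \text{minimize} & x_T^T Q_T x_T + \sum_{t=0}^{T-1} \left( x_t^T Q x_t + u_t^T R u_t \right) \\ \text{subject to} & x_{t+1} = A x_t + B u_t \\ & x_t \in \mathcal{X}, \; u_t \in \mathcal{U} \\ & x_0 = x_{\rm init}. \end{array} \qquad (36)$$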

The states $x_t \in \mathbf{R}^{n_x}$ and the inputs $u_t \in \mathbf{R}^{n_u}$ are subject to polyhedral constraints defined
by the sets $\mathcal{X}$ and $\mathcal{U}$. The horizon length is $T$ and the initial state is $x_{\rm init} \in \mathbf{R}^{n_x}$. Matrices
$Q \in \mathbf{S}^{n_x}_+$ and $R \in \mathbf{S}^{n_u}_{++}$ define the state and input costs at each stage of the horizon, and
$Q_T \in \mathbf{S}^{n_x}_+$ defines the final stage cost.
By defining the new variable $z = (x_0, \dots, x_T, u_0, \dots, u_{T-1})$, problem (36) can be written
as a sparse QP of the form (2) with a total of $n_x(T + 1) + n_u T$ variables.

Problem instances. We defined the linear systems with n = nx states and nu = 0.5nx
inputs. We set the horizon length to T = 10. We generated the dynamics as A = I + ∆ with
∆ij ∼ N (0, 0.01). We chose only stable dynamics by enforcing the norm of the eigenvalues
of A to be less than 1. The input action is modeled as B with Bij ∼ N (0, 1).
The state cost is defined as Q = diag(q) where qi ∼ U(0, 10) and 70% nonzero elements
in q. We chose the input cost as R = 0.1I. The terminal cost QT is chosen as the optimal
cost for the linear quadratic regulator (LQR) applied to A, B, Q, R by solving a discrete
algebraic Riccati equation (DARE) [BBM17]. We generated input and state constraints as
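$$\mathcal{X} = \{x_t \in \mathbf{R}^{n_x} \mid -\bar{x} \le x_t \le \bar{x}\}, \qquad \mathcal{U} = \{u_t \in \mathbf{R}^{n_u} \mid -\bar{u} \le u_t \le \bar{u}\},$$
where $\bar{x}_i \sim \mathcal{U}(1, 2)$ and $\bar{u}_i \sim \mathcal{U}(0, 0.1)$. The initial state is uniformly distributed with
$x_{\rm init} \sim \mathcal{U}(-0.5\bar{x}, 0.5\bar{x})$.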

A.4 Portfolio optimization

Portfolio optimization is a problem arising in finance that seeks to allocate assets in a way
that maximizes the risk adjusted return [BMOW14, Mar52, BBD+ 17], [BV04, §4.4.1],

$$\begin{array}{ll} \text{maximize} & \mu^T x - \gamma(x^T \Sigma x) \\ \text{subject to} & \mathbf{1}^T x = 1 \\ & x \ge 0, \end{array}$$
where the variable $x \in \mathbf{R}^n$ represents the portfolio, $\mu \in \mathbf{R}^n$ the vector of expected returns,
$\gamma > 0$ the risk aversion parameter, and $\Sigma \in \mathbf{S}^n_+$ the risk model covariance matrix. The risk
model is usually assumed to be the sum of a diagonal and a rank $k < n$ matrix
$$\Sigma = FF^T + D,$$
where $F \in \mathbf{R}^{n\times k}$ is the factor loading matrix and $D \in \mathbf{R}^{n\times n}$ is a diagonal matrix describing
the asset-specific risk.
We introduce a new variable y = F T x and solve the resulting problem in variables x
and y
$$\begin{array}{ll} \text{minimize} & x^T D x + y^T y - \gamma^{-1}\mu^T x \\ \text{subject to} & y = F^T x \\ & \mathbf{1}^T x = 1 \\ & x \ge 0. \end{array} \qquad (37)$$
Note that the Hessian of the objective in (37) is a diagonal matrix. Also, observe that F F T
does not appear in problem (37).
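
The equivalence behind this reformulation is the identity $x^T \Sigma x = x^T D x + y^T y$ with $y = F^T x$. The following sketch verifies it numerically on a small dense example. The function names are hypothetical and the dense construction of $\Sigma$ is included only for comparison.

```python
def quad_form_factor(x, F, D_diag):
    """x^T D x + y^T y with y = F^T x, without forming F F^T."""
    n, k = len(F), len(F[0])
    y = [sum(F[i][j] * x[i] for i in range(n)) for j in range(k)]
    return sum(d * xi * xi for d, xi in zip(D_diag, x)) + sum(v * v for v in y)

def quad_form_dense(x, F, D_diag):
    """x^T (F F^T + D) x computed with the dense covariance matrix."""
    n, k = len(F), len(F[0])
    Sigma = [[sum(F[i][t] * F[j][t] for t in range(k)) + (D_diag[i] if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    return sum(x[i] * Sigma[i][j] * x[j] for i in range(n) for j in range(n))
```

The factor form needs on the order of $nk$ operations, while the dense form needs on the order of $n^2$ entries of $\Sigma$, which is why the reformulated problem keeps the data sparse when $k \ll n$.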

Problem instances. We generated portfolio problems for increasing number of factors $k$
and number of assets $n = 100k$. The elements of matrix $F$ were chosen as $F_{ij} \sim \mathcal{N}(0, 1)$
with 50% nonzero elements. The diagonal matrix $D$ is chosen as $D_{ii} \sim \mathcal{U}[0, \sqrt{k}]$. The mean
return was generated as $\mu_i \sim \mathcal{N}(0, 1)$. We set $\gamma = 1$.

A.5 Lasso
The least absolute shrinkage and selection operator (Lasso) is a well known linear regression
technique obtained by adding an $\ell_1$ regularization term in the objective [Tib96, CWB08]. It
can be formulated as
$$\text{minimize} \quad \|Ax - b\|_2^2 + \lambda\|x\|_1,$$
where $x \in \mathbf{R}^n$ is the vector of parameters, $A \in \mathbf{R}^{m\times n}$ is the data matrix and $\lambda$ is the
weighting parameter.
We convert this problem to the following QP

$$\begin{array}{ll} \text{minimize} & y^T y + \lambda \mathbf{1}^T t \\ \text{subject to} & y = Ax - b \\ & -t \le x \le t, \end{array}$$

where y ∈ Rm and t ∈ Rn are two newly introduced variables.
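
To see why the QP reproduces the Lasso objective, note that for a fixed x the equality constraint pins y to Ax − b, and the smallest t satisfying −t ≤ x ≤ t is t = |x| elementwise. The plain-Python sketch below evaluates both objectives at such a point. The helper names and the small data set are hypothetical, chosen only for illustration.

```python
# Sketch: at (x, y = Ax - b, t = |x|) the QP objective equals the Lasso objective.
# A, b, x, and lam are made-up illustrative values.

def residual(A, b, x):
    """Return Ax - b for A stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

def lasso_objective(A, b, x, lam):
    """||Ax - b||_2^2 + lam * ||x||_1."""
    r = residual(A, b, x)
    return sum(ri * ri for ri in r) + lam * sum(abs(xj) for xj in x)

def qp_objective(y, t, lam):
    """y^T y + lam * 1^T t."""
    return sum(yi * yi for yi in y) + lam * sum(t)

A = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b = [1.0, 0.0, 2.0]
x = [0.5, -1.0]
lam = 0.4

y = residual(A, b, x)          # forced by the constraint y = Ax - b
t = [abs(xj) for xj in x]      # tightest t with -t <= x <= t

lasso_value = lasso_objective(A, b, x, lam)
qp_value = qp_objective(y, t, lam)
```

Any larger t remains feasible but only increases the objective when λ > 0, so minimizing over t recovers the ℓ1 term.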

Problem instances. The elements of matrix $A$ are generated as $A_{ij} \sim \mathcal{N}(0, 1)$ with 15%
nonzero elements. To construct the vector $b$, we generated the true sparse vector $v \in \mathbf{R}^n$ to
be learned
$$v_i \sim \begin{cases} 0 & \text{with probability } p = 0.5 \\ \mathcal{N}(0, 1/n) & \text{otherwise.} \end{cases}$$
Then we let $b = Av + \varepsilon$ where $\varepsilon$ is the noise generated as $\varepsilon_i \sim \mathcal{N}(0, 1)$. We generated the
instances with varying $n$ features and $m = 100n$ data points. The parameter $\lambda$ is chosen as
$(1/5)\|A^T b\|_\infty$ since $\|A^T b\|_\infty$ is the critical value above which the solution of the problem is
$x = 0$.

A.6 Huber fitting

Huber fitting, or the robust least-squares problem, performs linear regression under the
assumption that there are outliers in the data [Hub64, Hub81]. The fitting problem is written
as
$$\text{minimize} \quad \sum_{i=1}^m \phi_{\rm hub}(a_i^T x - b_i), \tag{38}$$
with the Huber penalty function $\phi_{\rm hub} : \mathbf{R} \to \mathbf{R}$ defined as
$$\phi_{\rm hub}(u) = \begin{cases} u^2 & |u| \le M \\ M(2|u| - M) & |u| > M. \end{cases}$$

Problem (38) is equivalent to the following QP [MM00, Eq. (24)]

$$\begin{array}{ll} \text{minimize} & u^T u + 2M \mathbf{1}^T (r + s) \\ \text{subject to} & Ax - b - u = r - s \\ & r \ge 0 \\ & s \ge 0. \end{array}$$
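
The equivalence can be checked one residual at a time. For a scalar residual z = a_i^T x − b_i, setting the derivative of u² + 2M|z − u| to zero suggests the minimizer u = clip(z, −M, M), with r and s absorbing the positive and negative parts of z − u. The sketch below, with hypothetical function names, confirms that this split reproduces the Huber penalty.

```python
# Sketch: per-residual check that the QP split reproduces the Huber penalty.
# Function names are illustrative, not taken from the paper or OSQP.

def huber(z, M):
    """Huber penalty as defined in the text: quadratic core, linear tails."""
    return z * z if abs(z) <= M else M * (2 * abs(z) - M)

def qp_split(z, M):
    """Candidate minimizer (u, r, s) with z - u = r - s, r >= 0, s >= 0."""
    u = max(-M, min(M, z))   # clip the residual to [-M, M]
    r = max(z - M, 0.0)      # excess above +M
    s = max(-z - M, 0.0)     # excess below -M
    return u, r, s

def qp_value(u, r, s, M):
    """Per-residual QP objective u^2 + 2M(r + s)."""
    return u * u + 2 * M * (r + s)

M = 1.0
samples = [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]
results = []
for z in samples:
    u, r, s = qp_split(z, M)
    results.append((z, u, r, s, qp_value(u, r, s, M), huber(z, M)))
```

For example, z = 3 with M = 1 gives u = 1, r = 2, s = 0 and objective 1 + 2·2 = 5, matching M(2|z| − M) = 5.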

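Problem instances. We generate the elements of $A$ as $A_{ij} \sim \mathcal{N}(0, 1)$ with 15% nonzero
elements. To construct $b \in \mathbf{R}^m$ we first generate a vector $v \in \mathbf{R}^n$ as $v_i \sim \mathcal{N}(0, 1/n)$ and a
noise vector $\varepsilon \in \mathbf{R}^m$ with elements
$$\varepsilon_i \sim \begin{cases} \mathcal{N}(0, 1/4) & \text{with probability } p = 0.95 \\ \mathcal{U}[0, 10] & \text{otherwise.} \end{cases}$$

We then set $b = Av + \varepsilon$. For each instance we choose $m = 100n$ and $M = 1$.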

A.7 Support vector machine

The support vector machine (SVM) problem seeks an affine function that approximately
classifies two sets of points [CV95]. The problem can be stated as

$$\text{minimize} \quad x^T x + \lambda \sum_{i=1}^m \max(0, b_i a_i^T x + 1),$$

where bi ∈ {−1, +1} is a set label, and ai is a vector of features for the i-th point. The
problem can be equivalently represented as the following QP

$$\begin{array}{ll} \text{minimize} & x^T x + \lambda \mathbf{1}^T t \\ \text{subject to} & t \ge \mathrm{diag}(b) A x + 1 \\ & t \ge 0, \end{array}$$

where diag(b) denotes the diagonal matrix with elements of b on its diagonal.
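
The variable t plays the role of the hinge terms: for fixed x, the smallest feasible t_i is max(0, b_i a_i^T x + 1). The sketch below evaluates both forms of the objective on a tiny made-up data set, following the sign convention used in the text. All names and numbers are hypothetical.

```python
# Sketch: the QP objective at the tightest feasible t equals the hinge-loss objective.
# A, b, x, and lam are made-up illustrative values.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def svm_objective(A, b, x, lam):
    """x^T x + lam * sum_i max(0, b_i a_i^T x + 1), as stated in the text."""
    hinge = sum(max(0.0, bi * dot(ai, x) + 1.0) for ai, bi in zip(A, b))
    return dot(x, x) + lam * hinge

def tightest_t(A, b, x):
    """Elementwise smallest t with t >= diag(b) A x + 1 and t >= 0."""
    return [max(0.0, bi * dot(ai, x) + 1.0) for ai, bi in zip(A, b)]

A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # rows are feature vectors a_i
b = [1, 1, -1]                              # set labels
x = [-2.0, 0.5]
lam = 0.5

t = tightest_t(A, b, x)
hinge_value = svm_objective(A, b, x, lam)
qp_value = dot(x, x) + lam * sum(t)
```

Here only the second point has a nonzero hinge term (1.5), so both objectives evaluate to 4.25 + 0.5·1.5 = 5.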

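Problem instances. We choose the vector $b$ so that

$$b_i = \begin{cases} +1 & i \le m/2 \\ -1 & \text{otherwise,} \end{cases}$$

and the elements of $A$ as

$$A_{ij} \sim \begin{cases} \mathcal{N}(+1/n, 1/n) & i \le m/2 \\ \mathcal{N}(-1/n, 1/n) & \text{otherwise,} \end{cases}$$

with 15% nonzeros per case.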
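
A minimal sketch of how such an instance might be sampled is given below. It rests on assumptions that the text does not spell out: N(µ, σ²) is read as mean and variance, each entry of A is kept nonzero independently with probability 0.15, and dense Python lists stand in for a sparse matrix. The function name and its parameters are hypothetical; the authors' actual generator is the one in the benchmark repository [SB19].

```python
# Sketch of an SVM instance generator under the assumptions stated above.
import math
import random

def generate_svm_instance(n, m, density=0.15, seed=0):
    """Return (A, b): m feature rows of length n and labels in {+1, -1}."""
    rng = random.Random(seed)
    std = math.sqrt(1.0 / n)  # assumes the second argument of N is a variance
    # First half of the points gets label +1, the rest -1.
    b = [1 if i < m // 2 else -1 for i in range(m)]
    A = []
    for i in range(m):
        mean = b[i] / n  # +1/n for the first class, -1/n for the second
        row = [rng.gauss(mean, std) if rng.random() < density else 0.0
               for _ in range(n)]
        A.append(row)
    return A, b

A, b = generate_svm_instance(n=20, m=200)
nonzero_fraction = sum(1 for row in A for a in row if a != 0.0) / (20 * 200)
```

Fixing the seed makes the instance reproducible, which is convenient when comparing solvers on identical data.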

[Figure: seven log-log panels (Random QP, Eq QP, Portfolio, Lasso, SVM, Huber, Control) plotting computation time [s] against problem dimension N for GUROBI and OSQP.]

Figure 1: Computation time vs problem dimension for OSQP and GUROBI for low accuracy mode.

[Figure: seven log-log panels (Random QP, Eq QP, Portfolio, Lasso, SVM, Huber, Control) plotting computation time [s] against problem dimension N for GUROBI and OSQP.]

Figure 2: Computation time vs problem dimension for OSQP and GUROBI for high accuracy mode.

[Figure: two panels (low accuracy, high accuracy) plotting the ratio of problems solved against the performance ratio τ for OSQP, GUROBI, MOSEK, ECOS, and qpOASES.]

Figure 3: Benchmark problems comparison with performance profiles.

References
[ABQ+ 99] F. Allgöwer, T. A. Badgwell, J. S. Qin, J. B. Rawlings, and S. J. Wright.
Nonlinear Predictive Control and Moving Horizon Estimation – An Introductory
Overview, pages 391–449. Springer London, London, 1999.

[ADD04] P. R. Amestoy, T. A. Davis, and I. S. Duff. Algorithm 837: AMD, an approximate
minimum degree ordering algorithm. ACM Transactions on Mathematical
Software, 30(3):381–388, 2004.

[AVDB18] A. Agrawal, R. Verschueren, S. Diamond, and S. Boyd. A rewriting system for
convex optimization problems. Journal of Control and Decision, 5(1):42–60,
2018.

[BB96] H. H. Bauschke and J. M. Borwein. On projection algorithms for solving convex
feasibility problems. SIAM Review, 38(3):367–426, 1996.

[Figure: two panels (low accuracy, high accuracy) plotting the ratio of problems solved against the performance ratio τ for OSQP, GUROBI, and MOSEK.]

Figure 4: SuiteSparse matrix problems comparison with performance profiles.

[BBD+ 17] S. Boyd, E. Busseti, S. Diamond, R. N. Kahn, K. Koh, P. Nystrup, and J. Speth.
Multi-period trading via convex optimization. Foundations and Trends in
Optimization, 3(1):1–76, 2017.

[BBM17] F. Borrelli, A. Bemporad, and M. Morari. Predictive Control for Linear and
Hybrid Systems. Cambridge University Press, 2017.

[BC11] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator
Theory in Hilbert Spaces. Springer, 1st edition, 2011.

[BEGFB94] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix
Inequalities in System and Control Theory. Society for Industrial and Applied
Mathematics, 1994.

[Ben02] M. Benzi. Preconditioning techniques for large linear systems: a survey. Journal
of Computational Physics, 182(2):418–477, 2002.

[Figure: two panels (low accuracy, high accuracy) plotting the ratio of problems solved against the performance ratio τ for OSQP, GUROBI, and MOSEK.]

Figure 5: Maros-Mészáros problems comparison with performance profiles.

[BG18] G. Banjac and P. Goulart. Tight global linear convergence rate bounds for
operator splitting methods. IEEE Transactions on Automatic Control,
63(12):4126–4139, 2018.

[BGSB19] G. Banjac, P. Goulart, B. Stellato, and S. Boyd. Infeasibility detection in the
alternating direction method of multipliers for convex optimization. Journal of
Optimization Theory and Applications, 183(2):490–519, 2019.

[BHT04] H. Balakrishnan, I. Hwang, and C. J. Tomlin. Polynomial approximation
algorithms for belief matrix maintenance in identity management. In IEEE
Conference on Decision and Control (CDC), pages 4874–4879, 2004.

[BKL+ 13] P. Belotti, C. Kirches, S. Leyffer, J. Linderoth, J. Luedtke, and A. Mahajan.
Mixed-integer nonlinear optimization. Acta Numerica, 22:1–131, April 2013.

[BMOW14] S. Boyd, M. T. Mueller, B. O’Donoghue, and Y. Wang. Performance bounds
and suboptimal policies for multiperiod investment. Foundations and Trends
in Optimization, 1(1):1–72, 2014.
[BPC+ 11] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed
optimization and statistical learning via the alternating direction method of multipliers.
Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[Bra10] A. Bradley. Algorithms for the equilibration of matrices and their application to
limited-memory quasi-Newton methods. PhD thesis, Stanford University, 2010.
[BSM+ 17] G. Banjac, B. Stellato, N. Moehle, P. Goulart, A. Bemporad, and S. Boyd.
Embedded code generation using the OSQP solver. In IEEE Conference on
Decision and Control (CDC), 2017.
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University
Press, 2004.
[CT06] G. Cornuejols and R. Tütüncü. Optimization Methods in Finance. Mathematics,
Finance and Risk. Cambridge University Press, 2006.
[CV95] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning,
20(3):273–297, 1995.
[CWB08] E. J. Candès, M. B. Wakin, and S. Boyd. Enhancing sparsity by reweighted
ℓ1 minimization. Journal of Fourier Analysis and Applications, 14(5):877–905,
2008.
[Dan63] G. B. Dantzig. Linear programming and extensions. Princeton University Press,
Princeton, N.J., 1963.
[Dav05] T. A. Davis. Algorithm 849: a concise sparse Cholesky factorization package.
ACM Transactions on Mathematical Software, 31(4):587–591, 2005.
[Dav06] T. A. Davis. Direct Methods for Sparse Linear Systems. Society for Industrial
and Applied Mathematics, 2006.
[DB16] S. Diamond and S. Boyd. CVXPY: A Python-embedded modeling language for
convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
[DB17] S. Diamond and S. Boyd. Stochastic matrix-free equilibration. Journal of
Optimization Theory and Applications, 172(2):436–454, February 2017.
[DCB13] A. Domahidi, E. Chu, and S. Boyd. ECOS: An SOCP solver for embedded
systems. In European Control Conference (ECC), pages 3071–3076, 2013.
[DER89] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct methods for sparse matrices.
Oxford University Press, London, 1989.

[DFH09] M. Diehl, H. J. Ferreau, and N. Haverbeke. Efficient Numerical Methods for
Nonlinear MPC and Moving Horizon Estimation, pages 391–417. Springer
Berlin Heidelberg, Berlin, Heidelberg, 2009.

[DH11] T. A. Davis and Y. Hu. The University of Florida Sparse Matrix Collection.
ACM Trans. Math. Softw., 38(1):1:1–1:25, December 2011.

[DHL17] I. Dunning, J. Huchette, and M. Lubin. JuMP: A modeling language for
mathematical optimization. SIAM Review, 59(2):295–320, 2017.

[DM02] E. D. Dolan and J. J. Moré. Benchmarking optimization software
with performance profiles. Mathematical Programming, 91(2):201–213, January
2002.

[DR56] J. Douglas and H. H. Rachford. On the numerical solution of heat conduction
problems in two and three space variables. Transactions of the American
Mathematical Society, 82(2):421–439, 1956.

[Eck94] J. Eckstein. Parallel alternating direction multiplier decomposition of convex
programs. Journal of Optimization Theory and Applications, 80(1):39–62, 1994.

[EF98] J. Eckstein and M. C. Ferris. Operator-splitting methods for monotone affine
variational inequalities, with a parallel application to optimal control.
INFORMS Journal on Computing, 10(2):218–235, 1998.

[FB18] C. Fougner and S. Boyd. Parameter Selection and Preconditioning for a Graph
Form Solver, pages 41–61. Springer International Publishing, 2018.

[FKP+ 14] H. J. Ferreau, C. Kirches, A. Potschka, H. G. Bock, and M. Diehl. qpOASES:
a parametric active-set algorithm for quadratic programming. Mathematical
Programming Computation, 6(4):327–363, 2014.

[FL98] R. Fletcher and S. Leyffer. Numerical experience with lower bounds for MIQP
branch-and-bound. SIAM Journal on Optimization, 8(2):604–616, 1998.

[FW56] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval
Research Logistics Quarterly, 3(1-2):95–110, 1956.

[Gab83] D. Gabay. Chapter IX Applications of the method of multipliers to variational
inequalities. Studies in Mathematics and Its Applications, 15:299–331, 1983.

[GB15] P. Giselsson and S. Boyd. Metric selection in fast dual forward–backward
splitting. Automatica, 62:1–10, 2015.

[GB17] P. Giselsson and S. Boyd. Linear convergence and metric selection for
Douglas-Rachford splitting and ADMM. IEEE Transactions on Automatic Control,
62(2):532–544, February 2017.

[GM75] R. Glowinski and A. Marroco. Sur l’approximation, par éléments finis d’ordre
un, et la résolution, par pénalisation-dualité d’une classe de problèmes de
Dirichlet non linéaires. ESAIM: Mathematical Modelling and Numerical Analysis -
Modélisation Mathématique et Analyse Numérique, 9(R2):41–76, 1975.

[GM76] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear
variational problems via finite element approximation. Computers & Mathematics
with Applications, 2(1):17–40, 1976.

[GMS+ 86] P. E. Gill, W. Murray, M. A. Saunders, J. A. Tomlin, and M. H. Wright. On
projected Newton barrier methods for linear programming and an equivalence
to Karmarkar’s projective method. Mathematical Programming, 36(2):183–209,
1986.

[GPM89] C. E. García, D. M. Prett, and M. Morari. Model predictive control: Theory
and practice - a survey. Automatica, 25(3):335–348, 1989.

[Gre97] A. Greenbaum. Iterative Methods for Solving Linear Systems. Society for
Industrial and Applied Mathematics, 1997.

[GS16] N. Gould and J. Scott. A note on performance profiles for benchmarking
software. ACM Trans. Math. Softw., 43(2):15:1–15:5, August 2016.

[GSB18] P. Goulart, B. Stellato, and G. Banjac. QDLDL.
https://github.com/oxfordcontrol/qdldl, 2018.

[GTSJ15] E. Ghadimi, A. Teixeira, I. Shames, and M. Johansson. Optimal parameter
selection for the alternating direction method of multipliers (ADMM): quadratic
problems. IEEE Transactions on Automatic Control, 60(3):644–658, 2015.

[Gur16] Gurobi Optimization Inc. Gurobi optimizer reference manual.
http://www.gurobi.com, 2016.

[GVL96] G. H. Golub and C. F. Van Loan. Matrix Computations (3rd Ed.). Johns
Hopkins University Press, Baltimore, MD, USA, 1996.

[GW03] E. M. Gertz and S. J. Wright. Object-oriented software for quadratic
programming. ACM Trans. Math. Softw., 29(1):58–81, March 2003.

[Hub64] P. J. Huber. Robust estimation of a location parameter. The Annals of
Mathematical Statistics, 35(1):73–101, 1964.

[Hub81] P. J. Huber. Robust Statistics. John Wiley & Sons, 1981.

[HYW00] B. S. He, H. Yang, and S. L. Wang. Alternating direction method with
self-adaptive penalty parameters for monotone variational inequalities. Journal of
Optimization Theory and Applications, 106(2):337–356, 2000.

[Int17] Intel Corporation. Intel Math Kernel Library. User’s Guide, 2017.

[JGR+ 14] J. L. Jerez, P. J. Goulart, S. Richter, G. A. Constantinides, E. C. Kerrigan,
and M. Morari. Embedded online optimization for model predictive control at
megahertz rates. IEEE Transactions on Automatic Control, 59(12):3238–3251,
December 2014.

[Kan60] L. Kantorovich. Mathematical methods of organizing and planning production.
Management Science, 6(4):366–422, 1960. English translation.

[Kar84] N. Karmarkar. A new polynomial-time algorithm for linear programming.
Combinatorica, 4(4):373–395, 1984.

[Kel95] C. Kelley. Iterative Methods for Linear and Nonlinear Equations. Society for
Industrial and Applied Mathematics, 1995.

[KM70] V. Klee and G. Minty. How good is the simplex algorithm. Technical report,
Department of Mathematics, University of Washington, 1970.

[KRU14] P. A. Knight, D. Ruiz, and B. Uçar. A symmetry preserving algorithm for
matrix scaling. SIAM Journal on Matrix Analysis and Applications,
35(3):931–955, 2014.

[Löf04] J. Löfberg. YALMIP: a toolbox for modeling and optimization in MATLAB.
In IEEE International Conference on Robotics and Automation, pages 284–289,
2004.

[LM79] P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear
operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.

[Mar52] H. Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.

[MB10] J. Mattingley and S. Boyd. Real-time convex optimization in signal processing.
IEEE Signal Processing Magazine, 27(3):50–61, May 2010.

[MB12] J. Mattingley and S. Boyd. CVXGEN: A code generator for embedded convex
optimization. Optimization and Engineering, 13(1):1–27, 2012.

[Meh92] S. Mehrotra. On the implementation of a primal-dual interior point method.
SIAM Journal on Optimization, 2(4):575–601, 1992.

[Mit] H. Mittelmann. Benchmarks for optimization software.
http://plato.asu.edu/bench.html. Accessed: 2019-09-08.

[MM99] I. Maros and C. Mészáros. A repository of convex quadratic programming
problems. Optimization Methods and Software, 11(1-4):671–681, 1999.

[MM00] O. L. Mangasarian and D. R. Musicant. Robust linear and support vector
regression. IEEE Transactions on Pattern Analysis and Machine Intelligence,
22(9):950–955, 2000.

[MOS17] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version
8.0 (Revision 57), 2017.

[NLR+ 15] R. Nishihara, L. Lessard, B. Recht, A. Packard, and M. I. Jordan. A general
analysis of the convergence of ADMM. In International Conference on Machine
Learning (ICML), pages 343–352, 2015.

[NN94] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in
Convex Programming. Society for Industrial and Applied Mathematics, 1994.

[NW06] J. Nocedal and S. J. Wright. Numerical optimization. Springer Series in
Operations Research and Financial Engineering. Springer, Berlin, 2006.

[OCPB16] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd. Conic optimization via
operator splitting and homogeneous self-dual embedding. Journal of Optimization
Theory and Applications, 169(3):1042–1068, June 2016.

[OSB13] B. O’Donoghue, G. Stathopoulos, and S. Boyd. A splitting method for optimal
control. IEEE Transactions on Control Systems Technology, 21(6):2432–2442,
November 2013.

[PC11] T. Pock and A. Chambolle. Diagonal preconditioning for first order
primal-dual algorithms in convex optimization. In 2011 International Conference on
Computer Vision, pages 1762–1769, November 2011.

[RDC14a] A. U. Raghunathan and S. Di Cairano. ADMM for convex quadratic programs:
Q-linear convergence and infeasibility detection. arXiv:1411.7288, 2014.

[RDC14b] A. U. Raghunathan and S. Di Cairano. Infeasibility detection in alternating
direction method of multipliers for convex quadratic programs. In IEEE
Conference on Decision and Control (CDC), pages 5819–5824, 2014.

[RKB+ 18] A. Reuther, J. Kepner, C. Byun, S. Samsi, W. Arcand, D. Bestor, B. Bergeron,
V. Gadepally, M. Houle, M. Hubbell, M. Jones, A. Klein, L. Milechin,
J. Mullen, A. Prout, A. Rosa, C. Yee, and P. Michaleas. Interactive
supercomputing on 40,000 cores for machine learning and data analysis. In 2018
IEEE High Performance extreme Computing Conference (HPEC), pages 1–6,
Sep. 2018.

[RM09] J. B. Rawlings and D. Q. Mayne. Model Predictive Control: Theory and Design.
Nob Hill Publishing, 2009.

[Rui01] D. Ruiz. A scaling algorithm to equilibrate both rows and columns norms in
matrices. Technical Report RAL-TR-2001-034, Rutherford Appleton Laboratory,
Oxon, UK, 2001.

[RW98] R. T. Rockafellar and R. J.-B. Wets. Variational analysis. Grundlehren der
mathematischen Wissenschaften. Springer, 1998.

[SB19] B. Stellato and G. Banjac. Benchmark examples for the OSQP solver.
https://github.com/oxfordcontrol/osqp_benchmarks, 2019.

[SK67] R. Sinkhorn and P. Knopp. Concerning nonnegative matrices and doubly
stochastic matrices. Pacific Journal of Mathematics, 21(2):343–348, 1967.

[SSS+ 16] G. Stathopoulos, H. Shukla, A. Szucs, Y. Pu, and C. N. Jones. Operator
splitting methods in control. Foundations and Trends in Systems and Control,
3(3):249–362, 2016.

[Tib96] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society: Series B, 58(1):267–288, 1996.

[TJ14] R. Takapoui and H. Javadi. Preconditioning via diagonal scaling. EE364b:
Convex Optimization II Class Project, 2014.

[Van95] R. Vanderbei. Symmetric quasi-definite matrices. SIAM Journal on
Optimization, 5(1):100–113, 1995.

[Wil63] J. H. Wilkinson. Rounding Errors in Algebraic Processes. Prentice Hall,
Englewood Cliffs, NJ, 1963.

[Woh17] B. Wohlberg. ADMM penalty parameter selection by residual balancing.
arXiv:1704.06209v1, 2017.

[Wol59] P. Wolfe. The simplex method for quadratic programming. Econometrica,
27(3):382–398, 1959.

[Wri97] S. Wright. Primal-Dual Interior-Point Methods. Society for Industrial and
Applied Mathematics, Philadelphia, 1997.
