OSQP: An Operator Splitting Solver for Quadratic Programs
Bartolomeo Stellato, Goran Banjac, Paul Goulart,
arXiv:1711.08013v4 [math.OC] 12 Feb 2020
Abstract
1 Introduction
1.1 The problem
Consider the following optimization problem
$$ \begin{array}{ll} \text{minimize} & (1/2) x^T P x + q^T x \\ \text{subject to} & Ax \in \mathcal{C}, \end{array} \tag{1} $$
where $x \in \mathbb{R}^n$ is the decision variable. The objective function is defined by a positive semidefinite matrix $P \in \mathbb{S}^n_+$ and a vector $q \in \mathbb{R}^n$, and the constraints by a matrix $A \in \mathbb{R}^{m \times n}$ and a nonempty, closed and convex set $\mathcal{C} \subseteq \mathbb{R}^m$. We will refer to it as a general (convex) quadratic program.
If the set $\mathcal{C}$ takes the form
$$ \mathcal{C} = [l, u] = \{ z \in \mathbb{R}^m \mid l_i \le z_i \le u_i,\ i = 1, \ldots, m \}, $$
then problem (1) becomes
$$ \begin{array}{ll} \text{minimize} & (1/2) x^T P x + q^T x \\ \text{subject to} & l \le Ax \le u, \end{array} \tag{2} $$
which we will refer to as a quadratic program (QP). Linear equality constraints can be encoded in this way by setting $l_i = u_i$ for some or all of the elements in $(l, u)$. Note that any linear program (LP) can be written in this form by setting $P = 0$. We will characterize the size of (2) with the tuple $(n, m, N)$, where $N$ is the sum of the number of nonzero entries in $P$ and $A$, i.e., $N = \mathrm{nnz}(P) + \mathrm{nnz}(A)$.
Applications. Optimization problems of the form (1) arise in a huge variety of applica-
tions in engineering, finance, operations research and many other fields. Applications in
machine learning include support vector machines (SVM) [CV95], Lasso [Tib96, CWB08]
and Huber fitting [Hub64, Hub81]. Financial applications of (1) include portfolio opti-
mization [CT06, Mar52, BMOW14, BBD+ 17] [BV04, §4.4.1]. In the field of control engi-
neering, model predictive control (MPC) [RM09, GPM89] and moving horizon estimation
(MHE) [ABQ+ 99] techniques require the solution of a QP at each time instant. Several
signal processing problems also fall into the same class [BV04, §6.3.3][MB10]. In addition,
the numerical solution of QP subproblems is an essential component in nonconvex opti-
mization methods such as sequential quadratic programming (SQP) [NW06, Chap. 18] and
mixed-integer optimization using branch-and-bound algorithms [BKL+ 13, FL98].
Active-set methods. Active-set methods were the first algorithms popularized as solution
methods for QPs [Wol59], and were obtained from an extension of Dantzig’s simplex method
for solving LPs [Dan63]. Active-set algorithms select an active-set (i.e., a set of binding
constraints) and then iteratively adapt it by adding and dropping constraints from the index
of active ones [NW06, §16.5]. New active constraints are added based on the cost function
gradient and the current dual variables. Active-set methods for QPs differ from the simplex
method for LPs because the iterates are not necessarily vertices of the feasible region. These
methods can easily be warm started to reduce the number of active-set recalculations re-
quired. However, the major drawback of active-set methods is that the worst-case complexity
grows exponentially with the number of constraints, since it may be necessary to investigate
all possible active-sets before reaching the optimal one [KM70]. Modern implementations of
active-set methods for the solution of QPs can be found in many commercial solvers, such as
MOSEK [MOS17] and GUROBI [Gur16], and in the open-source solver qpOASES [FKP+ 14].
problems and implemented in the open-source solver SCS [OCPB16]. Although every QP can
be reformulated as a conic program, this reformulation is not efficient from a computational
point of view. A further drawback of ADMM is that the number of iterations required to converge
is highly dependent on the problem data and on the user's choice of the algorithm's step-size
parameters. Despite some recent theoretical results [GB17, BG18], it remains unclear
how to select those parameters to optimize the algorithm convergence rate. For this reason,
even though there are several benefits in using ADMM techniques for solving optimization
problems, there exists no reliable general-purpose QP solver based on operator splitting
methods.
against state-of-the-art interior-point and active-set solvers over a benchmark library of 1400
problems from 7 different classes and over the Maros-Mészáros test set of hard QPs [MM99].
Numerical results show that our algorithm is able to provide up to an order of magnitude
computational time improvements over existing commercial and open-source solvers in a
wide variety of applications. We also show further time reductions from warm starting
and factorization caching.
2 Optimality conditions
We will find it convenient to rewrite problem (1) by introducing an additional decision
variable z ∈ Rm , to obtain the equivalent problem
$$ \begin{array}{ll} \text{minimize} & (1/2) x^T P x + q^T x \\ \text{subject to} & Ax = z \\ & z \in \mathcal{C}. \end{array} \tag{3} $$
We can write the optimality conditions of problem (3) as [BGSB19, Lem. A.1] [RW98, Thm.
6.12]
$$ Ax = z, \tag{4} $$
$$ Px + q + A^T y = 0, \tag{5} $$
$$ z \in \mathcal{C}, \quad y \in N_{\mathcal{C}}(z), \tag{6} $$
where $y \in \mathbb{R}^m$ is the Lagrange multiplier associated with the constraint $Ax = z$ and $N_{\mathcal{C}}(z)$ denotes the normal cone of $\mathcal{C}$ at $z$. If there exist $x \in \mathbb{R}^n$, $z \in \mathbb{R}^m$ and $y \in \mathbb{R}^m$ that satisfy the conditions above, then we say that $(x, z)$ is a primal and $y$ is a dual solution to problem (3).
We define the primal and dual residuals of problem (1) as
$$ r_{\mathrm{prim}} = Ax - z, \tag{7} $$
$$ r_{\mathrm{dual}} = Px + q + A^T y. \tag{8} $$
Quadratic programs. In the case of QPs of the form (2), condition (6) reduces to
$$ l \le z \le u, \quad y_+^T (z - u) = 0, \quad y_-^T (z - l) = 0, \tag{9} $$
where $y_+ = \max(y, 0)$ and $y_- = \min(y, 0)$.
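As a concrete check of conditions (4), (5) and (9), the following minimal pure-Python sketch measures how far a candidate triple $(x, z, y)$ is from satisfying them for a QP of the form (2). It uses dense nested lists and `math.inf` for missing bounds; the function name `qp_optimality_gaps` is hypothetical and not part of any solver API.

```python
import math


def mat_vec(M, v):
    """Dense matrix-vector product with nested lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def norm_inf(v):
    return max((abs(vi) for vi in v), default=0.0)


def qp_optimality_gaps(P, q, A, l, u, x, z, y):
    """Return (||r_prim||, ||r_dual||, box violation, complementarity violation)."""
    At = [list(col) for col in zip(*A)]
    r_prim = [a - b for a, b in zip(mat_vec(A, x), z)]                           # (4), (7)
    r_dual = [p + qi + a for p, qi, a in zip(mat_vec(P, x), q, mat_vec(At, y))]  # (5), (8)
    # (9), first part: l <= z <= u
    box = max(max(li - zi, zi - ui, 0.0) for zi, li, ui in zip(z, l, u))
    # (9), second part: y_+^T (z - u) = 0 and y_-^T (z - l) = 0.
    # For feasible z all terms share a sign, so summing absolute values is
    # equivalent; a positive y_i with u_i = +inf correctly yields inf.
    comp = 0.0
    for yi, zi, li, ui in zip(y, z, l, u):
        if yi > 0:
            comp += abs(yi * (zi - ui))
        elif yi < 0:
            comp += abs(yi * (zi - li))
    return norm_inf(r_prim), norm_inf(r_dual), box, comp


# minimize (1/2) x^2 + x subject to 0 <= x <= 1: the lower bound is active, y = -1.
print(qp_optimality_gaps([[1.0]], [1.0], [[1.0]], [0.0], [1.0], [0.0], [0.0], [-1.0]))
# -> (0.0, 0.0, 0.0, 0.0)
```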
where SC is the support function of C, provided that some type of constraint qualification
holds [BV04]. In other words, any variable y ∈ D serves as a certificate that problem (1) is
primal infeasible.
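Quadratic programs. In case $\mathcal{C} = [l, u]$, certifying primal infeasibility of (2) amounts to finding a vector $y \in \mathbb{R}^m$ such that
$$ A^T y = 0, \quad u^T y_+ + l^T y_- < 0. \tag{12} $$
Similarly, it can be shown that a vector $x \in \mathbb{R}^n$ satisfying
$$ Px = 0, \quad q^T x < 0, \quad (Ax)_i \begin{cases} = 0 & l_i, u_i \in \mathbb{R} \\ \ge 0 & u_i = +\infty,\ l_i \in \mathbb{R} \\ \le 0 & l_i = -\infty,\ u_i \in \mathbb{R} \end{cases} \tag{13} $$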
is a certificate of dual infeasibility for problem (2); see [BGSB19, Prop. 3.1] for more details.
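Certificates (12) and (13) can be verified mechanically. The sketch below is illustrative only: it assumes dense nested lists, represents missing bounds with `math.inf`, and replaces exact zero tests with a small hypothetical tolerance `tol`.

```python
import math


def mat_vec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def certifies_primal_infeasibility(A, l, u, y, tol=1e-9):
    """Check (12): A^T y = 0 and u^T y_+ + l^T y_- < 0."""
    At = [list(col) for col in zip(*A)]
    if any(abs(v) > tol for v in mat_vec(At, y)):
        return False
    support = 0.0
    for yi, li, ui in zip(y, l, u):
        if yi > 0:
            support += ui * yi   # +inf when u_i = +inf: not a certificate
        elif yi < 0:
            support += li * yi   # +inf when l_i = -inf: not a certificate
    return support < 0


def certifies_dual_infeasibility(P, q, A, l, u, x, tol=1e-9):
    """Check (13): P x = 0, q^T x < 0 and the sign pattern of A x."""
    if any(abs(v) > tol for v in mat_vec(P, x)):
        return False
    if sum(qi * xi for qi, xi in zip(q, x)) >= 0:
        return False
    for axi, li, ui in zip(mat_vec(A, x), l, u):
        lo, up = math.isfinite(li), math.isfinite(ui)
        if lo and up and abs(axi) > tol:
            return False          # two-sided row: (Ax)_i = 0
        if lo and not up and axi < -tol:
            return False          # u_i = +inf: (Ax)_i >= 0
        if up and not lo and axi > tol:
            return False          # l_i = -inf: (Ax)_i <= 0
    return True


# x >= 1 and x <= 0, written as two rows of l <= Ax <= u.
A, l, u = [[1.0], [1.0]], [1.0, -math.inf], [math.inf, 0.0]
print(certifies_primal_infeasibility(A, l, u, [-1.0, 1.0]))
# minimize -x subject to x >= 0 is unbounded below.
print(certifies_dual_infeasibility([[0.0]], [-1.0], [[1.0]], [0.0], [math.inf], [1.0]))
```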
3.1 Solving the linear system
Evaluating the ADMM step (15) involves solving the equality constrained QP
Direct method. A direct method for solving the linear system (24) computes its solution
by first factoring the KKT matrix and then performing forward and backward substitution.
Since the KKT matrix remains the same for every iteration of ADMM, we only need to
perform the factorization once prior to the first iteration and cache the factors so that we can
reuse them in subsequent iterations. This approach is very efficient when the factorization
cost is considerably higher than the cost of forward and backward substitutions, so that
each iteration is computed quickly. Note that if ρ or σ change, the KKT matrix needs to be
factored again.
Our particular choice of splitting results in a KKT matrix that is quasi-definite, i.e., it can
be written as a 2-by-2 block-symmetric matrix where the (1, 1)-block is positive definite, and
the (2, 2)-block is negative definite. It therefore always has a well-defined $LDL^T$ factorization,
with $L$ being a lower triangular matrix with unit diagonal elements and $D$ a diagonal matrix
with nonzero diagonal elements [Van95]. Note that once the factorization is carried out,
computing the solution of (24) can be made division-free by storing $D^{-1}$ instead of $D$.
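A minimal dense sketch of the factor-once, solve-many pattern, assuming a small KKT matrix of the form used in Algorithm 1 stored as nested Python lists. It computes $LDL^T$ without pivoting (which the quasi-definite structure permits), stores $D^{-1}$ so that each solve is division-free, and reuses the factors for several right-hand sides, as ADMM would across iterations. All function names are hypothetical; a production solver would work with sparse factors and a fill-reducing ordering instead of this dense loop.

```python
def build_kkt(P, A, sigma, rho):
    """Assemble [[P + sigma I, A^T], [A, -(1/rho) I]] as a dense matrix."""
    n, m = len(P), len(A)
    K = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            K[i][j] = P[i][j] + (sigma if i == j else 0.0)
    for i in range(m):
        for j in range(n):
            K[n + i][j] = K[j][n + i] = A[i][j]
        K[n + i][n + i] = -1.0 / rho
    return K


def ldl_factor(K):
    """Dense LDL^T without pivoting; it exists for quasi-definite K."""
    N = len(K)
    L = [[0.0] * N for _ in range(N)]
    d = [0.0] * N
    for j in range(N):
        L[j][j] = 1.0
        d[j] = K[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, N):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (K[i][j] - s) / d[j]
    return L, [1.0 / dj for dj in d]   # keep D^{-1}, not D


def ldl_solve(L, dinv, b):
    """Forward solve, diagonal scaling and backward solve, with no divisions."""
    N = len(b)
    w = list(b)
    for i in range(N):                          # L w = b
        w[i] -= sum(L[i][k] * w[k] for k in range(i))
    w = [wi * di for wi, di in zip(w, dinv)]    # w <- D^{-1} w
    for i in reversed(range(N)):                # L^T x = w
        w[i] -= sum(L[k][i] * w[k] for k in range(i + 1, N))
    return w


P = [[4.0, 1.0], [1.0, 2.0]]
A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
K = build_kkt(P, A, sigma=1e-6, rho=0.1)
L, dinv = ldl_factor(K)                         # factor once ...
for rhs in ([1.0, 2.0, 3.0, 4.0, 5.0], [-1.0, 0.0, 0.5, 2.0, -3.0]):
    sol = ldl_solve(L, dinv, rhs)               # ... solve many times
    residual = max(abs(sum(kij * sj for kij, sj in zip(row, sol)) - bi)
                   for row, bi in zip(K, rhs))
    print(f"residual = {residual:.1e}")
```

The signs of the computed pivots also illustrate quasi-definiteness: the first $n$ entries of $D$ are positive and the last $m$ are negative.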
When the KKT matrix is sparse and quasi-definite, efficient algorithms can be used for
computing a suitable permutation matrix $P$ for which the factorization of $PKP^T$ results in
a sparse factor L [ADD04, Dav06] without regard for the actual nonzero values appearing in
the KKT matrix. The $LDL^T$ factorization consists of two steps. In the first step we compute
the sparsity pattern of the factor L. This step is referred to as the symbolic factorization
and requires only the sparsity pattern of the KKT matrix. In the second step, referred to as
the numerical factorization, we determine the values of nonzero elements in L and D. Note
that we do not need to update the symbolic factorization if the nonzero entries of the KKT
matrix change but the sparsity pattern remains the same.
Indirect method. With large-scale QPs, factoring linear system (24) might be prohibitive.
In these cases it might be more convenient to use an indirect method by solving instead the
linear system
$$ (P + \sigma I + \rho A^T A)\, \tilde{x}^{k+1} = \sigma x^k - q + A^T(\rho z^k - y^k) $$
obtained by eliminating $\nu^{k+1}$ from (24). We then compute $\tilde{z}^{k+1}$ as $\tilde{z}^{k+1} = A\tilde{x}^{k+1}$. Note
that the coefficient matrix in the above linear system is always positive definite. The linear
system can therefore be solved with an iterative scheme such as the conjugate gradient
method [GVL96, NW06]. When the linear system is solved up to some predefined accuracy,
we terminate the method. We can also warm start the method using the linear system
solution at the previous iteration of ADMM to speed up its convergence. In contrast to
direct methods, the complexity of indirect methods does not change if we update ρ and σ
since there is no factorization required. This allows for more updates to take place without
any overhead.
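A minimal sketch of the indirect approach, assuming dense nested lists: the operator $v \mapsto (P + \sigma I + \rho A^T A)v$ is applied without ever forming $A^T A$, plain conjugate gradient solves the reduced system, and a second call warm-started at the previous solution shows that little or no extra work is needed when the right-hand side has not changed. The names `reduced_operator`, `reduced_rhs` and `conjugate_gradient` and the tolerance are our own illustrative choices.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reduced_operator(P, A, sigma, rho):
    """Return v -> (P + sigma I + rho A^T A) v without forming A^T A."""
    At = [list(col) for col in zip(*A)]

    def apply(v):
        Av = [dot(row, v) for row in A]
        AtAv = [dot(col, Av) for col in At]
        return [dot(prow, v) + sigma * vi + rho * ai
                for prow, vi, ai in zip(P, v, AtAv)]
    return apply


def reduced_rhs(A, q, x, z, y, sigma, rho):
    """Right-hand side sigma x^k - q + A^T (rho z^k - y^k)."""
    w = [rho * zi - yi for zi, yi in zip(z, y)]
    At = [list(col) for col in zip(*A)]
    return [sigma * xi - qi + dot(col, w) for xi, qi, col in zip(x, q, At)]


def conjugate_gradient(apply, b, x0, tol=1e-10, max_iter=1000):
    """Plain CG for a symmetric positive definite operator, started at x0."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, apply(x))]
    p = list(r)
    rs = dot(r, r)
    iters = 0
    while math.sqrt(rs) > tol and iters < max_iter:
        Ap = apply(p)
        step = rs / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
        iters += 1
    return x, iters


P = [[4.0, 1.0], [1.0, 2.0]]
A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
op = reduced_operator(P, A, sigma=1e-6, rho=0.1)
b = reduced_rhs(A, q=[1.0, 1.0], x=[0.0, 0.0], z=[1.0, 0.0, 0.7],
                y=[0.0, 0.0, 0.0], sigma=1e-6, rho=0.1)
x_cold, it_cold = conjugate_gradient(op, b, [0.0, 0.0])
x_warm, it_warm = conjugate_gradient(op, b, x_cold)   # warm start from the previous solution
print(it_cold, it_warm)
```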
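Algorithm 1
1: given initial values $x^0, z^0, y^0$ and parameters $\rho > 0$, $\sigma > 0$, $\alpha \in (0, 2)$
2: repeat
3:   $(\tilde{x}^{k+1}, \nu^{k+1}) \leftarrow$ solve linear system $\begin{bmatrix} P + \sigma I & A^T \\ A & -\rho^{-1} I \end{bmatrix} \begin{bmatrix} \tilde{x}^{k+1} \\ \nu^{k+1} \end{bmatrix} = \begin{bmatrix} \sigma x^k - q \\ z^k - \rho^{-1} y^k \end{bmatrix}$
4:   $\tilde{z}^{k+1} \leftarrow z^k + \rho^{-1}(\nu^{k+1} - y^k)$
5:   $x^{k+1} \leftarrow \alpha \tilde{x}^{k+1} + (1 - \alpha) x^k$
6:   $z^{k+1} \leftarrow \Pi\big(\alpha \tilde{z}^{k+1} + (1 - \alpha) z^k + \rho^{-1} y^k\big)$
7:   $y^{k+1} \leftarrow y^k + \rho\big(\alpha \tilde{z}^{k+1} + (1 - \alpha) z^k - z^{k+1}\big)$
8: until termination criterion is satisfied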
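A compact pure-Python sketch of Algorithm 1 for the box case $\mathcal{C} = [l, u]$, where $\Pi$ is a componentwise clip. Several simplifications are our own assumptions: step 3 solves the reduced system obtained by eliminating $\nu^{k+1}$ (so step 4 becomes $\tilde{z}^{k+1} = A\tilde{x}^{k+1}$), with a dense Gaussian elimination standing in for a cached factorization; $\rho$ is a fixed scalar; there is no scaling, polishing or infeasibility detection; and the stopping test $\|r_{\mathrm{prim}}\|_\infty \le \varepsilon_{\mathrm{abs}} + \varepsilon_{\mathrm{rel}}\max\{\|Ax\|_\infty, \|z\|_\infty\}$, with the analogous dual test, is assumed to mirror the unscaled form of the tolerances given in Section 5.1.

```python
def mat_vec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def norm_inf(v):
    return max((abs(vi) for vi in v), default=0.0)


def solve_dense(M, b):
    """Gaussian elimination with partial pivoting (stand-in for a cached factorization)."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def admm_qp(P, q, A, l, u, rho=1.0, sigma=1e-6, alpha=1.6,
            eps_abs=1e-6, eps_rel=1e-6, max_iter=10000):
    n, m = len(q), len(l)
    At = [list(col) for col in zip(*A)]
    # Reduced matrix P + sigma I + rho A^T A; it is constant, so a real solver factors it once.
    M = [[P[i][j] + (sigma if i == j else 0.0) + rho * sum(A[r][i] * A[r][j] for r in range(m))
          for j in range(n)] for i in range(n)]
    x, z, y = [0.0] * n, [0.0] * m, [0.0] * m
    for k in range(1, max_iter + 1):
        w = [rho * zi - yi for zi, yi in zip(z, y)]
        rhs = [sigma * xi - qi + ai for xi, qi, ai in zip(x, q, mat_vec(At, w))]
        xt = solve_dense(M, rhs)                                       # step 3
        zt = mat_vec(A, xt)                                            # step 4: z~ = A x~
        x = [alpha * a + (1 - alpha) * b for a, b in zip(xt, x)]       # step 5
        v = [alpha * a + (1 - alpha) * b + yi / rho for a, b, yi in zip(zt, z, y)]
        z_new = [min(max(vi, li), ui) for vi, li, ui in zip(v, l, u)]  # step 6: project onto [l, u]
        y = [yi + rho * (alpha * a + (1 - alpha) * b - c)              # step 7 (uses old z)
             for yi, a, b, c in zip(y, zt, z, z_new)]
        z = z_new
        Ax, Px, Aty = mat_vec(A, x), mat_vec(P, x), mat_vec(At, y)
        r_prim = norm_inf([a - b for a, b in zip(Ax, z)])
        r_dual = norm_inf([a + b + c for a, b, c in zip(Px, q, Aty)])
        eps_prim = eps_abs + eps_rel * max(norm_inf(Ax), norm_inf(z))
        eps_dual = eps_abs + eps_rel * max(norm_inf(Px), norm_inf(Aty), norm_inf(q))
        if r_prim <= eps_prim and r_dual <= eps_dual:                  # step 8 (assumed criterion)
            return x, y, k
    return x, y, max_iter


P = [[4.0, 1.0], [1.0, 2.0]]
q = [1.0, 1.0]
A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
l = [1.0, 0.0, 0.0]
u = [1.0, 0.7, 0.7]
x, y, iters = admm_qp(P, q, A, l, u)
print(x, y, iters)
```

The test problem has the solution $x^\star = (0.3, 0.7)$ with multipliers $y^\star = (-2.9, 0, 0.2)$, which can be verified by hand from (4), (5) and (9), so the printed iterates should be close to these values.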
3.3 Convergence and infeasibility detection
We show in this section that the proposed algorithm generates a sequence of iterates
$(x^k, z^k, y^k)$ that in the limit satisfy the optimality conditions (4)–(6) when problem (1) is
solvable, or provides a certificate of primal or dual infeasibility otherwise.
If we denote the argument of the projection operator in step 6 of Algorithm 1 by $v^{k+1}$, then we can express $z^k$ and $y^k$ as
$$ z^k = \Pi(v^k), \qquad y^k = \rho\big(v^k - \Pi(v^k)\big). \tag{25} $$
Observe from (25) that the iterates $z^k$ and $y^k$ satisfy optimality condition (6) for all $k > 0$ by construction [BC11, Prop. 6.46]. Therefore, it only remains to show that optimality conditions (4)–(5) are satisfied in the limit.
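For the box case $\mathcal{C} = [l, u]$, the claim that (25) enforces condition (6) by construction can be checked directly: whatever $v$ is, $z = \Pi(v)$ and $y = \rho(v - \Pi(v))$ satisfy (9). The helper below is a hypothetical illustration of that observation, not part of the method's description.

```python
def split_from_v(v, l, u, rho):
    """z = Pi(v) (projection onto [l, u]) and y = rho (v - Pi(v)), as in (25)."""
    z = [min(max(vi, li), ui) for vi, li, ui in zip(v, l, u)]
    y = [rho * (vi - zi) for vi, zi in zip(v, z)]
    return z, y


z, y = split_from_v([2.0, -3.0, 0.5], l=[-1.0] * 3, u=[1.0] * 3, rho=0.5)
print(z, y)  # [1.0, -1.0, 0.5] [0.5, -1.0, 0.0]
# y_i > 0 only where z_i = u_i and y_i < 0 only where z_i = l_i, so (9) holds.
```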
As shown in [BGSB19, Prop. 5.3], if problem (2) is solvable, then Algorithm 1 produces a convergent sequence of iterates $(x^k, z^k, y^k)$ so that
$$ \lim_{k \to \infty} r^k_{\mathrm{prim}} = 0, \qquad \lim_{k \to \infty} r^k_{\mathrm{dual}} = 0, $$
where $r^k_{\mathrm{prim}}$ and $r^k_{\mathrm{dual}}$ correspond to the residuals defined in (7) and (8), respectively.
On the other hand, if problem (2) is primal and/or dual infeasible, then the sequence of iterates $(x^k, z^k, y^k)$ generated by Algorithm 1 does not converge. However, the sequence of successive differences $(\delta x^k, \delta y^k) = (x^k - x^{k-1}, y^k - y^{k-1})$ always converges and can be used to certify infeasibility of the problem. According to [BGSB19, Thm. 5.1], if the problem is primal infeasible, then $\delta y = \lim_{k \to \infty} \delta y^k$ satisfies conditions (12), whereas $\delta x = \lim_{k \to \infty} \delta x^k$ satisfies conditions (13) if it is dual infeasible.
where εabs > 0 and εrel > 0 are absolute and relative tolerances, respectively.
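Quadratic programs infeasibility. If $\mathcal{C} = [l, u]$, we check the following conditions for primal infeasibility
$$ \|A^T \delta y^k\|_\infty \le \varepsilon_{\mathrm{pinf}} \|\delta y^k\|_\infty, \qquad u^T (\delta y^k)_+ + l^T (\delta y^k)_- \le \varepsilon_{\mathrm{pinf}} \|\delta y^k\|_\infty, $$
where $\varepsilon_{\mathrm{pinf}} > 0$ is some tolerance level. Similarly, we define the following criterion for detecting dual infeasibility
$$ \|P \delta x^k\|_\infty \le \varepsilon_{\mathrm{dinf}} \|\delta x^k\|_\infty, \qquad q^T \delta x^k \le \varepsilon_{\mathrm{dinf}} \|\delta x^k\|_\infty, $$
$$ (A \delta x^k)_i \begin{cases} \in [-\varepsilon_{\mathrm{dinf}}, \varepsilon_{\mathrm{dinf}}]\, \|\delta x^k\|_\infty & u_i, l_i \in \mathbb{R} \\ \ge -\varepsilon_{\mathrm{dinf}} \|\delta x^k\|_\infty & u_i = +\infty \\ \le \varepsilon_{\mathrm{dinf}} \|\delta x^k\|_\infty & l_i = -\infty, \end{cases} $$
for $i = 1, \ldots, m$, where $\varepsilon_{\mathrm{dinf}} > 0$ is some tolerance level. Note that $\|\delta x^k\|_\infty$ and $\|\delta y^k\|_\infty$ appear in the right-hand sides to avoid division when considering normalized vectors $\delta x^k$ and $\delta y^k$ in the termination criteria.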
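The two tests above translate directly into code. In this sketch (hypothetical names, dense lists, `math.inf` for missing bounds) `dy` and `dx` stand for the successive differences $\delta y^k$ and $\delta x^k$, and the default tolerances of $10^{-4}$ follow the OSQP defaults listed in Section 7. The inequality directions are implemented exactly as stated in the text.

```python
import math


def mat_vec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def norm_inf(v):
    return max((abs(vi) for vi in v), default=0.0)


def primal_infeasible(A, l, u, dy, eps_pinf=1e-4):
    """Primal infeasibility test on dy = y^k - y^{k-1}."""
    ndy = norm_inf(dy)
    if ndy == 0.0:
        return False
    At = [list(col) for col in zip(*A)]
    support = 0.0                      # u^T (dy)_+ + l^T (dy)_-
    for di, li, ui in zip(dy, l, u):
        if di > 0:
            support += ui * di
        elif di < 0:
            support += li * di
    return norm_inf(mat_vec(At, dy)) <= eps_pinf * ndy and support <= eps_pinf * ndy


def dual_infeasible(P, q, A, l, u, dx, eps_dinf=1e-4):
    """Dual infeasibility test on dx = x^k - x^{k-1}."""
    ndx = norm_inf(dx)
    if ndx == 0.0:
        return False
    tol = eps_dinf * ndx
    if norm_inf(mat_vec(P, dx)) > tol or sum(a * b for a, b in zip(q, dx)) > tol:
        return False
    for adx, li, ui in zip(mat_vec(A, dx), l, u):
        lo, up = math.isfinite(li), math.isfinite(ui)
        if (lo and up and abs(adx) > tol) or (lo and not up and adx < -tol) \
                or (up and not lo and adx > tol):
            return False
    return True


A, l, u = [[1.0], [1.0]], [1.0, -math.inf], [math.inf, 0.0]
print(primal_infeasible(A, l, u, [-1.0, 1.0]))
print(dual_infeasible([[0.0]], [-1.0], [[1.0]], [0.0], [math.inf], [1.0]))
```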
4 Solution polishing
Operator splitting methods are typically used to obtain solutions of an optimization problem with low or medium accuracy. However, even if a solution is not very accurate, we can often guess which constraints are active from an approximate primal-dual solution. When dealing with QPs of the form (2), we can obtain high-accuracy solutions from the final ADMM iterates by solving one additional system of equations.
Given a dual solution y of the problem, we define the sets of lower- and upper-active
constraints
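$$ \mathcal{L} = \{ i \in \{1, \ldots, m\} \mid y_i < 0 \}, \qquad \mathcal{U} = \{ i \in \{1, \ldots, m\} \mid y_i > 0 \}. $$
According to (9) we have that $z_{\mathcal{L}} = l_{\mathcal{L}}$ and $z_{\mathcal{U}} = u_{\mathcal{U}}$, where $l_{\mathcal{L}}$ denotes the vector composed of elements of $l$ corresponding to the indices in $\mathcal{L}$. Similarly, we will denote by $A_{\mathcal{L}}$ the matrix composed of rows of $A$ corresponding to the indices in $\mathcal{L}$.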
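If the sets of active constraints are known a priori, then a primal-dual solution $(x, y, z)$ can be found by solving the following linear system
$$ \begin{bmatrix} P & A_{\mathcal{L}}^T & A_{\mathcal{U}}^T \\ A_{\mathcal{L}} & 0 & 0 \\ A_{\mathcal{U}} & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y_{\mathcal{L}} \\ y_{\mathcal{U}} \end{bmatrix} = \begin{bmatrix} -q \\ l_{\mathcal{L}} \\ u_{\mathcal{U}} \end{bmatrix}, \tag{27} $$
$$ y_i = 0, \quad i \notin (\mathcal{L} \cup \mathcal{U}), \tag{28} $$
$$ z = Ax. \tag{29} $$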
We can then apply the aforementioned procedure to obtain a candidate solution (x, y, z).
If (x, y, z) satisfies the optimality conditions (4)–(6), then our guess is correct and (x, y, z)
is a primal-dual solution of problem (3). This approach is referred to as solution polishing.
Note that the dimension of the linear system (27) is usually much smaller than the KKT
system in Section 3.1 because the number of active constraints at optimality is less than or
equal to n for non-degenerate QPs.
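The polishing step can be sketched as follows, assuming dense data and a dense Gaussian elimination in place of a sparse factorization. The active sets use the exact sign test from the definitions of $\mathcal{L}$ and $\mathcal{U}$; with noisy ADMM multipliers a tiny spurious $y_i \neq 0$ would misclassify a constraint and could make (27) singular, which is the difficulty the next paragraph addresses. The function name `polish` is hypothetical.

```python
def mat_vec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def solve_dense(M, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def polish(P, q, A, l, u, y_admm):
    """Guess the active sets from the signs of y, then solve (27)-(29)."""
    n, m = len(q), len(l)
    lower = [i for i in range(m) if y_admm[i] < 0]   # set L
    upper = [i for i in range(m) if y_admm[i] > 0]   # set U
    active = lower + upper
    size = n + len(active)
    K = [[0.0] * size for _ in range(size)]
    for i in range(n):
        K[i][:n] = P[i][:]
    for r, i in enumerate(active):
        for j in range(n):
            K[n + r][j] = K[j][n + r] = A[i][j]
    g = [-qi for qi in q] + [l[i] for i in lower] + [u[i] for i in upper]
    t = solve_dense(K, g)                            # (27), unregularized
    x = t[:n]
    y = [0.0] * m                                    # (28): inactive multipliers are zero
    for r, i in enumerate(active):
        y[i] = t[n + r]
    return x, y, mat_vec(A, x)                       # (29): z = A x


P = [[4.0, 1.0], [1.0, 2.0]]
q = [1.0, 1.0]
A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
l, u = [1.0, 0.0, 0.0], [1.0, 0.7, 0.7]
x, y, z = polish(P, q, A, l, u, y_admm=[-2.9001, 0.0, 0.19998])
print(x, y, z)
```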
However, the linear system (27) is not necessarily solvable even if the sets of active
constraints L and U have been correctly identified. This can happen, e.g., if the solution is
degenerate, i.e., if it has one or more redundant active constraints. We make the solution
polishing procedure more robust by solving instead the following linear system
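$$ \begin{bmatrix} P + \delta I & A_{\mathcal{L}}^T & A_{\mathcal{U}}^T \\ A_{\mathcal{L}} & -\delta I & 0 \\ A_{\mathcal{U}} & 0 & -\delta I \end{bmatrix} \begin{bmatrix} \hat{x} \\ \hat{y}_{\mathcal{L}} \\ \hat{y}_{\mathcal{U}} \end{bmatrix} = \begin{bmatrix} -q \\ l_{\mathcal{L}} \\ u_{\mathcal{U}} \end{bmatrix}, \tag{30} $$
where $\delta > 0$ is a regularization parameter with value $\delta \approx 10^{-6}$. Since the regularized matrix in (30) is quasi-definite, the linear system (30) is always solvable.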
By using regularization, we actually solve a perturbed linear system and thus introduce
a small error to the polished solution. If we denote by K and (K + ∆K) the coefficient
matrices in (27) and (30), respectively, then we can represent the two linear systems as
Kt = g and (K + ∆K)t̂ = g. To compensate for this error, we apply an iterative refinement
procedure [Wil63], i.e., we iteratively solve
$$ (K + \Delta K)\, \Delta\hat{t}^k = g - K\hat{t}^k \tag{31} $$
and update $\hat{t}^{k+1} = \hat{t}^k + \Delta\hat{t}^k$. The sequence $\{\hat{t}^k\}$ converges to the true solution $t$, provided that it exists. Observe that, compared to solving the linear system (30), iterative refinement requires only a backward- and a forward-solve, and does not require another matrix factorization. Since the iterative refinement iterations converge very quickly in practice, we just run them for a fixed number of passes without imposing any termination condition. Note that this is the same strategy used in commercial linear system solvers using iterative refinement [Int17].
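A sketch of the regularize-then-refine strategy, assuming a small dense $K$ from (27): $K + \Delta K$ with $\Delta K = \mathrm{diag}(\delta I, -\delta I)$ is factored once with an $LDL^T$ routine (valid because the regularized matrix is quasi-definite), and each refinement pass of (31) costs only one forward and one backward solve. The number of passes defaults to 3, matching the OSQP default reported in Section 7; the helper names are our own.

```python
def ldl_factor(K):
    """Dense LDL^T without pivoting (valid for quasi-definite K); returns L and D^{-1}."""
    N = len(K)
    L = [[0.0] * N for _ in range(N)]
    d = [0.0] * N
    for j in range(N):
        L[j][j] = 1.0
        d[j] = K[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, N):
            L[i][j] = (K[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, [1.0 / dj for dj in d]


def ldl_solve(L, dinv, b):
    N = len(b)
    w = list(b)
    for i in range(N):                          # forward solve
        w[i] -= sum(L[i][k] * w[k] for k in range(i))
    w = [wi * di for wi, di in zip(w, dinv)]
    for i in reversed(range(N)):                # backward solve
        w[i] -= sum(L[k][i] * w[k] for k in range(i + 1, N))
    return w


def refined_solve(K, g, n, delta=1e-6, passes=3):
    """Factor K + dK once, then refine toward the solution of K t = g as in (31)."""
    N = len(K)
    K_reg = [row[:] for row in K]
    for i in range(N):
        K_reg[i][i] += delta if i < n else -delta
    L, dinv = ldl_factor(K_reg)                 # the only factorization
    t = ldl_solve(L, dinv, g)                   # solution of the perturbed system (30)
    for _ in range(passes):
        res = [gi - sum(kij * tj for kij, tj in zip(row, t)) for row, gi in zip(K, g)]
        dt = ldl_solve(L, dinv, res)            # (K + dK) dt = g - K t
        t = [ti + di for ti, di in zip(t, dt)]
    return t


# K and g of (27) for a small QP whose active sets are L = {0}, U = {2}.
K = [[4.0, 1.0, 1.0, 0.0], [1.0, 2.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
g = [-1.0, -1.0, 1.0, 0.7]
exact = [0.3, 0.7, -2.9, 0.2]
for passes in (0, 3):
    t = refined_solve(K, g, n=2, passes=passes)
    print(passes, max(abs(a - b) for a, b in zip(t, exact)))
```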
5.1 Preconditioning
Preconditioning is a common heuristic aiming to reduce the number of iterations in first-
order methods [NW06, Chap. 5],[GTSJ15, Ben02, PC11, GB15, GB17]. The optimal choice
of preconditioners has been studied for at least two decades and remains an active area of
research [Kel95, Chap. 2],[Gre97, Chap. 10]. For example, the optimal diagonal precon-
ditioner required to minimize the condition number of a matrix can be found exactly by
solving a semidefinite program [BEGFB94]. However, this computation is typically more
complicated than solving the original QP, and is therefore unlikely to be worth the effort
since preconditioning is only a heuristic to minimize the number of iterations.
In order to keep the preconditioning procedure simple, we instead make use of a heuristic called matrix equilibration [Bra10, TJ14, FB18, DB17]. Our goal is to rescale the problem data to reduce the condition number of the symmetric matrix $M \in \mathbb{S}^{n+m}$ representing the problem data, defined as
$$ M = \begin{bmatrix} P & A^T \\ A & 0 \end{bmatrix}. \tag{32} $$
where $D \in \mathbb{S}^n_{++}$ and $E \in \mathbb{S}^m_{++}$ are both diagonal. In addition, we would like to normalize
the cost function to prevent the dual variables from being too large. We can achieve this by
multiplying the cost function by the scalar c > 0.
Preconditioning effectively modifies problem (1) into the following
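$$ \begin{array}{ll} \text{minimize} & (1/2) \bar{x}^T \bar{P} \bar{x} + \bar{q}^T \bar{x} \\ \text{subject to} & \bar{A} \bar{x} \in \bar{\mathcal{C}}, \end{array} \tag{34} $$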
Ruiz equilibration. In this work we apply a variation of the Ruiz equilibration [Rui01]. This technique was originally proposed to equilibrate square matrices, and it shows fast linear convergence, superior to other methods such as the Sinkhorn-Knopp equilibration [SK67]. Ruiz equilibration converges in a few tens of iterations even in cases when Sinkhorn-Knopp equilibration takes thousands of iterations [KRU14]. The steps are outlined in Algorithm 2 and differ from the original Ruiz algorithm by adding a cost scaling step that takes into account very large values of the cost.

Algorithm 2 Modified Ruiz equilibration
initialize $c = 1$, $S = I$, $\delta = 0$, $\bar{P} = P$, $\bar{q} = q$, $\bar{A} = A$, $\bar{\mathcal{C}} = \mathcal{C}$
while $\|1 - \delta\|_\infty > \varepsilon_{\mathrm{equil}}$ do
  for $i = 1, \ldots, n + m$ do
    $\delta_i \leftarrow 1/\sqrt{\|M_i\|_\infty}$   ▷ $M$ equilibration
  $\bar{P}, \bar{q}, \bar{A}, \bar{\mathcal{C}} \leftarrow$ Scale $\bar{P}, \bar{q}, \bar{A}, \bar{\mathcal{C}}$ using $\mathrm{diag}(\delta)$
  $\gamma \leftarrow 1/\max\{\mathrm{mean}(\|\bar{P}_i\|_\infty), \|\bar{q}\|_\infty\}$   ▷ Cost scaling
  $\bar{P} \leftarrow \gamma \bar{P}$, $\bar{q} \leftarrow \gamma \bar{q}$
  $S \leftarrow \mathrm{diag}(\delta) S$, $c \leftarrow \gamma c$
return $S$, $c$

The first part is the usual Ruiz equilibration step. Since $M$ is symmetric, we focus only on the columns $M_i$ and apply the scaling to both sides of $M$. At each iteration, we compute the $\infty$-norm of each column and scale that column by the inverse of the square root of that norm. The second part is a cost scaling step. The scalar $\gamma$ is the current cost normalization coefficient, which takes into account the maximum between the average norm of the columns of $\bar{P}$ and the norm of $\bar{q}$. We normalize the problem data $\bar{P}, \bar{q}, \bar{A}, \bar{l}, \bar{u}$ in place at each iteration using the current values of $\delta$ and $\gamma$.
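A dense, pure-Python sketch of Algorithm 2, under our own assumptions: the scaling is tracked as separate diagonal factors $D$ (variables) and $E$ (constraints) instead of a single $S = \mathrm{diag}(D, E)$, the bounds are scaled as $\bar{l} = El$ and $\bar{u} = Eu$, a zero column norm leaves that column unscaled, and the loop is capped at a hypothetical `max_iter`. The invariants it preserves, $\bar{P} = cDPD$, $\bar{q} = cDq$ and $\bar{A} = EAD$, are what we assume underlies the unscaled residual formulas that follow.

```python
import math


def modified_ruiz(P, q, A, l, u, eps_equil=1e-3, max_iter=25):
    """Sketch of Algorithm 2 on dense data; returns scaled data and D, E, c."""
    n, m = len(q), len(A)
    P = [row[:] for row in P]
    q, A, l, u = q[:], [row[:] for row in A], l[:], u[:]
    D, E, c = [1.0] * n, [1.0] * m, 1.0
    for _ in range(max_iter):
        # Column inf-norms of M = [[P, A^T], [A, 0]] (M is symmetric).
        norms = [max([abs(P[i][j]) for i in range(n)] + [abs(A[i][j]) for i in range(m)])
                 for j in range(n)]
        norms += [max(abs(a) for a in A[i]) for i in range(m)]
        delta = [1.0 / math.sqrt(v) if v > 0 else 1.0 for v in norms]
        dx, dz = delta[:n], delta[n:]
        # Scale both sides of M: P <- dx P dx, q <- dx q, A <- dz A dx, bounds <- dz bounds.
        P = [[dx[i] * P[i][j] * dx[j] for j in range(n)] for i in range(n)]
        q = [di * qi for di, qi in zip(dx, q)]
        A = [[dz[i] * A[i][j] * dx[j] for j in range(n)] for i in range(m)]
        l = [di * li for di, li in zip(dz, l)]
        u = [di * ui for di, ui in zip(dz, u)]
        D = [a * b for a, b in zip(D, dx)]
        E = [a * b for a, b in zip(E, dz)]
        # Cost scaling step.
        mean_col = sum(max(abs(P[i][j]) for i in range(n)) for j in range(n)) / n
        gamma = 1.0 / (max(mean_col, max(abs(qi) for qi in q)) or 1.0)
        P = [[gamma * pij for pij in row] for row in P]
        q = [gamma * qi for qi in q]
        c *= gamma
        if max(abs(1.0 - d) for d in delta) <= eps_equil:
            break
    return P, q, A, l, u, D, E, c


P = [[4.0, 1.0], [1.0, 2.0]]
q = [1.0, 1.0]
A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
l, u = [1.0, 0.0, 0.0], [1.0, 0.7, 0.7]
Pb, qb, Ab, lb, ub, D, E, c = modified_ruiz(P, q, A, l, u)
print("D =", D, "E =", E, "c =", c)
```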
Unscaled termination criteria. Although we rescale our problem in the form (34), we
would still like to apply the stopping criteria defined in Section 3.4 to an unscaled version of
our problem. The primal and dual residuals in (26) can be rewritten in terms of the scaled
problem as
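$$ r^k_{\mathrm{prim}} = E^{-1} \bar{r}^k_{\mathrm{prim}} = E^{-1}(\bar{A}\bar{x}^k - \bar{z}^k), \qquad r^k_{\mathrm{dual}} = c^{-1} D^{-1} \bar{r}^k_{\mathrm{dual}} = c^{-1} D^{-1}(\bar{P}\bar{x}^k + \bar{q} + \bar{A}^T \bar{y}^k), $$
and the tolerance levels as
$$ \varepsilon_{\mathrm{prim}} = \varepsilon_{\mathrm{abs}} + \varepsilon_{\mathrm{rel}} \max\{\|E^{-1}\bar{A}\bar{x}^k\|_\infty, \|E^{-1}\bar{z}^k\|_\infty\}, $$
$$ \varepsilon_{\mathrm{dual}} = \varepsilon_{\mathrm{abs}} + \varepsilon_{\mathrm{rel}}\, c^{-1} \max\{\|D^{-1}\bar{P}\bar{x}^k\|_\infty, \|D^{-1}\bar{A}^T\bar{y}^k\|_\infty, \|D^{-1}\bar{q}\|_\infty\}. $$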
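The sketch below evaluates the unscaled residuals and tolerances from scaled data and iterates, as in the formulas above. It assumes, consistently with those formulas, that the scaled data are $\bar{P} = cDPD$, $\bar{q} = cDq$ and $\bar{A} = EAD$, with $D$ and $E$ passed as lists of diagonal entries; the function name is hypothetical.

```python
def mat_vec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def norm_inf(v):
    return max((abs(vi) for vi in v), default=0.0)


def unscaled_check(Pb, qb, Ab, xb, zb, yb, D, E, c, eps_abs=1e-3, eps_rel=1e-3):
    """Return (||r_prim||, ||r_dual||, eps_prim, eps_dual) of the unscaled problem."""
    At = [list(col) for col in zip(*Ab)]
    Ax, Px, Aty = mat_vec(Ab, xb), mat_vec(Pb, xb), mat_vec(At, yb)
    inv_e = lambda v: [vi / ei for vi, ei in zip(v, E)]          # E^{-1} v
    inv_cd = lambda v: [vi / (c * di) for vi, di in zip(v, D)]   # c^{-1} D^{-1} v
    r_prim = inv_e([a - b for a, b in zip(Ax, zb)])
    r_dual = inv_cd([a + b + d for a, b, d in zip(Px, qb, Aty)])
    eps_prim = eps_abs + eps_rel * max(norm_inf(inv_e(Ax)), norm_inf(inv_e(zb)))
    # c^{-1} max{||D^{-1} .||} equals max{||c^{-1} D^{-1} .||}
    eps_dual = eps_abs + eps_rel * max(norm_inf(inv_cd(Px)), norm_inf(inv_cd(Aty)),
                                       norm_inf(inv_cd(qb)))
    return norm_inf(r_prim), norm_inf(r_dual), eps_prim, eps_dual


P = [[4.0, 1.0], [1.0, 2.0]]
q = [1.0, 1.0]
A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
D, E, c = [2.0, 0.5], [1.0, 4.0, 0.25], 0.3
Pb = [[c * D[i] * P[i][j] * D[j] for j in range(2)] for i in range(2)]
qb = [c * D[i] * q[i] for i in range(2)]
Ab = [[E[i] * A[i][j] * D[j] for j in range(2)] for i in range(3)]
xb, zb, yb = [0.1, -0.2], [0.3, 0.1, -0.4], [1.0, -2.0, 0.5]
print(unscaled_check(Pb, qb, Ab, xb, zb, yb, D, E, c))
```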
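Quadratic programs infeasibility. When $\mathcal{C} = [l, u]$, the primal infeasibility conditions become
$$ \|D^{-1}\bar{A}^T \delta\bar{y}^k\|_\infty \le \varepsilon_{\mathrm{pinf}} \|E \delta\bar{y}^k\|_\infty, \qquad \bar{u}^T(\delta\bar{y}^k)_+ + \bar{l}^T(\delta\bar{y}^k)_- \le \varepsilon_{\mathrm{pinf}} \|E\delta\bar{y}^k\|_\infty, $$
where the primal infeasibility certificate is $c^{-1} E \delta\bar{y}^k$. The dual infeasibility criteria are
$$ \|D^{-1}\bar{P}\delta\bar{x}^k\|_\infty \le c\,\varepsilon_{\mathrm{dinf}} \|D\delta\bar{x}^k\|_\infty, \qquad \bar{q}^T\delta\bar{x}^k \le c\,\varepsilon_{\mathrm{dinf}} \|D\delta\bar{x}^k\|_\infty, $$
$$ (E^{-1}\bar{A}\delta\bar{x}^k)_i \begin{cases} \in [-\varepsilon_{\mathrm{dinf}}, \varepsilon_{\mathrm{dinf}}]\, \|D\delta\bar{x}^k\|_\infty & u_i, l_i \in \mathbb{R} \\ \ge -\varepsilon_{\mathrm{dinf}} \|D\delta\bar{x}^k\|_\infty & u_i = +\infty \\ \le \varepsilon_{\mathrm{dinf}} \|D\delta\bar{x}^k\|_\infty & l_i = -\infty. \end{cases} $$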
5.2 Parameter selection
The choice of parameters (ρ, σ, α) in Algorithm 1 is a key factor in determining the number
of iterations required to find an optimal solution. Unfortunately, it is still an open research
question how to select the optimal ADMM parameters, see [GTSJ15, NLR+ 15, GB17]. After
extensive numerical testing on millions of problem instances and a wide range of dimensions,
we chose the algorithm parameters as follows for QPs.
Choosing σ and α. The parameter σ is a regularization term which is used to ensure that
a unique solution of (15) will always exist, even when P has one or more zero eigenvalues.
After scaling P in order to minimize its condition number, we choose σ as small as possible
to preserve numerical stability without slowing down the algorithm. We set the default value
as $\sigma = 10^{-6}$. The relaxation parameter $\alpha$ in the range $[1.5, 1.8]$ has been shown empirically to
improve the convergence rate [Eck94, EF98]. In the proposed method, we set the default
value of $\alpha = 1.6$.
Choosing ρ. The most crucial parameter is the step-size $\rho$. Numerical testing showed that
having different values of $\rho$ for different constraints can greatly improve the performance.
For this reason, without altering the algorithm steps, we choose $\rho \in \mathbb{S}^m_{++}$ to be a positive
definite diagonal matrix with different elements $\rho_i$.
For a specific QP, if we know the active and inactive constraints, then we can rewrite it
simply as an equality constrained QP. In this case the optimal ρ is defined as ρi = ∞ for
the active constraints and ρi = 0 for the inactive constraints, therefore reducing the linear
system (24) to the optimality conditions of the equivalent equality constrained QP (after
setting σ = 0). Unfortunately, it is impossible to know a priori whether any given constraint
is active or inactive at optimality, so we must instead adopt some heuristics. We define $\rho$ as follows
$$ \rho = \mathrm{diag}(\rho_1, \ldots, \rho_m), \qquad \rho_i = \begin{cases} \bar{\rho} & l_i \ne u_i \\ 10^3 \bar{\rho} & l_i = u_i, \end{cases} $$
where ρ̄ > 0. In this way we assign a high value to the step-size related to the equality
constraints since they will be active at the optimum. Having a fixed value of ρ̄ cannot
provide fast convergence for different kinds of problems since the optimal solution and the
active constraints vary greatly. To compensate for this issue, we adopt an adaptive scheme
which updates ρ̄ during the iterations based on the ratio between primal and dual residuals.
The idea of introducing “feedback” in the algorithm steps makes ADMM more robust to bad
scaling in the data; see [HYW00, BPC+ 11, Woh17]. Contrary to the adaptation approaches
in the literature where the update increases or decreases the value of the step-size by a fixed
amount, we adopt the following rule
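$$ \bar{\rho}^{k+1} \leftarrow \bar{\rho}^k \sqrt{\frac{\|\bar{r}^k_{\mathrm{prim}}\|_\infty / \max\{\|\bar{A}\bar{x}^k\|_\infty, \|\bar{z}^k\|_\infty\}}{\|\bar{r}^k_{\mathrm{dual}}\|_\infty / \max\{\|\bar{P}\bar{x}^k\|_\infty, \|\bar{A}^T\bar{y}^k\|_\infty, \|\bar{q}\|_\infty\}}}. $$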
In other words, we update $\bar{\rho}^k$ using the square root of the ratio between the scaled residuals
normalized by the magnitudes of the relative part of the tolerances. We set the initial value as
$\bar{\rho}^0 = 0.1$. In our benchmarks, if $\bar{\rho}^0$ does not already give a low number of ADMM iterations,
it is usually tuned with a maximum of 1 or 2 updates. The adaptation causes the KKT
matrix in (24) to change and, if the linear system is solved with a direct method, it requires
a new numerical factorization. We do not require a new symbolic factorization because
the sparsity pattern of the KKT matrix does not change. Since the numerical factorization
can be costly, we perform the adaptation only when it is really necessary. In particular, we
allow an update if the accumulated iteration time is greater than a certain percentage of the
factorization time (nominally 40%) and if the new parameter is sufficiently different from the
current one, i.e., 5 times larger or smaller. Note that in the case of an indirect method this
rule allows for more frequent changes of ρ since there is no need to factor the KKT matrix
and the update is numerically much cheaper. Note that the convergence of the ADMM
algorithm is hard to prove in general if the ρ updates happen at each iteration. However, if
we assume that the updates stop after a fixed number of iterations the convergence results
hold [BPC+ 11, Section 3.4.1].
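The $\rho$ heuristics of this section can be summarized in a few functions. This is a sketch: the names are hypothetical, the small `tiny` guard against division by zero is our own addition, and the thresholds (40% of the factorization time, a factor of 5 change) are the nominal values quoted above.

```python
import math


def rho_vector(rho_bar, l, u):
    """rho_i = 1e3 * rho_bar for equality rows (l_i == u_i), rho_bar otherwise."""
    return [1e3 * rho_bar if li == ui else rho_bar for li, ui in zip(l, u)]


def proposed_rho_bar(rho_bar, r_prim, r_dual, prim_scale, dual_scale, tiny=1e-30):
    """Square-root update rule.

    prim_scale = max{||A x||, ||z||}, dual_scale = max{||P x||, ||A^T y||, ||q||}
    (all in the scaled problem); `tiny` is an illustrative division guard.
    """
    ratio = (r_prim / max(prim_scale, tiny)) / max(r_dual / max(dual_scale, tiny), tiny)
    return rho_bar * math.sqrt(ratio)


def accept_update(rho_bar, rho_new, iter_time, factor_time, time_fraction=0.4, factor=5.0):
    """Refactor only if enough iteration time has accumulated and rho changes enough."""
    enough_time = iter_time > time_fraction * factor_time
    big_change = rho_new > factor * rho_bar or rho_new < rho_bar / factor
    return enough_time and big_change


rho_bar = 0.1
print(rho_vector(rho_bar, l=[1.0, 0.0], u=[1.0, 0.7]))   # the equality row gets 1e3 * rho_bar
rho_new = proposed_rho_bar(rho_bar, r_prim=1.0, r_dual=0.01, prim_scale=1.0, dual_scale=1.0)
print(rho_new, accept_update(rho_bar, rho_new, iter_time=0.5, factor_time=1.0))
```

Because the update multiplies $\bar{\rho}$ by the square root of the normalized residual ratio, a primal residual 100 times larger than the dual residual raises $\bar{\rho}$ tenfold, which would pass the factor-of-5 test.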
6 Parametric programs
In application domains such as control, statistics, finance, and SQP, problem (1) is solved
repeatedly for varying data. For these problems, usually referred to as parametric programs,
we can speed up the repeated OSQP calls by re-using the computations across multiple
solves.
We make the distinction between cases in which only the vectors or all data in (1) change
between subsequent problem instances. We assume that the problem dimensions n and m
and the sparsity patterns of P and A are fixed.
Vectors as parameters. If the vectors q, l, and u are the only parameters that vary, then
the KKT coefficient matrix in Algorithm 1 does not change across different instances of
the parametric program. Thus, if a direct method is used, we perform and store its fac-
torization only once before the first solution and reuse it across all subsequent iterations.
Since the matrix factorization is the computationally most expensive step of the algorithm,
this approach significantly reduces the amount of time OSQP takes to solve subsequent
problems. This class of problems arises very frequently in many applications including
linear MPC and MHE [RM09, ABQ+ 99], Lasso [Tib96, CWB08], and portfolio optimiza-
tion [BMOW14, Mar52].
Matrices and vectors as parameters. We separately consider the case in which the values
(but not the locations) of the nonzero entries of matrices P and A are updated. In this
case, in a direct method, we need to refactor the matrix in Algorithm 1. However, since
the sparsity pattern does not change we need only to recompute the numerical factorization
while reusing the symbolic factorization from the previous solution. This results in a modest
reduction in the computation time. This class of problems encompasses several applications
such as nonlinear MPC and MHE [DFH09] and sequential quadratic programming [NW06].
7 OSQP
We have implemented our proposed approach in the “Operator Splitting Quadratic Program”
(OSQP) solver, an open-source software package in the C language. OSQP can solve any
QP of the form (2) and makes no assumptions about the problem data other than convexity.
OSQP is available online at
https://osqp.org.
Users can call OSQP from C, C++, Fortran, Python, Matlab, R, Julia, Ruby and Rust, and
via parsers such as CVXPY [DB16, AVDB18], JuMP [DHL17], and YALMIP [Löf04].
To exploit the data sparsity pattern, OSQP accepts matrices in Compressed-Sparse-
Column (CSC) format [Dav06]. We implemented the linear system solution described in
Section 3.1 as an object-oriented interface to easily switch between efficient algorithms. At
present, OSQP ships with the open-source QDLDL direct solver which is our independent
implementation based on [Dav05], and also supports dynamic loading of more advanced
algorithms such as the MKL Pardiso direct solver [Int17]. We plan to add iterative indirect
solvers and other direct solvers in future versions.
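For readers unfamiliar with CSC, the sketch below converts a small dense matrix into the three CSC arrays (values, row indices, column pointers) and multiplies by a vector using only those arrays. The array names follow common convention and are not necessarily the field names used by OSQP's C interface; the number of stored values is the nnz count that enters $N = \mathrm{nnz}(P) + \mathrm{nnz}(A)$.

```python
def dense_to_csc(M):
    """Return (data, indices, indptr) for a dense matrix given as nested lists."""
    nrows, ncols = len(M), len(M[0])
    data, indices, indptr = [], [], [0]
    for j in range(ncols):                # walk column by column
        for i in range(nrows):
            if M[i][j] != 0:
                data.append(M[i][j])      # nonzero value
                indices.append(i)         # its row index
        indptr.append(len(data))          # column j occupies data[indptr[j]:indptr[j + 1]]
    return data, indices, indptr


def csc_matvec(data, indices, indptr, x, nrows):
    """y = M x using only the CSC arrays."""
    y = [0.0] * nrows
    for j in range(len(indptr) - 1):
        for k in range(indptr[j], indptr[j + 1]):
            y[indices[k]] += data[k] * x[j]
    return y


A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
data, indices, indptr = dense_to_csc(A)
print(data, indices, indptr)                              # [1.0, 1.0, 1.0, 1.0] [0, 1, 0, 2] [0, 2, 4]
print(csc_matvec(data, indices, indptr, [2.0, 3.0], 3))   # [5.0, 2.0, 3.0]
```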
The default values for the OSQP termination tolerances described in Section 3.4 are
$$ \varepsilon_{\mathrm{abs}} = \varepsilon_{\mathrm{rel}} = 10^{-3}, \qquad \varepsilon_{\mathrm{pinf}} = \varepsilon_{\mathrm{dinf}} = 10^{-4}. $$
The default regularization parameter $\sigma$ and the relaxation parameter $\alpha$ are set to
$$ \sigma = 10^{-6}, \qquad \alpha = 1.6, $$
while $\rho$ is automatically chosen by default as described in Section 5.2, with optional user
override. We set the default fixed number of iterative refinement steps to 3.
OSQP reports the total computation time divided into the time required to perform preprocessing operations, such as scaling or matrix factorization, and the time to carry out the ADMM iterations. If the solver is called multiple times reusing the same matrix factorization, it will report only the ADMM solve time as total computation time. For more details we refer the reader to the solver documentation on the OSQP project website.
8 Numerical examples
We benchmarked OSQP against the open-source interior-point solver ECOS [DCB13], the
open-source active-set solver qpOASES [FKP+ 14], and the commercial interior-point solvers
GUROBI [Gur16] and MOSEK [MOS17]. We executed every benchmark comparing different
solvers with both low accuracy, i.e., $\varepsilon_{\mathrm{abs}} = \varepsilon_{\mathrm{rel}} = 10^{-3}$, and high accuracy, i.e., $\varepsilon_{\mathrm{abs}} = \varepsilon_{\mathrm{rel}} = 10^{-5}$. We set GUROBI, ECOS, MOSEK and OSQP primal and dual feasibility tolerances
to our low and high accuracy tolerances. Since qpOASES is an active-set method and does
not allow the user to tune either the primal or the dual feasibility tolerance, we set it to its default
termination settings. In addition, we allow each solver to run for at most 1000 sec,
with no limit on the maximum number of iterations. Note that the use of maximum time
limits with no bounds on the number of iterations is the default setting in commercial solvers
such as MOSEK. For every solver we leave all the other settings to the internal defaults.
In general it is hard to compare the solution accuracies because all the solvers, especially
commercial ones, use an internal problem scaling and verify that the termination conditions
are satisfied against their scaled version of the problem. In contrast, OSQP allows the option
to check the termination conditions against the internally scaled or the original problem.
Therefore, to make the benchmark fair, we say that the primal-dual solution $(x^\star, y^\star)$ returned
by each solver is optimal if the following optimality conditions are satisfied with tolerances
defined above with low and high accuracy modes,
where εprim and εdual are defined in Section 3.4. If the primal-dual solution returned by a
solver does not satisfy the optimality conditions defined above, we consider it a failure. Note
that we decided not to include checks on complementary slackness because
interior-point solvers satisfy it with different metrics and scalings, and would therefore fail very
often. In contrast, OSQP always satisfies the complementary slackness conditions to machine
precision by construction.
In addition, for OSQP we used the direct single-threaded linear system solver QDLDL [GSB18],
based on [ADD04, Dav05], and very simple linear algebra, whereas other solvers such as
GUROBI and MOSEK use advanced multi-threaded linear system solvers and custom linear
algebra.
All the experiments were carried out on the MIT SuperCloud facility in collaboration
with the Lincoln Laboratory [RKB+ 18] with 16 Intel Xeon E5-2650 cores. The code for all
the numerical examples is available online at [SB19].
Shifted geometric mean. As in most common benchmarks [Mit], we make use of the
normalized shifted geometric mean to compare the timings of the various solvers. Given the
time $t_{p,s}$ required by solver $s$ to solve problem $p$, we define the shifted geometric mean as
$$ g_s = \sqrt[n]{\prod_p (t_{p,s} + k)} - k, $$
where $n$ is the number of problem instances considered and $k = 1$ is the shift [Mit]. The
normalized shifted geometric mean is therefore
$$ r_s = g_s / \min_s g_s. $$
This value is the factor by which a specific solver is slower than the fastest one, which has a
scaled value of 1.00. If solver $s$ fails at solving problem $p$, we set the time as the maximum
allowed, i.e., $t_{p,s} = 1000$ sec. Note that, to avoid numerical overflow in the product, in practice we
compute the shifted geometric mean in log space as $g_s = \exp\big(\frac{1}{n}\sum_p \ln(t_{p,s} + k)\big) - k$.
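The shifted geometric mean and its normalized version, computed in log space as described above. Representing a failure as `None` (charged the 1000 sec limit) is our own convention for this sketch, and the function names are hypothetical.

```python
import math


def shifted_geometric_mean(times, shift=1.0):
    """g = (prod (t + k))^(1/n) - k, computed in log space to avoid overflow."""
    return math.exp(sum(math.log(t + shift) for t in times) / len(times)) - shift


def normalized_sgm(times_by_solver, shift=1.0, time_limit=1000.0):
    """r_s = g_s / min_s g_s; failures (None) are charged the time limit."""
    g = {s: shifted_geometric_mean([time_limit if t is None else t for t in ts], shift)
         for s, ts in times_by_solver.items()}
    best = min(g.values())
    return {s: gs / best for s, gs in g.items()}


print(normalized_sgm({"solver_a": [1.0, 3.0], "solver_b": [1.0, 1.0]}))
```

With the shift $k = 1$, very fast solves (times near zero) contribute factors near 1 rather than driving the product toward zero, which is the usual motivation for shifting.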
Performance profiles. We also make use of the performance profiles [DM02] to compare
the solver timings. We define the performance ratio
$$ u_{p,s} = \frac{t_{p,s}}{\min_s t_{p,s}} $$
and the performance profile of solver $s$ as
$$ f_s(\tau) = \frac{1}{n} \sum_p \mathcal{I}_{\le\tau}(u_{p,s}), $$
where $\mathcal{I}_{\le\tau}(u_{p,s}) = 1$ if $u_{p,s} \le \tau$ and 0 otherwise. The value $f_s(\tau)$ corresponds to the fraction of problems solved within a factor $\tau$ of the best solver. Note that while we cannot necessarily assess the performance of one solver relative to another with performance profiles, they still represent a viable choice to benchmark the performance of a solver with respect to the best one [GS16].
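A sketch of the performance profile computation defined above. The text does not say how failures enter $u_{p,s}$; here a failure (`None`) is assumed to give an infinite ratio, so it is never counted as solved. The function name is hypothetical, and all recorded times are assumed positive.

```python
import math


def performance_profile(times, taus):
    """f_s(tau) for each solver; times[s][p] is a runtime or None for a failure."""
    solvers = list(times)
    n_prob = len(times[solvers[0]])
    ratios = {s: [] for s in solvers}
    for p in range(n_prob):
        solved = [times[s][p] for s in solvers if times[s][p] is not None]
        best = min(solved) if solved else None
        for s in solvers:
            t = times[s][p]
            # u_{p,s} = t_{p,s} / min_s t_{p,s}
            ratios[s].append(math.inf if t is None or best is None else t / best)
    return {s: [sum(u <= tau for u in ratios[s]) / n_prob for tau in taus] for s in solvers}


print(performance_profile({"A": [1.0, 2.0, 4.0], "B": [2.0, 1.0, None]}, taus=[1.0, 2.0]))
```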
Results. We show in Figures 1 and 2 the OSQP and GUROBI computation times across all the problem classes for low and high accuracy solutions, respectively. OSQP is competitive with, or even faster than, GUROBI for several problem classes. Results are shown in Table 1 and Figure 3. OSQP shows the best performance across these benchmarks. Of the other solvers, MOSEK performs better at low accuracy and GUROBI at high accuracy. ECOS is generally slower than the other interior-point solvers but faster than qpOASES, which struggles with problems that have many constraints.
Table 1: Benchmark problems comparison with timings as shifted geometric mean and failure rates.
                                           OSQP     GUROBI   MOSEK    ECOS     qpOASES
Shifted geometric means   Low accuracy     1.000    4.285    2.522    28.847   149.932
                          High accuracy    1.000    1.886    6.234    52.718   66.254
Failure rates [%]         Low accuracy     0.000    1.429    0.071    20.714   31.857
                          High accuracy    0.000    1.429    11.000   45.571   31.714
Mean polish success [%]   Low accuracy     42.79
                          High accuracy    83.21
Table 2 contains the OSQP statistics for this benchmark class. Because OSQP converges well on these problems, the setup time is significant compared to the solve time, especially at low accuracy. Solution polishing increases the solution time by a median of 10 to 20 percent because of the additional factorization it requires. The worst-case time increase is very high and occurs for problems that converge in very few iterations. Note that polishing succeeds in 83% of test cases at high accuracy, but in only about 43% of cases at low accuracy. The number of ρ updates is in general very low, usually requiring just one additional matrix factorization, with up to 5 refactorizations used in the worst case when solving with high accuracy.
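The tables report timings as shifted geometric means normalized so that the fastest solver scores 1.000. The following is a minimal sketch of that computation. The shift value of 10 and the treatment of failures (assumed to be replaced by a time limit before averaging) are our assumptions, and the function names are hypothetical.

```python
import math


def shifted_geometric_mean(times, shift=10.0):
    """(prod_p (t_p + shift))**(1/n) - shift, evaluated in log space to avoid overflow."""
    n = len(times)
    log_sum = sum(math.log(t + shift) for t in times)
    return math.exp(log_sum / n) - shift


def normalized_sgm(times_by_solver, shift=10.0):
    """Shifted geometric mean of every solver divided by the smallest one (best solver -> 1.0)."""
    sgm = {s: shifted_geometric_mean(t, shift) for s, t in times_by_solver.items()}
    best = min(sgm.values())
    return {s: g / best for s, g in sgm.items()}
```

The shift reduces the influence of very small times. Without it, a handful of near-instant solves would dominate the geometric mean.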
Table 3: SuiteSparse matrix problems comparison with timings as shifted geometric mean and failure rates.
                                           OSQP     GUROBI   MOSEK
Shifted geometric means   Low accuracy     1.000    1.630    1.745
                          High accuracy    1.000    1.489    4.498
Failure rates [%]         Low accuracy     0.000    14.286   12.500
                          High accuracy    1.786    16.071   33.929
Mean polish success [%]   Low accuracy     67.86
                          High accuracy    78.18
Results. Results are shown in Table 3 and Figure 4. OSQP shows the best performance, with GUROBI slightly slower and MOSEK third. The failure rates for GUROBI and MOSEK are higher because their reported solutions do not satisfy the optimality conditions of the original problem. We display the OSQP statistics in Table 4. The setup phase takes a significant amount of time compared to the solve phase, especially when OSQP converges in a few iterations. This is because the large problem dimensions result in a long initial factorization time. Polishing generally takes 22 to 32% of the total solution time. However, it is fairly reliable, succeeding in 78% of cases at high accuracy (68% at low accuracy) and returning very high-quality solutions. The number of matrix refactorizations required by ρ updates is very low in these examples, with a maximum of 2 or 3 even at high accuracy.
Table 5: Maros-Mészáros problems comparison with timings as shifted geometric mean and failure rates.
                                           OSQP     GUROBI   MOSEK
Shifted geometric means   Low accuracy     1.464    1.000    6.121
                          High accuracy    5.247    1.000    14.897
Failure rates [%]         Low accuracy     1.449    2.174    14.493
                          High accuracy    10.145   2.899    30.435
Mean polish success [%]   Low accuracy     30.15
                          High accuracy    37.90
linear algebra.
Results. Results are shown in Table 5 and Figure 5. GUROBI shows the best performance. OSQP is slower but still competitive in both the low and high accuracy tests. MOSEK remains the slowest in every case. Table 6 shows the OSQP statistics. These hard problems require a larger number of iterations to converge, so the setup time overhead relative to the solution time is generally lower than in the other benchmark sets. Moreover, since the problems are badly scaled and degenerate, the polishing strategy often fails, succeeding in only about 30% and 38% of cases at low and high accuracy, respectively. However, the median time increase from the polish step is less than 10% of the total computation time in both low and high accuracy modes. The number of ρ updates is usually very low, with a median of 1 or 2. For some worst-case problems, however, it is very high because the bad scaling causes issues in our ρ estimation. Nevertheless, our data show that in more than 95% of the cases the number of ρ updates is less than 5.
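Poor scaling can make the normalized primal and dual residuals badly unbalanced, which in turn drives large swings in the estimated ρ. The following is a minimal sketch of a residual-balancing ρ estimate of the kind used in OSQP, together with a rule that refactors only when ρ changes by more than a fixed factor. The exact normalizations and the tolerance of 5 are our assumptions here, not something stated in this section, and the function names are hypothetical.

```python
import math


def estimate_rho(rho, r_prim, r_dual, ax_norm, z_norm, px_norm, aty_norm, q_norm,
                 eps=1e-30):
    """Residual-balancing estimate of a new ADMM step size rho.

    All arguments except rho and eps are infinity norms at the current iterate:
    r_prim = ||Ax - z||, r_dual = ||Px + q + A^T y||, plus the norms used to
    normalize them. eps guards against division by zero.
    """
    prim = r_prim / max(ax_norm, z_norm, eps)
    dual = r_dual / max(px_norm, aty_norm, q_norm, eps)
    # Relatively large primal residual -> increase rho; large dual residual -> decrease it.
    return rho * math.sqrt(prim / max(dual, eps))


def needs_refactorization(rho_old, rho_new, tolerance=5.0):
    """Refactor the KKT matrix only if rho changed by more than a factor `tolerance`."""
    return rho_new > tolerance * rho_old or rho_new < rho_old / tolerance
```

Because a refactorization is triggered only by large relative changes, well-scaled problems typically settle after one or two updates. Badly scaled ones can keep oscillating across the threshold.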
8.4 Warm start and factorization caching
To show the benefits of warm starting and factorization caching, we solved a sequence of
QPs using OSQP with the data varying according to some parameters. Since we are not
comparing OSQP with other high accuracy solvers in these benchmarks, we use its default
settings with accuracy 10^{-3}.
Lasso regularization path. We solved the Lasso problem described in Appendix A.5 for varying λ in order to choose a regressor with good validation set performance. We solved one problem instance for each of n = 50, 100, 150, 200 features, with m = 100n data points and λ taking 100 logarithmically spaced values between λ_max = ‖A^T b‖_∞ and 0.01λ_max.
Since the parameters only enter linearly in the cost, we can reuse the matrix factorization
and enable warm starting to reduce the computation time as discussed in Section 6.
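The λ grid described above takes only a few lines to generate. The following sketch uses dense list-of-lists matrices. Ordering the grid from λ_max downward is our choice: consecutive problems then have nearby solutions, which is what makes warm starting pay off. The function name is hypothetical. Since λ enters only the linear cost, each grid point changes only q. With OSQP's Python interface this would correspond to an `update` of q between solves while the factorization is kept.

```python
import math


def lambda_grid(A, b, num=100, ratio=0.01):
    """Log-spaced lambda values from lambda_max = ||A^T b||_inf down to ratio * lambda_max.

    A is a dense m x n matrix given as a list of rows, b a list of length m.
    """
    m, n = len(A), len(A[0])
    # A^T b, computed one column of A at a time.
    atb = [sum(A[i][j] * b[i] for i in range(m)) for j in range(n)]
    lam_max = max(abs(v) for v in atb)
    hi, lo = math.log10(lam_max), math.log10(ratio * lam_max)
    step = (lo - hi) / (num - 1)
    # Decreasing sequence: each solve can be warm started from the previous one.
    return [10 ** (hi + k * step) for k in range(num)]
```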
Model predictive control. In MPC, we solve the optimal control problem described in Appendix A.3 at each time step to compute an optimal input sequence over the horizon. Then, we apply only the first input to the system and propagate the state to the next time step. The whole procedure is repeated with an updated initial state x_init. We solved the control problem with n_x = 20, 40, 60, 80 states, n_u = n_x/2 inputs, horizon T = 10 and 100 simulation steps. The initial state of the simulation is uniformly distributed and constrained to lie within the feasible region, i.e., x_init ∼ U(−0.5x̄, 0.5x̄).
Since the parameters only enter linearly in the constraint bounds, we can reuse the
matrix factorization and enable warm starting to reduce the computation time as discussed
in Section 6.
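The benefit of factorization caching can be illustrated apart from OSQP. The KKT matrix is factored once, and each new right-hand side (for example, a new x_init entering the bounds) then costs only two triangular solves and a diagonal scaling. The dense, pivot-free LDL^T below is a toy stand-in that works for quasi-definite matrices such as [[P + σI, A^T], [A, −ρ^{-1}I]]. It is not OSQP's sparse implementation, and the function names are hypothetical.

```python
def ldl_factor(K):
    """Dense LDL^T factorization without pivoting: K = L diag(d) L^T.

    Exists for symmetric quasi-definite matrices (positive definite (1,1) block,
    negative definite (2,2) block); L is unit lower triangular.
    """
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        L[j][j] = 1.0
        d[j] = K[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (K[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d


def ldl_solve(L, d, rhs):
    """Solve K x = rhs with cached factors: forward solve, diagonal scaling, backward solve."""
    n = len(d)
    y = list(rhs)
    for i in range(n):                      # L z = rhs
        y[i] -= sum(L[i][k] * y[k] for k in range(i))
    y = [y[i] / d[i] for i in range(n)]     # diag(d) w = z
    for i in reversed(range(n)):            # L^T x = w
        y[i] -= sum(L[k][i] * y[k] for k in range(i + 1, n))
    return y
```

In a receding-horizon loop, `ldl_factor` would run once before the simulation and `ldl_solve` at every step. This split mirrors the setup/solve timings reported in the tables.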
Portfolio back test. Consider the portfolio optimization problem in Appendix A.4 with n = 10k assets and k = 100, 200, 300, 400 factors.
We ran a 4-year back test to compute the optimal asset investments under varying expected returns and factor models [BBD+ 17]. We solved 240 QPs per year, for a total of 960 QPs: each month we solved 20 QPs, one for each trading day. Every day, we updated the expected returns µ by randomly generating a new vector with µ_i ∼ 0.9µ̂_i + N(0, 0.1), where µ̂_i comes from the previous expected returns. The risk model was updated every month by updating the nonzero elements of D and F according to D_ii ∼ 0.9D̂_ii + U[0, 0.1√k] and F_ij ∼ 0.9F̂_ij + N(0, 0.1), where D̂_ii and F̂_ij come from the previous risk model.
As discussed in Section 6, we exploited the following computations during the QP updates
to reduce the computation times. Since µ only enters in the linear part of the objective, we
can reuse the matrix factorization and enable warm starting. Since the sparsity patterns of
D and F do not change during the monthly updates, we can reuse the symbolic factorization
and exploit warm starting to speed up the computations.
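The update rules above touch only entries that are already nonzero, which is why the sparsity pattern, and hence the symbolic factorization, survives every monthly update. The following is a minimal sketch of the two updates. It assumes that the 0.1 in N(0, 0.1) is a standard deviation (the text does not say whether it is a variance) and that F is stored as a dictionary of its nonzero entries. Both choices and the function names are hypothetical.

```python
import math
import random


def update_expected_returns(mu_prev, rng, noise_std=0.1):
    """Daily update mu_i ~ 0.9 * mu_prev_i + N(0, noise_std^2)."""
    return [0.9 * m + rng.gauss(0.0, noise_std) for m in mu_prev]


def update_risk_model(D_prev, F_prev, k, rng, noise_std=0.1):
    """Monthly update of the factor risk model.

    D_prev: diagonal entries of D; F_prev: dict {(i, j): value} of nonzeros of F.
    Only existing nonzeros are modified, so the sparsity pattern of F is unchanged.
    """
    D_new = [0.9 * d + rng.uniform(0.0, 0.1 * math.sqrt(k)) for d in D_prev]
    F_new = {ij: 0.9 * f + rng.gauss(0.0, noise_std) for ij, f in F_prev.items()}
    return D_new, F_new
```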
Results. We show the results in Table 7. For the Lasso problem we see more than a 10-fold improvement in time and an 8 to 11 times reduction in the number of iterations, depending on the dimension. For the MPC problem, the number of iterations does not decrease significantly (except for n_x = 20) because it is already low with a cold start. However, factorization caching yields a 2.7 to 4-fold time improvement. For the portfolio problem, OSQP shows a 5.8 to 7 times reduction in time and a 2.9 to 3.7 times reduction in the number of iterations.
Table 7: OSQP parametric problem results with warm start (ws) and without warm start (no ws) in terms of time in seconds and number of iterations for different leading problem dimensions of Lasso, MPC and Portfolio classes.
                     Time [s]   Time [s]   Time      Iter      Iter     Iter
Problem     dim.     no ws      ws         improv.   no ws     ws       improv.
Lasso       50       0.225      0.012      19.353    210.250   25.750   8.165
            100      0.423      0.040      10.556    224.000   25.750   8.699
            150      1.022      0.086      11.886    235.500   25.750   9.146
            200      2.089      0.149      13.986    281.750   26.000   10.837
MPC         20       0.007      0.002      4.021     89.500    32.750   2.733
            40       0.014      0.005      2.691     29.000    27.250   1.064
            60       0.035      0.013      2.673     33.750    33.000   1.023
            80       0.067      0.022      3.079     32.000    31.750   1.008
Portfolio   100      0.177      0.030      5.817     93.333    25.417   3.672
            200      0.416      0.061      6.871     86.875    25.391   3.422
            300      0.646      0.097      6.635     80.521    25.521   3.155
            400      0.976      0.139      7.003     76.458    26.094   2.930
9 Conclusions
We presented a novel general-purpose QP solver based on ADMM. Our method uses a new splitting that requires solving a quasi-definite linear system that is always solvable, regardless of the problem data. We impose no assumptions on the problem data other than convexity, resulting in a general-purpose and very robust algorithm.
For the first time, we propose a first-order QP solution method able to provide primal and dual infeasibility certificates for unsolvable problems without resorting to homogeneous self-dual embedding or adding complexity to the iterations.
In contrast to other first-order methods, our solver can provide high-quality solutions by performing solution polishing. After guessing which constraints are active, we compute the solution of an additional small equality-constrained QP by solving a linear system. If the constraints are identified correctly, the returned solution has accuracy equal to or higher than that of interior-point methods.
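The polishing step reduces to one linear solve once the active constraints are fixed. The following sketch takes the guessed lower- and upper-active index sets as given (how they are guessed is not shown here). It drops all other constraints and solves the KKT system [[P, A_act^T], [A_act, 0]] (x, y) = (−q, b_act) by plain Gaussian elimination. No regularization is added, so a singular reduced system makes the sketch fail with a division by zero. The function names are hypothetical.

```python
def solve_linear(M, rhs):
    """Dense solve of M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def polish(P, q, A, l, u, lower_active, upper_active):
    """Solve the equality-constrained QP defined by the guessed active constraints.

    Rows in lower_active are enforced as A_i x = l_i, rows in upper_active as
    A_i x = u_i; all remaining constraints are dropped. Returns (x, y_active).
    """
    n = len(q)
    rows = [(i, l[i]) for i in lower_active] + [(i, u[i]) for i in upper_active]
    k = len(rows)
    # Reduced KKT matrix [[P, A_act^T], [A_act, 0]].
    K = [[0.0] * (n + k) for _ in range(n + k)]
    for i in range(n):
        K[i][:n] = P[i]
    for r, (i, _) in enumerate(rows):
        for j in range(n):
            K[n + r][j] = A[i][j]
            K[j][n + r] = A[i][j]
    rhs = [-qi for qi in q] + [bound for _, bound in rows]
    sol = solve_linear(K, rhs)
    return sol[:n], sol[n:]
```

For instance, minimizing (1/2)(x_1^2 + x_2^2) − x_1 − x_2 subject to x_1 + x_2 ≤ 1 with the single constraint guessed upper-active gives x = (0.5, 0.5) with multiplier 0.5. If the guess is wrong, the returned point may violate a dropped constraint, so its quality must be checked before it replaces the ADMM solution.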
The proposed method is easily warm started to reduce the number of iterations. If the problem matrices do not change, the factorization of the linear system matrix can be cached and reused across multiple solves, greatly reducing the computation time. This technique can be extremely effective, especially when solving parametric QPs where only part of the problem data changes.
We have implemented our algorithm in the open-source OSQP solver written in C and
interfaced with multiple other languages and parsers. OSQP is based on sparse linear algebra
and is able to exploit the structure of QPs arising in different application areas. OSQP is
robust against noisy and unreliable data and, after the first factorization is computed, can be
compiled to be library-free and division-free, making it suitable for embedded applications.
Thanks to its simple and parallelizable iterations, OSQP can handle large-scale problems
with millions of nonzeros.
We extensively benchmarked the OSQP solver with problems arising in several application domains including finance, control and machine learning. In addition, we benchmarked it against the hard problems from the Maros-Mészáros test set [MM99] and Lasso and Huber fitting problems generated with sparse matrices from the SuiteSparse Matrix Collection [DH11]. Timing and failure rate results showed great improvements over state-of-the-art academic and commercial QP solvers.
OSQP already has a large user base, with tens of thousands of users from both top academic institutions and large corporations.
A Problem classes
In this section we describe the random problem classes used in the benchmarks and derive
formulations with explicit linear equalities and inequalities that can be directly written in
the form Ax ∈ C with C = [l, u].
A.1 Random QP
Consider the following QP
\begin{array}{ll} \mbox{minimize} & (1/2) x^T P x + q^T x \\ \mbox{subject to} & l \le Ax \le u. \end{array}
Problem instances. The numbers of variables and constraints in our problem instances are n and m = 10n, respectively. We generated a random matrix P = MM^T + αI, where M ∈ R^{n×n} has 15% nonzero elements M_ij ∼ N(0, 1). We add the regularization αI with α = 10^{-2} to ensure that the problem is not unbounded. We set the elements of A ∈ R^{m×n} as A_ij ∼ N(0, 1) with only 15% being nonzero. The linear part of the cost is normally distributed, i.e., q_i ∼ N(0, 1). We generated the constraint bounds as u_i ∼ U(0, 1), l_i ∼ −U(0, 1).
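The instance recipe above can be sketched in pure Python with dense lists. We assume that "15% nonzero" means each entry is independently nonzero with probability 0.15, which is one plausible reading of the text. The function names are hypothetical.

```python
import random


def sparse_gaussian(rows, cols, density, rng):
    """Dense list-of-lists matrix with N(0, 1) entries kept with probability `density`."""
    return [[rng.gauss(0.0, 1.0) if rng.random() < density else 0.0
             for _ in range(cols)] for _ in range(rows)]


def random_qp(n, seed=0, density=0.15, alpha=1e-2):
    """Random QP data (P, q, A, l, u) with m = 10 n and P = M M^T + alpha I."""
    rng = random.Random(seed)
    m = 10 * n
    M = sparse_gaussian(n, n, density, rng)
    # M M^T is positive semidefinite; adding alpha I makes P positive definite.
    P = [[sum(M[i][k] * M[j][k] for k in range(n)) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    q = [rng.gauss(0.0, 1.0) for _ in range(n)]
    A = sparse_gaussian(m, n, density, rng)
    u = [rng.uniform(0.0, 1.0) for _ in range(m)]
    l = [-rng.uniform(0.0, 1.0) for _ in range(m)]
    return P, q, A, l, u
```

Since l ≤ 0 ≤ u componentwise, x = 0 is always feasible. Together with the positive definite P, this guarantees that every generated instance has an optimal solution.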
A.2 Equality constrained QP
Consider the following equality constrained QP
\begin{array}{ll} \mbox{minimize} & (1/2) x^T P x + q^T x \\ \mbox{subject to} & Ax = b. \end{array}
Problem instances. The numbers of variables and constraints in our problem instances are n and m = ⌊n/2⌋, respectively.
We generated a random matrix P = MM^T + αI, where M ∈ R^{n×n} has 15% nonzero elements M_ij ∼ N(0, 1). We add the regularization αI with α = 10^{-2} to ensure that the problem is not unbounded. We set the elements of A ∈ R^{m×n} as A_ij ∼ N(0, 1) with only 15% being nonzero. The vectors are all normally distributed, i.e., q_i, b_i ∼ N(0, 1).
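Iterative refinement interpretation. The solution of the above problem can be found directly by solving the linear system
\begin{bmatrix} P & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ \nu \end{bmatrix} = \begin{bmatrix} -q \\ b \end{bmatrix}. \qquad (35)
If we apply the ADMM iterations (15)–(19) to the above problem and set α = 1 and y^0 = b, the algorithm reduces to the iteration
\begin{bmatrix} x^{k+1} \\ \nu^{k+1} \end{bmatrix} = \begin{bmatrix} x^k \\ \nu^k \end{bmatrix} + \begin{bmatrix} P + \sigma I & A^T \\ A & -\rho^{-1} I \end{bmatrix}^{-1} \left( \begin{bmatrix} -q \\ b \end{bmatrix} - \begin{bmatrix} P & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x^k \\ \nu^k \end{bmatrix} \right),
which is equivalent to (31) with g = (−q, b) and t̂^k = (x^k, ν^k). This means that Algorithm 1 applied to an equality-constrained QP is equivalent to applying iterative refinement [Wil63, DER89] to the KKT system (35). Note that the perturbation matrix in this case is
\Delta K = \begin{bmatrix} \sigma I & 0 \\ 0 & -\rho^{-1} I \end{bmatrix},
which justifies using a low value of σ and a high value of ρ for equality constraints, since a smaller perturbation ΔK makes the refinement iterations converge faster.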
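The effect of the perturbation size can be checked numerically on a tiny instance. The following sketch applies the iteration above to K = [[P, A^T], [A, 0]] with P = 2, A = 1, q = 0 and b = 1, whose solution is (x, ν) = (1, −2). The error contracts by the matrix (K + ΔK)^{-1}ΔK at each step, so a small ΔK (σ = 10^{-6}, ρ = 10^6) converges within a handful of iterations, while σ = ρ = 1 contracts only by about a factor of 2 per step. The test instance and function names are our own illustration.

```python
def solve(M, rhs):
    """Dense solve of M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def iterative_refinement(K, dK, g, t0, iters):
    """Iterate t <- t + (K + dK)^{-1} (g - K t); the fixed point solves K t = g.

    A real implementation factors K + dK once and reuses the factors; this toy
    version re-solves from scratch at every step for brevity.
    """
    n = len(g)
    Kp = [[K[i][j] + dK[i][j] for j in range(n)] for i in range(n)]
    t = list(t0)
    for _ in range(iters):
        residual = [g[i] - sum(K[i][j] * t[j] for j in range(n)) for i in range(n)]
        t = [ti + ci for ti, ci in zip(t, solve(Kp, residual))]
    return t
```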
The states x_t ∈ R^{n_x} and the inputs u_t ∈ R^{n_u} are subject to polyhedral constraints defined by the sets X and U. The horizon length is T and the initial state is x_init ∈ R^{n_x}. Matrices Q ∈ S^{n_x}_+ and R ∈ S^{n_u}_{++} define the state and input costs at each stage of the horizon, and Q_T ∈ S^{n_x}_+ defines the final stage cost.
By defining the new variable z = (x_0, …, x_T, u_0, …, u_{T−1}), problem (36) can be written as a sparse QP of the form (2) with a total of n_x(T + 1) + n_u T variables.
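The stacking of z determines where each state and input lives in the QP variable. The following is a small sketch of that index layout. The helper names are hypothetical. For n_x = 20, n_u = 10 and T = 10 it gives 20 · 11 + 10 · 10 = 320 variables.

```python
def mpc_layout(nx, nu, T):
    """Index layout of z = (x_0, ..., x_T, u_0, ..., u_{T-1}).

    Returns the total number of variables and two helpers giving the slice of z
    that holds x_t (t = 0..T) or u_t (t = 0..T-1).
    """
    n_vars = nx * (T + 1) + nu * T
    u_offset = nx * (T + 1)  # all states come first, then all inputs

    def x_slice(t):
        return slice(t * nx, (t + 1) * nx)

    def u_slice(t):
        return slice(u_offset + t * nu, u_offset + (t + 1) * nu)

    return n_vars, x_slice, u_slice
```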
Problem instances. We defined the linear systems with n = n_x states and n_u = 0.5n_x inputs. We set the horizon length to T = 10. We generated the dynamics as A = I + ∆ with ∆_ij ∼ N(0, 0.01). We kept only stable dynamics, i.e., matrices A whose eigenvalues all have magnitude less than 1. The input matrix B has entries B_ij ∼ N(0, 1).
The state cost is defined as Q = diag(q), where q_i ∼ U(0, 10) and 70% of the elements of q are nonzero. We chose the input cost as R = 0.1I. The terminal cost Q_T is chosen as the optimal cost for the linear quadratic regulator (LQR) applied to A, B, Q, R by solving a discrete algebraic Riccati equation (DARE) [BBM17]. We generated input and state constraints as
X = {x_t ∈ R^{n_x} | −x̄ ≤ x_t ≤ x̄},   U = {u_t ∈ R^{n_u} | −ū ≤ u_t ≤ ū},
where x̄_i ∼ U(1, 2) and ū_i ∼ U(0, 0.1). The initial state is uniformly distributed with x_init ∼ U(−0.5x̄, 0.5x̄).
γ > 0 the risk aversion parameter, and Σ ∈ S^n_+ the risk model covariance matrix. The risk model is usually assumed to be the sum of a diagonal and a rank k < n matrix
\Sigma = F F^T + D,
where F ∈ R^{n×k} is the factor loading matrix and D ∈ R^{n×n} is a diagonal matrix describing the asset-specific risk.
We introduce a new variable y = F^T x and solve the resulting problem in variables x and y
\begin{array}{ll} \mbox{minimize} & x^T D x + y^T y - \gamma^{-1} \mu^T x \\ \mbox{subject to} & y = F^T x \\ & \mathbf{1}^T x = 1 \\ & x \ge 0. \end{array} \qquad (37)
Note that the Hessian of the objective in (37) is a diagonal matrix. Also, observe that FF^T does not appear in problem (37).
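The reformulation relies on the identity x^T Σ x = x^T D x + y^T y with y = F^T x. The following sketch checks that identity numerically. It compares a direct evaluation through the dense n × n matrix FF^T + D with the factor form, which needs only n × k work and never builds FF^T. The function names are hypothetical.

```python
def risk_direct(x, F, D):
    """x^T (F F^T + D) x, forming the dense covariance explicitly."""
    n, k = len(F), len(F[0])
    Sigma = [[sum(F[i][c] * F[j][c] for c in range(k)) + (D[i] if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    return sum(x[i] * Sigma[i][j] * x[j] for i in range(n) for j in range(n))


def risk_factor(x, F, D):
    """Same quantity via y = F^T x: x^T D x + y^T y (D given by its diagonal)."""
    n, k = len(F), len(F[0])
    y = [sum(F[i][c] * x[i] for i in range(n)) for c in range(k)]
    return sum(D[i] * x[i] ** 2 for i in range(n)) + sum(yc ** 2 for yc in y)
```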
Problem instances. We generated portfolio problems for an increasing number of factors k and number of assets n = 100k. The elements of matrix F were chosen as F_ij ∼ N(0, 1) with 50% nonzero elements. The diagonal matrix D is chosen as D_ii ∼ U[0, √k]. The mean return was generated as µ_i ∼ N(0, 1). We set γ = 1.
A.5 Lasso
The least absolute shrinkage and selection operator (Lasso) is a well-known linear regression technique obtained by adding an ℓ1 regularization term to the objective [Tib96, CWB08]. It can be formulated as
\begin{array}{ll} \mbox{minimize} & \|Ax - b\|_2^2 + \lambda \|x\|_1, \end{array}
where x ∈ R^n is the vector of parameters, A ∈ R^{m×n} is the data matrix and λ is the weighting parameter.
We convert this problem to the following QP
\begin{array}{ll} \mbox{minimize} & y^T y + \lambda \mathbf{1}^T t \\ \mbox{subject to} & y = Ax - b \\ & -t \le x \le t. \end{array}
Problem instances. The elements of matrix A are generated as A_ij ∼ N(0, 1) with 15% nonzero elements. To construct the vector b, we generated the true sparse vector v ∈ R^n to be learned as
v_i \sim \begin{cases} 0 & \text{with probability } p = 0.5 \\ \mathcal{N}(0, 1/n) & \text{otherwise.} \end{cases}
Then we let b = Av + ε, where the noise is generated as ε_i ∼ N(0, 1). We generated the instances with varying n features and m = 100n data points. The parameter λ is chosen as (1/5)‖A^T b‖_∞, since ‖A^T b‖_∞ sets the scale of the critical value above which the solution of the problem is x = 0 (for the objective above, this critical value is exactly 2‖A^T b‖_∞).
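To make the mapping onto the form l ≤ Ax ≤ u concrete, the following sketch builds the QP data of the Lasso reformulation with dense lists, stacking the variables as z = (x, y, t). The ordering of variables and constraint blocks is our choice, and the function name is hypothetical. The objective y^T y is written as (1/2)y^T(2I)y, and each of the three constraint families becomes one block of rows.

```python
import math


def lasso_qp(A, b, lam):
    """Data (P, q, Aqp, l, u) of the Lasso QP in the form l <= Aqp z <= u, z = (x, y, t).

    A is a dense m x n matrix given as a list of rows; x, t are in R^n, y in R^m.
    """
    m, n = len(A), len(A[0])
    nz = 2 * n + m
    # y^T y = (1/2) y^T (2I) y, so only the y block of P is nonzero.
    P = [[0.0] * nz for _ in range(nz)]
    for i in range(m):
        P[n + i][n + i] = 2.0
    # Linear cost lam * 1^T t acts on the t block only.
    q = [0.0] * (n + m) + [lam] * n
    Aqp, l, u = [], [], []
    for i in range(m):
        # y = Ax - b written as A_i x - y_i = b_i (equality: l_i = u_i = b_i).
        row = [0.0] * nz
        row[:n] = A[i]
        row[n + i] = -1.0
        Aqp.append(row)
        l.append(b[i])
        u.append(b[i])
    for j in range(n):
        # x_j <= t_j  <=>  x_j - t_j <= 0.
        row = [0.0] * nz
        row[j], row[n + m + j] = 1.0, -1.0
        Aqp.append(row)
        l.append(-math.inf)
        u.append(0.0)
    for j in range(n):
        # -t_j <= x_j  <=>  x_j + t_j >= 0.
        row = [0.0] * nz
        row[j], row[n + m + j] = 1.0, 1.0
        Aqp.append(row)
        l.append(0.0)
        u.append(math.inf)
    return P, q, Aqp, l, u
```

At z = (x, Ax − b, |x|) the QP objective equals ‖Ax − b‖_2^2 + λ‖x‖_1, since t = |x| is the smallest t satisfying −t ≤ x ≤ t.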
Problem (38) is equivalent to the following QP [MM00, Eq. (24)]
\begin{array}{ll} \mbox{minimize} & u^T u + 2M \mathbf{1}^T (r + s) \\ \mbox{subject to} & Ax - b - u = r - s \\ & r \ge 0 \\ & s \ge 0. \end{array}
Problem instances. We generate the elements of A as A_ij ∼ N(0, 1) with 15% nonzero elements. To construct b ∈ R^m we first generate a vector v ∈ R^n as v_i ∼ N(0, 1/n) and a noise vector ε ∈ R^m with elements
\varepsilon_i \sim \begin{cases} \mathcal{N}(0, 1/4) & \text{with probability } p = 0.95 \\ \mathcal{U}[0, 10] & \text{otherwise.} \end{cases}
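The QP equivalence can be checked one residual at a time. The following sketch assumes the standard Huber penalty φ(a) = a^2 if |a| ≤ M and M(2|a| − M) otherwise (problem (38) itself is not reproduced in this section). For a scalar residual a, the QP reduces to minimizing u^2 + 2M(r + s) subject to a − u = r − s with r, s ≥ 0. For fixed u the cheapest split gives r + s = |a − u|, and a brute-force search over u then recovers φ(a). The function names are hypothetical.

```python
def huber(a, M):
    """Huber penalty (assumed form): a^2 if |a| <= M, else M (2|a| - M)."""
    return a * a if abs(a) <= M else M * (2.0 * abs(a) - M)


def huber_via_qp(a, M, points=20001, width=20.0):
    """Optimal value of  min u^2 + 2M(r + s)  s.t.  a - u = r - s,  r, s >= 0.

    For fixed u the cheapest feasible split is r = max(a - u, 0), s = max(u - a, 0),
    so r + s = |a - u|; the minimization over u is done by brute force on a grid.
    """
    best = float("inf")
    for k in range(points):
        u = -width / 2.0 + width * k / (points - 1)
        r, s = max(a - u, 0.0), max(u - a, 0.0)
        best = min(best, u * u + 2.0 * M * (r + s))
    return best
```

The quadratic term u^2 absorbs small residuals, while the slack variables r and s carry the part of a large residual beyond M at the linear price 2M. This is what makes the fit robust to the U[0, 10] outliers in the noise model.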
\begin{array}{ll} \mbox{minimize} & x^T x + \lambda \sum_{i=1}^m \max(0, b_i a_i^T x + 1), \end{array}
where b_i ∈ {−1, +1} is a set label and a_i is the vector of features for the i-th point. The problem can be equivalently represented as the following QP
\begin{array}{ll} \mbox{minimize} & x^T x + \lambda \mathbf{1}^T t \\ \mbox{subject to} & t \ge \operatorname{diag}(b) A x + \mathbf{1} \\ & t \ge 0, \end{array}
where diag(b) denotes the diagonal matrix with the elements of b on its diagonal.
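The following sketch builds the QP data for this SVM formulation in the form l ≤ Az ≤ u with z = (x, t), using dense lists. The variable ordering and the function name are our own choices. At the smallest feasible t, namely t_i = max(0, b_i a_i^T x + 1), the QP objective coincides with the original hinge-loss objective.

```python
import math


def svm_qp(A, b, lam):
    """Data (P, q, Aqp, l, u) of the SVM QP in the form l <= Aqp z <= u, z = (x, t).

    A is a dense m x n matrix (rows a_i^T), b a list of labels in {-1, +1}.
    """
    m, n = len(A), len(A[0])
    nz = n + m
    # x^T x = (1/2) x^T (2I) x; the t block has no quadratic term.
    P = [[2.0 if (i == j and i < n) else 0.0 for j in range(nz)] for i in range(nz)]
    q = [0.0] * n + [lam] * m
    Aqp, l, u = [], [], []
    for i in range(m):
        # t_i >= b_i a_i^T x + 1  <=>  b_i a_i^T x - t_i <= -1.
        row = [b[i] * a for a in A[i]] + [0.0] * m
        row[n + i] = -1.0
        Aqp.append(row)
        l.append(-math.inf)
        u.append(-1.0)
    for i in range(m):
        # t_i >= 0.
        row = [0.0] * nz
        row[n + i] = 1.0
        Aqp.append(row)
        l.append(0.0)
        u.append(math.inf)
    return P, q, Aqp, l, u
```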
[Figure 1 panels: Random QP, Eq QP, Portfolio, Lasso, SVM, Huber and Control; each plots computation time [s] against problem dimension N for GUROBI and OSQP.]
Figure 1: Computation time vs problem dimension for OSQP and GUROBI for low accuracy mode.
[Figure 2 panels: Random QP, Eq QP, Portfolio, Lasso, SVM, Huber and Control; each plots computation time [s] against problem dimension N for GUROBI and OSQP.]
Figure 2: Computation time vs problem dimension for OSQP and GUROBI for high accuracy mode.
[Figure: performance profiles (ratio of problems solved vs performance ratio τ) at low accuracy and high accuracy for OSQP, GUROBI, MOSEK, ECOS and qpOASES.]
References
[ABQ+ 99] F. Allgöwer, T. A. Badgwell, J. S. Qin, J. B. Rawlings, and S. J. Wright.
Nonlinear Predictive Control and Moving Horizon Estimation – An Introductory
Overview, pages 391–449. Springer London, London, 1999.
[Figure: performance profiles (ratio of problems solved vs performance ratio τ) at low accuracy and high accuracy for OSQP, GUROBI and MOSEK.]
[BBD+ 17] S. Boyd, E. Busseti, S. Diamond, R. N. Kahn, K. Koh, P. Nystrup, and J. Speth. Multi-period trading via convex optimization. Foundations and Trends in Optimization, 3(1):1–76, 2017.
[BBM17] F. Borrelli, A. Bemporad, and M. Morari. Predictive Control for Linear and Hybrid Systems. Cambridge University Press, 2017.
[Ben02] M. Benzi. Preconditioning techniques for large linear systems: a survey. Journal of Computational Physics, 182(2):418–477, 2002.
[Figure: performance profiles (ratio of problems solved vs performance ratio τ) at low accuracy and high accuracy for OSQP, GUROBI and MOSEK.]
[BG18] G. Banjac and P. Goulart. Tight global linear convergence rate bounds for operator splitting methods. IEEE Transactions on Automatic Control, 63(12):4126–4139, 2018.
[BMOW14] S. Boyd, M. T. Mueller, B. O’Donoghue, and Y. Wang. Performance bounds and suboptimal policies for multiperiod investment. Foundations and Trends in Optimization, 1(1):1–72, 2014.
[BPC+ 11] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[Bra10] A. Bradley. Algorithms for the equilibration of matrices and their application to limited-memory quasi-Newton methods. PhD thesis, Stanford University, 2010.
[BSM+ 17] G. Banjac, B. Stellato, N. Moehle, P. Goulart, A. Bemporad, and S. Boyd. Embedded code generation using the OSQP solver. In IEEE Conference on Decision and Control (CDC), 2017.
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[CT06] G. Cornuéjols and R. Tütüncü. Optimization Methods in Finance. Mathematics, Finance and Risk. Cambridge University Press, 2006.
[CV95] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[CWB08] E. J. Candès, M. B. Wakin, and S. Boyd. Enhancing sparsity by reweighted ℓ1 minimization. Journal of Fourier Analysis and Applications, 14(5):877–905, 2008.
[Dan63] G. B. Dantzig. Linear programming and extensions. Princeton University Press, Princeton, N.J., 1963.
[Dav05] T. A. Davis. Algorithm 849: a concise sparse Cholesky factorization package. ACM Transactions on Mathematical Software, 31(4):587–591, 2005.
[Dav06] T. A. Davis. Direct Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, 2006.
[DB16] S. Diamond and S. Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
[DB17] S. Diamond and S. Boyd. Stochastic matrix-free equilibration. Journal of Optimization Theory and Applications, 172(2):436–454, February 2017.
[DCB13] A. Domahidi, E. Chu, and S. Boyd. ECOS: An SOCP solver for embedded systems. In European Control Conference (ECC), pages 3071–3076, 2013.
[DER89] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct methods for sparse matrices. Oxford University Press, London, 1989.
[DFH09] M. Diehl, H. J. Ferreau, and N. Haverbeke. Efficient Numerical Methods for Nonlinear MPC and Moving Horizon Estimation, pages 391–417. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
[DH11] T. A. Davis and Y. Hu. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw., 38(1):1:1–1:25, December 2011.
[DHL17] I. Dunning, J. Huchette, and M. Lubin. JuMP: A modeling language for mathematical optimization. SIAM Review, 59(2):295–320, 2017.
[FB18] C. Fougner and S. Boyd. Parameter Selection and Preconditioning for a Graph Form Solver, pages 41–61. Springer International Publishing, 2018.
[FL98] R. Fletcher and S. Leyffer. Numerical experience with lower bounds for MIQP branch-and-bound. SIAM Journal on Optimization, 8(2):604–616, 1998.
[FW56] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1–2):95–110, 1956.
[GB15] P. Giselsson and S. Boyd. Metric selection in fast dual forward–backward splitting. Automatica, 62:1–10, 2015.
[GB17] P. Giselsson and S. Boyd. Linear convergence and metric selection for Douglas-Rachford splitting and ADMM. IEEE Transactions on Automatic Control, 62(2):532–544, February 2017.
[GM75] R. Glowinski and A. Marroco. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, 9(R2):41–76, 1975.
[GM76] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17–40, 1976.
[Gre97] A. Greenbaum. Iterative Methods for Solving Linear Systems. Society for Industrial and Applied Mathematics, 1997.
[GS16] N. Gould and J. Scott. A note on performance profiles for benchmarking software. ACM Trans. Math. Softw., 43(2):15:1–15:5, August 2016.
[GVL96] G. H. Golub and C. F. Van Loan. Matrix Computations (3rd Ed.). Johns Hopkins University Press, Baltimore, MD, USA, 1996.
[HYW00] B. S. He, H. Yang, and S. L. Wang. Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. Journal of Optimization Theory and Applications, 106(2):337–356, 2000.
[Int17] Intel Corporation. Intel Math Kernel Library. User’s Guide, 2017.
[Kel95] C. Kelley. Iterative Methods for Linear and Nonlinear Equations. Society for
Industrial and Applied Mathematics, 1995.
[KM70] V. Klee and G. Minty. How good is the simplex algorithm. Technical report,
Department of Mathematics, University of Washington, 1970.
[LM79] P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear
operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.
[MB12] J. Mattingley and S. Boyd. CVXGEN: A code generator for embedded convex
optimization. Optimization and Engineering, 13(1):1–27, 2012.
[MM00] O. L. Mangasarian and D. R. Musicant. Robust linear and support vector regression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(9):950–955, 2000.
[MOS17] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 8.0 (Revision 57), 2017.
[OCPB16] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd. Conic optimization via operator splitting and homogeneous self-dual embedding. Journal of Optimization Theory and Applications, 169(3):1042–1068, June 2016.
[PC11] T. Pock and A. Chambolle. Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In 2011 International Conference on Computer Vision, pages 1762–1769, November 2011.
[RM09] J. B. Rawlings and D. Q. Mayne. Model Predictive Control: Theory and Design. Nob Hill Publishing, 2009.
[Rui01] D. Ruiz. A scaling algorithm to equilibrate both rows and columns norms in matrices. Technical Report RAL-TR-2001-034, Rutherford Appleton Laboratory, Oxon, UK, 2001.
[SB19] B. Stellato and G. Banjac. Benchmark examples for the OSQP solver. https://github.com/oxfordcontrol/osqp_benchmarks, 2019.
[Tib96] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1):267–288, 1996.