
O(√T) Static Regret and Instance-Dependent Constraint Violation for Constrained Online Convex Optimization

Rahul Vaze, Abhishek Sinha
School of Technology and Computer Science
Tata Institute of Fundamental Research
Mumbai 400005, India
[email protected], [email protected]

arXiv:2502.05019v1 [cs.LG] 7 Feb 2025
February 10, 2025

Abstract
The constrained version of the standard online convex optimization (OCO) framework, called COCO, is considered, where on every round a convex cost function and a convex constraint function are revealed to the learner after it chooses the action for that round. The objective is to simultaneously minimize the static regret and the cumulative constraint violation (CCV). An algorithm is proposed that guarantees a static regret of O(√T) and a CCV of min{V, O(√T log T)}, where V depends on the distance between the consecutively revealed constraint sets, the shape of the constraint sets, the dimension of the action space, and the diameter of the action space. For special cases of constraint sets, V = O(1). Compared to the state-of-the-art guarantees of O(√T) static regret and O(√T log T) CCV, which are universal, the new bound on the CCV is instance dependent and is derived by exploiting the geometric properties of the constraint sets.

1 Introduction
In this paper, we consider the constrained version of the standard online convex optimization (OCO) framework, called constrained OCO or COCO. In COCO, on every round t, the online algorithm first chooses an admissible action xt ∈ X ⊂ R^d, and then the adversary chooses a convex loss/cost function ft : X → R and a constraint function of the form gt(x) ≤ 0, where gt : X → R is a convex function. Since the gt's are revealed after the action xt is chosen, an online algorithm need not necessarily take feasible actions on each round, and in addition to the static regret

    Regret[1:T] ≡ sup_{{ft}_{t=1}^T} sup_{x⋆ ∈ X} Regret_T(x⋆),  where  Regret_T(x⋆) ≡ Σ_{t=1}^T ft(xt) − Σ_{t=1}^T ft(x⋆),    (1)

an additional metric of interest is the total cumulative constraint violation (CCV), defined as

    CCV[1:T] ≡ Σ_{t=1}^T max(gt(xt), 0).    (2)

Let X ⋆ be the feasible set consisting of all admissible actions that satisfy all constraints gt (x) ≤ 0, t ∈ [T ].
Under the standard assumption that X ⋆ is not empty (called the feasibility assumption), the goal is to design

an online algorithm to simultaneously achieve a small regret (1) with respect to any admissible benchmark
x⋆ ∈ X ⋆ and a small CCV (2).
Since the constraint sets Gt = {x ∈ X : gt(x) ≤ 0} are convex for all t and, by assumption, X⋆ = ∩t Gt ≠ ∅, the sets St = ∩_{τ=1}^t Gτ are convex and nested, i.e., St ⊆ St−1 and X⋆ ⊆ St for all t. Essentially, the sets St are sufficient to quantify the CCV.
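The two metrics (1) and (2) can be computed directly from a played action sequence; below is a minimal sketch for a fixed benchmark x⋆ (the quadratic cost and linear constraint are illustrative choices, not from the paper):

```python
def regret_and_ccv(fs, gs, xs, x_star):
    """Static regret (1) against a fixed benchmark x_star, and CCV (2), for actions xs."""
    regret = sum(f(x) - f(x_star) for f, x in zip(fs, xs))
    ccv = sum(max(g(x), 0.0) for g, x in zip(gs, xs))
    return regret, ccv

# Toy instance: f_t(x) = (x - 1)^2 and g_t(x) = x - 0.5, so S_t = [0, 0.5] inside X = [0, 1].
T = 4
fs = [lambda x: (x - 1.0) ** 2] * T
gs = [lambda x: x - 0.5] * T
xs = [0.9, 0.7, 0.5, 0.5]       # the first two actions are infeasible
reg, ccv = regret_and_ccv(fs, gs, xs, x_star=0.5)
```

Here the CCV is 0.4 + 0.2 = 0.6, accrued only on the infeasible rounds, while the regret against x⋆ = 0.5 is negative since the infeasible actions have smaller cost, illustrating why both metrics must be controlled simultaneously.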

1.1 Prior Work


Constrained OCO (COCO): (A) Time-invariant constraints: COCO with time-invariant constraints, i.e., gt = g, ∀t (Yuan and Lamperski, 2018; Jenatton et al., 2016; Mahdavi et al., 2012; Yi et al., 2021), has been considered extensively, where the function g is assumed to be known to the algorithm a priori. The algorithm is allowed to take infeasible actions at any time so as to avoid the costly projection step of the vanilla projected OGD algorithm, and the main objective is to design an efficient algorithm with small regret and CCV while avoiding the explicit projection step.
(B) Time-varying constraints: The more difficult question is solving the COCO problem when the constraint functions gt change arbitrarily with time t. In this setting, all prior work on COCO made the feasibility assumption. One popular approach to COCO optimizes a Lagrangian function that is updated using the primal and dual variables (Yu et al., 2017; Sun et al., 2017; Yi et al., 2023). Alternatively, Neely and Yu (2017) and Liakopoulos et al. (2019) used the drift-plus-penalty (DPP) framework Neely (2010) to solve COCO, but these required additional assumptions, e.g., Slater's condition in Neely and Yu (2017), or a weaker form of the feasibility assumption.
More recently, Guo et al. (2022) obtained bounds similar to Neely and Yu (2017) but without assuming Slater's condition. However, the algorithm of Guo et al. (2022) is computationally intensive since it requires solving a convex optimization problem on each round. Finally, very recently, the state-of-the-art simultaneous guarantees of O(√T) regret and O(√T log T) CCV for COCO were derived in Sinha and Vaze (2024) with a very simple algorithm that combines the loss function at time t and the CCV accrued till time t into a single loss function, and then executes the online gradient descent (OGD) algorithm on the single loss function with an adaptive step size. Please refer to Table 1 for a brief summary of the prior results.
The COCO problem has been considered in the dynamic setting as well (Chen and Giannakis, 2018; Cao and Liu, 2018; Vaze, 2022; Liu et al., 2022), where the benchmark x⋆ in (1) is replaced by x⋆_t (x⋆_t = arg min_x ft(x)) that is also allowed to change over time. However, in this paper, we focus our entire attention on the static version. A special case of COCO is the online constraint satisfaction (OCS) problem that does not involve any cost function, i.e., ft = 0, ∀t, and the only object of interest is minimizing the CCV. The algorithm with the state-of-the-art guarantee for COCO Sinha and Vaze (2024) was shown to have a CCV of O(√T log T) for the OCS.

1.2 Convex Body Chasing Problem


A well-studied problem related to COCO is the nested convex body chasing (NCBC) problem (Bansal et al., 2018; Argue et al., 2019; Bubeck et al., 2020), where at each round t, a convex set χt ⊆ χ is revealed such that χt ⊆ χt−1, and χ0 = χ ⊆ R^d is a convex, compact, and bounded set. The objective is to choose an action xt ∈ χt so as to minimize the total movement cost C = Σ_{t=1}^T ||xt − xt−1||, where x0 ∈ χ is some fixed action. The best known algorithms for NCBC (Bansal et al., 2018; Argue et al., 2019; Bubeck et al., 2020) choose xt to be the centroid or the Steiner point of χt, i.e., a point well inside the newly revealed convex set, in order to reduce the future movement cost. With COCO, such an approach does not provide any useful bounds because of the presence of the cost functions ft, whose minima could lie towards the boundary of the relevant convex sets St.

1.3 Limitations of Prior Work
We explicitly show in Lemma 6 that the best known algorithm Sinha and Vaze (2024) for solving COCO suffers a CCV of Ω(√T log T) on 'simple' problem instances where ft = f and gt = g for all t and d = 1, for which ideally the CCV should be O(1). The same is true for most other prior algorithms, and the main reason for their large CCV on simple instances is that all these algorithms treat minimizing the CCV as a regret minimization problem for the functions gt. What they fail to exploit is the geometry of the underlying nested convex sets St that control the CCV.

1.4 Main open question


In comparison to the upper bounds discussed above, the best known simultaneous lower bound Sinha and Vaze (2024) for COCO is R[1:T] = Ω(√d) and CCV[1:T] = Ω(√d), where d is the dimension of the action space X. Without constraints, i.e., gt ≡ 0 for all t, the lower bound is R[1:T] = Ω(√T) Hazan (2012). Thus, there is a fundamental gap between the lower and upper bounds for the CCV, and the main open question for COCO is:
Is it possible to simultaneously achieve R[1:T] = O(√T) and CCV[1:T] = o(√T) or CCV[1:T] = O(1) for COCO?
Even though we do not fully resolve this question, in this paper we make meaningful progress by proposing an algorithm that exploits the geometry of the nested sets St. We show that it is possible to simultaneously achieve R[1:T] = O(√T) and CCV[1:T] = O(1) in certain cases, and for the general case we give a bound on the CCV that depends on the shape of the convex sets St while achieving R[1:T] = O(√T). In particular, the contributions of this paper are as follows.

1.5 Our Contributions


In this paper, we propose an algorithm (Algorithm 2) that tries to exploit the geometry of the nested convex
sets St ’s. In particular, Algorithm 2 at time t, first takes an OGD step from the previous action xt−1 with
respect to the most recently revealed loss function ft−1 with appropriate step-size to reach yt−1 , and then
projects yt−1 onto the most recently revealed set St−1 to get xt , the action to be played at time t. Let Ft be
the “projection” hyperplane passing through xt that is perpendicular to xt − yt−1 . For Algorithm 2, we derive
the following guarantees.

• The regret of Algorithm 2 is O(√T).
• The CCV of Algorithm 2 takes the following form.
  – When the sets St are 'nice', e.g., spheres or axis-parallel cuboids, the CCV is O(1).
  – For general St, the CCV is upper bounded by a quantity V that is a function of the distance between the consecutive sets St and St+1 for all t, the shape of the St's, the dimension d, and the diameter D. Since V depends on the shape of the St's, there is no universal bound on V, and the derived bound is instance dependent.
  – For the special case of d = 2, when the projection hyperplanes Ft progressively make increasing angles with respect to the first projection hyperplane F1, the CCV is O(1).
• As pointed out above, for general St there is no universal bound on the CCV of Algorithm 2. Thus, we propose an algorithm Switch that combines Algorithm 2 and the algorithm from Sinha and Vaze (2024) to provide a regret bound of O(√T) and a CCV that is the minimum of V and O(√T log T). Thus, Switch provides a best-of-two-worlds CCV guarantee, which is small if the sets St are nice, while in the worst case it is at most O(√T log T).

Reference               Regret               CCV                     Complexity per round
Neely and Yu (2017)     O(√T)                O(√T)                   Conv-OPT, Slater's condition
Guo et al. (2022)       O(√T)                O(T^{3/4})              Conv-OPT
Yi et al. (2023)        O(T^{max(β,1−β)})    O(T^{1−β/2})            Conv-OPT
Sinha and Vaze (2024)   O(√T)                O(√T log T)             Projection
This paper              O(√T)                O(min{V, √T log T})     Projection

Table 1: Summary of the results on COCO for arbitrary time-varying convex constraints and convex cost functions. In the above table, 0 ≤ β ≤ 1 is an adjustable parameter. Conv-OPT refers to solving a constrained convex optimization problem on each round. Projection refers to the Euclidean projection operation on the convex set X. The CCV bound for this paper is stated in terms of V, which can be O(1) or depends on the shape of the convex sets St.


• For the OCS problem, we show that the CCV of Algorithm 2 is O(1), compared to the CCV of O(√T log T) of Sinha and Vaze (2024).

2 COCO Problem
On round t, the online policy first chooses an admissible action xt ∈ X ⊂ R^d, and then the adversary chooses a convex cost function ft : X → R and a constraint of the form gt(x) ≤ 0, where gt : X → R is a convex function. Once the action xt has been chosen, ∇ft(xt) and the full function gt (or the set {x : gt(x) ≤ 0}) are revealed, as is standard in the literature. We now state the standard assumptions made in the literature while studying the COCO problem Guo et al. (2022); Yi et al. (2021); Neely and Yu (2017); Sinha and Vaze (2024).
Assumption 1 (Convexity) X ⊂ Rd is the admissible set that is closed, convex and has a finite Euclidean
diameter D. The cost function ft : X 7→ R and the constraint function gt : X 7→ R are convex for all t ≥ 1.

Assumption 2 (Lipschitzness) All cost functions {ft }t≥1 and the constraint functions {gt }t≥1 ’s are G-
Lipschitz, i.e., for any x, y ∈ X , we have

|ft (x) − ft (y)| ≤ G||x − y||, |gt (x) − gt (y)| ≤ G||x − y||, ∀t ≥ 1.

Assumption 3 (Feasibility) With Gt = {x ∈ X : gt(x) ≤ 0}, we assume that X⋆ = ∩_{t=1}^T Gt ≠ ∅. Any action x⋆ ∈ X⋆ is defined to be feasible.

The feasibility assumption distinguishes the cost functions from the constraint functions and is common across
all previous literature on COCO Guo et al. (2022); Neely and Yu (2017); Yu and Neely (2016); Yuan and
Lamperski (2018); Yi et al. (2023); Liakopoulos et al. (2019); Sinha and Vaze (2024).
For any real number z, we define (z)+ ≡ max(0, z). Since the gt's are revealed after the action xt is chosen, any online policy need not necessarily take feasible actions on each round. Thus, in addition to the static¹ regret defined below,

    Regret[1:T] ≡ sup_{{ft}_{t=1}^T} sup_{x⋆ ∈ X⋆} Regret[1:T](x⋆),    (3)

¹The static-ness refers to the fixed benchmark using only one action x⋆ throughout the horizon of length T.

where Regret[1:T](x⋆) ≡ Σ_{t=1}^T ft(xt) − Σ_{t=1}^T ft(x⋆), an additional obvious metric of interest is the total cumulative constraint violation (CCV), defined as

    CCV[1:T] = Σ_{t=1}^T (gt(xt))+.    (4)

Under the standard assumption (Assumption 3) that X ⋆ is not empty, the goal is to design an online policy to
simultaneously achieve a small regret (1) with x⋆ ∈ X ⋆ and a small CCV (2). We refer to this problem as the
constrained OCO (COCO).
For simplicity, we define the set

    St = ∩_{τ=1}^t Gτ,    (5)

where Gt is as defined in Assumption 3. All Gt's are convex and, consequently, all St's are convex and nested, i.e., St ⊆ St−1. Moreover, because of Assumption 3, each St is non-empty and, in particular, X⋆ ⊆ St for all t. After the action xt has been chosen, the set St controls the constraint violation, which can be used to write an upper bound on CCV[1:T] as follows.

Definition 4 For a convex set χ and a point x ∉ χ,

    dist(x, χ) = min_{y ∈ χ} ||x − y||.

Thus, the constraint violation at time t satisfies

    (gt(xt))+ ≤ G dist(xt, St),  and  CCV[1:T] ≤ G Σ_{t=1}^T dist(xt, St),    (6)

where G is the common Lipschitz constant of all the gt's.
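The bound (6) is easy to check numerically when St is a Euclidean ball, since the projection and hence dist(·, St) are available in closed form; a sketch (the constraint g(x) = ||x|| − r, which is 1-Lipschitz, is our illustrative choice, not from the paper):

```python
import numpy as np

def g(x, r):
    """Constraint g(x) = ||x|| - r, so S = {y : g(y) <= 0} is the ball of radius r (G = 1)."""
    return np.linalg.norm(x) - r

def dist_to_ball(x, r):
    """dist(x, S) for the ball S of radius r centered at the origin."""
    return max(np.linalg.norm(x) - r, 0.0)

x = np.array([3.0, 4.0])            # ||x|| = 5
r = 2.0
violation = max(g(x, r), 0.0)       # (g(x))_+ = 3
bound = 1.0 * dist_to_ball(x, r)    # G * dist(x, S); here (6) holds with equality
```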

3 Algorithm from Sinha and Vaze (2024)


The best known algorithm (Algorithm 1) to solve COCO Sinha and Vaze (2024) was shown to have the following guarantee.

Theorem 5 [Sinha and Vaze (2024)] Algorithm 1's Regret[1:T] = O(√T) and CCV[1:T] = O(√T log T) when ft, gt are convex.

We next show that the analysis of Sinha and Vaze (2024) is in fact tight for the CCV even when d = 1 and ft(x) = f(x) and gt(x) = g(x) for all t. With a finite diameter D and the fact that any x⋆ ∈ X⋆ belongs to all the nested convex bodies St, when d = 1 one expects that the CCV of any algorithm in this case will be O(D). However, as we show next, Algorithm 1 does not effectively make use of the geometric constraints imposed by the nested convex bodies St.

Lemma 6 Even when d = 1 and ft(x) = f(x) and gt(x) = g(x) for all t, for Algorithm 1, CCV[1:T] = Ω(√T log T).

Proof: Consider d = 1, and let X = [1, a], a > 2. Moreover, let ft(x) = f(x) and gt(x) = g(x) for all t. Let f(x) = cx² for some (large) c > 0, and let g(x) be such that G = {x : g(x) ≤ 0} ⊆ [a/2, a] and |∇g(x)| ≤ 1 for all x.
Let 1 < x1 < a/2. Note that CCV(t) (defined in Algorithm 1) is a non-decreasing function, and let t⋆ be the earliest time t such that Φ′(CCV(t))∇g(x) < −c. For f(x) = cx², ∇f(x) ≥ c for all x > 1. Thus,
Algorithm 1 Online Algorithm from Sinha and Vaze (2024)
1: Input: Sequence of convex cost functions {ft}_{t=1}^T and constraint functions {gt}_{t=1}^T, G = a common Lipschitz constant, T = horizon length, D = Euclidean diameter of the admissible set X, PX(·) = Euclidean projection oracle on the set X
2: Parameter settings:
   1. Convex cost functions: β = (2GD)^{−1}, V = 1, λ = 1/(2√T), Φ(x) = exp(λx) − 1.
   2. α-Strongly convex cost functions: β = 1, V = 8G² ln(Te)/α, Φ(x) = x².
3: Initialization: Set x1 = 0, CCV(0) = 0.
4: For t = 1 : T
5:   Play xt, observe ft, gt, incur a cost of ft(xt) and constraint violation of (gt(xt))+
6:   f̃t ← βft, g̃t ← β max(0, gt).
7:   CCV(t) = CCV(t − 1) + g̃t(xt).
8:   Compute ∇t = ∇f̂t(xt), where f̂t(x) := V f̃t(x) + Φ′(CCV(t)) g̃t(x), t ≥ 1.
9:   xt+1 = PX(xt − ηt ∇t), where
       ηt = √2 D / (2 √(Σ_{τ=1}^t ||∇τ||²)) for convex costs,
       ηt = 1 / (Σ_{s=1}^t Hs) for strongly convex costs (Ht is the strong convexity parameter of ft).
10: EndFor
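For the convex case, steps 6-9 can be sketched in one dimension as follows (a hedged NumPy sketch: the box projection onto X = [lo, hi] stands in for the general oracle PX, gradients are supplied as callables, and the function name and toy instance are ours, not the paper's):

```python
import numpy as np

def sinha_vaze(grads_f, gs, grads_g, T, G, D, lo, hi, x1):
    """Sketch of Algorithm 1 (convex case): OGD on hat f_t = V*f~_t + Phi'(CCV(t))*g~_t."""
    beta, V, lam = 1.0 / (2 * G * D), 1.0, 1.0 / (2 * np.sqrt(T))
    phi_prime = lambda s: lam * np.exp(lam * s)          # Phi(x) = exp(lam * x) - 1
    x, ccv, grad_sq_sum, xs = x1, 0.0, 0.0, []
    for t in range(T):
        xs.append(x)
        ccv += beta * max(gs[t](x), 0.0)                 # step 7: CCV(t) on the scaled g~_t
        sub_g = grads_g[t](x) if gs[t](x) > 0 else 0.0   # subgradient of max(0, g_t)
        grad = V * beta * grads_f[t](x) + phi_prime(ccv) * beta * sub_g
        grad_sq_sum += grad ** 2
        eta = np.sqrt(2) * D / (2 * np.sqrt(grad_sq_sum)) if grad_sq_sum > 0 else 0.0
        x = min(max(x - eta * grad, lo), hi)             # projection onto X = [lo, hi]
    return xs

# Toy run: f_t(x) = x^2, g_t(x) = x - 0.5 on X = [0, 1]
T = 8
xs = sinha_vaze([lambda x: 2 * x] * T, [lambda x: x - 0.5] * T,
                [lambda x: 1.0] * T, T, G=2.0, D=1.0, lo=0.0, hi=1.0, x1=1.0)
```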

using Algorithm 1's definition, it follows that for all t ≤ t⋆, xt < a/2, since the derivative of f dominates the derivative of Φ′(CCV(t))g(x) until then.
Since Φ(x) = exp(λx) − 1 with λ = 1/(2√T), and by definition |∇g(x)| ≤ 1 for all x, we have that by time t⋆, CCV[1:t⋆] = Ω(√T log T). Therefore, CCV[1:T] = Ω(√T log T). □
Essentially, Algorithm 1 treats minimizing the CCV as a regret minimization problem for the function g, similar to the function f, and this leads to its CCV of Ω(√T log T). For any given input instance with d = 1, an alternate algorithm that chooses its actions following online gradient descent (OGD) projected onto the most recently revealed feasible set St achieves O(√T) regret (irrespective of the starting action x1) and O(D) CCV (since any x⋆ ∈ St for all t). We extend this intuition in the next section and present an algorithm that tries to exploit the geometry of the nested convex sets St for general d.

4 New Algorithm for solving COCO


In this section, we present a simple algorithm (Algorithm 2) for solving COCO. Algorithm 2 is essentially an online projected gradient descent (OGD) algorithm: at time t, it first takes an OGD step from the previous action xt−1 with respect to the most recently revealed loss function ft−1 with an appropriate step size, projects the result onto St−2 to reach yt−1, and then projects yt−1 onto the most recently revealed set St−1 (defined in (5)) to get xt, the action to be played at time t.

Remark 1 Step 6 of Algorithm 2 might appear unnecessary; however, it is useful for proving Theorem 16.

Since Algorithm 2 is essentially an online projected gradient algorithm, similar to the classical result on OGD, we next show that the regret of Algorithm 2 is O(√T).

Lemma 7 The Regret[1:T] of Algorithm 2 is O(√T).

Algorithm 2 Online Algorithm for COCO
1: Input: Sequence of convex cost functions {ft}_{t=1}^T and constraint functions {gt}_{t=1}^T, G = a common Lipschitz constant, d = dimension of the admissible set X, step size ηt = D/(G√t), D = Euclidean diameter of the admissible set X, PX(·) = Euclidean projection operator on the set X
2: Initialization: Set x1 ∈ X arbitrarily, CCV(0) = 0.
3: For t = 1 : T
4:   Play xt, observe ft, gt, incur a cost of ft(xt) and constraint violation of (gt(xt))+
5:   Set St as defined in (5)
6:   yt = P_{St−1}(xt − ηt ∇ft(xt))
7:   xt+1 = P_{St}(yt)
8: EndFor
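The two projection steps of Algorithm 2 can be sketched in one dimension, where each St is an interval and each projection is a clip (the interval representation is our simplification; in general, P_{St} is a Euclidean projection oracle):

```python
import numpy as np

def clip_to(x, interval):
    lo, hi = interval
    return min(max(x, lo), hi)

def algorithm2(grads_f, sets, G, D, x1):
    """Sketch of Algorithm 2 in 1-d: sets[t] is the nested interval revealed on round t."""
    x, prev, xs = x1, None, []
    for t in range(len(sets)):
        xs.append(x)
        eta = D / (G * np.sqrt(t + 1))      # step size eta_t = D / (G * sqrt(t))
        y = x - eta * grads_f[t](x)         # OGD step on the just-revealed f_t
        if prev is not None:
            y = clip_to(y, prev)            # y_t = P_{S_{t-1}}(x_t - eta_t * grad f_t(x_t))
        x = clip_to(y, sets[t])             # x_{t+1} = P_{S_t}(y_t)
        prev = sets[t]
    return xs

# Nested shrinking intervals and a cost pulling towards x = 2 (outside every S_t)
sets = [(0.0, 1.0 - 0.05 * t) for t in range(10)]
xs = algorithm2([lambda x: 2 * (x - 2.0)] * 10, sets, G=4.0, D=1.0, x1=0.0)
```

Every action from round 2 onward is feasible for the previous round's set, so the per-round violation is controlled by how far St moved away from St−1.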

The extension of Lemma 7 to strongly convex ft's, which yields Regret[1:T] = O(log T) for Algorithm 2, follows standard arguments Hazan (2012) and is omitted.
The real challenge is to bound the total CCV of Algorithm 2. Let xt be the action played by Algorithm 2. Then, by definition, xt ∈ St−1. Moreover, from (6), the constraint violation at time t is at most G dist(xt, St). The next action xt+1 chosen by Algorithm 2 belongs to St; however, it is obtained by first taking an OGD step from xt to reach yt and then projecting yt onto St. Since the ft's are arbitrary, the OGD step could be in any direction, and thus there is no direct relationship between xt+1 and xt. Informally, (x1, x2, . . . , xT) is not a connected curve with any useful property. Thus, we take recourse in upper bounding the CCV via upper bounding the total movement cost M (defined below) between nested convex sets using projections.
The total constraint violation of Algorithm 2 satisfies

    CCV[1:t] ≤ G Σ_{τ=1}^t dist(xτ, Sτ)
            ≤(a) G Σ_{τ=1}^t ||xτ − bτ||    (7)
            =(b) G Mt,                      (8)

where in (a), bt is the projection of xt onto St, i.e., bt = P_{St}(xt), and in (b),

    Mt = Σ_{τ=1}^t ||xτ − bτ||    (9)

is defined to be the total movement cost on the instance S1, . . . , St. The object of interest is MT.
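In one dimension, Mt in (9) is just a sum of clip distances; a minimal sketch (the interval sets and actions are illustrative choices, not from the paper):

```python
def movement_cost(xs, sets):
    """M_t of (9) in 1-d: sum of ||x_tau - b_tau|| with S_tau an interval (lo, hi)."""
    total = 0.0
    for x, (lo, hi) in zip(xs, sets):
        b = min(max(x, lo), hi)         # b_tau = P_{S_tau}(x_tau)
        total += abs(x - b)
    return total

# Actions drifting outside shrinking intervals
xs = [1.0, 0.9, 0.8]
sets = [(0.0, 1.0), (0.0, 0.7), (0.0, 0.6)]
M = movement_cost(xs, sets)
```

By (8), the CCV is then at most G times this quantity.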

5 Bounding the Total Movement Cost MT (9)


We start by considering two simple cases where bounding MT is easy.

Lemma 8 If all the nested convex bodies S1 ⊇ S2 ⊇ · · · ⊇ ST are spheres, then MT ≤ d^{3/2} D.

Proof: Recall from (7) that xt ∈ ∂St−1 and bt = P_{St}(xt) ∈ St. Let ||xt − bt|| = r. Then, since all the St's are spheres, along at least one of the d orthogonal canonical basis vectors, diameter(St) ≤ diameter(St−1) − r/√d. Since the diameter along any of the d axes is at most D, the result follows. □

Figure 1: Figure representing the cone Cwt (ct ) that contains the convex hull of mt and St with unit vector wt .

Lemma 9 If all the nested convex bodies S1 ⊇ S2 ⊇ · · · ⊇ ST are cuboids that are axis-parallel to each other, then MT ≤ d^{3/2} D.

The proof is identical to that of Lemma 8. Note that similar results can be obtained when the St's are regular polygons that are axis-parallel to each other.
After exhausting the universal results upper bounding MT for 'nice' nested convex bodies, we next give a general bound on MT for any sequence of nested convex bodies, which depends on the geometry of the nested convex bodies (instance dependent). To state the result we need the following preliminaries.
Following (7), bt = P_{St}(xt), where xt ∈ ∂St−1. Without loss of generality, xt ∉ St, since otherwise the distance ||xt − bt|| = 0. Let mt be the mid-point of xt and bt, i.e., mt = (xt + bt)/2.
Definition 10 Let the convex hull of mt ∪ St be Ct. Let wt be a unit vector such that there exists ct > 0 such that the cone

    C_{wt}(ct) = { z ∈ R^d : −wt^T (z − mt) / ||z − mt|| ≥ ct }

contains Ct. Since St is convex, such wt, ct > 0 exist. For example, wt = bt − xt (normalized) is one such choice, for which ct > 0 since mt ∉ St. See Fig. 1 for a pictorial representation.
Let c⋆_{wt,t} = arg min_{ct} C_{wt}(ct), c⋆_t = min_{wt} c⋆_{wt,t}, and w⋆_t = arg min_{wt} c⋆_{wt,t}. Moreover, let c⋆ = min_t c⋆_t, where by definition c⋆ < 1.
Essentially, 2 cos^{−1}(c⋆_t) is the angular width of Ct with respect to w⋆_t, i.e., each element of Ct makes an angle of at most cos^{−1}(c⋆_t) with w⋆_t.

Remark 2 Note that c⋆_t is only a function of the distance ||xt − bt|| and the shape of the St's, in particular, the maximum width of St along the directions perpendicular to the vector xt − bt for all t, which can be at most the diameter D. c⋆_t decreases (increasing the "width" of the cone C_{w⋆t}(c⋆_t)) as ||xt − bt|| decreases, but a small ||xt − bt|| also implies a small violation at time t from (7). Across time slots, dmin = min_t ||xt − bt|| and the shape of the St's control c⋆, where dmin > 0 is inherent from the definition of c⋆, since a bound on ||xt − bt|| is only needed when xt ≠ bt.

Remark 3 When projecting xt ∈ ∂St−1 onto St to get bt = P_{St}(xt), the diameter of St is at most the diameter of St−1 minus ||xt − bt||, but only along the direction bt − xt. Since the shape of St is arbitrary, the diameter of St need not be smaller than the diameter of St−1 along any pre-specified direction, which was the main idea used to derive Lemma 8. Thus, to relate the distance ||xt − bt|| to the decrease in the diameter of the convex bodies St, we use the concept of the mean width of a convex body, defined as the expected width of the convex body along directions chosen uniformly at random (a formal definition is provided in Definition 18).

Next, we upper bound MT by connecting the distance ||xt − bt|| to the decrease in the mean width of the convex bodies St−1 and St.

Lemma 11 The total movement cost MT in (9) is at most

    (2 Vd (d − 1) / Vd−1) (1/c⋆)^d D,

where Vd is the (d − 1)-dimensional Lebesgue measure of the unit sphere in d dimensions.

Note that Vd/Vd−1 = O(1/√d). Thus, combining Lemma 7 and Lemma 11, we get the following main result of the paper for Algorithm 2.
Theorem 12 For solving COCO, Algorithm 2 has

    Regret[1:T] = O(√T),  and  CCV[1:T] = O( √d (1/c⋆)^d D ).

Compared to all prior results on COCO, which were universal (instance independent), of which the best known one Sinha and Vaze (2024) has Regret[1:T] = O(√T) and CCV[1:T] = O(√T log T), Theorem 12 is an instance-dependent result for the CCV. In particular, it exploits the geometric structure of the nested convex sets St and derives an upper bound on the CCV that depends only on the 'shape' of the St's. It can be the case that the instance is 'badly' behaved and c⋆ is very small or depends on T. If that is the case, in Section 6 we show how to limit the CCV to O(√T log T). However, when the St's are 'nice', e.g., c⋆ is independent of T (Remark 2) or the St's are spheres or axis-parallel cuboids (Lemmas 8 and 9), the CCV of Algorithm 2 is independent of T, which is a fundamentally improved result compared to a large body of prior work. In fact, in prior work this was largely assumed to be impossible. In particular, before the result of Sinha and Vaze (2024), simultaneously achieving Regret[1:T] = O(√T) and CCV[1:T] = O(√T) was itself the final goal.

6 Algorithm Switch

Since Theorem 12 provides an instance-dependent bound on the CCV that is a function of c⋆, which can be small, its CCV can be larger than O(√T log T), yielding a result inferior to that of Algorithm 1 Sinha and Vaze (2024). Thus, we next marry the two algorithms, Algorithm 1 and Algorithm 2, in Algorithm 3 to provide the best of both results, as follows.

Theorem 13 Switch (Algorithm 3) has regret Regret[1:T] = O(√T), while

    CCV[1:T] = min{ O( √d (1/c⋆)^d D ), O(√T log T) }.

Algorithm Switch should be understood as a best-of-two-worlds algorithm, where the two worlds correspond to one with nice convex sets St, for which the CCV of Algorithm 2 is independent of T or o(√T), while
Algorithm 3 Switch
1: Input: Sequence of convex cost functions {ft}_{t=1}^T and constraint functions {gt}_{t=1}^T, G = a common Lipschitz constant, d = dimension of the admissible set X, D = Euclidean diameter of the admissible set X, PX(·) = Euclidean projection operator on the set X
2: Initialization: Set x1 ∈ X arbitrarily, CCV(0) = 0.
3: For t = 1 : T
4:   If CCV(t − 1) ≤ √(T log T)
5:     Follow Algorithm 2
6:     CCV(t) = CCV(t − 1) + max{gt(xt), 0}.
7:   Else
8:     Follow Algorithm 1 with CCV(t − 1) reset to 0
9:   EndIf
10: EndFor
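The switching logic itself can be sketched schematically (alg2_step and alg1_step are hypothetical single-round updates of Algorithms 2 and 1, assumed available; the toy instance is ours):

```python
import math

def switch(alg2_step, alg1_step, gs, T, x1):
    """Sketch of Switch: follow Algorithm 2 while CCV <= sqrt(T log T), then Algorithm 1."""
    threshold = math.sqrt(T * math.log(T))
    x, ccv, switched, xs = x1, 0.0, False, []
    for t in range(T):
        xs.append(x)
        ccv += max(gs[t](x), 0.0)
        if not switched and ccv > threshold:
            switched = True             # permanent hand-off; Algorithm 1 restarts with CCV = 0
        x = alg1_step(t, x) if switched else alg2_step(t, x)
    return xs, switched

# Toy run: Algorithm 2 stands still at an infeasible point, Algorithm 1 jumps to a feasible one
gs = [lambda x: x] * 9                  # per-round violation of action x is max(x, 0)
xs, switched = switch(lambda t, x: x, lambda t, x: 0.0, gs, 9, x1=10.0)
```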

in the other, the CCV of Algorithm 2 is large on its own, and the overall CCV is controlled by discontinuing the use of Algorithm 2 once its CCV reaches √(T log T) and switching to Algorithm 1 thereafter, which has a universal guarantee of O(√T log T) on its CCV.
After exhausting the general results on the CCV of Algorithm 2, we next consider the special case of d = 2 when the sets St have a special structure defined by their projection hyperplanes. Note that it is highly non-trivial to bound the CCV of Algorithm 2 even when d = 2.

7 Special case of d = 2
In this section, we show that if d = 2 (all the convex sets St lie in a plane) and the projections satisfy a monotonicity property depending on the problem instance, then we can bound the total CCV of Algorithm 2 independently of the time horizon T, consequently obtaining an O(1) CCV.
Recall from the definition of Algorithm 2, yt = PSt−1 (xt − ηt ∇ft (xt )) and xt+1 = PSt (yt ).
Definition 14 Let the hyperplane perpendicular to the line segment (yt, xt+1) passing through xt+1 be Ft. Without loss of generality, we let yt ∉ St, since otherwise the projection is trivial. Essentially, Ft is the projection hyperplane at time t. Let H+_t denote the positive half-plane corresponding to Ft, i.e., H+_t = {z : z^T (yt − xt+1) ≥ 0}. Refer to Fig. 2. Let the angle between F1 and Ft be θt.

Definition 15 The instance S1 ⊇ S2 ⊇ · · · ⊇ ST is defined to be monotonic if θ2 ≤ θ3 ≤ · · · ≤ θT .


Theorem 16 For d = 2 when the instance is monotonic, CCV[1:T ] for Algorithm 2 is at most O(GD).
Theorem 16 provides a universal guarantee on the CCV of Algorithm 2 that is independent of the problem instance (as long as it is monotonic), unlike Lemma 11, even though it applies only for d = 2. The proof is derived by using basic convex geometry results from Manselli and Pucci (1991) in combination with exploiting the definition of Algorithm 2 and the monotonicity condition. It is worth noting that even under the monotonicity assumption it is non-trivial to upper bound the CCV, since the successive angles made by Ft with F1 can increase arbitrarily slowly, making it difficult to control the total CCV.

8 OCS Problem
In Sinha and Vaze (2024), a special case of COCO, called the OCS problem, was introduced where ft ≡ 0 for all t. Essentially, with OCS, constraint satisfaction is the only objective. In Sinha and Vaze (2024), Algorithm 1 was shown to have a CCV of O(√T log T). Next, we show that Algorithm 2 has a CCV of O(1) for the OCS, a remarkable improvement.

Figure 2: Definition of the Ft's.

Theorem 17 For solving OCS, Algorithm 2 has CCV[1:T] = O(d^{d/2} D).

As discussed in Sinha and Vaze (2024), there are important applications of OCS, and it is important to find tight bounds on its CCV. Theorem 17 achieves this by showing that a CCV of O(1) can be achieved, where the constant depends only on the dimension of the action space and the diameter. This is a fundamental improvement compared to the CCV bound of O(√T log T) from Sinha and Vaze (2024). Theorem 17 is derived by using the connection between the curve obtained by successive projections onto nested convex sets and self-expanded curves (Definition 24), and then using a classical result on self-expanded curves from Manselli and Pucci (1991).

9 Conclusions
One fundamental open question for COCO is whether it is possible to simultaneously achieve R[1:T] = O(√T) and CCV[1:T] = o(√T) or CCV[1:T] = O(1). In this paper, we have made substantial progress towards answering this question by proposing an algorithm that exploits the geometric properties of the nested convex sets St that effectively control the CCV. The state-of-the-art algorithm Sinha and Vaze (2024) achieves a CCV of Ω(√T log T) even on very simple instances, as shown in Lemma 6, and conceptually different algorithms were needed to achieve a CCV of o(√T). We propose one such algorithm and show that when the nested convex constraint sets are 'nice' (the instance is simple), achieving a CCV of O(1) is possible without losing the O(√T) regret guarantee. We also derived a bound on the CCV for general problem instances, as a function of the shape of the nested convex constraint sets, the distance between them, and the diameter.
In the absence of good lower bounds, the open question remains open in general; however, this paper significantly improves the conceptual understanding of the COCO problem by demonstrating that good algorithms need to exploit the geometry of the nested convex constraint sets.

One final remark: COCO is inherently a difficult problem, which is best
exemplified by the fact that even for the special case of COCO where ft = f for all t, essentially where f
is known ahead of time, neither our algorithm nor prior work yields a better regret or CCV bound compared to
when the ft's are arbitrarily varying.

References
CJ Argue, Sébastien Bubeck, Michael B Cohen, Anupam Gupta, and Yin Tat Lee. A nearly-linear bound for
chasing nested convex bodies. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete
Algorithms, pages 117–122. SIAM, 2019.
Nikhil Bansal, Martin Böhm, Marek Eliáš, Grigorios Koumoutsos, and Seeun William Umboh. Nested convex
bodies are chaseable. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete
Algorithms, pages 1253–1260. SIAM, 2018.

Sébastien Bubeck, Bo’az Klartag, Yin Tat Lee, Yuanzhi Li, and Mark Sellke. Chasing nested convex bodies
nearly optimally. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms,
pages 1496–1508. SIAM, 2020.
Xuanyu Cao and KJ Ray Liu. Online convex optimization with time-varying constraints and bandit feedback.
IEEE Transactions on Automatic Control, 64(7):2665–2680, 2018.

Tianyi Chen and Georgios B Giannakis. Bandit convex optimization for scalable and dynamic iot management.
IEEE Internet of Things Journal, 6(1):1276–1286, 2018.
Harold Gordon Eggleston. Convexity, 1966.

Hengquan Guo, Xin Liu, Honghao Wei, and Lei Ying. Online convex optimization with hard constraints:
Towards the best of two worlds and beyond. Advances in Neural Information Processing Systems, 35:
36426–36439, 2022.
Elad Hazan. The convex optimization approach to regret minimization. Optimization for machine learning,
page 287, 2012.

Rodolphe Jenatton, Jim Huang, and Cédric Archambeau. Adaptive algorithms for online convex optimization
with long-term constraints. In International Conference on Machine Learning, pages 402–411. PMLR,
2016.
Nikolaos Liakopoulos, Apostolos Destounis, Georgios Paschos, Thrasyvoulos Spyropoulos, and Panayotis
Mertikopoulos. Cautious regret minimization: Online optimization with long-term budget constraints. In
International Conference on Machine Learning, pages 3944–3952. PMLR, 2019.
Qingsong Liu, Wenfei Wu, Longbo Huang, and Zhixuan Fang. Simultaneously achieving sublinear regret
and constraint violations for online convex optimization with time-varying constraints. ACM SIGMETRICS
Performance Evaluation Review, 49(3):4–5, 2022.

Mehrdad Mahdavi, Rong Jin, and Tianbao Yang. Trading regret for efficiency: online convex optimization
with long term constraints. The Journal of Machine Learning Research, 13(1):2503–2528, 2012.
Paolo Manselli and Carlo Pucci. Maximum length of steepest descent curves for quasi-convex functions.
Geometriae Dedicata, 38(2):211–227, 1991.

Michael J Neely. Stochastic network optimization with application to communication and queueing systems.
Synthesis Lectures on Communication Networks, 3(1):1–211, 2010.
Michael J Neely and Hao Yu. Online convex optimization with time-varying constraints. arXiv preprint
arXiv:1702.04783, 2017.

Abhishek Sinha and Rahul Vaze. Optimal algorithms for online convex optimization with adversarial con-
straints. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL
https://openreview.net/forum?id=TxffvJMnBy.
Wen Sun, Debadeepta Dey, and Ashish Kapoor. Safety-aware algorithms for adversarial contextual bandit. In
International Conference on Machine Learning, pages 3280–3288. PMLR, 2017.

Rahul Vaze. On dynamic regret and constraint violations in constrained online convex optimization. In 2022
20th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks
(WiOpt), pages 9–16, 2022. doi: 10.23919/WiOpt56218.2022.9930613.
Xinlei Yi, Xiuxian Li, Tao Yang, Lihua Xie, Tianyou Chai, and Karl Johansson. Regret and cumulative
constraint violation analysis for online convex optimization with long term constraints. In International
Conference on Machine Learning, pages 11998–12008. PMLR, 2021.
Xinlei Yi, Xiuxian Li, Tao Yang, Lihua Xie, Yiguang Hong, Tianyou Chai, and Karl H Johansson. Distributed
online convex optimization with adversarial constraints: Reduced cumulative constraint violation bounds
under Slater's condition. arXiv preprint arXiv:2306.00149, 2023.

Hao Yu and Michael J Neely. A low complexity algorithm with O(√T) regret and O(1) constraint violations
for online convex optimization with long term constraints. arXiv preprint arXiv:1604.02218, 2016.
Hao Yu, Michael Neely, and Xiaohan Wei. Online convex optimization with stochastic constraints. Advances
in Neural Information Processing Systems, 30, 2017.
Jianjun Yuan and Andrew Lamperski. Online convex optimization for cumulative constraints. Advances in
Neural Information Processing Systems, 31, 2018.

10 Proof of Lemma 7

Proof: From the convexity of the ft's, for x⋆ satisfying Assumption (3), we have

    ft(xt) − ft(x⋆) ≤ ∇ft(xt)^T (xt − x⋆).

From the choice of Algorithm 2 for xt+1, we have

    ||xt+1 − x⋆||² = ||P_{St}(yt) − x⋆||²
                 ≤(a) ||yt − x⋆||²
                  = ||P_{St−1}(xt − ηt ∇ft(xt)) − x⋆||²
                 ≤(b) ||(xt − ηt ∇ft(xt)) − x⋆||²,

where inequalities (a) and (b) follow since x⋆ ∈ St for all t. Hence, using ||∇ft(xt)|| ≤ G,

    ||xt+1 − x⋆||² ≤ ||xt − x⋆||² + ηt²||∇ft(xt)||² − 2ηt ∇ft(xt)^T (xt − x⋆),

    2 ∇ft(xt)^T (xt − x⋆) ≤ (||xt − x⋆||² − ||xt+1 − x⋆||²)/ηt + ηt G².

Summing this over t = 1 to T, we get

    2 Σ_{t=1}^T (ft(xt) − ft(x⋆)) ≤ 2 Σ_{t=1}^T ∇ft(xt)^T (xt − x⋆)
        ≤ Σ_{t=1}^T (||xt − x⋆||² − ||xt+1 − x⋆||²)/ηt + G² Σ_{t=1}^T ηt
        ≤ (1/ηT) D² + G² Σ_{t=1}^T ηt
        ≤ O(DG√T),

where the final inequality follows by choosing ηt = D/(G√t). □

11 Proof of Theorem 11

Proof: We need the following preliminaries.

Definition 18 Let K be a non-empty convex bounded set in R^d. Let u be a unit vector, and ℓu a line through
the origin parallel to u. Let Ku be the orthogonal projection of K onto ℓu, with length |Ku|. The mean width
of K is defined as

    W(K) = (1/Vd) ∫_{S_1^d} |Ku| du,                                  (10)

where S_1^d is the unit sphere in d dimensions and Vd its (d − 1)-dimensional Lebesgue measure.

The following is immediate:

    0 ≤ W(K) ≤ diameter(K).                                           (11)

Lemma 19 (Eggleston (1966)) For d = 2,

    W(K) = Perimeter(K)/π.

Lemma 19 implies that, in general, W(K) ≠ W(K1) + W(K2) even if K1 ∪ K2 = K and K1 ∩ K2 = ∅.
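As a quick numerical sanity check (a sketch, not part of the paper), the mean width of Definition 18 can be estimated by Monte Carlo for the unit square in d = 2 and compared against Lemma 19, which predicts W(K) = Perimeter(K)/π = 4/π:

```python
import numpy as np

rng = np.random.default_rng(1)

# Vertices of the unit square [0, 1]^2.
V = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

# Average |K_u| over uniformly random directions u; for a polytope,
# |K_u| = max_v (v . u) - min_v (v . u) over the vertices v.
n = 200_000
theta = rng.uniform(0, 2 * np.pi, size=n)
U = np.stack([np.cos(theta), np.sin(theta)], axis=1)
P = V @ U.T                               # vertex projections, shape (4, n)
widths = P.max(axis=0) - P.min(axis=0)
W_mc = widths.mean()                      # = (1/V_2) ∫ |K_u| du with V_2 = 2π

assert abs(W_mc - 4 / np.pi) < 1e-2       # Lemma 19: Perimeter/π = 4/π
assert 0 <= W_mc <= np.sqrt(2)            # (11): W(K) ≤ diameter(K)
```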
Recall from (7) that xt ∈ ∂St−1, bt is the projection of xt onto St, and mt is the mid-point of xt and
bt, i.e., mt = (xt + bt)/2. Moreover, the convex sets St are nested, i.e., S1 ⊇ S2 ⊇ · · · ⊇ ST. To prove Theorem
11 we will bound the rate at which W(St) (Definition 18) decreases as a function of the length ||xt − bt||.
From Definition 10, recall that Ct is the convex hull of mt ∪ St. We also need to define Ct− as the convex
hull of xt ∪ St. Since St ⊆ Ct and Ct− ⊆ St−1 (since St−1 is convex and xt ∈ St−1), we have

    W(St) − W(St−1) ≤ W(Ct) − W(Ct−).                                  (12)

Definition 20 ∆t = W(Ct) − W(Ct−).

The main ingredient of the proof is the following lemma bounding ∆t, whose proof is provided after
completing the proof of Theorem 11.

Lemma 21
    ∆t ≤ −V_{d−1} (||xt − bt|| / (2Vd(d − 1))) (c⋆t)^d,
where c⋆t has been defined in Definition 10.

Recalling that c⋆ = min_t c⋆t from Definition 10, and combining Lemma 21 with (11) and (12), we get that

    Σ_{t=1}^T ||xt − bt|| ≤ (2Vd(d − 1)/V_{d−1}) (1/c⋆)^d diameter(S1),

since S1 ⊇ S2 ⊇ · · · ⊇ ST. Recalling that diameter(S1) ≤ D, Theorem 11 follows. □


Proof: [Proof of Lemma 21]
Let Hu be the hyperplane perpendicular to vector u. Let U0 be the set of unit vectors u such that the hyper-
plane Hu is a supporting hyperplane to Ct at point mt with Ct ∩ Hu = {mt} and uT(xt − mt) ≥ 0.
See Fig. 3 for reference.
Since bt is a projection of xt onto St, and mt is the mid-point of xt and bt, for u ∈ U0, the hyperplane Hu′
containing xt and parallel to Hu is a supporting hyperplane for Ct−.
Thus, using the definition of Ku from (10),

    ∆t ≤ (1/Vd) ∫_{U0} (|Ct,u| − |C−t,u|) du = −(||xt − bt||/(2Vd)) ∫_{U0} uT (xt − mt)/||xt − mt|| du,   (13)

since ||xt − mt|| = ||xt − bt||/2.


Recall the definition of C_{wt⋆}(c⋆t) from Definition 10, which implies that Ct, the convex hull of mt and St, is
contained in C_{wt⋆}(c⋆t). Next, consider U1, the set of unit vectors u such that the hyperplane Hu is a supporting
hyperplane to C_{wt⋆}(c⋆t) at point mt with uT(xt − mt) ≥ 0. Since by definition Ct ⊆ C_{wt⋆}(c⋆t), it follows that
U1 ⊂ U0.
Thus, from (13),

    ∆t ≤ −(||xt − bt||/(2Vd)) ∫_{U1} uT (xt − mt)/||xt − mt|| du.        (14)

Recalling the definition of wt⋆ (Definition 10), a vector u ∈ U1 can be written as

    u = λ u⊥ + √(1 − λ²) wt⋆,

Figure 3: The cone C_{wt}(ct) that contains the convex hull of mt and St with respect to the
unit vector wt. Here u is a unit vector perpendicular to Hu, a supporting hyperplane of Ct at mt
such that Ct ∩ Hu = {mt} and uT(xt − mt) ≥ 0.

where uT⊥ wt⋆ = 0, ||u⊥|| = 1, and, since u ∈ U1,

    0 ≤ λ = √(1 − (uT wt⋆)²) = uT u⊥ ≤ c⋆t.

Let S⊥ = {u⊥ : ||u⊥|| = 1, uT⊥ wt⋆ = 0}, and let du⊥ be the (d − 2)-dimensional Lebesgue measure on S⊥.
It is easy to verify that du = λ^{d−2}(1 − λ²)^{−1/2} dλ du⊥, and hence from (14),

    ∆t ≤ −(||xt − bt||/(2Vd)) ∫_0^{c⋆t} λ^{d−2}(1 − λ²)^{−1/2} dλ ∫_{S⊥} (λu⊥ + √(1 − λ²) wt⋆)^T (xt − mt)/||xt − mt|| du⊥.   (15)

Note that ∫_{S⊥} u⊥ du⊥ = 0. Thus,

    ∆t = −(||xt − bt||/(2Vd)) ((wt⋆)^T(xt − mt)/||xt − mt||) ∫_0^{c⋆t} λ^{d−2}(1 − λ²)^{−1/2} √(1 − λ²) dλ ∫_{S⊥} du⊥
      ≤(a) −V_{d−1} (||xt − bt||/(2Vd)) ((wt⋆)^T(xt − mt)/||xt − mt||) ∫_0^{c⋆t} λ^{d−2} dλ
      ≤(b) −V_{d−1} (||xt − bt||/(2Vd(d − 1))) c⋆t (c⋆t)^{d−1}
       = −V_{d−1} (||xt − bt||/(2Vd(d − 1))) (c⋆t)^d,                     (16)

where (a) follows since ∫_{S⊥} du⊥ = V_{d−1} by definition, and (b) follows since (wt⋆)^T(xt − mt)/||xt − mt|| ≥ c⋆t from Definition 10. □

12 Proof of Theorem 13

Proof: Since CCV(t) is a monotone non-decreasing function, let tmin be the largest time until which Algo-
rithm 2 is followed by Switch. The regret guarantee is easy to prove. From Theorem 12, the regret until time
tmin is at most O(√tmin). Moreover, starting from time tmin till T, from Theorem 5, the regret of Algorithm 1 is
at most O(√(T − tmin)). Thus, the overall regret for Switch is at most O(√T).
For the CCV, with Switch, until time tmin, CCV(tmin) ≤ √T log T. At time tmin, Switch starts to use
Algorithm 1, which has the following appealing property from (8) of Sinha and Vaze (2024), where at time
tmin Algorithm 1 is started with CCV(tmin) reset to 0: for any t ≥ tmin,

    Φ(CCV(t)) + Regrett(x⋆) ≤ √( Σ_{τ=tmin}^t Φ′(CCV(τ))² ) + √(t − tmin),    (17)

where β = (2GD)^{−1}, V = 1, λ = 1/(2√T), and Φ(x) = exp(λx) − 1. We trivially have Regrett(x⋆) ≥
−GDt, so that β Regrett(x⋆) ≥ −t/2. Hence, from (17), we have that for λ = 1/(2√T) and any t ≥ tmin,

    CCV[tmin,T] ≤ 4GD ln(2√(1 + 2T)) √T.

Since, as argued before, with Switch, CCV(tmin) ≤ √T log T, we get that CCV[1:T] ≤ O(√T log T). □
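The Switch policy analyzed above can be summarized by a small control-flow sketch. This is an illustrative skeleton under assumed interfaces, not the paper's pseudocode: `alg2_step`, `alg1_step`, and `violation` are hypothetical callables standing in for Algorithm 2, Algorithm 1, and the per-round violation oracle, and the threshold √T log T matches the switching rule used in the proof.

```python
import numpy as np

def switch(T, alg2_step, alg1_step, violation):
    """Play T rounds; run alg2 until the CCV threshold is crossed, then alg1.

    Returns the sequence of actions and the round t_min where the switch
    happens (t_min = T if the threshold is never crossed).
    """
    threshold = np.sqrt(T) * np.log(T)   # switching rule from the proof
    ccv, t_min, actions = 0.0, T, []
    using_alg2 = True
    for t in range(1, T + 1):
        x = alg2_step(t) if using_alg2 else alg1_step(t)
        actions.append(x)
        ccv += violation(t, x)           # accumulate constraint violation
        if using_alg2 and ccv > threshold:
            using_alg2, t_min = False, t # switch exactly once
    return actions, t_min

# Toy run: unit violation per round forces a switch at the first t with
# t > sqrt(T) * log(T); for T = 100 that is t = 47.
acts, t_min = switch(100, lambda t: 0.0, lambda t: 1.0, lambda t, x: 1.0)
assert t_min == 47
```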

13 Preliminaries for Bounding the CCV in Theorem 17 and Theorem 16

Let K1, . . . , KT be nested (i.e., K1 ⊇ K2 ⊇ K3 ⊇ · · · ⊇ KT) bounded convex subsets of R^d.

Definition 22 Let σ1 ∈ K1, and σt+1 = P_{Kt+1}(σt) for t = 1, . . . , T − 1. Then the curve

    σ = {(σ1, σ2), (σ2, σ3), . . . , (σT−1, σT)}

is called the projection curve on K1, . . . , KT.

We are interested in upper bounding the quantity

    Σ = max_σ Σ_{t=1}^{T−1} ||σt − σt+1||.                             (18)

Lemma 23 For a projection curve σ, Σ ≤ d^{d/2} diameter(K1).

To prove the result we need the following definition.

Definition 24 A curve γ : I → R^d is called self-expanded if, for every t where γ′(t) exists, we have

    ⟨γ′(t), γ(t) − γ(u)⟩ ≥ 0

for all u ∈ I with u ≤ t, where ⟨·, ·⟩ denotes the inner product. In words, a curve
γ starting at a point x0 is self-expanded if, for every x ∈ γ at which the tangent line T exists, the
arc (sub-curve) (x0, x) is contained in one of the two half-spaces bounded by the hyperplane through x
orthogonal to T.

For self-expanded curves the following classical result is known.

Theorem 25 (Manselli and Pucci (1991)) For any self-expanded curve γ belonging to a closed bounded convex
set of R^d with diameter D, its total length is at most O(d^{d/2} D).

Proof: [Proof of Lemma 23] From Definition 22, the projection curve is

    σ = {(σ1, σ2), (σ2, σ3), . . . , (σT−1, σT)}.

Let the reverse curve be r = {rt}_{t=0,...,T−2}, where rt = (σT−t, σT−t−1). Thus we are reading σ backwards
and calling it r. Note that since σt is the projection of σt−1 on Kt, each piece-wise linear segment (σt, σt+1)
is a straight line and hence differentiable except at the end points. Moreover, since each σt is obtained by
projecting σt−1 onto Kt and Kt+1 ⊆ Kt, the projection hyperplane Ft that passes through σt =
P_{Kt}(σt−1) and is perpendicular to σt − σt−1 separates the two sub-curves {(σ1, σ2), (σ2, σ3), . . . , (σt−1, σt)}
and {(σt, σt+1), (σt+1, σt+2), . . . , (σT−1, σT)}.
Thus, we have that for each segment rτ, at each point where it is differentiable, the curve r1, . . . , rτ−1 lies
on one side of the hyperplane that passes through the point and is perpendicular to rτ. Thus, we conclude
that the curve r is self-expanded.
As a result, Theorem 25 implies that the length of r is at most O(d^{d/2} diameter(K1)), and the result follows
since the length of r is the same as that of σ, which is Σ. □
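The projection-curve bound of Lemma 23 can be sanity-checked numerically. The sketch below uses an assumed setup, not one from the paper: the nested sets Kt are taken to be slowly shrinking, slowly drifting balls (nesting is guaranteed because each center moves by less than the radius decrease), so projections are in closed form; it then traces the projection curve of Definition 22 and verifies Σ ≤ d^{d/2} · diameter(K1) for d = 2.

```python
import numpy as np

rng = np.random.default_rng(2)

d, T = 2, 300
centers, radii = [np.zeros(d)], [1.0]
for _ in range(T - 1):
    shrink = 0.9 / (T - 1)                    # radius decrease per step
    step = rng.normal(size=d)
    step = (0.5 * shrink) * step / np.linalg.norm(step)  # move < shrink => nested
    centers.append(centers[-1] + step)
    radii.append(radii[-1] - shrink)

def proj(x, c, r):
    """Euclidean projection onto the ball B(c, r)."""
    v = x - c
    n = np.linalg.norm(v)
    return x if n <= r else c + (r / n) * v

# Projection curve of Definition 22, started on the boundary of K_1.
sigma = [centers[0] + np.array([radii[0], 0.0])]
for t in range(1, T):
    sigma.append(proj(sigma[-1], centers[t], radii[t]))

length = sum(np.linalg.norm(sigma[t + 1] - sigma[t]) for t in range(T - 1))
bound = d ** (d / 2) * 2 * radii[0]           # d^{d/2} * diameter(K_1)
assert length <= bound                        # Lemma 23
```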

14 Proof of Theorem 17

Clearly, with ft ≡ 0 for all t, Algorithm 2 has yt = xt, and the successive xt's satisfy xt+1 = P_{St}(xt).
Thus, the curve x = {(x1, x2), (x2, x3), . . . , (xT−1, xT)} formed by Algorithm 2 for OCS is a
projection curve (Definition 22) on S1 ⊇ · · · ⊇ ST, and the result follows from Lemma 23 and the fact that
diameter(S1) ≤ D. □

15 Proof of Theorem 16

Proof: Recall that d = 2, and the definition of Ft from Definition 14. Let the center be c = P_{S1}(x1). Let
torth be the earliest t for which ∠(Ft, F1) = π.
Initialize κ = 1, s(1) = 1, τ(1) = 1.

BeginProcedure
Step 1 (Definition of Phase κ): Consider

    τ(κ) = arg max { t : s(κ) < t ≤ torth, ∠(F_{s(κ)}, Ft) ≤ π/4 }.

If there is no such τ(κ):
    Phase κ ends; define Phase κ as Empty; s(κ + 1) = τ(κ) + 1.
Else If ∠(F_{τ(κ)}, F1) = π:
    Exit.
Else:
    s(κ + 1) = τ(κ).
End If
Increment κ = κ + 1, and go to Step 1.
EndProcedure

Figure 4: Figure corresponding to Example 26.

Example 26 To better understand the definition of phases, consider Fig. 4, where the largest t for which the
angle between Ft and F1 is at most π/4 is 3. Thus, τ(1) = 3, i.e., phase 1 explores till time t = 3 and then phase 1
ends. The starting hyperplane to consider in phase 2 is s(2) = 3, and given that the angle between F3 and the
next hyperplane F4 is more than π/4, phase 2 is empty and ends by exploring till t = 4. The starting
hyperplane to consider in phase 3 is s(3) = 4, and the process goes on. The first time t such that the angle
between F1 and Ft is π is t = 6, and thus torth = 6, and the process stops at time t = 6. This also implies that
S6 ⊂ F1. Since the St's are nested, for all t ≥ 6, St ⊂ F1. Hence the total CCV incurred after torth is at most GD.

The main idea behind defining phases is to partition the whole space into empty and non-empty regions,
where in each non-empty region the starting and ending hyperplanes make an angle of at most π/4, while in
an empty phase the starting and ending hyperplanes make an angle of at least π/4. Thus, we get the following
simple result.

Lemma 27 For d = 2, there can be at most 4 non-empty and 4 empty phases.

The proof is immediate from the definition of the phases, since any consecutive pair of non-empty and empty
phases exhausts an angle of at least π/4.

Remark 4 Since we are in d = 2 dimensions, for all t ≥ torth , the movement is along the hyperplane F1 and
thus the resulting constraint violation after time t ≥ torth is at most GD. Thus, in the phase definition above,
we have only considered time till torth and we only need to upper bound the CCV till time torth .

We next define the following required quantities.

Definition 28 With respect to the quantities defined for Algorithm 2, for a non-empty phase κ, let

    rmax(κ) = max_{s(κ)<t≤τ(κ)} ||yt − c||   and   t⋆(κ) = arg max_{s(κ)<t≤τ(κ)} ||yt − c||.

t⋆(κ) is the time index belonging to phase κ for which yt is farthest from c.

Definition 29 A non-empty phase κ consists of time slots T(κ) = [τ(κ−1), τ(κ)], and the angle ∠(Ft1, Ft2) ≤
π/4 for all t1, t2 ∈ T(κ). Using Definition 28, we partition T(κ) as T(κ) = T−(κ) ∪ T+(κ), where
T−(κ) = [τ(κ−1) + 1, t⋆(κ) + 1] and T+(κ) = [t⋆(κ) + 2, τ(κ)].

Thus, T(κ) and T(κ+1) have one common time slot.

Definition 30 [Definition of zt(κ) for t ∈ T−(κ)] Let z_{t⋆(κ)+1}(κ) = x_{t⋆(κ)+1}. For t ∈ T−(κ)\{t⋆(κ)+1},
define zt(κ) inductively as follows: zt(κ) is the pre-image of z_{t+1}(κ) on F_{t−1} such that the projection of zt(κ)
on Ft is z_{t+1}(κ).

Definition 31 [Definition of zt(κ) for t ∈ T+(κ)] For t ∈ T+(κ), define zt(κ) inductively as follows:
zt(κ) is the projection of z_{t−1}(κ) on F_{t−1}.

See Fig. 5 for a visual illustration of t⋆(κ) and zt(κ).

The main idea behind defining the zt(κ)'s is as follows. For each non-empty phase, we will construct a
projection curve (Definition 22) using the points zt(κ) such that the length of the projection curve upper bounds
the CCV of Algorithm 2 (shown in Lemma 37), and then use Lemma 23 to upper bound the length of the
projection curve.

Definition 32 [Definition of St′ for a non-empty phase κ] S′_{t⋆(κ)+1} = S_{t⋆(κ)+1}. For t ∈ T−(κ)\{t⋆(κ)+1},
St′ is the convex hull of z_{t+1}(κ) ∪ St ∪ S′_{t+1}. For t ∈ T+(κ), St′ = St. See Fig. 6.

Lemma 33 For a non-empty phase κ, for any t ∈ T(κ), S′_{t+1} ⊆ St′, i.e., the St′'s are nested.

Definition 34 For a non-empty phase κ, χ(κ) = S′_{τ(κ−1)} ∩ H+_{τ(κ)}, where H+_{τ(κ)} has been defined in
Definition 14.

Definition 35 [New violations for t ∈ T(κ)] For a non-empty phase κ, for t ∈ T(κ)\{τ(κ−1)}, let

    vt(κ) = ||zt(κ) − z_{t−1}(κ)||.

Lemma 36 For each non-empty phase κ, all the zt(κ)'s for t ∈ T(κ) belong to B(c, √2 D), where B(c, r) is the
ball with radius r centered at c. In other words, χ(κ) ⊆ B(c, √2 D).

Figure 5: Illustration of the definition of zt(κ) for t ∈ T(κ). In this example, for phase 1, t⋆(1) = 3 since the
distance of y3 from c is the farthest for phase 1, which consists of time slots T(1) = {2, 3}. Hence z_{t⋆(1)+1}(1) =
x4. For t ∈ T(1)\{t⋆(1)+1}, the zt(1) are such that z_{t+1}(1) is a projection of zt(1) onto Ft.

Figure 6: Definition of the St′'s, where Ut are the extra regions that are added to St to get St′.

Proof: Recall that for a non-empty phase κ, T(κ) = T−(κ) ∪ T+(κ). We first argue about t ∈ T−(κ).
By definition, z_{t⋆(κ)+1} = x_{t⋆(κ)+1} and x_{t⋆(κ)+1} ∈ S_{t⋆(κ)}. Thus, z_{t⋆(κ)+1} ∈ B(c, √2 D). Next we argue for
t ∈ T−(κ)\{t⋆(κ)+1}. Recall that the diameter of X is D, and that yt ∈ St−1 from Algorithm 2.
Thus, for any non-empty phase κ, the distance from c to the farthest yt belonging to phase κ is at most D,
i.e., rmax(κ) ≤ D. Let the pre-image of z_{t⋆(κ)+1}(κ) onto F_{s(κ)} (the base hyperplane with respect to which
all hyperplanes have an angle of at most π/4 in phase κ) be p(κ), such that the projection of p(κ) onto F_{s(κ)} is
z_{t⋆(κ)+1}(κ). From the definition of any non-empty phase, the angle between F_{s(κ)} and Ft for t ∈ T(κ) is at
most π/4. Thus, the distance of p(κ) from c is at most √2 D.
Consider the 'triangle' Π(κ) that is the convex hull of c, z_{t⋆(κ)+1}(κ) and p(κ). Given that the angle
between F_{t⋆(κ)} and F_{t⋆(κ)−1} is at most π/4, the argument above implies that zt(κ) ∈ Π(κ) for t = t⋆(κ). For
t = t⋆(κ) − 1, zt(κ) ∈ Ft−1 is the pre-image of z_{t+1}(κ). This implies that the distance of zt(κ)
(for t = t⋆(κ) − 1) from c is at most

    D / (cos(α_{t,t⋆(κ)}) cos(α_{t⋆(κ),t⋆(κ)+1})),

where α_{t1,t2} is the angle between F_{t1} and F_{t2}. From the monotonicity of the angles θt (Definition 15) and the
definition of a non-empty phase, we have α_{t,t⋆(κ)} + α_{t⋆(κ),t⋆(κ)+1} ≤ π/4, with α_{t,t⋆(κ)} ≥ 0 and
α_{t⋆(κ),t⋆(κ)+1} ≥ 0. Next, we appeal to the identity

    cos(A + B) ≤ cos(A) cos(B),                                          (19)

valid for A, B ≥ 0 with A + B ≤ π/4, to claim that zt(κ) ∈ Π(κ) for t = t⋆(κ) − 1.
Iteratively using this argument while invoking the identity (19) gives the result that for any t ∈ T−(κ), we
have that zt(κ) belongs to Π(κ). Since Π(κ) ⊆ B(c, √2 D), we have the claim for all t ∈ T−(κ).
By definition, zt(κ) for t ∈ T+(κ) belongs to St−1 ⊆ S1. Thus, its distance from c is at most D. □

Lemma 37 For each non-empty phase κ and for t ∈ T(κ), the violation vt(κ) ≥ dist(xt, St), where
dist(xt, St) is the original violation.

Proof: By construction of any non-empty phase κ, for t ∈ T(κ) both xt and zt(κ) belong to Ft−1.
Moreover, by construction, the distance of zt(κ) from c is at least as large as the distance of xt from c.
Thus, using the monotonicity property of the angles θt (Definition 15), we get the result. See Fig. 5 for a visual
illustration. □
For each non-empty phase κ, by definition, the curve defined by the sequence zt(κ) for t ∈ T(κ) is a pro-
jection curve (Definition 22) on the sets St′(κ) (note that the St′(κ)'s are nested by Lemma 33). Moreover, for all
t ∈ T(κ), the set St′(κ) ⊂ χ(κ), which is a bounded convex set. Thus, for d = 2, from Lemma 23, the length of
the curve z(κ) = {(zt(κ), z_{t+1}(κ))}_{t∈T(κ)} satisfies

    Σ_{t∈T(κ)} vt(κ) ≤ 2 diameter(χ(κ)).                                  (20)

By definition, the number of non-empty phases till time torth is at most 4. Moreover, in each non-empty
phase, χ(κ) ⊆ B(c, √2 D) from Lemma 36.
Thus, from (20), we have that

    Σ_{κ non-empty} Σ_{t∈T(κ)} vt(κ) ≤ 2 Σ_{κ non-empty} diameter(χ(κ))
                                    ≤ 8 diameter(B(c, √2 D)) ≤ O(D).       (21)

Using Lemma 37, we get

    Σ_{κ non-empty} Σ_{t∈T(κ)} dist(xt, St) ≤ O(D).                       (22)

For any empty phase, the constraint violation is the length of the line segment (xt, P_{St}(xt)) (Algorithm 2)
crossing it, which is a straight line of length at most O(D). Moreover, the total number of empty phases
(Lemma 27) is a constant. Thus, the total length of the segments (xt, P_{St}(xt)) of Algorithm 2 corresponding
to all empty phases is at most O(D).
Recall from (6) that the CCV is at most G times the sum of the dist(xt, St)'s. Thus, from (22), the total
violation incurred by Algorithm 2 in non-empty phases is at most O(GD), while that in empty phases is at
most O(GD). Finally, accounting for the very first violation dist(x1, S1) ≤ D and the fact that the
CCV after time torth (Remark 4) is at most GD, we get that the total constraint violation CCV[1:T] for
Algorithm 2 is at most O(GD). □
