Rahul Paper
Abstract
The constrained version of the standard online convex optimization (OCO) framework, called COCO, is considered, where on every round, a convex cost function and a convex constraint function are revealed to the learner after it chooses the action for that round. The objective is to simultaneously minimize the static regret and the cumulative constraint violation (CCV). An algorithm is proposed that guarantees a static regret of O(√T) and a CCV of min{V, O(√T log T)}, where V depends on the distance between the consecutively revealed constraint sets, the shape of the constraint sets, the dimension of the action space, and the diameter of the action space. For special cases of constraint sets, V = O(1). Compared to the state-of-the-art results, a static regret of O(√T) and a CCV of O(√T log T), which were universal, the new result on the CCV is instance dependent, and is derived by exploiting the geometric properties of the constraint sets.
1 Introduction
In this paper, we consider the constrained version of the standard online convex optimization (OCO) frame-
work, called constrained OCO or COCO. In COCO, on every round t, the online algorithm first chooses an
admissible action xt ∈ X ⊂ Rd , and then the adversary chooses a convex loss/cost function ft : X → R and
a constraint function of the form gt (x) ≤ 0, where gt : X → R is a convex function. Since gt ’s are revealed
after the action xt is chosen, an online algorithm need not necessarily take feasible actions on each round, and
in addition to the static regret

$$\mathrm{Regret}_{[1:T]} \equiv \sup_{\{f_t\}_{t=1}^{T}} \sup_{x^\star \in \mathcal{X}} \mathrm{Regret}_T(x^\star), \quad \text{where} \quad \mathrm{Regret}_T(x^\star) \equiv \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^\star), \tag{1}$$
an additional metric of interest is the total cumulative constraint violation (CCV) defined as
$$\mathrm{CCV}_{[1:T]} \equiv \sum_{t=1}^{T} \max(g_t(x_t), 0). \tag{2}$$
Let X ⋆ be the feasible set consisting of all admissible actions that satisfy all constraints gt (x) ≤ 0, t ∈ [T ].
Under the standard assumption that X⋆ is not empty (called the feasibility assumption), the goal is to design an online algorithm to simultaneously achieve a small regret (1) with respect to any admissible benchmark x⋆ ∈ X⋆ and a small CCV (2).
Since the constraint sets G_t = {x ∈ X : g_t(x) ≤ 0} are convex for all t, the assumption X⋆ = ∩_t G_t ≠ ∅ implies that the sets S_t = ∩_{τ=1}^{t} G_τ are convex and nested, i.e., S_t ⊆ S_{t−1} and X⋆ ⊆ S_t for all t. Essentially, the sets S_t are sufficient to quantify the CCV.
1.3 Limitations of Prior Work
We explicitly show in Lemma 6 that the best known algorithm for solving COCO, Sinha and Vaze (2024), suffers a CCV of Ω(√T log T) on 'simple' problem instances where f_t = f and g_t = g for all t and the dimension is d = 1, for which ideally the CCV should be O(1). The same is true for most other prior algorithms; the main reason for their large CCV on simple instances is that all these algorithms treat minimizing the CCV as a regret minimization problem for the functions g_t. What they fail to exploit is the geometry of the underlying nested convex sets S_t that control the CCV.
Reference               Regret               CCV                       Complexity per round
Neely and Yu (2017)     O(√T)                O(√T)                     Conv-OPT, Slater's condition
Guo et al. (2022)       O(√T)                O(T^{3/4})                Conv-OPT
Yi et al. (2023)        O(T^{max(β,1−β)})    O(T^{1−β/2})              Conv-OPT
Sinha and Vaze (2024)   O(√T)                O(√T log T)               Projection
This paper              O(√T)                min{V, O(√T log T)}       Projection

Table 1: Summary of the results on COCO for arbitrary time-varying convex constraints and convex cost functions. In the above table, 0 ≤ β ≤ 1 is an adjustable parameter. Conv-OPT refers to solving a constrained convex optimization problem on each round. Projection refers to the Euclidean projection operation onto the convex set X. The CCV bound for this paper is stated in terms of V, which can be O(1) or depends on the shape of the convex sets S_t.
• For the OCS problem, we show that the CCV of Algorithm 2 is O(1), compared to the CCV of O(√T log T) of Sinha and Vaze (2024).
2 COCO Problem
On round t, the online policy first chooses an admissible action x_t ∈ X ⊂ R^d, and then the adversary chooses a convex cost function f_t : X → R and a constraint of the form g_t(x) ≤ 0, where g_t : X → R is a convex function. Once the action x_t has been chosen, the gradient ∇f_t(x_t) and the full function g_t (or the set {x : g_t(x) ≤ 0}) are revealed, as is standard in the literature. We now state the standard assumptions made in the literature while studying the COCO problem Guo et al. (2022); Yi et al. (2021); Neely and Yu (2017); Sinha and Vaze (2024).
Assumption 1 (Convexity) X ⊂ Rd is the admissible set that is closed, convex and has a finite Euclidean
diameter D. The cost function ft : X 7→ R and the constraint function gt : X 7→ R are convex for all t ≥ 1.
Assumption 2 (Lipschitzness) All cost functions {f_t}_{t≥1} and constraint functions {g_t}_{t≥1} are G-Lipschitz, i.e., for any x, y ∈ X, we have
|ft (x) − ft (y)| ≤ G||x − y||, |gt (x) − gt (y)| ≤ G||x − y||, ∀t ≥ 1.
The feasibility assumption distinguishes the cost functions from the constraint functions and is common across
all previous literature on COCO Guo et al. (2022); Neely and Yu (2017); Yu and Neely (2016); Yuan and
Lamperski (2018); Yi et al. (2023); Liakopoulos et al. (2019); Sinha and Vaze (2024).
For any real number z, we define (z)_+ ≡ max(0, z). Since the g_t's are revealed after the action x_t is chosen, any online policy need not necessarily take feasible actions on each round. Thus, in addition to the static¹ regret defined below,

$$\mathrm{Regret}_{[1:T]}(x^\star) \equiv \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^\star),$$

an additional obvious metric of interest is the total cumulative constraint violation (CCV), defined as

$$\mathrm{CCV}_{[1:T]} = \sum_{t=1}^{T} (g_t(x_t))_+. \tag{4}$$

¹The static-ness refers to the fixed benchmark using only one action x⋆ throughout the horizon of length T.
Under the standard assumption (Assumption 3) that X⋆ is not empty, the goal is to design an online policy to simultaneously achieve a small regret (1) with respect to any x⋆ ∈ X⋆ and a small CCV (2). We refer to this problem as the constrained OCO (COCO).
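To make the two metrics concrete, here is a minimal sketch that evaluates the static regret (for a fixed benchmark) and the CCV of a played action sequence; the particular cost and constraint functions below are hypothetical stand-ins for the adversary's choices.

```python
def regret_and_ccv(xs, fs, gs, x_star):
    """Static regret w.r.t. a fixed benchmark x_star, and CCV, for the plays xs."""
    regret = sum(f(x) - f(x_star) for f, x in zip(fs, xs))
    ccv = sum(max(g(x), 0.0) for g, x in zip(gs, xs))
    return regret, ccv

# Toy instance on X = [0, 1] (hypothetical): the fixed constraint g(x) = 0.5 - x <= 0
# encodes x >= 0.5, and the benchmark x_star = 0.5 satisfies it.
fs = [lambda x: (x - 0.2) ** 2, lambda x: (x - 0.4) ** 2, lambda x: (x - 0.3) ** 2]
gs = [lambda x: 0.5 - x] * 3
print(regret_and_ccv(xs=[0.1, 0.45, 0.6], fs=fs, gs=gs, x_star=0.5))
# CCV = (0.4)_+ + (0.05)_+ + (-0.1)_+ = 0.45
```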
For simplicity, we define the set

$$S_t = \cap_{\tau=1}^{t} G_\tau, \tag{5}$$

where G_t is as defined in Assumption 3. All G_t's are convex and consequently all S_t's are convex and nested, i.e., S_t ⊆ S_{t−1}. Moreover, because of Assumption 3, each S_t is non-empty and in particular X⋆ ⊆ S_t for all t. After action x_t has been chosen, the set S_t controls the constraint violation, which can be used to write an upper bound on CCV_{[1:T]} as follows.
Lemma 6 Even when d = 1 and f_t(x) = f(x) and g_t(x) = g(x) for all t, for Algorithm 1, its CCV_{[1:T]} = Ω(√T log T).
Proof: Consider d = 1, and let X = [1, a], a > 2. Moreover, let f_t(x) = f(x) and g_t(x) = g(x) for all t. Let f(x) = cx² for some (large) c > 0, and let g(x) be such that G = {x : g(x) ≤ 0} ⊆ [a/2, a] and |∇g(x)| ≤ 1 for all x.

Let 1 < x_1 < a/2. Note that CCV(t) (defined in Algorithm 1) is a non-decreasing function, and let t⋆ be the earliest time t such that Φ′(CCV(t))∇g(x) < −c. For f(x) = cx², ∇f(x) ≥ c for all x > 1. Thus,
Algorithm 1 Online Algorithm from Sinha and Vaze (2024)
1: Input: Sequence of convex cost functions {f_t}_{t=1}^T and constraint functions {g_t}_{t=1}^T, G = a common Lipschitz constant, T = horizon length, D = Euclidean diameter of the admissible set X, P_X(·) = Euclidean projection oracle on the set X
2: Parameter settings:
   1. Convex cost functions: β = (2GD)^{−1}, V = 1, λ = 1/(2√T), Φ(x) = exp(λx) − 1.
   2. α-strongly convex cost functions: β = 1, V = 8G² ln(Te)/α, Φ(x) = x².
3: Initialization: Set x_1 = 0, CCV(0) = 0.
4: For t = 1 : T
5:   Play x_t, observe f_t, g_t, incur a cost of f_t(x_t) and constraint violation of (g_t(x_t))_+
6:   f̃_t ← βf_t, g̃_t ← β max(0, g_t).
7:   CCV(t) = CCV(t − 1) + g̃_t(x_t).
8:   Compute ∇_t = ∇f̂_t(x_t), where f̂_t(x) := V f̃_t(x) + Φ′(CCV(t)) g̃_t(x), t ≥ 1.
9:   x_{t+1} = P_X(x_t − η_t ∇_t), where

$$\eta_t = \begin{cases} \dfrac{\sqrt{2}\,D}{2\sqrt{\sum_{\tau=1}^{t} \|\nabla_\tau\|^2}}, & \text{for convex costs,} \\[2ex] \dfrac{1}{\sum_{s=1}^{t} H_s}, & \text{for strongly convex costs } (H_t \text{ is the strong convexity parameter of } f_t). \end{cases}$$

10: EndFor
using Algorithm 1's definition, it follows that for all t ≤ t⋆, x_t < a/2, since the derivative of f dominates the derivative of Φ′(CCV(t))g(x) until then.

Since Φ(x) = exp(λx) − 1 with λ = 1/(2√T), we have Φ′(CCV(t)) = λ exp(λ CCV(t)), and since |∇g(x)| ≤ 1 for all x, the dominance can end only once exp(λ CCV(t)) > c/λ, i.e., once CCV(t) > (1/λ) ln(c/λ) = 2√T ln(2c√T). Thus, by time t⋆, CCV_{[1:t⋆]} = Ω(√T log T). Therefore, CCV_{[1:T]} = Ω(√T log T). □
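For concreteness, the update of Algorithm 1 in the convex-cost setting can be sketched as follows; this is a minimal NumPy sketch on scalar actions over an interval, with the gradient oracles f_grad and g_grad being assumed inputs for illustration.

```python
import numpy as np

def algorithm1(x1, f_grad, g, g_grad, T, G, D, lo, hi):
    """Sketch of Algorithm 1, convex-cost setting: beta = (2GD)^(-1), V = 1,
    lam = 1/(2*sqrt(T)), Phi(x) = exp(lam*x) - 1, actions on X = [lo, hi]."""
    beta, lam = 1.0 / (2 * G * D), 1.0 / (2 * np.sqrt(T))
    x, ccv, grad_sq_sum = x1, 0.0, 0.0
    for t in range(1, T + 1):
        ccv += beta * max(g(x), 0.0)                  # CCV(t) on the scaled g~_t
        phi_prime = lam * np.exp(lam * ccv)           # Phi'(CCV(t))
        # nabla_t = grad of f_hat_t = V*f~_t + Phi'(CCV(t))*g~_t at x_t;
        # g~_t = beta*max(0, g_t) has zero gradient where g_t <= 0
        grad = beta * f_grad(x) + (phi_prime * beta * g_grad(x) if g(x) > 0 else 0.0)
        grad_sq_sum += grad ** 2
        eta = np.sqrt(2) * D / (2 * np.sqrt(grad_sq_sum))   # adaptive step size
        x = min(max(x - eta * grad, lo), hi)          # Euclidean projection onto X
    return x, ccv / beta                              # last action and unscaled CCV
```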
Essentially, Algorithm 1 treats minimizing the CCV as a regret minimization problem for the function g, similar to the function f, and this leads to its CCV of Ω(√T log T). For any given input instance with d = 1, an alternate algorithm that chooses its actions following online gradient descent (OGD) projected onto the most recently revealed feasible set S_t achieves O(√T) regret (irrespective of the starting action x_1) and O(D) CCV (since any x⋆ ∈ S_t for all t). We extend this intuition in the next section, and present an algorithm that tries to exploit the geometry of the nested convex sets S_t for general d.
Remark 1 Step 6 of Algorithm 2 might appear unnecessary; however, it is useful for proving Theorem 16.
Algorithm 2 Online Algorithm for COCO
1: Input: Sequence of convex cost functions {f_t}_{t=1}^T and constraint functions {g_t}_{t=1}^T, G = a common Lipschitz constant, d = dimension of the admissible set X, step size η_t = D/(G√t), D = Euclidean diameter of the admissible set X, P_X(·) = Euclidean projection operator on the set X
2: Initialization: Set x_1 ∈ X arbitrarily, CCV(0) = 0.
3: For t = 1 : T
4:   Play x_t, observe f_t, g_t, incur a cost of f_t(x_t) and constraint violation of (g_t(x_t))_+
5:   Set S_t as defined in (5)
6:   y_t = P_{S_{t−1}}(x_t − η_t ∇f_t(x_t))
7:   x_{t+1} = P_{S_t}(y_t)
8: EndFor
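The following is a minimal sketch of Algorithm 2, instantiated on nested Euclidean balls, a hypothetical instance with g_t(x) = ‖x − c‖ − r_t for which each P_{S_t} has a closed form; for general S_t a projection oracle is assumed instead.

```python
import numpy as np

def proj_ball(x, c, r):
    """Euclidean projection onto the ball B(c, r)."""
    v = x - c
    n = np.linalg.norm(v)
    return x if n <= r else c + (r / n) * v

def algorithm2(x1, grads, radii, G, D, c):
    """Sketch of Algorithm 2 with S_t = B(c, min_{tau<=t} r_tau) (nested balls).
    grads[t-1](x) returns grad f_t(x); radii[t-1] is r_t, revealed after x_t is played."""
    x, r, xs = np.asarray(x1, float), np.inf, []
    for t, (grad_f, r_t) in enumerate(zip(grads, radii), start=1):
        xs.append(x)                         # play x_t, then observe f_t, g_t
        eta = D / (G * np.sqrt(t))           # step size eta_t = D / (G sqrt(t))
        y = x - eta * grad_f(x)              # OGD step on f_t
        if np.isfinite(r):
            y = proj_ball(y, c, r)           # y_t = P_{S_{t-1}}(x_t - eta_t grad f_t(x_t))
        r = min(r, r_t)                      # g_t revealed: S_t shrinks
        x = proj_ball(y, c, r)               # x_{t+1} = P_{S_t}(y_t)
    return xs
```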
The extension of Lemma 7 to the case when the f_t's are strongly convex, which results in Regret_{[1:T]} = O(log T) for Algorithm 2, follows standard arguments Hazan (2012) and is omitted.

The real challenge is to bound the total CCV for Algorithm 2. Let x_t be the action played by Algorithm 2. Then by definition, x_t ∈ S_{t−1}. Moreover, from (6), the constraint violation at time t is at most G·dist(x_t, S_t). The next action x_{t+1} chosen by Algorithm 2 belongs to S_t; however, it is obtained by first taking an OGD step from x_t to reach y_t and then projecting y_t onto S_t. Since the f_t's are arbitrary, the OGD step could be in any direction, and thus there is no direct relationship between x_{t+1} and x_t. Informally, (x_1, x_2, …, x_T) is not a connected curve with any useful property. Thus, we take recourse in upper bounding the CCV via upper bounding the total movement cost M (defined below) between nested convex sets using projections.
The total constraint violation for Algorithm 2 is

$$\mathrm{CCV}_{[1:t]} \le G \sum_{\tau=1}^{t} \mathrm{dist}(x_\tau, S_\tau) \overset{(a)}{\le} G \sum_{\tau=1}^{t} \|x_\tau - b_\tau\| \tag{7}$$
$$\overset{(b)}{=} G M_t, \tag{8}$$

where in (a), b_t is the projection of x_t onto S_t, i.e., b_t = P_{S_t}(x_t), and in (b),

$$M_t = \sum_{\tau=1}^{t} \|x_\tau - b_\tau\| \tag{9}$$

is defined to be the total movement cost on the instance S_1, …, S_t. The object of interest is M_T.
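The movement cost (9) is directly computable from the iterates once per-round projections are available; a small sketch, where proj_S is an assumed projection oracle onto S_τ (for the nested-ball instance above, proj_S would be proj_ball with the running radius):

```python
import numpy as np

def movement_cost(xs, proj_S):
    """M_t of (9): the sum over tau of ||x_tau - b_tau|| with b_tau = P_{S_tau}(x_tau).
    proj_S(tau, x) is an assumed projection oracle onto S_tau (tau is 1-indexed)."""
    return sum(np.linalg.norm(x - proj_S(tau, x)) for tau, x in enumerate(xs, start=1))

# By (8), the CCV of Algorithm 2 up to time t is at most G * movement_cost(xs[:t], proj_S).
```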
Figure 1: Figure representing the cone Cwt (ct ) that contains the convex hull of mt and St with unit vector wt .
Lemma 9 If all nested convex bodies S_1 ⊇ S_2 ⊇ ⋯ ⊇ S_T are cuboids that are axis-parallel to each other, then M ≤ d^{3/2} D.

The proof is identical to that of Lemma 8. Note that similar results can be obtained when the S_t's are regular polygons that are axis-parallel with each other.
After exhausting the universal results for an upper bound on MT for ‘nice’ nested convex bodies, we next
give a general bound on MT for any sequence of nested convex bodies which depends on the geometry of the
nested convex bodies (instance dependent). To state the result we need the following preliminaries.
Following (7), b_t = P_{S_t}(x_t), where x_t ∈ ∂S_{t−1}. Without loss of generality, x_t ∉ S_t, since otherwise the distance ‖x_t − b_t‖ = 0. Let m_t be the mid-point of x_t and b_t, i.e., m_t = (x_t + b_t)/2.
Definition 10 Let the convex hull of m_t ∪ S_t be C_t. Let w_t be a unit vector such that there exists c_t > 0 such that the cone

$$C_{w_t}(c_t) = \left\{ z \in \mathbb{R}^d : -w_t^T \frac{(z - m_t)}{\|z - m_t\|} \ge c_t \right\}$$

contains C_t. Since S_t is convex, such w_t, c_t > 0 exist. For example, w_t = b_t − x_t is one such choice, for which c_t > 0 since m_t ∉ S_t. See Fig. 1 for a pictorial representation.
Let c⋆_{w_t,t} = arg min_{c_t} C_{w_t}(c_t), c⋆_t = min_{w_t} c⋆_{w_t,t}, and w⋆_t = arg min_{w_t} c⋆_{w_t,t}. Moreover, let c⋆ = min_t c⋆_t, where by definition c⋆ < 1.

Essentially, 2 cos^{−1}(c⋆_t) is the angle width of C_t with respect to w⋆_t, i.e., each element of C_t makes an angle of at most cos^{−1}(c⋆_t) with w⋆_t.
Remark 2 Note that c⋆_t is only a function of the distance ‖x_t − b_t‖ and the shape of the S_t's, in particular, the maximum width of S_t along the directions perpendicular to the vector x_t − b_t for all t, which can be at most the diameter D. c⋆_t decreases (increasing the 'width' of the cone C_{w⋆_t}(c⋆_t)) as ‖x_t − b_t‖ decreases, but a small ‖x_t − b_t‖ also implies a small violation at time t from (7). Across time slots, d_min = min_t ‖x_t − b_t‖ and the shape of the S_t's control c⋆, where d_min > 0 is inherent from the definition of c⋆, since a bound on ‖x_t − b_t‖ is only needed for the case when x_t ≠ b_t.
Remark 3 Projecting x_t ∈ ∂S_{t−1} onto S_t to get b_t = P_{S_t}(x_t), the diameter of S_t is at most the diameter of S_{t−1} minus ‖x_t − b_t‖, however, only along the direction b_t − x_t. Since the shape of S_t is arbitrary, the diameter of S_t need not be smaller than the diameter of S_{t−1} along any pre-specified direction, which was the main idea used to derive Lemma 8. Thus, to relate the distance ‖x_t − b_t‖ with the decrease in the diameter of the convex bodies S_t, we use the concept of the mean width of a convex body, defined as the expected width of the convex body along directions chosen uniformly at random (the formal definition is provided in Definition 18).
Next, we upper bound M_T by connecting the distance ‖x_t − b_t‖ to the decrease in the mean width (to be defined) of the convex bodies S_{t−1} and S_t.
Compared to all prior results on COCO, which were universal (instance independent), where the best known one, Sinha and Vaze (2024), has Regret_{[1:T]} = O(√T) and CCV_{[1:T]} = O(√T log T), Theorem 12 is an instance-dependent result for the CCV. In particular, it exploits the geometric structure of the nested convex sets S_t and derives an upper bound on the CCV that depends only on the 'shape' of the S_t's. It can be the case that the instance is 'badly' behaved and c⋆ is very small or dependent on T. If that is the case, in Section 6 we show how to limit the CCV to O(√T log T). However, when the S_t's are 'nice', e.g., c⋆ is independent of T (Remark 2) or the S_t's are spheres or axis-parallel cuboids (Lemmas 8 and 9), the CCV of Algorithm 2 is independent of T, which is a fundamentally improved result compared to the large body of prior work. In fact, in prior work this was largely assumed to be not possible. In particular, before the result of Sinha and Vaze (2024), achieving simultaneous Regret_{[1:T]} = O(√T) and CCV_{[1:T]} = O(√T) was itself the final goal.
6 Algorithm Switch
Since Theorem 12 provides an instance-dependent bound on the CCV that is a function of c⋆, which can be small, it can be the case that its CCV is larger than O(√T log T), thus providing a result that is inferior to that of Algorithm 1 of Sinha and Vaze (2024). Thus, next, we marry the two algorithms, Algorithm 1 and Algorithm 2, in Algorithm 3 to provide a best-of-both result as follows.
Theorem 13 Switch (Algorithm 3) has regret Regret_{[1:T]} = O(√T), while

$$\mathrm{CCV}_{[1:T]} = \min\left\{ O\!\left(\sqrt{d}\left(\frac{1}{c^\star}\right)^{d} D\right),\; O(\sqrt{T}\log T) \right\}.$$
Algorithm Switch should be understood as a best-of-two-worlds algorithm. The two worlds correspond to the following: in one, the convex sets S_t are nice and Algorithm 2 has a CCV independent of T or o(√T);
Algorithm 3 Switch
1: Input: Sequence of convex cost functions {f_t}_{t=1}^T and constraint functions {g_t}_{t=1}^T, G = a common Lipschitz constant, d = dimension of the admissible set X, D = Euclidean diameter of the admissible set X, P_X(·) = Euclidean projection operator on the set X
2: Initialization: Set x_1 ∈ X arbitrarily, CCV(0) = 0.
3: For t = 1 : T
4:   If CCV(t − 1) ≤ √T log T
5:     Follow Algorithm 2
6:     CCV(t) = CCV(t − 1) + max{g_t(x_t), 0}
7:   Else
8:     Follow Algorithm 1 with resetting CCV(t − 1) = 0
9:   EndIf
10: EndFor
in the other, the CCV of Algorithm 2 is large on its own, and the overall CCV is controlled by discontinuing the use of Algorithm 2 once its CCV reaches √T log T and switching to Algorithm 1 thereafter, which has a universal guarantee of O(√T log T) on its CCV.
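The switching rule itself is only a few lines; a minimal sketch follows, with the two algorithms abstracted as stateful step callables (assumed interfaces, not part of the pseudocode above).

```python
import numpy as np

def switch(T, alg2_step, alg1_step, g):
    """Sketch of Switch (Algorithm 3): follow Algorithm 2 while the running CCV is at
    most sqrt(T)*log(T); afterwards reset the CCV and follow Algorithm 1.
    alg2_step(t) / alg1_step(t) return the action x_t; g(t, x) evaluates g_t(x)."""
    threshold = np.sqrt(T) * np.log(T)
    ccv, switched = 0.0, False
    for t in range(1, T + 1):
        if not switched and ccv > threshold:
            switched, ccv = True, 0.0          # discontinue Algorithm 2, reset CCV
        x = alg1_step(t) if switched else alg2_step(t)
        ccv += max(g(t, x), 0.0)               # CCV(t) = CCV(t-1) + (g_t(x_t))_+
    return ccv
```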
After exhausting the general results on the CCV of Algorithm 2, we next consider the special case of d = 2
and when the sets St have a special structure defined by their projection hyperplanes. Note that it is highly
non-trivial to bound the CCV of Algorithm 2 even when d = 2.
7 Special case of d = 2
In this section, we show that if d = 2 (all convex sets S_t lie in a plane) and the projections satisfy a monotonicity property depending on the problem instance, then we can bound the total CCV of Algorithm 2 independently of the time horizon T, consequently obtaining an O(1) CCV.
Recall from the definition of Algorithm 2 that y_t = P_{S_{t−1}}(x_t − η_t ∇f_t(x_t)) and x_{t+1} = P_{S_t}(y_t).
Definition 14 Let the hyperplane perpendicular to the line segment (y_t, x_{t+1}) passing through x_{t+1} be F_t. Without loss of generality, we let y_t ∉ S_t, since otherwise the projection is trivial. Essentially, F_t is the projection hyperplane at time t. Let H_t^+ denote the positive half-plane corresponding to F_t, i.e., H_t^+ = {z : z^T(y_t − x_{t+1}) ≥ 0}. Refer to Fig. 2. Let the angle between F_1 and F_t be θ_t.
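In d = 2, F_t is simply the line through x_{t+1} with normal y_t − x_{t+1}, and θ_t can be read off the normals; a small sketch under exactly these definitions:

```python
import numpy as np

def normal_of_F(y_t, x_next):
    """Unit normal of F_t: the hyperplane through x_{t+1} perpendicular to the
    segment (y_t, x_{t+1})."""
    n = np.asarray(y_t, float) - np.asarray(x_next, float)
    return n / np.linalg.norm(n)

def angle_between(n1, nt):
    """theta_t: the angle between F_1 and F_t, measured between their unit normals."""
    return float(np.arccos(np.clip(np.dot(n1, nt), -1.0, 1.0)))
```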
8 OCS Problem
In Sinha and Vaze (2024), a special case of COCO, called the OCS problem, was introduced, where f_t ≡ 0 for all t. Essentially, with OCS, only constraint satisfaction is the objective. In Sinha and Vaze (2024), Algorithm 1 was shown to have a CCV of O(√T log T). Next, we show that Algorithm 2 has a CCV of O(1) for OCS, a remarkable improvement.

Figure 2: Definition of F_t's.
Theorem 17 For solving OCS, Algorithm 2 has CCV_{[1:T]} = O(d^{d/2} D).
As discussed in Sinha and Vaze (2024), there are important applications of OCS, and it is important to find tight bounds on its CCV. Theorem 17 achieves this by showing that a CCV of O(1) can be achieved, where the constant depends only on the dimension of the action space and the diameter. This is a fundamental improvement compared to the CCV bound of O(√T log T) from Sinha and Vaze (2024). Theorem 17 is derived by using the connection between the curve obtained by successive projections on nested convex sets and self-expanded curves (Definition 24), and then using a classical result on self-expanded curves from Manselli and Pucci (1991).
9 Conclusions
One fundamental open question for COCO is whether it is possible to simultaneously achieve Regret_{[1:T]} = O(√T) and CCV_{[1:T]} = o(√T) or CCV_{[1:T]} = O(1). In this paper, we have made substantial progress towards answering this question by proposing an algorithm that exploits the geometric properties of the nested convex sets S_t that effectively control the CCV. The state-of-the-art algorithm, Sinha and Vaze (2024), achieves a CCV of Ω(√T log T) even for very simple instances, as shown in Lemma 6, and conceptually different algorithms were needed to achieve a CCV of o(√T). We propose one such algorithm and show that when the nested convex constraint sets are 'nice' (the instance is simple), achieving a CCV of O(1) is possible without losing the O(√T) regret guarantee. We also derived a bound on the CCV for general problem instances, as a function of the shape of the nested convex constraint sets, the distance between them, and the diameter.

In the absence of good lower bounds, the open question remains open in general; however, this paper significantly improves the conceptual understanding of the COCO problem by demonstrating that good algorithms need to exploit the geometry of the nested convex constraint sets.
One remark we want to make at the end is that COCO is inherently a difficult problem. This is best exemplified by the fact that even for the special case of COCO where f_t = f for all t, essentially where f is known ahead of time, our algorithm/prior work does not yield a better regret or CCV bound compared to when the f_t's are arbitrarily varying.
References
CJ Argue, Sébastien Bubeck, Michael B Cohen, Anupam Gupta, and Yin Tat Lee. A nearly-linear bound for
chasing nested convex bodies. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete
Algorithms, pages 117–122. SIAM, 2019.
Nikhil Bansal, Martin Böhm, Marek Eliáš, Grigorios Koumoutsos, and Seeun William Umboh. Nested convex
bodies are chaseable. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete
Algorithms, pages 1253–1260. SIAM, 2018.
Sébastien Bubeck, Bo’az Klartag, Yin Tat Lee, Yuanzhi Li, and Mark Sellke. Chasing nested convex bodies
nearly optimally. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms,
pages 1496–1508. SIAM, 2020.
Xuanyu Cao and KJ Ray Liu. Online convex optimization with time-varying constraints and bandit feedback. IEEE Transactions on Automatic Control, 64(7):2665–2680, 2018.
Tianyi Chen and Georgios B Giannakis. Bandit convex optimization for scalable and dynamic IoT management. IEEE Internet of Things Journal, 6(1):1276–1286, 2018.
Harold Gordon Eggleston. Convexity, 1966.
Hengquan Guo, Xin Liu, Honghao Wei, and Lei Ying. Online convex optimization with hard constraints:
Towards the best of two worlds and beyond. Advances in Neural Information Processing Systems, 35:
36426–36439, 2022.
Elad Hazan. The convex optimization approach to regret minimization. Optimization for machine learning,
page 287, 2012.
Rodolphe Jenatton, Jim Huang, and Cédric Archambeau. Adaptive algorithms for online convex optimization
with long-term constraints. In International Conference on Machine Learning, pages 402–411. PMLR,
2016.
Nikolaos Liakopoulos, Apostolos Destounis, Georgios Paschos, Thrasyvoulos Spyropoulos, and Panayotis
Mertikopoulos. Cautious regret minimization: Online optimization with long-term budget constraints. In
International Conference on Machine Learning, pages 3944–3952. PMLR, 2019.
Qingsong Liu, Wenfei Wu, Longbo Huang, and Zhixuan Fang. Simultaneously achieving sublinear regret
and constraint violations for online convex optimization with time-varying constraints. ACM SIGMETRICS
Performance Evaluation Review, 49(3):4–5, 2022.
Mehrdad Mahdavi, Rong Jin, and Tianbao Yang. Trading regret for efficiency: online convex optimization
with long term constraints. The Journal of Machine Learning Research, 13(1):2503–2528, 2012.
Paolo Manselli and Carlo Pucci. Maximum length of steepest descent curves for quasi-convex functions.
Geometriae Dedicata, 38(2):211–227, 1991.
Michael J Neely. Stochastic network optimization with application to communication and queueing systems.
Synthesis Lectures on Communication Networks, 3(1):1–211, 2010.
Michael J Neely and Hao Yu. Online convex optimization with time-varying constraints. arXiv preprint
arXiv:1702.04783, 2017.
Abhishek Sinha and Rahul Vaze. Optimal algorithms for online convex optimization with adversarial con-
straints. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL
https://openreview.net/forum?id=TxffvJMnBy.
Wen Sun, Debadeepta Dey, and Ashish Kapoor. Safety-aware algorithms for adversarial contextual bandit. In
International Conference on Machine Learning, pages 3280–3288. PMLR, 2017.
Rahul Vaze. On dynamic regret and constraint violations in constrained online convex optimization. In 2022
20th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks
(WiOpt), pages 9–16, 2022. doi: 10.23919/WiOpt56218.2022.9930613.
Xinlei Yi, Xiuxian Li, Tao Yang, Lihua Xie, Tianyou Chai, and Karl Johansson. Regret and cumulative
constraint violation analysis for online convex optimization with long term constraints. In International
Conference on Machine Learning, pages 11998–12008. PMLR, 2021.
Xinlei Yi, Xiuxian Li, Tao Yang, Lihua Xie, Yiguang Hong, Tianyou Chai, and Karl H Johansson. Distributed online convex optimization with adversarial constraints: Reduced cumulative constraint violation bounds under Slater's condition. arXiv preprint arXiv:2306.00149, 2023.
Hao Yu and Michael J Neely. A low complexity algorithm with O(√T) regret and O(1) constraint violations for online convex optimization with long term constraints. arXiv preprint arXiv:1604.02218, 2016.
Hao Yu, Michael Neely, and Xiaohan Wei. Online convex optimization with stochastic constraints. Advances
in Neural Information Processing Systems, 30, 2017.
Jianjun Yuan and Andrew Lamperski. Online convex optimization for cumulative constraints. Advances in
Neural Information Processing Systems, 31, 2018.
10 Proof of Lemma 7
Proof: From the convexity of the f_t's, for x⋆ satisfying Assumption 3, we have f_t(x_t) − f_t(x⋆) ≤ ∇f_t(x_t)^T(x_t − x⋆). Moreover,

$$\|x_{t+1} - x^\star\|^2 \overset{(a)}{\le} \|y_t - x^\star\|^2 \overset{(b)}{\le} \|x_t - x^\star\|^2 + \eta_t^2 \|\nabla f_t(x_t)\|^2 - 2\eta_t \nabla f_t^T(x_t)(x_t - x^\star),$$

where inequalities (a) and (b) follow from the non-expansiveness of Euclidean projection, since x⋆ ∈ S_t for all t. Hence

$$2\nabla f_t^T(x_t)(x_t - x^\star) \le \frac{\|x_t - x^\star\|^2 - \|x_{t+1} - x^\star\|^2}{\eta_t} + \eta_t G^2.$$

Summing this over t = 1 to T, we get

$$2\sum_{t=1}^{T}(f_t(x_t) - f_t(x^\star)) \le 2\sum_{t=1}^{T} \nabla f_t^T(x_t)(x_t - x^\star) \le \sum_{t=1}^{T} \frac{\|x_t - x^\star\|^2 - \|x_{t+1} - x^\star\|^2}{\eta_t} + G^2 \sum_{t=1}^{T} \eta_t \le \frac{1}{\eta_T} D^2 + G^2 \sum_{t=1}^{T} \eta_t \le O(DG\sqrt{T}),$$

where the last inequality uses η_t = D/(G√t).
11 Proof of Theorem 11
Proof: We need the following preliminaries.
Definition 18 Let K be a non-empty convex bounded set in R^d. Let u be a unit vector, and ℓ_u a line through the origin parallel to u. Let K_u be the orthogonal projection of K onto ℓ_u, with length |K_u|. The mean width of K is defined as

$$W(K) = \frac{1}{V_d} \int_{S_1^d} |K_u| \, du, \tag{10}$$

where S_1^d is the unit sphere in d dimensions and V_d its (d−1)-dimensional Lebesgue measure.
Lemma 19 (Eggleston (1966)) For d = 2,

$$W(K) = \frac{\mathrm{Perimeter}(K)}{\pi}.$$

Lemma 19 implies that W(K) ≠ W(K_1) + W(K_2) even if K_1 ∪ K_2 = K and K_1 ∩ K_2 = ∅.
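Lemma 19 is easy to sanity-check numerically: below is a Monte Carlo estimate of (10) for a convex polygon in d = 2, with |K_u| computed from vertex projections; for the unit square, both the estimate and Perimeter(K)/π equal 4/π ≈ 1.2732.

```python
import numpy as np

def mean_width_2d(vertices, n=200_000, seed=0):
    """Monte Carlo estimate of the mean width (10) of a convex polygon in d = 2:
    the average, over uniform unit directions u, of the projection length |K_u|."""
    theta = np.random.default_rng(seed).uniform(0.0, 2 * np.pi, n)
    u = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # uniform unit vectors
    proj = u @ np.asarray(vertices, float).T               # vertex projections onto u
    return float(np.mean(proj.max(axis=1) - proj.min(axis=1)))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(mean_width_2d(square), 4 / np.pi)   # both approx 1.2732 = Perimeter(K)/pi
```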
Recall from (7) that x_t ∈ ∂S_{t−1} and b_t is the projection of x_t onto S_t, and m_t is the mid-point of x_t and b_t, i.e., m_t = (x_t + b_t)/2. Moreover, the convex sets S_t are nested, i.e., S_1 ⊇ S_2 ⊇ ⋯ ⊇ S_T. To prove Theorem 11 we will bound the rate at which W(S_t) (Definition 18) decreases as a function of the length ‖x_t − b_t‖.

From Definition 10, recall that C_t is the convex hull of m_t ∪ S_t. We also need to define C_t^− as the convex hull of x_t ∪ S_t. Since S_t ⊆ C_t and C_t^− ⊆ S_{t−1} (since S_{t−1} is convex and x_t ∈ S_{t−1}), we have
Figure 3: Figure representing the cone C_{w_t}(c_t) that contains the convex hull of m_t and S_t with respect to the unit vector w_t. u is a unit vector perpendicular to H_u, a supporting hyperplane of C_t at m_t such that C_t ∩ H_u = {m_t} and u^T(x_t − m_t) ≥ 0.
Let S_⊥ = {u_⊥ : |u_⊥| = 1, u_⊥^T w⋆_t = 0}, and let du_⊥ be the (d−2)-dimensional Lebesgue measure on S_⊥. It is easy to verify that du = λ^{d−2}(1 − λ²)^{−1/2} dλ du_⊥, and hence from (14),

$$\Delta_t \le -\frac{\|x_t - b_t\|}{V_d} \int_0^{c_t^\star} \lambda^{d-2}(1-\lambda^2)^{-1/2} \, d\lambda \int_{S_\perp} \big(\lambda u_\perp + \sqrt{1-\lambda^2}\, w_t^\star\big)^T \frac{(x_t - m_t)}{\|x_t - m_t\|} \, du_\perp. \tag{15}$$

Note that $\int_{S_\perp} u_\perp \, du_\perp = 0$. Thus,

$$\Delta_t = -\frac{\|x_t - b_t\|}{2V_d} \frac{(w_t^\star)^T(x_t - m_t)}{\|x_t - m_t\|} \int_0^{c_t^\star} \lambda^{d-2}(1-\lambda^2)^{-1/2}\sqrt{1-\lambda^2} \, d\lambda \int_{S_\perp} du_\perp$$
$$\overset{(a)}{\le} -V_{d-1} \frac{\|x_t - b_t\|}{2V_d} \frac{(w_t^\star)^T(x_t - m_t)}{\|x_t - m_t\|} \int_0^{c_t^\star} \lambda^{d-2} \, d\lambda$$
$$\overset{(b)}{\le} -V_{d-1} \frac{\|x_t - b_t\|}{2V_d(d-1)} c_t^\star (c_t^\star)^{d-1} = -V_{d-1} \frac{\|x_t - b_t\|}{2V_d(d-1)} (c_t^\star)^d, \tag{16}$$

where (a) follows since $\int_{S_\perp} du_\perp = V_{d-1}$ by definition, and (b) follows since $\frac{(w_t^\star)^T(x_t - m_t)}{\|x_t - m_t\|} \ge c_t^\star$ from Definition 10. □
12 Proof of Theorem 13
Proof: Since CCV(t) is a monotone non-decreasing function, let t_min be the largest time until which Algorithm 2 is followed by Switch. The regret guarantee is easy to prove. From Theorem 12, the regret until time t_min is at most O(√t_min). Moreover, starting from time t_min till T, from Theorem 5, the regret of Algorithm 1 is at most O(√(T − t_min)). Thus, the overall regret for Switch is at most O(√T).

For the CCV, with Switch, until time t_min, CCV(t_min) ≤ √T log T. At time t_min, Switch starts to use Algorithm 1 (with CCV(t_min) reset to 0), which has the following appealing property from (8) of Sinha and Vaze (2024): for any t ≥ t_min,

$$\Phi(\mathrm{CCV}(t)) + \mathrm{Regret}_t(x^\star) \le \sqrt{\sum_{\tau=t_{\min}}^{t} \big(\Phi'(\mathrm{CCV}(\tau))\big)^2} + \sqrt{t - t_{\min}}, \tag{17}$$

where β = (2GD)^{−1}, V = 1, λ = 1/(2√T), and Φ(x) = exp(λx) − 1. We trivially have Regret_t(x⋆) ≥ −Dt/(2D) = −t/2, since the scaled cost functions are (2D)^{−1}-Lipschitz and ‖x_t − x⋆‖ ≤ D. Hence, from (17), we have that for λ = 1/(2√T) and any t ≥ t_min,

$$\mathrm{CCV}_{[t_{\min}:T]} \le 4GD \ln\big(2\sqrt{1+2T}\big)\sqrt{T}.$$

Since, as argued before, with Switch, CCV(t_min) ≤ √T log T, we get that CCV_{[1:T]} ≤ O(√T log T). □
Definition 24 A curve γ : I → R^d is called self-expanded if, for every t where γ′(t) exists, we have ⟨γ′(t), γ(t) − γ(u)⟩ ≥ 0 for all u ∈ I with u ≤ t, where ⟨·, ·⟩ denotes the inner product. In words, a curve γ starting at a point x_0 is self-expanded if, for every x ∈ γ at which the tangent line T exists, the arc (sub-curve) (x_0, x) is contained in one of the two half-spaces bounded by the hyperplane through x orthogonal to T.
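For a piecewise-linear curve, the defining inequality only needs to be checked against earlier vertices, since it is linear in γ(u) and, along a segment, is tightest at the segment's start; a minimal sketch of such a check:

```python
import numpy as np

def is_self_expanded(points, tol=1e-9):
    """Check <gamma'(t), gamma(t) - gamma(u)> >= 0 for all u <= t on a polyline.
    The condition is linear in gamma(u), so earlier vertices suffice, and on
    segment i it is tightest at the segment's start points[i]."""
    pts = np.asarray(points, float)
    for i in range(len(pts) - 1):
        direction = pts[i + 1] - pts[i]                    # gamma'(t) on segment i
        if np.any((pts[i] - pts[: i + 1]) @ direction < -tol):
            return False
    return True

print(is_self_expanded([(0, 0), (1, 0), (1, 1), (0.5, 1.5)]))   # True
```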
For self-expanded curves the following classical result is known.
Theorem 25 (Manselli and Pucci (1991)) For any self-expanded curve γ belonging to a closed bounded convex set of R^d with diameter D, its total length is at most O(d^{d/2} D).
Proof: [Proof of Lemma 23] From Definition 22, the projection curve is σ = (σ_1, σ_2), (σ_2, σ_3), …, (σ_{T−1}, σ_T), where σ_t = P_{K_t}(σ_{t−1}). Let the reverse curve be r = {r_t}_{t=0,…,T−2}, where r_t = (σ_{T−t}, σ_{T−t−1}); thus we are reading σ backwards and calling it r. Note that since σ_t is the projection of σ_{t−1} on K_t, each piecewise-linear segment (σ_t, σ_{t+1}) is a straight line and hence differentiable except at the end points. Moreover, since each σ_t is obtained by projecting σ_{t−1} onto K_t and K_{t+1} ⊆ K_t, the projection hyperplane F_t that passes through σ_t = P_{K_t}(σ_{t−1}) and is perpendicular to σ_t − σ_{t−1} separates the two sub-curves {(σ_1, σ_2), (σ_2, σ_3), …, (σ_{t−1}, σ_t)} and {(σ_t, σ_{t+1}), (σ_{t+1}, σ_{t+2}), …, (σ_{T−1}, σ_T)}.

Thus, we have that for each segment r_τ, at each point where it is differentiable, the curve r_1, …, r_{τ−1} lies on one side of the hyperplane that passes through the point and is perpendicular to r_{τ+1}. Thus, we conclude that the curve r is self-expanded.

As a result, Theorem 25 implies that the length of r is at most O(d^{d/2} diameter(K_1)), and the result follows since the length of r is the same as that of σ, which is Σ. □
14 Proof of Theorem 17
Proof: Clearly, with f_t ≡ 0 for all t, with Algorithm 2, y_t = x_t and the successive x_t's satisfy x_{t+1} = P_{S_t}(x_t). Thus, essentially, the curve x = (x_1, x_2), (x_2, x_3), …, (x_{T−1}, x_T) formed by Algorithm 2 for OCS is a projection curve (Definition 22) on S_1 ⊇ ⋯ ⊇ S_T, and the result follows from Lemma 23 and the fact that diameter(S_1) ≤ D. □
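As a numerical sanity check of Theorem 17 on a 'nice' instance: for OCS, Algorithm 2 reduces to x_{t+1} = P_{S_t}(x_t), and on nested balls (a hypothetical instance) the total movement, which upper bounds CCV/G, stays bounded by a constant independent of T.

```python
import numpy as np

def ocs_total_movement(x1, c, radii):
    """Projection curve x_{t+1} = P_{S_t}(x_t) for OCS (f_t = 0) on nested balls
    S_t = B(c, min_{tau<=t} r_tau); returns the curve length (= M_T of (9))."""
    def proj_ball(x, c, r):
        v = x - c
        n = np.linalg.norm(v)
        return x if n <= r else c + (r / n) * v
    x, r, total = np.asarray(x1, float), np.inf, 0.0
    for r_t in radii:
        r = min(r, r_t)
        x_next = proj_ball(x, c, r)
        total += np.linalg.norm(x_next - x)   # per-round movement
        x = x_next
    return total

T = 10_000
print(ocs_total_movement([2.0, 0.0], np.zeros(2), 1.0 / np.sqrt(np.arange(1, T + 1))))
# ~1.99, bounded by ||x_1 - c|| regardless of T
```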
15 Proof of Theorem 16
Proof: Recall that d = 2, and recall the definition of F_t from Definition 14. Let the center be c = P_{S_1}(x_1). Let t_orth be the earliest t for which ∠(F_t, F_1) = π.
Initialize κ = 1, s(1) = 1, τ(1) = 1.
BeginProcedure Step 1: Definition of Phase κ. Consider
Figure 4: Figure corresponding to Example 26.
Example 26 To better understand the definition of phases, consider Fig. 4, where the largest t for which the angle between F_t and F_1 is at most π/4 is 3. Thus, τ(1) = 3, i.e., phase 1 explores till time t = 3 and phase 1 ends. The starting hyperplane to consider in phase 2 is s(2) = 3, and given that the angle between F_3 and the next hyperplane F_4 is more than π/4, phase 2 is empty and ends by exploring till t = 4. The starting hyperplane to consider in phase 3 is s(3) = 4, and the process goes on. The first time t such that the angle between F_1 and F_t is π is t = 6; thus t_orth = 6, and the process stops at time t = 6. This also implies that S_6 ⊂ F_1. Since the S_t's are nested, for all t ≥ 6, S_t ⊂ F_1. Hence the total CCV after t ≥ t_orth is at most GD.
The main idea in defining phases is to partition the whole space into empty and non-empty regions, where in each non-empty region the starting and ending hyperplanes have an angle of at most π/4, while in an empty phase the starting and ending hyperplanes have an angle of at least π/4. Thus, we get the following simple result.
The proof is immediate from the definition of the phases, since any consecutively occurring non-empty and empty phase pair exhausts an angle of at least π/4.
Remark 4 Since we are in d = 2 dimensions, for all t ≥ torth , the movement is along the hyperplane F1 and
thus the resulting constraint violation after time t ≥ torth is at most GD. Thus, in the phase definition above,
we have only considered time till torth and we only need to upper bound the CCV till time torth .
Definition 28 With respect to the quantities defined for Algorithm 2, for a non-empty phase κ, let

$$r_{\max}(\kappa) = \max_{s(\kappa) < t \le \tau(\kappa)} \|y_t - c\| \quad \text{and} \quad t^\star(\kappa) = \arg\max_{s(\kappa) < t \le \tau(\kappa)} \|y_t - c\|.$$

t⋆(κ) is the time index belonging to phase κ for which y_t is the farthest.
Definition 29 A non-empty phase κ consists of the time slots T(κ) = [τ(κ−1), τ(κ)], and the angle ∠(F_{t_1}, F_{t_2}) ≤ π/4 for all t_1, t_2 ∈ T(κ). Using Definition 28, we partition T(κ) as T(κ) = T^−(κ) ∪ T^+(κ), where T^−(κ) = [τ(κ−1)+1, t⋆(κ)+1] and T^+(κ) = [t⋆(κ)+2, τ(κ)].
Definition 30 [Definition of z_t(κ) for t ∈ T^−(κ)]. Let z_{t⋆(κ)+1} = x_{t⋆(κ)+1}. For t ∈ T^−(κ)\{t⋆(κ)+1}, define z_t(κ) inductively as follows: z_t(κ) is the pre-image of z_{t+1}(κ) on F_{t−1} such that the projection of z_t(κ) on F_t is z_{t+1}(κ).
Definition 31 [Definition of z_t(κ) for t ∈ T^+(κ)]. For t ∈ T^+(κ), define z_t(κ) inductively as follows: z_t(κ) is the projection of z_{t−1}(κ) on F_{t−1}.
Definition 32 [Definition of S′_t for a non-empty phase κ:] S′_{t⋆(κ)+1} = S_{t⋆(κ)+1}. For t ∈ T^−(κ)\{t⋆(κ)+1}, S′_t is the convex hull of z_{t+1}(κ) ∪ S_t ∪ S′_{t+1}(κ). For t ∈ T^+(κ), S′_t = S_t. See Fig. 6.

Lemma 33 For a non-empty phase κ, for any t ∈ T(κ), S′_{t+1} ⊆ S′_t, i.e., they are nested.
Definition 34 For a non-empty phase, χ(κ) = S′_{τ(κ−1)} ∩ H^+_{τ(κ)}, where H^+_{τ(κ)} has been defined in Definition 14.
Definition 35 [New Violations for t ∈ T (κ):] For a non-empty phase κ, for t ∈ T (κ)\τ (κ − 1), let
Figure 5: Illustration of the definition of z_t(κ) for t ∈ T(κ). In this example, for phase 1, t⋆(1) = 3 since the distance of y_3 from c is the farthest for phase 1, which consists of time slots T(1) = {2, 3}. Hence z_{t⋆(1)+1}(1) = x_4. For t ∈ T(1)\{t⋆(1)+1}, the z_t(1) are such that z_{t+1}(1) is the projection of z_t(1) onto F_t.
Figure 6: Definition of the S′_t's, where the U_t are the extra regions added to S_t to obtain S′_t.
Proof: Recall that for a non-empty phase κ, T(κ) = T^−(κ) ∪ T^+(κ). We first argue about t ∈ T^−(κ). By definition, z_{t⋆(κ)+1} = x_{t⋆(κ)+1} and x_{t⋆(κ)+1} ∈ S_{t⋆(κ)}. Thus, z_{t⋆(κ)+1} ∈ B(c, √2 D). Next we argue for t ∈ T^−(κ)\{t⋆(κ)+1}. Recall that the diameter of X is D, and that y_t ∈ S_{t−1} from Algorithm 2. Thus, for any non-empty phase κ, the distance from c to the farthest y_t belonging to the phase κ is at most D, i.e., r_max(κ) ≤ D. Let the pre-image of z_{t⋆(κ)+1}(κ) onto F_{s(κ)} (the base hyperplane with respect to which all hyperplanes have an angle of at most π/4 in phase κ) be p(κ), such that the projection of p(κ) onto F_{s(κ)} is z_{t⋆(κ)+1}(κ). From the definition of any non-empty phase, the angle between F_{s(κ)} and F_t for t ∈ T(κ) is at most π/4. Thus, the distance of p(κ) from c is at most √2 D.

Consider the 'triangle' Π(κ) that is the convex hull of c, z_{t⋆(κ)+1}(κ) and p(κ). Given that the angle between F_{t⋆(κ)} and F_{t⋆(κ)−1} is at most π/4, the argument above implies that z_t(κ) ∈ Π(κ) for t = t⋆(κ). For t = t⋆(κ) − 1, z_t(κ) ∈ F_{t−1} is the pre-image whose projection onto F_t is z_{t+1}(κ) (Definition 30). This implies that the distance of z_t(κ) (for t = t⋆(κ) − 1) from c is at most

$$\frac{D}{\cos(\alpha_{t,t^\star(\kappa)})\,\cos(\alpha_{t^\star(\kappa),t^\star(\kappa)+1})},$$

where α_{t_1,t_2} is the angle between F_{t_1} and F_{t_2}. From the monotonicity of the angles θ_t (Definition 15) and the definition of a non-empty phase, we have α_{t,t⋆(κ)} + α_{t⋆(κ),t⋆(κ)+1} ≤ π/4, with α_{t,t⋆(κ)} ≥ 0 and α_{t⋆(κ),t⋆(κ)+1} ≥ 0. Next, we appeal to the identity

$$\cos(A + B) \le \cos(A)\cos(B), \tag{19}$$

valid for A, B ≥ 0 with A + B ≤ π/4, to claim that z_t(κ) ∈ Π(κ) for t = t⋆(κ) − 1.

Iteratively using this argument while invoking the identity (19) gives that for any t ∈ T^−(κ), z_t(κ) belongs to Π(κ). Since Π(κ) ⊆ B(c, √2 D), we have the claim for all t ∈ T^−(κ).

By definition, z_t(κ) for t ∈ T^+(κ) belongs to S_{t−1} ⊆ S_1. Thus, its distance from c is at most D. □
Lemma 37 For each non-empty phase κ and for t ∈ T(κ), the violation v_t(κ) ≥ dist(x_t, S_t), where dist(x_t, S_t) is the original violation.

Proof: By construction of any non-empty phase κ, for t ∈ T(κ) both x_t and z_t(κ) belong to F_{t−1}. Moreover, by construction, the distance of z_t(κ) from c is at least as much as the distance of x_t from c. Thus, using the monotonicity property of the angles θ_t (Definition 15) we get the result. See Fig. 5 for a visual illustration. □
For each non-empty phase κ, by definition, the curve defined by the sequence z_t(κ) for t ∈ T(κ) is a projection curve (Definition 22) on the sets S′_t(κ) (note that the S′_t(κ)'s are nested from Lemma 33). Moreover, for all t ∈ T(κ), the set S′_t(κ) ⊂ χ(κ), which is a bounded convex set. Thus, for d = 2, from Lemma 23, the length of the curve z(κ) = {(z_t(κ), z_{t+1}(κ))}_{t∈T(κ)} satisfies

$$\sum_{t \in T(\kappa)} v_t(\kappa) \le 2\,\mathrm{diameter}(\chi(\kappa)). \tag{20}$$

By definition, the number of non-empty phases till time t_orth is at most 4. Moreover, in each non-empty phase, χ(κ) ⊆ B(c, √2 D) from Lemma 36. Thus, from (20), we have that

$$\sum_{\text{Phase } \kappa \text{ is non-empty}} \; \sum_{t \in T(\kappa)} v_t(\kappa) \;\le\; 2 \sum_{\text{Phase } \kappa \text{ is non-empty}} \mathrm{diameter}(\chi(\kappa)) \;\le\; 8\,\mathrm{diameter}(B(c, \sqrt{2}D)) \;\le\; O(D). \tag{21}$$
For any empty phase, the constraint violation is the length of the line segment (x_t, P_{S_t}(x_t)) (Algorithm 2) crossing it, which is a straight line whose length is at most O(D). Moreover, the total number of empty phases (Lemma 27) is a constant. Thus, the length of the curve (x_t, P_{S_t}(x_t)) for Algorithm 2 corresponding to all empty phases is at most O(D).
Recall from (6) that the CCV is at most G times dist(x_t, S_t). Thus, from (21) and Lemma 37, we get that the total violation incurred by Algorithm 2 corresponding to non-empty phases is at most O(GD), while that corresponding to empty phases is at most O(GD). Finally, accounting for the very first violation dist(x_1, S_1) ≤ D and the fact that the CCV after time t ≥ t_orth (Remark 4) is at most GD, we get that the total constraint violation CCV_{[1:T]} for Algorithm 2 is at most O(GD). □