Gradient Methods for Online DR-Submodular Maximization with Stochastic Long-Term Constraints
Abstract
1 Introduction
1. In the semi-bandit feedback setting, we propose the first stochastic gradient ascent based algorithm
for stochastic online DR-submodular maximization with stochastic long-term constraints. Our
proposed algorithm achieves O(√T) 1/2-regret and O(T^{3/4}) constraint violation with high
probability. In contrast, all previous works [10, 29, 31] consider first-order full-information
feedback and require unbiased gradient estimates at √T locations (not just at the action x_t) in every
round; their per-round query complexity is therefore √T, while ours is just 1.
2. In the first-order full-information setting, where unbiased gradient estimates at any point can be
observed, we propose the first stochastic gradient ascent based algorithm for stochastic online DR-
submodular maximization with stochastic long-term constraints. We utilize the recently developed
technique in [42] called the non-oblivious function. Our proposed algorithm achieves O(√T)
(1 − 1/e)-regret and O(T^{3/4}) constraint violation with high probability. Again, compared to
previous works [10, 29, 31], our query complexity is significantly lower.
Regarding the approximation ratios: we note that in the offline setting, 1 − 1/e is known to be the optimal
approximation ratio for optimizing monotone DR-submodular functions over a general convex set
when queries may be made anywhere in the convex hull of K ∪ {0} (K is the constraint set). However,
when the oracle calls are restricted to K, an approximation ratio of 1/2 is the best known to be
achievable [28]. Thus, full-information feedback can achieve (1 − 1/e)-regret while the semi-bandit
feedback achieves 1/2-regret.
Table 1: We include related works from online DR-submodular optimization with constant or
stochastic long-term constraint functions. (Works handling adversarial long-term constraints require
a different definition of regret.) All methods require a gradient oracle for feedback, and 'Noise' lists
whether the gradient is exact or there is stochastic noise. '# Grad.' is the number of gradient evaluations
required per round. 'Con. Viol.' is the bound on the constraint violation. † [31] considered the constraint
set being convex, while all other works consider linear constraints. In the '# Grad.' column, 2√T means
this work needs √T gradients on both f and g. ‡ While all actions will be feasible, some gradient
queries will be in the convex hull of K ∪ {0}.
2 Related Works
The primary related works are summarized in Table 1. We briefly discuss notable contributions here;
for additional related works, see Appendix F.
Online DR-submodular Maximization with Long-Term Constraints We do not compare results
with adversarial constraints due to the additional assumptions needed in this setting. See Appendix F for
a detailed discussion. In the context of stochastic constraints, Raut et al. [29] conducted the initial
study of the problem. They successfully attained O(√T) regret and constraint violation with high
probability, as well as O(T^{3/4}) regret and O(√T) constraint violation in expectation. Building upon
this work, Sadeghi et al. [31] further improved the results to achieve O(√T) regret and constraint
violation, both in expectation and with high probability. Additionally, Feng et al. [10] extended these
findings to incorporate weakly DR-submodular utility, achieving analogous results.
Online Convex Optimization with Long-Term Constraints Several results for OCO with deterministic
long-term constraints can be found in [14, 20, 37, 38, 40]. Existing literature has established that
a regret of O(√T) and a cumulative constraint violation of O(T^{1/4}) can be achieved without the
Slater condition. Conversely, assuming the Slater condition allows for achieving a regret of O(√T)
and a cumulative constraint violation of O(1). In cases where the considered constraint is assumed to
be stochastic, Yu et al. [39] achieved an O(√T) bound on both regret and constraint violation under
the Slater condition. Furthermore, Wei et al. [35] achieved the same regret and constraint violation
bounds under a strictly weaker assumption than the Slater condition.
3 Preliminaries
3.1 Notations
Vectors are shown by lowercase bold letters, such as x ∈ Rd . We denote by ∥ · ∥ the ℓ2 (Euclidean)
norm. We use [T ] to denote the set {1, 2, . . . , T }. The inner product of two vectors x, y ∈ Rd
is denoted by either ⟨x, y⟩ or x⊤ y. For u ∈ R, we define [u]+ := max{u, 0}. For two vectors
x, y ∈ Rd , x ⪯ y implies that xi ≤ yi , ∀i ∈ [d]. For a convex set X , we denote the projection of y
onto set X as ΠX (y) = arg minx∈X ∥x − y∥.
Here we list some function properties that will appear in our assumptions.
Monotonicity A function f is monotone if f (x) ≤ f (y) for all x ⪯ y.
Lipschitz continuous A function f is Lipschitz continuous with parameter β if for any x, y ∈ X,
we have |f(x) − f(y)| ≤ β∥x − y∥.
4 Problem Statement
Consider the following offline optimization problem:
    max_{x∈K} f(x)   subject to   g(x) ≤ 0,    (1)
where g(x) = ⟨p, x⟩ − b for some non-negative constant b. We study an analogous online setup
as follows: At each round t ∈ [T ], the algorithm chooses an action xt ∈ K, where K ⊂ Rd+ is a
fixed, known set. We consider both the utility and the constraints being stochastic, where we assume
at each time step, the utility function ft is sampled i.i.d. from a distribution Df with mean f , i.e.,
Eft ∼Df [ft (·)] = f (·), while the cost vector pt is i.i.d. sampled from another distribution Dp . After
an action is selected by the learner, a random reward ft (xt ) is obtained while using ⟨pt , xt ⟩ of its
fixed total allotted budget BT , and pt is observed. In the semi-bandit setting, an unbiased gradient
estimator for that action, ∇̃f_t(x_t), is also revealed. In the first-order full-information setting, an
unbiased gradient estimator at any point can be observed. In this paper, we consider both settings,
while all other works in the literature on DR-submodular maximization with long-term constraints,
such as [10, 29, 31], consider full-information feedback.
To make sure the long-term constraint is not vacuous, we consider B_T = bT for a constant b such
that min_{x∈K} ⟨p, x⟩ ≤ b < max_{x∈K} ⟨p, x⟩. In this case, there always exists a solution x that satisfies the
constraint (having zero constraint violation), and it is not the case that every sequence of actions
(in particular, the most expensive w.r.t. p) is feasible.
We make the following assumptions to proceed with our analysis:
Assumption 1. The constraint set K is convex and compact, with diameter d = sup_{x,y∈K} ∥x − y∥
and radius r = sup_{x∈K} ∥x∥. Since X is compact, we denote its diameter as d̄ = sup_{x,y∈X} ∥x − y∥
and radius as r̄ = sup_{x∈X} ∥x∥, respectively.
Assumption 2. The expected utility function f (·) is monotone DR-submodular and βf -Lipschitz.
Assumption 3. The distribution Dp for the cost vectors has bounded support βp B ∩ Rd+ with mean
p ⪰ 0, where B is the unit ball of Euclidean norm.
Assumption 4. The gradient oracle is unbiased, E[∇f(x) − ∇̃f_t(x) | x] = 0, and has bounded
variance, E[∥∇f(x) − ∇̃f_t(x)∥² | x] ≤ σ². In the semi-bandit setting, we assume G =
max_t sup_x ∥∇̃f_t(x)∥ is finite. In the first-order full-information setting, denoting the unbiased
estimator of the gradient of the non-oblivious function obtained at round t by ∇̃F_t(x), we assume
G_F = max_t sup_x ∥∇̃F_t(x)∥ is finite.
Unlike Frank-Wolfe type algorithms in other papers [10, 29, 31], we do not assume bounded smoothness
of the gradients, i.e., ∥∇f(x) − ∇f(y)∥ ≤ β∥x − y∥. Moreover, we do not assume f(0) = 0.
Our overall goal is to maximize the total obtained reward while satisfying the budget constraint
asymptotically (i.e., Σ_{t=1}^T ⟨p, x_t⟩ − B_T being sub-linear in T).
Note that our proposed algorithm can handle multiple linear constraints as well, and similar regret
and constraint violation bounds can be derived. In the case where there are m constraints g_i(·), i ∈ [m],
we can define g(x) := max_{i∈[m]} g_i(x), and it can be shown that g preserves the same properties
as those of the individual g_i's (sub-differentiability, bounded (sub-)gradients, and bounded values; see
Proposition 6 in [20] for proofs).
For simplicity of presentation, we denote β = max{β_f, β_p}. Since K is compact, from monotonicity
of f we have that F1 := max_{x∈K} |f(x)| is bounded. Since f is β_f-Lipschitz, we have that
F2 := max_{x,y∈K} |f(x) − f(y)| ≤ β_f d is bounded. Since K is compact and D_p has bounded
support, C := max_{p′∼D_p} max_{x∈K} |⟨p′, x⟩ − B_T/T| is bounded.
To measure the effectiveness of our proposed algorithm, we use the notions of regret and total
constraint violation to quantify the overall utility and the total resource consumption, respectively.
Regret is typically defined as the difference between the total reward accumulated by the algorithm
and the best fixed action in hindsight. Note that even in the offline setting, maximizing a monotone
DR-submodular function subject to a convex constraint can only be done approximately in polynomial
time unless RP = NP [5]. Thus, we instead use the notion of α-regret of an algorithm.
Definition 1. The α-regret of an online algorithm with outputs {x_t}_{t=1}^T is defined as

    R_T := α max_{x∈K*} Σ_{t=1}^T f_t(x) − Σ_{t=1}^T f_t(x_t),    (2)

where K* is the restricted search space of solutions that satisfy the long-term constraint for T steps (i.e.,
can be played T times), K* = {x ∈ K : Σ_{t=1}^T g(x) ≤ 0}, which is equivalent to satisfying the
per-round constraint: K* = {x ∈ K : g(x) ≤ 0}.
Since we are mainly interested in stochastic utility functions, i.e., f_t ∼ D_f, we aim to minimize the
expected α-regret:

    E[R_T] = αT max_{x∈K*} f(x) − Σ_{t=1}^T f(x_t).
Denote x* = arg max_{x∈K*} f(x). Note that since p_t is drawn i.i.d. from the distribution D_p with
mean p for all t ∈ [T], the best benchmark action is with respect to the "true" underlying p of the constraint
function as opposed to p_t. It is possible that the best fixed action has a constraint violation with some
noisy p_t's.
We next define the total constraint violation.
Definition 2. The total constraint violation of an online algorithm with outputs {x_t}_{t=1}^T is defined as

    C_T := Σ_{t=1}^T g(x_t) = Σ_{t=1}^T ⟨p, x_t⟩ − B_T.

Again, in the stochastic constraint setting, the total constraint violation is defined with respect to the
mean p.
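To make the two performance measures concrete, the short sketch below (our illustration only; mean_f, mean_p, xs, x_star, and B_T are hypothetical placeholders, not notation from the paper) evaluates the expected α-regret and the total constraint violation of a played trajectory.

```python
import numpy as np

def alpha_regret_and_violation(mean_f, mean_p, xs, x_star, B_T, alpha=0.5):
    """Evaluate the expected alpha-regret and the total constraint violation C_T.

    mean_f : callable, the mean utility f(.)
    mean_p : (d,) array, the mean cost vector p
    xs     : (T, d) array, the actions x_1, ..., x_T played by the algorithm
    x_star : (d,) array, the best feasible fixed action in K*
    B_T    : float, the total budget (B_T = b * T)
    """
    T = xs.shape[0]
    regret = alpha * T * mean_f(x_star) - sum(mean_f(x) for x in xs)
    spend = xs @ mean_p              # <p, x_t> for each round t
    C_T = spend.sum() - B_T          # total consumption minus total budget
    return regret, C_T
```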
In this work, we consider the following regularized Lagrangian function L(x, λ) given by

    L(x, λ) := f(x) − λ g(x) + (δη/2) λ².    (3)

It is important to observe that the expression in (3) deviates from the conventional Lagrangian due
to the inclusion of the term (δη/2)λ², where both δ and η are parameters that will later be chosen to
optimize the theoretical guarantees. The main purpose of this modification is to control the value of λ
and prevent it from growing too large. Although we could achieve the same goal by restricting λ to a
bounded domain, using the quadratic regularizer is more convenient for our analysis.
One issue is that p (which appears in g(x)) is unknown to the online algorithm. Therefore, we alternatively
use an empirical estimate p̂_t = (1/t) Σ_{s=1}^t p_s instead of p in the Lagrangian function. Moreover, in
order to achieve the high-probability bound, we adjust our Lagrangian function in (3) as follows:

    L_t(x, λ) = f_t(x) − λ g̃_t(x) + (δη/2) λ²,    (4)

where g̃_t(x) = ⟨p̂_t, x⟩ − B_T/T − γ_t and γ_t = √( 2C² log(2T/ε) / t ). For the purpose of analysis, we further
define ĝ_t(x) := ⟨p̂_t, x⟩ − B_T/T.
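As an illustration of these quantities (a minimal sketch with our own function names, assuming NumPy), the running estimate p̂_t, the confidence width γ_t, and the adjusted constraint g̃_t can be maintained online as follows.

```python
import numpy as np

def update_p_hat(p_hat_prev, p_t, t):
    """Running mean of the observed cost vectors: p_hat_t = (1/t) * sum_{s<=t} p_s."""
    return p_hat_prev + (p_t - p_hat_prev) / t

def confidence_width(t, C, T, eps):
    """gamma_t = sqrt(2 * C^2 * log(2T/eps) / t)."""
    return np.sqrt(2.0 * C**2 * np.log(2.0 * T / eps) / t)

def g_tilde(x, p_hat, B_T, T, gamma_t):
    """g~_t(x) = <p_hat_t, x> - B_T/T - gamma_t (g^_t is the same expression without gamma_t)."""
    return p_hat @ x - B_T / T - gamma_t
```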
For the purpose of analysis, we do not directly use Equation (4) in our primal update. Let L̂_t be
defined by its gradient, ∇̃_x L̂_t(x_t, λ_t) = ∇̃f_t(x_t) − 2λ_t ∇g̃_t(x_t). The primal updates are formulated
as follows:

    x_{t+1} = Π_K(x_t + η ∇̃_x L̂_t(x_t, λ_t)).

Note that, compared to Equation (4), the Lagrangian function used for updating has a coefficient of 2
in front of the second term.
Our proposed algorithm is shown in Algorithm 1. The algorithm proceeds as follows: it takes a
convex constraint set K and a time horizon T as inputs. Initially, the algorithm selects an initial point
x_1 ∈ K and sets λ_1 = 0. At each time step t ∈ [T], the algorithm takes an action x_t, acquires a
reward f_t(x_t), and observes the cost vector p_t as well as an unbiased gradient estimate ∇̃f_t(x_t).
Subsequently, unbiased gradient estimates of the updating Lagrangian function with respect to x and
to λ are computed using the empirical estimate of p. Using these calculated gradients, updates to x
and λ are made using (7) and (8), respectively.
Algorithm 1 OLSGA (Semi-bandit Feedback)
1: Input: Convex set K, time horizon T
2: Initialize x_1 ∈ K, λ_1 = 0.
3: for t ∈ [T] do
4:   Play x_t; obtain f_t(x_t), ∇̃f_t(x_t), and p_t
5:   Compute p̂_t = (1/t) Σ_{s=1}^t p_s
6:   Compute
         ∇̃_x L̂_t(x_t, λ_t) = ∇̃f_t(x_t) − 2λ_t ∇g̃_t(x_t)    (5)
         ∇_λ L_t(x_t, λ_t) = −g̃_t(x_t) + δηλ_t    (6)
7:   Update x_t and λ_t:
         x_{t+1} = Π_K(x_t + η ∇̃_x L̂_t(x_t, λ_t))    (7)
         λ_{t+1} = Π_{[0,+∞)}(λ_t − η ∇_λ L_t(x_t, λ_t))    (8)
8: end for
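A minimal Python sketch of this loop follows (assuming NumPy, a projection oracle project_K onto K, and an environment object env returning the semi-bandit feedback; these names and the interface are our own placeholders, not part of the paper).

```python
import numpy as np

def olsga_semi_bandit(project_K, env, x1, d_dim, T, eta, delta, C, eps, B_T):
    """One-pass primal-dual stochastic gradient ascent (sketch of Algorithm 1)."""
    x, lam = x1.copy(), 0.0
    p_hat = np.zeros(d_dim)
    for t in range(1, T + 1):
        f_val, grad_f, p_t = env.play(x)            # semi-bandit feedback at x_t
        p_hat += (p_t - p_hat) / t                  # p_hat_t = (1/t) * sum_s p_s
        gamma_t = np.sqrt(2 * C**2 * np.log(2 * T / eps) / t)
        g_tilde = p_hat @ x - B_T / T - gamma_t     # adjusted constraint value
        grad_g = p_hat                              # gradient of g~_t w.r.t. x
        grad_x = grad_f - 2 * lam * grad_g          # eq. (5)
        grad_lam = -g_tilde + delta * eta * lam     # eq. (6)
        x = project_K(x + eta * grad_x)             # eq. (7): projected ascent step
        lam = max(0.0, lam - eta * grad_lam)        # eq. (8): projection onto [0, inf)
    return x
```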
Remark 1. A notable difference between our algorithm and all prior works addressing online DR-
submodular maximization with long-term constraints (e.g., [29, 31]) is that our algorithm can handle
search spaces that do not necessarily include 0. This distinction bears importance, particularly when
considering scenarios where we can only query values within the constraint set. In such cases, 1/2
has been conjectured to be the optimal approximation ratio ([28], Section B in the Appendix). We refer to
Appendix H for motivating examples where searching over K ∪ {0} is not applicable.
Now, we establish the regret and constraint violation achievable by our proposed Algorithm 1. Before
delving into the main theorem, we first present three lemmas, which are adapted from [29] and are
essential for achieving high-probability bounds. Given the slight difference in the definition of p̂, we
provide the proofs in Appendices A to C, respectively. First, Lemma 3 demonstrates that with high
probability, the empirical estimate p̂_t is close to its mean p.
Lemma 3. The following holds with probability at least 1 − ε:

    Σ_{t=1}^T ∥p̂_t − p∥ ≤ Qβ √( T log(2nT/ε) ),

where Q > 0 is some universal constant.
Next, Lemma 4 establishes that with high probability, the ĝ_t(·) computed using p̂_t and the g(·) computed
using p are close.

Lemma 4. Let x ∈ K be fixed. For a fixed t ∈ [T] and γ_t := √( (2/t) C² log(2T/ε) ), |ĝ_t(x) − g(x)| ≤ γ_t
holds with probability at least 1 − ε/T.
Finally, Lemma 5 provides an upper bound for the total constraint violation.
Lemma 5. Let {γ_t}_{t=1}^T be defined as in Lemma 4. Then the following holds:

    C_T ≤ Σ_{t=1}^T g̃_t(x_t) + r Σ_{t=1}^T ∥p̂_t − p∥ + Σ_{t=1}^T γ_t.    (9)
Armed with these results, we can now establish regret and constraint violation bounds for Algorithm 1.
Theorem 1. Let Assumptions 1–4 be satisfied. Let U = max{G, C}. Choose η = d/(U√T)
and δ = 8β². Let x_1, . . . , x_T be the sequence of solutions obtained by Algorithm 1. When T is
sufficiently large, i.e., T ≥ 64d²β²/U², we have the following 1/2-regret and constraint violation bounds
with probability at least 1 − ε:

    E[R_T] = O(√T)  and  C_T = O(T^{3/4}).
The complete theorem statement and the complete proof are in Appendix D.
Partial Proof: From the update of x_t, we have that for any x ∈ K,

    ∥x_{t+1} − x∥² = ∥Π_K(x_t + η ∇̃_x L̂_t(x_t, λ_t)) − x∥²
                  ≤ ∥x_t + η ∇̃_x L̂_t(x_t, λ_t) − x∥²
                  ≤ ∥x_t − x∥² + η²∥∇̃_x L̂_t(x_t, λ_t)∥² − 2η(x − x_t)^⊤ ∇̃_x L̂_t(x_t, λ_t).    (10)

Rearranging, and using Assumption 4 and Assumption 3, we have

    (x − x_t)^⊤ ∇̃_x L̂_t(x_t, λ_t) ≤ 1/(2η) (∥x_t − x∥² − ∥x_{t+1} − x∥²) + ηG² + 4ηβ²λ_t².    (11)

Applying similar steps to the λ updates, we establish

    (λ − λ_t)^⊤ ∇_λ L_t(x_t, λ_t) ≥ −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − C²η − 2ηγ_t² − 2δ²η³λ_t².    (12)

From monotonicity and DR-submodularity of E[f_t(x)], we have

    E[L_t(x, λ_t) − 2L_t(x_t, λ_t)]
    = E[ E[L_t(x, λ_t) − 2L_t(x_t, λ_t) | x_t] ]
    ≤ E[(x − x_t)^⊤ ∇_x E[L̂_t(x_t, λ_t) | x_t]] + λ_t g̃_t(x) − (δη/2)λ_t²    (Lemma 1)
    ≤ 1/(2η) E[∥x_t − x∥² − ∥x_{t+1} − x∥²] + G²η + 4ηβ²λ_t² + λ_t g̃_t(x) − (δη/2)λ_t²,    (13)

where (13) follows from (11). Similarly, from convexity of the function L_t(x, λ) w.r.t. λ, we have

    L_t(x_t, λ) − L_t(x_t, λ_t) ≥ (λ − λ_t)^⊤ ∇_λ L_t(x_t, λ_t)
    ≥ −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − C²η − 2ηγ_t² − 2δ²η³λ_t²,    (14)

where (14) follows from (12). Subtracting two times (14) from (13), and summing t over 1 through T, we get

    Σ_{t=1}^T E[L_t(x, λ_t) − 2L_t(x_t, λ)]
    ≤ d²/(2η) + λ²/η + G²ηT + 4ηβ² Σ_{t=1}^T λ_t² + 2C²ηT + 4η Σ_{t=1}^T γ_t² + 4δ²η³ Σ_{t=1}^T λ_t² + Σ_{t=1}^T λ_t g̃_t(x) − (δη/2) Σ_{t=1}^T λ_t².    (15)

Expanding the left hand side of (15) and rearranging, we deduce

    Σ_{t=1}^T [f(x) − 2f(x_t)] + 2λ Σ_{t=1}^T g̃_t(x_t) − (δηT + 1/η) λ²
    ≤ 2 Σ_{t=1}^T λ_t g̃_t(x) + η(4β² + 4δ²η² − δ) Σ_{t=1}^T λ_t² + d²/(2η) + G²ηT + 2C²ηT + 4η Σ_{t=1}^T γ_t².    (16)

To ensure that the equation 4β² + 4δ²η² − δ = 0 has real roots, we require T ≥ 64d²β²/U². Setting
δ = 8β² ensures that 4β² + 4δ²η² − δ ≤ 0. Set x = x*; from Lemma 4, with probability at least
1 − ε/T, g̃_t(x*) = ĝ_t(x*) − γ_t ≤ g(x*) holds. Since x* satisfies the long-term constraint, we have
g(x*) ≤ 0. Thus, we can drop the first two terms on the RHS of (16), and by the union bound we get,
with probability at least 1 − ε,

    Σ_{t=1}^T [f(x*) − 2f(x_t)] + 2λ Σ_{t=1}^T g̃_t(x_t) − (δηT + 1/η) λ² ≤ d²/(2η) + G²ηT + 2C²ηT + 4η Σ_{t=1}^T γ_t².    (17)

Maximizing the LHS of (17) with respect to λ over the range [0, +∞), we get a solution of
λ = [Σ_{t=1}^T g̃_t(x_t)]_+ / (δηT + 1/η):

    Σ_{t=1}^T [f(x*) − 2f(x_t)] + [Σ_{t=1}^T g̃_t(x_t)]_+² / (δηT + 1/η) ≤ d²/(2η) + G²ηT + 2C²ηT + 4η Σ_{t=1}^T γ_t².    (18)

Plugging in U = max{G, C} and η = d/(U√T), we have with probability at least 1 − ε,

    Σ_{t=1}^T [f(x*) − 2f(x_t)] + [Σ_{t=1}^T g̃_t(x_t)]_+² / (δηT + 1/η) ≤ (7dU/2)√T + 8dU log(2T/ε),    (19)

where we used the fact that Σ_{t=1}^T 1/√t ≤ 2√T. This gives us our result on objective regret:

    Σ_{t=1}^T [ (1/2) f(x*) − f(x_t) ] = O(√T).    (20)

The detailed proof, including the constraint violation steps, is provided in Appendix D.
where the non-oblivious function F is defined by its gradient: ∇F(x) = ∫_0^1 e^{z−1} ∇f(z · x) dz.
As we discussed in Section 3, the non-oblivious function F plays an important role in obtaining
the optimal approximation ratio 1 − 1/e. However, calculating the gradient of the non-oblivious
function F(x) can be challenging, especially when only unbiased estimates of the gradients are
available. To overcome this, [42] presents a computational approach for obtaining an unbiased
estimate of the gradient of F(x) through sampling (Lines 6 and 7). The following lemma indicates
that (1 − 1/e) ∇̃f_t(z ∗ x) is an unbiased estimator of ∇F(x) with bounded variance.

Lemma 6. If z is sampled from the random variable Z as in line 6 of Algorithm 2, E[∇̃f_t(x) | x] = ∇f(x), and
E[∥∇̃f_t(x) − ∇f(x)∥² | x] ≤ σ², then we have

(i) E[(1 − 1/e) ∇̃f_t(z ∗ x) | x] = ∇F(x);

(ii) E[∥(1 − 1/e) ∇̃f_t(z ∗ x) − ∇F(x)∥² | x] ≤ σ_1², where σ_1² = 2(1 − 1/e)²σ² + 2β²r̄²(1 − 1/e)²/3.
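As a sketch of how this estimator could be computed (our illustration; grad_f_stoch is a hypothetical handle to the stochastic gradient oracle ∇̃f_t, and the sampler uses a closed-form inverse of the CDF stated in line 6 of Algorithm 2):

```python
import numpy as np

def sample_z(rng):
    """Sample z with CDF P(Z <= z) = (e^{z-1} - e^{-1}) / (1 - e^{-1}) on [0, 1]."""
    v = rng.uniform()
    return 1.0 + np.log(v * (1.0 - np.exp(-1.0)) + np.exp(-1.0))

def non_oblivious_grad_estimate(grad_f_stoch, x, rng):
    """Return (1 - 1/e) * grad_f_stoch(z * x), an unbiased estimate of grad F(x) (Lemma 6)."""
    z = sample_z(rng)
    return (1.0 - np.exp(-1.0)) * grad_f_stoch(z * x)
```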
With the unbiased estimator of the gradient of the non-oblivious function, we show the following
regret and constraint violation guarantee for our Algorithm 2 in Appendix E:
Theorem 2. Let Assumptions 1–4 be satisfied. Let U = max{G_F, C}. Choose η = d/(U√T) and
δ = 4β². Let x_t, t ∈ [T], be the sequence of solutions obtained by Algorithm 2. When T is sufficiently
large, i.e., T ≥ 16d²β²/U², we have the following (1 − 1/e)-regret and constraint violation bounds with
probability at least 1 − ε:

    E[R_T] = O(√T)  and  C_T = O(T^{3/4}).
7 Conclusions
In this paper, we address the problem of stochastic DR-submodular maximization with stochastic
long-term constraints over a general convex set. We introduce the first algorithm for this setting,
attaining O(√T) regret and O(T^{3/4}) constraint violation bounds. Notably, our algorithm operates in
both the semi-bandit feedback and first-order full-information settings, requiring only 1 gradient query
per round, while all previous works operate in the full-information setting with √T gradient queries
per round. Extension of the results here to upper-linearizable functions in [27] is an open direction.
8 Acknowledgement
This work was supported in part by the National Science Foundation under grants CCF-2149588 and
CCF-2149617. We acknowledge Yiyang (Roy) Lu for helpful feedback.
References
[1] S. Agrawal and N. R. Devanur. Bandits with concave rewards and convex knapsacks. Proceed-
ings of the fifteenth ACM conference on Economics and computation, 2014.
[2] F. R. Bach. Submodular functions: from discrete to continuous domains. Mathematical
Programming, 175:419 – 459, 2015.
[3] A. Badanidiyuru, R. D. Kleinberg, and A. Slivkins. Bandits with knapsacks. 2013 IEEE 54th
Annual Symposium on Foundations of Computer Science, pages 207–216, 2013.
[4] S. Balseiro and Y. Gur. Learning in repeated auctions with budgets: Regret minimization and
equilibrium. Management Science, 65:3952–3968, 09 2019. doi: 10.1287/mnsc.2018.3174.
[5] A. A. Bian, B. Mirzasoleiman, J. Buhmann, and A. Krause. Guaranteed Non-convex Optimiza-
tion: Submodular Maximization over Continuous Domains. In AISTATS, volume 54, pages
111–120. PMLR, 20–22 Apr 2017.
[6] Y. Bian, J. M. Buhmann, and A. Krause. Continuous submodular function maximization. ArXiv,
abs/2006.13474, 2020.
[7] L. Chen, C. Harshaw, H. Hassani, and A. Karbasi. Projection-free online optimization with
stochastic gradient: From convexity to submodularity. In J. Dy and A. Krause, editors, Proceed-
ings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of
Machine Learning Research, pages 814–823. PMLR, 10–15 Jul 2018.
[8] L. Chen, C. Harshaw, H. Hassani, and A. Karbasi. Projection-free online optimization with
stochastic gradient: From convexity to submodularity. In J. Dy and A. Krause, editors, Proceed-
ings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of
Machine Learning Research, pages 814–823. PMLR, 10–15 Jul 2018.
[9] L. Chen, H. Hassani, and A. Karbasi. Online continuous submodular maximization. In
A. Storkey and F. Perez-Cruz, editors, Proceedings of the Twenty-First International Confer-
ence on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning
Research, pages 1896–1905. PMLR, 09–11 Apr 2018.
[10] J. Feng, R. Yang, Y. Zhang, and Z. Zhang. Online weakly dr-submodular optimization with
stochastic long-term constraints. In D.-Z. Du, D. Du, C. Wu, and D. Xu, editors, Theory and
Applications of Models of Computation, pages 32–42, Cham, 2022. Springer International
Publishing. ISBN 978-3-031-20350-3.
[11] C. Guestrin, A. Krause, and A. P. Singh. Near-optimal sensor placements in gaussian processes.
Proceedings of the 22nd international conference on Machine learning, 2005.
[12] H. Hassani, M. Soltanolkotabi, and A. Karbasi. Gradient methods for submodular maximization.
In Neural Information Processing Systems, 2017.
[13] E. Hazan. Introduction to Online Convex Optimization. Foundations and Trends in Optimization.
Now, Boston, 2016. ISBN 978-1-68083-170-2. doi: 10.1561/2400000013.
[14] R. Jenatton, J. Huang, and C. Archambeau. Adaptive algorithms for online convex optimization
with long-term constraints. In M. F. Balcan and K. Q. Weinberger, editors, Proceedings of The
33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine
Learning Research, pages 402–411, New York, New York, USA, 20–22 Jun 2016. PMLR.
[15] C. Jin, P. Netrapalli, R. Ge, S. M. Kakade, and M. I. Jordan. A short note on concentration
inequalities for random vectors with subgaussian norm, 2019.
[16] D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social
network. In Knowledge Discovery and Data Mining, 2003.
[17] A. Krause and D. Golovin. Submodular function maximization. In Tractability, 2014.
[18] N. Liakopoulos, A. Destounis, G. Paschos, T. Spyropoulos, and P. Mertikopoulos. Cautious
regret minimization: Online optimization with long-term budget constraints. In K. Chaudhuri
and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine
Learning, volume 97 of Proceedings of Machine Learning Research, pages 3944–3952. PMLR,
09–15 Jun 2019.
[19] T. Lin, J. Li, and W. Chen. Stochastic online greedy learning with semi-bandit feedbacks. In
Proceedings of the 29th International Conference on Neural Information Processing Systems,
pages 352–360, 2015.
[20] M. Mahdavi, R. Jin, and T. Yang. Trading regret for efficiency: Online convex optimization
with long term constraints. Journal of Machine Learning Research, 13(81):2503–2528, 2012.
[21] S. Mannor, J. N. Tsitsiklis, and J. Y. Yu. Online learning with sample path constraints. Journal
of Machine Learning Research, 10:569–590, 2009.
[22] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause. Distributed submodular maximization:
Identifying representative elements in massive data. In C. Burges, L. Bottou, M. Welling,
Z. Ghahramani, and K. Weinberger, editors, Advances in Neural Information Processing
Systems, volume 26. Curran Associates, Inc., 2013.
[23] B. Mirzasoleiman, A. Badanidiyuru, and A. Karbasi. Fast constrained submodular maximization:
Personalized data summarization. In International Conference on Machine Learning, 2016.
[24] R. Niazadeh, N. Golrezaei, J. R. Wang, F. Susan, and A. Badanidiyuru. Online learning via
offline greedy algorithms: Applications in market design and optimization. In Proceedings of
the 22nd ACM Conference on Economics and Computation, pages 737–738, 2021.
[25] G. Nie, M. Agarwal, A. K. Umrawal, V. Aggarwal, and C. J. Quinn. An explore-then-commit
algorithm for submodular maximization under full-bandit feedback. In Uncertainty in Artificial
Intelligence, pages 1541–1551. PMLR, 2022.
[26] G. Nie, Y. Y. Nadew, Y. Zhu, V. Aggarwal, and C. J. Quinn. A framework for adapting offline
algorithms to solve combinatorial multi-armed bandit problems with bandit feedback. In
A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, editors, Proceedings
of the 40th International Conference on Machine Learning, volume 202 of Proceedings of
Machine Learning Research, pages 26166–26198. PMLR, 23–29 Jul 2023.
[27] M. Pedramfar and V. Aggarwal. From linear to linearizable optimization: A novel framework
with applications to stationary and non-stationary dr-submodular optimization. In Thirty-eighth
Conference on Neural Information Processing Systems, 2024.
[28] M. Pedramfar, C. J. Quinn, and V. Aggarwal. A unified approach for maximizing continuous
DR-submodular functions. In Thirty-seventh Conference on Neural Information Processing
Systems, 2023.
[29] P. S. Raut, O. Sadeghi, and M. Fazel. Online dr-submodular maximization: Minimizing regret
and constraint violation. In AAAI Conference on Artificial Intelligence, 2021.
[30] O. Sadeghi and M. Fazel. Online continuous dr-submodular maximization with long-term
budget constraints. In S. Chiappa and R. Calandra, editors, Proceedings of the Twenty Third
International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of
Machine Learning Research, pages 4410–4419. PMLR, 26–28 Aug 2020.
[31] O. Sadeghi, P. Raut, and M. Fazel. A single recipe for online submodular maximization with
adversarial or stochastic constraints. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan,
and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages
14712–14723. Curran Associates, Inc., 2020.
[32] T. Soma, N. Kakimura, K. Inaba, and K. ichi Kawarabayashi. Optimal budget allocation:
Theoretical guarantee and efficient algorithm. In International Conference on Machine Learning,
2014.
[33] M. J. Streeter and D. Golovin. An online algorithm for maximizing submodular functions. In
Neural Information Processing Systems, 2008.
[34] J. Vondrák. Submodularity in Combinatorial Optimization. Phd thesis, Charles University,
2007.
[35] X. Wei, H. Yu, and M. J. Neely. Online primal-dual mirror descent under stochastic constraints.
Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4:1 – 36, 2019.
[36] L. A. Wolsey. An analysis of the greedy algorithm for the submodular set covering problem.
Combinatorica, 2:385–393, 1982.
[37] X. Yi, X. Li, T. Yang, L. Xie, T. Chai, and K. Johansson. Regret and cumulative constraint
violation analysis for online convex optimization with long term constraints. In M. Meila and
T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning,
volume 139 of Proceedings of Machine Learning Research, pages 11998–12008. PMLR, 18–24
Jul 2021.
[38] H. Yu and M. J. Neely. A low complexity algorithm with O(√T) regret and O(1) constraint
violations for online convex optimization with long term constraints. Journal of Machine
Learning Research, 21(1):1–24, 2020.
[39] H. Yu, M. J. Neely, and X. Wei. Online convex optimization with stochastic constraints. In
Neural Information Processing Systems, 2017.
[40] J. Yuan and A. Lamperski. Online convex optimization for cumulative constraints. In S. Bengio,
H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in
Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
[41] M. Zhang, L. Chen, H. Hassani, and A. Karbasi. Online continuous submodular maximization:
From full-information to bandit feedback. In Advances in Neural Information Processing
Systems, volume 32. Curran Associates, Inc., 2019.
[42] Q. Zhang, Z. Deng, Z. Chen, H. Hu, and Y. Yang. Stochastic continuous submodular maximiza-
tion: Boosting via non-oblivious function. In International Conference on Machine Learning,
2022.
[43] M. A. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In
International Conference on Machine Learning, 2003.
A Proof of Lemma 3
Proof. From Assumption 3, we have ∥p_t∥ ≤ β. We can apply Lemma 1 of [15] with sub-Gaussian
parameter ρ = cβ for some universal constant c > 0, and the result follows immediately.

Combining E[p_t − p] = 0 and Lemma 7, we can apply Corollary 7 of [15] to the random vectors
{p_s − p}_{s=1}^t and obtain, with probability at least 1 − ε/T,

    ∥Σ_{s=1}^t (p_s − p)∥ ≤ c′ √( Σ_{s=1}^t ρ² log(2dT/ε) ) = c′ρ √( t log(2dT/ε) ),    (22)

where c′ > 0 is some universal constant. Combining (21) and (22) and applying the union bound, we have
with probability at least 1 − ε,

    Σ_{t=1}^T ∥p̂_t − p∥ = Σ_{t=1}^T (1/t) ∥Σ_{s=1}^t (p_s − p)∥
    ≤ Σ_{t=1}^T c′ρ √( log(2dT/ε)/t )
    ≤ 2c′ρ √( T log(2dT/ε) ),    (23)

where the last inequality follows from Σ_{t=1}^T 1/√t ≤ 2√T. We get the desired result by taking
Q = 2cc′.
B Proof of Lemma 4
Proof. Recall from the definition that g̃_t(x) = ⟨p̂_t, x⟩ − B_T/T − γ_t and ĝ_t(x) := ⟨p̂_t, x⟩ − B_T/T.
Note that E[ĝ_t(x)] = E[⟨p̂_t, x⟩ − B_T/T] = ⟨p, x⟩ − B_T/T = g(x). Recall that we have defined
C := max_{p′∼D_p} max_{x∈K} |⟨p′, x⟩ − B_T/T|. Thus, ĝ_t(x) is bounded within the interval [−C, C].
Applying Hoeffding's inequality to ĝ_t(x), we get

    P{ |ĝ_t(x) − g(x)| > γ_t } ≤ 2 exp( −tγ_t² / (2C²) ).

Substituting the value of γ_t on the right-hand side, we get that P{ |ĝ_t(x) − g(x)| > γ_t } ≤ ε/T.
C Proof of Lemma 5
We restate the lemma as follows:

Lemma 10. Let {γ_t}_{t=1}^T be defined as in Lemma 4. Then the following holds:

    C_T ≤ Σ_{t=1}^T g̃_t(x_t) + r Σ_{t=1}^T ∥p̂_t − p∥ + Σ_{t=1}^T γ_t.    (24)

Proof. Bounding Σ_{t=1}^T g̃_t(x_t) from below, we obtain:

    Σ_{t=1}^T g̃_t(x_t) = Σ_{t=1}^T g(x_t) + Σ_{t=1}^T (ĝ_t(x_t) − g(x_t)) − Σ_{t=1}^T γ_t
    ≥ Σ_{t=1}^T g(x_t) − Σ_{t=1}^T |ĝ_t(x_t) − g(x_t)| − Σ_{t=1}^T γ_t
    = C_T − Σ_{t=1}^T |⟨p̂_t − p, x_t⟩| − Σ_{t=1}^T γ_t
    ≥ C_T − Σ_{t=1}^T ∥p̂_t − p∥ ∥x_t∥ − Σ_{t=1}^T γ_t
    ≥ C_T − r Σ_{t=1}^T ∥p̂_t − p∥ − Σ_{t=1}^T γ_t.
D Proof of Theorem 1
We restate our theorem as follows:

Theorem 3. Let Assumptions 1–4 be satisfied. Let U = max{G, C}. Choose η = d/(U√T)
and δ = 8β². Let x_t, t ∈ [T], be the sequence of solutions obtained by Algorithm 1. When T is
sufficiently large, i.e., T ≥ 64d²β²/U², we have the following 1/2-regret and constraint violation bounds
with probability at least 1 − ε:

    E[R_T] ≤ Σ_{t=1}^T [ (1/2) f(x*) − f(x_t) ] ≤ (7dU/4)√T + 8dU log(2T/ε) = O(T^{1/2})

and

    C_T ≤ √( [ (7dU/4)√T + 4dU log(2T/ε) + (F1 + F2)T ] · (8β²d/U + U/d)√T )
          + rQσ √( T log(2nT/ε) ) + 2 √( 2TC² log(2T/ε) )    (25)
        = O(T^{3/4}).
Proof. From the update of x_t, we have that for any x ∈ K,

    ∥x_{t+1} − x∥² = ∥Π_K(x_t + η ∇̃_x L̂_t(x_t, λ_t)) − x∥²
                  ≤ ∥x_t − x∥² + η²∥∇̃_x L̂_t(x_t, λ_t)∥² − 2η(x − x_t)^⊤ ∇̃_x L̂_t(x_t, λ_t).    (26)

Rearranging,

    (x − x_t)^⊤ ∇̃_x L̂_t(x_t, λ_t)
    ≤ 1/(2η) (∥x_t − x∥² − ∥x_{t+1} − x∥²) + (η/2)∥∇̃_x L̂_t(x_t, λ_t)∥²
    = 1/(2η) (∥x_t − x∥² − ∥x_{t+1} − x∥²) + (η/2)∥∇̃f_t(x_t) − 2λ_t ∇g̃_t(x_t)∥²    (from (5))
    ≤ 1/(2η) (∥x_t − x∥² − ∥x_{t+1} − x∥²) + η∥∇̃f_t(x_t)∥² + 4ηλ_t²∥∇g̃_t(x_t)∥²    (∥a + b∥² ≤ 2∥a∥² + 2∥b∥²)
    ≤ 1/(2η) (∥x_t − x∥² − ∥x_{t+1} − x∥²) + ηG² + 4ηβ²λ_t²,    (27)

where (27) follows from Assumption 4 and Assumption 3. Taking expectation with respect to f_t, we have

    E[L_t(x, λ_t) − 2L_t(x_t, λ_t)] ≤ 1/(2η) E[∥x_t − x∥² − ∥x_{t+1} − x∥²] + G²η + 4ηβ²λ_t² + λ_t g̃_t(x) − (δη/2)λ_t²,    (28)

where (28) follows from (27). From the update of λ_t (8), we have

    ∥λ_{t+1} − λ∥² = ∥Π_{[0,+∞)}(λ_t − η ∇_λ L_t(x_t, λ_t)) − λ∥²
                  ≤ ∥λ_t − λ∥² + η²∥∇_λ L_t(x_t, λ_t)∥² − 2η(λ_t − λ) ∇_λ L_t(x_t, λ_t).    (29)

Rearranging,

    (λ − λ_t)^⊤ ∇_λ L_t(x_t, λ_t)
    ≥ −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − (η/2)∥∇_λ L_t(x_t, λ_t)∥²
    = −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − (η/2)∥−g̃_t(x_t) + δηλ_t∥²    (from (6))
    = −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − (η/2)∥−ĝ_t(x_t) + γ_t + δηλ_t∥²    (by def. of ĝ_t)
    ≥ −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − η∥ĝ_t(x_t)∥² − 2ηγ_t² − 2δ²η³λ_t²    (apply ∥a + b∥² ≤ 2∥a∥² + 2∥b∥² twice)
    ≥ −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − C²η − 2ηγ_t² − 2δ²η³λ_t²,    (30)

where the last inequality follows from the definitions C := max_{p′∼D_p} max_{x∈K} |⟨p′, x⟩ − B_T/T| and
ĝ_t(x) := ⟨p̂_t, x⟩ − B_T/T. From convexity of the function L_t(x, λ) w.r.t. λ, we have

    L_t(x_t, λ) − L_t(x_t, λ_t) ≥ (λ − λ_t)^⊤ ∇_λ L_t(x_t, λ_t)
    ≥ −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − C²η − 2ηγ_t² − 2δ²η³λ_t²,    (31)

where (31) follows from (30). Subtracting two times (31) from (28), we get

    E[L_t(x, λ_t) − 2L_t(x_t, λ)]
    ≤ 1/(2η) E[∥x_t − x∥² − ∥x_{t+1} − x∥²] + 1/η (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²)
      + G²η + 4ηβ²λ_t² + 2C²η + 4ηγ_t² + 4δ²η³λ_t² + λ_t g̃_t(x) − (δη/2)λ_t².    (32)

Summing (32) for t ∈ [T], we have

    Σ_{t=1}^T E[L_t(x, λ_t) − 2L_t(x_t, λ)]
    ≤ 1/(2η) Σ_{t=1}^T E[∥x_t − x∥² − ∥x_{t+1} − x∥²] + 1/η Σ_{t=1}^T (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²)
      + G²ηT + 4ηβ² Σ_{t=1}^T λ_t² + 2C²ηT + 4η Σ_{t=1}^T γ_t² + 4δ²η³ Σ_{t=1}^T λ_t² + Σ_{t=1}^T λ_t g̃_t(x) − (δη/2) Σ_{t=1}^T λ_t²
    ≤ 1/(2η) E[∥x_1 − x∥² − ∥x_{T+1} − x∥²] + 1/η (∥λ_1 − λ∥² − ∥λ_{T+1} − λ∥²)
      + G²ηT + 4ηβ² Σ_{t=1}^T λ_t² + 2C²ηT + 4η Σ_{t=1}^T γ_t² + 4δ²η³ Σ_{t=1}^T λ_t² + Σ_{t=1}^T λ_t g̃_t(x) − (δη/2) Σ_{t=1}^T λ_t²    (telescoping series)
    ≤ 1/(2η) E[∥x_1 − x_{T+1}∥²] + 1/η (∥λ_1 − λ∥² − ∥λ_{T+1} − λ∥²)
      + G²ηT + 4ηβ² Σ_{t=1}^T λ_t² + 2C²ηT + 4η Σ_{t=1}^T γ_t² + 4δ²η³ Σ_{t=1}^T λ_t² + Σ_{t=1}^T λ_t g̃_t(x) − (δη/2) Σ_{t=1}^T λ_t²    (triangle inequality)
    ≤ d²/(2η) + λ²/η + G²ηT + 4ηβ² Σ_{t=1}^T λ_t² + 2C²ηT + 4η Σ_{t=1}^T γ_t² + 4δ²η³ Σ_{t=1}^T λ_t² + Σ_{t=1}^T λ_t g̃_t(x) − (δη/2) Σ_{t=1}^T λ_t²,    (33)

where (33) uses Assumption 1 with the definition of the diameter d, λ_1 = 0, and −∥·∥² ≤ 0. Expanding the left
hand side of (33), we deduce

    Σ_{t=1}^T [f(x) − 2f(x_t)] + Σ_{t=1}^T [2λ g̃_t(x_t) − λ_t g̃_t(x)] + Σ_{t=1}^T [(δη/2)λ_t² − δηλ²]
    ≤ d²/(2η) + λ²/η + G²ηT + 4ηβ² Σ_{t=1}^T λ_t² + 2C²ηT + 4η Σ_{t=1}^T γ_t² + 4δ²η³ Σ_{t=1}^T λ_t² + Σ_{t=1}^T λ_t g̃_t(x) − (δη/2) Σ_{t=1}^T λ_t².    (34)

Rearranging, we have

    Σ_{t=1}^T [f(x) − 2f(x_t)] + 2λ Σ_{t=1}^T g̃_t(x_t) − (δηT + 1/η)λ²
    ≤ 2 Σ_{t=1}^T λ_t g̃_t(x) + η(4β² + 4δ²η² − δ) Σ_{t=1}^T λ_t² + d²/(2η) + G²ηT + 2C²ηT + 4η Σ_{t=1}^T γ_t².    (35)

Set x = x*. From Lemma 4, with probability at least 1 − ε/T, g̃_t(x*) = ĝ_t(x*) − γ_t ≤ g(x*) holds.
Since x* satisfies the long-term constraint, we have g(x*) ≤ 0. By the union bound, we get with
probability at least 1 − ε that, for the first term on the RHS of (35),

    2 Σ_{t=1}^T λ_t g̃_t(x*) ≤ 2 g(x*) Σ_{t=1}^T λ_t ≤ 0.

Now we choose δ such that 4β² + 4δ²η² − δ ≤ 0 (so that the second term on the RHS of (35) will
be negative and can be dropped with an upper bound). This is quadratic in δ. It is easy
to verify that the quadratic has real roots when η ≤ 1/(8β). As we choose η = d/(U√T), this
gives the condition that T needs to be sufficiently large, i.e., T ≥ 64d²β²/U². We can simply choose
δ = 8β².

Applying both of these inequalities to (35) and by the union bound, we get with probability at least 1 − ε,

    Σ_{t=1}^T [f(x*) − 2f(x_t)] + 2λ Σ_{t=1}^T g̃_t(x_t) − (δηT + 1/η)λ²
    ≤ d²/(2η) + G²ηT + 2C²ηT + 4η Σ_{t=1}^T γ_t².    (36)

Maximizing the LHS of (36) with respect to λ over the range [0, +∞), we get a solution of
λ = [Σ_{t=1}^T g̃_t(x_t)]_+ / (δηT + 1/η). Plugging this into (36) gives us

    Σ_{t=1}^T [f(x*) − 2f(x_t)] + [Σ_{t=1}^T g̃_t(x_t)]_+² / (δηT + 1/η) ≤ d²/(2η) + G²ηT + 2C²ηT + 4η Σ_{t=1}^T γ_t².    (37)

Let U = max{G, C}. Choosing η = d/(U√T), we have with probability at least 1 − ε,

    Σ_{t=1}^T [f(x*) − 2f(x_t)] + [Σ_{t=1}^T g̃_t(x_t)]_+² / (δηT + 1/η)
    ≤ (d max{G, C}/2)√T + (G²d/max{G, C})√T + (2C²d/max{G, C})√T + (8 max{G, C} d log(2T/ε)/√T) Σ_{t=1}^T 1/√t
    ≤ (d max{G, C}/2)√T + max{G, C} d √T + 2 max{G, C} d √T + (8Ud log(2T/ε)/√T) Σ_{t=1}^T 1/√t    (38)
    ≤ (7dU/2)√T + (8Ud log(2T/ε)/√T) Σ_{t=1}^T 1/√t    (39)
    ≤ (7dU/2)√T + 16Ud log(2T/ε),    (40)

where Equation (40) uses Σ_{t=1}^T 1/√t ≤ 2√T. This gives us

    Σ_{t=1}^T [ (1/2) f(x*) − f(x_t) ] ≤ (7dU/4)√T + 8dU log(2T/ε) = O(T^{1/2}).    (41)

Next, we establish our constraint violation bound. Since F1 := max_{x∈K} |f(x)| and F2 :=
max_{x,y∈K} |f(x) − f(y)|, we have

    |f(x*) − 2f(x_t)| ≤ |f(x*) − f(x_t)| + |f(x_t)| ≤ F1 + F2,    (42)

thus Σ_{t=1}^T [f(x*) − 2f(x_t)] ≥ −(F1 + F2)T. Plugging back into (40), we have

    [Σ_{t=1}^T g̃_t(x_t)]_+² / (δηT + 1/η) ≤ (7dU/4)√T + 8dU log(2T/ε) + (F1 + F2)T.    (43)

Rearranging and plugging in the value of η, we have

    [Σ_{t=1}^T g̃_t(x_t)]_+ ≤ √( [ (7dU/4)√T + 8dU log(2T/ε) + (F1 + F2)T ] · (8β²d/U + U/d)√T ).    (44)
Algorithm 2 OLSGA with First Order Full Information
1: Input: Convex set K, time horizon T
2: Initialize x_1 ∈ K, λ_1 = 0.
3: for t ∈ [T] do
4:   Play x_t; obtain f_t(x_t), the gradient oracle ∇̃f_t(·), and p_t
5:   Compute p̂_t = (1/t) Σ_{s=1}^t p_s
6:   Sample z_t from Z, where P(Z ≤ z) = ∫_0^z e^{u−1}/(1 − e^{−1}) du
7:   Compute the non-oblivious gradient estimate ∇̃F_t(x_t) = (1 − 1/e) ∇̃f_t(z_t ∗ x_t)
8:   Compute
         ∇̃_x L̂_t(x_t, λ_t) = ∇̃F_t(x_t) − λ_t ∇g̃_t(x_t)    (45)
         ∇_λ L_t(x_t, λ_t) = −g̃_t(x_t) + δηλ_t    (46)
9:   Update x_t and λ_t:
         x_{t+1} = Π_K(x_t + η ∇̃_x L̂_t(x_t, λ_t))    (47)
         λ_{t+1} = Π_{[0,+∞)}(λ_t − η ∇_λ L_t(x_t, λ_t))    (48)
10: end for
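For implementation, the sampling in line 6 admits a simple inverse-transform form; a short calculation on our part, using the CDF stated in line 6, gives

    P(Z ≤ z) = ∫_0^z e^{u−1}/(1 − e^{−1}) du = (e^{z−1} − e^{−1})/(1 − e^{−1}),   z ∈ [0, 1],

so if v is drawn uniformly from [0, 1], then z = 1 + log( v(1 − e^{−1}) + e^{−1} ) has the required distribution.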
E Proof of Theorem 2
The complete statement of the theorem, with explicit constants, is that with probability at least 1 − ε,

    E[R_T] ≤ Σ_{t=1}^T [ (1 − 1/e) f(x*) − f(x_t) ] ≤ (5dU/2)√T + 8dU log(2T/ε) = O(T^{1/2})

and

    C_T ≤ √( [ (5dU/2)√T + 8dU log(2T/ε) + (F1/e + F2)T ] · (4β²d/U + U/d)√T )
          + rQσ √( T log(2nT/ε) ) + 2 √( 2TC² log(2T/ε) )
        = O(T^{3/4}).

Proof. From the update (47) of x_t, we have that for any x ∈ K,

    ∥x_{t+1} − x∥² = ∥Π_K(x_t + η ∇̃_x L̂_t(x_t, λ_t)) − x∥²
                  ≤ ∥x_t − x∥² + η²∥∇̃_x L̂_t(x_t, λ_t)∥² − 2η(x − x_t)^⊤ ∇̃_x L̂_t(x_t, λ_t).    (49)

Rearranging,

    (x − x_t)^⊤ ∇̃_x L̂_t(x_t, λ_t)
    ≤ 1/(2η) (∥x_t − x∥² − ∥x_{t+1} − x∥²) + (η/2)∥∇̃_x L̂_t(x_t, λ_t)∥²
    = 1/(2η) (∥x_t − x∥² − ∥x_{t+1} − x∥²) + (η/2)∥∇̃F_t(x_t) − λ_t ∇g̃_t(x_t)∥²    (from (45))
    ≤ 1/(2η) (∥x_t − x∥² − ∥x_{t+1} − x∥²) + η∥∇̃F_t(x_t)∥² + ηλ_t²∥∇g̃_t(x_t)∥²    (∥a + b∥² ≤ 2∥a∥² + 2∥b∥²)
    ≤ 1/(2η) (∥x_t − x∥² − ∥x_{t+1} − x∥²) + ηG_F² + ηβ²λ_t²,    (50)

where (50) follows from Assumption 4 and Assumption 3. When λ is fixed, we have (taking
expectation over f_t)

    E[(1 − 1/e) L_t(x, λ_t) − L_t(x_t, λ_t)]
    ≤ 1/(2η) E[∥x_t − x∥² − ∥x_{t+1} − x∥²] + G_F²η + ηβ²λ_t² + (1/e) λ_t g̃_t(x) − (δη/(2e)) λ_t²,    (51)

where (51) follows from (50). From the update (48) of λ_t, we have

    ∥λ_{t+1} − λ∥² = ∥Π_{[0,+∞)}(λ_t − η ∇_λ L_t(x_t, λ_t)) − λ∥²
                  ≤ ∥λ_t − λ∥² + η²∥∇_λ L_t(x_t, λ_t)∥² − 2η(λ_t − λ) ∇_λ L_t(x_t, λ_t).    (52)

Rearranging,

    (λ − λ_t)^⊤ ∇_λ L_t(x_t, λ_t)
    ≥ −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − (η/2)∥∇_λ L_t(x_t, λ_t)∥²
    = −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − (η/2)∥−g̃_t(x_t) + δηλ_t∥²    (from (46))
    = −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − (η/2)∥−ĝ_t(x_t) + γ_t + δηλ_t∥²    (by def. of ĝ_t)
    ≥ −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − η∥ĝ_t(x_t)∥² − 2ηγ_t² − 2δ²η³λ_t²    (apply ∥a + b∥² ≤ 2∥a∥² + 2∥b∥² twice)
    ≥ −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − C²η − 2ηγ_t² − 2δ²η³λ_t²,    (53)

where the last inequality follows from the definitions C := max_{p′∼D_p} max_{x∈K} |⟨p′, x⟩ − B_T/T| and
ĝ_t(x) := ⟨p̂_t, x⟩ − B_T/T. From convexity of the function L_t(x, λ) w.r.t. λ, we have

    L_t(x_t, λ) − L_t(x_t, λ_t) ≥ (λ − λ_t)^⊤ ∇_λ L_t(x_t, λ_t)
    ≥ −1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²) − C²η − 2ηγ_t² − 2δ²η³λ_t²,    (54)

where (54) follows from (53). Subtracting (54) from (51), we get

    E[(1 − 1/e) L_t(x, λ_t) − L_t(x_t, λ)]
    ≤ 1/(2η) E[∥x_t − x∥² − ∥x_{t+1} − x∥²] + 1/(2η) (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²)
      + G_F²η + ηβ²λ_t² + C²η + 2ηγ_t² + 2δ²η³λ_t² + (1/e) λ_t g̃_t(x) − (δη/(2e)) λ_t².    (55)

Summing (55) for t ∈ [T], we have

    Σ_{t=1}^T E[(1 − 1/e) L_t(x, λ_t) − L_t(x_t, λ)]
    ≤ 1/(2η) Σ_{t=1}^T E[∥x_t − x∥² − ∥x_{t+1} − x∥²] + 1/(2η) Σ_{t=1}^T (∥λ_t − λ∥² − ∥λ_{t+1} − λ∥²)
      + G_F²ηT + ηβ² Σ_{t=1}^T λ_t² + C²ηT + 2η Σ_{t=1}^T γ_t² + 2δ²η³ Σ_{t=1}^T λ_t² + (1/e) Σ_{t=1}^T λ_t g̃_t(x) − (δη/(2e)) Σ_{t=1}^T λ_t²    (56)
    ≤ 1/(2η) E[∥x_1 − x∥² − ∥x_{T+1} − x∥²] + 1/(2η) (∥λ_1 − λ∥² − ∥λ_{T+1} − λ∥²)
      + G_F²ηT + ηβ² Σ_{t=1}^T λ_t² + C²ηT + 2η Σ_{t=1}^T γ_t² + 2δ²η³ Σ_{t=1}^T λ_t² + (1/e) Σ_{t=1}^T λ_t g̃_t(x) − (δη/(2e)) Σ_{t=1}^T λ_t²    (57)
    ≤ 1/(2η) E[∥x_1 − x_{T+1}∥²] + 1/(2η) (∥λ_1 − λ∥² − ∥λ_{T+1} − λ∥²)
      + G_F²ηT + ηβ² Σ_{t=1}^T λ_t² + C²ηT + 2η Σ_{t=1}^T γ_t² + 2δ²η³ Σ_{t=1}^T λ_t² + (1/e) Σ_{t=1}^T λ_t g̃_t(x) − (δη/(2e)) Σ_{t=1}^T λ_t²    (triangle inequality)
    ≤ d²/(2η) + λ²/(2η) + G_F²ηT + ηβ² Σ_{t=1}^T λ_t² + C²ηT + 2η Σ_{t=1}^T γ_t² + 2δ²η³ Σ_{t=1}^T λ_t² + (1/e) Σ_{t=1}^T λ_t g̃_t(x) − (δη/(2e)) Σ_{t=1}^T λ_t²,    (58)

where (58) uses Assumption 1 with the definition of the diameter d, λ_1 = 0, and −∥·∥² ≤ 0. Expanding the left
hand side of (58), we deduce

    Σ_{t=1}^T [(1 − 1/e) f(x) − f(x_t)] + Σ_{t=1}^T [λ g̃_t(x_t) − (1 − 1/e) λ_t g̃_t(x)] + Σ_{t=1}^T [(1 − 1/e)(δη/2)λ_t² − (δη/2)λ²]
    ≤ d²/(2η) + λ²/(2η) + G_F²ηT + ηβ² Σ_{t=1}^T λ_t² + C²ηT + 2η Σ_{t=1}^T γ_t² + 2δ²η³ Σ_{t=1}^T λ_t² + (1/e) Σ_{t=1}^T λ_t g̃_t(x) − (δη/(2e)) Σ_{t=1}^T λ_t².    (59)

Rearranging, we have

    Σ_{t=1}^T [(1 − 1/e) f(x) − f(x_t)] + λ Σ_{t=1}^T g̃_t(x_t) − (δηT/2 + 1/(2η))λ²
    ≤ Σ_{t=1}^T λ_t g̃_t(x) + η(β² + 2δ²η² − δ/2) Σ_{t=1}^T λ_t² + d²/(2η) + G_F²ηT + C²ηT + 2η Σ_{t=1}^T γ_t².    (60)

Set x = x*. From Lemma 4, with probability at least 1 − ε/T, g̃_t(x*) = ĝ_t(x*) − γ_t ≤ g(x*)
holds. Since x* satisfies the long-term constraint, we have g(x*) ≤ 0. Now we choose δ such that
β² + 2δ²η² − δ/2 ≤ 0. This is quadratic in δ. It is easy to verify that the quadratic
has real roots when η ≤ √2/(8β). As we choose η = d/(U√T), this gives the condition that T needs to
be sufficiently large, i.e., T ≥ 32d²β²/U². We can simply choose δ = 4β².

Applying both of these inequalities to (60) and by the union bound, we get with probability at least 1 − ε,

    Σ_{t=1}^T [(1 − 1/e) f(x*) − f(x_t)] + λ Σ_{t=1}^T g̃_t(x_t) − (δηT/2 + 1/(2η))λ² ≤ d²/(2η) + G_F²ηT + C²ηT + 2η Σ_{t=1}^T γ_t².    (61)

Maximizing the LHS of (61) with respect to λ over the range [0, +∞), we get a solution of
λ = [Σ_{t=1}^T g̃_t(x_t)]_+ / (δηT/2 + 1/(2η)). Plugging this into (61) gives us

    Σ_{t=1}^T [(1 − 1/e) f(x*) − f(x_t)] + [Σ_{t=1}^T g̃_t(x_t)]_+² / (δηT/2 + 1/(2η)) ≤ d²/(2η) + G_F²ηT + C²ηT + 2η Σ_{t=1}^T γ_t².    (62)

Let U = max{G_F, C}. Choosing η = d/(U√T), we have with probability at least 1 − ε,

    Σ_{t=1}^T [(1 − 1/e) f(x*) − f(x_t)] + [Σ_{t=1}^T g̃_t(x_t)]_+² / (δηT/2 + 1/(2η))
    ≤ (d max{G_F, C}/2)√T + (G_F²d/max{G_F, C})√T + (C²d/max{G_F, C})√T + (4 max{G_F, C} d log(2T/ε)/√T) Σ_{t=1}^T 1/√t
    ≤ (d max{G_F, C}/2)√T + max{G_F, C} d √T + max{G_F, C} d √T + (4 max{G_F, C} d log(2T/ε)/√T) Σ_{t=1}^T 1/√t
    ≤ (5dU/2)√T + (4dU log(2T/ε)/√T) Σ_{t=1}^T 1/√t    (63)
    ≤ (5dU/2)√T + 8dU log(2T/ε)    (64)
    = O(T^{1/2}),    (65)

where (64) uses Σ_{t=1}^T 1/√t ≤ 2√T, and dropping the second term on the LHS gives us the desired
(1 − 1/e)-regret bound.

Next, we establish our constraint violation bound. Since F1 := max_{x∈K} |f(x)| and F2 :=
max_{x,y∈K} |f(x) − f(y)|, we have

    |(1 − 1/e) f(x*) − f(x_t)| ≤ |f(x*) − f(x_t)| + (1/e)|f(x_t)| ≤ F1/e + F2,    (66)

thus Σ_{t=1}^T [(1 − 1/e) f(x*) − f(x_t)] ≥ −(F1/e + F2)T. Plugging back into (64), we have

    [Σ_{t=1}^T g̃_t(x_t)]_+² / (δηT + 1/η) ≤ (5dU/2)√T + 8dU log(2T/ε) + (F1/e + F2)T.    (67)

Rearranging and plugging in the value of η, we have

    [Σ_{t=1}^T g̃_t(x_t)]_+ ≤ √( [ (5dU/2)√T + 8dU log(2T/ε) + (F1/e + F2)T ] · (4β²d/U + U/d)√T ).    (68)
F Additional Related Works
For the discrete domain, [19] investigated the case of semi-bandit feedback, specifically in the form of marginal gains.
Additionally, recent works such as [25, 26] have delved into the full-bandit feedback setting. For
continuous domains, [9] first investigated online (stochastic) gradient ascent (OGA) with a 1/2-regret
of O(T^{1/2}). Then, inspired by the meta actions of [33], [9] also proposed a Frank-Wolfe type algorithm
with a (1 − 1/e)-regret of O(T^{1/2}) when the exact gradient is available. When only a stochastic gradient
is available, [7] proposed a variant of the Frank-Wolfe algorithm achieving a (1 − 1/e)-regret of O(T^{1/2}),
but it requires O(T^{3/2}) stochastic gradient queries in each time step. In the effort of reducing gradient
queries, [41] achieves a (1 − 1/e)-regret of O(T^{4/5}) with only one stochastic gradient evaluation each
round. Recently, [42] proposed an auxiliary function to boost the approximation ratio of the
online gradient ascent algorithms from 1/2 to 1 − 1/e.
All the problems above were initially studied in the discrete domain and extended to the continuous
domain in [6]. Furthermore, when faced with a discrete objective, one can always use the "relax
and rounding" strategy to transition from addressing a discrete problem to tackling a continuous one.
Such techniques are widely utilized within the submodular maximization community, as
exemplified by the work of [8].
NeurIPS Paper Checklist
1. Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the
paper’s contributions and scope?
Answer: [Yes]
Justification: The abstract and introduction clearly state the paper’s contribution and scope.
Guidelines:
• The answer NA means that the abstract and introduction do not include the claims
made in the paper.
• The abstract and/or introduction should clearly state the claims made, including the
contributions made in the paper and important assumptions and limitations. A No or
NA answer to this question will not be perceived well by the reviewers.
• The claims made should match theoretical and experimental results, and reflect how
much the results can be expected to generalize to other settings.
• It is fine to include aspirational goals as motivation as long as it is clear that these goals
are not attained by the paper.
2. Limitations
Question: Does the paper discuss the limitations of the work performed by the authors?
Answer: [Yes]
Justification: Some limitations of the paper have been discussed in the conclusion section.
Assumptions form another part of the limitations.
Guidelines:
• The answer NA means that the paper has no limitation while the answer No means that
the paper has limitations, but those are not discussed in the paper.
• The authors are encouraged to create a separate "Limitations" section in their paper.
• The paper should point out any strong assumptions and how robust the results are to
violations of these assumptions (e.g., independence assumptions, noiseless settings,
model well-specification, asymptotic approximations only holding locally). The authors
should reflect on how these assumptions might be violated in practice and what the
implications would be.
• The authors should reflect on the scope of the claims made, e.g., if the approach was
only tested on a few datasets or with a few runs. In general, empirical results often
depend on implicit assumptions, which should be articulated.
• The authors should reflect on the factors that influence the performance of the approach.
For example, a facial recognition algorithm may perform poorly when image resolution
is low or images are taken in low lighting. Or a speech-to-text system might not be
used reliably to provide closed captions for online lectures because it fails to handle
technical jargon.
• The authors should discuss the computational efficiency of the proposed algorithms
and how they scale with dataset size.
• If applicable, the authors should discuss possible limitations of their approach to
address problems of privacy and fairness.
• While the authors might fear that complete honesty about limitations might be used by
reviewers as grounds for rejection, a worse outcome might be that reviewers discover
limitations that aren’t acknowledged in the paper. The authors should use their best
judgment and recognize that individual actions in favor of transparency play an impor-
tant role in developing norms that preserve the integrity of the community. Reviewers
will be specifically instructed to not penalize honesty concerning limitations.
3. Theory Assumptions and Proofs
Question: For each theoretical result, does the paper provide the full set of assumptions and
a complete (and correct) proof?
Answer: [Yes]
Justification: We have clearly stated the required assumptions and an accompanying com-
plete proof in the appendix for each theory result.
Guidelines:
• The answer NA means that the paper does not include theoretical results.
• All the theorems, formulas, and proofs in the paper should be numbered and cross-
referenced.
• All assumptions should be clearly stated or referenced in the statement of any theorems.
• The proofs can either appear in the main paper or the supplemental material, but if
they appear in the supplemental material, the authors are encouraged to provide a short
proof sketch to provide intuition.
• Inversely, any informal proof provided in the core of the paper should be complemented
by formal proofs provided in appendix or supplemental material.
• Theorems and Lemmas that the proof relies upon should be properly referenced.
4. Experimental Result Reproducibility
Question: Does the paper fully disclose all the information needed to reproduce the main ex-
perimental results of the paper to the extent that it affects the main claims and/or conclusions
of the paper (regardless of whether the code and data are provided or not)?
Answer: [NA]
Justification: Our paper is primarily of theoretical nature and does not include experiments.
Guidelines:
• The answer NA means that the paper does not include experiments.
• If the paper includes experiments, a No answer to this question will not be perceived
well by the reviewers: Making the paper reproducible is important, regardless of
whether the code and data are provided or not.
• If the contribution is a dataset and/or model, the authors should describe the steps taken
to make their results reproducible or verifiable.
• Depending on the contribution, reproducibility can be accomplished in various ways.
For example, if the contribution is a novel architecture, describing the architecture fully
might suffice, or if the contribution is a specific model and empirical evaluation, it may
be necessary to either make it possible for others to replicate the model with the same
dataset, or provide access to the model. In general. releasing code and data is often
one good way to accomplish this, but reproducibility can also be provided via detailed
instructions for how to replicate the results, access to a hosted model (e.g., in the case
of a large language model), releasing of a model checkpoint, or other means that are
appropriate to the research performed.
• While NeurIPS does not require releasing code, the conference does require all submis-
sions to provide some reasonable avenue for reproducibility, which may depend on the
nature of the contribution. For example
(a) If the contribution is primarily a new algorithm, the paper should make it clear how
to reproduce that algorithm.
(b) If the contribution is primarily a new model architecture, the paper should describe
the architecture clearly and fully.
(c) If the contribution is a new model (e.g., a large language model), then there should
either be a way to access this model for reproducing the results or a way to reproduce
the model (e.g., with an open-source dataset or instructions for how to construct
the dataset).
(d) We recognize that reproducibility may be tricky in some cases, in which case
authors are welcome to describe the particular way they provide for reproducibility.
In the case of closed-source models, it may be that access to the model is limited in
some way (e.g., to registered users), but it should be possible for other researchers
to have some path to reproducing or verifying the results.
5. Open access to data and code
Question: Does the paper provide open access to the data and code, with sufficient instruc-
tions to faithfully reproduce the main experimental results, as described in supplemental
material?
Answer: [NA]
Justification: Our paper is primarily of theoretical nature and does not include experiments.
Guidelines:
• The answer NA means that paper does not include experiments requiring code.
• Please see the NeurIPS code and data submission guidelines (https://nips.cc/
public/guides/CodeSubmissionPolicy) for more details.
• While we encourage the release of code and data, we understand that this might not be
possible, so “No” is an acceptable answer. Papers cannot be rejected simply for not
including code, unless this is central to the contribution (e.g., for a new open-source
benchmark).
• The instructions should contain the exact command and environment needed to run to
reproduce the results. See the NeurIPS code and data submission guidelines (https:
//nips.cc/public/guides/CodeSubmissionPolicy) for more details.
• The authors should provide instructions on data access and preparation, including how
to access the raw data, preprocessed data, intermediate data, and generated data, etc.
• The authors should provide scripts to reproduce all experimental results for the new
proposed method and baselines. If only a subset of experiments are reproducible, they
should state which ones are omitted from the script and why.
• At submission time, to preserve anonymity, the authors should release anonymized
versions (if applicable).
• Providing as much information as possible in supplemental material (appended to the
paper) is recommended, but including URLs to data and code is permitted.
6. Experimental Setting/Details
Question: Does the paper specify all the training and test details (e.g., data splits, hyper-
parameters, how they were chosen, type of optimizer, etc.) necessary to understand the
results?
Answer: [NA]
Justification: Our paper is primarily of theoretical nature and does not include experiments.
Guidelines:
• The answer NA means that the paper does not include experiments.
• The experimental setting should be presented in the core of the paper to a level of detail
that is necessary to appreciate the results and make sense of them.
• The full details can be provided either with the code, in appendix, or as supplemental
material.
7. Experiment Statistical Significance
Question: Does the paper report error bars suitably and correctly defined or other appropriate
information about the statistical significance of the experiments?
Answer: [NA]
Justification: Our paper is primarily of theoretical nature and does not include experiments.
Guidelines:
• The answer NA means that the paper does not include experiments.
• The authors should answer "Yes" if the results are accompanied by error bars, confi-
dence intervals, or statistical significance tests, at least for the experiments that support
the main claims of the paper.
• The factors of variability that the error bars are capturing should be clearly stated (for
example, train/test split, initialization, random drawing of some parameter, or overall
run with given experimental conditions).
• The method for calculating the error bars should be explained (closed form formula,
call to a library function, bootstrap, etc.)
• The assumptions made should be given (e.g., Normally distributed errors).
• It should be clear whether the error bar is the standard deviation or the standard error
of the mean.
• It is OK to report 1-sigma error bars, but one should state it. The authors should
preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis
of Normality of errors is not verified.
• For asymmetric distributions, the authors should be careful not to show in tables or
figures symmetric error bars that would yield results that are out of range (e.g. negative
error rates).
• If error bars are reported in tables or plots, The authors should explain in the text how
they were calculated and reference the corresponding figures or tables in the text.
8. Experiments Compute Resources
Question: For each experiment, does the paper provide sufficient information on the com-
puter resources (type of compute workers, memory, time of execution) needed to reproduce
the experiments?
Answer: [NA]
Justification: Our paper is primarily of theoretical nature and does not include experiments.
Guidelines:
• The answer NA means that the paper does not include experiments.
• The paper should indicate the type of compute workers CPU or GPU, internal cluster,
or cloud provider, including relevant memory and storage.
• The paper should provide the amount of compute required for each of the individual
experimental runs as well as estimate the total compute.
• The paper should disclose whether the full research project required more compute
than the experiments reported in the paper (e.g., preliminary or failed experiments that
didn’t make it into the paper).
9. Code Of Ethics
Question: Does the research conducted in the paper conform, in every respect, with the
NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines?
Answer: [Yes]
Justification: Our research conforms, in every respect, to the NeurIPS Code of Ethics.
Guidelines:
• The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics.
• If the authors answer No, they should explain the special circumstances that require a
deviation from the Code of Ethics.
• The authors should make sure to preserve anonymity (e.g., if there is a special consid-
eration due to laws or regulations in their jurisdiction).
10. Broader Impacts
Question: Does the paper discuss both potential positive societal impacts and negative
societal impacts of the work performed?
Answer: [NA]
Justification: Our work is primarily theoretical in nature and has no immediate societal
impact.
Guidelines:
• The answer NA means that there is no societal impact of the work performed.
• If the authors answer NA or No, they should explain why their work has no societal
impact or why the paper does not address societal impact.
• Examples of negative societal impacts include potential malicious or unintended uses
(e.g., disinformation, generating fake profiles, surveillance), fairness considerations
(e.g., deployment of technologies that could make decisions that unfairly impact specific
groups), privacy considerations, and security considerations.
• The conference expects that many papers will be foundational research and not tied
to particular applications, let alone deployments. However, if there is a direct path to
any negative applications, the authors should point it out. For example, it is legitimate
to point out that an improvement in the quality of generative models could be used to
generate deepfakes for disinformation. On the other hand, it is not needed to point out
that a generic algorithm for optimizing neural networks could enable people to train
models that generate deepfakes faster.
• The authors should consider possible harms that could arise when the technology is
being used as intended and functioning correctly, harms that could arise when the
technology is being used as intended but gives incorrect results, and harms following
from (intentional or unintentional) misuse of the technology.
• If there are negative societal impacts, the authors could also discuss possible mitigation
strategies (e.g., gated release of models, providing defenses in addition to attacks,
mechanisms for monitoring misuse, mechanisms to monitor how a system learns from
feedback over time, improving the efficiency and accessibility of ML).
11. Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible
release of data or models that have a high risk for misuse (e.g., pretrained language models,
image generators, or scraped datasets)?
Answer: [NA]
Justification: No high-risk data or models have been used.
Guidelines:
• The answer NA means that the paper poses no such risks.
• Released models that have a high risk for misuse or dual-use should be released with
necessary safeguards to allow for controlled use of the model, for example by requiring
that users adhere to usage guidelines or restrictions to access the model or implementing
safety filters.
• Datasets that have been scraped from the Internet could pose safety risks. The authors
should describe how they avoided releasing unsafe images.
• We recognize that providing effective safeguards is challenging, and many papers do
not require this, but we encourage authors to take this into account and make a best
faith effort.
12. Licenses for existing assets
Question: Are the creators or original owners of assets (e.g., code, data, models), used in
the paper, properly credited and are the license and terms of use explicitly mentioned and
properly respected?
Answer: [NA]
Justification: No existing asset has been used in the paper.
Guidelines:
• The answer NA means that the paper does not use existing assets.
• The authors should cite the original paper that produced the code package or dataset.
• The authors should state which version of the asset is used and, if possible, include a
URL.
• The name of the license (e.g., CC-BY 4.0) should be included for each asset.
• For scraped data from a particular source (e.g., website), the copyright and terms of
service of that source should be provided.
• If assets are released, the license, copyright information, and terms of use in the
package should be provided. For popular datasets, paperswithcode.com/datasets
has curated licenses for some datasets. Their licensing guide can help determine the
license of a dataset.
• For existing datasets that are re-packaged, both the original license and the license of
the derived asset (if it has changed) should be provided.
• If this information is not available online, the authors are encouraged to reach out to
the asset’s creators.
13. New Assets
Question: Are new assets introduced in the paper well documented and is the documentation
provided alongside the assets?
Answer: [NA]
Justification: No new asset is introduced in the paper.
Guidelines:
• The answer NA means that the paper does not release new assets.
• Researchers should communicate the details of the dataset/code/model as part of their
submissions via structured templates. This includes details about training, license,
limitations, etc.
• The paper should discuss whether and how consent was obtained from people whose
asset is used.
• At submission time, remember to anonymize your assets (if applicable). You can either
create an anonymized URL or include an anonymized zip file.
14. Crowdsourcing and Research with Human Subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper
include the full text of instructions given to participants and screenshots, if applicable, as
well as details about compensation (if any)?
Answer: [NA]
Justification: No experiments with human subjects were conducted.
Guidelines:
• The answer NA means that the paper does not involve crowdsourcing nor research with
human subjects.
• Including this information in the supplemental material is fine, but if the main contribu-
tion of the paper involves human subjects, then as much detail as possible should be
included in the main paper.
• According to the NeurIPS Code of Ethics, workers involved in data collection, curation,
or other labor should be paid at least the minimum wage in the country of the data
collector.
15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human
Subjects
Question: Does the paper describe potential risks incurred by study participants, whether
such risks were disclosed to the subjects, and whether Institutional Review Board (IRB)
approvals (or an equivalent approval/review based on the requirements of your country or
institution) were obtained?
Answer: [NA]
Justification: We conducted no experiments with human subjects.
Guidelines:
• The answer NA means that the paper does not involve crowdsourcing nor research with
human subjects.
• Depending on the country in which research is conducted, IRB approval (or equivalent)
may be required for any human subjects research. If you obtained IRB approval, you
should clearly state this in the paper.
• We recognize that the procedures for this may vary significantly between institutions
and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the
guidelines for their institution.
• For initial submissions, do not include any information that would break anonymity (if
applicable), such as the institution conducting the review.