Adversarial Network Optimization under Bandit Feedback:
Maximizing Utility in Non-Stationary Multi-Hop Networks

Yan Dai, ORC & LIDS, MIT. Work done when Yan was an undergraduate student at IIIS, Tsinghua.
Longbo Huang, IIIS, Tsinghua.
Abstract

Stochastic Network Optimization (SNO) concerns scheduling in stochastic queueing systems and has been widely studied in network theory. Classical SNO algorithms require network conditions to be stationary over time, which fails to capture the non-stationary components in many real-world scenarios. Many existing algorithms also assume knowledge of network conditions before decisions are made, which rules out applications where the conditions are unpredictable.

Motivated by these issues, we consider Adversarial Network Optimization (ANO) under bandit feedback. Specifically, we consider the task of i) maximizing some unknown and time-varying utility function associated with the scheduler’s actions, where ii) the underlying network is a non-stationary multi-hop one whose conditions change arbitrarily with time, and iii) only bandit feedback (the effect of the actually deployed actions) is revealed after decisions. Our proposed UMO2 algorithm ensures network stability and also matches the utility maximization performance of any “mildly varying” reference policy up to a polynomially decaying gap. To our knowledge, no previous ANO algorithm handled multi-hop networks or achieved utility guarantees under bandit feedback, whereas ours can do both.

Technically, our method builds upon a novel integration of online learning into Lyapunov analyses: To handle complex inter-dependencies among queues in multi-hop networks, we propose meticulous techniques to balance online learning and Lyapunov arguments. To tackle the learning obstacles due to potentially unbounded queue sizes, we design a new online linear optimization algorithm that automatically adapts to loss magnitudes. To maximize utility, we propose a bandit convex optimization algorithm with novel queue-dependent learning rate scheduling that suits drastically varying queue lengths. Our new insights in online learning can be of independent interest.

1 Introduction

Stochastic Network Optimization (SNO) studies the fundamental problem of resource allocation in a dynamic system to fulfill incoming demands, with extensive applications in real-world problems including communication networks (Srikant and Ying, 2013), cloud computing (Maguluri et al., 2012), and supply chains (Rahdar et al., 2018). There are many classical scheduling algorithms in this field enjoying performance guarantees in terms of throughput maximization (Tsibonis et al., 2003), delay minimization (Neely, 2008), or utility maximization (Huang and Neely, 2011).

Classical SNO models often assume that the network conditions, for example, the arrival and service rates to each queue or the capacities of data links, are stationary with respect to time. However, many important network scenarios in practice face non-stationarity. For instance, in applications such as autonomous driving, parties in the communication networks can move rapidly (Ashjaei et al., 2021), causing the network conditions to vary from time to time. Furthermore, attacks such as Distributed Denial-of-Service (DDoS) or jamming can frequently happen in communication networks (Zou et al., 2016), where arrival rates or link conditions are altered by some malicious adversary.

Moreover, we notice that existing works, even those allowing non-stationary network conditions, assume perfect knowledge about the network conditions. For example, in the paper by Liang and Modiano (2018b), the network condition is revealed at the beginning of each round, so the outcomes associated with each action can be accurately calculated before actually deciding and deploying the scheduler’s action (see Section 1.1 for more discussions). Nevertheless, this may again not be the case in practice. In underwater wireless communication systems, for instance, the network conditions are unpredictable until the policy is actually executed (Khan et al., 2020). In the Internet of Things (IoT), device failures or sensor temperatures can change rapidly, resulting in highly unpredictable traffic and channel patterns in the network (Gaddam et al., 2020). Therefore, it is hard to estimate counterfactual outcomes of other actions (i.e., “what would have happened if we had used a different action?”) even after deploying the action and obtaining more information about the network conditions, not to mention pre-decision evaluations. In a nutshell, it is important and largely open to design network algorithms that are robust to time-dependent or adversarial conditions with post-decision feedback.

Motivated by these two challenges, this paper considers optimizing an abstract utility function associated with the scheduler’s action (the so-called utility maximization task (Neely et al., 2008)) even when the network is non-stationary (which we call Adversarial Network Optimization, or ANO for short) and the feedback model is bandit-style. Specifically, in ANO, the network conditions and utility functions can vary from time to time without the scheduler’s knowledge. Therefore, statistics of the past reveal little about the current network condition, which breaks many traditional SNO techniques. Moreover, under the bandit feedback model, the scheduler has no information about the current network condition before the decision. Even worse, after making decisions, it only receives feedback resulting from the chosen action, not the “counterfactual” feedback associated with other actions. More formally, suppose an action $a$ is associated with outcome $O_t(a)$ (for example, arrival and service rates) in round $t$. Then i) the scheduler has to decide action $a_t$ without having any information about $O_t(a)$, and ii) after playing $a_t$, it can only observe $O_t(a_t)$ but not the outcomes $O_t(a')$ for $a' \neq a_t$. Therefore, it is hard to evaluate the optimal action even in hindsight: based on information collected in rounds $[1,t]$, one cannot accurately calculate which action had the most gain within rounds $[1,t]$. Hence, in our model, one not only cannot predict the future, but also cannot fully reconstruct the history.
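
To make the feedback restriction concrete, the following is a minimal sketch of the interaction protocol described above. The outcome table, the `choose` interface, and the uniform placeholder policy are illustrative assumptions and are not part of the paper's algorithms; the point is only that the history never contains $O_t(a')$ for actions that were not played.

```python
import random


def run_bandit_rounds(outcomes, choose, seed=0):
    """Play len(outcomes) rounds, where outcomes[t][a] plays the role of O_t(a).

    The table is fixed in advance, but `choose` only ever receives the past
    (action, observed outcome) pairs: it sees neither outcomes[t] before
    acting nor the outcomes of unplayed actions afterwards.
    """
    rng = random.Random(seed)
    history = []  # (a_t, O_t(a_t)) pairs only; nothing counterfactual
    for outcome_t in outcomes:
        a_t = choose(history, len(outcome_t), rng)
        history.append((a_t, outcome_t[a_t]))  # feedback for the deployed action
    return history


def uniform_choice(history, n_actions, rng):
    # Placeholder policy (an assumption): ignores history, samples uniformly.
    return rng.randrange(n_actions)


# Hypothetical outcome table with 3 rounds and 2 actions.
outcomes = [[0.2, 0.9], [0.7, 0.1], [0.5, 0.5]]
history = run_bandit_rounds(outcomes, uniform_choice)
```

Since `history` holds one number per round, the cumulative outcome of any fixed action over rounds $[1,t]$ cannot be recomputed from it, which is the sense in which the history cannot be fully reconstructed.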

In addition to the challenging ANO setup, bandit feedback model, and utility maximization task, we also allow the underlying network to be multi-hop, which means jobs can be forwarded between queues. Despite these difficulties, we succeeded in designing the utility maximization algorithm UMO2, which achieves a strong performance guarantee in non-stationary multi-hop networks under bandit feedback. It not only ensures the network is stable over time, but also guarantees a polynomially decaying gap between its utility and that of any “mildly varying” policy (measured by the path length; see Theorem 4.5), similar to what can be achieved in perfect-knowledge SNO problems (Neely et al., 2008).

We now highlight several technical innovations in our algorithm. While our algorithm is based on the classical Lyapunov drift-plus-penalty (DPP) analysis, the adversarial network conditions break existing SNO arguments, which we tackle by designing online learning algorithms enjoying dynamic regret guarantees in adversarial environments. However, due to the multi-hop topology, learning in different queues is correlated. Thus, it is highly non-trivial to decompose the problem into several online learning tasks. To this end, we develop meticulous analysis techniques that can jointly analyze the online learning algorithms and the utility maximization effects. Moreover, the queue lengths can occasionally be large despite having a bounded expectation. Such a unique challenge is missing in the online learning literature, which usually assumes the loss magnitudes are uniformly bounded by a constant. Finally, yet another challenge is due to a combination of the multi-hop topology and the unbounded queue lengths, which makes the losses fed into online learning algorithms sometimes quite negative, a known issue for many online learning algorithms (Zheng et al., 2019, Dai et al., 2023). These two challenges make existing online learning algorithms unable to fulfill our purpose, and we propose a novel Online Linear Optimization algorithm (AdaPFOL; used for system stability) that adapts to drastically varying losses (proportional to queue lengths) and a new Bandit Convex Optimization algorithm (AdaBGD; deployed for utility maximization) whose learning rates are carefully designed to take care of the time-dependent loss magnitudes and Lipschitzness.

Table 1: Comparison of Most Related Works

| Work | Network Conditions | Arrival & Service¹ | Topology | Objective | Utility¹ |
|---|---|---|---|---|---|
| (Neely et al., 2008) | Stochastic | Known | Multi-Hop | Utility Maximization | Known |
| (Neely, 2010b) | Adversarial | Known | Multi-Hop | Utility Maximization | Known |
| (Liang and Modiano, 2018b) | Adversarial | Known | Multi-Hop | Network Stability | - |
| (Liang and Modiano, 2018a) | Adversarial | Known | Multi-Hop | Utility Maximization | Known |
| (Yang et al., 2023) | Adversarial | Unknown | Single-Hop | Network Stability | - |
| (Huang et al., 2024) | Adversarial | Unknown | Single-Hop | Network Stability | - |
| Ours | Adversarial | Unknown | Multi-Hop | Utility Maximization | Unknown |

¹ The Arrival & Service and Utility columns indicate whether the arrival and service rates or the utility function associated with each feasible control action is known before decision-making, respectively.

Finally, we mention the most related works, including (Neely, 2010b, Liang and Modiano, 2018b; a, Huang et al., 2024, Yang et al., 2023), to help situate our work in the literature. Among them, Neely (2010b), Liang and Modiano (2018b; a) assumed perfect pre-decision knowledge on network conditions, which allows direct calculation of the arrival and service rates resulting from every action. In our case, we have to learn these outcomes in an online manner. Liang and Modiano (2018b), Huang et al. (2024), Yang et al. (2023) focused on network stability, while our utility maximization task additionally requires maximizing an abstract, unknown, and time-varying utility function, thus adding difficulties in designing online learning algorithms. Neely (2010b), Liang and Modiano (2018a) considered utility maximization, but their utility functions are fixed and non-adversarial. In our case, the utility functions are both time-varying and unknown, thus another online learning sub-routine is needed. Huang et al. (2024), Yang et al. (2023) investigated single-hop networks where jobs leave the network upon being served, whereas our formulation considers the more general multi-hop networks. We refer the readers to Table 1 and Section 1.1 for more information.

Our main contributions in this paper can be summarized as follows:

  • We propose a novel algorithm UMO2 (Algorithm 3) for adversarial multi-hop networks under bandit feedback, which gives a rigorous utility optimization guarantee. To the best of our knowledge, no previous algorithm can handle multi-hop topology or achieve utility guarantees in adversarial networks under bandit feedback, whereas our UMO2 algorithm is able to do both. Moreover, as a by-product, we also derive a simpler algorithm NSO (Algorithm 1) which ensures network stability for adversarial multi-hop networks under bandit feedback.

  • To handle the multi-hop topology which brings inter-queue correlations and to jointly handle online learning and network optimization, we develop a unified analysis that allows the integration of online learning techniques into the classical Lyapunov drift-plus-penalty arguments. Specifically, via the design of a new OLO algorithm to stabilize the network and a novel BCO algorithm to maximize the utility, the UMO2 algorithm enjoys a network stability guarantee together with a polynomially decaying gap between its utility and that of any policy that is “mildly varying” (in the sense that its path length is of order $o(\sqrt{T})$; see Theorem 4.5 for more details).

  • Due to the potentially unbounded queue lengths, existing online learning algorithms are unfortunately inapplicable. We design an OLO algorithm that can handle large losses and enjoys a performance guarantee adapted to the loss magnitudes (AdaPFOL; Theorem 3.5). We also develop a new BCO method specially crafted for the drastically varying loss magnitudes and Lipschitzness (AdaBGD; Theorem 4.4). Both online learning algorithms can be of independent interest.

1.1 Related Works

We discuss the most related works here. A more comprehensive literature review is in Appendix A.

Adversarial Network Control. Adversarial networks date back to the 1990s, when Cruz (1991) gave the first adversarial network dynamics model and its scheduling algorithm. More efforts were made to allow more general arrival rates (Borodin et al., 2001, Andrews et al., 2001), link conditions (Andrews and Zhang, 2004, Andrews et al., 2007), or both (Liang and Modiano, 2018b). We also direct the readers to the references therein for more discussions. The main focus of the aforementioned papers was usually system stability, whereas ours is utility maximization. As far as we are aware, existing results on utility maximization (Neely, 2010b, Liang and Modiano, 2018a) mostly assumed perfect knowledge on network conditions.

Feedback Models. Most previous works considered the perfect knowledge model, which assumes pre-decision knowledge on network conditions (Liang and Modiano, 2018b; a). In contrast, our paper considers the bandit feedback model, which only reveals the consequence of our action. A small number of previous works (Fu and Modiano, 2022, Yang et al., 2023, Huang et al., 2024) also assumed similar feedback models, albeit under different names. Another feedback model, whose difficulty lies in between, is the full-information feedback model, which requires network conditions to be revealed after the decision and thus allows counterfactual evaluations of all actions (i.e., not only the deployed one) in hindsight. See (Neely et al., 2012) for an example.

Adversarial Networks under Bandit Feedback. Prior to our work, Huang et al. (2024), Yang et al. (2023) also studied adversarial networks under bandit feedback. However, they both assumed single-hop networks and focused on network stability. In contrast, our paper allows a general multi-hop topology and tackles the utility maximization task of optimizing an abstract, unknown, and time-varying utility function.

2 Notations and Preliminaries

We use bold letters to denote vectors, e.g., $\bm{q}_t, \bm{\mu}_t, \bm{\lambda}_t$, and denote their elements with corresponding normal letters, e.g., $q_{t,i}, \mu_{t,i}, \lambda_{t,i}$. For an integer $n \geq 0$, $[n]$ stands for $\{1, 2, \ldots, n\}$. For a finite set $\mathcal{S}$, $\triangle(\mathcal{S})$ is the simplex over $\mathcal{S}$, i.e., $\{\bm{x} \in \mathbb{R}_{\geq 0}^{\lvert \mathcal{S} \rvert} \mid \sum_{i=1}^{\lvert \mathcal{S} \rvert} x_i = 1\}$, where every element $\bm{x} \in \triangle(\mathcal{S})$ is a discrete probability distribution over $\mathcal{S}$. We use $\mathcal{O}$ to hide all absolute constants, and use $\widetilde{\mathcal{O}}$ to additionally hide all logarithmic factors. For functions $f(T)$ and $g(T)$, we say $f(T) = \mathcal{O}_T(g(T))$ if $\limsup_{T \to \infty} \frac{f(T)}{g(T)} < \infty$ and $f(T) = o_T(g(T))$ if $\limsup_{T \to \infty} \frac{f(T)}{g(T)} = 0$.

2.1 Adversarial Network Optimization under Bandit Feedback Formulation

We first introduce our model of adversarial network optimization with bandit feedback. Specifically, in a network with multiple servers and directional data links, we denote the set of all servers by $\mathcal{N}$ and that of all data links by $\mathcal{L} \subseteq \mathcal{N} \times \mathcal{N}$. Suppose that $\lvert \mathcal{N} \rvert$ and $\lvert \mathcal{L} \rvert$ are both finite. There are $\lvert \mathcal{N} \rvert$ commodities of jobs such that the jobs belonging to commodity $k \in \mathcal{N}$ are destined for server $k$. We denote by $Q_n^{(k)}$ the queue of unfinished commodity-$k$ jobs at server $n$, where $n \in \mathcal{N}$ and $k \in \mathcal{N}$. We assume the links do not interfere with each other.

The scheduling problem lasts for $T > 0$ rounds. In round $t \in [T]$, the scheduler makes two decisions: i) the arrival rates of commodity-$k$ jobs into server $n$, $\forall n \in \mathcal{N}, k \in \mathcal{N}$, and ii) the link rate allocations, i.e., how many commodity-$k$ jobs to transmit over data link $(n,m)$, $\forall k \in \mathcal{N}, (n,m) \in \mathcal{L}$. Both decisions are made under the bandit feedback model, i.e., the scheduler makes decisions blindly and only receives feedback resulting from its actions. Below, we describe them in detail.

Arrival Rates and Utility. In every round $t$, the scheduler decides an $\lvert \mathcal{N} \rvert \times \lvert \mathcal{N} \rvert$-dimensional arrival rate matrix $\bm{\lambda}(t)$ from some fixed action set $\Lambda \subseteq \mathbb{R}_{\geq 0}^{\lvert \mathcal{N} \rvert \times \lvert \mathcal{N} \rvert}$, and consequently, $\lambda_n^{(k)}(t)$ jobs of commodity $k$ will be added to queue $Q_n^{(k)}$. The arrival rate matrix $\bm{\lambda}(t)$ is associated with some abstract utility $g_t(\bm{\lambda}(t))$, where $g_t \colon \Lambda \to \mathbb{R}$ is concave (that is, the user’s marginal return diminishes gradually as the arrivals increase (Huang and Neely, 2011, Huang et al., 2012)), $L$-Lipschitz, and $[-G, G]$-bounded, where $L$ and $G$ are known constants. Following the adversarial network assumption, we allow the $g_t$’s to be time-dependent (though they have to be pre-determined, which is called the oblivious adversary model). Following the bandit feedback model, the scheduler has no information about $g_t$ before the decision, and after the decision it can only observe $g_t(\bm{\lambda}(t))$ for the chosen $\bm{\lambda}(t)$ but not the whole $g_t$.
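
As an illustration of these assumptions, the sketch below builds one hypothetical family of utilities, $g_t(\bm{\lambda}) = w_t \sum_i \log(1 + \lambda_i)$ over the assumed action set $\Lambda = [0,1]^d$ (with the matrix flattened into a vector of length $d$). For $w_t \geq 0$ this function is concave, $w_t \sqrt{d}$-Lipschitz with respect to the Euclidean norm, and takes values in $[0, w_t d \log 2]$. The weights and function names are our own choices for illustration; the paper does not prescribe a specific utility.

```python
import math


def make_log_utility(weight):
    """Return g(lam) = weight * sum_i log(1 + lam_i), concave for weight >= 0."""
    def g(lam):
        return weight * sum(math.log1p(x) for x in lam)
    return g


# Oblivious adversary: the whole sequence g_1, ..., g_T is fixed up front
# and does not react to the scheduler's choices (hypothetical weights).
weights = [1.0, 0.2, 0.7]
utilities = [make_log_utility(w) for w in weights]


def bandit_value(t, lam):
    """The only quantity revealed in round t: the scalar g_t(lam(t))."""
    return utilities[t](lam)
```

The scheduler would call `bandit_value(t, lam)` exactly once per round, so neither the weight $w_t$ nor the value of $g_t$ at any other point is ever revealed.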

Link Rate Allocations. The capacity of each link $(n,m) \in \mathcal{L}$ can be time-varying. We denote the capacity of $(n,m)$ in round $t$ as $C_{n,m}(t)$. We assume the capacities are always bounded by some finite constant $M$. Due to the bandit feedback model, the scheduler cannot access $C_{n,m}(t)$ when deciding. Nevertheless, the scheduler can still decide a link allocation plan which assigns a distribution over commodities on each link, formally denoted as $\bm{a}_{n,m}(t) \in \triangle(\mathcal{N})$ (the $\lvert \mathcal{N} \rvert$-dimensional probability simplex, representing the portion of rates allocated to each commodity over the link). By sending jobs from each commodity along link $(n,m)$ according to distribution $\bm{a}_{n,m}(t)$ in a round-robin manner, approximately $a_{n,m}^{(k)}(t) C_{n,m}(t)$ jobs from queue $Q_n^{(k)}$ will be sent along link $(n,m)$ to queue $Q_m^{(k)}$. Formally, we assume that after deciding link allocation plans $\{\bm{a}_{n,m}(t) \in \triangle(\mathcal{N})\}_{(n,m) \in \mathcal{L}}$, the number of jobs successfully sent from $Q_n^{(k)}$ to $Q_m^{(k)}$, denoted by $\mu_{n,m}^{(k)}(t)$, is independently generated such that $\mathbb{E}[\mu_{n,m}^{(k)}(t)] = C_{n,m}(t) a_{n,m}^{(k)}(t)$ and $\mu_{n,m}^{(k)}(t) \in [0, M]$. Again, we assume a bandit feedback model, which means the scheduler is able to observe $C_{n,m}(t)$ and $\bm{\mu}_{n,m}(t)$ for all $(n,m) \in \mathcal{L}$ only at the end of round $t$, after the decision is made.
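
The model only constrains the mean and the range of $\mu_{n,m}^{(k)}(t)$. One simple distribution satisfying both conditions, chosen here purely for illustration, is a scaled Bernoulli: $\mu_{n,m}^{(k)}(t) = M$ with probability $C_{n,m}(t) a_{n,m}^{(k)}(t) / M$ and $0$ otherwise. The function name and this construction are assumptions, not the paper's simulator.

```python
import random


def sample_link_service(capacity, allocation, max_capacity, rng):
    """Draw (mu^{(k)})_k for one link with E[mu^{(k)}] = capacity * allocation[k].

    Assumed construction: mu^{(k)} = M * Bernoulli(capacity * allocation[k] / M),
    drawn independently per commodity, so every draw lies in [0, M].
    """
    assert 0.0 <= capacity <= max_capacity and max_capacity > 0.0
    assert min(allocation) >= 0.0 and abs(sum(allocation) - 1.0) < 1e-9
    return [
        max_capacity if rng.random() < capacity * a_k / max_capacity else 0.0
        for a_k in allocation
    ]


# Hypothetical link with capacity C = 2, bound M = 4, and three commodities.
rng = random.Random(0)
draw = sample_link_service(2.0, [0.5, 0.3, 0.2], 4.0, rng)
```

Because the success probability is $C a^{(k)} / M \leq 1$, the expectation is $M \cdot C a^{(k)} / M = C a^{(k)}$ as required; any other bounded distribution with the same mean would fit the model equally well.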

Putting the two components together and denoting the length of $Q_n^{(k)}$ at the beginning of round $t$ by $Q_n^{(k)}(t)$, the network dynamics can be characterized as follows:

$$Q_n^{(k)}(t+1) = \begin{cases} \left[ Q_n^{(k)}(t) - \sum_{(n,m) \in \mathcal{L}} \mu_{n,m}^{(k)}(t) \right]_+ + \sum_{(o,n) \in \mathcal{L}} \mu_{o,n}^{(k)}(t) + \lambda_n^{(k)}(t), & k \neq n \\ 0, & k = n \end{cases}, \tag{1}$$

where $\lambda_n^{(k)}(t)$ is the number of jobs of commodity $k$ that the scheduler adds to server $n$, and $\mu_{n,m}^{(k)}(t)$ is the number of jobs of commodity $k$ transmitted along data link $(n,m)$.
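
The update in Equation (1) can be sketched as follows. The dictionary-based data layout and the function name are our own assumptions; the arithmetic follows Equation (1) term by term, including the fact that the inflow term counts $\mu_{o,n}^{(k)}(t)$ in full regardless of how many jobs the upstream queue actually held.

```python
def step_queues(queues, links, mu, lam):
    """One application of Equation (1).

    queues[(n, k)]  : Q_n^{(k)}(t)
    links           : list of directed links (n, m)
    mu[(n, m, k)]   : commodity-k jobs sent over link (n, m) in round t
    lam[(n, k)]     : external commodity-k arrivals at server n in round t
    Missing entries of mu and lam are treated as zero.
    """
    nxt = {}
    for (n, k), q in queues.items():
        if k == n:
            nxt[(n, k)] = 0.0  # jobs that reached their destination leave
            continue
        served = sum(mu.get((a, b, k), 0.0) for (a, b) in links if a == n)
        forwarded_in = sum(mu.get((a, b, k), 0.0) for (a, b) in links if b == n)
        nxt[(n, k)] = max(q - served, 0.0) + forwarded_in + lam.get((n, k), 0.0)
    return nxt


# Hypothetical line network 0 -> 1 -> 2 carrying commodity 2.
queues = {(n, k): 0.0 for n in range(3) for k in range(3)}
queues[(0, 2)], queues[(1, 2)] = 5.0, 1.0
links = [(0, 1), (1, 2)]
mu = {(0, 1, 2): 2.0, (1, 2, 2): 3.0}
lam = {(0, 2): 1.0}
nxt = step_queues(queues, links, mu, lam)
```

In this example, server 0 keeps $[5 - 2]_+ + 1 = 4$ jobs of commodity 2, server 1 ends with $[1 - 3]_+ + 2 = 2$ jobs, and the destination queue at server 2 stays empty.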

The objective of the scheduler is to maximize its average utility over the $T$ rounds, namely $\frac{1}{T} \mathbb{E}\left[\sum_{t=1}^{T} g_t(\bm{\lambda}(t))\right]$. However, a scheduling algorithm is meaningless if it cannot ensure network stability, which requires that the average number of jobs remaining in the network does not diverge when the number of rounds is large enough. Formally, the network stability requirement says

$$\frac{1}{T} \mathbb{E}\left[\sum_{t=1}^{T} \lVert \bm{Q}(t) \rVert_1\right] = \frac{1}{T} \mathbb{E}\left[\sum_{t=1}^{T} \sum_{n \in \mathcal{N}} \sum_{k \in \mathcal{N}} Q_n^{(k)}(t)\right] = \mathcal{O}_T(1), \quad \text{when } T \gg 0. \tag{2}$$

The scheduler aims to maximize its average utility subject to the network stability condition, i.e.,

$$\text{Maximize } \frac{1}{T} \mathbb{E}\left[\sum_{t=1}^{T} g_t(\bm{\lambda}(t))\right] \text{ s.t. Equation (2) holds}. \tag{3}$$
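
For a single simulated run, the two quantities in Equations (2) and (3) can be evaluated as in the sketch below. The helper names and the toy trajectory are assumptions for illustration; the expectations would in practice be approximated by averaging these per-run values over many independent runs.

```python
def average_utility(utilities):
    """(1/T) * sum_t g_t(lambda(t)) for one realized run."""
    return sum(utilities) / len(utilities)


def average_queue_length(queue_trajectory):
    """(1/T) * sum_t ||Q(t)||_1, where each Q(t) maps (n, k) to a queue length."""
    total = sum(sum(q_t.values()) for q_t in queue_trajectory)
    return total / len(queue_trajectory)


# Hypothetical 3-round run with a single non-trivial queue Q_0^{(1)}.
observed_utilities = [1.0, 0.5, 0.0]
trajectory = [{(0, 1): 0.0}, {(0, 1): 2.0}, {(0, 1): 4.0}]
avg_u = average_utility(observed_utilities)
avg_q = average_queue_length(trajectory)
```

Stability in the sense of Equation (2) is a statement about how `avg_q` scales with $T$: it must stay bounded as the horizon grows, which a single short run such as this one cannot certify.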

2.2 Technical Overview of Our Paper

To improve the presentation, Section 3 first presents the network stability algorithm NSO, which pretends that $g_t$ is a constant. This algorithm serves as a key building block for the utility maximization algorithm UMO2 introduced in Section 4.

Figure 1 gives an overview of the main technical steps in analyzing NSO and UMO2. The steps for NSO are in yellow, those for UMO2 are in blue, and those in common are in green. Both analyses start from the well-known Lyapunov drift(-plus-penalty) analysis (Neely, 2010a, §4), which reveals the non-negativity of a Lyapunov drift(-plus-penalty) function; see Section 3.3 and Section 4.3 for more details. We then use online learning techniques to minimize these functions (i.e., to make them as close to zero as possible). From here, the analyses for NSO and UMO2 diverge.

[Figure 1: flowchart with the following boxes. Lyapunov Drift Analysis (Lemma 3.2); Online Linear Optimization AdaPFOL (Algorithm 2 & Lemma 3.4); Deciding Link Allocations via AdaPFOL (Theorem 3.5); NSO: Network Stability for Multi-Hop ANO under Bandit Feedback (Theorem 3.6); Lyapunov Drift-Plus-Penalty Analysis (Lemma C.2); Bandit Convex Optimization AdaBGD (Algorithm 4 & Lemma 4.3); Deciding Arrival Rates via AdaBGD (Theorem 4.4); UMO2: Utility Maximization for Multi-Hop ANO under Bandit Feedback (Theorem 4.5). Two arrows are labeled "Motivates".]
Figure 1: Technical Overview of NSO and UMO2 Frameworks

For the network stability algorithm NSO, we express the Lyapunov drift function as a function linear in the queue lengths $\bm{Q}(t)$ and the link allocation plan $\bm{a}(t)$. While this is an instance of the classical Online Linear Optimization (OLO) problem in online learning (Zinkevich, 2003), we face two unique challenges: the potentially unbounded queue lengths, and the self-bounding analysis needed for network stability guarantees; see Section 3.1 for more discussion. These two requirements rule out existing OLO algorithms, so we design our own algorithm tailored to the network optimization objective. Specifically, we design an OLO algorithm AdaPFOL (see Algorithm 2) that can handle occasionally large loss magnitudes and ensures a performance guarantee depending on all the losses. Plugging it into NSO, we show that the link allocation plans perform well, as detailed in Theorem 3.5. Combining this with the reference policy assumption (Assumption 1) and the Lyapunov drift analysis, we obtain the network stability guarantee of NSO in Theorem 3.6. A more detailed overview of NSO is in Section 3.1.

For the utility maximization algorithm UMO2, we decompose the Lyapunov drift-plus-penalty function into two parts. The first part is still linear in $\bm{Q}(t)$ and $\bm{a}(t)$, so we can reuse the AdaPFOL algorithm for it. For the second part, because the utility function $g_t$ is an arbitrary time-varying concave function and we only receive bandit feedback (recall Table 1), OLO cannot capture it. Instead, we model this part as a Bandit Convex Optimization (BCO) problem (Flaxman et al., 2005). Unfortunately, again due to the potentially unbounded queue lengths and the self-bounding analysis, the loss functions' magnitudes and Lipschitz constants can be very large in some rounds, yet we still want to adapt to them; thus, existing BCO algorithms are also inapplicable. To this end, we develop a BCO algorithm AdaBGD (Algorithm 4), which allows loss functions with large magnitudes or Lipschitz constants and enjoys a performance guarantee adaptive to the loss functions. When plugged into the UMO2 framework, AdaBGD generates a good arrival rate sequence, as we analyze in Theorem 4.4. Therefore, similar to the analysis of NSO in Theorem 3.6, combining AdaPFOL (Algorithm 2) with AdaBGD (Algorithm 4) yields the utility maximization guarantee of UMO2 in Theorem 4.5. A more detailed overview of UMO2 can be found in Section 4.1.

3 Network Stability in Adversarial Multi-Hop Networks

In this section, we take a first step towards our ultimate goal of multi-hop utility maximization, namely network stability (recall that Equation 3 requires Equation 2 as a condition). That is, this section focuses only on stabilizing the average number of tasks in the system, $\frac{1}{T}\mathbb{E}[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}]$, and does not consider utilities. The algorithm designed for this purpose (NSO in Algorithm 1) will serve as the network stability component of our utility maximization algorithm (UMO2 in Algorithm 3).

One may observe that if we only want to ensure the network stability condition in Equation 2, it suffices to pick all arrival rate vectors $\bm{\lambda}(t)\equiv 0$. To avoid such trivial algorithms, we assume for now that the arrival rates are adversarially chosen. That is, $\bm{\lambda}(t)\in[0,R]^{\lvert\mathcal{N}\rvert\times\lvert\mathcal{N}\rvert}$ is some arbitrary, unknown, and time-varying vector following the oblivious adversary model. It is only revealed post-decision, at the end of round $t$. We assume adversarial $\bm{\lambda}(t)$ because, in the UMO2 algorithm for utility maximization, another algorithmic component decides $\bm{\lambda}(t)$, and the network stability component that we design in this section must adapt to such an arbitrary arrival rate matrix $\bm{\lambda}(t)$.

Algorithm 1 NSO: Network Stability via Online Linear Optimization
Input:  Number of rounds $T$, set of servers $\mathcal{N}$ and links $\mathcal{L}$, maximum capacity $M$, feasible arrival rates $\Lambda$ (adversarial arrival rates given during execution; only assumed in this section). An online linear optimization algorithm AdaPFOL (Algorithm 2).
1:  For each link $(n,m)\in\mathcal{L}$, initialize an instance of AdaPFOL with action set $\triangle(\mathcal{N})$ as $\texttt{AdaPFOL}_{n,m}$.
2:  for $t=1,2,\ldots,T$ do
3:     For each link $(n,m)\in\mathcal{L}$, pass the maximum loss magnitude for this round, $M\lVert\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rVert_{\infty}$, to $\texttt{AdaPFOL}_{n,m}$. Pick link allocation $\bm{a}_{n,m}(t)\in\triangle(\mathcal{N})$ as the output of $\texttt{AdaPFOL}_{n,m}$.
4:     Observe arrival rates $\bm{\lambda}(t)\in\Lambda$. ▷ In the utility maximization algorithm UMO2 (Algorithm 3), this step will be replaced by another algorithmic component.
5:     Observe capacities $\{C_{n,m}(t)\}_{(n,m)\in\mathcal{L}}$ and actual data transmissions $\{\mu_{n,m}^{(k)}(t)\}_{(n,m)\in\mathcal{L},k\in\mathcal{N}}$.
6:     Calculate queue lengths $\bm{Q}(t+1)$ from $\bm{Q}(t)$ according to Equation 1.
7:     For each link $(n,m)\in\mathcal{L}$, pass the loss vector $C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t))$ to $\texttt{AdaPFOL}_{n,m}$.
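The two per-link quantities that Algorithm 1 hands to each AdaPFOL instance (the magnitude hint in step 3 and the loss vector in step 7) can be sketched as follows. This is a minimal illustration only: the function names `magnitude_hint` and `loss_vector` and the toy numbers are hypothetical, and the AdaPFOL learner itself is not reproduced here.

```python
from typing import List


def magnitude_hint(M: float, Q_m: List[float], Q_n: List[float]) -> float:
    """Step 3: M * ||Q_m(t) - Q_n(t)||_inf, an upper bound on this round's loss magnitude."""
    return M * max(abs(qm - qn) for qm, qn in zip(Q_m, Q_n))


def loss_vector(C: float, Q_m: List[float], Q_n: List[float]) -> List[float]:
    """Step 7: C_{n,m}(t) * (Q_m(t) - Q_n(t)), one entry per commodity k."""
    return [C * (qm - qn) for qm, qn in zip(Q_m, Q_n)]


# Toy example (hypothetical numbers): one link (n, m) carrying three commodities.
M = 2.0                      # maximum capacity, so every C_{n,m}(t) <= M
Q_n = [5.0, 0.0, 3.0]        # queue lengths at the tail node n
Q_m = [1.0, 4.0, 3.0]        # queue lengths at the head node m
hint = magnitude_hint(M, Q_m, Q_n)
loss = loss_vector(1.5, Q_m, Q_n)   # realized capacity C_{n,m}(t) = 1.5
print(hint, loss)
```

Because the realized capacity never exceeds $M$, the hint is available before the capacity is observed and always dominates the largest entry of the loss vector in absolute value. Note also that the loss entries can be negative, which is the sign issue discussed in Remark 1.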

3.1 Motivation of Our Algorithmic Framework

In Algorithm 1, we present Network Stability via Online Linear Optimization (NSO), an algorithmic framework which achieves stability in adversarial multi-hop networks under bandit feedback. One key ingredient of NSO is the plug-in Online Linear Optimization (OLO) algorithm AdaPFOL. Before going into details of the AdaPFOL algorithm, we first introduce why we need it.

The design of NSO is based on the famous Lyapunov drift analysis (Neely, 2010a, §4). Conducting standard Lyapunov analysis on the network dynamics defined in Equation 1, we are able to derive

$$\begin{aligned}
&-\frac{1}{2}N^{2}\left((NM)^{2}+2(NM)^{2}+2R^{2}\right)T\\
&\quad\leq\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\left(\sum_{(o,n)\in\mathcal{L}}\mu_{o,n}^{(k)}(t)+\lambda_{n}^{(k)}(t)-\sum_{(n,m)\in\mathcal{L}}\mu_{n,m}^{(k)}(t)\right)\right],
\end{aligned}\tag{4}$$

whose formal statement and proof can be found in Lemma 3.2.

Based on this inequality, a Lyapunov-drift-based algorithm can be constructed by minimizing the RHS of Equation 4 (Neely, 2010a, §4). As the arrival rates $\lambda$ are regarded as constants in this section, we may focus only on the terms related to $\mu$. Thus, minimizing the RHS of Equation 4 is equivalent to

$$\begin{aligned}
\text{Minimizing}\quad&\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k=1}^{K}\mu_{n,m}^{(k)}(t)\left(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\right)\right]\\
&=\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\langle\bm{Q}_{m}(t)-\bm{Q}_{n}(t),\bm{\mu}_{n,m}(t)\rangle\right]\\
&=\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\langle C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t)),\bm{a}_{n,m}(t)\rangle\right],
\end{aligned}\tag{5}$$

where the last step uses the assumption that $\mathbb{E}[\bm{\mu}_{n,m}(t)]=\mathbb{E}[C_{n,m}(t)\bm{a}_{n,m}(t)]$.
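The passage from the node-wise sum in Equation 4 to the link-wise sum in Equation 5 is a regrouping: each transmission $\mu_{n,m}^{(k)}(t)$ appears once with weight $Q_{m}^{(k)}(t)$ (as inflow at $m$) and once with weight $-Q_{n}^{(k)}(t)$ (as outflow at $n$). The following sketch checks this numerically for a single round; the small topology and the random values are hypothetical and serve only as an illustration.

```python
import random

random.seed(0)
nodes = ["a", "b", "c"]
links = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")]  # directed links (n, m)
K = len(nodes)  # one commodity per destination node

# Hypothetical queue lengths Q[n][k] and transmissions mu[(n, m)][k] for one round.
Q = {n: [random.randint(0, 10) for _ in range(K)] for n in nodes}
mu = {link: [random.uniform(0.0, 2.0) for _ in range(K)] for link in links}

# Node-wise form (the mu-terms inside Equation 4): inflow minus outflow, weighted by Q_n^{(k)}.
node_view = sum(
    Q[n][k]
    * (
        sum(mu[(o, d)][k] for (o, d) in links if d == n)
        - sum(mu[(s, m)][k] for (s, m) in links if s == n)
    )
    for n in nodes
    for k in range(K)
)

# Link-wise form (first line of Equation 5): each link sees the queue difference Q_m - Q_n.
link_view = sum(
    mu[(n, m)][k] * (Q[m][k] - Q[n][k]) for (n, m) in links for k in range(K)
)

print(abs(node_view - link_view) < 1e-9)
```

The link-wise form is what allows the optimization to be distributed, with one learner per link.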

For illustration purposes, let us focus on a single data link $(n,m)\in\mathcal{L}$. Motivated by Huang et al. (2024), we consider designing a scheduling algorithm by minimizing the following expectation:

$$\mathbb{E}\left[\sum_{t=1}^{T}\langle C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t)),\bm{a}_{n,m}(t)\rangle\right].$$
Remark 1.

While having similarities, this objective is different from that of Huang et al. (2024) in two aspects:

First, the network topology in Huang et al. (2024) is a single-server, single-hop one, so it suffices to conduct the Lyapunov drift optimization on the centralized server. In contrast, due to our multi-hop topology, our optimization task in Equation 5 has to be distributed onto every data link $(n,m)\in\mathcal{L}$, and extra effort is needed to ensure a good overall scheduling effect.

Second, the coefficient $C_{n,m}(t)(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t))$ before $a_{n,m}^{(k)}(t)$ can be either positive or negative, whereas that of Huang et al. (2024) is always non-negative. Such negativity also increases the difficulty, as many online learning algorithms handle potentially negative losses poorly; see, e.g., (Zheng et al., 2019, Dai et al., 2023).

Recall that $\bm{a}_{n,m}(t)\in\triangle(\mathcal{N})$ can be any probability distribution from the simplex. Hence, one may view $\triangle(\mathcal{N})$ as the action set in round $t$. Moreover, for an action $\bm{a}$ from the action set $\triangle(\mathcal{N})$, picking it in round $t$ incurs a loss $\langle C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t)),\bm{a}\rangle$, which is linear in $\bm{a}$. Thus, this problem belongs to the class of Online Linear Optimization (OLO) problems (Zinkevich, 2003, McMahan and Streeter, 2010, Duchi et al., 2011), whose formal definition will be presented later as Definition 3.3. Although our problem fits the OLO formulation, we face significantly different challenges due to our network optimization context:

  i)

    In our task of minimizing $\mathbb{E}[\sum_{t=1}^{T}\langle C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t)),\bm{a}_{n,m}(t)\rangle]$, the magnitude of the loss $C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t))$ can occasionally be large because $\bm{Q}_{n}(t)$ or $\bm{Q}_{m}(t)$ may be unbounded. Note that, although our system stability condition Equation 2 requires $\frac{1}{T}\mathbb{E}[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}]$ to be small, some $\bm{Q}(t)$'s inside the average and expectation are still allowed to be large. However, existing OLO algorithms mostly require the losses to be uniformly bounded by a constant (see, e.g., (McMahan and Streeter, 2010, Cutkosky, 2020)), which means extra effort is needed to handle occasionally large losses.

  ii)

    Moreover, we also want our performance to depend on the geometric mean of all loss magnitudes (which in turn relates to queue lengths since the losses $C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t))$ depend on $\lVert\bm{Q}(t)\rVert$). On a high level, this can be understood as follows: The average queue length $\frac{1}{T}\mathbb{E}[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}]$ can be controlled by the OLO performance via some other arguments (see Section 3.2). Thus, if we can additionally show that the OLO performance is bounded by queue lengths, we can conduct a self-bounding analysis on the queue lengths, which informally reads

    $$\begin{aligned}
    &\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]\lesssim\text{Online Learning Performance}\lesssim\mathcal{O}_{T}(T)+o_{T}(T^{1/4})\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{3/4}\\
    &\Longrightarrow\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]=\mathcal{O}_{T}(1),\ \textit{i.e.},\ \text{the system is stabilized}.
    \end{aligned}\tag{6}$$

    More details regarding the self-bounding analysis can be found in Section 3.5. Nevertheless, it suffices to remember that a performance depending on all loss magnitudes is beneficial.
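To see why the informal inequality in Equation 6 yields stability, the implication can be unpacked with a short case analysis. This is only a sketch under stated assumptions: the symbols $X$, $a$, and $b_T$ are hypothetical shorthands (with $b_T\to 0$ standing for the $o_T(T^{1/4})$ coefficient after factoring out $T^{1/4}$), and it is not the formal argument of Section 3.5.

```latex
\begin{align*}
&\text{Let } X := \mathbb{E}\Big[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\Big]
  \text{ and suppose } X \le aT + b_T\,T^{1/4}X^{3/4}
  \text{ with } a \ge 0,\ b_T = o_T(1). \\
&\text{Case 1: } b_T\,T^{1/4}X^{3/4} \le aT
  \;\Longrightarrow\; X \le 2aT. \\
&\text{Case 2: } b_T\,T^{1/4}X^{3/4} > aT
  \;\Longrightarrow\; X \le 2\,b_T\,T^{1/4}X^{3/4}
  \;\Longrightarrow\; X^{1/4} \le 2\,b_T\,T^{1/4}
  \;\Longrightarrow\; X \le 16\,b_T^{4}\,T. \\
&\text{Hence } X \le \max\{2a,\ 16\,b_T^{4}\}\,T = \mathcal{O}_T(T),
  \text{ i.e., } \tfrac{1}{T}X = \mathcal{O}_T(1).
\end{align*}
```

The exponent $3/4<1$ on the right-hand side is what makes the inequality self-bounding: a bound that grew linearly in $X$ with a non-vanishing coefficient would not close.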

In Section 3.4, we introduce our novel OLO algorithm AdaPFOL (Algorithm 2), which enjoys these two properties. Equipped with such an algorithm, we can minimize Equation 5 and achieve a network stability guarantee by deploying it on every link $(n,m)\in\mathcal{L}$ for $T$ rounds with action set $\triangle(\mathcal{N})$. This idea of deploying AdaPFOL onto every link gives exactly our NSO framework in Algorithm 1.

In the rest of this section, we analyze the network stability achieved when the NSO framework is equipped with AdaPFOL (Algorithm 2): we introduce our reference policy assumptions in Section 3.2, conduct the Lyapunov drift analysis in Section 3.3, introduce and analyze the novel AdaPFOL algorithm in Section 3.4, and present our final analysis in Section 3.5.

3.2 Reference Policy Assumption

We first make the following multi-hop piecewise stability assumption, which, informally speaking, assumes that there exists a reference policy that stabilizes the system in a piecewise manner. It is an extension of the piecewise stability assumption (Huang et al., 2024, Assumption 1) to multi-hop cases.

Assumption 1 (Multi-Hop Piecewise Stability for Network Stability).

There exists a reference action sequence $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ (where $\mathring{\bm{a}}(t)=\{\mathring{\bm{a}}_{n,m}(t)\in\triangle(\mathcal{N})\}_{(n,m)\in\mathcal{L}}$, in analogy to the scheduler's action sequence), such that there are some constants $C_{W}\geq 0$, $\epsilon_{W}\geq 0$ and a partition $W_{1},W_{2},\ldots,W_{J}$ of $[T]$ (footnote 1: a partition of $[T]$ is a collection of non-intersecting intervals whose union is $[T]$) which ensure that $\sum_{j=1}^{J}(\lvert W_{j}\rvert-1)^{2}\leq C_{W}T$ and

$$\begin{aligned}
&\frac{1}{\lvert W_{j}\rvert}\sum_{t\in W_{j}}\sum_{(n,m)\in\mathcal{L}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)\geq\epsilon_{W}+\frac{1}{\lvert W_{j}\rvert}\sum_{t\in W_{j}}\left(\lambda_{n}^{(k)}(t)+\sum_{(o,n)\in\mathcal{L}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)\right),\\
&\quad\forall j\in[J],\ n\in\mathcal{N},\ k\in\mathcal{N},
\end{aligned}\tag{7}$$

where $\lambda_{n}^{(k)}(t)$ are the obliviously decided arrival rates that we assume in this section.

Intuitively, Assumption 1 means that there exists some “good” action sequence $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ making the network stable, in the sense that there are multiple windows $W_{1},W_{2},\ldots,W_{J}$ such that in expectation, for each window $W_{j}$ and for each queue $Q_{n}^{(k)}$, the average service rate it receives (whose expectation is $\sum_{(n,m)\in\mathcal{L}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)$ in round $t$) is strictly more than its net arrival rate (which includes both external data flows $\lambda_{n}^{(k)}(t)$ and internal data flows forwarded from other queues, $\sum_{(o,n)\in\mathcal{L}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)$), by a constant gap of at least $\epsilon_{W}$.

Remark 2.

Such assumptions are typical in the network optimization literature. When the network is stationary, Assumption 1 recovers the classical capacity region assumption in SNO (Neely, 2010a). However, extending this condition to adversarial networks is highly non-trivial. For adversarial networks, an alternative assumption is the $(W,\epsilon)$-constrained dynamics assumption (Liang and Modiano, 2018b), which roughly says Equation 7 holds for every window of size $W$. Assumption 1 thus allows more flexibility. Finally, our Assumption 1 can be viewed as a generalization of the piecewise stability assumption (Huang et al., 2024), which was crafted for a single centralized server.

Before moving on, we remark that the reference action sequence $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ in Assumption 1 is unknown to the scheduler. Instead, the scheduler needs to learn its own way of stabilizing the network via observations. To characterize the ability of $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ in stabilizing the network, the following lemma controls the average queue length resulting from any scheduling policy.

Lemma 3.1 (Ability of $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ in Stabilizing the Network).

If $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ satisfies Assumption 1, then for any scheduler-generated queue lengths $\{\bm{Q}(t)\}_{t\in[T]}$,

$$
\begin{aligned}
&\epsilon_{W}\,\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\right]-\left(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R)\right)C_{W}T\\
&\quad\leq-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)\left(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\right)\right]-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\lambda_{n}^{(k)}(t)\right].
\end{aligned}
$$

Lemma 3.1 says the Lyapunov drift (defined later) under $\{\mathring{\bm{a}}(t)\}_{t=1}^{T}$ is always negative, which is useful when analyzing queue-based policies (Neely, 2010a, §3.1). Its proof can be found in Section B.1.

3.3 Lyapunov Drift Analysis

We carry out our analysis based on the Lyapunov drift analysis (Neely, 2010a, §4), which considers the Lyapunov function $L_{t}$ and its drift $\Delta(\bm{Q}(t))$, defined as follows:

$$
\text{Lyapunov function }L_{t}=\frac{1}{2}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}\left(Q_{n}^{(k)}(t)\right)^{2},\quad\text{Lyapunov drift }\Delta(\bm{Q}(t))=\mathbb{E}\left[L_{t+1}-L_{t}\mid\bm{Q}(t)\right].
$$
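As a small numerical illustration of these two definitions, the following Python sketch evaluates the Lyapunov function on an $N\times N$ queue matrix and the realized one-step difference $L_{t+1}-L_{t}$, whose conditional expectation given $\bm{Q}(t)$ is the drift. The function names `lyapunov` and `one_step_difference` and the example queue values are assumptions made for illustration only.

```python
import numpy as np


def lyapunov(Q):
    """L_t = 1/2 * sum over (n, k) of Q_n^{(k)}(t)^2, for an N x N queue matrix Q."""
    Q = np.asarray(Q, dtype=float)
    return 0.5 * float(np.sum(Q ** 2))


def one_step_difference(Q_now, Q_next):
    """Realized L_{t+1} - L_t; the drift is its expectation conditioned on Q(t)."""
    return lyapunov(Q_next) - lyapunov(Q_now)


# Hypothetical two-node example: entry [n, k] is the backlog at node n destined for node k.
Q_now = np.array([[0, 3], [2, 0]])
Q_next = np.array([[0, 1], [2, 0]])

print(lyapunov(Q_now))                     # 6.5
print(one_step_difference(Q_now, Q_next))  # -4.0
```

A negative realized difference, as in this example, corresponds to a round in which the total squared backlog shrinks.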

We give the following result, which follows almost directly from applying the classical Lyapunov drift analysis to the queue dynamics in Equation 1. The proof is standard and thus deferred to Section B.2.

Lemma 3.2 (Lyapunov Drift Analysis).

Under the queue dynamics of Equation 1,

$$
\begin{aligned}
0\leq\mathbb{E}\left[\sum_{t=1}^{T}\Delta(\bm{Q}(t))\right]&\leq\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\mu_{n,m}^{(k)}(t)\left(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\right)+\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\lambda_{n}^{(k)}(t)\right]+\\
&\quad\frac{1}{2}N^{2}\left((NM)^{2}+2(NM)^{2}+2R^{2}\right)T.
\end{aligned}
\tag{8}
$$

As sketched in Section 3.1, our algorithm is designed to approximately minimize the RHS of Equation 8 via online learning. Besides the additive constant, this RHS contains two terms, $\mathbb{E}[\sum_{(n,m)\in\mathcal{L}}\langle\bm{\mu}_{n,m}(t),\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rangle]$ and $\mathbb{E}[\sum_{n\in\mathcal{N}}\langle\bm{Q}_{n}(t),\bm{\lambda}_{n}(t)\rangle]$. As $\lambda_{n}^{(k)}(t)$ are obliviously chosen, the second term is also constant. Therefore, it remains to minimize the following term:

$$
\mathbb{E}\left[\sum_{t=1}^{T}\langle\bm{\mu}_{n,m}(t),\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rangle\right]=\mathbb{E}\left[\sum_{t=1}^{T}\langle C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t)),\bm{a}_{n,m}(t)\rangle\right],\quad\forall(n,m)\in\mathcal{L}.
\tag{9}
$$

For each data link $(n,m)\in\mathcal{L}$, Equation 9 corresponds to an Online Linear Optimization (OLO) problem with $\langle C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t)),\bm{a}_{n,m}(t)\rangle$ being the loss for round $t$. In the next section, we first rigorously define the OLO problem in Definition 3.3 and then present our novel algorithm that tackles the unique challenges we face in network optimization contexts: potentially large losses due to unbounded queue lengths (recall Equation 9), and the need to adapt to all the loss magnitudes, because we want to conduct a self-bounding analysis on the queue lengths (recall Equation 6).
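To make the per-link loss concrete, the following Python sketch builds the loss vector $C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t))$ for one link and the magnitude bound $M\lVert\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rVert_{\infty}$ that is used in Section 3.4. The function name `link_loss` and all numerical values are hypothetical and serve only as an illustration.

```python
import numpy as np


def link_loss(C_nm, Q_n, Q_m, M):
    """Return (g_t, G_t) for link (n, m) in round t.

    g_t = C_{n,m}(t) * (Q_m(t) - Q_n(t)) is the OLO loss vector, and
    G_t = M * ||Q_m(t) - Q_n(t)||_inf upper-bounds ||g_t||_inf when 0 <= C_nm <= M.
    """
    diff = np.asarray(Q_m, dtype=float) - np.asarray(Q_n, dtype=float)
    g = C_nm * diff
    G = M * float(np.max(np.abs(diff)))
    return g, G


# Hypothetical link with capacity 2 (at most M = 3) and three commodities.
g_t, G_t = link_loss(C_nm=2.0, Q_n=[5, 0, 1], Q_m=[1, 0, 4], M=3.0)
a_t = np.array([1.0, 0.0, 0.0])  # serve commodity 0 only

print(g_t)               # [-8.  0.  6.]
print(G_t)               # 12.0
print(float(g_t @ a_t))  # -8.0
```

The loss is most negative for the commodity whose backlog at node $n$ exceeds that at node $m$ by the largest amount, so minimizing it favors sending data down the steepest backlog gradient.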

3.4 AdaPFOL: Learning for Network Stability

In this section, we take a small detour to the OLO problem mentioned in Section 3.1. We first rigorously define the OLO problem (Zinkevich, 2003; McMahan and Streeter, 2010; Duchi et al., 2011) in Definition 3.3. Then, we present the construction of our novel OLO algorithm, AdaPFOL (Algorithm 2). Finally, we prove that plugging it into the NSO framework in Algorithm 1 indeed ensures that Equation 9 is well optimized.

Definition 3.3 (Online Linear Optimization).

Consider a $T$-round game. In every round $t\in[T]$, the player selects an action $\bm{x}_{t}$ from a convex set $\mathcal{X}\subseteq\mathbb{R}^{d}$. The environment simultaneously decides a loss vector $\bm{g}_{t}\in\mathbb{R}^{d}$ such that the loss of the player for round $t$ is $\langle\bm{x}_{t},\bm{g}_{t}\rangle$. The player then observes the whole vector $\bm{g}_{t}$ (i.e., full-information feedback is available, instead of the more restrictive bandit feedback model). Dynamic regret minimization in OLO considers minimizing

$$
\text{D-Regret}_{T}^{\text{OLO}}(\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T})=\sum_{t=1}^{T}\langle\bm{g}_{t},\bm{x}_{t}-\mathring{\bm{x}}_{t}\rangle,\quad\forall\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T}\in\mathcal{X}.
$$
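As a quick sanity check of this definition, the following Python sketch evaluates the dynamic regret of a played sequence against a time-varying comparator sequence. The function name `dynamic_regret` and the toy losses, actions, and comparators are assumptions chosen for illustration.

```python
import numpy as np


def dynamic_regret(losses, actions, comparators):
    """Sum over t of <g_t, x_t - comparator_t>, with one row per round."""
    g = np.asarray(losses, dtype=float)
    x = np.asarray(actions, dtype=float)
    z = np.asarray(comparators, dtype=float)
    return float(np.sum(g * (x - z)))


# Toy instance with T = 3 rounds on the 2-dimensional probability simplex.
losses = [[1, -1], [2, 0], [-1, 3]]
actions = [[0.5, 0.5]] * 3             # the player always plays uniformly
comparators = [[0, 1], [0, 1], [1, 0]]  # the comparator switches in the last round

print(dynamic_regret(losses, actions, comparators))  # 4.0
```

Here the comparator picks the coordinate with the smaller loss in every round, which a fixed (static) comparator could not do.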

Before moving on, we recall the two challenges i) and ii) in Section 3.1: Due to potentially unbounded queue lengths, AdaPFOL must tolerate large and negative losses. Meanwhile, as we want to conduct the self-bounding analysis in Equation 6, AdaPFOL shall additionally enjoy a performance guarantee ($\text{D-Regret}_{T}^{\text{OLO}}$ in Definition 3.3) depending on all the loss magnitudes rather than only the largest one. Thus, our ideal algorithm for Definition 3.3 must satisfy the following:

  i) it can withstand occasionally large loss magnitudes, i.e., $\sup_{t}\lVert\bm{g}_{t}\rVert_{\infty}$ can be large, and

  ii) it enjoys a performance guarantee depending on all the loss magnitudes, e.g., $\sqrt{\sum_{t=1}^{T}\lVert\bm{g}_{t}\rVert_{\infty}^{2}}$.

Algorithm 2 AdaPFOL: Adaptive Parameter-Free Online Learning
Input:  Action set $\mathcal{X}$. For each round $t$, the maximum loss magnitude $G_{t}$ is revealed at the beginning, and a loss vector $\bm{g}_{t}$ satisfying $\lVert\bm{g}_{t}\rVert_{\infty}\leq G_{t}$ is given at the end.
1:  Set $G\leftarrow 1$. Initialize an instance $\mathcal{A}$ of the algorithm in Lemma B.4 with action set $\mathcal{X}$.
2:  for $t=1,2,\ldots$ do
3:     Observe the maximum loss magnitude $G_{t}>0$ for this round.
4:     if $G_{t}>G$ then
5:        Set $G\leftarrow 2G_{t}$. Reset $\mathcal{A}$ as a new instance of the algorithm in Lemma B.4 with action set $\mathcal{X}$.
6:     Play the action output by $\mathcal{A}$. Observe the loss vector $\bm{g}_{t}$ (such that $\lVert\bm{g}_{t}\rVert_{\infty}\leq G_{t}$). Feed $G^{-1}\bm{g}_{t}$ to $\mathcal{A}$.

To design such an algorithm, we build upon the Parameter-Free Online Learning (PFOL) algorithm by Cutkosky (2020). It ensures condition ii) by enjoying $\text{D-Regret}_{T}^{\text{OLO}}(\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T})\propto\sqrt{\sum_{t=1}^{T}\lVert\bm{g}_{t}\rVert_{\infty}^{2}}$ (Cutkosky, 2020, Theorem 6), but fails to bear large loss magnitudes as it requires $\lVert\bm{g}_{t}\rVert_{\infty}\leq 1$ for all $t\in[T]$.

Fortunately, as $C_{n,m}(t)\in[0,M]$, we know $\lVert\bm{g}_{t}\rVert_{\infty}\leq M\lVert\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rVert_{\infty}$. Even better, $\bm{Q}(t)$ can be calculated at the beginning of round $t$, before deciding $\bm{a}(t)$. Utilizing this knowledge, we are able to design an OLO algorithm that enjoys both properties i) and ii). We call this algorithm AdaPFOL (Adaptive Parameter-Free Online Learning); its pseudo-code is presented in Algorithm 2. AdaPFOL applies a doubling technique to the PFOL algorithm of Cutkosky (2020), restarting it whenever the observed $G_{t}$ exceeds the current threshold $G$. We can show that this only introduces a logarithmic overhead as the original PFOL algorithm also enjoys ii).
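For concreteness, the restart logic of Algorithm 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class names `AdaPFOLSketch` and `HedgeStandIn` and the methods `act` and `feed` are hypothetical, and `HedgeStandIn` is a simple exponential-weights placeholder on the probability simplex, not the PFOL algorithm of Cutkosky (2020) referred to in Lemma B.4. The sketch therefore illustrates only the doubling wrapper and does not inherit the guarantee of the actual algorithm.

```python
import numpy as np


class HedgeStandIn:
    """Hypothetical placeholder base learner on the probability simplex.

    NOT the PFOL algorithm of Cutkosky (2020); it only mimics the interface:
    predict an action, then receive a loss vector with sup-norm at most 1.
    """

    def __init__(self, dim, lr=0.1):
        self.lr = lr
        self.cum_loss = np.zeros(dim)

    def predict(self):
        # Exponential weights; subtracting the minimum avoids overflow.
        w = np.exp(-self.lr * (self.cum_loss - self.cum_loss.min()))
        return w / w.sum()

    def update(self, scaled_loss):
        assert np.max(np.abs(scaled_loss)) <= 1.0 + 1e-12
        self.cum_loss += scaled_loss


class AdaPFOLSketch:
    """Doubling wrapper of Algorithm 2 around an assumed base learner."""

    def __init__(self, dim, make_base=HedgeStandIn):
        self.dim = dim
        self.make_base = make_base
        self.G = 1.0                 # current magnitude threshold
        self.base = make_base(dim)
        self.restarts = 0

    def act(self, G_t):
        """Observe the bound G_t, restart if it exceeds G, then output an action."""
        if G_t > self.G:
            self.G = 2.0 * G_t
            self.base = self.make_base(self.dim)
            self.restarts += 1
        return self.base.predict()

    def feed(self, g_t):
        """Feed the rescaled loss g_t / G, whose sup-norm is at most 1."""
        self.base.update(np.asarray(g_t, dtype=float) / self.G)


rng = np.random.default_rng(0)
learner = AdaPFOLSketch(dim=3)
for G_t in [0.5, 3.0, 2.0, 10.0, 9.0, 40.0]:  # hypothetical magnitude bounds
    x_t = learner.act(G_t)
    learner.feed(rng.uniform(-G_t, G_t, size=3))

print(learner.restarts, learner.G)  # 3 80.0
```

In this run the threshold $G$ moves through $1, 6, 20, 80$, giving three restarts. More generally, each restart sets $G\leftarrow 2G_{t}$ with $G_{t}$ larger than the previous threshold, so $G$ more than doubles at every restart while never exceeding $2\max_{t}G_{t}$; starting from $G=1$, the number of restarts is therefore less than $1+\log_{2}(\max_{t}G_{t})$.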

The AdaPFOL algorithm enjoys the following dynamic regret guarantee, satisfying both i) and ii):

Lemma 3.4 (Guarantee of AdaPFOL Algorithm).

Consider the OLO problem in Definition 3.3. Let the action set $\mathcal{X}$ have diameter $D=\sup_{\bm{x},\bm{y}\in\mathcal{X}}\lVert\bm{x}-\bm{y}\rVert_{1}$. Suppose that $\lVert\bm{g}_{t}\rVert_{\infty}\leq G_{t}$, where $G_{t}$ is some $\mathcal{F}_{t-1}$-measurable random variable and $(\mathcal{F}_{t})_{t=0}^{T}$ is the natural filtration, i.e., $\mathcal{F}_{t}$ is the $\sigma$-algebra generated by all random observations made during the first $t$ rounds.
Then, AdaPFOL (Algorithm 2) ensures that for any comparator sequence $\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T}\in\mathcal{X}$, if $\max_{t\in[T]}G_{t}\geq 1$, then

$$
\text{D-Regret}_{T}^{\text{OLO}}(\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T})=\mathcal{O}\left(\sqrt{D\left(D+\sum_{t=1}^{T-1}\lVert\mathring{\bm{x}}_{t}-\mathring{\bm{x}}_{t+1}\rVert_{1}\right)}\sqrt{\sum_{t=1}^{T}\lVert\bm{g}_{t}\rVert_{\infty}^{2}}\,\log T\,\log\left(\max_{t=1}^{T}G_{t}\right)\right).
$$

Remark 3.

Note that in general, it is impossible to guarantee $\text{D-Regret}_{T}^{\text{OLO}}(\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T})=o_{T}(T)$ simultaneously for all $\{\mathring{\bm{x}}_{t}\in\mathcal{X}\}_{t\in[T]}$ (Zinkevich, 2003). Therefore, many dynamic regret bounds, including ours, depend on the notion of path length $P_{T}=\sum_{t=1}^{T-1}\lVert\mathring{\bm{x}}_{t}-\mathring{\bm{x}}_{t+1}\rVert$.
Although the path length is linear in $T$ in the worst case, $\text{D-Regret}_{T}^{\text{OLO}}(\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T})=o_{T}(T)$ can still be ensured in cases where $P_{T}=o_{T}(T)$.
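To illustrate when the path length is small, the following Python sketch computes $P_{T}$ for a piecewise-constant comparator that jumps between vertices of the probability simplex. It assumes the $\ell_{1}$ norm (the norm used for the diameter in Lemma 3.4), and the helper names `path_length_l1` and `switching_comparator` are hypothetical.

```python
import numpy as np


def path_length_l1(comparators):
    """P_T = sum over t < T of ||comparator_t - comparator_{t+1}||_1 (l1 norm assumed)."""
    z = np.asarray(comparators, dtype=float)
    return float(np.abs(z[1:] - z[:-1]).sum())


def switching_comparator(T, d, switch_rounds):
    """Simplex-vertex comparator that moves to the next vertex at each listed round.

    Rounds are 0-indexed; d >= 2 is assumed so that consecutive vertices differ.
    """
    seq = np.zeros((T, d))
    vertex = 0
    for t in range(T):
        if t in switch_rounds:
            vertex = (vertex + 1) % d
        seq[t, vertex] = 1.0
    return seq


seq = switching_comparator(T=8, d=3, switch_rounds={3, 6})
print(path_length_l1(seq))  # 4.0
```

Each switch between two distinct vertices contributes exactly 2 in $\ell_{1}$ distance, so a comparator with $S$ switches has $P_{T}=2S$; if $S$ grows sublinearly in $T$, so does $P_{T}$.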

The proof of Lemma 3.4 will be presented in Section B.3. It can be seen that AdaPFOL indeed satisfies both i) and ii): It allows the loss magnitudes $\lVert\bm{g}_{t}\rVert_{\infty}$ to be large, and also enjoys a magnitude-aware dynamic regret guarantee of $\text{D-Regret}_{T}^{\text{OLO}}(\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T})\propto\sqrt{\sum_{t=1}^{T}\lVert\bm{g}_{t}\rVert_{\infty}^{2}}$.

Therefore, if we deploy AdaPFOL (Algorithm 2) to decide $\bm{a}_{n,m}(t)$ on each link $(n,m)\in\mathcal{L}$ as we do in Algorithm 1, the RHS of Equation 8 can consequently be minimized in the sense that it is close to that induced by the reference actions $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$. Formally, we give the following theorem:

Theorem 3.5 (Deciding $\bm{a}(t)$ via AdaPFOL Algorithm).

For each link $(n,m)\in\mathcal{L}$, as we did in NSO, we execute an instance of AdaPFOL (Algorithm 2) where $\mathcal{X}=\triangle(\mathcal{N})$, $\bm{g}_t=C_{n,m}(t)(\bm{Q}_m(t)-\bm{Q}_n(t))$, and $G_t=M\lVert\bm{Q}_m(t)-\bm{Q}_n(t)\rVert_\infty$. We use its output $\bm{x}_t$ as $\bm{a}_{n,m}(t)$ in every round $t$.
Let $\mu_{n,m}^{(k)}(t)$ be the number of jobs actually transmitted from $Q_n^{(k)}(t)$ to $Q_m^{(k)}(t+1)$ induced by $a_{n,m}^{(k)}(t)$.

Consider an arbitrary reference action sequence $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ satisfying 1. Let $\mathring{\mu}_{n,m}^{(k)}(t)=C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)\in[0,M]$ (as $C_{n,m}(t)\in[0,M]$ and $\mathring{a}_{n,m}^{(k)}(t)\in[0,1]$). Then

\[
\begin{aligned}
&\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\left(\mu_{n,m}^{(k)}(t)-\mathring{\mu}_{n,m}^{(k)}(t)\right)\left(Q_m^{(k)}(t)-Q_n^{(k)}(t)\right)\right]\\
={}&\mathcal{O}\left(M\sqrt{1+P_T^a}\,\mathbb{E}\left[\sqrt{\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_2^2}\,\log T\,\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}}M\lVert\bm{Q}_m(t)-\bm{Q}_n(t)\rVert_\infty\right)\right]\right),
\end{aligned}
\]

where $P_T^a\triangleq\sum_{t=1}^{T-1}\sum_{(n,m)\in\mathcal{L}}\lVert\mathring{\bm{a}}_{n,m}(t)-\mathring{\bm{a}}_{n,m}(t+1)\rVert_1$ is the path length of $\{\mathring{\bm{a}}(t)\}_{t=1}^{T}$.

The proof is an almost direct application of Lemma 4.3, so we postpone it to Section B.4. Thanks to property ii), the RHS of Theorem 3.5 depends on the $\ell_2$-norm of the queue lengths, $\sqrt{\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_2^2}$. As we will see shortly, this is pivotal to the self-bounding argument sketched in Equation 6.
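To make the inputs of Theorem 3.5 concrete, the following minimal sketch builds the per-link quantities $\bm{g}_t$ and $G_t$ fed to one AdaPFOL instance. The function name `link_loss` and the numbers in the usage lines are illustrative assumptions and not part of the paper; AdaPFOL itself is not implemented here.

```python
def link_loss(capacity, q_n, q_m, max_capacity):
    """Return (g_t, G_t) for one link (n, m) in one round.

    capacity     -- realized capacity C_{n,m}(t), assumed to lie in [0, max_capacity]
    q_n, q_m     -- per-commodity queue vectors Q_n(t) and Q_m(t)
    max_capacity -- the bound M on any link capacity
    """
    diff = [qm - qn for qn, qm in zip(q_n, q_m)]       # Q_m(t) - Q_n(t)
    g_t = [capacity * d for d in diff]                 # loss vector, known only after the round
    G_t = max_capacity * max(abs(d) for d in diff)     # magnitude bound, available before the round
    return g_t, G_t


# Hypothetical numbers: three commodities on a single link.
g, G = link_loss(capacity=2.0, q_n=[5, 0, 3], q_m=[1, 4, 3], max_capacity=3.0)

# A linear loss over the simplex is minimized at a vertex: the commodity
# with the smallest entry of g_t, i.e. the largest backlog differential.
best_k = min(range(len(g)), key=lambda k: g[k])
```

Since $C_{n,m}(t)\le M$, the bound $\lVert\bm{g}_t\rVert_\infty\le G_t$ holds by construction, which is what lets the learner be told the loss magnitude before the capacity is revealed.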

3.5 Main Theorem for Multi-Hop Network Stability

As sketched in Section 3.1, putting the previous conclusions together and using a so-called self-bounding property, we can derive the following guarantee for multi-hop network stability:

Theorem 3.6 (Main Theorem for Multi-Hop Network Stability).

Suppose that $\{\mathring{\bm{a}}_{n,m}(t)\in\triangle(\mathcal{N})\}_{(n,m)\in\mathcal{L},t\in[T]}$ satisfies 1 and that its path length satisfies the following condition (this assumption on path lengths comes from Huang et al. (2024, Assumption 2); as discussed in Remark 3, such conditions are necessary):

\[
P_t^a\triangleq\sum_{s=1}^{t-1}\sum_{(n,m)\in\mathcal{L}}\lVert\mathring{\bm{a}}_{n,m}(s)-\mathring{\bm{a}}_{n,m}(s+1)\rVert_1\le C^a t^{1/2-\delta_a},\quad\forall t=1,2,\ldots,T, \tag{10}
\]

where $C^a$ and $\delta_a$ are assumed to be known constants, while the precise $P_t^a$ and the sequence $\{\mathring{\bm{a}}_{n,m}(t)\in\triangle(\mathcal{N})\}_{(n,m)\in\mathcal{L},t\in[T]}$ both remain unknown. Then, if we execute the NSO framework in Algorithm 1 with AdaPFOL defined in Algorithm 2, the following performance guarantee holds:

\[
\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]=\mathcal{O}\left(\frac{\left(N^2(2NM+R)^2+\epsilon_W N^2(2NM+R)\right)C_W+\left(N^4M^2+N^2R^2\right)}{\epsilon_W}\right)+o_T(1).
\]

That is, when $T\gg 0$, we have $\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]=\mathcal{O}_T(1)$, i.e., Equation 2 holds and the system is stable.
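As an illustration of the path-length condition in Equation 10, the sketch below computes the prefix path lengths $P_1^a,\ldots,P_T^a$ of a candidate reference sequence and checks them against $C^a t^{1/2-\delta_a}$. The data layout (a list of per-round dictionaries keyed by link) and the function names are assumptions made for this example only; in the paper the reference sequence is unknown to the scheduler, so a check like this is purely diagnostic.

```python
def path_length_prefixes(ref_actions):
    """ref_actions[t][link] is a probability vector over commodities.

    Returns [P_1, ..., P_T], where P_t sums the l1 changes over the
    first t-1 transitions and over all links (so P_1 is an empty sum).
    """
    prefixes = [0.0]
    for prev, cur in zip(ref_actions, ref_actions[1:]):
        step = sum(
            sum(abs(a - b) for a, b in zip(prev[link], cur[link]))
            for link in prev
        )
        prefixes.append(prefixes[-1] + step)
    return prefixes


def is_mildly_varying(ref_actions, c_a, delta_a):
    """Check P_t <= c_a * t**(1/2 - delta_a) for every t = 1, ..., T."""
    prefixes = path_length_prefixes(ref_actions)
    return all(
        p <= c_a * t ** (0.5 - delta_a)
        for t, p in enumerate(prefixes, start=1)
    )


# Hypothetical reference policy: one link, two commodities, one switch.
link = ("n", "m")
reference = [{link: [1, 0]}, {link: [1, 0]}, {link: [0, 1]}, {link: [0, 1]}]
prefixes = path_length_prefixes(reference)
ok_loose = is_mildly_varying(reference, c_a=2.0, delta_a=0.25)
ok_tight = is_mildly_varying(reference, c_a=1.0, delta_a=0.25)
```

A single switch between commodities costs $2$ in $\ell_1$ distance, so the condition caps how often and how early the reference policy may change its allocation.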

Remark 4.

Any reference policy $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ satisfying Equation 10 is called “mildly varying”. As mentioned in Remark 3, it is impossible to achieve non-trivial performance without restricting the reference sequence. We also compare our result with two previous results (Huang et al., 2024; Yang et al., 2023) that also ensured network stability in adversarial networks under bandit feedback: Under a path length assumption similar to but looser than Equation 10, Huang et al. (2024) stabilized single-server networks. By assuming that the environment (rather than the reference policy, as we do) is mildly varying, Yang et al. (2023) stabilized single-hop networks. Thus, Theorem 3.6 is the first guarantee applicable to adversarial multi-hop networks under bandit feedback.

A formal proof resides in Section B.5. We only highlight the self-bounding step here:

Proof Sketch of Theorem 3.6.

The first step of the proof is comparing the guarantee from 1 and that from Lyapunov drift analysis. Specifically, recall that Lemma 3.1 upper bounds the total queue length using $-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\langle C_{n,m}(t)(\bm{Q}_m(t)-\bm{Q}_n(t)),\mathring{\bm{a}}_{n,m}(t)\rangle\right]$ (together with some other terms, which are constants after taking expectations), while Lemma 3.2 reveals the non-positivity of $-\mathbb{E}\left[\sum_{(n,m)\in\mathcal{L}}\langle C_{n,m}(t)(\bm{Q}_m(t)-\bm{Q}_n(t)),\bm{a}_{n,m}(t)\rangle\right]$ (again, many constants omitted).
Furthermore, via the guarantee of AdaPFOL (Algorithm 2) in Theorem 3.5, these two terms are actually close: they differ only by $\widetilde{\mathcal{O}}_T\left(\sqrt{1+C^aT^{1/2-\delta_a}}\,\mathbb{E}\left[\sqrt{\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_2^2}\right]\right)$, where we used the assumption that $P_T^a\le C^aT^{1/2-\delta_a}$.
By a property that ensures $\sum_{t=1}^{T}x_t^2\le 4\left(\sum_{t=1}^{T}x_t\right)^{3/2}$ when $\lvert x_t-x_{t+1}\rvert\le 1$ (Lemma D.3), this $\mathbb{E}\left[\sqrt{\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_2^2}\right]$ can be controlled by $\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]^{3/4}$: taking the square root of the right-hand side turns the exponent $3/2$ into $3/4$.
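The sum-of-squares property can be sanity-checked numerically. The sketch below tests $\sum_t x_t^2\le 4(\sum_t x_t)^{3/2}$, the exponent that yields the $3/4$ power used next, on random sequences. It assumes, beyond what is stated above, that the sequence is non-negative and starts at a value of at most $1$ (as a queue that starts empty would); the reflected random walk and all names are illustrative, and a numerical check is of course not a proof of Lemma D.3.

```python
import random


def sum_of_squares_sides(x):
    """Return (sum of x_t^2, 4 * (sum of x_t)^(3/2)) for a sequence x."""
    lhs = sum(v * v for v in x)
    rhs = 4.0 * sum(x) ** 1.5
    return lhs, rhs


random.seed(0)
worst_ratio = 0.0
for _ in range(200):
    x, cur = [], 0.0
    for _ in range(random.randint(1, 300)):
        # Reflected walk: increments in [-1, 1], never below zero.
        cur = max(0.0, cur + random.uniform(-1.0, 1.0))
        x.append(cur)
    lhs, rhs = sum_of_squares_sides(x)
    if rhs > 0:
        worst_ratio = max(worst_ratio, lhs / rhs)

# A deterministic stress case: the steepest admissible ramp 1, 2, ..., 1000.
ramp = [float(t) for t in range(1, 1001)]
ramp_lhs, ramp_rhs = sum_of_squares_sides(ramp)
```

The ramp is the natural hard case: its sum of squares grows like $T^3$ while its sum grows like $T^2$, which is exactly the $3/2$ relationship.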

Finishing these steps, which are detailed in Lemma B.8, we are able to conclude

\[
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]\le f(T)+g(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]^{3/4}\log\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right],
\]

where $f(T)=\mathcal{O}_T(T)$ and $g(T)=\mathcal{O}_T(T^{1/4-\delta_a/2})$ are abstract functions introduced to simplify notation.

Therefore, this inequality has the self-bounding form $y\le f+y^{3/4}g\log y$, where $y$ is the total queue length over $T$ rounds. As we informally stated in Equation 6, this gives our system stability guarantee. Indeed, in Lemma D.5, we show that $y\le f+y^{3/4}g\log y$ implies $y=\mathcal{O}(f)+\widetilde{\mathcal{O}}(g^4)$. Therefore, $y=\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]=\mathcal{O}_T(T)+\widetilde{\mathcal{O}}_T(T^{1-2\delta_a})$ and thus $\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]=\mathcal{O}_T(1)$. ∎
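The effect of the self-bounding step can be seen numerically. The following sketch computes the largest $y$ compatible with $y\le f+g\,y^{3/4}\log y$ by fixed-point iteration and reports it relative to $T$. It takes $f=T$ and $g=T^{1/4-\delta_a/2}$ with all hidden constants set to one and $\delta_a=0.25$, which are illustrative assumptions rather than the paper's constants; the function name is hypothetical.

```python
import math


def largest_self_bounded(f, g, iters=10_000, tol=1e-12):
    """Largest y with y <= f + g * y**0.75 * log(y), assuming f >= e**4.

    For y >= e**4 the ratio (f + g*y**0.75*log y) / y is strictly
    decreasing, so the fixed point above f is unique and is the largest
    feasible y. Iterating the increasing map from y = f converges to it.
    """
    y = f
    for _ in range(iters):
        nxt = f + g * y ** 0.75 * math.log(y)
        if abs(nxt - y) <= tol * nxt:
            return nxt
        y = nxt
    return y


delta_a = 0.25  # hypothetical value of the path-length exponent slack
horizons = [1e4, 1e8, 1e12, 1e16]
ratios = []
for T in horizons:
    f, g = T, T ** (0.25 - delta_a / 2)
    ratios.append(largest_self_bounded(f, g) / T)
```

Under these assumptions the ratio `y / T` shrinks as the horizon grows, reflecting that the $\widetilde{\mathcal{O}}_T(T^{1-2\delta_a})$ term is eventually dominated by the $\mathcal{O}_T(T)$ term; for moderate $T$ the logarithmic factors still make the ratio large.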

Algorithm 3 UMO2: Utility Maximization via Online Linear Optimization and Bandit Convex Optimization
Input: Number of rounds $T$, set of servers $\mathcal{N}$ and links $\mathcal{L}$, maximum capacity $M$, feasible arrival rates $\Lambda$. Parameter $V$. An online linear optimization algorithm AdaPFOL (Algorithm 2) and a bandit convex optimization algorithm AdaBGD (Algorithm 4).
1:  For each link $(n,m)\in\mathcal{L}$, initialize an instance of AdaPFOL with action set $\triangle(\mathcal{N})$ as $\texttt{AdaPFOL}_{n,m}$.
2:  Initialize an instance of AdaBGD with action set $\Lambda$.
3:  for $t=1,2,\ldots,T$ do
4:     For each link $(n,m)\in\mathcal{L}$, pass the maximum loss magnitude for this round, $M\lVert\bm{Q}_m(t)-\bm{Q}_n(t)\rVert_\infty$, to $\texttt{AdaPFOL}_{n,m}$. Pick link allocation $\bm{a}_{n,m}(t)\in\triangle(\mathcal{N})$ as the output of $\texttt{AdaPFOL}_{n,m}$.
5:     Call AdaBGD with learning rates defined in Equation 11. Pick arrival rates $\bm{\lambda}(t)\in\Lambda$ as its output.
\[
\begin{aligned}
\eta_t&=\left(C^{\lambda}T^{1/2-\delta_\lambda}\middle/\left[
\begin{array}{l}
\left(C^{\lambda}T^{1/2-\delta_\lambda}\right)^{7/3}\left(4r^{-3}d^2\right)^{28/9}\left(M+R\right)^{4/3}\\
{}+C^{\lambda}T^{1/2-\delta_\lambda}\left(r^{-3}d^2VG^2/L\right)^{4/3}\\
{}+\sum_{s=1}^{t}\left(\left(\lVert\bm{q}_s\rVert_\infty+VG\right)^2\left(\lVert\bm{q}_s\rVert_2+VL\right)^2\right)^{1/3}
\end{array}
\right]\right)^{3/4},\\
\delta_t&=\left(\eta_t d^2\frac{\left(\lVert\bm{Q}(t)\rVert_\infty+VG\right)^2}{\lVert\bm{Q}(t)\rVert_2+VL}\right)^{1/3},\quad\alpha_t=\frac{\delta_t}{r}.
\end{aligned}
\tag{11}
\]
6:     Observe capacities $\{C_{n,m}(t)\}_{(n,m)\in\mathcal{L}}$ and actual data transmissions $\{\mu_{n,m}^{(k)}(t)\}_{(n,m)\in\mathcal{L},k\in\mathcal{N}}$.
7:     Calculate queue lengths $\bm{Q}(t+1)$ from $\bm{Q}(t)$ according to Equation 1.
8:     For each link $(n,m)\in\mathcal{L}$, pass the loss vector $C_{n,m}(t)(\bm{Q}_m(t)-\bm{Q}_n(t))$ to $\texttt{AdaPFOL}_{n,m}$.
9:     Observe the collected utility $g_t(\bm{\lambda}(t))$. Pass the loss $\langle\bm{Q}(t),\bm{\lambda}(t)\rangle-Vg_t(\bm{\lambda}(t))$ to AdaBGD.
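The learning-rate schedule in Equation 11 can be transcribed directly. The sketch below is a literal reading of that formula under two assumptions that are not confirmed by this excerpt: the vectors $\bm{q}_s$ in the running sum are taken to be the queue vectors $\bm{Q}(s)$, and $r$, $d$, $G$, $L$, $R$ are treated as given positive constants whose definitions lie outside this section. The function name and parameter values are hypothetical.

```python
import math


def adabgd_rates(queue_history, T, C_lam, delta_lam, r, d, M, R, V, G, L):
    """Return (eta_t, delta_t, alpha_t) per Equation 11.

    queue_history holds the flattened queue vectors Q(1), ..., Q(t);
    the last entry is the current round's Q(t).
    """
    def inf_norm(q):
        return max(abs(v) for v in q)

    def two_norm(q):
        return math.sqrt(sum(v * v for v in q))

    budget = C_lam * T ** (0.5 - delta_lam)  # C^lambda * T^(1/2 - delta_lambda)
    denom = (
        budget ** (7 / 3) * (4 * r ** -3 * d ** 2) ** (28 / 9) * (M + R) ** (4 / 3)
        + budget * (r ** -3 * d ** 2 * V * G ** 2 / L) ** (4 / 3)
        + sum(
            ((inf_norm(q) + V * G) ** 2 * (two_norm(q) + V * L) ** 2) ** (1 / 3)
            for q in queue_history
        )
    )
    eta = (budget / denom) ** 0.75
    q_t = queue_history[-1]
    delta = (eta * d ** 2 * (inf_norm(q_t) + V * G) ** 2
             / (two_norm(q_t) + V * L)) ** (1 / 3)
    alpha = delta / r
    return eta, delta, alpha


# Hypothetical constants and a short queue trajectory with two queues.
params = dict(T=100, C_lam=1.0, delta_lam=0.25, r=1.0, d=2,
              M=1.0, R=1.0, V=1.0, G=1.0, L=1.0)
history = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
schedule = [adabgd_rates(history[:t], **params) for t in range(1, len(history) + 1)]
etas = [eta for eta, _, _ in schedule]
```

Because the running sum in the denominator only grows, $\eta_t$ is non-increasing in $t$, and it shrinks faster when queues are long, which is the queue-dependent adaptation the schedule is designed to provide.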

4 Utility Maximization in Adversarial Multi-Hop Networks

We now turn to the utility maximization task. Here, in addition to the capacity allocations, the scheduler also decides the arrival rates. Its objective is to maximize the average utility it gains under the unknown and time-varying utility function (Equation 3), while keeping the average number of jobs in the network small (Equation 2).

This section is organized similarly to Section 3: We first explain the motivation of our algorithmic framework UMO2 in Algorithm 3 and then present the assumptions together with the analysis.

4.1 Motivation of Our Algorithmic Framework

In Algorithm 3, we give the general algorithmic framework of Utility Maximization via Online Linear Optimization and Bandit Convex Optimization (UMO2), which maximizes utility by plugging in two optimization subroutines, AdaPFOL and AdaBGD. The differences between it and our system stability algorithm (NSO; Algorithm 1) are marked in blue. As we already motivated in Section 3.1 that an OLO algorithm AdaPFOL can help stabilize the system (recall Equation 5), this section focuses on motivating the other subroutine, AdaBGD, by going through the design of UMO2.

To handle the utility function, instead of the Lyapunov analysis in the previous section, UMO2 is based on the Lyapunov drift-plus-penalty analysis (Neely, 2010a, Theorem 4.2). In Lemma C.2, we derive

\[
\begin{aligned}
&\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\langle C_{n,m}(t)(\bm{Q}_m(t)-\bm{Q}_n(t)),\bm{a}_{n,m}(t)\rangle\right]+{}\\
&\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\langle\bm{Q}_n(t),\bm{\lambda}_n(t)\rangle\right]-V\,\mathbb{E}\left[\sum_{t=1}^{T}\Big(g_t(\bm{\lambda}(t))-g_t(\mathring{\bm{\lambda}}(t))\Big)\right]\gtrsim 0,
\end{aligned}
\tag{12}
\]

where $V$ is a constant that we can arbitrarily pick for analytical purposes. Intuitively, this $V$ stands for a trade-off between the stability part $\langle\bm{Q}_{n}(t),\bm{\lambda}_{n}(t)\rangle$ and the utility part $g_{t}(\bm{\lambda}(t))$.

Again motivated by Huang et al. (2024), our goal is to minimize the left-hand side of Equation 12. The first term in Equation 12 is exactly the OLO optimization objective from the previous section (recall Equation 5), which can be minimized by the AdaPFOL algorithm given in Algorithm 2. For the second and third terms, we would like to

$$
\text{Minimize }\mathbb{E}\left[\sum_{t=1}^{T}\Big(\langle\bm{Q}(t),\bm{\lambda}(t)\rangle-Vg_{t}(\bm{\lambda}(t))\Big)\right]. \tag{13}
$$

We also tackle Equation 13 using online learning techniques. As $g_{t}$ is defined over all possible $\bm{\lambda}(t)$'s and $\bm{\lambda}(t)$ can be arbitrarily chosen from the feasible action set $\Lambda$, we regard "deciding $\bm{\lambda}(t)$" as a learning problem with action set $\Lambda$ (instead of making decisions on each link or server separately as in Section 3.4), where the loss of picking $\bm{\lambda}$ in round $t$ is $\ell_{t}(\bm{\lambda})\triangleq\langle\bm{Q}(t),\bm{\lambda}\rangle-Vg_{t}(\bm{\lambda})$. This loss function is convex w.r.t. $\bm{\lambda}$, as $\langle\cdot,\cdot\rangle$ is linear and $g_{t}$ is concave. However, since only bandit feedback on $g_{t}$ is available, we can only calculate the value $\ell_{t}(\bm{\lambda}(t))$ at the chosen action but not the whole function $\ell_{t}$. Thus, this problem is not an OLO problem, as Definition 3.3 requires full-information feedback. Instead, it belongs to the category of Bandit Convex Optimization (BCO) (Flaxman et al., 2005), which we will define in Definition 4.2.
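To make the bandit-feedback structure concrete, the following minimal sketch evaluates the loss $\ell_{t}(\bm{\lambda})=\langle\bm{Q}(t),\bm{\lambda}\rangle-Vg_{t}(\bm{\lambda})$ at a single played action and numerically checks midpoint convexity. The utility `log_utility`, the queue vector `Q`, and the value of `V` are illustrative assumptions and are not taken from the paper; the sketch only illustrates that the learner receives one scalar per round and that the loss is convex whenever the utility is concave.

```python
import math
import random

def log_utility(lam):
    """Hypothetical concave utility g_t(lam) = sum_i log(1 + lam_i) (an assumption)."""
    return sum(math.log1p(x) for x in lam)

def bandit_loss(Q, lam, V, utility):
    """Scalar l_t(lam) = <Q, lam> - V * g_t(lam) for the action actually played."""
    inner = sum(q * x for q, x in zip(Q, lam))
    return inner - V * utility(lam)

def midpoint_convex(Q, V, utility, trials=1000, dim=3, seed=0):
    """Check l((x + y) / 2) <= (l(x) + l(y)) / 2 on random pairs in [0, 1]^dim."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.random() for _ in range(dim)]
        y = [rng.random() for _ in range(dim)]
        mid = [(a + b) / 2 for a, b in zip(x, y)]
        lhs = bandit_loss(Q, mid, V, utility)
        rhs = (bandit_loss(Q, x, V, utility) + bandit_loss(Q, y, V, utility)) / 2
        if lhs > rhs + 1e-12:
            return False
    return True

Q = [5.0, 0.0, 2.0]  # toy queue lengths (assumed values)
# The only feedback available in a round: the loss at the played action.
observed = bandit_loss(Q, [0.5, 0.5, 0.5], V=10.0, utility=log_utility)
```

Note that the learner can compute the linear part $\langle\bm{Q}(t),\bm{\lambda}\rangle$ for any $\bm{\lambda}$ on its own; it is the utility part that is only observable at the played action.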

Similar to Section 3.1, we now discuss what properties the AdaBGD sub-routine should enjoy. We face the following challenges, which are unique to the network optimization context:

  i) Again, the queue lengths can potentially go unbounded, which means the loss $\ell_{t}(\bm{\lambda})$ can have large magnitudes. However, different from the OLO problem we met in Equation 5, in BCO problems we face general convex functions, and thus Lipschitzness (i.e., the maximum gradient magnitude) also plays a role, as it characterizes how fast $\ell_{t}$ changes with $\bm{\lambda}$. Therefore, our AdaBGD shall not only bear large loss magnitudes but also tolerate large Lipschitzness. As we will see in Equation 11, adapting to both magnitudes and Lipschitzness is particularly difficult.

  ii) The second challenge is again due to the self-bounding analysis. Specifically, since we want to conduct self-bounding analyses on the queue lengths and also on the utility gap (both similar to Equation 6), we also need AdaBGD to be adaptive to the loss functions' magnitudes and Lipschitzness.

In Section 4.4, we introduce the details of our AdaBGD algorithm that ensures both i) and ii). Equipped with this algorithm, we can optimize Equation 13 by deploying it over the action set $\Lambda$. Moreover, as deploying AdaPFOL (Algorithm 2) on each link $(n,m)\in\mathcal{L}$ minimizes $\mathbb{E}[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\langle\bm{Q}_{m}(t)-\bm{Q}_{n}(t),\bm{\mu}_{n,m}(t)\rangle]$, these two algorithms can together minimize the Lyapunov drift-plus-penalty in Equation 12. Such a combination gives the UMO2 framework in Algorithm 3.

In the remainder of this section, we present the analysis of UMO2: In Section 4.2, we introduce the reference sequence assumption. In Section 4.3, we present the Lyapunov drift-plus-penalty analysis. In Section 4.4, we rigorously define the BCO problem and present our AdaBGD algorithm (Algorithm 4). Finally, by combining the AdaPFOL guarantee from Section 3.4 and the AdaBGD guarantee from Section 4.4, we obtain the utility maximization guarantee in Section 4.5.

4.2 Reference Policy Assumption

The assumption we need in the multi-hop utility maximization task is similar to the one for multi-hop network stability (Assumption 1), with one important difference: our arrival rates are no longer fixed but decided by the scheduler. Hence, instead of assuming a sequence $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ that stabilizes the system under the obliviously adversarial arrival rates $\{\bm{\lambda}(t)\}_{t\in[T]}$, we assume the existence of a reference sequence $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ such that $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ stabilizes the system under the reference arrival rates $\{\mathring{\bm{\lambda}}(t)\}_{t\in[T]}$. Formally, we make the following assumption.

Assumption 2 (Multi-Hop Piecewise Stability for Utility Maximization).

There exists a reference action sequence $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ (where $\mathring{\bm{a}}(t)=\{\mathring{\bm{a}}_{n,m}(t)\in\triangle(\mathcal{N})\}_{(n,m)\in\mathcal{L}}$ and $\mathring{\bm{\lambda}}(t)=\{\mathring{\bm{\lambda}}_{n}(t)\in\Lambda_{n}\}_{n\in\mathcal{N}}$, analogous to the scheduler's action sequence) such that there are constants $C_{W}\geq 0$, $\epsilon_{W}\geq 0$, and a partition $W_{1},W_{2},\ldots,W_{J}$ of $[T]$, such that $\sum_{j=1}^{J}(\lvert W_{j}\rvert-1)^{2}\leq C_{W}T$ and

$$
\begin{aligned}
\frac{1}{\lvert W_{j}\rvert}\sum_{t\in W_{j}}\sum_{(n,m)\in\mathcal{L}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)\geq{}&\epsilon_{W}+\frac{1}{\lvert W_{j}\rvert}\sum_{t\in W_{j}}\left(\mathring{\lambda}_{n}^{(k)}(t)+\sum_{(o,n)\in\mathcal{L}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)\right),\\
&\forall j\in[J],\ n\in\mathcal{N},\ k\in\mathcal{N}.
\end{aligned}
$$
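The partition condition $\sum_{j=1}^{J}(\lvert W_{j}\rvert-1)^{2}\leq C_{W}T$ can be explored numerically. The sketch below computes the smallest constant $C_{W}$ that a given partition admits; the helper names `partition_constant` and `equal_windows` are hypothetical and introduced only for illustration. It suggests why short windows are "mild": windows of constant length give a constant $C_{W}$, whereas a single window covering all of $[T]$ forces $C_{W}$ to grow roughly like $T$.

```python
def partition_constant(window_lengths):
    """Smallest C_W with sum_j (|W_j| - 1)^2 <= C_W * T, where T = sum_j |W_j|."""
    if not window_lengths or any(w < 1 for w in window_lengths):
        raise ValueError("every window must contain at least one round")
    T = sum(window_lengths)
    return sum((w - 1) ** 2 for w in window_lengths) / T

def equal_windows(T, length):
    """Lengths of consecutive windows of the given size covering [T] (last may be shorter)."""
    full, rest = divmod(T, length)
    return [length] * full + ([rest] if rest else [])

# Windows of length 1 mean the reference must satisfy the inequality in every round.
per_round = partition_constant(equal_windows(1000, 1))
# Constant-length windows keep C_W independent of T.
short = partition_constant(equal_windows(1000, 4))
# One window over the whole horizon makes C_W grow with T.
single = partition_constant([1000])
```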

Imitating Lemma 3.1, one can derive the following, whose proof is in Section C.1.

Lemma 4.1 (Ability of $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ in Stabilizing the Network).

If $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ satisfies Assumption 2, then for any scheduler-generated queue lengths $\{\bm{Q}(t)\}_{t\in[T]}$,

$$
\begin{aligned}
&\epsilon_{W}\,\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\right]-\big(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R)\big)C_{W}T\\
&\leq-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\mathring{\mu}_{n,m}^{(k)}(t)\big(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\big)\right]-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\mathring{\lambda}_{n}^{(k)}(t)\right].
\end{aligned}
\tag{14}
$$

Still, we remark that our scheduler cannot access the reference action sequence $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$. However, similar to the NSO algorithm, our UMO2 algorithm can also learn to stabilize the system. Moreover, it learns to compete with the utility of any mildly varying reference policy. Specifically, i) our action sequence $\{(\bm{a}(t),\bm{\lambda}(t))\}_{t\in[T]}$ also stabilizes the system, i.e., $\frac{1}{T}\mathbb{E}[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}]=\mathcal{O}_{T}(1)$; and ii) its utility matches that of any mildly varying reference policy $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ asymptotically, that is, $\frac{1}{T}\mathbb{E}[\sum_{t=1}^{T}g_{t}(\bm{\lambda}(t))]\xrightarrow{\text{polynomially}}\frac{1}{T}\sum_{t=1}^{T}g_{t}(\mathring{\bm{\lambda}}(t))$.$^{3}$

$^{3}$ According to the oblivious adversary assumption, $g_{t}$ is pre-determined. Thus, the $g_{t}$'s on the LHS and RHS are the same, so there is no need to take conditional expectation w.r.t. previous actions $\{(\bm{a}(t),\bm{\lambda}(t))\}_{t\in[T]}$ or $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$.

4.3 Lyapunov Drift-plus-Penalty Analysis

In the Lyapunov drift analysis (Section 3.3), we consider the drift function $\Delta(\bm{Q}(t))\triangleq\mathbb{E}[L_{t+1}-L_{t}\mid\bm{Q}(t)]$, where $L_{t}\triangleq\frac{1}{2}\lVert\bm{Q}(t)\rVert_{2}^{2}$ is the Lyapunov function. In the Lyapunov drift-plus-penalty (DPP) analysis (Neely, 2010a, Theorem 4.2), we consider the DPP function $\Delta(\bm{Q}(t))-V\,\mathbb{E}[g_{t}(\bm{\lambda}(t))\mid\bm{Q}(t)]$, where $V$ is arbitrarily determined for our purpose. As we will see in Theorem 4.5, when $V$ is chosen to be no larger than a polynomial of $T$, our utility is at least that of any mildly varying reference policy minus $\mathcal{O}(V^{-1})$, thus implying a polynomially decaying gap between these two utilities.
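As a rough illustration of where drift bounds of this kind come from, the sketch below checks the textbook one-step bound for the quadratic Lyapunov function on independent queues. It assumes the common update $Q'=\max(Q-s,0)+a$ with non-negative arrivals $a$ and service $s$; this is a simplification chosen for illustration and is not the multi-hop dynamics of the paper. Under that assumption, $\frac{1}{2}(Q'^{2}-Q^{2})\leq\frac{1}{2}(a^{2}+s^{2})+Q(a-s)$, i.e., a constant term plus a term that is linear in the queue length.

```python
import random

def lyapunov(Q):
    """Quadratic Lyapunov function L = 0.5 * ||Q||_2^2."""
    return 0.5 * sum(q * q for q in Q)

def step(Q, arrivals, service):
    """Assumed per-queue update Q' = max(Q - service, 0) + arrivals."""
    return [max(q - s, 0.0) + a for q, a, s in zip(Q, arrivals, service)]

def drift_bound(Q, arrivals, service):
    """Upper bound on L(Q') - L(Q): 0.5 * sum(a^2 + s^2) + sum(q * (a - s))."""
    const = 0.5 * sum(a * a + s * s for a, s in zip(arrivals, service))
    linear = sum(q * (a - s) for q, a, s in zip(Q, arrivals, service))
    return const + linear

def check_drift(trials=1000, dim=4, seed=0):
    """Verify the bound on random non-negative queues, arrivals, and service."""
    rng = random.Random(seed)
    for _ in range(trials):
        Q = [rng.uniform(0, 50) for _ in range(dim)]
        a = [rng.uniform(0, 3) for _ in range(dim)]
        s = [rng.uniform(0, 3) for _ in range(dim)]
        drift = lyapunov(step(Q, a, s)) - lyapunov(Q)
        if drift > drift_bound(Q, a, s) + 1e-9:
            return False
    return True
```

The DPP function subtracts $V$ times the utility from this drift, so the same linear-in-$\bm{Q}$ term reappears next to $-Vg_{t}$ in the quantities the scheduler tries to minimize.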

Similar to the calculations in Lemma 3.1, one can derive the following inequality (see Lemma C.2):

$$
\begin{aligned}
&-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}C_{n,m}(t)\big(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\big)a_{n,m}^{(k)}(t)\right]\\
&\quad-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\lambda_{n}^{(k)}(t)\right]+V\,\mathbb{E}\left[\sum_{t=1}^{T}\big(g_{t}(\bm{\lambda}(t))-g_{t}(\mathring{\bm{\lambda}}(t))\big)\right]\\
&\leq\frac{1}{2}N^{2}\big((NM)^{2}+2(NM)^{2}+2R^{2}\big)T+V\,\mathbb{E}\left[\sum_{t=1}^{T}\big(g_{t}(\bm{\lambda}(t))-g_{t}(\mathring{\bm{\lambda}}(t))\big)\right].
\end{aligned}
\tag{15}
$$

Similar to the previous section, we want to make the RHS of Equation 15 close to that of Equation 14. Specifically, we decompose these two RHS’s into two parts and show the following inequalities:

(16)

The first inequality is exactly our objective in Equation 5, which can be ensured via the AdaPFOL algorithm in Algorithm 2 (recall its performance guarantee in Theorem 3.5). On the other hand, the second inequality is new in the utility maximization task. While some algorithmic ingredients can be borrowed from the Bandit Convex Optimization (BCO) problem (Flaxman et al., 2005), new effort is needed due to the network optimization context: our BCO algorithm shall accept large loss magnitudes and Lipschitzness (due to potentially unbounded queue lengths), and its performance must be adaptive to the loss functions' magnitudes and Lipschitzness as well.

4.4 AdaBGD: Learning for Utility Maximization

As mentioned in the sketch (Section 4.1), Equation 16 is equivalent to minimizing the time-varying loss function $\ell_{t}(\bm{\lambda})\triangleq\langle\bm{Q}(t),\bm{\lambda}\rangle-Vg_{t}(\bm{\lambda})$ under bandit feedback over the action set $\Lambda$. This problem differs from the OLO problem introduced in Definition 3.3 because we do not have full-information feedback: we assume that only $g_{t}(\bm{\lambda}(t))$, rather than the whole $g_{t}$, is revealed to the scheduler, and hence only $\ell_{t}(\bm{\lambda}(t))$, the actual loss associated with our action, can be accurately calculated. We provide a formal definition of the Bandit Convex Optimization (BCO) problem (Flaxman et al., 2005; Chen and Giannakis, 2018) below.

Definition 4.2 (Bandit Convex Optimization).

Consider a $T$-round game. In round $t=1,2,\ldots,T$, the player picks an action $\bm{x}_{t}$ from a convex action set $\mathcal{X}\subseteq\mathbb{R}^{d}$, and the environment simultaneously picks an arbitrary convex loss $\ell_{t}\colon\mathcal{X}\to\mathbb{R}$. The player observes and suffers loss $\ell_{t}(\bm{x}_{t})$. Dynamic regret minimization in BCO considers minimizing

$$
\text{D-Regret}_{T}^{\text{BCO}}(\bm{u}_{1},\bm{u}_{2},\ldots,\bm{u}_{T})=\mathbb{E}\left[\sum_{t=1}^{T}\big(\ell_{t}(\bm{x}_{t})-\ell_{t}(\bm{u}_{t})\big)\right],\quad\forall\bm{u}_{1},\bm{u}_{2},\ldots,\bm{u}_{T}\in\mathcal{X}.
$$
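The definition can be read off directly as a computation. The sketch below evaluates the realized dynamic regret of one run (without the expectation) on a toy one-dimensional instance; the quadratic losses, the played actions, and the comparator sequences are all assumed values for illustration. It highlights that the regret depends on the comparator sequence: a fixed comparator can be much easier to beat than one that tracks the per-round minimizers.

```python
def dynamic_regret(losses, actions, comparators):
    """Realized sum_t (l_t(x_t) - l_t(u_t)) for one run (no expectation taken)."""
    if not (len(losses) == len(actions) == len(comparators)):
        raise ValueError("need one loss, action, and comparator per round")
    return sum(l(x) - l(u) for l, x, u in zip(losses, actions, comparators))

# Toy instance: l_t(x) = (x - c_t)^2 with a drifting minimizer c_t (assumed values).
centers = [0.0, 0.5, 1.0, 1.5]
losses = [lambda x, c=c: (x - c) ** 2 for c in centers]
played = [0.0, 0.0, 0.5, 1.0]   # a learner that lags one step behind the minimizer
static_u = [0.75] * 4           # best fixed action in hindsight for these losses
dynamic_u = centers             # comparator that tracks every per-round minimizer

static_regret = dynamic_regret(losses, played, static_u)
tracking_regret = dynamic_regret(losses, played, dynamic_u)
```

In the utility maximization task, the comparator sequence of interest is the reference arrival-rate sequence $\{\mathring{\bm{\lambda}}(t)\}_{t\in[T]}$.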

Again, we recall the two challenges that our BCO algorithm should overcome:

  i) It must handle $\ell_{t}$'s with large magnitude $\sup_{\bm{\lambda}\in\Lambda}\lvert\ell_{t}(\bm{\lambda})\rvert$ or Lipschitzness $\sup_{\bm{\lambda}\in\Lambda}\lVert\nabla\ell_{t}(\bm{\lambda})\rVert_{2}$.

  ii) Its performance should depend on magnitudes and Lipschitzness of all loss functions.

Our algorithm is based on the Bandit Gradient Descent (BGD) algorithm (Zhao et al., 2021, Algorithm 1), which satisfies neither i) nor ii), as it requires losses to be uniformly bounded by some $C$ and Lipschitzness to be always bounded by some $L$. In our case where $\ell_{t}(\bm{\lambda})=\langle\bm{Q}(t),\bm{\lambda}\rangle-Vg_{t}(\bm{\lambda})$, the loss magnitude $\lVert\bm{Q}(t)\rVert_{\infty}+VG$ and the Lipschitzness $\lVert\bm{Q}(t)\rVert_{2}+VL$ are both large when $\lVert\bm{Q}(t)\rVert$ is large.
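To see why loss magnitudes matter under bandit feedback, consider the classical one-point gradient estimator of Flaxman et al. (2005), on which bandit gradient descent methods build: with $\bm{u}$ uniform on the unit sphere, $\frac{d}{\delta}\,\ell(\bm{x}+\delta\bm{u})\,\bm{u}$ is a gradient estimate formed from a single loss evaluation. The sketch below is a generic illustration of that estimator and not the paper's Algorithm 4; the toy loss, the queue vectors, and the values of `V` and `delta` are assumptions. Since the norm of the estimate equals $\frac{d}{\delta}\lvert\ell(\bm{x}+\delta\bm{u})\rvert$, long queues directly inflate the gradient estimates.

```python
import math
import random

def unit_sphere_sample(dim, rng):
    """Draw a uniform point on the unit sphere by normalizing a Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]

def one_point_gradient(loss, x, delta, rng):
    """One-point estimate (d / delta) * loss(x + delta * u) * u."""
    d = len(x)
    u = unit_sphere_sample(d, rng)
    value = loss([xi + delta * ui for xi, ui in zip(x, u)])  # the only feedback used
    return [(d / delta) * value * ui for ui in u]

def estimate_norm(grad):
    """Euclidean norm of a gradient estimate."""
    return math.sqrt(sum(v * v for v in grad))

def make_loss(Q, V):
    """Toy loss <Q, lam> - V * sum_i log(1 + lam_i); the utility is an assumption."""
    return lambda lam: (sum(q * x for q, x in zip(Q, lam))
                        - V * sum(math.log1p(x) for x in lam))

rng = random.Random(0)
x = [0.5, 0.5]
# Same action and perturbation size, but queues that are 100 times longer.
small = estimate_norm(one_point_gradient(make_loss([1.0, 1.0], 1.0), x, 0.1, rng))
large = estimate_norm(one_point_gradient(make_loss([100.0, 100.0], 1.0), x, 0.1, rng))
```

An algorithm tuned for a fixed bound $C$ on the losses therefore has no principled step size once $\lVert\bm{Q}(t)\rVert$ exceeds what that bound anticipated.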

Nevertheless, building on BGD, we design a BCO algorithm called Adaptive BGD (AdaBGD; Algorithm 4) that satisfies both i) and ii). To ensure i), we use the fact that $\bm{Q}(t)$ is known before deciding $\bm{\lambda}_t$, so the magnitude $C_t$ and the Lipschitzness $L_t$ of the loss function $\ell_t$ can be calculated before the decision. To ensure ii), we replace the doubling technique of AdaPFOL (Algorithm 2) with an adaptive learning rate scheduling mechanism that uses a sequence of time-varying learning rates $\eta_1>\eta_2>\cdots>\eta_T$ rather than a single $\eta$ throughout execution. Formally, AdaBGD has the following dynamic regret guarantee:

Algorithm 4 AdaBGD: Adaptive Bandit Gradient Descent
Input:  Action set $\mathcal{X}$ bounded by $[r,R]$ (i.e., $r\mathbb{B}\subseteq\mathcal{X}\subseteq R\mathbb{B}$), hyper-parameters $\eta_1>\eta_2>\cdots>\eta_T$, $\delta_1,\delta_2,\ldots,\delta_T$, and $\alpha_t\triangleq\delta_t/r,\ \forall t\in[T]$.
1:  Initialize $\bm{y}_1=\bm{0}$ (an internal variable of the algorithm).
2:  for $t=1,2,\ldots,T$ do
3:     Calculate this round's action $\bm{x}_t\in\mathcal{X}$, observe loss $\ell_t(\bm{x}_t)$, and update internal variable $\bm{y}_{t+1}$:
$$\bm{x}_t=\bm{y}_t+\delta_t\bm{s}_t,\quad\bm{y}_{t+1}=\text{Proj}_{(1-\alpha_t)\mathcal{X}}\left[\bm{y}_t-\eta_t\frac{d}{\delta_t}\ell_t(\bm{x}_t)\bm{s}_t\right],\tag{17}$$
where $\bm{s}_t\in\mathbb{R}^d$ is a uniformly sampled unit vector used to estimate gradients (Flaxman et al., 2005).
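As an illustration of the update in Equation 17, the following is a minimal Python sketch under a simplifying assumption that is not part of the paper: the action set $\mathcal{X}$ is taken to be the origin-centered Euclidean ball of radius $R$ (so $r=R$ and the projection has a closed form). The function names `sample_unit_vector`, `project_ball`, and `adabgd` are hypothetical, and the learning rates and perturbation sizes are passed in as plain lists rather than computed by the schedule of the paper.

```python
import numpy as np


def sample_unit_vector(d, rng):
    """Draw a vector uniformly from the unit sphere in R^d."""
    g = rng.standard_normal(d)
    return g / np.linalg.norm(g)


def project_ball(y, radius):
    """Euclidean projection onto the origin-centered ball of the given radius."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)


def adabgd(loss_fns, etas, deltas, R, d, seed=0):
    """Run the update of Equation 17 when X is the ball of radius R (assumption).

    Each loss is queried exactly once per round, at the played point x_t,
    which mimics bandit feedback. Returns the list of played actions.
    """
    rng = np.random.default_rng(seed)
    r = R  # for a ball, inner and outer radii coincide (assumption)
    y = np.zeros(d)  # internal variable y_1 = 0
    actions = []
    for loss, eta, delta in zip(loss_fns, etas, deltas):
        alpha = delta / r
        assert 0.0 < alpha < 1.0, "Lemma 4.3 requires alpha_t < 1"
        s = sample_unit_vector(d, rng)
        x = y + delta * s                    # perturbed action x_t
        value = loss(x)                      # the only feedback observed
        grad_est = (d / delta) * value * s   # one-point gradient estimate
        y = project_ball(y - eta * grad_est, (1.0 - alpha) * R)
        actions.append(x)
    return actions
```

In this ball setting, choosing non-increasing perturbation sizes `deltas` keeps every played action inside the ball, since the internal variable has norm at most $R-\delta_{t-1}$ after the previous projection.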
Lemma 4.3 (Guarantee of AdaBGD Algorithm).

Suppose that $r\mathbb{B}\subseteq\mathcal{X}\subseteq R\mathbb{B}$, the $t$-th loss $\ell_t$ is bounded by $C_t$ and is $L_t$-Lipschitz. Suppose that $\eta_t$ and $\delta_t$ are both $\mathcal{F}_{t-1}$-measurable (where $(\mathcal{F}_t)_{t=0}^{T}$ is the natural filtration), $\eta_1>\eta_2>\cdots>\eta_T$, and $\alpha_t\triangleq\delta_t/r<1$ a.s. for all $t\in[T]$. Then for any fixed $\bm{u}_1,\bm{u}_2,\ldots,\bm{u}_T\in\mathcal{X}$, the AdaBGD algorithm in Algorithm 4 enjoys the following guarantee:

$$\begin{aligned}
&\text{D-Regret}_{T}^{\text{BCO}}(\bm{u}_{1},\bm{u}_{2},\ldots,\bm{u}_{T})=\mathbb{E}\left[\sum_{t=1}^{T}\big(\ell_{t}(\bm{x}_{t})-\ell_{t}(\bm{u}_{t})\big)\right]\\
&\quad\leq\mathbb{E}\left[\frac{7R^{2}}{4\eta_{T}}+\frac{P_{T}R}{\eta_{T}}+\sum_{t=1}^{T}\left(\frac{\eta_{t}}{2}\frac{d^{2}}{\delta_{t}^{2}}C_{t}^{2}+3L_{t}\delta_{t}+L_{t}\alpha_{t}R\right)\right],
\end{aligned}$$

where $P_T=\sum_{t=1}^{T-1}\lVert\bm{u}_t-\bm{u}_{t+1}\rVert$ is the path length of the comparator sequence $\{\bm{u}_t\}_{t\in[T]}$.

Compared to the algorithm itself in Algorithm 4 and the guarantee in Lemma 4.3, our main innovation on the BCO side lies in the novel learning rate scheduling mechanism in Equation 11. Therefore, we do not prove Lemma 4.3 at this moment and postpone it to Section C.3. Instead, we prove:

Theorem 4.4 (Deciding $\bm{\lambda}(t)$ via AdaBGD Algorithm).

For the reference arrival rates $\{\mathring{\bm{\lambda}}(t)\}_{t\in[T]}$ defined in 2, suppose that their path length satisfies

$$P_t^{\lambda}\triangleq\sum_{s=1}^{t-1}\lVert\mathring{\bm{\lambda}}(s+1)-\mathring{\bm{\lambda}}(s)\rVert_1\leq C^{\lambda}t^{1/2-\delta_{\lambda}},\quad\forall t=1,2,\ldots,T,$$

where, similar to Theorem 3.6, $C^{\lambda}$ and $\delta_{\lambda}$ are assumed to be known constants but the precise $P_t^{\lambda}$ and $\{\mathring{\bm{\lambda}}(t)\}_{t\in[T]}$ both remain unknown. Suppose that the action set $\Lambda$ is bounded by $[r,R]$ (i.e., $r\mathbb{B}\subseteq\Lambda\subseteq R\mathbb{B}$). If we execute AdaBGD (Algorithm 4) over $\Lambda$ with loss functions $\ell_t(\bm{\lambda})=\langle\bm{Q}(t),\bm{\lambda}\rangle-Vg_t(\bm{\lambda})$ and parameters $\eta_t,\delta_t,\alpha_t$ defined in Equation 11 (restated below as Equation 18 to ease reading):

$$\begin{aligned}
\eta_t&=\left(C^{\lambda}T^{1/2-\delta_{\lambda}}\middle/\begin{subarray}{c}\left(C^{\lambda}T^{1/2-\delta_{\lambda}}\right)^{7/3}\left(4r^{-3}d^{2}\right)^{28/9}\left(M+R\right)^{4/3}+\\ C^{\lambda}T^{1/2-\delta_{\lambda}}\left(r^{-3}d^{2}VG^{2}/L\right)^{4/3}+\\ \sum_{s=1}^{t}\left((\lVert\bm{q}_s\rVert_\infty+VG)^{2}(\lVert\bm{q}_s\rVert_2+VL)^{2}\right)^{1/3}\end{subarray}\right)^{3/4},\\
\delta_t&=\left(\eta_t d^{2}\frac{(\lVert\bm{Q}(t)\rVert_\infty+VG)^{2}}{(\lVert\bm{Q}(t)\rVert_2+VL)}\right)^{1/3},\quad\alpha_t=\frac{\delta_t}{r},
\end{aligned}\tag{18}$$

then the outputs $\bm{\lambda}(1),\bm{\lambda}(2),\ldots,\bm{\lambda}(T)\in\Lambda$ of AdaBGD ensure

In words, the optimization objective in Equation 16 is ensured. The second term on the RHS is key because it ensures property ii); this is due to the definition of $\eta_t$ in Equation 11, which contains all historical magnitudes and Lipschitzness. Below, we give a quick overview of the proof and show why ii) is ensured by the $\eta_t$ defined in Equation 18. A formal version is included in Section C.4.

Proof Sketch of Theorem 4.4.

The terms $(\lVert\bm{Q}(t)\rVert_\infty+VG)$ and $(\lVert\bm{Q}(t)\rVert_2+VL)$ are the magnitude bound and the Lipschitz constant of $\ell_t$, respectively, which we denote by $C_t$ and $L_t$ to simplify notation.

First suppose that all conditions in Lemma 4.3 are satisfied by our $\{\eta_t\}_{t\in[T]}$. To balance the terms inside $\sum_{t=1}^{T}(\cdot)$ in Lemma 4.3 by fixing $\eta_t$ and varying $\delta_t$, one should pick $\delta_t=(\eta_t d^{2}C_t^{2}/L_t)^{1/3}$, which roughly gives (hiding many constant terms; see Equation 24 in the appendix for an accurate form):

$$\text{D-Regret}_{T}^{\text{BCO}}(\mathring{\bm{\lambda}}(1),\mathring{\bm{\lambda}}(2),\ldots,\mathring{\bm{\lambda}}(T))\lesssim\frac{C^{\lambda}T^{1/2-\delta_{\lambda}}}{\eta_T}+\sum_{t=1}^{T}\left(\eta_t C_t^{2}L_t^{2}\right)^{1/3}.\tag{19}$$
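As a check on the exponent $1/3$, the three per-round terms of Lemma 4.3 can be evaluated at $\delta_t=(\eta_t d^{2}C_t^{2}/L_t)^{1/3}$. The computation below is a direct substitution and only uses the definitions $\alpha_t=\delta_t/r$ and the bound of Lemma 4.3; the constants $1/2$, $3$, and $R/r$ that appear are the ones hidden by $\lesssim$ in Equation 19, together with the factor $d^{2/3}$.

```latex
\begin{align*}
\frac{\eta_t}{2}\frac{d^2}{\delta_t^2}C_t^2
  &= \frac{1}{2}\,\eta_t d^2 C_t^2\left(\eta_t d^2 C_t^2/L_t\right)^{-2/3}
   = \frac{1}{2}\left(\eta_t d^2 C_t^2 L_t^2\right)^{1/3},\\
3L_t\delta_t
  &= 3L_t\left(\eta_t d^2 C_t^2/L_t\right)^{1/3}
   = 3\left(\eta_t d^2 C_t^2 L_t^2\right)^{1/3},\\
L_t\alpha_t R
  &= \frac{R}{r}\,L_t\delta_t
   = \frac{R}{r}\left(\eta_t d^2 C_t^2 L_t^2\right)^{1/3}.
\end{align*}
```

All three terms therefore scale as $(\eta_t C_t^{2}L_t^{2})^{1/3}$, which is the summand in Equation 19.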

We derive in Lemma D.1 that $\sum_{t=1}^{T}\frac{x_t}{(\sum_{s\leq t}x_s)^{1/4}}\lesssim(\sum_{t=1}^{T}x_t)^{3/4}$, which is a variant of the famous summation lemma (Auer et al., 2002). Therefore, if we pick $\eta_t\approx\left(\frac{C^{\lambda}T^{1/2-\delta_{\lambda}}}{\sum_{s=1}^{t}(C_s^{2}L_s^{2})^{1/3}}\right)^{3/4}$ (i.e., only keeping the third term in the denominator of Equation 18), Equation 19 becomes $\mathcal{O}\big((C^{\lambda}T^{1/2-\delta_{\lambda}})^{1/4}\,\mathbb{E}[(\sum_{t=1}^{T}(C_t^{2}L_t^{2})^{1/3})^{3/4}]\big)$. When focusing on $T$-related terms, this is roughly $\mathcal{O}_T\big(T^{1/8-\delta_{\lambda}/4}\,\mathbb{E}[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1]^{7/8}\big)$, which allows a self-bounding analysis similar to Equation 6. However, such a configuration of $\{\eta_t\}_{t\in[T]}$ may not ensure the condition $\alpha_t=\delta_t/r=(\eta_t d^{2}C_t^{2}/L_t)^{1/3}/r<1$ in Lemma 4.3. The other two terms in the denominator of Equation 18 are added for this purpose. We refer the readers to Section C.4 for detailed verification. ∎
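The two ingredients of this sketch, the summation inequality and the simplified learning rate, can be checked numerically. The Python snippet below is a sketch under stated assumptions: `a` stands for the sequence $(C_t^{2}L_t^{2})^{1/3}$, `path_budget` stands for $C^{\lambda}T^{1/2-\delta_{\lambda}}$, all hidden constants in Equation 19 are dropped, and the explicit constant $4/3$ comes from a standard concavity argument for $S\mapsto S^{3/4}$ rather than from the statement of Lemma D.1. All function names are hypothetical.

```python
import numpy as np


def simplified_eta(path_budget, a):
    """eta_t = (path_budget / sum_{s<=t} a_s)^{3/4}: third denominator term only."""
    return (path_budget / np.cumsum(a)) ** 0.75


def regret_proxy(path_budget, a):
    """Right-hand side of Equation 19 (constants dropped) under simplified_eta."""
    eta = simplified_eta(path_budget, a)
    return path_budget / eta[-1] + np.sum(eta ** (1.0 / 3.0) * a)


def summation_lemma_sides(x):
    """Return (sum_t x_t / S_t^{1/4}, (4/3) * S_T^{3/4}) with S_t = x_1 + ... + x_t."""
    prefix = np.cumsum(x)
    lhs = np.sum(x / prefix ** 0.25)
    rhs = (4.0 / 3.0) * prefix[-1] ** 0.75
    return lhs, rhs
```

With this choice, both terms of the proxy are of order `path_budget ** 0.25 * a.sum() ** 0.75`, so the proxy is at most $7/3$ times that quantity for any positive sequence `a`.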

4.5 Main Theorem for Multi-Hop Utility Maximization

As sketched in Section 4.1, if we use a Lyapunov drift-plus-penalty analysis, exploit the network stability assumption, use AdaPFOL (Algorithm 2) to decide link allocations $\bm{a}(t)$, and use AdaBGD (Algorithm 4) to decide arrival rates $\bm{\lambda}(t)$, we get the following utility maximization guarantee.

Theorem 4.5 (Main Theorem for Multi-Hop Utility Maximization).

Suppose that the feasible set $\Lambda$ of arrival-rate vectors is bounded by $[r,R]$. Assume all (unknown) utility functions $g_t$ to be concave, $L$-Lipschitz, and $[-G,G]$-bounded. Consider a reference action sequence $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ satisfying 2, such that their path lengths satisfy

$$P_t^{a}\triangleq\sum_{s=1}^{t-1}\lVert\mathring{\bm{a}}(s)-\mathring{\bm{a}}(s+1)\rVert_1\leq C^{a}t^{1/2-\delta_a},\quad P_t^{\lambda}\triangleq\sum_{s=1}^{t-1}\lVert\mathring{\bm{\lambda}}(s)-\mathring{\bm{\lambda}}(s+1)\rVert_1\leq C^{\lambda}t^{1/2-\delta_{\lambda}},\quad\forall t\in[T].$$

Here, $M,R,r,L,G,C^{a},\delta_a,C^{\lambda},\delta_{\lambda}$ are assumed to be known constants, whereas the specific $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ remains unknown. If we execute the UMO2 framework in Algorithm 3 with the AdaPFOL sub-routine given in Algorithm 2 and the AdaBGD sub-routine given in Algorithm 4, when $T$ is large enough such that the constant $V=o_T(\min\{T^{2\delta_a/3},T^{2\delta_{\lambda}/7}\})$, the following inequalities hold simultaneously:

That is, when $T\gg 0$, our algorithm not only stabilizes the system so that $\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]=\mathcal{O}_T(1)$, but also enjoys an average utility approaching that of the reference policy polynomially fast, i.e., $\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\big(g_t(\mathring{\bm{\lambda}}(t))-g_t(\bm{\lambda}(t))\big)\right]=\mathcal{O}_T(V^{-1})$, so the utility maximization objective in Equation 3 is ensured.

Remark 5.

The condition $V=o_T(\min\{T^{2\delta_a/3},T^{2\delta_{\lambda}/7}\})$ says $T$ cannot be too small compared to $V$, which was not an issue in SNO as people often let $T$ approach infinity (Neely et al., 2008). Although this condition looks restrictive, $V$ can still be as large as a polynomial in $T$, so $\mathcal{O}_T(V^{-1})$ means a polynomially decaying gap between our utility and that of any mildly varying policy whose path lengths are small. To our knowledge, this is the first guarantee that applies to utility maximization tasks in adversarial networks under bandit feedback. Similar to the discussions in Remark 4, due to non-stationary environments and bandit feedback, it is highly non-trivial to define an “optimal reference policy” in ANO. Nevertheless, our mildly varying reference policy class includes optimal policies for SNO settings.
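To make the admissible range of $V$ concrete, the small Python helper below computes the exponent $\min\{2\delta_a/3,\,2\delta_{\lambda}/7\}$ and picks one polynomial choice of $V$ strictly below it. The helper names and the `slack` parameter are hypothetical and serve only as an illustration; the theorem itself only requires $V=o_T(\min\{T^{2\delta_a/3},T^{2\delta_{\lambda}/7}\})$.

```python
def max_v_exponent(delta_a, delta_lam):
    """Exponent c such that V must grow strictly slower than T**c."""
    return min(2.0 * delta_a / 3.0, 2.0 * delta_lam / 7.0)


def pick_v(T, delta_a, delta_lam, slack=0.5):
    """Illustrative choice V = T**(slack * c) with 0 < slack < 1, so V = o(T**c)."""
    if not 0.0 < slack < 1.0:
        raise ValueError("slack must lie strictly between 0 and 1")
    return T ** (slack * max_v_exponent(delta_a, delta_lam))
```

For instance, with $\delta_a=0.3$ and $\delta_{\lambda}=0.35$ the binding exponent is $2\delta_{\lambda}/7=0.1$, so any $V=T^{c'}$ with $c'<0.1$ is admissible and yields a utility gap of order $T^{-c'}$.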

The full proof of Theorem 4.5 is presented in Section C.5. We outline three key steps here.

Proof Sketch of Theorem 4.5.

The first step of the proof is plugging the algorithmic guarantees for AdaPFOL (Theorem 3.5) and for AdaBGD (Theorem 4.4) into the reference policy assumption (Equation 14) and then making use of the Lyapunov DPP guarantee (Equation 15). We present the detailed derivation in Lemma C.7. The conclusion reads

\begin{align}
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]
&\leq-\frac{V}{\epsilon_{W}}\mathbb{E}\left[\sum_{t=1}^{T}\left(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\right)\right]+f(T)+{}\nonumber\\
&\quad g(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{3/4}\log\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]+h(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{7/8},\tag{20}
\end{align}

where $f(T)=\mathcal{O}_{T}(T)$, $g(T)=\mathcal{O}_{T}(T^{1/4-\delta_{a}/2})$, and $h(T)=\mathcal{O}_{T}(T^{1/8-\delta_{\lambda}/4})$.

Step 1 (Develop a Coarse Average Queue Length Bound).

By the boundedness of $g_{t}$, the first term on the RHS of Equation 20 is controlled by $\frac{V}{\epsilon_{W}}TG=\mathcal{O}_{T}(VT)$ (note that, as $V=\text{poly}(T)$, $V$ is not a constant that can be hidden in $\mathcal{O}_{T}$, and this term is actually super-linear). Similar to the one used when proving Theorem 3.6, we develop another self-bounding property, which says that $y\leq f+y^{3/4}g\log y+y^{7/8}h$ implies $y=\mathcal{O}(f)+\widetilde{\mathcal{O}}(g^{4})+\mathcal{O}(h^{8})$ (see Lemma D.6). Therefore,

\[
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]=\mathcal{O}_{T}\left(VT+T\right)+\widetilde{\mathcal{O}}_{T}\left(T^{1-2\delta_{a}}\right)+\mathcal{O}_{T}\left(T^{1-2\delta_{\lambda}}\right)=\mathcal{O}_{T}(VT),
\]

which only gives a bound of $\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]=\mathcal{O}_{T}(V)=\omega_{T}(1)$ on the average queue length (violating the system stability condition in Equation 2). However, this inequality can be used to derive the polynomial convergence result on the utility, which in turn further refines the queue length bound.
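To get a feel for the self-bounding property, the sketch below numerically locates a point where $y=f+y^{3/4}g\log y+y^{7/8}h$ by bisection. It is only an illustration under our own assumptions ($f\geq e$ and $g,h\geq 0$); the names `phi` and `crossing_point` are hypothetical, and the code does not reproduce the argument of Lemma D.6.

```python
import math

def phi(y, f, g, h):
    """Right-hand side of the self-bounding inequality y <= f + g*y^(3/4)*log(y) + h*y^(7/8)."""
    return f + g * y ** 0.75 * math.log(y) + h * y ** 0.875

def crossing_point(f, g, h, iters=200):
    """Bisection for a point where phi(y) - y changes sign (hypothetical helper).

    Assumes f >= e and g, h >= 0, so that phi(f) >= f holds at the left end.
    """
    if f < math.e or g < 0 or h < 0:
        raise ValueError("sketch assumes f >= e and g, h >= 0")
    lo, hi = f, 2.0 * f
    while phi(hi, f, g, h) >= hi:   # phi is sublinear in y, so this loop terminates
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):          # invariant: phi(lo) >= lo and phi(hi) < hi
        mid = 0.5 * (lo + hi)
        if phi(mid, f, g, h) >= mid:
            lo = mid
        else:
            hi = mid
    return lo

# With g = 0 the inequality reduces to y <= f + h*y^(7/8); the crossing sits near h^8.
y_star = crossing_point(f=100.0, g=0.0, h=10.0)
print(y_star / 10.0 ** 8)
```

In the $g=0$ case the crossing lies just above $h^{8}$, matching the $\mathcal{O}(h^{8})$ term. When $g>0$ the logarithm and the hidden constants make the crossing considerably larger than $g^{4}$, which is why that term carries a $\widetilde{\mathcal{O}}$.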

Step 2 (Yield Polynomial Convergence on the Utility).

Moving the difference in the average utility to the LHS in Equation 20 and plugging in the just-derived bound on average queue length, we have

\begin{align*}
\frac{V}{\epsilon_{W}T}\mathbb{E}\left[\sum_{t=1}^{T}\left(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\right)\right]
&=\mathcal{O}_{T}\left(-0+\frac{f(T)}{T}+\frac{g(T)}{T}(VT)^{3/4}+\frac{h(T)}{T}(VT)^{7/8}\right)\\
&=\mathcal{O}_{T}\left(1+\left(\frac{T^{1-2\delta_{a}}V^{3}}{T}\right)^{1/4}+\left(\frac{T^{1-2\delta_{\lambda}}V^{7}}{T}\right)^{1/8}\right).
\end{align*}

According to the assumption that $V=\mathcal{O}_{T}(\min\{T^{2\delta_{a}/3},T^{2\delta_{\lambda}/7}\})$, the RHS is of order $\mathcal{O}_{T}(1)$. Thus

\[
\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\left(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\right)\right]=\mathcal{O}_{T}(V^{-1}),
\]

i.e., a polynomial convergence rate on the expected average utility is derived.
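For readers who want to check the exponents, the simplification between the two lines of the display in Step 2 can be spelled out as follows. This is only our own bookkeeping under the stated orders $g(T)=\mathcal{O}_{T}(T^{1/4-\delta_{a}/2})$ and $h(T)=\mathcal{O}_{T}(T^{1/8-\delta_{\lambda}/4})$, ignoring logarithmic factors as the sketch does.

```latex
\begin{align*}
\frac{g(T)}{T}(VT)^{3/4}
  &= \mathcal{O}_{T}\left(T^{\frac{1}{4}-\frac{\delta_{a}}{2}}\cdot V^{\frac{3}{4}}T^{\frac{3}{4}}\cdot T^{-1}\right)
   = \mathcal{O}_{T}\left(T^{-\frac{\delta_{a}}{2}}V^{\frac{3}{4}}\right)
   = \mathcal{O}_{T}\left(\left(T^{-2\delta_{a}}V^{3}\right)^{1/4}\right),\\
\frac{h(T)}{T}(VT)^{7/8}
  &= \mathcal{O}_{T}\left(T^{\frac{1}{8}-\frac{\delta_{\lambda}}{4}}\cdot V^{\frac{7}{8}}T^{\frac{7}{8}}\cdot T^{-1}\right)
   = \mathcal{O}_{T}\left(T^{-\frac{\delta_{\lambda}}{4}}V^{\frac{7}{8}}\right)
   = \mathcal{O}_{T}\left(\left(T^{-2\delta_{\lambda}}V^{7}\right)^{1/8}\right).
\end{align*}
```

Both bounds are of order $\mathcal{O}_{T}(1)$ when $V^{3}=\mathcal{O}_{T}(T^{2\delta_{a}})$ and $V^{7}=\mathcal{O}_{T}(T^{2\delta_{\lambda}})$, which is where the exponents $2\delta_{a}/3$ and $2\delta_{\lambda}/7$ in the condition on $V$ come from. Dividing the resulting $\mathcal{O}_{T}(1)$ bound by $V/\epsilon_{W}$ then gives the $\mathcal{O}_{T}(V^{-1})$ rate.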

Step 3 (Refine the Average Queue Length Bound).

Now we are ready to refine our average queue length bound. Instead of controlling the utility with the boundedness $g_{t}\in[-G,G]$, we utilize the just-derived convergence result and obtain (again using the self-bounding property in Lemma D.6)

\[
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]=\mathcal{O}_{T}\left(V\times V^{-1}T+T\right)+\widetilde{\mathcal{O}}_{T}\left(T^{1-2\delta_{a}}\right)+\mathcal{O}_{T}\left(T^{1-2\delta_{\lambda}}\right)=\mathcal{O}_{T}(T),
\]

that is, the average queue length satisfies $\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]=\mathcal{O}_{T}(1)$, which means Equation 2 holds and the system is stable. Putting Step 2 and Step 3 together gives our conclusion.

Once again, we omitted all factors except for the $\text{poly}(T)$ ones throughout this proof sketch. Please refer to Section C.5 for the complete version. ∎
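The three steps can be mirrored at the level of exponents of $T$, which makes the bootstrapping easy to follow. The sketch below is our own bookkeeping aid, not part of the proof: it assumes $V=T^{v}$, drops all logarithmic and constant factors, and uses hypothetical helper names and illustrative values for $\delta_{a}$, $\delta_{\lambda}$, and $v$.

```python
from fractions import Fraction

def self_bounding_exponent(f_exp, g_exp, h_exp):
    """Exponent of T in y = O(f) + O~(g^4) + O(h^8), ignoring log factors."""
    return max(f_exp, 4 * g_exp, 8 * h_exp)

def three_step_exponents(delta_a, delta_lam, v):
    """Track exponents of T through Steps 1-3 when V = T^v (hypothetical helper)."""
    g_exp = Fraction(1, 4) - delta_a / 2      # g(T) = O(T^(1/4 - delta_a/2))
    h_exp = Fraction(1, 8) - delta_lam / 4    # h(T) = O(T^(1/8 - delta_lam/4))

    # Step 1: bound the utility term crudely by O(VT), so the "f" part is O(VT + T).
    coarse = self_bounding_exponent(max(1 + v, 1), g_exp, h_exp)

    # Step 2: divide Equation 20 by T, plug in the coarse bound, then divide by V.
    rhs = max(0, g_exp - 1 + Fraction(3, 4) * coarse, h_exp - 1 + Fraction(7, 8) * coarse)
    gap = rhs - v

    # Step 3: the utility term is now O(V * T * T^gap) instead of O(VT).
    refined = self_bounding_exponent(max(v + 1 + gap, 1), g_exp, h_exp)
    return coarse, gap, refined

# v at the threshold 1/7, then v above it (all values are illustrative assumptions).
print(three_step_exponents(Fraction(1, 2), Fraction(1, 2), Fraction(1, 7)))
print(three_step_exponents(Fraction(1, 2), Fraction(1, 2), Fraction(1, 3)))
```

With $\delta_{a}=\delta_{\lambda}=1/2$ and $v=1/7$, the exponents are $8/7$ for the coarse queue bound, $-1/7$ for the utility gap, and $1$ for the refined queue bound, i.e., an $\mathcal{O}_{T}(T)$ total queue length. With $v=1/3$, which exceeds the threshold, the refined exponent stays above $1$, so this bookkeeping no longer certifies stability.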

5 Conclusion

We study utility maximization in Adversarial Network Optimization (ANO) under bandit feedback. We design a network stability algorithm NSO and a utility maximization algorithm UMO2, both of which integrate online learning components into the Lyapunov drift framework to allow a joint analysis. When designing the online learning components of UMO2, due to the potentially unbounded queue lengths in network optimization and the self-bounding analysis we want to conduct, we develop a novel OLO algorithm AdaPFOL, which adapts to occasionally large losses, and a BCO algorithm AdaBGD, which suits large loss magnitudes and Lipschitzness via a meticulous learning rate scheduling scheme. One important future research direction is to define alternative reference policy classes that allow competing with more policies, even the optimal ones.

References

  • Andrews and Zhang (2004) Matthew Andrews and Lisa Zhang. Scheduling over nonstationary wireless channels with finite rate sets. In IEEE INFOCOM 2004, volume 3, pages 1694–1704. IEEE, 2004.
  • Andrews and Zhang (2005) Matthew Andrews and Lisa Zhang. Scheduling over a time-varying user-dependent channel with applications to high-speed wireless data. Journal of the ACM (JACM), 52(5):809–834, 2005.
  • Andrews et al. (2001) Matthew Andrews, Baruch Awerbuch, Antonio Fernández, Tom Leighton, Zhiyong Liu, and Jon Kleinberg. Universal-stability results and performance bounds for greedy contention-resolution protocols. Journal of the ACM (JACM), 48(1):39–69, 2001.
  • Andrews et al. (2007) Matthew Andrews, Kyomin Jung, and Alexander Stolyar. Stability of the max-weight routing and scheduling protocol in dynamic networks and at critical loads. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 145–154, 2007.
  • Ashjaei et al. (2021) Mohammad Ashjaei, Lucia Lo Bello, Masoud Daneshtalab, Gaetano Patti, Sergio Saponara, and Saad Mubeen. Time-sensitive networking in automotive embedded systems: State of the art and research opportunities. Journal of systems architecture, 117:102137, 2021.
  • Auer (2002) Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397–422, 2002.
  • Auer et al. (2002) Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. The nonstochastic multiarmed bandit problem. SIAM journal on computing, 32(1):48–77, 2002.
  • Borodin et al. (2001) Allan Borodin, Jon Kleinberg, Prabhakar Raghavan, Madhu Sudan, and David P Williamson. Adversarial queuing theory. Journal of the ACM (JACM), 48(1):13–38, 2001.
  • Chen and Giannakis (2018) Tianyi Chen and Georgios B Giannakis. Bandit convex optimization for scalable and dynamic IoT management. IEEE Internet of Things Journal, 6(1):1276–1286, 2018.
  • Cholvi and Echagüe (2007) Vicent Cholvi and Juan Echagüe. Stability of FIFO networks under adversarial models: State of the art. Computer Networks, 51(15):4460–4474, 2007.
  • Choudhury et al. (2021) Tuhinangshu Choudhury, Gauri Joshi, Weina Wang, and Sanjay Shakkottai. Job dispatching policies for queueing systems with unknown service rates. In Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pages 181–190, 2021.
  • Cruz (1991) Rene L Cruz. A calculus for network delay. i. network elements in isolation. IEEE Transactions on information theory, 37(1):114–131, 1991.
  • Cutkosky (2020) Ashok Cutkosky. Parameter-free, dynamic, and strongly-adaptive online learning. In International Conference on Machine Learning, pages 2250–2259. PMLR, 2020.
  • Dai and Gluzman (2022) Jim G Dai and Mark Gluzman. Queueing network controls via deep reinforcement learning. Stochastic Systems, 12(1):30–67, 2022.
  • Dai et al. (2023) Yan Dai, Haipeng Luo, Chen-Yu Wei, and Julian Zimmert. Refined regret for adversarial MDPs with linear function approximation. In International Conference on Machine Learning, pages 6726–6759. PMLR, 2023.
  • Duchi et al. (2011) John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12(7), 2011.
  • Flaxman et al. (2005) Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pages 385–394, 2005.
  • Fu and Modiano (2022) Xinzhe Fu and Eytan Modiano. Joint learning and control in stochastic queueing networks with unknown utilities. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6(3):1–32, 2022.
  • Gaddam et al. (2020) Anuroop Gaddam, Tim Wilkin, Maia Angelova, and Jyotheesh Gaddam. Detecting sensor faults, anomalies and outliers in the internet of things: A survey on the challenges and solutions. Electronics, 9(3):511, 2020.
  • Harrison and Wein (1990) J Michael Harrison and Lawrence M Wein. Scheduling networks of queues: Heavy traffic analysis of a two-station closed network. Operations research, 38(6):1052–1064, 1990.
  • Huang et al. (2024) Jiatai Huang, Leana Golubchik, and Longbo Huang. When Lyapunov drift based queue scheduling meets adversarial bandit learning. IEEE/ACM Transactions on Networking, 2024.
  • Huang and Neely (2011) Longbo Huang and Michael J Neely. Utility optimal scheduling in processing networks. Performance Evaluation, 68(11):1002–1021, 2011.
  • Huang et al. (2012) Longbo Huang, Scott Moeller, Michael J Neely, and Bhaskar Krishnamachari. LIFO-backpressure achieves near-optimal utility-delay tradeoff. IEEE/ACM Transactions On Networking, 21(3):831–844, 2012.
  • Khan et al. (2020) Md Rizwan Khan, Bikramaditya Das, and Bibhuti Bhusan Pati. Channel estimation strategies for underwater acoustic (UWA) communication: An overview. Journal of the Franklin Institute, 357(11):7229–7265, 2020.
  • Krishnasamy et al. (2018) Subhashini Krishnasamy, PT Akhil, Ari Arapostathis, Rajesh Sundaresan, and Sanjay Shakkottai. Augmenting max-weight with explicit learning for wireless scheduling with switching costs. IEEE/ACM Transactions on Networking, 26(6):2501–2514, 2018.
  • Krishnasamy et al. (2021) Subhashini Krishnasamy, Rajat Sen, Ramesh Johari, and Sanjay Shakkottai. Learning unknown service rates in queues: A multiarmed bandit approach. Operations research, 69(1):315–330, 2021.
  • Liang and Modiano (2018a) Qingkai Liang and Eytan Modiano. Network utility maximization in adversarial environments. In IEEE INFOCOM 2018-IEEE Conference on Computer Communications, pages 594–602. IEEE, 2018a.
  • Liang and Modiano (2018b) Qingkai Liang and Eytan Modiano. Minimizing queue length regret under adversarial network models. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2(1):1–32, 2018b.
  • Lim et al. (2013) Sungsu Lim, Kyomin Jung, and Matthew Andrews. Stability of the max-weight protocol in adversarial wireless networks. IEEE/ACM Transactions on Networking, 22(6):1859–1872, 2013.
  • Liu et al. (2022) Bai Liu, Qiaomin Xie, and Eytan Modiano. RL-QN: A reinforcement learning framework for optimal control of queueing systems. ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 7(1):1–35, 2022.
  • Maguluri et al. (2012) Siva Theja Maguluri, Rayadurgam Srikant, and Lei Ying. Stochastic models of load balancing and scheduling in cloud computing clusters. In 2012 Proceedings IEEE Infocom, pages 702–710. IEEE, 2012.
  • McMahan and Streeter (2010) H Brendan McMahan and Matthew Streeter. Adaptive bound optimization for online convex optimization. Annual Conference on Learning Theory 2010, page 244, 2010.
  • Neely (2008) Michael J Neely. Order optimal delay for opportunistic scheduling in multi-user wireless uplinks and downlinks. IEEE/ACM Transactions on Networking, 16(5):1188–1199, 2008.
  • Neely (2009) Michael J Neely. Delay analysis for max weight opportunistic scheduling in wireless systems. IEEE Transactions on Automatic Control, 54(9):2137–2150, 2009.
  • Neely (2010a) Michael J Neely. Stochastic network optimization with application to communication and queueing systems. Synthesis Lectures on Communication Networks, 3(1):1–211, 2010a.
  • Neely (2010b) Michael J Neely. Universal scheduling for networks with arbitrary traffic, channels, and mobility. In 49th IEEE Conference on Decision and Control (CDC), pages 1822–1829. IEEE, 2010b.
  • Neely et al. (2008) Michael J Neely, Eytan Modiano, and Chih-Ping Li. Fairness and optimal stochastic control for heterogeneous networks. IEEE/ACM Transactions On Networking, 16(2):396–409, 2008.
  • Neely et al. (2012) Michael J Neely, Scott T Rager, and Thomas F La Porta. Max weight learning algorithms for scheduling in unknown environments. IEEE Transactions on Automatic Control, 57(5):1179–1191, 2012.
  • Rahdar et al. (2018) Mohammad Rahdar, Lizhi Wang, and Guiping Hu. A tri-level optimization model for inventory control with uncertain demand and lead time. International Journal of Production Economics, 195:96–105, 2018.
  • Sadiq and De Veciana (2009) Bilal Sadiq and Gustavo De Veciana. Throughput optimality of delay-driven maxweight scheduler for a wireless system with flow dynamics. In 2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1097–1102. IEEE, 2009.
  • Srikant and Ying (2013) Rayadurgam Srikant and Lei Ying. Communication networks: an optimization, control, and stochastic networks perspective. Cambridge University Press, 2013.
  • Tsibonis et al. (2003) Vagelis Tsibonis, Leonidas Georgiadis, and Leandros Tassiulas. Exploiting wireless channel state information for throughput maximization. In IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No. 03CH37428), volume 1, pages 301–310. IEEE, 2003.
  • Wei et al. (2024) Honghao Wei, Xin Liu, Weina Wang, and Lei Ying. Sample efficient reinforcement learning in mixed systems through augmented samples and its applications to queueing networks. Advances in Neural Information Processing Systems, 36, 2024.
  • Yang et al. (2023) Zixian Yang, R Srikant, and Lei Ying. Learning while scheduling in multi-server systems with unknown statistics: Maxweight with discounted UCB. In International Conference on Artificial Intelligence and Statistics, pages 4275–4312. PMLR, 2023.
  • Zhao et al. (2021) Peng Zhao, Guanghui Wang, Lijun Zhang, and Zhi-Hua Zhou. Bandit convex optimization in non-stationary environments. Journal of Machine Learning Research, 22(125):1–45, 2021.
  • Zheng et al. (2019) Kai Zheng, Haipeng Luo, Ilias Diakonikolas, and Liwei Wang. Equipping experts/bandits with long-term memory. Advances in Neural Information Processing Systems, 32, 2019.
  • Zinkevich (2003) Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 928–936, 2003.
  • Zou et al. (2016) Yulong Zou, Jia Zhu, Xianbin Wang, and Lajos Hanzo. A survey on wireless security: Technical challenges, recent advances, and future trends. Proceedings of the IEEE, 104(9):1727–1765, 2016.

Appendix A Additional Related Works

Adversarial Components in Network Optimization

The study of adversarial components in network optimization dates back to the 1990s, when Cruz (1991) gave the first network model with adversarial dynamics. This was further generalized to the Adversarial Queueing Theory (Borodin et al., 2001) and Leaky Bucket (Andrews et al., 2001) models. Many follow-up works, as surveyed by Cholvi and Echagüe (2007), considered network optimization under various types of adversarial traffic injections (i.e., the arrival rates to each queue are adversarial). However, these early works only considered adversarial arrival rates and assumed that link conditions are stationary, which cannot capture the fact that wireless communication networks can have very different link conditions from time to time due to congestion (Zou et al., 2016).

Noticing this shortcoming, Andrews and Zhang (2004) and Andrews and Zhang (2005) studied a single-hop network where link conditions are also adversarial. Their works were extended to multi-hop networks by Andrews et al. (2007) and Lim et al. (2013). For a more up-to-date discussion on these works, we refer the readers to the discussions in (Liang and Modiano, 2018b).

Utility Maximization in Adversarial Networks. While more and more results consider network stability in adversarial networks, utility maximization guarantees are less common. Neely (2010b) proposed the universal network utility maximization problem, which considers competing with a look-ahead policy that has perfect knowledge about the near future. Liang and Modiano (2018a) generalized the aforementioned stability requirement in another way and showed a trade-off between network stability and utility maximization. However, both papers assumed perfect knowledge of network conditions, as summarized in Table 1.

Feedback Models. Most previous works considered the perfect knowledge model, which assumes that network conditions are known before the decision (Liang and Modiano, 2018b; a). Despite its simplicity, this assumption eliminates the hardness of estimating the network topology or link conditions, which we argue is highly non-trivial due to the unpredictability in a drastically varying network like underwater communications (Khan et al., 2020) or IoT systems (Gaddam et al., 2020). The harder full-information feedback model assumes no prior knowledge at decision-making but requires the network conditions to be fully revealed after the decision; a variant of this model was proposed by Neely et al. (2012) under the name of the 2-stage decision model. In this paper, we consider the hardest bandit feedback model, which rules out all counterfactual feedback associated with actions that are not actually deployed. This model, under various names, was recently proposed and considered by Fu and Modiano (2022), Yang et al. (2023), and Huang et al. (2024).

Adversarial Networks under Bandit Feedback. As far as we are aware, the papers closest to ours are the ones by Huang et al. (2024) and Yang et al. (2023), which also studied adversarial networks under bandit feedback. However, both papers assumed a single-hop network model, i.e., jobs immediately leave the network upon being served. This model falls short of reflecting the reality where some jobs may be forwarded within the network for many hops, for example, in the classical criss-cross network extensively studied in the SNO literature (Harrison and Wein, 1990). In contrast, our multi-hop model allows jobs to be forwarded from one server to another and is much more general.

Moreover, both papers only considered the task of network stability, i.e., the number of jobs remaining in the network (which is the sum of queue lengths) does not diverge; see Equation 2. However, only stabilizing the system may not be enough in many realistic problems, where the throughput (average number of jobs getting served) (Tsibonis et al., 2003, Sadiq and De Veciana, 2009) or delay (average waiting time for each job from entering the system to being served) (Neely, 2008; 2009) should be optimized. In our paper, we consider the utility maximization task (Huang and Neely, 2011, Huang et al., 2012) where an abstract utility function shall be optimized, allowing various network optimization objectives other than simply stabilizing the system.

Learning-Augmented Algorithms in Network Optimization. Finally, we give a brief overview of recent learning-augmented algorithms in network optimization. To tackle the lack of accurate channel information (e.g., under the feedback model), exploration approaches like $\epsilon$-greedy (Krishnasamy et al., 2018; 2021) or Upper Confidence Bound (UCB) (Auer, 2002) (e.g., (Choudhury et al., 2021, Krishnasamy et al., 2021, Yang et al., 2023)) were widely used. More recently, using a Reinforcement Learning (RL) approach, Liu et al. (2022) proposed the RL-QN algorithm by taking the queue lengths as the state in RL, which outperforms many existing SNO algorithms. Empirically, utilizing the recent advances in Deep RL (DRL), Dai and Gluzman (2022) established state-of-the-art control performance in the criss-cross network. Following this line, Wei et al. (2024) compressed the number of states and yielded improved performance in other networks as well. However, most of the aforementioned works only considered the stochastic regime. Moreover, RL approaches (especially the DRL ones) typically have extremely large state spaces when modelling network optimization problems, making the algorithms computationally infeasible. In contrast, our online-learning-based approach makes the algorithm not only capable of handling adversarial environments and bandit feedback but also computationally efficient.

Appendix B Omitted Proofs for Multi-Hop Network Stability Tasks

Before presenting the theorem proofs, we first give a bound on the queue length increments.

Lemma B.1 (Queue Length Increment).

For every $n\in\mathcal{N}$, $k\in\mathcal{N}$, and $t\in[T]$, we have

\[
\lvert Q_{n}^{(k)}(t+1)-Q_{n}^{(k)}(t)\rvert\leq(2NM+R).
\]
Proof.

According to the queue length dynamics in Equation 1, we know

\[
\lvert Q_{n}^{(k)}(t+1)-Q_{n}^{(k)}(t)\rvert\leq\left\lvert\sum_{(n,m)\in\mathcal{L}}\mu_{n,m}^{(k)}(t)+\sum_{(o,n)\in\mathcal{L}}\mu_{o,n}^{(k)}(t)+\lambda_{n}^{(k)}(t)\right\rvert\leq(2NM+R),
\]

which utilizes the assumption that λn(k)(t)[0,R]superscriptsubscript𝜆𝑛𝑘𝑡0𝑅\lambda_{n}^{(k)}(t)\in[0,R]italic_λ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ( italic_t ) ∈ [ 0 , italic_R ] and μ(n,m)(k)(t)[0,M]superscriptsubscript𝜇𝑛𝑚𝑘𝑡0𝑀\mu_{(n,m)}^{(k)}(t)\in[0,M]italic_μ start_POSTSUBSCRIPT ( italic_n , italic_m ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ( italic_t ) ∈ [ 0 , italic_M ]. ∎
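As a numerical sanity check of Lemma B.1, the following minimal sketch simulates one possible queue recursion and records the largest one-step change. It is an illustration only: the complete-graph topology, the uniformly random rates, and the exact recursion $Q_n^{(k)}(t+1)=[Q_n^{(k)}(t)-\sum_{(n,m)}\mu_{n,m}^{(k)}(t)]_{+}+\sum_{(o,n)}\mu_{o,n}^{(k)}(t)+\lambda_n^{(k)}(t)$ are assumptions made here (Equation 1 may differ in details), and the function name `simulate_max_increment` is hypothetical.

```python
import random


def simulate_max_increment(N=4, M=2.0, R=3.0, T=200, seed=0):
    """Simulate an assumed queue recursion; return the largest one-step change."""
    rng = random.Random(seed)
    # Hypothetical topology: every ordered pair of distinct nodes is a link.
    links = [(n, m) for n in range(N) for m in range(N) if n != m]
    Q = [[0.0] * N for _ in range(N)]  # Q[n][k]: queue at node n for commodity k
    worst = 0.0
    for _ in range(T):
        # Per-link, per-commodity service in [0, M]; arrivals in [0, R].
        mu = {(link, k): rng.uniform(0.0, M) for link in links for k in range(N)}
        lam = [[rng.uniform(0.0, R) for _ in range(N)] for _ in range(N)]
        new_Q = [[0.0] * N for _ in range(N)]
        for n in range(N):
            for k in range(N):
                out_rate = sum(mu[((n, m), k)] for m in range(N) if m != n)
                in_rate = sum(mu[((o, n), k)] for o in range(N) if o != n)
                # Assumed dynamics: serve first (truncated at zero), then add inflow.
                new_Q[n][k] = max(Q[n][k] - out_rate, 0.0) + in_rate + lam[n][k]
                worst = max(worst, abs(new_Q[n][k] - Q[n][k]))
        Q = new_Q
    return worst


if __name__ == "__main__":
    N, M, R = 4, 2.0, 3.0
    print(simulate_max_increment(N, M, R), "<=", 2 * N * M + R)
```

Under these assumptions the observed increment never exceeds $2NM+R$, since each queue loses at most $NM$ and gains at most $NM+R$ per round. The bound is typically loose in such a simulation.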

B.1 Reference Policy Assumption (Proof of Lemma 3.1)

Lemma B.2 (Restatement of Lemma 3.1; Ability of $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ in Stabilizing the Network).

If $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ satisfies Assumption 1, then for any scheduler-generated queue lengths $\{\bm{Q}(t)\}_{t\in[T]}$,

$$
\begin{aligned}
&\quad\epsilon_{W}\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}Q_{n}^{(k)}(t)\right]-\left(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R)\right)C_{W}T\\
&\leq-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in{\mathcal{L}}}\sum_{k\in{\mathcal{N}}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)\left(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\right)\right]-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}Q_{n}^{(k)}(t)\lambda_{n}^{(k)}(t)\right].
\end{aligned}
$$
Proof.

The proof closely follows that of Huang et al. (2024, Lemma 2); we adapt it here for completeness. For each interval $W_{j}$ in Assumption 1, let $T_{0}$ be the first round in $W_{j}$. Then

$$
\begin{aligned}
&\quad\sum_{t\in W_{j}}\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}Q_{n}^{(k)}(t)\left(\sum_{(n,m)\in{\mathcal{L}}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)-\lambda_{n}^{(k)}(t)-\sum_{(o,n)\in{\mathcal{L}}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)\right)\\
&=\sum_{t\in W_{j}}\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}Q_{n}^{(k)}(T_{0})\left(\sum_{(n,m)\in{\mathcal{L}}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)-\lambda_{n}^{(k)}(t)-\sum_{(o,n)\in{\mathcal{L}}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)\right)+{}\\
&\quad\sum_{t\in W_{j}}\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}\left(Q_{n}^{(k)}(t)-Q_{n}^{(k)}(T_{0})\right)\left(\sum_{(n,m)\in{\mathcal{L}}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)-\lambda_{n}^{(k)}(t)-\sum_{(o,n)\in{\mathcal{L}}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)\right).
\end{aligned}
$$

For the first summation, Assumption 1 gives the following for every $n\in{\mathcal{N}}$ and $k\in{\mathcal{N}}$:

$$\sum_{t\in W_{j}}\left(\sum_{(n,m)\in{\mathcal{L}}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)-\lambda_{n}^{(k)}(t)-\sum_{(o,n)\in{\mathcal{L}}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)\right)\geq\epsilon_{W}\lvert W_{j}\rvert.$$

Thus, the first summation enjoys the following lower bound:

$$
\begin{aligned}
&\quad\sum_{t\in W_{j}}\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}Q_{n}^{(k)}(T_{0})\left(\sum_{(n,m)\in{\mathcal{L}}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)-\lambda_{n}^{(k)}(t)-\sum_{(o,n)\in{\mathcal{L}}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)\right)\\
&\geq\left(\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}Q_{n}^{(k)}(T_{0})\right)\epsilon_{W}\lvert W_{j}\rvert=\epsilon_{W}\lvert W_{j}\rvert\times\lVert\bm{Q}(T_{0})\rVert_{1}\geq\epsilon_{W}\sum_{t\in W_{j}}\lVert\bm{Q}(t)\rVert_{1}-\epsilon_{W}N^{2}(2NM+R)(\lvert W_{j}\rvert-1)^{2},
\end{aligned}
$$

where the last step uses the fact that $\lVert\bm{Q}(t)\rVert_{1}-\lVert\bm{Q}(T_{0})\rVert_{1}\leq\lVert\bm{Q}(t)-\bm{Q}(T_{0})\rVert_{1}$ and the queue length increment bound in Lemma B.1.
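The arithmetic behind this step can be spelled out as follows. This is a sketch of the omitted computation, assuming that $W_{j}$ consists of consecutive rounds starting at $T_{0}$ (so that $t-T_{0}$ ranges over $0,1,\ldots,\lvert W_{j}\rvert-1$) and using that there are $N^{2}$ queues, each changing by at most $2NM+R$ per round by Lemma B.1.

```latex
\begin{align*}
\lVert \bm{Q}(t)-\bm{Q}(T_0)\rVert_1
  &\le \sum_{s=T_0}^{t-1}\lVert \bm{Q}(s+1)-\bm{Q}(s)\rVert_1
   \le N^2(2NM+R)\,(t-T_0), \\
\sum_{t\in W_j}\lVert \bm{Q}(t)-\bm{Q}(T_0)\rVert_1
  &\le N^2(2NM+R)\sum_{i=0}^{\lvert W_j\rvert-1} i
   = N^2(2NM+R)\,\frac{\lvert W_j\rvert(\lvert W_j\rvert-1)}{2}
   \le N^2(2NM+R)\,(\lvert W_j\rvert-1)^2 .
\end{align*}
```

The final inequality holds because $\lvert W_{j}\rvert/2\leq\lvert W_{j}\rvert-1$ whenever $\lvert W_{j}\rvert\geq 2$, while both sides vanish when $\lvert W_{j}\rvert=1$. Combining this with $\lvert W_{j}\rvert\,\lVert\bm{Q}(T_{0})\rVert_{1}\geq\sum_{t\in W_{j}}\left(\lVert\bm{Q}(t)\rVert_{1}-\lVert\bm{Q}(t)-\bm{Q}(T_{0})\rVert_{1}\right)$ recovers the stated bound.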

For the second summation, again utilizing the bound on $\lVert\bm{Q}(t)-\bm{Q}(T_{0})\rVert_{1}$, we have

$$
\begin{aligned}
&\quad\sum_{t\in W_{j}}\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}\left(Q_{n}^{(k)}(t)-Q_{n}^{(k)}(T_{0})\right)\left(\sum_{(n,m)\in{\mathcal{L}}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)-\lambda_{n}^{(k)}(t)-\sum_{(o,n)\in{\mathcal{L}}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)\right)\\
&\geq-\sum_{t\in W_{j}}\lVert\bm{Q}(t)-\bm{Q}(T_{0})\rVert_{1}\times\max_{t\in W_{j}}\max_{n\in{\mathcal{N}}}\max_{k\in{\mathcal{N}}}\left\lvert\sum_{(n,m)\in{\mathcal{L}}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)-\lambda_{n}^{(k)}(t)-\sum_{(o,n)\in{\mathcal{L}}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)\right\rvert\\
&\geq-N^{2}(2NM+R)(\lvert W_{j}\rvert-1)^{2}\times(2NM+R)=-N^{2}(2NM+R)^{2}(\lvert W_{j}\rvert-1)^{2}.
\end{aligned}
$$

Therefore, recalling the assumption that $\sum_{j\in[J]}(\lvert W_{j}\rvert-1)^{2}\leq C_{W}T$, summing over $j=1,2,\ldots,J$ and taking expectations gives

$$
\begin{aligned}
&\quad\epsilon_{W}\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}Q_{n}^{(k)}(t)\right]-\left(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R)\right)C_{W}T\\
&\leq\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}Q_{n}^{(k)}(t)\left(\sum_{(n,m)\in{\mathcal{L}}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)-\lambda_{n}^{(k)}(t)-\sum_{(o,n)\in{\mathcal{L}}}C_{o,n}(t)\mathring{a}_{o,n}^{(k)}(t)\right)\right]\\
&=-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in{\mathcal{L}}}\sum_{k\in{\mathcal{N}}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)\left(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\right)\right]-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}Q_{n}^{(k)}(t)\lambda_{n}^{(k)}(t)\right],
\end{aligned}
$$

thus giving our conclusion. ∎
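For readers who want the last equality in the proof above made explicit, it is a re-indexing of the sums over links. The following sketch fixes a round $t$ and a commodity $k$; in the second sum, each link $(o,n)$ is renamed to $(n,m)$, so its rate is weighted by the queue at the head of the link.

```latex
\begin{align*}
&\sum_{n\in\mathcal{N}} Q_n^{(k)}(t)\Biggl(\sum_{(n,m)\in\mathcal{L}} C_{n,m}(t)\,\mathring{a}_{n,m}^{(k)}(t)
   - \sum_{(o,n)\in\mathcal{L}} C_{o,n}(t)\,\mathring{a}_{o,n}^{(k)}(t)\Biggr) \\
&\quad= \sum_{(n,m)\in\mathcal{L}} C_{n,m}(t)\,\mathring{a}_{n,m}^{(k)}(t)\,Q_n^{(k)}(t)
   - \sum_{(n,m)\in\mathcal{L}} C_{n,m}(t)\,\mathring{a}_{n,m}^{(k)}(t)\,Q_m^{(k)}(t) \\
&\quad= -\sum_{(n,m)\in\mathcal{L}} C_{n,m}(t)\,\mathring{a}_{n,m}^{(k)}(t)\,\bigl(Q_m^{(k)}(t)-Q_n^{(k)}(t)\bigr).
\end{align*}
```

Summing over $k\in\mathcal{N}$ and $t\in[T]$, and carrying along the $-Q_n^{(k)}(t)\lambda_n^{(k)}(t)$ terms unchanged, yields the right-hand side of the lemma.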

B.2 Lyapunov Drift Analysis (Proof of Lemma 3.2)

Lemma B.3 (Restatement of Lemma 3.2; Lyapunov Drift Analysis).

Under the queue dynamics of Equation 1,

$$
\begin{aligned}
0\leq\mathbb{E}\left[\sum_{t=1}^{T}\Delta(\bm{Q}(t))\right]&\leq\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in{\mathcal{L}}}\sum_{k\in{\mathcal{N}}}\mu_{n,m}^{(k)}(t)\left(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\right)+\sum_{n\in{\mathcal{N}}}\sum_{k\in{\mathcal{N}}}Q_{n}^{(k)}(t)\lambda_{n}^{(k)}(t)\right]+{}\\
&\quad\frac{1}{2}N^{2}\left((NM)^{2}+2(NM)^{2}+2R^{2}\right)T.
\end{aligned}
$$
Proof.

According to Equation 1, for any round $t\in[T]$, server $n\in{\mathcal{N}}$, and commodity $k\in{\mathcal{N}}$, we have

$$
\begin{aligned}
(Q_{n}^{(k)}(t+1))^{2}&\leq\left[Q_{n}^{(k)}(t)-\sum_{(n,m)\in{\mathcal{L}}}\mu_{n,m}^{(k)}(t)\right]_{+}^{2}+\left(\sum_{(o,n)\in{\mathcal{L}}}\mu_{o,n}^{(k)}(t)+\lambda_{n}^{(k)}(t)\right)^{2}+{}\\
&\quad 2\left[Q_{n}^{(k)}(t)-\sum_{(n,m)\in{\mathcal{L}}}\mu_{n,m}^{(k)}(t)\right]_{+}\left(\sum_{(o,n)\in{\mathcal{L}}}\mu_{o,n}^{(k)}(t)+\lambda_{n}^{(k)}(t)\right)\\
&\leq(Q_{n}^{(k)}(t))^{2}-2Q_{n}^{(k)}(t)\left(\sum_{(n,m)\in{\mathcal{L}}}\mu_{n,m}^{(k)}(t)\right)+\left(\sum_{(n,m)\in{\mathcal{L}}}\mu_{n,m}^{(k)}(t)\right)^{2}+{}\\
&\quad\left(\sum_{(o,n)\in{\mathcal{L}}}\mu_{o,n}^{(k)}(t)+\lambda_{n}^{(k)}(t)\right)^{2}+2Q_{n}^{(k)}(t)\left(\sum_{(o,n)\in{\mathcal{L}}}\mu_{o,n}^{(k)}(t)+\lambda_{n}^{(k)}(t)\right)\\
&=(Q_{n}^{(k)}(t))^{2}-2Q_{n}^{(k)}(t)\left(\sum_{(n,m)\in{\mathcal{L}}}\mu_{n,m}^{(k)}(t)-\sum_{(o,n)\in{\mathcal{L}}}\mu_{o,n}^{(k)}(t)-\lambda_{n}^{(k)}(t)\right)+{}
\end{aligned}
$$
((n,m)μn,m(k)(t))2+((o,n)μo,n(k)(t)+λn(k)(t))2.superscriptsubscript𝑛𝑚superscriptsubscript𝜇𝑛𝑚𝑘𝑡2superscriptsubscript𝑜𝑛superscriptsubscript𝜇𝑜𝑛𝑘𝑡superscriptsubscript𝜆𝑛𝑘𝑡2\displaystyle\quad\left(\sum_{(n,m)\in{\mathcal{L}}}\mu_{n,m}^{(k)}(t)\right)^% {2}+\left(\sum_{(o,n)\in{\mathcal{L}}}\mu_{o,n}^{(k)}(t)+\lambda_{n}^{(k)}(t)% \right)^{2}.( ∑ start_POSTSUBSCRIPT ( italic_n , italic_m ) ∈ caligraphic_L end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_n , italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ( italic_t ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( ∑ start_POSTSUBSCRIPT ( italic_o , italic_n ) ∈ caligraphic_L end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_o , italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ( italic_t ) + italic_λ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ( italic_t ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .
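
The per-queue bound above can be sanity-checked numerically. The sketch below is only an illustration under an assumption: it takes the left-hand side of the chain to be $([Q-\mathrm{out}]_{+}+\mathrm{in})^{2}$, where `out` stands for the total outgoing service $\sum_{(n,m)\in\mathcal{L}}\mu_{n,m}^{(k)}(t)$ and `inn` for the total incoming amount $\sum_{(o,n)\in\mathcal{L}}\mu_{o,n}^{(k)}(t)+\lambda_{n}^{(k)}(t)$. The names `queue_sq_bound` and `check` are hypothetical helpers, not part of the paper.

```python
import random


def queue_sq_bound(q, out, inn):
    """Return (lhs, rhs) of the squared-queue inequality for one queue.

    lhs = ([q - out]_+ + inn)^2            (assumed form of the updated queue, squared)
    rhs = q^2 - 2 q (out - inn) + out^2 + inn^2
    """
    lhs = (max(q - out, 0.0) + inn) ** 2
    rhs = q ** 2 - 2 * q * (out - inn) + out ** 2 + inn ** 2
    return lhs, rhs


def check(trials=10000, seed=0):
    """Sample nonnegative (q, out, inn) and verify lhs <= rhs every time."""
    rng = random.Random(seed)
    for _ in range(trials):
        q = rng.uniform(0.0, 50.0)
        out = rng.uniform(0.0, 10.0)
        inn = rng.uniform(0.0, 10.0)
        lhs, rhs = queue_sq_bound(q, out, inn)
        if lhs > rhs + 1e-9:  # small tolerance for floating-point error
            return False
    return True
```

The inequality holds because $([Q-\mathrm{out}]_{+})^{2}\leq(Q-\mathrm{out})^{2}$ and $[Q-\mathrm{out}]_{+}\leq Q$ whenever all three quantities are nonnegative, which is what the random check exercises.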

Therefore, by definition of the Lyapunov function $L_{t}$ and the Lyapunov drift $\Delta(\bm{Q}(t))$,

\begin{align*}
\Delta(\bm{Q}(t)) &\leq -\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\,\mathbb{E}\left[\sum_{(n,m)\in\mathcal{L}}\mu_{n,m}^{(k)}(t)-\sum_{(o,n)\in\mathcal{L}}\mu_{o,n}^{(k)}(t)-\lambda_{n}^{(k)}(t)\,\middle|\,\bm{Q}(t)\right]+ \\
&\quad \frac{1}{2}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}\mathbb{E}\left[\left(\sum_{(n,m)\in\mathcal{L}}\mu_{n,m}^{(k)}(t)\right)^{2}+\left(\sum_{(o,n)\in\mathcal{L}}\mu_{o,n}^{(k)}(t)+\lambda_{n}^{(k)}(t)\right)^{2}\,\middle|\,\bm{Q}(t)\right].
\end{align*}

Exchanging summations and using the boundedness assumptions $\mu_{n,m}^{(k)}(t)\in[0,M]$ and $\lambda_{n}^{(k)}(t)\in[0,R]$, we get

\begin{align*}
\Delta(\bm{Q}(t)) &\leq \mathbb{E}\left[\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\mu_{n,m}^{(k)}(t)\left(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\right)+\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\lambda_{n}^{(k)}(t)\,\middle|\,\bm{Q}(t)\right]+ \\
&\quad \frac{1}{2}N^{2}\left((NM)^{2}+2(NM)^{2}+2R^{2}\right).
\end{align*}
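
For readers who want the intermediate step, the exchange of summations can be spelled out as follows. This is a sketch for a fixed commodity $k$, with the argument $(t)$ suppressed; it uses only the fact that every link in $\mathcal{L}$ appears once as an outgoing link of its tail and once as an incoming link of its head, and relabels $(o,n)$ as $(n,m)$ in the second sum:

\begin{align*}
-\sum_{n\in\mathcal{N}}Q_{n}^{(k)}\left(\sum_{(n,m)\in\mathcal{L}}\mu_{n,m}^{(k)}-\sum_{(o,n)\in\mathcal{L}}\mu_{o,n}^{(k)}\right)
&= -\sum_{(n,m)\in\mathcal{L}}\mu_{n,m}^{(k)}Q_{n}^{(k)}+\sum_{(o,n)\in\mathcal{L}}\mu_{o,n}^{(k)}Q_{n}^{(k)} \\
&= \sum_{(n,m)\in\mathcal{L}}\mu_{n,m}^{(k)}\left(Q_{m}^{(k)}-Q_{n}^{(k)}\right).
\end{align*}

The constant term follows in the same spirit, assuming each node has at most $N$ incident links in each direction: $\left(\sum_{(n,m)\in\mathcal{L}}\mu_{n,m}^{(k)}\right)^{2}\leq(NM)^{2}$ and $\left(\sum_{(o,n)\in\mathcal{L}}\mu_{o,n}^{(k)}+\lambda_{n}^{(k)}\right)^{2}\leq(NM+R)^{2}\leq 2(NM)^{2}+2R^{2}$, summed over the $N^{2}$ pairs $(n,k)$.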

Taking expectation w.r.t. $\bm{Q}(t)$ and summing over $t=1,2,\ldots,T$, we have

\begin{align*}
\mathbb{E}\left[\sum_{t=1}^{T}\Delta(\bm{Q}(t))\right] &\leq \mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\mu_{n,m}^{(k)}(t)\left(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\right)+\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\lambda_{n}^{(k)}(t)\right]+ \\
&\quad \frac{1}{2}N^{2}\left((NM)^{2}+2(NM)^{2}+2R^{2}\right)T.
\end{align*}

By telescoping sums, we know

\[
\sum_{t=1}^{T}\Delta(\bm{Q}(t))=L_{T+1}-L_{1}=L_{T+1}\geq 0,
\]

where the last step uses the fact that $L_{T+1}$ is a sum of squares. Therefore, our conclusion follows. ∎

B.3 Guarantee of AdaPFOL Algorithm (Proof of Lemma 3.4)

Before analyzing our AdaPFOL algorithm, we first include the guarantee of the PFOL algorithm (Cutkosky, 2020) as follows. It roughly says that for bounded losses (i.e., $\lVert\bm{g}_{t}\rVert\leq 1$), there exists an algorithm that enjoys the following parameter-free (i.e., performance depending on loss magnitudes) guarantee.

Lemma B.4 (Guarantee of PFOL Algorithm (Cutkosky, 2020, Theorem 6)).

Consider the OLO problem in Definition 3.3. Suppose that $\mathcal{X}$ has diameter $D=\sup_{\bm{x},\bm{y}\in\mathcal{X}}\lVert\bm{x}-\bm{y}\rVert_{1}$ and all $\lVert\bm{g}_{t}\rVert_{\infty}\leq 1$. Then there exists an algorithm such that for any comparator sequence $\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T}\in\mathcal{X}$,

\[
\text{D-Regret}_{T}^{\text{OLO}}(\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T})=\mathcal{O}\left(\sqrt{D\left(D+\sum_{t=1}^{T-1}\lVert\mathring{\bm{x}}_{t}-\mathring{\bm{x}}_{t+1}\rVert_{1}\right)}\sqrt{1+\sum_{t=1}^{T}\lVert\bm{g}_{t}\rVert_{\infty}^{2}}\,\log\left(T\sum_{t=1}^{T}\lVert\bm{g}_{t}\rVert_{\infty}^{2}\right)\right).
\]

Because the original algorithm is complicated, we do not present its pseudo-code here but instead use it as a black box. Please refer to the original paper by Cutkosky (2020) for more details. Note that, although the original analysis by Cutkosky (2020) uses the $\ell_{2}$-norm for both the comparators $\{\mathring{\bm{x}}_{t}\}_{t=1}^{T}$ and the loss vectors $\{\bm{g}_{t}\}_{t=1}^{T}$, it extends straightforwardly to a pair of dual norms, which in our case is the $\ell_{1}$-norm for $\{\mathring{\bm{x}}_{t}\}_{t=1}^{T}$ and the $\ell_{\infty}$-norm for $\{\bm{g}_{t}\}_{t=1}^{T}$.

Now, we are ready to present the guarantee for our AdaPFOL algorithm:

Lemma B.5 (Restatement of Lemma 3.4; Guarantee of AdaPFOL Algorithm).

Consider the OLO problem in Definition 3.3. Let the action set $\mathcal{X}$ have diameter $D=\sup_{\bm{x},\bm{y}\in\mathcal{X}}\lVert\bm{x}-\bm{y}\rVert_{1}$. Suppose that $\lVert\bm{g}_{t}\rVert_{\infty}\leq G_{t}$, where $G_{t}$ is some $\mathcal{F}_{t-1}$-measurable random variable and $(\mathcal{F}_{t})_{t=0}^{T}$ is the natural filtration, i.e., $\mathcal{F}_{t}$ is the $\sigma$-algebra generated by all random observations made during the first $t$ rounds.
Then, AdaPFOL (Algorithm 2) ensures that for any comparator sequence $\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T}\in\mathcal{X}$, if $\max_{t\in[T]}G_{t}\geq 1$, then

\[
\text{D-Regret}_{T}^{\text{OLO}}(\mathring{\bm{x}}_{1},\mathring{\bm{x}}_{2},\ldots,\mathring{\bm{x}}_{T})=\mathcal{O}\left(\sqrt{D\left(D+\sum_{t=1}^{T-1}\lVert\mathring{\bm{x}}_{t}-\mathring{\bm{x}}_{t+1}\rVert_{1}\right)}\sqrt{\sum_{t=1}^{T}\lVert\bm{g}_{t}\rVert_{\infty}^{2}}\,\log T\,\log\left(\max_{t=1}^{T}G_{t}\right)\right).
\]
Proof.

As $G$ changes only if $G_{t}>2\max_{s<t}G_{s}$, it cannot change more than $\lceil\log_{2}(\max_{t}G_{t})\rceil$ times. For a fixed $G$, suppose that it is used for rounds $t_{1},t_{1}+1,\ldots,t_{2}$; then we must have $G_{t}\leq G$ for all $t\in[t_{1},t_{2}]$, as otherwise a new instance of PFOL would be launched.

Therefore, as $\lVert\bm{g}_{t}\rVert_{\infty}\leq G_{t}\leq G$, we have $\lVert G^{-1}\bm{g}_{t}\rVert_{\infty}\leq 1$. This allows us to apply Lemma B.4, which yields

\[
\sum_{t=t_{1}}^{t_{2}}\langle G^{-1}\bm{g}_{t},\bm{x}_{t}-\bm{x}_{t}^{\circ}\rangle=\mathcal{O}\left(\sqrt{D\left(D+\sum_{t=t_{1}}^{t_{2}-1}\lVert\mathring{\bm{x}}_{t}-\mathring{\bm{x}}_{t+1}\rVert_{1}\right)}\sqrt{\sum_{t=t_{1}}^{t_{2}}\lVert G^{-1}\bm{g}_{t}\rVert_{\infty}^{2}}\,\log T\right).
\]

Multiplying both sides by $G$, we have

\[
\sum_{t=t_{1}}^{t_{2}}\langle\bm{g}_{t},\bm{x}_{t}-\bm{x}_{t}^{\circ}\rangle=\mathcal{O}\left(\sqrt{D\left(D+\sum_{t=t_{1}}^{t_{2}-1}\lVert\mathring{\bm{x}}_{t}-\mathring{\bm{x}}_{t+1}\rVert_{1}\right)}\sqrt{\sum_{t=t_{1}}^{t_{2}}\lVert\bm{g}_{t}\rVert_{\infty}^{2}}\,\log T\right). \tag{21}
\]

As the intervals $[t_{1},t_{2}]$ form a partition of $[T]$, summing Equation 21 over all of them gives

\[
\sum_{t=1}^{T}\langle\bm{g}_{t},\bm{x}_{t}-\bm{x}_{t}^{\circ}\rangle=\mathcal{O}\left(\sqrt{D\left(D+\sum_{t=1}^{T-1}\lVert\mathring{\bm{x}}_{t}-\mathring{\bm{x}}_{t+1}\rVert_{1}\right)}\sqrt{\sum_{t=1}^{T}\lVert\bm{g}_{t}\rVert_{\infty}^{2}}\,\log T\,\log\left(\max_{t=1}^{T}G_{t}\right)\right),
\]

which utilizes the fact that at most $\mathcal{O}(\lceil\log_{2}(\max_{t}G_{t})\rceil)$ distinct intervals $[t_{1},t_{2}]$ can occur. ∎
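
The rescale-and-restart idea used in this proof can be illustrated with a short sketch. It is not Algorithm 2 itself: the restart rule (relaunch when $G_{t}$ exceeds the current scale, then set the scale to $\max(1, 2G_{t})$) is an assumption made for illustration, and `HedgeBase` is a simple exponential-weights stand-in for the PFOL black box of Cutkosky (2020), whose internals are not reproduced here. The class and method names are hypothetical.

```python
import math


class HedgeBase:
    """Stand-in base learner on the simplex (NOT the PFOL algorithm)."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim  # cumulative negative scaled losses
        self.lr = lr

    def predict(self):
        m = max(self.w)
        e = [math.exp(v - m) for v in self.w]  # shift for numerical stability
        s = sum(e)
        return [v / s for v in e]

    def update(self, g):
        self.w = [w - self.lr * gi for w, gi in zip(self.w, g)]


class ScaledRestartWrapper:
    """Feed G^{-1} g_t to a base learner; relaunch it when G_t outgrows G."""

    def __init__(self, make_base):
        self.make_base = make_base  # factory for a learner expecting ||g||_inf <= 1
        self.base = None
        self.G = 0.0
        self.restarts = 0  # number of base instances launched so far

    def predict(self, G_t):
        # G_t is known before the round (it is F_{t-1}-measurable in the lemma).
        if self.base is None or G_t > self.G:
            self.G = max(1.0, 2.0 * G_t)  # assumed doubling rule
            self.base = self.make_base()
            self.restarts += 1
        return self.base.predict()

    def update(self, g):
        scaled = [gi / self.G for gi in g]
        assert max(abs(s) for s in scaled) <= 1.0  # base learner's precondition
        self.base.update(scaled)
```

Under this assumed rule the scale more than doubles at every relaunch, so the number of instances grows only logarithmically in $\max_{t}G_{t}$, which is the counting argument the proof relies on.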

B.4 Deciding $\bm{a}(t)$ via AdaPFOL Algorithm (Proof of Theorem 3.5)

Theorem B.6 (Restatement of Theorem 3.5; Deciding $\bm{a}(t)$ via AdaPFOL Algorithm).

For each link $(n,m)\in\mathcal{L}$, as we did in NSO, we execute an instance of AdaPFOL (Algorithm 2) where $\mathcal{X}=\triangle(\mathcal{N})$, $\bm{g}_{t}=C_{n,m}(t)(\bm{Q}_{m}(t)-\bm{Q}_{n}(t))$, and $G_{t}=M\lVert\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rVert_{\infty}$. We use its output $\bm{x}_{t}$ as $\bm{a}_{n,m}(t)$ in every round $t$.
Let $\mu_{n,m}^{(k)}(t)$ be the number of jobs actually transmitted from $Q_{n}^{(k)}(t)$ to $Q_{m}^{(k)}(t+1)$ induced by $a_{n,m}^{(k)}(t)$.

Consider an arbitrary reference action sequence $\{\mathring{\bm{a}}(t)\}_{t\in[T]}$ satisfying Assumption 1. Let $\mathring{\mu}_{n,m}^{(k)}(t)=C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)\in[0,M]$ (as $C_{n,m}(t)\in[0,M]$ and $\mathring{a}_{n,m}^{(k)}(t)\in[0,1]$). Then

\begin{align*}
&\quad\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}(\mu_{n,m}^{(k)}(t)-\mathring{\mu}_{n,m}^{(k)}(t))\left(Q_m^{(k)}(t)-Q_n^{(k)}(t)\right)\right]\\
&=\mathcal{O}\left(M\sqrt{1+P_T^a}\,\mathbb{E}\left[\sqrt{\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_2^2}\,\log T\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}}M\lVert\bm{Q}_m(t)-\bm{Q}_n(t)\rVert_\infty\right)\right]\right),
\end{align*}

where $P_T^a\triangleq\sum_{t=1}^{T-1}\sum_{(n,m)\in\mathcal{L}}\lVert\mathring{\bm{a}}_{n,m}(t)-\mathring{\bm{a}}_{n,m}(t+1)\rVert_1$ is the path length of $\{\mathring{\bm{a}}(t)\}_{t=1}^{T}$.
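The per-link quantities in the statement above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `link_loss_inputs` and `path_length`, and the choice to store a reference sequence as a list of per-round dictionaries keyed by link, are assumptions made here for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Link = Tuple[int, int]  # a link (n, m); hypothetical representation


def link_loss_inputs(
    capacity: float,        # C_{n,m}(t), assumed to lie in [0, M]
    q_n: Sequence[float],   # Q_n(t): one queue length per commodity k
    q_m: Sequence[float],   # Q_m(t): one queue length per commodity k
    M: float,               # upper bound on link capacities
) -> Tuple[List[float], float]:
    """Return (g_t, G_t) fed to the learner on link (n, m) in round t."""
    diff = [qm - qn for qn, qm in zip(q_n, q_m)]
    g = [capacity * d for d in diff]        # g_t = C_{n,m}(t) (Q_m(t) - Q_n(t))
    G = M * max(abs(d) for d in diff)       # G_t = M * ||Q_m(t) - Q_n(t)||_inf
    return g, G


def path_length(ref: Sequence[Dict[Link, Sequence[float]]]) -> float:
    """P_T^a: summed L1 change of the reference actions between rounds."""
    total = 0.0
    for cur, nxt in zip(ref, ref[1:]):
        for link in cur:
            total += sum(abs(a - b) for a, b in zip(cur[link], nxt[link]))
    return total


if __name__ == "__main__":
    g, G = link_loss_inputs(2.0, [5.0, 0.0, 1.0], [1.0, 4.0, 1.0], 3.0)
    print(g, G)
    ref = [{(0, 1): [1.0, 0.0, 0.0]}, {(0, 1): [0.0, 1.0, 0.0]}]
    print(path_length(ref))
```

Because $C_{n,m}(t)\le M$, every coordinate of $\bm{g}_t$ is bounded in absolute value by $G_t$, which is the property the magnitude hint $G_t$ is meant to capture.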

Proof.

According to the definitions of $C_{n,m}^{(k)}(t)$ and $\mathring{C}_{n,m}^{(k)}(t)$ together with Lemma 3.4,

\begin{align*}
&\quad\sum_{t=1}^{T}\mathbb{E}\left[\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}(C_{n,m}^{(k)}(t)-\mathring{C}_{n,m}^{(k)}(t))\left(Q_m^{(k)}(t)-Q_n^{(k)}(t)\right)\middle|\bm{Q}(t)\right]\\
&=\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}C_{n,m}(t)\left(Q_m^{(k)}(t)-Q_n^{(k)}(t)\right)\left(a_{n,m}^{(k)}(t)-\mathring{a}_{n,m}^{(k)}(t)\right)\\
&\le\mathcal{O}\Bigg(\sqrt{1+\sum_{t=1}^{T-1}\sum_{(n,m)\in\mathcal{L}}\lVert\mathring{\bm{a}}_{n,m}(t)-\mathring{\bm{a}}_{n,m}(t+1)\rVert_1}\\
&\qquad\sqrt{\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\big(C_{n,m}(t)(Q_m^{(k)}(t)-Q_n^{(k)}(t))\big)^2}\\
&\qquad\log T\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}}M\lVert\bm{Q}_m(t)-\bm{Q}_n(t)\rVert_\infty\right)\Bigg).
\end{align*}

As $C_{n,m}(t)\in[0,M]$, we can upper bound the RHS of the above inequality by

\[
\mathcal{O}\left(\sqrt{1+P_T^a}\sqrt{2M^2\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_n^{(k)}(t)^2}\,\log T\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}}M\lVert\bm{Q}_m(t)-\bm{Q}_n(t)\rVert_\infty\right)\right),
\]

where $P_T^a\triangleq\sum_{t=1}^{T-1}\sum_{(n,m)\in\mathcal{L}}\lVert\mathring{\bm{a}}_{n,m}(t)-\mathring{\bm{a}}_{n,m}(t+1)\rVert_1$ is the path length of $\mathring{\bm{a}}(t)$. Taking expectations gives our conclusion. ∎
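The capacity-bounding step in the proof above rests on the elementary inequality $(x-y)^2\le 2x^2+2y^2$. A sketch of the calculation for a fixed round $t$, where $d$ (our notation, not the paper's) denotes the maximum number of links in $\mathcal{L}$ incident to any single node:

\begin{align*}
\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\big(C_{n,m}(t)(Q_m^{(k)}(t)-Q_n^{(k)}(t))\big)^2
&\le M^2\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\big(Q_m^{(k)}(t)-Q_n^{(k)}(t)\big)^2\\
&\le 2M^2\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\big(Q_m^{(k)}(t)^2+Q_n^{(k)}(t)^2\big)\\
&\le 2dM^2\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_n^{(k)}(t)^2
=2dM^2\lVert\bm{Q}(t)\rVert_2^2.
\end{align*}

The factor $d$ depends only on the network topology, not on $T$ or on the queue lengths. We read the stated bound as treating such factors as constants inside the $\mathcal{O}$ notation, although this reading is an assumption on our part.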

B.5 Main Theorem for Multi-Hop Network Stability (Proof of Theorem 3.6)

Theorem B.7 (Restatement of Theorem 3.6; Main Theorem for Multi-Hop Network Stability).

Suppose that $\{\mathring{\bm{a}}_{n,m}(t)\in\triangle(\mathcal{N})\}_{(n,m)\in\mathcal{L},t\in[T]}$ satisfies Assumption 1 and its path length satisfies

\[
P_t^a\triangleq\sum_{s=1}^{t-1}\sum_{(n,m)\in\mathcal{L}}\lVert\mathring{\bm{a}}_{n,m}(s)-\mathring{\bm{a}}_{n,m}(s+1)\rVert_1\le C^a t^{1/2-\delta_a},\quad\forall t=1,2,\ldots,T,
\]

where $C^a$ and $\delta_a$ are assumed to be known constants, but neither the precise $P_t^a$ nor $\{\mathring{\bm{a}}_{n,m}(t)\in\triangle(\mathcal{N})\}_{(n,m)\in\mathcal{L},t\in[T]}$ is known. Then, if we execute the NSO framework in Algorithm 1 with AdaPFOL defined in Algorithm 2, the following performance guarantee holds:

\[
\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]=\mathcal{O}\left(\frac{(N^2(2NM+R)^2+\epsilon_W N^2(2NM+R))C_W+(N^4M^2+N^2R^2)}{\epsilon_W}\right)+o_T(1).
\]

That is, when $T\gg 0$, we have $\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]=\mathcal{O}_T(1)$, i.e., Equation 2 holds and the system is stable.

Proof.

We first defer some calculations into Lemma B.8, which basically combines Lemma 3.1, Lemma 3.2, and Theorem 3.5 together. Lemma B.8 says

\begin{align*}
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]
&\le f(T)+g(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]^{3/4}\log\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right],
\end{align*}

where

\begin{align*}
f(T)&=\epsilon_W^{-1}\mathcal{O}\left((N^2(2NM+R)^2+\epsilon_W N^2(2NM+R))C_W T+(N^4M^2+N^2R^2)T\right),\\
g(T)&=\epsilon_W^{-1}\mathcal{O}\left(M(2NM+R)^{1/4}\sqrt{1+P_T^a}\log T\right).
\end{align*}

In Lemma D.5, we will prove a self-bounding inequality that says, if $y\le f+y^{3/4}g\log y$, then $y\le\left(f^{1/4}+g\log\left(2(f^{1/4}+g)^2\right)\right)^4$. Therefore, we can apply Lemma D.5 to conclude that

\begin{align*}
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]
&=\mathcal{O}\left(f(T)+g(T)^4\log^4\left(2(f(T)^{1/4}+g(T))^2\right)\right)\\
&=\epsilon_W^{-1}\mathcal{O}\left((N^2(2NM+R)^2+\epsilon_W N^2(2NM+R))C_W T+(N^4M^2+N^2R^2)T\right)+{}\\
&\quad\left(\epsilon_W^{-1}\mathcal{O}\left(M(2NM+R)^{1/4}\sqrt{1+P_T^a}\log T\right)\widetilde{\mathcal{O}}_T(1)\right)^4.
\end{align*}

Since

\[
\left(\sqrt{1+P_T^a}\right)^4\le\left(\sqrt{T^{1/2-\delta_a}}\right)^4=\mathcal{O}_T(T^{1-2\delta_a}),
\]

we know $\left(\epsilon_W^{-1}\mathcal{O}\left(M(2NM+R)^{1/4}\sqrt{1+P_T^a}\log T\right)\widetilde{\mathcal{O}}_T(1)\right)^4=\widetilde{\mathcal{O}}_T(T^{1-2\delta_a})=o_T(T)$. The conclusion then follows. ∎
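For readers who want the final step of this proof spelled out, dividing the bound on $\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]$ by $T$ can be sketched as follows. We assume $\delta_a>0$, which the step $\widetilde{\mathcal{O}}_T(T^{1-2\delta_a})=o_T(T)$ appears to require:

\begin{align*}
\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]
&=\frac{f(T)}{T}+\frac{\widetilde{\mathcal{O}}_T(T^{1-2\delta_a})}{T}\\
&=\epsilon_W^{-1}\mathcal{O}\left((N^2(2NM+R)^2+\epsilon_W N^2(2NM+R))C_W+(N^4M^2+N^2R^2)\right)+\widetilde{\mathcal{O}}_T(T^{-2\delta_a}).
\end{align*}

Since $T^{-2\delta_a}$ decays polynomially while the factors hidden by $\widetilde{\mathcal{O}}_T$ grow only polylogarithmically in $T$, the last term is $o_T(1)$, which matches the form of the guarantee in the theorem.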

Lemma B.8 (Calculations when Proving Theorem 3.6).

Following all the assumptions in Theorem 3.6, we have

\begin{align*}
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]
&\le f(T)+g(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]^{3/4}\log\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right],
\end{align*}

where

\begin{align*}
f(T)&=\epsilon_W^{-1}\mathcal{O}\left((N^2(2NM+R)^2+\epsilon_W N^2(2NM+R))C_W T+(N^4M^2+N^2R^2)T\right),\\
g(T)&=\epsilon_W^{-1}\mathcal{O}\left(M(2NM+R)^{1/4}\sqrt{1+P_T^a}\log T\right).
\end{align*}
Proof.

Recall that the piecewise stability assumption (Assumption 1) implies Lemma 3.1:

\begin{align*}
&\quad\epsilon_W\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_n^{(k)}(t)\right]-(N^2(2NM+R)^2+\epsilon_W N^2(2NM+R))C_W T\\
&\le-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}C_{n,m}(t)\mathring{a}_{n,m}^{(k)}(t)(Q_m^{(k)}(t)-Q_n^{(k)}(t))\right]-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_n^{(k)}(t)\lambda_n^{(k)}(t)\right].
\end{align*}

From the non-negativity of Lyapunov drifts in Lemma 3.2, we know

\begin{align*}
0 &\le \mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}} \mu_{n,m}^{(k)}(t)\left(Q_m^{(k)}(t) - Q_n^{(k)}(t)\right) + \sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}} Q_n^{(k)}(t)\,\lambda_n^{(k)}(t)\right] \\
&\quad + \frac{1}{2}N^2\left((NM)^2 + 2(NM)^2 + 2R^2\right)T.
\end{align*}

Furthermore, recall the guarantee of the AdaPFOL algorithm in Theorem 3.5 that

\begin{align*}
&\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}} \left(\mu_{n,m}^{(k)}(t) - \mathring{\mu}_{n,m}^{(k)}(t)\right)\left(Q_m^{(k)}(t) - Q_n^{(k)}(t)\right)\right] \\
&\quad = \mathcal{O}\left(M\sqrt{1+P_T^a}\;\mathbb{E}\left[\sqrt{\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_2^2}\,\log T\,\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}} M\lVert\bm{Q}_m(t) - \bm{Q}_n(t)\rVert_\infty\right)\right]\right).
\end{align*}

Therefore, we are able to get

\begin{align*}
&\epsilon_W \mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}} Q_n^{(k)}(t)\right] - \left(N^2(2NM+R)^2 + \epsilon_W N^2(2NM+R)\right) C_W T \\
&\quad \le \mathcal{O}\left(M\sqrt{1+P_T^a}\;\mathbb{E}\left[\sqrt{\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_2^2}\,\log T\,\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}} M\lVert\bm{Q}_m(t) - \bm{Q}_n(t)\rVert_\infty\right)\right]\right) \\
&\qquad + \frac{1}{2}N^2\left((NM)^2 + 2(NM)^2 + 2R^2\right)T.
\end{align*}

Lemma D.3 states that if $x_1 = 0$, $x_2, \ldots, x_T \ge 0$, and $\lvert x_{t+1} - x_t \rvert \le 1$, then $\sum_{t=1}^{T} x_t^2 = \mathcal{O}\left(\left(\sum_{t=1}^{T} x_t\right)^{3/2}\right)$. From Lemma B.1, any single queue $Q_n^{(k)}(t)$ satisfies $\lvert Q_n^{(k)}(t+1) - Q_n^{(k)}(t) \rvert \le 2NM+R$.
Hence, applying Lemma D.3 to $\{Q_n^{(k)}(t)/(2NM+R)\}_{t\in[T]}$ for every $n \in \mathcal{N}$ and $k \in \mathcal{N}$, we have

\[
\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_2^2 = (2NM+R)^2 \sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}} \left(\frac{Q_n^{(k)}(t)}{2NM+R}\right)^2 = \mathcal{O}\left(\sqrt{2NM+R}\left(\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right)^{1.5}\right).
\]
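The scaling in Lemma D.3 can be sanity-checked numerically. The sketch below is only an illustration under our own assumptions: the helper names `reflected_walk` and `d3_ratio` are hypothetical, and the reference constant $\sqrt{2}$ comes from a short argument of ours (the peak value $m$ of such a sequence forces $\sum_t x_t \ge m^2/2$), not from the paper's statement of the lemma.

```python
import math
import random


def reflected_walk(T, seed):
    """Nonnegative sequence with x_1 = 0 and |x_{t+1} - x_t| <= 1."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(T - 1):
        step = rng.uniform(-1.0, 1.0)
        # Clamping at zero never makes the increment larger than |step|.
        x.append(max(0.0, x[-1] + step))
    return x


def d3_ratio(x):
    """Return sum(x_t^2) / (sum(x_t))^{3/2}, or 0 for the all-zero sequence."""
    s1 = sum(x)
    s2 = sum(v * v for v in x)
    return 0.0 if s1 == 0.0 else s2 / s1 ** 1.5


ramp = [float(t) for t in range(1000)]  # steepest admissible growth from 0
walks = [reflected_walk(1000, seed) for seed in range(50)]
worst = max(d3_ratio(x) for x in walks + [ramp])
print(f"largest observed ratio: {worst:.3f} (sqrt(2) = {math.sqrt(2):.3f})")
```

In the proof the same check applies to each rescaled queue $Q_n^{(k)}(t)/(2NM+R)$, whose increments are bounded by one.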

Further noticing that

\[
\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}} M\lVert\bm{Q}_m(t) - \bm{Q}_n(t)\rVert_\infty \le \sum_{t=1}^{T} M \sum_{n\in\mathcal{N}} \lVert\bm{Q}_n(t)\rVert_1 \le M\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1,
\]

the above inequality becomes

\begin{align*}
\epsilon_W \mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right] &\le \mathcal{O}\left(M(2NM+R)^{1/4}\sqrt{1+P_T^a}\;\mathbb{E}\left[\left(\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right)^{3/4}\log T\,\log\left(M\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right)\right]\right) \\
&\quad + \mathcal{O}\left(\left(N^2(2NM+R)^2 + \epsilon_W N^2(2NM+R)\right)C_W T + \left(N^4M^2 + N^2R^2\right)T\right).
\end{align*}

Noticing that $x \mapsto x^{3/4}\log(Mx)$ is concave when $x$ is large enough, Jensen's inequality then gives

\[
\mathcal{O}\left(\mathbb{E}\left[\left(\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right)^{3/4}\log\left(M\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right)\right]\right) = \mathcal{O}\left(\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]^{3/4}\log\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]\right).
\]
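The threshold behind "large enough" can be made explicit. The following second-derivative computation is our own sanity check, not part of the original proof, with $\varphi$ a name introduced here for the map $x \mapsto x^{3/4}\log(Mx)$ on $x > 0$:

```latex
\begin{align*}
\varphi(x)   &= x^{3/4}\log(Mx), \\
\varphi'(x)  &= \tfrac{3}{4}\,x^{-1/4}\log(Mx) + x^{-1/4}, \\
\varphi''(x) &= x^{-5/4}\left(\tfrac{1}{2} - \tfrac{3}{16}\log(Mx)\right).
\end{align*}
```

Hence $\varphi''(x) \le 0$ exactly when $\log(Mx) \ge 8/3$, that is, when $x \ge e^{8/3}/M$. On that range Jensen's inequality applies directly. How the remaining bounded range is absorbed into the $\mathcal{O}(\cdot)$ is not spelled out in this section.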

Therefore, if we define the auxiliary functions $f(T)$ and $g(T)$ as

\begin{align*}
f(T) &= \epsilon_W^{-1}\,\mathcal{O}\left(\left(N^2(2NM+R)^2 + \epsilon_W N^2(2NM+R)\right)C_W T + \left(N^4M^2 + N^2R^2\right)T\right), \\
g(T) &= \epsilon_W^{-1}\,\mathcal{O}\left(M(2NM+R)^{1/4}\sqrt{1+P_T^a}\,\log T\right),
\end{align*}

we are able to conclude that

\[
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right] \le f(T) + g(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right]^{3/4}\log\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_1\right],
\]

as claimed. ∎
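The conclusion is a self-bounding inequality of the form $y \le f + g\,y^{3/4}\log y$, so it caps $y$ only implicitly. The sketch below is a numerical illustration under our own assumptions and is not part of the paper's argument: it finds the largest $y$ compatible with the inequality for given constants. The name `largest_solution` and the restriction $f \ge e^{8/3}$, which keeps the search inside the region where $y^{3/4}\log y$ is concave, are choices made here.

```python
import math

CONCAVE_FROM = math.exp(8.0 / 3.0)  # y^{3/4} log y is concave for y >= e^{8/3}


def largest_solution(f, g, iters=200):
    """Largest y with y <= f + g * y**0.75 * log(y), for f >= e^{8/3}, g >= 0."""
    if f < CONCAVE_FROM or g < 0.0:
        raise ValueError("sketch assumes f >= e^(8/3) and g >= 0")

    def h(y):  # h(y) <= 0 exactly when y satisfies the inequality
        return y - f - g * y ** 0.75 * math.log(y)

    def dh(y):  # derivative of h; nondecreasing on [e^{8/3}, inf)
        return 1.0 - g * y ** -0.25 * (0.75 * math.log(y) + 1.0)

    # Double until h is positive and increasing, so nothing beyond hi is feasible.
    hi = 2.0 * f
    while not (h(hi) > 0.0 and dh(hi) > 0.0):
        hi *= 2.0

    lo = f  # h(f) = -g * f^{3/4} * log(f) <= 0 since log(f) > 0
    for _ in range(iters):  # bisection; h is convex on [lo, hi]
        mid = 0.5 * (lo + hi)
        if h(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return lo


for g in (0.5, 1.0, 2.0, 4.0):
    y = largest_solution(100.0, g)
    print(f"g = {g}: largest feasible y = {y:.1f}, f + g^4 = {100.0 + g ** 4:.1f}")
```

Comparing the printed values with $f + g^4$ gives a rough feel for how the implicit bound scales in $f(T)$ and $g(T)$, up to logarithmic factors.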

Appendix C Omitted Proofs for Multi-Hop Utility Maximization Tasks

C.1 Reference Policy Assumption (Proof of Lemma 4.1)

Lemma C.1 (Restatement of Lemma 4.1; Ability of $\{(\mathring{\bm{a}}(t), \mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ in Stabilizing the Network).

If $\{(\mathring{\bm{a}}(t), \mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ satisfies Assumption 2, then for any scheduler-generated queue lengths $\{\bm{Q}(t)\}_{t\in[T]}$,

\begin{align*}
&\epsilon_W \mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}} Q_n^{(k)}(t)\right] - \left(N^2(2NM+R)^2 + \epsilon_W N^2(2NM+R)\right) C_W T \\
&\quad \le -\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}} \mathring{\mu}_{n,m}^{(k)}(t)\left(Q_m^{(k)}(t) - Q_n^{(k)}(t)\right)\right] - \mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}} Q_n^{(k)}(t)\,\mathring{\lambda}_n^{(k)}(t)\right].
\end{align*}
Proof.

The proof of this lemma is identical to that of Lemma 3.1, except for replacing the environment-generated $\bm{\lambda}(t)$ with the $\mathring{\bm{\lambda}}(t)$ generated by the reference policy. For more details, please refer to Section B.1. ∎

C.2 Lyapunov Drift-Plus-Penalty Analysis (Proof of Lemma C.2)

Lemma C.2 (Lyapunov Drift-Plus-Penalty Analysis).

Under the queue dynamics of Equation 1,

\begin{align*}
&-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}} C_{n,m}(t)\left(Q_m^{(k)}(t) - Q_n^{(k)}(t)\right)a_{n,m}^{(k)}(t)\right] \\
&\quad - \mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}} Q_n^{(k)}(t)\,\lambda_n^{(k)}(t)\right] + V\,\mathbb{E}\left[\sum_{t=1}^{T}\left(g_t(\bm{\lambda}(t)) - g_t(\mathring{\bm{\lambda}}(t))\right)\right] \\
&\le \frac{1}{2}N^2\left((NM)^2 + 2(NM)^2 + 2R^2\right)T + V\,\mathbb{E}\left[\sum_{t=1}^{T}\left(g_t(\bm{\lambda}(t)) - g_t(\mathring{\bm{\lambda}}(t))\right)\right].
\end{align*}
Proof.

The proof follows from applying Lemma 3.2 and adding

\[
V\,\mathbb{E}\left[\sum_{t=1}^{T}\left(g_t(\bm{\lambda}(t)) - g_t(\mathring{\bm{\lambda}}(t))\right)\right]
\]

to both sides. ∎

C.3 Guarantee of AdaBGD Algorithm (Proof of Lemma 4.3)

Lemma C.3 (Restatement of Lemma 4.3; Guarantee of AdaBGD Algorithm).

Suppose that $r\mathbb{B} \subseteq \mathcal{X} \subseteq R\mathbb{B}$, the $t$-th loss $\ell_t$ is bounded by $C_t$ and is $L_t$-Lipschitz. Suppose that $\eta_t$ and $\delta_t$ are both $\mathcal{F}_{t-1}$-measurable (where $(\mathcal{F}_t)_{t=0}^{T}$ is the natural filtration), $\eta_1 > \eta_2 > \cdots > \eta_T$, and $\alpha_t \triangleq \delta_t / r < 1$ a.s. for all $t \in [T]$. Then for any fixed $\bm{u}_1, \bm{u}_2, \ldots, \bm{u}_T \in \mathcal{X}$, the AdaBGD algorithm in Algorithm 4 enjoys the following guarantee:

\begin{align*}
&\text{D-Regret}_T^{\text{BCO}}(\bm{u}_1, \bm{u}_2, \ldots, \bm{u}_T) = \mathbb{E}\left[\sum_{t=1}^{T}\left(\ell_t(\bm{x}_t) - \ell_t(\bm{u}_t)\right)\right] \\
&\quad \le \mathbb{E}\left[\frac{7R^2}{4\eta_T} + \frac{P_T R}{\eta_T} + \sum_{t=1}^{T}\left(\frac{\eta_t}{2}\frac{d^2}{\delta_t^2}C_t^2 + 3L_t\delta_t + L_t\alpha_t R\right)\right],
\end{align*}

where $P_T=\sum_{t=1}^{T-1}\lVert\bm{u}_t-\bm{u}_{t+1}\rVert$ is the path length of the comparator sequence $\{\bm{u}_t\}_{t\in[T]}$.
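To make the shape of this bound concrete, the following minimal Python sketch evaluates the path length $P_T$ and the bracketed right-hand side of the guarantee for one realization of the sequences (the expectation is not taken). The function names `path_length` and `adabgd_bound_rhs` and the toy inputs are illustrative assumptions, not part of the paper.

```python
import numpy as np

def path_length(U):
    """P_T = sum_{t=1}^{T-1} ||u_t - u_{t+1}||, comparators stacked as rows of U."""
    U = np.asarray(U, dtype=float)
    return float(np.linalg.norm(U[1:] - U[:-1], axis=1).sum())

def adabgd_bound_rhs(eta, delta, C, L, U, r, R, d):
    """Evaluate 7R^2/(4 eta_T) + P_T R/eta_T + sum_t(...) for given sequences.

    eta, delta, C, L are length-T sequences; U holds u_1..u_T as rows.
    (Hypothetical helper: it only evaluates the formula, it does not run AdaBGD.)
    """
    eta, delta, C, L = (np.asarray(a, dtype=float) for a in (eta, delta, C, L))
    alpha = delta / r
    if not np.all(alpha < 1):
        raise ValueError("the theorem requires alpha_t = delta_t / r < 1")
    per_round = 0.5 * eta * (d / delta) ** 2 * C ** 2 + 3 * L * delta + L * alpha * R
    eta_T = eta[-1]
    return 7 * R ** 2 / (4 * eta_T) + path_length(U) * R / eta_T + float(per_round.sum())

# Toy instance with T = 2 rounds in dimension d = 2 (values chosen for illustration only).
bound = adabgd_bound_rhs(eta=[1.0, 0.5], delta=[0.5, 0.5], C=[1, 1], L=[1, 1],
                         U=[[0, 0], [3, 4]], r=1.0, R=5.0, d=2)
```

The sketch makes the trade-off visible: a smaller $\delta_t$ shrinks the $3L_t\delta_t+L_t\alpha_tR$ terms but inflates the $d^2C_t^2/\delta_t^2$ term, while a smaller $\eta_T$ inflates the first two terms.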

Proof.

Similar to the proof by Zhao et al. (2021, Theorem 1), let $\widehat{\ell}_t(\bm{x})=\mathbb{E}_{\bm{v}\in\mathbb{B}}[\ell_t(\bm{x}+\delta_t\bm{v})]$ (where $\mathbb{B}=\{\bm{x}\in\mathbb{R}^d\mid\lVert\bm{x}\rVert\le 1\}$ is the unit ball in $\mathbb{R}^d$) and $\bm{v}_t=(1-\alpha_t)\bm{u}_t\in(1-\alpha_t)\mathcal{X}$, then

$$\sum_{t=1}^{T}(\ell_t(\bm{x}_t)-\ell_t(\bm{u}_t))=\sum_{t=1}^{T}(\widehat{\ell}_t(\bm{y}_t)-\widehat{\ell}_t(\bm{v}_t))+\sum_{t=1}^{T}(\ell_t(\bm{x}_t)-\widehat{\ell}_t(\bm{y}_t))+\sum_{t=1}^{T}(\widehat{\ell}_t(\bm{v}_t)-\ell_t(\bm{u}_t)).$$

According to the original proof, the expectations of the latter two terms are controlled by $\mathbb{E}[\sum_{t=1}^{T}2L_t\delta_t]$ and $\mathbb{E}[\sum_{t=1}^{T}(L_t\delta_t+L_t\alpha_tR)]$, respectively. For the first term, according to Flaxman et al. (2005, Lemma 2.1), $\mathbb{E}_{\bm{s}_t}[\frac{d}{\delta_t}\ell_t(\bm{x}_t)\bm{s}_t]=\nabla\widehat{\ell}_t(\bm{x}_t)$.
Therefore, since $\lVert\frac{d}{\delta_t}\ell_t(\bm{x}_t)\bm{s}_t\rVert\le\frac{d}{\delta_t}C_t$, we get from Lemma C.4 that

$$\mathbb{E}\left[\sum_{t=1}^{T}(\widehat{\ell}_t(\bm{x}_t)-\widehat{\ell}_t(\bm{v}_t))\right]\le\mathbb{E}\left[\frac{7R^2}{4\eta_T}+\frac{P_TR}{\eta_T}+\sum_{t=1}^{T}\frac{\eta_t}{2}\frac{d^2}{\delta_t^2}C_t^2\right],$$

where $\widehat{P}_T=\sum_{t=1}^{T-1}\lVert\bm{v}_t-\bm{v}_{t+1}\rVert$, the path length of $\{\bm{v}_t\}_{t\in[T]}$, satisfies $\widehat{P}_T\le P_T$. This gives our conclusion. ∎
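The one-point gradient estimator of Flaxman et al. used in this proof can be illustrated numerically. The sketch below is a minimal example under stated assumptions: the quadratic test loss, the vector `b`, and the helper names are hypothetical, and the average over the sphere is taken in $d=2$ over equally spaced directions on the circle, which for this quadratic loss reproduces the uniform average up to floating-point error. For this loss the smoothed function differs from the original one only by a constant, so the smoothed gradient at $\bm{y}$ equals $\bm{y}+\bm{b}$.

```python
import numpy as np

def one_point_gradient(loss, y, delta, s):
    """Single-query estimate (d / delta) * loss(y + delta * s) * s for a unit vector s."""
    d = y.shape[0]
    return (d / delta) * loss(y + delta * s) * s

def average_estimate(loss, y, delta, directions):
    """Average the one-point estimate over a collection of unit directions."""
    return np.mean([one_point_gradient(loss, y, delta, s) for s in directions], axis=0)

# Hypothetical test loss: l(z) = 0.5 * ||z||^2 + <b, z>, whose gradient is z + b.
b = np.array([0.3, -0.7])

def quad_loss(z):
    return 0.5 * float(z @ z) + float(b @ z)

# 64 equally spaced directions on the unit circle (d = 2).
angles = 2 * np.pi * np.arange(64) / 64
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)

y = np.array([0.2, 0.1])
estimate = average_estimate(quad_loss, y, delta=0.25, directions=circle)
smoothed_gradient = y + b  # gradient of the smoothed loss for this quadratic
```

Each individual estimate has norm at most $\frac{d}{\delta}\lvert\ell(\cdot)\rvert$, which is the source of the $\frac{d^2}{\delta_t^2}C_t^2$ factor in the bound.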

Lemma C.4 (Guarantee of Projected SGD).

Suppose that $\mathcal{X}$ is bounded by $[r,R]$, the $t$-th loss function $\ell_t$ is bounded by $[-C_t,C_t]$ and is $L$-Lipschitz. Further suppose that a stochastic gradient $\bm{g}_t$ can be calculated in round $t$ such that $\mathbb{E}[\bm{g}_t\mid\bm{x}_1,\ell_1,\ldots,\bm{x}_t,\ell_t]=\nabla\ell_t(\bm{x}_t)$ and $\lVert\bm{g}_t\rVert_2\le C_t$. Then the iteration

$$\bm{x}_{t+1}=\text{Proj}_{\mathcal{X}}\left[\bm{x}_t-\eta_t\bm{g}_t\right]$$

ensures the following dynamic regret guarantee for any fixed $\bm{u}_1,\bm{u}_2,\ldots,\bm{u}_T\in\mathcal{X}$:

$$\sum_{t=1}^{T}(\ell_t(\bm{x}_t)-\ell_t(\bm{u}_t))\le\frac{7R^2}{4\eta_T}+\frac{P_TR}{\eta_T}+\sum_{t=1}^{T}\frac{\eta_t}{2}C_t^2,$$

where $P_T=\sum_{t=1}^{T-1}\lVert\bm{u}_t-\bm{u}_{t+1}\rVert$ is the path length of $\{\bm{u}_t\}_{t=1}^{T}$.
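The iteration in this lemma can be sketched in a few lines of Python. This is a minimal illustration, assuming for simplicity that $\mathcal{X}$ is the Euclidean ball $R\mathbb{B}$ (the lemma only requires $r\mathbb{B}\subseteq\mathcal{X}\subseteq R\mathbb{B}$); the callback `grads`, the noisy linear loss, and the step sizes $\eta_t=1/\sqrt{t}$ are hypothetical choices made for the example.

```python
import numpy as np

def project_ball(x, R):
    """Euclidean projection onto {x : ||x|| <= R}."""
    norm = np.linalg.norm(x)
    return x if norm <= R else x * (R / norm)

def projected_sgd(x1, grads, etas, R):
    """Run x_{t+1} = Proj[x_t - eta_t * g_t] and return x_1, ..., x_{T+1} as rows.

    grads(t, x) is a user-supplied callback returning the stochastic gradient g_t.
    """
    xs = [np.asarray(x1, dtype=float)]
    for t, eta in enumerate(etas):
        g = grads(t, xs[-1])
        xs.append(project_ball(xs[-1] - eta * g, R))
    return np.stack(xs)

# Toy run: linear loss <c, x> observed through a noisy (unbiased) gradient.
rng = np.random.default_rng(0)
c = np.array([1.0, -0.5])
T = 50
etas = [1.0 / np.sqrt(t) for t in range(1, T + 1)]  # decreasing step sizes
iterates = projected_sgd(np.zeros(2), lambda t, x: c + 0.1 * rng.normal(size=2), etas, R=1.0)
```

The step sizes are passed as an explicit sequence so that adaptive, history-dependent choices of $\eta_t$ could be substituted without changing the loop.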

Proof.

We first consider the full-feedback model where the whole $\ell_t$, instead of the single-entry $\ell_t(\bm{x}_t)$, is available. Then the Gradient Descent algorithm $\bm{x}_{t+1}=\text{Proj}_{\mathcal{X}}[\bm{x}_t-\eta_t\nabla\ell_t(\bm{x}_t)]$ enjoys the following dynamic regret guarantee (which follows the proof of Zinkevich (2003, Theorem 2)):

\begin{align*}
&\quad\text{D-Regret}_T(\bm{u}_1,\bm{u}_2,\ldots,\bm{u}_T)=\sum_{t=1}^{T}(\ell_t(\bm{x}_t)-\ell_t(\bm{u}_t))\\
&\le\sum_{t=1}^{T}\left(\frac{1}{2\eta_t}\left(\lVert\bm{x}_t-\bm{u}_t\rVert^2-\lVert\bm{x}_{t+1}-\bm{u}_t\rVert^2\right)+\frac{\eta_t}{2}\lVert\nabla\ell_t(\bm{x}_t)\rVert^2\right)\\
&=\sum_{t=1}^{T}\frac{\lVert\bm{x}_t\rVert^2-\lVert\bm{x}_{t+1}\rVert^2}{2\eta_t}+\sum_{t=1}^{T}\frac{\langle\bm{x}_{t+1}-\bm{x}_t,\bm{u}_t\rangle}{\eta_t}+\sum_{t=1}^{T}\frac{\eta_t}{2}\lVert\nabla\ell_t(\bm{x}_t)\rVert^2\\
&\overset{(a)}{\le}\frac{\lVert\bm{x}_1\rVert^2-\lVert\bm{x}_{T+1}\rVert^2}{2\eta_T}+\frac{\langle\bm{x}_{T+1},\bm{u}_T\rangle}{\eta_T}-\frac{\langle\bm{x}_1,\bm{u}_1\rangle}{\eta_1}+\sum_{t=2}^{T}\frac{\langle\bm{u}_{t-1}-\bm{u}_t,\bm{x}_t\rangle}{\eta_t}+\sum_{t=1}^{T}\frac{\eta_t}{2}\lVert\nabla\ell_t(\bm{x}_t)\rVert^2\\
&\le\frac{7R^2}{4\eta_T}+\frac{P_TR}{\eta_T}+\sum_{t=1}^{T}\frac{\eta_t}{2}\lVert\nabla\ell_t(\bm{x}_t)\rVert^2,\tag{22}
\end{align*}

where (a) uses the property that $\eta_1\ge\eta_2\ge\cdots\ge\eta_T$.
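The first inequality of this derivation rests on a one-step bound: for any $\bm{u}_t\in\mathcal{X}$, non-expansiveness of the projection gives $\langle\nabla\ell_t(\bm{x}_t),\bm{x}_t-\bm{u}_t\rangle\le\frac{1}{2\eta_t}(\lVert\bm{x}_t-\bm{u}_t\rVert^2-\lVert\bm{x}_{t+1}-\bm{u}_t\rVert^2)+\frac{\eta_t}{2}\lVert\nabla\ell_t(\bm{x}_t)\rVert^2$, and convexity bounds the instantaneous regret by the left-hand side. The following minimal Python sketch checks this one-step bound numerically; the choice $\mathcal{X}=R\mathbb{B}$, the random gradients and comparators, and the helper names are assumptions made for illustration.

```python
import numpy as np

def project_ball(x, R):
    """Euclidean projection onto {x : ||x|| <= R}."""
    norm = np.linalg.norm(x)
    return x if norm <= R else x * (R / norm)

def one_step_gap(x, u, g, eta, R):
    """Right-hand side minus left-hand side of the one-step bound (should be >= 0)."""
    x_next = project_ball(x - eta * g, R)
    rhs = (np.sum((x - u) ** 2) - np.sum((x_next - u) ** 2)) / (2 * eta) \
        + 0.5 * eta * np.sum(g ** 2)
    lhs = float(g @ (x - u))
    return rhs - lhs

rng = np.random.default_rng(0)
R = 1.0
x = np.zeros(3)
gaps = []
for t in range(1, 201):
    eta = 1.0 / np.sqrt(t)                   # decreasing step size (illustrative)
    g = rng.normal(size=3)                   # stands in for the gradient at x_t
    u = project_ball(rng.normal(size=3), R)  # arbitrary comparator inside the ball
    gaps.append(one_step_gap(x, u, g, eta, R))
    x = project_ball(x - eta * g, R)
min_gap = min(gaps)  # non-negative up to floating-point error
```

The gap is zero whenever the unprojected point already lies in the ball and strictly positive when the projection is active, so summing the one-step bound over $t$ can only overestimate the linearized regret.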

Moving back to the bandit-feedback model, following the proof of Zhao et al. (2021, Theorem 8), we can consider a loss function defined as $\widetilde{\ell}_t(\bm{x})\triangleq\ell_t(\bm{x})+\langle\bm{x},\bm{g}_t-\nabla\ell_t(\bm{x}_t)\rangle$. As $\nabla\widetilde{\ell}_t(\bm{x}_t)=\bm{g}_t$ and $\mathbb{E}[\widetilde{\ell}_t(\bm{x})]=\mathbb{E}[\ell_t(\bm{x})]$ (which is due to the fact that $\mathbb{E}[\bm{g}_t]=\nabla\ell_t(\bm{x}_t)$), applying Equation 22 gives

\begin{align*}
&\quad\sum_{t=1}^{T}(\ell_t(\bm{x}_t)-\ell_t(\bm{u}_t))=\mathbb{E}\left[\sum_{t=1}^{T}(\widetilde{\ell}_t(\bm{x}_t)-\widetilde{\ell}_t(\bm{u}_t))\right]\\
&\le\mathbb{E}\left[\frac{7R^2}{4\eta_T}+\frac{P_TR}{\eta_T}+\sum_{t=1}^{T}\frac{\eta_t}{2}\lVert\nabla\widetilde{\ell}_t(\bm{x}_t)\rVert^2\right]\overset{(b)}{\le}\frac{7R^2}{4\eta_T}+\frac{P_TR}{\eta_T}+\sum_{t=1}^{T}\frac{\eta_t}{2}C_t^2,
\end{align*}

where the expectation is taken w.r.t. the randomness in the stochastic gradients $\{\bm{g}_t\}_{t\in[T]}$, and (b) makes use of the fact that $\nabla\widetilde{\ell}_t(\bm{x}_t)=\bm{g}_t$ together with the assumption $\lVert\bm{g}_t\rVert_2\le C_t$. ∎
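The surrogate-loss construction in this proof can be checked numerically: adding the linear term $\langle\bm{x},\bm{g}_t-\nabla\ell_t(\bm{x}_t)\rangle$ makes the gradient at $\bm{x}_t$ equal to $\bm{g}_t$ exactly. The sketch below is a minimal illustration with a hypothetical quadratic loss and hand-picked vectors; the helper names and the finite-difference check are assumptions of the example, not part of the paper.

```python
import numpy as np

def make_surrogate(loss, grad_at_xt, g_t):
    """Return the surrogate x -> loss(x) + <x, g_t - grad loss(x_t)>."""
    shift = g_t - grad_at_xt
    return lambda x: loss(x) + float(x @ shift)

def numerical_gradient(f, x, h=1e-5):
    """Central finite differences, used here only to inspect the surrogate."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

def sq_loss(x):
    """Hypothetical loss l_t(x) = 0.5 * ||x||^2, whose gradient at x is x."""
    return 0.5 * float(x @ x)

x_t = np.array([0.4, -0.2, 0.1])
g_t = np.array([1.0, 0.5, -0.3])  # stands in for a stochastic gradient estimate
surrogate = make_surrogate(sq_loss, grad_at_xt=x_t, g_t=g_t)
surrogate_grad = numerical_gradient(surrogate, x_t)
```

Because the added term is linear in $\bm{x}$, the surrogate inherits convexity from $\ell_t$, which is why the full-feedback bound in Equation 22 can be applied to it directly.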

C.4 Deciding $\bm{\lambda}(t)$ via AdaBGD Algorithm (Proof of Theorem 4.4)

Theorem C.5 (Restatement of Theorem 4.4; Deciding $\bm{\lambda}(t)$ via AdaBGD Algorithm).

For the reference arrival rates $\{\mathring{\bm{\lambda}}(t)\}_{t\in[T]}$ defined in 2, suppose that their path length satisfies

$$P_t^{\lambda}\triangleq\sum_{s=1}^{t-1}\lVert\mathring{\bm{\lambda}}(s+1)-\mathring{\bm{\lambda}}(s)\rVert_1\le C^{\lambda}t^{1/2-\delta_{\lambda}},\quad\forall t=1,2,\ldots,T,$$

where, similar to Theorem 3.6, $C^{\lambda}$ and $\delta_{\lambda}$ are assumed to be known constants, while neither the precise $P_t^{\lambda}$ nor $\{\mathring{\bm{\lambda}}(t)\}_{t\in[T]}$ is known. Suppose that the action set $\Lambda$ is bounded by $[r,R]$ (i.e., $r\mathbb{B}\subseteq\Lambda\subseteq R\mathbb{B}$). If we execute AdaBGD (Algorithm 4) over $\Lambda$ with loss functions $\ell_t(\bm{\lambda})=\langle\bm{Q}(t),\bm{\lambda}\rangle-Vg_t(\bm{\lambda})$ and parameters $\eta_t,\delta_t,\alpha_t$ defined as

$$\begin{aligned}
\eta_{t}&=\left(C^{\lambda}T^{1/2-\delta_{\lambda}}\middle/\begin{subarray}{c}\left(C^{\lambda}T^{1/2-\delta_{\lambda}}\right)^{7/3}\left(4r^{-3}d^{2}\right)^{28/9}\left(M+R\right)^{4/3}+\\ C^{\lambda}T^{1/2-\delta_{\lambda}}(r^{-3}d^{2}VG^{2}/L)^{4/3}+\\ \sum_{s=1}^{t}\left((\lVert\bm{q}_{s}\rVert_{\infty}+VG)^{2}(\lVert\bm{q}_{s}\rVert_{2}+VL)^{2}\right)^{1/3}\end{subarray}\right)^{3/4},\\
\delta_{t}&=\left(\eta_{t}d^{2}\frac{(\lVert\bm{Q}(t)\rVert_{\infty}+VG)^{2}}{(\lVert\bm{Q}(t)\rVert_{2}+VL)}\right)^{1/3},\quad\alpha_{t}=\frac{\delta_{t}}{r},
\end{aligned}\tag{23}$$

its outputs $\bm{\lambda}(1),\bm{\lambda}(2),\ldots,\bm{\lambda}(T)\in\Lambda$ ensure

Proof.

The loss function $\ell_{t}(\bm{\lambda})=\langle\bm{Q}(t),\bm{\lambda}\rangle-Vg_{t}(\bm{\lambda})$ is bounded by $C_{t}\triangleq\lVert\bm{Q}(t)\rVert_{\infty}+VG$ and is $L_{t}\triangleq(\lVert\bm{Q}(t)\rVert_{2}+VL)$-Lipschitz. As $\bm{Q}(t)$ is revealed after the $(t-1)$-th round, $C_{t}$ and $L_{t}$ are both $\mathcal{F}_{t-1}$-measurable. As sketched in the main text, we first regard $\eta_{t}$ as a constant and tune $\delta_{t}$ to minimize the summation term in Lemma 4.3.

Let $\delta_{t}=(\eta_{t}d^{2}C_{t}^{2}/L_{t})^{1/3}$ and $\alpha_{t}=\delta_{t}/r$. If all conditions in Lemma 4.3 hold, then

$$\begin{aligned}
&\mathbb{E}\left[\sum_{t=1}^{T}(\ell_{t}(\bm{\lambda}_{t})-\ell_{t}(\bm{\lambda}_{t}^{\ast}))\right]\leq\mathbb{E}\left[\frac{7R^{2}+4P_{T}R}{4\eta_{T}}+\sum_{t=1}^{T}\left(\eta_{t}\frac{d^{2}}{\delta_{t}^{2}}C_{t}^{2}+3L_{t}\delta_{t}+L_{t}\alpha_{t}R\right)\right]\\
&\quad=\mathcal{O}\left(\mathbb{E}\left[\frac{R^{2}+C^{r}T^{1/2-\delta_{r}}R}{\eta_{T}}+\sum_{t=1}^{T}\left(\eta_{t}d^{2}(\lVert\bm{Q}(t)\rVert_{\infty}+VG)^{2}(\lVert\bm{Q}(t)\rVert_{2}+VL)^{2}\right)^{1/3}\frac{R}{r}\right]\right).
\end{aligned}\tag{24}$$
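The simplification inside the summation of Equation 24 can be spelled out as follows. This is our own expansion, using only the definitions $\delta_{t}=(\eta_{t}d^{2}C_{t}^{2}/L_{t})^{1/3}$ and $\alpha_{t}=\delta_{t}/r$ together with $r\leq R$ (which follows from $r\mathbb{B}\subseteq\Lambda\subseteq R\mathbb{B}$); the explicit constant $5$ is not claimed by the original text, which only states the $\mathcal{O}(\cdot)$ form.

```latex
\begin{align*}
\eta_{t}\frac{d^{2}}{\delta_{t}^{2}}C_{t}^{2}
  &= \eta_{t}d^{2}C_{t}^{2}\left(\frac{\eta_{t}d^{2}C_{t}^{2}}{L_{t}}\right)^{-2/3}
   = \left(\eta_{t}d^{2}C_{t}^{2}L_{t}^{2}\right)^{1/3},\\
L_{t}\delta_{t}
  &= L_{t}\left(\frac{\eta_{t}d^{2}C_{t}^{2}}{L_{t}}\right)^{1/3}
   = \left(\eta_{t}d^{2}C_{t}^{2}L_{t}^{2}\right)^{1/3},\\
L_{t}\alpha_{t}R
  &= \frac{R}{r}L_{t}\delta_{t}
   = \frac{R}{r}\left(\eta_{t}d^{2}C_{t}^{2}L_{t}^{2}\right)^{1/3},\\
\eta_{t}\frac{d^{2}}{\delta_{t}^{2}}C_{t}^{2}+3L_{t}\delta_{t}+L_{t}\alpha_{t}R
  &= \left(4+\frac{R}{r}\right)\left(\eta_{t}d^{2}C_{t}^{2}L_{t}^{2}\right)^{1/3}
   \leq \frac{5R}{r}\left(\eta_{t}d^{2}C_{t}^{2}L_{t}^{2}\right)^{1/3}.
\end{align*}
```

In other words, this choice of $\delta_{t}$ equalizes the variance-type term $\eta_{t}d^{2}C_{t}^{2}/\delta_{t}^{2}$ and the bias-type term $L_{t}\delta_{t}$, which is the usual way such a trade-off is tuned.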

We first keep only the last term in the denominator of $\eta_{t}$ in Equation 23, i.e., let

$$\eta_{t}=\left(C^{r}T^{1/2-\delta_{r}}\middle/\sum_{s=1}^{t}\left((\lVert\bm{q}_{s}\rVert_{\infty}+VG)^{2}(\lVert\bm{q}_{s}\rVert_{2}+VL)^{2}\right)^{1/3}\right)^{3/4},\tag{25}$$

then we have $\eta_{1}>\eta_{2}>\cdots>\eta_{T}$. For the moment, let us pretend that the other condition of Lemma 4.3, namely $\alpha_{t}<1$, also holds. Lemma D.1 reveals that if $x_{1},x_{2},\ldots,x_{T}\geq 0$, then $\left.\sum_{t=1}^{T}x_{t}\middle/(\sum_{s\leq t}x_{s})^{1/4}\right.=\mathcal{O}\left(\left(\sum_{t=1}^{T}x_{t}\right)^{3/4}\right)$. Plugging in $x_{t}=\left((\lVert\bm{q}_{t}\rVert_{\infty}+VG)^{2}(\lVert\bm{q}_{t}\rVert_{2}+VL)^{2}\right)^{1/3}$,

$$\begin{aligned}
&\mathbb{E}\left[\sum_{t=1}^{T}(\ell_{t}(\bm{\lambda}_{t})-\ell_{t}(\bm{\lambda}_{t}^{\ast}))\right]\\
&\quad=\mathcal{O}\left(\mathbb{E}\left[(C^{r}T^{1/2-\delta_{r}})^{1/4}R\left(\sum_{t=1}^{T}\left((\lVert\bm{Q}(t)\rVert_{\infty}+VG)^{2}(\lVert\bm{Q}(t)\rVert_{2}+VL)^{2}\right)^{1/3}\right)^{3/4}\right]\right.+\\
&\qquad\left.\mathbb{E}\left[\frac{R}{r}\left(C^{r}T^{1/2-\delta_{r}}\right)^{1/4}d^{2/3}\left(\sum_{t=1}^{T}\left((\lVert\bm{Q}(t)\rVert_{\infty}+VG)^{2}(\lVert\bm{Q}(t)\rVert_{2}+VL)^{2}\right)^{1/3}\right)^{3/4}\right]\right)\\
&\quad=\mathcal{O}\left(\mathbb{E}\left[\left(\frac{R}{r}d^{2/3}+R\right)(C^{r}T^{1/2-\delta_{r}})^{1/4}\left(\sum_{t=1}^{T}\left(\lVert\bm{Q}(t)\rVert_{2}+V(L+G)\right)^{4/3}\right)^{3/4}\right]\right),
\end{aligned}$$

where the last step utilizes $\lVert\bm{Q}(t)\rVert_{\infty}\leq\lVert\bm{Q}(t)\rVert_{2}$.
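The summation bound quoted from Lemma D.1 can be sanity-checked numerically. The sketch below is illustrative only: the helper name `lemma_d1_ratio`, the random test sequences, and the explicit constant $4/3$ (obtained by comparing the sum with $\int_{0}^{S_T}u^{-1/4}\,du$, where $S_t=\sum_{s\leq t}x_s$) are our assumptions; the text above only uses the $\mathcal{O}(\cdot)$ form.

```python
import random


def lemma_d1_ratio(xs):
    """Return (sum_t x_t / S_t^{1/4}) / S_T^{3/4}, where S_t = x_1 + ... + x_t.

    Returns 0.0 for an all-zero sequence, where both sides vanish.
    """
    prefix = 0.0
    lhs = 0.0
    for x in xs:
        prefix += x
        if prefix > 0.0:  # leading zeros contribute nothing to the sum
            lhs += x / prefix ** 0.25
    return lhs / prefix ** 0.75 if prefix > 0.0 else 0.0


random.seed(0)
ratios = []
for _ in range(200):
    horizon = random.randint(1, 500)
    scale = 10 ** random.uniform(-3, 3)  # vary magnitudes, like growing queues
    xs = [scale * random.random() for _ in range(horizon)]
    ratios.append(lemma_d1_ratio(xs))

worst = max(ratios)
print(f"largest observed ratio: {worst:.4f} (integral comparison gives at most 4/3)")
```

The ratio is scale-invariant, which is why the same bound applies regardless of how large the queue-dependent terms $x_t$ become.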

This almost recovers our conclusion, so it only remains to ensure $\alpha_{t}=\delta_{t}/r<1$, which is equivalent to $\eta_{t}d^{2}C_{t}^{2}/L_{t}<r^{3}$, i.e.,

$$\eta_{t}^{-1}>r^{-3}d^{2}\frac{(\lVert\bm{Q}(t)\rVert_{\infty}+VG)^{2}}{\lVert\bm{Q}(t)\rVert_{2}+VL}.$$

Consider adding a term $X$ into the denominator of Equation 25. As $\lVert\bm{Q}(t)\rVert_{\infty}\leq\lVert\bm{Q}(t)\rVert_{2}$, $(\lVert\bm{Q}(t)\rVert_{\infty}+VG)^{2}/(\lVert\bm{Q}(t)\rVert_{2}+VL)\leq 2(\lVert\bm{Q}(t)\rVert_{\infty}^{2}/\lVert\bm{Q}(t)\rVert_{2})+2((VG)^{2}/(VL))\leq 2\lVert\bm{Q}(t)\rVert_{\infty}+2VG^{2}/L$. So we only need to show

$$\frac{\left(X+\sum_{s=1}^{t}\left((\lVert\bm{q}_{s}\rVert_{\infty}+VG)^{2}(\lVert\bm{q}_{s}\rVert_{2}+VL)^{2}\right)^{1/3}\right)^{3/4}}{(C^{r}T^{1/2-\delta_{r}})^{3/4}}>2r^{-3}d^{2}\left(\lVert\bm{Q}(t)\rVert_{\infty}+V\frac{G^{2}}{L}\right).$$

We decompose $X$ into $X_{1}$ and $X_{2}$ and use them to cancel the two terms on the RHS, respectively. That is, we want

$$\begin{aligned}
X_{1}+\sum_{s=1}^{t}\left(\lVert\bm{q}_{s}\rVert_{\infty}^{2}\lVert\bm{q}_{s}\rVert_{2}^{2}\right)^{1/3}&>C^{r}T^{1/2-\delta_{r}}\left(r^{-3}d^{2}\lVert\bm{Q}(t)\rVert_{\infty}\right)^{4/3},\\
X_{2}+t^{3/4}(V^{4}G^{2}L^{2})^{1/4}&>C^{r}T^{1/2-\delta_{r}}\left(r^{-3}d^{2}VG^{2}/L\right)^{4/3}.
\end{aligned}$$

We first craft $X_{1}$. In Lemma D.2, we show that if $x_{1}=0$, $x_{2},\ldots,x_{T}\geq 0$, and $\lvert x_{t+1}-x_{t}\rvert\leq 1$, then $\sum_{t=1}^{T}x_{t}^{4/3}\geq(x_{T}/4)^{7/3}$. Thus, using the fact that $\lVert\bm{Q}(t)\rVert_{\infty}\leq\lVert\bm{Q}(t)\rVert_{2}$ and the queue length increment bound $\lvert\lVert\bm{Q}(t+1)\rVert_{\infty}-\lVert\bm{Q}(t)\rVert_{\infty}\rvert\leq(2NM+R)$ (Lemma B.1), we can let $x_{t}=\lVert\bm{Q}(t)\rVert_{\infty}/(2NM+R)$ and lower bound the LHS by

$$X_{1}+\sum_{s=1}^{t}\left(\lVert\bm{q}_{s}\rVert_{\infty}^{2}\lVert\bm{q}_{s}\rVert_{2}^{2}\right)^{1/3}\geq X_{1}+\frac{\left(\lVert\bm{Q}(t)\rVert_{\infty}/4\right)^{7/3}}{(2NM+R)}\geq X_{1}^{3/7}\left(\frac{\left(\lVert\bm{Q}(t)\rVert_{\infty}/4\right)^{7/3}}{(2NM+R)}\right)^{4/7},$$

where the last inequality follows from the weighted AM-GM inequality $\frac{ax+by}{a+b}\geq\sqrt[a+b]{x^{a}y^{b}}$ (applied with $a=3$, $b=4$). Therefore, we only need to ensure $X_{1}^{3/7}\geq C^{r}T^{1/2-\delta_{r}}(4r^{-3}d^{2})^{4/3}(2NM+R)^{4/7}$. Setting $X_{1}$ as follows then suffices:

$$X_{1}=\left(C^{r}T^{1/2-\delta_{r}}\right)^{7/3}\left(4r^{-3}d^{2}\right)^{28/9}\left((2NM+R)\right)^{4/3}.$$
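Both ingredients of this step can be checked with a short script. It is a sketch under our own assumptions: the helper names `lemma_d2_holds` and `random_walk` are hypothetical, the random walks stand in for the rescaled queue lengths $x_{t}=\lVert\bm{Q}(t)\rVert_{\infty}/(2NM+R)$, and a numerical check is of course not a proof of Lemma D.2. The exact-arithmetic part confirms that raising the chosen $X_{1}$ to the power $3/7$ reproduces the exponents $1$, $4/3$, and $4/7$ required above.

```python
import random
from fractions import Fraction


def lemma_d2_holds(xs):
    """Check sum_t x_t^{4/3} >= (x_T / 4)^{7/3} for one sequence."""
    return sum(x ** (4 / 3) for x in xs) >= (xs[-1] / 4) ** (7 / 3)


def random_walk(horizon, rng):
    """Nonnegative sequence with x_1 = 0 and |x_{t+1} - x_t| <= 1."""
    xs = [0.0]
    for _ in range(horizon - 1):
        # Clipping at zero never enlarges the step, so the increment bound holds.
        xs.append(max(0.0, xs[-1] + rng.uniform(-1.0, 1.0)))
    return xs


rng = random.Random(0)
walks_ok = all(lemma_d2_holds(random_walk(rng.randint(1, 400), rng)) for _ in range(300))
# Steepest admissible ramp: x_t = t - 1, i.e. the queue grows as fast as allowed.
ramps_ok = all(lemma_d2_holds([float(k) for k in range(n)]) for n in range(1, 200))

# Exponents of X_1 = (C T^{1/2-delta})^{7/3} (4 r^{-3} d^2)^{28/9} (2NM+R)^{4/3},
# each multiplied by 3/7 to obtain the exponents of X_1^{3/7}.
x1_exponents = [Fraction(7, 3), Fraction(28, 9), Fraction(4, 3)]
scaled = [e * Fraction(3, 7) for e in x1_exponents]
print(walks_ok, ramps_ok, scaled)
```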

For $X_{2}$, we only need to set $X_{2}=C^{r}T^{1/2-\delta_{r}}(r^{-3}d^{2}VG^{2}/L)^{4/3}$. Plugging back $X=X_{1}+X_{2}$, we get the learning rate scheduling defined in Equation 23. Now we verify that the two terms on the RHS of Equation 24 do not increase too much due to $X$. For the first term,

$$\begin{aligned}
&\mathbb{E}\left[\frac{R^{2}+C^{r}T^{1/2-\delta_{r}}R}{\eta_{T}}\right]\\
&\quad=\mathbb{E}\left[\frac{R^{2}+C^{r}T^{1/2-\delta_{r}}R}{(C^{r}T^{1/2-\delta_{r}})^{3/4}}\left(X+\sum_{t=1}^{T}\left((\lVert\bm{Q}(t)\rVert_{\infty}+VG)^{2}(\lVert\bm{Q}(t)\rVert_{2}+VL)^{2}\right)^{1/3}\right)^{3/4}\right]\\
&\quad=\mathcal{O}\left(\mathbb{E}\left[\left(C^{r}T^{1/2-\delta_{r}}R\right)\left(\frac{X}{C^{r}T^{1/2-\delta_{r}}}\right)^{3/4}+(C^{r}T^{1/2-\delta_{r}})^{1/4}R\left(\sum_{t=1}^{T}\left(\lVert\bm{Q}(t)\rVert_{2}+V(L+G)\right)^{4/3}\right)^{3/4}\right]\right)\\
&\quad=\mathcal{O}\left(\mathbb{E}\left[\frac{R(2NM+R)}{r^{7}}d^{14/3}(C^{r}T^{1/2-\delta_{r}})^{2}+R(C^{r}T^{1/2-\delta_{r}})^{1/4}\left(\sum_{s=1}^{T}\left(\lVert\bm{q}_{s}\rVert_{2}+V(L+G)\right)^{4/3}\right)^{3/4}\right]\right).
\end{aligned}$$

For the second term, as $\eta_{t}$ is strictly smaller than that in Equation 25, we can again apply Lemma D.1 (if $x_{1},x_{2},\ldots,x_{T}\geq 0$, then $\sum_{t=1}^{T}x_{t}\big/\left(\sum_{s\leq t}x_{s}\right)^{1/4}=\mathcal{O}\left(\left(\sum_{t=1}^{T}x_{t}\right)^{3/4}\right)$) to conclude

\begin{align*}
&\quad \mathbb{E}\left[\sum_{t=1}^{T}\left(\eta_{t}d^{2}(\lVert\bm{Q}(t)\rVert_{\infty}+VG)^{2}(\lVert\bm{Q}(t)\rVert_{2}+VL)^{2}\right)^{1/3}\frac{R}{r}\right] \\
&= \mathbb{E}\left[\frac{R}{r}d^{2/3}(C^{r}T^{1/2-\delta_{r}})^{1/4}\left(\sum_{t=1}^{T}\left(\lVert\bm{Q}(t)\rVert_{2}+V(L+G)\right)^{4/3}\right)^{3/4}\right].
\end{align*}

Summing up the two parts gives our conclusion. ∎
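The Lemma D.1 inequality invoked in the proof above can be sanity-checked numerically. The sketch below is not part of the paper: the function names are hypothetical, and the constant 4/3 is an assumption obtained from the integral comparison $\sum_t x_t / S_t^{1/4} \le \int_0^{S_T} s^{-1/4}\,ds$ with $S_t=\sum_{s\le t}x_s$, rather than a constant stated in the source.

```python
import random

def lemma_d1_sides(xs):
    """Return (lhs, rhs) where
    lhs = sum_t x_t / (sum_{s<=t} x_s)^{1/4} and rhs = (sum_t x_t)^{3/4}.
    Terms whose prefix sum is still zero contribute nothing and are skipped."""
    prefix = 0.0
    lhs = 0.0
    for x in xs:
        prefix += x
        if prefix > 0.0:
            lhs += x / prefix ** 0.25
    return lhs, prefix ** 0.75

def worst_ratio(trials=200, length=500, seed=0):
    """Largest observed lhs/rhs over random nonnegative sequences
    (illustrative sampling scheme, chosen only for this check)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        scale = 10 ** rng.uniform(-3, 3)  # vary magnitudes across trials
        xs = [scale * rng.random() for _ in range(length)]
        lhs, rhs = lemma_d1_sides(xs)
        if rhs > 0.0:
            worst = max(worst, lhs / rhs)
    return worst
```

Under the integral-comparison assumption, `worst_ratio()` should never exceed 4/3, which is consistent with the $\mathcal{O}(\cdot)$ statement of the lemma.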

C.5 Main Theorem for Multi-Hop Utility Maximization (Proof of Theorem 4.5)

Theorem C.6 (Restatement of Theorem 4.5; Main Theorem for Multi-Hop Utility Maximization).

Suppose that the feasible set of arrival rate vectors $\Lambda$ is bounded by $[r,R]$. Assume all (unknown) utility functions $g_{t}$ to be concave, $L$-Lipschitz, and $[-G,G]$-bounded. Consider a reference action sequence $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ satisfying 2, such that their path lengths satisfy

\begin{equation*}
P_{t}^{a}\triangleq\sum_{s=1}^{t-1}\lVert\mathring{\bm{a}}(s)-\mathring{\bm{a}}(s+1)\rVert_{1}\leq C^{a}t^{1/2-\delta_{a}},\quad P_{t}^{\lambda}\triangleq\sum_{s=1}^{t-1}\lVert\mathring{\bm{\lambda}}(s)-\mathring{\bm{\lambda}}(s+1)\rVert_{1}\leq C^{\lambda}t^{1/2-\delta_{\lambda}},\quad\forall t\in[T].
\end{equation*}

Here, $M,R,r,L,G,C^{a},\delta_{a},C^{\lambda},\delta_{\lambda}$ are assumed to be known constants, whereas the specific $\{(\mathring{\bm{a}}(t),\mathring{\bm{\lambda}}(t))\}_{t\in[T]}$ remains unknown. If we execute the UMO2 framework in Algorithm 3 with the AdaPFOL sub-routine given in Algorithm 2 and the AdaBGD sub-routine given in Algorithm 4, when $T$ is large enough such that $V=o_{T}(\min\{T^{2\delta_{a}/3},T^{2\delta_{\lambda}/7}\})$, the following inequalities hold simultaneously:

That is, when $T\gg 0$, our algorithm not only stabilizes the system so that $\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]=\mathcal{O}_{T}(1)$, but also enjoys an average utility approaching that of the reference policy polynomially fast, i.e., $\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\left(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\right)\right]=\mathcal{O}_{T}(V^{-1})$, so that the utility maximization objective Equation 3 is ensured.
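To make the path-length condition in the theorem concrete, the following sketch computes $P_t$ for a given sequence and checks it against $C\,t^{1/2-\delta}$. It is an illustration only: the function names, the list-of-lists representation of the reference sequence, and the small floating-point tolerance are assumptions, not part of the paper.

```python
def path_lengths(seq):
    """Return [P_1, ..., P_T] with P_t = sum_{s=1}^{t-1} ||x(s) - x(s+1)||_1.
    `seq` is a list of equal-length numeric vectors; P_1 is 0 by definition."""
    if not seq:
        return []
    lengths = [0.0]
    for prev, cur in zip(seq, seq[1:]):
        step = sum(abs(a - b) for a, b in zip(prev, cur))  # l1 distance
        lengths.append(lengths[-1] + step)
    return lengths

def is_mildly_varying(seq, C, delta):
    """Check P_t <= C * t^(1/2 - delta) for every t in [T]."""
    return all(
        p <= C * t ** (0.5 - delta) + 1e-12  # tolerance for rounding only
        for t, p in enumerate(path_lengths(seq), start=1)
    )
```

For instance, a sequence that changes once and then stays fixed has a bounded path length, so it satisfies the condition for any $\delta\le 1/2$ once $C$ is at least the size of that single change.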

Proof.

As sketched in the main text, the first step is to i) combine algorithmic guarantees for AdaPFOL (Theorem 3.5) and AdaBGD (Theorem 4.4), ii) plug in the network stability assumption Lemma 4.1, and iii) make use of the Lyapunov DPP analysis in Lemma C.2. Deferring these calculations to Lemma C.7, we can get

\begin{align*}
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right] &\leq -\frac{V}{\epsilon_{W}}\mathbb{E}\left[\sum_{t=1}^{T}\bigg(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\bigg)\right]+f(T)+{} \\
&\quad g(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{3/4}\log\mathbb{E}\left[M\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]+{} \\
&\quad h(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{7/8}. \tag{26}
\end{align*}

where

\begin{align*}
f(T) &= \epsilon_{W}^{-1}\mathcal{O}\left((N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R))C_{W}T+\frac{R(2NM+R)}{r^{7}}d^{14/3}(C^{\lambda}T^{1/2-\delta_{\lambda}})^{2}+{}\right. \\
&\qquad\qquad \left.\left(\frac{R}{r}d^{2/3}+R\right)(C^{r}T^{1/2-\delta_{r}})^{1/4}V(L+G)T^{3/4}+\frac{1}{2}N^{2}((NM)^{2}+2(NM)^{2}+2R^{2})T\right), \\
g(T) &= \epsilon_{W}^{-1}\mathcal{O}\left((2NM+R)^{1/4}M\sqrt{1+C^{a}T^{1/2-\delta_{a}}}\log T\right), \\
h(T) &= \epsilon_{W}^{-1}\mathcal{O}\left((2NM+R)^{1/8}\left(\frac{R}{r}d^{2/3}+R\right)(C^{\lambda}T^{1/2-\delta_{\lambda}})^{1/4}\right).
\end{align*}

Step 1 (Develop a Coarse Average Queue Length Bound).

Recall the assumption that $g_{t}$ is uniformly bounded by $[-G,G]$. Therefore, the first term on the RHS of Equation 26 is bounded by $2\frac{V}{\epsilon_{W}}GT$ in absolute value. In Lemma D.6, we develop a self-bounding property that says, if $y\leq f+y^{3/4}g\log y+y^{7/8}h$, then $y=\mathcal{O}\left(f+g^{4}\log^{8}\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h^{8}\right)$. Therefore, applying it to Equation 26, we have

\begin{align*}
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right] &\leq \mathcal{O}\left(2\frac{V}{\epsilon_{W}}GT+f(T)+g(T)^{4}\log^{8}\left(2(f(T)^{1/8}+g(T)^{1/2}+h(T))^{2}\right)+h(T)^{8}\right) \\
&= \mathcal{O}\left(\frac{V}{\epsilon_{W}}GT+f(T)\right)+\mathcal{O}_{T}\left(T^{1-2\delta_{a}}\log^{8}\left(T^{1/4}+T^{1/4-\delta_{a}/2}+T^{1/4-\delta_{\lambda}/2}\right)+T^{1-2\delta_{\lambda}}\right) \\
&= \mathcal{O}\left(\frac{V}{\epsilon_{W}}GT+f(T)\right)+o_{T}(1). \tag{27}
\end{align*}

As mentioned in the proof sketch of this theorem, this only gives a $\frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]=\mathcal{O}_{T}(V)$ bound on the average queue length, which falls short of the system stability condition Equation 2. However, this inequality can be used to derive the polynomial convergence result on the utility, which in turn refines the average queue length bound.
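The self-bounding step used in Step 1 can be explored numerically. The sketch below is only an illustration under stated assumptions: it reads the premise as $y\le f+g\,y^{3/4}\log y+h\,y^{7/8}$, restricts attention to $y\ge 1$, and finds the largest feasible $y$ on a geometric grid. The function names, the grid ratio, and the cap `y_max` are hypothetical choices, and the hidden constant in the $\mathcal{O}(\cdot)$ of Lemma D.6 is not given in this passage, so the code reports both quantities without asserting a specific ratio.

```python
import math

def largest_feasible_y(f, g, h, y_max=1e30, ratio=1.01):
    """Largest grid point y in [1, y_max] satisfying
    y <= f + g * y^(3/4) * log(y) + h * y^(7/8); None if no grid point does."""
    best = None
    y = 1.0
    while y <= y_max:
        if y <= f + g * y ** 0.75 * math.log(y) + h * y ** 0.875:
            best = y
        y *= ratio  # geometric grid keeps the scan short
    return best

def self_bounding_rhs(f, g, h):
    """The expression f + g^4 * log^8(2 (f^(1/8) + g^(1/2) + h)^2) + h^8
    for nonnegative f, g, h (not all zero)."""
    inner = 2.0 * (f ** 0.125 + g ** 0.5 + h) ** 2
    return f + g ** 4 * math.log(inner) ** 8 + h ** 8
```

Comparing `largest_feasible_y(f, g, h)` with `self_bounding_rhs(f, g, h)` over a range of inputs gives a rough feel for how the three terms $f$, $g^{4}\log^{8}(\cdot)$, and $h^{8}$ each dominate in different regimes.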

Step 2 (Yield Polynomial Convergence on the Utility).

Moving the difference in the average utility in Equation 26 to the LHS, we have

\begin{align*}
\frac{V}{\epsilon_{W}}\mathbb{E}\left[\sum_{t=1}^{T}\bigg(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\bigg)\right] &\leq -\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]+f(T)+{} \\
&\quad g(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{3/4}\log\mathbb{E}\left[M\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]+{} \\
&\quad h(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{7/8}.
\end{align*}

Plugging in the just-derived bound on average queue length, namely Equation 27, we have

\begin{align*}
&\quad \frac{V}{\epsilon_{W}}\mathbb{E}\left[\sum_{t=1}^{T}\bigg(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\bigg)\right] \\
&\leq 0+f(T)+g(T)\mathcal{O}\left(\left(\frac{V}{\epsilon_{W}}GT+f(T)\right)^{3/4}\log\left(\frac{V}{\epsilon_{W}}GT+f(T)\right)+h(T)\left(\frac{V}{\epsilon_{W}}GT+f(T)\right)^{7/8}\right) \\
&= f(T)+\mathcal{O}_{T}\left(T^{1/4-\delta_{a}/2}(VT)^{3/4}\log(VT)+T^{1/8-\delta_{\lambda}/4}(VT)^{7/8}\right).
\end{align*}

According to the assumption that $V=o_{T}(\min\{T^{2\delta_{a}/3},T^{2\delta_{\lambda}/7}\})$, the second term on the RHS is of order $o_{T}(T)$. Therefore, we have

\begin{align*}
&\quad \frac{1}{T}\mathbb{E}\left[\sum_{t=1}^{T}\left(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\right)\right]=\frac{\epsilon_{W}}{VT}f(T)+\frac{\epsilon_{W}}{VT}o_{T}(T) \\
&= (VT)^{-1}\mathcal{O}\left((N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R))C_{W}T+\frac{R(2NM+R)}{r^{7}}d^{14/3}(C^{\lambda}T^{1/2-\delta_{\lambda}})^{2}+{}\right. \\
&\qquad\qquad\qquad \left.\left(\frac{R}{r}d^{2/3}+R\right)(C^{r}T^{1/2-\delta_{r}})^{1/4}V(L+G)T^{3/4}+\frac{1}{2}N^{2}((NM)^{2}+2(NM)^{2}+2R^{2})T\right)+o_{T}(V^{-1}) \\
&= \mathcal{O}\left(\frac{(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R))C_{W}+(N^{4}M^{2}+N^{2}R^{2})}{V}\right)+o_{T}(V^{-1}). \tag{28}
\end{align*}

The second conclusion of this theorem follows.

Step 3 (Refine the Average Queue Length Bound).

We can now refine the average queue length bound. Instead of controlling the utility term through the uniform boundedness assumption $g_{t}\in[-G,G]$, we use the convergence result just derived in Equation 28.

Specifically, again applying the self-bounding property in Lemma D.6 to Equation 26 but instead replacing the first term on the RHS with Equation 28, we get

\begin{align*}
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]
&\leq \mathcal{O}\left(f(T)+o_{T}(T)+f(T)+g(T)^{4}\log^{8}\left(2\big(f(T)^{1/8}+g(T)^{1/2}+h(T)\big)^{2}\right)+h(T)^{8}\right) \\
&= \mathcal{O}(f(T))+\mathcal{O}\left(g(T)^{4}\log^{8}\left(2\big(f(T)^{1/8}+g(T)^{1/2}+h(T)\big)^{2}\right)+h(T)^{8}\right)+o_{T}(T) \\
&= \mathcal{O}\left(\frac{\big(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R)\big)C_{W}+(N^{4}M^{2}+N^{2}R^{2})}{\epsilon_{W}}\,T\right)+o_{T}(T),
\end{align*}

which gives our first conclusion as well. ∎
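The self-bounding step can be illustrated numerically. The sketch below is not the statement of Lemma D.6 (which is not reproduced here); it only takes a scalar inequality of the shape appearing in Lemma C.7, namely $x \le f + g\,x^{3/4}\log(Mx) + h\,x^{7/8}$, and finds the largest $x$ that satisfies it. The function names `rhs` and `largest_feasible`, the bisection approach, and the restriction $Mf \ge 15$ (which keeps the right-hand side concave on the search range) are assumptions made for this illustration.

```python
import math


def rhs(x, f, g, h, M):
    """Right-hand side f + g*x^(3/4)*log(M*x) + h*x^(7/8) of the inequality."""
    return f + g * x ** 0.75 * math.log(M * x) + h * x ** 0.875


def largest_feasible(f, g, h, M, iters=200):
    """Largest x with x <= rhs(x), found by doubling followed by bisection.

    Illustrative assumption: f > 0, g, h >= 0 and M*f >= 15, so that rhs is
    concave on [f, inf) and rhs(x) - x has a single sign change there.
    """
    if f <= 0 or min(g, h) < 0 or M * f < 15.0:
        raise ValueError("sketch assumes f > 0, g, h >= 0 and M*f >= 15")
    lo = hi = float(f)  # rhs(f) >= f since the g- and h-terms are nonnegative
    # rhs grows sublinearly in x, so doubling eventually overshoots.
    while rhs(hi, f, g, h, M) >= hi:
        lo, hi = hi, 2.0 * hi
    # Invariant: rhs(lo) >= lo and rhs(hi) < hi.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rhs(mid, f, g, h, M) >= mid:
            lo = mid
        else:
            hi = mid
    return lo
```

Because the exponents $3/4$ and $7/8$ are strictly below one, the right-hand side eventually falls under $x$, so any quantity obeying the inequality is bounded by the returned threshold; this is the mechanism by which the sum of queue lengths is controlled in terms of $f$, $g$ and $h$.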

Lemma C.7 (Calculations when Proving Theorem 4.5).

Under the conditions of Theorem 4.5, we have

\begin{align*}
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]
&\leq -\frac{V}{\epsilon_{W}}\mathbb{E}\left[\sum_{t=1}^{T}\Big(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\Big)\right]+f(T) \\
&\quad +g(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{3/4}\log\mathbb{E}\left[M\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right] \\
&\quad +h(T)\,\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{7/8},
\end{align*}

where

\begin{align*}
f(T) &= \epsilon_{W}^{-1}\,\mathcal{O}\Bigg(\big(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R)\big)C_{W}T+\frac{R(2NM+R)}{r^{7}}d^{14/3}\big(C^{\lambda}T^{1/2-\delta_{\lambda}}\big)^{2} \\
&\qquad\qquad +\left(\frac{R}{r}d^{2/3}+R\right)\big(C^{r}T^{1/2-\delta_{r}}\big)^{1/4}V(L+G)T^{3/4}+\frac{1}{2}N^{2}\big((NM)^{2}+2(NM)^{2}+2R^{2}\big)T\Bigg), \\
g(T) &= \epsilon_{W}^{-1}\,\mathcal{O}\left((2NM+R)^{1/4}M\sqrt{1+C^{a}T^{1/2-\delta_{a}}}\,\log T\right), \\
h(T) &= \epsilon_{W}^{-1}\,\mathcal{O}\left((2NM+R)^{1/8}\left(\frac{R}{r}d^{2/3}+R\right)\big(C^{\lambda}T^{1/2-\delta_{\lambda}}\big)^{1/4}\right).
\end{align*}
Proof.

From the network stability assumption, we derived in Lemma 4.1 that

\begin{align*}
&\epsilon_{W}\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\right]-\big(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R)\big)C_{W}T \\
&\leq -\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\mathring{\mu}_{n,m}^{(k)}(t)\big(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\big)\right]-\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\,\mathring{\lambda}_{n}^{(k)}(t)\right].
\end{align*}

Recall the AdaPFOL guarantee in Theorem 3.5 that

\begin{align*}
&\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\big(\mu_{n,m}^{(k)}(t)-\mathring{\mu}_{n,m}^{(k)}(t)\big)\big(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\big)\right] \\
&= \mathcal{O}\left(M\sqrt{1+P_{T}^{a}}\,\mathbb{E}\left[\sqrt{\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{2}^{2}}\,\log T\,\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}}M\lVert\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rVert_{\infty}\right)\right]\right),
\end{align*}

and the Bandit Convex Optimization guarantee in Theorem 4.4. Combining the two with the inequality from Lemma 4.1, we therefore have

\begin{align*}
&\epsilon_{W}\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\right]-\big(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R)\big)C_{W}T \\
&\leq -\mathbb{E}\left[\sum_{t=1}^{T}\sum_{(n,m)\in\mathcal{L}}\sum_{k\in\mathcal{N}}\mu_{n,m}^{(k)}(t)\big(Q_{m}^{(k)}(t)-Q_{n}^{(k)}(t)\big)\right] \\
&\quad +\mathcal{O}\left(M\sqrt{1+P_{T}^{a}}\,\mathbb{E}\left[\sqrt{\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{2}^{2}}\,\log T\,\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}}M\lVert\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rVert_{\infty}\right)\right]\right) \\
&\quad -\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\,\lambda_{n}^{(k)}(t)\right]-V\,\mathbb{E}\left[\sum_{t=1}^{T}\Big(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\Big)\right] \\
&\quad +\mathcal{O}\left(\frac{R(2NM+R)}{r^{7}}d^{14/3}\big(C^{\lambda}T^{1/2-\delta_{\lambda}}\big)^{2}\right) \\
&\quad +\mathcal{O}\left(\mathbb{E}\left[\left(\frac{R}{r}d^{2/3}+R\right)\big(C^{\lambda}T^{1/2-\delta_{\lambda}}\big)^{1/4}\left(\sum_{t=1}^{T}\big(\lVert\bm{Q}(t)\rVert_{2}+V(L+G)\big)^{4/3}\right)^{3/4}\right]\right).
\end{align*}

Further plugging in the Lyapunov DPP calculation in Equation 15 (which controls the three $\mathbb{E}[\sum_{t=1}^{T}\cdots]$ terms outside $\mathcal{O}$ on the RHS), we have

\begin{align*}
&\epsilon_{W}\mathbb{E}\left[\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)\right]-\big(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R)\big)C_{W}T \\
&\leq \mathcal{O}\left(M\sqrt{1+P_{T}^{a}}\,\mathbb{E}\left[\sqrt{\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{2}^{2}}\,\log T\,\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}}M\lVert\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rVert_{\infty}\right)\right]\right) \\
&\quad +\mathcal{O}\left(\frac{R(2NM+R)}{r^{7}}d^{14/3}\big(C^{\lambda}T^{1/2-\delta_{\lambda}}\big)^{2}\right) \\
&\quad +\mathcal{O}\left(\mathbb{E}\left[\left(\frac{R}{r}d^{2/3}+R\right)\big(C^{\lambda}T^{1/2-\delta_{\lambda}}\big)^{1/4}\left(\sum_{t=1}^{T}\big(\lVert\bm{Q}(t)\rVert_{2}+V(L+G)\big)^{4/3}\right)^{3/4}\right]\right) \\
&\quad +\frac{1}{2}N^{2}\big((NM)^{2}+2(NM)^{2}+2R^{2}\big)+V\,\mathbb{E}\left[\sum_{t=1}^{T}\big(g_{t}(\bm{\lambda}(t))-g_{t}(\mathring{\bm{\lambda}}(t))\big)\right].
\end{align*}
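The abbreviation that follows rests on two elementary inequalities, spelled out here as a sketch. The symbols $X$, $p$, $a_t$ and $b$ are introduced only for this illustration and are not part of the original notation; constants hidden in $\mathcal{O}$ are not tracked. For a nonnegative random variable $X$ and $0<p\le 1$, Jensen's inequality for the concave map $x\mapsto x^{p}$ gives the first line, and Minkowski's inequality in $\ell_{4/3}$ gives the second, for nonnegative $a_1,\dots,a_T$ and $b$:

\begin{align*}
\mathbb{E}\left[X^{p}\right] &\leq \big(\mathbb{E}[X]\big)^{p}, \\
\left(\sum_{t=1}^{T}(a_{t}+b)^{4/3}\right)^{3/4} &\leq \left(\sum_{t=1}^{T}a_{t}^{4/3}\right)^{3/4}+b\,T^{3/4}.
\end{align*}

Taking $a_{t}=\lVert\bm{Q}(t)\rVert_{2}$ and $b=V(L+G)$ in the second line separates the queue-dependent part from a term of order $V(L+G)T^{3/4}$, and the first line with $p=1/2$ and $p=3/4$ moves the expectation inside the square root and the $3/4$-power, respectively.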

For notational simplicity, we can abbreviate this inequality as

\begin{align*}
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]
&\leq -\frac{V}{\epsilon_{W}}\mathbb{E}\left[\sum_{t=1}^{T}\Big(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\Big)\right]+\widetilde{f}(T) \\
&\quad +\widetilde{g}(T)\sqrt{\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{2}^{2}\right]}\,\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}}M\lVert\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rVert_{\infty}\right)+\widetilde{h}(T)\left(\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{2}^{4/3}\right]\right)^{3/4},
\end{align*}

where

$$
\begin{aligned}
\widetilde{f}(T)&=\epsilon_{W}^{-1}\mathcal{O}\Bigg(\big(N^{2}(2NM+R)^{2}+\epsilon_{W}N^{2}(2NM+R)\big)C_{W}T+\frac{R(2NM+R)}{r^{7}}d^{14/3}(C^{\lambda}T^{1/2-\delta_{\lambda}})^{2}+{}\\
&\qquad\qquad\left(\frac{R}{r}d^{2/3}+R\right)(C^{r}T^{1/2-\delta_{r}})^{1/4}V(L+G)T^{3/4}+\frac{1}{2}N^{2}\big((NM)^{2}+2(NM)^{2}+2R^{2}\big)T\Bigg),\\
\widetilde{g}(T)&=\epsilon_{W}^{-1}\mathcal{O}\left(M\sqrt{1+C^{a}T^{1/2-\delta_{a}}}\log T\right),\\
\widetilde{h}(T)&=\epsilon_{W}^{-1}\mathcal{O}\left(\left(\frac{R}{r}d^{2/3}+R\right)(C^{\lambda}T^{1/2-\delta_{\lambda}})^{1/4}\right).
\end{aligned}
$$

To handle the $\widetilde{g}(T)$-related term, we use the same argument as in Section 3.5: Lemma D.3 states that if $x_{1}=0$, $x_{2},\ldots,x_{T}\geq 0$, and $\lvert x_{t+1}-x_{t}\rvert\leq 1$, then $\sum_{t=1}^{T}x_{t}^{2}=\mathcal{O}\left((\sum_{t=1}^{T}x_{t})^{3/2}\right)$. From Lemma B.1, any single queue $Q_{n}^{(k)}(t)$ satisfies $\lvert Q_{n}^{(k)}(t+1)-Q_{n}^{(k)}(t)\rvert\leq 2NM+R$.
Hence, applying Lemma D.3 to $\{Q_{n}^{(k)}(t)/(2NM+R)\}_{t\in[T]}$ for every $n\in\mathcal{N}$ and $k\in\mathcal{N}$, we have

$$
\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{2}^{2}=(2NM+R)^{2}\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}\left(\frac{Q_{n}^{(k)}(t)}{2NM+R}\right)^{2}=\mathcal{O}\left(\sqrt{2NM+R}\left(\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right)^{1.5}\right).
$$
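As a numerical sanity check of the rescaling step, the sketch below simulates a single nonnegative sequence that starts at 0 and moves by at most $c$ per step, where $c$ stands in for $2NM+R$, and compares $\sum_t x_t^2$ with $\sqrt{c}\,(\sum_t x_t)^{3/2}$. The reflected random walk, the function names, and the value $c=7$ are illustrative assumptions. The constant $\sqrt{2}$ used for comparison comes from the elementary bounds $\max_t x_t\leq\sqrt{2c\sum_t x_t}$ and $\sum_t x_t^2\leq\max_t x_t\cdot\sum_t x_t$; it is not claimed to be the constant of Lemma D.3.

```python
import math
import random


def bounded_increment_walk(T, c, seed=0):
    """Nonnegative sequence with x_1 = 0 and |x_{t+1} - x_t| <= c (reflected at 0)."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(T - 1):
        step = rng.uniform(-c, c)
        xs.append(max(0.0, xs[-1] + step))
    return xs


def scaled_ratio(xs, c):
    """Return sum(x^2) / (sqrt(c) * (sum x)^{3/2}); 0 for the all-zero sequence."""
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    if s1 == 0.0:
        return 0.0
    return s2 / (math.sqrt(c) * s1 ** 1.5)


c = 7.0  # hypothetical stand-in for 2NM + R
ramp = [c * t for t in range(2000)]  # steepest admissible growth: x_t = c (t - 1)
ratios = [scaled_ratio(bounded_increment_walk(2000, c, seed=s), c) for s in range(20)]
ratios.append(scaled_ratio(ramp, c))
print(max(ratios))
```

The ramp is the natural stress test here because it grows as fast as the increment constraint allows; its ratio stays bounded as $T$ grows, which is the behaviour the $\mathcal{O}(\cdot)$ statement describes.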

Further noticing that

$$
\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}}M\lVert\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rVert_{\infty}\leq\sum_{t=1}^{T}M\sum_{n\in\mathcal{N}}\lVert\bm{Q}_{n}(t)\rVert_{1}\leq M\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1},
$$

the $\widetilde{g}(T)$-related term then becomes

$$
\begin{aligned}
&\widetilde{g}(T)\sqrt{\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{2}^{2}\right]}\log\left(\max_{t=1}^{T}\max_{(n,m)\in\mathcal{L}}M\lVert\bm{Q}_{m}(t)-\bm{Q}_{n}(t)\rVert_{\infty}\right)\\
&=\widetilde{g}(T)\mathcal{O}\left((2NM+R)^{1/4}\mathbb{E}\left[\left(\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right)^{3/4}\log\left(M\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right)\right]\right).
\end{aligned}
$$

Noticing that $x\mapsto x^{3/4}\log(Mx)$ is concave when $x$ is large enough, Jensen's inequality then gives

$$
\mathcal{O}\left(\mathbb{E}\left[\left(\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right)^{3/4}\log\left(M\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right)\right]\right)=\mathcal{O}\left(\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{3/4}\log\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]\right).
$$
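To make "large enough" concrete, differentiating $\phi(x)=x^{3/4}\log(Mx)$ twice gives $\phi''(x)=x^{-5/4}\left(\tfrac{1}{2}-\tfrac{3}{16}\log(Mx)\right)$, so $\phi$ is concave once $Mx\geq e^{8/3}$. The sketch below evaluates this closed form and cross-checks it against a finite difference. It assumes the natural logarithm (the threshold changes with the base), and the value of `M` and all function names are hypothetical choices made only for illustration.

```python
import math


def phi(x, M):
    """phi(x) = x^{3/4} * log(M x), the map to which Jensen's inequality is applied."""
    return x ** 0.75 * math.log(M * x)


def phi_second_derivative(x, M):
    """Closed form obtained by differentiating phi twice (natural logarithm)."""
    return x ** (-1.25) * (0.5 - (3.0 / 16.0) * math.log(M * x))


def concavity_threshold(M):
    """Smallest x at which phi'' becomes non-positive, i.e. M x = e^{8/3}."""
    return math.exp(8.0 / 3.0) / M


M = 4.0  # hypothetical value of M, chosen only for illustration
x0 = concavity_threshold(M)
print(x0, phi_second_derivative(2 * x0, M), phi_second_derivative(0.5 * x0, M))
```

Since $\sum_{t}\lVert\bm{Q}(t)\rVert_{1}$ can be small on some sample paths, the Jensen step is best read as holding up to the additive constant contributed by the bounded region below this threshold, which the $\mathcal{O}(\cdot)$ notation absorbs.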

Moreover, we handle the $\widetilde{h}(T)$-related term using Lemma D.4, a variant of Lemma D.3 which states that if $x_{1}=0$, $x_{2},\ldots,x_{T}\geq 0$, and $\lvert x_{t+1}-x_{t}\rvert\leq 1$, then $\sum_{t=1}^{T}x_{t}^{4/3}=\mathcal{O}\left((\sum_{t=1}^{T}x_{t})^{7/6}\right)$. Hence, again applying it to $\{Q_{n}^{(k)}(t)/(2NM+R)\}_{t\in[T]}$ for every $n\in\mathcal{N}$ and $k\in\mathcal{N}$,

$$
\begin{aligned}
\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{2}^{4/3}&=\sum_{t=1}^{T}\left(\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)^{2}\right)^{2/3}\leq\sum_{t=1}^{T}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{N}}Q_{n}^{(k)}(t)^{4/3}\\
&=\mathcal{O}\left(\left(2NM+R\right)^{1/6}\left(\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right)^{7/6}\right).
\end{aligned}
$$
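The inequality in the first line above is the subadditivity of the concave power $y\mapsto y^{2/3}$ applied to $y=Q^{2}$: for nonnegative entries, $(\sum_i q_i^{2})^{2/3}\leq\sum_i q_i^{4/3}$. A minimal sketch that checks this on random nonnegative vectors follows; the sampling ranges and names are arbitrary illustrative choices.

```python
import random


def lhs_rhs(values):
    """Return ((sum q^2)^{2/3}, sum q^{4/3}) for nonnegative entries q."""
    lhs = sum(q * q for q in values) ** (2.0 / 3.0)
    rhs = sum(q ** (4.0 / 3.0) for q in values)
    return lhs, rhs


rng = random.Random(1)
trials = [[rng.uniform(0.0, 50.0) for _ in range(rng.randint(1, 30))] for _ in range(200)]
# Count trials where the left side exceeds the right side beyond rounding error.
violations = sum(1 for v in trials if lhs_rhs(v)[0] > lhs_rhs(v)[1] * (1 + 1e-12))
print(violations)
```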

Therefore, we have

$$
\begin{aligned}
\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]&\leq-\frac{V}{\epsilon_{W}}\mathbb{E}\left[\sum_{t=1}^{T}\Big(g_{t}(\mathring{\bm{\lambda}}(t))-g_{t}(\bm{\lambda}(t))\Big)\right]+\widetilde{f}(T)+{}\\
&\quad\widetilde{g}(T)\mathcal{O}\left((2NM+R)^{1/4}\right)\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{3/4}\log\mathbb{E}\left[M\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]+{}\\
&\quad\widetilde{h}(T)\mathcal{O}\left((2NM+R)^{1/8}\right)\mathbb{E}\left[\sum_{t=1}^{T}\lVert\bm{Q}(t)\rVert_{1}\right]^{7/8}.
\end{aligned}
$$

Setting $f(T)=\widetilde{f}(T)$, $g(T)=\widetilde{g}(T)\mathcal{O}((2NM+R)^{1/4})$, and $h(T)=\widetilde{h}(T)\mathcal{O}((2NM+R)^{1/8})$ gives our conclusion. ∎
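The exponents $1/4$, $3/4$, $1/8$, and $7/8$ in this proof come from two mechanical steps: rescaling each queue by $c=2NM+R$ before applying a bound of the form $\sum_t y_t^{p}=\mathcal{O}((\sum_t y_t)^{q})$, which yields $c^{p-q}(\sum_t Q)^{q}$, and then taking the outer power. The following sketch redoes this bookkeeping with exact fractions; the helper names are hypothetical and the code tracks exponents only, not constants.

```python
from fractions import Fraction as F


def rescale(p, growth):
    """Apply sum y^p = O((sum y)^growth) to y = Q / c; return exponents of (c, sum Q)."""
    return (p - growth, growth)


def raise_to(exponents, power):
    """Raise c^a * S^b to the given power."""
    a, b = exponents
    return (a * power, b * power)


# g-term: squares with growth 3/2 (Lemma D.3), followed by a square root.
g_term = raise_to(rescale(F(2), F(3, 2)), F(1, 2))
# h-term: 4/3 powers with growth 7/6 (Lemma D.4), followed by a 3/4 power.
h_term = raise_to(rescale(F(4, 3), F(7, 6)), F(3, 4))
print(g_term, h_term)
```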

Appendix D Auxiliary Lemmas

The first lemma extends the famous summation lemma $\sum_{t=1}^{T}\frac{x_{t}}{\sqrt{\sum_{s=1}^{t}x_{s}}}=\mathcal{O}\left(\sqrt{\sum_{t=1}^{T}x_{t}}\right)$ (Auer et al., 2002).

Lemma D.1.

For non-negative real numbers $x_{1},x_{2},\ldots,x_{T}\in\mathbb{R}$, we have

$$
\sum_{t=1}^{T}\frac{x_{t}}{\left(\sum_{s=1}^{t}x_{s}\right)^{1/4}}\leq 2\left(\sum_{t=1}^{T}x_{t}\right)^{3/4}.
$$
Proof.

We prove the claim by induction on $T$. The case $T=1$ holds because $x_{1}/x_{1}^{1/4}=x_{1}^{3/4}\leq 2x_{1}^{3/4}$. Suppose that the conclusion holds for $T-1$, and consider some $x_{T}$:

$$
\begin{aligned}
&\sum_{t=1}^{T}\frac{x_{t}}{\left(\sum_{s=1}^{t}x_{s}\right)^{1/4}}\\
&=\sum_{t=1}^{T-1}\frac{x_{t}}{\left(\sum_{s=1}^{t}x_{s}\right)^{1/4}}+\frac{x_{T}}{\left(\sum_{t=1}^{T}x_{t}\right)^{1/4}}\\
&\leq 2\left(\sum_{t=1}^{T-1}x_{t}\right)^{3/4}+\frac{x_{T}}{\left(\sum_{t=1}^{T}x_{t}\right)^{1/4}},
\end{aligned}
$$

so it suffices to prove $\frac{x_{T}}{\left(\sum_{t=1}^{T}x_{t}\right)^{1/4}}\leq 2\left(\sum_{t=1}^{T}x_{t}\right)^{3/4}-2\left(\sum_{t=1}^{T-1}x_{t}\right)^{3/4}$. Notice that

$$
\begin{aligned}
&\left(\left(\sum_{t=1}^{T}x_{t}\right)^{3/4}-\left(\sum_{t=1}^{T-1}x_{t}\right)^{3/4}\right)\left(\left(\sum_{t=1}^{T}x_{t}\right)^{1/4}+\left(\sum_{t=1}^{T-1}x_{t}\right)^{1/4}\right)\\
&=\left(\sum_{t=1}^{T}x_{t}\right)+\left(\sum_{t=1}^{T}x_{t}\right)^{3/4}\left(\sum_{t=1}^{T-1}x_{t}\right)^{1/4}-\left(\sum_{t=1}^{T}x_{t}\right)^{1/4}\left(\sum_{t=1}^{T-1}x_{t}\right)^{3/4}-\left(\sum_{t=1}^{T-1}x_{t}\right)\\
&\geq x_{T},
\end{aligned}
$$

where the last inequality uses $\sum_{t=1}^{T}x_{t}\geq\sum_{t=1}^{T-1}x_{t}$ (which follows from $x_{T}\geq 0$). Hence, due to the fact that

$$
\frac{x_{T}}{2\left(\sum_{t=1}^{T}x_{t}\right)^{1/4}}\leq\frac{x_{T}}{\left(\sum_{t=1}^{T}x_{t}\right)^{1/4}+\left(\sum_{t=1}^{T-1}x_{t}\right)^{1/4}},
$$

the conclusion holds for $T$ as well. ∎
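The bound of Lemma D.1 is easy to probe numerically. The sketch below evaluates $\sum_t x_t/(\sum_{s\leq t}x_s)^{p}$ on random non-negative sequences and compares it with $2(\sum_t x_t)^{1-p}$, both for $p=1/4$ (Lemma D.1) and for $p=1/2$ (the classical summation lemma). The exponential sampling and the convention that terms with a zero prefix sum contribute 0 are assumptions made for this illustration.

```python
import random


def weighted_sum(xs, p):
    """Compute sum_t x_t / (x_1 + ... + x_t)^p, skipping terms whose prefix sum is 0."""
    total, prefix = 0.0, 0.0
    for x in xs:
        prefix += x
        if prefix > 0.0:
            total += x / prefix ** p
    return total


def slack(xs, p):
    """Right-hand side 2 * (sum x)^{1-p} minus the left-hand side."""
    return 2.0 * sum(xs) ** (1.0 - p) - weighted_sum(xs, p)


rng = random.Random(2)
samples = [[rng.expovariate(1.0) for _ in range(rng.randint(1, 500))] for _ in range(100)]
min_slack_quarter = min(slack(xs, 0.25) for xs in samples)  # Lemma D.1
min_slack_half = min(slack(xs, 0.5) for xs in samples)      # classical p = 1/2 case
print(min_slack_quarter, min_slack_half)
```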

Lemma D.2.

Suppose that $x_{1}=0$, $x_{2},x_{3},\ldots,x_{T}\geq 0$, and $\lvert x_{t}-x_{t-1}\rvert\leq 1$, $\forall t=2,3,\ldots,T$, then

$$
\sum_{t=1}^{T}x_{t}^{4/3}\geq 4^{-7/3}x_{T}^{7/3}.
$$
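Before the proof, the statement can be checked on concrete sequences. The sketch below generates nonnegative sequences with $x_1=0$ and unit-bounded increments, including the ramp $x_t=t-1$ that makes $x_T$ as large as the constraint permits, and evaluates the gap between the two sides. The reflected random walk, its upward drift, and the function names are illustrative assumptions.

```python
import random


def lemma_d2_gap(xs):
    """sum_t x_t^{4/3} minus 4^{-7/3} * x_T^{7/3}; Lemma D.2 asserts this is >= 0."""
    lhs = sum(x ** (4.0 / 3.0) for x in xs)
    rhs = 4.0 ** (-7.0 / 3.0) * xs[-1] ** (7.0 / 3.0)
    return lhs - rhs


def unit_walk(T, seed, drift=0.3):
    """Nonnegative sequence with x_1 = 0 and |x_t - x_{t-1}| <= 1 (reflected at 0)."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(T - 1):
        xs.append(max(0.0, xs[-1] + rng.uniform(-1.0 + drift, 1.0)))
    return xs


ramp = [float(t) for t in range(300)]  # x_t = t - 1, the fastest admissible growth
gaps = [lemma_d2_gap(unit_walk(300, s)) for s in range(50)] + [lemma_d2_gap(ramp)]
print(min(gaps))
```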
Proof.

As adjacent $x_{t}$'s differ by no more than $1$, $\lfloor x_{T}\rfloor<T$ and $x_{T-t}\geq x_{T}-t$. Therefore,

$$\sum_{t=1}^{T}x_{t}^{4/3}\geq\sum_{t=0}^{\lfloor x_{T}\rfloor}x_{T-t}^{4/3}\geq\sum_{t=0}^{\lfloor x_{T}\rfloor}(x_{T}-t)^{4/3}\geq\sum_{t=0}^{\lfloor x_{T}\rfloor}\left(t^{4/3}+(x_{T}-\lfloor x_{T}\rfloor)^{4/3}\right)\geq\sum_{t=0}^{\lfloor x_{T}\rfloor}t^{4/3},$$

where the second-to-last step reindexes the sum via $t\mapsto\lfloor x_{T}\rfloor-t$ and uses $(a+b)^{4/3}\geq a^{4/3}+b^{4/3}$ for $a,b\geq 0$. As

$$\sum_{i=0}^{n}i^{4/3}\geq\sum_{i=\lfloor\frac{n}{2}\rfloor}^{n}i^{4/3}\geq\left(n-\left\lfloor\frac{n}{2}\right\rfloor\right)\left(\left\lfloor\frac{n}{2}\right\rfloor\right)^{4/3}\geq\left(\left\lfloor\frac{n}{2}\right\rfloor\right)^{7/3},$$

we have $\sum_{t=1}^{T}x_{t}^{4/3}\geq(\lfloor\frac{x_{T}}{2}\rfloor)^{7/3}$. If $x_{T}\geq 4$, then $\lfloor\frac{x_{T}}{2}\rfloor\geq\frac{x_{T}}{2}-1\geq\frac{x_{T}}{4}$, giving the conclusion. Otherwise, i.e., $x_{T}<4$, we naturally have $\sum_{t=1}^{T}x_{t}^{4/3}\geq(\frac{x_{T}}{4})^{4/3}\geq(\frac{x_{T}}{4})^{7/3}$, so our conclusion still follows. ∎
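As a numerical sanity check of Lemma D.2 (not a substitute for the proof), the following Python sketch samples sequences that satisfy the lemma's hypotheses and evaluates the gap between the two sides. The sampler `random_bounded_walk`, the helper `lemma_d2_gap`, the sequence lengths, and the tolerance are all illustrative assumptions that do not appear in the paper.

```python
import random


def random_bounded_walk(T, rng):
    """Sample x_1..x_T with x_1 = 0, x_t >= 0 and |x_t - x_{t-1}| <= 1."""
    xs = [0.0]
    for _ in range(T - 1):
        step = rng.uniform(-1.0, 1.0)
        # Clipping at 0 keeps the sequence nonnegative without
        # making any single move larger than 1 in absolute value.
        xs.append(max(0.0, xs[-1] + step))
    return xs


def lemma_d2_gap(xs):
    """Left-hand side minus right-hand side of Lemma D.2."""
    lhs = sum(x ** (4 / 3) for x in xs)
    rhs = 4 ** (-7 / 3) * xs[-1] ** (7 / 3)
    return lhs - rhs


rng = random.Random(0)
walks = [random_bounded_walk(rng.randint(1, 200), rng) for _ in range(1000)]
# The ramp x_t = t - 1 grows as fast as the hypotheses allow.
ramp = [float(t) for t in range(100)]
min_gap = min(lemma_d2_gap(xs) for xs in walks + [ramp])
print("smallest observed gap is nonnegative:", min_gap >= -1e-9)
```

The ramp is included because it makes $x_T$ as large as possible for a given $T$, which is the regime the proof's counting argument targets.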

Lemma D.3 ((Huang et al., 2024, Lemma 4)).

If $x_{1}=0$, $x_{2},x_{3},\ldots,x_{T}\geq 0$, and $\lvert x_{t}-x_{t-1}\rvert\leq 1$, $\forall t=2,3,\ldots,T$, then

$$\sum_{t=1}^{T}x_{t}^{2}\leq 4\left(\sum_{t=1}^{T}x_{t}\right)^{3/2}.$$
Lemma D.4.

If $x_{1}=0$, $x_{2},x_{3},\ldots,x_{T}\geq 0$, and $\lvert x_{t}-x_{t-1}\rvert\leq 1$, $\forall t=2,3,\ldots,T$, then

$$\sum_{t=1}^{T}x_{t}^{4/3}\leq 2^{1/6}\left(\sum_{t=1}^{T}x_{t}\right)^{7/6}.$$
Proof.

Imitating the proof of Lemma D.3 (Huang et al., 2024, Lemma 4), we sort $x_{1},x_{2},\ldots,x_{T}$ as $y_{1}\leq y_{2}\leq\cdots\leq y_{T}$. According to the original proof, $y_{T}=\max_{t\in[T]}x_{t}\leq(2\sum_{t=1}^{T}x_{t})^{1/2}$. Hence,

$$\sum_{t=1}^{T}x_{t}^{4/3}\leq y_{T}^{1/3}\sum_{t=1}^{T}x_{t}\leq\left(2\sum_{t=1}^{T}x_{t}\right)^{1/6}\left(\sum_{t=1}^{T}x_{t}\right)=2^{1/6}\left(\sum_{t=1}^{T}x_{t}\right)^{7/6}.$$
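The inequalities of Lemmas D.3 and D.4, together with the intermediate bound $\max_{t}x_{t}\leq(2\sum_{t}x_{t})^{1/2}$ used in the proof, can be spot-checked numerically. The Python sketch below is only an illustration under assumed settings: the sampler, function names, sample sizes, and tolerance are hypothetical and not taken from the paper.

```python
import random


def random_bounded_walk(T, rng):
    """Sample x_1..x_T with x_1 = 0, x_t >= 0 and |x_t - x_{t-1}| <= 1."""
    xs = [0.0]
    for _ in range(T - 1):
        xs.append(max(0.0, xs[-1] + rng.uniform(-1.0, 1.0)))
    return xs


def max_bound_holds(xs, tol=1e-9):
    """Intermediate bound: max_t x_t <= (2 * sum_t x_t)^(1/2)."""
    return max(xs) <= (2 * sum(xs)) ** 0.5 + tol


def lemma_d3_holds(xs, tol=1e-9):
    """Lemma D.3: sum_t x_t^2 <= 4 * (sum_t x_t)^(3/2)."""
    return sum(x * x for x in xs) <= 4 * sum(xs) ** 1.5 + tol


def lemma_d4_holds(xs, tol=1e-9):
    """Lemma D.4: sum_t x_t^(4/3) <= 2^(1/6) * (sum_t x_t)^(7/6)."""
    lhs = sum(x ** (4 / 3) for x in xs)
    return lhs <= 2 ** (1 / 6) * sum(xs) ** (7 / 6) + tol


rng = random.Random(1)
samples = [random_bounded_walk(rng.randint(1, 300), rng) for _ in range(1000)]
samples.append([float(t) for t in range(50)])  # steepest admissible ramp
all_hold = all(
    max_bound_holds(xs) and lemma_d3_holds(xs) and lemma_d4_holds(xs)
    for xs in samples
)
print("all three bounds hold on the samples:", all_hold)
```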

The following two lemmas are similar to Lemma 5 of Huang et al. (2024).

Lemma D.5.

If $y\leq f+y^{3/4}g\log y$ and $f,g\geq 1$, then

$$y^{1/4}\leq f^{1/4}+g\log\left(2(f^{1/4}+g)^{2}\right).$$
Proof.

Let $i(z)=z^{4}-z^{3}g\log(z^{4})-f$. Notice that

$$
\begin{aligned}
&\quad\left(f^{1/4}+g\log\left(2(f^{1/4}+g)^{2}\right)\right)^{4}\\
&\geq f+4\left(f^{1/4}+g\log\left(2(f^{1/4}+g)^{2}\right)\right)^{3}g\log\left(2(f^{1/4}+g)^{2}\right).
\end{aligned}
$$

As $f,g\geq 1$, we have $2(f^{1/4}+g)^{2}\geq f^{1/4}+g\log(2(f^{1/4}+g)^{2})$. Applying this relationship to the second log on the RHS of the inequality above, we obtain

$$i\left(f^{1/4}+g\log\left(2(f^{1/4}+g)^{2}\right)\right)\geq 0.$$

On the other hand, from the conditions, we know $i(y^{1/4})\leq 0$. Hence, if we can prove that $i(z)$ is monotone (at least for a range of $z$), we can conclude that $y^{1/4}\leq f^{1/4}+g\log\left(2(f^{1/4}+g)^{2}\right)$, thus giving our conclusion. To establish the monotonicity of $i(z)$, we calculate its derivative as

$$i^{\prime}(z)=4z^{3}-3z^{2}g\log(z^{4})-4z^{2}g.$$

Thus, if we only consider the case where $z\geq 0$, $i^{\prime}(z)\geq 0$ holds when $z\geq 3g\log z+g$. Denoting the larger root of $z=3g\log z+g$ as $z_{0}$ (in case it has no root, let $z_{0}=1$), we know $i(z)$ is increasing in $[z_{0},+\infty)$. Further observing that $f^{1/4}+g\log\left(2(f^{1/4}+g)^{2}\right)$ indeed satisfies $z\geq 3g\log z+g$, we know $f^{1/4}+g\log\left(2(f^{1/4}+g)^{2}\right)\geq z_{0}$.

On the other hand, from the assumption that $y\leq f+y^{3/4}g\log y$, we know $i(y^{1/4})\leq 0\leq i(f^{1/4}+g\log\left(2(f^{1/4}+g)^{2}\right))$. Our conclusion follows from discussing the relationship between $y^{1/4}$ and $z_{0}$: If $y^{1/4}\leq z_{0}$, then we immediately have

$$y^{1/4}\leq z_{0}\leq f^{1/4}+g\log\left(2(f^{1/4}+g)^{2}\right).$$

Otherwise, according to the monotonicity of $i(z)$ when $z\geq z_{0}$, we still have

$$y^{1/4}\leq f^{1/4}+g\log\left(2(f^{1/4}+g)^{2}\right),$$

as claimed. ∎
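The closed-form derivative $i^{\prime}(z)$ used in the proof of Lemma D.5, and the equivalence between $i^{\prime}(z)\geq 0$ and $z\geq 3g\log z+g$ for $z>0$, can be sanity-checked numerically. The following Python sketch compares the formula against a central finite difference. It assumes the natural logarithm (consistent with the $4z^{2}g$ term in the derivative); the values `f = 2.0`, `g = 1.5`, the grid, and all function names are illustrative choices, not taken from the paper.

```python
import math


def i_func(z, f, g):
    """i(z) = z^4 - z^3 * g * log(z^4) - f, as defined in the proof."""
    return z ** 4 - z ** 3 * g * math.log(z ** 4) - f


def i_prime(z, g):
    """Closed-form derivative stated in the proof."""
    return 4 * z ** 3 - 3 * z ** 2 * g * math.log(z ** 4) - 4 * z ** 2 * g


def central_difference(fun, z, eps=1e-5):
    """Second-order finite-difference approximation of fun'(z)."""
    return (fun(z + eps) - fun(z - eps)) / (2 * eps)


f, g = 2.0, 1.5  # illustrative parameters with f, g >= 1
grid = [0.5 + 0.25 * k for k in range(80)]

# Relative discrepancy between the numerical and closed-form derivatives.
max_rel_err = max(
    abs(central_difference(lambda z: i_func(z, f, g), z) - i_prime(z, g))
    / (1 + abs(i_prime(z, g)))
    for z in grid
)

# i'(z) = 4 z^2 (z - 3 g log z - g), so both sign tests should agree.
sign_agrees = all(
    (i_prime(z, g) >= 0) == (z >= 3 * g * math.log(z) + g) for z in grid
)
print(max_rel_err < 1e-4, sign_agrees)
```

The check covers only the derivative and its sign condition; it does not evaluate the lemma's final bound.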

Lemma D.6.

If $y\leq f+y^{3/4}g\log y+y^{7/8}h$ and $f,g,h\geq 1$, then

$$y^{1/8}\leq f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h.$$
Proof.

Let $i(z)=z^{8}-f-z^{6}g\log(z^{8})-z^{7}h$. Notice that

$$
\begin{aligned}
&\quad\left(f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h\right)^{8}\\
&=\left(f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h\right)^{7}\left(f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)\right)+\\
&\quad\left(f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h\right)^{7}h\\
&\geq\left(f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h\right)^{6}f^{1/4}+\\
&\quad\left(f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h\right)^{6}g\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+\\
&\quad\left(f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h\right)^{7}h\\
&\geq f+\left(f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h\right)^{6}g\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+\\
&\quad\left(f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h\right)^{7}h.
\end{aligned}
$$

As $f,g,h\geq 1$, we have $2(f^{1/8}+g^{1/2}+h)^{2}\geq f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h$. Applying it to the $(\cdots)^{6}g\log(2(f^{1/8}+g^{1/2}+h)^{2})$ term on the RHS, we have

$$i\left(f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h\right)\geq 0.$$

On the other hand, from the conditions we know $i(y^{1/8})\leq 0$. Hence, we again want to prove the monotonicity of $i(z)$, which gives the conclusion that $y^{1/8}\leq f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h$.

Calculating the derivative of $i(z)$, we have

$$i^{\prime}(z)=8z^{7}-6z^{5}g\log(z^{8})-8z^{5}g-7z^{6}h.$$

Thus, if we only consider the case where $z\geq 1$, $i^{\prime}(z)\geq 0$ holds when $8z^{2}-6g\log(z^{8})-8g-7zh\geq 0$. As $z^{2}$ is convex and $6g\log(z^{8})+8g+7zh$ is concave, there are at most two intersections. Hence, again denoting the larger root of $8z^{2}-6g\log(z^{8})-8g-7zh=0$ as $z_{0}$ (in case there is no root, let $z_{0}=1$), $i(z)$ is monotonic in $[z_{0},+\infty)$.

As $f,g,h\geq 1$, $f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h\geq z_{0}$ always holds. The conclusion follows by discussing the relationship between $y^{1/8}$ and $z_{0}$: If $y^{1/8}\leq z_{0}$, we directly have

$$y^{1/8}\leq z_{0}\leq f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h.$$

Otherwise, we can still conclude

$$y^{1/8}\leq f^{1/8}+g^{1/2}\log\left(2(f^{1/8}+g^{1/2}+h)^{2}\right)+h$$

from the monotonicity of $i(z)$ when $z\geq z_{0}$. ∎
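As with Lemma D.5, the derivative $i^{\prime}(z)$ in the proof of Lemma D.6 and the sign condition $8z^{2}-6g\log(z^{8})-8g-7zh\geq 0$ can be checked numerically. The Python sketch below does so with a central finite difference on $z\geq 1$. It assumes the natural logarithm; the parameter values `f = 2.0`, `g = 1.5`, `h = 2.0`, the grid, and the function names are hypothetical choices made only for illustration.

```python
import math


def i_func(z, f, g, h):
    """i(z) = z^8 - f - z^6 * g * log(z^8) - z^7 * h, as in the proof."""
    return z ** 8 - f - z ** 6 * g * math.log(z ** 8) - z ** 7 * h


def i_prime(z, g, h):
    """Closed-form derivative stated in the proof."""
    return (
        8 * z ** 7
        - 6 * z ** 5 * g * math.log(z ** 8)
        - 8 * z ** 5 * g
        - 7 * z ** 6 * h
    )


def sign_condition(z, g, h):
    """i'(z) / z^5, whose sign matches that of i'(z) for z > 0."""
    return 8 * z ** 2 - 6 * g * math.log(z ** 8) - 8 * g - 7 * z * h


def central_difference(fun, z, eps=1e-5):
    """Second-order finite-difference approximation of fun'(z)."""
    return (fun(z + eps) - fun(z - eps)) / (2 * eps)


f, g, h = 2.0, 1.5, 2.0  # illustrative parameters with f, g, h >= 1
grid = [1.0 + 0.25 * k for k in range(37)]  # z from 1 to 10

max_rel_err = max(
    abs(central_difference(lambda z: i_func(z, f, g, h), z) - i_prime(z, g, h))
    / (1 + abs(i_prime(z, g, h)))
    for z in grid
)
sign_agrees = all(
    (i_prime(z, g, h) >= 0) == (sign_condition(z, g, h) >= 0) for z in grid
)
print(max_rel_err < 1e-4, sign_agrees)
```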