Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control

Hao Zhou, Chengming Hu, Dun Yuan, Ye Yuan, and Xue Liu are with the School of Computer Science, McGill University, Montreal, QC H3A 0E9, Canada (emails: hao.zhou4, chengming.hu, dun.yuan, [email protected], [email protected]). Di Wu is with the School of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada (email: [email protected]). Charlie Zhang is with Samsung Research America, Plano, TX 75023, USA (email: [email protected]).

Hao Zhou, Chengming Hu, Dun Yuan, Ye Yuan, Di Wu, Xue Liu, and Charlie Zhang
Abstract

Large language models (LLMs) have recently been considered a promising technique for many fields. This work explores LLM-based wireless network optimization via in-context learning. To showcase the potential of LLM technologies, we consider base station (BS) power control as a case study, a fundamental and crucial technique that is widely investigated in wireless networks. Different from existing machine learning (ML) methods, our proposed in-context learning algorithm relies on the LLM's inference capabilities. It avoids the complexity of tedious model training and hyperparameter fine-tuning, which is a well-known bottleneck of many ML algorithms. Specifically, the proposed algorithm first describes the target task via formatted natural language and then designs the in-context learning framework and demonstration examples. After that, it considers two cases, namely discrete-state and continuous-state problems, and proposes state-based and ranking-based methods, respectively, to select appropriate examples. Finally, the simulations demonstrate that the proposed algorithm can achieve performance comparable to conventional deep reinforcement learning (DRL) techniques without dedicated model training or fine-tuning. Such an efficient and low-complexity approach has great potential for future wireless network optimization.

Index Terms:
Large language model, in-context learning, network optimization, transmission power control

I Introduction

From LTE and 5G to the envisioned 6G, wireless networks are becoming increasingly complicated, with diverse application scenarios and novel signal processing and transmission techniques, e.g., unmanned aerial vehicles (UAVs), vehicle-to-everything (V2X), mmWave and THz networks, reconfigurable intelligent surfaces, etc. [1]. The constantly evolving network architecture requires more efficient management schemes, and most existing network optimization methods fall into two main approaches: convex optimization and machine learning (ML) algorithms. Specifically, convex optimization usually requires a dedicated problem formulation for each specific task, followed by transforming the objective function or constraints into convex forms. By contrast, ML algorithms such as reinforcement learning impose fewer requirements on the problem formulation, but their tedious model training and fine-tuning involve a large number of iterations [2]. These issues, e.g., problem-specific transformation and relaxation, hyperparameter tuning, and long training processes, have become obstacles to further improving the efficiency of next-generation networks.

Large language models (LLMs) have recently been considered revolutionary technologies and have been successfully applied to education, finance, healthcare, biology, etc. [3]. The great potential of LLM technologies also provides promising opportunities for network management and optimization [4]. For instance, a promising feature of LLMs is their ability to learn from language-based descriptions and demonstrations, which is known as in-context learning. Compared with convex optimization or conventional ML approaches, in-context learning has multiple advantages [5]: 1) in-context learning relies on the LLM's inference process, so it avoids the complexity of dedicated model training and fine-tuning, a well-known bottleneck for many ML techniques; 2) in-context learning allows natural language-based task design and implementation, so the operator can easily formulate the target task using human language and instructions. Such a user-friendly approach can also significantly lower the professional knowledge required to solve specific tasks.

LLM-enabled wireless networks have recently attracted considerable interest, e.g., 6G edge intelligence [6], network intrusion detection, and LLM-enhanced reconfigurable intelligent surfaces for the internet of vehicles (IoV) [7, 8]. By contrast, this work focuses on optimization problems and considers base station (BS) power control as a case study to explore the potential of using LLMs to solve optimization problems. The power control of cellular networks has been extensively studied with diverse objectives and algorithms, e.g., convex optimization, game theory, and reinforcement learning [9]. These studies show that power control is a fundamental and critical technique to improve energy efficiency, reduce interference, and lower power consumption [10]. Given this importance, we select BS power control as a case study and explore the potential of state-of-the-art LLM technologies for wireless network optimization.

In particular, our proposed LLM-enabled in-context learning algorithm first designs a natural language-based task description, i.e., the task goal, definition, and rules. The formatted task description, along with a set of selected examples, becomes the prompt input for the LLM. The LLM then uses the task description and advisable examples to generate a decision based on the current environment state. After that, the output decision, the environment state, and the corresponding system performance form a new example, which is stored in an experience pool. Given the next environment state, we select new advisable examples from the experience pool, which serve as references for the next LLM decision. In addition, we consider two scenarios, namely discrete-state and continuous-state problems, and propose state-based and ranking-based example selection methods for these two cases, respectively. Finally, we evaluate the proposed algorithm with various LLMs, e.g., Llama3-8b-instruct, Llama3-70b-instruct, and GPT-3.5 turbo, and the simulations show that the proposed algorithm can achieve satisfactory performance.

The main contribution of this work is the exploration of LLM-enabled in-context learning for wireless network optimization problems. Specifically, we propose an in-context optimization technique that learns from environment interactions without model training or fine-tuning. The simulation results show that the proposed algorithm enables the LLM to learn from previous explorations and examples and to continually improve its performance on target tasks. Such an efficient and training-free algorithm has great potential for wireless network optimization and management.

II System Model

This section introduces a BS power minimization problem, which serves as a case study for the proposed in-context learning algorithm (various objectives and constraints can be defined for power control problems, as summarized in [10]; here we select power minimization as a specific problem formulation). Considering a BS with $U_b$ users, the achievable data rate $C_{b,u}$ between BS $b$ and user $u$ is defined by [11]

$$C_{b,u}=\sum_{k=1}^{K_b} d_k\log\left(1+\frac{p_{b,k}h_{b,k,u}\gamma_{b,k,u}}{\sum_{b'\in B_{-b}}p_{b',k'}h_{b',k',u'}\gamma_{b',k',u'}+d_kN_0}\right), \tag{1}$$

where $K_b$ is the total number of resource blocks (RBs) in BS $b$, $d_k$ is the bandwidth of RB $k$, $p_{b,k}$ is the transmission power of BS $b$ on RB $k$, $h_{b,k,u}$ is the channel gain between BS $b$ and user $u$ on RB $k$, and $N_0$ is the noise power spectral density. For the RB allocation, $\gamma_{b,k,u}\in\{0,1\}$ indicates whether RB $k$ is allocated to the transmission for user $u$. For the interference, $B_{-b}$ represents the set of adjacent BSs excluding BS $b$, and $p_{b',k'}h_{b',k',u'}\gamma_{b',k',u'}$ is the inter-cell interference. We assume that orthogonal frequency-division multiplexing is applied to eliminate intra-cell interference.
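As a numerical illustration of equation (1), the following Python sketch evaluates $C_{b,u}$ for a single user. It is a minimal sketch under stated assumptions: the logarithm is taken as base 2 (so the rate is in bit/s), the inter-cell interference term on each RB is passed in as a precomputed aggregate, and the function name `achievable_rate` and all numerical values are hypothetical rather than the simulation settings used in this work.

```python
import math
from typing import Sequence


def achievable_rate(
    bandwidth: Sequence[float],     # d_k for each RB k of BS b (Hz)
    power: Sequence[float],         # p_{b,k}: transmit power of BS b on RB k (W)
    gain: Sequence[float],          # h_{b,k,u}: channel gain between BS b and user u on RB k
    allocated: Sequence[int],       # gamma_{b,k,u} in {0, 1}: 1 if RB k serves user u
    interference: Sequence[float],  # aggregate inter-cell interference on RB k (W), assumed precomputed
    noise_density: float,           # N_0 (W/Hz)
) -> float:
    """Equation (1) with a base-2 log (assumption): sum_k d_k * log2(1 + SINR_k)."""
    rate = 0.0
    for d_k, p_k, h_k, g_k, i_k in zip(bandwidth, power, gain, allocated, interference):
        signal = p_k * h_k * g_k                 # zero when RB k is not allocated to user u
        sinr = signal / (i_k + d_k * noise_density)
        rate += d_k * math.log2(1.0 + sinr)
    return rate


# Illustrative values only: two 180 kHz RBs, only the first one is allocated to the user.
rate = achievable_rate(
    bandwidth=[180e3, 180e3],
    power=[0.5, 0.5],
    gain=[1e-9, 1e-9],
    allocated=[1, 0],
    interference=[1e-12, 1e-12],
    noise_density=4e-21,
)
print(f"C_b,u = {rate / 1e6:.3f} Mbit/s")
```

RBs with $\gamma_{b,k,u}=0$ contribute $d_k\log_2(1)=0$, so only the RBs allocated to user $u$ add to the rate.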

This work aims to minimize the BS transmission power while satisfying the average data rate constraint [10]:

$$
\begin{aligned}
\min_{P_b}\quad & \sum\nolimits_{b\in B} P_b && \text{(2)}\\
\text{s.t.}\quad & 0\le P_b\le P_{max}, && \text{(2a)}\\
& P_b=\sum\nolimits_{k=1}^{K_b} p_{b,k}, && \text{(2b)}\\
& \sum\nolimits_{u=1}^{U_b} C_{b,u}/U_b\ge C_{min}, && \text{(2c)}
\end{aligned}
$$

where $P_b$ is the total transmission power of BS $b$, given by (2b) as the sum of the per-RB transmission powers $p_{b,k}$ defined in equation (1), $P_{max}$ is the maximum transmission power, $U_b$ is the total number of users, and $C_{min}$ is the minimum average achievable data rate. We assume that $P_b$ is equally allocated to all RBs and that the classic, widely used proportional fairness method is applied for RB allocation, which allows us to focus on the LLM-related features.
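A minimal Python sketch of the equal power split assumed above and of a feasibility check for constraints (2a) and (2c) follows. The function names are hypothetical, the proportional fairness RB allocation is not reproduced, and the per-user rates are assumed to have been computed already, e.g., from equation (1).

```python
from typing import Sequence


def equal_rb_power(total_power: float, num_rbs: int) -> list[float]:
    """Split P_b equally over the K_b RBs, so that constraint (2b) holds by construction."""
    if num_rbs <= 0:
        raise ValueError("num_rbs must be positive")
    return [total_power / num_rbs] * num_rbs


def is_feasible(total_power: float, user_rates: Sequence[float],
                p_max: float, c_min: float) -> bool:
    """Check constraint (2a) and the average-rate constraint (2c) for one BS."""
    if not user_rates:
        raise ValueError("at least one user is required to evaluate (2c)")
    within_budget = 0.0 <= total_power <= p_max           # (2a)
    average_rate = sum(user_rates) / len(user_rates)      # left-hand side of (2c)
    return within_budget and average_rate >= c_min


per_rb = equal_rb_power(total_power=4.0, num_rbs=8)
print(per_rb[0], is_feasible(4.0, [2.1e6, 1.8e6, 2.4e6], p_max=10.0, c_min=2.0e6))  # → 0.5 True
```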

The control variable in problem (2) is the total BS transmission power $P_b$, which needs to be dynamically adjusted according to the wireless environment, e.g., the current number of users or the user-BS distances, to save power while maintaining the average data rate. Problem (2) has been extensively investigated in existing studies, but this work differs from previous works by approaching it from the perspective of natural language-based network optimization. Specifically, we propose a novel LLM-enabled in-context learning algorithm in Section III.

Figure 1: Overall design of the proposed LLM-enabled in-context learning for transmission power control.

III In-context Learning-based Optimization Algorithm

In-context learning refers to the process by which LLMs learn from formatted natural language, such as task descriptions and task solution demonstrations, to improve their performance on target tasks. In-context learning can be defined as [5]

$$D_{task}\times\mathcal{E}_t\times s_t\times\mathcal{LLM}\Rightarrow a_t, \tag{3}$$

where $D_{task}$ is the task description, $\mathcal{E}_t$ is the set of examples at time $t$, $s_t$ is the environment state of the target task at time $t$, $\mathcal{LLM}$ denotes the LLM, and $a_t$ is the LLM output. For a sequential decision-making problem, we expect the LLM to utilize the initial task description $D_{task}$, learn from the example set $\mathcal{E}_t$, and then make a decision $a_t$ based on the current environment state $s_t$ of the target task. In the following, we introduce the design of the task description $D_{task}$ and the selection of the example set $\mathcal{E}_t$.

III-A Language-based Task Description

$D_{task}$ is crucial for providing target task information to the LLM. In particular, it consists of a “Task_goal”, a “Task_definition”, and extra “Rules”. The following is the detailed task description we designed to prompt the LLM.

Task description for BS transmission power control
Task goal: You have a decision-making task for base station power control, and you need to select between 4 power levels from 1 to 4.
Task definition: You have to consider the specific user number of each case, which is the “base station user number”. Following are some examples {Example_set}. Now I will give you a new condition to solve, the current BS user number is {Num_BS_user}.
Rules: Now please select from “level 1”, “level 2”, “level 3”, and “level 4” based on the above examples.

In particular, the Task_goal first specifies a “decision-making task for base station power control”, and the goal is to “select between 4 power levels” (4 power levels are used here as an example; this can be changed to any number of levels). Then the Task_definition introduces the environment states to be considered. For example, this work assumes that the total number of users may change dynamically, so the LLM has to consider the “base station user number” of each case. This also means that the environment state $s_t$ in equation (3) refers to the total user number $U_b$ in problem (2). After that, the example set $\mathcal{E}_t$ is inserted after “Following are some examples…”, and we provide a new condition for the LLM to solve, which is associated with the current user number $U_b$. Finally, we set extra reply rules such as “select from … based on the above examples”, instructing the LLM to focus on the decision-making process.
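The template above can be filled programmatically. The Python sketch below is one possible way to do so, not the exact implementation used in this work: the placeholder names `example_set` and `num_bs_user`, the one-line text format of each example, and the regular-expression parser for the LLM reply are all assumptions, and the parser simply takes the first “level N” it finds.

```python
import re
from typing import Optional

# Task description from Section III-A; the placeholder names are our own choice.
TASK_TEMPLATE = (
    "Task goal: You have a decision-making task for base station power control, "
    "and you need to select between 4 power levels from 1 to 4. "
    "Task definition: You have to consider the specific user number of each case, "
    'which is the "base station user number". '
    "Following are some examples {example_set}. "
    "Now I will give you a new condition to solve, "
    "the current BS user number is {num_bs_user}. "
    'Rules: Now please select from "level 1", "level 2", "level 3", and "level 4" '
    "based on the above examples."
)


def format_example(state: int, action: int, reward: float) -> str:
    """Render one example E = {s, a, r(s, a)} as a line of text (hypothetical format)."""
    return f"base station user number: {state}, selected power: level {action}, reward: {reward:.2f}"


def build_prompt(examples: list[tuple[int, int, float]], num_bs_user: int) -> str:
    """Insert the selected examples and the current state into the template."""
    example_set = "".join("\n" + format_example(*e) for e in examples)
    return TASK_TEMPLATE.format(example_set=example_set, num_bs_user=num_bs_user)


def parse_level(reply: str, num_levels: int = 4) -> Optional[int]:
    """Return the first 'level N' in the reply, or None if it is missing or out of range."""
    match = re.search(r"level\s*(\d+)", reply, flags=re.IGNORECASE)
    if match is None:
        return None
    level = int(match.group(1))
    return level if 1 <= level <= num_levels else None


prompt = build_prompt([(10, 2, 1.35), (10, 3, 0.80)], num_bs_user=10)
print(parse_level("I select level 2 based on the examples."))  # → 2
```

Returning `None` for an unparseable or out-of-range reply lets the caller decide how to handle it, e.g., by re-querying the LLM or falling back to a random level.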

The above task description provides a template for defining optimization tasks in formatted natural language, avoiding the complexity of dedicated optimization model design. It is also user-friendly, since the operator can easily add or remove task descriptions without specialized knowledge of optimization techniques.

III-B In-context Learning Framework and Example Design

Examples are of great importance in in-context learning, and they must be carefully selected because: 1) examples serve as crucial references for LLM decision-making, i.e., the LLM relies on examples to justify its decisions; 2) due to the LLM context window size constraint (the context window size is the largest number of tokens that can be sent to the LLM), it is impractical to send a large number of examples to the LLM. Moreover, many optimization problems in wireless networks have continuous environment states, e.g., adjusting the BS transmission power based on the user-BS distance. In such cases, there may be an infinite number of possible examples, and identifying the most relevant and useful ones becomes challenging. This work defines an example $E$ by

$$E=\{s,a,r(s,a)\},\quad E\in\mathcal{E}, \tag{4}$$

where $s$ and $a$ are the environment state and decision, respectively. Inspired by reinforcement learning, we further define a reward value to evaluate the decision $a$ by

$$r=P_{target}-P_b-\beta, \tag{5}$$

where $P_{target}$ is a target power consumption, and $P_b$ is the total power consumption of BS $b$ defined in problem (2). $\beta$ is a penalty term that is applied only when constraint (2c) is not satisfied. Then, $r$ provides a comprehensive metric to evaluate the selected decision $a$ under environment state $s$.
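For concreteness, the example structure in equation (4) and the reward in equation (5) can be written in Python as below. This is a sketch: the class and function names are hypothetical, the penalty is triggered by comparing the average user rate with $C_{min}$ as in constraint (2c), and the values of $P_{target}$ and $\beta$ in the usage lines are illustrative, since they are not specified in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One in-context example E = {s, a, r(s, a)} as in equation (4)."""
    state: float   # s, e.g. BS user number or average user-BS distance
    action: int    # a, the selected power level
    reward: float  # r(s, a), computed by equation (5)


def reward(p_target: float, p_b: float, avg_rate: float, c_min: float, beta: float) -> float:
    """Equation (5): r = P_target - P_b - beta, where beta applies only if (2c) is violated."""
    penalty = beta if avg_rate < c_min else 0.0
    return p_target - p_b - penalty


print(reward(p_target=10.0, p_b=6.0, avg_rate=2.5e6, c_min=2e6, beta=20.0))  # → 4.0
print(reward(p_target=10.0, p_b=6.0, avg_rate=1.5e6, c_min=2e6, beta=20.0))  # → -16.0
```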

Fig. 1 shows the overall design of the proposed in-context learning algorithm for transmission power control. Specifically, the above task description $D_{task}$, the current environment state $s_t$, and the selected examples $\mathcal{E}_t$ are combined into the input prompt as defined in equation (3), and the LLM then generates a power control decision $a_t$ based on $s_t$ and the experiences in $\mathcal{E}_t$. Next, the decision $a_t$ is implemented, the achieved data rate $C_{b,u}$ is collected, and the reward $r_t$ is calculated by equation (5). $E_t=\{s_t,a_t,r_t(s_t,a_t)\}$ then becomes a new example in the accumulated experience pool $\mathcal{E}_{pool}$ in Fig. 1.
After that, based on the next environment state $s_{t+1}$, a new example set $\mathcal{E}_{t+1}$ is selected, and the selected examples are inserted into the task description together with $s_{t+1}$, forming a new prompt for the LLM to generate $a_{t+1}$.
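The closed loop in Fig. 1 can be sketched in Python as follows. The LLM call and the network simulator are passed in as hypothetical callables (`query_llm`, `simulate`); the toy stand-ins below are not real models, the prompt is abbreviated from the full task description, and example selection uses a simple same-state top-K rule for discrete states. The ε-greedy exploration of equation (7) is omitted here for brevity.

```python
from typing import Callable

Example = tuple[int, int, float]  # (s, a, r(s, a)) as in equation (4)


def run_in_context_loop(
    states: list[int],
    query_llm: Callable[[str], int],                       # hypothetical: prompt -> power level
    simulate: Callable[[int, int], tuple[float, float]],   # hypothetical: (s, a) -> (P_b, avg rate)
    p_target: float,
    c_min: float,
    beta: float,
    top_k: int = 3,
) -> list[Example]:
    """Sketch of the Fig. 1 loop: select E_t, prompt the LLM, act, score, store E_t."""
    pool: list[Example] = []  # accumulated experience pool E_pool
    for s_t in states:
        # Select E_t: top-K highest-reward pool entries with the same (discrete) state.
        same_state = [e for e in pool if e[0] == s_t]
        examples = sorted(same_state, key=lambda e: e[2], reverse=True)[:top_k]
        prompt = (
            f"Following are some examples {examples}. "
            f"Now I will give you a new condition to solve, the current BS user number is {s_t}."
        )
        a_t = query_llm(prompt)             # LLM decision a_t
        p_b, avg_rate = simulate(s_t, a_t)  # implement a_t; observe P_b and the average rate
        penalty = beta if avg_rate < c_min else 0.0
        r_t = p_target - p_b - penalty      # reward, equation (5)
        pool.append((s_t, a_t, r_t))        # E_t joins E_pool
    return pool


# Toy stand-ins (hypothetical): an "LLM" that always answers level 2 and a toy environment.
prompts_seen: list[str] = []


def fake_llm(prompt: str) -> int:
    prompts_seen.append(prompt)
    return 2


def fake_env(num_users: int, level: int) -> tuple[float, float]:
    power = 2.0 * level                        # toy: power grows with the level
    avg_rate = 0.5e6 * level * 10 / num_users  # toy: more users -> lower average rate
    return power, avg_rate


pool = run_in_context_loop([10, 12, 10], fake_llm, fake_env,
                           p_target=10.0, c_min=1e6, beta=20.0)
print(pool)  # → [(10, 2, 6.0), (12, 2, -14.0), (10, 2, 6.0)]
```

On the third step the prompt already contains the example stored at the first step, which is how the pool feeds back into later decisions.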

III-C State-based Example Selection for Discrete State Problems

Selecting appropriate examples is critical for in-context learning, since the LLM learns from existing demonstrations to handle the target task. For problems with discrete environment states, relevant demonstrations can be easily identified by finding existing examples with the same state in the accumulated experience pool $\mathcal{E}_{pool}$. Considering a target task with environment state value $s_{target}$, the set of relevant examples can be identified by

$$\mathcal{E}_{relevant}=\Big\{E\{s,a,r(s,a)\}\ \Big|\ s=s_{target},\ E\in\mathcal{E}_{pool}\Big\}, \tag{6}$$

where $\mathcal{E}_{pool}$ is the accumulated experience pool in Fig. 1. Given $\mathcal{E}_{relevant}$, we can easily select recommended examples with high rewards, i.e., the top-K examples, and inadvisable examples, e.g., examples with lower rewards or that violate the minimum data rate constraint.
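The state-based selection of equation (6) followed by top-K ranking can be sketched as below. As an assumption, the inadvisable examples are taken as the lowest-reward relevant examples that were not recommended; a check on the minimum data rate constraint could be added at the same point. Names and values are hypothetical.

```python
from typing import NamedTuple


class Example(NamedTuple):
    state: int     # s: discrete state, e.g. BS user number
    action: int    # a: power level
    reward: float  # r(s, a)


def select_by_state(pool: list[Example], s_target: int,
                    k: int = 2) -> tuple[list[Example], list[Example]]:
    """Return (recommended, inadvisable) examples for a discrete target state."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Equation (6): keep only examples whose state equals s_target.
    relevant = [e for e in pool if e.state == s_target]
    ranked = sorted(relevant, key=lambda e: e.reward, reverse=True)
    recommended = ranked[:k]       # top-K by reward
    inadvisable = ranked[k:][-k:]  # up to K lowest-reward examples not already recommended
    return recommended, inadvisable


pool = [Example(10, 1, -15.0), Example(10, 2, 6.0), Example(12, 3, 3.5),
        Example(10, 3, 4.0), Example(10, 4, 2.0)]
good, bad = select_by_state(pool, s_target=10, k=2)
print([e.action for e in good], [e.action for e in bad])  # → [2, 3] [4, 1]
```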

In addition, we adopt the well-known $\epsilon$-greedy policy to balance exploration and exploitation:

$$a=\begin{cases}\text{Random action selection}, & \text{if } rand<\epsilon;\\ \text{LLM-based decision-making}, & \text{else},\end{cases} \tag{7}$$

where $\epsilon$ is a predefined value and $rand$ is a random number between 0 and 1. The random exploration in equation (7) thus continually generates new examples, from which the LLM can obtain better relevant examples $\mathcal{E}_{relevant}$ to improve its performance.
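Equation (7) translates directly into a few lines of Python. In this sketch (the function name and arguments are hypothetical), the LLM decision is passed as a zero-argument callable so that the LLM is queried only when the exploitation branch is taken, and a seeded `random.Random` instance keeps runs reproducible.

```python
import random
from typing import Callable


def epsilon_greedy(llm_decision: Callable[[], int], num_levels: int,
                   epsilon: float, rng: random.Random) -> int:
    """Equation (7): a random power level with probability epsilon, else the LLM decision."""
    if rng.random() < epsilon:
        return rng.randint(1, num_levels)  # random action selection over levels 1..num_levels
    return llm_decision()                  # LLM-based decision-making


rng = random.Random(0)
actions = [epsilon_greedy(lambda: 2, num_levels=4, epsilon=0.1, rng=rng) for _ in range(1000)]
print(f"share of level 2: {actions.count(2) / len(actions):.2f}")
```

Passing a callable rather than a precomputed decision avoids spending an LLM query on steps where a random action is used anyway.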

III-D Ranking-based Example Selection for Continuous State Problems

Compared with discrete-state problems, environments with continuous states can be much more complicated. For instance, when the average user-BS distance is used as the environment state for BS transmission power control, it is unlikely that the pool contains an existing example $E\{s,a,r(s,a)\}$ with exactly $s=s_{target}$ for a target state $s_{target}$, since $s_{target}$ is a random value within the BS maximum coverage distance. This problem may be addressed by discretizing the continuous states into discrete values, but doing so may still lead to a large number of states or introduce additional quantization errors. To this end, we define a new metric $\mathcal{L}$ for example selection with continuous states:

$$\mathcal{L}(E,s_{target})=r(s,a)-\tau\|s-s_{target}\|, \tag{8}$$

where (E,starget)𝐸subscript𝑠𝑡𝑎𝑟𝑔𝑒𝑡\mathcal{L}(E,s_{target})caligraphic_L ( italic_E , italic_s start_POSTSUBSCRIPT italic_t italic_a italic_r italic_g italic_e italic_t end_POSTSUBSCRIPT ) is a comprehensive metric to evaluate the usefulness of E={s,a,r(s,a)}𝐸𝑠𝑎𝑟𝑠𝑎E=\{s,a,r(s,a)\}italic_E = { italic_s , italic_a , italic_r ( italic_s , italic_a ) } to the decision-making of stargetsubscript𝑠𝑡𝑎𝑟𝑔𝑒𝑡s_{target}italic_s start_POSTSUBSCRIPT italic_t italic_a italic_r italic_g italic_e italic_t end_POSTSUBSCRIPT, and sstargetnorm𝑠subscript𝑠𝑡𝑎𝑟𝑔𝑒𝑡||s-s_{target}||| | italic_s - italic_s start_POSTSUBSCRIPT italic_t italic_a italic_r italic_g italic_e italic_t end_POSTSUBSCRIPT | | is the L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT norm to define the distance between s𝑠sitalic_s and stargetsubscript𝑠𝑡𝑎𝑟𝑔𝑒𝑡s_{target}italic_s start_POSTSUBSCRIPT italic_t italic_a italic_r italic_g italic_e italic_t end_POSTSUBSCRIPT. Equation (8) aims to jointly consider the reward and states of example E𝐸Eitalic_E, and τ𝜏\tauitalic_τ is a weighting factor to balance the importance of higher reward r(s,a)𝑟𝑠𝑎r(s,a)italic_r ( italic_s , italic_a ) and more similar states between s𝑠sitalic_s and stargetsubscript𝑠𝑡𝑎𝑟𝑔𝑒𝑡s_{target}italic_s start_POSTSUBSCRIPT italic_t italic_a italic_r italic_g italic_e italic_t end_POSTSUBSCRIPT. Specifically, a higher reward r(s,a)𝑟𝑠𝑎r(s,a)italic_r ( italic_s , italic_a ) indicates that E𝐸Eitalic_E includes a good action selection a𝑎aitalic_a under environment state s𝑠sitalic_s, and meanwhile lower sstargetnorm𝑠subscript𝑠𝑡𝑎𝑟𝑔𝑒𝑡||s-s_{target}||| | italic_s - italic_s start_POSTSUBSCRIPT italic_t italic_a italic_r italic_g italic_e italic_t end_POSTSUBSCRIPT | | value means the environment state s𝑠sitalic_s in E𝐸Eitalic_E is more similar to stargetsubscript𝑠𝑡𝑎𝑟𝑔𝑒𝑡s_{target}italic_s start_POSTSUBSCRIPT italic_t italic_a italic_r italic_g italic_e italic_t end_POSTSUBSCRIPT. 
Therefore, we use $\mathcal{L}(E, s_{target})$ as a comprehensive ranking metric, and the recommended and inadvisable examples are then selected with a top-K method, as in Section III-C.
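A minimal Python sketch of this ranking-based selection follows, assuming each stored example is a (state, action, reward) tuple with a vector-valued state. It scores every example with equation (8) and, as an assumption about how the top-K rule of Section III-C carries over, takes the K highest-scoring examples as recommended and the K lowest-scoring ones as inadvisable. The names `Example`, `score`, and `select_examples`, as well as the values of τ and K, are illustrative and not taken from the paper.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Example(NamedTuple):
    """One demonstration example E = {s, a, r(s, a)} (hypothetical container)."""
    state: Tuple[float, ...]
    action: int
    reward: float


def score(example: Example, target_state: Sequence[float], tau: float) -> float:
    """Equation (8): reward minus tau times the L2 distance to the target state."""
    distance = math.dist(example.state, target_state)
    return example.reward - tau * distance


def select_examples(
    examples: List[Example],
    target_state: Sequence[float],
    tau: float = 1.0,
    k: int = 2,
) -> Tuple[List[Example], List[Example]]:
    """Rank examples by equation (8) and return (recommended, inadvisable).

    Assumption: recommended = K highest scores, inadvisable = K lowest scores.
    """
    ranked = sorted(examples, key=lambda e: score(e, target_state, tau), reverse=True)
    recommended = ranked[:k]
    # Guard k == 0, because ranked[-0:] would return the whole list.
    inadvisable = ranked[-k:][::-1] if k > 0 else []
    return recommended, inadvisable


# Hypothetical continuous states (average user-BS distance in meters).
examples = [
    Example(state=(10.0,), action=3, reward=5.0),
    Example(state=(12.0,), action=1, reward=2.0),
    Example(state=(4.0,), action=2, reward=6.0),
    Example(state=(11.0,), action=0, reward=-1.0),
]
rec, bad = select_examples(examples, target_state=(10.5,), tau=0.5, k=1)
print(rec[0].action, bad[0].action)  # 3 0
```

In this toy run, the example with the highest raw reward (action 2) is not recommended, because its state lies 6.5 m away from the target; this is exactly the reward-versus-similarity trade-off that τ controls.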

III-E Baseline Algorithm

We consider deep reinforcement learning (DRL) as a baseline, since it is one of the most widely used ML algorithms for network optimization problems [12]. DRL usually produces satisfactory optimization results, but it requires careful hyperparameter fine-tuning and dedicated model training. This work investigates two scenarios, whose Markov decision processes (MDPs) are defined as follows: 1) for discrete-state problems, the state is the number of users associated with each BS; 2) for continuous-state problems, the state is the average user-BS distance, which changes continuously. In both cases, the action is the BS power level, and the reward is given by equation (5).
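The two MDP state definitions can be sketched as follows. This is a hedged illustration, not the baseline's actual implementation. It assumes that user-to-BS associations and distances are known, encodes the discrete state as the tuple of per-BS user counts and the continuous state as the tuple of per-BS average user-BS distances, and uses a hypothetical list of power levels (`POWER_LEVELS_DBM`) as the action set. The reward of equation (5) is not reproduced here.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical discrete BS power levels (dBm); the paper's actual levels are not given here.
POWER_LEVELS_DBM: List[float] = [20.0, 23.0, 26.0, 30.0]


def discrete_state(user_distances: Dict[int, List[float]]) -> Tuple[int, ...]:
    """Case I: number of users associated with each BS, ordered by BS index."""
    return tuple(len(user_distances[bs]) for bs in sorted(user_distances))


def continuous_state(user_distances: Dict[int, List[float]]) -> Tuple[float, ...]:
    """Case II: average user-BS distance (m) for each BS, ordered by BS index.

    Assumes every BS serves at least one user (mean() fails on an empty list).
    """
    return tuple(mean(user_distances[bs]) for bs in sorted(user_distances))


def action_to_power(action: int) -> float:
    """Map a discrete action index to a (hypothetical) BS transmit power level."""
    return POWER_LEVELS_DBM[action]


# Example: three BSs and the distances (m) of their associated users.
users = {0: [5.0, 10.0, 15.0], 1: [8.0, 12.0], 2: [20.0]}
print(discrete_state(users))    # (3, 2, 1)
print(continuous_state(users))  # (10.0, 10.0, 20.0)
```

The same per-BS distance lists feed both encodings, so the two cases differ only in how much information the state keeps: a count for Case I versus a real-valued average for Case II.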

IV Performance Evaluation

Figure 2: Simulation results and comparisons.
(a) Discrete state space: system reward and service quality comparison of various LLMs.
(b) Continuous state space: system reward comparison of various LLMs.
(c) Continuous state space: power consumption comparison of various LLMs.
(d) Continuous state space: average reward comparison under different data rate constraints.
(e) Continuous state space: average power consumption comparison under different data rate constraints.
(f) Continuous state space: average service quality comparison under different data rate constraints.

IV-A Simulation Settings

We consider three adjacent small base stations (SBSs). The number of users associated with each SBS varies randomly between 5 and 15, and each SBS has a coverage range of 20 meters. Channel gains follow the 3GPP urban network models, and two cases are evaluated:
Case I: discrete states defined by the number of users of each SBS;
Case II: continuous states defined by the average user-SBS distance.
These two cases represent two kinds of network optimization problems. The simulation then compares two main approaches:
1) LLM-based methods, using three models: Llama3-8b-instruct, Llama3-70b-instruct, and GPT-3.5 turbo. Llama3-8b is a small-scale LLM, while Llama3-70b and GPT-3.5 turbo are large models. Using LLMs of different sizes allows a more thorough evaluation of the proposed algorithm.
2) The DRL-based method introduced in Section III-E. Because it benefits from dedicated model training, we regard DRL as the optimal baseline; its capability has been demonstrated in many existing studies [12, 11].
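A minimal sketch of how one snapshot of this setting could be sampled is shown below. It draws the number of users of each of the three SBSs uniformly from 5 to 15 and, as an assumption not stated in the text, places users uniformly within a 20 m disk around their serving SBS. It then returns both the Case I and Case II states. The 3GPP channel model is omitted, and the function name `sample_snapshot` and the seed are illustrative.

```python
import math
import random
from typing import List, Tuple

NUM_SBS = 3
MIN_USERS, MAX_USERS = 5, 15
COVERAGE_M = 20.0  # assumed here to be the radius of each SBS's coverage disk


def sample_snapshot(rng: random.Random) -> Tuple[Tuple[int, ...], Tuple[float, ...]]:
    """Return (Case I state, Case II state) for one random network snapshot."""
    counts: List[int] = []
    avg_distances: List[float] = []
    for _ in range(NUM_SBS):
        n_users = rng.randint(MIN_USERS, MAX_USERS)  # inclusive on both ends
        # Uniform placement in a disk: r = R * sqrt(U) gives uniform density per area.
        # Only the user-SBS distance is needed, so the angle is not sampled.
        distances = [COVERAGE_M * math.sqrt(rng.random()) for _ in range(n_users)]
        counts.append(n_users)
        avg_distances.append(sum(distances) / n_users)
    return tuple(counts), tuple(avg_distances)


rng = random.Random(0)
case1_state, case2_state = sample_snapshot(rng)
print("Case I  (users per SBS):", case1_state)
print("Case II (avg distance, m):", [round(d, 2) for d in case2_state])
```

Case I has a finite state space (11 possible user counts per SBS), whereas Case II yields real-valued averages that almost never repeat exactly. This difference is why the two cases call for state-based and ranking-based example selection, respectively.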

IV-B Simulation Results

Fig. 2 shows the simulation results and comparisons. The metrics include average reward, power consumption, and service quality, where service quality is the probability of satisfying the minimum average data rate constraint defined in equation (2).
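Since service quality is an empirical probability, it can be estimated as the fraction of decision steps that meet the constraint. The sketch below assumes that the average data rate has already been logged once per decision step; the rate computation of equation (2) is not reproduced, and the function name and sample values are hypothetical.

```python
from typing import Sequence


def service_quality(avg_rates_mbps: Sequence[float], min_rate_mbps: float) -> float:
    """Fraction of decision steps whose logged average data rate meets the minimum."""
    if not avg_rates_mbps:
        raise ValueError("need at least one logged rate")
    satisfied = sum(1 for rate in avg_rates_mbps if rate >= min_rate_mbps)
    return satisfied / len(avg_rates_mbps)


# Hypothetical log of average per-user rates (Mbps) over five decision steps.
rates = [2.4, 1.8, 2.1, 2.0, 1.5]
print(service_quality(rates, min_rate_mbps=2.0))  # 0.6
```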

First, Fig. 2(a) presents the system reward and service quality of different LLMs in the discrete state space. Both Llama3 LLMs achieve reward and service quality comparable to the DRL baseline, while GPT-3.5 shows lower reward and service quality. Fig. 2(a) demonstrates that the proposed in-context learning algorithm and state-based example selection method provide satisfactory performance for problems with a limited number of environment states.

Next, we consider more complicated scenarios with continuous states defined by the average user-BS distance. Fig. 2(b) and 2(c) present the reward and average power consumption, respectively. All LLMs achieve higher rewards and lower power consumption as the number of episodes increases, eventually converging to stable values. These results show that LLMs can learn from previous examples and exploration and then improve their performance on the target tasks.

In addition, we evaluate the algorithm under different minimum data rate constraints. Fig. 2(d), 2(e), and 2(f) present the average reward, power consumption, and service quality, respectively. Each value is the average performance over the converged episodes of the corresponding LLM, as in Fig. 2(b) and 2(c). As expected, increasing the minimum data rate constraint leads to lower reward, lower service quality, and higher power consumption. Fig. 2(d), 2(e), and 2(f) therefore demonstrate that the proposed algorithm can adapt to different optimization settings and adjust its policy to improve performance on the target tasks.
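Each point in Fig. 2(d) to 2(f) averages a metric over the converged episodes. A minimal sketch follows, assuming that "converged" means the last `n_tail` episodes of a run. The paper does not state the exact window, so `n_tail`, the function name, and the reward curve are hypothetical.

```python
from typing import Sequence


def converged_average(per_episode: Sequence[float], n_tail: int) -> float:
    """Average a per-episode metric over the final n_tail (assumed converged) episodes."""
    if not 0 < n_tail <= len(per_episode):
        raise ValueError("n_tail must be between 1 and the number of episodes")
    tail = per_episode[-n_tail:]
    return sum(tail) / n_tail


# Hypothetical reward curve that improves and then levels off.
rewards = [1.0, 3.0, 5.0, 6.0, 6.2, 5.8, 6.0]
print(converged_average(rewards, n_tail=4))
```

Averaging only the tail excludes the early exploration episodes, so the reported value reflects the policy each LLM settles on rather than its learning transient.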

Note that the algorithm's performance also depends on the specific LLM. Llama3 represents state-of-the-art LLM designs, while GPT-3.5 is an earlier model. It is therefore reasonable that Llama3-8b and Llama3-70b perform comparably to the DRL baseline, while GPT-3.5 turbo performs worse across different tasks. For instance, when the minimum data rate constraint is 2 Mbps per user, GPT-3.5 achieves 25% lower reward and 20% lower service quality than the other LLMs. In summary, the simulations in Fig. 2 demonstrate that the proposed in-context learning algorithm achieves performance comparable to conventional DRL algorithms without dedicated model training or fine-tuning.

V Conclusion

LLMs are a promising technique for future wireless networks, and this work proposes an LLM-enabled in-context learning algorithm for BS transmission power control. The proposed algorithm handles both discrete-state and continuous-state problems, and simulations show that it achieves performance comparable to conventional DRL algorithms. This work demonstrates the great potential of in-context learning for network management and optimization problems.

References

  • [1] Z. Zhang, Y. Xiao, Z. Ma, M. Xiao, Z. Ding, X. Lei, G. K. Karagiannidis, and P. Fan, “6G wireless networks: Vision, requirements, architecture, and key technologies,” IEEE Vehicular Technology Magazine, vol. 14, no. 3, pp. 28–41, 2019.
  • [2] H. Zhou, M. Erol-Kantarci, Y. Liu, and H. V. Poor, “A survey on model-based, heuristic, and machine learning optimization approaches in RIS-aided wireless networks,” IEEE Communications Surveys & Tutorials, 2023.
  • [3] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong et al., “A survey of large language models,” arXiv preprint arXiv:2303.18223, 2023.
  • [4] H. Zhou, C. Hu, Y. Yuan, Y. Cui, Y. Jin, C. Chen, H. Wu, D. Yuan, L. Jiang, D. Wu et al., “Large language model (LLM) for telecommunications: A comprehensive survey on principles, key techniques, and opportunities,” arXiv preprint arXiv:2405.10825, 2024.
  • [5] Q. Dong, L. Li, D. Dai, C. Zheng, Z. Wu et al., “A survey on in-context learning,” arXiv preprint arXiv:2301.00234, 2022.
  • [6] Z. Lin, G. Qu, Q. Chen, X. Chen, Z. Chen, and K. Huang, “Pushing large language models to the 6G edge: Vision, challenges, and opportunities,” arXiv preprint arXiv:2309.16739, 2023.
  • [7] M. Fu, P. Wang, M. Liu, Z. Zhang, and X. Zhou, “IoV-BERT-IDS: Hybrid network intrusion detection system in IoV using large language models,” IEEE Transactions on Vehicular Technology, 2024.
  • [8] Q. Liu, J. Mu, D. ChenZhang, Y. Liu, and T. Hong, “LLM enhanced reconfigurable intelligent surface for energy-efficient and reliable 6G IoV,” IEEE Transactions on Vehicular Technology, 2024.
  • [9] F. H. C. Neto, D. C. Araújo, M. P. Mota, T. F. Maciel, and A. L. de Almeida, “Uplink power control framework based on reinforcement learning for 5G networks,” IEEE Transactions on Vehicular Technology, vol. 70, no. 6, pp. 5734–5748, 2021.
  • [10] M. Chiang, P. Hande, T. Lan, C. W. Tan et al., “Power control in wireless cellular networks,” Foundations and Trends® in Networking, vol. 2, no. 4, pp. 381–533, 2008.
  • [11] H. Zhou, M. Erol-Kantarci, and H. V. Poor, “Learning from peers: Deep transfer reinforcement learning for joint radio and cache resource allocation in 5G RAN slicing,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 4, pp. 1925–1941, 2022.
  • [12] L. Zhang and Y.-C. Liang, “Deep reinforcement learning for multi-agent power control in heterogeneous networks,” IEEE Transactions on Wireless Communications, vol. 20, no. 4, pp. 2551–2564, 2020.