Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control

Hao Zhou, Chengming Hu, Dun Yuan, Ye Yuan, and Xue Liu are with the School of Computer Science, McGill University, Montreal, QC H3A 0E9, Canada (emails: hao.zhou4, chengming.hu, dun.yuan, [email protected], [email protected]). Di Wu is with the School of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada (email: [email protected]). Charlie Zhang is with Samsung Research America, Plano, TX 75023, USA (email: [email protected]).
Abstract
Large language models (LLMs) have recently been considered a promising technique for many fields. This work explores LLM-based wireless network optimization via in-context learning. To showcase the potential of LLM technologies, we consider base station (BS) power control as a case study, a fundamental and crucial technique that is widely investigated in wireless networks. Different from existing machine learning (ML) methods, our proposed in-context learning algorithm relies on the LLM's inference capabilities. It avoids the complexity of tedious model training and hyperparameter fine-tuning, which is a well-known bottleneck of many ML algorithms. Specifically, the proposed algorithm first describes the target task via formatted natural language and then designs the in-context learning framework and demonstration examples. After that, it considers two cases, namely discrete-state and continuous-state problems, and proposes state-based and ranking-based methods to select appropriate examples for these two cases, respectively. Finally, the simulations demonstrate that the proposed algorithm achieves performance comparable to conventional deep reinforcement learning (DRL) techniques without dedicated model training or fine-tuning. Such an efficient and low-complexity approach has great potential for future wireless network optimization.
Index Terms:
Large language model, in-context learning, network optimization, transmission power control
I Introduction
From LTE and 5G to the envisioned 6G, wireless networks are becoming increasingly complicated, with diverse application scenarios and novel signal processing and transmission techniques, e.g., unmanned aerial vehicles (UAVs), vehicle-to-everything (V2X), mmWave and THz networks, and reconfigurable intelligent surfaces [1]. The constantly evolving network architecture requires more efficient management schemes, and most existing network optimization methods fall into two main approaches: convex optimization and machine learning (ML) algorithms. Specifically, convex optimization usually requires a dedicated problem formulation for each specific task and then transforms the objective function or constraints into convex forms. By contrast, ML algorithms, such as reinforcement learning, place lower requirements on problem formulation, but model training and fine-tuning typically take a large number of iterations [2]. These issues, i.e., problem-specific transformation and relaxation, hyperparameter tuning, and long training, have become obstacles to further improving the efficiency of next-generation networks.
Large language models (LLMs) have recently been considered revolutionary technologies and have been successfully applied to education, finance, healthcare, biology, etc. [3]. The great potential of LLM technologies also provides promising opportunities for network management and optimization [4]. In particular, a promising feature of LLMs is learning from language-based descriptions and demonstrations, which is known as in-context learning. Compared with convex optimization or conventional ML approaches, in-context learning has multiple advantages [5]: 1) in-context learning relies only on the LLM's inference process, avoiding the complexity of dedicated model training and fine-tuning, which is a well-known bottleneck for many ML techniques; 2) in-context learning allows natural language-based task design and implementation, so the operator can easily formulate the target task using human language and instructions. Such a user-friendly approach can also significantly lower the professional knowledge required to solve specific tasks.
LLM-enabled wireless networks have recently attracted considerable interest, e.g., 6G edge intelligence [6], network intrusion detection [7], and LLM-enhanced reconfigurable intelligent surfaces for the internet of vehicles (IoV) [8]. By contrast, this work focuses on optimization problems and considers base station (BS) power control as a case study to explore the potential of using LLMs to solve optimization problems. The power control of cellular networks has been extensively studied with diverse objectives and algorithms, e.g., convex optimization, game theory, and reinforcement learning [9]. These studies show that power control is a fundamental and critical technique to improve energy efficiency, reduce interference, and lower power consumption [10]. Given this importance, we select BS power control as a case study to explore the potential of state-of-the-art LLM technologies for wireless network optimization.
In particular, our proposed LLM-enabled in-context learning algorithm first designs a natural language-based task description, i.e., task goal, definition, and rules. The formatted task description, along with a set of selected examples, becomes the prompt input for the LLM. The LLM then uses the task description and advisable examples to generate a decision based on the current environment state. After that, the output decision, environment state, and corresponding system performance form a new example, which is stored in an experience pool. Given the next environment state, we select new advisable examples from the experience pool, serving as references for the next LLM decision. In addition, we consider two scenarios, namely discrete-state and continuous-state problems, and propose state-based and ranking-based example selection methods for these two cases, respectively. Finally, we evaluate the proposed algorithm with various LLMs, i.e., Llama3-8b-instruct, Llama3-70b-instruct, and GPT-3.5 turbo, and the simulations show that the proposed algorithm achieves satisfactory performance.
The main contribution of this work is exploring LLM-enabled in-context learning for wireless network optimization problems. Specifically, we propose an in-context optimization technique that learns from environment interactions without model training or fine-tuning. The simulation results show that the proposed algorithm enables the LLM to learn from previous explorations and examples and to continually improve its performance on target tasks. Such an efficient and training-free algorithm has great potential for wireless network optimization and management.
II System Model
This section introduces a BS power minimization problem, serving as a case study of the proposed in-context learning algorithm. (Various objectives and constraints can be defined for power control problems, as summarized in [10]; here we select power minimization as a specific problem formulation.) Considering a BS $m$ with $N_m$ users, the achievable data rate between BS $m$ and user $n$ is defined by [11]
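$$R_{m,n}=\sum_{k=1}^{K_m} b_k \log_2\left(1+\frac{x_{m,k,n}\,p_{m,k}\,g_{m,k,n}}{\sum_{m'\in\mathcal{M}_{-m}} p_{m',k}\,g_{m',k,n}+b_k N_0}\right), \tag{1}$$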
where $K_m$ is the total number of resource blocks (RBs) in BS $m$, $b_k$ is the bandwidth of RB $k$, $p_{m,k}$ is the transmission power of BS $m$ on RB $k$, $g_{m,k,n}$ is the channel gain between BS $m$ and user $n$ on RB $k$, and $N_0$ is the noise power density. For RB allocation, the binary variable $x_{m,k,n}$ indicates whether RB $k$ is allocated to the transmission for user $n$. For interference, $\mathcal{M}_{-m}$ represents the set of adjacent BSs except for BS $m$, $\sum_{m'\in\mathcal{M}_{-m}} p_{m',k}\,g_{m',k,n}$ is the inter-cell interference, and we assume orthogonal frequency-division multiplexing is applied to eliminate intra-cell interference.
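As a rough illustration of equation (1), the following Python sketch evaluates the achievable rate of one user. It assumes a base-2 logarithm (rate in bit/s) and that the inter-cell interference on an RB is the sum of received powers from all adjacent BSs transmitting on that RB; the function and argument names are hypothetical and not taken from the authors' simulator.

```python
import math


def achievable_rate(bandwidth, alloc, p_serving, g_serving, p_interf, g_interf, n0):
    """Achievable rate (bit/s) of user n served by BS m, in the spirit of eq. (1).

    bandwidth[k]   : bandwidth of RB k (Hz)
    alloc[k]       : 1 if RB k is allocated to user n, else 0
    p_serving[k]   : transmit power of the serving BS on RB k (W)
    g_serving[k]   : channel gain from the serving BS to user n on RB k
    p_interf[j][k] : transmit power of the j-th adjacent BS on RB k (W)
    g_interf[j][k] : channel gain from the j-th adjacent BS to user n on RB k
    n0             : noise power spectral density (W/Hz)
    """
    rate = 0.0
    for k, b_k in enumerate(bandwidth):
        if not alloc[k]:
            continue  # an unallocated RB contributes log2(1 + 0) = 0
        # Inter-cell interference on RB k (assumption: every adjacent BS transmits on it).
        interference = sum(p[k] * g[k] for p, g in zip(p_interf, g_interf))
        sinr = p_serving[k] * g_serving[k] / (interference + b_k * n0)
        rate += b_k * math.log2(1.0 + sinr)
    return rate
```

Skipping unallocated RBs is equivalent to multiplying the signal term by $x_{m,k,n}=0$, and it avoids computing interference that does not affect the result.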
This work aims to minimize the BS transmission power while satisfying the average data rate constraint [10]:
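$$\min_{P_m}\; P_m \tag{2}$$
$$\text{s.t.}\quad P_m=\sum_{k=1}^{K_m} p_{m,k}, \tag{2a}$$
$$P_m \le P_{\max}, \tag{2b}$$
$$\frac{1}{N_m}\sum_{n=1}^{N_m} R_{m,n} \ge R_{\min}, \tag{2c}$$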
where $P_m$ is the total transmission power of BS $m$ and $p_{m,k}$, defined in equation (1), is the transmission power on RB $k$; $P_{\max}$ is the maximum power, $N_m$ is the total number of users, and $R_{\min}$ is the minimum average achievable data rate. We assume $P_m$ is equally allocated to all RBs, and RBs are allocated with the classic and widely used proportional fairness method, so that we can focus on the LLM features.
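To make problem (2) concrete, the sketch below searches a small set of candidate power levels for the smallest one that satisfies the maximum-power constraint (2b) and the average-rate constraint (2c). It is only a brute-force reference under a deliberately simplified, hypothetical rate model (equal power per RB, an equal RB share per user, no inter-cell interference), not the optimization method proposed in this work.

```python
import math


def average_rate(total_power, num_rbs, bandwidth, gains, n0):
    """Average per-user rate under a hypothetical simplified model.

    Assumptions (not from the paper's simulator): total_power is split equally
    over num_rbs RBs, each user receives an equal share of RBs, gains[n] is the
    channel gain of user n, and inter-cell interference is ignored.
    """
    p_rb = total_power / num_rbs
    rbs_per_user = num_rbs / len(gains)
    rates = [rbs_per_user * bandwidth * math.log2(1.0 + p_rb * g / (bandwidth * n0))
             for g in gains]
    return sum(rates) / len(rates)


def min_feasible_power(levels, p_max, r_min, num_rbs, bandwidth, gains, n0):
    """Smallest power level satisfying P_m <= P_max and average rate >= R_min."""
    for p in sorted(levels):
        if p > p_max:
            break  # remaining levels violate the maximum-power constraint
        if average_rate(p, num_rbs, bandwidth, gains, n0) >= r_min:
            return p
    return None  # no candidate level is feasible
```

Because the average rate grows monotonically with power in this model, the first feasible level in ascending order is the minimum-power solution; with interference and dynamic user states this simple reasoning no longer holds, which motivates learning-based approaches.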
The control variable in problem (2) is the total BS transmission power $P_m$, which needs to be dynamically adjusted based on the wireless environment, e.g., the current number of users or user-BS distances, to save power while maintaining the average data rate. Problem (2) has been extensively investigated in existing studies, but this work differs from previous works by approaching it from the perspective of natural language-based network optimization. Specifically, we propose a novel LLM-enabled in-context learning algorithm in Section III.

III In-context Learning-based Optimization Algorithm
In-context learning refers to the process by which LLMs learn from formatted natural language, such as task descriptions and task solution demonstrations, to improve their performance on target tasks. In-context learning can be defined as [5]
$$a_t=\mathrm{LLM}\left(\mathcal{D}_{\text{task}},\mathcal{E}_t,s_t\right), \tag{3}$$
where $\mathcal{D}_{\text{task}}$ is the task description, $\mathcal{E}_t$ is the set of examples at time $t$, $s_t$ is the environment state at time $t$ associated with the target task $\mathcal{T}$, $\mathrm{LLM}(\cdot)$ indicates the LLM, and $a_t$ is the LLM output. For a sequential decision-making problem, we expect the LLM to use the initial task description $\mathcal{D}_{\text{task}}$, learn from the example set $\mathcal{E}_t$, and then make a decision $a_t$ based on the current environment state $s_t$ of the target task. In the following, we introduce the design of the task description $\mathcal{D}_{\text{task}}$ and the selection of the example set $\mathcal{E}_t$.
III-A Language-based Task Description
The task description $\mathcal{D}_{\text{task}}$ is crucial for providing target task information to the LLM. In particular, it involves a “Task goal”, a “Task definition”, and extra “Rules”. The following is a detailed task description we designed to prompt the LLM.
In particular, the “Task goal” specifies a “decision-making task for base station power control”, and the goal is to “select between 4 power levels” (here we select 4 power levels as an example; this can be changed to any number of levels). Then the “Task definition” introduces the environment states to be considered. For example, this work assumes the total number of users may change dynamically, so the LLM has to consider the “base station user number” of each case. This means that the environment state $s_t$ in equation (3) refers to the total user number $N_m$ in problem (2). After that, the example set $\mathcal{E}_t$ is included after “Following are some examples….”, and we provide a new condition for the LLM to solve, associated with the current user number $s_t$. Finally, we set extra reply rules such as “select from … based on the above examples”, instructing the LLM to focus on the decision-making process.
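The prompt structure described above can be sketched as a simple string template. This is a hypothetical reconstruction for illustration only: apart from the short phrases quoted in the text, all wording, the example line format, and the power-level labels 1 to 4 are assumptions, and the authors' actual prompt is not reproduced here.

```python
def build_prompt(examples, user_number, power_levels=(1, 2, 3, 4)):
    """Assemble task description, examples, and current state into one prompt (eq. (3)).

    examples: list of (user_number, power_level, reward) tuples.
    Only the quoted phrases come from the text; all other wording and the
    power-level labels are hypothetical placeholders.
    """
    levels = ", ".join(str(p) for p in power_levels)
    lines = [
        # Task goal
        "Task goal: this is a decision-making task for base station power control; "
        f"select between {len(power_levels)} power levels.",
        # Task definition: the environment state s_t is the BS user number
        "Task definition: each case is described by the base station user number.",
        "Following are some examples:",
    ]
    for state, action, reward in examples:
        lines.append(f"base station user number: {state}, power level: {action}, reward: {reward:.2f}")
    # New condition to solve, associated with the current state s_t
    lines.append(f"Now the base station user number is {user_number}.")
    # Reply rules (hypothetical wording)
    lines.append(f"Rules: select from [{levels}] based on the above examples; reply with the power level only.")
    return "\n".join(lines)
```

Keeping each example on one line with a fixed format makes it easy to control how many examples fit within the LLM context window.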
The above task description provides a template for defining optimization tasks in formatted natural language, avoiding the complexity of dedicated optimization model design. It is also user-friendly, since the operator can easily add or remove task descriptions without professional knowledge of optimization techniques.
III-B In-context Learning Framework and Example Design
Examples are of great importance in in-context learning, and they must be carefully selected because: 1) examples serve as crucial references for LLM decision-making, i.e., the LLM relies on examples to justify its decision; 2) due to the LLM context window size constraint (the context window size is the largest number of tokens that can be sent to the LLM), it is impractical to send a large number of examples to the LLM. Moreover, many optimization problems in wireless networks have continuous environment states, e.g., adjusting the BS transmission power based on user-BS distance. In such cases, there may be an infinite number of possible examples, so identifying the most relevant and useful ones becomes challenging. This work defines an example $e_t$ by
$$e_t=\{s_t,a_t,r_t\}, \tag{4}$$
where $s_t$ and $a_t$ are the environment state and decision, respectively. Inspired by reinforcement learning, we further define a reward value $r_t$ to evaluate the decision by
$$r_t=P_{\text{target}}-P_m-\phi, \tag{5}$$
where $P_{\text{target}}$ is a target power consumption, and $P_m$, defined in problem (2), is the total power consumption of BS $m$. $\phi$ is a penalty term, which is applied only when constraint (2c) is not satisfied. Then, $r_t$ provides a comprehensive metric to evaluate the selected decision $a_t$ under environment state $s_t$.
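A minimal sketch of the reward, assuming it takes the difference form target power minus consumed power minus a penalty, and that the penalty is a constant applied only when the average-rate constraint (2c) is violated. Both the functional form and the constant penalty are assumptions made for illustration.

```python
def reward(p_total, p_target, avg_rate, r_min, penalty):
    """Reward in the spirit of eq. (5), assuming r_t = P_target - P_m - phi.

    phi equals `penalty` only when the average-rate constraint (2c) is violated.
    The difference form and the constant penalty value are assumptions.
    """
    phi = penalty if avg_rate < r_min else 0.0
    return p_target - p_total - phi
```

A penalty that is large relative to the possible power savings makes any constraint-violating decision score below every feasible one, which is what lets the reward alone separate recommended from inadvisable examples.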
Fig. 1 shows the overall design of the proposed in-context learning algorithm for transmission power control. Specifically, the above task description $\mathcal{D}_{\text{task}}$, current environment state $s_t$, and selected examples $\mathcal{E}_t$ are integrated into the input prompt as defined in equation (3), and the LLM then generates a power control decision $a_t$ based on $s_t$ and the experiences in $\mathcal{E}_t$. Next, the decision $a_t$ is implemented, the achieved data rate is collected, and the reward $r_t$ is calculated as in equation (5). The tuple $e_t=\{s_t,a_t,r_t\}$ becomes a new example in the accumulated experience pool $\mathcal{P}$ in Fig. 1. After that, based on the next environment state $s_{t+1}$, a new example set $\mathcal{E}_{t+1}$ is selected, and the selected examples are inserted into the task description together with $s_{t+1}$, forming a new prompt for the LLM to generate $a_{t+1}$.
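The closed loop described above can be sketched as follows. The four callables (`llm`, `env_step`, `sample_state`, `select_examples`) are hypothetical placeholders standing in for the LLM query, the network simulator, the state generator, and the example selection methods; this is a structural sketch, not the authors' implementation.

```python
import random


def run_in_context_loop(llm, env_step, sample_state, select_examples, steps, seed=0):
    """Closed loop: prompt -> decision -> reward -> experience pool.

    llm(task, examples, state) -> action      stands in for the LLM call, eq. (3)
    env_step(state, action) -> reward          applies a_t and returns r_t, eq. (5)
    sample_state(rng) -> state                 draws the next environment state
    select_examples(pool, state) -> examples   example selection (Sections III-C / III-D)
    All four callables are hypothetical placeholders.
    """
    rng = random.Random(seed)
    task = "decision-making task for base station power control"
    pool = []                      # accumulated experience pool
    state = sample_state(rng)      # s_t
    for _ in range(steps):
        examples = select_examples(pool, state)   # E_t
        action = llm(task, examples, state)       # a_t
        r = env_step(state, action)               # r_t
        pool.append((state, action, r))           # e_t = {s_t, a_t, r_t}
        state = sample_state(rng)                 # s_{t+1}
    return pool
```

Passing the components in as functions keeps the loop independent of any particular LLM API, so the same skeleton can host a local model or a hosted one.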
III-C State-based Example Selection for Discrete State Problems
Selecting appropriate examples is critical for in-context learning, since the LLM learns from existing demonstrations to handle the target task. For problems with discrete environment states, relevant demonstrations can be easily identified by finding existing examples with the same state in the accumulated experience pool $\mathcal{P}$. Considering a target task with environment state value $s_t$, the set of relevant examples $\mathcal{E}^{\text{rel}}_t$ can be identified by
$$\mathcal{E}^{\text{rel}}_t=\{e_i\in\mathcal{P}\mid s_i=s_t\}, \tag{6}$$
where $\mathcal{P}$ is the accumulated experience pool in Fig. 1. Given $\mathcal{E}^{\text{rel}}_t$, we can easily select recommended examples with high reward, i.e., the top-K examples, and inadvisable examples, e.g., examples with lower reward or those violating the minimum data rate constraint.
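A minimal sketch of state-based selection for discrete states. The pool layout (a tuple with a boolean flag marking whether the minimum data rate was met) and the rule of listing constraint violations before low-reward examples are assumptions chosen for illustration.

```python
def select_examples_discrete(pool, state, k):
    """State-based example selection for discrete states, following eq. (6).

    pool: list of (state, action, reward, rate_ok) tuples, where rate_ok marks
    whether the minimum data rate constraint was met (hypothetical layout).
    Returns (recommended, inadvisable), each with at most k examples.
    """
    # Relevant examples: same state as the current task
    relevant = [e for e in pool if e[0] == state]
    # Recommended: top-k constraint-satisfying examples by reward
    feasible = sorted((e for e in relevant if e[3]), key=lambda e: e[2], reverse=True)
    recommended = feasible[:k]
    # Inadvisable: constraint violations first, then the lowest-reward feasible ones
    violating = sorted((e for e in relevant if not e[3]), key=lambda e: e[2])
    low_reward = feasible[k:][::-1]
    inadvisable = (violating + low_reward)[:k]
    return recommended, inadvisable
```

Because the filter is an exact match on the state, this approach only works when states repeat, which is the case for a small set of discrete user numbers.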
In addition, we adopt the well-known ε-greedy policy to balance exploration and exploitation:
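$$a_t=\begin{cases}\text{a random power level}, & \text{if } \rho<\epsilon,\\ \mathrm{LLM}\left(\mathcal{D}_{\text{task}},\mathcal{E}_t,s_t\right), & \text{otherwise},\end{cases} \tag{7}$$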
where $\epsilon$ is a predefined value, and $\rho$ is a random number between 0 and 1. The random exploration in equation (7) continually generates new examples, so the LLM can learn from better relevant examples and improve its performance.
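The ε-greedy rule can be sketched in a few lines. Here the LLM decision is passed as a zero-argument callable (a hypothetical design choice) so that the LLM is queried only when the policy exploits, not when it explores.

```python
import random


def epsilon_greedy(llm_decision, power_levels, epsilon, rng):
    """Epsilon-greedy action choice in the spirit of eq. (7).

    llm_decision: zero-argument callable returning the LLM's power level; it is
    only invoked when exploiting, so no LLM query is spent on exploration.
    """
    rho = rng.random()  # uniform random number in [0, 1)
    if rho < epsilon:
        return rng.choice(power_levels)  # explore: random power level
    return llm_decision()  # exploit: follow the LLM decision
```

For example, `epsilon_greedy(lambda: 3, [1, 2, 3, 4], 0.1, random.Random(0))` follows the LLM roughly 90% of the time over repeated calls.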
III-D Ranking-based Example Selection for Continuous State Problems
Compared with discrete-state problems, environments with continuous states can be much more complicated. For instance, when the average user-BS distance is used as the environment state for BS transmission power control, given a target task with state $s_t$, it is unlikely to find an existing example $e_i$ with $s_i=s_t$, since $s_t$ is a random value within the BS maximum coverage distance. This problem may be addressed by discretizing the continuous states into discrete values, but this may still lead to a large number of states or introduce discretization errors. To this end, we define a new metric for example selection with continuous states:
$$U_i=r_i-\lambda\left\|s_i-s_t\right\|, \tag{8}$$
where $U_i$ is a comprehensive metric that evaluates the usefulness of example $e_i$ for the decision-making at state $s_t$, and $\|s_i-s_t\|$ is the norm defining the distance between $s_i$ and $s_t$. Equation (8) jointly considers the reward and state of example $e_i$, and $\lambda$ is a weighting factor balancing the importance of a higher reward against more similar states. Specifically, a higher reward $r_i$ indicates that $e_i$ includes a good action under environment state $s_i$, while a lower $\|s_i-s_t\|$ means that the environment state in $e_i$ is more similar to $s_t$. Therefore, we use $U_i$ as a comprehensive metric, and the recommended and inadvisable examples can then be selected with top-K methods, as in Section III-C.
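A minimal sketch of ranking-based selection, assuming the score is the reward minus a weighted state distance and that the state is a scalar (e.g., the average user-BS distance), so the norm reduces to an absolute difference. The tuple layout and the function name are hypothetical.

```python
def select_examples_continuous(pool, state, k, lam):
    """Ranking-based example selection for continuous states.

    pool: list of (state, action, reward) tuples with a scalar state such as the
    average user-BS distance. Assumes the score U_i = r_i - lam * |s_i - s_t|,
    i.e., the norm reduces to an absolute difference for a scalar state.
    Returns (recommended, inadvisable): the k highest- and k lowest-scoring examples
    (the two lists can overlap when the pool holds fewer than 2k examples).
    """
    def score(e):
        return e[2] - lam * abs(e[0] - state)

    ranked = sorted(pool, key=score, reverse=True)
    recommended = ranked[:k]
    inadvisable = ranked[::-1][:k]  # lowest score first
    return recommended, inadvisable
```

Setting `lam = 0` ranks purely by reward regardless of state similarity, while a large `lam` approaches nearest-neighbor selection; intermediate values trade off the two, which is the role of the weighting factor in the text.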
III-E Baseline Algorithm
We consider deep reinforcement learning (DRL) as a baseline, since it is one of the most widely used ML algorithms for network optimization problems [12]. DRL usually produces satisfactory optimization results, but it requires careful hyperparameter tuning and dedicated model training. This work investigates two scenarios, and the Markov decision processes (MDPs) are defined as follows: 1) for discrete-state problems, the state is defined by the number of users associated with the BS; 2) for continuous-state problems, the state is the average user-BS distance, which changes continuously. The action is defined by the BS power level, and the reward is given by equation (5).
IV Performance Evaluation

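Fig. 2: Simulation results. (a) Reward and service quality comparison of various LLMs with discrete states; (b) reward and (c) power consumption comparison of various LLMs with continuous states; (d) reward, (e) power consumption, and (f) service quality comparison under different minimum data rate constraints.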
IV-A Simulation Settings
We consider three adjacent small base stations (SBSs); the number of users of each SBS randomly changes from 5 to 15, and the coverage of each SBS is 20 meters. The channel gain follows 3GPP urban network models. Two cases are evaluated, representing two kinds of network optimization problems:
Case I: discrete states defined by the user number of each SBS;
Case II: continuous states defined by the average user-SBS distance.
The simulations then compare two main approaches:
1) LLM-based methods with three models: Llama3-8b-instruct, Llama3-70b-instruct, and GPT-3.5 turbo. Llama3-8b is a small-scale LLM, while Llama3-70b and GPT-3.5 turbo are large models; using LLMs of various sizes allows a more thorough evaluation of the proposed algorithm.
2) The DRL-based method introduced in Section III-E. Since it relies on dedicated model training, we consider DRL as the optimal baseline, as its capability has been demonstrated in many existing studies [11, 12].
IV-B Simulation Results
Fig. 2 shows the simulation results and comparisons; the metrics include average reward, power consumption, and service quality (the probability of satisfying the minimum average data rate constraint (2c)).
First, Fig. 2(a) presents the system reward and service quality of different LLMs in the discrete state space. Both Llama3 LLMs achieve reward and service quality comparable to the DRL baseline, while GPT-3.5 shows lower reward and service quality. Fig. 2(a) demonstrates that the proposed in-context learning algorithm with state-based example selection provides satisfactory performance for problems with a limited number of environment states.
Then, we consider more complicated scenarios with continuous states defined by the average user-BS distance. Fig. 2(b) and 2(c) present the reward and average power consumption, respectively. All LLM models achieve higher rewards and lower power consumption as the number of episodes increases and finally converge to stable values. The results demonstrate that LLMs can learn from previous examples and explorations and then improve the performance on target tasks.
In addition, we observe the algorithm performance under different minimum data rate constraints. Fig. 2(d), 2(e), and 2(f) present the average reward, power consumption, and service quality, respectively. Here, every value in the results is obtained by taking the average performance of converged episodes of corresponding LLMs as in Fig. 2(b) and 2(c). As expected, the simulation results show that increasing the minimum data rate constraint leads to lower reward, lower service quality, and higher power consumption. Therefore, Fig. 2(d), 2(e), and 2(f) demonstrate that the proposed algorithms can adapt to different optimization settings and then adjust their policies to improve the performance on target tasks.
Note that the algorithm performance also depends on the specific LLM. Llama3 represents state-of-the-art LLM designs, while GPT-3.5 is an earlier model. Therefore, it is reasonable that Llama3-8b and Llama3-70b maintain performance comparable to the DRL baseline, while GPT-3.5 turbo performs worse in different tasks. For instance, when the minimum data rate constraint is 2 Mbps per user, GPT-3.5 has a 25% lower reward and 20% lower service quality than the other LLMs. In summary, the simulations in Fig. 2 demonstrate that the proposed in-context learning algorithm achieves performance comparable to conventional DRL algorithms without dedicated model training or fine-tuning.
V Conclusion
LLMs are a promising technique for future wireless networks, and this work proposes an LLM-enabled in-context learning algorithm for BS transmission power control. The proposed algorithm handles both discrete- and continuous-state problems, and the simulations show that it achieves performance comparable to conventional DRL algorithms. This work demonstrates the great potential of in-context learning for network management and optimization problems.
References
- [1] Z. Zhang, Y. Xiao, Z. Ma, M. Xiao, Z. Ding, X. Lei, G. K. Karagiannidis, and P. Fan, “6G wireless networks: Vision, requirements, architecture, and key technologies,” IEEE Vehicular Technology Magazine, vol. 14, no. 3, pp. 28–41, 2019.
- [2] H. Zhou, M. Erol-Kantarci, Y. Liu, and H. V. Poor, “A survey on model-based, heuristic, and machine learning optimization approaches in RIS-aided wireless networks,” IEEE Communications Surveys & Tutorials, 2023.
- [3] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong et al., “A survey of large language models,” arXiv preprint arXiv:2303.18223, 2023.
- [4] H. Zhou, C. Hu, Y. Yuan, Y. Cui, Y. Jin, C. Chen, H. Wu, D. Yuan, L. Jiang, D. Wu et al., “Large language model (LLM) for telecommunications: A comprehensive survey on principles, key techniques, and opportunities,” arXiv preprint arXiv:2405.10825, 2024.
- [5] Q. Dong, L. Li, D. Dai, C. Zheng, Z. Wu et al., “A survey on in-context learning,” arXiv preprint arXiv:2301.00234, 2022.
- [6] Z. Lin, G. Qu, Q. Chen, X. Chen, Z. Chen, and K. Huang, “Pushing large language models to the 6G edge: Vision, challenges, and opportunities,” arXiv preprint arXiv:2309.16739, 2023.
- [7] M. Fu, P. Wang, M. Liu, Z. Zhang, and X. Zhou, “IoV-BERT-IDS: Hybrid network intrusion detection system in IoV using large language models,” IEEE Transactions on Vehicular Technology, 2024.
- [8] Q. Liu, J. Mu, D. ChenZhang, Y. Liu, and T. Hong, “LLM enhanced reconfigurable intelligent surface for energy-efficient and reliable 6G IoV,” IEEE Transactions on Vehicular Technology, 2024.
- [9] F. H. C. Neto, D. C. Araújo, M. P. Mota, T. F. Maciel, and A. L. de Almeida, “Uplink power control framework based on reinforcement learning for 5G networks,” IEEE Transactions on Vehicular Technology, vol. 70, no. 6, pp. 5734–5748, 2021.
- [10] M. Chiang, P. Hande, T. Lan, C. W. Tan et al., “Power control in wireless cellular networks,” Foundations and Trends® in Networking, vol. 2, no. 4, pp. 381–533, 2008.
- [11] H. Zhou, M. Erol-Kantarci, and H. V. Poor, “Learning from peers: Deep transfer reinforcement learning for joint radio and cache resource allocation in 5G RAN slicing,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 4, pp. 1925–1941, 2022.
- [12] L. Zhang and Y.-C. Liang, “Deep reinforcement learning for multi-agent power control in heterogeneous networks,” IEEE Transactions on Wireless Communications, vol. 20, no. 4, pp. 2551–2564, 2020.