Progress
The rapid growth of wireless data traffic and the increasing demand for faster, more reliable
connectivity will strain existing mobile networks. As 5G networks roll out and 6G technologies
emerge, innovative solutions will be needed. The integration of mmWave massive MIMO with
NOMA will offer improved bandwidth and system capacity, enabling more users and higher-
quality communication. However, challenges in resource management and power allocation will
arise. Future research will explore deep reinforcement learning (DRL) techniques, such as deep
Q-network (DQN) and deep deterministic policy gradient (DDPG), for optimizing resource
allocation in cell-free massive MIMO and mmWave-NOMA systems. A two-level scheduling
framework will be proposed to enhance performance, spectral efficiency, and scalability,
addressing the complex needs of future wireless applications.
1. INTRODUCTION
The rapid growth of wireless data traffic, driven by the increasing number of connected users and
the rising demand for faster, more reliable connectivity, is placing significant strain on existing
mobile communication networks. As the rollout of fifth-generation (5G) networks begins to take
shape, anticipation for sixth-generation (6G) technologies is growing, highlighting the need for
innovative solutions to support the increasing demands of modern wireless communication. With
the proliferation of data-driven services, such as Internet of Things (IoT) devices, augmented
reality (AR), virtual reality (VR), and high-definition video streaming, the need for robust
network infrastructure that can deliver high-speed, low-latency communication is more critical
than ever. To meet these growing demands, new technologies must be adopted that can enhance
the capacity and efficiency of wireless networks.
However, the integration of mmWave MIMO and NOMA presents a key challenge—efficient
resource management. The need for effective power allocation, optimal user grouping, and
interference management is critical to ensuring these technologies can function efficiently,
especially as user density and traffic demand continue to rise. One particular challenge in this
evolving landscape is the efficient allocation of resources across various network slices, a critical
feature in 5G and beyond. Network slicing allows a single physical infrastructure to support
multiple virtual networks, each tailored to meet the specific requirements of different services.
For instance, ultra-reliable low-latency communications (uRLLC) are required for applications
like autonomous vehicles, while enhanced mobile broadband (eMBB) supports high-speed
internet access for users. Efficiently managing these diverse needs within a single network
infrastructure is a complex but necessary task for achieving optimal performance.
To address these challenges, this study explores the use of advanced deep reinforcement learning
(DRL) techniques for optimizing resource allocation in cell-free massive MIMO systems and
mmWave massive MIMO-NOMA systems. DRL, a subset of machine learning, enables systems
to learn optimal decision-making strategies through interaction with the environment and
feedback from past actions. In particular, this research focuses on two DRL methods—deep Q-
network (DQN) and deep deterministic policy gradient (DDPG)—to optimize power allocation
in dynamic wireless networks.
Cell-free massive MIMO systems, composed of distributed access points (APs) that serve
multiple users (UEs) simultaneously, are gaining attention for their potential to improve network
coverage and capacity, especially in high-density urban environments. These systems have the
advantage of using a large number of distributed antennas to create a seamless communication
environment for users, reducing issues like signal interference and coverage gaps. However,
optimizing power allocation among APs remains a significant challenge, particularly when faced
with imperfect channel state information (CSI) and pilot contamination, which can degrade
system performance.
Traditional power allocation methods, such as the weighted minimum mean square error
(WMMSE) algorithm, while effective, are computationally expensive and may not be suitable
for real-time applications where rapid adaptation to changing network conditions is required.
This study addresses these limitations by incorporating DRL techniques, which enable adaptive
learning of optimal power allocation strategies through continuous interaction with the network
environment. The simulation results demonstrate that both DQN and DDPG methods
significantly outperform traditional power allocation techniques, with the DDPG method in
particular showing faster convergence and lower computational complexity. This makes DDPG
more suited for real-time adaptation in dynamic wireless networks, where conditions can change
rapidly.
Furthermore, this work proposes a two-level scheduling framework for managing resource
allocation in 5G networks. In this framework, the upper level handles the allocation of resources
across various network slices, while the lower level focuses on scheduling resources within each
slice. This hierarchical approach ensures that each application, whether it requires high
throughput or ultra-low latency, receives the appropriate amount of resources to meet its specific
requirements. A key focus of this approach is to optimize short packet transmission, which is
crucial for reducing latency in delay-sensitive applications, such as real-time gaming or
emergency communications.
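To make the division of responsibility concrete, the following minimal Python sketch separates the two levels: an upper-level function that splits resource blocks across slices and a lower-level function that schedules short packets within a slice. The proportional split, the earliest-deadline-first rule, and all identifiers here are illustrative assumptions, not the scheduling policies proposed in this work.

# Illustrative two-level scheduling sketch (assumed policies, for clarity only).

def allocate_slice_resources(total_rbs, slice_demands):
    """Upper level: split physical resource blocks (RBs) across network slices
    in proportion to their declared demand (assumed policy)."""
    total_demand = sum(slice_demands.values())
    return {s: int(total_rbs * d / total_demand) for s, d in slice_demands.items()}

def schedule_within_slice(rbs, packets):
    """Lower level: serve the most delay-critical short packets first
    (earliest-deadline-first, assumed for illustration)."""
    served = []
    for pkt in sorted(packets, key=lambda p: p["deadline_ms"]):
        if pkt["rbs_needed"] > rbs:
            break
        rbs -= pkt["rbs_needed"]
        served.append(pkt["id"])
    return served

# Example: an eMBB slice with bulk traffic and a uRLLC slice with short packets.
slice_rbs = allocate_slice_resources(total_rbs=100,
                                     slice_demands={"eMBB": 70.0, "uRLLC": 30.0})
urllc_packets = [{"id": 1, "deadline_ms": 1, "rbs_needed": 2},
                 {"id": 2, "deadline_ms": 5, "rbs_needed": 2}]
print(slice_rbs, schedule_within_slice(slice_rbs["uRLLC"], urllc_packets))

In a DRL-based realization, the fixed rules at either level can be replaced by a learned policy while the hierarchical structure stays the same.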
The findings from this study show that the DRL-based approach not only improves resource
allocation efficiency but also significantly reduces computational complexity compared to
traditional methods. By optimizing user grouping, subchannel allocation, and power distribution,
the proposed framework enhances system performance, maximizing capacity and improving
spectral efficiency. Simulation results demonstrate that this approach outperforms conventional
methods in terms of system capacity, convergence speed, and overall efficiency. The ability to
handle a higher number of users while maintaining high service quality and low latency is
particularly important in the context of dense urban environments where mobile data traffic is at
its peak.
Despite these promising results, several challenges remain that warrant further investigation.
These include the development of more advanced interference management strategies, the
refinement of channel estimation and feedback mechanisms, and the exploration of energy-
efficient solutions. As wireless networks become more complex, energy efficiency will play an
increasingly important role in ensuring the sustainability of network infrastructure.
Table 1: Overview of Machine Learning Techniques in Wireless Networks
1.1 Motivation
1. Wireless data traffic is growing rapidly due to IoT, AR/VR, and high-definition video
streaming, placing significant strain on current networks.
2. Existing networks, including 5G, struggle to meet the escalating demand for faster and
more reliable connectivity.
3. Integrating mmWave massive MIMO with NOMA can significantly enhance bandwidth
and system capacity.
4. Resource management challenges, such as power allocation and interference
management, need innovative solutions.
5. Deep reinforcement learning (DRL) techniques, like DQN and DDPG, offer potential
for optimizing dynamic network systems.
6. Future wireless networks must be scalable, robust, and efficient to handle the demands
of diverse applications like uRLLC and eMBB.
7. Energy-efficient solutions are crucial for ensuring the sustainability of next-generation
network infrastructures.
Objectives
Base Station (BS): The BS is equipped with a large antenna array (Massive MIMO). It is
responsible for allocating resources to users in the network. In the dynamic system, the
BS acts as the RL agent.
Users: A set of mobile users are served by the BS. Each user has its own channel state,
interference conditions, and QoS (Quality of Service) requirements. The RL agent needs
to decide how to allocate resources (e.g., power and beamforming) to maximize overall
network performance.
Resource Variables:
o Beamforming vectors (W_k): The BS selects a beamforming vector for each user
to maximize the SINR (Signal-to-Interference-plus-Noise Ratio).
o Power Allocation (P_k): The BS allocates transmission power to each user,
ensuring that interference is minimized and SINR requirements are met.
o Scheduling/Time-frequency resources: The BS may also need to decide which
users should be scheduled to transmit or receive in specific time slots or
frequency bands.
State Representation
The state s(t) of the system at any time t should capture the key aspects of the current system
environment. The state could include:
Channel State Information (CSI): The channel gains between the BS and the users,
typically represented as the channel vectors h_k for each user k.
User Location: The relative position of users to the BS, which affects the channel
propagation.
Interference Levels: The interference caused by other users, either from the BS or from
other cells in the network.
Traffic Demand: The data rates required by users or their Quality of Service (QoS)
demands.
Power Consumption: The current power usage, which is relevant when trying to
minimize energy consumption.
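As an illustration of how such a state could be encoded for the agent, the sketch below stacks the listed quantities for K users into a single real-valued vector; the choice of features, their ordering, and the random channel model are assumptions made only for this example.

import numpy as np

def build_state(h, interference, traffic_demand, power_used):
    """Flatten complex CSI plus interference, demand, and power usage
    into one real-valued state vector s(t)."""
    csi_features = np.concatenate([np.abs(h).ravel(), np.angle(h).ravel()])
    return np.concatenate([csi_features, interference, traffic_demand, power_used])

# Example: K = 4 users, M = 64 BS antennas, Rayleigh-like random channels.
K, M = 4, 64
h = (np.random.randn(K, M) + 1j * np.random.randn(K, M)) / np.sqrt(2)
state = build_state(h,
                    interference=np.random.rand(K),
                    traffic_demand=np.random.rand(K),
                    power_used=np.random.rand(K))
print(state.shape)  # (2*K*M + 3*K,)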
Action Space:
The action a(t) is the decision the RL agent makes at time t based on the current state. Possible
actions include:
Beamforming Vector Allocation: For each user k, the BS selects a beamforming vector
W_k that maximizes the SINR. This is typically a high-dimensional decision since there are
many antennas at the BS.
Power Allocation: The BS assigns transmission power P_k to each user k.
Scheduling: The BS decides which users to serve in a given time slot or sub-carrier.
Reward Function:
The reward r(t) is the feedback received by the RL agent after taking an action a(t) at state s(t).
The reward function is designed to reflect the performance of the system, such as throughput,
fairness, and energy efficiency. Common choices for the reward are:
Throughput Maximization: The sum of the throughput across all users can be used as
the reward. For user k, the throughput is a function of the SINR, R_k = \log_2(1 + \mathrm{SINR}_k).
Energy Efficiency: Another reward can be based on energy efficiency, which considers
both throughput and power consumption. The reward could be formulated as:
r(t) = \sum_{k=1}^{K} \log_2(1 + \mathrm{SINR}_k) - \lambda \sum_{k=1}^{K} P_k(t)
Fairness: To ensure fairness among users, the reward could incorporate fairness metrics,
such as the Jain’s fairness index.
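A reward of this form can be computed directly from the per-user SINR and transmit power, as in the sketch below, which combines the log-throughput sum, a power penalty weighted by λ, and Jain's fairness index; the particular weights and example values are illustrative assumptions.

import numpy as np

def reward(sinr, power, lam=0.1, fairness_weight=0.0):
    """r(t) = sum_k log2(1 + SINR_k) - lam * sum_k P_k(t),
    optionally augmented with Jain's fairness index over per-user rates."""
    rates = np.log2(1.0 + sinr)                                   # per-user throughput
    jain = rates.sum() ** 2 / (len(rates) * np.sum(rates ** 2))   # fairness in (0, 1]
    return rates.sum() - lam * power.sum() + fairness_weight * jain

sinr = np.array([10.0, 3.0, 1.0, 0.5])   # linear (not dB) SINR per user
power = np.array([0.5, 0.3, 0.2, 0.1])   # transmit power per user
print(reward(sinr, power, lam=0.1, fairness_weight=1.0))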
Policy:
The policy π(s) defines the action taken by the RL agent for each state s. The RL agent's
objective is to learn the optimal policy π*(s) that maximizes the expected cumulative reward over
time.
5. LEARNING PROCESS
To learn the optimal policy, the RL agent goes through an exploration phase where it tries
various actions to understand the environment and a subsequent exploitation phase where it uses
the learned policy to make decisions that maximize the expected reward.
In Q-learning, for example, the Q-function is updated after each transition according to
Q(s, a) \leftarrow Q(s, a) + \alpha \left( r(t) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)
Where,
α is the learning rate, γ is the discount factor, and r(t) is the reward received after taking
action a in state s.
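For a small, discretized state and action space this update reduces to a table update, as sketched below; the discretization and the values of α and γ are assumed only for illustration.

import numpy as np

n_states, n_actions = 32, 8     # assumed discretization of states and actions
alpha, gamma = 0.1, 0.95        # learning rate and discount factor (assumed)
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=3, a=2, r=1.5, s_next=7)
print(Q[3, 2])   # 0.15 after a single update from zero initialization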
Deep Q-Networks (DQN): For large state and action spaces (which is typical in Massive
MIMO), the Q-function is approximated using a deep neural network. DQN allows the
agent to scale to environments with large dimensionality (such as Massive MIMO
systems).
Policy Gradient methods (e.g., Proximal Policy Optimization (PPO)) are another
approach where the agent directly learns the policy π(a∣s) instead of the Q-function. This
is useful in high-dimensional action spaces, like the beamforming vector in Massive
MIMO.
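A minimal DQN-style approximation of the Q-function is sketched below in PyTorch, assuming a discretized set of beam/power actions; the network width, optimizer settings, and the randomly generated mini-batch are illustrative choices, not the architecture evaluated in this work.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps the state vector s(t) to one Q-value per discrete action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, s):
        return self.net(s)

state_dim, n_actions, gamma = 64, 8, 0.95    # assumed sizes
q_net = QNetwork(state_dim, n_actions)
target_net = QNetwork(state_dim, n_actions)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_step(s, a, r, s_next):
    """One gradient step on the temporal-difference error for a mini-batch."""
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example mini-batch of 16 transitions (random placeholders).
s = torch.randn(16, state_dim); s_next = torch.randn(16, state_dim)
a = torch.randint(0, n_actions, (16,)); r = torch.randn(16)
print(dqn_step(s, a, r, s_next))

A DDPG agent would instead use an actor network that outputs continuous power values and a critic that scores state-action pairs, which avoids discretizing the action space and is one reason DDPG is attractive for fine-grained power control.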
6. RESOURCE ALLOCATION DECISIONS
Once the RL agent is trained, it dynamically allocates resources in the following way:
1. Beamforming: The agent selects beamforming vectors W_k for each user based on the
current state, optimizing the SINR for all users.
2. Power Control: The agent allocates power P_k to each user to ensure fairness and
minimize interference.
3. Scheduling: The agent may also decide which users should be scheduled in each time
slot based on their QoS requirements and current channel conditions.
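At run time this amounts to querying the trained network once per scheduling interval and mapping the selected action index back to a beam and a power level, as in the sketch below; the discrete action encoding and the random Q-values standing in for the trained network's output are assumptions made for illustration.

import numpy as np

# Assumed discrete action set: each action pairs a beam codebook index with a power level.
BEAMS = list(range(8))
POWER_LEVELS = [0.1, 0.2, 0.4, 0.8]
ACTIONS = [(b, p) for b in BEAMS for p in POWER_LEVELS]

def greedy_action(q_values):
    """Pick the action with the largest learned Q-value (pure exploitation)."""
    return ACTIONS[int(np.argmax(q_values))]

def allocate(q_values_per_user):
    """One scheduling interval: map each user's Q-values to a beam index and a power."""
    return {k: greedy_action(q) for k, q in q_values_per_user.items()}

# Random Q-values stand in for the trained network's output in this example.
q_values = {k: np.random.randn(len(ACTIONS)) for k in range(4)}
print(allocate(q_values))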
7. IMPLEMENTATION CHALLENGES
Real-Time Training: In a dynamic network, the RL agent must continually update its
policy as the environment changes. Training in real-time can be computationally
intensive.
Exploration-Exploitation Trade-off: Balancing exploration and exploitation is crucial,
as too much exploration can cause inefficiency, while excessive exploitation may result
in suboptimal performance.
Scalability: With a large number of users and antennas, the action and state spaces grow
exponentially, requiring advanced techniques like DQN or PPO for effective learning.
In the context of RL applied to Massive MIMO, the goal is to design an agent that can
dynamically allocate resources (such as power or beamforming) to users in the network, to
maximize a cumulative reward function that reflects the network performance. We can describe
this problem using the RL framework, where:
State (s): The state at time t could represent the current conditions of the system, such as
channel state information (CSI), user locations, interference levels, etc. Denoted as s(t).
Action (a): The action represents the resource allocation decisions the agent can take,
such as power control, beamforming vector adjustment, or scheduling of users. Denoted
as a(t).
Reward (r): The reward represents the feedback the system gives to the agent based on
the chosen action and current state. This reward is typically related to the system's
performance (e.g., throughput, energy efficiency, or SINR). Denoted as r(t).
Policy (π): The policy is the strategy the agent uses to choose actions based on the state.
This can be deterministic or probabilistic.
Value Function (V): The value function V(s) estimates the long-term expected reward
the agent can achieve from a given state s.
Q-function (Q): The Q-function Q(s,a) is the expected cumulative reward of taking
action a from state s.
Formally, the problem can be modeled as a Markov decision process (MDP) M = (S, A, P, R, γ),
where P denotes the state-transition probabilities and γ ∈ [0, 1) is the discount factor.
The objective is to maximize the cumulative reward over time, which can be represented as the
expected return J(π) from following a policy π:
J(\pi) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r(t) \right]
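For a finite observed trajectory, the discounted sum inside this expectation can be evaluated directly, as in the short sketch below (the example reward sequence is arbitrary).

def discounted_return(rewards, gamma=0.95):
    """Sum_t gamma^t * r(t) for one observed reward trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0], gamma=0.95))   # 1 + 0.95 + 0.9025 = 2.8525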
In Massive MIMO systems, the main objective is often to maximize the throughput or SINR
(Signal-to-Interference-plus-Noise Ratio) of the users, while minimizing interference and power
consumption. This can be done by applying RL to the problem of beamforming and power
allocation. Let’s break this down:
Beamforming:
In Massive MIMO systems, the base station with multiple antennas needs to allocate
beamforming vectors to users to maximize SINR or throughput. For user k, the SINR can be
expressed as:
\mathrm{SINR}_k = \frac{|h_k^H W_k|^2}{\sum_{i \neq k} |h_k^H W_i|^2 + \sigma_k^2}
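This quantity is straightforward to evaluate numerically from the channel and beamforming matrices; the NumPy sketch below does so for randomly drawn channels and simple maximum-ratio beams, both of which are assumptions made only for illustration.

import numpy as np

def sinr(H, W, noise_var):
    """SINR_k = |h_k^H W_k|^2 / (sum_{i != k} |h_k^H W_i|^2 + sigma_k^2).
    H: (K, M) matrix whose k-th row is the channel vector h_k,
    W: (M, K) matrix whose k-th column is the beamforming vector W_k."""
    G = np.abs(H.conj() @ W) ** 2           # G[k, i] = |h_k^H W_i|^2
    signal = np.diag(G)                     # desired-signal power per user
    interference = G.sum(axis=1) - signal   # inter-user interference per user
    return signal / (interference + noise_var)

K, M = 4, 64
H = (np.random.randn(K, M) + 1j * np.random.randn(K, M)) / np.sqrt(2)
W = H.T / np.linalg.norm(H, axis=1)         # column k is h_k / ||h_k|| (maximum ratio)
print(sinr(H, W, noise_var=1e-2))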
Power Control:
For power allocation, the RL agent must allocate power P_k to user k in such a way that
interference is minimized while maintaining an acceptable SINR. The power control strategy can
be modeled as:
P(t) = \{ P_1(t), P_2(t), \ldots, P_K(t) \}
The reward function can be adjusted to consider both throughput and power efficiency, for
instance:
r(t) = \sum_{k=1}^{K} \log_2(1 + \mathrm{SINR}_k) - \lambda \sum_{k=1}^{K} P_k(t)
where λ is a weighting factor that balances the sum throughput against the total transmit power consumption.
To learn the optimal policy π*, RL algorithms such as Q-learning, Deep Q-Networks (DQN), or
Proximal Policy Optimization (PPO) can be applied. The value-based methods among these estimate
the Q-function Q(s,a), which gives the expected cumulative reward for taking action a in state s, and
the agent updates the Q-function iteratively by observing the rewards and state transitions over time,
while policy-gradient methods such as PPO learn the policy directly.
For example, in Q-learning, the Q-function is updated using the Bellman equation:
Q(s, a) \leftarrow Q(s, a) + \alpha \left( r(t) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)
where α is the learning rate and γ is the discount factor, as defined above.
The optimization of resource allocation in cell-free massive MIMO and mmWave massive
MIMO-NOMA systems was successfully achieved by utilizing advanced deep reinforcement
learning (DRL) techniques. The framework developed was demonstrated to provide robust and
adaptive solutions to address the increasing demands posed by 5G networks and the transition
towards 6G. By integrating mmWave massive MIMO with NOMA, significant improvements in
bandwidth utilization and system capacity were achieved, highlighting the potential of these
technologies to support high user densities and diverse application requirements.
The DRL-based methods, including deep Q-network (DQN) and deep deterministic policy
gradient (DDPG), were validated to outperform traditional resource allocation techniques in
various performance metrics, including computational complexity, adaptability to dynamic
network conditions, and system efficiency. The hierarchical two-level scheduling framework
proposed was shown to enhance resource allocation efficiency by ensuring that the distinct needs
of each application—ranging from ultra-reliable low-latency communication (uRLLC) to
enhanced mobile broadband (eMBB)—were met effectively. Simulation results demonstrated
substantial gains in system capacity, spectral efficiency, and convergence speed, even in the most
challenging scenarios such as dense urban environments characterized by high user mobility and
interference.
Despite these advancements, it was acknowledged that significant challenges remain. The
integration of mmWave massive MIMO and NOMA into practical deployments is constrained by
issues such as interference management, the need for precise channel state information (CSI),
and energy consumption concerns. Addressing these limitations is critical for the development of
scalable, reliable, and sustainable next-generation networks.
Future Work
Future research will be directed towards addressing the unresolved challenges that hinder the full
potential of the proposed methodologies. Interference management will be a primary focus, with
advanced techniques being explored to mitigate both inter-cell and intra-cell interference. Novel
approaches for channel estimation and feedback mechanisms will also be investigated to improve
the accuracy and reliability of CSI in dynamic environments. Additionally, the trade-off between
computational complexity and real-time adaptability in DRL-based frameworks will be further
optimized to ensure compatibility with large-scale, real-world deployments.
Moreover, the integration of the proposed DRL-based frameworks with emerging technologies,
including reconfigurable intelligent surfaces (RIS) and terahertz communication, will be
explored. RIS can enhance system capacity and spectral efficiency by dynamically shaping the
wireless propagation environment, while terahertz communication offers untapped spectrum
resources for ultra-high-speed data transmission. Additionally, the scalability of the framework
will be tested in more complex scenarios, such as those involving heterogeneous networks,
multi-hop communication, and diverse quality of service (QoS) requirements.
The potential application of multi-agent DRL will also be explored to enable decentralized
resource management, which is expected to enhance scalability and real-time decision-making in
distributed systems. Collaboration with artificial intelligence (AI)-driven techniques, such as
federated learning, will be investigated to further enhance the adaptability and privacy of
resource allocation mechanisms.
Ultimately, the insights gained from these advancements are expected to pave the way for the
development of future-proof wireless networks that are capable of addressing the rapidly
growing data traffic and evolving user requirements. These networks will be characterized by
high reliability, energy efficiency, and the flexibility to support a broad range of applications,
from autonomous vehicles and industrial IoT to immersive extended reality experiences.