Index Terms - A*, DQN, Machine learning, MATLAB, PPO, Quadcopter, RRT*, SAC.
I. INTRODUCTION
A quadcopter is a kind of UAV (Unmanned Aerial Vehicle) with four rotors. This type of drone is well suited to a wide range of applications because it offers a good balance of control and maneuverability. The main components of a quadcopter are the motors (usually BLDC, i.e., brushless DC), the frame, the flight controller, ESCs (Electronic Speed Controllers), propellers, a battery, a transmitter and receiver, a GPS module, and sensors. Quadcopters are small rotorcraft that can be flown remotely by a human operator or operated autonomously with minimal human intervention. Such automation increases the efficiency, performance, and operating time of the quadcopter. Quadcopters are versatile platforms whose applications include surveillance, photography, agriculture, search and rescue, firefighting, package delivery, and drone racing.
A critical element of efficient quadcopter operation is path planning: computing the best route for the drone to fly so that it avoids any obstructions along the way and lands safely. Path planning matters even more in dynamic or cluttered environments, where the speed of adjustment is critical to avoiding collisions. Solutions to the path-planning problem have evolved in parallel with technological advancement. Classical algorithms such as A* and Rapidly-exploring Random Tree Star (RRT*) have proven effective in 2D and 3D spaces; they aim to compute the shortest, or near-shortest, path between two points while avoiding obstacles. Their disadvantage is that they can take a long time to compute and do not cope well with more dynamic setups, which limits their applicability to quadcopters in real-time scenarios.
Fig.1 Quadcopter
More advanced approaches draw on machine learning and artificial intelligence, using sophisticated algorithms such as Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Deep Q-Networks (DQN). These reinforcement learning methods let the quadcopter learn effective navigation strategies through trial and error. Moreover, these algorithms
perform well in changing environments where conditions are constantly shifting. For example, SAC balances exploration and exploitation effectively in uncertain scenarios, while PPO favors stability and simplicity of implementation. DQN is more straightforward and is therefore well suited to discrete action spaces, though it can also be applied in more structured domains. With software such as MATLAB, these modern path-planning algorithms can be implemented and simulated, and their performance analyzed under various conditions, yielding valuable insights into their strengths and limitations. Altogether, the current state of the field combines refinements of traditional methods with AI-empowered ones, improving real-time decision making, computational efficiency, and adaptability. With the ongoing adoption of quadcopters in commercial and industrial sectors, the demand for more sophisticated, reliable, and efficient path-planning algorithms is increasingly critical. These developments further improve the performance of quadcopters while expanding their range of potential applications toward the frontiers of UAV technology. The future lies in further refinement of these algorithms to unlock new applications in increasingly complex environments, such as urban air mobility and autonomous delivery systems. As the technology continues to advance, the role that quadcopters with intelligent path-planning capabilities will play in shaping the future of autonomous aerial systems keeps expanding.
IV. A* ALGORITHM
A* is a heuristic, graph-based or grid-based path-finding algorithm predominantly used in robotics, and it works well in discrete environments. In a graph, each point is called a node and the connection between two nodes is called an edge. The algorithm finds a path from a start point to a goal point by expanding nodes and calculating the cost of each successive node. The total cost of the path through a given node n is

f(n) = g(n) + h(n)

where f(n) is the total estimated cost of the path through node n, g(n) is the actual cost of reaching node n from the start node, and h(n) is a heuristic estimate of the cost to reach the goal from node n, in most cases the straight-line distance. The value of f(n)
is calculated for each expanded node until the goal is reached [2]. To implement the algorithm in MATLAB, code is written for the following steps. The environment is initialized in 3D space by defining the x, y, and z coordinates. The maximum number of iterations, the search radius, and the step size are defined, and the start position, the goal position, and the rewiring radius used by RRT* around newly added nodes are also predefined. Obstacles are represented as rectangular blocks in this case. The tree is initialized with the start position as its first node. The algorithm maintains two sets: set (1) holds the nodes that have already been explored, and set (2) holds the nodes that are yet to be explored. While set (2) is not empty, the node with the lowest f(n) value is selected; if the goal has not been reached, that node is removed from set (2) and added to set (1). Each candidate neighbor is checked to ensure that it lies within the environment, does not collide with obstacles, and is not already present in set (2). The current node is plotted at every step, and once the goal is reached the optimal path is plotted by tracing back from the goal point to the start point. The output of the implemented MATLAB code is shown in Fig. 4.
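To make the loop structure concrete, the following sketch shows a minimal 2D grid version of the procedure just described (open set, closed set, lowest-f(n) selection, and backtracking). The 20x20 grid, the single wall obstacle, and the 8-connected neighborhood are illustrative assumptions; the implementation described above works in a 3D environment with rectangular obstacles.

% Minimal 2D grid A* sketch (illustrative; the implementation above is 3D).
% Grid cells: 0 = free, 1 = obstacle.
map = zeros(20, 20);
map(5:15, 10) = 1;                            % example wall obstacle
start = [2 2];   goal = [18 18];

[nr, nc] = size(map);
idx = @(p) sub2ind([nr nc], p(1), p(2));
h   = @(p) norm(p - goal);                    % Euclidean heuristic h(n)

g = inf(nr, nc);   g(idx(start)) = 0;         % cost from start, g(n)
f = inf(nr, nc);   f(idx(start)) = h(start);  % total estimated cost, f(n)
parent    = zeros(nr, nc);                    % predecessor index for backtracking
openSet   = false(nr, nc);  openSet(idx(start)) = true;  % set (2): to be explored
closedSet = false(nr, nc);                                % set (1): already explored
moves     = [1 0; -1 0; 0 1; 0 -1; 1 1; 1 -1; -1 1; -1 -1];

while any(openSet(:))
    fOpen = f;  fOpen(~openSet) = inf;
    [~, k] = min(fOpen(:));                   % open node with the lowest f(n)
    [r, c] = ind2sub([nr nc], k);
    if r == goal(1) && c == goal(2), break; end
    openSet(k) = false;  closedSet(k) = true; % move node from set (2) to set (1)
    for m = 1:size(moves, 1)
        n = [r c] + moves(m, :);
        if any(n < 1) || n(1) > nr || n(2) > nc, continue; end          % inside environment?
        if map(n(1), n(2)) == 1 || closedSet(n(1), n(2)), continue; end % obstacle or explored?
        gNew = g(r, c) + norm(moves(m, :));
        if gNew < g(n(1), n(2))
            g(n(1), n(2)) = gNew;
            f(n(1), n(2)) = gNew + h(n);      % f(n) = g(n) + h(n)
            parent(n(1), n(2)) = k;
            openSet(n(1), n(2)) = true;
        end
    end
end

% Trace the optimal path back from the goal to the start.
route = goal;  k = idx(goal);
while parent(k) ~= 0
    k = parent(k);
    [r, c] = ind2sub([nr nc], k);
    route = [r c; route]; %#ok<AGROW>
end
disp(route)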
SAC is an off-policy algorithm designed to maximize both expected reward and policy entropy, promoting exploration during training. It is particularly well suited to continuous action spaces, which makes it a natural fit for quadcopter navigation. The algorithm relies on two main ingredients: the soft Q-value function Qπ(s, a) and the entropy term αH(π(·|s)), which promotes exploration by encouraging randomness in action selection. The soft Q-function is trained by minimizing the soft Bellman residual:
$$J_Q = \mathbb{E}_{(s,a)\sim \mathcal{D}}\left[\tfrac{1}{2}\left(Q(s,a) - \left(r + \gamma(1-d)\,V(s')\right)\right)^{2}\right]$$
SAC balances exploration and exploitation through this entropy term, making it particularly adept at navigating environments with complex obstacle arrangements. The entropy bonus encourages the quadcopter to explore diverse trajectories while still optimizing its navigation strategy, which makes SAC a promising approach to path planning in environments densely filled with obstacles.
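For reference, the entropy-regularized objective that SAC maximizes can be written in its standard form from the reinforcement learning literature, where α is the temperature coefficient that weights the entropy bonus against the reward:

$$J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[\, r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \big]$$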
This capability is crucial for quadcopters, which often operate in dynamic and unpredictable settings. The high exploration rate
provided by SAC contributes to robust performance, allowing the quadcopter to adapt its path in response to changing environmental
conditions. However, implementing SAC in quadcopter path planning is not without challenges. The algorithm is computationally
intensive, necessitating considerable training time to fine-tune its parameters and achieve optimal performance. Moreover, SAC
may be prone to overfitting, particularly in scenarios with sparse rewards, which can hinder the quadcopter's ability to learn effective
navigation strategies. Research has indicated that while SAC can enhance the flexibility and efficiency of quadcopter path planning,
careful consideration of its computational demands and training requirements is essential to leverage its full potential in real-world
applications.
PPO is an on-policy algorithm that optimizes the policy through small, controlled updates. It maximizes a clipped surrogate objective that prevents drastic policy changes, ensuring stability in the training process.
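In its standard form, the clipped surrogate objective is

$$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}$$

where r_t(θ) is the probability ratio between the updated and old policies, Â_t is an advantage estimate, and ε is the clipping parameter that bounds how far the policy may move in a single update.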
PPO's controlled updates make it ideal for tasks that require fine adjustments, such as navigating tight spaces with obstacles, and its stability during training makes it a popular choice for path-planning problems where smooth trajectories are essential. PPO is also widely adopted for quadcopter path planning because of its simplicity and stability: its ease of implementation allows quick integration into various systems, and it typically produces smooth trajectories, minimizing erratic movements during flight. These characteristics are particularly valuable for quadcopters, which require reliable navigation in complex environments.
However, one of the drawbacks of PPO is its lower sample efficiency compared to off-policy methods like Soft Actor-Critic (SAC).
This means that PPO often requires a larger amount of data to learn effective policies, which can slow down the training process.
Additionally, because PPO operates on an on-policy basis, it may struggle in highly dynamic environments where rapid changes
occur frequently. In such scenarios, the algorithm's reliance on current data can limit its adaptability, making it less effective than
off-policy alternatives in some applications. Overall, while PPO offers benefits for quadcopter path planning, particularly in terms
of stability and implementation, it is essential to consider its limitations in terms of sample efficiency and responsiveness to dynamic
conditions.
DQN combines Q-learning with deep neural networks, enabling it to handle larger state spaces. It uses a neural network with parameters θ to approximate the Q-function Q(s, a; θ), which estimates the expected return for a given state-action pair, and is trained by minimizing the temporal-difference loss

$$L(\theta) = \mathbb{E}_{(s, a, r, s')}\Big[\big(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\big)^{2}\Big]$$

where θ⁻ denotes the parameters of a periodically updated target network.
While DQN is best suited for discrete action spaces, it has been adapted for continuous environments with certain modifications,
such as using a discretized action set for quadcopter control [6].
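The sketch below illustrates what such a discretization might look like for velocity-level control; the seven-action set, the 0.5 m/s step, and the placeholder Q-network are assumptions made for illustration, not details taken from this paper or from [6].

% Illustrative action discretization for DQN-based quadcopter control:
% hover plus a fixed velocity step along each axis gives 7 discrete actions.
vStep   = 0.5;                                 % velocity step in m/s (assumed)
actions = [ 0      0      0;                   % 1: hover
            vStep  0      0;  -vStep  0      0;   % 2-3: +/- x
            0      vStep  0;   0     -vStep  0;   % 4-5: +/- y
            0      0      vStep; 0    0     -vStep];  % 6-7: +/- z

% Placeholder for a trained Q-network mapping a state vector to one Q-value
% per discrete action; replace with the learned approximator.
qnet  = @(s) randn(1, size(actions, 1));

state  = [1.0 2.0 0.5 0 0 0];                  % example state: position and velocity
[~, a] = max(qnet(state));                     % greedy action selection
vCmd   = actions(a, :);                        % velocity set-point for the flight controller
disp(vCmd)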
Deep Q-Networks (DQN) are known for their ease of implementation and widespread use in various reinforcement learning
applications, including quadcopter path planning. Their architecture effectively leverages deep learning to approximate the Q-value
function, making them particularly well-suited for environments with discrete action spaces. This capability allows DQN to navigate
straightforward scenarios efficiently, achieving commendable performance in tasks with clearly defined actions. However, DQNs
face challenges in environments requiring continuous action spaces, which significantly limits their effectiveness in complex
quadcopter navigation tasks. In these scenarios, the inability to seamlessly handle continuous inputs can hinder the quadcopter's
ability to make precise adjustments while navigating intricate environments. Moreover, achieving optimal performance with DQN
often requires extensive hyperparameter tuning, which can be time-consuming, especially in dynamic environments where
conditions change rapidly. Research has indicated that while DQNs can be effective in simpler tasks, their limitations in continuous
action scenarios and the need for meticulous tuning pose challenges for deploying them in more complex quadcopter navigation
applications.
To evaluate the performance of the algorithms discussed—Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Deep
Q-Networks (DQN)—several important metrics are considered to highlight their effectiveness in quadcopter path planning. Path
Optimality assesses the efficiency of the paths generated by each algorithm, comparing their lengths to the ideal shortest route. This
metric is crucial for understanding how effectively a quadcopter can navigate from its starting point to its destination while
minimizing distance traveled. Computation Time evaluates the duration required to compute a path. This metric is vital for real-
time applications, where rapid decision-making is essential for safe navigation, especially in complex environments. Adaptability
examines how well each algorithm can respond to rapidly changing conditions, such as moving obstacles or varying wind patterns.
The ability to adapt is critical for ensuring that the quadcopter can maintain safe and effective navigation in unpredictable settings.
Sample Efficiency measures how effectively an algorithm utilizes data to learn an optimal policy. This is particularly significant for
algorithms that may require extensive training data, as high sample efficiency can lead to faster convergence and reduced training
costs. Analyzing these metrics provides a comprehensive understanding of the strengths and weaknesses of each algorithm,
facilitating the selection of the most appropriate one for specific navigation challenges.
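As a simple illustration of how the first two metrics can be obtained from a planner's output in MATLAB, the snippet below uses a hypothetical three-waypoint path; in practice the waypoint matrix would come from A*, RRT*, or a trained reinforcement learning policy.

% Example planner output (three waypoints); in practice this matrix would
% come from A*, RRT*, or a trained reinforcement learning policy.
startPos = [0 0 0];   goalPos = [2 1 1];
tic;
route = [startPos; 1 0 0.5; goalPos];          % stand-in for a planner call
computationTime = toc;                         % metric: computation time

% Metric: path optimality, the ratio of the straight-line (lower-bound)
% distance to the realized path length (1 corresponds to the ideal path).
segmentLengths = vecnorm(diff(route, 1, 1), 2, 2);
pathLength     = sum(segmentLengths);
optimality     = norm(goalPos - startPos) / pathLength;
fprintf('time = %.4f s, optimality = %.3f\n', computationTime, optimality);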
When comparing traditional path-planning algorithms like A* and RRT* with more recent machine learning-based methods such
as SAC, PPO, and DQN, several key differences emerge across different performance metrics.
In static environments, A* is highly efficient in finding the optimal path. Its heuristic-based approach ensures minimal path
length, making it a reliable choice in grid-like settings with well-defined goals. However, its computational cost grows rapidly as
the grid size increases or when obstacles become more complex. This is particularly evident in scenarios where recalculating the
path frequently becomes necessary, such as dynamic obstacle avoidance or changing goals. RRT*, on the other hand, excels in high-
dimensional spaces. Its probabilistic nature allows it to explore vast search spaces more flexibly, making it suitable for environments
that are cluttered or have complex obstacles. However, RRT* requires significant computational time to reach the optimal solution,
and it often generates paths that are not inherently smooth, requiring post-processing or additional optimization steps to refine.
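As an illustration of that post-processing step, the sketch below applies greedy shortcutting to an example RRT*-style path: intermediate waypoints are discarded whenever the straight segment between two non-adjacent waypoints is collision-free. The waypoints, the single spherical obstacle, and the coarsely sampled collision check are illustrative assumptions (the environments discussed in this paper use rectangular obstacles).

% Greedy shortcut smoothing of a sampled path (illustrative example).
route     = [0 0 0; 1 0.2 0; 2.5 0.8 0.5; 3.5 2.5 1.2; 4 4 2];  % example RRT*-style path
obstacles = [2 2 1 0.8];                       % [cx cy cz radius], spherical for simplicity

% Sampled straight-line collision check between two waypoints p and q.
isFree = @(p, q) all(arrayfun(@(t) ...
    all(vecnorm((p + t*(q - p)) - obstacles(:, 1:3), 2, 2) > obstacles(:, 4)), ...
    linspace(0, 1, 20)));

smoothed = route(1, :);
i = 1;
while i < size(route, 1)
    j = size(route, 1);                        % try to connect to the farthest waypoint first
    while j > i + 1 && ~isFree(route(i, :), route(j, :))
        j = j - 1;                             % back off until the segment is collision-free
    end
    smoothed = [smoothed; route(j, :)]; %#ok<AGROW>
    i = j;
end
disp(smoothed)                                 % fewer waypoints, same start and goal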
In contrast, machine learning-based methods exhibit a different set of strengths and limitations. SAC (Soft Actor-Critic)
particularly stands out in dynamic and continuous environments. Its ability to optimize for both the reward and exploration by
maximizing entropy enables it to efficiently navigate through highly complex environments. By continuously updating its policy
through experience, SAC demonstrates strong adaptability, making it superior in handling unexpected changes in the environment
or goals. However, this advantage comes at the cost of significant computational resources, as SAC requires extensive data and time
to train the agent to an optimal policy. The reward sparsity often seen in real-world tasks like quadcopter navigation further
complicates this, sometimes leading to longer convergence times.
PPO (Proximal Policy Optimization), on the other hand, offers more stability during training. Its clipped objective function
ensures that the policy does not undergo abrupt changes, making it suitable for environments requiring smooth trajectories, such as
when navigating around tight corners or narrow spaces. This stability makes PPO a popular choice for real-time navigation tasks.
However, PPO’s on-policy nature means it requires a lot of data to train effectively, which limits its sample efficiency compared to
off-policy methods like SAC. While PPO works well in environments with limited dynamic elements, its adaptability is not as
robust as SAC when faced with highly unpredictable changes.
DQN (Deep Q-Networks) stands out for its simplicity and effectiveness in discrete action spaces. While not as adept in continuous
environments due to its inherent limitations in action representation, DQN has proven to be effective in environments where actions
can be discretized without losing significant granularity. That said, for tasks like quadcopter navigation—where fine-grained control
is essential—DQN is often not the best-suited option, as it struggles to match the performance of SAC and PPO in terms of precision
and adaptability. The need to discretize the action space also results in suboptimal paths, as DQN tends to produce jerky, less fluid
motion in continuous environments [6].

Compared with traditional methods, machine learning algorithms not only offer
better adaptability but also show immense potential in dynamic, real-time path planning. In contrast to A* and RRT*, which rely
on predefined rules and heuristics, machine learning approaches actively learn from their environment. This learning enables them
to develop more sophisticated strategies to avoid obstacles and optimize paths dynamically as the environment evolves. However,
the primary trade-offs are training time and computational overhead. While traditional algorithms can be implemented with lower
computational costs, machine learning-based algorithms require powerful hardware and longer development phases due to the
extensive training involved.

Recent advancements in path-planning techniques have focused on state-of-the-art algorithms that blend
machine learning with traditional methods to enhance performance. One such promising approach is the PRM*-RL (Probabilistic
Roadmaps with Reinforcement Learning) model. This hybrid system integrates the fast initial path generation of Probabilistic
Roadmaps with reinforcement learning algorithms, like SAC or PPO, allowing for real-time adjustments and dynamic adaptability.
The PRM*-RL model has demonstrated significantly higher accuracy, ranging between 92% and 96%, compared to traditional path-
planning algorithms. It also surpasses standalone reinforcement learning models, such as DQN and PPO, particularly in dynamic
environments where adaptability is crucial. Additionally, state-of-the-art graph-based learning techniques, like Graph Neural
Networks (GNNs), have emerged as highly competitive tools for path planning. By modeling the environment as a graph, GNNs
enable agents to update their paths dynamically with minimal recomputation. These models achieve accuracy rates exceeding 95%
in both static and dynamic environments and outperform algorithms like SAC and PPO, particularly in large-scale applications such
as urban air mobility and drone delivery systems.
Looking ahead, research in quadcopter path planning is likely to focus on several promising areas that build upon both traditional
and machine learning approaches. One key direction is the development of hybrid models, which combine the deterministic nature
of traditional algorithms with the adaptability of machine learning techniques. For instance, researchers could explore the integration
of A* or RRT* with reinforcement learning methods to create hybrid frameworks that switch between model-based planning and
model-free learning based on environmental context. This approach would leverage the efficiency of traditional methods for static
or semi-dynamic environments while relying on machine learning for complex, dynamic scenarios.

Another area of interest is the
improvement of sample efficiency in reinforcement learning algorithms. Both SAC and PPO, despite their robust performance,
require significant amounts of training data. Future research could focus on enhancing transfer learning and meta-learning
techniques, allowing agents to generalize across environments more efficiently. Such advancements would significantly reduce the
training time and computational resources required for these algorithms, making them more practical for real-time applications.

Additionally, the concept of multi-agent systems in path planning holds considerable potential. By enabling multiple agents
(quadcopters) to learn and collaborate in a shared environment, researchers can develop more resilient and efficient navigation
strategies. Such systems can distribute tasks dynamically, thereby optimizing the path planning process across a fleet of autonomous
vehicles. Machine learning algorithms like multi-agent PPO (MAPPO) and multi-agent SAC (MASAC) are already being explored
for this purpose, and future research could further enhance their scalability and coordination abilities.

In the realm of traditional
algorithms, improving the speed and efficiency of probabilistic methods like RRT* remains an ongoing challenge. Recent
advancements in Fast Marching Trees (FMT) and Probabilistic Roadmaps (PRM*) suggest that these algorithms can achieve greater
efficiency in both static and dynamic environments. Future work could integrate these methods with machine learning techniques
to develop hybrid models that balance exploration with real-time adaptability [7].

Moreover, the incorporation of uncertainty
modeling into both traditional and machine learning algorithms is another promising direction. Techniques such as Bayesian
optimization or Gaussian processes could be used to model the uncertainty in environmental dynamics, allowing quadcopters to
make more informed decisions when navigating through unpredictable environments.
VIII. CONCLUSION
In conclusion, while traditional algorithms like A* and RRT* provide reliable solutions for static environments, their limited adaptability to dynamic conditions is driving the shift toward machine learning approaches such as SAC and PPO. The future of quadcopter path planning lies in the integration of these paradigms, leveraging the strengths of both to create solutions that are not only efficient and optimal but also adaptable to the complex, dynamic environments of the real world. In this paper, traditional path-planning algorithms such as A* and RRT* were compared with machine learning methods like PPO, SAC, and
DQN for quadcopter navigation. The analysis revealed that traditional algorithms are effective in simple, static environments but
tend to struggle in dynamic scenarios. In contrast, machine learning algorithms showed better adaptability, particularly SAC and
PPO, which excelled in finding efficient paths and navigating around obstacles. Advanced methods like PRM*-RL and GNN were
explored, which improve path planning by enhancing adaptability and efficiency. This indicates that merging traditional and
machine learning approaches could lead to even more effective solutions for quadcopter navigation. Future research could focus
on these hybrid strategies and test their performance in real-time situations and more complex environments, ultimately advancing
the capabilities of autonomous flight.
REFERENCES
[1] K. Danancier, D. Ruvio, I. Sung, and P. Nielsen, "Comparison of Path Planning Algorithms for an Unmanned Aerial Vehicle Deployment Under Threats," Procedia Manufacturing, vol. 42, pp. 311–317, 2020.
[2] P. E. Hart, N. J. Nilsson, and B. Raphael, "A Formal Basis for the Heuristic Determination of Minimum Cost Paths," in IEEE
Transactions on Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107, July 1968.
[3] S. Karaman and E. Frazzoli, "Sampling-based Algorithms for Optimal Motion Planning," in International Journal of Robotics
Research, vol. 30, no. 7, pp. 846–894, 2011.
[4] S. Thrun, W. Burgard, and D. Fox, "A Probabilistic Approach to Concurrent Mapping and Localization for Mobile Robots," in
Machine Learning, vol. 31, pp. 29–53, 1998.
[5] J. Delmerico and D. Scaramuzza, "A Benchmark Comparison of Planning Algorithms for Autonomous Quadrotor Navigation,"
arXiv preprint, Mar. 2022.
[6] H. Van Hasselt, A. Guez, and D. Silver, "Deep Reinforcement Learning with Double Q-learning," in Proceedings of the Thirtieth
AAAI Conference on Artificial Intelligence, 2016, pp. 2094–2100.
[7] C. Wang, L. Meng, S. She, et al., "Reinforcement Learning-Based Path Planning: A Reward Function Strategy," Applied
Sciences, vol. 14, no. 17, pp. 7654, 2024.