Integrated Intention Prediction and Decision-Making with Spectrum Attention Net and Proximal Policy Optimization
Abstract
For autonomous driving in highly dynamic environments, it is essential to predict the future behaviors of surrounding vehicles (SVs) and to make safe and effective decisions. However, modeling the inherent coupling effect between the prediction and decision-making modules has been a long-standing challenge, especially when computational efficiency must be maintained. To tackle these problems, we propose a novel integrated intention prediction and decision-making approach, which explicitly models the coupling relationship and achieves efficient computation. Specifically, a spectrum attention net is designed to predict the intentions of SVs by capturing the trends of each frequency component over time and their interrelations. The intention prediction module is fast to compute because the predicted intentions are not decoded into trajectories during execution. Furthermore, the proximal policy optimization (PPO) algorithm is employed to address the non-stationarity problem in the framework through a modest policy update enabled by a clipping mechanism within its objective function. On the basis of these developments, the intention prediction and decision-making modules are integrated through joint learning. Experiments are conducted in representative traffic scenarios, and the results reveal that the proposed integrated framework demonstrates superior performance over several deep reinforcement learning (DRL) baselines in terms of success rate, efficiency, and safety in driving tasks.
I INTRODUCTION
Advancements in computer technology and artificial intelligence have led to significant developments in the field of autonomous driving technology. Nevertheless, the realization of autonomous driving systems remains a formidable challenge, with system design standing as a principal barrier. The architectures of contemporary autonomous driving systems fall into two major categories: end-to-end and hierarchical systems. While there have been notable advancements in end-to-end autonomous driving systems based on machine learning [1], these systems continue to grapple with issues of poor interpretability, high data demands, limited generalization capabilities, and insufficient robustness. Alternatively, hierarchical autonomous driving systems [2, 3], which involve perception, prediction, decision-making, planning, and control, have exhibited considerable improvements in interpretability and robustness. However, in such systems, tasks including prediction and decision-making are often treated as relatively distinct modules, which ignores the inherent coupling relationship between them. This mutual effect comes from the influence of the prediction module on the decision-making module; in addition, due to the interaction between vehicles, the decision-making module in turn affects the behavior of the surrounding vehicles (SVs), thereby affecting the next prediction step. Because this separation overlooks the coupling between prediction and decision-making, it can lead to deteriorated performance of autonomous vehicles (AVs) in complex situations. In response to this shortcoming, several studies have proposed frameworks that integrate the prediction and decision-making modules [4, 5], leveraging predicted trajectories or intentions to inform the decision-making modules in autonomous driving.
In integrated frameworks such as [6], the forecasting and imitation nets are trained jointly, which is more efficient compared to training the network individually. Despite these efforts, the design of an integrated framework remains a significant challenge.
In general, the output of the decision-making module can be obtained through approaches such as rule-based methods [7], optimization-based methods [8], and learning-based methods [9]. Rule-based and optimization-based methods have obtained commendable results owing to their strong interpretability and related attributes. Nevertheless, due to their limited adaptability, they generally deliver poor performance in complex real-world environments. On the other hand, learning-based methods have demonstrated significant progress, with a particularly promising direction in reinforcement learning (RL). A deep reinforcement learning (DRL) method presented in [10] adjusts the clipping parameter at different stages of training using proximal policy optimization (PPO), so that the vehicle can quickly search for an approximately optimal policy or its neighborhood with a large parameter and then converge to the optimal policy with a smaller one. The close interplay between prediction and decision-making implies that the prediction module exerts a considerable influence on the decision-making module. Therefore, it is crucial to incorporate a well-developed prediction module in the framework to ensure that the decisions made by the system are safe and reliable.
Mainstream prediction methods typically focus on intention or trajectory prediction. Within the integrated framework, trajectory prediction is more commonly utilized and it can be divided into two main types: optimization-based and learning-based methods. Exemplified by [11], optimization-based methods are highly efficient in capturing long-term movements, but replicating this success in complex urban road environments proves challenging due to the intricate road layouts and variable traffic conditions. In contrast, learning-based methods [12, 13] are more adept at handling prediction tasks in such complex settings, benefiting from the strengths of neural networks in processing large amounts of trajectory data and understanding urban road conditions. Learning-based methods forecast future trajectories by analyzing historical trajectory data, resulting in encouraging outcomes. The performance can be improved when transforming trajectory representation from the time domain to the frequency domain, as the frequency domain reflects different characteristics such as energy distribution. Relevant records have shown that these features in the frequency domain are critical to the prediction process. A novel approach for predicting state sequences through Fourier transform (FT) is proposed by [14], which extracts hidden structural information from the frequency domain, thereby enhancing sample efficiency. A network called SpectrumNet is introduced in [15] to predict pedestrian trajectories, which is capable of breaking down information across various time scales through FT and representing it in the frequency domain. In [16], the future trajectory is predicted from the history trajectory, which is transformed from the time domain to the frequency domain using FT. The low-frequency and high-frequency components reflect the long-term objective (LTO) and the short-term dynamic (STD) intention, respectively. 
However, the FT used in [14, 15, 16] fails to account for changes in frequency over time, hindering the dynamic estimation of trends in trajectory changes and affecting prediction performance in real-world scenarios. Moreover, because trajectory prediction aims to obtain precise future trajectories, it demands considerable computational resources, which makes prediction difficult in complex dynamic environments. In addition to trajectory prediction, intention prediction has also been the subject of extensive research. For example, an intention prediction approach developed by [17] utilizes Gaussian mixture hidden Markov models to forecast the short-term intention of vehicles based on their historical trajectories. However, this method concentrates exclusively on past trajectory data to predict short-term driving intention and does not take into account the LTO intention. The same issue can also be found in the majority of intention prediction approaches, such as [18]. Neglecting the LTO intention makes it difficult to ensure the effectiveness of long-term decisions, which can compromise robustness in complex driving scenarios when such predictors are used within integrated frameworks.
To address the aforementioned issues, we propose an integrated intention prediction and decision-making approach for autonomous driving, which leverages the proposed spectrum attention net to predict the intention of SVs in the frequency domain and utilizes the PPO algorithm to generate decisions. The main contributions of this paper are as follows:
- An integrated intention prediction and decision-making framework is proposed to explicitly model the strong coupling effect between intention prediction and decision-making, where we use a joint learning mechanism to learn the coupling and ensure the effectiveness of network parameter updates.
- We propose the spectrum attention net to predict the intention of the SVs from the spectrum of the trajectory obtained by the short-time Fourier transform (STFT). With the predicted intention of SVs (which combines the LTO and STD intention), the AV can make safer and more effective decisions.
- The effectiveness of the proposed integrated framework is demonstrated by comparisons with existing methods in various driving scenarios, and the results indicate that the proposed approach outperforms the baseline methods in these driving tasks.
The remainder of the article is organized as follows. Section II introduces the preliminaries of the Markov decision process (MDP) and the STFT, and gives the problem statement. Section III details the proposed integrated framework, including the design of the spectrum attention net, the PPO algorithm, and the joint learning mechanism. Section IV describes the experiments conducted in four different scenarios. Finally, Section V summarizes the conclusions and future work.
II Preliminaries
II-A Markov Decision Process
An MDP is used to describe decision-making problems with the Markov property. In an MDP, the agent selects actions based on the current state to maximize long-term rewards. The MDP consists of five elements, which can be represented by the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, and $\mathcal{P}(s' \mid s, a)$ is the state transition probability function, which represents the probability of transitioning from the current state $s$ to a new state $s'$ after taking action $a$. $\mathcal{R}(s, a)$ is the reward function that represents the immediate reward obtained after taking action $a$ in state $s$, and $\gamma$ is the discount factor used to discount the value of future rewards.
II-B Short-Time Fourier Transform
The FT is pivotal in applications that need to transform signals between the time and frequency domains. An important variant is the STFT [19], which addresses the limitation of the FT in analyzing non-stationary signals whose frequency components vary over time. The primary purpose of the STFT is to determine the frequency and phase content of local sections of a signal as it changes over time. This is achieved by multiplying the signal by a window function that is shifted along the duration of the signal. For a signal $x[n]$, where $n$ represents the time step, the STFT transforms it into the frequency domain by applying the FT to segments of the signal defined by the window function. The STFT can be expressed as:
$$X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n]\, w[n - m]\, e^{-j\omega n} \tag{1}$$
where $w[n - m]$ denotes the window function centered around time $m$, $m$ is the index of the time frame, and $\omega$ corresponds to the frequency.
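Conversely, when the frequency-domain signal obtained from the STFT needs to be converted back to the time domain, the inverse short-time Fourier transform (ISTFT) can be adopted. Given the STFT representation $X(m, \omega)$ of a signal, the corresponding time-domain signal $x[n]$ can be reconstructed, for a window normalized such that $\sum_{m} w[n - m] = 1$, using the formula:
$$x[n] = \frac{1}{2\pi} \sum_{m=-\infty}^{\infty} \int_{-\pi}^{\pi} X(m, \omega)\, e^{j\omega n}\, d\omega \tag{2}$$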
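As a hedged illustration of the STFT and ISTFT described above, the following minimal Python sketch computes a discrete STFT with a periodic Hann window and reconstructs the signal by overlap-add, dividing by the summed window so that no special window normalization is required. The function names (`hann`, `stft`, `istft`), the window type, and the frame length and hop size are assumptions made for this example; they are not taken from the paper.

```python
import cmath
import math


def hann(n_win):
    """Periodic Hann window of length n_win (an illustrative window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n_win) for k in range(n_win)]


def stft(x, n_win, hop):
    """Return a list of frames; each frame holds n_win complex DFT coefficients."""
    w = hann(n_win)
    frames = []
    for start in range(0, len(x) - n_win + 1, hop):
        seg = [x[start + k] * w[k] for k in range(n_win)]
        frames.append([
            sum(seg[k] * cmath.exp(-2j * math.pi * f * k / n_win) for k in range(n_win))
            for f in range(n_win)
        ])
    return frames


def istft(frames, n_win, hop):
    """Overlap-add reconstruction, normalized by the sum of the shifted windows."""
    w = hann(n_win)
    length = hop * (len(frames) - 1) + n_win
    out = [0.0] * length
    norm = [0.0] * length
    for m, spec in enumerate(frames):
        for k in range(n_win):
            # Inverse DFT of one frame recovers x[n] * w[n - m] for that frame.
            val = sum(spec[f] * cmath.exp(2j * math.pi * f * k / n_win)
                      for f in range(n_win)).real / n_win
            out[m * hop + k] += val
            norm[m * hop + k] += w[k]
    # Samples where the summed window is zero cannot be recovered; return 0 there.
    return [o / s if s > 1e-12 else 0.0 for o, s in zip(out, norm)]
```

With this window, the very first sample of the signal is weighted by zero in the only frame that covers it, so it cannot be recovered; all other samples are reconstructed up to floating-point error.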
II-C Problem Statement
The objective of this research is to develop a framework for effectively navigating an AV in various traffic environments. We focus on four key scenarios: a straight roadway, a four-way intersection, a two-way intersection, and a roundabout.
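By modeling this problem as an MDP, our goal is to find an optimal policy $\pi^*$, parameterized by $\theta$, that maximizes the expected cumulative discounted reward, formulated as:
$$\pi^* = \arg\max_{\theta}\ \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{\infty} \gamma^{t}\, \mathcal{R}(s_t, a_t)\right] \tag{3}$$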
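To make the objective concrete, the short sketch below computes the discounted return of a single finite episode and averages it over several episodes as a Monte Carlo estimate of the expectation. The function names and the sample reward values are hypothetical and serve only as an illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode, accumulated backwards in time."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def expected_return(episodes, gamma):
    """Monte Carlo estimate: mean discounted return over sampled episodes."""
    return sum(discounted_return(ep, gamma) for ep in episodes) / len(episodes)
```

For example, `discounted_return([1.0, 1.0, 1.0], 0.5)` evaluates to 1 + 0.5 + 0.25 = 1.75.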
Specifically, the state space and action space of the problem are defined as follows.
State space $\mathcal{S}$: We define the state at any given time $t$ as a matrix,
$$s_t = \left[s_t^{0},\ s_t^{1},\ \dots,\ s_t^{N-1}\right]^{\top} \tag{4}$$
where $N$ is the number of vehicles. The state of the AV is captured in the first row of $s_t$ as $s_t^{0}$, and the other rows of $s_t$, i.e., $s_t^{i}$ for $i = 1, \dots, N-1$, represent vectors of the kinematic features of the SVs. Each vehicle’s state is characterized by the following kinematic features:
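$$s_t^{i} = \left[x_t^{i},\ y_t^{i},\ v_{x,t}^{i},\ v_{y,t}^{i},\ \psi_t^{i}\right] \tag{5}$$
which contains the information about the position $(x_t^{i}, y_t^{i})$, velocity $(v_{x,t}^{i}, v_{y,t}^{i})$, and orientation $\psi_t^{i}$ of vehicle $i$ at time step $t$.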
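Action space $\mathcal{A}$: The agent has access to a set of five discrete actions,
$$\mathcal{A} = \left\{a_{\text{left}},\ a_{\text{right}},\ a_{\text{idle}},\ a_{\text{slower}},\ a_{\text{faster}}\right\} \tag{6}$$
including lateral maneuvers such as changing lanes to the left or right ($a_{\text{left}}$, $a_{\text{right}}$), maintaining current motion ($a_{\text{idle}}$), and longitudinal adjustments like decelerating or accelerating ($a_{\text{slower}}$, $a_{\text{faster}}$).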
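The state matrix and the discrete action set described above can be sketched as plain Python data structures. In this illustration, the feature count of five (two position components, two velocity components, one heading), the zero-padding of missing vehicles, and the names `Action`, `N_FEATURES`, and `build_state` are all assumptions; the paper only states that each row holds position, velocity, and orientation and that the AV occupies the first row.

```python
from enum import Enum


class Action(Enum):
    """The five discrete maneuvers; the member values are arbitrary labels."""
    LANE_LEFT = "left"
    LANE_RIGHT = "right"
    IDLE = "idle"
    SLOWER = "slower"
    FASTER = "faster"


N_FEATURES = 5  # assumed layout: x, y, vx, vy, heading


def build_state(av_features, sv_features, n_vehicles):
    """Stack the AV (row 0) and SV rows into an n_vehicles x N_FEATURES matrix."""
    rows = [list(av_features)] + [list(f) for f in sv_features]
    rows = rows[:n_vehicles]  # assumption: drop SVs beyond the fixed row count
    for row in rows:
        if len(row) != N_FEATURES:
            raise ValueError("each vehicle needs exactly N_FEATURES values")
    while len(rows) < n_vehicles:
        rows.append([0.0] * N_FEATURES)  # assumption: zero rows for absent SVs
    return rows
```

Keeping the row count fixed gives the downstream networks an input of constant shape regardless of how many SVs are currently observable.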
III METHODOLOGY
In this section, we present a novel framework for integrated intention prediction and decision-making, as shown in Fig. 1, where the AV utilizes the intention prediction obtained from the spectrum attention net to guide its decisions. We first introduce the details of the proposed spectrum attention net, which is used to predict the intention of SVs. Then, we illustrate the advantages of the PPO algorithm in processing complex and non-stationary inputs as the decision-making module in the integrated framework. Finally, we give the details of the joint learning process of the intention prediction module and the decision-making module in the integrated framework.

III-A Intention Prediction Module with Spectrum Attention Net
The behavior of SVs is affected by both LTO and STD intention, which can be reflected by the low-frequency component and high-frequency component in the spectrum, respectively. Therefore, we propose a spectrum attention net to predict the intention of SVs from the frequency spectrum, where the spectral information can characterize the intention of SVs.
At time step $t$, the AV observes the current environmental state, which includes information about both itself and the observable SVs. The state information of these observable SVs is recorded in the trajectory buffer $\mathcal{B}$ with a horizon of $H$ to maintain a consistent input length for the spectrum attention net. Specifically, for each observable vehicle $i$ in the scenario, its state information over the time interval $[t-H+1, t]$ is incorporated into the trajectory buffer $\mathcal{B}$, thereby forming a segmented historical trajectory $\tau^{i}$ for vehicle $i$. Using the segmented historical trajectory as the time-sequence input, we derive the features of multiple frequency components evolving over time by transforming the input into the frequency domain with the STFT, expressed as follows:
$$X^{i} = \mathrm{STFT}\left(\tau^{i}\right) \in \mathbb{C}^{N_\omega \times N_m \times d} \tag{7}$$
where $X^{i}(m, \omega)$ denotes the trend of the frequency component at $\omega$ across time frame $m$. Note that $N_\omega$ is the number of frequency components, $N_m$ is the number of time frames, and $d$ denotes the dimension of kinematic features. Indeed, the STFT can be regarded as a feature extraction function that derives an intention representation from a segmented historical trajectory. The fixed time-frequency resolution of the STFT ensures a uniform representation of each time frame, which is crucial for the attention mechanism to effectively discern and correlate different temporal segments, thereby facilitating accurate intention predictions.
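The buffering and transformation step above can be sketched as follows: a fixed-horizon buffer keeps the most recent states of each observed vehicle, and once a full segment is available, each kinematic feature is transformed independently into a one-sided spectrogram indexed by frequency component, time frame, and feature. The class and function names, the periodic Hann window, the one-sided frequency range, and the frame length and hop size are assumptions chosen for illustration, not details from the paper.

```python
import cmath
import math
from collections import defaultdict, deque


class TrajectoryBuffer:
    """Keeps the last `horizon` feature vectors for every observed vehicle."""

    def __init__(self, horizon):
        self.horizon = horizon
        self._tracks = defaultdict(lambda: deque(maxlen=horizon))

    def push(self, vehicle_id, features):
        self._tracks[vehicle_id].append(tuple(features))

    def segment(self, vehicle_id):
        """Return the full-length history, or None while it is still too short."""
        track = self._tracks.get(vehicle_id)
        if track is None or len(track) < self.horizon:
            return None
        return list(track)


def spectrogram(segment, n_win, hop):
    """Per-feature STFT; result is indexed as spec[frequency][frame][feature]."""
    d = len(segment[0])
    window = [0.5 - 0.5 * math.cos(2 * math.pi * j / n_win) for j in range(n_win)]
    starts = range(0, len(segment) - n_win + 1, hop)
    n_freq = n_win // 2 + 1  # one-sided spectrum of a real-valued signal
    spec = [[[0j] * d for _ in starts] for _ in range(n_freq)]
    for m, s in enumerate(starts):
        for c in range(d):
            frame = [segment[s + j][c] * window[j] for j in range(n_win)]
            for k in range(n_freq):
                spec[k][m][c] = sum(
                    frame[j] * cmath.exp(-2j * math.pi * k * j / n_win)
                    for j in range(n_win)
                )
    return spec
```

Because the buffer has a fixed horizon, every spectrogram has the same shape, which is what gives the attention module a consistent input length.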
The behavior of SVs involves weighing different levels of intention, and these intentions are interrelated. The frequency components at different time frames within the spectrogram depict the distribution of intention across time. Therefore, to thoroughly explore intention patterns across diverse combinations of frequency components, we introduce a multi-head spectrum attention net in this study to forecast the intention of SVs. Initially, the spectrogram is embedded with a linear projection and then sent to the spectral attention module. Through the transformations executed by the networks $W_Q$, $W_K$, and $W_V$, all embedded vectors are converted into the query $Q$, key $K$, and value $V$, respectively. The attention scores are computed by multiplying the query matrix $Q$ with the transpose of $K$, followed by scaling with the inverse square root of the dimension $d_k$ and normalization via a softmax function across the frequency dimension. This process yields a spectral attention matrix, which is utilized to evaluate the correlation among different frequency components, thereby facilitating the prediction of the intention of vehicle $i$. The resulting prediction vector is subsequently fed into the linear projection network, with each head of spectral attention generating outputs as represented below:
$$\mathrm{head}_h = \mathrm{softmax}\left(\frac{Q_h K_h^{\top}}{\sqrt{d_k}}\right) V_h \tag{8}$$
Finally, the outputs from all heads are combined and decoded by the intention decoder, ultimately serving as the predicted intention of SV.
Remark 1
The behavior of SVs is affected by multiple levels of intention at the same time, including LTO and STD intention. For a spectrogram that delineates frequency domain variations over time, the attention mechanism can capture the trend of each frequency component over time and their interrelations. Therefore, reasonable predictions about the spectrum distribution in a future time horizon can be obtained by spectrum attention net and reflect the intention of SVs.
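A single head of the spectral attention described in this subsection can be sketched with plain Python lists. Here each row of `tokens` is assumed to be the embedded vector of one frequency component, so the softmax runs across frequency components; the helper names, the use of explicit weight matrices `w_q`, `w_k`, `w_v`, and the omission of the embedding layer, multi-head concatenation, and intention decoder are simplifications for illustration rather than the paper's implementation.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(tokens, w_q, w_k, w_v):
    """Scaled dot-product attention over frequency-component tokens."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    # Each row is normalized across all frequency components (the keys).
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights
```

The returned `weights` matrix plays the role of the spectral attention matrix: entry (a, b) indicates how strongly frequency component a attends to component b.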
III-B Decision-Making Module with PPO
One of the main challenges in the decision-making module of the integrated framework is the non-stationarity caused by joint learning. During joint learning, the intention prediction in the observation input of the decision-making algorithm keeps changing as the spectrum attention net updates, so the decision-making policy must be adjusted accordingly. However, unconstrained policy updates can be catastrophic in this setting. A large policy update based on the current intention prediction module may yield excellent performance in the current environment, but the performance could become suboptimal or even poor once the intention prediction module updates its network parameters. To address this concern, we adopt the PPO algorithm in the integrated framework.
The PPO algorithm [25] is an on-policy reinforcement learning algorithm designed to solve sequential decision-making problems. Notably, the PPO algorithm employs a clipping mechanism within its objective function to ensure modest policy updates. This feature becomes especially crucial when handling complex and non-stationary multimodal inputs from the intention prediction module. Specifically, the objective function of PPO, designed to mitigate the risk of destabilizing updates, is given by:
$$L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right] \tag{9}$$
where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$ represents the probability ratio of selecting action $a_t$ in state $s_t$ under the new policy $\pi_\theta$ compared to the old policy $\pi_{\theta_{\text{old}}}$. Additionally, $\hat{A}_t$ denotes the estimated advantage function at timestep $t$, and $\epsilon$ serves as a hyperparameter defining the clipping boundary.
Essentially, the clipped surrogate objective in PPO bounds the size of each policy update, which helps to tackle the non-stationarity introduced by the intention prediction module. This mechanism keeps every policy update step within the clipping boundary, which is important for the convergence and stability of the algorithm in a non-stationary environment.
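The clipped surrogate objective can be evaluated on a batch of samples as in the following minimal sketch. It takes log-probabilities under the new and old policies together with advantage estimates; the function name and the default clipping value of 0.2 are assumptions for illustration (the paper's own setting is listed in its hyperparameter table), and no gradient computation is shown.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)           # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)    # pessimistic (lower) bound
    return total / len(advantages)
```

With a positive advantage, a ratio of 1.5 contributes only 1.2 times the advantage, so pushing the ratio further brings no extra gain. With a negative advantage and the same ratio, the unclipped and more pessimistic term is kept, so the objective never rewards moving far from the old policy.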
III-C Joint Learning Process
The integrated intention prediction and decision-making framework is proposed in this study utilizing the spectrum attention net and PPO algorithm. The joint training of the intention prediction and decision-making modules is employed, as it outperforms the standalone training mechanism [6].
For the joint learning process in the integrated framework, the agent needs to explore the environment and collect experience based on the current network parameters first. In detail, the spectrum attention net predicts SVs’ intention and the PPO algorithm makes decisions based on the observation and predicted intention to explore the environment and collect experiences. Then, the collected experiences serve as training data for the spectrum attention net and the PPO algorithm in the integrated framework. Specifically, the predicted intention recorded in the experiences is sent to the trajectory decoder and transformed to trajectory time sequences in the future time horizon through ISTFT transformation. Subsequently, the trajectory time sequences are compared with the ground truth recorded in the collected experience to calculate the prediction loss. Finally, the spectrum attention net is optimized by minimizing the prediction loss with stochastic gradient descent. Meanwhile, the collected experiences contribute to optimizing the network parameters in the PPO algorithm by maximizing the objective function as in (9). To sum up, the learning process in the integrated framework is outlined in Algorithm 1.
Remark 2
The prediction and decision-making modules of autonomous driving systems are highly coupled in the conventional setting. Therefore, in this work, we explicitly model the coupling relationship through joint learning of the prediction and the decision-making modules. Specifically, in the proposed integrated framework, the spectrum attention net and the decision-making algorithm use the identical data for training and then update parameters simultaneously. It is worth noting that unlike end-to-end methods which optimize all networks with decision-making objective function, we optimize the spectrum attention net through the prediction loss, thus ensuring the effectiveness of network parameter updates.
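The joint learning process described in this subsection can be outlined as a simple loop in which both modules are updated from the identical batch of collected experience, each with its own objective. This is only a structural sketch: the three callables, the function names, and the use of mean squared error as the prediction loss are assumptions for illustration, since the paper does not specify the loss form in the text above.

```python
def mse(predicted, target):
    """Mean squared error between a decoded trajectory and the ground truth."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


def joint_learning(collect, update_predictor, update_policy, n_iterations):
    """Alternate experience collection with updates of both modules.

    collect()               -> batch of experience gathered with current parameters
    update_predictor(batch) -> prediction loss after one optimization step
    update_policy(batch)    -> PPO objective value after one optimization step
    All three callables are hypothetical placeholders supplied by the caller.
    """
    history = []
    for _ in range(n_iterations):
        batch = collect()                          # explore with current networks
        prediction_loss = update_predictor(batch)  # minimize the prediction loss
        ppo_objective = update_policy(batch)       # maximize the clipped objective
        history.append((prediction_loss, ppo_objective))
    return history
```

Passing the same `batch` object to both update functions mirrors the statement that the two modules train on identical data, while keeping the prediction loss separate from the decision-making objective.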




IV EXPERIMENTS
In this section, we test the proposed integrated framework in four common autonomous driving scenarios and compare the results with classic DRL algorithms. The simulations are conducted in an OpenAI Gym environment [20], the behaviors of SVs are characterized by the intelligent driver model (IDM) [21], and the baseline DRL methods are implemented based on Stable-Baselines3 [22].
Hyperparameter | DQN | A2C | PPO | Ours |
Discount Factor | ||||
Learning Rate | ||||
Batch Size | ||||
Exploration Rate | - | - | - | |
Target Network Update Frequency | - | - | - | |
Entropy Coefficient | - | |||
Value Function Coefficient | - | |||
Number of Steps | - |
IV-A Experimental Setup
Straight Road: In this scenario, the AV drives on a four-lane straight road filled with SVs, with the task of safely driving 600 meters without collisions within 1 minute. The reward function is defined as , where and represent the rewards of completing the task and driving on the right side of the road. and are the penalties of collision and lane change, which can reduce unnecessary lane-changing behavior. will reward high-speed driving behavior within the speed limitation while penalizing for exceeding the speed limitation.
Intersection-v0: This scenario is a four-way intersection, with the task of safely and quickly driving toward the designated lane. The reward function is defined as , where is the penalty for leaving the road and is the speed of the AV and number of SVs in every time interval, which reflects the collision rate.
Intersection-v1: It is similar to Intersection-v0, except that there are two lanes in one direction, and vehicles are allowed to change lanes. Therefore, a lane-changing penalty is added to the reward function, which is written as .
Roundabout: In this scenario, there is a roundabout with two lanes, with the task of passing through the roundabout as soon as possible without collision. To prevent vehicles from circling, a penalty of timeout is added in the reward function, which can be expressed as
To validate the effectiveness of the proposed method, we evaluate its performance in the above scenarios and compare it with three baseline DRL methods. The number of SVs is 8. In each simulation, we randomize the states of all agents to prevent the policy network from memorizing actions. We compare the proposed framework with the following baseline methods: A2C [23], DQN [24], and PPO [25].
The hyperparameters for network training of the four algorithms are listed in Table I, and all networks use Adam as the optimizer. The experiments with the aforementioned scenarios are conducted on the Windows 11 system environment with an Intel Core i7-14700KF processor and an NVIDIA RTX 4070 Ti GPU.
IV-B Performance Demonstration
















To verify the effectiveness of the proposed algorithm, we implement our framework and the baseline methods in the aforementioned scenarios and obtain the learning curves shown in Fig. 2. It can be observed that the training results of the proposed algorithm are better than those of all baseline methods. This could be attributed to the effectiveness of the intention prediction module within the proposed integrated framework. The decision-making process of SVs takes into account both STD and LTO intention, where the LTO intention indicates the willingness of SVs to complete their tasks and the STD intention reflects short-term obstacle avoidance. The AV predicts the intention of SVs and makes decisions accordingly in quick succession, resulting in greater rewards than the baseline methods. Compared with the PPO baseline, which serves as an ablation without intention prediction, the proposed method achieves a reward increase in all scenarios, but only a marginal improvement in the roundabout scenario. The reason is that our framework has some difficulty in taking spatio-spectrum information into account, which significantly affects the performance in the roundabout scenario due to its complex road structure. Compared with the two other baseline methods, the improvement is even greater.
To further evaluate the performance of the integrated intention prediction and decision-making framework, we evaluate the trained policy in the test environment and set three metrics to assess the effectiveness of each algorithm: (1) we define the success rate as the ratio of successful task completions in 100 trials, which reflects the ability of the policies trained by different algorithms to complete the task; (2) we define the efficiency of the trained policy as , where and are the time thresholds for task completion; (3) we define the safety as the proportion of the 100 trials in which the AV does not collide with SVs or the road boundary. The results are shown in Table II. It can be observed that the proposed framework achieves the highest success rate and safety in all four scenarios. Its efficiency is the highest in the straight road and roundabout scenarios, equal to that of A2C in Intersection-v0 (92%), and one point below A2C in Intersection-v1 (90% versus 91%). This can be attributed to predicting the LTO and STD intention of SVs, which enables the agent to strike a balance between safety and efficiency.
Methods | Straight Road succ. (%) | Straight Road eff. (%) | Straight Road safety (%) | Intersection-v0 succ. (%) | Intersection-v0 eff. (%) | Intersection-v0 safety (%) | Intersection-v1 succ. (%) | Intersection-v1 eff. (%) | Intersection-v1 safety (%) | Roundabout succ. (%) | Roundabout eff. (%) | Roundabout safety (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
A2C | 65 | 74 | 79 | 70 | 92 | 72 | 42 | 91 | 46 | 67 | 54 | 81 |
PPO | 75 | 70 | 81 | 68 | 88 | 71 | 67 | 81 | 68 | 91 | 82 | 94 |
DQN | 61 | 67 | 77 | 68 | 84 | 73 | 66 | 71 | 73 | 61 | 88 | 65 |
Ours | 93 | 86 | 95 | 88 | 92 | 91 | 87 | 90 | 89 | 93 | 92 | 96 |
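The success-rate and safety metrics defined for the table above can be computed from per-trial outcomes as in the sketch below. The `Trial` record and function names are hypothetical, and the efficiency metric is omitted because its formula is not recoverable from the text.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    success: bool   # task completed within the allowed time
    collided: bool  # AV hit an SV or the road boundary


def success_rate(trials):
    """Percentage of trials in which the task was completed."""
    return 100.0 * sum(t.success for t in trials) / len(trials)


def safety(trials):
    """Percentage of trials that ended without any collision."""
    return 100.0 * sum(not t.collided for t in trials) / len(trials)
```

Note that a trial can be collision-free without being successful (for instance, a timeout), which is why safety can exceed the success rate in the same scenario.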
Fig. 3 shows the interactive behavior of the AV in the four scenarios. As a representative example, in the scenario of Intersection-v1, the goal of the AV is to safely and efficiently navigate the intersection. In the left turn task, the AV notices the intention of a vehicle making a left turn and moving slowly. So the AV chooses to change lanes before a conflict occurs to avoid a collision. When the AV detects a lane conflict with the planned route of the same vehicle in front, it opts to change lanes to a safer but further lane. This behavior demonstrates the trade-off considered by the AV between efficiency and safety, and the AV selects a safer decision after evaluation. Here, we only visualize the AV’s trajectory in the representative snapshots.
V CONCLUSIONS
In this paper, we propose a novel integrated intention prediction and decision-making framework for autonomous driving systems from the perspective of the frequency domain. It is worth noting that a spectrum attention net is designed to capture the trend of each frequency component over time and their interrelations, so that the intention of SVs can be predicted. Besides, the PPO algorithm tackles the non-stationary problem in the integrated framework by employing a clipping mechanism within its objective function. Experiments in four representative traffic scenarios are conducted to verify the effectiveness of the proposed framework. The results show that the proposed framework outperforms baselines across various metrics in terms of the success rate of completing driving tasks, efficiency, and safety. Future work involves taking advantage of frequency domain representation in dealing with uncertainty in autonomous driving tasks.
References
- [1] S. R. Jaladi, Z. Chen, N. R. Malayanur, R. M. Macherla, and B. Li, “End-to-End Training and Testing Gamification Framework to Learn Human Highway Driving,” in IEEE 25th Conf. Intell. Transport. Syst. Proc., pp. 4296–4301, 2022.
- [2] K. Shu, H. Yu, X. Chen, S. Li, L. Chen, Q. Wang, L. Li, and D. Cao, “Autonomous Driving at Intersections: A Behavior-Oriented Critical-Turning-Point Approach for Decision Making,” IEEE-ASME Trans. Mechatron., vol. 27, no. 1, pp. 234–244, 2022.
- [3] Y. Jiao, M. Miao, Z. Yin, C. Lei, X. Zhu, X. Zhao, L. Nie, and B. Tao, “A Hierarchical Hybrid Learning Framework for Multi-Agent Trajectory Prediction,” IEEE Trans. Intell. Transp. Syst., pp. 1–11, 2024.
- [4] M. Kloock, P. Scheffe, S. Marquardt, J. Maczijewski, B. Alrifaee, and S. Kowalewski, “Distributed model predictive intersection control of multiple vehicles,” in IEEE 22nd Conf. Intell. Transport. Syst. Proc., pp. 1735–1740, IEEE, 2019.
- [5] X. Tang, K. Yang, H. Wang, J. Wu, Y. Qin, W. Yu, and D. Cao, “Prediction-Uncertainty-Aware Decision-Making for Autonomous Vehicles,” IEEE Trans. Intell. Veh., vol. 7, no. 4, pp. 849–862, 2022.
- [6] Z. Huang, H. Liu, J. Wu, and C. Lv, “Differentiable Integrated Motion Prediction and Planning with Learnable Cost Function for Autonomous Driving,” IEEE Trans. Neural Netw. Learn. Syst., pp. 1–15, 2023.
- [7] J. Ding, L. Li, H. Peng, and Y. Zhang, “A Rule-Based Cooperative Merging Strategy for Connected and Automated Vehicles,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 8, pp. 3436–3446, 2020.
- [8] J. Ma, Z. Cheng, X. Zhang, M. Tomizuka, and T. H. Lee, “Alternating Direction Method of Multipliers for Constrained Iterative LQR in Autonomous Driving,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 12, pp. 23031–23042, 2022.
- [9] Z. Peng, X. Zhou, L. Zheng, Y. Wang, and J. Ma, “Reward-Driven Automated Curriculum Learning for Interaction-Aware Self-Driving at Unsignalized Intersections,” arXiv preprint arXiv:2403.13674, 2024.
- [10] Z. Peng, X. Zhou, Y. Wang, L. Zheng, M. Liu, and J. Ma, “Curriculum Proximal Policy Optimization with Stage-Decaying Clipping for Self-Driving at Unsignalized Intersections,” in Proc. 26th Int. IEEE Conf. Intell. Transp. Syst., 2023.
- [11] J. Leu, Y. Wang, M. Tomizuka, and S. Di Cairano, “Autonomous Vehicle Parking in Dynamic Environments: An Integrated System with Prediction and Motion Planning,” in Int. Conf. Robot. Autom., pp. 10890–10897, 2022.
- [12] E. Zhang, R. Zhang, and N. Masoud, “Predictive Trajectory Planning for Autonomous Vehicles at Intersections Using Reinforcement Learning,” Transp. Res. Pt. C-Emerg. Technol., vol. 149, 2023.
- [13] Y. Chen, B. Ivanovic, and M. Pavone, “ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning,” in IEEE Conf. Comput. Vis. Pattern Recognit., pp. 17082–17091, 2022.
- [14] M. Ye, Y. Kuang, J. Wang, Y. Rui, W. Zhou, H. Li, and F. Wu, “State Sequences Prediction via Fourier Transform for Representation Learning,” in Adv. Neural Inf. Proces. Syst., vol. 36, pp. 67565–67588, 2023.
- [15] S. Liu, Y. Zhu, P. Yao, T. Mao, and Z. Wang, “SpectrumNet: Spectrum-Based Trajectory Encode Neural Network for Pedestrian Trajectory Prediction,” in IEEE Int. Conf. Acoust., Speech Signal Process., Proc., pp. 7075–7079, 2024.
- [16] C. Wong, B. Xia, Z. Hong, Q. Peng, W. Yuan, Q. Cao, Y. Yang, and X. You, “View Vertically: A Hierarchical Network for Trajectory Prediction via Fourier Spectrums,” in Lecture Notes in Computer Science, ch. 39, pp. 682–700, 2022.
- [17] Y. Luo, J. Zhang, S. Wang, F. Lv, J. Zhang, and H. Gao, “A Driving Intention Prediction Method for Mixed Traffic Scenarios,” in IEEE 7th Int. Conf. Intell. Transp. Eng., pp. 296–301, 2022.
- [18] J. Karlsson and J. Tumova, “Intention-Aware Motion Planning with Road Rules,” in IEEE 16th Int. Conf. Autom. Sci. Eng., pp. 526–532, 2020.
- [19] J. B. Allen and L. R. Rabiner, “A Unified Approach to Short-Time Fourier Analysis and Synthesis,” Proc. IEEE, vol. 65, no. 11, pp. 1558–1564, 1977.
- [20] E. Leurent, “An Environment for Autonomous Driving Decision-Making.” https://github.com/eleurent/highway-env, 2018.
- [21] M. Treiber, A. Hennecke, and D. Helbing, “Congested Traffic States in Empirical Observations and Microscopic Simulations,” Phys. Rev. E, vol. 62, no. 2, p. 1805, 2000.
- [22] A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, “Stable-Baselines3: Reliable Reinforcement Learning Implementations,” J. of Mach. Learn. Res., vol. 22, no. 268, pp. 1–8, 2021.
- [23] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous Methods for Deep Reinforcement Learning,” in Int. Conf. on Mach. Learn. Proc., pp. 1928–1937, 2016.
- [24] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with Deep Reinforcement Learning,” arXiv:1312.5602, 2013.
- [25] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” arXiv:1707.06347, 2017.