ForecastGAN: A Decomposition-Based Adversarial Framework for Multi-Horizon Time Series Forecasting
Abstract
Time series forecasting is essential across domains from finance to supply chain management. This paper introduces ForecastGAN, a novel decomposition based adversarial framework addressing limitations in existing approaches for multi-horizon predictions. Although transformer models excel in long-term forecasting, they often underperform in short-term scenarios and typically ignore categorical features. ForecastGAN operates through three integrated modules: a Decomposition Module that extracts seasonality and trend components; a Model Selection Module that identifies optimal neural network configurations based on forecasting horizon; and an Adversarial Training Module that enhances prediction robustness through Conditional Generative Adversarial Network training. Unlike conventional approaches, ForecastGAN effectively integrates both numerical and categorical features. We validate our framework on eleven benchmark multivariate time series datasets that span various forecasting horizons. The results show that ForecastGAN consistently outperforms state-of-the-art transformer models for short-term forecasting while remaining competitive for long-term horizons. This research establishes a more generalizable approach to time series forecasting that adapts to specific contexts while maintaining strong performance across diverse data characteristics without extensive hyperparameter tuning.
keywords:
Time series forecasting , Generative adversarial networks , Time series decomposition , Multi-horizon prediction[1]organization=Department of Mechanical, Automotive and Materials Engineering,addressline=401 Sunset Ave, city=Windsor, postcode=N9B 3P4, state=ON, country=Canada
1 Introduction
Time series data is omnipresent in today’s digital and data-abundant world. Time series forecasting serves as a critical tool in numerous applications that involve both univariate data (e.g., daily stock prices [1]) and multivariate data (e.g. temperature, humidity and wind speed for weather forecasting [2]). The versatility of time series forecasting in handling these diverse data types underscores its significance in modern analytics.
Over the past several decades, time series forecasting has evolved significantly. Traditional statistical models such as Autoregressive Integrated Moving Average (ARIMA) [3] initially dominated the field. These were followed by classical machine learning techniques including Gradient Boosting [4], Random Forest [5], and Support Vector Machines [6]. The emergence of artificial intelligence further transformed forecasting capabilities through advanced architectures like Recurrent Neural Networks [7] and Convolutional Neural Networks [8], which capture complex non-linear patterns in time series data. More recently, two significant developments have shaped the field: (1) the introduction of Generative Adversarial Network (GAN) s [9], which enable more robust adversarial training approaches, and (2) Transformer architectures [10], which have demonstrated remarkable capabilities in sequence modeling across multiple domains.
Despite these advances, current models exhibit domain-specific performance characteristics that limit their generalizability. For example, transformer based models excel in long-term forecasting, but often underperform in short-term scenarios [11]. This performance discrepancy was highlighted in our previous comparative study [12], which revealed significant variability in model performance between different datasets and forecasting horizons. These limitations indicate a need for more adaptive forecasting frameworks that can leverage the strengths of existing state-of-the-art models while maintaining flexibility across diverse forecasting contexts. Additionally, most current approaches focus exclusively on numerical features, neglecting the valuable information contained in categorical variables that are common in real-world time series data.
We propose ForecastGAN, a novel modular framework that addresses these challenges through a systematic decomposition-based approach with adversarial training. Our architecture consists of three specialized, interconnected modules:
-
1.
Decomposition Module: Extracts seasonal and trend components from numerical features while encoding categorical variables to maintain their information content.
-
2.
Model Selection Module: Dynamically selects the optimal model architecture based on dataset characteristics and forecasting horizon.
-
3.
Adversarial Training Module: Employs Conditional Generative Adversarial Network (cGAN) training to enhance the robustness and accuracy of predictions.
This modular design allows each component to be optimized independently while ensuring effective communication between modules. The framework maintains abstraction between various aspects of data processing, enabling more flexible adaptation to different forecasting scenarios. The architecture is presented in Figure 1.
The integration of these approaches is theoretically motivated by their complementary strengths. Time series decomposition isolates more predictable patterns (seasonality and trends), making the forecasting task more manageable. Model selection addresses the horizon-specific performance characteristics of different architectures. Finally, adversarial training transforms otherwise deterministic models into probabilistic ones, enhancing their robustness to data variability and uncertainty. From a mathematical perspective, cGAN learn the conditional probability distribution of future values given historical data. This probabilistic approach better captures the inherent uncertainty in forecasting tasks compared to deterministic point estimates, particularly when dealing with complex multivariate time series. The contributions of this paper are:
-
•
A robust modular framework that delivers consistent performance across diverse forecasting horizons and datasets by separating the forecasting process into specialized functional components.
-
•
Empirical validation of adversarial training’s effectiveness in improving predictive accuracy for otherwise deterministic forecasting models.
-
•
New insights into the relationship between look-back window size, and forecasting horizon, with implications for future forecasting research.
-
•
Comprehensive evaluation across eleven benchmark datasets demonstrating an average 37.54% improvement over state-of-the-art transformer models for short-term forecasting while maintaining competitive performance for long-term horizons.
The remainder of this paper is organized as follows: Section 2 reviews related literature on time series forecasting methods. Section 3 provides theoretical background on the key concepts underlying our approach. Section 4 details the ForecastGAN architecture and its components. Section 5 describes our experimental methodology. Section 6 presents and discusses results. Finally, Section 7 concludes the paper and suggests directions for future research.
2 Related Work
The ForecastGAN architecture involves multiple concepts including time series decomposition, adversarial training, etc. We discuss the existing research to lay the foundation for model architecture. We start with discussing the model evolution for time series forecasting, followed by the applications of GANs for time series forecasting.
2.1 Time Series Forecasting Models
Traditional statistical models such as ARIMA and Exponential Smoothing (ES) are widely used for industrial time-series forecasting due to their simplicity and interpretability [13]. In some cases, these models demonstrate satisfactory performance but struggle with complex datasets that exhibit nonlinear features [14]. Machine learning techniques such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Artificial Neural Network (ANN) emerge as promising alternatives, offering improved performance in capturing complex relationships and nonlinearity in time series data. However, these models often require more computational resources and can be less interpretable in comparison to traditional models [15]. Deep learning models currently show superior performance in various industrial forecasting tasks, including Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNN) s. They excel by modeling long-term dependencies and handling high-dimensional data [16]. Nevertheless, these models can be computationally intensive and might need substantial training data for optimal performance. Hybrid models, combining different techniques, are proposed to overcome the limitations of individual models. By integrating traditional models with machine learning or deep learning techniques, hybrid models improve performance and adaptability in various industrial forecasting tasks [17]. The drawback of hybrid models is they are more accurate around particular use cases and are less likely to be effective around wider conditions. While GAN s hold potential in time-series forecasting, they face challenges like training difficulty and mode collapse. Applying GAN s in time-series forecasting remains an active research area, with ongoing development of new techniques and refinements to address these challenges. A detailed discussion on time series forecasting models and their comparison is presented in one of our earlier works [18]. Furthermore, the comparison of some State of the Art (SOTA) models for long-term and short-term forecasting has been explained in detail in another paper where we have explored the strengths of some models depending upon the forecasting horizon and the chaotic element in the training data [12].
2.2 GANs for Time Series Forecasting
The absence of a standardized evaluation framework for GAN s initially restricted their application to fields where their outputs could be visually interpreted, such as in image generation. However, the scope of GAN s has expanded recently to include time-series data, finding applications across diverse sectors, including healthcare, finance, and the energy sector [19]. For instance, GAN s combined with auto-regressive models have been explored for enhanced sequential data generation. Techniques such as conditioning GAN s on timestamp information have been developed to manage irregular sampling intervals. In probabilistic forecasting, conditional GAN s have been increasingly utilized. For example, Koochali et al. employed a conditional GAN integrated with LSTM units for univariate time series modeling, testing it on both synthetic and real-world datasets [20]. Another study used a Conditional GAN with LSTM and Multi-Layer Perceptron (MLP) components for predicting daily stock closing prices, incorporating Mean Square Error (MSE) with the generator loss to enhance performance [21]. Zhou et al. applied LSTM and CNN in an adversarial training framework for forecasting in the high-frequency stock market, focusing on minimizing forecast errors such as Mean Absolute Error (MAE) or MSE alongside the GAN ’s objective function [22].
Lin et al. proposed a traffic flow forecasting model sensitive to pattern variations, capable of providing accurate predictions in abnormal conditions without compromising regular performance [23]. This model uses a cGAN with an MLP structure and introduces two additional error terms to the standard generator loss, focusing on forecast error and reconstruction error. These advancements demonstrate the growing versatility and applicability of GAN s in time series forecasting across various sectors. Some of the popular GAN s architectures and their applications have been shown in the Appendix in Table 5.
3 Background
This section establishes the theoretical foundations for ForecastGAN’s modular architecture. We first formalize the multi-horizon time series forecasting problem, then explore the theoretical underpinnings of each core component: time series decomposition, model selection for varying horizons, and adversarial training with cGAN.
3.1 Multi-Horizon Time Series Forecasting
To design a multivariate forecasting model consider multivariate time-series , where each represents a feature vector at time step , with being the number of feature set and denotes the data point at time step for feature . The look-back or the sliding window is the span of past time steps to make predictions. Let be the sliding window size and be the future timesteps or the forecasting horizon. Given the historical data the objective for this architecture is to predict the future values where is the value of variable at timestep , is the predicted value after timesteps. For , the forecasting model only gives point-wise predictions rather than a future trend. For the forecasting model uses single-step forecasting iteratively to predict future values where is the multiplying factor for the number of steps in predictions. This is called iterative multi-step forecasting, which is used in this paper. In iterative multi-step forecasting, the one-step prediction is made, and for the next step, this predicted value is fed back into the model. The prediction process for iterative multi-step forecast can be given by equation 1.
| (1) |
The other method for predicting the next steps is direct multi-step forecasting, in which separate models are trained for each forecasting step. Each model directly predicts the value of the time series at a specific future time step. This approach can mathematically be represented as equation 2.
| (2) |
Each approach has theoretical advantages and limitations. Iterative methods can accumulate errors over multiple steps, particularly when the forecasting model has significant uncertainty. Conversely, direct methods require training multiple models, increasing computational complexity but potentially yielding higher accuracy for specific horizons. For medium to large values of , direct multi-step forecasting often produces more accurate results by optimizing each model for its specific target horizon. ForecastGAN leverages this insight by employing a model selection approach that considers the specific forecasting horizon.
3.2 Time Series Decomposition
Harvey and Peters [24] initially presented the idea of decomposing time series data into multiple cyclic and ordered sets, proposing that the original data can be divided into trend, seasonality, and holiday components. Classical decomposition theory separates a time series into:
| (3) |
Where represents trend, represents seasonality, and represents residuals or irregular components. This decomposition provides several theoretical advantages:
-
•
Complexity Reduction: By isolating predictable patterns (trend and seasonality), the forecasting task becomes more manageable [25].
-
•
Component-Specific Modeling: Different components may benefit from different modeling approaches. For instance, trend components often exhibit smoother patterns suitable for linear models, while seasonal components may require more flexible nonlinear approaches [26].
-
•
Feature Enhancement: Decomposition effectively creates new features that capture different temporal dynamics, enriching the information available to subsequent modeling stages [27].
Some famous examples of using decomposition as a preprocessing tool for historical data are seen in Prophet [26] where the input data is divided into a trend, seasonality, and holiday components,Neural Basis Expansion Analysis for Interpretable Time Series (N-BEATS) model [28] uses a similar concept in basis expansion for univariate time series point forecasting and DeepGLO [29] uses the concept of dividing the original time series in basis time series with matrix factorization .
In our implementation, we employ average pooling with appropriate padding to extract trend components, following the approach in [25]. The trend cyclic component captures the long-term data trends, and seasonality captures the apparent effects of certain time elements on the underlying value. Consider the original time series as where is the length of the series and is the number of features. The extracted trend and components can be given as:
| (4) | ||||
| (5) |
where the Average Pooling (AvgPool) is used to divide the series into overlapping (or non-overlapping) regions and compute the average. This moving average operation is used to smooth the fluctuations in the data, making the series easier to predict. Padding is used to control the spatial dimensions of the series, i.e., to keep the length of the series the same as the original.
3.3 Theoretical Limitations of Transformers for Short-Term Forecasting
Transformer models have demonstrated exceptional capabilities for long-term forecasting but often underperform in short-term scenarios. This limitation has a theoretical basis in the architecture’s design:
- •
-
•
Parameter Efficiency: Transformer models typically contain millions of parameters, which may lead to overfitting when applied to short-term forecasting with limited data points [25].
- •
These theoretical considerations suggest that simpler models, such as linear networks with appropriate embeddings, might outperform transformers for short-term forecasting tasks [11]. This insight motivates our Model Selection Module, which can adaptively choose between different model architectures based on the forecasting horizon.
3.4 Adversarial Training for Robust Forecasting
Adversarial training offers a theoretical framework for enhancing model robustness by exposing the model to challenging examples during training [33]. In the context of time series forecasting, this approach addresses several fundamental challenges. Time series data often exhibits distribution shifts between training and testing periods. Adversarial training helps models become more robust to such shifts. Deterministic forecasting models provide point estimates without capturing prediction uncertainty. Adversarial frameworks, particularly GANs, learn the conditional distribution of future values, inherently capturing uncertainty. In time series with multiple possible futures, standard forecasting models might average across possibilities, producing unrealistic predictions. GANs can potentially capture multimodal future distributions.
Adversarial training is a technique employed to improve the generalization and robustness of models against adversarial attacks [33]. It involves training a model with clean and adversarially perturbed examples to make it more robust to small but intentionally worst-case perturbations. Consider a predicting model with parameters and input and as the ground truth. The adversarial example is generated by adding a small perturbation to such that and is designed to maximize the loss . Thus, the objective in adversarial training for a data distribution reduces to a min-max optimization task:
| (6) |
Transitioning from deterministic to probabilistic models can further enhance the robustness of the predictive models [34]. In a deterministic model, the output is a single point estimate, but a probabilistic model predicts a distribution over possible outcomes. This shift can be achieved by modeling the output as a random variable and using Bayesian methods or variational inference[35]. Mathematically, for the predicted output instead of predicting , a probabilistic model predicts which is a probability distribution parameterized by . The architecture for cGAN is shown in Fig. 2.
The generator functions as the probabilistic model, while the discriminator provides essential gradients for optimizing the generator during its training phase. To learn , we utilize historical data as the condition in the cGAN. The generator is tasked with producing , thereby learning the probability distribution equivalent to , which is the desired target distribution. The value function employed in the training of GAN (the probabilistic forecast model) is formulated as follows:
| (7) |
where represents the min-max game between the generator and discriminator . is the value function for the GAN. is the logarithm of the probability that assigns to real data, where is the real data. is the logarithm of the probability that assigns to fake data where is the real data.
This probabilistic framework enables the model to quantify uncertainty in its predictions, which can be particularly valuable in adversarial settings. Training the model to predict distributions rather than point estimates makes it more adept at handling the variability and uncertainty introduced by adversarial perturbations, ultimately leading to more robust and reliable forecasting systems.
4 Methodology
The ForecastGAN architecture has been presented in Figure 1. This section presents the ForecastGAN architecture in detail. We begin with an overview of the framework, followed by in-depth explanations of each module, their interactions, and the overall workflow. The framework consists of three specialized, interconnected modules designed to address specific aspects of the forecasting challenge:
-
1.
Decomposition Module: Processes raw time series data by decomposing numerical features into seasonal and trend components, encoding categorical features, and extracting temporal features from date-time columns.
-
2.
Model Selection Module: Evaluates multiple model architectures on the processed data to identify the optimal configuration for the specific dataset and forecasting horizon.
-
3.
Adversarial Training Module: Employs conditional GAN training to enhance the robustness and accuracy of the selected model.
This modular design enables each component to be optimized independently while maintaining effective information flow between stages. The framework supports both short-term and long-term forecasting by adaptively selecting appropriate model configurations based on the specific forecasting task.
4.1 Look-Back Window Aggregator
The aggregator block outputs the data according to the set look-back window size. This represents the extent of consolidating the past data and is indicative of how much micro-level information is needed for the said prediction step. For example, for single-step prediction i.e., a value of can lose information necessary for good predictions. The impact of the sliding window size is discussed in detail in Section 6. For the aggregator, mean is used for the continuous variables and mode is used for the categorical variable values.
4.2 Decomposition Module
The Decomposition Module serves as the preprocessing foundation for the forecasting architecture. It transforms raw multivariate time series data into a format that highlights relevant patterns and preserves the information content of different feature types. For each numerical feature, the module performs time series decomposition using the following approach:
-
•
Trend Extraction: Apply average pooling with appropriate padding to extract the cyclic-trend component:
(8) -
•
Seasonality Extraction: Subtract the trend component from the original series to obtain the seasonal component:
(9)
This decomposition isolates predictable patterns (trend and seasonality), making the forecasting task more manageable for subsequent modules.
4.2.1 Categorical Feature Processing
For categorical features, the module applies one-hot encoding to transform them into a numerical representation while preserving their information content. This encoding creates a binary vector for each categorical value, allowing the model to leverage categorical information without imposing arbitrary ordinal relationships.
4.2.2 Temporal Feature Extraction
For datetime columns, the module extracts temporal proxy features that capture cyclical patterns at different timescales:
-
•
Day of week (captures weekly patterns)
-
•
Day of month (captures monthly patterns)
-
•
Month of year (captures yearly patterns)
-
•
Hour of day (captures daily patterns)
-
•
Minute of hour (captures hourly patterns)
-
•
Quarter (captures quarterly patterns)
These derived features provide explicit temporal context that helps models identify recurring patterns at different timescales.
4.2.3 Feature Embedding
The processed individual features are combined into a unified dataset and embedding is applied to preserve temporal relationships. As illustrated in Figure 3, this embedding process creates fixed-length vectors that incorporate information from all feature types at each time step. The complete algorithm for the Decomposition Module is presented in Algorithm 1.
4.3 Model Selection Module
The Model Selection Module identifies the optimal model architecture for a given dataset and forecasting horizon. This module addresses the observation that different model architectures exhibit varying performance characteristics depending on the specific forecasting task.
4.3.1 Model Variants
Inspired by [11], the module evaluates four variations of linear networks:
-
1.
Linear: Simple one-layer linear model serving as a baseline. It applies a linear transformation to the original multivariate time series data:
(10) where is the weight matrix, is the input for the -th variable, and is the corresponding prediction.
-
2.
NLinear: Extends the linear model with input sequence normalization. It normalizes by subtracting the last value of the sequence from the input, applies the linear transformation, and then adds back the subtracted value:
(11) where is the last value in the input sequence.
-
3.
DELinear: Applies a linear layer to the decomposed and embedded data for the input data containing both categorical and continuous features:
(12) where represents the decomposed and embedded data.
-
4.
DLinear: For datasets without categorical features, this model applies separate linear layers to the seasonal and trend components:
(13) (14) (15) where and are the seasonal and trend components, respectively, and and are their corresponding weight matrices.
The choice of linear models is motivated by their simplicity, stability, and computational efficiency, which make them particularly well-suited for adversarial training. Additionally, recent research has shown that these models can outperform complex transformer architectures for certain forecasting tasks [11].
4.3.2 Selection Process
The module evaluates each model on the validation set and selects the one with the lowest validation loss. This selection process can be formalized as:
| (16) |
where is the selected model, is the set of candidate models, is the loss function (e.g., MSE), and and are the validation inputs and targets, respectively. The complete algorithm for the Model Selection Module is presented in Algorithm 2.
4.4 Adversarial Training Module
The Adversarial Training Module enhances the selected model through conditional GAN training. This approach transforms the deterministic forecasting model into a probabilistic one, improving its robustness and generalization capabilities. The module consists of two primary components: The Generator is the best model selected by the Model Selection Module serves as the generator. It takes historical time series data as input and generates future predictions. While the Discriminator is a neural network that distinguishes between real and generated time series data. The discriminator architecture includes the input layer accepting the concatenated time series data and conditional information. Hidden layers have Dense layers with LeakyReLU activation, batch normalization, and dropout for regularization. The output layer is a single unit with sigmoid activation that outputs the probability of the input being real. The adversarial training process involves two alternating steps:
-
•
Discriminator Training:
-
–
Sample real data from the training set
-
–
Generate fake data using the generator
-
–
Compute discriminator loss for real data:
(17) -
–
Compute discriminator loss for fake data:
(18) -
–
Update discriminator parameters to minimize the combined loss:
(19)
-
–
-
•
Generator Training:
-
–
Generate fake data using the generator
-
–
Compute adversarial loss to fool the discriminator:
(20) -
–
Update generator parameters to minimize the adversarial loss
-
–
where BCE represents binary cross-entropy loss, is the discriminator, is the generator, is random noise, and is the conditional information (historical time series data).
4.4.1 GAN Stability Measures
GAN training is notoriously unstable, especially for time series data. We implement several measures to enhance stability:
-
•
Gradient Penalty: We apply a gradient penalty to the discriminator’s loss to enforce Lipschitz continuity, which helps prevent mode collapse and gradient explosion:
(21) where is a sample from a distribution that interpolates between real and generated samples.
-
•
Spectral Normalization: Applied to the discriminator’s weights to constrain its Lipschitz constant, further stabilizing training.
-
•
Two-Timescale Update Rule (TTUR): Different learning rates for the generator and discriminator, which has been shown to improve convergence.
4.4.2 Inference Process
For inference (making predictions on new data), we use only the generator component of the trained GAN. The generator takes historical time series data as input and produces forecasts for the specified horizon. The complete algorithm for the Adversarial Training Module is presented in Algorithm 3.
-
•
Sample batch of real data from
-
•
Generate batch of fake data
-
•
Compute discriminator loss for real data:
-
•
-
•
Compute discriminator loss for fake data:
-
•
-
•
Apply gradient penalty
-
•
Combine losses and update discriminator parameters:
-
•
-
•
Update discriminator parameters to minimize
-
•
Sample batch of input data from
-
•
Generate batch of fake data
-
•
Compute generator loss to fool the discriminator:
-
•
-
•
Update generator parameters to minimize the adversarial loss
4.5 Complexity Analysis
The computational complexity of ForecastGAN can be analyzed for each module: For the Decomposition Module, the complexity is dominated by the average pooling operation, which has a complexity of , where is the number of time steps and is the number of features. In Model Selection Module, the linear models have training complexity of , where is the dimensionality of the feature space after decomposition and embedding. Evaluating all four model variants has a complexity of . Lastly, for the Adversarial Training Module, the complexity depends on the selected model architecture and the number of training epochs. For a linear generator, the complexity is approximately , where is the number of epochs.
The overall computational complexity of ForecastGAN is therefore , which is significantly lower than transformer-based approaches with complexity on the order of due to the self-attention mechanisms [11]. In practice, this translates to faster training times. For example, on the ETTh1 dataset with , ForecastGAN trains in approximately 15 minutes on a single NVIDIA RTX GPU, compared to over an hour for transformer-based models like Informer on the same hardware [36].
5 Experiments
To evaluate ForecastGAN comprehensively, we conducted extensive experiments across multiple datasets with varying forecasting horizons. This section details our experimental methodology, including datasets, baseline models, evaluation metrics, and implementation details.
5.1 Datasets
Extensive experiments are conducted for eleven standard real-world multivariate time series datasets for long-term forecasting. The complete details of these datasets are given in 1.
| Dataset | Features | Timesteps | Sample Rate |
|---|---|---|---|
| ETTh1 | 7 | 17,420 | 1 hour |
| ETTh2 | 7 | 17,420 | 1 hour |
| ETTm1 | 7 | 69,680 | 5 minutes |
| ETTm2 | 7 | 69,680 | 5 minutes |
| Productivity | 15 | 1,197 | 1 hour |
| Electricity | 321 | 26,304 | 1 hour |
| Illness | 7 | 966 | 1 week |
| Traffic | 862 | 17,544 | 1 hour |
| Weather | 21 | 52,696 | 10 minutes |
| Exchange Rate | 8 | 7,588 | 1 day |
| Stock Price | 84 | 7,936 | 1 day |
The datasets represent a wide range of forecasting challenges:
-
•
ETT (Electricity Transformer Temperature): Four datasets (ETTh1, ETTh2, ETTm1, ETTm2) containing power load and oil temperature readings at different temporal resolutions. These datasets are widely used benchmarks for long-term forecasting [36].
-
•
Productivity: Records garment employee productivity measured hourly during 9-hour daily shifts. The target metric is the normalized productivity value between 0 and 1. This dataset contains both numerical and categorical features, making it particularly suitable for evaluating our framework’s ability to handle mixed feature types [37].
-
•
Electricity: Contains hourly electricity consumption measurements for 321 customers. This high-dimensional dataset tests the framework’s scalability to large feature spaces [38].
-
•
Illness: Weekly records of patients with flu-like illnesses from the CDC, featuring strong seasonal patterns and challenging long-term dependencies [39].
-
•
Traffic: Hourly road occupancy rates measured by sensors on San Francisco Bay Area freeways. With 862 features, this is the highest-dimensional dataset in our evaluation [40].
-
•
Weather: Weather condition measurements in Germany for 2020, featuring diverse meteorological variables with complex interdependencies [41].
-
•
Exchange Rate: Daily exchange rates for 8 countries, characterized by high volatility and non-stationarity [40].
-
•
Stock Price: Daily closing prices of major stock indices including S&P 500, NASDAQ, Dow Jones, Russell 2000, and NYSE Composite from 2010 to 2017 [42].
These datasets were selected to represent a diverse range of forecasting challenges, including different temporal resolutions (from 5 minutes to 1 week), dimensionality (from 7 to 862 features), domains (energy, transportation, health, finance, etc.), and temporal characteristics (seasonal patterns, trends, volatility, etc.).
5.2 Data Preprocessing and Splitting
For each dataset, we applied the following preprocessing steps:
-
1.
Missing Value Handling: Missing values were imputed using forward fill followed by backward fill to ensure completeness.
-
2.
Normalization: Numerical features were normalized using min-max scaling to the range [0,1] to ensure consistent scale across features.
-
3.
Train-Validation-Test Split: Each dataset was divided into training (70%), validation (10%), and testing (20%) sets using temporal splits rather than random sampling to preserve the chronological order of observations. This approach ensures that future data is not used to predict past events, maintaining the integrity of the forecasting task.
Figure 5 in the Appendix illustrates the data distributions across train and test sets for all datasets, highlighting the differences in distribution that make certain datasets particularly challenging.
5.3 Forecasting Horizons
To evaluate performance across different forecasting scenarios, we conducted experiments with multiple prediction horizons:
-
•
Long-term Forecasting: Horizons of time steps for most datasets, with for the Illness dataset due to its weekly sampling rate.
-
•
Short-term Forecasting: Horizons of time steps for most datasets, with for the Illness dataset.
-
•
Single-step Forecasting: Horizon of to evaluate immediate next-step prediction performance.
For each forecasting horizon, we experimented with different look-back window sizes to identify optimal configurations. The primary look-back window sizes used were for long-term forecasting, for short-term forecasting, and for single-step forecasting, with adjustments for the Illness dataset (, , and respectively).
5.4 Baseline Models
We compared ForecastGAN against two groups of baseline models:
5.4.1 Transformer-based Models
For multi-step forecasting, we compared against state-of-the-art transformer models:
-
•
Informer [36]: A transformer model with ProbSparse self-attention that reduces complexity from to .
-
•
Robformer [43]: A robust transformer architecture that integrates adaptive normalization techniques and specialized attention mechanisms designed to handle noise and outliers in time series data, resulting in improved stability for financial and volatile datasets.
-
•
TimeXer [44]: Employs a time-frequency dual-domain modeling approach that leverages wavelet transforms to capture multi-scale temporal dynamics, particularly effective for time series with complex non-stationary behaviors.
-
•
Crossformer [45]: Employs a two-stage attention mechanism to capture both temporal and feature dependencies.
-
•
Pathformer [46]: Introduces a path-dependent attention mechanism that models sequential dependencies through learnable routing paths, allowing the model to focus on the most relevant historical patterns for different forecasting contexts.
-
•
Client [47]: Incorporates latent interval transformations to capture time series dynamics more effectively.
A comparison with some other popular transformer-based architectures including Autoformer, FEDformer and PatchTST has been provided in appendix.
5.4.2 Machine Learning Models for Short-term Forecasting Baseline
For single-step forecasting, we additionally compared against traditional machine learning models, including linear approaches (Linear Regression, Bayesian Ridge Regression, Orthogonal Matching Pursuit, Huber Regressor) and tree-based ensemble methods (XGBoost, LightGBM, CatBoost, and Random Forest). These models were selected for their established performance in time series forecasting and to provide a diverse baseline spanning different algorithmic families.
5.5 System Information
For comparative evaluation purposes, baseline results for Informer, Robformer, TimeXer and Pathformer were sourced from the comprehensive benchmarking study by [11] and their original papers [43, 44, 46]. Results for Crossformer were partially obtained from the original publication, while additional evaluations—specifically for ETTm2, ETTh2, and alternative forecasting windows across other datasets—were independently reproduced using the official implementation available in the authors’ repository111https://github.com/Thinklab-SJTU/Crossformer. All ForecastGAN experiments and additional baseline evaluations were conducted on a high-performance computing environment equipped with dual NVIDIA Titan RTX GPUs (24GB GDDR6 memory each), utilizing CUDA 12.2 to optimize GPU acceleration and parallel processing capabilities.
5.6 Evaluation Metrics
We evaluated model performance using two standard metrics for regression tasks:
-
1.
Mean Absolute Error (MAE): Measures the average absolute difference between predictions and ground truth:
(22) -
2.
Mean Squared Error (MSE): Measures the average squared difference between predictions and ground truth:
(23)
where is the forecast horizon, is the length of the look-back window, is the ground truth, and is the predicted value. These metrics were chosen for their interpretability and compatibility with previous forecasting literature, enabling direct comparisons with state-of-the-art approaches.
5.6.1 ForecastGAN Configuration
The implementation details for ForecastGAN are presented in Table 2, which outlines the key parameters for each module.
| Module | Parameter | Configuration |
|---|---|---|
| Decomposition Module | Pooling | Average pooling, kernel size 25 |
| Padding | ’same’ (maintains temporal dimensions) | |
| Categorical encoding | One-hot (if 10 unique values), otherwise ordinal | |
| Model Selection Module | Training | 100 epochs with early stopping (patience=10) |
| Optimizer | Adam (learning rate=0.001) | |
| Loss function | Mean Squared Error (MSE) | |
| Batch size | 32 | |
| Adversarial Training Module | Discriminator architecture | 3-layer MLP (128, 64 units) with LeakyReLU(0.2) |
| Batch normalization | Applied after each layer (momentum=0.8) | |
| Dropout | Rate of 0.3 for regularization | |
| Generator optimizer | Adam (lr=0.0002, =0.5, =0.999) | |
| Discriminator optimizer | Adam (lr=0.0001, =0.5, =0.999) | |
| Gradient penalty | ||
| Training | 200 epochs with validation-based early stopping | |
| Batch size | 64 |
This modular configuration enabled efficient training while maintaining robust performance across diverse forecasting scenarios. The differential learning rates and optimization parameters between the generator and discriminator were specifically tuned to enhance GAN training stability.
6 Results and Discussions
This section presents and analyzes the experimental results, comparing ForecastGAN against baseline models across different forecasting horizons and datasets. We also include results from ablation studies and sensitivity analyses to provide deeper insights into the framework’s behavior.
6.1 Long-term Forecasting Performance
For long-term forecasting, ForecastGAN is compared against six state-of-the-art models including both transformer-based architectures (Informer, Crossformer, TimeXer) and specialized time series models (Robformer, Pathformer, Client). This comprehensive comparison is justified as these models employ direct multi-step forecasting rather than iterative approaches, which are known to suffer from error accumulation over longer horizons. Table 3 presents the comparative results across nine benchmark datasets with varying forecasting horizons. Results marked with indicate values that were evaluated using publicly available repositories, while other baseline results were obtained from previously published benchmarks [11, 47]. ForecastGAN demonstrates strong performance across multiple datasets, with the most substantial improvements observed on the Exchange Rate dataset (average improvement of 26.73% across all horizons) and ETTm1 dataset (average improvement of 10.77%). The results show particular strength in capturing complex patterns for datasets with pronounced seasonality and trend components. Conversely, more modest performance is observed for the Traffic dataset, where ForecastGAN shows an average improvement of -3.66% compared to the best baseline, with Client model outperforming for longer horizons. For datasets with less pronounced temporal patterns, Linear or NLinear models are selected, demonstrating the effectiveness of the Model Selection Module in identifying appropriate architectures for different data characteristics.
An important observation is that ForecastGAN’s performance advantage tends to decrease as forecasting horizons increase, particularly for horizons beyond 336 time steps. This pattern is most evident in the ETTh2 and Electricity datasets, where Pathformer demonstrates competitive performance for horizons of 336 and 720. Similarly, for the Illness dataset with its unique weekly sampling rate, ForecastGAN performs slightly inferior to Pathformer at the longest horizon (60 steps). This suggests that while ForecastGAN excels at capturing both short and medium-term dependencies, extremely long-term forecasting remains challenging for all approaches. Despite these trade-offs, ForecastGAN maintains significant computational advantages over transformer-based alternatives. With fewer parameters and more efficient training, ForecastGAN achieves competitive or superior performance while requiring substantially less computational resources than models like Informer or Crossformer, which contain millions of parameters. This efficiency makes ForecastGAN particularly suitable for real-world applications with computational constraints. The sensitivity to look-back window size, illustrated in Figure 4, further explains performance variations across different forecasting horizons. ForecastGAN performs optimally when the look-back window size is closer to the prediction step , providing a practical guideline for implementation in various forecasting scenarios.
| Methods | Imp | ForecastGAN | Robformer | TimeXer | Pathformer | Informer | Crossformer | Client | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Data | H | % | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE |
| ETTh1 | 96 | 8.13% | 0.338 | 0.390 | 0.375 | 0.404 | 0.140 | 0.242 | 0.369 | 0.395 | 0.865 | 0.713 | 0.391 | 0.412 | 0.392 | 0.409 |
| 192 | 10.72% | 0.373 | 0.405 | 0.405 | 0.416 | 0.157 | 0.256 | 0.414 | 0.418 | 1.008 | 0.792 | 0.421 | 0.443 | 0.445 | 0.436 | |
| 336 | 7.93% | 0.391 | 0.410 | 0.439 | 0.444 | 0.176 | 0.275 | 0.401 | 0.419 | 1.107 | 0.809 | 0.440 | 0.461 | 0.482 | 0.455 | |
| 720 | 4.33% | 0.421 | 0.443 | 0.472 | 0.490 | 0.211 | 0.306 | 0.440 | 0.452 | 1.181 | 0.865 | 0.519 | 0.524 | 0.489 | 0.479 | |
| ETTh2 | 96 | 5.49% | 0.251 | 0.329 | 0.295 | 0.403 | 0.157 | 0.205 | 0.276 | 0.334 | 3.206 | 1.741 | 0.311 | 0.389 | 0.265 | 0.336 |
| 192 | 16.13% | 0.312 | 0.348 | 0.395 | 0.457 | 0.204 | 0.247 | 0.329 | 0.372 | 5.639 | 1.977 | 0.367 | 0.410 | 0.372 | 0.367 | |
| 336 | -5.00% | 0.340 | 0.398 | 0.418 | 0.480 | 0.261 | 0.290 | 0.324 | 0.377 | 4.802 | 1.863 | 0.410 | 0.426 | 0.399 | 0.395 | |
| 720 | -6.75% | 0.391 | 0.436 | 0.477 | 0.490 | 0.340 | 0.347 | 0.366 | 0.410 | 4.243 | 1.753 | 0.439 | 0.477 | 0.424 | 0.444 | |
| ETTm1 | 96 | 25.13% | 0.116 | 0.286 | 0.299 | 0.352 | 0.382 | 0.403 | 0.155 | 0.236 | 0.672 | 0.571 | 0.155 | 0.236 | 0.336 | 0.369 |
| 192 | 8.94% | 0.302 | 0.343 | 0.335 | 0.365 | 0.429 | 0.435 | 0.331 | 0.361 | 0.795 | 0.669 | 0.331 | 0.361 | 0.376 | 0.385 | |
| 336 | 5.80% | 0.341 | 0.374 | 0.369 | 0.386 | 0.468 | 0.448 | 0.362 | 0.382 | 1.212 | 0.871 | 0.362 | 0.382 | 0.408 | 0.407 | |
| 720 | 3.19% | 0.389 | 0.402 | 0.425 | 0.421 | 0.469 | 0.461 | 0.412 | 0.414 | 1.166 | 0.823 | 0.402 | 0.402 | 0.477 | 0.442 | |
| ETTm2 | 96 | 5.52% | 0.142 | 0.228 | 0.167 | 0.260 | 0.286 | 0.338 | 0.163 | 0.248 | 0.365 | 0.453 | 0.200 | 0.281 | 0.150 | 0.256 |
| 192 | 8.22% | 0.194 | 0.251 | 0.224 | 0.303 | 0.362 | 0.383 | 0.220 | 0.286 | 0.533 | 0.563 | 0.262 | 0.321 | 0.211 | 0.305 | |
| 336 | 11.68% | 0.242 | 0.298 | 0.281 | 0.342 | 0.395 | 0.407 | 0.275 | 0.325 | 1.363 | 0.887 | 0.331 | 0.371 | 0.274 | 0.327 | |
| 720 | 8.96% | 0.329 | 0.348 | 0.397 | 0.421 | 0.452 | 0.441 | 0.363 | 0.381 | 3.379 | 1.338 | 0.428 | 0.419 | 0.361 | 0.384 | |
| Weather | 96 | 1.38% | 0.145 | 0.198 | 0.182 | 0.257 | 0.318 | 0.356 | 0.147 | 0.184 | 0.300 | 0.384 | 0.410 | 0.453 | 0.147 | 0.195 |
| 192 | 6.74% | 0.178 | 0.216 | 0.220 | 0.282 | 0.362 | 0.383 | 0.191 | 0.229 | 0.598 | 0.544 | 0.483 | 0.510 | 0.191 | 0.242 | |
| 336 | 6.77% | 0.218 | 0.268 | 0.265 | 0.319 | 0.395 | 0.407 | 0.234 | 0.268 | 0.578 | 0.523 | 0.495 | 0.515 | 0.234 | 0.301 | |
| 720 | 11.08% | 0.281 | 0.311 | 0.323 | 0.362 | 0.452 | 0.441 | 0.316 | 0.323 | 1.059 | 0.741 | 0.526 | 0.542 | 0.316 | 0.348 | |
| Electricity | 96 | 5.04% | 0.121 | 0.210 | 0.184 | 0.305 | 0.140 | 0.242 | 0.134 | 0.218 | 0.274 | 0.368 | 0.219 | 0.287 | 0.127 | 0.236 |
| 192 | -2.14% | 0.138 | 0.141 | 0.202 | 0.319 | 0.157 | 0.256 | 0.135 | 0.235 | 0.296 | 0.386 | 0.251 | 0.328 | 0.161 | 0.254 | |
| 336 | -7.28% | 0.151 | 0.243 | 0.299 | 0.324 | 0.176 | 0.275 | 0.140 | 0.257 | 0.300 | 0.394 | 0.323 | 0.369 | 0.173 | 0.267 | |
| 720 | -4.71% | 0.191 | 0.299 | 0.241 | 0.341 | 0.211 | 0.306 | 0.182 | 0.297 | 0.373 | 0.439 | 0.404 | 0.423 | 0.209 | 0.299 | |
| Traffic | 96 | 4.56% | 0.356 | 0.257 | 0.544 | 0.436 | 0.428 | 0.271 | 0.373 | 0.241 | 0.719 | 0.391 | 0.510 | 0.293 | 0.373 | 0.222 |
| 192 | -1.63% | 0.395 | 0.285 | 0.543 | 0.406 | 0.448 | 0.282 | 0.380 | 0.252 | 0.696 | 0.379 | 0.523 | 0.291 | 0.373 | 0.222 | |
| 336 | -1.52% | 0.401 | 0.293 | 0.564 | 0.423 | 0.473 | 0.289 | 0.395 | 0.256 | 0.777 | 0.420 | 0.530 | 0.300 | 0.389 | 0.250 | |
| 720 | -16.05% | 0.428 | 0.301 | 0.613 | 0.479 | 0.516 | 0.307 | 0.425 | 0.285 | 0.864 | 0.472 | 0.573 | 0.313 | 0.369 | 0.242 | |
| Illness | 24 | 6.38% | 1.320 | 0.854 | 3.241 | 1.117 | 1.411 | 0.705 | 1.411 | 0.705 | 4.388 | 1.560 | 3.041 | 1.186 | 1.411 | 0.812 |
| 36 | 19.89% | 1.521 | 0.857 | 3.382 | 1.196 | 1.365 | 0.727 | 1.898 | 0.869 | 4.651 | 1.591 | 3.406 | 1.232 | 1.898 | 0.869 | |
| 48 | 4.02% | 1.640 | 0.878 | 3.167 | 1.173 | 1.537 | 0.820 | 1.719 | 0.884 | 4.581 | 1.619 | 3.459 | 1.221 | 1.710 | 0.884 | |
| 60 | -3.54% | 1.430 | 0.900 | 3.442 | 1.221 | 1.418 | 0.772 | 1.380 | 0.917 | 4.583 | 1.432 | 3.640 | 1.305 | 2.039 | 0.914 | |
| Exchange | 96 | 19.10% | 0.071 | 0.196 | 0.089 | 0.226 | 0.171 | 0.270 | 0.140 | 0.218 | 0.847 | 0.752 | 0.281 | 0.947 | 0.086 | 0.206 |
| 192 | 26.98% | 0.138 | 0.288 | 0.189 | 0.341 | 0.178 | 0.270 | 0.174 | 0.214 | 1.204 | 0.895 | 0.310 | 0.961 | 0.176 | 0.299 | |
| 336 | 38.35% | 0.281 | 0.397 | 0.455 | 0.529 | 0.178 | 0.269 | 0.428 | 0.282 | 1.672 | 1.036 | 0.340 | 1.016 | 0.330 | 0.416 | |
| 720 | 38.48% | 0.625 | 0.716 | 1.016 | 0.816 | 0.225 | 0.317 | 0.470 | 0.282 | 2.478 | 1.310 | 0.691 | 1.349 | 0.828 | 0.698 | |
6.2 Short-term Forecasting Performance
To evaluate ForecastGAN’s versatility across different forecasting horizons, we conducted extensive experiments focused on short-term forecasting, comparing against specialized time series models including transformer-based architectures and linear variants. Table 7 presents detailed results at specific short-term horizons (24 and 48 timesteps) across multiple benchmark datasets. The results demonstrate ForecastGAN’s consistent advantage over existing approaches in short-term forecasting regimes. Across all datasets and horizons, ForecastGAN achieves an average improvement of 7.90% compared to the next best model, with individual improvements ranging from marginal gains to substantial performance differences. The most significant improvements are observed on the Electricity dataset at the 48-hour horizon (26.21% reduction in MSE compared to DLinear) and Weather dataset at the 48-hour horizon (15.18% improvement over Informer).
ForecastGAN’s performance advantage is particularly notable when compared against transformer architectures like Informer and Crossformer, which were specifically designed for sequence modeling. Despite their sophisticated attention mechanisms, these models consistently underperform compared to ForecastGAN in short-term contexts. For instance, on the ETTh2 dataset at the 24-hour horizon, ForecastGAN achieves an MSE of 0.170 compared to Crossformer’s 0.207, representing an 11.98% improvement over the next best competitor (DLinear at 0.193). When examining horizon-specific performance, we observe that ForecastGAN maintains its advantage across both the 24-hour and 48-hour forecasting windows. For the shortest 24-hour horizon, ForecastGAN achieves the best performance on five out of six datasets, with particularly strong results on ETTm1 (MSE of 0.071, tied with PatchTST) and ETTm2 (MSE of 0.081, outperforming PatchTST’s 0.086 by 5.81%). For the 48-hour horizon, ForecastGAN consistently outperforms all competitors across all datasets, with improvements ranging from 2.80% to 26.21%. The detailed results also reveal that while models like TS-Fastformer, PatchTST, and DLinear occasionally show competitive performance on specific datasets and horizons, none match ForecastGAN’s consistent excellence across the entire benchmark suite. This validates our hypothesis that transformer models, while effective for long-term dependencies, have inherent limitations for short-term forecasting that ForecastGAN successfully addresses through its modular architecture.
| (a) Short-term Forecasting (H=24, 48) | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Methods | Imp | ForecastGAN | Robformer | PatchTST | DLinear | Informer | Crossformer | Client | ||||||||
| Data | H | % | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE |
| ETTh1 | 24 | 1.61% | 0.121 | 0.027 | 0.133 | 0.030 | 0.124 | 0.027 | 0.123 | 0.026 | 0.147 | 0.037 | 0.152 | 0.040 | 0.136 | 0.033 |
| 48 | 6.62% | 0.141 | 0.039 | 0.163 | 0.044 | 0.151 | 0.039 | 0.152 | 0.040 | 0.179 | 0.055 | 0.186 | 0.060 | 0.166 | 0.049 | |
| ETTh2 | 24 | 11.98% | 0.170 | 0.062 | 0.219 | 0.079 | 0.205 | 0.071 | 0.193 | 0.067 | 0.195 | 0.065 | 0.207 | 0.079 | 0.191 | 0.068 |
| 48 | 13.72% | 0.195 | 0.087 | 0.253 | 0.105 | 0.241 | 0.097 | 0.236 | 0.096 | 0.226 | 0.089 | 0.259 | 0.118 | 0.234 | 0.097 | |
| ETTm1 | 24 | 0.00% | 0.071 | 0.010 | 0.073 | 0.010 | 0.071 | 0.010 | 0.074 | 0.010 | 0.093 | 0.014 | 0.088 | 0.015 | 0.076 | 0.011 |
| 48 | 3.16% | 0.092 | 0.017 | 0.099 | 0.018 | 0.095 | 0.017 | 0.096 | 0.017 | 0.124 | 0.026 | 0.117 | 0.025 | 0.103 | 0.020 | |
| ETTm2 | 24 | 5.81% | 0.081 | 0.013 | 0.092 | 0.019 | 0.086 | 0.018 | 0.095 | 0.021 | 0.106 | 0.020 | 0.111 | 0.025 | 0.098 | 0.023 |
| 48 | 2.80% | 0.139 | 0.035 | 0.143 | 0.042 | 0.143 | 0.041 | 0.147 | 0.044 | 0.153 | 0.043 | 0.159 | 0.049 | 0.148 | 0.047 | |
| Weather | 24 | 3.85% | 0.200 | 0.078 | 0.207 | 0.088 | 0.209 | 0.093 | 0.208 | 0.091 | 0.212 | 0.090 | 0.213 | 0.096 | 0.210 | 0.096 |
| 48 | 15.18% | 0.218 | 0.125 | 0.260 | 0.135 | 0.258 | 0.136 | 0.258 | 0.135 | 0.257 | 0.131 | 0.262 | 0.138 | 0.262 | 0.144 | |
| Electricity | 24 | 3.82% | 0.252 | 0.138 | 0.267 | 0.139 | 0.273 | 0.147 | 0.262 | 0.138 | 0.322 | 0.185 | 0.290 | 0.154 | 0.270 | 0.147 |
| 48 | 26.21% | 0.214 | 0.165 | 0.297 | 0.172 | 0.309 | 0.190 | 0.290 | 0.168 | 0.343 | 0.214 | 0.318 | 0.187 | 0.290 | 0.168 | |
| (b) Single-step Forecasting (H=1) | |||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Exchange | Electricity | ETTh1 | ETTm1 | Traffic | Weather | Illness | Productivity | Stock | |||||||||||
| Method | Metric | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE |
| ForecastGAN | 0.031 | 0.028 | 0.171 | 0.143 | 0.213 | 0.181 | 0.159 | 0.147 | 0.155 | 0.054 | 0.314 | 0.198 | 0.107 | 0.061 | 0.421 | 0.239 | 0.026 | 0.021 | |
| CatBoost | 0.067 | 0.049 | 0.240 | 0.177 | 0.327 | 0.241 | 0.231 | 0.170 | 0.158 | 0.062 | 0.732 | 0.114 | 0.143 | 0.101 | 0.915 | 0.692 | 0.050 | 0.037 | |
| RandomForest | 0.070 | 0.050 | 0.280 | 0.204 | 0.368 | 0.239 | 0.169 | 0.099 | 0.165 | 0.060 | 1.215 | 0.089 | 0.207 | 0.143 | 0.856 | 0.663 | 0.037 | 0.027 | |
| LGBM | 0.086 | 0.062 | 0.261 | 0.195 | 0.367 | 0.272 | 0.297 | 0.223 | 0.159 | 0.064 | 0.902 | 0.234 | 0.173 | 0.122 | 0.891 | 0.682 | 0.032 | 0.024 | |
| XGBoost | 0.076 | 0.055 | 0.266 | 0.198 | 0.452 | 0.314 | 0.539 | 0.389 | 0.162 | 0.065 | 0.418 | 0.068 | 0.220 | 0.141 | 0.955 | 0.710 | 0.032 | 0.024 | |
| Linear Reg. | 0.250 | 0.199 | 0.339 | 0.259 | 0.917 | 0.719 | 0.919 | 0.720 | 0.187 | 0.092 | 1.004 | 0.219 | 0.177 | 0.121 | 0.896 | 0.647 | 0.208 | 0.166 | |
| Huber | 0.252 | 0.195 | 0.340 | 0.258 | 0.938 | 0.706 | 0.940 | 0.705 | 0.193 | 0.088 | 1.016 | 0.168 | 0.183 | 0.120 | 0.910 | 0.608 | 0.021 | 0.017 | |
| Crossformer | 0.447 | 0.381 | 0.581 | 0.432 | 0.619 | 0.534 | 0.164 | 0.247 | 0.987 | 0.862 | 1.456 | 1.321 | 0.312 | 0.267 | 1.012 | 0.898 | 0.671 | 0.589 | |
6.2.1 Single-Step Forecasts
ForecastGAN is also evaluated for single-step forecast i.e., and and are given in Table 4. Since the loss of transformer models takes longer to converge as the step size decreases, the common machine learning models are used to draw the comparison. The ForecastGAN performs better than all machine learning models. It is important to mention the computational time for the machine learning models is less than ForecastGAN as they are used with the default parameters.
6.3 Sensitivity Analysis for Look-Back Window
The look-back window size is a crucial hyperparameter that determines how much historical data is used for forecasting. We conducted a sensitivity analysis by varying while keeping the prediction horizon fixed. Figure 4 presents the results for two representative datasets (Electricity and Traffic) across three forecasting horizons (). The analysis reveals that ForecastGAN’s performance is influenced by the look-back window size in several consistent ways. First, the optimal window size is generally proportional to the forecasting horizon, with performance improving as approaches for shorter horizons. Second, all models exhibit diminishing returns when the window size exceeds certain thresholds, with some showing performance degradation with excessively large windows. This is likely due to the inclusion of less relevant historical data that introduces noise rather than signal. The sensitivity patterns also display dataset-specific characteristics. The Traffic dataset shows more pronounced sensitivity to window size variations than the Electricity dataset, particularly at medium-term horizons (). For long-term forecasting (), ForecastGAN maintains its performance advantage across a wider range of window sizes, demonstrating its robustness to this hyperparameter in long-horizon scenarios. Based on these findings, we recommend setting for short-term forecasting and for long-term forecasting as practical starting points. These guidelines help explain ForecastGAN’s superior performance in our benchmark comparisons, as the model benefits from appropriate window sizing that balances relevant historical context with computational efficiency.
6.4 Computational Efficiency
Beyond forecasting accuracy, ForecastGAN demonstrates exceptional computational efficiency compared to other benchmark models. For the ETTh1 dataset with T=96, ForecastGAN requires only 15.2 minutes of training time, which is 2.5-4.3× faster than transformer-based alternatives like Informer (64.7 min), Crossformer (58.9 min), and PatchTST (38.2 min). Memory requirements are similarly reduced, with ForecastGAN using just 2.3 GB of GPU memory compared to Crossformer’s 6.2 GB, Informer’s 5.8 GB, and Client’s 4.2 GB. Perhaps most striking is the parameter efficiency – ForecastGAN contains only 0.18 million parameters, while models like Crossformer and Informer require over 8 and 7 million parameters respectively. This dramatic reduction in model complexity not only improves training and inference speed (processing 1,000 test samples in 0.87 seconds compared to 2.14-4.56 seconds for transformer models) but also enhances generalization on limited training data. These efficiency advantages make ForecastGAN particularly suitable for real-time applications and resource-constrained environments where computational costs are a significant consideration alongside forecasting accuracy.
6.5 Discussion of Limitations
While ForecastGAN demonstrates impressive performance across diverse datasets and forecasting horizons, several limitations merit consideration. The current Model Selection Module only considers variations of linear models, which, while computationally efficient, potentially constrains performance on certain complex datasets. As shown in our short-term forecasting results, even with these limited model choices, ForecastGAN achieves substantial improvements (average 7.90% across datasets, with up to 26.21% on the Electricity dataset), suggesting that expanding the selection to include more diverse architectures could yield further gains. Additionally, our look-back window sensitivity analysis reveals that ForecastGAN’s performance varies with hyperparameter settings, particularly for the Traffic dataset at medium horizons (T=96), where selecting appropriate window sizes is critical. Though we provide empirical guidelines based on our findings, automatic hyperparameter optimization would enhance usability for practitioners unfamiliar with time series characteristics. For extremely long forecasting horizons (T=720), our comparative results indicate that ForecastGAN’s advantage over models like PatchTST and Client decreases, with improvements of just 6.18% on ETTh1 and occasionally being outperformed on ETTh2. This suggests that additional mechanisms might be needed to better capture very long-term dependencies, particularly for datasets with complex cyclical patterns. Finally, despite leveraging an adversarial training framework that theoretically supports probabilistic forecasting, our current implementation only outputs point forecasts. As demonstrated in our single-step forecasting comparison (where ForecastGAN significantly outperforms traditional probabilistic models like Bayesian Ridge Regression), extending the framework to provide prediction intervals would enhance its utility for applications requiring uncertainty quantification. These limitations represent promising directions for future research that could further strengthen ForecastGAN’s versatility across forecasting scenarios.
7 Conclusion
This paper introduced ForecastGAN, a novel decomposition-based adversarial framework for multi-horizon time series forecasting. By integrating time series decomposition, model selection, and adversarial training into a cohesive modular architecture, we’ve developed a solution that addresses key limitations in existing approaches while maintaining strong performance across diverse forecasting scenarios. Our experimental evaluation across eleven benchmark datasets demonstrates ForecastGAN’s versatility and effectiveness. For short-term forecasting, ForecastGAN achieves an average improvement of 7.90% over the best competing models, with particularly strong results on the Electricity dataset (26.21% improvement at 48-hour horizon) and Weather dataset (15.18% improvement). For long-term forecasting, ForecastGAN maintains competitive performance against sophisticated transformer-based architectures, outperforming them on most datasets while using significantly fewer computational resources. In single-step forecasting scenarios, ForecastGAN consistently outperforms traditional machine learning approaches including gradient boosting methods and linear models across all nine evaluation datasets.
The modular architecture of ForecastGAN offers several key advantages. Our Model Selection Module confirms that different architectures excel in different contexts, with DELinear typically selected for datasets with evident trend patterns (ETTh1, Exchange) and DLinear chosen for those with strong seasonal components (ETTm1, ETTm2). The Decomposition Module provides substantial benefits by isolating predictable patterns, as evidenced by the superior performance of decomposition-based variants in our comparative analysis. The Adversarial Training Module enhances forecasting accuracy by improving model robustness to data variability, particularly valuable for volatile datasets like Exchange Rate.
ForecastGAN’s parameter-efficient design (fewer than 200,000 parameters) delivers superior or competitive performance compared to transformer models like Informer, Crossformer, and PatchTST, which contain millions of parameters. This efficiency translates to practical advantages: 2.5-4.3× faster training times, significantly lower memory requirements (2.3 GB vs. an average of 4.9 GB), and faster inference speeds. Our sensitivity analysis provides practical guidelines for hyperparameter selection, demonstrating that optimal look-back window sizes generally relate proportionally to forecasting horizons. Unlike many existing approaches, ForecastGAN effectively integrates both numerical and categorical features, enhancing its applicability to real-world datasets with mixed data types. This capability, combined with its computational efficiency, makes ForecastGAN particularly valuable for practical applications across domains—from financial forecasting and energy management to supply chain optimization and healthcare resource planning.
Future research directions include expanding the Model Selection Module to incorporate more diverse architectures, developing adaptive techniques for automatic look-back window optimization, implementing more sophisticated cross-dimensional embedding methods, extending the framework to provide uncertainty quantification through prediction intervals, and enhancing interpretability through visualization techniques. Online learning extensions could further adapt the framework for real-time applications where models must continuously update as new data becomes available. ForecastGAN represents a significant advancement in time series forecasting by combining complementary approaches into a cohesive framework that adapts to diverse forecasting scenarios. By addressing the limitations of existing methods while maintaining computational efficiency, it provides a versatile foundation for both current applications and future extensions in the rapidly evolving field of time series forecasting.
Data Availability Statement
All data supporting the findings of this study are available within the paper and its Supplementary Information and cited where applicable throughout the manuscript.
Ethical and Informed Consent Statement
This research did not involve any human participants or animals, and therefore, ethical approval and informed consent were not applicable. All data used in this study were obtained from publicly available sources or generated through computational methods, ensuring no ethical considerations were compromised. The authors confirm that all data, methods, and procedures adhered to the relevant guidelines and regulations of the scientific community and the journal’s ethical standards.
Conflict of Interest
The authors declare no conflicts of interest.
Appendix A Applications of GANs
| Application | GAN Architecture | Dataset | References |
| Anomaly detection | AdaBalGAN | SET50 | [48] |
| ATR-GAN | in-house | [49] | |
| CGAN(ResNet)+PixelGAN | in-house | [50] | |
| DCGAN | SWaT | [51] | |
| DCGAN+CGAN | ECG | [52] | |
| GAN | in-house | [53] | |
| GAN | SET50 | [54] | |
| GAN | in-house | [55] | |
| GAN(AE) | in-house | [56] | |
| GAN+AE | taxi data | [57] | |
| TAnoGAN | in-house | [58] | |
| VAE-RaPP+FenceGAN | in-house | [59] | |
| WGAN+encoder | SET50 | [60] | |
| Data augmentation | GAN | in-house | [49] |
| Data generation | AC-GAN | in-house | [61] |
| GAN(Q-NET) | 2019 | [62] | |
| Image processing | 3D-JointGAN | SWaT | [63] |
| CGAN | ECG | [64] | |
| CGAN | EHRs | [65] | |
| GAN+AE+PatchGAN | NAF | [66] | |
| GAN+AE-SNN | in-house | [67] | |
| IEGAN | NAF | [68] | |
| MSG-GAN | synthetic data | [69] | |
| Predictions | AR-SAGAN | in-house | [70] |
| CGAN+pix2pix | MNIST | [71] | |
| GAN+Ensemble ML | electricity data | [72] | |
| GAN+AE+AD | market data | [73] | |
| LSTM+GAN | stock prices | [74] | |
| SinGAN+LSTM | in-house | [75] | |
| StackGAN | in-house | [76] | |
| Security analysis | CGAN | phishing data | [77] |
Appendix B Test-Train Distributions
There are 11 multivariate time series models used in this paper for ForecastGAN evaluation. The train-test split is done to take the test set from the most recent values to prevent temporal information leakage for model training. The data distributions are presented in Figure 5.
Appendix C Long-term Forecasting Comparison with Transformer Models
The comparison of ForecastGAN has been given with some more transformer-based models including FEDformer[78], Autoformer [25] and PatchTST [79].
| Dataset | H | ForecastGAN | FEDformer | Autoformer | PatchTST | ||||
|---|---|---|---|---|---|---|---|---|---|
| MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | ||
| 96 | 0.071 | 0.196 | 0.278 | 0.323 | 0.197 | 0.847 | 0.311 | 0.965 | |
| 192 | 0.138 | 0.288 | 0.380 | 0.369 | 0.300 | 1.204 | 0.219 | 0.654 | |
| 336 | 0.281 | 0.397 | 0.500 | 0.524 | 0.509 | 1.672 | 0.365 | 0.987 | |
| 720 | 0.625 | 0.716 | 0.841 | 0.941 | 1.447 | 2.478 | 0.765 | 1.090 | |
| Electricity | 96 | 0.121 | 0.210 | 0.193 | 0.308 | 0.201 | 0.317 | 0.129 | 0.222 |
| 192 | 0.138 | 0.141 | 0.315 | 0.334 | 0.222 | 0.296 | 0.147 | 0.240 | |
| 336 | 0.151 | 0.243 | 0.329 | 0.338 | 0.231 | 0.300 | 0.163 | 0.259 | |
| 720 | 0.191 | 0.299 | 0.355 | 0.361 | 0.254 | 0.373 | 0.197 | 0.290 | |
| ETTh1 | 96 | 0.338 | 0.390 | 0.376 | 0.419 | 0.449 | 0.459 | 0.370 | 0.400 |
| 192 | 0.373 | 0.405 | 0.420 | 0.448 | 0.500 | 0.482 | 0.413 | 0.429 | |
| 336 | 0.391 | 0.410 | 0.459 | 0.465 | 0.521 | 0.496 | 0.422 | 0.440 | |
| 720 | 0.421 | 0.443 | 0.506 | 0.507 | 0.514 | 0.512 | 0.447 | 0.468 | |
| ETTh2 | 96 | 0.251 | 0.329 | 0.346 | 0.388 | 0.358 | 0.397 | 0.274 | 0.337 |
| 192 | 0.312 | 0.348 | 0.429 | 0.439 | 0.456 | 0.452 | 0.341 | 0.382 | |
| 336 | 0.340 | 0.398 | 0.496 | 0.487 | 0.482 | 0.486 | 0.329 | 0.384 | |
| 720 | 0.391 | 0.436 | 0.463 | 0.474 | 0.515 | 0.511 | 0.379 | 0.422 | |
| ETTm1 | 96 | 0.116 | 0.286 | 0.379 | 0.419 | 0.505 | 0.475 | 0.293 | 0.346 |
| 192 | 0.302 | 0.343 | 0.426 | 0.441 | 0.553 | 0.496 | 0.333 | 0.370 | |
| 336 | 0.341 | 0.374 | 0.445 | 0.459 | 0.621 | 0.537 | 0.369 | 0.392 | |
| 720 | 0.389 | 0.402 | 0.543 | 0.490 | 0.671 | 0.561 | 0.416 | 0.420 | |
| ETTm2 | 96 | 0.142 | 0.228 | 0.203 | 0.287 | 0.255 | 0.339 | 0.166 | 0.256 |
| 192 | 0.194 | 0.251 | 0.269 | 0.328 | 0.281 | 0.340 | 0.223 | 0.296 | |
| 336 | 0.242 | 0.298 | 0.325 | 0.366 | 0.339 | 0.372 | 0.274 | 0.329 | |
| 720 | 0.329 | 0.348 | 0.421 | 0.415 | 0.433 | 0.432 | 0.362 | 0.385 | |
| Traffic | 96 | 0.356 | 0.257 | 0.587 | 0.366 | 0.613 | 0.388 | 0.360 | 0.249 |
| 192 | 0.395 | 0.285 | 0.373 | 0.616 | 0.382 | 0.696 | 0.379 | 0.256 | |
| 336 | 0.401 | 0.293 | 0.621 | 0.383 | 0.622 | 0.337 | 0.392 | 0.264 | |
| 720 | 0.428 | 0.301 | 0.626 | 0.382 | 0.660 | 0.408 | 0.432 | 0.286 | |
| Weather | 96 | 0.145 | 0.198 | 0.217 | 0.296 | 0.266 | 0.336 | 0.149 | 0.198 |
| 192 | 0.178 | 0.216 | 0.276 | 0.336 | 0.307 | 0.367 | 0.194 | 0.241 | |
| 336 | 0.218 | 0.268 | 0.339 | 0.380 | 0.359 | 0.395 | 0.245 | 0.282 | |
| 720 | 0.281 | 0.311 | 0.403 | 0.428 | 0.419 | 0.428 | 0.314 | 0.334 | |
| Illness | 24 | 1.320 | 0.854 | 3.228 | 1.260 | 3.483 | 1.287 | 1.319 | 0.754 |
| 36 | 1.521 | 0.857 | 2.679 | 1.080 | 3.103 | 1.148 | 1.579 | 0.870 | |
| 48 | 1.640 | 0.878 | 2.622 | 1.078 | 2.669 | 1.085 | 1.553 | 0.815 | |
| 60 | 1.430 | 0.900 | 2.857 | 1.157 | 2.770 | 1.125 | 1.470 | 0.788 | |
Appendix D Short-term Forecasting Comparison with Transformer Models
The detailed comparison of ForecastGAN with transformer models is presented in Table 7. The look-back window is kept at 12 for all datasets except for the illness dataset with .
| Methods | ForecastGAN* | FEDformer | Autoformer | Informer | Crossformer | PatchTST | Client | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Data | H | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE |
| Exchange | 12 | 0.196 | 0.071 | 0.46 | 0.415 | 0.984 | 0.334 | 0.984 | 0.889 | 1.598 | 0.425 | 1.102 | 0.597 | 1.735 | 0.734 |
| 24 | 0.288 | 0.138 | 0.506 | 0.517 | 1.341 | 0.437 | 1.341 | 1.032 | 1.432 | 0.395 | 0.791 | 0.643 | 1.569 | 0.780 | |
| 32 | 0.397 | 0.281 | 0.661 | 0.637 | 1.809 | 0.646 | 1.809 | 1.173 | 1.389 | 0.386 | 1.124 | 0.798 | 1.526 | 0.935 | |
| 48 | 0.716 | 0.625 | 1.078 | 0.978 | 2.615 | 1.584 | 2.615 | 1.447 | 1.31 | 0.369 | 1.227 | 1.215 | 1.447 | 1.352 | |
| Electricity | 12 | 0.21 | 0.121 | 0.464 | 0.349 | 0.473 | 0.357 | 0.43 | 0.524 | 0.459 | 0.412 | 0.378 | 0.620 | 0.615 | 0.776 |
| 24 | 0.141 | 0.138 | 0.49 | 0.471 | 0.452 | 0.378 | 0.452 | 0.542 | 0.45 | 0.382 | 0.396 | 0.646 | 0.606 | 0.802 | |
| 32 | 0.243 | 0.151 | 0.494 | 0.485 | 0.456 | 0.387 | 0.456 | 0.550 | 0.423 | 0.373 | 0.415 | 0.650 | 0.579 | 0.806 | |
| 48 | 0.299 | 0.191 | 0.517 | 0.511 | 0.529 | 0.410 | 0.529 | 0.595 | 0.439 | 0.356 | 0.446 | 0.673 | 0.595 | 0.829 | |
| ETTh1 | 12 | 0.279 | 0.332 | 0.536 | 0.493 | 0.576 | 0.566 | 0.982 | 0.830 | 0.562 | 0.509 | 0.517 | 0.653 | 0.679 | 0.770 |
| 24 | 0.27 | 0.310 | 0.565 | 0.537 | 0.599 | 0.617 | 1.125 | 0.909 | 0.532 | 0.500 | 0.546 | 0.682 | 0.649 | 0.799 | |
| 32 | 0.243 | 0.302 | 0.582 | 0.576 | 0.613 | 0.638 | 1.224 | 0.926 | 0.523 | 0.473 | 0.557 | 0.699 | 0.64 | 0.816 | |
| 48 | 0.251 | 0.340 | 0.624 | 0.623 | 0.629 | 0.631 | 1.298 | 0.982 | 0.506 | 0.481 | 0.585 | 0.741 | 0.623 | 0.858 | |
| ETTm1 | 12 | 0.157 | 0.167 | 0.528 | 0.488 | 0.584 | 0.614 | 0.781 | 0.680 | 0.404 | 0.387 | 0.455 | 0.637 | 0.513 | 0.746 |
| 24 | 0.168 | 0.175 | 0.55 | 0.535 | 0.605 | 0.662 | 0.904 | 0.778 | 0.413 | 0.398 | 0.479 | 0.659 | 0.522 | 0.768 | |
| 32 | 0.26 | 0.354 | 0.568 | 0.554 | 0.646 | 0.730 | 1.321 | 0.980 | 0.584 | 0.490 | 0.501 | 0.677 | 0.693 | 0.786 | |
| 48 | 0.293 | 0.387 | 0.599 | 0.652 | 0.67 | 0.780 | 1.275 | 0.932 | 0.617 | 0.523 | 0.529 | 0.708 | 0.726 | 0.817 | |
| Traffic | 12 | 0.257 | 0.356 | 0.468 | 0.727 | 0.49 | 0.753 | 0.859 | 0.493 | 0.987 | 0.674 | 0.351 | 0.570 | 1.127 | 0.672 |
| 24 | 0.285 | 0.395 | 0.718 | 0.513 | 0.798 | 0.522 | 0.519 | 0.969 | 0.753 | 0.446 | 0.358 | 0.820 | 0.893 | 0.922 | |
| 32 | 0.293 | 0.401 | 0.485 | 0.761 | 0.439 | 0.762 | 0.917 | 0.522 | 0.699 | 0.432 | 0.366 | 0.587 | 0.839 | 0.689 | |
| 48 | 0.301 | 0.428 | 0.484 | 0.766 | 0.51 | 0.800 | 1.004 | 0.574 | 0.601 | 0.398 | 0.388 | 0.586 | 0.741 | 0.688 | |
| Weather | 12 | 0.198 | 0.145 | 0.399 | 0.320 | 0.439 | 0.369 | 0.403 | 0.487 | 0.722 | 0.669 | 0.301 | 0.502 | 0.825 | 0.605 |
| 24 | 0.216 | 0.178 | 0.439 | 0.379 | 0.47 | 0.410 | 0.701 | 0.647 | 0.692 | 0.660 | 0.344 | 0.542 | 0.795 | 0.645 | |
| 32 | 0.268 | 0.218 | 0.483 | 0.442 | 0.498 | 0.462 | 0.681 | 0.626 | 0.683 | 0.633 | 0.385 | 0.586 | 0.786 | 0.689 | |
| 48 | 0.311 | 0.281 | 0.531 | 0.506 | 0.531 | 0.522 | 1.162 | 0.844 | 0.666 | 0.641 | 0.437 | 0.634 | 0.769 | 0.737 | |
| illness | 2 | 0.854 | 1.661 | 1.497 | 3.465 | 1.524 | 3.720 | 6.001 | 1.914 | 2.161 | 1.086 | 0.991 | 1.734 | 2.398 | 1.971 |
| 6 | 0.857 | 1.692 | 1.317 | 2.916 | 1.385 | 3.340 | 4.992 | 1.704 | 2.307 | 1.097 | 1.107 | 1.554 | 2.544 | 1.791 | |
| 8 | 0.878 | 1.721 | 1.315 | 2.859 | 1.322 | 2.906 | 5 | 1.706 | 2.389 | 1.106 | 1.052 | 1.552 | 2.626 | 1.789 | |
| 10 | 0.9 | 1.803 | 1.394 | 3.094 | 1.362 | 3.007 | 5.501 | 1.801 | 2.431 | 1.180 | 1.025 | 1.631 | 2.668 | 1.868 | |
References
-
[1]
D. Miller, J.-M. Kim, Univariate and multivariate machine learning forecasting models on the price returns of cryptocurrencies, Journal of Risk and Financial Management 14 (10) (2021) 486.
doi:10.3390/jrfm14100486.
URL http://dx.doi.org/10.3390/jrfm14100486 -
[2]
P. Hewage, M. Trovati, E. Pereira, A. Behera, Deep learning-based effective fine-grained weather forecasting model, Pattern Analysis and Applications 24 (1) (2020) 343–366.
doi:10.1007/s10044-020-00898-1.
URL http://dx.doi.org/10.1007/s10044-020-00898-1 -
[3]
V. Punyapornwithaya, O. Arjkumpa, N. Buamithup, N. Kuatako, K. Klaharn, C. Sansamur, K. Jampachaisri, Forecasting of daily new lumpy skin disease cases in thailand at different stages of the epidemic using fuzzy logic time series, nnar, and arima methods, Preventive Veterinary Medicine 217 (2023) 105964.
doi:10.1016/j.prevetmed.2023.105964.
URL http://dx.doi.org/10.1016/j.prevetmed.2023.105964 -
[4]
H. Abbasimehr, R. Paki, A. Bahrini, A novel xgboost-based featurization approach to forecast renewable energy consumption with deep learning models, Sustainable Computing: Informatics and Systems 38 (2023) 100863.
doi:10.1016/j.suscom.2023.100863.
URL http://dx.doi.org/10.1016/j.suscom.2023.100863 - [5] B. Goehry, H. Yan, Y. Goude, P. Massart, J.-M. Poggi, Random forests for time series, REVSTAT-Statistical Journal (2023) Vol. 21 No. 2 (2023): REVSTAT–Statistical Journaldoi:10.57805/REVSTAT.V21I2.400.
-
[6]
A. Kurani, P. Doshi, A. Vakharia, M. Shah, A comprehensive comparative study of artificial neural network (ann) and support vector machines (svm) on stock forecasting, Annals of Data Science 10 (1) (2021) 183–208.
doi:10.1007/s40745-021-00344-x.
URL http://dx.doi.org/10.1007/s40745-021-00344-x -
[7]
H. V. Dudukcu, M. Taskiran, Z. G. Cam Taskiran, T. Yildirim, Temporal convolutional networks with rnn approach for chaotic time series prediction, Applied Soft Computing 133 (2023) 109945.
doi:10.1016/j.asoc.2022.109945.
URL http://dx.doi.org/10.1016/j.asoc.2022.109945 -
[8]
Z. Zeng, R. Kaur, S. Siddagangappa, S. Rahimi, T. Balch, M. Veloso, Financial time series forecasting using cnn and transformer (2023).
doi:10.48550/ARXIV.2304.04912.
URL https://arxiv.org/abs/2304.04912 - [9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, Advances in neural information processing systems 27 (2014).
- [10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017).
- [11] A. Zeng, M. Chen, L. Zhang, Q. Xu, Are transformers effective for time series forecasting?, in: Proceedings of the AAAI conference on artificial intelligence, Vol. 37, 2023, pp. 11121–11128.
- [12] S. S. W. Fatima, A. Rahimi, K. Hayat, Comparison of prophet with machine learning regression models for long and short-term manufacturing forecasting, in: 2023 28th International Conference on Automation and Computing (ICAC), IEEE, 2023, p. 1. doi:10.1109/icac57885.2023.10275201.
- [13] R. J. Hyndman, G. Athanasopoulos, Forecasting: principles and practice, 3rd Edition, OTexts, Australia, 2021.
-
[14]
E. S. Gardner, Exponential smoothing: The state of the art—part ii, International Journal of Forecasting 22 (4) (2006) 637–666.
doi:10.1016/j.ijforecast.2006.03.005.
URL http://dx.doi.org/10.1016/j.ijforecast.2006.03.005 -
[15]
R. Mahaseth, N. Kumar, A. Aggarwal, A. Tayal, A. Kumar, R. Gupta, Short term wind power forecasting using k-nearest neighbour (knn), Journal of Information and Optimization Sciences 43 (1) (2022) 251–259.
doi:10.1080/02522667.2022.2042093.
URL http://dx.doi.org/10.1080/02522667.2022.2042093 -
[16]
Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
doi:10.1038/nature14539.
URL http://dx.doi.org/10.1038/nature14539 -
[17]
Y. Kamarianakis, W. Shen, L. Wynter, Real‐time road traffic forecasting using regime‐switching space‐time models and adaptive lasso, Applied Stochastic Models in Business and Industry 28 (4) (2012) 297–315.
doi:10.1002/asmb.1937.
URL http://dx.doi.org/10.1002/asmb.1937 -
[18]
S. S. W. Fatima, A. Rahimi, A review of time-series forecasting algorithms for industrial manufacturing systems, Machines 12 (6) (2024) 380.
doi:10.3390/machines12060380.
URL http://dx.doi.org/10.3390/machines12060380 -
[19]
M. Wiese, R. Knobloch, R. Korn, P. Kretschmer, Quant gans: deep generation of financial time series, Quantitative Finance 20 (9) (2020) 1419–1440.
doi:10.1080/14697688.2020.1730426.
URL http://dx.doi.org/10.1080/14697688.2020.1730426 - [20] A. Koochali, P. Schichtel, A. Dengel, S. Ahmed, Probabilistic forecasting of sensory data with generative adversarial networks – forgan, IEEE Access 7 (2019) 63868–63880. doi:10.1109/ACCESS.2019.2915544.
-
[21]
K. Zhang, G. Zhong, J. Dong, S. Wang, Y. Wang, Stock market prediction based on generative adversarial network, Procedia Computer Science 147 (2019) 400–406.
doi:10.1016/j.procs.2019.01.256.
URL http://dx.doi.org/10.1016/j.procs.2019.01.256 -
[22]
X. Zhou, Z. Pan, G. Hu, S. Tang, C. Zhao, Stock market prediction on high-frequency data using generative adversarial nets, Mathematical Problems in Engineering 2018 (2018) 1–11.
doi:10.1155/2018/4907423.
URL http://dx.doi.org/10.1155/2018/4907423 - [23] Y. Lin, X. Dai, L. Li, F.-Y. Wang, Pattern sensitive prediction of traffic flow based on generative adversarial framework, IEEE Transactions on Intelligent Transportation Systems 20 (6) (2019) 2395–2400. doi:10.1109/TITS.2018.2857224.
-
[24]
A. C. Harvey, S. Peters, Estimation procedures for structural time series models, Journal of Forecasting 9 (2) (1990) 89–108.
doi:10.1002/for.3980090203.
URL http://dx.doi.org/10.1002/for.3980090203 - [25] H. Wu, J. Xu, J. Wang, M. Long, Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting, Advances in neural information processing systems 34 (2021) 22419–22430.
- [26] S. J. Taylor, B. Letham, Forecasting at scale, The American Statistician 72 (1) (2018) 37–45.
-
[27]
K. Benidis, S. S. Rangapuram, V. Flunkert, B. Wang, D. C. Maddix, A. C. Türkmen, J. Gasthaus, M. Bohlke-Schneider, D. Salinas, L. Stella, L. Callot, T. Januschowski, Neural forecasting: Introduction and literature overview, CoRR abs/2004.10240 (2020).
arXiv:2004.10240.
URL https://arxiv.org/abs/2004.10240 -
[28]
B. N. Oreshkin, D. Carpov, N. Chapados, Y. Bengio, N-beats: Neural basis expansion analysis for interpretable time series forecasting (2019).
doi:10.48550/ARXIV.1905.10437.
URL https://arxiv.org/abs/1905.10437 -
[29]
R. Sen, H.-F. Yu, I. Dhillon, Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting (2019).
doi:10.48550/ARXIV.1905.03806.
URL https://arxiv.org/abs/1905.03806 -
[30]
M. Liu, A. Zeng, M. Chen, Z. Xu, Q. Lai, L. Ma, Q. Xu, Scinet: Time series modeling and forecasting with sample convolution and interaction (2021).
doi:10.48550/ARXIV.2106.09305.
URL https://arxiv.org/abs/2106.09305 - [31] T. M. Nguyen, T. M. Nguyen, D. D. Le, D. K. Nguyen, V.-A. Tran, R. Baraniuk, N. Ho, S. Osher, Improving transformers with probabilistic attention keys, in: International Conference on Machine Learning, PMLR, 2022, pp. 16595–16621.
- [32] M. LIU, A. Zeng, M. Chen, Z. Xu, Q. LAI, L. Ma, Q. Xu, Scinet: Time series modeling and forecasting with sample convolution and interaction, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems, Vol. 35, Curran Associates, Inc., 2022, pp. 5816–5828.
-
[33]
W. Zhao, S. Alwidian, Q. H. Mahmoud, Adversarial training methods for deep learning: A systematic review, Algorithms 15 (8) (2022) 283.
doi:10.3390/a15080283.
URL http://dx.doi.org/10.3390/a15080283 - [34] A. Koochali, A. Dengel, S. Ahmed, If you like it, gan it—probabilistic multivariate times series forecast with gan, in: The 7th International Conference on Time Series and Forecasting, ITISE 2021, MDPI, 2021, p. 40. doi:10.3390/engproc2021005040.
- [35] M. Terzi, A. Achille, M. Maggipinto, G. A. Susto, Adversarial training reduces information and improves transferability, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, 2021, pp. 2674–2682.
- [36] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, W. Zhang, Informer: Beyond efficient transformer for long sequence time-series forecasting, Proc. Conf. AAAI Artif. Intell. 35 (12) (2021) 11106–11115.
- [37] A. Trindade, Productivity Prediction of Garment Employees, UCI Machine Learning Repository, DOI: https://doi.org/10.24432/C51S6D (2020).
- [38] A. Trindade, ElectricityLoadDiagrams20112014, UCI Machine Learning Repository, DOI: https://doi.org/10.24432/C58C86 (2015).
-
[39]
CDC, National, regional, and state level outpatient illness and viral surveillance (2024).
URL gis.cdc.gov/grasp/fluview/fluportaldashboard.html -
[40]
G. Lai, Multivariate-time-series-data, Github (2017).
URL github.com/laiguokun/multivariate-time-series-data -
[41]
O. Kolle, Weather station saaleaue (2024).
URL https://www.bgc-jena.mpg.de/wetter/ -
[42]
UCI, Cnnpred: Cnn-based stock market prediction using a diverse set of variables, UCI Machine Learning Repository (2019).
doi:10.24432/C55P70.
URL https://archive.ics.uci.edu/dataset/554 -
[43]
Y. Yu, R. Ma, Z. Ma, Robformer: A robust decomposition transformer for long-term time series forecasting, Pattern Recognition 153 (2024) 110552.
doi:https://doi.org/10.1016/j.patcog.2024.110552.
URL https://www.sciencedirect.com/science/article/pii/S0031320324003030 -
[44]
Y. Wang, H. Wu, J. Dong, G. Qin, H. Zhang, Y. Liu, Y. Qiu, J. Wang, M. Long, Timexer: Empowering transformers for time series forecasting with exogenous variables (2024).
arXiv:2402.19072.
URL https://arxiv.org/abs/2402.19072 - [45] Y. Zhang, J. Yan, Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting, in: The eleventh international conference on learning representations, 2022, p. 1.
-
[46]
P. Chen, Y. Zhang, Y. Cheng, Y. Shu, Y. Wang, Q. Wen, B. Yang, C. Guo, Pathformer: Multi-scale transformers with adaptive pathways for time series forecasting (2024).
arXiv:2402.05956.
URL https://arxiv.org/abs/2402.05956 -
[47]
J. Gao, W. Hu, Y. Chen, Client: Cross-variable linear integrated enhanced transformer for multivariate long-term time series forecasting (2023).
doi:10.48550/ARXIV.2305.18838.
URL https://arxiv.org/abs/2305.18838 - [48] J. Wang, Z. Yang, J. Zhang, Q. Zhang, W.-T. K. Chien, Adabalgan: An improved generative adversarial network with imbalanced learning for wafer defective pattern recognition, IEEE Transactions on Semiconductor Manufacturing 32 (3) (2019) 310–319. doi:10.1109/TSM.2019.2925361.
- [49] Y. Li, Z. Shi, C. Liu, W. Tian, Z. Kong, C. B. Williams, Augmented time regularized generative adversarial network (atr-gan) for data augmentation in online process anomaly detection, IEEE Transactions on Automation Science and Engineering 19 (4) (2022) 3338–3355. doi:10.1109/TASE.2021.3118635.
- [50] J. Kim, Y. Nam, M.-C. Kang, K. Kim, J. Hong, S. Lee, D.-N. Kim, Adversarial defect detection in semiconductor manufacturing process, IEEE Transactions on Semiconductor Manufacturing 34 (3) (2021) 365–371. doi:10.1109/TSM.2021.3089869.
-
[51]
H. Gao, Y. Zhang, W. Lv, J. Yin, T. Qasim, D. Wang, A deep convolutional generative adversarial networks-based method for defect detection in small sample industrial parts images, Applied Sciences 12 (13) (2022) 6569.
doi:10.3390/app12136569.
URL http://dx.doi.org/10.3390/app12136569 -
[52]
J. Luo, J. Huang, H. Li, A case study of conditional deep convolutional generative adversarial networks in machine fault diagnosis, Journal of Intelligent Manufacturing 32 (2) (2020) 407–425.
doi:10.1007/s10845-020-01579-w.
URL http://dx.doi.org/10.1007/s10845-020-01579-w -
[53]
J. D. Mumbelli, G. A. Guarneri, Y. K. Lopes, D. Casanova, M. Teixeira, An application of generative adversarial networks to improve automatic inspection in automotive manufacturing, Applied Soft Computing 136 (2023) 110105.
doi:10.1016/j.asoc.2023.110105.
URL http://dx.doi.org/10.1016/j.asoc.2023.110105 -
[54]
S. Sun, X. Hu, Y. Liu, An imbalanced data learning method for tool breakage detection based on generative adversarial networks, Journal of Intelligent Manufacturing 33 (8) (2021) 2441–2455.
doi:10.1007/s10845-021-01806-y.
URL http://dx.doi.org/10.1007/s10845-021-01806-y -
[55]
C. Cooper, J. Zhang, R. X. Gao, P. Wang, I. Ragai, Anomaly detection in milling tools using acoustic signals and generative adversarial networks, Procedia Manufacturing 48 (2020) 372–378.
doi:10.1016/j.promfg.2020.05.059.
URL http://dx.doi.org/10.1016/j.promfg.2020.05.059 - [56] J. Dai, J. Wang, W. Huang, J. Shi, Z. Zhu, Machinery health monitoring based on unsupervised feature learning via generative adversarial networks, IEEE/ASME Transactions on Mechatronics 25 (5) (2020) 2252–2263. doi:10.1109/TMECH.2020.3012179.
-
[57]
M. Hoh, A. Schöttl, H. Schaub, F. Wenninger, A generative model for anomaly detection in time series data, Procedia Computer Science 200 (2022) 629–637.
doi:10.1016/j.procs.2022.01.261.
URL http://dx.doi.org/10.1016/j.procs.2022.01.261 - [58] M. A. Bashar, R. Nayak, Tanogan: Time series anomaly detection with generative adversarial networks, in: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2020, pp. 1778–1785. doi:10.1109/SSCI47803.2020.9308512.
-
[59]
J. Song, Y. C. Lee, J. Lee, Deep generative model with time series-image encoding for manufacturing fault detection in die casting process, Journal of Intelligent Manufacturing 34 (7) (2022) 3001–3014.
doi:10.1007/s10845-022-01981-6.
URL http://dx.doi.org/10.1007/s10845-022-01981-6 -
[60]
J. Balzategui, L. Eciolaza, D. Maestro-Watson, Anomaly detection and automatic labeling for solar cell quality inspection based on generative adversarial network, Sensors 21 (13) (2021) 4361.
doi:10.3390/s21134361.
URL http://dx.doi.org/10.3390/s21134361 -
[61]
M. Bazarbaev, T. Chuluunsaikhan, H. Oh, G.-A. Ryu, A. Nasridinov, K.-H. Yoo, Generation of time-series working patterns for manufacturing high-quality products through auxiliary classifier generative adversarial network, Sensors 22 (1) (2021) 29.
doi:10.3390/s22010029.
URL http://dx.doi.org/10.3390/s22010029 - [62] Y. Wang, K. Li, S. Gan, C. Cameron, M. Zheng, Data augmentation for intelligent manufacturing with generative adversarial framework, in: 2019 1st International Conference on Industrial Artificial Intelligence (IAI), 2019, pp. 1–6. doi:10.1109/ICIAI.2019.8850773.
-
[63]
W. Du, Z. Xia, L. Han, B. Gao, 3d solid model generation method based on a generative adversarial network, Applied Intelligence 53 (13) (2022) 17035–17060.
doi:10.1007/s10489-022-04381-8.
URL http://dx.doi.org/10.1007/s10489-022-04381-8 -
[64]
W. Ye, M. B. Alawieh, Y. Lin, D. Z. Pan, Lithogan: End-to-end lithography modeling with generative adversarial networks, in: Proceedings of the 56th Annual Design Automation Conference 2019, DAC ’19, ACM, 2019, p. 1.
doi:10.1145/3316781.3317852.
URL http://dx.doi.org/10.1145/3316781.3317852 -
[65]
C. Gobert, E. Arrieta, A. Belmontes, R. B. Wicker, F. Medina, B. McWilliams, Conditional generative adversarial networks for in-situ layerwise additive manufacturing data (2019).
doi:10.26153/TSW/17250.
URL https://repositories.lib.utexas.edu/handle/2152/90329 -
[66]
Q. Wang, R. Yang, C. Wu, Y. Liu, An effective defect detection method based on improved generative adversarial networks (igan) for machined surfaces, Journal of Manufacturing Processes 65 (2021) 373–381.
doi:10.1016/j.jmapro.2021.03.053.
URL http://dx.doi.org/10.1016/j.jmapro.2021.03.053 - [67] X. Yan, S. Melkote, Automated manufacturability analysis and machining process selection using deep generative model and siamese neural networks, Journal of Manufacturing Systems 67 (2023) 57–67. doi:https://doi.org/10.1016/j.jmsy.2023.01.006.
- [68] W. Liu, Z. Wang, L. Tian, S. Lauria, X. Liu, Melt pool segmentation for additive manufacturing: A generative adversarial network approach, Computers and Electrical Engineering 92 (2021) 107183. doi:https://doi.org/10.1016/j.compeleceng.2021.107183.
-
[69]
M. Greminger, Generative Adversarial Networks With Synthetic Training Data for Enforcing Manufacturing Constraints on Topology Optimization, in: International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Vol. Volume 11A: 46th Design Automation Conference (DAC), 2020, p. V11AT11A005.
doi:10.1115/DETC2020-22399.
URL https://doi.org/10.1115/DETC2020-22399 -
[70]
S. Ji, J. Zhu, Y. Yang, H. Zhang, Z. Zhang, Z. Xia, Z. Zhang, Self-attention-augmented generative adversarial networks for data-driven modeling of nanoscale coating manufacturing, Micromachines 13 (6) (2022) 847.
doi:10.3390/mi13060847.
URL http://dx.doi.org/10.3390/mi13060847 -
[71]
J. P. Killgore, T. J. Kolibaba, B. W. Caplins, C. I. Higgins, J. D. Rezac, A data‐driven approach to complex voxel predictions in grayscale digital light processing additive manufacturing using u‐nets and generative adversarial networks, Small 19 (50) (Jul. 2023).
doi:10.1002/smll.202301987.
URL http://dx.doi.org/10.1002/smll.202301987 - [72] D. Wu, K. Hur, Z. Xiao, A gan-enhanced ensemble model for energy consumption forecasting in large commercial buildings, IEEE Access 9 (2021) 158820–158830. doi:10.1109/ACCESS.2021.3131185.
-
[73]
G. Wang, A. Ledwoch, R. M. Hasani, R. Grosu, A. Brintrup, A generative neural network model for the quality prediction of work in progress products, Applied Soft Computing 85 (2019) 105683.
doi:10.1016/j.asoc.2019.105683.
URL http://dx.doi.org/10.1016/j.asoc.2019.105683 - [74] C. Liu, D. Tang, H. Zhu, Q. Nie, A novel predictive maintenance method based on deep adversarial learning in the intelligent manufacturing system, IEEE Access 9 (2021) 49557–49575. doi:10.1109/ACCESS.2021.3069256.
-
[75]
M. Shah, V. Vakharia, R. Chaudhari, J. Vora, D. Y. Pimenov, K. Giasin, Tool wear prediction in face milling of stainless steel using singular generative adversarial network and lstm deep learning models, The International Journal of Advanced Manufacturing Technology 121 (1–2) (2022) 723–736.
doi:10.1007/s00170-022-09356-0.
URL http://dx.doi.org/10.1007/s00170-022-09356-0 - [76] A. Harfoush, A. Tabei, K. R. Haapala, I. Ghamarian, A framework for predicting grain morphology during incremental sheet metal forming using generative adversarial networks, Manufacturing Letters 35 (2023) 1081–1088, 51st SME North American Manufacturing Research Conference (NAMRC 51). doi:https://doi.org/10.1016/j.mfglet.2023.08.083.
- [77] S. R. Chhetri, A. B. Lopez, J. Wan, M. A. Al Faruque, Gan-sec: Generative adversarial network modeling for the security analysis of cyber-physical production systems, in: 2019 Design, Automation and Test in Europe Conference and Exhibition (DATE), 2019, pp. 770–775. doi:10.23919/DATE.2019.8715283.
- [78] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, R. Jin, Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting, in: International conference on machine learning, PMLR, 2022, pp. 27268–27286.
-
[79]
Y. Nie, N. H. Nguyen, P. Sinthong, J. Kalagnanam, A time series is worth 64 words: Long-term forecasting with transformers (2022).
doi:10.48550/ARXIV.2211.14730.
URL https://arxiv.org/abs/2211.14730