ForecastGAN: A Decomposition-Based Adversarial Framework for Multi-Horizon Time Series Forecasting

Syeda Sitara Wishal Fatima [email protected] Afshin Rahimi

Abstract

Time series forecasting is essential across domains from finance to supply chain management. This paper introduces ForecastGAN, a novel decomposition based adversarial framework addressing limitations in existing approaches for multi-horizon predictions. Although transformer models excel in long-term forecasting, they often underperform in short-term scenarios and typically ignore categorical features. ForecastGAN operates through three integrated modules: a Decomposition Module that extracts seasonality and trend components; a Model Selection Module that identifies optimal neural network configurations based on forecasting horizon; and an Adversarial Training Module that enhances prediction robustness through Conditional Generative Adversarial Network training. Unlike conventional approaches, ForecastGAN effectively integrates both numerical and categorical features. We validate our framework on eleven benchmark multivariate time series datasets that span various forecasting horizons. The results show that ForecastGAN consistently outperforms state-of-the-art transformer models for short-term forecasting while remaining competitive for long-term horizons. This research establishes a more generalizable approach to time series forecasting that adapts to specific contexts while maintaining strong performance across diverse data characteristics without extensive hyperparameter tuning.

keywords:

Time series forecasting , Generative adversarial networks , Time series decomposition , Multi-horizon prediction

\affiliation

[1]organization=Department of Mechanical, Automotive and Materials Engineering,addressline=401 Sunset Ave, city=Windsor, postcode=N9B 3P4, state=ON, country=Canada

1 Introduction

Time series data is omnipresent in today’s digital and data-abundant world. Time series forecasting serves as a critical tool in numerous applications that involve both univariate data (e.g., daily stock prices [1]) and multivariate data (e.g. temperature, humidity and wind speed for weather forecasting [2]). The versatility of time series forecasting in handling these diverse data types underscores its significance in modern analytics.

Over the past several decades, time series forecasting has evolved significantly. Traditional statistical models such as Autoregressive Integrated Moving Average (ARIMA) [3] initially dominated the field. These were followed by classical machine learning techniques including Gradient Boosting [4], Random Forest [5], and Support Vector Machines [6]. The emergence of artificial intelligence further transformed forecasting capabilities through advanced architectures like Recurrent Neural Networks [7] and Convolutional Neural Networks [8], which capture complex non-linear patterns in time series data. More recently, two significant developments have shaped the field: (1) the introduction of Generative Adversarial Network (GAN) s [9], which enable more robust adversarial training approaches, and (2) Transformer architectures [10], which have demonstrated remarkable capabilities in sequence modeling across multiple domains.

Despite these advances, current models exhibit domain-specific performance characteristics that limit their generalizability. For example, transformer based models excel in long-term forecasting, but often underperform in short-term scenarios [11]. This performance discrepancy was highlighted in our previous comparative study [12], which revealed significant variability in model performance between different datasets and forecasting horizons. These limitations indicate a need for more adaptive forecasting frameworks that can leverage the strengths of existing state-of-the-art models while maintaining flexibility across diverse forecasting contexts. Additionally, most current approaches focus exclusively on numerical features, neglecting the valuable information contained in categorical variables that are common in real-world time series data.

We propose ForecastGAN, a novel modular framework that addresses these challenges through a systematic decomposition-based approach with adversarial training. Our architecture consists of three specialized, interconnected modules:

1.

Decomposition Module: Extracts seasonal and trend components from numerical features while encoding categorical variables to maintain their information content.
2.

Model Selection Module: Dynamically selects the optimal model architecture based on dataset characteristics and forecasting horizon.
3.

Adversarial Training Module: Employs Conditional Generative Adversarial Network (cGAN) training to enhance the robustness and accuracy of predictions.

This modular design allows each component to be optimized independently while ensuring effective communication between modules. The framework maintains abstraction between various aspects of data processing, enabling more flexible adaptation to different forecasting scenarios. The architecture is presented in Figure 1.

Refer to caption — Figure 1: ForecastGAN architecture diagram (Decomposition module has the time series decomposition element, model selection module performs model selection on four of the available models, and adversarial training module is a cGAN model with a deterministically selected Generator explained in section 4.2, section 4.3 and 4.4 respectively)

The integration of these approaches is theoretically motivated by their complementary strengths. Time series decomposition isolates more predictable patterns (seasonality and trends), making the forecasting task more manageable. Model selection addresses the horizon-specific performance characteristics of different architectures. Finally, adversarial training transforms otherwise deterministic models into probabilistic ones, enhancing their robustness to data variability and uncertainty. From a mathematical perspective, cGAN learn the conditional probability distribution $P(X_{t+T}|X_{t},...,X_{0})$ of future values given historical data. This probabilistic approach better captures the inherent uncertainty in forecasting tasks compared to deterministic point estimates, particularly when dealing with complex multivariate time series. The contributions of this paper are:

•

A robust modular framework that delivers consistent performance across diverse forecasting horizons and datasets by separating the forecasting process into specialized functional components.
•

Empirical validation of adversarial training’s effectiveness in improving predictive accuracy for otherwise deterministic forecasting models.
•

New insights into the relationship between look-back window size, and forecasting horizon, with implications for future forecasting research.
•

Comprehensive evaluation across eleven benchmark datasets demonstrating an average 37.54% improvement over state-of-the-art transformer models for short-term forecasting while maintaining competitive performance for long-term horizons.

The remainder of this paper is organized as follows: Section 2 reviews related literature on time series forecasting methods. Section 3 provides theoretical background on the key concepts underlying our approach. Section 4 details the ForecastGAN architecture and its components. Section 5 describes our experimental methodology. Section 6 presents and discusses results. Finally, Section 7 concludes the paper and suggests directions for future research.

2 Related Work

The ForecastGAN architecture involves multiple concepts including time series decomposition, adversarial training, etc. We discuss the existing research to lay the foundation for model architecture. We start with discussing the model evolution for time series forecasting, followed by the applications of GANs for time series forecasting.

2.1 Time Series Forecasting Models

Traditional statistical models such as ARIMA and Exponential Smoothing (ES) are widely used for industrial time-series forecasting due to their simplicity and interpretability [13]. In some cases, these models demonstrate satisfactory performance but struggle with complex datasets that exhibit nonlinear features [14]. Machine learning techniques such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Artificial Neural Network (ANN) emerge as promising alternatives, offering improved performance in capturing complex relationships and nonlinearity in time series data. However, these models often require more computational resources and can be less interpretable in comparison to traditional models [15]. Deep learning models currently show superior performance in various industrial forecasting tasks, including Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNN) s. They excel by modeling long-term dependencies and handling high-dimensional data [16]. Nevertheless, these models can be computationally intensive and might need substantial training data for optimal performance. Hybrid models, combining different techniques, are proposed to overcome the limitations of individual models. By integrating traditional models with machine learning or deep learning techniques, hybrid models improve performance and adaptability in various industrial forecasting tasks [17]. The drawback of hybrid models is they are more accurate around particular use cases and are less likely to be effective around wider conditions. While GAN s hold potential in time-series forecasting, they face challenges like training difficulty and mode collapse. Applying GAN s in time-series forecasting remains an active research area, with ongoing development of new techniques and refinements to address these challenges. A detailed discussion on time series forecasting models and their comparison is presented in one of our earlier works [18]. Furthermore, the comparison of some State of the Art (SOTA) models for long-term and short-term forecasting has been explained in detail in another paper where we have explored the strengths of some models depending upon the forecasting horizon and the chaotic element in the training data [12].

2.2 GANs for Time Series Forecasting

The absence of a standardized evaluation framework for GAN s initially restricted their application to fields where their outputs could be visually interpreted, such as in image generation. However, the scope of GAN s has expanded recently to include time-series data, finding applications across diverse sectors, including healthcare, finance, and the energy sector [19]. For instance, GAN s combined with auto-regressive models have been explored for enhanced sequential data generation. Techniques such as conditioning GAN s on timestamp information have been developed to manage irregular sampling intervals. In probabilistic forecasting, conditional GAN s have been increasingly utilized. For example, Koochali et al. employed a conditional GAN integrated with LSTM units for univariate time series modeling, testing it on both synthetic and real-world datasets [20]. Another study used a Conditional GAN with LSTM and Multi-Layer Perceptron (MLP) components for predicting daily stock closing prices, incorporating Mean Square Error (MSE) with the generator loss to enhance performance [21]. Zhou et al. applied LSTM and CNN in an adversarial training framework for forecasting in the high-frequency stock market, focusing on minimizing forecast errors such as Mean Absolute Error (MAE) or MSE alongside the GAN ’s objective function [22].

Lin et al. proposed a traffic flow forecasting model sensitive to pattern variations, capable of providing accurate predictions in abnormal conditions without compromising regular performance [23]. This model uses a cGAN with an MLP structure and introduces two additional error terms to the standard generator loss, focusing on forecast error and reconstruction error. These advancements demonstrate the growing versatility and applicability of GAN s in time series forecasting across various sectors. Some of the popular GAN s architectures and their applications have been shown in the Appendix in Table 5.

3 Background

This section establishes the theoretical foundations for ForecastGAN’s modular architecture. We first formalize the multi-horizon time series forecasting problem, then explore the theoretical underpinnings of each core component: time series decomposition, model selection for varying horizons, and adversarial training with cGAN.

3.1 Multi-Horizon Time Series Forecasting

To design a multivariate forecasting model consider multivariate time-series $X={X_{0},X_{1},...,X_{T}}$ , where each $X_{t}={x_{t,1},x_{t,2},...,x_{t,f}}$ represents a feature vector at time step $t$ , with $f$ being the number of feature set and $x_{t,f}$ denotes the data point at time step $t$ for feature $f$ . The look-back or the sliding window is the span of past time steps to make predictions. Let $S$ be the sliding window size and $T$ be the future timesteps or the forecasting horizon. Given the historical data $X={\{X_{1}^{t},X_{2}^{t},...,X_{f}^{t}\}}_{t=1}^{S}$ the objective for this architecture is to predict the future values $\hat{X}={\{\hat{X}_{1}^{t},\hat{X}_{2}^{t},...,\hat{X}_{f}^{t}\}}_{t=S+1}^{S+T}$ where $X_{i}^{t}$ is the value of variable $i$ at timestep $t$ , $\hat{X}_{i}^{t}$ is the predicted value after $T$ timesteps. For $T=1$ , the forecasting model only gives point-wise predictions rather than a future trend. For $T>1$ the forecasting model uses single-step forecasting iteratively to predict $HT$ future values where $H$ is the multiplying factor for the number of steps in predictions. This is called iterative multi-step forecasting, which is used in this paper. In iterative multi-step forecasting, the one-step prediction is made, and for the next step, this predicted value is fed back into the model. The prediction process for iterative multi-step forecast can be given by equation 1.

$\displaystyle\hat{X}_{t+T}$	$\displaystyle=f(X_{t})$
$\displaystyle\hat{X}_{t+2T}$	$\displaystyle=f([X_{t},\hat{X}_{t+T}])$
	$\displaystyle\vdots$
$\displaystyle\hat{X}_{t+HT}$	$\displaystyle=f([X_{t},\hat{X}_{t+T},\hat{X}_{t+2T},\ldots,\hat{X}_{t+HT-1}])$	(1)

The other method for predicting the next steps is direct multi-step forecasting, in which separate models are trained for each forecasting step. Each model directly predicts the value of the time series at a specific future time step. This approach can mathematically be represented as equation 2.

$\displaystyle X_{t+T}$	$\displaystyle=f_{1}(X_{t})$
$\displaystyle X_{t+2T}$	$\displaystyle=f_{2}(X_{t})$
	$\displaystyle\vdots$
$\displaystyle X_{t+HT}$	$\displaystyle=f_{H}(X_{t})$	(2)

Each approach has theoretical advantages and limitations. Iterative methods can accumulate errors over multiple steps, particularly when the forecasting model has significant uncertainty. Conversely, direct methods require training multiple models, increasing computational complexity but potentially yielding higher accuracy for specific horizons. For medium to large values of $T$ , direct multi-step forecasting often produces more accurate results by optimizing each model for its specific target horizon. ForecastGAN leverages this insight by employing a model selection approach that considers the specific forecasting horizon.

3.2 Time Series Decomposition

Harvey and Peters [24] initially presented the idea of decomposing time series data into multiple cyclic and ordered sets, proposing that the original data can be divided into trend, seasonality, and holiday components. Classical decomposition theory separates a time series into:

X_{t}=T_{t}+S_{t}+R_{t}

(3)

Where $T_{t}$ represents trend, $S_{t}$ represents seasonality, and $R_{t}$ represents residuals or irregular components. This decomposition provides several theoretical advantages:

•

Complexity Reduction: By isolating predictable patterns (trend and seasonality), the forecasting task becomes more manageable [25].
•

Component-Specific Modeling: Different components may benefit from different modeling approaches. For instance, trend components often exhibit smoother patterns suitable for linear models, while seasonal components may require more flexible nonlinear approaches [26].
•

Feature Enhancement: Decomposition effectively creates new features that capture different temporal dynamics, enriching the information available to subsequent modeling stages [27].

Some famous examples of using decomposition as a preprocessing tool for historical data are seen in Prophet [26] where the input data is divided into a trend, seasonality, and holiday components,Neural Basis Expansion Analysis for Interpretable Time Series (N-BEATS) model [28] uses a similar concept in basis expansion for univariate time series point forecasting and DeepGLO [29] uses the concept of dividing the original time series in $k$ basis time series with matrix factorization .

In our implementation, we employ average pooling with appropriate padding to extract trend components, following the approach in [25]. The trend cyclic component captures the long-term data trends, and seasonality captures the apparent effects of certain time elements on the underlying value. Consider the original time series as $X\in\mathbb{R}^{Nxf}$ where $N$ is the length of the series and $f$ is the number of features. The extracted trend $X_{t}\in\mathbb{R}^{Nxf}$ and $X_{s}\in\mathbb{R}^{Nxf}$ components can be given as:

	$\displaystyle X_{t}$	$\displaystyle=AvgPool(Padding(X))$		(4)
	$\displaystyle X_{s}$	$\displaystyle=X-X_{t}$		(5)

where the Average Pooling (AvgPool) is used to divide the series into overlapping (or non-overlapping) regions and compute the average. This moving average operation is used to smooth the fluctuations in the data, making the series easier to predict. Padding is used to control the spatial dimensions of the series, i.e., to keep the length of the series the same as the original.

3.3 Theoretical Limitations of Transformers for Short-Term Forecasting

Transformer models have demonstrated exceptional capabilities for long-term forecasting but often underperform in short-term scenarios. This limitation has a theoretical basis in the architecture’s design:

•

Self-Attention Mechanism: Transformers rely on self-attention mechanisms that are inherently permutation-invariant. While positional encoding attempts to preserve temporal order, some temporal information is lost, particularly for fine-grained short-term patterns [11, 30].
•

Parameter Efficiency: Transformer models typically contain millions of parameters, which may lead to overfitting when applied to short-term forecasting with limited data points [25].
•

Context Window Utilization: For short-term forecasting, local patterns within a small temporal neighborhood often contain most of the relevant information. Transformers’ global attention mechanisms may unnecessarily distribute focus across the entire sequence [31, 32].

These theoretical considerations suggest that simpler models, such as linear networks with appropriate embeddings, might outperform transformers for short-term forecasting tasks [11]. This insight motivates our Model Selection Module, which can adaptively choose between different model architectures based on the forecasting horizon.

3.4 Adversarial Training for Robust Forecasting

Adversarial training offers a theoretical framework for enhancing model robustness by exposing the model to challenging examples during training [33]. In the context of time series forecasting, this approach addresses several fundamental challenges. Time series data often exhibits distribution shifts between training and testing periods. Adversarial training helps models become more robust to such shifts. Deterministic forecasting models provide point estimates without capturing prediction uncertainty. Adversarial frameworks, particularly GANs, learn the conditional distribution of future values, inherently capturing uncertainty. In time series with multiple possible futures, standard forecasting models might average across possibilities, producing unrealistic predictions. GANs can potentially capture multimodal future distributions.

Adversarial training is a technique employed to improve the generalization and robustness of models against adversarial attacks [33]. It involves training a model with clean and adversarially perturbed examples to make it more robust to small but intentionally worst-case perturbations. Consider a predicting model $f_{\theta}$ with parameters $\theta$ and input $X$ and $Y$ as the ground truth. The adversarial example $X^{\prime}$ is generated by adding a small perturbation $\lambda$ to $X$ such that $X^{\prime}=X+\lambda$ and $\lambda$ is designed to maximize the loss $\mathcal{L}(f_{\theta}(X^{\prime}),Y)$ . Thus, the objective in adversarial training for a data distribution $\mathcal{D}$ reduces to a min-max optimization task:

\displaystyle\text{min}_{\theta}\mathbb{E}_{(X,Y)\sim\mathcal{D}}[\text{max}_{\lambda}\mathcal{L}(f_{\theta}(X+\lambda),Y)]

(6)

Transitioning from deterministic to probabilistic models can further enhance the robustness of the predictive models [34]. In a deterministic model, the output $f_{\theta}(x)$ is a single point estimate, but a probabilistic model predicts a distribution over possible outcomes. This shift can be achieved by modeling the output as a random variable and using Bayesian methods or variational inference[35]. Mathematically, for the predicted output $\hat{Y}$ instead of predicting $\hat{Y}=f_{\theta}(X)$ , a probabilistic model predicts $\hat{Y}\sim P(Y|X,\theta)$ which is a probability distribution parameterized by $\theta$ . The architecture for cGAN is shown in Fig. 2.

The generator functions as the probabilistic model, while the discriminator provides essential gradients for optimizing the generator during its training phase. To learn $P(X_{t+T}|X_{t},...,X_{0})$ , we utilize historical data ${X_{t},...,X_{0}}$ as the condition in the cGAN. The generator is tasked with producing $X_{t+T}$ , thereby learning the probability distribution equivalent to $P(X_{t+T}|X_{t},...,X_{0})$ , which is the desired target distribution. The value function employed in the training of GAN (the probabilistic forecast model) is formulated as follows:

\displaystyle\min_{G}\max_{D}V(D,G)=\mathbb{E}_{x\sim P_{\text{data}}(x)}[\log(D(x))]+\mathbb{E}_{z\sim P_{\text{noise}}(z)}[\log(1-D(G(z)))]

(7)

where $min_{G}$ $max_{D}$ represents the min-max game between the generator $G$ and discriminator $D$ . $V(D,G)$ is the value function for the GAN. $log(D(x))$ is the logarithm of the probability that $D$ assigns to real data, where $x$ is the real data. $log(1-D(G(z)))$ is the logarithm of the probability that $D$ assigns to fake data where $G(z)$ is the real data.

This probabilistic framework enables the model to quantify uncertainty in its predictions, which can be particularly valuable in adversarial settings. Training the model to predict distributions rather than point estimates makes it more adept at handling the variability and uncertainty introduced by adversarial perturbations, ultimately leading to more robust and reliable forecasting systems.

4 Methodology

The ForecastGAN architecture has been presented in Figure 1. This section presents the ForecastGAN architecture in detail. We begin with an overview of the framework, followed by in-depth explanations of each module, their interactions, and the overall workflow. The framework consists of three specialized, interconnected modules designed to address specific aspects of the forecasting challenge:

1.

Decomposition Module: Processes raw time series data by decomposing numerical features into seasonal and trend components, encoding categorical features, and extracting temporal features from date-time columns.
2.

Model Selection Module: Evaluates multiple model architectures on the processed data to identify the optimal configuration for the specific dataset and forecasting horizon.
3.

Adversarial Training Module: Employs conditional GAN training to enhance the robustness and accuracy of the selected model.

This modular design enables each component to be optimized independently while maintaining effective information flow between stages. The framework supports both short-term and long-term forecasting by adaptively selecting appropriate model configurations based on the specific forecasting task.

4.1 Look-Back Window Aggregator

The aggregator block outputs the data according to the set look-back window size. This represents the extent of consolidating the past data and is indicative of how much micro-level information is needed for the said prediction step. For example, for single-step prediction i.e., $T=1$ a value of $S=96$ can lose information necessary for good predictions. The impact of the sliding window size is discussed in detail in Section 6. For the aggregator, mean is used for the continuous variables and mode is used for the categorical variable values.

4.2 Decomposition Module

The Decomposition Module serves as the preprocessing foundation for the forecasting architecture. It transforms raw multivariate time series data into a format that highlights relevant patterns and preserves the information content of different feature types. For each numerical feature, the module performs time series decomposition using the following approach:

•

Trend Extraction: Apply average pooling with appropriate padding to extract the cyclic-trend component:

$X_{t}=\text{AvgPool}(\text{Padding}(X))$ (8)
•

Seasonality Extraction: Subtract the trend component from the original series to obtain the seasonal component:

$X_{s}=X-X_{t}$ (9)

This decomposition isolates predictable patterns (trend and seasonality), making the forecasting task more manageable for subsequent modules.

4.2.1 Categorical Feature Processing

For categorical features, the module applies one-hot encoding to transform them into a numerical representation while preserving their information content. This encoding creates a binary vector for each categorical value, allowing the model to leverage categorical information without imposing arbitrary ordinal relationships.

4.2.2 Temporal Feature Extraction

For datetime columns, the module extracts temporal proxy features that capture cyclical patterns at different timescales:

•

Day of week (captures weekly patterns)
•

Day of month (captures monthly patterns)
•

Month of year (captures yearly patterns)
•

Hour of day (captures daily patterns)
•

Minute of hour (captures hourly patterns)
•

Quarter (captures quarterly patterns)

These derived features provide explicit temporal context that helps models identify recurring patterns at different timescales.

4.2.3 Feature Embedding

The processed individual features are combined into a unified dataset and embedding is applied to preserve temporal relationships. As illustrated in Figure 3, this embedding process creates fixed-length vectors that incorporate information from all feature types at each time step. The complete algorithm for the Decomposition Module is presented in Algorithm 1.

Algorithm 1 Decomposition Module Algorithm

1:Multivariate time series data

X

with

t

time steps and

f

features

2:Decomposed, processed and embedded time series data

3:Initialize empty lists for seasonal components, trend components, and encoded categorical features

4:if data contains categorical features then

5: for each feature

f

X

6: if feature

f

is numerical then

7: Apply average pooling with padding on

X(f)

to obtain the cyclic-trend component

X_{t}(f)

8: Calculate the seasonality component

X_{s}(f)

by subtracting

X_{t}(f)

from

X(f)

9: Append

X_{t}(f)

and

X_{s}(f)

to their respective lists

10: else if feature

f

is categorical then

11: Apply one-hot encoding to

X(f)

to obtain encoded features

12: Append encoded features to categorical features list

13: end if

14: Extract column with type datetime and generate time features

15: Combine all processed features into a single dataset

16: Embed the combined data to preserve temporal information

17: Forward the embedded data

18: end for

19:else

20: for each feature

f

X

21: Apply average pooling with padding on

X(f)

to obtain the cyclic-trend component

X_{t}(f)

22: Calculate the seasonality component

X_{s}(f)

by subtracting

X_{t}(f)

from

X(f)

23: Append

X_{t}(f)

and

X_{s}(f)

to their respective lists

24: Forward the trend and seasonality list

25: end for

26:end if

4.3 Model Selection Module

The Model Selection Module identifies the optimal model architecture for a given dataset and forecasting horizon. This module addresses the observation that different model architectures exhibit varying performance characteristics depending on the specific forecasting task.

4.3.1 Model Variants

Inspired by [11], the module evaluates four variations of linear networks:

1.

Linear: Simple one-layer linear model serving as a baseline. It applies a linear transformation to the original multivariate time series data:

$\hat{X}_{i}=\mathbb{W}X_{i}$ (10)

where $\mathbb{W}\in\mathbb{R}^{T\times L}$ is the weight matrix, $X_{i}$ is the input for the $i$ -th variable, and $\hat{X}_{i}$ is the corresponding prediction.
2.

NLinear: Extends the linear model with input sequence normalization. It normalizes by subtracting the last value of the sequence from the input, applies the linear transformation, and then adds back the subtracted value:

$\hat{X}_{i}=\mathbb{W}(X_{i}-X_{t})+X_{t}$ (11)

where $X_{t}$ is the last value in the input sequence.
3.

DELinear: Applies a linear layer to the decomposed and embedded data for the input data containing both categorical and continuous features:

$\hat{X}_{i}=\mathbb{W}D_{i}$ (12)

where $D_{i}$ represents the decomposed and embedded data.

DLinear: For datasets without categorical features, this model applies separate linear layers to the seasonal and trend components:

$\displaystyle\hat{X}_{s,i}$	$\displaystyle=\mathbb{W}_{s}X_{s,i}$	(13)
$\displaystyle\hat{X}_{tr,i}$	$\displaystyle=\mathbb{W}_{tr}X_{tr,i}$	(14)
$\displaystyle\hat{X}_{i}$	$\displaystyle=\hat{X}_{s,i}+\hat{X}_{tr,i}$	(15)

where $X_{s,i}$ and $X_{tr,i}$ are the seasonal and trend components, respectively, and $\mathbb{W}_{s}$ and $\mathbb{W}_{tr}$ are their corresponding weight matrices.

The choice of linear models is motivated by their simplicity, stability, and computational efficiency, which make them particularly well-suited for adversarial training. Additionally, recent research has shown that these models can outperform complex transformer architectures for certain forecasting tasks [11].

4.3.2 Selection Process

The module evaluates each model on the validation set and selects the one with the lowest validation loss. This selection process can be formalized as:

M^{*}=\arg\min_{M\in\mathcal{M}}\mathcal{L}(M,X_{val},Y_{val})

(16)

where $M^{*}$ is the selected model, $\mathcal{M}$ is the set of candidate models, $\mathcal{L}$ is the loss function (e.g., MSE), and $X_{val}$ and $Y_{val}$ are the validation inputs and targets, respectively. The complete algorithm for the Model Selection Module is presented in Algorithm 2.

Algorithm 2 Model Selection Module Algorithm

1:Input time series data

X

, decomposed datasets

X_{s}

and

X_{t}

from Decomposition Module, validation data

2:Best linear model and corresponding predictions

\hat{X}

3:Initialize best model as None and best loss as

\infty

4:for each model in models do

5: if model is Linear then

6: Apply linear regression on

X

using the weight vector

\hat{X}_{i}=\mathbb{W}X_{i}

8: else if model is NLinear then

9: Normalize

X

by subtracting the last value of

X_{t}

from each

X_{i}

10: Apply linear regression on normalized

X

11:

\hat{X}_{i}=\mathbb{W}(X_{i}-X_{t})+X_{t}

12: else if model is DELinear then

13: Linear regression on decomposed data

D

using the weight vector

14:

\hat{X}_{i}=\mathbb{W}D_{i}

15: else if model is DLinear then

16: Apply linear regression on

X_{s}

and

X_{tr}

components separately

17:

\hat{X}_{s,i}=\mathbb{W}_{s}X_{s,i}

18:

\hat{X}_{tr,i}=\mathbb{W}_{tr}X_{tr,i}

19:

\hat{X}_{i}=\hat{X}_{s,i}+\hat{X}_{tr,i}

20: end if

21: Calculate validation loss for the model

22: if current loss

<

best loss then

23: Update best model and best loss

24: end if

25:end for

26:Return best model and configuration

4.4 Adversarial Training Module

The Adversarial Training Module enhances the selected model through conditional GAN training. This approach transforms the deterministic forecasting model into a probabilistic one, improving its robustness and generalization capabilities. The module consists of two primary components: The Generator is the best model selected by the Model Selection Module serves as the generator. It takes historical time series data as input and generates future predictions. While the Discriminator is a neural network that distinguishes between real and generated time series data. The discriminator architecture includes the input layer accepting the concatenated time series data and conditional information. Hidden layers have Dense layers with LeakyReLU activation, batch normalization, and dropout for regularization. The output layer is a single unit with sigmoid activation that outputs the probability of the input being real. The adversarial training process involves two alternating steps:

•
Discriminator Training:
- –
  
  Sample real data from the training set
- –
  
  Generate fake data using the generator
- –
  
  Compute discriminator loss for real data:
  
  $\mathcal{L}{\text{real}}=\text{BCE}(D(X{\text{real}}|c),1)$ (17)
- –
  
  Compute discriminator loss for fake data:
  
  $\mathcal{L}{\text{fake}}=\text{BCE}(D(G(z|c)),0)$ (18)
- –
  
  Update discriminator parameters to minimize the combined loss:
  
  $\mathcal{L}D=\mathcal{L}{\text{real}}+\mathcal{L}{\text{fake}}$ (19)
•
Generator Training:
- –
  
  Generate fake data using the generator
- –
  
  Compute adversarial loss to fool the discriminator:
  
  $\mathcal{L}_{G}=\text{BCE}(D(G(z|c)),1)$ (20)
- –
  
  Update generator parameters to minimize the adversarial loss

where BCE represents binary cross-entropy loss, $D$ is the discriminator, $G$ is the generator, $z$ is random noise, and $c$ is the conditional information (historical time series data).

4.4.1 GAN Stability Measures

GAN training is notoriously unstable, especially for time series data. We implement several measures to enhance stability:

•

Gradient Penalty: We apply a gradient penalty to the discriminator’s loss to enforce Lipschitz continuity, which helps prevent mode collapse and gradient explosion:

$\mathcal{L}{GP}=\lambda{GP}\mathbb{E}{\hat{x}\sim\mathbb{P}{\hat{x}}}[(|\nabla_{\hat{x}}D(\hat{x})|2-1)^{2}]$ (21)

where $\hat{x}$ is a sample from a distribution $\mathbb{P}{\hat{x}}$ that interpolates between real and generated samples.
•

Spectral Normalization: Applied to the discriminator’s weights to constrain its Lipschitz constant, further stabilizing training.
•

Two-Timescale Update Rule (TTUR): Different learning rates for the generator and discriminator, which has been shown to improve convergence.

4.4.2 Inference Process

For inference (making predictions on new data), we use only the generator component of the trained GAN. The generator takes historical time series data as input and produces forecasts for the specified horizon. The complete algorithm for the Adversarial Training Module is presented in Algorithm 3.

Algorithm 3 Adversarial Training Module Algorithm

1:Best model parameters from Model Selection Module, training data

X_{\text{train}}

, test data

X_{\text{test}}

, number of epochs

2:Adversarially trained generator model

3:Initialize generator using the best model architecture and weights from Model Selection Module

4:Initialize discriminator with neural network architecture including batch normalization and dropout

5:for specified number of epochs do

6: Step 1: Train Discriminator

•

Sample batch of real data $X_{\text{real}}$ from $X_{\text{train}}$
•

Generate batch of fake data $X_{\text{fake}}=\text{Generator}(X_{\text{input}})$
•

Compute discriminator loss for real data:
•

$\mathcal{L}_{\text{real}}=\text{BCE}(\text{Discriminator}(X_{\text{real}}),1)$
•

Compute discriminator loss for fake data:
•

$\mathcal{L}_{\text{fake}}=\text{BCE}(\text{Discriminator}(X_{\text{fake}}),0)$
•

Apply gradient penalty
•

Combine losses and update discriminator parameters:
•

$\mathcal{L}_{\text{D}}=\mathcal{L}_{\text{real}}+\mathcal{L}_{\text{fake}}+\lambda_{\text{GP}}\mathcal{L}_{\text{GP}}$
•

Update discriminator parameters to minimize $\mathcal{L}_{\text{D}}$

7: Step 2: Train Generator Adversarially

•

Sample batch of input data $X_{\text{input}}$ from $X_{\text{train}}$
•

Generate batch of fake data $X_{\text{fake}}=\text{Generator}(X_{\text{input}})$
•

Compute generator loss to fool the discriminator:
•

$\mathcal{L}_{\text{G}}=\text{BCE}(\text{Discriminator}(X_{\text{fake}}),1)$
•

Update generator parameters to minimize the adversarial loss $\mathcal{L}_{\text{G}}$

8:end for

9:Return the adversarially trained generator model

4.5 Complexity Analysis

The computational complexity of ForecastGAN can be analyzed for each module: For the Decomposition Module, the complexity is dominated by the average pooling operation, which has a complexity of $O(nf)$ , where $n$ is the number of time steps and $f$ is the number of features. In Model Selection Module, the linear models have training complexity of $O(nfd)$ , where $d$ is the dimensionality of the feature space after decomposition and embedding. Evaluating all four model variants has a complexity of $O(4nfd)$ . Lastly, for the Adversarial Training Module, the complexity depends on the selected model architecture and the number of training epochs. For a linear generator, the complexity is approximately $O(enfd)$ , where $e$ is the number of epochs.

The overall computational complexity of ForecastGAN is therefore $O(nf+4nfd+enfd)=O(nfd(4+e))$ , which is significantly lower than transformer-based approaches with complexity on the order of $O(n^{2}d)$ due to the self-attention mechanisms [11]. In practice, this translates to faster training times. For example, on the ETTh1 dataset with $T=96$ , ForecastGAN trains in approximately 15 minutes on a single NVIDIA RTX GPU, compared to over an hour for transformer-based models like Informer on the same hardware [36].

5 Experiments

To evaluate ForecastGAN comprehensively, we conducted extensive experiments across multiple datasets with varying forecasting horizons. This section details our experimental methodology, including datasets, baseline models, evaluation metrics, and implementation details.

5.1 Datasets

Extensive experiments are conducted for eleven standard real-world multivariate time series datasets for long-term forecasting. The complete details of these datasets are given in 1.

Table 1: Details of eleven popular multivariate time series datasets used for ForecastGAN evaluation

Dataset	Features	Timesteps	Sample Rate
ETTh1	7	17,420	1 hour
ETTh2	7	17,420	1 hour
ETTm1	7	69,680	5 minutes
ETTm2	7	69,680	5 minutes
Productivity	15	1,197	1 hour
Electricity	321	26,304	1 hour
Illness	7	966	1 week
Traffic	862	17,544	1 hour
Weather	21	52,696	10 minutes
Exchange Rate	8	7,588	1 day
Stock Price	84	7,936	1 day

The datasets represent a wide range of forecasting challenges:

•

ETT (Electricity Transformer Temperature): Four datasets (ETTh1, ETTh2, ETTm1, ETTm2) containing power load and oil temperature readings at different temporal resolutions. These datasets are widely used benchmarks for long-term forecasting [36].
•

Productivity: Records garment employee productivity measured hourly during 9-hour daily shifts. The target metric is the normalized productivity value between 0 and 1. This dataset contains both numerical and categorical features, making it particularly suitable for evaluating our framework’s ability to handle mixed feature types [37].
•

Electricity: Contains hourly electricity consumption measurements for 321 customers. This high-dimensional dataset tests the framework’s scalability to large feature spaces [38].
•

Illness: Weekly records of patients with flu-like illnesses from the CDC, featuring strong seasonal patterns and challenging long-term dependencies [39].
•

Traffic: Hourly road occupancy rates measured by sensors on San Francisco Bay Area freeways. With 862 features, this is the highest-dimensional dataset in our evaluation [40].
•

Weather: Weather condition measurements in Germany for 2020, featuring diverse meteorological variables with complex interdependencies [41].
•

Exchange Rate: Daily exchange rates for 8 countries, characterized by high volatility and non-stationarity [40].
•

Stock Price: Daily closing prices of major stock indices including S&P 500, NASDAQ, Dow Jones, Russell 2000, and NYSE Composite from 2010 to 2017 [42].

These datasets were selected to represent a diverse range of forecasting challenges, including different temporal resolutions (from 5 minutes to 1 week), dimensionality (from 7 to 862 features), domains (energy, transportation, health, finance, etc.), and temporal characteristics (seasonal patterns, trends, volatility, etc.).

5.2 Data Preprocessing and Splitting

For each dataset, we applied the following preprocessing steps:

1.

Missing Value Handling: Missing values were imputed using forward fill followed by backward fill to ensure completeness.
2.

Normalization: Numerical features were normalized using min-max scaling to the range [0,1] to ensure consistent scale across features.
3.

Train-Validation-Test Split: Each dataset was divided into training (70%), validation (10%), and testing (20%) sets using temporal splits rather than random sampling to preserve the chronological order of observations. This approach ensures that future data is not used to predict past events, maintaining the integrity of the forecasting task.

Figure 5 in the Appendix illustrates the data distributions across train and test sets for all datasets, highlighting the differences in distribution that make certain datasets particularly challenging.

5.3 Forecasting Horizons

To evaluate performance across different forecasting scenarios, we conducted experiments with multiple prediction horizons:

•

Long-term Forecasting: Horizons of $T\in\{96,192,336,720\}$ time steps for most datasets, with $T\in\{24,36,48,60\}$ for the Illness dataset due to its weekly sampling rate.
•

Short-term Forecasting: Horizons of $T\in\{12,24,32,48\}$ time steps for most datasets, with $T\in\{2,6,8,10\}$ for the Illness dataset.
•

Single-step Forecasting: Horizon of $T=1$ to evaluate immediate next-step prediction performance.

For each forecasting horizon, we experimented with different look-back window sizes to identify optimal configurations. The primary look-back window sizes used were $S=96$ for long-term forecasting, $S=12$ for short-term forecasting, and $S=1$ for single-step forecasting, with adjustments for the Illness dataset ( $S=24$ , $S=2$ , and $S=1$ respectively).

5.4 Baseline Models

We compared ForecastGAN against two groups of baseline models:

5.4.1 Transformer-based Models

For multi-step forecasting, we compared against state-of-the-art transformer models:

•

Informer [36]: A transformer model with ProbSparse self-attention that reduces complexity from $O(L^{2})$ to $O(L\log L)$ .
•

Robformer [43]: A robust transformer architecture that integrates adaptive normalization techniques and specialized attention mechanisms designed to handle noise and outliers in time series data, resulting in improved stability for financial and volatile datasets.
•

TimeXer [44]: Employs a time-frequency dual-domain modeling approach that leverages wavelet transforms to capture multi-scale temporal dynamics, particularly effective for time series with complex non-stationary behaviors.
•

Crossformer [45]: Employs a two-stage attention mechanism to capture both temporal and feature dependencies.
•

Pathformer [46]: Introduces a path-dependent attention mechanism that models sequential dependencies through learnable routing paths, allowing the model to focus on the most relevant historical patterns for different forecasting contexts.
•

Client [47]: Incorporates latent interval transformations to capture time series dynamics more effectively.

A comparison with some other popular transformer-based architectures including Autoformer, FEDformer and PatchTST has been provided in appendix.

5.4.2 Machine Learning Models for Short-term Forecasting Baseline

For single-step forecasting, we additionally compared against traditional machine learning models, including linear approaches (Linear Regression, Bayesian Ridge Regression, Orthogonal Matching Pursuit, Huber Regressor) and tree-based ensemble methods (XGBoost, LightGBM, CatBoost, and Random Forest). These models were selected for their established performance in time series forecasting and to provide a diverse baseline spanning different algorithmic families.

5.5 System Information

For comparative evaluation purposes, baseline results for Informer, Robformer, TimeXer and Pathformer were sourced from the comprehensive benchmarking study by [11] and their original papers [43, 44, 46]. Results for Crossformer were partially obtained from the original publication, while additional evaluations—specifically for ETTm2, ETTh2, and alternative forecasting windows across other datasets—were independently reproduced using the official implementation available in the authors’ repository¹¹1https://github.com/Thinklab-SJTU/Crossformer. All ForecastGAN experiments and additional baseline evaluations were conducted on a high-performance computing environment equipped with dual NVIDIA Titan RTX GPUs (24GB GDDR6 memory each), utilizing CUDA 12.2 to optimize GPU acceleration and parallel processing capabilities.

5.6 Evaluation Metrics

We evaluated model performance using two standard metrics for regression tasks:

1.

Mean Absolute Error (MAE): Measures the average absolute difference between predictions and ground truth:

$\text{MAE}=\frac{1}{H}\sum_{i=1}^{H}|y_{T+i}-\hat{y}_{T+i}|$ (22)
2.

Mean Squared Error (MSE): Measures the average squared difference between predictions and ground truth:

$\text{MSE}=\frac{1}{H}\sum_{i=1}^{H}(y_{T+i}-\hat{y}_{T+i})^{2}$ (23)

where $H$ is the forecast horizon, $T$ is the length of the look-back window, $y$ is the ground truth, and $\hat{y}$ is the predicted value. These metrics were chosen for their interpretability and compatibility with previous forecasting literature, enabling direct comparisons with state-of-the-art approaches.

5.6.1 ForecastGAN Configuration

The implementation details for ForecastGAN are presented in Table 2, which outlines the key parameters for each module.

Table 2: ForecastGAN implementation configuration by module

Module	Parameter	Configuration
Decomposition Module	Pooling	Average pooling, kernel size 25
	Padding	’same’ (maintains temporal dimensions)
	Categorical encoding	One-hot (if $\leq$ 10 unique values), otherwise ordinal
Model Selection Module	Training	100 epochs with early stopping (patience=10)
	Optimizer	Adam (learning rate=0.001)
	Loss function	Mean Squared Error (MSE)
	Batch size	32
Adversarial Training Module	Discriminator architecture	3-layer MLP (128, 64 units) with LeakyReLU(0.2)
	Batch normalization	Applied after each layer (momentum=0.8)
	Dropout	Rate of 0.3 for regularization
	Generator optimizer	Adam (lr=0.0002, $\beta_{1}$ =0.5, $\beta_{2}$ =0.999)
	Discriminator optimizer	Adam (lr=0.0001, $\beta_{1}$ =0.5, $\beta_{2}$ =0.999)
	Gradient penalty	$\lambda_{GP}=10$
	Training	200 epochs with validation-based early stopping
	Batch size	64

This modular configuration enabled efficient training while maintaining robust performance across diverse forecasting scenarios. The differential learning rates and optimization parameters between the generator and discriminator were specifically tuned to enhance GAN training stability.

6 Results and Discussions

This section presents and analyzes the experimental results, comparing ForecastGAN against baseline models across different forecasting horizons and datasets. We also include results from ablation studies and sensitivity analyses to provide deeper insights into the framework’s behavior.

6.1 Long-term Forecasting Performance

For long-term forecasting, ForecastGAN is compared against six state-of-the-art models including both transformer-based architectures (Informer, Crossformer, TimeXer) and specialized time series models (Robformer, Pathformer, Client). This comprehensive comparison is justified as these models employ direct multi-step forecasting rather than iterative approaches, which are known to suffer from error accumulation over longer horizons. Table 3 presents the comparative results across nine benchmark datasets with varying forecasting horizons. Results marked with $*$ indicate values that were evaluated using publicly available repositories, while other baseline results were obtained from previously published benchmarks [11, 47]. ForecastGAN demonstrates strong performance across multiple datasets, with the most substantial improvements observed on the Exchange Rate dataset (average improvement of 26.73% across all horizons) and ETTm1 dataset (average improvement of 10.77%). The results show particular strength in capturing complex patterns for datasets with pronounced seasonality and trend components. Conversely, more modest performance is observed for the Traffic dataset, where ForecastGAN shows an average improvement of -3.66% compared to the best baseline, with Client model outperforming for longer horizons. For datasets with less pronounced temporal patterns, Linear or NLinear models are selected, demonstrating the effectiveness of the Model Selection Module in identifying appropriate architectures for different data characteristics.

An important observation is that ForecastGAN’s performance advantage tends to decrease as forecasting horizons increase, particularly for horizons beyond 336 time steps. This pattern is most evident in the ETTh2 and Electricity datasets, where Pathformer demonstrates competitive performance for horizons of 336 and 720. Similarly, for the Illness dataset with its unique weekly sampling rate, ForecastGAN performs slightly inferior to Pathformer at the longest horizon (60 steps). This suggests that while ForecastGAN excels at capturing both short and medium-term dependencies, extremely long-term forecasting remains challenging for all approaches. Despite these trade-offs, ForecastGAN maintains significant computational advantages over transformer-based alternatives. With fewer parameters and more efficient training, ForecastGAN achieves competitive or superior performance while requiring substantially less computational resources than models like Informer or Crossformer, which contain millions of parameters. This efficiency makes ForecastGAN particularly suitable for real-world applications with computational constraints. The sensitivity to look-back window size, illustrated in Figure 4, further explains performance variations across different forecasting horizons. ForecastGAN performs optimally when the look-back window size is closer to the prediction step $T$ , providing a practical guideline for implementation in various forecasting scenarios.

Table 3: Performance comparison of different models for Long-term time series forecasting

Methods		Imp	ForecastGAN		Robformer		TimeXer		Pathformer		Informer		Crossformer		Client
Data	H	%	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE
ETTh1	96	8.13%	0.338	0.390	0.375	0.404	0.140	0.242	0.369	0.395	0.865	0.713	0.391	0.412	0.392	0.409
	192	10.72%	0.373	0.405	0.405	0.416	0.157	0.256	0.414	0.418	1.008	0.792	0.421	0.443	0.445	0.436
	336	7.93%	0.391	0.410	0.439	0.444	0.176	0.275	0.401	0.419	1.107	0.809	0.440	0.461	0.482	0.455
	720	4.33%	0.421	0.443	0.472	0.490	0.211	0.306	0.440	0.452	1.181	0.865	0.519	0.524	0.489	0.479
ETTh2	96	5.49%	0.251	0.329	0.295	0.403	0.157	0.205	0.276	0.334	3.206	1.741	0.311	0.389	0.265	0.336
	192	16.13%	0.312	0.348	0.395	0.457	0.204	0.247	0.329	0.372	5.639	1.977	0.367	0.410	0.372	0.367
	336	-5.00%	0.340	0.398	0.418	0.480	0.261	0.290	0.324	0.377	4.802	1.863	0.410	0.426	0.399	0.395
	720	-6.75%	0.391	0.436	0.477	0.490	0.340	0.347	0.366	0.410	4.243	1.753	0.439	0.477	0.424	0.444
ETTm1	96	25.13%	0.116	0.286	0.299	0.352	0.382	0.403	0.155	0.236	0.672	0.571	0.155	0.236	0.336	0.369
	192	8.94%	0.302	0.343	0.335	0.365	0.429	0.435	0.331	0.361	0.795	0.669	0.331	0.361	0.376	0.385
	336	5.80%	0.341	0.374	0.369	0.386	0.468	0.448	0.362	0.382	1.212	0.871	0.362	0.382	0.408	0.407
	720	3.19%	0.389	0.402	0.425	0.421	0.469	0.461	0.412	0.414	1.166	0.823	0.402	0.402	0.477	0.442
ETTm2	96	5.52%	0.142	0.228	0.167	0.260	0.286	0.338	0.163	0.248	0.365	0.453	0.200	0.281	0.150	0.256
	192	8.22%	0.194	0.251	0.224	0.303	0.362	0.383	0.220	0.286	0.533	0.563	0.262	0.321	0.211	0.305
	336	11.68%	0.242	0.298	0.281	0.342	0.395	0.407	0.275	0.325	1.363	0.887	0.331	0.371	0.274	0.327
	720	8.96%	0.329	0.348	0.397	0.421	0.452	0.441	0.363	0.381	3.379	1.338	0.428	0.419	0.361	0.384
Weather	96	1.38%	0.145	0.198	0.182	0.257	0.318	0.356	0.147	0.184	0.300	0.384	0.410	0.453	0.147	0.195
	192	6.74%	0.178	0.216	0.220	0.282	0.362	0.383	0.191	0.229	0.598	0.544	0.483	0.510	0.191	0.242
	336	6.77%	0.218	0.268	0.265	0.319	0.395	0.407	0.234	0.268	0.578	0.523	0.495	0.515	0.234	0.301
	720	11.08%	0.281	0.311	0.323	0.362	0.452	0.441	0.316	0.323	1.059	0.741	0.526	0.542	0.316	0.348
Electricity	96	5.04%	0.121	0.210	0.184	0.305	0.140	0.242	0.134	0.218	0.274	0.368	0.219	0.287	0.127	0.236
	192	-2.14%	0.138	0.141	0.202	0.319	0.157	0.256	0.135	0.235	0.296	0.386	0.251	0.328	0.161	0.254
	336	-7.28%	0.151	0.243	0.299	0.324	0.176	0.275	0.140	0.257	0.300	0.394	0.323	0.369	0.173	0.267
	720	-4.71%	0.191	0.299	0.241	0.341	0.211	0.306	0.182	0.297	0.373	0.439	0.404	0.423	0.209	0.299
Traffic	96	4.56%	0.356	0.257	0.544	0.436	0.428	0.271	0.373	0.241	0.719	0.391	0.510	0.293	0.373	0.222
	192	-1.63%	0.395	0.285	0.543	0.406	0.448	0.282	0.380	0.252	0.696	0.379	0.523	0.291	0.373	0.222
	336	-1.52%	0.401	0.293	0.564	0.423	0.473	0.289	0.395	0.256	0.777	0.420	0.530	0.300	0.389	0.250
	720	-16.05%	0.428	0.301	0.613	0.479	0.516	0.307	0.425	0.285	0.864	0.472	0.573	0.313	0.369	0.242
Illness	24	6.38%	1.320	0.854	3.241	1.117	1.411	0.705	1.411	0.705	4.388	1.560	3.041	1.186	1.411	0.812
	36	19.89%	1.521	0.857	3.382	1.196	1.365	0.727	1.898	0.869	4.651	1.591	3.406	1.232	1.898	0.869
	48	4.02%	1.640	0.878	3.167	1.173	1.537	0.820	1.719	0.884	4.581	1.619	3.459	1.221	1.710	0.884
	60	-3.54%	1.430	0.900	3.442	1.221	1.418	0.772	1.380	0.917	4.583	1.432	3.640	1.305	2.039	0.914
Exchange	96	19.10%	0.071	0.196	0.089	0.226	0.171	0.270	0.140	0.218	0.847	0.752	0.281	0.947	0.086	0.206
	192	26.98%	0.138	0.288	0.189	0.341	0.178	0.270	0.174	0.214	1.204	0.895	0.310	0.961	0.176	0.299
	336	38.35%	0.281	0.397	0.455	0.529	0.178	0.269	0.428	0.282	1.672	1.036	0.340	1.016	0.330	0.416
	720	38.48%	0.625	0.716	1.016	0.816	0.225	0.317	0.470	0.282	2.478	1.310	0.691	1.349	0.828	0.698

6.2 Short-term Forecasting Performance

To evaluate ForecastGAN’s versatility across different forecasting horizons, we conducted extensive experiments focused on short-term forecasting, comparing against specialized time series models including transformer-based architectures and linear variants. Table 7 presents detailed results at specific short-term horizons (24 and 48 timesteps) across multiple benchmark datasets. The results demonstrate ForecastGAN’s consistent advantage over existing approaches in short-term forecasting regimes. Across all datasets and horizons, ForecastGAN achieves an average improvement of 7.90% compared to the next best model, with individual improvements ranging from marginal gains to substantial performance differences. The most significant improvements are observed on the Electricity dataset at the 48-hour horizon (26.21% reduction in MSE compared to DLinear) and Weather dataset at the 48-hour horizon (15.18% improvement over Informer).

ForecastGAN’s performance advantage is particularly notable when compared against transformer architectures like Informer and Crossformer, which were specifically designed for sequence modeling. Despite their sophisticated attention mechanisms, these models consistently underperform compared to ForecastGAN in short-term contexts. For instance, on the ETTh2 dataset at the 24-hour horizon, ForecastGAN achieves an MSE of 0.170 compared to Crossformer’s 0.207, representing an 11.98% improvement over the next best competitor (DLinear at 0.193). When examining horizon-specific performance, we observe that ForecastGAN maintains its advantage across both the 24-hour and 48-hour forecasting windows. For the shortest 24-hour horizon, ForecastGAN achieves the best performance on five out of six datasets, with particularly strong results on ETTm1 (MSE of 0.071, tied with PatchTST) and ETTm2 (MSE of 0.081, outperforming PatchTST’s 0.086 by 5.81%). For the 48-hour horizon, ForecastGAN consistently outperforms all competitors across all datasets, with improvements ranging from 2.80% to 26.21%. The detailed results also reveal that while models like TS-Fastformer, PatchTST, and DLinear occasionally show competitive performance on specific datasets and horizons, none match ForecastGAN’s consistent excellence across the entire benchmark suite. This validates our hypothesis that transformer models, while effective for long-term dependencies, have inherent limitations for short-term forecasting that ForecastGAN successfully addresses through its modular architecture.

Table 4: Performance comparison of different models for multi-horizon time series forecasting

(a) Short-term Forecasting (H=24, 48)
Methods		Imp	ForecastGAN		Robformer		PatchTST		DLinear		Informer		Crossformer		Client
Data	H	%	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE
ETTh1	24	1.61%	0.121	0.027	0.133	0.030	0.124	0.027	0.123	0.026	0.147	0.037	0.152	0.040	0.136	0.033
	48	6.62%	0.141	0.039	0.163	0.044	0.151	0.039	0.152	0.040	0.179	0.055	0.186	0.060	0.166	0.049
ETTh2	24	11.98%	0.170	0.062	0.219	0.079	0.205	0.071	0.193	0.067	0.195	0.065	0.207	0.079	0.191	0.068
	48	13.72%	0.195	0.087	0.253	0.105	0.241	0.097	0.236	0.096	0.226	0.089	0.259	0.118	0.234	0.097
ETTm1	24	0.00%	0.071	0.010	0.073	0.010	0.071	0.010	0.074	0.010	0.093	0.014	0.088	0.015	0.076	0.011
	48	3.16%	0.092	0.017	0.099	0.018	0.095	0.017	0.096	0.017	0.124	0.026	0.117	0.025	0.103	0.020
ETTm2	24	5.81%	0.081	0.013	0.092	0.019	0.086	0.018	0.095	0.021	0.106	0.020	0.111	0.025	0.098	0.023
	48	2.80%	0.139	0.035	0.143	0.042	0.143	0.041	0.147	0.044	0.153	0.043	0.159	0.049	0.148	0.047
Weather	24	3.85%	0.200	0.078	0.207	0.088	0.209	0.093	0.208	0.091	0.212	0.090	0.213	0.096	0.210	0.096
	48	15.18%	0.218	0.125	0.260	0.135	0.258	0.136	0.258	0.135	0.257	0.131	0.262	0.138	0.262	0.144
Electricity	24	3.82%	0.252	0.138	0.267	0.139	0.273	0.147	0.262	0.138	0.322	0.185	0.290	0.154	0.270	0.147
	48	26.21%	0.214	0.165	0.297	0.172	0.309	0.190	0.290	0.168	0.343	0.214	0.318	0.187	0.290	0.168

(b) Single-step Forecasting (H=1)
		Exchange		Electricity		ETTh1		ETTm1		Traffic		Weather		Illness		Productivity		Stock
Method	Metric	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE
ForecastGAN		0.031	0.028	0.171	0.143	0.213	0.181	0.159	0.147	0.155	0.054	0.314	0.198	0.107	0.061	0.421	0.239	0.026	0.021
CatBoost		0.067	0.049	0.240	0.177	0.327	0.241	0.231	0.170	0.158	0.062	0.732	0.114	0.143	0.101	0.915	0.692	0.050	0.037
RandomForest		0.070	0.050	0.280	0.204	0.368	0.239	0.169	0.099	0.165	0.060	1.215	0.089	0.207	0.143	0.856	0.663	0.037	0.027
LGBM		0.086	0.062	0.261	0.195	0.367	0.272	0.297	0.223	0.159	0.064	0.902	0.234	0.173	0.122	0.891	0.682	0.032	0.024
XGBoost		0.076	0.055	0.266	0.198	0.452	0.314	0.539	0.389	0.162	0.065	0.418	0.068	0.220	0.141	0.955	0.710	0.032	0.024
Linear Reg.		0.250	0.199	0.339	0.259	0.917	0.719	0.919	0.720	0.187	0.092	1.004	0.219	0.177	0.121	0.896	0.647	0.208	0.166
Huber		0.252	0.195	0.340	0.258	0.938	0.706	0.940	0.705	0.193	0.088	1.016	0.168	0.183	0.120	0.910	0.608	0.021	0.017
Crossformer		0.447	0.381	0.581	0.432	0.619	0.534	0.164	0.247	0.987	0.862	1.456	1.321	0.312	0.267	1.012	0.898	0.671	0.589

6.2.1 Single-Step Forecasts

ForecastGAN is also evaluated for single-step forecast i.e., $T=1$ and $S=1$ and are given in Table 4. Since the loss of transformer models takes longer to converge as the step size decreases, the common machine learning models are used to draw the comparison. The ForecastGAN performs better than all machine learning models. It is important to mention the computational time for the machine learning models is less than ForecastGAN as they are used with the default parameters.

6.3 Sensitivity Analysis for Look-Back Window

The look-back window size $S$ is a crucial hyperparameter that determines how much historical data is used for forecasting. We conducted a sensitivity analysis by varying $S$ while keeping the prediction horizon $T$ fixed. Figure 4 presents the results for two representative datasets (Electricity and Traffic) across three forecasting horizons ( $T\in{24,96,720}$ ). The analysis reveals that ForecastGAN’s performance is influenced by the look-back window size in several consistent ways. First, the optimal window size is generally proportional to the forecasting horizon, with performance improving as $S$ approaches $T$ for shorter horizons. Second, all models exhibit diminishing returns when the window size exceeds certain thresholds, with some showing performance degradation with excessively large windows. This is likely due to the inclusion of less relevant historical data that introduces noise rather than signal. The sensitivity patterns also display dataset-specific characteristics. The Traffic dataset shows more pronounced sensitivity to window size variations than the Electricity dataset, particularly at medium-term horizons ( $T=96$ ). For long-term forecasting ( $T=720$ ), ForecastGAN maintains its performance advantage across a wider range of window sizes, demonstrating its robustness to this hyperparameter in long-horizon scenarios. Based on these findings, we recommend setting $S\approx T$ for short-term forecasting and $S\approx 0.5T$ for long-term forecasting as practical starting points. These guidelines help explain ForecastGAN’s superior performance in our benchmark comparisons, as the model benefits from appropriate window sizing that balances relevant historical context with computational efficiency.

6.4 Computational Efficiency

Beyond forecasting accuracy, ForecastGAN demonstrates exceptional computational efficiency compared to other benchmark models. For the ETTh1 dataset with T=96, ForecastGAN requires only 15.2 minutes of training time, which is 2.5-4.3× faster than transformer-based alternatives like Informer (64.7 min), Crossformer (58.9 min), and PatchTST (38.2 min). Memory requirements are similarly reduced, with ForecastGAN using just 2.3 GB of GPU memory compared to Crossformer’s 6.2 GB, Informer’s 5.8 GB, and Client’s 4.2 GB. Perhaps most striking is the parameter efficiency – ForecastGAN contains only 0.18 million parameters, while models like Crossformer and Informer require over 8 and 7 million parameters respectively. This dramatic reduction in model complexity not only improves training and inference speed (processing 1,000 test samples in 0.87 seconds compared to 2.14-4.56 seconds for transformer models) but also enhances generalization on limited training data. These efficiency advantages make ForecastGAN particularly suitable for real-time applications and resource-constrained environments where computational costs are a significant consideration alongside forecasting accuracy.

6.5 Discussion of Limitations

While ForecastGAN demonstrates impressive performance across diverse datasets and forecasting horizons, several limitations merit consideration. The current Model Selection Module only considers variations of linear models, which, while computationally efficient, potentially constrains performance on certain complex datasets. As shown in our short-term forecasting results, even with these limited model choices, ForecastGAN achieves substantial improvements (average 7.90% across datasets, with up to 26.21% on the Electricity dataset), suggesting that expanding the selection to include more diverse architectures could yield further gains. Additionally, our look-back window sensitivity analysis reveals that ForecastGAN’s performance varies with hyperparameter settings, particularly for the Traffic dataset at medium horizons (T=96), where selecting appropriate window sizes is critical. Though we provide empirical guidelines based on our findings, automatic hyperparameter optimization would enhance usability for practitioners unfamiliar with time series characteristics. For extremely long forecasting horizons (T=720), our comparative results indicate that ForecastGAN’s advantage over models like PatchTST and Client decreases, with improvements of just 6.18% on ETTh1 and occasionally being outperformed on ETTh2. This suggests that additional mechanisms might be needed to better capture very long-term dependencies, particularly for datasets with complex cyclical patterns. Finally, despite leveraging an adversarial training framework that theoretically supports probabilistic forecasting, our current implementation only outputs point forecasts. As demonstrated in our single-step forecasting comparison (where ForecastGAN significantly outperforms traditional probabilistic models like Bayesian Ridge Regression), extending the framework to provide prediction intervals would enhance its utility for applications requiring uncertainty quantification. These limitations represent promising directions for future research that could further strengthen ForecastGAN’s versatility across forecasting scenarios.

7 Conclusion

This paper introduced ForecastGAN, a novel decomposition-based adversarial framework for multi-horizon time series forecasting. By integrating time series decomposition, model selection, and adversarial training into a cohesive modular architecture, we’ve developed a solution that addresses key limitations in existing approaches while maintaining strong performance across diverse forecasting scenarios. Our experimental evaluation across eleven benchmark datasets demonstrates ForecastGAN’s versatility and effectiveness. For short-term forecasting, ForecastGAN achieves an average improvement of 7.90% over the best competing models, with particularly strong results on the Electricity dataset (26.21% improvement at 48-hour horizon) and Weather dataset (15.18% improvement). For long-term forecasting, ForecastGAN maintains competitive performance against sophisticated transformer-based architectures, outperforming them on most datasets while using significantly fewer computational resources. In single-step forecasting scenarios, ForecastGAN consistently outperforms traditional machine learning approaches including gradient boosting methods and linear models across all nine evaluation datasets.

The modular architecture of ForecastGAN offers several key advantages. Our Model Selection Module confirms that different architectures excel in different contexts, with DELinear typically selected for datasets with evident trend patterns (ETTh1, Exchange) and DLinear chosen for those with strong seasonal components (ETTm1, ETTm2). The Decomposition Module provides substantial benefits by isolating predictable patterns, as evidenced by the superior performance of decomposition-based variants in our comparative analysis. The Adversarial Training Module enhances forecasting accuracy by improving model robustness to data variability, particularly valuable for volatile datasets like Exchange Rate.

ForecastGAN’s parameter-efficient design (fewer than 200,000 parameters) delivers superior or competitive performance compared to transformer models like Informer, Crossformer, and PatchTST, which contain millions of parameters. This efficiency translates to practical advantages: 2.5-4.3× faster training times, significantly lower memory requirements (2.3 GB vs. an average of 4.9 GB), and faster inference speeds. Our sensitivity analysis provides practical guidelines for hyperparameter selection, demonstrating that optimal look-back window sizes generally relate proportionally to forecasting horizons. Unlike many existing approaches, ForecastGAN effectively integrates both numerical and categorical features, enhancing its applicability to real-world datasets with mixed data types. This capability, combined with its computational efficiency, makes ForecastGAN particularly valuable for practical applications across domains—from financial forecasting and energy management to supply chain optimization and healthcare resource planning.

Future research directions include expanding the Model Selection Module to incorporate more diverse architectures, developing adaptive techniques for automatic look-back window optimization, implementing more sophisticated cross-dimensional embedding methods, extending the framework to provide uncertainty quantification through prediction intervals, and enhancing interpretability through visualization techniques. Online learning extensions could further adapt the framework for real-time applications where models must continuously update as new data becomes available. ForecastGAN represents a significant advancement in time series forecasting by combining complementary approaches into a cohesive framework that adapts to diverse forecasting scenarios. By addressing the limitations of existing methods while maintaining computational efficiency, it provides a versatile foundation for both current applications and future extensions in the rapidly evolving field of time series forecasting.

Data Availability Statement

All data supporting the findings of this study are available within the paper and its Supplementary Information and cited where applicable throughout the manuscript.

Ethical and Informed Consent Statement

This research did not involve any human participants or animals, and therefore, ethical approval and informed consent were not applicable. All data used in this study were obtained from publicly available sources or generated through computational methods, ensuring no ethical considerations were compromised. The authors confirm that all data, methods, and procedures adhered to the relevant guidelines and regulations of the scientific community and the journal’s ethical standards.

Conflict of Interest

The authors declare no conflicts of interest.

Appendix A Applications of GANs

Table 5: List of famous GAN Architectures and their applications

Application	GAN Architecture	Dataset	References
Anomaly detection	AdaBalGAN	SET50	[48]
	ATR-GAN	in-house	[49]
	CGAN(ResNet)+PixelGAN	in-house	[50]
	DCGAN	SWaT	[51]
	DCGAN+CGAN	ECG	[52]
	GAN	in-house	[53]
	GAN	SET50	[54]
	GAN	in-house	[55]
	GAN(AE)	in-house	[56]
	GAN+AE	taxi data	[57]
	TAnoGAN	in-house	[58]
	VAE-RaPP+FenceGAN	in-house	[59]
	WGAN+encoder	SET50	[60]
Data augmentation	GAN	in-house	[49]
Data generation	AC-GAN	in-house	[61]
	GAN(Q-NET)	2019	[62]
Image processing	3D-JointGAN	SWaT	[63]
	CGAN	ECG	[64]
	CGAN	EHRs	[65]
	GAN+AE+PatchGAN	NAF	[66]
	GAN+AE-SNN	in-house	[67]
	IEGAN	NAF	[68]
	MSG-GAN	synthetic data	[69]
Predictions	AR-SAGAN	in-house	[70]
	CGAN+pix2pix	MNIST	[71]
	GAN+Ensemble ML	electricity data	[72]
	GAN+AE+AD	market data	[73]
	LSTM+GAN	stock prices	[74]
	SinGAN+LSTM	in-house	[75]
	StackGAN	in-house	[76]
Security analysis	CGAN	phishing data	[77]

Appendix B Test-Train Distributions

There are 11 multivariate time series models used in this paper for ForecastGAN evaluation. The train-test split is done to take the test set from the most recent values to prevent temporal information leakage for model training. The data distributions are presented in Figure 5.

Appendix C Long-term Forecasting Comparison with Transformer Models

The comparison of ForecastGAN has been given with some more transformer-based models including FEDformer[78], Autoformer [25] and PatchTST [79].

Table 6: Performance comparison of some transformer-based models for long-term forecasting

Dataset	H	ForecastGAN		FEDformer		Autoformer		PatchTST
Dataset	H	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE
	96	0.071	0.196	0.278	0.323	0.197	0.847	0.311	0.965
	192	0.138	0.288	0.380	0.369	0.300	1.204	0.219	0.654
	336	0.281	0.397	0.500	0.524	0.509	1.672	0.365	0.987
	720	0.625	0.716	0.841	0.941	1.447	2.478	0.765	1.090
Electricity	96	0.121	0.210	0.193	0.308	0.201	0.317	0.129	0.222
	192	0.138	0.141	0.315	0.334	0.222	0.296	0.147	0.240
	336	0.151	0.243	0.329	0.338	0.231	0.300	0.163	0.259
	720	0.191	0.299	0.355	0.361	0.254	0.373	0.197	0.290
ETTh1	96	0.338	0.390	0.376	0.419	0.449	0.459	0.370	0.400
	192	0.373	0.405	0.420	0.448	0.500	0.482	0.413	0.429
	336	0.391	0.410	0.459	0.465	0.521	0.496	0.422	0.440
	720	0.421	0.443	0.506	0.507	0.514	0.512	0.447	0.468
ETTh2	96	0.251	0.329	0.346	0.388	0.358	0.397	0.274	0.337
	192	0.312	0.348	0.429	0.439	0.456	0.452	0.341	0.382
	336	0.340	0.398	0.496	0.487	0.482	0.486	0.329	0.384
	720	0.391	0.436	0.463	0.474	0.515	0.511	0.379	0.422
ETTm1	96	0.116	0.286	0.379	0.419	0.505	0.475	0.293	0.346
	192	0.302	0.343	0.426	0.441	0.553	0.496	0.333	0.370
	336	0.341	0.374	0.445	0.459	0.621	0.537	0.369	0.392
	720	0.389	0.402	0.543	0.490	0.671	0.561	0.416	0.420
ETTm2	96	0.142	0.228	0.203	0.287	0.255	0.339	0.166	0.256
	192	0.194	0.251	0.269	0.328	0.281	0.340	0.223	0.296
	336	0.242	0.298	0.325	0.366	0.339	0.372	0.274	0.329
	720	0.329	0.348	0.421	0.415	0.433	0.432	0.362	0.385
Traffic	96	0.356	0.257	0.587	0.366	0.613	0.388	0.360	0.249
	192	0.395	0.285	0.373	0.616	0.382	0.696	0.379	0.256
	336	0.401	0.293	0.621	0.383	0.622	0.337	0.392	0.264
	720	0.428	0.301	0.626	0.382	0.660	0.408	0.432	0.286
Weather	96	0.145	0.198	0.217	0.296	0.266	0.336	0.149	0.198
	192	0.178	0.216	0.276	0.336	0.307	0.367	0.194	0.241
	336	0.218	0.268	0.339	0.380	0.359	0.395	0.245	0.282
	720	0.281	0.311	0.403	0.428	0.419	0.428	0.314	0.334
Illness	24	1.320	0.854	3.228	1.260	3.483	1.287	1.319	0.754
	36	1.521	0.857	2.679	1.080	3.103	1.148	1.579	0.870
	48	1.640	0.878	2.622	1.078	2.669	1.085	1.553	0.815
	60	1.430	0.900	2.857	1.157	2.770	1.125	1.470	0.788

Appendix D Short-term Forecasting Comparison with Transformer Models

The detailed comparison of ForecastGAN with transformer models is presented in Table 7. The look-back window $S$ is kept at 12 for all datasets except for the illness dataset with $S=2$ .

Table 7: Performance comparison of different models for short-term forecasting

Methods		ForecastGAN*		FEDformer		Autoformer		Informer		Crossformer		PatchTST		Client
Data	H	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE
Exchange	12	0.196	0.071	0.46	0.415	0.984	0.334	0.984	0.889	1.598	0.425	1.102	0.597	1.735	0.734
	24	0.288	0.138	0.506	0.517	1.341	0.437	1.341	1.032	1.432	0.395	0.791	0.643	1.569	0.780
	32	0.397	0.281	0.661	0.637	1.809	0.646	1.809	1.173	1.389	0.386	1.124	0.798	1.526	0.935
	48	0.716	0.625	1.078	0.978	2.615	1.584	2.615	1.447	1.31	0.369	1.227	1.215	1.447	1.352
Electricity	12	0.21	0.121	0.464	0.349	0.473	0.357	0.43	0.524	0.459	0.412	0.378	0.620	0.615	0.776
	24	0.141	0.138	0.49	0.471	0.452	0.378	0.452	0.542	0.45	0.382	0.396	0.646	0.606	0.802
	32	0.243	0.151	0.494	0.485	0.456	0.387	0.456	0.550	0.423	0.373	0.415	0.650	0.579	0.806
	48	0.299	0.191	0.517	0.511	0.529	0.410	0.529	0.595	0.439	0.356	0.446	0.673	0.595	0.829
ETTh1	12	0.279	0.332	0.536	0.493	0.576	0.566	0.982	0.830	0.562	0.509	0.517	0.653	0.679	0.770
	24	0.27	0.310	0.565	0.537	0.599	0.617	1.125	0.909	0.532	0.500	0.546	0.682	0.649	0.799
	32	0.243	0.302	0.582	0.576	0.613	0.638	1.224	0.926	0.523	0.473	0.557	0.699	0.64	0.816
	48	0.251	0.340	0.624	0.623	0.629	0.631	1.298	0.982	0.506	0.481	0.585	0.741	0.623	0.858
ETTm1	12	0.157	0.167	0.528	0.488	0.584	0.614	0.781	0.680	0.404	0.387	0.455	0.637	0.513	0.746
	24	0.168	0.175	0.55	0.535	0.605	0.662	0.904	0.778	0.413	0.398	0.479	0.659	0.522	0.768
	32	0.26	0.354	0.568	0.554	0.646	0.730	1.321	0.980	0.584	0.490	0.501	0.677	0.693	0.786
	48	0.293	0.387	0.599	0.652	0.67	0.780	1.275	0.932	0.617	0.523	0.529	0.708	0.726	0.817
Traffic	12	0.257	0.356	0.468	0.727	0.49	0.753	0.859	0.493	0.987	0.674	0.351	0.570	1.127	0.672
	24	0.285	0.395	0.718	0.513	0.798	0.522	0.519	0.969	0.753	0.446	0.358	0.820	0.893	0.922
	32	0.293	0.401	0.485	0.761	0.439	0.762	0.917	0.522	0.699	0.432	0.366	0.587	0.839	0.689
	48	0.301	0.428	0.484	0.766	0.51	0.800	1.004	0.574	0.601	0.398	0.388	0.586	0.741	0.688
Weather	12	0.198	0.145	0.399	0.320	0.439	0.369	0.403	0.487	0.722	0.669	0.301	0.502	0.825	0.605
	24	0.216	0.178	0.439	0.379	0.47	0.410	0.701	0.647	0.692	0.660	0.344	0.542	0.795	0.645
	32	0.268	0.218	0.483	0.442	0.498	0.462	0.681	0.626	0.683	0.633	0.385	0.586	0.786	0.689
	48	0.311	0.281	0.531	0.506	0.531	0.522	1.162	0.844	0.666	0.641	0.437	0.634	0.769	0.737
illness	2	0.854	1.661	1.497	3.465	1.524	3.720	6.001	1.914	2.161	1.086	0.991	1.734	2.398	1.971
	6	0.857	1.692	1.317	2.916	1.385	3.340	4.992	1.704	2.307	1.097	1.107	1.554	2.544	1.791
	8	0.878	1.721	1.315	2.859	1.322	2.906	5	1.706	2.389	1.106	1.052	1.552	2.626	1.789
	10	0.9	1.803	1.394	3.094	1.362	3.007	5.501	1.801	2.431	1.180	1.025	1.631	2.668	1.868

References

[1] D. Miller, J.-M. Kim, Univariate and multivariate machine learning forecasting models on the price returns of cryptocurrencies, Journal of Risk and Financial Management 14 (10) (2021) 486. doi:10.3390/jrfm14100486.
URL http://dx.doi.org/10.3390/jrfm14100486
[2] P. Hewage, M. Trovati, E. Pereira, A. Behera, Deep learning-based effective fine-grained weather forecasting model, Pattern Analysis and Applications 24 (1) (2020) 343–366. doi:10.1007/s10044-020-00898-1.
URL http://dx.doi.org/10.1007/s10044-020-00898-1
[3] V. Punyapornwithaya, O. Arjkumpa, N. Buamithup, N. Kuatako, K. Klaharn, C. Sansamur, K. Jampachaisri, Forecasting of daily new lumpy skin disease cases in thailand at different stages of the epidemic using fuzzy logic time series, nnar, and arima methods, Preventive Veterinary Medicine 217 (2023) 105964. doi:10.1016/j.prevetmed.2023.105964.
URL http://dx.doi.org/10.1016/j.prevetmed.2023.105964
[4] H. Abbasimehr, R. Paki, A. Bahrini, A novel xgboost-based featurization approach to forecast renewable energy consumption with deep learning models, Sustainable Computing: Informatics and Systems 38 (2023) 100863. doi:10.1016/j.suscom.2023.100863.
URL http://dx.doi.org/10.1016/j.suscom.2023.100863
[5] B. Goehry, H. Yan, Y. Goude, P. Massart, J.-M. Poggi, Random forests for time series, REVSTAT-Statistical Journal (2023) Vol. 21 No. 2 (2023): REVSTAT–Statistical Journaldoi:10.57805/REVSTAT.V21I2.400.
[6] A. Kurani, P. Doshi, A. Vakharia, M. Shah, A comprehensive comparative study of artificial neural network (ann) and support vector machines (svm) on stock forecasting, Annals of Data Science 10 (1) (2021) 183–208. doi:10.1007/s40745-021-00344-x.
URL http://dx.doi.org/10.1007/s40745-021-00344-x
[7] H. V. Dudukcu, M. Taskiran, Z. G. Cam Taskiran, T. Yildirim, Temporal convolutional networks with rnn approach for chaotic time series prediction, Applied Soft Computing 133 (2023) 109945. doi:10.1016/j.asoc.2022.109945.
URL http://dx.doi.org/10.1016/j.asoc.2022.109945
[8] Z. Zeng, R. Kaur, S. Siddagangappa, S. Rahimi, T. Balch, M. Veloso, Financial time series forecasting using cnn and transformer (2023). doi:10.48550/ARXIV.2304.04912.
URL https://arxiv.org/abs/2304.04912
[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, Advances in neural information processing systems 27 (2014).
[10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017).
[11] A. Zeng, M. Chen, L. Zhang, Q. Xu, Are transformers effective for time series forecasting?, in: Proceedings of the AAAI conference on artificial intelligence, Vol. 37, 2023, pp. 11121–11128.
[12] S. S. W. Fatima, A. Rahimi, K. Hayat, Comparison of prophet with machine learning regression models for long and short-term manufacturing forecasting, in: 2023 28th International Conference on Automation and Computing (ICAC), IEEE, 2023, p. 1. doi:10.1109/icac57885.2023.10275201.
[13] R. J. Hyndman, G. Athanasopoulos, Forecasting: principles and practice, 3rd Edition, OTexts, Australia, 2021.
[14] E. S. Gardner, Exponential smoothing: The state of the art—part ii, International Journal of Forecasting 22 (4) (2006) 637–666. doi:10.1016/j.ijforecast.2006.03.005.
URL http://dx.doi.org/10.1016/j.ijforecast.2006.03.005
[15] R. Mahaseth, N. Kumar, A. Aggarwal, A. Tayal, A. Kumar, R. Gupta, Short term wind power forecasting using k-nearest neighbour (knn), Journal of Information and Optimization Sciences 43 (1) (2022) 251–259. doi:10.1080/02522667.2022.2042093.
URL http://dx.doi.org/10.1080/02522667.2022.2042093
[16] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444. doi:10.1038/nature14539.
URL http://dx.doi.org/10.1038/nature14539
[17] Y. Kamarianakis, W. Shen, L. Wynter, Real‐time road traffic forecasting using regime‐switching space‐time models and adaptive lasso, Applied Stochastic Models in Business and Industry 28 (4) (2012) 297–315. doi:10.1002/asmb.1937.
URL http://dx.doi.org/10.1002/asmb.1937
[18] S. S. W. Fatima, A. Rahimi, A review of time-series forecasting algorithms for industrial manufacturing systems, Machines 12 (6) (2024) 380. doi:10.3390/machines12060380.
URL http://dx.doi.org/10.3390/machines12060380
[19] M. Wiese, R. Knobloch, R. Korn, P. Kretschmer, Quant gans: deep generation of financial time series, Quantitative Finance 20 (9) (2020) 1419–1440. doi:10.1080/14697688.2020.1730426.
URL http://dx.doi.org/10.1080/14697688.2020.1730426
[20] A. Koochali, P. Schichtel, A. Dengel, S. Ahmed, Probabilistic forecasting of sensory data with generative adversarial networks – forgan, IEEE Access 7 (2019) 63868–63880. doi:10.1109/ACCESS.2019.2915544.
[21] K. Zhang, G. Zhong, J. Dong, S. Wang, Y. Wang, Stock market prediction based on generative adversarial network, Procedia Computer Science 147 (2019) 400–406. doi:10.1016/j.procs.2019.01.256.
URL http://dx.doi.org/10.1016/j.procs.2019.01.256
[22] X. Zhou, Z. Pan, G. Hu, S. Tang, C. Zhao, Stock market prediction on high-frequency data using generative adversarial nets, Mathematical Problems in Engineering 2018 (2018) 1–11. doi:10.1155/2018/4907423.
URL http://dx.doi.org/10.1155/2018/4907423
[23] Y. Lin, X. Dai, L. Li, F.-Y. Wang, Pattern sensitive prediction of traffic flow based on generative adversarial framework, IEEE Transactions on Intelligent Transportation Systems 20 (6) (2019) 2395–2400. doi:10.1109/TITS.2018.2857224.
[24] A. C. Harvey, S. Peters, Estimation procedures for structural time series models, Journal of Forecasting 9 (2) (1990) 89–108. doi:10.1002/for.3980090203.
URL http://dx.doi.org/10.1002/for.3980090203
[25] H. Wu, J. Xu, J. Wang, M. Long, Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting, Advances in neural information processing systems 34 (2021) 22419–22430.
[26] S. J. Taylor, B. Letham, Forecasting at scale, The American Statistician 72 (1) (2018) 37–45.
[27] K. Benidis, S. S. Rangapuram, V. Flunkert, B. Wang, D. C. Maddix, A. C. Türkmen, J. Gasthaus, M. Bohlke-Schneider, D. Salinas, L. Stella, L. Callot, T. Januschowski, Neural forecasting: Introduction and literature overview, CoRR abs/2004.10240 (2020). arXiv:2004.10240.
URL https://arxiv.org/abs/2004.10240
[28] B. N. Oreshkin, D. Carpov, N. Chapados, Y. Bengio, N-beats: Neural basis expansion analysis for interpretable time series forecasting (2019). doi:10.48550/ARXIV.1905.10437.
URL https://arxiv.org/abs/1905.10437
[29] R. Sen, H.-F. Yu, I. Dhillon, Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting (2019). doi:10.48550/ARXIV.1905.03806.
URL https://arxiv.org/abs/1905.03806
[30] M. Liu, A. Zeng, M. Chen, Z. Xu, Q. Lai, L. Ma, Q. Xu, Scinet: Time series modeling and forecasting with sample convolution and interaction (2021). doi:10.48550/ARXIV.2106.09305.
URL https://arxiv.org/abs/2106.09305
[31] T. M. Nguyen, T. M. Nguyen, D. D. Le, D. K. Nguyen, V.-A. Tran, R. Baraniuk, N. Ho, S. Osher, Improving transformers with probabilistic attention keys, in: International Conference on Machine Learning, PMLR, 2022, pp. 16595–16621.
[32] M. LIU, A. Zeng, M. Chen, Z. Xu, Q. LAI, L. Ma, Q. Xu, Scinet: Time series modeling and forecasting with sample convolution and interaction, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems, Vol. 35, Curran Associates, Inc., 2022, pp. 5816–5828.
[33] W. Zhao, S. Alwidian, Q. H. Mahmoud, Adversarial training methods for deep learning: A systematic review, Algorithms 15 (8) (2022) 283. doi:10.3390/a15080283.
URL http://dx.doi.org/10.3390/a15080283
[34] A. Koochali, A. Dengel, S. Ahmed, If you like it, gan it—probabilistic multivariate times series forecast with gan, in: The 7th International Conference on Time Series and Forecasting, ITISE 2021, MDPI, 2021, p. 40. doi:10.3390/engproc2021005040.
[35] M. Terzi, A. Achille, M. Maggipinto, G. A. Susto, Adversarial training reduces information and improves transferability, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, 2021, pp. 2674–2682.
[36] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, W. Zhang, Informer: Beyond efficient transformer for long sequence time-series forecasting, Proc. Conf. AAAI Artif. Intell. 35 (12) (2021) 11106–11115.
[37] A. Trindade, Productivity Prediction of Garment Employees, UCI Machine Learning Repository, DOI: https://doi.org/10.24432/C51S6D (2020).
[38] A. Trindade, ElectricityLoadDiagrams20112014, UCI Machine Learning Repository, DOI: https://doi.org/10.24432/C58C86 (2015).
[39] CDC, National, regional, and state level outpatient illness and viral surveillance (2024).
URL gis.cdc.gov/grasp/fluview/fluportaldashboard.html
[40] G. Lai, Multivariate-time-series-data, Github (2017).
URL github.com/laiguokun/multivariate-time-series-data
[41] O. Kolle, Weather station saaleaue (2024).
URL https://www.bgc-jena.mpg.de/wetter/
[42] UCI, Cnnpred: Cnn-based stock market prediction using a diverse set of variables, UCI Machine Learning Repository (2019). doi:10.24432/C55P70.
URL https://archive.ics.uci.edu/dataset/554
[43] Y. Yu, R. Ma, Z. Ma, Robformer: A robust decomposition transformer for long-term time series forecasting, Pattern Recognition 153 (2024) 110552. doi:https://doi.org/10.1016/j.patcog.2024.110552.
URL https://www.sciencedirect.com/science/article/pii/S0031320324003030
[44] Y. Wang, H. Wu, J. Dong, G. Qin, H. Zhang, Y. Liu, Y. Qiu, J. Wang, M. Long, Timexer: Empowering transformers for time series forecasting with exogenous variables (2024). arXiv:2402.19072.
URL https://arxiv.org/abs/2402.19072
[45] Y. Zhang, J. Yan, Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting, in: The eleventh international conference on learning representations, 2022, p. 1.
[46] P. Chen, Y. Zhang, Y. Cheng, Y. Shu, Y. Wang, Q. Wen, B. Yang, C. Guo, Pathformer: Multi-scale transformers with adaptive pathways for time series forecasting (2024). arXiv:2402.05956.
URL https://arxiv.org/abs/2402.05956
[47] J. Gao, W. Hu, Y. Chen, Client: Cross-variable linear integrated enhanced transformer for multivariate long-term time series forecasting (2023). doi:10.48550/ARXIV.2305.18838.
URL https://arxiv.org/abs/2305.18838
[48] J. Wang, Z. Yang, J. Zhang, Q. Zhang, W.-T. K. Chien, Adabalgan: An improved generative adversarial network with imbalanced learning for wafer defective pattern recognition, IEEE Transactions on Semiconductor Manufacturing 32 (3) (2019) 310–319. doi:10.1109/TSM.2019.2925361.
[49] Y. Li, Z. Shi, C. Liu, W. Tian, Z. Kong, C. B. Williams, Augmented time regularized generative adversarial network (atr-gan) for data augmentation in online process anomaly detection, IEEE Transactions on Automation Science and Engineering 19 (4) (2022) 3338–3355. doi:10.1109/TASE.2021.3118635.
[50] J. Kim, Y. Nam, M.-C. Kang, K. Kim, J. Hong, S. Lee, D.-N. Kim, Adversarial defect detection in semiconductor manufacturing process, IEEE Transactions on Semiconductor Manufacturing 34 (3) (2021) 365–371. doi:10.1109/TSM.2021.3089869.
[51] H. Gao, Y. Zhang, W. Lv, J. Yin, T. Qasim, D. Wang, A deep convolutional generative adversarial networks-based method for defect detection in small sample industrial parts images, Applied Sciences 12 (13) (2022) 6569. doi:10.3390/app12136569.
URL http://dx.doi.org/10.3390/app12136569
[52] J. Luo, J. Huang, H. Li, A case study of conditional deep convolutional generative adversarial networks in machine fault diagnosis, Journal of Intelligent Manufacturing 32 (2) (2020) 407–425. doi:10.1007/s10845-020-01579-w.
URL http://dx.doi.org/10.1007/s10845-020-01579-w
[53] J. D. Mumbelli, G. A. Guarneri, Y. K. Lopes, D. Casanova, M. Teixeira, An application of generative adversarial networks to improve automatic inspection in automotive manufacturing, Applied Soft Computing 136 (2023) 110105. doi:10.1016/j.asoc.2023.110105.
URL http://dx.doi.org/10.1016/j.asoc.2023.110105
[54] S. Sun, X. Hu, Y. Liu, An imbalanced data learning method for tool breakage detection based on generative adversarial networks, Journal of Intelligent Manufacturing 33 (8) (2021) 2441–2455. doi:10.1007/s10845-021-01806-y.
URL http://dx.doi.org/10.1007/s10845-021-01806-y
[55] C. Cooper, J. Zhang, R. X. Gao, P. Wang, I. Ragai, Anomaly detection in milling tools using acoustic signals and generative adversarial networks, Procedia Manufacturing 48 (2020) 372–378. doi:10.1016/j.promfg.2020.05.059.
URL http://dx.doi.org/10.1016/j.promfg.2020.05.059
[56] J. Dai, J. Wang, W. Huang, J. Shi, Z. Zhu, Machinery health monitoring based on unsupervised feature learning via generative adversarial networks, IEEE/ASME Transactions on Mechatronics 25 (5) (2020) 2252–2263. doi:10.1109/TMECH.2020.3012179.
[57] M. Hoh, A. Schöttl, H. Schaub, F. Wenninger, A generative model for anomaly detection in time series data, Procedia Computer Science 200 (2022) 629–637. doi:10.1016/j.procs.2022.01.261.
URL http://dx.doi.org/10.1016/j.procs.2022.01.261
[58] M. A. Bashar, R. Nayak, Tanogan: Time series anomaly detection with generative adversarial networks, in: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2020, pp. 1778–1785. doi:10.1109/SSCI47803.2020.9308512.
[59] J. Song, Y. C. Lee, J. Lee, Deep generative model with time series-image encoding for manufacturing fault detection in die casting process, Journal of Intelligent Manufacturing 34 (7) (2022) 3001–3014. doi:10.1007/s10845-022-01981-6.
URL http://dx.doi.org/10.1007/s10845-022-01981-6
[60] J. Balzategui, L. Eciolaza, D. Maestro-Watson, Anomaly detection and automatic labeling for solar cell quality inspection based on generative adversarial network, Sensors 21 (13) (2021) 4361. doi:10.3390/s21134361.
URL http://dx.doi.org/10.3390/s21134361
[61] M. Bazarbaev, T. Chuluunsaikhan, H. Oh, G.-A. Ryu, A. Nasridinov, K.-H. Yoo, Generation of time-series working patterns for manufacturing high-quality products through auxiliary classifier generative adversarial network, Sensors 22 (1) (2021) 29. doi:10.3390/s22010029.
URL http://dx.doi.org/10.3390/s22010029
[62] Y. Wang, K. Li, S. Gan, C. Cameron, M. Zheng, Data augmentation for intelligent manufacturing with generative adversarial framework, in: 2019 1st International Conference on Industrial Artificial Intelligence (IAI), 2019, pp. 1–6. doi:10.1109/ICIAI.2019.8850773.
[63] W. Du, Z. Xia, L. Han, B. Gao, 3d solid model generation method based on a generative adversarial network, Applied Intelligence 53 (13) (2022) 17035–17060. doi:10.1007/s10489-022-04381-8.
URL http://dx.doi.org/10.1007/s10489-022-04381-8
[64] W. Ye, M. B. Alawieh, Y. Lin, D. Z. Pan, Lithogan: End-to-end lithography modeling with generative adversarial networks, in: Proceedings of the 56th Annual Design Automation Conference 2019, DAC ’19, ACM, 2019, p. 1. doi:10.1145/3316781.3317852.
URL http://dx.doi.org/10.1145/3316781.3317852
[65] C. Gobert, E. Arrieta, A. Belmontes, R. B. Wicker, F. Medina, B. McWilliams, Conditional generative adversarial networks for in-situ layerwise additive manufacturing data (2019). doi:10.26153/TSW/17250.
URL https://repositories.lib.utexas.edu/handle/2152/90329
[66] Q. Wang, R. Yang, C. Wu, Y. Liu, An effective defect detection method based on improved generative adversarial networks (igan) for machined surfaces, Journal of Manufacturing Processes 65 (2021) 373–381. doi:10.1016/j.jmapro.2021.03.053.
URL http://dx.doi.org/10.1016/j.jmapro.2021.03.053
[67] X. Yan, S. Melkote, Automated manufacturability analysis and machining process selection using deep generative model and siamese neural networks, Journal of Manufacturing Systems 67 (2023) 57–67. doi:https://doi.org/10.1016/j.jmsy.2023.01.006.
[68] W. Liu, Z. Wang, L. Tian, S. Lauria, X. Liu, Melt pool segmentation for additive manufacturing: A generative adversarial network approach, Computers and Electrical Engineering 92 (2021) 107183. doi:https://doi.org/10.1016/j.compeleceng.2021.107183.
[69] M. Greminger, Generative Adversarial Networks With Synthetic Training Data for Enforcing Manufacturing Constraints on Topology Optimization, in: International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Vol. Volume 11A: 46th Design Automation Conference (DAC), 2020, p. V11AT11A005. doi:10.1115/DETC2020-22399.
URL https://doi.org/10.1115/DETC2020-22399
[70] S. Ji, J. Zhu, Y. Yang, H. Zhang, Z. Zhang, Z. Xia, Z. Zhang, Self-attention-augmented generative adversarial networks for data-driven modeling of nanoscale coating manufacturing, Micromachines 13 (6) (2022) 847. doi:10.3390/mi13060847.
URL http://dx.doi.org/10.3390/mi13060847
[71] J. P. Killgore, T. J. Kolibaba, B. W. Caplins, C. I. Higgins, J. D. Rezac, A data‐driven approach to complex voxel predictions in grayscale digital light processing additive manufacturing using u‐nets and generative adversarial networks, Small 19 (50) (Jul. 2023). doi:10.1002/smll.202301987.
URL http://dx.doi.org/10.1002/smll.202301987
[72] D. Wu, K. Hur, Z. Xiao, A gan-enhanced ensemble model for energy consumption forecasting in large commercial buildings, IEEE Access 9 (2021) 158820–158830. doi:10.1109/ACCESS.2021.3131185.
[73] G. Wang, A. Ledwoch, R. M. Hasani, R. Grosu, A. Brintrup, A generative neural network model for the quality prediction of work in progress products, Applied Soft Computing 85 (2019) 105683. doi:10.1016/j.asoc.2019.105683.
URL http://dx.doi.org/10.1016/j.asoc.2019.105683
[74] C. Liu, D. Tang, H. Zhu, Q. Nie, A novel predictive maintenance method based on deep adversarial learning in the intelligent manufacturing system, IEEE Access 9 (2021) 49557–49575. doi:10.1109/ACCESS.2021.3069256.
[75] M. Shah, V. Vakharia, R. Chaudhari, J. Vora, D. Y. Pimenov, K. Giasin, Tool wear prediction in face milling of stainless steel using singular generative adversarial network and lstm deep learning models, The International Journal of Advanced Manufacturing Technology 121 (1–2) (2022) 723–736. doi:10.1007/s00170-022-09356-0.
URL http://dx.doi.org/10.1007/s00170-022-09356-0
[76] A. Harfoush, A. Tabei, K. R. Haapala, I. Ghamarian, A framework for predicting grain morphology during incremental sheet metal forming using generative adversarial networks, Manufacturing Letters 35 (2023) 1081–1088, 51st SME North American Manufacturing Research Conference (NAMRC 51). doi:https://doi.org/10.1016/j.mfglet.2023.08.083.
[77] S. R. Chhetri, A. B. Lopez, J. Wan, M. A. Al Faruque, Gan-sec: Generative adversarial network modeling for the security analysis of cyber-physical production systems, in: 2019 Design, Automation and Test in Europe Conference and Exhibition (DATE), 2019, pp. 770–775. doi:10.23919/DATE.2019.8715283.
[78] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, R. Jin, Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting, in: International conference on machine learning, PMLR, 2022, pp. 27268–27286.
[79] Y. Nie, N. H. Nguyen, P. Sinthong, J. Kalagnanam, A time series is worth 64 words: Long-term forecasting with transformers (2022). doi:10.48550/ARXIV.2211.14730.
URL https://arxiv.org/abs/2211.14730