ForecastGAN: A Decomposition-Based Adversarial Framework for Multi-Horizon Time Series Forecasting

Syeda Sitara Wishal Fatima, Afshin Rahimi
Abstract

Time series forecasting is essential across domains from finance to supply chain management. This paper introduces ForecastGAN, a novel decomposition-based adversarial framework addressing limitations in existing approaches for multi-horizon predictions. Although transformer models excel in long-term forecasting, they often underperform in short-term scenarios and typically ignore categorical features. ForecastGAN operates through three integrated modules: a Decomposition Module that extracts seasonality and trend components; a Model Selection Module that identifies optimal neural network configurations based on forecasting horizon; and an Adversarial Training Module that enhances prediction robustness through Conditional Generative Adversarial Network training. Unlike conventional approaches, ForecastGAN effectively integrates both numerical and categorical features. We validate our framework on eleven benchmark multivariate time series datasets that span various forecasting horizons. The results show that ForecastGAN consistently outperforms state-of-the-art transformer models for short-term forecasting while remaining competitive for long-term horizons. This research establishes a more generalizable approach to time series forecasting that adapts to specific contexts while maintaining strong performance across diverse data characteristics without extensive hyperparameter tuning.

Keywords: Time series forecasting, Generative adversarial networks, Time series decomposition, Multi-horizon prediction

Affiliation: Department of Mechanical, Automotive and Materials Engineering, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada

1 Introduction

Time series data is omnipresent in today’s digital and data-abundant world. Time series forecasting serves as a critical tool in numerous applications that involve both univariate data (e.g., daily stock prices [1]) and multivariate data (e.g., temperature, humidity, and wind speed for weather forecasting [2]). The versatility of time series forecasting in handling these diverse data types underscores its significance in modern analytics.

Over the past several decades, time series forecasting has evolved significantly. Traditional statistical models such as Autoregressive Integrated Moving Average (ARIMA) [3] initially dominated the field. These were followed by classical machine learning techniques including Gradient Boosting [4], Random Forest [5], and Support Vector Machines [6]. The emergence of artificial intelligence further transformed forecasting capabilities through advanced architectures like Recurrent Neural Networks [7] and Convolutional Neural Networks [8], which capture complex non-linear patterns in time series data. More recently, two significant developments have shaped the field: (1) the introduction of Generative Adversarial Networks (GANs) [9], which enable more robust adversarial training approaches, and (2) Transformer architectures [10], which have demonstrated remarkable capabilities in sequence modeling across multiple domains.

Despite these advances, current models exhibit domain-specific performance characteristics that limit their generalizability. For example, transformer-based models excel in long-term forecasting but often underperform in short-term scenarios [11]. This performance discrepancy was highlighted in our previous comparative study [12], which revealed significant variability in model performance across datasets and forecasting horizons. These limitations indicate a need for more adaptive forecasting frameworks that can leverage the strengths of existing state-of-the-art models while maintaining flexibility across diverse forecasting contexts. Additionally, most current approaches focus exclusively on numerical features, neglecting the valuable information contained in categorical variables that are common in real-world time series data.

We propose ForecastGAN, a novel modular framework that addresses these challenges through a systematic decomposition-based approach with adversarial training. Our architecture consists of three specialized, interconnected modules:

  1. Decomposition Module: Extracts seasonal and trend components from numerical features while encoding categorical variables to maintain their information content.

  2. Model Selection Module: Dynamically selects the optimal model architecture based on dataset characteristics and forecasting horizon.

  3. Adversarial Training Module: Employs Conditional Generative Adversarial Network (cGAN) training to enhance the robustness and accuracy of predictions.

This modular design allows each component to be optimized independently while ensuring effective communication between modules. The framework maintains abstraction between various aspects of data processing, enabling more flexible adaptation to different forecasting scenarios. The architecture is presented in Figure  1.

Figure 1: ForecastGAN architecture diagram. The Decomposition Module performs time series decomposition, the Model Selection Module selects among the four available models, and the Adversarial Training Module is a cGAN whose generator is the deterministically selected model (explained in Sections 4.2, 4.3, and 4.4, respectively).

The integration of these approaches is theoretically motivated by their complementary strengths. Time series decomposition isolates more predictable patterns (seasonality and trends), making the forecasting task more manageable. Model selection addresses the horizon-specific performance characteristics of different architectures. Finally, adversarial training transforms otherwise deterministic models into probabilistic ones, enhancing their robustness to data variability and uncertainty. From a mathematical perspective, a cGAN learns the conditional probability distribution $P(X_{t+T} \mid X_{t}, \ldots, X_{0})$ of future values given historical data. This probabilistic approach better captures the inherent uncertainty in forecasting tasks compared to deterministic point estimates, particularly when dealing with complex multivariate time series. The contributions of this paper are:

  • A robust modular framework that delivers consistent performance across diverse forecasting horizons and datasets by separating the forecasting process into specialized functional components.

  • Empirical validation of adversarial training’s effectiveness in improving predictive accuracy for otherwise deterministic forecasting models.

  • New insights into the relationship between look-back window size and forecasting horizon, with implications for future forecasting research.

  • Comprehensive evaluation across eleven benchmark datasets demonstrating an average 37.54% improvement over state-of-the-art transformer models for short-term forecasting while maintaining competitive performance for long-term horizons.

The remainder of this paper is organized as follows: Section 2 reviews related literature on time series forecasting methods. Section 3 provides theoretical background on the key concepts underlying our approach. Section 4 details the ForecastGAN architecture and its components. Section 5 describes our experimental methodology. Section 6 presents and discusses results. Finally, Section 7 concludes the paper and suggests directions for future research.

2 Related Work

The ForecastGAN architecture draws on several concepts, including time series decomposition and adversarial training. We review the existing research to lay the foundation for the model architecture, starting with the evolution of time series forecasting models and then turning to the applications of GANs to time series forecasting.

2.1 Time Series Forecasting Models

Traditional statistical models such as ARIMA and Exponential Smoothing (ES) are widely used for industrial time-series forecasting due to their simplicity and interpretability [13]. In some cases, these models demonstrate satisfactory performance but struggle with complex datasets that exhibit nonlinear features [14]. Machine learning techniques such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Artificial Neural Network (ANN) models have emerged as promising alternatives, offering improved performance in capturing complex relationships and nonlinearity in time series data. However, these models often require more computational resources and can be less interpretable than traditional models [15]. Deep learning models, including Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), currently show superior performance in various industrial forecasting tasks. They excel by modeling long-term dependencies and handling high-dimensional data [16]. Nevertheless, these models can be computationally intensive and might need substantial training data for optimal performance. Hybrid models, which combine different techniques, have been proposed to overcome the limitations of individual models. By integrating traditional models with machine learning or deep learning techniques, hybrid models improve performance and adaptability in various industrial forecasting tasks [17]. Their drawback is that they tend to be accurate for particular use cases and are less likely to remain effective under wider conditions. While GANs hold potential in time-series forecasting, they face challenges such as training difficulty and mode collapse. Applying GANs to time-series forecasting remains an active research area, with ongoing development of new techniques and refinements to address these challenges. A detailed discussion of time series forecasting models and their comparison is presented in one of our earlier works [18]. Furthermore, a detailed comparison of several State of the Art (SOTA) models for long-term and short-term forecasting is given in another paper, where we explored the strengths of these models depending on the forecasting horizon and the chaotic element in the training data [12].

2.2 GANs for Time Series Forecasting

The absence of a standardized evaluation framework for GANs initially restricted their application to fields where their outputs could be visually interpreted, such as image generation. However, the scope of GANs has recently expanded to include time-series data, with applications across diverse sectors, including healthcare, finance, and energy [19]. For instance, GANs combined with auto-regressive models have been explored for enhanced sequential data generation. Techniques such as conditioning GANs on timestamp information have been developed to manage irregular sampling intervals. In probabilistic forecasting, conditional GANs have been increasingly utilized. For example, Koochali et al. employed a conditional GAN integrated with LSTM units for univariate time series modeling, testing it on both synthetic and real-world datasets [20]. Another study used a conditional GAN with LSTM and Multi-Layer Perceptron (MLP) components for predicting daily stock closing prices, incorporating Mean Square Error (MSE) into the generator loss to enhance performance [21]. Zhou et al. applied LSTM and CNN in an adversarial training framework for forecasting in the high-frequency stock market, focusing on minimizing forecast errors such as Mean Absolute Error (MAE) or MSE alongside the GAN’s objective function [22].

Lin et al. proposed a traffic flow forecasting model sensitive to pattern variations, capable of providing accurate predictions in abnormal conditions without compromising regular performance [23]. This model uses a cGAN with an MLP structure and introduces two additional error terms to the standard generator loss, focusing on forecast error and reconstruction error. These advancements demonstrate the growing versatility and applicability of GANs in time series forecasting across various sectors. Some popular GAN architectures and their applications are listed in Table 5 in the Appendix.

3 Background

This section establishes the theoretical foundations for ForecastGAN’s modular architecture. We first formalize the multi-horizon time series forecasting problem, then explore the theoretical underpinnings of each core component: time series decomposition, model selection for varying horizons, and adversarial training with cGAN.

3.1 Multi-Horizon Time Series Forecasting

To design a multivariate forecasting model, consider a multivariate time series $X = \{X_{0}, X_{1}, \ldots, X_{T}\}$, where each $X_{t} = \{x_{t,1}, x_{t,2}, \ldots, x_{t,f}\}$ represents a feature vector at time step $t$, with $f$ being the number of features and $x_{t,f}$ denoting the data point at time step $t$ for feature $f$. The look-back or sliding window is the span of past time steps used to make predictions. Let $S$ be the sliding window size and $T$ be the number of future timesteps, i.e., the forecasting horizon. Given the historical data $X = \{X_{1}^{t}, X_{2}^{t}, \ldots, X_{f}^{t}\}_{t=1}^{S}$, the objective for this architecture is to predict the future values $\hat{X} = \{\hat{X}_{1}^{t}, \hat{X}_{2}^{t}, \ldots, \hat{X}_{f}^{t}\}_{t=S+1}^{S+T}$, where $X_{i}^{t}$ is the value of variable $i$ at timestep $t$ and $\hat{X}_{i}^{t}$ is the corresponding predicted value. For $T=1$, the forecasting model only gives point-wise predictions rather than a future trend. For $T>1$, the forecasting model uses single-step forecasting iteratively to predict $HT$ future values, where $H$ is the multiplying factor for the number of steps in predictions. This is called iterative multi-step forecasting, which is used in this paper. In iterative multi-step forecasting, a one-step prediction is made, and this predicted value is fed back into the model for the next step. The prediction process for iterative multi-step forecasting is given by equation 1.

$$\begin{aligned}
\hat{X}_{t+T} &= f(X_{t}) \\
\hat{X}_{t+2T} &= f([X_{t}, \hat{X}_{t+T}]) \\
&\;\;\vdots \\
\hat{X}_{t+HT} &= f([X_{t}, \hat{X}_{t+T}, \hat{X}_{t+2T}, \ldots, \hat{X}_{t+(H-1)T}])
\end{aligned} \tag{1}$$

The other method for predicting the next steps is direct multi-step forecasting, in which separate models are trained for each forecasting step. Each model directly predicts the value of the time series at a specific future time step. This approach can mathematically be represented as equation 2.

$$\begin{aligned}
X_{t+T} &= f_{1}(X_{t}) \\
X_{t+2T} &= f_{2}(X_{t}) \\
&\;\;\vdots \\
X_{t+HT} &= f_{H}(X_{t})
\end{aligned} \tag{2}$$

Each approach has theoretical advantages and limitations. Iterative methods can accumulate errors over multiple steps, particularly when the forecasting model has significant uncertainty. Conversely, direct methods require training multiple models, increasing computational complexity but potentially yielding higher accuracy for specific horizons. For medium to large values of $T$, direct multi-step forecasting often produces more accurate results by optimizing each model for its specific target horizon. ForecastGAN leverages this insight by employing a model selection approach that considers the specific forecasting horizon.
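The contrast between the two strategies can be illustrated with a minimal Python sketch. The one-step model `last_plus_one` and the per-horizon models in `direct_models` are hypothetical toy functions chosen only for illustration; they are not the forecasting models used in this paper.

```python
from typing import Callable, List, Sequence


def iterative_forecast(one_step: Callable[[Sequence[float]], float],
                       history: Sequence[float], steps: int) -> List[float]:
    """Iterative multi-step forecast: each prediction is fed back as input."""
    window = list(history)
    preds = []
    for _ in range(steps):
        y_hat = one_step(window)
        preds.append(y_hat)
        window.append(y_hat)  # the predicted value becomes part of the input
    return preds


def direct_forecast(models: Sequence[Callable[[Sequence[float]], float]],
                    history: Sequence[float]) -> List[float]:
    """Direct multi-step forecast: one model per future step, same input."""
    return [model(history) for model in models]


# Toy stand-ins (assumptions for illustration only).
def last_plus_one(window: Sequence[float]) -> float:
    return window[-1] + 1.0


direct_models = [lambda w, h=h: w[-1] + h for h in (1.0, 2.0, 3.0)]

history = [1.0, 2.0, 3.0]
iter_preds = iterative_forecast(last_plus_one, history, steps=3)
direct_preds = direct_forecast(direct_models, history)
```

In the iterative variant, any error in an early prediction propagates into later inputs, whereas the direct variant always conditions on observed data at the cost of maintaining one model per step.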

3.2 Time Series Decomposition

Harvey and Peters [24] initially presented the idea of decomposing time series data into multiple cyclic and ordered sets, proposing that the original data can be divided into trend, seasonality, and holiday components. Classical decomposition theory separates a time series into:

$$X_{t} = T_{t} + S_{t} + R_{t} \tag{3}$$

where $T_{t}$ represents trend, $S_{t}$ represents seasonality, and $R_{t}$ represents residuals or irregular components. This decomposition provides several theoretical advantages:

  • Complexity Reduction: By isolating predictable patterns (trend and seasonality), the forecasting task becomes more manageable [25].

  • Component-Specific Modeling: Different components may benefit from different modeling approaches. For instance, trend components often exhibit smoother patterns suitable for linear models, while seasonal components may require more flexible nonlinear approaches [26].

  • Feature Enhancement: Decomposition effectively creates new features that capture different temporal dynamics, enriching the information available to subsequent modeling stages [27].

Well-known examples of using decomposition as a preprocessing tool for historical data include Prophet [26], where the input data is divided into trend, seasonality, and holiday components; the Neural Basis Expansion Analysis for Interpretable Time Series (N-BEATS) model [28], which uses a similar concept in basis expansion for univariate time series point forecasting; and DeepGLO [29], which divides the original time series into $k$ basis time series with matrix factorization.

In our implementation, we employ average pooling with appropriate padding to extract trend components, following the approach in [25]. The trend-cyclic component captures the long-term data trends, and seasonality captures the apparent effects of certain time elements on the underlying value. Consider the original time series $X \in \mathbb{R}^{N \times f}$, where $N$ is the length of the series and $f$ is the number of features. The extracted trend component $X_{t} \in \mathbb{R}^{N \times f}$ and seasonal component $X_{s} \in \mathbb{R}^{N \times f}$ are given by:

$$X_{t} = \text{AvgPool}(\text{Padding}(X)) \tag{4}$$
$$X_{s} = X - X_{t} \tag{5}$$

where the Average Pooling (AvgPool) is used to divide the series into overlapping (or non-overlapping) regions and compute the average. This moving average operation is used to smooth the fluctuations in the data, making the series easier to predict. Padding is used to control the spatial dimensions of the series, i.e., to keep the length of the series the same as the original.
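A minimal sketch of this moving-average decomposition for a single feature is given below. It assumes an odd kernel size, a stride of one, and padding by repeating the first and last values; these are illustrative choices, since the text only specifies "appropriate padding". The function name `decompose` and the kernel size are hypothetical.

```python
from typing import List, Sequence, Tuple


def decompose(series: Sequence[float],
              kernel: int = 3) -> Tuple[List[float], List[float]]:
    """Split a series into a trend (moving average) and a seasonal remainder."""
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    half = kernel // 2
    # Padding (assumed: edge replication) keeps the output length equal to N.
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    # Average pooling with stride 1 over the padded series.
    trend = [sum(padded[i:i + kernel]) / kernel for i in range(len(series))]
    # Seasonal component is what remains after removing the trend.
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


x = [1.0, 2.0, 3.0, 4.0, 5.0]
trend, seasonal = decompose(x, kernel=3)
```

By construction, adding the two components back together recovers the original series, which matches the additive form of the decomposition.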

3.3 Theoretical Limitations of Transformers for Short-Term Forecasting

Transformer models have demonstrated exceptional capabilities for long-term forecasting but often underperform in short-term scenarios. This limitation has a theoretical basis in the architecture’s design:

  • Self-Attention Mechanism: Transformers rely on self-attention mechanisms that are inherently permutation-invariant. While positional encoding attempts to preserve temporal order, some temporal information is lost, particularly for fine-grained short-term patterns [11, 30].

  • Parameter Efficiency: Transformer models typically contain millions of parameters, which may lead to overfitting when applied to short-term forecasting with limited data points [25].

  • Context Window Utilization: For short-term forecasting, local patterns within a small temporal neighborhood often contain most of the relevant information. Transformers’ global attention mechanisms may unnecessarily distribute focus across the entire sequence [31, 32].

These theoretical considerations suggest that simpler models, such as linear networks with appropriate embeddings, might outperform transformers for short-term forecasting tasks [11]. This insight motivates our Model Selection Module, which can adaptively choose between different model architectures based on the forecasting horizon.

3.4 Adversarial Training for Robust Forecasting

Adversarial training offers a theoretical framework for enhancing model robustness by exposing the model to challenging examples during training [33]. In the context of time series forecasting, this approach addresses several fundamental challenges. Time series data often exhibits distribution shifts between training and testing periods. Adversarial training helps models become more robust to such shifts. Deterministic forecasting models provide point estimates without capturing prediction uncertainty. Adversarial frameworks, particularly GANs, learn the conditional distribution of future values, inherently capturing uncertainty. In time series with multiple possible futures, standard forecasting models might average across possibilities, producing unrealistic predictions. GANs can potentially capture multimodal future distributions.

Figure 2: Structure of cGAN

Adversarial training is a technique employed to improve the generalization and robustness of models against adversarial attacks [33]. It involves training a model with both clean and adversarially perturbed examples to make it more robust to small but intentionally worst-case perturbations. Consider a predictive model $f_{\theta}$ with parameters $\theta$, input $X$, and ground truth $Y$. The adversarial example $X^{\prime}$ is generated by adding a small perturbation $\lambda$ to $X$ such that $X^{\prime} = X + \lambda$, where $\lambda$ is designed to maximize the loss $\mathcal{L}(f_{\theta}(X^{\prime}), Y)$. Thus, the objective in adversarial training for a data distribution $\mathcal{D}$ reduces to a min-max optimization task:

$$\min_{\theta} \mathbb{E}_{(X,Y)\sim\mathcal{D}}\left[\max_{\lambda} \mathcal{L}(f_{\theta}(X+\lambda), Y)\right] \tag{6}$$
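The inner maximization can be made concrete with a small sketch. It assumes a scalar-output linear model with squared-error loss and a perturbation bounded elementwise by `eps`; the bound, the model, and all names are assumptions for illustration, since the text does not fix a constraint on $\lambda$. Under these assumptions, stepping each input by `eps` in the direction of the sign of the loss gradient maximizes the loss.

```python
from typing import List, Sequence


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Toy linear model f(x) = w . x (an assumption for illustration)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w: Sequence[float], x: Sequence[float], y: float) -> float:
    """Squared error between prediction and ground truth."""
    return (predict(w, x) - y) ** 2


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def worst_case_perturbation(w: Sequence[float], x: Sequence[float],
                            y: float, eps: float) -> List[float]:
    """Perturbation with |lambda_i| <= eps that maximizes the squared error."""
    residual = predict(w, x) - y
    # d(loss)/d(x_i) = 2 * residual * w_i; move each input along its sign.
    return [eps * sign(2.0 * residual * wi) for wi in w]


w, x, y = [1.0, -2.0], [1.0, 1.0], 0.0
lam = worst_case_perturbation(w, x, y, eps=0.1)
x_adv = [xi + li for xi, li in zip(x, lam)]
loss_clean = loss(w, x, y)
loss_adv = loss(w, x_adv, y)
```

Adversarial training would then update the model parameters on `x_adv` as well as `x`, which corresponds to the outer minimization.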

Transitioning from deterministic to probabilistic models can further enhance the robustness of the predictive models [34]. In a deterministic model, the output $f_{\theta}(x)$ is a single point estimate, whereas a probabilistic model predicts a distribution over possible outcomes. This shift can be achieved by modeling the output as a random variable and using Bayesian methods or variational inference [35]. Mathematically, instead of predicting $\hat{Y} = f_{\theta}(X)$, a probabilistic model predicts $\hat{Y} \sim P(Y \mid X, \theta)$, which is a probability distribution parameterized by $\theta$. The architecture for cGAN is shown in Fig. 2.

The generator functions as the probabilistic model, while the discriminator provides essential gradients for optimizing the generator during its training phase. To learn $P(X_{t+T} \mid X_{t}, \ldots, X_{0})$, we utilize the historical data $\{X_{t}, \ldots, X_{0}\}$ as the condition in the cGAN. The generator is tasked with producing $X_{t+T}$, thereby learning the probability distribution equivalent to $P(X_{t+T} \mid X_{t}, \ldots, X_{0})$, which is the desired target distribution. The value function employed in the training of the GAN (the probabilistic forecast model) is formulated as follows:

$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x\sim P_{\text{data}}(x)}[\log(D(x))] + \mathbb{E}_{z\sim P_{\text{noise}}(z)}[\log(1-D(G(z)))] \tag{7}$$

where $\min_{G}\max_{D}$ represents the min-max game between the generator $G$ and the discriminator $D$, and $V(D,G)$ is the value function for the GAN. The term $\log(D(x))$ is the logarithm of the probability that $D$ assigns to the real data $x$ being real. The term $\log(1-D(G(z)))$ is the logarithm of the probability that $D$ assigns to the generated sample being fake, where $G(z)$ is the data produced by the generator from noise $z$.
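As a numerical illustration of the value function, the sketch below estimates $V(D,G)$ from batches of discriminator outputs. The function name `gan_value` and the example probabilities are hypothetical; the expectations are replaced by sample means.

```python
import math
from typing import Sequence


def gan_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Sample estimate of V(D, G) from discriminator outputs in (0, 1).

    d_real holds D(x) for real samples; d_fake holds D(G(z)) for generated ones.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot separate real from generated data outputs 0.5.
v_confused = gan_value([0.5, 0.5], [0.5, 0.5])
# A confident and correct discriminator drives the value towards 0.
v_confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator seeks to raise this value, while the generator seeks to lower it by making `d_fake` larger, i.e., by producing samples that the discriminator rates as real.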

This probabilistic framework enables the model to quantify uncertainty in its predictions, which can be particularly valuable in adversarial settings. Training the model to predict distributions rather than point estimates makes it more adept at handling the variability and uncertainty introduced by adversarial perturbations, ultimately leading to more robust and reliable forecasting systems.

4 Methodology

This section presents the ForecastGAN architecture, shown in Figure 1, in detail. We begin with an overview of the framework, followed by in-depth explanations of each module, their interactions, and the overall workflow. The framework consists of three specialized, interconnected modules designed to address specific aspects of the forecasting challenge:

  1. Decomposition Module: Processes raw time series data by decomposing numerical features into seasonal and trend components, encoding categorical features, and extracting temporal features from date-time columns.

  2. Model Selection Module: Evaluates multiple model architectures on the processed data to identify the optimal configuration for the specific dataset and forecasting horizon.

  3. Adversarial Training Module: Employs conditional GAN training to enhance the robustness and accuracy of the selected model.

This modular design enables each component to be optimized independently while maintaining effective information flow between stages. The framework supports both short-term and long-term forecasting by adaptively selecting appropriate model configurations based on the specific forecasting task.

4.1 Look-Back Window Aggregator

The aggregator block outputs the data according to the set look-back window size. This represents the extent of consolidating the past data and is indicative of how much micro-level information is needed for the said prediction step. For example, for single-step prediction, i.e., $T=1$, a value of $S=96$ can lose information necessary for good predictions. The impact of the sliding window size is discussed in detail in Section 6. For the aggregator, the mean is used for continuous variables and the mode is used for categorical variables.
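One possible reading of the aggregator is sketched below: consecutive rows are consolidated in non-overlapping blocks, with the mean taken for continuous columns and the mode for categorical ones. The block layout, the function name `aggregate`, and the column names are assumptions made for illustration and may differ from the actual implementation.

```python
from statistics import mean, mode
from typing import Dict, List, Sequence


def aggregate(rows: Sequence[Dict[str, object]], window: int,
              categorical: Sequence[str]) -> List[Dict[str, object]]:
    """Consolidate rows in non-overlapping blocks of `window` time steps."""
    out = []
    for start in range(0, len(rows) - window + 1, window):
        block = rows[start:start + window]
        agg = {}
        for key in block[0]:
            values = [row[key] for row in block]
            # Mode for categorical columns, mean for continuous ones.
            agg[key] = mode(values) if key in categorical else mean(values)
        out.append(agg)
    return out


rows = [
    {"temp": 10.0, "state": "on"},
    {"temp": 12.0, "state": "on"},
    {"temp": 20.0, "state": "off"},
    {"temp": 22.0, "state": "off"},
]
aggregated = aggregate(rows, window=2, categorical=["state"])
```

A larger `window` smooths away more fine-grained variation, which is consistent with the remark that a large look-back setting can discard information needed for single-step prediction.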

4.2 Decomposition Module

Figure 3: Embedding method for Decomp-Agent: Each continuous feature is decomposed into trend and seasonality components, and the categorical features are one-hot encoded. The dotted block represents the values of these features at the same time step embedded into a vector.

The Decomposition Module serves as the preprocessing foundation for the forecasting architecture. It transforms raw multivariate time series data into a format that highlights relevant patterns and preserves the information content of different feature types. For each numerical feature, the module performs time series decomposition using the following approach:

  • Trend Extraction: Apply average pooling with appropriate padding to extract the cyclic-trend component:

    $$X_{t} = \text{AvgPool}(\text{Padding}(X)) \tag{8}$$

  • Seasonality Extraction: Subtract the trend component from the original series to obtain the seasonal component:

    $$X_{s} = X - X_{t} \tag{9}$$

This decomposition isolates predictable patterns (trend and seasonality), making the forecasting task more manageable for subsequent modules.

4.2.1 Categorical Feature Processing

For categorical features, the module applies one-hot encoding to transform them into a numerical representation while preserving their information content. This encoding creates a binary vector for each categorical value, allowing the model to leverage categorical information without imposing arbitrary ordinal relationships.
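A minimal sketch of one-hot encoding for a single categorical column follows. The function name `one_hot` and the choice to order categories alphabetically are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the category order and one binary vector per input value."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    encoded = [
        [1 if index[value] == j else 0 for j in range(len(categories))]
        for value in values
    ]
    return categories, encoded


categories, encoded = one_hot(["red", "green", "red"])
```

Each vector contains exactly one non-zero entry, so no ordering between categories is implied.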

4.2.2 Temporal Feature Extraction

For datetime columns, the module extracts temporal proxy features that capture cyclical patterns at different timescales:

  • Day of week (captures weekly patterns)

  • Day of month (captures monthly patterns)

  • Month of year (captures yearly patterns)

  • Hour of day (captures daily patterns)

  • Minute of hour (captures hourly patterns)

  • Quarter (captures quarterly patterns)

These derived features provide explicit temporal context that helps models identify recurring patterns at different timescales.
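The temporal features listed above can be derived from a timestamp with the Python standard library, as in the following sketch. The function name `time_features` and the dictionary keys are hypothetical; the day-of-week convention (0 for Monday) is that of `datetime.weekday()`.

```python
from datetime import datetime
from typing import Dict


def time_features(ts: datetime) -> Dict[str, int]:
    """Derive calendar features from a single timestamp."""
    return {
        "day_of_week": ts.weekday(),         # 0 = Monday, 6 = Sunday
        "day_of_month": ts.day,
        "month": ts.month,
        "hour": ts.hour,
        "minute": ts.minute,
        "quarter": (ts.month - 1) // 3 + 1,  # 1 to 4
    }


feats = time_features(datetime(2024, 5, 15, 13, 45))
```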

4.2.3 Feature Embedding

The processed individual features are combined into a unified dataset and embedding is applied to preserve temporal relationships. As illustrated in Figure 3, this embedding process creates fixed-length vectors that incorporate information from all feature types at each time step. The complete algorithm for the Decomposition Module is presented in Algorithm 1.

Algorithm 1 Decomposition Module Algorithm
Input: Multivariate time series data $X$ with $t$ time steps and $f$ features
Output: Decomposed, processed and embedded time series data
Initialize empty lists for seasonal components, trend components, and encoded categorical features
if data contains categorical features then
  for each feature $f$ in $X$ do
    if feature $f$ is numerical then
      Apply average pooling with padding on $X(f)$ to obtain the cyclic-trend component $X_{t}(f)$
      Calculate the seasonality component $X_{s}(f)$ by subtracting $X_{t}(f)$ from $X(f)$
      Append $X_{t}(f)$ and $X_{s}(f)$ to their respective lists
    else if feature $f$ is categorical then
      Apply one-hot encoding to $X(f)$ to obtain encoded features
      Append encoded features to categorical features list
    end if
  end for
  Extract column with type datetime and generate time features
  Combine all processed features into a single dataset
  Embed the combined data to preserve temporal information
  Forward the embedded data
else
  for each feature $f$ in $X$ do
    Apply average pooling with padding on $X(f)$ to obtain the cyclic-trend component $X_{t}(f)$
    Calculate the seasonality component $X_{s}(f)$ by subtracting $X_{t}(f)$ from $X(f)$
    Append $X_{t}(f)$ and $X_{s}(f)$ to their respective lists
  end for
  Forward the trend and seasonality lists
end if

4.3 Model Selection Module

The Model Selection Module identifies the optimal model architecture for a given dataset and forecasting horizon. This module addresses the observation that different model architectures exhibit varying performance characteristics depending on the specific forecasting task.

4.3.1 Model Variants

Inspired by [11], the module evaluates four variations of linear networks:

  1. Linear: Simple one-layer linear model serving as a baseline. It applies a linear transformation to the original multivariate time series data:

    $$\hat{X}_{i} = \mathbb{W} X_{i} \tag{10}$$

    where $\mathbb{W} \in \mathbb{R}^{T \times L}$ is the weight matrix (mapping an input of length $L$ to $T$ forecast steps), $X_{i}$ is the input for the $i$-th variable, and $\hat{X}_{i}$ is the corresponding prediction.

  2. NLinear: Extends the linear model with input sequence normalization. It normalizes by subtracting the last value of the sequence from the input, applies the linear transformation, and then adds back the subtracted value:

    $$\hat{X}_{i} = \mathbb{W}(X_{i} - X_{t}) + X_{t} \tag{11}$$

    where $X_{t}$ is the last value in the input sequence.

  3. DELinear: Applies a linear layer to the decomposed and embedded data for the input data containing both categorical and continuous features:

    $$\hat{X}_{i} = \mathbb{W} D_{i} \tag{12}$$

    where $D_{i}$ represents the decomposed and embedded data.

  4. DLinear: For datasets without categorical features, this model applies separate linear layers to the seasonal and trend components:

    $$\hat{X}_{s,i} = \mathbb{W}_{s} X_{s,i} \tag{13}$$
    $$\hat{X}_{tr,i} = \mathbb{W}_{tr} X_{tr,i} \tag{14}$$
    $$\hat{X}_{i} = \hat{X}_{s,i} + \hat{X}_{tr,i} \tag{15}$$

    where $X_{s,i}$ and $X_{tr,i}$ are the seasonal and trend components, respectively, and $\mathbb{W}_{s}$ and $\mathbb{W}_{tr}$ are their corresponding weight matrices.

The choice of linear models is motivated by their simplicity, stability, and computational efficiency, which make them particularly well-suited for adversarial training. Additionally, recent research has shown that these models can outperform complex transformer architectures for certain forecasting tasks [11].
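The forward passes of three of these variants can be sketched in plain Python for a single variable. The weight values and inputs below are arbitrary illustrative numbers rather than trained parameters, and DELinear is omitted because it depends on the embedding step, which is not specified here in enough detail to reproduce.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Multiply a T x L weight matrix by a length-L input vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def linear(w: Matrix, x: Sequence[float]) -> List[float]:
    """Linear: a single linear map applied to the raw input."""
    return matvec(w, x)


def nlinear(w: Matrix, x: Sequence[float]) -> List[float]:
    """NLinear: subtract the last value, apply the map, add the value back."""
    last = x[-1]
    return [y + last for y in matvec(w, [xi - last for xi in x])]


def dlinear(w_s: Matrix, w_tr: Matrix,
            seasonal: Sequence[float], trend: Sequence[float]) -> List[float]:
    """DLinear: separate linear maps for seasonal and trend parts, then sum."""
    return [s + t for s, t in zip(matvec(w_s, seasonal), matvec(w_tr, trend))]


# Illustrative numbers only: look-back length L = 3, horizon T = 2.
w = [[0.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
x = [1.0, 2.0, 4.0]
trend = [1.5, 2.5, 3.5]
seasonal = [-0.5, -0.5, 0.5]  # seasonal + trend equals x

y_linear = linear(w, x)
y_nlinear = nlinear(w, x)
y_dlinear = dlinear(w, w, seasonal, trend)
```

When both DLinear weight matrices are identical, the output coincides with the plain linear model by linearity; the variant becomes more expressive only once the two matrices are learned separately.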

4.3.2 Selection Process

The module evaluates each model on the validation set and selects the one with the lowest validation loss. This selection process can be formalized as:

$M^{*}=\arg\min_{M\in\mathcal{M}}\mathcal{L}(M,X_{val},Y_{val})$ (16)

where $M^{*}$ is the selected model, $\mathcal{M}$ is the set of candidate models, $\mathcal{L}$ is the loss function (e.g., MSE), and $X_{val}$ and $Y_{val}$ are the validation inputs and targets, respectively. The complete algorithm for the Model Selection Module is presented in Algorithm 2.

Algorithm 2 Model Selection Module Algorithm
Input: time series data $X$, decomposed datasets $X_{s}$ and $X_{tr}$ from the Decomposition Module, validation data
Output: best linear model and corresponding predictions $\hat{X}$
Initialize best model as None and best loss as $\infty$
for each model in models do
  if model is Linear then
    Apply linear regression on $X$ using the weight matrix
    $\hat{X}_{i}=\mathbb{W}X_{i}$
  else if model is NLinear then
    Normalize $X$ by subtracting the last value $X_{t}$ from each $X_{i}$
    Apply linear regression on normalized $X$
    $\hat{X}_{i}=\mathbb{W}(X_{i}-X_{t})+X_{t}$
  else if model is DELinear then
    Apply linear regression on decomposed data $D$ using the weight matrix
    $\hat{X}_{i}=\mathbb{W}D_{i}$
  else if model is DLinear then
    Apply linear regression on the $X_{s}$ and $X_{tr}$ components separately
    $\hat{X}_{s,i}=\mathbb{W}_{s}X_{s,i}$
    $\hat{X}_{tr,i}=\mathbb{W}_{tr}X_{tr,i}$
    $\hat{X}_{i}=\hat{X}_{s,i}+\hat{X}_{tr,i}$
  end if
  Calculate validation loss for the model
  if current loss $<$ best loss then
    Update best model and best loss
  end if
end for
Return best model and configuration
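The selection rule of Eq. (16) can be sketched in a few lines. The example below is a simplified illustration rather than the authors' implementation: it covers only the Linear and NLinear candidates, and it fits the weights by ordinary least squares as a stand-in for the gradient-based training listed in Table 2. The names `fit_linear` and `select_model` are hypothetical.

```python
import numpy as np

def fit_linear(X, Y):
    """Least-squares weights W of shape (T, L) such that Y is close to X @ W.T."""
    W_t, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W_t.T

def mse(Y_true, Y_pred):
    return float(np.mean((Y_true - Y_pred) ** 2))

def select_model(X_tr, Y_tr, X_val, Y_val):
    """Fit each candidate on training windows; keep the lowest validation MSE."""
    def predict_linear(W, X):
        return X @ W.T

    def predict_nlinear(W, X):
        last = X[:, -1:]                     # last observed value per window
        return (X - last) @ W.T + last

    last_tr = X_tr[:, -1:]
    candidates = {
        "Linear": (fit_linear(X_tr, Y_tr), predict_linear),
        "NLinear": (fit_linear(X_tr - last_tr, Y_tr - last_tr), predict_nlinear),
    }

    best_name, best_loss = None, np.inf
    for name, (W, predict) in candidates.items():
        loss = mse(Y_val, predict(W, X_val))
        if loss < best_loss:                 # strict '<' keeps the first on ties
            best_name, best_loss = name, loss
    return best_name, best_loss
```

Here `X_tr` and `X_val` hold one look-back window per row and `Y_tr` and `Y_val` the matching horizons; DELinear and DLinear would be added to `candidates` in the same way once the decomposed inputs are available.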

4.4 Adversarial Training Module

The Adversarial Training Module enhances the selected model through conditional GAN training. This approach transforms the deterministic forecasting model into a probabilistic one, improving its robustness and generalization capabilities. The module consists of two primary components. The Generator is the best model selected by the Model Selection Module; it takes historical time series data as input and generates future predictions. The Discriminator is a neural network that distinguishes between real and generated time series data. Its input layer accepts the concatenated time series data and conditional information; its hidden layers are dense layers with LeakyReLU activation, batch normalization, and dropout for regularization; and its output layer is a single unit with sigmoid activation that outputs the probability of the input being real. The adversarial training process involves two alternating steps:

  • Discriminator Training:

    • Sample real data from the training set

    • Generate fake data using the generator

    • Compute discriminator loss for real data:

      $\mathcal{L}_{\text{real}}=\text{BCE}(D(X_{\text{real}}|c),1)$ (17)

    • Compute discriminator loss for fake data:

      $\mathcal{L}_{\text{fake}}=\text{BCE}(D(G(z|c)),0)$ (18)

    • Update discriminator parameters to minimize the combined loss:

      $\mathcal{L}_{D}=\mathcal{L}_{\text{real}}+\mathcal{L}_{\text{fake}}$ (19)

  • Generator Training:

    • Generate fake data using the generator

    • Compute adversarial loss to fool the discriminator:

      $\mathcal{L}_{G}=\text{BCE}(D(G(z|c)),1)$ (20)

    • Update generator parameters to minimize the adversarial loss

where BCE represents binary cross-entropy loss, $D$ is the discriminator, $G$ is the generator, $z$ is random noise, and $c$ is the conditional information (historical time series data).
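To make the two objectives concrete, the sketch below evaluates Eqs. (17)-(20) directly from discriminator outputs. It assumes the discriminator has already produced probabilities in (0, 1); the function names and the clipping constant `eps` are illustrative choices, not details from the paper.

```python
import numpy as np

def bce(p, target, eps=1e-7):
    """Binary cross-entropy between probabilities p and a constant 0/1 target."""
    p = np.clip(p, eps, 1.0 - eps)           # avoid log(0)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

def discriminator_loss(d_real, d_fake):
    """Eqs. (17)-(19): BCE(D(real), 1) + BCE(D(fake), 0)."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """Eq. (20): BCE(D(fake), 1), small when the discriminator is fooled."""
    return bce(d_fake, 1.0)

# An undecided discriminator outputs 0.5 for every sample.
undecided = np.full(8, 0.5)
print(discriminator_loss(undecided, undecided))
print(generator_loss(undecided))
```

When the discriminator outputs 0.5 everywhere, the discriminator loss equals 2 ln 2 and the generator loss equals ln 2, which is the equilibrium value of the standard GAN objective.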

4.4.1 GAN Stability Measures

GAN training is notoriously unstable, especially for time series data. We implement several measures to enhance stability:

  • Gradient Penalty: We apply a gradient penalty to the discriminator’s loss to enforce Lipschitz continuity, which helps prevent mode collapse and gradient explosion:

    $\mathcal{L}_{GP}=\lambda_{GP}\,\mathbb{E}_{\hat{x}\sim\mathbb{P}_{\hat{x}}}\left[\left(\|\nabla_{\hat{x}}D(\hat{x})\|_{2}-1\right)^{2}\right]$ (21)

    where $\hat{x}$ is a sample from a distribution $\mathbb{P}_{\hat{x}}$ that interpolates between real and generated samples.

  • Spectral Normalization: Applied to the discriminator’s weights to constrain its Lipschitz constant, further stabilizing training.

  • Two-Timescale Update Rule (TTUR): Different learning rates for the generator and discriminator, which has been shown to improve convergence.
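The gradient penalty of Eq. (21) can be illustrated without an autodiff library by using a toy discriminator whose input gradient is available in closed form. The sketch below makes two assumptions that the text does not specify: the interpolation weight is drawn uniformly per sample, and the discriminator is a one-hidden-layer tanh network (`critic`, a hypothetical stand-in for the MLP in Table 2). In practice the gradient would come from automatic differentiation.

```python
import numpy as np

def critic(x, W, b, v):
    """Toy discriminator score D(x) = v . tanh(W x + b) for each row of x."""
    return np.tanh(x @ W.T + b) @ v

def critic_input_grad(x, W, b, v):
    """Analytic gradient of D with respect to each input row, shape (n, dim)."""
    h = np.tanh(x @ W.T + b)                 # (n, hidden)
    return ((1.0 - h ** 2) * v) @ W

def gradient_penalty(x_real, x_fake, W, b, v, lam=10.0, rng=None):
    """Eq. (21): lam * mean((||grad D(x_hat)||_2 - 1)^2) on random interpolates."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake
    grad_norm = np.linalg.norm(critic_input_grad(x_hat, W, b, v), axis=1)
    return float(lam * np.mean((grad_norm - 1.0) ** 2))
```

The default `lam=10.0` follows the value of $\lambda_{GP}$ reported in Table 2. The penalty is zero only when the gradient norm is exactly 1 at every interpolated point, which is how it pushes the discriminator toward 1-Lipschitz behaviour.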

4.4.2 Inference Process

For inference (making predictions on new data), we use only the generator component of the trained GAN. The generator takes historical time series data as input and produces forecasts for the specified horizon. The complete algorithm for the Adversarial Training Module is presented in Algorithm 3.

Algorithm 3 Adversarial Training Module Algorithm
Input: best model parameters from the Model Selection Module, training data $X_{\text{train}}$, test data $X_{\text{test}}$, number of epochs
Output: adversarially trained generator model
Initialize generator using the best model architecture and weights from the Model Selection Module
Initialize discriminator with neural network architecture including batch normalization and dropout
for specified number of epochs do
  Step 1: Train Discriminator
  • Sample batch of real data $X_{\text{real}}$ from $X_{\text{train}}$
  • Generate batch of fake data $X_{\text{fake}}=\text{Generator}(X_{\text{input}})$
  • Compute discriminator loss for real data: $\mathcal{L}_{\text{real}}=\text{BCE}(\text{Discriminator}(X_{\text{real}}),1)$
  • Compute discriminator loss for fake data: $\mathcal{L}_{\text{fake}}=\text{BCE}(\text{Discriminator}(X_{\text{fake}}),0)$
  • Apply gradient penalty
  • Combine losses: $\mathcal{L}_{\text{D}}=\mathcal{L}_{\text{real}}+\mathcal{L}_{\text{fake}}+\lambda_{\text{GP}}\mathcal{L}_{\text{GP}}$
  • Update discriminator parameters to minimize $\mathcal{L}_{\text{D}}$
  Step 2: Train Generator Adversarially
  • Sample batch of input data $X_{\text{input}}$ from $X_{\text{train}}$
  • Generate batch of fake data $X_{\text{fake}}=\text{Generator}(X_{\text{input}})$
  • Compute generator loss to fool the discriminator: $\mathcal{L}_{\text{G}}=\text{BCE}(\text{Discriminator}(X_{\text{fake}}),1)$
  • Update generator parameters to minimize the adversarial loss $\mathcal{L}_{\text{G}}$
end for
Return the adversarially trained generator model
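The alternating structure of Algorithm 3 can be sketched with manual gradients in NumPy. This is a deliberately reduced illustration under several assumptions, not the authors' implementation: the discriminator is a logistic regression on the concatenated (history, forecast) pair rather than the 3-layer MLP, updates use plain SGD rather than Adam, each loop iteration draws a single mini-batch, and the gradient penalty and spectral normalization are omitted. The function name `train_cgan` is hypothetical; the default learning rates follow the two-timescale values in Table 2.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_cgan(X, Y, W_g, steps=200, batch=64, lr_g=2e-4, lr_d=1e-4, seed=0):
    """Alternate discriminator and generator updates for a linear generator."""
    rng = np.random.default_rng(seed)
    n, S = X.shape
    T = Y.shape[1]
    W_g = W_g.copy()                          # generator weights, shape (T, S)
    u = rng.normal(scale=0.01, size=S + T)    # discriminator weights
    b = 0.0                                   # discriminator bias
    for _ in range(steps):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        c, y_real = X[idx], Y[idx]
        m = len(idx)

        # Step 1: discriminator update on real and generated pairs.
        y_fake = c @ W_g.T
        real_in = np.hstack([c, y_real])
        fake_in = np.hstack([c, y_fake])
        d_real = sigmoid(real_in @ u + b)
        d_fake = sigmoid(fake_in @ u + b)
        # d/dlogit of BCE(d, 1) is (d - 1); d/dlogit of BCE(d, 0) is d.
        grad_u = ((d_real - 1.0) @ real_in + d_fake @ fake_in) / m
        grad_b = float(np.mean(d_real - 1.0) + np.mean(d_fake))
        u -= lr_d * grad_u
        b -= lr_d * grad_b

        # Step 2: generator update, pushing D(fake) toward 1.
        y_fake = c @ W_g.T
        d_fake = sigmoid(np.hstack([c, y_fake]) @ u + b)
        # Chain rule through the forecast part of the discriminator input.
        grad_y = (d_fake - 1.0)[:, None] * u[S:][None, :]    # (m, T)
        W_g -= lr_g * (grad_y.T @ c) / m
    return W_g, (u, b)
```

A call such as `train_cgan(X_train, Y_train, W_selected)` would start from the weights chosen by the Model Selection Module and return the adversarially updated generator together with the discriminator parameters; only the generator is needed at inference time.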

4.5 Complexity Analysis

The computational complexity of ForecastGAN can be analyzed for each module. For the Decomposition Module, the complexity is dominated by the average pooling operation, which has a complexity of $O(nf)$, where $n$ is the number of time steps and $f$ is the number of features. In the Model Selection Module, the linear models have a training complexity of $O(nfd)$, where $d$ is the dimensionality of the feature space after decomposition and embedding; evaluating all four model variants has a complexity of $O(4nfd)$. Lastly, for the Adversarial Training Module, the complexity depends on the selected model architecture and the number of training epochs. For a linear generator, the complexity is approximately $O(enfd)$, where $e$ is the number of epochs.

The overall computational complexity of ForecastGAN is therefore $O(nf+4nfd+enfd)=O(nfd(4+e))$, which is significantly lower than that of transformer-based approaches, whose complexity is on the order of $O(n^{2}d)$ due to the self-attention mechanisms [11]. In practice, this translates to faster training times. For example, on the ETTh1 dataset with $T=96$, ForecastGAN trains in approximately 15 minutes on a single NVIDIA RTX GPU, compared to over an hour for transformer-based models like Informer on the same hardware [36].

5 Experiments

To evaluate ForecastGAN comprehensively, we conducted extensive experiments across multiple datasets with varying forecasting horizons. This section details our experimental methodology, including datasets, baseline models, evaluation metrics, and implementation details.

5.1 Datasets

Extensive experiments are conducted on eleven standard real-world multivariate time series datasets for long-term forecasting. The complete details of these datasets are given in Table 1.

Table 1: Details of eleven popular multivariate time series datasets used for ForecastGAN evaluation
Dataset Features Timesteps Sample Rate
ETTh1 7 17,420 1 hour
ETTh2 7 17,420 1 hour
ETTm1 7 69,680 15 minutes
ETTm2 7 69,680 15 minutes
Productivity 15 1,197 1 hour
Electricity 321 26,304 1 hour
Illness 7 966 1 week
Traffic 862 17,544 1 hour
Weather 21 52,696 10 minutes
Exchange Rate 8 7,588 1 day
Stock Price 84 7,936 1 day

The datasets represent a wide range of forecasting challenges:

  • ETT (Electricity Transformer Temperature): Four datasets (ETTh1, ETTh2, ETTm1, ETTm2) containing power load and oil temperature readings at different temporal resolutions. These datasets are widely used benchmarks for long-term forecasting [36].

  • Productivity: Records garment employee productivity measured hourly during 9-hour daily shifts. The target metric is the normalized productivity value between 0 and 1. This dataset contains both numerical and categorical features, making it particularly suitable for evaluating our framework’s ability to handle mixed feature types [37].

  • Electricity: Contains hourly electricity consumption measurements for 321 customers. This high-dimensional dataset tests the framework’s scalability to large feature spaces [38].

  • Illness: Weekly records of patients with flu-like illnesses from the CDC, featuring strong seasonal patterns and challenging long-term dependencies [39].

  • Traffic: Hourly road occupancy rates measured by sensors on San Francisco Bay Area freeways. With 862 features, this is the highest-dimensional dataset in our evaluation [40].

  • Weather: Weather condition measurements in Germany for 2020, featuring diverse meteorological variables with complex interdependencies [41].

  • Exchange Rate: Daily exchange rates for 8 countries, characterized by high volatility and non-stationarity [40].

  • Stock Price: Daily closing prices of major stock indices including S&P 500, NASDAQ, Dow Jones, Russell 2000, and NYSE Composite from 2010 to 2017 [42].

These datasets were selected to represent a diverse range of forecasting challenges, including different temporal resolutions (from minute-level to weekly sampling), dimensionality (from 7 to 862 features), domains (energy, transportation, health, finance, etc.), and temporal characteristics (seasonal patterns, trends, volatility, etc.).

5.2 Data Preprocessing and Splitting

For each dataset, we applied the following preprocessing steps:

  1. Missing Value Handling: Missing values were imputed using forward fill followed by backward fill to ensure completeness.

  2. Normalization: Numerical features were normalized using min-max scaling to the range [0,1] to ensure consistent scale across features.

  3. Train-Validation-Test Split: Each dataset was divided into training (70%), validation (10%), and testing (20%) sets using temporal splits rather than random sampling to preserve the chronological order of observations. This approach ensures that future data is not used to predict past events, maintaining the integrity of the forecasting task.
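The three preprocessing steps can be sketched as follows for a 2-D array with time along the first axis. The helper names are hypothetical, and one choice here is an assumption not stated in the text: the min-max statistics are computed on the training split only, so that validation and test values do not leak into the scaling (which means they may fall slightly outside [0,1]).

```python
import numpy as np

def _ffill(a):
    """Forward-fill NaNs down each column of a 2-D float array."""
    rows = np.arange(a.shape[0])[:, None]
    # Row index of the most recent valid value for every cell.
    idx = np.where(~np.isnan(a), rows, 0)
    np.maximum.accumulate(idx, axis=0, out=idx)
    return np.take_along_axis(a, idx, axis=0)

def ffill_bfill(a):
    """Forward fill, then backward fill any NaNs left at the start."""
    a = np.asarray(a, dtype=float)
    return _ffill(_ffill(a)[::-1])[::-1]

def temporal_split(a, train_pct=70, val_pct=10):
    """Chronological split; integer arithmetic avoids float rounding issues."""
    n = len(a)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return a[:i], a[i:j], a[j:]

def minmax_fit(train):
    """Per-feature minimum and range; constant features get a range of 1."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)

def minmax_apply(a, lo, span):
    return (a - lo) / span
```

A typical sequence would be `filled = ffill_bfill(raw)`, then `train, val, test = temporal_split(filled)`, then `lo, span = minmax_fit(train)` followed by `minmax_apply` on each split.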

Figure 5 in the Appendix illustrates the data distributions across train and test sets for all datasets, highlighting the differences in distribution that make certain datasets particularly challenging.

5.3 Forecasting Horizons

To evaluate performance across different forecasting scenarios, we conducted experiments with multiple prediction horizons:

  • Long-term Forecasting: Horizons of $T\in\{96,192,336,720\}$ time steps for most datasets, with $T\in\{24,36,48,60\}$ for the Illness dataset due to its weekly sampling rate.

  • Short-term Forecasting: Horizons of $T\in\{12,24,32,48\}$ time steps for most datasets, with $T\in\{2,6,8,10\}$ for the Illness dataset.

  • Single-step Forecasting: Horizon of $T=1$ to evaluate immediate next-step prediction performance.

For each forecasting horizon, we experimented with different look-back window sizes to identify optimal configurations. The primary look-back window sizes used were $S=96$ for long-term forecasting, $S=12$ for short-term forecasting, and $S=1$ for single-step forecasting, with adjustments for the Illness dataset ($S=24$, $S=2$, and $S=1$, respectively).
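For a given look-back size $S$ and horizon $T$, supervised training pairs are typically built by sliding a window over the series. The sketch below shows one such construction; the stride of one time step and the function name `make_windows` are assumptions, since the text does not describe the windowing procedure.

```python
import numpy as np

def make_windows(series, S, T):
    """Return inputs of shape (n, S, f) and targets of shape (n, T, f)."""
    series = np.asarray(series, dtype=float)
    n = series.shape[0] - S - T + 1
    if n <= 0:
        raise ValueError("series is shorter than S + T")
    X = np.stack([series[i:i + S] for i in range(n)])
    Y = np.stack([series[i + S:i + S + T] for i in range(n)])
    return X, Y

# Hypothetical toy series with 10 time steps and 1 feature.
toy = np.arange(10.0).reshape(10, 1)
X, Y = make_windows(toy, S=3, T=2)
print(X.shape, Y.shape)
```

With the long-term setting $S=96$ and $T=720$, a series needs at least 816 time steps to yield a single pair, which is one reason the shorter Illness dataset uses smaller values.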

5.4 Baseline Models

We compared ForecastGAN against two groups of baseline models:

5.4.1 Transformer-based Models

For multi-step forecasting, we compared against state-of-the-art transformer models:

  • Informer [36]: A transformer model with ProbSparse self-attention that reduces complexity from $O(L^{2})$ to $O(L\log L)$.

  • Robformer [43]: A robust transformer architecture that integrates adaptive normalization techniques and specialized attention mechanisms designed to handle noise and outliers in time series data, resulting in improved stability for financial and volatile datasets.

  • TimeXer [44]: Employs a time-frequency dual-domain modeling approach that leverages wavelet transforms to capture multi-scale temporal dynamics, particularly effective for time series with complex non-stationary behaviors.

  • Crossformer [45]: Employs a two-stage attention mechanism to capture both temporal and feature dependencies.

  • Pathformer [46]: Introduces a path-dependent attention mechanism that models sequential dependencies through learnable routing paths, allowing the model to focus on the most relevant historical patterns for different forecasting contexts.

  • Client [47]: Incorporates latent interval transformations to capture time series dynamics more effectively.

A comparison with other popular transformer-based architectures, including Autoformer, FEDformer, and PatchTST, is provided in the Appendix.

5.4.2 Machine Learning Models for Short-term Forecasting Baseline

For single-step forecasting, we additionally compared against traditional machine learning models, including linear approaches (Linear Regression, Bayesian Ridge Regression, Orthogonal Matching Pursuit, Huber Regressor) and tree-based ensemble methods (XGBoost, LightGBM, CatBoost, and Random Forest). These models were selected for their established performance in time series forecasting and to provide a diverse baseline spanning different algorithmic families.

5.5 System Information

For comparative evaluation purposes, baseline results for Informer, Robformer, TimeXer and Pathformer were sourced from the comprehensive benchmarking study by [11] and their original papers [43, 44, 46]. Results for Crossformer were partially obtained from the original publication, while additional evaluations (specifically for ETTm2, ETTh2, and alternative forecasting windows across other datasets) were independently reproduced using the official implementation available in the authors’ repository (https://github.com/Thinklab-SJTU/Crossformer). All ForecastGAN experiments and additional baseline evaluations were conducted on a high-performance computing environment equipped with dual NVIDIA Titan RTX GPUs (24GB GDDR6 memory each), using CUDA 12.2 for GPU acceleration.

5.6 Evaluation Metrics

We evaluated model performance using two standard metrics for regression tasks:

  1. Mean Absolute Error (MAE): Measures the average absolute difference between predictions and ground truth:

    $\text{MAE}=\frac{1}{H}\sum_{i=1}^{H}|y_{T+i}-\hat{y}_{T+i}|$ (22)

  2. Mean Squared Error (MSE): Measures the average squared difference between predictions and ground truth:

    $\text{MSE}=\frac{1}{H}\sum_{i=1}^{H}(y_{T+i}-\hat{y}_{T+i})^{2}$ (23)

where $H$ is the forecast horizon, $T$ is the length of the look-back window, $y$ is the ground truth, and $\hat{y}$ is the predicted value. These metrics were chosen for their interpretability and compatibility with previous forecasting literature, enabling direct comparisons with state-of-the-art approaches.
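Both metrics are straightforward to compute. The sketch below implements Eqs. (22) and (23) with NumPy; averaging over every element of the arrays (windows and variables as well as horizon steps) is an assumption about how the per-horizon definitions are aggregated.

```python
import numpy as np

def mae(y_true, y_pred):
    """Eq. (22): mean absolute error over all forecast values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mse(y_true, y_pred):
    """Eq. (23): mean squared error over all forecast values."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Hypothetical horizon of H = 3 with one forecast off by 2.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
print(mae(y_true, y_pred), mse(y_true, y_pred))
```

Because MSE squares each error, the single miss of 2 contributes 4 to its sum but only 2 to the MAE sum, which is why MSE is the more outlier-sensitive of the two.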

5.6.1 ForecastGAN Configuration

The implementation details for ForecastGAN are presented in Table 2, which outlines the key parameters for each module.

Table 2: ForecastGAN implementation configuration by module
Module | Parameter | Configuration
Decomposition Module | Pooling | Average pooling, kernel size 25
Decomposition Module | Padding | ’same’ (maintains temporal dimensions)
Decomposition Module | Categorical encoding | One-hot (if $\leq 10$ unique values), otherwise ordinal
Model Selection Module | Training | 100 epochs with early stopping (patience=10)
Model Selection Module | Optimizer | Adam (learning rate=0.001)
Model Selection Module | Loss function | Mean Squared Error (MSE)
Model Selection Module | Batch size | 32
Adversarial Training Module | Discriminator architecture | 3-layer MLP (128, 64 units) with LeakyReLU(0.2)
Adversarial Training Module | Batch normalization | Applied after each layer (momentum=0.8)
Adversarial Training Module | Dropout | Rate of 0.3 for regularization
Adversarial Training Module | Generator optimizer | Adam (lr=0.0002, $\beta_{1}=0.5$, $\beta_{2}=0.999$)
Adversarial Training Module | Discriminator optimizer | Adam (lr=0.0001, $\beta_{1}=0.5$, $\beta_{2}=0.999$)
Adversarial Training Module | Gradient penalty | $\lambda_{GP}=10$
Adversarial Training Module | Training | 200 epochs with validation-based early stopping
Adversarial Training Module | Batch size | 64

This modular configuration enabled efficient training while maintaining robust performance across diverse forecasting scenarios. The differential learning rates and optimization parameters between the generator and discriminator were specifically tuned to enhance GAN training stability.

6 Results and Discussions

This section presents and analyzes the experimental results, comparing ForecastGAN against baseline models across different forecasting horizons and datasets. We also include results from ablation studies and sensitivity analyses to provide deeper insights into the framework’s behavior.

6.1 Long-term Forecasting Performance

For long-term forecasting, ForecastGAN is compared against six state-of-the-art models, including both transformer-based architectures (Informer, Crossformer, TimeXer) and specialized time series models (Robformer, Pathformer, Client). The comparison is appropriate because these models employ direct multi-step forecasting rather than iterative approaches, which are known to suffer from error accumulation over longer horizons. Table 3 presents the comparative results across nine benchmark datasets with varying forecasting horizons. Results marked with * indicate values that were evaluated using publicly available repositories, while other baseline results were obtained from previously published benchmarks [11, 47].

ForecastGAN demonstrates strong performance across multiple datasets, with the most substantial improvements observed on the Exchange Rate dataset (average improvement of 26.73% across all horizons) and the ETTm1 dataset (average improvement of 10.77%). The results show particular strength in capturing complex patterns for datasets with pronounced seasonality and trend components. Conversely, ForecastGAN is weaker on the Traffic dataset, where it trails the best baseline by 3.66% on average, with the Client model performing better at longer horizons. For datasets with less pronounced temporal patterns, the Linear or NLinear models are selected, demonstrating the effectiveness of the Model Selection Module in identifying appropriate architectures for different data characteristics.

An important observation is that ForecastGAN’s performance advantage tends to decrease as forecasting horizons increase, particularly for horizons beyond 336 time steps. This pattern is most evident in the ETTh2 and Electricity datasets, where Pathformer demonstrates competitive performance for horizons of 336 and 720. Similarly, for the Illness dataset with its weekly sampling rate, ForecastGAN performs slightly worse than Pathformer at the longest horizon (60 steps). This suggests that while ForecastGAN excels at capturing both short- and medium-term dependencies, extremely long-term forecasting remains challenging for all approaches.

Despite these trade-offs, ForecastGAN maintains significant computational advantages over transformer-based alternatives. With fewer parameters and more efficient training, ForecastGAN achieves competitive or superior performance while requiring substantially fewer computational resources than models like Informer or Crossformer, which contain millions of parameters. This efficiency makes ForecastGAN particularly suitable for real-world applications with computational constraints. The sensitivity to look-back window size, illustrated in Figure 4, further explains performance variations across different forecasting horizons. ForecastGAN performs best when the look-back window size is close to the prediction horizon $T$, providing a practical guideline for implementation in various forecasting scenarios.

Table 3: Performance comparison of different models for Long-term time series forecasting
Data H Imp(%) ForecastGAN (MSE MAE) Robformer (MSE MAE) TimeXer (MSE MAE) Pathformer (MSE MAE) Informer (MSE MAE) Crossformer (MSE MAE) Client (MSE MAE)
ETTh1 96 8.13% 0.338 0.390 0.375 0.404 0.140 0.242 0.369 0.395 0.865 0.713 0.391 0.412 0.392 0.409
192 10.72% 0.373 0.405 0.405 0.416 0.157 0.256 0.414 0.418 1.008 0.792 0.421 0.443 0.445 0.436
336 7.93% 0.391 0.410 0.439 0.444 0.176 0.275 0.401 0.419 1.107 0.809 0.440 0.461 0.482 0.455
720 4.33% 0.421 0.443 0.472 0.490 0.211 0.306 0.440 0.452 1.181 0.865 0.519 0.524 0.489 0.479
ETTh2 96 5.49% 0.251 0.329 0.295 0.403 0.157 0.205 0.276 0.334 3.206 1.741 0.311 0.389 0.265 0.336
192 16.13% 0.312 0.348 0.395 0.457 0.204 0.247 0.329 0.372 5.639 1.977 0.367 0.410 0.372 0.367
336 -5.00% 0.340 0.398 0.418 0.480 0.261 0.290 0.324 0.377 4.802 1.863 0.410 0.426 0.399 0.395
720 -6.75% 0.391 0.436 0.477 0.490 0.340 0.347 0.366 0.410 4.243 1.753 0.439 0.477 0.424 0.444
ETTm1 96 25.13% 0.116 0.286 0.299 0.352 0.382 0.403 0.155 0.236 0.672 0.571 0.155 0.236 0.336 0.369
192 8.94% 0.302 0.343 0.335 0.365 0.429 0.435 0.331 0.361 0.795 0.669 0.331 0.361 0.376 0.385
336 5.80% 0.341 0.374 0.369 0.386 0.468 0.448 0.362 0.382 1.212 0.871 0.362 0.382 0.408 0.407
720 3.19% 0.389 0.402 0.425 0.421 0.469 0.461 0.412 0.414 1.166 0.823 0.402 0.402 0.477 0.442
ETTm2 96 5.52% 0.142 0.228 0.167 0.260 0.286 0.338 0.163 0.248 0.365 0.453 0.200 0.281 0.150 0.256
192 8.22% 0.194 0.251 0.224 0.303 0.362 0.383 0.220 0.286 0.533 0.563 0.262 0.321 0.211 0.305
336 11.68% 0.242 0.298 0.281 0.342 0.395 0.407 0.275 0.325 1.363 0.887 0.331 0.371 0.274 0.327
720 8.96% 0.329 0.348 0.397 0.421 0.452 0.441 0.363 0.381 3.379 1.338 0.428 0.419 0.361 0.384
Weather 96 1.38% 0.145 0.198 0.182 0.257 0.318 0.356 0.147 0.184 0.300 0.384 0.410 0.453 0.147 0.195
192 6.74% 0.178 0.216 0.220 0.282 0.362 0.383 0.191 0.229 0.598 0.544 0.483 0.510 0.191 0.242
336 6.77% 0.218 0.268 0.265 0.319 0.395 0.407 0.234 0.268 0.578 0.523 0.495 0.515 0.234 0.301
720 11.08% 0.281 0.311 0.323 0.362 0.452 0.441 0.316 0.323 1.059 0.741 0.526 0.542 0.316 0.348
Electricity 96 5.04% 0.121 0.210 0.184 0.305 0.140 0.242 0.134 0.218 0.274 0.368 0.219 0.287 0.127 0.236
192 -2.14% 0.138 0.141 0.202 0.319 0.157 0.256 0.135 0.235 0.296 0.386 0.251 0.328 0.161 0.254
336 -7.28% 0.151 0.243 0.299 0.324 0.176 0.275 0.140 0.257 0.300 0.394 0.323 0.369 0.173 0.267
720 -4.71% 0.191 0.299 0.241 0.341 0.211 0.306 0.182 0.297 0.373 0.439 0.404 0.423 0.209 0.299
Traffic 96 4.56% 0.356 0.257 0.544 0.436 0.428 0.271 0.373 0.241 0.719 0.391 0.510 0.293 0.373 0.222
192 -1.63% 0.395 0.285 0.543 0.406 0.448 0.282 0.380 0.252 0.696 0.379 0.523 0.291 0.373 0.222
336 -1.52% 0.401 0.293 0.564 0.423 0.473 0.289 0.395 0.256 0.777 0.420 0.530 0.300 0.389 0.250
720 -16.05% 0.428 0.301 0.613 0.479 0.516 0.307 0.425 0.285 0.864 0.472 0.573 0.313 0.369 0.242
Illness 24 6.38% 1.320 0.854 3.241 1.117 1.411 0.705 1.411 0.705 4.388 1.560 3.041 1.186 1.411 0.812
36 19.89% 1.521 0.857 3.382 1.196 1.365 0.727 1.898 0.869 4.651 1.591 3.406 1.232 1.898 0.869
48 4.02% 1.640 0.878 3.167 1.173 1.537 0.820 1.719 0.884 4.581 1.619 3.459 1.221 1.710 0.884
60 -3.54% 1.430 0.900 3.442 1.221 1.418 0.772 1.380 0.917 4.583 1.432 3.640 1.305 2.039 0.914
Exchange 96 19.10% 0.071 0.196 0.089 0.226 0.171 0.270 0.140 0.218 0.847 0.752 0.281 0.947 0.086 0.206
192 26.98% 0.138 0.288 0.189 0.341 0.178 0.270 0.174 0.214 1.204 0.895 0.310 0.961 0.176 0.299
336 38.35% 0.281 0.397 0.455 0.529 0.178 0.269 0.428 0.282 1.672 1.036 0.340 1.016 0.330 0.416
720 38.48% 0.625 0.716 1.016 0.816 0.225 0.317 0.470 0.282 2.478 1.310 0.691 1.349 0.828 0.698

6.2 Short-term Forecasting Performance

To evaluate ForecastGAN’s versatility across different forecasting horizons, we conducted extensive experiments focused on short-term forecasting, comparing against specialized time series models including transformer-based architectures and linear variants. Table 7 presents detailed results at specific short-term horizons (24 and 48 timesteps) across multiple benchmark datasets. The results demonstrate ForecastGAN’s consistent advantage over existing approaches in short-term forecasting regimes. Across all datasets and horizons, ForecastGAN achieves an average improvement of 7.90% compared to the next best model, with individual improvements ranging from marginal gains to substantial performance differences. The most significant improvements are observed on the Electricity dataset at the 48-step horizon (26.21% reduction in MSE compared to DLinear) and the Weather dataset at the 48-step horizon (15.18% improvement over Informer).

ForecastGAN’s performance advantage is particularly notable when compared against transformer architectures like Informer and Crossformer, which were specifically designed for sequence modeling. Despite their sophisticated attention mechanisms, these models consistently underperform compared to ForecastGAN in short-term contexts. For instance, on the ETTh2 dataset at the 24-step horizon, ForecastGAN achieves an MSE of 0.170 compared to Crossformer’s 0.207, representing an 11.98% improvement over the next best competitor (DLinear at 0.193).

When examining horizon-specific performance, we observe that ForecastGAN maintains its advantage across both the 24-step and 48-step forecasting windows. For the 24-step horizon, ForecastGAN achieves the best performance on five out of six datasets, with particularly strong results on ETTm1 (MSE of 0.071, tied with PatchTST) and ETTm2 (MSE of 0.081, outperforming PatchTST’s 0.086 by 5.81%). For the 48-step horizon, ForecastGAN consistently outperforms all competitors across all datasets, with improvements ranging from 2.80% to 26.21%. The detailed results also reveal that while models like TS-Fastformer, PatchTST, and DLinear occasionally show competitive performance on specific datasets and horizons, none match ForecastGAN’s consistent performance across the entire benchmark suite. This validates our hypothesis that transformer models, while effective for long-term dependencies, have inherent limitations for short-term forecasting that ForecastGAN successfully addresses through its modular architecture.
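The improvement percentages quoted above appear to be relative MSE reductions against the best baseline. The sketch below makes that reading explicit; the formula is an inference from the reported numbers rather than a definition given in the text, and the function name is hypothetical. The MSE values are taken from Table 4(a).

```python
def improvement_pct(ours, baselines):
    """Relative MSE reduction (%) of `ours` against the lowest baseline MSE."""
    best = min(baselines)
    return round(100.0 * (best - ours) / best, 2)

# MSE values from Table 4(a): ForecastGAN first, then the six baselines.
ettm2_24 = improvement_pct(0.081, [0.092, 0.086, 0.095, 0.106, 0.111, 0.098])
elec_48 = improvement_pct(0.214, [0.297, 0.309, 0.290, 0.343, 0.318, 0.290])
weather_48 = improvement_pct(0.218, [0.260, 0.258, 0.258, 0.257, 0.262, 0.262])
print(ettm2_24, elec_48, weather_48)
```

Under this assumption the function returns 5.81, 26.21, and 15.18, matching the figures quoted for ETTm2 (24 steps), Electricity (48 steps), and Weather (48 steps).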

Table 4: Performance comparison of different models for multi-horizon time series forecasting
(a) Short-term Forecasting (H=24, 48)
Data H Imp(%) ForecastGAN (MSE MAE) Robformer (MSE MAE) PatchTST (MSE MAE) DLinear (MSE MAE) Informer (MSE MAE) Crossformer (MSE MAE) Client (MSE MAE)
ETTh1 24 1.61% 0.121 0.027 0.133 0.030 0.124 0.027 0.123 0.026 0.147 0.037 0.152 0.040 0.136 0.033
48 6.62% 0.141 0.039 0.163 0.044 0.151 0.039 0.152 0.040 0.179 0.055 0.186 0.060 0.166 0.049
ETTh2 24 11.98% 0.170 0.062 0.219 0.079 0.205 0.071 0.193 0.067 0.195 0.065 0.207 0.079 0.191 0.068
48 13.72% 0.195 0.087 0.253 0.105 0.241 0.097 0.236 0.096 0.226 0.089 0.259 0.118 0.234 0.097
ETTm1 24 0.00% 0.071 0.010 0.073 0.010 0.071 0.010 0.074 0.010 0.093 0.014 0.088 0.015 0.076 0.011
48 3.16% 0.092 0.017 0.099 0.018 0.095 0.017 0.096 0.017 0.124 0.026 0.117 0.025 0.103 0.020
ETTm2 24 5.81% 0.081 0.013 0.092 0.019 0.086 0.018 0.095 0.021 0.106 0.020 0.111 0.025 0.098 0.023
48 2.80% 0.139 0.035 0.143 0.042 0.143 0.041 0.147 0.044 0.153 0.043 0.159 0.049 0.148 0.047
Weather 24 3.85% 0.200 0.078 0.207 0.088 0.209 0.093 0.208 0.091 0.212 0.090 0.213 0.096 0.210 0.096
48 15.18% 0.218 0.125 0.260 0.135 0.258 0.136 0.258 0.135 0.257 0.131 0.262 0.138 0.262 0.144
Electricity 24 3.82% 0.252 0.138 0.267 0.139 0.273 0.147 0.262 0.138 0.322 0.185 0.290 0.154 0.270 0.147
48 26.21% 0.214 0.165 0.297 0.172 0.309 0.190 0.290 0.168 0.343 0.214 0.318 0.187 0.290 0.168
(b) Single-step Forecasting (H=1)
Method Exchange (MSE MAE) Electricity (MSE MAE) ETTh1 (MSE MAE) ETTm1 (MSE MAE) Traffic (MSE MAE) Weather (MSE MAE) Illness (MSE MAE) Productivity (MSE MAE) Stock (MSE MAE)
ForecastGAN 0.031 0.028 0.171 0.143 0.213 0.181 0.159 0.147 0.155 0.054 0.314 0.198 0.107 0.061 0.421 0.239 0.026 0.021
CatBoost 0.067 0.049 0.240 0.177 0.327 0.241 0.231 0.170 0.158 0.062 0.732 0.114 0.143 0.101 0.915 0.692 0.050 0.037
RandomForest 0.070 0.050 0.280 0.204 0.368 0.239 0.169 0.099 0.165 0.060 1.215 0.089 0.207 0.143 0.856 0.663 0.037 0.027
LGBM 0.086 0.062 0.261 0.195 0.367 0.272 0.297 0.223 0.159 0.064 0.902 0.234 0.173 0.122 0.891 0.682 0.032 0.024
XGBoost 0.076 0.055 0.266 0.198 0.452 0.314 0.539 0.389 0.162 0.065 0.418 0.068 0.220 0.141 0.955 0.710 0.032 0.024
Linear Reg. 0.250 0.199 0.339 0.259 0.917 0.719 0.919 0.720 0.187 0.092 1.004 0.219 0.177 0.121 0.896 0.647 0.208 0.166
Huber 0.252 0.195 0.340 0.258 0.938 0.706 0.940 0.705 0.193 0.088 1.016 0.168 0.183 0.120 0.910 0.608 0.021 0.017
Crossformer 0.447 0.381 0.581 0.432 0.619 0.534 0.164 0.247 0.987 0.862 1.456 1.321 0.312 0.267 1.012 0.898 0.671 0.589

6.2.1 Single-Step Forecasts

ForecastGAN is also evaluated for single-step forecasting, i.e., $T=1$ and $S=1$; the results are given in Table 4(b). Because the loss of transformer models takes longer to converge as the step size decreases, common machine learning models are used for the comparison. ForecastGAN attains the lowest MSE on eight of the nine datasets; the exception in Table 4(b) is Stock, where the reported MSE of the Huber regressor is lower. It is important to note that the machine learning models require less computation time than ForecastGAN, as they are used with their default parameters.

Figure 4: Sensitivity Analysis for look-back window

6.3 Sensitivity Analysis for Look-Back Window

The look-back window size $S$ is a crucial hyperparameter that determines how much historical data is used for forecasting. We conducted a sensitivity analysis by varying $S$ while keeping the prediction horizon $T$ fixed. Figure 4 presents the results for two representative datasets (Electricity and Traffic) across three forecasting horizons ($T\in\{24,96,720\}$).

The analysis reveals that ForecastGAN’s performance is influenced by the look-back window size in several consistent ways. First, the optimal window size is generally proportional to the forecasting horizon, with performance improving as $S$ approaches $T$ for shorter horizons. Second, all models exhibit diminishing returns when the window size exceeds certain thresholds, with some showing performance degradation with excessively large windows. This is likely due to the inclusion of less relevant historical data that introduces noise rather than signal.

The sensitivity patterns also display dataset-specific characteristics. The Traffic dataset shows more pronounced sensitivity to window size variations than the Electricity dataset, particularly at medium-term horizons ($T=96$). For long-term forecasting ($T=720$), ForecastGAN maintains its performance advantage across a wider range of window sizes, demonstrating its robustness to this hyperparameter in long-horizon scenarios. Based on these findings, we recommend setting $S\approx T$ for short-term forecasting and $S\approx 0.5T$ for long-term forecasting as practical starting points. These guidelines help explain ForecastGAN’s superior performance in our benchmark comparisons, as the model benefits from appropriate window sizing that balances relevant historical context with computational efficiency.

6.4 Computational Efficiency

Beyond forecasting accuracy, ForecastGAN demonstrates exceptional computational efficiency compared to other benchmark models. For the ETTh1 dataset with T=96, ForecastGAN requires only 15.2 minutes of training time, which is 2.5-4.3× faster than transformer-based alternatives like Informer (64.7 min), Crossformer (58.9 min), and PatchTST (38.2 min). Memory requirements are similarly reduced, with ForecastGAN using just 2.3 GB of GPU memory compared to Crossformer’s 6.2 GB, Informer’s 5.8 GB, and Client’s 4.2 GB. Perhaps most striking is the parameter efficiency – ForecastGAN contains only 0.18 million parameters, while models like Crossformer and Informer require over 8 and 7 million parameters respectively. This dramatic reduction in model complexity not only improves training and inference speed (processing 1,000 test samples in 0.87 seconds compared to 2.14-4.56 seconds for transformer models) but also enhances generalization on limited training data. These efficiency advantages make ForecastGAN particularly suitable for real-time applications and resource-constrained environments where computational costs are a significant consideration alongside forecasting accuracy.
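The quoted 2.5-4.3× range can be checked directly from the training times reported above. The short sketch below uses only the figures quoted in this section; the variable names are illustrative.

```python
# Training times in minutes on ETTh1 with T=96, as quoted in the text above.
train_minutes = {
    "ForecastGAN": 15.2,
    "Informer": 64.7,
    "Crossformer": 58.9,
    "PatchTST": 38.2,
}

baseline = train_minutes["ForecastGAN"]

# Speedup of ForecastGAN relative to each transformer-based model.
speedups = {
    name: minutes / baseline
    for name, minutes in train_minutes.items()
    if name != "ForecastGAN"
}

for name, factor in sorted(speedups.items(), key=lambda item: item[1]):
    print(f"{name}: {factor:.2f}x")
```

The smallest ratio (PatchTST) and the largest (Informer) bound the reported range.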

6.5 Discussion of Limitations

While ForecastGAN demonstrates impressive performance across diverse datasets and forecasting horizons, several limitations merit consideration. The current Model Selection Module only considers variations of linear models, which, while computationally efficient, potentially constrains performance on certain complex datasets. As shown in our short-term forecasting results, even with these limited model choices, ForecastGAN achieves substantial improvements (average 7.90% across datasets, with up to 26.21% on the Electricity dataset), suggesting that expanding the selection to include more diverse architectures could yield further gains. Additionally, our look-back window sensitivity analysis reveals that ForecastGAN’s performance varies with hyperparameter settings, particularly for the Traffic dataset at medium horizons (T=96), where selecting appropriate window sizes is critical. Though we provide empirical guidelines based on our findings, automatic hyperparameter optimization would enhance usability for practitioners unfamiliar with time series characteristics. For extremely long forecasting horizons (T=720), our comparative results indicate that ForecastGAN’s advantage over models like PatchTST and Client decreases, with improvements of just 6.18% on ETTh1 and occasionally being outperformed on ETTh2. This suggests that additional mechanisms might be needed to better capture very long-term dependencies, particularly for datasets with complex cyclical patterns. Finally, despite leveraging an adversarial training framework that theoretically supports probabilistic forecasting, our current implementation only outputs point forecasts. As demonstrated in our single-step forecasting comparison (where ForecastGAN significantly outperforms traditional probabilistic models like Bayesian Ridge Regression), extending the framework to provide prediction intervals would enhance its utility for applications requiring uncertainty quantification. 
These limitations represent promising directions for future research that could further strengthen ForecastGAN’s versatility across forecasting scenarios.
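The percentage improvements quoted above depend on how the relative gain is normalized, and this section does not state the formula. As a hedged illustration, the sketch below compares two common definitions using the ETTh1 MSE values at the 720-step horizon from Appendix C (ForecastGAN 0.421, PatchTST 0.447). Normalizing by ForecastGAN's own error yields 6.18%, which matches the figure quoted for ETTh1; this is an observation about these numbers, not a confirmed statement of the authors' procedure, and the function names are hypothetical.

```python
def gain_relative_to_ours(ours: float, competitor: float) -> float:
    """Percentage gap normalized by our own error (assumed definition)."""
    return 100.0 * (competitor - ours) / ours


def reduction_relative_to_competitor(ours: float, competitor: float) -> float:
    """Percentage error reduction normalized by the competitor's error."""
    return 100.0 * (competitor - ours) / competitor


# ETTh1, horizon 720, MSE values taken from the Appendix C comparison table.
forecastgan_mse = 0.421
patchtst_mse = 0.447

print(round(gain_relative_to_ours(forecastgan_mse, patchtst_mse), 2))
print(round(reduction_relative_to_competitor(forecastgan_mse, patchtst_mse), 2))
```

The two definitions differ by a few tenths of a percentage point here, so stating the normalization explicitly matters when comparing reported improvements across papers.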

7 Conclusion

This paper introduced ForecastGAN, a novel decomposition-based adversarial framework for multi-horizon time series forecasting. By integrating time series decomposition, model selection, and adversarial training into a cohesive modular architecture, we have developed a solution that addresses key limitations in existing approaches while maintaining strong performance across diverse forecasting scenarios. Our experimental evaluation across eleven benchmark datasets demonstrates ForecastGAN’s versatility and effectiveness. For short-term forecasting, ForecastGAN achieves an average improvement of 7.90% over the best competing models, with particularly strong results on the Electricity dataset (26.21% improvement at the 48-hour horizon) and the Weather dataset (15.18% improvement). For long-term forecasting, ForecastGAN maintains competitive performance against sophisticated transformer-based architectures, outperforming them on most datasets while using significantly fewer computational resources. In single-step forecasting scenarios, ForecastGAN consistently outperforms traditional machine learning approaches, including gradient boosting methods and linear models, across all nine evaluation datasets.

The modular architecture of ForecastGAN offers several key advantages. Our Model Selection Module confirms that different architectures excel in different contexts, with DELinear typically selected for datasets with evident trend patterns (ETTh1, Exchange) and DLinear chosen for those with strong seasonal components (ETTm1, ETTm2). The Decomposition Module provides substantial benefits by isolating predictable patterns, as evidenced by the superior performance of decomposition-based variants in our comparative analysis. The Adversarial Training Module enhances forecasting accuracy by improving model robustness to data variability, particularly valuable for volatile datasets like Exchange Rate.

ForecastGAN’s parameter-efficient design (fewer than 200,000 parameters) delivers superior or competitive performance compared to transformer models like Informer, Crossformer, and PatchTST, which contain millions of parameters. This efficiency translates to practical advantages: 2.5-4.3× faster training times, significantly lower memory requirements (2.3 GB vs. an average of 4.9 GB), and faster inference speeds. Our sensitivity analysis provides practical guidelines for hyperparameter selection, demonstrating that optimal look-back window sizes generally relate proportionally to forecasting horizons. Unlike many existing approaches, ForecastGAN effectively integrates both numerical and categorical features, enhancing its applicability to real-world datasets with mixed data types. This capability, combined with its computational efficiency, makes ForecastGAN particularly valuable for practical applications across domains—from financial forecasting and energy management to supply chain optimization and healthcare resource planning.

Future research directions include expanding the Model Selection Module to incorporate more diverse architectures, developing adaptive techniques for automatic look-back window optimization, implementing more sophisticated cross-dimensional embedding methods, extending the framework to provide uncertainty quantification through prediction intervals, and enhancing interpretability through visualization techniques. Online learning extensions could further adapt the framework for real-time applications where models must continuously update as new data becomes available. ForecastGAN represents a significant advancement in time series forecasting by combining complementary approaches into a cohesive framework that adapts to diverse forecasting scenarios. By addressing the limitations of existing methods while maintaining computational efficiency, it provides a versatile foundation for both current applications and future extensions in the rapidly evolving field of time series forecasting.

Data Availability Statement

All data supporting the findings of this study are available within the paper and its Supplementary Information and cited where applicable throughout the manuscript.

Ethical and Informed Consent Statement

This research did not involve any human participants or animals, and therefore, ethical approval and informed consent were not applicable. All data used in this study were obtained from publicly available sources or generated through computational methods, ensuring no ethical considerations were compromised. The authors confirm that all data, methods, and procedures adhered to the relevant guidelines and regulations of the scientific community and the journal’s ethical standards.

Conflict of Interest

The authors declare no conflicts of interest.

Appendix A Applications of GANs

Table 5: List of famous GAN Architectures and their applications
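| Application | GAN Architecture | Dataset | References |
|---|---|---|---|
| Anomaly detection | AdaBalGAN | SET50 | [48] |
| Anomaly detection | ATR-GAN | in-house | [49] |
| Anomaly detection | CGAN(ResNet)+PixelGAN | in-house | [50] |
| Anomaly detection | DCGAN | SWaT | [51] |
| Anomaly detection | DCGAN+CGAN | ECG | [52] |
| Anomaly detection | GAN | in-house | [53] |
| Anomaly detection | GAN | SET50 | [54] |
| Anomaly detection | GAN | in-house | [55] |
| Anomaly detection | GAN(AE) | in-house | [56] |
| Anomaly detection | GAN+AE | taxi data | [57] |
| Anomaly detection | TAnoGAN | in-house | [58] |
| Anomaly detection | VAE-RaPP+FenceGAN | in-house | [59] |
| Anomaly detection | WGAN+encoder | SET50 | [60] |
| Data augmentation | GAN | in-house | [49] |
| Data generation | AC-GAN | in-house | [61] |
| Data generation | GAN(Q-NET) | 2019 | [62] |
| Image processing | 3D-JointGAN | SWaT | [63] |
| Image processing | CGAN | ECG | [64] |
| Image processing | CGAN | EHRs | [65] |
| Image processing | GAN+AE+PatchGAN | NAF | [66] |
| Image processing | GAN+AE-SNN | in-house | [67] |
| Image processing | IEGAN | NAF | [68] |
| Image processing | MSG-GAN | synthetic data | [69] |
| Predictions | AR-SAGAN | in-house | [70] |
| Predictions | CGAN+pix2pix | MNIST | [71] |
| Predictions | GAN+Ensemble ML | electricity data | [72] |
| Predictions | GAN+AE+AD | market data | [73] |
| Predictions | LSTM+GAN | stock prices | [74] |
| Predictions | SinGAN+LSTM | in-house | [75] |
| Predictions | StackGAN | in-house | [76] |
| Security analysis | CGAN | phishing data | [77] |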

Appendix B Test-Train Distributions

Eleven multivariate time series datasets are used in this paper to evaluate ForecastGAN. For each dataset, the test set is taken from the most recent values so that no temporal information leaks into model training. The resulting data distributions are presented in Figure 5.
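A chronological split of this kind can be sketched as follows. The `test_fraction` default is a hypothetical value chosen for illustration; the split ratios actually used for each dataset are not stated in this appendix.

```python
import numpy as np


def chronological_split(series: np.ndarray, test_fraction: float = 0.2):
    """Hold out the most recent observations as the test set, without shuffling."""
    n_test = int(round(len(series) * test_fraction))
    if not 0 < n_test < len(series):
        raise ValueError("test_fraction leaves an empty train or test set")
    split_index = len(series) - n_test
    # Everything before the split point is training data; the tail is test data.
    return series[:split_index], series[split_index:]


# Rows are assumed to be ordered from oldest to newest.
data = np.arange(10)
train, test = chronological_split(data, test_fraction=0.2)
```

Because the rows stay in temporal order, every training observation precedes every test observation, which is what prevents future values from influencing the fitted model.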

Figure 5: Train-test distributions for all datasets used

Appendix C Long-term Forecasting Comparison with Transformer Models

ForecastGAN is additionally compared with further transformer-based models, namely FEDformer [78], Autoformer [25], and PatchTST [79].

Table 6: Performance comparison of some transformer-based models for long-term forecasting
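| Dataset | H | ForecastGAN MSE | ForecastGAN MAE | FEDformer MSE | FEDformer MAE | Autoformer MSE | Autoformer MAE | PatchTST MSE | PatchTST MAE |
|---|---|---|---|---|---|---|---|---|---|
| Exchange | 96 | 0.071 | 0.196 | 0.278 | 0.323 | 0.197 | 0.847 | 0.311 | 0.965 |
| Exchange | 192 | 0.138 | 0.288 | 0.380 | 0.369 | 0.300 | 1.204 | 0.219 | 0.654 |
| Exchange | 336 | 0.281 | 0.397 | 0.500 | 0.524 | 0.509 | 1.672 | 0.365 | 0.987 |
| Exchange | 720 | 0.625 | 0.716 | 0.841 | 0.941 | 1.447 | 2.478 | 0.765 | 1.090 |
| Electricity | 96 | 0.121 | 0.210 | 0.193 | 0.308 | 0.201 | 0.317 | 0.129 | 0.222 |
| Electricity | 192 | 0.138 | 0.141 | 0.315 | 0.334 | 0.222 | 0.296 | 0.147 | 0.240 |
| Electricity | 336 | 0.151 | 0.243 | 0.329 | 0.338 | 0.231 | 0.300 | 0.163 | 0.259 |
| Electricity | 720 | 0.191 | 0.299 | 0.355 | 0.361 | 0.254 | 0.373 | 0.197 | 0.290 |
| ETTh1 | 96 | 0.338 | 0.390 | 0.376 | 0.419 | 0.449 | 0.459 | 0.370 | 0.400 |
| ETTh1 | 192 | 0.373 | 0.405 | 0.420 | 0.448 | 0.500 | 0.482 | 0.413 | 0.429 |
| ETTh1 | 336 | 0.391 | 0.410 | 0.459 | 0.465 | 0.521 | 0.496 | 0.422 | 0.440 |
| ETTh1 | 720 | 0.421 | 0.443 | 0.506 | 0.507 | 0.514 | 0.512 | 0.447 | 0.468 |
| ETTh2 | 96 | 0.251 | 0.329 | 0.346 | 0.388 | 0.358 | 0.397 | 0.274 | 0.337 |
| ETTh2 | 192 | 0.312 | 0.348 | 0.429 | 0.439 | 0.456 | 0.452 | 0.341 | 0.382 |
| ETTh2 | 336 | 0.340 | 0.398 | 0.496 | 0.487 | 0.482 | 0.486 | 0.329 | 0.384 |
| ETTh2 | 720 | 0.391 | 0.436 | 0.463 | 0.474 | 0.515 | 0.511 | 0.379 | 0.422 |
| ETTm1 | 96 | 0.116 | 0.286 | 0.379 | 0.419 | 0.505 | 0.475 | 0.293 | 0.346 |
| ETTm1 | 192 | 0.302 | 0.343 | 0.426 | 0.441 | 0.553 | 0.496 | 0.333 | 0.370 |
| ETTm1 | 336 | 0.341 | 0.374 | 0.445 | 0.459 | 0.621 | 0.537 | 0.369 | 0.392 |
| ETTm1 | 720 | 0.389 | 0.402 | 0.543 | 0.490 | 0.671 | 0.561 | 0.416 | 0.420 |
| ETTm2 | 96 | 0.142 | 0.228 | 0.203 | 0.287 | 0.255 | 0.339 | 0.166 | 0.256 |
| ETTm2 | 192 | 0.194 | 0.251 | 0.269 | 0.328 | 0.281 | 0.340 | 0.223 | 0.296 |
| ETTm2 | 336 | 0.242 | 0.298 | 0.325 | 0.366 | 0.339 | 0.372 | 0.274 | 0.329 |
| ETTm2 | 720 | 0.329 | 0.348 | 0.421 | 0.415 | 0.433 | 0.432 | 0.362 | 0.385 |
| Traffic | 96 | 0.356 | 0.257 | 0.587 | 0.366 | 0.613 | 0.388 | 0.360 | 0.249 |
| Traffic | 192 | 0.395 | 0.285 | 0.373 | 0.616 | 0.382 | 0.696 | 0.379 | 0.256 |
| Traffic | 336 | 0.401 | 0.293 | 0.621 | 0.383 | 0.622 | 0.337 | 0.392 | 0.264 |
| Traffic | 720 | 0.428 | 0.301 | 0.626 | 0.382 | 0.660 | 0.408 | 0.432 | 0.286 |
| Weather | 96 | 0.145 | 0.198 | 0.217 | 0.296 | 0.266 | 0.336 | 0.149 | 0.198 |
| Weather | 192 | 0.178 | 0.216 | 0.276 | 0.336 | 0.307 | 0.367 | 0.194 | 0.241 |
| Weather | 336 | 0.218 | 0.268 | 0.339 | 0.380 | 0.359 | 0.395 | 0.245 | 0.282 |
| Weather | 720 | 0.281 | 0.311 | 0.403 | 0.428 | 0.419 | 0.428 | 0.314 | 0.334 |
| Illness | 24 | 1.320 | 0.854 | 3.228 | 1.260 | 3.483 | 1.287 | 1.319 | 0.754 |
| Illness | 36 | 1.521 | 0.857 | 2.679 | 1.080 | 3.103 | 1.148 | 1.579 | 0.870 |
| Illness | 48 | 1.640 | 0.878 | 2.622 | 1.078 | 2.669 | 1.085 | 1.553 | 0.815 |
| Illness | 60 | 1.430 | 0.900 | 2.857 | 1.157 | 2.770 | 1.125 | 1.470 | 0.788 |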
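The two error metrics reported in this comparison, MSE and MAE, follow their standard definitions. The sketch below shows how they can be computed for a forecast of shape (horizon, features); the function names and toy arrays are illustrative, and any data scaling applied before evaluation in the paper is not reproduced here.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean squared error averaged over all time steps and variables."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred) -> float:
    """Mean absolute error averaged over all time steps and variables."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Toy forecast: 3 time steps, 2 variables.
actual = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
forecast = np.array([[1.0, 2.0], [3.0, 5.0], [4.0, 8.0]])
print(mse(actual, forecast), mae(actual, forecast))
```

Because MSE squares each residual, it penalizes large errors more heavily than MAE, which is why the two metrics can rank models differently on the same dataset.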

Appendix D Short-term Forecasting Comparison with Transformer Models

The detailed comparison of ForecastGAN with transformer models is presented in Table 7. The look-back window S is kept at 12 for all datasets except the Illness dataset, for which S=2.

Table 7: Performance comparison of different models for short-term forecasting
| Data | H | ForecastGAN* MSE | ForecastGAN* MAE | FEDformer MSE | FEDformer MAE | Autoformer MSE | Autoformer MAE | Informer MSE | Informer MAE | Crossformer MSE | Crossformer MAE | PatchTST MSE | PatchTST MAE | Client MSE | Client MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Exchange | 12 | 0.196 | 0.071 | 0.46 | 0.415 | 0.984 | 0.334 | 0.984 | 0.889 | 1.598 | 0.425 | 1.102 | 0.597 | 1.735 | 0.734 |
| Exchange | 24 | 0.288 | 0.138 | 0.506 | 0.517 | 1.341 | 0.437 | 1.341 | 1.032 | 1.432 | 0.395 | 0.791 | 0.643 | 1.569 | 0.780 |
| Exchange | 32 | 0.397 | 0.281 | 0.661 | 0.637 | 1.809 | 0.646 | 1.809 | 1.173 | 1.389 | 0.386 | 1.124 | 0.798 | 1.526 | 0.935 |
| Exchange | 48 | 0.716 | 0.625 | 1.078 | 0.978 | 2.615 | 1.584 | 2.615 | 1.447 | 1.31 | 0.369 | 1.227 | 1.215 | 1.447 | 1.352 |
| Electricity | 12 | 0.21 | 0.121 | 0.464 | 0.349 | 0.473 | 0.357 | 0.43 | 0.524 | 0.459 | 0.412 | 0.378 | 0.620 | 0.615 | 0.776 |
| Electricity | 24 | 0.141 | 0.138 | 0.49 | 0.471 | 0.452 | 0.378 | 0.452 | 0.542 | 0.45 | 0.382 | 0.396 | 0.646 | 0.606 | 0.802 |
| Electricity | 32 | 0.243 | 0.151 | 0.494 | 0.485 | 0.456 | 0.387 | 0.456 | 0.550 | 0.423 | 0.373 | 0.415 | 0.650 | 0.579 | 0.806 |
| Electricity | 48 | 0.299 | 0.191 | 0.517 | 0.511 | 0.529 | 0.410 | 0.529 | 0.595 | 0.439 | 0.356 | 0.446 | 0.673 | 0.595 | 0.829 |
| ETTh1 | 12 | 0.279 | 0.332 | 0.536 | 0.493 | 0.576 | 0.566 | 0.982 | 0.830 | 0.562 | 0.509 | 0.517 | 0.653 | 0.679 | 0.770 |
| ETTh1 | 24 | 0.27 | 0.310 | 0.565 | 0.537 | 0.599 | 0.617 | 1.125 | 0.909 | 0.532 | 0.500 | 0.546 | 0.682 | 0.649 | 0.799 |
| ETTh1 | 32 | 0.243 | 0.302 | 0.582 | 0.576 | 0.613 | 0.638 | 1.224 | 0.926 | 0.523 | 0.473 | 0.557 | 0.699 | 0.64 | 0.816 |
| ETTh1 | 48 | 0.251 | 0.340 | 0.624 | 0.623 | 0.629 | 0.631 | 1.298 | 0.982 | 0.506 | 0.481 | 0.585 | 0.741 | 0.623 | 0.858 |
| ETTm1 | 12 | 0.157 | 0.167 | 0.528 | 0.488 | 0.584 | 0.614 | 0.781 | 0.680 | 0.404 | 0.387 | 0.455 | 0.637 | 0.513 | 0.746 |
| ETTm1 | 24 | 0.168 | 0.175 | 0.55 | 0.535 | 0.605 | 0.662 | 0.904 | 0.778 | 0.413 | 0.398 | 0.479 | 0.659 | 0.522 | 0.768 |
| ETTm1 | 32 | 0.26 | 0.354 | 0.568 | 0.554 | 0.646 | 0.730 | 1.321 | 0.980 | 0.584 | 0.490 | 0.501 | 0.677 | 0.693 | 0.786 |
| ETTm1 | 48 | 0.293 | 0.387 | 0.599 | 0.652 | 0.67 | 0.780 | 1.275 | 0.932 | 0.617 | 0.523 | 0.529 | 0.708 | 0.726 | 0.817 |
| Traffic | 12 | 0.257 | 0.356 | 0.468 | 0.727 | 0.49 | 0.753 | 0.859 | 0.493 | 0.987 | 0.674 | 0.351 | 0.570 | 1.127 | 0.672 |
| Traffic | 24 | 0.285 | 0.395 | 0.718 | 0.513 | 0.798 | 0.522 | 0.519 | 0.969 | 0.753 | 0.446 | 0.358 | 0.820 | 0.893 | 0.922 |
| Traffic | 32 | 0.293 | 0.401 | 0.485 | 0.761 | 0.439 | 0.762 | 0.917 | 0.522 | 0.699 | 0.432 | 0.366 | 0.587 | 0.839 | 0.689 |
| Traffic | 48 | 0.301 | 0.428 | 0.484 | 0.766 | 0.51 | 0.800 | 1.004 | 0.574 | 0.601 | 0.398 | 0.388 | 0.586 | 0.741 | 0.688 |
| Weather | 12 | 0.198 | 0.145 | 0.399 | 0.320 | 0.439 | 0.369 | 0.403 | 0.487 | 0.722 | 0.669 | 0.301 | 0.502 | 0.825 | 0.605 |
| Weather | 24 | 0.216 | 0.178 | 0.439 | 0.379 | 0.47 | 0.410 | 0.701 | 0.647 | 0.692 | 0.660 | 0.344 | 0.542 | 0.795 | 0.645 |
| Weather | 32 | 0.268 | 0.218 | 0.483 | 0.442 | 0.498 | 0.462 | 0.681 | 0.626 | 0.683 | 0.633 | 0.385 | 0.586 | 0.786 | 0.689 |
| Weather | 48 | 0.311 | 0.281 | 0.531 | 0.506 | 0.531 | 0.522 | 1.162 | 0.844 | 0.666 | 0.641 | 0.437 | 0.634 | 0.769 | 0.737 |
| Illness | 2 | 0.854 | 1.661 | 1.497 | 3.465 | 1.524 | 3.720 | 6.001 | 1.914 | 2.161 | 1.086 | 0.991 | 1.734 | 2.398 | 1.971 |
| Illness | 6 | 0.857 | 1.692 | 1.317 | 2.916 | 1.385 | 3.340 | 4.992 | 1.704 | 2.307 | 1.097 | 1.107 | 1.554 | 2.544 | 1.791 |
| Illness | 8 | 0.878 | 1.721 | 1.315 | 2.859 | 1.322 | 2.906 | 5 | 1.706 | 2.389 | 1.106 | 1.052 | 1.552 | 2.626 | 1.789 |
| Illness | 10 | 0.9 | 1.803 | 1.394 | 3.094 | 1.362 | 3.007 | 5.501 | 1.801 | 2.431 | 1.180 | 1.025 | 1.631 | 2.668 | 1.868 |

References