Table of Contents
1. Introduction
o Overview of the Analysis
o Objective
2. Data Preprocessing
o Handling Missing Values
o Data Transformation
4. Model Building
o Overview of Models Used
o Training and Test Data Split
o Model Evaluation Metrics
6. Stationarity Check
o Augmented Dickey-Fuller Test
o Differencing
7. Model Comparison
o Performance Metrics
o Best Performing Model
8. Conclusion
o Key Takeaways
o Actionable Insights
o Recommendations
Question 1
Data Overview
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2539 entries, 0 to 2538
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 2539 non-null object
1 Price 2460 non-null float64
dtypes: float64(1), object(1)
memory usage: 39.8+ KB
      Date   Price
0     False  False
1     False  False
2     False  False
3     False  False
4     False  True
...   ...    ...
2534  False  True
2535  False  False
2536  False  False
2537  False  False
2538  False  False
2539 rows × 2 columns
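The `info()` summary shows 2460 non-null prices out of 2539 rows, so 79 prices are missing. The price series is later shown as monthly values dated at month end. The following is a minimal plain-Python sketch of both steps. It assumes the dates have already been parsed into `datetime.date` objects and that missing prices are simply skipped when averaging each calendar month, which is roughly what pandas `df.resample('M').mean()` does. That missing-value strategy is an assumption, and the function names are illustrative.

```python
import calendar
import datetime
import math
from collections import defaultdict


def is_missing(value):
    """True for None or NaN, the two usual encodings of a missing price."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_missing(prices):
    """Number of missing entries in a list of prices."""
    return sum(1 for p in prices if is_missing(p))


def monthly_mean(dates, prices):
    """Average daily prices per calendar month, skipping missing values.

    Each month is keyed by its last calendar day (e.g. 2013-08-31),
    and the returned dict is ordered by date.
    """
    buckets = defaultdict(list)
    for day, price in zip(dates, prices):
        if is_missing(price):
            continue  # assumption: missing prices are dropped, not imputed
        last_day = calendar.monthrange(day.year, day.month)[1]
        buckets[datetime.date(day.year, day.month, last_day)].append(price)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Usage with a few hypothetical daily prices.
dates = [datetime.date(2013, 8, 1), datetime.date(2013, 8, 2),
         datetime.date(2013, 8, 5), datetime.date(2013, 9, 2)]
prices = [1300.0, float("nan"), 1310.0, 1350.0]
print(count_missing(prices))        # 1
print(monthly_mean(dates, prices))  # {datetime.date(2013, 8, 31): 1305.0, datetime.date(2013, 9, 30): 1350.0}
```

Unlike pandas, this sketch omits a month entirely if every price in it is missing. Pandas would instead return NaN for that month.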
Data Pre-processing
Split the data into training and test sets and plot both.
First few rows of Training Data
            Price
Date
2013-08-31  1391.340000
2013-09-30  1348.461905
2013-10-31  1316.591304
2013-11-30  1273.433333
2013-12-31  1223.390909
Last few rows of Training Data
            Price
Date
2020-08-31  1980.271429
2020-09-30  1929.819048
2020-10-31  1905.550000
2020-11-30  1867.115000
2020-12-31  1862.845455
First few rows of Test Data
            Price
Date
2021-01-31  1867.557895
2021-02-28  1807.263158
2021-03-31  1719.908696
2021-04-30  1760.909524
2021-05-31  1851.620000
Last few rows of Test Data
            Price
Date
2023-04-30  2011.431579
2023-05-31  1997.940909
2023-06-30  1951.438095
2023-07-31  1953.925000
2023-08-31  1941.984615
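The tables above show a chronological split. The training data runs from 2013-08-31 to 2020-12-31 (89 monthly observations), and the test data runs from 2021-01-31 to 2023-08-31 (32 observations). Time series are split by date rather than at random, so the model never sees prices from the future. The sketch below is minimal and makes two assumptions: the monthly series is held as an ordered `{date: price}` dictionary, and the cutoff is 2021-01-01. Both the cutoff and the helper names are hypothetical.

```python
import calendar
import datetime


def month_ends(start, end):
    """Month-end dates from the month of `start` through the month of `end`, inclusive."""
    year, month = start.year, start.month
    out = []
    while (year, month) <= (end.year, end.month):
        out.append(datetime.date(year, month, calendar.monthrange(year, month)[1]))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out


def chronological_split(series, cutoff):
    """Dates strictly before `cutoff` go to train, the rest to test (order preserved)."""
    train = {d: v for d, v in series.items() if d < cutoff}
    test = {d: v for d, v in series.items() if d >= cutoff}
    return train, test


# Placeholder prices; only the dates matter for the split.
months = month_ends(datetime.date(2013, 8, 1), datetime.date(2023, 8, 1))
series = {d: 0.0 for d in months}
train, test = chronological_split(series, datetime.date(2021, 1, 1))
print(len(train), len(test))  # 89 32
print(max(train), min(test))  # 2020-12-31 2021-01-31
```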
Question 3
Time index for the test data:
[89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120]
First few rows of Training Data
Price time
Date
2013-08-31 1391.340000 1
2013-09-30 1348.461905 2
2013-10-31 1316.591304 3
2013-11-30 1273.433333 4
2013-12-31 1223.390909 5
Last few rows of Training Data
Price time
Date
2020-08-31 1980.271429 85
2020-09-30 1929.819048 86
2020-10-31 1905.550000 87
2020-11-30 1867.115000 88
2020-12-31 1862.845455 89
First few rows of Test Data
Price time
Date
2021-01-31 1867.557895 89
2021-02-28 1807.263158 90
2021-03-31 1719.908696 91
2021-04-30 1760.909524 92
2021-05-31 1851.620000 93
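The Linear Regression model treats price as a straight line in the time index t, fitted by least squares on the training months. With 89 training months indexed 1 to 89, the test months would normally continue the count at 90 to 121. The list and test table above instead start the test index at 89, which repeats the last training index. As a result, each test month receives the trend value of the month before it. The following minimal plain-Python sketch shows the fit and forecast. The function names are illustrative, not taken from the original code.

```python
def fit_linear_trend(y):
    """Least-squares fit of y = a + b * t, with t = 1, 2, ..., len(y)."""
    n = len(y)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(range(1, n + 1), y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_linear_trend(intercept, slope, n_train, n_test):
    """Predict the test period; its time index continues at n_train + 1."""
    return [intercept + slope * t for t in range(n_train + 1, n_train + n_test + 1)]


# Usage on a toy series that is exactly linear: y = 3 + 2t.
train = [5.0, 7.0, 9.0, 11.0, 13.0]
a, b = fit_linear_trend(train)
print(a, b)                                        # 3.0 2.0
print(forecast_linear_trend(a, b, len(train), 2))  # [15.0, 17.0]
```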
{'smoothing_level': 0.9801302386409774,
'smoothing_trend': 0.07943746820166658,
'smoothing_seasonal': 0.006555840562072103,
'damping_trend': nan,
'initial_level': 1353.121247550983,
'initial_trend': -7.11117686886287,
'initial_seasons': array([ 48.32102202, 24.91742089, 2.22516281, -36.64278831,
-55.68490852, -15.22973869, 13.71722148, 7.08971454,
18.74391233, 6.73573979, 12.78138802, 25.00402705]), 'use_boxcox': False,
'lamda': None, 'remove_bias': False}
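The dictionary above holds the fitted parameters of a Holt-Winters (Triple Exponential Smoothing) model:
α = smoothing_level, β = smoothing_trend and γ = smoothing_seasonal, together with the initial level, the initial trend and 12 initial seasonal components. These recursions assume an additive trend and additive seasonality with period m = 12. The initial seasonal components are small dollar offsets of both signs, which is consistent with an additive model. Under those assumptions, the standard recursions are:

```latex
\begin{aligned}
\ell_t &= \alpha\,(y_t - s_{t-m}) + (1-\alpha)\,(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1} \\
s_t &= \gamma\,(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)\,s_{t-m} \\
\hat{y}_{t+h \mid t} &= \ell_t + h\,b_t + s_{t+h-m(k+1)}, \qquad k = \left\lfloor \tfrac{h-1}{m} \right\rfloor
\end{aligned}
```

With α ≈ 0.98 the level tracks the latest price almost completely, while γ ≈ 0.0066 keeps the seasonal pattern close to its initial values. A damping_trend of nan indicates that no damped trend was fitted.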
Alpha Values Beta Values Gamma Values Train RMSE Test RMSE
201 0.6 0.4 0.4 40.352617 107.964546
200 0.6 0.4 0.3 39.536539 107.996848
136 0.5 0.4 0.3 40.421425 108.208399
273 0.7 0.5 0.4 40.479353 108.490169
137 0.5 0.4 0.4 41.283022 108.697184
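The table above ranks Triple Exponential Smoothing models from a grid search over α, β and γ by test RMSE. The row indices (136, 137, 200, 201, 273) fit a grid of 0.3 to 1.0 in steps of 0.1, with α in the outermost loop. For example, 201 = 3·64 + 1·8 + 1 maps to α = 0.6, β = 0.4, γ = 0.4. This is an inference from the table, not a documented setting. The sketch below is a minimal plain-Python version of such a search, using the additive Holt-Winters recursions. Its initialisation is a simple assumption, and statsmodels estimates initial states differently, so its numbers would not match the report's.

```python
import itertools
import math


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def holt_winters_additive(y, alpha, beta, gamma, m, horizon):
    """Additive-trend, additive-season Holt-Winters; returns `horizon` forecasts.

    Initial states (assumed, not statsmodels' method): level = mean of the
    first season, trend = mean per-step change between the first two seasons,
    seasonal terms = first-season deviations from that level. Needs len(y) >= 2*m.
    """
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [v - level for v in y[:m]]
    for t, obs in enumerate(y):
        s_prev = season[t % m]            # s_{t-m}
        l_prev, b_prev = level, trend     # l_{t-1}, b_{t-1}
        level = alpha * (obs - s_prev) + (1 - alpha) * (l_prev + b_prev)
        trend = beta * (level - l_prev) + (1 - beta) * b_prev
        season[t % m] = gamma * (obs - l_prev - b_prev) + (1 - gamma) * s_prev
    n = len(y)
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]


def grid_search(train, test, m=12):
    """Evaluate every (alpha, beta, gamma) in 0.3..1.0 (step 0.1); best test RMSE first."""
    grid = [round(0.3 + 0.1 * i, 1) for i in range(8)]
    results = []
    for alpha, beta, gamma in itertools.product(grid, repeat=3):
        forecast = holt_winters_additive(train, alpha, beta, gamma, m, len(test))
        results.append((rmse(test, forecast), alpha, beta, gamma))
    results.sort()
    return results


# Toy monthly series: upward trend plus a repeating 12-month pattern.
pattern = [5, 3, 0, -4, -6, -2, 1, 1, 2, 1, 1, 2]
series = [100 + 2 * t + pattern[t % 12] for t in range(60)]
best = grid_search(series[:48], series[48:])[0]
print(best)  # (test RMSE, alpha, beta, gamma)
```

Ranking by test RMSE, as the table does, effectively tunes the parameters on the test set. A separate validation period would give a less optimistic estimate.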
Test RMSE
2pointTrailingMovingAverage 27.944954
4pointTrailingMovingAverage 54.619271
6pointTrailingMovingAverage 70.894305
9pointTrailingMovingAverage 85.550202
Alpha=0.995,SimpleExponentialSmoothing 89.668646
Alpha=0.8,Beta=0.5,Gamma=0.5,TripleExponentialSmoothing 107.964546
Alpha=0.676,Beta=0.088,Gamma=0.323,TripleExponentialSmoothing 520.803779
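The comparison table above mixes two kinds of forecast. Suppose the trailing moving averages were computed over the full series, which is the usual approach. Each test-month prediction then uses recently observed actual prices, so it is effectively a one-step-ahead forecast. Simple Exponential Smoothing instead produces one flat forecast from the end of the training data. With α = 0.995, that forecast is almost exactly the last training price for all 32 test months. The following is a minimal plain-Python sketch of both approaches, with illustrative function names. It averages the `window` months strictly before each test month. By contrast, pandas `rolling(window).mean()` without `shift(1)` would also include the month being predicted.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def trailing_ma_one_step(series, window, n_test):
    """Forecast each of the last `n_test` points as the mean of the `window` actual values before it."""
    start = len(series) - n_test
    if start < window:
        raise ValueError("not enough history before the test period")
    return [sum(series[t - window:t]) / window for t in range(start, len(series))]


def ses_flat_forecast(train, alpha, horizon):
    """Simple exponential smoothing: every future step gets the final smoothed level."""
    level = train[0]
    for value in train[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


# Usage on a toy series whose last two points form the test set.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
train, test = series[:3], series[3:]
print(rmse(test, trailing_ma_one_step(series, 2, len(test))))  # 1.5
print(rmse(test, ses_flat_forecast(train, 1.0, len(test))))    # flat at 3.0, so RMSE = sqrt(2.5)
```

Larger windows average over older prices and lag further behind. This matches the RMSE rising from 27.94 (2-point) to 85.55 (9-point) in the table.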
Question 4
Question 5
ARIMA(1, 1, 1) - AIC:909.950670899301
ARIMA(1, 1, 2) - AIC:911.3728643896695
ARIMA(2, 1, 1) - AIC:911.5729813396761
ARIMA(2, 1, 2) - AIC:912.9244622316618
param AIC
0 (1, 1, 1) 909.950671
1 (1, 1, 2) 911.372864
2 (2, 1, 1) 911.572981
3 (2, 1, 2) 912.924462
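The table above compares ARIMA(p, 1, q) candidates for p, q ∈ {1, 2} by AIC, where lower is better. The small sketch below ranks the reported values and computes each model's gap to the best (ΔAIC). A common rule of thumb treats models within about 2 AIC units of the best as having similar support. By that rule, (1, 1, 2) and (2, 1, 1) are close competitors to (1, 1, 1).

```python
# AIC values copied from the table above, keyed by ARIMA order (p, d, q).
aic_by_order = {
    (1, 1, 1): 909.950671,
    (1, 1, 2): 911.372864,
    (2, 1, 1): 911.572981,
    (2, 1, 2): 912.924462,
}


def rank_by_aic(aic_by_order):
    """Return (order, aic, delta_aic) tuples sorted from best (lowest AIC) to worst."""
    best = min(aic_by_order.values())
    ranked = sorted(aic_by_order.items(), key=lambda item: item[1])
    return [(order, aic, aic - best) for order, aic in ranked]


for order, aic, delta in rank_by_aic(aic_by_order):
    print(f"ARIMA{order}: AIC={aic:.3f}, delta={delta:.3f}")
```

The resulting ΔAIC values are about 1.42, 1.62 and 2.97 for the three runners-up.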
SARIMAX Results
==============================================================================
Dep. Variable: y No. Observations: 89
Model: SARIMAX(0, 2, 2) Log Likelihood -448.375
Date: Sat, 20 Jul 2024 AIC 902.751
Time: 10:24:46 BIC 910.148
Sample: 08-31-2013 HQIC 905.729
- 12-31-2020
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ma.L1 -0.6820 0.118 -5.762 0.000 -0.914 -0.450
ma.L2 -0.2627 0.134 -1.960 0.050 -0.525 -5.69e-05
sigma2 1714.0921 261.293 6.560 0.000 1201.966 2226.218
===================================================================================
Ljung-Box (L1) (Q): 0.05 Jarque-Bera (JB): 0.63
Prob(Q): 0.83 Prob(JB): 0.73
Heteroskedasticity (H): 1.34 Skew: 0.20
Prob(H) (two-sided): 0.43 Kurtosis: 2.91
===================================================================================
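The information criteria in this summary follow from the log-likelihood L and the number of estimated parameters, k = 3 (ma.L1, ma.L2 and sigma2):
AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L and HQIC = 2k ln(ln n) - 2 ln L.
The reported BIC and HQIC are reproduced when n = 87, that is, the 89 training months less the two lost to second-order differencing (d = 2). A quick arithmetic check:

```python
import math

log_likelihood = -448.375  # from the summary (rounded to 3 decimals)
k = 3                      # ma.L1, ma.L2, sigma2
n = 89 - 2                 # observations left after second-order differencing

aic = 2 * k - 2 * log_likelihood
bic = k * math.log(n) - 2 * log_likelihood
hqic = 2 * k * math.log(math.log(n)) - 2 * log_likelihood

print(f"AIC={aic:.3f} BIC={bic:.3f} HQIC={hqic:.3f}")  # AIC=902.750 BIC=910.148 HQIC=905.729
```

The small AIC difference from the reported 902.751 comes from the log-likelihood being rounded in the summary.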
Predict on the test set using this model and evaluate it.
306.3031559772143
            RMSE
Auto ARIMA  306.303156
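The summary above, which precedes the Auto ARIMA result, describes a SARIMAX(0, 2, 2) model with no constant term. Its equation is (1 - B)² y_t = e_t + θ1 e_{t-1} + θ2 e_{t-2}, with θ1 = -0.6820 and θ2 = -0.2627 taken from the summary, assuming statsmodels' convention of writing MA terms with a plus sign. Future errors are set to zero, so only the first two forecasts use the last residuals. From the third step onward, each forecast is 2ŷ_{t-1} - ŷ_{t-2}. The forecasts therefore lie on a straight line that extends the local trend at the end of the training data across all 32 test months, which may help explain the test RMSE of 306.30. The sketch below is minimal, and the observations and residuals in the usage line are hypothetical.

```python
def arima_022_forecast(y_last2, e_last2, theta1, theta2, horizon):
    """h-step forecasts from ARIMA(0, 2, 2) without a constant.

    y_last2 = (y_{n-1}, y_n): last two observations.
    e_last2 = (e_{n-1}, e_n): last two one-step residuals.
    Future errors are set to their expected value, zero.
    """
    y_prev, y_curr = y_last2
    e_prev, e_curr = e_last2
    f1 = 2 * y_curr - y_prev + theta1 * e_curr + theta2 * e_prev
    f2 = 2 * f1 - y_curr + theta2 * e_curr
    forecasts = [f1, f2][:horizon]
    while len(forecasts) < horizon:
        # Second difference of the forecasts is zero from here on: a straight line.
        forecasts.append(2 * forecasts[-1] - forecasts[-2])
    return forecasts


# Hypothetical inputs, with the MA coefficients from the summary above.
print(arima_022_forecast((10.0, 12.0), (0.0, 0.0), -0.6820, -0.2627, 4))  # [14.0, 16.0, 18.0, 20.0]
```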
ARIMA model
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
90.21680533104079
                                                               Test RMSE
Linear Regression                                              207.872126
2pointTrailingMovingAverage                                     27.944954
4pointTrailingMovingAverage                                     54.619271
6pointTrailingMovingAverage                                     70.894305
9pointTrailingMovingAverage                                     85.550202
Alpha=0.995,SimpleExponentialSmoothing                          89.668646
Triple Exponential Smoothing                                   520.803779
Alpha=0.676,Beta=0.088,Gamma=0.323,TripleExponentialSmoothing  520.803779
Alpha=0.8,Beta=0.5,Gamma=0.5,TripleExponentialSmoothing        107.964546
Question 6
Key Takeaways
3. Model Performance:
o Various forecasting models were built and evaluated on the test set, including Linear
Regression, Moving Averages, Simple and Triple Exponential Smoothing, and ARIMA.
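o The 2-point Trailing Moving Average model achieved the lowest test RMSE of all the models (27.94). This comparison likely favours the moving averages, however. A trailing moving average is built from recently observed prices, so on the test set it behaves as a one-step-ahead forecast. The exponential smoothing and ARIMA models, by contrast, forecast the whole 32-month test period from the end of the training data.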
o More complex models had much higher test RMSE than the simple moving averages. Triple Exponential Smoothing scored between 107.96 and 520.80 depending on its parameters, and Auto ARIMA scored 306.30. This suggests overfitting to the training period or difficulty capturing the underlying pattern over a long forecast horizon.
4. Stationarity Check:
o The Augmented Dickey-Fuller test indicated that the original series was non-stationary.
o Differencing was applied to achieve stationarity for ARIMA model building.
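Differencing replaces each value with its change from the previous value, y'_t = y_t - y_{t-1}. Applying it twice gives second-order differencing. The manual ARIMA grid above used d = 1, while Auto ARIMA chose d = 2. In Python, the ADF test itself is commonly run with `adfuller` from `statsmodels.tsa.stattools`. The plain-Python sketch below covers only differencing and its inverse, which is needed to turn forecasts of differences back into price levels. The function names are illustrative.

```python
def difference(series, order=1):
    """Apply first differencing `order` times; each pass shortens the series by one."""
    result = list(series)
    for _ in range(order):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


def undifference(last_level, diffs):
    """Invert one level of differencing by cumulatively adding diffs to the last known level."""
    levels = []
    current = last_level
    for d in diffs:
        current += d
        levels.append(current)
    return levels


prices = [1.0, 4.0, 9.0, 16.0, 25.0]
print(difference(prices))                # [3.0, 5.0, 7.0, 9.0]
print(difference(prices, order=2))       # [2.0, 2.0, 2.0]
print(undifference(25.0, [11.0, 13.0]))  # [36.0, 49.0]
```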
Actionable Insights
By following these insights and recommendations, businesses can enhance their forecasting
capabilities, leading to more informed decision-making and better strategic planning in a
dynamic market environment.