
Table of Contents

1. Introduction
o Overview of the Analysis
o Objective

2. Data Preprocessing
o Handling Missing Values
o Data Transformation

3. Exploratory Data Analysis (EDA)


o Data Visualization
o Decomposition of Time Series

4. Model Building
o Overview of Models Used
o Training and Test Data Split
o Model Evaluation Metrics

5. Model Performance and Results


o Linear Regression
o Simple Moving Average (2-point Trailing)
o Simple Moving Average (4-point Trailing)
o Simple Exponential Smoothing
o Triple Exponential Smoothing
o ARIMA
o Auto ARIMA

6. Stationarity Check
o Augmented Dickey-Fuller Test
o Differencing

7. Model Comparison
o Performance Metrics
o Best Performing Model

8. Conclusion
o Key Takeaways
o Actionable Insights
o Recommendations
Question 1

Define the problem and perform Exploratory Data Analysis

#Loading the data


import pandas as pd
from google.colab import drive

drive.mount('/content/drive')
path = "/content/drive/MyDrive/gold_prices.csv"
df = pd.read_csv(path)

#Data Overview

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2539 entries, 0 to 2538
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 2539 non-null object
1 Price 2460 non-null float64
dtypes: float64(1), object(1)
memory usage: 39.8+ KB

      Date   Price
0     False  False
1     False  False
2     False  False
3     False  False
4     False   True
...     ...    ...
2534  False   True
2535  False  False
2536  False  False
2537  False  False
2538  False  False

2539 rows × 2 columns

#Missing value treatment

df['Price'] = df['Price'].fillna(method='ffill')  # Fill the missing values with forward fill
df.isnull().sum()                                 # Check the number of null values in each column
df.set_index('Date', inplace=True)

#Exploratory Data Analysis


- Perform Decomposition
Question 2

Data Pre-processing

#Missing value treatment

df['Price'] = df['Price'].fillna(method='ffill') # Fill the missing values with


forward fill method
df.isnull().sum() # Check for the number of null values in each column
df.set_index('Date', inplace=True)

Split the data into train and test and plot the training and test data.
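A chronological split like the one described above can be sketched as follows (synthetic monthly data standing in for the real series; time series are split in order, never shuffled):

```python
import numpy as np
import pandas as pd

# Synthetic monthly stand-in: Aug 2013 through Aug 2023
dates = pd.date_range("2013-08-01", periods=121, freq="MS")
df = pd.DataFrame({"Price": np.linspace(1391, 1942, 121)}, index=dates)

# Train on everything up to end of 2020; test on 2021 onward
train = df[df.index <= "2020-12-31"]
test = df[df.index > "2020-12-31"]
```

This yields 89 training months and 32 test months, matching the split used in the report.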

First few rows of Training Data

                  Price
Date
2013-08-31  1391.340000
2013-09-30  1348.461905
2013-10-31  1316.591304
2013-11-30  1273.433333
2013-12-31  1223.390909

Last few rows of Training Data

                  Price
Date
2020-08-31  1980.271429
2020-09-30  1929.819048
2020-10-31  1905.550000
2020-11-30  1867.115000
2020-12-31  1862.845455

First few rows of Test Data

                  Price
Date
2021-01-31  1867.557895
2021-02-28  1807.263158
2021-03-31  1719.908696
2021-04-30  1760.909524
2021-05-31  1851.620000

Last few rows of Test Data

                  Price
Date
2023-04-30  2011.431579
2023-05-31  1997.940909
2023-06-30  1951.438095
2023-07-31  1953.925000
2023-08-31  1941.984615
Question 3

Model Building - Original Data

#Model Building - Original Data

Build forecasting models

Training Time instance


[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
87, 88, 89]
Test Time instance

[89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120]

First few rows of Training Data

Price time
Date
2013-08-31 1391.340000 1
2013-09-30 1348.461905 2
2013-10-31 1316.591304 3
2013-11-30 1273.433333 4
2013-12-31 1223.390909 5

Last few rows of Training Data

Price time
Date
2020-08-31 1980.271429 85
2020-09-30 1929.819048 86
2020-10-31 1905.550000 87
2020-11-30 1867.115000 88
2020-12-31 1862.845455 89
First few rows of Test Data

Price time
Date
2021-01-31 1867.557895 89
2021-02-28 1807.263158 90
2021-03-31 1719.908696 91
2021-04-30 1760.909524 92
2021-05-31 1851.620000 93

Last few rows of Test Data


Price time
Date
2023-04-30 2011.431579 116
2023-05-31 1997.940909 117
2023-06-30 1951.438095 118
2023-07-31 1953.925000 119
2023-08-31 1941.984615 120

# Define the linear regression model
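A minimal sketch of the RegressionOnTime idea: regress Price on the integer time instance (1..89 for training, as listed above), then forecast instances 89..120. The prices here are synthetic stand-ins, and plain `numpy.polyfit` replaces whatever regression API the notebook used.

```python
import numpy as np

rng = np.random.default_rng(0)
train_time = np.arange(1, 90)     # training instances 1..89
test_time = np.arange(89, 121)    # test instances 89..120
train_price = 1400 + 5 * train_time + rng.normal(0, 20, train_time.size)

# Fit a degree-1 polynomial: Price ~ slope * time + intercept
slope, intercept = np.polyfit(train_time, train_price, 1)
test_forecast = slope * test_time + intercept

# RMSE against the (synthetic, noiseless) test values
test_actual = 1400 + 5 * test_time
rmse = float(np.sqrt(np.mean((test_actual - test_forecast) ** 2)))
```

A linear trend model like this ignores seasonality and level shifts entirely, which is consistent with its weak RMSE (207.87) on the real test data below.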

## Test Data - RMSE

For RegressionOnTime forecast on the Test Data, RMSE is 207.87


Test RMSE
Linear Regression 207.872126

#Moving Average (MA)
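A k-point trailing moving average forecasts period t with the mean of the k most recent observed values (t-1 ... t-k), so the rolling mean is shifted forward one step. A toy sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0, 12.0])

ma2 = s.rolling(window=2).mean().shift(1)  # 2-point trailing forecast for t uses t-1, t-2
ma4 = s.rolling(window=4).mean().shift(1)  # 4-point trailing forecast for t uses t-1 .. t-4
```

For example, `ma2` at position 2 is the mean of the first two observations, (10 + 12) / 2 = 11.0; the first k positions are NaN because not enough history exists yet.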


For 2 point Moving Average Model forecast on the Testing Data, RMSE is 27.945
Test RMSE
Linear Regression 207.872126
2pointTrailingMovingAverage 27.944954
4pointTrailingMovingAverage 54.619271
6pointTrailingMovingAverage 70.894305
9pointTrailingMovingAverage 85.550202

# Define the SES model

{'smoothing_level': 0.9829025568211125, 'smoothing_trend': nan,
 'smoothing_seasonal': nan, 'damping_trend': nan,
 'initial_level': 1373.557207700078, 'initial_trend': nan,
 'initial_seasons': array([], dtype=float64), 'use_boxcox': False,
 'lamda': None, 'remove_bias': False}

For Alpha=0.995, Simple Exponential Smoothing Model forecast on the Test Data, RMSE is 89.669
Test RMSE
Linear Regression 207.872126
2pointTrailingMovingAverage 27.944954
4pointTrailingMovingAverage 54.619271
6pointTrailingMovingAverage 70.894305
9pointTrailingMovingAverage 85.550202
Alpha=0.995,SimpleExponentialSmoothing 89.668646

# Initialize and Fit the Triple Exponential Smoothing Model

{'smoothing_level': 0.9801302386409774,
'smoothing_trend': 0.07943746820166658,
'smoothing_seasonal': 0.006555840562072103,
'damping_trend': nan,
'initial_level': 1353.121247550983,
'initial_trend': -7.11117686886287,
'initial_seasons': array([ 48.32102202, 24.91742089, 2.22516281, -36.64278831,
-55.68490852, -15.22973869, 13.71722148, 7.08971454,
18.74391233, 6.73573979, 12.78138802, 25.00402705]), 'use_boxcox': False,
'lamda': None, 'remove_bias': False}

# Plot predictions from Double Exponential Smoothing


# Plot the best combination of Double Exponential Smoothing
# Calculate RMSE for Triple Exponential Smoothing on test data

For Alpha=0.676, Beta=0.088, Gamma=0.323, Triple Exponential Smoothing Model forecast on the Test Data, RMSE is 520.804

Alpha Values Beta Values Gamma Values Train RMSE Test RMSE
201 0.6 0.4 0.4 40.352617 107.964546
200 0.6 0.4 0.3 39.536539 107.996848
136 0.5 0.4 0.3 40.421425 108.208399
273 0.7 0.5 0.4 40.479353 108.490169
137 0.5 0.4 0.4 41.283022 108.697184

Sorted by RMSE values on the Test Data:

                                                                Test RMSE
2pointTrailingMovingAverage                                     27.944954
4pointTrailingMovingAverage                                     54.619271
6pointTrailingMovingAverage                                     70.894305
9pointTrailingMovingAverage                                     85.550202
Alpha=0.995,SimpleExponentialSmoothing                          89.668646
Best Double Exponential Smoothing                               90.076443
Alpha=0.8,Beta=0.5,Gamma=0.5,TripleExponentialSmoothing        107.964546
Linear Regression                                              207.872126
Alpha=0.676,Beta=0.088,Gamma=0.323,TripleExponentialSmoothing  520.803779

Question 4

Check for Stationarity

#Check for stationarity of the whole Time Series data.


Results of Dickey-Fuller Test:
Test Statistic -0.038478
p-value 0.955232
#Lags Used 1.000000
Number of Observations Used 87.000000
Critical Value (1%) -3.507853
Critical Value (5%) -2.895382
Critical Value (10%) -2.584824
dtype: float64
Results of Dickey-Fuller Test:
Test Statistic -3.013758
p-value 0.033629
#Lags Used 5.000000
Number of Observations Used 26.000000
Critical Value (1%) -3.711212
Critical Value (5%) -2.981247
Critical Value (10%) -2.630095
dtype: float64

Question 5

Model Building - Stationary Data

#Model Building - Stationary Data


#Auto ARIMA Model

Some parameter combinations for the Model...


Model: (0, '_', 1)
Model: (0, '_', 2)
Model: (0, '_', 0)
Model: (0, '_', 1)
Model: (0, '_', 2)
Model: (0, '_', 0)
Model: (0, '_', 1)
Model: (0, '_', 2)
Model: (1, '_', 0)
Model: (1, '_', 1)
Model: (1, '_', 2)
Model: (1, '_', 0)
Model: (1, '_', 1)
Model: (1, '_', 2)
Model: (1, '_', 0)
Model: (1, '_', 1)
Model: (1, '_', 2)
Model: (2, '_', 0)
Model: (2, '_', 1)
Model: (2, '_', 2)
Model: (2, '_', 0)
Model: (2, '_', 1)
Model: (2, '_', 2)
Model: (2, '_', 0)
Model: (2, '_', 1)
Model: (2, '_', 2)

param  AIC

# Build the ARIMA model based on the best AIC values

ARIMA(1, 1, 1) - AIC:909.950670899301
ARIMA(1, 1, 2) - AIC:911.3728643896695
ARIMA(2, 1, 1) - AIC:911.5729813396761
ARIMA(2, 1, 2) - AIC:912.9244622316618
param AIC
0 (1, 1, 1) 909.950671
1 (1, 1, 2) 911.372864
2 (2, 1, 1) 911.572981
3 (2, 1, 2) 912.924462
SARIMAX Results
==============================================================================
Dep. Variable: y No. Observations: 89
Model: SARIMAX(0, 2, 2) Log Likelihood -448.375
Date: Sat, 20 Jul 2024 AIC 902.751
Time: 10:24:46 BIC 910.148
Sample: 08-31-2013 HQIC 905.729
- 12-31-2020
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ma.L1 -0.6820 0.118 -5.762 0.000 -0.914 -0.450
ma.L2 -0.2627 0.134 -1.960 0.050 -0.525 -5.69e-05
sigma2 1714.0921 261.293 6.560 0.000 1201.966 2226.218
===================================================================================
Ljung-Box (L1) (Q): 0.05 Jarque-Bera (JB): 0.63
Prob(Q): 0.83 Prob(JB): 0.73
Heteroskedasticity (H): 1.34 Skew: 0.20
Prob(H) (two-sided): 0.43 Kurtosis: 2.91
===================================================================================

#Predict on the Test Set using this model and evaluate the model.

306.3031559772143

                  RMSE
Auto ARIMA  306.303156

# ARIMA model

Some parameter combinations for the Model...


Model: (0, 1, 0)
Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)
ARIMA(0, 1, 0) - AIC:914.2597353789228
ARIMA(0, 1, 1) - AIC:908.1514792211835
ARIMA(0, 1, 2) - AIC:909.7965186418962
ARIMA(1, 1, 0) - AIC:908.4009658697732
ARIMA(1, 1, 1) - AIC:909.950670899301
ARIMA(1, 1, 2) - AIC:911.3728643896695
ARIMA(2, 1, 0) - AIC:909.6915224955326
ARIMA(2, 1, 1) - AIC:911.5729813396761
ARIMA(2, 1, 2) - AIC:912.9244622316618
param AIC
0 (0, 1, 0) 914.259735
1 (0, 1, 1) 908.151479
2 (0, 1, 2) 909.796519
3 (1, 1, 0) 908.400966
4 (1, 1, 1) 909.950671
5 (1, 1, 2) 911.372864
6 (2, 1, 0) 909.691522
7 (2, 1, 1) 911.572981
8 (2, 1, 2) 912.924462
SARIMAX Results
==============================================================================
Dep. Variable: Price No. Observations: 89
Model: ARIMA(0, 1, 1) Log Likelihood -452.076
Date: Sat, 20 Jul 2024 AIC 908.151
Time: 10:42:06 BIC 913.106
Sample: 08-31-2013 HQIC 910.148
- 12-31-2020
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ma.L1 0.2937 0.111 2.637 0.008 0.075 0.512
sigma2 1694.5828 243.026 6.973 0.000 1218.260 2170.905
===================================================================================
Ljung-Box (L1) (Q): 0.01 Jarque-Bera (JB): 2.45
Prob(Q): 0.94 Prob(JB): 0.29
Heteroskedasticity (H): 1.79 Skew: 0.41
Prob(H) (two-sided): 0.12 Kurtosis: 3.09
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
90.21680533104079

# Predict on test data

Index(['Price', 'auto_predict', 'predict_0.3_0.3_0.3', 'predict_0.3_0.3_0.4',


'predict_0.3_0.3_0.5', 'predict_0.3_0.3_0.6000000000000001',
'predict_0.3_0.3_0.7000000000000002',
'predict_0.3_0.3_0.8000000000000003',
'predict_0.3_0.3_0.9000000000000001',
'predict_0.3_0.3_1.0000000000000002',
...
'predict_1.0000000000000002_0.9000000000000001_0.9000000000000001',
'predict_1.0000000000000002_0.9000000000000001_1.0000000000000002',
'predict_1.0000000000000002_1.0000000000000002_0.3',
'predict_1.0000000000000002_1.0000000000000002_0.4',
'predict_1.0000000000000002_1.0000000000000002_0.5',
'predict_1.0000000000000002_1.0000000000000002_0.6000000000000001',
'predict_1.0000000000000002_1.0000000000000002_0.7000000000000002',
'predict_1.0000000000000002_1.0000000000000002_0.8000000000000003',
'predict_1.0000000000000002_1.0000000000000002_0.9000000000000001',
'predict_1.0000000000000002_1.0000000000000002_1.0000000000000002'],
dtype='object', length=514)
Warning: 'auto_predict_1.0_0.9_0.8' not found in TES_test. Check if Triple Exponential
Smoothing model was run and generated predictions under this name.

# Display the updated results DataFrame

                                                                Test RMSE
2pointTrailingMovingAverage                                     27.944954
4pointTrailingMovingAverage                                     54.619271
6pointTrailingMovingAverage                                     70.894305
9pointTrailingMovingAverage                                     85.550202
Alpha=0.995,SimpleExponentialSmoothing                          89.668646
Best Double Exponential Smoothing                               90.076443
Alpha=0.8,Beta=0.5,Gamma=0.5,TripleExponentialSmoothing        107.964546
Linear Regression                                              207.872126
Alpha=0.676,Beta=0.088,Gamma=0.323,TripleExponentialSmoothing  520.803779

Question 6

Compare the performance of the models

#Compare the Performance of the model & Forecast


# Plot the original data and the forecasted data
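The comparison plot above can be sketched as follows: the observed series with the chosen model's forecast overlaid on the test horizon. Both series here are synthetic stand-ins, and the filename is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = pd.date_range("2013-08-01", periods=121, freq="MS")
price = pd.Series(np.linspace(1391, 1942, 121), index=idx)
forecast = price.iloc[89:] + 10  # stand-in for the best model's test-period forecast

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(price.index, price.values, label="Observed")
ax.plot(forecast.index, forecast.values, "--", label="Forecast")
ax.set_ylabel("Gold price")
ax.legend()
fig.savefig("forecast_comparison.png")
```

Plotting the forecast only over the test dates makes it easy to see where each model diverges from the actual series.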

Conclusion: Key Takeaways, Actionable Insights, and Recommendations

Key Takeaways

1. Data Quality and Preprocessing:


o The dataset contained missing values, particularly in the 'Price' column, which were
successfully addressed using forward fill.
o The dataset was transformed to have the 'Date' column as the index, facilitating time
series analysis.

2. Exploratory Data Analysis (EDA):


o Initial data visualization and decomposition revealed seasonal patterns and trends in the
gold prices.
o The dataset was divided into training and test sets to evaluate model performance
effectively.

3. Model Performance:
o Various forecasting models were built and evaluated on the test set, including Linear
Regression, Moving Averages, Simple and Triple Exponential Smoothing, and ARIMA.
o The 2-point Trailing Moving Average model outperformed other models with the lowest
RMSE of 27.94 on the test data.
o Advanced models like Triple Exponential Smoothing and Auto ARIMA did not perform as
well as simpler moving average models, indicating potential overfitting or the
complexity of capturing the underlying patterns with these models.

4. Stationarity Check:
o The Augmented Dickey-Fuller test indicated that the original series was non-stationary.
o Differencing was applied to achieve stationarity for ARIMA model building.

Actionable Insights

1. Effective Forecasting Techniques:


o Simpler models like the Moving Average models proved to be more effective for short-term
forecasting in this dataset. Businesses should consider using these for reliable
short-term predictions.

2. Need for Continuous Model Evaluation:


o Regularly update and evaluate forecasting models with new data to ensure their
accuracy and relevance.
o Track model performance over time to identify when a more complex model might
become necessary or beneficial.

3. Data Quality Maintenance:


o Ensure consistent and accurate data collection practices to minimize missing values and
discrepancies.
o Implement robust data preprocessing steps to handle any future data issues efficiently.

4. Understanding Market Conditions:


o While models provide statistical insights, it is crucial to incorporate market knowledge
and external factors (e.g., geopolitical events, economic indicators) into forecasting and
decision-making processes.
Recommendations

1. Implement a Hybrid Forecasting Approach:


o Combine the strengths of simple models (e.g., Moving Average) with domain expertise
to create a more robust forecasting system.
o Use simple models for immediate forecasts and monitor performance while exploring
more complex models for potential future use.

2. Invest in Data Analytics Infrastructure:


o Develop infrastructure for real-time data collection and processing to ensure the most
up-to-date information is used in forecasting models.
o Leverage advanced analytics tools to facilitate more complex model experimentation
and deployment when needed.

3. Continuous Monitoring and Adjustment:


o Set up a process for continuous monitoring of forecasting accuracy and make
adjustments as necessary.
o Regularly review and update models based on new data and changing market
conditions.

4. Incorporate External Data Sources:


o Enhance forecasting models by incorporating relevant external data sources such as
economic indicators, currency exchange rates, and global market trends.
o Use these additional data points to improve model accuracy and provide more
comprehensive insights.

By following these insights and recommendations, businesses can enhance their forecasting
capabilities, leading to more informed decision-making and better strategic planning in a
dynamic market environment.
