St. Francis Institute of Technology
SV Road, Borivali (West), Mumbai 400103
Department of Computer Engineering
Academic Year: 2023-2024 Semester: VIII
Subject: Applied Data Science Class / Division: BE/CMPNA
Name :- Jess Lopes Roll Number: 65
Experiment No.: 9
Implement time series forecasting.
Aim : Implement time series forecasting
I OBJECTIVE
To understand basic concepts of time series forecasting.
To explore time series forecasting methods.
II THEORY
The study of time series can be broadly divided into descriptive modeling, called
time series analysis, and predictive modeling, called time series forecasting.
Time series forecasting is a method used to predict future values based on previously
observed values. It is commonly used in areas such as economics, finance, and demand
forecasting for products.
Fig 1. Taxonomy of time series forecasting techniques.
Time Series forecasting can be further classified into four broad categories of techniques:
1. Time Series Decomposition: This approach involves breaking down a time series into
its trend, seasonality, and residual components, and then forecasting each component
separately. The final forecast is obtained by adding the forecasts of each component.
2. Smoothing Based Techniques: This category includes methods like moving average
and exponential smoothing that use past values to smooth out noise in the time series
and make predictions.
3. Regression Based Techniques: This category includes methods like linear regression
and ARIMA that use a combination of past values and other predictors to make
predictions.
4. Machine Learning Based Techniques: This category includes methods like random
forests, neural networks, and support vector machines that use algorithms from
machine learning to fit models to the time series data and make predictions. These
methods can handle complex and non-linear data and have achieved state-of-the-art
performance in many time series forecasting tasks.
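The decomposition approach in item 1 can be illustrated with a small sketch. It is a simplified classical additive decomposition written in plain Python, assuming an odd seasonal period and a made-up series; the function name `decompose_additive` and the data are illustrative and not part of the experiment.

```python
def decompose_additive(values, period):
    """Split a series into trend + seasonal + residual (additive model).

    Simplified sketch: assumes an odd `period` so that the centred
    moving average is symmetric, and at least two full cycles of data.
    """
    if period < 3 or period % 2 == 0:
        raise ValueError("this sketch only supports an odd period >= 3")
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full cycles of data")
    half = period // 2

    # Trend: centred moving average, undefined (None) at both edges.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(values[t - half:t + half + 1]) / period

    # Seasonal: average detrended value for each position in the cycle,
    # shifted so that the seasonal effects sum to zero over one period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(values[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    seasonal = [means[t % period] - offset for t in range(n)]

    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[t] is None else values[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Hypothetical series: linear trend plus a repeating 3-step pattern.
pattern = [1.0, -1.0, 0.0]
values = [t + pattern[t % 3] for t in range(12)]
trend, seasonal, residual = decompose_additive(values, period=3)
print(trend[1:4], seasonal[:3])
```

A forecast would then extrapolate the trend, repeat the seasonal pattern, and add the two together.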
Smoothing methods are a class of time series forecasting techniques that aim to remove noise
and make predictions by averaging the past values of a time series. There are two commonly
used smoothing methods:
1. Average Method: This method predicts the next value as the average of all past
values. For example, if we have a time series y with n values, the average method
would predict the next value as the average of all n values.
2. Moving Average Smoothing: This method is similar to the average method, but
instead of using every past value, it averages only the values inside a sliding window
of fixed size. For example, if the window size is m, the moving average method would
predict the next value as the average of the last m values.
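The two smoothing forecasts can be compared on a toy series. This is a minimal sketch using only the standard library; the helper names and the numbers are hypothetical and are not taken from the dataset used in the implementation.

```python
from statistics import mean


def average_forecast(series):
    """Predict the next value as the mean of all observed values."""
    return mean(series)


def moving_average_forecast(series, window):
    """Predict the next value as the mean of the last `window` values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return mean(series[-window:])


# Hypothetical daily sales figures with an upward trend.
sales = [10, 12, 14, 16, 18, 20]
avg_pred = average_forecast(sales)            # uses all six values
ma_pred = moving_average_forecast(sales, 3)   # uses only 16, 18, 20
print(avg_pred, ma_pred)
```

Because the window only covers recent values, the moving average reacts faster to the upward trend than the plain average does.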
Regression Based Techniques
1. Linear Regression: Time series analysis can also be performed using linear
regression, which is a supervised machine learning method used to model the
relationship between a dependent variable and one or more independent variables. In
time series analysis, the dependent variable is the time series data, and the
independent variable is time.
2. ARIMA Model: ARIMA, which stands for Autoregressive Integrated Moving Average, is a
commonly used method for time series analysis and forecasting. It is a regression-based
model that predicts from past values of the (differenced) series and past forecast errors.
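Item 1 above fits a straight line through the series with time as the only predictor. A minimal sketch of that idea, using ordinary least squares written out by hand in plain Python; the function names and the sample series are assumptions made for illustration.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t] = intercept + slope * t, t = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    times = range(n)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    var = sum((t - t_mean) ** 2 for t in times)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_linear(values, steps):
    """Extend the fitted line `steps` time points past the end of the data."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + h) for h in range(steps)]


# Hypothetical series that lies exactly on y = 3 + 2t.
series = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = fit_linear_trend(series)
future = forecast_linear(series, steps=2)
print(intercept, slope, future)
```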
III IMPLEMENTATION
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
dataset = pd.read_csv(
    'https://raw.githubusercontent.com/maks-p/restaurant_sales_forecasting/master/csv/CSV_for_EDA.csv'
)
total_data = dataset["date"].count()
split = int(total_data * 0.90)
train = dataset[:split]
test = dataset[split:]
plt.figure(figsize=(12, 8))
plt.plot(train.date, train.inside_sales, label='Train')
plt.plot(test.date, test.inside_sales, label='Test')
plt.xticks(rotation='vertical')
plt.legend(loc='best')
plt.title("Train Test Split")
plt.show()
from statsmodels.tsa.ar_model import AutoReg
model_ag = AutoReg(endog=train["inside_sales"],
                   lags=7,
                   trend='c',
                   seasonal=False,
                   exog=None,
                   hold_back=None,
                   period=None,
                   missing='none')
fit_ag = model_ag.fit()
print("Coefficients: \n%s" % fit_ag.params)
Coefficients:
const 9441.469011
inside_sales.L1 0.128783
inside_sales.L2 -0.051229
inside_sales.L3 0.004284
inside_sales.L4 -0.072142
inside_sales.L5 0.011292
inside_sales.L6 0.123930
inside_sales.L7 0.209163
dtype: float64
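To show how the printed coefficients are used, the sketch below applies the autoregressive equation by hand: the forecast is the constant plus each lag coefficient times the corresponding past value. The coefficients are copied from the output above; the history values are hypothetical, and `ar_one_step` is an illustrative helper, not a statsmodels function.

```python
# Coefficients copied from the AutoReg output (const, then lags 1..7).
CONST = 9441.469011
LAG_COEFS = [0.128783, -0.051229, 0.004284, -0.072142,
             0.011292, 0.123930, 0.209163]


def ar_one_step(history, const=CONST, coefs=LAG_COEFS):
    """One-step AR forecast: const + sum(coefs[i] * y[t-1-i])."""
    p = len(coefs)
    if len(history) < p:
        raise ValueError("need at least %d past values" % p)
    # Reverse the tail so recent[0] is y[t-1], recent[1] is y[t-2], ...
    recent = history[-p:][::-1]
    return const + sum(c * y for c, y in zip(coefs, recent))


# Hypothetical last seven observations (not rows from the dataset).
history = [14000.0] * 7
next_value = ar_one_step(history)
print("One-step forecast: %.3f" % next_value)
```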
predictions = fit_ag.predict(start=len(train),
                             end=len(train) + len(test) - 1,
                             dynamic=False)
predictions.name = "Predictions"
result = pd.concat([test, predictions], axis=1).reindex(test.index)
print(result)
from sklearn.metrics import mean_squared_error
from math import sqrt
rmse = sqrt(mean_squared_error(test["inside_sales"], predictions))
print("AR Root Mean Square Error (RMSE): %.3f" % rmse)
AR Root Mean Square Error (RMSE): 2427.322
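The RMSE reported above is the square root of the mean squared difference between actual and predicted values. A minimal sketch of the same calculation in plain Python, using made-up numbers rather than the sales data; the helper name `rmse_manual` is illustrative.

```python
from math import sqrt


def rmse_manual(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values: the errors are 1, 0 and -2.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
error = rmse_manual(actual, predicted)
print("RMSE: %.3f" % error)
```

Since the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.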
from statsmodels.tsa.arima.model import ARIMA
# endog: dependent variable, response variable or y (endogenous)
# order: (p, d, q) orders of the autoregressive, differencing & moving average components
model_ma = ARIMA(endog=train["inside_sales"],
                 order=(0, 0, 2))
fit_ma = model_ma.fit()
print("Coefficients: \n%s" % fit_ma.params)
Coefficients:
const 1.460960e+04
ma.L1 1.679819e-01
ma.L2 -3.590835e-02
sigma2 5.585174e+06
dtype: float64
predictions_ma = fit_ma.predict(start=len(train),
                                end=len(train) + len(test) - 1,
                                dynamic=False)
predictions_ma.name = "Predictions"
result_ma = pd.concat([test, predictions_ma], axis=1).reindex(test.index)
print(result_ma)
from sklearn.metrics import mean_squared_error
from math import sqrt
rmse_ma = sqrt(mean_squared_error(test["inside_sales"], predictions_ma))
print("MA - Root Mean Square Error (RMSE): %.3f" % rmse_ma)
MA - Root Mean Square Error (RMSE): 2457.157
plt.figure(figsize=(12,8))
plt.plot(train.date, train.inside_sales, label='Train')
plt.plot(test.date, test.inside_sales, label='Test')
plt.plot(result_ma.date, result_ma.Predictions, label='Prediction')
plt.xticks(dataset["date"], dataset["date"], rotation='vertical')
plt.legend(loc='best')
plt.title("Predictions MA model")
plt.xlabel('date')
plt.ylabel('inside_sales')
plt.show()
from statsmodels.tsa.arima.model import ARIMA
model_arima = ARIMA(endog=train["inside_sales"],
                    order=(1, 1, 1))
fit_arima = model_arima.fit()
print("Coefficients: \n%s" % fit_arima.params)
Coefficients:
ar.L1 1.433320e-01
ma.L1 -9.792971e-01
sigma2 5.628931e+06
dtype: float64
predictions_arima = fit_arima.predict(start=len(train),
                                      end=len(train) + len(test) - 1,
                                      dynamic=False)
predictions_arima.name = "Predictions"
result_arima = pd.concat([test, predictions_arima], axis=1).reindex(test.index)
print(result_arima)
from sklearn.metrics import mean_squared_error
from math import sqrt
rmse_arima = sqrt(mean_squared_error(test["inside_sales"], predictions_arima))
print("ARIMA Root Mean Square Error (RMSE): %.3f" % rmse_arima)
ARIMA Root Mean Square Error (RMSE): 2387.420
plt.figure(figsize=(12,8))
plt.plot(train.date, train.inside_sales, label='Train')
plt.plot(test.date, test.inside_sales, label='Test')
plt.plot(result_arima.date, result_arima.Predictions, label='Prediction')
plt.xticks(dataset["date"], dataset["date"], rotation='vertical')
plt.legend(loc='best')
plt.title("Predictions ARIMA model")
plt.xlabel('date')
plt.ylabel('inside_sales')
plt.show()
from statsmodels.tsa.statespace.sarimax import SARIMAX
model_sarima = SARIMAX(endog=train["inside_sales"],
                       order=(1, 1, 1),
                       seasonal_order=(0, 0, 0, 0))
fit_sarima = model_sarima.fit()
print("Coefficients: \n%s" % fit_sarima.params)
Coefficients:
ar.L1 1.433320e-01
ma.L1 -9.792971e-01
sigma2 5.628931e+06
dtype: float64
predictions_sarima = fit_sarima.predict(start=len(train),
                                        end=len(train) + len(test) - 1,
                                        dynamic=False)
predictions_sarima.name = "Predictions"
result_sarima = pd.concat([test, predictions_sarima], axis=1).reindex(test.index)
print(result_sarima)
from sklearn.metrics import mean_squared_error
from math import sqrt
rmse_sarima = sqrt(mean_squared_error(test["inside_sales"], predictions_sarima))
print("SARIMA Root Mean Square Error (RMSE): %.3f" % rmse_sarima)
SARIMA Root Mean Square Error (RMSE): 2387.420
plt.figure(figsize=(12,8))
plt.plot(train.date, train.inside_sales, label='Train')
plt.plot(test.date, test.inside_sales, label='Test')
plt.plot(result_sarima.date, result_sarima.Predictions, label='Prediction')
plt.xticks(dataset["date"], dataset["date"], rotation='vertical')
plt.legend(loc='best')
plt.title("Predictions SARIMA model")
plt.xlabel('date')
plt.ylabel('inside_sales')
plt.show()
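The four RMSE values printed in this section can be put side by side to see which model did best on the test split. The sketch below simply re-enters the reported numbers; the model labels are shorthand chosen here for illustration.

```python
# RMSE values copied from the outputs printed in this section.
rmse_by_model = {
    "AR(7)": 2427.322,
    "MA(2)": 2457.157,
    "ARIMA(1,1,1)": 2387.420,
    "SARIMA(1,1,1)(0,0,0,0)": 2387.420,
}

# Sort from lowest (best) to highest RMSE; ties keep insertion order.
ranked = sorted(rmse_by_model.items(), key=lambda item: item[1])
best_rmse = ranked[0][1]
best_models = [name for name, value in ranked if value == best_rmse]

for name, value in ranked:
    print("%-24s %.3f" % (name, value))
print("Lowest RMSE:", ", ".join(best_models))
```

The SARIMA and ARIMA values are identical because a seasonal order of (0, 0, 0, 0) adds no seasonal terms, leaving the same ARIMA(1, 1, 1) model.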
IV CONCLUSION
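We studied the basic concepts of time series forecasting and implemented AR, MA, ARIMA,
and SARIMA models on the sales dataset. On the held-out 10% test split, ARIMA(1, 1, 1)
and SARIMA gave the lowest RMSE (2387.420), followed by AR with 7 lags (2427.322) and
MA(2) (2457.157). The SARIMA result matches ARIMA because its seasonal order was set to
(0, 0, 0, 0), which reduces it to the same ARIMA(1, 1, 1) model.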
V REFERENCES
https://www.justintodata.com/arima-models-in-python-time-series-prediction/
https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/
https://cprosenjit.medium.com/10-time-series-forecasting-methods-we-should-know-291037d2e285
VI POST LAB QUESTION/ANSWER
1. Explain the three components of the ARIMA model.
The ARIMA model combines three components: autoregression (AR), differencing
(I, for "integrated"), and moving average (MA). The autoregression component models
the relationship between the current value and past values; the differencing component
removes non-stationarity, such as a trend, from the time series; and the moving average
component models the current value in terms of past forecast errors. The orders of the
three components are written (p, d, q), as in the order argument passed to ARIMA in the
implementation.
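The effect of the differencing component can be seen on a toy example: differencing once turns a linear trend into a constant series, and differencing twice does the same for a quadratic trend. This is a minimal sketch in plain Python with made-up numbers; `difference` is an illustrative helper.

```python
def difference(values, d=1):
    """Apply first-order differencing d times: y[t] - y[t-1]."""
    for _ in range(d):
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Hypothetical series with a linear trend (step of 3 per period).
trended = [5, 8, 11, 14, 17]
first_diff = difference(trended)            # d = 1 removes the linear trend

# Hypothetical series with a quadratic trend (t squared).
quadratic = [t * t for t in range(6)]
second_diff = difference(quadratic, d=2)    # d = 2 removes the quadratic trend

print(first_diff, second_diff)
```

Each round of differencing shortens the series by one value, which is why a model with d = 1 has one fewer observation to fit.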