
Mini Project:

Time Series Modeling


Model Report

Table of Contents

• Project Objective
• Exploratory Data Analysis and Descriptive Statistics
• Summary Statistics of the dataset
• Decomposition of the Time Series (STL)
• Creation of Training and Testing Data set
• Model Building – Random Walk with Drift
• Model Building – Holt-Winters
• Model Building – ARIMA
• Model Comparison and Conclusion
• Forecasting using the ARIMA Model
• Appendix – Sample Source Code

PROBLEM OBJECTIVE:

A dataset of monthly demand for two items, Item A and Item B, covering January 2002 to July 2017, is made available. The objective is to analyse and forecast the demand for both items:
1. Perform exploratory data analysis and compute descriptive statistics.
2. Convert the demand data into time series objects and decompose each series into seasonal, trend and remainder components using STL.
3. Split the data into training and test sets.
4. Build Random Walk with Drift, Holt-Winters and ARIMA models on the training data.
5. Evaluate forecast accuracy on the test data using MAPE and check residual independence with the Box-Ljung test.
6. Select the best model and forecast future demand.

Solution:
Read the dataset
library(readxl)

#Skip the first row of the Excel file (skip = 1)


demand <- read_excel("E:/Great_Lakes/TimeSeries/Demand.xlsx", skip = 1)

str(demand)

## Classes 'tbl_df', 'tbl' and 'data.frame': 187 obs. of 4 variables:


## $ Year : num 2002 2002 2002 2002 2002 ...
## $ Month : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Item A: num 1954 2302 3054 2414 2226 ...
## $ Item B: num 2585 3368 3210 3111 3756 ...
Rename the columns as per the series
names(demand)[3] <- c("IteamA")
names(demand)[4] <- c("IteamB")

str(demand)

## Classes 'tbl_df', 'tbl' and 'data.frame': 187 obs. of 4 variables:


## $ Year : num 2002 2002 2002 2002 2002 ...
## $ Month : num 1 2 3 4 5 6 7 8 9 10 ...
## $ IteamA: num 1954 2302 3054 2414 2226 ...
## $ IteamB: num 2585 3368 3210 3111 3756 ...

Exploratory Data Analysis and Descriptive Statistics


Key observations from the data:

• Number of rows in the dataset: 187
• Number of columns in the dataset: 4
• Demand data spans January 2002 to July 2017

Missing values in the dataset:

• There are no missing values in the dataset.
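This can be verified with a quick check (a sketch, not part of the original code):

colSums(is.na(demand))  #all zeros confirm there are no missing values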

Data summary
summary(demand[,3:4])

## IteamA IteamB
## Min. :1954 Min. :1153
## 1st Qu.:2748 1st Qu.:2362
## Median :3134 Median :2876
## Mean :3263 Mean :2962
## 3rd Qu.:3741 3rd Qu.:3468
## Max. :5725 Max. :5618

Convert the data into a time series dataset


#Plotting Time Series for Item A: monthly data from Jan 2002 to Jul 2017
dem_ItA <- ts(demand[,3], start=c(2002,1), end=c(2017,7), frequency=12)
plot(dem_ItA)

#Plotting Time Series for Item B: monthly data from Jan 2002 to Jul 2017
dem_ItB <- ts(demand[,4], start=c(2002,1), end=c(2017,7), frequency=12)
plot(dem_ItB)
#Plotting the Time Series across Item A and Item B
ts.plot(dem_ItA, dem_ItB, gpars = list(col = c("black", "red")), xlab="year", ylab="demand")
legend("topleft", colnames(demand[3:4]), col=1:ncol(demand), lty=1.9, cex=.45)
From the plots above we can see that Item A has increasing demand, whereas demand for Item B is falling. There is also some seasonality and trend in the demand. Neither Item A nor Item B appears to be cyclic in nature. The variation of Item A increases with time, whereas the variation of Item B decreases.

Decomposition of time series dataset


Time series decomposition is a procedure that transforms a time series into several component series. The original series is typically decomposed into three sub-series:

1. Seasonal: patterns that repeat over a fixed period of time.

2. Trend: the underlying long-term direction of the series.
3. Random (also called "noise", "irregular" or "remainder"): the residual of the time series after the seasonal and trend components have been removed.

Besides these three components there can also be a cyclic component, which recurs over long, non-fixed periods of time.
Additive or multiplicative decomposition? To get a useful decomposition, it is important to choose between the additive and the multiplicative model, and this choice requires looking at the time series.

a. The additive model is useful when the seasonal variation is relatively constant over time.
b. The multiplicative model is useful when the seasonal variation increases over time.
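As a quick illustrative sketch (not part of the original analysis; it reuses the dem_ItA series built above), base R's decompose() can fit both variants for comparison:

dec_add <- decompose(dem_ItA[,1], type = "additive")        #y = trend + seasonal + random
dec_mult <- decompose(dem_ItA[,1], type = "multiplicative") #y = trend * seasonal * random
plot(dec_add)
plot(dec_mult)
#If the seasonal swings grow with the level of the series, the multiplicative
#remainder will look more stable than the additive one.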
#Item A
monthplot(dem_ItA)

boxplot(dem_ItA ~ cycle(dem_ItA))

#Item B
monthplot(dem_ItB)

boxplot(dem_ItB ~ cycle(dem_ItB))

The seasonal variation looks to be of about the same magnitude across time, so an additive decomposition should give good results.
Decomposing the time series using STL
STL is a very versatile and robust method for decomposing time series. STL is an acronym for "Seasonal and Trend decomposition using Loess". It performs an additive decomposition, and the four panels of its plot show the original data, the seasonal component, the trend component and the remainder.

#Item A
ItA_Sea <- stl(dem_ItA[,1], s.window="p") #constant seasonality
plot(ItA_Sea)
#Item B
ItB_Sea <- stl(dem_ItB[,1], s.window="p") #constant seasonality
plot(ItB_Sea)

From the decomposition above we can see a continuous increase in demand for Item A and, on the contrary, a similar pattern of decline for Item B.
Decompose the time series and plot the deseasoned series
If the focus is on figuring out whether the general trend of demand is up, we deseasonalize and can largely set aside the seasonal component. However, to forecast demand for the next month, both the secular trend and the seasonality must be taken into account.
Item A

series_names <- c('Deseasoned', 'Actual')


Deseason_ItA <- (ItA_Sea$time.series[,2]+ItA_Sea$time.series[,3])
ts.plot(dem_ItA, Deseason_ItA, col=c("red", "blue"), main="ItemA Demand vs Deseasoned Demand")
The plot above shows demand in red and deseasoned demand in blue; an increasing trend in demand is clearly visible. The residual component is also part of the analysis.
Item B

series_names <- c('Deseasoned', 'Actual')


Deseason_ItB <- (ItB_Sea$time.series[,2]+ItB_Sea$time.series[,3])
ts.plot(dem_ItB, Deseason_ItB, col=c("red", "blue"), main="ItemB Demand vs Deseasoned Demand")
The plot above shows demand in red and deseasoned demand in blue; a decreasing trend in demand is clearly visible.

Divide data into test and train


DataATrain <- window(dem_ItA, start=c(2002,1), end=c(2015,12), frequency=12)
DataATest <- window(dem_ItA, start=c(2016,1), frequency=12)

DataBTrain <- window(dem_ItB, start=c(2002,1), end=c(2015,12), frequency=12)


DataBTest <- window(dem_ItB, start=c(2016,1), frequency=12)

Convert into seasonal, trend and irregular components using STL
ItmATrn <- stl(DataATrain[,1], s.window="p")
ItmBTrn <- stl(DataBTrain[,1], s.window="p")
Model building
Random walk with drift model - Forecast on train data
The model assumes that, at each point in time, the series takes a random step away from its last recorded position; with drift, the steps have a mean equal to the average historical change rather than zero.
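In equation form (a standard statement of the model, not taken from the original report):

Y_t = Y_(t-1) + c + e_t

where the drift c is estimated as the average of the historical changes, so the h-step-ahead forecast is Y_(T+h) = Y_T + h*c.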

library(forecast)
fcst.ItA.stl <- forecast(ItmATrn, method="rwdrift", h=19)
fcst.ItB.stl <- forecast(ItmBTrn, method="rwdrift", h=19)

VecA<- cbind(DataATest,fcst.ItA.stl$mean)
VecB<- cbind(DataBTest,fcst.ItB.stl$mean)

Item A
#par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)
ts.plot(VecA, col=c("blue", "red"), xlab="year", ylab="demand", main="Monthly Demand A: Actual vs Forecast")
Mean absolute percentage error (MAPE)
MAPE measures the mean absolute percentage deviation between the forecasts and the eventual outcomes.
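For n hold-out observations with actuals A_t and forecasts F_t, the formula (matching the R expression below) is:

MAPE = (1/n) * sum( |A_t - F_t| / A_t )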

MAPEA <- mean(abs(VecA[,1]-VecA[,2])/VecA[,1])


MAPEA

## [1] 0.1408798

Box-Ljung Test
To check whether the residuals are independent:
H0: Residuals are independent
Ha: Residuals are not independent
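For reference, the test statistic is Q = n(n+2) * sum_{k=1..h} r_k^2/(n-k), where r_k is the lag-k autocorrelation of the residuals, n the number of observations and h the number of lags tested; Q is compared against a chi-squared distribution with h degrees of freedom.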

Box.test(fcst.ItA.stl$residuals, lag=10, type="Ljung-Box")


##
## Box-Ljung test
##
## data: fcst.ItA.stl$residuals
## X-squared = 72.292, df = 10, p-value = 1.597e-11

Conclusion: Reject H0 (p ≈ 1.6e-11): the residuals are not independent.

Item B
ts.plot(VecB, col=c("blue", "red"), xlab="year", ylab="demand", main="Monthly Demand B: Actual vs Forecast")
Mean absolute percentage error (MAPE)
MAPE measures the mean absolute percentage deviation between the forecasts and the eventual outcomes.

MAPEB <- mean(abs(VecB[,1]-VecB[,2])/VecB[,1])


MAPEB
## [1] 0.1082608

Box-Ljung Test
To check whether the residuals are independent:
H0: Residuals are independent
Ha: Residuals are not independent

Box.test(fcst.ItB.stl$residuals, lag=30, type="Ljung-Box")


##
## Box-Ljung test
##
## data: fcst.ItB.stl$residuals
## X-squared = 123.22, df = 30, p-value = 2.931e-13

Conclusion: Reject H0 (p ≈ 2.9e-13): the residuals are not independent.


The MAPE values above (14.1% for Item A and 10.8% for Item B) indicate a sizeable forecast error, leaving room for improvement in model accuracy.

Holt-Winters method

The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations: one for the level, one for the trend, and one for the seasonal component, with smoothing parameters alpha (α), beta (β) and gamma (γ).
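For reference, the standard additive form of the equations (with season length m, here m = 12) is:

Level:    l_t = α(y_t - s_(t-m)) + (1 - α)(l_(t-1) + b_(t-1))
Trend:    b_t = β(l_t - l_(t-1)) + (1 - β) b_(t-1)
Seasonal: s_t = γ(y_t - l_(t-1) - b_(t-1)) + (1 - γ) s_(t-m)
Forecast: y_(t+h) = l_t + h*b_t + s_(t+h-m)

where the seasonal index used in the forecast is the most recent estimate for the corresponding calendar month.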
Item A

hwA <- HoltWinters(as.ts(DataATrain),seasonal="additive")


hwA
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = as.ts(DataATrain), seasonal = "additive")
##
## Smoothing parameters:
## alpha: 0.1241357
## beta : 0.03174654
## gamma: 0.3636975
##
## Coefficients:
## [,1]
## a 3753.348040
## b 7.663395
## s1 -1250.098605
## s2 -438.592232
## s3 -224.017731
## s4 -407.395313
## s5 -507.668223
## s6 -667.267246
## s7 63.659702
## s8 197.909330
## s9 -301.525945
## s10 25.272325
## s11 712.529546
## s12 1545.291998

plot(hwA)

hwAForecast <- forecast(hwA, h=19)


VecA1 <- cbind(DataATest,hwAForecast)
par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)

ts.plot(VecA1[,1],VecA1[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand A: Actual vs Forecast")

Box.test(hwAForecast$residuals, lag=20, type="Ljung-Box")


##
## Box-Ljung test
##
## data: hwAForecast$residuals
## X-squared = 14.227, df = 20, p-value = 0.8188

Conclusion: Do not reject H0 (p = 0.8188): the residuals appear independent.


#install.packages("MLmetrics")
library(MLmetrics)
## Warning: package 'MLmetrics' was built under R version 3.5.1
##
## Attaching package: 'MLmetrics'
## The following object is masked from 'package:base':
##
## Recall
MAPE(VecA1[,1],VecA1[,2])
## [1] 0.1160528

Item B

hwB <- HoltWinters(as.ts(DataBTrain),seasonal="additive")


hwB
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = as.ts(DataBTrain), seasonal = "additive")
##
## Smoothing parameters:
## alpha: 0.0166627
## beta : 0.4878834
## gamma: 0.5000132
##
## Coefficients:
## [,1]
## a 2297.12724
## b -15.29024
## s1 -1222.01821
## s2 -1012.34884
## s3 -442.56913
## s4 -307.95973
## s5 79.56065
## s6 258.33260
## s7 697.64492
## s8 241.68337
## s9 -246.12729
## s10 -465.09216
## s11 120.77708
## s12 412.50043
plot(hwB)

hwBForecast <- forecast(hwB, h=19)


VecB1 <- cbind(DataBTest,hwBForecast)

par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)


ts.plot(VecB1[,1],VecB1[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand B: Actual vs Forecast")

Box.test(hwBForecast$residuals, lag=20, type="Ljung-Box")


##
## Box-Ljung test
##
## data: hwBForecast$residuals
## X-squared = 13.101, df = 20, p-value = 0.873

Conclusion: Do not reject H0: Residuals are independent

MAPE(VecB1[,1],VecB1[,2])
## [1] 0.1867152

MAPE is 11.6% for Item A and 18.7% for Item B respectively.

ARIMA Model
Check for stationarity of the time series
Dickey-Fuller test
Statistical tests make strong assumptions about your data and can only inform the degree to which a null hypothesis can be rejected or not. The result must be interpreted in the context of a given problem to be meaningful. Nevertheless, they provide a quick check and confirmatory evidence that a time series is stationary or non-stationary.
Null Hypothesis (H0): the time series has a unit root, meaning it is non-stationary; it has some time-dependent structure.
Alternate Hypothesis (H1): the time series does not have a unit root, meaning it is stationary; it does not have time-dependent structure.
p-value > 0.05: fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
p-value <= 0.05: reject the null hypothesis (H0); the data does not have a unit root and is stationary.
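For reference (a standard statement of the augmented Dickey-Fuller regression, with constant and trend as used by tseries::adf.test), the test fits

Δy_t = a + b*t + γ*y_(t-1) + δ_1*Δy_(t-1) + ... + δ_k*Δy_(t-k) + e_t

and tests H0: γ = 0 (unit root) against the alternative γ < 0 (stationary).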
Item A

library(tseries)
adf.test(dem_ItA)
## Warning in adf.test(dem_ItA): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: dem_ItA
## Dickey-Fuller = -7.8632, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

diff_dem_ItA <- diff(dem_ItA)


plot(diff_dem_ItA)
adf.test(diff(dem_ItA))
## Warning in adf.test(diff(dem_ItA)): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(dem_ItA)
## Dickey-Fuller = -8.0907, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

Item B

library(tseries)
adf.test(dem_ItB)
## Warning in adf.test(dem_ItB): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: dem_ItB
## Dickey-Fuller = -12.967, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
diff_dem_ItB <- diff(dem_ItB)
plot(diff_dem_ItB)

adf.test(diff(dem_ItB))
## Warning in adf.test(diff(dem_ItB)): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(dem_ItB)
## Dickey-Fuller = -9.8701, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

From the ADF tests above, the null hypothesis is rejected for both series.

The time series of differences (above) also appears stationary in mean and variance, as the level of the series stays roughly constant over time and the variance of the series appears roughly constant over time.

ACF and PACF (checking stationarity and autocorrelation)
The function Acf computes an estimate of the autocorrelation function of a (possibly multivariate) time series. The function Pacf computes an estimate of the partial autocorrelation function of a (possibly multivariate) time series.
ACF
> Autocorrelations at different lags give insight into the time series; the lag at which the ACF cuts off helps determine the order q of the MA component.
> Significant autocorrelations imply that observations from the distant past influence the current observation.
> lag.max: maximum lag at which to calculate the acf. The default is 10*log10(N/m), where N is the number of observations and m the number of series; it is automatically limited to one less than the number of observations in the series.
PACF
> Partial autocorrelation adjusts for the intervening lags; the lag at which the PACF cuts off helps determine the order p of the AR component.
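For reference, the sample lag-k autocorrelation estimated by Acf is

r_k = sum_{t=1..N-k} (y_t - ybar)(y_(t+k) - ybar) / sum_{t=1..N} (y_t - ybar)^2

and the partial autocorrelation at lag k is the correlation between y_t and y_(t-k) after removing the linear influence of the intervening lags 1..k-1.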

acf(dem_ItA,lag=15)
acf(diff_dem_ItA, lag=15)

acf(dem_ItB,lag=15)
acf(diff_dem_ItB, lag=15)
Checking with lag 50

acf(dem_ItA,lag=50)
acf(diff_dem_ItA, lag=50)
pacf(dem_ItA)
pacf(diff_dem_ItA)
acf(dem_ItB,lag=50)
acf(diff_dem_ItB, lag=50)
pacf(dem_ItB)
pacf(diff_dem_ItB)
ARIMA model
ARMA models are commonly used in time series modeling; in an ARMA model, AR stands for auto-regression and MA stands for moving average.
In the ACF and PACF above, the values alternate between positive and negative (because the differenced data is stationary); there is no clear cut-off that would indicate an AR(2) series, and no gradual decay in the PACF pointing to an MA(2) component, so we let auto.arima select the orders.
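To read the auto.arima output below: the notation ARIMA(p,d,q)(P,D,Q)[m] gives the non-seasonal AR order p, degree of differencing d and MA order q, together with their seasonal counterparts P, D and Q at season length m (here m = 12), plus an optional drift term. ARIMA(0,0,0)(0,1,1)[12] with drift, for example, means one seasonal difference, one seasonal MA term and a linear drift.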
Item A

ItA.arima.fit.train <- auto.arima(DataATrain, seasonal=TRUE)


ItA.arima.fit.train
## Series: DataATrain
## ARIMA(0,0,0)(0,1,1)[12] with drift
##
## Coefficients:
## sma1 drift
## -0.6581 3.9132
## s.e. 0.0798 0.9188
##
## sigma^2 estimated as 116022: log likelihood=-1133.35
## AIC=2272.71 AICc=2272.86 BIC=2281.86
plot(ItA.arima.fit.train$residuals)

plot(ItA.arima.fit.train$x,col="blue")
lines(ItA.arima.fit.train$fitted, col="red", main="Demand A: Actual vs Forecast")
MAPE(ItA.arima.fit.train$fitted,ItA.arima.fit.train$x)
## [1] 0.0733376

The in-sample MAPE (on the fitted values) is now reduced to 7.3% for the ARIMA model.

acf(ItA.arima.fit.train$residuals)
pacf(ItA.arima.fit.train$residuals)

Box-Ljung Test
To check whether the residuals are independent:
H0: Residuals are independent
Ha: Residuals are not independent

Box.test(ItA.arima.fit.train$residuals, lag = 10, type = c("Ljung-Box"), fitdf = 0)
##
## Box-Ljung test
##
## data: ItA.arima.fit.train$residuals
## X-squared = 16.716, df = 10, p-value = 0.0809

Conclusion: Do not reject H0 at the 5% level (p = 0.0809): the residuals appear independent.


Forecasting on the hold-out dataset

ArimafcastA <- forecast(ItA.arima.fit.train, h=19)


VecA2 <- cbind(DataATest,ArimafcastA)

par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)

ts.plot(VecA2[,1],VecA2[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand A: Actual vs Forecast")

From the plot and data, we can see that the forecast follows the actual values closely, crossing them around Jan 2016, May 2016, Dec 2016 and Jan 2017.
Item B

ItB.arima.fit.train <- auto.arima(dem_ItB, seasonal=TRUE) #note: fitted on the full dem_ItB series rather than DataBTrain


ItB.arima.fit.train
## Series: dem_ItB
## ARIMA(0,0,0)(2,1,1)[12] with drift
##
## Coefficients:
## sar1 sar2 sma1 drift
## 0.1763 0.0339 -0.7149 -9.3358
## s.e. 0.2083 0.1433 0.1880 0.8680
##
## sigma^2 estimated as 102973: log likelihood=-1258.89
## AIC=2527.78 AICc=2528.13 BIC=2543.6
plot(ItB.arima.fit.train$residuals)

plot(ItB.arima.fit.train$x,col="blue")
lines(ItB.arima.fit.train$fitted, col="red", main="Demand B: Actual vs Forecast")
MAPE(ItB.arima.fit.train$fitted,ItB.arima.fit.train$x)
## [1] 0.07654621

The in-sample MAPE is 7.7% for the ARIMA model.

acf(ItB.arima.fit.train$residuals)
pacf(ItB.arima.fit.train$residuals)

Box-Ljung Test
To check whether the residuals are independent:
H0: Residuals are independent
Ha: Residuals are not independent

Box.test(ItB.arima.fit.train$residuals, lag = 10, type = c("Ljung-Box"), fitdf = 0)
##
## Box-Ljung test
##
## data: ItB.arima.fit.train$residuals
## X-squared = 16.285, df = 10, p-value = 0.09177

Conclusion: Do not reject H0 at the 5% level (p = 0.09177): the residuals appear independent.


Forecasting on the hold-out dataset

ArimafcastB <- forecast(ItB.arima.fit.train, h=19)


VecB2 <- cbind(DataBTest,ArimafcastB)

par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)

ts.plot(VecB2[,1],VecB2[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand B: Actual vs Forecast")

From the plot and data, we can see that the forecast does not follow the actual values exactly, but it crosses them around Mar 2016, Apr 2016, May 2016, Nov 2016 and Mar 2017.

Conclusion
For this time series forecasting problem, we observed both trend and seasonality in the data.
Item A has an increasing trend, whereas for Item B the trend is declining.
We also observed that for both items there are a few months with high seasonal variation, and for Item A there are a few outliers.
As the seasonal variation did not grow with the trend, we used additive seasonality. We built three models:
1. Random Walk with Drift,
2. Holt-Winters, and
3. ARIMA.
Below are the MAPE and Box-Ljung test results for the models.

Model                    Item A MAPE      Item B MAPE      Ljung-Box p (A / B)
Random Walk with Drift   0.1409 (14.1%)   0.1083 (10.8%)   1.597e-11 / 2.931e-13
Holt-Winters             0.1161 (11.6%)   0.1867 (18.7%)   0.8188 / 0.873
ARIMA                    0.0733 (7.3%)    0.0765 (7.7%)    0.0809 / 0.09177

The ARIMA model produced the lowest MAPE values (note that its MAPEs are computed on the fitted values rather than on the hold-out set), so we selected it for the final forecasting.
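As a cross-check (a sketch, not part of the original analysis), the accuracy() function from the forecast package reports MAPE in percent alongside RMSE and MAE for both the training and the test period:

accuracy(fcst.ItA.stl, DataATest)  #random walk with drift, Item A
accuracy(hwAForecast, DataATest)   #Holt-Winters, Item A
accuracy(ArimafcastA, DataATest)   #ARIMA, Item A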

Forecasting using ARIMA Model


Item A

ItA.arima.fit <- auto.arima(dem_ItA, seasonal=TRUE)


fcastA <- forecast(ItA.arima.fit, h=19)

plot(fcastA)
fcastA
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Aug 2017 4320.211 3879.991 4760.431 3646.953 4993.469
## Sep 2017 4169.513 3725.551 4613.476 3490.531 4848.495
## Oct 2017 4428.791 3981.385 4876.197 3744.542 5113.040
## Nov 2017 5102.669 4652.091 5553.246 4413.570 5791.767
## Dec 2017 5879.220 5425.721 6332.719 5185.653 6572.787
## Jan 2018 2819.535 2363.343 3275.727 2121.849 3517.221
## Feb 2018 3990.984 3532.307 4449.660 3289.498 4692.469
## Mar 2018 4181.449 3720.480 4642.419 3476.458 4886.441
## Apr 2018 4081.089 3618.003 4544.174 3372.860 4789.317
## May 2018 3888.336 3423.296 4353.376 3177.118 4599.554
## Jun 2018 4029.525 3562.679 4496.370 3315.545 4743.504
## Jul 2018 4390.292 3921.777 4858.807 3673.760 5106.823
## Aug 2018 4407.590 3900.778 4914.402 3632.487 5182.693
## Sep 2018 4257.019 3747.019 4767.019 3477.041 5036.997
## Oct 2018 4516.419 4003.480 5029.358 3731.946 5300.892
## Nov 2018 5190.414 4674.763 5706.065 4401.794 5979.034
## Dec 2018 5967.079 5448.925 6485.232 5174.632 6759.526
## Jan 2019 2907.502 2387.039 3427.966 2111.522 3703.483
## Feb 2019 4079.056 3556.458 4601.654 3279.812 4878.301

Item B

ItB.arima.fit <- auto.arima(dem_ItB, seasonal=TRUE)


fcastB <- forecast(ItB.arima.fit, h=19)

plot(fcastB)

fcastB
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Aug 2017 2356.3605 1945.1156 2767.605 1727.4157 2985.305
## Sep 2017 2082.9473 1671.7024 2494.192 1454.0025 2711.892
## Oct 2017 1784.7949 1373.5500 2196.040 1155.8501 2413.740
## Nov 2017 2436.4019 2025.1570 2847.647 1807.4571 3065.347
## Dec 2017 2429.8611 2018.6162 2841.106 1800.9163 3058.806
## Jan 2018 965.2270 553.9834 1376.471 336.2842 1594.170
## Feb 2018 1278.2300 866.9865 1689.474 649.2873 1907.173
## Mar 2018 1693.3730 1282.1294 2104.617 1064.4302 2322.316
## Apr 2018 2088.9240 1677.6805 2500.168 1459.9813 2717.867
## May 2018 2342.6726 1931.4290 2753.916 1713.7298 2971.615
## Jun 2018 2587.5703 2176.3268 2998.814 1958.6276 3216.513
## Jul 2018 2903.9519 2492.7084 3315.196 2275.0092 3532.895
## Aug 2018 2267.6121 1814.7093 2720.515 1574.9570 2960.267
## Sep 2018 1945.3845 1492.4816 2398.287 1252.7293 2638.040
## Oct 2018 1663.8271 1210.9242 2116.730 971.1719 2356.482
## Nov 2018 2293.2590 1840.3561 2746.162 1600.6038 2985.914
## Dec 2018 2325.0672 1872.1643 2777.970 1632.4120 3017.722
## Jan 2019 843.6106 390.7094 1296.512 150.9580 1536.263
## Feb 2019 1150.9134 698.0123 1603.815 458.2608 1843.566

Appendix – Sample Source Code:


library(readxl)
setwd("E:/Great_Lakes/TimeSeries")

#Skip the first row of the Excel file (skip = 1)


demand <- read_excel("Demand.xlsx", skip = 1)

str(demand)

## Classes 'tbl_df', 'tbl' and 'data.frame': 187 obs. of 4 variables:


## $ Year : num 2002 2002 2002 2002 2002 ...
## $ Month : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Item A: num 1954 2302 3054 2414 2226 ...
## $ Item B: num 2585 3368 3210 3111 3756 ...
#Rename the columns as per the series
names(demand)[3] <- c("IteamA")
names(demand)[4] <- c("IteamB")
str(demand)
## Classes 'tbl_df', 'tbl' and 'data.frame': 187 obs. of 4 variables:
## $ Year : num 2002 2002 2002 2002 2002 ...
## $ Month : num 1 2 3 4 5 6 7 8 9 10 ...
## $ IteamA: num 1954 2302 3054 2414 2226 ...
## $ IteamB: num 2585 3368 3210 3111 3756 ...

dim(demand)
# [1] 187 4

View(demand)
head(demand)
summary(demand[,3:4])
## IteamA IteamB
## Min. :1954 Min. :1153
## 1st Qu.:2748 1st Qu.:2362
## Median :3134 Median :2876
## Mean :3263 Mean :2962
## 3rd Qu.:3741 3rd Qu.:3468
## Max. :5725 Max. :5618

#Plotting Time Series for Item A: monthly data from Jan 2002 to Jul 2017
dem_ItA <- ts(demand[,3], start=c(2002,1), end=c(2017,7), frequency=12)
plot(dem_ItA)

#Plotting Time Series for Item B: monthly data from Jan 2002 to Jul 2017
dem_ItB <- ts(demand[,4], start=c(2002,1), end=c(2017,7), frequency=12)
plot(dem_ItB)
#Plotting the Time Series across Item A and Item B
ts.plot(dem_ItA, dem_ItB, gpars = list(col = c("black", "red")),xlab="year", ylab="demand")
legend("topleft", colnames(demand[3:4]), col=1:ncol(demand), lty=1.9, cex=.45)

#Item A
monthplot(dem_ItA)
boxplot(dem_ItA ~ cycle(dem_ItA))
#Item B
monthplot(dem_ItB)
boxplot(dem_ItB ~ cycle(dem_ItB))
#Item A
ItA_Sea <- stl(dem_ItA[,1], s.window="p") #constant seasonality
plot(ItA_Sea)
#Item B
ItB_Sea <- stl(dem_ItB[,1], s.window="p") #constant seasonality
plot(ItB_Sea)

series_names <- c('Deseasoned', 'Actual')


Deseason_ItA <- (ItA_Sea$time.series[,2]+ItA_Sea$time.series[,3])
ts.plot(dem_ItA, Deseason_ItA, col=c("red", "blue"), main="ItemA Demand vs Deseasoned Demand")
series_names <- c('Deseasoned', 'Actual')
Deseason_ItB <- (ItB_Sea$time.series[,2]+ItB_Sea$time.series[,3])
ts.plot(dem_ItB, Deseason_ItB, col=c("red", "blue"), main="ItemB Demand vs Deseasoned Demand")
DataATrain <- window(dem_ItA, start=c(2002,1), end=c(2015,12), frequency=12)
DataATest <- window(dem_ItA, start=c(2016,1), frequency=12)
DataBTrain <- window(dem_ItB, start=c(2002,1), end=c(2015,12), frequency=12)
DataBTest <- window(dem_ItB, start=c(2016,1), frequency=12)
ItmATrn <- stl(DataATrain[,1], s.window="p")
ItmBTrn <- stl(DataBTrain[,1], s.window="p")
#install.packages('forecast')  #uncomment and run once if the package is not installed
library(forecast)
fcst.ItA.stl <- forecast(ItmATrn, method="rwdrift", h=19)
fcst.ItB.stl <- forecast(ItmBTrn, method="rwdrift", h=19)
VecA<- cbind(DataATest,fcst.ItA.stl$mean)
VecB<- cbind(DataBTest,fcst.ItB.stl$mean)

#par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)


ts.plot(VecA, col=c("blue", "red"), xlab="year", ylab="demand", main="Monthly Demand A: Actual vs Forecast")

MAPEA <- mean(abs(VecA[,1]-VecA[,2])/VecA[,1])


MAPEA
## [1] 0.1408798

Box.test(fcst.ItA.stl$residuals, lag=10, type="Ljung-Box")


##
## Box-Ljung test
##
## data: fcst.ItA.stl$residuals
## X-squared = 72.292, df = 10, p-value = 1.597e-11

ts.plot(VecB, col=c("blue", "red"), xlab="year", ylab="demand", main="Monthly Demand B: Actual vs Forecast")
MAPEB <- mean(abs(VecB[,1]-VecB[,2])/VecB[,1])
MAPEB
## [1] 0.1082608

Box.test(fcst.ItB.stl$residuals, lag=30, type="Ljung-Box")


##
## Box-Ljung test
##
## data: fcst.ItB.stl$residuals
## X-squared = 123.22, df = 30, p-value = 2.931e-13
hwA <- HoltWinters(as.ts(DataATrain),seasonal="additive")
hwA

## Holt-Winters exponential smoothing with trend and additive seasonal component.


##
## Call:
## HoltWinters(x = as.ts(DataATrain), seasonal = "additive")
##
## Smoothing parameters:
## alpha: 0.1241357
## beta : 0.03174654
## gamma: 0.3636975
##
## Coefficients:
## [,1]
## a 3753.348040
## b 7.663395
## s1 -1250.098605
## s2 -438.592232
## s3 -224.017731
## s4 -407.395313
## s5 -507.668223
## s6 -667.267246
## s7 63.659702
## s8 197.909330
## s9 -301.525945
## s10 25.272325
## s11 712.529546
## s12 1545.291998

plot(hwA)
hwAForecast <- forecast(hwA, h=19)
VecA1 <- cbind(DataATest,hwAForecast)
par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)
ts.plot(VecA1[,1],VecA1[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand A: Actual vs Forecast")
Box.test(hwAForecast$residuals, lag=20, type="Ljung-Box")
##
## Box-Ljung test
##
## data: hwAForecast$residuals
## X-squared = 14.227, df = 20, p-value = 0.8188

#install.packages("MLmetrics")
library(MLmetrics)
## Warning: package 'MLmetrics' was built under R version 3.5.1
##
## Attaching package: 'MLmetrics'
## The following object is masked from 'package:base':
##
## Recall
MAPE(VecA1[,1],VecA1[,2])
## [1] 0.1160528

hwB <- HoltWinters(as.ts(DataBTrain),seasonal="additive")


hwB
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = as.ts(DataBTrain), seasonal = "additive")
##
## Smoothing parameters:
## alpha: 0.0166627
## beta : 0.4878834
## gamma: 0.5000132
##
## Coefficients:
## [,1]
## a 2297.12724
## b -15.29024
## s1 -1222.01821
## s2 -1012.34884
## s3 -442.56913
## s4 -307.95973
## s5 79.56065
## s6 258.33260
## s7 697.64492
## s8 241.68337
## s9 -246.12729
## s10 -465.09216
## s11 120.77708
## s12 412.50043

plot(hwB)
hwBForecast <- forecast(hwB, h=19)
VecB1 <- cbind(DataBTest,hwBForecast)
par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)
ts.plot(VecB1[,1],VecB1[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand B: Actual vs Forecast")
Box.test(hwBForecast$residuals, lag=20, type="Ljung-Box")
##
## Box-Ljung test
##
## data: hwBForecast$residuals
## X-squared = 13.101, df = 20, p-value = 0.873
MAPE(VecB1[,1],VecB1[,2])
## [1] 0.1867152

library(tseries)
adf.test(dem_ItA)
## Warning in adf.test(dem_ItA): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: dem_ItA
## Dickey-Fuller = -7.8632, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
diff_dem_ItA <- diff(dem_ItA)
plot(diff_dem_ItA)
adf.test(diff(dem_ItA))
## Warning in adf.test(diff(dem_ItA)): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(dem_ItA)
## Dickey-Fuller = -8.0907, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

library(tseries)
adf.test(dem_ItB)
## Warning in adf.test(dem_ItB): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: dem_ItB
## Dickey-Fuller = -12.967, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

diff_dem_ItB <- diff(dem_ItB)


plot(diff_dem_ItB)
adf.test(diff(dem_ItB))
## Warning in adf.test(diff(dem_ItB)): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(dem_ItB)
## Dickey-Fuller = -9.8701, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

acf(dem_ItA,lag=15)
acf(diff_dem_ItA, lag=15)
acf(dem_ItB,lag=15)
acf(diff_dem_ItB, lag=15)
acf(dem_ItA,lag=50)
acf(diff_dem_ItA, lag=50)
pacf(dem_ItA)
pacf(diff_dem_ItA)
acf(dem_ItB,lag=50)
acf(diff_dem_ItB, lag=50)
pacf(dem_ItB)
pacf(diff_dem_ItB)
ItA.arima.fit.train <- auto.arima(DataATrain, seasonal=TRUE)
ItA.arima.fit.train
## Series: DataATrain
## ARIMA(0,0,0)(0,1,1)[12] with drift
##
## Coefficients:
## sma1 drift
## -0.6581 3.9132
## s.e. 0.0798 0.9188
##
## sigma^2 estimated as 116022: log likelihood=-1133.35
## AIC=2272.71 AICc=2272.86 BIC=2281.86

plot(ItA.arima.fit.train$residuals)
plot(ItA.arima.fit.train$x,col="blue")
lines(ItA.arima.fit.train$fitted,col="red",main="Demand A: Actual vs Forecast")
MAPE(ItA.arima.fit.train$fitted,ItA.arima.fit.train$x)
## [1] 0.0733376
##The in-sample MAPE is now reduced to 7.3% for the ARIMA model

acf(ItA.arima.fit.train$residuals)
pacf(ItA.arima.fit.train$residuals)
Box.test(ItA.arima.fit.train$residuals, lag = 10, type = c("Ljung-Box"), fitdf = 0)
##
## Box-Ljung test
##
## data: ItA.arima.fit.train$residuals
## X-squared = 16.716, df = 10, p-value = 0.0809

ArimafcastA <- forecast(ItA.arima.fit.train, h=19)


VecA2 <- cbind(DataATest,ArimafcastA)
par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)
ts.plot(VecA2[,1],VecA2[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand A: Actual vs Forecast")

ItB.arima.fit.train <- auto.arima(dem_ItB, seasonal=TRUE) #note: fitted on the full dem_ItB series rather than DataBTrain


ItB.arima.fit.train
## Series: dem_ItB
## ARIMA(0,0,0)(2,1,1)[12] with drift
##
## Coefficients:
## sar1 sar2 sma1 drift
## 0.1763 0.0339 -0.7149 -9.3358
## s.e. 0.2083 0.1433 0.1880 0.8680
##
## sigma^2 estimated as 102973: log likelihood=-1258.89
## AIC=2527.78 AICc=2528.13 BIC=2543.6
plot(ItB.arima.fit.train$residuals)
plot(ItB.arima.fit.train$x,col="blue")
lines(ItB.arima.fit.train$fitted,col="red", main="Demand B: Actual vs Forecast")
MAPE(ItB.arima.fit.train$fitted,ItB.arima.fit.train$x)
## [1] 0.07654621
##The in-sample MAPE is 7.7% for the ARIMA model

acf(ItB.arima.fit.train$residuals)
pacf(ItB.arima.fit.train$residuals)
Box.test(ItB.arima.fit.train$residuals, lag = 10, type = c("Ljung-Box"), fitdf = 0)
##
## Box-Ljung test
##
## data: ItB.arima.fit.train$residuals
## X-squared = 16.285, df = 10, p-value = 0.09177
##Conclusion: Do not reject H0 at the 5% level: the residuals appear independent

##Forecasting on the hold-out dataset


ArimafcastB <- forecast(ItB.arima.fit.train, h=19)
VecB2 <- cbind(DataBTest,ArimafcastB)

par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)

ts.plot(VecB2[,1],VecB2[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand B: Actual vs Forecast")

ItA.arima.fit <- auto.arima(dem_ItA, seasonal=TRUE)


fcastA <- forecast(ItA.arima.fit, h=19)
plot(fcastA)

fcastA
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Aug 2017 4320.211 3879.991 4760.431 3646.953 4993.469
## Sep 2017 4169.513 3725.551 4613.476 3490.531 4848.495
## Oct 2017 4428.791 3981.385 4876.197 3744.542 5113.040
## Nov 2017 5102.669 4652.091 5553.246 4413.570 5791.767
## Dec 2017 5879.220 5425.721 6332.719 5185.653 6572.787
## Jan 2018 2819.535 2363.343 3275.727 2121.849 3517.221
## Feb 2018 3990.984 3532.307 4449.660 3289.498 4692.469
## Mar 2018 4181.449 3720.480 4642.419 3476.458 4886.441
## Apr 2018 4081.089 3618.003 4544.174 3372.860 4789.317
## May 2018 3888.336 3423.296 4353.376 3177.118 4599.554
## Jun 2018 4029.525 3562.679 4496.370 3315.545 4743.504
## Jul 2018 4390.292 3921.777 4858.807 3673.760 5106.823
## Aug 2018 4407.590 3900.778 4914.402 3632.487 5182.693
## Sep 2018 4257.019 3747.019 4767.019 3477.041 5036.997
## Oct 2018 4516.419 4003.480 5029.358 3731.946 5300.892
## Nov 2018 5190.414 4674.763 5706.065 4401.794 5979.034
## Dec 2018 5967.079 5448.925 6485.232 5174.632 6759.526
## Jan 2019 2907.502 2387.039 3427.966 2111.522 3703.483
## Feb 2019 4079.056 3556.458 4601.654 3279.812 4878.301

ItB.arima.fit <- auto.arima(dem_ItB, seasonal=TRUE)


fcastB <- forecast(ItB.arima.fit, h=19)
plot(fcastB)

fcastB
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Aug 2017 2356.3605 1945.1156 2767.605 1727.4157 2985.305
## Sep 2017 2082.9473 1671.7024 2494.192 1454.0025 2711.892
## Oct 2017 1784.7949 1373.5500 2196.040 1155.8501 2413.740
## Nov 2017 2436.4019 2025.1570 2847.647 1807.4571 3065.347
## Dec 2017 2429.8611 2018.6162 2841.106 1800.9163 3058.806
## Jan 2018 965.2270 553.9834 1376.471 336.2842 1594.170
## Feb 2018 1278.2300 866.9865 1689.474 649.2873 1907.173
## Mar 2018 1693.3730 1282.1294 2104.617 1064.4302 2322.316
## Apr 2018 2088.9240 1677.6805 2500.168 1459.9813 2717.867
## May 2018 2342.6726 1931.4290 2753.916 1713.7298 2971.615
## Jun 2018 2587.5703 2176.3268 2998.814 1958.6276 3216.513
## Jul 2018 2903.9519 2492.7084 3315.196 2275.0092 3532.895
## Aug 2018 2267.6121 1814.7093 2720.515 1574.9570 2960.267
## Sep 2018 1945.3845 1492.4816 2398.287 1252.7293 2638.040
## Oct 2018 1663.8271 1210.9242 2116.730 971.1719 2356.482
## Nov 2018 2293.2590 1840.3561 2746.162 1600.6038 2985.914
## Dec 2018 2325.0672 1872.1643 2777.970 1632.4120 3017.722
## Jan 2019 843.6106 390.7094 1296.512 150.9580 1536.263
## Feb 2019 1150.9134 698.0123 1603.815 458.2608 1843.566
