International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 0882
Volume 5, Issue 4, April 2016
Development of Demand Forecasting Models for Improved Customer Service in Nigeria
Soft Drink Industry: Case of Coca-Cola Company Enugu
Godwin H. C.1 and Fakiyesi O. B.2
Department of Industrial/Production Engineering, Nnamdi Azikiwe University Awka, Nigeria1, 2
Email: hcgodwin@yahoo.com1 and oladapo_fakiyesi@yahoo.com2
ABSTRACT
The study is targeted at developing appropriate forecasting models for the production of Coca-Cola products using the Box-Jenkins method. This method is based on a sequence of stages (defining, estimating, diagnosing and forecasting). The time series of five (5) products of the company were examined, namely Coke, Fanta, Sprite, Limca and Schweppes, which were tagged Product 1, 2, 3, 4 and 5 respectively. Each of the products was also examined separately on both a monthly and a quarterly basis. The results of this study showed that the suitable and efficient models to represent the time series data, according to the smallest values of the AIC, BIC, MSE and RMSE criteria as well as the Box-Ljung test, are the fitted models SARIMA (0,1,0)(1,1,1)4, SARIMA (1,1,0)(1,1,1)4, SARIMA (1,1,1)(0,1,0)4, SARIMA (1,1,0)(1,1,1)12 and SARIMA (0,1,0)(1,1,1)12. Based on these results, the future demand for the products was forecast from 2015 to 2019, and the values obtained were in harmony with their counterparts in the original time series. This gives a realistic picture of the expected future demand.
Keywords: Akaike and Bayesian Information Criteria
(AIC & BIC), Demand, Forecasting, Seasonality, Time
Series.
1. INTRODUCTION
Forecasts for the soft drink industry are made using volume (in gallons) and revenue (in naira). Consumption from a volume perspective is expected to increase as a result of an anticipated increase in consumer spending as the recession ends, above-average expansion of the 55-and-older age groups, faster-paced lifestyles that demand convenience products, and rising demand for functional beverages. A number of factors determine demand for soft drinks: price, income, and consumers' lifestyles and tastes. The absence of effective and scientific demand forecasting methods in most Nigerian public organizations seems to be the main cause of shortages and excesses in human resources, resulting in unmanageable imbalances in the number and quality of employees needed to optimally achieve organizational objectives and plans [1].
The study is significant in the following ways. The findings will guide decision makers in formulating soft drink sector policies for economic growth that favor both individually and publicly owned investments. It will also enhance the productivity level of the Nigerian Bottling Company and thereby reduce the bottleneck in the production line of The Coca-Cola Company. This would eventually create an efficient and effective system that will accommodate the future demand for their products.
Health issues are a hot topic with many consumers and,
as a result, are driving demand in both directions. Soft
drinks developed to be low-calorie, low-sugar, and
preservative-free are in line with consumers' health
consciousness, and demand for these products is
increasing. At the same time, the public debate about
nutrition, and specifically about Sugar-Sweetened
Beverages (SSBs), has reduced demand for non-diet
Carbonated Soft Drinks (CSDs) or shifted demand to
diet CSDs [2].
Several demand forecasting techniques currently exist. They vary from fairly simple qualitative methods based on individual or group judgments to highly complicated methods involving sophisticated statistical computerization. In this study, the problem is to analyze the demand-forecasting model using non-seasonal and seasonal Autoregressive Integrated Moving Average (SARIMA) models. The rationale for choosing this type of model is contingent on the behavior of the time series data. Also, in the history of demand forecasting, this model has been reported to perform better than other models because it can replicate existing conditions and is therefore suitable for predicting future demand [3].
Autoregressive Integrated Moving Average (ARIMA) is the method introduced by Box and Jenkins (1976), and it has since become one of the most popular models for forecasting univariate time series data. The model originated from the Autoregressive (AR) model, the Moving Average (MA) model and their combination, the ARMA model. Where seasonal components are included, the model is referred to as the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. The Box-Jenkins procedure comprises model identification, model estimation and model checking [4]. The scope of this research work is limited to the development of demand-forecasting models for Coca-Cola products. Seasonal ARIMA models were employed to develop models that forecast the future monthly and quarterly demand for these products. These models are capable of replicating the stochastic process that generated the time series.
Forecasting is a scientifically calculated guess that estimates a future event by casting forward past data. The past data are systematically combined in a predetermined way to obtain the estimate of the future [5]. Godwin and Igboanugo [6] analyzed time series of PHCN using ARMA and ARIMA models. According to their findings, the models predict in-sample and out-of-sample values fairly accurately. Ihueze and Okafor [7] developed a predictive production rate model using the general multiplicative regression equation form. Similarly, time series decomposition analysis was used to study seasonality and trend in some selected products [8]. The findings showed a reduction in production level, and short-period forecasting was recommended.
2. METHODOLOGY AND MODELING
This study centered on month-to-month data for five years, which were obtained from the production/sales department of the case study. The model adopted in this study is represented as:
$\mathrm{SARIMA}\,(p, d, q)\,(P, D, Q)$    (1)
where $p$ = non-seasonal AR order, $d$ = non-seasonal differencing, $q$ = non-seasonal MA order, $P$ = seasonal AR order, $D$ = seasonal differencing, and $Q$ = seasonal MA order. The entire procedure for the time series modeling selected for this study is summarized in the flow chart shown in Fig. 1.
DATA PREPARATION
1. Transform data to stabilize variance
2. Difference data to obtain stationarity
MODEL SELECTION
1. Examine data ACF and PACF
2. Select best model
ESTIMATION
Estimate parameters in potential models
DIAGNOSTICS
1. Check ACF/PACF of residuals
2. Do portmanteau test of residuals
3. Are the residuals white noise? (NO: return to an earlier stage; YES: proceed to forecasting)
FORECASTING
Use selected model to forecast
FIGURE 1: Schematic Representation of the Box-Jenkins Methodology for Time Series Modeling [9]
Following the method proposed by Box and Jenkins [10], the components are broken down as follows, in the order shown in Fig. 1.
2.1. Trend and Seasonal Differences: The data used in this study showed that both trend and seasonality were present in the original time series of the case study. Thus, there was a need to detrend the series and also remove the seasonality using the general stationarity transformation according to Box and Reinsel [4]:
$z_t = (1 - B^{L})^{D} (1 - B)^{d}\, y_t^{*}$    (2)
where $B$ = backshift operator, $L$ = number of seasons, $D$ = degree of seasonal differencing, $d$ = degree of non-seasonal differencing, and $y_t^{*}$ = pre-differencing transformation.
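The differencing step described above can be illustrated with a minimal Python sketch. It assumes the observations are held in a plain list; the function names `difference` and `stationarity_transform` and the sample series are hypothetical and are not taken from the study. The sketch applies seasonal differencing of degree D at the seasonal lag, followed by non-seasonal differencing of degree d.

```python
def difference(series, lag=1, order=1):
    """Apply the operator (1 - B^lag) to the series `order` times."""
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


def stationarity_transform(series, d=1, D=1, season=12):
    """Seasonal differencing of degree D, then regular differencing of degree d."""
    seasonal = difference(series, lag=season, order=D)
    return difference(seasonal, lag=1, order=d)


# Hypothetical quarterly series (not the study's data):
# a linear trend plus a seasonal pattern that repeats every 4 periods.
pattern = [5, -2, 4, -7]
demand = [100 + 3 * t + pattern[t % 4] for t in range(20)]

z = stationarity_transform(demand, d=1, D=1, season=4)
print(len(z), z)
```

For this artificial series, one seasonal and one regular difference remove both the trend and the repeating pattern, and 20 observations shrink to 15 because each difference consumes as many points as its lag.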
2.2. Identification of Potential Models: Both seasonal and non-seasonal differences of order 1 were applied to make the time series data stationary (i.e. the mean, variance and autocorrelations are constant). The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) of the stationary time series of the case study were then examined. Fig. 2 shows the basic characteristics used for the identification of the model orders in the form of Equation (1).
Process | Autocorrelation Function (ACF) | Partial Autocorrelation Function (PACF)
AR (p) | Tails off towards zero (exponential decay or damped sine wave) | Cuts off to zero after lag p
MA (q) | Cuts off to zero after lag q | Tails off towards zero (exponential decay or damped sine wave)
ARMA (p, q) | Tails off towards zero (exponential decay or damped sine wave) | Tails off towards zero (exponential decay or damped sine wave)
Figure 2: Distinguishing Characteristics of Theoretical ACF and PACF
Source: Adapted from Chatfield [11]
The ACF and PACF of the stationary time series are computed and expressed in Equations (3) and (4), whose terms denote the response variable, time, the number of lags and the predictor variable.
From a critical observation of the ACF and PACF plots of the time series data of the case study after differencing the data to obtain stationarity, and considering both the quarterly and monthly plots of each product, the following models were tentatively identified, as presented in Tables 1(a) and 1(b).
Table 1(a): Quarterly Model Criteria (columns: Product, SARIMA Models, AIC, BIC; rows: P1 to P5)
Table 1(b): Monthly Model Criteria (columns: Product, SARIMA Models, AIC, BIC; rows: P1 to P5)
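To make the identification step concrete, the following Python sketch computes a sample ACF and PACF. It uses the common textbook estimators (autocovariances about the mean for the ACF, the Durbin-Levinson recursion for the PACF), which are assumptions here and not necessarily the exact expressions used by the authors. The function names and the sample series are hypothetical.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag, measured about the series mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def sample_pacf(series, max_lag):
    """Partial autocorrelations for lags 1 .. max_lag (Durbin-Levinson recursion)."""
    r = sample_acf(series, max_lag)
    pacf = []
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1} from the previous step
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Hypothetical short series, for illustration only.
series = [3, 5, 4, 6, 5, 7, 6, 8, 7, 9, 8, 10]
acf_vals = sample_acf(series, max_lag=4)
pacf_vals = sample_pacf(series, max_lag=4)

# Rough 95% band often drawn on correlograms: +/- 2 / sqrt(n).
bound = 2 / len(series) ** 0.5
for lag in range(1, 5):
    print(lag, round(acf_vals[lag], 3), round(pacf_vals[lag - 1], 3), round(bound, 3))
```

Lags whose values fall outside the band are the candidates for the "cuts off" and "tails off" patterns summarized in Fig. 2.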
2.3. Model Selection: The tentative models shown in Tables 1(a) and 1(b) were compared using penalty function statistics: the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These criteria were used to measure the goodness-of-fit of the estimated statistical models and are calculated using the expressions below:
$\mathrm{AIC} = 2k - 2\ln(\hat{L})$  OR  $\mathrm{AIC} = n\ln(\mathrm{RSS}/n) + 2k$    (5)
$\mathrm{BIC} = -2\ln(\hat{L}) + k\ln(n)$  OR  $\mathrm{BIC} = n\ln(\sigma_e^{2}) + k\ln(n)$    (6)
where $k$ = the number of parameters in the statistical model, $\hat{L}$ = the maximized value of the likelihood function for the estimated model, RSS = the residual sum of squares of the estimated model, $n$ = the number of observations (equivalently, the sample size), and $\sigma_e^{2}$ = the error variance [12].
Finally, the models that had the least values of AIC and BIC for each product were selected and summarized in Table 2.
Table 2: Summary of the Models (columns: Models, Seasonal ARIMA Models; rows: A to E)
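The selection rule in Section 2.3 can be sketched in Python as follows. The sketch assumes the least-squares forms of the criteria, with the error variance estimated as RSS/n and additive constants dropped. The candidate labels, residual sums of squares and parameter counts are made-up illustrative numbers, not results from the study, and the function names are hypothetical.

```python
import math


def aic_from_rss(rss, n, k):
    """Least-squares form of AIC: n*ln(RSS/n) + 2k (additive constants dropped)."""
    return n * math.log(rss / n) + 2 * k


def bic_from_rss(rss, n, k):
    """Least-squares form of BIC: n*ln(RSS/n) + k*ln(n), with RSS/n as error variance."""
    return n * math.log(rss / n) + k * math.log(n)


def select_model(candidates, n):
    """Return the label with the smallest AIC (ties broken by BIC) and all scores."""
    scored = {label: (aic_from_rss(rss, n, k), bic_from_rss(rss, n, k))
              for label, (rss, k) in candidates.items()}
    best = min(scored, key=lambda label: scored[label])
    return best, scored


# Hypothetical candidates: label -> (residual sum of squares, number of parameters).
candidates = {
    "candidate_1": (4.0e6, 2),
    "candidate_2": (3.9e6, 3),
    "candidate_3": (5.2e6, 2),
}

# n = 15 is an assumed effective sample size for illustration.
best, scored = select_model(candidates, n=15)
for label, (aic, bic) in scored.items():
    print(label, round(aic, 2), round(bic, 2))
print("selected:", best)
```

In this made-up comparison the second candidate has the smallest RSS but is not selected, because its extra parameter costs more in penalty than it gains in fit. That trade-off is what the penalty terms are designed to capture.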
2.4. Parameter Estimations: Once a suitable structure is identified, the next step is the parameter estimation or fitting stage. The parameters of the models given in Table 2 are estimated by the maximum likelihood method.
2.5. Diagnostic Checking and Model Validation: Although the selected models might appear to be the most appropriate among the models considered, it is necessary to carry out diagnostic checking to verify their adequacy. This was done in this study by:
i. Verifying the ACF of the residuals.
ii. Verifying the normal probability plots of the residuals.
For a SARIMA model it is critical to examine the behavior of the residuals to see whether they are normal, random and have constant variance. From Fig. 3, the assumptions are met quite well, except for non-constant variation in the Versus Fits plot. This stems from the fact that the quality of the model fit is better for early data points than for more recent ones.
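The residual ACF check is commonly summarized by a portmanteau statistic such as the Ljung-Box Q. The following Python sketch computes it from a list of residuals. The function name, the choice of 4 lags, the sample residuals and the use of an unadjusted chi-square critical value are assumptions made for illustration and do not reproduce the study's computation.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(v * v for v in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Hypothetical residuals that alternate in sign, i.e. clearly not white noise.
patterned = [1.0, -1.0] * 10
q_stat = ljung_box_q(patterned, max_lag=4)

# 5% critical value of the chi-square distribution with 4 degrees of freedom.
# Using 4 here assumes no adjustment for the number of estimated parameters.
CHI2_95_DF4 = 9.488
looks_like_white_noise = q_stat <= CHI2_95_DF4
print(round(q_stat, 3), looks_like_white_noise)
```

A Q value above the critical value indicates that the residual autocorrelations are jointly too large for white noise, in which case the model would be revised before forecasting.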
2.6. Normal Probability Plot of the Residuals
The normal probability plot is a graphical technique for identifying substantive departures from normality. This includes identifying outliers, skewness, a need for transformations, and mixtures. Normal probability plots are made of raw data, residuals from model fits, and estimated parameters. In a normal probability plot (also called a normal plot), the sorted data are plotted against values selected so that the resulting image looks close to a straight line if the data are approximately normally distributed. Deviations from a straight line suggest departures from normality [11].
The normal probability plots of the selected models in this study are analyzed in Fig. 3-7.
Figure 3: Normal Probability Plot of Residual for MODEL (A) (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)
Fig. 4 shows the normal probability plot of the residuals for Model B. Non-constant variation was noticed in the Versus Fits plot, and the Versus Order plot shows reasonable fluctuations.
Figure 4: Normal Probability Plot of Residual for MODEL (B) (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)
In Fig. 5, which shows the normal probability plot of the residuals for Model C, the residuals roughly form a horizontal band around the 0 line in the Versus Fits plot. This suggests that the variances of the error terms are equal.
Figure 5: Normal Probability Plot of Residual for MODEL (C) (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)
Fig. 6 shows the normal probability plot of the residuals for Model D. A close look at the plot shows that the residuals bounce randomly around the 0 line. This suggests that the assumption that the relationship is linear is reasonable. The points on this plot form a nearly linear pattern, which indicates that the normal distribution is a good model for this data set.
Figure 6: Normal Probability Plot of Residual for MODEL (D) (Residual Plots for Product1; panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)
Figure 7: Normal Probability Plot of Residual for MODEL (E) (Residual Plots for Product3; panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)
This type of residual test is carried out specifically to determine the true nature of the residuals. Fig. 3-7 show clearly that the residuals of the models are normally distributed. This adds to the evidence already presented in this study that the selected models are adequate and can be used to forecast the future demand for Coca-Cola products of the case study.
2.7. Forecasting: This is the last phase of the Box-Jenkins methodology. The models that have been selected and tested as appropriate are used to make future predictions of the demand for Coca-Cola products. Fig. 8-12 show plots of the 5-year (60 months) time series data for Products 1-5, together with the forecast demand for a future period of 5 years (60 months), i.e. 2015-2019.
Figure 8: Actual and Forecast Plot for P1
Figure 9: Actual and Forecast Plot for P2
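To show how a point forecast follows from a fitted seasonal model, the derivation below expands a SARIMA (0,1,0)(1,1,1)4 specification, one of the forms listed in the abstract, into a difference equation. This is a sketch under stated assumptions: the symbols $\Phi_1$, $\Theta_1$ and $a_t$ and the minus-sign convention for the moving average polynomial are illustrative choices, and no fitted coefficient values from the study are implied. The forecast origin of 20 quarterly observations is taken from the five years of data described in the paper.

```latex
% Assumed convention: MA polynomial written with a minus sign; a_t is white noise.
(1 - \Phi_1 B^{4})(1 - B)(1 - B^{4})\, y_t = (1 - \Theta_1 B^{4})\, a_t

% Expanding the two differencing operators:
w_t = (1 - B)(1 - B^{4})\, y_t = y_t - y_{t-1} - y_{t-4} + y_{t-5}

% Difference-equation form of the model:
y_t = y_{t-1} + y_{t-4} - y_{t-5}
      + \Phi_1 \left( y_{t-4} - y_{t-5} - y_{t-8} + y_{t-9} \right)
      + a_t - \Theta_1 a_{t-4}

% One-step-ahead point forecast for period 21 from origin n = 20
% (the future shock a_{21} is replaced by its expected value, 0):
\hat{y}_{21} = y_{20} + y_{17} - y_{16}
      + \hat{\Phi}_1 \left( y_{17} - y_{16} - y_{13} + y_{12} \right)
      - \hat{\Theta}_1 \hat{a}_{17}
```

Forecasts for later periods follow by applying the same recursion, with earlier forecasts standing in for observations that are not yet available and future shocks set to zero.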
3. Discussion of Results
Figure 10: Actual and Forecast Plot for P3
Figure 11: Actual and Forecast Plot for P4
Figure 12: Actual and Forecast Plot for P5
The time series of the differenced data showed that the data are stable and yielded accurate values during the analysis. The first difference of the original data was taken to remove the trend, and the seasonal difference at lag 12 to remove the seasonal fluctuations. This was done on both the monthly and quarterly data series of each product. Judging from the ACF and PACF plots of the differenced series of all the products, the hypothesis of stationarity of both the quarterly and monthly observations is not rejected, which made it possible to obtain stationarity in the series.
Two major goodness-of-fit criteria were used for model selection: the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These were determined from the likelihood function, incorporating the sum of squared errors (SSE), the number of parameters and the number of observations. A single model for each product (monthly and quarterly) was selected with the least values of AIC and BIC. These selected SARIMA models were then reduced to five (5) models, which are expected to make forecasts for Products 1-5 (monthly and quarterly), as shown in Table 2.
These models were properly checked and validated by verifying the ACF of the residuals as well as the normal probability plots of the residuals. The results showed that the residuals of the models are normally distributed. This supports the conclusion that the models are adequate.
Using the general Box-Jenkins model of order (p, P, q, Q), the point forecasts of the models for each product (quarterly and monthly) were made. Model-A (quarterly) was used to forecast Products 1, 3 and 5 after the model had been fitted, and the demand forecast for Period 21 was made. Model-B (quarterly) was used to forecast Product 1 after the model had been fitted, and the demand forecast for Period 21 was made. Model-C (quarterly) was used to forecast Products 1 and 2 after the model had been fitted, and the demand forecast for Period 21 was made. Model-D (monthly) was used to forecast Product 1 after the model had been fitted, and the demand forecast for Period 21 was made. Model-E (monthly) was used to forecast Products 1, 3 and 5 after the model had been fitted, and the demand forecast for Period 21 was made. Note that all these values are in centiliters (CL).
Fig. 8-12 show clearly the variation between the actual data and the forecast. The actual data cover a period of 60 monthly observations (i.e. 2010-2014), whereas the forecast was made for periods 61-120 (2015-2019).
4. Conclusion
The study was mainly intended to develop forecasting models using the Box-Jenkins Seasonal Autoregressive Integrated Moving Average (SARIMA) approach to make future predictions. The statistical tests show that the time series of the monthly and quarterly Coca-Cola production of the Nigerian Bottling Company (NBC), Enugu is not stable and has seasonal changes. To ensure stability in the series, the general trend was first removed using differences of the first lag, and the seasonality was then removed using seasonal differences of lag 12. The most appropriate models were chosen using the balancing standards (the smallest value of each of AIC, BIC, MSE and RMSE, as well as the Box-Ljung test). The models developed were able to replicate the stochastic process that generated the time series, thereby eliminating other factors that might affect the demand for these products, as demonstrated in Fig. 8-12. Coca-Cola products have become leading products in the soft drink industry, and adopting the results of these findings would help the company to meet the future demand for its products and achieve customer satisfaction. Effort should be made to have a centralized forecasting method to achieve even production and distribution of the products across the nation.
REFERENCES
[1] D. Meghan, E. Meghan, K. Emily, P. Leslie, Z.
Kelly. Strategic Management in a Global
Context of Soft Drink Industry. 2006.
[2] R. Thomas. Health Services Planning. 2nd Ed.
Springer US, 2003.
[3] C. P. Tae, S.K. Ui, K. Lae-Hyum, W. J. Byung, & K.
Y. Yeong. Heat Consumption Forecasting Using
Partial Least Square, Artificial Neural Network
and Support Vector. 2010.
[4] G. E. P. Box and G. C. Reinsel. Time Series
Analysis: Forecasting and Control. 4th Ed.
Hoboken, NJ: Wiley. 2008.
[5] J. S. Armstrong. Principles of Forecasting: A
Handbook for Researchers and Practitioners.
Boston: Kluwer. 2001.
[6] H. C. Godwin and A. C. Igboanugo. Prediction of
Electricity Generation and Consumption in
Nigeria Using ARIMA Models, International
Journal of Engineering Service, Volume 2, 2010,
Number 2.
[7] C. C. Ihueze and E. C. Okafor. Multivariate Time
Series Analysis for Optimum Production
Forecast: A Case Study 7up Soft Drink
Company in Nigeria. 2010.
[8] C. C. Ihueze and D. C. Ezeloira. Application of
Selected Forecasting Techniques in Modeling an
Analysis of Historical Data Related to Plastic
Pipe Product. Nnamdi Azikiwe University,
Awka, Nigeria, MA, 2014.
[9] S. Makridakis, S. C. Wheelwright and R. J. Hyndman.
Forecasting: Methods and Applications, 3rd Ed,
John Wiley & Sons, Inc, U.S.A. 1998.
[10] G. E. P. Box and G. M. Jenkins. Time Series
Analysis: Forecasting and Control, San
Francisco: Holden-Day. 1976.
[11] C. Chatfield. The Analysis of Time Series, Chapman
and Hall, London, U.K. 2004.
[12] H. Akaike. A New Look at the Statistical Model
Identification. IEEE Transactions on Automatic
Control, 19(6), 1974, 716-723.