QMT 3001 Business Forecasting Term Project
TERM PROJECT
Although archaeological evidence suggests that the first attempts at brewing date back to
around 6000 BC, modern brewing accelerated and became popular with the Industrial
Revolution. By the 7th century AD, European monasteries were already producing and selling
beer, and until the Industrial Revolution ale was still made and sold on a domestic scale.
Beer output shifted from artisanal to commercial during the Industrial Revolution, and
domestic production ceased to be important by the end of the nineteenth century. Brewing was
transformed by the invention of hydrometers and thermometers, which gave brewers more
control over the process and a better understanding of its effects. The brewing industry is
now a global market, with a few large international corporations and thousands of smaller
producers ranging from brewpubs to regional breweries. More than 150 billion litres are sold
per year, generating $623,6 billion in global sales in 2020.
This project investigates the retail sales of Heineken, a Dutch brewing company, in the U.S.
market from January 1995 to January 2021, using monthly data in millions of dollars.
Heineken N.V. was founded in 1864 in Amsterdam by Gerard Adriaan Heineken. As of 2020,
Heineken owns over 165 breweries in more than 70 countries. It employs about 73,000 people
and produces 250 worldwide, regional, local, and specialty beers and ciders. It is the
largest brewer in Europe and one of the world's largest by volume, with annual beer
production of 200 million hectolitres in 2020 and global revenue of about 23,77 billion
euros in 2015.
The dependent variable is the monthly sales figure. Overall, the series shows an increasing
trend over time; therefore, time is an independent variable in this dataset. Another
independent variable might be the average temperature of the month. At first sight, the data
show that sales increase in the warmer months of the year. This also produces a seasonality
effect with a cycle length of one year. Sales peak every December, which can be associated
with New Year and Christmas celebrations. A further independent variable could be the number
of adult consumers in the country, which is the customer base of the product.
2. PRELIMINARY DATA ANALYSIS
The time series and cumulative column graphs of the data are given below. Between January
1995 and January 2021 there has been an increasing trend with a seasonality effect. As
mentioned in the first part, the time series graph shows that sales increase in the warmer
months of the year, which produces seasonality with a cycle length of one year. As a result,
the data have a positive trend pattern with seasonality. The histogram and the normal
probability plot suggest that the data are roughly bell-shaped. However, the p-value of the
normality test in Figure-4 is less than 0,005, which rejects the null hypothesis of
normality at conventional significance levels, so the data should be treated as only
approximately normal.
Figure 2: Cumulative Column Graph of Retail Sales of Heineken Between 1995 and 2021
Figure 3: Histogram of Sales
We first look at the mean, mode, and median of the data. The graphical summary shows that
the mean is 3303,6; that is, over those 26 years, average monthly retail purchases of
Heineken beer in the USA were 3303,6 million dollars. The mode can be read from the
histogram: the classes 2375 to 2625 and 3375 to 3625 have the highest frequency (30
observations each). The median is the middle value of the data set sorted from smallest to
largest; both the graphical summary and the boxplot show that it is 3125. The graphical
summary also shows that the minimum value is 1501, the maximum value is 7739, the 1st
quartile is 2357,5, and the 3rd quartile is 4066,0. Another value to examine is the standard
deviation, which measures how widely the observations are dispersed around the mean. The
graphical summary reports it as 1153,5, which means that the data are considerably scattered
around the mean. The scatter plot is also given below.
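The summary statistics described above can be reproduced with a few lines of Python. This is a minimal sketch only: the `sales` list holds hypothetical values chosen for illustration, not the project's data, and the quartile convention of `statistics.quantiles` may differ from the one used in the graphical summary.

```python
import statistics

# Hypothetical monthly sales (millions of dollars), for illustration only.
# These are NOT the project's 313 observations.
sales = [1501, 2100, 2400, 2600, 3125, 3400, 3600, 4100, 7739]

mean = statistics.mean(sales)
median = statistics.median(sales)
stdev = statistics.stdev(sales)                 # sample standard deviation
q1, q2, q3 = statistics.quantiles(sales, n=4)   # quartiles, default "exclusive" method

print(f"mean={mean:.1f}  median={median}  stdev={stdev:.1f}")
print(f"min={min(sales)}  Q1={q1}  Q3={q3}  max={max(sales)}")
```

A mean that lies above the median, as in the project's summary (3303,6 against 3125), is a sign of right skew.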
Figure 6: Scatterplot of Sales
The past data clearly show a noticeable trend and seasonality. Thus, the moving average is
expected to perform poorly on this data, as it does not handle trend and seasonality well.
Exponential smoothing, on the other hand, is expected to provide better results. More
specifically, Winter's exponential smoothing model is expected to be the best option, as it
is widely used for data that exhibit both trend and seasonality.
The naïve method is not the best option for this data because it is simply a "no change
forecast" and might not perform well on data that exhibit both trend and seasonality. In
this section, a naïve method that combines seasonal and trend estimates has been applied to
the data. Results are given in the table below.
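A naïve forecast that combines seasonal and trend estimates for monthly data can be sketched as follows. The function name and the zero-based indexing are assumptions made for this illustration; the rule itself is the one listed in the model comparison table, Ŷ(t+1) = Y(t-11) + (Y(t) - Y(t-12))/12.

```python
def naive_trend_seasonal(y, season=12):
    """One-step-ahead naive forecasts with seasonal and trend terms.

    The forecast for period t+1 is the value one season before t+1, plus the
    change over the last season divided by the season length.
    Keys of the returned dict are zero-based period indices.
    """
    forecasts = {}
    for t in range(season, len(y)):
        forecasts[t + 1] = y[t - (season - 1)] + (y[t] - y[t - season]) / season
    return forecasts


# Hypothetical seasonal pattern repeated for three years (illustration only).
pattern = [10, 20, 30, 40, 50, 60, 70, 60, 50, 40, 30, 90]
series = pattern * 3
fc = naive_trend_seasonal(series)
print(fc[36])  # out-of-sample forecast for the period after the last observation
```

For a series with no trend and a stable seasonal pattern, the trend term is zero and the forecast simply repeats the same month of the previous year.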
As noted above, the moving average is not expected to handle the trend and seasonality in
this data well. When the moving average is applied to the data with each of the parameters
below, k=12 provides the best result, as can be seen from the MAD, MSE and MAPE values. Two
reasons for this are the cycle length and the trend pattern of the data. Plotting MAD, MSE
or MAPE against k gives a roughly parabolic shape whose minimum lies at the cycle length of
the series, which means that for k values larger than the cycle length the error starts to
increase. Thus, it is not surprising that k=12 yields the smallest error. MAD, MSE and MAPE
values are given in the table on the next page.
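The moving average forecast and the three error measures used throughout this report can be sketched as below. The function names and the demo series are hypothetical and serve only to show how different k values could be compared; they do not reproduce the project's results.

```python
def moving_average_forecasts(y, k):
    """Forecast for period t is the mean of the k observations before t."""
    return {t: sum(y[t - k:t]) / k for t in range(k, len(y))}


def error_measures(y, forecasts):
    """Return (MAD, MSE, MAPE in percent) over the forecasted periods."""
    pairs = [(y[t], y[t] - f) for t, f in forecasts.items()]
    n = len(pairs)
    mad = sum(abs(e) for _, e in pairs) / n
    mse = sum(e * e for _, e in pairs) / n
    mape = 100 * sum(abs(e) / abs(actual) for actual, e in pairs) / n
    return mad, mse, mape


# Hypothetical series with a trend and a yearly spike (illustration only).
demo = [100 + 2 * t + (30 if t % 12 == 11 else 0) for t in range(48)]
for k in (3, 6, 12):
    mad, mse, mape = error_measures(demo, moving_average_forecasts(demo, k))
    print(f"k={k:2d}  MAD={mad:.2f}  MSE={mse:.2f}  MAPE={mape:.2f}%")
```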
Theoretically, single exponential smoothing is a basic method that models neither
seasonality nor trend; therefore, it is not the best option for this project's data. As a
rule of thumb, alpha values from 0.05 to 0.30 work well in most simple smoothing models in
practice. Accordingly, in this data the parameter that yields the smallest error is α = 0,1.
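The recursion behind single exponential smoothing is short enough to sketch directly. Initialising the first forecast with the first observation is an assumption of this sketch; statistical packages may use a different starting value, so fitted values can differ slightly.

```python
def single_exponential_smoothing(y, alpha):
    """Return one-step-ahead forecasts.

    forecasts[t] is the forecast for period t made before observing y[t];
    the last entry is the out-of-sample forecast for the next period.
    """
    forecasts = [y[0]]  # assumption: start from the first observation
    for t in range(1, len(y) + 1):
        forecasts.append(alpha * y[t - 1] + (1 - alpha) * forecasts[t - 1])
    return forecasts


# Hypothetical values, for illustration only.
print(single_exponential_smoothing([100, 110, 120], alpha=0.1))
```

With a small alpha such as 0,1 the forecast reacts slowly to new observations, which is why the method lags behind a trending series.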
Holt's method is the trend-adjusted version of exponential smoothing. Although it is
theoretically better than single exponential smoothing because it responds to trend, it does
not model seasonality. Therefore, it is not quite appropriate for the data in this project.
The best results were obtained with α = 0,9 and β = 0,1, and results improved as α increased
and β decreased; that is, with this data the best results are obtained with a larger level
smoothing constant and a smaller trend smoothing constant. Results are given in the table on
the next page.
α \ β   0.1    0.2    0.3    0.6    0.9
0.1     8,17   8,21   8,36   8,79   9,30
0.2     7,61   7,83   8,06   8,89   10,20
0.3     6,92   7,12   7,33   8,41   9,60
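Holt's method keeps a level and a trend estimate and updates both at every period. The sketch below assumes simple initial values (level equal to the first observation, trend equal to the first difference); these choices and the function name are illustrative and are not taken from the software used in the project.

```python
def holt_forecasts(y, alpha, beta):
    """One-step-ahead forecasts from Holt's trend-adjusted smoothing.

    Keys are zero-based period indices; the key len(y) holds the
    out-of-sample forecast for the next period.
    """
    level, trend = y[0], y[1] - y[0]  # assumption: simple initial values
    forecasts = {}
    for t in range(1, len(y)):
        forecasts[t] = level + trend  # forecast made at t-1
        new_level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    forecasts[len(y)] = level + trend
    return forecasts


# Hypothetical, perfectly linear series: the method tracks it without error.
print(holt_forecasts([5, 8, 11, 14, 17], alpha=0.9, beta=0.1))
```

Because there is no seasonal term, any regular monthly pattern ends up in the forecast errors.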
In order to compare the performances of the models, a table has been prepared and given on
this page. According to the values in the table, Holt's Method with α = 0,9 and β = 0,1
yields the smallest error. But in order to determine the best-fitting model, we should also
check for overfitting and analyse the residuals. The R-squared value can be calculated as a
first check. A higher R-squared value is generally desired, although it does not necessarily
indicate a good fit, so residual analysis should also be conducted to detect bias. In
residual analysis, which can be done by examining the normal probability plot and histogram
of the residuals, the regression assumptions below should be checked to make sure that no
bias occurs and the model fits the data:
Linearity: Residuals plotted against the fitted values should show no systematic pattern.
Homoscedasticity: Variance of the residuals should be constant for all levels of "x".
Normality: Distribution of residuals should follow a normal distribution.
Independence: Residuals should be independent of each other.
The highest R-squared values are 97,4% and 97,0%, for single exponential smoothing and
moving average (k=12) respectively. Despite these values, these models are not the most
suitable ones because their normal probability and versus-fits plots clearly indicate
deviation from the normal distribution and bias. Similarly, in the naïve method's plots
deviation and bias are clearly visible, and in addition its R-squared value is relatively
low at 81,8%, making it far from the most suitable model. The remaining models are Holt's,
Winter's (multiplicative) and Winter's (additive). These models have similar R-squared
values, all around 80%, and similar residual plots. But theoretically the chosen model
should also respond to seasonality, as the data being forecast are seasonal; therefore,
Holt's method is also eliminated. Between Winter's (multiplicative) and Winter's (additive),
Winter's (multiplicative) is chosen as the most suitable option because it has smaller error
measures, satisfies all assumptions and is appropriate for the data. All plots are given in
the following pages.
Model                                         Model Parameters                        MAD     MSE         MAPE   Response to Trend  Response to Seasonality
Naive                                         Ŷ(t+1) = Y(t-11) + (Y(t) - Y(t-12))/12  384,83  469.185,50  10,07  Not quite          Not quite
Moving Average                                k=12                                    294,44  214.964,98  8,17   Not quite          Not quite
Single Exponential Smoothing                  α=0,1                                   321,76  239.248,00  9,12   No                 No
Holt's Method (Double Exponential Smoothing)  α = 0,9 | β = 0,1                       30,48   2.390,33    0,98   Yes                No
Winter's Method (Multiplicative)              γ=0.2 | α=0.6 | β=0.2                   33,83   2.394,92    1,00   Yes                Yes
Winter's Method (Additive)                    γ=0.2 | α=0.6 | β=0.3                   44,29   3.684,43    1,34   Yes                Yes
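For readers who want to see how the multiplicative Winter's model combines level, trend and seasonal updates, a minimal sketch follows. Several things here are assumptions: the mapping of α to level, β to trend and γ to seasonality, the initialisation from the first two seasons, and the function name. Statistical packages name and initialise these differently, so this sketch will not reproduce the error measures in the table.

```python
def winters_multiplicative(y, alpha, beta, gamma, season=12):
    """One-step-ahead forecasts from multiplicative Winter's smoothing.

    Assumed roles: alpha smooths the level, beta the trend, gamma the
    seasonal factors. Returns (forecasts, final_state).
    """
    # Assumption: initialise from the means of the first two seasons.
    first = sum(y[:season]) / season
    second = sum(y[season:2 * season]) / season
    level = first
    trend = (second - first) / season
    seasonal = [y[i] / first for i in range(season)]

    forecasts = {}
    for t in range(season, len(y)):
        s = seasonal[t % season]
        forecasts[t] = (level + trend) * s  # forecast made at t-1
        new_level = alpha * y[t] / s + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        seasonal[t % season] = gamma * y[t] / new_level + (1 - gamma) * s
        level = new_level
    return forecasts, (level, trend, seasonal)


# Hypothetical quarterly pattern with no trend (illustration only).
pattern = [80.0, 100.0, 120.0, 100.0]
fc, state = winters_multiplicative(pattern * 3, alpha=0.6, beta=0.2, gamma=0.2, season=4)
print(fc)
```

The seasonal factors are ratios around 1, which is what distinguishes the multiplicative form from the additive one, where seasonal effects are added to the level.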
Figure 17: Residual Plots of Holt's Method/Double Exponential Smoothing (α=0,9/β=0,1) Forecasts
Figure 18: Fitted Line Plot of Winter's Method (Multiplicative, γ=0,2/α=0,6/β=0,2) Forecasts
In this project, the additive decomposition model suits the data better. Although relatively
smaller error measures were observed for the multiplicative model, the additive model has
smaller seasonal residuals and its fitted values are closer to the actual values. The plots
of these models are given below.
Period Index
1 -529,16
2 -440,83
3 -201,2
4 -208,58
5 72,71
6 27,05
7 158,23
8 43,46
9 -111,12
10 -60,29
11 60,84
12 1188,88
The data in the project exhibit both seasonality and trend. In January, February, March and
April, valleys are observed in the graph, while sales peak in July and December. Summer and
Christmas are the most obvious reasons for these peaks, but another reason may be the end of
each academic term, when many college students buy alcoholic drinks for home parties
celebrating the end of the term. Temperature is another factor, as sales generally increase
in warmer months.
To illustrate, consider October and November. In October, sales are on average 60,29 million
dollars below the mean, while in November they are on average 60,84 million dollars above
it. The magnitude of a seasonal index indicates the size of the deviation from the mean, and
its sign indicates the direction. The larger a negative seasonal index is in absolute value,
the further the value of that period falls below the mean. Positive indices work the same
way in the opposite direction. Thus, seasonal indices should be taken into account in
forecasting in order to avoid errors and increase accuracy.
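The additive seasonal indices from the table can be used directly to seasonally adjust an observation. In the sketch below the index values are taken from the Period/Index table; the function name and the December sales figure of 4000 are hypothetical and used only for illustration.

```python
# Additive seasonal indices by month, as listed in the Period/Index table.
seasonal_index = {
    1: -529.16, 2: -440.83, 3: -201.2, 4: -208.58, 5: 72.71, 6: 27.05,
    7: 158.23, 8: 43.46, 9: -111.12, 10: -60.29, 11: 60.84, 12: 1188.88,
}


def deseasonalize(value, month):
    """Remove the additive seasonal effect of the given month."""
    return value - seasonal_index[month]


# Additive indices should sum to approximately zero over one cycle.
print(round(sum(seasonal_index.values()), 2))

# Hypothetical December observation of 4000 (millions of dollars).
print(round(deseasonalize(4000, 12), 2))
```

The December index dominates the others, so an unadjusted comparison of December with any other month would mostly reflect the seasonal effect rather than the underlying level.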
The boxplot of detrended data by season shows the distribution of the data. A large upper or
lower part of the box indicates that the data spread above or below the median value,
respectively. Higher deviations are observed in January, February, March, April, July and
December. These months are also the peaks or valleys in the time series graph; thus, it can
be inferred that important events in these periods might have affected the data
significantly. For example, a competitor might have run a big promotion to increase its
market share, which would have caused fluctuations in Heineken's sales in these months.
Similarly, Heineken might have run such promotions itself, which would also cause deviation
in that period. Either situation causes higher deviations from the mean by increasing or
decreasing sales. A similar pattern is observed in the residuals: when the data in a period
deviate from the mean of that period, the observed value is significantly larger or smaller
than the forecast, which produces larger residuals. Accordingly, the large residuals occur
in the same months as the large deviations in the detrended data by season. Seasonal
analysis and plots are given below.
The data cover 313 periods, so forecasts are calculated for periods 314, 315, 316 and 317.
For multiplicative decomposition:
Intercept = 1.509,4; Slope = 11,428
Forecast = (Intercept + Slope * Period Number) * (Seasonal Factor)
F314 = (1509,4 + 11,428 * 314) * (0,8434) = 4.299,48
F315 = (1509,4 + 11,428 * 315) * (0,94015) = 4.803,43
F316 = (1509,4 + 11,428 * 316) * (0,93326) = 4.778,90
F317 = (1509,4 + 11,428 * 317) * (1,02474) = 5.259,04
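The four decomposition forecasts can be checked with a short script. The intercept, slope and seasonal factors are the values quoted in the calculations; the variable and function names are chosen for this illustration only.

```python
INTERCEPT = 1509.4
SLOPE = 11.428

# Seasonal factors for periods 314 to 317, as used in the calculations.
seasonal_factor = {314: 0.8434, 315: 0.94015, 316: 0.93326, 317: 1.02474}


def decomposition_forecast(period):
    """Trend value at the given period multiplied by its seasonal factor."""
    return (INTERCEPT + SLOPE * period) * seasonal_factor[period]


for p in sorted(seasonal_factor):
    print(p, round(decomposition_forecast(p), 2))
```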
5. REGRESSION ANALYSIS
Variables and their categorization are given in the table below. According to these
variables, the conceptual model should be:
Ŷ = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x5
The model is in linear form, and the correlation matrices are given below. According to the
correlation matrix, the directions of the effects of the independent variables on the
dependent variable are correctly predicted, though with weaker correlation coefficients. x1,
x2, x3 and x4 affect sales revenue positively, while x5 affects it negatively.
Thus, H0 is rejected and it is safe to say that at least one independent variable affects Y.
The R-squared value in Figure-29 is 93,24%, which means that 93,24% of the variation in
retail sales is explained by the variation in the independent variables in the model.
Multicollinearity can be checked with Variance Inflation Factors (VIF). A VIF of 1 indicates
that there is no correlation between the given independent variable and any others. VIFs
between 1 and 5 suggest a moderate correlation that is not severe enough to warrant
corrective measures. VIFs greater than 5 represent critical levels of multicollinearity,
where the coefficients are poorly estimated and the p-values are questionable. Figure-29
shows that all VIF values are around 1, so multicollinearity is not a concern.
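To make the VIF thresholds concrete, the sketch below covers the simplest case of two predictors, where the VIF of each equals 1 / (1 - r²) and r is their Pearson correlation. This is a deliberate simplification: with five predictors, as in the model above, each VIF comes from regressing one predictor on all the others. The function names and sample values are hypothetical.

```python
def pearson_r(x, z):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / (sxx * szz) ** 0.5


def vif_two_predictors(x, z):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_r(x, z)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictors, for illustration only.
uncorrelated = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
correlated = vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(uncorrelated, round(correlated, 3))
```

An r of 0,8 between two predictors already gives a VIF below 3, which shows why values around 1 indicate practically uncorrelated predictors.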
1.1.2. Reduced Model
Figure 31: Coefficients, Model Summary and ANOVA Table of Reduced Model
The residuals appear to be normal and homoscedastic: the observation order plot shows values
with mostly constant variance, and the normal probability plot and bell-like histogram
indicate normality. In addition, the calculated Durbin-Watson statistic is 1,64, which is
read here as indicating random, independent residuals; because the value lies below 2, it
should still be compared with the tabulated lower and upper bounds before positive
autocorrelation is ruled out. On this basis, the model is considered appropriate for the
data. Error measures are MAD = 337,8981, MSE = 246.809,70, MAPE = 9,88%.
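The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares. A minimal sketch follows; the residual values are hypothetical and chosen only to show how the statistic moves away from 2.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order
    autocorrelation, values toward 0 positive and toward 4 negative
    autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals, for illustration only.
alternating = durbin_watson([1, -1, 1, -1])   # sign flips every period
clustered = durbin_watson([1, 1, -1, -1])     # runs of the same sign
print(alternating, clustered)
```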
The data on hand cover sales until January 2021, so the rest of the year is forecast with
the regression model.
Regression Model: Y = 1498,2 + 11,499*X1.
F314 = 1498,2 + 11,499 * 314 = 5.108,89
F315 = 1498,2 + 11,499 * 315 = 5.120,39
F316 = 1498,2 + 11,499 * 316 = 5.131,88
F317 = 1498,2 + 11,499 * 317 = 5.143,38
F318 = 1498,2 + 11,499 * 318 = 5.154,88
F319 = 1498,2 + 11,499 * 319 = 5.166,38
F320 = 1498,2 + 11,499 * 320 = 5.177,88
F321 = 1498,2 + 11,499 * 321 = 5.189,38
F322 = 1498,2 + 11,499 * 322 = 5.200,88
F323 = 1498,2 + 11,499 * 323 = 5.212,38
F324 = 1498,2 + 11,499 * 324 = 5.223,88
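The eleven regression forecasts follow from a single linear expression, so they are easy to verify in code. The coefficients are those of the reduced model quoted above; the names are chosen for this illustration.

```python
B0 = 1498.2   # intercept of the reduced regression model
B1 = 11.499   # coefficient of X1 (period number)


def regression_forecast(period):
    """Point forecast from the reduced model Y = B0 + B1 * X1."""
    return B0 + B1 * period


# Periods 314 to 324 correspond to February through December 2021.
forecasts = {p: round(regression_forecast(p), 2) for p in range(314, 325)}
for p, f in forecasts.items():
    print(p, f)
```

Because the reduced model has no seasonal term, the forecasts rise by the same 11,499 each month and do not capture the December peak.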
Error measures and conceptual validity of the models were calculated and compared in the
previous section. As a result of this study, Winter's (multiplicative) is selected as the
most suitable model for our data, with an average error of one percent and satisfying all
requirements of the properties of the data used. Calculations for the next four periods are
given below.
3. MANAGERIAL IMPLICATIONS
In our project we analyzed Heineken sales revenue between 1995 and 2021 at monthly
frequency. First, several different methods were applied to the data, and their performances
were compared using accuracy measures and conceptual validity, considering the statistical
properties of the data. After determining the most suitable method, forecasts for the next
four periods were calculated with it. This method produces forecasts with about one percent
error, which is quite useful when considering the effect of forecasts on the production
plan. Forecasts support the decision-making process and help reduce risks, uncertainty and
costs, especially opportunity costs, production costs and storage costs, while increasing
profits. Thus, when planning production and supply chain actions for the next four periods,
the forecasts should be used as a rough guide, remembering that there is always a risk and
that forecasts are very rarely perfectly accurate. These forecasts imply a general increase,
but with an expected fall after Christmas.
1- https://fred.stlouisfed.org
3- Nerlove, M., & Wallis, K. F. (1966). Use of the Durbin-Watson statistic in inappropriate
situations. Econometrica: Journal of the Econometric Society, 235-238.
4- Savin, N. E., & White, K. J. (1977). The Durbin-Watson test for serial correlation with
extreme sample sizes or many regressors. Econometrica: Journal of the Econometric Society,
1989-1996.