Coca Cola Start

The document discusses models for seasonal time series data such as quarterly sales. It describes both a linear model and a multiplicative model for seasonality. The linear model represents seasonality through dummy variables for each quarter, while the multiplicative model models seasonality through percentage changes from a reference quarter. Both models aim to separate the trend effect from seasonal effects in the time series data.

Uploaded by

xtremewhiz
Copyright
© All Rights Reserved

Coca-Cola quarterly sales appear to follow a seasonal cycle of four quarters. We model the sales below.

In the linear model of seasonality, the coefficients b1, b2 and b3 of the quarterly dummy variables indicate how much each quarter differs from the reference quarter, quarter 4. The coefficient of t, b, is the average increase from one quarter to the next once the season is accounted for. This is the trend effect. Quarter 1 averages b1 units higher than the reference quarter, quarter 4; quarter 2 averages b2 units higher than quarter 4; and so on. These dummy coefficients indicate the effect of seasonality.
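
Written out, the linear model described above takes the following form. The notation is partly assumed, since the text names only the coefficients b, b1, b2 and b3: $Y_t$ is sales in quarter $t$, $a$ is an intercept, $Q_1, Q_2, Q_3$ are 0/1 dummies for quarters 1 to 3, and $\varepsilon_t$ is an error term.

```latex
Y_t = a + b\,t + b_1 Q_1 + b_2 Q_2 + b_3 Q_3 + \varepsilon_t
```

In quarter 4 all three dummies are zero, so the fitted value is $a + b\,t$; in quarter 1 it is $a + b\,t + b_1$, which is why $b_1$ reads as the quarter 1 difference from quarter 4.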

In what follows we implement a multiplicative model of seasonality:

Which after taking logs is:
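
Both forms can be written out as follows. This is a reconstruction from the coefficient descriptions in the text, with assumed notation: $Y_t$ for sales in quarter $t$, an intercept $a$, 0/1 dummies $Q_1, Q_2, Q_3$ for quarters 1 to 3, and an error term $\varepsilon_t$.

```latex
\begin{align}
Y_t     &= e^{a}\, e^{b\,t}\, e^{b_1 Q_1}\, e^{b_2 Q_2}\, e^{b_3 Q_3}\, e^{\varepsilon_t} \\
\ln Y_t &= a + b\,t + b_1 Q_1 + b_2 Q_2 + b_3 Q_3 + \varepsilon_t
\end{align}
```

The second line is linear in the coefficients, so it can be estimated by ordinary least squares on $\ln Y_t$.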

In this multiplicative model the coefficients are read as approximate percentage changes in the original sales variable Y. A time coefficient of b means that deseasonalized sales increase by about 100·b percent per quarter (exactly e^b − 1, which is close to b when b is small). This is the trend effect. The coefficients b1, b2 and b3 mean that sales in quarters 1, 2 and 3 are respectively about 100·b1, 100·b2 and 100·b3 percent above quarter 4. Quarter 4 is the reference quarter.
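
As a minimal sketch of how the log model can be estimated and read, the block below fits it to synthetic data with NumPy least squares. Everything in it is an assumption made for illustration: the "true" parameters, the noise level, and the variable names (`Q`, `coef`, `approx_pct`, `exact_pct`) do not come from the Coca-Cola data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters, used only to generate synthetic data.
a, b = 7.0, 0.02                 # intercept and per-quarter trend on the log scale
b1, b2, b3 = -0.10, 0.15, 0.05   # log-scale effects of quarters 1-3 vs quarter 4

n = 40                                   # ten years of quarterly observations
t = np.arange(1, n + 1)
quarter = (t - 1) % 4 + 1                # 1, 2, 3, 4, 1, 2, ...
Q = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3)])

log_sales = a + b * t + Q @ np.array([b1, b2, b3]) + rng.normal(0, 0.01, n)
sales = np.exp(log_sales)

# Regress ln(sales) on [1, t, Q1, Q2, Q3]; quarter 4 is the omitted reference.
X = np.column_stack([np.ones(n), t, Q])
coef, *_ = np.linalg.lstsq(X, np.log(sales), rcond=None)
a_hat, b_hat, b1_hat, b2_hat, b3_hat = coef

# Convert log-scale coefficients to percentage effects.
approx_pct = 100 * coef[1:]                # the approximate percentage reading
exact_pct = 100 * (np.exp(coef[1:]) - 1)   # exact multiplicative change

for name, ap, ex in zip(["t", "Q1", "Q2", "Q3"], approx_pct, exact_pct):
    print(f"{name}: approx {ap:6.2f}%  exact {ex:6.2f}%")
```

The two percentage columns agree closely for small coefficients and drift apart as the coefficients grow, which is worth keeping in mind for strongly seasonal quarters.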

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# The following function is borrowed from Dmitriy Sergeyev, Data Scientist @ Zeptolab, lecturer in the Center of Mathematical Finance in MSU
# https://mlcourse.ai/articles/topic9-part1-time-series/

import statsmodels.api as sm
import statsmodels.tsa.api as smt

def tsplot(y, lags=None, figsize=(12, 7), style='bmh'):
    """
    Plot time series, its ACF and PACF, calculate Dickey–Fuller test

    y - timeseries pandas series
    lags - how many lags to include in ACF, PACF calculation
    """
    if not isinstance(y, pd.Series):
        y = pd.Series(y)

    with plt.style.context(style):
        fig = plt.figure(figsize=figsize)
        layout = (2, 2)
        # Top row: the series itself; bottom row: ACF (left) and PACF (right)
        ts_ax = plt.subplot2grid(layout, (0, 0), colspan=2)
        acf_ax = plt.subplot2grid(layout, (1, 0))
        pacf_ax = plt.subplot2grid(layout, (1, 1))

        y.plot(ax=ts_ax)
        # adfuller returns (statistic, p-value, ...); index 1 is the p-value
        p_value = sm.tsa.stattools.adfuller(y)[1]
        ts_ax.set_title('Time Series Analysis Plots\n Augmented Dickey-Fuller: p={0:.5f}'.format(p_value))
        smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)
        smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)
        plt.tight_layout()

In [3]:
def detrendPrice(dft):
    # make sure the input dft has no NaN, otherwise OLS will break
    # fit linear model
    series = dft.Sales
    length = len(series)
    x = np.arange(length)
    y = np.array(series.values)
    x_const = sm.add_constant(x)  # need to add intercept constant
    model = sm.OLS(y, x_const)
    results = model.fit()
    predictions = results.predict(x_const)
    # residuals from the fitted line are the detrended series
    resid = y - predictions
    df = pd.DataFrame(resid, columns=['Sales'])
    df.index = dft.index
    return df
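
To make the detrending step concrete, here is a minimal sketch that does the same calculation on a made-up series: fit a straight line by least squares and keep the residuals. It uses `np.polyfit` in place of `sm.OLS` (both give the ordinary least squares line), and the series, the index and the names `dft_demo` and `detrended` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly series: linear trend plus a repeating seasonal pattern.
n = 24
trend = 100 + 5.0 * np.arange(n)
seasonal = np.tile([-8.0, 12.0, 4.0, -8.0], n // 4)
dft_demo = pd.DataFrame(
    {"Sales": trend + seasonal},
    index=pd.period_range("2015Q1", periods=n, freq="Q"),
)

# Fit Sales = intercept + slope * x by least squares and keep the residuals,
# which is the same calculation detrendPrice performs with sm.OLS.
x = np.arange(n)
slope, intercept = np.polyfit(x, dft_demo["Sales"].to_numpy(), deg=1)
resid = dft_demo["Sales"].to_numpy() - (intercept + slope * x)
detrended = pd.DataFrame({"Sales": resid}, index=dft_demo.index)

print(detrended.head(8).round(2))
```

The residuals average to zero and are uncorrelated with time by construction, so what remains is the seasonal pattern plus noise.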

In [4]:
def MAD_mean_ratio(y_true, y_pred):
    # mean absolute deviation as a percentage of the mean actual value
    return np.mean(np.abs((y_true - y_pred) / np.mean(y_true))) * 100
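
A small worked example may help to read this metric. The numbers below are made up for illustration; the function is repeated so the block runs on its own.

```python
import numpy as np

def MAD_mean_ratio(y_true, y_pred):
    # Mean absolute deviation divided by the mean of the actual values, in percent.
    return np.mean(np.abs((y_true - y_pred) / np.mean(y_true))) * 100

y_true = np.array([100.0, 200.0, 300.0])   # hypothetical actual sales
y_pred = np.array([110.0, 190.0, 330.0])   # hypothetical forecasts

# Absolute errors are 10, 10 and 30, so the MAD is 50 / 3; the mean of y_true is 200.
ratio = MAD_mean_ratio(y_true, y_pred)
print(round(ratio, 2))  # 8.33

# The function is not symmetric: swapping the arguments changes the denominator.
swapped = MAD_mean_ratio(y_pred, y_true)
print(round(swapped, 2))  # 7.94
```

Because the denominator is the mean of the first argument, the actual values should be passed first.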

In [5]:
# The following 2 functions are borrowed from Dmitriy Sergeyev, Data Scientist @ Zeptolab, lecturer in the Center of Mathematical Finance in MSU
# https://mlcourse.ai/articles/topic9-part1-time-series/

def plotModelResults(model, X_train, X_test, y_test, plot_intervals=False, test_data=True, y_train=None):
    """
    Plots modelled vs actual values and, optionally, prediction intervals.

    y_train is needed only when plot_intervals=True.
    """
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import TimeSeriesSplit
    tscv = TimeSeriesSplit(n_splits=5)

    prediction = model.predict(X_test)

    plt.figure(figsize=(15, 7))
    plt.plot(prediction, "g", label="prediction", linewidth=2.0)
    plt.plot(y_test.values, label="actual", linewidth=2.0)

    if plot_intervals:
        # y_train was previously read from the notebook's global scope
        if y_train is None:
            raise ValueError("y_train is required when plot_intervals=True")
        cv = cross_val_score(model, X_train, y_train,
                             cv=tscv,
                             scoring="neg_mean_absolute_error")
        mae = cv.mean() * (-1)
        deviation = cv.std()

        scale = 1.96
        lower = prediction - (mae + scale * deviation)
        upper = prediction + (mae + scale * deviation)

        plt.plot(lower, "r--", label="upper bound / lower bound", alpha=0.5)
        plt.plot(upper, "r--", alpha=0.5)

    # MAD_mean_ratio expects (y_true, y_pred), so the actual values go first
    error = MAD_mean_ratio(y_test, prediction)
    if test_data:
        plt.title("Test data MAD_mean_ratio error {0:.2f}%".format(error))
    else:
        plt.title("Train data MAD_mean_ratio error {0:.2f}%".format(error))
    plt.legend(loc="best")
    plt.tight_layout()
    plt.grid(True);

def plotCoefficients(model, X_train):
    """
    Plots sorted coefficient values of the model
    """
    coefs = pd.DataFrame(model.coef_, X_train.columns)
    coefs.columns = ["coef"]
    # sort by absolute size so the most influential features come first
    coefs["abs"] = coefs.coef.apply(np.abs)
    coefs = coefs.sort_values(by="abs", ascending=False).drop(["abs"], axis=1)

    plt.figure(figsize=(15, 7))
    coefs.coef.plot(kind='bar')
    plt.grid(True, axis='y')
    plt.hlines(y=0, xmin=0, xmax=len(coefs), linestyles='dashed');
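
The band that plotModelResults draws when plot_intervals=True can be sketched without any plotting. The block below reproduces that calculation on synthetic data: cross-validated MAE from time-ordered folds, widened by 1.96 times the spread of the fold scores. The data and the names `X_demo`, `y_demo`, `X_future` and `half_width` are hypothetical, and the result is best read as a heuristic error band, not a formal prediction interval.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(1)

# Hypothetical training data: a time index as the only feature.
n = 60
X_demo = np.arange(n, dtype=float).reshape(-1, 1)
y_demo = 50 + 2.0 * X_demo.ravel() + rng.normal(0, 3.0, n)

model = LinearRegression()
tscv = TimeSeriesSplit(n_splits=5)   # each fold trains on the past, tests on the future

# cross_val_score returns negated MAE, one value per fold.
cv = cross_val_score(model, X_demo, y_demo, cv=tscv,
                     scoring="neg_mean_absolute_error")
mae = -cv.mean()
deviation = cv.std()

model.fit(X_demo, y_demo)
X_future = np.arange(n, n + 8, dtype=float).reshape(-1, 1)
prediction = model.predict(X_future)

scale = 1.96
half_width = mae + scale * deviation   # the same width at every forecast point
lower, upper = prediction - half_width, prediction + half_width
print(f"MAE {mae:.2f}, fold std {deviation:.2f}, band half-width {half_width:.2f}")
```

TimeSeriesSplit is used instead of shuffled folds so that no fold is scored on observations that precede its training data.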

In [ ]:
