Problem-
For this particular assignment, data on the sales of different types of wine in the 20th century are to be analyzed. Both datasets are from the same company but for different wines. As an analyst at ABC Estate Wines, you are tasked to analyse and forecast Wine Sales in the 20th century.
Data sets for the Problem: Sparkling.csv and Sparkling.csv
Please perform the following tasks on each of these two data sets separately.
1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and
also perform decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Build various exponential smoothing models on the training data and
evaluate the model using RMSE on the test data.
Other models such as regression, naive forecast models, simple average models, etc. should also be built on the training data, and their performance checked on the test data using RMSE.
5. Check for the stationarity of the data on which the model is being built using appropriate statistical tests and also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the
parameters are selected using the lowest Akaike Information Criteria (AIC)
on the training data and evaluate this model on the test data using RMSE.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and
PACF on the training data and evaluate this model on the test data using
RMSE.
8. Build a table with all the models built along with their corresponding
parameters and the respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on
the complete data and predict 12 months into the future with appropriate
confidence intervals/bands.
10. Comment on the model thus built, report your findings, and suggest the measures that the company should take for future sales.
OBJECTIVE-
Data on the sales of different types of wine in the 20th century are to be analyzed. Both datasets are from the same company but for different wines. As analysts at ABC Estate Wines, we have to analyze and forecast Wine Sales in the 20th century.
SO THEY HAVE PROVIDED US WITH 2 DATASETS-
1. SPARKLING DATASET
2. SPARKLING DATASET
So basically they want us to go in depth into the datasets provided to us, analyze them, and provide them with information for optimizing their marketing strategy so as to increase the company's growth and sales.
1. WE WILL TAKE SPARKLING DATASET-
The dataset Sparkling.csv was loaded into a dataframe (df_1).
Shape: there are 187 rows and 2 columns in the df_1 dataset.
Head and Tail-
YearMonth Sparkling
0 1980-01 1686
1 1980-02 1591
2 1980-03 2304
3 1980-04 1712
4 1980-05 1471
YearMonth Sparkling
182 1995-03 1897
183 1995-04 1862
184 1995-05 1670
185 1995-06 1688
186 1995-07 2031
We need to convert the YearMonth column into date format.
We have converted the data into date format and named the new column Time_Stamp.
We can also drop the column YearMonth, as the year, month, and date are now captured in the single column named Time_Stamp.
Now that we have seen how to load the data from a '.csv' file as a Time Series
object, let us go ahead and analyse the Time Series plot that we got.
We can see that there is a slight downward trend with a seasonal pattern
associated as well.
INFO-
RangeIndex: 187 entries, 0 to 186
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year Month 187 non-null object
1 Sparkling 187 non-null int64
dtypes: int64(1), object(1)
2. Perform appropriate Exploratory Data Analysis to understand the data and
also perform decomposition.
Descriptive Summary-
The average sales of Sparkling Wine per month are around 2402.
The maximum sale of the Wine is approx 7242.
The minimum sale of the Wine is approx 1070.
Box plot-
Now, let us plot a box and whisker (1.5* IQR) plot to understand the spread of the
data and check for outliers in each year, if any-
As we got to know from the Time Series plot, the box plots here also indicate a measure of trend being present. Also, we see that the sales of Sparkling Wine have some outliers in certain years.
Monthly Box Plot-
Since this is a monthly data, let us plot a box and whisker (1.5* IQR) plot to
understand the spread of the data and check for outliers for every month across
all the years, if any.
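A sketch of how these yearly and monthly box plots could be produced (seaborn is assumed; the original plotting code is not shown in this report):

import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Yearly box plot: spread and outliers of sales within each year
sns.boxplot(x=df_1.index.year, y=df_1['Sparkling'], ax=axes[0])
axes[0].set_xlabel('Year')

# Monthly box plot: spread and outliers of sales for each calendar month across all years
sns.boxplot(x=df_1.index.month, y=df_1['Sparkling'], ax=axes[1])
axes[1].set_xlabel('Month')

plt.tight_layout()
plt.show()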
The highest sales numbers are recorded in the month of December across the various years.
Monthly Sales across Years-
The highlighted part indicates the maximum sales of Wine for each year and month.
Quarterly plot-
We can see there is an outlier present in the data.
Missing Values-
Sparkling 0
dtype: int64
There are no missing values in the dataset.
Decompose the Time Series-
Additive-
We see from the residual plot of the decomposition that the residuals are located around 0. The trend keeps changing over the years.
As per the 'additive' decomposition, there is a pronounced trend in the earlier years of the data. There is seasonality as well, and no large outliers are visible in the decomposition residuals.
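A sketch of the additive decomposition using statsmodels (a monthly seasonal period of 12 is assumed):

from statsmodels.tsa.seasonal import seasonal_decompose

# Additive decomposition: observed = trend + seasonal + residual
decomposition = seasonal_decompose(df_1['Sparkling'], model='additive', period=12)
decomposition.plot()

# The residual component referred to above
print(decomposition.resid.head())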
3. Split the data into train and test-
Training Data is till the end of 1990. Test Data is from the beginning of 1991 to the
last time stamp provided.
Train and test value counts-
TRAIN - 0.9999999999999998
TEST - 1.0
Train and test shapes (rows, columns) –
(132, 1)
(55, 1)
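A sketch of the date-based split described above (test starting in 1991):

# Training data up to the end of 1990, test data from January 1991 onwards
train = df_1[df_1.index < '1991-01-01']
test = df_1[df_1.index >= '1991-01-01']

print(train.shape)  # (132, 1)
print(test.shape)   # (55, 1)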
Sales for the last 5 years-
It is difficult to predict future observations if such an instance has not happened in the past. From our train-test split, we are predicting behaviour similar to that of the past years.
4. Building different models and comparing the accuracy metrics-
Model 1: Linear Regression-
For this particular linear regression, we are going to regress the 'Sales' variable
against the order of the occurrence.
Training Time instances: 1 to 132
Test Time instances: 133 to 187
We have successfully generated the numerical time-instance order for both the training and test sets. Now we will add these values to the training and test sets.
First few rows of Training Data -
Sparkling time
Time_stamp
1980-01-31 1686 1
1980-02-29 1591 2
1980-03-31 2304 3
1980-04-30 1712 4
1980-05-31 1471 5
Last few rows of Training Data
Sparkling time
Time_stamp
1990-08-31 1605 128
1990-09-30 2424 129
1990-10-31 3116 130
1990-11-30 4286 131
1990-12-31 6047 132
First few rows of Test Data
Sparkling time
Time_stamp
1991-01-31 1902 133
1991-02-28 2049 134
1991-03-31 1874 135
1991-04-30 1279 136
1991-05-31 1432 137
Last few rows of Test Data
Sparkling time
Time_stamp
1995-03-31 1897 183
1995-04-30 1862 184
1995-05-31 1670 185
1995-06-30 1688 186
1995-07-31 2031 187
Now that our training and test data have been modified, let us go ahead and use LinearRegression to build the model on the training data and test the model on the test data.
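A sketch of the regression-on-time model, assuming scikit-learn and the time instances generated above:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Numerical time instances 1..132 for train and 133..187 for test
train_time = np.arange(1, len(train) + 1).reshape(-1, 1)
test_time = np.arange(len(train) + 1, len(train) + len(test) + 1).reshape(-1, 1)

lr = LinearRegression()
lr.fit(train_time, train['Sparkling'])

# Evaluate on the test data with RMSE
test_pred = lr.predict(test_time)
rmse = np.sqrt(mean_squared_error(test['Sparkling'], test_pred))
print(f'For RegressionOnTime forecast on the Test Data, RMSE is {rmse:.2f}')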
Sales data for last 5 years-
Defining the accuracy metrics.
Model Evaluation using RMSE on test data-
For RegressionOnTime forecast on the Test Data, RMSE is 1389.14
                     Test RMSE
RegressionOnTime     1389.14
Model 2: Naive Approach:
For this particular naive model, we say that the prediction for tomorrow is the same as today's value; and since the prediction for tomorrow is the same as today's, the prediction for the day after tomorrow is also today's value.
Time_stamp
1991-01-31 6047
1991-02-28 6047
1991-03-31 6047
1991-04-30 6047
1991-05-31 6047
Name: naive, dtype: int64
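A sketch of the naive forecast, where every test-period prediction equals the last observed training value (6047, for December 1990):

import numpy as np
from sklearn.metrics import mean_squared_error

# Every prediction on the test horizon is the last value seen in training
naive_forecast = test.copy()
naive_forecast['naive'] = train['Sparkling'].iloc[-1]

rmse_naive = np.sqrt(mean_squared_error(test['Sparkling'], naive_forecast['naive']))
print(f'For Naive Model forecast on the Test Data, RMSE is {rmse_naive:.2f}')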
Model Evaluation
Model Evaluation using RMSE on test data-
For Naive Model forecast on the Test Data, RMSE is 3864.28
                     Test RMSE
RegressionOnTime     1389.14
NaiveModel           3864.28
Method 3: Simple Average
For this particular simple average method, we will forecast by using the average
of the training values.
Top 5 rows of test data-
Time_Stamp     Sparkling   mean_forecast
31-01-1991     1902        2403.78
28-02-1991     2049        2403.78
31-03-1991     1874        2403.78
30-04-1991     1279        2403.78
31-05-1991     1432        2403.78
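A sketch of the simple average forecast (the training mean, roughly 2403.78, used for every test period):

import numpy as np
from sklearn.metrics import mean_squared_error

# Forecast every test point with the mean of the training values
simple_avg = test.copy()
simple_avg['mean_forecast'] = train['Sparkling'].mean()

rmse_avg = np.sqrt(mean_squared_error(test['Sparkling'], simple_avg['mean_forecast']))
print(f'For Simple Average forecast on the Test Data, RMSE is {rmse_avg:.2f}')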
Model Evaluation
Model Evaluation using RMSE on test data-
For Simple Average forecast on the Test Data, RMSE is 1275.08
                     Test RMSE
RegressionOnTime     1389.14
NaiveModel           3864.28
SimpleAverageModel   1275.08
Method 4: Moving Average (MA)
For the moving average model, we are going to calculate rolling means (or moving
averages) for different intervals. The best interval can be determined by the
maximum accuracy (or the minimum error) over here.
For Moving Average, we are going to average over the entire data.
Top rows-
Train data-
Let us split the data into train and test and plot this Time Series. The window of the moving average needs to be carefully selected, as too big a window will result in not having any test set, since the whole series might get averaged over.
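A sketch of the trailing moving averages over the whole series for the window sizes used below:

# Trailing (rolling) means over the full data for different window sizes
ma_df = df_1.copy()
for window in [2, 4, 6, 9]:
    ma_df[f'Trailing_{window}'] = ma_df['Sparkling'].rolling(window).mean()

# Split the rolling forecasts the same way as the original series
trailing_train = ma_df[ma_df.index < '1991-01-01']
trailing_test = ma_df[ma_df.index >= '1991-01-01']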
Test data-
Model Evaluation using RMSE-
Train data-
For 2 point Moving Average Model forecast on the Training Data, RMSE is 813.401
For 4 point Moving Average Model forecast on the Training Data, RMSE is
1156.590
For 6 point Moving Average Model forecast on the Training Data, RMSE is
1283.927
For 9 point Moving Average Model forecast on the Training Data, RMSE is
1346.278
Test data-
Before we go on to build the various Exponential Smoothing models, let us plot
all the models and compare the Time Series plots.
Method 5: Simple Exponential Smoothing-
Parameters-
{'smoothing_level': 0.04960659880745982,
 'smoothing_trend': nan,
 'smoothing_seasonal': nan,
 'damping_trend': nan,
 'initial_level': 1818.5047538435374,
 'initial_trend': nan,
 'initial_seasons': array([], dtype=float64),
 'use_boxcox': False,
 'lamda': None,
 'remove_bias': False}
Top 5 rows of test data-
Plotting on train and test dataset-
Model Evaluation -Simple Exponential Smoothing
For Alpha =0.04 Simple Exponential Smoothing Model forecast on the Test Data,
RMSE is 1316.035
Setting different alpha values.
Remember, the higher the alpha value, the more weightage is given to the more recent observations. That means what happened recently is assumed to happen again.
We will run a loop with different alpha values to understand which particular
value works best for alpha on the test set
First we will define an empty data frame to store our values from the loop-
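A sketch of the alpha grid search for Simple Exponential Smoothing, assuming statsmodels' SimpleExpSmoothing and the empty results frame mentioned above:

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

results = pd.DataFrame(columns=['Alpha', 'Test RMSE'])

for alpha in np.arange(0.1, 1.0, 0.1):
    ses = SimpleExpSmoothing(train['Sparkling'], initialization_method='estimated')
    fit = ses.fit(smoothing_level=alpha, optimized=False)
    pred = fit.forecast(len(test))
    rmse = np.sqrt(mean_squared_error(test['Sparkling'], pred))
    results.loc[len(results)] = [round(alpha, 1), rmse]

print(results.sort_values('Test RMSE'))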
Model Evaluation-
Plotting on both the Training and Test data-
Summary-
Method 6: Double Exponential Smoothing (Holt's Model)
Two parameters 𝛼 and 𝛽 are estimated in this model. Level and Trend are
accounted for in this model.
First we will define an empty data frame to store our values from the loop-
Let us sort the data frame in the ascending ordering of the 'Test RMSE' and the
'Test MAPE' values.
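A sketch of the corresponding alpha/beta grid search for Holt's model (statsmodels' Holt class assumed; only Test RMSE is stored here):

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.holtwinters import Holt

des_results = pd.DataFrame(columns=['Alpha', 'Beta', 'Test RMSE'])

for alpha in np.arange(0.1, 1.0, 0.1):
    for beta in np.arange(0.1, 1.0, 0.1):
        fit = Holt(train['Sparkling'], initialization_method='estimated').fit(
            smoothing_level=alpha, smoothing_trend=beta, optimized=False)
        pred = fit.forecast(len(test))
        rmse = np.sqrt(mean_squared_error(test['Sparkling'], pred))
        des_results.loc[len(des_results)] = [round(alpha, 1), round(beta, 1), rmse]

print(des_results.sort_values('Test RMSE').head())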
Plotting on both the Training and Test data-
Method 7: Triple Exponential Smoothing (Holt-Winters' Model)
Three parameters 𝛼, 𝛽 and 𝛾 are estimated in this model. Level, Trend and Seasonality are accounted for in this model.
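A sketch of the auto-fit Holt-Winters model whose parameters are printed below (additive trend and multiplicative seasonality are assumed, consistent with the seasonal factors shown):

import numpy as np
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing

tes = ExponentialSmoothing(train['Sparkling'],
                           trend='additive',
                           seasonal='multiplicative',
                           seasonal_periods=12,
                           initialization_method='estimated').fit()

print(tes.params)  # smoothing_level, smoothing_trend, smoothing_seasonal, ...

tes_pred = tes.forecast(len(test))
rmse_tes = np.sqrt(mean_squared_error(test['Sparkling'], tes_pred))
print(f'Triple Exponential Smoothing test RMSE: {rmse_tes:.3f}')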
Parameters-
{'smoothing_level': 0.11107308290744182,
 'smoothing_trend': 0.06167745801641925,
 'smoothing_seasonal': 0.39488777704116057,
 'damping_trend': nan,
 'initial_level': 1639.5306320456996,
 'initial_trend': -13.803739314239138,
 'initial_seasons': array([1.04411064, 1.00095858, 1.40459398, 1.20906039, 0.96413947,
        0.96754964, 1.3048211 , 1.69841076, 1.37034155, 1.81659752,
        2.84708154, 3.62462473]),
 'use_boxcox': False,
 'lamda': None,
 'remove_bias': False}
Prediction on the test data-
Plotting on both the Training and Test using auto fit-
Model Evaluation-
Test Data-
For Alpha=0.111, Beta=0.061, Gamma=0.395, Triple Exponential Smoothing
Model forecast on the Test Data, RMSE is 469.432
First we will define an empty data frame to store our values from the loop-
Train RMSE-
Test RMSE-
Plotting on both the Training and Test data using brute force alpha, beta and
gamma determination-
Sorted by RMSE values on the Test Data:
Plotting on both the Training and Test data-
In this section we have built several models and gone through a model-building exercise. This exercise has given us an idea of which model gives us the least error on our test set for this data. But in Time Series forecasting, we need to be careful about the fact that after we have done this exercise, we need to build the model on the whole data. Remember, the training data that we used to build the models stops much before the data ends. In order to forecast using any of the models built, we need to build the models again (this time on the complete data) with the same parameters.
The two models to be built on the whole data are the following:
1. Triple Exponential Smoothing with the auto-fit alpha, beta, and gamma
2. Triple Exponential Smoothing with the brute-force (grid-searched) alpha, beta, and gamma
1. MODEL1-
RMSE: 421.30973568581123
MAPE: 14.463167851671658
Getting the predictions for the same number of time stamps that are present in the test data-
One assumption we have made here while calculating the confidence bands is that the standard deviation of the forecast distribution is approximately equal to the residual standard deviation. In the below code, we have calculated the upper and lower confidence bands at the 95% confidence level.
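A minimal sketch of that band calculation (the full-data model and its predictions are referred to here by the assumed names final_tes and future_pred; the report's exact code is not reproduced):

import numpy as np

# Assumed names: final_tes is the Triple Exponential Smoothing model refit on the full data,
# future_pred is its forecast for the prediction horizon
resid_std = np.std(final_tes.resid)

# 95% band, assuming forecast std is approximately the residual std (as stated above)
upper_band = future_pred + 1.96 * resid_std
lower_band = future_pred - 1.96 * resid_std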
Plot the forecast along with the confidence band-
Let us now build the second model using the same parameters on the full data
and check the confidence bands when we forecast into the future for the length
of the test set.
2. MODEL2-
RMSE: 353.89206663885477
MAPE: 11.681458721875629
Getting the predictions for the same number of time stamps that are present in the test data-
In the below code, we have calculated the upper and lower confidence bands. The percentile function in NumPy lets us calculate these; adding and subtracting them from the predictions gives us the necessary confidence bands for the predictions.
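A sketch of the percentile-based band described above (again with assumed variable names; np.percentile on the residuals gives the offsets that are added to and subtracted from the predictions):

import numpy as np

# Assumed names: final_tes2 is the second full-data model, future_pred2 its forecast
lower_q, upper_q = np.percentile(final_tes2.resid, [2.5, 97.5])

# Add/subtract the residual percentiles from the predictions to form the band
upper_band2 = future_pred2 + upper_q
lower_band2 = future_pred2 + lower_q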
Plot the forecast along with the confidence band-
5. Check for stationarity of the whole Time Series data.
Dickey-Fuller Test
Null Hypothesis H0: The series is not stationary.
Alternative Hypothesis H1: The series is stationary.
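A sketch of the Dickey-Fuller test helper used for these checks (statsmodels' adfuller):

from statsmodels.tsa.stattools import adfuller

def adf_test(series):
    # Run the Augmented Dickey-Fuller test and print the standard summary
    result = adfuller(series.dropna())
    labels = ['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used']
    for label, value in zip(labels, result[:4]):
        print(f'{label}: {value:.2f}')
    for key, value in result[4].items():
        print(f'Critical Value ({key}): {value:.2f}')

adf_test(df_1['Sparkling'])          # original series
adf_test(df_1['Sparkling'].diff())   # after a difference of order 1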
Results of Dickey-Fuller Test:
Test Statistic -1.36
P-value 0.60
#Lags Used 11.00
Number of Observations Used 175.00
Critical Value (1%) -3.47
Critical Value (5%) -2.88
Critical Value (10%) -2.58
Dtype: float64
We see that at the 5% significance level the Time Series is non-stationary.
Let us take a difference of order 1 and check whether the Time Series is stationary
or not.
Results of Dickey-Fuller Test:
Test Statistic -45.05
P-value 0.00
#Lags Used 10.00
Number of Observations Used 175.00
Critical Value (1%) -3.47
Critical Value (5%) -2.88
Critical Value (10%) -2.58
Dtype: float64
We see that at 𝛼 = 0.05 the Time Series is indeed stationary.
Plot the Autocorrelation and the Partial Autocorrelation function plots on the
whole data.
From the above plots, we can say that there seems to be seasonality in the data.
Check for stationarity of the Training Data Time Series.
Results of Dickey-Fuller Test:
Test Statistic -1.21
p-value 0.67
#Lags Used 12.00
Number of Observations Used 119.00
Critical Value (1%) -3.49
Critical Value (5%) -2.89
Critical Value (10%) -2.58
dtype: float64
We see that the series is not stationary at 𝛼 = 0.05.
Results of Dickey-Fuller Test:
Test Statistic -8.01
P-value 0.00
#Lags Used 11.00
Number of Observations Used 119.00
Critical Value (1%) -3.49
Critical Value (5%) -2.89
Critical Value (10%) -2.58
Dtype: float64
We see that after taking a difference of order 1 the series has become stationary at 𝛼 = 0.05.
6. Build an Automated version of an ARIMA/SARIMA model for which the best
parameters are selected in accordance with the lowest Akaike Information
Criteria (AIC).
The following loop helps us get combinations of the different parameters p and q in the range 0 to 2. We have kept the value of d at 1, as we need to take one difference of the series to make it stationary.
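A sketch of that grid search (statsmodels' ARIMA class assumed; p and q range over 0-2 with d fixed at 1):

import itertools
from statsmodels.tsa.arima.model import ARIMA

p = q = range(0, 3)
pdq = list(itertools.product(p, [1], q))   # d fixed at 1

aic_results = []
for order in pdq:
    fit = ARIMA(train['Sparkling'], order=order).fit()
    aic_results.append((order, fit.aic))
    print(f'ARIMA {order} - AIC: {fit.aic}')

# Lowest AIC first
best_order, best_aic = sorted(aic_results, key=lambda x: x[1])[0]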
Some parameter combinations for the Model...
Model: (0, 1, 0)
Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)
Sort the below AIC values in the ascending order to get the parameters for the
minimum AIC value-
ARIMA (0, 1, 0) - AIC: 2269.582796371201
ARIMA (0, 1, 1) - AIC: 2264.9064421638386
ARIMA (0, 1, 2) - AIC: 2232.783097684079
ARIMA (1, 1, 0) - AIC: 2268.5280607731743
ARIMA (1, 1, 1) - AIC: 2235.0139453492993
ARIMA (1, 1, 2) - AIC: 2233.59764711895
ARIMA (2, 1, 0) - AIC: 2262.035600097813
ARIMA (2, 1, 1) - AIC: 2232.3604898927674
ARIMA (2, 1, 2) - AIC: 2210.61951923921
After arranging in ascending order-
ARIMA MODEL RESULTS-
Predict on the Test Set using this model and evaluate the model
Build an Automated version of a SARIMA model for which the best parameters
are selected in accordance with the lowest Akaike Information Criteria (AIC).
Let us look at the ACF plot once more to understand the seasonal parameter for
the SARIMA model.
Setting the seasonality as 6 for the first iteration of the auto SARIMA model.
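A sketch of the seasonal grid search using SARIMAX (seasonal period 6 for this iteration; the later iteration with period 12 works the same way):

import itertools
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

p = q = range(0, 3)
pdq = list(itertools.product(p, [1], q))
seasonal_pdq = [(P, 0, Q, 6) for P, Q in itertools.product(range(0, 3), range(0, 3))]

best = {'order': None, 'seasonal_order': None, 'aic': np.inf}
for order in pdq:
    for seasonal_order in seasonal_pdq:
        fit = SARIMAX(train['Sparkling'], order=order, seasonal_order=seasonal_order,
                      enforce_stationarity=False, enforce_invertibility=False).fit(disp=False)
        if fit.aic < best['aic']:
            best = {'order': order, 'seasonal_order': seasonal_order, 'aic': fit.aic}

print(best)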
Examples of some parameter combinations for Model...
Model: (0, 1, 1)(0, 0, 1, 6)
Model: (0, 1, 2)(0, 0, 2, 6)
Model: (1, 1, 0)(1, 0, 0, 6)
Model: (1, 1, 1)(1, 0, 1, 6)
Model: (1, 1, 2)(1, 0, 2, 6)
Model: (2, 1, 0)(2, 0, 0, 6)
Model: (2, 1, 1)(2, 0, 1, 6)
Model: (2, 1, 2)(2, 0, 2, 6)
Sort values of AIC-
Predict on the Test Set using this model and evaluate the model.
We see a huge gain (reduction) in the RMSE value by including the seasonal parameters as well.
Setting the seasonality as 12 for the second iteration of the auto SARIMA model.
Examples of some parameter combinations for Model...
Model: (0, 1, 1)(0, 0, 1, 12)
Model: (0, 1, 2)(0, 0, 2, 12)
Model: (1, 1, 0)(1, 0, 0, 12)
Model: (1, 1, 1)(1, 0, 1, 12)
Model: (1, 1, 2)(1, 0, 2, 12)
Model: (2, 1, 0)(2, 0, 0, 12)
Model: (2, 1, 1)(2, 0, 1, 12)
Model: (2, 1, 2)(2, 0, 2, 12)
Summary-
Similar to the last iteration of the model, where the seasonality parameter was taken as 6, here too we see that the model diagnostics plot does not indicate any remaining information left to extract.
Predict on the Test Set using this model and evaluate the model.
We see that the RMSE value has not reduced further when the seasonality parameter was changed to 12.
7 AND 8 ANSWER-
Build a version of the ARIMA model for which the best parameters are selected
by looking at the ACF and the PACF plots.
Let us look at the ACF and the PACF plots once more.
Here, we have taken alpha=0.05.
The Auto-Regressive parameter in an ARIMA model is 'p', which comes from the significant lag after which the PACF plot cuts off to 0. The Moving-Average parameter in an ARIMA model is 'q', which comes from the significant lag after which the ACF plot cuts off to 0. By looking at the above plots, we can say that both the PACF and the ACF plots cut off at lag 0.
We get a comparatively simpler model by looking at the ACF and the PACF plots.
Predict on the Test Set using this model and evaluate the model.
We see that there is a difference in the RMSE values of the two models, but remember that the second model is a much simpler model.
Build a version of the SARIMA model for which the best parameters are selected
by looking at the ACF and the PACF plots. - Seasonality at 6.
We see that our ACF plot at the seasonal interval (6) does not taper off. So, we go ahead and take a seasonal differencing of the original series. Before that, let us look at the original series.
We see that there is a trend and seasonality. So, now we take a seasonal
differencing and check the series.
Now we see that there is almost no trend present in the data; only seasonality remains.
Let us go ahead and check the stationarity of the above series before fitting the SARIMA model.
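A sketch of that seasonal differencing and stationarity check (using the adf_test helper sketched earlier; a seasonal lag of 6 is assumed in this section):

# Seasonal difference at lag 6 and re-test for stationarity
seasonal_diff = train['Sparkling'].diff(6)
adf_test(seasonal_diff)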
Results of Dickey-Fuller Test:
Test Statistic -7.02
P-value 0.00
#Lags Used 13.00
Number of Observations Used 111.00
Critical Value (1%) -3.49
Critical Value (5%) -2.89
Critical Value (10%) -2.58
Dtype: float64
Checking the ACF and the PACF plots for the new modified Time Series.
Here, we have taken alpha=0.05.
We are going to take the seasonal period as 6. We will keep the p(1) and q(1)
parameters same as the ARIMA model.
The seasonal Auto-Regressive parameter in a SARIMA model is 'P', which comes from the significant lag after which the PACF plot cuts off to 0. The seasonal Moving-Average parameter is 'Q', which comes from the significant lag after which the ACF plot cuts off to 0. Remember to check the ACF and the PACF plots only at multiples of 6 (since 6 is the seasonal period).
This is a common problem while building models by looking at the ACF and the
PACF plots. But we are able to explain the model.
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-
step).
[2] Covariance matrix is singular or near-singular, with condition number
Predict on the Test Set using this model and evaluate the model.
SARIMA summary-
This is where our model building ends.
Now, we will take our best model and forecast 12 months into the future with
appropriate confidence intervals to see how the predictions look. We have to
build our model on the full data for this.
9. Building the most optimum model on the Full Data-
We can see that we have annual seasonality rather than half-yearly seasonality.
The residuals appear to be NORMALLY DISTRIBUTED (as seen in the model diagnostics plot).
Results-
Evaluate the model on the whole data and predict 12 months into the future (till the end of the next year).
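A sketch of this final step, refitting the best SARIMA order reported below, (0,1,2)(2,0,2,12), on the complete data and producing a 12-month forecast with a 95% confidence band:

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Refit the best model on the full series
final_model = SARIMAX(df_1['Sparkling'],
                      order=(0, 1, 2),
                      seasonal_order=(2, 0, 2, 12)).fit(disp=False)

# Forecast 12 months ahead with a 95% confidence interval
forecast = final_model.get_forecast(steps=12)
future_mean = forecast.predicted_mean
conf_band = forecast.conf_int(alpha=0.05)

print(future_mean)
print(conf_band)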
Plot the forecast along with the confidence band-
Final result of RMSE-
After arranging in ascending order
SARIMA (0,1,2)(2,0,2,12)
Insights and recommendations-
1. We have loaded the Sparkling.csv dataset.
2. Performed EDA to check whether there are any missing values and outliers
present in the dataset.
3. After the EDA, we split the data into train and test.
Train is the data the models are fitted on.
Test is the held-out (actual) data against which the predictions are compared.
Training Data is till the end of 1990. Test Data is from the beginning of 1991 to the last time stamp provided.
4. Built different models and evaluated their accuracy using RMSE on the test data:
Linear Regression model
Naïve Model
Simple average model
Simple exponential model
Trailing moving average model
Double exponential model
Triple exponential model
5. Built the automated ARIMA/SARIMA MODEL
6. Many different forecasting algorithms and analysis methods can be applied to extract the relevant information that is required. Regardless of whether we use autoregressive algorithms to determine the trend patterns for forecasting or the ARIMA model to deduce the correlation pattern of the data, it all depends on the application use case and its complexity. Since most time series forecasting analyses are fairly routine, choosing the easiest and simplest adequate model is the best way to look at it.
7. So we have:
read the problem and described it,
built the various models,
evaluated the models.
8. In the month of December, the sales of Sparkling Wine increase; there is more demand than in the other months.
9. Match the season with customer demand and trends.
10. In-store tastings and events can attract customers to the store and can increase sales.
11. We can also see that in the years 1981, 1983 and 1994, wine sales in the months of October and November remained constant and started fluctuating afterwards; the company needs to pay attention to this.