
Wine Final Projects

This document presents a marketing strategy case study for Vivino, focusing on identifying key physicochemical properties of French Bordeaux wines to classify wine quality using machine learning models. The Random Forest model outperformed the Naive Bayes model, achieving an accuracy of 82.9% compared to 80.83%, with both models identifying alcohol, sulphates, and volatile acidity as critical factors for high-quality wines. Recommendations for wine producers include focusing on alcohol content, increasing sulphates for better preservation, and reducing volatile acidity to enhance wine quality.

Uploaded by

Gamyuii Kitsana

Marketing Research and Engineering : MAX503 FALL 2024

Key Factors of Wine Quality:
A Marketing Strategy Case Study for Vivino

Lopa Detroja
Kitsana Sudsaneh

Table of Contents

1 Introduction and Executive Summary
2 Data Preparation (Cleaning and Preparing)
3 Exploratory Data Analysis
4 Naive Bayes Model
5 Random Forest Model
6 Comparative Analysis
7 Business Implications and Recommendation
8 Appendix


Introduction and Executive Summary

Goals
Identify key physicochemical properties from Vivino’s dataset of 1,599 French Bordeaux wines to classify wine quality.
Use machine learning models to predict and understand the factors defining a "good" wine.

Methods
Used Naive Bayes (a probabilistic model) and Random Forest (an ensemble decision tree method) for classification.

Key Findings
Random Forest outperformed Naive Bayes in accuracy and feature analysis.
Top 3 features:
Alcohol: Correlated with higher quality.
Sulphates: Enhanced flavor and preservation.
Volatile Acidity: Lower levels indicate better quality.

Data Preparation (Cleaning and Preparing)

Data Overview
1,599 observations of French Bordeaux wines.
11 physicochemical properties:
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
and one quality score.

Data Preparation
Dataset Loaded: 1,599 rows and 12 columns.
Target Variable Created:
"Good" wine: Quality > 6 (217 samples).
"Bad" wine: Quality ≤ 6 (1,382 samples).
Train-Test Split:
Training set (70%): 1,119 samples (956 "Bad," 163 "Good").
Testing set (30%): 480 samples (426 "Bad," 54 "Good").
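
The labeling rule and the 70/30 split can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the seed, and the placeholder quality scores are assumptions, and a plain random split like this one will not reproduce the exact per-class counts listed above.

```python
import random

def label_quality(score):
    """Return "Good" for quality scores above 6, otherwise "Bad"."""
    return "Good" if score > 6 else "Bad"

def train_test_split(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Placeholder scores (not the real data): 217 wines scored 7 and
# 1,382 wines scored 5, chosen only to match the class counts above.
scores = [7] * 217 + [5] * 1382
labels = [label_quality(s) for s in scores]
train, test = train_test_split(labels)

print(len(train), len(test))                      # 1119 480
print(labels.count("Good"), labels.count("Bad"))  # 217 1382
```

Taking the integer part of 0.7 × 1,599 gives the 1,119 training samples, which leaves 480 for testing.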

Exploratory Data Analysis

Positive Correlation:
A strong positive correlation indicates that as the amount of free sulfur dioxide increases, the total sulfur dioxide also tends to increase.
Example: Total Sulfur Dioxide and Free Sulfur Dioxide (r = 0.67)

Negative Correlation:
A strong negative correlation indicates that as the fixed acidity of the wine increases, the pH value decreases. This is expected because pH measures acidity inversely.
Example: pH and Fixed Acidity (r = -0.68)
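
For reference, the r values above are Pearson correlation coefficients. A minimal sketch of the calculation follows. The sample numbers are made up for illustration and are not taken from the dataset, so the printed value will not equal the reported r = -0.68.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative values only: higher fixed acidity paired with lower pH.
fixed_acidity = [6.0, 7.4, 8.1, 9.9, 11.2]
ph = [3.60, 3.51, 3.40, 3.25, 3.16]
print(round(pearson_r(fixed_acidity, ph), 2))
```

A value near -1 means the two properties move in opposite directions, and a value near +1 means they rise together.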


Naive Bayes Model

Training Model Details
Dataset: The dataset consists of 1,599 observations of French Bordeaux wines with 11 variables. We focus on predicting wine quality, where the target variable quality.label is categorized as "Good" (quality > 6) or "Bad" (quality ≤ 6).
Model Type: Naive Bayes (a probabilistic classifier).
Split Dataset: Training Dataset: 70% of the data (1,119 samples), Testing Dataset: 30% of the data (480 samples).
We applied a Naive Bayes classifier, a probabilistic model based on Bayes’ Theorem, to predict whether a wine is "Good" or "Bad."

Naive Bayes Model

Training Model Details
Built a Naive Bayes model for classification to predict wine quality.
Split the dataset: 70% training data (1,119 samples) and 30% testing data (480 samples).
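
To make the idea concrete, here is a minimal Naive Bayes sketch in Python. It assumes Gaussian likelihoods for each feature, which the slides do not specify, and it uses a six-row toy dataset with invented values. The function names are hypothetical and this is not the model the authors trained.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy rows of (alcohol, volatile acidity); values are illustrative only.
X = [(11.8, 0.35), (12.1, 0.30), (11.5, 0.40), (9.8, 0.60), (10.1, 0.55), (9.6, 0.65)]
y = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]
nb = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb, (11.9, 0.33)))
print(predict_gaussian_nb(nb, (9.9, 0.58)))
```

The "naive" part is the assumption that features are independent within each class, which is why the per-feature log likelihoods are simply added.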

Before Handling Imbalance (Original Model)
Accuracy: 80.83%
Class Error: reported as 63.21% for "Bad" wines and 36.79% for "Good" wines.
Model Bias: The model had a slight bias towards the majority class ("Bad").
Cluster Plot: Significant overlap between "Good" and "Bad" classes, reflecting poor separation for the "Good" wines.

Naive Bayes Model

Issue with Imbalanced Classes
The dataset is imbalanced with more "Bad" wines than "Good" wines, affecting prediction accuracy for the "Good" wines.

After Handling Imbalance (Balanced Sampling)
Balanced Sampling: Ensures the model focuses equally on both classes, improving the accuracy for the "Good" class.
Class Error: Slightly reduced error for both classes.
Adjusted Rand Index (ARI): 0.21, indicating a modest agreement between the predicted and actual classes.
Cluster Plot: Improved separation between the "Good" and "Bad" classes after balancing the dataset.

Naive Bayes Model

Performance Metrics
Accuracy: 80.83%. Naive Bayes provides a solid prediction performance with an accuracy of 80.83%, while Random Forest slightly outperforms it at 82.9%.
Adjusted Rand Index: 0.21. The Adjusted Rand Index (ARI) indicates a modest agreement between predicted and actual labels, with room for improvement, especially for predicting the "Good" class.
Confusion Matrix: Error rate 20.37% (Good). Among the "Bad" wines in the test data, Naive Bayes correctly classified 359 and misclassified 67 as "Good". Among the "Good" wines, 29 were classified as "Bad".

Top Features: Naive Bayes identifies Alcohol, Residual Sugar, and Sulphates as the most important features in predicting wine quality.
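
As a consistency check on the figures above, accuracy and the Adjusted Rand Index can both be recomputed from a 2x2 confusion matrix. The sketch below is an illustration in Python, not the authors' code. It assumes one particular reading of the counts: 359 and 67 for the "Bad" wines as stated, and, for the 54 "Good" test wines, 29 classified correctly and 25 classified as "Bad". That reading is an assumption. It is used here because it reproduces the reported 80.83% and 0.21, whereas treating 29 as the number of misclassified "Good" wines would give 80.0% accuracy.

```python
from math import comb

def accuracy(matrix):
    """matrix[i][j] = number of wines with actual class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def adjusted_rand_index(matrix):
    """Adjusted Rand Index computed from a contingency table of counts."""
    total = sum(sum(row) for row in matrix)
    pairs_cells = sum(comb(n, 2) for row in matrix for n in row)
    pairs_rows = sum(comb(sum(row), 2) for row in matrix)
    pairs_cols = sum(comb(sum(col), 2) for col in zip(*matrix))
    expected = pairs_rows * pairs_cols / comb(total, 2)
    maximum = (pairs_rows + pairs_cols) / 2
    return (pairs_cells - expected) / (maximum - expected)

# Rows are actual ("Bad", "Good"); columns are predicted ("Bad", "Good").
# Assumed reading: 29 "Good" wines correct, 25 predicted as "Bad".
nb_matrix = [[359, 67], [25, 29]]
print(round(accuracy(nb_matrix) * 100, 2))       # 80.83
print(round(adjusted_rand_index(nb_matrix), 2))  # 0.21
```

Under this reading the error rate for "Good" wines would be 25 out of 54, so the per-class figures are worth confirming against the original model output.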

Naive Bayes Model

Top 3 Properties (Summary Function)

Alcohol:
Predicted: 11.54 for "Good" wines vs. 10.20 for "Bad" wines.
Actual: 11.63 for "Good" wines vs. 10.32 for "Bad" wines.
Observation: Higher alcohol content is strongly associated with "Good" wines.

Sulphates:
Predicted: 0.73 for "Good" wines vs. 0.64 for "Bad" wines.
Actual: 0.74 for "Good" wines vs. 0.65 for "Bad" wines.
Observation: "Good" wines consistently have higher sulphate levels, which enhance flavor and preservation.

Volatile Acidity:
Predicted: 0.36 for "Good" wines vs. 0.57 for "Bad" wines.
Actual: 0.43 for "Good" wines vs. 0.54 for "Bad" wines.
Observation: Lower volatile acidity improves the taste, making wines more likely to be classified as "Good."

Random Forest Model

Training Model Details
Built a Random Forest model with 3,000 decision trees for classification.
Split dataset: 70% training data (1,119 samples) and 30% testing data (480 samples).
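
The ensemble idea can be illustrated with a deliberately simplified stand-in written in plain Python. Each "tree" below is a one-split stump trained on a bootstrap sample using one randomly chosen feature, and the forest predicts by majority vote. This is a sketch under those assumptions, not the authors' model: a real Random Forest grows full decision trees, and the source model used 3,000 of them rather than the 25 used here. All names and data values are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that minimizes training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        if not right:
            continue  # a split must leave samples on both sides
        left_vote = Counter(left).most_common(1)[0][0]
        right_vote = Counter(right).most_common(1)[0][0]
        errors = sum(lab != left_vote for lab in left) + sum(lab != right_vote for lab in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_vote, right_vote)
    return best

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each with a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    n = len(rows)
    while len(forest) < n_trees:
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))       # random feature choice
        stump = fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        if stump is not None:
            forest.append(stump[1:])
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Toy rows of (alcohol, volatile acidity); values are illustrative only.
X = [(11.8, 0.35), (12.1, 0.30), (11.5, 0.40), (9.8, 0.60), (10.1, 0.55), (9.6, 0.65)]
y = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]
forest = fit_forest(X, y)
print(predict_forest(forest, (12.5, 0.25)))
print(predict_forest(forest, (9.0, 0.70)))
```

Averaging many weak, decorrelated trees is what gives the method its stability. More trees reduce the variance of the vote at the cost of training time.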

Before Handling Imbalance (Original Model)
Class Error: "Bad" wines had low error (2.65%), while "Good" wines had high error (46.01%).
Model Bias: The model performed significantly better for the majority class ("Bad") than for the minority class ("Good").
Cluster Plot: Significant overlap between "Good" and "Bad" clusters, highlighting poor separation for "Good" wines.

Random Forest Model

Issue with Imbalanced Classes
The dataset is imbalanced, with more "Bad" wines (1,382) than "Good" wines (217). The model prioritizes "Bad" wines, leading to poor performance on "Good" wines.
Balanced sampling ensures the model gives equal weight to "Good" wines, improving their classification accuracy.

After Handling Imbalance (Balanced Sampling)
Class Error:
"Bad": Higher error (13.18%, up from 2.65%).
"Good": Reduced error (20.25%, down from 46.01%), improving minority class predictions.
Cluster Plot: Shows clearer separation and better classification for "Good" wines.
Balanced Focus: Model now performs better for "Good" wines, addressing the earlier bias toward "Bad" wines.
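
One common way to balance classes is to downsample the majority class to the size of the minority class before training. The slides do not say which balancing scheme was used, so the Python sketch below is an assumption made for illustration. The function name and the placeholder rows are hypothetical.

```python
import random
from collections import defaultdict

def balanced_sample(rows, labels, seed=0):
    """Downsample every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(members) for members in by_class.values())
    rng = random.Random(seed)
    sampled = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return sampled

# Training-set class counts from the slides: 956 "Bad" and 163 "Good".
labels = ["Bad"] * 956 + ["Good"] * 163
rows = list(range(len(labels)))  # placeholder feature rows
balanced = balanced_sample(rows, labels)
counts = {c: sum(1 for _, lab in balanced if lab == c) for c in ("Bad", "Good")}
print(counts)  # {'Bad': 163, 'Good': 163}
```

Downsampling discards majority-class rows, which is one reason the "Bad" error rate can rise after balancing while the "Good" error rate falls.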

Random Forest Model

Performance Metrics
Accuracy: 82.9%. Out of the total predictions, 82.9% matched the actual wine quality labels.
Adjusted Rand Index: 0.32. Moderate clustering agreement reflects the model’s limitation for minority class “Good” predictions.
Confusion Matrix:
Error rate 16.67% (Bad): Among the "Bad" wines in the test data, 16.67% were incorrectly classified as "Good" wines.
Error rate 20.37% (Good): Among the "Good" wines in the test data, 20.37% were incorrectly classified as "Bad" wines.

The Random Forest model has proven effective at distinguishing between "Good" and "Bad" wines with an accuracy of 82.9%. This indicates that the model can reliably predict the quality of most wines in the test set.
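
The per-class error rates and the overall accuracy are linked through the test-set class sizes (426 "Bad" and 54 "Good"). The short Python sketch below converts the reported rates back into whole-number counts and recomputes accuracy. The counts it produces are derived here for illustration and do not appear in the slides.

```python
def counts_from_error_rates(n_bad, n_good, bad_error, good_error):
    """Convert per-class error rates into whole-number confusion counts."""
    bad_wrong = round(n_bad * bad_error)
    good_wrong = round(n_good * good_error)
    return {
        "bad_correct": n_bad - bad_wrong,
        "bad_as_good": bad_wrong,
        "good_as_bad": good_wrong,
        "good_correct": n_good - good_wrong,
    }

# Test-set sizes and error rates as reported above.
rf = counts_from_error_rates(426, 54, 0.1667, 0.2037)
rf_accuracy = (rf["bad_correct"] + rf["good_correct"]) / (426 + 54)
print(rf)
print(round(rf_accuracy * 100, 1))  # 82.9
```

The recomputed accuracy agrees with the reported 82.9%, so the Random Forest figures are mutually consistent.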

Random Forest Model

Top 3 Properties (Summary Function)

Alcohol:
Higher for “Good” wines than “Bad” in both predicted and actual quality.
“Good” wines have significantly higher alcohol levels, making it the most important factor.

Sulphates:
Higher for “Good” wines than “Bad” in both predicted and actual quality.
Sulphates enhance flavor and preservation, strongly influencing wine quality.

Volatile Acidity:
Lower for “Good” wines than “Bad” in both predicted and actual quality.
Lower volatile acidity improves taste, making this a critical factor for high-quality wines.

Random Forest Model

Top 3 Properties (Variable Importance)
The heatmap clearly shows that "Good" wines have higher alcohol and sulphates and lower volatile acidity compared to "Bad" wines.
The feature importance plots (Mean Decrease Accuracy and Mean Decrease Gini) confirm the rankings of alcohol, sulphates, and volatile acidity as the most influential factors.
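
Mean Decrease Accuracy is a permutation-based measure: shuffle one feature, re-score the model, and record how much accuracy drops. The Python sketch below shows that idea on a toy rule-based classifier. It is a simplified illustration with hypothetical names and invented values. It scores on the full toy sample rather than on out-of-bag data, so it is not the exact procedure behind the plots described above.

```python
import random

def permutation_importance(predict, rows, labels, feature, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling one feature column."""
    def score(data):
        return sum(predict(r) == lab for r, lab in zip(data, labels)) / len(labels)

    rng = random.Random(seed)
    baseline = score(rows)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:feature] + (v,) + r[feature + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - score(shuffled))
    return sum(drops) / n_repeats

# Hypothetical rule-based classifier on (alcohol, residual sugar) rows.
def toy_predict(row):
    return "Good" if row[0] > 11.0 else "Bad"

rows = [(12.0, 2.1), (11.6, 1.9), (12.4, 2.5), (9.8, 2.0), (10.2, 2.6), (9.5, 1.8)]
labels = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]
print(permutation_importance(toy_predict, rows, labels, feature=0))
print(permutation_importance(toy_predict, rows, labels, feature=1))  # 0.0
```

A feature the model never uses scores zero, while shuffling a feature the model relies on lowers accuracy. That is the logic behind ranking alcohol, sulphates, and volatile acidity by importance.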

Comparative Analysis

Accuracy: Naive Bayes 80.83%, Random Forest 82.9%. Naive Bayes correctly predicted 80.83% of wine labels, slightly lower than Random Forest.
Adjusted Rand Index: Naive Bayes 0.21, Random Forest 0.32. Naive Bayes shows modest agreement with actual labels, with room for improvement for "Good" class predictions.
Confusion Matrix: Naive Bayes error rate 20.37% (Good), Random Forest error rate 20.37% (Good). Both models show similar misclassification rates for "Good" wines, but Naive Bayes is slightly less accurate.
Top 3 Features: Naive Bayes: Alcohol, Sulphates, Volatile Acidity. Random Forest: Alcohol, Sulphates, Volatile Acidity. Both models agree on Alcohol, Sulphates, and Volatile Acidity as key features.

Key Takeaway
Random Forest outperforms Naive Bayes in terms of accuracy and minority class performance.
Random Forest also identifies important physicochemical properties, offering deeper insights for marketing and product decisions.

Business Implications and Recommendation

Model Effectiveness:
Naive Bayes Model: With an accuracy of 80.83%, Naive Bayes performs well in classifying wine quality based on physicochemical features, offering a straightforward and efficient model for wine quality classification.
Random Forest Model: While slightly more accurate (82.9%), it is more complex. Naive Bayes still delivers comparable accuracy with minimal complexity and provides value in simpler scenarios.

Impact of Class Imbalance:
The Naive Bayes model held up reasonably well on the imbalanced dataset, keeping its accuracy despite the skewed distribution between “Bad” and “Good” wines.
The Random Forest model’s handling of class imbalance improves prediction accuracy for "Good" wines, while Naive Bayes remains a simpler option for classifying the minority class.

Feature Importance:
Top Features Identified: Alcohol, Sulphates, and Volatile Acidity were identified as the most important features in predicting wine quality.
These features highlight key areas winemakers can focus on to improve product quality, aligning production with quality characteristics demanded in the market.

Recommendations

For Wine Producers
Focus on Alcohol Content: Since alcohol significantly influences wine quality, experimenting with different fermentation processes to adjust alcohol levels could help producers achieve the optimal balance for quality wine.
Increase Sulphates for Better Preservation: Ensuring adequate sulphate content can enhance flavor, extend the shelf life, and align with consumer demand for higher-quality wines.
Reduce Volatile Acidity: Maintaining low levels of volatile acidity will not only improve taste but also enhance the appeal of wines in the high-quality segment.

For Marketing and Product Decisions
Targeting "Good" Wines: The insights from the models can guide marketing efforts by emphasizing the importance of alcohol, sulphates, and volatile acidity as key selling points for premium wine products.
Use Model Insights for R&D: The Naive Bayes model can inform the development of wine by identifying key physicochemical traits that correlate with high-quality wines. This insight can be used to enhance product development and continuous improvement.
Data-Driven Marketing Campaigns: Wine brands can utilize these insights in marketing campaigns by highlighting scientifically backed quality factors, appealing to consumers who value quality consistency.

Appendix

For the full code, please access the complete document by scanning the provided QR code with your mobile device's camera or a QR code scanning application.
