Comprehensive Resource
ABSTRACT
In the era of digital commerce, customer reviews have emerged as a critical factor influencing
consumer behavior and business strategies. These reviews contain valuable insights into
customer satisfaction, product quality, and service experiences. However, analyzing large
volumes of textual feedback manually or using basic keyword-based methods is inefficient
and often inaccurate. Traditional systems rely on rule-based keyword matching or manual
interpretation, which are time-consuming, lack contextual understanding, and are not scalable
to meet the demands of modern e-commerce platforms. To overcome these limitations, this
research proposes a machine learning-based sentiment analysis system specifically designed
to process customer reviews from the Brazilian e-commerce dataset (Olist). The system
integrates multiple datasets and performs comprehensive preprocessing, including text
cleaning, tokenization, stopword removal, and TF-IDF vectorization. Sentiment labeling is
carried out using the VADER sentiment analyzer, and three classification models are trained
and evaluated: Gaussian Naive Bayes (GNB), Logistic Regression Classifier (LRC), and
CatBoost. Among these, the CatBoost classifier demonstrates superior performance,
achieving an accuracy of 99.76% and outperforming the other models. The LRC model
also shows robust results, while the GNB model lags due to its simplistic probabilistic
assumptions. The system is evaluated using key performance metrics such as accuracy,
precision, recall, and F1-score, along with confusion matrix visualizations. This intelligent
sentiment analysis system offers a scalable, accurate, and automated solution for
understanding customer opinions. It provides e-commerce platforms with actionable insights
to enhance product offerings, improve customer experience, and support data-driven
decision-making, making it a valuable tool in the competitive digital marketplace.
CHAPTER 1
INTRODUCTION
1.1 Overview
This research focuses on developing a machine learning-based sentiment analysis system for
classifying customer reviews collected from the Brazilian e-commerce platform (Olist). With
the rise of online shopping, customer reviews have become a key source of information for
understanding user satisfaction and improving business services. This system integrates
various datasets from the Olist platform, performs preprocessing to clean and transform
textual data, and applies sentiment classification using supervised machine learning models.
Three models—GNB model, LRC model, and CatBoost—are trained and evaluated using
performance metrics such as accuracy, precision, recall, and F1-score. The research aims to
automate the process of understanding customer sentiment in real-time, offering businesses a
reliable and scalable approach to monitor and respond to user feedback.
The primary motivation for this research stems from the increasing volume of unstructured
textual feedback available on e-commerce platforms. Manually analyzing this data is not only
time-consuming but also inconsistent and error-prone. Traditional rule-based sentiment
analysis tools often lack the contextual intelligence to understand the true sentiment behind
customer statements. Therefore, there is a growing need to explore machine learning methods
that can handle such data efficiently, adapt to changing language patterns, and provide deeper
insights into customer satisfaction.
The significance of this research lies in its potential to revolutionize how e-commerce
platforms understand and respond to customer feedback. By automating sentiment
classification with high accuracy using machine learning models, the system reduces manual
labor, minimizes errors, and enhances customer engagement strategies. The CatBoost model’s
superior performance also highlights the advantage of using advanced algorithms in
sentiment analysis tasks. Ultimately, the research contributes to better customer service, data-
driven decision-making, and increased business competitiveness in the digital market.
The primary objective of this research is to design and implement a robust machine learning-
based sentiment analysis system capable of accurately classifying customer reviews from the
Brazilian e-commerce platform (Olist) into Positive, Neutral, or Negative sentiments. To
achieve this, the following specific objectives are established:
To collect and integrate multiple related datasets from the Olist platform, including
reviews, order information, product details, and customer data, to create a
comprehensive dataset for sentiment analysis.
To train and evaluate multiple machine learning models (GNB model, LRC
model, and CatBoost) and compare their performance using metrics such as accuracy,
precision, recall, and F1-score.
To identify the most accurate and efficient model for real-time sentiment
classification in a production-ready e-commerce environment.
To demonstrate the value of automated sentiment analysis in improving customer
understanding, supporting data-driven decisions, and enhancing overall business
intelligence in online retail platforms.
1.6 Advantages
High Accuracy and Performance: The use of advanced models like CatBoost provides
exceptional accuracy (up to 99.76%), significantly outperforming traditional methods.
Automation of Review Analysis: The system eliminates the need for manual review
categorization, reducing human error and increasing processing efficiency.
Data-Driven Insights: The model outputs can be used to generate actionable insights
for marketing, customer support, and product development strategies.
1.7 Applications
The outcomes of this research can be directly applied in various domains and real-world
scenarios, particularly in the e-commerce and customer service sectors:
Customer Support Prioritization: Identify negative reviews in real time for immediate
resolution and escalation by support teams.
Market Research and Product Feedback: Analyze large volumes of feedback to detect
trends, product flaws, and areas for improvement.
Brand Reputation Management: Track sentiment shifts across product lines or regions
to mitigate risks and respond proactively to customer concerns.
CHAPTER 2
LITERATURE SURVEY
economic dynamics and is imperative for developing countries like Brazil. Brazilian e-
commerce has been mirroring the growth trajectories of the world’s largest economies,
signaling a crucial shift in national consumption patterns [10,11]. This growth in Brazil is
driven by increasing internet penetration and the widespread use of smartphones, which
democratize access to the digital marketplace [12]. The rise of more user-friendly e-
commerce platforms and a growing trust in online transactions are essential in this evolution,
enabling a more comprehensive range of consumers to participate in the digital economy and
enjoy its convenience, efficiency, and advantages [13]. The global health crisis intensified
this trend as consumers increasingly sought digital channels for safe and convenient
purchases [11,13].
Retail companies of various sizes, from large multinationals to local startups, have responded
by diversifying their digital services, offering an extensive range of products and services
from electronics to groceries [3]. The digital competition has spurred significant
advancements in marketing channels [14], user experience, and payment options in Brazil,
continually enhancing the e-commerce shopping experience [15]. These advancements
include the optimization of delivery networks [16], the integration of artificial intelligence for
personalized shopping experiences [7,8], and the adoption of secure, efficient payment
gateways [17]. As a result, e-commerce platforms are becoming more user-friendly and
reliable, attracting a broader range of consumers. E-commerce is establishing itself as a
crucial sales channel in Brazil and a driver of socioeconomic change [13], contributing to
increased consumer access, job creation, and economic growth [11].
The sector’s evolution is also encouraging traditional businesses to innovate and adapt to
digital transformations, further solidifying the importance of e-commerce in Brazil’s
economic landscape [18]. As the eighth-largest internet market worldwide, Brazil accounts
for about 42% of B2C [Business-to-Consumer] e-commerce in Latin America, boasting 150
million users [19].
The Brazilian e-commerce sector has experienced a notable upward trajectory within less
than two decades. In 2019, it recorded around 148.4 million transactions, generating BRL 61.9
billion, marking a 16.3% increase from the previous year [14]. This trend continued, with
revenues reaching BRL 87.4 billion in 2020 [41% growth] and BRL 161 billion in 2021 [a
27% increase]. ABComm projects revenues to hit BRL 169.59 billion in 2022 and anticipates
a rise to BRL 186.7 billion in 2023, with projections reaching BRL 273 billion by the end of
the following year [20]. The Brazilian Chamber of Electronic Commerce notes that categories
like office, computer, and communication equipment lead in revenue, followed by furniture,
household appliances, clothing, and footwear [21].
The pandemic has significantly altered consumption patterns [22]. A Mastercard study in
2021 indicated that 56% of consumers turned to online shopping, with 7% new to digital
platforms. Furthermore, about 46% of existing e-commerce users increased online purchases
[14]. Dunnhumby [23] found that 59% of these new digital consumers continued online
shopping after their initial purchase. As a result of this shift towards digital platforms, the
legislative framework governing e-commerce in Brazil has become increasingly important.
The framework is a critical factor shaping the sector’s growth and development. Several key
policies and regulations have been established to support and regulate online business
activities, ensuring a balanced environment for consumers and businesses.
CHAPTER 3
EXISTING SYSTEM
In e-commerce platforms, customer feedback and reviews are vital indicators of user
satisfaction. Traditionally, companies have relied on manual review systems or basic
keyword-based sentiment classification to understand customer emotions. These methods
involve human agents scanning review texts or using predefined dictionaries to detect
sentiment, which is often error-prone, time-consuming, and non-scalable.
3. Excel-Based Tracking
Basic tools like Excel are used to log reviews and flag common issues manually.
4. No Learning Capability
Traditional systems don’t adapt to new vocabulary or evolving customer language
trends.
Manual Effort: High human dependency makes the system costly and error-prone.
CHAPTER 4
PROPOSED SYSTEM
4.1 Overview
This research focuses on sentiment analysis of customer review comments from the Brazilian
e-commerce platform Olist. It aims to understand customer satisfaction by analyzing natural
language feedback using machine learning techniques as shown in Fig. 4.1. The dataset used
in this research is derived from multiple interrelated sources provided by Olist, including
order details, payment information, customer and seller data, and most importantly, customer
reviews. By integrating these datasets, the research provides a holistic view of customer
behavior and sentiment patterns.
The central objective of this research is to classify customer review messages into three
sentiment categories: Positive, Negative, and Neutral. To achieve this, the review texts
undergo a series of preprocessing steps including cleaning, tokenization, stopword removal,
and transformation using TF-IDF vectorization. These processed features are then used to
train various machine learning models, namely GNB model, LRC model, and CatBoost
Classifier. The implementation also includes label encoding of categorical variables and the
use of VADER sentiment analysis to label sentiments based on the polarity of the reviews.
An essential aspect of this research is model evaluation and comparison. Each model is tested
on unseen data and assessed using standard performance metrics such as accuracy, precision,
recall, and F1-score. The confusion matrix and classification reports provide deeper insights
into model behavior and the quality of classification. The comparative analysis helps identify
the most effective model for sentiment prediction in this context.
Through this research, a scalable and reusable pipeline has been developed for analyzing
user-generated content in the e-commerce sector. It not only contributes to improving
customer experience monitoring but also supports decision-making processes for sellers and
platform managers. The techniques and methodologies implemented in this research are
applicable to other domains involving textual feedback and customer reviews.
Fig. 4.1: Proposed system architecture
4.2 Preprocessing
Preprocessing plays a critical role in preparing the raw dataset for effective analysis and
model training. In this research, multiple CSV files from the Olist e-commerce dataset are
merged to create a comprehensive view of customer orders, reviews, payments, and product
information. The dataset includes both numerical and categorical data, as well as unstructured
text in the form of customer reviews.
Initially, timestamp fields are converted to datetime objects to extract temporal features such
as day of the week, hour, month, year, and delivery duration. Missing values, particularly in
the review_comment_message field, are removed to ensure data integrity. Categorical and
boolean columns are label encoded to convert them into numerical format required for
machine learning algorithms.
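As a minimal illustration of the temporal features described above, the sketch below derives them for a single order using only the Python standard library. The function name temporal_features and the sample timestamps are hypothetical; the feature names and conventions (weekday numbered from 1, delivery time in whole days) follow the project's source code.

```python
from datetime import datetime


def temporal_features(purchase_ts, delivered_ts):
    """Derive temporal features for one order (timestamps as 'YYYY-MM-DD HH:MM:SS')."""
    fmt = "%Y-%m-%d %H:%M:%S"
    purchased = datetime.strptime(purchase_ts, fmt)
    delivered = datetime.strptime(delivered_ts, fmt)
    return {
        "day_of_week_int": purchased.weekday() + 1,  # Monday = 1 ... Sunday = 7
        "hour": purchased.hour,
        "month": purchased.month,
        "year": purchased.year,
        "delivery_time": (delivered - purchased).days,  # whole days only
    }


# Hypothetical order, not taken from the dataset
features = temporal_features("2018-03-05 14:30:00", "2018-03-12 09:00:00")
print(features)
```

Because delivery_time keeps only whole days, an order delivered 6 days and 18 hours after purchase is recorded as 6.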
For text preprocessing, the review messages are cleaned by converting them to lowercase,
removing punctuation, and eliminating Portuguese stopwords using the NLTK library.
Tokenization is applied to split the review texts into individual words. A new cleaned text
column is created for further sentiment analysis. To generate sentiment labels, VADER
(Valence Aware Dictionary and sEntiment Reasoner) is used, which assigns a sentiment score
and classifies reviews into Positive, Negative, or Neutral. This comprehensive preprocessing
pipeline ensures that the data is structured, clean, and ready for feature extraction and
classification.
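The text-cleaning and labeling steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's exact code: the stopword set is a tiny hypothetical subset (the project uses NLTK's full Portuguese list), and the cut-off of 0.05 on the compound score is the conventional VADER threshold, assumed here because the thresholds are not stated in the text. The compound scores passed in are made-up values standing in for the analyzer's output.

```python
import string

# Tiny illustrative subset of Portuguese stopwords (assumption, not NLTK's list)
STOP_WORDS = {"o", "a", "de", "e", "que", "do", "da", "em", "um", "muito"}


def clean_review(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


cleaned = clean_review("O produto chegou antes do prazo, muito bom!")
labels = [label_from_compound(score) for score in (0.62, -0.40, 0.0)]
print(cleaned)
print(labels)
```

Any review whose compound score falls strictly between the two thresholds is labeled Neutral, so the choice of threshold directly controls how large the Neutral class becomes.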
In this research, the preprocessed customer review data is used to train machine learning
models capable of classifying sentiments. The cleaned textual data is converted into
numerical form using TF-IDF (Term Frequency–Inverse Document Frequency) vectorization,
which captures the importance of words relative to the entire corpus. The sentiment labels
generated using VADER serve as the target variable for supervised learning.
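To make the TF-IDF weighting concrete, the following standard-library sketch computes it for three toy documents. It assumes the smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization of each row, which is the default behavior of scikit-learn's TfidfVectorizer. The function name tfidf and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, matrix) with one L2-normalized TF-IDF row per document."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        matrix.append([v / norm for v in row])
    return vocab, matrix


docs = ["produto bom", "produto ruim", "entrega rapida produto bom"]
vocab, matrix = tfidf(docs)
for doc, row in zip(docs, matrix):
    print(doc, [round(v, 3) for v in row])
```

The word "produto" appears in every document, so it receives the lowest IDF and contributes less to each row than rarer words such as "bom" or "ruim".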
To evaluate model performance and robustness, the data is split into training and testing
subsets. Three models are trained: GNB model, LRC model, and CatBoost Classifier. Each
model is trained, saved, and evaluated using standard classification metrics. The goal is to
identify which algorithm best captures sentiment patterns in customer feedback.
1. GNB model
The GNB model is a simple yet powerful probabilistic classifier based on Bayes' Theorem. The
internal operation of the GNB model is demonstrated in Fig. 4.2. It assumes that the features (in
this case, TF-IDF word weights) are normally distributed within each class and conditionally
independent given the class, which simplifies the computation.
Internal Working:
Estimates the mean and variance of each feature (word frequency) for each class.
The class with the highest posterior probability is assigned to each review.
Despite its simplicity and assumptions, GNB works well with high-dimensional text
data and performs efficiently.
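The steps listed above can be sketched in a few lines of plain Python. This is a simplified illustration rather than scikit-learn's GaussianNB implementation: the function names, the toy two-feature data, and the small constant eps added to each variance to avoid division by zero are all assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_gnb(X, y, eps=1e-9):
    """Estimate the class prior plus per-class feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    params = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(columns, means)]
        params[label] = (n / len(X), means, variances)
    return params


def predict_gnb(params, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(prior, means, variances):
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                      for x, m, var in zip(row, means, variances))
        return math.log(prior) + log_lik
    return max(params, key=lambda label: log_posterior(*params[label]))


# Toy data with two features per sample (hypothetical values)
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
y = ["Positive", "Positive", "Negative", "Negative"]
params = fit_gnb(X, y)
prediction = predict_gnb(params, [0.85, 0.15])
print(prediction)
```

Working in log space turns the product of per-feature likelihoods into a sum, which avoids numerical underflow when there are thousands of TF-IDF features.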
2. LRC model
LRC model is a linear model used for binary or multiclass classification. In this research, it
predicts the probability of each sentiment class based on the weighted sum of input features
derived from TF-IDF.
Internal Working:
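Applies the softmax function for multiclass classification to convert logits into
probabilities: $P(y = k \mid x) = \exp(w_k^\top x + b_k) / \sum_{j} \exp(w_j^\top x + b_j)$,
where $w_k$ and $b_k$ are the weight vector and bias of class $k$.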
The model is trained using gradient descent to minimize the cross-entropy loss
between predicted and actual labels.
LRC model is effective for high-dimensional sparse data and provides interpretable
coefficients.
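A minimal sketch of the forward pass and loss described above is given below, using only the standard library. The weights, biases, and feature vector are hypothetical values chosen for illustration; in practice they are learned during training.

```python
import math


def class_scores(weights, biases, x):
    """Weighted sum of the input features for each class (the logits)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def softmax(logits):
    """Convert logits into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_index])


# Hypothetical parameters for 3 classes and 2 TF-IDF features
weights = [[2.0, -1.0], [0.0, 0.0], [-1.0, 2.0]]
biases = [0.0, 0.1, 0.0]
probs = softmax(class_scores(weights, biases, [0.8, 0.1]))
loss = cross_entropy(probs, 0)
print([round(p, 3) for p in probs], round(loss, 3))
```

Gradient descent adjusts the weights so that the probability of the true class rises, which lowers the cross-entropy loss toward zero.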
3. CatBoost Classifier
Internal Working:
Builds an ensemble of decision trees sequentially where each tree corrects the errors
of the previous one.
Uses ordered boosting and minimal variance sampling to reduce prediction shift and
variance.
CatBoost internally handles categorical features, but in this case, it is applied to TF-
IDF vectors.
Loss function: Multi-class logarithmic loss (logloss) is minimized during training.
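The idea that each tree corrects the errors of the previous one can be illustrated with a deliberately simplified sketch. The code below is not CatBoost: it omits ordered boosting, symmetric trees, and the multi-class loss, and instead boosts one-split decision stumps on a single feature under squared loss. All names and data values are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the maximum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Sequentially fit stumps to the residuals left by the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_val, right_val = fit_stump(xs, residuals)
        stumps.append((threshold, left_val, right_val))
        preds = [p + learning_rate * (left_val if x <= threshold else right_val)
                 for x, p in zip(xs, preds)]
    return base, stumps, preds


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
base, stumps, preds = boost(xs, ys)
print(round(mse(ys, [base] * len(ys)), 4), round(mse(ys, preds), 4))
```

Each round fits a new stump to what the ensemble still gets wrong, so the training error shrinks as trees are added; the learning rate limits how much any single tree can change the prediction.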
UML DIAGRAMS
GOALS: The Primary goals in the design of the UML are as follows:
Provide users a ready-to-use, expressive visual modeling Language so that they can
develop and exchange meaningful models.
The class diagram is used to refine the use case diagram and define a detailed design of the
system. The class diagram classifies the actors defined in the use case diagram into a set of
interrelated classes. The relationship or association between the classes can be either an "is-a"
or "has-a" relationship. Each class in the class diagram is capable of providing certain
functionalities. These functionalities provided by the class are termed "methods" of the class.
Apart from this, each class may have certain "attributes" that uniquely identify the class.
Activity diagram
A data flow diagram (DFD) is a graphical representation of how data moves within an
information system. It is a modeling technique used in system analysis and design to illustrate
the flow of data between various processes, data stores, data sources, and data destinations
within a system or between systems. Data flow diagrams are often used to depict the structure
and behavior of a system, emphasizing the flow of data and the transformations it undergoes
as it moves through the system.
Use Case diagram: A use case diagram in the Unified Modeling Language (UML) is a type
of behavioral diagram defined by and created from a Use-case analysis. Its purpose is to
present a graphical overview of the functionality provided by a system in terms of actors,
their goals (represented as use cases), and any dependencies between those use cases. The
main purpose of a use case diagram is to show what system functions are performed for
which actor. Roles of the actors in the system can be depicted.
Deployment Diagram:
A deployment diagram in UML illustrates the physical arrangement of hardware and software
components in the system. It visualizes how different software artifacts, such as data
processing scripts and model training components, are deployed across hardware nodes and
interact with each other, providing insight into the system’s infrastructure and deployment
strategy.
SOFTWARE ENVIRONMENT
Python is a high-level, interpreted programming language known for its simplicity and
readability, which makes it a popular choice for beginners as well as experienced developers.
Key features of Python include its dynamic typing, automatic memory management, and a
rich standard library that supports a wide range of applications from web development to data
science and machine learning. Its object-oriented approach and support for multiple
programming paradigms allow developers to write clear, maintainable code. Python's
extensive ecosystem of third-party packages further enhances its capabilities, enabling rapid
development and prototyping across diverse fields.
Installation
First, download the appropriate installer from the official Python website
(https://www.python.org/downloads/release/python-376/). For Windows users, run the
executable installer and make sure to check the "Add Python to PATH" option during installation;
for macOS and Linux, follow the respective package installation commands or use a package
manager like Homebrew or apt-get. After installation, verify the setup by running
python --version or python3 --version in your terminal or command prompt, which should display
"Python 3.7.6". This version-specific installation supports all major functionalities and
libraries compatible with Python 3.7.6, making it an excellent foundation for developing
robust applications in areas such as data analysis, machine learning, and GUI development.
1. Programming Language
Python 3.x: Used for data preprocessing, model building, evaluation, and backend
logic.
Scikit-learn – For machine learning algorithms like Naive Bayes and Logistic Regression,
and evaluation metrics
Matplotlib & Seaborn – For EDA and visualizations
3. Operating System
5. Package Manager
Python 3.7.6 can run efficiently on most modern systems with minimal hardware
requirements. However, meeting the recommended specifications ensures better performance,
especially for developers handling large-scale applications or computationally intensive tasks.
By ensuring compatibility with the hardware and operating system, developers can leverage the
full potential of Python 3.7.6.
Memory (RAM) Requirements: Python 3.7.6 does not demand excessive memory but
requires adequate RAM for smooth performance, particularly for running resource-intensive
applications such as data processing, machine learning, or web development.
Insufficient RAM can cause delays or crashes when handling large datasets or executing
computationally heavy programs.
Storage Requirements: Python 3.7.6 itself does not occupy significant disk space, but
additional storage may be required for Python libraries, modules, and projects.
Developers using Python for large-scale projects or data science should allocate more storage
to manage virtual environments, datasets, and frameworks like TensorFlow or PyTorch.
Compatibility with Operating Systems: Python 3.7.6 is compatible with most operating
systems but requires hardware that supports the respective OS. Below are general
requirements for supported operating systems:
Linux: Supports a wide range of distributions, including Ubuntu, CentOS, and Fedora.
The hardware specifications for the OS directly impact Python’s performance, particularly for
modern software development.
8. Package Manager
For an intelligent sentiment analysis system to operate effectively, it must meet a set of
functional and non-functional requirements. These requirements ensure the system performs
its core tasks (data processing, model training, sentiment prediction) reliably, efficiently, and
within acceptable performance bounds.
Functional requirements describe the core services the system must provide to fulfill its
objectives.
Data Integration: The system must load and merge various datasets (orders, reviews,
payments, etc.).
Preprocessing Module: Must clean and prepare review texts (tokenization, stopword
removal, etc.).
Non-functional requirements define the quality attributes and constraints of the system.
Usability: Should provide clean output for model results and charts.
A feasibility study evaluates whether the system is practical and beneficial from technical,
operational, and economic perspectives. This ensures that the project is viable before further
investment.
Aspect | Description
Technical Feasibility | The system is technically feasible with Python libraries like NLTK, Scikit-learn, CatBoost, and standard hardware.
Operational Feasibility | End users (analysts or business users) can easily operate the system with minimal training.
Economic Feasibility | Implementation cost is low due to the use of open-source tools and reusable codebase.
SOURCE CODE
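import os
import string

import joblib
import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
import seaborn as sns
from catboost import CatBoostClassifier
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

# Download the NLTK resources used below (skipped if already present)
nltk.download('stopwords')
nltk.download('vader_lexicon')


def load_and_merge_data():
    df_reviews = pd.read_csv("dataset/olist_order_reviews_dataset.csv")
    df_orders = pd.read_csv("dataset/olist_orders_dataset.csv")
    df_products = pd.read_csv("dataset/olist_products_dataset.csv")
    df_customers = pd.read_csv("dataset/olist_customers_dataset.csv")
    df_sellers = pd.read_csv("dataset/olist_sellers_dataset.csv")
    df_payments = pd.read_csv("dataset/olist_order_payments_dataset.csv")
    # Merge datasets
    # NOTE: the merge statements are missing from this listing. The joins below
    # are an assumed reconstruction from the keys named in Chapter 9 (order_id,
    # product_id, customer_id, seller_id). product_id and seller_id are carried
    # by the order items file, so it is loaded here as well.
    df_items = pd.read_csv("dataset/olist_order_items_dataset.csv")
    df = (df_reviews
          .merge(df_orders, on='order_id', how='left')
          .merge(df_items, on='order_id', how='left')
          .merge(df_products, on='product_id', how='left')
          .merge(df_customers, on='customer_id', how='left')
          .merge(df_sellers, on='seller_id', how='left')
          .merge(df_payments, on='order_id', how='left'))
    return df


def preprocess_data(df):
    # Convert timestamps to datetime objects
    df['order_purchase_timestamp'] = pd.to_datetime(df['order_purchase_timestamp'])
    df['order_delivered_customer_date'] = pd.to_datetime(df['order_delivered_customer_date'])
    # Create features
    df['day_of_week_int'] = df['order_purchase_timestamp'].dt.weekday + 1
    df['hour'] = df['order_purchase_timestamp'].dt.hour
    df['month'] = df['order_purchase_timestamp'].dt.month
    df['year'] = df['order_purchase_timestamp'].dt.year
    df['delivery_time'] = (df['order_delivered_customer_date'] -
                           df['order_purchase_timestamp']).dt.days
    # Drop reviews without a comment
    df = df.dropna(subset=['review_comment_message']).reset_index(drop=True)
    # Label-encode categorical and boolean columns
    # NOTE: the loop header is missing from this listing; the column selection
    # below is an assumption. Review text columns are skipped so they stay text.
    le = LabelEncoder()
    text_cols = {'review_comment_message', 'review_comment_title'}
    for col in df.select_dtypes(include=['object', 'bool']).columns:
        if col not in text_cols:
            df[col] = le.fit_transform(df[col].astype(str))
    STOP_WORDS = set(stopwords.words('portuguese'))

    def clean_and_tokenize(text):
        # NOTE: only the fallback `return ""` survives in this listing; the body
        # is reconstructed from Section 4.2 (lowercase, remove punctuation,
        # tokenize, remove Portuguese stopwords).
        if not isinstance(text, str):
            return ""
        text = text.lower().translate(str.maketrans('', '', string.punctuation))
        tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
        return " ".join(tokens)

    df['review_comment_message_clean'] = \
        df['review_comment_message'].apply(clean_and_tokenize)
    # Sentiment analysis
    analyzer = SentimentIntensityAnalyzer()

    def get_sentiment(text):
        scores = analyzer.polarity_scores(text)
        # NOTE: the conditions are missing from this listing. The cut-offs of
        # +/-0.05 on the compound score are the conventional VADER thresholds
        # and are assumed here.
        if scores['compound'] >= 0.05:
            return 'Positive'
        elif scores['compound'] <= -0.05:
            return 'Negative'
        else:
            return 'Neutral'

    df['sentiment'] = df['review_comment_message_clean'].apply(get_sentiment)
    return df


def split_data(df):
    X = df['review_comment_message_clean']
    y = df['sentiment']
    # Train-test split
    # NOTE: the split call is missing from this listing; the 80/20 ratio and
    # the random_state value are assumptions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    # Vectorize text (fit on the training set only, then reuse for the test set)
    vectorizer = TfidfVectorizer(max_features=5000)
    X_train_tfidf = vectorizer.fit_transform(X_train).toarray()
    X_test_tfidf = vectorizer.transform(X_test).toarray()
    # Encode labels
    label_encoder = LabelEncoder()
    y_train_encoded = label_encoder.fit_transform(y_train)
    y_test_encoded = label_encoder.transform(y_test)
    # Persist the fitted transformers for reuse
    os.makedirs('model', exist_ok=True)
    joblib.dump(vectorizer, 'model/vectorizer.pkl')
    joblib.dump(label_encoder, 'model/label_encoder.pkl')
    return X_train_tfidf, X_test_tfidf, y_train_encoded, y_test_encoded, label_encoder


def train_gaussian_nb(X_train, y_train):
    # Load the saved model if it exists; otherwise train and save it
    model_path = 'model/gaussian_nb.pkl'
    if os.path.exists(model_path):
        model = joblib.load(model_path)
    else:
        model = GaussianNB()
        model.fit(X_train, y_train)
        joblib.dump(model, model_path)
    return model


def train_logistic_regression(X_train, y_train):
    model_path = 'model/logistic_regression.pkl'
    if os.path.exists(model_path):
        model = joblib.load(model_path)
    else:
        # Constructor arguments are not shown in this listing; max_iter is assumed
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train, y_train)
        joblib.dump(model, model_path)
    return model


def train_catboost(X_train, y_train):
    model_path = 'model/catboost.pkl'
    if os.path.exists(model_path):
        model = joblib.load(model_path)
    else:
        # Constructor arguments are not shown in this listing; verbose=0 is assumed
        model = CatBoostClassifier(verbose=0)
        model.fit(X_train, y_train)
        joblib.dump(model, model_path)
    return model


# NOTE: the function header is missing from this listing; the name and
# signature below are assumptions.
def evaluate_model(model, X_test, y_test, model_name, label_encoder):
    # CatBoost returns a column vector, so flatten the predictions
    y_pred = np.asarray(model.predict(X_test)).ravel()
    # Calculate metrics (weighted averaging is assumed; it matches the reported
    # results, where recall equals accuracy for every model)
    metrics = {
        'Model': model_name,
        'Accuracy': accuracy_score(y_test, y_pred),
        'Precision': precision_score(y_test, y_pred, average='weighted', zero_division=0),
        'Recall': recall_score(y_test, y_pred, average='weighted', zero_division=0),
        'F1-Score': f1_score(y_test, y_pred, average='weighted', zero_division=0),
    }
    # Classification Report
    print(classification_report(y_test, y_pred,
                                target_names=label_encoder.classes_,
                                zero_division=0))
    # Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=label_encoder.classes_,
                yticklabels=label_encoder.classes_)
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title(f'Confusion Matrix - {model_name}')
    plt.show()
    return metrics


df = load_and_merge_data()
df
df = preprocess_data(df)
df
X_train, X_test, y_train, y_test, label_encoder = split_data(df)
models = {
    'Gaussian Naive Bayes': train_gaussian_nb(X_train, y_train),
    'Logistic Regression': train_logistic_regression(X_train, y_train),
    'CatBoost': train_catboost(X_train, y_train),
}
# Evaluate models
results = []
for model_name, model in models.items():
    metrics = evaluate_model(model, X_test, y_test, model_name, label_encoder)
    results.append(metrics)
results_df = pd.DataFrame(results)
# Highlight the best value in each metric column
styled_df = results_df.style.highlight_max(
    subset=['Accuracy', 'Precision', 'Recall', 'F1-Score'],
    color='lightblue'
)
display(styled_df)  # display() is available in Jupyter notebooks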
CHAPTER 9
This chapter provides a detailed explanation of the implementation steps followed in the
sentiment analysis model using the Olist Brazilian E-commerce dataset. The pipeline includes
data integration, preprocessing, feature extraction, model training, evaluation, and
comparison of classification models.
Essential Python libraries such as pandas, numpy, matplotlib, seaborn, nltk, sklearn, catboost,
and joblib are imported to handle data manipulation, visualization, machine learning, natural
language processing, and model persistence.
Function: load_and_merge_data()
Merges them into a single DataFrame using keys like order_id, product_id,
customer_id, and seller_id.
This unified dataset allows contextual sentiment analysis by combining review texts
with relevant product and transaction data.
3. Data Preprocessing
Function: preprocess_data(df)
Feature Engineering:
Sentiment Labeling:
Function: split_data(df)
Train-Test Split:
Text Vectorization:
Label Encoding:
Serialization:
Saves the vectorizer and label encoder as .pkl files for reuse.
5. Model Training
Functions:
train_gaussian_nb(X_train, y_train)
train_logistic_regression(X_train, y_train)
train_catboost(X_train, y_train)
6. Model Evaluation
Generates:
7. Model Comparison
The dataset used in this research is derived from the Brazilian e-commerce platform Olist and
comprises multiple integrated components, including customer reviews, order timelines,
product and seller details, and transactional metadata. After merging and preprocessing, the
dataset contains enriched records that represent each order with associated review text and
delivery attributes. Key features include timestamps of purchase and delivery, product and
seller information, and cleaned review messages. Additional derived features such as delivery
time, day of the week, and hour of order provide valuable temporal insights. Each review is
classified with a sentiment label—Positive, Neutral, or Negative—using VADER sentiment
analysis. This structured and feature-rich dataset forms the foundation for building accurate
and scalable machine learning models for automated sentiment classification, enabling
meaningful interpretation of customer satisfaction patterns in the e-commerce domain.
order_item_id Item count per order (usually 1 per row, but orders with
multiple items will have multiple rows).
hour Hour of the day when the order was placed (0–23).
In Fig. 9.1 (a), the GNB model shows a large number of misclassifications,
particularly for the Neutral class. Out of all Neutral samples, many are incorrectly predicted
as either Negative or Positive, which results in high off-diagonal values. Specifically, it
correctly classifies 3,035 Neutral instances but misclassifies 2,509 as Negative and 3,567
as Positive. The model struggles with separating the three sentiment classes, indicating it
cannot model the complexity of the data accurately, likely due to its assumption of feature
independence and normal distribution, which do not hold well in TF-IDF text data.
(a) (b)
(c)
Fig. 9.1: Confusion matrices obtained using (a) GNB model. (b) LRC model. (c) CatBoost
model.
The confusion matrix in Fig. 9.1 (b) corresponds to the LRC model, which shows much better
performance. The majority of predictions fall on the diagonal, indicating correct
classifications. It correctly identifies 9,111 Neutral reviews and 704
Positive reviews, with very few misclassifications (e.g., 54 Positive samples labeled
Neutral). The LRC model demonstrates strong capability in distinguishing Neutral sentiment, with
minor confusion between Positive and Neutral, and minimal mislabeling of Negative
reviews.
Fig. 9.1 (c) illustrates the performance of the CatBoost Classifier, which achieves
near-perfect classification. It correctly classifies 9,110 Neutral and 746 Positive reviews, with only
a handful of misclassified samples (e.g., 13 Positive reviews as Neutral and 1 Neutral review
as Positive). The Negative class also shows improved predictions, with 56 correctly predicted
Negative samples. This model's superior performance is attributed to its gradient boosting
mechanism, ability to handle categorical features, and robustness to overfitting, making it the
most accurate and reliable model among the three.
Table 9.1 presents a comparative analysis of the performance of the three machine learning
models—GNB model, LRC model, and CatBoost Classifier—used in this research for
sentiment classification. The evaluation metrics considered are Accuracy, Precision, Recall,
and F1-Score, which provide a comprehensive view of each model’s performance.
The results clearly show that CatBoost outperforms the other models in all metrics, followed
closely by LRC model, while GNB model lags significantly in terms of accuracy and recall.
These observations suggest that advanced models like CatBoost are more capable of
capturing the nuanced patterns in textual data.
Key Observations:
The GNB model has high precision but low accuracy and recall, suggesting that it is
biased toward a dominant class.
The performance comparison of the three machine learning models (GNB, LRC, and
CatBoost) shows a clear distinction in their effectiveness for sentiment classification
of customer reviews. Gaussian Naive Bayes, while showing relatively high precision
(0.886), suffers from low accuracy (0.308), recall (0.308), and F1-score (0.409),
indicating that it misclassifies a large share of reviews and may be biased toward one
class while failing on the others. The LRC model performs significantly better,
achieving over 99% on all metrics, which reflects its strong ability to generalize and
to distinguish between sentiment classes. The CatBoost Classifier surpasses both, with
the highest accuracy (0.9976), precision (0.9976), recall (0.9976), and F1-score
(0.9975), showing its capability to capture complex patterns in the textual data
through gradient boosting. This comparison underscores the effectiveness of ensemble
models like CatBoost in handling high-dimensional, imbalanced, and non-linear textual
data more effectively than traditional linear or probabilistic models.
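To make the combination of high precision and low accuracy more concrete, the sketch below computes accuracy and support-weighted precision, recall, and F1 from a confusion matrix in plain Python. Support-weighted averaging is an assumption: the averaging scheme is not stated here, although the identical accuracy and recall values reported for each model are consistent with it. The two-class matrix is a made-up example, not the study's data, and `weighted_metrics` is a hypothetical helper name.

```python
def weighted_metrics(cm):
    """Accuracy and support-weighted precision/recall/F1.

    cm[i][j] = number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total

    precision = recall = f1 = 0.0
    for k in range(n_classes):
        tp = cm[k][k]
        support = sum(cm[k])                                 # true samples of class k
        predicted = sum(cm[i][k] for i in range(n_classes))  # samples predicted as k
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                             # class share of the test set
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical matrix: rows/columns are [Neutral, Positive].
# Most Neutral reviews are pushed into the Positive class.
toy_cm = [
    [200, 700],
    [0, 100],
]

scores = weighted_metrics(toy_cm)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this toy case weighted precision is about 0.91 while accuracy and weighted recall are both 0.30: the few Neutral predictions are all correct, but most Neutral reviews are labeled Positive. Whether GNB behaves this way on the Olist data should be checked against its confusion matrix in Fig. 9.1 (a).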
CHAPTER 10
Conclusion
This research developed a machine learning-based sentiment analysis system for customer
reviews in the Brazilian e-commerce dataset (Olist). Through extensive data preprocessing,
including text cleaning, tokenization, and TF-IDF vectorization, the system converted
unstructured review messages into meaningful numerical features. Multiple models were
trained and evaluated: the GNB model, the LRC model, and the CatBoost Classifier. The
comparative analysis showed that CatBoost outperformed the other models, achieving an
accuracy of 99.76% along with high precision, recall, and F1-score. The LRC model also
performed well with 99.02% accuracy, while the GNB model lagged behind because its
assumptions do not align with the nature of textual data.
Overall, the project demonstrates the importance of advanced machine learning techniques
and robust feature engineering in achieving high sentiment classification accuracy. The
system's performance can be further improved by fine-tuning hyperparameters, using
ensemble stacking, or incorporating additional linguistic features such as n-grams, named
entity recognition, or word embeddings like Word2Vec or BERT.
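As one illustration of the n-gram suggestion, the sketch below extracts word unigrams and bigrams from an already cleaned review in plain Python. The whitespace tokenizer, the helper name `word_ngrams`, and the sample sentence are assumptions for illustration; the preprocessing used in this research may tokenize differently.

```python
def word_ngrams(text, n_min=1, n_max=2):
    """Return word n-grams of length n_min..n_max, grouped by length."""
    tokens = text.lower().split()  # assumed: text has already been cleaned
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


# Made-up review text, not from the dataset.
features = word_ngrams("Product not good")
print(features)
```

Bigrams such as "not good" keep a negation attached to the word it modifies, which unigram features split apart. Each n-gram could then be weighted with TF-IDF in the same way as single words.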
Future Scope
In the future, this system can be enhanced to support multilingual sentiment analysis and real-
time review classification. Additionally, integrating a web-based dashboard for visualizing
customer sentiment trends can support business decision-making and customer relationship
management.
REFERENCES
[1] Maseeh, H.I.; Nahar, S.; Jebarajakirthy, C.; Ross, M.; Arli, D.; Das, M.; Rehman, M.;
Ashraf, H.A. Exploring the privacy concerns of smartphone app users: A qualitative
approach. Mark. Intell. Plan. 2023,41, 945–969.
[2] Büchel, E.; Spinler, S. The impact of the metaverse on e-commerce business models—
A delphi-based scenario study. Technol. Soc.2024,76, 102465
[3] Statista. Global Retail e-Commerce Sales 2014–2025. 2022. Available online:
https://www.statista.com/statistics/379046/worldwide-retail-e-commerce-sales/
(accessed on 12 May 2025).
[4] Gupta, A.S.; Mukherjee, J.; Garg, R. Retailing during the COVID-19 lifecycle: A
bibliometric study. Int. J. Retail. Distrib. Manag.2023,11, 1413–1476.
[5] World Trade Organization. E-Commerce, Trade and the COVID-19 Pandemic. 2020.
Available
online:https://www.wto.org/english/tratop_e/covid19_e/ecommerce_report_e.pdf
(accessed on 12 May 2025).
[6] Accenture. Technology Trends 2021. Available online: https://www.accenture.com/us-
en/insights/technology/technology-trends-2021 (accessed on 12 May 2025).
[7] Su, Z.; Bentley, B.L.; McDonnell, D.; Ahmad, J.; He, J.; Shi, F.; Takeuchi, K.;
Cheshmehzangi, A.; da Veiga, C.P. 6G and Artificial Intelligence Technologies for
Dementia Care: Literature Review and Practical Analysis. J. Med. Internet Res.
2022,24, e30503.
[8] Lucas, G.A.; Lunardi, G.L.; Dolci, D.B. From e-commerce to m-commerce: An
analysis of the user’s experience with different access platforms. Electron. Commer.
Res. Appl. 2023,58, 101240.
[9] Alves de Araújo, F.; Mendes dos Reis, J.G.; Terra da Silva, M.; Aktas, E. A Fuzzy
Analytic Hierarchy Process Model to Evaluate Logistics Service Expectations and
Delivery Methods in Last-Mile Delivery in Brazil. Sustainability 2022,14, 5753.
[10] Anacleto, A.; de Araújo Bornancin, A.P.; Mendes, S.H.C.; Scheuer, L. Between
Flowers and Fears: The New Coronavirus Pandemic [Covid-19] and the Flower Retail
Trade. Ornam. Agric. 2021,27, 26–32.
[11] Ferraz, R.M.; da Veiga, C.P.; da Veiga, C.R.P.; Furquim, T.S.G.; da Silva, W.V. After-
Sales Attributes in E-Commerce: A Systematic Literature Review and Future Research
Agenda. J. Theor. Appl. Electron. Commer. Res. 2023,18, 475–500.
[12] Pitta, G.B.; Pereira da Veiga, C.; Kaczam, F.; Su, Z.; Vieira da Silva, W. Reviewing
the scientific literature of the barriers to online purchases. Int. J. Bus. Forecast. Mark.
Intell. 2024,9, 80–102.
[13] Sociedade Brasileira de Varejo e Consumo. 2023. Available online:
http://www.sbvc.com.br (accessed on 12 May 2025).
[14] E-Bit. 2023. Available online: http://www.ebit.com.br (accessed on 12 May 2025).
[15] Furquim, T.S.G.; da Veiga, C.P.; Veiga, C.R.P.d.; Silva, W.V.d. The Different Phases
of the Omnichannel Consumer Buying Journey: A Systematic Literature Review and
Future Research Directions. J. Theor. Appl. Electron. Commer. Res. 2023,18, 79–104.
[16] ABComm. 2022. Available online: http://www.abcomm.org (accessed on 12 May
2025).
[17] Hassan, M.A.; Shukur, Z.; Hasan, M.K. An Efficient Secure Electronic Payment
System for E-Commerce. Computers 2020,9, 66.
[18] Almeida, S.F.; de Moura Leite, A.O.; de Castro Lima, L.; de Oliveira, P.H. Dinâmicas
do Varejo no Brasil: Produtividade e o Período Pós-Pandemia. Rev. Do IBRAC 2023,1,
87–116.
[19] eMarketer. 2019. Available online: http://www.emarketer.com (accessed on 12 May
2025).
[20] Valor Econômico. Faturamento do E-Commerce Deve Aumentar 10% em 2023.
Available online:https://valor.globo.com/patrocinado/dino/noticia/2023/05/08/
faturamento-do-e commerce-deve-aumentar-10-em 2023.ghtml (accessed on 12 May
2025).
[21] Mccenet. 2020. Available online: https://www.mccenet.com.br/categorias (accessed
on 12 May 2025).
[22] Michel, J.; da Veiga, C.; da Veiga, C.R. Metanarrativa sobre E-commerce no Brasil.
An. Do Simpósio Sul-Mato-Grossense Adm. 2021,4, 325–341.
[23] Dunnhumby. Six Months on, How Have Consumer Behaviours Changed as a Result
of COVID-19? 2020. Available online: https://customerfirst.dunnhumby.com/six-
months-on-how-have-consumer-behaviours-changed-as-a-result-of-covid-19.