Comprehensive Resource

This research presents a machine learning-based sentiment analysis system designed to classify customer reviews from the Brazilian e-commerce platform Olist into Positive, Neutral, or Negative sentiments. By integrating multiple datasets and employing advanced preprocessing techniques, the system utilizes models such as CatBoost, which achieves an impressive accuracy of 99.76%, significantly improving upon traditional keyword-based methods. The proposed system aims to automate sentiment classification, enhance customer insights, and support data-driven decision-making in the competitive e-commerce landscape.

Uploaded by

thepatel2212
Copyright
© All Rights Reserved

A COMPREHENSIVE RESOURCE FOR SALES PREDICTION,
CUSTOMER INSIGHTS, AND DELIVERY OPTIMIZATION FROM
BRAZILIAN E-COMMERCE DATASET RESEARCH PAPERS

ABSTRACT

In the era of digital commerce, customer reviews have emerged as a critical factor influencing
consumer behavior and business strategies. These reviews contain valuable insights into
customer satisfaction, product quality, and service experiences. However, analyzing large
volumes of textual feedback manually or using basic keyword-based methods is inefficient
and often inaccurate. Traditional systems rely on rule-based keyword matching or manual
interpretation, which are time-consuming, lack contextual understanding, and are not scalable
to meet the demands of modern e-commerce platforms. To overcome these limitations, this
research proposes a machine learning-based sentiment analysis system specifically designed
to process customer reviews from the Brazilian e-commerce dataset (Olist). The system
integrates multiple datasets and performs comprehensive preprocessing, including text
cleaning, tokenization, stopword removal, and TF-IDF vectorization. Sentiment labeling is
carried out using the VADER sentiment analyzer, and three classification models are trained
and evaluated: Gaussian Naive Bayes (GNB), Logistic Regression Classifier (LRC), and
CatBoost. Among these, the CatBoost classifier performs best, achieving an accuracy of
99.76% and outperforming the other two models. The LRC model also shows robust results,
while the GNB model lags due to its simplistic probabilistic
assumptions. The system is evaluated using key performance metrics such as accuracy,
precision, recall, and F1-score, along with confusion matrix visualizations. This intelligent
sentiment analysis system offers a scalable, accurate, and automated solution for
understanding customer opinions. It provides e-commerce platforms with actionable insights
to enhance product offerings, improve customer experience, and support data-driven
decision-making, making it a valuable tool in the competitive digital marketplace.
CHAPTER 1

INTRODUCTION

1.1 Overview

This research focuses on developing a machine learning-based sentiment analysis system for
classifying customer reviews collected from the Brazilian e-commerce platform (Olist). With
the rise of online shopping, customer reviews have become a key source of information for
understanding user satisfaction and improving business services. This system integrates
various datasets from the Olist platform, performs preprocessing to clean and transform
textual data, and applies sentiment classification using supervised machine learning models.
Three models (GNB, LRC, and CatBoost) are trained and evaluated using
performance metrics such as accuracy, precision, recall, and F1-score. The research aims to
automate the process of understanding customer sentiment in real time, offering businesses a
reliable and scalable approach to monitor and respond to user feedback.

1.2 Research Motivation

The primary motivation for this research stems from the increasing volume of unstructured
textual feedback available on e-commerce platforms. Manually analyzing this data is not only
time-consuming but also inconsistent and error-prone. Traditional rule-based sentiment
analysis tools often lack the contextual intelligence to understand the true sentiment behind
customer statements. Therefore, there is a growing need to explore machine learning methods
that can handle such data efficiently, adapt to changing language patterns, and provide deeper
insights into customer satisfaction.

1.3 Problem Definition

Despite the availability of large-scale customer reviews, extracting meaningful sentiment
from them remains a challenge due to the complexity and diversity of human language.
Traditional systems fall short in terms of accuracy, scalability, and adaptability. The core
problem this research addresses is the lack of an automated, intelligent, and accurate system
that can classify sentiments (Positive, Neutral, Negative) from customer reviews. The goal is
to design and implement a robust machine learning pipeline that processes, analyzes, and
classifies customer sentiments using real-world data.
1.4 Significance

The significance of this research lies in its potential to revolutionize how e-commerce
platforms understand and respond to customer feedback. By automating sentiment
classification with high accuracy using machine learning models, the system reduces manual
labor, minimizes errors, and enhances customer engagement strategies. The CatBoost model’s
superior performance also highlights the advantage of using advanced algorithms in
sentiment analysis tasks. Ultimately, the research contributes to better customer service, data-
driven decision-making, and increased business competitiveness in the digital market.

1.5 Research Objectives

The primary objective of this research is to design and implement a robust machine learning-
based sentiment analysis system capable of accurately classifying customer reviews from the
Brazilian e-commerce platform (Olist) into Positive, Neutral, or Negative sentiments. To
achieve this, the following specific objectives are established:

• To collect and integrate multiple related datasets from the Olist platform, including
reviews, order information, product details, and customer data, to create a
comprehensive dataset for sentiment analysis.

• To convert unstructured text into numerical features using TF-IDF vectorization,
enabling machine learning algorithms to learn from and classify sentiment effectively.

• To train and evaluate multiple machine learning models (GNB model, LRC
model, and CatBoost) and compare their performance using metrics such as accuracy,
precision, recall, and F1-score.

• To identify the most accurate and efficient model for real-time sentiment
classification in a production-ready e-commerce environment.

• To demonstrate the value of automated sentiment analysis in improving customer
understanding, supporting data-driven decisions, and enhancing overall business
intelligence in online retail platforms.
1.6 Advantages

This research offers several notable advantages by implementing a machine learning-based
sentiment analysis system tailored to real-world e-commerce customer reviews:

• High Accuracy and Performance: The use of advanced models like CatBoost provides
high accuracy (up to 99.76%), significantly outperforming traditional methods.

• Automation of Review Analysis: The system eliminates the need for manual review
categorization, reducing human error and increasing processing efficiency.

• Scalability: The approach is capable of processing large volumes of unstructured
customer reviews, making it ideal for growing e-commerce platforms.

• Language Handling and Text Understanding: With proper preprocessing, tokenization,
and TF-IDF vectorization, the system handles varied language patterns in review text,
including slang and word variations.

• Data-Driven Insights: The model outputs can be used to generate actionable insights
for marketing, customer support, and product development strategies.

1.7 Applications

The outcomes of this research can be directly applied in various domains and real-world
scenarios, particularly in the e-commerce and customer service sectors:

• E-commerce Review Monitoring: Automatically classify and monitor customer
sentiment to improve product listings, logistics, and seller ratings.

• Customer Support Prioritization: Identify negative reviews in real time for immediate
resolution and escalation by support teams.

• Market Research and Product Feedback: Analyze large volumes of feedback to detect
trends, product flaws, and areas for improvement.

• Business Intelligence Dashboards: Integrate with reporting tools to display real-time
sentiment metrics for decision-makers.

• Brand Reputation Management: Track sentiment shifts across product lines or regions
to mitigate risks and respond proactively to customer concerns.
CHAPTER 2

LITERATURE SURVEY

E-commerce has emerged as a transformative global force, reshaping consumer access to
products and services [1,2]. Its growth has been particularly notable post-COVID-19 as
consumers increasingly turn to online platforms [3,4]. This expansion allows businesses,
regardless of size, to tap into markets once beyond their reach, thereby contributing to the
globalization and democratization of trade [5]. Advances in technology, including artificial
intelligence, augmented reality, and blockchain, have further personalized and secured
consumer experiences [6–9].

Thus, e-commerce is more than a passing trend; it is a fundamental aspect of modern
economic dynamics and is imperative for developing countries like Brazil. Brazilian
e-commerce has been mirroring the growth trajectories of the world’s largest economies,
signaling a crucial shift in national consumption patterns [10,11]. This growth in Brazil is
driven by increasing internet penetration and the widespread use of smartphones, which
democratize access to the digital marketplace [12]. The rise of more user-friendly
e-commerce platforms and a growing trust in online transactions are essential in this evolution,
enabling a broader range of consumers to participate in the digital economy and
enjoy its convenience, efficiency, and advantages [13]. The global health crisis intensified
this trend as consumers increasingly sought digital channels for safe and convenient
purchases [11,13].

Retail companies of various sizes, from large multinationals to local startups, have responded
by diversifying their digital services, offering an extensive range of products and services
from electronics to groceries [3]. The digital competition has spurred significant
advancements in marketing channels [14], user experience, and payment options in Brazil,
continually enhancing the e-commerce shopping experience [15]. These advancements
include the optimization of delivery networks [16], the integration of artificial intelligence for
personalized shopping experiences [7,8], and the adoption of secure, efficient payment
gateways [17]. As a result, e-commerce platforms are becoming more user-friendly and
reliable, attracting a broader range of consumers. E-commerce is establishing itself as a
crucial sales channel in Brazil and a driver of socioeconomic change [13], contributing to
increased consumer access, job creation, and economic growth [11].

The sector’s evolution is also encouraging traditional businesses to innovate and adapt to
digital transformations, further solidifying the importance of e-commerce in Brazil’s
economic landscape [18]. As the eighth-largest internet market worldwide, Brazil accounts
for about 42% of B2C (business-to-consumer) e-commerce in Latin America, boasting 150
million users [19].

The Brazilian e-commerce sector has experienced a notable upward trajectory within less
than two decades. In 2019, it recorded around 148.4 million transactions, generating BRL 61.9
billion, marking a 16.3% increase from the previous year [14]. This trend continued, with
revenues reaching BRL 87.4 billion in 2020 (41% growth) and BRL 161 billion in 2021 (a
27% increase). ABComm projects revenues to hit BRL 169.59 billion in 2022 and anticipates
a rise to BRL 186.7 billion in 2023, with projections reaching BRL 273 billion by the end of
the following year [20]. The Brazilian Chamber of Electronic Commerce notes that categories
like office, computer, and communication equipment lead in revenue, followed by furniture,
household appliances, clothing, and footwear [21].

The pandemic has significantly altered consumption patterns [22]. A Mastercard study in
2021 indicated that 56% of consumers turned to online shopping, with 7% new to digital
platforms. Furthermore, about 46% of existing e-commerce users increased online purchases
[14]. Dunnhumby [23] found that 59% of these new digital consumers continued online
shopping after their initial purchase. As a result of this shift towards digital platforms, the
legislative framework governing e-commerce in Brazil has become increasingly important.
The framework is a critical factor shaping the sector’s growth and development. Several key
policies and regulations have been established to support and regulate online business
activities, ensuring a balanced environment for consumers and businesses.
CHAPTER 3

EXISTING SYSTEM

In e-commerce platforms, customer feedback and reviews are vital indicators of user
satisfaction. Traditionally, companies have relied on manual review systems or basic
keyword-based sentiment classification to understand customer emotions. These methods
involve human agents scanning review texts or using predefined dictionaries to detect
sentiment, which is often error-prone, time-consuming, and non-scalable.

Fig. 1: Traditional architecture of comprehensive resource for sales

Traditional System and Workflow

1. Manual Feedback Analysis
Staff manually read and interpret customer reviews to classify sentiment.

2. Keyword-Based Classification
Predefined positive/negative word lists are matched against text to assign sentiment.

3. Excel-Based Tracking
Basic tools like Excel are used to log reviews and flag common issues manually.

4. No Learning Capability
Traditional systems don’t adapt to new vocabulary or evolving customer language
trends.

Limitations of Traditional System

• Scalability Issues: Cannot process thousands of reviews efficiently in real time.

• Low Accuracy: Keyword matching fails to capture context, sarcasm, or negation.

• No Adaptability: Cannot learn from new data or improve over time.

• Manual Effort: High human dependency makes the system costly and error-prone.
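
The negation weakness of keyword matching can be illustrated with a minimal sketch. The word lists and the function name keyword_sentiment are hypothetical, chosen only for this example; they are not taken from any system described in this report.

```python
# Hypothetical word lists; a real keyword system would use a far larger lexicon.
POSITIVE_WORDS = {"good", "great", "excellent", "fast", "love"}
NEGATIVE_WORDS = {"bad", "late", "broken", "terrible", "slow"}


def keyword_sentiment(review: str) -> str:
    """Label a review by counting positive and negative keyword hits."""
    tokens = review.lower().split()
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive_hits > negative_hits:
        return "Positive"
    if negative_hits > positive_hits:
        return "Negative"
    return "Neutral"


print(keyword_sentiment("great product and fast delivery"))  # Positive
print(keyword_sentiment("the product was not good"))         # Positive (negation is missed)
```

The second call shows the limitation: the matcher sees "good" and ignores "not", so a negative review is labeled Positive.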
CHAPTER 4

PROPOSED SYSTEM

4.1 Overview

This research focuses on sentiment analysis of customer review comments from the Brazilian
e-commerce platform Olist. It aims to understand customer satisfaction by analyzing natural
language feedback using machine learning techniques as shown in Fig. 4.1. The dataset used
in this research is derived from multiple interrelated sources provided by Olist, including
order details, payment information, customer and seller data, and most importantly, customer
reviews. By integrating these datasets, the research provides a holistic view of customer
behavior and sentiment patterns.

The central objective of this research is to classify customer review messages into three
sentiment categories: Positive, Negative, and Neutral. To achieve this, the review texts
undergo a series of preprocessing steps including cleaning, tokenization, stopword removal,
and transformation using TF-IDF vectorization. These processed features are then used to
train various machine learning models, namely GNB model, LRC model, and CatBoost
Classifier. The implementation also includes label encoding of categorical variables and the
use of VADER sentiment analysis to label sentiments based on the polarity of the reviews.

An essential aspect of this research is model evaluation and comparison. Each model is tested
on unseen data and assessed using standard performance metrics such as accuracy, precision,
recall, and F1-score. The confusion matrix and classification reports provide deeper insights
into model behavior and the quality of classification. The comparative analysis helps identify
the most effective model for sentiment prediction in this context.
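
To make the evaluation metrics concrete, the following sketch computes a confusion matrix, accuracy, and per-class precision, recall, and F1-score in plain Python. The label order and the toy predictions are assumptions made for illustration; the report itself would most likely rely on scikit-learn's metric functions.

```python
from collections import Counter

LABELS = ["Negative", "Neutral", "Positive"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted label equals the actual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} derived from the confusion matrix."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    scores = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])                    # row sum
        precision = true_positive / predicted_total if predicted_total else 0.0
        recall = true_positive / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy labels, invented for this example.
y_true = ["Positive", "Positive", "Neutral", "Negative", "Negative", "Positive"]
y_pred = ["Positive", "Neutral", "Neutral", "Negative", "Positive", "Positive"]

print(confusion_matrix(y_true, y_pred))
print(round(accuracy(y_true, y_pred), 4))
print(per_class_scores(y_true, y_pred))
```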

Through this research, a scalable and reusable pipeline has been developed for analyzing
user-generated content in the e-commerce sector. It not only contributes to improving
customer experience monitoring but also supports decision-making processes for sellers and
platform managers. The techniques and methodologies implemented in this research are
applicable to other domains involving textual feedback and customer reviews.
Fig. 4.1: Proposed system architecture
4.2 Preprocessing

Preprocessing plays a critical role in preparing the raw dataset for effective analysis and
model training. In this research, multiple CSV files from the Olist e-commerce dataset are
merged to create a comprehensive view of customer orders, reviews, payments, and product
information. The dataset includes both numerical and categorical data, as well as unstructured
text in the form of customer reviews.
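
The merge step might look like the sketch below. The small in-memory tables stand in for the Olist CSV files, and every column name except review_comment_message is an assumption about the dataset layout. In practice each table would be loaded with pd.read_csv.

```python
import pandas as pd

# Tiny stand-ins for the Olist CSV files (assumed columns, invented rows).
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c2", "c3"],
    "order_purchase_timestamp": [
        "2018-01-05 10:15:00", "2018-02-11 21:40:00", "2018-03-02 08:05:00",
    ],
    "order_delivered_customer_date": [
        "2018-01-12 14:00:00", "2018-02-20 09:30:00", "2018-03-09 16:45:00",
    ],
})
reviews = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "review_score": [5, 2, 4],
    "review_comment_message": ["produto excelente", "chegou atrasado", None],
})
payments = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "payment_type": ["credit_card", "boleto", "credit_card"],
    "payment_value": [120.5, 59.9, 230.0],
})

# Left-join reviews and payments onto the orders table, keyed by order_id.
merged = (
    orders
    .merge(reviews, on="order_id", how="left")
    .merge(payments, on="order_id", how="left")
)
print(merged.shape)
```

A left join keeps every order even when a review or payment row is missing, so missing reviews can be handled explicitly in the next step.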

Initially, timestamp fields are converted to datetime objects to extract temporal features such
as day of the week, hour, month, year, and delivery duration. Missing values, particularly in
the review_comment_message field, are removed to ensure data integrity. Categorical and
boolean columns are label encoded to convert them into numerical format required for
machine learning algorithms.
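
The temporal features, missing-value removal, and label encoding described above can be sketched as follows. The column names other than review_comment_message and all data rows are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Invented rows; column names are assumed apart from review_comment_message.
df = pd.DataFrame({
    "order_purchase_timestamp": [
        "2018-01-05 10:15:00", "2018-02-11 21:40:00", "2018-03-02 08:05:00",
    ],
    "order_delivered_customer_date": [
        "2018-01-12 14:00:00", "2018-02-20 09:30:00", "2018-03-09 16:45:00",
    ],
    "payment_type": ["credit_card", "boleto", "credit_card"],
    "review_comment_message": ["produto excelente", "chegou atrasado", None],
})

# Convert timestamp strings to datetime objects.
for column in ("order_purchase_timestamp", "order_delivered_customer_date"):
    df[column] = pd.to_datetime(df[column])

# Extract temporal features from the purchase time.
purchase = df["order_purchase_timestamp"]
df["purchase_dayofweek"] = purchase.dt.dayofweek  # Monday = 0
df["purchase_hour"] = purchase.dt.hour
df["purchase_month"] = purchase.dt.month
df["purchase_year"] = purchase.dt.year
df["delivery_days"] = (df["order_delivered_customer_date"] - purchase).dt.days

# Drop rows without review text, then encode a categorical column as integers.
df = df.dropna(subset=["review_comment_message"]).reset_index(drop=True)
df["payment_type_encoded"] = LabelEncoder().fit_transform(df["payment_type"])

print(df[["purchase_dayofweek", "delivery_days", "payment_type_encoded"]])
```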

For text preprocessing, the review messages are cleaned by converting them to lowercase,
removing punctuation, and eliminating Portuguese stopwords using the NLTK library.
Tokenization is applied to split the review texts into individual words. A new cleaned text
column is created for further sentiment analysis. To generate sentiment labels, VADER
(Valence Aware Dictionary and sEntiment Reasoner) is used, which assigns a polarity score
to each review and classifies it as Positive, Negative, or Neutral. This comprehensive preprocessing
pipeline ensures that the data is structured, clean, and ready for feature extraction and
classification.
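
A minimal sketch of the text cleaning and labeling steps is given below. The stopword set is a small illustrative stand-in for NLTK's Portuguese list. The 0.05 cut-off is the threshold commonly used with VADER's compound score, not a value stated in this report. The function names clean_review and label_from_compound are hypothetical.

```python
import string

# Illustrative stand-in for NLTK's Portuguese stopword list.
PT_STOPWORDS = {"a", "o", "e", "de", "do", "da", "que", "em", "um", "uma", "com"}


def clean_review(text: str) -> str:
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = [token for token in no_punct.split() if token not in PT_STOPWORDS]
    return " ".join(tokens)


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    The threshold is an assumption; the report does not give its cut-off.
    """
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


print(clean_review("O produto chegou rápido, e com ótima qualidade!"))
print(label_from_compound(0.62), label_from_compound(-0.4), label_from_compound(0.0))
```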

4.3 ML Model Building and Training

In this research, the preprocessed customer review data is used to train machine learning
models capable of classifying sentiments. The cleaned textual data is converted into
numerical form using TF-IDF (Term Frequency–Inverse Document Frequency) vectorization,
which captures the importance of words relative to the entire corpus. The sentiment labels
generated using VADER serve as the target variable for supervised learning.
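
The idea behind TF-IDF can be shown with a small worked example. This sketch uses the textbook form tf × log(N / df). Library implementations such as scikit-learn's TfidfVectorizer apply smoothing and normalization, so their numbers differ. The toy corpus is invented.

```python
import math
from collections import Counter


def tfidf(corpus):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    docs = [doc.split() for doc in corpus]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


corpus = ["produto bom", "produto ruim", "entrega rápida produto bom"]
weights = tfidf(corpus)
for doc, vector in zip(corpus, weights):
    print(doc, {term: round(value, 3) for term, value in vector.items()})
```

The word "produto" appears in every document, so its weight is zero, while the rarer word "ruim" receives the largest weight in its document.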

To evaluate model performance and robustness, the data is split into training and testing
subsets. Three models are trained: GNB model, LRC model, and CatBoost Classifier. Each
model is trained, saved, and evaluated using standard classification metrics. The goal is to
identify which algorithm best captures sentiment patterns in customer feedback.
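
The split, vectorize, train, and evaluate workflow could be sketched as below for two of the three models. The reviews and labels are invented, the split ratio and random seed are assumptions, and CatBoost is left out to keep the example small. Fitting the vectorizer on the training split only is a design choice of this sketch, not something the report specifies.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Toy cleaned reviews with hand-written labels (hypothetical, not Olist data).
texts = [
    "produto excelente entrega rapida", "adorei otima qualidade",
    "muito bom recomendo", "perfeito chegou antes prazo",
    "produto ok", "entrega normal", "recebi produto", "conforme anuncio",
    "pessimo produto quebrado", "chegou atrasado ruim",
    "nao recebi produto", "horrivel nao recomendo",
]
labels = ["Positive"] * 4 + ["Neutral"] * 4 + ["Negative"] * 4

X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Learn the vocabulary on the training split, then reuse it for the test split.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

models = {
    "LRC": LogisticRegression(max_iter=1000),
    "GNB": GaussianNB(),
}
results = {}
for name, model in models.items():
    # GaussianNB requires dense input; converting is fine at this toy scale.
    model.fit(X_train.toarray(), y_train)
    predictions = model.predict(X_test.toarray())
    results[name] = accuracy_score(y_test, predictions)

print(results)
```

With so few samples the printed accuracies carry no meaning; the sketch only shows the order of operations.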
1. GNB model

GNB model is a simple probabilistic classifier based on Bayes' theorem. The
internal operation of the GNB model is shown in Fig. 4.2. It assumes that the features (in
this case, TF-IDF weights) are normally distributed within each class and conditionally
independent given the class, which simplifies the computation.

Fig 4.2 : Internal operation of Gaussian Naive Bayes model

Internal Working:

• Calculates the prior probability of each class (Positive, Neutral, Negative).

• Estimates the mean and variance of each feature (TF-IDF weight) for each class.

• Computes the Gaussian likelihood of each feature value under each class and combines
it with the class prior.

• The class with the highest posterior probability is assigned to each review.

• Despite its simplicity, GNB trains efficiently on high-dimensional text data, although
its Gaussian and independence assumptions limit its accuracy.
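
The steps above can be sketched from scratch with NumPy. This is an illustrative re-implementation, not scikit-learn's GaussianNB. The class name TinyGaussianNB, the smoothing constant, and the two-feature toy data are assumptions.

```python
import numpy as np


class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes: priors, per-class mean/variance, argmax posterior."""

    def fit(self, X, y, var_smoothing=1e-9):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Prior probability of each class, stored in log space.
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        # Per-class mean and variance of every feature.
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # A small constant keeps variances positive for constant features.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_smoothing
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Gaussian log-density per (sample, class, feature), summed over features.
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_density = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff**2 / self.var_[None])
        log_posterior = log_density.sum(axis=2) + self.log_prior_[None, :]
        return self.classes_[np.argmax(log_posterior, axis=1)]


# Toy two-feature data standing in for TF-IDF weights.
X_train = [[0.9, 0.1], [0.8, 0.0], [0.1, 0.9], [0.0, 0.8]]
y_train = ["Positive", "Positive", "Negative", "Negative"]

model = TinyGaussianNB().fit(X_train, y_train)
print(model.predict([[0.85, 0.05], [0.05, 0.85]]))
```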
2. LRC model

LRC model is a linear model used for binary or multiclass classification. In this research, it
predicts the probability of each sentiment class based on the weighted sum of input features
derived from TF-IDF.

Fig 4.3 : Internal operation of LRC model.

Internal Working:

• Computes the dot product of input features and weights for each class $k$:
$z_k = \mathbf{w}_k^{\top}\mathbf{x} + b_k$

• Applies the softmax function for multiclass classification to convert logits into
probabilities:
$P(y = k \mid \mathbf{x}) = \dfrac{e^{z_k}}{\sum_{j} e^{z_j}}$

• The model is trained using gradient descent to minimize the cross-entropy loss
between predicted and actual labels.

• LRC model is effective for high-dimensional sparse data and provides interpretable
coefficients.
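
The softmax, cross-entropy, and gradient descent steps listed above can be sketched in NumPy. The learning rate, epoch count, and three-feature toy data are assumptions for this illustration. Library solvers such as those in scikit-learn use more sophisticated optimizers and regularization.

```python
import numpy as np


def softmax(logits):
    """Convert each row of logits into class probabilities."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, y):
    """Mean negative log-probability assigned to the true class."""
    return -np.mean(np.log(probs[np.arange(len(y)), y]))


def train_softmax_regression(X, y, n_classes, lr=0.5, epochs=200):
    """Batch gradient descent on the cross-entropy loss."""
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_classes))
    b = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[y]
    for _ in range(epochs):
        probs = softmax(X @ W + b)                 # weighted sum, then softmax
        grad_logits = (probs - one_hot) / n_samples
        W -= lr * X.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b


# Toy features standing in for TF-IDF weights; classes 0, 1, 2.
X = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0],
              [0.1, 0.9, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]])
y = np.array([0, 0, 1, 1, 2, 2])

W, b = train_softmax_regression(X, y, n_classes=3)
final_probs = softmax(X @ W + b)
print(final_probs.argmax(axis=1), round(cross_entropy(final_probs, y), 4))
```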

3. CatBoost Classifier

CatBoost is a high-performance gradient boosting algorithm specifically optimized for
categorical and textual data. It is part of the boosting family and is known for its ability to
reduce overfitting and handle complex feature interactions.

Fig 4.4 : Internal operation of CatBoost classifier model

Internal Working:

• Builds an ensemble of decision trees sequentially where each tree corrects the errors
of the previous one.

• Uses ordered boosting and minimal variance sampling to reduce prediction shift and
variance.

• CatBoost internally handles categorical features, but in this case, it is applied to
TF-IDF vectors.

• Loss function: Multi-class logarithmic loss (logloss) is minimized during training.

• Highly regularized and efficient, CatBoost performs exceptionally well on complex
datasets with non-linear relationships.
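
The sequential idea, where each tree corrects the errors of the current ensemble, can be illustrated with plain gradient boosting on a binary log loss. This sketch is not CatBoost: it omits ordered boosting, symmetric trees, and multiclass handling, and uses scikit-learn regression trees as weak learners. The function names, hyperparameters, and synthetic data are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def fit_boosted_trees(X, y, n_trees=20, learning_rate=0.3, max_depth=2):
    """Return the fitted trees and the training log loss after each round."""
    raw_score = np.zeros(len(y))            # current ensemble output (log-odds)
    trees, history = [], [log_loss(y, sigmoid(raw_score))]
    for _ in range(n_trees):
        residual = y - sigmoid(raw_score)   # negative gradient of the log loss
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)               # the new tree models the remaining error
        raw_score += learning_rate * tree.predict(X)
        trees.append(tree)
        history.append(log_loss(y, sigmoid(raw_score)))
    return trees, history


# Synthetic data with a non-linear (XOR-like) decision boundary.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = ((X[:, 0] * X[:, 1]) > 0).astype(float)

trees, history = fit_boosted_trees(X, y)
print(round(history[0], 4), round(history[-1], 4))
```

The training loss starts at log 2 for the all-zero ensemble and falls as trees are added, which is the error-correcting behaviour described in the first bullet.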
CHAPTER 5

UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard is
managed by the Object Management Group, which adopted it in 1997. The goal is for UML to
become a common language for creating models of object-oriented computer software. In its
current form, UML comprises two major components: a meta-model and a notation. In
the future, some form of method or process may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing,
constructing, and documenting the artifacts of software systems, as well as for business
modeling and other non-software systems. The UML represents a collection of best
engineering practices that have proven successful in the modeling of large and complex
systems. The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects.

GOALS: The primary goals in the design of the UML are as follows:

• Provide users with a ready-to-use, expressive visual modeling language so that they can
develop and exchange meaningful models.

• Provide extensibility and specialization mechanisms to extend the core concepts.

• Be independent of particular programming languages and development processes.

• Provide a formal basis for understanding the modeling language.

• Encourage the growth of the OO tools market.

• Support higher-level development concepts such as collaborations, frameworks,
patterns and components.

• Integrate best practices.


Class diagram

The class diagram is used to refine the use case diagram and define a detailed design of the
system. The class diagram classifies the actors defined in the use case diagram into a set of
interrelated classes. The relationship or association between the classes can be either an "is-a"
or "has-a" relationship. Each class in the class diagram is capable of providing certain
functionalities. These functionalities provided by the class are termed "methods" of the class.
Apart from this, each class may have certain "attributes" that uniquely identify the class.

Figure-5.1: Class Diagram


Sequence Diagram

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. A sequence diagram shows, as parallel vertical lines (“lifelines”),
different processes or objects that live simultaneously, and as horizontal arrows, the messages
exchanged between them, in the order in which they occur. This allows the specification of
simple runtime scenarios in a graphical manner.

Figure-5.2: Sequence Diagram

Activity diagram

Activity diagrams are graphical representations of workflows of stepwise activities and
actions with support for choice, iteration, and concurrency. In the Unified Modeling
Language, activity diagrams can be used to describe the business and operational step-by-step
workflows of components in a system. An activity diagram shows the overall flow of control.

Figure-5.3: Activity Diagram


Data flow diagram

A data flow diagram (DFD) is a graphical representation of how data moves within an
information system. It is a modeling technique used in system analysis and design to illustrate
the flow of data between various processes, data stores, data sources, and data destinations
within a system or between systems. Data flow diagrams are often used to depict the structure
and behavior of a system, emphasizing the flow of data and the transformations it undergoes
as it moves through the system.

Figure-5.4: Dataflow Diagram


Component diagram: A component diagram describes the organization and wiring of the
physical components in a system.

Figure-5.5: Component Diagram.

Use Case diagram: A use case diagram in the Unified Modeling Language (UML) is a type
of behavioral diagram defined by and created from a Use-case analysis. Its purpose is to
present a graphical overview of the functionality provided by a system in terms of actors,
their goals (represented as use cases), and any dependencies between those use cases. The
main purpose of a use case diagram is to show what system functions are performed for
which actor. Roles of the actors in the system can be depicted.

Figure-5.6: Use Case Diagram

Deployment Diagram:

A deployment diagram in UML illustrates the physical arrangement of hardware and software
components in the system. It visualizes how different software artifacts, such as data
processing scripts and model training components, are deployed across hardware nodes and
interact with each other, providing insight into the system’s infrastructure and deployment
strategy.

Figure-5.7: Deployment Diagram.


CHAPTER 6

SOFTWARE ENVIRONMENT

6.1 Software Requirements

Python is a high-level, interpreted programming language known for its simplicity and
readability, which makes it a popular choice for beginners as well as experienced developers.
Key features of Python include its dynamic typing, automatic memory management, and a
rich standard library that supports a wide range of applications from web development to data
science and machine learning. Its object-oriented approach and support for multiple
programming paradigms allow developers to write clear, maintainable code. Python's
extensive ecosystem of third-party packages further enhances its capabilities, enabling rapid
development and prototyping across diverse fields.

Installation

First, download the appropriate installer from the official Python website
(https://www.python.org/downloads/release/python-376/). For Windows users, run the
executable installer and be sure to check the "Add Python to PATH" option during installation;
for macOS and Linux, follow the respective package installation commands or use a package
manager like Homebrew or apt-get. After installation, verify the setup by running
python --version or python3 --version in your terminal or command prompt, which should display
"Python 3.7.6". This version-specific installation supports all major functionalities and
libraries compatible with Python 3.7.6, making it a suitable foundation for developing
robust applications in areas such as data analysis, machine learning, and GUI development.

1. Programming Language

• Python 3.x: Used for data preprocessing, model building, evaluation, and backend
logic.

2. Libraries and Packages

• Pandas – For data manipulation and analysis

• NumPy – For numerical operations

• Scikit-learn – Provides tools for preprocessing, TF-IDF vectorization, model training
(LRC model, GaussianNB), evaluation metrics, and train-test splitting.

• Matplotlib & Seaborn – For EDA and visualizations

• Joblib – For model serialization and loading

• warnings – A module to handle warning messages, ensuring cleaner output by filtering
unnecessary alerts.

• sklearn.model_selection (train_test_split, cross_val_score, GridSearchCV) – A
module for splitting data into training and test sets, performing cross-validation, and
hyperparameter tuning.

• LabelEncoder – Part of sklearn.preprocessing; used to convert categorical labels (like
sentiment classes) into numeric form.

• CatBoost – Gradient boosting library used for high-performance model training and
handling categorical features

• Random Forest Classifier – An ensemble learning method that builds multiple
decision trees to provide robust predictions.

• sklearn.metrics (accuracy_score, precision_score, recall_score, f1_score,
confusion_matrix, classification_report) – A module providing functions to evaluate
model performance using various metrics.

3. Operating System

• Windows/Linux/macOS – Compatible with any OS supporting Python and Flask.

4. Development Environment (Optional)

• Jupyter Notebook / VS Code / PyCharm – For code development and debugging

• Anaconda – For managing Python environments and dependencies

5. Package Manager

• pip – For installing required Python packages
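
A quick way to confirm that the environment is ready is to check which packages can be imported. The sketch below uses only the standard library. The module list is an assumption based on the packages named above, using their usual import names (for example, sklearn for Scikit-learn). The helper name missing_modules is hypothetical.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above.
REQUIRED_MODULES = ["pandas", "numpy", "sklearn", "matplotlib", "seaborn", "joblib", "catboost"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python", ".".join(map(str, sys.version_info[:3])))
print("Missing packages:", missing_modules() or "none")
```

Any name reported as missing can then be installed with pip.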

6.2 Hardware Requirements

Python 3.7.6 can run efficiently on most modern systems with minimal hardware
requirements. However, meeting the recommended specifications ensures better performance,
especially for developers handling large-scale applications or computationally intensive tasks.
By ensuring compatibility with the hardware and operating system, developers can leverage the
full potential of Python 3.7.6.

Processor (CPU) Requirements: Python 3.7.6 is a lightweight programming language that
can run on various processors, making it highly versatile. However, for optimal performance,
the following processor specifications are recommended:

• Minimum Requirement: 1 GHz single-core processor.

• Recommended: Dual-core or quad-core processors with a clock speed of 2 GHz or
higher. Using a multi-core processor allows Python applications, particularly those
involving multithreading or multiprocessing, to execute more efficiently.

Memory (RAM) Requirements: Python 3.7.6 does not demand excessive memory but
requires adequate RAM for smooth performance, particularly for running resource-intensive
applications such as data processing, machine learning, or web development.

• Minimum Requirement: 512 MB of RAM.

• Recommended: 4 GB or higher for general usage. For data-intensive operations, 8
GB or more is advisable.

Insufficient RAM can cause delays or crashes when handling large datasets or executing
computationally heavy programs.

Storage Requirements: Python 3.7.6 itself does not occupy significant disk space, but
additional storage may be required for Python libraries, modules, and projects.

 Minimum Requirement: 200 MB of free disk space for installation.

 Recommended: At least 1 GB of free disk space to accommodate libraries and
dependencies.

Developers using Python for large-scale projects or data science should allocate more storage
to manage virtual environments, datasets, and frameworks like TensorFlow or PyTorch.

Compatibility with Operating Systems: Python 3.7.6 is compatible with most operating
systems but requires hardware that supports the respective OS. Below are general
requirements for supported operating systems:

 Windows: 32-bit and 64-bit systems, Windows 7 or later.


 macOS: macOS 10.9 or later.

 Linux: Supports a wide range of distributions, including Ubuntu, CentOS, and Fedora.

The hardware specifications for the OS directly impact Python’s performance, particularly for
modern software development.
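The recommended figures above can be checked on a target machine with a short script. The following is a minimal sketch using only the standard library; the function name system_report and the threshold constants are assumptions made for illustration, and RAM is not checked because the standard library offers no portable way to read it.

```python
import os
import platform
import shutil
import sys

MIN_PYTHON = (3, 7)        # the report targets Python 3.7.6
RECOMMENDED_CORES = 2      # "dual-core or quad-core"
RECOMMENDED_FREE_GB = 1.0  # "at least 1 GB of free disk space"


def system_report(path="."):
    """Compare the current machine against the recommended specifications."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return {
        "os": platform.system(),
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "cores": cores,
        "cores_ok": cores >= RECOMMENDED_CORES,
        "free_gb": round(free_gb, 1),
        "disk_ok": free_gb >= RECOMMENDED_FREE_GB,
    }


report = system_report()
for key, value in report.items():
    print(f"{key}: {value}")
```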



CHAPTER 7

FUNCTIONAL AND NON-FUNCTIONAL REQUIREMENTS

For an intelligent sentiment analysis system to operate effectively, it must meet a set of
functional and non-functional requirements. These requirements ensure the system performs
its core tasks (data processing, model training, sentiment prediction) reliably, efficiently, and
within acceptable performance bounds.

7.1 Functional Requirements

Functional requirements describe the core services the system must provide to fulfill its
objectives.

 Data Integration: The system must load and merge various datasets (orders, reviews,
payments, etc.).

 Preprocessing Module: Must clean and prepare review texts (tokenization, stopword
removal, etc.).

 Sentiment Classification: Should accurately label reviews as Positive, Neutral, or
Negative.

 Model Training: Allows training of machine learning models on processed data.

 Evaluation and Reporting: Computes metrics (accuracy, F1-score) and visualizes
performance.

7.2 Non-Functional Requirements

Non-functional requirements define the quality attributes and constraints of the system.

 Scalability: The system should handle large datasets without performance
degradation.

 Accuracy: The sentiment predictions should meet or exceed acceptable performance
metrics (e.g., 80%+ accuracy).

 Usability: Should provide clean output for model results and charts.

 Maintainability: Code should be modular, well-documented, and easy to update.

 Portability: Should run on different systems (Windows/Linux) with minimal setup.


7.3 System Study

A feasibility study evaluates whether the system is practical and beneficial from technical,
operational, and economic perspectives. This ensures that the project is viable before further
investment.

Aspect                   Description

Technical Feasibility    The system is technically feasible with Python libraries like NLTK,
                         Scikit-learn, CatBoost, and standard hardware.

Operational Feasibility  End users (analysts or business users) can easily operate the system
                         with minimal training.

Economic Feasibility     Implementation cost is low due to the use of open-source tools and
                         reusable codebase.

Schedule Feasibility     The system can be developed, trained, and evaluated within a
                         reasonable time frame.

Legal Feasibility        No legal constraints as data is publicly available and open-source
                         packages are used.

CHAPTER 8

SOURCE CODE

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import string

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             classification_report, confusion_matrix)
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from catboost import CatBoostClassifier
from sklearn.ensemble import VotingClassifier
import joblib

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# NLTK resources used below; newer NLTK releases may also require 'punkt_tab'.
nltk.download('stopwords', quiet=True)
nltk.download('punkt', quiet=True)
nltk.download('vader_lexicon', quiet=True)

def load_and_merge_data():
    """Load and merge multiple datasets."""
    df_items = pd.read_csv("dataset/olist_order_items_dataset.csv")
    df_reviews = pd.read_csv("dataset/olist_order_reviews_dataset.csv")
    df_orders = pd.read_csv("dataset/olist_orders_dataset.csv")
    df_products = pd.read_csv("dataset/olist_products_dataset.csv")
    df_customers = pd.read_csv("dataset/olist_customers_dataset.csv")
    df_sellers = pd.read_csv("dataset/olist_sellers_dataset.csv")
    df_payments = pd.read_csv("dataset/olist_order_payments_dataset.csv")

    # Merge datasets
    df = df_orders.merge(df_items, on='order_id', how='inner')
    df = df.merge(df_payments, on='order_id', how='inner', validate='m:m')
    df = df.merge(df_reviews, on='order_id', how='inner')
    df = df.merge(df_products, on='product_id', how='inner')
    df = df.merge(df_customers, on='customer_id', how='inner')
    df = df.merge(df_sellers, on='seller_id', how='inner')
    return df

def preprocess_data(df):
    """Preprocess the dataset, converting object/boolean to numerical and creating features."""
    # Convert datetime columns
    df['order_purchase_timestamp'] = pd.to_datetime(df['order_purchase_timestamp'])
    df['order_delivered_customer_date'] = pd.to_datetime(df['order_delivered_customer_date'])

    # Create features
    df['day_of_week_int'] = df['order_purchase_timestamp'].dt.weekday + 1
    df['hour'] = df['order_purchase_timestamp'].dt.hour
    df['month'] = df['order_purchase_timestamp'].dt.month
    df['year'] = df['order_purchase_timestamp'].dt.year
    df['delivery_time'] = (df['order_delivered_customer_date'] -
                           df['order_purchase_timestamp']).dt.days

    # Handle review comments: keep only rows that have a review message
    df = df.dropna(subset=['review_comment_message']).reset_index(drop=True)

    # Convert object/boolean columns to numerical
    le = LabelEncoder()
    for col in df.select_dtypes(include=['object', 'bool']).columns:
        if col not in ['review_comment_message', 'review_comment_title']:  # Keep text columns
            df[col] = le.fit_transform(df[col].astype(str))

    # Clean and tokenize review comments
    STOP_WORDS = set(stopwords.words('portuguese'))

    def clean_and_tokenize(text):
        if not isinstance(text, str):
            return ""
        cleaned_text = text.lower().translate(str.maketrans('', '', string.punctuation))
        words = word_tokenize(cleaned_text)
        filtered_words = [word for word in words if word not in STOP_WORDS]
        return " ".join(filtered_words)

    df['review_comment_message_clean'] = df['review_comment_message'].apply(clean_and_tokenize)

    # Sentiment analysis
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()

    def get_sentiment(text):
        scores = analyzer.polarity_scores(text)
        if scores['compound'] >= 0.05:
            return 'Positive'
        elif scores['compound'] <= -0.05:
            return 'Negative'
        else:
            return 'Neutral'

    df['sentiment'] = df['review_comment_message_clean'].apply(get_sentiment)
    return df

def split_data(df):
    """Split data into training and testing sets."""
    X = df['review_comment_message_clean']
    y = df['sentiment']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Vectorize text (fit on the training split only, then reuse for the test split)
    vectorizer = TfidfVectorizer(max_features=5000)
    X_train_tfidf = vectorizer.fit_transform(X_train).toarray()
    X_test_tfidf = vectorizer.transform(X_test).toarray()

    # Encode labels
    label_encoder = LabelEncoder()
    y_train_encoded = label_encoder.fit_transform(y_train)
    y_test_encoded = label_encoder.transform(y_test)

    # Save vectorizer and label encoder
    os.makedirs('model', exist_ok=True)
    joblib.dump(vectorizer, 'model/vectorizer.pkl')
    joblib.dump(label_encoder, 'model/label_encoder.pkl')
    return X_train_tfidf, X_test_tfidf, y_train_encoded, y_test_encoded, label_encoder

def train_gaussian_nb(X_train, y_train):
    """Train and save GaussianNB model."""
    model_path = 'model/gaussian_nb.pkl'
    if os.path.exists(model_path):
        model = joblib.load(model_path)  # reuse the saved model instead of retraining
    else:
        model = GaussianNB()
        model.fit(X_train, y_train)
        joblib.dump(model, model_path)
    return model


def train_logistic_regression(X_train, y_train):
    """Train and save LogisticRegression model."""
    model_path = 'model/logistic_regression.pkl'
    if os.path.exists(model_path):
        model = joblib.load(model_path)
    else:
        model = LogisticRegression(random_state=42, max_iter=1000)
        model.fit(X_train, y_train)
        joblib.dump(model, model_path)
    return model


def train_catboost(X_train, y_train):
    """Train and save CatBoostClassifier model."""
    model_path = 'model/catboost.pkl'
    if os.path.exists(model_path):
        model = joblib.load(model_path)
    else:
        model = CatBoostClassifier(verbose=0, random_state=42)
        model.fit(X_train, y_train)
        joblib.dump(model, model_path)
    return model

def evaluate_performance(model, X_test, y_test, model_name, label_encoder):
    """Evaluate model performance with detailed metrics."""
    y_pred = model.predict(X_test)

    # Calculate metrics (weighted by the number of true samples per class)
    metrics = {
        'Model': model_name,
        'Accuracy': accuracy_score(y_test, y_pred),
        'Precision': precision_score(y_test, y_pred, average='weighted'),
        'Recall': recall_score(y_test, y_pred, average='weighted'),
        'F1-Score': f1_score(y_test, y_pred, average='weighted')
    }

    # Classification Report
    print(f"\nClassification Report for {model_name}:")
    print(classification_report(y_test, y_pred, target_names=label_encoder.classes_))

    # Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=label_encoder.classes_,
                yticklabels=label_encoder.classes_)
    plt.title(f'Confusion Matrix - {model_name}')
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.show()
    return metrics

from IPython.display import display  # display() is built in only inside Jupyter

df = load_and_merge_data()
df  # notebook cell output: inspect the merged frame

df = preprocess_data(df)
df  # notebook cell output: inspect the preprocessed frame

X_train, X_test, y_train, y_test, label_encoder = split_data(df)

models = {
    'GaussianNB': train_gaussian_nb(X_train, y_train),
    'LogisticRegression': train_logistic_regression(X_train, y_train),
    'CatBoost': train_catboost(X_train, y_train)
}

# Evaluate models
results = []
for name, model in models.items():
    metrics = evaluate_performance(model, X_test, y_test, name, label_encoder)
    results.append(metrics)

results_df = pd.DataFrame(results)
styled_df = results_df.style.highlight_max(
    subset=['Accuracy', 'Precision', 'Recall', 'F1-Score'],
    color='lightblue'
)

print("\nModel Performance Comparison:")
display(styled_df)

CHAPTER 9

RESULTS AND DISCUSSION

9.1 Implementation Description

This chapter provides a detailed explanation of the implementation steps followed in the
sentiment analysis model using the Olist Brazilian E-commerce dataset. The pipeline includes
data integration, preprocessing, feature extraction, model training, evaluation, and
comparison of classification models.

1. Importing Required Libraries

Essential Python libraries such as pandas, numpy, matplotlib, seaborn, nltk, sklearn, catboost,
and joblib are imported to handle data manipulation, visualization, machine learning, natural
language processing, and model persistence.

2. Data Loading and Merging

Function: load_and_merge_data()

 Reads seven different CSV files from the Olist dataset:

o order_items, order_reviews, orders, products, customers, sellers,
order_payments.

 Merges them into a single DataFrame using keys like order_id, product_id,
customer_id, and seller_id.

 This unified dataset allows contextual sentiment analysis by combining review texts
with relevant product and transaction data.
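One property of these inner merges is worth seeing on a small example: when an order has several items and several payments, joining both tables on order_id yields one row per item-payment combination, so the same review text is repeated. The sketch below uses plain Python dictionaries rather than pandas, and the helper inner_join and the toy rows are invented for illustration.

```python
def inner_join(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    # Every left row pairs with every matching right row.
    return [{**l, **r} for l in left for r in by_key.get(l[key], [])]


# Toy data: one order with two items, two payments and one review.
orders = [{"order_id": "o1", "customer_id": "c1"}]
items = [{"order_id": "o1", "product_id": "p1"},
         {"order_id": "o1", "product_id": "p2"}]
payments = [{"order_id": "o1", "payment_type": "credit_card"},
            {"order_id": "o1", "payment_type": "voucher"}]
reviews = [{"order_id": "o1", "review_comment_message": "Produto excelente"}]

merged = orders
for table in (items, payments, reviews):
    merged = inner_join(merged, table, "order_id")

print(len(merged))  # 2 items x 2 payments x 1 review = 4 rows
```

Because the single review appears in all four merged rows, identical review texts can end up on both sides of a later random train-test split. Dropping duplicate reviews before splitting is one possible safeguard.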

3. Data Preprocessing

Function: preprocess_data(df)

 Datetime Parsing: Converts order_purchase_timestamp and
order_delivered_customer_date into datetime format.

 Feature Engineering:

o Extracts day_of_week_int, hour, month, year, and delivery_time from timestamps.


 Text Cleaning:

o Removes null reviews.

o Label-encodes categorical variables (except review text).

 Tokenization and Stopword Removal:

o Cleans review_comment_message by removing punctuation and converting to
lowercase.

o Tokenizes and filters out Portuguese stopwords using NLTK.

 Sentiment Labeling:

o Applies NLTK's VADER sentiment analyzer to generate compound polarity
scores.

o Classifies reviews into Positive, Negative, or Neutral.
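The cleaning and labeling steps above can be sketched without NLTK as follows. This is a simplified illustration, not the pipeline itself: the stopword set is a small hypothetical subset of the Portuguese list, whitespace splitting stands in for word_tokenize, and label_from_compound takes a compound score as input instead of calling VADER. The ±0.05 thresholds are the ones used in the source code.

```python
import string

# Hypothetical subset of Portuguese stopwords, for illustration only.
STOP_WORDS = {"o", "a", "e", "é", "de", "do", "da", "que", "muito"}


def clean_and_tokenize(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stopwords."""
    if not isinstance(text, str):
        return ""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in cleaned.split() if word not in STOP_WORDS)


def label_from_compound(compound):
    """Map a compound polarity score to one of the three sentiment classes."""
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"


cleaned = clean_and_tokenize("O produto é ÓTIMO, chegou rápido!")
print(cleaned)
print(label_from_compound(0.0))
```

VADER's lexicon is English, so Portuguese words that are absent from it contribute nothing to the compound score. This is one plausible reason why most reviews fall into the Neutral band.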

4. Data Splitting and Vectorization

Function: split_data(df)

 Target & Feature Extraction:

o Uses the cleaned review comments as X (features) and sentiment labels as y.

 Train-Test Split:

o Splits the dataset (80% training, 20% testing).

 Text Vectorization:

o Transforms text using TF-IDF (max_features=5000) for numerical
representation.

 Label Encoding:

o Encodes sentiment labels for classification.

 Serialization:

o Saves the vectorizer and label encoder as .pkl files for reuse.
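To make the vectorization step concrete, the following dependency-free sketch computes TF-IDF weights for three toy reviews. It assumes the smoothed inverse document frequency, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row, which matches the default behavior of scikit-learn's TfidfVectorizer as far as this sketch is concerned. Tokenization here is plain whitespace splitting, and the vocabulary is not capped at 5000 terms.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) with smoothed, L2-normalized TF-IDF weights."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        raw = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocab, rows


# Toy cleaned reviews, invented for illustration.
docs = ["produto bom", "produto ruim", "entrega rapida produto bom"]
vocab, rows = tfidf(docs)
for term, weight in zip(vocab, rows[0]):
    print(f"{term}: {weight:.3f}")
```

The term "produto" occurs in every review, so it receives the lowest idf and a smaller weight than "bom" within the first review.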
5. Model Training

Functions:

 train_gaussian_nb(X_train, y_train)

 train_logistic_regression(X_train, y_train)

 train_catboost(X_train, y_train)

 Trains three different classification models:

o GNB model (simple probabilistic model)

o LRC model (linear classifier)

o CatBoost Classifier (gradient boosting optimized for categorical data)

 Models are saved using joblib to avoid retraining in future runs.
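All three training functions share the same load-or-train pattern. A minimal sketch of that pattern is shown below with the standard pickle module in place of joblib and a trivial majority-label "model" in place of a real classifier. The names load_or_train and train_majority are hypothetical.

```python
import os
import pickle
import tempfile
from collections import Counter


def train_majority(X, y):
    """Stand-in for model.fit(): remember the most frequent label."""
    return {"majority_label": Counter(y).most_common(1)[0][0]}


def load_or_train(model_path, train_fn, X_train, y_train):
    """Load a saved model if the file exists; otherwise train and save it."""
    if os.path.exists(model_path):
        with open(model_path, "rb") as fh:
            return pickle.load(fh)
    model = train_fn(X_train, y_train)
    os.makedirs(os.path.dirname(model_path) or ".", exist_ok=True)
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    return model


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model", "majority.pkl")
    first = load_or_train(path, train_majority, ["a", "b", "c"],
                          ["Neutral", "Neutral", "Positive"])
    # Second call passes different labels, but the cached file wins.
    second = load_or_train(path, train_majority, ["a", "b"],
                           ["Negative", "Negative"])

print(first, second)
```

The second call returns the cached model even though the training data changed. With this pattern, the saved .pkl files therefore need to be deleted whenever the dataset or preprocessing is modified.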

6. Model Evaluation

Function: evaluate_performance(model, X_test, y_test, model_name, label_encoder)

 Predicts sentiments on test data using each model.

 Calculates evaluation metrics:

o Accuracy, Precision, Recall, F1-score (weighted average).

 Generates:

o A classification report (per class performance).

o A confusion matrix heatmap for visual inspection of predictions.

 Aggregates the performance of all models for comparison.
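The weighted averages used here can be derived directly from a confusion matrix. The sketch below computes them in plain Python for a small invented 3x3 matrix (rows are true classes, columns are predicted classes). It also shows why the weighted recall always equals the accuracy: each class recall is weighted by its share of the samples, so the weighted sum reduces to the total of the diagonal divided by the number of samples.

```python
def weighted_metrics(cm):
    """Accuracy and support-weighted precision, recall and F1 from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision = recall = f1 = 0.0
    for i in range(n):
        tp = cm[i][i]
        support = sum(cm[i])                        # true samples of class i
        predicted = sum(cm[r][i] for r in range(n))  # samples predicted as class i
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented toy matrix, not taken from the experiments.
toy_cm = [[8, 2, 0],
          [1, 6, 3],
          [0, 1, 9]]
scores = weighted_metrics(toy_cm)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```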

7. Model Comparison

 Collects performance metrics of all three models into a DataFrame.

 Uses Pandas style.highlight_max() to highlight the best-performing model across all
evaluation metrics.

 Displays final model comparison results.
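The comparison step amounts to taking the maximum of each metric column. As a small sketch, the same selection can be done in plain Python on the values reported in Table 9.1; the variable names are chosen for illustration.

```python
# Metric values as reported in Table 9.1.
results = [
    {"Model": "GNB model", "Accuracy": 0.307870, "Precision": 0.886172,
     "Recall": 0.307870, "F1-Score": 0.408502},
    {"Model": "LRC model", "Accuracy": 0.990238, "Precision": 0.990163,
     "Recall": 0.990238, "F1-Score": 0.989183},
    {"Model": "Proposed CatBoost model", "Accuracy": 0.997585, "Precision": 0.997589,
     "Recall": 0.997585, "F1-Score": 0.997539},
]
METRICS = ["Accuracy", "Precision", "Recall", "F1-Score"]

# For each metric, pick the model with the highest value.
best = {m: max(results, key=lambda row: row[m])["Model"] for m in METRICS}
for metric, model in best.items():
    print(f"{metric}: {model}")
```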


9.2 Dataset Description

The dataset used in this research is derived from the Brazilian e-commerce platform Olist and
comprises multiple integrated components, including customer reviews, order timelines,
product and seller details, and transactional metadata. After merging and preprocessing, the
dataset contains enriched records that represent each order with associated review text and
delivery attributes. Key features include timestamps of purchase and delivery, product and
seller information, and cleaned review messages. Additional derived features such as delivery
time, day of the week, and hour of order provide valuable temporal insights. Each review is
classified with a sentiment label—Positive, Neutral, or Negative—using VADER sentiment
analysis. This structured and feature-rich dataset forms the foundation for building accurate
and scalable machine learning models for automated sentiment classification, enabling
meaningful interpretation of customer satisfaction patterns in the e-commerce domain.

Table 9.2: Dataset description

Column Name                    Description

order_id                       Unique identifier for each order placed on the e-commerce platform.

customer_id                    Unique identifier for the customer who placed the order.

order_status                   Encoded value indicating the status of the order (e.g., delivered, shipped).

order_purchase_timestamp       Timestamp when the order was placed by the customer.

order_approved_at              Timestamp when the payment for the order was approved.

order_delivered_carrier_date   Encoded value representing the date when the order was handed over to
                               the delivery carrier.

order_delivered_customer_date  Actual date when the order was delivered to the customer.

order_estimated_delivery_date  Estimated date of delivery provided to the customer at the time of
                               purchase.

order_item_id                  Item count per order (usually 1 per row, but orders with multiple items
                               will have multiple rows).

product_id                     Encoded product identifier.

seller_zip_code_prefix         Encoded postal code of the seller's location.

seller_city                    Encoded city code where the seller is located.

seller_state                   Encoded state code of the seller's address.

day_of_week_int                Numeric representation of the weekday on which the order was placed
                               (1 = Monday, 7 = Sunday).

hour                           Hour of the day when the order was placed (0–23).

month                          Month when the order was placed (1–12).

year                           Year in which the order was placed.

delivery_time                  Number of days taken to deliver the product to the customer (calculated
                               from order_delivered_customer_date - order_purchase_timestamp).

review_comment_message_clean   Preprocessed version of the customer review message (lowercased,
                               tokenized, stopwords removed, punctuation stripped).

sentiment                      Sentiment class assigned to the review (Positive, Neutral, Negative)
                               based on sentiment scoring.
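The derived temporal columns in Table 9.2 can be illustrated for a single order with the standard datetime module. This is a sketch with invented timestamps, and order_features is a hypothetical helper; the pipeline itself computes the same quantities column-wise with pandas.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def order_features(purchased, delivered):
    """Derive the temporal features for one order from its two timestamps."""
    p = datetime.strptime(purchased, FMT)
    d = datetime.strptime(delivered, FMT)
    return {
        "day_of_week_int": p.weekday() + 1,  # weekday() is 0 for Monday
        "hour": p.hour,
        "month": p.month,
        "year": p.year,
        "delivery_time": (d - p).days,       # whole days only
    }


# Invented example: ordered on a Monday evening, delivered on Friday morning.
features = order_features("2018-01-08 21:30:00", "2018-01-12 09:00:00")
print(features)
```

The elapsed time in this example is three days and eleven and a half hours. Since only the whole-day component is kept, delivery_time is 3.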

9.3 Results Analysis

In Fig. 9.1 (a), the GNB model shows a large number of misclassifications,
particularly for the Neutral class. Out of all Neutral samples, many are incorrectly predicted
as either Negative or Positive, which results in high off-diagonal values. Specifically, it
correctly classifies 3,035 Neutral instances but misclassifies 2,509 as Negative and 3,567
as Positive. The model struggles to separate the three sentiment classes, indicating that it
cannot model the complexity of the data accurately, likely because its assumptions of feature
independence and normally distributed features do not hold well for TF-IDF text data.
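The three Neutral-row counts quoted above are enough to work out the GNB model's recall on the Neutral class. The short calculation below uses only those figures.

```python
# Neutral-row counts quoted in the text for the GNB model.
correct_neutral = 3035
misclassified_neutral = (2509, 3567)

total_neutral = correct_neutral + sum(misclassified_neutral)
neutral_recall = correct_neutral / total_neutral

print(total_neutral)             # 9111
print(round(neutral_recall, 3))  # 0.333
```

The total of 9,111 Neutral test samples matches the number of Neutral reviews that the LRC model classifies correctly in Fig. 9.1 (b). The GNB model therefore recovers only about one third of the Neutral class.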

Fig. 9.1: Confusion matrices obtained using (a) GNB model, (b) LRC model, and (c) CatBoost
model.

Fig. 9.1 (b) shows the confusion matrix for the LRC model, which performs much better.
The majority of predictions fall on the diagonal, indicating correct classifications. It
correctly identifies 9,111 Neutral reviews and 704 Positive reviews, with very few
misclassifications (e.g., 54 Positive samples labeled Neutral). The LRC model demonstrates
strong capability in distinguishing Neutral sentiment, with minor confusion between Positive
and Neutral, and minimal mislabeling of Negative reviews.

Fig. 9.1 (c) illustrates the performance of the CatBoost Classifier, which achieves
near-perfect classification. It correctly classifies 9,110 Neutral and 746 Positive reviews,
with only a handful of misclassified samples (e.g., 13 Positive reviews as Neutral and 1
Neutral review as Positive). The Negative class also shows improved predictions, with 56
correctly predicted Negative samples. This model's superior performance is attributed to its
gradient boosting mechanism, ability to handle categorical features, and robustness to
overfitting, making it the most accurate and reliable model among the three.

Table 9.1 presents a comparative analysis of the performance of the three machine learning
models—GNB model, LRC model, and CatBoost Classifier—used in this research for
sentiment classification. The evaluation metrics considered are Accuracy, Precision, Recall,
and F1-Score, which provide a comprehensive view of each model’s performance.

The results clearly show that CatBoost outperforms the other models in all metrics, followed
closely by LRC model, while GNB model lags significantly in terms of accuracy and recall.
These observations suggest that advanced models like CatBoost are more capable of
capturing the nuanced patterns in textual data.

Table 9.1: Performance comparison of existing and proposed classification models.

Model                     Accuracy   Precision  Recall     F1-Score

GNB model                 0.307870   0.886172   0.307870   0.408502

LRC model                 0.990238   0.990163   0.990238   0.989183

Proposed CatBoost model   0.997585   0.997589   0.997585   0.997539

Key Observations:

 GNB model has high weighted precision but low accuracy and recall: as Fig. 9.1 (a)
shows, it assigns most Neutral reviews to the Negative or Positive class.

 LRC model performs excellently, with about 99% on all metrics.

 CatBoost achieves the highest performance overall, making it the most effective
model for this sentiment analysis task.

The performance comparison of the three machine learning models (GNB model, LRC
model, and CatBoost) demonstrates a clear distinction in their effectiveness for
sentiment classification of customer reviews. Gaussian Naive Bayes, while showing
relatively high precision (0.886), suffers from low accuracy (0.308), recall (0.308), and
F1-score (0.409), because it misclassifies roughly two thirds of the dominant Neutral
class as Negative or Positive. LRC model performs significantly better, achieving about
99% across all metrics, reflecting its strong ability to distinguish between sentiment
classes on the test set. However, the CatBoost Classifier surpasses both, with the highest
accuracy (0.9976), precision (0.9976), recall (0.9976), and F1-score (0.9975), showing its
superior capability in capturing complex patterns within the textual data through gradient
boosting. This comparison underscores the effectiveness of ensemble models like CatBoost
in handling high-dimensional, imbalanced, and non-linear textual data compared with
traditional linear or probabilistic models.
CHAPTER 10

CONCLUSION AND FUTURE SCOPE

Conclusion

This research successfully developed a machine learning-based sentiment analysis system for
customer reviews in the Brazilian e-commerce dataset (Olist). Through extensive data
preprocessing—including text cleaning, tokenization, and TF-IDF vectorization—the system
was able to convert unstructured review messages into meaningful numerical features.
Multiple models were trained and evaluated, including GNB model, LRC model, and
CatBoost Classifier. The comparative analysis revealed that CatBoost significantly
outperformed the other models, achieving an accuracy of 99.76%, along with high precision,
recall, and F1-score. LRC model also performed well with a 99.02% accuracy, while GNB
model lagged behind due to its assumptions not aligning with the nature of textual data.
Overall, the project demonstrates the importance of advanced machine learning techniques
and robust feature engineering in achieving high sentiment classification accuracy. The
system's performance can be further improved by fine-tuning hyperparameters, using
ensemble stacking, or incorporating additional linguistic features such as n-grams, named
entity recognition, or word embeddings like Word2Vec or BERT.

Future Scope

In the future, this system can be enhanced to support multilingual sentiment analysis and real-
time review classification. Additionally, integrating a web-based dashboard for visualizing
customer sentiment trends can support business decision-making and customer relationship
management.
REFERENCES

[1] Maseeh, H.I.; Nahar, S.; Jebarajakirthy, C.; Ross, M.; Arli, D.; Das, M.; Rehman, M.;
Ashraf, H.A. Exploring the privacy concerns of smartphone app users: A qualitative
approach. Mark. Intell. Plan. 2023,41, 945–969.
[2] Büchel, E.; Spinler, S. The impact of the metaverse on e-commerce business models—
A delphi-based scenario study. Technol. Soc.2024,76, 102465
[3] Statista. Global Retail e-Commerce Sales 2014–2025. 2022. Available online:
https://www.statista.com/statistics/379046/worldwide-retail-e-commerce-sales/
(accessed on 12 May 2025).
[4] Gupta, A.S.; Mukherjee, J.; Garg, R. Retailing during the COVID-19 lifecycle: A
bibliometric study. Int. J. Retail. Distrib. Manag.2023,11, 1413–1476.
[5] World Trade Organization. E-Commerce, Trade and the COVID-19 Pandemic. 2020.
Available
online:https://www.wto.org/english/tratop_e/covid19_e/ecommerce_report_e.pdf
(accessed on 12 May 2025).
[6] Accenture. Technology Trends 2021. Available online: https://www.accenture.com/us-
en/insights/technology/technology-trends-2021 (accessed on 12 May 2025).
[7] Su, Z.; Bentley, B.L.; McDonnell, D.; Ahmad, J.; He, J.; Shi, F.; Takeuchi, K.;
Cheshmehzangi, A.; da Veiga, C.P. 6G and Artificial Intelligence Technologies for
Dementia Care: Literature Review and Practical Analysis. J. Med. Internet Res.
2022,24, e30503.
[8] Lucas, G.A.; Lunardi, G.L.; Dolci, D.B. From e-commerce to m-commerce: An
analysis of the user’s experience with different access platforms. Electron. Commer.
Res. Appl. 2023,58, 101240.
[9] Alves de Araújo, F.; Mendes dos Reis, J.G.; Terra da Silva, M.; Aktas, E. A Fuzzy
Analytic Hierarchy Process Model to Evaluate Logistics Service Expectations and
Delivery Methods in Last-Mile Delivery in Brazil. Sustainability 2022,14, 5753.
[10] Anacleto, A.; de Araújo Bornancin, A.P.; Mendes, S.H.C.; Scheuer, L. Between
Flowers and Fears: The New Coronavirus Pandemic [Covid-19] and the Flower Retail
Trade. Ornam. Agric. 2021,27, 26–32.
[11] Ferraz, R.M.; da Veiga, C.P.; da Veiga, C.R.P.; Furquim, T.S.G.; da Silva, W.V. After-
Sales Attributes in E-Commerce: A Systematic Literature Review and Future Research
Agenda. J. Theor. Appl. Electron. Commer. Res. 2023,18, 475–500.
[12] Pitta, G.B.; Pereira da Veiga, C.; Kaczam, F.; Su, Z.; Vieira da Silva, W. Reviewing
the scientific literature of the barriers to online purchases. Int. J. Bus. Forecast. Mark.
Intell. 2024,9, 80–102.
[13] Sociedade Brasileira de Varejo e Consumo. 2023. Available online:
http://www.sbvc.com.br (accessed on 12 May 2025).
[14] E-Bit. 2023. Available online: http://www.ebit.com.br (accessed on 12 May 2025).
[15] Furquim, T.S.G.; da Veiga, C.P.; Veiga, C.R.P.d.; Silva, W.V.d. The Different Phases
of the Omnichannel Consumer Buying Journey: A Systematic Literature Review and
Future Research Directions. J. Theor. Appl. Electron. Commer. Res. 2023,18, 79–104.
[16] ABComm. 2022. Available online: http://www.abcomm.org (accessed on 12 May
2025).
[17] Hassan, M.A.; Shukur, Z.; Hasan, M.K. An Efficient Secure Electronic Payment
System for E-Commerce. Computers 2020,9, 66.
[18] Almeida, S.F.; de Moura Leite, A.O.; de Castro Lima, L.; de Oliveira, P.H. Dinâmicas
do Varejo no Brasil: Produtividade e o Período Pós-Pandemia. Rev. Do IBRAC 2023,1,
87–116.
[19] eMarketer. 2019. Available online: http://www.emarketer.com (accessed on 12 May
2025).
[20] Valor Econômico. Faturamento do E-Commerce Deve Aumentar 10% em 2023.
Available online:https://valor.globo.com/patrocinado/dino/noticia/2023/05/08/
faturamento-do-e commerce-deve-aumentar-10-em 2023.ghtml (accessed on 12 May
2025).
[21] Mccenet. 2020. Available online: https://www.mccenet.com.br/categorias (accessed
on 12 May 2025).
[22] Michel, J.; da Veiga, C.; da Veiga, C.R. Metanarrativa sobre E-commerce no Brasil.
An. Do Simpósio Sul-Mato-Grossense Adm. 2021,4, 325–341.
[23] Dunnhumby. Six Months on, How Have Consumer Behaviours Changed as a Result
of COVID-19? 2020. Available online: https://customerfirst.dunnhumby.com/six-
months-on-how-have-consumer-behaviours-changed-as-a-result-of-covid-19.
