Journal - Ijritcc - 30.9.2023
Abstract- This research develops prediction models and analytical insights to address customer churn through data-driven approaches. Customer attrition in e-commerce is a significant problem that requires effective retention strategies. A novel methodology is proposed that comprises data preprocessing, statistical analysis, model development, and tailored retention strategies. The model is used to identify the main drivers of churn and to propose practical recommendations for improving consumer retention. The significance of this work lies in giving e-commerce ventures insights that support cost-saving strategies, revenue growth, and better consumer satisfaction. This research can support the e-commerce business by facilitating evidence-based methods for reducing customer turnover and increasing long-term customer value. The proposed Logistic Regression model achieves an accuracy of 87%, a measure of overall model performance. The Kaplan-Meier curve is used to estimate the survival probability of consumers and to identify consumers who are more likely to churn over time.
Keywords-Consumer churn; Statistical analysis; Kaplan-Meier curve; Consumer retention; Logistic regression; Prediction.
IJRITCC | September 2023, Available @ http://www.ijritcc.org
International Journal on Recent and Innovation Trends in Computing and Communication
ISSN: 2321-8169 Volume: 11 Issue: 9
Article Received: 25 July 2023 Revised: 12 September 2023 Accepted: 30 September 2023
___________________________________________________________________________________________________________________
II. LITERATURE SURVEY
The rise of e-commerce in industrial agglomerations has been strongly influenced by the rapid development of technology and the economy, which has improved customer shopping experiences and removed logistical obstacles [4]. This development potential is supported by contemporary information technology and networks, with multimedia and industrial cluster marketing gaining relevance. Virtual e-commerce platforms in industrial clusters have, however, received little research attention. That study examines virtual e-commerce models in diverse industrial agglomerations, discussing big data, industrial agglomeration, and their connection to e-commerce. It emphasizes the significant growth in domestic internet users over the past ten years and the importance of e-commerce for current and upcoming company endeavors. The study concludes that e-commerce models in industrial agglomerations are context-specific: they must be adjusted to each context and cannot be generalized.
The work in [5] focuses on how crucial client retention is to corporate success, particularly in today's highly competitive environment. Losing clients, also known as customer churn, is a major problem for newly established businesses. To address it, the article introduces a Stacking Classifier, an ensemble learning approach, to assess and forecast customer turnover in e-commerce data. The classifier combines four base learners: KNN, SVM, Random Forest (RF), and Decision Tree classifiers.
The research in [6] deals with customer churn, which is an important issue for large companies, especially in the telecommunications industry. Using machine learning methods on a big data platform, the authors created a churn prediction model to address this problem. They also increased prediction accuracy by using Social Network Analysis (SNA) features.
Customer retention is a crucial issue because the telecommunications business is both saturated and highly competitive [7]. Data mining techniques and data science technology provide useful tools for anticipating client turnover and enhancing customer loyalty. The study analyzes alternative methods and builds data science models to categorize consumers by their propensity to leave a telecom firm. It shows how these models may forecast and lower customer turnover by identifying the major causes of churn and improving services. Implementing customer churn prediction models can enhance customer loyalty, lower attrition, and improve company results.
The authors in [8] compare and analyze machine learning techniques for anticipating client attrition. The research assesses the effectiveness of several machine learning algorithms in the context of customer attrition prediction.
The authors in [9] explore e-commerce with close attention to client turnover prediction. The results presented in the article can give e-commerce enterprises information that helps them better understand consumer behavior and foresee turnover. This can lead to more successful customer engagement programs, retention tactics, and marketing campaigns tailored to each customer.
The authors in [10] combine machine learning and spatial analysis methods, proposing a thorough approach for studying and forecasting consumer attrition in the e-commerce industry. Spatial analysis is commonly used to evaluate geographical or location-based data and patterns, and in this context it may be used to identify spatial trends or dependencies linked to customer turnover. Machine learning techniques, in contrast, are useful for processing and analyzing enormous volumes of data, making it possible to spot patterns and trends and to build prediction models.
E-commerce enterprises must be able to forecast client turnover to improve customer retention and marketing tactics [11]. The model presented in [11] uses support vector machine (SVM) prediction and k-means customer segmentation to detect customer loss in B2C e-commerce. The approach divides customers into three groups and identifies the important customer segments. The work compares SVM and logistic regression for churn prediction, showing that k-means clustering enhances prediction accuracy and that SVM is more accurate than logistic regression. These insights can benefit customer relationship management in B2C e-commerce businesses.
The study in [12] explores several data analysis strategies, techniques, and tools that may be used to examine consumer behavior data and find trends or patterns that could point to impending churn. These techniques include data mining, machine learning, and statistical analysis. The study's goal is to investigate how these methods might be used in the particular context of cargo and logistics, where keeping clients is crucial to preserving the profitability and long-term viability of the company.
E-commerce businesses gather a large amount of client information, such as search history, buying trends, reviews, and comments. This data can be analyzed with machine learning and data mining tools to assess customer behavior and detect possible attrition issues [13]. The support vector machine, a popular supervised learning technique, addresses both regression and classification problems in predictive analysis. The approach for forecasting e-commerce customer attrition presented in [13] uses support vector machines in conjunction with a hybrid recommendation strategy. Empirical data show that the integrated forecasting model significantly improves several measures, including coverage ratio, hit ratio, lift degree, and accuracy rate.
Considering the approaches surveyed above, a novel mechanism is proposed to predict consumer churn on e-commerce business platforms.

III. PROPOSED METHODOLOGY
The first step is data preprocessing. The effectiveness of the prediction techniques relies on the quality of the data, because raw data contains inaccuracies and inconsistencies that reduce the accuracy of the results. The dataset is preprocessed by removing records with missing values, identifying and removing duplicates and outliers, and finally ensuring data consistency. The target variable, churn, is set by identifying the consumers who have not made a successful purchase in the last six months. Feature engineering is carried out to create new features and transform existing ones to enhance the performance of the model. The features involved in predicting the churn value are identified; they include frequency, purchase history, monetary value, and consumer demographic information such as age, gender, location, and salary. Additional features like time-stamped [...]
Figure 1. Statistical Representation of the Operational Dataset.
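The preprocessing and churn-labelling steps described in Section III can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Purchase` record, the function names, the sample rows, and the use of 183 days as "six months" are all assumptions made for illustration, and outlier handling is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Purchase:
    """Hypothetical transaction record (not the schema of the paper's dataset)."""
    customer_id: str
    purchased_on: date
    amount: Optional[float]  # None stands for a missing value


def clean(purchases: List[Purchase]) -> List[Purchase]:
    """Drop rows with a missing amount and exact duplicate rows."""
    seen = set()
    cleaned = []
    for row in purchases:
        if row.amount is None or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def label_churn(purchases: List[Purchase], as_of: date,
                inactivity_days: int = 183) -> Dict[str, bool]:
    """Mark a customer as churned if the latest purchase is older than the window."""
    last_purchase: Dict[str, date] = {}
    for row in purchases:
        previous = last_purchase.get(row.customer_id)
        if previous is None or row.purchased_on > previous:
            last_purchase[row.customer_id] = row.purchased_on
    cutoff = as_of - timedelta(days=inactivity_days)
    return {cid: last < cutoff for cid, last in last_purchase.items()}


def frequency_and_monetary(purchases: List[Purchase]) -> Dict[str, Tuple[int, float]]:
    """Per customer: (number of purchases, total amount spent)."""
    totals: Dict[str, Tuple[int, float]] = {}
    for row in purchases:
        count, spent = totals.get(row.customer_id, (0, 0.0))
        totals[row.customer_id] = (count + 1, spent + row.amount)
    return totals


# Illustrative rows only; they are not taken from the paper's dataset.
rows = [
    Purchase("a", date(2023, 1, 10), 40.0),
    Purchase("a", date(2023, 1, 10), 40.0),   # exact duplicate
    Purchase("a", date(2023, 6, 1), 25.0),
    Purchase("b", date(2022, 10, 1), None),   # missing amount
    Purchase("b", date(2022, 11, 5), 60.0),
]
cleaned = clean(rows)
labels = label_churn(cleaned, as_of=date(2023, 6, 30))
features = frequency_and_monetary(cleaned)
```

In this sketch the churn label and the frequency and monetary features are derived from the same cleaned records, so a customer removed entirely during cleaning would receive neither a label nor features.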
[...] platform before churn. Not all consumers will churn within the observation period; some will still be active at its end. The Kaplan-Meier curve handles such cases and allows predictions to be made for consumers who have not churned.
The Kaplan-Meier estimator for a specific time is given in equation (2), where $\hat{K}(t)$ is the estimated survival probability at time $t$, $t_x$ denotes a time at which customer churn is observed, $C_x$ is the number of customers observed to churn at $t_x$, and $R_x$ is the number of non-churned consumers at $t_x$.

$$\hat{K}(t) = \prod_{t_x \le t} \left(1 - \frac{C_x}{R_x}\right) \tag{2}$$

The time series analysis is further extended to find seasonal changes in consumer behavior and their relationship with consumer churn. A final decision support system is then generated to predict the churn and provide relevant insights to enhance consumer retention in e-commerce business. The results of statistical analysis and machine learning algorithms can be compared to derive a holistic view of consumer churn. The Kaplan-Meier curve is examined to check how survival probability differs and to identify which consumers are likely to churn over time. The results of logistic regression are used to determine the statistical significance of each predictor of customer churn. The resulting coefficient values indicate the direction and strength of each predictor's effect. The features with a significant impact on churn are identified and used in developing targeted retention strategies. Hence a complete and interpretable analysis of consumer churn can be developed for e-commerce businesses to understand the survival mechanism and retention dynamics of consumers.

IV. RESULTS AND DISCUSSION
The performance of the logistic regression model is evaluated using metrics such as accuracy, precision, recall, F1-score, and support [18] [19] [20]. The results are depicted in Fig. 9. The metrics calculated below are used to determine how well the proposed model can distinguish between churned and retained customers.
[...] overall instances. The calculated accuracy is 0.8783, meaning that the model correctly predicts 87.83% of the instances in the dataset, using the formula in (3).

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

A confusion matrix is generated to evaluate the performance of the model. It consists of four values. The TP (True Positives) are churned consumers that are correctly predicted by the proposed model. The TN (True Negatives) are non-churned consumers that are correctly predicted as negatives by the proposed model. The FP (False Positives) are negatives that are predicted as positives by the model, and the FN (False Negatives) are instances that are positive in reality but are predicted as negatives. The result states that there are 533 true positives correctly predicted as churned customers, 434 true negatives correctly predicted as not churned, 56 false positives incorrectly predicted as churned consumers, and 78 false negatives, which are churned consumers incorrectly predicted as not churned, as shown in Fig. 9.
The precision, recall, and F1-score are calculated for both churned (True) and non-churned (False) consumers using the formulas in (4), (5), and (6).

$$Precision = \frac{TP}{TP + FP} \tag{4}$$

$$Recall = \frac{TP}{TP + FN} \tag{5}$$

$$F1\ Score = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{6}$$

For the churned class (True), the precision is 89%, meaning that 89% of the instances predicted as churned are actually churned, and the recall is 85%, meaning that 85% of the actual churned instances are predicted correctly. The F1-score is 87%, which balances precision and recall. For the non-churned class (False), the precision is 87%, the recall is 90%, and the F1-score is 89%. The support metric indicates the number of instances in each class. The proposed model performs well, with good overall accuracy, and the per-class metrics give detailed insight into its performance for both churned and non-churned consumers; it can also be compared to different models [21].
Then, consumer survival is estimated statistically through the Kaplan-Meier curves shown in Fig. 10.
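As a check on the classification metrics reported in this section, formulas (3) to (6) can be applied directly to the four confusion-matrix counts. The sketch below is illustrative only; the `ConfusionCounts` type and the function names are hypothetical. With the counts assigned as stated in the text (533 TP, 434 TN, 56 FP, 78 FN), the accuracy reproduces 0.8783, but the churned-class precision and recall come out near 90% and 87%. The reported 89%, 85%, and 87% are reproduced when 434 is taken as the churned-class count and 533 as the non-churned count, which suggests the two labels may be swapped in the text. This is an observation from the arithmetic, not something the source states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the four confusion-matrix cells."""
    tp: int
    tn: int
    fp: int
    fn: int


def accuracy(c: ConfusionCounts) -> float:
    # Equation (3): correct predictions over all predictions.
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def precision(c: ConfusionCounts) -> float:
    # Equation (4): share of predicted positives that are truly positive.
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Equation (5): share of actual positives that are predicted positive.
    return c.tp / (c.tp + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    # Equation (6): harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)


# Counts exactly as assigned in the text.
as_stated = ConfusionCounts(tp=533, tn=434, fp=56, fn=78)
# Assumption for comparison: the TP and TN counts exchanged.
swapped = ConfusionCounts(tp=434, tn=533, fp=56, fn=78)

for name, counts in (("as stated", as_stated), ("swapped", swapped)):
    print(f"{name}: accuracy={accuracy(counts):.4f} "
          f"precision={precision(counts):.2f} "
          f"recall={recall(counts):.2f} f1={f1_score(counts):.2f}")
```

Accuracy is unaffected by the assignment because it depends only on the sum TP + TN; only the per-class metrics distinguish the two cases.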
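The Kaplan-Meier estimator in equation (2) can be illustrated with a small, self-contained sketch. The function name `kaplan_meier`, the sample durations (in weeks), and the convention that customers who are still active (censored) at a given time are counted as at risk at that time are assumptions for illustration; the source does not describe its implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[int],
                 churned: Sequence[bool]) -> List[Tuple[int, float]]:
    """Return (time, survival probability) at each time with an observed churn.

    durations[i] is how long customer i was observed; churned[i] is True if
    the customer churned at that time and False if they were still active
    (censored) when observation ended.
    """
    if len(durations) != len(churned):
        raise ValueError("durations and churned must have the same length")

    churn_events = Counter(t for t, c in zip(durations, churned) if c)
    exits = Counter(durations)      # everyone leaves the risk set at their duration
    at_risk = len(durations)        # R_x: customers still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        c_x = churn_events.get(t, 0)            # C_x: churn events at time t
        if c_x:
            survival *= 1 - c_x / at_risk       # one factor of the product in (2)
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Illustrative data only: observation time in weeks and churn indicator.
weeks = [2, 3, 3, 5, 8, 8]
did_churn = [True, True, False, True, False, False]
curve = kaplan_meier(weeks, did_churn)
for t, s in curve:
    print(f"week {t}: estimated survival {s:.4f}")
```

Censored customers lower the number at risk without lowering the survival estimate, which is how the estimator accounts for consumers who are still active at the end of the observation period.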
In the Kaplan-Meier graph (Fig. 10), the x-axis represents time intervals in weeks and the y-axis denotes the survival probability. The curve starts at 1, representing that all consumers are active, and slowly decreases over time as consumers churn. Separate curves can further be generated to identify consumer churn and tailor specific retention mechanisms.

V. CONCLUSION
This work examines the techniques and data-driven strategies used to predict client attrition in the e-commerce domain, offering insights into customer retention tactics. Predictive models and approaches must continually be enhanced because of the business's rapid development and changing consumer preferences. Combining statistical and machine learning techniques offers a promising strategy. The trade-offs between accuracy and interpretability are highlighted by the comparative examination of the developed model. Understanding customer turnover depends greatly on the model's interpretability and explainability. E-commerce businesses can use interpretable models to obtain insight and make data-driven decisions to retain consumers efficiently. The proposed research recommends churn prevention techniques as well as individualized marketing, targeted incentives, and improved customer service to increase customer loyalty and profitability. Future studies may examine real-time data analytics, sophisticated deep learning methods, and models that adjust to shifting consumer behavior as the e-commerce industry changes.

REFERENCES
[1] Li, L., & Zhang, J. (2021). Research and Analysis of an Enterprise E-Commerce Marketing System Under the Big Data Environment. Journal of Organizational and End User Computing (JOEUC), 33(6), 1-19. http://doi.org/10.4018/JOEUC.20211101.oa15
[2] JWenli, Z., & Liang, H. (2023). Research on optimization design of fresh e-Commerce APP based on KANO model. International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023). https://doi.org/10.1117/12.2681544
[3] Šaković Jovanović, J., Vujadinović, R., Mitreva, E., Fragassa, C., & Vujović, A. (2020). The Relationship between E-Commerce and Firm Performance: The Mediating Role of Internet Sales Channels. Sustainability, 12(17), 6993. https://doi.org/10.3390/su12176993
[4] Yue Hongqiang. (2022). Research on E-Commerce Data Standard System in the Era of Digital Economy From the Perspective of Organizational Psychology. Frontiers in Psychology, 13. ISSN 1664-1078. https://doi.org/10.3389/fpsyg.2022.900698
[5] Awasthi, S. (2022). Customer churn prediction on e-Commerce data using stacking classifier. https://doi.org/10.36227/techrxiv.20291694
[6] Demir, B., & Ergün, Ö. Ö. (2023). Customer churn prediction with machine learning methods in telecommunication industry. https://doi.org/10.21203/rs.3.rs-3343217/v1
[7] Gopal, P., & MohdNawi, N. B. (2021). A survey on customer churn prediction using machine learning and data mining techniques in e-Commerce. 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE). https://doi.org/10.1109/csde53843.2021.9718460
[8] Karamollaoglu, H., Yucedag, I., & Dogru, I. A. (2021). Customer churn prediction using machine learning methods: A comparative analysis. 2021 6th International Conference on Computer Science and Engineering (UBMK). https://doi.org/10.1109/ubmk52708.2021.9558876
[9] Lubis, A. R., Prayudani, S., Julham, Nugroho, O., Lase, Y. Y., & Lubis, M. (2022). Comparison of model in predicting customer churn based on users' habits on e-Commerce. 2022 5th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI). https://doi.org/10.1109/isriti56927.2022.10052834
[10] Matuszelański, K., & Kopczewska, K. (2022). Customer churn in retail e-Commerce business: Spatial and machine learning approach. Journal of Theoretical and Applied Electronic Commerce Research, 17(1), 165-198. https://doi.org/10.3390/jtaer17010009
[11] Xiahou, X., & Harada, Y. (2022). B2C E-Commerce Customer Churn Prediction Based on K-Means and SVM. Journal of Theoretical and Applied Electronic Commerce Research, 17(2), 458-475. https://doi.org/10.3390/jtaer17020024
[12] Sahinkaya, G. O., Erek, D., Yaman, H., & Aktas, M. S. (2021). On the data analysis workflow for predicting customer churn behavior in cargo and logistics sectors: Case study. 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE). https://doi.org/10.1109/icecce52056.2021.9514117
[13] Shobana, J., Gangadhar, Ch., Arora, R. K., Renjith, P. N., Bamini, J., & Chincholkar, Y. D. (2023). E-commerce customer churn prevention using machine learning-based business intelligence strategy. Measurement: Sensors, 27, 100728. ISSN 2665-9174. https://doi.org/10.1016/j.measen.2023.100728
[14] Baxani, R., & Edinburgh, M. (2022). Heart disease prediction using machine learning algorithms logistic regression, support vector machine and random forest classification techniques. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4151423
[15] Edlitz, Y., & Segal, E. (2022). Author response: Prediction of type 2 diabetes mellitus onset using logistic regression-based scorecards. https://doi.org/10.7554/elife.71862.sa2
[16] Harshini, P. S., Naresh, K., Pamulapati, S. R., & Lavanya, A. (2023). Diagnosis of liver diseases using machine learning algorithms and their prediction using logistic regression and ANN. 2023 3rd International Conference on Intelligent Technologies (CONIT). https://doi.org/10.1109/conit59222.2023.10205819
[17] Kadam, E., Gupta, A., Jagtap, S., Dubey, I., & Tawde, G. (2023). Loan approval prediction system using logistic regression and CIBIL score. 2023 4th International Conference on Electronics and Sustainable Communication Systems (ICESC). https://doi.org/10.1109/icesc57686.2023.10193150
[18] Manglani, R., & Bokhare, A. (2021). Logistic regression model for loan prediction: A machine learning approach. 2021 Emerging Trends in Industry 4.0 (ETI 4.0). https://doi.org/10.1109/eti4.051663.2021.9619201
[19] Rajasekaran, V., & Priyadarshini, R. (2021). An e-Commerce prototype for predicting the product return phenomenon using optimization and regression techniques. Communications in Computer and Information Science, 230-240. https://doi.org/10.1007/978-3-030-88244-0_22
[20] Reddy, D. J., Gunasekaran, M., & Sundari, K. S. (2022). undefined. 2022 International Conference on Cyber Resilience (ICCR). https://doi.org/10.1109/iccr56254.2022.9995969
[21] Vafeiadis, T., Diamantaras, K. I., Sarigiannidis, G., & Chatzisavvas, K. C. (2015). A comparison of machine learning techniques for customer churn prediction. Simulation Modelling Practice and Theory, 55(10), 1-9. https://doi.org/10.1016/j.simpat.2015.03.003
[22] Retail Case Study Analysis | Kaggle