An E-Commerce Prototype for Predicting the Product
Return Phenomenon Using Optimization and Regression
Techniques
Vidya Rajasekaran1, R Priyadarshini 2,
B.S. Abdur Rahman Crescent Institute of Science and Technology Chennai, Tamilnadu, India
[email protected] 1,
[email protected]
Abstract. Product returns are a major and challenging problem in e-commerce
because they directly reduce the revenue of the e-commerce firm. Most e-commerce
firms treat a return rate of up to 10% as normal; when the return rate of a
product exceeds 10%, further investigation is needed. A rising return rate is a
serious threat to the e-commerce industry and needs to be slowed down. If returns
can be predicted in advance, measures can be taken to decrease the return rate.
This paper develops a prototype model for predicting the return rate of a
particular product in advance. In the existing system, the return volume is
predicted from the manufacturer's production process and resources alone. Our
work includes a few more parameters, which in turn enhances the prediction
accuracy. The proposed work is tested with different machine learning algorithms
to optimize the results. The results can help the e-commerce industry reduce
returns and the associated losses, and thereby enhance future revenue.
Keywords: Data analytics, return rate, prediction prototype, metaheuristic,
linear regression, Gradient Boosting, Random Forest, machine learning.
1 Introduction
In recent years, the e-commerce industry has been growing rapidly, and most online
shopping takes place through e-commerce systems. The major challenge faced by
e-commerce firms is product returns, which reduce the revenue of the business. The
e-commerce company needs to predict the return rate of a product in order to avoid
financial and operational costs, and the vendor needs the same prediction to avoid
unnecessary losses. A prediction system therefore becomes essential for limiting
losses. Shopping habits differ from one customer to another: each customer behaves
differently, places different orders and spends money on different kinds of
products. The same product may be liked by one customer and disliked by another,
which is a major challenge for both vendors and e-commerce firms. Customers now
also use social media, where anyone can comment on any product, and negative
comments can quickly cut down the revenue of that product. Providing a return
option is one of the success factors for e-commerce firms, because it gives the
customer a fully satisfying shopping experience. Accepting returns is a must for
the e-commerce company, but a high number of returns for the same product is a
serious issue that the e-commerce firm and the vendor should take into
consideration. The main reasons behind a return are poor product quality and
customer dissatisfaction. Each returned product reduces revenue and leaves an
unhappy customer.
To improve profit and sales, the e-commerce system collects and stores data on
customers' purchase behavior. This data plays a large role in decision making and
adds value to the e-commerce business by improving sales. Products with high
returns can be removed from the catalog by the vendors, and the e-commerce firm
can suggest ways of improving the sales of a product by providing better quality
and better ways to satisfy the customer. The return rate of a product is
calculated as the percentage of accepted product returns relative to total sales.
The primary motive of this paper is to develop an effective model for predicting the
return of products. For this research we use a hosted dataset. We developed the model
using the time, product id and feedback rating. We use a metaheuristic approach,
based on the Linear Regression machine learning method, to optimize our results. We
summarize our results based on the feedback rating of the products.
2 Literature Review and Real-time Surveys on Returns in E-Commerce
Similar research work is proposed in [1], where returns were predicted from the
sales, time, product, retailer, production process and resources, multiproduct effect
and historical returns. Feedback was not considered, although it is an important
factor for predicting future returns, since products with positive feedback attract
fewer returns and negative feedback increases the number of returns. Nearly 30% of
the products ordered online are returned, compared with 8.89% for brick-and-mortar
stores [2]. Consumers expect an easy return policy: 92% of consumers say they will
buy again if the return process is easy. Free return shipping is expected by 79% of
consumers. Nearly 49% of retailers offer free return shipping, and 67% of consumers
check the returns policy before placing an order. 62% of consumers are more likely
to shop online if they can return an item in-store. 58% of consumers want a return
policy that does not ask for a reason for the return, and 47% want an easy-to-print
return label. When free return shipping is offered, 27% of consumers would buy a
product that costs more than $1000 online; this drops to 10% if free return shipping
is not offered. The top three reasons for product return are:
1) 23% of returns occur because the wrong item was received,
2) 22% of returns occur because the product received looks different from the
product ordered,
3) 20% of returns occur because the product was received in damaged condition.
The Barclaycard research [3] focused on serial returners: consumers who order
several items, keep only the ones they need and return the rest. Nearly 30% of
consumers purposely over-purchase and later return the unwanted items. In addition,
19% of consumers admitted to ordering different varieties of the same item so that
they can choose among them when delivered. A survey conducted by UPS in 2019 states
that e-commerce consumers research their purchases and settle on shops with
transparent policies. It reported that 36% of online shoppers had made a return in
the previous three months. 73% of the consumers stated that their experience with
return policies affects whether they will buy again from the vendor or the
e-commerce firm. By knowing the returns in advance, unnecessary processing,
shipping and delivery of products can be minimized. Products with a very high
return rate need to be taken into serious consideration, and the necessary steps
need to be carried out. Not all returns call for refunds; some return orders ask
for exchanges instead. Given the literature and the current issues, prediction
systems of this kind are essential for running a successful e-commerce business.
3 Empirical Research
The empirical research comprises three subsections: Section 3.1 describes the
operational process, Section 3.2 describes the data and Section 3.3 explains the
mechanisms for optimizing the results.
3.1 Quantitative Operational Process
In this work we set up an experiment using the quantifiable data to predict the future
returns of a product or service. Several vendors are connected to the e-commerce firm
in order to sell their products online. Different consumers from various locations
place orders with the e-commerce firm. When a consumer places an order for a
particular product, the order request is sent to the corresponding vendor and the
shipment is processed. This is termed a successful order sale.
Returns. Returns are a major service provided by e-commerce companies to retain
their customers. Products are returned by customers for several reasons. Processing
returns is a burden for vendors and e-commerce firms, since it affects the revenue
of both. If product returns are too high, action needs to be taken to reduce the
return percentage. Vendors and e-commerce firms need to pay closer attention to the
return volume to enhance their revenue. The flow of the Hybrid Metaheuristic based
Regression Approach (HMRA) for the return volume phenomenon is shown in Fig. 1.
Fig. 1. Flow of HMRA approach for Return Volume Prediction
3.2 Understanding the Data
In this research, we use a dataset with several attributes containing information
about product sales. Each record carries a unique product_id together with the
customer_id, the manufacturer name and the return information. For every product
sold through the e-commerce site, feedback is received from the customer and stored
in the dataset. We use these values together to compute the predicted future return
rate of any particular product.
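A record layout along the following lines would be enough to support the computations described in this paper. It is only a sketch: the class name `SaleRecord`, the helper `is_negative_feedback` and every field name other than `product_id` and `customer_id` are assumptions, since the actual schema of the dataset is not listed here. The customer id and purchase date in the example are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SaleRecord:
    """One sold item; field names are illustrative, not the dataset's real schema."""
    product_id: str
    customer_id: str
    manufacturer: str
    purchase_date: date
    returned: bool                      # True if the customer sent the item back
    star_rating: Optional[int] = None   # 1-5 stars, None (or 0) when no rating was given


def is_negative_feedback(record: SaleRecord) -> bool:
    """Treat a rating below three stars as negative; missing ratings are ignored."""
    return record.star_rating is not None and 0 < record.star_rating < 3


example = SaleRecord(
    product_id="AV16khLE-jtxr-f38VFn",
    customer_id="C-0001",               # hypothetical customer id
    manufacturer="K-Y",
    purchase_date=date(2020, 1, 15),    # hypothetical purchase date
    returned=True,
    star_rating=2,
)
print(is_negative_feedback(example))
```

Keeping the return flag and the star rating on the same record makes it straightforward to derive both the return counts and the feedback counts per product_id from one pass over the data.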
3.3 Variables Used in Optimizing Results
We focus on the time, product and feedback variables in particular for predicting the
return rate of the products, and store the result in Δtpf. The subscripts t, p and f
denote the time period, the product_id and the feedback rating. We use descriptive
data for the predictive analytics. A diagnostic analytics technique is used to analyze
the cause of an increased return rate, and a prescriptive analytics approach is used
to recommend measures for minimizing the return rates.
Time (t). The time is recorded as the date, month and year on which the purchase is
made and on which the return is initiated.
Product (p). The products denote the total number of products sold in different
categories as illustrated in the Fig. 2.
Fig. 2. Graph representing the overall sales and return based on products
Feedback (f). It is the total number of negative feedback entries received, which is
directly related to returns. The total number of returns and the negative feedback
received are illustrated in Fig. 3.
Fig. 3. Graph representing the overall returns based on the Negative feedback
4 Hybrid Metaheuristic based Regression Approach (HMRA)
1. Start
2. Declare input variables σ p_id, τ p_id, υ p_id, φ p_id
3. Compute Α manuf_return = (σ p_id / τ p_id) * 100
4. Compute Β prod_feedback = (υ p_id / φ p_id) * 100
5. Compute Δ tpf = [(Α manuf_return + Β prod_feedback) / 2] / 100
6. Store and print the result Δ tpf
7. Prediction of results
   if Δ tpf = 0 then
      Print ("Type I : Chances of return are low")
   elseif Δ tpf > 0 and Δ tpf <= 0.25 then
      Print ("Type II : Chances of return are moderate")
   elseif Δ tpf > 0.25 and Δ tpf <= 0.5 then
      Print ("Type III : Chances of return are high")
   elseif Δ tpf > 0.5 and Δ tpf <= 0.75 then
      Print ("Type IV : Chances of return are very high")
   elseif Δ tpf > 0.75 and Δ tpf <= 1 then
      Print ("Type V : Chances of return are extreme")
8. End
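The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `hmra_score` and `hmra_category` are assumptions, the last category is labelled Type V as in Table 1, and the count of 11 negative ratings out of 27 in the example is hypothetical (only the 9 returns and 27 shipments are quoted in Section 5.1).

```python
def hmra_score(returned: int, shipped: int, negative: int, total_feedback: int) -> float:
    """Compute the score Delta_tpf in [0, 1] following steps 3-5 of the algorithm."""
    if shipped <= 0 or total_feedback <= 0:
        raise ValueError("shipped and total_feedback must be positive")
    a_manuf_return = returned / shipped * 100            # step 3: return percentage
    b_prod_feedback = negative / total_feedback * 100    # step 4: negative feedback percentage
    return (a_manuf_return + b_prod_feedback) / 2 / 100  # step 5: average, scaled to 0..1


def hmra_category(delta: float) -> str:
    """Map Delta_tpf to the prediction category of step 7."""
    if not 0 <= delta <= 1:
        raise ValueError("delta must lie in [0, 1]")
    if delta == 0:
        return "Type I: chances of return are low"
    if delta <= 0.25:
        return "Type II: chances of return are moderate"
    if delta <= 0.5:
        return "Type III: chances of return are high"
    if delta <= 0.75:
        return "Type IV: chances of return are very high"
    return "Type V: chances of return are extreme"


# 9 returns out of 27 shipments (from the paper); 11 negative ratings out of 27 (hypothetical).
delta = hmra_score(returned=9, shipped=27, negative=11, total_feedback=27)
print(round(delta, 4), hmra_category(delta))
```

With these inputs the score lands between 0.25 and 0.5, so the sketch reports the Type III category. Checking the ranges from lowest to highest lets each branch test only its upper bound.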
5 Experimental Illustration
The experiment predicts how likely a particular product is to be returned by the
customer, using a sample dataset. In this work the return score Δ tpf of a product
over a particular time period is calculated by averaging the two values
Α manuf_return and Β prod_feedback, as shown in Eq. (1).
\[ \Delta_{tpf} = \frac{A_{manuf\_return} + B_{prod\_feedback}}{2} \tag{1} \]
Here,
1. Α manuf_return is the return percentage of a particular product for the manufacturer
2. Β prod_feedback is the feedback score of the particular product
5.1 Manufacturer Return Percentage (Α manuf_return)
Several products are shipped from different manufacturers at different periods of
time, based on the orders placed by consumers. Some of these products are returned
with a request for exchange or refund. The products returned to the manufacturer
are expressed as a percentage, the manufacturer's return percentage
(Α manuf_return). For each product_id, it is obtained by dividing the total number
of times the product is returned to the manufacturer (σ p_id) by the total number
of times the product is shipped from the manufacturer (τ p_id), as stated in
Eq. (2).
\[ A_{manuf\_return} = \frac{\sigma_{p\_id}}{\tau_{p\_id}} \times 100 \tag{2} \]
Products shipped from manufacturer (τ p_id). The total number of products
dispatched from each manufacturer is extracted from the database and represented in
three columns: the product_id, the manufacturer name and the total count of the
products shipped from the manufacturer. A product from the manufacturer K-Y is used
to illustrate our experimentation. The K-Y manufacturer has shipped the product with
product_id: AV16khLE-jtxr-f38VFn. The dispatched count for this product is 27, which
is marked as τ p_id.
Products returned back to the manufacturer (σ p_id). The total number of products
returned by consumers, for whatever reason, is counted for each product_id. The
K-Y product with product_id: AV16khLE-jtxr-f38VFn is returned nine times, so the
returned count for this product is 9, which is marked as σ p_id.
Return percentage of the manufacturer (Α manuf_return). Applying the counts
obtained above gives the return percentage of the manufacturer (Α). For
illustration, we substitute the sample counts for the K-Y manufacturer (9 returns
out of 27 shipments) into Eq. (2), as shown in Eq. (3) and Eq. (4).
\[ A_{K\text{-}Y} = \frac{9}{27} \times 100 \tag{3} \]
\[ A_{K\text{-}Y} = 33.33\% \tag{4} \]
The return percentage of the K-Y manufacturer is calculated in Eq. (3), and the
resulting score Α K-Y is 33.33%, as shown in Eq. (4). The corresponding output is
shown in Fig. 4.
Fig. 4. Count on the return rate of the manufacturers in percentage
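As a rough illustration of the counting in Section 5.1, the following sketch derives the shipped and returned counts per product_id from an order log and then applies the percentage formula. The order log is hypothetical: it is constructed only to reproduce the 27 shipments and 9 returns quoted for the K-Y product, and the variable names are assumptions.

```python
from collections import Counter

PRODUCT = "AV16khLE-jtxr-f38VFn"

# Hypothetical order log of (product_id, was_returned) pairs:
# 27 shipments in total, 9 of which were returned.
orders = [(PRODUCT, True)] * 9 + [(PRODUCT, False)] * 18

# Count shipments and returns per product_id.
shipped = Counter(pid for pid, _ in orders)
returned = Counter(pid for pid, was_returned in orders if was_returned)

# Return percentage per product: returned count / shipped count * 100.
return_percentage = {pid: returned[pid] / shipped[pid] * 100 for pid in shipped}
print(f"{return_percentage[PRODUCT]:.2f}%")  # prints 33.33%
```

Because `Counter` returns 0 for a missing key, a product that was shipped but never returned gets a return percentage of 0 without special handling.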
5.2 Feedback Review Percentage (Β prod_feedback)
Returns and feedback are related to each other: a product returned by an
unsatisfied customer is usually accompanied by poor feedback from that customer.
The feedback percentage (Β prod_feedback) is calculated by dividing the number of
negative feedbacks received for an individual product (υ p_id) by the total number
of feedbacks received for that product (φ p_id), as shown in Eq. (5).
\[ B_{prod\_feedback} = \frac{\upsilon_{p\_id}}{\varphi_{p\_id}} \times 100 \tag{5} \]
Sum of the negative feedbacks received for any product (υ p_id). Feedback on
delivered products is received from customers by the e-commerce firm in the form of
text and star ratings. In this work we consider only the feedback received through
star ratings; textual feedback will be included in future work. A rating of fewer
than three stars is treated as negative feedback. The negative ratings for each
product are summed, and the result is denoted as υ p_id.
Total number of feedbacks received for each product (φ p_id). The total number of
feedbacks received for a particular product is the count of all feedback with star
values ranging from 0 to 5, and is denoted as φ p_id. The value 0 is a null value
representing no star rating and can be omitted, since it makes no difference to the
result. The total feedback count is extracted for each product.
Overall feedback percentage received for any product (Β prod_feedback). The
overall feedback percentage of a product is obtained by applying the values of
υ p_id and φ p_id in Eq. (5). For illustration, we take the sample output values of
υ p_id and φ p_id for the K-Y manufacturer and apply them in Eq. (5) to find the
overall feedback percentage of the particular product, as shown in Eq. (6) and
Eq. (7).
\[ B_{K\text{-}Y} = \frac{\upsilon_{p\_id}}{\varphi_{p\_id}} \times 100 \tag{6} \]
\[ B_{K\text{-}Y} = 40.74\% \tag{7} \]
In Eq. (6) the feedback percentage of the K-Y manufacturer is calculated, and the
resulting score Β K-Y is found to be 40.74%, as shown in Eq. (7). The results can be
viewed in the implementation output shown in Fig. 5.
Fig. 5. Feedback percentage for each product
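The feedback percentage of Section 5.2 can be sketched from a list of star ratings as below. The ratings are invented for illustration: they are chosen so that 11 of 27 rated entries fall below three stars, which happens to reproduce the 40.74% reported for the K-Y product, but the actual counts behind that figure are not stated in the text. The sketch also assumes that "summing" negative feedback means counting the negative ratings.

```python
# Hypothetical star ratings for one product; 0 stands for "no rating given".
ratings = [1] * 6 + [2] * 5 + [3] * 4 + [4] * 6 + [5] * 6 + [0] * 3

rated = [r for r in ratings if r != 0]      # omit null ratings
negative = sum(1 for r in rated if r < 3)   # ratings below three stars
total = len(rated)                          # all remaining ratings

# Negative feedback as a percentage of all rated feedback.
feedback_percentage = negative / total * 100
print(negative, total, f"{feedback_percentage:.2f}%")  # prints 11 27 40.74%
```

Dropping the null ratings before counting keeps unrated purchases out of both the numerator and the denominator.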
5.3 Final Return Prediction Percentage of Any Product (Δ tpf)
The final return prediction is calculated by substituting the scores obtained from
Eq. (2) and Eq. (5) into Eq. (1). The final value Δ tpf for the particular product
from manufacturer K-Y is obtained as shown in Eq. (8) and Eq. (9).
\[ \Delta_{tpf} = \frac{33.33 + 40.74}{2} \tag{8} \]
\[ \Delta_{tpf} = 37.035\% \tag{9} \]
For simplicity, we convert Δ tpf to a value between 0 and 1 and, based on this
value, categorize Δ tpf into five groups, Type I to Type V, which are described in
Table 1. In our example the value obtained in Eq. (9) is 37.035%; dividing by 100
gives 0.37035, which lies between 0.25 and 0.5, so the product is categorized as
Type III according to Table 1.
Table 1. Prediction based on Categorization of Δ tpf
Groups     Range                  Description on Prediction
Type I     Δ tpf = 0              Very low chances of return
Type II    0 < Δ tpf ≤ 0.25       Return chances are moderate
Type III   0.25 < Δ tpf ≤ 0.5     Return chances are high
Type IV    0.5 < Δ tpf ≤ 0.75     Chances of return are very high
Type V     0.75 < Δ tpf ≤ 1       Chances of return are extreme
Table 2. Comparison of performance based on Machine Learning Algorithms
ML algorithm        Metric                     Test data   Training data
Linear Regression   Mean Absolute Error        0.01452     0.01451
                    Mean Squared Error         0.00846     0.00901
                    Root Mean Squared Error    0.09200     0.09492
Random Forest       Mean Absolute Error        0.02002     0.02031
                    Mean Squared Error         0.00903     0.00947
                    Root Mean Squared Error    0.09504     0.09736
Gradient Boosting   Mean Absolute Error        0.02016     0.02124
                    Mean Squared Error         0.00888     0.00940
                    Root Mean Squared Error    0.09427     0.09695
The results of the comparison of prediction performance are shown in Table 2.
Based on the results, the Gradient Boosting algorithm is found to correlate well
with our approach when compared with the Linear Regression and Random Forest
algorithms.
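For readers unfamiliar with the three error measures reported in Table 2, the following sketch shows how they are conventionally computed. The function name `regression_errors` is an assumption, and the actual and predicted scores are made-up numbers, not values from the experiment.

```python
import math


def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(d) for d in diffs) / len(diffs)   # mean absolute error
    mse = sum(d * d for d in diffs) / len(diffs)    # mean squared error
    return mae, mse, math.sqrt(mse)                 # RMSE is the square root of MSE


# Hypothetical return scores and model predictions, for illustration only.
actual = [0.37, 0.10, 0.55, 0.00]
predicted = [0.35, 0.15, 0.50, 0.05]
mae, mse, rmse = regression_errors(actual, predicted)
print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f}")
```

Since the root mean squared error is the square root of the mean squared error, the two rows of Table 2 can be cross-checked against each other; for example, the square root of 0.00846 is approximately 0.0920. Libraries such as scikit-learn provide equivalent functions (`mean_absolute_error` and `mean_squared_error` in `sklearn.metrics`).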
6 Conclusion and Future Work
There are currently no well-established mechanisms for predicting the return volume
in e-commerce systems with high accuracy. In this work we consider the feedback
received through star ratings; textual feedback will be included in future work.
The current research work identifies the best suited machine learning algorithm for
the prediction of return rates. The proposed hybrid approach shows improved accuracy
compared to the existing methods. Return rate prediction is not limited to feedback
and products; several other factors will be included in future work. The function
approximation will be carried out using classification techniques. Future research
will also include methodologies for reducing the operational and financial costs of
return policies for the e-commerce firm.
References
1. Cui, H., Rajagopalan, S., & Ward, A. R. (2019). Predicting Product Return Volume Using
Machine Learning Methods. European Journal of Operational
Research. doi:10.1016/j.ejor.2019.05.046
2. E-Commerce Product Return Rate – Statistics and Trends [Infographic] (2020),
invespcro.com
3. E-Commerce Returns: 2020 Stats and Trends (2020), SaleCycle
4. Walsh, G., Möhring, M. Effectiveness of product return-prevention instruments: Empirical
evidence. Electron Markets 27, 341–350 (2017). https://doi.org/10.1007/s12525-017-0259-0
5. Dimitrios Vlachos, Rommert Dekker, "Return handling options and order quantities for
single period products", European Journal of Operational Research, Volume 151, Issue
1, 2003, Pages 38-52, ISSN 0377-2217, https://doi.org/10.1016/S0377-2217(02)00596-9.
6. Griffis, S. E., Rao, S., Goldsby, T. J., & Niranjan, T. T. (2012). The customer
consequences of returns in online retailing: An empirical analysis. Journal of
Operations Management, 30(4), 282–294.
7. Stock, J. R., & Mulki, J. P. (2009). Product returns processing: An examination of practices
of manufacturers, wholesalers/distributors, and retailers. Journal of Business Logistics,
30(1), 33–62.
8. F. Ma, "The Study on Reverse Logistics for E-Commerce," 2010 International Conference
on Management and Service Science, Wuhan, China, 2010, pp. 1-4, doi:
10.1109/ICMSS.2010.5575577.
9. S. K. Shivakumar and P. V. Suresh, "Maximizing Knowledge Management returns in e-
commerce," 2014 International Conference on Computing for Sustainable Global
Development (INDIACom), New Delhi, India, 2014, pp. 545-550, doi:
10.1109/IndiaCom.2014.6828018.
10. H. Yang, J. Wang, M. He and B. Kuang, "Research of B2C E-Commerce Return Strategies
Based on Return Price," 2010 International Conference on Management and Service
Science, Wuhan, China, 2010, pp. 1-4, doi: 10.1109/ICMSS.2010.5576679.
11. Eleonora Morganti, Saskia Seidel, Corinne Blanquart, Laetitia Dablanc, Barbara Lenz,
"The Impact of E-commerce on Final Deliveries: Alternative Parcel Delivery Services in
France and Germany", Transportation Research Procedia, Volume 4, 2014, Pages 178-190,
ISSN 2352-1465, https://doi.org/10.1016/j.trpro.2014.11.014.
12. Scott Matthews H, Hendrickson CT, Soh DL. "Environmental and Economic Effects of
E-Commerce: A Case Study of Book Publishing and Retail Logistics." Transportation
Research Record. 2001; 1763(1):6-12. doi:10.3141/1763-02
13. Saman Hassanzadeh Amin, Guoqing Zhang, "A multi-objective facility location model for
closed-loop supply chain network under uncertain demand and return", Applied
Mathematical Modelling, Volume 37, Issue 6, 2013, Pages 4165-4176, ISSN 0307-904X,
https://doi.org/10.1016/j.apm.2012.09.039.
14. Dmitry Ivanov, Alexander Pavlov, Dmitry Pavlov, Boris Sokolov, "Minimization of
disruption-related return flows in the supply chain", International Journal of Production
Economics, Volume 183, Part B, 2017, Pages 503-513, ISSN 0925-5273,
https://doi.org/10.1016/j.ijpe.2016.03.012.
15. Ramakrishnan Ramanathan, "An empirical analysis on the influence of risk on relationships
between handling of product returns and customer loyalty in E-commerce", International
Journal of Production Economics, Volume 130, Issue 2, 2011, Pages 255-261, ISSN 0925-5273,
https://doi.org/10.1016/j.ijpe.2011.01.005.