Intelligent Sales Prediction Using Machine Learning Techniques
Sunitha Cheriyan, Shaniba Ibrahim, Saju Mohanan, Susan Treesa
IT Department, Higher College of Technology
Muscat, Sultanate of Oman
Abstract— An Intelligent Decision Analytical System requires the integration of decision analysis and prediction. Most business organizations depend heavily on a knowledge base and on demand prediction of sales trends. The accuracy of a sales forecast has a large impact on business. Data mining techniques are effective tools for extracting hidden knowledge from an enormous dataset to enhance the accuracy and efficiency of forecasting. This research carries out a detailed study and analysis of comprehensible predictive models to improve future sales predictions. Traditional forecast systems have difficulty dealing with big data and with the accuracy of sales forecasting. These issues can be overcome by using various data mining techniques. In this paper, we briefly analyze the concept of sales data and sales forecasting. The various techniques and measures for sales prediction are described in the later part of the work. On the basis of a performance evaluation, a best-suited predictive model is suggested for sales trend forecasting. The results are summarized in terms of the reliability and accuracy of the techniques used for prediction and forecasting. The study found that the best-fit model is the Gradient Boost Algorithm, which shows the maximum accuracy in forecasting and future sales prediction.

[...] effectively, predictive sales data is important for businesses when seeking to acquire investment capital. This study takes a new perspective that focuses on how to choose an appropriate approach to forecast sales with a high degree of precision. The initial dataset considered in this research had a large number of entries, but the final dataset used for analysis was much smaller than the original because non-usable data, redundant entries and irrelevant sales data were removed.

The data mining techniques and prediction methods are discussed in Section I. The literature on sales forecasting is reviewed in Section II. In Section III, the data tuning process and predictions are highlighted with visual representations of the generated results. The predictive analytics and methodology for sales price are also discussed. The performance evaluations of various prediction algorithms using machine learning approaches are then stated. Finally, the results are analyzed and the paper concludes by summarizing the research findings and future scope.

II. RELATED WORK
D. Forecasting and Trends
Figure 3 shows the forecast of future sales from Quarter 3 of 2018 to Quarter 3 of 2021. The trend shows the sum of sales revenue for each quarter. Blue indicates the actual sales generated and red indicates the estimated sales for the quarter, which show a slight increase.

The forecast is composed of a smoothed average adjusted for a linear trend. The forecast is then also adjusted for seasonality. Figure 5 shows the details of the model used for the trend analysis. The model shows a seasonal effect that is high in January 2022 and low in August 2022.
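The paper does not name the exact smoothing method behind this forecast. As a rough illustration only, the sketch below assumes additive triple exponential smoothing (Holt-Winters), which combines a smoothed average (level), a linear trend and a seasonal adjustment. The function name, the smoothing weights `alpha`, `beta`, `gamma` and the quarterly figures are all hypothetical and are not taken from the paper.

```python
def holt_winters_additive(series, season_len, horizon,
                          alpha=0.5, beta=0.3, gamma=0.2):
    """Forecast `horizon` steps ahead with additive triple exponential smoothing.

    Assumed method: the source only says "smoothed average adjusted for a
    linear trend" and "adjusted for seasonality".
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")

    # Initial level: mean of the first season.
    level = sum(series[:season_len]) / season_len
    # Initial trend: average per-step change between the first two seasons.
    second_mean = sum(series[season_len:2 * season_len]) / season_len
    trend = (second_mean - level) / season_len
    # Initial seasonal offsets: deviation of each first-season point from the level.
    seasonal = [series[i] - level for i in range(season_len)]

    for t, y in enumerate(series):
        s = seasonal[t % season_len]
        prev_level = level
        # Smoothed average of the de-seasonalised value and the trend-adjusted level.
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        # Linear trend update.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Seasonal offset update for this position in the cycle.
        seasonal[t % season_len] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % season_len]
            for h in range(horizon)]


# Hypothetical quarterly revenue (illustrative numbers, not the paper's data).
quarterly_sales = [120.0, 135.0, 150.0, 110.0, 130.0, 146.0, 163.0, 121.0]
forecast = holt_winters_additive(quarterly_sales, season_len=4, horizon=4)
print([round(v, 1) for v in forecast])
```

With `season_len=4` the seasonal offsets repeat every four quarters, matching the quarterly granularity of Figure 3. A monthly seasonal effect such as the one reported for Figure 5 would correspond to `season_len=12`.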
E. Prediction
Machine learning techniques can be applied across disciplines. Machine learning uses statistics to solve many classification and clustering problems. ML algorithms are classified into three categories [15]: supervised, unsupervised and semi-supervised. In this paper we discuss three machine learning algorithms that can be applied to prediction: Generalized Linear Model (GLM), Decision Tree (DT) and Gradient Boost Tree (GBT).

[...] (3)

Generalized linear models provide estimates of the regression coefficients and estimated asymptotic standard errors of the coefficients. Usually the dispersion parameter in a GLM is fixed to the value 1 [18].

2) Decision Tree
A decision tree is a classifier expressed as a recursive partition of the instance space. It is a powerful form of multiple-variable analysis and a strong data mining tool. Its applications are found in various domains. The approach represents the factors involved in achieving a predetermined goal, together with the ways and means of implementation [14]. Let the objective be denoted by (O), let (Ci) be the ways to follow, and let (Mij) be the means of action corresponding to these ways, which can be noted by qi (i = 1, …, n) and which meet the relation

[...] (4)
[...] (5)
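To make the "recursive partition of the instance space" concrete, the following is a minimal sketch of a classification tree in plain Python. It assumes Gini impurity as the split criterion and a `max_depth` stopping rule, neither of which is stated in the paper. The helper names (`gini`, `best_split`, `build_tree`, `predict`) and the toy records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels, max_depth=3):
    """Recursively partition the records; leaves hold the majority class."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, threshold = split
    left_idx = [i for i, r in enumerate(rows) if r[f] <= threshold]
    right_idx = [i for i, r in enumerate(rows) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left_idx],
                           [labels[i] for i in left_idx], max_depth - 1),
        "right": build_tree([rows[i] for i in right_idx],
                            [labels[i] for i in right_idx], max_depth - 1),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical records: (unit price, quantity sold) -> sales class.
rows = [(10, 1), (12, 2), (11, 8), (30, 1), (35, 7), (32, 9)]
labels = ["low", "low", "high", "low", "high", "high"]
tree = build_tree(rows, labels)
print(tree)
print(predict(tree, (20, 5)))
```

On this toy data a single split on quantity separates the two classes, so the tree stops after one level. Real sales data would normally need deeper trees and some form of pruning or depth limit to avoid overfitting.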
[...] training method, which is required for adding a new weak learner to the model; here the weak learner is a decision tree [19].

Let F(x) be the full model after t-1 rounds and h(x) the new tree added to the model.

$F_0 = 0$ (6)

Table III (fragment): GBT 98 2 50 50 0.962

IV. RESULT AND ANALYSIS

The performance of the classification algorithms is assessed mainly by the overall classification accuracy, the accuracy in each class, and the confusion matrix, which shows the number of predictions for each class and can be compared with the number of instances of each class. Root Mean Square Error, Mean Square Error and Absolute Error are calculated, and the average of these errors is shown in Table III as the Error Rate. This measure helps to identify how wrong a given prediction is on average.

The comparison of the three algorithms based on prediction performance is given in Table 1 and visualized in Figure 9. Based on this performance, the Gradient Boost Algorithm shows 98% overall accuracy, followed by the Decision Tree algorithm with nearly 71% overall accuracy and the Generalized Linear Model with 64% accuracy. Based on the empirical evaluation of the three chosen algorithms, the best fit for the model is the Gradient Boosted Tree. The classification accuracy rate can in principle reach 100%, but the GBT model analyzed and shown in Table III achieved approximately 98% accuracy. Further improvement of the GBT implementation, supported by a strong dataset and by models such as Grabit and Tobit as analyzed in [20], is expected to yield a better accuracy rate.

V. CONCLUSION

The researchers conclude that an intelligent sales prediction system is required for business organizations to handle enormous volumes of data. Business decisions depend on the speed and accuracy of data processing techniques. The machine learning approaches highlighted in this paper can provide an effective mechanism for data tuning and decision making. To be competitive, organizations need to equip themselves with modern approaches that accommodate different types of customer behavior by forecasting attractive sales turnover. In our study, we used almost 85,000 records for the comparison of algorithms. Because the execution time was very long and managing such a large set of records is complex, some of the records were discarded during the analysis phase. At the same time, the fields and attributes used in this analysis were insufficient for further analysis. This was the major challenge we faced during the research. Nevertheless, we evaluated our work thoroughly by implementing efficient ML techniques for prediction and forecasting. The current study can be taken further by using Big Data as a tool for predictive analytics in sales forecasting. Big data analysis and forecasting are regarded as vital fields in the modern business scenario.
REFERENCES
[1] Huang, Q., & Zhou, F. (2017, March). Research on retailer data clustering algorithm based on spark. In AIP Conference Proceedings (Vol. 1820, No. 1, p. 080022). AIP Publishing.
[2] Saylı, A., Ozturk, I., & Ustunel, M. (2016). Brand loyalty analysis system using K-Means algorithm. Journal of Engineering Technology and Applied Sciences, 1(3).
[3] Maingi, M. N. A Survey on the Clustering Algorithms in Sales Data Mining.
[4] Sastry, S. H., Babu, P., & Prasada, M. S. (2013). Analysis & Prediction of Sales Data in SAP-ERP System using Clustering Algorithms. arXiv preprint arXiv:1312.2678.
[5] Shrivastava, V., & Arya, N. (2012). A study of various clustering algorithms on retail sales data. Int. J. Comput. Commun. Netw, 1(2).
[6] Rajagopal, D. (2011). Customer data clustering using data mining technique. arXiv preprint arXiv:1112.2663.
[7] Tsai, C. F., Wu, H. C., & Tsai, C. W. (2002). A new data clustering approach for data mining in large databases. In Parallel Architectures, Algorithms and Networks, 2002. I-SPAN'02. Proceedings. International Symposium on (pp. 315-320). IEEE.
[8] Mann, A. K., & Kaur, N. (2013). Review paper on clustering techniques. Global Journal of Computer Science and Technology.
[9] Shah, N., Solanki, M., Tambe, A., & Dhangar, D. Sales Prediction Using Effective Mining Techniques.
[10] Korolev, M., & Ruegg, K. (2015). Gradient Boosted Trees to Predict Store Sales.
[11] Jain, A., Menon, M. N., & Chandra, S. Sales Forecasting for Retail Chains.
[12] Rey, T. D., Wells, C., & Kauhl, J. (2013). Using data mining in forecasting problems. In SAS Global Forum 2013: Data Mining and Text Analytics.
[13] Huang, W., Zhang, Q., Xu, W., Fu, H., Wang, M., & Liang, X. (2015). A Novel Trigger Model for Sales Prediction with Data Mining Techniques. Data Science Journal, 14.
[14] Ethem Alpaydin. (2004). Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press.
[15] Lytvynenko, T. I. (2016). Problem of data analysis and forecasting using decision trees method.
[16] Lazăr, C., & Lazăr, M. (2015). Using the Method of Decision Trees in the Forecasting Activity. Petroleum-Gas University of Ploiesti Bulletin, Technical Series, 67(1).
[17] Flesch, B., Vatrapu, R., Mukkamala, R. R., & Hussain, A. (2015, October). Social set visualizer: A set theoretical approach to big social data analytics of real-world events. In Big Data (Big Data), 2015 IEEE International Conference on (pp. 2418-2427). IEEE.
[18] Asooja, K., Bordea, G., Vulcu, G., & Buitelaar, P. (2016). Forecasting Emerging Trends from Scientific Literature. In LREC.
[19] Stearns, B., Rangel, F., Rangel, F., de Faria, F. F., Oliveira, J., & Ramos, A. A. D. S. (2017). Scholar Performance Prediction using Boosted Regression Trees Techniques. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). Citeseer.
[20] Sigrist, F., & Hirnschall, C. (2018). Gradient Tree-Boosted Tobit Models for Default Prediction.