Employee Attrition Using Machine Learning and Depression Analysis
I. INTRODUCTION
Employee turnover [1] can be described as a constant decline in the workforce due to retirement, death, or resignation. Every organization needs a certain percentage of attrition to ensure its growth. Positive attrition is considered beneficial, as it generally results in incapable and less productive employees leaving the organization. Very low attrition rates lead to a stagnation of ideas in the workplace, because the organization misses the intellectual growth that comes from exposure to the ideas of new recruits. High attrition rates are expensive for the corporation, which invests time, money, and assets in training employees for their jobs. When employees quit, the corporation incurs considerable losses. Companies face an uphill task, as they must manage recruitment and the training of recruits as well as talent loss due to industry attrition trends. Negative attrition implies a larger, more severe problem inside an organization, in which high-performing employees quit the company in search of better avenues. The losses incurred when an efficient employee quits are not limited to advanced product beliefs, admirable project administration, or links with the customers. This can have a detrimental effect on companies, as their productivity decreases considerably, which hampers the organization's morale. According to the global professional services firm Towers Watson, attrition in India is relatively high at 14%, compared with the global and Asia-Pacific rates (11.20% and 13.81%, respectively) [2]. The employee churn rate is influenced by several factors such as age, salary, and job satisfaction. The key takeaway from the considerable employee attrition rate is that the corporate world is suffering.
Mental health problems [3] affect many employees, a fact that is usually disregarded because these problems tend to be hidden at work. The mental health disorder that has been studied best in the workplace is depression. Recent studies have shown that employers lose approximately
Authorized licensed use limited to: Institute of Technology (Nirma University). Downloaded on September 28,2021 at 18:46:43 UTC from IEEE Xplore. Restrictions apply.
Proceedings of the Fifth International Conference on Intelligent Computing and Control Systems (ICICCS 2021)
IEEE Xplore Part Number: CFP21K74-ART; ISBN: 978-0-7381-1327-2
$44 billion each year due to employees with clinical depression. Depression is a constant feeling of sadness and loss of interest in everyday activities, which differs from the mood fluctuations people experience in daily life. A recent survey-based study reported that about 6% of employees exhibit symptoms of depression in any given year. In addition, employees may be fatigued at work and show signs of presenteeism and absenteeism. Depression may also mar judgment and hamper decision-making. When depression is realistically addressed in the workplace, it promises to lower presenteeism, absenteeism, and medical costs and to increase productivity. This paper shows that it is possible to anticipate employee attrition and an employee's mental state in the corporate sector. Such predictions will help top-level management take preemptive measures, such as exploring approaches to retain staff, appointing new people, or training beforehand. Furthermore, they would help management take steps to improve mental health in the workplace.
II. RELATED WORK
In recent years, researchers have successfully developed many depression analysis and employee attrition prediction models that can classify expressions, gender, and many other features.
3. Dilip Singh Sisodia et al. (2017) [6] used the HR analytics dataset sourced from Kaggle to build a model that predicts the employee churn rate. A correlation matrix and heatmap were generated to show the relations between the attributes. In the experimental section, histograms were created to compare employees who left against compensation, department, satisfaction level, etc. They used various machine learning algorithms such as Support Vector Machine (SVM), Random Forest Classifier (RFC) [12], K-Nearest Neighbor (KNN) [14], and Naïve Bayes classifier [15] for prediction purposes. Based on this research, we have incorporated the same algorithms for training and testing our model.
4. S. S. Alduayj et al. (2018) [7] used a three-step experimental process to predict employee attrition. In the first experiment, they trained SVM with various kernel functions, KNN, and Random Forest on the original imbalanced dataset. They then concentrated on reducing the class imbalance using the adaptive synthetic (ADASYN) approach and retrained the above-mentioned machine learning models on the new dataset. Furthermore, they undersampled the data to achieve a balance between classes. Finally, training KNN with K = 3 on an ADASYN-balanced dataset resulted in the highest performance, with an F1-score of 0.93. They achieved an F1-score of 0.909 while using 12 features out of 29 with Random Forest Classifier and feature selection. The essential idea of ADASYN [11] is to use a weighted distribution for different minority class examples according to their level of difficulty in learning: more synthetic data is generated for minority class examples that are harder to learn than for those that are easier to learn.
7. A study conducted by A. R. Subhani et al. (2017) [10] found that the human brain is the organ most affected by stress. This study can be applied to learn the changes and stress that a person with mental illnesses like depression, anxiety, etc., goes through. Several features can be extracted from the signal analysis of the Electroencephalogram (EEG) of the affected person. Classification of the extracted features using algorithms like Decision Tree
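The ADASYN weighting idea summarized in item 4 above can be illustrated with a small sketch. This is not the implementation used in [7] or [11]; it is a minimal pure-Python illustration under stated assumptions. The function names `adasyn_counts` and `adasyn_sample`, the toy 2-D points, the use of Euclidean distance, and the choice of interpolation partners among the k nearest minority neighbours are all choices made here for illustration.

```python
import math
import random


def adasyn_counts(minority, majority, k=3, beta=1.0):
    """Number of synthetic points to create for each minority example.

    Hypothetical helper for illustration. An example whose k nearest
    neighbours contain more majority points is treated as harder to learn
    and receives a larger share of the synthetic data.
    """
    # Label 1 marks a majority point, 0 a minority point.
    data = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    ratios = []
    for i, x in enumerate(minority):
        # Minority points come first in `data`, so index i is x itself.
        others = [(math.dist(x, p), label)
                  for j, (p, label) in enumerate(data) if j != i]
        others.sort(key=lambda pair: pair[0])
        # Share of majority points among the k nearest neighbours.
        ratios.append(sum(label for _, label in others[:k]) / k)
    total = sum(ratios)
    if total == 0:
        return [0] * len(minority)  # no minority example borders the majority
    # beta = 1.0 asks for a fully balanced dataset after oversampling.
    n_needed = (len(majority) - len(minority)) * beta
    return [round(r / total * n_needed) for r in ratios]


def adasyn_sample(minority, majority, k=3, beta=1.0, seed=0):
    """Create synthetic minority points by interpolating towards neighbours."""
    rng = random.Random(seed)
    counts = adasyn_counts(minority, majority, k, beta)
    synthetic = []
    for i, (x, g) in enumerate(zip(minority, counts)):
        # Assumption: partners are the k nearest *minority* neighbours of x.
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: math.dist(x, p))[:k]
        for _ in range(g):
            z = rng.choice(neighbours)
            lam = rng.random()  # position on the segment from x to z
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic


# Toy data (made up): three minority points, seven majority points.
minority = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0)]
majority = [(3.1, 3.0), (2.9, 3.2), (3.2, 2.8),
            (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (6.0, 6.0)]
counts = adasyn_counts(minority, majority)
synthetic = adasyn_sample(minority, majority)
```

In this toy data the minority point at (3.0, 3.0) sits inside a cluster of majority points, so it is allotted more synthetic points than either of the two minority points near the origin.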
Fig. 5. Correlation between Equal Opportunities and Equal Work Distribution
Fig. 6 shows the correlation between work-life balance and stress at work. For Rating 4, stress is comparatively high and balance is low, so the attrition rate is higher. The average rate of attrition increases with the rating, as expected. The graph suggests that as balance in life decreases, stress increases.
Depression analysis will be done for employees, and the HR Dept. will analyze their mental health. Fig. 7 shows the effect of the depression level on attrition. As the depression level goes higher, the chances of attrition are higher. This indicates
IV. RESULT
This paper demonstrates the use of various classification algorithms, namely Support Vector Machines (SVM), Decision Tree Classifier (DTC), Random Forest Classifier (RFC), Gaussian Naive Bayes (GNB), Logistic Regression (LR), and K-Nearest Neighbors (KNN), for predicting employee attrition in an organization.
A comparative study was performed using six different classification algorithms to enhance accuracy. After training all models, their accuracies were compared. The Random Forest Classifier tops the list with an accuracy of 86.00%, followed by Gaussian Naive Bayes (GNB) at 81.40%.
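The comparison described above can be made concrete with a small sketch of how accuracy, F1, and R-squared scores might be computed and ranked for several classifiers. The label vector and the two prediction vectors below are made-up toy data, not the paper's dataset, and the function and model names are hypothetical. F1 is computed here for the positive (attrition) class, which is an assumption, since the averaging used in the paper is not stated.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred, positive=1):
    """F1 score of the positive class (assumed here to mean attrition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def r_squared(y_true, y_pred):
    """Coefficient of determination, applied directly to 0/1 labels."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rank_by_accuracy(y_true, predictions):
    """Return (model name, accuracy) pairs, best model first."""
    scores = {name: accuracy(y_true, y_pred)
              for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy labels: 1 = employee left, 0 = employee stayed (made-up values).
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "model_b": [0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
}
ranking = rank_by_accuracy(y_true, predictions)
```

With 0/1 labels, every wrong prediction adds 1 to the residual sum of squares, so a model with many errors can have a negative R-squared even though its accuracy is well above zero, as `model_b` illustrates.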
Fig. 8. Comparison of Accuracies
Fig. 8 shows a bar graph of the accuracies achieved by the different algorithms we used to analyze the attrition rate. Our results indicate that the Random Forest Classifier (RFC) achieved the highest accuracy because the features and the correlations between them are better suited to RFC. The rest of the models exhibited comparatively lower accuracy. Table I shows the details of the models' accuracy and their mean scores.
TABLE I
COMPARISON OF ACCURACY SCORES
Algorithm                  Accuracy   R Square Score   F1 Score
Random Forest Classifier   86.00%     0.356            0.8599
Gaussian NB                81.40%     0.173            0.814
SVM                        80.04%     0.099            0.8004
Logistic Regression        79.60%     0.062            0.796
Decision Tree Classifier   68.00%     -0.470           0.68
KNN                        67.60%     -0.488           0.676
V. FUTURE SCOPE
1) All the major companies and government institutions can make use of our product. This product can be implemented across various sectors like Finance, Education, IT, etc. The product can be custom-built according to the different needs of the sectors.
VI. CONCLUSION
Our model considers various features and predicts attrition and mental health for an individual employee with an accuracy of 86.0%, which is higher than the existing solutions using the same algorithm.
The dataset that we have used has independent as well as correlated attributes. Support Vector Machines (kernel = 'poly') are better suited to non-linear problems but perform poorly on a dataset with a large number of features. Naive Bayes (Gaussian) also works well on non-linear datasets with many features, but it requires all the features to be independent of each other, whereas there is some correlation in our case. Finally, Random Forest classification works well under such conditions, where there is correlation between features and the number of features is large. For these reasons, the Random Forest Classifier gives higher accuracy than Naive Bayes, followed by SVM. The remaining algorithms need the dataset to be linear, which is not the case, and hence their accuracy is comparatively low.
Using this model, employers and HR staff can be aware of their employees' mental health and take appropriate steps to prevent attrition. The HR Dept. can focus on employees who need therapy. With this model's help, business organizations can ensure that their employees work in a positive atmosphere without compromising the business's productivity and efficiency.
REFERENCES
[1] K Sunanda (2017), AN EMPIRICAL STUDY ON EMPLOYEE ATTRITION IN IT INDUSTRIES- WITH SPECIFIC REFERENCE TO WIPRO TECHNOLOGIES Paper 15.pdf (researchersworld.com) (Online).
[2] Talapatra, Pradip & Rungta, Saket & Anne, Jagadeesh. (2016). EMPLOYEE ATTRITION AND STRATEGIC RETENTION CHALLENGES IN INDIAN MANUFACTURING INDUSTRIES: A CASE STUDY. VSRD International Journal of Business and Management Research. VI. 251-262.
[3] Mental health problems in the workplace - Harvard Health (Online).
[4] Saidi, S. B. Othman and S. B. Saoud, "Hybrid CNN-SVM classifier for efficient depression detection system," 2020 4th International Conference on Advanced Systems and Emergent Technologies (IC_ASET), Hammamet, Tunisia, 2020, pp. 229-234, doi: 10.1109/IC_ASET49463.2020.9318302.
[5] Wongkoblap A, Vadillo MA, Curcin V. Modeling Depression Symptoms from Social Network Data through Multiple Instance Learning. AMIA Jt Summits Transl Sci Proc. 2019;2019:44-53. Published 2019 May 6.
[6] D. S. Sisodia, S. Vishwakarma and A. Pujahari, "Evaluation of machine learning models for employee churn prediction," 2017 International Conference on Inventive Computing and Informatics (ICICI), Coimbatore, 2017, pp. 1016-1020, doi: 10.1109/ICICI.2017.8365293.
[7] S. Alduayj and K. Rajpoot, "Predicting Employee Attrition using Machine Learning," 2018 International Conference on Innovations in Information Technology (IIT), Al Ain, United Arab Emirates, 2018, pp. 93-98, doi: 10.1109/INNOVATIONS.2018.8605976.
[8] Wai, H.A., Chan, K.C.C., Yao, X.: A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Trans. Evol. Comput. 7(6), 532–545 (2003).
[9] M. Deshpande and V. Rao, "Depression detection using emotion artificial intelligence," 2017 International Conference on Intelligent Sustainable Systems (ICISS), Palladam, India, 2017, pp. 858-862, doi: 10.1109/ISS1.2017.8389299.
[10] A. R. Subhani, W. Mumtaz, M. N. B. M. Saad, N. Kamel and A. S. Malik, "Machine Learning Framework for the Detection of Mental Stress at Multiple Levels," in IEEE Access, vol. 5, pp. 13545-13556, 2017, doi: 10.1109/ACCESS.2017.2723622.
[11] Haibo He, Yang Bai, E. A. Garcia and Shutao Li, "ADASYN: Adaptive synthetic sampling approach for imbalanced learning," 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 2008, pp. 1322-1328, doi: 10.1109/IJCNN.2008.4633969.
[12] Parmar, Aakash & Katariya, Rakesh & Patel, Vatsal. (2019). A Review on Random Forest: An Ensemble Classifier. 10.1007/978-3-030-03146-6_86.
[13] Evgeniou, Theodoros & Pontil, Massimiliano. (2001). Support Vector Machines: Theory and Applications. 2049. 249-257. 10.1007/3-540-44673-7_12.
[14] Zhang, Shichao & Deng, Zhenyun & Cheng, Debo & Zong, Ming & Zhu, Xiaoshu. (2016). Efficient kNN Classification Algorithm for Big Data. Neurocomputing. 195. 10.1016/j.neucom.2015.08.112.
[15] Berrar, Daniel. (2018). Bayes' Theorem and Naive Bayes Classifier. 10.1016/B978-0-12-809633-8.20473-1.
[16] Patel, Harsh & Prajapati, Purvi. (2018). Study and Analysis of Decision Tree-Based Classification Algorithms. International Journal of Computer Sciences and Engineering. 6. 74-78. 10.26438/ijcse/v6i10.7478.