Proceedings of the Fifth International Conference on Intelligent Computing and Control Systems (ICICCS 2021)

IEEE Xplore Part Number: CFP21K74-ART; ISBN: 978-0-7381-1327-2

Employee Attrition Using Machine Learning and Depression Analysis

Mr Richard Joseph, Asst Prof., Dept. of Computer Engineering, VESIT, Mumbai, India
Mr Shreyas Udupa, Dept. of Computer Engineering, VESIT, Mumbai, India
Mr Sanket Jangale, Dept. of Computer Engineering, VESIT, Mumbai, India
Mr Kunal Kotkar, Dept. of Computer Engineering, VESIT, Mumbai, India
Mr Parthesh Pawar, Dept. of Computer Engineering, VESIT, Mumbai, India

2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS) | 978-1-6654-1272-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICICCS51141.2021.9432259

Abstract—Among the significant issues that corporate leaders have to deal with in an organization is the loss of proficient employees. This loss is primarily attributed to extreme work pressure, dissatisfaction at work, and ignored mental health issues such as depression and anxiety. It is known as Employee Attrition or Churn Rate. Given the amount of stress employed people go through, a focus on their state of mind has gained much-needed traction. Our model aims to predict the employee attrition rate and to assess the emotional state of employees in an organization. A survey containing attrition-related questions helped us gather the required data for analysis. Our model predicts attrition and gives the depression analysis with the help of this data. Algorithms such as Decision Tree Classifier (DTC), Support Vector Machine (SVM) and Random Forest Classifier (RFC) were applied to this dataset after performing preprocessing steps, which helped us achieve an accuracy of 86.0% in predicting attrition rate. The results are expressed using the primary classification metrics, including F1-score and accuracy.

Index Terms—Attrition, Depression, Support Vector Machine, Random Forest

Fig. 1. Attrition Trends in industries

I. INTRODUCTION
Employee turnover [1] can be described as a constant decline in the workforce due to retirement, death, or resignation. Every organization needs a certain percentage of attrition to ensure its growth. Positive attrition is considered beneficial, as it generally results in incapable and less productive employees leaving the organization. Meagre attrition rates result in the stagnation of ideas in the workplace, since they do not promote the intellectual growth that comes from exposure to the fresh ideas of new recruits. High attrition rates prove exorbitant for a corporation, as it invests time, money, and assets in training employees to prepare them for their jobs. When employees quit, the corporation incurs considerable losses. Companies face an uphill task, as they must manage recruiting and training recruits along with the talent loss due to industry attrition trends. Negative attrition implies a larger, more severe problem inside an organization, when high-performing employees quit the company in search of better avenues. The losses incurred when an efficient employee quits are not limited to advanced product beliefs, admirable project administration or links with the customers. This can have a detrimental effect on companies, as their productivity decreases considerably, which hampers the organization's morale. According to global professional services firm Towers Watson, attrition in India is relatively high at 14%, compared with the global and Asia Pacific rates (11.20% and 13.81%, respectively) [2]. Employee churn rate is influenced by several aspects like age, salary, job satisfaction, etc. The elemental takeaway from the considerable employee attrition rate is that the corporate world is getting afflicted.
Mental health problems [3] affect many employees, a fact that is usually disregarded because these problems tend to be hidden at work. The most commonplace mental health disorder that has been studied best in the workplace is depression. Recent studies have shown that employers lose approximately

$44 billion each year due to employees with clinical depression. Depression is a constant feeling of sadness and loss of interest in everyday activities, which differs from the mood fluctuations people experience in daily life. A recent survey-based study reported that about 6% of employees exhibit symptoms of depression in any given year. Besides, affected employees may be fatigued at work and show signs of presenteeism and absenteeism. Depression may also mar judgment and hamper decision-making. When depression is realistically addressed in the workplace, it promises to lower presenteeism, increase productivity, lower absenteeism and lower medical costs. This paper shows that it is possible to anticipate employee attrition and an employee's mental state in the corporate sector. Such predictions will help top-level management take preemptive measures and explore various approaches to retaining their staff, appointing new people or training beforehand. Furthermore, they would help management take steps to improve mental health in the workplace.

II. RELATED WORK
In recent years, researchers have successfully developed many depression analysis and employee attrition prediction models that can classify expressions, gender, and many other features.

1. Afef Saidi et al. (2020) [4] presented an innovative audio-based method to detect depression using a hybrid model. Their model combines convolutional neural networks (CNN) and Support Vector Machines (SVM) [13], where the SVM is deployed in place of the fully connected layers of the CNN. Feature extraction was performed using the CNN, and classification was performed using the SVM. They achieved an accuracy of 68% using the hybrid model, compared with 58.57% achieved with the CNN model.

2. Akkapon Wongkoblap et al. (2019) [5] collected Facebook users' data from 2007 to 2012 to build a predictive model for detecting symptoms of depression. They used multiple instance learning with neural networks to create their predictive model. This enabled them to develop their model using a few labelled bags instead of requiring labels for all of the instances used. They achieved a maximum accuracy of 74.51% and a precision of 80% in detecting depressed users based on the content of their social network accounts.

3. Dilip Singh Sisodia et al. (2017) [6] used the HR analytics dataset sourced from Kaggle to build a model that predicts employee churn rate. A correlation matrix and heatmap were generated to show the relation between the attributes. In the experimental section, histograms were created to compare employees who left against compensation, department, satisfaction level, etc. They used various machine learning algorithms such as Support Vector Machine (SVM), Random Forest Classifier (RFC) [12], K-Nearest Neighbor (KNN) [14], and Naïve Bayes classifier [15] for prediction purposes. Based on this research, we have incorporated the same algorithms for training and testing our model.

4. S. S. Alduayj et al. (2018) [7] used a three-step experimental process to determine the employee attrition rate. In the first experiment, they trained SVM with various kernel functions, KNN, and Random Forest on the original imbalanced dataset. Next, they concentrated on reducing class imbalance using the adaptive synthetic (ADASYN) approach and retrained the above-mentioned machine learning models on the new dataset. Furthermore, they performed undersampling of the data to achieve a balance between classes. Training on the ADASYN-balanced dataset with KNN (K = 3) resulted in the highest performance, with a 0.93 F1-score. They also achieved an F1-score of 0.909 while using 12 features out of 29, using Random Forest Classifier and feature selection. The essential idea of ADASYN [11] is to use a weighted distribution for different minority class examples according to their level of difficulty in learning, where more synthetic data is generated for minority class examples that are harder to learn than for those that are easier to learn.

5. A carrier company commissioned the integration of an algorithm named Data Mining by Evolutionary Learning (DMEL) (2003) [8]. Its prime objective was to anticipate the consumer attrition rate and the chances of consumers leaving. It was established that if a consumer is about to leave, the company can offer a set of loyalty programs, including special offers and discounts, to retain the consumer. Applying the model to real-time data showed accurate results, producing interesting classification rules and distinct attrition rates. Using this concept, our model will give the employer/admin suggestions to retain employees on the verge of quitting.

6. M. Deshpande et al. (2017) [9] implemented emotional analysis based on Twitter feeds, primarily focusing on depression. The feed was classified as negative or neutral, based on a specially constructed list of words depicting depression tendencies. They adopted a unique approach to conducting their experiments. By implementing a Naive Bayes classifier and SVM, they achieved a maximum accuracy of 83.0% in predicting depression tendencies.

7. A study conducted by A. R. Subhani et al. (2017) [10] found that the human brain is the organ most affected by stress. This study can be applied to learn the changes and stress that a person with mental illnesses like depression, anxiety, etc., goes through. Several features can be extracted from the signal analysis of the affected person's electroencephalogram (EEG). Classification of the extracted features using algorithms like Decision Tree

Classifier (DTC) [16], Logistic Regression, etc., provided results that were used to differentiate between stress levels, which helped identify psychological disorders.

III. METHODOLOGY
The following flowchart (Fig. 2) represents the course of action for the development of our model.

A. Model Training
We have considered some of the features in our dataset, which include work stress, equality at the workplace, job satisfaction, overtime, working hours, travel, personal life, etc. After gathering the data and performing correlation analysis, it was realized that certain features contributed only marginally towards the model's accuracy. Hence, attributes such as the stature of the company, canteen facilities, campus environment, etc., were discarded, and the subset mentioned above was taken into consideration. A significant feature in our subset is the analysis performed on the depression questionnaire, which contains Goldberg's Depression Questionnaire. Furthermore, the dataset undergoes preprocessing to make it suitable for model training.
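The correlation-based screening described above can be illustrated with a small, dependency-free sketch. The paper does not state the criterion that was actually used, so the Pearson coefficient against the attrition label, the 0.1 cut-off, the column names and the toy ratings below are all assumptions made for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def screen_features(columns, target, threshold=0.1):
    """Keep features whose absolute correlation with the target meets the threshold."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


# Hypothetical survey ratings (1-5) for eight respondents; attrition coded 0/1.
survey = {
    "work_stress":      [5, 4, 2, 1, 5, 2, 4, 1],
    "job_satisfaction": [1, 2, 4, 5, 1, 4, 2, 5],
    "canteen_rating":   [3, 3, 3, 3, 3, 3, 3, 3],
}
attrition = [1, 1, 0, 0, 1, 0, 1, 0]

kept, scores = screen_features(survey, attrition)
print(kept)    # features that pass the (assumed) threshold
print(scores)  # correlation of every feature with attrition
```

In this toy data the constant canteen rating is dropped while the stress and satisfaction ratings are kept, which mirrors the kind of pruning the section describes.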
After preprocessing, the dataset is split into 75% training data and 25% test data. Different machine learning algorithms, namely Support Vector Machines (SVM), Decision Tree Classifier (DTC), Random Forest Classifier (RFC), Gaussian Naive Bayes (GNB), Logistic Regression (LR), and K-Neighbors (KNN), are then trained to determine which algorithm results in the best accuracy. To better analyze the results and show the correlation between different features, various graphs have been plotted.
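The split-train-compare procedure can be sketched as follows. This is not the authors' code: to stay dependency-free it implements only two of the six classifiers (Gaussian Naive Bayes and k-nearest neighbours) from scratch, and the two-feature synthetic data, the seeds and all function and class names are assumptions.

```python
import math
import random


def split_75_25(rows, labels, seed=0):
    """Shuffle the data and hold out 25% of it as the test set."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    n_test = len(rows) // 4
    test, train = order[:n_test], order[n_test:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])


class GaussianNB:
    """Gaussian Naive Bayes: one independent normal per feature and class."""

    def fit(self, rows, labels):
        self.model = {}
        for cls in sorted(set(labels)):
            members = [r for r, y in zip(rows, labels) if y == cls]
            stats = []
            for column in zip(*members):
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
                stats.append((mean, var))
            self.model[cls] = (len(members) / len(rows), stats)
        return self

    def _log_posterior(self, cls, row):
        prior, stats = self.model[cls]
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            total -= 0.5 * math.log(2 * math.pi * var) + (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, rows):
        return [max(self.model, key=lambda c: self._log_posterior(c, r)) for r in rows]


class KNeighbors:
    """k-nearest neighbours with Euclidean distance and majority vote."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, rows, labels):
        self.rows, self.labels = rows, labels
        return self

    def predict(self, rows):
        predictions = []
        for r in rows:
            ranked = sorted(range(len(self.rows)),
                            key=lambda i: math.dist(self.rows[i], r))
            votes = [self.labels[i] for i in ranked[:self.k]]
            predictions.append(max(set(votes), key=votes.count))
        return predictions


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in for the survey: (work_stress, job_satisfaction) ratings.
rng = random.Random(42)
rows, labels = [], []
for _ in range(100):
    rows.append((rng.gauss(4, 0.5), rng.gauss(2, 0.5)))  # employee who left
    labels.append(1)
    rows.append((rng.gauss(2, 0.5), rng.gauss(4, 0.5)))  # employee who stayed
    labels.append(0)

X_train, X_test, y_train, y_test = split_75_25(rows, labels)
candidates = [("GaussianNB", GaussianNB()), ("KNeighbors", KNeighbors(k=3))]
results = {
    name: accuracy(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in candidates
}
best = max(results, key=results.get)
print(results, "best:", best)
```

Every model is fitted on the same training portion and scored on the same held-out portion, so the accuracies are directly comparable. In practice a library such as scikit-learn would supply all six classifiers through the same fit/predict pattern.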
B. Observations
After developing the model on the dataset, we derived the
following observations:
• Correlation between the features.
• Dependency of attrition on the features.

Fig. 3 shows the relationship between Salary Hike and the Attrition Rate. The attrition rate decreases as the salary hike increases, i.e. there is a higher chance of attrition when the salary hike is low. This shows that the salary hike is an important feature that affects the employee attrition rate.
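A relationship of this kind can be read off a simple group-wise attrition rate. The sketch below assumes the salary hike has been binned into bands and attrition is coded 0/1. The band names and the twelve responses are hypothetical and are not taken from the survey.

```python
from collections import defaultdict


def attrition_rate_by_group(groups, left):
    """Return {group: percentage of employees in that group who left}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for group, has_left in zip(groups, left):
        totals[group] += 1
        leavers[group] += int(has_left)
    return {g: 100.0 * leavers[g] / totals[g] for g in sorted(totals)}


# Hypothetical responses: salary-hike band per employee and whether they left.
hike_band = ["low", "low", "low", "low",
             "medium", "medium", "medium", "medium",
             "high", "high", "high", "high"]
left = [1, 1, 1, 0,
        1, 1, 0, 0,
        1, 0, 0, 0]

rates = attrition_rate_by_group(hike_band, left)
print(rates)  # {'high': 25.0, 'low': 75.0, 'medium': 50.0}
```

The same function can be applied to any other categorical feature, such as a job satisfaction rating, to compare attrition across its levels.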

Fig. 3. Attrition v/s Salary Hike


Fig. 2. Flow Of Model
As we can infer from Fig. 4, the lower the job satisfaction, the higher the percentage of attrition. As the trends suggest, job

satisfaction has a substantial impact on the attrition rate. We can infer from the graph that job satisfaction is an important feature that drives the employee to continue working at the organization.

Fig. 4. Attrition v/s Job Satisfaction

From Fig. 5, it can be inferred that the two attributes, i.e. equal opportunities for everyone and equal distribution of work, are correlated. As expected, the average percentage of attrition when there are equal opportunities and equal work distribution is lower than in the other two cases.

Fig. 5. Correlation between Equal Opportunities and Equal Work Distribution

Fig. 6 shows a correlation between the balance of professional and personal life and stress at work. For Rating 4, the stress is comparatively high and the balance is low, so the attrition rate is higher. The average rate of attrition increases with the rating, as expected. The graph suggests that as balance in life decreases, stress increases.

Fig. 6. Correlation between Work-Life Balance and Stress

Depression analysis will be done for employees, and the HR Dept. will analyze their mental health. Fig. 7 shows the effect of the depression level on attrition. As the depression level goes higher, the chances of attrition are higher. This indicates that mental health is an important measure to be taken into consideration.

Fig. 7. Attrition v/s Depression

IV. RESULT
This paper demonstrates the use of various classification algorithms, namely Support Vector Machines (SVM), Decision Tree Classifier (DTC), Random Forest Classifier (RFC), Gaussian Naive Bayes (GNB), Logistic Regression (LR), and K-Neighbors (KNN), for predicting employee attrition rates in an organization.
A comparative study was performed using six different classification algorithms to find the most accurate one. After training all models, their accuracies were compared. Random Forest Classifier tops the list with an accuracy of 86.00%, followed by Gaussian Naive Bayes (GNB) at 81.40%.
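For reference, the metrics used to compare the models can be computed from a list of true and predicted labels as in the minimal sketch below. It assumes attrition is coded 0/1 and that the R Square Score is the usual coefficient of determination applied to those labels. The paper does not state how its scores were computed, and the ten labels here are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical test labels: 3 of 10 employees left (1), 7 stayed (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(acc, f1, r2)
```

With 0/1 labels, SS_res equals the number of misclassified samples, so R² reduces to 1 - (1 - accuracy) / (p(1 - p)), where p is the fraction of positive labels. Since p(1 - p) is at most 0.25, a classifier can be right well over half the time and still have a negative R², which is how low-accuracy models can end up with negative scores on this metric.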

Fig. 8. Comparison of Accuracies

Fig. 8 shows a bar graph of the accuracies achieved by the different algorithms we have used to analyze the attrition rate. Our results show that Random Forest Classifier (RFC) achieved the highest accuracy, as the features and the correlation between them are better suited to RFC. The rest of the models exhibited comparatively lower accuracy. Table I shows the details of the models' accuracy and their mean scores.

TABLE I
COMPARISON OF ACCURACY SCORES

Algorithm                  Accuracy   R Square Score   F1 Score
Random Forest Classifier   86.00%     0.356            0.8599
Gaussian NB                81.40%     0.173            0.814
SVM                        80.04%     0.099            0.8004
Logistic Regression        79.60%     0.062            0.796
Decision Tree Classifier   68.00%     -0.470           0.68
KNN                        67.6%      -0.488           0.676

V. FUTURE SCOPE
1) All major companies and government institutions can make use of our product. It can be implemented across various sectors like Finance, Education, IT, etc., and can be custom-built according to the different needs of each sector.
2) The features can be altered to include questions pertaining to work-from-home problems like the ease of access to internet connectivity, personal issues, living space, etc.
3) Emotional analysis can be extended to include other mental health disorders like anxiety, stress, etc.

VI. CONCLUSION
Our model considers various features and predicts attrition and mental health for an individual employee with an accuracy of 86.0%, which is higher than the existing solutions using the same algorithm.
The dataset that we have used has independent as well as correlated attributes. Support Vector Machines (kernel='poly') is better for non-linear problems but performs poorly on a dataset with a large number of features. Furthermore, Naive Bayes (Gaussian) also works well on non-linear datasets with many features, but it needs all the features to be independent of each other, while there is some correlation in our case. Finally, Random Forest Classification works well under such conditions, where there is a correlation between features and the number of features is large. For these reasons, Random Forest Classifier gives higher accuracy than Naive Bayes, followed by SVM. The remaining algorithms need the dataset to be linear, which is not the case, and hence their accuracy is comparatively low.
Using this model, employers and HR can be aware of their employees' mental health and take appropriate steps to prevent attrition. The HR Dept. can focus on employees that need therapy. With this model's help, business organizations can ensure that their employees work in a positive atmosphere without tainting the business's productivity and efficiency.

REFERENCES
[1] K Sunanda (2017), AN EMPIRICAL STUDY ON EMPLOYEE ATTRITION IN IT INDUSTRIES - WITH SPECIFIC REFERENCE TO WIPRO TECHNOLOGIES, Paper 15.pdf (researchersworld.com) (Online).
[2] Talapatra, Pradip & Rungta, Saket & Anne, Jagadeesh. (2016). EMPLOYEE ATTRITION AND STRATEGIC RETENTION CHALLENGES IN INDIAN MANUFACTURING INDUSTRIES: A CASE STUDY. VSRD International Journal of Business and Management Research. VI. 251-262.
[3] Mental health problems in the workplace - Harvard Health (Online).
[4] A. Saidi, S. B. Othman and S. B. Saoud, "Hybrid CNN-SVM classifier for efficient depression detection system," 2020 4th International Conference on Advanced Systems and Emergent Technologies (IC_ASET), Hammamet, Tunisia, 2020, pp. 229-234, doi: 10.1109/IC_ASET49463.2020.9318302.
[5] Wongkoblap A, Vadillo MA, Curcin V. Modeling Depression Symptoms from Social Network Data through Multiple Instance Learning. AMIA Jt Summits Transl Sci Proc. 2019;2019:44-53. Published 2019 May 6.
[6] D. S. Sisodia, S. Vishwakarma and A. Pujahari, "Evaluation of machine learning models for employee churn prediction," 2017 International Conference on Inventive Computing and Informatics (ICICI), Coimbatore, 2017, pp. 1016-1020, doi: 10.1109/ICICI.2017.8365293.
[7] S. Alduayj and K. Rajpoot, "Predicting Employee Attrition using Machine Learning," 2018 International Conference on Innovations in Information Technology (IIT), Al Ain, United Arab Emirates, 2018, pp. 93-98, doi: 10.1109/INNOVATIONS.2018.8605976.

[8] Wai, H.A., Chan, K.C.C., Yao, X.: A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Trans. Evol. Comput. 7(6), 532–545 (2003).
[9] M. Deshpande and V. Rao, "Depression detection using emotion artificial intelligence," 2017 International Conference on Intelligent Sustainable Systems (ICISS), Palladam, India, 2017, pp. 858-862, doi: 10.1109/ISS1.2017.8389299.
[10] A. R. Subhani, W. Mumtaz, M. N. B. M. Saad, N. Kamel and A. S. Malik, "Machine Learning Framework for the Detection of Mental Stress at Multiple Levels," in IEEE Access, vol. 5, pp. 13545-13556, 2017, doi: 10.1109/ACCESS.2017.2723622.
[11] Haibo He, Yang Bai, E. A. Garcia and Shutao Li, "ADASYN: Adaptive synthetic sampling approach for imbalanced learning," 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 2008, pp. 1322-1328, doi: 10.1109/IJCNN.2008.4633969.
[12] Parmar, Aakash & Katariya, Rakesh & Patel, Vatsal. (2019). A Review on Random Forest: An Ensemble Classifier. 10.1007/978-3-030-03146-6_86.
[13] Evgeniou, Theodoros & Pontil, Massimiliano. (2001). Support Vector Machines: Theory and Applications. 2049. 249-257. 10.1007/3-540-44673-7_12.
[14] Zhang, Shichao & Deng, Zhenyun & Cheng, Debo & Zong, Ming & Zhu, Xiaoshu. (2016). Efficient kNN Classification Algorithm for Big Data. Neurocomputing. 195. 10.1016/j.neucom.2015.08.112.
[15] Berrar, Daniel. (2018). Bayes' Theorem and Naive Bayes Classifier. 10.1016/B978-0-12-809633-8.20473-1.
[16] Patel, Harsh & Prajapati, Purvi. (2018). Study and Analysis of Decision Tree-Based Classification Algorithms. International Journal of Computer Sciences and Engineering. 6. 74-78. 10.26438/ijcse/v6i10.7478.
