(IJCST-V10I5P42): Mrs. R Jhansi Rani, Manchinti Pavan Kumar Reddy
ABSTRACT
Military personnel experience greater psychological stress and are at higher risk of suicide attempts than the
general population. High mental stress may cause suicide ideation, which is a crucial driver of suicide attempts.
However, traditional statistical methods have found only a moderate correlation between psychological stress and
suicide ideation in non-psychiatric individuals. This project uses six machine learning techniques (logistic
regression, decision tree, random forest, gradient boosting regression tree, support vector machine and multilayer
perceptron) to predict the presence of suicide ideation from six important psychological stress domains in military
males and females. The accuracies of all six machine learning methods exceed 98%. Among them, the multilayer
perceptron and support vector machine provide the best predictions of suicide ideation, at approximately 100%.
Compared with the conventional criterion of a BSRS-5 score ≥ 7, for the presence of suicide ideation ≥ 1 the
proposed algorithms improve accuracy, sensitivity, specificity, precision, the AUC of the ROC curve and the AUC of
the PR curve by up to 5.7%, 35.9%, 4.6%, 65.2%, 4.3% and 53.2%, respectively; for the presence of more severely
intense suicide ideation ≥ 2, the improvements are 6.1%, 26.2%, 5.8%, 83.5%, 2.8% and 64.7%, respectively.
Keywords: Military personnel, Machine Learning, Accuracy, Improvements, Support Vector Machine.
US military war veterans [2]. A meta-analysis showed consistently that the worldwide pooled prevalence of PTSD in
rescue workers was up to 10.0% [3]. Symptoms of mental disorders developed frequently in those with continued
combat exposure and those with repeated deployments [4]. The association between military absenteeism and mental
health problems has been discussed in [5]. The rate of suicide attempts among active-duty US Army personnel has
become increasingly higher than that among civilians [6]. According to an analysis of 27,501 military participants
in [7], 14.3% of survey respondents reported suicide ideation and 3.0% had attempted suicide. In other words, 21%
of those with suicide ideation had attempted suicide. Previous studies have revealed a relationship between suicide
ideation and psychological stress [8],[9]. Predicting the presence of suicide ideation early, and thereby
preventing suicidal behavior, is essential in the military.
With improvements in technology and the availability of various kinds of real-world big data, artificial
intelligence (AI) has grown rapidly. Researchers have put great effort into computerized algorithms for handling
big data. Machine learning, a combination of AI and computation, can provide accurate diagnoses of diseases and
predict outcomes [10]-[17]. For instance, circuits for seizure classification and detection by machine learning
are implemented in [18]. Recognition of heart murmurs can be achieved by deep neural networks [19]. In addition,
Ambale-Venkatesh et al. identified the top-20 risk factors for incident cardiovascular events using a random
survival forest, whose performance was better than that of conventional risk calculators [20]. Machine learning
and deep learning techniques have therefore become efficient and reliable tools for clinical practice by
physicians globally.
High mental stress may cause suicide ideation, which is a crucial driver of suicide attempts. However, traditional
statistical methods find merely a moderate correlation between psychological stress and suicide ideation. Machine
learning could provide better prediction of suicide ideation. In this paper, we apply several machine learning
techniques to a large sample of military members, taking the psychological stress dimensions into consideration
to predict the presence of suicide ideation. The schematic diagram of the proposed method is illustrated in
Fig. 1. A binary probabilistic classifier built with a machine learning algorithm can determine, from their
questionnaires, whether military personnel have suicide ideation. Machine learning thus provides an effective
means of early warning and suicide prevention through automatic detection of suicide ideation.
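To make the notion of a binary probabilistic classifier concrete, the following minimal sketch maps a 6-dimensional questionnaire vector to a probability and then thresholds it. It uses a logistic model as one possible choice; the function names, the weights, the bias and the two example vectors are hypothetical and are not fitted to any data from this paper.

```python
import math

def predict_proba(features, weights, bias):
    """Logistic model: estimated probability that the label is class 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, weights, bias, threshold=0.5):
    """Turn the probability into a binary decision (class 0 or class 1)."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical parameters and inputs, for illustration only
weights = [0.4, 0.3, 0.5, 0.2, 0.2, 0.1]
bias = -4.0
low_stress = [2, 0, 1, 0, 1, 0]
high_stress = [14, 3, 4, 2, 3, 2]

print(classify(low_stress, weights, bias), classify(high_stress, weights, bias))
```

The threshold of 0.5 is only a default; lowering it would trade specificity for sensitivity.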
In the proposed system, computerized algorithms are applied to large-scale data related to suicide attempts, and the
system is more powerful in identifying both psychological stress and suicide ideation. The six input factors of
psychological stress for machine learning are the BSRS-5 score, anxiety, depression, hostility, interpersonal
sensitivity and insomnia. This paper uses six machine learning techniques, namely logistic regression (LR),
decision tree (DT), random forest (RF), gradient boosting decision tree (GBDT), support vector machine (SVM) and
multilayer perceptron (MLP), to predict the presence of suicide ideation in military members. The system diagram
of the proposed method is illustrated in Fig. 2.
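As an illustration of how the six inputs might be organized, the sketch below defines a small record type and the conventional BSRS-5 score ≥ 7 screening rule that the learned models are compared against. The class name, field names and example values are assumptions made for this sketch; the paper does not specify a data layout.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class StressProfile:
    """The six psychological stress inputs (field names are hypothetical)."""
    bsrs5_score: float
    anxiety: float
    depression: float
    hostility: float
    interpersonal_sensitivity: float
    insomnia: float

    def to_vector(self):
        """Return the inputs as a 6-dimensional feature vector."""
        return list(astuple(self))

def bsrs5_baseline(profile, cutoff=7):
    """Conventional criterion: flag a respondent when the BSRS-5 score >= cutoff."""
    return 1 if profile.bsrs5_score >= cutoff else 0

# Hypothetical respondents, for illustration only
flagged = StressProfile(9, 2, 3, 1, 1, 2)
not_flagged = StressProfile(4, 1, 1, 0, 1, 1)
print(flagged.to_vector(), bsrs5_baseline(flagged), bsrs5_baseline(not_flagged))
```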
of more severely intense suicide ideation (suicide ideation ≥ 2), the numbers for training and test sets are 3191 (class
0: 3157, class 1: 34) and 355 (class 0: 347, class 1: 8), respectively. The imbalance between class 0 and class 1 in
the dataset is obvious. This problem is addressed by applying the synthetic minority over-sampling technique
(SMOTE): the class 1 training data are oversampled by SMOTE to 3080 and 3157 samples for the two predictions,
respectively.
Support vector machine (SVM) with a linear kernel (linear SVM) is used in our proposed method. A data point is
viewed as a 6-dimensional vector, and we separate such points with a hyperplane. The linear SVM constructs the
maximum-margin hyperplane so that the distance from it to the nearest training data point of either class (class 0
or class 1) is maximized. If the training set is not linearly separable, soft-margin SVM permits a wider ("fat")
decision margin, with some outliers lying inside or on the wrong side of the margin. Our method adopts soft-margin
SVM, which trades off training error against margin width. Regularization, imposed through a regularization term,
aims to fit the training data while avoiding over-fitting; the ℓ2-norm is used in the SVM of our method. The
regularization hyperparameter is optimized in our algorithm to control over-fitting.
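The two steps described above, oversampling the minority class and fitting a soft-margin linear SVM with ℓ2 regularization, can be sketched from scratch as follows. This is a simplified illustration rather than the authors' implementation: the SMOTE-style interpolation, the batch subgradient solver, the parameters `C`, `lr` and `epochs`, and the 2-dimensional toy data are all assumptions of the sketch (the paper works with 6-dimensional vectors).

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority points, each interpolated between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Soft-margin linear SVM by batch subgradient descent on
    0.5*||w||^2 + C*sum_i max(0, 1 - y_i*(w.x_i + b)), with y_i in {-1, +1}."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the l2 regularization term
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                # point lies inside the margin or on the wrong side: hinge term is active
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Map the sign of the decision value back to class 0 / class 1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

# Hypothetical toy data: 3 class-0 points and 2 class-1 points
class0 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
class1 = [[3.0, 3.0], [4.0, 3.0]]
new_points = smote(class1, n_new=len(class0) - len(class1), k=1)
class1_balanced = class1 + new_points  # classes now have equal size
X = class0 + class1_balanced
y = [-1] * len(class0) + [1] * len(class1_balanced)
w, b = train_linear_svm(X, y)
```

Here `C` plays the role of the regularization hyperparameter: a smaller `C` tolerates more margin violations in exchange for a wider margin.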
Random forest (RF), an ensemble machine learning technique, constructs multiple decision trees and combines them
for classification. The training algorithm adopted in our method for random forest is the bootstrap aggregating
(bagging) technique. RF builds multiple CART models with different samples and different initial variables. In each
decision tree, a random subset of the features is considered when splitting a node. The individual trees are not
correlated with each other, and thus the trees in the random forest of our method are not pruned. The final
prediction is determined by majority vote over the multiple DTs. RF combines the merits of feature selection and
bagging. The number of decision trees is the hyperparameter to be optimized.
Gradient boosting decision tree (GBDT) is also an ensemble machine learning method and constructs multiple additive
decision tree models. DTs are trained repeatedly, each fitting the pseudo-residuals (the gradient) of the cumulative
model built so far, to minimize the mean squared error. This sequential, stepwise procedure iteratively combines
weak learners (i.e., DTs here) into a single strong learner to increase prediction accuracy. Our algorithm uses the
maximum tree depth as the hyperparameter to be optimized to avoid over-fitting.
Multilayer perceptron (MLP) in our algorithm consists of an input layer, hidden layers and an output layer. In a
fully connected MLP, each node in one layer connects with a certain weight to every node in the following layer. In
forward propagation, the signal flows from the input layer through the hidden layers to the output layer. Learning
is carried out through backward propagation. The loss function consists of cross entropy and ℓ2-norm regularization
to prevent over-fitting. The Adam optimizer is adopted in our method. Besides the regularization hyperparameter, the
numbers of hidden layers, neurons and iterations are also used as hyperparameters to be optimized in our MLP
method.
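The gradient boosting procedure described in this section, in which each new tree is fitted to the pseudo-residuals of the cumulative model under squared error, can be sketched as follows. This is a minimal illustration under stated assumptions: the weak learners are depth-1 regression stumps (the paper instead tunes the maximum tree depth), and the names `fit_stump` and `fit_gbdt`, the learning rate `lr` and the toy data are hypothetical.

```python
def fit_stump(X, r):
    """Fit a depth-1 regression tree to targets r by minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(r) / len(r)
        return lambda x: mean
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] <= t else rm

def fit_gbdt(X, y, n_trees=20, lr=0.5):
    """Additive model F(x) = base + lr * sum of stumps fitted to residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        # Pseudo-residuals: negative gradient of 0.5 * (y - F)^2 with respect to F
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        pred = [pi + lr * tree(x) for pi, x in zip(pred, X)]
    return lambda x: base + lr * sum(tree(x) for tree in trees)

# Hypothetical 1-D toy data: label 0 for small inputs, label 1 for large inputs
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 1.0, 1.0]
model = fit_gbdt(X, y)
print(round(model([0.0]), 3), round(model([3.0]), 3))
```

Each round shrinks the remaining residual, which is the sense in which weak learners are combined into a stronger one; the output can be thresholded at 0.5 to obtain a class label.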
of DT and GBDT, the anxiety dimension has less importance than in the other three methods. For the screening
instrument proposed in [8] to predict suicide ideation, the sensitivities for the psychiatric group, community group
and general medical group are 83.76%, 21.57% and 10.57%, respectively, and the specificities for the three groups
are 72.17%, 99.49% and 99.88%, respectively. Compared with the conventional criterion of a BSRS-5 score ≥ 7 in [9],
for the presence of suicide ideation ≥ 1 the proposed algorithms improve accuracy, sensitivity, specificity,
precision, the AUC of the ROC curve and the AUC of the PR curve by up to 5.7%, 35.9%, 4.6%, 65.2%, 4.3% and 53.2%,
respectively; for the presence of more severely intense suicide ideation ≥ 2, the improvements are 6.1%, 26.2%,
5.8%, 83.5%, 2.8% and 64.7%, respectively. Instead of considering only the BSRS-5 score in screening for suicide
ideation, as in [8] and [9], our algorithm additionally takes as input variables the five psychopathological
domains related to the BSRS-5 score, i.e., anxiety, depression, hostility, interpersonal sensitivity and insomnia,
and our schemes incorporating machine learning techniques provide better results than those of [8] and [9].
In addition, we add several critical physiological data, including age, sex, body height, body weight, waist
circumference, heart rate, systolic blood pressure, diastolic blood pressure and physical activity, to the initial
inputs of the BSRS-5 score and the five related psychopathological domains in our proposed models. We find that
performance improves only when these nine physiological variables are incorporated into the logistic regression
model: the accuracy, sensitivity, specificity and precision of logistic regression all reach 100% for suicide
ideation ≥ 1 and 99.9% for suicide ideation ≥ 2. For the other five machine learning methods, however, performance
does not improve with these additional physiological inputs.
As the incidence of suicide attempts is relatively low, a meta-analysis reveals that the utility of suicide ideation
for predicting later suicide is limited by low positive predictive value and modest sensitivity [41]. Applying
machine learning techniques to the BSRS-5 score and the five related psychopathological domains to predict later
suicide attempts is left for future work.
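For reference, the screening metrics compared in this section can be computed from the four confusion-matrix counts as in the sketch below. The function name and the counts are hypothetical: the example assumes a test set with 347 class-0 and 8 class-1 cases, and the split into true and false predictions is invented purely to show how precision (positive predictive value) can stay low on imbalanced data even when accuracy is high.

```python
def screening_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and precision from confusion counts."""
    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when the denominator is 0
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall on class 1
        "specificity": ratio(tn, tn + fp),   # recall on class 0
        "precision": ratio(tp, tp + fp),     # positive predictive value
    }

# Hypothetical outcome: 7 of 8 class-1 cases found, 7 of 347 class-0 cases falsely flagged
metrics = screening_metrics(tp=7, fp=7, tn=340, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```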
Fig. 4 A set of learning curves for each of the six machine learning methods
Fig. 5 Feature importance for the machine learning methods in the proposed algorithm