Progress in Advanced Computing and Intelligent Engineering:
Proceedings of ICACIE 2019, Volume 2 Chhabi Rani Panigrahi
Advances in Intelligent Systems and Computing
Volume 1199
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,
Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications
on theory, applications, and design methods of Intelligent Systems and Intelligent
Computing. Virtually all disciplines such as engineering, natural sciences, computer
and information science, ICT, economics, business, e-commerce, environment,
healthcare, life science are covered. The list of topics spans all the areas of modern
intelligent systems and computing such as: computational intelligence, soft comput-
ing including neural networks, fuzzy systems, evolutionary computing and the fusion
of these paradigms, social intelligence, ambient intelligence, computational neuro-
science, artificial life, virtual worlds and society, cognitive science and systems,
Perception and Vision, DNA and immune based systems, self-organizing and
adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics
including human-machine teaming, knowledge-based paradigms, learning para-
digms, machine ethics, intelligent data analysis, knowledge management, intelligent
agents, intelligent decision making and support, intelligent network security, trust
management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
Indexed by SCOPUS, DBLP, EI Compendex, INSPEC, WTI Frankfurt eG,
zbMATH, Japanese Science and Technology Agency (JST), SCImago.
Kuan-Ching Li
Editors
Editors
Chhabi Rani Panigrahi Bibudhendu Pati
Department of Computer Science Department of Computer Science
Rama Devi Women’s University Rama Devi Women’s University
Bhubaneswar, India Bhubaneswar, India
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
This volume contains the papers presented at the 4th International Conference on
Advanced Computing and Intelligent Engineering (ICACIE 2019, www.icacie.com), held
during 21–23 December 2019 at Rama Devi Women’s University, Bhubaneswar, India.
There were 284 submissions, and each qualified submission was reviewed by a minimum
of two Technical Program Committee members using the criteria of relevance,
originality, technical quality, and presentation. The committee accepted 86 full
papers for oral presentation at the conference, and the overall acceptance rate was
29%.
ICACIE 2019 was an initiative taken by the organizers that focuses on research and
applications on topics of advanced computing and intelligent engineering. The focus
was also to present state-of-the-art scientific results, to disseminate modern
technologies, and to promote collaborative research in the field of advanced
computing and intelligent engineering. Researchers presented their work at the
conference and had an excellent opportunity to interact with eminent professors,
scientists, and scholars in their area of research. All participants benefitted
from discussions that facilitated the emergence of innovative ideas and approaches.
Many distinguished professors, well-known scholars, industry leaders, and young
researchers participated in making ICACIE 2019 an immense success. We also had an
industry panel discussion, to which we invited people from software industries such
as TCS, Infosys, and Cognizant, as well as entrepreneurs.
We thank all the Technical Program Committee members and all reviewers/sub-reviewers
for their timely and thorough participation during the review process. We express
our sincere gratitude to Prof. Padmaja Mishra, Honourable Vice Chancellor and Chief
Patron of ICACIE 2019, for allowing us to organize ICACIE 2019 on the campus and for
her unending, timely support towards the organization of this conference. We would
like to extend our sincere thanks to Prof. Bibudhendu Pati and Dr. Hemant Kumar
Rath, General Chairs of ICACIE 2019, for their valuable guidance during the review
of papers, as well as other aspects of the conference. We appreciate the time and
efforts put in by the members of the local organizing team at Rama Devi Women’s
University, Bhubaneswar, India,
The book focuses on theory, practice and applications in the broad areas of
advanced computing techniques and intelligent engineering. This two-volume book
includes 86 scholarly articles, which were accepted for presentation from 287
submissions to the 4th International Conference on Advanced Computing and
Intelligent Engineering, held at Rama Devi Women’s University, Bhubaneswar, India
during 21–23 December 2019. The first volume consists of 40 papers and the second
volume contains 46 papers, for a total of 86 papers. This book brings together
academic scientists, professors, research scholars and students to share and
disseminate their knowledge and scientific research works related to advanced
computing and intelligent engineering. It provides a platform for young researchers
to find the practical challenges encountered in these areas of research and the
solutions adopted. The book helps to disseminate knowledge about some innovative
and active research directions in the field of advanced computing techniques and
intelligent engineering, along with some current issues and applications of related
topics.
Dr. Bibudhendu Pati is Associate Professor and Head of the P.G. Department of
Computer Science at Rama Devi Women’s University, Bhubaneswar, India. He completed
his Ph.D. at IIT Kharagpur. Dr. Pati has 21 years of experience in teaching and
research. His areas of interest include Wireless Sensor Networks, Cloud Computing,
Big Data, Internet of Things, and Network Virtualization. He has published several
papers in journals, conference proceedings and books of international repute. He is
a Life Member of the Indian Society of Technical Education (ISTE), a Life Member of
the Computer Society of India, and a Senior Member of IEEE.
About the Editors
Fellow of AAAS. Dr. Mohapatra’s research interests are in the areas of wireless
networks, mobile communications, cyber security, and Internet protocols. He has
published more than 350 papers in reputed conferences and journals on these topics.
Dr. Mohapatra’s research has been funded through grants from the National Science
Foundation, US Department of Defense, US Army Research Labs, Intel
Corporation, Siemens, Panasonic Technologies, Hewlett Packard, Raytheon, and
EMC Corporation.
Namrata P. Mohanty, Sweta Shree Dash, Sandeep Sobhan, and Tripti Swarnkar
1 Introduction
Depression is becoming one of the most widespread disability-causing disorders
around the globe, and it is expanding at a very fast pace, affecting more and more
people. It can be caused by various circumstances, such as peer pressure, acute
disease, family issues, or career tensions. Mostly, it is due to changes in brain
waves and the formation of seizures due to a persistent feeling of stress and
sadness [3]. One of the most effective ways of detecting it is by recording EEG
signals. EEG is a noninvasive and low-cost way of measuring the brain’s electrical
activity; it reveals abnormalities or deviations from normal brain waves and is
thereby helpful in detecting depressive symptoms in a patient. Today, developments
in human–computer interaction (HCI), together with machine learning techniques,
have made it much easier to detect a complicated, disability-causing disorder such
as Major Depressive Disorder (MDD).
The main objective of this empirical study is to compare the performance of two
benchmark classifiers in classifying depression from EEG signals, so that doctors
can be helped in predicting depression and can provide preventive measures to
patients before its onset. In this paper, we performed the experiment by feeding
the dataset into two classifiers, k-NN and ANN. We classified depressed patients
and normal subjects with an accuracy of 85%.
2 Literature Review
In 2008, Brahmi et al. [8] classified EEG signals using a back-propagation neural
network, achieving an accuracy and specificity of 93% and 94%, respectively. They
tried to distinguish among awake, stage 1 + REM, stage 2, and Slow Wave Stage (SWS)
in the EEG signal using neural networks and wavelet packet coefficients. In 2011,
Hosseinifard et al. [10] performed linear and nonlinear feature extraction along
with classification using the k-NN, LDA, and LR classifiers. In their experiment,
they obtained an accuracy of 90% in classifying depressed patients and normal
subjects. In 2017, Liao et al. [11] classified depressed patients using SVM, a
standard and efficient classifier. In their experiment, a robust spectral-spatial
EEG feature extractor was used to cope with the absence of biological and
psychological markers. They obtained an accuracy of 81.23%. In 2010, Chisci et al.
[12] developed a seizure prediction model for detecting seizure formation in the
brain, which leads to depression as well as associated diseases.
Individuals with depression or anxiety are more likely to experience the ill
effects of epilepsy than those without depression or anxiety. Different brain
regions, including the frontal, temporal, and limbic regions, are associated with
the biological pathogenesis of depression in individuals with epilepsy [14, 17].
Machine learning techniques are of great help for the detection of epilepsy from
the analysis of EEG signals [17]. In 2018, Acharya et al. focused on seizure
formation and prediction, and on how depression is related to seizure formation,
which is generally due to a sudden change in the electrical activity of the brain
[15]. In 2009, Mirowski et al. investigated the efficiency of employing bivariate
measures to predict seizures, occurring mostly due to depression, with a
sensitivity of 71% [19].
With the advancement of science and technology, machine learning tools and
techniques can predict depression at a much earlier stage, thereby keeping this
disability-causing disorder at bay [20]. Machine learning algorithms learn,
extract, identify, and map underlying patterns to identify groupings of depressed
individuals without constraint [21].
The whole implementation was carried out on a system with the following
specification: Processor: Intel(R) Pentium(R) CPU N3540 @ 2.16 GHz; Installed
Memory (RAM): 8.00 GB; System Type: 64-bit operating system, x64-based processor.
Before preprocessing the data, we have to convert it to the .wav format to make it
suitable for use in MATLAB. Here, we used the edf2wav online tool [1] to convert
the EEG signals to the .wav format. The .wav file is then imported into MATLAB for
data preprocessing and further processing. The first step in the preprocessing is
the filtering of the raw EEG signals. In this step, the signal components in the
frequency range of 0–30 Hz, containing the alpha, beta, theta, gamma, and delta
waves, are selected and the rest are rejected. In the next step, Independent
Component Analysis (ICA) was performed, which helps in the removal of artifacts
such as eye blinks from the selected frequency range. Further, the real values of
the data were obtained from the preprocessed EEG signals, which makes it easier to
extract the specific features for the classification process. The obtained dataset
is 12.6 MB in size and contains 50 samples of 10240 data points each.
Feature extraction refers to identifying uniquely recognizable patterns in a group
of classified data in order to predict its outcomes. It is meant to reduce the
amount of data fed to the system while limiting the loss of information, which
simplifies the implementation. From the obtained EEG signals, it was observed that
physiological features were highly correlated with the state of arousal among two
subjects. A feature can be considered significant and selected as input to the
classifier if its absolute correlation is greater for physiological features among
subjects [6].
Selecting highly correlated features helps to exclude features that are less
important for affective state and emotional expressions. Considering the above
studies, statistical features, namely the minimum value, the maximum value, the
mean value, and the standard deviation, were selected to represent the EEG signals.
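The four statistical features named above can be sketched in a few lines of Python.
This is only an illustration, not the authors' MATLAB code: the function name
extract_features and the toy signal are hypothetical, and the population form of
the standard deviation is an assumption, since the text does not say which variant
was used.

```python
import statistics


def extract_features(signal):
    """Return the four summary statistics named in the text for one signal."""
    return {
        "min": min(signal),
        "max": max(signal),
        "mean": statistics.fmean(signal),
        # Population standard deviation (assumption: the paper does not
        # specify population vs. sample form).
        "std": statistics.pstdev(signal),
    }


# Hypothetical toy signal standing in for one 10240-point EEG sample.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
features = extract_features(sample)
print(features)
```

Each EEG sample is thereby reduced from thousands of data points to a
four-element feature vector, which is what the classifiers receive as input.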
3.3 Classification
Since we have labeled data, this work falls under the supervised learning branch of
machine learning.
For our classification, we used two widely used classifiers: k-Nearest Neighbor
(k-NN) and Artificial Neural Network (ANN).
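As a rough illustration of the first classifier, the following Python sketch
implements a plain k-NN vote with Euclidean distance. The value k = 3, the distance
metric, the function name knn_predict, and the toy feature vectors are all
assumptions made for the example; the text does not report the settings actually
used.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors (min, max, mean, std); the values are made up.
train_X = [
    (-1.0, 1.0, 0.0, 0.5),
    (-1.2, 1.1, 0.1, 0.6),
    (-3.0, 3.2, 0.4, 1.5),
    (-2.8, 3.0, 0.3, 1.4),
]
train_y = ["normal", "normal", "depressed", "depressed"]

prediction = knn_predict(train_X, train_y, (-2.9, 3.1, 0.35, 1.45), k=3)
print(prediction)
```

Because k-NN stores the training set and compares each query against every stored
point, its cost grows with the number of samples and features.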
edf2wav and the EEGLAB toolbox to get the filtered data, i.e., the EEG signals which
are free from artifacts.
For further data preprocessing, the EEGLAB toolbox, available online for the MATLAB
platform, was used, which makes EEG signal preprocessing much easier. Data
preprocessing here includes filtering, epoch selection, and independent component
analysis (ICA).
Figure 2 shows the EEGLAB platform we used for the EEG data preprocessing.
The real values are then obtained from the filtered signals and used for feature
extraction and classification. After that, four features were extracted, i.e., the
minimum value, maximum value, mean value, and standard deviation, in order to
obtain better classification results in further processes.
On our system, k-NN took nearly 12 h to generate the confusion matrix, while ANN
took nearly 15 h to compute the results for the 12.6 MB data file.
In our experiment, the classification accuracies of k-NN and ANN on the training
set are 83.2% and 87.5%, respectively. On the training set, the accuracy of ANN is
higher than that of the k-NN classifier (Table 1 and Fig. 3).
The test set accuracies of k-NN and ANN show a similar pattern, as seen from
Table 2 and Fig. 4, where ANN has a higher accuracy than k-NN.
Table 2 Classification accuracies by selected techniques for the test set (Fig. 4)
We performed the classification using two well-known machine learning classifiers,
k-NN and ANN. We then obtained the confusion matrix to analyze the performance of
the two models. In both cases, the true positive and true negative rates are higher
for the ANN model. Thus, the accuracy of ANN is nearly 80.3%, compared with 74.6%
for k-NN; we attribute this to the greater processing capacity of ANN, which comes
from its interconnected neurons, loosely analogous to those of a human brain.
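The relation between a confusion matrix and the reported accuracy can be sketched
as follows: accuracy is (TP + TN) / (TP + FP + TN + FN). The labels and predictions
below are hypothetical toy values chosen for illustration; they are not the results
of this study.

```python
def confusion_counts(y_true, y_pred, positive="depressed"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical labels and predictions ("d" = depressed, "n" = normal).
y_true = ["d", "d", "d", "n", "n", "n", "n", "d"]
y_pred = ["d", "d", "n", "n", "n", "d", "n", "d"]

counts = confusion_counts(y_true, y_pred, positive="d")
acc = accuracy(*counts)
print(counts, acc)
```

Reporting the four counts alongside the accuracy makes it visible whether errors
fall mainly on depressed or on normal subjects.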
Prediction of Depression Using EEG: A Comparative Study 9
Fig. 5. Comparison between the accuracies between train set and test set
Because of the visible differences in the accuracies obtained from the two selected
techniques, we plotted a comparison graph so that it is easier to select a
particular technique for future research. From Fig. 5, we can see that ANN gives a
better accuracy than k-NN, which suggests that more studies should be done on
neural networks, which can be of tremendous help in the field of medical and
paramedical sciences.
5 Conclusion
Our work has demonstrated that neural networks have the potential to predict
depression with higher accuracy than k-NN, the other machine learning technique
compared here. Although ANN gave more accurate results than k-NN, it also took more
time; this could be reduced by lowering the dimensionality through the selection of
more efficient features, and by running the model on a system with a faster
processor and more RAM. Moreover, further research should be done on neural
networks considering real-time data acquisition, including complex brain structure
investigation and analysis. Last but not least, depression should not be taken
lightly, and a proper check-up by experienced professionals should be done in due
time so that it can be addressed before the onset of its extreme phase.
References
1. European Data Format (EDF). http://www.edfplus.info
2. MathWorks: MATLAB and Simulink for Technical Computing. https://www.mathworks.com
3. Mallikarjun, H.M., Suresh, H.N.: Depression level prediction using EEG signals
processing. In: International Conference on Contemporary Computing and Informatics
(IC3I), pp. 928–933 (2014)
21. Dipnall, J.F., Pasco, J.A., Berk, M., Williams, L.J., Dodd, S., Jacka, F.N., Meyer, D.: Why so
GLUMM? Detecting depression clusters through graphing lifestyle-environs using
machine-learning methods (GLUMM). Eur. Psych. 39, 40–50 (2017).
https://doi.org/10.1016/j.eurpsy.2016.06.003
22. Liu, A. et al.: Machine learning aided prediction of family history of depression. In: 2017
New York Scientific Data Summit (NYSDS), New York, NY, pp. 1–4 (2017).
https://doi.org/10.1109/nysds.2017.8085046
23. Sri, K.S., Rajapakse, J.C.: Extracting EEG rhythms using ICA-R. In: IEEE International Joint
Conference on Neural Networks, IJCNN 2008. (IEEE World Congress on Computational
Intelligence), pp. 2133–2138 (2008)
24. Malmivuo, J., Plonsey, R.: Bioelectromagnetism: Principles and Applications of Bioelectric
and Biomagnetic Fields. Oxford University Press (1995)
25. Delorme, A., Makeig, S.: EEGLAB: an open source toolbox for analysis of single trial EEG
dynamics including independent component analysis. J. Neurosci. Methods 134(1), 9–21
(2004)
26. Wu, Y., Ianakiev, K., Govindaraju, V.: Improved k-nearest neighbor classification. Pattern
Recogn. 35(10), 2311–2318 (2002)
27. Ho, C.K., Sasaki, M.: EEG data classification with several mental tasks. In: 2002 IEEE
International Conference on Systems, Man and Cybernetics, vol. 6, p. 4 (2002)
28. About GSIL - Blekinge Institute of Technology - in real life. http://www.bth.se/com/gsil.
Accessed 05 August 2012
Prediction of Stroke Risk Factors for
Better Pre-emptive Healthcare: A
Public-Survey-Based Approach
1 Introduction
We propose to underline the significance of pre-emptive healthcare for
cardiovascular stroke by discerning behavioural traits that may play a crucial
role in the gradual development of health conditions that lead to stroke.
Behaviours that affect health negatively, such as lack of regular physical
activity, lack of calibrated and nutritious food intake, and unrestrained tobacco
use and alcohol consumption, if continued for long, may often result in health
conditions that lead to stroke (see Footnote 1). Thus, to prevent the looming risk
of stroke, positive behavioural changes are indispensable.
1 https://www.cdc.gov/stroke/behavior.htm
In the United States (U.S.), stroke is the fifth leading cause of death, claiming
one life out of every 20 deaths per year, with more than 0.7 million stroke
patients per year [1]. Moreover, from the government’s perspective, each year the
U.S. spends more than $34 billion on stroke, which comprises the cost of personal
healthcare services, medicines for stroke treatment and missed working days (see
Footnote 2) [1]. For an individual, this underpins the fact that small changes
towards a healthier lifestyle can eliminate a very significant amount of
expenditure on health.
In the present work, we aim to find the relation between behavioural traits and the
chance of stroke using machine learning (ML) techniques, and to further specify
which traits dominate the list of possible risk factors for stroke. We apply our
analysis to the Behavioral Risk Factor Surveillance System (BRFSS) (see
Footnote 3), which is the largest health-related telephone survey in the United
States and contains a significant number of behavioural features. By purely
behavioural features we mean those that are directly controllable or negotiable (or
both) by an individual without any monetary requirement (this excludes insurance).
Thus, behaviours influenced by social context, mental health, etc. are excluded, so
that our results generalize as widely as possible, are less constrained by
demography, and are as relevant as possible to mass awareness.
To achieve our objective, we first identify the possible risk factors contributing
to stroke from a set of selected behavioural traits by using a GBM (Gradient
Boosting Machine)-based predictive model. We then analyse the impact of the
individual features on the model outcome to establish the soundness of the features
statistically.
Our main contributions can be outlined through the following:
2 Related Works
Positive Behavioural Changes in Prevention of Stroke: As discussed in Sect. 1,
chronic diseases such as stroke are often preventable, and the chances of their
gradual development can be minimized by changing negative behavioural traits,
leading to a healthy lifestyle. The Centers for Disease Control and Prevention
(CDC), U.S., argues that a large percentage of stroke cases can be prevented by
eliminating the three main risk factors: unhealthy diet
2 https://www.cdc.gov/stroke
3 https://www.cdc.gov/brfss/index.html
14 D. Banerjee and J. Singh
[24], excessive smoking [2] and lack of enough physical activity (see Footnote 4)
[26]. Other epidemiological studies also highlight the cardinal importance of
quantifying the impact of lifestyle on public health, because individuals
themselves can exercise fine-grained control over it to counter chronic diseases
such as stroke [3].
Machine Learning in Healthcare: Predicting diseases solely from reports of patient
behaviours can be a difficult problem, especially from a survey dataset. However,
recent advances in machine learning algorithms have opened up opportunities to
tackle this complex problem of disease prediction from health survey datasets and
also to determine the risk factors behind the diseases in question.
Decision trees are commonly used for such prediction tasks [4]. A single decision
tree can act as a weak learner [4], meaning that the classifier is only slightly or
weakly correlated with the true classification (see Footnote 5). Its performance
can therefore be biased in favour of the majority class of the target if the
dataset in question already exhibits substantial bias, variance, or both. Hence, to
improve on a purely decision-tree-based classifier, random forest [5] and gradient
boosting [8] may be used. The gradient boosting algorithm [8] uses a gradient
descent procedure, an iterative method that moves in the direction of steepest
descent, defined by the negative of the gradient of a function, to find a local
minimum of that function. Decision trees are used as weak learners in gradient
boosting, where they are added one at a time, each fitted to reduce the remaining
loss; in a random forest, by contrast, the trees are built independently of one
another, without a gradient descent procedure that minimizes the loss while adding
trees [7]. Overall, random forests are built to reduce variance [5], whereas
gradient boosting reduces bias [8].
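The sequential nature of gradient boosting can be sketched in plain Python. This is
a minimal illustration under stated assumptions, not the GBM configuration used in
this work: it uses squared loss (for which the negative gradient is simply the
residual), one-split trees (stumps) as weak learners, a single feature, and
hypothetical toy data; the function names and the settings n_trees = 50 and
learning_rate = 0.1 are likewise chosen only for the example.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree (a stump) by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left)
        error += sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Add stumps one at a time, each fitted to the negative gradient of the loss."""
    base = sum(ys) / len(ys)  # initial constant model
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared loss (y - F)^2 / 2, the negative gradient is the residual y - F.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mean_squared_error(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data: one feature, binary target encoded as 0/1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = gradient_boost(xs, ys)
mse_constant = mean_squared_error(lambda x: sum(ys) / len(ys), xs, ys)
mse_boosted = mean_squared_error(lambda x: boosted_predict(model, x), xs, ys)
print(mse_constant, mse_boosted)
```

Each round depends on the predictions of all previous rounds, which is why the
trees cannot be built independently as they are in a random forest.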
Existing Works on Stroke Prediction: Among the existing works on stroke prediction,
Yang, Zhong et al. study the risks in state-level demographics [13]. Akdag et al.
implement classification trees for finding the risk factors of hypertension from
observations of hospital patients in Turkey [14]. Sunmoo Yoon et al. work on the
prediction of disability, one of the results of stroke, and on how the types of
disability are associated and correlated with stroke [11]. Alkadry et al. detail
the disparity in stroke awareness across demographics [9]. Howard et al. put
forward the importance of a self-reported or questionnaire-based approach for
categorizing the general levels of risk among respondents [17]. Luo et al. also
demonstrate the impact of stroke across demographics as reported in the BRFSS,
using a regression model that checks whether there is a relation between the two
[10]. Nuyujukian et al. employ a logistic regression model to show the association
between length of sleeping hours and stroke across ethnicities [25].
To the best of our knowledge, there is neither any existing work that performs
stroke prediction on the basis of the whole BRFSS dataset, nor any work that
focuses on purely behavioural features. Most of the
4 https://www.cdc.gov/stroke/behavior.htm
5 https://en.wikipedia.org/wiki/Boosting_(machine_learning)
other works are either related to finding correlations between certain features and
stroke [19] or restricted to only certain states [20]. The downside of finding only
correlations is that a correlation coefficient merely indicates the strength of the
relationship between two variables; it does not tell us how confident we can be in
that relationship. A prediction model, in contrast, shows how well the target can
be inferred from the predictors.
The Class Imbalance Problem and Its Proposed Solution: Apart from model selection
issues, class imbalance is a common problem that arises especially in medical
datasets. In a public survey it is compounded by missing values and incorrect
responses about stroke, which are not present in clinical data. Class imbalance
often leads the prediction model to learn with a bias towards the majority class.
For example, if the ratio of label counts between the majority and minority classes
in a dataset is 25:1, then an accuracy-driven classifier may achieve an accuracy of
more than 90% by disregarding the minority class instances and classifying all
instances as belonging to the majority class. This problem has been studied by Wang
and Yao [15], who propose sub-sampling as a way to get better results.
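The 25:1 example can be checked with a short Python sketch: a classifier that
always predicts the majority class scores 25/26, about 96%, without recognizing a
single minority instance. The second function shows random undersampling of the
majority class, which is one simple form of sub-sampling and is used here only as
an assumption for illustration; it is not necessarily the scheme of Wang and Yao
[15] or the one applied in this work. The labels, row identifiers, and function
names are hypothetical.

```python
import random
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def undersample(rows, labels, seed=0):
    """Randomly drop rows so that every class has as many rows as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical survey labels with a 25:1 majority-to-minority ratio.
labels = ["no_stroke"] * 250 + ["stroke"] * 10
rows = list(range(len(labels)))  # stand-in row identifiers

baseline = majority_baseline_accuracy(labels)
balanced = undersample(rows, labels)
balanced_counts = Counter(label for _, label in balanced)
print(round(baseline, 4), balanced_counts)
```

On the balanced subset, the majority-class baseline drops to 50%, so accuracy once
again reflects whether the minority class is actually being recognized.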
The dataset for this work is taken from the Behavioral Risk Factor Surveillance
System (BRFSS) (see Footnote 6) conducted by the CDC. Started in 1984, BRFSS
collects data in all states of the U.S. as well as the U.S. territories. This
health survey is conducted over telephone and covers most of the potential health
risk factors, health-related behavioural practices and health conditions. The
resulting dataset is shared with the public and is available for free [12]. The
CDC’s official website (see Footnote 7) publishes the relevant questionnaire and
the dataset encoding in detail. In the present work, we use the BRFSS 2012 dataset,
which records 475687 observations (see Footnote 7).
4 Feature Selection
6 https://www.cdc.gov/brfss/index.html
7 https://www.cdc.gov/brfss/annual_data/annual_2012.html