
Progress in Advanced Computing and Intelligent Engineering: Proceedings of ICACIE 2019, Volume 2 Chhabi Rani Panigrahi instant download

The document details the proceedings of the 4th International Conference on Advanced Computing and Intelligent Engineering (ICACIE) 2019, which took place at Rama Devi Women’s University in Bhubaneswar, India. It includes 86 accepted papers from 284 submissions, focusing on advanced computing techniques and intelligent engineering applications. The volume aims to disseminate innovative research and foster collaboration among researchers in the field.


Advances in Intelligent Systems and Computing 1199

Chhabi Rani Panigrahi · Bibudhendu Pati · Prasant Mohapatra · Rajkumar Buyya ·
Kuan-Ching Li, Editors

Progress in Advanced Computing and Intelligent Engineering
Proceedings of ICACIE 2019, Volume 2
Advances in Intelligent Systems and Computing

Volume 1199

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,
Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications
on theory, applications, and design methods of Intelligent Systems and Intelligent
Computing. Virtually all disciplines such as engineering, natural sciences, computer
and information science, ICT, economics, business, e-commerce, environment,
healthcare, life science are covered. The list of topics spans all the areas of modern
intelligent systems and computing such as: computational intelligence, soft computing
including neural networks, fuzzy systems, evolutionary computing and the fusion
of these paradigms, social intelligence, ambient intelligence, computational
neuroscience, artificial life, virtual worlds and society, cognitive science and systems,
Perception and Vision, DNA and immune based systems, self-organizing and
adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics
including human-machine teaming, knowledge-based paradigms, learning paradigms,
machine ethics, intelligent data analysis, knowledge management, intelligent
agents, intelligent decision making and support, intelligent network security, trust
management, interactive entertainment, Web intelligence and multimedia.

The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.

Indexed by SCOPUS, DBLP, EI Compendex, INSPEC, WTI Frankfurt eG,
zbMATH, Japanese Science and Technology Agency (JST), SCImago.

More information about this series at http://www.springer.com/series/11156


Editors
Chhabi Rani Panigrahi, Department of Computer Science, Rama Devi Women’s University,
Bhubaneswar, India
Bibudhendu Pati, Department of Computer Science, Rama Devi Women’s University,
Bhubaneswar, India
Prasant Mohapatra, Department of Computer Science, University of California,
Davis, CA, USA
Rajkumar Buyya, Cloud Computing and Distributed Systems (CLOUDS) Lab,
School of Computing and Information Systems, The University of Melbourne,
Melbourne, VIC, Australia
Kuan-Ching Li, Department of Computer Science and Information Engineering,
Providence University, Taichung, Taiwan

ISSN 2194-5357 ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-981-15-6352-2 ISBN 978-981-15-6353-9 (eBook)
https://doi.org/10.1007/978-981-15-6353-9
© Springer Nature Singapore Pte Ltd. 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

This volume contains the papers presented at the 4th International Conference on
Advanced Computing and Intelligent Engineering (ICACIE 2019, www.icacie.com),
held during 21–23 December 2019 at Rama Devi Women’s University, Bhubaneswar,
India. There were 284 submissions, and each qualified submission was reviewed by a
minimum of two Technical Program Committee members using the criteria of relevance,
originality, technical quality, and presentation. The committee accepted 86 full
papers for oral presentation at the conference, and the overall acceptance rate is 29%.
ICACIE 2019 was an initiative taken by the organizers that focuses on research and
applications on topics of advanced computing and intelligent engineering. The focus
was also to present state-of-the-art scientific results, to disseminate modern
technologies, and to promote collaborative research in the field of advanced computing
and intelligent engineering. Researchers presented their work at the conference and
had an excellent opportunity to interact with eminent professors, scientists, and
scholars in their area of research. All participants benefited from discussions that
facilitated the emergence of innovative ideas and approaches. Many distinguished
professors, well-known scholars, industry leaders, and young researchers participated
in making ICACIE 2019 an immense success. We also had an industry panel discussion,
for which we invited people from software industries such as TCS, Infosys, and
Cognizant, as well as entrepreneurs.
We thank all the Technical Program Committee members and all reviewers/sub-reviewers
for their timely and thorough participation during the review process. We express our
sincere gratitude to Prof. Padmaja Mishra, Honourable Vice Chancellor and Chief Patron
of ICACIE 2019, for allowing us to organize ICACIE 2019 on the campus and for her
unending timely support towards the organization of this conference. We would like to
extend our sincere thanks to Prof. Bibudhendu Pati and Dr. Hemant Kumar Rath, General
Chairs of ICACIE 2019, for their valuable guidance during the review of papers, as
well as other aspects of the conference. We appreciate the time and efforts put in by
the members of the local organizing team at Rama Devi Women’s University, Bhubaneswar,
India, especially the faculty members of the Department of Computer Science, student
volunteers, and administrative staff, who dedicated their time and efforts to make
ICACIE 2019 successful. We would like to extend our thanks to Dr. Subhashis Das
Mohapatra for designing and maintaining the ICACIE 2019 website.
We are very grateful to all our sponsors, especially the Department of Science and
Technology (DST), Government of India, under the Consolidation of University
Research for Innovation and Excellence in Women Universities (CURIE) project, for
its generous support towards ICACIE 2019.

Bhubaneswar, India Chhabi Rani Panigrahi


Bhubaneswar, India Bibudhendu Pati
Davis, USA Prasant Mohapatra
Melbourne, Australia Rajkumar Buyya
Taichung, Taiwan Kuan-Ching Li
About This Book

The book focuses on theory, practice and applications in the broad areas of
advanced computing techniques and intelligent engineering. This two-volume
book includes 86 scholarly articles, which were accepted for presentation from
287 submissions to the 4th International Conference on Advanced Computing and
Intelligent Engineering held at Rama Devi Women’s University, Bhubaneswar,
India during 21–23 December 2019. The first volume consists of 40 papers and the
second volume contains 46 papers, for a total of 86 papers. This book brings together
academic scientists, professors, research scholars and students to share and
disseminate their knowledge and scientific research related to advanced computing
and intelligent engineering. It provides a platform for young researchers to identify
the practical challenges encountered in these areas of research and the solutions
adopted. The book helps to disseminate knowledge about some innovative and active
research directions in the field of advanced computing techniques and intelligent
engineering, along with some current issues and applications of related topics.
Contents

Advanced Machine Learning Applications

Prediction of Depression Using EEG: A Comparative Study
    Namrata P. Mohanty, Sweta Shree Dash, Sandeep Sobhan, and Tripti Swarnkar
Prediction of Stroke Risk Factors for Better Pre-emptive Healthcare: A Public-Survey-Based Approach
    Debayan Banerjee and Jagannath Singh
Language Identification—A Supportive Tool for Multilingual ASR in Indian Perspective
    Basanta Kumar Swain and Sanghamitra Mohanty
Ensemble Methods to Predict the Locality Scope of Indian and Hungarian Students for the Real Time: Preliminary Results
    Chaman Verma, Zoltán Illés, and Veronika Stoffová
Automatic Detection and Classification of Tomato Pests Using Support Vector Machine Based on HOG and LBP Feature Extraction Technique
    Gayatri Pattnaik and K. Parvathi
Poly Scale Space Technique for Feature Extraction in Lip Reading: A New Strategy
    M. S. Nandini, Nagappa U. Bhajantri, and Trisiladevi C. Nagavi
Machine Learning Methods for Vehicle Positioning in Vehicular Ad-Hoc Networks
    Suryakanta Nayak, Partha Sarathi Das, and Satyasen Panda
Effectiveness of Swarm-Based Metaheuristic Algorithm in Data Classification Using Pi-Sigma Higher Order Neural Network
    Nibedan Panda and Santosh Kumar Majhi
Deep Learning for Cover Song Apperception
    D. Khasim Vali and Nagappa U. Bhajantri
SVM-Based Drivers Drowsiness Detection Using Machine Learning and Image Processing Techniques
    P. Rasna and M. B. Smithamol
Fusion of Artificial Intelligence for Multidisciplinary Optimization: Skidding Track—Case Study
    Abhishek Nigam and Debi Prasad Ghosh
A Single Document Assamese Text Summarization Using a Combination of Statistical Features and Assamese WordNet
    Nomi Baruah, Shikhar Kr. Sarma, and Surajit Borkotokey
SVM and Ensemble-SVM in EEG-Based Person Identification
    Banee Bandana Das, Saswat Kumar Ram, Bibudhendu Pati, Chhabi Rani Panigrahi, Korra Sathya Babu, and Ramesh Kumar Mohapatra
A Self-Acting Mechanism to Engender Highlights of a Tennis Game
    Ramanathan Arunachalam and Abishek Kumar
Performance Evaluation of RF and SVM for Sugarcane Classification Using Sentinel-2 NDVI Time-Series
    Shyamal Virnodkar, V. K. Pachghare, V. C. Patil, and Sunil Kumar Jha
Classification of Nucleotides Using Memetic Algorithms and Computational Methods
    Rajesh Eswarawaka, S. Venkata Suryanarayana, Purnachand Kollapudi, and Mrutyunjaya S. Yalawar
A Novel Approach to Detect Emergency Using Machine Learning
    Sarmistha Nanda, Chhabi Rani Panigrahi, Bibudhendu Pati, and Abhishek Mishra

Data Mining Applications and Sentiment Analysis

A Novel Approach Based on Associative Rule Mining Technique for Multi-label Classification (ARM-MLC)
    C. P. Prathibhamol, K. Ananthakrishnan, Neeraj Nandan, Abhijith Venugopal, and Nandu Ravindran
Multilevel Neuron Model Construction Related to Structural Brain Changes Using Hypergraph
    Shalini Ramanathan and Mohan Ramasundaram
AEDBSCAN—Adaptive Epsilon Density-Based Spatial Clustering of Applications with Noise
    Vidhi Mistry, Urja Pandya, Anjana Rathwa, Himani Kachroo, and Anjali Jivani
Impact of Prerequisite Subjects on Academic Performance Using Association Rule Mining
    Chandra Das, Shilpi Bose, Arnab Chanda, Sandeep Singh, Sumanta Das, and Kuntal Ghosh
A Supervised Approach to Aspect Term Extraction Using Minimal Robust Features for Sentiment Analysis
    Manju Venugopalan, Deepa Gupta, and Vartika Bhatia
Correlation of Visual Perceptions and Extraction of Visual Articulators for Kannada Lip Reading
    M. S. Nandini, Nagappa U. Bhajantri, and Trisiladevi C. Nagavi
Automatic Short Answer Grading Using Corpus-Based Semantic Similarity Measurements
    Bhuvnesh Chaturvedi and Rohini Basak
A Productive Review on Sentimental Analysis for High Classification Rates
    Gaurika Jaitly and Manoj Kapil
A Novel Approach to Optimize Deep Neural Network Architectures
    Harshita Pal and Bhawna Narwal
Effective Identification and Prediction of Breast Cancer Gene Using Volterra Based LMS/F Adaptive Filter
    Lopamudra Das, Jitendra Kumar Das, and Sarita Nanda
Architecture of Proposed Secured Crypto-Hybrid Algorithm (SCHA) for Security and Privacy Issues in Data Mining
    Pasupuleti Nagendra Babu and S. Ramakrishna
A Technique to Classify Sugarcane Crop from Sentinel-2 Satellite Imagery Using U-Net Architecture
    Shyamal Virnodkar, V. K. Pachghare, and Sagar Murade
Performance Analysis of Recursive Rule Extraction Algorithms for Disease Prediction
    Manomita Chakraborty, Saroj Kumar Biswas, and Biswajit Purkayastha
Extraction of Relation Between Attributes and Class in Breast Cancer Data Using Rule Mining Techniques
    Krishna Mohan, Priyanka C. Nair, Deepa Gupta, Ravi C. Nayar, and Amritanshu Ram
Recent Challenges in Recommender Systems: A Survey
    Madhusree Kuanr and Puspanjali Mohapatra
Framework to Detect NPK Deficiency in Maize Plants Using CNN
    Padmashri Jahagirdar and Suneeta V. Budihal
Stacked Denoising Autoencoder: A Learning-Based Algorithm for the Reconstruction of Handwritten Digits
    Huzaifa M. Maniyar, Nahid Guard, and Suneeta V. Budihal
An Unsupervised Technique to Generate Summaries from Opinionated Review Documents
    Ashwini Rao and Ketan Shah
Scaled Document Clustering and Word Cloud-Based Summarization on Hindi Corpus
    Prafulla B. Bafna and Jatinderkumar R. Saini

Big Data Analytics, Cloud and IoT

Rough Set Classifications and Performance Analysis in Medical Health Care
    Indrani Kumari Sahu, G. K. Panda, and Susant Kumar Das
IoT-Based Modeling of Electronic Healthcare System Through Connected Environment
    Subhasis Mohapatra and Smita Parija
SEHS: Solar Energy Harvesting System for IoT Edge Node Devices
    Saswat Kumar Ram, Banee Bandana Das, Bibudhendu Pati, Chhabi Rani Panigrahi, and Kamala Kanta Mahapatra
An IoT-Based Smart Parking System Using Thingspeak
    Anagha Bhat, Bharathi Gummanur, Likhitha Priya, and J. Nagaraja
Techniques for Preserving Privacy in Data Mining for Cloud Storage: A Survey
    Ila Chandrakar and Vishwanath R. Hulipalled
A QoS Aware Binary Salp Swarm Algorithm for Effective Task Scheduling in Cloud Computing
    Richa Jain and Neelam Sharma
An Efficient Emergency Management System Using NSGA-II Optimization Technique
    V. Ramasamy, B. Gomathy, and Rajesh Kumar Verma
Load Balancing Using Firefly Approach
    Manisha T. Tapale, R. H. Goudar, and Mahantesh N. Birje
IoT Security, Challenges, and Solutions: A Review
    Jayashree Mohanty, Sushree Mishra, Sibani Patra, Bibudhendu Pati, and Chhabi Rani Panigrahi
About the Editors

Dr. Chhabi Rani Panigrahi is Assistant Professor in the P.G. Department of
Computer Science at Rama Devi Women’s University, Bhubaneswar, India. She
completed her Ph.D. at the Department of Computer Science and Engineering,
Indian Institute of Technology Kharagpur, India. Her research interests include
Software Testing and Mobile Cloud Computing. She has 19 years of teaching and
research experience. She has published several papers in international journals and
conferences. She is a Life Member of the Indian Society of Technical Education (ISTE)
and a member of IEEE and the Computer Society of India (CSI).

Dr. Bibudhendu Pati is Associate Professor and Head of the P.G. Department of
Computer Science at Rama Devi Women’s University, Bhubaneswar, India. He
completed his Ph.D. at IIT Kharagpur. Dr. Pati has 21 years of experience in
teaching and research. His interests include Wireless Sensor Networks, Cloud
Computing, Big Data, Internet of Things, and Network Virtualization. He has
published several papers in journals, conference proceedings and books of
international repute. He is a Life Member of the Indian Society of Technical Education
(ISTE), Life Member of the Computer Society of India, and Senior Member of IEEE.

Prof. Prasant Mohapatra is serving as the Vice Chancellor for Research at
University of California, Davis. He is also a Professor in the Department of
Computer Science and served as the Dean and Vice-Provost of Graduate Studies
during 2016–18. He was the Department Chair of Computer Science during
2007–13. In the past, Dr. Mohapatra has also held Visiting Scientist positions at
Intel Corporation, Panasonic Technologies, Institute of Infocomm Research (I2R),
Singapore, and National ICT Australia (NICTA). Dr. Mohapatra received his
doctoral degree from Penn State University in 1993, and received an Outstanding
Engineering Alumni Award in 2008. He is also the recipient of the Distinguished
Alumnus Award from the National Institute of Technology, Rourkela, India. Dr.
Mohapatra received an Outstanding Research Faculty Award from the College of
Engineering at the University of California, Davis. He received the HP Labs
Innovation awards in 2011, 2012, and 2013. He is a Fellow of the IEEE and a
Fellow of AAAS. Dr. Mohapatra’s research interests are in the areas of wireless
networks, mobile communications, cyber security, and Internet protocols. He has
published more than 350 papers in reputed conferences and journals on these topics.
Dr. Mohapatra’s research has been funded through grants from the National Science
Foundation, US Department of Defense, US Army Research Labs, Intel
Corporation, Siemens, Panasonic Technologies, Hewlett Packard, Raytheon, and
EMC Corporation.

Prof. Rajkumar Buyya is a Redmond Barry Distinguished Professor and Director
of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the
University of Melbourne, Australia. He is also serving as the founding CEO of
Manjrasoft, a spin-off company of the University, commercializing its innovations
in Cloud Computing. He has authored over 650 publications and seven textbooks
including “Mastering Cloud Computing” published by McGraw Hill, China
Machine Press, and Morgan Kaufmann for Indian, Chinese and international
markets respectively. Dr. Buyya is one of the highly cited authors in computer
science and software engineering worldwide (h-index=120, g-index=255, 76,800+
citations). He was named in the Clarivate Analytics (formerly Thomson
Reuters) Highly Cited Researchers and “World’s Most Influential Scientific Minds”
lists for three consecutive years since 2016. Dr. Buyya was recognized as Scopus
Researcher of the Year 2017 with the Excellence in Innovative Research Award by
Elsevier for his outstanding contributions to Cloud computing. He served as
founding Editor-in-Chief of the IEEE Transactions on Cloud Computing. He is
currently serving as Editor-in-Chief of Software: Practice and Experience, a
long-standing journal in the field established ~50 years ago.

Prof. Kuan-Ching Li is currently a Professor in the Department of Computer
Science and Information Engineering at Providence University, Taiwan. He has
served as Vice-Dean of the Office of International and Cross-Strait Affairs (OIA) at
the same university since 2014. Prof. Li is a recipient of awards from Nvidia, the
Ministry of Education (MOE)/Taiwan and the Ministry of Science and Technology
(MOST)/Taiwan, as well as guest professorships from different universities in China.
He received his PhD from the University of Sao Paulo, Sao Paulo, Brazil in 2001. His
areas of research are networked and GPU computing, parallel software design, and
performance evaluation and benchmarking. He has edited 2 books, “Cloud Computing
and Digital Media” and “Big Data”, published by CRC Press. He is a Fellow of the IET,
senior member of the IEEE and a member of TACC.
Advanced Machine Learning Applications

Prediction of Depression Using EEG: A Comparative Study

Namrata P. Mohanty (✉), Sweta Shree Dash, Sandeep Sobhan, and Tripti Swarnkar

Department of Computer Science and Engineering, ITER, S’O’A (Deemed to be University),
Bhubaneswar, India

Abstract. Depression, a worldwide affliction, is on the rise. Depression is not
a specific disease but rather a determinant factor in the onset of numerous
serious diseases. With advances in automation and artificial intelligence, it has
become easier to predict depression at a much earlier stage. Machine learning
techniques are used to classify EEG signals for the prediction of different
neurological problems. EEG signals are recordings of brain waves in which
abnormalities can readily be detected, making it easier to predict seizure formation
or depression. The proposed work uses EEG signals to analyze brain waves and thereby
predict depression. In this paper, we compare two widely used benchmark models,
k-NN and ANN, for the prediction of depression, achieving an accuracy of 85%.
This method will help doctors and medical associates predict the disease before the
onset of its extreme phase and assist them in providing the best possible treatment
in proper time.

Keywords: Depression · EEG · ANN · k-NN

1 Introduction
Depression is becoming one of the most widespread disability-causing disorders
around the globe, and it is expanding at a very fast pace, taking more and more subjects
into its grip. It can be caused by various circumstances such as peer pressure, acute
disease, family issues, and career tensions. Mostly, it is due to changes in brain
waves and the formation of seizures due to a persistent feeling of stress and sadness [3].
One of the most effective ways of detecting it is by recording EEG signals. EEG is a
noninvasive and low-cost way of measuring the brain’s electrical activity; it reveals
abnormalities or deviations from normal brain waves and is thereby helpful in
detecting depressive symptoms in the patient. Today, developments in human–computer
interaction (HCI), together with machine learning techniques, have made it much
easier to detect such a complicated, disability-causing disorder as Major
Depressive Disorder (MDD).

© Springer Nature Singapore Pte Ltd. 2021
C. R. Panigrahi et al. (eds.), Progress in Advanced Computing and Intelligent Engineering,
Advances in Intelligent Systems and Computing 1199,
https://doi.org/10.1007/978-981-15-6353-9_1

The main objective of this empirical study is to compare the performance of two
benchmark classifiers in classifying depression from EEG signals, so that doctors can
predict depression and provide the best preventive measures to patients before its
onset. In this paper, we perform the experiment by feeding the dataset into two
efficient classifiers, k-NN and ANN. We classify depressed patients and normal
subjects with an accuracy of 85%.
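
The comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' pipeline: the two Gaussian clusters are synthetic stand-ins for EEG-derived features, the "ANN" is reduced to a single sigmoid neuron, and all function names are hypothetical. It only shows the general shape of the experiment, in which both classifiers are trained on the same split and scored on the same held-out samples.

```python
import math
import random


def make_toy_features(n_per_class=100, seed=0):
    """Return shuffled (features, label) pairs; 1 = 'depressed', 0 = 'normal'.

    The two Gaussian clusters are synthetic stand-ins for EEG features.
    """
    rng = random.Random(seed)
    samples = []
    for label, centre in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            samples.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    rng.shuffle(samples)
    return samples


def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda item: sq_dist(item[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_neuron(train, epochs=100, lr=0.05):
    """Fit one sigmoid neuron by stochastic gradient descent on log-loss."""
    w = [0.0] * len(train[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in train:
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b


def neuron_predict(model, x):
    """Threshold the neuron's output probability at 0.5."""
    w, b = model
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def accuracy(predict, test):
    """Fraction of test samples whose predicted label matches the true one."""
    return sum(predict(x) == y for x, y in test) / len(test)


def compare_classifiers(seed=0):
    """Train both models on a 70/30 split and return their test accuracies."""
    data = make_toy_features(seed=seed)
    split = int(0.7 * len(data))
    train, test = data[:split], data[split:]
    model = train_neuron(train)
    return {
        "k-NN": accuracy(lambda x: knn_predict(train, x), test),
        "ANN (single neuron)": accuracy(lambda x: neuron_predict(model, x), test),
    }


if __name__ == "__main__":
    for name, acc in compare_classifiers().items():
        print(f"{name}: {acc:.2f}")
```

Because the toy clusters are well separated, both models score far higher here than they would on real EEG recordings, so the numbers printed by this sketch say nothing about the 85% reported in the paper.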

2 Literature Review
In 2008, Brahmi et al. [8] classified EEG signals using a back-propagation neural
network, achieving an accuracy and specificity of 93% and 94%, respectively. They
tried to distinguish among awake, stage 1 + REM, stage 2, and Slow Wave Stage (SWS)
in the EEG signal using neural networks and wavelet packet coefficients. In 2011,
Hosseinifard et al. [10] performed linear and nonlinear feature extraction along with
classification using the k-NN, LDA, and LR classifiers. In their experiment, they
obtained an accuracy of 90% in classifying depressed patients and normal subjects.
In 2017, Liao et al. [11] classified depressed patients using SVM, one of the standard
and most efficient classifiers. In their experiment, a robust spectral-spatial EEG
feature extractor was used to cope efficiently with the absence of biological and
psychological markers. They obtained an accuracy of 81.23%. In 2010, Chisci et al. [12]
developed a seizure prediction model for detecting seizure formation in the brain,
which leads to depression as well as associated diseases.
Individuals with depression or anxiety are more likely to suffer from epilepsy than
those without. Different brain regions, including the frontal, temporal, and limbic
regions, are associated with the biological pathogenesis of depression in individuals
with epilepsy [14, 17]. Machine learning techniques are of great help in detecting
epilepsy from the analysis of EEG signals [17]. In 2018, Acharya et al. focused on
seizure formation and prediction, and on how depression is related to seizure formation,
which is generally due to a sudden change in the electrical activity of the brain [15].
In 2009, Mirowski et al. investigated the efficiency of employing bivariate measures to
predict seizures, which often occur together with depression, with a sensitivity of
71% [19].
With the advancement of science and technology, machine learning tools and techniques
can predict depression at a much earlier stage, thereby keeping this disabling disorder
at bay [20]. Machine learning algorithms learn, extract, identify, and map underlying
patterns to identify groupings of depressed individuals without constraint [21].

3 Materials and Methods


Our main focus here is to analyze two widely known classifiers that can be used in
medical science to predict depression from EEG signals at a much earlier stage. Our
implementation process follows the path shown in Fig. 1.
Prediction of Depression Using EEG: A Comparative Study 5

[Flow diagram: EEG signals → Data pre-processing (filtering, ICA) → Feature extraction
(min value, max value, mean value, and standard deviation) → Classification
(classifiers: k-NN and ANN) → Comparison of results (accuracy and time taken)]

Fig. 1. Implementation Procedure

The whole implementation was carried out on a system with the following specification:
Processor: Intel(R) Pentium(R) CPU N3540 @ 2.16 GHz; Installed Memory (RAM): 8.00 GB;
System Type: 64-bit operating system, x64-based processor.

3.1 Data Preprocessing

Before preprocessing, the data has to be converted to the .wav format to make it
suitable for use in MATLAB. Here, we used the online edf2wav converter [1] to convert
the EEG signals to the .wav format. The .wav files are then imported into MATLAB for
data preprocessing and further implementation. The first preprocessing step is
filtering the raw EEG signals. In this step, the signal components in the 0–30 Hz
frequency range, containing the alpha, beta, theta, gamma, and delta waves, are selected
and the rest are rejected. Next, Independent Component Analysis (ICA) is performed,
which helps remove artifacts such as eye blinks from the selected wave range. Further,
the real values of the data are obtained from the preprocessed EEG signals, which makes
it easier to extract the specific features for the classification process. The
resulting dataset is 12.6 MB in size and contains 50 samples of 10240 data points each.
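The band-selection step can be illustrated with a small sketch. It is written in Python rather than MATLAB and is not the filter used by EEGLAB or by the authors: it simply zeroes every discrete Fourier transform bin above a cut-off and transforms back. The sampling rate `fs = 128` and the synthetic 10 Hz and 50 Hz test tones are assumptions chosen only for illustration; the 30 Hz cut-off is the upper limit stated in the text.

```python
import cmath
import math


def lowpass_dft(signal, fs, cutoff_hz):
    """Keep only frequency components at or below cutoff_hz (naive O(n^2) DFT)."""
    n = len(signal)
    # Forward DFT of the real-valued input.
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k both describe the frequency min(k, n - k) * fs / n.
        freq = min(k, n - k) * fs / n
        if freq > cutoff_hz:
            spectrum[k] = 0
    # Inverse DFT; the result is real because bins were removed symmetrically.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


fs = 128  # assumed sampling rate in Hz (not stated in the paper)
n = 128   # one second of synthetic data
slow = [math.sin(2 * math.pi * 10 * t / fs) for t in range(n)]  # inside 0-30 Hz
fast = [math.sin(2 * math.pi * 50 * t / fs) for t in range(n)]  # outside 0-30 Hz
mixed = [a + b for a, b in zip(slow, fast)]
filtered = lowpass_dft(mixed, fs, cutoff_hz=30.0)
```

On this synthetic input the 50 Hz tone is removed and the 10 Hz tone is preserved. A practical pipeline would more likely use a dedicated filter design with a fast transform; the naive DFT is used here only to keep the sketch dependency-free.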

3.2 Feature Extraction

Feature extraction refers to identifying uniquely recognizable patterns in a group of
classified data in order to predict its outcomes. It is meant to limit the loss of
information fed to the system while simplifying the implementation, since the amount of
data is reduced. From the obtained EEG signals, it has been observed that physiological
features were highly correlated with the state of arousal among two subjects. A feature
can be considered significant and selected as input to the classifier if its absolute
correlation is greater for physiological features among subjects [6].
Selecting highly correlated features helps to exclude features that are less important
for affective state and emotional expression. Considering the above studies, statistical
features, namely the minimum value, the maximum value, the mean value, and the standard
deviation, were selected to represent the EEG signals.
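As a minimal sketch of the four statistical features named above, the following Python function reduces one signal to its minimum, maximum, mean, and standard deviation. The function name `extract_features` and the toy signal are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which variant was computed.

```python
import statistics


def extract_features(samples):
    """Summarize one EEG sample (a sequence of data points) by four statistics."""
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.fmean(samples),
        # Sample standard deviation (divides by n - 1); this choice is an assumption.
        "std": statistics.stdev(samples),
    }


toy_signal = [1.0, 2.0, 3.0, 4.0]  # stand-in for one 10240-point EEG sample
features = extract_features(toy_signal)
```

Applied to each of the 50 samples, this would turn 10240 data points per sample into a four-value feature vector, which is the dimensionality reduction the section describes.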
3.3 Classification

Since the data are labeled, this research work falls under supervised learning.
For our classification, we have taken two widely used classifiers: k-Nearest
Neighbor (k-NN) and Artificial Neural Network (ANN).

3.3.1 k-Nearest Neighbor (k-NN)


k-NN has become one of the most popular classification techniques because of its ease
of interpretation, high predictive power, and low calculation time. In k-NN, an object
is classified by a plurality vote of its neighbors, with the object being assigned to
the class most common among its k nearest neighbors [26].
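The plurality vote can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation: the Euclidean distance, the value `k = 3`, and the two-dimensional training points are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the label most common among the k training points nearest to query."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Plurality vote over the k closest neighbours.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated classes.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["normal", "normal", "normal", "depressed", "depressed", "depressed"]

label_a = knn_predict(train_X, train_y, (0.15, 0.15))
label_b = knn_predict(train_X, train_y, (0.95, 1.0))
```

Because every prediction scans the whole training set, the cost grows with the number of stored samples, which is worth keeping in mind when comparing run times.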

3.3.2 Artificial Neural Networks


The artificial neuron model behind ANNs goes back to McCulloch and Pitts in the 1940s,
and later neural network designs drew on the visual cortex studies of Nobel laureates
Hubel and Wiesel. The model mimics the human brain: it processes information and
produces output much as a human brain does. It is one of the most efficient
classification tools and is frequently used for disease classification and prediction,
especially in the medical world [27].
In our paper, the EEG signals are fed to a neural network model in which they are
processed by two input layers and two hidden layers, leading to one output layer. The
hidden layers use the rectified linear unit (ReLU) activation function and the output
layer uses the binary sigmoid activation function.
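The activation scheme described above (ReLU in the hidden layers, sigmoid at the output) can be sketched as a forward pass in plain Python. The layer widths (four inputs for the four features, then 3 and 2 hidden units), the weight values, and all function names are assumptions made for illustration; the paper does not report them, and no training step is shown.

```python
import math


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def sigmoid(z):
    """Binary sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(features, params):
    """Two ReLU hidden layers followed by a single sigmoid output unit."""
    h1 = relu(dense(features, params["W1"], params["b1"]))
    h2 = relu(dense(h1, params["W2"], params["b2"]))
    (z,) = dense(h2, params["W3"], params["b3"])
    return sigmoid(z)


# Hypothetical, untrained parameters: 4 inputs -> 3 hidden -> 2 hidden -> 1 output.
example_params = {
    "W1": [[0.1, -0.2, 0.3, 0.0], [0.0, 0.4, -0.1, 0.2], [0.3, 0.1, 0.0, -0.3]],
    "b1": [0.0, 0.1, -0.1],
    "W2": [[0.5, -0.4, 0.2], [-0.3, 0.6, 0.1]],
    "b2": [0.05, 0.0],
    "W3": [[0.7, -0.5]],
    "b3": [0.1],
}
probability = forward([0.2, 0.9, 0.5, 0.1], example_params)
```

The output can be read as the estimated probability of the positive class; thresholding it at 0.5 would give a binary decision.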

3.4 Performance Measures


The performance of the models was then measured in terms of ROC, accuracy, and time
taken. Time taken is one of the prime aspects on which performance has been judged: the
less time a model takes to run, the more efficient it is considered to be.
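As a rough illustration of these measures, the sketch below derives accuracy and one ROC operating point (true positive rate and false positive rate) from confusion-matrix counts, and times the computation with a wall clock. The label vectors are made up for the example and do not come from the paper's results.

```python
import time


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Accuracy plus the (FPR, TPR) pair that marks one point on a ROC curve."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }


# Hypothetical labels: 1 = depressed, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

start = time.perf_counter()
report = summarize(y_true, y_pred)
elapsed_seconds = time.perf_counter() - start
```

A full ROC curve would repeat the TPR/FPR computation over a range of decision thresholds; a single point is shown here to keep the sketch short.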

4 Results and Discussions


EEG records brain signals over a particular period of time, which then shows whether
the obtained signals match normal waves or not. We concentrate on five types of brain
signals, distinguished by frequency range: delta, theta, alpha, beta, and gamma.
Signals are measured for a short duration, i.e., 20–40 min, and are produced by the
electrical activity of neurons within the brain. Many EEG databases of both patients
and controls are available online and are widely used for research purposes. In our
study, we have taken the DEAP dataset [5], which is freely available online for
research purposes only.
In the above experiment, we have taken the EEG signals for the analysis of the brain
waves. Preprocessing has been carried out on different platforms, namely edf2wav and
the EEGLAB toolbox, to obtain the filtered data, i.e., EEG signals that are free from
artifacts.
For further data preprocessing, the EEGLAB toolbox, available online for the MATLAB
platform, has been used, which makes EEG signal preprocessing much easier. Data
preprocessing here includes filtering, epoch selection, and independent component
analysis (ICA).
Figure 2 shows the EEGLAB platform we have used for the EEG data preprocessing.

Fig. 2. EEGLAB toolbox: for EEG signal processing [2]

The real values are then obtained from the filtered signals and used for feature
extraction and classification. Four features were extracted, i.e., the minimum value,
maximum value, mean value, and standard deviation, in order to obtain better
classification results in the subsequent steps.
On our system, k-NN took nearly 12 h to generate the confusion matrix, while ANN took
nearly 15 h to compute the results for the 12.6 MB data file.
In our experiment, the classification accuracies of k-NN and ANN on the train set are
83.2% and 87.5%, respectively, so ANN is more accurate than k-NN on the training data
(Table 1 and Fig. 3).
The test set accuracies show a similar pattern, as seen from Table 2 and Fig. 4, where
ANN again achieves higher accuracy than k-NN.
Table 1. Classification accuracies by selected techniques for train set (Fig. 3)

Techniques    Accuracy (%)
k-NN          83.2
ANN           87.5

Fig. 3. Classification accuracies by selected techniques for train set (set A)

Table 2. Classification accuracies by selected techniques for the test set (Fig. 4)

Techniques    Accuracy (%)
k-NN          74.6
ANN           80.3

Fig. 4. Classification accuracies by selected techniques for test set (set B)

We performed the classification using two well-known machine learning classifiers,
k-NN and ANN, and obtained the confusion matrix of each to analyze the performance of
the two models. In both cases, the true positive and true negative rates are higher for
the ANN model. Thus, ANN reaches an accuracy of nearly 80.3%, compared with 74.6% for
k-NN, which we attribute to the better processing capacity of ANN owing to its
interconnected neurons, loosely analogous to a human brain.
Fig. 5. Comparison of the accuracies on the train set and the test set

Given the visible differences in accuracy between the two selected techniques, we have
plotted a comparison graph to make it easier to select a technique for future research.
From Fig. 5, we can see that ANN gives a better accuracy rate than k-NN, which suggests
that more studies should be done on neural networks, as they can be of tremendous help
in the medical and paramedical sciences.

5 Conclusion

Our work has demonstrated that neural networks have the potential to predict depression
with higher accuracy than the other machine learning technique considered. Although ANN
gave more accurate results than k-NN, it also took more time. This time can be reduced
by lowering the dimensionality through the selection of more efficient features and by
implementing the model on a system with a faster processor and more RAM. Moreover,
further research should be done on neural networks considering real-time data
acquisition, including the investigation and analysis of complex brain structure. Last
but not least, depression should not be taken lightly, and a proper check-up by
experienced professionals should be done in due time so that it can be treated before
the onset of its extreme phase.

References
1. European Data Format (EDF). http://www.edfplus.info
2. MathWorks - MATLAB and Simulink for Technical Computing. https://www.mathworks.com
3. Mallikarjun, H.M., Suresh, H.N.: Depression level prediction using EEG signals
processing. In: International Conference on Contemporary Computing and Informatics (IC3I),
pp. 928–933 (2014)
4. Biosemi EEG ECG EMG BSPM NEURO amplifiers systems.
http://www.biosemi.com/faq/file_format.htm
5. https://www.eecs.qmul.ac.uk/mmv/datasets/deap/download.html
6. Khan, N.A., Jönsson, P., Sandsten, M.: Performance comparison of time-frequency
distributions for estimation of instantaneous frequency of heart rate variability signals.
Appl. Sci. 7(3), 221 (2017). https://doi.org/10.3390/app7030221
7. Gautam, R., Shimi, S.L.: Features extraction and depression level prediction by using
EEG signals. Int. Res. J. Eng. Technol. (IRJET) 04(05) (2017)
8. Ebrahimi, F., Mikaeili, M., Estrada, E., Nazeran, H.: Automatic sleep stage classification based
on EEG signals by using neural networks and wavelet packet coefficients. In: 30th Annual
International IEEE EMBS Conference Vancouver, British Columbia, Canada, August 20–24,
2008, pp. 1151–1154. https://doi.org/10.1109/iembs.2008.4649365
9. Knott, V., Mahoney, C., Kennedy, S., Evans, K.: EEG power, frequency,
asymmetry and coherence in male depression. Psych. Res. Neuroimaging Sect. 106, 123–140
(2001)
10. Hosseinifard, B., Moradi, M.H., Rostami, R.: Classifying depression patients and normal
subjects using machine learning techniques. In: 2011 19th Iranian Conference on Electrical
Engineering, Tehran, pp. 1–1 (2011)
11. Liao, S.-C., Wu, C.-T., Huang, H.-C., Cheng, W.-T., Liu, Y.-H.: Major depression
detection from EEG signals using kernel eigen-filter-bank common spatial patterns.
Sensors (Basel) 17(6), 1385 (2017). https://doi.org/10.3390/s17061385
12. Chisci, L., Mavino, A., Perferi, G., Sciandrone, M., Anile, C., Colicchio, G., Fuggetta, F.:
Real-time epileptic seizure prediction using AR models and support vector machines. IEEE
Trans. Biomed. Eng. 57(5), 1124–1132 (2010). https://doi.org/10.1109/TBME.2009.2038990.
Epub 2010 Feb 17
13. Karim, H.T., Wang, M., Andreescu, C., Tudorascu, D., Butters, M.A., Karp, J.F., Reynolds,
C.F., 3rd, Aizenstein, H.J.: Acute trajectories of neural activation predict remission to
pharmacotherapy in late-life depression. Neuroimage Clin 8(19), 831–839 (2018).
https://doi.org/10.1016/j.nicl.2018.06.006
14. Kwon, O.-Y., Park, S.-P.: Depression and anxiety in people with epilepsy. J Clin
Neurol. 10(3), 175–188 (2014). https://doi.org/10.3988/jcn.2014.10.3.175
15. Acharya, U.R., Hagiwara, Y., Adeli, H.: Automated seizure prediction. Epilepsy Behav. 88,
251–261 (2018). https://doi.org/10.1016/j.yebeh.2018.09.030. Epub 2018 Oct 11
16. Varatharajah, Y., Iyer, R.K., Berry, B.M., Worrell, G.A., Brinkmann, B.H.: Seizure forecasting
and the preictal state in canine epilepsy. Int. J. Neural Syst. 27:1650046 (2017) [12 pp.]
17. Günay, M., Ensari, T.: EEG signal analysis of patients with epilepsy disorder using
machine learning techniques. In: 2018 Electric Electronics, Computer Science, Biomedical
Engineerings’ Meeting (EBBT), Istanbul, pp. 1–4 (2018)
18. Kumar, P.N., Kareemullah, H.: EEG signal with feature extraction using SVM and ICA
classifiers. In: International Conference on Information Communication and Embedded Systems
(ICICES2014), Chennai, pp. 1–7 (2014). https://doi.org/10.1109/icices.2014.7034090
19. Mirowski, P., Madhavan, D., LeCun, Y., Kuzniecky, R.: Classification of patterns of EEG
synchronization for seizure prediction. Clin. Neurophysiol. 120(11), 1927–1940 (2009)
20. Jauhar, S., Krishnadas, R., Nour, M.M., Cunningham-Owens, D., Johnstone, E.C., Lawrie,
S.M.: Is there a symptomatic distinction between the affective psychoses and schizophrenia?
A machine learning approach. Schizophr. Res. 202, 241–247 (2018).
https://doi.org/10.1016/j.schres.2018.06.070
21. Dipnall, J.F., Pasco, J.A., Berk, M., Williams, L.J., Dodd, S., Jacka, F.N., Meyer, D.: Why so
GLUMM? Detecting depression clusters through graphing lifestyle-environs using
machine-learning methods (GLUMM). Eur. Psych. 39, 40–50 (2017).
https://doi.org/10.1016/j.eurpsy.2016.06.003
22. Liu, A. et al.: Machine learning aided prediction of family history of depression. In: 2017
New York Scientific Data Summit (NYSDS), New York, NY, pp. 1–4 (2017).
https://doi.org/10.1109/nysds.2017.8085046
23. Sri, K.S., Rajapakse, J.C.: Extracting EEG rhythms using ICA-R. In: IEEE International Joint
Conference on Neural Networks, IJCNN 2008. (IEEE World Congress on Computational
Intelligence), pp. 2133–2138 (2008)
24. Malmivuo, J., Plonsey, R.: Bioelectromagnetism: Principles and Applications of Bioelectric
and Biomagnetic Fields. Oxford University Press (1995)
25. Delorme, A., Makeig, S.: EEGLAB: an open source toolbox for analysis of single trial EEG
dynamics including independent component analysis. J. Neurosci. Methods 134(1), 9–21
(2004)
26. Wu, Y., Ianakiev, K., Govindaraju, V.: Improved k-nearest neighbor classification. Pattern
Recogn. 35(10), 2311–2318 (2002)
27. Ho, C.K., Sasaki, M.: EEG data classification with several mental tasks. In: 2002 IEEE
International Conference on Systems, Man and Cybernetics, vol. 6, p. 4 (2002)
28. About GSIL - Blekinge Institute of Technology - in real life. http://www.bth.se/com/gsil.
Accessed 05 August 2012
Prediction of Stroke Risk Factors for
Better Pre-emptive Healthcare: A
Public-Survey-Based Approach

Debayan Banerjee(B) and Jagannath Singh

KIIT Deemed to be University, Bhubaneswar, India


[email protected], [email protected]

Abstract. This work explores the relation between certain behavioural
traits and prevalent diseases among the sample population reported in a
public health survey, by means of machine learning techniques. Predictive
models are developed to ascertain the statistical significance of these
relations while also checking the fitness of the models to predict the
diseases in a non-invasive way. Our study focuses on cardiovascular stroke
from the BRFSS database of CDC, USA. The proposed model achieves
0.71 AUC in predicting stroke from purely behavioural features. Further
analysis reveals an interesting behavioural trait, proper maintenance of
an individual’s work–life balance, which joins the three main conventional
habits (regular physical activity, healthy diet, and abstinence from heavy
smoking and drinking) as the most significant factors for reducing the
risk of potential stroke.

Keywords: Stroke prediction · Behavioural features · Predictive
model · Gradient boosting · BRFSS

1 Introduction
We propose to underline the significance of pre-emptive healthcare for
cardiovascular stroke by discerning behavioural traits that may play a crucial
role in the gradual development of health conditions that lead to stroke.
Behaviours that affect health negatively, such as lack of regular physical
activity, lack of calibrated and nutritious food intake, and unrestrained
tobacco use and alcohol consumption, may, if continued for long, result in
health conditions that lead to stroke.1 Thus, to prevent the looming risk of
stroke, positive behavioural changes are indispensable.
In the United States (U.S.), stroke is the fifth leading cause of death, claiming
one life out of 20 from the total number of deaths per year, with more than 0.7
million stroke patients per year [1]. Moreover, from the government’s perspective,
each year the U.S. spends more than $34 billion on stroke, which covers the
cost of personal healthcare services, medicines for stroke treatment, and missed
working days2 [1]. For an individual, this underlines how small changes towards
a healthier lifestyle can eliminate a very significant amount of expenditure on
health.

1 https://www.cdc.gov/stroke/behavior.htm.

© Springer Nature Singapore Pte Ltd. 2021
C. R. Panigrahi et al. (eds.), Progress in Advanced Computing and Intelligent Engineering,
Advances in Intelligent Systems and Computing 1199,
https://doi.org/10.1007/978-981-15-6353-9_2
In the present work, we aim to find the relation between behavioural traits
and the chance of stroke using machine learning (ML) techniques, and to
further specify which traits dominate the list of possible risk factors for
stroke. We apply our analysis to the Behavioral Risk Factor Surveillance System
(BRFSS)3, which is the largest health-related telephone survey in the United
States and contains a significant number of behavioural features. By purely
behavioural features we mean those that are directly controllable or negotiable
(or both) by an individual without any monetary requirement (this excludes
insurance). Behaviours influenced by social context, mental health, etc. are
therefore excluded, so that our result generalizes as far as possible, stays
relevant for mass awareness, and is less constrained by demography.
To achieve our objective, we first identify the possible risk factors
contributing to stroke from a set of selected behavioural traits by using a GBM
(Gradient Boosting Machine)-based predictive model. We then analyse the impact
of the individual features on the model outcome to establish the soundness of
the features statistically.
Our main contributions can be outlined as follows:

– We discern a set of behavioural traits as possible risk factors for stroke,
along with their comparative individual statistical significance in
contributing to stroke.
– Apart from the conventional habits, our analysis identifies that maintaining
a healthy work–life balance, as well as sharing household responsibilities in
relevant cases, can be significantly beneficial for maintaining good health.
– Furthermore, our findings are based on the whole BRFSS dataset, which
represents the whole U.S. Not being constrained to specific demographics or
states, our results provide a holistic view to anyone concerned with this
matter, both individuals and policymakers.

2 Related Works
Positive Behavioural Changes in Prevention of Stroke: As discussed in
Sect. 1, chronic diseases such as stroke are in most cases preventable, and
their chances of gradual development can be minimized by changing negative
behavioural traits, leading to a healthy lifestyle. The Centers for Disease
Control and Prevention (CDC), U.S., argues that a large percentage of stroke
cases can be eliminated by eliminating the three main risk factors: unhealthy
diet [24], excessive smoking [2] and lack of enough physical activity4 [26].
Other epidemiological studies also highlight the cardinal importance of
quantifying the impact of lifestyle on public health, since individuals
themselves can exert granular control to counter chronic diseases such as
stroke [3].

2 https://www.cdc.gov/stroke.
3 https://www.cdc.gov/brfss/index.html.

Machine Learning in Healthcare: Predicting diseases solely from reports of
patient behaviours can be a difficult problem, especially from a survey dataset.
But recent advances in machine learning algorithms have opened up opportunities
to navigate this complex problem of disease prediction from health survey
datasets and also to determine the risk factors behind the diseases in
question.
Where the prediction task allows it, decision trees are used [4]. A single
shallow decision tree acts as a weak learner [4], which means that the
classifier is only slightly or weakly correlated with the true classification.5
Its performance can therefore be biased in favour of the majority class of the
target if the dataset in question already has much bias or variance or both.
Hence, to improve on a purely decision-tree-based classifier, random forest [5]
and gradient boosting [8] may be used. The gradient boosting algorithm [8] uses
a gradient descent procedure, an iterative method that moves in the direction of
steepest descent, defined by the negative of the gradient of a function, to find
a local minimum of that function. Decision trees are used as weak learners in
gradient boosting, where they are added one at a time, unlike in random forest,
where the trees are built independently of one another and without the gradient
descent procedure that minimizes the loss while adding trees [7]. Overall,
random forests are built to reduce variance [5] whereas gradient boosting
reduces bias [8].
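The "one tree at a time" idea can be made concrete with a toy sketch. It is not the GBM configuration used in this work: it assumes a squared-error loss, a single numeric feature, one-split trees (stumps) as weak learners, and a learning rate of 0.5, and all function names are hypothetical. For squared error the negative gradient is simply the residual, so each stump is fitted to the current residuals.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (squared error)."""
    best = None
    # Candidate thresholds: every distinct value except the largest, so that
    # both sides of the split are non-empty (needs at least two distinct values).
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the negative gradient (residuals)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = gradient_boost(xs, ys)
```

Each round shrinks the remaining residual, which is the bias-reducing behaviour the paragraph attributes to boosting; a random forest would instead average trees that are grown independently of one another.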
Existing Works on Stroke Prediction: Among the existing works on stroke
prediction, Yang, Zhong et al. study the risks in state-level demographics [13].
Akdag et al. implement classification trees for finding the risk factors of
hypertension from an observation conducted on hospital patients in Turkey [14].
Sunmoo Yoon et al. work on the prediction of disability, one of the results of
stroke, and on how the types of disability are associated and correlated with
stroke [11]. Alkadry et al. detail the disparity in stroke awareness across
demographics [9]. Howard et al. put forward the importance of a self-reported
or questionnaire-based approach in categorizing the general levels of risk
among the respondents [17]. Luo et al. also demonstrate the impact of stroke
across demographics as reported in the BRFSS, using a regression model that
checks whether there is a relation between the two [10]. Nuyujukian et al.
employ a logistic regression model to show the association between length of
sleeping hours and stroke across ethnicities [25].
To the best of our knowledge, there is neither any existing work that performs
stroke prediction on the basis of the whole BRFSS dataset, nor any that focuses
on purely behavioural features. Most of the other works are either related to
finding correlations between certain features and stroke [19] or restricted to
only certain states [20]. The downside of finding only correlations is that a
correlation is just a number indicating the strength of the relationship
between two variables, and it does not by itself let us hypothesize how true
the relationship is, whereas prediction shows how confidently we can
hypothesize the target on the basis of the predictors.

4 https://www.cdc.gov/stroke/behavior.htm.
5 https://en.wikipedia.org/wiki/Boosting_(machine_learning).

The Class Imbalance Problem and Its Proposed Solution: Apart from
the model selection issues, class imbalance is a common problem that arises
especially in medical datasets. In a public survey it is aggravated by missing
values and wrong responses about stroke, which are not present in clinical
data. Class imbalance often leads the prediction model to learn with a bias
towards the majority class. For example, if the ratio of label counts between
the majority and minority classes in a dataset is 25:1, an accuracy-driven
classifier may yield an accuracy of more than 90% by disregarding the minority
class instances and classifying all instances as belonging to the majority
class. This problem has been discussed and worked upon by Wang and Yao [15],
who propose sub-sampling as a way to get better results.
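The 25:1 example can be checked numerically, and the sub-sampling idea can be sketched generically. The code below is a plain random undersampling of the majority class, shown only as an assumption-laden illustration; it is not claimed to be the procedure of Wang and Yao [15], and the row identifiers and fixed seed are hypothetical.

```python
import random
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def undersample(rows, labels, seed=0):
    """Randomly keep as many rows of each class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical survey with a 25:1 ratio of "no stroke" (0) to "stroke" (1).
labels = [0] * 250 + [1] * 10
rows = list(range(len(labels)))  # stand-ins for respondent records

baseline = majority_baseline_accuracy(labels)  # 250 / 260, about 0.96
balanced = undersample(rows, labels)
balanced_counts = Counter(label for _, label in balanced)
```

With a 25:1 ratio the do-nothing baseline already scores 25/26 ≈ 96% accuracy, which is why accuracy alone is a misleading target here and a ranking measure such as AUC is more informative.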

3 Dataset Collection and Description

The dataset for this work is taken from the Behavioral Risk Factor Surveillance
System (BRFSS)6 conducted by CDC. Started in 1984, BRFSS collects data in
all states of the U.S. and in the U.S. territories. This health survey is
conducted over telephone and covers most of the potential health risk factors,
health-related behavioural practices and health conditions. The resulting
dataset is shared with the public and is available for free [12]. CDC’s official
website7 publishes the relevant questionnaire and the dataset encoding for a
detailed and comprehensive understanding. In the present work, we use the BRFSS
2012 dataset, which records 475687 observations (see Footnote 7).

4 Feature Selection

From the survey features in BRFSS 2012 that are related to stroke and purely
behavioural, and excluding our target variable denoting a positive or
negative response for stroke, we pick out 15 features as candidate risk factors,
as discussed in Sect. 1 and the BRFSS Codebook (see Footnote 7). In Table 1, we
present the variable names coded in BRFSS.

6 https://www.cdc.gov/brfss/index.html.
7 https://www.cdc.gov/brfss/annual_data/annual_2012.html.