Intelligent Network Design Driven by Big Data Analytics, IoT, AI and Cloud Computing
Edited by
Sunil Kumar, Glenford Mapp and Korhan Cengiz
This book will be useful to researchers, scientists, engineers, professionals, advanced students
and faculty members in ICTs, data science, networking, AI, machine learning and sensing. It
will also be of interest to professionals in data science, AI, cloud and IoT start-up companies,
as well as developers and designers.
This publication is copyright under the Berne Convention and the Universal Copyright
Convention. All rights reserved. Apart from any fair dealing for the purposes of research or
private study, or criticism or review, as permitted under the Copyright, Designs and Patents
Act 1988, this publication may be reproduced, stored or transmitted, in any form or by
any means, only with the prior permission in writing of the publishers, or in the case of
reprographic reproduction in accordance with the terms of licences issued by the Copyright
Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to
the publisher at the undermentioned address:
The Institution of Engineering and Technology
Futures Place
Kings Way, Stevenage
Herts, SG1 2UA, United Kingdom
www.theiet.org
While the authors and publisher believe that the information and guidance given in this
work are correct, all parties must rely upon their own skill and judgement when making use
of them. Neither the author nor publisher assumes any liability to anyone for any loss or
damage caused by any error or omission in the work, whether such an error or omission is
the result of negligence or any other cause. Any and all such liability is disclaimed.
The moral rights of the author to be identified as author of this work have been asserted by
him in accordance with the Copyright, Designs and Patents Act 1988.
1
Introduction to intelligent network design driven by big data analytics,
IoT, AI and cloud computing 1
Sunil Kumar, Glenford Mapp, and Korhan Cengiz
Preface 1
Chapter 2: Role of automation, Big Data, AI, ML, IBN, and
cloud computing in intelligent networks 2
Chapter 3: An intelligent verification management approach for
efficient VLSI computing system 2
Chapter 4: Evaluation of machine learning algorithms on academic
big dataset by using feature selection techniques 3
Chapter 5: Accurate management and progression of Big Data analysis 4
Chapter 6: Cram on data recovery and backup cloud computing techniques 4
Chapter 7: An adaptive software defined networking (SDN) for load
balancing in cloud computing 5
Chapter 8: Emerging security challenges in cloud computing: An insight 5
Chapter 9: Factors responsible and phases of speaker recognition system 6
Chapter 10: IoT-based water quality assessment using fuzzy logic controller 6
Chapter 11: Design and analysis of wireless sensor network for
intelligent transportation and industry automation 7
Chapter 12: A review of edge computing in healthcare Internet of
Things: theories, practices, and challenges 7
Chapter 13: Image processing for medical images on the basis of
intelligence and bio computing 8
Chapter 14: IoT-based architecture for smart health-care systems 8
Chapter 15: IoT-based heart disease prediction system 9
Chapter 16: DIAIF: detection of interest flooding using artificial
intelligence-based framework in NDN android 9
Chapter 17: Intelligent and cost-effective mechanism for
monitoring road quality using machine learning 10
References 10
2
Role of automation, Big Data, AI, ML, IBN, and cloud computing in
intelligent networks 13
Sunil Kumar and Priya Ranjan
2.1 Evolution of networks: everything is connected 13
2.1.1 Intelligent devices 14
2.1.2 Intelligent devices connection with networks 14
2.2 Huge volume of data generation by intelligent devices 15
2.2.1 Issues and challenges of Big Data Analytics 16
2.2.2 Storage of Big Data 16
2.3 Need of data analysis by business 18
2.3.1 Sources of information 19
2.3.2 Data visualization 20
2.3.3 Analyzing Big Data for effective use of business 20
2.3.4 Intelligent devices thinking intelligently 20
2.4 Artificial intelligence and machine learning in networking 21
2.4.1 Role of ML in networks 21
2.5 Intent-based networking 22
2.6 Role of programming 23
2.6.1 Basic programming using Blockly 23
2.6.2 Blockly games 24
2.7 Role of technology to design a model 24
2.7.1 Electronic toolkits 25
2.7.2 Programming resources 26
2.8 Relation of AI, ML, and IBN 26
2.9 Business challenges and opportunities 26
2.9.1 The evolving job market 27
2.10 Security 28
2.10.1 Challenges to secure device and networks 28
2.11 Summary 31
References 31
3
An intelligent verification management approach for efficient VLSI
computing system 35
Konasagar Achyut, Swati K Kulkarni, Akshata A Raut, Siba Kumar Panda,
and Lakshmi Nair
3.1 Introduction 36
3.2 Literature study 38
3.3 Verification management approach: Case Study 1 46
3.3.1 The pseudo random number generator in a verification
environment 47
3.3.2 Implementation of PRNG in higher abstraction language and
usage of DPI 49
3.4 Verification management approach: Case Study 2 51
4
Evaluation of machine learning algorithms on academic big dataset by
using feature selection techniques 61
Mukesh Kumar, Amar Jeet Singh, Bhisham Sharma, and Korhan Cengiz
4.1 Introduction 62
4.1.1 EDM 64
4.1.2 EDM process 65
4.1.3 Methods and techniques 66
4.1.4 Application areas of data mining 68
4.2 Literature survey 69
4.3 Materials and methods 72
4.3.1 Dataset description 72
4.3.2 Classification algorithms 73
4.3.3 FS algorithms 75
4.3.4 Data preprocessing phase 77
4.4 Implementation of the proposed algorithms 78
4.4.1 Model construction for the standard classifier 78
4.4.2 Implementation after attribute selection using ranker method 79
4.5 Result analysis and discussion 84
4.6 Conclusion 86
References 86
7
An adaptive software-defined networking (SDN) for load balancing in
cloud computing 135
Swati Lipsa, Ranjan Kumar Dash, and Korhan Cengiz
7.1 Introduction 135
7.2 Related works 138
7.3 Architecture overview of SDN 140
7.4 Load-balancing framework in SDN 141
7.4.1 Classification of SDN controller architectures 142
7.5 Problem statement 144
7.5.1 Selection strategy of controller head 144
7.5.2 Network setup 146
7.6 Illustration 147
18 Conclusion 397
Sunil Kumar, Glenford Mapp, and Korhan Cengiz
18.1 Conclusion 397
References 397
Index 401
Preface
As enterprise access networks evolve with more mobile users, diverse devices and
cloud-based applications, managing user performance on an end-to-end basis has
become next to impossible. Recent advances in big data network analytics, com-
bined with AI and cloud computing are being leveraged to tackle this growing prob-
lem. The book focuses on how new network analytics platforms are being used to
ingest, analyze and correlate a myriad of infrastructure data across the entire net-
work stack with the goal of finding and fixing quality-of-service and network performance problems.
This book presents new upcoming technologies in the field of networking and
telecommunication. It addresses major new technological developments and reflects
on industry needs, current research trends and future directions. The authors focus
on the development of AI-powered mechanisms for future wireless networking
applications and architectures which will lead to more performant, resilient and
valuable ecosystems and automated services. The book is a “must-read” for
researchers, academicians, engineers and scientists involved in
the design and development of protocols and AI applications for wireless communi-
cation devices and wireless networking technologies.
All chapters presented here are the product of extensive field research involving
applications and techniques related to data analysis in general, and to big data, AI,
IoT and network technologies in particular.
1 Department of Computer Science & Engineering, Amity University, India
2 Faculty of Science & Technology, Department of Computer Science, Middlesex University, London, UK
3 Department of Electrical-Electronics Engineering, Trakya University, Edirne, Turkey
With the help of the SystemVerilog language, a reusable testbench is developed and
used for verification. The inputs injected into the testbench are randomized with
constraints so that the design produces accurate output. To unify the verification
language, there is a dedicated methodology commonly known as the Universal
Verification Methodology (UVM); through it, the chapter also takes readers through
coverage-based formal verification. For continuous functional verification, an
intelligent regression model is also developed with the help of ML and scripting,
making repeated injection of various test cases possible in order to verify the
functionality. Thus, by adopting the presented verification environment and
distinctive approach, one can affirm that the design is ready to be deployed on the
targeted semiconductor chips. As verification is an unignorable procedure, it can
also be used to classify the algorithms developed in ML for data clustering, data
encoding and their accurate analysis. More importantly, this chapter allows us to
understand an intelligent verification model for testing the design through
regression runs, with the corresponding set-up and the pass/fail analysis steps. This
structure may significantly reduce the simulation time for a VLSI verification
engineer.
Keywords: VLSI, verification, intelligent, ASIC.
The present digital world technology is evolving at a rapid pace. To store, manage
and protect the digital information, it is necessary to back up and recover the data
with utmost efficiency. As a solution, cloud computing that offers customers a wide
range of services can be used [11, 12]. Storage-as-a-Service (SaaS) is one of the
cloud platform’s services, in which a large volume of digital data is maintained in
the cloud database. Enterprises' most sensitive data are stored in the cloud, which
must ensure that they are secure and accessible at all times and from all locations.
At times, information may become unavailable due to natural disasters such as
windstorms, rainfall and earthquakes, or due to technical faults and accidental deletion.
and availability under such circumstances, it is vital to have a good understanding
of the data backup and recovery strategies. This chapter examines a variety of cloud
computing backup and recovery techniques.
Keywords: cloud computing, data backup, data recovery, advantage,
disadvantage.
been discussed. For better clarification, several reviews are conducted on the exist-
ing models.
Keywords: security, cloud computing, SaaS, PaaS, IaaS, cryptography.
The method of identifying a speaker based on his or her speech is known as automatic
speaker recognition. Speaker/voice recognition is a biometric sensory device that
recognizes people by their voices. Most speaker recognition systems nowadays are
focused on spectral information, which means they use spectral information derived
from speech signal segments of 10–30 ms in length. However, if the received speech
signal contains some noise, the output of the cepstral-based system suffers. The
primary goal of the study is to identify the various factors responsible for improved
performance of speaker recognition systems by modeling prosodic features, and to
describe the phases of a speaker recognition system. Furthermore, the analysis
focuses on a text-independent speaker recognition system in the presence of
background noise.
Keywords: voice recognition, signals, noise, quality.
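To make the 10–30 ms segmentation concrete, the sketch below cuts a signal into 25 ms frames with a 10 ms hop and computes a magnitude spectrum per frame, which is the raw material for spectral (e.g., cepstral) features. The frame sizes and the synthetic sine-wave input are illustrative assumptions, not parameters taken from the chapter:

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Split a 1-D speech signal into overlapping short-time frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return np.stack([signal[i * hop_len:i * hop_len + frame_len]
                     for i in range(n_frames)])

sr = 16000                            # 16 kHz sampling rate
t = np.arange(sr) / sr                # 1 second of audio
speech = np.sin(2 * np.pi * 220 * t)  # stand-in for a real recording

frames = frame_signal(speech, sr)     # one row per 25 ms segment
spectra = np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), axis=1))
print(frames.shape, spectra.shape)    # spectral information per frame
```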
Water is an essential resource that we use in our daily life. The standard of the water
quality must be observed in real time to make sure that we obtain a secure and clean
supply of water to our residential areas. A water quality-monitoring and decision-
making system (WQMDMS) is implemented for this purpose based on the Internet of
Things (IoT) and a fuzzy logic controller (FLC) to decide the usage of water (drinking
or tap water) in a common water tank system. The physical and chemical properties
of the water are obtained through continuous sensor monitoring. The work describes in
detail the design of a fuzzy logic controller for a water quality measurement system,
to determine the quality of water by decision-making, and accordingly, the usage of
water is decided. The WQMDM system measures the physico-chemical characteris-
tics of water like pH, turbidity, and temperature by the use of corresponding analog
and digital sensors. The values of the parameters obtained are used to detect the
presence of water contaminants and accordingly, the quality of water is determined.
The measurements from the sensors are handled and processed by an ESP32, and these
refined values follow the rules determined by the fuzzy inference system. The output
highlights the water quality, which is categorized as very poor, poor, average, or good.
The usage of the water will be determined by the results obtained using the FLC and
as per the percentage of water quality, the water is decided as drinking water or tap
water.
Keywords: WQMDMS, IoT, controller, fuzzy logic.
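As a rough illustration of the decision logic summarized above, the following sketch maps sensor readings to a quality percentage and a usage decision. The membership ranges, weights and category thresholds are invented for illustration; the chapter's actual WQMDMS rule base and sensor calibrations are not specified here:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def water_quality(ph, turbidity_ntu, temp_c):
    # Degree to which each reading looks "good" (assumed ranges).
    ph_ok = tri(ph, 6.0, 7.0, 8.5)
    turb_ok = tri(turbidity_ntu, -1.0, 0.0, 5.0)   # lower turbidity is better
    temp_ok = tri(temp_c, 10.0, 22.0, 35.0)
    # Toy inference: quality score is a weighted average of memberships.
    score = 100 * (0.5 * ph_ok + 0.3 * turb_ok + 0.2 * temp_ok)
    for label, threshold in [("good", 75), ("average", 50), ("poor", 25)]:
        if score >= threshold:
            return score, label
    return score, "very poor"

score, label = water_quality(ph=7.1, turbidity_ntu=1.2, temp_c=24.0)
print(f"quality {score:.0f}% -> {label} -> "
      f"{'drinking' if label == 'good' else 'tap'} water")
```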
Internet of Things (IoT) provides a pathway for connecting physical entities with
digital entities using devices and communication technologies. The rapid growth
of IoT in recent years has had a significant influence on many fields. Healthcare
is one field that will benefit hugely from IoT. IoT can resolve many
challenges faced by patients and doctors in healthcare. Smart health-care applica-
tions allow the doctor to monitor the patient’s health state without human interven-
tion. Sensors collect and send the data from the patient. Recorded data are stored
in a database that enables medical experts to analyze those data. Any abnormal
change in the patient's status can be reported to the doctor. This work aims to
study different research works on IoT-based health-care systems that
are implemented using basic development boards. Various hardware parameters of
health-care systems and sensors used for those parameters are explored. A basic
Arduino-based health-care application is proposed using sensors and global system
for mobile communication (GSM) module.
Keywords: IoT, smart healthcare, patient monitoring.
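The monitoring-and-alert flow sketched in this summary can be captured in a few lines. The following is a Python mock-up in which the sensor reading, the SMS call and the 100 bpm threshold are all stand-ins, since the actual implementation described uses an Arduino board with a GSM module:

```python
import random
import time

HEART_RATE_LIMIT = 100   # assumed alert threshold, beats per minute

def read_heart_rate():
    """Stub for a pulse sensor attached to the development board."""
    return random.randint(60, 120)

def send_sms(number, message):
    """Stub for the GSM module's SMS command (AT+CMGS on real hardware)."""
    print(f"SMS to {number}: {message}")

readings = []                         # stands in for the cloud database
for _ in range(5):
    bpm = read_heart_rate()
    readings.append(bpm)              # stored for later analysis by experts
    if bpm > HEART_RATE_LIMIT:        # abnormal change -> notify the doctor
        send_sms("+10000000000", f"Abnormal heart rate: {bpm} bpm")
    time.sleep(0.1)
```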
In India, around 80 per cent of people who die due to heart disease do not receive
proper treatment. This is a daunting task for doctors as they often do not diagnose
patients properly. The treatment of this disease is very costly. The proposed system
improves cost-effective treatment by using data mining technology to simplify the
decision support system. Most hospitals employ a hospital management system
to manage the care of their patients. Unfortunately, many of these programs do not
use big clinical data to extract important information. These systems generate a
large quantity of data in various forms, yet the data are rarely examined and remain
unused. Therefore, much effort is required to make
wise decisions. This project helps diagnose a disease using various data mining tech-
niques. Currently, diagnosing a disease involves identifying various symptoms and
features of a disease.
Keywords: heart disease prediction, data mining, deep learning.
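As a sketch of the data mining step described above, the snippet below trains a simple classifier on a toy table of patient attributes. The feature set, the hand-made records and the choice of a decision tree are illustrative assumptions, not the chapter's actual pipeline:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy records: [age, resting blood pressure, cholesterol, max heart rate]
X = [[63, 145, 233, 150], [37, 130, 250, 187], [56, 120, 236, 178],
     [57, 140, 192, 148], [67, 160, 286, 108], [41, 130, 204, 172]]
y = [1, 0, 0, 1, 1, 0]   # 1 = heart disease present, 0 = absent

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(model.predict([[60, 150, 240, 120]]))  # decision support for a new patient
```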
References
[1] Kumar S., Ranjan P., Ramaswami R. ‘EMEEDP: enhanced multi-hop energy
efficient distributed protocol for heterogeneous wireless sensor network’.
Fifth International Conference on Communication Systems and Network
Technologies; Gwalior, India, IEEE, 2015. pp. 194–200.
[2] Kumar S., Rao A.L.N., Ramaswami R. ‘Energy optimization technique for
distributed localized wireless sensor network’. International Conference
[14] Chauhan R., Kumar S. ‘Packet loss prediction using artificial intelligence uni-
fied with big data analytics, internet of things and cloud computing technolo-
gies’. 5th International Conference on Information Systems and Computer
Networks (ISCON); Mathura, India, IEEE, 2021. pp. 01–06.
[15] Haidar M., Kumar S. ‘Smart healthcare system for biomedical and health
care applications using aadhaar and blockchain’. 2021 5th International
Conference on Information Systems and Computer Networks, ISCON 2021;
Mathura, India, IEEE, 2022. pp. 1–5.
[16] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic Torus
Network-on-Chip architecture’. International Journal of Innovative
Technology and Exploring Engineering. 2019, vol. 8(6), pp. 2278–3075.
[17] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
Chapter 2
Role of automation, Big Data, AI, ML, IBN, and
cloud computing in intelligent networks
Sunil Kumar 1 and Priya Ranjan 2
There are more smart devices in our world today than individuals. A growing num-
ber of people are linked to the Internet 24 hours a day, in some form or another.
A growing number of people own three, four, or more smart devices and rely on
them. Smartphones, fitness and health trackers, e-readers, and tablets are just a few
examples. It is forecast that on average there will be 3.4 smart devices or connections
for every person on earth. The Internet of Things (IoT) is relevant to many indus-
tries. IoT systems contribute to the environmental controls, retail, transportation,
healthcare, and agriculture industries among many others. According to Statista, the
number of IoT devices that are in use across all relevant industries is forecast to
grow to more than 8 billion by 2030. As for consumers, important growth areas are
the Internet and digital media devices, which include smartphones. This area is also
predicted to grow to more than 8 billion by 2030. Other applications with more than
1 million connected devices are connected and autonomous vehicles, IT infrastruc-
ture, asset management, and electric utility smart grid.
All of this is made possible through intelligent networks. The planet is rap-
idly becoming covered in networks that allow digital devices to communicate and
interconnect. Consider the network mesh as a digital skin that surrounds the earth.
Mobile devices, electronic sensors, electronic measuring equipment, medical gad-
gets, and gauges can all link with this digital skin. They keep track of, communicate
with, analyze, and, in some situations, automatically adjust to the data collected and
transmitted.
1 Department of Computer Science & Engineering, Amity University, Uttar Pradesh, India
2 Department of Electronics & Communication Engineering, SRM University, Andhra Pradesh, India
businesses? The networks that we utilize daily enable these relationships [1]. The
Internet and the digitized world are built on the foundation of intelligent networks.
The methods we use to communicate are constantly changing. Breakthroughs in
wireless and digital technology have substantially extended the reach of our com-
munications, which were formerly limited by wires and plugs [2]. Through their
connectivity to the Internet, networks in businesses and large organizations can sup-
ply products and services to clients. Networks can also be used on a larger scale to
consolidate, store, and offer access to data on network servers [3].
The Internet is the world’s largest network, and it serves as the “electronic skin”
that protects the globe. The Internet is a collection of private and public networks
that are linked together. The Internet is accessible to businesses, small office net-
works, and residential networks. The data collected, retained, and analyzed by sen-
sors benefit a wide range of companies. Businesses now have a better understanding
of the things they sell and who is buying them. They can streamline production and
target their marketing and advertising to specific areas or audiences using this type
of data, encouraging the development of new business opportunities, and marketing
ideas [4].
powerful computer may be on the same LAN as the controller, or it could only be
reached via the Internet [8, 9].
Actuators are frequently used in conjunction with intelligent devices. Actuators
are devices that accept electrical input and convert it into physical action. For exam-
ple, if a smart device senses excessive heat in a room, the temperature reading is sent
to the microcontroller. The data can be sent from the microcontroller to an actuator,
which will then switch on the air conditioner. The majority of new devices such
as fitness wearables, implanted pacemakers, air meters in a mine shaft, and water
meters in a farm field all require wireless connectivity. Because many sensors are
“out in the field” and are powered by batteries or solar panels, consideration must
be given to power consumption. Low-powered connection options must be used to
optimize and extend the availability of the sensor [10–12].
• They have a significant amount of data that require more storage space as time
goes on (volume).
• They are dealing with an ever-increasing amount of data (velocity).
• They have information in a number of formats (variety).
How much data do intelligent devices collect? Here are some estimated exam-
ples. For comparison, assume that the average MP3 song is about 3 megabytes.
• Intelligent devices in one smart connected home can produce as much as 1 giga-
byte (GB) of information a week, or the equivalent of 333 MP3 songs.
• Intelligent devices in one autonomous car can generate 4,000 gigabits (Gb) of
data per day. That is, 500 gigabytes (GB) of data, which is the equivalent of
about 167,000 MP3 songs.
• Safety intelligent devices in mining operations can generate up to 2.4 terabits
(TB) of data every minute. That is, 300 GB or about 100,000 MP3 songs.
• An Airbus A380 Engine generates 1 petabyte (PB) of data on a flight from
London to Singapore. That is, 1 million GB or about 334 million MP3 songs.
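The arithmetic behind these comparisons is easy to reproduce, assuming decimal units (1 GB = 1,000 MB), 8 bits per byte and the 3 MB song used above:

```python
MB_PER_SONG = 3

def songs(gigabytes):
    return gigabytes * 1000 / MB_PER_SONG   # decimal units: 1 GB = 1,000 MB

print(songs(1))                 # smart home: 1 GB/week -> ~333 songs
print(songs(4000 / 8))          # car: 4,000 Gb/day = 500 GB -> ~167,000 songs
print(songs(2.4e12 / 8 / 1e9))  # mining: 2.4 Tb/min = 300 GB -> ~100,000 songs
print(songs(1e6))               # A380 engine: 1 PB = 1,000,000 GB -> ~334 million
```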
2.2.2.1 Edge computing
Edge computing (Figure 2.1) is an architecture that utilizes end-user clients or
devices at the edge of the network to do a substantial amount of the preprocessing
and storage required by an organization. Edge computing was designed to keep the
data closer to the data source for preprocessing. Intelligent device data, in particu-
lar, can be preprocessed closer to where it was collected. The information gained
from that preprocessed analysis can be fed back into companies’ systems to mod-
ify processes if required. Because the sensor data are preprocessed by end devices
within the company system, communications to and from the servers and devices are
quicker. This requires less bandwidth than constantly sending raw data to the cloud
[17, 18]. After the data have been preprocessed, it is often shipped off for long-term
storage, backup, or deeper analysis within the cloud [19].
companies such as Google, Microsoft, and Apple. Cloud storage services are pro-
vided by different vendors such as Google Drive, Apple iCloud, Microsoft OneDrive,
and Dropbox. From an individual’s perspective, using the cloud services allows you:
• to access various programs instead of downloading them onto your local device
• to save all of your data, such as images, music, movies, and emails, freeing up
local hard disc space
• to be able to access your data and applications from any location, at any time,
and on any device
One of the disadvantages of using the cloud is that your data could fall into the
wrong hands. Your data are at the mercy of the security robustness of your chosen
cloud provider. From the perspective of an enterprise, cloud services and computing
support a variety of data management issues.
2.2.2.3 Distributed processing
From a data management perspective, analytics were simple when only humans cre-
ated data. The amount of data was manageable and relatively easy to sift through.
However, with the explosion of business automation systems and the exponential
growth of web applications and machine-generated data, analytics is becoming
increasingly more difficult to manage. In fact, 90 percent of the data on the planet
today were created in the last two years. Exponential growth is characterized by an
increase in volume over a short period of time. This high volume of data is difficult
to process and analyze within a reasonable amount of time. Rather than large data-
bases being processed by big and powerful mainframe computers and stored in giant
disk arrays (vertical scaling), distributed data processing takes the large volume of
data and breaks it into smaller pieces. These smaller data volumes are dispersed over
multiple sites to be processed by a large number of computers with less powerful
CPUs. Each computer in the distributed architecture examines its own piece of the
Big Data puzzle (horizontal scaling).
Most distributed file systems are designed to be invisible to client programs.
The distributed file system locates files and moves data, but the users have no way
of knowing that the files are distributed among many different servers or nodes. The
users access these files as if they were local to their own computers. All users see
the same view of the file system and are able to access data concurrently with other
users. Hadoop was built to deal with these massive amounts of data. The Hadoop
project began with two components: the Hadoop Distributed File System (HDFS),
a distributed, fault-tolerant file system, and MapReduce, a distributed way to
process data. Hadoop has now evolved into a very comprehensive ecosystem of
software for Big Data management. Hadoop is open-source software enabling the
distributed processing of large data sets that can be terabytes in size and that are
stored in clusters of computers. Hadoop is designed to scale up from single servers
to thousands of machines where they offer computation and storage. To make it
more efficient, Hadoop can be installed and run on many virtual machines (VMs).
These VMs can all work together in parallel to process and store the data.
Hadoop uses scalability and fault tolerance as two important features:
• Scalability: With Hadoop, cluster size can easily scale from a 5-node cluster to
a 1,000-node cluster without excessively increasing the administrative burden.
• Fault tolerance: Hadoop creates many replicated files automatically and pro-
vides backup to ensure that data will not be lost. If a disk, node, or a whole rack
fails, the data are safe.
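The MapReduce idea behind Hadoop can be illustrated with a toy word count in plain Python; this sketch only mimics the map, shuffle and reduce phases on a single machine and is not Hadoop's actual API:

```python
from collections import defaultdict

documents = ["big data on the network", "data drives the intelligent network"]

# Map: each document is processed independently (in Hadoop, on many nodes).
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the intermediate (word, count) pairs by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group (again parallelizable across nodes).
totals = {word: sum(counts) for word, counts in groups.items()}
print(totals)   # e.g. 'data': 2, 'network': 2, 'the': 2, ...
```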
Every organization must become more efficient and innovative to stay competitive
and relevant in the digitized world. The IoT is an integral part of achieving that effi-
ciency and innovation. Many companies want to collect and analyze large amounts
of new product usage data in order to acquire valuable insights. Businesses can use
data analytics to better evaluate the impact of their products and services, alter their
processes and aims, and give better products to their consumers faster. The ability to
gain new insights from their data brings value to the business.
To businesses, data are the new oil. Like crude oil, it is valuable, but if it is
unrefined, it cannot be easily used. Crude oil has to be changed to gasoline, plastic,
chemicals, and other substances to create a valuable product. It is the same with
data. Data must be broken down and analyzed for it to have value. Transactional
and analytical data are the two main types of processed data that provide value.
As events occur, transactional data are recorded and processed. Daily sales reports
and production schedules are analyzed using transactional data to decide how much
inventory to keep on hand. Managerial analysis such as assessing whether the com-
pany should establish a new manufacturing plant or hire more salespeople is aided
by analytical data.
Even if data are considered structured, different applications create files in dif-
ferent formats that are not necessarily compatible with one another. Structured data
may need to be manipulated into a common format such as CSV. Comma-separated
value (CSV) files are a type of plaintext file that uses commas to separate columns
in a table of data and the carriage return character to separate rows. Each row is a
record. Although they are commonly used for importing and exporting in traditional
databases and spreadsheets, there is no specific standard. Data formatting techniques
such as JSON and XML are also plaintext file types that use a standard way of
representing data records. These file formats are compatible with a wide range of
applications.
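A small example of moving the same records between two of the formats just mentioned, using only Python's standard library (the field names are invented for illustration):

```python
import csv
import io
import json

raw = "device_id,temp_c\nsensor-01,21.5\nsensor-02,23.0\n"  # CSV: one record per row

records = list(csv.DictReader(io.StringIO(raw)))  # parse rows into dictionaries
print(json.dumps(records, indent=2))              # same data, JSON encoding
```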
Unstructured data require different tools to prepare data for processing or analy-
sis. The following are two examples:
Web pages are created to provide data to humans, not machines. “Web scrap-
ing” tools automatically extract data from HTML pages. This is similar to a Web
Crawler or spider of a search engine. It explores the web to extract data and cre-
ates a database to respond to the search queries. The web scraping software may
use Hypertext Transfer Protocol or a web browser to access the World Wide Web.
Typically, web scraping is an automated process that uses a bot or web crawler to
do data mining. Specific data are gathered and copied from the web to a database or
spreadsheet. The data can then be easily analyzed.
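A minimal scraping sketch using the widely used requests and BeautifulSoup libraries is shown below; the URL is a placeholder, and a production crawler would also honor robots.txt and rate limits:

```python
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com", timeout=10).text  # fetch over HTTP
soup = BeautifulSoup(html, "html.parser")

# Extract data meant for humans (here: link text and targets) into rows
rows = [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]
print(rows)   # ready to be written to a database or spreadsheet
```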
Many large web service providers such as Facebook provide standardized
interfaces to collect the data automatically using application programming inter-
faces (APIs). The most common approach is to use RESTful APIs. RESTful APIs
use HTTP as the communication protocol and JSON structure to encode the data.
Internet websites such as Google and Twitter gather large amounts of static and
time-series data. Knowledge of the APIs for these sites allows data analysts and
engineers to access the large amount of data that are constantly being generated on
the Internet.
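Calling a RESTful API follows exactly the pattern described: an HTTP request whose response body is JSON. The endpoint, parameters and token below are hypothetical; real providers document their own paths and require registered API keys:

```python
import requests

resp = requests.get(
    "https://api.example.com/v1/posts",        # hypothetical REST endpoint
    params={"since": "2023-01-01"},            # query parameters
    headers={"Authorization": "Bearer <API-KEY>"},
    timeout=10,
)
resp.raise_for_status()
for item in resp.json():                       # JSON body decoded to Python objects
    print(item["id"], item["title"])
```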
2.3.2 Data visualization
The mined data must be analyzed by intelligent tools and techniques and should be
presented to managers and decision-makers to be of value with minimum errors.
There are many different visualizations that can be used to present the value in the
data. Determining the best chart to use will vary based on the following:
self-driving car. When a gadget makes a choice or takes a course of action based
on information from the outside world, it is referred to as a smart device. The word
“smart” now appears in the titles of many of the devices with which we interact.
This suggests that the device can change its behavior in response to its surroundings.
2.4.1.1 Speech recognition
Many different companies now offer digital assistants which allow you to use speech
to communicate with a computer system. Apple, Microsoft, Google, and Amazon all
offer this service. These companies not only allow commands to be given verbally
but also offer speech-to-text capabilities.
2.4.1.2 Product recommendation
Systems build up a customer profile and recommend products or services based on
previous patterns. Users of Amazon and eBay receive recommendations on prod-
ucts. Organizations such as LinkedIn, Facebook, and GooglePlus recommend users
you may wish to connect with.
2.4.1.4 Facial recognition
Security cameras are everywhere, from stores and streets to airports and transporta-
tion hubs. These cameras continually scan the crowds, normally watching for dan-
gerous or illegal activities, but they can also be used to identify and track individu-
als. The system builds a pattern of specific facial features and then watches for a
match to these facial patterns triggering some action.
2.4.1.5 Anomaly detection
In manufacturing, mining, transportation, and other areas, ML can be used to learn
about normal operating conditions in mechanical systems. This makes it possible
to detect unusual operating conditions that could signal that something is ready to
break. Over time, ML not only detects anomalies but also suggests what the most
likely cause of the anomaly may be. Anomaly detection is commonly used in pre-
ventive maintenance but can also be used to detect other unusual conditions that
could signal safety or security issues in other types of systems such as data networks.
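A common baseline for this kind of anomaly detection is a simple statistical model of normal readings; the z-score sketch below is a generic illustration rather than the method of any specific system mentioned here:

```python
import statistics

normal = [70.1, 69.8, 70.4, 70.0, 69.9, 70.2, 70.3]  # readings when healthy
mean = statistics.mean(normal)
stdev = statistics.stdev(normal)

def is_anomaly(reading, threshold=3.0):
    """Flag readings more than `threshold` standard deviations from normal."""
    return abs(reading - mean) / stdev > threshold

print(is_anomaly(70.2))   # False: ordinary operating condition
print(is_anomaly(75.0))   # True: something may be about to break
```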
2.5 Intent-based networking
For a business to survive, it must be agile and respond quickly to the needs and
demands of its customers. Businesses are increasingly dependent on their digital
resources to meet customer demands, so the underlying IT network must also be
responsive enough to quickly adapt to these requirements. This normally involves
adjustments to many systems and processes. These adjustments may include
changes to security policies and procedures, business services and applications, and
operational policies. With traditional networks, many different components must
be manually adjusted to meet ever-changing business requirements. This requires
different technicians and engineers to ensure that the systems are changed to allow
them to work together to accomplish their goals. This sometimes results in errors
and delays and often in suboptimal network performance.
In order to be nimble, responsive, and business-relevant, the new business network
must seamlessly and securely incorporate IoT devices, cloud-based services, and remote
offices. The network must also protect these new digital activities from the constantly
evolving threat landscape. To meet this demand, the IT industry has begun to develop
a systematic strategy to link infrastructure management to business goals. Intent-based
networking is the name given to this method. The diagram depicts the general concept of
2.6 Role of programming
It is usual for programmers to write the first draft of a program in a form that is
independent of any programming language. These language-independent programs, which are
typically referred to as algorithms, are centered on logic rather than syntax. A flowchart
is a visual representation of an algorithm. System software and application software are
the two most popular types of computer software. Application software programs are
designed to perform a certain task or set of tasks. Cisco Packet Tracer, for example, is
a network simulation application that allows users to model complicated networks and
ask “what if” network behavior questions. System software connects the hardware of
the computer to the application program. The system software is responsible for control-
ling the computer hardware and allowing application programs to run. Linux, Apple
OSX, and Microsoft Windows are all instances of system software. A programming
language is used to construct both system and application software. A programming
language is a set of rules for writing programs that send instructions to computer hard-
ware. Algorithms are self-contained, step-by-step sets of operations to be done, and these
programs implement them. Some programming languages compile their code into a col-
lection of machine instructions. C++ is an example of a computer language that is com-
piled. Others do not compile these instructions into machine code before interpreting
them. An interpreted programming language such as Python is an example. The process
of creating a program can begin once the programming language has been decided and
the process has been diagrammed in a flowchart. Program architectures are similar in
most computer languages.
for sensors and actuators. Blockly can translate its block-based code into Python or
JavaScript. This is very useful to beginner programmers.
2.6.2 Blockly games
Google offers a number of free and open-source instructional games to aid program-
ming learning. Blockly Games is the name of the game series.
Visit https://blockly.games to discover more about Blockly Games or to give it
a try.
To help you get started, there are several stages to accomplish. Blockly may
appear to be a toy, but it is a fantastic tool for honing your logical reasoning skills,
which is one of the fundamentals of computer programming. The first part of this
session covered how to use basic programming to support IoT devices. Flowcharts
are diagrams that show how processes work. System software and application soft-
ware are the two most popular types of computer software. Application software is
designed to complete a certain task.
Variables in programming can be divided into two groups: local variables and global variables.
IF-THEN, FOR Loops, and WHILE Loops are the most frequent logic constructs.
Blockly is a visual programming tool designed to assist novices in learning pro-
gramming fundamentals. Blockly uses colored blocks to implement visual program-
ming by assigning different programming structures to them.
Python is a widely used programming language that is intended to be simple to
learn and write. Python is an interpreted language, so parsing and executing Python
code necessitates the use of an interpreter.
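Since this section names the two variable groups and the common logic constructs, here is how they look in Python (the temperature example is invented for illustration):

```python
temperature = 28          # global variable, visible throughout the program

def fan_speed(temp):
    speed = "off"         # local variable, visible only inside this function
    if temp > 25:         # IF-THEN construct
        speed = "high"
    return speed

for hour in range(3):     # FOR loop: repeat a fixed number of times
    print(hour, fan_speed(temperature))

while temperature > 25:   # WHILE loop: repeat until a condition changes
    temperature -= 1
print("cooled to", temperature)
```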
How do you create a prototype? There are several options for getting started.
The Google Glass was created by a team at Google using the “Rapid Prototyping
Method.” To see a video regarding Google’s approach to prototyping, look up
“google glass quick prototype TED talk” on the Internet. Google, of course, has
plenty of cash to pay for the people and materials involved in prototyping. Most of
us will require financial assistance to bring our ideas from our thoughts to a proto-
type. There is crowdfunding for us. Kickstarter, Indiegogo, and Crowdfunder are
just three of the several online crowdfunding platforms available. Look for “Pebble
Time Kickstarter Video” on the Internet.
Of course, the Internet is a good place to start. People have been exchanging
ideas for millennia, but the Internet has taken it to a whole new level. People who
have never met in person are able to interact and work together now. There are a
number of websites where you may connect with other creators. Maker Media is a
global platform that brings together makers to share projects and ideas. Makers can
also use the platform to find and purchase products for their projects. Make a search
for Makezine on the Internet for further information.
2.7.2 Programming resources
Programming is essential in the IoT. When designing an IoT solution, bespoke code
is really handy. Blockly and Python have already been introduced to you. There are
plenty of other free tools available to assist you in honing your programming skills.
MIT OpenCourseWare (OCW) is a web-based repository of nearly all MIT
course materials. OCW is an excellent site to learn about computer programming
for free because it is open to the entire world. OCW programming-related courses are
available at http://ocw.mit.edu/courses/intro-programming/.
See? I told you! It does not have to be tough to learn basic programming. It is
possible to have a good time! You now have some very powerful starting tools after
creating a process flowchart and using Blockly and Python. What do you think you
could make for the IoT? How could a small prototype help you get started? It can
be entertaining, such as programming a remote-controlled toy to play with your cat
while you are away. Programming a heat sensor for a newborn’s bed, for example,
may be lifesaving. I am willing to wager that once you have had some experience
prototyping in the IoT, you will begin to see things differently.
The IoT has a lot of advantages, but it also has a lot of drawbacks. We are now faced
with an ever-expanding collection of new technologies that we must learn since the
IoT is a transformational technology. The IoT is transforming our lives in every way.
This is not the first time we have seen a technical advancement with such ram-
ifications. Farm mechanization enhanced the productivity of accessible farmland
and triggered a population shift from rural to urban areas. The invention of the
vehicle allowed for improved workforce mobility and recreational pursuits. Many
mundane operations could be automated with greater precision and efficiency
thanks to the personal computer. On a worldwide scale, the Internet began to
break down geographical barriers and enhance equality among people. These are
only a few examples of transformative innovations that have occurred in recent
years. Each of these technologies brought significant changes to an established
culture and was initially received with fear and apprehension. The underlying
benefits were apparent when the first dread of the unknown was overcome and the
technology was accepted. Each perceived problem brings with it a slew of new
possibilities.
Can you imagine how different your life would be if you did not have a car, a
computer, or Internet access?
• Collaboration
• Enterprise Networks
• Data Center and Virtualization
• AI
• Application Development
• IoT Program Developer
• IoT Security Specialist
Not all of the jobs created by the IoT are IT-related. The IoT should be viewed
as an enabling technology with applications in all industries and elements of our
daily life. Within its sphere, the IoT has spawned a plethora of work opportunities.
These positions are available across the design, development, and implementation
of the IoT. There are a few major categories that describe the various career options
available in today’s digital world:
• Enablers: These positions are responsible for developing and implementing the
underlying technology.
• Engagers: These professionals plan, develop, integrate, and provide IoT ser-
vices to customers.
• Enhancers: These positions create their own value-added services in addition to
Engagers’ services, which are unique to the IoT.
The IoT is also creating a need for a new type of IT expert. These are people
who have the knowledge and abilities needed to create new IoT-enabled goods and
analyze the data they collect.
A workforce with expertise in both information science and software or com-
puter engineering is required. In addition, in the IoT, operational and information
technologies are merging. People must interact and learn from one another in order
to comprehend the things, networks, and procedures that exploit the IoT’s boundless
potential.
We must stay current in order to harness the full potential of what the IoT has
to offer, given the ever-changing landscape of the digitized world. As new technolo-
gies emerge, the work sector will continue to offer additional opportunities. At the
same time, the skill sets required for these positions will evolve, necessitating the
need for lifelong learning.
2.10 Security
Many companies’ data have been accessed by hackers over the years. The outcome
has been the leaking of millions of users’ data on the Internet, which has had a
tremendous impact. Login passwords and other personal data linked to more than
one million Yahoo and Gmail accounts are purportedly being sold on the dark web
marketplace, according to recent reporting. Usernames, emails, and unencrypted
passwords are purportedly included in the online accounts for sale on the Dark Web.
The accounts are considered to be the result of numerous big cyberattacks rather
than a single data leak.
In July 2017, hackers broke into Equifax (EFX), one of the largest credit
bureaus, and stole the personal information of 145 million people. Because of the
amount of sensitive information revealed, including Social Security numbers, it
was deemed one of the greatest data breaches of all time. Two months later, the
business revealed the hack. Because the stolen data could be exploited for identity
fraud, it could have a long-term impact. In 2018, Under Armour's food and nutrition
app, MyFitnessPal, was hacked, affecting an estimated 150 million subscribers.
According to the inquiry, usernames, email addresses, and hashed passwords may
have been compromised.
Some IoT devices that are connected to the Internet can interact with the physical
environment. They have found their way into gadgets, automobiles, our bodies, and
our houses. Sensors could collect information from the refrigerator or heating sys-
tem. They could also be found mounted to tree trunks or in city lampposts. Physical
security is difficult or impossible to achieve in many nontraditional locations.
Sensor-enabled IoT devices may be positioned in inaccessible or remote areas,
making human intervention or configuration nearly impossible. The devices are fre-
quently built to last far longer than ordinary high-tech equipment. Some IoT devices
are purposely created without the capacity to be upgraded, or they may be placed
in locations where reconfiguring or upgrading is difficult or impossible. Every day,
new vulnerabilities are discovered. If a device cannot be upgraded, the vulnerability
will remain for the duration of its life. If a gadget can be upgraded, the average con-
sumer may not have a technical background; hence the upgrading procedure should
be automated or simple enough for a layperson to complete.
2.10.1.1 Wi-Fi security
Because they are simple to set up and operate, wireless networks are popular in all
types and sizes of enterprises. The organization must provide a wireless experience
that is both mobile and secure for both employees and visitors. Hackers within range
can access and infiltrate a wireless network if it is not properly secured.
regular basis. You can take a number of steps to safeguard your company’s wireless
network. Keep your firewall turned on, keep your operating system and browser up
to date, and use antivirus and antispyware software to keep your devices safe.
If you are utilizing a public or unsecured Wi-Fi hotspot, observe these safety
guidelines: Do not access or transfer any sensitive personal information over a pub-
lic wireless network.
2.11 Summary
In this chapter, we discussed the latest technologies in the field of intelligent net-
works. We also discussed the roles and relationships of AI, ML, Big Data, IBN, and
cloud computing to improve the performance and QoS of networks.
References
[1] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘EMEEDP: enhanced
multi-hop energy efficient distributed protocol for heterogeneous wire-
less sensor network’. Proceedings of the 5th International Conference on
Communication Systems and Network Technologies, CSNT 2015; 2015. pp.
194–200.
[2] Kumar S., Ramaswami R., Rao A.L.N. ‘Energy optimization in distrib-
uted localized wireless sensor networks’. Proceedings of the International
Conference on Issues and Challenges Intelligent Computing Technique
(ICICT); Ghaziabad, India, IEEE, 2014. pp. 350–55.
[3] Chauhan R., Kumar S. ‘Packet loss prediction using artificial intelligence
unified with big data analytics, internet of things and cloud computing tech-
nologies’. 2021 5th International Conference on Information Systems and
Computer Networks (ISCON), 2021; Mathura, India, IEEE, 2021. pp. 1–6.
[4] Sudhakaran S., Kumar S., Ranjan P., Tripathy M.R. ‘Blockchain-based
transparent and secure decentralized algorithm’. International Conference
on Intelligent Computing and Smart Communication 2019. Algorithms for
Intelligent Systems; Singapore: Springer; 2020.
[5] Kumar S., Trivedi M.C., Ranjan P. Evolution of software-defined network-
ing foundations for iot and 5G mobile networks. IGI USA: IGI Publisher;
2020. pp. 1–235. Available from https://www.igi-global.com/book/
evolution-software-defined-networking-foundations/244540
[6] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy efficient mul-
tichannel MAC protocol for high traffic applications in heterogeneous wireless
sensor networks’. Recent Advances in Electrical & Electronic Engineering.
2017, vol. 10(3), pp. 223–32.
[7] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[8] Kumar S., Cengiz K., Vimal S., Suresh A. ‘Energy efficient resource migra-
tion based load balance mechanism for high traffic applications IoT’. Wireless
Personal Communications. 2022, vol. 10(3), pp. 1–14.
[9] Haidar M., Kumar S. ‘Smart healthcare system for biomedical and health care
applications using aadhaar and blockchain’. 5th International Conference on
Information Systems and Computer Networks, ISCON 2021; GLA University,
Mathura, 22–23 October 2021; 2022. pp. 1–5.
[10] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic torus
network-on-chip architecture’. International Journal of Innovative
Technology and Exploring Engineering. 2019, vol. 8(6), pp. 2278–3075.
[11] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
[12] Singh P., Bansal A., Kamal A.E., Kumar S. ‘Road surface quality monitoring
using machine learning algorithm’ in Reddy A.N.R., Marla D., Favorskaya
M.N., Satapathy S.C. (eds.). Intelligent Manufacturing and Energy
Sustainability. Smart Innovation, Systems and Technologies. 265. Singapore:
Springer; 2022.
[13] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[14] Kumar S., Ranjan P., Tripathy M.R. ‘A utility maximization approach to MAC
layer channel access and forwarding’. Progress in Electromagnetics Research
Symposium; PIERS 2015 Prague, Czech Republic, 2015. pp. 2363–67.
[15] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘An NS3 implementa-
tion of physical layer based on 802.11 for utility maximization of WSN’.
Proceedings of the International Conference on Computational Intelligence
and Communication Networks; Jabalpur, India, IEEE, 2016. pp. 79–84.
[16] Sharma A., Awasthi Y., Kumar S. ‘The role of blockchain, AI and iot for smart
road traffic management system’. 2020 IEEE India Council International
Subsections Conference (INDISCON), 2020; Visakhapatnam, India, IEEE,
2020. pp. 289–96.
[17] Singh P., Bansal A., Kumar S. ‘Performance analysis of various information
platforms for recognizing the quality of indian roads’. 2020 10th International
Conference on Cloud Computing, Data Science & Engineering (Confluence),
2020; Noida, India, IEEE, 2020. pp. 63–76.
[18] Kumar S., Ranjan P., Singh P., Tripathy M.R. ‘Design and implementation of
fault tolerance technique for internet of things (iot)’. 2020 12th International
Conference on Computational Intelligence and Communication Networks
(CICN), 2020; Bhimtal, India, IEEE, 2020. pp. 154–59.
[19] Singh P., Bansal A., Kumar S. ‘Performance analysis of various information
platforms for recognizing the quality of indian roads’. 2020 10th International
Conference on Cloud Computing, Data Science & Engineering (Confluence),
2020; Noida, India, IEEE, 2020. pp. 63–76.
[20] Reghu S., Kumar S. ‘Development of robust infrastructure in networking to
survive a disaster’. 2019 4th International Conference on Information Systems
and Computer Networks (ISCON), 2019; Mathura, India, IEEE, 2019. pp.
250–55.
[21] Chauhan R., Kumar S. ‘Packet loss prediction using artificial intelligence uni-
fied with big data analytics, internet of things and cloud computing technolo-
gies’. IEEE; Mathura, India, 2021. pp. 1–6.
Chapter 3
An intelligent verification management
approach for efficient VLSI computing system
Konasagar Achyut 1, Swati K Kulkarni 2, Akshata A Raut 3,
Siba Kumar Panda 4, and Lakshmi Nair 5
Any engineering masterpiece conjoins several disciplines, whether computer science,
electrical and electronics engineering, or a mixture of both. Today, this drives the
industry to research, evolve and develop newer technologies, unfolding the many
stories behind engineering works. In a similar manner, this chapter unfolds the
prominent works involved in the verification of designs in the VLSI domain.
Considering machine learn-
ing (ML), neural networks and artificial intelligence (AI) concepts and applying
these to a wide range of verification approaches are quite interesting. The specific
kinds of Register Transfer Level (RTL) design require rigorous verification which
is targeted at any type of Field Programmable Gate Array (FPGA) or application-
specific integrated circuit (ASIC). The verification process should be closed only
after testing all possible scenarios, using intelligent verification methods. This
chapter presents a unique verification procedure for RTL development
methodologies using hardware description languages. With the help of the
SystemVerilog language, a reusable testbench is developed and used for
verification. The inputs injected into the testbench are randomized with
constraints so that the design produces accurate output. To unify the verification
language, there is a dedicated methodology commonly known as the Universal
Verification Methodology (UVM); through it, the chapter also takes readers
through coverage-based formal verification. For continuous functional
verification, an intelligent regression model is also developed with the help
of ML and scripting. With this, repeated injection of various test cases is possible in
1 J.B. Institute of Engineering & Technology, Hyderabad, India
2 Department of Applied Electronics, Gulburga University, Kalaburgi, Karnataka, India
3 Department of Electronics & Telecommunication Engineering, Fr. C. Rodrigues Institute of Technology, Navi Mumbai, India
4 Mobiveil Technologies India Pvt. Ltd, Chennai, India
5 Amrita School of Engineering, India
order to verify the functionality. Thus, by adopting the presented verification
environment and distinctive approach, one can affirm that the design is ready to be
deployed on the targeted semiconductor chips. As verification is an unignorable
procedure, it can also be used to classify the algorithms developed in ML for data
clustering, data encoding and their accurate analysis. More importantly, this chapter
allows us to understand an intelligent verification model for testing the design
through regression runs, with the corresponding set-up and the pass/fail analysis
steps. This structure may significantly reduce the simulation time for a VLSI
verification engineer.
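The loop this abstract describes (constrained-random stimulus, a reference model that checks the design's output, and a scripted regression with pass/fail analysis) is realized in SystemVerilog/UVM in the chapter itself. Purely for intuition, the Python sketch below mimics that flow with an assumed toy DUT, an 8-bit adder; it is not the chapter's actual environment:

```python
import random

def dut_adder(a, b):
    """Toy 8-bit adder standing in for the RTL design under test."""
    return (a + b) & 0xFF

def constrained_stimulus():
    """Randomized inputs under a constraint (operands kept in 0..255)."""
    return random.randint(0, 255), random.randint(0, 255)

def run_regression(n_tests=1000, seed=1):
    random.seed(seed)                   # reproducible regression run
    failures = []
    for _ in range(n_tests):
        a, b = constrained_stimulus()
        expected = (a + b) % 256        # reference model (scoreboard)
        if dut_adder(a, b) != expected:
            failures.append((a, b))     # logged for pass/fail analysis
    return failures

fails = run_regression()
print("PASS" if not fails else f"FAIL: {len(fails)} mismatches")
```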
3.1 Introduction
3.2 Literature study
Electronic circuits have a significant problem in terms of power usage. One strategy
to cope with this problem is to think about it early in the design phase so that it can
to FPGAs from 1992 to 2018. Few among the top 150 applications were categorized
as follows: digital control, communication interfaces, networking, computer secu-
rity, cryptographic approaches, ML, digital signal processing, image and video pro-
cessing, big data, computer algorithms and other applications. DV technologies are
impacted by advancements in electronic system design. The Internet-of-Things and
Cyber-Physical Systems paradigms presume devices that are immersed in physical
surroundings, have limited resources, and must provide security, privacy, dependa-
bility, performance and low-power features [7]. In addition, a preliminary method to
multidimensional verification using ML techniques was assessed. The constrained-
random stimulus has become widespread as a technique of stimulating a design’s
functioning and ensuring it completely satisfies expectations as ICs have become
increasingly complicated. In theory, random stimuli permit all conceivable combi-
nations to be exercised given enough time, but in practice a completely random technique will struggle to hit all possible combinations in a timely manner on very complicated designs. As a result, steering the DV environment to generate difficult-to-hit combinations is frequently required. The resulting constrained-random technique is effective, but it frequently requires substantial human guidance to effectively exercise the design in the DV context. This method produces better-than-random results in a highly automated manner, allowing DV to meet its complete design coverage goals in a shorter time frame and with minimum resources [8].
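As a rough illustration of the idea (a hedged sketch only; real testbenches express constraints in SystemVerilog, and every name below is invented for illustration), constrained randomization can be viewed as drawing random stimuli and keeping only those that satisfy the declared constraints:

import random

def constrained_random(n_samples, lo=0, hi=255):
    """Draw byte-sized stimuli, keeping only values that satisfy the
    'constraint block' (here: word-aligned and outside a reserved range).
    Rejection sampling is the simplest possible constraint solver."""
    samples = []
    while len(samples) < n_samples:
        value = random.randint(lo, hi)
        if value % 4 == 0 and not 16 <= value <= 31:  # the constraints
            samples.append(value)
    return samples

print(constrained_random(5))  # e.g. [44, 200, 12, 96, 148]

A production constraint solver inside a simulator is far more sophisticated, but the shape of the problem is the same: the legal stimulus space is narrowed before injection.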
In Reference [9], the goal of metric-driven verification is discussed, as well as its significance in
the current verification technique. It is proposed that the regression testing technique
be included in the verification environment itself, due to known restrictions. The
data gathered from the verification metrics can be utilized to fine-tune the simula-
tion’s testing operations. This method enables dynamic manipulation of the regres-
sion testing framework and may result in significant simulation time savings. The
test segments were introduced in the presented solution. They can be launched at
any moment in the simulation that is connected to the internal state of the DUT
(checkpoint).
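A minimal sketch of such a dynamically steered regression loop follows (purely illustrative; the test names, the failure-count heuristic and the pass/fail stub are assumptions, not the cited authors' code):

import random

# Historical failure counts steer the order of the next regression run.
failure_counts = {"test_reset": 3, "test_burst": 0, "test_overflow": 7}

def run_test(name):
    """Stand-in for launching one simulation segment from a checkpoint;
    a real flow would invoke the simulator and parse its log."""
    return random.random() > 0.2  # placeholder pass/fail outcome

def regression(tests):
    # Most failure-prone tests are scheduled first to expose bugs sooner.
    ordered = sorted(tests, key=lambda t: failure_counts.get(t, 0), reverse=True)
    results = {}
    for test in ordered:
        passed = run_test(test)
        results[test] = passed
        if not passed:
            failure_counts[test] = failure_counts.get(test, 0) + 1
    return results

print(regression(list(failure_counts)))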
Many new protocols are gaining traction and are now commonly utilized in the business. The AMBA Advanced Extensible Interface Protocol, devel-
oped by ARM processors, includes features that were not available in the preceding
Advanced High-Performance Protocol. The practical results of several of the proto-
col’s properties are shown in this document. It should be mentioned that verifying
such protocols is a time-consuming and arduous operation. A verification IP already
has the essential testbench-generating processes and may be readily combined with
other tools [10]. Verifying a design basically needs well-prepared strategies allow-
ing many vendors to rely on a strict verification plan for the whole system [11].
Many of the negative consequences of process, voltage and temperature (PVT) variations can be mitigated by bundled-data designs. They described an open-source CAD pipeline for synthesizing bundled-data (BD) designs from RTL specifications using commercial tools and Perl and Tcl scripts. Edge is a flow that was created with the help of an
open-source CPU called Plasma and Qualcomm Technologies, Inc. industrial design
[12]. There has been a lot of research into reconfigurable technology. The applica-
tion of FPGA as a developing reconfigurable platform for a controller solution was
the subject of this review. It is an embedded system that allows for a great deal of
freedom in the development of sophisticated ICs. The review has three goals: first, to look into the contributions of FPGA-based controllers rather than ASIC-based controllers; second, to demonstrate the influence and success criteria from past studies for improving controllers in terms of performance (response time, complexity, flexibility and design cost); and third, to determine the optimum way of combining these criteria to significantly increase controller performance. To
improve factor design, the majority of studies rely on four important criteria: effi-
ciency of tuning and optimization methods, robust implementation technology to
reduce cost with more flexible design, type of technology to reduce complexity and
minimize design area and the ability of the controller to be used with different order
systems which could greatly improve controller responses [13]. Today’s technology
has advanced to the point that a system may be put on a single chip, a concept known
as system on chip (SoC). It entails microcontrollers and a variety of peripheral devices, each of which has its own intellectual property (IP), called an IP core. Various pro-
tocols such as RS232, RS422 and UART are used to create serial communication
between these IP cores. They execute point-to-point communication, which necessitates extensive wiring and multiplexing of all bus connections to transmit data to IP cores. The benefit of the I2C protocol is that it needs little wiring, and its data transfer rate may be increased by using ultra-fast mode. The ultra-fast data transfer mode is a one-way data transfer mode. They use SystemVerilog and UVM in the SimVision tool to verify the design of an I2C protocol between a master and a slave in
this study [14]. The design of a 3-wire SPI protocol chip for ASICs and field-programmable gate arrays (FPGAs) is presented in this study. It is the first study to
implement the SPI protocol using VLSI and FPGA technologies for testing and veri-
fication. The functions of the SPI protocol have been successfully tested using an
oscilloscope. With only four pads, this study develops a superior VLSI architecture (including the system clock in the ASIC design) with low cost and minimal complexity
[15]. Hardware and software are designed and verified separately in a typical verification environment. With Moore's law no longer holding, processor frequencies have stopped growing, and time-to-market concerns are driving a new wave of technology that tightly integrates hardware and software to improve computing performance. Furthermore, data in domains
such as Data Analytics must be processed in real-time and with minimal latency. For
applications that demand real-time and low-latency data processing, Solarflare’s
Application On-load Engine is a platform that combines an FPGA processing engine
and low-latency server. Co-verification is not supported by the UVM, which instead
relies on the DPI for co-simulation. Software emulation is done concurrently with
hardware verification in co-verification when software is emulated separately on an
Instruction Set Simulator [16].
Due to the enormous number of IPs in the system that needs to communicate,
Network on Chip (NoC) has evolved as an interconnection option for modern digital
systems, particularly SoC. Various systems and routers have been implemented,
necessitating the creation of a reusable verification environment for testing single
routers as well as entire networks. They offered a reusable testing environment for
NoC platforms based on the UVM, which tests and certifies both routers and
networks in a way that can be easily modified to match different routers and net-
works. Performance factors such as injection rate, throughput and latency were also
evaluated by the environment [17]. Despite decades of effort, DV is still a costly and
time-consuming part of the electronic system development process. With the intro-
duction of SoC architectures, verification has evolved into a multi-platform, multi-
tool, multi-technology process that spans the whole life cycle. This paper presented
an instructional overview of current verification practice in the context of modern
SoC DV, current industry trends and significant research problems [18]. In contrast
to other Hardware Verification Languages (HVLs) such as Specman E, interactive
debug is not a native component of SystemVerilog simulations. This work proposes an interactive debug library for UVM written using the SystemVerilog Direct Programming Interface (SV-DPI). At its most basic level, this allows for high-level
interactive debugging during simulation execution. This library offers the following
features: 1) writing or reading a register using the UVM register abstraction layer; 2)
generating, randomising, initialising and starting a UVM sequence on any sequencer;
and 3) calling a user-specified SV function or task using the interactive debug
prompt. They also presented a debugging and regression approach that outlined best practices for speeding up debugging and reducing regression run time.
According to preliminary findings, employing an interactive debug library lowered
debug turnaround time and regression run time dramatically. The UVM debug
library is an open-source project that can be found on GitHub [19]. UVM stands for
Universal Verification Methodology, which is a standardized approach to testing IC
designs to achieve Coverage-Driven Verification (CDV). To track progress in DV, it combines automatic test generation, self-checking testbenches and coverage
measurements. The CDV follows a different pattern than typical directed-testing
methods. With the CDV, a testbench developer begins with a structured plan by
defining the verification goals. Coverage monitors, which have been introduced to
the simulation environment, are used to track progress. The non-exercised function-
ality can be detected this way. Furthermore, the additional scoreboards reveal unfa-
vourable DUT behaviour. Three recent ASIC and FPGA projects that have
successfully incorporated the new workflow have been developed: the CLICpix2
65 nm CMOS hybrid pixel readout ASIC design; C3PD 180 nm HV-CMOS active
sensor ASIC design and the CLICpix chip’s FPGA-based DAQ system [20]. The
methodology aids us in improving the SoC’s performance and lowering its costs.
The most difficult task in the entire SoC is verification. Chips with multi-million
gate designs have a higher level of complexity. In this study, the complexity issue in
SoC was handled using an advanced technique called advanced verification method-
ology. The verification task is estimated to account for almost 70% of the overall design effort.
The verification methodology utilized here is primarily aimed at maximising reus-
ability for various design IP configurations in the SoC. The SoC’s time to market is
shortened thanks to the development of an advanced reusable test bench for verification of the SoC [21]. The SystemVerilog UVM
aims to boost verification efficiency by allowing teams to exchange tests and test
benches across projects and divisions. One of the most crucial steps in the ASIC/
VLSI design process is verification which takes up a lot of time and effort in the
design flow cycle to ensure that the design is bug-free. As a result, a powerful and
reusable verification approach is in high demand. UVM is based on various method-
ologies such as verification methodology manual (VMM), open verification meth-
odology (OVM) and e reuse methodology (eRM). It can be used to verify designs written in a variety of languages, including Verilog, VHDL and SystemVerilog.
UVM allows for a reusable verification environment, which saves time during the
verification cycle [22]. Software models are used as a golden reference model for the algorithm after it has been finalized. They described a unified and modular modelling
framework for image signal processing algorithms that can be used for a variety of
applications, including ISP algorithm development, reference for hardware imple-
mentation, reference for firmware implementation and bit-true certification. The
functional verification framework of image signal processors utilising software ref-
erence models is based on the UVM. IP-XACT-based solutions for the automatic
production of functional verification environment files and model map files are also
well described [23]. Technical publications frequently make subjective or unfounded assertions regarding today's functional verification process, such as "verification consumes 70% of a project's overall work". Despite this, there are few trustworthy
industry studies that quantify the functional verification process in terms of verifica-
tion technology uptake, effort and efficacy. To overcome this knowledge gap, a
recent global, double-blind functional verification study covering all electronic
industry market categories was conducted. This is the largest independent functional
verification research ever undertaken to our knowledge [24]. FPGA technology has
progressed significantly. Some advancement may be in terms of previously avail-
able resources, while others may be in terms of overcoming traditional limitations or
improving efficiency. The transition in focus to soft core/embedded processors has
resulted in the transformation of FPGA from just hardware to a powerful System-
on-Chip. IP cores and design tools have also progressed, with software developers
providing improved support. The emergence of new FPGA capabilities on various
industrial applications is examined in three primary areas: digital real-time applica-
tions, sophisticated control approaches and electronic instrumentation, with a focus
on mechatronics, robotics and power electronics [25]. Because of its great efficiency
and low power consumption, FPGA has become popular in new big data architec-
tures and systems, allowing researchers to install enormous accelerators on a single
chip. In this work, they propose the software defined accelerator (SODA) technol-
ogy, which is a software-defined FPGA-based accelerator that aids in the re-
construction and re-organization of acceleration engines based on the needs of
various computation applications. This SODA is made up of several layers. Large
and complicated applications are decomposed into coarse-grained single-purpose
RTL code libraries that perform specialised tasks in out-of-order hardware [26].
According to Moore’s law, hundreds of gates are being added to the SoC architec-
ture as technology shrinks. Meanwhile, getting a product to market is growing more
complicated. Due to the rising complexity of SoC, verification has become more
difficult. Because current methods are unable to handle the efficiency of growing
design sizes, there is a need for innovation [27]. With the advent of multi-core com-
puters, parallelizing performance-critical applications of many types has become a
difficulty. Parallel computation aids in the division of big and difficult jobs into
smaller units, each of which is handled by a different processor. This aids in improv-
ing performance and completing difficult jobs. A customizable parallel device
known as a FPGA remains a minor area of study in parallel computing. The FPGA
was discussed in this research study as a technique for improving parallel applica-
tions by functioning as a co-processor rather than conventional CPUs. The parallel
architecture of an FPGA allows complex operations to be moved from the CPU and
into specially designed logic inside the FPGA, resulting in excellent performance at
a low cost. FPGAs are useful as development tools and are more cost-effective for
computing applications. As a result, for a surging number of higher volume applica-
tions, FPGAs have proven to be a better option than traditional CPUs [28]. The overall
development process becomes increasingly critical and demanding as the design
complexity of SoC verification grows. UVM is a standard solution to SoC design
complexity, even though there are still unsolved issues. The future of SoC verifica-
tion technique is discussed by specialists from business and academia in this publi-
cation [29]. Ultra-High Frequency (UHF) radio frequency identification (RFID) is a
fast-growing technology that uses RF signals to automatically identify things. RFID
is currently being used for a variety of applications, including vehicle tracking and
security, as well as asset and equipment tracking. Due to the module-reuse strategy
and low-power techniques used in the digital baseband, the RFID tag’s power con-
sumption is minimized. For verification, an intelligent and adaptable testbench
architecture based on UVM is developed. To improve efficiency and provide auton-
omous stimuli, complete algorithms for pathfinding (CAP) based on state transition
graph are employed [30].
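The flavour of such graph-based stimulus generation can be sketched as follows (a hypothetical example: the states and inputs below are invented stand-ins, not the actual tag-controller protocol of Reference [30]):

from collections import deque

# Hypothetical state-transition graph: edges are labelled with the input
# stimulus that moves the design from one state to the next.
graph = {
    "ready":        {"query": "arbitrate"},
    "arbitrate":    {"ack": "reply", "nak": "ready"},
    "reply":        {"req_rn": "acknowledged"},
    "acknowledged": {},
}

def stimulus_path(start, goal):
    """Breadth-first search for the shortest stimulus sequence that
    drives the DUT from 'start' to a hard-to-reach 'goal' state."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path  # ordered list of input stimuli to apply
        for stimulus, nxt in graph[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [stimulus]))

print(stimulus_path("ready", "acknowledged"))  # ['query', 'ack', 'req_rn']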
SoC verification is one of the popular trends in VLSI right now. Verification takes up most of the time. As a result, it was necessary to create a
usable and efficient verification environment. UVM is a SoC functional verification
methodology. UVM explains how to verify an IP and set up a productive verification
environment. To distinguish between UVM-based and traditional verification,
research was conducted [31]. The major reason for functional verification on hard-
ware is to discover issues in the designers’ design specifications and to evaluate the
design functioning. The design is then tweaked as needed to achieve the intended
functionality of the DUT. There are a variety of verification strategies that make the
process more user-friendly. SystemVerilog is a huge and complex programming
language. UVM is a powerful and versatile class library that has evolved over time
[32]. UVM includes powerful SystemVerilog capabilities. It includes a reference
verification mechanism in addition to a robust base class library. It was investigated
whether the addition of certain SystemVerilog features aided in the evolution of
UVM. The combination of UVM and SystemVerilog has been found to provide
users with a comprehensive toolbox that can be used to solve a variety of challenges
in the functional verification domain of hardware designs [33]. The Ninth Haifa Verification Conference, HVC 2013 [34], was organized by IBM Research to advance the state of the art and the profession in verification and testing. A forum of industry
experts, academic practitioners and researchers gathered here to share their work,
exchange ideas and explore future testing and verification prospects for hardware,
software and complex hybrid systems. One of the current difficulties in the
pre-silicon validation area is the lack of a single framework for all pre-silicon valida-
tion efforts, including security validation. As a result, multiple validation tool teams
are having trouble communicating. As each team works to improve a specific tool
for the project they serve, the project’s geography shifts to keep up with the product
development life cycle, tool updates and addressing specific tool adjustments, all of
which must be communicated among several teams [35]. UVM is a verification
approach for the IP and SoC. This is based on the SystemVerilog HVL and allows
verification components to be reused. In complex systems, SoC verification is diffi-
cult because it involves a lot of effort to validate numerous IP interfaces and on-chip inter-IP interactions. The reusability of an IP verification environment is a desired but
difficult approach [36]. The degree of complexity for component and system design
for functional verification increases as the design and sophistication of Multiprocessor
SoC designs grow to meet minimal power, speed, performance and functionalities.
The objective of verification is to guarantee that the designed system is operating in
accordance with the specified requirements. In this study, a flexible verification
environment is introduced that adapts to the associated NoC framework [37]. Many
obstacles were encountered in prior verification approaches, including reusability,
maintenance, bug identification, and so on. The introduction of SystemVerilog
UVM overcomes these obstacles. At the SoC level, where each IP is a black box and
regarded as a golden block, IP level verification is an important factor. In this
research, a set of reconfigurable image signal processing IPs is combined to meet the
needs of a variety of advanced video processing SoCs [38]. Verification activities
have begun to dominate the design effort as systems get more complicated. More
efficient verification languages and tools have been devised to cope with rising com-
plexity and enhance productivity at the ESL (Electronic System Level). UVM is a
methodology for RTL and ESL verification that includes a complementary library.
The System Verification Methodology (SVM), with extensive TLM support based on SystemC, was introduced in this chapter and compared with UVM in terms of related features [39]. The software drivers are
produced only once the hardware is available. This reliance can result in a prolonged
design cycle. It is not always easy to specify driver development and hardware/
software interface models. A technique to formalize hardware/software interface
requirements where concurrency is required is presented in this work, and this
approach is proven using a practical scenario and elaboration of use [40]. This chap-
ter provides an overview of programmable hardware trends in the semiconductor
industry. FPGAs have become synonymous with development and reprogrammable
computing. This chapter focuses on many new industry developments as well as
FPGA, with the goal of being informative. The reasons for the emergence of differ-
ent technologies, their benefits and drawbacks, and why FPGA is the dominating
choice when it comes to programmable logic are all explored here. There is also
debate about whether these technologies will continue to exist [41]. Building a testbench to verify a hardware design is essentially a software challenge. The basic
elements of a design, namely registers, linkages between them, and the computation
required to adjust their values, are all included in the RTL. The procedure for creat-
ing an RTL design is similar to programming. Testbenches live in the software world. Data structures and algorithms make up the testbench. Even though most of
their creation and operation is software, they are hardware-aware because it is their
role to regulate, adapt to and evaluate the hardware [42]. There is a desire to com-
pare FPGAs and ASICs in terms of space, power consumption and performance.
The scientific evidence characterizing the gap between FPGAs and ASICs are pre-
sented in this work. When compared to an identical ASIC, it was discovered that the
FPGA takes up more space, performs slower, and consumes more power. It was
eventually verified that using hard multipliers and dedicated memory blocks can
save a significant amount of space and power while only having a modest influence
on the latency between FPGAs and ASICs [43]. The industry’s need for complicated
and efficient high-performance controllers prompted rapid development of VLSI
technology and EDA methodologies. EDA tools are used to generate, simulate and
test a design without committing to hardware, allowing complicated systems and
concepts to be quickly evaluated. Because of the increasing difficulty of control
algorithms and chip density, efficient design methodologies are required [44]. DV
has surged in popularity in the electronics industry. It began with the shift in verifi-
cation requirements from ICs to SoCs and bigger systems, which was amplified by
the rise in embedded processor utilization. Then it went on to expand the quantity
and range of verification approaches and languages available. Finally, books con-
taining instructions and advice on verification methodologies, languages, and tools,
as well as how to combine all of them into a unified verification approach and pro-
cess, have become more widely available [45]. As ASIC designs get more complicated, the complexity of the ecosystems in which they are implemented rises exponentially. While an ASIC design's HDL may be subdivided into intelligible, small components whose behaviour can be grasped in a reasonable amount of time, no parallel robust procedure has been in place for verification environments. Any verification environment established or generated for these design sub-blocks, whether written in HDL or any of the different verification or scripting languages now accessible, remains extremely complex. The goal of the team was to combine C++, Tcl and Perl into a unified, highly intelligent and usable ASIC verification environment [46–67].
The forthcoming industrial revolution [64] rests on man-made theories yet to be proven in actual practical scenarios, though the intention of the technology is to reduce manpower and make use of computers to the greatest possible extent. When we speak of the technology called the "computer", we tend to mean the development of both hardware and software; here we are concerned with a field of computing in which the core of the operations is embedded on a single fabric, with ever lighter and thinner technology nodes being developed by huge multinational companies. This in turn lets an end user reduce manpower and depend on software to develop, demonstrate, simulate and calibrate the actual hardware that must be carried from the initial idea to the end market for consumers. ML is one such field in which computers are used to their full potential, namely to study computer algorithms with the help of huge numbers of data nodes. A data node is a term referring to a bunch of information pointing to one subject, and these subjects may differ from point to point. To access these nodes, or to identify the information within a very small timescale, we need fast algorithms, fast enough even to be usable for judging a qubit (quantum bit) in quantum computing. Given this phenomenal facility of ML, this chapter presents a strategy to implement the randomization technique with the help of the xorshift128 algorithm as part of the ML flow. Internal to this, we have to keenly observe the network topology of the data redundancy and node accessibility, and this stage is well known as a neural network.
Figure 3.2 DPI linkage with actual scenario (PRNG and floating-point conversion module)
From Figure 3.3 we can see the internal structure of the xorshift algorithm: four input seeds of range n are shifted multiple times against the other seed bits and exported to form a new set of output seeds for random bit generation.
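For clarity, the seed-shifting scheme can be written out as a short stand-alone sketch (the chapter's own implementation lives in SystemVerilog behind the DPI; this Python version of Marsaglia's xorshift128 step is only illustrative):

def xorshift128_step(x, y, z, w):
    """One step of Marsaglia's xorshift128 PRNG: four 32-bit seed words
    go in, the rotated seed set plus one pseudo-random word comes out."""
    mask = 0xFFFFFFFF                   # keep every result within 32 bits
    t = (x ^ (x << 11)) & mask          # mix the oldest seed with itself
    t ^= t >> 8
    w_new = (w ^ (w >> 19) ^ t) & mask  # fold into the newest seed
    return y, z, w, w_new               # seeds rotate; w_new is the output

seeds = (123456789, 362436069, 521288629, 88675123)
for _ in range(3):
    seeds = xorshift128_step(*seeds)
    print(seeds[-1])                    # successive pseudo-random words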
Figure 3.4 Block view of DPI and different languages on either side
This layer handles any language which needs to talk to our SystemVerilog code. Basically, this layer is also responsible for providing a clean response to the client from time to time. Here, the message received from the end user is decoded, and the information to be returned to the client (end user) is generated. After one successful transaction with the client, the server is responsible for closing the connection, indicating that the information has been successfully exchanged across the connection layer and back. The DPI can be used directly from Python by calling the "pybind11" package, but this has the disadvantage of not cohesively performing actions with the communication layers between the client and server, making the code and its functionality hard to debug.
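As a hedged sketch of one such client-server transaction (a plain localhost TCP socket stands in for the SystemVerilog-to-Python communication layer; the port and message format are invented for illustration):

import socket

HOST, PORT = "127.0.0.1", 5000

def serve_one_transaction():
    """Accept one client, decode its message, reply, then close the
    connection to signal that the exchange completed successfully."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((HOST, PORT))
        server.listen(1)
        conn, _ = server.accept()
        with conn:
            request = conn.recv(1024).decode()        # message from the end user
            conn.sendall(f"ack:{request}".encode())   # response to the client
        # leaving the 'with' blocks closes the connection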
Bridging high-level abstraction languages and low-level description languages needs a strong and precise interpreter, in computer science terms, as mutual compilation feedback is necessary to produce the exact output from the interfacing components used.
In this regard, Tcl offers several advantages:
1. It is possible to "edit, execute, stop and restart" Tcl scripts without recompiling or exiting the simulator.
2. Because of the nature of Tcl, it is far more expressive and prolific than SystemVerilog.
3. Support for building domain-specific languages (DSLs) from scratch.
4. Flexible creation of any data structures.
5. Automation and enhancements for simple static analysis.
6. The most straightforward method for interacting with external simulator EDA/CAD tools.
7. An easy way to communicate with something written in another language.
8. There is no need for anything additional, because Tcl is already included in EDA tools.
The debug prompt helps experiment with different test scenarios in real time. If
the same set of debug instructions is used repeatedly, loading a command file saves
time on typing. The commands and their descriptions are given in Table 3.1.
On the other hand, the debug prompt lacks any built-in programming fea-
tures and can only execute one command after another consecutively. To put it
another way, the debug prompt is not a Turing complete language in and of itself.
Fortunately, the simulator already includes a full-featured programming environ-
ment in the shape of the Tcl prompt.
The UVM debug library produces a Tcl wrapper function that calls the debug
prompt, passes in and executes one debug command, and returns to the Tcl prompt
with the debug command’s return value.
Please remember that the Tcl script integration is currently in its early stages.
Everything is executed in a single thread by the Tcl prompt.
It would be interesting to observe how well the UVM debug library performs
when several threads call numerous debug instructions simultaneously.
Table 3.1 Debug prompt commands

help [command]: displays the help message. With no argument, it lists all the available commands; otherwise it displays the help message of the specified command.
continue: exits the debug prompt and continues the simulation.
pause: pauses the simulation and switches to the simulator's Tcl prompt.
run <runtime>: runs the simulator for the specified time and goes back to the debug prompt.
history [list] | clear | save <file>: lists all the previous commands inputted by the user; clears the history; saves the history to a command file.
repeat #: repeats the specified line in the command history.
read <file>: reads a command file.
save_checkpoint [-path <path>] <snapshot name>: saves a checkpoint snapshot.
We address the new challenges and directions that big data and AI pose for education research, policymaking and industry. Big data and AI applications in education have made tremendous strides in recent years, exemplifying a new trend in cutting-edge educational research.
3.6 Conclusion
References
[1] Nasser Y., Lorandel J., Prevotet J.-C., Helard M. ‘RTL to transistor level pow-
er modeling and estimation techniques for FPGA and ASIC: a survey’. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems.
2021, vol. 40(3), pp. 479–93.
[2] Anwar M.W., Qamar S., Azam F., Butt W.H., Rashid M. ‘Bridging the gap between design and verification of embedded systems in model based system engineering: a meta-model for modeling universal verification methodology (UVM) test benches’. ICCMS; 2020. pp. 26–8.
[3] Zhou S., Geng S., Peng X., et al. ‘The design of UVM verification platform
based on data comparison’. Proceedings of the 4th International Conference
[18] Chen W., Ray S., Bhadra J., Abadir M., Wang L.-C. ‘Challenges and trends in
modern SOC design verification’. IEEE Design & Test. 2017, vol. 34(5), pp.
7–22.
[19] Chan H. ‘UVM interactive debug library: shortening the debug turnaround
time’. DVCON 2017; San Jose; 2017.
[20] Fiergolski A., on behalf of the CLICdp collaboration. ‘Simulation environ-
ment based on the universal verification methodology’. Topical Workshop on
Electronics for Particle Physic; Karlsruhe, Germany; 2016.
[21] Renuka G., Ushashree V., Chandrasekhar Reddy P. ‘Functional verification of
complex SOC by advanced verification methodology’. International Journal
of VLSI System Design and Communication Systems. 2016, vol. 4(12), pp.
1308–12.
[22] Pankaj S.V., Kureshi D.A.K. ‘UVM architecture for verification’. International
Journal of Electronics and Communication Engineering & Technology. 2016,
vol. 7(3), pp. 29–37.
[23] Jain A., Gupta R. ‘Unified and modular modeling and functional verification framework of real-time image signal processors’. VLSI Design. 2016, vol. 2016, pp. 1–14.
[24] Foster H.D. ‘Trends in functional verification: a 2014 industry study’. 52nd
ACM/EDAC/IEEE Design Automation Conference (DAC); San Francisco,
CA, IEEE, 2015. pp. 1–6.
[25] Rodríguez-Andina J.J., Valdés-Peña M.D., Moure M.J. ‘Advanced features
and industrial applications of FPGAs—a review’. IEEE Transactions on
Industrial Informatics. 2015, vol. 11(4), pp. 853–64.
[26] Wang C., Li X., Zhou X. ‘SODA: software defined FPGA based accelerators
for big data’. Design, Automation & Test in Europe Conference & Exhibition
(DATE); IEEE, 2015. pp. 884–87.
[27] Oddone R., Chen L. ‘Challenges and novel solutions for soc verification’.
ECS Transactions. 2014, vol. 60(1), pp. 1191–5.
[28] Opeyemi M.A., Justice Emuoyefarche E.O. ‘Field programmable gate array
(FPGA): a tool for improving parallel computations’. International Journal
of Scientific & Engineering Research. 2014, vol. 5(2), pp. 2229–5518.
[29] Drechsler R. ‘Panel: future soc verification methodology: UVM evolution or
revolution?’. Design, Automation & Test in Europe Conference & Exhibition
(DATE); IEEE, 2014. pp. 1–5.
[30] Li Q., Xie Z., Su J., Wang X. ‘UVM-based intelligent verification method
for UHF RFID tag’. IEEE International Conference on Electron Devices and
Solid-State Circuits; Chengdu, China, 18–20 Jun; 2014. pp. 1–2.
[31] Salah K. ‘A UVM-based smart functional verification platform: concepts,
pros, cons, and opportunities’. 9th International Design and Test Symposium;
Algeries, Algeria, 16–18 Dec; 2014.
[32] Raghuvanshi S., Singh V. ‘Review on universal verification methodol-
ogy (UVM) concepts for functional verification’. International Journal
of Electrical, Electronics and Data Communication. 2014, vol. 2(3), pp.
101–07.
[33] Bromley J. ‘If system verilog is so good, why do we need the UVM? sharing
responsibilities between libraries and the core language’. Proceedings of the
2013 Forum on specification and Design Languages (FDL); Paris, France,
24–26 Sep; 2013.
[34] Bertacco V., Legay A. (eds.) ‘Hardware and software: verification and testing’. 9th International Haifa Verification Conference; Haifa, Israel, Springer Cham, 2013. pp. 5–7.
[35] Kannavara R. ‘Towards a unified framework for pre-silicon validation’.
Proceedings of the 4th International Conference on Information, Intelligence,
Systems and Applications; Piraeus, Greece, IEEE, 2013. pp. 321–26.
[36] Zhaohui H., Pierres A., Shiqing H, et al. ‘Practical and efficient SOC veri-
fication flow by reusing IP testcase and testbench’. 2012 International SoC
Design Conference (ISOCC); Jeju, Korea (South), IEEE, 2012. pp. 175–78.
[37] Lim Z.N., Loh S.H., Lee S.W., Yap V.V., Ng M.S., Tang C.M. ‘A reconfigur-
able and scalable verification environment for noc design’. Conference on
New Media Studies (CoNMedia); Tangerang, Indonesia, IEEE, 2013. pp. 1–4.
[38] Jain A., Bonanno G., Gupta D.H., Goyal A. ‘Generic system verilog uni-
versal verification methodology based reusable verification environment for
efficient verification of image signal processing IPS/SOCS’. International
Journal of VLSI Design & Communication Systems. 2012, vol. 3(6), pp.
13–25.
[39] Oliveira M.F.S., Kuznik C., Mueller W., et al. ‘The system verifica-
tion methodology for advanced TLM verification’. Proceedings of the
Eighth IEEE/ACM/IFIP International Conference on Hardware/Software
Codesign and System Synthesis (CODES+ISSS); Tampere Finland, 7–12
Oct; 2012.
[40] Li J., Xie F., Ball T., Levin V., McGarvey C. ‘Formalizing hardware/soft-
ware interface specifications’. 26th IEEE/ACM International Conference on
Automated Software Engineering (ASE 2011); Lawrence, KS, IEEE, 2011.
pp. 143–52.
[41] Ahmed S., Sassatelli G., Torres L., Rouge L. ‘Survey of new trends in industry
for programmable hardware: FPGAs, MPPAs, MPSoCs, structured ASICs,
eFPGAs and new wave of innovation in FPGAs’. Proceedings of the 20th
International Conference on Field Programmable Logic and Applications;
Milano, Italy, 2 Sep; 2010.
[42] Glasser M. Open Verification Methodology Cookbook. Springer-Verlag New York; 2009. ISBN: 978-1-4899-8513-2.
[43] Kuon I., Rose J. ‘Measuring the gap between FPGAs and ASICs’. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems.
2007, vol. 26(2), pp. 203–15.
[44] Monmasson E., Cirstea M.N. ‘FPGA design methodology for industrial con-
trol systems—a review’. IEEE Transactions on Industrial Electronics. 2007,
vol. 54(4), pp. 1824–42.
[45] Singh L., Drucker L., Khan N. Advanced Verification Techniques: A SystemC Based Approach for Successful Tapeout. Kluwer Academic Publishers; 2004.
[46] McKinney M.D. ‘Integrating Perl, Tcl and C++ into simulation-based ASIC
verification environments’. Sixth IEEE International High-Level Design
Validation and Test Workshop; Monterey, CA, 9 Nov; 2001. pp. 19–24.
[47] Haidar M., Kumar S. ‘Smart healthcare system for biomedical and health care applications using aadhaar and blockchain’. 5th International Conference on Information Systems and Computer Networks (ISCON); GLA Mathura, 22–23 Oct 2021; 2022. pp. 1–5.
[48] Kumar S., Cengiz K., Vimal S., Suresh A., et al. ‘Energy efficient resource migration based load balance mechanism for high traffic applications IoT’. Wireless Personal Communications. 2021, vol. 10(3), pp. 1–14.
[49] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy efficient mul-
tichannel MAC protocol for high traffic applications in heterogeneous wireless
sensor networks’. Recent Advances in Electrical & Electronic Engineering.
2017, vol. 10(3), pp. 223–32.
[50] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[51] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic torus network-on-chip architecture’. International Journal of Innovative Technology and Exploring Engineering (IJITEE). 2019, vol. 8(6), pp. 1672–76.
[52] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
[53] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[54] Singh P., Bansal A., Kamal A.E., Kumar S., Marla D. ‘Road surface quality monitoring using machine learning algorithm’ in Reddy A.N.R., Favorskaya M.N., Satapathy S.C. (eds.). Intelligent Manufacturing and Energy Sustainability. Smart Innovation, Systems and Technologies. 265. Singapore: Springer; 2022.
[55] Sharma A., Awasthi Y., Kumar S. ‘The role of blockchain, AI and iot for
smart road traffic management system’. IEEE India Council International
Subsections Conference, INDISCON; Visakhapatnam, India, IEEE, 2020. pp.
289–96.
[56] Kumar S., Ranjan P., Singh P., Tripathy M.R. ‘Design and implementation
of fault tolerance technique for internet of things (iot)’. 12th International
Conference on Computational Intelligence and Communication Networks
(CICN); Bhimtal, India, IEEE, 2020. pp. 154–59.
[57] Singh P., Bansal A., Kumar S. ‘Performance analysis of various information
platforms for recognizing the quality of indian roads’. 10th International
Conference on Cloud Computing, Data Science and Engineering; Noida,
India, IEEE, 2020. pp. 63–76.
Identifying the most accurate methods for forecasting students’ academic achieve-
ment is the focus of this research. Globally, all educational institutions are con-
cerned about student attrition. The goal of all educational institutions is to increase
the student’s retention and graduation rates and this is only possible if at-risk stu-
dents are identified early. Due to inherent classifier constraints and the incorporation of too few student features, the most commonly used prediction models are inefficient and inaccurate. Different data mining algorithms like classification, clustering, regres-
sion, and association rule mining are used to uncover hidden patterns and relevant
information in student performance big datasets in academics. Naïve Bayes, random
forest, decision tree, multilayer perceptron (MLP), decision table (DT), JRip, and
logistic regression (LR) are some of the data mining techniques that can be applied.
A student's academic performance big dataset comprises many features, not all of which are relevant or play a significant role in the mining process. So, features with
a variance close to 0 are removed from the student’s academic performance big data-
set because they have no impact on the mining process. To determine the influence
of various attributes on the class level, various feature selection (FS) techniques
such as the correlation attribute evaluator (CAE), information gain attribute evalu-
ator (IGAE), and gain ratio attribute evaluator (GRAE) are utilized. In this study,
authors have investigated the performance of various data mining algorithms on
the big dataset, as well as the effectiveness of various FS techniques. In conclu-
sion, each classification algorithm that is built with some FS methods improves the
performance of the classification algorithms in their overall predictive performance.
4.1 Introduction
Today, universities are in a very competitive and complicated situation. Progress in technology and IT equipment has made it easy to store large amounts of data in educational databases, but if these data are not analyzed they remain just that: a lot of data. With suitable tools, methods, and techniques, we can examine these data and look for patterns and hidden information. Data mining is a way
to look for patterns and relationships in data that can help people make better deci-
sions. People from different fields work together in this area. It combines techniques
from statistics and artificial intelligence with those from neural networks, database
systems, and machine learning.
Data mining offers a variety of methods for analyzing data. A human’s ability
to assess and extract the most important information from student databases is cur-
rently limited by the sheer volume of data available. It is the extraction of nontrivial,
unknown, and potentially relevant information from a huge database through the
process of knowledge discovery. A user’s needs are taken into consideration when
data mining is employed in knowledge discovery. A "pattern" here is defined as a linguistic term describing a subset of the data. Data mining has a wide
range of applications. Financial institutions employ it to uncover hidden correla-
tions among various financial indicators, which they then use to spot potentially
fraudulent activity. It is able to identify both fraudulent and legitimate behavior by
analyzing previous data and turning it into valuable and accurate information [1].
By using data mining in healthcare, it is possible to identify correlations between
disease and treatment success. It also aids health-care insurance providers in detect-
ing fraud. This software is often used to find money laundering, drug trafficking,
and other crimes. Customer demographic features and predicted behavior are two
frequent uses of data mining in the telecommunications industry to boost profit-
ability as well as reduce customer turnover. Based on the findings of data mining,
marketing campaigns and pricing strategies can be developed. Data mining tech-
niques are used in sales and marketing to uncover previously unnoticed patterns in
past customer purchases. In market basket analysis, the results of data mining can be
utilized to determine client purchasing habits and behavior patterns. Future trends
and customer purchasing habits can be predicted using these data. When it comes to
predicting how many customers will leave, the banking industry relies a lot on data
mining. The increasing use of technology in educational systems has resulted in an
enormous amount of data being made available to educators. A substantial amount
of important information may be gleaned via educational data mining (EDM), which
helps to provide a more accurate picture of learners and their learning processes.
This software analyzes educational data and helps students overcome educational
problems using data mining techniques. Similarly, to other data mining techniques’
extraction processes, EDM extracts information from educational data that are
engaging, interpretable, valuable, and innovative for the learner or teacher. EDM,
on the other hand, is primarily geared toward the development of methods that make
use of data that are unique to educational systems. After that, such strategies can be used to remove certain attributes from a database. FS is an impor-
tant approach in the success of the data mining process since it allows us to select
the helpful or relevant qualities in the dataset that are being used for the analysis [4].
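As a hedged sketch of this two-step selection using scikit-learn (drop near-zero-variance features, then rank the remainder by information gain, mirroring IGAE with a Ranker-style search); the DataFrame df, its numeric features and the categorical target column are assumptions for illustration:

import pandas as pd
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif

def select_features(df: pd.DataFrame, target: str, var_eps: float = 1e-3):
    X = df.drop(columns=[target])
    y = df[target]
    # Step 1: remove features whose variance is close to 0.
    vt = VarianceThreshold(threshold=var_eps)
    X_kept = pd.DataFrame(vt.fit_transform(X),
                          columns=X.columns[vt.get_support()])
    # Step 2: rank the remaining features by information gain
    # (mutual information with the class label), highest first.
    scores = mutual_info_classif(X_kept, y)
    return sorted(zip(X_kept.columns, scores),
                  key=lambda pair: pair[1], reverse=True)

# Usage (hypothetical): ranking = select_features(df, target="grade")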
In this study, a comparison of alternative FS approaches is made, as well as an
examination of the impact of these strategies on classification algorithms. Any country's development depends on its educational system, so it must be taken seriously from the outset; each country has its own systems and evaluators. Today's education includes
online education, Massive Open Online Courses (MOOC) courses, intelligent tutori-
als, web-based education, project-based learning, seminars, and workshops [5]. But
none of these systems work unless they are accurately evaluated. So, to make any
school system successful, a clear evaluation mechanism is needed. Every educa-
tional institution creates a lot of data about each registered student, and if that data
are not correctly analyzed, all resources are lost. It includes student acceptance, fam-
ily info, and academic results. Every educational institution assesses their students
[6]. In modern education, many tools are utilized to measure a student’s academic
achievement. Data mining is one of the greatest sophisticated computer tools used
to monitor student progress. Currently, data mining is applied in almost every field
where data are used. Data mining has several key applications in retail, marketing,
banking, telecommunications, hospitality, hospital, and production management.
This entire company uses data mining to increase sales and future growth. With the
help of a built-in algorithm, it analyzes any organization’s historical data to uncover
hidden information [7]. So, we may say that data mining techniques can extract hid-
den information from any organization’s data warehouse.
4.1.1 EDM
The first international EDM research conference took place in Montreal in 2008. Founded in 2011, the International Educational Data Mining Society
is dedicated to advancing EDM. Since then, disciplines such as data mining, machine
learning, pattern recognition, psychometrics, artificial intelligence, information vis-
ualization, and computational modeling have increased in popularity. EDM’s ulti-
mate goal is to improve education and to make decision-making processes more
transparent [8]. Data that are made during teaching and learning can be used to find
new information, correlations, and patterns in very large datasets.
Data mining has a wide range of applications. Customer behavior data may be
used to improve customer loyalty, and it can also be used to uncover hidden correla-
tions between various financial indicators in order to spot potentially illegal activity.
It collects and analyzes past data in order to detect both fraudulent and nonfraudulent
behavior. Data mining in healthcare makes discovering the connections between dis-
eases and therapies easier [9]. Fraud detection is made easier for health-care insurers
as well. It is used by law enforcement authorities to investigate money laundering,
narcotics trafficking, and other criminal activities. Customer demographic features
and predicted behavior are two frequent uses of data mining in the telecommuni-
cations industry to boost profitability as well as reduce customer turnover. Based
on the findings of data mining, marketing campaigns and pricing strategies can be
developed. Using data mining tools, companies can uncover previously unnoticed
trends in customer purchase behavior [10]. Data mining results are used to discover
customers’ purchasing habits by analyzing market baskets and analyzing the combi-
nations of products they purchase together. Additionally, it is used to make educated guesses about upcoming fashion and consumer trends. A significant amount of data mining is used in the banking business when projecting how many clients will quit.
The amount of data produced and saved in many educational institutions has grown
to the point that it is impossible to analyze it manually any longer. A new subject,
EDM, has emerged as a result of the analysis of educational data.
1. Data mining is used to discover patterns, forecast trends, and find relations, thus
enabling the companies to decode the true potential of their precious data. The
game-changing insights derived from data mining process enable the organiza-
tions to make informed business decisions, devise effective strategies, and gain
a competitive advantage in the industry.
2. Organizations can learn in detail about the factors affecting and impacting mar-
ket demographics. Alongside, they can uncover new business prospects, tap
untouched markets that might extend beyond geographical boundaries, and
expand the horizons.
3. They can understand the consumer better, knowing their preferences, age, spending habits, social media behavior, most-preferred products, etc., and offer them customized services accordingly. This consequently leads to increased loyalty and a lower churn rate.
Through data mining, companies can learn more about their clients and execute more efficient schemes for a number of business functions, leveraging resources in a more optimized and perceptive manner. This helps companies get closer to their goals and achieve better results.
It involves efficient data compilation, warehousing, and computer processing.
To segment the data and to estimate the possibility of future events, it utilizes com-
posite mathematical algorithms. It is also termed as knowledge discovery in data-
bases (KDD). The main features of data mining are automatic pattern prediction based on trends and behavior analysis. The focus is on huge datasets and databases, for discovering and documenting previously unknown groups of facts based on clustering.
Data mining algorithms are step-by-step definitions of the processes used to bring meaning to a set of information. Some are quite simple to comprehend and execute, while others are very complex and need significant learning and effort to implement. The algorithms can take several forms, depending on the data to be processed and the outcome to be attained. In this section, a number of data mining algorithms are defined along with their types.
Data mining in multimedia: In this process, data are extracted from several types of multimedia sources, such as audio, text and hypertext, video and still images, and the data are turned into a numerical representation that may be presented in a variety of forms, such as HTML.
This method can be used for a variety of tasks, such as grouping and classification,
performing similarity checks, and identifying relationships.
Data mining in distributed architecture: It is becoming increasingly popular
because it allows for the mining of large amounts of information that is kept in mul-
tiple locations within a firm or across multiple companies. In order to collect data
from many sources and deliver appropriate insights and reports based on it, highly
sophisticated algorithms are used.
Data mining in education: The application of data mining in the realm of edu-
cation has resulted in the development of what is known as EDM. It aids in the
prediction of students’ learning tendencies by providing them with data. It also aids
in the analysis of learning outcomes and the formulation of decisions based on the
information gained. According to the education data mining community, it is a new
discipline that focuses on developing techniques to investigate unique data types
from the educational environment and using them to better understand students and
learn from them.
The use of data mining technologies in education is becoming increasingly popular, as seen in the tremendous increase in interest. Developing approaches for discovering information from data obtained from the educational environment is part of this growing topic, which is now under investigation. This
technique differs from standard data mining in that it explicitly employs many degrees of meaningful hierarchy in educational data. EDM is concerned with the collecting, archiving, and anal-
ysis of data that are connected to student learning and evaluation in a formal setting.
It is common for researchers conducting EDM studies to use approaches that have
been pulled from a variety of sources, such as psychometrics, machine learning, data
mining, educational statistics, information visualization, and computer modeling.
An interactive loop is created by the implementation of data mining in educational
systems, which includes the phases of formation, testing, and improvement. The
knowledge that has been discovered should be entered into the system’s cycle, and
the guide should support and boost overall learning. The system is used to transform
data into knowledge and to filter the knowledge obtained through mining in order to
aid in decision-making processes.
To complete this study on predicting students' academic performance using different data mining techniques, we reviewed a range of research and review papers. During the literature study on this topic, we found that most researchers in the EDM community are trying to develop a system that effectively and efficiently predicts students' academic performance. Many data mining algorithms have been developed in the past, such as classification algorithms that effectively predict the
class level of any datapoint in the data. Most of the researchers in the literature considered different aspects of student attributes while collecting data for the dataset. There are different categories of student attributes, such as academic, personal, family, social and institutional, and all these attributes contribute to the development of predictive models for academic performance prediction [16]. EDM
is becoming more popular nowadays as a result of the proliferation of electronic
resources, the use of online educational tools, and the Internet. Numerous studies are
being conducted to improve educational materials and technologies. By leveraging
EDM techniques to forecast or analyze student performance and to assist students
who are receiving less than satisfactory grades, an artificial neural network classifier
model was created that can benefit both students and teachers in discovering knowl-
edge from the massive amount of data available in the educational sector.
Sentiment analysis was used to better understand students’ learning styles and
study plans in order to improve instruction [17]. A model built using data mining approaches on students' behavioral information achieved up to 22.1% higher accuracy than one without behavioral features. Additionally, it was discovered that applying ensemble methods resulted in a further improvement of up to 25.8% in accuracy. The academic dataset contained 473 cases, and the Bayesian classifier
achieved a 70% accuracy rate. To characterize student dropouts, the Naïve Bayes’
classifier, artificial neural network (ANN), KNN, and J48 were employed. KNN
and decision trees with 10-fold cross-validation achieved 87% and 79.7% accuracy,
respectively. By building a hyperplane, support-vector machines (SVMs) distin-
guish classes in high-dimensional space [18]. On health-related data, data mining
techniques such as logistic regression (LR) and multi-classifiers achieved strong
results. Decision trees use optimal decision paths to identify or reach a specific
aim in the real world. Combining this principle with a fast training process explains
their extensive use in EDM. On a dataset of 15,150 instances, a decision tree was
utilized to predict whether courses pass or fail; 85.92% accuracy was achieved [19].
Data mining also offers other important techniques that help to increase the
accuracy of any classification algorithm. Feature selection (FS) techniques, such
as the filter and wrapper methods, help to remove attributes of a dataset that
contribute little to predicting the class. In the literature, the most commonly
used FS techniques are CAE (CorrelationAttributeEval), IGAE
(InformationGainAttributeEval), and GRAE (GainRatioAttributeEval), typically
combined with the Ranker search method [20].
Our literature review focused on two main questions: first, which student
attributes most affect academic performance; and second, which classification
algorithms researchers most often use to predict students' academic results. A
final and equally important point was to assess the role of FS in predicting
academic results, and in most of the research we found that FS improves the
performance of the prediction model. According to Hung et al., classification algorithms
like SVM, RF, and neural networks can all be used to identify at-risk students. The
new method outperformed other methods in both accuracy and sensitivity in tests
on two different datasets: one from a school and the other from a university [21]. The
K-means algorithm was used to study the challenge of detecting the amount of stu-
dent participation in a similar way. Furthermore, using the Apriori association
rule method, the authors were able to construct a set of rules associated with
student involvement and academic success. Researchers discovered that students'
levels of involvement and academic performance in an e-learning environment have
a favorable relationship, according to the results of their experiments.
According to Helal et al., students’ socio-demographic characteristics as well as
their university admission basis and attendance type can be taken into account when pre-
dicting their performance. Rule- and tree-based algorithms were shown to be the most
interpretable by the authors’ experiments, making them the most suitable for educational
settings [22]. Zupanc and Bosnic added semantic coherence and consistency character-
istics to an existing automated essay evaluation system. The authors demonstrated that
their proposed approach gave superior semantic feedback to the writer through experi-
mentation. Aside from that, the accuracy of its grading was superior to that of other
cutting-edge automated essay evaluation systems. Two layers of machine learning were
proposed by Xu et al. to track and predict student performance in degree programs.
According to their simulations, the proposed strategy outperformed benchmark strate-
gies [23]. In this paper, the author compared the accuracy, F-measure, and true positive
rate of 11 machine learning models. For the measures listed above, they found that the
decision tree method outperformed alternative classifiers. The number of scholars work-
ing in EDM has skyrocketed in the last several years.
It is possible to employ EDM in a wide range of applications, including user group-
ing and student performance prediction [24]. In order to better comprehend students,
teachers, and their many attributes, academics are developing new statistical methodolo-
gies and machine learning algorithms. Data analysis to improve student and teacher per-
formance is nothing new. As a result of recent developments in educational institutions,
such as the increase in computing facilities and the introduction of web-based learning,
researchers are developing new ways for analyzing educational data. The information
can be obtained via admissions, registration, library management, syllabus, and course
management systems. Student performance can be forecast from such data, and one
study employs four machine learning methods to do so: the decision tree, the
random forest classifier, the gradient boosting classifier, and the extreme
gradient boosting classifier. Researchers use a variety of classification
techniques to forecast student performance [25].
The author of this paper also used decision tree techniques, neural networks, and
linear discriminant analysis to identify students who were at risk of failing the course
and divide them into three categories: low-, intermediate-, and high-risk students [26].
Using classification techniques, it was possible to develop a profile of students who
were most likely to fail or succeed in their studies. In order to forecast students’ achieve-
ment, the author looked at socio-demographic and educational data from the pupils.
Researchers used the Chi-square Automatic Interaction Detector decision tree technique
to construct a predictive model that can identify students who are struggling to acquire
new material [27]. Pandey used Naive Bayes (NB) classification to accurately
distinguish high-performing students from average ones. Based on students'
previous results, the program was able to accurately forecast their final grades.
A comparative study was undertaken in 2012 in order to estimate the student’s per-
formance. The study compared several decision-tree algorithms to see which one
best predicted students' grades [28]. After testing a number of different
classifiers, the authors concluded that a classifier's precision and accuracy
should both be considered when deciding which one to use. They found that the
Classification and Regression Tree (CART) algorithm, a variant of the decision
tree algorithm, performed best. Five machine learn-
ing categorization models are tested by Sekeroglu et al. to see which one best predicts
student achievement in higher education. According to the researchers’ findings, data
preprocessing procedures boost prediction accuracy [29].
Following this review, we selected an academic dataset from the University of California
Irvine (UCI) repository and implemented the selected classification algorithms on it.
As a next step, FS algorithms were applied to the dataset to select the top 10
attributes for use with all the classification algorithms [30].
Romero and Ventura highlighted the following applications for EDM: student model-
ing, performance prediction, data visualization, social network analysis, feedback to help
managers, planning and scheduling, grouping students, and identification of undesirable
behaviors. To identify students who are at risk of failing their first year of education, Pal
employed the categorization technique to analyze data from prior years’ student dropout
statistics [31]. Decision tree approaches were employed in conjunction with information
on the student’s previous education, family income, and parents’ education to forecast
the list of students that required further attention in order to lower the drop-out rate. The
results of this study show that the machine learning model can use current student drop-
out data to build a good prediction model [32]. A deeper knowledge of student behavior
and learning styles is essential for academic institutions to effectively manage current
study programs and educational practices.
Data mining techniques and models can be used as decision-support tools in
education by reviewing datasets and establishing the significance of the influence of
specific variables, resulting in more successful research and an overall improvement
in educational quality [33]. As a result of EDM, colleges can do a better job of allo-
cating resources to their students. Different data mining algorithms such as decision
tree, link analysis, and decision forest were utilized to investigate the preferences of
students for courses, as well as their completion rates and professions of enrollment
[34]. It was discovered through this study that there is a relationship between course
category and enrollment professions, and it was also discovered that data mining is
important for curriculum development and marketing in the sphere of higher edu-
cation. These findings may be utilized as a guideline for marketing and curriculum
development efforts in the future.
The data were gathered from school reports and questionnaires. There are two
datasets: a mathematics (mat) dataset and a Portuguese language performance (por)
dataset. The target attribute G3 is strongly correlated with G2 and G1. Grading is
divided into three periods, with G1 and G2 representing the first- and second-period
grades and G3 the final grade. Predicting G3 without G1 and G2 is more
difficult, but such a prediction is also more useful.
Random forest algorithm: in a random forest, some trees will predict the correct
output while others will not; taken together, however, the trees predict the correct
outcome. For the classifier to predict outcomes accurately rather than make educated
guesses, the dataset's feature variables must contain some real values, and the
correlation between the forecasts of the individual trees must be extremely low.
Decision tree algorithm: In statistics, data mining, and machine learning, the
decision tree algorithm, also known as induction of decision trees, is a predictive
modeling tool. It employs a decision tree to map an object's observed attributes
to conclusions about its target value [37]. Supervised learning approaches such as the decision tree can be
used to address classification and regression problems; however, it is more typically
used to tackle classification difficulties. Each leaf node in a tree-structured classi-
fier represents a classification result, with internal nodes relating to dataset features
and branching corresponding to rule sets. In a decision tree, the decision node and
the leaf node are the two types of nodes that make up the tree. Decision nodes
have several branches and are where tests are made, whereas leaf nodes report the
results of those decisions and have no further branches. Decisions and tests are based on the dataset's character-
istics. Graphic representations of all conceivable outcomes are used in this strategy.
This is why the term “decision tree” was coined: it expands outward in a tree-like
form from the root node. The CART method, which stands for Classification and
Regression Tree algorithm, is used to create a tree for the purpose of classification
and regression. Based on the answer to each test (yes or no), the tree is divided
into further subtrees.
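To make the CART idea concrete, the following minimal sketch (not taken from the chapter) trains a Gini-based decision tree with scikit-learn. The file name, separator, and pass/fail cut-off are assumptions about a local copy of the UCI student data:

```python
# Minimal CART-style sketch, assuming scikit-learn and a local copy of the
# UCI student dataset (file name and cut-off are illustrative assumptions).
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("student-mat.csv", sep=";")       # hypothetical local file
X = pd.get_dummies(df.drop(columns=["G3"]))        # one-hot encode categoricals
y = (df["G3"] >= 10).astype(int)                   # assumed pass/fail threshold

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
clf = DecisionTreeClassifier(criterion="gini")     # Gini impurity, as in CART
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```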
MLP algorithm: The MLP differs from a linear perceptron in that it has multiple
layers and does not activate in a linear fashion. It has the capability of separating data
that are not linearly separable, among other things. MLPs are feedforward artificial neu-
ral networks that generate outputs from a collection of inputs. The input layer,
hidden layers, and output layer of an MLP are connected as a directed graph, with
signals flowing from the inputs to the outputs. Backpropagation is used by the MLP to train the network
and improve its performance. MLP is regarded as a deep learning method
[38]. Given that it is a neural network whose layers are connected in a directed graph,
the signal flow through the nodes of an MLP is restricted to one direction only. There
are no linear activation functions in this system; instead, all nodes, with the exception
of the input nodes, use non-linear activation functions. An
approach known as backpropagation is used by MLPs to learn from their mistakes. MLP
is classified as a deep learning technique since it makes use of multiple layers of neu-
rons. MLP is frequently utilized in research in computational neuroscience and parallel
distributed processing. It is also commonly used for supervised learning tasks, which is
why it is so popular. Speech and image recognition, as well as machine translation, are
just a few of the many applications available today.
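As an illustrative sketch only, an MLP of the kind described above can be assembled with scikit-learn; the layer size and solver below are arbitrary choices, and X_tr and y_tr are assumed to be the training split from the earlier decision tree sketch:

```python
# Minimal MLP sketch, assuming scikit-learn; the architecture is illustrative.
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# One hidden layer of 32 units with a non-linear (ReLU) activation; training
# uses backpropagation via a stochastic gradient-based solver.
mlp = make_pipeline(
    StandardScaler(),  # MLPs are sensitive to feature scale
    MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                  solver="adam", max_iter=500, random_state=0),
)
# mlp.fit(X_tr, y_tr); mlp.score(X_te, y_te)  # reusing the earlier splits
```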
DT algorithm: DTs are scheduled rule logic entries that are table-formatted and
contain conditions (represented by the row and column names) and actions (represented
by the intersection points of the table’s conditional cases). DTs are used to organize and
schedule rule logic entries. DTs are particularly useful when dealing with business rules
that have a number of different conditions to consider. A decision tree or a switch-case
statement in a programming language can also represent the information in a DT. A
DT, sometimes referred to as a cause–effect table, is an effective tool for determining
the relationships between various inputs and their associated outputs. The DT is derived
from a logical diagramming approach known as cause–effect graphing, which is why
the term "cause–effect table" was coined. Developing tests is made considerably easier
by using DTs. Using this tool, testers can investigate the consequences of diverse inputs
and other software states on business rules that must be correctly implemented in soft-
ware [39]. When it comes to developing and testing complicated business rules, it gives
a consistent approach. Developers will be able to do a better job with the help of this
tool. Complex business rules necessitate the use of an organized approach to preparing
requirements. Complex logic is another use for it.
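A decision table can be illustrated in a few lines of Python; the rule below, mapping attendance and grade conditions to actions, is invented purely for illustration:

```python
# Toy decision table: condition rows (row/column conditions) map to actions.
decision_table = [
    # (attendance_ok, grade_ok) -> action
    ((True,  True),  "promote"),
    ((True,  False), "remedial class"),
    ((False, True),  "attendance warning"),
    ((False, False), "counselling"),
]

def decide(attendance_ok: bool, grade_ok: bool) -> str:
    """Look up the action whose condition row matches the inputs."""
    for conditions, action in decision_table:
        if conditions == (attendance_ok, grade_ok):
            return action
    raise ValueError("no matching rule")

print(decide(True, False))  # -> "remedial class"
```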
JRip algorithm: this class implements a propositional rule learner, Repeated
Incremental Pruning to Produce Error Reduction (RIPPER), devised by William W.
Cohen as an optimized version of IREP. With its bottom-up
approach to rule identification and learning, JRip classifies instances in the training
data into groups and then discovers the set of rules that apply to all members of each
group. Techniques such as cross-validation and requiring a minimum number of
covered instances are employed in order to avoid overfitting the model.
Linear regression algorithm: its close relative, logistic regression (LR),
calculates the odds ratio even when many explanatory variables are present; that
method is very similar to multiple linear regression, except that the response
variable follows a binomial rather than a Gaussian distribution and the outcome is
determined from observed occurrence odds ratios. Linear regression itself, a
fundamental statistical regression approach, can be used to do predictive analysis
and explain the link between continuous variables in a simple and straightforward
way [40]. Linear regression is a statistical approach that models linear relation-
ships between independent and dependent variables in accordance with its name.
Hence the name linear regression. Simple linear regression refers to linear
regression with only one input variable, while the term "multiple linear regression" refers to
the fact that the equation contains many different input variables. The output of
this model is a sloped straight line that depicts the relationship between the two
variables under consideration.
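A minimal worked example of simple linear regression, assuming only NumPy and invented toy data, shows the sloped straight line being fitted and then used for prediction:

```python
# Least-squares fit of a straight line y = a*x + b, assuming NumPy only.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative input variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # illustrative continuous target

a, b = np.polyfit(x, y, deg=1)             # slope and intercept of the line
print(f"y = {a:.2f}*x + {b:.2f}")          # the sloped straight line
y_pred = a * 6.0 + b                       # simple predictive use of the model
```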
4.3.3 FS algorithms
Methods for reducing the number of variables used in a model’s prediction of a
target variable are known as FS. One way to think about an algorithm for selecting
new feature subsets is to think of it as an amalgamation of a search approach and
an evaluation measure. In order to identify the algorithm with the lowest error rate,
we may simply test all potential subsets of characteristics and see which one works
best. By reducing redundant and superfluous data, FS is a simple but effective solu-
tion to this problem [41]. A better knowledge of the model or data can be gained by
eliminating the irrelevant data, which increases accuracy and decreases computing
time. As a fundamental machine learning technique, FS helps focus the use of vari-
ables on those that are most useful and efficient for a given machine learning system.
A uniform class distribution, in contrast to a skewed distribution, contains more
information because all events have the same probability.
Gain ratio attribute evaluator: this method invokes the gain ratio attribute
evaluator, which determines the worth of an individual attribute by measuring its
gain ratio with respect to the class. The method returns the evaluator's
capabilities and allows the attribute evaluator to perform any additional post-
processing on the given attribute collection [45].
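The following sketch computes information gain and gain ratio for a single candidate attribute from their standard definitions; the toy attribute values and labels are invented, and this is not code from the chapter:

```python
# Information gain and gain ratio for one candidate attribute,
# following the standard entropy-based definitions.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attribute_values, labels):
    base = entropy(labels)
    n = len(labels)
    split_entropy, intrinsic = 0.0, 0.0
    for v in set(attribute_values):
        subset = [lab for a, lab in zip(attribute_values, labels) if a == v]
        w = len(subset) / n
        split_entropy += w * entropy(subset)   # expected entropy after split
        intrinsic -= w * math.log2(w)          # split information
    info_gain = base - split_entropy
    return info_gain / intrinsic if intrinsic else 0.0

# Toy data: does a hypothetical 'studytime' attribute separate pass/fail?
print(gain_ratio(["low", "low", "high", "high"],
                 ["fail", "fail", "pass", "fail"]))
```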
The targeted output class has a range of 0–20, and there are 21 clusters in the ini-
tial configuration. In terms of the classification task, this is an unreasonable solution
because it makes the classification process incredibly difficult, especially given the
limited number of instances provided. In the given dataset, G1, G2, and G3 are the
grades obtained by the students; for better results, we compute each student's
final grade as the average of all three grades and create a new attribute named
"Total Grade." We then map the resulting clusters onto a few class levels
denoted by the letters A, B, C, D, and F in Table 4.1.
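A sketch of this derivation, assuming pandas and the df loaded in the earlier decision tree sketch; the letter-grade boundaries below are illustrative assumptions rather than the exact cut-offs of Table 4.1:

```python
# Derive "Total Grade" and letter-grade levels; boundaries are assumptions.
df["Total Grade"] = df[["G1", "G2", "G3"]].mean(axis=1)  # average of all grades

def to_letter(g):              # map the 0-20 scale onto A-F class levels
    if g >= 16: return "A"
    if g >= 12: return "B"
    if g >= 10: return "C"
    if g >= 8:  return "D"
    return "F"

df["Class"] = df["Total Grade"].apply(to_letter)
```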
To identify the best classifiers that accurately generalized the data, we implemented
various classification algorithms on a dataset of student academic performance. We
chose different parameters for the algorithms that effectively analyze the dataset
and increase their generalization accuracy [46]. Note that all implemented
classification algorithms were evaluated with 10-fold cross-validation to assess
their accuracy.
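The evaluation protocol can be sketched as follows, assuming scikit-learn and the X and y prepared in the earlier sketches; the particular classifiers listed are examples only:

```python
# 10-fold cross-validation over several classifiers, assuming scikit-learn.
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "Naive Bayes":   GaussianNB(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```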
Table 4.3 Attribute selection with the help of the Ranker search method
[Table body not reproduced: columns are Classification algorithm, Accuracy (%), MAE, Precision, and Recall.]
Table 4.7 Classification algorithms with accuracy using CAE, IGAE, and GRAE
4.6 Conclusion
References
[1] Wang X., Mei X., Huang Q., Han Z., Huang C. ‘Fine-grained learning per-
formance prediction via adaptive sparse self-attention networks’. Information
Sciences. 2021, vol. 545(2), pp. 223–40.
[2] Romero C., Ventura S. ‘Educational data mining and learning analytics: an
updated survey’. WIREs Data Mining and Knowledge Discovery. 2020, vol.
10(3), pp. 25–37.
[3] Iatrellis O., Savvas I.K., Fitsilis P., Gerogiannis V.C. 'A two-phase ma-
chine learning approach for predicting student outcomes’. Education and
Information Technologies. 2021, vol. 26(1), pp. 69–88.
[4] Romero C., Ventura S. ‘Educational data mining: a review of the state of the
art’. IEEE Transactions on Systems, Man, and Cybernetics, Part C. 2010, vol.
40(6), pp. 601–18.
[5] Jauhari F., Supianto A.A. ‘Building student’s performance decision tree clas-
sifier using boosting algorithm’. Indonesian Journal of Electrical Engineering
and Computer Science. 2019, vol. 14(3), pp. 1298–304.
[6] Hamoud A.K., Hashim A.S., Awadh W.A. ‘Predicting student performance
in higher education institutions using decision tree analysis’. International
Journal of Interactive Multimedia and Artificial Intelligence. 2018, vol. 5(2),
pp. 26–31.
[7] Slater S., Joksimović S., Kovanovic V., Baker R.S., Gasevic D. ‘Tools for
educational data mining’. Journal of Educational and Behavioral Statistics.
2017, vol. 42(1), pp. 85–106.
[8] Sokkhey P., Okazaki T. ‘Hybrid machine learning algorithms for predict-
ing academic performance’. International Journal of Advanced Computer
Science and Applications. 2020, vol. 11(1), pp. 32–41.
[9] Saa A.A., Al-Emran M., Shaalan K. ‘Mining student information system
records to predict students’ academic performance’. The International
Conference on Advanced Machine Learning Technologies and Applications
(AMLTA2019); Springer, Cham, Cairo, Egypt, 28–30 Mar; 2019. pp.
229–39.
[10] Yadav A., Alexander V., Mehta S. ‘Case-based instruction in undergraduate
engineering: does student confidence predict learning’. The International
Journal of Engineering Education. 2019, vol. 35(1), pp. 25–34.
[11] Ruiz S., Urretavizcaya M., Rodríguez C., Fernández-Castro I. ‘Predicting stu-
dents’ outcomes from emotional response in the classroom and attendance’.
Interactive Learning Environments. 2020, vol. 28(1), pp. 107–29.
[12] Raga R.C., Raga J.D. ‘Early prediction of student performance in blended
learning courses using deep neural networks’. 2019 International Symposium
on Educational Technology (ISET); Hradec Kralove, Czech Republic, 2–4
Jul; 2019. pp. 39–43.
[13] Kuzilek J., Vaclavek J., Zdrahal Z., Fuglik V. ‘Analysing student VLE be-
haviour intensity and performance’. European Conference on Technology
Enhanced Learning; Springer, Cham, Delft, The Netherlands, 16–19 Sep;
2019. pp. 587–90.
[14] Sokkhey P., Okazaki T. ‘Comparative study of prediction models on high
school student performance in mathematics’. 34th International Technical
Conference on Circuits/Systems, Computers and Communications (ITC-
CSCC); IEEE, JeJu, Korea (South), 23–26 Jun; 2019. pp. 1–4.
[15] Sharma A., Yadav D.P., Garg H., Kumar M., Sharma B., Koundal D. ‘Bone
cancer detection using feature extraction based machine learning model’.
Computational and Mathematical Methods in Medicine. 2021, vol. 2021, pp.
1–13.
[16] Kumar M., Bajaj K., Sharma B., Narang S. ‘A comparative performance as-
sessment of optimized multilevel ensemble learning model with existing clas-
sifier models’. Big Data. 2021
[17] Moubayed A., Injadat M., Shami A., Lutfiyya H. ‘Student engagement level
in an e-learning environment: clustering using k-means’. American Journal
of Distance Education. 2020, vol. 34(2), pp. 137–56.
[18] Moubayed A., Injadat M., Shami A., Lutfiyya H. ‘Relationship between stu-
dent engagement and performance in e-learning environment using association
[33] Al-Razgan M., Al-Khalifa A.S., Al-Khalifa H.S. ‘Educational data mining:
a systematic review of the published literature 2006-2013’. Proceedings
of the First International Conference on Advanced Data and Information
Engineering (DaEng-2013); Singapore: Springer; 2014. pp. 711–19.
[34] Bhardwaj B.K., Pal S. ‘Data mining: a prediction for performance improve-
ment using classification’. Computer Science Information Retrieval. 2012,
vol. 9(4), pp. 136–40.
[35] Baker R.S., Yacef K. ‘The state of educational data mining in 2009: a review
and future visions’. Journal of Educational Data Mining. 2009, vol. 1(1), pp.
3–17.
[36] Romero C., Ventura S. ‘Educational data mining: a review of the state of the
art’. IEEE Transactions on Systems, Man, and Cybernetics, Part C. 2010, vol.
40(6), pp. 601–18.
[37] Silva C., Fonseca J. ‘Educational data mining: a literature review’. Europe
and MENA Cooperation Advances in Information and Communication
Technologies. Cham: Springer; 2017. pp. 87–94.
[38] Tsai C.-F., Tsai C.-T., Hung C.-S., Hwang P.-S. ‘Data mining techniques for
identifying students at risk of failing a computer proficiency test required
for graduation’. Australasian Journal of Educational Technology. 2011, vol.
27(3), pp. 481–98.
[39] Li J., Li P., Niu W. ‘Artificial intelligence applications in upper gastrointesti-
nal cancers'. The Lancet Oncology. 2020, vol. 21(1).
[40] Ma W., Adesope O.O., Nesbit J.C., Liu Q. ‘Intelligent tutoring systems and
learning outcomes: a meta-analysis’. Journal of Educational Psychology.
2014, vol. 106(4), pp. 901–18.
[41] Macfadyen L.P., Dawson S., Pardo A., Gasevic D. ‘Embracing big data in
complex educational systems: the learning analytics imperative and the
policy challenge’. Research & Practice in Assessment. 2014, vol. 9, pp.
17–28.
[42] Mussack D., Flemming R., Schrater P., Cardoso-Leite P. ‘Towards discover-
ing problem similarity through deep learning: combining problem features
and user behaviour’. Proceedings of the 12th International Conference on
Educational Data Mining (EDM 2019); Educational Data Mining Forum,
2019. pp. 615–18.
[43] Pu Y., Wu W., Jiang T. ‘ATC framework: A fully automatic cognitive tracing
model for student and educational contents’. Proceedings of 12th International
Conference on Educational Data Mining; Educational Data Mining, 2019.
pp. 635–38.
[44] Lahoura V., Singh H., Aggarwal A., et al. ‘Cloud computing-based framework
for breast cancer diagnosis using extreme learning machine’. Diagnostics.
2021, vol. 11(2), p. 241.
[45] Koundal D., Sharma B. 'Challenges and future directions in neutrosophic set-
based medical image analysis' in Neutrosophic set in medical image analysis.
Academic Press; 2019. pp. 313–43.
[46] Singh K., Sharma B., Singh J., et al. ‘Local statistics-based speckle reduc-
ing bilateral filter for medical ultrasound images’. Mobile Networks and
Applications. 2020, vol. 25(6), pp. 2367–89.
[47] Lahoura V., Singh H., Aggarwal A., et al. ‘Cloud computing-based framework
for breast cancer diagnosis using extreme learning machine’. Diagnostics.
2021, vol. 11(2), p. 241.
[48] Haidar M., Kumar S. ‘Smart healthcare system for biomedical and health care
applications using aadhaar and blockchain’. 5th International Conference on
Information Systems and Computer Networks, ISCON 2021; Mathura, IEEE,
2022. pp. 1–5.
[49] Kumar S., Cengiz K., Vimal S., Suresh A. ‘Energy efficient resource migra-
tion based load balance mechanism for high traffic applications IoT’. Wireless
Personal Communications. 2021, vol. 10(3), pp. 1–14.
[50] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy efficient mul-
tichannel MAC protocol for high traffic applications in heterogeneous wireless
sensor networks’. Recent Advances in Electrical & Electronic Engineering.
2017, vol. 10(3), pp. 223–32.
[51] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[52] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic to-
rus Network-on-Chip architecture’. International Journal of Innovative
Technology and Exploring Engineering (IJITEE). 2019, vol. 8(6), pp.
1672–6.
[53] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based Sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
[54] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[55] Singh P., Bansal A., Kamal A.E., Kumar S. ‘Road surface quality monitoring
using machine learning algorithm’ in Reddy A.N.R., Marla D., Favorskaya
M.N., Satapathy S.C. (eds.). Intelligent Manufacturing and Energy
Sustainability. Smart Innovation, Systems and Technologies. 265. Singapore:
Springer; 2022.
[56] Sharma A., Awasthi Y., Kumar S. ‘The role of blockchain, AI and iot for
smart road traffic management system’. IEEE India Council International
Subsections Conference (INDISCON); Visakhapatnam, India, IEEE, 2020.
pp. 289–96.
[57] Kumar S., Ranjan P., Singh P., Tripathy M.R. ‘Design and implementation
of fault tolerance technique for internet of things (iot)’. 12th International
Conference on Computational Intelligence and Communication Networks
(CICN); Bhimtal, India, IEEE, 2020. pp. 154–59.
[58] Singh P., Bansal A., Kumar S. ‘Performance analysis of various information
platforms for recognizing the quality of indian roads’. 10th International
Conference on Cloud Computing, Data Science & Engineering (Confluence);
Noida, India, IEEE, 2020. pp. 63–76.
[59] Reghu S., Kumar S. ‘Development of robust infrastructure in networking to
survive a disaster’. 4th International Conference on Information Systems and
Computer Networks (ISCON); Mathura, India, IEEE, 2019. pp. 250–55.
[60] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘An NS3 implementa-
tion of physical layer based on 802.11 for utility maximization of WSN’.
International Conference on Computational Intelligence and Communication
Networks (CICN); Jabalpur, India, IEEE, 2016. pp. 79–84.
[61] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘A utility maximiza-
tion approach to MAC layer channel access and forwarding’. Progress
in Electromagnetics Research Symposium; Prague, Czech Republic, The
Electromagnetics Academy, 2015. pp. 2363–67.
[62] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘EMEEDP: enhanced
multi-hop energy efficient distributed protocol for heterogeneous wireless
sensor network’. Fifth International Conference on Communication Systems
and Network Technologies; Gwalior, India, IEEE, 2015. pp. 194–200.
[63] Kumar S., Ranjan P., Ramaswami R. ‘Energy optimization in distributed lo-
calized wireless sensor networks’. 2014 International Conference on Issues
and Challenges in Intelligent Computing Techniques (ICICT), Publisher:
IEEE; Ghaziabad, India, 7–8 Feb; 2014.
[64] Sudhakaran S., Kumar S., Ranjan P., Tripathy M.R. ‘Blockchain- based
transparent and secure decentralized algorithm’. International Conference
on Intelligent Computing and Smart Communication 2019. Algorithms
for Intelligent Systems; THDC- Institute of Hydropower Engineering and
Technology, Tehri, India, Singapore: Springer, 2020. pp. 327–36.
[65] Kumar S. Evolution of software-defined networking foundations for iot and
5G mobile networks. Hershey, PA: IGI Publisher; 2020. p. 350.
Chapter 5
Accurate management and progression of Big
Data Analysis
Tanmay Sinha 1, Narsepalli Pradyumna 1, and K B Sowmya 1
5.1 Introduction
In today's world, most things are digital in nature: nine out of ten tasks, from
buying vegetables to booking shows, are done digitally. Every digital transaction
needs to be stored, both for safety reasons and to track customer satisfaction and
preferences. A digital era therefore implies an enormous amount of data to be stored
and processed, so we can also say that this is an era of data.
First of all, we start with the definition of data: quantities, characters, or
images on which operations are performed by a computer, and which can be stored
and transmitted in the form of electrical signals and recorded on magnetic,
optical, or mechanical recording media. Big Data is data of immense dimensions.
Big Data is a term which depicts
1 Department of Electronics and Communication Engineering, Bengaluru, India
• Social Media
1. According to one estimate, more than 500 terabytes of new data are ingested
into the databases of Facebook each day. This information is generated in the
form of photo and video uploads, message exchanges, comments, and so on.
Big Data is never used without proper processing, since raw Big Data is very
difficult to work with. Processing identifies hidden patterns, market trends, and
consumer preferences for the benefit of organisational decision-making.
The first step of Big Data Analysis is collecting the data, but the raw data we
collect from our sources, be they websites, apps, or other platforms, are unusable
as they stand. We need to process and convert the enormous data set into
meaningful data which can be used by the organisation. This step is called Data Mining. This
step is also called Knowledge Discovery in Databases. In this step, we derive crude
but essential information from the data set. There are various methods for doing this
like k-nearest neighbors (KNN), clustering, regression, association rules, sequential
patterns, etc.
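As a toy illustration of one such method, a k-nearest neighbours classifier can be run in a few lines with scikit-learn; the points and labels below are invented:

```python
# Illustrative KNN sketch with invented toy data, assuming scikit-learn.
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [8, 8], [9, 8]]       # toy feature vectors
y = ["small", "small", "large", "large"]   # toy class labels

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[2, 1]]))               # -> ['small']
```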
The second step is the actual analysis of processed data depending upon the
need. We can use various machine learning algorithms and select one or more data
columns for solving the problem.
This combination of steps is called Big Data Analytics, i.e., Data collection –
Raw data, Data mining – Mined data, and Data analysis.
Data mining draws on several disciplines (see Figure 5.2):
• statistics
• machine learning
• visualisation
• information science
• database technology
• other disciplines
Mined data themselves can be classified as follows:
• Structured
• Unstructured
• Semi-structured
a. Collection of data: In this, data from various sources are collected. Data may
be structured or unstructured and may vary from cloud to mobile applications.
These data are stored in data warehouses, where they can be accessed with
greater ease. Unstructured data may be stored in a data lake.
b. Processing of data: After the data are stored, the next step is to organise
them so that analyses remain accurate. Data can be processed in two ways:
batch processing and stream processing.
c. Cleaning of data: After processing, the data must be cleaned to enhance their
quality and obtain more accurate results. If this cleaning is done properly,
any duplicated or irrelevant data will be eliminated.
d. Analysis of data: After cleaning of irrelevant data, we are left with the actual
data needed. But still, conversion of those data sets into usable data might take
time. For such purposes, we have few data analysis tools which may be fol-
lowed to analyse the data more efficiently. Some of the data analysis tools are
listed below:
i) Data mining
ii) Predictive analysis
iii) Deep learning
Implementation of Big Data Analytics can be understood from Figure 5.6.
5.3 Processing techniques
In processing Big Data, three major characteristics are taken into consideration.
Those characteristics are volume of the data, velocity of the processing of the data,
and wide variety of the data.
1. Step 1: The whole data are split into small blocks and stored in different systems.
2. Step 2: Find the highest recurring genre for each part stored on the correspond-
ing machine.
3. Step 3: Results are taken from each machine to produce the final output.
1. Critical path problem: The critical path is the minimum time needed to finish
the task without delaying the next milestone or the actual completion date. If
data are divided without proper management, this path may become extremely
long for some applications, and any delay on it delays the entire job.
2. Reliability problem: In case of any hardware or software problem, the data
will not be available, which reduces reliability. Managing this type of
problem is a challenge.
3. Splitting data issue: Deciding how and where to divide the data among the
systems is itself a problem; we need an efficient way to split the data so
that no single device is overburdened.
4. Single system may fail: There should be a mechanism for fault tolerance. For
example, if one system is unable to produce its output for the final
calculation, the job should still produce results from the other outputs.
5. Accumulation of result: We need a mechanism to accumulate the results pro-
duced by each machine to produce the final output.
Due to the above problems, a programmer needs to take the design issues into
consideration while coding the application. To counter the problems that are present
in traditional methods, an innovative approach was devised: MapReduce. It takes
care of the issues that would otherwise have to be handled individually with
traditional approaches when processing large amounts of data in parallel.
MapReduce framework can be used by programmers to create complex applica-
tions with parallel computations without taking system issues into consideration.
5.3.2 MapReduce
MapReduce is a programming environment that allows for distributed and parallel
processing of large data sets in a distributed environment.
As in its name, this framework is bi-staged:
1. Mapping
2. Reducing
It follows the same order as in its name: the first being the map phase and the
next being the reduce phase.
The first task is thus mapping: reading the input and creating key-value pairs
based on counts. This is the intermediate output of the process, and it is then
fed into the reducer.
The reducer receives key-value pairs from several map jobs and consolidates these
intermediate data sets (the intermediate key-value pairs) into a smaller set of
tuples, which form the final result.
Since a bare description cannot fully explain the process, let us take one of the
most popular examples: word count. We take a text file with many words in it and
assume that its content is as below:
Car, Bike, Cycle, Cycle, Bike, Car, Bike, Car, Bike, Cycle
The whole process can be divided into five steps:
Step 1: Splitting
Breaking the input into many chunks of data is done in this step. In this example,
let us take the number of chunks to be three for ease of understanding.
Step 2: Mapping
Each mapper tokenises its chunk and assigns each token or word a hard-coded
value of 1, because every occurrence of a word is counted once. This creates a
list of key-value pairs in which the key is an individual word and the value is
1. Therefore, for the first chunk (Car, Bike, Cycle), there are three key-value
pairs: Car(1), Bike(1), Cycle(1). This process is repeated at all the nodes.
Step 3: Shuffling
The sorting and shuffling process partitions the intermediate output so that all
pairs with the same key are sent to the corresponding reducer. After this phase,
each reducer has a unique key and the list of values corresponding to that key.
Step 4: Reducing
Now each reducer sums the values in its list. As shown in the figure, the reducer
for key Car gets the list of values [1, 1, 1]; it counts the number of 1s in the
list and returns the final output as Car, 3.
Step 5: Final output
All output pairs are then collected and written to the output file. The process is
pictorially shown in Figure 5.7.
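The five steps can be simulated in-process with ordinary Python; this is an illustrative sketch of the data flow, not Hadoop code:

```python
# In-process simulation of the five word-count steps:
# splitting, mapping, shuffling, reducing, final output.
from collections import defaultdict

text = "Car, Bike, Cycle, Cycle, Bike, Car, Bike, Car, Bike, Cycle"
words = [w.strip() for w in text.split(",")]

# Step 1: splitting the input into three chunks
chunks = [words[0:3], words[3:6], words[6:10]]

# Step 2: mapping - each word becomes a (word, 1) pair
mapped = [(w, 1) for chunk in chunks for w in chunk]

# Step 3: shuffling - group all pairs with the same key
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Steps 4-5: reducing each group and collecting the final output
counts = {key: sum(values) for key, values in groups.items()}
print(counts)   # {'Car': 3, 'Bike': 4, 'Cycle': 3}
```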
Parallel processing
MapReduce divides the job into multiple nodes, each node processing part of the job
at the same time. As such, MapReduce is based on the divide-and-conquer paradigm
and is useful for processing data at different nodes. Processing data in parallel at
multiple nodes instead at a single node as conventional methods suggest signifi-
cantly reduces the time it takes to process the data. It can also be called distributed
processing as processing is distributed among different systems.
Data locality
In the case of data locality, data are not transferred from the distinct data
nodes to a single processing node at all. Instead, in MapReduce the processing is
moved to the data nodes, ensuring that computation takes place where the data
already reside.
In traditional systems, by contrast, the data are first brought near to a
processing device. But when the data become too large to be processed this way,
moving them to the processing nodes becomes problematic and causes the problems
that were listed earlier.
5.4 Cyber crime
5.4.1 Different strategies in Big Data to help in various circumstances
Due to the rapid development of data science, it is difficult to survey its tools
and strategies systematically. A few of them are reviewed here.
MapReduce Basically, this is a design for performing huge tasks efficiently in
parallel. The efficiency of such a structure depends on exploiting the internal
correlation in the data sets considered. This is normally not the case when
digital evidence is the central concern; rather, a task such as classifying file
fragments maps efficiently onto the MapReduce programming environment. Among the
tasks considered, one of the most common in forensics is assigning fragments of
files, taken from a whole file-system image or from unallocated space, to
predefined file types. Such classification involves machine learning algorithms
such as logistic regression, k-means clustering, support vector machines (SVM),
and so on. These algorithms can be attached to the MapReduce paradigm
if the analyst models all the possible correlations among the individual
fragments. A more consolidated approach in the same direction is to merge the
classification algorithms with a decision tree; such an integrated approach
yields higher accuracy than the individual approaches.
Decision trees and random forests These methods generally give productive
results in fraud detection software, where the main goal is to make factual
predictions over a huge data set. Depending on the application, signals such as
unusual transactions or irregular browsing behaviour can be used.
Audio forensics Unsupervised learning strategies under the umbrella of 'blind
signal separation' give good results in audio forensics for separating two
superimposed speakers, or a sound from background noise. Such schemes rely on
mathematical criteria to identify the least-correlated signals.
Image forensics Clustering techniques are also helpful in image forensics for
reviewing enormous sets of hundreds or thousands of image files, for example to
distinguish suspicious images from the rest.
Neural networks For complex pattern recognition in network forensics, neural
networks are appropriate. This is fundamentally a useful approach in which
successive snapshots of the file system are used to train the network to
recognise the normal behaviour of an application. After an incident, the trained
network can then be used to reconstruct an execution timeline from a forensic
image of the file system. Such neural networks are also used to recognise network
traffic, but do not yet yield high levels of accuracy.
Other strategies are also used, e.g., natural language processing techniques
including Bayesian classifiers and unsupervised algorithms for clustering such as
k-means, which are effective for authorship verification or classification of
huge collections of unstructured texts, emails in particular.
For the analysis of humongous data, it is not essential to evaluate it at very
high speed (approximately 3 × 10⁸ m/s); rather, it is important to store the data
securely. These security methodologies involve encryption and decryption of the
data, and such measures should be an integral part of the analysis because there
are quite a lot of unethical experts who can extract the data without notifying
anyone. Thus, Big Data analysts urgently need insight into the current threats to
data privacy, and their active involvement is also expected in the development of
better tools to protect against and prevent such threats.
Often, only 10% of the data obtained is useful from a forensic point of view.
Traditionally, the data stayed in the complete control of the forensic
organisation itself, but with Big Data, forensic experts must depend on other
organisations to collect the data for processing, which simply increases the time
needed to analyse it and establish the facts. Many case studies have found that
merely collecting and preserving the data requires more than 24 hours. There are various
characteristics of the data, viz., volume, velocity, etc., which do not match the
methods currently available, so examining and analysing such data within a
traditional programming paradigm becomes challenging. Keeping in mind all the challenges
faced in organising the data, special search and data mining techniques should be
employed to make it more efficient. After organisation, the last step involved in
digital forensics is presentation, where the experts present the analysis and
findings derived from the evidence in court. Presenting an analysis from
traditional computing is easy compared to Big Data, where the data are complex
and the analysis becomes tedious for the jury to understand and evaluate.
Collectively, each step faces different challenges while using Big Data.
Thus, to face such challenges, a theoretical model of using Hadoop Distributed
File System (HDFS) and a cloud-based computing model is proposed. Apache Hadoop
is an open-source software framework that utilises a network of many computers to
solve Big Data problems, using the MapReduce framework already explained.
Coming to Hadoop Distributed File system, it is a file system component that is
developed in such a way that it can store huge data sets effectively and efficiently
and provides high bandwidth across the cluster for performing different applications
of the end user.
Apart from HDFS, cloud-based computing models should be deployed where Big Data
is concerned, as such data contain terabytes of information.
This architecture might prove to be cost effective and scalable.
We all know that our preferences are shifting from manual to autonomous devices.
These devices are interconnected, creating a chain of appliances transferring and
sharing data and other resources; this is called the IoT, where connections run
over the internet, an intranet, or the cloud. But if we keep all the processing
in a single processor, the load will be huge and the response slower. For
example, in an autonomous car, the growth of internet-connected devices, e.g.,
traffic sensors, distributed video cameras, and connected appliances, creates a
surge of information that is handled using edge-computing resources. The image of
the vehicle in front, or of the traffic signals, must be processed quickly to
avoid road hazards; if we depend on the cloud, the appliance must first transfer
the data to the cloud, which processes the whole information and sends its reply
as a command to the respective functional unit. This takes far more time than the
case where we supply the
data required for the command directly to the command unit. This can be achieved
by Edge Analytics: we process the input data at the source itself, reducing the
information transferred to the command unit and hence the response time.
The rise of Big Data brings uncommon new advantages and opens several application
areas. Large enterprises can exploit IoT-generated data and infer business value
by performing data analytics. In particular, some applications require real-time
edge analytics to quickly discover useful correlations, customer preferences,
hidden patterns, and other significant information that can help organisations
and decision-makers take more informed business actions. These challenges
represent several opportunities for researchers in the area to investigate
various directions, including data fusion, AI, and the design of analytical
tools.
The study of IoT-generated data is very important for meeting demands and
satisfying consumers. Conventional analysis methods are not suitable because of
the enormous volume of detail involved, such as limits and timeframes for various
tasks, registrations, etc. IoT-generated Big Data has novel
5.6 Conclusion
With the knowledge of Big Data, we can minimise the effort required for product
enhancement, and accumulating new data can help increase user friendliness
through logical programming, which has made the entire data-analysis process
simpler. Establishing patterns reduces the time needed to break down enormous
data in a cost-effective way. These techniques represent a real leap forward and
a clear path to realising huge gains in efficiency, productivity, revenue, and
profitability. Big Data Analysis is very important for proper
functioning in this digital era, since everything is digital in nature. We need
to be careful about data leakage and about where we disclose our personal data. With the
help and cooperation of both business and innovation experts, the age of Big Data
will reach greater heights. Besides all this, Big Data Analytics is really helpful for
medium and large organisations to connect with users all over the world through
social media.
A transcript excerpt from one of the most engaging speakers on the Big Data
circuit, Google's Avinash Kaushik, from his presentation [12], 'A Big Data
Imperative: Driving Big Action':
'I really don't generally care about the promise of data unless they can deliver
on that promise that accompanies the data'.
References
[1] HongJu X., Fei W., FenMei W., XiuZhen W. ‘Some key problems of data man-
agement in army data engineering based on big data’. IEEE 2nd International
Conference on Big Data Analysis (ICBDA); Beijing, China, IEEE, 2017. pp.
149–52.
[2] Sagiroglu S., Sinanc D. ‘Big data: a review’. International Conference on
Collaboration Technologies and Systems (CTS); San Diego, CA, IEEE, 2013.
pp. 42–47.
[3] Shalaginov A., Johnsen J.W., Franke K. ‘Cyber-crime investigations in the
era of big data’. IEEE International Conference on Big Data (Big Data);
Boston, MA, IEEE, 2017. pp. 3672–76.
[4] Jo J., Joo I., Lee K. ‘Constructing national geospatial big data platform: cur-
rent status and future direction’. IEEE 5th World Forum on Internet of Things
(WF-IoT); Limerick, Ireland, IEEE, 2019. pp. 979–82.
[5] Lee C.K.M., Yeung C.L., Cheng M.N. ‘Research on iot based cyber physi-
cal system for industrial big data analytics’. IEEE International Conference
on Industrial Engineering and Engineering Management (IEEM); Singapore,
IEEE, 2015. pp. 1855–59.
[6] Mazumdar S., Wang J. ‘Big data and cyber security: a visual analytics per-
spective’ in Parkinson S., Crampton A., Hill R. (eds.). Guide to Vulnerability
Analysis for Computer Networks and Systems. Computer Communications
and Networks. Cham: Springer; 2018.
[7] Terzi D.S., Terzi R., Sagiroglu S. ‘Big data analytics for network anomaly
detection from netflow data’. International Conference on Computer Science
and Engineering (UBMK); Antalya, Turkey, IEEE, 2017. pp. 592–97.
[8] Azvine B., Jones A. 'Meeting the future challenges in cyber securi-
ty' in Dastbaz M., Cochrane P. (eds.). Industry 4.0 and Engineering for a
Sustainable Future. Cham: Springer; 2019.
[9] Priyadarshini S.B.B., BhusanBagjadab A., Mishra B.K. ‘The role of IoT and
big data in modern technological arena: a comprehensive study’ in Balas
V., Solanki V., Kumar R., Khari M. (eds.). Internet of Things and Big Data
Analytics for Smart Generation. Intelligent Systems Reference Library. 154.
Cham: Springer; 2019.
[10] Puthal D., Ranjan R., Nepal S., Chen J. ‘IoT and Big Data: An Architecture
with Data Flow and Security Issues’ in Longo A. (ed.). Cloud Infrastructures,
Services, and IoT Systems for Smart Cities. IISSC 2017, CN4IoT 2017. 189.
Springer, Cham; 2018.
[11] Praveena M.D.A., Bharathi B. ‘A survey paper on Big Data analytics’.
International Conference on Information Communication and Embedded
Systems (ICICES); Chennai, India, IEEE, 2017. pp. 1–9.
[12] Singh S., Singh N. ‘Big Data analytics’. 2012 International Conference on
Communication, Information & Computing Technology (ICCICT); 2012. pp.
1–4.
[13] Jayasingh B.B., Patra M.R., Mahesh D.B. ‘Security issues and challenges
of big data analytics and visualization’. 2nd International Conference on
Contemporary Computing and Informatics (IC3I); Greater Noida, India, 14–
17 Dec; 2016. pp. 204–8.
[14] Gupta B., Kumar A., Dwivedi R.K. ‘Big data and its applications– a review’.
2018 International Conference on Advances in Computing, Communication
Control and Networking (ICACCCN); Greater Noida, India, IEEE, 2018. pp.
146–49.
[15] Haidar M., Kumar S. ‘Smart healthcare system for biomedical and health care
applications using aadhaar and blockchain’. 5th International Conference on
Information Systems and Computer Networks (ISCON); Mathura, India, 22-
23 Oct 2021; 2022. pp. 1–5.
[16] Kumar S., Cengiz K., Vimal S., Suresh A. ‘Energy efficient resource migra-
tion based load balance mechanism for high traffic applications IoT’. Wireless
Personal Communications. 2021, vol. 10(3), pp. 1–14.
[17] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy efficient mul-
tichannel MAC protocol for high traffic applications in heterogeneous wireless
sensor networks’. Recent Advances in Electrical & Electronic Engineering.
2017, vol. 10(3), pp. 223–32.
[18] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[19] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic to-
rus Network-on-Chip architecture’. International Journal of Innovative
Technology and Exploring Engineering (IJITEE). 2019, vol. 8(6), pp.
1672–6.
[20] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
[21] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[22] Singh P., Bansal A., Kamal A.E., Kumar S. ‘Road Surface Quality Monitoring
Using Machine Learning Algorithm’ in Reddy A.N.R., Marla D., Favorskaya
M.N., Satapathy S.C. (eds.). Intelligent Manufacturing and Energy
Sustainability. Smart Innovation, Systems and Technologies. 265. Singapore:
Springer; 2022.
[23] Sharma A., Awasthi Y., Kumar S. ‘The role of blockchain, AI and IoT for
smart road traffic management system’. Proceedings of the IEEE India
Council International Subsections Conference (INDISCON); Visakhapatnam,
India, 3-4 Oct; 2020. pp. 289–96.
[24] Kumar S., Ranjan P., Singh P., Tripathy M.R. ‘Design and implementation
of fault tolerance technique for internet of things (iot)’. 12th International
Chapter 6
Cram on data recovery and backup cloud computing techniques
The present digital world technology is evolving at a rapid pace. To store, man-
age and protect the digital information, it is necessary to back up and recover the
data with utmost efficiency. As a solution, cloud computing that offers customers a
wide range of services can be used. Storage-as-a-Service (SaaS) is one of the cloud
platform’s services, in which a large volume of digital data is maintained in the
cloud database. Enterprises' most sensitive data are stored in the cloud, which
must ensure that they are secure and accessible at all times and from all
locations. At times, informa-
tion may become unavailable due to natural disasters such as windstorms, rainfall,
earthquakes, or any technical fault and accidental deletion. To ensure data security
and availability under such circumstances, it is vital to have a good understanding
of the data backup and recovery strategies. This chapter examines a variety of cloud
computing backup and recovery techniques.
6.1 Introduction
Cloud computing is a model for affording on-demand network access to a shared
pool of configurable computing resources (e.g., networks, servers, applications,
services and storage) that can be rapidly provisioned and released with minimal service provider
1
Department of Computer Science and Engineering, Puducherry Technological University, Puducherry,
India
2
Department of Electronics and Communication Engineering, Sri Manakula Vinayagar Engineering
College, Puducherry, India
3
Department of Information Science and Technology, College of Engineering, Anna University,
Chennai, India
4
Department of Electronics and Communication Engineering, IFET College of Engineering,
Villupuram, India
5
Electronics and Communication Engineering, University College of Engineering Tindivanam,
Tindivanam, India
116 Intelligent network design driven by Big Data analytics
or administrative effort contact. The cloud computing affords the computing ser-
vices via the Internet in order to afford more elastic resources. The major advantage
of using cloud services is that we have to pay only for the services we use, which
helps to run the organization efficiently in terms of economical scale [1].
Cloud computing has been hailed as the definitive and far-reaching solution
to IT companies’ growing storage expenses. Cloud computing has emerged as one
of the most popular strategies for delivering IT services. Clouds have arisen as a
computing infrastructure that enables the dynamically scalable, virtualized supply
of computing resources as a utility. IT behemoths including Amazon, Google, IBM
and Microsoft have all launched cloud computing initiatives. Because of the finan-
cial benefits, data outsourcing to cloud storage servers is becoming more popular
among businesses and customers.
Data recovery is the process of restoring data that have been lost through corruption or accidental deletion, or that have become inaccessible for technical reasons. The restoration is performed from a system where the information was previously backed up.
The types of data that you certainly should be backing up may include:
1. Files: Word documents and Excel spreadsheets are common examples of information-carrying files that must be backed up. A document may also be emailed to a mailbox or uploaded to a cloud service, allowing it to be accessed at any time and from any place.
2. Daily entries: any important entries made during the day should be secured and copied. Emails, calendar entries, browsing data, contacts, social media records and some easily overlooked applications are examples of such entries.
3. Media files: photographs, images, audio and video. Media files are usually much larger, necessitating a different approach when storing them in the cloud.
6.2.1 Recovery
Data recovery services are offered to retrieve data using recovery techniques, and different companies offer different strategies. The data recovery techniques used for the following types of device are:
Hard Drive Recovery: a significant number of data recovery service requests arise from hard disc failures. With advancing technology, hard drives are growing larger and more data retrieval is needed than ever, as users and customers are concerned about losing their data.
Optical Recovery: optical media are media written and read by an optical device, e.g., CDs and DVDs. Many complications can lead to a failure of optical media: the CD or DVD can be scratched, or the CD or DVD player can damage the media.
Removable Recovery: removable storage devices are more prone to accidental erasure than the other media platforms already discussed; hence, such recovery cases are less commonly incurred.
RAID Recovery: Redundant Array of Inexpensive Disks (RAID) systems are difficult to operate because of the considerable skill required to build, manage and maintain them, so fault and configuration problems are more likely to occur. If a RAID array is not backed up correctly, severe loss or business failure can be incurred.
Tape Recovery: tape is a sequential storage medium used mostly for backup, offering huge backup capacity. Experts with specialized technologies and software tools use tape as a recovery device.
Digital Recovery: cameras, handheld storage devices and flash media are included in digital media. Because these devices are cheap and widespread, a growing recovery market has developed around them, and data recovery organizations have introduced suitable solutions for digital data loss.
6.2.2 Backup
Laxmi et al. explained that cloud backup is a kind of service used in the cloud to create, edit, maintain and re-establish enterprise cloud resources and infrastructure [5]. This is achieved remotely through the Internet. Both online backup and remote backup can be called cloud backup. In cloud computing, online backup via the Internet can be used anywhere at any time [6]. Online backup can be used without much attention to technical or physical management; representative techniques include HSDR (High-Speed Data Rate Transfer), PCS (Parity Cloud Service), ERGOT (Efficient Routing Grounded on Taxonomy) and the Linux Box.
Advantages
• It delivers high performance through parallel use.
• The PCS is very quick and easy to use, and is further versatile for data retrieval thanks to its parity-based recovery service.
Disadvantages
• Dependence on Internet access: retrieving the data requires a fast Internet connection, and users cannot access their files in case of network problems.
• Data are held on third-party servers: though backing up the main server to an external server seems a good idea, it can also be a disadvantage; if something goes wrong with the external server, users lose the data.
• Limited bandwidth allowance: while some service providers allow unrestricted bandwidth, many others impose a restricted allowance and can charge an extra fee when users exceed their allocation.
The three major categories of backup services are full backup, differential backup and incremental backup. The three types are defined as follows:
Full backup: a complete backup, or full backup, is shown in Figure 6.1. The full backup is the most essential and primary kind of backup process; it copies all data onto a separate media package. Its benefits are that it can be carried out after each operation and that a complete copy of all data is available on a single medium.
Differential backup: the strategy used in differential backup is shown in Figure 6.2. It copies all data modified since the last full backup. Because every run re-copies everything changed since that full backup, it takes additional space and time.
Incremental backup: an incremental backup duplicates only the data that have changed since the last backup of any kind. Its benefit is that it copies only a few records and hence consumes less time and space; the operation is therefore faster and requires less memory for backup storage. This process is shown in Figure 6.3. A sketch contrasting the three strategies follows.
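The sketch below makes the three strategies concrete. It is a minimal illustration that is not from the chapter: file modification times stand in for change tracking, and the sample data and variable names are invented.

```python
from datetime import datetime

files = {  # path -> last-modified timestamp (illustrative data)
    "report.docx": datetime(2024, 3, 10),
    "budget.xlsx": datetime(2024, 3, 12),
    "notes.txt":   datetime(2024, 3, 14),
}
last_full = datetime(2024, 3, 11)   # time of the last full backup
last_any = datetime(2024, 3, 13)    # time of the last backup of any kind

full = list(files)                                             # copies everything
differential = [f for f, t in files.items() if t > last_full]  # changed since last full
incremental = [f for f, t in files.items() if t > last_any]    # changed since last backup

assert differential == ["budget.xlsx", "notes.txt"]
assert incremental == ["notes.txt"]
```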
remote locations even without a network connection. Second, if the cloud storage gets corrupted, leading to loss of data, the user can recover the data easily. The technique also focuses on the security of the data stored on the remote server. The working mechanism of the Seed Block Algorithm (SBA) technique is as follows:
• As shown in Figure 6.4, the remote data recovery server, main cloud and the
number of consumers are the components of the SBA.
• The first step is to establish, in the central cloud, a random number and a unique user ID for each client.
• The second step is to create a seed block value for each user. For this, an exclusive-OR (XOR) operation is executed between the user ID and the random number when the user first accesses the main cloud, at which point the user ID is recorded in the cloud.
• When the user in the main cloud generates a new file, the seed block value of
that client is stored in the remote data backup server.
• The user's file is XORed with the seed block value of the same user in the cloud server.
• The resulting XORed file is stored in the remote data backup server.
• If the file in the main cloud gets corrupted or deleted, the user can restore it by performing an XOR operation between the backed-up file and the seed block value of that respective user (see the sketch below).
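A minimal sketch of this XOR mechanism is given below, assuming an integer user ID and random number and a byte-wise XOR against a repeated seed block; the chapter does not fix these representation details.

```python
def make_seed_block(user_id: int, random_number: int, block_len: int = 16) -> bytes:
    """Derive a user's seed block by XORing the user ID with the random number."""
    return (user_id ^ random_number).to_bytes(block_len, "big")

def xor_with_seed(data: bytes, seed: bytes) -> bytes:
    """XOR every byte of the file with the (repeated) seed block."""
    return bytes(b ^ seed[i % len(seed)] for i, b in enumerate(data))

# Backup: the XORed file is what the remote data backup server stores.
seed = make_seed_block(user_id=42, random_number=123456789)
original = b"confidential enterprise record"
backup_copy = xor_with_seed(original, seed)

# Recovery: XORing the backup with the same seed restores the original file.
assert xor_with_seed(backup_copy, seed) == original
```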
Advantages
• SBA takes less time to perform the recovery operation.
• It provides data integrity and security.
• It also ensures reliability of data.
• The implementation of this algorithm is easy and also a cost-efficient method.
• When the main cloud cannot provide the requested data to the user, it aids in recovering the lost data from the remote location.
Disadvantages
• The cost of implementation increases as the size of the data increases.
• The time taken to recover the data likewise grows with the data size.
Advantages
• The PhotoRec application can recover data even after they have been removed from the memory location.
• Using memory analysis software, data deleted from the cloud can still be retrieved after deletion.
• If confidential user records are erased by mistake, they can be recovered.
• The rename module gives high protection to the user's information.
• Renaming changes the format of the original data type, so the data cannot be predicted or hacked by anyone.
• The information contained in the data remains unintelligible unless the exact data type is known; as the original data format is difficult to discover, the data cannot be reconstructed by an attacker.
Disadvantages
• Data recovery tools create security problems.
• Cloud servers can recover users' private data even after the users delete them from the cloud.
• Reconstructing a user's data results in a compromise of privacy.
6.3.3 Amoeba: an autonomous backup and recovery SSD for ransomware attack defense
Donghyun et al. explained that ransomware is a type of malicious software that prevents users from using their devices and other computing resources unless they pay a ransom. Because ransomware can cause rapid economic loss, an efficient technique for ransomware mitigation is imperative [6]. Existing methods for detecting and avoiding ransomware rely on recognizing known ransomware behaviours; among them, FlashGuard and SSD-Insider perform data backup inside the Solid-State Drive (SSD). In this chapter, Amoeba is proposed to solve the problem of ransomware affecting the SSD backup and retrieval process: it autonomously performs ransomware intrusion detection, alerts the entire system, and carries out data backup and retrieval within the SSD.
In the proposed method, Amoeba measures the probability of a ransomware attack for every page write. The Direct Memory Access (DMA) controller in front of the NAND flash inside the SSD is extended with a Ransomware Attack Risk Indicator (RARI) hardware module. The RARI module calculates the risk of every page and is used to decide whether the page should be backed up.
Ransomware Risk Calculation: the intensity, similarity and entropy indicators of Amoeba's RARI are used to identify possible ransomware attacks. The intensity of an attack can easily be measured by counting the number of write requests; this is supported by the Flash Translation Layer (FTL) firmware at low cost. To measure the similarity between new and old data, the DMA controller must access both: it issues an extra page read to the NAND flash memory to fetch the older data, while the fresh data are read from the main memory, and the controller compares the two. Amoeba uses a classifier of ransomware that combines entropy, similarity and write intensity. The RARI value is produced by normalizing the three indicators with min-max scaling (e.g., with a MinMaxScaler) and applying logistic classification. The following equation formulates the RARI value:
$$\mathrm{RARI} = \frac{1}{1 + e^{-z}}, \qquad z = \alpha\,\mathrm{SIM} + \beta\,\mathrm{ENT} + \gamma\,\mathrm{INT} + \delta \tag{6.1}$$

Here in (6.1), $z$ is the result of the linear combination; SIM, ENT and INT are the similarity, entropy and write-intensity indicators; $\alpha$, $\beta$ and $\gamma$ are their weights; and $\delta$ stands for the bias.
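A minimal sketch of the computation in (6.1) follows; the weights, normalization ranges and the decision threshold are illustrative assumptions, since the chapter gives no numeric values.

```python
import math

def minmax(x: float, lo: float, hi: float) -> float:
    """Min-max normalize an indicator into [0, 1]."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def rari(sim: float, ent: float, inten: float,
         alpha: float = 1.5, beta: float = 2.0, gamma: float = 1.0,
         delta: float = -2.5) -> float:
    """Logistic combination of the three indicators, as in (6.1).
    The weights here are illustrative placeholders, not values from the chapter."""
    z = alpha * sim + beta * ent + gamma * inten + delta
    return 1.0 / (1.0 + math.exp(-z))

# A page write with high entropy and high similarity change scores as risky.
risk = rari(sim=minmax(0.9, 0, 1), ent=minmax(7.8, 0, 8), inten=minmax(120, 0, 200))
backup_page = risk > 0.5   # assumed threshold; the chapter leaves it unspecified
```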
Backup and Recovery: To incorporate the recovery and backup function,
Amoeba utilizes the Out-Of-Band (OOB) section of a page. The page’s OOB seg-
ment includes the RARI value and Backup Page Number. The backup technique
initially tests whether the write request follows the read-after-write pattern for every
overwrite request. Subsequently, it verifies if the present valid page RARI value is
greater than the specified threshold value. Amoeba considers it a ransomware write request if both requirements are met; if either is not met, it is treated as a regular write request.
Advantages
• For quick ransomware detection, Amoeba inserts dedicated RARI computing hardware in the DMA module.
• It autonomously backs up and restores the data.
• RARI is implemented in the FTL at low cost.
Disadvantages
• Detecting ransomware attacks and managing backup data for arbitrary devices has limited efficiency.
• When the backup page space becomes too large, the SSD invalidates low-risk backup pages, i.e., those with a low likelihood of being infected with ransomware; this limits the backup page size in order to keep the SSD's performance stable.
6.3.4 A cloud-based automatic recovery and backup system for video compression
Megha et al. explained that cloud computing provides data recovery and backup efficiently. It is the technique of storing data or services in a remote place from which one can gain access to them from anywhere and at any time [10]. Compression techniques are an integral part of cloud computing, as they decrease the bandwidth required for storing and accessing information from the cloud. Data are backed up both locally and in the cloud. In the local backup, video files are compressed with the help of FFMPEG (Fast Forward MPEG (Motion Picture Experts Group)) [11]. The cloud backup takes place in Microsoft OneDrive, where graphs of compressed and uncompressed files are examined for reduced bandwidth, and the user alone can access the data from the cloud at any time from anywhere across the world. Thus, two backups are carried out, which helps protect the data, along with compression techniques for reduced bandwidth. The core objective of the chapter is to provide two backups and video compression in the cloud.
The existing Dropbox storage system does not have a recovery methodology. If data are lost on the local system, they are automatically lost from Dropbox as well. Secondly, storing heavy files such as audio and video in Dropbox consumes a large amount of Internet bandwidth. This makes it difficult for a user to back up and recover data. Hence, moving to cloud-based recovery and backup, using a Dropbox-like system with automatic recovery, is vital. The video files are compressed in the background and stored in the cloud. The background entities involved in the proposed system are as follows:
• SkyDrive API library: provided by the Microsoft cloud services for storing a large amount of data; it is also called OneDrive or SkyNet.
• FFMPEG: a multimedia framework used for the compression of video and audio files.
• Cloud Storage as a Service: the provision of storage for huge amounts of data as a service in the cloud.
• File System Tracker: a web-based program for monitoring and transferring items.
To communicate with the cloud from the system, .NET is an efficient language for the browser. Three processes must run automatically and simultaneously in local storage as well as cloud storage: automatic backup, the recovery system and video compression (a sketch of the compression step is given below). The system performs a pre-backup check daily.
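As a minimal sketch of that background compression step, assuming the standard ffmpeg command-line tool with its libx264 encoder is installed; the paths and the upload helper are hypothetical, not from the chapter.

```python
import subprocess
from pathlib import Path

def compress_video(src: Path, dst_dir: Path) -> Path:
    """Compress a video with the libx264 encoder before backing it up to the cloud."""
    dst = dst_dir / (src.stem + "_compressed.mp4")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-c:v", "libx264", "-crf", "28",   # higher CRF = smaller file, lower quality
         "-preset", "veryfast",
         str(dst)],
        check=True,
    )
    return dst

# compressed = compress_video(Path("lecture.avi"), Path("staging/"))
# upload_to_onedrive(compressed)   # hypothetical upload helper
```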
Advantages
• Cloud-based recovery and backup uses a Dropbox-like system with automatic recovery. It helps in recovering data from cloud storage even when the data have been deleted from local storage; this is called dual backup.
• The system also provides video compression with the help of the libx264 encoder, performed automatically in the background before files are backed up to cloud storage.
• The system also provides high reliability of user data.
6.3.5 Efficient and reliable data recovery techniques in cloud computing
Praveen et al. described that cloud computing provides various computing services for global communication; one such service is SaaS [12]. Sensitive and essential data are stored in the cloud at a remote site and can be accessed from anywhere at any time, and if there is a failure or disaster, these data should be kept secure. For this, data backup and recovery methods should be used to preserve the level of security [13].
Many methods exist for recovering data from the cloud, but most of them are inefficient and unreliable. A system with three modules, named the Enriched Genetic Algorithm (EGA), has been proposed to achieve efficiency and reliability in the data recovery process. The modules are the user module, the remote backup server and the main cloud server, as shown in Figure 6.5. Data are stored in both the main cloud server and the remote server for efficiency, and more backup servers are used to provide reliability [14–17].
The procedure followed in EGA is as follows (a sketch of the hash-verified recovery step follows the list):
• The user uploads a file to the main cloud server and to a number of backup cloud servers.
• A hash code H1 of the file is generated and stored in the database.
• The number of replica copies to be stored is selected.
• The size of the file is calculated.
• To download a file, the user selects the file and downloads it. If it is not in the main cloud, it is recovered from the backup storage.
• During recovery, a hash code H2 is generated for the retrieved file.
• If the hash codes H1 and H2 are equal, the original file has been recovered.
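A minimal sketch of this hash-verified recovery step is given below, with directories standing in for backup servers and MD5 (named among the advantages below) as the hash function; the helper names are ours, not the chapter's.

```python
import hashlib
from pathlib import Path
from typing import Optional

def md5_of(path: Path) -> str:
    """Hash code used to verify integrity of an uploaded or recovered file."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def recover(file_name: str, h1: str, backup_dirs: list) -> Optional[Path]:
    """Try each backup server (modelled here as a directory) until a replica
    whose hash H2 matches the stored hash H1 is found."""
    for server in backup_dirs:
        candidate = server / file_name
        if candidate.exists() and md5_of(candidate) == h1:  # H1 == H2
            return candidate
    return None
```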
Advantages
• If data are lost on the main server, they can be recovered from the backup cloud servers.
• Reliability is provided by creating a number of backup servers.
• The multi-server system increases data availability.
• A hash function such as MD5 is used to provide integrity.
• Files are uploaded to either a two-block or a four-block server.
• The four-block server provides flexibility, i.e., lost files can be recovered from any of the backup servers.
Disadvantages
• Requires more storage space.
• The four-block server takes more upload and recovery time than the two-block server.
• A replica of the same file is created N times.
De-duplication is a vital part of modern backup: it splits data streams into variable-length chunks and then substitutes duplicate chunks with pointers to their pre-stored copies. A de-duplication method recognizes each chunk via its hash code, i.e., its fingerprint. In practice, however, because chunks have varying lengths, the de-duplication method handles data in a bigger unit known as a container. A container is fixed in size and is commonly used as the reading and writing unit. To preserve locality and reduce network traffic, Neptune packs the chunks into containers, and it often utilizes the container as a pre-fetching unit. A Least Recently Used (LRU) algorithm is used to evict cached containers.
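A minimal sketch of fingerprint-based de-duplication is given below; it assumes ready-made chunks and an in-memory dictionary as the chunk store, and it ignores container packing for brevity.

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by its hash code (its fingerprint)."""
    return hashlib.sha1(chunk).hexdigest()

def deduplicate(chunks: list, store: dict) -> list:
    """Replace duplicate chunks with pointers (fingerprints) into the chunk store."""
    recipe = []
    for chunk in chunks:
        fp = fingerprint(chunk)
        store.setdefault(fp, chunk)   # store the chunk only the first time it is seen
        recipe.append(fp)             # the file becomes a list of pointers
    return recipe

def restore(recipe: list, store: dict) -> bytes:
    return b"".join(store[fp] for fp in recipe)

store: dict = {}
data = [b"alpha", b"beta", b"alpha", b"alpha"]   # duplicate chunks
recipe = deduplicate(data, store)
assert restore(recipe, store) == b"alphabetaalphaalpha"
assert len(store) == 2   # only two unique chunks are actually kept
```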
The second component is a rapid and more powerful delta compressor. It exploits correlations between files and can generate drastically smaller compressed files: to enhance network transmission, only the difference (or delta) between two files is transmitted, which takes considerably fewer bits. Delta compression uses a compressor with two inputs, where one input is the destination file to be compressed and the other is a source file used for reference. The delta encoder identifies the discrepancies between the destination and the source files; the decompressor then uses the delta and the source data to produce an exact replica of the destination.
Delta Compression and Recovery: a delta-based compression scheme computes the differences between several similar files and a chosen base file, providing an economical and effective recovery solution. Users can pick the base file manually, or Neptune can select it automatically using well-known and powerful clustering algorithms. Neptune then needs to maintain only the base file and the deltas of the other de-duplicated files against it. This delta-based architecture dramatically reduces the consumption of device resources while still enabling data recovery. To speed up retrieval during data recovery, Neptune keeps all base files, even those referenced only by other users' deltas. When customers want to retrieve information, Neptune performs delta decoding by combining the base files with the stored deltas (a sketch follows).
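A minimal sketch of delta encoding and decoding against a base file is given below, built on Python's difflib purely for illustration; the chapter does not specify Neptune's actual delta compressor at this level of detail.

```python
import difflib

def make_delta(source: bytes, target: bytes) -> list:
    """Encode target as edit operations against source (the base file)."""
    ops = []
    sm = difflib.SequenceMatcher(None, source, target)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse bytes from the source
        else:
            ops.append(("insert", target[j1:j2]))  # literal bytes from the target
    return ops

def apply_delta(source: bytes, ops: list) -> bytes:
    """Reproduce the target from the base file plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += source[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

base = b"backup snapshot version 1: hello cloud"
new = b"backup snapshot version 2: hello multi-cloud"
delta = make_delta(base, new)
assert apply_delta(base, delta) == new   # exact replica of the destination
```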
Advantages
• The key ideas are similarity detection and delta chains, together with compression and restoration, evaluated on real industrial datasets.
• It provides remote backups at a low cost.
• It avoids additional computation on the deltas, which reduces latency.
• Neptune often uses shortcut delta chains to speed up data recovery.
Disadvantages
• Time consumption is high.
• Due to the complexity of the procedures, financial inefficiency is likely to occur.
Cloud Disaster Recovery (CDR) is predominantly an IaaS approach that backs up data to a remote, off-site cloud server database. IT systems are used by an increasing number of providers, including financial and non-financial services. Many significant businesses and government services use disaster management processes to protect key data and to reduce the downtime triggered by disasters and device defects [19–33].
The methods used in CDR include regular backup or persistent information synchronization, and the preparation of a standby network in geographically isolated areas [34–37]. Figure 6.6 shows disaster recovery in the multi-cloud environment.
Advantages
• Using multiple clouds not only allows picking the most cost-effective options but also allows choosing the best cloud services to fulfil specific business needs.
• Adopting a multi-cloud strategy can also enable businesses to avoid vendor lock-in, decreasing their dependency on a single cloud provider.
• Using multi-cloud disaster recovery, one can replicate resources to a secondary cloud provider in another location.
• The respective cloud providers' Disaster Recovery (DR) services are designed to work with numerous cloud suppliers.
Disadvantages
• If the system is not managed correctly, the costs of using it may increase and business agility will be affected.
• A secure approach across the multi-cloud environment is a crucial area that must be handled effectively.
Table 6.2 Data recovery and backup cloud computing techniques

Technique: DR-Cloud, a multi-cloud-based disaster recovery model.
Advantages: allows replication of resources to a different geographic region using a second cloud provider.
Disadvantages: the methodology is made more difficult by complicated security services and the multi-cloud approach.

Table 6.2 summarizes the data recovery and backup techniques in cloud computing, listing each technique with its advantages and disadvantages.
6.4 Conclusion
References
[6] Min D., Park D., Ahn J., et al. ‘Amoeba: an autonomous backup and recovery
SSD for Ransomware attack defense’. IEEE Computer Architecture Letters.
2018, vol. 17(2), pp. 245–8.
[7] Monisha S., Venkateshkumar D.S. ‘Cloud computing in data backup and
data recovery’. International Journal of Trend in Scientific Research and
Development. 2018, vol. 2(6), pp. 865–7.
[8] Dharanyadevi P., Therese M.J., Venkatalakshmi K. ‘Internet of things‐
based service discovery for the 5G‐VANET milieu’ in Cloud and IoT-Based Vehicular Ad Hoc Networks. Wiley; 2021. pp. 31–45.
[9] Surbiryala J., Rong C. ‘Data recovery and security in cloud’. 9th International
Conference on Information, Intelligence, Systems and Applications (IISA),
Zakynthos, Greece; IEEE, 2018.
[10] Therese M.J., Devi A., Kumar T. A. 'Interfacing FOG and cloud computing
for IOT applications'. Recent Developments In Computing, Electronics and
Mechanical Sciences. 2020.
[11] Raigonda M.A., Raigonda M.R., Raigonda M.R. 'A cloud based automatic
recovery and backup system with video compression'. International Journal
Of Engineering And Computer Science. 2016, vol. 5(9).
[12] Therese M.J., Ezhilarasi C., Harshitha K., Jayasri A. ‘Secured data partition
and transmission to cloud through FOG computing for IOT application’.
IJAST. 2020, vol. 29(11), pp. 921–31.
[13] Kumar S., Ranjan P., Ramaswami R. ‘EMEEDP: enhanced multi-hop energy
efficient distributed protocol for heterogeneous wireless sensor network’.
Proceedings of the 5th International Conference on Communication Systems
and Network Technologies, CSNT 2015; Gwalior, India, 4-6 Apr 2015;
2015. pp. 194–200.
[14] Kumar S., Ranjan P., Ramaswami R. ‘Energy optimization in distrib-
uted localized wireless sensor networks’. Proceedings of the International
Conference on Issues and Challenges Intelligent Computing Technique
(ICICT); Ghaziabad, India, IEEE, 2014.
[15] Chauhan R., Kumar S. ‘Packet loss prediction using artificial intelligence uni-
fied with big data analytics, internet of things and cloud computing technolo-
gies’. 5th International Conference on Information Systems and Computer
Networks (ISCON); Mathura, India, 2021. pp. 1–6.
[16] Sudhakaran S., Kumar S., Ranjan P., Tripathy M.R. ‘Blockchain-based
transparent and secure decentralized algorithm’. International Conference
on Intelligent Computing and Smart Communication 2019. Algorithms for
Intelligent Systems; Singapore: Springer. THDC-Institute of Hydropower
Engineering and Technology, Tehri, India, 2020.
[17] Kumar S., Trivedi M.C., Ranjan P. Evolution of Software-Defined Networking
Foundations for IoT and 5G Mobile Networks. Hershey, PA: IGI Publisher;
2020. p. 350.
[18] Devi A., Julie Therese M., Premalatha G. ‘Cloud computing based intelli-
gent bank locker system’. Journal of Physics: Conference Series. 2021, vol.
1717(1), p. 012020.
[19] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy efficient mul-
tichannel MAC protocol for high traffic applications in heterogeneous wireless
sensor networks’. Recent Advances in Electrical & Electronic Engineering.
2017, vol. 10(3), pp. 223–32.
[20] Ali S.A., Affan M., Alam M. ‘A study of efficient energy management tech-
niques for cloud computing environment’. 9th International Conference on
Cloud Computing, Data Science & Engineering (Confluence); 2019.
[21] Hua Y., Liu X., Feng D. ‘Cost-efficient remote backup services for enterprise
clouds’. IEEE Transactions on Industrial Informatics. 2016, vol. 12(5), pp.
1650–7.
[22] Gu Y., Wang D., Liu C. ‘DR-cloud: multi-cloud based disaster recovery service’. Tsinghua Science and Technology. 2014, vol. 19(1), pp. 13–23.
[23] Chakraborty B., Chowdhury Y. ‘Disaster recovery: background’. Introducing
Disaster Recovery with Microsoft Azure. 2020, pp. 1–41.
[24] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[25] Kumar S., Cengiz K., Vimal S., Suresh A. ‘Energy efficient resource migra-
tion based load balance mechanism for high traffic applications IoT’. Wireless
Personal Communications. 2021, vol. 10(3).
[26] Haidar M., Kumar S. ‘Smart healthcare system for biomedical and health care applications using aadhaar and blockchain’. 5th International Conference on
Information Systems and Computer Networks (ISCON); Mathura, India, 22-
23 Oct 2021; 2022. pp. 1–5.
[27] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic Torus
network-on-chip architecture’. International Journal of Innovative Technology
and Exploring Engineering. 2019, vol. 8(6), pp. 1672–6.
[28] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
[29] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[30] Singh P., Bansal A., Kamal A.E., Kumar S. ‘Road surface quality monitoring
using machine learning algorithm’ in Reddy A.N.R., Marla D., Favorskaya
M.N., Satapathy S.C. (eds.). Intelligent Manufacturing and Energy
Sustainability. Smart Innovation, Systems and Technologies. 265. Singapore:
Springer; 2022.
[31] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘A utility maximiza-
tion approach to MAC layer channel access and forwarding’. Progress in
Electromagnetics Research Symposium, Jan; 2015. pp. 2363–7.
[32] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘An NS3 implementa-
tion of physical layer based on 802.11 for utility maximization of WSN’.
Proceedings of the International Conference on Computational Intelligence
and Communication Networks, CICN 2015; 2016. pp. 79–84.
[33] Reghu S., Kumar S. ‘Development of robust infrastructure in networking to
survive a disaster’. 4th International Conference on Information Systems and
Computer Networks, ISCON 2019; 2019. pp. 250–5.
[34] Singh P., Bansal A., Kumar S. ‘Performance analysis of various information
platforms for recognizing the quality of Indian roads’. Proceedings of the
Confluence 2020-10th International Conference on Cloud Computing, Data
Science and Engineering; 2020. pp. 63–76.
[35] Sharma A., Awasthi Y., Kumar S. ‘The role of blockchain, AI and IoT for
smart road traffic management system’. Proceedings of the IEEE India
Council International Subsections Conference; INDISCON 2020; 2020. pp.
289–96.
[36] Kumar S., Ranjan P., Singh P., Tripathy M.R. ‘Design and implementation of
fault tolerance technique for Internet of things (IoT)’. Proceedings of the 12th
International Conference on Computational Intelligence and Communication
Networks, CICN 2020; 2020. pp. 154–9.
[37] Chauhan R., Kumar S. ‘Packet loss prediction using artificial intelligence uni-
fied with big data analytics, Internet of things and cloud computing technolo-
gies’. 5th International Conference on Information Systems and Computer
Networks; 2021. pp. 01–6.
Chapter 7
An adaptive software-defined networking (SDN)
for load balancing in cloud computing
Swati Lipsa 1, Ranjan Kumar Dash 1, and Korhan Cengiz 2

1 Department of IT, Odisha University of Technology and Research, India
2 College of IT, University of Fujairah, UAE

7.1 Introduction

“The Internet of Things (IoT) is a network of physical objects (or ‘things’) equipped with sensors, software, as well as other technologies that facilitate interaction and exchange of information between devices and systems via the internet” [1]. This
means the Internet of Things (IoT) is a network that comprises internet-enabled
smart devices that can gather, transmit, and process data from their application envi-
ronments using embedded devices like sensors, CPUs, and different types of com-
munication hardware. IoT devices can be connected to an IoT gateway to exchange
the acquired sensor data from various devices. The sensor data can then be pro-
cessed locally or routed to the cloud for further analysis. The data collected by these
devices can be translated into valuable information such as finding patterns, provid-
ing suggestions, helping in decision-making, and identifying potential problems that
may arise without the need for human intervention in the majority of the cases. In
this manner, an IoT application interacts with smart devices that automate processes
to meet certain requirements. The application domain ranges from the medical and
healthcare sector to smart home, industrial automation, smart cities, smart grids,
transportation and logistics, retail, and so on. This paves the way for advanced ser-
vices aimed at raising the standard of living.
The IoT is perceived as a key frontier that has the ability to improve nearly all
aspects of our life. It enables humans to live and work smartly and achieve control
over their daily activities to a large extent. Many businesses have started to use this
technology because it gives firms a real-time glimpse of how their systems operate,
providing insights into anything from machine performance to logistics operations
and supply chain management. This technology comes with numerous benefits,
some of which include: the capability to monitor and manage activities, saving
money, time, and resources by automating processes, allowing analytical decisions
to be made more quickly and precisely, improving the Quality of Service (QoS), and
enhancing customer experience. The IoT still has a lot of unexplored opportunities.
This technology may attain its full potential if all devices are able to interact with
one another, independent of brand or company.
As the number of devices appears to be escalating by leaps and bounds, and
there is an enormous amount of data being generated, the traditional network infra-
structure is struggling to catch up with this new digital era. The traditional network
infrastructure fails to effectively manage the network resources, especially during
peak hours. Traditional networking follows an old-fashioned strategy that uses fixed and dedicated hardware devices like routers and switches to monitor network traffic. Traditional networks are becoming more sophisticated as
more protocols are used to increase reliability and network speeds. The absence of
open and standard interfaces hinders interoperability. Network devices are limited in
their ability to evolve since they are built on proprietary hardware and software and
are static. The traditional networks were well adapted to static traffic patterns and
featured a plethora of protocols tailored to particular tasks. Due to the complexity of
traditional network components, making modifications to networks to accommodate flexible traffic patterns has become extremely difficult.
Traditional networks have a high management overhead. In traditional networking
infrastructures, businesses are frequently tied to a specific vendor, without any standard
procedures for configuring devices across network providers. It is becoming increas-
ingly challenging for network administrators to manage various network devices and
interfaces from various vendors. A network upgrade necessitates alterations to the con-
figuration of numerous devices. Furthermore, networks must expand to accommodate
hundreds or thousands of new devices with varying performance and service require-
ments. When it comes to configuring the network system, the network provider faces
significant challenges in making modifications to the network system, as individual
devices must be manually programmed for any change in the network infrastructure.
Scalability also degrades, because congested networks cannot be rapidly reconfigured to serve critical traffic. The conventional method
of network management is being rendered outdated by the increasingly large amounts
of data traffic, complicated network design, and growing expectations to enhance the
performance of the network. The inability to fulfill these demands is a big drawback of
traditional networks. These limitations make it difficult for IoT devices to communicate
in a versatile and reliable manner [2].
In the light of the aforementioned issues, Software-Defined Networking (SDN)
is perceived to be a technology enabler for delivering adequate solutions [3]. It has
emerged as a modern and promising model for migrating from existing traditional
networks and facilitates users with more programmability and easier resource man-
agement. It is a networking architecture in which the control plane (network’s con-
trol functions) and data plane (packet forwarding) are decoupled and the network
controller is centralized. This architecture varies from conventional networks that
regulate network traffic through specialized hardware devices such as switches and
routers. SDN allows the use of software to establish and manage a virtual network,
as well as traditional hardware.
There are numerous motivating reasons underlying the development of SDN.
Some of them include: faster provisioning and administration of network resources; the potential to simplify statically designed networks; a convenient overview of the whole network from a single place; cost-effective operation, as administrative costs decrease and server utilization increases; adherence to open standards; and compatibility with network hardware from any manufacturer. The objective of SDN is to enable network administrators and enterprises to respond quickly
to changing business requirements. SDN can be updated swiftly and in bulk, with-
out reconfiguring each device manually. Since this architecture optimizes the net-
work operations and can be programmed through software, it is more versatile, agile, and scalable, and fits well with cloud architecture. As a result, SDN has
become an important component of a cloud-based framework [4]. The advantages of
integrating SDN with cloud computing include operational cost reduction, security,
reduced downtime, cloud abstraction, etc. These capabilities of SDN-enabled cloud
computing make this an ideal choice for effectively managing the dynamic nature
of IoT.
Despite its diverse capabilities and unique characteristics, SDN introduces new
challenges for network engineers and service providers emphasizing on energy effi-
ciency, performance, security, load balancing, and virtualization [5]. To make the
maximum utilization of an SDN-based cloud network, the system must balance the
load among multiple devices in the network, and our work concentrates on harness-
ing the benefits of a balanced SDN network.
The rest of this chapter has been organized as follows: Section 7.2 delves into
related studies on load balancing with SDN controllers. Section 7.3 describes the
basic background related to SDN architecture in a subtle way. Section 7.4 specifi-
cally discusses the taxonomy of load balancing in SDN. Section 7.5 provides an
insight into the implementation of our proposed algorithms. An illustration of the
proposed algorithm is given in Section 7.6. The proposed mechanism’s performance
is discussed and the outcomes are analyzed in Section 7.7. The summary of the work
as well as future research potential is presented in Section 7.8.
7.2 Related works
The work carried out in [6] introduces a distributed architecture- based load-
balancing technique for SDN controllers called a dynamic and adaptive algorithm,
which is demonstrated by implementing a prototype system based on floodlight. The
OpenFlow switch and controller packages [7] are used to create an SDN-based cloud
computing platform that enables certain functions such as energy conservation, load
balancing, etc. To resolve the issue of managing elephant flows in data center net-
works, an SDN-based load-balancing (SBLB) strategy for elephant flows [8] is discussed, which adapts multiple routing paths to distribute the load in response to changing load conditions. The paper [9] presents OpenFlow Round-Robin (RR) and
OpenFlow Least-Connections algorithms to overcome the limitations of conven-
tional load-balancing algorithms. It is found that the OpenFlow Least-Connections
algorithm performs better as compared to the OpenFlow RR algorithm in balancing
the load and has a shorter response time. A middlebox based on the Clos network
and SDN is developed [10] to improve bandwidth usage while ensuring QoS for
data centers. The authors in [11] present a technique based on employing an appro-
priate configuration for the SDN controller operating parameters in order to mini-
mize the SDN controller computational load by modifying the activity of the control
functions with little influence on the efficiency of the functions. An architecture
facilitating software-defined clouds is formulated in [5] that emphasizes on mobile,
web, and enterprise cloud applications, and the same is evaluated for two use cases,
namely, QoS-aware bandwidth allocation and bandwidth-aware energy-efficient
VM (Virtual Machine) placement.
The dependence on a single controller can lead to scalability issues, so Sufiev
and Haddad [12] have proposed a multi-controller architecture as a solution that
enables dynamic load balancing and reduces the interdependence between Super
Controller and Regular Controller. The load- balancing technique developed in
Reference [13] is based on multiple distributed controllers. Furthermore, the experi-
ment was done using floodlight and exhibits that this technique can dynamically bal-
ance each controller’s load while minimizing load-balancing time. The paper [14]
presents a link load-balancing approach based on Ant Colony Optimization (LLBACO) that can balance network link traffic, increase QoS, and reduce network overhead.
The work carried out in Reference [15] introduces an SDN-enhanced Inter-cloud Manager (S-ICM) that assigns network flows in the crowded cloud network, acting at the gateway between the user and the server. Due to the dense traffic inside the network, bottlenecks may arise at gateways. Hence, the load of the networks discussed earlier needs to be distributed evenly to avoid congestion. While some
studies concentrated on static load balancing, some on dynamic load balancing, and
others on diverse multipath routing algorithms to control traffic, they all had one
thing in common: they were all about traffic management. There are also shreds of
evidence of developments that have integrated both load balancing and routing to
resolve the bottleneck at gateways.
The load-balancing techniques in the algorithms discussed above, however, are practically insufficient to deliver substantial efficiency, even in a multi-controller distributed SDN architecture. In a distributed architecture, multiple controllers can be arranged either in a flat
or hierarchical manner. The controllers arranged in a hierarchical fashion typically
consist of a controller on top level that acts as the super controller and manages the
load among local controllers. However, the possibilities of a single point of failure
cannot be ruled out as there is only a single super controller in the hierarchy. The
existing studies pertaining to multiple controllers do not discuss the way of finding
an alternative in case of a super controller failure as mentioned earlier. Our proposed
algorithm promises to cater to the needs of an effective load-balancing mechanism
by finding a replacement strategy in case of a super controller failure in a hierarchi-
cally distributed SDN architecture. This makes it a niche contribution in the area of load balancing.
7.3 SDN architecture

1. Infrastructure layer: networking devices that monitor the network’s routing and
data processing capabilities are orchestrated in this layer. These devices can be
a collection of switches and routers in a data center responsible for handling
packets according to the rules set by a controller. This layer deals with gathering
various network parameters like the flow of traffic, topology, network utiliza-
tion, etc., and sends them to the control layer. This layer acts as the physical
layer over which network virtualization is established using the control layer.
2. Control layer: this layer is perceived as the brain of SDN. The SDN controller
serves as a bridge between the application and infrastructure layer. This layer
owes its intelligence to the centralized SDN controller software that controls net-
work infrastructure. The controller accepts the application layer’s requirements
and conveys them to the networking devices. It also sends back the information
fetched from networking devices to the application layer.
3. Application layer: services and applications running in this layer describe the
behavior of a network. These applications are programs that employ APIs to
communicate their desired network requirements and behavior to the SDN con-
troller. Furthermore, these applications can provide the network’s abstract view
by obtaining information from the controller for appropriate decision-making.
There are numerous types of applications that can be developed like network
configuration and management, intrusion detection systems, traffic monitoring
and control, business policies, etc. In the real world, these applications offer a
range of end-to-end solutions for corporate and data center networks.
The APIs act as the control point for all network components. The APIs in the
SDN framework are known as southbound and northbound interfaces that repre-
sent the communication between controllers, network devices, and applications. A
southbound interface enables a network component, i.e., controllers to interact with
lower-level components, i.e., switches and routers. OpenFlow is a southbound API
used by the administrators to add, amend, and delete entries in the internal flow table
of network switches and routers, allowing the network to accommodate real-time
traffic demands. On the contrary, a northbound interface facilitates communication
between higher-level components, i.e., it is the connection between the controller
and the application. The end-user application conveys its requisites to the network,
such as data, storage space, bandwidth, and so on, and the network replies to the
application with the appropriate resource based on the resource availability.
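As a toy illustration of these southbound add/amend/delete operations, consider the minimal sketch below; the match fields and action strings are illustrative stand-ins, not the actual OpenFlow message format.

```python
# A toy flow table in the spirit of what a southbound API such as OpenFlow
# installs on a switch: match fields -> forwarding action.
flow_table: dict = {}

def add_flow(match: tuple, action: str) -> None:
    flow_table[match] = action            # "add or amend" an entry

def delete_flow(match: tuple) -> None:
    flow_table.pop(match, None)           # "delete" an entry

def forward(packet_match: tuple) -> str:
    # A table miss is punted to the controller, as in OpenFlow's reactive mode.
    return flow_table.get(packet_match, "send_to_controller")

add_flow(("10.0.0.1", "10.0.0.2", 80), "output:port2")
assert forward(("10.0.0.1", "10.0.0.2", 80)) == "output:port2"
assert forward(("10.0.0.9", "10.0.0.2", 22)) == "send_to_controller"
```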
Although a controller utilizes just one-third of the OpenFlow protocol, it is vital: a controller manages and regulates all of the switches and routers, creates a virtualized network, and forwards incoming packets. Taking
into consideration the significance of controller in the SDN framework and the var-
iegation of architectures and deployments in the business and research fields, there
is a demand to evaluate and benchmark all of these options with respect to numerous
performance indicators. When the traffic increases due to the increase in dynamic
arrival of incoming packets, the potentiality of a single controller to provide service
for the overwhelming traffic decreases in terms of processing power, bandwidth lim-
its, and memory. This leads to the network relying on a single SDN controller facing
a catastrophic bottleneck making it the point of failure [31]. Due to this limitation,
the SDN network is susceptible to face issues related to packet loss, scalability, load
balancing, reliability, and performance degradation. Therefore, this situation neces-
sitates the role of multiple controllers to address concerns pertaining to load balanc-
ing and performance improvement.
7.4 Taxonomy of load balancing in SDN

Load balancing refers to the approach of distributing computational tasks among a set
of resources. It is the technique of sharing network traffic across multiple servers. The
goal of load balancing is to deal with unexpected traffic surges, maximize throughput,
reduce latency and response time, speed up performance, and avoid overburdening any single resource. Figures 7.1 and 7.2 depict an overview of a network
without and with load balancing, respectively.
7.4.1.1 Distributed architecture
Distributed architecture deals with the dispersion of loads over the entire set of oper-
ating nodes rather than relying completely on a single node. The controllers and
switches both act as nodes and have many-to-many interactions among themselves.
Here, every controller operates as a super controller, capable of making decisions based on the network it is operating on. However, before an overloaded controller can decide to reroute incoming packets, it must first acquire load information from other
controllers. This load information is acquired on a real-time basis [32]. This approach
promises to exhibit improved scalability over centralized architecture. There are two
types of distributed control plane layouts, i.e., flat and hierarchical [11].
7.4.1.2 Hybrid architecture
Hybrid serves as an architecture where both distributed and centralized architecture
coexist [33]. This architecture has the advantage of distributed processing and cen-
tralized control where the distributed processing part is borrowed from distributed
architecture and the latter from the centralized architecture.
7.4.1.3 Centralized architecture
A specialized super controller is used in the centralized approach for gathering the load-
ing statuses of all other controllers and coordinating traffic management. This special-
ized super controller acts as a coordinator that maintains the global controller load infor-
mation table. When there is an uneven load distribution among controllers, the controller
decides to balance the load using the load information table. However, when the super
controller fails or becomes overloaded, the whole network comes down due to a single
point of failure, just like when only one controller is employed [22].
7.5 Problem statement
“The Bully algorithm is a method for dynamically electing a coordinator or leader from a group of distributed computer processes. The process with the highest process ID number from among the non-failed processes is selected as the coordinator” [34–42].
There can be different strategies for the replacement of super controllers
depending on the kind of applications and the number of super controllers that fail or
become overwhelmed in certain instances of time. In case of failure of a single super
controller, the same can be replaced by another super controller, whereas the failure
of multiple super controllers can be handled by replacement of inactive super con-
trollers by another set of multiple super controllers. We have applied the Modified
Bully algorithm for efficient determination of such replacement strategies of super
controllers in case of failures.
The Modified Bully algorithm treats super controllers in the same way that the
Bully algorithm handles processes. A Modified Bully algorithm is a leader elec-
tion algorithm that dynamically elects a coordinator or leader from a collection of
passive super controllers. In this algorithm, weight is assigned to the passive super
controller based on the hop distance between the passive and active super controller.
The passive super controller with the highest assigned weight from among the avail-
able passive super controllers is selected as the candidate replacement for the failed
super controller [43–51].
When a super controller becomes overloaded and fails, the system undertakes
load balancing and redirects packet flows to another super controller with a lower
load. As a result, the proposed SDN architecture’s load management technique
focuses on coordination among all super controllers.
Modified Bully algorithm entails the following steps:
1. Each passive super controller has been assigned a unique weight with respect to
each active super controller.
2. Every active super controller knows the assigned weight to each of the passive
super controllers.
3. The passive super controller that detects the failure of the active super controller
initiates the election.
4. Several passive super controllers can initiate an election simultaneously, and
the passive super controller with the highest assigned weight is elected as the
new coordinator.
There are two types of super controllers, viz. active and passive. Each active
super controller exhibits two states, i.e., ACTIVE and FAILED, whereas each passive
super controller can take up two states, i.e., AVAILABLE and NOT_AVAILABLE.
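A minimal sketch of this election step is given below; the data model and function names are ours, not the chapter's, and the weights follow the hop-distance illustration given in Section 7.6.

```python
from dataclasses import dataclass, field

@dataclass
class SuperController:
    name: str
    active: bool                                  # ACTIVE super controllers manage networks
    available: bool = True                        # passive states: AVAILABLE / NOT_AVAILABLE
    weights: dict = field(default_factory=dict)   # weight w.r.t. each active super controller

def elect_replacement(failed: str, controllers: list) -> SuperController:
    """Modified Bully election: among AVAILABLE passive super controllers, the one
    with the highest weight assigned w.r.t. the failed active super controller
    becomes the new coordinator."""
    candidates = [c for c in controllers
                  if not c.active and c.available and failed in c.weights]
    return max(candidates, key=lambda c: c.weights[failed])

# Hop-distance weights from the chapter's example (S1 as the starting point).
s2 = SuperController("S2", active=False, weights={"S1": 4})
s4 = SuperController("S4", active=False, weights={"S1": 2})
s6 = SuperController("S6", active=False, weights={"S1": 3})

assert elect_replacement("S1", [s2, s4, s6]).name == "S2"   # highest weight wins
```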
7.5.2 Network setup
Let S be the set of super controllers and Si be the ith super controller. Let N be the set of networks and Nj be the jth network, such that Si → Nj denotes that the jth network is managed by the ith super controller. A controller is active if it has at least one network to handle; otherwise, it is passive. An active controller possesses the flow table of the network under its control, whereas a passive controller maintains the flow tables of all networks controlled by the active controllers. Let W be the weight assigned to passive super controllers, with Wk the weight of the kth super controller, and let W′ be the weight assigned to active controllers such that W′ > W. Let m be the number of packets to be processed by super controller Si, and let C be the capacity of a super controller to process incoming packets, such that C(Si) is the capacity of the ith super controller.
The function process(Si, m) represents that the ith super controller processes m packets; a sketch of a capacity-aware version is given below.
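The following is a minimal sketch of such a capacity-aware process(Si, m); the one-step redirect policy (spill the overflow to the peer with the most spare capacity) is our simplifying assumption about how flows are redirected to a less-loaded super controller.

```python
def process(si: str, m: int, capacity: dict, load: dict) -> None:
    """Process m packets on super controller si; if si would exceed its capacity
    C(si), redirect the overflow to the peer with the most spare capacity."""
    accepted = min(m, max(capacity[si] - load[si], 0))
    load[si] += accepted
    overflow = m - accepted
    if overflow > 0:
        peer = max((s for s in capacity if s != si),
                   key=lambda s: capacity[s] - load[s])
        load[peer] += overflow   # a real system would re-check the peer's capacity too

capacity = {"S1": 100, "S3": 100, "S5": 100, "S7": 100}   # C(Si), illustrative values
load = {s: 0 for s in capacity}
process("S1", 250, capacity, load)
assert load["S1"] == 100 and sum(load.values()) == 250    # overflow spilled to a peer
```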
7.6 Illustration
The illustration of the above configuration (i.e., Figures 7.4 and 7.5) can be explained as follows.
For instance, S consists of seven super controllers, namely S1, S2, S3, S4, S5, S6, and S7, where S1, S3, S5, and S7 are active and manage networks N1, N2, N3, and N4, respectively, as shown in Figure 7.6. The super controllers S2, S4, and S6 are passive. Weights are assigned to the passive controllers (i.e., S2, S4, and S6) on the basis of the number of hops from the starting point. Assuming that S1 is the starting point, S1 is one hop away from S2, so the weight assigned for S1 to S2 = 4. Likewise, counting only passive controllers, the weight assigned for S1 to S6 = 3 and for S1 to S4 = 2. In a typical scenario, the following conditions might arise while working with such networking environments.
i. S1 fails, S2 available: suppose S4 tries to communicate with S1 and receives no reply from S1. S4 thus learns that S1 has failed and starts the election for replacing it.
Figure 7.4 Network configuration of the super controller and local controller
The simulation of the proposed model is performed using the Python programming language with the help of the POX tool. POX is an open-source platform for building SDN controllers that communicate with switches via OpenFlow. The super controllers as well as the switches are configured to operate over virtual networks.
The following two criteria are taken into consideration for the simulation of the proposed model:
Case I: allowing the super controllers to transmit packets to the appropriate network without using the Modified Bully algorithm, even if one or more super controllers are down.
Case II: when a super controller fails, the Modified Bully algorithm is employed to find a replacement for it.
Both scenarios are compared to determine the performance in terms of various metrics such as throughput, packet transmission ratio (PTR), and packet loss (as shown in Figures 7.7–7.9).
7.7.1 Comparison of throughput
The observations of the simulation of the two scenarios are recorded once one or more super controllers start failing (i.e., after 100 s of operation), the reason being that they behave similarly in the absence of failure (Figure 7.7). This figure depicts
the overall network throughput over runtime with a moderate number of packets
per second. Both of the aforementioned situations have a nearly identical perfor-
mance by fluctuating over two levels of throughput. However, implementation of
the Modified Bully algorithm for load balancing results in a remarkable increase
in overall system throughput compared to not using the Modified Bully algorithm,
because the Modified Bully algorithm easily finds a replacement to combat super
controller failure.
7.7.2 Comparison of PTR
The PTR is calculated using the following formula:
PTR = P_S / P_T (7.1)
where P_S is the number of packets processed successfully and P_T is the total number of
packets received.
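For example, if a super controller receives P_T = 100 packets in one second and successfully processes P_S = 95 of them, then PTR = 95/100 = 0.95.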
By using (7.1), the PTR is evaluated for both cases, as shown in Figure 7.8. This
figure indicates how the PTR of the total network increases as the number of packets
received per second increases, up to a certain value beyond which the PTR of the
system gradually decreases with any further increase in the packet arrival rate, owing
to the increased number of packet losses (Figure 7.9). These peak values are termed
threshold values and are 0.92 at 120 packets/s and 0.98 at 175 packets/s for the two
cases, respectively.

Figure 7.9 Number of packets per second vs. number of packet loss
7.8 Conclusion
In spite of the features of SDN that emerge as a break-through technique for han-
dling complicated networking operational difficulties, there are some possibilities
of a computational resource-limited controller being overwhelmed by heavy traffic
and then experiencing unexpected delays. A multi-controller architecture evolves as
a solution to the issues faced by the single controller, but uneven load distribution
References
[1] Aazam M., Khan I., Abdullah Alsaffar A., Huh E.-N. ‘Cloud of things:
Integrating Internet of things and cloud computing and the issues involved’.
11th International Bhurban Conference on Applied Sciences & Technology
(IBCAST); Islamabad, Pakistan, 14-18 Jan; 2014.
[2] 'Traditional network infrastructure model and problems associated with it'.
Journal of Pluribus Networks. 2012, vol. 14(2). Available from https://pluribusnetworks.com/blog/traditional-network-infrastructure-model-and-problems-associated-with-it/
[3] Ojo M., Adami D., Giordano S. 'An SDN-IoT architecture with NFV implementation'.
IEEE Globecom Workshops (GC WKSHPS); Washington, DC, USA, IEEE, 4-8 Dec, 2016. pp. 1–6.
[4] Barros S. ‘Applying software-defined networks to cloud computing’. 33rd
Brazilian Symposium on Computer Networks and Distributed Systems;
Vitória, ES, Brazil, 18-22 May; 2015.
[5] Buyya R., Calheiros R.N., Son J., Dastjerdi A.V., Yoon Y. ‘Software-defined
cloud computing: architectural elements and open challenges’. International
Conference on Advances in Computing, Communications and Informatics
(ICACCI); Delhi, India, 24-27 Sep; 2014.
[6] Zhou Y., Zhu M., Xiao L., et al. ‘A load balancing strategy of SDN controller
based on distributed decision’. IEEE 13th International Conference on Trust,
Security and Privacy in Computing and Communications; Beijing, China, 24-
26 Sep; 2014.
[7] Yen T.-C., Su C.-S. ‘An SDN-based cloud computing architecture and its
mathematical model’. International Conference on Information Science,
Electronics and Electrical Engineering; Sapporo, Japan, 26-28 Apr; 2014.
[8] Liu J., Li J., Shou G., Hu Y., Guo Z., Dai W. ‘SDN based load balancing mech-
anism for elephant flow in data center networks’. International Symposium on
Wireless Personal Multimedia Communications (WPMC); Sydney, NSW, 7-
10 Sep; 2014.
[9] Zhang H., Guo X. ‘SDN-based load balancing strategy for server cluster’. IEEE
3rd International Conference on Cloud Computing and Intelligence Systems;
Shenzhen, 27-29 Nov; 2014.
[10] Tu R., Wang X., Zhao J., Yang Y., Shi L., Wolf T. ‘Design of a load-balancing
middlebox based on SDN for data centers’. IEEE Conference on Computer
Communications Workshops (INFOCOM WKSHPS); Hong Kong, China, 26
Apr-1 May; 2015.
[11] Caba C., Soler J. ‘Mitigating SDN controller performance bottlenecks’.
24th International Conference on Computer Communication and Networks
(ICCCN); Las Vegas, NV, 3-6 Aug; 2015.
[12] Sufiev H., Haddad Y. ‘A dynamic load balancing architecture for SDN’. IEEE
International Conference on the Science of Electrical Engineering (ICSEE);
Eilat, Israel, 16-18 Nov; 2016.
[13] Yu J., Wang Y., Pei K., Zhang S., Li J. ‘A load balancing mechanism for mul-
tiple SDN controllers based on load informing strategy’. 18th Asia-Pacific
Network Operations and Management Symposium (APNOMS). IEEE;
Kanazawa, Japan, 5-7 Oct; 2016.
[14] Wang C., Zhang G., Xu H., Chen H. ‘An ACO-based link load-balancing al-
gorithm in SDN’. 7th International Conference on Cloud Computing and Big
Data (CCBD); Macau, China, 16-18 Nov; 2016.
[15] Kang B., Choo H. ‘An SDN-enhanced load-balancing technique in the cloud
system’. The Journal of Supercomputing. 2018, vol. 74(11), pp. 5706–29.
[16] Kanagavelu R., Aung K. ‘Software-defined load balancer in cloud data cent-
ers’. Proceedings of the 2nd International Conference on Communication and
Information Processing; Singapore, 26-29 Nov; 2016.
[17] Li L., Xu Q. ‘Load balancing researches in SDN: a survey’. 7th IEEE
International Conference on Electronics Information and Emergency
Communication (ICEIEC); Macau, China, 21-23 Jul; 2017.
[18] Benamrane F., Ben Mamoun M., Benaini R. 'New method for controller-to-controller
communication in distributed SDN architecture'. International Journal of
Communication Networks and Distributed Systems. 2017, vol. 19(3), pp. 357–67.
[19] Chien W.-C., Lai C.-F., Cho H.-H., Chao H.-C. 'A SDN-SFC-based service-oriented
load balancing for the IoT applications'. Journal of Network and Computer
Applications. 2018, vol. 114, pp. 88–97.
[20] Son J., Buyya R. 'A taxonomy of software-defined networking (SDN)-enabled
cloud computing'. ACM Computing Surveys. 2018, pp. 1–36.
[21] Abdelltif A.A., Ahmed E., Fong A.T., Gani A., Imran M. ‘SDN-based load
balancing service for cloud servers’. IEEE Communications Magazine. 2018,
vol. 56(8), pp. 106–11.
[22] Wang K.-Y., Kao S.-J., Kao M.-T. ‘An efficient load adjustment for balancing
multiple controllers in reliable SDN systems’. IEEE International Conference
on Applied System Invention (ICASI), Chiba, Japan; 2018.
[23] Neghabi A.A., Jafari Navimipour N., Hosseinzadeh M., Rezaee A. ‘Load
balancing mechanisms in the software defined networks: a systematic and
comprehensive review of the literature’. IEEE Access. 2018, vol. 6, pp.
14159–78.
[37] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[38] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic Torus
Network-on-Chip architecture’. International Journal of Innovative
Technology and Exploring Engineering. 2019, vol. 8(6), pp. 2278–3075.
[39] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
[40] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[41] Singh P., Bansal A., Kamal A.E., Kumar S. 'Road surface quality monitoring
using machine learning algorithm' in Reddy A.N.R., Marla D., Favorskaya
M.N., Satapathy S.C. (eds.). Intelligent manufacturing and energy sustain-
ability. smart innovation, systems and technologies. Vol. 265. Singapore:
Springer; 2022.
[42] Sharma A., Awasthi Y., Kumar S. ‘The role of blockchain, AI and IoT for
smart road traffic management system’. IEEE India Council International
Subsections Conference (INDISCON); Visakhapatnam, India, 3-4 Oct; 2020.
pp. 289–96.
[43] Kumar S., Ranjan P., Singh P., Tripathy M.R. ‘Design and implementa-
tion of fault tolerance technique for internet of things (IoT)’. Proceedings
of the 12th International Conference on Computational Intelligence and
Communication Networks (CICN); Bhimtal, India, 25-26 Sep; 2020. pp.
154–9.
[44] Singh P., Bansal A., Kumar S. ‘Performance analysis of various informa-
tion platforms for recognizing the quality of indian roads’. Proceedings of
the 10th Confluence of the International Conference on Cloud Computing,
Data Science and Engineering; Noida, India, IEEE, 29-31 Jan, 2020. pp.
63–76.
[45] Chauhan R., Kumar S. ‘Packet loss prediction using artificial intelligence uni-
fied with big data analytics, internet of things and cloud computing technolo-
gies’. 5th International Conference on Information Systems and Computer
Networks (ISCON); Mathura, India, 22-23 Oct; 2021. pp. 1–6.
[46] Reghu S., Kumar S. ‘Development of robust infrastructure in networking to
survive a disaster’. 4th International Conference on Information Systems and
Computer Networks, ISCON; Mathura, India, IEEE, 21-22 Nov, 2019. pp.
250–55.
[47] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘An NS3 implementa-
tion of physical layer based on 802.11 for utility maximization of WSN’. 2015
International Conference on Computational Intelligence and Communication
Networks, CICN; Jabalpur, India, IEEE, 12-14 Dec, 2016. pp. 79–84.
[48] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘A utility maximiza-
tion approach to MAC layer channel access and forwarding’. Progress in
Electromagnetics Research Symposium; Prague; 2015. pp. 2363–7.
[49] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘EMEEDP: enhanced
multi-hop energy efficient distributed protocol for heterogeneous wireless
sensor network’. 5th International Conference on Communication Systems
and Network Technologies, CSNT; Gwalior, India, IEEE, 4-6 Apr, 2015. pp.
194–200.
[50] Kumar S., Ranjan P., Ramaswami R. ‘Energy optimization in distrib-
uted localized wireless sensor networks’. Proceedings of the International
Conference on Issues and Challenges Intelligent Computing Technique
(ICICT); Ghaziabad, India, IEEE, 7-8 Feb, 2014.
[51] Sudhakaran S., Kumar S., Ranjan P., Tripathy M.R. ‘Blockchain-based trans-
parent and secure decentralized algorithm’. International Conference on
Intelligent Computing and Smart Communication 2019; Singapore; 2020.
Chapter 8
Emerging security challenges in cloud
computing: an insight
Gaurav Aggarwal 1, Kavita Jhajharia 2, Dinesh Kumar
Saini 2, and Mehak Khurana 3
Cloud computing has evolved as a new computing paradigm with the aim of
providing reliability, quality of service and cost-effectiveness with no location barrier.
Massive databases and applications are relocated to an immense centralized
data center known as the cloud. Cloud computing has the enormous benefit that there
is no need to purchase physical space from a separate vendor, but this benefit comes
with security threats. Because of resource virtualization, the data and the machines
holding them are physically absent from the user's premises, and the storage of data
in the cloud therefore raises security issues. An unauthorized person who penetrates
cloud security can cause data manipulation, and data loss or theft might take place.
This chapter describes cloud computing and the various security issues and challenges
present in different cloud models and cloud environments. It also gives an idea of
the different threat management techniques available to counter security issues and
challenges. The RSA algorithm implementation is described in detail, and the
Advanced Encryption Standard policy, along with its implementation, is also
discussed. For better clarification, several reviews of the existing models are
conducted.
8.1 Introduction
1 Department of Information Technology and Engineering, Amity University in Tashkent, Uzbekistan
2 School of Computing and Information Technology, Manipal University Jaipur, Jaipur, India
3 Department of Computer Science and Engineering, The NorthCap University, Gurugram, India
Cloud computing is delivered through three service models: software as a service
(SaaS), platform as a service (PaaS) and infrastructure as a service (IaaS), which are
the primary concern in cloud computing technology. The main aspects of cloud
computing technology are sharing of resources, services on demand, compatible
connectivity with various devices, perfect accountability of services and complete
network access. The cloud reduces the investment cost and the software license cost
as well. The main concern with the cloud remains the security of valuable data and
its computing. Cloud storage services allow data owners to move valuable data from
local machines to remotely located cloud storage, which relieves cloud users of the
burden of managing and maintaining their data on local devices. Cloud computing is
a distributed architecture, and its objective is clear: it is dedicated to providing
convenient data storage and quick computing through the virtualization of computing
resources. It enhances collaboration, scalability and the flexibility to adapt to rapidly
changing technology, and provides overall cost reduction through efficient computing.
Cloud computing continues to evolve in the new era of internet-based technology
with reliable, ubiquitous and on-demand services. It combines service-oriented
architecture [3], virtualization and other advanced technologies with a strong
dependency on internet-based applications. Google introduced the MapReduce [2, 4]
framework, which, along with Apache's Hadoop Distributed File System (HDFS) [5],
processes large amounts of shared data quickly using internet-based applications.
Nowadays, enterprises are combining private and public cloud servers because of
virtualization and the global storage of data, and hence security for data storage and
computation has taken the spotlight. Storing containers at the server and moving them
between multiple cloud environments require stronger security for data at rest as well
as in motion. In the present era, the use of the cloud has increased on a large scale.
Online transactions, customer services, email access, online exams, banking services,
e-governance, and so on, all run on cloud-based platforms, so the security of those
services requires enormous attention. Cloud computing depends on the vulnerable
internet. The involvement of heterogeneous architectures from different Cloud Service
Providers (CSPs) raises more significant security threats (Figure 8.1). Moreover, CSPs
store users' data at different locations without the customers' knowledge, which also
invites security issues. So, the security paradigm needs to move to a new dimension.
Conventional security concepts such as authorization, authentication and identification
no longer stand firm in protecting a cloud user's valuable data stored on a CSP's
servers, or its safe transit within the network, from possible threats. Several key
management techniques with encryption and decryption strategies are adopted to
safeguard users' data both when stored in the cloud and while in transit on the network.
Data are encrypted before being warehoused in cloud storage, and numerous techniques
protect the data while travelling through the network's nodes. Despite so many
precautions, security remains a big question.
Are the data safe? Are the security measures taken so far appropriate? Due to
rapid changes in technology, security architectures also need further enhancement.
It is well known that when data travel over an insecure network, no fool-proof
technique can assure complete security for users' data. Before moving on to the
details of data security and its transactions in the cloud environment, we need to
know the characteristics of cloud architecture.
According to the definition of the National Institute of Standards and Technology
(NIST) [6], a complete cloud model is built on five essential characteristics, three
service models and four deployment models. Every cloud model has different security
issues, which need to be addressed in detail.
1. Broad network access: The cloud model should have open standards for all
Application Programming Interfaces (APIs). It should function over standard
protocols such as IP, HTTP and REST, and resources should be available to users
from anywhere with an internet connection.
2. Rapid elasticity: The cloud model should always be able to allocate resources
among users dynamically. It should release additional resources to users
dynamically, as and when needed. The cloud model should be fully automated.
3. Measured service: Cloud services are metered, like a utility, and users only pay
for the services they use. Most importantly, the services can be canceled at any
time by the users.
4. On-demand self-service: In the cloud model, users are abstracted from the
implementation, and the service is based on time (real-time) delivery. The ser-
vices are accessed through a self-service network interface.
5. Resource pooling: The requested resources are pooled from common sources,
which creates economies of scale. Common infrastructure gains high efficiency
and runs at greater network bandwidth.
1. SaaS
2. PaaS
3. IaaS
services and tools offered by the service providers. The network servers, operating
systems and storage are managed by the providers only. The security of PaaS
services depends upon protected and dependable networking and a safe web browser.
In a PaaS platform, two types of security need to be maintained by the service
providers: the security of the PaaS stage itself, i.e., the runtime engine, and the
security of the user's applications deployed on the PaaS platform. The PaaS owner
generally uses third-party service components called mashups [10, 11], which
combine many source elements into a single unit. Mashups are more prone to threats
when they are not handled carefully on open networks.
threats. Although Virtual Machine Monitor (VMM) [12, 13] software is used to
separate virtual machines running simultaneously, flaws in VMM applications must
be sifted out, because a single snag in a VMM can bring down the whole system.
1. Private cloud
2. Public cloud
3. Community cloud
4. Hybrid cloud
In these types of cloud models (Figure 8.5), the service providers provide the
complete package of networking, data storage, platform and software infrastructures
to the cloud users.
carefully. Any unauthorized use may break the security wall and create a threat to
the cloud. In a private cloud, CSPs need to maintain and resolve all security issues.
The main advantage of the private cloud is that all resources are shared among the
authorized private personnel and bound to internal use only.
the user’s devices, such as a computer, laptop, mobile phone, or any other access
device, as well as any application software that requires cloud services access. The
cloud end consists of high-speed computing devices, servers and distributed data-
base system which provide the required access to the clients and hence form a cloud
environment.
Users connect their access devices, such as a PC, laptop or mobile, to the cloud and
access their data stored on a cloud server through interface software over the
internet. The cloud being a distributed architecture, user data are stored on a CSP's
server whose location is unknown to the clients and users.
The CSPs' servers are managed by administrative groups on whom users must depend
for safeguarding data and maintaining its privacy. The administrative group members
who handle users' data can themselves be a threat to its secrecy and privacy. A threat
management policy ensures that the cloud should not be able to derive any information
about a client's data. The following security challenges need immense attention.
8.2.4.1 Vulnerabilities
Applications like Gmail, Yahoo or Facebook are provided to users via an internet
browser, and attackers penetrate the client's computer or application through that
browser. Customary security solutions are not sufficient to protect data from such
attacks, so advanced measures must be enforced. In the virtualization technique,
different instances running on the same physical machine need to be isolated. VMM
[15, 16] software is implemented to separate the physical resources used by different
virtual machines running simultaneously. There are vulnerabilities in Microsoft
Virtual PC or Microsoft Virtual Server that can allow a casual visitor or third party
to run malicious code on the host or another operating system. Two virtual machines
can communicate with each other over a covert channel, bypassing all the rules
defined by the VMMs. Another cloud threat is the so-called Threat 11 [17], in which
an attacker creates a malicious Virtual Machine image containing malware or a virus
and publishes it in the provider's storage area. Users then retrieve the image and
infect the whole cloud environment.
To counter this attack, Mirage [18], an image management system, was proposed;
it focuses on access control mechanisms, image filtration, a derivation tracking
system and repository maintenance services. VMs should be root-locked so that the
virtualized guest environment is not permitted to interface with the host systems.
8.2.4.2 Attack in networks
8.2.4.2.1 Sniffer attacks
Data in a network travel from one node to another as packets. The capture of these
packets by intruders is termed a sniffer attack. Data packets that are not encrypted
may be accessed and modified by unauthorized parties, and hence vital packets may
lose their integrity. A sniffer detection program working through the Network
Interface Card ensures that only the data destined for a specific system in the network
are recorded, so that the integrity of the data remains intact. Address Resolution
Protocol (ARP) [19] can be used to detect sniffing attacks: it maps the IP address to
the MAC address of the machine so that data travel toward the designated machine
only. Another method, Round Trip Time [19], is also used on sniffing detection
platforms to detect a sniffing attack in networks. Here, too, attackers try to copy the
MAC address and capture the data using their own modified software.
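One classical detection trick, sketched below in Python with the Scapy packet library, is to send an ARP request wrapped in a frame addressed to a bogus unicast MAC: a NIC in normal mode drops the frame, while a host sniffing in promiscuous mode may pass it up and reply. This is a generic illustration of the idea, not a technique prescribed by the chapter.

from scapy.all import Ether, ARP, srp

def probe_promiscuous(ip):
    # ARP request inside a frame with a bogus unicast destination MAC;
    # only a promiscuous NIC is likely to accept it and answer.
    pkt = Ether(dst="ff:ff:ff:ff:ff:fe") / ARP(pdst=ip)
    answered, _ = srp(pkt, timeout=2, verbose=False)
    return len(answered) > 0  # True suggests a sniffer may be running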
8.2.4.3 Reused IP address
This is a big issue in terms of network security. Users migrate from one network to
another frequently, and when an old user migrates from the current network to
another one, the old user's IP address is reconfigured for a new user. A cache log of
the departing users' IP addresses remains active in the DNS server [20] for a specific
time, and the time lag between the reassignment of IP addresses and the expiry of
the cache log in the DNS server creates a significant security issue. The attackers
may therefore exploit the DNS cache log and poison the DNS with malware, which
would violate the integrity of the new users' data.
For instance, suppose there are 16 bytes, x0, x1, x2, … , x15. These bytes can be
represented as the following 4 × 4 state matrix:
[ x0  x4  x8   x12 ]
[ x1  x5  x9   x13 ]
[ x2  x6  x10  x14 ]
[ x3  x7  x11  x15 ]
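The column-major layout above can be reproduced with a few lines of Python; this is a generic illustration of how the 16 input bytes fill the AES state, not code from the chapter.

def to_state(block):
    # byte r + 4*c of the input lands in row r, column c of the state
    assert len(block) == 16
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]

for row in to_state(bytes(range(16))):
    print(row)  # first row: [0, 4, 8, 12]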
The key size determines the number of transformation rounds that translate the input
message, i.e., the plaintext, into the final outcome, i.e., the ciphertext. The AES
operation can be illustrated as follows (Figure 8.8): the key length decides the number
of rounds an AES operation performs: ten rounds for 128-bit keys, 12 rounds for
192-bit keys and 14 rounds for 256-bit keys.
Each round of AES comprises four subprocesses (Figure 8.9):
1. SubBytes
2. ShiftRows
3. MixColumns
4. AddRoundKey
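As a usage sketch, the snippet below encrypts and decrypts a message with AES-256 (hence 14 internal rounds) using the PyCryptodome library; the library choice and the GCM mode are assumptions for illustration, not part of the chapter's implementation.

from Crypto.Cipher import AES            # PyCryptodome
from Crypto.Random import get_random_bytes

key = get_random_bytes(32)               # 32-byte key -> AES-256, 14 rounds
cipher = AES.new(key, AES.MODE_GCM)
ciphertext, tag = cipher.encrypt_and_digest(b"cloud data at rest")

decipher = AES.new(key, AES.MODE_GCM, nonce=cipher.nonce)
assert decipher.decrypt_and_verify(ciphertext, tag) == b"cloud data at rest"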
a pair of keys, one public and one private, is used. Here, we find three big positive
integers a, p and d so that modular exponentiation satisfies, for all m:
(m^a)^p mod d = m
Here, a and p are released as the public-key exponent and the private-key exponent,
respectively. The public key is generated by combining the public exponent a and
the modulus d; similarly, the private key is generated by combining the modulus d
and the private exponent p.
8.2.5.2.3 Encryption
Suppose Tom wants to send message M to Jerry. First, Tom needs to turn the message
M into an integer m such that 0 ≤ m < d and gcd(m, d) = 1, by utilizing a padding
scheme, which is a prearranged reversible protocol. Tom then calculates the ciphertext
c using Jerry's public key a according to c = m^a mod d. Even for 500-bit numbers,
this can be done efficiently using modular exponentiation. Tom then sends c to Jerry.
8.2.5.2.4 Decryption
After receiving the message, Jerry can recover m with the private-key exponent p by
calculating c^p mod d = (m^a)^p mod d = m. Jerry then finds the original message M
from m by reversing the padding scheme.
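The whole exchange can be traced with a toy example in Python, following the chapter's notation (a = public exponent, p = private exponent, d = modulus); the tiny primes are for illustration only and would be hopelessly insecure in practice.

q1, q2 = 61, 53            # two secret primes (toy-sized)
d = q1 * q2                # modulus d = 3233
phi = (q1 - 1) * (q2 - 1)  # 3120
a = 17                     # public exponent, coprime with phi
p = pow(a, -1, phi)        # private exponent: a*p = 1 (mod phi) -> 2753

m = 65                     # padded message as an integer, 0 <= m < d
c = pow(m, a, d)           # Tom encrypts: c = m^a mod d
assert pow(c, p, d) == m   # Jerry decrypts: c^p mod d recovers m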
8.3.1 SeDaSC
Ali et al. [14] proposed a secure data sharing model titled 'SeDaSC: Secure Data
Sharing in Clouds.' In their model, the user file is encrypted with a single encryption
key called the master key. After encryption of the file, the master key is divided into
two different key shares for every user. One key share is possessed by the user, which
keeps the intruder (insider threat) away from the user data, and the other key share
is retained by a reliable third party termed the Cryptographic Server (CS). The master
key is then deleted permanently. A key share alone cannot decrypt user data; the
master key must be regenerated from the two key shares to decrypt the user's file.
See Figure 8.10.
master key. To prevent the regeneration of the original master key, the master key is
divided into two parts and the original is deleted by secure overwriting. The CS retains
one part, along with the access control list (ACL) it maintains, while the other part
is sent to the users in the circle. The encrypted file is stored in the cloud by the CS
on behalf of the users, and a CSP maintains the cloud storage [24–32].
When a user wants to retrieve the data, they send an access request to the CS. The
CS, after receiving the request, downloads the required user's file and asks for the
key share held by the user. After receiving the user's key share, it authenticates the
user and reconstructs the master key from the user's share and the CS-maintained
share for that particular user. The user's file is then decrypted with the newly generated
master key and sent back to the requesting user. If a new member joins the existing
group, the new user ID is added to the ACL, and the two key shares are generated
again. For a member leaving the group, their identification is deleted from the
existing ACL; there is no possibility of the departing member accessing the data, as
they hold only the user share of the key [33–40]. The 'SeDaSC' model also notes
that frequent re-encryption/decryption is not required when the group membership
changes.
The 'SeDaSC' model claims that the methodology can be utilized in mobile cloud
computing in addition to conventional cloud computing, since the compute-intensive
operations are performed by the cryptographic server.
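One simple way to realize such two-part key shares is an XOR split, sketched below in Python; this split is an assumption for illustration, and SeDaSC's actual share-generation scheme may differ.

import os

def split_key(master):
    # Neither share alone reveals anything about the master key
    user_share = os.urandom(len(master))
    cs_share = bytes(a ^ b for a, b in zip(master, user_share))
    return user_share, cs_share

def rebuild_key(user_share, cs_share):
    return bytes(a ^ b for a, b in zip(user_share, cs_share))

master = os.urandom(32)                 # hypothetical 256-bit master key
u, c = split_key(master)
assert rebuild_key(u, c) == master      # both shares together recover it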
and keeping users' data separate from other users in the network. The following
issues need greater attention to improve data security. An adversary A that corrupts
a small group of cloud servers could use those servers to mount several cheating
attacks, as below:
a. Storage cheating attack model: the adversary modifies the stored data to
compromise data integrity, or discloses the confidential data for gain, or both.
b. Computing cheating attack model: the attack targets the security of data
computation in the cloud. The attackers can leave the cloud with erroneous
computing results for the users, while the data still appear to be original.
c. Privacy cheating attack model: the attackers may reveal or leak the data to the
public for business competition. If the data are not encrypted, i.e., plaintext data,
they can be sold by the adversary.
To protect valuable data from attackers, 'SecCloud' provides the following
mechanisms:
a. The proposed protocol requests storage space from the CSP, and the CSP
allocates a space by returning a space index i for the message to be stored.
b. The cloud users (CUs) need to sign each transmitted message block to enable
the verification auditors (VAs) to audit it.
c. CUs carry out data encapsulation, precomputing a session key using the bilinear
Diffie-Hellman method. The CUs then send the data, encrypted with the session
key, together with the corresponding signature pairs to the cloud for storage.
When the data need to be retrieved, the CSP decrypts the packet using its session key
to recover the data-signature pairs, and the VAs check the signatures for data
authenticity using their secret key. The authority to check the signatures is held by
the CSP and the VAs only, and hence it is claimed that the data are secured and
protected.
Though the 'SecCloud' protocol claims to be robust and reliable, its shortcomings
are as follows:
a. Computing and transmission overhead is the primary concern. The computation
and transmission of encrypted data are very involved and take much time, which
may cause time-outs when accessing the servers.
b. The model again depends on a third party, i.e., the verification auditors, which
may reveal valuable user data and pose a threat. Verification auditors must be
persuaded that the cloud servers hold the data at the exact claimed location, so
that a cloud server's cheating behavior does not go undetected.
c. In some cases, users may face denial of service because the verification auditors
become unresponsive due to connection loss with the users or servers.
8.3.3 Data accountability and auditing for secure cloud data storage
Prassanna et al. [44] evaluated a framework for data accountability and for reviewing
cloud users' data in the distributed cloud. They proposed a mechanism that technically
monitors any access to data held in the cloud, together with a reviewing mechanism,
and they also suggest a privacy-preserving public auditing model for customer data
that uses the cloud storage services in cloud computing.
Here, it is noticed that most of the models rely on a third party, which is the leading
cause of concern among the security issues of cloud computing. In this advanced era
of technology, it is not advisable to rely on a third party.
The storage and virtualization techniques and the network in use encompass
significant security concerns in a cloud computing environment. The security of
users' data needs to be ensured at the highest level. Due to the complex architecture
of the cloud, ensuring data security at all levels of the cloud environment is a big
challenge. Interconnections among the different architectures of CSPs are complex:
PaaS providers hire SaaS vendors for their services, and sometimes a third party for
backing up users' data. The different providers must agree upon an SLA to secure
users' data, and users must be informed of what levels of standard security the CSPs
provide and how their data are being stored.
This chapter has discussed the various security issues and challenges present in
different cloud models and cloud environments, and has given a complete picture of
the threat management techniques available to counter them. The chapter has also
reviewed approaches proposed in various journals and has explained a different
process (a secure key transmission model) to enhance security against the above
issues and challenges, with discussion and comparison against the existing models.
References
[7] Fernandes D.A.B., Soares L.F.B., Gomes J.V., Freire M.M., Inácio P.R.M.
‘Security issues in cloud environments: a survey’. International Journal of
Information Security. 2014, vol. 13(2), pp. 113–70.
[8] Security guidance for critical areas of focus in cloud computing v4.0 [online].
Cloud Security Alliance. Available from https://cloudsecurityalliance.org/artifacts/security-guidance-v4/
[9] Understanding-the-Cloud-Computing-Stack [online]. Available from
https://www.diversity.net.nz/wp-content/uploads/2011/03/Understanding-the-Cloud-Computing-Stack.pdf [Accessed 9 Nov 2020].
[10] Cloud Security and Privacy [online]. Available from https://www.oreilly.com/
library/view/cloud-security-and/9780596806453/ [Accessed 9 Nov 2020].
[11] Keene C. What is Platform as a Service (PaaS)? [online]. Available from
http://www.keeneview.com/2009/03/what-is-platform-as-service-paas.html
[Accessed 9 Nov 2020].
[12] Hashizume K., Rosado D.G., Fernández-Medina E., Fernandez E.B. ‘An
analysis of security issues for cloud computing’. Journal of Internet Services
and Applications. 2013, vol. 4(1), p. 5.
[13] Jansen W.A. ‘Cloud hooks: security and privacy issues in cloud computing’.
44th Hawaii International Conference on System Sciences, IEEE; Kauai, HI,
2011. pp. 1–10.
[14] Ali M., Dhamotharan R., Khan E., et al. ‘SeDaSC: secure data sharing in
clouds’. IEEE Systems Journal. 2017, vol. 11(2), pp. 395–404.
[15] Zhang F., Chen H. ‘Security-preserving live migration of virtual machines in
the cloud’. Journal of Network and Systems Management. 2013, vol. 21(4),
pp. 562–87.
[16] Subashini S., Kavitha V. ‘A survey on security issues in service delivery mod-
els of cloud computing’. Journal of Network and Computer Applications.
2011, vol. 34(1), pp. 1–11.
[17] The treacherous 12: top threats to cloud computing [online]. Available from
https://downloads.cloudsecurityalliance.org/assets/research/top-threats/treacherous-12-top-threats.pdf [Accessed 9 Nov 2020].
[18] Wei J., Zhang X., Ammons G., Bala V., Ning P. ‘Managing security of vir-
tual machine images in a cloud environment’. CCSW ’09: Proceedings of
the 2009 ACM Workshop on Cloud Computing Security, ACM; Chicago, IL,
2009. pp. 91–96.
[19] Ohlman B., Eriksson A., Rembarz R. ‘What networking of informa-
tion can do for cloud computing’. 18th IEEE International Workshops on
Enabling Technologies: Infrastructures for Collaborative Enterprises, IEEE;
Groningen, Netherlands, 2009. pp. 78–83.
[20] Zhang L., Zhou Q. ‘CCOA: cloud computing open architecture’. 2009 IEEE
International Conference on Web Services, IEEE; Los Angeles, CA, 2009. pp.
607–16.
[21] Goldwasser S., Bellare M. Lecture notes on cryptography: introduction to
modern cryptography [online]. 2008. Available from https://cseweb.ucsd.edu/~mihir/papers/gb.pdf
[22] Rivest R.L., Shamir A., Adleman L. 'A method for obtaining digital signatures
and public-key cryptosystems'. Communications of the ACM. 1978, vol. 21(2),
pp. 120–26.
[23] Wei J., Zhang X., Ammons G., Bala V., Ning P. 'Managing security of virtual
machine images in a cloud environment'. Proceedings of the 2009 ACM Workshop
on Cloud Computing Security, ACM; 2009. pp. 91–96.
[24] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘EMEEDP: enhanced
multi-hop energy efficient distributed protocol for heterogeneous wire-
less sensor network’. Fifth International Conference on Communication
Systems and Network Technologies (CSNT), IEEE; Gwalior, India, 2015. pp.
194–200.
[25] Kumar S., Ranjan P., Ramaswami R. ‘Energy optimization in distributed lo-
calized wireless sensor networks’. International Conference on Issues and
Challenges in Intelligent Computing Techniques (ICICT); Ghaziabad, India,
7-8 Feb; 2014.
[26] Chauhan R., Kumar S. ‘Packet loss prediction using artificial intelligence uni-
fied with big data analytics, internet of things and cloud computing technolo-
gies’. 5th International Conference on Information Systems and Computer
Networks (ISCON), IEEE; Mathura, India, 2021. pp. 1–6.
[27] Sudhakaran S., Kumar S., Ranjan P., Tripathy M.R. ‘Blockchain-based
transparent and secure decentralized algorithm’. International Conference
on Intelligent Computing and Smart Communication 2019. Algorithms for
Intelligent Systems; Singapore: Springer; 2020.
[28] Kumar S., Trivedi M.C., Ranjan P., Punhani A. Evolution of Software-Defined
Networking Foundations for IoT and 5G Mobile Networks. Hershey, PA: IGI
Publisher; 2020.
[29] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy efficient mul-
tichannel MAC protocol for high traffic applications in heterogeneous wireless
sensor networks’. Recent Advances in Electrical & Electronic Engineering.
2017, vol. 10(3), pp. 223–32.
[30] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[31] Kumar S., Cengiz K., Vimal S., Suresh A. ‘Energy efficient resource migra-
tion based load balance mechanism for high traffic applications IoT’. Wireless
Personal Communications. 2021, vol. 10(3).
[32] Kumar S., Cengiz K., Trivedi C.M., et al. ‘DEMO enterprise ontology with
a stochastic approach based on partially observable Markov model for data
aggregation and communication in intelligent sensor networks’. Wireless
Personal Communication. 2022.
[33] Punhani A., Faujdar N., Kumar S. 'Design and evaluation of cubic torus
Network-on-Chip architecture'. International Journal of Innovative Technology
and Exploring Engineering. 2019, vol. 8(6), pp. 2278–3075.
[34] Dubey G., Kumar S., Kumar S., Navaney P. 'Extended opinion lexicon and
ML-based sentiment analysis of tweets: a novel approach towards accurate
classifier'. International Journal of Computational Vision and Robotics. 2020,
vol. 10(6), pp. 505–21.
Chapter 9
Factors responsible and phases of speaker recognition system
The method of identifying a speaker based on his or her speech is known as automatic
speaker recognition. Speaker/voice recognition is a biometric sensory technology that
recognizes people by their voices. Most speaker recognition systems nowadays rely
on spectral information, meaning they use spectral features derived from speech
signal segments of 10–30 ms in length. However, if the received speech signal
contains noise, the output of a cepstral-based system suffers. The primary goal of
the study is to examine the various factors responsible for improved performance of
speaker recognition systems by modeling prosodic features, along with the phases
of a speaker recognition system. Furthermore, the analysis focuses on a
text-independent speaker recognition system in the presence of background noise.
Many researchers have labored to develop various methods and algorithms for
analyzing speech signals and simplifying speech and speaker recognition processes.
In Reference [1], recent audiovisual (AV) fusion research is summarized. In References
[2–6], methods are proposed to solve the problem of visual speech recognition.
The fusion of multiple biometric characteristics for identity authentication has
shown strong benefits as compared to conventional systems based on unimodal
biometric attributes. By combining visual speech and face information simultane-
ously, a new multimodal verification approach is explored in this study. Unlike face
authentication, the proposed method uses visual speech lip movement features,
which can reduce the risk of being duped by a fake face picture. To extract features
of the face and visual expression, a Linearity Preserving Projection transform and a
Projection Local Spatiotemporal Descriptor are used.
1 Department of Electrical Engineering and Computer Science, Texas A&M University – Kingsville
• Feature extraction
• Data augmentation
• Fusion and recognition
estimates using non-Euclidean geometry [36]. Reference [37] determines the emitter
location in this case.
9.2.1.1 Variability of session
Session variability refers to the phenomenon that causes differences between two
recordings of the same speaker [39]. To put it another way, two different speech
samples recorded by the same person are treated as unidentifiable by the computer
[40]. Intersession variability is caused by a number of factors, as follows.
9.2.1.2 Gender
Biometric systems use sources such as the iris, palm print, face, hand geometry, and
voice to recognize individuals [47]. Along with the above-mentioned characteristics,
some ancillary knowledge about the user can be used to build a secure and
user-friendly biometric device. Height, gender, age, and eye color are examples of
such ancillary details, referred to as 'soft biometric' characteristics, which may be
continuous or discrete in nature. Gender is a discrete soft biometric characteristic.
Despite the lack of distinctiveness of soft biometric traits, gender can be used to
filter [24] a broad biometric database [48]. Male and female voices vary greatly in
some aspects of the speech signal, allowing a system to separate out a huge amount
of unnecessary data and save a considerable amount of time.
9.2.1.3 Environment
Researchers have recently become concerned about the mismatch between training
and testing environments. Speaker model synthesis [49], factor analysis [50], and
feature mapping [51] are some of the techniques developed to address this problem.
Parallel-condition data are needed for accurate speaker recognition, because features
of the speech signal are influenced by the surrounding conditions. A speech sample
collected in a soundproof setting will have much better quality than one recorded in
a noisy classroom or library. A database with speech samples collected in various
environments, such as a soundproof room, a noisy classroom, a library, an auditorium,
and a market, should therefore be created.
9.2.1.5 Instrument of recording
Speech samples may be recorded using a variety of instruments. The quality of the
speech samples is influenced by the methods of recording. A digital voice recorder,
laptop, cell phone, microphone, and long-distance phone call are some of the devices
that can be used to build a quality database [54].
9.2.1.6 Age variability
The issues of aging and variation in speech quality are intertwined. Over time, the
aging effect becomes more pronounced, and the quality of speech is more likely to
deteriorate. Because of the aging effect, the accuracy of any biometric system
degrades over time [55, 56]. However, the impact of age variability on speaker
recognition has received only sporadic research attention. The problem of aging in
a speaker recognition system can be addressed by updating the database at regular
intervals.
A better, but more difficult, approach is to adapt automatically to aging-related
changes. The lack of a database is the most significant obstacle to developing such
a method: no longitudinal speaker database spanning more than three years is
publicly accessible. The key source of variability in the TCDSA (Trinity College
Dublin Speaker Aging) database was aging, but variance in speech quality was
inevitable over such a long period. Consequently, for a long-term and large-scale
method, the database must contain data from different speakers at different times,
or it must be constantly updated.
9.2.1.7 Spoofing
The types of spoofing attacks are: (a) impersonation, (b) speech synthesis, (c) voice
conversion, and (d) replay. Impersonation is when someone attempts to imitate
another person who is a real speaker. In a speech synthesis attack, the authentic
speaker's voice is synthesized using a speech synthesizer to spoof the verification
process [57]. Voice conversion is a method of spoofing in which the attacker's voice
is automatically converted into the voice of a legitimate speaker using a conversion
tool. In a replay attack, the target genuine speaker's prerecorded speech samples are
replayed using a playback device, which can be a cell phone, music player, or any
other player. Consequently, steps should be taken right from the start, during the
compilation of speech databases, to prevent spoofing attacks.
9.2.1.8 Whispering
The effect of deliberate alteration of speech behavior on speaker recognition has
been studied, revealing a flaw in speaker recognition systems. Whispered speech, a
form of disguised speech produced psychologically and/or physiologically, has
recently piqued the interest of researchers [58]. Along with research into its acoustic
characteristics, such as formant frequencies, the corresponding bandwidths and
endpoint detection, researchers are also interested in its applications, such as
reconstruction, speaker recognition, and so on.
speaker recognition, and so on. Whispered speech characteristics are:
9.2.1.9 Twins
Since 1990, the birth rate of twins has increased by an average of 3% each year,
according to statistics [60]. Even though identical twins account for just 0.2 per cent
of the world's population, their numbers are comparable to the populations of
countries such as Greece or Portugal. As a result, a biometric system capable of
accurately distinguishing between identical twins, who share the same genetic code,
is urgently needed [61]. Results have shown that the voice and expression parameters
of twin pairs differ to varying degrees of similarity and dissimilarity.
Table: summary of speaker recognition databases (columns: S. no.; name of database; language used; duration/size; no. of speakers (M/F); type of data/text used; EER; references)
Urdu speakers recorded these samples using various equipment to capture channel
variability.
They also gathered databases of identical twins to meet forensic scientists'
requirements for such information. Another effort, using five separate networks in
parallel, was made to build a database of 200 speakers in English and many Indian
languages in soundproof as well as noisy places. These databases are commonly
used for biometric and forensic applications, yet they still have some flaws. Many
research issues, such as databases for crying and shouting voices, distance from the
microphone, age variability, and spoofing, remain unsolved despite these efforts.
Multiple features must be used for speaker recognition, as not all of the
above-mentioned characteristics are fulfilled by a single feature. At the same time,
the number of features considered for processing and recognition should be limited,
as few techniques can handle high-dimensional data [11].
These characteristics also have disadvantages, such as being less discriminative and
easily mimicked. High-level features often necessitate a more complex framework.
As a result, no feature can be considered the best for recognition, and feature
selection is a trade-off between robustness, discriminative power, and the feasibility
of implementation.
9.2.2.3 Spectro-temporal features
Two spectro-temporal details, i.e., formant transitions and energy modulation, may
be used to extract a great deal of speaker-specific detail.
To provide temporal detail, the delta (Δ) and double-delta (Δ²) coefficients, which
are first- and second-order derivative estimates, can be used. These coefficients are
calculated from the time difference between successive feature-vector coefficients
and are then combined with the original coefficients. If the number of original
coefficients is n, then the total number of coefficients after appending the Δ and Δ²
coefficients will be 3n. The process is repeated for each frame.
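A minimal Python sketch of this stacking, using the librosa library (an assumption, as the chapter names no toolkit) and a hypothetical local file speech.wav:

import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # n = 13 base coefficients
d1 = librosa.feature.delta(mfcc, order=1)           # delta
d2 = librosa.feature.delta(mfcc, order=2)           # double delta
features = np.vstack([mfcc, d1, d2])                # 3n = 39 per frame
print(features.shape)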
Figure 9.2 depicts the temporal discrete cosine transform process. Performance can
be improved by integrating cepstral and temporal features rather than using the
cepstral system alone, but the improvement was minor, and further study is needed
before it can be used. Speaker recognition may also use modulation frequency as a
feature; the modulation frequency carries details about the rate at which the speaker
says words and some other stylistic attributes.
Modulation frequencies below 20 Hz are used to measure speech intelligibility. To
achieve the highest efficiency with this feature, a temporal window of 300 ms was
used with modulation frequencies below 20 Hz. Instead of spectrogram magnitudes,
the discrete cosine transform (DCT) can be applied to temporal trajectories to reduce
the dimensionality of the spectro-temporal features. DCT has an advantage over the
DFT in that it can reduce dimensionality while maintaining the relative phases of
feature vectors, allowing phonetic- and speaker-specific information to be retained.
Instead of amplitude-based approaches, frequency modulation can be used to improve
the system. A band-pass filter bank can be used to separate speech signals into
sub-band signals; formant frequency features are then extracted using dominant
frequency components such as frequency centroids. The difference between the
center frequency and the pole frequency in each sub-band can then be used as a
frequency modulation-based feature.
9.2.2.4 Prosodic features
Since prosody is such an important aspect of speech interpretation, it is critical to
use prosodic features to improve speech processing. In general, prosodic features are
paired with other acoustic features in recognition systems; however, there are
drawbacks, such as the range of prosodic features, and frameworks handling
segmental features therefore cannot accommodate them. These features, which
include pause length, syllable stress, speaking rate, pitch or tempo, intonation
patterns, and energy distribution, are referred to as supra-segmental features.
Another issue is determining speaker differences through the processing of prosodic
data, which can be instantaneous or long-term. Furthermore, the characteristics may
depend on aspects that the speaker can alter deliberately. The fundamental frequency
Fo is the most commonly used prosodic feature. The combination of Fo and
spectral-based features has proven to be the most efficient and reliable in noisy
environments. Energy- and duration-based features, in addition to Fo-related features,
are more accurate than other prosodic features. Fo is explored in depth here because
it is the most critical prosodic feature: it carries both physiological and learned
information, and both are essential for recognizing a speaker.
9.2.2.5 High-level features
Speakers can also be distinguished by the words they use often during a conversation.
Doddington began research in this field in 2001 [78]. To discriminate between
speakers, an idiolect (a speaker's specific vocabulary) was used. The idea behind
high-level features is to turn utterances into sequences of tokens and then distinguish
speakers based on the occurrence of similar patterns of tokens. These tokens may be
words, phonemes, or prosodic variations such as rises or falls in pitch or energy.
Figure 9.3 depicts the second and third phases, namely feature extraction and feature
matching.
9.2.3.1 Statistical techniques
These include the Hidden Markov Model (HMM), Gaussian Mixture Model (GMM),
Universal Background Model (UBM), and Vector Quantization (VQ).
9.2.3.2 Soft-computing techniques
Soft-computing techniques are based on the cognitive actions of the human mind.
Neural networks and fuzzy logic are two such approaches, developed by combining
various methods to solve real-world problems.
9.2.3.3 Hybrid techniques
These techniques combine mathematical and soft computing techniques to take
advantage of the benefits of both.
The speech signal is the input to each of the speech processing techniques listed
above. As a result, learning the fundamentals of speech signals is important, and they
are presented here. First and foremost, in order to build good speech and speaker
recognition systems, the mechanisms of human voice production and perception and
the behavior of speech signals must be understood.
nasal cavity [91–99]. The final sound quality is determined by the arrangement of
the articulators: the velum (soft palate), tongue, lips, and mandible (jaw).
9.3.4.1 Amplitude
The maximum excursion of a sinusoid above and below the zero-crossing reference
is called its amplitude. The amplitude of a speech signal represents its energy, and
hence its loudness.
Amplitude can be calculated in a variety of ways. It may be measured in units of
pressure, since it is related to the degree of air-pressure variation. It is usually
expressed in decibels (dB), a logarithmic scale for amplitude relative to a reference
signal. The dB scale is useful because it corresponds to how humans experience
loudness.
9.3.4.2 Frequency
Frequency can be defined as the number of cycles per second. An oscillation from
the zero-crossing reference up to a peak, down to a peak below the zero-crossing
reference, and back to the zero-crossing reference is referred to as a cycle.
The unit of frequency measurement is cycles/second. The period is the inverse of
the frequency, i.e., the amount of time it takes to complete one cycle. The pitch of
a speech sound is perceived to change with frequency (though pitch is a more
complex perceptual quantity).
9.3.4.3 Phase
Phase can be described as the position of a sinusoid's starting point. Sinusoids that
start at a maximum have a phase of zero degrees, while those that start at a minimum
have a phase of 180 degrees.
Absolute phase is extremely difficult to perceive, but relative phase differences
between two signals are much easier to detect. In fact, this is the foundation of human
binaural hearing, as the human brain deduces the location of the source of a speech
sound from the phase differences heard in the two ears.
9.3.7 Autocorrelation
Autocorrelation can be used to measure the pitch of a voice. This approach evaluates
the correlation between a speech signal and a delayed copy of itself. When the delay
equals one pitch period, the speech signal and its delayed version are in phase, i.e.,
if one signal rises, the other rises as well, and vice versa.
When the delay is half the pitch period, the signals are uncorrelated and out of phase,
meaning that if one signal rises, the other falls, and vice versa. The autocorrelation
curve shown in Figure 9.5 is obtained by plotting the degree of correlation against
the lag between the signal and its delayed version.
A peak can be seen in the plot at the point corresponding to a lag of one pitch period.
As a result, autocorrelation can be used to determine a sound's pitch.
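The idea translates directly into a short Python sketch; the frame length and search range below are illustrative assumptions.

import numpy as np

def estimate_pitch(x, sr, fmin=50.0, fmax=400.0):
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    lo, hi = int(sr / fmax), int(sr / fmin)            # plausible pitch lags
    lag = lo + np.argmax(ac[lo:hi])                    # peak at one pitch period
    return sr / lag

sr = 16000
t = np.arange(sr // 10) / sr
frame = np.sin(2 * np.pi * 120 * t)       # 120 Hz test tone
print(round(estimate_pitch(frame, sr)))   # ~120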
• Physical features
• Perceptual features
• Signal features
• Power
• Fundamental frequency
• Spectral features
• Duration of sound
9.4.1.1 Power
Power is related to the amplitude of the speech signal (it is proportional to the square
of the amplitude) and can be described as work done per second. The higher the
power in the signal, the louder the speech tone. Power calculations can also be used
to detect silence in a speech signal and to determine its dynamic range.
The short-time power of a speech signal may be evaluated by windowing the signal,
squaring the samples, and taking the mean. The energy present in a specific frequency
band may be used to detect the edges of a speech signal; however, applying a hard
energy threshold to differentiate frames with no signal from frames with low energy,
such as the edges of a fade, does not work.
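A minimal Python sketch of short-time power, assuming a mono NumPy array x (the frame and hop sizes are illustrative):

import numpy as np

def short_time_power(x, frame_len=400, hop=160):
    # Window, square, and average each frame
    return np.array([np.mean(x[i:i + frame_len].astype(float) ** 2)
                     for i in range(0, len(x) - frame_len + 1, hop)])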
9.4.1.2 Fundamental frequency
Signals that are either periodic or pseudo-periodic have a well-defined fundamental
frequency, denoted fo. Periodic signals repeat themselves indefinitely with a period
τ, i.e., w(t + τ) = w(t), and fo = 1/τ for the fundamental period τ. A pseudo-periodic
signal nearly repeats itself, represented by w(t + τ) = w(t) + ε. From one period to
the next the signal varies slightly, but fo = 1/τ remains valid, where ε corresponds
to a tolerance value. In this case, the signal is considered periodic if the calculated
fo is constant, or roughly constant, over the remaining signal. fo is used to locate
the edges of the speech signal.
The idea is that a significant variance in fo is more likely to occur at the end of a
word than in the middle.
9.4.1.3 Spectral features
Of the many simple spectral features, bandwidth is one of the most significant. The span of frequencies present in a speech signal is called its bandwidth. Bandwidth, as a spectral feature, can be used to differentiate music from speech sounds.
9.4.1.4 Duration of sound
The duration of a sound is simply the length of time the sound lasts. Its use depends on the particular multimedia database application: if an unknown sound's duration appears similar to that of a reference template, duration matching techniques may be used.
The perceptual features considered here are:
• Pitch
• Prosody
9.4.2.1 Pitch
Since pitch carries so much information about the speech signal, it tends to be a particularly useful perceptual feature. It appears similar to the physical property known as frequency. Frequency, however, is an absolute, numerical quantity, while pitch is a subjective, fluid one: it is perceptible, yet no human listener can assign it an exact value.
9.4.2.2 Prosody
Prosody is a perceptual characteristic of speech that correlates to changes in pitch
and phoneme duration, as well as significant pauses during a spoken word, which
suggest deeper meaning.
It can also be used to emphasize a single word in a sentence. Prosody research for classification usually assumes that the speech has already been recognized and that prosody offers additional context; prosody may, however, also be used in the absence of recognized speech.
9.4.3.1 ZCR-related features
ZCR is a measure of the number of zero-crossings per second and reflects the spectral content of a speech signal. The fundamental frequency f0 can be estimated using the ZCR.
A zero-crossing-rate-based f0 detector can be built by first filtering out the high-frequency components that would otherwise degrade the measurement. However, the cut-off frequencies of the filters must be chosen with care: if they are not, the filters may partially remove f0 itself in the attempt to remove as much high-frequency content as possible.
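A minimal Python sketch of the ZCR computation; the halving step for f0 assumes a strongly periodic, low-pass-filtered frame, per the caveats above.

import numpy as np

def zero_crossing_rate(frame, fs):
    # Count sign changes between consecutive samples, scaled to per-second.
    signs = np.sign(frame)
    signs[signs == 0] = 1                      # treat exact zeros as positive
    crossings = np.sum(signs[1:] != signs[:-1])
    return crossings * fs / len(frame)

def f0_from_zcr(frame, fs):
    # A pure tone at f0 crosses zero 2*f0 times per second,
    # so halving the ZCR gives a crude f0 estimate.
    return zero_crossing_rate(frame, fs) / 2.0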
9.5 Localization of speaker
Speaker localization with a microphone array typically involves two steps:
1. Calculate the time delay estimate (TDE) between each microphone pair for the source.
2. Use the estimated time delays and the known microphone spacing to determine the source's position.
Microphone arrays are arranged in a circular pattern in this method, but the issue is the acoustic environment in which the array is mounted: the microphones pick up not just the speech signals but also reverberated signals and ambient noise.
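As an illustration of step 1 above, the following Python sketch estimates the time delay between two microphone signals from the cross-correlation peak; the far-field bearing helper and the speed-of-sound constant are standard assumptions, not taken from the chapter.

import numpy as np

def time_delay_estimate(x1, x2, fs):
    # Lag of the cross-correlation peak; positive means x2 lags x1.
    cc = np.correlate(x2, x1, mode="full")
    lag = np.argmax(cc) - (len(x1) - 1)
    return lag / fs

def bearing_deg(t_d, d, c=343.0):
    # Far-field source: cos(theta) = c * t_d / d, with microphone spacing d
    # in metres and c the speed of sound in air (assumed 343 m/s).
    return np.degrees(np.arccos(np.clip(c * t_d / d, -1.0, 1.0)))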
Meeting rooms with various types of sensors have recently become common.
These are referred to as smart meeting spaces, and they use a microphone array to
record multiperson meetings.
Additionally, the registered data can be used to automatically structure and
index meetings.
9.6 Conclusion
On the basis of an extensive literature review, we identified the various factors affecting the speech signal database. The study further motivated us to explore various feature extraction techniques, and feature mapping techniques that identify the speaker based on the extracted features of the speech signal were also examined.
Chapter 10
IoT-based water quality assessment using fuzzy logic controller
School of EEE, SASTRA Deemed University, Thirumalaisamudram, Thanjavur, Tamil Nadu, India
Water is an essential resource that we use in our daily life. Water quality must be monitored in real time to ensure a safe and clean supply to residential areas. A water quality-monitoring and decision-making system (WQMDMS) based on the Internet of Things (IoT) and fuzzy logic is implemented for this purpose to decide the usage of water (drinking or tap water) in a common water tank system. The physical and chemical properties of the water are obtained through continuous sensor monitoring. This work describes in detail the design of a fuzzy logic controller (FLC) for a water quality measurement system that determines the quality of water by decision-making and, accordingly, decides the usage of the water. The WQMDM system measures physico-chemical characteristics of water such as pH, turbidity, and temperature using corresponding analog and digital sensors. The parameter values obtained are used to detect the presence of contaminants, and the quality of the water is determined accordingly. The sensor measurements are handled and processed by an ESP32, and the refined values are passed through the rules determined by a fuzzy inference system (FIS). The output classifies the water quality as very poor, poor, average, or good. The usage of the water is determined by the results obtained from the FLC: as per the water quality percentage, the water is designated as drinking water or tap water.
10.1 Introduction
To understand the quality of water, we should know the chemical, physical, and natural features of the water relative to the norms for its intended use. It is not easy to say that "water is good" or "water is bad." Water quality is generally determined in
relation to the usage of water. ESP32 microcontrollers are used in IoT-based water
quality monitoring systems. The sensing devices like pH, turbidity, and DS18B20
temperature sensors are interfaced with ESP32. The ESP32 receives the data from
sensors and updates them in the cloud platform ThingSpeak with the help of the
Internet via Wi-Fi. The live data from ThingSpeak will be fed into MATLAB® as the
input for the fuzzy logic system.
Fuzzy logic is a computational approach that represents vagueness or uncertainty with degrees of truth rather than the Boolean "good or bad" (the integer values 1 or 0) sense on which modern computers are based. A fuzzy logic controller is a rule-based control system whose output depends on the state of the input and on the rate of change of that state.
In the work done by Unnikrishna Menon et al. [1], a system is described for
the quality assessment and monitoring of the river water based on wireless sensor
networks (WSN) that support both remote and continuous monitoring of the quality
levels of water in India. In their work, they used only a pH sensor and tested the dif-
ferent conditions of pH like lemon juice, rainwater, and drinking water. They used
Zigbee technology to transmit the data from sensing devices to the cluster head.
Bokingkito and Caparida [2] proposed a water quality assessment system for
finding out the decrease in aquaculture (fisheries) production. He and Zhang [3]
presented a work based on wireless water monitoring networks and remote data cen-
ters and suggested using the CC2430 microprocessor as a core hardware platform.
The WSN is constructed on the Zigbee communication module. The WSN scans
the parameter and uses the GPRS DTU with the TCP/IP protocol to send the live
readings to the Internet. Pande et al. [4] presented a manuscript to assess the quality
of drinking water for housing society. The proposed system collects the parameters
like temperature, turbidity, level, and pH to measure the quality of water samples.
They suggested the use of the ESP8266 Wemos D1 mini and Raspberry Pi for simple,
fast, efficient, real-time monitoring of data. Sarwar et al. [5] presented a study on
the designed fire detection and warning systems for buildings based on a fuzzy logic
theory and carried out the simulation tasks in the MATLAB Fuzzy Logic toolbox.
They used Arduino Uno R3 and fuzzy logic to predict only the true incidents of fire
with the data obtained from temperature, humidity, and flame sensor. They proposed
a control mechanism that can activate water showers when it detects fire. Lambrou
et al. [6] worked on the design of monitoring water quality at consumer sites using optical and electrochemical sensors installed in the pipelines. They proposed a system with a PIC32 MCU board that gathers water quality parameters from sensors and sends the data to an ARM platform and to the Internet, which stores the data and sends email or message alerts to the notification node through Zigbee. Using event detection algorithms, alarms were activated when water quality standards were violated.
Faruq et al. [7] designed and implemented a cost-effective, simple water quality
assessment system with calibrated sensors for measuring parameters like tempera-
ture, turbidity, and pH, which will be shown on the LCD monitor. They just detected
the water quality without using IoT. Vigueras-Velázquez et al. [8] discussed a work
to evaluate the freshwater quality in farming tanks to grow whitefish (Chirostoma
estor water quality) and used sensors to measure dissolved oxygen (DO), pH,
temperature, non-ionized ammonia, and total ammonia. The main aim is to maintain
the ideal conditions for the sustained growth of fish. Better aquaculture water assess-
ment was made probable by the implementation of weighted FIS. Pasika and Gandla
[9] also proposed a smart water quality assessment system using Arduino mega with
ultrasonic, pH, turbidity, temperature sensors, and Wi-Fi ESP8266 node MCU with
the cloud platform. Bhagavan and Saranya [10] proposed a sensor network with AI
to identify the pollutants in water so that the water can be subjected to a purification
process. Kothari et al. [11] developed a system to test the rainwater, tap water, well
water, and purified RO water using sensors, viz., for measuring temperature, TDS,
pH, and DO, along with Arduino mega2560 and GSM module. Chowdury et al. [12]
also discussed how the WSN along with a microcontroller is used for processing and
establishing the inter- and intra-node communication among sensors for pollutant
monitoring of water for Bangladeshi populations. They were able to acquire real-time data and to access it using remote monitoring and IoT. The collected data were displayed on a PC through an expert system and deep learning neural network models, compared with typical values, and computerized alert messages were generated if an obtained value exceeded the threshold limit [13–20].
Based on the motivation from the previous studies, the proposed system of
monitoring the water quality and decision-making has been implemented in real
time for consumer application using only ESP32 with pH, temperature, and turbidity
sensors for processing and monitoring, and the fuzzy logic system is implemented
for decision-making. It does not require any data centers as we are storing and dis-
playing the measured water quality parameters in the cloud platform. This proposed
system is economical and cost effective, and the prediction will be more accurate.
We can measure the water quality of water tank systems that are installed in multiple
locations and also get the live data in the cloud system [21–36].
10.2 Experimental procedures
The proposed system uses pH and turbidity sensors along with a DS18B20 temperature sensor to sense the temperature of the water, as shown in Figure 10.1. The hardware is configured, and C program code is scripted in the Arduino IDE to obtain the anticipated format of the sensor data. The sensor parameters, namely temperature, turbidity, and pH values, can be seen in the Arduino serial monitor, as shown in Figure 10.2, and are sent to ThingSpeak; the respective data are obtained in MATLAB via ThingSpeak. In MATLAB, fuzzy logic rules determine the degree of quality and the clearness of the water.
Sensors and microcontroller
pH sensor (E-201C): The pH of a solution is a quality metric that represents its acidity or basicity. The pH scale is a logarithmic scale of the hydrogen ion concentration in solution (range 0–14, with a neutral point of 7). Values above 7 indicate a basic or alkaline solution, and values below 7 indicate an acidic solution. The sensor operates on a 5-V power supply and is presented in Figure 10.2. Its response time is <1 min, and its internal resistance is less than or equal to 250 MΩ. For drinking water, the pH should be between 6.5 and 8.5. The sensor needs to be calibrated, after which it will display the voltage and pH value.
10.3 Working
The suggested system adopts three sensing devices (temperature, pH, and turbidity), the ESP32, and the ThingSpeak platform. Only one microcontroller, with built-in Wi-Fi and Bluetooth modules, is used. The first stage in utilizing fuzzy logic rules to evaluate the water condition is obtaining the input values from the sensors and determining the degree of membership for each input value using fuzzy expressions (the membership functions). The sensors (pH, turbidity, and DS18B20 temperature sensor) interfaced with the ESP32 microcontroller are processed using the program in the Arduino IDE, from which the sensor readings are sent to ThingSpeak. The live readings from ThingSpeak are fed into the FLC in MATLAB. Figure 10.7 shows the hardware circuit connections.
Fuzzy logic designer
Figure 10.8 depicts a block schematic of the proposed system that uses the
fuzzy-based decision-making system. These steps of the proposed fuzzy logic
decision-making system comprise (a) initialization of linguistic variables, member-
ship functions, and construction of rules, (b) fuzzification where the crisp values of
input data are converted to fuzzy values using membership functions, (c) evaluation
of knowledge-based rules and combining the results of each rule, and (d) defuzzifi-
cation where the output values are transformed to non-fuzzy values.
The hardware and software components, namely the ESP32, the three sensors (temperature, pH, and turbidity), MATLAB, and ThingSpeak, make up the whole system.
Membership function of temperature
Three levels of temperature are used: low, medium, and high. The low level has a range of –50°C to 20°C, the medium 20°C to 40°C, and the high 40°C to 100°C. The triangular membership function is used.
Membership function of pH
Figure 10.11 displays three levels of pH: acid, neutral, and base. The pH scale ranges from 0 to 14; the acid level spans 0–6.5 pH, the neutral level 6.5–7.5 pH, and the base level 7.5–14 pH.
Membership function of turbidity
Figure 10.12 shows three levels of turbidity: low, mid, and high. Turbidity ranges from 0 to 200 NTU; the low level spans 0–5 NTU, the medium level 5–30 NTU, and the high level 30–200 NTU.
Membership function of output water quality
The linguistic variables of the output are divided into four categories (very poor, poor, average, and good). The quality of water is determined by the ranges: very poor from 0% to 30%, poor from 30% to 60%, average from 60% to 80%, and good from 80% to 100%, as shown in Figure 10.13.
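A minimal Python sketch of the triangular membership function used above; the pH breakpoints are hypothetical and merely mirror the ranges quoted in the text.

def trimf(x, a, b, c):
    # Triangular membership: rises from 0 at a to 1 at b, falls back to 0 at c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical breakpoints that mirror the pH ranges quoted above
ph_acid    = lambda ph: trimf(ph, -1.0, 3.0, 6.5)
ph_neutral = lambda ph: trimf(ph, 6.5, 7.0, 7.5)
ph_base    = lambda ph: trimf(ph, 7.5, 11.0, 15.0)

print(ph_neutral(7.05))   # strong membership in "neutral"
print(ph_acid(5.2))       # partial membership in "acid"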
Fuzzy rule editor for water quality
Figure 10.14 shows the MATLAB rule editor, where the rule sets for fuzzy water quality are implemented. The first box on the left represents the pH variables, the second box the turbidity variables, and the third box the temperature variables; these are the inputs. The box on the right represents the level of water quality.
The results of four different samples of water are displayed as rule viewer
in Figures 10.15–10.18. The first three columns represent the pH, turbidity, and
temperature, and the last column indicates the water quality in percentage.
Figure 10.15 displays the first water sample with a quality of 70%: the pH is 7.05, the turbidity is 24.1 NTU, and the temperature is 34.2°C, indicating that it can be used as tap water for cleaning and washing. The second sample has a pH of 5.2, a turbidity of 110 NTU, and a temperature of 28.9°C; its water quality percentage is 15, as shown in Figure 10.16, so it cannot be used as drinking water. The third sample has a pH of 6.85, a turbidity of 2.19 NTU, and a temperature of 27.8°C; its water quality percentage is 90, as shown in Figure 10.17, which implies that the water is good and can be used as drinking water. The final sample has a pH of 8.18, a turbidity of 81 NTU, and a temperature of 46.5°C; its water quality percentage is 50, as shown in Figure 10.18, so it can be used for cleaning purposes. Figure 10.19
displays the surface viewer of the water quality decision-making system.
Defuzzification
The monitoring and decision-making of the water quality system are completed by inference and defuzzification, which are built-in functions executed by MATLAB. Defuzzification is thus the final step in implementing the FLC, where the output is expressed as if-then condition statements and stored in the knowledge-based system database. Here, fuzzification of the scalar values, application of the rules, generation of the fuzzy output, and conversion back to a scalar quantity take place. The commonly used defuzzification methods are the centroid and weighted average methods. The centroid method is used here as it is accurate and efficient [5]. After the defuzzification process, the water quality percentage is displayed.
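A toy Python sketch of centroid defuzzification; the aggregated output below assumes hypothetical rule activations (0.6 for "average", 0.3 for "good") rather than the chapter's exact MATLAB design.

import numpy as np

def centroid_defuzzify(universe, membership):
    # Centre of gravity of the aggregated fuzzy output.
    return np.sum(universe * membership) / np.sum(membership)

# Aggregated output over the 0-100 % quality universe, assuming the rules
# fired "average" at strength 0.6 and "good" at strength 0.3.
u = np.linspace(0, 100, 101)
average = np.clip(np.minimum((u - 55) / 15, (85 - u) / 15), 0, 0.6)
good = np.clip(np.minimum((u - 75) / 15, (105 - u) / 15), 0, 0.3)
aggregated = np.maximum(average, good)
print(round(centroid_defuzzify(u, aggregated), 1))   # crisp quality percentage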
Figure 10.20 shows the hardware connections of the sensors (pH, turbidity, and
DS18B20 temperature sensor) with the ESP32 microcontroller and interfaced with
the Arduino IDE; from there, the sensor readings will be sent to ThingSpeak.
Several trials were conducted to evaluate the proposed water quality monitoring system's performance, as shown in Tables 10.1 and 10.2; the performance depends on the condition of the water samples. Of these trials, three instances are visually represented, with bad, average, and acceptable water quality, respectively. The water quality index in the trials ranges from 0% to 100% in terms of the changes in pH, turbidity, and temperature.
ThingSpeak data
ThingSpeak is an open-source cloud-based platform for gathering, visualizing, and analyzing live data received from sensors. The ThingSpeak application is available as a MATLAB library function, and we use it to get the real-time water metrics. The data from the sensors are displayed in the ThingSpeak interface, as shown in Figures 10.21–10.23, and are then sent to the FLC, where the water quality is calculated and displayed for three different samples.

Table 10.1 Water quality results for the tested samples

Water sample   pH     Turbidity (NTU)   Temperature (°C)   Water quality (%)   Quality level
1              6.8    3.4               28.8               90                  Good
2              6.2    15                30                 70                  Average
3              7.1    15.3              27.5               70                  Average
4              6.9    28                7.25               50                  Poor
5              6.3    200               27                 30                  Very poor
6              7.2    2                 31                 90                  Good
7              8      20                26                 40                  Poor
8              6      10                40                 50                  Poor
9              6.85   2.19              27.8               90                  Good
10             8.18   81                46.5               50                  Poor
11             5.2    110               28.9               15                  Very poor
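For illustration, live feeds can also be pulled from ThingSpeak's public read API in Python, as sketched below; the channel ID, read key, and field-to-parameter mapping are placeholders, not values from the chapter, and the requests library is assumed to be installed.

import requests

# Placeholder channel ID and read key; the field numbering assumes
# field1 = pH, field2 = turbidity, field3 = temperature.
CHANNEL_ID = "0000000"
READ_KEY = "XXXXXXXXXXXXXXXX"

url = f"https://api.thingspeak.com/channels/{CHANNEL_ID}/feeds.json"
resp = requests.get(url, params={"api_key": READ_KEY, "results": 1}, timeout=10)
latest = resp.json()["feeds"][-1]                 # most recent entry
ph, turbidity, temp = (float(latest[f"field{i}"]) for i in (1, 2, 3))
print(ph, turbidity, temp)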
Results of MATLAB implementation
The water quality percentage of the three different samples determined using the
FLC designed in MATLAB is displayed in Figures 10.24–10.26.
After the experiments, all the results were obtained; the data are given in Tables 10.1 and 10.2. As seen from the tables, the water quality percentages obtained range from 15% to 90%, which shows that the suggested system works in accordance with the rules defined in the FLC.
10.5 Conclusion
In the proposed system, we have successfully constructed the hardware and soft-
ware tools for WQMDMS using the ESP32 microcontroller, sensors like pH, tur-
bidity, and DS18B20 temperature sensor, MATLAB-based fuzzy logic control, and
ThingSpeak cloud platform. The circuit operates in error-free conditions and pro-
vides us with the expected output. Continuous data samples can be obtained using
the present proposed system that will help us to determine the water quality more
accurately. The proposed system is portable and can determine the water quality of tank systems across multiple areas. We can also detect leakages in the pipeline by
seeing the variation in water quality levels across the different locations. The future
scope of the system is as follows:
References
[1] Unnikrishna Menon K.A., Divya P., Ramesh M.V. ‘Wireless sensor network
for river water quality monitoring in india’. Third International Conference
on Computing, Communication and Networking Technologies (ICCCNT’12),
IEEE; Coimbatore, India, 2012. pp. 1–7.
[2] Bokingkito P.B., Caparida L.T. ‘Using fuzzy logic for real - time water quality
assessment monitoring system’. Proceedings of the 2018 2nd International
Conference on Automation, Control and Robots (ICACR 2018). Publisher:
Association for Computing Machinery (ACM); Bangkok Thailand, 2018. pp.
21–25.
[3] He D., Zhang L.X. ‘The water quality monitoring system based on WSN’.
2nd International Conference on Consumer Electronics, Communications
and Networks (CECNet), IEEE; Yichang, China, 2012. pp. 3661–64.
[4] Pande A.M., Warhade K.K., Komati R.D. ‘Water quality monitoring system
for water tanks of housing society’. International Journal of Electronics
Engineering Research. 2017, vol. 9(7), pp. 1071–8.
[5] Sarwar B., Bajwa I., Ramzan S., Ramzan B., Kausar M. ‘Design and applica-
tion of fuzzy logic based fire monitoring and warning systems for smart build-
ings’. Symmetry. 2018, vol. 10(11), p. 615.
[6] Lambrou T.P., Anastasiou C.C., Panayiotou C.G., Polycarpou M.M. ‘A low-
cost sensor network for real-time monitoring and contamination detection in
drinking water distribution systems’. IEEE Sensors Journal. 2014, vol. 14(8),
pp. 2765–72.
[7] Faruq M.O., Emu I.H., Haque M.N., Dey M., Das N.K., Dey M. ‘Design
and implementation of cost-effective water quality evaluation system’. IEEE
Region 10 Humanitarian Technology Conference (R10- HTC); Publisher:
IEEE, Dhaka, Bangladesh, 2017. pp. 860–63.
[8] Vigueras-Velázquez M.E., Carbajal-Hernández J.J., Sánchez-Fernández L.P.,
Vázquez-Burgos J.L., Tello-Ballinas J.A. ‘Weighted fuzzy inference system
for water quality management of Chirostoma estor estor culture’. Aquaculture
Reports. 2020, vol. 18, p. 100487.
[9] Pasika S., Gandla S.T. ‘Smart water quality monitoring system with cost-
effective using IoT’. Heliyon. 2020, vol. 6(7), e04096.
[10] Bhagavan N.V.S., Saranya P.L. 'Water pollutants monitoring based on internet
of things' in Inorganic pollutants in water. Elsevier; 2020. pp. 371–97.
[11] Kothari N., Shreemali J., Chakrabarti P., Poddar S. 'Design and implemen-
tation of iot sensor based drinking water quality measurement system'.
Materials Today: Proceedings. 2021, vol. 3, pp. 1–10.
[12] Chowdury M.S.U., Emran T.B., Ghosh S., et al. ‘IoT based real-time river
water quality monitoring system’. Procedia Computer Science. 2019, vol.
155(3), pp. 161–8.
[13] Kumar S., Cengiz K., Trivedi C.M., et al. ‘DEMO enterprise ontology with
a stochastic approach based on partially observable Markov model for data
[25] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘An NS3 implementa-
tion of physical layer based on 802.11 for utility maximization of WSN’. 2015
International Conference on Computational Intelligence and Communication
Networks (CICN), IEEE; Jabalpur, India, 2016. pp. 79–84.
[26] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘A utility maximiza-
tion approach to MAC layer channel access and forwarding’. Progress in
Electromagnetics Research Symposium, Publisher: The Electromagnetics
Academy; Prague, Czech Republic, 2015. pp. 2363–67.
[27] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘EMEEDP: enhanced
multi-hop energy efficient distributed protocol for heterogeneous wireless
sensor network’. Fifth International Conference on Communication Systems
and Network Technologies, IEEE; Gwalior, India, 2015. pp. 194–200.
[28] Kumar S., Ramaswami R., Rao A.L.N. ‘Energy optimization in distrib-
uted localized wireless sensor networks’. Proceedings of the International
Conference on Issues and Challenges Intelligent Computing Technique
(ICICT), IEEE; Ghaziabad, India, 2014. pp. 350–55.
[29] Sudhakaran S., Kumar S., Ranjan P., Tripathy M.R. ‘Blockchain-based trans-
parent and secure decentralized algorithm’. International Conference on
Intelligent Computing and Smart Communication. Algorithms for Intelligent
Systems, Springer; THDC-IHET, 2020. pp. 327–36.
[30] Kumar S., Trivedi M.C., Ranjan P. Evolution of Software-Defined Networking
Foundations for IoT and 5G Mobile Networks. Hershey, PA: IGI Publisher;
2020. p. 350.
[31] Sampathkumar A., Rastogi R., Arukonda S., Shankar A., Kautish S., Sivaram
M. ‘An efficient hybrid methodology for detection of cancer-causing gene
using CSC for micro array data’. Journal of Ambient Intelligence and
Humanized Computing. 2020, vol. 11, pp. 4743–51.
[32] Nie X., Fan T., Wang B., Li Z., Shankar A., Manickam A. ‘Big data analyt-
ics and IoT in operation safety management in under water management’.
Computer Communications. 2020, vol. 154(1), pp. 188–96.
[33] Shankar A., Jaisankar N., Khan M.S., Patan R., Balamurugan B. ‘Hybrid
model for security‐aware cluster head selection in wireless sensor networks’.
IET Wireless Sensor Systems. 2019, vol. 9(2), pp. 68–76.
[34] Shankar A., Pandiaraja P., Sumathi K., Stephan T., Sharma P. ‘Privacy pre-
serving E-voting cloud system based on ID based encryption’. Peer-to-Peer
Networking and Applications. 2021, vol. 14(4), pp. 2399–409.
[35] Bhardwaj A., Shah S.B.H., Shankar A., Alazab M., Kumar M., Gadekallu
T.R. ‘Penetration testing framework for smart contract blockchain’. Peer-to-
Peer Networking and Applications. 2021, vol. 14(5), pp. 2635–50.
[36] Kumar A., Abhishek K., Nerurkar P., Ghalib M.R., Shankar A., Cheng X.
‘Secure smart contracts for cloud- based manufacturing using Ethereum
blockchain’. Transactions on Emerging Telecommunications Technologies.
2020, vol. 33(4).
Chapter 11
Design and analysis of wireless sensor
network for intelligent transportation and
industry automation
Prabhakar D. Dorge¹, Prasanna M. Palsodkar², and Divya Dandekar³

¹Department of Electronics and Telecommunication Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India
²Department of Electronics Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India
³Department of Electronics Engineering, Hochschule Bremen University, USA
This work is based on wireless sensor networks (WSNs), which consist of a number of device nodes, often called sensor nodes, that are connected to one another by wireless communication. There are numerous assumptions about, and general properties of, WSNs, and so many applications of WSNs exist around the world that it is impossible to cover all their application areas. Applications of WSNs span ecological and animal monitoring, factory and manufacturing monitoring, farming monitoring and mechanization, health monitoring, and many other areas. One of the main characteristics of WSNs is that they are strongly coupled with their application. In this chapter, WiMAX without a wormhole attack is explained, and the related results are presented with their outputs. The NS2 simulator is used to carry out all the simulations.
11.1 Introduction
A wireless network is any type of computer network that uses wireless data connections for linking network nodes. Wireless networks are computer networks that are not connected by cables of any kind. Using a wireless network allows enterprises to avoid the costly process of introducing cables into buildings or as a connection between different equipment locations. The basis of wireless systems is radio waves, an implementation that takes place at the physical level of the network structure. Wireless technologies differ in the amount of bandwidth they offer and in how far apart communicating nodes can be. Other significant differences include which part of the electromagnetic spectrum they use and how much power they consume. In this chapter, we discuss four prominent wireless technologies: Bluetooth, Wi-Fi, WiMAX, and 3G cellular wireless, presented in order from the shortest range to the longest. Most wireless links in use today are asymmetric, i.e., the two endpoints are usually different kinds of nodes. One endpoint, sometimes called the base station, usually has no mobility but has a wired connection to the Internet or other networks, while the node at the opposite end of the link, the "client node", is often mobile and relies on its link to the base station for communication with other nodes.
Multiple-input multiple-output (MIMO) is a technique for multiplying the
capacity of a radio link using many transmitting and receiving antennas to utilize
multipath propagation. The advantages of using MIMO are increasing link capacity
and spectral efficiency. Here we are dealing with multiple radio channels, i.e., the
WiMAX-based WSN system will transmit and receive the data through multiple
radio channels. Multiple transmitters can transmit data for a particular receiver at a
time; such a technique is called multiuser MIMO [1]. By using multiple transmitting and receiving antennas or radio channels, MIMO offers additional degrees of freedom for data transmission. In a single-user wireless communication system, it
has been observed that using the MIMO technique can lead to impressive improve-
ment in capacity and link reliability. The MIMO technique has a great potential to
improve the throughput, delay, and jitter performance [2]. MIMO plays a significant
role in any wireless communication system. WiMAX uses the MIMO technique in
terms of multiple radio channels, which results in an improvement in the perfor-
mance of WiMAX-based systems.
In today's era of wireless communication systems, the strength and quality of the signal received by the user depend upon various factors. Similarly, in the transceiver system, several factors govern signal strength and quality, among them the modulation technique, data rate, coding scheme, power constraint, and path loss factor. To satisfy the need for effective signal transmission and to obtain good signal quality at the receiver or user end, these parameters have to be adjusted according to the channel parameters. The technique employing this strategy is called the link adaptation technique.
In the link adaptation technique, various constraining factors are adjusted or
adapted according to the required radio link parameters to provide good-quality sig-
nals. Recently, an alternative link adaptation technique called AMC has come up to
improve the overall system capacity. AMC allows matching of modulation and cod-
ing method along with protocol and signal parameters to those with the conditions
on the radio link such as path loss, sensitivity of the receiver, power margin of the
transmitter, and so on.
The process of AMC is a dynamic one as the protocol and signal conditions on
radio link alter frequently. The main purpose of AMC is to maintain an acceptable
bit error rate as well as to make more effective use of channel capacity. Various
transmission parameters that can be adapted are data rate, coding rate, error prob-
ability and transmitted power.
Another approach is to assume approximately the same channel from TX to RX and from RX to TX, as in the time-division duplex method. Using the link state information available at the TX, adaptive modulation systems provide an improvement in the transmission rate.
As per the AMC concept, the modulation format is matched to the SINR of the end user. The channel is made approximately constant by selecting small time–frequency bins, so that for each time slot a separate channel is represented by each of these bins. An efficient technique is to allow only the one user having the best channel to transmit in each of the parallel channels. For lower SINR values, the modulation technique used is QPSK, whose smaller constellation size provides reliable and robust transmission. As the SINR increases, modulation techniques with bigger constellation sizes are employed, such as 8-PSK, 16-QAM, and 64-QAM. For higher SINR values, the reason for using a bigger constellation is to achieve significant modulation rates with low error probability.
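A small Python sketch of this threshold-based AMC selection; the SINR thresholds are placeholders, since real systems derive them from target bit-error-rate curves for each scheme.

# Placeholder SINR thresholds in dB; real deployments derive these
# from target bit-error-rate curves for each scheme.
AMC_TABLE = [
    (6.0, "QPSK"),      # smallest constellation, most robust
    (12.0, "8-PSK"),
    (18.0, "16-QAM"),
    (24.0, "64-QAM"),   # highest rate, needs the cleanest channel
]

def select_modulation(sinr_db):
    # Pick the largest constellation whose threshold the SINR meets.
    chosen = AMC_TABLE[0][1]
    for threshold, scheme in AMC_TABLE:
        if sinr_db >= threshold:
            chosen = scheme
    return chosen

print(select_modulation(9.5))    # low SINR  -> QPSK
print(select_modulation(26.0))   # high SINR -> 64-QAM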
MIMO promises a significant increment in throughput and ranges of wireless
communication without any increase in transmit power. A MIMO system relies on
techniques such as spatial multiplexing, transmit diversity, and beamforming to
improve the quality of transmission, data rates, and received signal gain as well as
to reduce interference. Assume a communication system with $n_T$ TX antennas and $n_R$ RX antennas. The received signal is

$$\mathbf{r}(t) = \mathbf{H}\,\mathbf{s}(t) + \mathbf{v}(t)$$

Here, $\mathbf{r}(t) = [r_1(t), r_2(t), \ldots, r_{n_R}(t)]^T$ is the receive-side signal at time instant $t$, $\mathbf{s}(t) = [s_1(t), s_2(t), \ldots, s_{n_T}(t)]^T$ is the transmitted signal, and $\mathbf{v}(t)$ is AWGN with unit variance, uncorrelated among the $n_R$ RX antennas. RX antenna $i$ receives a superposition of every transmitted signal from TX antenna $j$, weighted by the channel response, with some AWGN added.

The $n_R \times n_T$ channel matrix is formed from the elements $h_{i,j}$:

$$\mathbf{H} = \begin{pmatrix} h_{1,1} & \cdots & h_{1,n_T} \\ \vdots & \ddots & \vdots \\ h_{n_R,1} & \cdots & h_{n_R,n_T} \end{pmatrix}$$

Here, $h_{i,j}$ represents the complex channel coefficient between the $j$th TX antenna and the $i$th RX antenna. The transmitted power is given by $P_t^j = P_T$ for $j = 1, 2, \ldots, n_T$.
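A NumPy sketch of this $\mathbf{r}(t) = \mathbf{H}\mathbf{s}(t) + \mathbf{v}(t)$ model with an assumed Rayleigh-fading channel; the zero-forcing detection at the end is an added illustration, not something the chapter specifies.

import numpy as np

rng = np.random.default_rng(0)
n_t, n_r = 4, 4                                   # 4 x 4 MIMO, as in the text

# Complex Rayleigh-fading channel matrix H of size n_r x n_t
H = (rng.standard_normal((n_r, n_t))
     + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
s = rng.choice(qpsk, size=n_t)                    # transmitted QPSK symbols
v = (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)) / np.sqrt(2)

r = H @ s + v                                     # r(t) = H s(t) + v(t)
s_hat = np.linalg.pinv(H) @ r                     # zero-forcing estimate of s
print(np.round(s_hat, 2))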
Recently, the requirement for multimedia services with high quality-of-service demands has been increasing [2]. The speed of the network depends on multipath propagation and path loss. The key difference between this WSN setting and a MANET lies in the high-speed mobility models, the fast-changing topology, and the capability of mobility prediction. The 4 × 4 MIMO system gives superior performance in the presence of interference and also improves the system's capacity.
The transmitting and receiving antenna sections are used to mitigate the multi-antenna fading channel. MIMO channels can be exploited to improve the wireless link and double the capacity without increasing the bandwidth or the transmit power of the antenna. To defeat channel impairments, we used adaptive modulation for MIMO systems. AMC and MIMO together increase the information rate of the system. Knowledge of the link state information at the TX is critical for multi-user MIMO, while it is not as important for single-user MIMO [3].
In an ad hoc network, the nodes do not know the topology of the network in advance. A reactive routing protocol (RP) is a bandwidth-efficient on-demand routing protocol for such networks; examples of reactive RPs are Ad hoc On-Demand Distance Vector (AODV), Ad hoc On-Demand Multipath Distance Vector (AOMDV), and Dynamic Source Routing (DSR).
In a proactive RP, each node maintains one or more tables that describe the whole network topology. These tables are regularly updated so as to provide up-to-date routing information from every node to every other node. Destination Sequenced Distance Vector (DSDV) is a proactive RP.
In a reactive protocol, information regarding the topology is transmitted only on demand. AODV uses the shortest or fastest path for the transmission of data. The AODV RP finds routes from the transmitter to the receiver only on demand, i.e., only when the source node wants to transmit data packets is a route discovered. It is a unipath routing protocol. If a route to the required destination is not available, the source node broadcasts a route request (RREQ) to every neighboring node in the network, and as a result it may learn many routes to different destinations from just one RREQ message. AODV utilizes destination sequence numbers to determine up-to-date paths and to find fresh routes to the receiver. The source node receives route reply (RREP) packets from intermediate nodes that have a valid route to the destination node; otherwise, the RREPs are sent directly by the receiving node to the transmitting node. If the path breaks in between and an intermediate node detects it, the intermediate node informs the end nodes about the breakage by sending a Route Error message, and the corresponding entry is deleted from the table by the end node. The source node then initiates the search for a new path with an incremented destination sequence number and a new broadcast identifier. For the maintenance of routes in AODV, a periodic exchange of HELLO messages is performed. Every time RREQ messages are broadcast by a source node to its neighboring nodes, a reverse path is set up, and a unique ID is allotted to it. Every node checks the address of the initiator and this unique ID and rejects the message if it has already processed that request. The AODV routing protocol creates a problem during the transmission of data in node-to-node communication [4]. At the same time, however, AODV effortlessly overcomes the
counting to infinity and Bellman-Ford problems, and it also provides rapid conver-
gence whenever the ad hoc network topology is changed [5].
The AOMDV RP is also used for sensor networks and is basically aimed at finding disjoint paths. When a node receives a duplicate route advertisement message, it denotes an alternative route to the receiver.
AOMDV is used for finding node-disjoint routes as well as link-disjoint routes. For node-disjoint routes, every RREQ arriving via a different neighbor of the source defines a node-disjoint path. For finding link-disjoint routes, the receiver replies to duplicate copies of the RREQ. Since the first hop of each is different, the RREPs travel along reverse paths that are node-disjoint and therefore link-disjoint. The paths of the RREPs may pass through the same intermediate node at some point, but each of them takes a different reverse route to the transmitter node in order to ensure disjointness. AOMDV has been verified to be a superior protocol that uses multipath routes. Advantages of the AOMDV RP: it is a distributed protocol for discovering link-disjoint paths, and it reduces overhead by providing multiple paths.
Disadvantages of the AOMDV routing protocol: it has additional overhead for route discovery via RREPs, and its periodic route discovery consumes extra bandwidth.
The literature emphasizes the issue of improving road safety and transport effectiveness through the use of WSNs. The authors there considered the issue of safety in vehicular communication and surveyed recent approaches and protocols for safety-related applications in WSNs. A real-time route planning algorithm is used to ease traffic congestion in city areas. First, the authors established a hybrid intelligent transportation system and then proposed a novel lane planning algorithm that outperforms conventional distributed lane planning algorithms in spatial utilization. Furthermore, Miao Wang et al. designed an efficient coordinated charging strategy, through which they achieved improvements in energy utilization and a reduction in electric vehicles' charging cost. Other works address the availability of IPTV services over WSNs; a communication scenario to verify wireless communication performance and its operating reliability; record duplication as a scheme for data division in WSNs, together with a comparison of various database replication strategies; and a location verification protocol for NLOS (non-line-of-sight) conditions in WSNs. Through simulation results, the authors proved that the NLOS condition can be overcome by using the location verification protocol among neighboring vehicles, and thus the integrity of localization services for WSNs can be secured. WSN connectivity in the case of limited RSU (road-side unit) deployment has also been analyzed, and enhancements have been provided.
As a route request packet propagates through the network, if the destination generates the route reply, it copies the route record from the route request packet into the route reply packet. Figures 11.1–11.5 show the path response by the receiving node itself. Adding new metrics and making a few changes in the operation of the DSR protocol using a fuzzy interface system increases its performance in real-time applications. In some applications, the DSR protocol lags behind other reactive routing protocols because, if a source has more than one route in its cache, which route to choose depends entirely on the source. When energy efficiency is considered, the DSR protocol also lags behind other reactive routing protocols; the reason is node mobility and node failures.
Advantages of DSR RP:
Paths remain only between nodes that require transmission.
Disadvantages of DSR RP:
• Incremental updates can be used instead of full dump updates to avoid extra
traffic.
The authors designed a high-speed address generator scheme required for address generation in the deinterleaver of a WiMAX receiver system, proposing a novel application-specific integrated circuit (ASIC)-based design for the address generator modeled in VHDL. Other authors have worked on both Wi-Fi and WiMAX technologies, evaluating a multi-vehicle-to-infrastructure WSN that uses Wi-Fi for V2V communication and WiMAX for vehicle-to-infrastructure communication. They analyzed the WiMAX performance of efficient wireless channels using image and speech transmission, thoroughly explaining WiMAX system modeling with a proper selection of wireless channels such as AWGN, Rayleigh, and Rician so as to control the BER. The operational inference on WSN is 802.16e (WiMAX) and 802.11p.
The authors then analyzed the performance of wireless broadband. They observed that a large portion of the delay in the handover process is due to the deep computing occurring during the authentication process. The WiMAX topology shows that data rates degrade as the distance grows beyond 10,000 m. The authors showed that the independent MCS level is better than the others. The performance of proactive RPs in WSNs over TCP and CBR connections has also been analyzed, using parameters such as PDR and PLR. Similar performance evaluation of WSN routing protocols was done by Nicholas et al. in the field of large-scale urban environments; the routing protocols used were GPSR, Vehicle-Assisted Data Delivery, and LOUVRE. WiMAX allows a larger number of users in a short coverage area. Where obstacles are present, the actual speed may fall below 20 Mbps, but WiMAX can offer reliable delivery of data and support mobile subscribers at vehicular mobility. IEEE 802.16e is designed both to achieve high-speed data services and to provide mobile users with broadband wireless access solutions. WiMAX enables higher mobility for high-speed data applications.
The number of IFFT and FFT points decides the number of subcarriers generated for the given OFDM system. For an OFDM symbol, the orthogonal subcarrier frequencies are given by

$$f_k = \frac{k}{T_{MC}}, \qquad k = 0, 1, \ldots, N-1 \tag{11.1}$$

Here, $1/T_{MC}$ is the intercarrier spacing, and $k$ is the index of the subcarrier whose frequency is to be calculated. The corresponding $k$th subcarrier at frequency $f_k$ can therefore be written as $s_k(t) = e^{\,j 2\pi f_k t}$.
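In practice, the $N$ orthogonal subcarriers are generated with a single IFFT, as the following Python sketch shows; the FFT size of 64 and the cyclic-prefix length are illustrative choices, not parameters from the chapter.

import numpy as np

N = 64                                     # number of subcarriers (IFFT size)
rng = np.random.default_rng(1)

# One QPSK symbol per subcarrier; the IFFT sums the N orthogonal
# carriers exp(j*2*pi*k*t/T_MC) in a single operation.
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
X = rng.choice(qpsk, size=N)
x = np.fft.ifft(X)                         # time-domain OFDM symbol

cp = 16                                    # cyclic-prefix length (illustrative)
tx_symbol = np.concatenate([x[-cp:], x])   # prepend the cyclic prefix
print(tx_symbol.shape)                     # (80,)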
The Wi-Fi sensor system is a collection of sensor nodes (SNs). In terms of the storage, mining, and processing of data, each SN consists of a sensing subsystem, a processing machine, and a communication scheme.
11.3 WSN application
WSNs have various applications; a few are the following.
Military applications: WSNs are potentially an important part of military command, control, and communication for computing intelligence, battlefield surveillance, reconnaissance, and targeting systems.
Area monitoring: sensor nodes are deployed in a region where some phenomenon is to be monitored. When a sensor detects the event being observed, such as high temperature or pressure, the event is reported to a base station.
Transportation: real-time traffic information is collected by WSNs to later feed transport models and to alert drivers of congestion and traffic problems.
Health applications: health applications supported include interfaces for the disabled, patient monitoring, diagnostics, drug administration in hospitals, tele-monitoring of human physiological data, and the tracing and monitoring of doctors or patients within a hospital.
Environmental sensing: environmental sensor networks have been developed to cover many applications, including air pollution monitoring, forest fire detection, greenhouse monitoring, and landslide detection.
Structural monitoring: wireless sensors can be used to monitor movement within buildings and infrastructure such as bridges, flyovers, embankments, and tunnels.
11.4 Limitations of WSN
1. They have a very small storage capacity – a few hundred kilobytes.
2. They have modest processing power – around 8 MHz.
3. They work over a short range – and communication consumes a lot of power.
4. They require low energy – which constrains the protocols.
11.5 Literature survey
In the literature, wireless networks are shown to be vulnerable to many attacks, one of which is known as the wormhole attack. The wormhole attack is very powerful, and preventing it has proved to be very difficult. In such attacks, two or more malicious colluding nodes create a higher-level virtual tunnel in the network, which is employed to transport packets between them. One work presents a novel trust-based framework for identifying and isolating nodes that create a wormhole in the network without requiring any cryptographic means, and establishes that the scheme functions efficiently in the presence of malicious colluding nodes and does not impose any unnecessary conditions upon the network establishment and operation stages [1].
Vehicular ad hoc networks are predicated on short-range Wi-Fi technology; candidates for longer-range wireless technologies are cellular and WiMAX. The intended device offers a pair of radio channels between transceivers for the transmission and reception of information using the idea of MIMO. Furthermore, AMC offers a range of modulation methods depending on the signal-to-noise ratio of the channel. Together, these two schemes offer a massive improvement in the quality of the resulting network [2, 3]. The Zigbee cluster tree has been documented as a Zigbee topology particularly suitable for WSNs that demand low power and reduced maintenance, since it supports sturdy packet delivery schedules [4–8].
11.6 Related work
Figure 11.2 shows that the wormhole attack involves two or more nodes and the tunnel between them. The attacking nodes capture packets or data at one location and replay them at another, remotely placed node, making the two locations appear to be neighbouring [9–11]. They can then carry out several types of attacks against the circulating information flows, including selective dropping. The malicious nodes can either reveal themselves or stay masked: the former case is known as an exposed or open wormhole attack, while the latter is a hidden or closed one [12–14].

Table 11.1 Simulation parameters

Constraint            Value
Frequency             2.4 GHz
Bandwidth             20 MHz
Propagation model     Two-ray ground (TRG)
MAC                   WiMAX
Nodes                 10
Simulation period     10 s
Area size             500 m × 500 m
11.7 Methodology
Simulation parameters are important for the design of any system, as they provide information about the nature of the wireless system. Table 11.1 shows the various simulation constraints used to design the wireless communication system. They are decided on the basis of the applications that are targeted [15–33].
The QoS parameters of vehicular ad hoc networks are important for evaluating the WSN system. The different QoS parameters used in this WSN system are explained below.
11.7.1 Throughput
Throughput is the average rate of successful packet transfer between the transmitter and receiver of a system over a communication channel. The throughput of a network should be as high as possible. Its unit is bps.
Throughput = (Total packets received × Packet size × 8) / (Time taken for transmission of all packets)
11.7.2 Delay
Delay is the time taken by packets to travel from the transmitter (TX) to the receiver (RX). Delay arises due to channel conditions, traffic, improper routing and so on. Delay is an important performance parameter of any network: it shows how fast the data transmission process is carried out, so it should be as low as possible. Delay is measured in seconds and is calculated by the following expression.
Delay = Packet receive time – Packet send time
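As a concrete illustration, the following sketch computes these QoS metrics – throughput, packet delivery ratio (PDR) and average delay – from a packet trace. The Packet record and all values are hypothetical assumptions for illustration, not output of the simulation described in this chapter; in practice, such values would be parsed from the simulator's trace file.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    send_time: float               # seconds
    receive_time: Optional[float]  # seconds; None if the packet was lost
    size_bytes: int

# Hypothetical trace of sent packets.
sent = [Packet(0.00, 0.10, 1000), Packet(0.05, 0.16, 1000),
        Packet(0.10, None, 1000), Packet(0.15, 0.27, 1000)]
received = [p for p in sent if p.receive_time is not None]

# Throughput = (total packets received * packet size * 8) / transmission time
total_time = max(p.receive_time for p in received) - min(p.send_time for p in sent)
throughput_bps = sum(p.size_bytes for p in received) * 8 / total_time

# Packet delivery ratio (PDR) = packets received / packets sent
pdr = len(received) / len(sent)

# Average delay = mean(packet receive time - packet send time)
avg_delay = sum(p.receive_time - p.send_time for p in received) / len(received)

print(f"Throughput: {throughput_bps:.1f} bps  PDR: {pdr:.2%}  Delay: {avg_delay:.3f} s")
```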
The QoS values obtained for the designed WSN system are as follows.

Parameter     Value
Throughput    1575.204 bps
PDR           77.76%
Delay         1.69 × 10⁻⁵ s
• Create vehicular nodes according to the real-time vehicular system. Each vehicular node has its own source and destination locations, both of which should lie within the simulation area. The vehicular nodes move at some speed, so assign a different speed to each vehicular node. Give a different label to each of the vehicular nodes; here the vehicular nodes are labelled V1, V2, V3 and so on.
• Create the source traffic. In the WSN system, communication takes place in three different ways, i.e., V2V, base station to vehicle, and vehicle to base station. Any vehicular node can communicate with the others according to its requirements. A User Datagram Protocol (UDP) agent is used for transmission of data from a transmitting node, so connect UDP agents to all transmitting nodes.
• A Constant Bit Rate (CBR) traffic source is attached to UDP with a packet size of 1,000 bytes and a different packet interval time for each of the source vehicular nodes. UDP1 and CBR1 are attached to the same vehicular node so that it becomes a transmitter (source) node.
• Create the NULL agent to sink traffic. Some of the vehicular nodes are receiving nodes, so connect NULL agents to all receivers. The NULL agent terminates the data at that respective vehicular node.
• Attach the two vehicular nodes to which the UDP and NULL agents are attached to form a transmitter–receiver pair, so that the transmitter node can transmit data to the receiving node.
• Start and stop the CBR traffic within the simulation time (a schematic sketch of these steps is given after this list).
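The following is a schematic Python sketch of the steps above. It mirrors the NS-2-style agent wiring in plain data structures; all labels, coordinates, speeds and timings are illustrative assumptions, not the actual simulation script used in this chapter.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class VehicularNode:
    label: str                       # e.g., "V1", "V2", ...
    source: Tuple[float, float]      # location within the 500 m x 500 m area
    destination: Tuple[float, float]
    speed_mps: float
    agent: Optional[str] = None      # "UDP" for transmitters, "NULL" for sinks

# Step 1: create labelled vehicular nodes, each with its own source and
# destination locations and its own speed (all values illustrative).
v1 = VehicularNode("V1", (10.0, 20.0), (450.0, 300.0), speed_mps=12.0)
v2 = VehicularNode("V2", (400.0, 100.0), (50.0, 480.0), speed_mps=9.0)

# Steps 2-4: attach a UDP agent with CBR traffic (1,000-byte packets) to the
# transmitter, and a NULL (sink) agent that terminates traffic at the receiver.
v1.agent = "UDP"    # transmitter: UDP1 + CBR1 attached to the same node
v2.agent = "NULL"   # receiver: NULL agent terminates the data

# Steps 5-6: pair transmitter and receiver, then start/stop the CBR flow
# within the 10 s simulation period.
flow = {"tx": v1.label, "rx": v2.label, "packet_bytes": 1000,
        "start_s": 1.0, "stop_s": 9.0}
print(flow)
```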
11.8 Related results
The function of NAM is to show the animation of static and dynamic nodes, including packet transfers, packet losses, node positions and the simulation time scale. NAM also provides forward, reverse and stop functions that can be used to review any event that occurred during the simulation at any time instant. The NAM scenarios for various environments are given in the following.
11.9 Conclusion
The designed WSN system is useful for transportation systems as well as for industry automation. This system provides better performance than the existing system. The various simulation parameters show that the designed system provides a high-speed network for transportation systems as well as for other applications.
11.10 Future scope
The investigation carried out in this book chapter leaves ample scope for extension of the WiMAX-based WSN system to various network environments. The performance of the WiMAX-based WSN system can be improved in the future in the following ways.
• The use of hybrid routing protocols in the WiMAX-based WSN system can increase the performance of the network.
• Various power-reduction techniques can reduce power utilization at the base station.
• By using low-energy-consumption algorithms, the energy utilization per vehicular node can be reduced.
• One can design a WiMAX-based WSN system for a large coverage area by using multiple relay stations to improve efficiency.
It can be concluded that the work under investigation on the design of the
WiMAX-based WSN system can be extended toward various applications in the
area of intelligent transportation systems. The practical development of an improved
WiMAX-based WSN system is one of the major potential research directions in the
future.
References
[1] Parmar A., Vaghela V.B. ‘Detection and prevention of wormhole attack in WSN using AOMDV protocol’. 7th International Conference on Communication, Computing and Virtualization; Maharashtra, India, 2016. pp. 700–07.
[2] Dorge P.D., Dorle S.S. ‘Design of WSN for improvement of QoS with differ-
ent mobility patterns’. 6th International Conference on Emerging Trends in
Engineering and Technology; Nagpur, India, 16-18 Dec; 2013.
[3] Dorge P.D., Dorle S.S., Chakole M.B. ‘Implementation of MIMO and AMC techniques in WiMAX network based VANET system’. International Journal of Information Technology and Computer Science. 2016, vol. 8(2), pp. 60–68. Available from http://www.mecs-press.org/ijitcs/v8n2.html
[4] Shende S.F., Deshmukh R.P., Dorge P.D. ‘Performance improvement in
ZigBee cluster tree network’. International Conference on Communication
and Signal Processing (ICCSP); Chennai, India, 6-8 Apr; 2017.
[5] Pochhi R.D., Deshmukh R.P., Dorge P.D. ‘An efficient multipath RP for cog-
nitive AD hoc networks’. International Journal of Advanced Electrical and
Electronics Engineering. 2012, vol. 1(3), pp. 1–7.
[6] Meshram S.L., Dorge P.D. ‘Design and performance analysis of mobile Ad
hoc network with reactive RPs’. International Conference on Communication
and Signal Processing (ICCSP); IEEE, Chennai, India, 6-8 Apr; 2017.
[7] Pandilakshmi S., Amar R. ‘Detecting and prevent the Wormhole attack us-
ing customized evolution’. International Journal of Innovative Research &
Studies. 2018, vol. 8(4), pp. 1–7.
[8] Ghormare S.N., Sorte S., Dorle S.S. ‘Detection and prevention of Wormhole
attack in WiMAX based mobile Adhoc network’. Second International
Conference on Electronics, Communication and Aerospace Technology
(ICECA); Coimbatore, India, 29-31 Mar; 2018.
[9] Siva Ram Murthy C., Manoj B.S. Ad Hoc Wireless Networks: Architectures and Protocols. Upper Saddle River, NJ: Prentice Hall PTR; 2004.
[10] Gupta S., Kar S., Dharmaraja S. ‘WHOP: wormhole attack detection protocol using hound packet’. Presented at the International Conference on Innovations in Information Technology; Abu Dhabi, United Arab Emirates: IEEE.
[11] Hu Y.-C., Perrig A., Johnson D.B. ‘Wormhole attacks in wireless networks’. IEEE Journal on Selected Areas in Communications. 2006, vol. 24(2), pp. 370–80.
[12] Chiu H.S., Lui K.S. ‘DelPHI: wormhole detection mechanism for ad hoc wireless networks’. 1st International Symposium on Wireless Pervasive Computing; Phuket, Thailand, IEEE, 2006. pp. 6–11.
[13] Chaurasia U.K., Singh V. 'MAODV: modified wormhole detection AODV
protocol'. IEEE. 2013, pp. 239–43.
[14] Dorge P.D., Dorle S.S., Chakole M.B., Thote D.K. ‘Improvement of qos in
WSN with different mobility patterns’. International Conference on Radar,
Communication and Computing; Tiruvannamalai, India, IEEE, 2012. pp.
206–09.
[15] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘EMEEDP: enhanced
multi-hop energy efficient distributed protocol for heterogeneous wire-
less sensor network’. Fifth International Conference on Communication
Systems and Network Technologies; Gwalior, India, IEEE, 2015. pp.
194–200.
[16] Kumar S., Ranjan P., Ramaswami R. ‘Energy optimization technique for
distributed localized wireless sensor network’. International Conference
on Issues and Challenges in Intelligent Computing Techniques (ICICT);
Ghaziabad, India, IEEE, 2014.
[17] Chauhan R., Kumar S. ‘Packet loss prediction using artificial intelligence uni-
fied with big data analytics, internet of things and cloud computing technolo-
gies’. 5th International Conference on Information Systems and Computer
Networks (ISCON); Mathura, India, IEEE, 2021. pp. 1–6.
[18] Sudhakaran S., Kumar S., Ranjan P., Tripathy M.R. ‘Blockchain-based
transparent and secure decentralized algorithm’. International Conference
on Intelligent Computing and Smart Communication 2019. Algorithms for
Intelligent Systems; Uttarakhand, India, IEEE, 2020. pp. 327–36.
[19] Kumar S., Trivedi M.C., Ranjan P., Punhani A., et al. Evolution of Software-
Defined Networking Foundations for IoT and 5G Mobile Networks. Hershey,
PA: IGI Publisher; 2020. p. 350.
[20] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy efficient mul-
tichannel MAC protocol for high traffic applications in heterogeneous wireless
sensor networks’. Recent Advances in Electrical & Electronic Engineering.
2017, vol. 10(3), pp. 223–32.
[21] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[22] Kumar S., Cengiz K., Vimal S., Suresh A. ‘Energy efficient resource migra-
tion based load balance mechanism for high traffic applications IoT’. Wireless
Personal Communications. 2021, vol. 10(3).
[23] Kumar S., Cengiz K., Trivedi C.M., et al. ‘DEMO enterprise ontology with
a stochastic approach based on partially observable Markov model for data
aggregation and communication in intelligent sensor networks’. Wireless
Personal Communication. 2022.
[24] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic to-
rus Network-on-Chip architecture’. International Journal of Innovative
Technology and Exploring Engineering. 2019, vol. 8(6), pp. 2278–3075.
[25] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based Sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
[26] Singh P., Bansal A., Kamal A.E., Kumar S. ‘Road surface quality monitoring
using machine learning algorithm’ in Reddy A.N.R., Marla D., Favorskaya
M.N., Satapathy S.C. (eds.). Intelligent Manufacturing and Energy
Sustainability. Smart Innovation, Systems and Technologies. 265. Singapore:
Springer; 2022.
[27] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[28] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘A utility maximiza-
tion approach to MAC layer channel access and forwarding’. Progress in
Electromagnetics Research Symposium, Publisher: The Electromagnetics
Academy; Prague, Czech Republic, 2015. pp. 2363–67.
[29] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘An NS3 implementa-
tion of physical layer based on 802.11 for utility maximization of WSN’.
International Conference on Computational Intelligence and Communication
Networks (CICN); Jabalpur, India, IEEE, 2016. pp. 79–84.
[30] Sharma A., Awasthi Y., Kumar S. ‘The role of Blockchain, AI and IoT for
smart road traffic management system’. IEEE India Council International
Subsections Conference (INDISCON); Visakhapatnam, India, 3-4 Oct; 2020.
pp. 289–96.
[31] Singh P., Bansal A., Kumar S. ‘Performance analysis of various information
platforms for recognizing the quality of indian roads’. 10th International
Conference on Cloud Computing, Data Science & Engineering (Confluence);
Noida, India, IEEE, 2020. pp. 63–76.
[32] Kumar S., Ranjan P., Singh P., Tripathy M.R. ‘Design and implementation of
fault tolerance technique for internet of things (iot)’. 2020 12th International
Conference on Computational Intelligence and Communication Networks
(CICN); Bhimtal, India, IEEE, 2020. pp. 154–59.
[33] Reghu S., Kumar S. ‘Development of robust infrastructure in networking to
survive a disaster’. 4th International Conference on Information Systems and
Computer Networks, ISCON 2019; Mathura, India, IEEE, 2019. pp. 250–55.
Chapter 12
A review of edge computing in healthcare
Internet of things: theories, practices
and challenges
Shamik Tiwari 1 and Vadim Bolshev 2,3
The pandemic has forced industries to move their critical workloads to the cloud immediately in order to ensure continuous functioning. As cloud computing expands apace and organisations search for methods to improve their network agility and storage, edge computing has proven to be the best alternative. The healthcare business has a long history of collaborating with cutting-edge information technology, and the Internet of Things (IoT) is no exception. Researchers are still looking for substantial methods to collect, view, process and analyse data that can signify a quantitative revolution in healthcare, as devices become smaller and more convenient and data becomes larger. To provide real-time analytics, healthcare organisations frequently deploy cloud technology as the storage layer between system and insight. Edge computing, also known as fog computing, allows computers to perform important analyses without having to go through the time-consuming cloud storage process. For this form of processing, speed is key, and it may be crucial in constructing a healthcare IoT that is useful for patient interaction, inpatient treatment, population health management and remote monitoring. We present a thorough overview to highlight the most recent trends in fog computing activities related to the IoT in healthcare. Other perspectives on the edge computing domain are also offered, such as styles of application support, techniques and resources. Finally, the necessity of edge computing in the era of the Covid-19 pandemic is addressed.
1 School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
2 Laboratory of Power Supply and Heat Supply, Federal Scientific Agroengineering Center VIM, Moscow, Russia
3 Laboratory of Intelligent Agricultural Machines and Complexes, Don State Technical University, Rostov-on-Don, Russia
12.1 Introduction
The IoT has transformed how healthcare solutions operate, and the industry has seen a
significant shift away from on-premise hardware and software and towards cloud com-
puting [1]. IoT has spread across a variety of markets, catering to customers on a global
scale. From smart voice assistants to smart homes, brands are diversifying their offerings
and experimenting with new designs to increase consumer engagement.
IoT devices include gadgets, sensors, actuators, appliances and machines, which
are designed for particular purposes and may broadcast data over the Internet or other
networks. The IoT is introducing a new layer of complexity to the task of analysing an
ever-growing mountain of data in order to improve healthcare [2]. Those who are not
physically available in a health institution can use IoT devices to accumulate health
indicators such as heart rate, temperature, blood pressure, glucose level and more, reduc-
ing the necessity for patients to travel to clinicians by collecting health data themselves.
While much of the conversation around big data has focused on the possible shortcom-
ings of Electronic Health Records (EHRs) and the major challenges of intuitive, efficient
decision support, the IoT is adding another layer of complexity. Traditional information
governance solutions are inadequate to manage this jumbled, unstandardised and poorly
defined amount of data, and they are viewed unfavourably by overloaded, dispassionate
health practitioners [3, 4]. Furthermore, the massive size of Patient-Generated Health
Data (PGHD) produced each day presents huge problems for already overburdened ana-
lytics infrastructures that lack the ability to handle the avalanche of big data coming their
way. If the information from a patient's wearable sensors does not reach the treatment station in time, a patient in the acute care unit has only minutes before a dip in vital signs leads to a devastating crash. Healthcare organisations frequently utilise cloud-based solutions as the process, service or storage layer between system and insight to enable what is now known as ‘real-time analytics’. Data is bulk-uploaded to the cloud, and the associated components are discovered and dragged back down into a server for analytics before being submitted to a user-interface visualisation [5]. In a perfect world, the operation takes only a few minutes, but patients' lives and quality-of-care judgements cannot be compromised by this time lag. The solution could be found in the edge or fog but not in the cloud. This work provides the details of fog computing, including its applications to healthcare IoT. The
provides the details of fog computing including its applications to healthcare IoT. The
rest of the chapter consists of sections on cloud computing in healthcare and its limita-
tions, fog computing and its advantages over cloud computing, role of IoT in healthcare,
practice of fog computing in healthcare, significance of machine learning in healthcare,
integrated impact of IoT, machine learning and fog computing in healthcare, modeling
and simulation tools for fog computing, necessities of edge computing in pandemic era
and conclusion in that order.
12.2 Cloud computing in healthcare

Cloud computing is a computational model that uses the Internet to connect comput-
ers, data centres, processors, servers, wired and wireless networks, storage, develop-
ment tools and even healthcare applications. A cloud service provider may take care
of any or all of these needs, including providing equipment, training and ongoing
maintenance. Cloud computing can be deployed in different manners, depending on
the resources that an organisation demands. The first thing to consider is the deploy-
ment model – private cloud, public cloud, hybrid cloud and multicloud depending
on the aims of the business use case, each deployment type has strengths and limi-
tations. When deciding on a cloud migration plan, an organisation must weigh all
factors [6, 7]. Some scenarios for cloud-based healthcare system are presented in
Figure 12.1.
12.2.1 Public cloud
A public cloud is a cloud deployment type in which a vendor owns and operates
computational resources that are shared across several tenants through the Internet.
A public cloud is an open system that enables customers to access storage or software
for free or on a pay-per-use basis over the Internet. A public cloud is a large data
centre that provides all of its users with the same services. The services are avail-
able to anyone and are widely used by consumers. Amazon Elastic Cloud Compute
(EC2), Google App Engine, IBM Blue Cloud and Azure Services Platform are few
examples of common public clouds.
12.2.2 Private cloud
A private cloud (or corporate cloud) is a cloud computing infrastructure in which all
equipment/software services are devoted to a single client and only that client has
access to them. Private clouds are limited in size; their clients access virtualised services and draw resources from a distinct pool of physical machines. With internal hosting and firewalls, the private cloud guarantees that data is private and secure. It also assures that third-party vendors do not have access to functional or critical information. Examples of private clouds include the Elastra private cloud and HP data centres.
12.2.4 Community cloud
The phrase ‘community cloud computing’ refers to a shared cloud computing
service environment geared towards a small number of businesses or employees.
Technically, community cloud is a multitenant platform that is only available to a
limited number of customers. It can be shared by organisations with similar com-
puting concerns and mutual interests. Ventures, corporate organisations, research
groups and tenders are the greatest candidates for this form of cloud computing.
This allows community cloud users to understand and analyse market requirements
upfront.
Many EHRs are actually housed on conventional client–server architectures.
IoT has also aided in the simplification of operations in this area, making the process
much more effective and patient-centric than it was a decade ago. Healthcare activi-
ties can be made much more convenient and cost-effective by implementing cloud
computing solutions [8].
The cloud provides on-demand computing by deploying, accessing and utilis-
ing networked information, software and services using cutting-edge technology.
However, there are certain drawbacks of cloud computing in healthcare.
12.3 Edge and fog computing

Data processing, analytical and computational capabilities are brought nearer to the network's edge with edge computing – to the so-called ‘things’ of an IoT network. There is some ambiguity about the use of these two terms in the IoT industry [9]. Some see a slight difference, in that the computation is organised differently in a fog scheme than in an edge scheme, but the terms are also used interchangeably. Edge computing, also known as fog computing, enables computers to perform essential analytics without relying on the inefficient cloud storage method. For this form of processing, speed is important, and it could be crucial to making the IoT in healthcare truly effective for
patient interaction, nursing, healthcare management and remote health assistance. Fog
computing is a distributed computing model that sits between cloud data centres, IoT
computers and sensors as an intermediary layer [10, 11]. It offers cloud-based services
with computing, networking and storage capabilities that can be applied closer to IoT
devices and sensors. Cisco introduced fog computing in 2012 to handle the problems
that IoT applications face in traditional computing environments. IoT devices and sen-
sors, as well as real-time and latency-sensitive service specifications, are widely spread
at the network’s edge. Cloud data centres are physically centralised, and they regularly
struggle to meet the storage and handling demands of billions of globally dispersed IoT
devices and sensors. As a result, the network is congested, service delivery is late and
the quality of service is weak [12, 13]. Figure 12.2 presents edge and cloud computing
with end point devices. Table 12.1 assesses cloud computing and edge/fog computing.
12.4 IoT in healthcare
IoT can play a critical role in smart hospitals, for example. Integrating IoT technologies into the healthcare sector can change almost all of these situations. Utilising blockchain and smart contracts, inefficient large paper registries can be replaced by an automated, centralised database. Submissions can be received, queues can be controlled and staff members can be tracked in real time via smartphones with an all-encompassing processing system. Using blockchain and smart contracts, any equipment can be continuously controlled and maintained.
The availability of critical data and real-time data analysis is beneficial to every firm. In healthcare, however, these factors could spell the difference between life and death. The vast bulk of processing still takes place in the cloud or at single-site data centres. Analysing data from a distance, particularly from the latter, presents a number of challenges, for example bandwidth congestion, high latency and low reliability [16, 17]. Even with today's 4G LTE networks, these problems still arise when every second counts. Edge/fog computing seeks to solve these problems by taking data processing closer to the data-collection devices. This is particularly useful in cases where data must be acted on right away, such as in healthcare: there isn't time to upload it to the cloud and process it when every second counts.
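A minimal sketch of this idea is shown below: vital signs are analysed locally at the edge, and only alerts or compact summaries are forwarded upstream. The thresholds, window size and the send_to_cloud stub are hypothetical assumptions for illustration, not part of any system cited in this chapter.

```python
import statistics
from collections import deque

WINDOW = 10                    # samples kept at the edge node (assumed)
HR_LOW, HR_HIGH = 50, 120      # illustrative alert thresholds (bpm)
window = deque(maxlen=WINDOW)

def send_to_cloud(message: dict) -> None:
    # Placeholder for an upstream call (e.g., MQTT/HTTP); assumed, not a real API.
    print("-> cloud:", message)

def on_sample(heart_rate: float) -> None:
    """Handle one heart-rate sample locally at the edge node."""
    window.append(heart_rate)
    if heart_rate < HR_LOW or heart_rate > HR_HIGH:
        # Act immediately at the edge; don't wait for a cloud round trip.
        send_to_cloud({"alert": "abnormal_hr", "value": heart_rate})
    elif len(window) == WINDOW:
        # Otherwise forward only a compact summary of the recent window.
        send_to_cloud({"summary_mean_hr": statistics.mean(window)})
        window.clear()

for hr in [72, 75, 74, 73, 71, 70, 74, 76, 75, 73, 135]:
    on_sample(hr)
```

The design point is simply that raw samples stay local: the upstream link carries only the small fraction of data that is clinically urgent or summarised, which is exactly the latency and bandwidth benefit described above.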
Improved security, quicker access to real-time data and transmission efficiency are the three significant benefits of edge computing. Because each edge data centre handles less data, the security of health records can be improved: since the amount of potentially sensitive data in each site is smaller, malicious hackers will have a tougher time compromising critical resources or infecting the existing network. An example of edge computing in action is establishing a closed-loop system in an intensive care unit (ICU) that includes smart systems to detect acutely ill patients, helping healthcare professionals react to a patient's condition more quickly. In an ICU, edge computing is performed by attaching sensors to modest, local control systems that manage computation and transmission. Organisations can realise the following advantages once they have precisely articulated what they want to achieve from edge computing and what they have to do to enable it:
Machine learning (ML) is a sort of data analysis that uses artificial intelligence (AI) to build analytical models. It is a subfield of AI centred on the idea that machines can learn from data, see patterns and make decisions with minimal to no human intervention. ML refers to a system's ability to ‘learn’ by identifying patterns in huge datasets. In other words, the ‘solutions’ generated by ML algorithms are statistical conclusions drawn from very big datasets. The ability to collect, distribute and deliver data is becoming increasingly important as digitalisation disrupts every sector, including healthcare [18, 19]. ML, big data and AI will all assist in overcoming the obstacles that large quantities of data present.
ML is a branch of AI that includes the methods that permit machines to infer meaning from historical data and design intelligent systems. Deep learning (DL), in turn, is a subset of ML that enables machines to comprehend extremely complex problems. DL is a relatively new branch of AI based on artificial neural networks. We can classify DL algorithms as a subsection of ML, since they require data to learn and solve issues. Figure 12.4 shows machine learning and allied areas.
In healthcare, AI and ML are still in their infancy. Adoption on a big scale has
yet to occur. In order to be effective in the healthcare industry, AI and ML must have
the support of healthcare medical specialists and doctors.
However, a lot of money is being invested in AI in healthcare, and it is growing
quickly. AI in healthcare is now targeted at improving patient outcomes and balancing clinical workloads. Medical imaging applications of deep learning include:
• diabetic retinopathy
• brain tumours
• Alzheimer's disease
• tracking tumour growth
• breast cancer
Figure 12.6 Integrated role of IoT, ML and edge computing in healthcare. The edge computing paradigm brings processing closer to physical IoT equipment, which serves as a vital intermediary for lowering latency and conserving bandwidth in the cloud.
The following are some case studies where the integrated roles of IoT, ML and edge/fog computing in healthcare can be applied. Figure 12.6 illustrates this integrated role.
As attention to edge/fog computing grows, so does the demand for simulation platforms to facilitate the development and evaluation of edge computing systems. In many cases, in addition to real-world solutions, simulations are needed to examine the behaviour of composite IoT–edge–cloud systems or to create modern, effective data analytics solutions. Several simulators for examining distributed systems, especially IoT and cloud systems, are available these days to researchers. Some prominent tools for the simulation of fog and edge computing are listed below [23, 24].
• iFogSim
• FogTorch
• FogTorchPi
• FogNetSim++
• FogBed
• MaxiNet
• EmuFog
• Yet Another Fog Simulator
• IoTSim
• FogExplorer
• RECAP
• EdgeCloudSim
• Sleipnir
Edge/fog computing makes data processing for IoT somewhat more efficient and contributes to an organisation's coordinated and accurate execution. However, along with the benefits, there are some severe difficulties to deal with. The following are some of the major issues that arise when implementing edge-computing technologies [31–33].
• Data centres acquire higher bandwidth under the conventional asset-distribution model, while endpoints get less. Because edge data processing necessitates a large amount of bandwidth at the endpoints for an optimised and efficient workflow, the dynamics change intensely under edge computing. Retaining a balance between the two while achieving decent performance is the challenge.
• Upgrades to edge devices or services might go wrong, resulting in clusters or
devices failing.
• Edge/fog computing raises the significance of the physical setting and computing environment for data collection and processing. Organisations must have a presence in local data centres to achieve optimal workload placement and give reliable results [25, 34–38].
• The set of modules in most servers is distributed and located far away. Edge computing (EC), on the other hand, usually brings all systems closer to the computational areas. This causes a conflict, because the business server must take the edge server into account during computation.
• It becomes more difficult to detect and oversee edge nodes as firms install ever more of them to handle a larger range of processes. Devices may eventually exceed the edge's bounds, causing bandwidth saturation and compromising the security of numerous devices. IoT traffic raises delay as it expands, and when data is transferred untreated, it can endanger security [39–44].
• Because of the multiple edge receivers located at various distances from the
data centre, troubleshooting and repairing any issues that arise in the framework
necessitate a significant amount of logistical as well as manual input, raising the
cost of maintenance.
In order for ‘the edge’ to become as widespread in the business world as ‘the cloud’, plenty of technical difficulties must be overcome. These include the production of small devices with large computing power, software that allows businesses to remotely control and maintain an unlimited number of edge devices from anywhere in the globe, and additional security techniques and protocols to keep everything protected. Red Hat, Amazon, Microsoft, IBM, Nutanix and Cloudera, for example, are all continuously trying to solve these issues and have built their own edge solutions.
12.11 Conclusion
also discussed. Finally, the importance of edge/fog computing in the context of the Covid-19 pandemic is discussed. Future research should concentrate on improving present edge tools, and there is a pressing need for the development of strong, computationally capable edge-intelligent devices and models for tackling the pandemic.
References
[1] Wu Q., He K., Chen X. ‘Personalized federated learning for intelligent IoT
applications: a cloud-edge based framework’. IEEE Computer Graphics and Applications. 2020, vol. 1, pp. 35–44.
[2] Stoyanova M., Nikoloudakis Y., Panagiotakis S., Pallis E., Markakis E.K. ‘A
survey on the Internet of things (IoT) forensics: challenges, approaches, and
open issues’. IEEE Communications Surveys & Tutorials. 2020, vol. 22(2),
pp. 1191–221.
[3] Abrahão M.T.F., Nobre M.R.C., Gutierrez M.A. ‘A method for cohort se-
lection of cardiovascular disease records from an electronic health record
system’. International Journal of Medical Informatics. 2017, vol. 102, pp.
138–49.
[4] Agniel D., Kohane I.S., Weber G.M. ‘Biases in electronic health record data
due to processes within the healthcare system: retrospective observational
study’. BMJ. 2018, vol. 361, p. k1479.
[5] Baumann L.A., Baker J., Elshaug A.G. ‘The impact of electronic health re-
cord systems on clinical documentation times: a systematic review’. Health
Policy. 2018, vol. 122(8), pp. 827–36.
[6] Tiwari S. ‘An ensemble deep neural network model for onion-routed traffic
detection to boost cloud security’. International Journal of Grid and High
Performance Computing. 2021, vol. 13(1), pp. 1–17.
[7] Green D.R. Cloud Computing in Healthcare: Understanding User Perception,
Organizational Operations, and IT Costs to Be Successful in the Cloud.
[Doctoral dissertation]. Northcentral University; 2020.
[8] Mourya A.K., Idrees S.M. ‘Cloud computing-based approach for accessing
electronic health record for healthcare sector’. Microservices in Big Data
Analytics. Singapore: Springer; 2020. pp. 179–88.
[9] Dong P., Ning Z., Obaidat M.S., et al. ‘Edge computing based healthcare sys-
tems: enabling decentralized health monitoring in Internet of medical things’.
IEEE Network. 2020, vol. 34(5), pp. 254–61.
[10] Amin S.U., Hossain M.S. ‘Edge intelligence and Internet of things in health-
care: a survey’. IEEE Access. 2020, vol. 9, pp. 45–59.
[11] Patra B., Mohapatra K. ‘Cloud, edge and fog computing in healthcare’.
Intelligent and Cloud Computing. Singapore: Springer; 2021. pp. 553–64.
[12] Verma P., Fatima S. 'Smart healthcare applications and real-time analytics
through edge computing' in Internet of things use cases for the healthcare
industry; 2020. pp. 241–70.
[13] Yaraziz M.S., Bolhasani H. 'Edge computing applications for IoT in health-
care: A systematic literature review'. In Review. 2021.
[14] Haghi Kashani M., Madanipour M., Nikravan M., Asghari P., Mahdipour
E. ‘A systematic review of IoT in healthcare: applications, techniques, and
trends’. Journal of Network and Computer Applications. 2021, vol. 192, p. 103164.
[15] Dang L.M., Piran M.J., Han D., Min K., Moon H. ‘A survey on internet of
things and cloud computing for healthcare’. Electronics. 2019, vol. 8(7),
p. 768.
[16] Singh A., Chatterjee K. ‘Securing smart healthcare system with edge comput-
ing’. Computers & Security. 2021, vol. 108, p. 102353.
[17] Mulimani M.S., Rachh R.R. ‘Edge computing in healthcare systems’. Deep
Learning and Edge Computing Solutions for High Performance Computing.
Cham: Springer; 2021. pp. 63–100.
[18] Qayyum A., Qadir J., Bilal M., Al-Fuqaha A. ‘Secure and robust machine
learning for healthcare: a survey’. IEEE Reviews in Biomedical Engineering.
2020, vol. 14, pp. 156–80.
[19] Saleem T.J., Chishti M.A. ‘Exploring the applications of machine learning in
healthcare’. International Journal of Sensors, Wireless Communications and
Control. 2020, vol. 10(4), pp. 458–72.
[20] Tiwari S. ‘Dermatoscopy using multi-layer perceptron, convolution neural
network, and capsule network to differentiate malignant melanoma from be-
nign nevus’. International Journal of Healthcare Information Systems and
Informatics. 2021, vol. 16(3), pp. 58–73.
[21] Sharma S., Tiwari S. ‘COVID-19 diagnosis using X-ray images and deep
learning’. International Conference on Artificial Intelligence and Smart
Systems (ICAIS); Coimbatore, India, IEEE, 2021. pp. 344–49.
[22] Darwish A., Hassanien A.E., Elhoseny M., Sangaiah A.K., Muhammad K.
‘The impact of the hybrid platform of Internet of things and cloud computing
on healthcare systems: opportunities, challenges, and open problems’. Journal
of Ambient Intelligence and Humanized Computing. 2019, vol. 10(10), pp.
4151–66.
[23] Ning H., Li Y., Shi F., Yang L.T. ‘Heterogeneous edge computing open plat-
forms and tools for Internet of things’. Future Generation Computer Systems.
2020, vol. 106, pp. 67–76.
[24] Ranjan R., Villari M., Shen H., Rana O., Buyya R. ‘Software tools and tech-
niques for FOG and edge computing’. Software: Practice and Experience.
2020, vol. 50(5), pp. 473–75.
[25] Gupta H., Vahid Dastjerdi A., Ghosh S.K., Buyya R. ‘iFogSim: a toolkit for
modeling and simulation of resource management techniques in the Internet
of things, edge and FOG computing environments’. Software: Practice and
Experience. 2017, vol. 47(9), pp. 1275–96.
[26] Javaid M., Khan I.H. ‘Internet of things (IoT) enabled healthcare helps to
take the challenges of COVID-19 pandemic’. Journal of Oral Biology and
Craniofacial Research. 2021, vol. 11(2), pp. 209–14.
[27] Sufian A., Ghosh A., Sadiq A.S., Smarandache F. ‘A survey on deep transfer
learning to edge computing for mitigating the COVID-19 pandemic’. Journal
of Systems Architecture. 2020, vol. 108(4), p. 101830.
[28] Rahman M.A., Hossain M.S. ‘An internet-of-medical-things-enabled edge
computing framework for tackling COVID-19’. IEEE Internet of Things
Journal. 2021, vol. 8(21), pp. 15847–54.
[29] Tiwari S., Jain A. ‘Convolutional capsule network for COVID‐19 detection
using radiography images’. International Journal of Imaging Systems and
Technology. 2021, vol. 31(2), pp. 525–39.
[30] Kong X., Wang K., Wang S., et al. ‘Real-time mask identification for
COVID-19: an edge-computing-based deep learning framework’. IEEE
Internet of Things Journal. 2021, vol. 8(21), pp. 15929–38.
[31] Varghese B., Wang N., Barbhuiya S., Kilpatrick P., Nikolopoulos D.S.
‘Challenges and opportunities in edge computing’. IEEE International
Conference on Smart Cloud (SmartCloud); New York, NY, USA, IEEE, 2016.
pp. 20–26.
[32] Xiao Y., Jia Y., Liu C., Cheng X., Yu J., Lv W. ‘Edge computing security: state
of the art and challenges’. Proceedings of the IEEE. 2019, vol. 107(8), pp.
1608–31.
[33] Liu S., Liu L., Tang J., Yu B., Wang Y., Shi W. ‘Edge computing for autono-
mous driving: opportunities and challenges’. Proceedings of the IEEE. 2019,
vol. 107(8), pp. 1697–716.
[34] Haidar M., Kumar S. ‘Smart healthcare system for biomedical and health care
applications using aadhaar and blockchain’. 5th International Conference
on Information Systems and Computer Networks (ISCON); Mathura, India,
IEEE, 2022. pp. 1–5.
[35] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy efficient mul-
tichannel MAC protocol for high traffic applications in heterogeneous wireless
sensor networks’. Recent Advances in Electrical & Electronic Engineering.
2017, vol. 10(3), pp. 223–32.
[36] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[37] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic torus network-on-chip architecture’. International Journal of Innovative Technology and Exploring Engineering (IJITEE). 2019, vol. 8(6), pp. 1672–76.
[38] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based Sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
[39] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[40] Singh P., Bansal A., Kamal A.E., Kumar S. ‘Road Surface Quality Monitoring
Using Machine Learning Algorithm’ in Reddy A.N.R., Marla D., Favorskaya
M.N., Satapathy S.C. (eds.). Intelligent Manufacturing and Energy Sustainability.
Smart Innovation, Systems and Technologies. 265. Singapore: Springer; 2022.
[41] Sharma A., Awasthi Y., Kumar S. ‘The role of blockchain, AI and iot for smart
road traffic management system’. Proceedings of the IEEE India Council
International Subsections Conference, INDISCON 2020, Publisher: IEEE,
Visakhapatnam, India; 2020. pp. 289–96.
[42] Kumar S., Ranjan P., Singh P., Tripathy M.R. ‘Design and implementa-
tion of fault tolerance technique for internet of things (iot)’. Proceedings
– 2020 12th International Conference on Computational Intelligence and
Communication Networks, CICN 2020, Publisher: IEEE, Bhimtal, India;
2020. pp. 154–59.
[43] Singh P., Bansal A., Kumar S. ‘Performance analysis of various information
platforms for recognizing the quality of indian roads’. Proceedings of the
Confluence 2020 – 10th International Conference on Cloud Computing, Data
Science and Engineering; Noida, India, IEEE, 2020. pp. 63–76.
[44] Chauhan R., Kumar S. ‘Packet loss prediction using artificial intelligence uni-
fied with big data analytics, internet of things and cloud computing technolo-
gies’. 5th International Conference on Information Systems and Computer
Networks (ISCON); India, Mathura, 2021. pp. 1–6.
[45] Reghu S., Kumar S. ‘Development of robust infrastructure in networking to
survive a disaster’. 4th International Conference on Information Systems and
Computer Networks, ISCON 2019; Mathura, India, IEEE, 2019. pp. 250–55.
[46] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘An NS3 implementa-
tion of physical layer based on 802.11 for utility maximization of WSN’.
Proceedings of the International Conference on Computational Intelligence
and Communication Networks, CICN 2015; 2016. pp. 79–84.
[47] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘A utility maximiza-
tion approach to mac layer channel access and forwarding’. Progress in
Electromagnetics Research Symposium; The Electromagnetics Academy,
2015. pp. 2363–67.
[48] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘EMEEDP: enhanced
multi-hop energy efficient distributed protocol for heterogeneous wire-
less sensor network’. Proceedings of the 5th International Conference on
Communication Systems and Network Technologies, CSNT; Gwalior, India,
IEEE, 2015. pp. 194–200.
[49] Kumar S., Ranjan P., Ramaswami R. ‘Energy optimization in distrib-
uted localized wireless sensor networks’. Proceedings of the International
Conference on Issues and Challenges Intelligent Computing Technique
(ICICT); Ghaziabad, India, IEEE, 2014.
[50] Sudhakaran S., Kumar S., Ranjan P., Tripathy M.R. ‘Blockchain-based trans-
parent and secure decentralized algorithm’. International Conference on
Intelligent Computing and Smart Communication. Algorithms for Intelligent
Systems; Springer: Singapore; 2020.
[51] Kumar S., Trivedi M.C., Ranjan P., Punhani A. Evolution of Software-Defined
Networking Foundations for IoT and 5G Mobile Networks. Hershey, PA: IGI
Publisher; 2020. p. 350.
[52] Kumar S., Cengiz K., Vimal S., Suresh A. ‘Energy efficient resource migra-
tion based load balance mechanism for high traffic applications IoT’. Wireless
Personal Communications. 2021, vol. 10(3).
Chapter 13
Image Processing for medical images on the
basis of intelligence and biocomputing
M. Mohammed Mustafa 1, S. Umamaheswari 2, and
Korhan Cengiz 3
13.1 Introduction
The original motive for image processing was to enhance the quality of an image. It was aimed at human beings, to improve the visual effect of images for people. In
1 Department of Information Technology, Sri Krishna College of Engineering & Technology, Coimbatore, Tamilnadu, India
2 Department of Information Technology, C. Abdul Hakeem College of Engineering & Technology, Melvisharam, Tamilnadu, India
3 College of Information Technology, University of Fujairah, UAE
image processing, the input is a low-quality image, and the output is an image with greatly improved quality. Common image processing operations include image enhancement, restoration, encoding and compression. The first successful application was at the American Jet Propulsion Laboratory, which used image processing techniques including geometric correction, gradation transformation, noise removal, etc.
13.1.1 What is an image?
Practically every scene around us forms an image, and this is what image processing is concerned with. An image is formed by two-dimensional analog or digital signals that carry colour information arranged along the x and y spatial axes.
13.2 Image processing
Image processing is a technique to carry out some operations on an image, with the purpose of getting an enhanced image or extracting some useful information from it. In general terms, manipulating an image to amplify it and to derive information from it is called image processing.
There are two styles of image processing, which are as follows:
• analog image processing, which is used for processing hard copies of images, such as printouts and photographs
• digital image processing, which is used for manipulating digital images with the help of complex algorithms
Image processing serves several purposes:
• representing processed data in a visual way one can understand, for example, giving a visual form to unnoticeable objects
• improving the processed image quality – image sharpening and restoration work well here
• image retrieval, which helps in searching for images
• measuring objects in the image
• pattern recognition, which makes it easy to classify objects in an image, detect their position and get an overall understanding of the scene
Description is the method of labelling an item based on the facts provided about it. Recognition is the method of assigning a tag, such as ‘vehicle’, to an item based primarily on its descriptors.
The first is the physical device, which is sensitive to the energy radiated by the object one wishes to visualise (the sensor). The second, called a digitizer, is a device for transforming the output of the physical sensing device into digital form [13–19].
5. Image displays
The displays used today are mainly colour television monitors (ideally flat screens). Monitors are driven by the image outputs of display cards, which are an essential part of a PC system.
6. Hardcopy devices
Used for recording images, these include laser printers, film cameras, heat-sensitive devices, inkjet units and digital units such as optical discs, cameras and Compact Disc Read-Only Memory (CD-ROM).
7. Networking
13.3 Medical imaging
Medical imaging is the technique used to obtain pictures of body parts for medical applications, in order to identify or examine diseases. Hundreds of thousands of imaging procedures are performed every week worldwide. Medical imaging is developing swiftly due to developments in image processing techniques, including image recognition, analysis and enhancement. Image processing increases the percentage and amount of detected tissues. There are many straightforward and complicated image analysis techniques within the medical imaging field. We also explain how to address image interpretation challenges using distinct image processing algorithms, including k-means, ROI-based segmentation and watershed methods.
Deep learning strategies, especially convolutional neural networks (CNNs), have demonstrated success in clinical image classification. Many different CNN architectures have been investigated on chest X-ray images for disease prognosis (Figure 13.1). These models were pretrained on the ImageNet database using Keras and TensorFlow, which reduces the need for big training sets since the models come with pretrained weights. It has been found that CNN-based architectures have the capability for disease prognosis. Several types of current deep learning strategies, including the standard Convolutional Neural Network (CNN), Vanilla Neural Network, fully connected network-based Visual Geometry Group (VGG) and Capsule Network, are applied for the prediction of lung disease. A simple CNN has poor performance for rotated, tilted or otherwise non-standard image orientations. Recently, deep learning has shown brilliant capability when applied to clinical images for disease detection, including lung disease.
Lung disease refers to problems that affect the lungs, the organs that enable us to breathe. Breathing trouble caused by lung disease may prevent the body from getting enough oxygen. Lung disease is a major concern for human beings, and many people lose their lives to it. In general, the CNN family of algorithms is used for image classification and recognition due to its high accuracy.
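As a hedged illustration of the pretrained-network approach described above (a sketch, not the chapter's actual model; the input size, two-class head and hyperparameters are assumptions), a VGG16 backbone with ImageNet weights from Keras/TensorFlow can be reused with a small new classification head for chest X-rays:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pretrained VGG16 backbone; reuse the ImageNet weights and freeze them so
# only the new classification head is trained (transfer learning).
base = tf.keras.applications.VGG16(weights="imagenet",
                                   include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),  # e.g., normal vs. diseased (assumed)
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # expects one-hot labels
              metrics=["accuracy"])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # datasets assumed
```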
In this modern, ever-changing world, in which technology is transforming many fields and industries, people still do not take complete advantage of technology in various industries. The foremost cause may be that most engineers, or at least the bulk of them, are committed entirely to computer science, which can foster a hate-or-difficulty mindset in the programmer; however, there are a few who genuinely wish to make large changes in the clinical field and, in fact, make novel improvements within it.
In recent years, many computer-aided diagnosis (CAD) systems have been designed for the analysis of numerous diseases. Detecting lung cancer at an early stage has become very critical, and also much easier with image processing and deep learning techniques. The lung patient's computed tomography (CT) scan images are used to classify lung nodules and to detect the malignancy stage of those nodules. The CT scan images may be segmented with the use of the U-Net architecture. Globally, pneumonia is a most critical cause of loss of life, although it
1. Image enhancement
Frequently, we wish we could make old images better, and that is possible today. Zooming, sharpening, edge detection and high-dynamic-range edits all fall under this category. All these methods help in enhancing an image. Most editing software and image-correction code can do these things easily.
2. Filters
Most editing apps and social media apps provide filters these days. Filters make an image look more visually appealing. Filters are generally a set of functions that change the colours and other aspects of an image to make it look different. Filters are an intriguing application of image processing.
3. Medical technology
In the medical field, image processing is used for various tasks such as positron emission tomography (PET) scans, X-ray imaging, medical CT, UV imaging, cancer cell image processing, etc. The introduction of image processing to the medical technology field has greatly improved the diagnostic process.
4. Computer/machine vision
5. Pattern recognition
6. Video processing
A CNN is made up of one or more convolutional layers (often with a subsampling step) followed by one or more fully connected layers, as in a standard multilayer neural network. The architecture of a CNN is designed to take advantage of the 2D structure of an input image (or other two-dimensional input, such as a speech signal). This is achieved with local connections and tied weights, followed by some form of pooling, which results in translation-invariant features. Another advantage of CNNs is that they are easier to train and have far fewer parameters than fully connected networks with the same number of hidden units. The data are preprocessed, which includes image reshaping, resizing and conversion to array form. A similar process is also performed on the test image. A database of approximately 4,000 different plant species is obtained; from it, any photograph can also be used as a test image for the software package.
The training database is used to train the model (CNN) so that it can identify the test image and the disease it has. The CNN has distinct layers, namely Convolution2D, MaxPooling and fully connected. Once the model is successfully trained, the software can detect the disease if the plant species is contained in the database.
After successful training and preprocessing, evaluation of the test image with the trained model takes place to predict the disease.
9. The model then adds one further layer to flatten the output of the convolutional neural network designed above.
10. This flattening step yields the feature set for every image in the form of an output vector.
11. The model now has fully connected layers, which are used to classify the photos based entirely on the generated feature set.
12. A dense layer acts as the hidden layer of the artificial neural network, with 512 hidden neurons and the ReLU activation function. It is designed in such a manner that every input neuron is connected to every hidden neuron, forming a fully connected layer.
13. One further fully connected layer acts as the output layer of the artificial neural network, with three output neurons. The number of output neurons generally depends on the number of classes. It uses the softmax activation function.
14. The output of this layer is the predicted class label, which is used to assess the final accuracy of the model.
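A minimal Keras sketch of the architecture described in steps 9–14 might look as follows. The convolutional filter counts and the input shape are assumptions, while the Flatten layer, the 512-neuron ReLU dense layer and the three-neuron softmax output follow the steps above:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolution + max-pooling blocks (filter counts and input size assumed).
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                      # steps 9-10: flatten to a feature set
    layers.Dense(512, activation="relu"),  # step 12: 512 hidden ReLU neurons
    layers.Dense(3, activation="softmax"), # step 13: three-class softmax output
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])        # step 14: accuracy of predicted labels
```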
The CNN algorithm is widely used for image classification and recognition because of its high accuracy. Initially, the problem statement is studied and the dataset analysed; machine learning and deep learning are then applied to predict whether the patient has lung disease or not.
13.6 Convolution layers
As shown in Figure 13.2, a convolutional layer includes a set of filters whose parameters need to be learned. The height and width of the filters are smaller than those of the input volume. Each filter is convolved with the input volume to compute an activation map made of neurons. In other words, the filter is slid across the width and height of the input, and the dot products between the input and the filter are computed at each spatial position.
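The following small numpy sketch illustrates exactly this operation – sliding a filter over the input and taking a dot product at each spatial position to build the activation map (the toy input and kernel are, of course, assumptions for illustration):

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D convolution: slide the kernel over the image and take a
    dot product at each spatial position, producing the activation map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product between the filter and one input patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 input
edge_kernel = np.array([[1., 0., -1.]] * 3)       # simple vertical-edge filter
print(conv2d_valid(image, edge_kernel))           # 3x3 activation map
```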
Automatically detecting diseases and obtaining correct diagnoses through X-ray
medical images have become a new research priority in the field of computer sci-
ence and AI, as the expense of manual labeling and classification continues to rise.
However, the quality of a standard radiograph is insufficient for most activities,
and traditional approaches are inadequate for dealing with large pictures. To detect
pneumothorax from chest X-ray images, we present a feature fusion CNN model. To
begin, two methods are used to improve the preprocessed image samples. The final
classification is then implemented using a feature fusion CNN model that combines
the Gabor features with the additional information collected from the images.
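As an illustration of the Gabor side of such a fusion model, the sketch below filters a grayscale X-ray with a small bank of Gabor kernels using OpenCV; all filter parameters (kernel size, sigma, wavelength, number of orientations) are illustrative assumptions rather than values taken from the model described above.

```python
# A minimal sketch of Gabor feature extraction from a preprocessed chest
# X-ray, which a feature-fusion model could combine with CNN features.
# All filter parameters below are illustrative assumptions.
import cv2
import numpy as np

def gabor_features(gray_image: np.ndarray, n_orientations: int = 4) -> np.ndarray:
    """Filter the image with a small Gabor bank; one response map per orientation."""
    responses = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations   # orientation of the filter
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0)
        responses.append(cv2.filter2D(gray_image, cv2.CV_32F, kernel))
    return np.stack(responses, axis=-1)      # stacked maps, ready for fusion

xray = np.random.randint(0, 256, (224, 224), dtype=np.uint8)  # stand-in image
print(gabor_features(xray).shape)  # (224, 224, 4)
```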
Furthermore, the CNN has made significant progress in the recognition of patterns in images, particularly in medical imaging. Designing an image-feature extractor is crucial in traditional CAD techniques; this is, however, a challenging task. A CAD approach that employs a CNN, on the other hand, does not necessitate a hand-crafted image-feature extractor. In this work, we used a CNN to create an image-based computer-aided diagnosis (CADx) system for differential diagnosis of lung anomalies, including lung nodules and diffuse lung illnesses. CNNs perform admirably in the classification of natural images, and as a result, numerous studies have been conducted on the differential diagnosis of lung anomalies such as lung nodules and diffuse lung illnesses. CNNs, or ConvNets, are a type of deep neural network used to analyze visual imagery in deep learning. Based on the shared-weight architecture of the convolution kernels that shift over input features and produce translation-equivariant responses, they are also known as shift-invariant or space-invariant artificial neural networks (SIANN). Perhaps surprisingly, most convolutional neural networks are equivariant rather than invariant under translation. They can be used in computer vision and pattern recognition, recommender systems, classification, image segmentation, medical image analysis, natural language processing, brain-computer interfaces, and financial time series, to name a few applications.
Step 1: The first step is to perform a convolution operation. The convolution operation is the initial step in our strategy. We will discuss feature detectors in this stage, which are essentially the neural network's filters. We will also discuss feature maps, including how to learn the parameters of such maps, how to recognize patterns, the layers of detection, and how to map out the results.
Step 1a: A ReLU layer will be used in the second half of this step. We will discuss ReLU layers and how linearity works in the context of CNNs. This is not vital to understanding CNNs, but it is never a bad idea to brush up on the basics.
Step 2: Pooling is the next step. Pooling will be covered in this section, and we will learn how it works in general. Our focus will be on a special form of pooling, maximum pooling, although we will also go over a variety of alternatives, including mean (or sum) pooling. This section concludes with a demonstration using a visual interactive tool, and a minimal pooling sketch is given below.
Step 3: Flattening is the third step. When working with CNNs, there will be a brief discussion of the flattening process and how we move from pooled to flattened layers.
Step 4: Full connection. Everything we have discussed so far is combined in this section. By learning this, you will gain a better understanding of how CNNs work and how the "neurons" that are eventually formed learn to classify images.
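The pooling sketch promised in Step 2 above: a plain NumPy implementation of 2 × 2 max pooling with stride 2, which keeps the strongest activation in each window and halves the spatial dimensions. The toy feature map is an illustrative assumption.

```python
# A minimal sketch of the pooling step (Step 2): 2x2 max pooling with stride 2
# keeps the strongest activation in each window, halving the spatial size.
import numpy as np

def max_pool2d(feature_map: np.ndarray, size: int = 2) -> np.ndarray:
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size          # trim to a multiple of the window
    pooled = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return pooled.max(axis=(1, 3))             # per-window maximum

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(fmap))   # [[ 5.  7.] [13. 15.]]
```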
The output volume of the convolutional layer is obtained by stacking the activation maps of all filters along the depth dimension. Since the width and height of each filter are designed to be smaller than those of the input, each neuron in the activation map is connected only to a small local region of the input volume. In other words, the receptive field of each neuron is small and equals the filter size. This local connectivity is motivated by the structure of the animal visual cortex, in which the receptive fields of the cells are small. It allows the network to learn filters that respond maximally to a local region of the input, thereby exploiting the spatial local correlation of the input (for an input image, a pixel is more strongly correlated with nearby pixels than with distant ones). In addition, because the activation map is obtained by convolving the filter with the input, the filter parameters are shared across all local positions. This weight sharing greatly reduces the number of parameters, improving expressiveness, learning efficiency, and generalization.
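A back-of-the-envelope comparison (with illustrative numbers, not values from the text) makes the effect of weight sharing concrete: a dense layer connecting every input pixel to every hidden unit needs millions of weights, while a convolutional layer reuses one small filter at every spatial position.

```python
# Illustrative parameter-count comparison showing why weight sharing matters:
# a fully connected layer maps every input pixel to every hidden unit, while
# a conv layer reuses one small filter at every spatial position.
H, W, C = 128, 128, 3        # input volume: height, width, channels
hidden = 512                 # hidden units in a dense layer
filters, k = 32, 3           # conv layer: 32 filters of size 3x3

dense_params = (H * W * C) * hidden + hidden          # weights + biases
conv_params = filters * (k * k * C) + filters         # shared weights + biases

print(f"dense layer: {dense_params:,} parameters")    # 25,166,336
print(f"conv layer:  {conv_params:,} parameters")     # 896
```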
The accuracy of the model is high compared with other classification methods.
After categorizing the images based on class labels, the data are processed to fit our model using the CNN, and the CNN is trained on the training images. A CNN is proposed in this work for the classification of leaves infected with various types of fungal disease. The training images were labelled with their class labels. Training a CNN is the exercise of running training examples through the model from the input layer to the output layer, simultaneously producing a prediction and measuring the error. If the prediction is incorrect, the error is propagated back in reverse order, that is, from the final layer to the first layer.
Accuracies were measured on the validation set, along with the training time. For the six architectures, fine-tuning gave very good precision (from 99.2% for SqueezeNet to 99.5% for VGG13). The times required for fine-tuning and for training from scratch are close (from 1.05 to 5.64 h for fine-tuning and from 1.05 to 5.91 h when trained from scratch). The feature extraction technique had the lowest training times (from 0.85 to 3.63 h). Overall, training from scratch and transfer learning should be understood as complementary rather than mutually exclusive techniques.
Gao et al.'s work assigns a single ILD class label directly to whole axial CT scan images, without preprocessing to obtain ROIs. When examining the lung disease image database with segmentation masks, a number of CT scan images are found with two or more disease labels. Deep learning is a subfield of machine learning concerning algorithms inspired by the function and structure of the brain. Recent developments in machine learning, in particular, support the identification, quantification, and classification of patterns in medical images. These developments were made possible by the capacity of deep learning to discover features directly from data, as opposed to hand-designed features based on domain-specific knowledge. Deep learning is rapidly becoming the state of the art, leading to improved performance in several medical applications. Consequently, these improvements help clinicians detect and classify certain medical conditions efficiently. Deep neural network models have conventionally been designed, and experiments performed on them, by human specialists in a persistent trial-and-error process. This process demands significant time, expertise, and resources. To overcome this problem, a novel yet simple model is introduced to automatically perform optimal classification tasks with a deep neural network architecture. The neural network architecture was specifically designed for pneumonia image classification tasks. The approach is based on the convolutional neural network algorithm, which uses a fixed set of neurons to convolve over a given image and extract relevant features from it. The efficacy of the proposed approach, with minimization of the computational cost as the focal point, was demonstrated in comparison with existing state-of-the-art lung disease classification networks. Lung infection detection is commonly framed as classifying an image into healthy lungs or infection-affected lungs. The lung infection classifier, sometimes called a model, is obtained through training. Training is the process by which a neural network learns to recognize a class of images. Using deep learning, it is possible to train a model that can classify images into their respective class labels. Therefore, to use deep learning for lung infection detection, Step 1 is to collect images of lungs with the infection to be classified. The second step is to train the neural network until it can recognize the diseases. The final step is to classify new images: these images are unseen by the model before being presented to it, and the model predicts the class of each image.
The process of applying deep learning to detect lung diseases from medical images is described next. There are three main steps: (1) image preprocessing, (2) training, and (3) classification.
13.6.3.2 Image preprocessing
Image processing is a technique for carrying out operations on an image in order to obtain an enhanced image or to extract useful information from it. It is a form of signal processing in which the input is an image and the output may be an image or characteristics/features associated with that image. Nowadays, image processing is among the most rapidly growing technologies. Image processing essentially consists of the following three steps:
1. importing the image via image acquisition tools;
2. analyzing and manipulating the image;
3. producing the output, which may be an altered image or a report based on image analysis.
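A minimal sketch of such preprocessing (reshaping, resizing, and conversion to array form, as described earlier), using Pillow and NumPy; the 128 × 128 target size and the [0, 1] rescaling are illustrative assumptions.

```python
# A minimal preprocessing sketch: import the image, resize it, and convert
# it to a normalized array with a batch dimension. Target size and rescaling
# are illustrative assumptions.
import numpy as np
from PIL import Image

def preprocess(path: str, size: tuple = (128, 128)) -> np.ndarray:
    img = Image.open(path).convert('RGB')             # import the image
    img = img.resize(size)                            # resize/reshape
    arr = np.asarray(img, dtype=np.float32) / 255.0   # convert to array, rescale
    return arr[np.newaxis, ...]                       # shape (1, 128, 128, 3)
```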
13.6.3.3 Training
Deep learning neural network models learn to map inputs to outputs given a training dataset of examples. The training procedure entails finding a set of weights in the network that proves to be good, or good enough, at solving the specific problem. Training is the process by which a neural network learns to recognize a category of images. We need to train the neural network until it can recognize the diseases. This is the vital phase of the system, as it determines the accuracy and overall performance of the system.
13.6.3.4 Classification
The final phase is the classification of the images. In our system, the classification is of the following types: (1) healthy, (2) pneumonia, (3) tuberculosis, and (4) COVID-19. When an image is uploaded, our trained model classifies it as one of the above diseases. This approach has the following advantages:
1. better and optimized scan data are used to produce much better results;
2. improved overall application performance in terms of computation usage;
3. extremely fast results without demanding heavy local computation;
4. accurate results;
5. a clear, generalized report individually dispatched to the user.
13.6.6.1 Classification
The algorithm works in a two-phase cycle: the forward pass and back-propagation. During the forward pass, the image is passed through each of the above layers, and the output is calculated. The predicted output is compared with the actual output, and the error is calculated. After the error is calculated, the algorithm adjusts the weights, that is, the spatial values of each of the filters and the biases, all the way back to the first input layer. This adjustment of the weights, or the spatial values of the filters, is the back-propagation phase. Back-propagation is used together with optimization techniques such as gradient descent to reduce the error as much as possible, as the sketch below illustrates.
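The sketch below shows the core of that optimization step: once the backward pass has produced the error gradient for each weight, gradient descent moves every weight a small step against its gradient. The learning rate and toy values are illustrative assumptions.

```python
# A minimal sketch of the weight update performed after back-propagation:
# gradient descent moves each weight against its error gradient, w <- w - lr * dE/dw.
import numpy as np

def gradient_descent_step(weights: np.ndarray,
                          gradients: np.ndarray,
                          learning_rate: float = 0.01) -> np.ndarray:
    """One update applied to every filter weight and bias."""
    return weights - learning_rate * gradients

w = np.array([0.5, -0.3, 0.8])
g = np.array([0.2, -0.1, 0.4])      # gradients from the backward pass
print(gradient_descent_step(w, g))  # [ 0.498 -0.299  0.796]
```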
The overall pipeline therefore consists of:
1. preprocessing of images
2. training
3. classification
13.7.2 Training
Given a training dataset of examples, deep learning neural network models learn to
map inputs to outputs. The training process entails locating a set of weights in the
network that is good, or good enough, at solving the specific problem.
Training data for ML is an essential input to algorithms that comprehend and memorize information from such data for future prediction. Several requirements emerge during the ML development process, and without them various critical tasks cannot be completed. Training data is the foundation of any AI or ML project; without it, it is impossible to train a machine that learns from humans and predicts for humans.
The process by which a neural network learns to recognize a class of images is
known as training. We must train the neural network until it can recognize diseases.
This is the most important phase of the system because it determines the system’s
accuracy and performance.
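A minimal sketch of this training step, assuming the `model` from the earlier architecture sketch and hypothetical stand-in training arrays; the optimizer, epoch count, batch size, and validation split are illustrative assumptions.

```python
import numpy as np

# Hypothetical stand-in data: 100 RGB images and one-hot labels for 3 classes.
x_train = np.random.rand(100, 128, 128, 3).astype('float32')
y_train = np.eye(3)[np.random.randint(0, 3, size=100)]

# `model` is the network from the earlier architecture sketch.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # multi-class, one-hot labels
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    validation_split=0.2,       # hold out 20% to track accuracy
                    epochs=20, batch_size=32)
```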
13.7.3 Classification
Image classification entails extracting features from an image in order to identify patterns in a dataset. Using an artificial neural network (ANN) for image classification would be extremely computationally expensive due to the extremely large number of trainable parameters.
The images are classified in the final stage. The classification in our system is
of the following types.
1. Normal
2. Pneumonia
3. Tuberculosis
4. Covid-19
When an image is uploaded, our trained model classifies it as one of the diseases listed above. We use better and optimized scan data in the proposed system to produce much better and optimized results. Our system improves application performance in terms of computation usage. It generates extremely fast results without requiring much local computation, which helps save lives at an earlier stage, and the results produced are highly accurate. A minimal sketch of the classification step is given below.
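The classification sketch referred to above, reusing the hypothetical `preprocess` helper and the trained `model` from the earlier sketches; the file name is a placeholder.

```python
# A minimal sketch of the classification step: an unseen image is preprocessed,
# passed through the trained model, and the highest-scoring output neuron gives
# the predicted class. `preprocess` and `model` come from the earlier sketches.
import numpy as np

CLASSES = ['normal', 'pneumonia', 'tuberculosis', 'covid-19']

def classify(path: str) -> str:
    x = preprocess(path)                 # shape (1, 128, 128, 3)
    probabilities = model.predict(x)[0]  # SoftMax scores, one per class
    return CLASSES[int(np.argmax(probabilities))]

print(classify('patient_xray.png'))      # e.g. 'pneumonia'
```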
13.8 Conclusion
References
[1] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘EMEEDP: enhanced
multi-hop energy efficient distributed protocol for heterogeneous wire-
less sensor network’. Proceedings of the 5th International Conference on
Communication Systems and Network Technologies, CSNT 2015; Gwalior,
India, IEEE, 2015. pp. 194–200.
[2] Kumar S., Ranjan P., Ramaswami R. ‘Energy optimization in distrib-
uted localized wireless sensor networks’. Proceedings of the International
Conference on Issues and Challenges Intelligent Computing Technique
(ICICT); Ghaziabad, India, IEEE, 2014.
[3] Chauhan R., Kumar S. ‘Packet loss prediction using artificial intelligence uni-
fied with big data analytics, internet of things and cloud computing technolo-
gies’. 5th International Conference on Information Systems and Computer
Networks (ISCON); Mathura, India, IEEE, 2021. pp. 1–6.
[4] Sudhakaran S., Kumar S., Ranjan P., Tripathy M.R. ‘Blockchain-based transparent and secure decentralized algorithm’. International Conference
on Intelligent Computing and Smart Communication 2019. Algorithms for
Intelligent Systems; Singapore: Springer; 2020.
[5] Kumar S., Trivedi M.C., Ranjan P., Punhani A. Evolution of Software-Defined
Networking Foundations for IoT and 5G Mobile Networks. Hershey, PA: IGI
Publisher; 2020. p. 350.
[6] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy efficient mul-
tichannel MAC protocol for high traffic applications in heterogeneous wireless
sensor networks’. Recent Advances in Electrical & Electronic Engineering.
2017, vol. 10(3), pp. 223–32.
[7] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[8] Kumar V., Arablouei R., Cengiz K., Vimal S., Suresh A. ‘Energy efficient re-
source migration based load balance mechanism for high traffic applications
IoT’. Wireless Personal Communications. 2022, vol. 10(3), pp. 1623–44.
[9] Kumar S., Cengiz K., Trivedi C.M., et al. ‘DEMO enterprise ontology with
a stochastic approach based on partially observable Markov model for data
aggregation and communication in intelligent sensor networks’. Wireless
Personal Communication. 2022.
[10] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic to-
rus Network-on-Chip architecture’. International Journal of Innovative
Technology and Exploring Engineering (IJITEE). 2019, vol. 8(6).
[11] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based Sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
[12] Singh P., Bansal A., Kamal A.E., Kumar S. ‘Road surface quality monitoring
using machine learning algorithm’ in Reddy A.N.R., Marla D., Favorskaya
M.N., Satapathy S.C. (eds.). Intelligent Manufacturing and Energy
Sustainability. Smart Innovation, Systems and Technologies. 265. Singapore:
Springer; 2022.
[13] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[14] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘A utility maximiza-
tion approach to MAC layer channel access and forwarding’. Progress in
Electromagnetics Research Symposium, Publisher: The Electromagnetics
Academy; 2015. pp. 2363–67.
[15] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘An NS3 implementa-
tion of physical layer based on 802.11 for utility maximization of WSN’.
Proceedings of the International Conference on Computational Intelligence
and Communication Networks, CICN 2015; Jabalpur, India, IEEE, 2016. pp.
79–84.
[16] Sharma A., Awasthi Y., Kumar S. ‘The role of blockchain, AI and iot for smart
road traffic management system’. Proceedings of the IEEE India Council
International Subsections Conference, INDISCON 2020; Visakhapatnam,
India, IEEE, 2020. pp. 289–96.
[17] Singh P., Bansal A., Kumar S. ‘Performance analysis of various information
platforms for recognizing the quality of indian roads’. Proceedings of the
Confluence 2020 – 10th International Conference on Cloud Computing, Data
Science and Engineering; Noida, India, IEEE, 2020. pp. 63–76.
[18] Kumar S., Ranjan P., Singh P., Tripathy M.R. ‘Design and implementation of fault tolerance technique for internet of things (IoT)’. Proceedings of the 12th International Conference on Computational Intelligence and Communication Networks, CICN 2020; 2020. pp. 154–9.

Chapter 14
IoT-based architecture for smart health-care systems
Internet of Things (IoT) provides a pathway for connecting physical entities with
digital entities using devices and communication technologies. The rapid growth of IoT in recent years has had a significant influence on many fields. Healthcare is one of the fields that will benefit hugely from IoT, which can resolve many challenges faced by patients and doctors. Smart health-care applications allow the doctor to monitor the patient’s health state without human intervention.
Sensors collect and send the data from the patient. Recorded data are stored in a
database that enables medical experts to analyze those data. Any abnormal change in
the status of the patient can be notified to the doctor. This chapter aims to study dif-
ferent research works made on IoT-based health-care systems that are implemented
using basic development boards. Various hardware parameters of health-care sys-
tems and sensors used for those parameters are explored. A basic Arduino-based
health-care application is proposed using sensors and global system for mobile com-
munication (GSM) module.
14.1 Introduction
In recent years, population growth has caused many problems in the health sector
[1]. Around 523 million people worldwide suffer from heart disease, and doctors cannot provide treatment for all of them. Heart attack patients have an average age of greater than 50 years [2]. So, constant monitoring is required for these kinds of diseases. A smart health-care application helps in this scenario. Elderly people who cannot travel often can benefit from this application. It
1 Department of Computer Science & Engineering, Pondicherry University, Coimbatore, India
2 Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland
3 Department of Information Technology, PSG College of Technology, Coimbatore, India
is also helpful for children and babies both of whose parents have to work [3]. One of the main advantages of this smart health-care application is that it is most helpful for diseases that take a long time to cure or that have no cure at all [4].
IoT provides a huge benefit in the health sector. It is used to track and monitor
the patients closely from a remote location. Sensors such as heartbeat sensors, pres-
sure sensors, and temperature and humidity sensors are used to record values from
the patients. These values are processed for further analysis to produce better results.
The results that were produced are helpful for further treatment and are stored in a
secured manner [5]. The whole process of storing the values from the sensor can be
done without human intervention.
The connection between wired and intelligent physical devices forms the major part of the IoT. These devices are capable of performing many functions, such as storing information, collecting information, and processing it. They have further uses, such as communication between applications and the sensors, Internet connectivity, and so on. This smart health-care application provides better accuracy and results than traditional methods, as there is no human intervention. It has also proved to be more economical than traditional patient monitoring [6].
The surveyed systems are summarized below (author and publication year, proposed framework or prototype description, hardware components, software components, communication module, and limitations):

1. Uday Kumar et al. (2020) [8]
   Description: Proposes a smart health-care patient data and security system to monitor patient health-care data such as temperature and blood pressure; the patient data are uploaded to a cloud server.
   Hardware: Arduino; blood pressure sensor; DS18B20 temperature sensor.
   Software: Arduino IDE; Embedded C language; ThingSpeak server.
   Communication: GSM/GPRS module.
   Limitations: Data stored in radio frequency identification (RFID) tags are not secure; strong encryption strategies should be used to secure data in the cloud servers.

2. Vajubunnisa Begum et al. (2020) [2]
   Description: Aims at helping cardiac patients by continuously collecting and monitoring their health status using sensors; the framework also provides an ECG graph of the patient’s heart rate.
   Hardware: Arduino UNO; temperature sensor (LM35); humidity sensor (DHT11); ECG sensor (AD8232); heart rate monitor sensor (MAX30105); body position sensor (ADXL335); Raspberry Pi.
   Software: Arduino IDE; Embedded C language; Python language.
   Communication: Wi-Fi module; Bluetooth module.
   Limitations: The data collected from the sensors are not secured; the cost of the system setup is high.

3. Milon Islam et al. (2020) [9]
   Description: A smart health-care application using five sensors; the error percentage of these five sensors is also analyzed.
   Hardware: ESP32 (Node MCU) module; heartbeat sensor; body temperature sensor (LM35); room temperature sensor (DHT11); CO sensor (MQ9); CO2 sensor (MQ135).
   Software: ThingSpeak server; Arduino IDE.
   Communication: Wi-Fi module.
   Limitations: The system is bulky, so it is not flexible to work with.

4. Dahlia Sam et al. (2020) [3]
   Description: Proposes a working IoT-based architecture capable of monitoring the health of any patient using sensors and microcontrollers during the pandemic situation, thereby reducing unnecessary expenses on doctor and hospital visits.
   Hardware: Microcontroller (ATmega328P); Arduino UNO; temperature sensor (LM35); blood pressure sensor.
   Software: Arduino IDE; cloud database.
   Communication: GSM module.
   Limitations: The need for excess wires for connections between the devices.

5. Seena Naik et al. (2019) [10]
   Description: Presents the implementation of Raspberry Pi and IoT in the health system; various sensors gather body health parameters for diagnosis by connecting to a Raspberry Pi, which is associated with the cloud, and readings are displayed on the LCD.
   Hardware: ECG sensor; Raspberry Pi; respiration sensor; acceleration sensor; temperature sensor; blood pressure sensor; heartbeat sensor.
   Software: Cloud database; Raspbian OS.
   Communication: GSM module.
   Limitations: None mentioned in the paper.

6. Suneeta S. Raykar et al. (2019) [11]
   Description: Proposes ALERT (Android-based health-enabled remote terminal), built using a Node MCU with various sensors for monitoring oxygen level, heart-pumping rate, and body temperature.
   Hardware: Node MCU; MAX30105 oxygen saturation sensor; ECG sensor (AD8232); MAX30102 pulse rate and body temperature sensor.
   Software: ThingSpeak cloud; Arduino IDE.
   Communication: API developed using MIT App Inventor.
   Limitations: The framework allows medical experts to examine and advise only one patient at a time; this must be extended to multi-user accessibility.

7. PandiaRajan Jeyaraj et al. (2019) [12]
   Description: An IoT health-care system combined with deep learning algorithms and sensor networks; collects patient data through sensors, stores them in a cloud server, and analyzes and visualizes the data using the DCNN algorithm together with WEKA analysis.
   Hardware: EEG sensor; ECG sensor; temperature sensor; pulse rate sensor; NI myRIO processor.
   Software: Cloud for data storage; WEKA.
   Communication: Not specified.
   Limitations: The model classifies patient states generically (abnormal, normal, subnormal) rather than providing medical assistance; although the system gives reliable and highly accurate health-status prediction, it offers no portal or communication platform for the patient and the medical expert.

8. Subasish Mohapatra et al. (2019) [6]
   Description: Proposes a health-care system using Arduino that collects data from various sensors and stores them in the cloud database using a Wi-Fi module.
   Hardware: Arduino; heartbeat sensor; DS18B20 temperature sensor; ESP8266 (Wi-Fi module).
   Software: Cloud server; Embedded C language; Arduino IDE.
   Communication: Wi-Fi module.
   Limitations: None mentioned in the paper.

9. Abhishek Kumar et al. (2018) [7]
   Description: Presents a health-care monitoring system to monitor temperature and heartbeat using sensors; the system also has a web camera on the patient’s side.
   Hardware: Raspberry Pi; DS18B20 temperature sensor; heartbeat sensor; analog-to-digital converter (MCP3008); web camera.
   Software: ThingSpeak server; Raspbian OS; HTML; Blynk.
   Communication: Wi-Fi module.
   Limitations: Not scalable to add more sensors; the many wired connections make the system less flexible.

10. Shubham Banka et al. (2018) [13]
    Description: A smart health-care system using Raspberry Pi that collects various details from the sensors and can intimate these details to the patient’s family and doctor.
    Hardware: Raspberry Pi; temperature sensor (LM35); heartbeat sensor; vibration sensor; BP sensor.
    Software: Raspbian OS.
    Communication: Wi-Fi module; GSM module.
    Limitations: None mentioned in the paper.

11. C. Senthamilarasi et al. (2018) [14]
    Description: Presents a real-time patient monitoring system interconnected with IoT to evaluate the performance and practicability of such systems; the procedure helps monitor the patient’s healthcare continuously based on some parameters and is supported by an Arduino UNO with a cloud database.
    Hardware: Arduino UNO; ECG sensor; temperature sensor; heartbeat sensor.
    Software: Cloud server; Arduino IDE; Embedded C language.
    Communication: Wi-Fi module.
    Limitations: None mentioned in the paper.

12. Swaleha Shaikh et al. (2017) [15]
    Description: Proposes a smart health-care monitoring system using Raspberry Pi and a cloud database to effectively monitor patients.
    Hardware: Raspberry Pi; LM35 temperature sensor; heart rate sensor; blood pressure sensor; accelerometer.
    Software: Cloud server; Raspbian OS.
    Communication: Wi-Fi module.
    Limitations: Security concerns over the patient’s sensitive data.

13. Tarannum Khan et al. (2017) [16]
    Description: Presents a patient monitoring application using Arduino UNO; the stored data are also presented through an Android application.
    Hardware: Arduino UNO; temperature sensor; heartbeat sensor; SD card.
    Software: Arduino IDE; Embedded C language; Blynk application.
    Communication: Wi-Fi module.
    Limitations: The data uploaded to the cloud are insecure.

14. Shreyaasha Chaudhury et al. (2017) [5]
    Description: Presents a health-care application that monitors health parameters through sensors; an audio signaling device and a message service are attached to the system to indicate emergencies.
    Hardware: Arduino UNO; temperature sensor (LM35); ECG sensor; heart rate monitor sensor.
    Software: Arduino IDE; Embedded C language; HTML.
    Communication: Wi-Fi module; GSM module.
    Limitations: The data stored in the cloud are insecure, and the cost of the system is high.

15. Niharika Kumar (2017) [17]
    Description: Proposes a smart framework for a health-care system that uses the architecture protocols of 6LoWPAN and IEEE 11073 and is associated with cloud networks.
    Hardware: Arduino UNO; gyroscope; ECG sensor; temperature sensor; heart rate sensor.
    Software: Arduino IDE; Embedded C language; HTML.
    Communication: Wi-Fi module.
    Limitations: The cost of the proposed system is high.

16. Fatima Alshehri et al. (2020) [18]
    Description: Provides a detailed survey of smart healthcare, covering methods such as the IoT, the Internet of Medical Things, and artificial intelligence.

17. Yasmeen Shaikh et al. (2018) [19]
    Description: Proposes several methods for providing health-care solutions using IoT; the major methods discussed are RFID and named data networking.

18. Wei Li et al. (2020) [20]
    Description: A survey that explores the different possibilities of implementing smart healthcare using machine learning and big data analysis.

19. Ramakrishna Hegde et al. (2021) [21]
    Description: Proposes a system to monitor the daily health-related activities of a patient and report to the doctors through the internet with a different approach.

(For entries 16–19, the survey listed no hardware components, software components, communication module, or limitations.)
Uday Kumar et al. [8] proposed an Arduino-based patient monitoring system that stores
patient details such as blood pressure and temperature. The author implemented this
system by using a temperature sensor to monitor temperature and a blood pressure
sensor to monitor blood pressure. Patient data that are collected through sensors are
stored in radio frequency identification (RFID) tags and are also uploaded in the
cloud. The stored data will be useful to doctors to analyze patients’ conditions. This
system has GSM/General Packet Radio Service (GPRS) module to alert emergen-
cies to the patient’s side by short message services (SMS).
Vajubunnisa Begum et al. [2] proposed a health-care monitoring system spe-
cially designed for cardiac patients using Raspberry Pi. The data are collected
through the sensors such as temperature sensors, heartbeat sensors, ECG sensors,
and body position sensors. Wi-Fi module is used for sending the collected data from
sensors to the cloud. The data are also transferred to the doctor’s side using the
Bluetooth module. Serial plotter software is used to plot the stored data collected
from the ECG sensor. The analyzed result from the sensor is displayed on the LCD.
Milon Islam et al. [9] proposed an IoT-based health-care application using a Node MCU (also known as ESP32). Five sensors collect the patient’s data: a heartbeat sensor to monitor heartbeat, a room-temperature sensor to monitor room temperature, a body temperature sensor to monitor body temperature, an MQ9 sensor to monitor CO gas, and an MQ135 sensor to monitor CO2 gas.
Dahlia Sam et al. [3] proposed an Arduino-based patient monitoring system to monitor the patient’s situation. Patient monitoring is done using a temperature sensor for monitoring body temperature and an optical sensor that measures heartbeat pulses. These sensors send digital signals to the Arduino UNO, and the sensor data, sent through a Wi-Fi module, can be accessed through the cloud by doctors and also relatives.
Seena Naik et al. [10] have proposed a framework by implementing a working
model that monitors and checks a patient’s health condition even from a distance in
a cheap way using an ECG sensor, temperature sensor, and heartbeat sensor condi-
tioned by Raspberry Pi. All the digital signals will be transmitted to the Raspberry Pi
and stored in a database accessed by the cloud. The details will be sent to the health
specialist via the Wi-Fi module.
Suneeta S. Raykar et al. [11] have proposed a duplex communication system
that provides a gateway to monitor various health parameters. The data will be
recorded using a sensor and sent to the ThingSpeak cloud. The medical experts will
use the data to make decisions on the patient’s health. They can also give medical
recommendations based on that data. The recommendations that are given by the
medical experts can be seen by the patient through an android application that is
developed using MIT App inventor. It also suggests future modifications that can be
made for the proposed framework.
PandiaRajan Jeyaraj et al. [12] have proposed a patient monitoring system along
with a deep learning algorithm for making an accurate prediction based on the patient
collected data. The system collects various data such as EEG, ECG, body tempera-
ture, and vital signs. The data are connected using an intelligent sensor network,
and then the data are stored using central cloud storage with the help of the myRIO
processor along with a Wi-Fi module. The data from the cloud are taken, classified
using WEKA, and analyzed using a Deep Convolutional Neural Network (DCNN)
learning algorithm. The model provides results with 97.2% prediction accuracy.
Swaleha Shaikh et al. [15] proposed an application that observes the patient’s
health conditions with Raspberry Pi at its heart. This framework presents a system
in which the data are collected using embedded sensors that are wearable, and the
health status of the patient is monitored concerning some parameters dynamically.
The collected data are then transmitted to the Raspberry Pi, which will process and
analyze the data. These data that have been analyzed by the processor are stored in
the cloud. The stored results are then used by the doctors when the details of the
patient are needed.
Abhishek Kumar et al. [7] proposed a smart health-care application using
Raspberry Pi. Raspberry Pi is connected with a temperature sensor to monitor tem-
perature and heartbeat sensors to monitor the heartbeat. The input from the sensor
is processed in Raspberry Pi and displayed on LCD. The data are then sent to the
cloud by using Raspberry Pi that enables doctors to continuously monitor the status
of the patient. The patient’s side has a web camera that helps the doctor to monitor
the patient. Both web and Android applications are used to present the data collected through the sensors.
Shubham Banka et al. [13] proposed a smart health-care monitoring system
using Raspberry Pi. The author implemented this system using a temperature sensor,
heartbeat sensor, BP sensor, and vibration sensor to monitor the patient’s tempera-
ture, heartbeat, BP, and shivering of the patient. The data collected from the sen-
sor are stored and are presented through the web user interface. A GSM module is
attached to inform the critical situations of the patients through SMS.
Senthamilarasi et al. [14] proposed a model for monitoring patients’ healthcare using an Arduino UNO. The real-time implementation of the health-care application uses a
temperature sensor for body temperature, ECG for heartbeat, and a heartbeat sensor
for heartbeat rate monitoring. These sensors send the signal to the Arduino UNO,
and sensor data can be monitored by any smart device that is interconnected with the
cloud database, which acts as a server for communication. The paper also discusses
the recent advancements in the IoT-based health-care eco-system.
Subasish Mohapatra et al. [6] implemented a health-care monitoring system
using Arduino. This model uses a temperature sensor and pulse sensor that are con-
nected to the Arduino. The health parameters given by the sensors are then stored in
the cloud database by using Wi-Fi module. This cloud database processes the data
provided by the sensors and if the data exceeded the threshold value, it makes an
emergency call to the concerned doctor with the current health status of the patient
and with a full detailed medical report, so that the doctor suggests a proper health-
care measures to be taken in case of critical health conditions.
Tarannum Khan et al. [16] proposed an Arduino-based patient monitoring system to monitor the health of patients. The patient’s details are collected using two sensors, a temperature sensor and a heartbeat sensor, to monitor temperature and heartbeat. An LCD screen is used to display the monitored details. The collected details are uploaded to the server and converted into JSON links for visualization in an Android application.
Shreyaasha Chaudhury et al. [5] proposed a patient monitoring system using
the IoT. The author implemented this system using Arduino UNO, temperature sen-
sor, ECG sensor, and heartbeat sensor to monitor the patient’s health parameter.
The data collected through the sensors are monitored and analyzed by the doctor.
Abnormal changes in the health parameters are notified to the doctor using a buzzer
and SMS. The data collected are stored in a database and are presented through a simple web page.
Niharika Kumar [17] proposed a framework for an end-to-end health-care system based on the Arduino UNO. The implementation of the health-care application uses a temperature sensor for body temperature, ECG for heartbeat, a gyroscope for sensing angular velocity, and a heartbeat sensor for heart-rate monitoring. These sensors send their signals to the Arduino UNO, which is connected to the Internet and acts as the communication medium between the IoT devices and display devices such as a computer or phone.
Durga Amarnath M. Budida et al. [23] proposed a health-care system that uses the ATMEL 89S52 microcontroller as its base. The system includes a login to validate the users who can access it. The data are sent from the microcontroller to other computers on the hospital side using a Wi-Fi module. The system has a temperature sensor and a pulse rate sensor to monitor the patient’s health, and it also requires an analog-to-digital converter, which converts the analog signals to the digital form on which the microcontroller works.
Kavita Jaiswal et al. [24] implemented a Raspberry Pi-based health monitoring system that is used to monitor the health condition of a patient from a remote location. This system uses a Docker container to manage the collected data and a database to store the data about the patient. Once the data are stored, they can be accessed by multiple users who can benefit from them.
Gowrishankar et al. [25] proposed and designed a framework that determines the patient’s body temperature and heartbeat, then redirects the data to the server end using a cost-efficient microcontroller to great effect. It uses three different sensors (pulse, heartbeat, and temperature sensors), all controlled by the microcontroller.
14.4.1.2 Raspberry Pi
Raspberry Pi is a single-board computer that allows us to connect and explore with
different sensors. In References 6, 7, 10, the health-care application is built upon the
Raspberry Pi platform. In Reference 2, the smart health-care application is imple-
mented in both the Arduino and Raspberry Pi platforms to compare the efficiency.
Raspberry Pi gave better results than the Arduino in the patient monitoring system
proposed in Reference 2.
14.4.2 Sensors
14.4.2.1 Blood pressure sensor
The blood pressure sensor is capable of monitoring the blood flow of a patient. The
pressures created in the blood flow are detected and converted to electrical signals.
Blood pressure is an important factor when considering a person’s health. This sen-
sor is used in the patient monitoring system proposed in References 7, 8, 10 to study
the blood pressure of the patient.
14.4.2.10 Respiration sensor
A respiration sensor outputs the respiration vibration in the form of a wave: as the patient breathes, a respiratory waveform is shown on the display. In Reference 10, this sensor is used to monitor the patient’s respiration.
The proposed health-care application uses an Arduino at its heart, a microcontroller capable of interfacing with multiple sensors. Arduino development boards have the power of transforming the data provided to them. The system uses three sensors to monitor the health of the patients: a temperature sensor, a heartbeat sensor, and a vibration sensor [28]. Many varieties of temperature sensors are used to read the temperature from patients; the temperature sensor used in this proposed framework is the LM35. A heartbeat sensor is used for monitoring the heartbeat of the patient. The vibration sensor is capable of checking the shivering of the patient. The data collected from the Arduino are constantly monitored. The GSM module sends messages to doctors when the patient’s state is critical. The LCD is used to display the temperature, heartbeat, and vibration status of the patient. The data can also be accessed online with the help of a Wi-Fi module [27, 29–43].
The Arduino converts the analog voltage from the sensor to digital format for readability purposes. The temperature sensor is shown in Figure 14.4.
The proposed system is implemented using the Proteus application. The different
components used in this application are connected to the Arduino. The block dia-
gram of the proposed system is demonstrated in Figure 14.2. The sensor modules
used in the system are connected to the Arduino development board using wires.
Each sensor used in this system contains a Vcc and a ground. These connections are
provided to the sensors. The output pin of the LM35 sensor is connected to A0 of
the Arduino board. The output pin of the heartbeat sensor is connected to A2 of the
Arduino board. D0 of the Arduino board is connected to the output pin of the vibra-
tion sensor. Virtual terminal in Proteus is used to display the output of the proposed
system. The RX pin of the serial monitor is connected to D1 of the Arduino. The GSM module contains two pins, RX and TX. The RX pin and TX pin of the GSM module are connected to D2 and D3 of the Arduino board, respectively. The circuit diagram of the proposed system is shown in Figure 14.10.
The code for the proposed system is compiled and executed in the Arduino IDE. The hex file of the executed code is uploaded to the Arduino board; this is the program file of the board. The program file of the GSM module is uploaded in the same way. The proposed system is then simulated to view the results.
The data collected from the sensors are displayed on the virtual terminal. First, the temperature is displayed in two units, Celsius and Fahrenheit. The output from the vibration sensor indicates whether the patient is normal or shivering. The heartbeat sensor gives heartbeats per minute. If the patient is normal, then no alert messages are sent to the doctor’s side. The data displayed are shown in Figure 14.11.
Each parameter is set to a threshold value. When the data recorded from a sensor cross the threshold value, alert messages are sent to the doctor. The threshold value for the temperature sensor is 101 °F: if the temperature exceeds this value, alert messages are sent to the doctor using the GSM module. In the same way, the threshold value of the heartbeat sensor is 50: if the heartbeats per minute fall below 50, alert messages are sent to the doctor’s number. The alert message is shown in Figure 14.12, and a minimal sketch of this alert logic is given below.
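The alert-logic sketch referred to above, written in Python for illustration only (the actual system runs Embedded C compiled in the Arduino IDE); `send_sms` is a hypothetical stand-in for the GSM module.

```python
# A minimal, illustrative sketch of the threshold-alert logic described above.
# The real system runs Embedded C on the Arduino; send_sms is a hypothetical
# stand-in for the GSM module.
TEMP_THRESHOLD_F = 101.0     # alert above this body temperature
HEARTBEAT_THRESHOLD = 50     # alert below this many beats per minute

def send_sms(message: str) -> None:
    print(f"[GSM] SMS to doctor: {message}")   # placeholder for the GSM module

def check_vitals(temperature_f: float, bpm: int) -> None:
    if temperature_f > TEMP_THRESHOLD_F:
        send_sms(f"Alert: temperature {temperature_f:.1f} F exceeds {TEMP_THRESHOLD_F} F")
    if bpm < HEARTBEAT_THRESHOLD:
        send_sms(f"Alert: heartbeat {bpm} bpm below {HEARTBEAT_THRESHOLD} bpm")

check_vitals(98.6, 72)    # normal: no alert
check_vitals(102.3, 45)   # both alerts fire
```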
Figure 14.13 shows the temperature reading of the patient collected during dif-
ferent times of the day. For this particular patient, the temperature value is between
98.2 and 98.9 °F.
The heartbeat of the patient recorded using sensors is shown in Figure 14.14. The heartbeats of the patient vary from 69 to 72 beats per minute.
14.7 Conclusion
IoT combined with the health-care sector has made remote monitoring of patients possible and effective. It unleashes the power and possibility of keeping patients healthy and also empowers medical experts to deliver instant care. It also prevents unnecessary hospital admissions, since every single change in the body is detected at an early stage and remedies can be provided promptly.

IoT in the health-care system is proving to be significantly effective, and the outcome is beneficial not only to the patients but also to the medical experts and hospitals. This chapter deals with various health-care monitoring systems that have been developed using IoT and provides an extensive survey of them. It outlines different designs for health-care systems. The different types of IoT components used for building such systems, their limitations, and the support provided by each type of design have also been extensively discussed. Finally, an Arduino-based application is simulated in the Proteus application.
References
[1] Zhu H., Wu C.K., Koo C.H., et al. ‘Smart healthcare in the era of Internet-of-
Things’. IEEE Consumer Electronics Magazine. 2019, vol. 8(5), pp. 26–30.
[2] Vajubunnisa Begum R., Dharmarajan K. ‘Smart healthcare monitoring sys-
tem in IoT’. European Journal of Molecular & Clinical Medicine. 2020, vol.
7(4).
[3] Dahlia Sam S., Srinidhi V., Niveditha R., Amutha S. ‘Progressed IOT based
remote health monitoring system’. International Journal of Control and
Automation. 2020, vol. 13(2s).
[4] Baker S.B., Xiang W., Atkinson I. ‘Internet of things for smart healthcare:
technologies, challenges, and opportunities’. IEEE Access. 2017, vol. 5, pp.
26521–44.
[5] Chaudhury S., Paul D., Mukherjee R., Haldar S. ‘Internet of thing based health-care monitoring system’. 2017 8th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON); Vancouver, BC, Canada, IEEE, 2017.
[6] Mohapatra S., Mohanty S., Mohanty S. ‘Smart healthcare: an approach for ubiquitous healthcare management using IoT’ in Dey N., Das H., Naik B., Behera H.S. (eds.). Big Data Analytics for Intelligent Healthcare Management (Advances in Ubiquitous Sensing Applications for Healthcare). Academic Press, Elsevier; 2019. pp. 175–96. ISBN 9780128181461.
[7] Kumar A., Chattree G., Periyasamy S. ‘Smart healthcare monitoring system’. Wireless Personal Communications. 2018, vol. 101, pp. 453–63. https://doi.org/10.1007/s11277-018-5699-0
[8] Uday Kumar K., Shabbiah S., Rudra Kumar M. ‘Design of high-security
smart health care monitoring system using IoT’. International Journal of
Emerging Trends in Engineering Research. 2020, vol. 8(6).
[9] Islam M.M., Rahaman A., Islam M.R. ‘Development of smart healthcare
monitoring system in IoT environment’. SN Computer Science. 2020, vol.
1(3), p. 185.
[10] Seena Naik K., Sudarshan E. ‘Smart healthcare monitoring system using
raspberry Pi on IoT platform’. ARPN Journal of Engineering and Applied
Sciences. 2019, vol. 14(4).
[11] Raykar S.S., Shet V.N. ‘Design of healthcare system using IoT enabled application’. Materials Today: Proceedings. 2020, vol. 23(1), pp. 62–67.
[12] Rajan Jeyaraj P., Nadar E.R.S., Jeyaraj P. ‘Smart-monitor: patient monitoring
system for IoT-based healthcare system using deep learning’. IETE Journal
of Research. 2019, vol. 16(1), pp. 1–8.
[13] Banka S., Madan I., Saranya S.S. ‘Smart healthcare monitoring using IoT’.
International Journal of Applied Engineering Research. 2018, vol. 13(15).
[14] Senthamilarasi C., Jansi Rani J., Vidhya B., Atitha H. ‘A smart patient health monitoring system using IoT’. International Journal of Pure and Applied Mathematics. 2018, vol. 119(16).
[15] Shaikh S., Chitre V. ‘Healthcare monitoring system using IoT’. International Conference on Trends in Electronics and Informatics (ICEI); Tirunelveli, India, IEEE, 2017. DOI: 10.1109/ICOEI.2017.8300952.
[16] Khan T., Chattopadhyay M.K. ‘Smart healthcare monitoring system’. International Conference on Information, Communication, Instrumentation and Control (ICICIC); IEEE, 2017.
[40] Singh P., Bansal A., Kamal A.E., Kumar S. ‘Road surface quality monitoring
using machine learning algorithm’ in Reddy A.N.R., Marla D., Favorskaya
M.N., Satapathy S.C. (eds.). Intelligent Manufacturing and Energy
Sustainability. Smart Innovation, Systems and Technologies. 265. Singapore:
Springer; 2022.
[41] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[42] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘A utility maximiza-
tion approach to MAC layer channel access and forwarding’. Progress in
Electromagnetics Research Symposium. 2015, vol. 2015, pp. 2363–7.
[43] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘An NS3 implementa-
tion of physical layer based on 802.11 for utility maximization of WSN’.
Proceedings – 2015 International Conference on Computational Intelligence
and Communication Networks, CICN 2015; 2016. pp. 79–84.
[44] Sharma A., Awasthi Y., Kumar S. ‘The role of Blockchain, AI and IoT for
smart road traffic management system’. Proceedings – 2020 IEEE India
Council International Subsections Conference, INDISCON 2020; 2020. pp.
289–96.
[45] Singh P., Bansal A., Kumar S. ‘Performance analysis of various information
platforms for recognizing the quality of indian roads’. Proceedings of the
Confluence 2020 – 10th International Conference on Cloud Computing, Data
Science and Engineering, Publisher: IEEE, Noida, India, DOI: 10.1109/
Confluence47617.2020.9057829; 2020. pp. 63–76.
[46] Kumar S., Ranjan P., Singh P., Tripathy M.R. ‘Design and implementation of
fault tolerance technique for Internet of things (IoT)’. Proceedings – 2020 12th
International Conference on Computational Intelligence and Communication
Networks, CICN 2020; 2020. pp. 154–9.
[47] Reghu S., Kumar S. ‘Development of robust infrastructure in networking
to survive a disaster’. 2019 4th International Conference on Information
Systems and Computer Networks, ISCON 2019; Mathura, India, IEEE, 2019. DOI: 10.1109/ISCON47742.2019.9036244. pp. 250–55.
[48] Kumar A., Krishnamurthi R., Nayyar A., Sharma K., Grover V., Hossain E. ‘A
novel smart healthcare design, simulation, and implementation using health-
care 4.0 processes’. IEEE Access. 2020, vol. 8, pp. 118433–71.
[49] Sundaravadivel P., Kougianos E., Mohanty S.P., Ganapathiraju M.K. ‘Everything you wanted to know about smart health care’. IEEE Consumer Electronics Magazine. 2018, vol. 7(1), pp. 18–28. DOI: 10.1109/MCE.2017.2755378.
Chapter 15
IoT-based heart disease prediction system
Rajamohana S P 1, Zbigniew M Wawrzyniak 2, Krishna
Prasath S 3, Shevannth R 3, Raja Kumar I 3,
Mohammed Rafi M 3, and T Hariprasath D 3
In India, almost 80% of patients who die from heart disease do not receive adequate care. This is a challenging task for doctors because they are often unable to make an accurate diagnosis, and the condition is extremely expensive to treat. The proposed solution uses data mining technologies to simplify the decision support system in order to increase the cost-effectiveness of therapy. To oversee their patients’ care, most hospitals use a hospital management system. Unfortunately, many of these tools do not exploit large amounts of clinical data to derive useful information. Because these systems generate a considerable amount of data in many forms, the data are rarely accessed and remain unused. As a result, making sensible decisions through this procedure requires a lot of effort. The process of diagnosing a disease currently entails identifying the disease’s numerous symptoms and characteristics. This research employs a number of data mining approaches to assist with medical diagnostics.
15.1 Introduction
Heart diseases are becoming the leading cause of death [1]. Health-care data from several hospitals can be used to train a machine learning system to forecast the occurrence of heart illnesses. Machine learning is highly regarded in health care because of its ability to process massive datasets faster than humans [2]. Predicting this outcome would help the doctor plan and deliver better care. The proposed system uses a historical cardiac database to predict heart disease. Health-related signals including blood pressure, heart rate, chest discomfort, blood sugar, and cholesterol are among the input signals. Two additional factors that are also effective in achieving
1 Department of Computer Science & Engineering, Pondicherry University, Karaikal, India
2 Department of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland
3 Department of Information Technology, PSG College of Technology, Coimbatore, Tamil Nadu, India
the desired outcomes are smoking and obesity, both of which are recognized as significant heart disease risk factors. To obtain the findings, classification techniques such as decision trees and random forests are applied to the data, as the sketch below illustrates.
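As a minimal sketch of that approach (not the authors’ exact pipeline), the following trains a scikit-learn random forest on a hypothetical historical cardiac dataset; the file name and column names are stand-ins chosen to match the input signals listed above.

```python
# A minimal sketch of random-forest classification for heart-disease
# prediction. The CSV file and column names are hypothetical stand-ins
# matching the input signals named in the text.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv('heart_records.csv')   # hypothetical historical cardiac database
X = df[['blood_pressure', 'heart_rate', 'chest_pain', 'blood_sugar',
        'cholesterol', 'smoking', 'obesity']]
y = df['heart_disease']                 # 1 = disease present, 0 = absent

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print('accuracy:', accuracy_score(y_test, clf.predict(X_test)))
```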
The health-care industry gathers a massive amount of data about health-care services. Unfortunately, it often lacks the practical information that could aid in making sound judgments. This work created a data mining system capable of detecting and forecasting cardiac problems.
The outcome of the paper demonstrates that each technique has its own unique capability in reaching the stated mining objectives [1]. IHDPS can answer complex "what if" queries that traditional decision support systems cannot: using medical profiles such as age, blood sugar, and blood pressure, it can predict whether patients will develop heart disease, thereby providing valuable information [3].
As a result, data mining helps to achieve natural evolution in the medical field [2]. Deep learning is one of the most often utilized methods for the diagnosis of cardiac problems, along with machine learning and neural networks. The section that follows provides a brief description of each [3].
15.1.1 Deep learning
A problem in the functioning of the heart due to different circumstances is termed a heart disease. Some familiar heart diseases are congestive heart failure, cardiac arrest, arrhythmia, stroke, coronary artery disease (CAD), and congenital heart disease, as shown in Figure 15.1. How heart disease varies according to its type, and its symptoms, are shown in Figure 15.2.
The indications used to forecast a heart disorder depend on its type. For example, high blood pressure is one of the indications of coronary artery disease, yet among people with a similar condition the symptoms may differ, another patient presenting chest pain instead. The doctor confirms the heart disease from the patient’s diagnostic report and several other frameworks. Some of the most common heart disorder symptoms and their types are listed in Table 15.1 and Figure 15.1.
15.2 Related work
Only a few health-care sectors employ clinical data for predictive purposes, and even
those that do may be limited by a plethora of patient organizational constraints.
The literature survey is summarized in Table 15.2. Predictions are often based on a
doctor's intuition rather than extensive research on a scientific database. Incorrect
therapy caused by a faulty diagnosis poses a severe threat to the clinical profession.
To address these challenges, the following sections present and discuss a data mining
strategy supported by clinical statistics.
Gowrishankar et al. (2017) proposed a framework that estimates the patient's body
temperature and heartbeat, then redirects the data to the client or server end using a
low-cost microcontroller. It employs three distinct sensors (pulse, heartbeat, and
temperature sensors), all of which are controlled by the microcontroller.
Balamurugan et al. (2017) proposed a framework that uses an Arduino and a
heartbeat sensor to determine the patient's pulse and analyze the data to determine
whether the patient is healthy or at risk of having a heart attack [44].
Manisha et al. (2016) proposed a system that assists elderly persons who chronically
suffer from heart disease or heart attacks by tracking their cardiac abnormalities and
sending an alert to the user's or a nearby person's mobile phone using a gadget [52].
Additionally, it analyses the heartbeat dataset. It also makes use of big data analytics,
which, through various tools and regulations, fosters an environment that is open to
customers.
Polu et al. (2019) proposed a system that helps lower deaths caused by heart disorders,
since many such deaths are attributable to waiting time before proper treatment. This
can be avoided because the technology notifies the doctor of the patient's present
location and ECG report.
Patel et al. (2014) proposed an article arguing that angina is very significant for
the sufferer, who may be saved if additional care and medical assistance are provided
within an hour. The algorithm is used to quickly identify heart conditions and trigger
the crucial therapy. A gaming gadget can also serve as lifesaving equipment in this
way. If put into practice, this concept is quite helpful.
Ashrafuzzaman et al. (2013) proposed a method for identifying heart problems
from heartbeat and blood data. This technique for identifying heart issues is highly
accurate. By capturing heart frequencies, it can identify a number of heart-related
conditions: in addition to heart attacks, it can detect abnormal blood, heart blockage,
and valve circulation problems. Based on heart frequencies, this technology assists
in the early detection of cardiac issues.
Bo Jin, Chao Che et al. (2018) proposed a neural network-based model in which
the trial and early diagnosis of heart disease were performed using electronic health
record (EHR) data from real-world datasets related to ischemic heart disease [29].
One-hot encoding and word vectors, which are the core ideas of an extended long
short-term memory network model, were used to model the diagnostic events and
predict heart failure events [11]. The findings show how crucial it is to value the
sequential structure of clinical records [7].
Table 15.2 Related works (for each study: proposed framework; hardware and software components used; algorithms used; communication module; limitations)

5. Ashrafuzzaman et al. (2014) [8]. Framework: proposes a device, "Kinect", that monitors heart rate and a few other parameters and detects heart attacks; it also sends alert messages to emergency contacts and makes a Skype call to a hospital if needed. Hardware/software: Kinect, Xbox One. Algorithms: NA. Communication module: Wi-Fi module. Limitations: heartbeat detection must be precise and accurate; otherwise, a false SMS alert about the observed person will be triggered and sent to his/her friends and family.

6. Ashrafuzzaman et al. (2013) [9]. Framework: describes how to detect blood and heart rate and, using heart-rate recording, detect heart attacks, heart blockage, abnormal blood, and valve circulation problems. Hardware/software: smartphone. Algorithms: NA. Communication module: Wi-Fi module. Limitations: heart-rate calculation must be accurate; a noise-free environment is required, which is rarely available.

7. Jin et al. (2018) [10]. Framework: uses sequential data modeling to predict heart disorders. Hardware/software: pulse sensor, LM35 temperature sensor, Arduino UNO (wearable sensors), database. Algorithms: LSTM. Communication module: ESP8266 Wi-Fi module. Limitations: future techniques are needed to classify and predict heart diseases; these techniques may offer less speed and accuracy.

8. Javeed et al. (2017) [11]. Framework: presents a solution using optimized algorithms. Hardware/software: heart disease dataset. Algorithms: enhanced random forest model, random search. Communication module: NA. Limitations: many trees make the algorithm slow and ineffective.

9. Sarangi et al. (2015) [12]. Framework: presents a solution using hybrid techniques. Hardware/software: heart disease dataset. Algorithms: hybrid algorithms (GA and neural networks). Communication module: NA. Limitations: the algorithms used have too many parameters for a non-expert in data mining.

10. Mamatha and Shaicy (2019) [13]. Framework: presents a data mining solution for identification. Hardware/software: heart disease dataset. Algorithms: support vector machine. Communication module: NA. Limitations: SVM is not apt for larger datasets.

11. Bahrami and Shirvani [14]. Framework: presents a data mining solution for identification using KNN and SVO classifiers. Hardware/software: heart disease dataset. Algorithms: ICD, KNN. Communication module: NA. Limitations: accuracy depends on the quality of the data.

12. Bhuvaneswari and Kalaiselvi (2012) [15]. Framework: presents steps for identifying facts based on patient reports. Hardware/software: Arduino and heart disease dataset. Algorithms: naive Bayes, neural network, decision trees. Communication module: NA. Limitations: naive Bayes assumes that all features are independent, which rarely holds in real data.

13. Subbalakshmi et al. (2011) [7]. Framework: presents a decision-support solution. Hardware/software: heart disease dataset (Cleveland database). Algorithms: DSHDPS, naive Bayes. Communication module: NA. Limitations: this algorithm is also notorious as a poor estimator.

14. Jabbar et al. (2011) [16]. Framework: learns, develops and identifies a network by implementing associative rules. Hardware/software: heart disease dataset. Algorithms: cluster and association rule mining algorithms. Communication module: NA. Limitations: execution time.

15. Arabasadi et al. (2017) [17]. Framework: presents a hybrid identification model combining genetic and ANN algorithms. Hardware/software: heart disease dataset. Algorithms: ANN. Communication module: NA. Limitations: greater computational burden, black-box nature, proneness to overfitting, and the empirical nature of model development.

16. Ambekar and Phalnikar (2018) [18]. Framework: proposes disease risk prediction using the CNN-UDRP algorithm. Hardware/software: heart disease dataset. Algorithms: naïve Bayes, KNN. Communication module: NA. Limitations: naive Bayes assumes that all traits are independent, which rarely holds in real data.

17. Ana and Krishna (2017) [19]. Framework: presents disease prediction for stroke patients using wearable sensors in IoT. Hardware/software: heart disease dataset. Algorithms: naive Bayesian, KNN, and tree-based. Communication module: NA. Limitations: multiple algorithms are run, which increases the risk of giving improper information about the patient.

18. Gupta et al. (2019) [20]. Framework: shows that KNN gives the best results compared with the other algorithms. Hardware/software: heart disease dataset. Algorithms: KNN, SVM, decision tree. Communication module: NA. Limitations: a real-time heart disease prediction system for proactive health monitoring is not present.

19. He (2020) [21]. Framework: data are collected from wearable IoT devices for 24 hours and analysed in the cloud. Hardware/software: heart disease dataset. Algorithms: KNN. Communication module: smart watches. Limitations: prediction should be done in a time-series manner, so the time consumption is relatively high.

20. Shaikh et al. (2015) [22]. Framework: presents a data mining solution implemented as a Java application. Hardware/software: heart disease dataset. Algorithms: KNN, Bayesian, Java Swing. Communication module: NA. Limitations: the system cannot handle different kinds of traits.

21. Ganesan (2019) [23]. Framework: diagnoses heart disease using a cloud and IoT-based model. Hardware/software: heart disease dataset. Algorithms: classification and IoT model. Communication module: Wi-Fi module. Limitations: predictions should be made in a time-series format, since this reduces the time required.

22. Binsalman (2018) [24]. Framework: proposes remote monitoring of heart disease patients. Hardware/software: NA. Algorithms: sensors. Communication module: Wi-Fi module. Limitations: it monitors pre-heart-disease patients only.

23. Bhat (2020) [25]. Framework: automates real-time medical diagnosis of the patient. Hardware/software: heart disease dataset. Algorithms: convolutional neural network. Communication module: NA. Limitations: there is a time restriction on how long it can execute.

24. Pooja Arjun (2020) [26]. Framework: proposes an automated system to monitor heart disease patients. Hardware/software: NA. Algorithms: sensors. Communication module: Wi-Fi module. Limitations: it only monitors patients in the early stages of cardiac disease.

25. Raju et al. (2022) [27]. Framework: uses Edge-Fog-Cloud computing to present a novel smart healthcare model. Hardware/software: heart disease dataset. Algorithms: GSO-CCNN. Communication module: NA. Limitations: accuracy holds only at the first learning percentages, as with other algorithms, and greater learning percentages are required to raise it.
Ashir Javeed et al. (2019) proposed a random forest model and a random search
algorithm (RSA) to identify cardiovascular disease. This model is intended to be
used in conjunction with a grid search algorithm.
Srikanta Pattnaik et al. (2015) proposed a cost-effective model using the genetic
algorithm optimizer technique, in which the weights were refined and fed as input
into the specified network. Ninety per cent accuracy was achieved using a hybrid
technique combining GA and neural networks.
Mamatha Alex P et al. (2019) proposed a system that makes use of KNN, ANN,
SVM, and random forest techniques. All of these data mining methods are compared
to determine which yields the highest accuracy for heart disease diagnosis [10].
Boshra Bahrami et al. (2020) proposed various classification methodologies to
determine the cause of a cardiovascular problem. Classification algorithms such as
SVO, decision tree, and KNN are used to partition the datasets [11]. Following the
classification and performance evaluation, the decision tree is regarded as the best
option from the dataset for cardiovascular disease prediction.
Bhuvaneswari et al. (2018) proposed a system that investigates prior experience
and predicts the level of an object among all objects using naive Bayes classification.
The naive Bayesian and back-propagation neural network categorization algorithms
were used in the proposed work [22]. In a supervised learning environment, the naive
Bayesian classifier trains very effectively, and the prior backend is formed by
Bayesian rules based on the precise structure of the probability model.
Chinna Rao et al. (2019) proposed a Decision Support in Heart Disease Prediction
System (DSHDPS) using the naive Bayes modeling technique [38]. Heart disease
indicators such as sex, age, blood pressure, and chest discomfort can forecast a
victim's likelihood of having a heart issue. It functions as an online application with
a questionnaire. The UCI Cleveland database was used to gather data on heart
patients. The naive technique is preferable for identifying cardiac problems for the
following reasons: a better classification is obtained when the amount of data is
substantial and the dimensions are independent of one another, and the model
compares well with other models. Notwithstanding its simplicity, naive algorithms
frequently outperform more intricate ones [13].
Jabbar et al. (2019) developed a prediction system using associative rules and a
novel method that combines the concepts of sequence numbers and clustering for
heart attack prediction. The original datasheet of heart disease cases was converted
into binary format, and the proposed system was applied to the binary transactional
data. A dataset of heart disease cases with 14 essential dimensions was extracted
from the UCI repository's Cleveland database. The algorithm is named Cluster Based
Association Rule Mining Based on Sequence Number (CBARBSN). In rule mining,
support is a fundamental guideline: to be included in a frequently used item
collection, an item must meet the support criterion.
In this investigation, the valid data table is separated based on cluster fractions
(disjoint subsets of the factual valid table), and each item's Seq.No and Seq.ID are
predetermined. Frequent item sets are identified in the various clusters based on
Seq.ID, with the most common frequent item set designated as the general item set.
Maximum heart rate > 100, trial blood pressure > 120, old peak > 0, age > 45, and
Thal > 3 indicate a heart attack (the frequent item set found in both clusters in this
trial) [23]. Compared with the previously developed system, the proposed algorithm
has a lower execution time to mine rules (i.e., 0.879 ms when support is 3), and the
execution time changes dramatically as support increases.
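The support criterion at the heart of such rule mining can be illustrated with a short sketch; the binary items and the threshold below are made-up examples, not the CBARBSN implementation.

from itertools import combinations

# Each transaction lists the binary items true for one patient record;
# the item names here are illustrative.
transactions = [
    {"max_hr>100", "bp>120", "age>45"},
    {"max_hr>100", "bp>120", "oldpeak>0"},
    {"bp>120", "age>45"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the set."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Keep only item pairs that meet a minimum support threshold
min_support = 0.5
items = set().union(*transactions)
frequent_pairs = [set(p) for p in combinations(sorted(items), 2)
                  if support(set(p), transactions) >= min_support]
print(frequent_pairs)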
Zeinab Arabasadi et al. (2017) proposed a hybrid diagnosis network for coronary
artery disease based on an ANN machine learning algorithm and genetic algorithms.
The Z-Alizadeh Sani dataset was used in this study, which contains 303 case records
with 54 features (only 22 were included in the trial), including 216 cases with
coronary artery disease (CAD) [33]. The weights of the artificial neural network
were first initialized using a genetic technique, and then the ANN model was trained
on the training data. This sample ANN employs a feedforward architecture with one
input and one output layer, as well as one hidden layer with five neurons. The system
was evaluated using 10-fold cross-validation. The data show that the proposed model
outperformed a simple ANN model in terms of accuracy. The model was also tested
on four other well-known cardiac disease datasets, with mixed results. In comparison
to an ANN model, the proposed model achieves a high level of accuracy.
Sayali Ambekar et al. (2018) created a neural network technique that predicts a
patient's disease using the CNN-UDRP algorithm on structured data. The study
employs a real-time dataset of cardiac illness. They compared the results of the KNN
and naive Bayes algorithms and discovered that the accuracy of the NB algorithm is
82%, higher than that of the KNN algorithm. They were able to forecast disease risk
from structured data with 65% accuracy [6]. By supplying accurate sickness-risk
prediction as input, they obtained accurate disease-risk prediction as output,
providing a greater understanding of the level of disease risk. The risk of heart
disease is classified as low, medium, or high [12]. Thanks to this approach, disease-
risk prediction can now be done in a short time and at low cost [6].
By adjusting the risk factors associated with the disease, Ana et al. (2017) suggested
an ensemble classifier that may be utilized as a general prediction model for a range
of diseases. The system is made up of a microcontroller that is linked to a number of
wearable sensors and the cloud [8]. The sensors record input values, which are
subsequently saved in the cloud and used to generate an alarm message [28]. When
a critical level is reached, the system predicts the onset of sickness, allowing the
clinician to take appropriate measures. The accuracy of various classification
methods as well as ensemble classifiers was investigated in this study [8]. The
findings imply that ensemble classifiers are superior to other methods for making
predictions.
Anashi Gupta et al. (2019) conducted a result analysis and determined that their
model uses KNN as the training method for classifying individuals with "a chance
of heart disease" and "no chance of heart disease." In this case, KNN performs
well in terms of accuracy, sensitivity, and miss rate [50]. The disadvantage of this
system is that it does not provide a real-time predictive system that uses sensor data
for proactive health monitoring [7].
He et al. (2020) proposed a framework that transfers client data to client computers
and smartphones using Bluetooth, Wi-Fi, and other LAN technologies once the
wearable IoT device has collected client data for 24 hours. The client then uploads
the data to a remote cloud server, where a pre-trained machine learning algorithm
diagnoses the submitted data. They can attain up to 90% accuracy. However, as the
Internet of Things integration was not tested in this work, speed and power
consumption are still unknown.
Shaikh et al. (2015) used decision support in a Java application using data mining
techniques. The system tries to extract hidden knowledge from the database. After
determining the probability, and in accordance with it, pattern-matching algorithms
are used to generate the appropriate treatments.
Ganesan et al. (2019) developed an effective cloud and IoT-based disease diagnosis
model for monitoring, forecasting, and diagnosing heart disease [24]. The UCI
repository dataset and medical sensors were utilized to create an effective framework
for predicting cardiac disease [17]. Furthermore, classification algorithms are utilized
to categorize patient data in order to diagnose cardiovascular disease. The heart
disease dataset is used to train the classifier, which is subsequently employed by the
classification method to detect the presence or absence of heart disease [15]. The
trained classifier evaluates incoming patient data to determine whether or not the
patient has heart disease [27].
Khalid Binsalman et al. (2018) noted that sensor technologies can be beneficial
in healthcare systems, especially when used in patient-monitoring systems over the
Internet. They are particularly beneficial to patients with chronic cardiac disease,
since they allow early intervention, which helps to save lives and cut the risk of death
in half [20]. Accordingly, using sensor technologies, this research presented a useful
remote monitoring system for heart disease patients [20].
With rising research in the fields of Internet of Things and machine learning, as
well as the growing desire for intelligent and data-driven healthcare ecosystems,
Bhat et al. (2020) presented an automatic, real-time medical diagnosis and prediction
system. The suggested strategy seeks to allay these concerns. The interactive, user-
friendly environment that combines real-time ECG detection and diagnosis with a
simple mobile application benefits patients undergoing diagnosis. However, accuracy
is the key requirement of a medical application. Deep learning models that can
correctly and automatically learn the required characteristics produce a system that
is largely impervious to error [5].
V. Tamilselvi et al. (2020) proposed a system that may be utilized to constantly,
efficiently, and remotely monitor a patient's temperature and heart rate. The doctor
may remotely monitor the patient's health from anywhere in the world and provide
consultation based on the results, eliminating the need for the patient to visit the
hospital's O.P.D. (Outpatient Department). Temperature and heart rate thresholds
have been specified; the BLYNK app sends a notification to the phone whenever one
or both of the readings exceed the threshold. In addition, the
suggested system is cost-efficient due to the low cost of the components employed.
It is small in size, light in weight, and convenient for the patient to carry wherever
they go. The suggested approach is a logical and acceptable method of providing
appropriate help to cardiac patients.
The authors of [29] developed a system that employs Edge-Fog-Cloud computing
to provide a novel smart healthcare paradigm. Data for this proposed model were
collected from a number of hardware sensors. To extract cardiac properties from the
signals, the peak amplitude, total harmonic distortion, heart rate, zero-crossing rate,
entropy, standard deviation, and energy were used. The characteristics of the other
attributes were obtained in the same manner, by finding their minimum and maximum
mean, standard deviation, kurtosis, and skewness. All of these features were fed to
the diagnostic system after the essential CNN parameters were optimized with the
CCNN with GSO algorithm [2]. The GSO-CCNN was 3.7, 3.7, 3.6, 7.6, 67.9, 48.4,
33, 10.9, and 7.6 per cent more precise than PSO-CCNN, GWO-CCNN, WOA-CCNN,
DHOA-CCNN, and the other compared techniques, respectively. As a result, the
smart healthcare paradigm with IoT-assisted fog computing proved effective. To
improve the prediction system's accuracy in detecting cardiac illness, the current
model could be enhanced in the future by incorporating more advanced feature
selection methods, optimization approaches, and classification algorithms [2].
15.3 Proposed system

15.3.1 Arduino UNO
The proposed system is built around an Arduino UNO board, whose main components include:
• USB connector
• power port
• microcontroller
• analog input pins
• digital pins
• reset switch
• crystal oscillator
• USB interface chip
• TX RX LEDs
15.3.2 Heartbeat sensor
The sound of a person's heartbeat is produced by the contraction and expansion of
the valves in his or her heart as they force blood from one area to another. The
heartbeat rate is measured in beats per minute (BPM), and the pulse is the heartbeat
sensed in any artery adjacent to the skin [1].
The heartbeat sensor shown in Figure 15.5 works on the photoplethysmography
principle. It detects changes in the volume of blood traveling through any organ of
the body, which cause a change in the intensity of light passing through that organ
(the vascular region) [1]. In applications that measure heart pulse rate, pulse timing
is very crucial. The blood flow rate is determined by the rate of heartbeats, and the
light absorption depends on the blood volume.
A basic heartbeat sensor consists of a light-emitting diode and a detector, such as a
light-dependent resistor or a photodiode. The heartbeat pulses cause the flow of blood
to various parts of the body to vary [10]. When a light source such as an LED
illuminates the tissue, the light is either reflected (as in finger tissue) or transmitted
(as in earlobe tissue). Some of the light is absorbed by the blood, and the light
detector detects the transmitted or reflected light [16]. The blood volume of the tissue
determines the amount of light absorbed [16]. The detector produces an electrical
signal proportional to the heartbeat rate [34].
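A minimal sketch of how such a detector signal can be turned into a BPM value on the host side is shown below; the sampling rate and threshold are illustrative assumptions, not values from this chapter.

import math

def beats_per_minute(samples, sample_rate_hz, threshold):
    """Count rising edges through the threshold and scale to BPM."""
    beats = 0
    for prev, cur in zip(samples, samples[1:]):
        if prev < threshold <= cur:  # rising edge = one pulse
            beats += 1
    duration_min = len(samples) / sample_rate_hz / 60.0
    return beats / duration_min

# Example: 10 s of synthetic data at 50 Hz with ~1.2 beats per second
samples = [512 + 100 * math.sin(2 * math.pi * 1.2 * t / 50) for t in range(500)]
print(round(beats_per_minute(samples, 50, 560)))  # ~72 BPM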
15.3.3 Temperature sensor
The LM35 series, illustrated in Figure 15.6, are precision integrated-circuit
temperature devices whose output voltage is linearly proportional to the temperature
in degrees Celsius [10]. Unlike linear temperature sensors calibrated in kelvin, the
LM35 temperature sensor has the advantage of not requiring a sizeable constant
voltage to be subtracted from the output in order to provide convenient Centigrade
scaling [46].
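Since the LM35 output rises by 10 mV per degree Celsius, an ADC reading converts to temperature in one linear step; the sketch below assumes a typical 10-bit ADC with a 5 V reference, as on an Arduino UNO, which is an assumption rather than a detail from this chapter.

def lm35_celsius(adc_count, vref=5.0, adc_max=1023):
    """Convert a raw ADC count from the LM35 to degrees Celsius."""
    volts = adc_count * vref / adc_max
    return volts * 100.0  # 10 mV/degree C means 100 degrees C per volt

print(lm35_celsius(75))  # ~36.7 degrees C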
15.3.4 Pressure sensor
Figure 15.7 shows how a constant-area sensing element in a pressure transducer
responds to the force generated by the fluid pressure. The applied force deflects the
diaphragm of the pressure transducer. The internal diaphragm deflection is detected
and transformed into an electrical output [18]. Microprocessors, programmable
controllers, computers, and other electronic equipment can therefore monitor the
pressure [14].
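As an illustration, the MPX4115 sensor used later in this chapter has a published transfer function of approximately Vout = Vs(0.009 P - 0.095), with P in kPa; the sketch below inverts it to recover pressure from the measured voltage, and the constants should be verified against the datasheet for the exact part in use.

def mpx4115_kpa(v_out, v_supply=5.0):
    """Invert the MPX4115 transfer function Vout = Vs*(0.009*P - 0.095)."""
    return (v_out / v_supply + 0.095) / 0.009

print(round(mpx4115_kpa(4.0), 1))  # ~99.4 kPa, near sea-level pressure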
communicate data over the Internet [31]. The device can help assess a person's health
by measuring their heart rate and comparing it to a set point [32]. After these
constraints are set, the system begins monitoring the patient's heart rate, and if the
heart rate goes above or below the stated limit, the system alerts the user. For this
work, we use an Android app model that measures a patient's heart rate, monitors it,
and delivers an urgent message about the risk of a heart attack [33]. We utilize a deep
learning architecture, since an artificial neural network (ANN) allows us to manage
the data stream by combining inputs according to trained weights, allowing more
adaptability in controlling the outputs [34]. Using historical data from the heart
disease prediction database and a simple ANN, we create a deep learning model.
When new data are added to the database, the model should predict cardiac disease,
update the database, and notify the physician [35].
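A minimal sketch of such an ANN is given below; the layer sizes, feature ordering and synthetic stand-in training data are assumptions for illustration, not the chapter's exact model.

import numpy as np
from tensorflow import keras

# Inputs: heart rate, temperature, pressure (the ordering is an assumption)
model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # risk probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Train on historical records (synthetic stand-in data here)
X = np.random.rand(200, 3)
y = (X[:, 0] > 0.5).astype(int)
model.fit(X, y, epochs=5, verbose=0)

# New sensor sample -> alert decision
sample = np.array([[0.8, 0.4, 0.6]])
prob = float(model.predict(sample, verbose=0)[0, 0])
print("chance of heart disease" if prob > 0.5 else "normally functioning")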
Figure 15.9 presents the circuit diagram of the proposed system [36].
As shown in Figure 15.10, the proposed network uses an Arduino, and the sensors
we use are temperature (LM35), heartbeat, and pressure (MPX4115) sensors [37]. It
also has a COM port and a terminal connected, which help us capture and use the
output from the sensors. The deep learning model is trained with an accuracy of
86.18%; the data acquired from the sensors are fed into the deep learning model to
obtain the prediction. We also tried training with some machine learning algorithms,
observing an accuracy of 74.3% for both naïve Bayes and logistic regression [38].
Our system works in real time, and the number of sensors deployed is comparatively
small, so the accuracy can be increased by adding more sensors and increasing the
volume of data used to train our model [39].
The final output of the model is displayed under two separate sections: the sensing
section and the predicting section. The heart rate and temperature results are
displayed on a 16×2 LCD, as shown in Figure 15.11.
The results of the predicting section are shown in Figure 15.12. If the model
detects a chance of heart disease based on the sensing results, it prints "chance of
Heart Disease"; otherwise, it prints "Normally Functioning Chance of Heart Disease
if Less."
15.7 Conclusion
This paper provides a comprehensive study on heart disease diagnosis using a
person's heartbeat. In the proposed system, the sensor [40] detects heartbeat signals
and sends them to the Arduino UNO. The Arduino [41], in conjunction with the
Wi-Fi module, sends those signals to the cloud, where they are analyzed and the
prediction is made using a deep learning model. The Android application [42]
receives an appropriate notification. Because artificial neural networks are used to
predict the results, the accuracy is higher. Even though the accuracy of the predicted
results has improved, the network has some flaws. The system functions by sending
a heartbeat to a Wi-Fi module linked to a microcontroller [43], and its measurement
capability is limited to heartbeats. In the future, all the data could be stored on edge
devices for further analytics, and SMS and e-mail modules could be added to notify
the patients and doctors through our system [44].
References
[1] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘Resource efficient clus-
tering and next hop knowledge based routing in multiple heterogeneous wire-
less sensor networks’. International Journal of Grid and High Performance
Computing. 2017, vol. 9(2), pp. 1–20.
[2] Sudhakaran S., Kumar S., Ranjan P., et al. ‘Blockchain-based transparent
and secure decentralized algorithm’. International Conference on Intelligent
Computing and Smart Communication 2019. Algorithms for Intelligent
Systems. Springer; 2020.
[3] Sarangi L., Mohanty M.N., Pattnaik S., Shubham M. ‘An intelligent deci-
sion support system for cardiac disease detection’. International Journal Of
Control Theory And Applications, International Science Press. 2015, vol.
8(5), pp. 2137–43.
[4] Gowrishankar S., Prachita M.Y., Prakash A. 'IoT based heart attack detection,
heart rate and temperature monitor'. International Journal of Computer
Applications. 2017, vol. 170(5).
[5] Aboobacker A., Balamurugan D.S. 'Heartbeat sensing and heart attack de-
tection using Internet of things: IoT'. International Journal of Engineering
Science and Computing. 2017, vol. 7(4), pp. 6662–66.
[6] Manisha M., Sindhura V., Ramaya P., Neeraja K. ‘IoT on heart attack de-
tection and heart rate monitoring’. International Journal of Innovation in
Engineering and Technology. 2016, vol. 196.
[7] Ramesh K., Rao M.C., Subbalakshmi G. 'Decision support in heart disease
prediction system using naive Bayes'. Indian Journal of Computer Science
and Engineering. 2020, vol. 2(2), pp. 170–76.
[8] Patel S., Shivam, Chauhan Y. 'Heart attack detection and medical attention
using motion sensing device - Kinect'. International Journal of Scientific and
Research Publications. 2019, vol. 4.
[9] Ashrafuzzaman M. 'Heart attack detection using smart phone'. International
Journal of Technology Enhancements and Emerging Engineering Research.
2020, vol. 1, pp. 23–27.
[10] Jin B., Che C., Liu Z., Zhang S., Yin X., Wei A.X. 'Predicting the risk of
heart failure with EHR sequential data modeling'. IEEE Access. 2018.
doi: 10.1109/ACCESS.2017.2789324.
[11] Javeed A., Zhou S., Yongjian L., et al. 'An intelligent learning system based
on random search algorithm and optimized random forest model for im-
proved heart disease detection'. IEEE Access. 2019, vol. 7, pp. 180235–43.
doi: 10.1109/ACCESS.2019.2952107.
[12] Sarangi L., Mohanty M.N., Pattnaik S.: ‘An intelligent decision support sys-
tem for cardiac disease detection’. International Journal of Control Theory
and Applications. 2019, vol. 8(5), pp. 2137–43.
[13] Mamatha Alex P., Shaji S.P. 'Prediction and diagnosis of heart disease pa-
tients using data mining technique'. 2019 International Conference on
Communication and Signal Processing (ICCSP); IEEE; 2019. pp. 0848–52.
doi: 10.1109/ICCSP.2019.8697977.
[14] Bahrami B., Shirvani M.H. 'Prediction and diagnosis of heart disease by data
mining techniques'. Journal of Multidisciplinary Engineering Science and
Technology (JMEST). 2020, vol. 2(2).
[15] Bhuvaneswari R., Kalaiselvi K. ‘Naïve Bayesian classification approach in
healthcare applications’. International Journal of Computer Science and
Telecommunication. 2018, vol. 3(1), pp. 106–12.
[16] Chandra D.P., Deekshatulu B.L., Jabbar M.A. 'Cluster based association
rule mining for heart attack prediction'. Journal of Theoretical and Applied
Information Technology. 2019, vol. 32(2), pp. 196–201.
[17] Arabasadi Z., Alizadehsani R., Roshanzamir M. ‘Computer aided decision
making for heart disease detection using hybrid neural network-genetic algo-
rithm’. Computer methods and programs in biomedicine. 2017, vol. 141, pp.
19–26.
[18] Ambekar S., Phalnikar R. 'Disease risk prediction by using convolution neu-
ral networks'. 2018 Fourth International Conference on Computing
Communication Control and Automation (ICCUBEA); IEEE; 2018. pp. 1–5.
doi: 10.1109/ICCUBEA.2018.8697423.
[19] Ana R., Krishna S. 'IoT based patient monitoring and diagnostic prediction
tool using ensemble classifier'. 2017 International Conference on Advances
in Computing, Communications and Informatics (ICACCI); IEEE; 2017.
pp. 1588–93. doi: 10.1109/ICACCI.2017.8126068.
[20] Gupta A., Yadav S., Shahid S. 'HeartCare: IoT based heart disease predic-
tion system'. 2019 International Conference on Information Technology
(ICIT); 2019. pp. 88–93.
[21] He Q., Maag A. 'Heart disease monitoring and predicting by using machine
learning based on IoT technology'. Retrieved from IEEE Xplore, May 28, 2021.
[22] Shaikh S., Sawant A., Paradkar S., Patil K. 'Electronic recording system:
heart disease prediction system'. 2015 International Conference on
Technologies for Sustainable Development (ICTSD); Mumbai, India; 2015.
[23] Ganesan M. 'IoT based heart disease prediction and diagnosis model for
healthcare using machine learning models'. 2019 IEEE International
Conference on System, Computation, Automation and Networking (ICSCAN);
IEEE; 2019. pp. 1–5. doi: 10.1109/ICSCAN.2019.8878850.
[24] BinSalman K., Fayoumi A. ‘Effective remote monitoring system for heart dis-
ease patients’. 2018 IEEE 20th Conference on Business Informatics; Vienna,
Austria; 2018.
[25] Bhat T., Akanksha, Shrikara, Bhat S., T M. 'A real-time IoT based arrhyth-
mia classifier using convolutional neural networks'. 2020 IEEE International
Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics
(DISCOVER); Udupi, India; 2020. https://ieeexplore.ieee.org/xpl/mostRecen-
tIssue.jsp?punumber=9278009
[26] Baad P.A. ‘IOT based health monitoring system using Arduino’. Journal of
Engineering, Computing and Architecture. 2020, vol. 10(5).
[27] Raju K.B., Dara S., Vidyarthi A., Gupta V.M., Khan B. 'Smart heart disease
prediction system with IoT and fog computing sectors enabled by cascaded
deep learning model'. Computational Intelligence and Neuroscience. 2022,
vol. 2022(5), 1070697.
[28] Kumar S. Evolution of Software-Defined Networking Foundations for IoT
and 5G Mobile Networks. IGI Publisher; 2020. p. 350.
[29] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy efficient mul-
tichannel MAC protocol for high traffic applications in heterogeneous wireless
sensor networks’. Recent Advances in Electrical & Electronic Engineering.
2017, vol. 10(3), pp. 223–32.
[30] Reghu S., Kumar S. ‘Development of robust infrastructure in networking
to survive a disaster’. 2019 4th International Conference on Information
Systems and Computer Networks, ISCON 2019; Mathura, India; 2019. pp.
250–5.
[31] Kumar S., Cengiz K., Vimal S., Suresh A. ‘Energy efficient resource migra-
tion based load balance mechanism for high traffic applications iot’. Wireless
Personal Communications. 2021, vol. 10(3), pp. 1–19.
[32] Kumar S., Cengiz K., Trivedi C.M., et al. ‘DEMO enterprise ontology with
a stochastic approach based on partially observable Markov model for data
aggregation and communication in intelligent sensor networks’. Wireless
Personal Communication. 2022.
[33] Punhani A., Faujdar N., Kumar S. ‘Design and evaluation of cubic Torus
Network-on-Chip architecture’. International Journal of Innovative
Technology and Exploring Engineering (IJITEE). 2019, vol. 8(6), pp.
2278–3075.
[34] Dubey G., Kumar S., Kumar S., Navaney P. ‘Extended opinion lexicon and
ML-based sentiment analysis of tweets: a novel approach towards accurate
classifier’. International Journal of Computational Vision and Robotics.
2020, vol. 10(6), pp. 505–21.
[35] Singh P., Bansal A., Kamal A.E., Kumar S. ‘Road surface quality monitoring
using machine learning algorithm’ in Reddy A.N.R., Marla D., Favorskaya
M.N., Satapathy S.C. (eds.). Intelligent Manufacturing and Energy
Sustainability. Smart Innovation, Systems and Technologies. 265. Singapore:
Springer; 2022.
[36] Kumar S., Ranjan P., Radhakrishnan R., Tripathy M.R. ‘Energy aware dis-
tributed protocol for heterogeneous wireless sensor network’. International
Journal of Control and Automation. 2015, vol. 8(10), pp. 421–30.
[37] Kumar S., Ranjan P., Ramaswami R., Tripathy M.R. ‘EMEEDP: enhanced
multi-hop energy efficient distributed protocol for heterogeneous wire-
less sensor network’. Proceedings - 2015 5th International Conference on
Communication Systems and Network Technologies CSNT; Gwalior, India;
2015. pp. 194–200.
16.1 Introduction
The distributed denial-of-service (DDoS) attack is a dangerous attack that can
degrade network performance by up to 99 per cent. One of the first such attacks,
launched on July 22, 1999, infected the network with a script named Trin00, which
caused the machines to crash severely [3]. Despite the fact that two decades have
passed, Internet security specialists continue to face significant challenges. According
to a Kaspersky survey,
DDoS attacks have climbed by up to 18 per cent from 2021 to today [4]. Similarly,
in 2016, Mirai, a botnet that targets 200–300 Internet of Things (IoT) devices and
also diverts the Domain Name System (DNS), creating massive disruption in content
distribution, was discovered. GitHub was hit by one of the largest attacks ever, with
1.3 terabits per second of bandwidth flooding in and bringing the network to its knees.
This is due to the transmission control protocol (TCP)/Internet protocol's (IP's) lack
of built-in security, which causes greater collateral harm to the Internet architecture.
Many clean-slate architectures have been proposed to defend the network against
attackers and against the inefficiency of today's TCP/IP Internet architecture in
providing security, with the information-centric network (ICN) being the most
nominated by the Internet security community in recent years. ICN has built-in
features that support flawless communication of multimedia content. The content
store (CS), pending interest table (PIT), and forwarding information base (FIB) are
some of the functionalities available. ICN concentrates on content security rather
than end-host security. This built-in security enhancement reduces the danger of the
security flaws that come with the traditional TCP/IP approach. The built-in
components of ICN that ease the content distribution process are shown in Figure
16.1. Several security solutions for IFA have been identified in the literature [5, 6].
Named data networking (NDN) simulations were used to implement the majority of
the security identification and solutions. When attacks unfold in real-time scenarios,
however, the simulation results do not adapt. As a result, we propose an ICN testbed
in conjunction with NDN functions in order to detect real-time threats using artificial
intelligence techniques.
16.2 Background
In this section, the ICN communication model and its components are described in
Section 16.2.1, and the attack scenario is described in Section 16.2.2.
where $n(I_k)$ is defined as the sum of the number of particular requests for $I$, and $n(I_r)$ is defined as the count of requests received by the router.

Variance of packet count: this measures how the relative packet count influences the content distribution of ICN.

$$S^2(I) = \frac{\sum_{k=1}^{N}\left(I_k - \bar{I}\right)^2}{N-1} \tag{16.2}$$

where $\bar{I}$ denotes the mean of the same interest packets, $\bar{I} = \frac{1}{N}\sum_{k=1}^{N} I_k$.
where $ND_k$ and $NI_k$ are the numbers of data and interest packets belonging to the $j$th interface of router $r_l$ with respect to time $t$.

PIT utilization: the PIT entries are counted at each time cycle $t$:

$$U(r_l, t_i) = \frac{\sum_{k=0}^{n} N_{\mathrm{entries}_k}(t)}{N_{\mathrm{size}}} \tag{16.6}$$
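The sketch below evaluates equations (16.2) and (16.6) numerically on made-up counts, to show how flooding inflates both the variance of the interest counts and the PIT utilization; all values are illustrative.

def packet_count_variance(counts):
    """Sample variance of per-name interest-packet counts, eq. (16.2)."""
    n = len(counts)
    mean = sum(counts) / n
    return sum((c - mean) ** 2 for c in counts) / (n - 1)

def pit_utilization(entries_per_face, pit_size):
    """Fraction of the PIT occupied in the current time cycle, eq. (16.6)."""
    return sum(entries_per_face) / pit_size

interest_counts = [4, 9, 6, 30, 5]             # per-name interest counts I_k
print(packet_count_variance(interest_counts))  # large under flooding
print(pit_utilization([120, 80, 40], 1024))    # fraction of PIT in use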
16.3.1 Attack detection
The multi-feature (MF)-based adaptive neuro-fuzzy inference system (ANFIS) is an
attack detection technique for NDN Android. The Takagi–Sugeno system is used to
model the proposed multi-feature adaptive neuro-fuzzy inference system (MF-
ANFIS), where the input travels through multiple layers and the output is a weighted
average of linear combinations. Figure 16.4 shows the proposed MF-ANFIS model.
A sample Takagi–Sugeno system with a two-rule set is as follows:
$$\text{If } x \text{ is } A_1 \text{ and } y \text{ is } B_1 \text{ then } f_1 = p_1 x + q_1 y + r_1 \tag{16.7}$$

$$\text{If } x \text{ is } A_2 \text{ and } y \text{ is } B_2 \text{ then } f_2 = p_2 x + q_2 y + r_2 \tag{16.8}$$

where $A_i$, $B_i$ are the fuzzy sets and $p_i$, $q_i$, $r_i$ are the linearly independent output variables. ANFIS contains five layers to process the input to the output, with nine if-then rules, as follows:

Layer 1: All the square input nodes are obtained by the parameters from the whole data stream. The membership function of the input node is given in the following equation:

$$O_{1,i} = \mu_{A_i}(x),\ i \in \{1, 2, 3\};\qquad O_{1,i} = \mu_{B_{i-3}}(y),\ i \in \{4, 5, 6\} \tag{16.9}$$
Layer 2: In this layer, every node is fixed, and it multiplies its inputs to obtain the output values (the firing strengths).

$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_{i-3}}(y),\quad i \in \{1, 2, 3, \ldots, 9\} \tag{16.10}$$

Layer 3: Every node in this layer is considered to be fixed, and it performs the normalization function.

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2 + w_3 + \cdots + w_9},\quad i \in \{1, 2, 3, \ldots, 9\} \tag{16.11}$$

Layer 4: All square nodes are adaptive. The output of each node is the normalized firing strength weighted by the rule output.

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i),\quad i \in \{1, 2, 3, \ldots, 9\} \tag{16.12}$$

Layer 5: It sums all incoming signals and produces the overall output.

$$O_{5,i} = f = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i},\quad i \in \{1, 2, 3, \ldots, 9\} \tag{16.13}$$
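A worked two-rule Takagi–Sugeno evaluation, following the layer computations above, is sketched below; the Gaussian membership centres, widths and rule coefficients are illustrative choices, not parameters from MF-ANFIS.

import math

def gauss(x, c, sigma):
    """Gaussian membership function (gaussmf)."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def tsk_two_rules(x, y):
    # Layer 1: membership degrees
    a1, a2 = gauss(x, 0.2, 0.3), gauss(x, 0.8, 0.3)
    b1, b2 = gauss(y, 0.2, 0.3), gauss(y, 0.8, 0.3)
    # Layer 2: firing strengths w_i
    w1, w2 = a1 * b1, a2 * b2
    # Layer 3: normalization
    nw1, nw2 = w1 / (w1 + w2), w2 / (w1 + w2)
    # Layer 4: rule outputs f_i = p_i*x + q_i*y + r_i
    f1 = 1.0 * x + 0.5 * y + 0.1
    f2 = 0.2 * x + 1.5 * y + 0.3
    # Layer 5: weighted-average output
    return nw1 * f1 + nw2 * f2

print(tsk_two_rules(0.25, 0.9))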
The NDN Forwarding Daemon (NFD) is a key component of ICN's content
distribution and update provisioning. NFD supports a variety of platforms, including
Linux, FreeBSD, Mac OS X, Android, and Raspberry Pi, and allows named data
forwarding, while data discovery tactics are all done manually. The NFD approach with cache
for the NDN application is shown in Figure 16.6. The router caches the content
when two users request the same prefix name, as shown in step 9 of Figure
16.6. The user of the NDN app can request any type of data to be forwarded to
the nearest network. If the content is not available on the local router, the request is
automatically forwarded to the producer via the gateway router. The FIB is updated
for each and every transaction. The local face and name prefix of user app1 are
first registered with the local NFD router. Users' names and interfaces are then
saved in the local NFD. As a result, the content itself, rather than the location
information, is routed to the gateway router. In this manner, NDN achieves
name-based routing rather than location-based routing.
• nfdc face create [remote] <FACEURI> [[persistency] <PERSISTENCY>] [local <FACEURI>]
• nfdc face create remote udp://education.example.com
• nfdc face create remote ether://[07:00:13:02:02:02] local dev://eth4 persistency permanent
• nfdc route add [prefix] <PREFIX> [nexthop] <FACEID|FACEURI> [origin <ORIGIN>]
• nfdc route show prefix /localhost/nfd
• nfdc route add prefix / nexthop udp://router.example.net
• nfdc route remove [prefix] <PREFIX> [nexthop] <FACEID|FACEURI> [origin <ORIGIN>]
• nfdc route remove prefix /ndn nexthop 421 origin static
An NFD face is created with the nexthop and remote URI for the request and response
of packets. Figure 16.7 shows the face ID of the NDN remote URI. Figure 16.8 shows
the updated NDN components after creating the face ID and route list.
We put NFD to the test on two Android devices with different compatible
models and configurations. NFD tools are also used to configure the producer device.
When the gateway router does not have the packet in its cache for a test, it sends the
request to the producer. When name-based routing is used in the network, the CPU
usage is lowered. Each customer request originates from a 100 MB Android handset.
The test is run on two Android phones: a Samsung phone with 4.0 GB RAM and a
HiSilicon Kirin 710 CPU, and a Motorola phone with a 2.1 GHz quad-core CPU.
The six parameters are given as input to the MF-ANFIS model, and it produces
one output. Before being fed into the AI model, the inputs must be normalized to
[0, 1]. In MF-ANFIS, Gaussian (gaussmf) membership functions are used to obtain
high-accuracy results, and the centre-of-gravity function is used as the defuzzification
model. MF-ANFIS uses if-then rules for decision making on Android NFD. Figure
16.11 shows the proposed detection model with multiple features. Figure 16.12
shows the MF-ANFIS decision making on NDN Android.
During this heavy flooding period, the CPU consumption is considerably high.
When a user wants to request or obtain data from the NDN Android device, packets
are lost due to the heavy load on the CPU. With the help of MF-FCM, the CPU load
is captured. Figure 16.13 shows the CPU load during the IFA. The "green" line
indicates the traffic that has arrived at the network. The "blue" line represents the
outbound traffic, that is, the router servicing the packets to the destination.
16.5 Conclusion
In the event of an emergency, NDN on Android provides a far faster response time.
The NDN application is vulnerable to a severe attack due to transparent forwarding
table updates and name-based routing. During a major flood, the NDN app fails to
consider the incoming packets, resulting in significant packet loss in an emergency
circumstance. The artificial intelligence-based MF-ANFIS is presented to detect the
attack pattern and the frequency of the request pattern. MF-ANFIS detects attacks
and assists in the discovery of malicious content patterns. As a result, IFA detection
is required for real-time applications, and additional research into NDN security
challenges paves the path to a healthy and intelligent future.
References
[1] Antonakakis M., April T., Bailey M., et al. 'Understanding the Mirai botnet'.
26th USENIX Security Symposium (USENIX Security 17); Vancouver, BC;
2017. pp. 1093–110.
[2] Jose S. ‘Global mobile data traffic forecast update, 2016–2021 white paper,
index’. Cisco Visual Networking. 2017, p. 180.
findings demonstrate that models trained using a dual data-gathering strategy
produce more accurate outcomes. Data classification is substantially more accurate
when neural networks are used. The methods described here can be applied on a
broader scale to monitor roads for problems that pose a safety concern to commuters
and to give maintenance data to the appropriate authorities.
17.1 Introduction
Nowadays, roads are the primary infrastructure resource supporting the movement
of people, goods and logistics, which in turn builds a strong foundation for the social
and economic development of a country and its interconnected cities. At the same
time, a sudden increase in the volume of vehicles on roads has caused problems such
as traffic accidents, traffic jams and traffic congestion. In the last few years, there
has been a sharp rise in urbanization. Due to the increased rates of urbanization,
more and more vehicles are being purchased and put on the road to support the
movement of people, goods and logistics. As the number of vehicles on the road
increases, traffic congestion, accidents and pollution increase in turn.
The Intelligent Traffic and Transportation Management System (TMS) is a better
road and traffic management system that integrates data, provides communication
and access technologies, and properly connects and informs drivers, vehicles and
roads in a way that assists the people driving the vehicles. TMS's main goal is to use
information and communication technology to tackle not only car/bus traffic
problems but also economic problems such as ageing, tourism revitalization and the
long-term economic development of a country.
17.1.1 Definition of TMS
TMS is a new transportation system that uses various new technologies to connect
people, roads and vehicles in an information and communication network in order
to solve various traffic and transportation issues such as traffic accidents and
congestion. In terms of technology, this is a road or traffic information management
system in which road traffic data are gathered and sent to drivers via sensors placed
along the roadside. The TMS technique provides users with a variety of road traffic
applications. The components of TMS are represented in Figure 17.1.
1. Rapid increase in the urban population:
In the 21st century, the population of urban cities has increased at a very fast
rate compared with rural areas. In 2019, 56% of the total world population
was living in the urban sector. By 2045, the UN expects that 76% of the
world's entire population will live in cities or the urban sector. Assuming
these hypotheses to be correct and the rapid increase in urbanization to
continue, it is also reasonable to expect that, as the count of vehicles
increases, other road traffic difficulties will occur, such as traffic congestion,
traffic accidents and environmental effects. Pictures of traffic congestion in
Asian cities are given in Figure 17.2.
2. Increased population in urban areas needs transportation services:
Traffic safety is one of the major issues nowadays, as the requirement for
and availability of transport are increasing at a very fast rate. Globally,
1 million people have lost their lives and about 50 million have been injured
in traffic accidents. Traffic accidents are one of the primary causes of death
among the younger generation. In detail, the victims of road accidents are
mostly pedestrians, bicycle riders and rickshaw drivers.
There is an increasing number of cars in urban cities, and almost every
individual has a car, bike or other means of transport. This rapid increase in
turn gives rise to traffic congestion in mega-urban cities.
Due to the growth of the population as well as city expansion, there is a rise in traffic
volume, due to which the different transportation infrastructures deteriorate. The
causes of such destruction include the use of bad construction material, deficiencies
in road design and climate change. In combination, all these issues result in the
appearance of different anomalies in the roads, such as bumps, potholes and cracks,
which can be found in expressways, highways and streets worldwide.
Not only these anomalies but also the illegal construction of speed breakers is
becoming a threat leading to the deterioration of road safety. Speed breakers are
generally constructed to provide safety for pedestrians in different zones by limiting
vehicle speed, which helps avoid accidents.
Unnecessarily, a large number of unapproved speed breakers have been installed,
many of which are known not to adhere to the actual size set by the NHA. Speed
bumps are very common in countries like India because signboards such as speed
limit, stop and yield are ineffective owing to a lack of traffic enforcement resources.
Accidents generally happen in the country for various reasons, such as higher
vehicle speed, driver negligence and lower visibility at night. Different incidents
have been reported in which vehicles such as cars, scooters and motorcycles proved
much more vulnerable, because unnoticed speed breakers may cause them to lose
balance, leading to severe accidents as well as damage.
In 2016, India had 300,000 road deaths, which is actually double the government's
estimate of only 151,000, according to the 2018 global road safety report by the
WHO, which highlighted the lack of data on fatal road accidents.
According to government data (Figure 17.4), about 150,000 people lost their lives
in 2017 due to road accidents, which means that 17 people die in road accidents
every hour.
The number of deaths accounts for one-third of the overall accidents, which
average roughly 53 every hour.
UP has been hit hardest, followed by Tamil Nadu. Delhi is considered to be one
of the safest places for driving considering the number of fatalities that take place.
Different solutions have been considered in order to automatically detect and report
any kind of road anomaly to the government agencies so that maintenance tasks can
be accelerated.
For example, various computer vision methods based on texture differences and
shape segmentation have been used to identify potholes. Similar approaches based
on contour information, edges, shape adaptation and so on have also been considered.
In today's world, the majority of people possess Android mobile phones or smart-
phones that are well equipped with various inbuilt apps, such as navigation and Google
Maps, and sensing systems, such as the gyroscope, accelerometer and magnetometer, and so
on, are enabled with GPS, and are always connected to the Internet. Smartphone
technology has been adopted to tackle this problem, given its geo-referencing and
sensing capabilities.
The accelerometers in smartphones can detect the movement of the device in
such a way that, if the vehicle comes across any irregular road surface, such as a
bump or a pothole, the accelerometer will record the occurrence of this event. The
major problem then is the identification of the series of continuous readings of the
accelerometer in which an anomaly occurs. Even though the task of detecting road
imperfections with a smartphone is well defined, the community has been unable to
gather a comprehensive perspective due to a number of issues. This setting also
allows and encourages the development of an Android app that uses the data collected
by these sensors in the user's smartphone to warn the user about bumps and poor
road conditions.
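A minimal sketch of this detection idea, flagging samples where the vertical acceleration jumps past a threshold, is given below; the threshold and suppression window are illustrative, not values from this chapter.

def detect_bumps(z_accel, threshold_g=1.5, gap=10):
    """Return sample indices where |z| exceeds the threshold,
    suppressing repeated hits closer than `gap` samples."""
    events, last = [], -gap
    for i, z in enumerate(z_accel):
        if abs(z) > threshold_g and i - last >= gap:
            events.append(i)
            last = i
    return events

trace = [1.0] * 50 + [2.3, 2.7, 1.1] + [1.0] * 50  # g-units, pothole at ~50
print(detect_bumps(trace))  # -> [50]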
A literature review of various machine learning algorithms will be discussed in
detail in Section 17.2.
Detection algorithms are the aspect that differs most across previous work on this
topic. Some papers have described machine learning approaches to road quality
classification, whereas others have opted for different solutions. This subchapter
aims to provide an overview of the different feature sets and techniques used in
machine learning approaches (Table 17.1).
The Pothole Patrol is a sensor-based programme that monitors the condition of the
road surface. It necessitates the integration of particular hardware: each vehicle
requires an embedded computer running Linux for data processing, a Wi-Fi card for
data transmission, an external GPS for localization and a three-axis accelerometer
for road surface monitoring. It detects potholes using a machine learning technique.
Microsoft developed Nericell, a system that checks traffic conditions. It necessitates
a complex hardware and software configuration. A microphone, GPS and the
Sparkfun WiTilt accelerometer are among the external sensors used. The technology
may confuse smooth, uneven and rocky roads due to the inaccuracy of the detection.
Mednis and colleagues proposed a real-time pothole detection system. The system
makes use of Android phones with accelerometer sensors and basic algorithms for
detecting events from acceleration data. The true positive rate was found to be 90%
in the experiments. The system only uses an accelerometer sensor, and data are
acquired using specialized hardware, which is one of the work's limitations. A
camcorder mounted on the front passenger seat's headrest is used to label the items.
Labelling driving data with video, on the other hand, is a time-consuming and
error-prone task.
The author describes a technique for detecting potholes in which the threshold values are justified by a neural network approach with an accuracy of 90–95%. Smartphone accelerometers and gyroscopes are used to detect surface irregularities, with an auditory data-tagging method in which a labeller sits behind the driver inside the car and notes everything pertinent they see or feel. An SVM is then employed for detection and categorization of the aberrations, with 90% accuracy.
Using a low-cost Kinect sensor, Moazzam et al. calculate the volume of a pothole. The use of infrared technology for measurement based on a Kinect sensor is still a novel concept, and more study is needed to reduce error rates.

Table 17.1 (extracted row): [16] Road Sense; sensors: accelerometer, gyroscope, GPS; external hardware: not used; labelling: automatic; classifier: C4.5 decision tree.

Zhang et al. … contribute to the learning algorithm. For classification, two machine learning algorithms were used: KNN and a Naive Bayes classifier. Ten-fold cross-validation was used to test the classifiers on evaluation data. Naive Bayes and KNN performed broadly similarly, with a slight edge for KNN, but the overall accuracy of 78% proved unsatisfactory. In the bump-detection-based classification approach, the classifier had to distinguish between a smooth road and a road with a bump or some other similar anomaly; here, classification accuracy was much better, at around 98%.
Eriksson et al. [18] employed signal processing and machine learning to detect and classify a number of different road irregularities. The paper discusses other undesired events that manifest in the accelerometer signal, such as abrupt turning or stopping; to reject such events, several features (such as speed) and filters (such as a high-pass filter) were used. The training of the learning algorithm is based on the peak X-axis and Z-axis acceleration values and the instantaneous velocity of the car. To help remove false positives and increase the overall accuracy of anomaly detection, the paper describes a process in which an anomaly is reported only when several other detectors have also detected an anomaly in the same spot. Overall, the paper concludes that with carefully examined training data, the described pothole detection system achieved a false positive rate of 0.2% in controlled experiments.
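A minimal sketch of this style of pipeline is shown below: a high-pass filter suppresses slow body motion and a speed gate rejects events recorded while the car is nearly stationary. The cutoff frequency, threshold and minimum speed are illustrative assumptions, not values taken from [18].

```python
import numpy as np
from scipy.signal import butter, filtfilt

def highpass(signal, rate_hz, cutoff_hz=1.0, order=3):
    """Remove low-frequency vehicle body motion, keeping sharp impacts."""
    b, a = butter(order, cutoff_hz / (rate_hz / 2), btype="highpass")
    return filtfilt(b, a, signal)

def candidate_anomalies(z_accel, speed_ms, rate_hz=50,
                        z_thresh=3.0, min_speed_ms=5.0):
    """Report sample indices where the filtered z-axis peak is large
    AND the car is moving fast enough to rule out stop/turn artefacts."""
    z = highpass(np.asarray(z_accel, dtype=float), rate_hz)
    score = np.abs(z) / z.std()                   # normalised peak size
    mask = (score > z_thresh) & (np.asarray(speed_ms) > min_speed_ms)
    return np.flatnonzero(mask)

# Usage: candidate_anomalies(z_readings, gps_speeds) -> indices to inspect
```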
Smartphones and tablets have become ubiquitous in today's culture. The worldwide smartphone market has seen a tremendous increase in shipments in recent years [19], and mobile internet usage has grown significantly, lately surpassing desktop internet usage on a global scale [20]. Because of the ubiquity of mobile devices and their expanding computational capacity, smartphones are now capable of completing an increasing number of tasks. Mobile applications on platforms like Google Android and Apple iOS can make use of the devices' built-in sensors, allowing new ways to use these smart gadgets. Sensors can be used for a variety of activities, from detecting the device's position to more complex tasks such as sensing human activity [21].

Another key aspect of modern life is road infrastructure. A vast number of individuals use roads on a regular basis, whether driving their own car, riding their bikes, walking or taking public transportation, and many government services and businesses rely on the road network in some form. As the rate of motorization continues to rise, a well-maintained and safe road network benefits everyone involved [22].
Road quality reflects a country's development status and affects travel speeds and safety, so there is a major need to evaluate the quality of roads. In this work, the collected data are forwarded to a self-built server, where an analysis module calculates and analyzes the data in order to generate road condition information. Related previous works revolved around threshold-based detection techniques, but these were not very effective, as they merely detected existing damage to roads; some used a fast Fourier transform to obtain the acceleration information, and others used spatial–temporal anomalies to detect road damage. This research, in contrast, takes advantage of mobile sensing capabilities as well as social community data collection. Mobile sensing technology identifies targeted data using the relevant sensing components during every action, and mobile crowd sensing is used to gather data and develop collective intelligence. The sensing components of phones capture the oscillating amplitude experienced by a user while driving on a road. Every second, data are gathered and delivered to a self-built analytic server. In the context of this study, different users therefore traverse the same road, and multiple sets of oscillation amplitude data are obtained for it. After collection, the server uses an aberration-finding algorithm to extract the aberrant information from the data series. The basic goal is to distinguish between typical and aberrant road portions: an unsupervised anomaly detection approach is used to model road quality and interpret normal and abnormal road sections. In addition to the analysis of the obtained data and anomaly detection, the Google Maps SDK is employed.
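As an illustration of this kind of unsupervised approach, the sketch below scores per-second windows of oscillation data with an isolation forest. The window summary features and the contamination rate are assumptions made for the example; the chapter does not specify which unsupervised algorithm is used.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def summarise(window):
    """Reduce one second of readings to simple oscillation-amplitude features."""
    return [window.std(), window.max() - window.min(),
            np.abs(np.diff(window)).mean()]

rng = np.random.default_rng(0)
smooth = [summarise(rng.normal(0, 0.05, 50)) for _ in range(300)]  # typical sections
rough = [summarise(rng.normal(0, 0.40, 50)) for _ in range(10)]    # aberrant sections
X = np.array(smooth + rough)

model = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = model.predict(X)              # -1 = aberrant road section, 1 = typical
print("flagged sections:", np.flatnonzero(labels == -1))
```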
A road monitoring technique was presented to provide security and ease of use to various road users [23]. The major goal of that study is to develop a real-time Android application called Road Sense that uses a tri-axial accelerometer and a gyroscope to automatically forecast road quality. The road position trace is displayed on a geographic map using GPS, all recorded trip entries are saved, and a road quality map of a region can be produced. The proposed road monitoring system is built using machine learning, with the C4.5 decision tree classifier used to categorize road segments and build the model from training data; the overall result shows a consistent accuracy of 98.6%. Drawing on 27 papers, the study discusses various road monitoring technologies for predicting road conditions to provide smooth, safe and comfortable travel with less damage to vehicles. The literature survey highlights the hardware and software configurations and the limitations of the developed applications, and the authors review the various classification algorithms with a focus on achieving the highest accuracy. The framework of the proposed system is divided into two phases:
1. Training phase: In the feature extraction step, effective features are first extracted from specified sorts of road conditions based on acceleration and rotation around gravity. The features are then fed into a classifier model that can perform fine-grained identification.
2. Prediction phase: Road conditions are detected and identified by sensing real-time vehicular dynamics. Sensor readings from the accelerometer and gyroscope integrated into the smartphone are collected and pre-processed, and the system then uses the learned classifier model to forecast road quality.
In this paper, the performance of the system is evaluated in two steps: (a) analytical validation and (b) experimental validation. Analytical validation evaluates the performance of various classifiers using a variety of parameters, while experimental validation tests the system's practicality in a real-world setting. To obtain the dataset, a drive of about 40 min (25 km) was made, and a total of 2,000 samples of data were collected. The authors compared the performance of three classifiers, SVM, Naive Bayes and C4.5, on various parameters such as precision, recall, ROC area, F-measure, TP rate and FP rate. Overall, the results demonstrate that C4.5 is superior in terms of detection accuracy (98.6%).
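A minimal sketch of this train-then-evaluate pipeline is given below. The feature matrix is synthetic and stands in for window statistics of accelerometer and gyroscope readings; note that scikit-learn's tree is CART, so criterion="entropy" only approximates C4.5's information-gain splitting rather than reproducing it.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_validate

# Hypothetical features: one row per road segment (means, variances, peaks
# of accelerometer/gyroscope windows); labels are synthetic for the demo.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.3, 2000) > 0.8).astype(int)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=6, random_state=0)
scores = cross_validate(clf, X, y, cv=10,
                        scoring=["precision", "recall", "f1", "roc_auc"])
for metric in ("precision", "recall", "f1", "roc_auc"):
    print(metric, round(scores[f"test_{metric}"].mean(), 3))
```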
Reference [24] focuses on traffic accident data, the most fundamental measure of safety, without which the scope and nature of road safety cannot be determined. The precision of the data, record keeping and analysis are all important for making successful use of accident information, so the incident report should be of high quality: if the initial incident report is weak and incomplete, analysis and application of the findings will be poor as well. Road accidents are unpredictable incidents that require thorough investigation. A fundamental issue in accident data analysis is heterogeneity; although segmentation can be used to reduce it, there is no assurance that segmentation will produce the most accurate grouping of road accidents. The k-means technique is used to propose a model for analyzing road accident data.
Estimating the count of clusters: Estimating the number of clusters is the most difficult part of the clustering method. One of the constraints of this technique is that the value of k must be provided by the user; if the value of k is incorrect, the clustering results may be erroneous. Gap statistics are used to circumvent this constraint.
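The sketch below shows the shape of this model-selection step. Since the gap statistic is not built into scikit-learn, the example scores each candidate k with the silhouette coefficient as a simpler stand-in; the data and search range are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# Hypothetical encoded accident records (road type, light condition, ...),
# generated here as five well-separated groups.
X = np.vstack([rng.normal(c, 0.5, size=(100, 4)) for c in (0, 3, 6, 9, 12)])

best_k, best_score = None, -1.0
for k in range(2, 9):                      # the user still supplies a search range
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)    # stand-in for the gap statistic
    if score > best_score:
        best_k, best_score = k, score
print("chosen k:", best_k)                 # expected ~5 for this synthetic data
```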
Association rule mining: This is the process of generating a set of rules that describe the basic patterns in a data set; attributes are correlated according to how frequently they co-occur in the data. The most fundamental prerequisite of cluster analysis is to determine how many clusters the clustering method should produce. After determining the number of clusters, the R statistical software is used to partition the accident data sets with the k-means clustering algorithm, and cluster-wise accident variables are determined by a thorough examination of each cluster. Using k-means clustering and association rule mining, this research proposes a method for assessing the accident patterns of various types of road accidents. The study looked at accidents that occurred on Maharashtra's roads in 2015 and 2016 [6-9]. K-means clustering finds five categories (C1–C5) based on the attributes accident type, road type, light condition and road attributes. Association rule mining was used to construct rules for each cluster as well as for the overall data set; the rules for each cluster reveal the circumstances of the incidents within it.
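For readers unfamiliar with the rule-mining step, the following sketch mines association rules from a few hypothetical one-hot encoded accident records using the mlxtend library (the cited study itself uses R); the attribute names and thresholds are invented for the example.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical one-hot encoded accident records (one row per accident).
records = pd.DataFrame({
    "road_type=highway": [1, 1, 0, 1, 1, 0],
    "light=night":       [1, 1, 0, 1, 0, 0],
    "severity=fatal":    [1, 1, 0, 1, 0, 0],
})
itemsets = apriori(records.astype(bool), min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```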
In paper [25], an SVM is used as the classification algorithm. The mobile app first processes the data with various algorithms and then transfers them to a database over the Internet; the app produces a sound on pothole detection. With the help of the data, potholes can be categorized as low, medium or high. The paper also details the technologies and calculation methods used in the model and reports the settings and results of the experiment. The authors studied various research papers: one provides parameters such as speed and Z-peak that can help in classifying the potholes, and another uses an accelerometer and GPS in the same way as suggested by the authors. Designs such as the CarTel system, the Pothole Patrol system and the Nericell system can also be used to detect road conditions with mobile sensors. The solution is to develop an Android app that collects data; the app can start, stop or save data collection, and the file is saved in .csv format. Readings are captured with the help of the accelerometer sensors, and along with these readings, accelerometer readings are …
A taxi-based mobile sensing system named Pothole Patrol was presented for monitoring and analyzing pavement conditions; each cab carries three-axis acceleration sensors, a GPS receiver and a laptop computer with a data aggregation and processing unit. A more efficient and cost-effective technology for identifying potholes and cracks was devised with laser imaging. Sensors embedded in cell phones have been used to collect data in cars; such a system may, for instance, employ a one-degree-of-freedom vibration model to actively estimate a vehicle's response. Both supervised and unsupervised machine learning methods are used to detect road conditions. Using a simple machine learning methodology and a clustering algorithm, a method for detecting road anomalies was presented for evaluating driving behaviour; the average detection rates for potholes and bumps were found to be 88.66% and 88.89%, respectively. The crowdsourcing strategy not only lowers costs and improves efficiency but also removes geographical and time constraints, making communication more convenient. The proposed solution of the paper is as follows:
A. System model: Dr. Taguchi employed the Mahalanobis distance (MD) as a measurement scale and provided a threshold to identify unknown samples using a combination of MD and the signal-to-noise ratio.
B. Benchmark space: The window size of the flat-road data is initially taken to be h. The flat-road sample space has m dimensions, and a sample is Xi = (xi1, xi2, …, xim), where xij is the data for the i-th feature acquired at the j-th time.
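A small sketch of the benchmark-space idea follows, assuming the flat-road window is summarised as an h x m matrix of feature vectors: new samples are scored by their Mahalanobis distance from that space, and a threshold on the score would flag aberrant road segments.

```python
import numpy as np

def mahalanobis_scores(benchmark, samples):
    """Distance of each sample from the flat-road benchmark space.

    benchmark: (h, m) window of flat-road feature vectors
    samples:   (n, m) feature vectors to score
    """
    mu = benchmark.mean(axis=0)
    cov = np.cov(benchmark, rowvar=False)
    cov_inv = np.linalg.pinv(cov)          # pseudo-inverse guards singularity
    diff = samples - mu
    # d_i = sqrt(diff_i^T * cov_inv * diff_i) for each row i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

rng = np.random.default_rng(3)
flat = rng.normal(0, 0.05, size=(200, 3))        # window of size h = 200, m = 3
queries = np.vstack([flat[:2], [[0.8, -0.6, 0.9]]])
print(mahalanobis_scores(flat, queries))         # last score is far larger
```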
17.3 Research gaps

After carrying out the literature survey in detail, the following gaps were identified in the research domain:
a. SCOOT technology can only be used where the traffic flow is very light, such as on the Yamuna Expressway or the Agra-Lucknow Expressway.
b. In addition to SCOOT, a newer technology, ETSI-G5, was developed; it is very expensive, as it uses onboard units (OBUs) and roadside units (RSUs) placed along the road infrastructure.
c. Neither SCOOT nor ETSI-G5 addresses the safety of roads; these technologies only monitor or survey vehicles. They therefore provide a costly and incomplete solution across the different types of traffic flow.
d. Various road safety techniques have been suggested in the literature, but in most cases the focus was on the driver's fault as the cause of road accidents.
e. The quality of the roads, and accidents due to animals, are not considered in the literature as factors in road accidents or damage to vehicles.
f. Traffic congestion at traffic lights was not discussed in the literature.
17.4 Proposed methodology
Since mobile phones also have Bluetooth interfaces, many apps have been written to take data from a BLE device and use the mobile phone to communicate with a central server or navigational entity. Although our proposal does not remove that capability, the system we want to build is not dependent on users having mobile phones. Instead, each BLE device will beacon every few seconds, and those beacons, with their UUIDs, will be picked up by Wi-Fi/BLE gateways. The gateway devices cost £54 each, and this project will buy 50 such gateways. The gateways will then use eight wireless routers connected to the University of Ghana's communications network, an Ethernet network running at 1 Gbps. The information will be sent to a central server, where it will be stored, processed and used to display real-time traffic flows.
The Wi-Fi/BLE gateways will be placed on two very busy roads that border the University of Ghana. The first, National Highway 4 (N4), runs from north to south along the east side of the campus and forms a direct boundary with it for 3.5 km. The second, the Haatso-Atomic road, runs from east to west on the north side of the campus and forms a natural campus boundary for 1.5 km. In this proposal, we would therefore like to monitor traffic on both roads, making a total of 5 km of monitored road infrastructure. The data gathered will be analyzed to show the traffic flow along these roads.
The results will be displayed using a Virtual Network Computing server; hence,
the results will be available to all commuters in real time. Information will also be
communicated using mobile phones as well as road signage such as overhead road
gantries.
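The sketch below illustrates one possible shape for the central server: a small Flask service to which gateways POST each beacon sighting and which exposes a crude per-gateway flow count. The endpoint names and payload fields are assumptions for illustration, not part of the proposal.

```python
from collections import defaultdict
from datetime import datetime, timezone
from flask import Flask, request, jsonify

app = Flask(__name__)
sightings = defaultdict(list)        # uuid -> [(gateway_id, timestamp), ...]

@app.route("/beacon", methods=["POST"])
def beacon():
    """Gateways POST {'gateway_id': ..., 'uuid': ...} per BLE beacon heard."""
    report = request.get_json(force=True)
    stamp = datetime.now(timezone.utc).isoformat()
    sightings[report["uuid"]].append((report["gateway_id"], stamp))
    return jsonify(status="ok"), 200

@app.route("/flows")
def flows():
    """Crude traffic-flow view: number of distinct beacons per gateway."""
    counts = defaultdict(set)
    for uuid, seen in sightings.items():
        for gateway_id, _ in seen:
            counts[gateway_id].add(uuid)
    return jsonify({gw: len(devs) for gw, devs in counts.items()})

if __name__ == "__main__":
    app.run(port=8000)
```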
Research methodology has been organized into:
1. Research design
The research is exploratory as well as descriptive. The study's associated variables were discovered, and the research problem was specified, using an exploratory research methodology; exploratory research also proposes a conceptual framework that incorporates the pertinent variables. A descriptive research methodology was then employed to empirically test the stated conceptual framework of the study, together with statistical analysis of the data.
2. Sampling design
The sample for the study has been selected from the city of Noida. We plan to collect data on the road and traffic conditions near the Amity campus. The study will be performed on road conditions using a sample of 500 vehicles fitted with BLE devices for communication and with embedded sensors such as accelerometers.
3. Data collection instruments
The basic data collection instruments are the sensors embedded in the vehicle:
1. Accelerometer
2. Gyroscope
3. Cameras
4. GPS
17.5 Implementation
Figure 17.7 (a) and (b) represent the working model for identifying whether or not there is a pothole on the road.
The dataset is imbalanced ('no pothole' represents the majority class in this case). We dealt with the problem by adopting stratified cross-validation: the entire dataset is divided into numerous folds, with the target distribution remaining the same in each fold. It is called stratified because the folds are made using stratified sampling.
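A minimal sketch of stratified cross-validation with scikit-learn follows, assuming an imbalanced binary target of about 10% potholes (the actual class ratio is not stated in the chapter).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.1).astype(int)   # ~10% potholes: imbalanced target

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each fold preserves the ~10% positive rate of the full dataset.
    print(f"fold {fold}: positive rate = {y[test_idx].mean():.3f}")
```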
As our initial model, we used logistic regression. The logistic function, commonly known as the sigmoid function, underlies this simple classification approach: it is an S-shaped curve that maps any real number to a number between 0 and 1.
The logistic regression model predicts probabilities. It is given by
Y = 1 / (1 + e^(−x)).
The sample logistic regression equation can be modelled as
Y = 1 / (1 + e^(−(b0 + b1·x))).
The coefficients of the logistic regression equation can be estimated via maximum-likelihood estimation.
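As a quick numeric illustration of the formula (with made-up coefficients):

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# With illustrative coefficients b0 = -2, b1 = 0.8 and an input x = 4:
b0, b1, x = -2.0, 0.8, 4.0
print(sigmoid(b0 + b1 * x))   # ~0.769: predicted probability of the positive class
```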
The second model we built was the k-nearest neighbours (KNN) model. KNN is an easy-to-use and easy-to-develop machine learning algorithm. It is a supervised machine learning technique that may be applied to classification and
regression issues. It is founded on the idea that similar objects are more likely to
be located near together, i.e., similar items are kept together as much as possible
[5]. The KNN method calculates closeness using several distance metrics such
as Minkowski distance, Euclidean distance and Manhattan distance. There is one
hyperparameter k in this method that we must choose well in order for our algorithm
to perform well on the data. To pick the optimal value of k, we looped over various
neighbours in the range of 5–25 in steps of 5, trained the algorithm and then stored
the best-performing model.
The third model is the SVM, a type of supervised machine learning approach for solving classification and regression issues. In an SVM, each data point is plotted in n-dimensional space, where n is the number of features in the data, and a hyperplane is used to separate the classes when categorizing them.
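Before moving to the tree-based models, here is a minimal sketch of the k-selection loop described above for KNN; the synthetic data and hold-out split are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

best_k, best_acc, best_model = None, 0.0, None
for k in range(5, 26, 5):                     # k = 5, 10, 15, 20, 25
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    acc = model.score(X_val, y_val)           # validation accuracy
    if acc > best_acc:
        best_k, best_acc, best_model = k, acc, model
print(f"best k = {best_k}, validation accuracy = {best_acc:.3f}")
```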
The decision tree classifier is the fourth model. A decision tree is a type of supervised machine learning approach that may be used to solve classification and regression problems. A decision tree is a hierarchical structure consisting of nodes and their connections, with three types of node: the root node, internal nodes and leaf nodes. A root node has zero or more outgoing edges and no incoming edges. A node with exactly one incoming edge and two or more outgoing edges is called an internal node. A leaf node has exactly one incoming edge and no outgoing edges. Each non-leaf node in a decision tree applies a test attribute to analyze and split the data according to its characteristics.
Random forest classifier is the fifth model. Random forest is a supervised machine learning approach for classification and prediction. As the name suggests, a random forest is a collection of several decision trees that together form a forest, and it uses the ensemble approach: each decision tree forecasts a class label, and the model's final forecast is the class label that receives the most votes. A random forest model outperforms a decision tree model in the majority of situations because of the low correlation between the many decision trees in the model. The final prediction is derived using the ensembling approach, which combines many low-correlated decision trees to obtain a range of forecasts; uncorrelated models can make more precise forecasts than any individual model, because in a random forest the errors of individual trees tend to be compensated for by the others.
Boosting techniques were also used to develop several models, including the gradient boosting classifier. Boosting is a method of transforming weak learners into strong ones; in this situation, the weak learner is a decision tree. Boosting algorithms exist in a range of shapes and sizes, differing in how they transform weak learners into strong learners, or how they uncover and correct the weak learners' flaws, in order to develop strong learners that can predict data more accurately. Adaboost is a boosting method that addresses a weak learner's shortcomings by using adaptive weights on the data points. Adaboost starts by creating a decision tree with the same weight for every observation; in succeeding rounds, it adjusts the weights based on how predictable each observation is. Data that are more difficult to classify are given higher weights, while data that are easier to classify are given lower weights. A gradient boosting machine, on the other hand, addresses the drawbacks of a weak learner by using the gradients of the loss function; it trains models in a progressive, cumulative and sequential manner. Another algorithm that uses the boosting technique is Xgboost: it can build an ensemble of linear models or, more commonly, an ensemble of trees (gbtree), which uses a decision tree as the base estimator. It works by building a base model that predicts the target variable first and then training subsequent models to fit the residuals from prior stages; to decrease these residuals, Xgboost employs a step-by-step strategy.
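To make the residual-fitting idea concrete, here is a toy sketch of gradient boosting for regression with squared loss, where each stage fits a shallow tree to the current residuals; the tree depth, learning rate and data are illustrative, not parameters from this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression target; with squared loss, the negative gradient is simply
# the residual y - prediction, so each round fits a tree to the residuals.
rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 500)

prediction = np.full_like(y, y.mean())     # base model: predict the mean
learning_rate = 0.3
for stage in range(50):
    residual = y - prediction              # what the current ensemble gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)

print("final RMSE:", np.sqrt(np.mean((y - prediction) ** 2)))
```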
17.6 Results
As a consequence of our research, we identified several surprising and fascinating findings, indicating that AI approaches can indeed be used to handle the problem of pothole recognition and road condition classification. We applied a number of machine learning algorithms and tried a variety of modelling and parameter-tuning processes that we were familiar with to reach our conclusions (Figure 17.9(a)–(d)).
We trained several other models, such as k-nearest neighbours, SVMs and the gradient boosting algorithm, in a similar fashion, and they yielded 91.4%, 92% and 92.3% accuracy, respectively. Figure 17.10(a)–(c) illustrates the results of a comparative examination of the models based on F1 score, precision and recall.
… discovered as the research progressed. We also gathered inspiration from all of the references and used it to replicate their ideas, which we then improved upon to create our own approach.
A major modification we made as part of our approach was a change in data gathering that we thought would help us obtain a wider range of results from our study: we used a car rather than a motorcycle, which had previously been employed extensively in other investigations. This setup may be a little more expensive than in previous research, but it allowed us to obtain more consistent and accurate data from the mobile sensors. As a result, our effort in this study met our expectations in terms of providing a high-quality dataset.
In addition, as evidenced by previous studies, most researchers rely solely on accelerometers as their main data source. However, to give additional context to the sensor readings, we analyzed the gyroscope reading alongside the accelerometer reading in this study. The gyroscope data allowed us to synchronise our accelerometer readings to a global frame of reference, regardless of the car's orientation or any other disturbances at the time of data collection.
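The sketch below illustrates one simple version of this reorientation idea: estimating the gravity direction from the mean accelerometer reading and rotating all samples so that gravity aligns with the global z-axis (Rodrigues' rotation formula). A full solution would also integrate the gyroscope over time; this simplification and the sample values are assumptions for the example.

```python
import numpy as np

def rotation_to_global(gravity_estimate):
    """Rotation matrix taking the phone's mean gravity direction to (0, 0, 1)."""
    g = gravity_estimate / np.linalg.norm(gravity_estimate)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(g, z)
    c = float(np.dot(g, z))
    if np.allclose(v, 0):          # already aligned (antipodal case omitted)
        return np.eye(3)
    vx = np.array([[0, -v[2], v[1]],
                   [v[2], 0, -v[0]],
                   [-v[1], v[0], 0]])
    # Rodrigues' formula for aligning one unit vector with another
    return np.eye(3) + vx + vx @ vx * (1.0 / (1.0 + c))

# Phone tilted: gravity shows up partly on the x-axis instead of purely on z.
samples = np.tile([0.5, 0.0, 0.866], (100, 1))   # mean reading in g units
R = rotation_to_global(samples.mean(axis=0))
print(samples @ R.T)                             # rows now ~ [0, 0, 1]
```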
Last but not least, we used all three axes of the accelerometer in our research. This means that, unlike most previous studies, …
References
[1] Pan J., Khan M., Popa S.I., et al. 'Proactive vehicle re-routing strategies for congestion avoidance'. 2012 IEEE 8th International Conference on Distributed Computing in Sensor Systems (DCOSS); Hangzhou, China; 2012.