2023 6th International Conference on Contemporary Computing and Informatics (IC3I)

Data Mining and Its Applications in Healthcare


Priyanka Kumari
Department of Computer Science and Engineering
Lovely Professional University
Phagwara, India
[email protected]

Baljinder Kaur
Department of Computer Science and Engineering
Lovely Professional University
Phagwara, India
[email protected]

Manik Rakhra
Assistant Professor
Department of Computer Science and Engineering
Lovely Professional University
Phagwara, India
[email protected]
2023 6th International Conference on Contemporary Computing and Informatics (IC3I) | 979-8-3503-0448-0/23/$31.00 ©2023 IEEE | DOI: 10.1109/IC3I59117.2023.10397856

Arun Singh
IEEE Member
Department of Computer Science and Engineering
Lovely Professional University
Phagwara, India
[email protected]

Anil Kumar Rawat
Assistant Professor
Department of Computer Science and Engineering
Lovely Professional University
Phagwara, India
[email protected]

Abstract— Digitalization has resulted in the accumulation of a gigantic amount of data at an alarming rate. Processing such a large amount of data is one of the biggest challenges of the present day. Extracting meaningful information and ensuring the correctness and precision of data are core requirements of the digital world. The process of preprocessing raw data, extracting meaningful information and deducing conclusive evidence for decision making has led to the field known as data mining. Data mining is a framework of patterns and rules that aims to extract relationships or hidden information from enormous sets of databases. This paper targets the healthcare sector and the role of data mining in it. It also presents the challenges pertaining to the healthcare industry. Current trends and the future scope are also illustrated.

Keywords—component, formatting, style, styling, insert

I. INTRODUCTION

Data mining is a critical research area which acts as a catalyst in the process of knowledge discovery in databases. The 21st century has witnessed an exponential accumulation of data, thanks to advancements in technology. Social platforms like YouTube, Facebook and Twitter, and financial inclusion through digital platforms like Paytm and the BHIM app, have eased the way the world interacts and consumes services. This has accelerated further due to the COVID-19 pandemic, its restrictions and the need for isolation. At the same time, it has posed a serious challenge in terms of processing the data, extracting meaningful information and deducing conclusions for effective decision making. Among the various domains, such as education, finance and environmental concerns, the need for uninterrupted and easily available healthcare services through digital platforms is paramount in today's world. Hence, this paper summarizes the role of data mining in the healthcare sector along with its applications, issues and the future ahead.

Data mining is a critical aspect wherein the quality of data plays a key role in deciding the correctness and efficiency of data extraction. It comprises compactness, efficiency and presentation aspects. So, the first input to the process is the knowledge base, which begins with Knowledge Discovery in Databases (KDD).

Figure 1. Knowledge Discovery in Database (KDD) process

KDD is the process of knowledge discovery wherein the raw data is inspected for patterns and meaningful contexts so as to weed out unnecessary jargon in the data which could mislead the observation and the context of the knowledge. It includes the selection of relevant data, which is then pre-processed. It includes the following steps, as discussed in [1].

A. Data Cleaning: It is the very first step in the process of pre-processing. It deals with raw data containing missing values, outliers and noise (for example, medical imaging data, or satellite data related to weather forecasts, volcanic eruptions and many more). Processing inaccurate data is not only a waste of time and resources but could also have life-threatening consequences.

B. Integration: This step aggregates data from multiple sources, which not only verifies the credibility of the data but also enriches the depth of information extraction and validates data accuracy. It contributes to data correlation analysis and minimizes data redundancy.

C. Transformation: It is a process of feeding the data relevant to data mining. It involves smoothing, aggregation, normalization, reduction and binning of
data (for example, digital images), using apt methods such as:

1. Regression: The idea behind this technique is to find a common relation or mathematical pattern in which the data is distributed in space. Linear regression in two variables is the simplest and most distinctive method, and it can be extended to multi-variable regression in the case of complex data entanglement.

2. Clustering: It is a process of associating selected attributes with a set of data which may differ significantly from the other clusters. This technique builds clear boundary conditions and at times also indicates the closeness of the relationship between the attributes of varied data. Clustering provides outlier information between different clusters of data and the level of correlation they exhibit in a crowded space.

3. Binning: It is a top-down splitting technique useful in data smoothing and discretization. It contributes to the generation of a complex hierarchical knowledge base. It is an unsupervised learning technique, which means that it does not have any predefined learning agent guiding it.

4. Histogram: It is also an unsupervised learning technique, wherein the data is distributed into fixed ranges (buckets), and hence it discretizes the data. It contributes to the synthesis of complex concepts into simpler, easier interpretations.

5. Entropy: It relates to the randomness of data distributed in the data space. It also deals with data discretization; however, it is a supervised technique with a top-down approach.

II. DATA MINING TOOLS FOR HEALTHCARE APPLICATIONS [2]

A. RapidMiner: It is a stand-alone, open-source, advanced Java-based analytical tool for end-to-end data mining applications. It supports data visualization and has a graphical user interface capable of supporting varied computer environments. It includes an API capable of handling complex connections with other tools like R and Spark.

B. KNIME: It is a Java-based, end-to-end analytical tool that analyses, transforms, integrates and deploys data. It is a significant tool for pre-processing, cleaning and mining tasks. Its main advantages include easy integration with other data mining tools and the ability to perform tasks on varied types of data. However, it lacks error measurement tools and has issues in descriptor selection.

C. Scikit-learn: It is a machine learning library for Python programming that provides data mining and data analysis services. However, it is used programmatically and offers no graphical user interface, hence it is not easy to get used to.

D. Spark: It is used for large-scale data processing across multiple nodes. It targets workloads that demand high processing speed. It can run up to 100 times faster in memory than Hadoop MapReduce.

E. R: It is an environment-enabled programming language for graphics and statistical computing. The source code can be written in Fortran, C++ and R. It is compatible with code in multiple languages and is hence a powerful tool capable of handling various data mining tasks. However, R requires advanced technical knowledge to program.

III. ARCHITECTURE OF DATA MINING

The architecture of data mining follows a multi-layer approach. The lowest layer is the source of data, gathered from various sources like the World Wide Web, databases, data warehouses or any other repository. However, at this stage the data is in its raw form, with missing values, abstract content and redundant data, and it needs to be processed before it can be worked on. The next layer deals with pre-processing tools for data cleaning, integration and selection of relevant data. The layer after that comprises the data warehouse servers, capable of handling and processing large amounts of data. The pre-processed data is then sent as input to the data mining engine, which processes it with various mining tools. The processed data is used to predict patterns useful for making decisions. The resulting decisions can be presented in graphical form through the GUI. Data mining tasks [4][5] include those shown in Figure 3.

Figure 2. Architecture of Data mining
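The layered flow described above, together with the pre-processing techniques listed under the Transformation step, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sample records, the function names (clean, min_max_normalize, equal_width_bins, fit_line, run_pipeline) and the choice of three bins are assumptions made purely for illustration. The sketch passes hypothetical raw records through a cleaning step, a transformation step (normalization and equal-width binning) and a simple mining step (least-squares linear regression in two variables), and returns values that a presentation layer could display.

```python
from statistics import mean

# Hypothetical raw records: (age in years, systolic blood pressure in mmHg).
# None marks a missing value, as raw data often has at the source layer.
RAW_RECORDS = [
    (25, 118.0), (32, 121.0), (41, None), (47, 130.0),
    (None, 125.0), (53, 135.0), (60, 139.0), (68, 146.0),
]


def clean(records):
    """Pre-processing layer: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r)]


def min_max_normalize(values):
    """Transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Transformation: map each value to the index of an equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant data
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def fit_line(xs, ys):
    """Mining engine: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def run_pipeline(records, n_bins=3):
    """Chain the layers: cleaning, transformation, then mining."""
    cleaned = clean(records)
    ages = [age for age, _ in cleaned]
    pressures = [pressure for _, pressure in cleaned]
    slope, intercept = fit_line(ages, pressures)
    return {
        "cleaned": cleaned,
        "normalized_ages": min_max_normalize(ages),
        "age_bins": equal_width_bins(ages, n_bins),
        "slope": slope,
        "intercept": intercept,
    }


if __name__ == "__main__":
    # Presentation layer stand-in: print a short textual summary.
    result = run_pipeline(RAW_RECORDS)
    print(len(result["cleaned"]), "records kept after cleaning")
    print("age bins:", result["age_bins"])
    print("trend: pressure = %.3f * age + %.3f"
          % (result["slope"], result["intercept"]))
```

Equal-width binning is used in this sketch because it is the simplest unsupervised form of discretization. An entropy-based split would additionally need class labels for each record, in line with the supervised nature noted for the entropy technique.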

Figure 3. Data Mining Tasks

IV. DATA MINING METHODS [3]

Anomaly detection [8][9] considers standard support vector data description, density-induced support vector data description and Gaussian mixture. Clustering includes the vector quantization method, and classification [7][10] deals with statistical, discriminant analysis, decision tree, Markov-based, swarm intelligence, k-nearest neighbor, genetic classifier, artificial neural network, support vector and association rule methods. Various algorithms of data mining are discussed in [8][11].

Applications of Data Mining in Healthcare

Evolution of mankind is a continuous process of learning and enhancing the quality of life. The beauty of evolution, especially of mankind, lies in its ability to pass on to the next generation the learning, the experiences and the level of comfort it has been able to achieve. Along with these passes the genetic information, with a much higher probability of survival as per Darwin's principle of the survival of the fittest. The discovery of fire, the wheel, iron and many more, together with creativity and the ability to innovate, has enhanced the lifestyle of mankind with respect to any other creature [26,27].

V. TECHNOLOGIES AND THEIR IMPACT ON DATA MINING

A. Artificial Neural Network (ANN)

ANN is a field of science which aims at mimicking human intelligence so that machines can make decisions of their own. Its early introduction had limited scope due to the lack of technological advancement in terms of processing speed, hardware requirements, transmission challenges and many more. However, in the 21st century these challenges have been overcome due to mushrooming, exponential growth in all these domains. Convolutional neural networks and their advanced variants, along with the new branch known as deep learning [2], have given researchers the wings to explore new solutions to problems never effectively addressed before. It is like a black box with seemingly extreme potential, but how it works is still a mystery; therefore it is not standardized and cannot be relied on, especially in the area of medical healthcare.

B. 4G and foundation of 5G

No technology can have a wide impact unless the transmission of data is fast and global. 4G has revolutionized the world, and its impact is visible in almost all spheres of an individual's daily life. Digital payments, Aadhaar and many other services are built on the foundation of high-speed internet. The mushrooming online social media platforms like Facebook, Twitter, YouTube and many more are applications reliant on the internet. Privacy has become a critical aspect of digital communication. Recently, the circulation of false news has demanded rules for online platforms, as such false information could have a drastic impact on the global market. The recent pandemic caused by the coronavirus is a relevant example of the worst that can happen due to globalization.

The emergence of 5G technology has the potential to improve connectivity and move technology to the next level, driving other technologies to improve and so resulting in collective growth in technology. Such advancement has a significant impact on making medical treatment easily available in the remotest of places. However, making it available round the clock and during critical data transmission is the key challenge to be addressed by the research community.

C. Internet of Things (IoT)

IoT is a technology which can make any object or thing smart, in such a way that objects can be connected to each other and collectively connected through the cloud to the user for round-the-clock data collection and decision making. Moreover, there are now new terminologies like the Internet of Medical Things, the Internet of Nano Things and so on, further subdividing the IoT segment and contributing to the efficient working of society. This platform also demands seamless internet connectivity, network stability, big data and intelligent systems for extracting relevant information from redundant data.

D. Cloud Computing

A common platform from which data can be made available to all its beneficiaries is the sole purpose of cloud
computing. However, as discussed earlier, no technology can work in isolation; it must work in tune with all other aspects for an optimized output.

E. Big Data

Data is the raw material required for any relevant application. However, collecting and segregating the data, along with its storage, is handled by another concept: Big Data.

F. Wireless Sensor Networks

Ease of application and effective solutions can be provided by wireless networks [12-19], wherein multiple sensors connected through a sink node to the cloud can efficiently improve productivity in various areas, for example medical healthcare, agriculture, mining, security and many more. A wireless body area network can have significant implications in healthcare, wherein a personalized device can provide information related to the individual's health, thereby eradicating the possibility of any disease manifesting, i.e. moving from curing to developing a healthy lifestyle.

VI. CHALLENGES AHEAD IN HEALTHCARE INDUSTRY

1. Data: Data itself is a challenge in realizing personalized healthcare. The challenges are summarized as under:

a. Data Size: The amount of data collected by the sensors poses a challenge in storing and transmitting it. It also adds to the redundancy of data. A possible solution could be to design an intelligent system which extracts the significant changes in the data, so that only those extreme changes are transmitted to the cloud through the sink.

b. Data Type: As the sensors are connected to different parts of the body, collecting varied aspects of human physiology, the segregation, network space, internet and transmission size differ. To address this issue, differentiation of these data types is important so as to ensure data throughput and faster delivery of critical, potentially life-threatening information.

c. Critical Data: To ensure the safety of the individual in crunch situations, one has to be sure that critical data is transmitted on a priority basis. To provide a better routing platform for critical data, the extraction of such data is of utmost importance and needs to be addressed.

2. Energy: A wearable device [12] needs an energy source such as a battery, and since it is desired to be comfortable for the individual, its size needs to be small. Hence it becomes critical to optimize the device without restricting the usage of such personalized devices in constrained environments. Network stability, packet loss ratio and throughput, along with network strength, are important factors in ensuring energy-efficient transmission. Nano-materials for the Internet of Nano Things can help in ensuring low energy requirements.

3. Security and Privacy: One major challenge the medical community is facing in research and development is the privacy concern [24][25] of the individual. Government intervention is required in such cases, and at the same time researchers have to provide a safe and secure environment that ensures data security, so as to create awareness and trust in society. An individual's data can pose a life-threatening challenge if an online prescription or medication is altered by an intruder. Cyber laws in this direction need to be made so as to secure the data. Wearable technology is the future, but it requires a lot of effort to bring it to trustworthy application, as the matter we are dealing with is one of life and death.

4. Medical Accreditation: There are a number of medical devices in the market; however, many of them are not medically accredited. So, to ensure that devices work under a standard protocol, the government has to ensure a global standard for wearable personalized devices and their scope.

5. Other Services: A wearable device is not self-functioning. It requires a range of technologies working collectively and efficiently, without fault, extracting and transmitting accurate information, along with a platform wherein the medical community can extract meaningful information at the earliest and with ease, so that decision making is precise. Achieving such an environment is a challenging task ahead of researchers and scientists.

VII. CONCLUSION

Data mining has significant applications in healthcare, ranging from disease diagnosis and treatment to drug discovery and healthcare management. The healthcare industry is generating vast amounts of data every day, and data mining techniques can help extract meaningful insights from this data.

One of the most important applications of data mining in healthcare is disease diagnosis and prediction. Data mining
algorithms can be used to analyze patient data and identify patterns that can be used to predict the likelihood of certain diseases. This can help healthcare providers to intervene early and provide timely treatment, resulting in better patient outcomes. Another important application of data mining in healthcare is drug discovery. Data mining can help researchers analyze large volumes of data from clinical trials and identify potential drug candidates that are safe and effective. This can help accelerate the drug discovery process and lead to the development of new treatments for a range of diseases. Data mining can also be used to improve healthcare management. By analyzing data from electronic health records, healthcare providers can identify areas where they can improve their services and reduce costs. This can help healthcare organizations to operate more efficiently and provide better care to their patients. In conclusion, data mining has significant applications in healthcare and has the potential to revolutionize the way healthcare is delivered. By using data mining techniques to analyze large volumes of data, healthcare providers can improve patient outcomes, accelerate drug discovery, and improve healthcare management.

REFERENCES

[1] S. A. Alasadi and W. S. Bhaya, “Review of data preprocessing techniques in data mining,” J. Eng. Appl. Sci., vol. 12, no. 16, pp. 4102–4107, 2017, doi: 10.3923/jeasci.2017.4102.4107.
[2] J. Santos-Pereira, L. Gruenwald, and J. Bernardino, “Top data mining tools for the healthcare industry,” J. King Saud Univ. - Comput. Inf. Sci., vol. 34, no. 8, pp. 4968–4982, 2022, doi: 10.1016/j.jksuci.2021.06.002.
[3] N. Jothi, N. A. Rashid, and W. Husain, “Data Mining in Healthcare - A Review,” Procedia Comput. Sci., vol. 72, pp. 306–313, 2015, doi: 10.1016/j.procs.2015.12.145.
[4] S. C. Pandey, “Data mining techniques for medical data: A review,” Int. Conf. Signal Process. Commun. Power Embed. Syst. SCOPES 2016 - Proc., pp. 972–982, 2017, doi: 10.1109/SCOPES.2016.7955586.
[5] Kaur, B., Batra, S., Arora, V. (2019). ERA-AODV for MANETs to Improve Reliability in Energy-Efficient Way. In: Panigrahi, B., Trivedi, M., Mishra, K., Tiwari, S., Singh, P. (eds) Smart Innovations in Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 669. Springer, Singapore. https://doi.org/10.1007/978-981-10-8968-8_17.
[6] Xie, Y., Beram, S. M., Kaur, B., Neware, R., Rakhra, M., & Koundal, D. (2023). Research on Visualization of Large-scale User Association Feature Data Based on Nonlinear Dimension Reduction Method. Journal of Mobile Multimedia, 587-602.
[7] S. Neelamegam and E. Ramaraj, “Classification algorithm in Data mining: An Overview,” Int. J. P2P Netw. Trends Technol., vol. 4, no. 8, pp. 369–374, 2013.
[8] A. Zimek and P. Filzmoser, “There and back again: Outlier detection between statistical reasoning and data mining algorithms,” Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 8, no. 6, pp. 1–26, 2018, doi: 10.1002/widm.1280.
[9] S. Agrawal and J. Agrawal, “Survey on anomaly detection using data mining techniques,” Procedia Comput. Sci., vol. 60, no. 1, pp. 708–713, 2015, doi: 10.1016/j.procs.2015.08.220.
[10] G. Kesavaraj and S. Sukumaran, “06726842,” 2013.
[11] I. Journal, “IJERA (www.ijera.com)”.
[12] Y. Qu, G. Zheng, H. Ma, X. Wang, B. Ji, and H. Wu, “A survey of routing protocols in WBAN for healthcare applications,” Sensors (Switzerland), vol. 19, no. 7, 2019, doi: 10.3390/s19071638.
[13] H. Joudaki et al., “Using data mining to detect health care fraud and abuse: a review of literature,” Glob. J. Health Sci., vol. 7, no. 1, pp. 194–202, 2015, doi: 10.5539/gjhs.v7n1p194.
[14] D. P. Shukla, S. B. Patel, and A. K. Sen, “A Literature Review in Health Informatics Using Data Mining Techniques,” Int. J. Softw. Hardw. Res. Eng., vol. 2, no. 2, pp. 123–129, 2014.
[15] C. C. Aggarwal, Managing and mining sensor data, vol. 9781461463. 2014. doi: 10.1007/978-1-4614-6309-2.
[16] H. Banaee, M. U. Ahmed, and A. Loutfi, “Data mining for wearable sensors in health monitoring systems: A review of recent trends and challenges,” Sensors (Switzerland), vol. 13, no. 12, pp. 17472–17500, 2013, doi: 10.3390/s131217472.
[17] M. K. Dath, M. Rakhra, D. Singh, A. Singh and R. Banala, “Basic design for the implementation of automatic surveillance system on helmet detection,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1–5, doi: 10.1109/AIST55798.2022.10065367.
[18] M. D. K. Islam Zim, M. Rakhra, D. Singh and A. Singh, “Noise Reduction and dehazing of Visual Data,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1–5, doi: 10.1109/AIST55798.2022.10065069.
[19] S. Katoch, M. Rakhra and D. Singh, “Recognition Of Handwritten English Character Using Convolutional Neural Network,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1–6, doi: 10.1109/AIST55798.2022.10064860.
[20] M. A. Hanif, H. Kaur, M. Rakhra and A. Singh, “Role of CBIR In a Different fields-An Empirical Review,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1–7, doi: 10.1109/AIST55798.2022.10064825.
[21] R. Kumar Shukla, M. Rakhra, D. Singh and A. Singh, “The Role of Machine Learning in Health Care Diagnosis,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1–6, doi: 10.1109/AIST55798.2022.10064906.
[22] A. Pandey, A. Singh, G. Singh and M. Rakhra, “Using AI and IoT to assess the efficacy of English-language curricula in higher education: A Proposed Method,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1–6, doi: 10.1109/AIST55798.2022.10065284.
[23] A. Singh and M. Rakhra, “A Review For Different Sign Language Recognition Systems,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1–6, doi: 10.1109/AIST55798.2022.10065037.
[24] A. Ansari, B. Kaur, M. Rakhra, A. Singh and D. Singh, “Handwritten Text Recognition using Deep Learning Algorithms,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1–6, doi: 10.1109/AIST55798.2022.10065348.
[25] J. Iavindrasana, G. Cohen, A. Depeursinge, H. Müller, R. Meyer, and A. Geissbuhler, “Clinical data mining: a review,” Yearb. Med. Inform., pp. 121–133, 2009, doi: 10.1055/s-0038-1638651.
[26] S. G. Alonso et al., “Data Mining Algorithms and Techniques in Mental Health: A Systematic Review,” J. Med. Syst., vol. 42, no. 9, 2018, doi: 10.1007/s10916-018-1018-2.
[27] I. Yoo et al., “Data mining in healthcare and biomedicine: A survey of the literature,” J. Med. Syst., vol. 36, no. 4, pp. 2431–2448, 2012, doi: 10.1007/s10916-011-9710-5.

Authorized licensed use limited to: Lovely Professional University - Phagwara. Downloaded on January 29,2024 at 09:08:08 UTC from IEEE Xplore. Restrictions apply.
