Data Mining and Its Applications in Healthcare
Abstract— Digitalization has resulted in the accumulation of a gigantic amount of data at an alarming rate. Processing such a large amount of data is the biggest challenge of recent times. Extracting meaningful information and ensuring the correctness and precision of data are core requirements of the digital world. The process of preprocessing raw data, extracting meaningful information, and deducing conclusive evidence for decision making has led to the field known as data mining. Data mining is a framework of patterns and rules aimed at extracting relationships or hidden information from enormous sets of databases. This paper targets the healthcare sector and the role of data mining in it. It also presents the challenges pertaining to the healthcare industry. Current trends and future scope are also illustrated.

Keywords—component, formatting, style, styling, insert

efficiency of data extraction. It comprises compactness, efficiency, and presentation aspects. So, the first step, as an input to the [...], is the knowledge base, which starts with Knowledge Discovery in Databases (KDD).

Figure 1. Knowledge Discovery in Database (KDD) process
979-8-3503-0448-0/22/$31.00 ©2023 IEEE
Authorized licensed use limited to: Lovely Professional University - Phagwara. Downloaded on January 29,2024 at 09:08:08 UTC from IEEE Xplore. Restrictions apply.
2023 6th International Conference on Contemporary Computing and Informatics (IC3I)
data (for example, a digital image) using apt methods like:

1. Regression: The idea behind this technique is to find a common relation or mathematical pattern that describes how the data is distributed in space. Linear regression in two variables is the simplest method and can be extended to multivariable regression for more complex, entangled data.

2. Clustering: It is a process of grouping data by selected attributes so that the members of one cluster differ significantly from those of other clusters. This technique builds clear boundary conditions and at times also indicates how closely the attributes of different data are related. Clustering also provides outlier information between different clusters and the level of correlation they exhibit in a crowded space.

3. Binning: It is a top-down splitting technique useful in data smoothing and discretization. It contributes to the generation of a complex hierarchical knowledge base. It is an unsupervised learning technique, which means that it has no predefined learning agent guiding it.

4. Histogram: It is also an unsupervised learning technique, wherein the data is distributed into fixed ranges (buckets), and hence it discretizes the data. It contributes to the synthesis of complex concepts into simpler, easier interpretations.

5. Entropy: It relates to the randomness of the data distributed in the data space. It also deals with data discretization; however, it is a supervised technique with a top-down approach.

II. DATA MINING TOOLS FOR HEALTHCARE APPLICATIONS [2]

A. RapidMiner: It is a stand-alone, open-source, advanced Java-based analytical tool for end-to-end data mining applications. It supports data visualization and has a graphical user interface capable of supporting varied computing environments. It includes an API capable of handling complex connections with other tools like R and Spark.

B. KNIME: It is a Java-based, end-to-end analytical tool that analyses, transforms, integrates, and deploys data. It is a significant tool for pre-processing, cleaning, and mining tasks. Its main advantages are easy integration with other data mining tools and the ability to perform tasks on varied types of data. However, it lacks error measurement tools and has issues in descriptor selection.

C. Scikit-learn: It is a machine learning library for Python that provides data mining and data analysis services. However, it has no graphical user interface and is used through code, so it can be harder to get used to.

D. Spark: It is used for large-scale data processing across multiple nodes and is designed for workloads that demand high speed. It is reported to run up to 100 times faster in memory than Hadoop MapReduce.

E. R: It is a programming language and environment for graphics and statistical computing. The source code can be written in Fortran, C++, and R. This compatibility with multiple languages makes it a powerful tool capable of handling various data mining tasks. However, R requires advanced technical knowledge to program.

III. PREPARE YOUR PAPER BEFORE STYLING

The architecture of data mining follows a multi-layer approach. The lowest layer is the source of data, gathered from various sources such as the World Wide Web, databases, data warehouses, or any other repository. However, at this stage, the data is in its raw form, with missing values, abstract content, and redundant data, and needs to be processed before working on it. The next layer deals with pre-processing tools such as data cleaning, integration, and selection of relevant data. The next layer comprises the data warehouse servers, capable of handling and processing large amounts of data. The pre-processed data is then sent as input to the data mining engine, which uses various mining tools to process the data. The processed data is used to predict patterns useful for making decisions. The resulting decisions can be presented graphically through the GUI. Data mining tasks [4][5] include the following, as shown in Figure 3.

Figure 2. Architecture of Data mining
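The multi-layer architecture described in this section can be sketched as a chain of small functions, one per layer. This is a minimal illustration only, not the paper's implementation: the record fields, the function names (`clean`, `select`, `mine`, `run_pipeline`), and the threshold of 140 are all hypothetical choices made for the example, and the threshold is not clinical guidance.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from mixed sources.
RAW_RECORDS = [
    {"patient_id": 1, "age": 54, "systolic_bp": 150, "note": "follow-up"},
    {"patient_id": 2, "age": 37, "systolic_bp": None, "note": ""},  # missing value
    {"patient_id": 1, "age": 54, "systolic_bp": 150, "note": "follow-up"},  # duplicate
    {"patient_id": 3, "age": 61, "systolic_bp": 165, "note": "new"},
    {"patient_id": 4, "age": 29, "systolic_bp": 118, "note": "new"},
]


def clean(records):
    """Pre-processing layer: drop records with missing values and exact duplicates."""
    seen, cleaned = set(), []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def select(records, fields):
    """Selection step: keep only the attributes relevant to the mining task."""
    return [{field: rec[field] for field in fields} for rec in records]


def mine(records, threshold=140):
    """Mining-engine stand-in: summarize readings and flag those at or above
    an illustrative (assumed) threshold."""
    flagged = [r["patient_id"] for r in records if r["systolic_bp"] >= threshold]
    return {
        "mean_systolic_bp": mean(r["systolic_bp"] for r in records),
        "flagged_patients": flagged,
    }


def run_pipeline(raw):
    """Chain the layers: raw data -> cleaning -> selection -> mining."""
    cleaned = clean(raw)
    selected = select(cleaned, ["patient_id", "systolic_bp"])
    return mine(selected)


if __name__ == "__main__":
    print(run_pipeline(RAW_RECORDS))
```

Keeping each layer as a separate function mirrors the layered description: a real system would replace `mine` with an actual mining tool and pass its result on to a presentation layer.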
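The discretization techniques listed earlier in this section (binning, histogram buckets, and entropy) can be contrasted with a small sketch. It assumes equal-width buckets for the unsupervised case and a single entropy-minimizing cut point for the supervised case; the paper does not specify either variant, and the function names and sample data are hypothetical.

```python
from collections import Counter
from math import log2


def equal_width_bins(values, k):
    """Unsupervised binning: assign each value to one of k equal-width buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all values are equal
    return [min(int((v - lo) / width), k - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_entropy_split(values, labels):
    """Supervised, top-down step: choose the cut point that minimizes the
    weighted entropy of the class labels on either side of the cut."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut, best_score


if __name__ == "__main__":
    # Hypothetical ages with a made-up risk label, for illustration only.
    ages = [22, 25, 31, 38, 45, 52, 60, 67]
    risk = ["low", "low", "low", "low", "high", "high", "high", "high"]
    print(equal_width_bins(ages, 3))
    print(best_entropy_split(ages, risk))
```

The unsupervised buckets ignore the labels entirely, while the entropy-based cut uses them, which is the distinction the list draws between binning or histograms and entropy. Applying `best_entropy_split` again inside each resulting interval would give the top-down behaviour.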
B. 4G and foundation of 5G
computing. However, as discussed earlier, no technology can work in isolation; each must work in tune with all the others for an optimized output.

E. Big Data

Data will be the raw material required for any relevant application. However, collecting, segregating, and storing the data can be handled by another concept: Big Data.

F. Wireless Sensor Networks

An easy and effective solution on the application side can be provided by wireless sensor networks [12-19], wherein multiple sensors connected through a sink node to the cloud can efficiently improve productivity in various areas, for example healthcare, agriculture, mining, security, and many more. A wireless body area network can have significant implications in healthcare: a personalized device can provide information related to the individual's health, thereby reducing the possibility of disease manifesting, i.e., moving from curing to developing a healthy lifestyle.

VI. CHALLENGES AHEAD IN HEALTHCARE INDUSTRY

1. Data: Data itself is a challenge in realizing personalized healthcare. The issues are summarized below:

a. Data Size: The amount of data collected by the sensors poses a challenge in storing and transmitting it. It also adds to data redundancy. A possible solution is to design an intelligent system that extracts significant changes in the data and transmits only those extreme changes to the cloud through the sink.

b. Data Type: As the sensors are attached to different parts of the body and collect varied aspects of human physiology, the segregation, network space, internet, and transmission size differ. To address this issue, differentiating these data types is important to ensure data throughput and faster delivery of critical information, which could be life-threatening.

c. Critical Data: To ensure the safety of the individual in crunch situations, one has to be sure that critical data is transmitted on a priority basis. To provide a better routing platform for critical data, extraction of such data is of utmost importance and needs to be addressed.

2. Energy: A wearable device [12] needs an energy source such as a battery, and since it should be comfortable for the individual, its size needs to be small. Hence it becomes critical to optimize the device without restricting the usage of such personalized devices in constrained environments. Network stability, packet loss ratio, and throughput, along with network strength, are important factors in ensuring energy-efficient transmission. Nano-materials for the Internet of Nano Things can help in ensuring low energy requirements.

3. Security and Privacy: One major challenge the medical community faces in research and development is the privacy concern [24][25] of the individual. Government intervention is required in such cases; at the same time, researchers have to provide a safe and secure environment that ensures data security, to create awareness and trust in society. An individual's data can pose a life-threatening challenge if an online prescription or medication is altered by an intruder. Cyber laws in this direction need to be made so as to secure the data. Wearable technology is the future, but it requires a lot of effort to bring it to trustworthy application, as the matter we are dealing with is one of life and death.

4. Medical Accreditation: There are a number of medical devices in the market; however, many of them are not medically accredited. So, to ensure that devices work under a standard protocol, the government has to ensure a global standard for wearable personalized devices and their scope.

5. Other Services: A wearable device is not self-functioning. It requires a range of technologies working collectively and efficiently, without fault, to extract and transmit accurate information, along with a platform from which the medical community can extract meaningful information at the earliest, with ease, so that decision making is precise. Achieving such an environment is a challenging task ahead for researchers and scientists.

VII. CONCLUSION

Data mining has significant applications in healthcare, ranging from disease diagnosis and treatment to drug discovery and healthcare management. The healthcare industry is generating vast amounts of data every day, and data mining techniques can help extract meaningful insights from this data.

One of the most important applications of data mining in healthcare is disease diagnosis and prediction. Data mining
algorithms can be used to analyze patient data and identify patterns that can be used to predict the likelihood of certain diseases. This can help healthcare providers to intervene early and provide timely treatment, resulting in better patient outcomes. Another important application of data mining in healthcare is drug discovery. Data mining can help researchers analyze large volumes of data from clinical trials and identify potential drug candidates that are safe and effective. This can help accelerate the drug discovery process and lead to the development of new treatments for a range of diseases. Data mining can also be used to improve healthcare management. By analyzing data from electronic health records, healthcare providers can identify areas where they can improve their services and reduce costs. This can help healthcare organizations to operate more efficiently and provide better care to their patients. In conclusion, data mining has significant applications in healthcare and has the potential to revolutionize the way healthcare is delivered. By using data mining techniques to analyze large volumes of data, healthcare providers can improve patient outcomes, accelerate drug discovery, and improve healthcare management.

REFERENCES

[1] S. A. Alasadi and W. S. Bhaya, “Review of data preprocessing techniques in data mining,” J. Eng. Appl. Sci., vol. 12, no. 16, pp. 4102–4107, 2017, doi: 10.3923/jeasci.2017.4102.4107.
[2] J. Santos-Pereira, L. Gruenwald, and J. Bernardino, “Top data mining tools for the healthcare industry,” J. King Saud Univ. - Comput. Inf. Sci., vol. 34, no. 8, pp. 4968–4982, 2022, doi: 10.1016/j.jksuci.2021.06.002.
[3] N. Jothi, N. A. Rashid, and W. Husain, “Data Mining in Healthcare - A Review,” Procedia Comput. Sci., vol. 72, pp. 306–313, 2015, doi: 10.1016/j.procs.2015.12.145.
[4] S. C. Pandey, “Data mining techniques for medical data: A review,” Int. Conf. Signal Process. Commun. Power Embed. Syst. SCOPES 2016 - Proc., pp. 972–982, 2017, doi: 10.1109/SCOPES.2016.7955586.
[5] Kaur, B., Batra, S., Arora, V. (2019). ERA-AODV for MANETs to Improve Reliability in Energy-Efficient Way. In: Panigrahi, B., Trivedi, M., Mishra, K., Tiwari, S., Singh, P. (eds) Smart Innovations in Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 669. Springer, Singapore. https://doi.org/10.1007/978-981-10-8968-8_17.
[6] Xie, Y., Beram, S. M., Kaur, B., Neware, R., Rakhra, M., & Koundal, D. (2023). Research on Visualization of Large-scale User Association Feature Data Based on Nonlinear Dimension Reduction Method. Journal of Mobile Multimedia, 587-602.
[7] S. Neelamegam and E. Ramaraj, “Classification algorithm in Data mining: An Overview,” Int. J. P2P Netw. Trends Technol., vol. 4, no. 8, pp. 369–374, 2013.
[8] A. Zimek and P. Filzmoser, “There and back again: Outlier detection between statistical reasoning and data mining algorithms,” Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 8, no. 6, pp. 1–26, 2018, doi: 10.1002/widm.1280.
[9] S. Agrawal and J. Agrawal, “Survey on anomaly detection using data mining techniques,” Procedia Comput. Sci., vol. 60, no. 1, pp. 708–713, 2015, doi: 10.1016/j.procs.2015.08.220.
[10] G. Kesavaraj and S. Sukumaran, “06726842,” 2013.
[11] I. Journal, “IJERA (www.ijera.com)”.
[12] Y. Qu, G. Zheng, H. Ma, X. Wang, B. Ji, and H. Wu, “A survey of routing protocols in WBAN for healthcare applications,” Sensors (Switzerland), vol. 19, no. 7, 2019, doi: 10.3390/s19071638.
[13] H. Joudaki et al., “Using data mining to detect health care fraud and abuse: a review of literature,” Glob. J. Health Sci., vol. 7, no. 1, pp. 194–202, 2015, doi: 10.5539/gjhs.v7n1p194.
[14] D. P. Shukla, S. B. Patel, and A. K. Sen, “A Literature Review in Health Informatics Using Data Mining Techniques,” Int. J. Softw. Hardw. Res. Eng., vol. 2, no. 2, pp. 123–129, 2014.
[15] C. C. Aggarwal, Managing and mining sensor data, vol. 9781461463. 2014. doi: 10.1007/978-1-4614-6309-2.
[16] H. Banaee, M. U. Ahmed, and A. Loutfi, “Data mining for wearable sensors in health monitoring systems: A review of recent trends and challenges,” Sensors (Switzerland), vol. 13, no. 12, pp. 17472–17500, 2013, doi: 10.3390/s131217472.
[17] M. K. Dath, M. Rakhra, D. Singh, A. Singh and R. Banala, “Basic design for the implementation of automatic surveillance system on helmet detection,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1-5, doi: 10.1109/AIST55798.2022.10065367.
[18] M. D. K. Islam Zim, M. Rakhra, D. Singh and A. Singh, “Noise Reduction and dehazing of Visual Data,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1-5, doi: 10.1109/AIST55798.2022.10065069.
[19] S. Katoch, M. Rakhra and D. Singh, “Recognition Of Handwritten English Character Using Convolutional Neural Network,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1-6, doi: 10.1109/AIST55798.2022.10064860.
[20] M. A. Hanif, H. Kaur, M. Rakhra and A. Singh, “Role of CBIR In a Different fields-An Empirical Review,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1-7, doi: 10.1109/AIST55798.2022.10064825.
[21] R. Kumar Shukla, M. Rakhra, D. Singh and A. Singh, “The Role of Machine Learning in Health Care Diagnosis,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1-6, doi: 10.1109/AIST55798.2022.10064906.
[22] A. Pandey, A. Singh, G. Singh and M. Rakhra, “Using AI and IoT to assess the efficacy of English-language curricula in higher education: A Proposed Method,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1-6, doi: 10.1109/AIST55798.2022.10065284.
[23] A. Singh and M. Rakhra, “A Review For Different Sign Language Recognition Systems,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1-6, doi: 10.1109/AIST55798.2022.10065037.
[24] A. Ansari, B. Kaur, M. Rakhra, A. Singh and D. Singh, “Handwritten Text Recognition using Deep Learning Algorithms,” 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 2022, pp. 1-6, doi: 10.1109/AIST55798.2022.10065348.
[25] J. Iavindrasana, G. Cohen, A. Depeursinge, H. Müller, R. Meyer, and A. Geissbuhler, “Clinical data mining: a review,” Yearb. Med. Inform., pp. 121–133, 2009, doi: 10.1055/s-0038-1638651.
[26] S. G. Alonso et al., “Data Mining Algorithms and Techniques in Mental Health: A Systematic Review,” J. Med. Syst., vol. 42, no. 9, 2018, doi: 10.1007/s10916-018-1018-2.
[27] I. Yoo et al., “Data mining in healthcare and biomedicine: A survey of the literature,” J. Med. Syst., vol. 36, no. 4, pp. 2431–2448, 2012, doi: 10.1007/s10916-011-9710-5.