AI Paradigms: Data vs. Model-Centric

FEATURE ARTICLE: AI

Technical Analysis of Data-Centric and Model-Centric Artificial Intelligence
Abdul Majeed and Seong Oun Hwang, Gachon University, Seongnam, 13120, South Korea

The artificial intelligence (AI) field is going through a dramatic revolution in terms of new
horizons for research and real-world applications, but some research trajectories in AI are
becoming detrimental over time. Recently, there has been a growing call in the AI
community to combat a dominant research trend named model-centric AI (MC-AI), which
only fiddles with complex AI codes/algorithms. MC-AI may not yield desirable results when
applied to real-life problems like predictive maintenance due to limited or poor-quality data.
In contrast, a relatively new paradigm named data-centric AI (DC-AI) is becoming more
popular in the AI community. In this article, we discuss and compare MC-AI and DC-AI in
terms of basic concepts, working mechanisms, and technical differences. Then, we highlight
the potential benefits of the DC-AI approach to foster further research on this recent
paradigm. This pioneering work on DC-AI and MC-AI can pave the way to understand the
fundamentals and significance of these two paradigms from a broader perspective.

IT Professional, published by the IEEE Computer Society, November/December 2023.
1520-9202 © 2023 IEEE. Digital Object Identifier 10.1109/MITP.2023.3322410.
Date of current version 12 January 2024.

Artificial intelligence (AI) is a transformative technology with a wide range of practical
applications in diverse sectors such as health care, defense, cybersecurity, and robotics. During
the COVID-19 pandemic, AI was widely used to forecast daily case tallies, gauge the efficacy of
interventions, trace hidden routes of transmission, predict the course of the epidemic, and
analyze trends, to name just a few applications [1]. Recently, researchers/practitioners have
been expanding the horizon of AI applications from simple problems to global issues such as
climate change. Addressing such global issues by utilizing AI will have a very big impact on
people around the globe [2]. Apart from these applications, the amalgamation of AI with
technologies like blockchain, edge computing, and other Industry 4.0 technologies is rapidly
increasing and has opened up various innovative use cases. Based on the discussion here, it is
fair to say that AI is on its way to becoming a very helpful tool for humans to execute many
tasks.

Recently, AI has emerged as a strong competitor to humans, as it can perform many tasks in less
time and at lower cost than humans. More efforts are underway in developing artificial general
intelligence systems (i.e., systems that can closely mimic the way a human performs tasks). As a
result, a large number of new AI models, architectures, and low-code/no-code tools have been
developed. The abilities of machine learning (ML) models are expanding from
classification/prediction tasks to predictive maintenance and other complex tasks [3]. Deep
learning models combined with the Internet of Things (IoT) and other technologies are helping to
combat the shortage of experts and resources in health-care sectors [4]. Also, advancements in
federated learning (FL) and contrastive learning are improving the privacy and usability of data.
Developments in generative AI are assisting in curating more data to compensate for the
deficiency of data and to improve the results of AI models. The latest generative AI tools, such
as ChatGPT, have many innovative use cases (e.g., code writing, scientific paper writing,
answering questions, virtual assistants, and so on) [5]. The forthcoming wave of AI will bring
more powerful and innovative tools for diverse sectors.

Before the inception of data-centric AI (DC-AI), most of the efforts were put into a
model-centric AI (MC-AI) approach that puts special focus on improving the architectural aspects
of AI models (e.g., modifying the network architecture, switching to a new model, reducing model
size, and hyperparameter tuning). Using this approach, when an AI model fails to yield the
required performance, developers only improve the architectural aspects. This approach might not
work in scenarios where data are limited (or are of poor quality) and where further data
acquisition is difficult due to a limited budget. Another main drawback of this approach is 2×
the data,
meaning that if an AI model fails to yield the required accuracy, developers get more data
irrespective of the fact that only a few images/features might be faulty. This can waste time and
effort and increase computing overhead.

Thanks to the DC-AI approach advocated by Prof. Andrew Ng, the lack of large datasets can often
be overcome by rigorously applying DC-AI [6]. In this approach, when an AI model gives poor
performance, developers need to inspect the data as well, rather than solely improving the code.
DC-AI can overcome the potential drawbacks of the MC-AI approach and reduce overhead by
collecting the required images/features, rather than simply doubling the data. Furthermore, DC-AI
can increase the accuracy of convolutional neural network (CNN) models by using even less, but
good-quality, data [7]. It can be widely applicable to scenarios where data are not abundant or
when getting more data is difficult. Thus far, very little is known about these two paradigms,
and a concrete overview of their workflow and key differences remains unexplored. The main
contributions of this work are summarized as follows:

• We explore two schools of thought (DC-AI and MC-AI) concerning the development of AI
  technology, and we identify opportunities to provide concrete technical details and insights
  about them. Specifically, we present a technical analysis of the DC-AI and MC-AI paradigms, and
  we highlight the key differences between them.
• We pinpoint and describe six dimensions to systematically highlight the MC-AI approach of AI
  developments that remain unexplored in the current literature.
• We analyze different techniques that can be vital in realizing DC-AI, and we group them into
  three levels to systematically demonstrate what DC-AI entails.
• We demonstrate the potential benefits of DC-AI when solving many key issues in the current AI
  technology. To the best of our knowledge, this is the first work centering on DC-AI and MC-AI,
  and it can provide a good foundation for future research in this line of work.

This work's four differences from the published article [6] are that 1) it identifies and
discusses the categories of MC-AI developments from the perspective of six dimensions, 2) it
identifies and groups techniques that can be vital to enhancing data quality and realizing DC-AI,
3) it provides the workflow of MC-AI and DC-AI when solving a real-world problem using AI, and
4) it pinpoints the potential benefits of DC-AI from a broader perspective than previously
anticipated.

MC-AI AND DC-AI
Figure 1 illustrates the workflows of both DC-AI and MC-AI paradigms in real-world scenarios. We
define both paradigms as follows:

• In MC-AI, developers usually pay more attention to optimizing the model's codes while rarely
  inspecting the data. MC-AI can be formally expressed in

  \[ \text{MC-AI} = C' + D \tag{1} \]

• In DC-AI, developers need to look into the data, along with iteratively improving algorithms
  and/or codes. Specifically, developers should iteratively investigate and enhance the data,
  along with tweaking the AI model. DC-AI can be formally expressed in

  \[ \text{DC-AI} = C + D' \tag{2} \]

In (1) and (2), $C$ and $D$ refer to code and data, respectively. The prime sign ($'$) over $C$
and $D$ indicates the priorities in the respective paradigm. In real settings, $C$ is the code of
any AI model, and $D$ is data enclosed in any modality (e.g., table, images, text, and so forth).
In MC-AI, developers improve the codes/algorithms only when the AI model yields poor results
[step 8 in Figure 1(a)]. In contrast, both the data and the AI model's codes/algorithms are
jointly inspected in DC-AI when the AI model yields poor results [step 8 in Figure 1(b)]. Also,
data are significantly improved in step 4 before being fed into AI models.

SIX NOTEWORTHY DIMENSIONS OF RESEARCH/DEVELOPMENTS IN MC-AI
MC-AI has significantly contributed to advancing the technical potency of AI when solving many
real-life problems. Some of the major problems are natural language processing, emotion
detection, human activity recognition, and pandemic mitigation [8], [9]. Researchers have
explored MC-AI from multiple perspectives, but most of those are related to improving the
architectural aspects (e.g., the code of AI models). To provide a clear overview of developments
concerning MC-AI, we classify major research/developments into the following six broad
dimensions:

1) Ever-expanding horizons of AI applications: In the beginning, AI was mostly confined to the
computer science field and was used/investigated by
computer scientists. However, with time, AI has expanded into many other disciplines. Currently,
in most sectors, AI has been rigorously used to accomplish multiple objectives, and AI has taken
over some jobs from humans, such as gauging the amount of liquid in water/wine bottles.
Specifically, AI applications in the health-care sector are booming. The recent pandemic sparked
the use of AI in the health-care sector [1]. Because the data used to train AI models can vary
from application to application, developers need to pay ample attention to the data for each
particular application.

FIGURE 1. Workflows of (a) MC-AI and (b) DC-AI when adopted to solve real-life problems.

2) Advancements in network architectures: In the early days of AI development, the mapping of
input ($X$) to output ($Y$) was governed by a few intermediate layers. In the perceptron model,
computer scientists are interested in determining whether or not a mathematical function can map
a vector of numbers to some specific target class. However, these methods are simple and require
much greater human involvement. The landscape of AI models changed, and computer scientists
became interested in limiting human involvement, so the perceptron evolved into neural networks
that require less feature engineering (or human involvement). Consequently, a simple perceptron
model that can work well with simple data was enhanced to a complex neural network to solve
problems in which the input can be images/videos. In other words, a simple binary mathematical
function was replaced with layers and channels, and multiple interactions performed between
layers determine the output. These enhancements in the network architecture improved the
technical status of AI technology. Today, a mammoth number of network architectures exist, which
can be applied to a wide range of real-world problems.
TABLE 1. Comparative analysis of AI models (adopted from Anwar [12]).

Network (year)      Salient feature   Parameters   Top-five accuracy   FLOPs
----------------------------------------------------------------------------
AlexNet (2012)      Deeper            62 M         84.7%               1.5 B
VGGNet (2014)       FS kernels        138 M        92.3%               19.6 B
Inception (2014)    W-P kernels       6.4 M        93.3%               2 B
ResNet-152 (2015)   SC connections    60.3 M       95.51%              11 B

FLOPs: floating-point operations; FS: fixed-size; W-P: wider-parallel; SC: shortcut; M: million; B: billion.

3) Optimization (pruning) in the architecture: Now, there are plenty of AI models, each having a
different workflow in terms of network size and architecture, data processing mechanism, number
and types of parameters, convergence speed, and so on. However, in some cases, AI models need to
operate on resource-constrained and tiny devices, like microcontroller units, and deploying
complex AI models (e.g., a CNN) on such devices can be impossible. To this end,
researchers/developers have explored various ways to reduce the size of AI models so that they
can operate in resource-constrained environments/devices [10]. Pruning and quantization
techniques have been devised to remove redundant parameters/weights, skip some layers, and/or
combine multiple layers to reduce the AI model's size [11]. Through such code-based techniques,
the memory footprint and CPU time of AI models have been reduced significantly.

4) New model development: The AI community has shown ever-increasing interest in the development
of new AI models from slight modifications to previous models. The main motivations behind these
developments are large parameter counts, computing overhead, deficiencies in salient feature
extraction, and model size. For example, the CNN was upgraded to a new variant named ResNet-18,
which is relatively deeper and has more layers. Consequently, this new model has greater
abilities in terms of extracting salient features from training data and can yield better
performance. However, ResNet-18 is prone to the vanishing gradient problem and has greater
complexity due to its many multiplication operations. Similarly, the inception AI model is needed
when important global features are distributed in more parts of the images, and a fixed kernel
size may not yield desirable results. The inception model goes wider, rather than as deep as
conventional CNN models. Thus far, many AI models have been proposed, and more developments are
expected in the near future. In Table 1, we provide a comparative analysis of famous AI models,
along with relevant technical details. From the analysis, we can see that AI developers are
rapidly advancing AI models to address issues stemming from AI use/governance.

5) Advancement in data modalities and AI model training: Before 2016, data acquisition at some
central place was necessary to train AI models. However, with the introduction of the FL concept,
data acquisition at some central place is no longer required, and AI models can still be trained
in a distributed manner [13]. With the inception of FL, AI developers have shifted toward new
data modalities and optimized ways of AI model training by using local data. Recently, many
challenges and problems with FL deployment concerning client selection, model distribution, and
privacy and security of the FL ecosystem have been observed by researchers [14]. Therefore,
developers are exploring ways to optimize the performance of FL ecosystems by utilizing improved
codes/algorithms.

6) Hyperparameter tuning to yield better performance: In most cases, developers try different
combinations of hyperparameters, such as batch size, α, sampling technique, filter size, cyclical
momentum, and so forth, to yield desirable results with AI models. The selection of optimal
combinations of hyperparameters for diverse applications is challenging and needs careful
implementation [15]. In conventional AI implementations, the trial-and-error method is adopted to
determine the optimal values of hyperparameters, leading to extensive computing overhead. In case
of poor performance, developers change either the hyperparameter values or network structure,
which can slow down the development of AI models.

In most of the aforementioned dimensions, the main focus of developers revolves around the code;
they rarely inspect the data. However, the data constitute a vital component in AI technology
development and significantly contribute to the quality of AI
systems. In Table 2, we summarize all six dimensions and highlight their main priorities. From
the analysis, we found that MC-AI gives minimal preference to the data, and therefore, fiddling
with code may not yield desirable solutions to many industrial problems.

TABLE 2. Six research dimensions centered on the MC-AI approach.

Dimension number   Main priority (or focus)                            Data investigation
-----------------------------------------------------------------------------------------
1                  Ever-expanding horizons of AI applications          Yes
2                  Improving network architectures of AI models        Rare
3                  Pruning/optimizing the network architecture         Rare
4                  Proposing new models of AI (or upgraded versions)   Rare
5                  Devising new modalities and AI model training       Rare
6                  Improving hyperparameters of AI models              Rare

THREE-LEVEL DC-AI PARADIGM
DC-AI is a very recent paradigm that explores ways to improve data quality to enhance the
performance of AI models [16], [17]. Table 3 presents the core approaches of DC-AI.

TABLE 3. Salient approaches of the DC-AI paradigm that can make AI more effective.

Level 3   Level 2                          Level 1
---------------------------------------------------------------------------------------------
DFS       Data quality                     Consistency; Accuracy; Completeness; Timeliness;
                                           Metadata; Effectiveness; Relevance
          Data availability                Readiness; Service-level agreement
          Data observability               Properties; Patterns; Metrics; States; Statistics
IDA       Visibility of all data           Higher knowledge about data; Data versioning
          Moving data to the right place   Effective data utilization; Efficiency
          Seamless access to data          Removing property lock-in
          Control on data                  Monitoring data flows
DC        Correct data use                 Define access levels; Document unfair practices
          Ethically compliant data use     Explain the risk of data misuse; Transparent models

DFS: data-first strategy; IDA: intelligent data architecture; DC: data compliance.

Specifically, we classify various approaches that can be employed as part of DC-AI into the
following three levels:

1) First level: This includes 24 basic approaches that fall under the DC-AI umbrella. Most of the
approaches can be applied to the initial phase of AI system development. For example, it is vital
to collect only relevant and necessary data concerning the problem, and there exists a
data-relevance approach at this level to ensure
it. Similarly, sensitive data classification and risk of misuse can be assessed to guarantee
better protection of sensitive data in the life cycle. Furthermore, analysis of data completeness
is desirable at this level to prevent performance-degradation issues in AI systems. Satisfying
most of the approaches at the first level can prevent inadvertently propagating data-specific
biases to the other levels, which can contribute to the development of effective AI systems to
solve real-world problems.

2) Second level: This level includes nine different approaches that are relatively more
sophisticated and advanced than those in the previous level. These approaches enhance the quality
of data and can be employed to determine whether the data are complete according to most of the
aspects. These approaches empower AI developers to have strong control over the data, and
therefore, all parts can be equally used in the training/development of AI systems. Most of the
approaches at this level are multicriteria, meaning that multiple coefficients can be used to
quantify the level of each approach. Equation (3) is an example of data quality estimation using
a multicriteria method

\[ D_q = w_1 \times \mathrm{Acc} + w_2 \times \mathrm{Con} + w_3 \times \mathrm{Com} + w_4 \times \mathrm{Met} + w_5 \times \mathrm{Tim} + w_6 \times \mathrm{Rel} + w_7 \times \mathrm{Eff} \tag{3} \]

where $D_q$ refers to data quality, and Acc, Con, Com, Met, Tim, Rel, and Eff denote accuracy,
consistency, completeness, metadata, timeliness, relevance, and effectiveness, respectively.
These parameters can be quantified using mathematical formulas or numerical scores given by
domain experts based on data judgment. The formula for computing Acc is expressed as

\[ \mathrm{Acc} = \frac{C}{A} \tag{4} \]

where $C$ denotes the number of samples that are recognized correctly, and $A$ denotes the total
number of samples in a dataset.

Similarly, the value of Con can be quantified using

\[ \mathrm{Con} = \frac{C_{\mathrm{index}}}{R_{\mathrm{index}}} = \frac{(\lambda_{\max} - n)/(n - 1)}{R_{\mathrm{index}}} \tag{5} \]

where $C_{\mathrm{index}}$ is the consistency index, $R_{\mathrm{index}}$ is the random
consistency index, and $n$ is the number of observations in the data. The value of
$R_{\mathrm{index}}$ is determined using a lookup table by passing $n$ as a parameter.

The Com value can be quantified via

\[ \mathrm{Com} = \left(1 - \frac{U}{n}\right) \times 100 \tag{6} \]

where $U$ denotes the number of missing values, and $n$ denotes the total number of entries. For
example, for a dataset having 500 records with 110 missing values, the Com is 78%. The Met is the
detailed information of a column/dataset. It is ideal to analyze the metadata before building ML
models. For example, the distribution skew is a very common problem in ML, and it can be computed
using

\[ \mathrm{Met} = \frac{c_M}{c_m} \tag{7} \]

where $c_M$ denotes the instances from the major class, and $c_m$ denotes the instances from the
minor class. In real cases, the distribution/frequencies of the column can be computed and used
in the Met.

The Tim can be quantified by taking the difference between data curation time ($T_c$) and data
use time ($T_u$)

\[ \mathrm{Tim} = T_c - T_u \tag{8} \]

The value of Tim can be compared with some threshold $t$ to decide about data
acceptance/unacceptance, expressed as

\[ \mathrm{Tim} = \begin{cases} \text{unacceptable}, & \text{if } \mathrm{Tim} \geq t \\ \text{acceptable}, & \text{otherwise} \end{cases} \tag{9} \]

The Rel can be quantified using

\[ \mathrm{Rel} = \frac{S_f}{T_f} \tag{10} \]

where $S_f$ denotes the salient features, and $T_f$ refers to the total number of features. Rel
can also be used to draw relevant samples out of the total samples.

The Eff can be quantified using

\[ \mathrm{Eff} = \frac{A}{D} \tag{11} \]

where $A$ is the achieved accuracy/data, and $D$ is the desirable accuracy/data.

In (3), $w_i$, where $i = 1$ to 7, denotes the weight of each parameter. The range of the $w_i$
coefficients is between zero and one (i.e., $w_i > 0$), and $\sum_{i=1}^{7} w_i = 1$. The optimal
values of the coefficients can be specified by domain experts or adjusted based on the
importance/problem. It is worth noting that some of the aforementioned
parameters can be quantified using built-in functions in some software (e.g., Microsoft Excel) or
by assigning one/zero based on expertise. Also, data quality estimation can vary depending on the
data modality [18]. By satisfying most of these approaches, data-specific bias can be
significantly restrained.
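As a worked illustration of (3), (4), (6), (7), (10), and (11), the sketch below computes several of the individual criteria and then combines seven scores using weights that are positive and sum to one. It is a minimal sketch under stated assumptions, not the authors' implementation: the article does not prescribe how criteria on different scales are brought together, so the example assumes every score has already been normalized to [0, 1]. All function names, and every numeric input other than the 500-record completeness example, are hypothetical.

```python
from typing import Mapping

CRITERIA = ("Acc", "Con", "Com", "Met", "Tim", "Rel", "Eff")


def accuracy(correct: int, total: int) -> float:
    """Eq. (4): Acc = C / A."""
    return correct / total


def completeness(missing: int, total: int) -> float:
    """Eq. (6): Com = (1 - U / n) * 100, returned as a percentage."""
    return 100.0 * (total - missing) / total


def class_skew(major: int, minor: int) -> float:
    """Eq. (7): Met = c_M / c_m (1.0 means perfectly balanced classes)."""
    return major / minor


def relevance(salient: int, total_features: int) -> float:
    """Eq. (10): Rel = S_f / T_f."""
    return salient / total_features


def effectiveness(achieved: float, desired: float) -> float:
    """Eq. (11): Eff = A / D."""
    return achieved / desired


def data_quality(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Eq. (3): weighted sum of the seven criteria.

    Assumption (not from the article): each score is already scaled to [0, 1].
    """
    if set(scores) != set(CRITERIA) or set(weights) != set(CRITERIA):
        raise ValueError("scores and weights must cover exactly the seven criteria")
    if any(w <= 0 for w in weights.values()):
        raise ValueError("every weight must be greater than zero")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(weights[name] * scores[name] for name in CRITERIA)


# The article's example: 500 records with 110 missing values.
com_percent = completeness(missing=110, total=500)  # 78.0

# Hypothetical, already-normalized scores for the remaining criteria.
scores = {
    "Acc": accuracy(correct=450, total=500),
    "Con": 0.8,
    "Com": com_percent / 100.0,
    "Met": 0.7,
    "Tim": 1.0,
    "Rel": relevance(salient=12, total_features=20),
    "Eff": effectiveness(achieved=0.85, desired=0.90),
}
equal_weights = {name: 1.0 / len(CRITERIA) for name in CRITERIA}
d_q = data_quality(scores, equal_weights)
print(f"Com = {com_percent:.1f}%, D_q = {d_q:.3f}")
```

Equal weights are used only for simplicity; as the text notes, domain experts would normally set the weights according to the problem at hand.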
3) Third level: This level includes three distinct approaches (also known as building blocks)
concerning DC-AI. These approaches are more cutting edge and advanced than the bottom and
intermediate levels. In these blocks, all of the previous 33 approaches are analyzed, and further
opportunities are explored to improve data quality. In addition, the decision can be made to
reassess the downstream approaches depending on the problem at hand. For example, in some cases,
not all data are sensitive, and therefore, less attention can be paid to the data-compliance
block compared to the other two blocks. Similarly, if we need to address only social problems in
AI systems, then the data-first strategy requires closer attention compared to the other two
blocks. In some cases, pilot projects (or prototypes) can be developed to assess the efficacy of
these components before building the actual product/system. It is important to note that some
approaches may not be needed all the time, and therefore, further investigation is needed to
choose suitable DC-AI approaches in real-world scenarios.

By using all the approaches encompassed in these three levels, good-quality data can be curated.

POTENTIAL BENEFITS OF DC-AI
DC-AI explores various ways to make AI technology more effective for human beings. In the future,
it can bring many benefits to AI developers and consumers. We summarize the potential benefits of
DC-AI in Figure 2.

FIGURE 2. Potential benefits of DC-AI.

For example, DC-AI can reduce computing overhead by identifying faulty parts of the data and
fixing (or augmenting) them rather than doubling the data, as MC-AI does. It can foster AI's
transition from academic labs to the market by exploiting the benefits of pretrained models to
the extent possible, rather than rebuilding models from scratch. It can extend AI adoption to
multiple domains involving limited data. DC-AI can also enhance accuracy by correctly labeling
the data and involving multiple domain experts in the labeling process. It can make AI technology
more accessible and understandable to the nonexpert. It can extend the AI models' lifespan by
paying ample attention to data drift during development. DC-AI can increase robustness by
ensuring data availability and control. It can help solve longstanding problems in AI (unfair
decisions, explainability, trust, and so on) by applying data engineering practices. It can
likely help address the global challenges of climate change and supply-chain disruptions through
data-tailored actions. Finally, it can assist with understanding the curious nature of AI models
by explaining how data are used in them. It can contribute to the responsible use of AI, which is
an urgent requirement amid the rapid rise in AI applications.

Finally, DC-AI can effectively contribute to efficiently extracting causal and temporal relations
from a limited time-series corpus. It can contribute to developing lightweight models that can
create general causal graphs as a representation that can be used to forecast future time-series
data/values. DC-AI provides a means to curate a synthetic time-series corpus, which can be used
to enhance the performance of AI models when combined with the original time-series corpus. DC-AI
can also contribute to causal-features extraction in a short time, which improves the robustness
and computing efficiency of applications involving a limited time-series corpus.

DISCUSSION
Outcomes of the Research
This work uncovers the details of MC-AI and DC-AI for the reader, which can pave the way for
understanding these two schools of thought used for AI performance
enhancement. The benefits of DC-AI explored in this CONCLUDING REMARKS


work can contribute to making AI more beneficial, in
This article provided an in-depth analysis of MC-AI and
particular, overcoming the societal risks of AI, which
DC-AI, which are two main research trends in AI tech-
are debatable issues around the globe. It pinpoints
nology development. MC-AI is a widely used approach
that DC-AI can likely open the black-box nature of AI,
that leverages AI to solve real-world problems, but it
which may assist with understanding the workflow of
mostly focuses on improving only the code in AI models.
most AI models. Explainable and fair AI systems are an
Recently, many ill effects of MC-AI, such as limited appli-
urgent need in some specific sectors, such as health
cability in data-constrained domains, societal risks, and
care. It provides a new perspective toward building
extensive computing overhead, have been observed that
more data engineering techniques as well as systemat-
require urgent solutions to develop lightweight, safe, and
ically applying DC-AI techniques that can extend the
reliable AI solutions. DC-AI is a fledgling paradigm and
application horizon of AI. It provides workflows for
DC-AI systems, which can allow the development of pro- will provide the means to fully/partially resolve most of
totypes and proofs of concept. Finally, this work aligns the drawbacks in MC-AI by rigorously improving one of
with recent trends toward making data AI ready and the key elements (i.e., data) of the AI ecosystem, rather
improving data quality before feeding them to AI models. than improving only the code. Exploring ways to properly
amalgamate these two approaches by identifying suit-
Guidelines Regarding the Use Cases able application scenarios will enhance the AI transition
for Which DC-AI Should Be Used from academic labs to the market and eventually solve
DC-AI can be applied in any situation, however, it is many longstanding problems in conventional AI. Our
more suitable to those scenarios involving fewer, none, work is an initial step toward representing the efficacy of
and/or low-quality data. Furthermore, it is an ideal can- DC-AI, which can pave the way for enabling AI for social
didate when getting more data is either difficult or good.
data collection budgets are very small. It is also handy
when data are coming from diverse sources and are in ACKNOWLEDGMENT
different modalities (i.e., tables, time series, and so on). This work was supported by the National Research
DC-AI is also needed in scenarios where outcomes/ Foundation of Korea and the Korea government
decisions are made solely based on training data such through Grant 2020R1A2B5B01002145.
as chatbot.19 DC-AI is inevitable in sensor-powered sys-
tems such as IoT-based medical diagnosis systems
and predictive maintenance. For example, in the pre- REFERENCES
dictive maintenance use case, it can contribute to data 1. M. Al-Hashimi and A. Hamdan, “The applications of
alignment, consistency, and fusion, which can be use- artificial intelligence to control COVID-19,” in
ful in identifying faulty machines proactively. Autono- Advances in Data Science and Intelligent Data
mous vehicles collect data from various sensors to Communication Technologies for COVID-19, A. E.
make reliable decisions in real time. To this end, DC-AI Hassanien, S. M. Elghamrawy, and I. Zelinka, Eds.
can play a vital role in preventing data incidents. In a Cham, Switzerland: Springer, 2022, pp. 55–75.
climate-change scenario, it can ensure relevant data 2. W. Leal Filho et al., “Deploying artificial intelligence
collection and quality enrichment, which can help bet- for climate change adaptation,” Technological
ter understand the dynamics of climate change. In Forecasting Social Change, vol. 180, Jul. 2022,
TinyML, it can prevent the possibility of data drift and Art. no. 121662, doi: 10.1016/j.techfore.2022.
extensive retraining of AI models. It is also very impor- 121662.
tant in equity and inclusion scenarios to prevent con- 3. O. Surucu, S. A. Gadsden, and J. Yawney, “Condition
flict by curating diverse data. Recently, it has been monitoring using machine learning: A review of
used to detect fraud in health-care applications by sim- theory, applications, and recent advances,” Expert
ply preparing and understanding data.20 Similarly, there Syst. Appl., vol. 221, Jul. 2023, Art. no. 119738, doi: 10.
are many use cases where DC-AI can be used, such as 1016/j.eswa.2023.119738.
language models, time-series forecasting, epidemic 4. Z. Lv, J. Guo, and H. Lv, “Deep learning-empowered
analysis, human activity recognition, and so forth. clinical big data analytics in healthcare digital twins,”
Finally, data are the cornerstone for AI developments, IEEE/ACM Trans. Comput. Biol. Bioinf., early access,
and therefore, DC-AI can vastly contribute to enhancing 2023, doi: 10.1109/TCBB.2023.3252668.
their quality, leading to significant performance enhance- 5. A. Bahrini et al., “ChatGPT: Applications, opportunities,
ment in AI. and threats,” in Proc. IEEE Syst. Inf. Eng. Des. Symp.


(SIEDS), 2023, pp. 274–279, doi: 10.1109/SIEDS58326.2023.10137850.
6. E. Strickland, “Andrew Ng, AI minimalist: The machine-learning pioneer says small is the new big,” IEEE Spectr., vol. 59, no. 4, pp. 22–50, Apr. 2022, doi: 10.1109/MSPEC.2022.9754503.
7. E. Jeczmionek and P. A. Kowalski, “Input reduction of convolutional neural networks with global sensitivity analysis as a data-centric approach,” Neurocomputing, vol. 506, pp. 196–205, Sep. 2022, doi: 10.1016/j.neucom.2022.07.027.
8. S. Geravesh and V. Rupapara, “Artificial neural networks for human activity recognition using sensor based dataset,” Multimedia Tools Appl., vol. 82, no. 10, pp. 14,815–14,835, Apr. 2023, doi: 10.1007/s11042-022-13716-z.
9. A. H. Shamman, A. A. Hadi, A. R. Ramul, M. M. A. Zahra, and H. M. Gheni, “The artificial intelligence (AI) role for tackling against COVID-19 pandemic,” Mater. Today, Proc., vol. 80, pp. 3663–3667, 2023, doi: 10.1016/j.matpr.2021.07.357.
10. Q. Huang, “Weight-quantized SqueezeNet for resource-constrained robot vacuums for indoor obstacle classification,” AI, vol. 3, no. 1, pp. 180–193, 2022, doi: 10.3390/ai3010011.
11. J. Torres-Tello and S.-B. Ko, “Optimizing a multispectral-images-based DL model, through feature selection, pruning and quantization,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2022, pp. 1352–1356, doi: 10.1109/ISCAS48785.2022.9937940.
12. A. Anwar, “Difference between AlexNet, VGGNet, ResNet, and inception,” Medium-Towards Data Science, 2019. https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96
13. Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, and H. Yu, “Federated learning,” Synthesis Lectures Artif. Intell. Mach. Learn., vol. 13, no. 3, pp. 1–207, 2019, doi: 10.2200/S00960ED2V01Y201910AIM043.
14. A. Majeed, X. Zhang, and S. O. Hwang, “Applications and challenges of federated learning paradigm in the big data era with special emphasis on COVID-19,” Big Data Cogn. Comput., vol. 6, no. 4, 2022, Art. no. 127, doi: 10.3390/bdcc6040127.
15. L. Wu, G. Perin, and S. Picek, “I choose you: Automated hyperparameter tuning for deep learning-based side-channel analysis,” IEEE Trans. Emerg. Topics Comput., early access, 2022, doi: 10.1109/TETC.2022.3218372.
16. L. Schmarje et al., “A data-centric approach for improving ambiguous labels with combined semi-supervised classification and clustering,” in Proc. Eur. Conf. Comput. Vision, Cham, Switzerland: Springer, 2022, pp. 363–380, doi: 10.1007/978-3-031-20074-8_21.
17. A. Majeed and S. O. Hwang, “Data-centric artificial intelligence, preprocessing, and the quest for transformative artificial intelligence systems development,” Computer, vol. 56, no. 5, pp. 109–115, May 2023, doi: 10.1109/MC.2023.3240450.
18. I. Taleb, M. A. Serhani, C. Bouhaddioui, and R. Dssouli, “Big data quality framework: A holistic approach to continuous quality management,” J. Big Data, vol. 8, no. 1, pp. 1–41, May 2021, doi: 10.1186/s40537-021-00468-0.
19. U. M. Fayyad, “From stochastic parrots to intelligent assistants—The secrets of data and human interventions,” IEEE Intell. Syst., vol. 38, no. 3, pp. 63–67, May/Jun. 2023, doi: 10.1109/MIS.2023.3268723.
20. J. M. Johnson and T. M. Khoshgoftaar, “Data-centric AI for healthcare fraud detection,” SN Comput. Sci., vol. 4, no. 4, 2023, Art. no. 389, doi: 10.1007/s42979-023-01809-x.

ABDUL MAJEED is an assistant professor with the Department of Computer Engineering, Gachon University, Seongnam, 13120, South Korea. His research interests include privacy-preserving data publishing, data-centric artificial intelligence, and federated learning. Majeed received his Ph.D. degree in computer information systems and networks from the Korea Aerospace University. Contact him at [email protected].

SEONG OUN HWANG is a professor with the Department of Computer Engineering, Gachon University, Seongnam, 13120, South Korea. His research interests include cryptography, data-centric artificial intelligence, and cybersecurity. Hwang received his Ph.D. degree in computer science from the Korea Advanced Institute of Science and Technology. He is a Senior Member of IEEE. He is the corresponding author of this article. Contact him at [email protected].
