Artificial Intelligence For Drug Toxicity and Safety: Columbia University Medical Center New York, NY
Abstract
Interventional pharmacology is one of medicine’s most potent weapons
against disease. These drugs, however, can result in damaging side effects
and must be closely monitored. Pharmacovigilance is the field of science that
monitors, detects, and prevents adverse drug effects. Safety efforts begin
during the development process, using in vivo and in vitro studies, continue
through clinical trials, and extend to post-marketing surveillance of ADRs in
real-world populations. Future toxicity and safety challenges, including
increased polypharmacy and patient diversity, stress the limits of these
traditional tools. Massive amounts of newly available data present an
opportunity for using artificial intelligence and machine learning to improve
drug safety science. Here, we explore recent advances as applied to pre-
clinical drug safety and post-marketing surveillance with a specific focus on
machine and deep learning approaches.
Keywords:
Pharmacovigilance, Machine Learning, Deep Learning, Adverse Drug
Reactions
The challenge of keeping drugs safe
Drug safety is a major challenge in bringing new drugs to market. Unexpected
toxicities are a major source of attrition during clinical trials and post-
marketing safety concerns cause unnecessary morbidity and mortality.
Adverse events (AEs), or adverse drug reactions (ADRs) when causality is
demonstrated, are unexpected effects occurring from a normal dosage of the
drug. Between 2008 and 2017, the Food and Drug Administration (FDA)
approved 321 novel drugsi. Over the same period of time, the FDA Adverse
Event Reporting System (FAERS)ii recorded more than 10 million AE reports,
among which 5.8 million were serious reports and 1.1 million were AEs
related to death. AEs burden our health system, causing 2 million hospital
stays each year and lengthening visits by 1.7 to 4.6 daysi. This economic,
social, and health burden makes toxicity and safety assessment an essential
and pressing public health concern.
There are two complementary systems to address drug safety (Figure 1).
Before a drug is approved, clinical trials ensure that this drug is safe and
effective for its intended use. Once a drug is marketed, drugs are monitored
through AE reports to ensure a drug’s safety information is up to date, a
process called pharmacovigilance (PV). However, neither of these processes
is error-proof, as clinical trials suffer from structural limitations. For
example, it is impossible to test for all potential synergistic effects or to
conduct trials on populations large enough to detect rare AEs. Until recently,
women and the elderly were considered special sub-groups for clinical trials.
These trials have focused on designing drugs for the average patient [1] even
at a time when there are increasing calls for precision medicine to enable the
“right drug at the right dose to the right patient”[2]. Once drugs are approved,
it is the purview of regulatory agencies to monitor drug safety. These agencies use
databases of spontaneously collected AE reports to flag leads and perform
confirmatory follow-up analyses. However, these spontaneous reports are
known to suffer from biases such as under-reporting which is especially
troublesome for rare events and drug-drug interactions (DDIs)[3]. The
research community has turned to statistical and computational approaches
to address these limitations and supplement its PV toolbox [4, 5].
Over the past decade, we have seen two phenomena occur: (1) the
explosion of freely accessible databases of medical, chemical, and
pharmacological knowledge, along with the rapid adoption of electronic
health record (EHR) systems stimulated by the Health Information
Technology for Economic and Clinical Health (HITECH) Actiii; and (2) the
development of novel computational methods in the realm of machine
learning (ML) (see Glossary) and deep learning (DL) - a popular re-branding
of neural networks - catalyzed by an exponential increase in compute power and
data availability. Below, we explore the recent literature leveraging artificial
intelligence (AI) methods, both ML and DL, on novel data sources for pre-
clinical drug safety and post-marketing surveillance for PV (Figure 1). We further
encourage the reader to reference the following introductory reviews for
more details on ML [6, 7, 8] and DL[9, 10].
Regression
Early QSAR techniques relied on multivariate linear regression to assess
the chemical properties of drug candidates[17]. These approaches are
sensitive to high data dimensionality and feature correlation, which may
result in overfitting and limited interpretability. Modern regression-based
approaches incorporate feature selection techniques to address these
concerns. One such technique is the use of a penalized regression model. L1
regularization, which is used in least absolute shrinkage and selection
operator (LASSO), aims to prevent overfitting by reducing the number of
features and only selecting subsets that are most relevant to the QSAR
model prediction[18]. L2 regularization, which is used in ridge regression
(RR), aims to alleviate collinearity by reducing the effective number of
features used in the model. Recently, Algamal et al.[16] proposed a weight
adjustment to the adaptive LASSO aimed to improve the selection of
correlated descriptors. This approach demonstrated potential when used to
develop a QSAR prediction of the anti-cancer potency of various
imidazo[4,5-b]pyridine derivatives. The authors also proposed applying L1-
norm regularization in the selection of significant descriptors for anti-
hepatitis C virus activity of thiourea derivatives[19]. While regression-based
approaches have demonstrated utility in QSAR prediction, assumptions of
linearity, which are inherent in regression, as well as issues of
dimensionality affect most QSAR modelling tasks. Currently, the most
common alternatives are support vector machines (SVM) and ensemble
approaches, such as random forest, due to their high predictive accuracy,
robustness, and ease of interpretation[20].
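As an illustration, the penalized regressions described above can be sketched with scikit-learn. The synthetic descriptor matrix, coefficients, and penalty strengths below are illustrative and not drawn from the cited studies.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Synthetic QSAR-like data: 100 compounds, 50 molecular descriptors, with
# only the first 5 descriptors driving the (continuous) activity value.
X = rng.normal(size=(100, 50))
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=100)  # inject collinearity
y = X[:, :5] @ np.array([2.0, -1.5, 1.0, 0.5, -0.5]) + 0.1 * rng.normal(size=100)

# L1 (LASSO) shrinks irrelevant coefficients exactly to zero, acting as a
# built-in descriptor-selection step.
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

# L2 (ridge) keeps every descriptor but shrinks correlated coefficients,
# alleviating collinearity rather than discarding features.
ridge = Ridge(alpha=1.0).fit(X, y)

print("descriptors retained by LASSO:", len(selected))
```

The contrast is the point of the sketch: LASSO returns a sparse descriptor subset, while ridge retains all 50 descriptors with damped weights.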
Support vector machines
SVM classifiers have demonstrated strong performance in QSAR modeling
when compared with naive Bayes, k-nearest neighbor (k-NN), and
random forest algorithms[22]. Specifically, when SVM was used in
conjunction with a Chemistry Development Kit (CDK)iv[23] molecular
fingerprint for HDAC1 activity prediction with 5-fold cross validation of
a training set, it achieved an area under the receiver operating curve
(AUC)=0.91, with 97% sensitivity, and 50% specificity. The training set
consisted of over 2,300 human HDAC1 inhibitors extracted from the
Binding Databasev, BindingDB[24]. This model also performed well
(AUC=0.89, with 95% sensitivity, and 75% specificity) when validated
using an external set of 413 ChEMBLvi compounds[25].
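The cross-validated evaluation protocol described above can be sketched as follows. The binary "fingerprint" matrix is a synthetic stand-in (not CDK fingerprints), and the hyperparameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Stand-in for binary molecular fingerprints: 300 compounds x 64 bits,
# with the first 4 bits enriched among actives.
n, bits = 300, 64
X = rng.integers(0, 2, size=(n, bits)).astype(float)
y = (X[:, :4].sum(axis=1) + 0.5 * rng.normal(size=n) > 2).astype(int)

# RBF-kernel SVM scored by 5-fold cross-validated AUC, mirroring the
# evaluation protocol described above.
svm = SVC(kernel="rbf", probability=True, random_state=0)
aucs = cross_val_score(svm, X, y, cv=5, scoring="roc_auc")
print("mean 5-fold AUC: %.2f" % aucs.mean())
```

In a real QSAR setting, an external validation set (as with the ChEMBL compounds above) would be held out entirely from the cross-validation loop.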
Ensemble learning
Ensemble methods combine several ML models into a robust predictive
model. In doing so, they have improved predictive performance when
compared with a single model, and are often less susceptible to bias and
overfitting. Random forest is an ensemble learning algorithm that can handle
class imbalances and avoid overfitting, two common challenges in QSAR
modeling. Ng et al.[28] used a decision forest model to predict estrogen
receptor binding, using a 3,308 chemical training set from the FDA’s
Estrogenic Activity Databasevii. Models showed good performance with an
internal accuracy of 92% and an external validation ranging from 70-89%
accuracy. More recently, Lee et al.[29] demonstrated the use of random forest
in ligand-based QSAR modelling using ChEMBL bioactivity data. Training of
the 1,121 developed QSAR models showed an overall AUC = 0.97 using 5-fold
cross validation. Testing on an external validation set showed an accuracy of
0.89.
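A random forest QSAR classifier with a simple class-imbalance correction can be sketched as follows; the data and parameters are synthetic and illustrative, not those of the cited models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# Imbalanced toy bioactivity set: only ~6% of compounds are "active",
# as is common in QSAR screening data.
X = rng.normal(size=(1000, 30))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=1000) > 2.3).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" re-weights the minority (active) class so the
# forest is not dominated by the inactives.
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                            random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print("external AUC: %.2f" % auc)
```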
To further highlight the impact of class imbalances in QSAR modeling,
Grenet et al.[30] showed that many commonly used ML modeling approaches
under-performed when using data from ToxCastviii, an US Environmental
Protection Agency (EPA) managed public dataset containing chemical
structure and bioactivity data. These data are imbalanced, and highly
enriched for inactive compounds that are negative for toxicity in in vitro
assays and for whom a half maximal activity concentration (AC50) could
not be measured. When SVM, random forest, linear discriminant analysis
(LDA), and neural networks were used on the ToxCast data, the AUC ranged
between 0.6 and 0.73 across the methods, well below the expected
performance. In response, the authors developed a stacked generalization
approach, an ensemble method that trains a learning algorithm on the
combined predictions of other algorithms. This stacked approach showed
better performance, with more models reaching an AUC>0.80 than any of
the single QSAR classifiers.
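Stacked generalization of this kind can be sketched with scikit-learn's StackingClassifier. The base learners, meta-learner, and toy data below are illustrative, not the configuration used by the cited authors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 20))
y = (np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.3 * rng.normal(size=400) > 1).astype(int)

# Stacked generalization: the base learners' out-of-fold predictions become
# the input features of a final (meta) learner.
stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True, random_state=0)),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0))],
    final_estimator=LogisticRegression(),
)
aucs = cross_val_score(stack, X, y, cv=5, scoring="roc_auc")
print("stacked 5-fold AUC: %.2f" % aucs.mean())
```

Because the meta-learner sees only out-of-fold base predictions, stacking can exploit complementary strengths of heterogeneous models without directly overfitting to any one of them.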
Software
QSAR can also be used to predict target-based activities like toxicity. The
field of target-driven toxicity prediction is dominated by
proprietary tools. Many of these use classical ML algorithms but refine the
type of data used to calculate predictions. TargeTox[31] ix and PrOCTOR[32]
x are two examples of recent open-source toxicity prediction tools.
PrOCTOR is a target-based toxicity prediction software that in addition to
network information, also incorporates chemical properties into its scoring.
To develop a PrOCTOR score, the algorithm combines chemical structure
properties of the drug candidate (e.g. molecular weight, polar surface area,
quantitative estimate of drug-likeness (QED)) along with protein target
information (e.g. network connectivity, tissue-specific expression). Drug
target data is extracted from public datasets including DrugBank[33],
GTExxiii[34], and ExACxiv[35]. Compared to TargeTox, the PrOCTOR model
includes many more features, with a total of 48 variables (34 target-based, 10
structure, and 4 drug-likeness) per drug compound. A random forest
classifier is used on the 48-feature model to develop a PrOCTOR score which
assesses the likelihood of toxicity. This model constructs 50 decision trees
using a subset of the features and uses the tree consensus to predict outcome.
PrOCTOR showed high performance (AUC=0.83) and accuracy (ACC = 0.75)
with high sensitivity (0.75) and specificity (0.74) when trained on a set of
784 FDA drugs with 10-fold cross validation. Further, when tested on a set of
FDA-approved drugs, the algorithm scored 3 drugs with known toxicity
events, docetaxel, bortezomib, and rosiglitazone, with the worst score.
PrOCTOR’s ability to leverage multiple types of target and structure-based
features for toxicity prediction sets it above many other target-based
algorithms.
Deep Learning
DL is an extension of artificial neural networks (ANNs), using a hierarchy of
networks to learn useful features from raw data. A Merck-sponsored Kaggle
competition in 2012
introduced DL to the field of drug discovery. The winning team used DL on a
set of diverse QSAR data sets to predict activity values for various
compounds[20]. Recently, numerous studies of toxicity modelling have used
DL approaches.
To assess hepatotoxicity, Xu et al. used DL to build drug-induced liver
injury (DILI) prediction models with chemical structure data[36]. The
authors used a recurrent neural network to construct the models. The best
model was trained on 475 drugs and predicted an external validation set of
about 200 drugs with an accuracy of 86.9%, sensitivity of 82.5%, and
specificity of 92.9%. This model outperformed previously reported DILI
prediction models.
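Xu et al. used a recurrent architecture over molecular structures; as a much simpler stand-in, a small feedforward network on fixed-length descriptors illustrates the supervised deep learning setup. The descriptor matrix and labels below are placeholders, not the DILI dataset.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(4)

# Placeholder descriptor matrix standing in for encoded chemical structures.
X = rng.normal(size=(475, 32))
y = (X[:, :3].sum(axis=1) + 0.4 * rng.normal(size=475) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two hidden layers: each layer learns a representation of the layer below,
# the "hierarchy of networks" idea behind deep learning.
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, net.predict(X_te))
print("held-out accuracy: %.2f" % acc)
```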
Deep convolutional neural networks (CNNs) are a class of DL networks
that learn representations of raw images from pixel information as a
hierarchy of feature maps from which features can be extracted and used to classify
complex patterns[37]. CNNs have been used to predict toxicity from images
of cells pre-treated with a set of drugs[37]. This approach was able to
effectively predict a broad spectrum of toxicity mechanisms from different
drugs, nuclear stains, and cell lines. Tong et al. also used a CNN strategy in a
protein structure analysis task[38]. Specifically, a 3D CNN approach was used
to analyze amino acid microenvironments and predict effects of mutations on
protein structure. No prior knowledge or feature assumptions were required
for this prediction task, and the approach demonstrated a two-fold increase
in prediction accuracy compared with models that require hand-selected
features. Other DL approaches that have been used to assess drug toxicity
include autoencoders[39], generative adversarial networks (GANs)[40] and
long short-term memory (LSTM)[41], among others.
Post-marketing surveillance
In 1962, it was revealed that thousands of babies were born with
malformed limbs because thalidomide, a mild sleeping pill, had no
contraindications for pregnant women to whom it was often prescribed off-
label[42]. The WHO “Programme for International Drug Monitoring”xv was
created following this disaster. Since 1978, the Uppsala Monitoring Centre
(UMC) in Sweden has been the global coordinator for PV in collaboration with
the WHO, and counts 134 full member countries with national agencies
supporting patient safety and drug AE reporting systems. These initiatives
are proof that safety assessment in clinical trials has its limits, and that
drug safety needs to be actively monitored during the entire market life of
drugs.
In the United States, the FDA maintains FAERS, a database containing
adverse event reports, medication error reports and product quality
complaints resulting in AEs. These self-reported Individual Case Safety
Reports (ICSRs) have been a major data source for post-marketing drug
safety mining. The classical methods to evaluate causality include the
Naranjo algorithm[43], the Venulet algorithm[44], and the World Health
Organization-Uppsala Monitoring Centre (WHO-UMC) system for
standardized case causality assessment, among others[45].
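As a concrete illustration, the Naranjo algorithm is a ten-question scale in which each yes/no/unknown answer contributes a fixed score and the total maps to a causality category. The per-question scores and category cut-offs below follow the published scale; the question keys and example answers are illustrative.

```python
# Naranjo-style causality scoring: question -> (yes, no, unknown) scores.
NARANJO_SCORES = {
    "previous_conclusive_reports": (1, 0, 0),
    "ae_after_drug_given": (2, -1, 0),
    "improved_on_discontinuation": (1, 0, 0),
    "reappeared_on_rechallenge": (2, -1, 0),
    "alternative_causes": (-1, 2, 0),
    "reaction_with_placebo": (-1, 1, 0),
    "toxic_drug_concentration": (1, 0, 0),
    "dose_dependent_severity": (1, 0, 0),
    "previous_similar_reaction": (1, 0, 0),
    "confirmed_objectively": (1, 0, 0),
}

def naranjo(answers):
    """answers: dict mapping question -> 'yes' | 'no' | 'unknown'."""
    idx = {"yes": 0, "no": 1, "unknown": 2}
    total = sum(NARANJO_SCORES[q][idx[a]] for q, a in answers.items())
    # Published cut-offs: >=9 definite, 5-8 probable, 1-4 possible, <=0 doubtful.
    if total >= 9:
        category = "definite"
    elif total >= 5:
        category = "probable"
    elif total >= 1:
        category = "possible"
    else:
        category = "doubtful"
    return total, category

answers = {q: "unknown" for q in NARANJO_SCORES}
answers.update(ae_after_drug_given="yes", improved_on_discontinuation="yes",
               reappeared_on_rechallenge="yes", alternative_causes="no")
print(naranjo(answers))  # → (7, 'probable')
```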
Naturally, spontaneous reporting systems (SRS) like FAERS enabled data
mining methods to identify statistical associations between drugs and AEs
[46, 47]. But with known shortcomings, such as confounding biases and
under-reporting[48], the attention has shifted to other data sources and
advanced computational methods that could replace or complement existing
resources. Below we discuss the approaches and data sources that can
support post-marketing PV along with the associated AI-driven methods
needed to extract information and learn from it.
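The statistical associations mined from SRS data are commonly quantified with disproportionality measures such as the proportional reporting ratio (PRR) and reporting odds ratio (ROR), computed from a 2x2 table of report counts. A minimal sketch with hypothetical counts:

```python
import math

def prr_ror(a, b, c, d):
    """Disproportionality statistics from a 2x2 table of report counts:
         a: target drug & target AE    b: target drug, other AEs
         c: other drugs, target AE     d: other drugs, other AEs
    PRR = (a/(a+b)) / (c/(c+d));  ROR = (a/b) / (c/d)."""
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # 95% CI on the ROR via the usual asymptotic standard error of ln(ROR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(ror) - 1.96 * se),
          math.exp(math.log(ror) + 1.96 * se))
    return prr, ror, ci

# Hypothetical counts: 40 of 1,000 reports for the drug mention the AE,
# versus 200 of 99,000 reports for all other drugs.
prr, ror, ci = prr_ror(a=40, b=960, c=200, d=98800)
print(round(prr, 1), round(ror, 1))  # a PRR above ~2 is a common signal threshold
```

Regulators also use Bayesian shrinkage variants of these statistics (e.g., MGPS, BCPNN) that stabilize estimates for rare drug-AE pairs.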
Systems Pharmacology
Systems pharmacology is the study of drug action using principles from
systems biology, considering the effect of a drug on the entire system rather
than a single target or metabolizing enzyme. This approach promises to
explain unexpected drug effects that may result from complex interactions of
targets and pathways. Application of systems pharmacology to adverse drug
events differs from its use in drug discovery in that it’s focused on off-target
effects and clinical observations of adverse reactions. In addition, it is one of
the most data-rich approaches to drug safety for in silico ADR mining. A variety
of open databases are available and have been listed in Table 1. As a
consequence of the rich data sources available, investigators in systems
pharmacology for adverse drug effects (ADEs) now have methods of choice
involving network approaches and the ability to integrate multiple types of
features. Lorberbaum et al.[57] proposed the modular assembly of drug
safety subnetworks (MADSS), where they generated protein networks from
knowledge bases pruned with literature mining and genome-wide
association study (GWAS) data, assigned phenotype targets with DrugBank
and ChEMBL, and finally trained random forest models on network metrics
to predict new drugs causing AEs. Raja et al.[58] focused on DDIs by mining
the literature to integrate drug-gene interactions (DGIs) at different scales
and trained random forest models to predict DDIs with a gold standard
corpus, focusing specifically on cutaneous diseases. Sornalakshmi et al.[59]
trained SVM models on similarity measures such as 2D molecular structure
similarity, 3D pharmacologic similarity, interaction profile fingerprint (IPF)
similarity, target similarity and ADE similarity from DrugBank and SIDERxvi
to predict drug pairs likely to interact with each other based on a literature-
based gold standard. Xu et al. [36] used various pharmaceutical compound
datasets and neural networks to encode these drugs using the undirected
graph recursive neural networks (UG-RNN) introduced by Lusci et al. [60]
and classified them between DILI positive and DILI negative compounds.
Herrero et al.[61] used pharmacokinetic (PK) and pharmacodynamic (PD)
properties from DrugBank and other sources, along with drug-enzymes
relationship data to build supervised neural network models, taking
Lexicompxvii and Vidal compendiaxviii as ground truth for DDI labels.
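Several of the similarity measures used in these models (e.g., 2D structural similarity) are typically computed as Tanimoto coefficients over binary molecular fingerprints. A minimal sketch with toy fingerprints:

```python
def tanimoto(fp1, fp2):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    on1 = {i for i, bit in enumerate(fp1) if bit}
    on2 = {i for i, bit in enumerate(fp2) if bit}
    union = len(on1 | on2)
    return len(on1 & on2) / union if union else 0.0

# Toy 8-bit fingerprints for two hypothetical drugs; real fingerprints
# (e.g. 1024-bit hashed circular fingerprints) work the same way.
drug_a = [1, 1, 0, 1, 0, 0, 1, 0]
drug_b = [1, 0, 0, 1, 0, 1, 1, 0]
print(tanimoto(drug_a, drug_b))  # → 0.6
```

Pairwise similarity values of this kind become the feature vectors fed to the SVM and random forest models described above.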
EHR mining
Stimulated by the HITECH Act, the rapid and widespread adoption of
EHRs that we have witnessed this past decade has also enabled researchers
to tap into these rich and noisy sources of clinical data for PV. EHR data stand
out in their challenging heterogeneity: these temporal data sources include
categorical data such as diagnostic, procedure and medication codes, but also
continuous laboratory tests and measurement values, along with large
volumes of semi-structured and unstructured medical notes and reports.
Morel et al.[65] proposed another multivariate self-controlled case series
(SCCS) method based on convolution of step functions with point drug
exposures to estimate the effect of longitudinal features. Kuang et al. followed
up on that multivariate SCCS and presented a baseline regularization to take
into account individual-specific, time-dependent occurrence rates of AEs. More
recently, they have also presented a version of that model for drug repurposing[66].
ADRs by Integrative Mining of Clinical Records and Biomedical Knowledge
(EU-ADR), and the Observational Medical Outcomes Partnership (OMOP)
datasets, two national networks that have defined common data models, as
ground truth, and showed good generalization performance in predicting
binary ADE outcomes. Kim et al.[71] opted for a naive Bayes classifier to
predict the likelihood of ADR in textual data from expert opinion on ADR case
reports from the Korean Adverse Event Reporting System database.
In recent years, NLP has benefited dramatically from advances in DL
for building better language models, with the development of word
embeddings[72], sequence to sequence (seq2seq) learning[73], and more
recently attention mechanisms[74, 75, 76]. The clinical domain has always
been a challenging field of application for NLP, and these novel methods have
been promptly applied to PV problems[77]. Language models can be trained
using the biomedical literature and then applied to clinical notes, as
demonstrated by Dev et al.[78], where the authors used MEDLINE to learn a
better representation of concepts found in narrative logs for classification of
ADEs. A recurrent theme in NLP for the detection of drug-AE relationships
is the need to first detect the concepts (i.e., Named Entity Recognition
(NER)) and then perform the learning task. As shown in the studies
mentioned above, both tasks can be conducted with
neural networks. With its natural properties of connections and weights, DL
enables multi-task learning (MTL), an approach that consists of sharing the
weights of the neural networks between multiple tasks to improve overall
performances. Zhang et al.[79] used this technique to jointly learn NER in
texts for AE cases and ADE classification between serious and non-serious
effects. Similarly, Li et al. [80] applied biLSTM networks on the Medication,
Indication and Adverse Drug Events (MADE) 1.0 challenge for NER with a
conditional random field network, and for relations extraction with an
attention mechanism. More recently and using the same dataset, Yang et al.
[81] developed a similar LSTM model for NER but extracted relations
between concepts and ADEs by comparing SVM and random forests.
These NLP techniques have also been used with data collected on social
media and in online health communities. Some of the reviews covering
these applications are [82, 59, 83]. There is evidence that users on
social media disseminate information comparable to ICSRs and ADEs can be
classified from these high-noise data sources[84]. Post-marketing PV has
been conducted using Twitter data [85, 86] with embedding techniques and
biLSTM deep classifiers that outperform conditional random field methods,
discussion forums[87], or more domain-specific health social networking
sites[88].
Concluding Remarks
The availability of publicly accessible data, adoption of EHR systems, and
development of novel ML and DL approaches has transformed the field of PV.
This article focuses specifically on advances in AI techniques in the context of
pre-clinical drug safety and post-marketing surveillance. We encourage the
reader to reference systematic reviews on PV[21, 89, 90, 91, 92], which present
a more complete picture of the PV landscape.
In recent years, we have observed a growing integration of multi-scale
data, from molecular databases to clinical datasets, in conjunction with a
democratization of DL models to leverage these different data types. Neural
nets have been used so far mostly for NLP applications in PV, but they have
integrated the most recent state-of-the-art concepts such as attention
mechanisms and multi-task learning. Their applications are starting to
extend beyond that scope, both in chemoinformatics and with clinical
observational data. We noted that most recent approaches
aimed at predicting ADEs have relied on annotated datasets. This almost
exclusive use of supervised models has its limits, as the prediction of novel
and unknown drug effects cannot rely on labeled data.
This is only the dawn of AI, and numerous questions remain such as how
to address class imbalances in supervised modeling tasks, and how to
incorporate unsupervised approaches in PV studies (Outstanding Questions).
Techniques such as GANs hold promise in addressing some of these concerns.
For example, novel unsupervised approaches using GANs that can generate in
silico molecules with desired chemical properties are starting to emerge[93,
94], showing great promise for drug safety.
While academic research has witnessed a drastic increase in the use of
ML and DL, the community will begin to see these approaches entering into
practice at a growing rate. For example, the FDA recently released plans for a
new regulatory framework to promote the development of safe medical
devices using AI algorithms. We expect that this will extend to drug
development and safety in the future. Appropriate regulatory frameworks
will need to be established to control for the risk of false positives. Overall,
the risk of implementing AI approaches for PV is low and the opportunity high
as it may have a positive impact on healthcare.
Resources
i) https://health.gov/
ii) https://open.fda.gov/data/faers/
iv) https://cdk.github.io
v) https://www.bindingdb.org/bind/index.jsp
vi) https://www.ebi.ac.uk/chembl/
vii) https://www.fda.gov/science-research/bioinformatics-tools/estrogenic-activity-database-eadb
viii) https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data
ix) https://github.com/artem-lysenko/TargeTox
x) https://github.com/kgayvert/PrOCTOR
xi) https://www.drugbank.ca/
xii) https://clinicaltrials.gov/
xiii) https://gtexportal.org/home/
xiv) http://exac.broadinstitute.org/
xv) https://www.who.int/medicines/areas/quality_safety/safety_efficacy/National_PV_Centres_Map/en/
xvi) http://sideeffects.embl.de/
xvii) https://online.lexi.com
xviii) http://www.vidal-dis.com/
xix) https://semrep.nlm.nih.gov/
xx) https://www.ebi.ac.uk/chebi/
xxi) https://pubchem.ncbi.nlm.nih.gov/
xxii) https://reactome.org/
xxiii) https://www.genome.jp/kegg/
References
[1] C. Tannenbaum, D. Day, et al., Age and sex in drug development and
testing for adults, Pharmacological research 121 (2017) 83–93.
event reports, Journal of the American Medical Informatics Association
19 (1) (2011) 79–85.
[15] G. Patlewicz, J. M. Fitzpatrick, Current and future perspectives on the
development, evaluation, and application of in silico approaches for
predicting toxicity, Chemical research in toxicology 29 (4) (2016) 438–
451.
[18] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal
of the Royal Statistical Society: Series B (Methodological) 58 (1) (1996)
267–288.
[24] T. Liu, Y. Lin, X. Wen, R. N. Jorissen, M. K. Gilson, BindingDB: a
web-accessible database of experimentally determined protein–ligand
binding affinities, Nucleic acids research 35 (suppl 1) (2006) D198–
D201.
[29] K. Lee, M. Lee, D. Kim, Utilizing random forest QSAR models with
optimized parameters for target identification and its application to
target-fishing server, BMC bioinformatics 18 (16) (2017) 567.
[32] K. M. Gayvert, N. S. Madhukar, O. Elemento, A data-driven approach to
predicting successes and failures of clinical trials, Cell chemical biology
23 (10) (2016) 1294–1301.
[36] Y. Xu, Z. Dai, F. Chen, S. Gao, J. Pei, L. Lai, Deep learning for drug-induced
liver injury, Journal of chemical information and modeling 55 (10)
(2015) 2085–2093.
[41] H. Altae-Tran, B. Ramsundar, A. S. Pappu, V. Pande, Low data drug
discovery with one-shot learning, ACS central science 3 (4) (2017) 283–
293.
[42] J. E. Ridings, The thalidomide disaster, lessons from the past, in:
Teratogenicity Testing, Springer, 2013, pp. 575–586.
[51] M. Kuhn, I. Letunic, L. J. Jensen, P. Bork, The SIDER database of drugs and
side effects, Nucleic acids research 44 (D1) (2015) D1075–D1079.
[52] A. Gaulton, A. Hersey, M. Nowotka, A. P. Bento, J. Chambers, D. Mendez,
P. Mutowo, F. Atkinson, L. J. Bellis, E. Cibrián-Uhalte, et al., The ChEMBL
database in 2017, Nucleic acids research 45 (D1) (2016) D945–D954.
[61] M. Herrero-Zazo, M. Lille, D. J. Barlow, Application of machine learning
in knowledge discovery for pharmaceutical drug-drug interactions., in:
KDWeb, 2016.
[70] J. Mower, D. Subramanian, T. Cohen, Learning predictive models of drug
side-effect relationships from distributed representations of literature-
derived semantic predications, Journal of the American Medical
Informatics Association 25 (10) (2018) 1339–1350.
[80] F. Li, W. Liu, H. Yu, Extraction of information related to adverse drug
events from electronic health record notes: Design of an end-to-end
model based on deep learning, JMIR medical informatics 6 (4).
features, Journal of the American Medical Informatics Association 22 (3)
(2015) 671–681.
TABLE 1: Open-source databases with molecular or pharmacological
information (columns: Name, Description, References, Resource).
Figure Legend
to toxicity and safety assessments that can identify patterns that otherwise
would be overlooked. Traditional machine learning, including methods like
logistic regression, random forests, and support vector machines, can
produce interpretable models with relatively low complexity. These methods
are desirable when the goal is to understand how the predictors affect the
incidence or risk of an adverse event. A new class of methods, called deep
neural networks – and often referred to as “artificial intelligence” – allows for
more complex models to be built at the cost of requiring significantly more
data. The benefit of using these algorithms is that they can automatically
identify non-linear patterns in the data without requiring much manual
intervention. Common examples that have been used in drug safety research
include convolutional and recurrent neural networks. In both cases, these
models have been used in pre-clinical drug toxicity studies to model patient
diversity and to facilitate lead selection and trial design, and in post-marketing
surveillance to conduct comparative effectiveness research, identify drug-
drug interactions, and to aid in clinical decision making. AI-assisted drug
safety and toxicity science remains a nascent and growing field that requires
further research to evaluate its potential clinical impact.
GLOSSARY
Area under the Receiver Operating Curve (AUC): Area between the curve
and the x-axis. Heuristic used to evaluate the performance of
classification models, with AUC = 1 indicating perfect classification.
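Equivalently, the AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, which gives a direct way to compute it:

```python
import numpy as np

def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one (ties count one half)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```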
Diffusion state distance (DSD): Distance metric based on properties of
graph diffusion designed to capture distinctions between annotations
in protein-protein interactions
make predictions: supervised and unsupervised learning. In supervised
learning, an algorithm is used to learn the mapping between input
variables and an output, such as a label. The goal is for the algorithm to
learn to predict a correct output when a new input is provided. In
unsupervised learning, there are no assigned labels to the input training
data. Here, the machine’s goal is to learn representations of the input data
that can be used for tasks such as predicting future inputs, and decision
making, without an output.
Word embeddings: In Natural Language Processing (NLP), one of the main
challenges is finding a good representation for the vocabulary the
corpus covers. While some methods simply encode tokens in binary
vectors with a sparse representation, word embeddings learn a
representation that takes into account the token's context.
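A count-based construction makes the idea concrete: co-occurrence counts factorized with truncated SVD yield dense vectors in which words that share contexts end up close together. The toy corpus below is purely illustrative.

```python
import numpy as np

# Toy corpus; the vocabulary and sentences are purely illustrative.
corpus = [
    "drug causes adverse reaction",
    "drug causes liver injury",
    "patient reports adverse reaction",
    "patient reports liver injury",
]
tokens = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(tokens)}

# Symmetric co-occurrence counts within a +/- 1-token window.
C = np.zeros((len(tokens), len(tokens)))
for s in corpus:
    ws = s.split()
    for i in range(len(ws) - 1):
        C[idx[ws[i]], idx[ws[i + 1]]] += 1
        C[idx[ws[i + 1]], idx[ws[i]]] += 1

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Truncated SVD gives dense 2-dimensional word vectors.
U, S, _ = np.linalg.svd(C)
emb = U[:, :2] * S[:2]

# "adverse" and "liver" occur in interchangeable contexts: their raw count
# rows are only mildly similar, but their dense embeddings nearly coincide.
raw = cos(C[idx["adverse"]], C[idx["liver"]])
sim = cos(emb[idx["adverse"]], emb[idx["liver"]])
print(round(raw, 2), round(sim, 2))
```

Learned embeddings such as word2vec optimize a predictive objective rather than factorizing counts, but exploit the same distributional signal.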