Applications of Machine and Deep Learning in Adaptive Immunity
Access provided by New York University - Bobst Library on 06/01/22. For personal use only.
Annu. Rev. Chem. Biomol. Eng. 2021.12:39-62. Downloaded from www.annualreviews.org
https://doi.org/10.1146/annurev-chembioeng-101420-125021
Copyright © 2021 by Annual Reviews. All rights reserved
Abstract
Adaptive immunity is mediated by lymphocyte B and T cells, which respectively express a vast and
diverse repertoire of B cell and T cell receptors and, in conjunction with peptide antigen
presentation through major histocompatibility complexes (MHCs), can recognize and respond to
pathogens and diseased cells. In recent years, advances in deep sequencing have led to a massive
increase in the amount of adaptive immune receptor repertoire data; additionally, proteomics
techniques have led to a wealth of data on peptide–MHC presentation. These large-scale data sets
are now making it possible to train machine and deep learning models, which can be used to
identify complex and high-dimensional patterns in immune repertoires. This article introduces
adaptive immune repertoires and machine and deep learning related to biological sequence data and
then summarizes the many applications in this field, which span from predicting the immunological
status of a host to the antigen specificity of individual receptors and the engineering of
immunotherapeutics.
INTRODUCTION
Immune Receptors in Adaptive Immunity
The ability of the adaptive immune system to recognize foreign pathogens and diseased cells is
driven by lymphocytes: B and T cells, which can identify specific molecular structures (antigens
or epitopes) on foreign pathogens. Specificity to these antigenic epitopes is achieved through a
group of adaptive immune receptors belonging to the immunoglobulin superfamily: B cell receptors
(BCRs), along with their secreted form, antibodies, and T cell receptors (TCRs). BCRs and
TCRs cover a large functional sequence space, enabling them to recognize the myriad of different
pathogens and antigenic determinants (epitopes) an individual is exposed to throughout life.
Because these receptors belong to the immunoglobulin superfamily, both are disulfide
bond–linked heterodimers, typically of a heavy and a light chain (BCR) or of an α and a β chain (in
some cases a γ and a δ chain; TCR), and are thus structurally related. BCRs bind to antigen directly,
B cell receptors and antibody repertoires. Unlike traditional methods of antibody analysis (i.e.,
serological binding assays), targeted deep sequencing of BCR variable regions in combination
with a carefully chosen experimental design (9) can capture a wealth of quantitative information
on repertoires, including clonal selection and expansion, clonal diversity, clonal convergence,
and clonal evolution via somatic hypermutation. BCR repertoire sequencing has been used to
shed light on basic questions in immunobiology and development across various species (10–12).
Furthermore, BCR repertoire sequencing has also been used for medical and biotechnological
[Figure 1 graphic. Panel a: BCR and TCR germline DNA segments (VH, DH, JH, CH; Vα, Jα, Cα), somatic recombination, mRNA, library preparation, and deep sequencing. Panel b: peptide–MHC I complexes and the peptidome (axis label: relative intensity).]
Figure 1
The immune receptor repertoire and immune peptidome. (a) B cell receptors (BCRs) and T cell receptors (TCRs) generate diversity in
their variable regions via the process of V(D)J recombination, which assembles different variable (V), diversity (D), and joining (J)
segments together to create distinct variable region combinations, which are then followed by the constant regions (CL/CH1–3) that
comprise the rest of the receptor. In BCR variable heavy (VH) and TCR variable beta (Vβ) chains, all three V, D, and J genes are used,
whereas in the BCR variable light (VL) and TCR variable alpha (Vα) chains, only V and J genes are used. Library preparation from
recombined genomic DNA or mRNA of variable regions is coupled to deep sequencing to enable quantitative analysis of the immune
receptor repertoire. Key interrogated features relate to clonal selection and expansion, clonal evolution by somatic hypermutation,
germline recombination, repertoire architecture, and more. (b) Major histocompatibility complexes (MHCs) present peptides to TCRs
on T cells. Each cell carries multiple MHC alleles on its surface, and each allele presents multiple peptides forming a cell peptidome—a
sum of all peptides presented on a surface by MHCs. Peptidomes can be obtained by purifying peptide–MHC complexes with
subsequent peptide elution. The resulting peptide pool sequences then can be identified via mass-spectrometry methods (i.e., liquid
chromatography-mass spectrometry, LC-MS/MS). Figure adapted from Cell Receptors (Alpha and Beta Chains) and B Cell Receptors
(Light and Heavy Chains) by BioRender.com (2020), retrieved from https://app.biorender.com/biorender-templates.
purposes, such as vaccine profiling (13–15) and discovery of monoclonal antibodies (16, 17).
Substantial recent progress in the field of droplet microfluidics has enabled single-cell sequencing
of lymphocytes, therefore providing detailed insights into the landscape of natively paired
heavy-/light-chain BCR repertoires (18) and their associated phenotypic properties. Several
sophisticated strategies have been devised to directly link paired BCR or TCR sequences to
antigen specificity (19–22).
T cell receptor repertoires. Similar to that of BCRs, deep sequencing of TCR repertoires pro-
vides a quantitative framework to understand and harness this information to address questions
in fundamental immunology, as well as applications in molecular diagnostics and immunothera-
peutics. TCR repertoire sequencing has been instrumental for profiling clonal selection across a
variety of T cell populations, including effector, memory, and exhausted cytotoxic CD8+ T cells
and a wide range of helper CD4+ T cell subsets (Th1, Th2, Th17, T follicular helper cells) (23–26).
These studies have been pivotal in understanding how clones either can coexist or are exclusive
to certain T cell subsets, thereby informing selective pressures shaping cell-mediated adaptive
Immune repertoire databases. To organize the output of immune repertoire sequencing ex-
periments, the Adaptive Immune Receptor Repertoire (AIRR) community was founded to bring
together academic researchers, industry partners, data experts, and others to manage immune
repertoire sequencing data in a standardized fashion (33). The AIRR community defined min-
imal experimental information guidelines for data set annotation, such as V(D)J usage, species,
diagnosis, and standard file formats for the annotated data, thus supporting high workflow re-
producibility and greater ease for meta-analysis across data sets. The AIRR Data Commons API
is also available for easy access, querying, and implementation across several immune repertoire
sequencing data repositories (34). The current main repositories are the iReceptor and Observed
Antibody Space (OAS) databases, the latter of which is maintained by the Oxford Protein Infor-
matics Group, who have also established the Structural T-Cell Receptor Database for curated sets
of TCR sequences and their confirmed structures (35–37). Other independent repositories of in-
terest include VDJServer, a platform that offers a complete analysis workflow for preprocessing,
annotation, and characterization of BCR sequences, and the Pan Immune Repertoire Database
(PIRD), which collects annotated TCR and BCR sequences from the China National GeneBank
(38, 39). For further databases of interest, the AntiBodies Chemically Defined (ABCD) database
offers manually curated sequences of antibodies and their known targets; the VDJdb aggregates
antigen specificities of TCR sequences from published T cell specificity assays as well as TCR
motifs to be used in specificity prediction; and the PIRD contains a database of TCRs and BCRs
with confirmed specificity toward specific antigens or diseases (39–42).
where they are degraded by pH-dependent proteases. MHC II molecules are preassembled in the
ER and, together with the other proteins necessary for peptide loading, are transported to endo-
somes (4). In endosomes, MHC II is loaded with digested protein fragments and then transferred
to the surface to present its ligands to CD4+ T cells.
Importantly, MHC I and II molecules are both polygenic and polymorphic, meaning cells
express multiple MHC alleles, and thousands of alleles are found in the human population. More-
over, MHC molecules are extremely promiscuous and, in theory, can bind more than a million dif-
ferent peptides (45). Such complexity of peptide presentation by MHC has thus made it challeng-
ing to obtain comprehensive peptide–MHC data sets and predict peptide presentation on MHC.
There are two primary ways to characterize peptide presentation on MHC, either by mea-
suring the affinity of the interaction of synthetic peptide–MHC or by mass spectrometry (MS)
from cells expressing peptide–MHC (Figure 1). Full details on these methods and emerging
[Figure 2 graphic. Panels show one-hot sequence encoding of a nucleotide sequence (rows A, T, G, A, …, C), an AUC ROC plot (axes: false-positive rate, true-positive rate), and a confusion matrix of actual positive (1) and negative (0) labels with true/false positives and negatives.]
Figure 2
Machine learning (ML) and deep learning (DL) model implementations. (a) ML models are based on a set of algorithms that discover
patterns in the data without making rigid assumptions about their distribution. To train ML models, data must be described through
numerical variables or features relevant for the given learning task. An example of relevant features are CDR3 sequences, which can be
used for model training. A common ML classification model is a support vector machine (SVM), which optimizes the decision surface
that separates two classes. (b) DL is a subfield of ML that takes advantage of multilayered neural networks to extract relevant data
variables in an automated manner. Thus, the fundamental difference between the two is that ML requires a feature extraction step. In
the context of immune receptor repertoires, data consist of nucleotide (nt) or amino acid (a.a.) sequences, which can be converted into
numerical vectors using one of the encoding schemes. In one-hot encoding, each nt or a.a. is assigned to a unique position marked with
a one in an otherwise all-zeros vector. DL models are based on neural networks composed of layers of nodes connected by weights,
or parameters. (c) During training, the SVM and neural network parameters are tuned to reduce the difference between the true and
predicted outcome on a given training data set. ML and DL models are validated on new unseen data (test data), and their performance
is estimated by different metrics that are calculated based on the discordance between predicted and true labels. One of the widely used
metrics is an area under a receiver operating characteristic curve (AUC ROC) plotted in false- and true-positive rate coordinates and
calculated at multiple thresholds for label assignments. Figure adapted from images created with BioRender.com.
feature-selection techniques are an active area of research. For more details on ML algorithms,
please refer to the valuable resource by Murphy (49).
which has led to breakthroughs in image and speech recognition as well as improved performance
over ML on a variety of problems. One disadvantage of many ML algorithms, especially those
that utilize neural networks and DL, is the lack of interpretability (51).
A wide spectrum of DL and neural network architectures have been developed that differ in
areas such as activation and loss functions, weight sharing, and connectivity between nodes. Con-
volutional neural networks (CNNs) have been employed extensively for image processing and
capture local patterns in the data independently of their position. This property also proved to be
valuable for many biological problems, as CNNs have been applied to classify binding sites of tran-
scription factors (52) or microRNA targets (53). Another major class of DL models are recurrent
neural networks (RNNs), which have made a breakthrough in speech and language processing
tasks and found their application in a variety of biological tasks, including protein engineering
(54). DL architectures are also used to reduce data dimensionality or generate novel sequences,
with variational autoencoders (VAEs) and generative adversarial networks (GANs) being the main
algorithms applied to such tasks. Full coverage of DL models is outside the scope of this review;
the interested reader could refer to several additional resources (50, 51, 55).
Training data. The size and quality of training data are at the heart of all ML and DL models
and largely determine their performance, robustness, and accuracy. The structure of training data
also dictates which type of learning can be performed: supervised or unsupervised (51). Supervised
learning requires labeled data, meaning that each data point is paired with a certain output, such
as a class or quantitative value. For example, a library of antibody variants with known amino acid
sequences (i.e., features) is stratified into binders and nonbinders (two output classes). Supervised
learning algorithms, such as SVMs or random forests, are then used to predict an outcome on
new data, such as whether a new antibody sequence is a binder or nonbinder of the target antigen.
On the contrary, in unsupervised learning, only input features are given, and an algorithm must
make sense of data without guidance. Typical unsupervised learning tasks are data clustering and
dimensionality reduction performed with methods such as principal component analysis or k-
means clustering (51). Note that the division between supervised and unsupervised learning is
formal, and many algorithms can do both tasks as well as perform semisupervised learning; the
interested reader is directed to several other valuable reviews on applications of ML and DL in
biology (55, 61–63).
Features. The next essential step after data collection is feature selection and extraction. This step
is necessary to remove redundant (e.g., highly correlated) features and select the most relevant
ones for the specific learning task. Reducing the number of features speeds up learning time by
decreasing computational burden as well as simplifying the model. An example of different features
given to an algorithm could be distinct ways of antibody sequence representation: whole sequence
or only CDR3 region or frequency of amino acid substrings present in a sequence.
Model evaluation and testing. Once a model is trained, it must be evaluated on a new set of
test data. The reason for this is that ML and DL models could have so many parameters that
they could memorize training data. However, such a model would have little value when applied
to a new data set owing to its low generalization ability and poor real-world performance. This
phenomenon, referred to as overfitting, represents a fundamental challenge in ML and DL. To
avoid bias in model evaluation, the data set is usually split into two parts, a training set and a
test set, with no data overlap between them. However, the common practice of splitting data at random
may not be valid in biology (64). For example, splits of protein sequence data should account for
sequence homology: related proteins (defined by their sequence or structure similarity) should be
assigned to the same split, so that near-duplicates of training sequences do not leak into the test set.
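A similarity-aware split of this kind can be sketched as follows. This is only an illustration under stated assumptions: Hamming distance between equal-length sequences stands in for homology, grouping is single linkage, and the function names, distance threshold, and toy sequences are hypothetical rather than taken from the cited work (64).

```python
import random


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster_by_similarity(seqs, max_dist=1):
    """Single-linkage grouping: sequences within max_dist share a cluster."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            same_len = len(seqs[i]) == len(seqs[j])
            if same_len and hamming(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


def split_by_cluster(seqs, test_fraction=0.3, seed=0):
    """Assign whole clusters to either the training or the test set."""
    clusters = cluster_by_similarity(seqs)
    random.Random(seed).shuffle(clusters)
    train, test = [], []
    target = test_fraction * len(seqs)
    for cluster in clusters:
        # Fill the test set first, then send remaining clusters to training.
        (test if len(test) < target else train).extend(cluster)
    return train, test


# Toy CDR3-like sequences (hypothetical).
sequences = ["CARDYW", "CARDFW", "CASSLG", "CASSLA", "CTTTTT"]
train_set, test_set = split_by_cluster(sequences)
```

Because clusters are never divided, the realized test fraction is only approximate; real pipelines typically use dedicated sequence-clustering tools rather than an all-pairs comparison.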
Metrics to evaluate model performance depend on several factors, such as problem type (re-
gression or classification) and the proportion of the classes in a data set. The most commonly
applied metrics for classification problems are metrics based on the confusion matrix values
(true positive, false positive, true negative, and false negative), for instance, precision and recall
(true positive rate). Another widely used metric is the area under the receiver operating
characteristic curve (AUC ROC). The ROC curve is plotted in true positive rate–false positive rate
coordinates that are calculated at different confidence thresholds for label assignment, and the AUC
is the area beneath it. An AUC ROC value of 0.5 indicates a random classifier, and a value of 1.0
indicates a perfect classifier. As a rule of thumb, several performance metrics should be computed
to get a complete picture of the model performance.
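The metrics described above can be computed directly from labels and scores. The sketch below is a minimal pure-Python illustration with hypothetical toy data; it computes AUC ROC through the equivalent rank formulation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half) instead of integrating the curve explicitly.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall (true positive rate) = TP / (TP + FN)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC ROC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores), thresholded at 0.5 for label assignment.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]
precision, recall = precision_recall(labels, predicted)
auc = roc_auc(labels, scores)
```

Here three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75, while precision and recall at the 0.5 threshold are both 0.5. The two views can disagree, which is one reason to report several metrics.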
One-hot encoding. One-hot encoding represents a simple way to encode categorical values such
as the 20 canonical amino acids (49). In one-hot encoding, each category (amino acid) is
converted into a vector of length equal to the number of categories (20 amino acids). For a given
amino acid residue, 19 of the positions are filled with a 0, whereas a 1 marks the one position
corresponding to that amino acid. One-hot encoding is widely used to transform categorical values
and provides a good baseline; however, it is not computationally efficient, because it
is sparse and high dimensional (65). High dimensionality of the encoded input is also associated with
model overfitting when the number of features exceeds the number of data points, a phenomenon
known as the curse of dimensionality (65). Additionally, one-hot encoding treats all amino acids
equally without taking their physical and biochemical properties into account.
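As a minimal sketch of one-hot encoding for amino acid sequences: the alphabetical ordering of the 20 residues and the function name below are arbitrary choices made for illustration, not a convention from the source.

```python
# The 20 canonical amino acids in an arbitrary but fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq):
    """Encode a sequence as a list of 20-element rows, one row per residue."""
    matrix = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1  # raises KeyError for non-canonical residues
        matrix.append(row)
    return matrix


encoded = one_hot_encode("CARW")  # 4 residues -> 4 x 20 matrix with four 1s
```

A sequence of length L becomes an L × 20 matrix containing exactly L ones, which makes the sparsity discussed above concrete.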
k-mer encoding. In k-mer encoding, the sequences are split into substrings of length k, and the
frequency of every unique k-mer is calculated (66). Each sequence is therefore encoded as a vector
indicating the frequency of the k-mers it is composed of. One drawback of this method is again the
sparsity and high dimensionality of the encoding, as the number of possible unique k-mers grows
exponentially with increasing k. For example, a DNA sequence encoded through the frequency
of its pentameric substrings would be converted into a vector of length 4^5 = 1,024 (number of
possible unique DNA pentamers). This effect is even more dramatic for protein encoding with
Amino acid composition. A sequence is encoded as a vector of length 20 in which every position
corresponds to the frequency of a given amino acid in that sequence. The advantage of this method is
that all sequences are transformed into same-length dense vectors; however, the positional amino
acid information is lost, and preserving it is often important for many biological applications.
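The k-mer and amino acid composition encodings can be sketched together, since composition amounts to normalized 1-mer counts. The function names and toy sequences are hypothetical, and the k-mer vector here stores raw counts over overlapping windows, which is one of several reasonable readings of "frequency".

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(seq, k, alphabet):
    """Count overlapping k-mers; the vector has len(alphabet) ** k entries."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vector = [0] * len(index)
    for start in range(len(seq) - k + 1):
        vector[index[seq[start:start + k]]] += 1  # KeyError on unknown symbols
    return vector


def aa_composition(seq):
    """Fraction of each of the 20 amino acids; positional information is lost."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


dna_vector = kmer_counts("ACGTACGT", k=5, alphabet="ACGT")  # length 4**5 = 1024
composition = aa_composition("CAAW")                        # always length 20

# For comparison, protein pentamers would need 20**5 = 3,200,000 entries.
protein_pentamer_dim = len(AMINO_ACIDS) ** 5
```

Most entries of `dna_vector` are zero: an 8-nucleotide sequence contributes only four pentamers to a 1,024-dimensional vector.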
there may be high-order dependencies within positional sequences; thus, it may be advantageous
to incorporate structural information. Recent efforts using structural modeling approaches can be
found in other reviews, such as that from Graves et al. (76). Here, we instead focus on sequence-
based ML and DL models that take advantage of immune repertoire deep sequencing and are
being used to classify immune status, predict antigen specificity, and engineer immune receptor
drug candidates (Figure 3).
[Figure 3 graphic. Panel labels include antigen sorting of primary IgG+ B cells, VH/VL and TCRα sequences, example CDR3 sequences with their one-hot encoding, and repertoire classification.]
they had highly divergent antibody sequences. This highlights the need for sequence-based ap-
proaches to capture high-dimensional patterns, rather than just using simple sequence alignment
and similarity thresholds.
Initial methods of repertoire classification sought to learn underlying structures using single-
algorithm approaches. For example, an unsupervised method called ALICE has been established
to identify TCRs participating in an immune response; in this approach, TCRs that have more
neighbors (defined as sequences differing by at most one amino acid in CDRβ3) than expected under a statistical model
are defined as involved in the immune response (84). The method can be applied to repertoire
data to detect public and private TCRs associated with a certain disease or condition. Similarly,
Emerson et al. (85) have developed a statistical classifier based on the presence of known cy-
tomegalovirus (CMV)-associated TCR sequences in repertoire data to diagnose the CMV status
of patients. By employing an SVM-based approach trained on the compositional information of
CDRH3 sequences, Greiff et al. (86) could predict public and private clones within human and
murine repertoires with 80% accuracy. However, the performance of their SVM model was
highly dependent on the size of the training data set, with a thousandfold increase in training data
from 10^2 to 10^5 improving final prediction accuracy by 25%. Despite this, once trained, the SVM
classifier was highly robust, achieving high performance on both BCR and TCR repertoires, as
ML models before selecting the best model for feature selection and exploratory analysis. Using
SIMON, the authors could identify immune signatures predictive of influenza vaccination from
five separate clinical studies of seasonal influenza vaccination. In another study, Konishi et al. (89)
incorporated features such as V/J-frame patterns, CDR lengths, number of somatic hypermuta-
tions, and physicochemical properties of amino acid sequences in the CDRs to build an ensemble
model that combines inputs from linear, Bayesian, and CNN classifiers. Their ensemble model was
then used to classify normal versus cancerous tissue based on the localized BCR sequences with an
AUC value of 0.826. Besides encoded amino acid sequences, the ensemble model also identified
other significant discriminative features, such as the number of somatic hypermutations between
tumor and normal tissue. In fact, a recent publication comparing BCR repertoires between six
different immune-mediated diseases also revealed that other features, such as isotype and V-gene
usage patterns, are also important for discriminating between disease states (90), which could be
captured by more complex ensemble models in the future. By incorporating a greater array of
features into the final discriminator, ensemble-based approaches hold great promise for accurate
repertoire classification.
Repertoire classification is also a multiple-instance learning problem: in an immune
repertoire of millions of sequences, only a few sequences truly indicate its class. Therefore,
it is important for a discriminator to isolate patterns important to the disease state, or immune sta-
tus, rather than other confounding factors, such as genetic background, environmental factors, or
immune history. To do so, several groups have leveraged attention-based classifiers that can iden-
tify true discriminator sequences within a repertoire (91–93). The most recent, DeepRC, devel-
oped by Widrich and colleagues (93), leverages the exponential memory capacity of modern Hop-
field networks to greatly improve the storage capacity of the model’s attention mechanism, which,
combined with CNN-based sequence embedding, allows it to efficiently and accurately extract
motifs and residues within repertoires that contribute toward prediction of a disease class. Tests
on both real-world CMV data (85) and simulated data show that DeepRC outperforms a panel
of SVM, k-nearest neighbors, logistic regression, and previous multiple-instance learning–based
models, especially when detecting sequence motifs with very low witness rates (i.e., motifs
present in only a small fraction of a repertoire's sequences).
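The general idea of attention-based pooling over a repertoire can be sketched as follows. This is not the DeepRC implementation: the two-dimensional embeddings and the fixed query vector are hypothetical toy values, whereas in a trained model both the sequence embedding and the query would be learned.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings, query):
    """Pool per-sequence embeddings into one repertoire-level vector.

    Each sequence receives a weight softmax(query . embedding), so a few
    informative sequences can dominate the repertoire representation.
    """
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in embeddings]
    weights = softmax(scores)
    pooled = [sum(w * emb[d] for w, emb in zip(weights, embeddings))
              for d in range(len(query))]
    return pooled, weights


# Toy repertoire: two background sequences and one motif-bearing sequence.
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
toy_query = [1.0, 1.0]
pooled, weights = attention_pool(toy_embeddings, toy_query)
```

In this toy case nearly all attention weight falls on the third sequence, and inspecting such weights is how attention-based classifiers can point to candidate discriminator sequences.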
Finally, there is a need for ground truth data sets to benchmark ML and DL tools for im-
mune repertoire classifier evaluation (94). In particular, it is essential to be able to separate true
sequence enrichment from the generation probabilities of repertoires and other confounding fac-
tors referenced previously. To fulfill this need, various bioinformatic tools have been created for in
silico repertoire generation (95, 96). One notable example is immuneSIM, an open-access software
package that generates standardized ground truth immune repertoires to be used for comparative
benchmarking analysis (97).
Machine Learning to Predict the Antigen Specificity of Immune Receptors
Traditional methods of antibody specificity prediction are based on antibody–antigen structures,
obtained either experimentally or through modeling. Tools such as PEASE and BepiPred-2.0
(random forest based), AntibodyInterfacePrediction (SVM based), AG-Fast-Parapred (RNN
based), and the recently updated proABC-2 (CNN based) all seek to apply features learned from
antibody–antigen structures to predict new paratopes and epitopes based on given antibody or
antigen sequences (98). Although all of these tools have been used for fast and relatively effective
prediction of epitopes given the paratope sequence, and vice versa, they are all restricted by scarce
antibody–antigen crystal structure data; consequently, they often fail at predicting antibody–
antigen binding with novel sequences.
In contrast, sequence-based approaches for specificity prediction are not constrained by the
need for readily available crystal structures and are emerging as a promising approach. The po-
develop predictive models for protein structure and evolution (116, 117). In one prominent exam-
ple, Mason et al. (118) used deep mutational scanning–guided library design on the CDRH3 of the
therapeutic antibody trastuzumab and combined this with a mammalian cell directed evolution
platform to generate deep sequencing data of antigen-binding variants. These sequencing data
were used to train neural networks that could accurately predict the binding status of antibodies
based on their protein sequence. The authors also showed that their DL models outperform struc-
tural modeling software in predicting binding ability and can synthesize novel binding sequences
from a much larger sequence space. Liu et al. (119) proposed an ensemble model titled Ens-Grad
for sequence optimization and novel binder sequence synthesis trained on phage display data. Us-
ing Ens-Grad, the authors were able to apply antibody features for antigen specificity learned by
the ensemble model for in silico affinity maturation of new input seed sequences, showing that
Ens-Grad can generalize into the unseen antibody space. However, in contrast to the method
Mason et al. (118) described, Ens-Grad cannot engineer antibodies focused on a specific epitope,
as that feature is not in the initial training data and therefore not learned as a discriminator. The
authors noted that enriched sequences form isolated clusters of distinct sequence families, which
may correspond to specific epitopes; therefore, it may be possible in the future to include epitope
prediction as a step in the ensemble model.
In addition to engineering CDRs for affinity maturation, efforts are also underway to improve
antibody developability and humanization. For developability, ensemble models for aggregation
and random forest regression models for hydrophobicity have shown promise in rapidly identify-
ing liabilities in antibody libraries before and after selection in the discovery process (120, 121).
Recently, a new tool, the Therapeutic Antibody Profiler, has been developed for prediction of five
more developability characteristics based on antibody variable domain sequences (122). With re-
spect to humanization, Clavero-Álvarez et al. (123) used a multivariate Gaussian model trained
on human and mouse variable heavy and light sequences from the Immunogenetics database to
predict sequence humanness. The final model was used not only to assign a score to sequences
based on their degree of humanness (defined as the multivariate Gaussian score) but also to per-
form in silico humanization of murine antibody sequences. Focusing on capturing long-range
and higher-order dependencies between residues in a human repertoire, Wollacott et al. (124)
employed an RNN to quantify sequence nativeness. Trained on variable region sequences from
the OAS database, the final model could not only distinguish human from mouse, chicken, and llama
sequences (AUC > 0.97); the authors also showed that it can select germline sequences
more compatible with CDRs from nonhuman sources.
Although significant progress has been made in efficient assessment of antibody developability
and humanization, what is lacking are generative models capable of synthesizing novel sequences
given developability parameters. Very recently, steps have been taken to address this missing
piece. Amimeur et al. (125) employed a GAN model trained on sequences from the OAS to
discriminate between synthetic and natural human repertoire sequences, following which the
GAN was used as a generative model to synthesize novel libraries of sequences with humanlike
properties for expression in Chinese hamster ovary (CHO) cells and phage-display systems.
Because other generative modeling approaches, such as VAEs, have also been used for in silico
generation of antibody sequences (106), it is foreseeable that DL will make a profound impact on
therapeutic antibody discovery and development.
antigen presentation (126, 127). Here we describe the main points and tools designed to predict
MHC presentation.
bines both affinity data and eluted peptide data; however, this is achieved through a modified loss
function. The latest version, MHCflurry2.0, adds an antigen processing prediction step on top of
the binding model. This antigen processing model learns to discriminate between predicted strong
binders originating from the same protein that were either present in the MS data set (hits) or not
observed (decoys) (137). Predictions from both models are then combined via logistic regression
to give a composite score. The authors showed that this method leads to a dramatic increase in
the positive predictive value of MHC ligand predictions.
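The combination step described above can be sketched as follows. This is a minimal illustration, assuming that each candidate peptide already has a binding score and an antigen processing score in the range 0 to 1; it is not the MHCflurry code or API. The functions fit_logistic and composite_score, the training values, and the hyperparameters are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights and bias by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss with respect to the linear term.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def composite_score(binding, processing, w, b):
    """Combine the two model outputs into one presentation score."""
    return sigmoid(w[0] * binding + w[1] * processing + b)


# Hypothetical (binding score, processing score) pairs.
# Label 1 = observed in the MS data (hit), label 0 = not observed (decoy).
X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.9, 0.6),
     (0.8, 0.2), (0.3, 0.7), (0.2, 0.1), (0.6, 0.3)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
strong = composite_score(0.9, 0.9, w, b)
weak = composite_score(0.2, 0.2, w, b)
print(f"weights={w}, bias={b:.2f}, strong={strong:.3f}, weak={weak:.3f}")
```

Because the combiner has only two weights and a bias, it is easy to inspect how much each upstream model contributes to the composite score.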
Another line of tools, MixMHCpred, is based on probabilistic modeling and was developed
by Bassani-Sternberg & Gfeller (145). MixMHCpred is trained solely on eluted peptide data and
uses allele-specific position weight matrices for prediction of peptide binding. To decipher MHC
I peptidomics data, Bassani-Sternberg et al. (151) took advantage of the shared motifs across data
sets with shared HLA alleles, which allowed them to assign motifs and predict ligands more pre-
cisely. Identification of the MHC II motifs is more challenging in comparison to the MHC I motifs
owing to the longer peptides and flexible position of the binding core on a peptide. To take these
factors into account, Racle et al. (142) proposed a specific motif deconvolution algorithm called
MoDec. MoDec is a probabilistic algorithm that allows flexible binding core position and simul-
taneously learns MHC II motifs, weights of position matrices, and allele-specific preferences of
the binding core position (142). Deciphered motifs are then used to train MixMHC2pred, which
also integrates motifs of peptide N and C termini. In contrast to neural network tools such as
NetMHCpan and MHCflurry, the MixMHCpred suite is based on a simple linear method of position
weight matrices. However, its precision is equivalent to that of neural network–based approaches,
suggesting that peptide binding to MHC may be adequately described by a linear model.
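The idea of scoring peptides with a position weight matrix, and of letting the binding core slide along a longer peptide, can be illustrated with a short sketch. It assumes a uniform amino acid background and a fixed pseudocount, and it uses a handful of invented 9-mer ligands. It is neither the MixMHCpred nor the MoDec implementation (MoDec learns motifs and core offsets jointly in a probabilistic model, whereas this sketch only scans a fixed matrix). The names build_pwm, score, and best_core are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(ligands, pseudocount=1.0):
    """Build a log-odds matrix from equal-length ligands (uniform background)."""
    length = len(ligands[0])
    if any(len(p) != length for p in ligands):
        raise ValueError("all ligands must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(ligands) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in ligands)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def score(peptide, pwm):
    """Sum of per-position log-odds; linear in the one-hot encoded peptide."""
    if len(peptide) != len(pwm):
        raise ValueError("peptide length must match the matrix width")
    return sum(column[aa] for column, aa in zip(pwm, peptide))


def best_core(peptide, pwm):
    """Slide the matrix along a longer peptide; return (offset, score) of the best window."""
    width = len(pwm)
    windows = [(score(peptide[i:i + width], pwm), i)
               for i in range(len(peptide) - width + 1)]
    best_score, offset = max(windows)
    return offset, best_score


# Invented 9-mer ligands with anchor residues at positions 2 (L) and 9 (V).
ligands = ["ALDKSTGEV", "GLPNQRTAV", "SLTEKDGHV", "KLQSPNDTV"]
pwm = build_pwm(ligands)

print(score("ALDKSTGEV", pwm), score("AWDKSTGEW", pwm))
print(best_core("GGGALDKSTGEVGGG", pwm))
```

The score is a plain sum over positions, which is what makes the method linear; the sliding window in best_core mimics, in a much simplified form, the flexible core position that complicates MHC II motif identification.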
Overall, the accuracy of the MHC I and MHC II peptide binding prediction tools has
improved substantially in recent years, largely driven by the increase in MS data and
ML models. Nevertheless, MHC II prediction accuracy remains substantially lower than that of MHC I owing to the higher
complexity of the problem: The open MHC II groove accommodates ligands of variable lengths, and
the position of the binding core is flexible. Also, details of MHC II pathways are poorly studied in
comparison to MHC I, and less MHC II ligand data are available; thus, obtaining more data and
improving tools for MHC II ligand prediction would be an important direction for future work.
Other challenges are the binding prediction for peptides with post-translational modifications
and prediction of peptide immunogenicity (binding to TCRs) as well as immunodominance. The
factors governing these phenomena are still not fully understood, so further research and efforts
are needed to build ML models capable of predicting defining features of MHC ligands eliciting
a strong immune response.
CONCLUSIONS AND FUTURE DIRECTIONS
This review highlights the substantial progress that has been made in applying ML and DL to
unravel the complexity of immune receptor repertoires. The field has been catalyzed by the re-
cent exponential growth of data on BCR and TCR repertoires from deep sequencing experiments,
as well as peptide–MHC data from MS experiments. This has made it possible to encode these
molecular sequence data and use them for training of ML and DL models. To date, a wide variety
of model architectures have been implemented, spanning from the simple (SVMs, random forest,
logistic regression) to the complex (CNNs, RNNs, VAEs). Future research in this field will bene-
fit from comparing more model architectures and establishing guidelines for model selection for
immune repertoires, as such standardization has been beneficial for ML and DL in other applications
(e.g., image classification and speech recognition). Progress in ML and DL for adaptive
immunity still depends on the generation of large-scale and high-quality training data; although
major progress has been made, there is still a major lack of labeled immune repertoire data, which refers
to BCR sequences with known antigen specificity and TCR sequences with known peptide–MHC
specificity. Therefore, advanced experimental approaches, such as single-cell sequencing, recom-
binant library screening, and antigen binding and function assays, must continue to develop to
generate repertoire data with defined antigen specificity. The long-term trajectory of this field is
very promising, as immunology, like other fields of biology, is going through a transformation in
which it is merging with computational and data science; thus, ML and DL are poised to lead to
important advances in the basic understanding of adaptive immunity as well as applications such as
prediction of immune status and specificity and the discovery and development of immunotherapeutics.
DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that
might be perceived as affecting the objectivity of this review.
ACKNOWLEDGMENTS
This work was funded by the ERC Starting Grant Antibodyomics and ETH Research Grant (to
S.T.R.). Figures included in this review were created with BioRender.com.
LITERATURE CITED
1. Brack C, Hirama M, Lenhard-Schuller R, Tonegawa S. 1978. A complete immunoglobulin gene is cre-
ated by somatic recombination. Cell 15(1):1–14
2. Alt FW, Rosenberg N, Casanova RJ, Thomas E, Baltimore D. 1982. Immunoglobulin heavy-chain ex-
pression and class switching in a murine leukaemia cell line. Nature 296(5855):325–31
3. Tonegawa S. 1983. Somatic generation of antibody diversity. Nature 302(5909):575–81
4. Murphy K, Weaver C. 2016. Janeway’s Immunobiology. New York: Garland Sci. 9th ed.
5. Weigert MG, Cesari IM, Yonkovich SJ, Cohn M. 1970. Variability in the lambda light chain sequences
of mouse antibody. Nature 228(5276):1045–47
6. Jacob J, Kelsoe G, Rajewsky K, Weiss U. 1991. Intraclonal generation of antibody mutants in germinal
centres. Nature 354(6352):389–92
7. Liu YJ, Malisan F, de Bouteiller O, Guret C, Lebecque S, et al. 1996. Within germinal centers, isotype
switching of immunoglobulin genes occurs after the onset of somatic mutation. Immunity 4(3):241–50
8. Mesin L, Ersching J, Victora GD. 2016. Germinal center B cell dynamics. Immunity 45(3):471–82
15. Lavinder JJ, Wine Y, Giesecke C, Ippolito GC, Horton AP, et al. 2014. Identification and characteriza-
tion of the constituent human serum antibodies elicited by vaccination. PNAS 111(6):2259–64
16. Doria-Rose NA, Schramm CA, Gorman J, Moore PL, Bhiman JN, et al. 2014. Developmental pathway
for potent V1V2-directed HIV-neutralizing antibodies. Nature 509(7498):55–62
17. Reddy ST, Ge X, Miklos AE, Hughes RA, Kang SH, et al. 2010. Monoclonal antibodies isolated without
screening by analyzing the variable-gene repertoire of plasma cells. Nat. Biotechnol. 28(9):965–69
18. DeKosky BJ, Kojima T, Rodin A, Charab W, Ippolito GC, et al. 2015. In-depth determination and
analysis of the human paired heavy- and light-chain antibody repertoire. Nat. Med. 21:86–91
19. Setliff I, Shiakolas AR, Pilewski KA, Murji AA, Mapengo RE, et al. 2019. High-throughput mapping of
B cell receptor sequences to antigen specificity. Cell 179(7):1636–46.e15
20. Wang B, DeKosky BJ, Timm MR, Lee J, Normandin E, et al. 2018. Functional interrogation and mining
of natively paired human VH:VL antibody repertoires. Nat. Biotechnol. 36(2):152–55
21. Gilman MSA, Castellanos CA, Chen M, Ngwuta JO, Goodwin E, et al. 2016. Rapid profiling of RSV an-
tibody repertoires from the memory B cells of naturally infected adult donors. Sci. Immunol. 1(6):eaaj1879
22. Gérard A, Woolfe A, Mottet G, Reichen M, Castrillon C, et al. 2020. High-throughput single-cell
activity-based screening and sequencing of antibodies using droplet microfluidics. Nat. Biotechnol.
38(6):715–21
23. Jiang X, Wang S, Zhou C, Wu J, Jiao Y, et al. 2020. Comprehensive TCR repertoire analysis of CD4+
T-cell subsets in rheumatoid arthritis. J. Autoimmun. 109:102432
24. Brenna E, Davydov AN, Ladell K, McLaren JE, Bonaiuti P, et al. 2020. CD4+ T follicular helper cells
in human tonsils and blood are clonally convergent but divergent from non-Tfh CD4+ cells. Cell Rep.
30(1):137–52.e5
25. Ritvo P-G, Saadawi A, Barennes P, Quiniou V, Chaara W, et al. 2018. High-resolution repertoire anal-
ysis reveals a major bystander activation of Tfh and Tfr cells. PNAS 115(38):9604–9
26. Maceiras AR, Almeida SCP, Mariotti-Ferrandiz E, Chaara W, Jebbawi F, et al. 2017. T follicular helper
and T follicular regulatory cells have different TCR specificity. Nat. Commun. 8:15067
27. Welten SPM, Yermanos A, Baumann NS, Wagen F, Oetiker N, et al. 2020. Tcf1+ cells are required to
maintain the inflationary T cell pool upon MCMV infection. Nat. Commun. 11:2295
28. Yermanos A, Sandu I, Pedrioli A, Borsa M, Wagen F, et al. 2020. Profiling virus-specific Tcf1+ T cell
repertoires during acute and chronic viral infection. Front. Immunol. 11:986
29. Woodsworth DJ, Castellarin M, Holt RA. 2013. Sequence analysis of T-cell repertoires in health and
disease. Genome Med. 5(10):98
30. Kirsch IR, Watanabe R, O’Malley JT, Williamson DW, Scott L-L, et al. 2015. TCR sequencing
facilitates diagnosis and identifies mature T cells as the cell of origin in CTCL. Sci. Transl. Med.
7(308):308ra158
31. Thomas S, Mohammed F, Reijmers RM, Woolston A, Stauss T, et al. 2019. Framework engineering to
produce dominant T cell receptors with enhanced antigen-specific function. Nat. Commun. 10:4451
32. Guo X-ZJ, Dash P, Calverley M, Tomchuck S, Dallas MH, Thomas PG. 2016. Rapid cloning, expression,
and functional characterization of paired αβ and γδ T-cell receptor chains from single-cell analysis. Mol.
Ther. Methods Clin. Dev. 3:15054
33. Rubelt F, Busse CE, Bukhari SAC, Bürckert J-P, Mariotti-Ferrandiz E, et al. 2017. Adaptive immune
receptor repertoire community recommendations for sharing immune-repertoire sequencing data. Nat.
Immunol. 18:1274–78
34. Christley S, Aguiar A, Blanck G, Breden F, Bukhari SAC, et al. 2020. The ADC API: a web API for the
programmatic query of the AIRR Data Commons. Front. Big Data 3:22
35. Corrie BD, Marthandan N, Zimonja B, Jaglale J, Zhou Y, et al. 2018. iReceptor: a platform for querying
and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories. Immunol.
Rev. 284(1):24–41
36. Kovaltsuk A, Leem J, Kelm S, Snowden J, Deane CM, Krawczyk K. 2018. Observed Antibody Space: a
resource for data mining next-generation sequencing of antibody repertoires. J. Immunol. 201(8):2502–9
37. Leem J, de Oliveira SHP, Krawczyk K, Deane CM. 2018. STCRDab: the Structural T-Cell Receptor
Database. Nucleic Acids Res. 46(D1):D406–12
38. Christley S, Scarborough W, Salinas E, Rounds WH, Toby IT, et al. 2018. VDJServer: a cloud-based
analysis portal and data commons for immune repertoire sequences and rearrangements. Front. Immunol.
9:976
39. Zhang W, Wang L, Liu K, Wei X, Yang K, et al. 2020. PIRD: Pan Immune Repertoire Database.
Bioinformatics 36(3):897–903
40. Lima WC, Gasteiger E, Marcatili P, Duek P, Bairoch A, Cosson P. 2020. The ABCD database: a repos-
itory for chemically defined antibodies. Nucleic Acids Res. 48(D1):D261–64
41. Shugay M, Bagaev DV, Zvyagin IV, Vroomans RM, Crawford JC, et al. 2018. VDJdb: a curated database
of T-cell receptor sequences with known antigen specificity. Nucleic Acids Res. 46(D1):D419–27
42. Bagaev DV, Vroomans RMA, Samir J, Stervbo U, Rius C, et al. 2020. VDJdb in 2019: database extension,
new analysis infrastructure and a T-cell receptor motif compendium. Nucleic Acids Res. 48(D1):D1057–62
43. Matsumura M, Fremont DH, Peterson PA, Wilson IA. 1992. Emerging principles for the recognition
of peptide antigens by MHC class I molecules. Science 257(5072):927–34
44. Chicz RM, Urban RG, Lane WS, Gorga JC, Stern LJ, et al. 1992. Predominant naturally processed
peptides bound to HLA-DR1 are derived from MHC-related molecules and are heterogeneous in size.
Nature 358(6389):764–68
45. Eisen HN, Hou XH, Shen C, Wang K, Tanguturi VK, et al. 2012. Promiscuous binding of extracellular
peptides to cell surface class I MHC protein. PNAS 109(12):4580–85
46. Joglekar AV, Li G. 2020. T cell antigen discovery. Nat. Methods. https://doi.org/10.1038/s41592-020-
0867-z
47. Vita R, Mahajan S, Overton JA, Dhanda SK, Martini S, et al. 2019. The Immune Epitope Database
(IEDB): 2018 update. Nucleic Acids Res. 47(D1):D339–43
48. Bzdok D, Altman N, Krzywinski M. 2018. Statistics versus machine learning. Nat. Methods 15(4):233–34
49. Murphy KP. 2012. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press
50. LeCun Y, Bengio Y, Hinton G. 2015. Deep learning. Nature 521(7553):436–44
51. Goodfellow I, Bengio Y, Courville A. 2016. Deep Learning. Cambridge, MA: MIT Press
52. Wang M, Tai C, E W, Wei L. 2018. DeFine: Deep convolutional neural networks accurately quan-
tify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding
variants. Nucleic Acids Res. 46(11):e69
53. Cheng S, Guo M, Wang C, Liu X, Liu Y, Wu X. 2016. MiRTDL: a deep learning approach for miRNA
target prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 13(6):1161–69
54. Alley EC, Khimulya G, Biswas S, AlQuraishi M, Church GM. 2019. Unified rational protein engineering
with sequence-based deep representation learning. Nat. Methods 16(12):1315–22
55. Angermueller C, Pärnamaa T, Parts L, Stegle O. 2016. Deep learning for computational biology. Mol.
Syst. Biol. 12(7):878
56. Zhou J, Troyanskaya OG. 2015. Predicting effects of noncoding variants with deep learning-based se-
quence model. Nat. Methods 12(10):931–34
57. Alipanahi B, Delong A, Weirauch MT, Frey BJ. 2015. Predicting the sequence specificities of DNA- and
RNA-binding proteins by deep learning. Nat. Biotechnol. 33(8):831–38
58. Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, et al. 2020. Improved protein structure prediction
using potentials from deep learning. Nature 577(7792):706–10
85. Emerson RO, DeWitt WS, Vignali M, Gravley J, Hu JK, et al. 2017. Immunosequencing identifies
signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire. Nat.
Genet. 49(5):659–65
86. Greiff V, Weber CR, Palme J, Bodenhofer U, Miho E, et al. 2017. Learning the high-dimensional
immunogenomic features that predict public and private antibody repertoires. J. Immunol. 199(8):2985–97
87. Ostmeyer J, Christley S, Rounds WH, Toby I, Greenberg BM, et al. 2017. Statistical classifiers for diag-
nosing disease from immune repertoires: a case study using multiple sclerosis. BMC Bioinform. 18:401
88. Tomic A, Tomic I, Rosenberg-Hasson Y, Dekker CL, Maecker HT, Davis MM. 2019. SIMON, an auto-
mated machine learning system, reveals immune signatures of influenza vaccine responses. J. Immunol.
203(3):749–59
89. Konishi H, Komura D, Katoh H, Atsumi S, Koda H, et al. 2019. Capturing the differences between
humoral immunity in the normal and tumor environments from repertoire-seq of B-cell receptors using
(RAbD): a general framework for computational antibody design. PLOS Comput. Biol. 14(4):e1006112
112. Nimrod G, Fischman S, Austin M, Herman A, Keyes F, et al. 2018. Computational design of epitope-
specific functional antibodies. Cell Rep. 25(8):2121–31.e5
113. Baran D, Pszolla MG, Lapidoth GD, Norn C, Dym O, et al. 2017. Principles for computational design
of binding antibodies. PNAS 114(41):10900–5
114. Myung Y, Pires DEV, Ascher DB. 2020. mmCSM-AB: guiding rational antibody engineering through
multiple point mutations. Nucleic Acids Res. 48(W1):W125–31
115. Fowler DM, Fields S. 2014. Deep mutational scanning: a new style of protein science. Nat. Methods
11(8):801–7
116. Rollins NJ, Brock KP, Poelwijk FJ, Stiffler MA, Gauthier NP, et al. 2019. Inferring protein 3D structure
from deep mutation scans. Nat. Genet. 51(7):1170–76
117. Riesselman AJ, Ingraham JB, Marks DS. 2018. Deep generative models of genetic variation capture the
effects of mutations. Nat. Methods 15(10):816–22
118. Mason DM, Friedensohn S, Weber CR, Jordi C, Wagner B, et al. 2021. Optimization of therapeutic
antibodies by predicting antigen specificity from antibody sequence via deep learning. Nat. Biomed. Eng.
https://doi.org/10.1038/s41551-021-00699-9
119. Liu G, Zeng H, Mueller J, Carter B, Wang Z, et al. 2020. Antibody complementarity determining region
design using high-capacity machine learning. Bioinformatics 36(7):2126–33
120. Obrezanova O, Arnell A, de la Cuesta RG, Berthelot ME, Gallagher TRA, et al. 2015. Aggregation risk
prediction for antibodies and its application to biotherapeutic development. mAbs 7(2):352–63
121. Jain T, Boland T, Lilov A, Burnina I, Brown M, et al. 2017. Prediction of delayed retention of antibod-
ies in hydrophobic interaction chromatography from sequence using machine learning. Bioinformatics
33(23):3758–66
122. Raybould MIJ, Marks C, Krawczyk K, Taddese B, Nowak J, et al. 2019. Five computational developability
guidelines for therapeutic antibody profiling. PNAS 116(10):4025–30
123. Clavero-Álvarez A, Di Mambro T, Perez-Gaviro S, Magnani M, Bruscolini P. 2018. Humanization of
antibodies using a statistical inference approach. Sci. Rep. 8:14820
124. Wollacott AM, Xue C, Qin Q, Hua J, Bohnuud T, et al. 2019. Quantifying the nativeness of antibody
sequences using long short-term memory networks. Protein Eng. Des. Sel. 32(7):347–54
125. Amimeur T, Shaver JM, Ketchem RR, Taylor JA. 2020. Designing feature-controlled humanoid anti-
body discovery libraries using generative adversarial networks. bioRxiv 2020.04.12.024844. https://doi.
org/10.1101/2020.04.12.024844
126. Mei S, Li F, Leier A, Marquez-Lago TT, Giam K, et al. 2020. A comprehensive review and perfor-
mance evaluation of bioinformatics tools for HLA class I peptide-binding prediction. Brief. Bioinform.
21(4):1119–35
127. Nielsen M, Andreatta M, Peters B, Buus S. 2020. Immunoinformatics: predicting peptide-MHC binding.
Annu. Rev. Biomed. Data Sci. 3:191–215
128. Nielsen M, Lundegaard C, Lund O, Keşmir C. 2005. The role of the proteasome in generating cytotoxic
T-cell epitopes: insights obtained from improved predictions of proteasomal cleavage. Immunogenetics
57(1–2):33–41
129. Tenzer S, Peters B, Bulik S, Schoor O, Lemmel C, et al. 2005. Modeling the MHC class I pathway by
combining predictions of proteasomal cleavage, TAP transport and MHC class I binding. Cell Mol. Life
Sci. 62(9):1025–37
130. Stranzl T, Larsen MV, Lundegaard C, Nielsen M. 2010. NetCTLpan: pan-specific MHC class I pathway
epitope predictions. Immunogenetics 62(6):357–68
131. Schneidman-Duhovny D, Khuri N, Dong GQ, Winter MB, Shifrut E, et al. 2018. Predicting CD4
T-cell epitopes based on antigen cleavage, MHCII presentation, and TCR recognition. PLOS ONE
13(11):e0206654
132. Yewdell JW, Bennink JR. 1999. Immunodominance in major histocompatibility complex class I-
restricted T lymphocyte responses. Annu. Rev. Immunol. 17:51–88
133. Garstka MA, Fish A, Celie PHN, Joosten RP, Janssen GMC, et al. 2015. The first step of peptide selec-
tion in antigen presentation by MHC class I molecules. PNAS 112(5):1505–10
134. Toes RE, Nussbaum AK, Degermann S, Schirle M, Emmerich NP, et al. 2001. Discrete cleavage motifs