Figure 1: A ProtoTree is a globally interpretable model faithfully explaining its entire reasoning (left, partially shown). Additionally, the decision-making process for a single prediction can be followed (right): the presence of a red chest and a black wing, and the absence of a black stripe near the eye, identify a Scarlet Tanager. A pruned ProtoTree learns roughly 200 prototypes for CUB (a dataset with 200 bird species), making only 8 local decisions on average for one test image.
Abstract

Prototype-based methods use interpretable representations to address the black-box nature of deep learning models, in contrast to post-hoc explanation methods that only approximate such models. We propose the Neural Prototype Tree (ProtoTree), an intrinsically interpretable deep learning method for fine-grained image recognition. ProtoTree combines prototype learning with decision trees, and thus results in a globally interpretable model by design. Additionally, ProtoTree can locally explain a single prediction by outlining a decision path through the tree. Each node in our binary tree contains a trainable prototypical part. The presence or absence of this learned prototype in an image determines the routing through a node. Decision making is therefore similar to human reasoning: Does the bird have a red throat? And an elongated beak? Then it's a hummingbird! We tune the accuracy-interpretability trade-off using ensemble methods, pruning and binarizing. We apply pruning without sacrificing accuracy, resulting in a small tree with only 8 learned prototypes along a path to classify a bird from 200 species. An ensemble of 5 ProtoTrees achieves competitive accuracy on the CUB-200-2011 and Stanford Cars data sets. Code is available at github.com/M-Nauta/ProtoTree.

1. Introduction

There is an ongoing scientific dispute between simple, interpretable models and complex black boxes, such as Deep Neural Networks (DNNs). DNNs have achieved superior performance, especially in computer vision, but their complex architectures and high-dimensional feature spaces have led to an increasing demand for transparency, interpretability and explainability [1], particularly in domains with high-stakes decisions [43]. In contrast, decision trees are easy to understand and interpret [14, 19], because they transparently arrange decision rules in a hierarchical structure. Their predictive performance is, however, far from competitive for computer vision tasks. We address this so-called 'accuracy-interpretability trade-off' [1, 35] by combining the expressiveness of deep learning with the interpretability of decision trees.

We present the Neural Prototype Tree, ProtoTree in short, an intrinsically interpretable method for fine-grained image recognition. A ProtoTree has the representational power of a neural network, and contains a built-in binary decision tree structure, as shown in Fig. 1 (left). Each internal node in the tree contains a trainable prototype. Our prototypes are prototypical parts learned with backpropagation, as introduced in the Prototypical Part Network (ProtoPNet) [9], where a prototype is a trainable tensor that can be visualized as a patch of a training sample. The extent to which this prototype is present in an input image determines the routing of the image through the corresponding node. Leaves of the ProtoTree learn class distributions. The paths from root to leaves represent the learned classification rules. The reasoning of our model is thus similar to the …

[Figure 2: ProtoPNet-style local explanation of a test image. Columns: Test image, Prototype, Similarity score, Weight, Points (e.g. 4.17 × 1.28 = 5.34); weighted similarity scores are summed per class, e.g. total points Lazuli Bunting = 36.16 vs. Indigo Bunting = 21.86.]
2. Related Work

2.1. Prototypes

… fine-grained image classification. ProtoPNet learns a predetermined number of prototypical parts (prototypes) per class. To classify an image, the similarity between a prototype and a patch in the image is calculated by measuring the distance in latent space. The resulting similarity scores are weighted by values learned by a fully-connected layer. The explanation of ProtoPNet shows the reasoning process for a single image by visualizing all prototypes together with their weighted similarity scores. Summing the weighted similarity scores per class gives a final score for the image belonging to each class, as shown in Fig. 2. We improve upon ProtoPNet by showing an easy-to-interpret global explanation by means of a decision tree. Such a hierarchical, logical model aids interpretability [14, 43], since a tree has various conceptual advantages compared to a linear bag-of-prototypes: a tree enforces a sequence of steps and supports negative associations (i.e. the absence of a prototype), thereby reducing the number of prototypes and better mimicking human reasoning. The hierarchical structure therefore enhances interpretability and could also lead to more insights w.r.t. clusters in the data. Instead of multiplying similarity scores with weights, our local explanation shows the routing of a sample through the tree. Furthermore, we do not have class-specific prototypes, we do not need to learn weights for similarity scores, and we use a simple cross-entropy loss function.
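For concreteness, ProtoPNet's linear "bag-of-prototypes" aggregation can be sketched in a few lines of PyTorch (our illustration with hypothetical tensor names, not ProtoPNet's actual code):

```python
import torch

def protopnet_class_scores(similarity: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Sum of weighted similarity scores per class (cf. Fig. 2).

    similarity: (M,) similarity of each of M prototypes to its best image patch.
    weight:     (M, K) fully-connected weights from prototypes to K classes.
    Returns (K,) total points per class; the class with the highest total wins.
    """
    return similarity @ weight

# toy example with 4 prototypes and 2 classes
sim = torch.tensor([4.17, 3.98, 2.65, 1.78])
w = torch.tensor([[1.28, 0.0], [1.29, 0.0], [0.0, 1.35], [0.0, 1.02]])
print(protopnet_class_scores(sim, w))  # ≈ tensor([10.47, 5.39])
```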
2.2. Neural Soft Decision Trees

Soft Decision Trees (SDTs) have been shown to be more accurate than traditional hard decision trees [23, 24, 46]. Only recently have deep neural networks been integrated into binary SDTs. The Deep Neural Decision Forest (DNDF) [29] is an ensemble of neural SDTs: a neural network learns a latent representation of the input, and each node learns a routing function. Adaptive Neural Trees (ANTs) [47] are a generalization of DNDFs. Each node can transform and route its input with a small neural network. In contrast to most SDTs (including ours), which require a fixed tree structure, ANTs greedily learn a binary tree topology. Such greedy algorithms, however, can lead to suboptimal trees [40], and are only applied to simple classification problems such as MNIST. Furthermore, the above methods lose the main attractive property of decision trees: interpretability. DNDFs can be locally interpreted by visualizing a path of saliency maps [33], as shown in Fig. 3a. Frosst & Hinton [15] train a perceptron for each node and visualize the learned weights (Fig. 3b). The limited representational power of perceptrons, however, leads to suboptimal classification results. The approach in [22] makes SDTs deterministic at test time, and its linear split parameters can be visualized and enhanced with a spatial regularization term (Fig. 3c). In contrast to these interpretable methods, we apply our method to natural images for fine-grained image recognition, and our visualizations are sharp and full-color, thereby improving interpretability (Fig. 3d). Instead of image recognition, Neural-Backed Decision Trees for Segmentation [52] use visual decision rules with saliency maps for segmentation.

Figure 3: Visualized root node from soft decision trees: (a) DNDF [33], (b) SDT [15], (c) SDT [22], (d) ours, applied to CIFAR10, MNIST, FashionMNIST and CUB, respectively. Republished with permission from the authors (a-c).

Other tree approaches for image classification apply post-hoc explanation techniques: showing example images per node [2, 55], visualizing typical CNN filters of each node that can be manually labelled [55], showing class activation maps [25], or manually inspecting leaf labels and the meaning of internal nodes [51]. We extend prior work by including prototypes in a tree structure, thereby obtaining a globally explainable, intrinsically interpretable model with only one decision per node. Additionally, similar to ProtoPNet [9], a ProtoTree does not require manual labelling and is therefore self-explanatory. Our work differs from hierarchical image classification (e.g., a gibbon is an animal and a primate), such as [20], since we do not require hierarchical labels or a predefined taxonomy.

3. Neural Prototype Tree

A Neural Prototype Tree (ProtoTree) hierarchically routes an image through a binary tree for interpretable image recognition. We now formalise the definition of a ProtoTree for supervised learning. Consider a classification problem with training set T containing N labelled images {(x⁽¹⁾, y⁽¹⁾), ..., (x⁽ᴺ⁾, y⁽ᴺ⁾)} ∈ X × Y. Given an input image x, a ProtoTree predicts the class probability distribution over K classes, denoted as ŷ. We use y to denote the one-hot encoded ground-truth label, such that we can train a ProtoTree by minimizing the cross-entropy loss between y and ŷ. A ProtoTree can also be trained with soft labels from a trained model for knowledge distillation, similar to [15].

Figure 4: Decision making process of a ProtoTree to predict the class probability distribution ŷ of input image x. During training, prototypes pn ∈ P, the leaves' class distributions c, and CNN parameters ω are learned. Probabilities pe (shown with example values) depend on the similarity between a patch in the latent input image and a prototype.
A ProtoTree T is a combination of a convolutional neural network (CNN) f with a soft neural binary decision tree structure. As shown in Fig. 4, an input image is first forwarded through f. The resulting convolutional output z = f(x; ω) consists of D two-dimensional (H × W) feature maps, where ω denotes the trainable parameters of f. Secondly, the latent representation z ∈ R^(H×W×D) serves as input for a binary tree. This tree consists of a set of internal nodes N, a set of leaf nodes L, and a set of edges E. Each internal node n ∈ N has exactly two child nodes: n.left, connected by edge e(n, n.left) ∈ E, and n.right, connected by e(n, n.right) ∈ E. Each internal node n ∈ N corresponds to a trainable prototype pn ∈ P. We follow the prototype definition of ProtoPNet [9], where each prototype is a trainable tensor of shape H1 × W1 × D (with H1 ≤ H, W1 ≤ W, and in our implementation H1 = W1 = 1), such that the prototype's depth corresponds to the depth of the convolutional output z.

We use a form of generalized convolution without bias [17], where each prototype pn ∈ P acts as a kernel that 'slides' over z of shape H × W × D and computes the Euclidean distance between pn and its current receptive field z̃ (called a patch). We apply a minimum pooling operation to select the patch in z of shape H1 × W1 × D that is closest to prototype pn:

z̃* = argmin_{z̃ ∈ patches(z)} ||z̃ − pn||.   (1)

The distance between the nearest latent patch z̃* and prototype pn determines to what extent the prototype is present anywhere in the input image, which influences the routing of z through the corresponding node n. In contrast to traditional decision trees, where an internal node routes sample z either right or left, our node n ∈ N is soft and routes z to both children, each with a fuzzy weight within [0, 1], giving it a probabilistic interpretation [15, 23, 29, 47]. Following this probabilistic terminology, we define the similarity between z̃* and pn, and therefore the probability of routing sample z through the right edge, as

pe(n,n.right)(z) = exp(−||z̃* − pn||),   (2)

such that pe(n,n.left) = 1 − pe(n,n.right). Thus, the similarity between prototype pn and the nearest patch in the convolutional output, z̃*, determines to what extent z is routed to the right child of node n. Because of the soft routing, z is traversed through all edges and ends up in each leaf node ℓ ∈ L with a certain probability. Path Pℓ denotes the sequence of edges from the root node to leaf ℓ. The probability of sample z arriving in leaf ℓ, denoted as πℓ, is the product of the probabilities of the edges in path Pℓ:

πℓ(z) = ∏_{e ∈ Pℓ} pe(z).   (3)

Each leaf node ℓ ∈ L carries a trainable parameter cℓ, denoting the distribution over the K classes in that leaf, which needs to be learned. The softmax function σ(cℓ) normalizes cℓ to get the class probability distribution of leaf ℓ. To obtain the final predicted class probability distribution ŷ for input image x, latent representation z = f(x; ω) is traversed through all edges in T such that all leaves contribute to the final prediction ŷ. An example prediction is shown on the right of Fig. 4. The contribution of leaf ℓ is weighted by its path probability πℓ, such that

ŷ(x) = Σ_{ℓ ∈ L} σ(cℓ) · πℓ(f(x; ω)).   (4)
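The routing and prediction pipeline of Eqs. 1–4 can be summarized in a short PyTorch sketch. This is our illustration under simplifying assumptions — a perfect binary tree stored in breadth-first arrays and 1×1 prototypes — not the released implementation at github.com/M-Nauta/ProtoTree:

```python
import torch
import torch.nn as nn

class ProtoTreeSketch(nn.Module):
    """Perfect binary tree of height h: internal nodes 0..2^h-2 hold one
    1x1xD prototype each; node i has children 2i+1 (left) and 2i+2 (right)."""

    def __init__(self, f: nn.Module, num_classes: int, h: int, D: int):
        super().__init__()
        self.f = f                                   # CNN backbone: (B,3,·,·) -> (B,D,H,W)
        self.n_internal = 2 ** h - 1
        self.prototypes = nn.Parameter(torch.rand(self.n_internal, D))
        # leaf logits c_l; learned derivative-free (Sec. 4), hence no gradient
        self.leaf_logits = nn.Parameter(torch.zeros(2 ** h, num_classes),
                                        requires_grad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.f(x)                                # (B, D, H, W)
        B, D, H, W = z.shape
        patches = z.permute(0, 2, 3, 1).reshape(B, H * W, D)
        # Eq. 1: Euclidean distance of every prototype to every patch, min-pooled
        dist = torch.cdist(patches, self.prototypes.unsqueeze(0).expand(B, -1, -1))
        d_min = dist.min(dim=1).values               # (B, n_internal)
        p_right = torch.exp(-d_min)                  # Eq. 2; p_left = 1 - p_right
        # Eq. 3: arrival probability of each node; the root has probability 1
        arrive = [torch.ones(B, device=z.device)]
        for i in range(self.n_internal):
            arrive.append(arrive[i] * (1 - p_right[:, i]))   # left child 2i+1
            arrive.append(arrive[i] * p_right[:, i])         # right child 2i+2
        pi = torch.stack(arrive[self.n_internal:], dim=1)    # (B, 2^h) leaf probs
        # Eq. 4: pi-weighted mixture of softmax-normalized leaf distributions
        return pi @ torch.softmax(self.leaf_logits, dim=1)   # (B, K), rows sum to 1

# toy usage with a tiny hypothetical backbone
backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 32, 1), nn.Sigmoid())
tree = ProtoTreeSketch(backbone, num_classes=5, h=3, D=32)
print(tree(torch.randn(2, 3, 32, 32)).sum(dim=1))  # ≈ tensor([1., 1.])
```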
4. Training a ProtoTree

Training a ProtoTree requires learning the parameters ω of CNN f for informative feature maps, the nodes' prototypes P for routing, and the leaves' class distribution logits c for prediction. The number of prototypes to be learned, i.e. |P|, depends on the tree size. A binary tree structure is initialized by defining a maximum height h, which creates 2^h leaves and 2^h − 1 prototypes (e.g., h = 9 gives 512 leaves and 511 prototypes). The computational complexity of learning P therefore grows exponentially with h.

We require a pre-trained CNN f (e.g. pre-trained on ImageNet, or trained on a specific prediction task first). During training, the prototypes in P are trainable tensors. Parameters ω and P are learned simultaneously with back-propagation by minimizing the cross-entropy loss between the predicted class probability distribution ŷ and the ground truth y. The learned prototypes should end up near a latent patch of a training image, such that they can be visualized as an image patch representing a prototypical part (cf. Sec. 5).
Learning leaves' distributions. In a classical decision tree, a leaf label is learned from the samples ending up in that leaf. Since we use a soft tree, learning the leaves' distributions is a global learning problem. Although it is possible to learn c with back-propagation together with ω and P, we found that this gives inferior classification results. We hypothesize that including c in the loss term leads to an overly complex optimization problem. Kontschieder et al. [29] noted that solely optimizing leaf parameters is a convex optimization problem and proposed a derivative-free strategy. Translating their approach to our methodology gives the following update scheme for cℓ for all ℓ ∈ L:

cℓ^(t+1) = Σ_{x,y ∈ T} (σ(cℓ^(t)) ⊙ y ⊙ πℓ) ⊘ ŷ,   (5)

where t indexes a training epoch, ⊙ denotes element-wise multiplication and ⊘ is element-wise division. The result is a vector of size K representing the class distribution in leaf ℓ. This learning scheme is however computationally expensive: at each epoch, first cℓ^(t+1) is computed to update the leaves, and then all other parameters are trained by looping through the data again, meaning that ŷ is computed twice. We propose to do this more efficiently and intertwine mini-batch gradient descent optimization for ω and P with a derivative-free update to learn c, as shown in Alg. 1. Our algorithm has the advantage that each mini-batch update of ω and P is taken into account for updating c^(t+1), which aids convergence. Moreover, computing ŷ only once for each batch roughly halves the training time.

Algorithm 1: Training a ProtoTree
  Input: Training set T, max height h, nEpochs
  1  initialize ProtoTree T with height h and ω, P, c^(1);
  2  for t ∈ {1, ..., nEpochs} do
  3      randomly split T into B mini-batches;
  4      for (x_b, y_b) ∈ {T_1, ..., T_b, ..., T_B} do
  5          ŷ_b = T(x_b);
  6          compute loss(ŷ_b, y_b);
  7          update ω and P with gradient descent;
  8          for ℓ ∈ L do
  9              cℓ^(t+1) −= (1/B) · cℓ^(t);
 10              cℓ^(t+1) += Eq. 5 evaluated on (x_b, y_b);
 11  prune T (optional);
 12  replace each prototype pn ∈ P with its nearest latent patch z̃n* and visualize;
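In PyTorch-style code, one epoch of Alg. 1 could look as follows. This is a sketch under our assumptions — a hypothetical `tree(x)` that also returns the leaf path probabilities π, and `tree.leaf_logits` holding c — not the authors' released code:

```python
import torch
import torch.nn.functional as F

def train_epoch(tree, loader, optimizer):
    """One epoch of Alg. 1 (sketch). Assumes tree(x) returns (y_hat, pi):
    predictions (B, K) and leaf path probabilities (B, L); tree.leaf_logits
    is an (L, K) parameter with requires_grad=False holding c."""
    c_old = tree.leaf_logits.detach().clone()         # c^(t), fixed this epoch
    n_batches = len(loader)                           # B in Alg. 1
    for x, y in loader:
        y_hat, pi = tree(x)                           # line 5
        loss = F.nll_loss(torch.log(y_hat + 1e-8), y) # line 6: cross-entropy
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                              # line 7: update omega and P
        with torch.no_grad():                         # lines 8-10: Eq. 5 per batch
            sigma_c = torch.softmax(c_old, dim=1)                 # (L, K)
            y1h = F.one_hot(y, sigma_c.shape[1]).float()          # (B, K)
            # batch term of Eq. 5: sum_b pi_{b,l} * (y_b / y_hat_b) * sigma(c_l)
            update = sigma_c * torch.einsum('bl,bk->lk', pi, y1h / y_hat)
            tree.leaf_logits -= c_old / n_batches     # line 9
            tree.leaf_logits += update                # line 10
```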
Figure 5: Pruning removes a subtree T′ in which all leaves have a (nearly) uniform distribution, together with its now-superfluous parent.

5. Interpretability and Visualization

To foster global model interpretability, we prune ineffective prototypes, visualize the learned latent prototypes, and convert soft to hard decisions.

5.1. Pruning

Interpretability can be quantified by explanation size [11, 45]. In a ProtoTree T, the explanation size is related to the number of prototypes. To reduce the explanation size, we analyse the learned class probability distributions in the leaves and remove leaves with a nearly uniform distribution, i.e. little discriminative power. Specifically, we define a threshold τ and prune all leaves where max(σ(cℓ)) ≤ τ, with τ slightly greater than 1/K, where K is the number of classes. If all leaves in a full subtree T′ ⊂ T are pruned, T′ (and its prototypes) can be removed. As visualized in Fig. 5, ProtoTree T can then be reorganized by additionally removing the now-superfluous parent of the root of T′.
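A recursive sketch of this pruning step, using hypothetical Node objects rather than the authors' API:

```python
import torch
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    c: Optional[torch.Tensor] = None       # leaf logits c_l (leaves only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def prune(node: Node, tau: float) -> Optional[Node]:
    """Return the pruned (sub)tree, or None if every leaf below `node`
    has max class probability <= tau, i.e. a (nearly) uniform distribution."""
    if node.left is None:                  # leaf node
        return node if torch.softmax(node.c, dim=0).max() > tau else None
    left, right = prune(node.left, tau), prune(node.right, tau)
    if left is None and right is None:     # whole subtree T' is prunable
        return None
    if left is None:                       # drop T' = left subtree; the now-
        return right                       # superfluous node is removed and
    if right is None:                      # the sibling takes its place (Fig. 5)
        return left
    node.left, node.right = left, right
    return node

# for K classes, choose tau slightly greater than 1/K, e.g. tau = 1.01 / K
```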
5.2. Prototype Visualization

Learned latent prototypes need to be mapped to pixel space to enable interpretability. Similar to ProtoPNet [9], we replace each prototype pn ∈ P with its nearest latent patch present in the training data, z̃n*. Thus,

pn ← z̃n*,  z̃n* = argmin_{z̃ ∈ patches(z), z ∈ {f(x), ∀x ∈ T}} ||z̃ − pn||,   (6)

such that prototype pn is equal to latent representation z̃n*. Whereas ProtoPNet replaces its prototypes every 10th epoch during training, prototype replacement after training is sufficient for a ProtoTree, since our routing mechanism implicitly optimizes prototypes to represent a certain patch. This reduces computational complexity and simplifies the training process.

We denote by xn* the training image corresponding to the nearest patch z̃n*. Prototype pn can now be visualized as a patch of xn*. We forward xn* through f to create a two-dimensional similarity map containing the similarity score between pn and all patches in z = f(xn*):

Sn^(i,j) = exp(−||z̃^(i,j) − pn||).   (7)
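Eqs. 6 and 7 translate into a straightforward search over the training images. A sketch with hypothetical helpers — `f`, the 1×1×D prototype `p_n`, and the image shapes are our assumptions:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def visualize_prototype(f, p_n, train_images):
    """Replace prototype p_n by its nearest latent training patch (Eq. 6)
    and return the similarity map S_n over that image (Eq. 7)."""
    best_img, best_dist, best_ij = None, float("inf"), None
    for x in train_images:                   # each x: (1, 3, 224, 224)
        z = f(x)                             # (1, D, H, W)
        d = torch.linalg.vector_norm(z - p_n.view(1, -1, 1, 1), dim=1)[0]  # (H, W)
        if d.min() < best_dist:
            best_img, best_dist = x, d.min().item()
            best_ij = divmod(int(d.argmin()), d.shape[1])
    z_star = f(best_img)
    p_new = z_star[0, :, best_ij[0], best_ij[1]]         # nearest patch (Eq. 6)
    S = torch.exp(-torch.linalg.vector_norm(z_star - p_new.view(1, -1, 1, 1), dim=1))
    # upsample the HxW similarity map to pixel space to locate the part in x*_n
    S_img = F.interpolate(S.unsqueeze(1), size=best_img.shape[-2:], mode="bilinear")
    return p_new, best_img, S_img
```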
5.3. Deterministic Reasoning

In a soft decision tree, all nodes contribute to a prediction. In contrast, in a hard, deterministic tree, only the nodes along a path account for the final prediction, making hard decision trees easier to interpret than soft trees [2]. Whereas a ProtoTree is soft during training, we propose two possible strategies to convert a ProtoTree to a hard tree at test time:
1. select the path to the leaf with the highest path probability: argmax_{ℓ ∈ L} πℓ;
2. greedily traverse the tree, i.e. go right at internal node n if pe(n,n.right) > 0.5 and left otherwise.
Sec. 6.2 evaluates to what extent these deterministic strategies influence accuracy compared to soft reasoning; both strategies are sketched below.
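Both strategies are a few lines each, here sketched on the breadth-first array layout assumed earlier (not the released code):

```python
import torch

def greedy_path(p_right: torch.Tensor, h: int):
    """Strategy 2: follow the most likely edge at each internal node.
    p_right: (2^h - 1,) right-edge probabilities for one sample, where
    node i has children 2i+1 (left) and 2i+2 (right)."""
    i, path, n_internal = 0, [], 2 ** h - 1
    while i < n_internal:
        go_right = bool(p_right[i] > 0.5)
        path.append((i, "right" if go_right else "left"))
        i = 2 * i + 2 if go_right else 2 * i + 1
    return path, i - n_internal        # decisions taken, leaf index in [0, 2^h)

def max_path_leaf(pi: torch.Tensor) -> int:
    """Strategy 1: pick the leaf with the highest path probability pi_l."""
    return int(torch.argmax(pi))
```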
6. Experiments

We evaluate the accuracy-interpretability trade-off of a ProtoTree, and compare our ProtoTrees with ProtoPNet [9] and state-of-the-art models in terms of accuracy and interpretability. We evaluate on CUB-200-2011 [50] with 200 bird species (CUB) and Stanford Cars [30] with 196 car types (CARS), since both were used by ProtoPNet [9].

6.1. Experimental Setup

We implemented ProtoTree in PyTorch. Our CNN f contains the convolutional layers of ResNet50 [21], pretrained on ImageNet [10] for CARS. For CUB, ResNet50 is pretrained on iNaturalist2017 [49], which contains plants and animals and is therefore a suitable source domain [32], using the backbone of [59]. Backbone f is frozen for 30 epochs, after which f is optimized jointly with the prototypes using Adam [28]. For a fair comparison with ProtoPNet [9], we resize all images to 224 × 224 such that the resulting feature maps are 7 × 7. The CNN architecture is extended with a 1 × 1 convolutional layer¹ to reduce the dimensionality of the latent output z to D, the prototype depth. Based on cross-validation over {128, 256, 512}, we used D=256 for CUB and D=128 for CARS. Similar to ProtoPNet, H1=W1=1 to provide well-sized patches, such that a prototype is of size 1 × 1 × 256 for CUB. We use ReLU as the activation function, except for the last layer, which has a Sigmoid function that acts as a form of normalization. We initialize the prototypes by sampling from N(0.5, 0.1). The initial leaf distributions σ(cℓ^(1)) are uniform, obtained by initializing cℓ^(1) with zeros for all ℓ ∈ L. See Suppl. for all details.

¹ ProtoPNet [9] appends two 1×1 convolutional layers, but in our model this gave varying (and lower) accuracy across runs.

Table 1: Mean accuracy and standard deviation of our ProtoTree (5 runs) and of ensembles of 3 or 5 ProtoTrees, compared with the self-reported accuracy of uninterpretable state-of-the-art models² (-), attention-based models (o) and the interpretable ProtoPNet (+, with ResNet34 backbone).

Dataset | Method | Interpretability | Top-1 Accuracy | #Prototypes
… | Triplet Model [34] | - | 87.5 | n.a.
… | TranSlider [58] | - | 85.8 | n.a.
… | … | … | … | …

² Using higher-resolution images (e.g. 448 × 448) has been shown to give better results [48, 57], with e.g. accuracy up to 90.4% [16] for CUB.

6.2. Accuracy and Interpretability

Table 1 compares the accuracy of ProtoTrees with state-of-the-art methods. Our ProtoTree outperforms ProtoPNet on both datasets. We also evaluated the accuracy of ProtoTree ensembles by averaging the predictions of 3 or 5 individual ProtoTrees, all trained on the same dataset. An ensemble of ProtoTrees outperforms a ProtoPNet ensemble, and approximates the accuracy of uninterpretable or attention-based methods, while providing intrinsically interpretable global and faithful local explanations.
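Ensembling reduces to averaging the predicted class probability distributions (a one-line sketch):

```python
import torch

def ensemble_predict(trees, x):
    """Average the class probability distributions of several ProtoTrees."""
    return torch.stack([tree(x) for tree in trees]).mean(dim=0)
```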
Dataset | K | h | Initial Acc | Acc pruned | Acc pruned+repl. | # Prototypes | % Pruned | Distance z̃n*
CUB | 200 | 9 | 82.206 ± 0.723 | 82.192 ± 0.723 | 82.199 ± 0.726 | 201.6 ± 1.9 | 60.5 | 0.0020 ± 0.0068
CARS | 196 | 11 | 86.584 ± 0.250 | 86.576 ± 0.245 | 86.576 ± 0.245 | 195.4 ± 0.5 | 90.5 | 0.0005 ± 0.0016

Table 2: Impact of pruning and prototype replacement: 1) accuracy before pruning and replacement, 2) accuracy after pruning, 3) accuracy after pruning and replacement, 4) number of prototypes after pruning, 5) fraction of prototypes that is pruned, and 6) Euclidean distance from each latent prototype to its nearest latent training patch (after pruning). Showing mean and standard deviation across 5 runs.
Strategy | Accuracy | Fidelity | Path length
Soft | 82.20 ± 0.01 | n.a. | n.a.
… | … | … | …

[Figure: plot with Accuracy on the y-axis (80%–100%).]
Figure 8: Subtree of an automatically visualized ProtoTree trained on CUB, h=8 (middle). Each internal node contains
a prototype (left) and the training image from which it is extracted (right). A ProtoTree faithfully shows its reasoning and
clusters similar classes (e.g. birds with a white chest). Top left: maximum values of all leaf distributions. Top right: ProtoTree
reveals biases learned by the model: e.g. classifying a Gray Catbird based on the presence of a leaf. Best viewed in color.
Figure 9: Local explanation that shows the greedy path when classifying a White-breasted Nuthatch. Prototypes found in the
test image are: a dark-colored tail, a contrastive eye, sky background (learned bias), a white chest and a blue-grey wing.
References

[1] Amina Adadi and Mohammed Berrada. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6:52138–52160, 2018.
[2] Stephan Alaniz and Zeynep Akata. Explainable observer-classifier for explainable binary decisions. arXiv preprint arXiv:1902.01780, 2019.
[3] David Alvarez-Melis and Tommi S. Jaakkola. On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049, 2018.
[4] Plamen Angelov and Eduardo Soares. Towards deep machine reasoning: a prototype-based deep neural network with decision tree inference. In 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 2092–2099. IEEE, 2020.
[5] Sercan O. Arik and Tomas Pfister. ProtoAttend: Attention-based prototypical learning. arXiv preprint arXiv:1902.06292, 2019.
[6] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE, 10(7):1–46, 2015.
[7] Irving Biederman. Recognition-by-components: a theory of human image understanding. Psychological Review, 94(2):115, 1987.
[8] Wieland Brendel and Matthias Bethge. Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 2019. OpenReview.net.
[9] Chaofan Chen, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, and Jonathan K. Su. This looks like that: Deep learning for interpretable image recognition. In Advances in Neural Information Processing Systems 32, pages 8930–8941. Curran Associates, Inc., 2019.
[10] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[11] Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, 2017.
[12] Alexey Dosovitskiy and Thomas Brox. Inverting visual representations with convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[13] Ruth C. Fong and Andrea Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017.
[14] Alex A. Freitas. Comprehensible classification models: A position paper. SIGKDD Explorations Newsletter, 15(1):1–10, March 2014.
[15] Nicholas Frosst and Geoffrey Hinton. Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784, 2017.
[16] Weifeng Ge, Xiangru Lin, and Yizhou Yu. Weakly supervised complementary parts models for fine-grained image classification from the bottom up. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[17] Kamaledin Ghiasi-Shirazi. Generalizing the convolution operator in convolutional neural networks. Neural Processing Letters, 50(3):2627–2646, 2019.
[18] Riccardo Guidotti, Anna Monreale, Stan Matwin, and Dino Pedreschi. Black box explanation by learning image exemplars in the latent feature space. In Machine Learning and Knowledge Discovery in Databases, pages 189–205, Cham, 2020. Springer International Publishing.
[19] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5):1–42, 2018.
[20] Peter Hase, Chaofan Chen, Oscar Li, and Cynthia Rudin. Interpretable image recognition with hierarchical prototypes. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, volume 7, pages 32–40, 2019.
[21] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[22] Thomas M. Hehn, Julian F. P. Kooij, and Fred A. Hamprecht. End-to-end learning of decision trees and forests. International Journal of Computer Vision, pages 1–15, 2019.
[23] Ozan Irsoy, Olcay Taner Yıldız, and Ethem Alpaydın. Soft decision trees. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), pages 1819–1822. IEEE, 2012.
[24] Ozan Irsoy, Olcay Taner Yildiz, and Ethem Alpaydin. Budding trees. In 2014 22nd International Conference on Pattern Recognition, pages 3582–3587. IEEE, 2014.
[25] Ruyi Ji, Longyin Wen, Libo Zhang, Dawei Du, Yanjun Wu, Chen Zhao, Xianglong Liu, and Feiyue Huang. Attention convolutional binary neural tree for fine-grained visual categorization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10468–10477, 2020.
[26] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In Proceedings of Machine Learning Research, volume 80, pages 2668–2677. PMLR, 2018.
[27] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. The (Un)reliability of Saliency Methods, pages 267–280. Springer International Publishing, Cham, 2019.
[28] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[29] Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. Deep neural decision forests. In The IEEE International Conference on Computer Vision (ICCV), December 2015.
[30] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3D object representations for fine-grained categorization. In 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia, 2013.
[31] Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, and Marcin Detyniecki. The dangers of post-hoc interpretability: Unjustified counterfactual explanations. arXiv preprint arXiv:1907.09294, 2019.
[32] Hao Li, Pratik Chaudhari, Hao Yang, Michael Lam, Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto. Rethinking the hyperparameters for fine-tuning. arXiv preprint arXiv:2002.11770, 2020.
[33] Shichao Li and Kwang-Ting Cheng. Visualizing the decision-making process in deep neural decision forest. In CVPR Workshops, pages 114–117, 2019.
[34] J. Liang, J. Guo, Y. Guo, and S. Lao. Adaptive triplet model for fine-grained visual categorization. IEEE Access, 6:76776–76786, 2018.
[35] Zachary C. Lipton. The mythos of model interpretability. Queue, 16(3):30:31–30:57, June 2018.
[36] X. Ma and A. Boukerche. An AI-based visual attention model for vehicle make and model recognition. In 2020 IEEE Symposium on Computers and Communications (ISCC), pages 1–6, 2020.
[37] Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15, 2018.
[38] Meike Nauta, Annemarie Jutte, Jesper Provoost, and Christin Seifert. This looks like that, because ... explaining prototypes for interpretable image recognition, 2020.
[39] Anh Nguyen, Alexey Dosovitskiy, Jason Yosinski, Thomas Brox, and Jeff Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In Advances in Neural Information Processing Systems 29, pages 3387–3395. Curran Associates, Inc., 2016.
[40] Mohammad Norouzi, Maxwell Collins, Matthew A. Johnson, David J. Fleet, and Pushmeet Kohli. Efficient non-greedy optimization of decision trees. In Advances in Neural Information Processing Systems, pages 1729–1737, 2015.
[41] Chris Olah, Alexander Mordvintsev, and Ludwig Schubert. Feature visualization. Distill, 2017. https://distill.pub/2017/feature-visualization.
[42] Andrew Slavin Ross, Michael C. Hughes, and Finale Doshi-Velez. Right for the right reasons: Training differentiable models by constraining their explanations. arXiv preprint arXiv:1703.03717, 2017.
[43] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
[44] Sascha Saralajew, Lars Holdijk, Maike Rees, Ebubekir Asan, and Thomas Villmann. Classification-by-components: Probabilistic modeling of reasoning over a set of components. In Advances in Neural Information Processing Systems 32, pages 2792–2803. Curran Associates, Inc., 2019.
[45] Wilson Silva, Kelwin Fernandes, Maria J. Cardoso, and Jaime S. Cardoso. Towards complementary explanations using deep neural networks. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 133–140, Cham, 2018. Springer International Publishing.
[46] Alberto Suárez and James F. Lutsko. Globally optimal fuzzy decision trees for classification and regression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(12):1297–1311, 1999.
[47] Ryutaro Tanno, Kai Arulkumaran, Daniel Alexander, Antonio Criminisi, and Aditya Nori. Adaptive neural trees. In Proceedings of Machine Learning Research, volume 97, pages 6166–6175. PMLR, 2019.
[48] Hugo Touvron, Andrea Vedaldi, Matthijs Douze, and Hervé Jégou. Fixing the train-test resolution discrepancy. In Advances in Neural Information Processing Systems 32, pages 8252–8262. Curran Associates, Inc., 2019.
[49] Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The iNaturalist species classification and detection dataset. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[50] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.
[51] Alvin Wan, Lisa Dunlap, Daniel Ho, Jihan Yin, Scott Lee, Henry Jin, Suzanne Petryk, Sarah Adel Bargal, and Joseph E. Gonzalez. NBDT: Neural-backed decision trees. arXiv preprint arXiv:2004.00221, 2020.
[52] Alvin Wan, Daniel Ho, Younjin Song, Henk Tillman, Sarah Adel Bargal, and Joseph E. Gonzalez. SegNBDT: Visual decision rules for segmentation. arXiv preprint arXiv:2006.06868, 2020.
[53] Chih-Kuan Yeh, Joon Kim, Ian En-Hsu Yen, and Pradeep K. Ravikumar. Representer point selection for explaining deep neural networks. In Advances in Neural Information Processing Systems 31, pages 9291–9301. Curran Associates, Inc., 2018.
[54] Matthew D. Zeiler and Rob Fergus. Visualizing and under-
standing convolutional networks. In David Fleet, Tomas Pa-
jdla, Bernt Schiele, and Tinne Tuytelaars, editors, Computer
Vision – ECCV 2014, pages 818–833, Cham, 2014. Springer
International Publishing.
[55] Quanshi Zhang, Yu Yang, Haotian Ma, and Ying Nian Wu.
Interpreting CNNs via decision trees. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recogni-
tion, pages 6261–6270, 2019.
[56] Heliang Zheng, Jianlong Fu, Tao Mei, and Jiebo Luo. Learn-
ing multi-attention convolutional neural network for fine-
grained image recognition. In Proceedings of the IEEE Inter-
national Conference on Computer Vision (ICCV), Oct 2017.
[57] Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, and Jiebo Luo.
Looking for the devil in the details: Learning trilinear atten-
tion sampling network for fine-grained image recognition. In
Proceedings of the IEEE/CVF Conference on Computer Vi-
sion and Pattern Recognition (CVPR), June 2019.
[58] Kuo Zhong, Ying Wei, Chun Yuan, Haoli Bai, and Junzhou
Huang. TranSlider: Transfer ensemble learning from ex-
ploitation to exploration. In Proceedings of the 26th ACM
SIGKDD International Conference on Knowledge Discov-
ery & Data Mining, KDD ’20, page 368–378, New York,
NY, USA, 2020. Association for Computing Machinery.
[59] Boyan Zhou, Quan Cui, Xiu-Shen Wei, and Zhao-Min
Chen. BBN: Bilateral-branch network with cumulative learn-
ing for long-tailed visual recognition. In Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 9719–9728, 2020.
[60] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva,
and Antonio Torralba. Learning deep features for discrimi-
native localization. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), June
2016.
[61] Bolei Zhou, Yiyou Sun, David Bau, and Antonio Torralba.
Interpretable basis decomposition for visual explanation. In
Proceedings of the European Conference on Computer Vi-
sion (ECCV), September 2018.