This article has been accepted for publication in IEEE Transactions on Nuclear Science.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TNS.2024.3513774

IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. XX, NO. XX, XXXX 2024 1

Vision Transformer Reliability Evaluation on the Coral Edge TPU
Bruno Loureiro Coelho, Pablo R. Bodmann, Niccolò Cavagnero, Christopher Frost, and Paolo Rech

Abstract—Vision transformers (ViTs) outperform convolutional neural networks (CNNs) in tasks such as image classification, and, despite their high computational complexity, can still be mapped to low-power Edge AI accelerators, such as the Coral Tensor Processing Unit (TPU). In this paper, through accelerated neutron beam experiments, we study the reliability of six ViTs and four micro-benchmarks on the Coral TPU. According to our data, the internal size of attention heads (the main computational block in ViTs) has negligible impact on the FIT rate of the model compared to increasing the number of heads in the model; furthermore, our results show that employing convolutions in the patch embedding reduces the FIT rate of the model. Additionally, we decompose ViT into four basic computational blocks which represent the main operators of the model, showing that, although the transformer layer (with multi-head self-attention and multi-layer perceptron) presents the highest FIT rate, it is actually the patch embedding that is more likely to cause misclassifications. These results can be leveraged to design hardening techniques that improve the resilience of the critical blocks of a ViT, identified in our evaluation, while minimizing the additional overhead.

I. INTRODUCTION

Processing visual information is a key task in applications such as self-driving cars, airplanes, space probes, and Unmanned Aerial Vehicles (UAVs), where reliable computing is also crucial [1]. Until recently, convolutional neural networks (CNNs) were the main approach to detect or classify objects in an image or video. However, the accuracy of CNN-based detection is bounded by an intrinsic limitation due to the very nature of the convolution operation: being a local operator performed as a sliding window over the input image, the network can only extract information from pixels that are spatially close to each other. Attempts to increase the receptive field of CNNs [2], [3] have shown improvements in global reasoning capabilities at the expense of efficiency. Nonetheless, these approaches either introduce a significant information bottleneck [2] or they enlarge the kernel size [3] without truly achieving a global receptive field. Fortunately, researchers have recently developed a new architecture capable of correlating input information on a global scale: the transformer model.

Transformers are a type of deep learning (DL) model architecture originally introduced in natural language processing (NLP), where they revolutionized the field. More recently, the transformer architecture has been successfully applied to image and video processing, being named vision transformers (ViTs). ViTs leverage the concept of attention, which allows a global processing of information from all over the image, overcoming the spatially local receptive field of CNNs and resulting in a higher accuracy. Interestingly, transformers, despite having a more complex architecture with respect to CNNs, can also be deployed in embedded applications with strict energy, weight, and space constraints. In this paper, we study the reliability of transformer models on low-power and low-cost commercial-off-the-shelf (COTS) accelerators, such as the Coral Edge Tensor Processing Unit (TPU), a device capable of processing neural networks in an extremely cost-effective and energy-efficient manner.

While the effect of radiation on CNNs executed on TPUs has already been studied [4], [5], to the best of our knowledge, this is the first paper investigating the impact of atmospheric neutrons on the reliability of transformers running on TPUs. In order to provide a complete and accurate reliability overview, we consider six different vision transformer models: Compact Convolutional Transformer (CCT) [6], two standard vision transformers (ViT) [7] (one with 8 attention heads and 8x8 patches, and another with 16 heads and 16x16 patches), and three EfficientFormers [8] (with increasing internal sizes, named L1, L3 and L7). Our data shows that CCT has the lowest FIT rate, suggesting a reliability benefit in adopting convolution. Additionally, the FIT rate of the EfficientFormers does not depend on the model size, whereas ViT-16 has a 5× higher FIT rate compared to the smaller ViT-8.

Additionally, to better understand the main reasons for the observed phenomena, we characterize the reliability of four micro-benchmarks: two single attention heads (one from ViT-8, the other from ViT-16), and the transformer encoders from ViT-8 and ViT-16, respectively. As these micro-benchmarks represent the most characteristic atomic operations of ViTs, they provide insight into how the architecture of each model affects the FIT rate. Furthermore, we evaluated eight additional micro-models that represent the main operations performed by the ViT model: patch embedding, multi-head self-attention, transformer layer, and multi-layer perceptron classification head.

This work was supported in part by the Italian Ministry for University and Research (MUR) through the “Departments of Excellence 2023-27” program (L.232/2016) awarded to the Department of Industrial Engineering and by the European Union’s 2020 research and innovation programme under grant agreement No 101008126, corresponding to the RADNEXT project.
Pablo Rafael Bodmann is with the Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil (e-mail: [email protected]).
Niccolò Cavagnero is with the Department of Control and Computer Engineering of the Politecnico di Torino, Italy (email: [email protected]).
Christopher Frost is with ChipIR, Rutherford Appleton Laboratory, Science and Technology Facility Council, Didcot, UK (email: christo-[email protected]).
Paolo Rech and Bruno Loureiro Coelho are with the Department of Industrial Engineering of the University of Trento, Italy (e-mail: [email protected] and [email protected]).

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Overall, we present experimental data on 18 configurations of vision transformers tested for more than 266 hours of effective neutron irradiation at the ChipIR facility. When scaled to the natural neutron flux at New York City [9], this accounts for more than 258 billion hours (about 29 million years) of neutron exposure. Our results show that the probability of radiation-induced errors affecting the output of a model increases with the model size. Furthermore, the probability of these errors is significantly affected by the complexity of the architecture, with more complex architectures such as the ones used in the EfficientFormers [8] being more susceptible to radiation effects. In addition to characterizing the reliability of different ViTs on the Coral Edge TPU, we identify the most critical blocks of the transformer architecture. Specifically, our experimental data shows that radiation-induced errors on the patch embedding layer of the transformer model are more likely to lead to misclassifications than errors on other layers of the model. Fortunately, our analysis has also shown that employing convolution operations on the patch embedding layer improves the resilience of the ViT model. These results can be used to design effective selective hardening techniques that improve the overall reliability of the model or to tune existing reliability solutions specifically designed for machine learning models [10].

The remainder of the paper is structured as follows. Section II presents background information and related work. Next, we describe the experimental methodology in Section III. The results of our experiments are discussed in Section IV, where we characterize the reliability of transformer models and micro-benchmarks. Finally, Section V concludes the paper with our final remarks.

II. BACKGROUND & RELATED WORK

In this section, we present background information on the effects of radiation on neural networks and discuss related work. Additionally, we provide details on the vision transformer architecture, and on the Coral Edge TPU.

A. Effects of radiation on neural networks

Radiation-induced transient faults have three possible outcomes: (1) the fault propagates into an error that causes a detected unrecoverable error (DUE): a program crash or hang, thus requiring a restart of the application or the device, (2) the fault propagates through the stack of system layers and leads to a silent data corruption (SDC), affecting the application output, or (3) the application is unaffected (i.e., the fault is masked, or the corrupted data is not used) [11]. The probability of radiation causing SDCs or DUEs depends on a combination of factors, including the hardware architecture (such as the memory/logic sensitivity [12], [13]), and the application [14]. As such, there is a need to study the reliability of a given application implemented on the selected hardware in order to safely deploy the system.

Neural networks (NNs) are being applied to solve various tasks in several fields, such as computer vision and robotics. A neural network is based on artificial neurons, which receive weighted inputs and apply an activation function to produce an output. While each neuron is relatively simple, a large number of neurons in parallel, called a layer, is able to process complex information. Deep learning stacks several of these layers in sequence to build powerful models capable of achieving super-human performance in specific tasks [11].

While NNs can achieve high accuracy in image classification tasks, radiation-induced faults can negatively affect the model by causing SDCs. However, considering that the output of a neural network is probabilistic, the corrupted output can still allow for a correct classification. This can happen, for instance, when the corruption modifies classification probabilities without changing the class with the highest probability. Therefore, SDCs that do not affect the final classification are considered tolerable SDCs. In contrast, some SDCs do change the classification, thus being considered critical SDCs.

B. Vision transformers

Transformers are the current state of the art in machine learning (ML) models, being able to outperform previous architectures in multiple tasks across several fields, such as computer vision and robotics. Vision transformers (ViT) [7] were shown to outperform convolutional neural networks (CNNs), the previously most commonly adopted architecture for image processing. The improvement in accuracy achieved by ViT is in large part due to its ability to process the entire image at once. In contrast, CNNs are intrinsically limited due to convolution being a local operation, thus bounding the maximum achievable accuracy [7].

Figure 1 illustrates a simplified architecture of the standard Vision Transformer [7], which adapts an architecture initially developed for natural language processing tasks to be able to process images. The basic ViT architecture splits the input picture into non-overlapping patches, which are then encoded with information about their spatial position in the image. After this initial encoding, the data is processed by a series of transformer layers, responsible for the extraction of information from the input. Each transformer layer processes the input through a combination of self-attention heads and a multi-layer perceptron (MLP). Each attention head leverages the concept of self-attention to capture both global and local dependencies in the input data. More specifically, the self-attention mechanism weighs the importance of different patches in an image with respect to every other patch. This allows the model to identify and focus on the more complex and relevant relationships in the image. Therefore, the attention head is one of the core components of the ViT architecture, being responsible for identifying the main informative features of the input, thus affecting the final classification. After computing the attention scores, the transformer block leverages an MLP to increase the non-linear fitting capability of the model. This process is repeated for each of the transformer layers in the model, with the output of one layer being used as the input of the next layer. Finally, the output of the last transformer layer is forwarded to a classifier MLP responsible for outputting a prediction (class of an object). For a more in-depth understanding of the vision transformer architecture, we refer the reader to the original Vision Transformer paper [7].
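The patch splitting and self-attention steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`split_into_patches`, `matmul`, `softmax`, `self_attention`, `random_matrix`), the random projection weights, the 16x16 grayscale input, the 4x4 patches, and the head size of 8 are all hypothetical and chosen only to keep the example small. Positional encoding, multiple heads, normalization, and the MLP are omitted.

```python
import math
import random


def split_into_patches(image, patch):
    """Split a 2-D image (list of rows) into flattened, non-overlapping
    patch x patch tiles, returned in row-major order."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[top + i][left + j]
                            for i in range(patch) for j in range(patch)])
    return patches


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all patches."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    k_transposed = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_transposed)]
    # Row i holds the weight patch i assigns to every patch in the image.
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols):
    """Hypothetical stand-in for trained projection weights."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
image = [[random.random() for _ in range(16)] for _ in range(16)]
tokens = split_into_patches(image, patch=4)   # 16 patches of 16 values each
d_in, d_head = len(tokens[0]), 8
w_q, w_k, w_v = (random_matrix(d_in, d_head) for _ in range(3))
attended, weights = self_attention(tokens, w_q, w_k, w_v)
print(len(tokens), len(attended), len(attended[0]))
```

Each row of `weights` sums to one and spans every patch, which is the global receptive field that distinguishes attention from a sliding-window convolution.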

Fig. 1: Architecture of a vision transformer, adapted from [7]. Blocks from input to output: Patch Embedding; Transformer Layers 1 to N, each composed of Normalization, Multi-Head Self-Attention, Normalization, and Multi-Layer Perceptron; a final Normalization; Classifier MLP.

Given the complex and heterogeneous architecture of vision transformers, the effects of radiation may vary depending on the details of the architecture and model configuration. Therefore, considering the high accuracy and popularity of these models, we evaluate the reliability of six different ViT models, and four micro-benchmarks that represent the core of the ViT architecture: single attention heads with two different internal sizes, and transformer encoders with two different configurations. Furthermore, to advance the understanding of the causes of critical SDCs, we evaluate micro-models, i.e., partial models where we can observe intermediate outputs rather than only the final classification. The details of each model, micro-benchmark, and micro-model are more thoroughly described in Section III-B.

As vision transformers process the entire input at once, utilizing ViTs in real-time applications requires considerable computational power. Fortunately, ViTs can be mapped to low-power Edge AI accelerators that allow applications to achieve accurate image classification in a highly efficient manner. By using these devices, it is possible to perform visual tasks in embedded applications that have power constraints and high accuracy requirements, including safety-critical scenarios such as self-driving cars, airplanes, space probes, and nano-sats.

C. Coral Edge TPU

The Coral Edge TPU is a co-processor specialized in accelerating neural networks, making it a promising candidate to be deployed in embedded applications, where power-efficiency and performance are a requirement. While information and documentation about the technology of the TPU is sparse, we can say that this device is likely manufactured in 12 or 16 nm finFET technology, based on data published by Google [15], [16]. A Coral TPU is capable of computing 4 tera-operations per second, with a maximum consumption of 2 W, making it a highly efficient accelerator. To this end, the TPU operates over unsigned 8-bit integers, thus reducing the latency of data transfer between host and TPU, and additionally improving efficiency. As sensor data is usually in floating-point, the input data first goes through a dedicated quantization layer before being processed by the TPU. Once the accelerator finishes its computations, the output then goes through a de-quantization layer before being easily accessible by the host device. As the TPU only operates on unsigned 8-bit integers, the quantization layers are implemented by the host device, which is fortunately a negligible overhead due to the simplicity of the layers.

In our experiments, we adopt a Coral USB accelerator Edge TPU attached to a host Raspberry Pi 4. This setup has two main advantages, as the Raspberry Pi 4 represents a realistic embedded application scenario, while also allowing the neutron beam to target the TPU without irradiating the host device. Thus, this setup enables us to easily evaluate the reliability of the TPU without introducing errors in the host device, as the latter is not irradiated.

Figure 2 shows an overview of the architecture of the Coral Edge TPU, which is composed of a systolic array fed by a large set of input buffers that do not have any kind of error protection. The systolic array applies the model’s weights on the input of each layer before forwarding the results to the activation unit. This unit accumulates the partial sums (of inputs multiplied by weights) then applies the activation function, generating the output of the layer. The output of each layer is then used as the input of the next layer, which repeats the process with the respective weights and activation functions of each layer. After the final layer, the output is sent back to the host device, which applies the de-quantization layer and returns the floating-point output to the application.

Fig. 2: Google’s Coral Edge TPU architecture. Adapted from [17].

In order to accelerate a NN with a TPU, the model must first be converted into an appropriate format. The framework for the Coral Edge TPU leverages TensorFlow and TensorFlowLite, a collection of libraries available in C++ and Python. The workflow necessary to map a NN into the TPU starts with a regular TensorFlow model, in 32-bit floating point precision. Once the model is defined and trained, TensorFlowLite enables the conversion of the model to a quantized version that adopts unsigned 8-bit integers. Finally, this quantized model is then converted into a TPU-compatible model by the Coral TPU compiler, which is provided by Google. Through quantization, vision transformers severely reduce the computational cost on a TPU while maintaining an accuracy comparable with the original ViT. Once a model is successfully converted and compiled to run on the Coral TPU, utilizing the model is a straightforward process. First, the host application (e.g., Python script) instantiates a Coral interpreter, which then loads the already-compiled model. Next, the application utilizes the Coral interpreter API to load the input data into the TPU input buffers, which is achieved in a few lines of code. Finally, the host application requests the Coral interpreter to run inference on the TPU, an API call that returns when the outputs are already available via the interpreter API. At this point, the application can easily access the de-quantized output in floating-point, thus making it readily usable.

D. Related work

The reliability of the Coral Edge TPU has been studied with neutron [18], [19], [20], [21], heavy ion [15], [4], proton [4], [22], and Co-60 [22] experiments with several applications, such as CNNs. Due to the TPU largely being a functional unit, as shown in existing research, radiation-induced errors often manifest as single-event upsets, thus leading to SDCs. Although less frequent, radiation can also cause recoverable or unrecoverable single-event functional interrupts, after which the device respectively can or cannot run more applications without requiring a reboot [15]. While these studies have shown promising results for the deployment of NNs on Edge TPUs, to the best of our knowledge, none of them have evaluated the reliability of vision transformer models. Due to the popularity of vision transformers, some research has been carried out on their reliability [23], [24], [25], [26], but targeting other accelerators, such as GPUs. Hence, this work is the first paper to investigate the impact of atmospheric neutrons on the reliability of vision transformers running on TPUs.

As vision transformers have significantly higher accuracy than CNNs [27], deploying ViTs on a Coral Edge TPU, an extremely power-efficient accelerator, presents opportunities for several applications, such as self-driving cars and space probes.

III. METHODOLOGY

In this section, we describe the experimental setup and the codes used for our evaluation.

A. Neutron beam experiments

The radiation experiments were performed at the ChipIR facility at the Rutherford Appleton Laboratory (RAL) in Didcot, UK. ChipIR delivers a neutron beam suitable to mimic the atmospheric neutron effects in electronic devices [28], allowing the measurement of the Failure In Time (FIT) rate of the device executing a code. Figure 3 shows part of our setup at ChipIR, where we irradiated four TPUs using a 3 × 3 cm beam spot, which is sufficient to irradiate the chip uniformly. The available neutron flux was about 3.5 × 10⁶ n/(cm²·s), allowing us to acquire data equivalent to 1,844 years of natural exposure in only 60 seconds.

Fig. 3: The setup at ChipIR. The Raspberry is out of the picture.

B. Tested codes and experimental setup

In order to provide an in-depth evaluation of vision transformers, we selected six transformer models: ViT-8 and ViT-16, which are TPU versions of the standard ViT [7] with different configurations; Compact Convolutional Transformer (CCT), a transformer that applies convolutions during patch embedding; and three EfficientFormers with increasing complexity, named L1, L3 and L7. The input images were taken from the CIFAR-100 dataset and enlarged to 64x64 pixels during both training (preparing) and inference (evaluation) of the models. A list of each of the transformer models and their main characteristics follows:
• ViT-8, the classical ViT model as described in [7], with 8 heads with 128 channels and 8x8 patches;
• ViT-16, a ViT with 16 heads with 256 channels and 16x16 patches;
• CCT [6], a modification of the original ViT-8 (8 heads) that adopts convolutions to create and tokenize the image patches;
• EfficientFormer [8], architectures that include a more advanced and efficient transformer block. We choose three models from this family with increasing internal sizes (L1, L3, and L7). All of them have 8 attention heads. L3 is 2.54× larger than L1, whereas L7 is of slightly increased size (2.59× larger than L1).
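As a worked check of the beam-time figures in Section III-A, the sketch below converts accelerated beam time into equivalent natural exposure and turns an error count into a FIT rate. It rests on assumptions that are not stated in the text: the reference sea-level flux for New York City is taken as 13 n/(cm²·h), the commonly quoted JEDEC-style value (the paper only cites [9] for it), a year is taken as 365 days, and the function names and the example error count are hypothetical.

```python
# Reference sea-level neutron flux for New York City. This value is an
# assumption (the commonly quoted JEDEC-style figure); the paper only
# cites reference [9] for it.
NYC_FLUX_PER_HOUR = 13.0          # n/(cm^2 * h)
BEAM_FLUX_PER_SECOND = 3.5e6      # n/(cm^2 * s), as reported in Section III-A
HOURS_PER_YEAR = 24 * 365         # assumed 365-day year


def equivalent_natural_hours(beam_seconds, beam_flux=BEAM_FLUX_PER_SECOND):
    """Hours of natural exposure that deliver the same fluence as the beam."""
    fluence = beam_flux * beam_seconds            # n/cm^2
    return fluence / NYC_FLUX_PER_HOUR


def fit_rate(observed_errors, fluence):
    """Failures per 10^9 hours of operation at the reference natural flux."""
    cross_section = observed_errors / fluence     # cm^2
    return cross_section * NYC_FLUX_PER_HOUR * 1e9


# One minute of beam time, expressed in years of natural exposure.
years_per_minute = equivalent_natural_hours(60) / HOURS_PER_YEAR
print(f"60 s of beam  ~ {years_per_minute:,.0f} years")

# The whole campaign (266 h of beam), in hours and in years.
campaign_hours = equivalent_natural_hours(266 * 3600)
print(f"266 h of beam ~ {campaign_hours:.3g} hours "
      f"~ {campaign_hours / HOURS_PER_YEAR:.3g} years")

# Hypothetical run: 100 SDCs observed over 10 h of beam time.
example_fluence = BEAM_FLUX_PER_SECOND * 10 * 3600
print(f"example FIT   ~ {fit_rate(100, example_fluence):.2f}")
```

Under these assumptions, one minute of beam time reproduces the 1,844 years quoted in Section III-A, and the same conversion applied to 266 hours of beam time gives about 2.58 × 10¹¹ hours of natural exposure.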

ViT-8 and ViT-16 are chosen to understand the impact of the number of heads on the ViT sensitivity. CCT is tested in order to measure the effect of convolution on the error rate of a transformer, and the three EfficientFormers are chosen to compare a more efficient transformer block with increasing complexity.

Due to limitations on the TPU, the attention heads were implemented from scratch rather than using the ones available in TensorFlow. This was done because the GELU [29] activation function used by the MLPs is not supported by the TPU compiler. Thus, to overcome this limitation, we use the tanh approximation [30], which can be mapped to the TPU without impacting the model accuracy.

Besides the transformer models, we also evaluated micro-benchmarks, which are characteristic atomic operations executed in the ViT models. The micro-benchmarks comprise two different single attention heads with the same sizes as the ones in ViT-8 (Attention 1) and ViT-16 (Attention 2), and two transformer encoders, which also follow the sizes of the ones in ViT-8 and ViT-16 (listed as Transformer Encoder 1 and 2, respectively).

Additionally, in order to analyze how radiation affects different parts of the ViT model, we selected four micro-models, as shown in Figure 4. The idea is to propose an ablation study, incrementally adding parts of the ViT model to understand the contribution of each part to the overall framework error rate. These micro-models were selected to evaluate several aspects of the ViT model: first, (a) patch embedding is responsible for creating an efficient representation of the image (which has a high dimensionality) while retaining the necessary information for the subsequent blocks of the model. Next, we wanted to isolate the (b) MLP classifier, the block responsible for outputting the classification scores. After evaluating the very first and last blocks, we selected the (c) multi-head self-attention block, which computes the attention scores that indicate which parts of the image contain different kinds of relevant information. This information is then passed through an MLP to increase the non-linear fitting capability of the model, which completes a single (d) transformer layer. In other words, the (d) transformer layer includes the (c) multi-head self-attention block and an additional MLP (along with residual and normalization layers, as previously described and shown in Figure 1).

Fig. 4: Selected micro-models for our ablation experiments. Each micro-model allows us to gain insight into vital blocks of ViT. Panels: (a) Patch Embedding micro-model; (b) Classifier Multi-Layer Perceptron micro-model; (c) Multi-Head Self-Attention micro-model; (d) Transformer Layer micro-model.

The micro-models selected allow us to obtain and analyze the intermediate outputs of the model, which would be an otherwise impossible or highly inefficient process due to the way the Edge TPU functions. Particularly, the Edge TPU does not allow us to obtain intermediate results of a neural network without executing part of the model on the host device. Therefore, obtaining intermediate results without micro-models requires synchronizing the TPU and host after every single layer to exchange the outputs and inputs of each layer (which also requires quantization of values). Instead, our approach allows us to obtain the output from each micro-model, which can then be used for further analysis.

The experiment consists of each TPU running one of the models, micro-benchmarks, or micro-models listed above. Additionally, each host device (Raspberry Pi 4) only has one TPU connected via USB. Therefore, each host only runs one benchmark at a time. Figure 5 shows an iteration of the experiment, which starts with the TPU being initialized with the model parameters, the test images, and the expected (golden) output for each image. After the initialization, the main loop starts: the image is fed as input to the TPU, which will then apply the model over that input. When the TPU completes its computations, it returns the output to the host device, which in turn compares the obtained result with the respective fault-free golden (expected) output. If there is any discrepancy between the computed output and the golden output, the erroneous data is logged for later analysis. After all the images of the batch have been tested, the main loop starts again from the first image. Considering that only the layers of the neural network are executed on the TPU, whereas the comparison is executed on the Raspberry Pi (not irradiated), one can assume that all observed errors come from the TPU.

Fig. 5: Flow of an iteration of the experiment for the transformers. Steps: initialization; set image as first layer; run inference with the transformer model; compare with template on the Raspberry Pi 4; if an SDC is detected, log it.

IV. EXPERIMENTAL RESULTS

In this section, we present the results of neutron experiments with several transformer models and micro-benchmarks. Section IV-A shows how radiation-induced errors affect transformers on the Edge TPU, while Section IV-B analyzes how errors in different layers affect the correct application output.

A. Radiation-induced errors on Edge TPU transformers

In order to better understand how radiation affects vision transformers running on Edge TPUs, we evaluated both entire transformer models and micro-benchmarks that compose the core computations of transformers. While evaluating entire models allows us to characterize the reliability of different configurations and architectures, evaluating micro-benchmarks provides more detailed information about which parts of a model are more susceptible to radiation-induced faults.

Figure 6 shows the SDC (blue) and DUE (red) FIT rates measured for each of the four micro-benchmarks previously described in Section III-B. The data is plotted with 95% confidence intervals considering a Poisson distribution (we collected at least 100 events per micro-benchmark). As expected, the FIT rate for DUEs is significantly lower than the FIT rate for SDCs, which is a characteristic of the Edge TPU: the separation between accelerator (TPU) and host (Raspberry Pi 4) makes DUEs less likely than in embedded systems. Additionally, the drivers of the Coral Edge TPU run entirely on the host device (Raspberry Pi 4), further increasing the robustness of the accelerator to DUEs. As the DUE FIT rate is both low and consistent in every model evaluated, the rest of our analysis will focus on radiation-induced SDCs.

Fig. 6: FIT rate for the micro-benchmarks. A1 and A2 are single attention heads of size 128 and 256, respectively. T1 and T2 are transformer encoders of 8 heads of size 128, and 16 heads of size 256, respectively. (Bar chart; vertical axis: FIT; series: SDC, Crashes.)

As previously described, Attention 1 and Attention 2 are single self-attention heads, where each self-attention head identifies relationships between image patches on both a local and global scale. The difference between the two self-attention heads is the internal size used: Attention 1 matches the size of the heads used in ViT-8 (internal size of 128), and Attention 2 matches the size of ViT-16 (internal size of 256). Similarly, the number and size of the heads in the first transformer encoder match those of ViT-8 (8 heads of size 128), whereas those in the second match those of ViT-16 (16 heads of size 256). The transformer encoder is the main block of the model, combining the information from each self-attention head and MLP.

Based on the results shown in Figure 6, despite the different sizes, the FIT rates of Attention 1 and Attention 2 are similar. However, this is not true for the transformer encoders: Transformer Encoder 2 has a 2.51× higher FIT rate than Transformer Encoder 1. As the comparison between attention heads showed that the internal size has a negligible effect on the FIT rate, we can deduce that the difference between the transformer encoders is due to the increased number of heads (8 heads in Transformer Encoder 1 and 16 in Transformer Encoder 2). Therefore, transformers with a smaller number of heads are likely to be more reliable due to the lower probability of radiation-induced errors. However, it is important to note that reducing the number and size of heads may decrease the accuracy of the model: while ViT-16 (16 heads of size 256) has an accuracy of 97%, ViT-8 (8 heads of size 128) achieves only 93% accuracy.

In addition to the micro-benchmarks, we also evaluated the reliability of six vision transformer models to investigate how radiation affects each architecture. Figure 7 shows the FIT rates measured for the models previously explained in Section III-B: CCT uses convolutions in conjunction with self-attention heads; ViT-8 and ViT-16 are the baseline vision transformer models with different internal configurations (ViT-16 is larger); and EfficientFormer L1, L3, L7 each use 8 heads of increasing complexity (i.e., EfficientFormer L7 has more complex heads than L3 and L1).

Fig. 7: FIT rate for the tested models. CCT uses convolution, ViT 8 and 16 are the classical transformers, and L1, L3, and L7 are EfficientFormers with increasing complexity. (Bar chart; vertical axis: FIT; series: SDC, Critical SDC, Crashes.)

While both CCT and ViT-8 have 8 attention heads, ViT-8 has a 1.71× higher SDC FIT rate. Since the internal size of each head has little impact on the FIT rate (as previously shown in Figure 6), this difference can be attributed to the way the input image is processed: CCT uses convolutions in order to create and tokenize the patches of the input image, whereas ViT-8 directly splits the image and creates tokens from image patches. Since convolutions are applied in an overlapping manner over neighboring pixels, each patch includes some information from adjacent patches. This redundancy of information could explain why convolutions improve the
Transformer Encoder 1 and Transformer Encoder 2 differ resilience of the model, as even if a patch is corrupted, part
by the number of heads and internal size used: the first of its information is being encoded in neighboring patches.


Convolutions, then, not only help in improving the training efficiency, but also reduce the model SDC FIT rate.
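The redundancy argument above can be illustrated with a small counting sketch. This is not the CCT tokenizer: the 8 × 8 image, the 4 × 4 non-overlapping split, and the 3 × 3 stride-1 window are hypothetical values chosen only for illustration, and no padding is applied. The sketch counts, for every pixel, how many tokens receive information from it.

```python
def coverage_counts(size, kernel, stride):
    """Return a size x size grid; each cell holds the number of sliding
    windows (tokens) that include that pixel. No padding is applied."""
    counts = [[0] * size for _ in range(size)]
    for top in range(0, size - kernel + 1, stride):
        for left in range(0, size - kernel + 1, stride):
            for r in range(top, top + kernel):
                for c in range(left, left + kernel):
                    counts[r][c] += 1
    return counts


def summarize(counts):
    """Return (min, max) number of tokens that any single pixel feeds."""
    flat = [value for row in counts for value in row]
    return min(flat), max(flat)


# Hypothetical sizes, used only to illustrate the idea.
IMAGE_SIZE = 8

# Direct split into non-overlapping patches (stride equals patch size).
split = coverage_counts(IMAGE_SIZE, kernel=4, stride=4)

# Convolution-style overlapping windows (stride smaller than window size).
conv = coverage_counts(IMAGE_SIZE, kernel=3, stride=1)

print("non-overlapping split (min, max):", summarize(split))
print("overlapping windows (min, max):", summarize(conv))
```

With a direct split, every pixel feeds exactly one token, so a corrupted token has no backup. With overlapping windows, interior pixels feed several tokens (fewer at the image border when no padding is used), which is the kind of redundancy the text suggests may explain the lower SDC FIT rate of CCT.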
In line with the evaluation of micro-benchmarks, the results
for the full models, shown in Figure 7, confirm that larger
models are more sensitive to neutrons: ViT-16 has a FIT rate
5.31× higher than ViT-8. This difference is mainly due to the
higher number of heads in ViT-16, as seen in the comparison
between the Transformer Encoder 1 and Transformer Encoder
2 micro-benchmarks. Additionally, the SDC FIT rates of the
three EfficientFormers are very similar to each other, which
can be attributed to the models having the same number of
heads. While they have different internal sizes, the results for
the EfficientFormer models agree with the evaluation of micro-
benchmarks, which shows that the size of the attention head has
a negligible impact on the SDC FIT rate.
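The FIT comparisons in this section follow from a simple calculation, sketched below under stated assumptions. The section does not give the formula, so the sketch uses the common convention of scaling the measured cross section (events divided by fluence) to a reference sea-level flux of 13 n/(cm²·h) and to 10^9 hours. The confidence interval uses a normal approximation to the Poisson distribution, which is reasonable for the 100 or more events mentioned earlier; an exact Poisson interval would use chi-square quantiles instead. The event counts and fluence are hypothetical, not measured data.

```python
import math

REFERENCE_FLUX = 13.0  # n/(cm^2*h); assumed sea-level reference flux
Z_95 = 1.96            # two-sided 95% quantile of the normal distribution


def fit_rate(events, fluence, flux=REFERENCE_FLUX):
    """Failures per 1e9 device-hours at the reference flux."""
    cross_section = events / fluence  # cm^2
    return cross_section * flux * 1e9


def fit_interval_95(events, fluence, flux=REFERENCE_FLUX):
    """Approximate 95% interval: events +/- 1.96 * sqrt(events)."""
    half_width = Z_95 * math.sqrt(events)
    low = max(events - half_width, 0.0)
    high = events + half_width
    return fit_rate(low, fluence, flux), fit_rate(high, fluence, flux)


# Hypothetical counts for two configurations exposed to the same fluence.
events_a, events_b = 100, 251
fluence = 1.0e11  # n/cm^2, hypothetical

fit_a = fit_rate(events_a, fluence)
fit_b = fit_rate(events_b, fluence)
low_a, high_a = fit_interval_95(events_a, fluence)

print(f"FIT A: {fit_a:.2f} (95% CI {low_a:.2f} to {high_a:.2f})")
print(f"FIT B: {fit_b:.2f}")
print(f"B is {fit_b / fit_a:.2f}x higher than A")
```

Because both configurations share the same fluence in this sketch, the FIT ratio reduces to the ratio of event counts; with different exposures, each count must first be normalized by its own fluence.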
Surprisingly, the three EfficientFormers have a very similar SDC FIT rate to ViT-16 despite the latter having double the number of heads. This result can be explained by architectural differences between ViTs and EfficientFormers, as the latter use several inverted residual blocks (similar to the ones inside the MobileNet [31] CNN), including a modified block with the attention heads. Additionally, EfficientFormers run patch embedding multiple times, once at the start of each “MetaBlock” (the main blocks of the EfficientFormer architecture). In future work, we plan to evaluate the reliability of the different blocks of EfficientFormers (micro-benchmarks).

In addition to the SDC FIT rate (blue) shown in Figure 7, we also measured the critical SDC FIT rate (yellow). A critical SDC is defined as a wrong classification, i.e., the class detected is not the expected one. In contrast, a tolerable SDC is a change in classification probabilities without changing the predicted class, meaning that any number of prediction probabilities differed from what was expected, yet the final classification was not changed. Our evaluation shows that the critical SDC FIT rate (yellow) is quite low, meaning that neutrons have a low chance of modifying the final classification. A further analysis of critical SDCs (not shown in the figure) revealed that the expected class is still among the classes with the five highest probabilities, meaning that transformer models are capable of extracting some correct information even through radiation-induced errors.

We further analyzed tolerable (non-critical) SDCs by measuring how much the probability of the detected class has changed. The results showed that, in tolerable SDCs, ViTs present considerable differences in the probability of the expected class: 4.08% in ViT-8, and 4% in ViT-16. In contrast, the effect on CCT and the EfficientFormers is lower: 1.4% in CCT, 1.8% in EfficientFormer L1, 1.5% in L3, and 2.39% in L7. These results indicate that CCT is the most robust network among the ones tested, as it presented the lowest FIT rate, lowest critical SDC FIT rate, and little change in the probability of the expected class on the non-critical SDCs.

B. Identifying the causes of misclassifications

On top of characterizing the reliability of micro-benchmarks and models (in the discussion above), we aimed to identify the causes of misclassifications (critical SDCs). To this end, we carefully selected and evaluated parts (or micro-models) of the ViT model, as previously detailed in Section III-B: the patch embedding, the first multi-head self-attention (including residual and normalization layers), the first transformer layer, and the final MLP classifier. By evaluating these blocks, we are able to identify the parts of the model that are more vulnerable to radiation, both in terms of probability of an error happening (SDC FIT rate), and how an error propagates to the final output (how likely it is to cause a critical SDC). Our experiments included each of the 4 blocks for the ViT-8 and ViT-16 configurations, with the exception of the patch size: every micro-model used a patch size of 8 × 8. By using a constant patch size, we were able to further analyze the impact of the size of the internal dimensions of the transformers.

First, we analyzed the average error magnitude for tolerable and critical SDCs in order to determine if there is a significant difference depending on the criticality of the SDC. Figure 8 shows the average error magnitude for the ViT-8 and ViT-16 (full) models, with a higher value meaning that the incorrect output is further from the correct (expected) value. The data shows that critical SDCs have higher error magnitude than tolerable SDCs for both ViT-8 and ViT-16. This result can be intuitively explained, as larger differences in the class probabilities will likely lead to misclassification. In addition to the transformer models, we also analyzed the output of the micro-models, shown in Figure 9. Surprisingly, the error magnitude for the output of micro-models does not vary significantly whether it is a tolerable or critical SDC. This could mean that the initial value of the error does not have a large impact on whether it will cause a misclassification or not, but rather that the misclassification happens due to how the error propagates.

Fig. 8: Average error magnitude (relative to correct value) for ViT-8 and ViT-16 models.

[Figure 9: bar chart. Y-axis: Average Error Magnitude. X-axis: micro-models of ViT-8 and ViT-16. Series: Tolerable Error, Critical Error.]
Fig. 9: Average error magnitude (relative to correct value) for ViT-8 and ViT-16 blocks (micro-benchmarks).

Next, we aimed to identify how errors in different parts of the transformer model propagate into the output. As previously explained in Section III-B, we cannot extract the output of every single layer during one execution on the Edge TPU. Therefore, as illustrated in Figure 10, we leverage corrupted outputs of micro-models (gathered with real beam experiments) and inject them into the full model to obtain what would be the final corrupted output (i.e., the classification probabilities). While the figure shows an example for the patch embedding micro-model, the process is analogous for the other three micro-models, with the corrupted outputs collected during the experiment being injected into the appropriate part of the full model. This process allows us to observe intermediate errors while also being able to realistically simulate how the error would have affected the final output.

[Figure 10: flow diagram. Neutron beam experiment: input data is fed to the patch embedding micro-model, a radiation-induced fault produces a corrupted micro output, and the corrupted micro output is logged. Post-processing (without beam): the observed corrupted output is injected into the full model (patch embedding, transformer encoder, classifier MLP) to obtain the corrupted final output.]
Fig. 10: We utilize the corrupted outputs observed and collected during beam experiments to inject (real) errors in the full ViT models.

Figure 11 combines the SDC FIT rate of each evaluated micro-model with the ratio of critical SDCs. Based on these results, while the transformer layer is the most likely to suffer radiation-induced faults, these SDCs rarely affect the final classification, i.e., they are tolerable SDCs (meaning they are not critical). In contrast, errors in the patch embedding often lead to misclassifications (critical SDCs): around 15% of all SDCs in patch embedding lead to misclassification in ViT-8 and 20% in ViT-16. These results show that errors while processing the initial image are more critical than errors affecting the computation in the internal parts of the transformer encoder. Because the patch embedding reduces the dimension of the initial image with a large number of pixels (over 12,000 values for a 64 × 64 image) to a small number of patches (e.g., 192 patches for ViT-8), errors in this procedure may significantly affect the representation of patches. Additionally, errors in the first layer of the model will cascade into every subsequent layer, which is further aggravated by the nature of the patch embedding layer. Finally, patch embedding is the only non-residual layer, meaning that radiation-induced errors are not smoothed out by re-using previously (correctly) computed data.

[Figure 11: bar chart. Y-axis: FIT. X-axis: Patch Embedding, Multi-Head Self-Attention, Transformer Layer, and Final MLP Classifier, for ViT-8 and ViT-16. Series: SDC, Critical SDC, Crashes.]
Fig. 11: SDC (tolerable and critical) and DUE FIT rates of the main parts of the ViT-8 and ViT-16 transformer models.

C. Impact of experimental results

Based on the results shown above, we are able to identify the patch embedding as the most critical operation in the ViT model, meaning that a radiation-induced error in this block has a high probability of resulting in a misclassification (critical SDC). Additionally, while transformer layers have the highest FIT rate of the evaluated blocks, most of these SDCs are tolerable, i.e., they do not change the final classification. This is an important result considering the transformer encoder comprises the vast majority of the computations performed by ViT, as this block includes several transformer layers (3 in the evaluated models). In contrast, the patch embedding requires orders-of-magnitude fewer operations and is only executed once in the ViT model. Thus, it may be possible to improve the reliability of ViT by implementing selective hardening techniques on this block. Because it is a lightweight operation, replicating this block would introduce negligible overhead while protecting the most critical operation of the model. Alternatively, as shown in the comparison of vision transformer architectures, employing convolutions in the patch embedding increases the reliability of the model.

V. CONCLUSION

In this work, we reported the results collected after irradiating Coral Edge TPUs with neutrons for over 266 effective hours. We considered six different transformer models, four micro-benchmarks, and eight parts of the Vision Transformer (ViT) model (micro-models), for a total of eighteen configurations evaluated. Data showed that the size of the head has a negligible influence on the model FIT rate, while the number of heads impacts the SDC FIT rate significantly. When comparing different models, the results indicate that


the underlying architecture of the transformer has a large influence on the SDC FIT rate. By evaluating both micro-benchmarks and full models, our experimental data provided valuable insights on the sensitivity of each part of vision transformers to radiation-induced faults. Additionally, after observing real SDCs in different parts of the ViT model, we injected the errors collected during the experiment in order to determine what parts of the model are more likely to cause misclassifications. Based on this analysis, we identified the patch embedding as the most critical component of ViT. Interestingly, our experimental results have also shown that employing convolutions during patch embedding considerably improves the reliability of the model.

REFERENCES

[1] Road vehicles - Functional safety, International Organization for Standardization ISO 26 262, Dec. 2018.
[2] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, Jun. 2018, pp. 7132–7141.
[3] Z. Liu, H. Mao, C. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, Jun. 2022, pp. 11 966–11 976.
[4] M. C. Casey, J. S. Goodwill, E. J. Wyrwas, R. A. Austin, C. M. Wilson, S. D. Stansberry, N. Gorius, and S. Aslam, “Single-event effects on commercial-off-the-shelf edge-processing artificial intelligence asics,” IEEE Transactions on Nuclear Science, vol. 70, no. 8, pp. 1716–1723, Aug. 2023.
[5] R. L. Rech Junior, S. Malde, C. Cazzaniga, M. Kastriotou, M. Letiche, C. Frost, and P. Rech, “High energy and thermal neutron sensitivity of google tensor processing units,” IEEE Transactions on Nuclear Science, vol. 69, no. 3, pp. 567–575, Mar. 2022.
[6] A. Hassani, S. Walton, N. Shah, A. Abuduweili, J. Li, and H. Shi, “Escaping the big data paradigm with compact transformers,” arXiv preprint 2104.05704, Apr. 2021.
[7] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in International Conference on Learning Representations, Vienna, Austria, May 2021.
[8] Y. Li, G. Yuan, Y. Wen, J. Hu, G. Evangelidis, S. Tulyakov, Y. Wang, and J. Ren, “Efficientformer: Vision transformers at mobilenet speed,” in Advances in Neural Information Processing Systems, vol. 35, New Orleans, Louisiana, USA, Nov. 2022, pp. 12 934–12 949.
[9] C. Slayman, “Jedec standards on measurement and reporting of alpha particle and terrestrial cosmic ray induced soft errors,” in Soft Errors in Modern Electronic Systems. Boston, MA, USA: Springer US, 2011, vol. 41, ch. 3, pp. 55–76.
[10] N. Cavagnero, F. Dos Santos, M. Ciccone, G. Averta, T. Tommasi, and P. Rech, “Transient-fault-aware design and training to enhance dnns reliability with zero-overhead,” in 2022 IEEE 28th International Symposium on On-Line Testing and Robust System Design (IOLTS), Torino, Italy, Sep. 2022, pp. 181–187.
[11] P. Rech, “Artificial neural networks for space and safety-critical applications: Reliability issues and potential solutions,” IEEE Transactions on Nuclear Science, vol. 71, no. 4, pp. 377–404, Apr. 2024.
[12] R. Baumann, “Radiation-induced soft errors in advanced semiconductor technologies,” IEEE Transactions on Device and Materials Reliability, vol. 5, no. 3, pp. 305–316, Sep. 2005.
[13] J. Noh, V. Correas, S. Lee, J. Jeon, I. Nofal, J. Cerba, H. Belhaddad, D. Alexandrescu, Y. Lee, and S. Kwon, “Study of neutron soft error rate (ser) sensitivity: Investigation of upset mechanisms by comparative simulation of finfet and planar mosfet srams,” IEEE Transactions on Nuclear Science, vol. 62, no. 4, pp. 1642–1649, Aug. 2015.
[14] V. Sridharan and D. R. Kaeli, “Using hardware vulnerability factors to enhance AVF analysis,” in Proceedings of the 37th annual international symposium on Computer architecture, New York, NY, USA, Jun. 2010, pp. 461–472.
[15] M. Casey, E. Wyrwas, and R. Austin, “Recent radiation test results on cots ai edge processing asics,” in NEPP Electronics Technology Workshop (ETW), Greenbelt, Maryland, USA, Jun. 2022.
[16] N. P. Jouppi, D. Hyun Yoon, M. Ashcraft, M. Gottscho, T. B. Jablin, G. Kurian, J. Laudon, S. Li, P. Ma, X. Ma, T. Norrie, N. Patil, S. Prasad, C. Young, Z. Zhou, and D. Patterson, “Ten lessons from three generations shaped google’s tpuv4i : Industrial product,” in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), Virtual Event, Spain, Jun. 2021, pp. 1–14.
[17] Q-Engineering. Google coral edge tpu explained in depth. Accessed: 2023-02-01. [Online]. Available: https://qengineering.eu/google-corals-tpu-explained.html
[18] R. L. R. Junior and P. Rech, “Reliability of google’s tensor processing units for convolutional neural networks,” in 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S), Baltimore, MD, USA, Jun. 2022, pp. 25–27.
[19] R. L. Rech and P. Rech, “Reliability of google’s tensor processing units for embedded applications,” in 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, Mar. 2022, pp. 376–381.
[20] P. R. Bodmann and P. Rech, “Tensor processing unit reliability dependence on temperature and radiation source,” IEEE Transactions on Nuclear Science, vol. 71, no. 4, pp. 854–860, Apr. 2024.
[21] P. R. Bodmann, M. Saveriano, A. Kritikakou, and P. Rech, “Neutrons sensitivity of deep reinforcement learning policies on edgeai accelerators,” IEEE Transactions on Nuclear Science, vol. 71, no. 8, pp. 1480–1486, Aug. 2024.
[22] G. Lentaris, V. Leon, C. Sakos, D. Soudris, A. Tavoularis, A. Costantino, and C. B. Polo, “Performance and radiation testing of the coral tpu co-processor for ai onboard satellites,” in 2023 European Data Handling & Data Processing Conference (EDHPC), Juan Les Pins, France, Oct. 2023, pp. 1–4.
[23] K. Ma, C. Amarnath, and A. Chatterjee, “Error resilient transformers: A novel soft error vulnerability guided approach to error checking and suppression,” in 2023 IEEE European Test Symposium (ETS), Venezia, Italy, May 2023, pp. 32–37.
[24] X. Xue, C. Liu, Y. Wang, B. Yang, T. Luo, L. Zhang, H. Li, and X. Li, “Soft error reliability analysis of vision transformers,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 31, no. 12, pp. 2126–2136, Dec. 2023.
[25] L. Roquet, F. Fernandes dos Santos, P. Rech, M. Traiola, O. Sentieys, and A. Kritikakou, “Cross-Layer Reliability Evaluation and Efficient Hardening of Large Vision Transformers Models,” in Design Automation Conference (DAC) preprint, San Francisco, CA, USA, Jun. 2024.
[26] G. Gavarini, A. Ruospo, and E. Sanchez, “Evaluation and mitigation of faults affecting swin transformers,” in 2023 IEEE 29th International Symposium on On-Line Testing and Robust System Design (IOLTS), Crete, Greece, Sep. 2023, pp. 168–174.
[27] L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, Z.-H. Jiang, F. E. Tay, J. Feng, and S. Yan, “Tokens-to-token vit: Training vision transformers from scratch on imagenet,” in Proceedings of the IEEE/CVF international conference on computer vision, Guangzhou, China, Nov. 2021, pp. 538–547.
[28] C. Cazzaniga and C. D. Frost, “Progress of the scientific commissioning of a fast neutron beamline for chip irradiation,” Journal of Physics, vol. 1021, pp. 12 037–12 041, May 2018.
[29] D. Hendrycks and K. Gimpel, “Gaussian error linear units (gelus),” arXiv preprint 1606.08415, Jun. 2023.
[30] D. Hendrycks and K. Gimpel, “Bridging nonlinearities and stochastic regularizers with gaussian error linear units,” arXiv preprint 1606.08415, Nov. 2016.
[31] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint 1704.04861, Apr. 2017.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
