Eeg 2

The document presents SeizureTransformer, a novel deep learning model designed for simultaneous time-step level seizure detection from long EEG recordings. This model combines 1D convolutions, a residual CNN stack, and a transformer encoder to effectively capture long-range dependencies in EEG data, outperforming existing methods in accuracy and efficiency. Extensive experiments demonstrate its potential for real-time seizure detection, achieving top performance in the 2025 Seizure Detection Challenge.

SeizureTransformer: Scaling U-Net with Transformer for Simultaneous Time-Step Level Seizure Detection from Long EEG Recordings

1st Kerui Wu, Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA, wuk9@[Link]
2nd Ziyue Zhao, Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA, zhaoz10@[Link]
3rd Bülent Yener, Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA, yener@[Link]
arXiv:2504.00336v2 [[Link]] 2 Apr 2025

Abstract: Epilepsy is a common neurological disorder that affects around 65 million people worldwide. Detecting seizures quickly and accurately is vital, given the prevalence and severity of the associated complications. Recently, deep learning-based automated seizure detection methods have emerged as solutions; however, most existing methods require extensive post-processing and do not effectively handle the crucial long-range patterns in EEG data. In this work, we propose SeizureTransformer, a simple model composed of (i) a deep encoder built from 1D convolutions, (ii) a residual CNN stack and a transformer encoder that embed the preceding output into a high-level representation with contextual information, and (iii) a streamlined decoder that converts these features into a sequence of probabilities, directly indicating the presence or absence of seizures at every time step. Extensive experiments on public and private EEG seizure detection datasets demonstrate that our model significantly outperforms existing approaches (ranking first in the 2025 “seizure detection challenge” organized in the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders), underscoring its potential for real-time, precise seizure detection.

Index Terms: Time series analysis, change point detection, deep learning, transformers

I. INTRODUCTION

Epilepsy is a prevalent neurological disorder distinguished by recurring seizures. Worldwide, there are approximately 65 million people with epilepsy, more than Parkinson's disease, Alzheimer's disease, and Multiple Sclerosis combined. One of the most serious complications linked to epilepsy is Sudden Unexpected Death in Epilepsy (SUDEP), which tragically results in the deaths of around 1 in every 1000 epilepsy patients each year [1]. Given the severity of this risk, early and precise seizure detection is crucial in clinical practice, as prompt intervention can considerably lower mortality rates [2].

Traditionally, large numbers of multi-channel EEG signals are visually analyzed by neurologists with the goal of understanding when and where the seizures start and how they propagate within the brain. However, there are two main disadvantages of visual analysis of EEG signals: it is time-consuming and prone to subjectivity. Therefore, automating the detection of the underlying brain dynamics in EEG signals is important for obtaining fast and objective EEG analysis.

EEG signals can be treated as a batch of time series, i.e., sequences of data points indexed in discrete-time order, which casts automated seizure detection as a classification task in time series analysis. In recent years, deep learning models have demonstrated impressive abilities to capture the intricate dependencies within time series data, making them a more powerful tool for time series analysis than traditional statistical methods. However, most existing work [3]–[9] implements the classification task at a sliding window level, which involves segmenting a signal recording into distinct windows and predicting a label for each sample. Converting these separate predictions into final event predictions that follow the Standardized Computer-based Organized Reporting of EEG (SCORE) standard [10] and can be used in real life involves extensive, time-consuming post-processing, which keeps existing algorithms from achieving simultaneous detection. Moreover, existing time series analysis research often trains and evaluates models on datasets with short sequence lengths [11], while EEG studies have shown that long-range input records can greatly benefit prediction accuracy [12].

In contrast to window-level classification models, sequence-to-sequence modeling, a type of encoder-decoder model that maps an input sequence to an output sequence, provides a straightforward way to avoid redundant post-processing steps through time-step-level classification. In the field of Natural Language Processing (NLP), Transformer-based models have shown remarkable predictive and generative abilities [13], [14]. However, studies have shown that CNN-based models achieve better classification ability in time series analysis compared to RNN-based and Transformer-based models [15]. This has focused scientific signal classification studies on the U-Net [12], [16], [17], a fully convolutional encoder-decoder network with skip connections that was originally designed for image segmentation [18]. The drawbacks of such models also stand out. Firstly, U-Net primarily operates within local receptive fields, making it difficult for U-Net to effectively model long-range dependencies as the input sequence length
grows. Beyond that, scaling U-Net to large datasets or high-resolution sequences requires stacking deeper layers, often leading to vanishing gradients, overfitting, and massive memory and computation usage.

In this work, we propose a simple U-Net-based architecture, namely SeizureTransformer, to solve the challenges mentioned above. The model comprises three components: (i) a deep encoder built from 1D convolutions, (ii) a residual CNN stack and a transformer encoder that embed the preceding output into a high-level representation with global contextual information, and (iii) a streamlined decoder that converts these features into a sequence of probabilities, directly indicating the presence or absence of seizures at every time step. The scaling embedding component makes it easy to scale up the model size and to handle long-sequence signals. Experimentally, our model achieves consistent state-of-the-art performance, efficiency, and generalization across diverse subjects and devices in public and private EEG datasets. Our model ranked first in an international competition organized by the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders.

II. RESULTS

A. Model Overview

We design the model architecture based on the U-Net to perform end-to-end learning from raw waveforms for time-step-level classification, achieving simultaneous seizure detection. Our model consists of three primary modules: an encoder, a scaling embedding component, and a decoder, as shown in Fig. 1. Taking the continuous long-term EEG signals from the epilepsy monitoring unit, the encoder extracts features by recognizing patterns through one-dimensional convolution layers. The feature vectors are further embedded by a ResCNN stack and a Transformer encoder stack with a global attention mechanism to generate high-level representations that capture rich temporal dependencies. The streamlined decoder then converts these representations into a sequence of probabilities, indicating the presence or absence of seizures at every time step. Residual connections between each encoder layer and decoder layer are used to ease the gradient flow and to avoid degradation problems in the deep neural network. More details about network architecture selection are provided in the methodology section.

B. Model Training

Datasets. We use the Temple University Hospital EEG Seizure Corpus v2.0.3 (TUSZ) [19] and the Siena Scalp EEG Database [20] to form our training dataset. TUSZ is the largest public dataset for seizure detection that has been manually annotated with data for seizure events. The predefined training set in TUSZ has 910 hours of recording sessions from 579 subjects with various sampling frequencies, from 250 Hz to 1000 Hz. The Siena Scalp EEG Database is a small dataset that contains 128 hours of recording sessions from 14 subjects with a unified sampling rate of 512 Hz. Both datasets contain at least 19 electrodes of the international 10-20 system. We unify the training data from the two datasets by resampling signals to 256 Hz and fixing the channel order (Fig. 2a).

We combine the two datasets by concatenating segmented one-minute-long time series windows, i.e., 60 × 256 = 15360 time steps per window. A 75% overlap ratio between two consecutive windows was set as a hyperparameter during the segmentation process to augment training examples. To improve the model's ability to distinguish seizure signals from background noise, we statistically categorize training windows into three classes (no-seizure, full-seizure, and partial-seizure) and uniformly sample a certain number of windows from each class to create a balanced dataset. Specifically, our training dataset is constructed as follows:

$$D = D_{ps} \cup D_{fs}^{*} \cup D_{ns}^{*}$$

where $D_{ps}$ contains all partial-seizure windows, and $D_{fs}^{*}$ and $D_{ns}^{*}$ are randomly selected subsets of full-seizure and no-seizure windows with $|D_{fs}^{*}| = 0.3 \times |D_{ps}|$ and $|D_{ns}^{*}| = 2.5 \times |D_{ps}|$.

Pre-processing. Following the process in [6], we preprocess the EEG data before feeding it into the model, using a bandpass filter to keep frequencies in the range from 0.5 Hz to 120 Hz and two notch filters to eliminate signals at 1 Hz and 60 Hz, which are typically associated with heart rate and power line noise (Fig. 2b).

Training Setting. We implemented our deep learning model using PyTorch and trained it on 2 parallel NVIDIA L40S 46GB GPUs. Our training parameters include a batch size of 256, a learning rate of 1e-3, a weight decay of 2e-5, and a drop rate of 0.1 for all dropout layers both at training and test time. We use Binary Cross-Entropy loss as the objective function and RAdam as the optimizer. Training was set to run for 100 epochs, with early stopping if no improvement in validation loss was observed over 12 epochs.

Post-processing. Given the sequence of probabilities output by the model, we apply a set of simple post-processing steps to convert the continuous probabilities into the final detection (Fig. 2c). First, we apply a straightforward threshold filter to obtain a discrete mask. Then, two morphological operations are employed to eliminate spurious spikes of seizure activity and to fill short gaps of zeros. Lastly, we apply a simple duration-based rule to discard blocks of seizure labels lasting less than a minimal clinically relevant duration.

C. Evaluation Results

We used TUSZ's predefined test set, consisting of 42.7 hours of waveforms from 43 subjects with 469 seizure activities, to compare the detection performance of SeizureTransformer with that of other traditional and deep-learning algorithms. The test set of TUSZ is a list of blind EEG signals that are completely separated from its training set and validation set, which ensures that the results reflect the model's generalization.

We quantify the model's performance using the area under the receiver operating characteristic curve (AUROC). For each continuous EEG recording, the ROC curve plots the true and false positive rates across all possible decision thresholds, and the AUC represents the area under the ROC curve,
Fig. 1. SeizureTransformer Architecture. [Diagram not reproduced. Modules: Encoder, Scaling Embedding, Decoder. Legend: Conv1D & ELU Activation, Pooling, Upsample, Residual Connection, Conv1D & Sigmoid, Res CNN Stack, Position Encoding, Transformer Encoder Stack. Res CNN Stack block (x7): Input, BatchNorm, ReLU, Convolution, BatchNorm, ReLU, Convolution, Output. Transformer Encoder Stack block (x8): Input, 4-head attention, Add & norm, Positionwise FNN, Add & norm, Output. Encoder feature shapes: (19, 15360), (32, 15360), (32, 7680), (64, 7680), (64, 3840), (128, 3840), (128, 1920), (256, 1920), (256, 960), (512, 960), (512, 480). The decoder feature shapes run from (512, 480) back up to (32, 15360), ending in the output (1, 15360).]
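The encoder shapes printed in Fig. 1 follow a regular pattern: each stage first changes the channel count and then halves the time axis. The sketch below reproduces that sequence of shapes. It assumes, from the figure alone and not from the authors' code, that every stage is one length-preserving convolution followed by one pooling step of stride 2; the function name `encoder_shapes` and its parameters are hypothetical.

```python
def encoder_shapes(in_channels=19, time_steps=15360,
                   widths=(32, 64, 128, 256, 512)):
    """Trace (channels, time_steps) through conv -> pool stages.

    Assumption: each convolution keeps the sequence length and only
    changes the channel count; each pooling step halves the length.
    """
    shapes = [(in_channels, time_steps)]
    for width in widths:
        shapes.append((width, time_steps))  # after the convolution
        time_steps //= 2                    # pooling halves the time axis
        shapes.append((width, time_steps))  # after the pooling
    return shapes


for shape in encoder_shapes():
    print(shape)
```

With the default arguments the trace starts at (19, 15360), i.e. 19 channels over a 60-second window at 256 Hz, and ends at (512, 480), since 15360 / 2^5 = 480.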

TABLE I
MODEL PERFORMANCE IN THE SEIZURE DETECTION CHALLENGE 2025.

Model               Architecture               Input Length (s)  F1-score  Sensitivity  Precision  FP (per day)
SeizureTransformer  U-Net & CNN & Transformer  60                0.43      0.37         0.45       1
Van Gogh Detector   CNN & Transformer          N × 10            0.36      0.39         0.42       3
S4Seizure           S4                         12                0.34      0.30         0.42       2
DeepSOZ-HEM         LSTM & Transformer         600               0.31      0.58         0.27       14
HySEIZa             Hyena-Hierarchy & CNN      12                0.26      0.6          0.22       13
Zhu-Transformer     CNN & Transformer          25                0.20      0.46         0.16       24
SeizUnet            U-Net & LSTM               30                0.19      0.16         0.20       4
Channel-adaptive    CNN                        15                0.14      0.06         0.20       1
EventNet            U-Net                      120               0.14      0.6          0.09       20
Gradient Boost      Gradient Boosted Trees     10                0.07      0.15         0.09       6
DynSD               LSTM                       1                 0.06      0.55         0.04       37
Random Forest       Random Forest              2                 0.06      0.05         0.07       1
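The columns of Table I are event-level metrics. As a rough illustration of how such numbers relate to event counts, the sketch below uses the textbook definitions: sensitivity = TP / (TP + FN), precision = TP / (TP + FP), F1 as their harmonic mean, and false positives normalized by recording length in days. The function name `event_metrics` and the example counts are hypothetical. The challenge's own scoring code may match events and aggregate over subjects differently, so this sketch is not expected to reproduce the table.

```python
def event_metrics(tp, fp, fn, hours):
    """Event-level metrics from counts of matched and unmatched events.

    tp: reference events that were detected
    fp: predicted events with no matching reference event
    fn: reference events that were missed
    hours: total recording length, used to normalize false positives
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    fp_per_day = fp / (hours / 24.0)
    return {
        "f1": f1,
        "sensitivity": sensitivity,
        "precision": precision,
        "fp_per_day": fp_per_day,
    }


# Hypothetical counts: 8 detected, 12 missed, 2 false alarms in 48 hours.
print(event_metrics(tp=8, fp=2, fn=12, hours=48))
```

Raising the detection threshold typically moves events from TP and FP into FN, which lowers sensitivity and false positives per day while raising precision.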

which summarizes the model's performance. We compare our model against other seizure detection models, namely Zhu-Transformer [6], EEGWaveNet [8], and DCRNN [7], using the same evaluation metric on TUSZ's predefined test set, to demonstrate the effectiveness of our proposed approach. The models used for the comparison are pre-trained models based on different training sets. All of these pre-trained models are implemented by [21] and are publicly available.

As shown in Figure 3, our model demonstrated the highest performance, with a mean AUROC of 0.876 and a distribution tightly concentrated toward higher values.

D. Application in Seizure Detection Challenge

The 2025 Seizure Detection Challenge (competition website and leaderboard: https://[Link]/challenge/), organized as part of the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders, provides a completely blind private dataset consisting of continuous EEG recordings for evaluation, which makes it an ideal place to test the performance and generalization of our model fairly. The test dataset was collected at the EMU of the Filadelfia Danish Epilepsy Center in Dianalund from January 2018 to December 2020 with the NicoleteOne™ v44 amplifier. The dataset contains 4360 hours of EEG recordings from 65 subjects of various ages, where each subject had at least one seizure during the hospital stay with a visually identifiable electrographic correlate to the seizures recorded on the video. The ground truth labels were annotated by three board-certified neurophysiologists with expertise in long-term video-EEG monitoring. The F1-score, sensitivity, precision, and false positives per day were used as the primary ranking criteria to align with real-world requirements. The event-based scoring evaluates annotations at the event level by assessing the degree
of overlap between predicted and reference events.

As shown in Table I, our model largely outperforms the other algorithms in terms of F1-score. It is noteworthy that we set the picking threshold to 80% in the competition, which leads to a relatively low sensitivity but comes with the best precision and false positive rate. Van Gogh Detector and Zhu-Transformer are window-level classification models that also take advantage of both convolutional and transformer encoder units; however, their performance did not reach that of SeizureTransformer. This points to the beneficial effects of time-step-level end-to-end learning. Similarly, SeizUnet, like our model, is a time-step-level classification algorithm using U-Net; but unlike SeizureTransformer, it adds LSTM layers after the U-Net decoder instead of embedding transformer encoders into the U-Net, and its results turn out to be not as good as ours.

Fig. 2. EEG Signal Processing Pipeline: (a) Brain activity is recorded using a 19-channel EEG system. (b) A 60-second EEG sample is pre-processed through normalization, Butterworth bandpass filtering, and 1 Hz & 60 Hz IIR notch filters to remove noise. (c) After neural network analysis, post-processing steps (threshold filtering, morphological opening and closing, and removal of short-duration events) produce the final detection. [Figure not reproduced. Panel (a) shows the electrodes Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2. The channel traces are labeled, in order, FP1-Avg, F3-Avg, C3-Avg, P3-Avg, O1-Avg, F7-Avg, T3-Avg, T5-Avg, FZ-Avg, CZ-Avg, PZ-Avg, FP2-Avg, F4-Avg, C4-Avg, P4-Avg, O2-Avg, F8-Avg, T4-Avg, T6-Avg, followed by a Seizure trace.]

Fig. 3. Violin plots illustrating the distribution of AUROC values for SeizureTransformer, DCRNN, EEGWaveNet, and Zhu-Transformer models evaluated on the TUSZ v2.0.3 predefined testing set. Mean AUROC scores for each model are indicated above each plot, with the SeizureTransformer demonstrating the highest overall performance. [Plot not reproduced. Means printed above the plots, in the order of the x-axis labels: SeizureTransformer 0.876, DCRNN 0.642, EEGWaveNet 0.566, Zhu-Transformer 0.679.]

III. DISCUSSION

A. Runtime Analysis

Window-level classification models assign predictions individually to each segmented window. Mapping window labels to a final SCORE-compliant [10] annotation output, which contains the start time and duration of each seizure, requires the model to segment windows with a large overlap ratio to keep the start and stop times precise. This leads to substantial redundant computation and complicated mapping procedures. On the other hand, time-step-level classification models do not require such post-processing steps, as their predictions directly indicate the onset time and activity duration. This approach inherently mitigates the redundant computations associated with overlapping windows and significantly simplifies the annotation pipeline, which aligns this method more closely with the practical clinical requirement for efficient automated seizure detection.

We further show our model's efficiency by comparing its inference time with that of other models on TUSZ's testing set in Table II. Our model demonstrates the lowest running time, handling a one-hour-long recording in 3.98 seconds.

TABLE II
MODEL'S RUNTIME OVER TUSZ V2.0.3'S TESTING SET

Model               Total Runtime (s)  Runtime (s) per 1-hour EEG
SeizureTransformer  169.96             3.98
DCRNN               2571.75            60.24
EEGWaveNet          1690.19            39.59
Zhu-Transformer     3309.51            77.53

B. Ablation Study

The better performance of the proposed method for seizure detection could be due to several factors. Here, we show each model component's necessity by testing multiple partial models after removing certain components. As shown in Figure 4, the vanilla U-Net has an underwhelming performance with a low mean AUROC. Adding only a ResCNN stack or only a transformer stack marginally improves the model performance but also leads to a larger variance with some extreme failure cases. By contrast, integrating both the ResCNN and Transformer stacks produces not only a higher mean AUROC but also reduced variance, indicating that these components complement each other effectively. These results underscore the importance of each proposed element in achieving robust and accurate seizure detection.

C. Challenge Results

The competition leaderboard shows a relatively low F1-score across every algorithm compared to the results shown in previously published reviews [22], [23] and the self-reported
performance of computing algorithms. To comprehensively understand the model's performance, we test our model alongside several published algorithms, namely EventNet [24], Zhu-Transformer [6], DCRNN [7], EEGWaveNet [8], and the Gotman algorithm [25], on TUSZ's predefined testing set using the same evaluation metrics (F1-score, sensitivity, and precision) implemented by the challenge organizers [21]. The testing tools provide both sample-level and event-level evaluation. As shown in Table III, while our model keeps its state-of-the-art performance, all models achieved better F1-scores than on the competition leaderboard. This difference in results might be due to the distribution shift between datasets. As described by the organizer, the private evaluation dataset includes recordings from subjects of various ages, and the data was collected by portable EEG amplifiers, allowing patients to move freely within the building, which will likely lead to unique attributes in the recordings that depart from the training set.

Fig. 4. Ablation study for SeizureTransformer, drawing the AUROC distributions of models that contain only some of the components. N represents a vanilla deep U-Net without the ResCNN and Transformer encoder stacks; R represents the U-Net with the ResCNN stack; T represents the U-Net with the Transformer stack; P means adding positional encoding before feeding into the transformer stack. [Plot not reproduced. Means printed above the plots, in the order of the x-axis labels: RPT 0.876, RT 0.856, PT 0.864, T 0.839, R 0.803, N 0.767.]

TABLE III
MODEL PERFORMANCE IN TUSZ'S PREDEFINED TESTING SET.

Scale         Model               F1-score  Sensitivity  Precision
Sample-based  Gotman              0.0679    0.0558       0.0868
Sample-based  EEGWaveNet          0.1088    0.1051       0.1128
Sample-based  DCRNN               0.1917    0.4777       0.1199
Sample-based  Zhu-Transformer     0.4256    0.5406       0.3510
Sample-based  EventNet            0.4830    0.5514       0.4286
Sample-based  SeizureTransformer  0.5803    0.4710       0.7556
Event-based   Gotman              0.2089    0.6199       0.1256
Event-based   EEGWaveNet          0.2603    0.4427       0.1844
Event-based   DCRNN               0.3262    0.5723       0.2281
Event-based   Zhu-Transformer     0.5387    0.6116       0.5259
Event-based   EventNet            0.5655    0.6116       0.5259
Event-based   SeizureTransformer  0.6752    0.7110       0.6427

IV. METHODS

A. Related Work

The U-Net [18] architecture was first proposed in the field of computer vision (CV) for image segmentation tasks. Considering the temporal continuity of time series data, such networks have been widely deployed in various scientific signal processing applications, such as seismic phase detection [16], sleep-staging classification [12], [26], denoising heart sound signals [27], and seizure detection [28].

Some works explore combining U-Net with Transformers in other fields. For example, in a medical image segmentation task, [29] used self and cross-attention with U-Net; [30] incorporated a hierarchical Swin Transformer into U-Net to extract both coarse and fine-grained feature representations. In seismic analysis, [31] proposed a deep neural network that can be regarded as a U-Net with global and self-attention but without a residual connection. However, in the signal processing area, to the best of our knowledge, there is no existing work that scales U-Net using transformer blocks. The closest work to this paper is [28], where multiple attention-gated U-Nets are used, followed by an LSTM network that fuses their results.

B. Preliminary

For a continuous EEG waveform, before segmenting it into uniform windows as training examples, we resample all data to a common sampling rate of 256 Hz using the Fourier method [32], so that the time resolution seen by the model's convolutions is consistent across subjects, and apply a Gaussian normalization to each channel, calculated by

$$x_i^{*} = (x_i - \bar{x}) / s_x,$$

$$\bar{x} = \frac{1}{K} \sum_{i=1}^{K} x_i,$$

$$s_x = \frac{1}{K-1} \sum_{i=1}^{K} (x_i - \bar{x})^2.$$

The generated dataset, after slicing, is denoted as $D = (\mathcal{X}, \mathcal{Y}) = \{(x_i, y_i) \mid i = 1, \ldots, N\}$, where $N$ represents the number of training samples. Each input window $x_i \in \mathbb{R}^{T \times d}$ represents a multivariate time series with $T = 256 \times 60 = 15{,}360$ time steps and $d = 19$ channels. The corresponding time-step-level label $y_i \in \{0, 1\}^{T}$ is a binary, box-shaped ground truth signal indicating the presence of seizure activity at each time step.

C. Network Design

Encoder. We use one-dimensional convolutions along the time axis to extract local temporal patterns, outputting a tokenized representation of the signal. Specifically, we use a convolution-pooling block with various kernel sizes from 11 to 3 to detect features at different temporal scales, capturing both slow and fast dynamics. This reduces the time step size from 15360 to $T_d = 512$ while increasing the channel size from 19 to $k_d = 480$ to compensate for the loss of resolution in the time domain. The ELU function is set as the activation function after each convolution layer.

Scaling Embedding. Following [31], after obtaining the encoded output, we first apply a ResCNN stack to refine these tokenized features, yielding better generalization and better temporal invariance.

We then implement a transformer encoder stack [33] to scale the model and to capture long-range dependencies across the
tokenized signal. Specifically, sine and cosine functions of different frequencies are used as positional encodings,

$$PE_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/T_d}\right),$$

$$PE_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/T_d}\right),$$

which can then be summed with the input embedding. The refined representation, denoted as $Z$, will then be projected into equally-shaped query, key, and value spaces,

$$Q = ZW^{Q}, \quad K = ZW^{K}, \quad V = ZW^{V},$$

and processed with the use of the global-attention mechanism,

$$A = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V.$$

The attention output is combined with the tokens through a residual connection and layer normalization, and a subsequent feed-forward network transforms the output, followed by another residual addition.

Such hierarchical processing scales the model and integrates both local features and global context, enabling the model to learn complex temporal dependencies.

Decoder. Similar to the encoder, we use a convolutional decoder to decode the compressed information from the center latent space into a sequence of probabilities. However, instead of the convolution-pooling block, we upsample the input by a scale factor of 2 and then apply a convolution to decrease the number of channels, bringing the number of time steps back to the original window size. Like U-Net, residual connections are used between the encoder and decoder to facilitate efficient gradient flow.

Training. The model is trained to produce predictions $\hat{y}_i$ that minimize the following objective:

$$\hat{y}_i = f_{\theta}(x_i), \quad \theta \in \arg\min L$$

Here, we use the Binary Cross-Entropy loss as our training objective $L$, which measures the dissimilarity between the predicted and true labels:

$$L(\hat{y}_{i,j}, y_{i,j}) = -y_{i,j} \log(\hat{y}_{i,j}) - (1 - y_{i,j}) \log(1 - \hat{y}_{i,j})$$

where $y_{i,j}$ and $\hat{y}_{i,j}$ are the ground truth and predicted labels, respectively, for sample $i$ at time step $j$.

V. LIMITATION

While there is a rich literature on epileptic seizure detection and prediction, there is more work to be done to generalize the algorithms to anatomically different types of epilepsy and to different ambulatory recording settings. This is evident from the gap between the training/validation and testing F1-scores of the work presented in this paper. Our model demonstrates a high F1-score on other data sets. However, its F1-score is lower on the withheld test data set, although it still outperforms the competing models by a significant margin. Thus, future work will focus on understanding the differences in the data distributions between training and test data sets to improve our model.

VI. DATA AVAILABILITY

We used the following publicly available datasets in this work for training our model. The test set used in the competition was not made publicly available at the time of this write-up.

• Siena Scalp EEG Database: The database consists of EEG recordings of 14 patients acquired at the Unit of Neurology and Neurophysiology of the University of Siena. Subjects include 9 males (ages 25-71) and 5 females (ages 20-58). Subjects were monitored with a Video-EEG with a sampling rate of 512 Hz, with electrodes arranged on the basis of the international 10-20 System. Most of the recordings also contain 1 or 2 EKG signals. The diagnosis of epilepsy and the classification of seizures according to the criteria of the International League Against Epilepsy were performed by an expert clinician after a careful review of the clinical and electrophysiological data of each patient. License: https://[Link]/content/siena-scalp-eeg/view-license/1.0.0/
• TUH EEG Seizure Corpus v2.0.3: This database is a subset of the TUH EEG Corpus that was collected from archival records of clinical EEGs at Temple University Hospital recorded between 2002 – 2017. From this large dataset, a subset of files with a high likelihood of containing seizures was retained based on clinical notes and on the output of seizure detection algorithms. V2.0.0 contains 7377 .edf files from 675 subjects for a total duration of 1476 hours of data. The files are mostly short (avg. 10 minutes). The dataset has a heterogeneous sampling frequency and number of channels. All files are acquired at a minimum of 250 Hz. A minimum of 17 EEG channels is available in all recordings. They are positioned according to the 10-20 system. The annotations are provided as .csv and contain the start time, stop, channel, and seizure type. License: [Link] projects/nedc/forms/tuh [Link].

VII. CODE AVAILABILITY

Our source code and model are available at [Link] com/keruiwu/SeizureTransformer.

REFERENCES

[1] L. Hirsch, E. Donner, E. So, M. Jacobs, L. Nashef, J. Noebels, and J. Buchhalter, “Abbreviated report of the nih/ninds workshop on sudden unexpected death in epilepsy,” Neurology, vol. 76, no. 22, pp. 1932–1938, 2011.
[2] A. Van de Vel, K. Cuppens, B. Bonroy, M. Milosevic, K. Jansen, S. Van Huffel, B. Vanrumste, P. Cras, L. Lagae, and B. Ceulemans, “Non-eeg seizure detection systems and potential sudep prevention: state of the art: review and update,” Seizure, vol. 41, pp. 141–153, 2016.
[3] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, “Timesnet: Temporal 2d-variation modeling for general time series analysis,” arXiv preprint arXiv:2210.02186, 2022.
[4] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “itransformer: Inverted transformers are effective for time series forecasting,” arXiv preprint arXiv:2310.06625, 2023.
[5] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting,” in International conference on machine learning. PMLR, 2022, pp. 27 268–27 286.
[6] Y. Zhu and M. D. Wang, “Automated seizure detection using transformer models on multi-channel eegs,” in 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE, 2023, pp. 1–6.
[7] S. Tang, J. A. Dunnmon, K. Saab, X. Zhang, Q. Huang, F. Dubost, D. L. Rubin, and C. Lee-Messer, “Self-supervised graph neural networks for improved electroencephalographic seizure analysis,” arXiv preprint arXiv:2104.08336, 2021.
[8] P. Thuwajit, P. Rangpong, P. Sawangjai, P. Autthasan, R. Chaisaen, N. Banluesombatkul, P. Boonchit, N. Tatsaringkansakul, T. Sudhawiyangkul, and T. Wilaiprasitporn, “Eegwavenet: Multiscale cnn-based spatiotemporal feature extraction for eeg seizure detection,” IEEE Transactions on Industrial Informatics, vol. 18, no. 8, pp. 5547–5557, 2021.
[9] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023.
[10] S. Beniczky, H. Aurlien, J. C. Brøgger, L. J. Hirsch, D. L. Schomer, E. Trinka, R. M. Pressler, R. Wennberg, G. H. Visser, M. Eisermann et al., “Standardized computer-based organized reporting of eeg: Score–second version,” Clinical Neurophysiology, vol. 128, no. 11, pp. 2334–2346, 2017.
[11] A. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, and E. Keogh, “The uea multivariate time series classification archive, 2018,” arXiv preprint arXiv:1811.00075, 2018.
[12] H. Li and Y. Guan, “Deepsleep convolutional neural network allows accurate and fast detection of sleep arousal,” Communications biology, vol. 4, no. 1, p. 18, 2021.
[13] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” Journal of machine learning research, vol. 21, no. 140, pp. 1–67, 2020.
[14] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” arXiv preprint arXiv:1910.13461, 2019.
[15] Y. Wang, H. Wu, J. Dong, Y. Liu, M. Long, and J. Wang, “Deep time series models: A comprehensive survey and benchmark,” arXiv preprint arXiv:2407.13278, 2024.
[16] W. Zhu and G. C. Beroza, “Phasenet: a deep-neural-network-based seismic arrival-time picking method,” Geophysical Journal International, vol. 216, no. 1, pp. 261–273, 2019.
[17] C. Chatzichristos, J. Dan, A. M. Narayanan, N. Seeuws, K. Vandecasteele, M. De Vos, A. Bertrand, and S. Van Huffel, “Epileptic seizure detection in eeg via fusion of multi-view attention-gated u-net deep neural networks,” in 2020 IEEE Signal Processing in Medicine and Biology Symposium (SPMB). IEEE, 2020, pp. 1–7.
[18] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer, 2015, pp. 234–241.
[19] V. Shah, E. Von Weltin, S. Lopez, J. R. McHugh, L. Veloso, M. Golmohammadi, I. Obeid, and J. Picone, “The temple university hospital seizure detection corpus,” Frontiers in neuroinformatics, vol. 12, p. 83, 2018.
[20] P. Detti, “Siena scalp eeg database,” PhysioNet. doi, vol. 10, p. 493, 2020.
[21] J. Dan, U. Pale, A. Amirshahi, W. Cappelletti, T. M. Ingolfsson, X. Wang, A. Cossettini, A. Bernini, L. Benini, S. Beniczky et al., “Szcore: Seizure community open-source research evaluation framework for the validation of electroencephalography-based automated seizure detection algorithms,” Epilepsia, 2024.
[22] S. Supriya, S. Siuly, H. Wang, and Y. Zhang, “Epilepsy detection from eeg using complex network techniques: A review,” IEEE Reviews in Biomedical Engineering, vol. 16, pp. 292–306, 2021.
[23] M. K. Siddiqui, R. Morales-Menendez, X. Huang, and N. Hussain, “A review of epileptic seizure detection using machine learning classifiers,” Brain informatics, vol. 7, no. 1, p. 5, 2020.
[24] N. Seeuws, M. De Vos, and A. Bertrand, “Avoiding post-processing with event-based detection in biomedical signals,” IEEE Transactions on Biomedical Engineering, 2024.
[25] J. Gotman, “Automatic recognition of epileptic seizures in the eeg,” Electroencephalography and clinical Neurophysiology, vol. 54, no. 5, pp. 530–540, 1982.
[26] M. Perslev, M. Jensen, S. Darkner, P. J. Jennum, and C. Igel, “U-time: A fully convolutional network for time series segmentation applied to sleep staging,” Advances in neural information processing systems, vol. 32, 2019.
[27] A. Mukherjee, R. Banerjee, and A. Ghose, “A novel u-net architecture for denoising of real-world noise corrupted phonocardiogram signal,” arXiv preprint arXiv:2310.00216, 2023.
[28] M. R. Islam, X. Zhao, Y. Miao, H. Sugano, and T. Tanaka, “Epileptic seizure focus detection from interictal electroencephalogram: a survey,” Cognitive Neurodynamics, vol. 17, no. 1, pp. 1–23, Feb 2023.
[29] O. Petit, N. Thome, C. Rambour, L. Themyr, T. Collins, and L. Soler, “U-net transformer: Self and cross attention for medical image segmentation,” in Machine Learning in Medical Imaging: 12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 12. Springer, 2021, pp. 267–276.
[30] A. Lin, B. Chen, J. Xu, Z. Zhang, G. Lu, and D. Zhang, “Ds-transunet: Dual swin transformer u-net for medical image segmentation,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–15, 2022.
[31] S. M. Mousavi, W. L. Ellsworth, W. Zhu, L. Y. Chuang, and G. C. Beroza, “Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking,” Nature communications, vol. 11, no. 1, p. 3952, 2020.
[32] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright et al., “Scipy 1.0: fundamental algorithms for scientific computing in python,” Nature methods, vol. 17, no. 3, pp. 261–272, 2020.
[33] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.



Fig. 1. SeizureTransformer Architecture.
