Eeg 2
Abstract—Epilepsy is a common neurological disorder that affects around 65 million people worldwide. Detecting seizures quickly and accurately is vital, given the prevalence and severity of the associated complications. Recently, deep learning-based automated seizure detection methods have emerged as solutions; however, most existing methods require extensive post-processing and do not effectively handle the crucial long-range patterns in EEG data. In this work, we propose SeizureTransformer, a simple model composed of (i) a deep encoder built from 1D convolutions, (ii) a residual CNN stack and a transformer encoder that embed the encoder output into a high-level representation with contextual information, and (iii) a streamlined decoder that converts these features into a sequence of probabilities, directly indicating the presence or absence of seizures at every time step. Extensive experiments on public and private EEG seizure detection datasets demonstrate that our model significantly outperforms existing approaches (ranking first in the 2025 "seizure detection challenge" organized at the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders), underscoring its potential for real-time, precise seizure detection.

Index Terms—Time series analysis, change point detection, deep learning, transformers

I. INTRODUCTION

Epilepsy is a prevalent neurological disorder characterized by recurring seizures. Worldwide, approximately 65 million people have epilepsy, more than Parkinson's disease, Alzheimer's disease, and multiple sclerosis combined. One of the most serious complications linked to epilepsy is Sudden Unexpected Death in Epilepsy (SUDEP), which results in the death of around 1 in every 1000 epilepsy patients each year [1]. Given the severity of this risk, early and precise seizure detection is crucial in clinical practice, as prompt intervention can considerably lower mortality rates [2].

Traditionally, large numbers of multi-channel EEG signals are visually analyzed by neurologists with the goal of understanding when and where seizures start and how they propagate within the brain. However, visual analysis of EEG signals has two main disadvantages: it is time-consuming and prone to subjectivity. Therefore, automating the detection of the underlying brain dynamics in EEG signals is important for obtaining fast and objective EEG analysis.

EEG signals can be treated as a batch of time series, i.e., sequences of data points indexed in discrete-time order, which casts automated seizure detection as a classification task in time series analysis. In recent years, deep learning models have demonstrated impressive abilities to capture the intricate dependencies within time series data, making them a more powerful tool for time series analysis than traditional statistical methods. However, most existing work [3]–[9] performs the classification at the sliding-window level, which involves segmenting a signal recording into distinct windows and predicting a label for each window. Converting these separate predictions into final event predictions in the Standardized Computer-based Organized Reporting of EEG (SCORE) standard [10], as needed for real-life use, involves extensive and time-consuming post-processing, which keeps existing algorithms from simultaneous detection. Moreover, existing time series analysis research often trains and evaluates models on datasets with short sequence lengths [11], while EEG studies have shown that long-range input records can substantially benefit prediction accuracy [12].

In contrast to window-level classification models, sequence-to-sequence modeling, a type of encoder-decoder model that maps an input sequence to an output sequence, provides a straightforward way to avoid redundant post-processing steps through time-step-level classification. In the field of Natural Language Processing (NLP), Transformer-based models have shown remarkable predictive and generative abilities [13], [14]. However, studies have shown that CNN-based models achieve better classification ability in time series analysis than RNN-based and Transformer-based models [15]. This has focused scientific signal classification studies on the U-Net [12], [16], [17], a fully convolutional encoder-decoder network with skip connections that was originally designed for image segmentation [18]. Such models also have clear drawbacks. First, U-Net primarily operates within local receptive fields, making it difficult for U-Net to effectively model long-range dependencies as the input sequence length
becomes large. Beyond that, scaling U-Net to large datasets or high-resolution sequences requires stacking deeper layers, often leading to vanishing gradients, overfitting, and massive memory and computation usage.

In this work, we propose a simple U-Net-based architecture, namely SeizureTransformer, to address these challenges. The model comprises three components: (i) a deep encoder built from 1D convolutions, (ii) a residual CNN stack and a transformer encoder that embed the encoder output into a high-level representation with global contextual information, and (iii) a streamlined decoder that converts these features into a sequence of probabilities, directly indicating the presence or absence of seizures at every time step. The scaling embedding components make it easy to scale up the model size and to handle long-sequence signals. Experimentally, our model achieves consistent state-of-the-art performance, efficiency, and generalization across diverse subjects and devices on public and private EEG datasets. Our model ranked first in an international competition organized by the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders.

II. RESULTS

A. Model Overview

We design the model architecture based on the U-Net to learn end to end from raw waveforms and to classify at the time-step level, which achieves simultaneous seizure detection. Our model consists of three primary modules: an encoder, a scaling embedding component, and a decoder, as shown in Fig. 1. Taking continuous long-term EEG signals from the epilepsy monitoring unit, the encoder extracts features by recognizing patterns through one-dimensional convolution layers. The feature vectors are further embedded by a ResCNN stack and a Transformer encoder stack with a global attention mechanism to generate high-level representations that capture rich temporal dependencies. The streamlined decoder then converts these representations into a sequence of probabilities, indicating the presence or absence of seizures at every time step. Residual connections between each encoder layer and decoder layer ease the gradient flow and avoid degradation problems in the deep neural network. More details about the network architecture are provided in the Methods section.

B. Model Training

Datasets. We use the Temple University Hospital EEG Seizure Corpus v2.0.3 (TUSZ) [19] and the Siena Scalp EEG Database [20] to form our training dataset. TUSZ is the largest public dataset for seizure detection with manually annotated seizure events. The predefined training set in TUSZ has 910 hours of recording sessions from 579 subjects with various sampling frequencies, from 250 Hz to 1000 Hz. The Siena Scalp EEG Database is a small dataset that contains 128 hours of recording sessions from 14 subjects with a unified sampling rate of 512 Hz. Both datasets contain at least the 19 electrodes of the international 10-20 system. We unify the training data from the two datasets by resampling signals to 256 Hz and fixing the channel order (Fig. 2a).

We combine the two datasets by concatenating segmented one-minute-long time series windows, i.e., 60 × 256 = 15360 time steps per window. A 75% overlap ratio between two consecutive windows was set as a hyperparameter during segmentation to augment the training examples. To improve the model's ability to distinguish seizure signals from background noise, we categorize training windows into three classes: no-seizure, full-seizure, and partial-seizure, and uniformly sample a certain number of windows from each class to create a balanced dataset. Specifically, our training dataset is constructed as follows:

$$D = D_{ps} \cup D^{*}_{fs} \cup D^{*}_{ns},$$

where $D_{ps}$ contains all partial-seizure windows, and $D^{*}_{fs}$ and $D^{*}_{ns}$ are randomly selected subsets of full-seizure and no-seizure windows with $|D^{*}_{fs}| = 0.3 \times |D_{ps}|$ and $|D^{*}_{ns}| = 2.5 \times |D_{ps}|$.

Pre-processing. Following the procedure in [6], we preprocess EEG data before feeding it into the model, using a bandpass filter to keep frequencies from 0.5 Hz to 120 Hz and two notch filters to remove components at 1 Hz and 60 Hz, which are typically associated with heart rate and power line noise, respectively (Fig. 2b).

Training Setting. We implemented our deep learning model in PyTorch and trained it on 2 parallel NVIDIA L40S 46GB GPUs. Our training parameters include a batch size of 256, a learning rate of 1e-3, a weight decay of 2e-5, and a drop rate of 0.1 for all dropout layers at both training and test time. We use Binary Cross-Entropy loss as the objective function and RAdam as the optimizer. Training was set to run for 100 epochs, with early stopping if no improvement in validation loss was observed over 12 epochs.

Post-processing. Given the sequence of probabilities output by the model, we apply a set of simple post-processing steps to convert the continuous probabilities into the final detection (Fig. 2c). First, we apply a threshold filter to obtain a discrete mask. Then, two morphological operations are employed to eliminate spurious spikes of seizure activity and to fill short gaps of zeros. Lastly, we apply a simple duration-based rule to discard blocks of seizure labels lasting less than a minimal clinically relevant duration.

C. Evaluation Results

We used TUSZ's predefined test set, consisting of 42.7 hours of waveforms from 43 subjects with 469 seizure events, to compare the detection performance of SeizureTransformer with other traditional and deep-learning algorithms. The TUSZ test set is a collection of blind EEG signals that are completely separated from its training and validation sets, so the reported performance reflects generalization to unseen recordings.

We quantify the model's performance using the area under the receiver operating characteristic curve (AUROC). For each continuous EEG recording, the ROC curve plots the true positive rate against the false positive rate across all possible decision thresholds, and the AUC represents the area under the ROC curve,
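[Fig. 1. Model architecture diagram with three panels: Encoder, Scaling Embedding, Decoder. Shape labels in the encoder panel: (19, 15360), (32, 15360), (32, 7680), (64, 7680), (64, 3840), (128, 3840), (128, 1920), (256, 1920), (256, 960), (512, 960), (512, 480). Shape labels in the decoder panel: (512, 480), (512, 960), (512, 960), (512, 1920), (256, 1920), (256, 3840), (128, 3840), (128, 7680), (64, 7680), (64, 15360), (32, 15360), (1, 15360).]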
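The shape labels in the encoder panel of Fig. 1 follow a regular pattern that a few lines of arithmetic can reproduce. The sketch below is illustrative only: it assumes that each encoder stage is a length-preserving 1D convolution that sets the channel width, followed by a pooling step that halves the time axis, and it reads the widths 32, 64, 128, 256, 512 off the figure labels. The function name trace_encoder_shapes is hypothetical and is not taken from the released code.

```python
def trace_encoder_shapes(in_channels=19, length=15360,
                         widths=(32, 64, 128, 256, 512)):
    """Return the (channels, time_steps) shape after every conv and pool step."""
    shapes = [(in_channels, length)]
    for width in widths:
        # Assumption: a length-preserving 1D convolution changes only the channels.
        shapes.append((width, length))
        # Assumption: pooling with stride 2 halves the number of time steps.
        length //= 2
        shapes.append((width, length))
    return shapes


shapes = trace_encoder_shapes()
for shape in shapes:
    print(shape)
```

Under these assumptions, five halvings take the 15360 input time steps down to 15360 / 2^5 = 480, which matches the last encoder label (512, 480) in the figure.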
TABLE I
MODEL PERFORMANCE IN THE SEIZURE DETECTION CHALLENGE 2025.

| Model | Architecture | Input Length (s) | F1-score | Sensitivity | Precision | FP (per day) |
|---|---|---|---|---|---|---|
| SeizureTransformer | U-Net & CNN & Transformer | 60 | 0.43 | 0.37 | 0.45 | 1 |
| Van Gogh Detector | CNN & Transformer | N × 10 | 0.36 | 0.39 | 0.42 | 3 |
| S4Seizure | S4 | 12 | 0.34 | 0.30 | 0.42 | 2 |
| DeepSOZ-HEM | LSTM & Transformer | 600 | 0.31 | 0.58 | 0.27 | 14 |
| HySEIZa | Hyena-Hierarchy & CNN | 12 | 0.26 | 0.6 | 0.22 | 13 |
| Zhu-Transformer | CNN & Transformer | 25 | 0.20 | 0.46 | 0.16 | 24 |
| SeizUnet | U-Net & LSTM | 30 | 0.19 | 0.16 | 0.20 | 4 |
| Channel-adaptive | CNN | 15 | 0.14 | 0.06 | 0.20 | 1 |
| EventNet | U-Net | 120 | 0.14 | 0.6 | 0.09 | 20 |
| Gradient Boost | Gradient Boosted Trees | 10 | 0.07 | 0.15 | 0.09 | 6 |
| DynSD | LSTM | 1 | 0.06 | 0.55 | 0.04 | 37 |
| Random Forest | Random Forest | 2 | 0.06 | 0.05 | 0.07 | 1 |
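The metric columns of Table I are linked by standard definitions. The following is a minimal sketch, assuming the textbook event-level definitions: sensitivity = TP / (TP + FN), precision = TP / (TP + FP), F1 as their harmonic mean, and false positives normalized to 24 hours of recording. The counts are hypothetical, the function name event_metrics is illustrative, and the challenge's scoring tool may aggregate results differently, so this sketch is not expected to reproduce the leaderboard values.

```python
def event_metrics(tp, fp, fn, hours):
    """Textbook event-level metrics from counts of matched and unmatched events."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    total = sensitivity + precision
    # F1 is the harmonic mean of sensitivity and precision.
    f1 = 2 * sensitivity * precision / total if total else 0.0
    # False positives normalized to one day (24 hours) of recording.
    fp_per_day = fp / (hours / 24.0)
    return {"sensitivity": sensitivity, "precision": precision,
            "f1": f1, "fp_per_day": fp_per_day}


# Hypothetical counts, not taken from the challenge data.
metrics = event_metrics(tp=30, fp=10, fn=20, hours=240)
print(metrics)
```

Raising the decision threshold typically trades sensitivity for precision and fewer false positives per day, which is the kind of trade-off visible across the rows of the table.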
which summarizes the model's performance. We compare our model on TUSZ's predefined test set, using the same evaluation metric, with other seizure detection models, namely Zhu-Transformer [6], EEGWaveNet [8], and DCRNN [7], to demonstrate the effectiveness of our proposed approach. The models used for comparison are pre-trained models based on different training sets. All of these pre-trained models are implemented by [21] and are publicly available. As shown in Figure 3, our model achieved the highest performance, with a mean AUROC of 0.876 and a distribution tightly concentrated toward higher values.

D. Application in Seizure Detection Challenge

The 2025 Seizure Detection Challenge (competition website and leaderboard: https://[Link]/challenge/), organized as part of the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders, provides a completely blind private dataset of continuous EEG recordings for evaluation, which makes it an ideal setting for fairly testing the performance and generalization of our model. The test dataset was collected at the EMU of the Filadelfia Danish Epilepsy Center in Dianalund from January 2018 to December 2020 with the NicoleteOne™ v44 amplifier. The dataset contains 4360 hours of EEG recordings from 65 subjects of various ages, each of whom had at least one seizure during the hospital stay with a visually identifiable electrographic correlate to the seizures recorded on video. The ground truth labels were annotated by three board-certified neurophysiologists with expertise in long-term video-EEG monitoring. The F1-score, sensitivity, precision, and false positives per day were used as the primary ranking criteria to align with real-world requirements. The event-based scoring evaluates annotations at the event level by assessing the degree
of overlap between predicted and reference events.

As shown in Table I, our model largely outperforms the other algorithms in terms of F1-score. It is noteworthy that we set the picking threshold to 80% in the competition, which leads to a relatively low sensitivity but yields the best precision and false positive rate. Van Gogh Detector and Zhu-Transformer are window-level classification models that also take advantage of both convolutional and transformer encoder units; however, their performance did not reach that of SeizureTransformer. This points to the beneficial effects of time-step-level end-to-end learning. Similarly, SeizUnet, like our model, is a time-step-level classification algorithm using U-Net; but unlike SeizureTransformer, it adds LSTM layers after the U-Net decoder instead of embedding transformer encoders into the U-Net, and it does not perform as well as our model.

[Fig. 2 graphic. Labels recovered from the figure: panels a, b, c; electrode positions Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2; channel traces FP1-Avg, F3-Avg, C3-Avg, P3-Avg, O1-Avg, F7-Avg, T3-Avg, T5-Avg, FZ-Avg, CZ-Avg, PZ-Avg, FP2-Avg, F4-Avg, C4-Avg, P4-Avg, O2-Avg, F8-Avg, T4-Avg, T6-Avg; a block labeled SeizureTransformer; an output trace labeled Seizure.]

Fig. 2. EEG signal processing pipeline: (a) Brain activity is recorded using a 19-channel EEG system. (b) A 60-second EEG sample is pre-processed through normalization, Butterworth bandpass filtering, and 1 Hz and 60 Hz IIR notch filters to remove noise. (c) After neural network analysis, post-processing steps (threshold filtering, morphological opening and closing, and removal of short-duration events) produce the final detection.

[Fig. 3 graphic. Violin plots with axis label Density and ticks from 0.0 to 1.0. Mean labels above the violins, in plot order: SeizureTransformer 0.876, DCRNN 0.642, EEGWaveNet 0.566, Zhu-Transformer 0.679.]

Fig. 3. Violin plots illustrating the distribution of AUROC values for SeizureTransformer, DCRNN, EEGWaveNet, and Zhu-Transformer models evaluated on the TUSZ v2.0.3 predefined testing set. Mean AUROC scores for each model are indicated above each plot, with SeizureTransformer demonstrating the highest overall performance.

III. DISCUSSION

A. Runtime Analysis

Window-level classification models assign predictions individually to each segmented window. Mapping window labels to a final SCORE-compliant [10] annotation that contains the start time and duration of a seizure requires the model to segment windows with a large overlap ratio to ensure precise start and stop times. This leads to substantial redundant computation and complicated mapping procedures. In contrast, time-step-level classification models do not require such post-processing steps, as their predictions directly indicate the onset time and duration of seizure activity. This approach inherently avoids the redundant computations associated with overlapping windows and significantly simplifies the annotation pipeline, which aligns it more closely with the practical clinical requirement for efficient automated seizure detection.

We further show our model's efficiency by comparing its inference time with that of other models on TUSZ's testing set in Table II. Our model has the lowest running time and can process a one-hour-long recording in 3.98 seconds.

TABLE II
MODEL'S RUNTIME OVER TUSZ V2.0.3'S TESTING SET

| Model | Total Runtime (s) | Runtime (s) per 1-hour EEG |
|---|---|---|
| SeizureTransformer | 169.96 | 3.98 |
| DCRNN | 2571.75 | 60.24 |
| EEGWaveNet | 1690.19 | 39.59 |
| Zhu-Transformer | 3309.51 | 77.53 |

B. Ablation Study

The better performance of the proposed method for seizure detection could be due to several factors. Here, we show the necessity of each model component by testing multiple partial models with certain components removed. As shown in Figure 4, the vanilla U-Net performs poorly, with a low mean AUROC. Adding only a ResCNN stack or only a transformer stack marginally improves model performance but also leads to larger variance, with some extreme failure cases. By contrast, integrating both the ResCNN and Transformer stacks produces not only a higher mean AUROC but also reduced variance, indicating that these components complement each other effectively. These results underscore the importance of each proposed element in achieving robust and accurate seizure detection.

C. Challenge Results

The competition leaderboard shows a relatively low F1-score across every algorithm compared to the results shown in previously published reviews [22], [23] and the self-reported
performance of competing algorithms. To comprehensively understand the model's performance, we test our model together with several published algorithms, namely EventNet [24], Zhu-Transformer [6], DCRNN [7], EEGWaveNet [8], and the Gotman algorithm [25], on TUSZ's predefined testing set using the same evaluation metrics (F1-score, sensitivity, and precision) implemented by the challenge organizers [21]. The testing tools provide both sample-level and event-level evaluation. As shown in Table III, our model retains state-of-the-art performance, and all models achieve better F1-scores than on the challenge leaderboard. This difference might be due to the distribution shift between datasets. As described by the organizers, the private evaluation dataset includes recordings from subjects of various ages, and the data was collected with portable EEG amplifiers that allow patients to move freely within the building, which will likely lead to unique attributes in the recordings that depart from the training set.

[Fig. 4 graphic. Distribution plots with axis label Density and ticks from 0.0 to 1.0. Mean labels above the plots, in plot order: RPT 0.876, RT 0.856, PT 0.864, T 0.839, R 0.803, N 0.767.]

Fig. 4. Ablation study for SeizureTransformer showing AUROC distributions of models that contain only some of the components. N represents a vanilla deep U-Net without the ResCNN and Transformer encoder stacks; R represents the U-Net with the ResCNN stack; T represents the U-Net with the Transformer stack; P means adding positional encoding before feeding into the transformer stack.

TABLE III
MODEL PERFORMANCE IN TUSZ'S PREDEFINED TESTING SET.

| Scale | Model | F-1 | Sensitivity | Precision |
|---|---|---|---|---|
| Sample-based | Gotman | 0.0679 | 0.0558 | 0.0868 |
| Sample-based | EEGWaveNet | 0.1088 | 0.1051 | 0.1128 |
| Sample-based | DCRNN | 0.1917 | 0.4777 | 0.1199 |
| Sample-based | Zhu-Transformer | 0.4256 | 0.5406 | 0.3510 |
| Sample-based | EventNet | 0.4830 | 0.5514 | 0.4286 |
| Sample-based | SeizureTransformer | 0.5803 | 0.4710 | 0.7556 |
| Event-based | Gotman | 0.2089 | 0.6199 | 0.1256 |
| Event-based | EEGWaveNet | 0.2603 | 0.4427 | 0.1844 |
| Event-based | DCRNN | 0.3262 | 0.5723 | 0.2281 |
| Event-based | Zhu-Transformer | 0.5387 | 0.6116 | 0.5259 |
| Event-based | EventNet | 0.5655 | 0.6116 | 0.5259 |
| Event-based | SeizureTransformer | 0.6752 | 0.7110 | 0.6427 |

IV. METHODS

A. Related Work

The U-Net [18] architecture was first proposed in the field of computer vision (CV) for image segmentation tasks. Considering the temporal continuity of time series data, such networks have been widely deployed in various scientific signal processing applications, such as seismic phase detection [16], sleep-staging classification [12], [26], denoising heart sound signals [27], and seizure detection [28].

Some works explore combining U-Net with Transformers in other fields. For example, in a medical image segmentation task, [29] used self- and cross-attention with U-Net; [30] incorporated a hierarchical Swin Transformer into U-Net to extract both coarse and fine-grained feature representations. In seismic analysis, [31] proposed a deep neural network that can be regarded as a U-Net with global and self-attention but without residual connections. However, in the signal processing area, to the best of our knowledge, there is no existing work that scales U-Net using transformer blocks. The closest work to this paper is [28], where multiple attention-gated U-Nets are used, followed by an LSTM network that fuses their results.

B. Preliminary

For a continuous EEG waveform, before segmenting it into uniform windows as training examples, we resample all data to a common sampling rate of 256 Hz using the Fourier method [32], which fixes the time resolution so that the convolutions in the model are meaningful across subjects, and apply a Gaussian normalization to each channel, calculated by

$$x_i^{*} = (x_i - \bar{x}) / s_x,$$

$$\bar{x} = \frac{1}{K} \sum_{i=1}^{K} x_i,$$

$$s_x = \sqrt{\frac{1}{K-1} \sum_{i=1}^{K} (x_i - \bar{x})^2},$$

where the sums run over the $K$ samples $x_i$ of the channel.

The generated dataset, after slicing, is denoted as $D = (\mathcal{X}, \mathcal{Y}) = \{(x_i, y_i) \mid i = 1, \ldots, N\}$, where $N$ represents the number of training samples. Each input window $x_i \in \mathbb{R}^{T \times d}$ represents a multivariate time series with $T = 256 \times 60 = 15{,}360$ time steps and $d = 19$ channels. The corresponding time-step-level label $y_i \in \{0, 1\}^{T}$ is a binary, box-shaped ground truth signal indicating the presence of seizure activity at each time step.

C. Network Design

Encoder. We use one-dimensional convolutions along the time axis to extract local temporal patterns, outputting a tokenized representation of the signal. Specifically, we use convolution-pooling blocks with kernel sizes ranging from 11 down to 3 to detect features at different temporal scales, capturing both slow and fast dynamics. This reduces the time step size from 15360 to $T_d = 512$ while increasing the channel size from 19 to $k_d = 480$ to compensate for the loss of resolution in the time domain. The ELU function is used as the activation function after each convolution layer.

Scaling Embedding. Following [31], after obtaining the encoded output, we first apply a ResCNN stack to refine these tokenized features, yielding better generalization with better temporal invariance.

We then implement a transformer encoder stack [33] to scale the model and to capture long-range dependencies across the
tokenized signal. Specifically, sine and cosine functions of different frequencies are used as positional encodings,

$$PE_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/T_d}\right),$$

$$PE_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/T_d}\right),$$

which are then summed with the input embedding. The refined representation, denoted as $Z$, is then projected into equally shaped query, key, and value spaces,

$$Q = ZW^{Q}, \quad K = ZW^{K}, \quad V = ZW^{V},$$

and processed with the global attention mechanism,

$$A = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V.$$

The attention output is combined with the tokens through a residual connection and layer normalization, followed by a feed-forward network that transforms the output with another residual addition.

Such hierarchical processing scales the model and integrates both local features and global context, enabling the model to learn complex temporal dependencies.

Decoder. Similar to the encoder, we use a convolutional decoder to decode the compressed information from the central latent space into a sequence of probabilities. However, instead of the convolution-pooling block, we upsample the input with a scale factor of 2 and then apply a convolution, which decreases the number of channels and increases the number of time steps back to the original window size. As in U-Net, residual connections are used between the encoder and decoder to facilitate efficient gradient flow.

Training. The model is trained to produce predictions $\hat{y}_i$ that minimize the following objective:

$$\hat{y}_i = f_{\theta}(x_i), \quad \theta \in \arg\min_{\theta} \mathcal{L}.$$

Here, we use the Binary Cross-Entropy loss as our training objective $\mathcal{L}$, which measures the dissimilarity between the predicted and true labels:

$$\mathcal{L}(\hat{y}_{i,j}, y_{i,j}) = -y_{i,j} \log(\hat{y}_{i,j}) - (1 - y_{i,j}) \log(1 - \hat{y}_{i,j}),$$

where $y_{i,j}$ and $\hat{y}_{i,j}$ are the ground truth and predicted labels, respectively, for sample $i$ at time step $j$.

V. LIMITATION

While there is a rich literature on epileptic seizure detection and prediction, more work is needed to generalize the algorithms to anatomically different types of epilepsy and to different ambulatory recording settings. This is evident from the gaps between the training/validation and testing F1-scores of the work presented in this paper. Our model demonstrates a high F1-score on other data sets. However, its F1-score is lower on the withheld test data set, although it still outperforms the competing algorithms by a significant margin. Thus, future work will focus on understanding the differences in the data distributions between training and test data sets to improve our model.

VI. DATA AVAILABILITY

We used the following publicly available datasets in this work for training our model. The test set used in the competition was not publicly available at the time of this write-up.

• Siena Scalp EEG Database: The database consists of EEG recordings of 14 patients acquired at the Unit of Neurology and Neurophysiology of the University of Siena. Subjects include 9 males (ages 25-71) and 5 females (ages 20-58). Subjects were monitored with a Video-EEG with a sampling rate of 512 Hz, with electrodes arranged on the basis of the international 10-20 System. Most of the recordings also contain 1 or 2 EKG signals. The diagnosis of epilepsy and the classification of seizures according to the criteria of the International League Against Epilepsy were performed by an expert clinician after a careful review of the clinical and electrophysiological data of each patient. License: https://[Link]/content/siena-scalp-eeg/view-license/1.0.0/
• TUH EEG Seizure Corpus v2.0.3: This database is a subset of the TUH EEG Corpus that was collected from archival records of clinical EEGs at Temple University Hospital recorded between 2002 and 2017. From this large dataset, a subset of files with a high likelihood of containing seizures was retained based on clinical notes and on the output of seizure detection algorithms. V2.0.0 contains 7377 .edf files from 675 subjects for a total duration of 1476 hours of data. The files are mostly short (avg. 10 minutes). The dataset has a heterogeneous sampling frequency and number of channels. All files are acquired at a minimum of 250 Hz. A minimum of 17 EEG channels is available in all recordings. They are positioned according to the 10-20 system. The annotations are provided as .csv and contain the start time, stop, channel, and seizure type. License: [Link]projects/nedc/forms/tuh [Link].

VII. CODE AVAILABILITY

Our source code and model are available at [Link]com/keruiwu/SeizureTransformer.

REFERENCES

[1] L. Hirsch, E. Donner, E. So, M. Jacobs, L. Nashef, J. Noebels, and J. Buchhalter, “Abbreviated report of the nih/ninds workshop on sudden unexpected death in epilepsy,” Neurology, vol. 76, no. 22, pp. 1932–1938, 2011.
[2] A. Van de Vel, K. Cuppens, B. Bonroy, M. Milosevic, K. Jansen, S. Van Huffel, B. Vanrumste, P. Cras, L. Lagae, and B. Ceulemans, “Non-eeg seizure detection systems and potential sudep prevention: state of the art: review and update,” Seizure, vol. 41, pp. 141–153, 2016.
[3] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, “Timesnet: Temporal 2d-variation modeling for general time series analysis,” arXiv preprint arXiv:2210.02186, 2022.
[4] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “itransformer: Inverted transformers are effective for time series forecasting,” arXiv preprint arXiv:2310.06625, 2023.
[5] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting,” in International conference on machine learning. PMLR, 2022, pp. 27268–27286.
[6] Y. Zhu and M. D. Wang, “Automated seizure detection using transformer models on multi-channel eegs,” in 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE, 2023, pp. 1–6.
[7] S. Tang, J. A. Dunnmon, K. Saab, X. Zhang, Q. Huang, F. Dubost, D. L. Rubin, and C. Lee-Messer, “Self-supervised graph neural networks for improved electroencephalographic seizure analysis,” arXiv preprint arXiv:2104.08336, 2021.
[8] P. Thuwajit, P. Rangpong, P. Sawangjai, P. Autthasan, R. Chaisaen, N. Banluesombatkul, P. Boonchit, N. Tatsaringkansakul, T. Sudhawiyangkul, and T. Wilaiprasitporn, “Eegwavenet: Multiscale cnn-based spatiotemporal feature extraction for eeg seizure detection,” IEEE Transactions on Industrial Informatics, vol. 18, no. 8, pp. 5547–5557, 2021.
[9] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023.
[10] S. Beniczky, H. Aurlien, J. C. Brøgger, L. J. Hirsch, D. L. Schomer, E. Trinka, R. M. Pressler, R. Wennberg, G. H. Visser, M. Eisermann et al., “Standardized computer-based organized reporting of eeg: Score–second version,” Clinical Neurophysiology, vol. 128, no. 11, pp. 2334–2346, 2017.
[11] A. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, and E. Keogh, “The uea multivariate time series classification archive, 2018,” arXiv preprint arXiv:1811.00075, 2018.
[12] H. Li and Y. Guan, “Deepsleep convolutional neural network allows accurate and fast detection of sleep arousal,” Communications biology, vol. 4, no. 1, p. 18, 2021.
[13] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” Journal of machine learning research, vol. 21, no. 140, pp. 1–67, 2020.
[14] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” arXiv preprint arXiv:1910.13461, 2019.
[15] Y. Wang, H. Wu, J. Dong, Y. Liu, M. Long, and J. Wang, “Deep time series models: A comprehensive survey and benchmark,” arXiv preprint arXiv:2407.13278, 2024.
[16] W. Zhu and G. C. Beroza, “Phasenet: a deep-neural-network-based seismic arrival-time picking method,” Geophysical Journal International, vol. 216, no. 1, pp. 261–273, 2019.
[17] C. Chatzichristos, J. Dan, A. M. Narayanan, N. Seeuws, K. Vandecasteele, M. De Vos, A. Bertrand, and S. Van Huffel, “Epileptic seizure detection in eeg via fusion of multi-view attention-gated u-net deep neural networks,” in 2020 IEEE Signal Processing in Medicine and Biology Symposium (SPMB). IEEE, 2020, pp. 1–7.
[18] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer, 2015, pp. 234–241.
[19] V. Shah, E. Von Weltin, S. Lopez, J. R. McHugh, L. Veloso, M. Golmohammadi, I. Obeid, and J. Picone, “The temple university hospital seizure detection corpus,” Frontiers in neuroinformatics, vol. 12, p. 83, 2018.
[20] P. Detti, “Siena scalp eeg database,” PhysioNet. doi, vol. 10, p. 493, 2020.
[21] J. Dan, U. Pale, A. Amirshahi, W. Cappelletti, T. M. Ingolfsson, X. Wang, A. Cossettini, A. Bernini, L. Benini, S. Beniczky et al., “Szcore: Seizure community open-source research evaluation framework for the validation of electroencephalography-based automated seizure detection algorithms,” Epilepsia, 2024.
[22] S. Supriya, S. Siuly, H. Wang, and Y. Zhang, “Epilepsy detection from eeg using complex network techniques: A review,” IEEE Reviews in Biomedical Engineering, vol. 16, pp. 292–306, 2021.
[23] M. K. Siddiqui, R. Morales-Menendez, X. Huang, and N. Hussain, “A review of epileptic seizure detection using machine learning classifiers,” Brain informatics, vol. 7, no. 1, p. 5, 2020.
[24] N. Seeuws, M. De Vos, and A. Bertrand, “Avoiding post-processing with event-based detection in biomedical signals,” IEEE Transactions on Biomedical Engineering, 2024.
[25] J. Gotman, “Automatic recognition of epileptic seizures in the eeg,” Electroencephalography and clinical Neurophysiology, vol. 54, no. 5, pp. 530–540, 1982.
[26] M. Perslev, M. Jensen, S. Darkner, P. J. Jennum, and C. Igel, “U-time: A fully convolutional network for time series segmentation applied to sleep staging,” Advances in neural information processing systems, vol. 32, 2019.
[27] A. Mukherjee, R. Banerjee, and A. Ghose, “A novel u-net architecture for denoising of real-world noise corrupted phonocardiogram signal,” arXiv preprint arXiv:2310.00216, 2023.
[28] M. R. Islam, X. Zhao, Y. Miao, H. Sugano, and T. Tanaka, “Epileptic seizure focus detection from interictal electroencephalogram: a survey,” Cognitive Neurodynamics, vol. 17, no. 1, pp. 1–23, Feb 2023.
[29] O. Petit, N. Thome, C. Rambour, L. Themyr, T. Collins, and L. Soler, “U-net transformer: Self and cross attention for medical image segmentation,” in Machine Learning in Medical Imaging: 12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 12. Springer, 2021, pp. 267–276.
[30] A. Lin, B. Chen, J. Xu, Z. Zhang, G. Lu, and D. Zhang, “Ds-transunet: Dual swin transformer u-net for medical image segmentation,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–15, 2022.
[31] S. M. Mousavi, W. L. Ellsworth, W. Zhu, L. Y. Chuang, and G. C. Beroza, “Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking,” Nature communications, vol. 11, no. 1, p. 3952, 2020.
[32] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright et al., “Scipy 1.0: fundamental algorithms for scientific computing in python,” Nature methods, vol. 17, no. 3, pp. 261–272, 2020.
[33] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
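The post-processing described in the Model Training subsection (threshold filter, two morphological operations, duration rule) can be illustrated with a small standalone sketch. This is not the authors' implementation: it assumes a run-length formulation in which opening removes runs of ones shorter than a given length and closing fills interior runs of zeros shorter than a given length. The 0.8 threshold follows the picking threshold reported for the competition; the spike, gap, and minimum-event durations, as well as the function names runs, remove_short_runs, and postprocess, are hypothetical.

```python
from itertools import groupby


def runs(mask):
    """Yield (value, start, length) for each run of equal values in mask."""
    pos = 0
    for value, group in groupby(mask):
        length = len(list(group))
        yield value, pos, length
        pos += length


def remove_short_runs(mask, value, min_len, interior_only=False):
    """Flip every run of `value` shorter than `min_len` to the opposite label."""
    out = list(mask)
    n = len(out)
    for v, start, length in runs(mask):
        touches_edge = start == 0 or start + length == n
        if v == value and length < min_len and not (interior_only and touches_edge):
            out[start:start + length] = [1 - value] * length
    return out


def postprocess(probs, fs=256, threshold=0.8,
                spike_s=1.0, gap_s=1.0, min_event_s=2.0):
    """Convert per-time-step probabilities to (start_s, duration_s) events."""
    # Step 1: threshold filter gives a binary mask.
    mask = [1 if p >= threshold else 0 for p in probs]
    # Step 2a: opening-like step removes spurious short spikes of ones.
    mask = remove_short_runs(mask, 1, int(spike_s * fs))
    # Step 2b: closing-like step fills short gaps of zeros between detections.
    mask = remove_short_runs(mask, 0, int(gap_s * fs), interior_only=True)
    # Step 3: duration rule discards events shorter than the minimum duration.
    mask = remove_short_runs(mask, 1, int(min_event_s * fs))
    return [(start / fs, length / fs)
            for v, start, length in runs(mask) if v == 1]


# Toy example at 4 Hz: a short spike, then two detections split by a short gap.
toy = [0.9] * 2 + [0.1] * 8 + [0.9] * 12 + [0.1] * 2 + [0.9] * 8 + [0.1] * 8
print(postprocess(toy, fs=4))
```

In the toy example the initial half-second spike is removed, the half-second gap is filled, and the two detections merge into a single event. Each event is returned as a start time and a duration in seconds, the two quantities the Runtime Analysis subsection says the final annotation must contain.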