2025 First International Conference on Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT)

979-8-3315-1857-8/25/$31.00 ©2025 IEEE | DOI: 10.1109/CE2CT64011.2025.10939099

Semi-supervised deep learning based method for abnormality detection in videos
Deevesh Chaudhary, Department of Data Science and Engineering, Manipal University Jaipur, Jaipur, India
Sunil Kumar, Department of Computer Communication and Engineering, Manipal University Jaipur, Jaipur, India
Vijaypal Singh Dhaka, Department of Computer Communication and Engineering, Manipal University Jaipur, Jaipur, India

Abstract—Anomaly detection refers to the recognition of events that differ from normal ones, for example road accidents, fights, robbery, arson, etc. Anomaly identification in real-world surveillance videos is an important application of computer vision. The work proposed in this paper detects anomalous events in a surveillance video dataset and is based on a semi-supervised deep learning model. We trained the model using the UCF Crime dataset, which consists of 950 normal videos and 950 anomalous videos. The anomalous videos in the dataset cover 13 different types of anomalies, such as abuse, arrest, explosion, and fighting, that generally occur in real life. Anomalies in the dataset are labelled at the video level rather than at specific frames, which defines the semi-supervised nature of the learning paradigm. The 3D features extracted from the dataset are fed into a multilayered deep learning model. Experimental results show that our approach significantly improves on state-of-the-art approaches for anomaly detection in surveillance videos: the model reaches an accuracy of 83.96 percent, an improvement over other methods.

Keywords—Anomaly detection, feature extraction, Convolutional Neural Network, surveillance videos

I. INTRODUCTION

An anomaly is something that deviates from normal or standard scenarios. In surveillance systems, anomalies relate to activities such as fights, robbery, kidnapping, accidents, and bomb explosions that differ from normal activities. A few anomalous events in surveillance videos are illustrated in Fig. 1. These events need to be identified for safety and security, which makes anomaly detection useful for public safety. The aim of an anomaly identification system is to activate a signal or raise an alarm promptly whenever an activity that differs from normal activities occurs. CCTV cameras are crucial for public surveillance and play an important role in ensuring public safety at crowded places such as markets, stadiums, shopping malls, and cinemas.

Manpower is required to watch the surveillance videos captured by CCTV cameras in order to detect and understand any kind of anomaly. Such manual observation often fails to detect anomalous activities, which creates a demand for automated models that detect abnormal events in surveillance videos. A computer vision based automated anomaly detection system is cost-efficient and less dependent on manpower. Additionally, deep learning techniques provide better results for this task than other techniques.

Fig. 1: Different frames captured within videos of unusual events

Because of the diversity of anomalous events in real-life situations, it is very difficult to identify generic anomalies in surveillance video. Most computer vision based anomaly detection models treat any feature that deviates from the normal pattern as an anomaly. However, the definition of normal and anomalous events is ambiguous, since events that seem normal in one scenario may be abnormal in another [1]. For example, children running around a park is a normal activity, but people running around during riots is abnormal.

Because they are supervised, most computer vision based automated anomaly detection algorithms do not generalize: they can detect only a specific type of anomaly, such as accidents, violence, or robbery [2], [3]. In addition, video captured by surveillance cameras is very diverse and changes over time, for example with the time of day, so supervised approaches to automated anomaly detection sometimes give wrong results for different anomalies. Automated anomaly detection algorithms should therefore be designed with minimal prior knowledge and the least possible supervision. The contributions of the proposed work are as follows:

1. We propose a semi-supervised deep learning model trained to learn anomalies in surveillance videos. To maintain the semi-supervised nature of the model, an anomalous event is defined over the entire video rather than at a particular frame within the video.
2. The UCF Crime dataset contains separate predefined videos for normal events and for 13 types of anomalous events. The proposed work extracts three-dimensional features from the dataset; these features are more favorable than handcrafted features such as edges, corners, and histograms.
3. The accuracy of the proposed work is significantly improved over other methods for anomalous event detection in CCTV videos.

The rest of the paper is organized as follows: Section II presents a literature survey of anomaly detection under different learning paradigms. Section III discusses the proposed
work, followed by results and comparison in Sections IV and V, respectively. Section VI concludes the paper.

II. RELATED WORK

Owing to its increasing application in domains such as public safety, crowd management, violence detection, security, and people behavior analysis [4], anomaly detection has become a very active research area in computer vision. This section reviews previous work on anomaly detection with unsupervised and weakly supervised datasets.

A. Anomaly Detection

A number of different approaches have been proposed to identify anomalous events in surveillance videos [5], [6]. Datta et al. [7] extracted the motion trajectory and orientation of a person's limbs to detect human violence in surveillance videos, relating the direction and magnitude of the person's motion to an acceleration measure vector. Kooij et al. [6] used audio and video information to monitor human aggression: the two modalities are fused as input to a dynamic Bayesian network that estimates the aggression level. Gao et al. [5] proposed a novel feature, Oriented Violent Flows (OViF), that exploits dynamic information about changes in motion for violence detection. Mohammadi et al. [3] used a Newtonian-mechanics-based social force model to identify behavior heuristics that describe people's behavior in surveillance videos. Beyond separating violent from non-violent patterns, researchers have used global motion patterns such as topic modeling [8], histogram approaches [9], social force models [10], and mixtures of dynamic texture models for anomaly detection. Feng et al. [11] proposed a real-time image reconstruction autoencoder model that incorporates spatial constraints to reduce false-positive anomaly detections. Chidananda et al. [12] proposed a model that processes video frames with homomorphic and Gaussian binomial algorithms and uses similarity-score-based k-means clustering to separate normal and anomalous frames; the anomalous frames are then fed into a deep predictive network for detection. Lin et al. [13] created synthetic anomaly data and proposed a 3D deep learning model for anomaly detection.

B. Unsupervised methods for anomaly detection

These methods do not require a labelled training dataset and learn features for anomaly detection in an unsupervised manner. Del Giorno et al. [14] used a discriminative learning method that does not depend on the temporal ordering of anomalies to detect anomalies in surveillance videos. Ionescu et al. [15] proposed an unmasking-based framework that extracts features from two consecutive video sequences with the help of a binary classifier. Liu et al. [16] established a relationship between the heuristic unmasking procedure and multiple classifier two-sample tests (MC2ST) in machine learning. Frame-level motion features have also been explored, and a sampling method has been suggested to improve the accuracy of anomaly event detection in surveillance videos.

C. Weakly supervised methods

Zaheer et al. [17] used video-level rather than frame-level annotations, generating pseudo-labels by binary clustering to reduce the label noise in anomalous videos. Zhong et al. [18] used graph convolutional networks to clean up the noisy labels in the video dataset; supervisory signals propagate from high-confidence to low-confidence snippets according to the similarity of snippet features. Sultani et al. [19] learned anomalies with a deep multiple instance ranking framework (MIRF), using training videos with only minimal (video-level) labels. Within the multiple instance learning (MIL) framework, they treated normal and abnormal videos as bags and video segments as instances, and the model generates high anomaly scores for video segments that contain anomalies. Awan et al. [20] proposed a network that obtains discriminative image features by jointly training two complementary networks; the features from a third network are then concatenated with those of the first two. Because the network learns from the features of proposals covering whole object instances, it learns to assign higher probabilities to proposals containing whole objects than to proposals containing only specific object parts.

III. PROPOSED WORK

A. Proposed Methodology

Fig. 2 illustrates the block diagram of the methodology followed in this work. First, the UCF Crime videos are prepared by extracting frames, and features are then extracted from these frames. The features are used to train a convolutional neural network that detects anomalies.

Fig. 2: Block diagram of the methodology

B. Dataset

The UCF Crime video dataset [19] consists of 1900 real-world surveillance video clips from various sources: 950 videos show normal activities and 950 show different anomaly situations, as illustrated in Fig. 3, whose images are sample frames containing anomalies. All the videos are unedited and were collected through text queries from multiple sources such as YouTube, Google, and LiveLeak.

Fig. 3: Types of anomalies in the dataset
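Because UCF Crime labels are assigned per video rather than per frame, each video can be represented as a bag of temporal segments before training, in the spirit of the multiple instance formulation of Sultani et al. [19] and of the 32-segment division described in Section III-C. The sketch below is illustrative only and rests on assumptions not stated in this paper: per-clip 3D features of dimension 2048 are already available as a `(T, 2048)` NumPy array, each segment is the mean of its clips followed by L2 normalization, and `video_to_bag` and `make_bag` are hypothetical helper names.

```python
import numpy as np

NUM_SEGMENTS = 32  # segment count reported in Section III-C


def video_to_bag(clip_features: np.ndarray, num_segments: int = NUM_SEGMENTS) -> np.ndarray:
    """Pool a (T, D) array of per-clip features into a (num_segments, D) bag.

    Hypothetical helper: mean pooling and L2 normalization are assumptions.
    """
    num_clips, dim = clip_features.shape
    if num_clips == 0:
        raise ValueError("video has no clips")
    # Near-equal, contiguous temporal boundaries; also valid when num_clips
    # is not a multiple of num_segments.
    edges = np.linspace(0, num_clips, num_segments + 1).astype(int)
    bag = np.zeros((num_segments, dim), dtype=np.float64)
    for i in range(num_segments):
        start = min(edges[i], num_clips - 1)
        end = max(edges[i + 1], start + 1)  # very short videos: reuse the nearest clip
        segment = clip_features[start:end].mean(axis=0)
        norm = np.linalg.norm(segment)
        bag[i] = segment / norm if norm > 0 else segment
    return bag


def make_bag(clip_features: np.ndarray, is_anomalous: bool):
    """Return (bag, label): label 1 marks a positive (anomalous) bag, 0 a negative one."""
    return video_to_bag(clip_features), int(is_anomalous)


# Usage with random stand-in features for a 200-clip anomalous video.
rng = np.random.default_rng(0)
bag, label = make_bag(rng.standard_normal((200, 2048)), is_anomalous=True)
print(bag.shape, label)  # (32, 2048) 1
```

Using `np.linspace` boundaries keeps all 32 segments contiguous and non-overlapping regardless of video length, which is what lets every video, long or short, map to a bag of the same size.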


The training set contains 800 normal videos and 810 anomalous videos, and the test set contains 150 normal videos and 140 anomalous videos. The anomalous videos in both sets cover the 13 anomaly types mentioned above, and the anomalies occur at different frames in each anomalous video.

C. Feature Extraction

Frames are extracted from each video at 25 frames per second and resized to 240 × 320 pixels before features are computed. Every video in the dataset is divided into 32 distinct temporal segments, each of which is treated as an instance within the bag, as shown in Fig. 4. The segment count of 32 was determined empirically. 3D features extracted from these video segments are used to train a fully connected neural network.

Fig. 4: Frames extraction

D. Deep Learning Model

First, each surveillance video is segmented into 32 fixed parts that act as the input to the model. The model is trained by treating each video as a bag and its 32 temporal segments as the instances of that bag. Bags are created separately for anomalous and normal videos and are called positive and negative bags, respectively. The CNN proposed in this work takes a 2048-dimensional feature input and feeds it into four hidden fully connected (FC) layers of 4096, 512, 32, and 16 units, followed by a final FC layer with 1 unit. Dropout regularization of 60% is used between the layers. ReLU activation is used for the first FC layer and sigmoid activation for the last. The model is trained for 100 epochs, after which its accuracy no longer changes. The learning rate of 0.0001 was set empirically with the RMSprop optimizer.

The proposed neural network is a four-layer fully connected network: the first layer has 4096 units, the second 512 units, and the third and fourth layers 32 units and 1 unit, respectively. Dropout regularization of 60% is used between layers. ReLU and sigmoid activation functions are used in the first and last layers, respectively. The Adagrad optimization algorithm is used with a learning rate of 0.001 for best performance.

IV. RESULTS

A. Evaluation Metrics

Two widely used evaluation measures, the receiver operating characteristic (ROC) curve and the area under the curve (AUC), are used to evaluate the model. The ROC curve shows the performance of a classification model across decision thresholds by plotting two quantities: the true positive rate (TPR) and the false positive rate (FPR). TPR is the fraction of actual positives that are correctly identified, and FPR is the fraction of actual negatives that are incorrectly flagged as positive. With TP, FN, FP, and TN denoting true positives, false negatives, false positives, and true negatives, TPR and FPR are computed as in (1).

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN} \tag{1}$$

Fig. 5 shows the ROC curve of the proposed method. The model is trained for 100 epochs, and the loss curve is shown in Fig. 6.

Fig. 5: ROC curve

Fig. 6: Loss curve

Fig. 7 shows results on videos with different anomalies and on normal videos. Frames containing some kind of anomaly receive high anomaly scores and normal frames receive low scores, as displayed in Fig. 7, which supports the reported accuracy of the model.
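The TPR and FPR in (1) become an ROC curve by sweeping a decision threshold over the model's anomaly scores, and the AUC is the area under that curve. A minimal pure-Python sketch follows. It assumes binary ground-truth labels (1 = anomalous) paired with real-valued scores such as the network's sigmoid outputs; `roc_points` and `trapezoid_auc` are hypothetical names, and because the paper does not state whether ROC is computed over frames or whole videos, the granularity of `labels` is left to the caller.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, lowering the threshold from the highest score."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # FPR = FP / (FP + TN) and TPR = TP / (TP + FN), as in (1).
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Usage: two normal and two anomalous samples.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(labels, scores)
print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(points))  # 0.75
```

Grouping tied scores before recording a point makes the trapezoidal area equal to the probability that a randomly chosen anomalous sample outscores a randomly chosen normal one, with ties counted as one half.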

Fig. 7: Results on test videos, with higher anomaly scores for frames containing an anomaly than for frames with no anomaly (normal conditions). Panels (a) and (b) show anomaly scores for explosion and abuse videos from the UCF Crime dataset, whereas (c) and (d) show footage from a CCTV camera installed outside a house, exhibiting normal behavior.
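The score curves in Fig. 7 come from scoring each temporal segment of a video. The NumPy sketch below shows a forward pass of the fully connected scorer from Section III-D and one way to spread segment scores over frames. It is a shape-level illustration under assumptions: the weights are random rather than trained; the layer widths follow the first description in Section III-D (2048-dimensional input, then 4096, 512, 32, 16 and 1 units), and because the second description omits the 16-unit layer, the widths are a parameter; intermediate layers are left linear because the text specifies only ReLU for the first layer and sigmoid for the last; dropout is omitted because it is inactive at inference; and every frame simply inherits the score of its segment. All function names are hypothetical.

```python
import numpy as np

# Widths from the first description in Section III-D; use (2048, 4096, 512, 32, 1)
# to follow the second description instead.
LAYER_SIZES = (2048, 4096, 512, 32, 16, 1)


def init_params(sizes=LAYER_SIZES, seed=0):
    """Random, untrained (weight, bias) pairs; useful only for checking shapes."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        w = (rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)).astype(np.float32)
        params.append((w, np.zeros(fan_out, dtype=np.float32)))
    return params


def score_segments(bag, params):
    """Map a (num_segments, D) bag to one anomaly score in [0, 1] per segment."""
    h = bag
    last = len(params) - 1
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i == last:
            h = 1.0 / (1.0 + np.exp(-h))  # sigmoid on the single output unit
        elif i == 0:
            h = np.maximum(h, 0.0)  # ReLU on the first FC layer
        # other layers: activation unspecified in the text, left linear here
    return h[:, 0]


def segment_to_frame_scores(segment_scores, num_frames):
    """Give each frame the score of the equal-length temporal segment it falls in."""
    n = len(segment_scores)
    seg_index = np.minimum(np.arange(num_frames) * n // num_frames, n - 1)
    return segment_scores[seg_index]


# Usage: a 60 s video at 25 fps (Section III-C) has 1500 frames.
params = init_params()
bag = np.random.default_rng(1).standard_normal((32, 2048)).astype(np.float32)
segment_scores = score_segments(bag, params)
frame_scores = segment_to_frame_scores(segment_scores, num_frames=25 * 60)
print(segment_scores.shape, frame_scores.shape)  # (32,) (1500,)
```

With trained weights, plotting `frame_scores` against the frame index would give a step-shaped curve of the kind shown in Fig. 7, with one step per segment.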

V. COMPARISON

Sultani et al. [19] proposed a multiple instance learning method that learns a deep anomaly ranking model to predict anomaly scores for anomalous videos and achieved an accuracy of 75.41%. Hasan et al. [21] proposed two semi-supervised methods based on autoencoders. The first uses handcrafted spatio-temporal local features from video sequences to learn a fully connected autoencoder. The second learns local features through an end-to-end fully convolutional feed-forward autoencoder; this methodology provides an accuracy of 50.6%. Zhao et al. [22] proposed a method that extracts spatio-temporal features with a C3D network and enhances the temporal features with a unidirectional LSTM model to detect anomalies in videos, giving an accuracy of 82.36%. The comparison of these methods with ours is shown in Table II.
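The deep MIL ranking model of Sultani et al. [19], the weakly supervised baseline discussed above, is trained on pairs of a positive (anomalous) bag and a negative (normal) bag. This paper does not state its own loss function, so the sketch below illustrates only the structure of that baseline's objective: a hinge term asking the top-scoring anomalous segment to outrank the top-scoring normal segment by a margin, plus temporal-smoothness and sparsity terms on the anomalous bag (weight decay omitted). The function name and the default weights `lam_smooth` and `lam_sparse` are illustrative assumptions.

```python
import numpy as np


def mil_ranking_loss(pos_scores, neg_scores, lam_smooth=8e-5, lam_sparse=8e-5, margin=1.0):
    """Ranking loss for one (anomalous, normal) pair of segment-score bags in [0, 1]."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # Hinge: the most anomalous segment of the positive bag should beat the
    # highest-scoring segment of the negative bag by at least `margin`.
    ranking = max(0.0, float(margin - pos.max() + neg.max()))
    # Neighbouring segments of the anomalous video should receive similar scores.
    smoothness = float(np.sum(np.diff(pos) ** 2))
    # Anomalies are expected to occupy only a few segments.
    sparsity = float(np.sum(pos))
    return ranking + lam_smooth * smoothness + lam_sparse * sparsity


# Usage: one spiking segment in the anomalous bag, flat scores in the normal bag.
pos_bag = [0.1] * 31 + [0.9]
neg_bag = [0.2] * 32
print(round(mil_ranking_loss(pos_bag, neg_bag), 6))  # 0.300371
```

Taking the maximum over each bag is what makes video-level labels sufficient: only the single most anomalous-looking segment of each video enters the hinge term, so no frame-level annotation is needed.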

TABLE II: COMPARISON WITH EXISTING WORKS ON THE UCF CRIME DATASET. OUR WORK OUTPERFORMS PRIOR WORK, INDICATING THE EFFECTIVENESS OF THE METHOD.

Approach | Supervision | Accuracy (%)
Support Vector Machine [21] | One-class classification | 50.00
Subspace Support Vector Data Description [23] | One-class classification | 58.50
BODS [24] | One-class classification | 68.26
Generalized one class discriminative subspace [24] | One-class classification | 70.46
Scene aware context reasoning [25] | One-class classification | 72.70
Generative Cooperative Learning [26] | One-class classification | 74.20
Multiple instance ranking [19] | Weakly supervised | 77.92
Inner Bag Loss [27] | Weakly supervised | 78.66
Ours | Weakly supervised | 83.96
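The margins behind the claim that the method outperforms prior work can be read directly from Table II. The short script below subtracts each reported accuracy from the proposed method's 83.96%. It uses only the numbers in the table, and the results are differences between reported values, not a test of statistical significance.

```python
# Accuracies (%) as reported in Table II.
TABLE_II = {
    "Support Vector Machine [21]": 50.00,
    "Subspace Support Vector Data Description [23]": 58.50,
    "BODS [24]": 68.26,
    "Generalized one class discriminative subspace [24]": 70.46,
    "Scene aware context reasoning [25]": 72.70,
    "Generative Cooperative Learning [26]": 74.20,
    "Multiple instance ranking [19]": 77.92,
    "Inner Bag Loss [27]": 78.66,
}
OURS = 83.96

# Percentage-point margin of the proposed method over each prior approach.
margins = {name: round(OURS - acc, 2) for name, acc in TABLE_II.items()}
for name, margin in sorted(margins.items(), key=lambda item: item[1]):
    print(f"{name:<52} +{margin:.2f}")
```

The smallest margin is 5.30 percentage points, over Inner Bag Loss [27]. The margin over the multiple instance ranking baseline [19] is 6.04 points.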

VI. CONCLUSION

A deep learning based approach is proposed to detect different kinds of anomalies in surveillance videos. Both normal and anomalous videos with weakly labelled annotations are used to train the deep learning model: video-level annotations are used instead of temporal annotations of segments within the video. The model is trained on the UCF Crime dataset, which contains various types of general anomalies. The experimental results show that our method outperforms the other anomaly detection methods compared in Table II.

REFERENCES

[1] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection,” ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, Jul. 2009, doi: 10.1145/1541880.1541882.
[2] S. Kamijo, Y. Matsushita, K. Ikeuchi, and M. Sakauchi, “Traffic monitoring and accident detection at intersections,” IEEE Trans. Intell. Transport. Syst., vol. 1, no. 2, pp. 108–118, Jun. 2000, doi: 10.1109/6979.880968.
[3] S. Mohammadi, A. Perina, H. Kiani, and V. Murino, “Angry crowds: detecting violent events in videos,” in Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII, vol. 9911, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham: Springer International Publishing, 2016, pp. 3–18.
[4] D. Chaudhary, S. Kumar, and V. S. Dhaka, “Video based human crowd analysis using machine learning: a survey,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, pp. 1–19, Oct. 2021, doi: 10.1080/21681163.2021.1986859.
[5] Y. Gao, H. Liu, X. Sun, C. Wang, and Y. Liu, “Violence detection using Oriented VIolent Flows,” Image Vis. Comput., vol. 48–49, pp. 37–41, Apr. 2016, doi: 10.1016/j.imavis.2016.01.006.
[6] J. F. P. Kooij, M. C. Liem, J. D. Krijnders, T. C. Andringa, and D. M. Gavrila, “Multi-modal human aggression detection,” Computer Vision and Image Understanding, vol. 144, pp. 106–120, Mar. 2016, doi: 10.1016/j.cviu.2015.06.009.
[7] A. Datta, M. Shah, and N. Da Vitoria Lobo, “Person-on-person violence detection in video data,” in Object Recognition Supported by User Interaction for Service Robots, 2002, pp. 433–438, doi: 10.1109/ICPR.2002.1044748.
[8] T. Hospedales, S. Gong, and T. Xiang, “A Markov Clustering Topic Model for mining behaviour in video,” in 2009 IEEE 12th International Conference on Computer Vision, Sep. 2009, pp. 1165–1172, doi: 10.1109/ICCV.2009.5459342.
[9] X. Cui, Q. Liu, M. Gao, and D. N. Metaxas, “Abnormal detection using interaction energy potentials,” in CVPR 2011, Jun. 2011, pp. 3161–3167, doi: 10.1109/CVPR.2011.5995558.
[10] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009, pp. 935–942, doi: 10.1109/CVPRW.2009.5206641.
[11] J. Feng, D. Wang, and L. Zhang, “Crowd anomaly detection via spatial constraints and meaningful perturbation,” ISPRS Int. J. Geoinf., vol. 11, no. 3, p. 205, Mar. 2022, doi: 10.3390/ijgi11030205.
[12] K. Chidananda and A. P. S. Kumar, “An Efficient Real Time Anomaly Detection in Surveillance Videos Using PRU-DPCN Classifier,” SN Comput. Sci., vol. 6, no. 1, p. 31, Dec. 2024, doi: 10.1007/s42979-024-03443-7.
[13] W. Lin, J. Gao, Q. Wang, and X. Li, “Learning to detect anomaly events in crowd scenes from synthetic data,” Neurocomputing, vol. 436, pp. 248–259, May 2021, doi: 10.1016/j.neucom.2021.01.031.
[14] A. Del Giorno, J. A. Bagnell, and M. Hebert, “A discriminative framework for anomaly detection in large videos,” in Computer Vision – ECCV 2016, vol. 9909, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham: Springer International Publishing, 2016, pp. 334–349.
[15] R. T. Ionescu, S. Smeureanu, B. Alexe, and M. Popescu, “Unmasking the abnormal events in video,” arXiv, 2017, doi: 10.48550/arxiv.1705.08182.
[16] Y. Liu, C.-L. Li, and B. Póczos, “Classifier Two Sample Test for Video Anomaly Detections,” 2018.
[17] M. Z. Zaheer, A. Mahmood, H. Shin, and S.-I. Lee, “A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels,” IEEE Signal Process. Lett., vol. 27, pp. 1705–1709, 2020, doi: 10.1109/LSP.2020.3025688.
[18] J.-X. Zhong, N. Li, W. Kong, S. Liu, T. H. Li, and G. Li, “Graph Convolutional Label Noise Cleaner: Train a Plug-and-play Action Classifier for Anomaly Detection,” arXiv, 2019, doi: 10.48550/arxiv.1903.07256.
[19] W. Sultani, C. Chen, and M. Shah, “Real-World Anomaly Detection in Surveillance Videos,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 6479–6488, doi: 10.1109/CVPR.2018.00678.
[20] M. Awan and J. Shin, “Weakly supervised object detection using complementary learning and instance clustering,” IEEE Access, vol. 8, pp. 103419–103432, 2020, doi: 10.1109/ACCESS.2020.2999596.
[21] M. Hasan, J. Choi, J. Neumann, A. K. Roy-Chowdhury, and L. S. Davis, “Learning temporal regularity in video sequences,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 733–742, doi: 10.1109/CVPR.2016.86.
[22] Y. Zhao, G. Mogos, and K. L. Man, “Video Anomaly Detection by the Combination of C3D and LSTM,” Jun. 2021.
[23] F. Sohrab, J. Raitoharju, M. Gabbouj, and A. Iosifidis, “Subspace Support Vector Data Description,” arXiv, 2018, doi: 10.48550/arxiv.1802.03989.
[24] J. Wang and A. Cherian, “GODS: Generalized One-Class Discriminative Subspaces for Anomaly Detection,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2019, pp. 8200–8210, doi: 10.1109/ICCV.2019.00829.
[25] C. Sun, Y. Jia, Y. Hu, and Y. Wu, “Scene-Aware Context Reasoning for Unsupervised Abnormal Event Detection in Videos,” in Proceedings of the 28th ACM International Conference on Multimedia, New York, NY, USA, Oct. 2020, pp. 184–192, doi: 10.1145/3394171.3413887.
[26] M. Z. Zaheer, A. Mahmood, M. H. Khan, M. Segu, F. Yu, and S.-I. Lee, “Generative Cooperative Learning for Unsupervised Video Anomaly Detection,” arXiv, 2022, doi: 10.48550/arxiv.2203.03962.
[27] J. Zhang, L. Qing, and J. Miao, “Temporal Convolutional Network with Complementary Inner Bag Loss for Weakly Supervised Anomaly Detection,” in 2019 IEEE International Conference on Image Processing (ICIP), Sep. 2019, pp. 4030–4034, doi: 10.1109/ICIP.2019.8803657.
