Semi-Supervised Deep Learning Based Method for Abnormality Detection in Videos
2025 First International Conference on Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT) | 979-8-3315-1857-8/25/$31.00 ©2025 IEEE | DOI: 10.1109/CE2CT64011.2025.10939099
work, followed by results and comparison in Sections IV and V, respectively. Section V concludes the paper.

II. RELATED WORK
Owing to its growing range of applications in domains such as public safety, crowd management, violence detection, security, and people behavior analysis [4], anomaly detection has become an active research area in computer vision. This section reviews previous work on anomaly detection with unsupervised and weakly supervised datasets.

A. Anomaly Detection
A number of approaches have been proposed to identify anomalous events in surveillance videos [5][6]. Datta et al. [7] extracted features from a person's limbs in the form of motion trajectory and orientation to detect human violence in surveillance videos. They related the direction and magnitude of a person's motion to an acceleration measure vector. Kooij et al. [6] used the audio and video information in a recording to monitor human aggression; the two modalities are fused and fed to a dynamic Bayesian network that estimates the aggression level. Gao et al. [5] proposed a novel feature, Oriented Violent Flows (OViF), that uses the dynamic information in motion changes for violence detection. Mohammadi et al. [3] used a social force model based on Newtonian mechanics to identify behavior heuristics for describing people's behavior in surveillance videos. Apart from detecting violent and non-violent patterns, researchers have used global motion patterns such as topic modeling [8], histogram approaches [9], the social force model [10], and mixtures of dynamic textures for anomaly detection. Feng et al. [11] proposed a real-time autoencoder model based on image reconstruction; the work incorporates spatial constraints to reduce false positive detections. Chidananda et al. [12] proposed a model that processes video frames with a homomorphic and Gaussian binomial algorithm. It uses a similarity-score-based k-means clustering method to cluster normal and anomalous frames, and the anomalous frames are fed into a deep predictive network for detection. Lin et al. [13] created synthetic anomaly data and proposed a 3D deep learning model for anomaly detection.

B. Unsupervised methods for anomaly detection
These methods do not require a labelled training dataset and learn features for anomaly detection in an unsupervised manner. Del Giorno et al. [14] used a discriminative learning method that does not depend on the temporal ordering of anomalies to detect anomalies in surveillance videos. Ionescu et al. [15] proposed an unmasking-based framework that extracts features from two consecutive video sequences with the help of a binary classifier. Liu et al. [16] connected the heuristic unmasking procedure to multiple classifier two sample tests (MC2ST) in machine learning. They explored frame-level motion features and suggested a sampling method to improve the accuracy of anomalous event detection in surveillance videos.

C. Weakly supervised methods
Zaheer et al. [17] used video-level annotations rather than frame-level ones. Binary clustering was used to generate pseudo annotations and to lessen the label noise in anomalous videos. Zhong et al. [18] used graph convolutional networks to clean up the noisy labels in the video dataset. The movement of supervisory signals from high-confidence to low-confidence snippets depends on the similarity of snippet features. Sultani et al. [19] proposed learning anomalies with a deep multiple instance ranking framework (MIRF) and training videos with only minimally labelled information. Within the multiple instance learning (MIL) framework, normal and abnormal videos are treated as bags and video segments as instances. The model generates a high predicted anomaly score for video segments that contain anomalies.

Awan et al. [20] proposed a network for obtaining discriminative image features in which two complementary networks are trained together. The features obtained from the third network are then concatenated with the features obtained from the first two networks. The network learns from the features of proposals related to whole object instances and, as a result, learns to predict higher probabilities for proposals containing whole objects than for proposals containing only specific object parts.

III. PROPOSED WORK
A. Proposed Methodology
Fig. 2 illustrates the block diagram of the methodology followed in this work. First, the video data from the UCF Crime dataset is prepared by extracting frames, and features are then extracted from the frames. The features are fed into a convolutional neural network to train the model, which results in the detection of anomalies.

Fig. 2: Block Diagram of Methodology

B. Dataset
The UCF Crime video dataset [19] consists of 1900 real-world surveillance video clips from various sources. Of these, 950 videos show normal activities and 950 show different anomalous situations, as illustrated in Fig. 3. The images shown are sample frames that contain anomalies. All the videos are unedited and were collected through text queries from multiple sources such as YouTube, Google, and LiveLeak.

Fig. 3: Types of anomalies in dataset
The training dataset contains 800 videos of normal activity and 150 videos of abnormal activity. The testing dataset contains 810 videos of normal activity and 140 videos of abnormal activity. All 290 anomalous videos in the training and testing datasets cover the 13 types of anomalies mentioned above. The anomalies occur in different frames of each anomalous video.

C. Feature Extraction
Frames are extracted from each video at 25 frames per second and resized to 240 × 320 pixels before computing features. Every video in the dataset is divided into 32 distinct segments, each of which is treated as an instance within the bag, as seen in Fig. 4. The segment count of 32 was determined empirically. 3D features extracted from these video segments are used to train a fully connected neural network.

... respectively. The Adagrad optimization algorithm is used with a learning rate of 0.001 for best performance.

IV. RESULTS
A. Evaluation Metrics
Two widely used evaluation metrics, the receiver operating characteristic (ROC) curve and the area under the curve (AUC), are used to evaluate the model. The ROC curve indicates the performance of a classification model across decision thresholds. It plots two quantities: the true positive rate (TPR) and the false positive rate (FPR). TPR is the fraction of actual positives that are correctly identified as positive. FPR is the fraction of actual negatives that are incorrectly identified as positive. TPR and FPR are computed as per (1), where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN} \tag{1}$$
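The evaluation metrics can be made concrete with a minimal sketch. It assumes binary ground-truth labels (1 for anomalous, 0 for normal) and one real-valued anomaly score per sample; the function names and the toy labels and scores are hypothetical and are not results from this work. The ROC curve is traced by sweeping the decision threshold over the observed scores, and the AUC is approximated with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def tpr_fpr(labels: Sequence[int], scores: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (TPR, FPR) when samples with score >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # TP / (TP + FN)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # FP / (FP + TN)
    return tpr, fpr


def roc_curve(labels: Sequence[int],
              scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return ROC points as (FPR, TPR), starting at (0, 0)."""
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step flags more and more samples.
    for threshold in sorted(set(scores), reverse=True):
        tpr, fpr = tpr_fpr(labels, scores, threshold)
        points.append((fpr, tpr))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print("AUC on toy data:", auc(roc_curve(labels, scores)))
```

On real data, a library routine would normally replace this hand-written sweep; the sketch only shows how the two rates and the area relate to each other.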
Fig. 5 illustrates the results of the proposed methodology in terms of the ROC curve. The model is trained for 100 epochs, and the loss curve is illustrated in Fig. 6.
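The 32-segment bag construction described under Feature Extraction can be illustrated with a minimal sketch. It assumes, purely for illustration, that each video is already represented by a list of clip-level feature vectors and that the vectors falling into one segment are averaged into a single instance. The function names (segment_bounds, build_bag) and the averaging step are hypothetical and are not taken from this work.

```python
from typing import List, Sequence, Tuple

NUM_SEGMENTS = 32  # segment count stated in the text


def segment_bounds(num_items: int,
                   num_segments: int = NUM_SEGMENTS) -> List[Tuple[int, int]]:
    """Split range(num_items) into contiguous, near-equal (start, end) spans."""
    if num_items < num_segments:
        raise ValueError("need at least one item per segment")
    return [
        (i * num_items // num_segments, (i + 1) * num_items // num_segments)
        for i in range(num_segments)
    ]


def build_bag(clip_features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Average the feature vectors inside each segment (assumed pooling step).

    Returns one vector per segment, i.e. one instance per segment in the bag.
    """
    bag = []
    for start, end in segment_bounds(len(clip_features)):
        chunk = clip_features[start:end]
        dim = len(chunk[0])
        bag.append([sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)])
    return bag


# Toy input: 64 clips with 2-dimensional features (illustrative values only).
toy_features = [[float(i), 2.0 * i] for i in range(64)]
toy_bag = build_bag(toy_features)
print(len(toy_bag), "instances in the bag")
```

Averaging is only one way to pool features within a segment; the sketch uses it because it keeps every bag at a fixed size of 32 instances regardless of video length.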
Fig. 7: Results on test videos, where frames containing an anomaly receive higher anomaly scores than frames with no anomaly or normal conditions. (a), (b) Anomaly scores for explosion and abuse videos from the UCF Crime dataset; (c), (d) footage from CCTV installed outside a house exhibiting normal behavior.
V. COMPARISON
Sultani et al. [19] proposed a multiple instance learning method that learns a deep anomaly ranking model to predict anomaly scores for anomalous videos and achieved an accuracy of 75.41%. Hasan et al. [21] proposed two semi-supervised methods based on autoencoders. The first method uses handcrafted spatio-temporal local features from video sequences to learn a fully connected autoencoder. The second method learns the local features through an end-to-end fully convolutional feed-forward autoencoder. This methodology provides an accuracy of 50.6%. Zhao et al. [22] proposed a method that extracts spatio-temporal features using a C3D network and enhances the temporal features using a unidirectional LSTM model to detect anomalies in videos, giving an accuracy of 82.36%. The comparison of the methods discussed above with ours is shown in Table II.
TABLE II: COMPARISON WITH EXISTING WORKS ON UCF DATASET. TABLE SHOWS OUR WORK OUTPERFORMS PRIOR WORK INDICATING THE EFFECTIVENESS
OF METHOD.
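For readers unfamiliar with the multiple instance ranking idea referenced in this comparison, a minimal sketch follows. It shows a hinge-style ranking objective between the highest-scoring segment of an anomalous bag and the highest-scoring segment of a normal bag, with temporal smoothness and sparsity terms on the anomalous bag, in the spirit of the deep MIL ranking model of [19]. The function name, the margin, and the regularization weights are assumptions chosen for illustration; they are not values reported in this work.

```python
from typing import Sequence


def mil_ranking_loss(
    anomalous_scores: Sequence[float],
    normal_scores: Sequence[float],
    margin: float = 1.0,           # assumed hinge margin
    smooth_weight: float = 1e-4,   # assumed weight, illustrative only
    sparsity_weight: float = 1e-4, # assumed weight, illustrative only
) -> float:
    """Ranking loss for one anomalous bag and one normal bag of segment scores."""
    # The top-scoring anomalous segment should outrank the top normal segment.
    hinge = max(0.0, margin - max(anomalous_scores) + max(normal_scores))
    # Scores of neighbouring segments are encouraged to vary smoothly.
    smooth = sum(
        (a - b) ** 2 for a, b in zip(anomalous_scores, anomalous_scores[1:])
    )
    # Anomalies are assumed to be rare, so total score mass is penalized.
    sparsity = sum(anomalous_scores)
    return hinge + smooth_weight * smooth + sparsity_weight * sparsity


# Toy segment scores in [0, 1] (illustrative values only).
separated = mil_ranking_loss([0.0, 1.0], [0.0, 0.0],
                             smooth_weight=0.0, sparsity_weight=0.0)
confused = mil_ranking_loss([0.2, 0.2], [0.5, 0.1],
                            smooth_weight=0.0, sparsity_weight=0.0)
print(separated, confused)
```

Only video-level labels are needed for this objective: the maximum over each bag selects the segment that drives the gradient, which is what allows segment-level scores to be learned from weak supervision.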
B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham: Springer International Publishing, 2016, pp. 3–18.
[4] D. Chaudhary, S. Kumar, and V. S. Dhaka, “Video based human crowd analysis using machine learning: a survey,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, pp. 1–19, Oct. 2021, doi: 10.1080/21681163.2021.1986859.
[5] Y. Gao, H. Liu, X. Sun, C. Wang, and Y. Liu, “Violence detection using Oriented VIolent Flows,” Image Vis. Comput., vol. 48–49, pp. 37–41, Apr. 2016, doi: 10.1016/j.imavis.2016.01.006.
[6] J. F. P. Kooij, M. C. Liem, J. D. Krijnders, T. C. Andringa, and D. M. Gavrila, “Multi-modal human aggression detection,” Computer Vision and Image Understanding, vol. 144, pp. 106–120, Mar. 2016, doi: 10.1016/j.cviu.2015.06.009.
[7] A. Datta, M. Shah, and N. Da Vitoria Lobo, “Person-on-person violence detection in video data,” in Object recognition supported by user interaction for service robots, 2002, pp. 433–438, doi: 10.1109/ICPR.2002.1044748.
[8] T. Hospedales, S. Gong, and T. Xiang, “A Markov Clustering Topic Model for mining behaviour in video,” in 2009 IEEE 12th International Conference on Computer Vision, Sep. 2009, pp. 1165–1172, doi: 10.1109/ICCV.2009.5459342.
[9] X. Cui, Q. Liu, M. Gao, and D. N. Metaxas, “Abnormal detection using interaction energy potentials,” in CVPR 2011, Jun. 2011, pp. 3161–3167, doi: 10.1109/CVPR.2011.5995558.
[10] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009, pp. 935–942, doi: 10.1109/CVPRW.2009.5206641.
[11] J. Feng, D. Wang, and L. Zhang, “Crowd anomaly detection via spatial constraints and meaningful perturbation,” ISPRS Int J Geoinf, vol. 11, no. 3, p. 205, Mar. 2022, doi: 10.3390/ijgi11030205.
[12] K. Chidananda and A. P. S. Kumar, “An Efficient Real Time Anomaly Detection in Surveillance Videos Using PRU-DPCN Classifier,” SN COMPUT. SCI., vol. 6, no. 1, p. 31, Dec. 2024, doi: 10.1007/s42979-024-03443-7.
[13] W. Lin, J. Gao, Q. Wang, and X. Li, “Learning to detect anomaly events in crowd scenes from synthetic data,” Neurocomputing, vol. 436, pp. 248–259, May 2021, doi: 10.1016/j.neucom.2021.01.031.
[14] A. Del Giorno, J. A. Bagnell, and M. Hebert, “A discriminative framework for anomaly detection in large videos,” in Computer vision – ECCV 2016, vol. 9909, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham: Springer International Publishing, 2016, pp. 334–349.
[15] R. T. Ionescu, S. Smeureanu, B. Alexe, and M. Popescu, “Unmasking the abnormal events in video,” arXiv, 2017, doi: 10.48550/arxiv.1705.08182.
[16] Y. Liu, C.-L. Li, and B. Póczos, “Classifier Two Sample Test for Video Anomaly Detections,” 2018.
[17] M. Z. Zaheer, A. Mahmood, H. Shin, and S.-I. Lee, “A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels,” IEEE Signal Process. Lett., vol. 27, pp. 1705–1709, 2020, doi: 10.1109/LSP.2020.3025688.
[18] J.-X. Zhong, N. Li, W. Kong, S. Liu, T. H. Li, and G. Li, “Graph Convolutional Label Noise Cleaner: Train a Plug-and-play Action Classifier for Anomaly Detection,” arXiv, 2019, doi: 10.48550/arxiv.1903.07256.
[19] W. Sultani, C. Chen, and M. Shah, “Real-World Anomaly Detection in Surveillance Videos,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 6479–6488, doi: 10.1109/CVPR.2018.00678.
[20] M. Awan and J. Shin, “Weakly supervised object detection using complementary learning and instance clustering,” IEEE Access, vol. 8, pp. 103419–103432, 2020, doi: 10.1109/ACCESS.2020.2999596.
[21] M. Hasan, J. Choi, J. Neumann, A. K. Roy-Chowdhury, and L. S. Davis, “Learning temporal regularity in video sequences,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 733–742, doi: 10.1109/CVPR.2016.86.
[22] Y. Zhao, G. Mogos, and K. L. Man, “Video Anomaly Detection by the Combination of C3D and LSTM,” Jun. 2021.
[23] F. Sohrab, J. Raitoharju, M. Gabbouj, and A. Iosifidis, “Subspace Support Vector Data Description,” arXiv, 2018, doi: 10.48550/arxiv.1802.03989.
[24] J. Wang and A. Cherian, “GODS: Generalized One-Class Discriminative Subspaces for Anomaly Detection,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2019, pp. 8200–8210, doi: 10.1109/ICCV.2019.00829.
[25] C. Sun, Y. Jia, Y. Hu, and Y. Wu, “Scene-Aware Context Reasoning for Unsupervised Abnormal Event Detection in Videos,” in Proceedings of the 28th ACM International Conference on Multimedia, New York, NY, USA, Oct. 2020, pp. 184–192, doi: 10.1145/3394171.3413887.
[26] M. Z. Zaheer, A. Mahmood, M. H. Khan, M. Segu, F. Yu, and S.-I. Lee, “Generative Cooperative Learning for Unsupervised Video Anomaly Detection,” arXiv, 2022, doi: 10.48550/arxiv.2203.03962.
[27] J. Zhang, L. Qing, and J. Miao, “Temporal Convolutional Network with Complementary Inner Bag Loss for Weakly Supervised Anomaly Detection,” in 2019 IEEE International Conference on Image Processing (ICIP), Sep. 2019, pp. 4030–4034, doi: 10.1109/ICIP.2019.8803657.