Project Report(Dc)
Certificate
We hereby certify that the work presented in this B.Tech. Project (ITPE40) report,
entitled “Detection of human abnormal activity”, in partial fulfillment of the requirements
for the Mid-Semester evaluation of the Bachelor of Technology (Information
Technology) program, is an authentic record of our own work carried out from January
2025 to March 2025, under the supervision of Dr. A.K. Patel, Assistant Professor,
Computer Engineering Department, National Institute of Technology, Kurukshetra,
India.
The matter presented in this project report has not been submitted for the award of
any other degree elsewhere.
Signature of Candidate
Harshita (12113100)
This is to certify that the above statement made by the candidates is correct to the best
of my knowledge.
Date: 27.03.2025
Signature of Supervisor Faculty Mentor
Dr. A.K. Patel
Asst. Prof
Abstract
With the increasing deployment of surveillance cameras in public and private spaces,
monitoring human activities for security and safety has become crucial. However, traditional
monitoring systems rely heavily on human supervision, leading to inefficiencies and high
false-positive rates. To address these challenges, this study proposes an intelligent, deep
learning-based system for Abnormal Human Activity Detection that integrates spatial and
temporal feature extraction for real-time video analysis.
Our approach employs a hybrid CNN-LSTM model, leveraging VGG16 for spatial feature
extraction and stacked Long Short-Term Memory (LSTM) networks for temporal
dependency modelling. The model is trained on pre-processed video sequences, where frames
are resized, normalized, and fed into a deep learning pipeline. The system is optimized using
categorical cross-entropy loss and the Adam optimizer, ensuring robust learning while
incorporating dropout layers to prevent overfitting.
Important Keywords: Abnormal human activity; CNN; Long Short-Term Memory; Adam
Optimizer; security monitoring; automated surveillance; Deep Learning.
1. Introduction
With the rapid expansion of surveillance systems in public and private spaces, ensuring
effective monitoring of human activities has become a pressing need. Security cameras are
widely installed in locations such as streets, malls, and banks to enhance safety; however,
their effectiveness is often hindered by the overwhelming amount of video data that requires
manual monitoring [1]. Due to the imbalance between the number of cameras and available
human observers, a significant portion of recorded footage remains unreviewed, leading to
missed critical events [2].
Anomaly detection in video surveillance is a key challenge in computer vision, aimed at
identifying unusual activities such as accidents, violence, and suspicious behavior [3].
Traditional surveillance approaches depend on human supervision, which is both time-
consuming and prone to errors [4]. Furthermore, existing automated methods often struggle
with high false-positive rates and poor adaptability to diverse environments [5]. Hence, the
need for an intelligent, automated system capable of accurately detecting abnormal activities
in real time has become evident [3].
2. Motivation
In today’s world, surveillance cameras are omnipresent, yet their effectiveness is often
compromised due to the sheer volume of footage that goes unreviewed. Relying solely on
human observation is not scalable, as fatigue, distraction, and human error can lead to critical
security lapses. The challenge is not just about monitoring but about intelligently
understanding activities in real-time to prevent incidents before they escalate.
This project seeks to transform traditional surveillance into an adaptive and intelligent
security system using deep learning. By integrating CNN-based feature extraction with
LSTM-based sequence analysis, the system learns to identify anomalies without constant
human oversight. Unlike conventional motion-detection systems that often trigger false
alarms, our model understands contextual behaviour—distinguishing normal activities from
genuine threats. The motivation behind this research is to bridge the gap between passive
surveillance and proactive security, ensuring a smarter, faster, and more reliable
response to unusual events.
3. Literature Review
5. Proposed Work
To address the challenge of abnormal human activity detection, this project integrates
Convolutional Neural Networks (CNNs) for spatial feature extraction and Long
Short-Term Memory (LSTM) networks for temporal modelling. The objective is to
develop an efficient and accurate deep learning-based model capable of recognizing
and classifying abnormal activities from video sequences.
The dataset consists of video files categorized into normal and abnormal activities. The
preprocessing steps involve:
- Frame Extraction: Each video is converted into a sequence of frames, resized to
  64x64 pixels for consistency.
- Sequence Standardization: Each video is represented by a fixed number of
  frames (30 frames per video) to maintain a uniform input size. If a video has fewer
  frames, the last frame is repeated until the sequence length is met.
- Normalization: Pixel values are scaled to the range [0,1] to ensure faster and more
  stable training.
- One-Hot Encoding: The class labels are transformed into categorical format for
  multi-class classification.
- Dataset Splitting: The dataset is split into training, validation, and test sets, with
  20% of the training data reserved for validation.
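The normalization, one-hot encoding, and validation split described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the helper names (normalize_frames, one_hot, split_validation), the random seed, and the toy data are hypothetical.

```python
import numpy as np


def normalize_frames(sequences):
    """Scale uint8 pixel values from [0, 255] to floats in [0, 1]."""
    return np.asarray(sequences, dtype=np.float32) / 255.0


def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows."""
    labels = np.asarray(labels)
    return np.eye(num_classes, dtype=np.float32)[labels]


def split_validation(X, y, val_fraction=0.2, seed=0):
    """Shuffle once, then hold out val_fraction of the samples for validation."""
    rng = np.random.default_rng(seed)  # seed is an assumption, for repeatability
    order = rng.permutation(len(X))
    n_val = int(round(len(X) * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]


# Toy stand-in for the dataset: 10 videos, 30 frames each, 64x64 RGB.
raw = np.random.default_rng(0).integers(
    0, 256, size=(10, 30, 64, 64, 3), dtype=np.uint8
)
labels = np.array([0, 1] * 5)  # assumed encoding: 0 = normal, 1 = abnormal

X = normalize_frames(raw)
y = one_hot(labels, num_classes=2)
X_train, y_train, X_val, y_val = split_validation(X, y, val_fraction=0.2)
```

With 10 toy videos and a 20% hold-out, 8 sequences remain for training and 2 are used for validation.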
To efficiently extract spatial features from frames, the pre-trained VGG16 model
(trained on ImageNet) is used:
- The fully connected layers of VGG16 are removed, retaining only the convolutional
  layers to obtain high-dimensional feature maps.
- To prevent overfitting and reduce computational cost, the weights of VGG16 are
  frozen during training.
- The extracted features from all frames in a sequence are then reshaped into a format
  suitable for LSTM processing.
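A possible way to realize this step with tf.keras is sketched below. It is an illustration under assumptions rather than the report's exact code: the function names and the weights parameter are hypothetical. For 64x64 inputs, the convolutional part of VGG16 produces a 2x2x512 feature map per frame, which flattens to 2048 values.

```python
import numpy as np
import tensorflow as tf


def build_feature_extractor(frame_size=64, weights="imagenet"):
    """VGG16 without its fully connected head, with all weights frozen."""
    base = tf.keras.applications.VGG16(
        include_top=False,  # drop the fully connected layers
        weights=weights,    # "imagenet" downloads the pre-trained weights
        input_shape=(frame_size, frame_size, 3),
    )
    base.trainable = False  # frozen: not updated during training
    return base


def extract_sequence_features(extractor, sequences):
    """Map (videos, frames, H, W, 3) to (videos, frames, features)."""
    n_videos, seq_len = sequences.shape[:2]
    # Merge the video and frame axes so VGG16 sees a flat batch of images.
    frames = sequences.reshape((-1,) + sequences.shape[2:])
    feature_maps = extractor.predict(frames, verbose=0)
    # Flatten each feature map and restore the (videos, frames) layout.
    return feature_maps.reshape(n_videos, seq_len, -1)
```

Calling extract_sequence_features on a batch of shape (N, 30, 64, 64, 3) yields an array of shape (N, 30, 2048), which matches the (time steps, features) layout an LSTM layer expects.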
The input to the LSTM model is a sequence of extracted features from VGG16.
The first LSTM layer processes the sequential data and passes outputs to another
LSTM layer to capture long-term dependencies.
Dropout layers (50%) are added to prevent overfitting.
A fully connected Dense layer (128 neurons, ReLU activation) is used to transform
learned features.
The final classification layer has a softmax activation function, outputting
probabilities for each class (normal or abnormal activity).
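The layer stack described above could be assembled as follows. This is a hedged sketch: the report does not state the number of LSTM units, so lstm_units=64 is an assumption, as are the function name and the feature dimension of 2048 (the flattened VGG16 output for 64x64 frames).

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_lstm_classifier(seq_len=30, feature_dim=2048, num_classes=2,
                          lstm_units=64):
    """Two stacked LSTM layers, dropout, a Dense layer, and a softmax output."""
    return models.Sequential([
        layers.Input(shape=(seq_len, feature_dim)),
        # First LSTM returns the full sequence so the second LSTM can consume it.
        layers.LSTM(lstm_units, return_sequences=True),
        layers.Dropout(0.5),
        # Second LSTM returns only its final state, summarizing the sequence.
        layers.LSTM(lstm_units),
        layers.Dropout(0.5),
        layers.Dense(128, activation="relu"),
        # One probability per class (normal or abnormal activity).
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Setting return_sequences=True on the first LSTM is what allows the layers to be stacked; without it the second LSTM would receive a single vector instead of a sequence.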
5.5 Model Training
The model is trained using the categorical cross-entropy loss function, optimized
with Adam (learning rate = 0.0001).
The model is trained for 30 epochs with a batch size of 16 to balance performance
and computational efficiency.
Performance is evaluated using accuracy, measuring the model’s ability to correctly
classify activities.
The trained model is saved as "model.h5" for future use.
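The training configuration above might look like this in tf.keras. The sketch assumes the inputs are pre-extracted feature sequences with one-hot labels; the function name train_model and its signature are hypothetical, and validation_split=0.2 is one possible way to reserve 20% of the training data for validation.

```python
import numpy as np
import tensorflow as tf


def train_model(model, X, y, epochs=30, batch_size=16, save_path="model.h5"):
    """Compile and fit a classifier with the settings listed in Section 5.5."""
    X = np.asarray(X, dtype="float32")
    y = np.asarray(y, dtype="float32")
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    history = model.fit(
        X, y,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=0.2,  # Keras holds out the last 20% of the samples
        verbose=0,
    )
    if save_path:  # pass None to skip writing the file
        model.save(save_path)
    return history
```

The returned history object records the per-epoch loss and accuracy for both the training and validation portions, which is useful for spotting overfitting.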
7. Current Status and Future Plans of Project Work
Phase 4: Model Training & Optimization (Week 5-6)
- Train model on training data
- Monitor loss and accuracy
- Fine-tune hyperparameters
Phase 5: Model Evaluation & Testing (Week 7-8)
- Evaluate model on test dataset
- Compute accuracy, precision, recall, F1-score
- Identify misclassified cases
Phase 6: Model Deployment & Future Enhancements (Week: not specified)
- Save trained model for future use
- Deploy as a web service (optional)
- Propose future improvements
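The Phase 5 metrics (accuracy, precision, recall, F1-score) and the list of misclassified cases can be computed directly from true and predicted labels. The sketch below is illustrative only: the function name, the label encoding (1 = abnormal, treated as the positive class), and the example labels are assumptions, not results from this project.

```python
import numpy as np


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1, and indices of misclassified samples."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "misclassified": np.flatnonzero(y_true != y_pred).tolist(),
    }


# Made-up labels for illustration: 1 = abnormal, 0 = normal.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

For softmax outputs, the predicted labels would first be obtained by taking the argmax over the class axis.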
The following work is planned for the remainder of the semester:
(1) In the future, we plan to expand the system by incorporating a wider range of abnormal
activities for detection.
(2) Additionally, we aim to enhance the model's accuracy by refining detection algorithms
and improving feature extraction techniques.
8. Conclusion
Despite its strong performance, certain limitations exist, such as potential overfitting,
dependency on training data, and high computational costs. Future improvements may
include data augmentation, optimizing the LSTM architecture, and experimenting with
transformer-based models for even better results.
References
[1] Patrikar, D. R., & Parate, M. R. (2021). Anomaly Detection using Edge Computing in
Video Surveillance System: Review. arXiv preprint arXiv:2107.02778.
[2] Şengönül, E., Samet, R., Abu Al-Haija, Q., Alqahtani, A., Alturki, B., & Alsulami, A. A.
(2023). An Analysis of Artificial Intelligence Techniques in Surveillance Video Anomaly
Detection: A Comprehensive Survey. Applied Sciences, 13(8), 4956.
[3] Jebur, S. A., Hussein, K. A., Hoomod, H. K., Alzubaidi, L., & Santamaría, J. (2023).
Review on Deep Learning Approaches for Anomaly Event Detection in Video
Surveillance. Electronics, 12(1), 29.
[4] Berroukham, A., Housni, K., Lahraichi, M., & Boulfrifi, I. (2023). Deep learning-based
methods for anomaly detection in video surveillance: a review. Bulletin of Electrical
Engineering and Informatics, 12(1), 3944.
[5] Saleem, G., Bajwa, U. I., Raza, R. H., Alqahtani, F. H., & Tolba, A. (2022). Efficient
anomaly recognition using surveillance videos. PLOS ONE, 17(10), e0275734.
APPENDIX
Model to detect human abnormal activity
➢Imports for analysis and visualization
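import os
import random

import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import (
    TimeDistributed, Conv2D, MaxPooling2D, Flatten, LSTM, Dense, Dropout,
)

train_path = "./dataset/train"
test_path = "./dataset/test"

# NOTE: the function header and the frame-reading loop are missing from this
# extract. The name load_video, its parameters, and the resize step are
# assumptions based on the preprocessing described in Section 5.
def load_video(video_path, sequence_length=30, frame_size=(64, 64)):
    cap = cv2.VideoCapture(video_path)
    frames = []
    # Read frames until the sequence is full or the video ends.
    while len(frames) < sequence_length:
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(cv2.resize(frame, frame_size))
    cap.release()
    # Guard against unreadable files, which would make frames[-1] fail below.
    if not frames:
        raise ValueError(f"No frames could be read from {video_path}")
    # Pad short videos by repeating the last frame.
    while len(frames) < sequence_length:
        frames.append(frames[-1])
    return np.array(frames)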
➢Distribution of Target Variable
#Bar values
recs = axes.patches
#To get the value labels from value_counts()
v_labels = DF['HYPCLASS'].value_counts()