Project Report - Credit Card Fraud Detection
Problem Statement
To design and execute a data mining project encompassing data cleaning, preprocessing, classification, and evaluation of performance metrics.
Introduction
Given the prevalence of fraud, there is a pressing need for robust fraud detection systems. Broadly, fraud detection falls into two categories: misuse detection and anomaly detection. Misuse detection employs machine-learning-based classification models to differentiate between fraudulent and legitimate transactions. Anomaly detection, by contrast, establishes a baseline from sequential records, defining the attributes of a typical transaction and building a distinctive profile for it. This report presents a strategy for misuse detection utilizing a combination of K-Nearest Neighbors (KNN), Naive Bayes, and Decision Tree models.
Dataset Details
This dataset contains credit card transactions made by European cardholders in the year 2023. It comprises over
550,000 records, and the data has been anonymized to protect the cardholders' identities. The primary objective
of this dataset is to facilitate the development of fraud detection algorithms and models to identify potentially
fraudulent transactions.
1. Key Features:
a. id: Unique identifier for each transaction
b. V1-V28: Anonymized features representing various transaction attributes (e.g., time, location)
c. Amount: The transaction amount
d. Class: Binary label indicating whether the transaction is fraudulent (1) or not (0)
2. Target variable:
The target variable chosen is ‘Class’, which indicates whether a transaction is fraudulent.
3. Head: (items 3-5 can be reproduced with the loading and inspection sketch after this list)
4. Describe:
5. Shape:
(110177, 31)
6. Correlation Heatmap:
import matplotlib.pyplot as plt
import seaborn as sns
# "seaborn" as a bare style name is deprecated in recent matplotlib; "seaborn-v0_8" is the current equivalent
plt.style.use("seaborn-v0_8")
plt.rcParams['figure.figsize'] = (22, 11)
title = "Correlation Heatmap"
plt.title(title, fontsize=18, weight='bold')
# Pairwise correlations between all numeric columns of the dataset
sns.heatmap(creditcard_dataset.corr(), cmap="coolwarm", annot=True)
plt.show()
● The most significant strong positive correlations are between V16, V17, and V18, and between V9 and V10.
● The most significant strong negative correlations are between V4 and V14, V4 and V12, V4 and V10, V10 and V11, V11 and V14, V11 and V12, and V21 and V22.
● There is a clear lack of high positive correlations in the range of V19 to V28.
● There are several moderate to semi-strong positive and negative correlations in the range of V1 to V18.
Code Overview:
1) Data Preprocessing
● Handling missing values by filling with the mean.
● Dropping duplicate rows.
● Standardizing features using different scalers: StandardScaler, RobustScaler, and MinMaxScaler.
● Detecting outliers using box plots, k-means clustering, and RobustScaler.
● Resampling the minority class for balancing (a minimal sketch of these steps follows below).
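A minimal sketch of these preprocessing steps, assuming the creditcard_dataset DataFrame from the loading sketch above; the use of sklearn.utils.resample for upsampling and the random_state value are assumptions, not necessarily what the project used.
import pandas as pd
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler
from sklearn.utils import resample

# Fill missing values with the column mean and drop duplicate rows
creditcard_dataset = creditcard_dataset.fillna(creditcard_dataset.mean(numeric_only=True))
creditcard_dataset = creditcard_dataset.drop_duplicates()

# Scale the feature columns with RobustScaler to dampen the effect of outliers
# (StandardScaler or MinMaxScaler can be swapped in here for comparison)
feature_cols = [c for c in creditcard_dataset.columns if c not in ("id", "Class")]
creditcard_dataset[feature_cols] = RobustScaler().fit_transform(creditcard_dataset[feature_cols])

# Upsample the minority (fraud) class so both classes are equally represented
majority = creditcard_dataset[creditcard_dataset["Class"] == 0]
minority = creditcard_dataset[creditcard_dataset["Class"] == 1]
minority_upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced_dataset = pd.concat([majority, minority_upsampled])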
2) Modeling
● Splitting the balanced data into training and testing sets.
● Implementing three classifiers: Naive Bayes, KNN, and Decision Tree.
● Evaluating each model's performance using classification reports.
● Plotting confusion matrices and ROC curves for model evaluation (a minimal sketch follows below).
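A minimal sketch of the modeling and evaluation loop, assuming the balanced_dataset from the preprocessing sketch; the 80/20 split matches the Methods section, while hyperparameters such as n_neighbors=5 are assumptions.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, ConfusionMatrixDisplay, RocCurveDisplay
import matplotlib.pyplot as plt

X = balanced_dataset.drop(columns=["id", "Class"])
y = balanced_dataset["Class"]

# 80/20 split, stratified on the target so both classes appear in each split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
    # Confusion matrix and ROC curve for visual comparison of the classifiers
    ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
    RocCurveDisplay.from_estimator(model, X_test, y_test)
    plt.show()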
Glossary:
1. Standard Scaler: Scales features by removing the mean and scaling to unit variance.
2. Robust Scaler: Scales features using the median and interquartile range to mitigate the effect of outliers.
3. MinMaxScaler: Scales features to a specified range (default 0 to 1); a small comparison of the three scalers follows after this glossary.
4. Confusion Matrix: A table showing true and predicted values to evaluate classifier performance.
5. ROC Curve: Graphical representation of a classifier's true positive rate against false positive rate.
6. Cross Validation: Method to assess model performance by iteratively splitting data into training and
validation sets.
7. Naive Bayes Classifier: Probabilistic algorithm based on Bayes' theorem, assuming feature
independence.
8. K-Nearest Neighbors (KNN) Classifier: Predicts by majority vote of its k-nearest neighbors in
feature space.
9. Decision Tree Classifier: Builds a tree structure to classify instances based on feature conditions.
10. Box plot: Shows the median, quartiles, and potential outliers of numerical data.
11. Scatter plot: Displays values for two variables as points on a 2D plane.
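To make the difference between the three scalers in items 1-3 concrete, here is a minimal sketch on a toy column containing one outlier; the values are purely illustrative.
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler

# A single feature with one large outlier (illustrative values only)
amounts = np.array([[10.0], [12.0], [11.0], [13.0], [500.0]])

print(StandardScaler().fit_transform(amounts).ravel())   # zero mean, unit variance
print(RobustScaler().fit_transform(amounts).ravel())     # centered on the median, scaled by the IQR
print(MinMaxScaler().fit_transform(amounts).ravel())     # squeezed into [0, 1]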
Methods:
The dataset was taken from Kaggle, a prominent online community in the fields of machine learning and data science. The code was executed on Google Colab, a cloud-based platform. Prior to feeding the data into the algorithms, all datasets were partitioned into training data (80%) and test data (20%). To enhance the robustness of the methodology and maintain consistent model performance, Stratified K-Fold cross-validation with five folds was employed. The data was further cleaned by dropping duplicate rows and filling missing values, and the skewed class distribution of the fraud data was addressed by resampling the minority class. Features were standardized, and box plots and scatter plots were used to identify outliers; outliers detected through these methods were handled with RobustScaler, which proved effective against these data anomalies. The ‘Class’ variable was isolated as the target for predictive analysis. To ensure consistency and comparability across features, Min-Max Scaling was also applied.
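A minimal sketch of the Stratified K-Fold cross-validation described above, reusing the X, y, and models objects from the earlier sketches; the F1 scoring metric and shuffling are assumptions.
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Five stratified folds preserve the class ratio in every train/validation split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(name, scores.mean())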
Cross Validation
Classifiers Output
Findings:
RobustScaler Advantage:
● In the absence of RobustScaler, the models show perfect scores for the majority class while failing to identify any instances of the minority class, which signals a lack of generalization and model bias. RobustScaler helps in handling outliers and can improve the models' ability to detect the minority class (a sketch of this with/without comparison follows below).
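A minimal sketch of the with/without RobustScaler comparison behind this finding, assuming train/test splits built from the unscaled features; KNN is used here only as an example classifier.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Without scaling: distance-based models are dominated by large raw feature values
raw_model = KNeighborsClassifier().fit(X_train, y_train)
print(classification_report(y_test, raw_model.predict(X_test)))

# With RobustScaler: outliers are dampened before distances are computed
scaled_model = make_pipeline(RobustScaler(), KNeighborsClassifier()).fit(X_train, y_train)
print(classification_report(y_test, scaled_model.predict(X_test)))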
Conclusion:
1. Naive Bayes
● Performs reasonably well but has slightly lower recall for fraudulent transactions.
2. KNN:
● Shows better performance in identifying both classes with higher precision and recall.
3. Decision Tree:
● Achieves perfect scores, indicating a likely overfitting issue.
These models offer different trade-offs between precision and recall for identifying fraudulent transactions. While KNN appears well balanced, the Decision Tree's perfect scores likely indicate overfitting, meaning it may not generalize well to new data. Further validation on additional held-out data could improve the models' performance and help them generalize better to new datasets.
References:
1. Kaggle
2. Stack Overflow
3. GeeksforGeeks
4. Javatpoint
5. Naive Bayes
6. Decision tree