IEEE CIS Fraud Detection: Kaveri Biswas (DT2019003), Keerthana P Girijan (DT2019004), Shefali Bedarkar (DT2019008)

The document summarizes a fraud detection project using credit card transaction data. It outlines the features in the transaction and identity data, including transaction amount, product code, payment card details, and engineered features. It also describes exploratory data analysis of the data, finding it is sparse with 3.52% fraudulent transactions. Missing values were observed in some features. New time-based features like hour, day, and month were created to analyze patterns in fraudulent transactions over time. Fraud was found to be higher between 4am-12pm and lowest from 2pm-4pm, with the highest from 7am-10am.

Uploaded by Kavya
Copyright © All Rights Reserved

IEEE CIS Fraud Detection

Team: Seekers
Kaveri Biswas (DT2019003), Keerthana P Girijan (DT2019004),
Shefali Bedarkar (DT2019008)
Agenda

• Project Description
• Features
• Exploratory Data Analysis
• Missingness and Imputation
• PCA and Feature Engineering
• Balancing Techniques
• Modeling and Score
• Future Work
Project Description

• The credit card transaction dataset is provided by Vesta Corporation, described as the world’s leading payment service company
• The dataset is divided into two files, transaction and identity, for both train and test
• Train dataset: 354324 x 434; Test dataset: 236216 x 433 (rows x columns)
• ‘isFraud’ is the binary target variable
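The transaction and identity files must be joined before modelling. In the public Kaggle release, the shared key is a `TransactionID` column. The slides do not name it, so treat that as an assumption. Only a subset of transactions has an identity record, so a left join that keeps every transaction is the natural choice. Below is a minimal, dependency-free sketch using a hypothetical `left_join` helper and made-up toy rows:

```python
from typing import Any

Row = dict[str, Any]


def left_join(transactions: list[Row], identity: list[Row],
              key: str = "TransactionID") -> list[Row]:
    """Attach identity columns to every transaction; unmatched rows get None."""
    # Index identity records by the join key (assumed to be TransactionID).
    by_key = {row[key]: row for row in identity}
    # All identity columns except the key, so unmatched transactions get explicit None values.
    id_cols = sorted({col for row in identity for col in row if col != key})
    merged = []
    for tx in transactions:
        ident = by_key.get(tx[key], {})
        combined = dict(tx)  # copy so the input rows are not mutated
        for col in id_cols:
            combined[col] = ident.get(col)
        merged.append(combined)
    return merged


# Toy rows made up for illustration; they are not taken from the dataset.
train_transaction = [
    {"TransactionID": 1, "ProductCD": "W", "isFraud": 0},
    {"TransactionID": 2, "ProductCD": "C", "isFraud": 1},
]
train_identity = [{"TransactionID": 2, "DeviceType": "mobile"}]

train = left_join(train_transaction, train_identity)
print(train)
```

With pandas, `train_transaction.merge(train_identity, on="TransactionID", how="left")` performs the same join on DataFrames. The test files are merged the same way. Only the train set carries the `isFraud` column.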
Features

• Transaction features:
• TransactionDT: timedelta from a given reference datetime (not an actual timestamp)
• TransactionAMT: transaction amount paid in USD
• ProductCD: product code, the product for each transaction
• card1 – card6: payment card information, such as card type, card category, issuing bank, country, etc.
• addr: address
• dist: distance
• P_ and R_ emaildomain: purchaser and recipient email domain
• C1-C14: counting, such as how many addresses are found to be associated with the payment card, etc. The actual meaning is masked.
• D1-D15: timedelta, such as days between previous transactions, etc.
• M1-M9: match, such as names on card and address, etc.
• Vxxx: Vesta engineered rich features, including ranking, counting, and other entity relations.
• Identity Features:
• Categorical Features: DeviceType, DeviceInfo, id_12 – id_38
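Most of these columns come in numbered families (card1-card6, C1-C14, D1-D15, M1-M9, Vxxx, id_12-id_38). Grouping them by prefix is therefore a convenient first step before checking missingness per family or choosing encodings. The sketch below uses a hypothetical helper, `group_columns`, that matches names with a regular expression. The sample names assume the numbered spellings of the public release (e.g. `addr1`, `dist1`) and are only an illustrative subset:

```python
import re
from collections import defaultdict

# A numbered family is a known prefix followed only by digits, e.g. card4, C14, V307, id_12.
FAMILY_PATTERN = re.compile(r"^(card|addr|dist|id_|C|D|M|V)\d+$")


def group_columns(columns: list[str]) -> dict[str, list[str]]:
    """Group column names by feature family; anything unnumbered goes to 'other'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for col in columns:
        match = FAMILY_PATTERN.match(col)
        family = match.group(1) if match else "other"
        groups[family].append(col)
    return dict(groups)


# Illustrative subset of column names (assumed spellings).
sample = ["TransactionDT", "ProductCD", "card1", "card4", "addr1", "dist1",
          "P_emaildomain", "C1", "C14", "D1", "M4", "V1", "V339",
          "id_12", "id_38", "DeviceType"]
families = group_columns(sample)
for family, cols in families.items():
    print(f"{family}: {cols}")
```

Requiring digits right after the prefix keeps names such as `DeviceType` and `TransactionDT` out of the `D` family.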
Exploratory Data Analysis (EDA)

• While conducting EDA, we found that the data was highly imbalanced and sparse
• Only 3.52% of all transactions were labelled fraudulent (‘isFraud’ = 1)
• V and id features in train data have more than 70% missing values
• Another observation concerned ‘TransactionDT’: the train and test data cover the same time period, but the train set contains more transactions than the test set
• We created new columns (hour, day, week, and month) from TransactionDT to take a closer look at how the target varies over time
• From 4am to 12pm, the fraction of fraudulent transactions appears significantly higher than in other hours. It peaks between 7am and 10am and is lowest between 2pm and 4pm. We can therefore create another new feature that classifies time periods into different warning levels according to their fraud fraction.
