
Finance and Risk

Analytics

Name: Sweta Kumari


PGP-DSBA Online
July’ 21
Date: 08/05/2022

Table of Contents
Problem: Company Analysis
1. Outlier Treatment
2. Missing Value Treatment
3. Transform Target Variable into 0 and 1
4. Univariate & Bivariate Analysis with Proper Interpretation (only variables significant in the model building are included)
5. Train-Test Split
6. Build Logistic Regression Model (using the statsmodels library) on the Most Important Variables on the Train Dataset and Choose the Optimum Cutoff; Showcase the Model-Building Approach
7. Validate the Model on the Test Dataset and State the Performance Metrics and Model Interpretation
List of Figures
Figure 1: Outlier Graph
Figure 2: Company Data Before Scaling
Figure 3: Company Data After Scaling
Figure 4: Visual Inspection of Missing Data
Figure 5: Correlation Graph
Figure 6: Filtered Correlation Graph
Figure 7: Univariate Analysis
Figure 8: Bivariate Analysis
Problem Statement:

Businesses or companies can fall prey to default if they are not able to keep up with their debt obligations. Defaults lead to a lower credit rating for the company, which in turn reduces its chances of getting credit in the future, and the company may have to pay higher interest on existing debts as well as on any new obligations. From an investor's point of view, a company is worth investing in only if it is capable of handling its financial obligations, can grow quickly, and is able to manage the scale of its growth.

A balance sheet is a financial statement of a company that provides a snapshot of what a company
owns, owes, and the amount invested by the shareholders. Thus, it is an important tool that helps
evaluate the performance of a business.

The available data includes information from the companies' financial statements for the previous year (2015). Information about the net worth of the company in the following year (2016) is also provided, which can be used to derive the labeled field.

An explanation of the data fields is available in the data dictionary, 'Credit Default Data Dictionary.xlsx'.

Introduction:

We need to create a default variable that should take the value of 1 when net worth next year is
negative & 0 when net worth next year is positive.

Question 1.1 Outlier Treatment

Ans: We import all the necessary libraries in the Jupyter notebook and create a DataFrame for the dataset. The original Excel file was used for importing and analysis; converting it to CSV altered data types and corrupted some values, which created issues during the analysis, so it is advisable to work with the original dataset file. The Excel file was therefore used throughout.

A DataFrame named df was created from the dataset for further analysis.
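A minimal sketch of this loading and inspection step, assuming an Excel file named 'Company_Data.xlsx' (the actual file name may differ):

import pandas as pd

# Read the original Excel file directly; avoid the CSV conversion mentioned above.
df = pd.read_excel("Company_Data.xlsx")

print(df.head())    # top 5 records
print(df.tail())    # bottom 5 records
print(df.shape)     # expected (3586, 67)
df.info()           # data types and non-null counts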

Checking top 5 records :


Checking bottom 5 records:

Checking the shape of the data frame:


The number of rows (observations) is 3586
The number of columns (variables) is 67

Information about the data frame:


The dataset has a total of 3586 rows. The data types are 63 float64, 3 int64 and 1 object; a few columns contain missing values, as shown by the non-null counts below.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3586 entries, 0 to 3585
Data columns (total 67 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Co_Code 3586 non-null int64
1 Co_Name 3586 non-null object
2 Networth_Next_Year 3586 non-null float64
3 Equity_Paid_Up 3586 non-null float64
4 Networth 3586 non-null float64
5 Capital_Employed 3586 non-null float64
6 Total_Debt 3586 non-null float64
7 Gross_Block_ 3586 non-null float64
8 Net_Working_Capital_ 3586 non-null float64
9 Current_Assets_ 3586 non-null float64
10 Current_Liabilities_and_Provisions_ 3586 non-null float64
11 Total_Assets_to_Liabilities_ 3586 non-null float64
12 Gross_Sales 3586 non-null float64
13 Net_Sales 3586 non-null float64
14 Other_Income 3586 non-null float64
15 Value_Of_Output 3586 non-null float64
16 Cost_of_Production 3586 non-null float64
17 Selling_Cost 3586 non-null float64
18 PBIDT 3586 non-null float64
19 PBDT 3586 non-null float64
20 PBIT 3586 non-null float64
21 PBT 3586 non-null float64
22 PAT 3586 non-null float64
23 Adjusted_PAT 3586 non-null float64
24 CP 3586 non-null float64
25 Revenue_earnings_in_forex 3586 non-null float64
26 Revenue_expenses_in_forex 3586 non-null float64
27 Capital_expenses_in_forex 3586 non-null float64
28 Book_Value_Unit_Curr 3586 non-null float64
29 Book_Value_Adj._Unit_Curr 3582 non-null float64
30 Market_Capitalisation 3586 non-null float64
31 CEPS_annualised_Unit_Curr 3586 non-null float64
32 Cash_Flow_From_Operating_Activities 3586 non-null float64
33 Cash_Flow_From_Investing_Activities 3586 non-null float64
34 Cash_Flow_From_Financing_Activities 3586 non-null float64
35 ROG-Net_Worth_perc 3586 non-null float64
36 ROG-Capital_Employed_perc 3586 non-null float64
37 ROG-Gross_Block_perc 3586 non-null float64
38 ROG-Gross_Sales_perc 3586 non-null float64
39 ROG-Net_Sales_perc 3586 non-null float64
40 ROG-Cost_of_Production_perc 3586 non-null float64
41 ROG-Total_Assets_perc 3586 non-null float64
42 ROG-PBIDT_perc 3586 non-null float64
43 ROG-PBDT_perc 3586 non-null float64
44 ROG-PBIT_perc 3586 non-null float64
45 ROG-PBT_perc 3586 non-null float64
46 ROG-PAT_perc 3586 non-null float64
47 ROG-CP_perc 3586 non-null float64
48 ROG-Revenue_earnings_in_forex_perc 3586 non-null float64
49 ROG-Revenue_expenses_in_forex_perc 3586 non-null float64
50 ROG-Market_Capitalisation_perc 3586 non-null float64
51 Current_Ratio[Latest] 3585 non-null float64
52 Fixed_Assets_Ratio[Latest] 3585 non-null float64
53 Inventory_Ratio[Latest] 3585 non-null float64
54 Debtors_Ratio[Latest] 3585 non-null float64
55 Total_Asset_Turnover_Ratio[Latest] 3585 non-null float64
56 Interest_Cover_Ratio[Latest] 3585 non-null float64
57 PBIDTM_perc[Latest] 3585 non-null float64
58 PBITM_perc[Latest] 3585 non-null float64
59 PBDTM_perc[Latest] 3585 non-null float64
60 CPM_perc[Latest] 3585 non-null float64
61 APATM_perc[Latest] 3585 non-null float64
62 Debtors_Velocity_Days 3586 non-null int64
63 Creditors_Velocity_Days 3586 non-null int64
64 Inventory_Velocity_Days 3483 non-null float64
65 Value_of_Output_to_Total_Assets 3586 non-null float64
66 Value_of_Output_to_Gross_Block 3586 non-null float64
dtypes: float64(63), int64(3), object(1)
memory usage: 1.8+ MB

Checking Missing Values :

Summary of the dataset:

Observation: Using the company data, we compiled a descriptive summary. We can see the mean,
standard deviation, and percentile details for all columns since most of the data is continuous.

Checking duplicate value in the dataset :

There are no duplicate records present in the dataset .
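The checks above (missing values, descriptive summary, duplicates) can be reproduced with a short sketch like the following:

print(df.isnull().sum().sort_values(ascending=False).head(10))  # columns with the most nulls
print(df.isnull().sum().sum())                                  # total missing cells in the dataset
print(df.describe().T)                                          # mean, std and percentiles per column
print(df.duplicated().sum())                                    # duplicate rows (0 in this dataset)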

Outliers:

• Below, we removed the default column and created two separate datasets, df_X and df_Y.
• Values that lie outside the upper limit (UL) and lower limit (LL) are detected and treated by bringing them back to these limits (a sketch of this capping step follows this list).
• The datasets are then concatenated, and 'Networth_Next_Year' is removed, as it would significantly influence the default variable, which is derived from that column.
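A sketch of the capping step, assuming the usual IQR-based limits (Q1 - 1.5*IQR and Q3 + 1.5*IQR); the exact limits used in the report are not shown, so treat the 1.5 multiplier as an assumption:

import numpy as np
import pandas as pd

def cap_outliers_iqr(s: pd.Series) -> pd.Series:
    # Bring values outside the lower/upper limits back to those limits.
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    ll, ul = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return s.clip(lower=ll, upper=ul)

# Apply to the numeric predictors only (df_X holds the independent variables).
num_cols = df_X.select_dtypes(include=np.number).columns
df_X[num_cols] = df_X[num_cols].apply(cap_outliers_iqr)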

Fig 1
Observation: A significant number of outliers is present for almost all variables; a noticeable percentage of the data lies above the third quartile and below the first quartile.

Dataset before scaling

Fig 2

Dataset after scaling


Fig 3

Question 1.2 Missing Value Treatment


Ans:
• There are some missing values in the dataset; these are handled in the next steps.
• The dataset, which consists of 3586 rows, did not have many missing values to begin with.
• The total number of missing records was 118 in the entire dataset.

Observation:
• Null values are present in many columns; however, a significant number was present in "Inventory_Velocity_Days".

Visual Inspect of Missing data :


Fig 4
Missing value treatment: Many columns had null values, but by far the largest number was in the "Inventory_Velocity_Days" column. The missing values in this column were imputed with the average (mean) value.
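A sketch of the mean imputation described above, shown for 'Inventory_Velocity_Days' and then generalized to all numeric columns with nulls:

# Fill the column with the largest number of nulls using its mean.
df["Inventory_Velocity_Days"] = df["Inventory_Velocity_Days"].fillna(
    df["Inventory_Velocity_Days"].mean()
)

# Apply the same treatment to any remaining numeric columns with missing values.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())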

Correlation Matrix on Scaled Data :
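A sketch of how the correlation plots can be produced, assuming the scaled predictors are held in a DataFrame called df_scaled (an assumed name):

import seaborn as sns
import matplotlib.pyplot as plt

corr = df_scaled.corr()                    # correlation matrix of the scaled data
plt.figure(figsize=(18, 14))
sns.heatmap(corr, cmap="coolwarm", center=0)
plt.title("Correlation matrix on scaled data")
plt.show()

# A seaborn clustermap (used in the multivariate analysis below) groups correlated variables.
sns.clustermap(corr, cmap="coolwarm", center=0, figsize=(18, 18))
plt.show()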


Fig 5

Fig 6: Filtered correlation graph
Question 1.3 Transform Target variable into 0 and 1
Ans: Creating a target variable 'Default' using 'Networth_Next_Year'.
• Based on the project notes, a new dependent variable named "Default" was created, with the criteria:
  1 denotes that the company's net worth next year is negative.
  0 denotes that the company's net worth next year is positive.
• The np.where function was used to achieve this (see the sketch after this list), creating a binary target variable from 'Networth_Next_Year'.
• After generating the dependent column, we checked how the data splits on this variable; this is illustrated in the chart below.
• The distinct values of the dependent variable are 0 and 1.
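A sketch of the target creation with np.where, following the criteria above:

import numpy as np

# Default = 1 when next year's net worth is negative, 0 when it is positive.
df["Default"] = np.where(df["Networth_Next_Year"] < 0, 1, 0)

# Check how the data splits on the new dependent variable.
print(df["Default"].value_counts())
print(df["Default"].value_counts(normalize=True))   # roughly 11% defaults, per the report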

Let's see what the "Default" variable looks like:

Fig 6

Question 1.4 Univariate (4 marks) & Bivariate (6 marks) analysis with proper interpretation. (You may choose to include only those variables which were significant in the model building.)
Ans :
• Univariate analysis is the simplest form of analyzing data. 'Uni' means one, so the data has only one variable, and each variable is analyzed separately. The data is gathered for the purpose of answering a question, or more specifically, a research question.
• Bivariate analysis is one of the simplest forms of quantitative (statistical) analysis. It involves the analysis of two variables (often denoted X and Y) for the purpose of determining the empirical relationship between them.
Univariate Analysis:

Fig 7
• Histograms (histplot) and boxplots have been created for the numerical variables that are important features in the dataset (a sketch follows).
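A sketch of these univariate plots; the column list below is illustrative, while the report restricts itself to the variables relevant to the model:

import seaborn as sns
import matplotlib.pyplot as plt

for col in ["Networth", "Total_Debt", "PBIDT", "Current_Ratio[Latest]"]:
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    sns.histplot(df[col], kde=True, ax=axes[0])   # distribution and skewness
    sns.boxplot(x=df[col], ax=axes[1])            # outliers
    fig.suptitle(col)
    plt.show()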

Skewness in the data before imputation

Skewness in the data after imputation


Bivariate Analysis:

Gross_Sales vs Net_Sales


Networth vs Capital_Employed

Networth vs Cost_of_Production


Fig 8
Multivariate Analysis:
1. Furthermore, we conducted a multivariate analysis on the data to see whether any correlations can be observed within the data.
2. Using the correlation function and a seaborn clustermap, we plotted the correlations and obtained a better understanding of the data.
3. Our observations were as follows: Networth and Networth_Next_Year were highly correlated, and the variables pertaining to rate of growth (ROG) were also highly correlated with one another.
4. The analysis shows that there is a collinearity issue with this dataset.
Before we proceed to the logistic regression model-building exercise, the dataset is scaled. Scaling is a data pre-processing step applied to the independent variables to normalize the data within a specific range; here it brings all of the data into the range 0 to 1 (a sketch follows).
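A sketch of the scaling step; MinMaxScaler matches the 0-to-1 range described, but the specific scaler used in the report is an assumption:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Keep only the predictors: drop identifiers, the raw target and the derived label.
X = df.drop(columns=["Co_Code", "Co_Name", "Networth_Next_Year", "Default"])
scaler = MinMaxScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns, index=X.index)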

Question 1.5 Train Test Split


Ans: The train-test split procedure is used to estimate the performance of machine learning
algorithms when they are used to make predictions on data not used to train the model.

It is a fast and easy procedure to perform, and its results allow you to compare the performance of machine learning algorithms on your predictive modeling problem. Although it is simple to use and interpret, there are situations where it should not be used, such as when the dataset is small, or where additional configuration is required, for example when it is used for classification and the dataset is not balanced.

The dataset is split into the predictor variables (the independent variables) and the response variable (the target). After splitting the data into training and testing sets in a 67:33 ratio (sketched below), we fit the model on the training set and evaluate its performance on both sets.
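A sketch of the 67:33 split; stratifying on the target (an assumption, since the report only states the ratio) keeps the roughly 11% default rate in both sets:

from sklearn.model_selection import train_test_split

y = df["Default"]
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.33, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)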

After splitting the dataset, let's look at the confusion matrix and classification report.

Confusion matrix and classification report of train data:


Confusion matrix and classification report of test data:

Question 1.6 Build Logistic Regression Model (using the statsmodels library) on the most important variables on the Train Dataset and choose the optimum cutoff. Also showcase your model-building approach.

Ans : Logistic regression is a process of modeling the probability of a discrete outcome given an
input variable. The most common logistic regression models a binary outcome; something that can
take two values such as true/false, yes/no, and so on.
For model building, we use recursive feature elimination (RFE) and select the top 15 features that contribute most to the model.

For Modeling we will use Logistic Regression with recursive feature elimination.

Below are the highest contributing independent variables to the model building.
Now let's fit the model and examine the confusion matrix and classification report (a sketch of the approach follows):
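A sketch of the model-building approach outlined above: RFE selects the top 15 features and a statsmodels Logit is fitted on them. The estimator wiring and the 0.5 cutoff shown here are placeholders for the report's actual choices:

import statsmodels.api as sm
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

# Recursive feature elimination to keep the 15 most important predictors.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=15)
rfe.fit(X_train, y_train)
selected = X_train.columns[rfe.support_]

# Logistic regression on the selected features using statsmodels.
logit_model = sm.Logit(y_train, sm.add_constant(X_train[selected])).fit()
print(logit_model.summary())

# Apply a probability cutoff and report train-set performance.
cutoff = 0.5
train_prob = logit_model.predict(sm.add_constant(X_train[selected]))
train_pred = (train_prob > cutoff).astype(int)
print(confusion_matrix(y_train, train_pred))
print(classification_report(y_train, train_pred))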
Question 1.7 Validate the Model on the Test Dataset and state the performance metrics. Also state the interpretation from the model.

Ans: We train the model and then validate it on both the training and testing sets.

• For both sets, we plot the confusion matrix and the classification report. The training data shows high precision and accuracy, but recall appears to be lower.
• As recall improves, we capture more True Positives (TP), i.e., correctly identified defaulters; if we miss a defaulter, the bank bears the cost of the bad debt and its cash flow is not regularized.

Confusion matrix and Classification Report for the training set:


Confusion Matrix :
[[2134 17]
[ 81 170]]

Classification Report :
Confusion matrix and Classification Report for the test set:

Confusion matrix :

Classification Report

Observations:
• Over 94% accuracy was achieved, while recall, precision and F1 score were also very high at 99%, 94% and 97% respectively.
• We observed that accuracy and precision (the ratio of True Positives to all predicted Positives) are on the higher side for both sets, but recall for the default (minority) class seems to be the main problem. It is likely that we had an imbalanced dataset for our model.
• Therefore, we balance the classes (the ratio of 0's to 1's is heavily skewed); only about 11% of the records in our dataset were defaults, so we apply SMOTE before fitting the model again.

We see a poor recall score for the default class on both train and test. Since only 10.8% of the total data had defaults, we will now try to balance the data before fitting the model.

I have imported SMOTE from the imblearn library for better handling of the class imbalance.
pred_train_smote = selector_smote.predict(X_res)
pred_test_smote = selector_smote.predict(X_test)
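A fuller sketch around the two prediction lines above, assuming 'selector_smote' is an RFE-wrapped logistic regression refitted on the SMOTE-resampled training data (X_res, y_res):

from imblearn.over_sampling import SMOTE
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Oversample the minority (default) class on the training data only.
sm_over = SMOTE(random_state=42)
X_res, y_res = sm_over.fit_resample(X_train, y_train)

# Refit the feature selector / logistic regression on the balanced data.
selector_smote = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=15)
selector_smote.fit(X_res, y_res)

pred_train_smote = selector_smote.predict(X_res)
pred_test_smote = selector_smote.predict(X_test)
print(classification_report(y_res, pred_train_smote))
print(classification_report(y_test, pred_test_smote))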

• Recall has improved greatly, which means the chances of identifying defaulters have increased.
• Detection of defaulters has improved significantly, and there is less chance of the model missing any potential default candidates/companies for our bank.

Classification Report for the Training Set:

Classification Report for the Test Set:

Finally, we are able to achieve a decent recall value without overfitting. Considering issues such as outliers, missing values and correlated features, this is a fairly good model. It could be improved with better-quality data in which the features explaining default are not missing to this extent. Of course, we can also try other techniques that are less sensitive to missing values and outliers.
