Predictive Modeling
1. Introduction
2. Data Overview
2.1 Dataset Description
2.2 Data Exploration
3. Data Preprocessing
3.1 Data Cleaning
3.2 Feature Engineering
3.3 Train-Test Split
4. Model Development
4.1 Logistic Regression
4.1.1 Model Training
4.1.2 Model Evaluation
4.2 Linear Discriminant Analysis (LDA)
4.2.1 Model Training
4.2.2 Model Evaluation
4.3 Decision Tree
4.3.1 Model Training
4.3.2 Model Evaluation
5. Model Performance Comparison
5.1 Training Set Performance
5.2 Testing Set Performance
6. Feature Importance Analysis
7. Business Recommendations
8. Conclusion
9. References
Problem Definition: The primary objective is to analyze the data and build a machine learning
model that identifies which leads are most likely to convert to paid customers for
ExtraaLearn. This involves:
Analyzing the dataset to understand the features and their relevance to lead conversion.
Building a predictive model to identify leads with a higher probability of conversion.
Determining the factors that drive lead conversion.
Creating a profile of leads likely to convert, based on the insights gained from the model.
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np

# Suppress library warnings (e.g. deprecation notices) to keep the output readable
import warnings
warnings.filterwarnings("ignore")
Shape of the dataset: (4612, 15)
            status
count   4612.00000
mean       0.29857
std        0.45768
min        0.00000
25%        0.00000
50%        0.00000
75%        1.00000
max        1.00000
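Because status is coded 0/1, the mean in the summary above is simply the share of leads that converted, and the standard deviation follows from the Bernoulli variance. A minimal stdlib sketch, assuming the counts of 1377 converted and 3235 not converted taken from the "All" row of the crosstabs below, reproduces both figures without loading the data:

```python
import math
import statistics

# Counts from the "All" row of the status crosstabs (assumed to cover all 4612 leads)
CONVERTED, NOT_CONVERTED = 1377, 3235

status = [1] * CONVERTED + [0] * NOT_CONVERTED
n = len(status)

mean = statistics.mean(status)  # fraction of leads with status == 1
std = statistics.stdev(status)  # sample std with an n - 1 denominator, like pandas .std()

# Closed form for a 0/1 variable: sqrt(p * (1 - p) * n / (n - 1))
p = CONVERTED / n
closed_form_std = math.sqrt(p * (1 - p) * n / (n - 1))

print(f"conversion rate = {mean:.5f}, std = {std:.5f}, closed form = {closed_form_std:.5f}")
```

In other words, roughly 30% of leads in the dataset converted, which is the baseline every model in later sections has to beat.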
Univariate analysis
status                   0     1   All
current_occupation
All                   3235  1377  4612
Professional          1687   929  2616
Unemployed            1058   383  1441
Student                490    65   555
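The raw counts are easier to compare as conversion rates (converted / row total). A short sketch using the current_occupation counts from the crosstab above; the dictionary layout is just one convenient choice, not something the notebook defines:

```python
# category: (not converted, converted), copied from the crosstab above
occupation_counts = {
    "Professional": (1687, 929),
    "Unemployed": (1058, 383),
    "Student": (490, 65),
}


def conversion_rates(counts):
    """Return converted / total for each category."""
    return {cat: conv / (not_conv + conv) for cat, (not_conv, conv) in counts.items()}


rates = conversion_rates(occupation_counts)
for cat, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat:<13} {rate:.1%}")
```

Professionals convert at about 35.5%, unemployed leads at about 26.6%, and students at about 11.7%.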
status                   0     1   All
first_interaction
All                   3235  1377  4612
Website               1383  1159  2542
Mobile App            1852   218  2070
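One way to summarize a categorical feature is lift: a group's conversion rate divided by the overall rate of 1377/4612. Values above 1 mean the group converts more often than average. A minimal sketch using the first_interaction counts above (the function name is illustrative):

```python
OVERALL_RATE = 1377 / 4612  # overall conversion rate from the "All" row


def lift(converted: int, total: int, overall_rate: float = OVERALL_RATE) -> float:
    """Group conversion rate relative to the overall conversion rate."""
    return (converted / total) / overall_rate


website_lift = lift(1159, 2542)
mobile_app_lift = lift(218, 2070)
print(f"Website lift: {website_lift:.2f}, Mobile App lift: {mobile_app_lift:.2f}")
```

Leads whose first interaction was the website convert at roughly 1.53 times the average rate, while mobile-app leads convert at roughly 0.35 times the average rate.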
status                   0     1   All
profile_completed
All                   3235  1377  4612
High                  1318   946  2264
Medium                1818   423  2241
Low                     99     8   107
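Since profile_completed has a natural order and the conversion rate rises with it (8/107 for Low, 423/2241 for Medium, 946/2264 for High), an ordinal code is one reasonable option for feature engineering. The Low=0, Medium=1, High=2 mapping below is an assumption, not necessarily the encoding used later in this notebook:

```python
PROFILE_LEVELS = ("Low", "Medium", "High")  # assumed order, lowest to highest completion
PROFILE_CODES = {level: code for code, level in enumerate(PROFILE_LEVELS)}

# (not converted, converted) per level, from the crosstab above
profile_counts = {"Low": (99, 8), "Medium": (1818, 423), "High": (1318, 946)}


def encode_profile(values):
    """Map profile_completed labels to ordinal codes; raises KeyError on unknown labels."""
    return [PROFILE_CODES[v] for v in values]


# Conversion rate per level, in ascending level order
rates = [
    converted / (not_converted + converted)
    for not_converted, converted in (profile_counts[level] for level in PROFILE_LEVELS)
]
increases_with_level = all(lo < hi for lo, hi in zip(rates, rates[1:]))
print(encode_profile(["High", "Low", "Medium"]), increases_with_level)
```

The check confirms the rates increase with each level (about 7.5%, 18.9%, and 41.8%), which is what makes a single ordered code a sensible encoding here.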
status                   0     1   All
last_activity
All                   3235  1377  4612
Email Activity        1587   691  2278
Website Activity       677   423  1100
Phone Activity         971   263  1234
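To check whether the differences across last_activity categories are larger than chance alone would produce, a Pearson chi-square test of independence can be computed directly from the counts above. This stdlib sketch computes the statistic by hand. With 2 degrees of freedom, the p-value has the closed form exp(-x/2), so SciPy is not needed:

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if last_activity and status were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Rows: Email, Website, Phone activity; columns: (not converted, converted)
last_activity = [
    [1587, 691],
    [677, 423],
    [971, 263],
]
statistic, dof = chi_square_independence(last_activity)
# For dof == 2, the chi-square survival function is exactly exp(-x / 2)
p_value = math.exp(-statistic / 2) if dof == 2 else float("nan")
print(f"chi2 = {statistic:.1f}, dof = {dof}, p = {p_value:.2e}")
```

With these counts, the statistic is about 82 on 2 degrees of freedom, far above the 5% critical value of 5.99. Website activity converts at about 38% (423/1100), compared with about 21% for phone activity (263/1234).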
status                   0     1   All
print_media_type1
All                   3235  1377  4612
No                    2897  1218  4115
Yes                    338   159   497
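The print_media_type1 conversion rates (159/497 for Yes versus 1218/4115 for No) differ by only about two percentage points. A pooled two-proportion z-test is one way to judge whether a gap like this is meaningful. Here is a minimal sketch using the normal approximation:

```python
import math


def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value) under the normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# print_media_type1: Yes = 159 converted of 497, No = 1218 converted of 4115
z, p_value = two_proportion_ztest(159, 497, 1218, 4115)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With these counts, z is about 1.1 and the two-sided p-value is about 0.27, so this test does not find the gap significant at the 5% level. The same function applies unchanged to the print_media_type2, digital_media, and educational_channels counts.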
status                   0     1   All
print_media_type2
All                   3235  1377  4612
No                    3077  1302  4379
Yes                    158    75   233
status                   0     1   All
digital_media
All                   3235  1377  4612
No                    2876  1209  4085
Yes                    359   168   527
status                   0     1   All
educational_channels
All                   3235  1377  4612
No                    2727  1180  3907
Yes                    508   197   705
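For educational_channels, the Yes group converts slightly less often (197/705) than the No group (1180/3907). Wilson score intervals show how much each rate could plausibly move from sampling noise alone. Here is a sketch at an assumed 95% level (z = 1.96):

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - margin, centre + margin


yes_lo, yes_hi = wilson_interval(197, 705)
no_lo, no_hi = wilson_interval(1180, 3907)
overlap = yes_lo <= no_hi and no_lo <= yes_hi
print(f"Yes: {yes_lo:.3f}-{yes_hi:.3f}, No: {no_lo:.3f}-{no_hi:.3f}, overlap: {overlap}")
```

The intervals are roughly 0.25 to 0.31 for Yes and 0.29 to 0.32 for No, so they overlap. Overlapping intervals are a rough check rather than a formal test, but they suggest that this channel on its own separates converters poorly.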
status                   0     1   All
referral
All                   3235  1377  4612
No                    3205  1314  4519
Yes                     30    63    93
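referral is the most lopsided flag. 63 of the 93 referred leads converted (about 68%), compared with 1314 of 4519 (about 29%) for the rest. Because so few leads were referred, an odds ratio with a confidence interval (Woolf's log method, assumed 95% level) shows whether the effect holds up despite the small sample:

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log) confidence interval for a 2x2 table.

    a, b: converted / not converted with the flag; c, d: converted / not converted without it.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


odds_ratio, ci_low, ci_high = odds_ratio_ci(63, 30, 1314, 3205)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The odds ratio comes out near 5.1, with an interval of roughly 3.3 to 7.9. Even the lower bound is well above 1.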
# Data Preparation for modeling
0    0.70415
1    0.29585
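The proportions printed above (0.70415 / 0.29585) are close to, but not identical with, the overall rate of 0.29857 from the summary statistics. If exact class balance across splits matters, stratifying by status is one option. The sketch below is a stdlib stand-in for a stratified train/test split. The 30% test size and the seed are assumptions, not values taken from this notebook:

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.3, seed=1):
    """Split row indices so each class keeps (almost) the same share in train and test."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for i, label in enumerate(labels):
        indices_by_class[label].append(i)
    train, test = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # per-class test count
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def class_proportions(labels, indices):
    """Share of each label among the given rows, like value_counts(normalize=True)."""
    counts = Counter(labels[i] for i in indices)
    return {label: counts[label] / len(indices) for label in sorted(counts)}


# Illustrative labels matching the 1377 / 3235 split of status (row order is arbitrary)
status = [1] * 1377 + [0] * 3235
train_idx, test_idx = stratified_split(status)
train_prop = class_proportions(status, train_idx)
test_prop = class_proportions(status, test_idx)
print(train_prop, test_prop)
```

Splitting each class separately keeps the converted share in both partitions within a rounding error of the overall 0.29857.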