Machine Learning 2 Working-pages-Deleted

The document discusses the challenges faced by US businesses in attracting qualified foreign talent and outlines the role of the Office of Foreign Labor Certification (OFLC) in processing visa applications. It emphasizes the need for a machine learning solution to streamline the visa approval process and presents a detailed analysis of the data, model building, and evaluation of various classification models. The final recommendation is to use the Gradient Boosting model with oversampled data due to its high accuracy and balanced performance metrics.

Uploaded by

murali.dhiviya96

MACHINE LEARNING 2

MODEL BUILDING AND TUNING

OFLC EASYVISA

BUSINESS REPORT

PGPDSBA.O. AUG24.A

DHIVIYA MURALIDHARAN
Problem Statement
Business communities in the United States are facing high demand for human resources, but one of
the constant challenges is identifying and attracting the right talent, which is perhaps the most
important element in remaining competitive. Companies in the United States look for hard-working,
talented, and qualified individuals both locally as well as abroad.

The Immigration and Nationality Act (INA) of the US permits foreign workers to come to the United
States to work on either a temporary or permanent basis. The act also protects US workers against
adverse impacts on their wages or working conditions by ensuring US employers' compliance with
statutory requirements when they hire foreign workers to fill workforce shortages. The immigration
programs are administered by the Office of Foreign Labor Certification (OFLC).

OFLC processes job certification applications for employers seeking to bring foreign workers into the
United States and grants certifications in those cases where employers can demonstrate that there
are not sufficient US workers available to perform the work at wages that meet or exceed the wage
paid for the occupation in the area of intended employment.

Objective
In FY 2016, the OFLC processed 775,979 employer applications for 1,699,957 positions for temporary
and permanent labor certifications. This was a nine percent increase in the overall number of
processed applications from the previous year. The process of reviewing every case is becoming a
tedious task as the number of applicants is increasing every year.

The increasing number of applicants every year calls for a Machine Learning based solution that can
help in shortlisting the candidates having higher chances of VISA approval. OFLC has hired the firm
EasyVisa for data-driven solutions. You as a data scientist at EasyVisa have to analyze the data
provided and, with the help of a classification model:

1. Facilitate the process of visa approvals.

2. Recommend a suitable profile for the applicants for whom the visa should be certified or
denied based on the drivers that significantly influence the case status.

Data Description
The data contains the different attributes of the employee and the employer. The detailed data
dictionary is given below.

• case_id: ID of each visa application

• continent: Continent of the employee

• education_of_employee: Information of education of the employee

• has_job_experience: Does the employee have any job experience? Y = Yes; N = No

• requires_job_training: Does the employee require any job training? Y = Yes; N = No

• no_of_employees: Number of employees in the employer's company

• yr_of_estab: Year in which the employer's company was established

• region_of_employment: Information of foreign worker's intended region of employment in
the US.

• prevailing_wage: Average wage paid to similarly employed workers in a specific occupation
in the area of intended employment. The purpose of the prevailing wage is to ensure that
the foreign worker is not underpaid compared to other workers offering the same or similar
service in the same area of employment.

• unit_of_wage: Unit of prevailing wage. Values include Hourly, Weekly, Monthly, and Yearly.

• full_time_position: Is the position of work full-time? Y = Full-Time Position; N = Part-Time Position

• case_status: Flag indicating if the Visa was certified or denied

Data Overview
Data Type

Statistical summary
Observations and Insights
• The data has 25840 rows and 12 columns, describing the employees and their employers.

• It has no duplicate or missing values.

• Categorical variables describe the employee's education, job experience, need for job
training, continent, etc.

• Numerical variables describe the number of employees in the company, the year the company
was established, the prevailing wage, etc.

• The case_id column was removed, as it contains only unique values.

• no_of_employees contained negative values, which were replaced with their absolute values,
since a company cannot have a negative number of employees.

• The majority of employees are from Asia, are applying to the Northeast region, hold a
Bachelor's degree, have prior work experience, do not need job training, and are paid annually.

• Employers' years of establishment range from 1800 to 2016, and company size varies widely,
from 11 to 602069 employees.

• A few outliers were detected in all the numerical variables; they were not treated because
they are genuine values.

• Year of establishment and number of employees were binned to give a clearer picture.
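
The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration on a made-up four-row sample, not the report's actual code: the sample values, the bin edges, the bin labels, and the derived column names `company_size` and `estab_period` are all assumptions.

```python
import pandas as pd

# Hypothetical sample; only the column names come from the data dictionary.
df = pd.DataFrame({
    "case_id": ["A1", "A2", "A3", "A4"],
    "no_of_employees": [-25, 1200, 48000, 300],
    "yr_of_estab": [1897, 1985, 2004, 2012],
})

def clean(frame: pd.DataFrame) -> pd.DataFrame:
    # case_id is a unique identifier and carries no predictive signal.
    out = frame.drop(columns=["case_id"])
    # Negative head counts are treated as sign errors.
    out["no_of_employees"] = out["no_of_employees"].abs()
    # Bin edges and labels are illustrative assumptions.
    out["company_size"] = pd.cut(
        out["no_of_employees"],
        bins=[0, 1000, 10000, float("inf")],
        labels=["small", "medium", "large"],
    )
    out["estab_period"] = pd.cut(
        out["yr_of_estab"],
        bins=[1799, 1950, 2000, 2016],
        labels=["1800-1950", "1951-2000", "2001-2016"],
    )
    return out

cleaned = clean(df)
print(cleaned)
```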

UNIVARIATE ANALYSIS

Distribution of Continent and Education of employee

Distribution of Job experience and Job Training

Distribution of Year of establishment and Region of employment

Distribution of Unit of wage and Full-time position

BIVARIATE, CORRELATION AND PAIRPLOT ANALYSIS

Distribution of continent and education of employee by case status

Distribution of Job experience and training by case status

Distribution of region of employment, Unit of wages and Full-time position by case status

Distribution of Year of establishment with case status

Distribution of prevailing wage with case status

Distribution of no of employees with case status

Observations and Insights
• The majority of employees (>50%) are from Asia.
• The share of cases certified is highest for Europe (80%), then Africa (72%), then Asia (65%),
and lowest for South America and North America (around 60%).
• Most employees hold either a bachelor's (40%) or a master's (38%) degree; a minority hold
either a doctorate (8%) or only a high school diploma (13%).
• The share of cases certified is highest for a doctorate (>86%), followed by a master's (>76%),
then a bachelor's (~62%) and high school (<35%).
• 58% of all applicants have prior job experience and 42% do not. The certification rate is
higher for applicants with prior job experience (75%) than for those without (~56%).
• Most positions do not require additional job training and are full-time rather than part-time.
Neither attribute was found to affect case status: certification rates are similar regardless
of their values.
• Most applications are for the Northeast (28.3%), then the South (27.5%), then the West
(25.8%), the Midwest (16.9%), and least for the Island region (1.5%).
• The certification rate is highest in the Midwest (75%), then the South (70%), then the
Northeast, West, and Island regions (60%).
• Midwest as the region of employment is an important attribute contributing positively to a
case being certified.
• Approximately 67% of all cases are certified and 33% are denied.
• The distribution of number of employees is right-skewed with several outliers. However,
about 65% of cases are certified, nearly twice as many as are denied, for both smaller and
larger employers.
• The median prevailing wage for certified applications is slightly higher than for denied
applications.
• Year of establishment does not appear to influence visa certification or denial.
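
Per-category certification rates like those quoted above are typically read off a row-normalised cross-tabulation. The sketch below shows the idea on a made-up six-row sample; the rows and the exact label strings `Certified` and `Denied` are assumptions, and the helper `certification_rate` is hypothetical.

```python
import pandas as pd

# Hypothetical sample; the real report uses the full dataset.
df = pd.DataFrame({
    "education_of_employee": ["Doctorate", "Doctorate", "High School",
                              "High School", "Master's", "Master's"],
    "case_status": ["Certified", "Certified", "Denied",
                    "Certified", "Certified", "Denied"],
})

def certification_rate(frame: pd.DataFrame, column: str) -> pd.Series:
    # normalize="index" makes each row sum to 1, so the "Certified"
    # column is the share of certified cases within each category.
    table = pd.crosstab(frame[column], frame["case_status"], normalize="index")
    return table["Certified"].sort_values(ascending=False)

rates = certification_rate(df, "education_of_employee")
print(rates)
```

The same helper can be applied to `continent`, `has_job_experience`, or `region_of_employment` to compare categories side by side.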

DATA PROCESSING
The numerical variables contain several outliers, which were left untreated because they appear to
be genuine values. The data has no missing or duplicate values.

Data Preparation for Modelling

• Goal: predict which visa applications will be certified and which denied
• One-hot encode the categorical variables
• Split the data into train, validation, and test sets so that the model can be evaluated
after hyperparameter tuning.
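
A minimal sketch of the preparation steps above, using pandas and scikit-learn. The sample rows, the 70/15/15 split proportions, the `random_state`, and the encoding of `Certified` as 1 are assumptions; the report does not state the values it used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical sample built by repeating four rows.
df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia", "Africa"] * 25,
    "has_job_experience": ["Y", "N", "Y", "N"] * 25,
    "prevailing_wage": [50000.0, 72000.0, 65000.0, 80000.0] * 25,
    "case_status": ["Certified", "Certified", "Denied", "Certified"] * 25,
})

# One-hot encode the categorical predictors; drop_first avoids a redundant column.
X = pd.get_dummies(df.drop(columns=["case_status"]), drop_first=True)
# Encode the target as 1 = Certified, 0 = Denied (an assumed convention).
y = (df["case_status"] == "Certified").astype(int)

# First carve off 30% of the rows, then halve that into validation and test.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, stratify=y_temp, random_state=1
)

print(X_train.shape, X_val.shape, X_test.shape)
```

Stratifying on the target keeps the certified/denied ratio similar across the three sets.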
MODEL BUILDING CRITERION
Types of Wrong Predictions:
False Negative (FN): Predicting an applicant should be denied when they should be approved.
False Positive (FP): Predicting an applicant should be approved when they should be denied.
Importance of Both Cases:
False Positive (FP):
Consequence: An unqualified employee gets a job that should have been filled by a US citizen.
Impact: Reduced quality of workforce, potential legal and ethical implications.
False Negative (FN):
Consequence: A qualified applicant is denied, and critical positions remain unfilled.
Impact: Reduced productivity and competitiveness of US companies, economic implications.
Reducing Losses:
Prioritize Review Process: Identify candidates predicted to be approved so agents can prioritize
these applications. This optimizes resource allocation and review efficiency.
Evaluation Metric - F1 Score:
Why F1 Score: It is the harmonic mean of precision and recall, making it a balanced metric to
minimize both False Positives and False Negatives.
Balanced Class Weights:
Purpose: Ensures the model does not favor one class over the other, focusing equally on both
approval and denial predictions.
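
To make the metric concrete, the sketch below computes precision, recall, and F1 from confusion-matrix counts, taking "certified" as the positive class. The counts are invented for illustration and the helper `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted approvals that were correct
    recall = tp / (tp + fn)     # share of true approvals that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Illustrative counts only: 80 true positives, 20 false positives, 10 false negatives.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it falls sharply when either precision or recall is low, which is why it penalises both kinds of wrong prediction.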

MODEL BUILDING METHODS & STEPS

1. Model building on original data
2. Model building on Oversampled data
3. Model building on Under sampled data
4. Hyper tuning the model
5. Model Comparison and Final model selection

Models built include

• Bagging
• Random Forest
• AdaBoost
• Gradient Boost
• Decision Tree
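Each model is fitted on the training set and evaluated on the validation set, using the original,
oversampled, and undersampled training data. From the 15 resulting models (5 algorithms on 3
versions of the data), we select 4 for hyperparameter tuning and run only the best tuned model on
the test data. Keeping the test set out of model selection avoids data leakage.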
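
The report does not say which resampling technique was used. As one possibility, the sketch below uses plain random resampling to balance the classes; the helper `random_resample` and the ten-row sample are hypothetical. Only the training split is resampled, so the validation and test sets keep their original class ratio.

```python
import pandas as pd

def random_resample(X: pd.DataFrame, y: pd.Series, mode: str = "over", seed: int = 1):
    """Randomly over- or under-sample so that every class has the same count."""
    counts = y.value_counts()
    target = int(counts.max() if mode == "over" else counts.min())
    parts = []
    for label in counts.index:
        idx = pd.Series(y[y == label].index)
        # Sample with replacement only when a class has to grow.
        parts.append(idx.sample(n=target, replace=len(idx) < target, random_state=seed))
    keep = pd.concat(parts).to_numpy()
    return X.loc[keep], y.loc[keep]

# Hypothetical training split: 7 certified (1) and 3 denied (0).
X_train = pd.DataFrame({"prevailing_wage": range(10)})
y_train = pd.Series([1] * 7 + [0] * 3)

X_over, y_over = random_resample(X_train, y_train, mode="over")
X_under, y_under = random_resample(X_train, y_train, mode="under")
print(y_over.value_counts().to_dict(), y_under.value_counts().to_dict())
```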
1. Model building on original data

GBM and AdaBoost: These models show minimal differences between training and validation
performance, indicating they generalize well and are less likely to overfit.

Bagging and Random Forest: Both exhibit signs of overfitting. These ensemble methods often
perform well, but tuning hyperparameters like the number of trees, depth, and using regularization
techniques can help improve generalization.

Decision Tree: The model is highly overfitting, indicating a need for pruning or setting depth
limitations.
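
The overfitting judgments above rest on the gap between training and validation scores. A minimal sketch of that comparison is shown below on synthetic data from `make_classification`, standing in for the visa data; the model settings and `random_state` values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced stand-in data (not the visa dataset).
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           weights=[0.33, 0.67], flip_y=0.1, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

models = {
    "Bagging": BaggingClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=1),
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Gradient Boost": GradientBoostingClassifier(random_state=1),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_f1 = f1_score(y_train, model.predict(X_train))
    val_f1 = f1_score(y_val, model.predict(X_val))
    # A large positive gap between train and validation F1 suggests overfitting.
    results[name] = (train_f1, val_f1, train_f1 - val_f1)

for name, (train_f1, val_f1, gap) in results.items():
    print(f"{name:15s} train={train_f1:.2f} val={val_f1:.2f} gap={gap:.2f}")
```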

2. Model building on Over sampled data

GBM and AdaBoost: These models show minimal differences between training and validation
performance, indicating they generalize well and are less likely to overfit.

Bagging and Random Forest: Both exhibit signs of overfitting.

Decision Tree: The model is highly overfitting. Pruning or limiting the depth of the tree may help to
mitigate overfitting.

3. Model building on Under sampled data

GBM and AdaBoost: These models show minimal or negative differences between training and
validation performance, indicating they generalize well and are less likely to overfit.

Bagging and Random Forest: Both exhibit signs of overfitting. Consider tuning hyperparameters such
as the number of trees, maximum depth, or using regularization techniques to improve
generalization.

Decision Tree: The model is highly overfitting. Pruning or limiting the depth of the tree may help to
mitigate overfitting.

• After building 15 models, both the GBM and AdaBoost models trained on the oversampled and
undersampled datasets showed strong performance on both the training and validation
datasets.

• Models can overfit after undersampling or oversampling, so it is better to tune them to get
generalized performance.

• We will tune these 4 models using the same data (undersampled or oversampled) they were
originally trained on.
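
The report does not state the search method or parameter grid used for tuning. One common approach is a randomized search scored on F1, sketched below for Gradient Boosting on synthetic data; the parameter ranges, `n_iter`, `cv`, and `random_state` values are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the resampled training data.
X_train, y_train = make_classification(n_samples=300, n_features=8, random_state=1)

# Illustrative search space, not the report's grid.
param_distributions = {
    "n_estimators": [50, 100, 150],
    "learning_rate": [0.05, 0.1, 0.2],
    "subsample": [0.7, 0.9, 1.0],
    "max_features": [0.5, 0.7, 1.0],
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=1),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled parameter combinations
    scoring="f1",      # matches the evaluation metric chosen earlier
    cv=3,
    random_state=1,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The same pattern applies to `AdaBoostClassifier` with its own parameters, such as `n_estimators` and `learning_rate`.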

4. Hyper Tuning the model

a. Tuning AdaBoostClassifier model with Oversampled data

Confusion Matrix on Train set

Performance Metrics on train and validation set

b. Tuning AdaBoostClassifier model with Under sampled data

Confusion Matrix on Train set

Performance Metrics on train and validation set

c. Tuning Gradient Boosting model with Under sampled Data

Confusion matrix on Train set

Performance metrics on Train and validation set

d. Tuning Gradient Boosting model with Oversampled data

Confusion matrix on Train set

Performance metrics on Train and validation set

5. Model Comparison and Final Model Selection
Based on the evaluation results of the tuned models for visa application prediction, all four
models show notable improvements in performance metrics over their default counterparts.
The performance comparison data below supports this:
On train set

On Validation set

Final Model Selection

1. Gradient Boosting with Oversampled Data:
o Highest validation accuracy (0.75) and F1 score (0.82).
o High recall (0.86) and precision (0.78).
o The training performance is also strong, indicating the model generalizes well
without overfitting.
2. AdaBoost with Oversampled Data:
o Validation accuracy (0.74) and F1 score (0.81) are slightly lower than Gradient
Boosting with Oversampled Data.
o However, it has a slightly higher recall (0.87), making it effective at identifying
positive cases.
3. Gradient Boosting with Undersampled Data and AdaBoost with Undersampled Data:
o Both models have lower validation accuracy and recall compared to their
oversampled counterparts.
o This indicates that undersampling may be less effective for this dataset.
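
As a quick consistency check, the F1 score reported for Gradient Boosting with Oversampled Data can be recomputed from its reported validation precision (0.78) and recall (0.86). The helper name `f1_from` is hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Validation precision and recall as reported for Gradient Boosting (oversampled).
gbm_over_f1 = f1_from(precision=0.78, recall=0.86)
print(round(gbm_over_f1, 2))  # 0.82
```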
Recommendation:
Based on the comparison, Gradient Boosting with Oversampled Data is the best model to run on the
test data. It has the highest validation performance, indicating that it is likely to generalize well
to unseen data, and its strong training performance suggests that it has learned from the
oversampled data without overfitting. It also has the highest F1 scores on both the validation and
training sets. A high F1 score signifies that the model balances precision and recall, making it a
reliable choice for predicting both positive and negative cases in the dataset.
This model should provide the most consistent and accurate results on test data and robust
performance in real-world scenarios.

Model Built on the test set

According to the confusion matrix, the model scores about 81% on the test set, so in most cases
an ineligible applicant is not certified and an eligible applicant is not denied.

The most important features utilized in identifying the target variable, i.e., case_status, are:
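
A ranking of this kind can be read from the `feature_importances_` attribute of a fitted Gradient Boosting model. The sketch below shows the mechanics on synthetic data with placeholder feature names, so its output says nothing about which visa attributes actually matter.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder names; the real columns would be the one-hot encoded predictors.
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]
X, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           n_redundant=0, random_state=1)
X = pd.DataFrame(X, columns=feature_names)

model = GradientBoostingClassifier(random_state=1).fit(X, y)

# Importances are normalised to sum to 1; sort to get a ranking.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```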
Recommendations and Insights
The profile of applicants whose visa is likely to be certified:
• Education level - has at least a Bachelor's degree; a Master's or doctorate is preferred.
• Job experience - has job experience.
• Prevailing wage - has a high prevailing wage, most likely paid yearly (the median prevailing
wage of employees whose visa was certified is around 72k).
• Continent - applicants from Europe, Africa, and Asia have higher chances of visa
certification.
The profile of applicants whose visa is likely to be denied:
• Education level - high school diploma.
• Job experience - has no job experience.
• Prevailing wage and unit of wage - hourly unit of wage (the median prevailing wage of
employees whose visa was denied is around 65k).
• Continent - applicants from South America, North America, and Oceania have higher
chances of visa denial.
Additional applicant information, such as gender, marital status, degree specialization, and
number of years of experience, could be collected.
For the employer, useful additions would be the salary band offered for the applicant's
experience and the sector in which the employer operates.
