Copy of Final Project
import pandas as pd
df = pd.read_csv('/content/Employee.csv')
df.head()
df.describe()
df.shape
(4653, 9)
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 2764 entries, 0 to 4651
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Education 2764 non-null object
1 JoiningYear 2764 non-null int64
2 City 2764 non-null object
3 PaymentTier 2764 non-null int64
4 Age 2764 non-null int64
5 Gender 2764 non-null object
6 EverBenched 2764 non-null object
7 ExperienceInCurrentDomain 2764 non-null int64
8 LeaveOrNot 2764 non-null int64
dtypes: int64(5), object(4)
memory usage: 215.9+ KB
df.isnull().sum()
Education 0
JoiningYear 0
City 0
PaymentTier 0
Age 0
Gender 0
EverBenched 0
ExperienceInCurrentDomain 0
LeaveOrNot 0
dtype: int64
df.duplicated().sum()
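The result of df.duplicated().sum() is not shown in this export. The df.info() output lists 2764 entries while df.shape reports 4653 rows, which would be consistent with duplicate rows having been dropped in a cell that is not shown; that is an inference, not something the notebook states. As a minimal sketch, the check and the follow-up step might look like this on a small made-up frame (the `toy` data and its values are illustrative assumptions, not rows from Employee.csv):

```python
import pandas as pd

# Made-up rows for illustration only; not taken from Employee.csv
toy = pd.DataFrame({
    "Education": ["Bachelors", "Bachelors", "Masters", "Bachelors"],
    "City": ["Pune", "Pune", "Bangalore", "Pune"],
    "Age": [28, 28, 31, 28],
    "LeaveOrNot": [0, 0, 1, 0],
})

# duplicated() marks every repeat of an earlier row; the first copy is not marked
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first copy and preserves the original index labels
deduped = toy.drop_duplicates()

print(n_duplicates)
print(deduped.shape)
```

Because drop_duplicates() keeps the original index labels, the index range can run past the row count afterwards, as in "2764 entries, 0 to 4651". Calling reset_index(drop=True) would renumber the rows if that matters downstream.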
df.dtypes
Education int64
JoiningYear int64
City int64
PaymentTier int64
Age int64
Gender int64
EverBenched int64
ExperienceInCurrentDomain int64
LeaveOrNot int64
dtype: object
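The dtype listing above shows every column as int64, although df.info() earlier reported four object columns (Education, City, Gender, EverBenched). The text columns were presumably encoded in a cell that did not survive the export. One possible way to do that, sketched on made-up data, is scikit-learn's LabelEncoder; the choice of encoder and the `toy` frame are assumptions, not the notebook's confirmed method:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows for illustration only; not taken from Employee.csv
toy = pd.DataFrame({
    "Education": ["Bachelors", "Masters", "PHD", "Bachelors"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "EverBenched": ["No", "No", "Yes", "No"],
    "Age": [28, 31, 35, 24],
})

text_columns = ["Education", "Gender", "EverBenched"]

# Replace each text column with integer codes; classes are numbered in sorted order
encoders = {}
for col in text_columns:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

print(toy.dtypes)
```

Keeping the fitted encoders in a dict makes it possible to map codes back with inverse_transform. scikit-learn documents LabelEncoder as a tool for target labels; OrdinalEncoder or one-hot encoding are the more usual choices for feature columns, since integer codes impose an order that a linear model will treat as meaningful.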
x = df.drop(columns=['LeaveOrNot']) # Features
y = df['LeaveOrNot'] # Target variable
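The later cells use xtrain, xtest, ytrain and ytest, but the cell that creates them is not part of this export. A minimal sketch of such a split with train_test_split follows; the test_size, random_state and stratify settings are assumed values, and the `toy` frame is made up for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows for illustration only; not taken from Employee.csv
toy = pd.DataFrame({
    "Age": [24, 28, 31, 35, 26, 29, 33, 40, 27, 30],
    "PaymentTier": [3, 3, 2, 1, 3, 2, 3, 1, 3, 2],
    "LeaveOrNot": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})
x = toy.drop(columns=["LeaveOrNot"])  # Features
y = toy["LeaveOrNot"]                 # Target variable

# test_size=0.2 and random_state=42 are assumed values, not taken from the notebook.
# stratify=y keeps the class ratio the same in the train and test splits.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

print(xtrain.shape, xtest.shape)
```

Fixing random_state makes the split reproducible between runs, which matters when comparing models on the same held-out rows.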
Model Selection
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
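The ConvergenceWarning above means the lbfgs solver of a logistic regression hit its iteration limit before converging. Two common remedies are to standardize the features and to raise max_iter. The sketch below applies both on synthetic data; the data, the pipeline and max_iter=1000 are assumptions for illustration, since the cell that triggered the warning is not shown:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; one feature is shifted and stretched to mimic
# a year-like column whose scale differs strongly from the others
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X[:, 0] = X[:, 0] * 3 + 2015

# Standardize the features, then fit logistic regression with a higher iteration cap
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)

# n_iter_ holds the number of iterations the solver actually used
n_iter = int(clf.named_steps["logisticregression"].n_iter_[0])
print(n_iter)
```

Putting the scaler inside the pipeline means it is fitted only on the data passed to fit(), so the same object can be used with a train/test split or cross-validation without leaking test statistics.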
Model Training
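from sklearn.ensemble import RandomForestClassifier
# Fit a random forest with default hyperparameters on the training split
model = RandomForestClassifier()
model.fit(xtrain, ytrain)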
RandomForestClassifier()
Model Evaluation
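from sklearn.metrics import classification_report
# Predict on both splits; the report below covers the test split only
trainpred = model.predict(xtrain)
testpred = model.predict(xtest)
print(classification_report(ytest, testpred))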
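The evaluation cell computes trainpred but only reports on the test split. Comparing accuracy on both splits is a quick way to see how much a random forest overfits. The sketch below does this on synthetic data; the data, the split and the random_state values are assumptions, so the printed numbers say nothing about the Employee dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data for illustration only
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(random_state=0)
model.fit(xtrain, ytrain)

# Accuracy on the data the model was fitted on versus held-out data
train_acc = accuracy_score(ytrain, model.predict(xtrain))
test_acc = accuracy_score(ytest, model.predict(xtest))

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

A large gap between the two numbers suggests overfitting; limiting max_depth or raising min_samples_leaf are typical ways to narrow it.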