Modelling and Simulation Assignment - Ipynb - Colab

Student Dropout Prediction Using Decision Tree Classifier

Uploaded by

Muhammad Ali

Step 1: Exploratory Data Analysis (EDA)

Let's begin by examining the dataset to understand its structure and the relationships between the features and the target variable (Dropout/Enrolled/Graduate).

import pandas as pd

# Path to the dataset on Google Drive
path = '/content/drive/MyDrive/dataset.csv'

data = pd.read_csv(path)

data

      Marital status  Application mode  Application order  Course  Daytime/evening attendance  Previous qualification  ...
0                  1                 8                  5       2                           1                       1  ...
1                  1                 6                  1      11                           1                       1  ...
2                  1                 1                  5       5                           1                       1  ...
3                  1                 8                  2      15                           1                       1  ...
4                  2                12                  1       3                           0                       1  ...
...              ...               ...                ...     ...                         ...                     ...  ...
4419               1                 1                  6      15                           1                       1  ...
4420               1                 1                  2      15                           1                       1  ...
4421               1                 1                  1      12                           1                       1  ...
4422               1                 1                  1       9                           1                       1  ...
4423               1                 5                  1      15                           1                       1  ...

4424 rows × 35 columns

# Display summary statistics
data.describe()

       Marital status  Application mode  Application order       Course  Daytime/evening attendance  Previous qualification  ...
count     4424.000000       4424.000000        4424.000000  4424.000000                 4424.000000              4424.00000  ...
mean         1.178571          6.886980           1.727848     9.899186                    0.890823                 2.53142  ...
std          0.605747          5.298964           1.313793     4.331792                    0.311897                 3.96370  ...
min          1.000000          1.000000           0.000000     1.000000                    0.000000                 1.00000  ...
25%          1.000000          1.000000           1.000000     6.000000                    1.000000                 1.00000  ...
50%          1.000000          8.000000           1.000000    10.000000                    1.000000                 1.00000  ...
75%          1.000000         12.000000           2.000000    13.000000                    1.000000                 1.00000  ...
max          6.000000         18.000000           9.000000    17.000000                    1.000000                17.00000  ...

8 rows × 34 columns
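Columns such as Marital status and Course appear to hold integer category codes, so their mean and standard deviation carry little meaning. The following is a minimal sketch, assuming a small made-up frame `toy` in place of the real dataset and assuming those two columns are categorical codes; it summarises them by frequency instead.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "Marital status": [1, 1, 2, 1, 1, 4],
    "Course": [2, 11, 5, 15, 3, 15],
    "Age at enrollment": [20, 19, 19, 20, 45, 21],
})

# Assumption: these columns are category codes, not quantities
categorical_cols = ["Marital status", "Course"]

# For categorical data, describe() reports count, unique, top and freq
cat_summary = toy[categorical_cols].astype("category").describe()

# The remaining columns keep the usual numeric summary
num_summary = toy.drop(columns=categorical_cols).describe()

print(cat_summary)
print(num_summary)
```

Here `top` and `freq` give the most common code and how often it occurs, which is usually more informative for coded columns than a mean.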

# Display data types of each column
data.dtypes

Marital status                                  int64
Application mode                                int64
Application order                               int64
Course                                          int64
Daytime/evening attendance                      int64
Previous qualification                          int64
Nacionality                                     int64
Mother's qualification                          int64
Father's qualification                          int64
Mother's occupation                             int64
Father's occupation                             int64
Displaced                                       int64
Educational special needs                       int64
Debtor                                          int64
Tuition fees up to date                         int64
Gender                                          int64
Scholarship holder                              int64
Age at enrollment                               int64
International                                   int64
Curricular units 1st sem (credited)             int64
Curricular units 1st sem (enrolled)             int64
Curricular units 1st sem (evaluations)          int64
Curricular units 1st sem (approved)             int64
Curricular units 1st sem (grade)                float64
Curricular units 1st sem (without evaluations)  int64
Curricular units 2nd sem (credited)             int64
Curricular units 2nd sem (enrolled)             int64
Curricular units 2nd sem (evaluations)          int64
Curricular units 2nd sem (approved)             int64
Curricular units 2nd sem (grade)                float64
Curricular units 2nd sem (without evaluations)  int64

# Check for missing values
data.isnull().sum()

Application mode                                0
Application order                               0
Course                                          0
Daytime/evening attendance                      0
Previous qualification                          0
Nacionality                                     0
Mother's qualification                          0
Father's qualification                          0
Mother's occupation                             0
Father's occupation                             0
Displaced                                       0
Educational special needs                       0
Debtor                                          0
Tuition fees up to date                         0
Gender                                          0
Scholarship holder                              0
Age at enrollment                               0
International                                   0
Curricular units 1st sem (credited)             0
Curricular units 1st sem (enrolled)             0
Curricular units 1st sem (evaluations)          0
Curricular units 1st sem (approved)             0
Curricular units 1st sem (grade)                0
Curricular units 1st sem (without evaluations)  0
Curricular units 2nd sem (credited)             0
Curricular units 2nd sem (enrolled)             0
Curricular units 2nd sem (evaluations)          0
Curricular units 2nd sem (approved)             0
Curricular units 2nd sem (grade)                0
Curricular units 2nd sem (without evaluations)  0
Unemployment rate                               0
Inflation rate                                  0
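Every count shown above is zero, so this dataset needs no imputation. As a hedged illustration of what the check enables, the sketch below uses a made-up frame `toy` with one gap, lists the affected columns programmatically, and fills numeric gaps with the column median. Median filling is only one possible choice and is not something the notebook itself does.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap, only to show the mechanics
toy = pd.DataFrame({
    "Age at enrollment": [20, 19, np.nan, 45],
    "Debtor": [0, 0, 1, 0],
})

# Same check as in the notebook, reduced to the columns that have gaps
missing_per_column = toy.isnull().sum()
columns_with_gaps = missing_per_column[missing_per_column > 0].index.tolist()

# One simple option: fill numeric gaps with the column median
filled = toy.fillna(toy.median(numeric_only=True))

print(columns_with_gaps)
print(int(filled.isnull().sum().sum()))
```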

Step 2: Data Visualization


We will create various charts to visualize the data.

Scatter Plot

Let's create a scatter plot to see the relationship between "Curricular units 2nd sem (grade)" and "Target".

import matplotlib.pyplot as plt

# Plot the second-semester grade against the target class
plt.scatter(data['Curricular units 2nd sem (grade)'], data['Target'])
plt.xlabel('Curricular units 2nd sem (grade)')
plt.ylabel('Target')
plt.title('Scatter Plot of Curricular units 2nd sem (grade) vs. Target')
plt.show()
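Because Target is categorical, the points in this scatter plot fall on a few horizontal lines and overlap heavily. A per-class summary can make the same relationship easier to read. The sketch below is illustrative only: it uses a made-up frame `toy` with hypothetical grades rather than the real `data`.

```python
import pandas as pd

# Hypothetical rows; the real notebook would use `data` instead
toy = pd.DataFrame({
    "Curricular units 2nd sem (grade)": [0.0, 10.5, 12.0, 13.5, 11.0, 14.0],
    "Target": ["Dropout", "Dropout", "Enrolled", "Graduate", "Enrolled", "Graduate"],
})

# Count, mean and median of the grade within each target class
grade_by_target = (
    toy.groupby("Target")["Curricular units 2nd sem (grade)"]
       .agg(["count", "mean", "median"])
)
print(grade_by_target)
```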

Bar Chart

Let's create a bar chart for the "Marital status" feature.

data['Marital status'].value_counts().plot(kind='bar')
plt.xlabel('Marital Status')
plt.ylabel('Count')
plt.title('Bar Chart of Marital Status')
plt.show()

Box Plot

Let's create a box plot for the "Curricular units 2nd sem (grade)" feature.

data.boxplot(column='Curricular units 2nd sem (grade)')
plt.title('Box Plot of Curricular units 2nd sem (grade)')
plt.show()
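With matplotlib's default settings, a box plot draws points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles as individual outliers. The sketch below reproduces that rule in plain Python, assuming linearly interpolated quartiles; `whisker_limits` is a hypothetical helper and the grades are made up.

```python
from statistics import quantiles

def whisker_limits(values):
    """Return (lower, upper) limits using the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical grades, not taken from the dataset
grades = [0.0, 10.0, 11.0, 12.0, 12.5, 13.0, 14.0, 15.0]
lower, upper = whisker_limits(grades)

# Values outside the limits would be drawn as individual points
outliers = [g for g in grades if g < lower or g > upper]
print(lower, upper, outliers)  # 7.0 17.0 [0.0]
```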

Histogram

Let's create a histogram for the "Curricular units 2nd sem (grade)" feature.

data['Curricular units 2nd sem (grade)'].hist()
plt.xlabel('Curricular units 2nd sem (grade)')
plt.ylabel('Frequency')
plt.title('Histogram of Curricular units 2nd sem (grade)')
plt.show()

Step 3: Data Preprocessing


We will preprocess the data by encoding the target variable and splitting the data into training and testing sets. The columns checked in Step 1 contain no missing values and are already numeric, so only the target needs encoding.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Encode the target variable
label_encoder = LabelEncoder()
data['Target'] = label_encoder.fit_transform(data['Target'])
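`LabelEncoder` assigns integer codes in sorted order of the labels, which is what makes the `target_mapping` dictionary used later in the notebook line up with the encoded values. A small sketch, using a made-up list of labels rather than the real column, shows how that mapping can be read from the fitted encoder instead of being typed by hand.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample of the three label strings used in the notebook
labels = ["Graduate", "Dropout", "Enrolled", "Dropout", "Graduate"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ is sorted, so the position of each label is its integer code
mapping = {code: str(label) for code, label in enumerate(encoder.classes_)}

print(codes.tolist())  # [2, 0, 1, 0, 2]
print(mapping)         # {0: 'Dropout', 1: 'Enrolled', 2: 'Graduate'}
```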

# Define the features (X) and the target (y)
X = data.drop('Target', axis=1)
y = data['Target']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
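With 4424 rows and `test_size=0.2`, the split yields 3539 training rows and 885 test rows. The notebook's split is not stratified, so class shares can differ slightly between the two parts. The sketch below shows the `stratify` option of `train_test_split` on synthetic data; the feature values and class probabilities are made up, and stratification is a suggestion rather than what the notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as the dataset (4424)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4424, 3))
y_demo = rng.choice([0, 1, 2], size=4424, p=[0.3, 0.2, 0.5])  # made-up shares

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Class shares in each part should be nearly identical
train_share = np.bincount(y_tr) / len(y_tr)
test_share = np.bincount(y_te) / len(y_te)
print(len(y_tr), len(y_te))
print(train_share.round(3), test_share.round(3))
```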

Step 4: Model Building


We will build and train a decision tree model to predict each student's outcome (Dropout, Enrolled, or Graduate).
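By default, scikit-learn's `DecisionTreeClassifier` chooses splits by Gini impurity (`criterion='gini'`). The sketch below computes that quantity by hand for a made-up node of six students. The helper names `gini` and `split_impurity` are hypothetical and only illustrate the idea; they are not scikit-learn's implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted impurity of the two child nodes of a split."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)

# Hypothetical node of six students and one candidate split
parent = ["Dropout", "Dropout", "Dropout", "Graduate", "Graduate", "Graduate"]
left, right = parent[:3], parent[3:]

print(gini(parent))                 # 0.5
print(split_impurity(left, right))  # 0.0
```

A split is preferred when the weighted impurity of its children is lower than the parent's; here the candidate split separates the two classes perfectly.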

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Build and train the model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
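`DecisionTreeClassifier()` with default arguments grows the tree without a depth limit and sets no `random_state`, so repeated runs can produce slightly different trees and scores. The sketch below shows one way to constrain and evaluate a tree. It assumes synthetic data from `make_classification`, and `max_depth=5` is an arbitrary illustrative value, not a tuned setting for the student dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the student dataset
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)

# random_state makes the fitted tree reproducible;
# max_depth=5 is an arbitrary illustrative limit, not a tuned value
shallow_tree = DecisionTreeClassifier(max_depth=5, random_state=42)

# 5-fold cross-validated accuracy gives a steadier estimate than one split
scores = cross_val_score(shallow_tree, X_demo, y_demo, cv=5)
print(scores.mean())
```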

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print(f'Classification Report:\n{report}')

Accuracy: 0.6813559322033899
Classification Report:
              precision    recall  f1-score   support

           0       0.77      0.66      0.71       316
           1       0.34      0.39      0.36       151
           2       0.76      0.81      0.78       418

    accuracy                           0.68       885
   macro avg       0.62      0.62      0.62       885
weighted avg       0.69      0.68      0.68       885
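The two average rows of the report can be reproduced from the per-class rows, which also shows how much class 1 (Enrolled, per the notebook's label mapping) pulls the macro average down. The sketch below uses only the rounded figures printed above, so its results match the report to two decimals. The majority-class baseline is an added reference point, not something the notebook computes.

```python
# Per-class rows copied from the classification report:
# class -> (precision, recall, f1-score, support)
report_rows = {
    0: (0.77, 0.66, 0.71, 316),
    1: (0.34, 0.39, 0.36, 151),
    2: (0.76, 0.81, 0.78, 418),
}

total = sum(row[3] for row in report_rows.values())

# Macro average: plain mean over classes
macro_precision = sum(row[0] for row in report_rows.values()) / len(report_rows)

# Weighted average: mean weighted by each class's support
weighted_precision = sum(row[0] * row[3] for row in report_rows.values()) / total

# Accuracy of always predicting the largest class, as a reference point
majority_baseline = max(row[3] for row in report_rows.values()) / total

print(round(macro_precision, 2), round(weighted_precision, 2), round(majority_baseline, 2))
# 0.62 0.69 0.47
```

The reported accuracy of about 0.68 is well above the 0.47 obtained by always predicting class 2, although the low scores for class 1 show where the model struggles.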

def get_user_input_and_predict(model, feature_columns):
    # Collect one value per feature from the user
    user_input = {}
    for column in feature_columns:
        user_input[column] = [input(f"Enter value for {column}: ")]

    # Create a DataFrame for user inputs
    input_df = pd.DataFrame(user_input)

    # Handle any necessary preprocessing (e.g., converting to numeric)
    for column in feature_columns:
        if X[column].dtype in ['int64', 'float64']:
            input_df[column] = pd.to_numeric(input_df[column])

    # Predict using the trained model
    prediction = model.predict(input_df)

    # Decode the prediction
    decoded_prediction = label_encoder.inverse_transform(prediction)

    return decoded_prediction[0]
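Typing more than thirty values at a prompt is slow and hard to repeat, so a non-interactive variant can be handy for testing. The sketch below is an assumption-laden illustration: `predict_from_values` is a hypothetical helper, and the two-column training set is made up solely so the example runs on its own.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def predict_from_values(model, feature_columns, values):
    """Predict for one record given as a {column: value} dict (hypothetical helper)."""
    missing = [c for c in feature_columns if c not in values]
    if missing:
        raise ValueError(f"Missing values for: {missing}")
    # Build a one-row frame with columns in the training order
    row = pd.DataFrame(
        [[values[c] for c in feature_columns]],
        columns=list(feature_columns),
    )
    return model.predict(row)[0]

# Tiny made-up training set, only to make the sketch runnable
train = pd.DataFrame({"Debtor": [0, 0, 1, 1], "Tuition fees up to date": [1, 1, 0, 0]})
labels = ["Graduate", "Graduate", "Dropout", "Dropout"]
demo_model = DecisionTreeClassifier(random_state=0).fit(train, labels)

result = predict_from_values(
    demo_model, train.columns, {"Debtor": 1, "Tuition fees up to date": 0}
)
print(result)  # Dropout
```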

pred = model.predict(X_test)

# Dictionary for mapping encoded target values to original labels
target_mapping = {0: 'Dropout', 1: 'Enrolled', 2: 'Graduate'}

# Predicted label for the first test sample
output = target_mapping[pred[0]]

# True label for the first test sample
original = target_mapping[y_test.iloc[0]]

Comparing Values

print(f"Original Value: '{original}' and Predicted Value: '{output}'")

Original Value: 'Dropout' and Predicted Value: 'Dropout'
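Comparing a single sample says little about overall behaviour. A confusion matrix gives the same comparison for every test sample at once. The sketch below uses made-up encoded labels, not the notebook's results, to show the layout that `confusion_matrix` returns.

```python
from sklearn.metrics import confusion_matrix

# Made-up encoded labels (0=Dropout, 1=Enrolled, 2=Graduate), not real results
y_true_demo = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred_demo = [0, 1, 1, 2, 2, 2, 1, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1, 2])

# The diagonal holds the correctly classified samples
correct = int(cm.trace())
print(cm)
print(correct, "of", len(y_true_demo), "correct")
```

In the notebook, the same call with `y_test` and `y_pred` would show which classes are confused with each other.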

feature_columns = X.columns

# Predict on user inputs; the function already returns the decoded label
predicted_class = get_user_input_and_predict(model, feature_columns)
print(f"The predicted class is: {predicted_class}")

Enter value for Marital status: 1


Enter value for Application mode: 8
Enter value for Application order: 5
Enter value for Course: 2
Enter value for Daytime/evening attendance: 1
Enter value for Previous qualification: 1
Enter value for Nacionality: 1
Enter value for Mother's qualification: 1
Enter value for Father's qualification: 10
Enter value for Mother's occupation: 6
Enter value for Father's occupation: 10
Enter value for Displaced: 1
Enter value for Educational special needs: 0
Enter value for Debtor: 0
Enter value for Tuition fees up to date: 1
Enter value for Gender: 1
Enter value for Scholarship holder: 0
Enter value for Age at enrollment: 20
Enter value for International: 0
Enter value for Curricular units 1st sem (credited): 0
Enter value for Curricular units 1st sem (enrolled): 0
Enter value for Curricular units 1st sem (evaluations): 0
Enter value for Curricular units 1st sem (approved): 0
Enter value for Curricular units 1st sem (grade): 0
Enter value for Curricular units 1st sem (without evaluations): 0
Enter value for Curricular units 2nd sem (credited): 0
Enter value for Curricular units 2nd sem (enrolled): 0
Enter value for Curricular units 2nd sem (evaluations): 0
Enter value for Curricular units 2nd sem (approved): 0
Enter value for Curricular units 2nd sem (grade): 0
Enter value for Curricular units 2nd sem (without evaluations): 0
Enter value for Unemployment rate: 10.8
Enter value for Inflation rate: 1.4
Enter value for GDP: 1.74
The predicted class is: Dropout
