MBSD
Marks Obtained
CRITERIA AND SCALES
S1 S2 S3
Criterion 1: Does the application meet the desired specifications and produce the desired outputs?
(CPA-1, CPA-2, CPA-3) [8 marks]
1: The application does not meet the desired specifications and is producing incorrect outputs.
2: The application partially meets the desired specifications and is producing incorrect or partially correct outputs.
3: The application meets the desired specifications but is producing incorrect or partially correct outputs.
4: The application meets all the desired specifications and is producing correct outputs.
Criterion 2: How well is the code organized? [2 marks]
1: The code is poorly organized and very difficult to read.
2: The code is readable only to someone who knows what it is supposed to be doing.
3: Some parts of the code are well organized, while other parts are difficult to follow.
4: The code is well organized and very easy to follow.
Criterion 3: Does the report adhere to the given format and requirements? [6 marks]
1: The report does not contain the required information and is formatted poorly.
2: The report contains the required information only partially but is formatted well.
3: The report contains all the required information but is formatted poorly.
4: The report contains all the required information and completely adheres to the given format.
Criterion 4: How did the student perform individually and as a team member?
(CPA-1, CPA-2, CPA-3) [4 marks]
1: The student did not work on the assigned task.
2: The student worked on the assigned task and accomplished goals partially.
3: The student worked on the assigned task and accomplished goals satisfactorily.
4: The student worked on the assigned task and accomplished goals beyond expectations.
Final Score = (Criterion_1_score x 2) + (Criterion_2_score / 2) + (Criterion_3_score x (3/2)) + (Criterion_4_score)
= ______________________
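The weighting in the formula above can be checked with a small Python function. The name final_score is our own, and the range check assumes each criterion is scored on the 1 to 4 scale. With the top score on every criterion, the weights give 8 + 2 + 6 + 4 = 20 marks, which matches the per-criterion maxima.

```python
def final_score(c1: float, c2: float, c3: float, c4: float) -> float:
    """Weighted rubric total; each criterion score is assumed to be on the 1-4 scale."""
    for score in (c1, c2, c3, c4):
        if not 1 <= score <= 4:
            raise ValueError(f"criterion score {score} is outside the 1-4 scale")
    # The weights scale criteria 1-4 to maxima of 8, 2, 6 and 4 marks.
    return c1 * 2 + c2 / 2 + c3 * 1.5 + c4


print(final_score(4, 4, 4, 4))  # 20.0
print(final_score(3, 2, 4, 3))  # 6 + 1 + 6 + 3 = 16.0
```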
Page 1 of 5
DATA PREPROCESSING STEPS:
We performed the following preprocessing on the given dataset of grades achieved by different students:
First, we displayed the dataset to get an overview of it and checked all the columns it provides. We obtained the dimensions of the dataset using the shape attribute.
Next, we inspected the data for null values, using a method that returns the number of null values in each column. To handle the missing values without significantly distorting the data, we computed the mode (most frequent value) of each column and replaced the null values in that column with it.
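The null-count and mode-imputation steps above can be sketched in plain Python. This is an illustration, not the report's code: the columns are a hypothetical dict of lists with None marking a missing grade, and the course names are made up. (In pandas the same steps are commonly written as df.isnull().sum() and df.fillna(df.mode().iloc[0]); note that pandas breaks mode ties differently from the Counter used here.)

```python
from collections import Counter

# Hypothetical toy columns; None marks a missing grade.
data = {
    "CS-101": ["A", "B+", None, "A", "B"],
    "MT-101": ["C", None, "C", "B-", None],
}


def count_nulls(columns):
    """Number of missing (None) entries in each column."""
    return {name: sum(v is None for v in values) for name, values in columns.items()}


def fill_with_mode(columns):
    """Replace each column's None entries with that column's most frequent value."""
    filled = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present:  # an all-missing column has no mode; leave it unchanged
            filled[name] = list(values)
            continue
        # Counter.most_common breaks ties by order of first occurrence.
        mode = Counter(present).most_common(1)[0][0]
        filled[name] = [mode if v is None else v for v in values]
    return filled


print(count_nulls(data))  # {'CS-101': 1, 'MT-101': 2}
print(fill_with_mode(data))
```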
Next, we identified the categorical data in the dataset and wrote a function that builds key-value pairs mapping the categorical values in the feature columns to numbers. We then looked for the unique values in these columns and found 13 distinct grades. Each grade was assigned its numeric grade point (GP) as follows:
A+ = 4.0, A = 4.0, A- = 3.7, B+ = 3.3, B = 3.0, B- = 2.7, C+ = 2.3, C = 2.0, C- = 1.7, D+ = 1.3, D = 1.0, F = 0.0, WU = 0.0
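The grade mapping above amounts to a lookup table of key-value pairs. A minimal Python sketch follows; the table is taken from the list above, while the function name encode_grades is hypothetical (with pandas, Series.map(GRADE_POINTS) would apply the same table to a column).

```python
# Grade-to-grade-point table from the list above; keys are the 13 distinct grades.
GRADE_POINTS = {
    "A+": 4.0, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D+": 1.3, "D": 1.0,
    "F": 0.0, "WU": 0.0,
}


def encode_grades(grades):
    """Convert a column of letter grades to grade points; an unknown grade raises KeyError."""
    return [GRADE_POINTS[g] for g in grades]


print(len(GRADE_POINTS))                  # 13
print(encode_grades(["A-", "B+", "WU"]))  # [3.7, 3.3, 0.0]
```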
Furthermore, the seat number was removed from the dataset to simplify the data used for prediction, since it only identifies a student. We implemented a function that retrieves the courses taught in the selected years. Given the parameter '(1)', it retrieves only the first-year courses; given '(1,2)', it retrieves the first- and second-year courses; and given '(1,2,3)', it retrieves the courses of the first, second and third years. The selected courses are the features and the final CGPA is the target of the model.
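The year-based course selection described above could look like the following Python sketch. The report does not list its column names, so the course codes, the "Seat No." field and the "CGPA" key here are all hypothetical. Because only course columns are copied into the feature vectors, the seat number never reaches the model.

```python
# Hypothetical course codes per year; the real column names are not given in the report.
COURSES_BY_YEAR = {
    1: ["PH-121", "HS-101"],
    2: ["CS-201", "MT-222"],
    3: ["EE-353", "CS-312"],
}


def select_courses(years):
    """Course columns for the given years: 1, (1, 2) or (1, 2, 3)."""
    if isinstance(years, int):  # in Python, (1) is just the integer 1
        years = (years,)
    return [course for year in years for course in COURSES_BY_YEAR[year]]


def build_features(rows, years, target="CGPA"):
    """Turn row dicts into (X, y); only course columns become features."""
    courses = select_courses(years)
    X = [[row[c] for c in courses] for row in rows]
    y = [row[target] for row in rows]
    return X, y


student = {  # hypothetical record, grades already converted to grade points
    "Seat No.": "X-001", "PH-121": 4.0, "HS-101": 3.3, "CS-201": 3.7,
    "MT-222": 3.0, "EE-353": 2.7, "CS-312": 3.3, "CGPA": 3.4,
}
print(select_courses(1))                  # ['PH-121', 'HS-101']
print(build_features([student], (1, 2)))  # ([[4.0, 3.3, 3.7, 3.0]], [3.4])
```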
MODELS USED:
Model 1: predict final CGPA based on GPs of first year only.
Model 2: predict final CGPA based on GPs of first two years.
Model 3: predict final CGPA based on GPs of first three years.
ALGORITHMS USED:
We implemented two algorithms to predict the final CGPA: Linear Regression and a KNN Regressor.
1. Linear Regression:
Linear regression is one of the most commonly used models for predictive analysis of continuous data. It models the relationship between one or more explanatory (independent) variables and a dependent variable by fitting a linear equation to the observed data. Here, the course grade points are the explanatory variables and the final CGPA is the dependent variable.
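As an illustration of the fitting step, the following plain-Python sketch fits ordinary least squares with an intercept by solving the normal equations (X^T X) w = X^T y. It is a hand-written stand-in, not the report's code; library implementations such as scikit-learn's LinearRegression fit the same model with a least-squares solver. The function names are hypothetical.

```python
def fit_linear_regression(X, y):
    """Least-squares fit with an intercept; returns [intercept, w1, ..., wn].

    Assumes X^T X is invertible (no perfectly collinear features).
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    n = len(A[0])
    # Augmented system [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(n)]
        + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        if abs(p) < 1e-12:
            raise ValueError("X^T X is singular; features may be collinear")
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n] for row in M]


def predict_linear(coef, X):
    """Apply intercept + weighted sum of features to each row."""
    return [coef[0] + sum(w * x for w, x in zip(coef[1:], row)) for row in X]


# Toy data generated from y = 1 + 2*x1 + 3*x2, so the fit should recover [1, 2, 3].
X = [[1, 2], [2, 1], [3, 5], [4, 3], [0, 1]]
y = [9, 8, 22, 18, 4]
coef = fit_linear_regression(X, y)
print([round(c, 6) for c in coef])
```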
First, we split the data into 70% training data and 30% test data, and then fitted a Linear Regression model to the training data.
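A 70/30 split can be sketched with the standard library as below. The function name and the fixed seed are hypothetical. The test count is rounded up, as scikit-learn's train_test_split does for a fractional test_size, so the 571 student records would give 399 training rows and 172 test rows.

```python
import math
import random


def split_train_test(X, y, test_size=0.3, seed=42):
    """Shuffle row indices and hold out test_size of them (rounded up) for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    n_test = math.ceil(test_size * len(indices))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def take(seq, idx):
        return [seq[i] for i in idx]

    # Same order as scikit-learn: X_train, X_test, y_train, y_test.
    return take(X, train_idx), take(X, test_idx), take(y, train_idx), take(y, test_idx)


X = [[i] for i in range(571)]
y = list(range(571))
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))  # 399 172
```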
For Model 1:
The accuracy on the training data for the first year was 84.23%. We then checked the test data for NaN values before making predictions on it. The mean squared error on the test data was 0.06 and the accuracy was 81%.
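The two test metrics used in this section can be written out as follows. The mean squared error is measured in squared CGPA units, not as a percentage. The report does not define "accuracy" for a regressor; this sketch assumes it is the coefficient of determination R^2, which is what scikit-learn's score method returns for regression models. The toy values are made up.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared residual, in squared CGPA units."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed meaning of 'accuracy')."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


y_true = [3.0, 3.5, 2.5, 4.0]  # hypothetical CGPAs
y_pred = [3.1, 3.4, 2.7, 3.8]
print(mean_squared_error(y_true, y_pred), r2_score(y_true, y_pred))
```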
For Model 2:
The 2nd model was handled in the same way as the 1st. The accuracy on the training data was 90%. On the test data, the mean squared error was 0.03 and the accuracy was 92%.
For Model 3:
The accuracy on the training data was 92.568%. On the test data, this model had the lowest mean squared error, 0.01, and an accuracy of 97%, the best among all three models.
2. KNN Regressor:
The KNN regressor is a non-parametric model that approximates the relationship between the independent variables and a continuous outcome in an intuitive way: it predicts the outcome for a new observation by averaging the outcomes of its nearest neighbors in the training data.
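A minimal KNN regressor in plain Python is sketched below: it averages the targets of the k training rows closest to the query under Euclidean distance. The report does not state its value of k; the default of 5 here is an assumption (it is also scikit-learn's default n_neighbors). The function name and data are hypothetical.

```python
import math


def knn_predict(X_train, y_train, x, k=5):
    """Average the targets of the k training rows nearest to x (Euclidean distance)."""
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training rows")
    # Sort (distance, target) pairs so the nearest rows come first.
    neighbors = sorted(
        (math.dist(row, x), target) for row, target in zip(X_train, y_train)
    )
    return sum(target for _, target in neighbors[:k]) / k


# Hypothetical grade points for two courses and the resulting CGPAs.
X_train = [[4.0, 3.7], [3.0, 3.3], [2.0, 2.3], [4.0, 4.0]]
y_train = [3.8, 3.1, 2.2, 3.9]
print(knn_predict(X_train, y_train, [3.9, 3.8], k=2))  # mean of 3.8 and 3.9
```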
For Model 1:
We first calculated the accuracy on the training data, which came out as 84.64%. On the test data, the mean squared error was 0.0699 and the accuracy was 79%. We also handled the case of a single input entered by the user. To achieve this, the input was first reshaped with NumPy into one column, letting NumPy infer the number of rows, and then passed to the model for prediction. Note that reshape(-1, 1) yields one column, which suits a single feature; a single record with several course features is instead reshaped with reshape(1, -1), which yields one row.
For Model 2:
The 2nd model was handled in the same way. The accuracy on the training data was 87.7%, while the mean squared error and accuracy on the test data were 0.03 and 91% respectively.
For Model 3:
The accuracy on the training data for this model was 89.3%. On the test data, the accuracy was 95% with a mean squared error of 0.015.
[Figure: accuracy (%) for Model 1, Model 2 and Model 3; bar labels 81, 92 and 97]
PERFORMANCE OF MACHINE LEARNING SYSTEMS:
LINEAR REGRESSION:
The accuracy achieved with linear regression is good in all three cases.
The accuracy of the first model lies between 80% and 90%, whereas models 2 and 3 reach 90% or above on both train and test data. The difference between train and test accuracy is 3.23 percentage points for model 1, 2 percentage points for model 2 and 4.432 percentage points for model 3, all of which are acceptable.
Based on these values, the linear regression models appear to fit the data well: the test accuracies stay close to the train accuracies, so there is no sign of significant overfitting.
KNN REGRESSOR:
The KNN regressor also achieved high accuracies. For model 1, the difference between train and test accuracy is 5.64 percentage points. For model 2, the difference narrows to 3.3 percentage points, and for model 3 it is 5.7 percentage points, which is still small.
All train and test accuracies are above 80% except the test accuracy of model 1 (79%), which we consider a very good result.
Hence, based on these accuracies, the KNN models also appear to fit the data well.
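The train/test gaps discussed in this section can be recomputed directly from the accuracies reported for each model; the short script below does so (the dictionary layout and function name are our own). For example, the Model 3 linear regression gap is 97 - 92.568 = 4.432 percentage points.

```python
# Train and test accuracies (%) as reported for each algorithm and model.
REPORTED = {
    "Linear Regression": {1: (84.23, 81.0), 2: (90.0, 92.0), 3: (92.568, 97.0)},
    "KNN Regressor": {1: (84.64, 79.0), 2: (87.7, 91.0), 3: (89.3, 95.0)},
}


def accuracy_gaps(results):
    """Absolute train/test accuracy difference, in percentage points, per model."""
    return {
        (algorithm, model): abs(train - test)
        for algorithm, models in results.items()
        for model, (train, test) in models.items()
    }


for (algorithm, model), gap in accuracy_gaps(REPORTED).items():
    print(f"{algorithm}, Model {model}: {gap:.3f} percentage points")
```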
The dataset provided to us contains the records of 571 students, which is not a very large number. The high accuracies may partly be explained by the small size of the dataset and by the simplification of the data. Whatever the reason, the predictions on the held-out test data are notably accurate.