
Students Exam Scores Analysis - Ipynb

Python document for students

Uploaded by

priyankanpriya03
Copyright
© All Rights Reserved

{"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":
{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python",
"nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.12.3"},"kaggle":{"accelerator":"none","dataSources":
[{"sourceId":5399169,"sourceType":"datasetVersion","datasetId":3128523}],"dockerImageVersionId":30761,
"isInternetEnabled":false,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_minor":4,"nbformat":4,"cells":
[{"cell_type":"markdown","source":"# Understand the Data\n","metadata":{}},
{"cell_type":"markdown","source":"## Import libraries","metadata":{}},
{"cell_type":"code","source":[
"# type: ignore\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import warnings\n",
"warnings.filterwarnings('ignore')"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"markdown","source":"## Input Data","metadata":{}},
{"cell_type":"code","source":[
"df = pd.read_csv(\"./Expanded_data_with_more_features.csv\", encoding='unicode_escape')\n",
"df.head(2)"
],"metadata":{},"outputs":[],"execution_count":null},
{"cell_type":"code","source":"df.shape","metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"code","source":"df.size","metadata":
{},"outputs":[],"execution_count":null},
{"cell_type":"code","source":"df.info()","metadata":{},"outputs":
[],"execution_count":null},
{"cell_type":"code","source":"df.describe(include='all').T","metadata":
{},"outputs":[],"execution_count":null},
{"cell_type":"code","source":"df.columns","metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"markdown","source":[
"`Data Dictionary`\n",
"\n",
"| Column Name | Description |\n",
"|---|---|\n",
"| **Gender** | Gender of the student (male/female) |\n",
"| **EthnicGroup** | Ethnic group of the student (group A to group E) |\n",
"| **ParentEduc** | Parent(s) education background (from some_highschool to master's degree) |\n",
"| **LunchType** | School lunch type (standard or free/reduced) |\n",
"| **TestPrep** | Test preparation course followed (completed or none) |\n",
"| **ParentMaritalStatus** | Parent(s) marital status (married/single/widowed/divorced) |\n",
"| **PracticeSport** | How often the student practices sport (never/sometimes/regularly) |\n",
"| **IsFirstChild** | Whether the student is the first child in the family (yes/no) |\n",
"| **NrSiblings** | Number of siblings the student has (0 to 7) |\n",
"| **TransportMeans** | Means of transport to school (schoolbus/private) |\n",
"| **WklyStudyHours** | Weekly self-study hours (less than 5 hrs; between 5 and 10 hrs; more than 10 hrs) |\n",
"| **MathScore** | Math test score (0-100) |\n",
"| **ReadingScore** | Reading test score (0-100) |\n",
"| **WritingScore** | Writing test score (0-100) |\n"
],"metadata":{}},
{"cell_type":"markdown","source":"# Data Cleaning","metadata":
{}},{"cell_type":"code","source":"df.columns","metadata":{},"outputs":
[],"execution_count":null},
{"cell_type":"code","source":"df.isnull().sum()","metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"code","source":[
"# fill missing categorical values with explicit placeholder labels\n",
"df.fillna({\n",
"    'EthnicGroup': 'Unknown',\n",
"    'ParentEduc': 'No Edu info',\n",
"    'ParentMaritalStatus': 'No info',\n",
"}, inplace=True)"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"code","source":"df.drop(columns=['Unnamed: 0'], inplace=True)","metadata":{},"outputs":[],"execution_count":null},
{"cell_type":"code","source":"df","metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"code","source":"df.info()","metadata":
{},"outputs":[],"execution_count":null},
{"cell_type":"code","source":"df[df['WklyStudyHours']=='05-Oct']","metadata":
{},"outputs":[],"execution_count":null},{"cell_type":"markdown","source":"## Add New Columns","metadata":{}},
{"cell_type":"code","source":[
"# percentage column: total of the three scores as a share of the 300 possible points\n",
"df['Percentage'] = (df['WritingScore'] + df['MathScore'] + df['ReadingScore']) / 300 * 100\n",
"# round in float64; formatting to text and casting to float16 would lose precision\n",
"df['Percentage'] = df['Percentage'].round(2)"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"code","source":[
"# grade column: map a percentage to a letter grade\n",
"def grade(score):\n",
"    if score >= 80.0:\n",
"        return 'A'\n",
"    elif score >= 60.0:\n",
"        return 'B'\n",
"    elif score >= 40.0:\n",
"        return 'C'\n",
"    elif score >= 30.0:\n",
"        return 'D'\n",
"    else:\n",
"        return 'F'\n"
],"metadata":
{},"outputs":[],"execution_count":null},{"cell_type":"code","source":"df['Grade'] = df['Percentage'].apply(grade)","metadata":{},"outputs":[],"execution_count":null},
{"cell_type":"code","source":"df","metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"code","source":[
"# aggregation spec reused by the group-by tables below\n",
"def all_mean_score_set():\n",
"    return {'MathScore': 'mean', 'ReadingScore': 'mean', 'WritingScore': 'mean'}"
],"metadata":{},"outputs":[],"execution_count":null},
{"cell_type":"markdown","source":"# EDA","metadata":{}},
{"cell_type":"markdown","source":"## Gender","metadata":{}},
{"cell_type":"code","source":[
"gender_count = df['Gender'].value_counts()\n",
"# label each wedge with its percentage and the number of students it represents\n",
"plt.pie(gender_count, labels=gender_count.index,\n",
"        autopct=lambda p: '{:.1f}% ({:,.0f})'.format(p, p * gender_count.sum() / 100))\n",
"plt.title('Gender Distribution')\n",
"plt.show()"
],"metadata":{},"outputs":[],"execution_count":null},
{"cell_type":"code","source":[
"ax = sns.countplot(data=df, x='Gender', hue='Grade',\n",
"                   hue_order=['A', 'B', 'C', 'D', 'F'], palette='viridis')\n",
"\n",
"# annotate each bar with its count\n",
"for container in ax.containers:\n",
"    ax.bar_label(container)\n",
"\n",
"plt.title('Grade Distribution by Gender')\n",
"plt.show()"
],"metadata":{},"outputs":[],"execution_count":null},
{"cell_type":"markdown","source":[
"<div class=\"alert alert-block alert-info\">\n",
"<b>Info:</b> Male and female students are represented in nearly equal numbers.\n",
"</div>"
],"metadata":{}},{"cell_type":"markdown","source":"## Parent Education vs Score","metadata":{}},
{"cell_type":"code","source":[
"par_edu = df.groupby('ParentEduc').agg(all_mean_score_set())\n",
"\n",
"par_edu"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"code","source":[
"df.groupby(['ParentEduc'])[['MathScore', 'ReadingScore', 'WritingScore']].agg('mean') \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"code","source":[
"# Does parental education affect scores differently for each gender?\n",
"df.groupby(['Gender', 'ParentEduc'])[['MathScore', 'ReadingScore', 'WritingScore']].agg('mean') \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},
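{"cell_type":"code","source":[
"g = sns.clustermap(data=par_edu, cmap='viridis', annot=True)\n",
"# clustermap draws several axes, so title the whole figure rather than the current axes\n",
"g.figure.suptitle('Relationship between Student Scores and Parent Education', fontsize=19, y=1.02)\n",
"plt.show()"
],"metadata":{},"outputs":[],"execution_count":null},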
{"cell_type":"markdown","source":[
"<div class=\"alert alert-block alert-info\">\n",
"<b>Info:</b> Children of parents with a master's degree tend to have higher average scores.\n",
"</div>"
],"metadata":{}},{"cell_type":"markdown","source":"## Parent Marital Status vs Score","metadata":{}},
{"cell_type":"code","source":[
"par_mar = df.groupby('ParentMaritalStatus').agg(all_mean_score_set())\n",
"\n",
"par_mar.style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},
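{"cell_type":"code","source":[
"g = sns.clustermap(data=par_mar, cmap='viridis', annot=True)\n",
"# clustermap draws several axes, so title the whole figure rather than the current axes\n",
"g.figure.suptitle('Relationship between Student Scores and Parent Marital Status', fontsize=19, y=1.02)\n",
"plt.show()"
],"metadata":{},"outputs":[],"execution_count":null},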
{"cell_type":"markdown","source":[
"<div class=\"alert alert-block alert-info\">\n",
"<b>Info:</b> Average scores vary only slightly with parents' marital status.\n",
"</div>"
],"metadata":{}},
{"cell_type":"markdown","source":"## All Scores","metadata":{}},
{"cell_type":"code","source":[
"# df[df[\"ReadingScore\"] < 10].count()\n",
"# one horizontal box plot per subject, side by side\n",
"fig = plt.figure(figsize=(20, 5))\n",
"\n",
"for index, col in enumerate(['MathScore', 'ReadingScore', 'WritingScore']):\n",
"    ax = fig.add_subplot(1, 3, index + 1)\n",
"    sns.boxplot(x=df[col], ax=ax)\n",
"\n",
"plt.show()"
],"metadata":
{},"outputs":[],"execution_count":null},{"cell_type":"code","source":[
"# boxen plot per subject, with dashed reference lines every 20 points\n",
"for col, palette, name in [('MathScore', 'Set2', 'Math'),\n",
"                           ('ReadingScore', 'Set1', 'Reading'),\n",
"                           ('WritingScore', 'Set3', 'Writing')]:\n",
"    sns.catplot(data=df, kind='boxen', x=col, palette=palette)\n",
"    plt.title(f'{name} Boxen Plot')\n",
"    for x in [20, 40, 60, 80, 100]:\n",
"        plt.axvline(x=x, color='black', linestyle='--', linewidth=0.7)\n",
"\n",
"plt.show()"
],"metadata":{},"outputs":[],"execution_count":null},
{"cell_type":"markdown","source":"## Ethnic group vs Score","metadata":{}},
{"cell_type":"code","source":[
"group_counts = df['EthnicGroup'].value_counts()\n",
"labels = group_counts.index\n",
"\n",
"plt.pie(group_counts, labels=labels, autopct='%1.1f%%')\n",
"plt.title('Ethnic Groups')\n",
"plt.show()"
],"metadata":
{},"outputs":[],"execution_count":null},
{"cell_type":"code","source":"df.columns","metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"markdown","source":"## Sport vs Score","metadata":{}},
{"cell_type":"code","source":[
"sport = df.groupby(['PracticeSport']).agg(all_mean_score_set())\n",
"\n",
"sport.style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"code","source":"sns.clustermap(data=sport, annot=True)\nplt.show()","metadata":{},"outputs":[],"execution_count":null},
{"cell_type":"markdown","source":"## Test Preparation vs Score","metadata":{}},
{"cell_type":"code","source":[
"df.groupby(['TestPrep']).agg(all_mean_score_set()) \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},
{"cell_type":"code","source":[
"df.groupby(['PracticeSport', 'TestPrep']).agg(all_mean_score_set()) \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"markdown","source":"## Lunch Type vs Score","metadata":{}},
{"cell_type":"code","source":[
"df.groupby(['LunchType', 'Gender']).agg(all_mean_score_set()) \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"markdown","source":"## First Child vs Score","metadata":{}},
{"cell_type":"code","source":[
"df.groupby(['IsFirstChild']).agg(all_mean_score_set()) \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"markdown","source":"## Siblings vs Score","metadata":{}},
{"cell_type":"code","source":[
"df['NrSiblings'].value_counts().sort_index().plot(kind='bar')\n",
"plt.title('Number of Siblings')\n",
"plt.show()"
],"metadata":{},"outputs":
[],"execution_count":null},
{"cell_type":"code","source":[
"df.groupby(['NrSiblings']).agg(all_mean_score_set()) \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"markdown","source":"## Transportation vs Score","metadata":{}},
{"cell_type":"code","source":[
"df.groupby(['TransportMeans']).agg(all_mean_score_set()) \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},
{"cell_type":"code","source":[
"df.groupby(['TransportMeans', 'TestPrep']).agg(all_mean_score_set()) \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},
{"cell_type":"code","source":[
"df.groupby(['TransportMeans', 'PracticeSport']).agg(all_mean_score_set()) \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},
{"cell_type":"code","source":[
"df.groupby(['TransportMeans', 'WklyStudyHours']).agg(all_mean_score_set()) \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"markdown","source":"## Weekly Study Hours vs Score","metadata":{}},
{"cell_type":"code","source":[
"df.groupby(['WklyStudyHours', 'TestPrep']).agg(all_mean_score_set()) \\\n",
"    .style.background_gradient(cmap='RdPu')"
],"metadata":{},"outputs":
[],"execution_count":null},{"cell_type":"code","source":[
"# Compare how the target is distributed across the levels of each categorical feature.\n",
"# Clear shifts in the median between levels suggest the feature is informative for a\n",
"# linear regression on one-hot encoded categories; overlapping boxes suggest it adds little.\n",
"target = 'MathScore'\n",
"\n",
"# Identify categorical features\n",
"categorical_features = df.select_dtypes(include=['object']).columns\n",
"\n",
"# Create box plots\n",
"for feature in categorical_features:\n",
"    plt.figure(figsize=(10, 6))\n",
"    sns.boxplot(x=feature, y=target, data=df)\n",
"    plt.title(f'Box Plot of {target} by {feature}')\n",
"    plt.xlabel(feature)\n",
"    plt.ylabel(target)\n",
"    plt.show()"
],"metadata":{},"outputs":[],"execution_count":null}]}
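In the *Data Cleaning* section only `EthnicGroup`, `ParentEduc` and `ParentMaritalStatus` receive placeholder labels. If `df.isnull().sum()` also reports gaps in other categorical columns, those rows are silently left out of every later `groupby` table, because `groupby` drops `NaN` keys by default. A minimal sketch on a small made-up frame; `fill_categorical_gaps` is a hypothetical helper, not part of the notebook, and the `'Unknown'` placeholder is an assumption:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_categorical_gaps(frame, placeholder='Unknown'):
    """Hypothetical helper: fill NaN in every non-numeric column with a placeholder."""
    text_cols = [col for col in frame.columns if not is_numeric_dtype(frame[col])]
    return frame.fillna({col: placeholder for col in text_cols})


# made-up rows, not taken from the dataset
toy = pd.DataFrame({
    'TestPrep': ['none', np.nan, 'completed', 'none'],
    'MathScore': [60, 70, 80, 50],
})

# default groupby drops the NaN key, so the row scoring 70 vanishes from the table
print(toy.groupby('TestPrep')['MathScore'].mean())

# dropna=False keeps a separate NaN group instead
print(toy.groupby('TestPrep', dropna=False)['MathScore'].mean())

# after filling, the row is counted under an explicit 'Unknown' label
filled = fill_categorical_gaps(toy)
print(filled.groupby('TestPrep')['MathScore'].mean())
```

Filling keeps the rows visible under a named group, while `dropna=False` is the lighter option when the raw data should stay untouched.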
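The cell `df[df['WklyStudyHours']=='05-Oct']` locates rows where the "between 5 and 10 hrs" band appears as `05-Oct`, which looks like a spreadsheet program having reinterpreted `5-10` as a date; the notebook inspects those rows but does not repair them. A minimal sketch of one repair, assuming the intended label is `'5 - 10'` and the other two bands are `'< 5'` and `'> 10'` (these strings are assumptions; check `df['WklyStudyHours'].unique()` first). It also makes the column an ordered categorical so tables and plots list the bands from least to most study time:

```python
import pandas as pd

# Assumed labels: verify against df['WklyStudyHours'].unique() before use.
STUDY_BANDS = ['< 5', '5 - 10', '> 10']


def fix_study_hours(series):
    """Map the date-mangled '05-Oct' back to '5 - 10' and order the bands."""
    repaired = series.replace({'05-Oct': '5 - 10'})
    return pd.Categorical(repaired, categories=STUDY_BANDS, ordered=True)


raw = pd.Series(['> 10', '05-Oct', '< 5', '05-Oct'])  # made-up values
fixed = pd.Series(fix_study_hours(raw))
print(fixed.value_counts(sort=False))

# in the notebook this would become:
# df['WklyStudyHours'] = fix_study_hours(df['WklyStudyHours'])
```

Any value not listed in `STUDY_BANDS` becomes `NaN` in the categorical, which is a quick way to spot other mangled labels.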
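Storing the `Percentage` column as `float16` loses precision: `float16` has a 10-bit mantissa, so between 64 and 128 it can only represent multiples of 0.0625, and a value such as 66.67 is stored as 66.6875. A short check with NumPy, followed by a vectorised sketch that takes the row mean of the three score columns (the same quantity as the total divided by 300 and multiplied by 100) and rounds once in `float64`; the scores are made up:

```python
import numpy as np
import pandas as pd

# float16 step size between 64 and 128 is 2**-4 = 0.0625
print(float(np.float16(66.67)))  # 66.6875

scores = pd.DataFrame({  # made-up rows, not from the dataset
    'MathScore': [71, 87, 40],
    'ReadingScore': [71, 90, 40],
    'WritingScore': [74, 91, 40],
})
score_cols = ['MathScore', 'ReadingScore', 'WritingScore']

# mean of three 0-100 scores == total / 300 * 100; keep float64 and round once
scores['Percentage'] = scores[score_cols].mean(axis=1).round(2)
print(scores)
```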
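The `grade()` helper is a chain of `>=` checks against fixed cut-offs. The same rule can be written as a sorted list of cut-offs searched with `bisect.bisect_right`, the pattern used in the Python `bisect` documentation, which keeps all thresholds in one place. A stdlib-only sketch that reuses the notebook's cut-offs (80/60/40/30); `grade_bisect`, `CUTOFFS` and `LETTERS` are illustrative names, and the check against a copy of the original function covers the boundaries:

```python
from bisect import bisect_right

# cut-offs from the notebook's grade() function, in ascending order
CUTOFFS = [30.0, 40.0, 60.0, 80.0]
LETTERS = 'FDCBA'  # one more letter than there are cut-offs


def grade_bisect(score):
    """bisect_right counts how many cut-offs are <= score, which indexes the letter."""
    return LETTERS[bisect_right(CUTOFFS, score)]


def grade(score):  # copy of the notebook's version, for comparison
    if score >= 80.0:
        return 'A'
    elif score >= 60.0:
        return 'B'
    elif score >= 40.0:
        return 'C'
    elif score >= 30.0:
        return 'D'
    else:
        return 'F'


boundary_scores = [0.0, 29.99, 30.0, 39.99, 40.0, 59.99, 60.0, 79.99, 80.0, 100.0]
print([grade_bisect(s) for s in boundary_scores])
assert all(grade_bisect(s) == grade(s) for s in boundary_scores)
```

`bisect_right` (rather than `bisect_left`) is what makes a score exactly on a cut-off fall into the higher grade, matching the `>=` comparisons.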
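The gender pie chart's `autopct` callback turns each wedge's percentage back into a head count. Matplotlib passes only the percentage, so the count has to be reconstructed as `p * total / 100` and rounded to a whole number of students. A plotting-free sketch of that formatter as a reusable factory; the name `make_autopct` is illustrative, and the returned function could be passed as `plt.pie(..., autopct=make_autopct(gender_count))`:

```python
def make_autopct(values):
    """Return an autopct callback that shows the percentage and the rounded count."""
    total = sum(values)

    def autopct(pct):
        count = round(pct * total / 100)
        return f'{pct:.1f}% ({count:,})'

    return autopct


fmt = make_autopct([510, 490])  # made-up counts
print(fmt(51.0))  # 51.0% (510)
print(fmt(49.0))  # 49.0% (490)
```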
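Several conclusions in the notebook, such as the one about parents with a master's degree, rest on comparing group means, and means from small groups are noisy. A minimal sketch that puts the group size next to the means using pandas named aggregation, on a made-up frame; `n_students` is an illustrative column name. Passing `'mean'` as a string also sidesteps the FutureWarning that recent pandas versions raise when `np.mean` is given to a groupby aggregation:

```python
import pandas as pd

toy = pd.DataFrame({  # made-up rows, not from the dataset
    'ParentEduc': ["master's degree", "master's degree",
                   'high school', 'high school', 'high school'],
    'MathScore': [80, 70, 60, 65, 55],
    'ReadingScore': [85, 75, 62, 68, 59],
    'WritingScore': [84, 76, 61, 66, 58],
})

summary = toy.groupby('ParentEduc').agg(
    n_students=('MathScore', 'size'),
    MathScore=('MathScore', 'mean'),
    ReadingScore=('ReadingScore', 'mean'),
    WritingScore=('WritingScore', 'mean'),
).sort_values('MathScore', ascending=False)

print(summary)
```

On the real data the same call with `df` in place of `toy` shows whether the top-scoring group is large enough for its mean to be trusted.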
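The marital-status note draws its conclusion from eyeballing group means; no statistic backs it up. One simple effect-size measure for a categorical feature is eta squared, η² = SS_between / SS_total, the share of the score variance explained by group membership (0 when all group means are equal, 1 when all variation lies between groups). A pandas-only sketch; `eta_squared` is an illustrative helper, it yields an effect size rather than a p-value, and a formal test such as a one-way ANOVA would need an additional library:

```python
import pandas as pd


def eta_squared(frame, group_col, value_col):
    """Share of the variance of value_col explained by the groups in group_col.

    Rows with a missing group label would drop out of SS_between but not
    SS_total, so fill or drop them first (the notebook fills them with fillna).
    """
    values = frame[value_col]
    grand_mean = values.mean()
    group_means = frame.groupby(group_col)[value_col].transform('mean')
    ss_between = ((group_means - grand_mean) ** 2).sum()
    ss_total = ((values - grand_mean) ** 2).sum()
    return ss_between / ss_total


# made-up examples: identical group means vs. fully separated groups
same_means = pd.DataFrame({'Status': ['married', 'married', 'single', 'single'],
                           'MathScore': [60, 80, 60, 80]})
separated = pd.DataFrame({'Status': ['married', 'married', 'single', 'single'],
                          'MathScore': [60, 60, 80, 80]})

print(eta_squared(same_means, 'Status', 'MathScore'))  # 0.0
print(eta_squared(separated, 'Status', 'MathScore'))   # 1.0
# in the notebook: eta_squared(df, 'ParentMaritalStatus', 'MathScore')
```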
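The last cell uses box plots to judge whether the categorical features suit a linear regression. A more direct check is to one-hot encode the categories with `pd.get_dummies`, fit ordinary least squares with `numpy.linalg.lstsq`, and read off R², the share of variance explained. A sketch under those assumptions; `categorical_r2` and the toy data are illustrative, and R² measured on the same rows used for fitting is optimistic compared with a held-out evaluation:

```python
import numpy as np
import pandas as pd


def categorical_r2(frame, features, target):
    """Fit OLS of target on one-hot encoded features and return the in-sample R^2."""
    # drop_first avoids a dummy set that sums to the intercept column
    X = pd.get_dummies(frame[features], drop_first=True, dtype=float)
    X.insert(0, 'intercept', 1.0)
    y = frame[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X.to_numpy(), y, rcond=None)
    residuals = y - X.to_numpy() @ coef
    return 1 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()


toy = pd.DataFrame({  # made-up rows: score is exactly 50, plus 10 for 'completed' prep
    'TestPrep': ['none', 'completed', 'none', 'completed'],
    'LunchType': ['standard', 'standard', 'free/reduced', 'free/reduced'],
    'MathScore': [50, 60, 50, 60],
})

print(categorical_r2(toy, ['TestPrep'], 'MathScore'))   # close to 1: explains everything
print(categorical_r2(toy, ['LunchType'], 'MathScore'))  # close to 0: explains nothing
# in the notebook: categorical_r2(df, ['TestPrep', 'LunchType'], 'MathScore')
```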
