Python Lab 9
Objective
This lab session gives you hands-on practice in data visualisation with Python's
Matplotlib and Seaborn libraries, alongside data manipulation with the Pandas library.
The tasks guide you through creating a range of plots, from basic histograms to
complex heat maps, while reinforcing your data filtering and processing skills with
Pandas. Working through them builds your ability to analyse data and to communicate
findings through visual representations.
• Display the first few rows of the dataset to understand its structure.
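A minimal sketch of this step is shown below. The rows are made-up stand-in values, not taken from the lab dataset, and they are read from an in-memory string only so that the example runs on its own. In the lab, pass the path of the dataset file you were given to `pd.read_csv` instead. The column names are the ones that appear in the figures of this handout.

```python
import io

import pandas as pd

# Hypothetical stand-in rows (invented values); in the lab, replace this
# in-memory text with the path of the provided dataset file.
sample_csv = io.StringIO(
    "age,education-num,occupation,hours-per-week,income\n"
    "34,13,Sales,40,<=50K\n"
    "51,9,Craft-repair,45,>50K\n"
    "27,10,Adm-clerical,38,<=50K\n"
    "45,14,Prof-specialty,50,>50K\n"
    "23,9,Other-service,30,<=50K\n"
    "60,16,Exec-managerial,55,>50K\n"
)
df = pd.read_csv(sample_csv)

# head() returns the first five rows by default; pass a number for more or fewer.
first_rows = df.head()
print(first_rows)

# shape and info() give a quick overview of size, column types and missing values.
print(df.shape)
df.info()
```

Checking `df.info()` at this stage is useful because it shows which columns are numeric and which are text, which determines the kind of plot that suits each one.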
[Figure: Age Distribution histogram; x-axis: Age, y-axis: Frequency]
Middle East University
Faculty of Information Technology
Artificial Intelligence Department
Course: Python Programming 0431201 (Lab No.8 )
Instructor: Dr. Huthaifa Abuhammad
[Figure: Gender Proportion pie chart; Female 33.1%, Male 66.9%]
[Figure: bar chart; x-axis: Income Level (<=50K, >50K), y-axis: Count]
[Figure: bar chart; x-axis: Income Level (<=50K, >50K), y-axis: Number of Individuals]
[Figure: x-axis: Education Level (Number of Years), y-axis: Average Hours per Week]
[Figure: Correlation Heatmap; axis labels: age, fnlwgt, education-num, capital-gain,
capital-loss, hours-per-week. Rows that survive extraction:
age: 1, -0.077, 0.037, 0.078, 0.058, 0.069
fnlwgt: -0.077, 1, -0.043, 0.00043, -0.01, -0.019
education-num: 0.037, -0.043, 1, 0.12, 0.08, 0.15]
• Group and Count: Use Pandas to group the dataset by the 'occupation' column
and count the individuals in each category.
• Sort and Plot: Sort the data for clarity and plot a horizontal bar chart with
Matplotlib, placing occupation names on the y-axis and their frequencies on the
x-axis.
• Label and Save: Ensure your chart is well-labelled with a clear title and axis
labels, and save the figure as a PDF file for your records.
This task will enable you to practice data grouping, sorting, and creating an
informative bar chart that reveals the occupational diversity in the dataset.
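The three steps above can be sketched as follows. This is one possible approach, not the required solution. The small DataFrame is a made-up stand-in for the lab dataset, and the output file name `occupation_distribution.pdf` is an assumption chosen for illustration.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; in the lab, use the DataFrame loaded from the dataset.
df = pd.DataFrame(
    {
        "occupation": [
            "Sales", "Sales", "Tech-support",
            "Prof-specialty", "Prof-specialty", "Prof-specialty",
        ]
    }
)

# Group and count: number of individuals per occupation.
# Sort ascending: barh draws the first entry at the bottom, so the largest
# category ends up at the top of the chart.
occupation_counts = df.groupby("occupation").size().sort_values()

# Plot: occupation names on the y-axis, frequencies on the x-axis.
fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(occupation_counts.index, occupation_counts.values)

# Label: title and axis labels.
ax.set_title("Occupation Distribution")
ax.set_xlabel("Count")
ax.set_ylabel("Occupation")
fig.tight_layout()

# Save: the file extension tells Matplotlib to write a PDF.
output_path = Path("occupation_distribution.pdf")  # assumed file name
fig.savefig(output_path)
plt.close(fig)
```

`df["occupation"].value_counts()` would give the same counts in one call. The `groupby` form is used here because the task asks for grouping explicitly.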
[Figure: Occupation Distribution horizontal bar chart; y-axis: Occupation, x-axis: Count.
Categories from top to bottom: Armed-Forces, Priv-house-serv, Protective-serv,
Tech-support, Farming-fishing, Handlers-cleaners, Transport-moving, ?,
Machine-op-inspct, Other-service, Sales, Adm-clerical, Exec-managerial, Craft-repair,
Prof-specialty]
[Figure: Hours per Week by Gender; x-axis: Gender (Male, Female), y-axis: Hours per Week]
• Select Variables: Choose a set of variables from the dataset that might have
interesting relationships.
• Data Preparation: Ensure the selected data is clean and properly formatted for
visualisation. This may involve converting data types or handling missing values.
• Create the Pairplot: Use Seaborn's 'pairplot' function to create a grid of scatter
plots for each pair of variables. By default it draws scatter plots off the diagonal
and a distribution plot for each variable on the diagonal (a histogram, or a density
curve when a hue variable is set).
• Analysis and Interpretation: Observe the scatter plots to identify any noticeable
relationships, trends, or outliers among the selected variables.
• Save the Visualisation: Save your pairplot in a suitable format for your report
or presentation.
By completing this task, you will gain experience in creating complex visualisations
that can reveal intricate relationships between multiple data dimensions.
[Figure: Pairplot of age, education-num and hours-per-week, coloured by income
(<=50K, >50K)]