Python Lab 9

Uploaded by soudi1070
Middle East University
Faculty of Information Technology
Artificial Intelligence Department
Course: Python Programming 0431201 (Lab No. 8)
Instructor: Dr. Huthaifa Abuhammad

Objective
This lab session immerses you in the practicalities of data visualisation using
Python's Matplotlib and Seaborn libraries, alongside data manipulation with the
Pandas library. The tasks below are designed to strengthen your ability to visualise
and interpret data. They guide you through creating a range of plots, from basic
histograms to complex heatmaps, while reinforcing your data filtering and processing
skills with Pandas. This hands-on practice helps you develop a keen eye for data
analysis and for storytelling through visual representations.

Python Programming Lab: Data Visualisation Tasks


Task 1: Basic Data Overview
• Load the dataset from the attached Adult Census Income file using Pandas.

• Display the first few rows of the dataset to understand its structure.

• Generate a basic summary of the dataset (e.g., using describe()).
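A minimal sketch of Task 1, assuming the attached file is comma-separated with a header row whose column names match those used in this handout's plots (age, education-num, hours-per-week, income, and so on). The helper names load_census and overview are hypothetical, and skipinitialspace=True assumes each value is preceded by a blank after the comma, as in the raw UCI release of this dataset.

```python
import pandas as pd


def load_census(path_or_buffer):
    """Read the census CSV into a DataFrame.

    skipinitialspace=True drops the blank after each comma, which the raw
    UCI-style file contains (an assumption about the attached file).
    """
    return pd.read_csv(path_or_buffer, skipinitialspace=True)


def overview(df, n=5):
    """Print the first n rows and the describe() summary, and return both."""
    head = df.head(n)
    summary = df.describe()  # numeric columns only by default
    print(head)
    print(summary)
    return head, summary
```

Usage, with a placeholder file name: df = load_census("adult.csv") followed by overview(df). Calling df.describe(include="all") also summarises the text columns such as occupation and income.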

Task 2: Age Distribution


• Create a histogram to visualise the age distribution of individuals in the dataset.
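A minimal sketch of the age histogram, assuming df is the DataFrame loaded in Task 1 and that the column is named age. The function name is hypothetical and bins=20 is an arbitrary choice; changing it changes the bar heights, not the overall shape.

```python
import matplotlib.pyplot as plt


def plot_age_histogram(df, bins=20):
    """Histogram of the 'age' column; bins=20 is an arbitrary choice."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(df["age"], bins=bins, edgecolor="black")
    ax.set_title("Age Distribution")
    ax.set_xlabel("Age")
    ax.set_ylabel("Frequency")
    return fig, ax
```

In a notebook or script, call fig, ax = plot_age_histogram(df) and then plt.show() to display the figure.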

Figure: Age Distribution (histogram; x-axis: Age, y-axis: Frequency)


Task 3: Gender Proportion


Create a pie chart to display the proportion of males and females in the dataset.
Summarise the gender data to find the relative frequency of each gender, then represent
these proportions in a pie chart. The size of each slice corresponds to the percentage
of the dataset that each gender represents, giving a clear and immediate view of the
gender distribution.
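A minimal sketch of the gender pie chart, assuming df comes from Task 1 and that the gender column is called sex, as in the UCI release (adjust the column argument if the attached file uses another name). The function name is hypothetical; it also returns the relative frequencies so the percentages on the slices can be checked numerically.

```python
import matplotlib.pyplot as plt


def plot_gender_pie(df, column="sex"):
    """Pie chart of gender shares; 'sex' is the assumed column name."""
    counts = df[column].value_counts()
    proportions = counts / counts.sum()  # relative frequency of each gender
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.pie(counts.values, labels=list(counts.index),
           autopct="%1.1f%%", startangle=90)
    ax.set_title("Gender Proportion")
    ax.axis("equal")  # keep the pie circular
    return fig, ax, proportions
```

Equivalently, df[column].value_counts(normalize=True) returns the proportions directly.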

Figure: Gender Proportion (pie chart; Male 66.9%, Female 33.1%)

Task 4: Income Level Analysis


• Analyse the distribution of income levels by creating a bar chart that compares
the count of individuals with incomes <= 50K and > 50K, offering a clear visualisation
of the income distribution in the dataset.

• Deepen your analysis by applying a filter based on a specific criterion, such as
education level, and create a comparative bar chart for this subset. This step will
help you explore the relationship between the chosen criterion and income levels,
providing insights into socio-economic patterns within the dataset.
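A minimal sketch of both parts of Task 4, assuming df comes from Task 1, the income labels are <=50K and >50K as shown in the figures, and the education column is named education with values such as Bachelors. Both function names are hypothetical; plot_income_for applies the filter and reuses the first function for the subset.

```python
import matplotlib.pyplot as plt


def plot_income_counts(df, title="Income Level Counts", ylabel="Count"):
    """Bar chart of how many rows fall in each 'income' class."""
    counts = df["income"].value_counts().sort_index()  # '<=50K' sorts before '>50K'
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_title(title)
    ax.set_xlabel("Income Level")
    ax.set_ylabel(ylabel)
    return fig, ax, counts


def plot_income_for(df, column, value):
    """Filter on one criterion (e.g. education == 'Bachelors') and re-plot."""
    subset = df[df[column] == value]
    title = f"Income Level Counts for {column} = {value}"
    return plot_income_counts(subset, title=title, ylabel="Number of Individuals")
```

For example, plot_income_for(df, "education", "Bachelors") produces the second chart; swapping in another column and value explores a different criterion.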


Figure: Income Level Counts (bar chart; x-axis: Income Level, <=50K and >50K; y-axis: Count)

Figure: Income Level Counts for Individuals with Bachelors Degree (bar chart; x-axis: Income Level, <=50K and >50K; y-axis: Number of Individuals)


Task 5: Education and Hours-per-Week Relationship


Generate a scatter plot to explore the link between education level and work
commitment. The plot maps education level (expressed as the number of years of
schooling) against the average hours worked per week. Such a visualisation can reveal
trends or patterns, highlighting how working hours vary with educational attainment.
This task combines quantitative data analysis with visual interpretation, fostering a
deeper understanding of the interplay between education and professional life.
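A minimal sketch of Task 5, assuming df comes from Task 1 and the columns are named education-num and hours-per-week, as in the correlation heatmap. In the UCI data, education-num is an ordinal code from 1 (Preschool) to 16 (Doctorate), which this handout reads as years of schooling. The function name is hypothetical; the average is taken per education level with groupby before plotting.

```python
import matplotlib.pyplot as plt


def plot_education_vs_hours(df):
    """Scatter of mean 'hours-per-week' for each 'education-num' value."""
    avg = df.groupby("education-num")["hours-per-week"].mean()
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(avg.index, avg.values)
    ax.set_title("Education Level vs. Average Hours per Week")
    ax.set_xlabel("Education Level (Number of Years)")
    ax.set_ylabel("Average Hours per Week")
    return fig, ax, avg
```

Plotting the per-level means gives one point per education level; plotting the raw rows instead would produce heavily overlapping vertical stripes.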

Figure: Education Level vs. Average Hours per Week (scatter plot; x-axis: Education Level (Number of Years), y-axis: Average Hours per Week)

Task 6: Correlation Heatmap


This task involves selecting a set of numerical variables from the dataset and creating
a heatmap to visualise their correlations. A heatmap is a powerful tool for uncovering
relationships between variables because it uses colour intensity to represent correlation
coefficients. By completing this task, you will gain insight into how different numerical
factors are interrelated, which is essential for understanding complex datasets and can
guide further data analysis or modelling.
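A minimal sketch of Task 6, assuming df comes from Task 1 and using the six numeric columns that appear in the heatmap below. The function name and the coolwarm colour map are illustrative choices; DataFrame.corr() computes Pearson coefficients by default, and annot=True writes each coefficient in its cell.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Numeric columns as they appear in the handout's heatmap
NUMERIC_COLUMNS = ("age", "fnlwgt", "education-num",
                   "capital-gain", "capital-loss", "hours-per-week")


def plot_correlation_heatmap(df, columns=NUMERIC_COLUMNS):
    """Annotated heatmap of the Pearson correlation matrix of `columns`."""
    corr = df[list(columns)].corr()  # Pearson is pandas' default method
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, annot=True, cmap="coolwarm", ax=ax)
    ax.set_title("Correlation Heatmap")
    fig.tight_layout()
    return fig, ax, corr
```

The matrix is symmetric with ones on the diagonal, so only one triangle carries new information; the returned corr DataFrame can be inspected directly for the strongest pairs.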


Figure: Correlation Heatmap (colour scale from 0.0 to 1.0; annotated coefficients reproduced below)

                  age      fnlwgt   education-num  capital-gain  capital-loss  hours-per-week
age               1        -0.077   0.037          0.078         0.058         0.069
fnlwgt            -0.077   1        -0.043         0.00043       -0.01         -0.019
education-num     0.037    -0.043   1              0.12          0.08          0.15
capital-gain      0.078    0.00043  0.12           1             -0.032        0.078
capital-loss      0.058    -0.01    0.08           -0.032        1             0.054
hours-per-week    0.069    -0.019   0.15           0.078         0.054         1

Task 7: Occupation Distribution


Create a horizontal bar chart to analyse the distribution of individuals across various
occupations in the dataset. This visualisation helps compare the frequency of each
occupation, offering insights into the most and least common professions. Here’s how
you can approach it:

• Group and Count: Use Pandas to group the dataset by the ’occupation’ column
and count the individuals in each category.

• Sort and Plot: Sort the data for clarity and plot a horizontal bar chart with
Matplotlib, placing occupation names on the y-axis and their frequencies on the
x-axis.

• Label and Save: Ensure your chart is well-labelled with a clear title and axis
labels, and save the figure as a PDF file for your records.

This task will enable you to practise data grouping, sorting, and creating an
informative bar chart that reveals the occupational diversity in the dataset.
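A minimal sketch of the three steps above, assuming df comes from Task 1 and the column is named occupation. The function name and the default PDF file name are hypothetical. Note that the raw data marks unknown occupations with ?, which therefore appears as its own bar unless you filter it out first.

```python
import matplotlib.pyplot as plt


def plot_occupation_distribution(df, pdf_path="occupation_distribution.pdf"):
    """Horizontal bar chart of occupation counts, saved as a PDF."""
    # Group and count, then sort ascending so the largest bar is drawn at the top
    counts = df.groupby("occupation").size().sort_values()
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.barh(counts.index.astype(str), counts.values)
    ax.set_title("Occupation Distribution")
    ax.set_xlabel("Count")
    ax.set_ylabel("Occupation")
    fig.tight_layout()
    fig.savefig(pdf_path)  # the .pdf extension makes Matplotlib write a PDF
    return fig, ax, counts
```

Sorting in descending order instead puts the most common occupation at the bottom, as in the figure below; either order makes the ranking easy to read.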


Figure: Occupation Distribution (horizontal bar chart; y-axis: Occupation, 15 categories from Armed-Forces to Prof-specialty, including ?; x-axis: Count)

Task 8: Box Plot for Hours-per-Week by Gender


Create a box plot to compare the distribution of hours worked per week between genders.
This task will help you understand variations in work hours across gender lines. Here's
how to proceed:

• Filter and Categorise: Use Pandas to filter the data by gender and categorise the
hours worked per week.

• Create the Box Plot: Use Matplotlib or Seaborn to create a box plot that displays
the distribution of weekly work hours for each gender.

• Interpret the Plot: Analyse the plot to understand work-hour differences. Look for
features such as the median, quartiles, and potential outliers.

• Save Your Work: Save the plot as a PDF file for easy sharing and future reference.

This exercise will enhance your skills in data visualisation, focusing on using box
plots for comparative analysis.
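A minimal sketch of Task 8 with Matplotlib, assuming df comes from Task 1, the gender column is named sex (as in the UCI release), and hours are in hours-per-week. The function name and PDF file name are hypothetical. It also returns a per-gender describe() table, whose 25%, 50% and 75% rows are the quartiles and median that the boxes draw.

```python
import matplotlib.pyplot as plt


def plot_hours_by_gender(df, column="sex", pdf_path="hours_by_gender.pdf"):
    """Box plot of 'hours-per-week' per gender, saved as a PDF.

    Returns the quartile summary so the plot can be read alongside numbers.
    """
    genders = list(df[column].value_counts().index)  # most frequent first
    # Filter: one array of weekly hours per gender
    data = [df.loc[df[column] == g, "hours-per-week"].to_numpy() for g in genders]
    fig, ax = plt.subplots(figsize=(6, 5))
    ax.boxplot(data)  # whiskers default to 1.5 * IQR; points beyond are outliers
    ax.set_xticks(range(1, len(genders) + 1))
    ax.set_xticklabels(genders)
    ax.set_title("Hours per Week by Gender")
    ax.set_xlabel("Gender")
    ax.set_ylabel("Hours per Week")
    fig.savefig(pdf_path)
    # Median (50%), quartiles (25%, 75%) and extremes per group
    summary = df.groupby(column)["hours-per-week"].describe()
    return fig, ax, summary
```

With Seaborn, sns.boxplot(data=df, x="sex", y="hours-per-week") draws the same comparison in one call.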


Figure: Hours per Week by Gender (box plots for Male and Female; x-axis: Gender, y-axis: Hours per Week)

Task 9: Pairplot of Select Features


This task involves selecting a few variables from the dataset, such as age, education,
hours per week, and income, and creating a Seaborn pairplot to visualise their pairwise
relationships. Pairplots are helpful for spotting trends, correlations, and patterns in
data. Follow these steps:

• Select Variables: Choose a set of variables from the dataset that might have
interesting relationships.

• Data Preparation: Ensure the selected data is clean and properly formatted for
visualisation. This may involve converting data types or handling missing values.

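• Create the Pairplot: Use Seaborn's pairplot() function to create a grid of plots
for each pair of variables. It draws scatter plots off the diagonal and, on the
diagonal, the distribution of each variable (histograms by default, or density
curves when a hue variable is set unless you pass diag_kind="hist").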

• Customisation: Customise your pairplot with different colours or styles to make
distinct categories, such as income levels, stand out.

• Analysis and Interpretation: Observe the scatter plots to identify any notice-
able relationships, trends, or outliers among the selected variables.


• Save the Visualisation: Save your pairplot in a suitable format for your report
or presentation.

By completing this task, you will gain experience in creating complex visualisations
that can reveal intricate relationships between multiple data dimensions.

Figure: Pairplot of age, education-num and hours-per-week, coloured by income (<=50K, >50K)

Task 10: Custom Visualisation Challenge


• Create a custom visualisation that combines multiple plot types or explores an
aspect of the dataset not covered in the previous tasks.
• Explain your choice of visualisation and any insights you gleaned from the data.
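One possible starting point, offered only as an illustration: a two-panel figure that combines overlaid histograms with box plots to compare ages across income levels. It assumes df comes from Task 1 with age and income columns; the function name is hypothetical, and any other column or pairing of plot types would serve the task equally well.

```python
import matplotlib.pyplot as plt


def plot_age_by_income(df):
    """Two panels: overlaid age histograms and age box plots per income class."""
    classes = sorted(df["income"].dropna().unique())
    groups = [df.loc[df["income"] == c, "age"].to_numpy() for c in classes]
    fig, (left, right) = plt.subplots(1, 2, figsize=(12, 5))
    # Left panel: semi-transparent histograms so both classes stay visible
    for label, ages in zip(classes, groups):
        left.hist(ages, bins=20, alpha=0.5, label=label)
    left.set_title("Age Distribution by Income")
    left.set_xlabel("Age")
    left.set_ylabel("Frequency")
    left.legend(title="Income")
    # Right panel: box plots summarise the same groups by median and quartiles
    right.boxplot(groups)
    right.set_xticks(range(1, len(classes) + 1))
    right.set_xticklabels(classes)
    right.set_title("Age by Income Level")
    right.set_xlabel("Income Level")
    right.set_ylabel("Age")
    fig.tight_layout()
    return fig, (left, right)
```

Placing the two views side by side lets the histogram show the shape of each distribution while the box plot makes the difference in medians easy to read.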
