DS_Practical_01
Enrolment No.: 210430116121
Experiment No: 1
Date:
AIM: Exploratory data analysis using matplotlib and seaborn in Python
Introduction:
Data exploration and visualization are important steps in the data analysis process. In this lab,
students will learn how to explore and visualize data using mathematical and statistical tools such
as histograms, box plots, scatter plots, and correlation matrices. Students will also learn how to use
matplotlib & Seaborn to perform these analyses.
Objectives:
Materials:
Procedure:
Example:
Age  Gender  Income  Education Level  Marital Status  Employment Status  Industry
32   Female  45000   Bachelor's       Single          Employed           Technology
Information Technology
Enrolment No.: 210430116121
This dataset includes information on age, gender, income, education level, marital status,
employment status, and industry for a sample of 25 individuals. This data could be used to explore
and visualize various relationships and patterns, such as the relationship between age and income,
or the distribution of income by education level. A few relationships and patterns that could be
explored and visualized using this sample dataset:
1. Relationship between age and income: Create a scatter plot to see if there is a relationship
between age and income. Also calculate the correlation coefficient to determine the
strength and direction of the relationship.
2. Distribution of income by gender: Create a box plot to compare the distribution of income
between males and females. This could reveal any differences in the median, quartiles, and
outliers for each gender.
3. Distribution of income by education level: Create a box plot to compare the distribution of
income for each level of education. This could reveal any differences in the median,
quartiles, and outliers for each education level.
4. Relationship between age and education level: Create a histogram to see the distribution of
ages for each education level. This could reveal any differences or similarities in the age
distribution across education levels.
Observations and output:
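import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# df is assumed to already hold the sample dataset as a pandas DataFrame
dataframe = pd.DataFrame(df, columns=['Income', 'Age'])
matrix = dataframe.corr()
print(matrix)
# numeric_only=True skips text columns such as Gender (requires pandas >= 1.5)
heatmap = sns.heatmap(df.corr(numeric_only=True), vmin=-1, vmax=1, annot=True)
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 12}, pad=12)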
1. Relationship between age and income: Create a scatter plot to see if there is a relationship
between age and income. Also calculate the correlation coefficient to determine the strength
and direction of the relationship.
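The task above can be sketched as follows. This is a minimal illustration, not the recorded lab output: the DataFrame is a hypothetical stand-in (only the first row, age 32 and income 45000, comes from the example table), and the column names `Age` and `Income` are assumed to match the real dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in rows; replace with the actual 25-person dataset.
df = pd.DataFrame({
    "Age": [32, 25, 41, 50, 58],
    "Income": [45000, 30000, 52000, 61000, 70000],
})

# Scatter plot of income against age.
fig, ax = plt.subplots(figsize=(5, 5))
sns.scatterplot(data=df, x="Age", y="Income", ax=ax)
ax.set_title("Age vs. Income")

# Pearson correlation coefficient between the two columns.
# Values near +1 or -1 indicate a strong linear relationship,
# values near 0 indicate little or no linear relationship.
r = df["Age"].corr(df["Income"])
print(f"Pearson r = {r:.3f}")

plt.show()
```

The sign of `r` gives the direction of the relationship and its magnitude gives the strength. The scatter plot is still worth inspecting, since a single coefficient can hide outliers or a non-linear pattern.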
2. Distribution of income by gender: Create a box plot to compare the distribution of income
between males and females. This could reveal any differences in the median, quartiles, and
outliers for each gender.
f, ax = plt.subplots(figsize=(5, 5))
sns.boxplot(data=df, x='Gender', y='Income')
plt.show()
3. Distribution of income by education level: Create a box plot to compare the distribution
of income for each level of education. This could reveal any differences in the median, quartiles,
and outliers for each education level.
f, ax = plt.subplots(figsize=(5, 5))
# Column name follows the table header; adjust it if your DataFrame spells it differently
sns.boxplot(data=df, x='Education Level', y='Income')
plt.show()
4. Relationship between age and education level: Create a histogram to see the distribution
of ages for each education level. This could reveal any differences or similarities in the age
distribution across education levels.
Conclusion:
In this lab, students learned how to explore and visualize data using mathematical and statistical
tools such as histograms, box plots, scatter plots, and correlation matrices. These tools are useful
for identifying patterns and relationships in data, and for making informed decisions based on data
analysis. The skills learned in this lab will be helpful in students' future studies and careers
in data analysis.
Quiz: (Sufficient space to be provided for the answers or use extra file pages to write answers)
1. What are the measures of central tendency? Provide examples and explain when each
measure is appropriate to use.
2. How can you calculate the correlation coefficient between two variables using
mathematical and statistical tools? Interpret the correlation coefficient value.
3. Explain the concept of skewness and kurtosis in statistics. How can you measure and
interpret these measures using mathematical and statistical tools?
Suggested References:
1. "Python for Data Analysis" by Wes McKinney