
DS_Practical_01

The document outlines an experiment focused on exploratory data analysis using Python's matplotlib and Seaborn libraries. Students will learn to visualize data through various statistical tools and present their findings effectively. The lab includes procedures for creating histograms, box plots, scatter plots, and correlation matrices, using a sample dataset to explore relationships among variables such as age, income, and education level.

Uploaded by

Armankhan Pathan
Copyright: © All Rights Reserved

Enrolment No.: 210430116121

Experiment No: 1

Date:

AIM: Exploratory data analysis using matplotlib and seaborn in Python

Introduction:

Data exploration and visualization are important steps in the data analysis process. In this lab,
students will learn how to explore and visualize data using mathematical and statistical tools such
as histograms, box plots, scatter plots, and correlation matrices. Students will also learn how to use
matplotlib & Seaborn to perform these analyses.

Relevant CO: CO1, CO2

Objectives:

1. To understand the importance of data exploration and visualization in data analysis.
2. To learn how to use Seaborn to create histograms, box plots, scatter plots, and correlation
matrices.
3. To interpret the results of these analyses and draw conclusions from them.
4. To present the results of these analyses in a clear and effective manner.

Materials:

- A computer with Jupyter Notebook installed.
- Sample dataset provided by subject faculty or shown below.

Procedure:

1. Open Jupyter Notebook and create a new notebook.
2. Load the sample dataset provided by your faculty into the notebook using pandas.
3. Use Seaborn to create a histogram for each column in the dataset.
4. Use Seaborn to create a box plot of the Age column grouped by Gender.
5. Create a scatter plot for two variables in the dataset using Seaborn's scatterplot function.
6. Create a correlation matrix for all numeric variables in the dataset and display it using Seaborn.
7. Interpret the results of these analyses and draw conclusions from them.
8. Present the results of these analyses in a clear and effective manner using charts and graphs.
9. Submit the completed workbook and presentation to the faculty for grading.

Example:
Age  Gender  Income  Education Level  Marital Status  Employment Status  Industry
32   Female  45000   Bachelor's       Single          Employed           Technology
45   Male    65000   Master's         Married         Employed           Finance
28   Female  35000   High School      Single          Unemployed         None
52   Male    80000   Doctorate        Married         Employed           Education
36   Female  55000   Bachelor's       Divorced        Employed           Healthcare
40   Male    70000   Bachelor's       Married         Self-Employed      Consulting
29   Female  40000   Associate's      Single          Employed           Retail
55   Male    90000   Master's         Married         Employed           Engineering
33   Female  47000   Bachelor's       Single          Employed           Government
47   Male    75000   Bachelor's       Married         Self-Employed      Entertainment
41   Female  60000   Master's         Single          Employed           Nonprofit
38   Male    52000   High School      Divorced        Employed           Construction
31   Female  48000   Bachelor's       Married         Employed           Technology
49   Male    85000   Doctorate        Married         Employed           Finance
27   Female  30000   High School      Single          Unemployed         None
54   Male    92000   Master's         Married         Employed           Education
39   Female  58000   Bachelor's       Married         Self-Employed      Consulting
30   Male    42000   Associate's      Single          Employed           Retail
56   Female  96000   Doctorate        Married         Employed           Healthcare
35   Male    55000   Bachelor's       Single          Employed           Government
48   Female  73000   Bachelor's       Married         Self-Employed      Entertainment
42   Male    65000   Master's         Divorced        Employed           Nonprofit
37   Female  50000   High School      Married         Employed           Construction
34   Male    49000   Bachelor's       Single          Unemployed         None
51   Female  82000   Master's         Married         Employed           Engineering

This dataset includes information on age, gender, income, education level, marital status,
employment status, and industry for a sample of 25 individuals. This data could be used to explore
and visualize various relationships and patterns, such as the relationship between age and income,
or the distribution of income by education level. The following relationships and patterns could be
explored and visualized using this sample dataset:

1. Relationship between age and income: Create a scatter plot to see if there is a relationship
between age and income. Also calculate the correlation coefficient to determine the
strength and direction of the relationship.


2. Distribution of income by gender: Create a box plot to compare the distribution of income
between males and females. This could reveal any differences in the median, quartiles, and
outliers for each gender.

3. Distribution of income by education level: Create a box plot to compare the distribution of
income for each level of education. This could reveal any differences in the median,
quartiles, and outliers for each education level.

4. Relationship between age and education level: Create a histogram to see the distribution of
ages for each education level. This could reveal any differences or similarities in the age
distribution across education levels.
Observations and output:

import pandas as pd

# Load the sample dataset from the Excel workbook and display it
df = pd.read_excel(r"practical1.xlsx")
df
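If the Excel workbook is not at hand, the same table can be typed directly into the notebook. The sketch below is only an illustration: it builds a DataFrame from the first five rows of the sample table and checks which columns pandas treats as numeric. The name sample_df is made up for this example, and the lowercase column names mirror those used in the code of this practical, which is an assumption about the headers in practical1.xlsx.

```python
import pandas as pd

# First five rows of the sample table, typed in by hand (illustrative subset).
# Column names are assumed to match the headers used elsewhere in this practical.
rows = [
    (32, "Female", 45000, "Bachelor's", "Single", "Employed", "Technology"),
    (45, "Male", 65000, "Master's", "Married", "Employed", "Finance"),
    (28, "Female", 35000, "High School", "Single", "Unemployed", "None"),
    (52, "Male", 80000, "Doctorate", "Married", "Employed", "Education"),
    (36, "Female", 55000, "Bachelor's", "Divorced", "Employed", "Healthcare"),
]
columns = ["Age", "Gender", "Income", "education level",
           "marital status", "employment status", "industry"]
sample_df = pd.DataFrame(rows, columns=columns)

# Only numeric columns can enter a correlation matrix
numeric_columns = sample_df.select_dtypes(include="number").columns.tolist()
print(sample_df.shape)
print(numeric_columns)
```

Checking the numeric columns early helps explain why only Age and Income appear in the correlation matrix later on.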

Histogram for each column in the dataset.

import seaborn as sns
import matplotlib.pyplot as plt

# Import filterwarnings to suppress warnings; it has no effect until
# filterwarnings("ignore") is called.
from warnings import filterwarnings


# One histogram per column, laid out on a 4 x 2 grid of axes
f, ax = plt.subplots(4, 2, figsize=(15, 15))
sns.histplot(df['Age'], ax=ax[0, 0])
sns.histplot(df['Gender'], ax=ax[0, 1])
sns.histplot(df['Income'], ax=ax[1, 0])
sns.histplot(df['education level'], ax=ax[1, 1])
sns.histplot(df['marital status'], ax=ax[2, 0])
sns.histplot(df['employment status'], ax=ax[2, 1])
sns.histplot(df['industry'], ax=ax[3, 0])
plt.tight_layout()
plt.show()
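A histogram counts how many observations fall into each bin, and for a text column such as Gender it counts each category. As a rough illustration of what the plots above display, the following sketch tallies the 25 ages from the sample table into 10-year bins and counts the genders by hand. The 10-year bin width is an arbitrary choice for this example; sns.histplot picks its own bins unless told otherwise.

```python
from collections import Counter

# Age column copied from the sample table (25 rows)
ages = [32, 45, 28, 52, 36, 40, 29, 55, 33, 47, 41, 38, 31,
        49, 27, 54, 39, 30, 56, 35, 48, 42, 37, 34, 51]
# Rows in the sample table alternate Female, Male and end on Female
genders = ["Female", "Male"] * 12 + ["Female"]

BIN_WIDTH = 10  # assumed bin width for this illustration

# Map each age to the lower edge of its bin, e.g. 36 -> 30
age_bins = Counter((age // BIN_WIDTH) * BIN_WIDTH for age in ages)
gender_counts = Counter(genders)

# Print a small text histogram, one row per bin
for lower in sorted(age_bins):
    print(f"{lower}-{lower + BIN_WIDTH - 1}: {'#' * age_bins[lower]}")
print(dict(gender_counts))
```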


Correlation matrix for Age and Income

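# Correlation between the two numeric columns
dataframe = pd.DataFrame(df, columns=['Income', 'Age'])
matrix = dataframe.corr()
print(matrix)
# Plot the numeric-only matrix; calling df.corr() on the full frame raises an
# error in pandas 2.0+ because of the text columns
heatmap = sns.heatmap(matrix, vmin=-1, vmax=1, annot=True)
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 12}, pad=12);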


1. Relationship between age and income: Create a scatter plot to see if there is a relationship
between age and income. Also calculate the correlation coefficient to determine the strength
and direction of the relationship

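import seaborn as sns
import matplotlib.pyplot as plt

# Import filterwarnings to suppress warnings; it has no effect until
# filterwarnings("ignore") is called.
from warnings import filterwarnings

# Scatter plot of Age against Income, coloured by Gender
sns.scatterplot(x="Income", y="Age", data=df, hue="Gender")
plt.show()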
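The scatter plot shows the relationship visually, but the correlation coefficient asked for in the task is not computed by the code above. A minimal sketch of the Pearson coefficient, written out from its definition on the Age and Income values of the sample table, might look like this. The helper name pearson_r is made up for this example; in pandas, df['Age'].corr(df['Income']) should give the same value.

```python
from math import sqrt

# Age and Income columns copied from the sample table, in row order
ages = [32, 45, 28, 52, 36, 40, 29, 55, 33, 47, 41, 38, 31,
        49, 27, 54, 39, 30, 56, 35, 48, 42, 37, 34, 51]
incomes = [45000, 65000, 35000, 80000, 55000, 70000, 40000, 90000, 47000,
           75000, 60000, 52000, 48000, 85000, 30000, 92000, 58000, 42000,
           96000, 55000, 73000, 65000, 50000, 49000, 82000]


def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r = pearson_r(ages, incomes)
print(f"Pearson r between Age and Income: {r:.3f}")
```

A value close to +1 indicates that income tends to rise with age in this sample, a value close to -1 would indicate the opposite, and a value near 0 would indicate no linear relationship.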


2. Distribution of income by gender: Create a box plot to compare the distribution of income
between males and females. This could reveal any differences in the median, quartiles, and
outliers for each gender.

f, ax = plt.subplots(figsize=(5, 5))
sns.boxplot(data=df, x='Gender', y='Income')
plt.show()
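A box plot is a picture of a few summary numbers. As a hedged illustration, the sketch below computes those numbers for income by gender from the sample table: the median, the quartiles, and any points lying more than 1.5 times the interquartile range beyond the quartiles, which is the usual rule for marking outliers. The function name box_stats is hypothetical, and the "inclusive" quantile method is chosen on the assumption that it matches the linear interpolation used by the plotting libraries.

```python
from statistics import median, quantiles

# Income values from the sample table, split by gender
incomes = {
    "Female": [45000, 35000, 55000, 40000, 47000, 60000, 48000,
               30000, 58000, 96000, 73000, 50000, 82000],
    "Male": [65000, 80000, 70000, 90000, 75000, 52000,
             85000, 92000, 42000, 55000, 65000, 49000],
}


def box_stats(values):
    """Return the numbers a box plot draws: quartiles, median, and outliers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Points beyond these fences are drawn as individual outlier markers
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median(values), "q3": q3, "outliers": outliers}


summary = {gender: box_stats(values) for gender, values in incomes.items()}
for gender, stats in summary.items():
    print(gender, stats)
```

Comparing the printed medians and quartiles with the boxes in the plot is a quick way to check that the chart is being read correctly.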

3. Distribution of income by education level: Create a box plot to compare the distribution
of income for each level of education. This could reveal any differences in the median, quartiles,
and outliers for each education level.

f, ax = plt.subplots(figsize=(5, 5))
sns.boxplot(data=df, x='education level', y='Income')
plt.show()


4. Relationship between age and education level: Create a histogram to see the distribution
of ages for each education level. This could reveal any differences or similarities in the age
distribution across education levels.

f, ax = plt.subplots(2, 1, figsize=(5, 5))
sns.histplot(df['education level'], ax=ax[0])
sns.histplot(df['Age'], ax=ax[1])
plt.tight_layout()
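The two histograms above show education level and age separately, so they do not yet show the age distribution within each education level. One way to get at that, sketched here without any plotting, is to group the ages by education level and bin them per group. The 10-year bins and the ordering of the levels are assumptions made for this example. In seaborn, passing hue='education level' to sns.histplot would be one way to draw the same split.

```python
from collections import Counter, defaultdict

# (age, education level) pairs from the sample table, in row order
records = [
    (32, "Bachelor's"), (45, "Master's"), (28, "High School"), (52, "Doctorate"),
    (36, "Bachelor's"), (40, "Bachelor's"), (29, "Associate's"), (55, "Master's"),
    (33, "Bachelor's"), (47, "Bachelor's"), (41, "Master's"), (38, "High School"),
    (31, "Bachelor's"), (49, "Doctorate"), (27, "High School"), (54, "Master's"),
    (39, "Bachelor's"), (30, "Associate's"), (56, "Doctorate"), (35, "Bachelor's"),
    (48, "Bachelor's"), (42, "Master's"), (37, "High School"), (34, "Bachelor's"),
    (51, "Master's"),
]

# Assumed ordering from lowest to highest level, used only for display
LEVEL_ORDER = ["High School", "Associate's", "Bachelor's", "Master's", "Doctorate"]

ages_by_level = defaultdict(list)
for age, level in records:
    ages_by_level[level].append(age)

# Count ages per 10-year bin within each education level
bins_by_level = {
    level: dict(sorted(Counter((a // 10) * 10 for a in ages_by_level[level]).items()))
    for level in LEVEL_ORDER
}
for level, bins in bins_by_level.items():
    print(f"{level:12s} {bins}")
```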

Conclusion:
In this lab, students learned how to explore and visualize data using mathematical and statistical
tools such as histograms, box plots, scatter plots, and correlation matrices. These tools are useful
in identifying patterns and relationships in data, and in making informed decisions based on data
analysis. The skills learned in this lab will be helpful in students' future studies and careers
in data analysis.

Quiz: (Sufficient space to be provided for the answers or use extra file pages to write answers)
1. What are the measures of central tendency? Provide examples and explain when each
measure is appropriate to use.
2. How can you calculate the correlation coefficient between two variables using
mathematical and statistical tools? Interpret the correlation coefficient value.
3. Explain the concept of skewness and kurtosis in statistics. How can you measure and
interpret these measures using mathematical and statistical tools?

Suggested References:
1. "Python for Data Analysis" by Wes McKinney


2. "Data Visualization with Python and Matplotlib" by Benjamin Root

Rubric-wise marks obtained

Understanding of Problem  Analysis of the Problem  Capability of writing program  Documentation  Total
02                        02                       05                             01             10

Information Technology
