06 Seaborn

This document contains code to analyze and visualize data from a sales dataset using seaborn and matplotlib libraries in Python. Various plots are created including scatter plots, histograms, density plots, pair plots and bar plots to understand relationships between variables like work experience, salary, sales and different divisions. Data is loaded from a CSV file and different attributes are explored.

Uploaded by

Anonymous 001

seaborn

March 7, 2024

[26]: import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

[27]: df = pd.read_csv("dm_office_sales.csv")
df.head()

[27]:
          division  level of education  training level  work experience  salary   sales
0         printers        some college               2                6   91684  372302
1         printers  associate's degree               2               10  119679  495660
2      peripherals         high school               0                9   82045  320453
3  office supplies  associate's degree               2                5   92949  377148
4  office supplies         high school               1                5   71280  312802

[28]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 division 1000 non-null object
1 level of education 1000 non-null object
2 training level 1000 non-null int64
3 work experience 1000 non-null int64
4 salary 1000 non-null int64
5 sales 1000 non-null int64
dtypes: int64(4), object(2)
memory usage: 47.0+ KB

[29]: df.shape

[29]: (1000, 6)

[30]: # Scatter plot
plt.figure(figsize=(12, 8))
sns.scatterplot(x='work experience', y='sales', data=df)
plt.title('this is scatter plot')

# The plot shows a positive correlation: sales tend to increase with work experience.

[30]: Text(0.5, 1.0, 'this is scatter plot')
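
The positive relationship read off the scatter plot can also be checked numerically with a Pearson correlation coefficient. The sketch below is only an illustration: it rebuilds a small frame (called `head_df` here, a hypothetical name) from the five rows printed by `df.head()`, so the value it prints describes those five rows and not the full 1000-row dataset.

```python
import pandas as pd

# The five rows shown by df.head(); NOT the full dataset.
head_df = pd.DataFrame({
    "work experience": [6, 10, 9, 5, 5],
    "sales": [372302, 495660, 320453, 377148, 312802],
})

# Pearson correlation: +1 is a perfect increasing line, -1 a perfect decreasing one.
r = head_df["work experience"].corr(head_df["sales"])
print(f"Pearson r on the five sample rows: {r:.3f}")
```

On the loaded data the equivalent call would be `df['work experience'].corr(df['sales'])`.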

[31]: sns.scatterplot(x='salary', y='sales', data=df, hue='level of education',
                style='division')
plt.legend(loc=(1.1, 0.5))

[31]: <matplotlib.legend.Legend at 0x2a5065c77d0>

[32]: # Distribution plot (displot)
sns.displot(data=df,x='salary')

C:\Users\chand\anaconda3\Lib\site-packages\seaborn\axisgrid.py:118: UserWarning:
The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)

[32]: <seaborn.axisgrid.FacetGrid at 0x2a507642a50>

[33]: # Histogram
sns.set(style='darkgrid')
sns.histplot(data=df, x='sales', bins=10)  # bins sets the number of bars: fewer bins give a coarser view, more bins a finer one

[33]: <Axes: xlabel='sales', ylabel='Count'>
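
To see what the `bins` argument changes, the same values can be binned twice with `np.histogram`. This is a rough sketch on synthetic data: the array `values` and its mean and spread are assumptions made up for illustration, not the real `sales` column.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
# Synthetic stand-in for a sales-like column (assumed mean and spread).
values = rng.normal(loc=350_000, scale=50_000, size=1000)

# Same data, two resolutions: only the number of bars changes.
coarse_counts, coarse_edges = np.histogram(values, bins=10)
fine_counts, fine_edges = np.histogram(values, bins=100)

print(len(coarse_counts), coarse_counts.sum())
print(len(fine_counts), fine_counts.sum())
```

Both histograms account for every observation; more bins simply spread the same counts over narrower intervals.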

0.1 KDE
[34]: # Kernel density estimation (KDE) estimates the probability density
# function of a random variable from a sample.
# It is a fundamental data-smoothing technique.
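
As a minimal sketch of the idea, a one-dimensional Gaussian KDE can be written by hand with NumPy: place a Gaussian bump on every observation and average them. The function name `gaussian_kde_1d`, the Scott's-rule bandwidth and the evaluation grid are assumptions chosen for illustration; seaborn's own computation may differ in detail.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for 1-D data (an assumption for this sketch).
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every observation.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average of the kernels, rescaled so the result is a density.
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
ages = rng.integers(0, 80, size=200)   # random ages in [0, 80)
grid = np.linspace(-40, 120, 801)      # wider than the data so the tails are included
density = gaussian_kde_1d(ages, grid)

# A density should integrate to roughly 1 (simple Riemann sum).
area = density.sum() * (grid[1] - grid[0])
print(f"approximate area under the curve: {area:.3f}")
```

Because each kernel has tails, the estimated curve extends past the smallest and largest observations, which is why a KDE drawn over bounded data such as ages can show density below 0.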

[35]: sns.histplot(data=df,x='sales', bins = 100, kde= True)

[35]: <Axes: xlabel='sales', ylabel='Count'>

[36]: import numpy as np
sample = np.random.randint(0,80,200)
sample_age_df = pd.DataFrame(sample,columns=['age'])
sample_age_df.head()

[36]: age
0 75
1 5
2 31
3 26
4 66

[37]: sns.histplot(data=sample_age_df,x='age',kde=True, bins=100)

[37]: <Axes: xlabel='age', ylabel='Count'>

[38]: sns.kdeplot(data=sample_age_df,x='age')

[38]: <Axes: xlabel='age', ylabel='Density'>

[39]: sample_age_df.to_csv('eg.csv')  # save the DataFrame to a CSV file

[40]: #countplot
plt.figure(figsize = (12,8))
sns.countplot(x='division',data=df,hue= 'level of education')

[40]: <Axes: xlabel='division', ylabel='count'>
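
The numbers behind a count plot with `hue` are a cross-tabulation of the two categorical columns. A small sketch with `pd.crosstab` follows, using only the five rows printed by `df.head()` (the frame name `head_df` is hypothetical), so the counts describe those rows and not the full dataset.

```python
import pandas as pd

# The five rows shown by df.head(); NOT the full dataset.
head_df = pd.DataFrame({
    "division": ["printers", "printers", "peripherals",
                 "office supplies", "office supplies"],
    "level of education": ["some college", "associate's degree", "high school",
                           "associate's degree", "high school"],
})

# Rows: division; columns: level of education; cells: number of records.
counts = pd.crosstab(head_df["division"], head_df["level of education"])
print(counts)
```

Each cell corresponds to the height of one bar in the grouped count plot.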

[41]: # Bar plot: with estimator=np.min each bar shows the minimum sales in that division
sns.barplot(x='division', y='sales', data=df, estimator=np.min)

[41]: <Axes: xlabel='division', ylabel='sales'>
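
The bar heights produced with `estimator=np.min` correspond to a per-group minimum, which can be reproduced with a pandas `groupby`. The sketch below uses only the five rows printed by `df.head()` (the name `head_df` is hypothetical), so the values are minima over those rows only.

```python
import pandas as pd

# The five rows shown by df.head(); NOT the full dataset.
head_df = pd.DataFrame({
    "division": ["printers", "printers", "peripherals",
                 "office supplies", "office supplies"],
    "sales": [372302, 495660, 320453, 377148, 312802],
})

# One value per division: the smallest sales figure in that group.
min_sales = head_df.groupby("division")["sales"].min()
print(min_sales)
```

Swapping `.min()` for `.mean()` gives the aggregate that `sns.barplot` uses when no `estimator` is passed.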

[42]: #help(sns.barplot)

0.2 Pair Plot


[43]: #pairplot
sns.pairplot(df)

C:\Users\chand\anaconda3\Lib\site-packages\seaborn\axisgrid.py:118: UserWarning:
The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)

[43]: <seaborn.axisgrid.PairGrid at 0x2a508490690>

[44]: sns.pairplot(df, hue = 'division')

C:\Users\chand\anaconda3\Lib\site-packages\seaborn\axisgrid.py:118: UserWarning:
The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)

[44]: <seaborn.axisgrid.PairGrid at 0x2a50a34c190>

[45]: sns.pairplot(df, hue='division', diag_kind='hist', corner=True)
# corner=True draws only the lower triangle of the grid, so mirrored duplicate plots are not repeated

C:\Users\chand\anaconda3\Lib\site-packages\seaborn\axisgrid.py:118: UserWarning:
The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)

[45]: <seaborn.axisgrid.PairGrid at 0x2a50b3c4f50>
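
A quick way to see how many panels `corner=True` leaves is to count variable pairs. The sketch below assumes the grid is built from the numeric columns only and uses the five rows printed by `df.head()` (the frame name `head_df` is hypothetical) just to recover the column types.

```python
import pandas as pd

# The five rows shown by df.head(); used here only for the column dtypes.
head_df = pd.DataFrame({
    "division": ["printers", "printers", "peripherals",
                 "office supplies", "office supplies"],
    "level of education": ["some college", "associate's degree", "high school",
                           "associate's degree", "high school"],
    "training level": [2, 2, 0, 2, 1],
    "work experience": [6, 10, 9, 5, 5],
    "salary": [91684, 119679, 82045, 92949, 71280],
    "sales": [372302, 495660, 320453, 377148, 312802],
})

numeric_cols = head_df.select_dtypes(include="number").columns.tolist()
n = len(numeric_cols)

# Full grid: every (row variable, column variable) combination.
full_panels = [(r, c) for r in numeric_cols for c in numeric_cols]
# Corner grid: the diagonal plus the panels below it.
corner_panels = [(numeric_cols[i], numeric_cols[j])
                 for i in range(n) for j in range(i + 1)]

print(n, len(full_panels), len(corner_panels))
```

With four numeric columns the full grid has 16 panels, and the corner layout keeps 10 of them: the 4 diagonal plots plus the 6 below the diagonal.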
