0% found this document useful (0 votes)
7 views

vertopal.com_Python Seaborn Tutorial for Beginners v2

This document is a comprehensive tutorial on Seaborn, a Python data visualization library built on Matplotlib, aimed at beginners. It covers various types of plots including line plots, bar plots, histograms, box plots, scatter plots, lmplots, pair plots, joint plots, and heat maps, along with examples of how to implement them using raw data. The tutorial also includes instructions for importing necessary packages and loading data, making it a practical guide for visualizing statistical data and identifying trends.

Uploaded by

kart238
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views

vertopal.com_Python Seaborn Tutorial for Beginners v2

This document is a comprehensive tutorial on Seaborn, a Python data visualization library built on Matplotlib, aimed at beginners. It covers various types of plots including line plots, bar plots, histograms, box plots, scatter plots, lmplots, pair plots, joint plots, and heat maps, along with examples of how to implement them using raw data. The tutorial also includes instructions for importing necessary packages and loading data, making it a practical guide for visualizing statistical data and identifying trends.

Uploaded by

kart238
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 40

Python Seaborn Tutorial for Beginners

What is Seaborn?
• Python Data Visualization Library - based on MatPlotLib (see previous tutorials)
• Used for plotting statistical graphs, identifying trends, relationships & outliers
• In my opinion, Seaborn is easier & faster to use (less code) Vs MatPlotLib
# Visual representation of Seaborn

import os
from IPython.display import Image
PATH = "F:\\Github\\Python tutorials\\Introduction to Seaborn\\"
Image(filename = PATH + "Seaborn.png", width=900, height=900)
Tutorial Overview
• What is matplotlib and how/why it's used

• Trend Plots:
– Line Plots
• Summary Plots:
– Bar Plots
• Distribution of Data:
– Histogram
– Box Plots
• Relationship Plots
– Scatter Plots
– lmplot (combo of regplot() and FacetGrid)
• Holistic views / Combo:
– Sub Plots
– Pair Plots
– Join Plots
• Correlation / Relationships:
– Heat Maps

Tutorial Overview by video


Video 1:
1. What is matplotlib and how/why it's used
2. Line Plots
3. Bar Plots
4. Histogram

Video 2:
1. Box Plots
2. Scatter Plots
3. lmplot (combo of regplot() and FacetGrid)

Video 3:
1. Sub Plots
2. Pair Plots
3. Join Plots
4. Heat Maps

Importing / Installing packages


# Packages / libraries
import os #provides functions for interacting with the operating
system
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# To install seaborn type "pip install seaborn" to the anaconda
terminal
# import sys
# !conda list Check the packages installed

1. Loading the Raw Data


# Loading the data

raw_data = pd.read_csv("F:\\Github\Python tutorials\\Introduction to


Seaborn\\Marketing Raw Data.csv")
# runs all the data
raw_data

print(raw_data.shape)

#runs the first 5 rows


raw_data.head()

(182, 11)

Date Week Week_ID Month Month_ID Year Day_Name


Visitors \
0 09/11/2020 46 34 11 11 2020 Monday
707
1 10/11/2020 46 34 11 11 2020 Tuesday
1455
2 11/11/2020 46 34 11 11 2020 Wednesday
1520
3 12/11/2020 46 34 11 11 2020 Thursday
1726
4 13/11/2020 46 34 11 11 2020 Friday
2134

Revenue Marketing Spend Promo


0 465 651.375 No Promo
1 10386 1298.250 Promotion Red
2 12475 1559.375 Promotion Blue
3 11712 1801.750 No Promo
4 10000 2614.500 No Promo

2. Line Gragh
# Example 1 - Simple 1 line graph
# Assuming we want to investigate the Revenue by Date

ax = sns.lineplot(x='Week_ID', y='Revenue', data = raw_data)

# Notes: error bands show the confidence interval


# Example 2 - Adding Categories

# By Promo
ax = sns.lineplot(x='Week_ID', y='Revenue', hue = 'Promo', data =
raw_data)
# Example 3 - By Promo with style
ax = sns.lineplot(x='Week_ID', y='Revenue', hue = 'Promo', style =
'Promo', data = raw_data)

# Example 4 - By Promo with style & Increase the size & Remove error
bars

# increase the size


sns.set(rc={'figure.figsize':(12,10)})

ax = sns.lineplot(x='Week_ID', y='Revenue', hue = 'Promo', style =


'Promo', data = raw_data, ci=None)
# Example 5 - By Promo with style & Increase the size & Remove error
bars & adding markers & by month

ax = sns.lineplot(x='Month_ID', y='Revenue', hue = 'Promo', style =


'Promo', data = raw_data, ci=None, markers=True)
# Example 6 - By Day_Name with style & Increase the size & Remove
error bars & adding markers & by month

ax = sns.lineplot(x='Month_ID', y='Revenue', hue = 'Day_Name', style =


'Promo', data = raw_data, ci=None, markers=True)
# Example 7 - By Year with style & Increase the size & Remove error
bars & adding markers & by month

ax = sns.lineplot(x='Year', y='Revenue', hue = 'Day_Name', style =


'Promo', data = raw_data, ci=None, markers=True)
LinePlot Documentation
https://seaborn.pydata.org/generated/seaborn.lineplot.html

3. Bar Plots
# Example 1 - Total Revenue by Month

ax = sns.barplot(x="Month_ID", y="Revenue", data=raw_data)

# Notes:
# 1 - the lines signify the confidence interval
# 2 - Takes mean by default

raw_data[['Month_ID', 'Revenue']].groupby('Month_ID', as_index =


False).agg({'Revenue':'mean'})

Month_ID Revenue
0 11 11255.454545
1 12 11667.806452
2 13 9588.516129
3 14 10683.892857
4 15 10555.354839
5 16 10806.500000
6 17 7636.000000

# Example 2 - Total Revenue by Month - Remove the Confidence Interval


ax = sns.barplot(x="Month_ID", y="Revenue", data=raw_data, ci=False)
# Example 3 - Total Revenue by Month - Remove the Confidence Interval
- By Promo
ax = sns.barplot(x="Month_ID", y="Revenue", data=raw_data, ci=False,
hue = 'Promo')
# Example 4 - Total Revenue by Month - Remove the Confidence Interval
- By Promo - Changing direction
ax = sns.barplot(x="Revenue", y="Year", ci=False, hue = 'Promo',
orient = 'h', data=raw_data)
# Example 5 - Total Revenue by Month - Remove the Confidence Interval
- By Promo - Changing direction - Changing Colour
ax = sns.barplot(x="Revenue", y="Year", ci=False, hue = 'Promo',
orient = 'h', data=raw_data, color="#1CB3B1")

# Cool Way to pick colours


# https://htmlcolorcodes.com/color-picker/
4. Histograms
# Example 1 - Investigating the distribution of Revenue

x = raw_data['Revenue'].values

sns.distplot(x, color = 'blue');

# As you can see, it's a bit imbalance. Right skewd distribution as


the mean is to the right
# Example 2 - Investigating the distribution of Revenue, adding the
mean

x = raw_data['Revenue'].values

sns.distplot(x, color = 'blue');

# Calculating the mean


mean = raw_data['Revenue'].mean()

#ploting the mean


plt.axvline(mean, 0,1, color = 'red')

<matplotlib.lines.Line2D at 0x1f6ca13ca58>
# Example 3 - Investigating the distribution of Visitors, adding the
mean

x = raw_data['Visitors'].values

sns.distplot(x, color = 'red');

# Calculating the mean


mean = raw_data['Visitors'].mean()

#ploting the mean


plt.axvline(mean, 0,1, color = 'blue')

<matplotlib.lines.Line2D at 0x1f6c9ba4358>
5. Box Plots
# Example 1 - Investigating the distribution of Revenue

x = raw_data['Revenue'].values

ax = sns.boxplot(x)

print('The meadian is: ', raw_data['Revenue'].median())

# Notes:
# The line signifies the median
# The box in the middle show the beginning of Q1 (25th percentile) and
the end of the Q3 (75th percentile)
# The whiskers (left - right) show the minimum quartile and maximum
quartile
# The dots on the right are "outliers"

The meadian is: 9452.0


# More Details

PATH = "F:\\Github\\Python tutorials\\Introduction to Seaborn\\"


Image(filename = PATH + "Seaborn boxplot.png", width=900, height=900)

# More details here: https://towardsdatascience.com/understanding-


boxplots-5e2df7bcbd51
# Credits: Michael Galarnyk
# Example 2 - Investigating the distribution of Revenue by Day

ax = sns.boxplot(x="Day_Name", y="Revenue", data=raw_data)


# Example 3 - Investigating the distribution of Revenue by Day -
Horizontal - change color

ax = sns.boxplot(x="Revenue", y="Day_Name", data=raw_data, color =


'#EE67CF')

# Cool Way to pick colours


# https://htmlcolorcodes.com/color-picker/
# Example 4 - Investigating the distribution of Revenue by Day -
changing color - adding hue

ax = sns.boxplot(x="Day_Name", y="Revenue", data=raw_data,


color="#B971E7", hue = 'Promo')

# Cool Way to pick colours


# https://htmlcolorcodes.com/color-picker/
# Example 5 - Investigating the distribution of Revenue by Day - by
color - by data points

ax = sns.boxplot(x="Day_Name", y="Revenue", data=raw_data, color =


'#D1EC46')
ax = sns.swarmplot(x="Day_Name", y="Revenue", data=raw_data,
color="red")
More on Boxplots here:
https://seaborn.pydata.org/generated/seaborn.boxplot.html#seaborn.boxplot

6. ScatterPlots
raw_data.columns

Index(['Date', 'Week', 'Week_ID', 'Month', 'Month_ID', 'Year',


'Day_Name',
'Visitors', 'Revenue', 'Marketing Spend', 'Promo'],
dtype='object')

# Example 1 - Relationship between Marketing Spend and Revenue

ax = sns.scatterplot(x="Marketing Spend", y="Revenue", data=raw_data,


color = 'green')
# Example 2 - Relationship between Marketing Spend and Revenue -
changing color, hue & Style

ax = sns.scatterplot(x="Marketing Spend", y="Revenue", data=raw_data,


color = 'green', hue = 'Promo', style = 'Promo')
# Example 3 - Relationship between Marketing Spend and Revenue -
changing color & hue - adding size

ax = sns.scatterplot(x="Marketing Spend", y="Revenue", data=raw_data,


color = 'green', hue = 'Promo', size = 'Revenue',
sizes=(20, 200))
# Example 4 - Relationship between Marketing Spend and Revenue -
changing color & hue - adding size - by day

ax = sns.scatterplot(x="Visitors", y="Revenue", data=raw_data, color =


'green', hue = 'Day_Name', size = 'Revenue',
sizes=(20, 200))
7. lmPlots
# Example 1 - Relationship between Marketing Spend and Revenue

ax = sns.lmplot(x="Marketing Spend", y="Revenue", data=raw_data,


height=9)

# Notes:
# What is Linear Regression: It is a predictive statistical method for
modelling the relationship between x (independent variable) & y
(dependent V).
# How it works (cost function MSE):
https://towardsdatascience.com/machine-learning-fundamentals-via-
linear-regression-41a5d11f5220
# Example 2 - Relationship between Marketing Spend and Revenue -
changing color, hue & Style

ax = sns.lmplot(x="Marketing Spend", y="Revenue", data=raw_data, hue =


'Promo', ci= False, height=9, markers=["o", "x", "+"])
# Example 3 - Relationship between Marketing Spend and Revenue - by
column

ax = sns.lmplot(x="Marketing Spend", y="Revenue", data=raw_data, col =


'Promo', ci= False, height=5,
line_kws={'color': 'red'},
scatter_kws={'color':'#ADC11E'})
# Example 4 - Relationship between Marketing Spend and Revenue - by
column - by day - add Jitter too

ax = sns.lmplot(x="Visitors", y="Revenue", data=raw_data, col =


'Day_Name', ci= False, height=4,
line_kws={'color': '#031722'},
scatter_kws={'color':'#1E84C1'},
col_wrap=3,
x_jitter=.3)
8. SubPlots
# Set up the matplotlib figure
fig, axes = plt.subplots(2, 2, figsize=(12, 7))

a = raw_data['Revenue'].values
b = raw_data['Visitors'].values
c = raw_data['Marketing Spend'].values

# plot 1
sns.distplot(a, color = 'blue', ax=axes[0,0])
# plot 2
sns.distplot(b, color = 'blue', ax=axes[0,1])

# plot 3
sns.distplot(c, color = 'blue', ax=axes[1,0])

# plot 4
sns.boxplot(x="Revenue", y="Day_Name", data=raw_data, color =
'#52F954', ax=axes[1,1])
sns.swarmplot(x="Revenue", y="Day_Name", data=raw_data, color="blue",
ax=axes[1,1])

<matplotlib.axes._subplots.AxesSubplot at 0x1f6cde977f0>

9. Pairplots
# Example 1 - running on all dataframe - green color
g = sns.pairplot(raw_data, plot_kws={'color':'green'})
# Example 2 - running on specific columns - green color
g = sns.pairplot(raw_data[['Revenue','Visitors','Marketing Spend']],
plot_kws={'color':'#0EDCA9'})

# Cool Way to pick colours


# https://htmlcolorcodes.com/color-picker/
# Example 3 - running on specific columns - adding hue
g = sns.pairplot(raw_data[['Revenue','Visitors','Marketing Spend',
'Promo']], hue = 'Promo')
# Example 4 - running on specific columns - adding hue - adding kind =
reg
g = sns.pairplot(raw_data[['Revenue','Visitors','Marketing Spend',
'Promo']], hue = 'Promo', kind="reg", size = 6)
More on Pairplots:
https://seaborn.pydata.org/generated/seaborn.pairplot.html

10. JoinPlots
Draw a plot of two variables with bivariate and univariate graphs.
# Example 1 - Revenue vs marketing Spend Relationship with
g = sns.jointplot("Revenue", "Marketing Spend", data=raw_data,
kind="reg", color = 'green', size = 10)
11. Heat Map
# First we need to create a "Dataset" to display on a Heatmap - we
will use a correlation dataset
# .corr() is used to find the pairwise correlation of all columns in
the dataframe. Any null values are automatically excluded
# The closer to 1 or -1 the better. As one variable increases, the
other variable tends to also increase / decrease
# More Info here: https://statisticsbyjim.com/basics/correlations/

# Example 1 - Heatmap for PC

pc = raw_data[['Revenue','Visitors','Marketing Spend',
'Promo']].corr(method ='pearson')

cols = ['Revenue CC','Visitors CC','Marketing Spend CC']

ax = sns.heatmap(pc, annot=True,
yticklabels=cols,
xticklabels=cols,
annot_kws={'size': 50})

# Example 2 - Heatmap for PC - changing cmap

ax = sns.heatmap(pc, annot=True,
yticklabels=cols,
xticklabels=cols,
annot_kws={'size': 50},
cmap="BuPu")
# Examples:
# cmap="YlGnBu"
# cmap="Blues"
# cmap="BuPu"
# cmap="Greens"

More details for Heatmaps here:


https://seaborn.pydata.org/generated/seaborn.heatmap.html

You might also like