
Seaborn

Seaborn is a Python library designed for data visualization that simplifies the creation of various plots using matplotlib, particularly with pandas dataframes. It includes sample datasets and provides functions for creating scatter plots, bar plots, box plots, and more, allowing for customization of themes, colors, and sizes. The document outlines how to use Seaborn for visualizing data, including examples of different plot types and customization options.

Uploaded by sheela471983
Copyright © All Rights Reserved

Seaborn

Seaborn is a Python library for visualizing data. Seaborn uses matplotlib to
create graphics, but it provides tools that make it much easier to create
several types of plots. In particular, it is simple to use seaborn with
pandas dataframes.
import seaborn as sns
import matplotlib.pyplot as plt # needed to use matplotlib functionality directly

Seaborn includes sample datasets which we will use here to illustrate its
features. The function sns.get_dataset_names() displays the names of the
available datasets:
sns.get_dataset_names()

['anagrams',
'anscombe',
'attention',
'brain_networks',
'car_crashes',
'diamonds',
'dots',
'exercise',
'flights',
'fmri',
'gammas',
'geyser',
'iris',
'mpg',
'penguins',
'planets',
'tips',
'titanic']

We will use the tips dataset which contains data on restaurant visits: the
total bill amount, tip amount, sex of the person paying the bill, whether
the visiting group included smokers, day and time (dinner or lunch) of the
visit, and the size of the group. The function sns.load_dataset() can be used
to get a pandas dataframe with the data:
tips = sns.load_dataset('tips')
tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
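The function sns.load_dataset() downloads the data from seaborn's online data repository (and caches a local copy), so the first call needs an internet connection. Since the plotting functions accept any pandas dataframe, a tiny offline stand-in can be typed in by hand. The sketch below, assuming you only need a few rows to experiment with, rebuilds the five rows shown above; the name tips_small is our own choice.

```python
import pandas as pd

# The first five rows of the tips dataset, typed in by hand
# (tips_small is a hypothetical name for an offline stand-in)
tips_small = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No", "No", "No", "No", "No"],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
    "time": ["Dinner", "Dinner", "Dinner", "Dinner", "Dinner"],
    "size": [2, 3, 3, 2, 4],
})

print(tips_small)
```

One difference: in the dataframe returned by sns.load_dataset('tips') the columns sex, smoker, day and time have the pandas categorical type, while here they are plain strings, which affects the default order in which categories are plotted.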

Plots from dataframes


In order to produce a scatter plot showing the bill amount on the x axis and the tip amount
on the y axis we just need to specify the dataframe and names of the appropriate columns:
sns.relplot(data=tips, # dataframe
x="total_bill", # x-values column
y="tip" # y-values column
)
plt.show()
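sns.relplot() returns a FacetGrid object (seaborn's wrapper around the figure and its subplots), and keeping it in a variable lets us adjust the plot afterwards. A minimal sketch, using the five rows of the tips table typed in by hand so that no download is needed; the label and title strings are only examples:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# the first five rows of the tips dataset, typed in by hand
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

g = sns.relplot(data=df, x="total_bill", y="tip")  # g is a FacetGrid
g.set_axis_labels("Total bill", "Tip")  # replace the default column-name labels
g.figure.suptitle("Tip vs. total bill", y=1.02)  # title for the whole figure
plt.show()
```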
By default, seaborn uses the original matplotlib settings for fonts, colors etc. The
function sns.set_theme() can be used to modify these settings:

sns.set_theme(
style="darkgrid", # sets background color, grid visibility etc.
font_scale=1.2, # scales all fonts by the factor 1.2
)

sns.relplot(data=tips,
x="total_bill",
y="tip"
)
plt.show()
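The style argument of sns.set_theme() accepts one of seaborn's five preset styles: "darkgrid", "whitegrid", "dark", "white" and "ticks". The function sns.axes_style() returns the rc parameters a style would set without applying them, which makes it easy to compare the styles. A short sketch:

```python
import seaborn as sns

for style in ["darkgrid", "whitegrid", "dark", "white", "ticks"]:
    params = sns.axes_style(style)  # rc parameters of the style (not applied)
    print(f"{style:<10} grid: {params['axes.grid']!s:<6} background: {params['axes.facecolor']}")

sns.set_theme(style="whitegrid")  # apply one of them globally
```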
We can use the values in the “day” column to assign marker colors:
sns.relplot(data=tips,
x="total_bill",
y="tip",
hue="day" # assign colors based on values of this column
)
plt.show()
Next, we set different marker sizes based on values in the “size” column:
sns.relplot(data=tips,
x="total_bill",
y="tip",
hue="day",
size="size" # assigns marker sizes based on values of this column
)
plt.show()
We can also split the plot into subplots based on values of some column. Below we create
two subplots, each displaying data for a different value of the “time” column:
sns.relplot(data=tips,
x="total_bill",
y="tip",
hue="day",
size="size",
col="time", # columns of subplots depending on "time" value
)
plt.show()
We can subdivide the plot even further using values of the “sex” column:
sns.relplot(data=tips,
x="total_bill",
y="tip",
hue="day",
size="size",
col="time", # columns of subplots depending on "time" value
row="sex", # rows of subplots depending on "sex" value
height=3.4
)
plt.show()
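With both row and col set, the FacetGrid returned by sns.relplot() holds a two-dimensional array of subplots. A single subplot can be picked out by position in g.axes, or by the pair of (row value, column value) through g.axes_dict. A minimal sketch with a small made-up dataframe (the numbers are invented for illustration):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# made-up sample with two values of "time" and two values of "sex"
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 15.00, 12.50, 30.10, 8.20],
    "tip": [1.01, 1.66, 3.50, 3.31, 2.00, 1.75, 4.20, 1.20],
    "time": ["Dinner", "Dinner", "Lunch", "Lunch"] * 2,
    "sex": ["Female", "Male"] * 4,
})

g = sns.relplot(data=df, x="total_bill", y="tip", col="time", row="sex", height=3)

# one row of subplots per "sex" value, one column per "time" value
print(g.axes.shape)

# axes_dict maps (row value, column value) to the matching Axes
ax = g.axes_dict[("Male", "Lunch")]
ax.set_facecolor("lavender")  # highlight one subplot
plt.show()
```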
Figure-level and axes-level functions
Functions provided by seaborn come in two types. Figure-level functions generate the whole
matplotlib figure, which may consist of several subplots. The function sns.relplot() used in
the examples above is such a figure-level function. Axes-level functions create plots in a
matplotlib axes object, i.e. in a single subplot. Plots obtained using such a function can be
combined with plots created using other seaborn axes-level functions or just standard
matplotlib functions:
plt.figure(figsize=(10, 4))
sns.scatterplot(data=tips, x="total_bill", y="tip") # seaborn plot
plt.plot([0, 50], [0, 10], 'r-') # add a line using matplotlib
plt.title("Sample plot")
plt.show()
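The two kinds of functions can be told apart by what they return: a figure-level function returns a seaborn grid object (a FacetGrid for sns.relplot()), while an axes-level function returns the matplotlib Axes it drew on. A small sketch with made-up data; note the plt.figure() call, without which sns.scatterplot() would draw into the current axes, i.e. into the figure just created by sns.relplot():

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# made-up sample data
df = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
                   "tip": [1.01, 1.66, 3.50, 3.31, 3.61]})

grid = sns.relplot(data=df, x="total_bill", y="tip")  # figure-level
plt.figure(figsize=(6, 4))  # start a fresh figure for the next plot
ax = sns.scatterplot(data=df, x="total_bill", y="tip")  # axes-level

print(isinstance(grid, sns.FacetGrid))  # True
print(isinstance(ax, Axes))  # True
ax.set_title("Drawn with an axes-level function")
plt.show()
```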
Below we create a figure with three subplots. Two of them use axes-level seaborn functions,
and the third is a regular matplotlib plot.
plt.figure(figsize=(15, 5))
plt.suptitle("Multiple plots")
plt.subplot(131)
sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.subplot(132)
sns.histplot(data=tips, x="tip", bins=10, hue="time", multiple="stack")
plt.subplot(133)
plt.plot([1, 2, 3], [2, 4, 1], 'ro-')
plt.show()

Customizing plots
Seaborn functions have several arguments which can be used to control the appearance of plots:
size of plots, colors, marker sizes etc.
sns.relplot(data=tips,
x="total_bill",
y="tip",
hue="day",
size="size",
row="time",
sizes=(50, 400), # a tuple with minimal and maximal size of markers
palette = "bright", # color palette used to plot markers
height= 3, # height of each subplot
aspect = 4 # aspect ratio: the width of each subplot will be 4 times its height
)

plt.show()
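The palette argument accepts the name of one of seaborn's built-in palettes such as "bright" (other names include "deep", "muted", "pastel", "dark" and "colorblind"). The function sns.color_palette() returns such a palette as a list of RGB tuples, which is handy for checking which colors will be used. A short sketch:

```python
import seaborn as sns

bright = sns.color_palette("bright")  # the full palette
print(len(bright))  # 10
print(bright.as_hex()[:4])  # first four colors as hex strings

# only the first n colors, e.g. one per day in the tips data
four = sns.color_palette("bright", n_colors=4)
print(four)
```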

To further customize the appearance of plots we can use the function sns.set_theme() (which
we have seen once already) with the rc argument. The value of this argument should be a
dictionary of matplotlib rc parameters:
sns.set_theme(
rc={
'axes.facecolor': 'linen',
'grid.color': 'tan',
'grid.linewidth': 3,
'grid.alpha': 0.2,
'font.family': "monospace",
'axes.labelsize': 16
})

sns.relplot(
data=tips,
x="total_bill",
y="tip",
hue="day",
size="size",
row="time",
sizes=(50, 400),
palette="bright",
height=3,
aspect=4
)

plt.show()

The values of the style-related rc parameters currently in use can be displayed using
the sns.axes_style() function:

sns.axes_style()

{'axes.facecolor': 'linen',
'axes.edgecolor': 'white',
'axes.grid': True,
'axes.axisbelow': True,
'axes.labelcolor': '.15',
'figure.facecolor': 'white',
'grid.color': 'tan',
'grid.linestyle': '-',
'text.color': '.15',
'xtick.color': '.15',
'ytick.color': '.15',
'xtick.direction': 'out',
'ytick.direction': 'out',
'lines.solid_capstyle': 'round',
'patch.edgecolor': 'w',
'patch.force_edgecolor': True,
'image.cmap': 'rocket',
'font.family': ['monospace'],
'font.sans-serif': ['Arial',
'DejaVu Sans',
'Liberation Sans',
'Bitstream Vera Sans',
'sans-serif'],
'xtick.bottom': False,
'xtick.top': False,
'ytick.left': False,
'ytick.right': False,
'axes.spines.left': True,
'axes.spines.bottom': True,
'axes.spines.right': True,
'axes.spines.top': True}
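Since sns.set_theme() works by updating matplotlib's global rcParams, the new values can also be read back from plt.rcParams. The output above lacks grid.linewidth and axes.labelsize even though they were set: size-related parameters are reported by sns.plotting_context() instead of sns.axes_style(). Both functions can also be used in a with statement to change settings only temporarily. A short sketch:

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(rc={"axes.facecolor": "linen", "grid.linewidth": 3})

print(plt.rcParams["axes.facecolor"])  # linen
print(sns.axes_style()["axes.facecolor"])  # linen (a style parameter)
print(sns.plotting_context()["grid.linewidth"])  # size parameters live here

# temporary change: the "white" style applies only inside the with block
with sns.axes_style("white"):
    grid_inside = plt.rcParams["axes.grid"]
grid_after = plt.rcParams["axes.grid"]  # restored to the darkgrid default
print(grid_inside, grid_after)  # False True
```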

Seaborn plot types


Below are examples of plots which can be created using seaborn. For the
full list of seaborn functions and their options see the seaborn
documentation.

In all examples we will use sample datasets provided with seaborn.


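import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display # display() is predefined in Jupyter; import it explicitly elsewhere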

# load DataFrames with sample data


tips = sns.load_dataset('tips')
flights = sns.load_dataset('flights')

print("\ntips:")
display(tips.head())
print("\nflights:")
display(flights.head())

tips:

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

flights:
   year month  passengers
0  1949   Jan         112
1  1949   Feb         118
2  1949   Mar         132
3  1949   Apr         129
4  1949   May         121

Scatter plot
sns.set_theme(font_scale=1.2) # set theme for plots

plt.figure(figsize=(14, 5))
sns.scatterplot(data=tips,
x="total_bill",
y="tip",
hue="sex"
)

plt.title("Scatter plot")
plt.show()

Line plot
plt.figure(figsize=(14, 5))

sns.lineplot(data = flights,
x="month",
y="passengers",
hue = "year",
palette = "muted"
)
plt.title("Line plot")
plt.show()

Bar plot
By default, the values of a column with string data are plotted in the order they are first
encountered in the DataFrame; if the column has the pandas categorical type, the order of its
categories is used instead. In the example below this means that the order of days on the
x-axis would not necessarily correspond to the usual ordering of days in a week. We can
override this by assigning a list of values of the x-column to the order argument; bars are
then plotted according to the ordering of the list.
# DataFrame with total tip amounts for a given day and sex
t = tips.groupby(["day", "sex"], observed=True)["tip"].sum().reset_index() # observed=True: keep only combinations present in the data
t.head()

    day     sex     tip
0  Thur    Male   89.41
1  Thur  Female   82.42
2   Fri    Male   26.93
3   Fri  Female   25.03
4   Sat    Male  181.95


plt.figure(figsize=(14, 5))
sns.barplot(data=t,
x="day",
y="tip",
hue="sex",
order = [ "Thur", "Fri", "Sat", "Sun"],
palette = "muted"
)
plt.title("Bar plot")
plt.show()

Strip plot
In a strip plot, the values in each category are plotted along the y-axis. The x-coordinates
of the points are randomized a little (jittered) to reduce overlap.
plt.figure(figsize=(14, 5))
sns.stripplot(data=tips,
x="day",
y="tip",
hue="sex",
dodge=True, # separate strips of points of different colors
order = [ "Thur", "Fri", "Sat", "Sun"],
palette = "muted"
)

plt.title("Strip plot")
plt.show()
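The amount of random displacement in sns.stripplot() is set by its jitter argument: jitter=False places every point exactly at its category position (0, 1, ... along the x-axis), while a number such as 0.2 spreads the points uniformly up to 0.2 on either side of that position. A sketch comparing the two on a small made-up dataframe:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# made-up tips with some repeated values
df = pd.DataFrame({
    "day": ["Thur"] * 4 + ["Fri"] * 4,
    "tip": [2.0, 2.0, 2.5, 3.0, 1.5, 1.5, 4.0, 2.0],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
sns.stripplot(data=df, x="day", y="tip", jitter=False, ax=ax1)  # repeated values overlap
sns.stripplot(data=df, x="day", y="tip", jitter=0.2, ax=ax2)  # repeated values separate
ax1.set_title("jitter=False")
ax2.set_title("jitter=0.2")

# x-coordinates of the plotted points in each panel
x_plain = np.concatenate([np.asarray(c.get_offsets())[:, 0] for c in ax1.collections])
x_jitter = np.concatenate([np.asarray(c.get_offsets())[:, 0] for c in ax2.collections])
plt.show()
```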

Swarm plot
A swarm plot is similar to a strip plot, but the x-coordinates of the points are adjusted so
that the points do not overlap.
plt.figure(figsize=(14, 5))
sns.swarmplot(data=tips,
x="day",
y="tip",
hue="sex",
dodge=True, # separate strips of points of different colors
order = [ "Thur", "Fri", "Sat", "Sun"],
palette = "muted"
)

plt.title("Swarm plot")
plt.show()

Box plot
Components of a box plot:

• The lower edge of a box marks the first quartile: 25% of data values are below it.
• The line inside a box marks the median: 50% of data values are below it, and 50% are
above it.
• The upper edge of a box marks the third quartile: 75% of data values are below it.
• The height of the box (i.e. the difference between the first and third quartiles) is
called the Interquartile Range (IQR).
• The length of the whiskers can be controlled by the whis argument. The whiskers of
a box extend to the smallest and largest data values which are within whis × IQR
from the lower and upper edges of the box.
• Data values which are outside the range of the whiskers are considered to be outliers.
They are plotted as individual points.
plt.figure(figsize=(14, 5))
sns.boxplot(data=tips,
x="day",
y="total_bill",
hue="sex",
dodge=True, # separate boxes of different colors
width=0.7, # width of boxes
whis=1.5, # controls the length of whiskers of a box
order = [ "Thur", "Fri", "Sat", "Sun"],
palette = "deep"
)

plt.title("Box plot")
plt.show()
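The quantities listed above can be computed directly with NumPy. The sketch below uses a small made-up sample and whis=1.5, assuming NumPy's default linear interpolation for percentiles, and cross-checks the result with matplotlib's cbook.boxplot_stats(), the routine behind matplotlib's box plots on which seaborn's box plots are built:

```python
import numpy as np
from matplotlib import cbook

data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 30])  # made-up sample
whis = 1.5

q1, median, q3 = np.percentile(data, [25, 50, 75])  # 4.0, 6.0, 8.0
iqr = q3 - q1  # 4.0
low_limit, high_limit = q1 - whis * iqr, q3 + whis * iqr  # -2.0, 14.0

# whiskers end at the most extreme data values still inside the limits
whisker_low = data[data >= low_limit].min()  # 2
whisker_high = data[data <= high_limit].max()  # 9
outliers = data[(data < low_limit) | (data > high_limit)]  # [30]

# matplotlib's box plot statistics for comparison
stats = cbook.boxplot_stats(data, whis=whis)[0]
print(stats["q1"], stats["med"], stats["q3"], stats["whislo"], stats["whishi"], stats["fliers"])
```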

Violin plot
A violin plot shows a kernel density estimate (KDE) of the distribution of values in each category, drawn symmetrically around the category position.
plt.figure(figsize=(14, 5))
sns.violinplot(data=tips,
x="day",
y="total_bill",
hue="sex",
dodge=True, # separate plots of different colors
width=0.5, # width of plots
order = [ "Thur", "Fri", "Sat", "Sun"],
palette = "bright",
)

plt.title("Violin plot")
plt.show()
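A kernel density estimate replaces each data value with a small Gaussian "bump" and averages the bumps; the bump width (the bandwidth) controls how smooth the result is. The sketch below implements this from scratch in NumPy, choosing the bandwidth with Scott's rule, which is also seaborn's default bandwidth method (bw_method="scott"). It illustrates the idea and is not seaborn's own code; the helper name gaussian_kde_1d and the grid limits are our own choices. Like any density, the estimate integrates to 1:

```python
import numpy as np

def gaussian_kde_1d(data, grid):
    """Hypothetical helper: Gaussian KDE with Scott's bandwidth rule."""
    data = np.asarray(data, dtype=float)
    n = data.size
    bandwidth = data.std(ddof=1) * n ** (-1 / 5)  # Scott's rule for 1-D data
    # one standard normal bump per data point, evaluated on the grid
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # average the bumps and rescale so the result is a density
    return kernels.sum(axis=1) / (n * bandwidth)

bills = np.array([16.99, 10.34, 21.01, 23.68, 24.59])  # bills from the tips table
grid = np.linspace(-20, 60, 2001)
density = gaussian_kde_1d(bills, grid)

area = density.sum() * (grid[1] - grid[0])  # numerical integral over the grid
print(round(area, 3))  # 1.0
```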
Histogram plot
plt.figure(figsize=(14, 5))
sns.histplot(data=tips,
x="total_bill",
stat="density", # normalize the histogram so that the total area of bars is 1
kde=True, # plot kernel density estimate
bins=25 # number of bins of the histogram
)

plt.title("Histogram plot")
plt.show()

Joint plot
The function sns.jointplot() produces a plot of data points together with marginal
subplots. Below we use it to add histograms on the margins of the x-axis and y-axis of a
scatter plot. While all previous examples used axes-level functions, sns.jointplot() is a
figure-level function, i.e. it builds the whole matplotlib figure.
sns.jointplot(data=tips,
x="total_bill",
y="tip",
kind="scatter", # plot kind
height=6.5, # plot height
)

plt.suptitle("Joint plot", y=1.02) # adds title to the plot


plt.show()

Pair plot
Some types of plots (e.g. the scatter plot or the line plot) show relationships between two
variables. The function sns.pairplot() is useful if we are dealing with more than two
variables. It produces a grid of subplots, one subplot for each pair of variables. Subplots on
the diagonal of the grid, which depend only on one variable, can be used to illustrate this
single variable using its histogram, KDE etc. The function sns.pairplot() is a figure-level
function.
sns.pairplot(data=tips,
vars = ["tip", "total_bill", "size"], # names of columns used for the plot
kind="scatter", # kind of plots for each pair of different columns
diag_kind="kde", # kind of plots on the diagonal
hue="sex",
height=2,
aspect=1.1,
palette="muted"
)

plt.suptitle("Pair plot", y=1.02) # adds title to the plot


plt.show()
