
Unit-3_part-2

This document provides an introduction to the Seaborn library, focusing on various types of categorical plots such as stripplots, swarmplots, boxplots, and violin plots. It also covers estimation plots like barplots and pointplots, as well as regression visualization using regplot and lmplot. The document includes code examples demonstrating how to create these plots using datasets like 'tips' and 'titanic'.

Uploaded by

sasek74445
Copyright
© © All Rights Reserved

Unit 3: Introduction to Seaborn Library (Part-2)


Categorical scatterplots

catplot() gives unified, figure-level access to the axes-level functions below through its kind parameter.

Categorical scatterplots:

stripplot() (with kind="strip"; the default)

swarmplot() (with kind="swarm")

Categorical distribution plots:

boxplot() (with kind="box")

violinplot() (with kind="violin")

boxenplot() (with kind="boxen")

Categorical estimate plots:

pointplot() (with kind="point")

barplot() (with kind="bar")

countplot() (with kind="count")

The default representation of the data in catplot() uses a scatterplot. There are actually two
different categorical scatter plots in seaborn. They take different approaches to resolving the main
challenge in representing categorical data with a scatter plot, which is that all of the points
belonging to one category would fall on the same position along the axis corresponding to the
categorical variable. The approach used by stripplot(), which is the default “kind” in catplot(), is to
adjust the positions of points on the categorical axis with a small amount of random “jitter”. The
second approach, used by swarmplot(), adjusts the points along the categorical axis so that they do
not overlap.


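import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

# Load the example "tips" dataset (downloaded on first use, then cached)
tips = sns.load_dataset("tips")
print(tips)

# With no kind given, catplot() draws a strip plot by default
sns.catplot(data=tips, x="day", y="total_bill")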

     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]


<seaborn.axisgrid.FacetGrid at 0x7fe610150a90>

sns.catplot(data=tips, x="day", y="total_bill", kind="strip")


<seaborn.axisgrid.FacetGrid at 0x7fe6100e8d50>

The jitter parameter controls the magnitude of jitter or disables it altogether:

sns.catplot(data=tips, x="day", y="total_bill", jitter=False)


<seaborn.axisgrid.FacetGrid at 0x7fe60dd6c610>
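
To make the idea of jitter concrete, the following sketch reproduces it by hand with the standard library only. The observations, the helper `jittered_positions`, and the width of 0.1 are all assumptions chosen for illustration; this is not a copy of seaborn's internal code.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical observations: (category, value) pairs
observations = [("Thur", 12.5), ("Thur", 18.0), ("Fri", 22.3),
                ("Fri", 9.9), ("Sat", 30.1), ("Sat", 15.4)]

# Each category gets an integer position on the categorical axis
categories = ["Thur", "Fri", "Sat"]
position = {name: i for i, name in enumerate(categories)}

def jittered_positions(obs, width):
    """Return (x, y) pairs with a uniform offset in [-width, width] added to x."""
    return [(position[cat] + random.uniform(-width, width), value)
            for cat, value in obs]

no_jitter = jittered_positions(observations, width=0.0)    # comparable to jitter=False
with_jitter = jittered_positions(observations, width=0.1)  # small illustrative width

for (x0, _), (x1, y) in zip(no_jitter, with_jitter):
    print(f"category position {x0:.0f} -> jittered x = {x1:.3f}, y = {y}")
```

Without jitter, every point in a category shares one x position and overlapping points hide each other; the random offset spreads them out while leaving the y values untouched.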

sns.catplot(data=tips, x="day", y="total_bill", kind="swarm")


<seaborn.axisgrid.FacetGrid at 0x7fe610102d50>

sns.catplot(data=tips, x="day", y="total_bill", hue="sex", kind="swarm")


<seaborn.axisgrid.FacetGrid at 0x7fe6100eb390>

sns.catplot(data=tips, x="smoker", y="tip", order=["No", "Yes"])


<seaborn.axisgrid.FacetGrid at 0x7fe60dd8cc90>

sns.catplot(data=tips, x="total_bill", y="day", hue="time", kind="swarm")


<seaborn.axisgrid.FacetGrid at 0x7fe60db6aa10>

Boxplots

sns.catplot(data=tips, x="day", y="total_bill", hue="smoker", kind="box")
<seaborn.axisgrid.FacetGrid at 0x7fe60db58910>

boxenplot

diamonds = sns.load_dataset("diamonds")
sns.catplot(
    data=diamonds.sort_values("color"),
    x="color", y="price", kind="boxen",
)
<seaborn.axisgrid.FacetGrid at 0x7fe60da9f610>

Violinplots

sns.catplot(
    data=tips, x="total_bill", y="day", hue="sex", kind="violin",
)
<seaborn.axisgrid.FacetGrid at 0x7fe60d845dd0>

sns.catplot(
    data=tips, x="total_bill", y="day", hue="sex",
    kind="violin", bw_adjust=.5, cut=0,
)
<seaborn.axisgrid.FacetGrid at 0x7fe60da59bd0>

sns.catplot(
    data=tips, x="day", y="total_bill", hue="sex",
    kind="violin", split=True,
)
<seaborn.axisgrid.FacetGrid at 0x7fe60d7c2bd0>

sns.catplot(
    data=tips, x="day", y="total_bill", hue="sex",
    kind="violin", inner="stick", split=True, palette="pastel",
)
<seaborn.axisgrid.FacetGrid at 0x7fe60d766590>

It can also be useful to combine swarmplot() or stripplot() with a box plot or violin plot to show
each observation along with a summary of the distribution:

g = sns.catplot(data=tips, x="day", y="total_bill", kind="violin", inner=None)


sns.swarmplot(data=tips, x="day", y="total_bill", color="k", size=3, ax=g.ax)
<Axes: xlabel='day', ylabel='total_bill'>
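
The same layering can be sketched with axes-level functions only, drawing both plots onto one Matplotlib Axes. The synthetic data frame `df` and its column names are assumptions for this example, and the "Agg" backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two categories with 30 observations each
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 30),
    "value": rng.normal(loc=5, scale=1, size=60),
})

fig, ax = plt.subplots()
# Summary of the distribution, with the interior left empty
sns.violinplot(data=df, x="group", y="value", inner=None, color=".9", ax=ax)
# Individual observations drawn on top of the same Axes
sns.stripplot(data=df, x="group", y="value", color="k", size=3, ax=ax)

n_artists = len(ax.collections)  # violin bodies plus scatter points
x_label = ax.get_xlabel()
plt.close(fig)
```

Passing the same `ax` to both calls is what places the two layers in one plot.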

Estimating central tendency


Bar Plots

titanic = sns.load_dataset("titanic")
sns.catplot(data=titanic, x="sex", y="survived", hue="class", kind="bar")
<seaborn.axisgrid.FacetGrid at 0x7fe60d6ba510>

sns.catplot(data=titanic, x="age", y="deck", errorbar=("pi", 95), kind="bar")


<seaborn.axisgrid.FacetGrid at 0x7fe60b165e50>

sns.catplot(data=titanic, x="deck", kind="count")


<seaborn.axisgrid.FacetGrid at 0x7fe60af9e590>

sns.catplot(
    data=titanic, y="deck", hue="class", kind="count",
    palette="pastel", edgecolor=".6",
)
<seaborn.axisgrid.FacetGrid at 0x7fe60b051290>

Point plots


An alternative style for visualizing the same information is offered by the pointplot() function. This
function also encodes the value of the estimate with height on the other axis, but rather than
showing a full bar, it plots the point estimate and confidence interval. Additionally, pointplot()
connects points from the same hue category. This makes it easy to see how the main relationship
is changing as a function of the hue semantic, because your eyes are quite good at picking up on
differences of slopes:

sns.catplot(data=titanic, x="sex", y="survived", hue="class", kind="point")


<seaborn.axisgrid.FacetGrid at 0x7fe60af91bd0>
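
The point estimate and confidence interval mentioned above can be approximated by hand. The sketch below computes a mean and a percentile bootstrap interval for one group using only the standard library; the outcome values and the helper `bootstrap_ci` are hypothetical and are not seaborn's own implementation, although seaborn's default error bar for these plots is likewise a bootstrapped 95% confidence interval.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical binary outcomes (1 = survived, 0 = did not) for one group
outcomes = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]

def bootstrap_ci(values, n_boot=1000, level=95):
    """Percentile bootstrap interval for the mean of `values`."""
    # Resample with replacement, recording the mean of each resample
    means = sorted(
        statistics.mean(random.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    tail = (100 - level) / 2 / 100  # e.g. 0.025 for a 95% interval
    low = means[int(tail * n_boot)]
    high = means[int((1 - tail) * n_boot) - 1]
    return low, high

estimate = statistics.mean(outcomes)  # the plotted point
low, high = bootstrap_ci(outcomes)    # the ends of the error bar
print(f"estimate = {estimate:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

A point plot draws `estimate` as the marker and the interval from `low` to `high` as the vertical line through it.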

sns.catplot(
    data=titanic, x="class", y="survived", hue="sex",
    palette={"male": "g", "female": "m"},
    markers=["^", "o"], linestyles=["-", "--"],
    kind="point"
)
<seaborn.axisgrid.FacetGrid at 0x7fe60aed2710>

Just like relplot(), the fact that catplot() is built on a FacetGrid means that it is easy to add
faceting variables to visualize higher-dimensional relationships:

sns.catplot(
    data=tips, x="day", y="total_bill", hue="smoker",
    kind="swarm", col="time", aspect=.7,
)
<seaborn.axisgrid.FacetGrid at 0x7fe60aef47d0>
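
To see what faceting produces, the sketch below builds a small synthetic data frame and inspects the grid that catplot() returns. The data frame `df` and its columns `group`, `session`, and `value` are invented for this example; the "Agg" backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: three groups observed in two sessions
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "group": np.tile(["a", "b", "c"], 20),
    "session": np.repeat(["morning", "evening"], 30),
    "value": rng.normal(loc=10, scale=2, size=60),
})

# col="session" asks for one panel per level of the session column
g = sns.catplot(data=df, x="group", y="value", kind="box", col="session")

grid_shape = g.axes.shape                        # (rows, columns) of the panel grid
titles = [ax.get_title() for ax in g.axes.flat]  # one title per panel
plt.close("all")
```

Each level of the faceting variable gets its own panel, so the grid here has one row and two columns.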

For further customization of the plot, you can use the methods on the FacetGrid object that it returns:

g = sns.catplot(
    data=titanic,
    x="fare", y="embark_town", row="class",
    kind="box", orient="h",
    sharex=False, margin_titles=True,
    height=1.5, aspect=4,
)
g.set(xlabel="Fare", ylabel="")
g.set_titles(row_template="{row_name} class")
# Format each x-axis tick label as a whole-dollar amount
for ax in g.axes.flat:
    ax.xaxis.set_major_formatter('${x:.0f}')

Regression Plot

Estimating regression fits:


The two functions that can be used to visualize a linear fit are regplot() and lmplot().

In the simplest invocation, both functions draw a scatterplot of two variables, x and y, and then fit
the regression model y ~ x and plot the resulting regression line and a 95% confidence interval for
that regression:

tips = sns.load_dataset("tips")
sns.regplot(x="total_bill", y="tip", data=tips);
sns.lmplot(x="total_bill", y="tip", data=tips);
sns.lmplot(x="size", y="tip", data=tips);
sns.lmplot(x="size", y="tip", data=tips, x_jitter=.05);
sns.lmplot(x="size", y="tip", data=tips, x_estimator=np.mean);
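
For intuition about the line these functions draw, the sketch below fits y ~ x by ordinary least squares with NumPy on synthetic data. The true slope of 0.1, the intercept of 1.0, and the noise level are assumptions chosen for the example, and the confidence band is left out.

```python
import numpy as np

# Hypothetical data: a linear trend plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(5, 50, 100)
y = 1.0 + 0.1 * x + rng.normal(scale=0.5, size=x.size)

# Degree-1 polynomial fit, i.e. a least-squares straight line
slope, intercept = np.polyfit(x, y, deg=1)

# Values along the fitted line, one per observed x
fitted = intercept + slope * x
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The recovered slope and intercept land close to the values used to generate the data, which is what the plotted regression line summarizes.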

When the y variable is binary, simple linear regression also “works” but provides implausible
predictions:

tips["big_tip"] = (tips.tip / tips.total_bill) > .15
sns.lmplot(x="total_bill", y="big_tip", data=tips,
           y_jitter=.03);
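
A small worked example shows why the predictions are implausible. The ten data points below are made up: y switches from 0 to 1 halfway along x, and a straight-line fit is extrapolated to both ends.

```python
import numpy as np

# Hypothetical binary outcome: 0 for x below 5, 1 from x = 5 upward
x = np.arange(10, dtype=float)
y = (x >= 5).astype(float)

# Least-squares straight line through the 0/1 values
slope, intercept = np.polyfit(x, y, deg=1)
predicted = intercept + slope * x

print(f"lowest prediction  = {predicted.min():.2f}")
print(f"highest prediction = {predicted.max():.2f}")
```

The lowest prediction falls below 0 and the highest rises above 1, so the fitted values cannot be read as probabilities.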

The solution in this case is to fit a logistic regression, such that the regression line shows the
estimated probability of y = 1 for a given value of x:

# logistic=True requires the statsmodels package to be installed
sns.lmplot(x="total_bill", y="big_tip", data=tips,
           logistic=True, y_jitter=.03);
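
The sketch below illustrates why a logistic curve stays within the range of a probability. It fits a one-variable logistic regression with plain gradient descent in NumPy; this is only a stand-in for illustration (seaborn relies on statsmodels for logistic=True), and the data, learning rate, and iteration count are assumptions.

```python
import numpy as np

# Hypothetical binary outcome that becomes more likely as x grows
x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1], dtype=float)

# Standardize x so that plain gradient descent behaves well
z = (x - x.mean()) / x.std()

def sigmoid(t):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# Gradient descent on the mean logistic (cross-entropy) loss
w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(5000):
    p = sigmoid(w * z + b)
    w -= learning_rate * np.mean((p - y) * z)
    b -= learning_rate * np.mean(p - y)

# Estimated probability of y = 1 at each observed x
probabilities = sigmoid(w * z + b)
```

Because every prediction passes through the sigmoid, the fitted curve rises with x but never leaves the interval between 0 and 1.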

Conditioning on other variables

The plots above show many ways to explore the relationship between a pair of variables. Often,
however, a more interesting question is “how does the relationship between these two variables
change as a function of a third variable?” This is where the main differences between regplot() and
lmplot() appear. While regplot() always shows a single relationship, lmplot() combines regplot()
with FacetGrid to show multiple fits using hue mapping or faceting.

The best way to separate out a relationship is to plot both levels on the same axes and to use color
to distinguish them:

sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips);


sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips,
           markers=["o", "x"], palette="Set1");
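
Conceptually, a hue variable leads to one separate fit per level. The sketch below mimics that with pandas and NumPy by fitting a least-squares line within each group of a synthetic data frame. The column names `x`, `y`, and `group`, and the true slopes of 0.05 and 0.15, are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the slope of y on x depends on the group
rng = np.random.default_rng(3)
n = 100
x = rng.uniform(5, 50, size=2 * n)
group = np.repeat(["Yes", "No"], n)
true_slope = np.where(group == "Yes", 0.05, 0.15)
y = 1.0 + true_slope * x + rng.normal(scale=0.5, size=2 * n)
df = pd.DataFrame({"x": x, "y": y, "group": group})

# One straight-line fit per level of the grouping variable
slopes = {}
for name, part in df.groupby("group"):
    slope, intercept = np.polyfit(part["x"], part["y"], deg=1)
    slopes[name] = slope

print(slopes)
```

Comparing the per-group slopes answers the question of how the relationship changes with the third variable; the colored lines in the plot show the same comparison visually.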

To add another variable, you can draw multiple “facets” with each level of the variable appearing
in the rows or columns of the grid:

sns.lmplot(x="total_bill", y="tip", hue="smoker", col="time", data=tips);


sns.lmplot(x="total_bill", y="tip", hue="smoker",
           col="time", row="sex", data=tips, height=3);

Plotting a regression in other contexts

A few other seaborn functions use regplot() in the context of a larger, more complex plot. The first
is the jointplot() function, which is covered in the seaborn distributions tutorial. In addition to
its other plot styles, jointplot() can use regplot() to show the linear regression fit on the
joint axes by passing kind="reg":

sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg");
