Unit-3_part-2
The default representation of the data in catplot() uses a scatterplot. There are actually two
different categorical scatter plots in seaborn. They take different approaches to resolving the main
challenge in representing categorical data with a scatter plot, which is that all of the points
belonging to one category would fall on the same position along the axis corresponding to the
categorical variable. The approach used by stripplot(), which is the default “kind” in catplot(), is to
adjust the positions of points on the categorical axis with a small amount of random “jitter”. The
other, swarmplot() (kind="swarm"), adjusts the points along the categorical axis with an algorithm
that prevents them from overlapping.
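A minimal sketch of the two approaches. It assumes a small hypothetical DataFrame `df` in place of the tips data, so no dataset download is needed. The `matplotlib.use("Agg")` line only lets the sketch run as a plain script and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs as a script
import pandas as pd
import seaborn as sns

# Hypothetical data: several bills share each category value ("Thu", "Fri").
df = pd.DataFrame({
    "day": ["Thu"] * 5 + ["Fri"] * 5,
    "total_bill": [10.0, 10.5, 11.0, 10.0, 12.0, 20.0, 21.5, 20.0, 19.0, 22.0],
})

# Default kind="strip": random jitter spreads points along the categorical axis.
strip_grid = sns.catplot(data=df, x="day", y="total_bill")

# kind="swarm": points are arranged along the categorical axis so they do not overlap.
swarm_grid = sns.catplot(data=df, x="day", y="total_bill", kind="swarm")
```

Both calls return a FacetGrid. Only the placement of the points along the "day" axis differs.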
Boxplots
sns.catplot(data=tips, x="day", y="total_bill", hue="smoker", kind="box")
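To see what each box encodes, here is a sketch with pandas on a hypothetical series of bills. The box spans the first to the third quartile, with a line at the median. With the default whis=1.5, the whiskers reach the most extreme points that lie within 1.5 times the interquartile range (IQR) of the box. Points beyond that are drawn individually as outliers.

```python
import pandas as pd

# Hypothetical bills for a single category, i.e. the data behind one box.
bills = pd.Series([8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 40.0])

q1, median, q3 = bills.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Whiskers stop at the most extreme observations within 1.5 * IQR of the box.
lower_whisker = bills[bills >= q1 - 1.5 * iqr].min()
upper_whisker = bills[bills <= q3 + 1.5 * iqr].max()

# Everything outside that range is plotted as a separate outlier marker.
outliers = bills[(bills < q1 - 1.5 * iqr) | (bills > q3 + 1.5 * iqr)]
print(q1, median, q3, lower_whisker, upper_whisker, list(outliers))
```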
Boxenplots
diamonds = sns.load_dataset("diamonds")
sns.catplot(
    data=diamonds.sort_values("color"),
    x="color", y="price", kind="boxen",
)
Violinplots
sns.catplot(
    data=tips, x="total_bill", y="day", hue="sex", kind="violin",
)
sns.catplot(
    data=tips, x="total_bill", y="day", hue="sex",
    kind="violin", bw_adjust=.5, cut=0,
)
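Each violin is a mirrored kernel density estimate (KDE). The sketch below is a simplified NumPy version, not seaborn's code. It assumes a Gaussian kernel with a Scott's-rule bandwidth, which bw_adjust multiplies (so bw_adjust=.5 halves it). It also assumes cut sets how many bandwidths the curve extends past the extreme data points, so cut=0 stops at the data range. The helpers `kde_1d` and `density_support` and the data values are hypothetical.

```python
import numpy as np

def kde_1d(data, grid, bw_adjust=1.0):
    """Gaussian KDE evaluated on `grid`; returns (density, bandwidth)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Scott's rule in one dimension, then scaled by bw_adjust.
    h = data.std(ddof=1) * n ** (-1 / 5) * bw_adjust
    z = (grid[:, None] - data[None, :]) / h
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    return density, h

def density_support(data, h, cut):
    """Interval the curve is drawn over: `cut` bandwidths beyond the data."""
    data = np.asarray(data, dtype=float)
    return data.min() - cut * h, data.max() + cut * h

values = np.array([0.0, 0.0, 0.0, 10.0])  # hypothetical observations
grid = np.linspace(-30.0, 40.0, 7001)

dens_default, h_default = kde_1d(values, grid)
dens_narrow, h_narrow = kde_1d(values, grid, bw_adjust=0.5)
```

A smaller bw_adjust gives a narrower kernel, so the curve follows the data more closely and shows sharper peaks.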
sns.catplot(
    data=tips, x="day", y="total_bill", hue="sex",
    kind="violin", split=True,
)
sns.catplot(
    data=tips, x="day", y="total_bill", hue="sex",
    kind="violin", inner="stick", split=True, palette="pastel",
)
titanic = sns.load_dataset("titanic")
sns.catplot(data=titanic, x="sex", y="survived", hue="class", kind="bar")
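By default, the height of each bar is the mean of y within its (x, hue) group, and the error bar is a bootstrapped 95% confidence interval. Because survived is coded 0/1, those means are survival rates. Here is a pandas sketch of the bar heights on a hypothetical miniature of the titanic table.

```python
import pandas as pd

# Hypothetical rows standing in for the titanic table (survived is 0 or 1).
titanic_mini = pd.DataFrame({
    "sex": ["male", "male", "male", "male", "male",
            "female", "female", "female", "female"],
    "class": ["First", "First", "Third", "Third", "Third",
              "First", "First", "Third", "Third"],
    "survived": [1, 0, 0, 0, 1, 1, 1, 1, 0],
})

# One bar per (sex, class) pair; its height is the group mean of survived.
bar_heights = titanic_mini.groupby(["sex", "class"])["survived"].mean()
print(bar_heights)
```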
sns.catplot(
    data=titanic, y="deck", hue="class", kind="count",
    palette="pastel", edgecolor=".6",
)
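With kind="count", only one positional variable is given (here y="deck"). Each bar's length is the number of rows in its group rather than a statistic of another column. Here is a pandas sketch of those counts on hypothetical data.

```python
import pandas as pd

# Hypothetical deck/class rows; a count plot tallies rows per (deck, class).
passengers = pd.DataFrame({
    "deck": ["A", "B", "B", "C", "C", "C"],
    "class": ["First", "First", "First", "First", "Second", "First"],
})

# Number of rows in each (deck, class) group = length of each bar.
bar_lengths = passengers.groupby(["deck", "class"]).size()
print(bar_lengths)
```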
sns.catplot(
    data=titanic, x="class", y="survived", hue="sex",
    palette={"male": "g", "female": "m"},
    markers=["^", "o"], linestyles=["-", "--"],
    kind="point",
)
sns.catplot(
    data=tips, x="day", y="total_bill", hue="smoker",
    kind="swarm", col="time", aspect=.7,
)
In the simplest invocation, both regplot() and lmplot() draw a scatterplot of two variables, x and y,
then fit the regression model y ~ x and plot the resulting regression line together with a 95%
confidence interval for that regression:
import numpy as np
import seaborn as sns
tips = sns.load_dataset("tips")
sns.regplot(x="total_bill", y="tip", data=tips);
sns.lmplot(x="total_bill", y="tip", data=tips);
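With the default order=1, the fitted line is an ordinary least-squares fit of y on x. As a sketch, NumPy's polyfit recovers the slope and intercept of such a line. The points below are hypothetical, chosen to lie exactly on tip = 0.15 * total_bill + 0.5.

```python
import numpy as np

# Hypothetical points lying exactly on tip = 0.15 * total_bill + 0.5.
total_bill = np.array([10.0, 20.0, 30.0, 40.0])
tip = 0.15 * total_bill + 0.5

# Least-squares fit of a degree-1 polynomial, i.e. the model tip ~ total_bill.
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```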
sns.lmplot(x="size", y="tip", data=tips);
sns.lmplot(x="size", y="tip", data=tips, x_jitter=.05);
sns.lmplot(x="size", y="tip", data=tips, x_estimator=np.mean);
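When x is discrete, as with size, x_estimator=np.mean collapses the points at each x value into a single estimate with a confidence interval, instead of drawing every observation. The regression line is still fitted to the original data. The per-value estimates are group means, sketched here with pandas on hypothetical data.

```python
import pandas as pd

# Hypothetical party sizes and tips.
df = pd.DataFrame({"size": [2, 2, 3, 3, 3], "tip": [2.0, 3.0, 3.0, 4.0, 5.0]})

# One plotted point per unique size: the mean tip for that size.
per_size = df.groupby("size")["tip"].mean()
print(per_size)
```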
When the y variable is binary, simple linear regression also “works” but provides implausible
predictions, because a straight line is not bounded to the interval [0, 1]:
tips["big_tip"] = (tips.tip / tips.total_bill) > .15
sns.lmplot(x="total_bill", y="big_tip", data=tips,
           y_jitter=.03);
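The following NumPy sketch shows why the predictions are implausible, using hypothetical 0/1 outcomes. The least-squares line is unbounded, so outside the observed x range it predicts "probabilities" below 0 or above 1.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])  # hypothetical binary outcome

# Straight-line fit to a 0/1 response: slope 0.4, intercept -0.1.
slope, intercept = np.polyfit(x, y, deg=1)

pred_low = np.polyval([slope, intercept], -2.0)   # about -0.9: below 0
pred_high = np.polyval([slope, intercept], 6.0)   # about 2.3: above 1
print(pred_low, pred_high)
```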
The solution in this case is to fit a logistic regression, such that the regression line shows the
estimated probability of y = 1 for a given value of x. In regplot() and lmplot() this is requested
with logistic=True, which requires the statsmodels package.
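seaborn delegates the logistic fit to statsmodels. Purely to illustrate the model, here is a self-contained NumPy sketch, not seaborn's implementation. It fits P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method on hypothetical data. The helper names `fit_logistic` and `predict_proba` are hypothetical.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Return (b0, b1) maximizing the logistic log-likelihood via Newton's method."""
    X = np.column_stack([np.ones_like(x), x])  # intercept column plus x
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))         # current fitted probabilities
        grad = X.T @ (y - p)                         # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])    # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)    # Newton update
    return beta

def predict_proba(beta, x):
    """Estimated P(y = 1) at each x."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])  # hypothetical, not perfectly separable

beta = fit_logistic(x, y)
p_low, p_mid, p_high = predict_proba(beta, np.array([-10.0, 2.5, 20.0]))
print(beta, p_low, p_mid, p_high)
```

Unlike the straight line, the fitted probabilities stay strictly between 0 and 1 even far outside the data. With the tips data, the corresponding seaborn call would be `sns.lmplot(x="total_bill", y="big_tip", data=tips, logistic=True, y_jitter=.03)`.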
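When a relationship may differ across the levels of a third, categorical variable (for example,
smoker in the tips data), the best way to separate it out is to plot both levels on the same axes and
use color (the hue parameter) to distinguish them: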