
Data Visualization in Python With Libraries

The document provides a comprehensive guide to data visualization in Python, detailing various libraries such as Matplotlib and Seaborn for creating different types of visual representations like line charts, bar graphs, histograms, scatter plots, and heat maps. It explains the significance of data visualization in analyzing trends and correlations in data, particularly for large datasets. The article also includes practical examples and comparisons between the libraries to help users choose the appropriate tools for their visualization needs.

Uploaded by

pichedekho3
Copyright
© All Rights Reserved

Data Visualization in Python With Libraries
The process of finding trends and correlations in our data by representing it pictorially is called data visualization. To perform data visualization in Python, we can use various Python data visualization modules such as Matplotlib, Seaborn, Plotly, etc. In this article, The Complete Guide to Data Visualization in Python, we will discuss how to work with some of these modules and cover the following topics in detail.

• What is Data Visualization?
• Data Visualization in Python
• Matplotlib and Seaborn
• Line Charts
• Bar Graphs
• Histograms
• Scatter Plots
• Heat Maps

What is Data Visualization?

Data visualization is a field of data analysis that deals with the visual representation of data. It plots data graphically and is an effective way to communicate the inferences drawn from it.

Using data visualization, we can get a visual summary of our data. With pictures,
maps and graphs, the human mind has an easier time processing and understanding
any given data. Data visualization plays a significant role in the representation of
both small and large data sets, but it is especially useful when we have large data
sets, in which it is impossible to see all of our data, let alone process and understand
it manually.

Data Visualization in Python

Python offers several plotting libraries, such as Matplotlib, Seaborn and Plotly, each with different features for creating informative, customized, and appealing plots that present data simply and effectively.

Figure 1: Data visualization

Matplotlib and Seaborn

Matplotlib and Seaborn are Python libraries used for data visualization. Both have built-in functions for plotting different kinds of graphs. While Matplotlib is often used to embed graphs into applications, Seaborn is primarily used for statistical graphs.
But when should we use each of the two? Let’s answer this with a comparative analysis. The table below compares Python’s two well-known visualization packages, Matplotlib and Seaborn.

Matplotlib | Seaborn
---------- | -------
Used for basic graph plotting, such as line charts, bar graphs, etc. | Mainly used for statistical visualization; can perform complex visualizations with fewer commands.
Mainly works with datasets and arrays. | Works with entire datasets.
Works productively with data arrays and frames, and treats axes and figures as objects. | Considerably more organized and functional than Matplotlib; treats the entire dataset as a single unit.
More customizable, and pairs well with Pandas and NumPy for exploratory data analysis. | Has more built-in themes and is mainly used for statistical analysis.

Table 1: Matplotlib vs Seaborn


Line Charts

A line chart is a graph that represents information as a series of data points connected by straight line segments. Each data point, or marker, is plotted and then joined to the next one with a line or curve.

Let's consider the apple yield (tons per hectare) in Kanto. Let's plot a line graph
using this data and see how the yield of apples changes over time. We start by
importing Matplotlib and Seaborn.

Figure 2: Importing necessary modules
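The import statements themselves are not reproduced in this text. A typical set, assuming the conventional aliases `plt` and `sns` (NumPy and pandas are additions that later sketches in this section also use), might look like this:

```python
# Conventional aliases; the sketches in this article use these names
import matplotlib.pyplot as plt  # Matplotlib's pyplot plotting interface
import seaborn as sns            # statistical plotting built on top of Matplotlib
import numpy as np               # numerical arrays (used later for bin edges)
import pandas as pd              # DataFrames (used later for tabular datasets)
```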

Using Matplotlib

We are using random data points to represent the yield of apples.


Figure 3: Plotting apple yield
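A minimal sketch of this step. The yield values below are made up for illustration and are not the article's data:

```python
import matplotlib.pyplot as plt

# Made-up apple yields in tons per hectare, one value per year
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]

# Given a single sequence, plt.plot uses the positions 0, 1, 2, ... as x-values
lines = plt.plot(yield_apples)
plt.show()
```

Because no x-values are passed, the x-axis only shows list positions, which is why the next step adds the years.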

To make the graph and its purpose easier to understand, we can also add the x-axis values and label both axes.
Figure 4: Axis values

Figure 5: Axis with labels
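A sketch of adding x-axis values and axis labels, assuming hypothetical years 2010 to 2015 and the same made-up yields:

```python
import matplotlib.pyplot as plt

# Made-up data: years for the x-axis, apple yields (tons per hectare) for the y-axis
years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]

line, = plt.plot(years, yield_apples)  # the first argument supplies the x-values
plt.xlabel("Year")
plt.ylabel("Yield (tons per hectare)")
plt.show()
```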

To plot multiple datasets on the same graph, just use the plt.plot function once for
each dataset. Let's use this to compare the yields of apples vs. oranges on the same
graph.
Figure 6: Plotting multiple graphs
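A sketch of plotting two datasets on one graph by calling `plt.plot` once per dataset; the orange yields are made up, like the apple yields:

```python
import matplotlib.pyplot as plt

# Made-up yields (tons per hectare) for two crops over the same years
years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]
yield_oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908]

# Each plt.plot call adds another line to the same axes
apple_line, = plt.plot(years, yield_apples)
orange_line, = plt.plot(years, yield_oranges)
plt.xlabel("Year")
plt.ylabel("Yield (tons per hectare)")
plt.show()
```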

We can add a legend which tells us what each line in our graph means. To
understand what we are plotting, we can add a title to our graph.
Figure 7: Plotting multiple graphs
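A sketch of adding a legend and a title to the two-line chart; the title text is an illustrative choice, and the data is made up:

```python
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]   # made-up values
yield_oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908]  # made-up values

apple_line, = plt.plot(years, yield_apples)
plt.plot(years, yield_oranges)
plt.xlabel("Year")
plt.ylabel("Yield (tons per hectare)")
plt.title("Crop Yields in Kanto")
plt.legend(["Apples", "Oranges"])  # labels are matched to lines in plotting order
plt.show()
```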

To show each data point on our graph, we can highlight them with markers using the
marker argument. Many different marker shapes like a circle, cross, square,
diamond, etc. are provided by Matplotlib.
Figure 8: Using markers
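A sketch using the `marker` argument. Matplotlib marker codes include `"o"` (circle), `"x"` (cross), `"s"` (square) and `"D"` (diamond); which markers the original figure used is not stated, so these are arbitrary picks:

```python
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]   # made-up values
yield_oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908]  # made-up values

apple_line, = plt.plot(years, yield_apples, marker="o")    # circles at each point
orange_line, = plt.plot(years, yield_oranges, marker="x")  # crosses at each point
plt.xlabel("Year")
plt.ylabel("Yield (tons per hectare)")
plt.legend(["Apples", "Oranges"])
plt.show()
```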

You can use the plt.figure function to change the size of the figure.
Figure 9: Changing graph size
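A sketch of resizing the figure; `figsize=(12, 6)` is an arbitrary example size, given as width and height in inches:

```python
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]  # made-up values

# plt.figure must be called before plotting so that the plot lands on the new figure
fig = plt.figure(figsize=(12, 6))
line, = plt.plot(years, yield_apples, marker="o")
plt.title("Yield of Apples (tons per hectare)")
plt.show()
```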

Using Seaborn

An easy way to make your charts look beautiful is to use some default styles from
the Seaborn library. These can be applied globally using the sns.set_style function.
Figure 10: Using Seaborn

We can also use the darkgrid option to change the background color to a darker
shade.
Figure 11: Using darkgrid in Seaborn
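A sketch of applying Seaborn styles globally with `sns.set_style`. The text does not say which style Figure 10 uses, so `"whitegrid"` for the first chart is an assumption; the yields are made up:

```python
import matplotlib.pyplot as plt
import seaborn as sns

years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]   # made-up values
yield_oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908]  # made-up values

# set_style changes Matplotlib's global settings, so it affects every later plot
sns.set_style("whitegrid")   # white background with grid lines
plt.plot(years, yield_apples, marker="o")
plt.plot(years, yield_oranges, marker="x")
plt.legend(["Apples", "Oranges"])
plt.show()

sns.set_style("darkgrid")    # grey background with white grid lines
plt.figure()                 # new axes are needed to pick up the new style
plt.plot(years, yield_apples, marker="o")
plt.plot(years, yield_oranges, marker="x")
plt.legend(["Apples", "Oranges"])
plt.show()
```

Styles are applied when axes are created, which is why the sketch starts a new figure after switching to `"darkgrid"`.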

Bar Graphs

When you have categorical data, you can represent it with a bar graph. A bar graph draws one bar per category: the category is placed on the x-axis, and the bar's height on the y-axis shows the value for that category.
Figure 12: Plotting Bar graphs
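A minimal bar graph sketch with `plt.bar`, treating each (hypothetical) year as a category and using made-up orange yields as the bar heights:

```python
import matplotlib.pyplot as plt

# Made-up orange yields (tons per hectare); each year acts as one category
years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908]

bars = plt.bar(years, yield_oranges)  # one bar per category, height = value
plt.xlabel("Year")
plt.ylabel("Yield (tons per hectare)")
plt.show()
```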

We can also stack bars on top of each other. Let's plot the data for apples and
oranges.
Figure 13: Plotting stacked bar graphs
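A sketch of stacking: the second `plt.bar` call uses `bottom=` so each orange bar starts where the apple bar ends (made-up data again):

```python
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]   # made-up values
yield_oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908]  # made-up values

plt.bar(years, yield_apples, label="Apples")
# bottom= shifts each orange bar up by the height of the apple bar beneath it
orange_bars = plt.bar(years, yield_oranges, bottom=yield_apples, label="Oranges")
plt.legend()
plt.show()
```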

Let’s use the tips dataset in Seaborn next. It records customers visiting a restaurant over a week and includes:

• Sex (gender)
• Time of day
• Total bill
• Tip given

Figure 14: Tips dataset

We can draw a bar chart to visualize how the average bill amount varies across
different days of the week. We can do this by computing the day-wise averages and
then using plt.bar. The Seaborn library also provides a barplot function that can
automatically compute averages.
Figure 15: Plotting averages of each bar
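A sketch of both approaches. To keep it self-contained, `tips_df` is a small hypothetical stand-in with the same column names as Seaborn's tips dataset and made-up values; with an internet connection, `sns.load_dataset("tips")` loads the real table.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips dataset (same column names, made-up values)
tips_df = pd.DataFrame({
    "day": ["Thur"] * 4 + ["Fri"] * 4 + ["Sat"] * 4 + ["Sun"] * 4,
    "sex": ["Male", "Female"] * 8,
    "total_bill": [18.0, 16.0, 20.0, 14.0, 17.0, 13.0, 15.0, 15.0,
                   23.0, 21.0, 25.0, 19.0, 24.0, 22.0, 26.0, 20.0],
})

# Option 1: compute day-wise averages with pandas, then draw them with plt.bar
avg_bill = tips_df.groupby("day", sort=False)["total_bill"].mean()  # keep day order
plt.bar(avg_bill.index, avg_bill)
plt.show()

# Option 2: sns.barplot computes the mean per category itself (and adds error bars)
plt.figure()
ax = sns.barplot(x="day", y="total_bill", data=tips_df)
plt.show()
```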

If you want to compare bar plots side by side, you can use the hue argument. Each category's bar is then split into several bars, one per value of the third variable passed to hue.

Figure 16: Plotting multiple bar graphs
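A sketch of the `hue` argument, splitting each day's bar by the `sex` column of the same hypothetical tips stand-in (made-up values):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips dataset (same column names, made-up values)
tips_df = pd.DataFrame({
    "day": ["Thur"] * 4 + ["Fri"] * 4 + ["Sat"] * 4 + ["Sun"] * 4,
    "sex": ["Male", "Female"] * 8,
    "total_bill": [18.0, 16.0, 20.0, 14.0, 17.0, 13.0, 15.0, 15.0,
                   23.0, 21.0, 25.0, 19.0, 24.0, 22.0, 26.0, 20.0],
})

# hue="sex" draws one bar per sex within each day, colored by sex
ax = sns.barplot(x="day", y="total_bill", hue="sex", data=tips_df)
plt.show()
```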


You can make the bars horizontal by switching the axes.

Figure 17: Plotting horizontal bar graphs
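A sketch of switching the axes: passing the numeric column as `x` and the categorical column as `y` lets Seaborn infer a horizontal orientation. The data is the same hypothetical stand-in:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips dataset (same column names, made-up values)
tips_df = pd.DataFrame({
    "day": ["Thur"] * 4 + ["Fri"] * 4 + ["Sat"] * 4 + ["Sun"] * 4,
    "total_bill": [18.0, 16.0, 20.0, 14.0, 17.0, 13.0, 15.0, 15.0,
                   23.0, 21.0, 25.0, 19.0, 24.0, 22.0, 26.0, 20.0],
})

# Numeric column on x, categorical column on y -> horizontal bars
ax = sns.barplot(x="total_bill", y="day", data=tips_df)
plt.show()
```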

Histograms

A histogram is a bar representation of how data is distributed over a range of values. The x-axis is divided into ranges (bins), and the height of each bar along the y-axis shows how many data points fall into that bin. Let's use the ‘Iris’ data, which contains information about flowers, to plot histograms.
Figure 18: Iris dataset

Now, let’s plot a histogram using the hist() function.

Figure 19: Plotting histograms
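A sketch with `plt.hist`. The values are synthetic stand-ins for the iris `sepal_width` column (generated with a fixed seed, not real measurements); `sns.load_dataset("iris")` would provide the real data.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for iris sepal widths: 150 made-up values, fixed seed
rng = np.random.default_rng(0)
sepal_width = rng.normal(loc=3.0, scale=0.4, size=150)

# plt.hist returns the count per bin, the bin edges, and the drawn bar patches
counts, bin_edges, patches = plt.hist(sepal_width)  # 10 equal-width bins by default
plt.title("Distribution of Sepal Width")
plt.show()
```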

We can control the number or size of bins too.


Figure 20: Changing number of bins
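A sketch of controlling the bin count with an integer `bins` argument, on the same kind of synthetic data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sepal_width = rng.normal(loc=3.0, scale=0.4, size=150)  # synthetic stand-in data

# An integer splits the data range into that many equal-width bins
counts, bin_edges, patches = plt.hist(sepal_width, bins=5)
plt.show()
```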

We can also set the number and size of bins using NumPy.
Figure 21: Changing number and size of bins
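A sketch that builds explicit bin edges with `np.arange`; the range 2 to 5 and width 0.25 are illustrative choices, and the data is synthetic:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sepal_width = rng.normal(loc=3.0, scale=0.4, size=150)  # synthetic stand-in data

# Edges 2.0, 2.25, ..., 5.0 -> 12 bins of width 0.25 (values outside 2-5 are not counted)
bins = np.arange(2, 5.25, 0.25)
counts, bin_edges, patches = plt.hist(sepal_width, bins=bins)
plt.show()
```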

We can create bins of unequal size too.

Figure 22: Bins of unequal size
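A sketch of unequal bins: passing a list of edges gives bins of different widths (the edges here are arbitrary examples, the data synthetic):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sepal_width = rng.normal(loc=3.0, scale=0.4, size=150)  # synthetic stand-in data

# Three bins: [1, 3), [3, 4), [4, 4.5]; each bar is as wide as its bin
counts, bin_edges, patches = plt.hist(sepal_width, bins=[1, 3, 4, 4.5])
plt.show()
```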

Similar to line charts, we can draw multiple histograms in a single chart. We can
reduce each histogram's opacity so that one histogram's bars don't hide the others'.
Let's draw separate histograms for each species of flowers.
Figure 23: Multiple histograms
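A sketch of overlaid histograms with reduced opacity via `alpha`. The per-species values are synthetic (made-up means and spreads, not the real iris measurements):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic sepal widths per species (made up; not the real iris measurements)
sepal_widths = {
    "setosa": rng.normal(3.4, 0.4, 50),
    "versicolor": rng.normal(2.8, 0.3, 50),
    "virginica": rng.normal(3.0, 0.3, 50),
}
bins = np.arange(2, 5.25, 0.25)

for species, widths in sepal_widths.items():
    # alpha < 1 makes the bars semi-transparent so overlapping bars stay visible
    counts, edges, patches = plt.hist(widths, bins=bins, alpha=0.4, label=species)
plt.legend()
plt.show()
```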

Multiple histograms can be stacked on top of one another by setting the stacked
parameter to True.
Figure 24: Stacking histograms
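A sketch of stacking: passing all datasets in one `plt.hist` call with `stacked=True` draws each species' bars on top of the previous ones (same synthetic data as above):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic sepal widths per species (made up; not the real iris measurements)
sepal_widths = {
    "setosa": rng.normal(3.4, 0.4, 50),
    "versicolor": rng.normal(2.8, 0.3, 50),
    "virginica": rng.normal(3.0, 0.3, 50),
}
bins = np.arange(2, 5.25, 0.25)

# One call, one list of datasets; stacked=True piles the bars instead of overlapping them
counts, edges, patches = plt.hist(
    list(sepal_widths.values()),
    bins=bins,
    stacked=True,
    label=list(sepal_widths.keys()),
)
plt.legend()
plt.show()
```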

Scatter Plots

Scatter plots are used to show the relationship between two numeric variables. Each observation is drawn as a point whose x- and y-coordinates are its values for those two variables, so the points are scattered across the graph rather than joined by lines. A third, categorical variable can be shown by giving each category a different color. Let's use the ‘Iris’ dataset to plot a scatter plot.

Figure 25: Iris Dataset

First, let’s see how many different species of flowers we have.


Figure 26: Unique flower species
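A sketch of listing the species. `iris_df` here is a few hand-entered rows laid out like Seaborn's iris dataset, a stand-in for `sns.load_dataset("iris")`, which returns all 150 rows:

```python
import pandas as pd

# A few hand-entered rows in the layout of the iris dataset (stand-in data)
iris_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# unique() lists each distinct value once, in order of first appearance
print(iris_df["species"].unique())
```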

Let’s try plotting the data with the help of a line chart.

Figure 27: Plotting line chart

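This is not very informative: the connecting lines make it impossible to see the relationship between individual data points. A scatter plot, which draws each observation as a separate point, works better.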
Figure 28: Scatter plot
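A sketch with `sns.scatterplot`, using the same hand-entered stand-in rows rather than the full iris dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A few hand-entered rows in the layout of the iris dataset (stand-in data)
iris_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# Each row becomes one point: x = sepal length, y = sepal width
ax = sns.scatterplot(x="sepal_length", y="sepal_width", data=iris_df)
plt.show()
```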

This is much better. But we still cannot tell which points belong to which species. We can color the dots by using the flower species as the hue.

Figure 29: Scatter plot with multiple colors
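A sketch of coloring the points by species with `hue`, on the same stand-in rows:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A few hand-entered rows in the layout of the iris dataset (stand-in data)
iris_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# hue="species" gives each species its own color and adds a legend
ax = sns.scatterplot(x="sepal_length", y="sepal_width", hue="species", data=iris_df)
plt.show()
```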

Since Seaborn uses Matplotlib's plotting functions internally, we can use functions
like plt.figure and plt.title to modify the figure.
Figure 30: Changing dimensions of scatter plot
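A sketch of combining Matplotlib's `plt.figure` and `plt.title` with a Seaborn plot; the size and title text are illustrative choices:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A few hand-entered rows in the layout of the iris dataset (stand-in data)
iris_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

plt.figure(figsize=(12, 6))     # Matplotlib call: set the figure size first
plt.title("Sepal Dimensions")   # Matplotlib call: title on the current axes
# Seaborn draws on the current Matplotlib axes, so the size and title carry over
ax = sns.scatterplot(x="sepal_length", y="sepal_width", hue="species", data=iris_df)
plt.show()
```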

Heat Maps

Heatmaps are used to see patterns and gradual changes in data. A heatmap uses different colors to represent different values, and the way these colors vary in hue and intensity tells us how the underlying quantity changes. Let's use a heatmap to visualize monthly passenger footfall at an airport over 12 years from the flights dataset in Seaborn.
Figure 31: Flights dataset

The dataset above, flights_df, shows the monthly footfall at an airport for each year from 1949 to 1960. The values represent the number of passengers (in thousands) that passed through the airport. Let’s use a heatmap to visualize this data.
Figure 32: Plotting heatmap
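A sketch of reshaping long-format data into a month-by-year grid and drawing it with `sns.heatmap`. `flights_long` holds a few hand-entered rows in the layout of Seaborn's flights dataset (a stand-in for `sns.load_dataset("flights")`, which returns all 144 monthly rows):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-entered rows in the layout of the flights dataset (stand-in data)
flights_long = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "passengers": [112, 118, 132, 115, 126, 141],
})

# Reshape to a month x year grid; pivot sorts the index, so restore calendar order
flights_df = (
    flights_long.pivot(index="month", columns="year", values="passengers")
    .reindex(["Jan", "Feb", "Mar"])
)

# Each cell's color encodes the passenger count for that month and year
ax = sns.heatmap(flights_df)
ax.set_title("Monthly Footfall (thousands of passengers)")
plt.show()
```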

The brighter the color, the higher the footfall at the airport. By looking at the graph, we can infer that:

1. Within any given year, footfall is highest around July and August.

2. Footfall generally grows from year to year: a given month usually has a higher footfall than the same month in earlier years.

Let's display the actual values in our heatmap and change the color map to shades of blue.
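A sketch using `annot=True` to print each cell's value, `fmt="d"` to format the counts as integers, and `cmap="Blues"` for a blue color map (same hand-entered stand-in rows as before):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-entered rows in the layout of the flights dataset (stand-in data)
flights_long = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "passengers": [112, 118, 132, 115, 126, 141],
})
flights_df = (
    flights_long.pivot(index="month", columns="year", values="passengers")
    .reindex(["Jan", "Feb", "Mar"])
)

# annot=True writes the value in each cell; fmt="d" formats it as an integer
ax = sns.heatmap(flights_df, annot=True, fmt="d", cmap="Blues")
plt.show()
```

With the `"Blues"` color map, darker cells mean more passengers.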
