Matplotlib Data Visualization Techniques

The document is a course syllabus for the unit on data visualization in the second year of a Bachelor of Technology program in Information Technology. It includes an introduction to the unit, a list of topics to be covered such as importing Matplotlib and creating different types of plots, and sample questions. It was compiled by a professor and verified by the head of the department and principal.


CS3352 FOUNDATIONS OF DATA SCIENCE

II YEAR / III SEMESTER B.Tech.- INFORMATION TECHNOLOGY

UNIT V
DATA VISUALIZATION

COMPILED BY,

Prof.M.KARTHIKEYAN, M.E., HoD / IT

VERIFIED BY

HOD PRINCIPAL CEO/CORRESPONDENT

DEPARTMENT OF INFORMATION TECHNOLOGY

SENGUNTHAR COLLEGE OF ENGINEERING, TIRUCHENGODE – 637 205.


UNIT V
DATA VISUALIZATION

➢ Importing Matplotlib – Line plots – Scatter plots
➢ Visualizing errors – Density and contour plots
➢ Histograms – Legends – Colors
➢ Subplots – Text and annotation – Customization
➢ Three-dimensional plotting
➢ Geographic Data with Basemap
➢ Visualization with Seaborn

LIST OF IMPORTANT QUESTIONS
UNIT V
DATA VISUALIZATION
PART – A
1. What is Data Visualization?
2. Which are the best libraries for data visualization in python?
3. How can we visualize more than three dimensions of data in a single chart?
4. How to plot the distribution of customers by age?
5. What is the purpose of a Scatter plot?
6. Define IQR in a box plot.
7. What is a Boxplot?
8. What is a heat map in Python? Create a correlation matrix using the corr function of
the data frame.
9. What is a scatter plot? For what type of data is a scatter plot usually used?
10. What features might be visible in scatterplots?
11. What type of plot would you use if you need to demonstrate “relationship” between
variables/parameters?
12. When will you use a histogram and when will you use a bar chart?
13. What is an outlier?
14. What type of data is box-plots usually used for? Why?
15. What information could you gain from a box-plot?

PART – B

1. Explain in detail about three-dimensional Plotting using Matplotlib.


2. Explain various types of plotting using Matplotlib in python.
3. Mapping Geographical Data with Basemap Python Package
4. Explain in detail about Data Visualization Using Seaborn.
5. Explain with example about Matplotlib Subplot.

LIST OF IMPORTANT QUESTIONS
UNIT V
DATA VISUALIZATION

PART – A

1. What is Data Visualization?


After processing the data, we have to present it so that others can understand it. This process
is known as data visualization. Data can be presented or visualized using tools and techniques
such as infographics, graphs, fever charts, and histograms, to name a few. Data visualization
helps present trends and patterns to customers, stakeholders, and team members for
activities as varied as driving sales, product development, and performance analysis.
2. Which are the best libraries for data visualization in python?
Four libraries are commonly used for data visualization in Python:

• Matplotlib
• Seaborn
• Plotly
• Bokeh
3. How can we visualize more than three dimensions of data in a single chart?
To visualize data beyond three dimensions, we need to use visual cues such as color, size, and
shape.
• Color is used to depict both continuous and categorical data.
• Marker size is used to represent continuous data. It can be used for categorical data as
well; however, since size differences are difficult to detect, it is not considered the most
appropriate choice for categorical data.
• Shapes are used to represent different classes.
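The three visual cues listed above can be combined in a single scatter plot. The following is a minimal sketch, assuming randomly generated data; the variable names (temperature, weight, group) are hypothetical and chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data with five dimensions per observation
rng = np.random.default_rng(0)
n = 60
x = rng.normal(size=n)                     # dimension 1: x position
y = rng.normal(size=n)                     # dimension 2: y position
temperature = rng.uniform(0, 40, size=n)   # dimension 3: shown as color
weight = rng.uniform(20, 200, size=n)      # dimension 4: shown as marker size
group = rng.integers(0, 2, size=n)         # dimension 5: shown as marker shape

fig, ax = plt.subplots()
markers = {0: "o", 1: "^"}  # one marker shape per class
for g, marker in markers.items():
    mask = group == g
    points = ax.scatter(x[mask], y[mask], c=temperature[mask], s=weight[mask],
                        marker=marker, cmap="viridis", vmin=0, vmax=40,
                        alpha=0.7, label=f"group {g}")
fig.colorbar(points, ax=ax, label="temperature")
ax.legend()
```

Calling plt.show() afterwards displays the figure. Each class is drawn with its own scatter() call because a single call accepts only one marker shape.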
4. How to plot the distribution of customers by age?
The distribution of customers by age can be plotted simply by creating a histogram from the Age
column of the customer’s DataFrame.
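As a minimal sketch of this answer, assuming a small hypothetical customers DataFrame (real data would normally be loaded from a file or database):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical customer data, used only for illustration
customers = pd.DataFrame({"Age": [23, 25, 31, 35, 35, 38, 41, 44, 47, 52, 58, 63]})

fig, ax = plt.subplots()
# hist() returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, _ = ax.hist(customers["Age"], bins=5, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Number of customers")
```

Calling plt.show() afterwards displays the histogram. The number of bins (5 here) is an arbitrary choice and usually needs tuning to the data.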
5. What is the purpose of a Scatter plot?
Scatter plots are used to observe relationships between two different numeric variables.
6. Define IQR in a box plot.
IQR stands for interquartile range. In a box plot, the IQR is the length of the box, that is, the
distance between the first quartile (Q1) and the third quartile (Q3).
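The IQR can also be computed directly. This is a small sketch using NumPy's percentile function on made-up values:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # made-up sample
q1, q3 = np.percentile(data, [25, 75])  # lower and upper quartiles
iqr = q3 - q1                           # length of the box in a box plot
print(q1, q3, iqr)  # prints 3.0 7.0 4.0
```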
7. What is a Boxplot?
A Box and Whisker Plot (or Boxplot) is used to represent a data distribution through
its quartiles. The graph looks like a rectangle with lines extending from the top and bottom.
These lines are known as the “whiskers”, and represent the variability outside the upper and
lower quartiles.
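8. What is a heat map in Python? Create a correlation matrix using the corr function of
the data frame.
Heatmaps are used to cross-examine multivariate data and represent it through color
variations. A correlation matrix can be computed with the DataFrame's corr() function and then
displayed as a heatmap.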
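The following sketch shows one way to build a correlation matrix and draw it as a heatmap. It assumes a small synthetic DataFrame with hypothetical columns a, b, and c, and uses Matplotlib's imshow() for the drawing; seaborn's heatmap() function is a common alternative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data: b depends on a, c is independent noise
rng = np.random.default_rng(1)
a = rng.normal(size=100)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=100),
    "c": rng.normal(size=100),
})

corr = df.corr()  # pairwise Pearson correlation between the columns

fig, ax = plt.subplots()
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax, label="correlation")
```

Calling plt.show() afterwards displays the heatmap. Fixing the color range to [-1, 1] keeps zero correlation at the neutral midpoint of the color map.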

9. What is a scatter plot? For what type of data is a scatter plot usually used?
A scatter plot is a chart used to plot a correlation between two or more variables at the
same time. It’s usually used for numeric data.

10. What features might be visible in scatterplots?


1. Correlation: the two variables might have a relationship, for example, one might depend
on another. But this is not the same as causation.

2. Associations: the variables may be associated with one another.

3. Outliers: there could be cases where the data in two dimensions does not follow the
general pattern.

4. Clusters: sometimes there could be groups of data that form a cluster on the plot.
5. Gaps: some combinations of values might not exist in a particular case.

6. Barriers: boundaries that the data points do not cross.

7. Conditional relationships: some relationships between the variables rely on a condition
being met.

11. What type of plot would you use if you need to demonstrate “relationship” between
variables/parameters?
When we want to show the relationship between two variables, scatter plots are used.
When we want to show the relationship between three variables, bubble charts are used.

12. When will you use a histogram and when will you use a bar chart?

Both plots show how the values of a variable are distributed. Histograms are usually used for a
numerical (continuous) variable, while bar charts are used for a categorical variable.
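To make the distinction concrete, here is a minimal sketch that draws a histogram of a numerical variable next to a bar chart of a categorical variable. Both datasets (heights and fuel) are hypothetical.

```python
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
heights = rng.normal(170, 10, size=250)  # numerical variable
fuel = ["Petrol", "Diesel", "Petrol", "CNG", "Petrol", "Diesel"]  # categorical variable

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: the value range is cut into intervals (bins); each bar counts one bin.
ax_hist.hist(heights, bins=10)
ax_hist.set_title("Histogram (numerical)")

# Bar chart: one bar per category, with the category count as its height.
fuel_counts = Counter(fuel)
ax_bar.bar(list(fuel_counts.keys()), list(fuel_counts.values()))
ax_bar.set_title("Bar chart (categorical)")
```

Calling plt.show() afterwards displays both plots side by side.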

13. What is an outlier?


An outlier is a term commonly used by analysts for a value that lies far away from, and
diverges from, the overall pattern in a sample. There are two types of outliers: univariate
and multivariate.
14. What type of data are box plots usually used for? Why?
Boxplots are usually used for continuous variables. The plot is generally not informative when
used for discrete data.

15. What information could you gain from a box-plot?


1. Minimum/maximum score
2. Lower/upper quartile
3. Median
4. The Interquartile Range
5. Skewness
6. Dispersion
7. Outliers
16. When do you use a boxplot, and in what situation would you choose a boxplot over a
histogram?
Boxplots are used when trying to show the statistical distribution of one variable or compare the
distributions of several variables. A boxplot is a visual representation of the statistical five-number
summary.

Histograms are better at showing the shape of the probability distribution of the data; however, boxplots
are better for comparison between datasets and they are more space efficient.
17. When analyzing a histogram, what are some of the features to look for?
1. Asymmetry
2. Outliers
3. Multimodality
4. Gaps
5. Heaping/Rounding: Heaping example: temperature data can consist of common values
due to conversion from Fahrenheit to Celsius. Rounding example: weight data that are
all multiples of 5.
6. Impossibilities/Errors

18. What type of data are histograms usually used for?

Continuous data

PART – B

1. Explain in detail about three-dimensional Plotting using Matplotlib.

Even though Matplotlib was initially designed with only two-dimensional plotting in mind, some
three-dimensional plotting utilities were built on top of Matplotlib's two-dimensional display in later
versions, to provide a set of tools for three-dimensional data visualization. Three-dimensional
plots are enabled by importing the mplot3d toolkit, included with the Matplotlib package.

A three-dimensional axes can be created by passing the keyword projection='3d' to any of the
normal axes creation routines.

from mpl_toolkits import mplot3d  # registers the '3d' projection
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection='3d')

# Points along a spiral whose radius grows with z
z = np.linspace(0, 1, 100)
x = z * np.sin(20 * z)
y = z * np.cos(20 * z)

ax.plot3D(x, y, z, 'gray')
ax.set_title('3D line plot')
plt.show()
We can now plot a variety of three-dimensional plot types. The most basic three-dimensional plot
is a 3D line plot created from sets of (x, y, z) triples. This can be created using the ax.plot3D
function.

3D scatter plot is generated by using the ax.scatter3D function.


from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')
z = np.linspace(0, 1, 100)
x = z * np.sin(20 * z)
y = z * np.cos(20 * z)
c = x + y  # color each point by the sum of its x and y values
ax.scatter3D(x, y, z, c=c)
ax.set_title('3d Scatter plot')
plt.show()

Matplotlib - 3D Contour Plot

The ax.contour3D() function creates a three-dimensional contour plot. It requires all the input data
to be in the form of two-dimensional regular grids, with the Z data evaluated at each point. Here,
we will show a three-dimensional contour diagram of a three-dimensional sinusoidal function.
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.set_title('3D contour')
plt.show()
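The same gridded data can also be drawn as a filled surface. The notes above do not cover this plot type, so the following is only a sketch under that caveat; it reuses the function f from the contour example and calls the Axes3D method plot_surface().

```python
from mpl_toolkits import mplot3d  # only needed on older Matplotlib versions
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

# Regular two-dimensional grid, with Z evaluated at every grid point
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
surface = ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.set_title('3D surface')
```

Calling plt.show() afterwards displays the figure. A surface plot fills the space between grid points, which can make the overall shape easier to read than stacked contour lines.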

2. Explain various types of plotting using Matplotlib in python.

Creating Line Chart


Linestyle

We can use the keyword argument linestyle, or shorter ls, to change the style of the plotted line:

import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, linestyle = 'dotted')
plt.show()

Creating Scatter Plots

With Pyplot, you can use the scatter() function to draw a scatter plot.

The scatter() function plots one dot for each observation. It needs two arrays of the same length,
one for the values of the x-axis, and one for values on the y-axis:

Example
A simple scatter plot:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.show()

Creating Bars
With Pyplot, you can use the bar() function to draw bar graphs:
Example
Draw 4 bars:
import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.bar(x,y)
plt.show()

Horizontal Bars
If we want the bars to be displayed horizontally instead of vertically, use the barh() function:
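A minimal sketch of a horizontal bar chart, reusing the same four hypothetical bars as the vertical example:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

# Categories go on the y-axis; the bar length runs along the x-axis
bars = plt.barh(x, y)
```

Calling plt.show() afterwards displays the chart. Note that barh() takes the categories first and the bar lengths second, the same argument order as bar().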

Histogram
A histogram is a graph showing frequency distributions.
It is a graph showing the number of observations within each given interval.
Example
A simple histogram:
import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(170, 10, 250)

plt.hist(x)
plt.show()

Creating Pie Charts

With Pyplot, you can use the pie() function to draw pie charts:

Example
import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(y, labels = mylabels)
plt.show()

3. Mapping Geographical Data with Basemap Python Package

Basemap is a Matplotlib extension used to create and visualize geographical maps in Python. The
main purpose of this tutorial is to provide basic information on how to plot and visualize
geographical data with the help of the Basemap package. For further information on Basemap,
see the Basemap documentation.


• Installing Basemap package
• Adding vector layers to a map
• Projection, bounding box, & resolution
• Plotting a specific region
• Background relief maps
• Plotting netCDF data using Basemap

Installing Basemap package

To install Basemap, we use the conda package manager. If the Anaconda/conda package manager is
not yet installed on your PC, install it first. After that, follow the steps below to install
the basemap package.

1) Start an Ubuntu terminal or an Anaconda prompt
2) Create a new conda environment named basemap_stable:
conda create --name basemap_stable
3) Activate the basemap_stable environment:
conda activate basemap_stable
4) Install the basemap package and its dependencies:
conda install -c anaconda basemap
5) View a list of python dependencies by typing conda list
conda list
6) Now open your favorite Python notebook or IDE in the active conda environment. In my case, I
used Jupyter Notebook.
7) Finally, import the Basemap and Matplotlib libraries:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
Adding vector layers to a map
Coastlines
• First, let’s initialize a map with Basemap() function
• Then, use the drawcoastlines() function to add coastlines on the map
fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines()
plt.title("Coastlines", fontsize=20)
plt.show()

The drawcoastlines function has the following main arguments:

• linewidth: 1.0, 2.0, 3.0…
• linestyle: solid, dashed…
• color: black, red…

Let’s apply some changes to the coastlines


fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines(linewidth=1.0, linestyle='dashed', color='red')
plt.title("Coastlines", fontsize=20)
plt.show()

Countries

Use the drawcountries() function to add countries on the map


fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines(linewidth=1.0, linestyle='solid', color='black')
m.drawcountries()

plt.title("Country boundaries", fontsize=20)
plt.show()

The drawcountries() function takes arguments similar to those of drawcoastlines(), as shown
below:
fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines(linewidth=1.0, linestyle='solid', color='black')
m.drawcountries(linewidth=1.0, linestyle='solid', color='k')
plt.title("Country boundaries", fontsize=20)
plt.show()

Draw major rivers

• Use the drawrivers() function to add major rivers on the map

• The drawrivers() function can take linewidth, linestyle, and color arguments

fig = plt.figure(figsize = (12,12))


m = Basemap()
m.drawcoastlines(linewidth=1.0, linestyle='solid', color='black')
m.drawcountries(linewidth=1.0, linestyle='solid', color='k')
m.drawrivers(linewidth=0.5, linestyle='solid', color='#0000ff')
plt.title("Major rivers", fontsize=20)
plt.show()

Fill continents

• This function is used to draw color filled continents

• Use fillcontinents() function to fill continents


fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines(linewidth=1.0, linestyle='solid', color='black')
m.drawcountries(linewidth=1.0, linestyle='solid', color='k')
m.fillcontinents()
plt.title("Color filled continents", fontsize=20)
plt.show()

The fillcontinents() function can take the following arguments:

• color: fills continents (default gray)

• lake_color: fills inland lakes

• alpha: sets transparency for continents


fig = plt.figure(figsize = (12,12))
m = Basemap()
m.drawcoastlines(linewidth=1.0, linestyle='solid', color='black')
m.drawcountries(linewidth=1.0, linestyle='solid', color='k')
m.fillcontinents(color='coral',lake_color='aqua', alpha=0.9)

plt.title("Color filled continents", fontsize=20)
plt.show()

4. Explain in detail about Data Visualization Using Seaborn.


Seaborn is a Python data visualization library based on the Matplotlib library. It provides a
high-level interface for drawing attractive and informative statistical graphs. In this section,
we’ll learn how to create basic plots using the Seaborn library, such as:
▪ Scatter Plot
▪ Histogram
▪ Bar Plot
▪ Box and Whiskers Plot
▪ Pairwise Plots

Scatter Plot:

Scatter plots can be used to show the relationship between two or three variables
using the seaborn library. A scatter plot of price vs. age with default arguments looks
like this:

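import matplotlib.pyplot as plt
import seaborn as sns

# cars_data is assumed to be a pandas DataFrame with "Age" and "Price" columns, loaded beforehand
plt.style.use("ggplot")
plt.figure(figsize=(8,6))
sns.regplot(x = cars_data["Age"], y = cars_data["Price"])
plt.show()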

Here, regplot means Regression Plot. By default fit_reg = True. It estimates and plots a
regression model relating the x and y variable.

Age and Price are two numerical variables. What if you want to add one more
categorical variable? Say you have a scatter plot of Price vs. Age and you
want to split it by fuel type. For that, use the hue parameter, which takes
another variable and shows its categories (here, the fuel types) in different colors.
The seaborn function lmplot supports hue, so it lets us add a categorical
variable to a scatter plot of two numerical variables.

sns.lmplot(x='Age', y='Price', data=cars_data,
           fit_reg=False,
           hue='FuelType',
           legend=True,
           palette="Set1", height=6)

We set legend=True to show which color corresponds to which FuelType. The palette
argument sets the color scheme; we use “Set1” here.

Histogram:

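In order to draw a histogram in Seaborn, we can use the distplot function and pass it
the variable we want to plot. Note that distplot is deprecated since seaborn 0.11; in
newer versions, sns.histplot(cars_data['Age'], kde=True) draws the same kind of plot.
Histogram with default kernel density estimate: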
plt.figure(figsize=(8,6))

sns.distplot(cars_data['Age'])

plt.show()

Bar Plot:

Bar plots are for categorical variables. The bar plot is a commonly used plot because it is
simple and makes the data easy to understand. You can draw a bar plot of category counts in
seaborn using the countplot function. Let’s plot a bar plot of FuelType.

plt.figure(figsize=(8,6))
sns.countplot(x="FuelType", data=cars_data)
plt.show()

On the y-axis, we get the frequency (count) of each FuelType of the cars.
Grouped Bar Plot:

We can plot a barplot between two variables. That’s called grouped barplot. Let’s plot a
barplot of FuelType distributed by different values of the Automatic column.

plt.figure(figsize=(8,6))
sns.countplot(x="FuelType", data=cars_data,
              hue="Automatic")
plt.show()

Box and Whiskers Plot:

Box and whiskers plots are used for analyzing the detailed distribution of a dataset.
Let’s plot a box and whiskers plot of the Price column of the dataset to visually interpret
the “five-number summary”. The five-number summary includes the minimum, the maximum, and
the three quartiles (first quartile, median, and third quartile).

plt.figure(figsize=(8,6))

sns.boxplot(y=cars_data["Price"])

plt.show()

• Minimum: The minimum value of the dataset excluding the outliers
• Maximum: The maximum value of the dataset excluding the outliers
• First quartile/Q1 (25th percentile): The median value between the smallest number and
the median of the dataset
• Median (50th percentile): The median of the Dataset.
• Third Quartile/Q3 (75th percentile): The median value between the median of the
dataset and the highest value of the dataset

Here, IQR is the interquartile range:

IQR = Q3 - Q1

Any point beyond the whiskers is called an outlier. By default, outliers or extreme values lie
more than 1.5 times the IQR above the third quartile or below the first quartile.
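The 1.5 × IQR rule commonly used for box plot whiskers can be sketched numerically as follows. The prices array is hypothetical, and lower_fence and upper_fence are illustrative names for the two cut-off values.

```python
import numpy as np

prices = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # hypothetical values

q1, q3 = np.percentile(prices, [25, 75])
iqr = q3 - q1

# Points beyond these fences are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]

print(q1, q3, iqr)               # prints 3.25 7.75 4.5
print(lower_fence, upper_fence)  # prints -3.5 14.5
print(outliers)                  # prints [30]
```

The whiskers of the box plot then extend to the smallest and largest values that still lie inside the fences.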
Box and Whiskers Plot (Numerical vs Categorical Variable):
Boxplot for Price of the cars for various FuelType.

plt.figure(figsize=(8,6))
sns.boxplot(x=cars_data["FuelType"],
            y=cars_data["Price"])
plt.show()

5. Explain with example about Matplotlib Subplot.

Display Multiple Plots

With the subplot() function you can draw multiple plots in one figure:

Example

Draw 2 plots:

import matplotlib.pyplot as plt


import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)

plt.show()
The subplot() Function
The subplot() function takes three arguments that describe the layout of the figure.
The layout is organized in rows and columns, which are represented by
the first and second arguments.
The third argument represents the index of the current plot.
plt.subplot(1, 2, 1)
#the figure has 1 row, 2 columns, and this plot is the first plot.

plt.subplot(1, 2, 2)
#the figure has 1 row, 2 columns, and this plot is the second plot.
So, if we want a figure with 2 rows and 1 column (meaning that the two plots will be displayed on
top of each other instead of side by side), we can write it like this:
Example

Draw 2 plots on top of each other:

import matplotlib.pyplot as plt


import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(2, 1, 1)
plt.plot(x,y)

#plot 2:

x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(2, 1, 2)
plt.plot(x,y)

plt.show()
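An alternative that the notes above do not use is plt.subplots() (with a trailing s), which creates the figure and all axes in one call. The following is a minimal sketch of the same two stacked plots in that style.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([0, 1, 2, 3])

# 2 rows, 1 column; axes is an array holding one Axes object per plot
fig, axes = plt.subplots(2, 1, sharex=True)

axes[0].plot(x, np.array([3, 8, 1, 10]))
axes[0].set_title("plot 1")

axes[1].plot(x, np.array([10, 20, 30, 40]))
axes[1].set_title("plot 2")

fig.tight_layout()  # avoid overlapping titles and tick labels
```

Calling plt.show() afterwards displays the figure. Working with explicit Axes objects avoids relying on the "current plot" state that subplot() switches between.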