Ad3301 Dev Full Notes
Importing Matplotlib – Simple line plots – Simple scatter plots – visualizing errors – density
and contour plots – Histograms – legends – colors – subplots – text and annotation –
customization – three dimensional plotting - Geographic Data with Basemap - Visualization
with Seaborn.
********************************************************************************
One of Matplotlib’s most important features is its ability to play well with many operating
systems and graphics backends.
Matplotlib supports dozens of backends and output types, which means you can count on it to
work regardless of which operating system you are using or which output format you wish.
This cross-platform, everything-to-everyone approach has been one of the great strengths of
Matplotlib. It has led to a large user base, which in turn has led to an active developer base
and Matplotlib’s powerful tools and ubiquity within the scientific Python world.
In recent years, however, the interface and style of Matplotlib have begun to show their age.
Newer tools like ggplot and ggvis in the R language, along with web visualization toolkits
based on D3js and HTML5 canvas, often make Matplotlib feel clunky and old-fashioned.
Still, I'm of the opinion that we cannot ignore Matplotlib's strength as a well-tested, cross-
platform graphics engine.
Recent Matplotlib versions make it relatively easy to set new global plotting styles (see
Customizing Matplotlib: Configurations and Style Sheets), and wrapper packages such as Seaborn
(see Visualization with Seaborn) offer cleaner, more modern APIs on top of it. Even with wrappers
like these, it is still often useful to dive into Matplotlib's syntax to adjust the final plot output.
Before we dive into the details of creating visualizations with Matplotlib, there are a few useful
things you should know about using the package.
IMPORTING MATPLOTLIB
Just as we use the np shorthand for NumPy and the pd shorthand for Pandas, we will use
some standard shorthands for Matplotlib imports:
import matplotlib as mpl
import matplotlib.pyplot as plt
Setting Styles
We will use the plt.style directive to choose appropriate aesthetic styles for our figures.
Here we will set the classic style, which ensures that the plots we create use the classic
Matplotlib style:
plt.style.use('classic')
The stylesheets used here are supported as of Matplotlib version 1.5; if you are using an earlier
version of Matplotlib, only the default style is available.
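As a small sketch of the plt.style directive (the data values are made up for illustration): plt.style.available lists the style names installed with Matplotlib, and plt.style.context applies a style only inside a with block instead of for the whole session.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Names of all style sheets known to this Matplotlib installation
available = plt.style.available

# Apply the classic style temporarily; the global style is unchanged afterwards
with plt.style.context('classic'):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
```

Using the context manager is convenient when only one figure in a session should use a different style.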
show() or No show()
The best use of Matplotlib differs depending on how you are using it; roughly, the three
applicable contexts are using Matplotlib in a script, in an IPython terminal, or in an IPython
notebook.
If you are using Matplotlib from within a script, the function plt.show() is your
friend.
plt.show() starts an event loop, looks for all currently active figure objects, and opens one or
more interactive windows that display your figure or figures.
So, for example, you may have a file called myplot.py containing the following:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.show()
You can then run this script from the command-line prompt, which will open a window
with your figure displayed:
$ python myplot.py
The plt.show() command does a lot under the hood, as it must interact with your system's
interactive graphical backend.
The details of this operation can vary greatly from system to system and even installation to
installation, but Matplotlib does its best to hide all these details from you.
One thing to be aware of: the plt.show() command should be used only once per Python session,
and is most often seen at the very end of the script. Multiple show() commands can lead to
unpredictable backend-dependent behavior, and should mostly be avoided.
It can be very convenient to use Matplotlib interactively within an IPython shell (see IPython:
Beyond Normal Python).
IPython is built to work well with Matplotlib if you specify Matplotlib mode. To enable this
mode, you can use the %matplotlib magic command after starting ipython:
In [1]: %matplotlib
In [2]: import matplotlib.pyplot as plt
At this point, any plt plot command will cause a figure window to open, and further commands
can be run to update the plot.
Some changes (such as modifying properties of lines that are already drawn) will not draw
automatically: to force an update, use plt.draw(). Using plt.show() in Matplotlib mode is
not required.
The IPython notebook is a browser-based interactive data analysis tool that can combine
narrative, code, graphics, HTML elements, and much more into a single executable document
(see IPython: Beyond Normal Python).
Plotting interactively within an IPython notebook can be done with the %matplotlib command,
and works in a similar way to the IPython shell. In the IPython notebook, you also have the
option of embedding graphics directly in the notebook, with two possible options:
%matplotlib notebook will lead to interactive plots embedded within the notebook
%matplotlib inline will lead to static images of your plot embedded in the notebook
%matplotlib inline
After running this command (it needs to be done only once per kernel/session), any cell
within the notebook that creates a plot will embed a PNG image of the resulting graphic:
import numpy as np
x = np.linspace(0, 10, 100)
fig = plt.figure()
plt.plot(x, np.sin(x), '-')
plt.plot(x, np.cos(x), '--');
Saving Figures to File
One nice feature of Matplotlib is the ability to save figures in a wide variety of formats.
Saving a figure can be done using the savefig() command.
For example, to save the previous figure as a PNG file, you can run this:
fig.savefig('my_figure.png')
To confirm that it contains what we think it contains, let's use the IPython Image object
to display the contents of this file:
from IPython.display import Image
Image('my_figure.png')
In savefig(), the file format is inferred from the extension of the given filename.
Depending on what backends you have installed, many different file formats are available.
The list of supported file types can be found for your system by using the following method of
the figure canvas object:
fig.canvas.get_supported_filetypes()
Output
{'eps': 'Encapsulated Postscript',
'jpeg': 'Joint Photographic Experts Group',
'jpg': 'Joint Photographic Experts Group',
'pdf': 'Portable Document Format',
'pgf': 'PGF code for LaTeX',
'png': 'Portable Network Graphics',
'ps': 'Postscript',
'raw': 'Raw RGBA bitmap',
'rgba': 'Raw RGBA bitmap',
'svg': 'Scalable Vector Graphics',
'svgz': 'Scalable Vector Graphics',
'tif': 'Tagged Image File Format',
'tiff': 'Tagged Image File Format'}
Note that when saving your figure, it's not necessary to use plt.show() or related commands
discussed earlier.
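A minimal sketch of how savefig() infers the format from the file extension. The temporary directory, the dpi value, and the choice of three formats are assumptions for illustration only:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window or plt.show() is needed to save
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-')
ax.plot(x, np.cos(x), '--')

# Save the same figure in several formats; only the extension changes
out_dir = tempfile.mkdtemp()
saved = []
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, "my_figure." + ext)
    fig.savefig(path, dpi=150)  # dpi matters for raster formats such as PNG
    saved.append(path)
```

Each file in saved is written by a different output backend, even though the plotting code is identical.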
MATLAB-style Interface
Matplotlib was originally written as a Python alternative for MATLAB users, and much of its
syntax reflects that fact.
The MATLAB-style tools are contained in the pyplot (plt) interface. For example, the following
code will probably look quite familiar to MATLAB users:
Program
plt.figure() # create a plot figure
# create the first of two panels and set current axis
plt.subplot(2, 1, 1) # (rows, columns, panel number)
plt.plot(x, np.sin(x))
# create the second panel and set current axis
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x));
Output
It is important to note that this interface is stateful: it keeps track of the "current" figure and
axes, which are where all plt commands are applied.
You can get a reference to these using the plt.gcf() (get current figure) and plt.gca() (get
current axes) routines.
While this stateful interface is fast and convenient for simple plots, it is easy to run into
problems. For example, once the second panel is created, how can we go back and add
something to the first? This is possible within the MATLAB-style interface, but a bit clunky.
Fortunately, there is a better way.
Object-oriented interface
The object-oriented interface is available for these more complicated situations, and for when
you want more control over your figure.
Rather than depending on some notion of an "active" figure or axes, in the object-oriented
interface the plotting functions are methods of explicit Figure and Axes objects.
To re-create the previous plot using this style of plotting, you might do the following:
Program:
# First create a grid of plots
# ax will be an array of two Axes objects
fig, ax = plt.subplots(2)
# Call plot() method on the appropriate object
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x));
Output:
For simpler plots, the choice of which style to use is largely a matter of preference, but the
object-oriented approach can become a necessity as plots become more complicated.
The difference is as small as switching plt.plot() to ax.plot(), but there are a few gotchas
that we will highlight as they come up in the following sections.
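A short sketch of the object-oriented style, including one common gotcha: labelling functions such as plt.title and plt.xlabel become set_ methods on the Axes object. The title and label strings are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots(2)  # ax is an array of two Axes objects
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x))

# Going back to the first panel needs no state switching: just use its object
ax[0].plot(x, -np.sin(x), '--')

# plt.title() -> ax.set_title(), plt.xlabel() -> ax.set_xlabel()
ax[0].set_title('sine')
ax[1].set_xlabel('x')
```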
Program:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6+
import numpy as np
For all Matplotlib plots, we start by creating a figure and an axes. In their simplest form, a
figure and axes can be created as follows:
fig = plt.figure()
ax = plt.axes()
Output
In Matplotlib, the figure (an instance of the class plt.Figure) can be thought of as a single
container that contains all the objects representing axes, graphics, text, and labels.
The axes (an instance of the class plt.Axes) is what we see above: a bounding box with ticks
and labels, which will eventually contain the plot elements that make up our visualization.
Throughout this book, we'll commonly use the variable name fig to refer to a figure instance,
and ax to refer to an axes instance or group of axes instances.
Once we have created an axes, we can use the ax.plot function to plot some data. Let's start
with a simple sinusoid:
Program:
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));
Output:
Alternatively, we can use the pylab interface and let the figure and axes be created for us in the
background (see Two Interfaces for the Price of One for a discussion of these two interfaces):
plt.plot(x, np.sin(x));
Output
To create a single figure with multiple lines, we can simply call the plot function multiple times:
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x));
Output
The first adjustment you might wish to make to a plot is to control the line colors and
styles.
The plt.plot() function takes additional arguments that can be used to specify these.
To adjust the color, you can use the color keyword, which accepts a string argument representing
virtually any imaginable color. The color can be specified in a variety of ways:
Program
plt.plot(x, np.sin(x - 0), color='blue') # specify color by name
plt.plot(x, np.sin(x - 1), color='g') # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75') # Grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44') # Hex code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0,0.2,0.3)) # RGB tuple, values 0 to 1
plt.plot(x, np.sin(x - 5), color='chartreuse'); # all HTML color names supported
Output
If no color is specified, Matplotlib will automatically cycle through a set of default colors
for multiple lines.
Similarly, the line style can be adjusted using the linestyle keyword:
Program
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');
# For short, you can use the following codes:
plt.plot(x, x + 4, linestyle='-') # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':'); # dotted
Output:
Linestyle and color codes can be combined into a single non-keyword argument to the plt.plot()
function:
Program
plt.plot(x, x + 0, '-g') # solid green
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r'); # dotted red
Output
These single-character color codes reflect the standard abbreviations in the RGB
(Red/Green/Blue) and CMYK (Cyan/Magenta/Yellow/blacK) color systems, commonly used
for digital color graphics.
There are many other keyword arguments that can be used to fine-tune the appearance of the
plot; for more details, I'd suggest viewing the docstring of the plt.plot() function using IPython's
help tools (See Help and Documentation in IPython).
Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead
of points being joined by line segments, here the points are represented individually with a dot,
circle, or other shape.
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6+
import numpy as np
Scatter Plots with plt.plot
In the previous section we looked at plt.plot/ax.plot to produce line plots. It turns out that this
same function can produce scatter plots as well:
Program:
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, 'o', color='black');
The third argument in the function call is a character that represents the type of symbol used for
the plotting.
Just as you can specify options such as '-', '--' to control the line style, the marker style has its
own set of short string codes.
The full list of available symbols can be seen in the documentation of plt.plot, or in Matplotlib's
online documentation. Most of the possibilities are fairly intuitive, and we'll show a number of
the more common ones here:
Program
rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))
plt.legend(numpoints=1)
plt.xlim(0, 1.8);
Output:
For even more possibilities, these character codes can be used together with line and color
codes to plot points along with a line connecting them:
plt.plot(x, y, '-ok');
Output
Additional keyword arguments to plt.plot specify a wide range of properties of the lines
and markers:
Program
plt.plot(x, y, '-p', color='gray',
markersize=15, linewidth=4,
markerfacecolor='white',
markeredgecolor='gray',
markeredgewidth=2)
plt.ylim(-1.2, 1.2);
Output
This type of flexibility in the plt.plot function allows for a wide variety of possible visualization
options. For a full description of the options available, refer to the plt.plot documentation.
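Scatter Plots with plt.scatter
A second way to create scatter plots is the plt.scatter function, which is called much like
plt.plot. The primary difference between plt.scatter and plt.plot is that plt.scatter can create
scatter plots where the properties of each individual point (size, face color, edge color, etc.)
can be individually controlled or mapped to data.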
Let's show this by creating a random scatter plot with points of many colors and sizes. In order
to better see the overlapping results, we'll also use the alpha keyword to adjust the transparency
level:
Program:
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.3,
cmap='viridis')
plt.colorbar(); # show color scale
Output:
Notice that the color argument is automatically mapped to a color scale (shown here by
the colorbar() command), and that the size argument is given in points squared (the marker
area), not in pixels.
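In Matplotlib, the s argument of plt.scatter is a marker area in points squared, so a marker's diameter grows with the square root of s. A small sketch with made-up sizes shows that quadrupling s doubles the diameter:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Each size is four times the previous one (illustrative values)
sizes = np.array([25, 100, 400])

fig, ax = plt.subplots()
sc = ax.scatter([0, 1, 2], [0, 0, 0], s=sizes)

# Approximate marker widths in points: the square root of the area
diameters = np.sqrt(sc.get_sizes())
```

When mapping a data value to s, it is therefore worth deciding whether the value should scale the area or the diameter of the marker.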
The color and size of points can be used to convey information in the visualization, in order to
visualize multidimensional data.
For example, we might use the Iris data from Scikit-Learn, where each sample is one of three
types of flowers that has had the size of its petals and sepals carefully measured:
Program:
from sklearn.datasets import load_iris
iris = load_iris()
features = iris.data.T
plt.scatter(features[0], features[1], alpha=0.2, s=100*features[3], c=iris.target,
cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1]);
Output:
We can see that this scatter plot has given us the ability to simultaneously explore four different
dimensions of the data: the (x, y) location of each point corresponds to the sepal length and
width, the size of the point is related to the petal width, and the color is related to the particular
species of flower.
Multicolor and multifeature scatter plots like this can be useful for both exploration and
presentation of data.
VISUALIZING ERRORS
For any scientific measurement, accurate accounting for errors is nearly as important as, if not
more important than, accurate reporting of the number itself.
For example, imagine that I am using some astrophysical observations to estimate the Hubble
Constant, the local measurement of the expansion rate of the Universe. I know that the current
literature suggests a value of around 71 (km/s)/Mpc, and I measure a value of 74 (km/s)/Mpc
with my method. Are the values consistent? The only correct answer, given this information, is
this: there is no way to know.
Suppose I augment this information with reported uncertainties: the current literature suggests a
value of around 71 ± 2.5 (km/s)/Mpc, and my method has measured a value of 74 ± 5
(km/s)/Mpc. Now are the values consistent? That is a question that can be quantitatively
answered.
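One way to answer it is sketched here, under the assumption that both uncertainties are independent, Gaussian, one-sigma errors (the notes do not state this). The two errors are added in quadrature and the difference is expressed in units of the combined sigma:

```python
import math

# Values and one-sigma uncertainties from the example, in (km/s)/Mpc
literature, sigma_lit = 71.0, 2.5
measured, sigma_meas = 74.0, 5.0

# Difference between the two estimates
diff = measured - literature

# Assumption: independent Gaussian errors, so the variances add
sigma_diff = math.hypot(sigma_lit, sigma_meas)

# Size of the discrepancy in units of the combined uncertainty
n_sigma = diff / sigma_diff
print("difference = %.1f, combined sigma = %.2f, n_sigma = %.2f"
      % (diff, sigma_diff, n_sigma))
```

Under these assumptions the two values differ by well under one combined sigma, so they are consistent with each other.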
In visualization of data and results, showing these errors effectively can make a plot convey
much more complete information.
Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call:
Program:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6+
import numpy as np
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='.k');
Output:
Here the fmt is a format code controlling the appearance of lines and points, and has the
same syntax as the shorthand used in plt.plot, outlined in Simple Line Plots and Simple Scatter
Plots.
In addition to these basic options, the errorbar function has many options to fine-tune the
outputs.
Using these additional options you can easily customize the aesthetics of your errorbar plot.
I often find it helpful, especially in crowded plots, to make the errorbars lighter than the points
themselves:
Program:
plt.errorbar(x, y, yerr=dy, fmt='o', color='black',
             ecolor='lightgray', elinewidth=3, capsize=0);
Output:
In addition to these options, you can also specify horizontal errorbars (xerr), one-sided
errorbars, and many other variants. For more information on the options available, refer to
the docstring of plt.errorbar.
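A brief sketch of two of these variants, using made-up data: a constant horizontal error through xerr, and asymmetric vertical errors by passing yerr as a pair of (lower, upper) arrays. Setting one of the two arrays to zeros would give one-sided bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0, 10, 10)
y = np.sin(x)

# Illustrative asymmetric errors: the upper bars are longer than the lower ones
lower = 0.1 + 0.2 * rng.rand(10)
upper = 0.3 + 0.2 * rng.rand(10)

fig, ax = plt.subplots()
eb = ax.errorbar(x, y,
                 yerr=[lower, upper],  # shape (2, N): lower errors, then upper errors
                 xerr=0.2,             # the same horizontal error for every point
                 fmt='o', color='black', ecolor='lightgray', capsize=3)
```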
Continuous Errors
Program:
from sklearn.gaussian_process import GaussianProcessRegressor
# Note: the older GaussianProcess class was removed in scikit-learn 0.20;
# GaussianProcessRegressor replaces it.
# define the model and draw some data
model = lambda x: x * np.sin(x)
xdata = np.array([1, 3, 5, 6, 8])
ydata = model(xdata)
# Compute the Gaussian process fit (default kernel)
gp = GaussianProcessRegressor()
gp.fit(xdata[:, np.newaxis], ydata)
xfit = np.linspace(0, 10, 1000)
yfit, std = gp.predict(xfit[:, np.newaxis], return_std=True)
dyfit = 2 * std  # 2*sigma ~ 95% confidence region
We now have xfit, yfit, and dyfit, which sample the continuous fit to our data.
We could pass these to the plt.errorbar function as above, but we don't really want to plot 1,000
points with 1,000 errorbars.
Instead, we can use the plt.fill_between function with a light color to visualize this continuous
error:
Program:
plt.plot(xdata, ydata, 'or')
plt.plot(xfit, yfit, '-', color='gray')
plt.fill_between(xfit, yfit - dyfit, yfit + dyfit,
color='gray', alpha=0.2)
plt.xlim(0, 10);
Output:
Note what we've done here with the fill_between function: we pass an x value, then the lower
y-bound, then the upper y-bound, and the result is that the area between these bounds is filled.
The resulting figure gives a very intuitive view into what the Gaussian process
regression algorithm is doing: in regions near a measured data point, the model is strongly
constrained and this is reflected in the small model errors.
In regions far from a measured data point, the model is not strongly constrained, and the model
errors increase.
For more information on the options available in plt.fill_between() (and
the closely related plt.fill() function), see the function docstring or the Matplotlib
documentation.
Finally, if this seems a bit too low level for your taste, refer to Visualization With
Seaborn, where we discuss the Seaborn package, which has a more streamlined API for
visualizing this type of continuous errorbar.
A simple histogram
In[2]: plt.hist(data);
A customized histogram
In[3]: plt.hist(data, bins=30, density=True, alpha=0.5, histtype='stepfilled',
                color='steelblue', edgecolor='none');
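The data array is not defined in these notes. A self-contained sketch, assuming 1,000 normally distributed samples, shows what the customized call returns and how np.histogram gives the raw counts without drawing anything:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumed sample data: 1,000 draws from a standard normal distribution
rng = np.random.RandomState(0)
data = rng.randn(1000)

fig, ax = plt.subplots()
# density=True normalizes the histogram so that its total area is 1
heights, bin_edges, _ = ax.hist(data, bins=30, density=True, alpha=0.5,
                                histtype='stepfilled', color='steelblue',
                                edgecolor='none')
area = np.sum(heights * np.diff(bin_edges))

# np.histogram computes the bin counts without plotting
raw_counts, raw_edges = np.histogram(data, bins=30)
```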
Customizing Colorbars
• The colormap can be specified using the cmap argument to the
plotting function that is creating the visualization.
A grayscale colormap
In[4]: plt.imshow(I, cmap='gray');
Choosing the colormap
• Three different categories of colormaps:
1. Sequential colormaps
These consist of one continuous sequence of colors (e.g., binary or
viridis).
2. Divergent colormaps
These usually contain two distinct colors, which show positive and
negative deviations from a mean (e.g., RdBu or PuOr).
3. Qualitative colormaps
These mix colors with no particular sequence (e.g., rainbow or jet).
In[6]: view_colormap('jet')
In[7]: view_colormap('viridis')
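The view_colormap helper called above is not defined in these notes. A minimal sketch of what such a helper might look like, assuming its purpose is to show a colormap next to a grayscale rendering of itself; the function bodies and the luminance weights are illustrative assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

def grayscale_cmap(name):
    """Return a grayscale version of the named colormap (hypothetical helper)."""
    cmap = plt.get_cmap(name)
    colors = cmap(np.arange(cmap.N))
    # Approximate perceived brightness with weighted RGB values (assumed weights)
    rgb_weight = [0.299, 0.587, 0.114]
    luminance = np.sqrt(np.dot(colors[:, :3] ** 2, rgb_weight))
    colors[:, :3] = luminance[:, np.newaxis]
    return LinearSegmentedColormap.from_list(name + "_gray", colors, cmap.N)

def view_colormap(name):
    """Plot a colormap above its grayscale equivalent (hypothetical helper)."""
    cmap = plt.get_cmap(name)
    colors = cmap(np.arange(cmap.N))
    gray = grayscale_cmap(name)
    grayscale = gray(np.arange(gray.N))
    fig, ax = plt.subplots(2, figsize=(6, 2),
                           subplot_kw=dict(xticks=[], yticks=[]))
    ax[0].imshow([colors], extent=[0, 10, 0, 1])
    ax[1].imshow([grayscale], extent=[0, 10, 0, 1])
    return fig

fig = view_colormap('viridis')
```

Comparing the two strips makes it easier to judge whether brightness changes evenly along a colormap.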
A discretized colormap
In[11]: plt.imshow(I, cmap=plt.get_cmap('Blues', 6))
plt.colorbar()
plt.clim(-1, 1);
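The image array I is not defined in these notes. A self-contained sketch of the discretized colormap, assuming I is a smooth product of a sine and a cosine, and using the object-oriented calls with explicit color limits:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumed image data with values between -1 and 1
x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])

# Resample the Blues colormap down to 6 discrete colors
cmap = plt.get_cmap('Blues', 6)
n_levels = cmap.N

fig, ax = plt.subplots()
im = ax.imshow(I, cmap=cmap, vmin=-1, vmax=1)  # vmin/vmax play the role of plt.clim
fig.colorbar(im, ax=ax)
```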
9. Multiple Subplots
• The subplots in Matplotlib are groups of smaller axes that can exist
together within a single figure.
• These subplots might be insets, grids of plots, or other more
complicated layouts.
• There are four routines for creating subplots in Matplotlib.
• They include:
o plt.axes
o plt.subplot
o plt.subplots
o plt.GridSpec
i. plt.axes: Subplots by Hand
o The most basic method of creating axes is to use the plt.axes
function.
o By default this function creates a standard axes object that fills the
entire figure.
o plt.axes also takes an optional argument that is a list of four
numbers in the figure coordinate system.
o These numbers represent [left, bottom, width, height] in the figure
coordinate system, which ranges from 0 at the bottom left of the
figure to 1 at the top right of the figure.
• There are two axes (the top with no tick labels) that are just
touching: the bottom of the upper panel (at position 0.5) matches
the top of the lower panel (at position 0.1 + 0.4).
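The two touching panels described above can be sketched as follows. This version uses fig.add_axes, the object-oriented counterpart of plt.axes; the widths, y-limits, and plotted curves are assumptions for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()

# [left, bottom, width, height] in figure coordinates
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4],
                   xticklabels=[], ylim=(-1.2, 1.2))  # upper panel, no tick labels
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4],
                   ylim=(-1.2, 1.2))                  # lower panel

x = np.linspace(0, 10)
ax1.plot(np.sin(x))
ax2.plot(np.cos(x))

# The bottom of the upper panel and the top of the lower panel coincide
upper_bottom = ax1.get_position().y0
lower_top = ax2.get_position().y1
```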
A plt.subplot() example
In[4]: for i in range(1, 7):
           plt.subplot(2, 3, i)
           plt.text(0.5, 0.5, str((2, 3, i)), fontsize=18, ha='center')
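For comparison, a sketch of the same 2 x 3 grid built with plt.subplots, which creates every panel in one call and returns them as a NumPy array. The sharex and sharey settings are optional and shown only as an illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# One call creates the figure and a 2 x 3 array of Axes objects
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')

# Axes are indexed [row, column], starting from zero
for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)), fontsize=18, ha='center')
```

Note that plt.subplot numbers its panels from 1, while the array returned by plt.subplots uses zero-based [row, column] indexing.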