The Art of Data Visualization - Learn 7 Visualizations in R
With the ever-increasing volume of data, it is impossible to tell stories without visualizations.
Data visualization is the art of turning numbers into useful knowledge.
R lets you learn this art by offering a set of built-in functions and libraries
for building visualizations and presenting data. Before getting into the technical implementation
of each visualization, let’s first see how to select the right chart type.
https://www.tatvic.com/blog/7-visualizations-learn-r/?fbclid=IwAR1VVWigxT11hPiyR_nEgBb6FA0kEeBoCbPScTpUNL6zqrpyqyXgiZBM6nA 1/15
28/2/2019 The Art of Data Visualization: Learn 7 visualizations in R
There are four basic presentation types:
1. Comparison
2. Composition
3. Distribution
4. Relationship
To determine which of these is best suited to your data, I suggest you answer a few
questions such as:
How many data points will you display for each variable?
Will you display values over a period of time, or among items or groups?
Below is a great explanation of how to select the right chart type, by Dr. Andrew Abela.
In your day-to-day activities, you’ll come across the 7 charts listed below most of the
time.
1. Scatter Plot
2. Histogram
3. Bar and Stacked Bar Chart
4. Box Plot
5. Area Chart
6. Heat Map
7. Correlogram
We’ll use the ‘Big Mart data’ example shown below to understand how to create
visualizations in R. You can download the full dataset from here.
1. Scatter Plot
When to use: Scatter Plot is used to see the relationship between two continuous
variables.
In our mart dataset, if we want to visualize the items by their cost, we can use a
scatter plot of two continuous variables, namely Item_Visibility and Item_MRP, as
shown below.
Here is the R code for a simple scatter plot using ggplot() with geom_point().
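A minimal sketch of such a scatter plot could look like the following. It assumes a data frame named train with Item_Visibility and Item_MRP columns; the small random data frame built here is only a stand-in for the Big Mart data, and the axis breaks are illustrative choices.

```r
library(ggplot2)

# Hypothetical stand-in for the Big Mart data: random values, not the real file.
set.seed(42)
train <- data.frame(
  Item_Visibility = runif(200, min = 0, max = 0.35),
  Item_MRP        = runif(200, min = 30, max = 270)
)

# One point per item: visibility on the x axis, price (MRP) on the y axis.
p <- ggplot(train, aes(x = Item_Visibility, y = Item_MRP)) +
  geom_point() +
  scale_x_continuous("Item Visibility", breaks = seq(0, 0.35, by = 0.05)) +
  scale_y_continuous("Item MRP", breaks = seq(0, 270, by = 30)) +
  theme_bw()
print(p)
```

With the real dataset, you would read the downloaded file into train instead of generating it.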
We can also show a third variable in the same chart, say a categorical variable
(Item_Type), which gives the category of each data point. Different categories are
depicted by a different color for Item_Type in the chart below.
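One way to sketch this is to map Item_Type to the color aesthetic. The data frame below is again a hypothetical stand-in with three made-up item categories, not the real Big Mart data.

```r
library(ggplot2)

# Hypothetical stand-in data with a categorical Item_Type column.
set.seed(42)
train <- data.frame(
  Item_Visibility = runif(200, min = 0, max = 0.35),
  Item_MRP        = runif(200, min = 30, max = 270),
  Item_Type       = rep(c("Dairy", "Snack Foods", "Soft Drinks"), length.out = 200)
)

# Mapping Item_Type to color gives each category its own color and a legend.
p <- ggplot(train, aes(x = Item_Visibility, y = Item_MRP)) +
  geom_point(aes(color = Item_Type)) +
  scale_x_continuous("Item Visibility", breaks = seq(0, 0.35, by = 0.05)) +
  scale_y_continuous("Item MRP", breaks = seq(0, 270, by = 30)) +
  theme_bw() +
  labs(title = "Scatter Plot")
print(p)
```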
We can make this even clearer by creating a separate scatter plot for each
Item_Type, as shown below.
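In ggplot2, separate panels per category are usually produced with facet_wrap(). The sketch below assumes the same hypothetical stand-in data as before, with three made-up item categories.

```r
library(ggplot2)

# Hypothetical stand-in data with a categorical Item_Type column.
set.seed(42)
train <- data.frame(
  Item_Visibility = runif(200, min = 0, max = 0.35),
  Item_MRP        = runif(200, min = 30, max = 270),
  Item_Type       = rep(c("Dairy", "Snack Foods", "Soft Drinks"), length.out = 200)
)

# facet_wrap() draws one panel per Item_Type, all sharing the same axes.
p <- ggplot(train, aes(x = Item_Visibility, y = Item_MRP)) +
  geom_point(aes(color = Item_Type)) +
  facet_wrap(~ Item_Type) +
  scale_x_continuous("Item Visibility") +
  scale_y_continuous("Item MRP", breaks = seq(0, 270, by = 30)) +
  theme_bw()
print(p)
```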
2. Histogram
When to use: A histogram is used to plot a continuous variable. It breaks the data into
bins and shows the frequency distribution of these bins. We can always change the bin
size and see the effect it has on the visualization.
From our mart dataset, if we want to know the count of items on the basis of their cost,
we can plot a histogram of the continuous variable Item_MRP, as shown below.
Here is the R code for a simple histogram using ggplot() with
geom_histogram().
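A minimal sketch of the histogram, assuming a train data frame with an Item_MRP column (the values here are randomly generated stand-ins). The bin_width variable is a name introduced for this example so that the bin size is easy to change.

```r
library(ggplot2)

# Hypothetical stand-in for the Item_MRP column of the Big Mart data.
set.seed(42)
train <- data.frame(Item_MRP = runif(200, min = 30, max = 270))

# Change bin_width and re-run to see how the bin size affects the picture.
bin_width <- 10

p <- ggplot(train, aes(x = Item_MRP)) +
  geom_histogram(binwidth = bin_width) +
  scale_x_continuous("Item MRP", breaks = seq(0, 270, by = 30)) +
  scale_y_continuous("Count") +
  labs(title = "Histogram")
print(p)
```

A smaller bin_width shows more detail but a noisier shape; a larger one smooths the distribution.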
3. Bar and Stacked Bar Chart
When to use: Bar charts are recommended when you want to plot a categorical
variable or a combination of a continuous and a categorical variable.
From our dataset, if we want to know the number of marts established in a particular
year, a bar chart is the most suitable option, using the variable Establishment Year as
shown below.
Here is the R code for a simple bar plot of a single variable using ggplot() with
geom_bar().
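A sketch of such a bar plot, assuming the year is stored in a numeric column named Outlet_Establishment_Year. The years and counts below are made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in: one row per item, tagged with its outlet's founding year.
train <- data.frame(
  Outlet_Establishment_Year = rep(c(1985, 1987, 1997, 1999, 2004, 2009),
                                  times = c(30, 10, 20, 15, 25, 20))
)

# geom_bar() counts the rows for each year; coord_flip() turns the bars horizontal.
p <- ggplot(train, aes(x = Outlet_Establishment_Year)) +
  geom_bar(fill = "red") +
  scale_x_continuous("Establishment Year", breaks = seq(1985, 2010)) +
  scale_y_continuous("Count") +
  coord_flip() +
  labs(title = "Bar Chart")
print(p)
```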
As a variation, you can remove the coord_flip() call to draw the above bar chart
vertically.
To plot item weights (a continuous variable) by Outlet Type (a categorical variable)
on a single bar chart, use the following code:
library(ggplot2)
# stat = "identity" stacks each row's Item_Weight, so a bar shows the total weight per outlet type
ggplot(train, aes(Outlet_Type, Item_Weight)) +
  geom_bar(stat = "identity", fill = "darkblue") +
  scale_x_discrete("Outlet Type") +
  scale_y_continuous("Item Weight", breaks = seq(0, 15000, by = 500)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  labs(title = "Bar Chart")
A stacked bar chart is an advanced version of the bar chart, used for visualizing a
combination of categorical variables.
From our dataset, if we want to know the count of outlets by both type (Outlet Type)
and location (Outlet Location Type), a stacked bar chart visualizes the scenario in the
most useful manner.
Here is the R code for a simple stacked bar chart using ggplot().
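A minimal sketch of a stacked bar chart, assuming columns named Outlet_Location_Type and Outlet_Type. The category labels and counts are invented for the example.

```r
library(ggplot2)

# Hypothetical stand-in: every location/type combination appears 10 times.
train <- data.frame(
  Outlet_Location_Type = rep(c("Tier 1", "Tier 2", "Tier 3"), length.out = 120),
  Outlet_Type = rep(c("Grocery Store", "Supermarket Type1",
                      "Supermarket Type2", "Supermarket Type3"), length.out = 120)
)

# Mapping Outlet_Type to fill splits each location bar into stacked segments.
p <- ggplot(train, aes(x = Outlet_Location_Type, fill = Outlet_Type)) +
  geom_bar() +
  labs(title = "Stacked Bar Chart",
       x = "Outlet Location Type", y = "Count of Outlets")
print(p)
```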
4. Box Plot
When to use: Box plots are used to plot a combination of categorical and continuous
variables. This plot is useful for visualizing the spread of the data and detecting
outliers. It shows five summary statistics: the minimum, the 25th percentile, the
median, the 75th percentile and the maximum.
From our dataset, if we want to see each outlet’s item sales in detail, including the
minimum, maximum and median, a box plot can be helpful. It also shows the
outliers of item sales for each outlet, as in the chart below.
The black points are outliers. Outlier detection and removal is an essential step of
successful data exploration.
Here is the R code for a simple box plot using ggplot() with geom_boxplot().
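A sketch of the box plot, assuming columns named Outlet_Identifier and Item_Outlet_Sales. The outlet codes and the skewed sales values are randomly generated stand-ins.

```r
library(ggplot2)

# Hypothetical stand-in: three outlets with right-skewed sales figures.
set.seed(42)
train <- data.frame(
  Outlet_Identifier = rep(c("OUT010", "OUT013", "OUT017"), each = 50),
  Item_Outlet_Sales = rexp(150, rate = 1 / 2000)
)

# One box per outlet; points beyond the whiskers are drawn as outliers.
p <- ggplot(train, aes(x = Outlet_Identifier, y = Item_Outlet_Sales)) +
  geom_boxplot(fill = "red") +
  scale_y_continuous("Item Outlet Sales") +
  labs(title = "Box Plot", x = "Outlet Identifier")
print(p)
```

Note that geom_boxplot() by default ends the whiskers at the most extreme values within 1.5 times the interquartile range of the box, so the whisker ends match the minimum and maximum only when there are no outliers.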
5. Area Chart
When to use: An area chart is used to show continuity across a variable or data set. It
is very similar to a line chart and is commonly used for time series plots. It can also
be used to plot continuous variables and analyze the underlying trends.
From our dataset, when we want to analyze the trend of item outlet sales, an area chart
can be plotted as shown below. It shows the count of outlets on the basis of sales.
Here is the R code for a simple area chart showing the continuity of Item Outlet Sales
using ggplot() with geom_area().
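One way to sketch this is to bin the sales values and fill the area under the counts. The example assumes a column named Item_Outlet_Sales; the values are random stand-ins and the choice of 30 bins is arbitrary.

```r
library(ggplot2)

# Hypothetical stand-in for the Item_Outlet_Sales column.
set.seed(42)
train <- data.frame(Item_Outlet_Sales = rexp(300, rate = 1 / 2000))

# stat = "bin" counts rows per sales bin; geom_area() fills the area under the counts.
p <- ggplot(train, aes(x = Item_Outlet_Sales)) +
  geom_area(stat = "bin", bins = 30, fill = "steelblue") +
  labs(title = "Area Chart", x = "Item Outlet Sales", y = "Count")
print(p)
```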
6. Heat Map
When to use: A heat map uses the intensity (density) of colors to display the
relationship between two, three or more variables in a two-dimensional image. It lets
you explore two dimensions as the axes and a third dimension through the
intensity of color.
From our dataset, if we want to know the cost of each item at every outlet, we can plot
a heat map using three variables, Item MRP, Outlet Identifier and Item Type, as shown
below.
The dark portion indicates that Item MRP is close to 50. The brighter portion indicates
that Item MRP is close to 250.
Here is the R code for a simple heat map using ggplot().
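A minimal sketch of the heat map, assuming columns named Outlet_Identifier, Item_Type and Item_MRP. For simplicity the stand-in data has exactly one made-up price per outlet and item type, which the real data does not guarantee.

```r
library(ggplot2)

# Hypothetical stand-in: one price per outlet/item-type combination.
set.seed(42)
train <- expand.grid(
  Outlet_Identifier = c("OUT010", "OUT013", "OUT017"),
  Item_Type         = c("Dairy", "Snack Foods", "Soft Drinks")
)
train$Item_MRP <- runif(nrow(train), min = 50, max = 250)

# Each tile is one outlet/item-type cell; its fill color encodes Item_MRP.
p <- ggplot(train, aes(x = Outlet_Identifier, y = Item_Type)) +
  geom_tile(aes(fill = Item_MRP)) +
  scale_fill_continuous(name = "Item MRP") +
  labs(title = "Heat Map", x = "Outlet Identifier", y = "Item Type")
print(p)
```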
7. Correlogram
When to use: A correlogram is used to test the level of correlation among the variables
available in the data set. The cells of the matrix can be shaded or colored to show the
correlation value.
The darker the color, the stronger the correlation between variables. Positive
correlations are displayed in blue and negative correlations in red. Color intensity is
proportional to the correlation value.
From our dataset, let’s check the correlation between item cost, weight and visibility,
along with outlet establishment year and outlet sales, in the plot below.
In our example, we can see that Item cost & Outlet sales are positively correlated while
Item weight & its visibility are negatively correlated.
install.packages("corrgram")
library(corrgram)
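Once the package is loaded, the plot itself is a single corrgram() call. The sketch below is self-contained, so it builds a hypothetical numeric data frame (in which sales are deliberately made to depend on price), prints the plain correlation matrix with cor(), and draws the correlogram only if the corrgram package is installed.

```r
# Hypothetical stand-in data: five numeric columns, random except that
# Item_Outlet_Sales is constructed to rise with Item_MRP.
set.seed(42)
n <- 100
item_mrp <- runif(n, min = 30, max = 270)
train <- data.frame(
  Item_MRP                  = item_mrp,
  Item_Weight               = runif(n, min = 5, max = 20),
  Item_Visibility           = runif(n, min = 0, max = 0.35),
  Outlet_Establishment_Year = sample(1985:2009, n, replace = TRUE),
  Item_Outlet_Sales         = 15 * item_mrp + rnorm(n, sd = 500)
)

# The numbers behind the picture: pairwise Pearson correlations.
cor_matrix <- cor(train)
print(round(cor_matrix, 2))

# Draw the shaded correlogram only when the corrgram package is available.
if (requireNamespace("corrgram", quietly = TRUE)) {
  library(corrgram)
  corrgram(train, order = FALSE, panel = panel.shade,
           text.panel = panel.txt, main = "Correlogram")
}
```

With the real dataset, pass only the numeric columns you want to compare.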
By now it should be easy for you to visualize data using the ggplot2 library in R.
Apart from visualizations, you can learn more about data mining and the process to
Combine Data from Analytics into R.
To know more, or for any assistance with R programming, please drop us a comment with
your details and we will be glad to assist you!
Dikesh is a software developer and has been a pioneer in building some cool and fun code
at Tatvic. Dikesh loves gaming and has been a champion of Counter-Strike, at least at
Tatvic.