
R Module 4

Uploaded by NIDHISH S
Copyright © All Rights Reserved

Module 4

Exploratory Data Analysis, Main Graphical Packages, Pie Charts, Scatter Plots, Line Plots,
Histograms, Box Plots, Bar Plots, Other Graphical packages

Graphics using R
Exploratory Data Analysis
Exploratory Data Analysis (EDA) is a visual approach to analysing data sets and
summarizing their main characteristics.
It uses visualisation and transformation to explore data in a systematic way.
EDA is an iterative cycle of the following steps:
1) Generate questions about the data.
2) Search for answers by visualising, transforming, and modelling the data.
3) Use what is learnt to refine the questions and/or generate new ones.
Exploratory Data Analysis (EDA) is an approach for data analysis that employs a variety of
techniques (mostly graphical) to:
1) Maximize insight into a data set
2) Uncover underlying structure
3) Extract important variables
4) Detect outliers and anomalies
5) Test underlying assumptions
6) Develop parsimonious models
7) Determine optimal factor settings.
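A first pass through the EDA cycle can be sketched in a few lines of base R. This is a minimal illustration, assuming the built-in cars data set (speed in mph, stopping distance in ft); it asks one question, looks at the data, quantifies what the plot suggests, and flags outliers with the same 1.5 × IQR rule that boxplot() uses.

```r
# Built-in cars data: speed (mph) and stopping distance (ft) for 50 cars
data(cars)

str(cars)       # structure: variable names and types
summary(cars)   # min, quartiles, mean and max of each variable

# Question: is stopping distance related to speed? Look first...
plot(dist ~ speed, data = cars)

# ...then quantify the visual impression with Pearson's correlation
r <- cor(cars$speed, cars$dist)
print(r)

# Detect outliers in dist using the 1.5 * IQR rule applied by boxplot()
out <- boxplot.stats(cars$dist)$out
print(out)
```

Each answer (for example, an unusually long stopping distance) can then feed a new question, which is the iterative cycle described above.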
Main Graphical Packages
• The basic graphs in R can be drawn using the base graphics system.
• This system has some limitations, which are overcome at the next level of graphics, the grid graphics system.
• grid lets us place points, lines and other shapes exactly where we want them on the page.
• However, grid provides only these low-level drawing functions; it has no ready-made function for a complete plot such as a scatter plot.
• Hence, we go to the next level of plotting, the lattice graphics system, which is built on grid.
• In lattice, a plot is returned as an object, so it can be saved in a variable and drawn later.
• Lattice plots can also contain multiple panels, so several graphs can be drawn together and compared with each other.
• The next level of graphics is the ggplot2 graphics system.
• Here “gg” stands for “grammar of graphics”: a graph is broken down into separate components (data, aesthetic mappings, geometric objects, scales and so on) that are combined to build the plot.
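The four levels can be compared side by side in a short sketch. It assumes the built-in cars data; grid and lattice ship with R, while ggplot2 is an optional package, so that part only runs if it is installed. Note how the lattice and ggplot2 calls return objects that are stored in variables and printed later, whereas base plot() draws immediately.

```r
library(grid)     # low-level graphics system shipped with R
library(lattice)  # recommended package shipped with R

# Base graphics: draws straight onto the device
plot(cars$speed, cars$dist)

# grid: place primitives exactly where desired ("npc" units run from 0 to 1)
grid.newpage()
grid.lines(x = c(0.1, 0.9), y = c(0.1, 0.9))
grid.points(x = unit(0.5, "npc"), y = unit(0.5, "npc"), pch = 16)

# lattice: returns a "trellis" object that can be saved and printed later
p_lattice <- xyplot(dist ~ speed, data = cars)
print(p_lattice)

# ggplot2 (optional): the plot is assembled from grammar components
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_gg <- ggplot(cars, aes(x = speed, y = dist)) +  # data + aesthetic mapping
    geom_point()                                    # geometric object
  print(p_gg)
}
```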
Pie Charts
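• In R, a pie chart is created with the pie() function, which takes a vector of positive numbers as input.
• Additional parameters control the labels, colours, title, etc.
• The basic syntax is pie(x, labels, radius, main, col, clockwise), where x is the vector of values, labels gives a description for each slice, radius sets the size of the circle, main is the chart title, col is the colour palette and clockwise is a logical value that sets whether the slices are drawn clockwise or anticlockwise.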
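As a small illustration of these parameters, the sketch below draws a pie chart with percentage labels. The counts and category names are hypothetical values chosen only for the example.

```r
# Hypothetical counts for four categories (illustrative values only)
counts <- c(21, 62, 10, 53)
labels <- c("Category A", "Category B", "Category C", "Category D")

# Percentage of the total for each slice, rounded to whole numbers
pct <- round(100 * counts / sum(counts))

pie(counts,
    labels    = paste0(labels, " (", pct, "%)"),
    radius    = 0.9,
    main      = "Share of each category",
    col       = rainbow(length(counts)),
    clockwise = TRUE)  # start at 12 o'clock and go clockwise
```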
Scatter Plots
• Scatter plots are used to explore the relationship between two continuous variables.
• Let us consider the built-in dataset “cars”, which lists the “Speed and Stopping Distances of Cars”.
• A basic scatter plot in the base graphics system is obtained with the plot() function, as in Fig. 4.3.
• Here the plot is used to check whether the speed of a car affects its stopping distance.
• This plot can be made more appealing and readable by adding colour and changing the plotting character.
• For this we use the arguments col and pch in the plot() function; pch values from 0 to 25 select the standard plotting symbols.
• The plot in Fig. 4.4 shows a strong positive correlation between the speed of a car and its stopping distance.
• When there are more than two variables and we want to see how each variable relates to the others, we use a scatter plot matrix, which draws a scatter plot for every pair of variables.
• The pairs() function creates scatter plot matrices, as in Fig. 4.6.
• The basic syntax is pairs(formula, data), where formula lists the variables (for example ~ a + b + c) and data is the data frame that contains them.
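The scatter plot steps above can be reproduced with a sketch like the following. It assumes the built-in cars data for the single scatter plot and the built-in mtcars data (an arbitrary choice of four columns) for the scatter plot matrix; the correlation coefficient puts a number on the visual impression.

```r
# Built-in cars data: speed (mph) vs. stopping distance (ft)
plot(cars$speed, cars$dist,
     col  = "blue",
     pch  = 19,                     # pch 19 = solid circle
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Speed vs. stopping distance")

# Pearson's correlation coefficient between the two variables
r <- cor(cars$speed, cars$dist)
print(round(r, 2))

# Scatter plot matrix: one panel for every pair of the listed variables
pairs(~ mpg + disp + hp + wt, data = mtcars,
      main = "Scatter plot matrix of mtcars")
```

The correlation of roughly 0.81 supports the strong positive relationship seen in the plot.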
Histograms
• A histogram shows the frequencies of a variable's values, which are split into ranges (bins).
• It is similar to a bar chart, but a histogram groups the values into continuous ranges.
• In R base graphics, histograms are drawn with the hist() function (as in Fig.), which takes a vector of numbers as input together with a few more parameters: main (title), xlab (x-axis label), xlim and ylim (axis ranges), breaks (bin boundaries or a suggested number of bins), col (bar colour) and border (bar border colour).
• The lattice histogram is drawn with the histogram() function (as in Fig.) and behaves in the same way as the base one.
• However, it makes it easy to split the data into panels and to save plots as variables.
• The breaks argument behaves the same way as with hist().
• Lattice histograms support counts, probability densities and percentages on the y-axis via the type argument, which takes the string “count”, “density” or “percent”.
• ggplot histograms are created by adding the geom_histogram() function to the ggplot() function (as in Fig.).
• Bin specification is simple here: we just pass a numeric bin width to geom_histogram() through its binwidth argument.
• We can choose between counts and densities by passing the special names ..count.. or ..density.. to the y-aesthetic; ggplot2 3.3.0 and later write these as after_stat(count) and after_stat(density).
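The three histogram flavours can be compared on the same data in a sketch like this one. It assumes the built-in faithful data (eruption durations of the Old Faithful geyser) and hand-picked bin boundaries of width 0.25 minutes; the ggplot2 part only runs if a version supporting after_stat() is installed.

```r
library(lattice)

x <- faithful$eruptions                # eruption durations in minutes
brks <- seq(1.5, 5.5, by = 0.25)       # explicit bin boundaries covering the data

# Base graphics: hist() also returns the bin counts it drew
h <- hist(x, breaks = brks, col = "lightblue", border = "white",
          main = "Old Faithful eruptions", xlab = "Duration (minutes)")

# lattice: same breaks, y-axis shown as percentages via type
p <- histogram(~ eruptions, data = faithful, breaks = brks, type = "percent")
print(p)

# ggplot2 (optional): binwidth sets the bin width, after_stat(density) the y-axis
if (requireNamespace("ggplot2", quietly = TRUE) &&
    utils::packageVersion("ggplot2") >= "3.3.0") {
  library(ggplot2)
  g <- ggplot(faithful, aes(x = eruptions)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.25)
  print(g)
}
```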
Box Plots
• A box plot summarizes data with its three quartiles, which divide the data into four parts.
• The graph shows the minimum, maximum, median, first quartile and third quartile of the data; in R's boxplot(), the whiskers stop at the most extreme values lying within 1.5 times the interquartile range from the box, and points beyond that are drawn individually as outliers.
• It therefore gives a compact picture of how the data is distributed.
• In R base graphics, a box plot is created with the boxplot() function (as in Fig.).
• Its basic syntax is boxplot(x, data, notch, varwidth, names, main), where x is a vector or a formula, data is the data frame, notch is a logical value that draws a notch around the median, varwidth is a logical value that makes each box's width proportional to the square root of its sample size, names gives the labels printed under the boxes and main is the chart title.
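The parameters above can be tried with a formula-based box plot. This sketch assumes the built-in mtcars data and groups fuel economy by number of cylinders; boxplot() also returns the statistics it drew, which is handy for checking the quartiles numerically.

```r
# Built-in mtcars data: miles per gallon grouped by number of cylinders
b <- boxplot(mpg ~ cyl, data = mtcars,
             varwidth = TRUE,   # box width proportional to sqrt(group size)
             notch    = FALSE,  # TRUE would draw a notch around each median
             names    = c("4 cyl", "6 cyl", "8 cyl"),
             main     = "Mileage by cylinder count",
             xlab     = "Cylinders", ylab = "Miles per gallon")

# Rows of b$stats: lower whisker, Q1, median, Q3, upper whisker (one column per box)
print(b$stats)
print(b$n)     # number of observations in each group
```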
Bar Plots
• Bar charts are the natural way of displaying numeric variables split by a categorical variable.
• In R base graphics, a bar chart is created with the barplot() function (as in Fig.), which takes a matrix or a vector of numeric values.
• Additional parameters give labels to the x-axis and y-axis, the chart title, labels for the bars and colours.
• The basic syntax is barplot(H, xlab, ylab, main, names.arg, col), where H is the vector or matrix of values, xlab and ylab are the axis labels, main is the title, names.arg gives a name for each bar and col sets the colours.
• By default the bars are vertical; horizontal bars are generated with the horiz = TRUE parameter, as in Fig.
• We can also adjust the plot parameters through the par() function.
• The las parameter controls whether axis labels are parallel to the axis (0, the default), horizontal (1), perpendicular to the axis (2) or vertical (3).
• Plots are usually more readable if you set las = 1, for horizontal labels.
• The mar parameter is a numeric vector of length 4 giving the width of the plot margins, in lines of text, at the bottom, left, top and right of the plot.
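A minimal sketch of these options follows. The monthly values are hypothetical; the grouped example uses the built-in VADeaths matrix to show matrix input. par() returns the previous settings, so they can be restored after the plot.

```r
# Hypothetical monthly values (illustrative only)
H <- c(7, 12, 28, 3, 41)
M <- c("Mar", "Apr", "May", "Jun", "Jul")

# Horizontal labels (las = 1) and a wider left margin for the bar names
old_par <- par(las = 1, mar = c(5, 6, 4, 2))
mids <- barplot(H, names.arg = M, horiz = TRUE, col = "steelblue",
                xlab = "Value", main = "Monthly values")
par(old_par)  # restore the previous graphics settings

# Matrix input: one group of side-by-side bars per column
g <- barplot(VADeaths, beside = TRUE, legend.text = TRUE,
             main = "Death rates in Virginia (1940)")
```

barplot() returns the bar midpoints (a vector for vector input, a matrix when beside = TRUE), which can be used to place text on the bars.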
