BMAT202P - Probability and Statistics Lab
Table and Graphical Representations
⚫ Importing CSV and Tabular Data Files
We can change the current working directory as follows:
⚫ setwd("<location of the dataset>")
⚫ Example
>setwd("C:\\Users\\admin\\Desktop\\")
>data=read.csv("stud.csv")
⚫ Comma-separated values (CSV) files
⚫ Data files come in many formats, and R has a matching function for loading each; CSV files are loaded with read.csv().
>data=read.csv("C:\\Users\\admin\\Desktop\\Mokesh\\stud.csv")
Or
>data=read.csv("C:/Users/admin/Desktop/Mokesh/stud.csv")
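To try read.csv() without a file on the Desktop, the sketch below writes a small CSV and reads it back. The data frame and its columns (name, marks) are made up for illustration, and tempdir() stands in for the Desktop folder; file.path() builds the path with the right separator on any operating system.

```r
# Hypothetical data standing in for stud.csv (made-up names and marks)
stud <- data.frame(name  = c("Asha", "Ravi", "Meena"),
                   marks = c(78, 85, 91))

# Build a path in the session's temporary directory; file.path() inserts
# the separator, so the same code works on Windows, macOS and Linux
csv_path <- file.path(tempdir(), "stud.csv")
write.csv(stud, csv_path, row.names = FALSE)   # row.names = FALSE avoids an extra index column

data <- read.csv(csv_path)
str(data)    # column names and types as read back from the file
head(data)   # first rows of the data frame
```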
Graphics in R
⚫ A simple plot, plot(X), plots each element of a variable X on the y-axis against the element's index on the x-axis
>v <- c(7,12,28,3,41)
>t <- c(14,7,6,19,3)
> plot(v,type = "o", col = "red", xlab = "Month", ylab =
"Rain fall",main = "Rain fall chart")
>lines(t, type = "o", col = "blue")
[Figure: Rain fall chart; x-axis: Month, y-axis: Rain fall]
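plot() sets the y-axis range from the first series only, so a series added later with lines() can fall outside the plot. A minimal sketch, assuming we want both series fully visible and labelled: pass ylim = range(v, t) and add a legend (the legend labels "v" and "t" are just the variable names used here).

```r
v <- c(7, 12, 28, 3, 41)   # first series (drawn in red)
t <- c(14, 7, 6, 19, 3)    # second series (drawn in blue)

# Use the range of both series so the second line cannot be clipped
yl <- range(v, t)
plot(v, type = "o", col = "red", ylim = yl,
     xlab = "Month", ylab = "Rain fall", main = "Rain fall chart")
lines(t, type = "o", col = "blue")
legend("topleft", legend = c("v", "t"), col = c("red", "blue"),
       lty = 1, pch = 1)
```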
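⚫ Line chart
⚫ A line chart is a simple plot in which consecutive points are connected by lines. The type argument of plot() selects the style, e.g. type = "l" for lines only or type = "o" for points overplotted by lines.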
[Figure: the same data drawn with type = "p", "l", "o", "b", "c", "s", "S" and "h"]
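The panels above can be reproduced with a short loop. In plot(), "p" draws points, "l" lines, "o" points overplotted by lines, "b" both with gaps around the points, "c" the lines of "b" alone, "s" and "S" two kinds of steps, and "h" vertical histogram-like lines. The y values below are arbitrary sample data.

```r
x <- 1:5
y <- c(1, 3, 2, 5, 4)   # arbitrary sample values
types <- c("p", "l", "o", "b", "c", "s", "S", "h")

op <- par(mfrow = c(2, 4))          # 2 rows x 4 columns of panels
for (tp in types) {
  plot(x, y, type = tp, main = paste("type =", tp))
}
par(op)                              # restore the previous layout
```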
⚫ Scatterplot
A scatterplot, plot(X,Y), plots each element of a variable Y on the y-axis against the corresponding element of variable X on the x-axis
# scatterplot
>attach(mtcars)
>plot(wt, mpg, main="Weight / MPG graph",
xlab="Car Weight (1000 lbs)", ylab="Miles Per Gallon",
pch=19)
[Figure: Weight / MPG graph; x-axis: Car Weight, y-axis: Miles Per Gallon]
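A common next step is to overlay a trend line. The sketch below goes beyond the original example: it uses with() instead of attach() (so the mtcars columns are not left on the search path) and adds a least-squares line from lm(). The line colour and width are arbitrary choices.

```r
# with() evaluates the plot call inside mtcars without attaching it
with(mtcars, plot(wt, mpg, main = "Weight / MPG graph",
                  xlab = "Car Weight (1000 lbs)", ylab = "Miles Per Gallon",
                  pch = 19))

# Least-squares line of mpg on wt, drawn over the scatterplot
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red", lwd = 2)
coef(fit)   # intercept and slope (change in mpg per 1000 lbs)
```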
⚫ Kernel density plots
⚫ Kernel density plots give a smooth picture of the shape of a distribution. They can be more informative than histograms (even histograms with a fitted normal curve), because the appearance of a histogram depends strongly on the number of bins used and on outliers.
⚫ # Kernel density plot
⚫ >d <- density(mtcars$mpg) # kernel density estimates
⚫ >plot(d)
⚫ # Filled density plot
⚫ >d <- density(mtcars$mpg)
⚫ >plot(d, main="Kernel Density of Miles Per Gallon")
⚫ >polygon(d, col="red", border="blue")
[Figure: Kernel Density of Miles Per Gallon; y-axis: Density; N = 32, Bandwidth = 2.477]
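A kernel density estimate has its own tuning choice, the bandwidth, which plays a role similar to the bin width of a histogram. This sketch compares the default bandwidth (bw.nrd0, about 2.477 for mpg, the value reported for the plot above) with half and twice that value via density()'s adjust argument; the factors 0.5 and 2 are arbitrary illustrations.

```r
x <- mtcars$mpg
d        <- density(x)                # default bandwidth: bw.nrd0()
d_narrow <- density(x, adjust = 0.5)  # half the default bandwidth
d_wide   <- density(x, adjust = 2)    # twice the default bandwidth

# Draw all three estimates on one set of axes
plot(d, main = "Effect of bandwidth on the kernel density of mpg",
     ylim = range(d$y, d_narrow$y, d_wide$y))
lines(d_narrow, col = "red")
lines(d_wide, col = "blue")
legend("topright", legend = c("default", "adjust = 0.5", "adjust = 2"),
       col = c("black", "red", "blue"), lty = 1)

d$bw   # bandwidth actually used by the default estimate
```

A smaller bandwidth follows individual observations more closely (bumpier curve); a larger one smooths more.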
⚫ barplot(X) draws a bar plot. If X is a vector, its elements are the heights of the bars; if X is a matrix, each column becomes one bar whose segments are that column's values, stacked on top of each other.
⚫ If the argument beside=TRUE, the values in each column are juxtaposed (drawn as adjacent bars) instead of stacked.
⚫ The argument horiz=TRUE creates a horizontal barplot.
⚫ > # simple barplot
⚫ > barplot(VADeaths[,"Rural Male"])
[Figure: bar plot of Rural Male death rates for age groups 50-54, 55-59, 60-64, 65-69 and 70-74]
⚫ # stacked barplots
barplot(VADeaths[,c("Rural Male", "Rural Female")])
[Figure: stacked bars for Rural Male and Rural Female]
⚫ > # juxtaposed barplots
⚫ > barplot(VADeaths[,c("Rural Male", "Rural Female")], beside=TRUE)
[Figure: juxtaposed bars for Rural Male and Rural Female]
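The horiz=TRUE option is described above but not demonstrated. A minimal sketch, combining it with beside=TRUE for all four groups of VADeaths (death rates per 1000); the legend position and size are arbitrary choices. barplot() also returns the bar centres invisibly, which is useful for adding labels.

```r
# Juxtaposed, horizontal bars for all four population groups in VADeaths
mids <- barplot(VADeaths, beside = TRUE, horiz = TRUE,
                legend.text = rownames(VADeaths),   # age groups
                args.legend = list(x = "bottomright", cex = 0.8),
                xlab = "Deaths per 1000")
mids   # bar centres: one row per age group, one column per population group
```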
>H <- c(7,12,28,3,41)
>M <- c("Mar","Apr","May","Jun","Jul")
>barplot(H,names.arg = M,xlab = "Month",ylab =
"Revenue",col="blue",main = "Revenue chart")
[Figure: Revenue chart; blue bars for Mar to Jul; x-axis: Month, y-axis: Revenue]
Example :-
>colors <- c("green","orange","brown")
>months <- c("Mar","Apr","May","Jun","Jul")
>regions <- c("East","West","North")
>Values <- matrix(c(2,9,3,11,9,4,8,7,3,12,5,2,8,10,11),
nrow = 3, ncol = 5, byrow = TRUE)
>barplot(Values,main = "total revenue",names.arg =
months,xlab = "month",ylab = "revenue",col=colors)
>legend("topleft", regions, cex = 1.3, fill = colors)
[Figure: total revenue; stacked bars for Mar to Jul with legend East, West, North; x-axis: month, y-axis: revenue]
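In the stacked chart each bar's height is a column total of Values, and each coloured segment is one region (one row). The sketch below checks those totals and redraws the same matrix with beside = TRUE so the regions can be compared directly; the title "revenue by region" is an illustrative choice.

```r
colors  <- c("green", "orange", "brown")
months  <- c("Mar", "Apr", "May", "Jun", "Jul")
regions <- c("East", "West", "North")
Values  <- matrix(c(2, 9, 3, 11, 9, 4, 8, 7, 3, 12, 5, 2, 8, 10, 11),
                  nrow = 3, ncol = 5, byrow = TRUE)

colSums(Values)   # height of each stacked bar = total revenue per month
rowSums(Values)   # total revenue per region over the five months

# Same data with regions side by side instead of stacked
barplot(Values, beside = TRUE, names.arg = months, col = colors,
        xlab = "month", ylab = "revenue", main = "revenue by region",
        legend.text = regions, args.legend = list(x = "topleft"))
```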
⚫ # Simple Dotplot
>dotchart(mtcars$mpg,labels=row.names(mtcars),cex=.7,
main="Gas Mileage for Car Models",xlab="Miles Per Gallon")
⚫ # Dotplot: Grouped Sorted and Colored
⚫ # Sort by mpg, group and color by cylinder
⚫ >x <- mtcars[order(mtcars$mpg),] # sort by mpg
⚫ >x$cyl <- factor(x$cyl) # it must be a factor
⚫ >x$color[x$cyl==4] <- "red"
⚫ >x$color[x$cyl==6] <- "blue"
⚫ >x$color[x$cyl==8] <- "darkgreen"
⚫ >dotchart(x$mpg,labels=row.names(x),cex=.7,groups=x$cyl,
main="Gas Mileage for Car Models\ngrouped by cylinder",
xlab="Miles Per Gallon",gcolor="black",color=x$color)
[Figure: dot chart of Miles Per Gallon for each car model, sorted by mpg and grouped by 4, 6 and 8 cylinders]
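The three separate colour assignments above can also be written as a single lookup. A minimal sketch, assuming the same colour choices: a named vector (cyl_col, a name introduced here) maps each cylinder level to a colour and is indexed by the factor's labels.

```r
x <- mtcars[order(mtcars$mpg), ]   # sort by mpg
x$cyl <- factor(x$cyl)             # groups must be a factor

# Named vector: cylinder level -> colour, indexed by the factor labels
cyl_col <- c("4" = "red", "6" = "blue", "8" = "darkgreen")
x$color <- unname(cyl_col[as.character(x$cyl)])

dotchart(x$mpg, labels = row.names(x), cex = 0.7, groups = x$cyl,
         main = "Gas Mileage for Car Models\ngrouped by cylinder",
         xlab = "Miles Per Gallon", gcolor = "black", color = x$color)
```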
⚫ Pie
⚫ pie(x) draws a circle (pie) cut into segments (slices), one slice per element of x; the size of each slice is proportional to that element's share of sum(x). To show the relative frequency of each unique value of a variable, pass its frequency table, e.g. pie(table(x)).
# simple pie: one slice per number of cylinders, sized by how many cars have it
>pie(table(mtcars$cyl), main="Pie Chart of N. of cylinders")
# pie with percentages and colors
>with(mtcars, {
  counts <- table(cyl)                               # cars with 4, 6 and 8 cylinders
  percent.cyl <- round(counts/length(cyl)*100, 2)    # share of each group in %
  lbls <- paste(names(counts), " cyl=", percent.cyl, "%", sep="")
  pie(counts, labels = lbls, main="Pie Chart of N. of cylinders",
      col=rainbow(length(lbls)))
})
[Figure: Pie Chart of N. of cylinders, slices labelled with the number of cylinders and its percentage]
>x <- c(21, 62, 10, 53)
>labels <- c("London", "New York", "Singapore",
"Mumbai")
>pie(x,labels)
[Figure: pie chart with slices for London, New York, Singapore and Mumbai]
>x = c(21, 62, 10, 53)
>labels = c("London", "New York", "Singapore",
"Mumbai")
>pie(x, labels, main = "City pie chart", col =
rainbow(length(x)))
[Figure: City pie chart with rainbow-coloured slices for London, New York, Singapore and Mumbai]
>x <- c(21, 62, 10, 53)
>labels <- c("London", "New York", "Singapore", "Mumbai")
>piepercent <- round(100*x/sum(x), 1)
>pie(x, labels = piepercent, main = "City pie chart", col =
rainbow(length(x)))
>legend("topright", labels, cex = 0.8, fill = rainbow(length(x)))
[Figure: City pie chart; slices labelled 14.4, 42.5, 6.8 and 36.3 (percent) with a legend for London, New York, Singapore and Mumbai]
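Instead of a separate legend, each slice label can carry both the city name and its percentage. A minimal sketch using paste0(); the label format "City (p%)" is an illustrative choice.

```r
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
piepercent <- round(100 * x / sum(x), 1)

# Put the city name and its share in the same slice label
lbls <- paste0(labels, " (", piepercent, "%)")
pie(x, labels = lbls, main = "City pie chart", col = rainbow(length(x)))
```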
⚫ histogram
⚫ hist(X) draws a histogram: a bar plot with the ranges (bins) of values of X on the x-axis and the frequency of values in each range on the y-axis
⚫ A cumulative distribution curve shows, on the y-axis, the proportion of values of X that are less than or equal to the current position on the x-axis
⚫ > # simple histogram
⚫ > hist(faithful$waiting)
[Figure: Histogram of faithful$waiting; x-axis: faithful$waiting, y-axis: Frequency]
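# compute the bins without drawing, to get midpoints and densities
>h <- hist(faithful$waiting, plot = FALSE)
# frequency polygon points: bin midpoints, closed at the outer bin edges
>xx <- c(min(h$breaks), h$mids, max(h$breaks))
>yy <- c(0, h$density, 0)
# draw the histogram
>hist(faithful$waiting, prob = TRUE, xlim = range(xx), border = "gray", col = "gray90")
# adds the frequency polygon
>lines(xx, yy, lwd=2, col = "royalblue")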
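The cumulative distribution curve described above is not drawn by hist(). A minimal sketch using ecdf(), which returns a step function Fn where Fn(q) is the proportion of values less than or equal to q; the cut-off of 70 minutes is an arbitrary example.

```r
w <- faithful$waiting
Fn <- ecdf(w)     # empirical cumulative distribution function of w

plot(Fn, main = "Cumulative distribution of faithful$waiting",
     xlab = "faithful$waiting", ylab = "Proportion <= x")

Fn(70)            # share of eruptions with a waiting time of at most 70 minutes
```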
⚫ boxplot
boxplot(X) draws a box-and-whisker plot of the values of variable X (median, quartiles, whiskers and outlying points); it is an effective way to summarize large datasets.
# Boxplot of MPG by Car Cylinders
> boxplot(mpg~cyl,data=mtcars, main="Car Mileage Data",
xlab="Number of Cylinders", ylab="Miles Per Gallon")
[Figure: box plots of Miles Per Gallon for 4, 6 and 8 cylinders; x-axis: Number of Cylinders, y-axis: Miles Per Gallon]
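The numbers behind each box can be inspected without drawing anything. With plot = FALSE, boxplot() returns the statistics it would draw: per group, the lower whisker, lower hinge, median, upper hinge and upper whisker, plus any outlying points.

```r
# plot = FALSE returns the statistics that boxplot() would draw
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

b$names   # groups: "4" "6" "8"
b$stats   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points beyond the whiskers, drawn individually as outliers
```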
⚫ Pairs
⚫ pairs(X) draws a matrix of scatterplots, one for every pair of columns of X
⚫ pairs(~mpg+disp+drat+wt,data=mtcars,
main="Scatterplot Matrix MPG, Displacement, Rear axle ratio, Weight")
[Figure: Scatterplot Matrix of mpg, disp, drat and wt]
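A numeric companion to the scatterplot matrix is the correlation matrix of the same columns: each entry summarises how strongly the corresponding panel follows a straight line. This sketch uses Pearson correlation, the default of cor().

```r
vars <- c("mpg", "disp", "drat", "wt")
r <- cor(mtcars[, vars])   # Pearson correlation for every pair of columns
round(r, 2)
```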
⚫ Contour
⚫ contour(X,Y,Z) draws a contour plot, with vector X giving the positions of the rows of matrix Z, vector Y the positions of its columns, and matrix Z the data (heights)
>x <- 10*(1:nrow(volcano)); x.at <- seq(100, 800, by=100)
>y <- 10*(1:ncol(volcano)); y.at <- seq(100, 600, by=100)
# Using Terrain Colors
>image(x, y, volcano, col=terrain.colors(100),axes=FALSE)
>contour(x, y, volcano, levels=seq(90, 200, by=5), add=TRUE,
col="brown")
>axis(1, at=x.at)
>axis(2, at=y.at)
>box()
>title(main="Maunga Whau Volcano", sub =
"col=terrain.colors(100)", font.main=4)
[Figure: Maunga Whau Volcano; terrain-coloured image with brown contour lines; subtitle col=terrain.colors(100)]
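An alternative to combining image() and contour() by hand is filled.contour(), which colours the regions between contour levels and adds a colour key. A minimal sketch reusing the 10 m grid spacing from the example above; the terrain.colors palette is kept for comparison.

```r
x <- 10 * (1:nrow(volcano))   # 10 m grid spacing along the rows
y <- 10 * (1:ncol(volcano))   # 10 m grid spacing along the columns

# filled.contour() colours the regions between levels and draws a key
filled.contour(x, y, volcano, color.palette = terrain.colors,
               plot.title = title(main = "Maunga Whau Volcano",
                                  xlab = "x", ylab = "y"))
range(volcano)   # height range covered by the colour key
```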
⚫ Persp
persp(X,Y,Z) draws a 3D perspective plot of a surface, with vector X for the rows, vector Y for the columns and matrix Z for the data (heights)
# Visualizing a simple DEM (digital elevation model)
>z <- 2 * volcano # Exaggerate the relief
>x <- 10 * (1:nrow(z)) # 10 meter spacing (S to N)
>y <- 10 * (1:ncol(z)) # 10 meter spacing (E to W)
>persp(x, y, z, theta = 120, phi = 15, scale = FALSE, axes =
FALSE)
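persp() invisibly returns a 4 x 4 viewing transformation matrix, and trans3d() uses it to map 3D coordinates onto the drawing. A minimal sketch that marks the highest grid cell of the exaggerated relief with a red point; choosing the peak is just an illustration.

```r
z <- 2 * volcano                 # exaggerate the relief
x <- 10 * (1:nrow(z))            # 10 meter spacing (S to N)
y <- 10 * (1:ncol(z))            # 10 meter spacing (E to W)

# Keep the viewing transformation returned by persp()
pmat <- persp(x, y, z, theta = 120, phi = 15, scale = FALSE, axes = FALSE)

# Locate the highest grid cell and project it onto the 2-D drawing
top <- which(z == max(z), arr.ind = TRUE)[1, ]
pt <- trans3d(x[top[1]], y[top[2]], z[top[1], top[2]], pmat)
points(pt, col = "red", pch = 16)
```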
⚫ Tables
Example:
> library(MASS)
>ships
> table(ships$type)
>table(ships$type,ships$year)
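A two-way table becomes easier to read with totals and proportions. This sketch uses addmargins() to append row and column sums and prop.table(tab, 1) to turn each row (ship type) into proportions that sum to 1.

```r
library(MASS)   # provides the ships data set

tab <- table(ships$type, ships$year)
addmargins(tab)        # adds row and column totals ("Sum")
prop.table(tab, 1)     # proportions within each ship type (rows sum to 1)
```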
Example :-
>library(MASS)
>USArrests
>table(USArrests[,3])
>table(cut(USArrests[,3],pretty(USArrests[,3])))
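The last command builds a grouped frequency table: pretty() picks round class boundaries that cover the data and cut() assigns each value to a class. A minimal sketch of the same steps with named intermediate results (urban, brk and counts are names introduced here); include.lowest = TRUE is added so a value equal to the first boundary is not dropped.

```r
urban <- USArrests$UrbanPop      # same column as USArrests[, 3]
brk <- pretty(urban)             # round class boundaries covering the data
brk

# include.lowest = TRUE keeps a value equal to the first boundary in class 1
counts <- table(cut(urban, brk, include.lowest = TRUE))
counts
prop.table(counts)               # relative frequency of each class
```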
Example :-
> airquality
> table(airquality[,4],airquality[,5])
>table(cut(airquality[,4],pretty(airquality[,4])),
airquality[,5])
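The same idea gives a temperature-by-month table with readable dimension names. In this sketch, temp_band and the dimension label Month are names introduced for illustration; column sums count the days recorded in each month (5 = May through 9 = September).

```r
temp_band <- cut(airquality$Temp, pretty(airquality$Temp), include.lowest = TRUE)
tab <- table(temp_band, Month = airquality$Month)
tab
colSums(tab)    # number of days recorded in each month
```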
Example :-
> library(MASS)
> cars