Module 5 (003) - Updated

This module focuses on using R for graphical data representation, non-parametric tests, and analysis of variance (ANOVA). It aims to equip students with skills to visualize data effectively, conduct non-parametric hypothesis testing, and perform one-way and two-way ANOVA. Key content includes various types of graphs, such as box plots, scatter plots, and bar charts, along with their customization and interpretation.

Scientific Programming using R

Module Number: 05

Module Name: R for Graphs, Non Parametric Tests and ANOVA


R for Graphs, Non Parametric Tests and ANOVA

AIM:
To familiarise students with the graphical representation of data, non-parametric tests, and
analysis of variance.


Objectives:
The objectives of this module are to:
• Describe graphical representation of data.
• Discuss non-parametric hypothesis testing for categorical variables.
• Explain one-way and two-way analysis of variance.


Outcome:
At the end of this module, you are expected to:
• Visualize data using appropriate graph types.
• Explain different non-parametric hypothesis testing procedures for given data.
• Illustrate one-way and two-way analysis of variance.


Content
• Graphs
• Non Parametric Methods
• ANOVA


Graphs

Graphs are a powerful way to present data and results concisely, and an effective tool for communicating
findings to decision makers. Whatever kind of data we have, there is a way to illustrate it graphically.
A graph is more readily understandable than words and numbers, and producing good graphs is a vital
skill. Some graphs are also useful for examining data so that you can gain some idea of patterns that may
exist; this can direct you towards the correct statistical analysis.

R has powerful and flexible graphical capabilities. In general terms, R has two kinds of graphical
commands: some commands generate a basic plot of some sort, and other commands are used to tweak
the output and to produce a more customized finish.


Graphs -Box-Whisker plots


The box-whisker plot (often abbreviated to boxplot) is a useful way to visualize complex data where
you have multiple samples. In general, you look into differences between samples. The basic form of
the box-whisker plot shows the median value, the quartiles (or hinges), and the max/min values.

This means that you get a lot of information in a compact manner. The box-whisker plot is also useful to
visualize a single sample because you can show outliers if you choose. You can use the boxplot()
command to create box-whisker plots. The command can work in a variety of ways to visualize simple
or quite complex data.
The following example shows a simple data frame
composed of two columns:
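
The data frame itself is not reproduced in this text. As a minimal sketch, assuming that carsd simply holds the mpg and wt columns of R's built-in mtcars data set (an assumption suggested by the column names used later, not confirmed by the slides), it could be created like this:

```r
# Assumption: carsd is a two-column subset of the built-in mtcars data set.
carsd <- mtcars[, c("mpg", "wt")]

# Inspect the first rows and the structure of the data frame.
head(carsd)
str(carsd)
```

With carsd defined this way, the boxplot() and plot() commands in this module can be run as written.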


You have seen these data before. You can use the boxplot()
command to visualize one of the variables here:

> boxplot(carsd$mpg)
This produces a simple graph, shown in the screenshot, with the typical layout of a box-whisker plot.

The stripe shows the median and the box spans the lower and upper hinges. By default, the whiskers
extend to the most extreme values lying within 1.5 times the interquartile range of the box; any
points beyond that are drawn individually. When there are no such points, the whiskers show the
maximum and minimum values.

If you have several items to plot, you can simply give the
vector names in the boxplot() command:
> boxplot(carsd$mpg, carsd$wt)
The resulting graph appears in the given screenshot. In this
case, you specify vectors that correspond to the two
columns in the data frame, but they could be completely
separate.


Customizing Boxplots
A plot without labels is of little use. You can use the xlab and ylab instructions to label the axes,
and the names instruction to set the labels (currently displayed as 1 and 2) for the two samples, like so:
> boxplot(carsd$mpg, carsd$wt, names = c('mpg', 'weight'))

> title(xlab = 'Variable', ylab = 'Value')

The resulting plot is shown in the screenshot. In this case, you used the title() command to add the
axis labels, but you could have specified xlab and ylab within the boxplot() command.

Now you have names for each of the samples as well as axis labels. Notice that the whiskers of a
sample do not always extend to its most extreme value; any values beyond the whiskers are displayed
as separate points. You can determine how far out the whiskers extend; by default this is 1.5 times
the interquartile range.

You can alter this by using the range = instruction; if you specify range = 0 as shown in the
following example, the whiskers extend to the maximum and minimum values:

> boxplot(carsd$mpg, carsd$wt, names = c('mpg', 'weight'),
+ range = 0, xlab = 'Variable', ylab = 'Value', col = 'red')
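
To see what the range instruction does numerically, the sketch below uses boxplot.stats(), which computes the summary values that a box-whisker plot displays. The vector x is made-up illustrative data, and the coef argument of boxplot.stats() plays the same role as range in boxplot().

```r
# Hypothetical sample with one unusually large value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# Default rule: whiskers reach at most 1.5 times the box length.
default_stats <- boxplot.stats(x)
default_stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
default_stats$out    # points drawn individually beyond the whiskers

# coef = 0 corresponds to range = 0: whiskers go to the data extremes.
full_stats <- boxplot.stats(x, coef = 0)
full_stats$stats
full_stats$out
```

Under the default rule the value 30 is reported separately and the upper whisker stops at 9; with coef = 0 the whisker extends to 30 and no separate points remain.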


Horizontal Boxplots

With a simple additional instruction, we can display the boxes horizontally rather than vertically
(which is the default):
> boxplot(mpg ~ vs, data = carsd1, range = 0, horizontal = TRUE)
> title(xlab = 'Miles per Gallon', ylab = 'vs')


When you use the horizontal = TRUE instruction, the graph is displayed with horizontal boxes. Notice
how you had to switch the x and y labels in the title() command: the xlab instruction always refers to
the horizontal axis and the ylab instruction to the vertical axis, so the response variable (miles
per gallon) is now labelled with xlab.


Scatter Plots
The basic plot() command is an example of a generic function that can be pressed into service for a
variety of uses. Many specialized statistical routines include a plotting routine to produce a
specialized graph. Here, however, you will use the plot() command to produce xy scatter plots. The
scatter plot is used especially to show the relationship between two variables.

The following data frame contains two columns of numeric values, and because they contain the same
number of observations, they could form the basis for a scatter plot:

The basic form of the plot() command requires you to specify the x and y data, each being a numeric
vector. You use it like so:
plot(x, y, ...)
If your data are contained in a data frame, as in the following example, you can use the $ syntax to
get at the variables, or you can use the with() or attach() commands. For the example data here, the
following commands all produce a similar result.
> plot(carsd$wt, carsd$mpg)
> with(carsd, plot(wt, mpg))
> attach(carsd)
> plot(wt, mpg)
> detach(carsd)

The col instruction sets the colour of the points:

> plot(carsd$wt, carsd$mpg, col = 'red')

Notice that the axis labels match what you typed into the command. When you use the $ syntax to
extract the variables, this is reflected in the labels, which read carsd$wt and carsd$mpg.

Using Formula Syntax


There is another way to specify what you want to plot; rather than giving the x and y values as
separate components, you supply a formula to describe the situation:
> form <- mpg ~ wt
> plot(form, data = mtcars)

You use the tilde character (~) to build a formula. On the left, you place the response (dependent)
variable, and on the right you place the predictor (independent) variable. At the end, you tell the
command where to find these data. This is useful because it means you do not need to use the
$ syntax or the attach() command to allow R to read the variables inside the data frame.


Adding lines of Best-Fit to Scatter plots


The abline() command adds a straight line to an existing plot, for example a line matching the slope
and intercept of a series of points on a QQ plot. You can do the same thing here: first, you determine
the slope and intercept by fitting a linear model with lm(), and then you pass the result to the
abline() command.
> abline(lm(mpg ~ wt, data = mtcars))

Now you can see another advantage of using the formula notation: the command is very similar to the
original plot() command. The default line is a thin solid black line, but you can alter its appearance
in various ways: the colour with the col = instruction, the line width with the lwd = instruction, and
the line type with the lty = instruction.


Line Types that Can be Specified Using the lty Instruction in a Graphical Command

> plot(mtcars$mpg ~ mtcars$wt, col = "red", pch = 2, cex = 2)
> abline(lm(mtcars$mpg ~ mtcars$wt), lty = 'dotted', lwd = 3, col = 'gray50')
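
The table of line types is not reproduced in this text. For reference, base R accepts the integer codes 0 to 6, or the equivalent names, for the lty instruction. The sketch below draws one horizontal line per type so that they can be compared; the plot layout is only an illustration.

```r
# Names of the standard line types; their integer codes are 0 to 6.
line_types <- c("blank", "solid", "dashed", "dotted",
                "dotdash", "longdash", "twodash")

# Empty plotting region with one row per line type.
plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0.5, 7.5),
     xlab = "", ylab = "", axes = FALSE,
     main = "Line types for the lty instruction")
axis(2, at = 1:7, labels = line_types, las = 1, tick = FALSE)

# Draw each line; lty accepts either the name or the code (i - 1).
for (i in seq_along(line_types)) {
  abline(h = i, lty = line_types[i], lwd = 2)
}
```

The first type, "blank", draws nothing, which is why its row stays empty.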


Pairs Plots
A scatter plot matrix, in which each pairwise combination of variables is plotted, is referred to as
a pairs plot. The pairs() function creates a pairs plot, and the result can be customized.

By default, the pairs() command takes all the columns in a data frame and creates a matrix of scatter
plots. This is useful but messy if you have a lot of columns. You can choose which columns to display
by using the formula notation along the following lines:
pairs(~ x + y + z, data = our.data)
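
As a concrete instance of this pattern, the sketch below uses R's built-in mtcars data set; the choice of the columns mpg, wt and hp is only an illustration.

```r
# Scatter plot matrix of every column (can be crowded).
pairs(mtcars)

# Restrict the matrix to three chosen columns with the formula notation.
pairs(~ mpg + wt + hp, data = mtcars)
```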

Line Charts
Data are often time-dependent, that is, collected over a period of time. You can display such data as
a scatter plot in which the y-axis reflects the magnitude of the recorded values and the x-axis reflects
the time. It is then sensible to join the points together with lines in order to highlight the changes
over time; the result is a line chart.

> plot(airmiles, type = 'o')
> plot(airmiles, type = 'b')
> plot(tempe$Maxtemp, type = 'b')


Pie Charts

If you have data that represents how something is divided up between various categories, the pie
chart is a common graphic choice to illustrate your data. For example, you might have data that shows
sales for various items for a whole year. The pie chart enables you to show how each item contributed
to total sales. Each item is represented by a slice of pie: the bigger the slice, the bigger the
contribution to the total sales. In simple terms, the pie chart takes a series of data, determines the
proportion of each item towards the total, and then represents these as different slices of the pie.
The pie chart is commonly used to display proportional data. You can create pie charts using the
pie() command. In its simplest form, you can use a vector of numeric values to create your plot like so:

> pie(tempe$Maxtemp, labels = tempe$Month, col = rainbow(length(tempe$Month)),
+ main = "Maximum Temperature of a Year")


Cleveland dot charts

Cleveland dot plots are a great alternative to a simple bar chart, particularly if you have more
than a few items. It does not take much for a bar chart to look cluttered. In the same amount of
space, many more values can be included in a dot plot, and it is easier to read as well. R has a
built-in base function, dotchart().

# Dotplot: Grouped Sorted and Colored
# Sort by mpg, group and color by cylinder
x <- mtcars[order(mtcars$mpg),] # sort by mpg
x$cyl <- factor(x$cyl) # it must be a factor
x$color[x$cyl==4] <- "red"
x$color[x$cyl==6] <- "blue"
x$color[x$cyl==8] <- "darkgreen"
dotchart(x$mpg, labels = row.names(x), cex = .7, groups = x$cyl,
main = "Gas Mileage for Car Models\ngrouped by cylinder",
xlab = "Miles Per Gallon", gcolor = "black", color = x$color)


Bar Charts
The bar chart is suitable for showing data that fall into discrete categories. A histogram is a
related type of chart in which the width of each bar is also meaningful. A bar chart shows the
magnitude of particular observations or categories in the data, whereas a histogram shows the
distribution of the data.

Bar charts are widely used because they convey information in a readily understood fashion. They
are also flexible and can show items in various groupings.


Single-category Bar charts


The simplest plot can be made from a single vector of numeric values. In the following example you have
such an item:

> tempe$Maxtemp
[1] 42 34 40 28 35 34 28 33 34 38 40 26

To make a bar chart you use the barplot() command and specify the vector name in the instruction.

> barplot(tempe$Maxtemp, col = colr, ylab = "Max Temperature", xlab = "Months",
+ names.arg = tempe$Month)

Bar plots need not be based on counts or frequencies. You can create bar plots that represent means,
medians, standard deviations, etc. Use the aggregate( ) function and pass the results to the barplot( )
function.
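
A minimal sketch of that approach, using the built-in mtcars data set as a stand-in (the grouping by cyl is an illustrative choice, not taken from the slides):

```r
# Mean miles per gallon for each number of cylinders.
mean_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mean_mpg

# One bar per group, with bar height equal to the group mean.
barplot(mean_mpg$mpg, names.arg = mean_mpg$cyl,
        xlab = "Number of cylinders", ylab = "Mean miles per gallon")
```

Replacing FUN = mean with median or sd gives bar plots of medians or standard deviations in the same way.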


Multiple category Bar charts


The examples of bar charts you have seen so far have all involved a single "row" of data, i.e., all
the data relate to categories in one group. It is also quite common to have several groups of
categories. You can display these groups in several ways, the most primitive being a separate graph
for each group. However, you can also arrange your bar chart so that these multiple categories are
displayed on one single plot. You have two options: stacked bars and grouped bars.

Stacked Bar Charts:

If your data contain several groups of categories, stacking is the default layout: the values within
each group are stacked into a single bar. The alternative is to show the bars in blocks (or groups).

> barplot(t(data1), col = c("red", "yellow", "blue", "pink"), legend.text = TRUE)


Grouped Bar Charts

When your data are in a matrix with several rows, the default bar chart is a stacked chart, as you
saw in the previous slides. You can force the elements of each column to be unstacked by using the
beside = TRUE instruction (the default is FALSE), as shown in the screenshot.
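
The code from the screenshot is not reproduced in this text. A minimal sketch of the two layouts, using the built-in VADeaths matrix as stand-in data (an assumption; the slides use their own matrix):

```r
# VADeaths is a built-in 5 x 4 matrix of death rates.
# Default (beside = FALSE): one stacked bar per column.
barplot(VADeaths, legend.text = TRUE)

# beside = TRUE: the rows are drawn side by side within each column.
mids <- barplot(VADeaths, beside = TRUE, legend.text = TRUE)
```

The value returned by barplot() holds the bar midpoints, which is useful if you later want to add text or error bars above individual bars.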

The resulting graph now shows a series of bars within each of the column categories.


Non Parametric Tests


Non-parametric tests are also known as distribution-free tests because they rest on fewer and weaker
statistical assumptions than parametric tests; in particular, they make very few assumptions about the
shape of the underlying population distribution. So, when the assumptions of a parametric test are
violated, a non-parametric test is a valuable alternative for testing the hypothesis.

Unlike parametric tests, non-parametric tests can be used when the data are measured on an ordinal or
nominal scale. Parametric tests state their hypotheses in terms of the mean and variance, whereas
non-parametric tests typically state them in terms of the median.

Parametric Test                  Corresponding Non-Parametric Test
One-sample t test                Wilcoxon signed rank test
Independent two-sample t test    Mann-Whitney U test
Dependent-sample t test          Wilcoxon matched-pairs signed rank test
One-way ANOVA                    Kruskal-Wallis test

Wilcoxon Signed Rank Test for one population median:

In many situations, variables such as family income or house prices are highly skewed, or the data are
measured only on a low scale such as ordinal. If the distribution of the population is skewed, a
non-parametric test is a wise choice for the researcher.
Wilcoxon Signed Rank test for small sample size:
The Wilcoxon Signed Rank test for a small sample can be explained with the help of an example.
The iNurture University placement office wants to know whether the median salary for B.Tech graduates
exceeds Rs. 50000. Employees in the office believe that it does. Since the salary distribution is
highly skewed to the right, the t test is not the right choice for testing the claim of iNurture
University, so the Wilcoxon Signed Rank test is used instead.
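
The salary data and the R call appear only in a screenshot. A minimal sketch of such a test with wilcox.test(), using made-up salaries (hypothetical values, so the p-value will differ from the one reported for the original data):

```r
# Hypothetical monthly salaries (Rs.) of ten B.Tech graduates.
salary <- c(48500, 52300, 47100, 61000, 49800,
            55400, 46200, 73500, 51200, 44900)

# H0: median = 50000  versus  H1: median > 50000.
result <- wilcox.test(salary, mu = 50000, alternative = "greater")
result$statistic  # V, the sum of ranks of the positive differences
result$p.value
```

The alternative = "greater" instruction makes the test one-sided, matching the claim that the median exceeds Rs. 50000.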

P-value = 0.4409, which is much greater than the level of significance. So, we fail to reject the
null hypothesis that the median salary of the B.Tech graduates of iNurture University is at most
Rs. 50000. In other words, the data do not support the claim of iNurture University that the median
salary for B.Tech graduates exceeds Rs. 50000.
Note:
The test statistic approaches a normal distribution when the number of observations is more than 20,
so for such samples the test relies on a normal approximation.


Non Parametric Tests - Mann-Whitney U test


Non-parametric test for two population medians:

Mann-Whitney U test: The Mann-Whitney U test is used as an alternative to the independent two-sample
t test. It compares the medians of the two populations.

Example:
The workforce of the Bangalore district is made up of rural and urban divisions. A few months ago,
several rural division supervisors began claiming that the urban division employees waste tar from the
tar godown. The supervisors claimed that the urban division uses more tar per mile of road maintenance
than the rural division. In response to these claims, the Bangalore district material manager performed
a test. He selected a random sample from the district job cost records of jobs performed by the urban
division (UD) and another sample of jobs performed by the rural division (RD). The kg of tar per mile
was recorded for each job. Although the measurement scale is ratio, the manager assumes that the data
are highly skewed, as there are far fewer jobs in the rural area than in the urban area. For this
reason, it is safer to use a non-parametric test.
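
The job records and the R call appear only in a screenshot. A minimal sketch of the test with wilcox.test(), which performs the Mann-Whitney U test when given two samples; the tar figures are made up for illustration:

```r
# Hypothetical tar usage (kg per mile) for sampled jobs.
urban <- c(81.2, 93.5, 78.4, 102.1, 88.7, 95.3, 84.6, 110.8)
rural <- c(76.9, 85.1, 90.4, 72.3, 80.5, 87.2, 79.8)

# H0: both divisions use the same amount of tar per mile.
# H1: the urban division uses more (one-sided alternative).
result <- wilcox.test(urban, rural, alternative = "greater")
result$statistic  # W, the Mann-Whitney statistic for the first sample
result$p.value
```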

Conclusion for the R results:

Since the p-value for the test is greater than the fixed level of significance (0.05), we fail to
reject the null hypothesis. The data do not support the claim of the rural division supervisors.


Non Parametric Tests – Chi Square Test

When you have categorical data, you can look for associations between categories by using the
chi-squared test. Routines to achieve this are accessed using the chisq.test() command. You can add
various additional instructions to the basic command to suit your requirements.

The summary of arguments for the function chisq.test() is given in the screenshot.

The most common use for a chi-squared test is where you have multiple categories and want to see
if associations exist between them. In the following example, you can see some categorical data set
out in a data frame. You have seen these data before:

The researcher wants to test whether there is any association between the number of cylinders,
forward gears and carburetors in different car brands.
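
The data frame and the R call are shown only in a screenshot. As a minimal sketch of a test of association, the code below cross-tabulates two of the variables, number of cylinders and number of forward gears, from the built-in mtcars data set; whether this matches the layout of the original data is an assumption.

```r
# Contingency table: cylinders (rows) by forward gears (columns).
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
tab

# Chi-squared test of association. Several expected counts are small here,
# so R warns that the chi-squared approximation may be inaccurate.
result <- suppressWarnings(chisq.test(tab))
result$statistic
result$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
result$p.value
result$expected   # expected counts under the null hypothesis
```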

Since the p-value is much greater than 0.05, there is not enough evidence to reject the null
hypothesis. In this problem, the null hypothesis is that there is no association between the number
of cylinders, forward gears and carburetors. The researcher can therefore conclude that the data show
no evidence of such an association.


Chi Square Test – Monte Carlo Simulation


The default is that simulate.p.value = FALSE and that B = 2000. The latter is the number of
replicates to use in the Monte Carlo test, which is set to 2500 for this example.

With the simulated p-value, the researcher again concludes that there is no evidence of an
association between the number of cylinders, forward gears and carburetors.
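
A minimal sketch of a Monte Carlo chi-squared test with 2500 replicates. The contingency table is an assumed stand-in built from the built-in mtcars data set, so the result will not match the screenshot:

```r
# Assumed example table: cylinders by forward gears from mtcars.
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

set.seed(1)  # make the simulated p-value reproducible
result <- chisq.test(tab, simulate.p.value = TRUE, B = 2500)
result$p.value
result$parameter  # NA: no degrees of freedom are used with simulation
```

Because the p-value comes from random replicates, it changes slightly from run to run unless a seed is set.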


Chi Square Test – Yates Correction

Yates' correction applies to 2 × 2 contingency tables. It is commonly recommended when any of the
cell frequencies is less than 5, in order to reduce the error in the chi-squared approximation.

In R, the correction is applied by default whenever the contingency table has two rows and two
columns. You can turn off the correction using the correct = FALSE instruction in the command.
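
A minimal sketch comparing the test with and without the correction; the 2 × 2 table of counts is made up for illustration:

```r
# Hypothetical 2 x 2 table of counts.
tab <- matrix(c(12, 5, 7, 9), nrow = 2,
              dimnames = list(group = c("A", "B"),
                              outcome = c("yes", "no")))

with_yates    <- chisq.test(tab)                   # correct = TRUE is the default
without_yates <- chisq.test(tab, correct = FALSE)

# The correction shrinks the statistic, giving a larger p-value.
with_yates$statistic
without_yates$statistic
```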


Single category: goodness of Fit tests

You can use the chisq.test() command to carry out a goodness of fit test. In this case, you must have
two vectors of numerical values, one representing the observed values and the other representing the
expected ratio of values. The goodness of fit test compares the data against the ratios
(probabilities) you specified. If you do not specify any, the data are tested against equal probability.

In the following example, you have a simple data frame containing two columns; the first column
contains values related to an old survey. The second column contains values related to a new survey.
You want to see if the proportions of the new survey match the old one, so you perform a goodness of
fit test.

To run the test you use the chisq.test() command, but this time you must specify the test data as a
single vector and also point to the vector that contains the probabilities.

In the above result, the p-value is greater than 0.05, so there is not enough evidence to reject the
null hypothesis. Hence, we conclude that the new survey is consistent with the old survey on the
performance index of the employees.
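
The survey data frame appears only in a screenshot. A minimal sketch of the goodness of fit test with made-up counts (hypothetical values, four performance categories assumed):

```r
# Hypothetical counts per performance category in the two surveys.
old_survey <- c(40, 25, 20, 15)
new_survey <- c(35, 30, 22, 13)

# Observed data: new survey. Expected proportions: old survey.
# rescale.p = TRUE converts the old counts into probabilities summing to 1.
result <- chisq.test(new_survey, p = old_survey, rescale.p = TRUE)
result$statistic
result$parameter  # degrees of freedom: number of categories - 1
result$p.value
```

Without rescale.p = TRUE, the p instruction must already contain probabilities that sum to 1.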


ANOVA
ANOVA is a statistical tool used to test the equality of the means of more than two groups. If a
researcher wants to test the equality of two means, he/she can use the t test or z test, depending on
the sample size. ANOVA uses the F test to draw a conclusion from the experimental results. ANOVA
mainly deals with experimental data rather than observational data. Experimental data are observed
under controlled conditions, whereas observational data are observed from natural happenings without
any control over the situation.
In ANOVA, the independent variables are not measured or quantified individually; they enter the
analysis as factors. For example, suppose a researcher wants to identify the effect of three dosages
of a particular medicine on a particular fever.


The researcher wants to identify at which dosage the medicine is effective in reducing the fever
level. In this case, the observed variable is the severity of the fever and the factor is the dosage
level of the medicine. Here, the factor enters the analysis as a set of levels rather than as a
measured quantity; what is measured is the patient's reaction to it in terms of fever level.
Assumptions of ANOVA:
• The effects of the factors in the general linear model are linear, i.e., additive.
• All the populations in the analysis have a common variance.
• Within each group, samples are drawn randomly from a normally distributed population with mean µ
and variance σ².
• All samples are drawn independently of each other.
• The errors in the model are normally distributed with mean 0 and common variance σ².

One Way - ANOVA

In both the one-way and the two-way ANOVA presented here, the main interest of the study is focused
on one factor. The difference is that in one-way ANOVA the variation is classified in one direction
only, whereas in two-way ANOVA it is classified in two directions.
The general linear model for the one-way ANOVA is as follows:

$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$

where $y_{ij}$ is the jth observation of the ith treatment,
$\mu$ is the general mean or overall mean,
$\tau_i$ is the effect of the ith treatment or ith group,
$\varepsilon_{ij}$ is the error term of the jth observation of the ith treatment.



Post ANOVA test:

Once the null hypothesis has been rejected, the next question is which of the group means differ from
one another.
As an introduction, only the least significant difference (LSD) test is explained here. The LSD test
makes pairwise comparisons of the group means in the data set as follows:

$\mathrm{LSD}_{\text{critical}} = \mathrm{SE}_{\text{diff}} \times t_{\text{error df},\,\alpha}$

$\mathrm{SE}_{\text{diff}} = \sqrt{\dfrac{2 S_e^2}{n}}$

where $\mathrm{SE}_{\text{diff}}$ is the standard error of the difference between two group means,
$S_e^2$ is the error mean sum of squares,
$n$ is the number of observations in each group,
$\alpha$ is the fixed level of significance.
Two group means are declared significantly different when the absolute difference between them
exceeds the LSD critical value.

One Way - ANOVA

The following data have three groups in the columns, each replicated three times, so the total number
of observations is 9. The treatment effect (or group effect) is the deviation of the group mean from
the overall mean.

Observation or Replication    Group 1    Group 2    Group 3
1                             20         22         25 ($y_{31}$)
2                             18         19         23 ($y_{32}$)
3                             16         19         18 ($y_{33}$)
Average                       18         20         22

Overall mean: $\mu = 20$


The error term (or unexplained effect) is the deviation of an individual observation from its
corresponding group mean. For example, for the second observation in the third group:

$y_{32} = \mu + \tau_3 + \varepsilon_{32}$

$\mu = \dfrac{20 + 18 + \cdots + 23 + 18}{9} = 20$

$\tau_3 = \text{third group mean} - \text{overall mean} = 22 - 20 = 2$

$\varepsilon_{32} = y_{32} - \text{third group mean} = 23 - 22 = 1$

$23 = 20 + 2 + 1$

This is how the general linear model works in estimating the effect of factors in experiments.
In the sequel, we see how ANOVA is performed and how the effects of the factors are estimated.
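
The same decomposition can be checked in R for every observation at once. This is a minimal sketch using the nine values of the small example; the object names are chosen here for illustration.

```r
# Rows are replications, columns are groups (data from the small example).
y <- matrix(c(20, 18, 16,
              22, 19, 19,
              25, 23, 18),
            nrow = 3,
            dimnames = list(paste0("obs", 1:3), paste0("group", 1:3)))

mu  <- mean(y)                   # overall mean
tau <- colMeans(y) - mu          # treatment (group) effects
eps <- sweep(y, 2, colMeans(y))  # residuals: observation minus its group mean

mu
tau
eps["obs2", "group3"]

# Check: overall mean + group effect + residual rebuilds every observation.
all.equal(unname(mu + tau[col(y)] + eps), unname(y))
```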

The same procedure is now illustrated using empirical data:
iNurture University wants to find out whether students learn most effectively with a constant
background sound, as opposed to an unpredictable sound or no sound at all. The university randomly
divides 27 students into 3 groups of 9. All students study a passage of text for 30 minutes. Those
in group 1 study with background sound at a constant volume. Those in group 2 study with noise that
changes volume periodically. Those in group 3 study with no sound in the background. After studying,
all students take a 25-point multiple-choice test on the study material. Their scores are given below:


H0 (Null Hypothesis): The mean marks obtained by the 3 groups of students under different sound
backgrounds are statistically equal; that is, the background sound has no effect on the students'
test scores.

Using Notation: $\mu_{CS} = \mu_{RS} = \mu_{NS}$

H1 (Alternative Hypothesis): The mean marks obtained by the 3 groups of students under different
sound backgrounds are not all equal; that is, the background sound does have an effect on the
students' test scores.

Using Notation: $\mu_i \neq \mu_j$ for at least one pair of groups $i, j \in \{CS, RS, NS\}$.

Observation of 9 Students    Group 1 (Constant Sound)    Group 2 (Random Sound)    Group 3 (No Sound)
1                            22                          10                        16
2                            23                          12                        18
3                            21                          19                        19
4                            19                          9                         21
5                            18                          18                        22
6                            19                          22                        24
7                            20                          21                        23
8                            22                          20                        20
9                            24                          19                        17
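
The R output for this example appears only in screenshots. As a minimal sketch in base R (the slides may have used a different function or package), the code below enters the scores from the table, fits the one-way model with aov(), and computes the LSD critical value from the formula given earlier. Using the two-sided t quantile at α = 0.05 is an assumption, and the object names are chosen here for illustration.

```r
# Test scores from the table, one vector per group.
constant <- c(22, 23, 21, 19, 18, 19, 20, 22, 24)
random   <- c(10, 12, 19,  9, 18, 22, 21, 20, 19)
none     <- c(16, 18, 19, 21, 22, 24, 23, 20, 17)

scores <- data.frame(
  score = c(constant, random, none),
  sound = factor(rep(c("constant", "random", "none"), each = 9))
)

# One-way ANOVA: does the mean score differ between sound conditions?
fit <- aov(score ~ sound, data = scores)
summary(fit)

# Ingredients for the LSD test.
mse      <- deviance(fit) / df.residual(fit)  # error mean sum of squares
n        <- 9                                 # observations per group
se_diff  <- sqrt(2 * mse / n)
lsd_crit <- se_diff * qt(1 - 0.05 / 2, df.residual(fit))

# Pairwise differences in group means; compare absolute values with lsd_crit.
group_means <- tapply(scores$score, scores$sound, mean)
lsd_crit
outer(group_means, group_means, "-")
```

Any pair of groups whose absolute difference in the last table exceeds lsd_crit would be declared significantly different under this sketch.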


Conclusion from the R Output

1. The R output is interpreted in the same way as in the previous illustrated example.
2. In the pairwise comparison test, the error degrees of freedom and the error mean sum of squares
must be supplied to the function.
3. The output gives a confidence interval for each group mean. For example, for the first group, the
confidence interval for the population arithmetic mean runs from a lower limit of 18.50203 to an
upper limit of 23.27574.
4. Groups that share the same letter are statistically equal to one another. For instance, if
group 1 and group 3 have the same letter, their means are not significantly different. In the same
way, if group 1 and group 2 have different letters, their means are significantly different. If
group 3 and group 2 have the same letter, the difference between them is statistically insignificant.

Two Way ANOVA

The general linear model for the two-way ANOVA is as follows:

$y_{ij} = \mu + \tau_i + \alpha_j + \varepsilon_{ij}$

where $y_{ij}$ is the observation of the ith treatment in the jth replication,
$\mu$ is the general mean or overall mean,
$\tau_i$ is the effect of the ith treatment or ith group,
$\alpha_j$ is the effect of the jth replication,
$\varepsilon_{ij}$ is the error term of the ith treatment in the jth replication.


This linear model is illustrated with a small example as follows:

The following data have three groups in the columns, each replicated three times, so the total number
of observations is 9.

Observation or Replication    Group 1    Group 2    Group 3          Replication Average
1                             20         22         25 ($y_{31}$)    22.333
2                             18         19         23 ($y_{32}$)    20.000
3                             16         19         18 ($y_{33}$)    17.667
Group Average                 18         20         22               $\mu = 20$


The treatment effect or group effect is the deviation of the group mean from the overall mean.
The replication effect is the deviation of the replication mean from the overall mean.
The error term or unexplained effect is what remains of an individual observation after the overall
mean, the group effect and the replication effect have been removed.

For the observation y_{32}:

y_{32} = μ + τ_3 + α_2 + ε_{32}
μ = (20 + 18 + ⋯ + 23 + 18) / 9 = 20
τ_3 = third group mean − overall mean = 22 − 20 = 2
α_2 = second replication mean − overall mean = 20 − 20 = 0
ε_{32} = y_{32} − third group mean − second replication mean + overall mean
ε_{32} = 23 − 22 − 20 + 20 = 1
23 = 20 + 2 + 0 + 1

This is how the general linear model works in estimating the effects of factors in an experiment.
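
The same decomposition can be reproduced in R for all nine observations. The following is a minimal sketch using only base R; the object names (y, mu, tau, alpha, eps) are chosen here for illustration and do not come from the module.

```r
# Small 3 x 3 example: rows are replications, columns are groups
y <- matrix(c(20, 22, 25,
              18, 19, 23,
              16, 19, 18),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("Rep", 1:3), paste0("Group", 1:3)))

mu    <- mean(y)             # overall mean
tau   <- colMeans(y) - mu    # group (treatment) effects
alpha <- rowMeans(y) - mu    # replication effects

# Error terms: observation minus overall mean, replication effect and group effect
eps <- y - mu - outer(alpha, tau, "+")

tau                   # effect of each group
alpha                 # effect of each replication
eps["Rep2", "Group3"] # error term for y_{32}
```

Adding mu, the matching entries of tau and alpha, and eps back together returns the original table, which is exactly what the linear model states.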

Hypothesis formation for groups:

H0 (Null Hypothesis): All the group means (k groups) in the experiment are statistically equal.
Using notation: μ1 = μ2 = ⋯ = μk
H1 (Alternative Hypothesis): At least one of the group means in the experiment differs statistically
from the rest of the group means.
Using notation: μi ≠ μj for at least one pair of groups (i, j).

Hypothesis formation for replications:

H0 (Null Hypothesis): All the replication means (n replications) in the experiment are statistically
equal.
Using notation: μ1 = μ2 = ⋯ = μn
H1 (Alternative Hypothesis): At least one of the replication means in the experiment differs
statistically from the rest of the replication means.
Using notation: μi ≠ μj for at least one pair of replications (i, j).



iNurture University wants to find out whether students learn most effectively with a constant
background sound, as opposed to an unpredictable sound or no sound at all. The university randomly
divides 27 students into 3 groups of 9. All students study a passage of text for 30 minutes. Those in
group 1 study with background sound at a constant volume. Those in group 2 study with noise that
changes volume periodically. Those in group 3 study with no sound in the background. After studying,
all students take a 25-point multiple-choice test on the given study material. Their scores are shown
in the table.

Observation of 9 Students   Group 1 (Constant Sound)   Group 2 (Random Sound)   Group 3 (No Sound)
1                           22                         10                       16
2                           23                         12                       18
3                           21                         19                       19
4                           19                         9                        21
5                           18                         18                       22
6                           19                         22                       24
7                           20                         21                       23
8                           22                         20                       20
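
The analysis of this table can be sketched in R as follows. This is a hedged illustration, not the module's own script: it uses only the eight rows shown above, the column names (student, group, score) are invented for the sketch, and it assumes that the row number plays the role of the replication factor in the two-way model.

```r
# Scores from the eight rows shown in the table (long format, one row per score)
scores <- data.frame(
  student = factor(rep(1:8, times = 3)),                             # replication
  group   = factor(rep(c("Constant", "Random", "NoSound"), each = 8)),
  score   = c(22, 23, 21, 19, 18, 19, 20, 22,   # Group 1 (constant sound)
              10, 12, 19,  9, 18, 22, 21, 20,   # Group 2 (random sound)
              16, 18, 19, 21, 22, 24, 23, 20)   # Group 3 (no sound)
)

# Two-way ANOVA without interaction: group effect plus replication effect
fit <- aov(score ~ group + student, data = scores)
summary(fit)
```

The summary table has one line for group, one for student (the replication) and one for the residuals, each with its degrees of freedom, sum of squares and mean square.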


Interpretation of the two-way ANOVA results in R:

In two-way ANOVA, the replication variation is separated out of the error variation, so the error
mean sum of squares is usually smaller than in one-way ANOVA. Hence, the calculated F value for the
groups is comparatively higher than in one-way ANOVA.
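
The effect on the error term can be checked by fitting both models to the same data. The sketch below is an illustration under stated assumptions: it uses the eight rows of sound-experiment scores shown earlier, with column names invented for the sketch, and compares the residual (error) line of the one-way and two-way fits.

```r
# Eight rows of scores from the sound experiment (long format)
scores <- data.frame(
  student = factor(rep(1:8, times = 3)),
  group   = factor(rep(c("Constant", "Random", "NoSound"), each = 8)),
  score   = c(22, 23, 21, 19, 18, 19, 20, 22,
              10, 12, 19,  9, 18, 22, 21, 20,
              16, 18, 19, 21, 22, 24, 23, 20)
)

one_way <- aov(score ~ group, data = scores)            # groups only
two_way <- aov(score ~ group + student, data = scores)  # groups + replications

# Residual sum of squares and residual degrees of freedom for each model
ss_one <- deviance(one_way); df_one <- df.residual(one_way)
ss_two <- deviance(two_way); df_two <- df.residual(two_way)

data.frame(model    = c("one-way", "two-way"),
           error_SS = c(ss_one, ss_two),
           error_df = c(df_one, df_two),
           error_MS = c(ss_one / df_one, ss_two / df_two))
```

The error sum of squares can never increase when the replication term is added. The error mean square also depends on the degrees of freedom given up, so it is worth inspecting rather than assuming.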

Conclusion from the R output:

The R output is interpreted in the same way as in the illustrated example above.

In the pairwise comparison test, the function call must specify the error degrees of freedom and the
error mean sum of squares.



$means:

The output gives a confidence interval for each group mean. For example, for the first group, the
confidence interval for the population arithmetic mean has a lower limit of 18.48295 and an upper
limit of 23.29483.

$groups:

Groups that share the same letter are statistically equal to one another. For instance, if
group 1 and group 3 have the same letter, they are statistically equal. In the same way, if
group 1 and group 2 have different letters, they are statistically unequal. If group 3 and
group 2 have the same letter, the difference between them is statistically insignificant.
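
The error degrees of freedom and error mean sum of squares needed by the pairwise comparison can be taken directly from the fitted model. The sketch below makes two assumptions that are not confirmed by the slides: that the pairwise test is LSD.test() from the agricolae package (whose result contains $means and $groups components), and that the data are the eight rows of scores shown earlier. The agricolae call is skipped if the package is not installed.

```r
# Eight rows of scores from the sound experiment (long format)
scores <- data.frame(
  student = factor(rep(1:8, times = 3)),
  group   = factor(rep(c("Constant", "Random", "NoSound"), each = 8)),
  score   = c(22, 23, 21, 19, 18, 19, 20, 22,
              10, 12, 19,  9, 18, 22, 21, 20,
              16, 18, 19, 21, 22, 24, 23, 20)
)
fit <- aov(score ~ group + student, data = scores)

# Error degrees of freedom and error mean sum of squares from the fit
df_error <- df.residual(fit)
ms_error <- deviance(fit) / df_error

# Assumption: agricolae::LSD.test is the pairwise comparison function
if (requireNamespace("agricolae", quietly = TRUE)) {
  lsd <- agricolae::LSD.test(scores$score, scores$group,
                             DFerror = df_error, MSerror = ms_error)
  print(lsd$means)   # group means with their confidence limits
  print(lsd$groups)  # letters: a shared letter means no significant difference
}
```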


Self Assessment Question

1. A data representation technique suited to non-technical persons is called _________.

a. Graphical Techniques
b. Statistical Inference
c. Testing of hypothesis

Answer: Graphical Techniques


2. ___________ is the function used for Box-Whisker plot.

a. bwplot ( )
b. wplot( )
c. boxplot( )
d. None of the above

Answer: boxplot( )



3. If you have two items to plot, which one of the given options is the correct boxplot() command?

a. boxplot(item1 | item2)
b. boxplot(item1 & item2)
c. boxplot(item1, item2)

Answer: boxplot(item1, item2)


4. boxplot(item1 ~ item2, data = dataset, range = 0) will produce a _____________.

a. Vertical boxplot
b. Horizontal boxplot
c. Slanting boxplot
d. Error message

Answer: Vertical boxplot


5. boxplot(item1 ~ item2, data = dataset, range = 0, horizontal = T) will produce a __________ boxplot.

a. Vertical
b. Horizontal
c. Slanting

Answer: Horizontal
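
The two boxplot() calls from questions 4 and 5 can be tried on the built-in mtcars data. This is a small sketch; the choice of mpg and cyl as the two items is only for illustration.

```r
# Vertical boxplots of mpg for each number of cylinders.
# range = 0 extends the whiskers to the data extremes, so no points are
# drawn separately as outliers.
v <- boxplot(mpg ~ cyl, data = mtcars, range = 0,
             main = "Vertical boxplot")

# The same plot drawn horizontally
h <- boxplot(mpg ~ cyl, data = mtcars, range = 0, horizontal = TRUE,
             main = "Horizontal boxplot")
```

Both calls return the same summary statistics invisibly; only the orientation of the drawing changes.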


6. The plot( ) function will produce a ____________ plot.

a. Line
b. Trend line
c. Fitted line
d. Scatter

Answer: Scatter


7. The abline( ) function will produce a _______ plot.

a. Line
b. Trend line
c. Fitted line
d. Scatter

Answer: Fitted line


8. In the plot( ) function, the argument lty is used for _______________.

a. Fitted line thickness
b. Fitted line structure
c. Observed point texture
d. None of the Above

Answer: Fitted line structure


9. In the plot( ) function, the argument lwd is used for _______________.

a. Fitted line thickness
b. Fitted line structure
c. Observed point texture
d. None of the Above

Answer: Fitted line thickness


10. In the plot( ) function, the argument pch is used for _______________.

a. Fitted line thickness
b. Observed point size
c. Observed point design
d. None of the Above

Answer: Observed point design
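
The arguments from questions 6 to 10 can be seen together in one plot. The following is a minimal sketch on the built-in mtcars data; the variables and the particular values of pch, lty and lwd are chosen here only as examples.

```r
# Scatter plot of mpg against weight; pch = 18 draws filled diamonds
plot(mtcars$wt, mtcars$mpg, pch = 18,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Fitted least-squares line added with abline():
# lty = 2 gives a dashed line, lwd = 3 makes it thicker
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, lty = 2, lwd = 3)
```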


11. Which one of the given options is the function used for pairwise plot in R?

a. pplot( )
b. pairplot ( )
c. pairs ( )
d. None of the above

Answer: pairs ( )


12. Which of the given functions is used for a line chart in R?
i. pplot( )
ii. pairplot ( )
iii. pairs ( )

a. Only i
b. Only ii
c. All i, ii and iii
d. None of the above

Answer: None of the above



13. Which one of the given functions is used for pie chart in R?

a. piechart( )
b. pieplot ( )
c. piecharts ( )
d. pie ( )

Answer: pie ( )


14. Which one of the given functions is used for a Cleveland dot chart in R?

a. dotchart( )
b. cchart ( )
c. clevchart( )
d. clchart ( )

Answer: dotchart( )


15. Which one of the given functions is used for stacked bar chart in R?

a. dotchart( )
b. cchart ( )
c. clevchart( )
d. barplot ( )

Answer: barplot ( )


16. Which one of the given statistical functions is used for the Wilcoxon signed rank test?

a. wilcoxon.test ( )
b. wilcox.test ( )
c. wilcoxons.test ( )
d. wilcoxonsign.test ( )

Answer: wilcox.test ( )
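
A short sketch of wilcox.test( ) used as a signed rank test is given below. It uses the built-in sleep data (an illustrative choice, not one of the datasets named in this module), in which the same ten subjects were measured under two drugs.

```r
# Extra hours of sleep for the same 10 subjects under two drugs
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

# Wilcoxon signed rank test for paired samples.
# exact = FALSE requests the normal approximation, which avoids the
# warnings that tied or zero differences would otherwise produce.
res <- wilcox.test(drug1, drug2, paired = TRUE, exact = FALSE)
res$statistic   # V, the sum of ranks of the positive differences
res$p.value
```

Calling wilcox.test( ) with two samples and without paired = TRUE gives the rank sum test for two independent samples instead.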


17. When testing for association, which one of the given options is the null hypothesis?

a. Attributes are dependent
b. Attributes are independent
c. Attributes are not tested
d. None of the above

Answer: Attributes are independent



18. In a non-parametric test, the ___________ is the statistical measure that is tested.

a. Mean
b. Mode
c. Median
d. GM

Answer: Median


19. Which one of the given options is the statistical analysis used to compare the arithmetic means of two groups?

a. t test
b. ANOVA
c. Z Test
d. None of the above

Answer: t test


20. _____________ is the statistical analysis used to compare the arithmetic means of more than two groups.

a. t test
b. ANOVA
c. Z Test
d. None of the above

Answer: ANOVA


21. In two-way ANOVA, _____________ is the second source of variability accounted for in the variance.

a. Treatment
b. Within variation
c. Between variation
d. Replication

Answer: Replication


22. __________________ is the function used for one way ANOVA in R programming language.

a. aov ( )
b. anova( )
c. ava( )
d. None of the above

Answer: aov ( )


23. _____________ is the function used for two way ANOVA in R programming language.

a. taov ( )
b. tanova( )
c. tava( )
d. aov( )

Answer: aov( )


24. ____________ is the test used in ANOVA to test the significance of the difference between the
arithmetic means of more than two groups.

a. F test
b. t test
c. Chi square test
d. Wilcoxon Test

Answer: F test


25. In ANOVA, observations are __________________.

a. Dependent on one another
b. Independent of one another
c. Categorical in nature
d. Numerical but dependent

Answer: Independent of one another


Assignment

Answer the following set of questions, which is meant to test Module V.
1. Write the procedure to create a bar chart for the given data, using the colours of the rainbow.
2. Write the procedure to create a line chart for the given numerical data, using a
diamond shape for the points.
3. Write the procedure to create a stacked bar chart for the given data.
4. Write the procedure to create a grouped bar chart for the given data, using three
different colours.
5. Write down the guidelines for non-parametric tests and their assumptions.
6. Write down the guidelines for one-way ANOVA and its assumptions.
7. Write down the guidelines for two-way ANOVA and its assumptions.
8. Write the procedure for the post-ANOVA test and explain its importance.
9. List the non-parametric tests and their corresponding parametric tests.
10. Write the procedure for the Cleveland dot chart and state its advantage over
bar charts.
Note: The data are available in R as the built-in datasets "iris" and "mtcars".

Summary
• R is sophisticated software for creating customizable graphs.
• R has a rich function for creating pie charts, with different chart styles set through its arguments.
• R has a built-in function for creating box plots, which help to find outliers in a data set.
• R provides methods for assessing model assumptions and for applying multiple comparison procedures
following significant omnibus tests.
• R has a built-in function to test the median of one independent sample.
• R has a built-in function to test the medians of two independent samples.
• R has a built-in function to test the medians of two dependent samples.
• R has a built-in function to test the arithmetic means of more than two groups.
• R has a built-in base function to test the arithmetic means of more than two groups when the data
set has more than one source of variation.

Document Links

Topic: R Graphs
URL: https://www.statmethods.net/graphs/creating.html
Notes: This link gives complete information about R graphs: Histograms and Density Plots, Dot Plots,
Bar Plots, Line Charts, Pie Charts, Boxplots, Scatterplots.

Topic: Non Parametric Test
URL: https://www.statmethods.net/stats/nonparametric.html
URL: http://www.iasri.res.in/sscnars/R_manual/04%20Nonparametric%20tests%20in%20R.pdf
Notes: This page explains the different non-parametric tests in the R programming language.

Topic: ANOVA
URL: https://www.r-bloggers.com/one-way-analysis-of-variance-anova/
URL: http://rtutorialseries.blogspot.com/2011/01/r-tutorial-series-two-way-anova-with.html
Notes: This page gives complete information about one-way and two-way ANOVA in R, using examples.


Video Links

Topic: R Graphs
URL: https://www.youtube.com/watch?v=Z3V4Pbxeahg
URL: https://www.youtube.com/watch?v=S0uoef36iFU
URL: https://www.youtube.com/watch?v=4HXvMbw79N4
URL: https://www.youtube.com/watch?v=0MrYVzPxBIc
Notes: This video gives complete information about R graphs: Histograms and Density Plots, Dot Plots,
Bar Plots, Line Charts, Pie Charts, Boxplots, Scatterplots.

Topic: Non Parametric Test
URL: https://www.youtube.com/watch?v=wHwZiJVLE8A
URL: https://www.youtube.com/watch?v=z7ICtLfbKyA
Notes: This video explains the different non-parametric tests in the R programming language.

Topic: ANOVA
URL: https://www.youtube.com/watch?v=qrP7evoNCy4
URL: https://www.youtube.com/watch?v=zDGQxC0bWn4
Notes: This video gives complete information about one-way and two-way ANOVA in R, using examples.


E-Book Links

URL: http://ceal.fing.uncu.edu.ar/industrial/TyHM/DOE/R_in_Action.pdf

Topics                 Page Number
R Graphs               117 to 136
Non Parametric Test    160 to 163
ANOVA                  218 to 223, 226 to 231
