Module 5 (003) - Updated
Module Number: 05
AIM:
To familiarise students with the graphical representation of data, non-parametric tests, and analysis of variance.
R for Graphs, Non Parametric Tests and ANOVA
Objectives:
The objectives of this module are to:
• Describe the graphical representation of data.
• Discuss non-parametric hypothesis tests for categorical variables.
• Explain one-way and two-way analysis of variance.
Outcome:
At the end of this module, you are expected to:
• Visualize data using appropriate chart types.
• Explain the non-parametric hypothesis testing procedures suited to given data.
• Illustrate one-way and two-way analysis of variance.
Content
• Graphs
• Non Parametric Methods
• ANOVA
Graphs
Graphs are a powerful way to present data and results concisely, and they are often the most effective way to communicate findings to decision makers. Whatever kind of data you have, there is a way to illustrate it graphically. A graph is more readily understood than words and numbers, and producing good graphs is a vital skill. Some graphs are also useful for exploring data, because they reveal patterns that can direct you towards the correct statistical analysis.
R has powerful and flexible graphical capabilities. In general terms, R has two kinds of graphical
commands: some commands generate a basic plot of some sort, and other commands are used to tweak
the output and to produce a more customized finish.
You have seen these data before. You can use the boxplot()
command to visualize one of the variables here:
> boxplot(carsd$mpg)
This produces a simple graph, shown in the screenshot, with the typical layout of a box-whisker plot.
The stripe shows the median, and the box spans the lower and upper hinges (roughly the first and third quartiles). By default the whiskers extend to the most extreme values lying within 1.5 times the interquartile range of the box; when there are no outliers, these are the minimum and maximum values.
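A minimal sketch of where these numbers come from, assuming carsd is simply a copy of R's built-in mtcars data frame (the module's own carsd is not reproduced here). boxplot.stats() returns the five values the plot draws, plus any points flagged as outliers:

```r
# Assumption: carsd is a copy of the built-in mtcars data frame
carsd <- mtcars

# Box-whisker plot of miles per gallon
boxplot(carsd$mpg, ylab = "Miles per Gallon")

# The numbers behind the plot
s <- boxplot.stats(carsd$mpg)
s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # points beyond 1.5 x IQR from the box (none for mpg)
```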
If you have several items to plot, you can simply give the
vector names in the boxplot() command:
> boxplot(carsd$mpg, carsd$wt)
The resulting graph appears in the screenshot. In this case, you specify vectors that correspond to two columns in the data frame, but they could be completely separate.
Customizing Boxplots
A plot without labels is hard to interpret, so always label it. You can use the xlab and ylab instructions to label the axes, and the names instruction to set the labels for the two samples (by default displayed as 1 and 2), like so:
> boxplot(carsd$mpg, carsd$wt, names = c("mpg", "weight"))
Now each sample has its own name, and the axes are labelled if you also supply xlab and ylab. Notice that the upper whisker of the weight sample does not extend to the top, and that there appears to be a separate point displayed above it. You can control how far the whiskers extend with the range instruction; by default they reach the most extreme value within 1.5 times the interquartile range of the box.
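The following sketch, again assuming carsd is a copy of mtcars, adds axis labels and checks which weight values are drawn as separate points. The labels "Variable" and "Value" are illustrative choices, not taken from the original slide. In boxplot() the whisker length is set by range; boxplot.stats() calls the same setting coef:

```r
# Assumption: carsd is a copy of the built-in mtcars data frame
carsd <- mtcars

# Two samples with sample names and (illustrative) axis labels
boxplot(carsd$mpg, carsd$wt,
        names = c("mpg", "weight"),
        xlab = "Variable", ylab = "Value")

# Weight values drawn as separate points (beyond 1.5 x IQR)
wt_out <- boxplot.stats(carsd$wt)$out
wt_out

# coef = 0 (range = 0 in boxplot()) stretches the whiskers
# to the minimum and maximum, so no point is flagged
wt_out0 <- boxplot.stats(carsd$wt, coef = 0)$out
wt_out0
```

The two heaviest cars lie very close together, which is why they can look like a single separate point on the plot.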
Horizontal Boxplots
With a simple additional instruction, we can display the bars horizontally rather than vertically (which is
the default):
> boxplot(mpg ~ vs, data = carsd1, range = 0, horizontal = TRUE)
> title(xlab = "Miles per Gallon", ylab = "vs")
With horizontal = TRUE, the boxes are drawn horizontally. Notice how in the title() command you have to switch the x and y labels: the xlab instruction always refers to the horizontal axis and the ylab instruction to the vertical one, so the mpg values are now labelled with xlab. The range = 0 instruction extends the whiskers to the minimum and maximum values.
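A runnable version, assuming carsd1 has the same mpg and vs columns as the built-in mtcars data (vs is the engine shape, 0 = V-shaped and 1 = straight). Recent R versions add default axis labels for the formula interface, so ann = FALSE is used here to let title() control them. Calling boxplot() with plot = FALSE returns the statistics without drawing, which confirms that range = 0 puts the whisker ends at each group's minimum and maximum:

```r
# Assumption: carsd1 has mpg and vs columns, as mtcars does
carsd1 <- mtcars

# range = 0: whiskers run to the minimum and maximum
# ann = FALSE: suppress default labels so title() sets them
boxplot(mpg ~ vs, data = carsd1, range = 0, horizontal = TRUE, ann = FALSE)
# The mpg values now lie along the horizontal axis
title(xlab = "Miles per Gallon", ylab = "vs")

# plot = FALSE returns the statistics without drawing
b <- boxplot(mpg ~ vs, data = carsd1, range = 0, plot = FALSE)
b$stats  # one column per vs group; rows 1 and 5 are the whisker ends
```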
Scatter Plots
The basic plot() command is an example of a generic function that can be pressed into service for a variety of uses. Many specialized statistical routines include a plotting routine to produce a specialized graph. Here, however, you will use the plot() command to produce xy scatter plots. The scatter plot is used especially to show the relationship between two variables.
The following data frame contains two columns of numeric values, and because they contain the same number of observations, they could form the basis for a scatter plot.
The basic form of the plot() command requires you to specify the x and y data, each being a numeric
vector. You use it like so:
plot(x, y, ...)
If your data are contained in a data frame, as in the following example, you must use the $ syntax to get at the variables, or use the with() or attach() commands. For the example data here, the following commands all produce a similar result:
> plot(carsd$wt, carsd$mpg)
> with(carsd, plot(wt, mpg))
> attach(carsd)
> plot(wt, mpg)
> detach(carsd)
Notice that the axis labels match what you typed into the command. In this case, you used the $ syntax to extract the variables, and this is reflected in the labels.
You can also use the tilde character (~) to write a formula. On the left you place the response (dependent) variable, and on the right the predictor (independent) variable. At the end, the data instruction tells the command where to find these variables, for example plot(mpg ~ wt, data = carsd). This is useful because you do not need the $ syntax or the attach() command to let R read the variables inside the data frame.
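A short sketch of the formula notation, assuming carsd is a copy of mtcars (wt is weight in 1000 lbs). Because the same formula works in lm(), fitting and drawing a straight line takes one more command:

```r
# Assumption: carsd is a copy of the built-in mtcars data frame
carsd <- mtcars

# Formula notation: response ~ predictor; data = says where the columns live
plot(mpg ~ wt, data = carsd,
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon")

# The same formula fits the straight line that abline() draws
fit <- lm(mpg ~ wt, data = carsd)
abline(fit, col = "blue", lwd = 2, lty = 2)
coef(fit)  # intercept and slope of the fitted line
```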
Another advantage of the formula notation is that the same formula can be passed to lm() and the resulting line of best fit added with abline(), so the command is very similar to the original plot() command. The default line is a thin solid black line, but you can alter its appearance in various ways: the col instruction changes its colour, the lwd instruction its width, and the lty instruction its type.
Line Types that Can be Specified Using the lty Instruction in a Graphical Command
> plot(mtcars$mpg ~ mtcars$wt, col = "red", pch = 2, cex = 2)
> abline(lm(mtcars$mpg ~ mtcars$wt), lty = "dotted", lwd = 3, col = "gray50")
Here pch = 2 plots open triangles and cex = 2 doubles their size.
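The table of line types is not reproduced in this copy. The sketch below draws the six standard line types so you can compare them; lty accepts either the number or the name (for example lty = 3 or lty = "dotted"):

```r
# Names of the six standard line types, in lty order 1 to 6
lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")

# Empty plotting region, then one labelled horizontal line per type
plot(NULL, xlim = c(0, 1), ylim = c(0.5, 6.5), axes = FALSE,
     xlab = "", ylab = "", main = "Line types (lty)")
for (i in seq_along(lty_names)) {
  abline(h = i, lty = i, lwd = 2)
  text(0.5, i + 0.3, paste0("lty = ", i, ' or "', lty_names[i], '"'))
}
```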
Pairs Plots
A scatterplot matrix in which every pairwise combination of variables is plotted is referred to as a pairs plot. The pairs() function creates a pairs plot, and its arguments can be used to produce customized pairs plots.
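A minimal pairs plot of three mtcars variables, coloured by number of cylinders; the choice of variables and colours is only illustrative:

```r
# One colour per cylinder count (4 = red, 6 = darkgreen, 8 = blue)
cyl_col <- c("red", "darkgreen", "blue")[as.integer(factor(mtcars$cyl))]

# Scatterplot matrix: every pair of mpg, wt and hp is plotted
res <- pairs(~ mpg + wt + hp, data = mtcars,
             col = cyl_col, pch = 19,
             main = "Pairs plot of mpg, wt and hp")
```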
Line Charts
Data are often time-dependent, that is, collected over a period of time. If you want to display such data as a scatter plot in which the y-axis shows the magnitude of the recorded values and the x-axis shows time, a line chart is the best option: joining the points with lines highlights the changes over time.
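As a sketch, the built-in Nile time series (annual flow of the river Nile, 1871 to 1970) can be drawn as a line chart. type = "o" overplots points and lines, and pch = 18 draws a filled diamond at each point:

```r
# Nile: built-in annual flow of the river Nile, 1871 to 1970
years <- as.numeric(time(Nile))
flow  <- as.numeric(Nile)

# type = "o" draws points joined by lines; pch = 18 is a filled diamond
plot(years, flow, type = "o", pch = 18, col = "darkblue",
     xlab = "Year", ylab = "Annual flow",
     main = "Flow of the river Nile")
```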
Pie Charts
The pie chart is commonly used to display proportional data. You can create pie charts using the pie() command. In its simplest form, you can use a vector of numeric values to create your plot, like so:
> pie(tempe$Maxtemp, labels = tempe$Month,
+     col = rainbow(length(tempe$Month)),
+     main = "Maximum Temperature of a Year")
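The tempe data frame is not listed in full in this module, but its Maxtemp values are printed in the bar chart section. The sketch below rebuilds it under the assumption that the 12 values are monthly, in calendar order, and uses R's month.abb for the labels (the original labels may differ). Adding each slice's percentage to its label makes the proportions easier to read:

```r
# Assumption: tempe holds the 12 Maxtemp values printed in this module,
# one per month in calendar order; month.abb supplies the labels
tempe <- data.frame(
  Month   = month.abb,
  Maxtemp = c(42, 34, 40, 28, 35, 34, 28, 33, 34, 38, 40, 26)
)

# Share of each slice in the total, used in the labels
pct <- round(100 * tempe$Maxtemp / sum(tempe$Maxtemp), 1)

pie(tempe$Maxtemp,
    labels = paste0(tempe$Month, " (", pct, "%)"),
    col    = rainbow(length(tempe$Month)),
    main   = "Maximum Temperature of a Year")
```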
Bar Charts
The bar chart is suitable for showing data that fall into discrete categories. A histogram is a type of bar chart, but in a histogram the width of each bar matters, because each bar covers an interval of a continuous variable. A bar chart shows the magnitude of particular observations in the data, whereas a histogram shows the distribution of the data.
Bar charts are widely used because they convey information in a readily understood fashion. They are also flexible and can show items in various groupings.
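The difference between the two chart types shows up clearly in R: barplot() takes one height per category, while hist() cuts a continuous variable into intervals and counts the observations in each. A minimal sketch with the built-in mtcars data:

```r
# Bar chart: one bar per discrete category (number of cars per cylinder count)
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, xlab = "Cylinders", ylab = "Number of cars")

# Histogram: mpg is continuous, so hist() cuts it into intervals;
# each bar's width is its interval and its height the count inside it
h <- hist(mtcars$mpg, xlab = "Miles per Gallon", main = "Histogram of mpg")
h$breaks  # interval boundaries
h$counts  # cars in each interval
```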
> tempe$Maxtemp
 [1] 42 34 40 28 35 34 28 33 34 38 40 26
To make a bar chart, you use the barplot() command and specify the vector name in the instruction.
> barplot(tempe$Maxtemp, col = colr, ylab = "Max Temperature", xlab = "Months",
+         names.arg = tempe$Month)
Bar plots need not be based on counts or frequencies. You can create bar plots that represent means, medians, standard deviations, and so on: use the aggregate() function and pass the results to the barplot() function.
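A sketch of both ideas. It rebuilds tempe from the printed Maxtemp values (assuming month.abb labels), and since colr is not defined in this module it is assumed here to be a vector of 12 rainbow colours. The second plot follows the aggregate() suggestion, using mean mpg per cylinder count from mtcars:

```r
# Assumption: tempe rebuilt from the printed Maxtemp values, month.abb as labels
tempe <- data.frame(
  Month   = month.abb,
  Maxtemp = c(42, 34, 40, 28, 35, 34, 28, 33, 34, 38, 40, 26)
)
colr <- rainbow(12)  # assumption: colr is a vector of 12 colours

barplot(tempe$Maxtemp, col = colr, ylab = "Max Temperature", xlab = "Months",
        names.arg = tempe$Month)

# Bar plot of a summary statistic: mean mpg for each cylinder count
means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
means
barplot(means$mpg, names.arg = means$cyl,
        xlab = "Cylinders", ylab = "Mean miles per Gallon")
```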
When your data are in a matrix with several rows, the default bar chart is a stacked chart, as you saw in the previous slides. You can place the elements of each column side by side instead (a grouped bar chart) by using the beside = TRUE instruction, as shown in the screenshot; the default is beside = FALSE.
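A minimal sketch contrasting the two layouts, using a two-row table of mtcars counts (engine shape vs by number of gears); the table and colours are illustrative choices:

```r
# Counts of cars: rows = engine shape vs (0/1), columns = number of gears
counts <- table(mtcars$vs, mtcars$gear)
counts

# Default (beside = FALSE): one stacked bar per number of gears
barplot(counts, xlab = "Number of gears",
        legend.text = c("vs = 0", "vs = 1"))

# beside = TRUE: the two bars for each number of gears sit side by side
mids <- barplot(counts, beside = TRUE, col = c("grey30", "grey80"),
                xlab = "Number of gears",
                legend.text = c("vs = 0", "vs = 1"))
mids  # x positions of the bars: one row per vs level, one column per gear
```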
Unlike parametric tests, non-parametric tests can be used when the data are measured on an ordinal or nominal scale. Parametric tests frame the hypothesis in terms of means and variances, whereas non-parametric tests typically use the median.
Parametric test                      Corresponding non-parametric test
One-sample t test                    Wilcoxon signed rank test
Independent two-sample t test        Mann-Whitney U test
Dependent (paired) samples t test    Wilcoxon matched-pairs signed rank test
One-way ANOVA                        Kruskal-Wallis test
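In R, each test in the table maps to a base stats function: wilcox.test() covers the three Wilcoxon and Mann-Whitney cases, and kruskal.test() covers Kruskal-Wallis. A sketch using built-in data sets; the value mu = 20 is an arbitrary illustrative median, and exact = FALSE requests the normal approximation, which avoids warnings when the data contain ties:

```r
# One-sample Wilcoxon signed rank test: is the median mpg different from 20?
one_sample <- wilcox.test(mtcars$mpg, mu = 20, exact = FALSE)

# Mann-Whitney U test (two independent samples): mpg by transmission type
two_sample <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

# Wilcoxon matched-pairs signed rank test: the built-in sleep data record
# extra sleep for the same 10 patients under two drugs (ordered by ID)
paired <- with(sleep, wilcox.test(extra[group == 1], extra[group == 2],
                                  paired = TRUE, exact = FALSE))

# Kruskal-Wallis test (more than two groups): mpg across 4, 6 and 8 cylinders
kw <- kruskal.test(mpg ~ factor(cyl), data = mtcars)

# p values of the four tests
sapply(list(one_sample = one_sample, two_sample = two_sample,
            paired = paired, kruskal = kw), function(t) t$p.value)
```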
Mann-Whitney U test: The Mann-Whitney U test is the non-parametric alternative to the independent two-sample t test. It compares the medians of the two groups to test the hypothesis (strictly, it compares the two distributions; when they have the same shape, this amounts to comparing medians).
Example:
The workforce of the Bangalore district is made up of rural and urban divisions. A few months ago, several rural division supervisors began claiming that urban division employees waste tar from the tar godown: they claimed the urban division uses more tar per mile of road maintenance than the rural division. In response, the Bangalore district material manager performed a test. He selected a random sample of jobs performed by the urban division (UD) from the district job cost records, and another sample of jobs performed by the rural division (RD). The kilograms of tar per mile for each job were recorded. Although the measurements are on a ratio scale, the manager assumes the data are highly skewed, because there are far fewer jobs in the rural area than in the urban area. For this reason, a non-parametric test is the safer choice.
Since the p value for the test is greater than the chosen level of significance (0.05), we fail to reject the null hypothesis: there is not enough evidence to support the rural division supervisors' claim.
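The tar-usage data are not reproduced in this module, so the sketch below uses mtcars as a stand-in to show the one-sided form of the test and the decision rule. Here the alternative hypothesis is that cars with automatic transmission (am = 0) have lower mpg than manual cars (am = 1); with the formula interface, alternative = "less" refers to the first factor level compared with the second:

```r
# Stand-in data: H1 says automatic cars (am = 0) have lower mpg than manual (am = 1)
res <- wilcox.test(mpg ~ am, data = mtcars,
                   alternative = "less", exact = FALSE)
res

# Decision rule at the 5% level of significance
alpha <- 0.05
if (res$p.value > alpha) {
  cat("p =", signif(res$p.value, 3), "> 0.05: fail to reject H0\n")
} else {
  cat("p =", signif(res$p.value, 3), "<= 0.05: reject H0\n")
}
```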
A researcher wants to test whether there is any association between the number of cylinders, the number of forward gears, and the number of carburetors in different car brands.
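The module's data for this example are not reproduced, but mtcars contains cyl, gear and carb columns. chisq.test() works on a two-way table, so the sketch below tests one pair of variables (cylinders and gears); the same call can be repeated for the other pairs. With only 32 cars several expected counts fall below 5, so R warns that the approximation may be inaccurate; chisq.test(tab, simulate.p.value = TRUE) gives a Monte Carlo p value instead:

```r
# Cross-tabulate cylinders against forward gears (H0: no association)
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
tab

res <- chisq.test(tab)  # warns: some expected counts are below 5
res
round(res$expected, 2)  # expected counts under independence
```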
You can use the chisq.test() command to carry out a goodness of fit test. In this case, you must have two vectors of numerical values: one representing the observed values and the other representing the expected ratio of values. The goodness of fit test compares the data against the ratios (probabilities) you specified; if you do not specify any, the data are tested against equal probabilities.
In the following example, you have a simple data frame containing two columns: the first contains values from an old survey, and the second contains values from a new survey. You want to see whether the proportions in the new survey match the old one, so you perform a goodness of fit test.
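The survey data frame is not reproduced here, so this sketch uses the counts of 4-, 6- and 8-cylinder cars in mtcars as the observed values. The ratio 1 : 1 : 2 stands in for the old survey's proportions and is purely hypothetical; rescale.p = TRUE tells chisq.test() to turn such a ratio into probabilities:

```r
# Observed counts of 4-, 6- and 8-cylinder cars
observed <- as.vector(table(mtcars$cyl))
observed

# No p supplied: tested against equal probabilities (1/3 each)
gof_equal <- chisq.test(observed)
gof_equal

# Hypothetical expected ratio 1 : 1 : 2, standing in for an "old survey";
# rescale.p = TRUE converts the ratio into probabilities 0.25, 0.25, 0.5
gof_ratio <- chisq.test(observed, p = c(1, 1, 2), rescale.p = TRUE)
gof_ratio
```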
In the above result, the p value is greater than 0.05, so there is not enough evidence to reject the null hypothesis. We therefore conclude that the new survey is consistent with the old survey on the employee performance index.
ANOVA
ANOVA is a statistical tool used to test the equality of the means of more than two groups. If a researcher wants to test the equality of two means, he or she can use a t test or a z test, depending on the sample size. ANOVA uses the F test to draw conclusions from experimental results. ANOVA mainly deals with experimental data rather than observational data: experimental data are collected under controlled conditions, whereas observational data are recorded from natural happenings without any control over the situation.
In ANOVA, the independent variables are not observed or quantified individually; instead, they enter the analysis as factors, that is, categorical levels set by the experimenter. For example, suppose a researcher wants to identify the effect of three dosages of a particular medicine on a particular fever.
The researcher wants to identify at which dosage the medicine is effective in reducing the fever level. In this case, the observed variable is the severity of the fever and the factor is the dosage level. The factor is not measured quantitatively: the dose is taken by the person, and its effect is seen only through the reaction in terms of fever level.
Assumptions of ANOVA:
• The effects of the factors in the general linear model are additive (linear).
• All populations in the analysis have a common variance.
• Within each group, samples are drawn randomly from a normal distribution with mean μᵢ and variance σ².
• All samples are drawn independently of each other.
• The errors in the model are normally distributed with mean 0 and common variance σ².
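A sketch of how the normality and common-variance assumptions might be checked in R, using a one-way model of mpg on cylinder count from mtcars as an illustrative example: shapiro.test() is applied to the model residuals and bartlett.test() compares the group variances.

```r
# Illustrative one-way model: mpg by number of cylinders (as a factor)
fit <- aov(mpg ~ factor(cyl), data = mtcars)

# Normality of the errors: Shapiro-Wilk test on the residuals
sw <- shapiro.test(residuals(fit))
sw

# Common variance across the groups: Bartlett's test
bt <- bartlett.test(mpg ~ factor(cyl), data = mtcars)
bt
```

A p value above 0.05 in either test gives no evidence against the corresponding assumption.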
In the sense used in this module, one-way and two-way ANOVA are both one-factor analyses: the main interest of the study is focused on a single factor (the treatment). The main difference is that in one-way ANOVA the variation is partitioned in one direction (between treatments only), whereas in two-way ANOVA it is partitioned in two directions (between treatments and between replications).
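The general linear model for one-way ANOVA is as follows:
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$
where $y_{ij}$ is the $j$th observation in group $i$, $\mu$ is the overall mean, $\tau_i$ is the effect of group $i$, and $\varepsilon_{ij}$ is the random error. The standard error of the difference between two group means is
$$SE(\bar{y}_i - \bar{y}_k) = \sqrt{\frac{2 S_e^2}{n}}$$
where $S_e^2$ is the error mean square and $n$ is the number of replications per group.
The following data have three groups in the columns, each replicated three times, so there are 9 observations in total. The treatment (group) effect is the deviation of the group mean from the overall mean.
Replication   Group 1   Group 2   Group 3
1             20        22        25 (y31)
2             18        19        23 (y32)
3             16        19        18 (y33)
Average       18        20        22        Overall mean μ = 20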
The error term, or unexplained effect, is the deviation of an individual observation from its group mean. For example, for $y_{32}$:
$$y_{32} = \mu + \tau_3 + \varepsilon_{32}$$
$$\mu = \frac{20 + 18 + \cdots + 23 + 18}{9} = 20$$
$$\tau_3 = \bar{y}_3 - \mu = 22 - 20 = 2$$
$$\varepsilon_{32} = y_{32} - \bar{y}_3 = 23 - 22 = 1$$
$$23 = 20 + 2 + 1$$
This is how the general linear model estimates the effects of factors in an experiment. In the sequel, we see how ANOVA is performed and how the factor effects are estimated.
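The decomposition above can be reproduced in R from the 3 x 3 table, followed by the one-way ANOVA with aov(). The vector and factor names are our own choices:

```r
# The 3 x 3 example: groups in columns, three replications each
y     <- c(20, 18, 16,   22, 19, 19,   25, 23, 18)
group <- factor(rep(1:3, each = 3))

mu    <- mean(y)                              # overall mean
ybar  <- as.vector(tapply(y, group, mean))    # group means
tau   <- ybar - mu                            # treatment effects
y32   <- y[group == 3][2]                     # 2nd observation of group 3
eps32 <- y32 - ybar[3]                        # error term

c(mu = mu, tau3 = tau[3], eps32 = eps32)      # y32 = mu + tau3 + eps32

# One-way ANOVA
fit1 <- aov(y ~ group)
summary(fit1)
```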
The following data have three groups in the columns, each replicated three times, so there are 9 observations in total.
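A sketch of the two-way analysis for the same 3 x 3 data, with replication (the row number) as the second source of variation. Comparing the error mean squares of the two models shows the effect described below: removing replication variation shrinks the error mean square, while the loss of error degrees of freedom raises the critical F value.

```r
y     <- c(20, 18, 16,   22, 19, 19,   25, 23, 18)
group <- factor(rep(1:3, each = 3))    # treatment (columns of the table)
repl  <- factor(rep(1:3, times = 3))   # replication (rows of the table)

fit1 <- aov(y ~ group)          # one-way: replication stays in the error
fit2 <- aov(y ~ group + repl)   # two-way: replication removed from the error
summary(fit2)

# Error mean square = residual sum of squares / residual degrees of freedom
mse1 <- deviance(fit1) / df.residual(fit1)
mse2 <- deviance(fit2) / df.residual(fit2)
c(one_way = mse1, two_way = mse2)

# 5% critical F for treatments: error df fall from 6 to 4
qf(0.95, df1 = 2, df2 = c(6, 4))
```

For these data the treatment F rises from 1.8 to about 6.55 (p about 0.055), while the 5% critical value rises from about 5.14 to about 6.94.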
In two-way ANOVA, the replication variation is extracted from the error variance, so the error mean square becomes smaller than in one-way ANOVA. Hence the calculated F value for treatments is usually higher than in one-way ANOVA, even though the smaller number of error degrees of freedom also raises the critical F value.
In the pairwise comparison test, it is important to supply the function with the error degrees of freedom and the error mean square from the ANOVA.
The output gives a confidence interval for each group mean. For example, for the first group, the lower limit is 18.48295 and the upper limit is 23.29483 for the population arithmetic mean of that group.
$Groups:
Groups that share the same letter are not significantly different from one another. For instance, if group 1 and group 3 have the same letter, their means are not significantly different. In the same way, if group 1 and group 2 have different letters, their means differ significantly. Likewise, if group 3 and group 2 share a letter, the difference between them is not statistically significant.
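The module's pairwise comparison output comes from a function that takes the error degrees of freedom and error mean square (the add-on package agricolae, for instance, provides LSD.test() with such arguments; that package is not assumed here). The sketch below computes Fisher's least significant difference (LSD) by hand in base R for the 3 x 3 example, using the two-way ANOVA's error df (4), error mean square (11/6) and the standard error formula sqrt(2 Se^2 / n):

```r
# Error df and error mean square from the two-way ANOVA of the 3 x 3 example
df_error <- 4
ms_error <- 11 / 6
n        <- 3                              # replications per group
means    <- c(g1 = 18, g2 = 20, g3 = 22)   # group means

# Standard error of a difference of two means, and the LSD at alpha = 0.05
se_diff <- sqrt(2 * ms_error / n)
lsd     <- qt(0.975, df = df_error) * se_diff

# Every pair of groups, flagged when the difference exceeds the LSD
pr    <- combn(names(means), 2)
diffs <- abs(means[pr[1, ]] - means[pr[2, ]])
data.frame(pair        = paste(pr[1, ], pr[2, ], sep = " vs "),
           difference  = unname(diffs),
           significant = unname(diffs > lsd))
lsd
```

Only groups 1 and 3 differ by more than the LSD (about 3.07), so a letter display would give group 3 "a", group 2 "ab" and group 1 "b". Strictly, Fisher's protected LSD is applied only after a significant F, and here the two-way F is just short of significance, so treat this as an illustration of the mechanics. In base R, TukeyHSD() applied to the aov() fit is an alternative that controls the family-wise error rate.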
1. A data representation technique suited to non-technical people is called _________.
a. Graphical Techniques
b. Statistical Inference
c. Testing of hypothesis
a. bwplot ( )
b. wplot( )
c. boxplot( )
d. None of the above
Answer: boxplot( )
a. boxplot(item1 | item 2)
b. boxplot(item1 & item 2)
c. boxplot(item1 , item 2)
a. Vertical boxplot
b. Horizontal boxplot
c. Slanting boxplot
d. Error message
a. Vertical
b. Horizontal
c. Slanting
Answer: Horizontal
a. Line
b. Trend line
c. Fitted line
d. Scatter
Answer: Scatter
a. Line
b. Trend line
c. Fitted line
d. Scatter
11. Which one of the given options is the function used for pairwise plot in R?
a. pplot( )
b. pairplot ( )
c. pairs ( )
d. None of the above
Answer: pairs ( )
12. Which one of the given functions is used for a line chart in R?
i. pplot( )
ii. pairplot ( )
iii. pairs ( )
a. Only i
b. Only ii
c. All i, ii and iii
d. None of the above
13. Which one of the given functions is used for pie chart in R?
a. piechart( )
b. pieplot ( )
c. piecharts ( )
d. pie ( )
Answer: pie ( )
14. Which one of the given functions is used for a Cleveland dot chart in R?
a. dotchart( )
b. cchart ( )
c. clevchart( )
d. clchart ( )
Answer: dotchart( )
15. Which one of the given functions is used for stacked bar chart in R?
a. dotchart( )
b. cchart ( )
c. clevchart( )
d. barplot ( )
Answer: barplot ( )
16. Which one of the given statistical functions is used for the Wilcoxon signed rank test?
a. wilcoxon.test ( )
b. wilcox.test ( )
c. wilcoxons.test ( )
d. wilcoxonsign.test ( )
Answer: wilcox.test ( )
17. For testing of association, which one of the given options is the null hypothesis?
a. Mean
b. Mode
c. Median
d. GM
Answer: Median
19. Which one of the given options is the statistical analysis used to test the arithmetic means of two groups?
a. t test
b. ANOVA
c. Z Test
d. None of the above
Answer: t test
20. _____________ is the statistical analysis used to test the arithmetic means of more than two groups.
a. t test
b. ANOVA
c. Z Test
d. None of the above
Answer: ANOVA
21. In two-way ANOVA, _____________ is the second source of variability considered in accounting for the variance.
a. Treatment
b. Within variation
c. Between variation
d. Replication
Answer: Replication
22. __________________ is the function used for one-way ANOVA in R.
a. aov ( )
b. anova( )
c. ava( )
d. None of the above
Answer: aov ( )
23. _____________ is the function used for two-way ANOVA in R.
a. taov ( )
b. tanova( )
c. tava( )
d. aov( )
Answer: aov( )
24. ____________ is the test used in ANOVA to assess the significance of the difference between the arithmetic means of more than two groups.
a. F test
b. t test
c. Chi square test
d. Wilcoxon Test
Answer: F test
Assignment
Answer the following set of questions. They are meant to test your understanding of Module V.
1. Write the procedure to create a bar chart for the given data using rainbow colours.
2. Write the procedure to create a line chart for the given numerical data using a diamond shape for the points.
3. Write the procedure to create a stacked bar chart for the given data.
4. Write the procedure to create a grouped bar chart for the given data using three different colours.
5. Write down the guidelines for non-parametric tests and their assumptions.
6. Write down the guidelines for one-way ANOVA and its assumptions.
7. Write down the guidelines for two-way ANOVA and its assumptions.
8. Write the procedure for a post hoc (post-ANOVA) test and explain its importance.
9. List the non-parametric tests and their corresponding parametric tests.
10. Write the procedure for a Cleveland dot chart and write its advantage over bar charts.
Note: The data are available in R as the built-in datasets "iris" and "mtcars".
Summary
• R is sophisticated software for creating customizable graphs.
• R has rich functions for creating pie charts, with many options set through the arguments.
• R has a built-in function for creating box plots, which help find outliers in a data set.
• R provides methods for assessing model assumptions and for applying multiple comparison procedures following significant omnibus tests.
• R has a built-in function to test the median of one sample.
• R has a built-in function to compare the medians of two independent samples.
• R has a built-in function to compare the medians of two dependent samples.
• R has a built-in function to test the arithmetic means of more than two groups.
• R has a built-in function to test the arithmetic means of more than two groups when the data contain more than one source of variation.
Document Links
Video Links
E-Book Links
R in Action: http://ceal.fing.uncu.edu.ar/industrial/TyHM/DOE/R_in_Action.pdf
Non Parametric Test: pages 160 to 163
ANOVA: pages 218 to 223 and 226 to 231