Week 1 - Getting Started in RStudio - 2023
For those with no programming experience, RStudio is a user-friendly interface for R. It’s a
great platform to get started, and once you’ve mastered RStudio, you will also have the
knowledge to work directly in R, should you want to.
Now don’t be scared by the prospect of coding! This unit has been designed to enable you
to walk away with the ability to design, implement and analyse your own data. That is the key
aim of this unit. Therefore, we will provide the code where necessary. However, for those of
you excited by the prospect of learning code and who want to take it a step further, there
will be ample opportunity to understand how the code is put together, break it down and
further develop your coding skills.
The first couple of weeks will focus on getting familiar with the RStudio environment,
before we move on to statistically analysing and graphing your data. If you find that you are
struggling during the first two weeks, then I recommend that you jump on YouTube. I have
found the following videos particularly instructive.
• RStudio training by Mike Marin (several short videos on key features)
Quantitative Biology: Week 1
• RMarkdown with Roger Peng
• Introduction to RStudio by Justin Murphy
You will also find this week’s practical recorded and available online. I will only record the
first week’s practical as we are laying the foundations down for the course and you will
have lots of questions. If you feel that you struggled in the first week, you can view the
recording and stop and start it as you go through the practical at your own pace.
As you work through the practical, please make sure that you answer all the questions
(denoted in blue) and complete any exercises provided. Your tutor will go over these key
questions in class, so also check that you got them correct.
2. Which columns contain the dependent variable and which the independent
variable? How do we distinguish between them?
4. What is the number in cell (3,3)? HINT: Think about the row and column
numbering.
You will introduce a fourth panel by clicking on the New File icon (or File > New File) and
choosing a new R script.
The editor window is where you will write your code, the results of which are displayed in
the console window. The workspace shows you what data you are working with, and the
files panel, which is further subdivided, lists the packages available, provides help with
functions and is where your plots will appear. The first thing we need to do is set up a
working directory.
Now we need to set this folder location as your working directory. There are several
ways to do this, but the simplest is to:
1. Click on the Session menu at the top of the window
2. Go to Set Working Directory
3. Here you can “Choose Directory” by navigating to and opening the folder.
You can always check your working directory by typing getwd(), and it will return the
answer in your console.
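The same steps can also be done from code. The sketch below is only an illustration: tempdir() stands in for your own data folder, whose path will be different on your computer.

```r
# Remember the current working directory so we can return to it
old_wd <- getwd()

# Point R at a folder; tempdir() is a stand-in for your own folder path
setwd(tempdir())
getwd()          # confirm the change in the console

# Go back to where we started
setwd(old_wd)
```

In your own script you would replace tempdir() with the quoted path to your folder.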
Now that you have set the folder as your working directory, this is where you should store
all your data files so they can be imported into RStudio.
NOTE: when you see the # symbol, the text after it is a comment. It’s just there to give you
extra information, and RStudio won't treat it as code!
The data set with 10 observations and 3 variables should now be in your global
environment.
Check that the data frame has imported correctly by viewing the data. To view the data at
any time, you can click on the table icon next to a data set in the global environment.
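As a rough sketch of what importing and checking a file can look like in code, the example below writes a tiny stand-in CSV file and reads it back with read.csv(). The file name, the object name demo and the values are all invented for illustration; they are not the practical's data set, and read.csv() is only one of several ways to import data.

```r
# Lines starting with # are comments: R ignores them when running the code.

# Write a small stand-in file so the example runs anywhere (values are invented)
demo_file <- file.path(tempdir(), "demo_plants.csv")
writeLines(c("Fertiliser,Plant.1,Plant.2",
             "NO,2.5,3.0",
             "YES,4.0,6.5"), demo_file)

# Import the file; text columns are turned into factors
demo <- read.csv(demo_file, stringsAsFactors = TRUE)

# Check the import: number of rows (observations), then columns (variables)
dim(demo)
```

If the numbers reported by dim() do not match what you expect from your spreadsheet, the import has probably gone wrong.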
There are lots of additional packages that can be installed on top of the ones already in
there. Packages contain different types of tests and/or plot functions and allow you to do
even more with your data. A list of the most useful packages can be found at:
https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages
Now make sure you understand the difference between installing and attaching a package.
Installing a package means downloading it onto your computer, so you have to be
connected to the internet to do it. You only have to do this step once: once the package is
on your computer, it will always be there (unless you actively remove it!). BUT, once it’s
installed on your computer, you still have to tell RStudio when you want to use it. To use
the package, you have to attach it, and this you have to do in every new session in which
you want to use the package.
How to install a package
If you have to install a package from the CRAN website, click on the Install button in the
Packages tab of the files panel and follow the instructions. It may take a few minutes to
install a package – just let it do its thing! You know that RStudio is processing information
if you see the stop sign in the console – just leave it until the stop sign disappears. You can
check that it has installed properly by clicking on the Packages tab in the files panel and
seeing if the package is there.
install.packages("car")
Once you have attached the package (by ticking its box in the packages list or by running
library(car)), check that “car” has attached: scroll down your packages list and you should
see a tick in the box next to it.
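The install-once, attach-every-session pattern can be sketched in code as follows. To keep the example runnable without an internet connection it uses "stats", a package that ships with R, as a stand-in; in practice you would put "car" (or whichever package you need) in its place.

```r
pkg <- "stats"   # stand-in package that ships with R; use "car" in practice

# Install only if the package is not already on this computer (done once)
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package (done in every new R session)
library(pkg, character.only = TRUE)

# Attached packages appear in the search path
paste0("package:", pkg) %in% search()
```

With a fixed package name you would normally just write library(car); the character.only argument is only needed here because the name is stored in a variable.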
6. Descriptive data
One of the first things we do when working with data is to have a good look at the data, get
some descriptive information (e.g. mean, median, min, max and interquartile values) and
plot the data. We can get all that descriptive data using the summary function.
summary(Plants)
##  Fertiliser    Plant.1         Plant.2
##  NO :5      Min.   :2.400   Min.   :2.400
##  YES:5      1st Qu.:3.125   1st Qu.:3.025
##             Median :3.900   Median :4.950
##             Mean   :3.790   Mean   :4.980
##             3rd Qu.:4.500   3rd Qu.:6.975
##             Max.   :5.100   Max.   :7.600
There are also some other really useful functions that you should start using when you first
import a data set into the R environment. Type the following functions into RStudio
and see what information they give you back about your data set.
A. str(Plants)
B. dim(Plants)
C. View(Plants)
D. head(Plants)
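To give a feel for what these functions report, here is a small sketch using a stand-in data frame called demo. The values are invented and are not the Plants data, so your own output will differ.

```r
# Stand-in data frame with invented values (not the Plants data)
demo <- data.frame(
  Fertiliser = factor(c("NO", "NO", "YES", "YES")),
  Plant.1    = c(2.5, 3.0, 4.0, 4.5),
  Plant.2    = c(3.0, 3.5, 6.0, 7.0)
)

str(demo)       # variable names, their classes and the first few values
dim(demo)       # number of rows, then number of columns
head(demo, 2)   # the first two rows only
# View(demo) opens the spreadsheet-style viewer; it needs an interactive RStudio session
```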
When you have a data set, you often just want to find out some information about one
column. To refer to a column in a data frame, you use the $ symbol. For example, if I want
to get the mean value of Plant.1 then I type in:
mean(Plants$Plant.1)
## [1] 3.79
It is also important to check what class our data is, i.e. is it numeric, integer or a factor. This
will become more important as you carry on throughout the semester. You can check what
class each variable is in the data frame using the "str" function, or you can check one
column like this:
class(Plants$Plant.1)
## [1] "numeric"
OK, let’s say that you wanted to know the mean value of plants in Plant.1 but only those
plants that had had fertiliser treatment. To do this, we need to use your factor, which here
is Fertiliser. You can ONLY use factors to differentiate groups within a column. There are
two ways of finding this mean value.
mean(Plants$Plant.1[Plants$Fertiliser=="YES"])
## [1] 4.46
or
tapply(Plants$Plant.1, Plants$Fertiliser, mean)
##   NO  YES
## 3.12 4.46
tapply is a VERY useful command, and one that we will use often.
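The third argument of tapply can be any summary function, not just mean. The sketch below uses a small stand-in data frame with invented values (not the Plants data) to show the pattern with both mean and max.

```r
# Stand-in data with invented values (not the Plants data)
demo <- data.frame(
  Fertiliser = factor(c("NO", "NO", "YES", "YES")),
  Height     = c(2.0, 3.0, 4.0, 6.0)
)

# One value per level of the factor: first the values, then the groups, then the function
group_mean <- tapply(demo$Height, demo$Fertiliser, mean)
group_max  <- tapply(demo$Height, demo$Fertiliser, max)

group_mean   # mean height for NO and for YES
group_max    # tallest plant for NO and for YES
```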
Exercise 1
• What is the mean plant height of plant 1 with and plant 1 without fertiliser?
• What is the mean plant height of plant 2 with and plant 2 without fertiliser?
• What is the maximum height value of plant 1 and plant 2 with fertiliser?
Did you get these answers: 4.46 cm and 3.12 cm; 7.06 cm and 2.9 cm; 5.1 cm and 7.6 cm?
par(mfrow=c(1,2))
hist(Plants$Plant.1[Plants$Fertiliser=="YES"])
hist(Plants$Plant.1[Plants$Fertiliser=="NO"])
Use the help function to work out what 'par(mfrow=c(1,2))' does. Type in par in the ‘help’
menu (or ?par) and scroll down to the relevant (mfcol, mfrow) section in the help menu.
Sometimes we may want to add a line over the top of the fertilised treatment for Plant.1 to
check whether it follows a normal distribution.
par(mfrow=c(1,1))
hist(Plants$Plant.1[Plants$Fertiliser=="YES"])
xfit <- seq(3, 6, length=100) #create a sequence of 100 numbers from 3 to 6
#find y values on a normal distribution with appropriate mean and sd
#for each x value in xfit
yfit <- dnorm(xfit,
              mean=mean(Plants$Plant.1[Plants$Fertiliser=="YES"]),
              sd=sd(Plants$Plant.1[Plants$Fertiliser=="YES"]))*3
lines(xfit, yfit) #plot the line
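The multiplier of 3 above appears to stretch the normal density so that it sits at a similar height to the histogram counts. One general way to choose such a multiplier, offered here only as a sketch and not as the method used in this practical, is the number of observations times the bin width. The heights below are invented stand-in values.

```r
pdf(NULL)  # null graphics device so the sketch runs anywhere; omit this line in RStudio

heights <- c(3.9, 4.2, 4.5, 4.6, 5.1)   # invented stand-in values

# Draw the histogram and keep its details (bin breaks, counts)
h <- hist(heights, main = "Stand-in data", xlab = "Growth (cm)")

# x values spanning the histogram, and the width of one bin
xfit     <- seq(min(h$breaks), max(h$breaks), length = 100)
binwidth <- diff(h$breaks)[1]

# Density scaled to the count axis: density * number of observations * bin width
yfit <- dnorm(xfit, mean = mean(heights), sd = sd(heights)) *
  length(heights) * binwidth
lines(xfit, yfit)

dev.off()  # close the null device; omit this line in RStudio
```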
The last thing we need to do is add the all-important labels to the graph.
par(mfrow=c(1,1))
hist(Plants$Plant.1[Plants$Fertiliser=="YES"], main="Plant 1 with Fertiliser",
     xlab="Growth (cm)", ylab="Frequency")
Then add the line again if we want to. Remember that we have already made the sequences
xfit and yfit. They should be in your global environment.
lines(xfit,yfit) #plot the line
Histograms are great for understanding the distribution of data points, but sometimes you
might want to compare between treatments. Box plots are a great way of doing this and for
comparing the amount of variability between treatments.
Let’s compare the data between our plant groups.
boxplot(Plants$Plant.1, Plants$Plant.2)
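Box plots can also be split by a factor and given labels. The sketch below uses a stand-in data frame with invented values (not the Plants data); the formula Height ~ Fertiliser reads as "Height grouped by Fertiliser".

```r
pdf(NULL)  # null graphics device so the sketch runs anywhere; omit this line in RStudio

# Stand-in data with invented values (not the Plants data)
demo <- data.frame(
  Fertiliser = factor(rep(c("NO", "YES"), each = 4)),
  Height     = c(2.4, 2.9, 3.1, 3.6, 3.9, 4.4, 4.6, 5.1)
)

# One box per level of the factor, with axis labels
bp <- boxplot(Height ~ Fertiliser, data = demo,
              xlab = "Fertiliser", ylab = "Growth (cm)")

# When passing columns separately, label the boxes with the names argument
boxplot(demo$Height[demo$Fertiliser == "NO"],
        demo$Height[demo$Fertiliser == "YES"],
        names = c("No fertiliser", "Fertiliser"))

dev.off()  # close the null device; omit this line in RStudio
```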
8. Vectors, matrices and data frames
So far we have been working with a data frame that you imported into RStudio, and this is
most likely the way that you will work with most of your data. But it’s also important to
understand how to create matrices and data frames in RStudio, should you need to.
We can create variables, which can be a single value, a string of numbers (vector) or a
matrix (columns and rows). Note that matrices contain columns of numeric data, but you
can also get data that contains categories (e.g. male, female). When you get data with
columns and rows that has these types of data as well as numeric data, we call it a data
frame. You create objects using the <- assignment operator.
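As a quick sketch of the three kinds of object, the example below builds one of each. The object names and values are invented for illustration.

```r
# A numeric vector: a string of numbers
v <- c(2.4, 3.1, 3.9)

# A matrix: rows and columns holding one type of data
m <- matrix(1:6, nrow = 2)

# A data frame: columns can mix categories (a factor) with numeric data
df <- data.frame(
  Sex    = factor(c("male", "female", "female")),
  Height = v
)

class(v)    # what kind of object is v?
class(m)
class(df)
str(df)     # one factor column and one numeric column
```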
VECTORS
1. Create a variable named “a” with the value of 1
a <- 1
2. Create a numeric vector named "b" with elements equal to 1, 2 and 3. There are at
least 3 ways to do this in R.
b <- c(1,2,3)
b
## [1] 1 2 3
assign("b",c(1,2,3))
b
## [1] 1 2 3
b <- seq(1,3)
b
## [1] 1 2 3
3. Create the following vectors: (a) (1 3 5); (b) (1 2 3 0 1 2 3); (c) (1 1 1 1); and (d) (1 2
3 1 2 3 1 2 3)
seq(1,6,2)
## [1] 1 3 5
c(b,0,b)
## [1] 1 2 3 0 1 2 3
rep(1,4)
## [1] 1 1 1 1
rep(b,3)
## [1] 1 2 3 1 2 3 1 2 3
4. Create character vectors containing: (a) the names of at least 5 students in this class;
(b) the values X1, X2, X3 and X4, and call it labels
student.names <- c("Hunter","Eric","Sara","Arvind","Abigail")
student.names
## [1] "Hunter" "Eric" "Sara" "Arvind" "Abigail"
labels <- paste("X",1:4, sep="")
labels
## [1] "X1" "X2" "X3" "X4"
MATRICES
5. Create the following matrices: (a) a 3 x 3 matrix with numbers 1 to 9 with
numbers increasing from left to right, (b) a 3 x 3 matrix with numbers 1 to 9 with
numbers increasing from top to bottom
matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, byrow=TRUE)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, byrow=FALSE)
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
6. Create the following matrices: (a) a 3 x 3 identity matrix with 1's in the diagonal and
0's in all of the off-diagonal elements; (b) a 3 x 3 matrix with the values 1, 2 and 3
along the diagonal and 0's in the off-diagonals; (c) an empty matrix with 2 rows and
3 columns.
I <- diag(1,3)
I
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
diag(b)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    2    0
## [3,]    0    0    3
A <- matrix(nrow=2,ncol=3)
A
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA
7. Create the following matrix: a 2 x 3 matrix (2 rows and 3 columns) with the numbers
1, 3, 5, 7, 9, 11 increasing from left to right.
matrix(seq(1,12,2), nrow=2, byrow=TRUE)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    7    9   11
8. Adding columns and rows can easily be done using the cbind and rbind functions. First
we have to give the matrix a name, then we can bind another column to it
data <- matrix(seq(1,12,2), nrow=2, byrow=TRUE)
## cbind binds a new column to our matrix named "data"
## to create a new matrix "data2"
data2 <- cbind(data, c(13,15))
data2
## the column that we added was a concatenated sequence of the numbers 13 and 15
To add an extra row, we use rbind instead! If you ever come across a function in a piece of
code and you’re not sure what it does, you can always use the help function by typing
a question mark in front of the function name. The help information will pop up in the
Help tab of the files panel.
?cbind
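Here is a short sketch of rbind, together with how to pull a single value out of a matrix using square brackets (row first, then column). The extra row c(2, 4, 6) and the name data3 are invented for illustration.

```r
# The 2 x 3 matrix from step 8
data <- matrix(seq(1, 12, 2), nrow = 2, byrow = TRUE)

# rbind adds a row; the new row needs one value per column
data3 <- rbind(data, c(2, 4, 6))

dim(data3)     # now 3 rows and 3 columns
data3[3, 2]    # the value in row 3, column 2
data3[, 2]     # leave the row blank to get the whole second column
```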
Exercise 2
• Create a vector of 10 numbers from 10 to 100, with numbers in multiples of 10. Call this
vector X
• Bind to X a column of 10 numbers: 1 to 5 repeated twice. Call this matrix Y
• Add 5 to every number in Y, call this new matrix Z
• Extract the number in the 5th row, second column from Z.
The value should be 10.