R Programming Basics for Beginners

This document provides an overview of fundamental concepts in R programming including data types, data structures, functions for working with vectors, matrices, data frames, and factors. It covers how to instantiate, subset, and manipulate each of these data structures as well as perform operations like coercion, arithmetic, transposing, sorting, and indexing. Common functions for data wrangling and summarization are also presented.

Uploaded by

Soo Bin Teo

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

85 views2 pages

R Programming Basics for Beginners

Uploaded by

Soo Bin Teo

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Fundamentals of R Programming -

m1<-matrix(3:8,nrow=2) #R can infer the - Data Frame Subsetting

- getwd(), setwd(x) other dimension - df[3,2], df[3, “age”] #select single element
- ls(): list all the objects in the current workspace - m2<-3:8 - df[3,] #select entire row
- rm(x): remove object “x” from workspace dim(m2)<-c(3,2) #3 rows, 2 col - df[,2], df[, “age”] #select entire col
- dir(): list all files and subfolders in the current wd - Can recycle too - df[c(3,5), c(2,3)], df[c(3,5), c("age", "child")]
- list.files(), file.exists(x) - m1<-matrix(3:5,ncol=3,nrow=2) #select multiple portions
- m1<-matrix(3:6,ncol=3,nrow=2) - df[2] - returns a dataframe
Atomic Data Types #4 is not a factor of 6, warning given - df$age - returns a vector
Logical - m3<-matrix(3:8,ncol = 3,nrow = - df[[“age”]] - returns a vector
Integer 2,byrow = TRUE) - df[[2]] - returns a vector
Numeric - rownames(m3)<-c("Row1","Row2") - Data Frame Extension
- as.numeric(“abc”) = NA [coercion] - colnames(m3)<-c("Col.1","Col.2","Col.3") - df$height <- height, df[["height"]] <- height
- as.integer(x)) - is.matrix(x), is.vector(x) - cbind(df, weight)
- as.integer(5>6) = 0 - Matrix multiplication: G %*% F - rbind(df, tom)
Character - Transpose: t(m) - Data Frame Sorting
● Length of string - Common vectors: Unit, Zero 1. sort(df$age)
○ nchar("Singapore") = 9 - Common matrices: Unit, Zero 2. ranks <- order(df$age)
● Find the starting pos of a substring - Diagonal matrix: (1) diag(m) or (2) diag(c(2,2,5)) #the order of the current indexes if sorted
○ regexpr(“ex”, “longtext”) = 6 - Identity matrix: diag(3) 3. df[ranks,]
○ regexpr("exs","longtext") = -1 - Inverse matrix: solve(m) - Other sorting related functions:
○ regexpr("a","banana") = 2 - Determinant: det(m) max(df$age), which.max(df$age),
● Find the starting pos of all substrings - Num of linearly independent columns: rank(df$age) #just ranks the elements
○ gregexpr("a","banana") = [2,4,6] - matA <- qr(m) - Data Frame Indexing
● Find pos of strings in a vector containing the - matA$rank - index <- df$height > 171
substring - dim(m), nrow(m), ncol(m) ## [1] FALSE TRUE FALSE FALSE FALSE
○ txt<-c("arm","foot","lefroo", "bafoobar") - colSums(m), rowSums(m), sum(m) - sum(index) #1
grep("foo", txt) = [2,4] - colMeans(m), rowMeans(m), mean(m) - df$name[index] #Pete
● Extract a substring from a string - cbind(m1, m2), rbind(m1, m2) - which(df$height > 171)
○ substr("Singapore",2,4) = "ing" Factor - match(c("Anne", "Julia", "Cath"), df$name)
● Substitute substring in a string - special variables used to store categorical ## [1] 1 4 5
○ sub("or","es","Singapore") = "Singapese" variables, more efficient use of memory (stored - c("Anne", "Julia", "Cath", "Bob") %in%
● Substitute all substrings in a string as a vector of integer values) df$name
○ gsub("a","o","banana") = "bonono" - a<-c(0,1,0,0,1) ## [1] TRUE TRUE TRUE FALSE
Complex value a.f<-factor(a,labels = c("Male","Female")) - Basic Data Wrangling
- class(-1-2i) = “complex” - a<-c("One","Two","Three","One","Three") - library(dplyr)
- sqrt(-1+0i)=0+1i a.f<-factor(a) - mutate(df, bmi = weight/height^2*10000)
- levels of the factor are ordered, by default, by ||
Data Structures alphabet order (e.g. “Three” before “Two”) df$bmi <- df$weight/df$height^2*10000
- But, can manually assign the order of the levels: - attach(df)
a<-c("One","Two","Three","One","Three") df$bmi <- weight/height^2*10000
a.f<-factor(a,levels=c("One","Two","Three")) detach(df)
b<-as.integer(a.f) - filter(df, bmi > 18.5 & bmi < 24.9)
b=12313 ||
- Generating factors by specifying the pattern of df[df$bmi > 18.5 & df$bmi < 24.9,]
Vector their levels: - select(df, name, height, weight, bmi)
- Instantiation: gl(2,8,labels=c("male","female")) ||
1) a<-c(1,3,4) List df[,c("name", "height", "weight", "bmi")]
2) a<-c("desks" = 1, "tables" = 3, "chairs" = 4) - Mike<-list(Name="Mike",Salary=10000,Age=43, - %>%
3) a<-c(desks = 1, tables = 3, chairs = 4) Children=c("Tom","Lily","Alice"))
- Coercion (coerced to the most flexible type; - Mike[2], Mike[‘Salary’], Mike$Salary, Mike[c(1,2)], Useful functions:
order: logical < integer < numeric < complex < Mike[c("Salary","Age")] - round(num, 2) #round to 2dp
character < list): - str(): Displays the internal structure of R object - unique(data$marital) #output all unique values
- a <- c("Name",1) = [“Name” “1”] - Nested list: list(list(1,2),c(3,4)) - table(df$age)
typeof(a) = “character” - Combine 2 lists with c(): c(list(1,2),c(3,4)) - group_by(…)
Array - summarise(Count=n()) #creates a new data
- a<-c(TRUE, FALSE, TRUE) - array(1:12,c(2,2,3)) frame. It returns one row for each combination of
a<-as.integer(a) grouping variables
a = [1 0 1] Other useful functions: - Center: mean(), median()
- Other ways to instantiate: - unlist(strsplit(salary_data, " ")) #split the string - Spread: sd(), IQR(), mad()
- seq(1,9,2) → [1 3 5 7 9] into components (vectors) - Range: min(), max(),
- rep(c(2,3,4), 3) → [2 3 4 2 3 4 2 3 4] - which(logical object) - Position: first(), last(), nth(),
- Vector arithmetic (+ - * / ^ on all elements in the - which.max(v) #index of the max value in a vector - Count: n(), n_distinct()
vector) - cat(...) #concat and print - Logical: any(), all()
- Arithmetic using another vector of same length - Concat vectors after converting to character - slice_max(colname) #select rows with highest or
- sum(x), mean(x), prod(x) - paste0(1:10, collapse=””) = “12345678910” lowest values of a variable
- Vector subsetting: - paste(“1”, “2”, sep = ", ") = “1, 2” - tidyr::spread(x, y) #create new cols from the x
- a[1], a[“desk”], a[c(1,2)], a[c(“desks”, - cut(1:10, c(0,5,10), labels=c("low", "high")) values, and y is the spreaded data under those
“tables”)] #low low low low low high high high high high new cols
- a[-3] = a[1:2] (factor)
- a[c(TRUE, FALSE, TRUE)] --------------------------------------------------------------------------
- a[c(TRUE, FALSE)] → will recycle Basic Data Wrangling warning(), stop(), tryCatch(){},
- a[c(TRUE, FALSE, TRUE, FALSE)] → will Data Frame error=function(cond){}, warning = function(cond){}
have NA - df <- data.frame(v1, v2, v3) finally={}
Matrix - names(df) <- c("Name", "Age", "Child")
- Instantiation - df <- data.frame("Name" = name, "Age" = age,
- m1<-matrix(3:8,ncol=3,nrow=2) #fills the "Child" = child) #can do without the “”s also
first col first - stringsAsFactors = FALSE
Advanced Data Wrangling - nodes <- xmlChildren(root) #list of child nodes - inner_join(tab1, tab2) #no NAs
readr: nodes[[2]] - full_join(tab1, tab2) #all the NAs
- books <- getNodeSet(data, - semi_join(tab1, tab2) #keep the part of the first
"/library/catalog/book") #list of nodes that match table for which we have information in the
the criterion second table, but doesn’t add the columns of the
- books <- getNodeSet(data, second
"/library/catalog/book[@type='HardCover']") - anti_join(tab1, tab2) #keep the part of the first
- xmlToList(data) table for which we have NO information in the
- xmlToDataFrame(books) second table, but doesn’t add the columns of the
Data Problems: second
readxl: - Causes of missing data: Set Operators
- Structural missing data - intersect(1:10, 6:15)
- Human error in data entry or collection - library(dplyr)
- System error in capturing data tab1 <- tab[1:5,]
- Loopholes in business process tab2 <- tab[3:7,]
- Handling Missing Data intersect(tab1, tab2) #rows 3:5
- Data can be completed w info in other - union(tab1, tab2)
Normal data loading: attributes (e.g. lat and lon) - setdiff(tab1, tab2) #output the rows that are in
- employees <- read.delim("employees.txt", - Data can be completed w info from other tab1 but not in tab2
sep=",") records (e.g. find the developer of the - setequal(tab1, tab2) ## [1] FALSE
read.table("employees.txt", header=T, sep=",") same property)
- library(readxl) - If there is no way we can replenish missing Programming Structure and Functions
read_excel("employees.xlsx", data from alternative sources, can create a - if(cond){if_output}else{else_output}
sheet="employees") new col and label NAs as “Unknown” - ifelse(cond, if_output, else_output)
- read_excel("employees.xlsx", sheet=2) - Handle Missing Numeric Data #works on vectors
Download Data from the Internet - Convert numeric data into categorical data - z <- c(TRUE, TRUE, FALSE)
- library(curl) - Replace the data with value 0 any(z) ## [1] TRUE
library(XML) - Replace the data with mean value - z <- c(TRUE, TRUE, FALSE)
theurl <- "[url http link]" - Common causes of Data problems all(z) ## [1] FALSE
url <- curl(theurl) - Data entry problem - avg <- function(x){
urldata <- readLines(url) - Logical error s <- sum(x)
data <- readHTMLTable(urldata, - Outdated n <- length(x)
stringsAsFactors = FALSE) - Different standard s/n
class(data) #list - What causes logical error in data? }
- Why CSV? - Logical mistakes in the system where the - for (i in range of values){
- More efficient in storing huge amounts of data was generated from operations that use i, which is changing across
data - Mishandling done by the data processor if the range of values
- Can be used across platforms data is secondary data }
Import Data more Efficiently - Data might be INCONSISTENT Other functions:
- system.time(care.data<-read.csv("hospital-data.c Reshaping data - apply(x, MARGIN, FUNC, ...)
sv")) - library(tidyr) #apply a function to the margin of a matrix or a
new_tidy_data <- wide_data %>% gather(year, dataframe
VS fertility, '1960':'2015') #convert wide to tidy data #for matrix, margin=1 indicates rows, 2 for cols,
- new_tidy_data <- wide_data %>% gather(year, c(1,2) for rows and cols
colclass <- c("character","character","character", fertility, -country) #specify to not gather country e.g. apply(z,2,CountOddorEven, TRUE)
"character","character", - new_tidy_data <- wide_data %>% gather(year, - x <- list(A=1:4, B=seq(0.1,1,by=0.1))
"character","factor","factor","factor", fertility, -country, convert = TRUE) lapply(x, mean) #returns list
"character","factor","factor","factor") #convert to integer - sapply(x, mean) #returns vector instead of list
- new_wide_data <- new_tidy_data %>% - vapply(x, mean, numeric(1))
system.time(care.data<-read.csv("hospital-data.c spread(year, fertility) #tidy to wide #performs like vapply() but can specify return
sv", colClasses = colclass)) Separate and Unite value type
Open Data - dat %>% separate(key, c("year", - rapply(x, function(x){x^2}) #returns a vector
- www.kaggle.com "first_variable_name", "second_variable_name"), #recursive apply
- www.kdnuggets.com/dataset/index.html fill = "right") - mapply(rep, 1:5, c(4,4,4,4,4))
- data.gov.sg #separate key col into the 3 cols #take multiple vectors as inputs
Obtain data via JSON Interface #fill tells us where to place the NA values if there ||
- JSON - lightweight data-interchange format - isn’t a 3rd variable matrix(c(rep(1,4), rep(2,4), rep(3,4), rep(4,4),
completely language independent - standard that - dat %>% separate(key, c("year", rep(5,4)), nrow = 4, ncol = 5)
is accepted by all programming languages - used "variable_name"), extra = "merge") - x <- 1:10
as a media to exchange data between different #can also merge the extra variable y <-
parties - dat %>% separate(key, c("year", factor(c("A","A","A","B","B","B","B","C","C","C"))
- library(jsonlite) "variable_name"), extra = "merge") %>% tapply(x,y,sum) #sumofA, sumofB, sumofC
url <- spread(variable_name, value) #applies the specified FUNC to each group of an
"https://api.data.gov.sg/v1/transport/carpark-avail - dat %>% separate(key, c("year", array, grouped based on levels of certain factors
ability" "first_variable_name", "second_variable_name"), - Pivot table
data <- fromJSON(url) fill = "right") %>% unite(variable_name, - tapply(murders$total, murders$region,
a <- as.data.frame(data$items$carpark_data) first_variable_name, second_variable_name) sum)
Introduction to XML %>% spread(variable_name, value) %>% - tapply(murders$total,
- XML - eXtensible Markup Language - designed rename(fertility = fertility_NA) cut(murders$population, breaks = c(0,
to store and transport data Combining Data 1e+06, 1e+07, 1e+08)), mean)
- HTML - HyperText markup Language - designed - identical(results_us_election_2016$state, - Note: tapply() only works if the studied
to display data murders$state) ## [1] FALSE numbers are in one vector
- XML is able to store all kinds of data - For HTML, #cannot simply join 2 tables - split(murders, murders$region)
the text must be pre-defined - left_join(murders, results_us_election_2016, by #split a dataframe into a list of data frames based
- library(XML) = "state") on a factor array
data <- xmlParse("books.xml") - tab1 %>% left_join(tab2)
- root <- xmlRoot(data) #simply the top level node - tab1 %>% right_join(tab2)

Lab 02 - Compound Data Structures
No ratings yet
Lab 02 - Compound Data Structures
12 pages
R Programming Basics: Vectors, Matrices, Dataframes
No ratings yet
R Programming Basics: Vectors, Matrices, Dataframes
13 pages
Data Science
No ratings yet
Data Science
20 pages
R Machine Learning Lab Guide
0% (1)
R Machine Learning Lab Guide
9 pages
R File Code
No ratings yet
R File Code
16 pages
Base R
No ratings yet
Base R
9 pages
An Introduction To R Language
No ratings yet
An Introduction To R Language
11 pages
R Studio
No ratings yet
R Studio
8 pages
Standard Deviation in RStudio Guide
No ratings yet
Standard Deviation in RStudio Guide
10 pages
R Exam
No ratings yet
R Exam
18 pages
R Basics for Beginners
No ratings yet
R Basics for Beginners
24 pages
R Programming Materials
No ratings yet
R Programming Materials
51 pages
Day 2
No ratings yet
Day 2
5 pages
Kiran R1
No ratings yet
Kiran R1
12 pages
Basic R Dplyr Session 4 Demonstration
No ratings yet
Basic R Dplyr Session 4 Demonstration
18 pages
R Misc Tools for Data Analysis
No ratings yet
R Misc Tools for Data Analysis
19 pages
Experiment 5
No ratings yet
Experiment 5
13 pages
A Short List of Some Useful R Commands: Input and Display
No ratings yet
A Short List of Some Useful R Commands: Input and Display
2 pages
Statistic and R Programming Lab Exercise
No ratings yet
Statistic and R Programming Lab Exercise
8 pages
20mia1006 - Fda - Consolidated Report
No ratings yet
20mia1006 - Fda - Consolidated Report
119 pages
Teaching R
No ratings yet
Teaching R
15 pages
DMPA Codes
No ratings yet
DMPA Codes
16 pages
M2 Dar
No ratings yet
M2 Dar
46 pages
R Programming Cheat Sheet: Data Structures
No ratings yet
R Programming Cheat Sheet: Data Structures
2 pages
R Programming Cheatsheet
100% (2)
R Programming Cheatsheet
6 pages
R Programming
No ratings yet
R Programming
50 pages
R Lecture 2-1
No ratings yet
R Lecture 2-1
28 pages
Essential R Commands Guide
No ratings yet
Essential R Commands Guide
11 pages
Assignment
No ratings yet
Assignment
4 pages
R Programming Cheat Sheet for Biometrics
100% (2)
R Programming Cheat Sheet for Biometrics
35 pages
R Programming Cheat Sheet for Biometrics
100% (1)
R Programming Cheat Sheet for Biometrics
4 pages
R Programming Cheat Sheet Guide
No ratings yet
R Programming Cheat Sheet Guide
4 pages
R Tutorial: Vectors, Matrices, Arrays
No ratings yet
R Tutorial: Vectors, Matrices, Arrays
8 pages
18 3 24 Upto Week 6 A B Latest 1
No ratings yet
18 3 24 Upto Week 6 A B Latest 1
25 pages
R Syntax Examples 1
No ratings yet
R Syntax Examples 1
6 pages
R Programming Assignments Overview
No ratings yet
R Programming Assignments Overview
25 pages
R Programming: Vector and Matrix Basics
No ratings yet
R Programming: Vector and Matrix Basics
3 pages
R Reference Card
No ratings yet
R Reference Card
6 pages
R Reference Guide for Programmers
No ratings yet
R Reference Guide for Programmers
6 pages
Ba Ca4
No ratings yet
Ba Ca4
3 pages
Lecture 5 (Managing and Understanding Data)
No ratings yet
Lecture 5 (Managing and Understanding Data)
9 pages
R Commands
No ratings yet
R Commands
18 pages
Week 1-B. Data in R
No ratings yet
Week 1-B. Data in R
5 pages
Statistic and R Programming Lab Exercise
No ratings yet
Statistic and R Programming Lab Exercise
24 pages
Base R
No ratings yet
Base R
2 pages
Da Session 4
No ratings yet
Da Session 4
75 pages
Data Visualisation L9+L10 Lab 1 R Basics: Printing Character
No ratings yet
Data Visualisation L9+L10 Lab 1 R Basics: Printing Character
9 pages
X - 15 x-1 2. Print ('Hello Word!') ## (1) "Hello Word!" 3. X - 4 y - 5 Z - X+y Print (Z) 4. X - 4 y - 5 Cat ('The Sum of X and y Is', X+y)
No ratings yet
X - 15 x-1 2. Print ('Hello Word!') ## (1) "Hello Word!" 3. X - 4 y - 5 Z - X+y Print (Z) 4. X - 4 y - 5 Cat ('The Sum of X and y Is', X+y)
15 pages
Matrix, Dataframes, List
No ratings yet
Matrix, Dataframes, List
8 pages
DR - Pierpaolo-Delser - Introduction R
No ratings yet
DR - Pierpaolo-Delser - Introduction R
83 pages
Sunil Test
No ratings yet
Sunil Test
15 pages
R Network Analysis with igraph Guide
No ratings yet
R Network Analysis with igraph Guide
62 pages
R Programing
No ratings yet
R Programing
32 pages
Wiring Diagram For CCTV System
67% (3)
Wiring Diagram For CCTV System
3 pages
Lecture Notes Combinatorics
No ratings yet
Lecture Notes Combinatorics
145 pages
Helicopter Driveshaft Wear Limits
No ratings yet
Helicopter Driveshaft Wear Limits
3 pages
AFW DataSheet 2020
No ratings yet
AFW DataSheet 2020
2 pages
Unit 4 MCQS
No ratings yet
Unit 4 MCQS
9 pages
Study On Design and Stability of Tunnels by Using Rock Mass Classification Systems at Neelum Jhelum Hydro-Power Plant Project in Azad Kashmir
No ratings yet
Study On Design and Stability of Tunnels by Using Rock Mass Classification Systems at Neelum Jhelum Hydro-Power Plant Project in Azad Kashmir
7 pages
Understanding Functions and Their Operations
No ratings yet
Understanding Functions and Their Operations
25 pages
Bias & Blinding
No ratings yet
Bias & Blinding
33 pages
Courier Management System
No ratings yet
Courier Management System
2 pages
Multimodal Machine Learning: A Survey and Taxonomy: Tadas Baltru Saitis, Chaitanya Ahuja, and Louis-Philippe Morency
No ratings yet
Multimodal Machine Learning: A Survey and Taxonomy: Tadas Baltru Saitis, Chaitanya Ahuja, and Louis-Philippe Morency
20 pages
Namma Kalvi 12th Chemistry Unit 3 Powerpoint Presentation Material em 218224
No ratings yet
Namma Kalvi 12th Chemistry Unit 3 Powerpoint Presentation Material em 218224
100 pages
BUSINESS ANALYTICS-presentation Groups
No ratings yet
BUSINESS ANALYTICS-presentation Groups
7 pages
Unit Review
No ratings yet
Unit Review
4 pages
Fluid Mechanics for Engineers
No ratings yet
Fluid Mechanics for Engineers
35 pages
Photosynthesis Worksheet
No ratings yet
Photosynthesis Worksheet
4 pages
Arihant NEET 34 Years Chapterwise Solutions Chemistry 2022
100% (5)
Arihant NEET 34 Years Chapterwise Solutions Chemistry 2022
253 pages
Advanced Math Problem Solutions
No ratings yet
Advanced Math Problem Solutions
9 pages
Pharmaceutical Film Coating
No ratings yet
Pharmaceutical Film Coating
72 pages
GAT Math Practice Questions
No ratings yet
GAT Math Practice Questions
15 pages
Structural Steel Connection Design
100% (1)
Structural Steel Connection Design
8 pages
기구학 1장 솔루션
No ratings yet
기구학 1장 솔루션
35 pages
Hioki LR5000 Series
No ratings yet
Hioki LR5000 Series
8 pages
Arsenic Removal With Bayoxide® E33 Ferric Oxide Media & SORB 33® As Adsorption
No ratings yet
Arsenic Removal With Bayoxide® E33 Ferric Oxide Media & SORB 33® As Adsorption
25 pages
NCERT Solutions For Class 1 To 12 - Free PDF Download
No ratings yet
NCERT Solutions For Class 1 To 12 - Free PDF Download
13 pages
Mass and Balance
100% (2)
Mass and Balance
84 pages
Pengaruh Pemberian Aromaterapi Lavender Terhadap Penurunan Nyeri Luka Ibu Post Sectio Caesarea Di RST DR Soepraoen Kesdam V/Brawijaya Malang
No ratings yet
Pengaruh Pemberian Aromaterapi Lavender Terhadap Penurunan Nyeri Luka Ibu Post Sectio Caesarea Di RST DR Soepraoen Kesdam V/Brawijaya Malang
7 pages
Biomedical Eng. Math Module
No ratings yet
Biomedical Eng. Math Module
11 pages
Dynamic Programming in Computer Science
No ratings yet
Dynamic Programming in Computer Science
49 pages
Well-Designed Apps Rock - Great Software Begins Here - Head First Object-Oriented Analysis and Design
No ratings yet
Well-Designed Apps Rock - Great Software Begins Here - Head First Object-Oriented Analysis and Design
39 pages
Exercise Chapter 1
No ratings yet
Exercise Chapter 1
7 pages

R Programming Basics for Beginners

Uploaded by

R Programming Basics for Beginners

Uploaded by

Fundamentals of R Programming -

m1<-matrix(3:8,nrow=2) #R can infer the - Data Frame Subsetting

You might also like