EM622 Data Analysis and Visualization Techniques For Decision-Making

This document provides an introduction to data analysis and visualization techniques in R. It covers importing and manipulating data, basic operations in R like installing packages and exporting data, and different data structures like vectors, arrays, and data frames. The document contains code examples for importing data from files, the web, and other sources. It also demonstrates accessing and subsetting elements within different data structures.

Uploaded by

Ridhi B

EM622 Data Analysis and Visualization

Techniques for Decision-Making

Introduction to R and Data Manipulation

1 / 47
Getting Started
RStudio console

[Figure: RStudio window with its panes labelled: Options (Import dataset), File Viewer (Data & Code), Console (for typing commands), Plots]

2 / 47
Your first graph
Copy and paste:
data(iris)
plot(Sepal.Width ~ Sepal.Length, data=iris,
col=c("red","orange","blue")[iris$Species],pch=16,
xlab="Sepal Length", ylab="Sepal Width")
legend("topright", legend=levels(iris$Species),
col=c("red","orange","blue"), bty="n",pch=16)

3 / 47
Agenda

1. Basic operations
2. Data structures
3. Data Manipulation
4. Your First Graph

4 / 47
Basic Operation - Import data
1. Import data from the drop-down menu in RStudio:

2. Import data from SAS, SPSS, etc.: http://www.statmethods.net/input/importingdata.html
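
The menu import typically generates an ordinary R command, so the same step can be typed in the console. A minimal sketch for a CSV file, assuming a small made-up file written to a temporary location (the file, the object name `scores`, and its columns are hypothetical and not part of the course data):

```r
# Write a tiny, made-up CSV file to a temporary location (illustration only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("team,first,second",
             "A,92,70",
             "B,89,73"), csv_path)

# Import the file; header = TRUE treats the first line as column names
scores <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
str(scores)
```

Replacing `csv_path` with the path to a real file is the only change needed for actual data.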

5 / 47
Intermediate - Import data

## install.packages(c("tseries","lubridate"))
library(tseries)
library(lubridate)
amazon <- as.data.frame(get.hist.quote("amzn",
start="2013-1-1", end="2018-9-15", quote=c("Cl")))

## time series starts 2013-01-02

## time series ends 2018-09-14

amazon$Date<-ymd(row.names(amazon))
tail(amazon)

## Close Date
## 2018-09-07 1952.07 2018-09-07
## 2018-09-10 1939.01 2018-09-10
## 2018-09-11 1987.15 2018-09-11
## 2018-09-12 1990.00 2018-09-12
## 2018-09-13 1989.87 2018-09-13
## 2018-09-14 1970.19 2018-09-14

6 / 47
Advanced - Import data
# list of addresses for raw data.
addressList <- list(
  drives_address = "http://stats.nba.com/js/data/sportvu/drivesData.js",
  defense_address = "http://stats.nba.com/js/data/sportvu/defenseData.js",
  catchshoot_address = "http://stats.nba.com/js/data/sportvu/catchShootData.js")

# function that grabs the data from the website and converts it to an R data frame
readIt <- function(address) {
  web_page <- readLines(address)

  ## regex to strip javascript bits and convert raw to csv format
  x1 <- gsub("[\\{\\}\\]]", "", web_page, perl = TRUE)
  x2 <- gsub("[\\[]", "\n", x1, perl = TRUE)
  x3 <- gsub("\"rowSet\":\n", "", x2, perl = TRUE)
  x4 <- gsub(";", ",", x3, perl = TRUE)

  # read the resulting csv with read.table()
  nba <- read.table(textConnection(x4), header = TRUE,
                    sep = ",", skip = 2, stringsAsFactors = FALSE)
  return(nba)
}
# download the data
df_list <- lapply(addressList, readIt)

7 / 47
Advanced (Cont.) - Import data

# check the data

catchshoot<-df_list$catchshoot_address
#str(catchshoot) # Get information about structure
head(catchshoot)

## PLAYER_ID PLAYER FIRST_NAME LAST_NAME TEAM_ABBREVIATION GP MIN

## 1 202691 Klay Thompson Klay Thompson GSW 78 34.0
## 2 1717 Dirk Nowitzki Dirk Nowitzki DAL 53 26.3
## 3 2594 Kyle Korver Kyle Korver CLE 35 24.6
## 4 201586 Serge Ibaka Serge Ibaka TOR 23 30.9
## 5 201567 Kevin Love Kevin Love CLE 60 31.4
## 6 202331 Paul George Paul George IND 74 35.8
## PTS FGM FGA FG_PCT FG3M FG3A FG3_PCT EFG_PCT PTS_TOT X
## 1 11.5 4.2 9.3 0.454 3.1 7.1 0.438 0.621 899 NA
## 2 8.1 3.4 7.5 0.446 1.3 3.5 0.388 0.535 427 NA
## 3 7.6 2.7 5.7 0.470 2.2 4.7 0.470 0.662 265 NA
## 4 7.5 2.9 6.9 0.424 1.7 4.3 0.394 0.547 173 NA
## 5 7.5 2.6 6.6 0.388 2.3 5.8 0.395 0.561 448 NA
## 6 7.4 2.7 6.1 0.437 2.0 4.8 0.420 0.603 546 NA

8 / 47
Advanced: scraping the web using R

#install.packages("rvest")
library(rvest)
# Store web url
lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")
#Scrape the website for the movie rating
rating <- lego_movie %>%
html_nodes("strong span") %>%
html_text() %>%
as.numeric()
#rating
# Scrape the website for the cast
cast <- lego_movie %>%
html_nodes("#titleCast .itemprop span") %>%
html_text()
#cast

https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/

9 / 47
Advanced (Cont.): scraping the web using R

#Scrape the website for the movie rating

rating

## [1] 7.8

# Scrape the website for the cast

cast

## character(0)

https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/

10 / 47
Basic Operation - Export data

- To export a data frame to a spreadsheet, the easiest way is to use write.csv().
- By default, write.csv() includes row names, but these are usually
unnecessary and may cause confusion.
- The exported file is stored in the working directory.
# export 'mydf' as a .csv file:
write.csv(mydf,"test.csv")
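
Since row names are usually unwanted, they can be switched off with the `row.names` argument of `write.csv()`. A minimal sketch, assuming a small made-up data frame (`scores_df` and the temporary file path are hypothetical names used only for illustration):

```r
# A small made-up data frame (illustration only)
scores_df <- data.frame(team = c("A", "B"), first = c(92, 89))

# Write to a temporary file without the row-name column
out_path <- tempfile(fileext = ".csv")
write.csv(scores_df, out_path, row.names = FALSE)

# Read the file back to check that only the real columns were stored
check <- read.csv(out_path)
names(check)
```

Without `row.names = FALSE`, the file would start with an extra unnamed column holding 1, 2, ...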

- How do you find your working directory?

# returns an absolute filepath representing the current working directory of the R process
getwd()
## [1] "/Users/annieyu/Dropbox/622 visualization/lectures/Lecture 3_intro_t

- Write data to other file formats:

http://www.cookbook-r.com/Data_input_and_output/Writing_data_to_a_file/

11 / 47
Basic Operation - Install packages
Two ways to install a package:
1. From the drop-down menu in RStudio:

2. Using a command:
# Download and install packages from CRAN-like repositories or from local files
install.packages(c("ggplot2","tidyr","dplyr"))
# Always load a package before calling its functions:
library(ggplot2)
12 / 47
Basic Operation - Update packages
1. To update all your installed packages to the latest versions available:

update.packages()

2. To store your R code, always create an R script:

3. Export your images to pdf/png format:
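
Images can also be exported from code rather than from the RStudio Plots pane. A minimal sketch using the base `pdf()` graphics device, assuming the built-in iris data and a temporary output file (the file name is hypothetical):

```r
# Open a PDF graphics device; everything plotted next goes to this file
pdf_path <- file.path(tempdir(), "iris_plot.pdf")
pdf(pdf_path, width = 7, height = 5)   # width and height are in inches

plot(Sepal.Width ~ Sepal.Length, data = iris, pch = 16)

# Close the device so the file is actually written
dev.off()
file.exists(pdf_path)
```

The base `png()` device follows the same open, plot, `dev.off()` pattern, with width and height given in pixels by default.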

13 / 47
Getting Started
R programming style

- R is case sensitive: a and A are two different objects.

- The assignment symbol is <-. Alternatively, the classical = symbol
can be used.
- The # symbol starts a comment that runs to the end of the line:

# This is a comment
# Assigning value 1 to object a.
# The two following statements are equivalent:
a <- 1
a = 1

14 / 47
Data Structure
1. Vector
2. Matrix
3. Array
4. Data Frame
5. List

http://venus.ifca.unican.es/Rintro/dataStruct.html

15 / 47
Data Structure - Variable
Like most other languages, R lets you assign values to variables and refer
to them by name:
x <- 1
# x gets 1
y <- 2
# c(...): a generic function which combines values into a vector
z <- c(x,y)
# evaluate z to see what's stored as z
z

## [1] 1 2

Notice that the substitution is done at the time that the value is assigned
to z, not the time that z is evaluated:
y <- 5
z

## [1] 1 2

16 / 47
Data Structure - Vector
Fetch element(s) by location in a vector:

a <- c(1,2,3,4,5,6,7,8)
a

## [1] 1 2 3 4 5 6 7 8

# fetch the 5th item in vector a:

a[5]

## [1] 5

# fetch item 1 through 6:

a[1:6]

## [1] 1 2 3 4 5 6

# fetch item 1, 3, 7:
a[c(1,3,7)]

## [1] 1 3 7

17 / 47
Data Structure - Array
- In R, you can construct more complicated data structures than just
vectors.
- An array object is just a vector that’s associated with a dimension
attribute.

# Define an array
a <- array(c(1, 2, 3, 4, 5, 6, 7, 8), dim=c(2, 4))
a

##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8

# fetch one cell in array a:

a[2,3]

## [1] 6

# fetch 1st row only

a[1,]

## [1] 1 3 5 7
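
The statement that an array is a vector with a dimension attribute can be checked directly. A minimal sketch (the object names `v` and `arr` are chosen here only for illustration):

```r
# Start from a plain vector
v <- 1:8
is.null(dim(v))        # a plain vector has no dim attribute

# Attaching a dim attribute turns the same values into a 2 x 4 array
dim(v) <- c(2, 4)
v[2, 3]                # element in row 2, column 3

# array() builds an equivalent object in one step
arr <- array(1:8, dim = c(2, 4))
all(v == arr)
```

Values fill the array column by column, which is why the first row reads 1 3 5 7.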

18 / 47
Data Structure - Data frame
- A data frame is a list that contains multiple named vectors of
the same length.
- Like a spreadsheet or a database table, it is particularly good for
representing experimental data.
# data.frame() is a function that creates data frames
team <-c("A","B","C","D","E")
first <- c(92, 89, 94, 72, 59)
second <- c(70, 73, 77, 90, 102)
mydf <- data.frame(team, first, second)
mydf

## team first second

## 1 A 92 70
## 2 B 89 73
## 3 C 94 77
## 4 D 72 90
## 5 E 59 102

# refer to the components of a data frame by name:

mydf$team

## [1] A B C D E
## Levels: A B C D E
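
The "list of equal-length vectors" description can be verified on a small data frame. A minimal sketch with made-up values (the object `scores` is hypothetical). The `Levels:` line in the output above appears when strings are converted to factors, which was the default before R 4.0.0; here `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version:

```r
# Made-up scores for illustration
scores <- data.frame(team = c("A", "B", "C"),
                     first = c(92, 89, 94),
                     stringsAsFactors = FALSE)

is.list(scores)            # a data frame is stored as a list of columns
sapply(scores, length)     # every column has the same length
scores[["first"]]          # same column as scores$first
class(scores$team)         # character, because stringsAsFactors = FALSE
```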
19 / 47
Data Structure - List
- R has a built-in data type for mixing objects of different types, called
lists.

# list() function to construct R lists.

# Example: a list containing two character vectors and a data frame
e <- list(thing=c("hat","shoes"), size=c("8.25","5"), myData=mydf)
e

## $thing
## [1] "hat" "shoes"
##
## $size
## [1] "8.25" "5"
##
## $myData
## team first second
## 1 A 92 70
## 2 B 89 73
## 3 C 94 77
## 4 D 72 90
## 5 E 59 102

20 / 47
Data Structure - List Cont

# fetch the 1st item in the list:

e$thing

## [1] "hat" "shoes"

e[1]

## $thing
## [1] "hat" "shoes"

# fetch the 1st row in the data frame

# which is the third component in the list:
e$myData[1,]

## team first second

## 1 A 92 70
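
The output above shows that `e$thing` and `e[1]` print differently: single brackets return a one-element list, while `$` and double brackets return the component itself. A minimal sketch on a smaller made-up list (the names `sub_list` and `element` are chosen here for illustration):

```r
# A small made-up list (illustration only)
e <- list(thing = c("hat", "shoes"), size = c("8.25", "5"))

sub_list <- e[1]       # single brackets keep the list wrapper
element  <- e[[1]]     # double brackets extract the component itself

class(sub_list)        # "list"
class(element)         # "character"
identical(element, e$thing)
```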

21 / 47
Data Structure - Get Info about structure
# Here are some sample variables for example:
n <- 1:4
let <- LETTERS[1:4]
let

## [1] "A" "B" "C" "D"

df <- data.frame(n, let)
df

## n let
## 1 1 A
## 2 2 B
## 3 3 C
## 4 4 D

# Get information about structure

str(df)

## 'data.frame': 4 obs. of 2 variables:

## $ n : int 1 2 3 4
## $ let: Factor w/ 4 levels "A","B","C","D": 1 2 3 4

22 / 47
Data Structure - Get Info about structure

# Get the length of a vector

length(n)

## [1] 4

# Number of rows
nrow(df)

## [1] 4

# Number of columns
ncol(df)

## [1] 2

# Get num of rows and columns

dim(df)

## [1] 4 2

23 / 47
Data Exploration [1]
“Happy families are all alike; every unhappy family is unhappy in its own
way.” Leo Tolstoy

“Tidy datasets are all alike, but every messy dataset is messy in its own
way.” Hadley Wickham

[1] Hadley Wickham. http://r4ds.had.co.nz/tidy-data.html

24 / 47
Working with NA and NaN
There are some special values in R:
- NA: Not Available (i.e. missing values)

- NaN: Not a Number

- Inf: Infinity

- -Inf: Minus Infinity

# For instance:
0/0

## [1] NaN

1/0

## [1] Inf

# Here's how to test whether a variable has one of these values:

y <- NA
# Is y NA?
is.na(y)

## [1] TRUE
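
Each of the special values has its own test function in base R. A minimal sketch covering all four (the vector `vals` is made up for illustration):

```r
vals <- c(1, NA, 0/0, 1/0, -1/0)   # 1, NA, NaN, Inf, -Inf

is.na(vals)        # TRUE for both NA and NaN
is.nan(vals)       # TRUE only for NaN
is.infinite(vals)  # TRUE for Inf and -Inf
is.finite(vals)    # TRUE only for ordinary numbers
```

Note that `is.na()` also flags NaN, so `is.nan()` is needed to tell the two apart.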

25 / 47
Working with NA and NaN
Ignoring "bad" values in vector summary functions:
- If you run functions like mean() or sum() on a vector or data frame column
containing NA or NaN, they return NA or NaN (a "bad" value) instead of a number.
- Many of these functions take the argument na.rm, which tells them to
ignore these values:
df1 <- c(1, 2, 3, NA, 5)
mean(df1)

## [1] NA

mean(df1, na.rm=TRUE)

## [1] 2.75

df2 <- c(1, 2, 3, NaN, 5)

sum(df2)

## [1] NaN

sum(df2, na.rm=TRUE)

## [1] 11
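
The same `na.rm` argument is available for column-wise summaries of a data frame. A minimal sketch with `colMeans()`, assuming a made-up data frame (`hw` and its columns are hypothetical):

```r
# Made-up scores with missing entries (illustration only)
hw <- data.frame(hw1 = c(80, NA, 90), hw2 = c(70, 85, NaN))

colMeans(hw)                              # each column mean is NA or NaN
col_means <- colMeans(hw, na.rm = TRUE)   # missing entries are skipped per column
col_means
sum(is.na(hw$hw1))                        # count the missing values in one column
```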
26 / 47
Example: Import Data
library(readr)
HW <- read_csv("dataSets/Student_List_HW.csv")
HW<-as.data.frame(HW)
summary(HW)

## Last_Name First_Name Status

## Length:20 Length:20 Length:20
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Home Homework_1 Homework_2 Homework_3
## Length:20 Min. :58.00 Min. :77.00 Min. : 80.00
## Class :character 1st Qu.:70.50 1st Qu.:80.00 1st Qu.: 85.50
## Mode :character Median :74.50 Median :88.00 Median : 90.50
## Mean :77.39 Mean :87.35 Mean : 90.90
## 3rd Qu.:84.25 3rd Qu.:93.00 3rd Qu.: 98.25
## Max. :99.00 Max. :99.00 Max. :100.00
## NA's :2

27 / 47
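Example: Replace Missing Values
# replace missing Homework_1 scores with 0
HW$Homework_1[is.na(HW$Homework_1)]<-0
# set the home state for the student(s) named Garcia
HW$Home[which(HW$Last_Name=="Garcia")]<-"NJ"
# label any remaining missing home values as "Unknown"
HW$Home[is.na(HW$Home)]<-"Unknown"
# drop the rows that still contain an NA in any column
HW<-HW[complete.cases(HW),]
summary(HW)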

## Last_Name First_Name Status

## Length:18 Length:18 Length:18
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Home Homework_1 Homework_2 Homework_3
## Length:18 Min. : 0.00 Min. :77.00 Min. : 80.00
## Class :character 1st Qu.:66.75 1st Qu.:80.00 1st Qu.: 86.25
## Mode :character Median :74.50 Median :86.00 Median : 90.50
## Mean :70.28 Mean :86.39 Mean : 91.33
## 3rd Qu.:84.25 3rd Qu.:91.75 3rd Qu.: 98.75
## Max. :99.00 Max. :98.00 Max. :100.00
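
The replacement steps are easier to follow on a table small enough to read in full. A minimal sketch under the assumption of a made-up three-row data frame (the `students` object, names, and scores are hypothetical, not the course file):

```r
# Made-up student records (hypothetical names and scores)
students <- data.frame(
  Last_Name  = c("Lee", "Kim", "Diaz"),
  Home       = c("NJ", NA, "NY"),
  Homework_1 = c(88, NA, 75),
  Homework_2 = c(90, 85, NA),
  stringsAsFactors = FALSE
)

# Logical indexing: is.na() marks the cells to overwrite
students$Homework_1[is.na(students$Homework_1)] <- 0
students$Home[is.na(students$Home)] <- "Unknown"

# complete.cases() is TRUE for rows without any remaining NA
cleaned <- students[complete.cases(students), ]
cleaned
```

Only the row with the still-missing Homework_2 score is dropped; the other gaps were filled before `complete.cases()` ran.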

28 / 47
Subset Observations (Rows) [2]

[2] https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
29 / 47
Subset Observations (Rows) Cont.

#load dplyr
library(dplyr)
Subset_HW_1 <- filter(HW,Status == "Master")
head(Subset_HW_1)

## Last_Name First_Name Status Home Homework_1 Homework_2 Homework_3

## 1 Brown Susan Master NJ 74 88 98
## 2 Wilson Karen Master NJ 0 93 84
## 3 Moore Nancy Master PA 74 91 89
## 4 Taylor Betty Master GA 93 92 88
## 5 Anderson Anthony Master CA 96 98 100
## 6 Thomas Donald Master NJ 82 77 96

30 / 47
Subset Variables (Columns)

There are many options for choosing columns.

31 / 47
Subset Variables (Columns) Cont.

Subset_HW_2 <- select(HW,contains("Name"),contains("Homework"))

head(Subset_HW_2)

## Last_Name First_Name Homework_1 Homework_2 Homework_3

## 1 Smith Patricia 82 97 82
## 2 Johnson Jennifer 0 77 99
## 3 Williams Robert 99 80 80
## 4 Jones Michael 75 82 86
## 5 Brown Susan 74 88 98
## 7 Miller Richard 85 78 82

32 / 47
Subset Observations (Rows) and Variables (Columns)

Subset_HW_3 <- subset(HW, Status == "Master",
                      select=c("Last_Name","First_Name",
                               "Homework_1","Homework_2","Homework_3"))
head(Subset_HW_3)

## Last_Name First_Name Homework_1 Homework_2 Homework_3

## 5 Brown Susan 74 88 98
## 8 Wilson Karen 0 93 84
## 9 Moore Nancy 74 91 89
## 10 Taylor Betty 93 92 88
## 11 Anderson Anthony 96 98 100
## 12 Thomas Donald 82 77 96
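
`subset()` is a convenience wrapper around ordinary bracket indexing, which selects rows and columns in the form `[rows, columns]`. A minimal sketch comparing the two on a made-up table (the `hw` object and its values are hypothetical):

```r
# Made-up records (illustration only)
hw <- data.frame(
  Last_Name  = c("Lee", "Kim", "Diaz"),
  Status     = c("Master", "PhD", "Master"),
  Homework_1 = c(88, 92, 75),
  stringsAsFactors = FALSE
)

# subset(): condition on rows, select = columns to keep
by_subset <- subset(hw, Status == "Master",
                    select = c("Last_Name", "Homework_1"))

# Same selection with base bracket indexing: [rows, columns]
by_bracket <- hw[hw$Status == "Master", c("Last_Name", "Homework_1")]

all.equal(by_subset, by_bracket)
```

Both forms keep the original row numbers, which is why the output above starts at row 5 rather than 1.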

33 / 47
Pipe Operator

Piping makes code more readable and allows us to chain several actions,
such as sorting, filtering, or creating a variable, in a single statement.

34 / 47
Pipe Operator Cont.

HW %>%
filter(Status == "Master") %>%
select(contains("Name"),contains("Homework"))%>%
arrange(desc(Homework_1))%>%
head()

## Last_Name First_Name Homework_1 Homework_2 Homework_3

## 1 Anderson Anthony 96 98 100
## 2 Taylor Betty 93 92 88
## 3 Garcia Linda 93 91 100
## 4 Thomas Donald 82 77 96
## 5 Brown Susan 74 88 98
## 6 Moore Nancy 74 91 89

35 / 47
Create New Columns and Re-order
The mutate() function will add new columns to the data frame.
Arrange or re-order rows using arrange().

HW_update<-HW %>%
filter(Status != "Unknown") %>%
mutate(Homework_Average = 0.2*Homework_1+0.3*Homework_2+0.5*Homework_3)%>%
arrange(desc(Homework_Average))
head(HW_update)

## Last_Name First_Name Status Home Homework_1 Homework_2

## 1 Anderson Anthony Master CA 96 98
## 2 Garcia Linda Master NJ 93 91
## 3 Wang Thomas PhD CHINA 72 98
## 4 Martin Morgan Undergraduate NJ 72 88
## 5 Brown Susan Master NJ 74 88
## 6 Taylor Betty Master GA 93 92
## Homework_3 Homework_Average
## 1 100 98.6
## 2 100 95.9
## 3 95 91.3
## 4 99 90.3
## 5 98 90.2
## 6 88 90.2

36 / 47
Split-Apply-Combine
Idea: split up a big problem into manageable pieces, apply a function to
each piece and then combine all the pieces together.

Data          Split (by X)      Apply (average)      Combine

X  Y          X  Y              X  Y                 X  Y
A  2          A  2              A  3                 A  3
A  4          A  4                                   B  2.5
B  0                                                 C  7.5
B  5          X  Y              X  Y
C  5          B  0              B  2.5
C  10         B  5

              X  Y              X  Y
              C  5              C  7.5
              C  10
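
The six observations in the diagram can be run through the three steps with base R. A minimal sketch, assuming the X and Y values shown in the diagram (the object names `d`, `pieces`, `averages`, and `result` are chosen here for illustration):

```r
# The six observations from the diagram
d <- data.frame(X = c("A", "A", "B", "B", "C", "C"),
                Y = c(2, 4, 0, 5, 5, 10))

pieces   <- split(d$Y, d$X)        # split: one vector of Y values per group
averages <- sapply(pieces, mean)   # apply: mean of each piece
result   <- data.frame(X = names(averages),   # combine: one row per group
                       Y = unname(averages))
result
```

The dplyr functions `group_by()` and `summarise()` perform the same three steps in one pipeline.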

37 / 47
Group Data
Group operations implement the “split-apply-combine” concept:

38 / 47
Group Data

Group_Summarise_HW<- HW %>%
filter(Status != "Unknown") %>%
mutate(Homework_Average = 0.2*Homework_1+0.3*Homework_2+0.5*Homework_3)%>%
group_by(Status) %>%
summarise(Homework_Average=mean(Homework_Average),
Number_of_Student=length(Status))%>%
arrange(desc(Homework_Average))
head(Group_Summarise_HW)

## # A tibble: 3 x 3
## Status Homework_Average Number_of_Student
## <chr> <dbl> <int>
## 1 Master 87.4 8
## 2 PhD 86.4 2
## 3 Undergraduate 83.7 8

39 / 47
Reshape Data [3]
Let's change the layout of a data set. Our tools from the tidyr library are:

[3] https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
40 / 47
Reshape Data Cont.

- gather() makes "wide" data longer

- unite() combines two variables into one variable

#load tidyr
library(tidyr)
tidyr_HW<- HW %>% unite(Name, First_Name, Last_Name, sep = " ")%>%
select(-c(Status,Home)) %>%
gather(Homework, Score, Homework_1:Homework_3)
head(tidyr_HW)

## Name Homework Score

## 1 Patricia Smith Homework_1 82
## 2 Jennifer Johnson Homework_1 0
## 3 Robert Williams Homework_1 99
## 4 Michael Jones Homework_1 75
## 5 Susan Brown Homework_1 74
## 6 Richard Miller Homework_1 85
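
In current versions of tidyr, `gather()` is marked as superseded and `pivot_longer()` is the recommended way to make wide data longer. A minimal sketch of the equivalent call, assuming a made-up wide table (the `wide` object, names, and scores are hypothetical):

```r
library(tidyr)

# Made-up wide table: one column per homework (illustration only)
wide <- data.frame(Name = c("Ann Lee", "Bo Kim"),
                   Homework_1 = c(82, 75),
                   Homework_2 = c(97, 82))

# One row per student and homework
long <- pivot_longer(wide, cols = Homework_1:Homework_2,
                     names_to = "Homework", values_to = "Score")
long
```

The row order differs from `gather()`: `pivot_longer()` keeps each student's rows together, whereas `gather()` stacks one homework column after another.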

41 / 47
Merge Data
Exam<- read_csv("dataSets/Student_List_Exam.csv")
Exam<-as.data.frame(Exam)
head(Exam,3)

## Last_Name First_Name Exam Project

## 1 Smith Patricia 77 65
## 2 Johnson Jennifer 100 96
## 3 Williams Robert 92 53

HW_update<-mutate(HW,Homework_Average =
0.2*Homework_1+0.3*Homework_2+0.5*Homework_3)
Merged_df<-inner_join(HW_update, Exam,by=c("Last_Name","First_Name"))
head(Merged_df,3)

## Last_Name First_Name Status Home Homework_1 Homework_2 Homework_3

## 1 Smith Patricia Undergraduate MD 82 97 82
## 2 Johnson Jennifer Undergraduate NY 0 77 99
## 3 Williams Robert Undergraduate NY 99 80 80
## Homework_Average Exam Project
## 1 86.5 77 65
## 2 72.6 100 96
## 3 83.8 92 53
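
An inner join keeps only the rows whose key values appear in both tables. A minimal sketch of that behaviour using base R's `merge()`, which performs an inner join by default, on two made-up tables (the `hw` and `exam` objects and all names in them are hypothetical):

```r
# Made-up tables sharing the key columns (illustration only)
hw   <- data.frame(Last_Name  = c("Lee", "Kim", "Diaz"),
                   First_Name = c("Ann", "Bo", "Cy"),
                   Homework_1 = c(88, 92, 75))
exam <- data.frame(Last_Name  = c("Lee", "Diaz", "Park"),
                   First_Name = c("Ann", "Cy", "Di"),
                   Exam       = c(77, 92, 60))

# Inner join: only students present in both tables are kept
merged <- merge(hw, exam, by = c("Last_Name", "First_Name"))
merged
```

Kim (no exam record) and Park (no homework record) are dropped. `merge()` sorts the result by the key columns, while `inner_join()` keeps the row order of its first argument.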

42 / 47
ggplot2

- ggplot2 is an R package designed for creating high-quality plots.

- ggplot2 is based on the layered grammar of graphics, which means
that plots can be constructed layer by layer.

#you need to install the package just once

install.packages('ggplot2')

43 / 47
Composition of plots in ggplot2
Plots have two main components: 1) data to use and 2) type of plot.

ggplot(data=economics) + geom_point(aes(x=date, y=unemploy))

ggplot(): basic function for plotting
data=economics: specify the dataset (data to use)
geom_point(): we want points (type of plot)
aes(): aesthetics
x=date: specify what goes on the X axis
y=unemploy: specify what goes on the Y axis
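
Because plots are built layer by layer, the same call can be extended one `+` at a time. A minimal sketch using the economics data set that ships with ggplot2; the extra line layer and the axis label text are additions made here for illustration:

```r
library(ggplot2)

# Data and aesthetic mappings shared by the layers added later
p <- ggplot(data = economics, aes(x = date, y = unemploy))

# Add two layers: points, then a line through the same data
p <- p + geom_point() + geom_line()

# Axis labels are added the same way (label wording is ours)
p <- p + labs(x = "Date", y = "Unemployed (thousands)")
p
```

Placing `aes()` inside `ggplot()` rather than inside `geom_point()` lets every layer reuse the same x and y mappings.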

44 / 47
Our first official graph
library(ggplot2)
ggplot(data=iris)+
geom_point(aes(x=Sepal.Width,y=Sepal.Length,colour=Species))

[Figure: scatter plot of Sepal.Length (y axis) against Sepal.Width (x axis), with points coloured by Species: setosa, versicolor, virginica]

45 / 47
Resources

1. Rob Kabacoff, “R in Action”: https://www.amazon.com/Action-Data-Analysis-Graphics/dp/1617291382/ref=pd_sbs_14_t_0?_encoding=UTF8&psc=1&refRID=EEBN1DRHWQ6J09Z6TTBY

2. Michael J Crawley, “The R Book”: http://users.humboldt.edu/ygkim/CrawleyMJ_TheRBook.pdf

3. Joseph Adler, “R in a Nutshell”: http://www.amazon.com/R-Nutshell-Joseph-Adler/dp/144931208X

4. Quick-R tutorial: http://www.statmethods.net/input/datatypes.html

5. Cookbook for R, Data input and output: http://www.cookbook-r.com/Data_input_and_output/Writing_data_to_a_file/

46 / 47
What have we learned?

1. Defining data structures such as vectors, arrays, lists, and data frames.

2. Basic operations such as installing packages and importing/exporting datasets.
3. Common data manipulation operations such as filtering rows,
selecting specific columns, re-ordering rows, adding new columns,
summarizing data, and performing the "split-apply-combine" task.
4. Drawing a graph.

47 / 47
