
02b Data Structures Datasets

This document provides tutorials and code examples on data structures in R. It begins by listing various tutorial links on data structures like vectors, matrices, and data frames. It then demonstrates how to create vectors using functions like c(), seq(), and rep(). Examples are given for numeric, character, date, and logical vectors. The document also shows how to reference, subset, filter, sort, and perform vectorized operations on data structures in R.

Uploaded by

Alexandra Gabriela Grecu


Al. I. Cuza University of Iași
Faculty of Economics and Business Administration
Department of Accounting, Information Systems and Statistics

Data Analysis & Data Science with R
Data structures in R. Built-in Datasets
By Marin Fotache

Data structures in R

Tutorials (and code) on Data Structures

Data structures (Advanced R by Hadley Wickham)
http://adv-r.had.co.nz/Data-structures.html

1.2 Variables (Variables and Data Structures)
https://www.youtube.com/watch?v=DG7YNf8kb3w

2 - Introduction to R: Atomic Classes
https://www.youtube.com/watch?v=271FKAYavYE
http://repidemiology.wordpress.com/introduction-to-r-code/

1.3 Vectors (Variables and Data Structures)
https://www.youtube.com/watch?v=QygSZw77Hs8

3 - Introduction to R: Vectors
https://www.youtube.com/watch?v=MGphwmXCCgM#t=12
http://repidemiology.wordpress.com/introduction-to-r-code/

1.4 Matrices (Variables and Data Structures)
https://www.youtube.com/watch?v=UakyyZSyuZU

Tutorials on Data Structures (cont.)

1.5 Lists and Data Frames (Variables and Data Structures)
https://www.youtube.com/watch?v=U6vbR4el3kQ

1.6 Logical Vectors and Operators (Variables and Data Structures)
https://www.youtube.com/watch?v=GQb735O2qjc

4 - Introduction to R: Matrix, List and Data Frame
https://www.youtube.com/watch?v=cEX4iXUPqoo
http://repidemiology.wordpress.com/introduction-to-r-code/

Common Data Structures in R
https://www.youtube.com/watch?v=q5YJUGTYUvI

Introduction to R Statistical Computing: Data Structures
https://www.youtube.com/watch?v=OZD4oLobjWM

Lecture 2b: Subsetting
https://www.youtube.com/watch?v=hWbgqzsQJF0&index=7&list=PLjTlxb-wKvXNSDfcKPFH2gzHGyjpeCZmJ

R script associated with this presentation

02b_data_structures__datasets.R
http://1drv.ms/1sYllLB

Vectors with c() function

Vectors are one-dimensional arrays that can hold numeric, character, logical, or date/time/timestamp data.
Most frequently, the function c() is used to declare/form a vector:
> x = c(1, 3, 5, 7, 25, -13, 47)
> x
[1] 1 3 5 7 25 -13 47
> y = c("one", "two", "three", "eight")
> y
[1] "one" "two" "three" "eight"
> z = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
> z
[1] TRUE FALSE TRUE TRUE FALSE TRUE
All the data in a vector must be of a single type (numeric, character, or logical).
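
The single-type rule can be seen in action when values of different types are mixed inside c(). The sketch below is only an illustration; the names mixed.1 and mixed.2 are made up for this example and do not appear in the course script.

```r
# Mixing types inside c() coerces all elements to one common type
mixed.1 <- c(1, TRUE, FALSE)   # logical values become numeric (1 and 0)
mixed.2 <- c(1, "two", TRUE)   # everything becomes character

class(mixed.1)   # "numeric"
class(mixed.2)   # "character"
mixed.2          # "1" "two" "TRUE"
```

R does not raise an error here, so it is worth checking class() when a vector is built from heterogeneous values.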

Vectors of numbers with sequences

Vectors can also be created with a sequence:
> ten_integers.1 <- 5:14
> ten_integers.1
[1] 5 6 7 8 9 10 11 12 13 14
or
> ten_integers.2 <- seq(from=5, to=14, by=1)
> ten_integers.2
[1] 5 6 7 8 9 10 11 12 13 14
Declare a vector of descending numbers:
> seq(from=5, to=-5, by=-1)
[1] 5 4 3 2 1 0 -1 -2 -3 -4 -5
Combine sequences and the c function:
> a_vector <- c(2:4, 8:14)
> a_vector
[1] 2 3 4 8 9 10 11 12 13 14

Vectors containing a range of dates

Generating a vector with the dates between September 29th and October 2nd, 2014, as "pure" dates
First solution:
> seq(as.Date("2014/09/29"), by = "day", length.out = 4)
Second solution:
> seq(as.Date("2014/09/29"), as.Date("2014/10/02"), "days")
In both cases the result is:
[1] "2014-09-29" "2014-09-30" "2014-10-01" "2014-10-02"

Vectors containing a range of timestamps

Generating a vector with the dates between September 29th and October 2nd, 2014, as timestamps
First solution:
> seq(c(ISOdate(2014,9,29)), by = "DSTday", length.out = 4)
Second solution (this variant produces eight days starting on September 25th, in a given time zone):
> x <- as.POSIXct("2014-09-25 23:59:59", tz="Turkey")
> format(seq(x, by="day", length.out=8), "%Y-%m-%d %Z")
Third solution (also eight days starting on September 25th):
> d1 <- ISOdate(year=2014, month=9, day=25, tz="GMT")
> seq(from=d1, by="day", length.out=8)

Vectors generated from the normal distribution

Vector object named x contains five random values drawn from the standard normal distribution; the values are not ordered
> x <- rnorm(5)
> x
[1] -0.2766566 0.7262000 -0.3409396 -0.5192846 0.5508588
Numbers are drawn randomly, so the same function call will draw five other numbers:
> x <- rnorm(5)
> x
[1] 1.9030714 -1.7139177 -0.2287666 0.8369275 0.4203014
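
When the same random values are needed on every run (for instance, to make a lab exercise reproducible), the random number generator can be seeded first. This is a minimal sketch; the seed value 2014 and the names x.a and x.b are arbitrary choices for the illustration.

```r
set.seed(2014)        # fix the state of the random number generator
x.a <- rnorm(5)

set.seed(2014)        # same seed again, so the same five values are drawn
x.b <- rnorm(5)

identical(x.a, x.b)   # TRUE
```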

Vectors created with function rep (repeat)

Vector x.rep contains a sequence of numbers (5, 7, 11) repeated three times
> x.rep <- rep(c(5, 7, 11), 3)
> x.rep
[1] 5 7 11 5 7 11 5 7 11
See the difference from the version which uses the each clause:
> x.rep.2 <- rep(c(5, 7, 11), each=2, times=3)
> x.rep.2
[1] 5 5 7 7 11 11 5 5 7 7 11 11 5 5 7 7 11 11

Example of built-in (system defined) vectors

> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n"
"o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"

> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N"
"O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"

> month.name
[1] "January" "February" "March" "April" "May" "June" "July" "August" "September"
[10] "October" "November" "December"

> state.name
[1] "Alabama" "Alaska" "Arizona" "Arkansas"
...

> state.area
[1] 51609 589757 113909 53104
...

Vectors of factors

Factors are nominal variables whose values have a number of levels
Very important in data analysis and visualization
Ex: two vectors:
student names
student genders (vector genre)
Both vectors initially contain characters
> names <- c("Popescu I. Valeria", "Ionescu V. Viorel",
+ "Genete I. Aurelia", "Lazar T. Ionut",
+ "Sadovschi V. Iuliana", "Dominte I. Nicoleta")
> genre <- c("Female", "Male", "Female", "Male",
+ "Female", "Female")
> class(names)
[1] "character"
> class(genre)
[1] "character"

Vectors of factors (cont.)

> unclass(genre)
[1] "Female" "Male" "Female" "Male" "Female" "Female"
Genre can have only two values, so it is converted into a factor
> genre <- as.factor(genre)
> class(genre)
[1] "factor"
> unclass(genre)
[1] 1 2 1 2 1 1
attr(,"levels")
[1] "Female" "Male"
If a non-existing value is added to vector "genre", it is automatically converted back into character
> genre <- c(genre, "Boy")
> class(genre)
[1] "character"
> unclass(genre)
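
Combining a factor with a string through c() gives a character vector built from the factor's internal integer codes rather than its labels, so the original values are easy to lose. One possible way to add a new value while keeping a factor is sketched below; genre.2 is a name introduced only for this illustration.

```r
genre <- factor(c("Female", "Male", "Female", "Male", "Female", "Female"))

# Convert to the character labels first, append the new value,
# then rebuild the factor so that the new level is registered
genre.2 <- factor(c(as.character(genre), "Boy"))

class(genre.2)    # "factor"
levels(genre.2)   # "Boy" "Female" "Male"
```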

Functions for getting vector type and length

class returns the data type (class) of the elements; unclass returns the underlying values
> class(ten_integers.1)
[1] "integer"
> unclass(ten_integers.1)
[1] 5 6 7 8 9 10 11 12 13 14
Internally, factor levels are stored as integers
> class(genre)
[1] "factor"
> unclass(genre)
[1] 1 2 1 2 1 1
attr(,"levels")
[1] "Female" "Male"
> typeof(genre)
[1] "integer"
Function length returns the number of elements in a vector
> length(ten_integers.1)
[1] 10

Referencing vector elements

First element in vector ten_integers.1
> ten_integers.1 [1]
[1] 5
Last element in vector ten_integers.1
> ten_integers.1 [length(ten_integers.1)]
[1] 14
First three elements in vector ten_integers.1
> ten_integers.1 [1:3]
[1] 5 6 7
Last three elements in vector
> ten_integers.1 [(length(ten_integers.1)-2) : length(ten_integers.1)]
[1] 12 13 14
First, third, fifth and sixth elements
> ten_integers.1 [c(1, 3, 5, 6)]
[1] 5 7 9 10

Referencing vector elements (cont.)

Indices of elements can be supplied through other vectors
Display first, third, fifth and sixth elements in vector ten_integers.1
Vector ind contains the indices of the elements of interest from vector ten_integers.1
> ind <- c(1, 3, 5, 6)
> ind
[1] 1 3 5 6
> ten_integers.1
[1] 5 6 7 8 9 10 11 12 13 14
Now the result:
> ten_integers.1 [ind]
[1] 5 7 9 10

Excluding elements from a vector

Basic idea: R will exclude from a vector the elements whose indices are negative (prefixed by minus)
Excluding first element:
> ten_integers.1 [-1]
[1] 6 7 8 9 10 11 12 13 14
Excluding first three elements:
> ten_integers.1 [-(1:3)]
[1] 8 9 10 11 12 13 14
Excluding first, third, and fourth elements:
> ten_integers.1 [-(c(1,3,4))]
[1] 6 9 10 11 12 13 14

Excluding elements from a vector (cont.)

Excluding the first three elements, the 6th element and the 8th element:
> ten_integers.1 [-(c(1:3,6,8))]
[1] 8 9 11 13 14
Excluding the first two elements and the last two elements of the vector:
> ten_integers.1 [-c((1:2),
+ (length(ten_integers.1)-1) : length(ten_integers.1))]
[1] 7 8 9 10 11 12

Vector filtering

Filter vector elements - select only elements greater than 10
> ten_integers.1 [ten_integers.1 > 10]
[1] 11 12 13 14
How many elements are greater than 10?
> length(ten_integers.1 [ten_integers.1 > 10])
[1] 4
Display INDICES of elements greater than 10
> which (ten_integers.1 > 10)
[1] 7 8 9 10
Filter vector elements - select only elements greater than 10 (ver. 2)
> ind <- which (ten_integers.1 > 10)
> ten_integers.1 [ind]
[1] 11 12 13 14
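
Filtering works because the comparison itself returns a logical vector, which is then used as an index. The sketch below makes that intermediate vector explicit; the name is.big is introduced only for this illustration.

```r
ten_integers.1 <- 5:14

# One TRUE/FALSE per element of the vector
is.big <- ten_integers.1 > 10
is.big

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of matches
sum(is.big)              # 4

# A logical vector used as an index keeps only the TRUE positions
ten_integers.1[is.big]   # 11 12 13 14
```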

Sorting/ordering a vector

Initial vector
> names <- c("Popescu I. Valeria", "Ionescu V. Viorel",
+ "Genete I. Aurelia", "Lazar T. Ionut",
+ "Sadovschi V. Iuliana", "Dominte I. Nicoleta")
Sort the vector elements in ascending (default) order
> names <- sort(names)
> names
[1] "Dominte I. Nicoleta" "Genete I. Aurelia" "Ionescu V. Viorel" "Lazar T. Ionut"
[5] "Popescu I. Valeria" "Sadovschi V. Iuliana"
Sorting the vector in descending order
> names.desc <- rev(sort(names))
> names.desc
[1] "Sadovschi V. Iuliana" "Popescu I. Valeria" "Lazar T. Ionut" "Ionescu V. Viorel"
[5] "Genete I. Aurelia" "Dominte I. Nicoleta"

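Two related tools may be worth knowing alongside rev(sort()): the decreasing argument of sort(), and order(), which returns the positions of the elements in sorted order instead of the sorted values. The sketch below reuses the student names from the slide; ord is a name introduced only for this illustration.

```r
names <- c("Popescu I. Valeria", "Ionescu V. Viorel",
           "Genete I. Aurelia", "Lazar T. Ionut",
           "Sadovschi V. Iuliana", "Dominte I. Nicoleta")

# Descending sort in a single call
names.desc <- sort(names, decreasing = TRUE)

# order() gives the index of the smallest element first, and so on
ord <- order(names)
ord          # 6 3 2 4 1 5
names[ord]   # same result as sort(names)
```

order() is the function to reach for when a second vector (or a data frame) must be rearranged in the same way as the sorted one.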
R as a vectorized language

Lecture 2c: Vectorized Operations
https://www.youtube.com/watch?v=Fm8SORJQjPY&list=PLjTlxb-wKvXNSDfcKPFH2gzHGyjpeCZmJ&index=8
Operations are automatically applied to each element of the vector, without explicitly looping over the vector elements
> num.vec.1 <- c(1, 3, 5, 7, 25, -13, 47)
> num.vec.2 <- num.vec.1 + 100
> num.vec.2
[1] 101 103 105 107 125 87 147
> date.vec.1 <- c("2013-10-01", "2013-10-03", "2013-10-10")
For the moment, the elements are strings
> class(date.vec.1)
[1] "character"
as.Date() converts all of the vector elements into dates
> date.vec.1 <- as.Date(date.vec.1)
> class(date.vec.1)
[1] "Date"

R as a vectorized language (cont.)

Operations can be applied to two or more vectors
> num.vec.3 <- num.vec.1 + num.vec.2
> num.vec.3
[1] 102 106 110 114 150 74 194
Compare a vector with a value
> x
[1] -0.56757455 -0.90079348 0.24397156 -0.51325283 0.03209287
> x >= 0
[1] FALSE FALSE TRUE FALSE TRUE
> x.1 <- x >= 0
> x.1
[1] FALSE FALSE TRUE FALSE TRUE
Testing if at least one of the vector elements fulfils the predicate
> x
[1] -0.56757455 -0.90079348 0.24397156 -0.51325283 0.03209287
> any(x > 0)
[1] TRUE

R as a vectorized language (cont.)

Testing if all the vector elements fulfil the predicate (function all)
> all(x > 0)
[1] FALSE
> all(x > -25)
[1] TRUE
For a character vector, display the number of characters for each element
> y
[1] "one" "two" "three" "eight"
> nchar(y)
[1] 3 3 5 5

Naming vector elements

Provide a name for each vector element
> num_ro = c(one="unu", two="doi", three="trei", four="patru")
> num_ro
    one     two   three    four
  "unu"   "doi"  "trei" "patru"
The same result can be accomplished with:
> num_ro = c("unu", "doi", "trei", "patru")
> num_ro
[1] "unu" "doi" "trei" "patru"
> names(num_ro) = c("one", "two", "three", "four")
> num_ro
    one     two   three    four
  "unu"   "doi"  "trei" "patru"

Descriptive statistics on vectors

A vector (age) containing the age of 10 persons (Kabacoff, 2011)
> age = c(1,3,5,2,11,9,3,9,12,3)
Another vector containing the weight of the above people
> weight = c(4.4,5.3,7.2,5.2,8.5,7.3,6.0,10.4,10.2,6.1)
Suppose the above weights were expressed in pounds (US customary units) and we had to convert them from lbs into kg
> weight.kg <- weight * 0.454
Compute the mean of people's weight
> mean(weight)
[1] 7.06
Compute the standard deviation of people's weight
> sd(weight)
[1] 2.077498
Compute the correlation between age and weight
> cor(age,weight)

Matrices

Two-dimensional arrays where each element has the same type (numeric, character, or logical)
Created with the matrix function. Format:
> mymatrix <- matrix(vector,
nrow=number_of_rows,
ncol=number_of_columns, byrow=logical_value,
dimnames=list(char_vector_rownames,
char_vector_colnames))
vector contains the elements for the matrix
nrow and ncol specify the row and column dimensions
dimnames contains optional row and column labels stored in character vectors
byrow indicates whether the matrix should be filled in by row (byrow=TRUE) or by column (byrow=FALSE); the default is by column
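
The effect of byrow is easiest to see by filling two matrices from the same values. This is a small sketch; by.col and by.row are names introduced only for this illustration.

```r
# Same six values, same shape, different filling order
by.col <- matrix(1:6, nrow = 2, ncol = 3)                 # default: column by column
by.row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # row by row

by.col[1, ]   # 1 3 5
by.row[1, ]   # 1 2 3
```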

Matrices (cont.)

m.1 is a 5 x 4 matrix
> m.1 <- matrix(1:20, nrow=5, ncol=4)
> m.1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
m.2 is a 2 x 2 matrix, filled by rows
> cells <- c(1,26,24,68)
> rownames <- c("Row1", "Row2")
> colnames <- c("Col1", "Col2")
> m.2 <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
+ dimnames=list(rownames, colnames))

Matrices (cont.)

Display m.2
> m.2
     Col1 Col2
Row1    1   26
Row2   24   68
m.3 is a 2 x 2 matrix, filled by columns
(list is a data structure presented after data frames)
> m.3 <- matrix(cells, nrow=2, ncol=2, byrow=FALSE,
+ dimnames=list(rownames, colnames))
> m.3
     Col1 Col2
Row1    1   24
Row2   26   68

Matrices (cont.)

m.4 is a 4 x 3 matrix, filled by rows
> m.4 <- matrix(1:12, nrow=4, ncol=3, byrow=TRUE)
> m.4
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
Naming rows: row.1, row.2, ... and columns: col.1, col.2, ...
> dimnames(m.4) = list(paste("row.", 1:nrow(m.4), sep=""),
+ paste("col.", 1:ncol(m.4), sep=""))
> m.4
      col.1 col.2 col.3
row.1     1     2     3
row.2     4     5     6
row.3     7     8     9
row.4    10    11    12

Accessing matrix elements

> m.1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
Display the 3rd row
> m.1[3,]
[1] 3 8 13 18
Display the 3rd column
> m.1[,3]
[1] 11 12 13 14 15
Display the element at the intersection of the 2nd row and the 3rd column
> m.1[2,3]
[1] 12

Accessing matrix elements (cont.)

Display two elements from the same row: m.1[2,3] and m.1[2,4]
> m.1[2, c(3,4)]
[1] 12 17
Display three elements from the same column: m.1[1,2], m.1[2,2] and m.1[3,2]
> m.1[c(1,2,3), 2]
[1] 6 7 8
Display a "submatrix", from m.1[2,2] to m.1[4,4]
> m.1[c(2,3,4), c(2,3,4)]
     [,1] [,2] [,3]
[1,]    7   12   17
[2,]    8   13   18
[3,]    9   14   19

Basic statistics on matrix

> m.4
      col.1 col.2 col.3
row.1     1     2     3
row.2     4     5     6
row.3     7     8     9
row.4    10    11    12
Compute the mean of all the cells in matrix m.4
> mean(m.4)
[1] 6.5
Compute the mean of all the cells in the third column
> mean(m.4[,3])
[1] 7.5
Compute the mean of all the cells in the third row
> mean(m.4[3,])
[1] 8

Basic statistics on matrix (cont.)

Compute the sum of all the cells in matrix m.4
> sum(m.4)
[1] 78
Compute the sum of all the cells in the third column
> sum(m.4[,3])
[1] 30
Compute the sum of all the cells in the third row
> sum(m.4[3,])
[1] 24

rowSums/colSums

rowSums calculates the sum of the cells for each row of a matrix
> rowSums(m.4)
row.1 row.2 row.3 row.4
    6    15    24    33
colSums calculates the sum of the cells for each column of a matrix
> colSums(m.4)
col.1 col.2 col.3
   22    26    30
rowMeans/colMeans calculate the mean of every row/column
> rowMeans(m.4)
row.1 row.2 row.3 row.4
    2     5     8    11
> colMeans(m.4)
col.1 col.2 col.3
  5.5   6.5   7.5
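
rowSums, colSums, rowMeans and colMeans are shortcuts for one specific summary. For other summaries, the general apply() function can be used with any function over rows (MARGIN = 1) or columns (MARGIN = 2). This is a sketch with the same values as m.4; the names row.tot, col.avg and row.max are introduced only for this illustration.

```r
m.4 <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)

row.tot <- apply(m.4, 1, sum)    # same values as rowSums(m.4)
col.avg <- apply(m.4, 2, mean)   # same values as colMeans(m.4)
row.max <- apply(m.4, 1, max)    # no dedicated shortcut exists for this one

row.tot   # 6 15 24 33
col.avg   # 5.5 6.5 7.5
row.max   # 3 6 9 12
```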

Adding total rows and columns to a matrix

> m.4
      col.1 col.2 col.3
row.1     1     2     3
row.2     4     5     6
row.3     7     8     9
row.4    10    11    12
Add total column
> m.4 <- cbind(m.4, rowSums(m.4))
Setting the name for the total column
> column.names <- colnames(m.4)
> column.names
[1] "col.1" "col.2" "col.3" ""
> column.names[length(column.names)] <- "col.total"
> colnames(m.4) <- column.names

Adding total rows and columns to a matrix (cont.)

Check the operation
> m.4
      col.1 col.2 col.3 col.total
row.1     1     2     3         6
row.2     4     5     6        15
row.3     7     8     9        24
row.4    10    11    12        33
Add total row
> m.4 <- rbind(m.4, colSums(m.4))
Setting the name for the total row
> row.names <- rownames(m.4)
> row.names
[1] "row.1" "row.2" "row.3" "row.4" ""
> row.names[length(row.names)] <- "row.total"
> rownames(m.4) <- row.names

Adding total rows and columns to a matrix (cont.)

Check the operation; notice the names of rows and columns and the content of the last row and column
> m.4
          col.1 col.2 col.3 col.total
row.1         1     2     3         6
row.2         4     5     6        15
row.3         7     8     9        24
row.4        10    11    12        33
row.total    22    26    30        78
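
The same totals can be added more compactly, because cbind() and rbind() accept named arguments and use the argument name as the label of the new column or row. The sketch below rebuilds m.4 from scratch so that it can be run on its own; it is an alternative to the steps shown on the slides, not the version from the course script.

```r
m.4 <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE,
              dimnames = list(paste0("row.", 1:4), paste0("col.", 1:3)))

# The argument name becomes the name of the added column / row
m.4 <- cbind(m.4, col.total = rowSums(m.4))
m.4 <- rbind(m.4, row.total = colSums(m.4))

m.4
m.4["row.total", "col.total"]   # 78, the grand total
```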

Arrays

Similar to matrices, but can have more than two dimensions
Elements must be of the same type
Created with the array function:
> myarray <- array(vector,
+ dimensions, dimnames)
vector contains the data for the array
dimensions is a numeric vector giving the maximal index for each dimension
dimnames - optional list of dimension labels
Elements in arrays are accessed in a similar way to those in matrices

Create and access arrays

> dim1 <- c("A1", "A2")
> dim2 <- c("B1", "B2", "B3")
> dim3 <- c("C1", "C2",
+ "C3", "C4")
> a1 <- array(1:24, c(2, 3, 4),
+ dimnames=list(dim1, dim2, dim3))
> a1
, , C1
   B1 B2 B3
A1  1  3  5
A2  2  4  6
, , C2
   B1 B2 B3
A1  7  9 11
A2  8 10 12
, , C3
   B1 B2 B3
A1 13 15 17
A2 14 16 18
, , C4
   B1 B2 B3
A1 19 21 23
A2 20 22 24
Display element [2,2,3]
> a1[2,2,3]
[1] 16

Create and access arrays (cont.)

Display a matrix with the elements of A and B for the first level of C
> a1[,,1]
   B1 B2 B3
A1  1  3  5
A2  2  4  6
Display the elements of A for the 3rd level of B and the 2nd level of C
> a1[,3,2]
A1 A2
11 12
Display a subarray containing all the elements from the first two levels of A, B and C
> a1[c(1,2),c(1,2),c(1,2)]
, , C1
   B1 B2
A1  1  3
A2  2  4
, , C2
   B1 B2
A1  7  9
A2  8 10

Data Frames

Most important data structure in R (at least for us)
A data frame is a structure in R that holds data and is similar to the datasets found in standard statistical packages (for example, SAS, SPSS, and Stata) and databases
The columns are variables and the rows are observations
Variables can have different types (for example, numeric, character) in the same data frame

Create an empty data frame

> student_gi <- data.frame(studentID = numeric(),
+ name = character(), age = numeric(),
+ scholarship = character(),
+ lab_assessment = character(),
+ final_grade = numeric())
> class(student_gi)
[1] "data.frame"
> str(student_gi)
'data.frame': 0 obs. of 6 variables:
$ studentID : num
$ name : Factor w/ 0 levels:
$ age : num
$ scholarship : Factor w/ 0 levels:
$ lab_assessment: Factor w/ 0 levels:
$ final_grade : num

Create a data frame from vectors

Create the vectors
> studentID <- c(1, 2, 3, 4, 5)
Create the data frame using the above vectors
> student_gi <- data.frame(studentID, name, age,
+ scholarship, lab_assessment, final_grade)
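
Only the studentID vector survives on this slide, so the sketch below fills in the other five vectors with the values used on later slides of this deck. One assumption is added: stringsAsFactors = TRUE is passed explicitly, because the default of data.frame() changed to FALSE in R 4.0.0, and the Factor columns shown in the str() output of this presentation correspond to the older behaviour.

```r
studentID <- c(1, 2, 3, 4, 5)
# The five vectors below are copied from later slides of this deck
name <- c("Popescu I. Vasile", "Ianos W. Adriana", "Kovacz V. Iosef",
          "Babadag I. Maria", "Pop P. Ion")
age <- c(23, 19, 21, 22, 31)
scholarship <- c("Social", "Studiu1", "Studiu2", "Merit", "Studiu1")
lab_assessment <- c("Bine", "Foarte bine", "Excelent", "Bine", "Slab")
final_grade <- c(9, 9.45, 9.75, 9, 6)

# Assumption: character columns are wanted as factors, as in the slides
student_gi <- data.frame(studentID, name, age, scholarship,
                         lab_assessment, final_grade,
                         stringsAsFactors = TRUE)
str(student_gi)
```

With stringsAsFactors = FALSE (the current default), name, scholarship and lab_assessment would stay character columns instead.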

Display data frame content

Display data frame (content)
> student_gi
  studentID              name age scholarship lab_assessment final_grade
1         1 Popescu I. Vasile  23      Social           Bine        9.00
2         2  Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
3         3   Kovacz V. Iosef  21     Studiu2       Excelent        9.75
4         4  Babadag I. Maria  22       Merit           Bine        9.00
5         5        Pop P. Ion  31     Studiu1           Slab        6.00
Display one column of the data frame as a vector
> student_gi$name
[1] Popescu I. Vasile Ianos W. Adriana Kovacz V. Iosef Babadag I. Maria Pop P. Ion
Levels: Babadag I. Maria Ianos W. Adriana Kovacz V. Iosef Pop P. Ion Popescu I. Vasile
Display one column of the data frame as a... column
> student_gi["name"]
               name
1 Popescu I. Vasile
2  Ianos W. Adriana
3   Kovacz V. Iosef
4  Babadag I. Maria
5        Pop P. Ion

Display data frame structure

Confirm student_gi is indeed a data frame
> class(student_gi)
[1] "data.frame"
Display the structure of the data frame
> str(student_gi)
'data.frame': 5 obs. of 6 variables:
$ studentID : num 1 2 3 4 5
$ name : Factor w/ 5 levels "Babadag I. Maria",..: 5 2 3 1 4
$ age : num 23 19 21 22 31
$ scholarship : Factor w/ 4 levels "Merit","Social",..: 2 3 4 1 3
$ lab_assessment: Factor w/ 4 levels "Bine","Excelent",..: 1 3 2 1 4
$ final_grade : num 9 9.45 9.75 9 6
Display the type of individual variables within the data frame
> class(student_gi$studentID)
[1] "numeric"
> class(student_gi$name)
[1] "factor"

Useful functions for displaying some data frame properties

Number of observations (rows)
> nrow(student_gi)
[1] 5
Number of variables (columns)
> ncol(student_gi)
[1] 6
Both the number of observations (rows) and variables (columns)
> dim(student_gi)
[1] 5 6
Display the names of all the variables (columns)
> names(student_gi)
[1] "studentID" "name" "age" "scholarship"
[5] "lab_assessment" "final_grade"
Display the names of the second, third and fourth variable
> names(student_gi[2:4])

Selecting columns

Select/display the first two columns (studentID and name)
> student_gi[1:2]
  studentID              name
1         1 Popescu I. Vasile
2         2  Ianos W. Adriana
3         3   Kovacz V. Iosef
4         4  Babadag I. Maria
5         5        Pop P. Ion
or
> student_gi[, 1:2]
> student_gi[c("studentID", "name")]
(see on next slide)

Selecting columns (cont.)

Select/display the first two columns (studentID and name), other solutions
> student_gi[, c("studentID", "name")]
Using a vector for storing the names of the first two columns
> cols <- c("studentID", "name")
> student_gi[cols]
> student_gi[, names(student_gi) %in% cols]
Return the "final_grade" variable (column) as a vector
> student_gi$final_grade
[1] 9.00 9.45 9.75 9.00 6.00
or ... see on the next slide

Selecting columns (cont.)

Return the "final_grade" variable (column) as a vector (cont.)
> student_gi[ , 6]
or
> student_gi[ , "final_grade"]
Return the "final_grade" variable (column) as a one-column data frame
> student_gi[ , "final_grade", drop=FALSE]
  final_grade
1        9.00
2        9.45
3        9.75
4        9.00
5        6.00

Selecting rows

Display the first two observations (rows)
> student_gi[1:2,]
  studentID              name age scholarship lab_assessment final_grade
1         1 Popescu I. Vasile  23      Social           Bine        9.00
2         2  Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
Display observations 1, 2 and 5
> student_gi[c(1:2, 5),]
  studentID              name age scholarship lab_assessment final_grade
1         1 Popescu I. Vasile  23      Social           Bine        9.00
2         2  Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
5         5        Pop P. Ion  31     Studiu1           Slab        6.00

attach function

attach adds the data frame to the R search path
> search()
[1] ".GlobalEnv" "tools:rstudio"
[3] "package:stats" "package:graphics"
[5] "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods"
[9] "Autoloads" "package:base"
When a variable name is encountered, data frames in the search path are checked in order to locate the variable.
Commands without attach
> student_gi$final_grade
> table(student_gi$lab_assessment,
+ student_gi$final_grade)
> summary(student_gi$final_grade)

attach vs. with

The same commands using attach
> attach(student_gi)
> final_grade
> table(lab_assessment, final_grade)
> summary(final_grade)
> plot(age, final_grade)
detach removes an object from the search path
> detach(student_gi)
It is advisable to use with instead of attach:
> with(student_gi, final_grade)
> with(student_gi, table(lab_assessment, final_grade))
> with(student_gi, plot(lab_assessment, final_grade))
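
A small self-contained illustration of with(): the expression is evaluated with the data frame's columns visible, and nothing is added to the search path. The data frame below contains only two of the columns used in this deck, and avg and young.grades are names introduced only for this sketch.

```r
student_gi <- data.frame(
  age = c(23, 19, 21, 22, 31),
  final_grade = c(9, 9.45, 9.75, 9, 6)
)

# Columns are referenced directly, without the student_gi$ prefix
avg <- with(student_gi, mean(final_grade))
avg            # 8.64

# Any expression can be used, for example grades of students younger than 23
young.grades <- with(student_gi, final_grade[age < 23])
young.grades   # 9.45 9.75 9.00
```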

Case (row) identifiers

Act like primary/unique keys in relational tables
Can be specified with the row.names option of the data.frame function
We allocate new values for studentID (to avoid confusion with row numbers); the remaining vectors are identical
> studentID <- c(1001, 1002, 1003, 1004, 1005)
> name <- c("Popescu I. Vasile",
+ "Ianos W. Adriana", "Kovacz V. Iosef",
+ "Babadag I. Maria", "Pop P. Ion")
> age <- c(23, 19, 21, 22, 31)
> scholarship <- c("Social", "Studiu1",
+ "Studiu2", "Merit", "Studiu1")
> lab_assessment <- c("Bine", "Foarte bine",
+ "Excelent", "Bine", "Slab")

Case (row) identifiers (cont.)

(Slightly) new version of the data frame:
> student_gi <- data.frame(studentID, name, age,
+ scholarship, lab_assessment,
+ final_grade, row.names = studentID)
studentID is the variable to use in labeling cases on various printouts and graphics produced with R.
Display the names of the rows (observations)
> rownames(student_gi)
[1] "1001" "1002" "1003" "1004" "1005"

Case (row) identifiers (cont.)

Display the names of the rows (observations)
> rownames(student_gi)
[1] "1001" "1002" "1003" "1004" "1005"
Notice the leftmost column of the data frame display
> student_gi
     studentID              name age scholarship lab_assessment final_grade
1001      1001 Popescu I. Vasile  23      Social           Bine        9.00
1002      1002  Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
1003      1003   Kovacz V. Iosef  21     Studiu2       Excelent        9.75
1004      1004  Babadag I. Maria  22       Merit           Bine        9.00
1005      1005        Pop P. Ion  31     Studiu1           Slab        6.00

Case (row) identifiers (cont.)

Display the observation (row) corresponding to student Ianos W. Adriana using her case identifier ("1002")
> student_gi["1002",]
     studentID             name age scholarship lab_assessment final_grade
1002      1002 Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
Display the observations corresponding to students Ianos W. Adriana and Pop P. Ion using their case identifiers ("1002" and "1005")
> student_gi[c("1002", "1005"),]
     studentID             name age scholarship lab_assessment final_grade
1002      1002 Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
1005      1005       Pop P. Ion  31     Studiu1           Slab        6.00

Factors (reprise)

In presentation 02a, variables were described as nominal, ordinal, interval, and ratio
Nominal variables are categorical, without an implied order. Examples: MaritalStatus, Sex, Job, MasterProgramme
Ordinal variables imply order but not amount. Examples: Status (poor, improved, excellent), LabAssessment (slab, bine, foarteBine, excelent)
Interval and Ratio variables can take on any value within some range, and both order and amount are implied. Examples: LitersPer100Km, Height, Weight, FinalGrade (with decimals)
Categorical (nominal) and ordered categorical (ordinal) variables are called factors.

Function factor

Factors determine how data will be analyzed and presented visually
The function factor() stores the categorical values as a vector of integers in the range [1...k] (where k is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers
Initially, vector scholarship (a nominal variable) is a plain character vector
> scholarship <- c("Social", "Studiu1", "Studiu2",
+ "Merit", "Studiu1")
Now it will be converted into a factor:
> scholarship_f <- factor(scholarship)
> scholarship_f
[1] Social Studiu1 Studiu2 Merit Studiu1
Levels: Merit Social Studiu1 Studiu2

Ordered factors

Another variable, this time an ordinal one
> lab_assessment <- c("Bine", "Foarte bine",
+ "Excelent", "Bine", "Slab")
Notice the way it is displayed
> lab_assessment
[1] "Bine" "Foarte bine" "Excelent" "Bine"
[5] "Slab"
Now declare the vector as an ordered factor
> lab_assessment <- factor(lab_assessment,
+ order=TRUE, levels=c("Slab", "Bine",
+ "Foarte bine", "Excelent"))
Notice the new way of displaying the vector
> lab_assessment
[1] Bine Foarte bine Excelent Bine Slab
Levels: Slab < Bine < Foarte bine < Excelent
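
Declaring the order pays off when values are compared: an ordered factor supports <, >=, min() and max() according to the declared levels rather than alphabetical order. This is a minimal sketch using the same levels as above; at.least.fb and best are names introduced only for this illustration.

```r
lab_assessment <- factor(
  c("Bine", "Foarte bine", "Excelent", "Bine", "Slab"),
  ordered = TRUE,
  levels = c("Slab", "Bine", "Foarte bine", "Excelent")
)

# Which assessments are at least "Foarte bine"?
at.least.fb <- lab_assessment >= "Foarte bine"
at.least.fb            # FALSE TRUE TRUE FALSE FALSE

# The highest assessment, according to the declared order
best <- max(lab_assessment)
as.character(best)     # "Excelent"
```

The same comparisons on an unordered factor are not meaningful, which is one practical reason to declare ordinal variables as ordered factors.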

Factors in data frames

Re-create the data frame using factors
> studentID <- c(1001, 1002, 1003, 1004, 1005)
> name <- c("Popescu I. Vasile", "Ianos W. Adriana",
+ "Kovacz V. Iosef", "Babadag I. Maria",
+ "Pop P. Ion")
> age <- c(23, 19, 21, 22, 31)
> scholarship <- c("Social", "Studiu1", "Studiu2",
+ "Merit", "Studiu1")
> scholarship <- factor(scholarship)
> lab_assessment <- c("Bine", "Foarte bine",
+ "Excelent", "Bine", "Slab")
> lab_assessment <- factor(lab_assessment,
+ order=TRUE, levels=c("Slab", "Bine",
+ "Foarte bine", "Excelent"))
> final_grade <- c(9, 9.45, 9.75, 9, 6)

Factors in data frames (cont.)

Another version of the data frame
> student_gi <- data.frame(name, age, scholarship,
+ lab_assessment, final_grade,
+ row.names = studentID)
Display the structure of the data frame
> str(student_gi)
'data.frame': 5 obs. of 5 variables:
$ name : Factor w/ 5 levels "Babadag I. Maria",..: 5 2 3 1 4
$ age : num 23 19 21 22 31
$ scholarship : Factor w/ 4 levels "Merit","Social",..: 2 3 4 1 3
$ lab_assessment: Ord.factor w/ 4 levels "Slab"<"Bine"<..: 2 3 4 2 1
$ final_grade : num 9 9.45 9.75 9 6

Factors in data frames (cont.)

Basic statistics about the variables in the data frame
> summary(student_gi)
                name        age        scholarship
 Babadag I. Maria :1   Min.   :19.0   Merit  :1
 Ianos W. Adriana :1   1st Qu.:21.0   Social :1
 Kovacz V. Iosef  :1   Median :22.0   Studiu1:2
 Pop P. Ion       :1   Mean   :23.2   Studiu2:1
 Popescu I. Vasile:1   3rd Qu.:23.0
                       Max.   :31.0
    lab_assessment  final_grade
 Slab       :1     Min.   :6.00
 Bine       :2     1st Qu.:9.00
 Foarte bine:1     Median :9.00
 Excelent   :1     Mean   :8.64
                   3rd Qu.:9.45
                   Max.   :9.75

Factors and value labels

> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent",
+ "Poor")
> diabetes <- factor(diabetes)
> status <- factor(status, order=TRUE)
> gender <- c(1, 2, 2, 1)
> patientdata <- data.frame(patientID, age,
+ diabetes, status, gender)
For variable gender (coded 1 for males and 2 for females) the value labels are declared with options levels (indicating the values) and labels (indicating the labels):
> patientdata$gender <-
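
The assignment above is cut off in the slide text. The sketch below shows one way such a statement is commonly written with factor(), using the levels and labels options mentioned on the slide; it is an assumption about what the slide contained, and the data frame is reduced to two columns so that the example can be run on its own.

```r
# Reduced stand-in for the patientdata data frame (two columns only)
patientdata <- data.frame(patientID = c(1, 2, 3, 4),
                          gender = c(1, 2, 2, 1))

# levels lists the stored codes; labels gives the text shown for each code
# (assumed completion of the truncated statement on the slide)
patientdata$gender <- factor(patientdata$gender,
                             levels = c(1, 2),
                             labels = c("male", "female"))

patientdata$gender
levels(patientdata$gender)   # "male" "female"
```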

Factors and value labels (cont.)

For gender, labels (instead of values) are displayed
> patientdata
  patientID age diabetes    status gender
1         1  25    Type1      Poor   male
2         2  34    Type2  Improved female
3         3  28    Type1 Excellent female
4         4  52    Type1      Poor   male
Data frame structure (see the information about gender):
> str(patientdata)
'data.frame': 4 obs. of 5 variables:
$ patientID: num 1 2 3 4
$ age : num 25 34 28 52
$ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
$ status : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
$ gender : Factor w/ 2 levels "male","female": 1 2 2 1

Lists

Lists are the most complex of the R data types
A list is an ordered collection of objects (components).
A list allows gathering a large variety of (possibly unrelated) objects under one name.
A list can contain a combination of vectors, matrices, data frames, and even other lists
Created using the list() function:
mylist <- list(object1, object2, ...)
where the objects are any of the structures seen so far
Optionally, the objects in a list can be named:
mylist <- list(name1=object1,
+ name2=object2, ...)
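
A short illustration of a named list holding objects of different kinds. The component names (title, ages, m) and their contents are made up for this sketch.

```r
# Three unrelated objects gathered under one name
mylist <- list(title = "A small list",
               ages  = c(25, 34, 28),
               m     = matrix(1:4, nrow = 2))

names(mylist)           # "title" "ages" "m"

mylist$ages             # component selected by name
mylist[["ages"]]        # same component, name given as a string
mylist[[2]]             # same component, selected by position

class(mylist["ages"])   # "list": single brackets return a sub-list
```

Double brackets (or $) extract the component itself, while single brackets keep the list wrapper around it.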

First example of list: POSIXlt variables

Variable t gets the current system timestamp:
> t = Sys.time()
POSIXlt objects are actually lists
> l.1 <- as.POSIXlt(t)
> l.1
[1] "2014-09-25 08:37:24 EEST"
> typeof(l.1)
[1] "list"
> names(l.1)
NULL
> unclass(l.1)
$sec
[1] 24.19267
$min
[1] 37
$hour
[1] 8
$mday
[1] 25
...

First example of list: POSIXlt variables (cont.)

Extract the values of the list components (seconds, minutes, hours, ...), equivalent to l.1$sec, l.1$min, ...:

> l.1[[1]]
[1] 24.19267
> l.1[[2]]
[1] 37
> l.1[[3]]
[1] 8
> l.1[[4]]
[1] 25
...

Display (horizontally) the components of the timestamp object

> unlist(l.1)
      sec       min      hour      mday       mon      year
 24.19267  37.00000   8.00000  25.00000   8.00000 114.00000
     wday      yday     isdst
      ...
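A small sketch of reading the components of a POSIXlt object. It uses a fixed timestamp and the UTC time zone (both illustrative choices, not from the slides) so that the result is reproducible. Depending on the R version, [[ applied directly to a POSIXlt object may not return the raw component, so the sketch calls unclass() first to work on the plain underlying list.

```r
# A fixed timestamp is used so the result is reproducible
# (date and time zone are illustrative assumptions)
ts <- as.POSIXlt("2014-09-25 08:37:24", tz = "UTC")

print(typeof(ts))          # the underlying type is a list

parts <- unclass(ts)       # plain list of components
print(parts$hour)
print(parts$min)
print(parts$mday)
print(parts$mon + 1)       # mon counts months starting from 0
print(parts$year + 1900)   # year counts years since 1900
```

The offsets for mon and year explain the values 8 and 114 in the output shown above for a date in September 2014.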

Matrices and lists

The matrix dimension names (dimnames) object is a list

> m.3 <- matrix(cells, nrow=2, ncol=2,
+     byrow=FALSE,
+     dimnames=list(rownames, colnames))
> m.3
     Col1 Col2
Row1    1   24
Row2   26   68
> dimnames(m.3)
[[1]]
[1] "Row1" "Row2"
[[2]]
[1] "Col1" "Col2"
> unlist(dimnames(m.3))
[1] "Row1" "Row2" "Col1" "Col2"
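A self-contained sketch of the same idea. The input values are chosen to reproduce the matrix printed above, and the names rnames and cnames are assumptions (they also avoid masking the base functions rownames and colnames).

```r
# Illustrative values; object names are assumptions for this sketch
cells <- c(1, 26, 24, 68)
rnames <- c("Row1", "Row2")
cnames <- c("Col1", "Col2")
m <- matrix(cells, nrow = 2, ncol = 2, byrow = FALSE,
            dimnames = list(rnames, cnames))

dn <- dimnames(m)
print(class(dn))           # dimnames is stored as a list
print(dn[[1]])             # first component: row names
print(dn[[2]])             # second component: column names
print(m["Row2", "Col1"])   # names can be used as indices
```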

Creating and displaying simple lists

Create two simple lists

> list.1 = list ("unu", "doi", "trei")
> list.2 = list( c("doi", "trei", "patru"))

Visualizing lists

> list.1
[[1]]
[1] "unu"

[[2]]
[1] "doi"

[[3]]
[1] "trei"

> list.2
[[1]]
[1] "doi"   "trei"  "patru"

Create a more complex list

list.3 contains the two previous lists, a vector (sequence) and a data frame:

> list.3 = list (list.1, list.2, 3:7, patientdata)

> list.3
[[1]]
[[1]][[1]]
[1] "unu"

[[1]][[2]]
[1] "doi"

[[1]][[3]]
[1] "trei"

[[2]]
[[2]][[1]]
[1] "doi"   "trei"  "patru"

[[3]]
[1] 3 4 5 6 7

[[4]]
  patientID age diabetes    status gender
1         1  25    Type1      Poor   male
2         2  34    Type2  Improved female
3         3  28    Type1 Excellent female
4         4  52    Type1      Poor   male

Create a more complex list (cont.)

Display the structure of list.3:

> str(list.3)
List of 4
 $ :List of 3
  ..$ : chr "unu"
  ..$ : chr "doi"
  ..$ : chr "trei"
 $ :List of 1
  ..$ : chr [1:3] "doi" "trei" "patru"
 $ : int [1:5] 3 4 5 6 7
 $ :'data.frame': 4 obs. of 5 variables:
  ..$ patientID: num [1:4] 1 2 3 4
  ..$ age      : num [1:4] 25 34 28 52
  ..$ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
  ..$ status   : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
  ..$ gender   : Factor w/ 2 levels "male","female": 1 2 2 1

Accessing list components

Display the number of objects in a list

> length(list.3)
[1] 4

Access the first object of the list

> list.3[[1]]
[[1]]
[1] "unu"

[[2]]
[1] "doi"

[[3]]
[1] "trei"

> class(list.3[[1]])
[1] "list"

Accessing list components (cont)

Access the second component of the list

> list.3[[2]]
[[1]]
[1] "doi"   "trei"  "patru"

> class(list.3[[2]])
[1] "list"

... and the fourth component

> list.3[[4]]
  patientID age diabetes    status gender
1         1  25    Type1      Poor   male
2         2  34    Type2  Improved female
3         3  28    Type1 Excellent female
4         4  52    Type1      Poor   male
> class(list.3[[4]])
[1] "data.frame"

List component attributes/names

Function names displays the names of the designated components of a list

The first object of list.3 is a list whose components have no names:

> names(list.3[[1]])
NULL

The fourth object of list.3 is a data frame (patientdata); this data frame has five variables (columns) whose names can be displayed with function names:

> names(list.3[[4]])
[1] "patientID" "age"       "diabetes"  "status"    "gender"

Accessing components within components

Display the third object within the first component in list.3

> list.3[[1]][[3]]
[1] "trei"

Display, in the data frame patientdata (the data frame is the 4th component of the list), the values of column age (this column is the 2nd of the data frame)

> list.3[[4]][, 2]
[1] 25 34 28 52

or, by column name:

> list.3[[4]][, "age"]

Display age as a column (not a vector)

> list.3[[4]][, "age", drop=FALSE]
  age
1  25
2  34
3  28
4  52

Display the age of the third patient

> list.3[[4]][, 2][3]
> list.3[[4]][, "age", drop=FALSE]$age[3]
[1] 28
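The difference between single and double brackets is easy to miss in the examples above, so here is a short self-contained sketch. The objects patients and nested are made-up stand-ins for patientdata and list.3.

```r
# Made-up stand-ins for patientdata and list.3
patients <- data.frame(patientID = 1:4, age = c(25, 34, 28, 52))
nested <- list(list("unu", "doi", "trei"), 3:7, patients)

sub_list  <- nested[1]     # single brackets: a list of length 1
component <- nested[[1]]   # double brackets: the component itself
print(length(sub_list))
print(length(component))

print(nested[[1]][[3]])                     # third item of the inner list
print(nested[[3]][3, "age"])                # age of the third patient
print(nested[[3]][, "age"])                 # plain numeric vector
print(nested[[3]][, "age", drop = FALSE])   # still a data frame
```

Chaining [[ ]] walks down one level at a time, while drop = FALSE keeps a single selected column as a data frame.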

Tables in R
Not a full-fledged data structure, but a sort of labeled (named) array
Some functions (e.g. graphic functions, categorical data analysis functions) accept only tables as arguments
More about tables in script 06c

Two main types of tables:

tables of frequencies, which count the number of occurrences of each value of a (usually) categorical variable
tables of proportions, which divide the number of occurrences of each value by the total number of occurrences of a (usually) categorical variable
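The two types of tables can be sketched with table() and prop.table(). The scholarship values below are made up for illustration.

```r
# Made-up values standing in for a categorical column
scholarship <- c("Merit", "Social", "Studiu1", "Studiu1", "Studiu2")

freq <- table(scholarship)   # table of frequencies (counts)
prop <- prop.table(freq)     # table of proportions (counts / total)

print(freq)
print(prop)
print(sum(prop))             # proportions add up to 1
```

prop.table() divides each count by the total, so a value that appears twice among five observations gets a proportion of 0.4.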

Uni-dimensional tables
Create a table with the frequencies of scholarship in data frame student_gi

> table.1 <- with(student_gi, table(scholarship))
> table.1
scholarship
  Merit  Social Studiu1 Studiu2
      1       1       2       1

Display the structure of table.1

> str(table.1)
 'table' int [1:4(1d)] 1 1 2 1
 - attr(*, "dimnames")=List of 1
  ..$ scholarship: chr [1:4] "Merit" "Social" "Studiu1" "Studiu2"
> class(table.1)
[1] "table"

Unidimensional tables are vectors with labeled elements (each element's label is a value of the attribute used in function table)

> names(table.1)
[1] "Merit"   "Social"  "Studiu1" "Studiu2"

Access/display uni-dimensional tables

table.1 is not a data frame, so we cannot qualify the variable using $...

> table.1$Merit
Error in table.1$Merit : $ operator is invalid for atomic vectors

... but we can access it with vector indices

> table.1[1]
Merit
    1

... or list indices

> table.1[[1]]
[1] 1

Display both the label and the value of the 3rd element in table.1:

> table.1[3]
Studiu1
      2

... or

> unlist(table.1)[3]
Studiu1
      2

Access/display uni-dimensional tables (cont.)

Display only the label of the 3rd element of table.1:

> names(table.1) [3]
[1] "Studiu1"

Display only the value of the 3rd element in table.1:

> unlist(table.1)[[3]]
[1] 2

Display both the name and the value of the 3rd element, selected by name:

> table.1["Studiu1"]
Studiu1
      2

Display both the names and the values of two elements, selected by their names:

> table.1[c("Merit", "Studiu1")]
scholarship
  Merit Studiu1
      1       2
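A few related operations on a one-dimensional table, as a hedged sketch with made-up data: catching the error raised by $, dropping the labels with as.vector(), and finding the most frequent category with which.max(). The object names t1 and res are hypothetical.

```r
# Made-up data; t1 plays the role of table.1
scholarship <- c("Merit", "Social", "Studiu1", "Studiu1", "Studiu2")
t1 <- table(scholarship)

# $ is not defined for atomic vectors, so the call fails;
# tryCatch turns the error into a marker value
res <- tryCatch(t1$Merit, error = function(e) "error")
print(res)

print(t1[[3]])                 # value only, by position
print(names(t1)[3])            # label only
print(as.vector(t1))           # counts without labels
print(names(which.max(t1)))    # label of the most frequent value
```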

Bi-dimensional tables
Similar to pivot tables in Excel

Create a contingency (pivot) table with the frequencies of scholarship by lab_assessment

> table.2 <- with(student_gi, table(scholarship, lab_assessment))

> table.2
           lab_assessment
scholarship Slab Bine Foarte bine Excelent
    Merit      0    1           0        0
    Social     0    1           0        0
    Studiu1    1    0           1        0
    Studiu2    0    0           0        1

Structure of table.2

> str(table.2)
 'table' int [1:4, 1:4] 0 0 1 0 1 1 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ scholarship   : chr [1:4] "Merit" "Social" "Studiu1" "Studiu2"
  ..$ lab_assessment: chr [1:4] "Slab" "Bine" "Foarte bine" "Excelent"
> class(table.2)
[1] "table"

Accessing bi-dimensional tables

Any cell can be accessed using the indices of row and column...

> table.2[1,2]
[1] 1

... or the names/labels

> table.2["Merit", "Bine"]
[1] 1

Display the second column (associated with value Bine of lab_assessment) as a vector using the index (2)...

> table.2[, 2]
  Merit  Social Studiu1 Studiu2
      1       1       0       0

... or the name of the column (Bine)

> table.2[, "Bine"]
  Merit  Social Studiu1 Studiu2
      1       1       0       0

Accessing bi-dimensional tables (cont.)

Similarly, one can access individual rows (or groups of rows)

Access particular rows and columns in a table

> table.2[c("Merit", "Studiu1"), c("Slab", "Excelent")]
           lab_assessment
scholarship Slab Excelent
    Merit      0        0
    Studiu1    1        0
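Since student_gi is not defined in this part of the material, the sketch below builds a small made-up data frame whose counts mirror table.2, then repeats the access patterns and adds row and column totals with margin.table(). Because the columns here are plain character vectors, the categories are ordered alphabetically, which may differ from the level order shown in the slides.

```r
# Made-up data frame (a stand-in for student_gi)
students <- data.frame(
  scholarship    = c("Merit", "Social", "Studiu1", "Studiu1", "Studiu2"),
  lab_assessment = c("Bine", "Bine", "Slab", "Foarte bine", "Excelent")
)
t2 <- with(students, table(scholarship, lab_assessment))

print(dim(t2))
print(t2["Studiu1", "Slab"])   # one cell, by labels
print(t2[, "Bine"])            # one column as a named vector
print(margin.table(t2, 1))     # totals per scholarship (rows)
print(margin.table(t2, 2))     # totals per lab_assessment (columns)
```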

Tri-dimensional tables
Create a three-dimensional table with the frequencies of scholarship by lab_assessment by final_grade

> table.3 <- with(student_gi, table(scholarship, lab_assessment,
+     final_grade))

Display table.3

> table.3
, , final_grade = 6

           lab_assessment
scholarship Slab Bine Foarte bine Excelent
    Merit      0    0           0        0
    Social     0    0           0        0
    Studiu1    1    0           0        0
    Studiu2    0    0           0        0

, , final_grade = 9

           lab_assessment
scholarship Slab Bine Foarte bine Excelent
    Merit      0    1           0        0
    Social     0    1           0        0
    Studiu1    0    0           0        0
    Studiu2    0    0           0        0

Tri-dimensional tables (cont.)

Display table.3 (cont.)

, , final_grade = 9.45

           lab_assessment
scholarship Slab Bine Foarte bine Excelent
    Merit      0    0           0        0
    Social     0    0           0        0
    Studiu1    0    0           1        0
    Studiu2    0    0           0        0

, , final_grade = 9.75

           lab_assessment
scholarship Slab Bine Foarte bine Excelent
    Merit      0    0           0        0
    Social     0    0           0        0
    Studiu1    0    0           0        0
    Studiu2    0    0           0        1

ftable
ftable improves the display of three-dimensional tables

> ftable(table.3)
                           final_grade 6 9 9.45 9.75
scholarship lab_assessment
Merit       Slab                       0 0    0    0
            Bine                       0 1    0    0
            Foarte bine                0 0    0    0
            Excelent                   0 0    0    0
Social      Slab                       0 0    0    0
            Bine                       0 1    0    0
            Foarte bine                0 0    0    0
            Excelent                   0 0    0    0
Studiu1     Slab                       1 0    0    0
            Bine                       0 0    0    0
            Foarte bine                0 0    1    0
            Excelent                   0 0    0    0
Studiu2     Slab                       0 0    0    0
            Bine                       0 0    0    0
            Foarte bine                0 0    0    0
            Excelent                   0 0    0    1

Accessing three-dimensional tables

Any cell can be accessed using the indices of the three axes...

> table.3[3, 3, 3]
[1] 1

... or the names/labels (here, the cell at position [4, 4, 4])

> table.3["Studiu2", "Excelent", "9.75"]
[1] 1

Display, as a one-dimensional table, the values of lab_assessment which correspond to value Studiu2 (4th) of scholarship and value 9.75 (4th) of final_grade

one can use the indices...

> table.3[4, , 4]
       Slab        Bine Foarte bine    Excelent
          0           0           0           1

... or the labels/names

> table.3[ "Studiu2", , "9.75" ]
       Slab        Bine Foarte bine    Excelent
          0           0           0           1

Accessing three-dimensional tables (cont.)

Display, as a bi-dimensional table, the values of the first (scholarship) and the third (final_grade) axes associated with the 4th value (Excelent) of the second axis (lab_assessment)

one can use the index...

> table.3[, 4, ]
           final_grade
scholarship 6 9 9.45 9.75
    Merit   0 0    0    0
    Social  0 0    0    0
    Studiu1 0 0    0    0
    Studiu2 0 0    0    1

... or the label/name

> table.3[, "Excelent", ]
           final_grade
scholarship 6 9 9.45 9.75
    Merit   0 0    0    0
    Social  0 0    0    0
    Studiu1 0 0    0    0
    Studiu2 0 0    0    1

Accessing three-dimensional tables (cont.)

One can access particular ranges on each axis

> table.3[c("Merit", "Studiu1"), c("Slab",
+     "Excelent"), c("9.45", "9.75") ]
, , final_grade = 9.45

           lab_assessment
scholarship Slab Excelent
    Merit      0        0
    Studiu1    0        0

, , final_grade = 9.75

           lab_assessment
scholarship Slab Excelent
    Merit      0        0
    Studiu1    0        0
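The three-dimensional case can be reproduced without student_gi by using a small made-up data frame whose counts mirror table.3. The sketch shows a single cell, a one-dimensional slice, a two-dimensional slice and the flat display produced by ftable(). The names students and t3 are hypothetical.

```r
# Made-up data frame (a stand-in for student_gi)
students <- data.frame(
  scholarship    = c("Merit", "Social", "Studiu1", "Studiu1", "Studiu2"),
  lab_assessment = c("Bine", "Bine", "Slab", "Foarte bine", "Excelent"),
  final_grade    = c(9, 9, 6, 9.45, 9.75)
)
t3 <- with(students, table(scholarship, lab_assessment, final_grade))

print(dim(t3))
print(t3["Studiu2", "Excelent", "9.75"])   # a single cell, by labels
print(t3["Studiu2", , "9.75"])             # one-dimensional slice
print(dim(t3[, "Excelent", ]))             # two-dimensional slice
print(ftable(t3))                          # flat display of all cells
```

Leaving an index position empty keeps the whole axis, which is how a slice of lower dimension is obtained. Note that the numeric final_grade values become character labels, so they are quoted when used as indices.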

Built-in datasets

Some datasets are available in base (core) R (e.g. faithful)

> head(faithful, 3)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74

Most data sets are available in packages (e.g. ggplot2, vcd, ...)

In most cases, data sets are stored as data frames, e.g. the dataset movies from package ggplot2 (since ggplot2 2.0.0, movies is distributed in the separate package ggplot2movies)

Every package must be installed (once per computer)

> install.packages("ggplot2")

After installation, a package must be loaded (once for every RStudio session)

> library(ggplot2)

Built-in datasets (cont.)

Display the structure of dataset movies

> str(movies)
'data.frame': 58788 obs. of 24 variables:
 $ title : chr "$" "$1000 a Touchdown" "$21 a Day Once a Month" "$40,000" ...
 $ year  : int 1971 1939 1941 1996 1975 2000 2002 2002 1987 1917 ...
 $ length: int 121 71 7 70 71 91 93 25 97 61 ...
 $ budget: int NA NA NA NA NA NA NA NA NA NA ...
 $ rating: num 6.4 6 8.2 8.2 3.4 4.3 5.3 6.7 6.6 6 ...
 $ votes : int 348 20 5 6 17 45 200 24 18 51 ...
 $ r1    : num 4.5 0 0 14.5 24.5 4.5 4.5 4.5 4.5 4.5 ...
 $ r2    : num 4.5 14.5 0 0 4.5 4.5 0 4.5 4.5 0 ...
 $ r3    : num 4.5 4.5 0 0 0 4.5 4.5 4.5 4.5 4.5 ...
 ...

Built-in dataset stored as table

Data set HairEyeColor, used in the vcd tutorial (http://cran.us.r-project.org/web/packages/vcdExtra/vignettes/vcd-tutorial.pdf), is stored as a three-dimensional table; it ships with the base datasets package, so it is available even without loading vcd

> install.packages("vcd")
> library(vcd)
> head(HairEyeColor)
[1] 32 53 10 3 11 50
> str(HairEyeColor)
 table [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
 - attr(*, "dimnames")=List of 3
  ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
  ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
  ..$ Sex : chr [1:2] "Male" "Female"
> class(HairEyeColor)
[1] "table"
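A short exploration of HairEyeColor using only base R, offered as a sketch. The object name hair_eye is an assumption; margin.table() is used here to collapse the Sex axis into a two-dimensional Hair by Eye table.

```r
# HairEyeColor is available from the datasets package of base R
print(dim(HairEyeColor))
print(names(dimnames(HairEyeColor)))       # names of the three axes
print(HairEyeColor["Black", "Brown", "Male"])

# Collapse the Sex axis: total counts for each Hair x Eye combination
hair_eye <- margin.table(HairEyeColor, c(1, 2))
print(hair_eye)
print(sum(HairEyeColor))                   # total number of observations
```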

Package datasets

R has a special package called datasets

> library(datasets)

Function data displays the data sets available in the currently loaded packages (datasets among them)

> data()

Visualize all the data sets available in all installed packages:

> data(package = .packages(all.available = TRUE))

Display the datasets available in package ggplot2

> try(data(package = "ggplot2") )

... or

> data(package = "ggplot2")$results

A list (made in 2012) of all datasets in R is available at

http://www.public.iastate.edu/~hofmann/data_in_r_sor
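The object returned by data() can also be inspected programmatically. The sketch below uses the datasets package (which comes with R) so that it does not depend on any extra installation; the name ds is hypothetical.

```r
# The $results component is a character matrix, one row per data set
ds <- data(package = "datasets")$results

print(is.matrix(ds))
print(colnames(ds))                          # includes "Item" and "Title"
print(head(ds[, c("Item", "Title")], 3))
print("faithful" %in% ds[, "Item"])          # is faithful listed?
```

Working with $results as a matrix makes it possible to filter or search the list of data sets instead of only reading it in the viewer.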

Data structures conversion

Not all conversions from an object (of one data type) into another object (of another data type) are possible

Generally, function as.data.frame converts an object of another data type into a data frame

Ex: convert a vector into a data frame

> a_vector
 [1]  2  3  4  8  9 10 11 12 13 14
> v_to_df.1 <- as.data.frame(a_vector)
> v_to_df.1
  a_vector
1        2
2        3
3        4
...

Data structures conversion (cont.)

Convert matrix m.4 into a data frame

> m_to_df.1 <- as.data.frame(m.4)
> m_to_df.1
          col.1 col.2 col.3 col.total
row.1         1     2     3         6
row.2         4     5     6        15
row.3         7     8     9        24
row.4        10    11    12        33
row.total    22    26    30        78
> str(m_to_df.1)
'data.frame': 5 obs. of 4 variables:
 $ col.1    : num 1 4 7 10 22
 $ col.2    : num 2 5 8 11 26
 $ col.3    : num 3 6 9 12 30
 $ col.total: num 6 15 24 33 78

Data structures conversion (cont.)

Convert a table into a data frame

> table_to_dataframe = data.frame(unlist(HairEyeColor))
> head(table_to_dataframe, 3)
   Hair   Eye  Sex Freq
1 Black Brown Male   32
2 Brown Brown Male   53
3   Red Brown Male   10

Convert a list into a data frame

> df <- data.frame(matrix(unlist(list.1), nrow=132,
+     byrow=T))
> head(df,3)
  matrix.unlist.list.1...nrow...132..byrow...T.
1                                           unu
2                                           doi
3                                          trei
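The list conversion above passes nrow=132 for a three-element list, so the values are recycled to fill 132 rows, and the column receives an automatically generated name. A possibly cleaner variant, offered only as a sketch, builds the data frame directly from the unlisted values; the column name value and the object names df_list and df_tab are arbitrary choices. The table conversion is shown with as.data.frame(), which gives the same long layout as in the output above.

```r
list.1 <- list("unu", "doi", "trei")

# One row per list component; "value" is an arbitrary column name
df_list <- data.frame(value = unlist(list.1), stringsAsFactors = FALSE)
print(df_list)
print(nrow(df_list))

# A table converts to long format: one row per cell plus a Freq column
df_tab <- as.data.frame(HairEyeColor)
print(head(df_tab, 3))
print(dim(df_tab))
```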
