Midsem Notes
Basic Fundamentals
R is an environment for data manipulation, statistical computing, graphic display
and data analysis. Effective data handling and storage of output is possible.
Simple as well as complicated calculations are possible.
Introduction to R: R is a programming language and environment designed for
statistical computing and graphics. Developed by Ross Ihaka and Robert
Gentleman in the 1990s, it's widely used in data analysis, statistical modeling,
and visualization.
History and Applications: R has its roots in the S language created at Bell
Laboratories. It's used in various fields such as academia, healthcare, and
finance for data analysis and visualization.
Key Features
Comprehensive collection of statistical tools.
Rich graphical capabilities.
Extensible through packages.
Community support and active development.
Advantages of R
Open-source and free to use.
Wide variety of libraries and packages for data analysis and visualization.
Active community and extensive online resources.
Platform-independent (works on Windows, macOS, and Linux).
Excellent for statistical analysis and data manipulation.
Integration with other programming languages like Python and C++
R is an interpreted computer language.
Installation and Use of Software
Downloading R: You can download R from the Comprehensive R Archive
Network (CRAN) at https://cran.r-project.org/. Follow the instructions on the
website to install R for your operating system (Windows, macOS, Linux).
Installing RStudio: RStudio is an integrated development environment (IDE) for
R. It provides a user-friendly interface for writing and executing R code.
Download RStudio from https://rstudio.com/.
Creating a vector
Vector entries can be calculations or previously stored items including vectors
themselves
myvec<-c(1,3,4,2,4)
myvec
foo<-32.1
newvec<-c(1,2,3,4,foo) #a stored item used as a vector entry
myvec3<-c(myvec,newvec) #two vectors combined into a longer vector
Slicing
3:27
foo=5.3
bar=foo:(-47+1.5)
bar
Sequences with seq
seq(from=3,to=27,by=3)
seq(from=3,to=27,length.out=40)
length.out is the number of values in the result, so here 40 equally spaced values are produced, with 39 gaps between them
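A quick check of the length.out behaviour, as a minimal sketch (the name s is only for illustration): the result has 40 values and therefore 39 equal gaps.

```r
# length.out fixes how many values are returned, not the step size
s <- seq(from=3, to=27, length.out=40)
length(s)        # number of values
length(diff(s))  # number of gaps between neighbouring values
diff(s)[1]       # size of each gap, (27-3)/39
```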
Repetition with rep
rep(x=1,times=4)
x=c(3,4,2)
rep(x,times=3)
#3 4 2 3 4 2 3 4 2
rep(x,each=2)
#3 3 4 4 2 2
rep(x,each=2,times=3)
#3 3 4 4 2 2 3 3 4 4 2 2 3 3 4 4 2 2
Sorting with sort
sort(x=c(2,4,5,19,33),decreasing=FALSE)
#for ascending order
foo=seq(from=4.3,to=5.5,length.out=8)
bar=sort(x=foo,decreasing=TRUE)
Vector length
length(x=c(1,2,3,4))
Subsetting and Element Extraction
myvec<-c(5,-2,3,4,4,4,-8)
length(x=myvec)
#7
myvec[1]
#5
foo<-myvec[2]
foo
#-2
myvec[length(myvec)]
#-8
Command line Vs Script
Execution of commands in R is not menu driven.
We need to type the commands.
Both single-line and multi-line commands can be written.
When writing a multi-line program, it is useful to use a text editor rather than
executing everything directly at the command line.
There are 2 options:
1. Use R's own built-in editor, which is accessible from the R GUI menu bar.
2. Use the RStudio software.
Introduction to RStudio
It is an interface between R and us.
It is more useful for beginners.
It makes coding easier.
There are 4 windows in RStudio:
Window 1- Script window, Window 2- Console (calculation takes place here),
Window 3- Environment window (all the variables and objects used
in the program appear here), Window 4- Output window (output appears here).
Cleaning up the windows
We assign names to variables when analysing any data.
It is good practice to remove the variable names given to any data frame at the end
of each session in R.
rm() removes variables (the environment becomes empty once everything in it has been removed with rm).
detach() removes an object from the search path of available R objects.
It is usually a data frame that has been attached or a package that was
attached by library().
To get rid of everything, including data frames, type rm(list=ls())
x=3
y=4
rm(x,y)
library(splines) #loads the packages splines
detach(package:splines) #detaches the package splines
To plot the Histograms
hist(v, main, xlab, xlim, ylim, breaks, col, border)
v: Numerical values used in the histogram.
main: Title of the chart.
xlab: Label for the horizontal axis.
xlim: Range of values for the x-axis.
ylim: Range of values for the y-axis.
breaks: Break points between the bars, or a single number suggesting how many bars to use.
col: Color of the bars.
border: Border color of each bar
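A minimal sketch of a call that sets each of these arguments. The data in v are made-up values used only for illustration, and breaks is given as a single number, which R treats as a suggestion for the number of bins rather than an exact count.

```r
# Illustrative data: 12 made-up values (not from the notes)
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19, 25)
h <- hist(v,
          main="Histogram of v",   # title of the chart
          xlab="Value",            # label for the horizontal axis
          xlim=c(0, 50),           # range shown on the x-axis
          ylim=c(0, 5),            # range shown on the y-axis
          breaks=5,                # suggested number of bins
          col="lightblue",         # fill colour of the bars
          border="black")          # border colour of the bars
h$breaks   # the bin boundaries R actually used
h$counts   # how many values fell in each bin
```

hist() also returns the computed bins, so storing the result in h makes it possible to inspect what was plotted.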
R software commands
1. Installing packages:
install.packages("package_name")
Using R as a calculator
1. R is case sensitive, so capital and small letters are different.
2. * is multiplication; ^ or ** is power.
3. Exponentiation of a vector:
> c(2,3,4,5)^2
[1] 4 9 16 25
> c(2,3,4,5)**2
[1] 4 9 16 25
4. One vector to the power of another:
> c(2,3,5,7)^c(2,3)
[1] 4 27 25 343
5. In the above, the shorter vector is recycled. The length of the longer vector
should be a multiple of the length of the shorter one; otherwise R gives a warning.
6. On multiplying two vectors, each element is multiplied by the element in the
same position of the other vector:
> c(2,3,45,90)*c(2,3)
[1] 4 9 90 270
> c(2,3,45,90)*c(2,3,1,4)
[1] 4 9 45 360
7. If the length of the longer vector is not a multiple of the length of the
shorter one, R gives a warning but still generates the output:
> c(2,3,45)*c(2,3)
[1] 4 9 90
Warning message:
In c(2, 3, 45) * c(2, 3) :
longer object length is not a multiple of shorter object length
Division
Integer Division
Division in which the fractional part (remainder) is discarded.
c(2,3,5,7)%/%c(2,3)
#2%/%2, 3%/%3, 5%/%2, 7%/%3
#1,1,2,2
Modulo Division
Modulo division finds the remainder after the division of one number by the other.
c(2,3,5,7)%%2
#0,1,1,1
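The two operators fit together: the integer quotient times the divisor, plus the remainder, gives back the original number. A small sketch of that check (the names x, y, q and r are only for illustration):

```r
x <- c(2,3,5,7)
y <- c(2,3)      # recycled to 2,3,2,3
q <- x %/% y     # integer quotients: 1 1 2 2
r <- x %% y      # remainders: 0 0 1 1
q*y + r          # rebuilds x: 2 3 5 7
```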
Maximum:max
max(1,2,3,-7)
Inbuilt functions
abs()                                      absolute value
sqrt()                                     square root
round(), floor(), ceiling()                rounding, rounding down, rounding up
sum(), prod()                              sum and product
log(), log10(), log2()                     logarithmic functions
sin(), cos(), tan(), acos(), asin()        trigonometric functions
sinh(), cosh(), tanh(), asinh(), acosh()   hyperbolic functions
mode(x)                                    gives the data type of x
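A few of these built-in functions in use, as a short sketch with arbitrary input values:

```r
abs(-3.5)          # 3.5
sqrt(16)           # 4
round(2.567, 1)    # 2.6
floor(2.567)       # 2
ceiling(2.567)     # 3
sum(c(1,2,3,4))    # 10
prod(c(1,2,3,4))   # 24
log10(1000)        # 3
log2(8)            # 3
mode(c(1,2,3))     # "numeric"
mode("a")          # "character"
```

Function names are case sensitive, so ceiling() must be written in lowercase.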
Functions
Functions are a bunch of commands grouped together in a sensible unit. Functions take
input arguments, do calculations, and return a result that can be stored in a
variable. The returned value can be a complex construct, like a list.
Syntax
Name<-function(Arg1,Arg2)
{
expression
}
#where expression is a single command or a group of commands.
Function arguments can be given meaningful names.
Function arguments can be set to default values.
Functions can have the special argument '...'
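A minimal sketch showing named arguments, a default value, and the special argument, which is written ... in R. The function name scaled_sum and its arguments are hypothetical, chosen only for this example.

```r
# scaled_sum is a hypothetical name used only for this sketch
scaled_sum <- function(values, scale=1, ...) {
  # ... is passed on to sum(), so extra arguments such as na.rm reach it
  total <- sum(values, ...)
  return(total * scale)
}
scaled_sum(c(1,2,3))                         # scale defaults to 1, gives 6
scaled_sum(values=c(1,2,3), scale=10)        # arguments given by name, gives 60
scaled_sum(c(1,2,NA), scale=2, na.rm=TRUE)   # na.rm travels through ..., gives 6
```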
Matrix
In R, a 4x2 matrix can be created with the following command.
x<-matrix(nrow=4,ncol=2,data=c(1,2,3,4,5,6,7,8))
By default the data fills the matrix column by column; byrow=TRUE can be used to fill it row-wise.
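The difference between the two fill orders can be seen side by side. In this sketch the names vals, by_col and by_row are only for illustration.

```r
vals <- c(1,2,3,4,5,6,7,8)
by_col <- matrix(nrow=4, ncol=2, data=vals)              # default: fill column by column
by_row <- matrix(nrow=4, ncol=2, data=vals, byrow=TRUE)  # fill row by row
by_col[1,]   # first row is 1 5
by_row[1,]   # first row is 1 2
```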
Row and col binding
rbind(1:3,4:6)
#1 2 3
#4 5 6
cbind(c(1,4),c(2,5),c(3,6))
The rbind function, short for row-bind, can be used to combine vectors, matrices
and data frames by rows. The cbind function, short for column-bind, can be used
to combine vectors, matrices and data frames by columns.
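dim(mat) gives the dimensions of mat (number of rows, then number of columns)
nrow(mat) gives the number of rows
ncol(mat) gives the number of columns
dim(mat)[1] gives the number of rows and dim(mat)[2] the number of columns (a matrix has
only these two positions, 1 and 2)
diag(x=mat) returns the diagonal elements of mat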
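These functions can be tried on a small matrix. The 2x3 matrix mat below is an illustrative example, not one from the notes.

```r
mat <- rbind(c(1,2,3), c(4,5,6))   # illustrative 2x3 matrix
dim(mat)      # 2 3
nrow(mat)     # 2
ncol(mat)     # 3
dim(mat)[1]   # 2, the number of rows
dim(mat)[2]   # 3, the number of columns
diag(x=mat)   # 1 5, the diagonal elements
```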
Omitting and Overwriting
a[,-2] #second column deleted
a[-1,3:2] #first row deleted; columns 3 and 2 kept, in swapped order
a[-1,-2] #first row and second column deleted
#an index before the comma, as in a[1,], refers to a row; an index after the comma, as in a[,1], refers to a column
To overwrite (B is a 3x3 matrix)
B[2,]<-1:3 #overwrites the second row with 1, 2, 3
B[c(1,3),2]<-900 #sets the 2nd element of rows 1 and 3 to 900
B[,3]<-B[3,] #assigns the values of row 3 to column 3
diag(x=B)<-rep(x=0,times=3) #rep means repeat; sets the diagonal to 0
a<-diag(x=3) #creates a 3x3 identity matrix
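A runnable sketch of omitting and overwriting on a concrete matrix. The matrix A is an illustrative assumption, and the row overwrite is written as B[2,] <- 1:3 on the assumption that replacing the second row is what is intended.

```r
A <- matrix(data=1:9, nrow=3, ncol=3)   # illustrative 3x3 matrix, filled by column
A[,-2]       # all rows, second column dropped
A[-1,3:2]    # first row dropped, columns 3 and 2 in that order
B <- A       # work on a copy so A stays unchanged
B[2,] <- 1:3                     # overwrite the second row
B[c(1,3),2] <- 900               # rows 1 and 3 of column 2 become 900
diag(x=B) <- rep(x=0, times=3)   # zero out the diagonal
B
```

Negative indices only return a smaller copy; the stored matrix changes only when an assignment is made.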
Matrix Transpose
Write a program to print the transpose of a matrix.
#Use t(a)
#Use t(t(a)) to show that the transpose of the transpose is equal to the original matrix.
Scalar Multiplication of matrix
x<-matrix(nrow=2,ncol=2,data=c(1,2,3,4))
x*5
Operations on two matrices
Addition and subtraction happen element by element.
A%*%B does proper matrix multiplication (the number of columns of A must equal the number of rows of B).
a<-rbind(c(2,5,2),c(6,1,4))
a
dim(a)
#2 3
b<-cbind(c(3,-1,1),c(-3,1,5))
dim(b)
#3 2
a%*%b
#3 9
# 21 3
Inverse of a matrix
Find the inverse of a matrix with the solve(a) function, and verify it by multiplying the
inverse with the original matrix to get an identity matrix.
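A minimal sketch of this check. The 2x2 matrix below is an arbitrary invertible example, and all.equal() is used for the comparison because the product can differ from the identity matrix by tiny rounding errors.

```r
a <- matrix(data=c(2,1,1,3), nrow=2, ncol=2)   # illustrative invertible matrix
a_inv <- solve(a)                    # inverse of a
a_inv %*% a                          # identity matrix, up to rounding error
all.equal(a_inv %*% a, diag(x=2))    # TRUE
```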
Multidimensional Array
Just as a matrix is considered to be a collection of vectors of equal length, a 3d
array can be considered to be a collection of equally dimensioned matrices.
arr<-array(data=1:24,dim=c(3,4,2))
arr
arr<-array(data=rep(1:24,times=3),dim=c(3,4,2,3))
arr
Subsets, Extraction, Replacements
arr[a,b,c] where a is row, b is col, c is layer.
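A short sketch of extraction and replacement on the 3x4x2 array from earlier; leaving an index blank selects everything along that dimension.

```r
arr <- array(data=1:24, dim=c(3,4,2))
arr[2,1,2]          # row 2, column 1, layer 2 -> 14
arr[,,1]            # the whole first layer, a 3x4 matrix
arr[2,,]            # row 2 of every layer, returned as a 4x2 matrix
arr[1,1,1] <- 100   # replace a single element
arr[1,1,1]          # 100
```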
3. Data Editing
Use the data.frame or matrix structure to store tabular data.
Example:
data <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
print(data)
Use edit(data) to edit data interactively.
4. Using R as a Calculator
Basic arithmetic:
5 + 3 # Addition
5 - 3 # Subtraction
5 * 3 # Multiplication
5 / 3 # Division
5 ^ 3 # Exponentiation
Functions for math operations:
sqrt(16) # Square root
log(10) # Natural logarithm
log10(100) # Base-10 logarithm
5. Functions and Assignments
Defining Functions
Creating reusable blocks of code (functions)
add <- function(a, b) {
return(a + b)
}
result <- add(5, 3)
print(result)
Assignments
Use <- or = to assign values to variables for computation and data manipulation.
x <- 10
y = 20
z <- x + y
print(z)
6. R Packages
Collections of pre-built functions and datasets in R that extend its functionality for
specialized tasks
Extend R’s capabilities with packages.
To install a package:
install.packages("ggplot2")
Load a package:
library(ggplot2)
Check installed packages:
installed.packages()
7. Expressions, Objects, Symbols, and Functions
Expressions
Any syntactically valid combination of R code that produces a result.
x <- 5 + 3 # Expression
Objects
Everything in R is an object (e.g., vectors, matrices, lists).
Data entities in R, such as vectors or lists, that store information.
Create objects using assignments.
my_vector <- c(1, 2, 3)
Symbols
Names assigned to objects.
x <- 10 # 'x' is a symbol for the value 10
Functions
Built-in or user-defined reusable operations that perform specific tasks.
mean(c(1, 2, 3)) # Built-in function
8. Special Values
Special Constants
Unique constants in R (e.g., NA, NULL, Inf, NaN) that represent missing data,
absence of value, or undefined operations.
NA : Missing value.
NULL : Absence of value.
Inf : Infinity (e.g., 1/0).
NaN : Not a Number (e.g., 0/0).
Examples
x <- c(1, 2, NA, 4)
mean(x, na.rm = TRUE) # Remove NA for calculations
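A slightly longer sketch of how these special values behave, using the same vector x; the checking functions is.na(), is.nan() and is.null() are part of base R.

```r
x <- c(1, 2, NA, 4)
mean(x)                 # NA, because one value is missing
mean(x, na.rm = TRUE)   # mean of 1, 2 and 4
is.na(x)                # FALSE FALSE TRUE FALSE
1/0                     # Inf
0/0                     # NaN
is.nan(0/0)             # TRUE
is.null(NULL)           # TRUE
length(NULL)            # 0
```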