22bce1859 Rprogramming
22BCE1859
LAB-1
Date: 05.01.2024
Aim:
To import data from an Excel file (saved in CSV format) into R using RStudio
Code:
read.csv("C:/Users/student.MAT33/Documents/lab2prb.csv")
Procedure:
Step 1: Create a table in an Excel file with the required data
Step 2: Save the file in CSV format
Step 3: Open RStudio
Step 4: Import the file using the syntax read.csv(" ")
Step 5: Copy and paste the path of the saved file inside the quotes, using forward slashes
Step 6: Run the code
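The steps above can be sketched end to end as follows. This is a minimal illustration, assuming a small made-up table (the column names and values are hypothetical, not the lab data); it writes a CSV to a temporary file, which stands in for saving the Excel sheet as CSV, and reads it back with read.csv().

```r
# Hypothetical data standing in for the table typed into Excel
marks <- data.frame(name = c("A", "B", "C"), score = c(72, 85, 90))

# Save as CSV in a temporary location (stands in for "Save As > CSV" in Excel)
csv_path <- tempfile(fileext = ".csv")
write.csv(marks, csv_path, row.names = FALSE)

# Import the file; the first row is read as the column names by default
imported <- read.csv(csv_path)
print(imported)
str(imported)
```

With a real file, csv_path would simply be replaced by the full path of the saved CSV.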
Input window:
Output:
LAB-2
Date: 12.01.2024
Aim:
To arrange a data set in different ways and produce various graphical representations in RStudio.
Procedure:
Step 1: Create the required data vectors using c()
Step 2: Combine the vectors into a data frame using data.frame()
Step 3: Label the numeric codes as categories using factor() with the labels argument
Step 4: Extract specific rows using subset()
Step 5: Summarize the data using summary()
Step 6: Create a separate table for the required data using table()
Step 7: Visualize the data using graphical representations
Step 8: Run the code
Code:
r=c(1166,1120,1807,1609,1809,1377,1443,1859,5115,5014)
n=c('Udhay','Amogh','Abish','Ramy','Jyothi','Poope','Josh','Srikar','Latha','Renu')
a=c(18,21,19,20,19,23,18,20,22,21)
d=c(0,1,0,0,2,0,1,3,4,2)
h_d=c(0,1,1,1,0,0,1,0,1,1)
g=c(1,0,0,0,1,1,0,1,1,0)
s_info=data.frame(r,n,a,d,h_d,g)
s_info$g=factor(s_info$g,labels=c('male','female'))
s_info$h_d=factor(s_info$h_d,labels=c('Hosteler','Day scholar'))
s_info$d=factor(s_info$d,labels=c('CSE','MECH','CIVIL','EEE','ECE'))
f_s=subset(s_info,s_info$g=='female')
m_s=subset(s_info,s_info$g=='male')
c_s=subset(s_info,s_info$d=='CSE')
summary(s_info)
f_s
m_s
c_s
s_info
x=table(s_info$g,s_info$h_d)
x
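# With a single vector, plot() puts the row index on the x-axis and the values (age) on the y-axis
plot(s_info$a,type='l',main='Age of students',xlab='Student index',ylab='Age',col='green')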
pie(x)
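# One bar per student showing age; the second positional argument of barplot() is bar width, not a second variable
barplot(a,names.arg=n,las=2)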
boxplot(s_info$a~s_info$h_d,col=c('blue','pink'))
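The labelling step relies on how factor() pairs labels with codes: labels are matched to the sorted distinct values, so 0 receives the first label and 1 the second. A minimal sketch with a short made-up vector of codes (g_codes is hypothetical, not the lab data) shows the mapping explicitly:

```r
# Codes as they would be stored in the data: 0 and 1
g_codes <- c(1, 0, 0, 0, 1)

# Stating levels explicitly makes the pairing visible: 0 -> 'male', 1 -> 'female'
g_factor <- factor(g_codes, levels = c(0, 1), labels = c('male', 'female'))
print(g_factor)

# Frequency table of the labelled values
print(table(g_factor))
```

Passing levels explicitly is optional here, but it guards against a label shifting if one of the codes happens to be absent from the data.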
Outputs:
LAB-3
Date: 02.02.2024
Aim:
To write an R program for simple correlation and linear regression
Procedure:
Step 1: Create input data vectors x, y
Step 2: Find the variance of each vector using var()
Step 3: Find the covariance using cov()
Step 4: Find the correlation coefficient using cor()
Step 5: Test the association between the paired data using cor.test()
Step 6: Visualize the data using plot()
Step 7: Fit the linear regression models of x on y and y on x using lm(x~y) and lm(y~x)
Step 8: Visualize both regression lines
Step 9: Run the code
Formula:
Code:
x=c(2,4,6,8,10,11,13,15,17,19,20,23,25,26,28)
y=c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150)
info=data.frame(x,y)
summary(info)
var(x)
var(y)
cov(x,y)
cor(x,y)
cor.test(x,y,method='spearman')
plot(x,y,type='l')
reg1=lm(x~y)
reg1
# reg1 models x = a + b*y, so on the x-y plot it is drawn as y = (x - a)/b
abline(a=-coef(reg1)[1]/coef(reg1)[2],b=1/coef(reg1)[2])
summary(reg1)
reg2=lm(y~x)
reg2
abline(reg2)
summary(reg2)
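The quantities computed in this lab are linked by standard identities: the correlation coefficient is the covariance divided by the product of the standard deviations, and each regression slope is the covariance divided by the variance of the predictor. The sketch below checks these relations on the lab's x and y; the variable names r_manual, b_yx and b_xy are introduced here for illustration only.

```r
x <- c(2,4,6,8,10,11,13,15,17,19,20,23,25,26,28)
y <- c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150)

# Pearson correlation from covariance and standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Regression slopes from covariance and variance
b_yx <- cov(x, y) / var(x)   # slope of y on x
b_xy <- cov(x, y) / var(y)   # slope of x on y

print(c(manual = r_manual, builtin = cor(x, y)))
print(c(manual = b_yx, lm = unname(coef(lm(y ~ x))[2])))
print(c(manual = b_xy, lm = unname(coef(lm(x ~ y))[2])))

# The product of the two slopes equals the squared correlation
print(c(product = b_yx * b_xy, r_squared = cor(x, y)^2))
```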
Outputs:
Visualized graph of data x, y:
LAB-4
Applying multiple linear regression model to real dataset; computing and
interpreting the multiple coefficients of determination
Aim:
To understand the multiple linear regression model with computation and interpretation
using R
Procedure:
Import the data set
Determine the multiple linear regression using R functions
Visualize the multiple linear regression using R functions
CODE:
Y=c(110,80,70,120,150,90,70,120)
Y
X1=c(30,40,20,50,60,40,20,60)
X1
X2=c(11,10,7,15,19,12,8,14)
X2
RegModel=lm(Y~X1+X2)
RegModel
library(scatterplot3d)
# Predictors on the x and y axes, response Y on the vertical (z) axis
scatterplot3d(X1,X2,Y)
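The lab title asks for the coefficient of multiple determination, which the code above does not print. A minimal sketch of how it could be obtained from the same data is shown below, both from summary() and from its definition 1 - SSE/SST; the names r2, sse, sst and r2_manual are introduced here for illustration.

```r
Y  <- c(110,80,70,120,150,90,70,120)
X1 <- c(30,40,20,50,60,40,20,60)
X2 <- c(11,10,7,15,19,12,8,14)

RegModel <- lm(Y ~ X1 + X2)

# R-squared as reported by summary()
r2 <- summary(RegModel)$r.squared

# R-squared from its definition: 1 - (residual sum of squares / total sum of squares)
sse <- sum(residuals(RegModel)^2)
sst <- sum((Y - mean(Y))^2)
r2_manual <- 1 - sse / sst

print(c(summary = r2, manual = r2_manual))

# Adjusted R-squared penalises the number of predictors
print(summary(RegModel)$adj.r.squared)
```

R-squared is read as the proportion of the variation in Y explained jointly by X1 and X2.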
LAB-5
Aim:
To understand discrete probability distribution using R
Procedure:
Input/Import the data set
Determine the probabilities of the random variable using Binomial distribution in R
Visualize the probability distribution using R functions
CODE:
n=4
p = 0.25
mean= n * p
variance = n * p * (1 - p)
mean
variance
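The formulas n*p and n*p*(1-p) can be cross-checked against the definition of expectation by summing over the binomial probabilities. This is a small sketch using the same n and p; the names mean_x and var_x are chosen here to avoid overwriting the values computed above.

```r
n <- 4
p <- 0.25
x <- 0:n

# Probability of each possible number of successes
px <- dbinom(x, n, p)

# Mean and variance from the definition of expectation
mean_x <- sum(x * px)
var_x  <- sum((x - mean_x)^2 * px)

print(c(mean_from_pmf = mean_x, n_times_p = n * p))
print(c(var_from_pmf = var_x, npq = n * p * (1 - p)))
```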
CODE:
n=7
p =1/4
p1=dbinom(2,n,p)
p1
p2=dbinom(0:n,n,p)
p2
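dist_table=data.frame(x=0:n,probability=p2)
dist_table
# names.arg labels the bars; a second positional argument would be read as bar widths
barplot(p2,names.arg=0:n,xlab = "Number of Successes",ylab = "Probability")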
LAB-6
Aim:
To understand Poisson distribution and Normal distribution using R functions
Poisson Distribution
Procedure:
1. A manufacturer of pins knows that 2% of his products are defective. If he sells pins in boxes of
20, find the number of boxes containing (i) at least 2 defective, (ii) exactly 2 defective, (iii) at
most 2 defective pins in a consignment of 1000 boxes. Also (iv) plot the distribution and find
(v) E(X) and (vi) the variance of X.
CODE:
m=20
ps=0.02
lambda=m*ps
lambda
# P(X >= 2) = 1 - P(X <= 1); the Poisson variable is not bounded above by m
p1=1-ppois(1,lambda)
p1
round(1000*p1)
p2=dpois(2,lambda)
p2
round(1000*p2)
p3=sum(dpois(0:2, lambda))
p3
round(1000*p3)
x1=0:m
px1=dpois(x1,lambda)
plot(x1,px1,type="h",xlab="values of x",ylab="Probability distribution of x",main="Poisson distribution")
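Parts (v) and (vi) of the problem are not computed in the code above. For a Poisson variable both E(X) and Var(X) equal lambda; the sketch below confirms this numerically by summing over the probabilities. The upper limit of 50 is an assumption made here only to truncate the infinite support; the probability beyond it is negligible for lambda = 0.4.

```r
m <- 20
ps <- 0.02
lambda <- m * ps

# Truncated support; the mass beyond 50 is negligible for this lambda
x <- 0:50
px <- dpois(x, lambda)

# Mean and variance from the definition of expectation
ex <- sum(x * px)
vx <- sum((x - ex)^2 * px)

print(c(E_X = ex, Var_X = vx, lambda = lambda))
```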
Normal distribution:
Procedure:
Generating the data set
Determine the probabilities of the random variable using Normal distribution in R
Visualize the probability distribution using R functions
A company finds that the time taken by one of its engineers to complete a repair job has a
normal distribution with mean 20 minutes and S.D. 5 minutes. State what proportion of jobs take:
i. Less than 15 minutes
ii. More than 25 minutes
iii. Between 15 and 25 minutes
iv. Plot the distribution
v. Table the distribution
Code:
x=seq(0,40)
x
y=dnorm(x,mean=20,sd=5)
y
plot(x,y,type='l')
p1=pnorm(15,mean=20,sd=5)
p1
x2=seq(0,15)
x2
y2=dnorm(x2,mean=20,sd=5)
y2
polygon(c(0,x2,15),c(0,y2,0),col='blue')
# P(X > 25) is the full upper tail, not only the part up to 40
p2=1-pnorm(25,mean=20,sd=5)
p2
x1=seq(25,40)
x1
y1=dnorm(x1,mean=20,sd=5)
y1
polygon(c(25,x1,40),c(0,y1,0),col='red')
p3=pnorm(25,mean=20,sd=5)-pnorm(15,mean=20,sd=5)
p3
x3=seq(15,25)
x3
y3=dnorm(x3,mean=20,sd=5)
y3
polygon(c(15,x3,25),c(0,y3,0),col='purple')
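Part (v) asks for a table of the distribution. One possible sketch is a data frame of density and cumulative probability at a few time points; the 5-minute spacing and the name norm_table are choices made here for illustration. The last lines check that the three proportions from parts (i) to (iii) add up to 1.

```r
# Time points every 5 minutes from 0 to 40
x <- seq(0, 40, by = 5)

norm_table <- data.frame(
  x = x,
  density = dnorm(x, mean = 20, sd = 5),
  cumulative = pnorm(x, mean = 20, sd = 5)
)
print(norm_table)

# The three regions partition the whole range, so their probabilities sum to 1
p_less    <- pnorm(15, mean = 20, sd = 5)
p_more    <- pnorm(25, mean = 20, sd = 5, lower.tail = FALSE)
p_between <- pnorm(25, mean = 20, sd = 5) - pnorm(15, mean = 20, sd = 5)
print(c(less = p_less, between = p_between, more = p_more))
print(p_less + p_between + p_more)
```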
OUTPUT:
LAB-7
Testing of hypothesis for one sample mean and proportion
from real-time problems
xbar=13.2
mu0=22
sigma=2.5
n=20
z=(xbar-mu0)/(sigma/sqrt(n))
z
alpha=0.05
zhalfalpha=qnorm(1-(alpha/2))
zhalfalpha
c(-zhalfalpha,zhalfalpha)
# Two-sided p-value; -abs(z) keeps it valid whether z is negative or positive
pval=2*pnorm(-abs(z))
pval
if(pval>alpha){print("Accept Null hypothesis")} else{print("Reject Null hypothesis")}
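The same calculation can be wrapped in a small function so that other values can be tested without retyping the steps. This is a sketch only: the function name z_test_one_mean and its return fields are hypothetical, and it assumes a two-sided test with known population standard deviation, as in the code above.

```r
# Two-sided one-sample z-test with known sigma (hypothetical helper)
z_test_one_mean <- function(xbar, mu0, sigma, n, alpha = 0.05) {
  z <- (xbar - mu0) / (sigma / sqrt(n))   # test statistic
  pval <- 2 * pnorm(-abs(z))              # two-sided p-value
  list(z = z,
       critical = qnorm(1 - alpha / 2),   # two-sided critical value
       p.value = pval,
       reject = pval < alpha)
}

# Values used in this lab
res <- z_test_one_mean(xbar = 13.2, mu0 = 22, sigma = 2.5, n = 20)
print(res)
```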
Testing of Hypothesis - Large Sample proportion Test
1. The fatality rate of typhoid patients is believed to be 19.41%. In a certain year 720 patients
suffering from typhoid were treated in a metropolitan hospital and only 53 patients died. Can
you consider the hospital efficient?
n=720
Sprop=53/n
Sprop
Pprop=0.1941
q=1-Pprop
q
z=(Sprop-Pprop)/sqrt(Pprop*q/n)
z
E=qnorm(.975)
c(-E,E)
Sprop+c(-E,E)*sqrt(Pprop*(1-Pprop)/n)
# The hospital counts as efficient only if its fatality rate is significantly below 19.41%
if(z < -E){print("Hospital is efficient")} else{print("Hospital is not efficient")}
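As a cross-check, base R's prop.test() can run the same one-sample proportion test. In the sketch below, correct = FALSE switches off the continuity correction so that the reported chi-squared statistic equals the square of the z value computed by hand; alternative = "less" is an assumption made here to reflect the question of whether the fatality rate is lower than 19.41%.

```r
n <- 720
deaths <- 53
Pprop <- 0.1941

# z statistic computed by hand, as in the lab code
z <- (deaths / n - Pprop) / sqrt(Pprop * (1 - Pprop) / n)

# Built-in test without continuity correction
res <- prop.test(deaths, n, p = Pprop, alternative = "less", correct = FALSE)

print(c(z_squared = z^2, chi_squared = unname(res$statistic)))
print(res$p.value)
```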
LAB-8
Testing of hypothesis for two sample means and proportion
from real-time problems
1. In a random sample of size 300, the mean is found to be 5. In another independent sample of
size 200, the mean is 10. Could the samples have been drawn from the same population with S.D
3?
# Values taken from the problem statement above
xbar=5
ybar=10
sigma=3
n1=300
n2=200
z=(xbar-ybar)/(sigma*sqrt((1/n1)+(1/n2)))
z
alpha=0.05
alpha
zalpha=qnorm(1-(alpha/2))
zalpha
# Two-sided test: compare the absolute value of z with the critical value
if(abs(z)<=zalpha){print("Accept Null hypothesis")} else{print("Reject Null hypothesis")}
2. In a large city A, 40% of a random sample of 900 school boys had a slight physical defect. In
another large city B, 19% of a random sample of 1000 school boys had the same defect. Is the
difference between the proportions significant?
p1=0.40
p2=0.19
n1=900
n2=1000
P=(n1*p1+n2*p2)/(n1+n2)
P
Q=1-P
# Standard error of the difference is sqrt(P*Q*(1/n1 + 1/n2)), with a single square root
z=(p1-p2)/sqrt(P*Q*((1/n1)+(1/n2)))
z
alpha=0.05
alpha
zalpha=qnorm(1-(alpha/2))
zalpha
# Two-sided test: compare the absolute value of z with the critical value
if(abs(z)<=zalpha){print("Accept Null hypothesis")} else{print("Reject Null hypothesis")}
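The two-proportion result can be cross-checked with prop.test(), which pools the proportions in the same way when correct = FALSE. The sketch below converts the stated percentages to counts (40% of 900 is 360, 19% of 1000 is 190); the names x_counts and n_sizes are introduced here for illustration.

```r
# Number of boys with the defect and sample sizes in cities A and B
x_counts <- c(360, 190)
n_sizes  <- c(900, 1000)

# z statistic by hand with the pooled proportion
p_hat <- x_counts / n_sizes
P <- sum(x_counts) / sum(n_sizes)
z <- (p_hat[1] - p_hat[2]) / sqrt(P * (1 - P) * sum(1 / n_sizes))

# Built-in test without continuity correction; its statistic is z squared
res <- prop.test(x_counts, n_sizes, correct = FALSE)

print(c(z = z, z_squared = z^2, chi_squared = unname(res$statistic)))
print(res$p.value)
```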
LAB-9
Applying the t-test for independent and dependent samples
1. Student’s t-test
Two independent samples of sizes 8 and 7 contained the following values:
Sample 1 - 15 21 28 35 34 23 24 24
Sample 2 - 26 20 15 23 25 33 29 20
Is the difference between the sample means significant?
test1=c(41,38,47,32,39,44,40,37)
test2=c(31,39,44,40,32,48,40,33)
t=t.test(test1,test2,paired=TRUE)
t
alpha=0.05
tv=t$p.value
tv
if(tv >alpha){print("Accept Ho")} else{print("Reject Ho")}
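The code above applies a paired test to test1 and test2. For the two independent samples quoted in the problem statement, a sketch using t.test() with var.equal = TRUE (the pooled-variance Student's test, which is an assumption here) could look like the following. The statement gives sizes 8 and 7 but lists eight values for each sample; the values are used as listed.

```r
sample1 <- c(15, 21, 28, 35, 34, 23, 24, 24)
sample2 <- c(26, 20, 15, 23, 25, 33, 29, 20)

# Independent two-sample t-test with pooled variance
res <- t.test(sample1, sample2, var.equal = TRUE)
print(res)

alpha <- 0.05
pv <- res$p.value
pv
if (pv > alpha) print("Accept Ho") else print("Reject Ho")
```

Dropping var.equal = TRUE would give Welch's test, which does not assume equal variances.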
F-test
1.Two independent samples of sizes 8 and 7 contained the following values:
Sample 1 - 15 21 28 35 34 23 24 24
Sample 2 - 26 20 15 23 25 33 29 20
Is the difference between the sample means significant?
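No code accompanies the F-test heading in this record. A minimal sketch using base R's var.test(), which compares the variances of two independent samples, is given below on the values listed above; treating the question as a comparison of variances is an assumption made here, since that is what an F-test examines.

```r
sample1 <- c(15, 21, 28, 35, 34, 23, 24, 24)
sample2 <- c(26, 20, 15, 23, 25, 33, 29, 20)

# F-test for equality of two variances; the statistic is var(sample1) / var(sample2)
res <- var.test(sample1, sample2)
print(res)

alpha <- 0.05
pv <- res$p.value
pv
if (pv > alpha) print("Accept Ho") else print("Reject Ho")
```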
1. Six coins are tossed 256 times. The observed number of heads is given below. Examine whether
the coins are unbiased by applying the chi-square goodness-of-fit test, with expected frequencies
from the binomial distribution.
No. of heads    0    1    2    3    4    5
Frequency      25   15    5   76   52   12
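alpha=0.05
obf = c(25,15,5,76,52,12)
# The table lists six classes (0 to 5 heads), so n = 5 is used to match it
n=length(obf)-1
# The listed frequencies sum to 185, so expected counts are scaled to that total
N=sum(obf)
P = 0.5
x = c(0:n);x
exf = (dbinom(x,n,P)*N)
exf
sum(obf)
sum(exf)
chisq=sum((obf-exf)^2/exf)
cv = chisq
cv
# Degrees of freedom = number of classes - 1 = n
tv = qchisq(1-alpha,n)
tv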
if(cv <= tv){print("Accept H0/Fit is good")} else{print("Reject H0/Fit is not good")}
2. From the following information, state whether the condition of the child is associated with the
condition of the house.
Condition of child    House clean    House dirty
Clean                 41             20
Fairly clean          13             31
Dirty                 27             36
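data=matrix(c(41,20,13,31,27,36),ncol=2,byrow=TRUE)
data
# Chi-square test of independence on the 3 x 2 table
test=chisq.test(data)
test
alpha=0.05
pv=test$p.value
pv
if(pv>alpha){print("Accept H0: attributes are independent")} else{print("Reject H0: attributes are associated")}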