22BCE1859 R Programming

SRIKAR VEDANABHATLA

22BCE1859

LAB-1
Date: 05.01.2024

Importing Data from Excel files

Aim:
To import data from an Excel file (saved as CSV) into RStudio using R code
Code:
read.csv("C:/Users/student.MAT33/Documents/lab2prb.csv")
Procedure:
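Step 1: Create a table with the required data in an Excel file
Step 2: Save the file in CSV format (read.csv() reads comma-separated files, not .xlsx workbooks)
Step 3: Open RStudio
Step 4: Import the file using read.csv()
Step 5: Copy the path of the file and paste it inside read.csv(), using forward slashes (/) instead of Windows backslashes
Step 6: Run the code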
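As a self-contained sketch (not part of the original lab), the round trip below writes a small hypothetical data frame to a temporary CSV file and imports it with read.csv(), so the import step can be tried without the lab's file path. The column names `name` and `marks` and their values are made up for illustration. Importing a native .xlsx workbook directly would need an add-on package such as readxl, which the lab does not use.

```r
# Hypothetical example data standing in for the Excel sheet
students <- data.frame(name = c("A", "B", "C"), marks = c(78, 85, 91))

# Save it as CSV (what "Save As > CSV" does in Excel) to a temporary path
path <- tempfile(fileext = ".csv")
write.csv(students, path, row.names = FALSE)

# Import it back, exactly as in the lab
imported <- read.csv(path)
str(imported)   # check column names and types
head(imported)  # first rows
```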
Input window:

Output:
LAB-2
Date: 12.01.2024

Computing Summary Statistics/Plotting and Visualizing Data using Tabulation and Graphical Representation

Aim:
To organize a data set, compute summary statistics, tabulate the data and produce various graphical representations in RStudio.

Procedure:
Step 1: Create the data vectors in R using c()
Step 2: Combine the vectors into a data frame
Step 3: Convert the numeric codes into labelled categories using factor() with labels
Step 4: Extract specific groups using subset()
Step 5: Summarize the statistics of the data using summary()
Step 6: Create a contingency table for the required variables using table()
Step 7: Visualize the data with line, pie, bar and box plots
Step 8: Run the code

Code:
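r=c(1166,1120,1807,1609,1809,1377,1443,1859,5115,5014)   # register numbers
n=c('Udhay','Amogh','Abish','Ramy','Jyothi','Poope','Josh','Srikar','Latha','Renu')
a=c(18,21,19,20,19,23,18,20,22,21)   # age
d=c(0,1,0,0,2,0,1,3,4,2)             # department code
h_d=c(0,1,1,1,0,0,1,0,1,1)           # 0 = hosteler, 1 = day scholar
g=c(1,0,0,0,1,1,0,1,1,0)             # 0 = male, 1 = female
s_info=data.frame(r,n,a,d,h_d,g)
# factor() maps the sorted codes (0, 1, ...) to the labels in order
s_info$g=factor(s_info$g,labels=c('male','female'))
s_info$h_d=factor(s_info$h_d,labels=c('Hosteler','Day scholar'))
s_info$d=factor(s_info$d,labels=c('CSE','MECH','CIVIL','EEE','ECE'))
f_s=subset(s_info,g=='female')
m_s=subset(s_info,g=='male')
c_s=subset(s_info,d=='CSE')
summary(s_info)
f_s
m_s
c_s
s_info
x=table(s_info$g,s_info$h_d)
x
# a single vector is plotted against its index, so age goes on the y axis
plot(s_info$a,type='l',main='Age of students',xlab='Student',ylab='Age',col='green')
pie(x,labels=outer(rownames(x),colnames(x),paste))   # one labelled slice per gender/residence cell
# barplot(r,a) would use a as bar widths; plot ages with register numbers as bar labels
barplot(s_info$a,names.arg=s_info$r,xlab='Reg no',ylab='Age')
boxplot(a~h_d,data=s_info,col=c('blue','pink'))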
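The contingency table from this lab can be extended with margins and row proportions. The sketch below rebuilds only the gender, residence and age vectors used above, then uses addmargins(), prop.table() and tapply() (all in base R) to add totals, the within-gender share of hostelers and day scholars, and the mean age of each residence group.

```r
g   <- factor(c(1,0,0,0,1,1,0,1,1,0), labels = c('male', 'female'))
h_d <- factor(c(0,1,1,1,0,0,1,0,1,1), labels = c('Hosteler', 'Day scholar'))
a   <- c(18,21,19,20,19,23,18,20,22,21)

x <- table(g, h_d)          # same two-way table as in the lab
addmargins(x)               # add row and column totals
prop.table(x, margin = 1)   # share of hostelers / day scholars within each gender

# Mean age in each residence group
tapply(a, h_d, mean)
```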
Outputs:
LAB-3
Date: 02.02.2024

Applying correlation and simple linear regression model to data set

Aim:
To compute simple correlation and fit simple linear regression models in R
Procedure:
Step 1: Create the input data vectors x and y
Step 2: Find their variances using var()
Step 3: Find their covariance using cov()
Step 4: Find the correlation coefficient using cor()
Step 5: Test the association between the paired data using cor.test()
Step 6: Visualize the data using plot()
Step 7: Fit the regression of x on y and of y on x using lm(x~y) and lm(y~x)
Step 8: Draw both regression lines on the plot
Step 9: Run the code
Formula:
Code:
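x=c(2,4,6,8,10,11,13,15,17,19,20,23,25,26,28)
y=c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150)
info=data.frame(x,y)
summary(info)
var(x)
var(y)
cov(x,y)
cor(x,y)
cor.test(x,y,method='spearman')   # rank-based test; method='pearson' tests the value from cor(x,y)
plot(x,y,type='l')
reg1=lm(x~y)   # regression of x on y: x = b0 + b1*y
reg1
# abline(reg1) would draw y = b0 + b1*x; rearrange x = b0 + b1*y into y = (x - b0)/b1
cf=coef(reg1)
abline(a=-cf[1]/cf[2],b=1/cf[2],col='red')
summary(reg1)
reg2=lm(y~x)   # regression of y on x: y = a0 + a1*x
reg2
abline(reg2,col='blue')
summary(reg2)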
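The formula behind each R function can be checked numerically. This sketch recomputes Pearson's r as cov(x, y)/(sd(x) sd(y)) and the two regression slopes as b_yx = cov(x, y)/var(x) and b_xy = cov(x, y)/var(y), compares them with cor() and lm(), and checks the standard identity b_yx * b_xy = r^2.

```r
x <- c(2,4,6,8,10,11,13,15,17,19,20,23,25,26,28)
y <- c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150)

r    <- cov(x, y) / (sd(x) * sd(y))  # Pearson correlation coefficient
b_yx <- cov(x, y) / var(x)           # slope of the regression of y on x
b_xy <- cov(x, y) / var(y)           # slope of the regression of x on y

all.equal(r, cor(x, y))
all.equal(b_yx, unname(coef(lm(y ~ x))[2]))
all.equal(b_xy, unname(coef(lm(x ~ y))[2]))
all.equal(b_yx * b_xy, r^2)          # product of the two slopes equals r squared
```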
Outputs:
Visualized graph of data x, y:
LAB-4
Applying a multiple linear regression model to a real dataset; computing and interpreting the multiple coefficient of determination

Aim:
To understand the multiple linear regression model, including its computation and interpretation, using R
Procedure:
Import the data set
Determine the multiple linear regression using R functions
Visualize the multiple linear regression using R functions

CODE:
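Y=c(110,80,70,120,150,90,70,120)
Y
X1=c(30,40,20,50,60,40,20,60)
X1
X2=c(11,10,7,15,19,12,8,14)
X2
RegModel=lm(Y~X1+X2)
RegModel
library(scatterplot3d)   # install once with install.packages("scatterplot3d")
# scatterplot3d(x, y, z): put the response Y on the vertical z axis
s3d=scatterplot3d(X1,X2,Y)
s3d$plane3d(RegModel)    # add the fitted regression plane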
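The lab title asks for the multiple coefficient of determination, which the code above does not print. A minimal sketch, using the same data, reads R² and adjusted R² from summary() and recomputes R² as 1 - SSE/SST, the proportion of the variation in Y explained by X1 and X2.

```r
Y  <- c(110,80,70,120,150,90,70,120)
X1 <- c(30,40,20,50,60,40,20,60)
X2 <- c(11,10,7,15,19,12,8,14)
RegModel <- lm(Y ~ X1 + X2)

s <- summary(RegModel)
s$coefficients    # estimates, standard errors, t values, p-values
s$r.squared       # multiple coefficient of determination R^2
s$adj.r.squared   # R^2 adjusted for the number of predictors

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
r2_manual <- 1 - sum(residuals(RegModel)^2) / sum((Y - mean(Y))^2)
all.equal(r2_manual, s$r.squared)
```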
LAB-5

Fitting the probability distributions: Binomial distribution

Aim:
To understand discrete probability distribution using R
Procedure:
Input/Import the data set
Determine the probabilities of the random variable using Binomial distribution in R
Visualize the probability distribution using R functions

1. Suppose a random variable X ~ B(4, 0.25). Find the mean and variance of X using proper R code.

CODE:
n=4
p=0.25
mu=n*p               # mean of a binomial variable: n*p
variance=n*p*(1-p)   # variance of a binomial variable: n*p*(1-p)
mu
variance
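The formulas n*p and n*p*(1-p) can be cross-checked directly from the probability mass function. This sketch computes E(X) = sum of x P(X = x) and Var(X) = E(X^2) - [E(X)]^2 with dbinom() for X ~ B(4, 0.25).

```r
n <- 4
p <- 0.25
x <- 0:n
px <- dbinom(x, n, p)            # P(X = x) for every possible value

mu <- sum(x * px)                # E(X)
sigma2 <- sum(x^2 * px) - mu^2   # Var(X) = E(X^2) - [E(X)]^2
mu       # should equal n * p
sigma2   # should equal n * p * (1 - p)
```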

2. Let X be the number of heads in 10 tosses of a fair coin.


(i) Find the probability of getting at least 5 heads (that is, 5 or more).
(ii) Find the probability of getting exactly 5 heads.
(iii) Find the probability of getting between 4 and 6 heads, inclusive
Note: For question 2, use two different R methods to find the answers.
CODE:
n = 10
p = 0.5
p1=sum(dbinom(5:10,n,p))
p1
p2=dbinom(5,n,p)
p2
p3=sum(dbinom(4:6,n,p))
p3
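The note asks for two different methods, and the code above uses only dbinom(). A second approach, sketched here, uses the cumulative distribution function pbinom() and differences of cumulative probabilities; it should return the same three answers.

```r
n <- 10
p <- 0.5
# (i) P(X >= 5) = 1 - P(X <= 4)
p1_alt <- 1 - pbinom(4, n, p)
# (ii) P(X = 5) = P(X <= 5) - P(X <= 4)
p2_alt <- pbinom(5, n, p) - pbinom(4, n, p)
# (iii) P(4 <= X <= 6) = P(X <= 6) - P(X <= 3)
p3_alt <- pbinom(6, n, p) - pbinom(3, n, p)
c(p1_alt, p2_alt, p3_alt)
```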

3. For a Binomial(7, 1/4) random variable X,
(i) Compute the probability of two successes
(ii) Compute the probabilities for the whole sample space
(iii) Display those probabilities in a table
(iv) Show the shape of this binomial distribution

CODE:

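n=7
p=1/4
p1=dbinom(2,n,p)
p1
p2=dbinom(0:n,n,p)
p2
prob_table=data.frame(x=0:n,probability=p2)   # named prob_table to avoid masking table()
prob_table
# the second positional argument of barplot() is the bar width, so pass the labels via names.arg
barplot(p2,names.arg=0:n,xlab="Number of Successes",ylab="Probability")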
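One way to show the shape of the distribution empirically, not used in the original lab, is to simulate it. The sketch draws 10,000 values with rbinom() (the seed is arbitrary) and places the simulated proportions next to the exact dbinom() probabilities in a grouped bar chart.

```r
set.seed(1)                     # arbitrary seed, for reproducibility
n <- 7
p <- 1/4
sims <- rbinom(10000, n, p)     # 10,000 simulated values of X
emp  <- table(factor(sims, levels = 0:n)) / length(sims)  # simulated proportions
theo <- dbinom(0:n, n, p)                                 # exact probabilities

round(rbind(simulated = as.numeric(emp), exact = theo), 4)
barplot(rbind(as.numeric(emp), theo), beside = TRUE, names.arg = 0:n,
        legend.text = c("simulated", "exact"),
        xlab = "Number of Successes", ylab = "Probability")
```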
LAB-6

Normal distribution, Poisson distribution

Aim:
To understand Poisson distribution and Normal distribution using R functions

Poisson Distribution

Procedure:

Import the data set
Determine the probabilities of the random variable using the Poisson distribution in R
Visualize the probability distribution using R functions

1. A manufacturer of pins knows that 2% of his products are defective. If he sells pins in boxes of 20, find the number of boxes containing (i) at least 2 defective pins, (ii) exactly 2 defective pins and (iii) at most 2 defective pins in a consignment of 1000 boxes. Also (iv) plot the distribution and find (v) E(X) and (vi) the variance of X.
CODE:
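m=20          # pins per box
ps=0.02       # probability that a pin is defective
lambda=m*ps   # Poisson approximation to Binomial(20, 0.02)
lambda
p1=ppois(1,lambda,lower.tail=FALSE)   # (i) P(X >= 2) = 1 - P(X <= 1)
p1
round(1000*p1)
p2=dpois(2,lambda)                    # (ii) P(X = 2)
p2
round(1000*p2)
p3=ppois(2,lambda)                    # (iii) P(X <= 2)
p3
round(1000*p3)
x1=0:m
px1=dpois(x1,lambda)
plot(x1,px1,type="h",xlab="values of x",ylab="Probability distribution of x",main="Poisson distribution")   # (iv)
c(EX=lambda,VarX=lambda)   # (v), (vi): for a Poisson variable E(X) = Var(X) = lambda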
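The Poisson model here is an approximation: the number of defective pins in a box of 20 is exactly Binomial(20, 0.02). This sketch compares the two for 0, 1 and 2 defectives, and the expected number of boxes out of 1000 with exactly 2 defectives, to show how close the approximation is for small p.

```r
m  <- 20
ps <- 0.02
lambda <- m * ps

exact  <- dbinom(0:2, m, ps)   # exact Binomial(20, 0.02) probabilities
approx <- dpois(0:2, lambda)   # Poisson(0.4) approximations
round(data.frame(x = 0:2, binomial = exact, poisson = approx), 4)

# Expected number of boxes (out of 1000) with exactly 2 defectives under each model
round(1000 * c(binomial = exact[3], poisson = approx[3]))
```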
Normal distribution:

Procedure:
Generating the data set
Determine the probabilities of the random variable using Normal distribution in R
Visualize the probability distribution using R functions

A company finds that the time taken by one of its engineers to complete a repair job has a normal distribution with mean 20 minutes and standard deviation 5 minutes. What proportion of jobs take:
i. Less than 15 minutes
ii. More than 25 minutes
iii. Between 15 and 25 minutes
iv. Plot the distribution
v. Tabulate the distribution

Code:
x=seq(0,40)
y=dnorm(x,mean=20,sd=5)
data.frame(x,y)   # (v) table of the density at whole minutes
plot(x,y,type='l',xlab='Time (minutes)',ylab='Density')   # (iv)
# (i) P(X < 15)
p1=pnorm(15,mean=20,sd=5)
p1
x2=seq(0,15)
y2=dnorm(x2,mean=20,sd=5)
polygon(c(0,x2,15),c(0,y2,0),col='blue')
# (ii) P(X > 25): the whole upper tail, not pnorm(40)-pnorm(25)
p2=pnorm(25,mean=20,sd=5,lower.tail=FALSE)
p2
x1=seq(25,40)
y1=dnorm(x1,mean=20,sd=5)
polygon(c(25,x1,40),c(0,y1,0),col='red')
# (iii) P(15 < X < 25)
p3=pnorm(25,mean=20,sd=5)-pnorm(15,mean=20,sd=5)
p3
x3=seq(15,25)
y3=dnorm(x3,mean=20,sd=5)
polygon(c(15,x3,25),c(0,y3,0),col='purple')
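The same three proportions can be obtained from the standard normal distribution after standardizing with Z = (X - mu)/sigma, which is how they are read from printed z-tables. A short sketch:

```r
mu <- 20
sigma <- 5
z <- function(x) (x - mu) / sigma   # standardize: Z = (X - mu) / sigma

z(15)  # -1
z(25)  #  1
# Same proportions using the standard normal N(0, 1)
p_less_15 <- pnorm(z(15))
p_more_25 <- 1 - pnorm(z(25))
p_between <- pnorm(z(25)) - pnorm(z(15))
c(p_less_15, p_more_25, p_between)
```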

OUTPUT:
Lab-7
Testing of hypothesis for one sample mean and proportion from real-time problems

Testing of Hypothesis - Large Sample mean Test


1. Suppose the mean weight of King Penguins found in an Antarctic colony last year was 22 kg. In a sample of 20 penguins taken at the same time this year in the same colony, the mean penguin weight is 13.2 kg. Assume the population standard deviation is 2.5 kg. At the 0.05 significance level, can we reject the null hypothesis that the mean penguin weight does not differ from last year?

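xbar=13.2    # sample mean (kg)
mu0=22       # hypothesized mean = last year's mean (kg)
sigma=2.5    # known population SD (kg)
n=20
z=(xbar-mu0)/(sigma/sqrt(n))
z
alpha=0.05
zhalfalpha=qnorm(1-(alpha/2))
zhalfalpha
c(-zhalfalpha,zhalfalpha)   # acceptance region for z
pval=2*pnorm(-abs(z))       # two-sided p-value, valid for either sign of z
pval
if(pval>alpha){print("Accept Null hypothesis")} else{print("Reject Null hypothesis")}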
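An equivalent way to reach the same decision, sketched here, is the confidence-interval approach: H0 is rejected at the 5% level exactly when the hypothesized mean of 22 kg lies outside the 95% interval xbar ± z(alpha/2) * sigma/sqrt(n).

```r
xbar  <- 13.2
mu0   <- 22
sigma <- 2.5
n     <- 20
alpha <- 0.05

se <- sigma / sqrt(n)              # standard error of the mean
zc <- qnorm(1 - alpha / 2)         # two-sided critical value, about 1.96
ci <- xbar + c(-1, 1) * zc * se    # 95% confidence interval for mu
ci
if (mu0 >= ci[1] && mu0 <= ci[2]) print("Accept Null hypothesis") else print("Reject Null hypothesis")
```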
Testing of Hypothesis - Large Sample proportion Test
1. The fatality rate of typhoid patients is believed to be 19.41%. In a certain year, 720 patients suffering from typhoid were treated in a metropolitan hospital and only 53 of them died. Can the hospital be considered efficient?

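n=720
Sprop=53/n       # sample proportion of deaths
Sprop
Pprop=0.1941     # hypothesized population proportion
q=1-Pprop
q
z=(Sprop-Pprop)/sqrt(Pprop*q/n)
z
E=qnorm(.975)    # two-sided 5% critical value
c(-E,E)
Sprop+c(-E,E)*sqrt(Pprop*q/n)   # 95% interval around the sample proportion
# only a significantly LOWER death rate (z below -E) indicates an efficient hospital
if(z<=-E){print("Hospital is efficient")} else{print("Hospital is not efficient")}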
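As a cross-check, base R's prop.test() performs the same one-sample test. With correct = FALSE (no continuity correction) its X-squared statistic equals the square of the z statistic, and its p-value is the two-sided z-test p-value. This is a sketch; the lab itself computes z by hand.

```r
n <- 720
x <- 53
Pprop <- 0.1941
z <- (x / n - Pprop) / sqrt(Pprop * (1 - Pprop) / n)

res <- prop.test(x, n, p = Pprop, correct = FALSE)  # no continuity correction
res$statistic   # X-squared, equals z^2
res$p.value     # two-sided p-value, equals 2 * pnorm(-abs(z))
```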
Lab-8
Testing of hypothesis for two sample means and proportions from real-time problems

1. In a random sample of size 300, the mean is found to be 5. In another independent sample of size 200, the mean is 10. Could the samples have been drawn from the same population with S.D. 3?

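xbar=5      # mean of the first sample
ybar=10     # mean of the second sample
sigma=3     # common population SD
n1=300
n2=200
z=(xbar-ybar)/(sigma*sqrt((1/n1)+(1/n2)))
z
alpha=0.05
alpha
zalpha=qnorm(1-(alpha/2))
zalpha
# two-sided test: compare |z| with the critical value
if(abs(z)<=zalpha){print("Accept Null hypothesis")} else{print("Reject Null hypothesis")}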

2.In a large city A, 40% of a random sample of 900 school boys had a slight physical defect. In
another large city B, 19% of a random sample of 1000 school boys had the same defect. Is the
difference between the proportions significant?

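p1=0.40    # proportion with the defect in city A
p2=0.19    # proportion with the defect in city B
n1=900
n2=1000
P=(n1*p1+n2*p2)/(n1+n2)   # pooled proportion
P
Q=1-P
z=(p1-p2)/sqrt(P*Q*((1/n1)+(1/n2)))   # SE = sqrt(P*Q*(1/n1+1/n2)), no inner sqrt
z
alpha=0.05
alpha
zalpha=qnorm(1-(alpha/2))
zalpha
if(abs(z)<=zalpha){print("Accept Null hypothesis")} else{print("Reject Null hypothesis")}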
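Base R's prop.test() can also run this two-sample test. In the sketch below the counts 360 and 190 are 40% of 900 and 19% of 1000; with correct = FALSE, its X-squared statistic should equal the square of the pooled z statistic, computed here with the pooled variance P Q (1/n1 + 1/n2).

```r
n1 <- 900;  n2 <- 1000
x1 <- 360   # 40% of 900 boys with the defect in city A
x2 <- 190   # 19% of 1000 boys with the defect in city B

P <- (x1 + x2) / (n1 + n2)   # pooled proportion
z <- (x1 / n1 - x2 / n2) / sqrt(P * (1 - P) * (1 / n1 + 1 / n2))

res <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE)  # no continuity correction
res$statistic   # X-squared, equals z^2
```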
Lab- 9
Applying the t-test for independent and dependent samples

1. Student’s t-test
Two independent samples, each of size 8, contained the following values:
Sample 1 - 15 21 28 35 34 23 24 24
Sample 2 - 26 20 15 23 25 33 29 20
Is the difference between the sample means significant?

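sample1=c(15,21,28,35,34,23,24,24)
sample2=c(26,20,15,23,25,33,29,20)
t=t.test(sample1,sample2,var.equal=TRUE)   # Student's (pooled) t-test; the default is Welch's test
t
cv=abs(t$statistic)   # calculated value
cv
tv=qt(0.975,14)       # table value, df = 8 + 8 - 2 = 14
tv
if(cv <= tv){print("Accept Ho")} else{print("Reject Ho")}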
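To see what t.test(..., var.equal = TRUE) computes, this sketch derives Student's t statistic by hand from the pooled variance s_p^2 = ((n1-1)s1^2 + (n2-1)s2^2)/(n1+n2-2), with n1 + n2 - 2 degrees of freedom.

```r
sample1 <- c(15, 21, 28, 35, 34, 23, 24, 24)
sample2 <- c(26, 20, 15, 23, 25, 33, 29, 20)
n1 <- length(sample1); n2 <- length(sample2)

# Pooled variance and Student's t statistic
sp2 <- ((n1 - 1) * var(sample1) + (n2 - 1) * var(sample2)) / (n1 + n2 - 2)
t_manual <- (mean(sample1) - mean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
dof <- n1 + n2 - 2   # degrees of freedom

t_manual
qt(0.975, dof)       # two-sided 5% critical value
```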
2. Paired t-test
The following data relate to the marks obtained by 8 students in two tests, one held at the beginning of a year and the other at the end of the year after intensive coaching. Do the data indicate that the students have benefited from coaching?
Test 1 - 41 38 47 32 39 44 40 37
Test 2 - 31 39 44 40 32 48 40 33

test1=c(41,38,47,32,39,44,40,37)
test2=c(31,39,44,40,32,48,40,33)
t=t.test(test1,test2,paired=TRUE)
t
alpha=0.05
pval=t$p.value   # p-value of the paired test
pval
if(pval>alpha){print("Accept Ho")} else{print("Reject Ho")}
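The paired t statistic is simply the one-sample t statistic of the differences d = test2 - test1. The sketch computes it by hand and, as an assumption about the intended question, also runs the one-sided test (alternative = "greater") that matches the wording "benefited from coaching"; the lab itself uses the two-sided test.

```r
test1 <- c(41, 38, 47, 32, 39, 44, 40, 37)
test2 <- c(31, 39, 44, 40, 32, 48, 40, 33)

d <- test2 - test1                               # change in each student's marks
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))  # paired t statistic, df = 7
t_manual

# One-sided version: does coaching raise the marks (mean of d > 0)?
p_one <- t.test(test2, test1, paired = TRUE, alternative = "greater")$p.value
p_one
```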
F-test
1. Two independent samples, each of size 8, contained the following values:
Sample 1 - 15 21 28 35 34 23 24 24
Sample 2 - 26 20 15 23 25 33 29 20
Is the difference between the sample variances significant?

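sample1=c(15,21,28,35,34,23,24,24)
sample2=c(26,20,15,23,25,33,29,20)
f=var.test(sample1,sample2)   # F = var(sample1)/var(sample2)
f
cv=f$statistic    # calculated value (the larger variance is in the numerator here)
cv
tv=qf(0.95,7,7)   # table value, df = (8 - 1, 8 - 1)
tv
if(cv <= tv){print("Accept Ho")} else{print("Reject Ho")}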
Lab-10
Applying the chi-square test for goodness of fit and the contingency test to a real dataset

1. Six coins are tossed 256 times. The observed numbers of heads are given below. Examine whether the coins are unbiased by fitting a binomial distribution and applying the chi-square goodness-of-fit test.
No. of heads: 0 1 2 3 4 5
Frequency: 25 15 5 76 52 12

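n=6          # coins per toss
alpha=0.05
N=256        # number of tosses
P=0.5        # probability of a head for an unbiased coin
x=0:n        # possible numbers of heads: 0, 1, ..., 6
# The table lists only 6 frequencies (0-5 heads, total 185), but six coins need 7
# frequencies (0-6 heads) summing to 256; the count for 6 heads must be supplied here
obf=c(25,15,5,76,52,12)
stopifnot(length(obf)==length(x))   # stop instead of silently recycling values
exf=dbinom(x,n,P)*N   # expected frequencies
exf
sum(obf)
sum(exf)
chisq=sum((obf-exf)^2/exf)
cv=chisq
tv=qchisq(1-alpha,n)  # df = 7 classes - 1 = 6 (p is given, not estimated)
if(cv <= tv){print("Accept H0/Fit is good")} else{print("Reject H0/Fit is not good")}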

2. From the following information, state whether the condition of the child is associated with the condition of the house.
Condition of child    House clean    House dirty
Clean                 41             20
Fairly clean          13             31
Dirty                 27             36

# rows: child clean / fairly clean / dirty; columns: house clean / dirty
data=matrix(c(41,20,13,31,27,36),ncol=2,byrow=TRUE)
data
res=chisq.test(data)
res
alpha=0.05
pval=res$p.value
pval
if(pval>alpha){print("Attributes are independent")} else{print("Attributes are not independent")}
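chisq.test() can be reproduced by hand: each expected frequency is (row total × column total)/grand total, and the degrees of freedom are (r - 1)(c - 1). No continuity correction applies here, because Yates' correction is only used for 2 × 2 tables. A sketch with the same data:

```r
data <- matrix(c(41, 20, 13, 31, 27, 36), ncol = 2, byrow = TRUE,
               dimnames = list(child = c("Clean", "Fairly clean", "Dirty"),
                               house = c("Clean", "Dirty")))

# Expected frequency of each cell = row total * column total / grand total
expected <- outer(rowSums(data), colSums(data)) / sum(data)
chisq_manual <- sum((data - expected)^2 / expected)
dof <- (nrow(data) - 1) * (ncol(data) - 1)   # (3 - 1) * (2 - 1) = 2

chisq_manual
qchisq(0.95, dof)                            # 5% critical value
```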
