
Quiz 2 Solution Id 22070144

Uploaded by Asif Ali

Quiz 2

Katrodiya tapankumar Ashokbhai

2024-08-18

Question 1
Load the data
dataset = read.csv("dataset.csv")

I look at the first few entries with the head function to confirm that the data are loaded.
head(dataset, n = 10)

## number
## 1 139
## 2 116
## 3 122
## 4 115
## 5 122
## 6 126
## 7 107
## 8 112
## 9 112
## 10 121

To understand the structure of the data:
dim(dataset)

## [1] 52 1

nrow(dataset)

## [1] 52

ncol(dataset)

## [1] 1

names(dataset)

## [1] "number"

To visualize the data, I create a histogram.


hist(dataset$number, xlab = "Number", main = "Dataset of number",
     col = "lightgreen")
To draw a box plot:
boxplot(dataset$number, horizontal = TRUE, pch = 16, main = "Dataset of number",
        col = "lightblue", xlab = "Number")
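The box plot draws individual points beyond its whiskers. The sketch below lists those points on the low side, which helps check whether a few low values drive the skew. It assumes base R's default 1.5 × IQR whisker rule, which `boxplot.stats()` reports in `$out`. The helper name `low_outliers` is hypothetical, and the demo vector is made up rather than taken from the quiz data.

```r
# Hypothetical helper (not from the quiz): values flagged by the box-plot rule
# (beyond 1.5 * IQR from the hinges) that lie below the median.
low_outliers <- function(x) {
  out <- boxplot.stats(x)$out   # every point drawn beyond the whiskers
  out[out < median(x)]          # keep only the low-side ones
}

# Made-up vector for illustration only (not the quiz data)
low_outliers(c(-50, 1:10))   # returns -50
```

On the quiz data, `low_outliers(dataset$number)` would list any low values that the box-plot rule flags.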
To compute the mean, median, standard deviation and first quartile (Q1):
I compute the mean.
mean(dataset$number)

## [1] 114.3269

So the mean is 114.33


I compute the median
median(dataset$number)

## [1] 121.5

So the median is 121.5


I compute the standard deviation
sd(dataset$number)

## [1] 35.40894

So the standard deviation is 35.41


For the first quartile (Q1), I use quantile() with probability 0.25.
quantile(dataset$number, 1/4)
## 25%
## 107

So the first quartile is 107.0
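The four statistics above can also be collected in a single call. The sketch below wraps the same base R functions. The name `describe_stats` is hypothetical, and the demo vector is made up for illustration.

```r
# Hypothetical helper: the four statistics requested in Question 1.
# quantile() uses its default type = 7 rule, as in the call above.
describe_stats <- function(x) {
  c(mean   = mean(x),
    median = median(x),
    sd     = sd(x),
    Q1     = unname(quantile(x, 0.25)))
}

# Made-up vector for illustration only
describe_stats(c(2, 4, 4, 4, 5, 5, 7, 9))
```

The helper applies the same four functions, so `describe_stats(dataset$number)` should reproduce the values reported above.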


Comment on the shape of the distribution
Mean versus median:
The mean (114.33) is less than the median (121.5), which suggests that the distribution may be left-skewed (negatively skewed).
Standard deviation:
The standard deviation (35.41) is about 31% of the mean, indicating substantial variability in the data.
First quartile (Q1):
Q1 is 107.0, so about 25% of the observations are at or below 107. The distance from Q1 to the median (121.5 - 107 = 14.5) is much smaller than the standard deviation. This suggests that the values between Q1 and the median are fairly tightly packed, and that a few unusually low values, rather than the bulk of the data, account for much of the spread.
Conclusion
Because the mean is less than the median, the distribution appears to be left-skewed (negatively skewed). At least half of the observations lie above the mean, and a smaller number of low values pull the mean downward.
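The mean-versus-median argument can be given a number. One option is Pearson's second skewness coefficient, 3(mean - median)/sd, which needs only the statistics reported above; it comes out at about -0.61 here. A moment-based sample skewness needs the raw data. The simple g1 form without bias correction used below is an assumption, since the quiz does not name a skewness measure. Both measures are negative for a left-skewed sample.

```r
# Pearson's second (median) skewness coefficient: 3 * (mean - median) / sd
pearson_skew <- function(mean_x, median_x, sd_x) 3 * (mean_x - median_x) / sd_x

# Plugging in the values reported above for dataset$number
round(pearson_skew(114.3269, 121.5, 35.40894), 3)   # -0.608

# Moment-based sample skewness g1 = m3 / m2^(3/2), without bias correction
moment_skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Made-up vector: one low value gives a long left tail
moment_skew(c(1, 9, 10, 10))
```

If the left-skew reading is right, `moment_skew(dataset$number)` would be expected to be negative as well.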

Question 2

Step 1: Load the data


diabetes = read.csv("diabetes.csv")

I look at the first few HDL values to confirm that the data are loaded.


head(diabetes$HDL, n = 10)

## [1] 27.4 51.4 42.1 53.8 57.6 32.5 47.6 25.9 47.3 84.6

To understand the structure of the data:
dim(diabetes)

## [1] 87 6

nrow(diabetes)

## [1] 87

ncol(diabetes)
## [1] 6

names(diabetes)

## [1] "sex" "BG" "HbA1c" "LDL" "HDL" "Tri"

Step 2: Define the hypotheses


H0: The mean HDL levels of females and males are equal (no difference).
H1: The mean HDL level is greater in females than in males.
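The hypotheses can be written in symbols. Here μ_F and μ_M denote the population mean HDL of females and males; this notation is introduced only for the statement below. The test statistic is the observed difference in sample means, Female minus Male.

```latex
\[
H_0:\ \mu_F - \mu_M = 0
\qquad\text{versus}\qquad
H_1:\ \mu_F - \mu_M > 0,
\qquad
D = \bar{x}_F - \bar{x}_M
\]
```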

Step 3: Pre-check the data


I am interested in the HDL data, so I first look at its summary.
summary(diabetes$HDL)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   18.90   39.80   47.10   45.58   51.15   84.60

I look at the number of females and males in the data.


table(diabetes$sex)

##
## Female Male
## 44 43

barplot(table(diabetes$sex), main = "Male and female count",
        col = c("lightpink", "lightblue"))
To visualize the distribution of HDL, I create a histogram.
hist(diabetes$HDL, xlab = "HDL", main = "Data of HDL", col = "lightpink",
     breaks = 15)
I compare the mean HDL of each sex with aggregate().
aggregate(HDL~sex, data = diabetes, mean)

## sex HDL
## 1 Female 46.76591
## 2 Male 44.35814

To visualize HDL by sex, I create side-by-side histograms (with the lattice package) and a box plot.


library(lattice)
histogram(~HDL|sex, data = diabetes)
boxplot(HDL~sex, data = diabetes, horizontal = TRUE, pch = 16)
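Step 4: Compute the sample statistic
The observed difference in means (Female minus Male) can be computed from the aggregated data. aggregate() lists Female before Male, so diff() returns Male minus Female, and the leading minus sign reverses it.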
HDL = -diff(aggregate(HDL~sex,diabetes, mean)$HDL)
HDL

## [1] 2.40777
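The sign of the statistic above depends on aggregate() listing Female first. The sketch below is an alternative that fixes the direction by group name instead. The helper `mean_diff` is hypothetical, and the toy data are made up.

```r
# Hypothetical helper: difference in group means, first minus second,
# with the direction fixed by name rather than by alphabetical level order.
mean_diff <- function(y, g, first, second) {
  mean(y[g == first]) - mean(y[g == second])
}

# Made-up data for illustration only
y <- c(50, 52, 54, 40, 42)
g <- c("Female", "Female", "Female", "Male", "Male")
mean_diff(y, g, "Female", "Male")   # 52 - 41 = 11
```

The HDL summary above shows no missing values, so `mean_diff(diabetes$HDL, diabetes$sex, "Female", "Male")` should match the 2.40777 computed above.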

To simulate the null hypothesis, I randomly shuffle (permute) the sex labels with the sample function. This breaks any link between sex and HDL.


sex.sim = sample(diabetes$sex)

Step 5: Generate the randomization distribution under H0


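I use the replicate function to repeat the shuffle 1000 times, recomputing the Female minus Male difference in means each time. The result approximates the distribution of the difference if H0 were true.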
HDL0 = replicate(1000,{
sex.sim = sample(diabetes$sex)
-diff(aggregate(HDL~sex.sim, data = diabetes, mean)$HDL)
})

hist(HDL0, main = "Randomization distribution of the difference in means",
     xlab = "Difference in mean HDL (Female - Male), 1000 shuffles",
     col = "lightgoldenrod")
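Steps 4 and 5 can be packaged as one reusable function. The sketch below is a one-sided permutation test for a difference in means. Several choices are assumptions for illustration: the name `perm_test_greater`, the fixed seed, and counting simulated differences that are at least as large as the observed one (`>=`).

```r
# Sketch of a one-sided ("greater") permutation test for a difference in means.
# perm_test_greater, reps and seed are illustrative assumptions.
perm_test_greater <- function(y, g, first, second, reps = 1000, seed = 1) {
  set.seed(seed)   # reproducible shuffles (note: resets the global RNG state)
  diff_means <- function(lab) mean(y[lab == first]) - mean(y[lab == second])
  observed <- diff_means(g)
  null_dist <- replicate(reps, diff_means(sample(g)))  # shuffle labels under H0
  list(observed = observed,
       null = null_dist,
       p_value = mean(null_dist >= observed))  # share at least as extreme
}

# Made-up data for illustration only: clearly separated groups
res <- perm_test_greater(c(10:14, 1:5), rep(c("F", "M"), each = 5), "F", "M",
                         reps = 2000)
res$observed   # 9
res$p_value
```

On the quiz data the call would be `perm_test_greater(diabetes$HDL, diabetes$sex, "Female", "Male")`. Because it uses a fixed seed and `>=`, its p-value may differ slightly from the one computed in Step 6.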
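Step 6: Compute the p-value
The p-value is the proportion of the 1000 simulated differences that are larger than the observed difference.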
pVal = mean(HDL0 > HDL)
pVal

## [1] 0.125

So the p-value is about 0.125.
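The p-value above comes from 1000 random shuffles, so it is itself an estimate. Its standard error can be approximated by treating each shuffle as an independent Bernoulli trial, a standard binomial approximation assumed here. The formula is sqrt(p(1 - p)/B), where B is the number of shuffles.

```r
# Monte Carlo standard error of a p-value estimated from B random shuffles
mc_se <- function(p, B) sqrt(p * (1 - p) / B)

round(mc_se(0.125, 1000), 4)   # 0.0105
```

So the estimate of 0.125 is uncertain by roughly ±0.02 (about two standard errors). The t-test p-value of 0.1244 that follows falls well inside that range.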
As a check, I run a one-sided Welch two-sample t-test.
t.test(HDL ~ sex, data = diabetes, alternative = "greater")

##
## Welch Two Sample t-test
##
## data: HDL by sex
## t = 1.1631, df = 68.33, p-value = 0.1244
## alternative hypothesis: true difference in means between group Female and group Male is greater than 0
## 95 percent confidence interval:
## -1.04406 Inf
## sample estimates:
## mean in group Female mean in group Male
## 46.76591 44.35814

I extract the t-statistic.
t.test(HDL~sex, data= diabetes, alternative="greater")$statistic

## t
## 1.163112

So the t-statistic is 1.16.
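The Welch test does not assume equal variances. Its statistic divides the difference in means by sqrt(s_x²/n_x + s_y²/n_y), and it uses the Welch-Satterthwaite degrees of freedom. The one-sided p-value for "greater" is the upper-tail t probability. The sketch below writes this out by hand and compares it with `t.test()`. The helper name `welch_t` is hypothetical, and the toy data are made up.

```r
# Welch two-sample t statistic, Welch-Satterthwaite df and one-sided p-value,
# written out by hand to mirror what t.test() reports.
welch_t <- function(x, y) {
  se2x  <- var(x) / length(x)   # squared standard error of mean(x)
  se2y  <- var(y) / length(y)   # squared standard error of mean(y)
  tstat <- (mean(x) - mean(y)) / sqrt(se2x + se2y)
  df    <- (se2x + se2y)^2 / (se2x^2 / (length(x) - 1) + se2y^2 / (length(y) - 1))
  c(t = tstat, df = df, p_greater = pt(tstat, df, lower.tail = FALSE))
}

# Made-up data for illustration only
x <- c(48, 52, 55, 47, 51, 60)
y <- c(45, 44, 50, 41, 46)
welch_t(x, y)
t.test(x, y, alternative = "greater")   # should agree with the line above
```

Applied as `with(diabetes, welch_t(HDL[sex == "Female"], HDL[sex == "Male"]))`, it would be expected to reproduce t = 1.1631, df = 68.33 and p = 0.1244 from the output above.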

Step 7: Conclusion
The p-value is about 0.12 (0.125 from the randomization test and 0.1244 from the t-test), which is greater than the significance level of 0.05. Therefore, we do not reject the null hypothesis.
There is insufficient evidence at the 0.05 significance level to conclude that the mean HDL level is greater in females than in males.
