
Basic Stats

This document contains 11 questions related to statistics concepts. Question 1 identifies data types as discrete or continuous for various attributes. Question 2 identifies data types as nominal, ordinal, interval or ratio. Question 3 calculates probability of outcomes from tossing coins. Question 4 calculates probability of outcomes from rolling dice. Question 5 calculates probability of balls drawn from a bag. Questions 6-10 calculate statistical measures like mean, median, mode, variance, standard deviation, range, skewness and kurtosis for different data sets and draw inferences. Question 11 calculates confidence intervals for estimating average weight of adult males in Mexico using sample data.

Uploaded by

Johny Jose

Q1) Identify the Data type for the Following:

Activity                                Data Type
Number of beatings from Wife            Discrete
Results of rolling a dice               Discrete
Weight of a person                      Continuous
Weight of Gold                          Continuous
Distance between two places             Continuous
Length of a leaf                        Continuous
Dog's weight                            Continuous
Blue Color                              Discrete
Number of kids                          Discrete
Number of tickets in Indian railways    Discrete
Number of times married                 Discrete
Gender (Male or Female)                 Discrete

Q2) Identify the Data type of each of the following as
Nominal, Ordinal, Interval or Ratio.
Data                            Data Type
Gender                          Nominal
High School Class Ranking       Ordinal
Celsius Temperature             Interval
Weight                          Ratio
Hair Color                      Nominal
Socioeconomic Status            Ordinal
Fahrenheit Temperature          Interval
Height                          Ratio
Type of living accommodation    Nominal
Level of Agreement              Ordinal
IQ (Intelligence Scale)         Interval
Sales Figures                   Ratio
Blood Group                     Nominal
Time Of Day                     Interval
Time on a Clock with Hands      Interval
Number of Children              Ratio
Religious Preference            Nominal
Barometer Pressure              Ratio
SAT Scores                      Interval
Years of Education              Ratio

Q3) Three Coins are tossed, find the probability that two heads and one tail are
obtained?
Total possible outcomes = 2^3 = 8.
Favourable outcomes: (H,H,T), (H,T,H), (T,H,H), i.e. 3 outcomes.
Probability of 2 Heads and 1 Tail = 3/8
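The counting argument can be checked by listing every outcome. This is a minimal sketch in R; the names `tosses`, `heads` and `p_two_heads` are illustrative and not part of the assignment.

```r
# Enumerate all 2^3 equally likely outcomes of three coin tosses
tosses <- expand.grid(c1 = c("H", "T"),
                      c2 = c("H", "T"),
                      c3 = c("H", "T"),
                      stringsAsFactors = FALSE)

# Count the heads in each outcome
heads <- rowSums(tosses == "H")

# Share of outcomes with exactly two heads (favourable / total)
p_two_heads <- mean(heads == 2)
print(p_two_heads)
```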
Q4) Two Dice are rolled, find the probability that sum is
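a) Equal to 1: probability 0, since the smallest possible sum is 2.
b) Less than or equal to 4: 6/36 = 1/6 (sums of 2, 3 and 4 occur in 1 + 2 + 3 = 6 ways).
c) Divisible by both 2 and 3 (that is, by 6): 6/36 = 1/6 (a sum of 6 occurs in 5 ways and a sum of 12 in 1 way).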
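The three answers can be verified by tabulating all 36 sums. A small sketch in R, with illustrative variable names:

```r
# 6 x 6 table holding the sum for every (die 1, die 2) pair
sums <- outer(1:6, 1:6, "+")

p_sum_1 <- mean(sums == 1)       # a) sum equal to 1
p_le_4  <- mean(sums <= 4)       # b) sum less than or equal to 4
p_div_6 <- mean(sums %% 6 == 0)  # c) sum divisible by both 2 and 3

print(c(p_sum_1, p_le_4, p_div_6))
```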

Q5) A bag contains 2 red, 3 green and 2 blue balls. Two balls are drawn at
random. What is the probability that none of the balls drawn is blue?

Total outcomes = 7C2 = 21

Favourable outcomes (both balls drawn from the 5 non-blue balls) = 5C2 = 10
Probability = 10/21
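The same ratio of combinations can be computed with R's `choose()`. This is only a sketch; the variable names are illustrative.

```r
total      <- choose(7, 2)  # ways to draw any 2 of the 7 balls
favourable <- choose(5, 2)  # ways to draw 2 of the 5 non-blue balls

p_no_blue <- favourable / total
print(p_no_blue)
```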

Q6) Calculate the Expected number of candies for a randomly selected child
Below are the probabilities of count of candies for children (ignoring the nature of
the child-Generalized view)
CHILD    Candies count    Probability    X*p(x)
A        1                0.015          0.015
B        4                0.20           0.8
C        3                0.65           1.95
D        5                0.005          0.025
E        6                0.01           0.06
F        2                0.120          0.24
Child A – probability of having 1 candy = 0.015.
Child B – probability of having 4 candies = 0.20
Expected Number of Candies = $\sum_{i=1}^{n} x_i \, P(x_i)$ = 3.09.
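The weighted sum can be reproduced in a few lines of R. A sketch using the values from the table; the vector names are illustrative.

```r
# Candy counts and their probabilities, taken from the table
candies <- c(A = 1, B = 4, C = 3, D = 5, E = 6, F = 2)
prob    <- c(A = 0.015, B = 0.20, C = 0.65, D = 0.005, E = 0.01, F = 0.120)

# A valid probability distribution must sum to 1
stopifnot(isTRUE(all.equal(sum(prob), 1)))

# Expected value = sum of x * P(x)
expected_candies <- sum(candies * prob)
print(expected_candies)
```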

Q7) Calculate Mean, Median, Mode, Variance, Standard Deviation, Range &
comment about the values / draw inferences, for the given dataset
- For Points, Score, Weigh

Parameters            Points       Score          Weigh
Mean                  3.59         3.21           17.85
Median                3.69         3.32           17.71
Mode                  3.92         3.44           17.02
Variance              0.28         0.95           3.19
Standard Deviation    0.53         0.97           1.78
Range (min-max)       2.76-4.93    1.513-5.424    14.5-22.9

Boxplot
  Points: Data concentrated towards the left-hand side.
  Score:  Data concentrated towards the LHS.
  Weigh:  Data concentrated towards the RHS. Outlier on the larger side.
Histogram
  Points: Data concentrated to the left-hand side.
  Score:  Doesn't follow a normal distribution. Outliers on the larger side.
  Weigh:  Not a normal distribution. Data concentrated towards the RHS.
Skewness
  Points: Left skewed, which is why the mean is less than the median.
  Score:  Left skewed, so the mean is less than the median.
  Weigh:  Right skewed, which is why the mean is greater than the median.
QQ plot (qqnorm & qqline)
  Points: Does not follow a normal distribution.
  Score:  Not a normal distribution.
  Weigh:  Doesn't follow a normal distribution.
Q8) Calculate Expected Value for the problem below
a) The weights (X) of patients at a clinic (in pounds), are
108, 110, 123, 134, 135, 145, 167, 187, 199
Assume one of the patients is chosen at random. What is the Expected
Value of the Weight of that patient?
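Each patient is equally likely to be chosen (probability 1/9), so the expected weight is the mean of the nine weights: 1308/9 = 145.33 pounds.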
Ans. Done in Excel shared along with Assignment.
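For readers without the spreadsheet, the same calculation can be sketched in R. Each patient gets probability 1/9; the variable names are illustrative.

```r
weights <- c(108, 110, 123, 134, 135, 145, 167, 187, 199)

# Every patient is equally likely to be picked
p <- rep(1 / length(weights), length(weights))

# Expected value = sum of x * P(x); with equal probabilities this is the mean
expected_weight <- sum(weights * p)
print(round(expected_weight, 2))
```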
Q9) Calculate Skewness, Kurtosis & draw inferences on the following data
Cars speed and distance

Ans:
For Speed column, skewness = -0.11
The negative sign means the left tail is slightly longer than the right tail. Since
-0.5 < skewness < 0.5, the Speed distribution can be treated as fairly symmetric.
Doubt: for negative skewness we expect mode > median > mean, but here mode = 20,
median = 15 and mean = 15.4, so the median is not greater than the mean. The
ordering is only a rule of thumb and need not hold when the skewness is this
close to zero.
For Speed column, kurtosis = 2.42
Kurtosis is < 3, so the distribution is platykurtic. Its peak is lower and flatter
than that of a mesokurtic (normal) distribution and its tails are lighter, which
means the data has few outliers.

For Dist column, skewness = 0.78
The positive sign means the right tail is longer than the left tail. Since
0.5 < skewness < 1, the Dist distribution can be treated as moderately skewed.
For a positively skewed distribution mode < median < mean, and the values
26 < 36 < 42.98 follow this pattern.
For Dist column, kurtosis = 3.24
Kurtosis is > 3, so the distribution is leptokurtic. Its peak is higher and
sharper than that of a mesokurtic distribution, and its tails are heavier, which
means the data has more outliers.
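The Speed and Dist figures can be recomputed with moment-based formulas, where a normal distribution has kurtosis 3. The sketch below assumes that R's built-in `cars` data frame (columns `speed` and `dist`) is the same data as the assignment file; its means of 15.4 and 42.98 agree with the values quoted above. The helper names `skewness_m` and `kurtosis_m` are illustrative, not functions from any package.

```r
# Moment-based skewness: third central moment / (second central moment)^1.5
skewness_m <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based kurtosis: fourth central moment / (second central moment)^2
# (equals 3 for a normal distribution)
kurtosis_m <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Assumption: the built-in `cars` data matches the assignment's dataset
skew_speed <- skewness_m(cars$speed)
kurt_speed <- kurtosis_m(cars$speed)
skew_dist  <- skewness_m(cars$dist)
kurt_dist  <- kurtosis_m(cars$dist)

print(round(c(skew_speed, kurt_speed, skew_dist, kurt_dist), 2))
```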

SP and Weight (WT)
Ans:
For SP distribution: skewness is 1.5814. Positive skewness implies a right-skewed
distribution, and a value above 1 indicates it is highly right skewed: the right
tail is much longer than the left tail.
Doubt: for a positively skewed distribution we expect mode < median < mean, but
here mode = 118.29, median = 118.21 and mean = 121.54, so the mode is not less
than the median. The ordering is only a rule of thumb, and the mode of
near-continuous data is an unstable estimate.
For SP distribution: kurtosis is 5.72, so the distribution is leptokurtic. Its
peak is higher and sharper than that of a mesokurtic distribution and its tails
are heavier, which means more outliers.
For WT distribution: skewness is -0.6033. The distribution is negatively skewed,
which means the left tail is longer than the right tail. Since the skewness lies
between -1 and -0.5, the distribution is moderately skewed to the left.
Doubt: for a negatively skewed distribution we expect mode > median > mean, but
here mode = 28.76, median = 32.73 and mean = 32.41, so the mode is not greater
than the median.
For WT distribution: kurtosis is 3.81, so it is a leptokurtic distribution. Its
peak is higher and sharper than that of a mesokurtic distribution and its tails
are heavier, which means more outliers.

Q10) Draw inferences about the following boxplot & histogram

Histogram Inferences:
1. The histogram is right skewed (positively skewed), which means the right tail is longer or
fatter than the left tail.
2. Since it is positively skewed, mode < median < mean.
3. The bin with the highest frequency is 50-100, so the mode lies in this bin.
4. Since the long right tail pulls the mean upward, the mean lies to the right of the mode.
5. The bins in ascending order of frequency are:
350-400 < 300-350 < 250-300 < 200-250 < 0-50 < 150-200 < 100-150 < 50-100

Box Plot Inferences:

1. Take the lower quartile as Q1, the median as Q2, the upper quartile as Q3,
the largest value on the right whisker as Q4 and the smallest value on the left
whisker as Q0. Q3-Q2 > Q2-Q1, which indicates the data is right skewed.
2. Q4-Q3 > Q1-Q0, which also indicates the data is right skewed.
3. Right-skewed data will have mode < median < mean.
4. As there are data points > Q3+1.5(IQR), these can be considered suspected
outliers on the larger side of the box plot. If an outlier's value is greater
than Q3+3(IQR), it can be termed an extreme outlier.
5. The right tail of this distribution will be longer or fatter than its left
tail.
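The two fences from point 4 can be computed directly. The data behind the plots is not given, so the sketch below uses a made-up right-skewed vector `x` purely for illustration; `quantile()` is called with R's default method, and other quartile conventions can shift the fences slightly.

```r
# Hypothetical right-skewed sample (NOT the data behind the plots above)
x <- c(20, 35, 50, 60, 75, 80, 95, 110, 130, 160, 300, 390)

q   <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3
iqr <- q[2] - q[1]

inner_fence <- q[2] + 1.5 * iqr  # beyond this: suspected outlier
outer_fence <- q[2] + 3 * iqr    # beyond this: the stricter 3 x IQR rule

suspected <- x[x > inner_fence & x <= outer_fence]
extreme   <- x[x > outer_fence]

print(list(suspected = suspected, extreme = extreme))
```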

Q11) Suppose we want to estimate the average weight of an adult male in
Mexico. We draw a random sample of 2,000 men from a population of
3,000,000 men and weigh them. We find that the average person in our
sample weighs 200 pounds, and the standard deviation of the sample is 30
pounds. Calculate the 94%, 98% and 96% confidence intervals.

n = 2000, N = 3,000,000, sample mean = 200, s = 30.

When the SD of the population is not given, the confidence interval for the
population mean uses the t distribution:

$$\bar{x} \pm t_{1-\alpha/2,\; n-1} \, \frac{s}{\sqrt{n}}$$

With 94% confidence, alpha = 0.06, so each tail holds 0.03 and the t value to
look up is at cumulative probability 0.94 + 0.03 = 0.97, with degrees of
freedom 2000 - 1 = 1999. Printed t tables often lack the 0.97 column, but for
degrees of freedom above 1000 the t value is practically the normal value, 1.881.
94% confidence interval: [198.74, 201.26]
98% confidence interval: [198.44, 201.56]
96% confidence interval: [198.62, 201.38]
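The three intervals can be recomputed with `qt()`, which removes the need for a printed table. A minimal sketch using the sample figures from the question; the variable names are illustrative.

```r
n    <- 2000  # sample size
xbar <- 200   # sample mean (pounds)
s    <- 30    # sample standard deviation (pounds)

se   <- s / sqrt(n)          # standard error of the mean
conf <- c(0.94, 0.98, 0.96)  # confidence levels asked for

# Two-sided critical values: cumulative probability 1 - alpha/2, df = n - 1
t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)

ci <- data.frame(conf   = conf,
                 t_crit = t_crit,
                 lower  = xbar - t_crit * se,
                 upper  = xbar + t_crit * se)
print(round(ci, 2))
```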
Q12) Below are the scores obtained by a student in tests

34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56
1) Find mean, median, variance, standard deviation.
Mean = 41, Median = 40.5, Mode = 41, Variance = 25.53, Standard Deviation = 5.05
2) What can we say about the student marks?
Mean = Mode ≈ Median, which means the bulk of the student marks is roughly
symmetric around 41 and close to the bell shape of a normal distribution. The
two highest scores (49 and 56) stretch the right tail, so the match with a
normal distribution is only approximate.
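The summary statistics can be reproduced from the listed scores. A short sketch; base R has no built-in function for the statistical mode, so it is taken from a frequency table, and `var()` and `sd()` use the sample (n - 1) formulas.

```r
scores <- c(34, 36, 36, 38, 38, 39, 39, 40, 40,
            41, 41, 41, 41, 42, 42, 45, 49, 56)

score_mean   <- mean(scores)
score_median <- median(scores)

# Mode: the most frequent value in the frequency table
counts     <- table(scores)
score_mode <- as.numeric(names(counts)[which.max(counts)])

score_var <- var(scores)  # sample variance (divides by n - 1)
score_sd  <- sd(scores)   # sample standard deviation

print(round(c(mean = score_mean, median = score_median, mode = score_mode,
              variance = score_var, sd = score_sd), 2))
```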
Q13) What is the nature of skewness when mean, median of data are equal?
Skewness will be 0 (or very close to it) when mean = median = mode, because in
that case the distribution is symmetric and neither tail is longer than the
other.
Q14) What is the nature of skewness when mean > median ?
When the mean is greater than the median, the median lies to the left of the
mean. This occurs when a few large values in the right tail pull the mean
upward. In other words, mean > median indicates a right-skewed (positively
skewed) distribution.
Q15) What is the nature of skewness when median > mean?
When the median is greater than the mean, the median lies to the right of the
mean. This occurs when a few small values in the left tail pull the mean
downward. In other words, median > mean indicates a left-skewed (negatively
skewed) distribution.
Q16) What does positive kurtosis value indicates for a data ?
A distribution with positive excess kurtosis (kurtosis > 3) has heavier tails
than the normal distribution. It is called a leptokurtic distribution. Its peak
is taller and sharper than that of a mesokurtic distribution.
Q17) What does negative kurtosis value indicates for a data?
A distribution with negative excess kurtosis (kurtosis < 3) has lighter tails
than the normal distribution. It is called a platykurtic distribution. Its peak
is lower and flatter than that of a mesokurtic distribution.
Q18) Answer the below questions using the below boxplot visualization.

What can we say about the distribution of the data?

Take the lower quartile as Q1, the median as Q2, the upper quartile as Q3, the
largest value on the right whisker as Q4 and the lowest value on the left
whisker as Q0. Q2-Q1 > Q3-Q2, which means the data is left skewed. The left
whisker is longer than the right whisker, which also indicates left skew. The
data does not follow a normal distribution. In this case mean < median.
What is nature of skewness of the data?
This distribution is negatively skewed.
What will be the IQR of the data (approximately)?

IQR = Q3 - Q1 = 18 - 10 = 8

Q19) Comment on the below Boxplot visualizations?

Draw an inference from the distribution of data for Boxplot 1 with respect to
Boxplot 2.
a. Boxplot 1 has a shorter box than Boxplot 2, which means its data stays
closer to the centre value.
b. IQR(Plot 1) < IQR(Plot 2)
c. As the medians are similar for both boxplots, the difference in IQR indicates
that Plot 2 has more variable data than Plot 1.
d. The whiskers of Plot 2 are also longer than those of Plot 1, which again
indicates that the Plot 2 data set has larger variation than Plot 1.
Q 20) Calculate probability from the given dataset for the below cases

Data _set: Cars.csv


Calculate the probability of MPG of Cars for the below cases, treating MPG as
normally distributed with the sample mean 34.42 and sample SD 9.13.
MPG <- Cars$MPG
a. P(MPG > 38) = 0.3475
pnorm(q = 38, mean = 34.42, sd = 9.13, lower.tail = FALSE)
b. P(MPG < 40) = 0.7295
pnorm(q = 40, mean = 34.42, sd = 9.13, lower.tail = TRUE)
c. P(20 < MPG < 50) = 0.8989
pnorm(q = 50, mean = 34.42, sd = 9.13, lower.tail = TRUE) -
  pnorm(q = 20, mean = 34.42, sd = 9.13, lower.tail = TRUE)

Q 21) Check whether the data follows normal distribution

a) Check whether the MPG of Cars follows Normal Distribution
Dataset: Cars.csv
qqnorm(Cars$MPG)
qqline(Cars$MPG)
The Q-Q plot suggests MPG does not follow a normal distribution.
Theoretically, for a normal distribution mean = median = mode. Here mean = 34.42,
median = 35.15 and mode = 29.62 are all different, so MPG does not follow a
normal distribution.

b) Check whether the Adipose Tissue (AT) and Waist Circumference (Waist)
from the wc-at data set follow Normal Distribution
Dataset: wc-at.csv
qqnorm(waist$Waist)
qqline(waist$Waist)
qqnorm(waist$AT)
qqline(waist$AT)
The Q-Q plots show they do not follow a normal distribution. Skewness of
Waist = 0.13 and skewness of AT = 0.57. AT is moderately right skewed. Waist
is only slightly skewed, so its departure from normality rests mainly on the
Q-Q plot.

Q 22) Calculate the Z scores of 90% confidence interval, 94% confidence
interval, 60% confidence interval
90%: 1.644854
94%: 1.880794
60%: 0.8416
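These are two-sided critical values, so each one is the normal quantile at cumulative probability 1 - alpha/2. A short sketch with `qnorm()`; the variable names are illustrative.

```r
conf <- c(0.90, 0.94, 0.60)

# Two-sided interval: half of (1 - conf) sits in each tail
z <- qnorm(1 - (1 - conf) / 2)
print(round(z, 4))
```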
Q 23) Calculate the t scores of 95% confidence interval, 96% confidence
interval, 99% confidence interval for sample size of 25
Degrees of freedom = 25 - 1 = 24.
95%: t(0.975, 24) = 2.064
96%: t(0.98, 24) = 2.172
99%: t(0.995, 24) = 2.797
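The t values follow the same pattern with `qt()` and n - 1 degrees of freedom. A minimal sketch; the variable names are illustrative.

```r
n    <- 25
conf <- c(0.95, 0.96, 0.99)

# Two-sided critical values with n - 1 = 24 degrees of freedom
t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)
print(round(t_crit, 3))
```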

Q 24) A Government company claims that an average light bulb lasts 270
days. A researcher randomly selects 18 bulbs for testing. The sampled bulbs
last an average of 260 days, with a standard deviation of 90 days. If the
CEO's claim were true, what is the probability that 18 randomly selected
bulbs would have an average life of no more than 260 days
Hint:

R code: pt(tscore, df)
df: degrees of freedom

Claimed mean = 270, sample mean = 260, s = 90, n = 18

t = (260 - 270) / (90 / √18) = -0.4714

pt(-0.4714, 17) = 0.3217, i.e. about 32.17%
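The t score and the left-tail probability can be computed in one pass, which avoids rounding the t score by hand. A sketch using the figures from the question; the variable names are illustrative.

```r
mu0  <- 270  # claimed mean life (days)
xbar <- 260  # sample mean (days)
s    <- 90   # sample standard deviation (days)
n    <- 18   # number of bulbs sampled

# t score of the sample mean under the claim
t_score <- (xbar - mu0) / (s / sqrt(n))

# Probability of a sample mean of 260 days or less, with n - 1 df
p_value <- pt(t_score, df = n - 1)
print(round(c(t_score = t_score, p_value = p_value), 4))
```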
