Basic Stats
Q2) Identify the data type of each of the following as
Nominal, Ordinal, Interval or Ratio.
Data                              Data Type
Gender                            Nominal
High School Class Ranking         Ordinal
Celsius Temperature               Interval
Weight                            Ratio
Hair Color                        Nominal
Socioeconomic Status              Ordinal
Fahrenheit Temperature            Interval
Height                            Ratio
Type of living accommodation      Nominal
Level of Agreement                Ordinal
IQ (Intelligence Scale)           Interval
Sales Figures                     Ratio
Blood Group                       Nominal
Time of Day                       Interval
Time on a Clock with Hands        Interval
Number of Children                Ratio
Religious Preference              Nominal
Barometer Pressure                Ratio
SAT Scores                        Interval
Years of Education                Ratio
Q3) Three coins are tossed. Find the probability that two heads and one tail are
obtained.
Total possible outcomes = 2^3 = 8.
Favourable outcomes: (H,H,T), (H,T,H), (T,H,H), i.e. 3.
Probability of 2 heads and 1 tail = 3/8
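The count above can be checked by brute-force enumeration. The following is a minimal Python sketch (the variable names are illustrative, not part of the assignment) that lists all outcomes of three tosses and counts those with exactly two heads.

```python
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of tossing three coins.
outcomes = list(product("HT", repeat=3))

# Outcomes with exactly two heads (and therefore one tail).
favourable = [o for o in outcomes if o.count("H") == 2]

probability = Fraction(len(favourable), len(outcomes))
print(len(outcomes), favourable, probability)
```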
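Q4) Two dice are rolled. Find the probability that the sum is
a) Equal to 1: 0, since the smallest possible sum is 2.
b) Less than or equal to 4: 6/36 = 1/6 (sums 2, 3 and 4 occur in 1 + 2 + 3 = 6 ways).
c) Divisible by both 2 and 3: 6/36 = 1/6 (sum 6 occurs in 5 ways, sum 12 in 1 way).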
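As a cross-check of the three parts, here is a small Python sketch that enumerates the 36 equally likely rolls. The helper name `probability` is hypothetical and only used for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die1, die2) pairs.
rolls = list(product(range(1, 7), repeat=2))

def probability(event):
    """Probability that the sum of the two dice satisfies `event`."""
    hits = sum(1 for a, b in rolls if event(a + b))
    return Fraction(hits, len(rolls))

p_sum_is_1 = probability(lambda s: s == 1)
p_sum_le_4 = probability(lambda s: s <= 4)
# Divisible by both 2 and 3 is the same as divisible by 6.
p_div_by_2_and_3 = probability(lambda s: s % 6 == 0)

print(p_sum_is_1, p_sum_le_4, p_div_by_2_and_3)
```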
Q5) A bag contains 2 red, 3 green and 2 blue balls. Two balls are drawn at
random. What is the probability that none of the balls drawn is blue?
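No answer is recorded for this question. One way to work it out, assuming the two balls are drawn without replacement, is to count the pairs that contain no blue ball: C(5,2) of the C(7,2) possible pairs. A minimal Python sketch of that count:

```python
from fractions import Fraction
from math import comb

red, green, blue = 2, 3, 2
total = red + green + blue   # 7 balls in the bag
non_blue = red + green       # 5 balls that are not blue

# Assumption: two balls drawn without replacement, order ignored.
p_no_blue = Fraction(comb(non_blue, 2), comb(total, 2))
print(p_no_blue)
```

Under that assumption the probability is 10/21.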
Q6) Calculate the Expected number of candies for a randomly selected child
Below are the probabilities of count of candies for children (ignoring the nature of
the child-Generalized view)
CHILD Candies count Probability X*p(x)
A 1 0.015 0.015
B 4 0.20 0.8
C 3 0.65 1.95
D 5 0.005 0.025
E 6 0.01 0.06
F 2 0.120 0.24
Child A – probability of having 1 candy = 0.015.
Child B – probability of having 4 candies = 0.20
Expected Number of Candies: $E[X] = \sum_{i=1}^{n} x_i \, P(x_i) = 3.09$
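The X*p(x) column and its total can be reproduced with a few lines of Python. This is only a sketch: the table is typed in by hand, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# (child, candies count, probability) taken from the table above.
table = [
    ("A", 1, "0.015"),
    ("B", 4, "0.20"),
    ("C", 3, "0.65"),
    ("D", 5, "0.005"),
    ("E", 6, "0.01"),
    ("F", 2, "0.120"),
]

total_probability = sum(Fraction(p) for _, _, p in table)
expected_candies = sum(x * Fraction(p) for _, x, p in table)

print(float(total_probability), float(expected_candies))
```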
Q7) Calculate Mean, Median, Mode, Variance, Standard Deviation and Range for
the given dataset (columns Points, Score and Weigh), and comment on the
values / draw inferences.
Parameters Points Score Weigh
Mean 3.59 3.21 17.85
Median 3.69 3.32 17.71
Mode 3.92 3.44 17.02
Variance 0.28 0.95 3.19
Standard Deviation 0.53 0.97 1.78
Range 2.76-4.93 1.513-5.424 14.5-22.9
Boxplot
  Points: Data concentrated towards the left-hand side.
  Score: Data concentrated towards the LHS.
  Weigh: Data concentrated towards the RHS. Outlier on the larger side.
Histogram
  Points: Data concentrated to the left-hand side.
  Score: Doesn't follow a normal distribution. Outliers on the larger side.
  Weigh: Not a normal distribution. Data concentrated towards the RHS.
Skewness
  Points: Left skewed, that is why mean is less than median.
  Score: Left skewed, so mean is less than median.
  Weigh: Right skewed, that's why mean is greater than median.
QQ-PLOT (QQNORM & QQLINE)
  Points: Does not follow a normal distribution.
  Score: Not a normal distribution.
  Weigh: Doesn't follow a normal distribution.
Q8) Calculate Expected Value for the problem below
a) The weights (X) of patients at a clinic (in pounds), are
108, 110, 123, 134, 135, 145, 167, 187, 199
Assume one of the patients is chosen at random. What is the Expected
Value of the Weight of that patient?
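Ans. Each of the 9 patients is equally likely to be chosen, so the expected
value of the weight is the arithmetic mean: 1308/9 = 145.33 pounds.
Done in Excel, shared along with the assignment.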
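The same figure can be reproduced outside Excel. A minimal Python sketch, assuming each patient is chosen with probability 1/9:

```python
from fractions import Fraction

weights = [108, 110, 123, 134, 135, 145, 167, 187, 199]

# Uniform selection: every patient has probability 1/n.
p = Fraction(1, len(weights))
expected_weight = sum(w * p for w in weights)

print(round(float(expected_weight), 2))
```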
Q9) Calculate Skewness, Kurtosis & draw inferences on the following data
Cars speed and distance
Ans:
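For Speed column: skewness = -0.11
The negative sign means the left tail is slightly longer or fatter than the right
tail (skewness measures asymmetry; it is not a ratio of tail lengths). Using the
rule of thumb -0.5 < skewness < 0.5, we can take the Speed distribution as fairly
symmetric.
Doubt: For negative skewness we expect mode > median > mean, but here mode = 20,
median = 15 and mean = 15.4, so median > mean does not hold. The ordering is
only a rule of thumb and can fail, especially when skewness is this close to 0.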
For Speed column: kurtosis = 2.42
Kurtosis is < 3, so the distribution is platykurtic. The peak is lower and
broader than that of a mesokurtic (normal) distribution and the tails are
thinner, which means the data are light tailed, with few outliers.
SP and Weight(WT)
Ans:
For SP distribution: skewness is 1.5814. Positive skewness implies a right-skewed
distribution, and a value above 1 indicates that it is highly right skewed: the
right tail is much longer than the left tail (the value is not a ratio of tail
lengths).
Doubt: For a positively skewed distribution we expect mode < median < mean, but
here mode = 118.29, median = 118.21 and mean = 121.54, so mode < median does not
hold. The ordering is a rule of thumb, not a guarantee.
For SP distribution: kurtosis is 5.72, which is > 3, so the distribution is
leptokurtic. The peak is taller and sharper than that of a mesokurtic
distribution, and the tails are heavy, with many outliers.
For WT distribution: skewness is -0.6033. The distribution is negatively skewed,
which means the left tail is longer or fatter than the right tail. With skewness
between -1 and -0.5, we can say the distribution is moderately skewed to the
left.
Doubt: For a negatively skewed distribution we expect mode > median > mean, but
here mode = 28.76, median = 32.73 and mean = 32.41, so mode > median does not
hold (median > mean does). The ordering is a rule of thumb, not a guarantee.
For WT distribution: kurtosis is 3.81, which is > 3, so it is a leptokurtic
distribution. The peak is taller and sharper than that of a mesokurtic
distribution, and the tails are heavy, with many outliers.
Histogram Inferences:
1. The histogram is right skewed (positively skewed), which means the right tail is longer or
fatter than the left tail.
2. Since it is positively skewed, we expect mode < median < mean.
3. The bin with the highest frequency is 50-100, so the mode lies in this bin.
4. Since the right tail is longer, the mean lies to the right of the mode.
5. Bins arranged in ascending order of frequency:
350-400 < 300-350 < 250-300 < 200-250 < 0-50 < 150-200 < 100-150 < 50-100
With 94% confidence interval: the interval is two-sided, so each tail holds
(1 - 0.94)/2 = 0.03 and the critical value is t at cumulative probability 0.97
(equivalently, the negative of t at 0.03), with degrees of freedom 2000 - 1 = 1999.
With degrees of freedom > 1000 the t value is practically the normal quantile,
which is 1.881.
[198.73,201.26]
With 98% confidence interval: [198.44,201.55]
With 96% confidence interval: [198.63,201.37]
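The critical values behind these three intervals can be looked up in code rather than in a table. The sketch below uses the standard normal quantile as an approximation to t with 1999 degrees of freedom; that substitution is an assumption, reasonable only because the degrees of freedom are so large. The function name is hypothetical. The sample mean and standard deviation are not restated in this section, so only the critical values are computed.

```python
from statistics import NormalDist

def two_sided_critical_value(confidence):
    """Normal quantile that leaves (1 - confidence) / 2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for confidence in (0.94, 0.96, 0.98):
    print(confidence, round(two_sided_critical_value(confidence), 3))
```

Each interval is then the sample mean plus or minus the critical value times the standard error s/sqrt(n).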
Q12) Below are the scores obtained by a student in tests
34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56
1) Find mean, median, variance, standard deviation.
Mean = 41, Median = 40.5, Mode = 41, Variance = 25.53 (sample variance,
n - 1 in the denominator), Standard Deviation = 5.053
2) What can we say about the student marks?
Mean = Mode ≈ Median (41, 41 and 40.5), which means the centre of the student's
marks looks roughly symmetric, close to the bell shape of a normal
distribution. The two highest scores (49 and 56) sit well above the rest, so
the right tail is longer than a normal distribution would give.
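The summary statistics in part 1 can be recomputed with Python's standard `statistics` module. This is a sketch for checking the arithmetic; `variance` and `stdev` are the sample versions (n - 1 in the denominator).

```python
import statistics

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
sample_variance = statistics.variance(scores)  # divides by n - 1
sample_sd = statistics.stdev(scores)           # square root of the above

print(mean, median, mode, round(sample_variance, 2), round(sample_sd, 3))
```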
Q13) What is the nature of skewness when mean, median of data are equal?
Skewness is 0 when mean = median (= mode): in that case the distribution is
symmetric, so it is skewed to neither side.
Q14) What is the nature of skewness when mean > median ?
When mean is greater than median, the median lies to the left of the mean. This
happens when a few large values in the right tail pull the mean upwards. In
other words, mean > median indicates a right-skewed (positively skewed)
distribution.
Q15) What is the nature of skewness when median > mean?
When median is greater than mean, the median lies to the right of the mean. This
happens when a few small values in the left tail pull the mean downwards. In
other words, median > mean indicates a left-skewed (negatively skewed)
distribution.
Q16) What does a positive kurtosis value indicate for the data?
A distribution with positive excess kurtosis (kurtosis > 3) has heavier tails
than the normal distribution. It is called a leptokurtic distribution. Its peak is
taller and sharper than that of a mesokurtic distribution.
Q17) What does a negative kurtosis value indicate for the data?
A distribution with negative excess kurtosis (kurtosis < 3) has lighter tails than
the normal distribution. It is called a platykurtic distribution. Its peak is lower
and broader than that of a mesokurtic distribution.
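To make the sign conventions in Q13 to Q17 concrete, the sketch below computes moment-based skewness and kurtosis in plain Python. It uses the population (uncorrected) formulas, which is an assumption: statistical packages often apply small-sample corrections, so their output can differ slightly. The function names are illustrative, and the sample data are the test scores from Q12.

```python
def central_moment(data, k):
    """k-th central moment, dividing by n (population form)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    # Third standardized moment: > 0 right skewed, < 0 left skewed.
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    # Fourth standardized moment: equals 3 for a normal distribution.
    return central_moment(data, 4) / central_moment(data, 2) ** 2

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

skew = skewness(scores)
kurt = kurtosis(scores)
excess_kurtosis = kurt - 3  # "positive/negative kurtosis" refers to this value

print(round(skew, 2), round(kurt, 2), round(excess_kurtosis, 2))
```

A positive `skew` corresponds to the mean > median case of Q14, and a positive `excess_kurtosis` to the leptokurtic case of Q16.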
Q18) Answer the questions below using the boxplot visualization.
Draw an inference from the distribution of data for Boxplot 1 with respect to
Boxplot 2.
a. Boxplot 1 has a shorter box than Boxplot 2, which means its data stay
consistently close to the centre values.
b. IQR(Plot 1) < IQR(Plot 2).
c. As the medians of the two boxplots are similar, the difference in IQR indicates
that Plot 2 has more variable data than Plot 1.
d. The whiskers of Plot 2 are also longer than those of Plot 1, which again
indicates that the Plot 2 data set has larger variation than the Plot 1
data set.
Q 20) Calculate probability from the given dataset for the below cases
Q 24) A Government company claims that an average light bulb lasts 270
days. A researcher randomly selects 18 bulbs for testing. The sampled bulbs
last an average of 260 days, with a standard deviation of 90 days. If the
CEO's claim were true, what is the probability that 18 randomly selected
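bulbs would have an average life of no more than 260 days?
Hint:
R code: pt(tscore, df)
df = degrees of freedom
Claimed mean = 270, sample mean = 260, s = 90, n = 18, df = n - 1 = 17
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{260 - 270}{90/\sqrt{18}} \approx -0.4714$
pt(-0.4714, 17) ≈ 0.3217, i.e. about 32.17%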
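For readers without R at hand, the lower-tail probability can be approximated in plain Python. The sketch below is a stand-in for R's `pt`, not its implementation: it integrates the Student t density numerically with Simpson's rule, and the helper names `t_pdf` and `t_cdf` are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on [0, |t|] and symmetry about 0."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

claimed_mean, sample_mean, sample_sd, n = 270, 260, 90, 18
t_score = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))
p_value = t_cdf(t_score, n - 1)

print(round(t_score, 4), round(p_value, 4))
```

The result is the probability that a sample of 18 bulbs averages 260 days or fewer if the claimed mean of 270 days is true.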