
Statistics Volume 2

Uploaded by Aaradhya Singh
Copyright © All Rights Reserved

Video – 8A

Partition Values: Percentile & Decile

Decile: The word decile, like decade, refers to ten. Deciles divide the series into 10 equal parts, so there are nine deciles, D1 to D9.
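Decile for Individual & Discrete Series:
$D_k = \text{value of the } \left(\dfrac{k(n+1)}{10}\right)^{\text{th}} \text{ item}, \quad k = 1, 2, \dots, 9$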

Percentile: Percentiles divide the series into 100 equal parts, so there are 99 percentiles, P1 to P99.


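Percentile for Individual & Discrete Series:
$P_k = \text{value of the } \left(\dfrac{k(n+1)}{100}\right)^{\text{th}} \text{ item}, \quad k = 1, 2, \dots, 99$

where n = total number of observations, arranged in ascending order.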

Question: Find D5 & P93 of the following data:

23, 13, 37, 16, 26, 35, 26, 35
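A minimal Python sketch of the k(n+1)/10 and k(n+1)/100 rules applied to the data above. It assumes linear interpolation between neighbouring items when the position is fractional, and it clamps a position beyond n (as happens for P93 here, since 8.37 > 8) to the largest item; the helper name `partition_value` is hypothetical.

```python
def partition_value(data, k, parts):
    """Return the k-th decile (parts=10) or percentile (parts=100) using the
    k(n+1)/parts-th item rule with linear interpolation between items."""
    xs = sorted(data)                  # arrange in ascending order
    n = len(xs)
    pos = k * (n + 1) / parts          # 1-based position of the required item
    if pos <= 1:                       # assumed convention: clamp below the first item
        return xs[0]
    if pos >= n:                       # assumed convention: clamp beyond the last item
        return xs[-1]
    whole = int(pos)                   # whole part of the position
    frac = pos - whole                 # fractional part, used for interpolation
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


data = [23, 13, 37, 16, 26, 35, 26, 35]
d5 = partition_value(data, 5, 10)      # 4.5th item
p93 = partition_value(data, 93, 100)   # 8.37th item, beyond n = 8
print("D5 =", d5, " P93 =", p93)
```

With these conventions D5 is the 4.5th item of 13, 16, 23, 26, 26, 35, 35, 37, i.e. 26, and P93 is taken as the largest item, 37.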

Decile & Percentile of Discrete Data:

Que) The marks obtained by 100 students are given below. Find D8 & P37 of the marks.

Marks Obtained    No. of Students
20                6
29                28
28                24
33                15
42                2
38                4
43                1
25                20
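For a discrete series the same k(N+1)/10 and k(N+1)/100 positions are located in the cumulative frequency column: the answer is the first value, in ascending order, whose cumulative frequency reaches the position. A sketch of that lookup, using a hypothetical helper `discrete_partition_value`:

```python
def discrete_partition_value(values, freqs, k, parts):
    """Return the first value (in ascending order) whose cumulative
    frequency reaches the k(N+1)/parts-th position."""
    pairs = sorted(zip(values, freqs))   # arrange values in ascending order
    N = sum(freqs)
    pos = k * (N + 1) / parts
    cum = 0
    for value, f in pairs:
        cum += f                         # running cumulative frequency
        if cum >= pos:
            return value
    return pairs[-1][0]                  # position beyond N: take the largest value


marks = [20, 29, 28, 33, 42, 38, 43, 25]
students = [6, 28, 24, 15, 2, 4, 1, 20]
d8 = discrete_partition_value(marks, students, 8, 10)     # 80.8th item
p37 = discrete_partition_value(marks, students, 37, 100)  # 37.37th item
print("D8 =", d8, " P37 =", p37)
```

Here N = 100, so D8 is the 80.8th item and P37 the 37.37th item; the cumulative frequencies 6, 26, 50, 78, 93, ... place them at 33 marks and 28 marks respectively.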

Que) Find D8 & P86 of the following data:

Class Interval    No. of Students
0-10              5
10-20             8
20-40             16
40-60             7
60-90             4
Total             40
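For a continuous (grouped) series the usual interpolation formula is $D_k = L + \dfrac{kN/10 - c.f.}{f} \times i$, and the same with kN/100 for percentiles, where L, f and i are the lower limit, frequency and width of the class containing the required item and c.f. is the cumulative frequency of the classes before it (N, not N + 1, is used here). A sketch with a hypothetical helper name:

```python
def grouped_partition_value(classes, freqs, k, parts):
    """Interpolated decile/percentile for a continuous series:
    L + (kN/parts - c.f.) / f * i."""
    N = sum(freqs)
    target = k * N / parts                       # kN/10 or kN/100
    cf = 0                                       # cumulative frequency before the class
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= target:                     # class containing the target item
            return lower + (target - cf) / f * (upper - lower)
        cf += f
    raise ValueError("position exceeds the total frequency")


classes = [(0, 10), (10, 20), (20, 40), (40, 60), (60, 90)]
freqs = [5, 8, 16, 7, 4]
d8 = grouped_partition_value(classes, freqs, 8, 10)
p86 = grouped_partition_value(classes, freqs, 86, 100)
print(f"D8 = {d8:.2f}, P86 = {p86:.2f}")
```

With N = 40, D8 is the 32nd item and P86 the 34.4th item; both fall in the 40-60 class (c.f. before it = 29, f = 7, i = 20), giving about 48.57 and 55.43.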
Video- 10(a)
Multiple & Partial Correlation

Multiple & Partial Correlation: Both deal with the relationship among at least 3 variables. Multiple correlation shows the combined influence of 2 or more independent variables on a single dependent variable.
Coefficient of Multiple Linear Correlation (X1 dependent; X2, X3 independent):
$R_{1.23} = \sqrt{\dfrac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$

Note: The value of the multiple correlation coefficient lies between 0 and 1. If a computed value is not between 0 and 1, the data are inconsistent.

Partial Correlation:
It is the simple correlation between two variables after eliminating the influence of the third variable.
In simple words, it shows the relationship between two variables at a time while holding the other variable(s) constant.
$r_{12.3} = \dfrac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$

Note: The value of the partial correlation coefficient lies between -1 and +1.

Relationship between Simple, Partial & Multiple Correlation:
$1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$


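The three coefficients can be checked numerically. The sketch below uses the standard three-variable formulas for $R_{1.23}$ and $r_{12.3}$ and verifies the identity $1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$. The correlation values are hypothetical, chosen only for illustration.

```python
import math


def multiple_r(r12, r13, r23):
    """R_1.23: multiple correlation of X1 on X2 and X3."""
    return math.sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))


def partial_r(r12, r13, r23):
    """r_12.3: correlation of X1 and X2 with X3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))


# Hypothetical simple correlations (illustration only)
r12, r13, r23 = 0.6, 0.7, 0.65

R = multiple_r(r12, r13, r23)
r12_3 = partial_r(r12, r13, r23)
r13_2 = partial_r(r13, r12, r23)   # r_13.2: swap the roles of X2 and X3
print(f"R_1.23 = {R:.4f}, r_12.3 = {r12_3:.4f}, r_13.2 = {r13_2:.4f}")

# Relationship between simple, partial and multiple correlation
print(math.isclose(1 - R**2, (1 - r12**2) * (1 - r13_2**2)))
```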
17. Time Series

Time Series: A time series consists of data arranged chronologically.


There are two variables in Time Series:
a) Time
b) Any other variable: Sales, Purchase, Production, etc.
Data is arranged in chronological order in Time Series.

Purpose of Time Series: a) Forecasting b) Past data evaluation


Note: Time can be in year, month, days, weeks, hours, quarter, decade, etc.
in Time Series.

Components of Time Series:


Secular Trend: A continuous, gradual movement of a time series, either upward or downward, over a long period of time is called the Secular Trend.
Seasonal Variation: Seasonal Variation is a periodic movement whose period is less than 1 year; the same pattern is repeated every year.
Eg: Sale of sweets during Diwali.
Cyclical Variation: Cyclical Variation is also a periodic movement, but the duration of a cycle is more than 1 year (it can be 4-5 years or more).
Irregular Variation: It is completely random and unpredictable.
Eg: Earthquake, cyclone, flood situation.
Time Series: Fitting of Straight Line Trend for Odd Number of Years by
Least Square Method
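For an odd number of years, coding X as the deviation from the middle year makes ΣX = 0, so the normal equations of $Y = a + bX$ reduce to $a = \Sigma Y / n$ and $b = \Sigma XY / \Sigma X^2$. A minimal sketch, assuming the years are consecutive and listed in order; it is run here on the 2011-2015 sales figures from the parabola example that follows, only to illustrate the arithmetic. The helper name `fit_straight_line` is hypothetical.

```python
def fit_straight_line(years, values):
    """Least-squares trend Y = a + bX, with X measured from the middle year
    (odd number of consecutive years, so that sum(X) = 0)."""
    n = len(years)
    if n % 2 == 0:
        raise ValueError("this shortcut assumes an odd number of years")
    origin = years[n // 2]
    xs = [yr - origin for yr in years]           # -2, -1, 0, 1, 2, ...
    a = sum(values) / n                          # from sum(Y) = n*a
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)  # sum(XY) = b*sum(X^2)
    return a, b, origin


years = [2011, 2012, 2013, 2014, 2015]
sales = [19, 27, 29, 33, 29]
a, b, origin = fit_straight_line(years, sales)
print(f"Y = {a:.1f} + {b:.1f}X  (origin {origin}, X in years)")
print("Trend for 2016:", a + b * (2016 - origin))
```

This gives Y = 27.4 + 2.6X with the origin at 2013, so the trend value for 2016 (X = 3) would be 35.2.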

Time Series: Fitting of Non-linear Trend by Least Square Method (Second degree Parabola)

Year    Sale (Rs.)
2011    19
2012    27
2013    29
2014    33
2015    29
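With the same coding (ΣX = ΣX³ = 0), the normal equations for $Y = a + bX + cX^2$ become $\Sigma Y = na + c\Sigma X^2$, $\Sigma XY = b\Sigma X^2$ and $\Sigma X^2 Y = a\Sigma X^2 + c\Sigma X^4$. The sketch below solves them for the sales data above; the helper name `fit_parabola` is hypothetical.

```python
def fit_parabola(years, values):
    """Least-squares trend Y = a + bX + cX^2 with X measured from the middle
    year (odd number of consecutive years, so sum(X) = sum(X^3) = 0)."""
    n = len(years)
    if n % 2 == 0:
        raise ValueError("this shortcut assumes an odd number of years")
    origin = years[n // 2]
    xs = [yr - origin for yr in years]
    sy = sum(values)
    sx2 = sum(x**2 for x in xs)
    sx4 = sum(x**4 for x in xs)
    sxy = sum(x * y for x, y in zip(xs, values))
    sx2y = sum(x**2 * y for x, y in zip(xs, values))
    b = sxy / sx2                                   # sum(XY) = b*sum(X^2)
    # Solve sum(Y) = n*a + c*sum(X^2) and sum(X^2 Y) = a*sum(X^2) + c*sum(X^4)
    c = (n * sx2y - sx2 * sy) / (n * sx4 - sx2**2)
    a = (sy - c * sx2) / n
    return a, b, c, origin


a, b, c, origin = fit_parabola([2011, 2012, 2013, 2014, 2015], [19, 27, 29, 33, 29])
print(f"Y = {a:.3f} + {b:.3f}X + ({c:.3f})X^2  (origin {origin})")
```

For this data ΣX² = 10, ΣX⁴ = 34, ΣXY = 26 and ΣX²Y = 252, which gives approximately Y = 30.543 + 2.6X - 1.571X² with 2013 as the origin.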

Measurement of Seasonal Variation


Seasonal Variations: Seasonal variations are periodic movements which tend to repeat themselves at regular intervals of time.
In Seasonal Variations, time period is always less than one year.
Example:
1. Demand of electricity rises rapidly during Summer every year.
2. Sales of sweets increase during festive seasons every year.
Measurement of Seasonal Indices:
1. Simple Average Method
2. Ratio to moving average method
3. Ratio to trend method

Calculate the seasonal index for the following data by using Simple Average
Method

Calculate Seasonal Indices for the following data by the method of Simple Average
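The data tables for these two questions are not reproduced here, so the sketch below uses hypothetical quarterly figures purely to show the steps of the Simple Average Method: average each season over the years, take the grand average of those seasonal averages, and express each seasonal average as a percentage of the grand average.

```python
def simple_average_indices(seasonal_data):
    """seasonal_data[s] holds the values of season s for each year.
    Returns one seasonal index per season; the indices add up to
    100 times the number of seasons."""
    season_avgs = [sum(v) / len(v) for v in seasonal_data]   # step 1: average per season
    grand_avg = sum(season_avgs) / len(season_avgs)          # step 2: grand average
    return [100 * s / grand_avg for s in season_avgs]        # step 3: express as %


# Hypothetical quarterly figures for three years (illustration only)
quarters = [
    [30, 34, 38],   # Q1
    [40, 44, 48],   # Q2
    [20, 24, 28],   # Q3
    [50, 54, 58],   # Q4
]
indices = simple_average_indices(quarters)
print([round(i, 2) for i in indices])
```

With four quarters the indices always add up to 400 (1200 for monthly data), which is a quick arithmetic check on a hand calculation.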
Video - 23. Statistical Inference

Estimators and Estimate: Sample statistics such as the sample mean (𝑋̅), sample median (M) and sample variance (S²) that are used to estimate a population parameter are called estimators, and the actual values taken by the estimators are called estimates.

Point Estimate: A single value of a statistic that is used to estimate an unknown population parameter is called a point estimate. For example, the sample mean (𝑋̅), which we use for estimating the population mean 𝜇, is a point estimator of 𝜇. Similarly, the statistic S², whose value is computed from a random sample, is a point estimator of 𝜎².

Interval Estimate: An interval estimate refers to the probable range within which the real value of a parameter is expected to lie. The two extreme limits of such a range are called confidence limits and the range is called a confidence interval. These are determined on the basis of sample studies of a population.

Properties of a Good Estimator: A good estimator is one which is as close to the true value of the parameter as possible. A good estimator possesses the following properties or characteristics:
I. Unbiased Estimator: An estimator 𝜃̂ is said to be an unbiased estimator of the population parameter 𝜃 if the mean of the sampling distribution of 𝜃̂ is equal to the corresponding population parameter 𝜃. In terms of mathematical expectation, 𝜃̂ is an unbiased estimator of 𝜃 if the expected value of the estimator is equal to the parameter being estimated: $E(\hat{\theta}) = \theta$.
Example:
• The sample mean 𝑋̅ is an unbiased estimator of the population mean, since $E(\bar{X}) = \mu$.
• The sample variance S² is an unbiased estimator of the population variance 𝜎² when it is computed with divisor n - 1, since then $E(S^2) = \sigma^2$.
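The role of the divisor n - 1 can be seen in a small simulation. The sketch below draws many samples of size 5 from a normal population with σ² = 4 (an assumed setup) and averages the two versions of the sample variance; `statistics.variance` uses divisor n - 1 and `statistics.pvariance` uses divisor n.

```python
import random
import statistics

random.seed(1)                      # fixed seed so the run is repeatable
sigma, n, reps = 2.0, 5, 10000      # assumed population S.D., sample size, repetitions

unbiased, biased = [], []
for _ in range(reps):
    sample = [random.gauss(0, sigma) for _ in range(n)]
    unbiased.append(statistics.variance(sample))   # divisor n - 1
    biased.append(statistics.pvariance(sample))    # divisor n

mean_unbiased = statistics.fmean(unbiased)
mean_biased = statistics.fmean(biased)
print(f"average S^2 with n-1: {mean_unbiased:.3f}, with n: {mean_biased:.3f}")
```

The n - 1 version averages close to σ² = 4, while the divisor-n version averages close to (n - 1)σ²/n = 3.2, i.e. it is biased downwards.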
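II. Consistent Estimator
An estimator is said to be consistent if it approaches the population parameter as the sample size increases. For example, the sample mean is a consistent estimator of 𝜇 because:
i. $E(\bar{X}) \to \mu$ as $n \to \infty$ (in fact $E(\bar{X}) = \mu$ for every n)
ii. $\operatorname{Var}(\bar{X}) = \dfrac{\sigma^2}{n} \to 0$ as $n \to \infty$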

III. Efficient Estimator: Efficiency is a relative term. The efficiency of an estimator is generally judged by comparing it with another estimator: of two unbiased estimators, the one with the smaller variance is the more efficient. The estimator 𝜃̂1 is called more efficient than 𝜃̂2 if
$\operatorname{Var}(\hat{\theta}_1) < \operatorname{Var}(\hat{\theta}_2)$
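A simulation also illustrates efficiency. For samples from a normal population both the sample mean and the sample median are centred on μ, but the mean has the smaller variance, so it is the more efficient of the two. The population (mean 50, S.D. 10) and the sample size 25 are assumptions for illustration.

```python
import random
import statistics

random.seed(2)                           # fixed seed so the run is repeatable
mu, sigma, n, reps = 50, 10, 25, 4000    # assumed population and sample size

means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

var_mean = statistics.pvariance(means)       # spread of the sample means
var_median = statistics.pvariance(medians)   # spread of the sample medians
print(f"Var(mean) = {var_mean:.2f}, Var(median) = {var_median:.2f}")
```

The variance of the sample mean comes out close to σ²/n = 4, while that of the median is noticeably larger (for large n it approaches about 1.57 times σ²/n).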
IV. Sufficient Estimator: The last property that a good estimator should possess is sufficiency. An estimator is said to be a sufficient estimator of a parameter if it contains all the information in the sample regarding the parameter. In other words, a sufficient estimator utilises all the information that the given sample can furnish about the population.

Interval Estimation:
Interval Estimation for Large Samples: For large samples, interval estimation is studied under the following 4 headings:

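I. Confidence Interval or Limits for Population Mean:

The confidence interval or limits for the population mean 𝜇 in the case of a large sample (n > 30) are determined using the normal distribution:
$\bar{X} \pm Z_{\alpha/2} \cdot SE_{\bar{X}}, \qquad SE_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
where $SE_{\bar{X}}$ is the standard error of the sample mean (the sample S.D. s is used in place of σ when σ is unknown).

95% confidence limits for 𝜇: $\bar{X} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$
99% confidence limits for 𝜇: $\bar{X} \pm 2.58\,\dfrac{\sigma}{\sqrt{n}}$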

Example: A random sample of 100 observations yields sample mean 𝑋̅ = 150 and sample variance s² = 400. Compute 95% and 99% confidence intervals for the population mean.

Que: The 95% confidence limits for a population mean, based on a sample mean of 100, are 94.12 and 105.88. Find the population standard deviation if the sample size is 100.
a) 15 b) 30 c) 45 d) 60
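Both problems use the same formula $\bar{X} \pm Z\,\sigma/\sqrt{n}$: the example applies it forwards, and the question runs it backwards from the half-width of the interval. A sketch with a hypothetical helper `mean_ci`:

```python
import math


def mean_ci(xbar, sd, n, z):
    """Large-sample confidence limits xbar +/- z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)             # standard error of the sample mean
    return xbar - z * se, xbar + z * se


# Example: n = 100, sample mean 150, sample variance 400 (so s = 20)
ci95 = mean_ci(150, math.sqrt(400), 100, 1.96)
ci99 = mean_ci(150, math.sqrt(400), 100, 2.58)
print("95%:", [round(v, 2) for v in ci95], "99%:", [round(v, 2) for v in ci99])

# Que: limits 94.12 and 105.88 with n = 100, so half-width = 1.96 * sigma / sqrt(100)
half_width = (105.88 - 94.12) / 2
sigma = half_width * math.sqrt(100) / 1.96
print("sigma =", round(sigma, 2))
```

The example gives about 146.08 to 153.92 (95%) and 144.84 to 155.16 (99%); in the question the half-width 5.88 = 1.96σ/10 gives σ = 30, option b).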

II. Confidence Interval or Limits for Population Proportion P:

The sampling distribution associated with proportions is the binomial distribution, which can be approximated by the normal distribution when the sample is large, i.e., n > 30 with np ≥ 5 and nq ≥ 5. Here n is the size of the sample, p is the sample proportion of successes and q = 1 - p.
$p \pm Z_{\alpha/2} \cdot SE_p, \qquad SE_p = \sqrt{\dfrac{pq}{n}}$
where $SE_p$ is the standard error of the proportion.

→ 95% confidence limits for P: $p \pm 1.96\sqrt{\dfrac{pq}{n}}$
→ 99% confidence limits for P: $p \pm 2.58\sqrt{\dfrac{pq}{n}}$
Example: A random sample of 1000 households in a city revealed that 500
of these had Car. Find 95% and 99% confidence limits for the proportion of
households in the city with car.
Que) A factory manufactures 10,000 bolts per day. In a sample of 500 bolts, 4% were of unacceptable quality. Estimate, at 95% confidence, the interval for the number of bolts of unacceptable quality produced per day.
a) (313,487) b) (305,495) c) (328,472) d) (233, 567)
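A sketch of $p \pm Z\sqrt{pq/n}$ for both problems, with a hypothetical helper `proportion_ci`. For the bolts question, the plain formula gives about (228, 572), which matches none of the options; option d) (233, 567) is reproduced if the finite population correction $\sqrt{(N - n)/(N - 1)}$ is applied with N = 10,000, the day's production. That this correction is intended is an assumption on our part.

```python
import math


def proportion_ci(p, n, z, N=None):
    """Large-sample limits p +/- z * sqrt(pq/n); if a finite population size N
    is given, the standard error is multiplied by sqrt((N - n) / (N - 1))."""
    se = math.sqrt(p * (1 - p) / n)          # standard error of the proportion
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))   # finite population correction
    return p - z * se, p + z * se


# Example: 500 of 1000 households own a car
ci95 = proportion_ci(500 / 1000, 1000, 1.96)
ci99 = proportion_ci(500 / 1000, 1000, 2.58)
print("95%:", [round(v, 4) for v in ci95], "99%:", [round(v, 4) for v in ci99])

# Que: 4% of 500 sampled bolts unacceptable, 10,000 bolts made per day
plain = proportion_ci(0.04, 500, 1.96)
corrected = proportion_ci(0.04, 500, 1.96, N=10_000)
print("without correction:", [round(v * 10_000) for v in plain])
print("with correction:   ", [round(v * 10_000) for v in corrected])
```

For the households, the limits are about 0.469 to 0.531 at 95% and 0.459 to 0.541 at 99%.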

III. Confidence Interval or Limits for Population Standard Deviation:

The confidence interval or limits for the population S.D. (𝜎) in the case of a large sample are determined using the normal distribution:
$S \pm Z_{\alpha/2} \cdot SE_S, \qquad SE_S = \dfrac{\sigma}{\sqrt{2n}} \text{ or } \dfrac{s}{\sqrt{2n}}$

At 95% level: $S \pm 1.96\,\dfrac{s}{\sqrt{2n}}$
At 99% level: $S \pm 2.58\,\dfrac{s}{\sqrt{2n}}$

Example: A random sample of 50 observations gave a standard deviation of 24.5. Construct a 95% confidence interval for the population standard deviation σ.
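A sketch of the large-sample interval $S \pm 1.96\,s/\sqrt{2n}$ for the example above, with a hypothetical helper `sd_ci`:

```python
import math


def sd_ci(s, n, z):
    """Large-sample confidence limits for sigma: s +/- z * s / sqrt(2n)."""
    se = s / math.sqrt(2 * n)          # standard error of the sample S.D.
    return s - z * se, s + z * se


lo, hi = sd_ci(24.5, 50, 1.96)
print(f"95% limits for sigma: {lo:.2f} to {hi:.2f}")
```

With n = 50 the standard error is 24.5/√100 = 2.45, so the 95% interval is about 19.70 to 29.30.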

IV. Determination of a Proper Sample Size for Estimating 𝝁 or P:

So far the confidence intervals have been calculated on the assumption that the sample size n is known. In most practical situations, however, the sample size has to be decided in advance. Determining a proper sample size is studied under 2 headings:
(a) Sample Size for Estimating a Population Mean:
To determine the sample size for estimating a population mean, the following 3 factors must be known:
I. The desired confidence level and the corresponding value of Z.
II. The permissible sampling error E.
III. The standard deviation σ or an estimate of it.
Sample size $n = \left(\dfrac{Z\sigma}{E}\right)^2$

Example: A cigarette manufacturer wishes to use a random sample to estimate the average nicotine content. The sampling error should not be more than one milligram above or below the true mean, with a 99% confidence level. The population standard deviation is 4 milligrams. What sample size should the company use in order to satisfy these requirements?
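A sketch of $n = (Z\sigma/E)^2$ for the nicotine example, taking Z = 2.58 for 99% confidence and rounding up to the next whole number, since a fractional sample size is not possible. The helper name is hypothetical.

```python
import math


def sample_size_mean(z, sigma, error):
    """n = (Z * sigma / E)^2, also returned rounded up to a whole number."""
    raw = (z * sigma / error) ** 2
    return raw, math.ceil(raw)


raw, n = sample_size_mean(2.58, 4, 1)   # 99% confidence, sigma = 4 mg, E = 1 mg
print(f"n = {raw:.2f}, so take {n}")
```

Here (2.58 × 4 / 1)² = 106.5, so a sample of 107 is needed.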

(b) Sample Size for Estimating a Population Proportion:

To determine the sample size for estimating a population proportion, the following 3 factors must be known:
I. The desired level of confidence and the corresponding value of Z.
II. The permissible sampling error E.
III. The actual or estimated true proportion of success P.
The size of sample $n = \dfrac{Z^2 \, PQ}{E^2}$, where Q = 1 - P.

Example: A firm wishes to determine, with a maximum allowable error of 0.05 and a 99% level of confidence, the proportion of consumers who prefer its product. How large a sample will be required to make such an estimate if the preliminary sales report indicates that 25% of all consumers prefer the firm's product?
Que) 10% of people in a village are afflicted with a viral disease. What size of
sample should be taken to ensure that error of estimation of the proportion
is not more than 5% with 95% confidence?
a) 138 b) 238 c) 38 d) 338
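A sketch of $n = Z^2 PQ / E^2$ for the firm example (Z = 2.58) and the village question (Z = 1.96); the helper name is hypothetical.

```python
import math


def sample_size_proportion(z, p, error):
    """n = Z^2 * P * Q / E^2 with Q = 1 - P."""
    return z**2 * p * (1 - p) / error**2


firm = sample_size_proportion(2.58, 0.25, 0.05)     # 99% confidence
village = sample_size_proportion(1.96, 0.10, 0.05)  # 95% confidence
print(f"firm: {firm:.2f} -> {math.ceil(firm)}")
print(f"village: {village:.2f}")
```

The firm needs 499.23, i.e. a sample of 500. The village figure is 138.3, which matches option a) 138 when rounded to the nearest whole number; the conservative convention of always rounding up would give 139.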
Video 26. Z-Test
Z-Test
Conditions for Z-test:
1) The sample size must be large (n ≥ 30).
2) The sample must be selected randomly.
3) The Z statistic follows the Standard Normal Distribution, whose mean is 0 and variance is 1.
Notes:
❖ In case of Null Hypothesis we consider that our sample is taken from
the population. We also assume that there is no difference between
Sample Mean & Population Mean.
❖ In case of Alternative Hypothesis there can be difference between
Sample Mean & Population Mean (> or<).
❖ If the difference is ≠, we will apply two-tailed test. If the difference is >
or <, we will apply one-tailed test.

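Critical Value (Zα)

Level of significance    Two-tailed test    One-tailed test
5%                       ±1.96              +1.645 or -1.645
1%                       ±2.58              +2.33 or -2.33

❖ A -ve sign indicates that the critical region lies towards the left (lower tail).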


Q.) A sample of size 400 was drawn and the sample mean was found to be 99. Test whether this sample could have come from a Normal population with mean 100 and standard deviation 8 at 5% level of significance.

Q.) The mean lifetime of a sample of 400 bulbs produced by a company is found to be 1570 hours with a standard deviation of 150 hours. Test the hypothesis that the mean lifetime of the bulbs produced by the company is 1600 hours against the alternative hypothesis that it is greater than 1600 hours at 1% level of significance.
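Both questions use $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ compared with the critical values in the table above. A sketch with a hypothetical helper `z_statistic`; the sample S.D. of 150 hours stands in for σ in the second question, as is usual for large samples.

```python
import math


def z_statistic(xbar, mu, sd, n):
    """Z = (sample mean - population mean) / (S.D. / sqrt(n))."""
    return (xbar - mu) / (sd / math.sqrt(n))


# Q1: two-tailed test at 5%, critical value 1.96
z1 = z_statistic(99, 100, 8, 400)
reject_q1 = abs(z1) > 1.96

# Q2: H1: mu > 1600, right-tailed test at 1%, critical value 2.33
z2 = z_statistic(1570, 1600, 150, 400)
reject_q2 = z2 > 2.33

print(f"Q1: Z = {z1:.2f}, reject H0: {reject_q1}")
print(f"Q2: Z = {z2:.2f}, reject H0: {reject_q2}")
```

In Q1, |Z| = 2.5 > 1.96, so H0 is rejected: the sample is unlikely to come from that population. In Q2, Z = -4 lies in the left tail, far from the right-tail critical value 2.33, so H0 (mean life 1600 hours) is not rejected in favour of μ > 1600.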
Video-27 t-test

Condition for t-test: Sample size must be small (n<30)


Methods:
1) Test of hypothesis about the population mean
2) Difference between two means
3) Observed Coefficient of Correlation

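Modified Standard Deviation

Actual Mean Method: $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$

Assumed Mean Method: $s = \sqrt{\dfrac{\sum d^2 - (\sum d)^2 / n}{n - 1}}$, where d = X - A and A is the assumed mean

Test statistic: $t = \dfrac{\bar{X} - \mu}{s}\sqrt{n}$, with n - 1 degrees of freedom

If sample Standard Deviation is given (computed with divisor n): $t = \dfrac{\bar{X} - \mu}{s}\sqrt{n - 1}$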


Que.) A group of 5 patients treated with medicine A weigh 42, 39, 48, 60 & 41 kg. In the light of the above data, discuss the suggestion that the mean weight of the population is 48 kg. Test at 5% level of significance.
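A sketch of the one-sample t-test with the modified S.D. (divisor n - 1, which is what `statistics.stdev` computes). The critical value 2.776 (5%, two-tailed, 4 d.f.) is taken from standard t tables; the helper name `one_sample_t` is hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """t = (mean - mu) / s * sqrt(n), with s the modified S.D. (divisor n - 1)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)        # divisor n - 1
    return (xbar - mu) / s * math.sqrt(n), n - 1


weights = [42, 39, 48, 60, 41]
t, df = one_sample_t(weights, 48)
t_crit = 2.776                          # t table: 5%, two-tailed, 4 d.f.
print(f"t = {t:.3f} with {df} d.f.; reject H0: {abs(t) > t_crit}")
```

Here X̄ = 46, s = √72.5 ≈ 8.51 and t ≈ -0.53; since |t| < 2.776, the suggestion that the population mean weight is 48 kg is not rejected.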


Que) A random sample of 9 boys had heights (inches): 45, 47, 50, 52, 48, 47,
49, 53 & 51. In the light of the data, discuss the suggestion that mean height
in the population is 47.5.
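The same calculation for the heights. The question does not state a level of significance, so 5% (two-tailed, 8 d.f., table value 2.306) is assumed here.

```python
import math
import statistics

heights = [45, 47, 50, 52, 48, 47, 49, 53, 51]
mu0 = 47.5
n = len(heights)
xbar = statistics.mean(heights)
s = statistics.stdev(heights)            # modified S.D., divisor n - 1
t = (xbar - mu0) / s * math.sqrt(n)
t_crit = 2.306                           # t table: 5%, two-tailed, 8 d.f.
print(f"mean = {xbar:.3f}, s = {s:.3f}, t = {t:.3f}; reject H0: {abs(t) > t_crit}")
```

X̄ ≈ 49.11 and s ≈ 2.62 give t ≈ 1.85 < 2.306, so the suggested population mean of 47.5 is not rejected at the assumed 5% level.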
When sample standard deviation is given:
Que) Sixteen oil tins are taken at random from an automatic filling machine.
The mean weight of the tins is 14.5 kg with Standard Deviation of 0.4 kg. Does
the sample mean differ significantly from the intended weight of 16 kg?
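When only the sample S.D. is given, the sketch below uses $t = \dfrac{\bar{X} - \mu}{s}\sqrt{n - 1}$, assuming the given s was computed with divisor n. The level of significance is not stated, so 5% (two-tailed, 15 d.f., table value 2.131) is assumed.

```python
import math

n, xbar, s, mu0 = 16, 14.5, 0.4, 16
# Assumed convention: s is the sample S.D. with divisor n, so t uses sqrt(n - 1)
t = (xbar - mu0) / s * math.sqrt(n - 1)
t_crit = 2.131                           # t table: 5%, two-tailed, 15 d.f.
print(f"t = {t:.2f}; reject H0: {abs(t) > t_crit}")
```

|t| ≈ 14.5 is far above 2.131, so the sample mean differs significantly from the intended 16 kg.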
Test of Hypothesis about the Difference between Two Means in case of Independent Samples:
When Standard Deviations of two samples are given:
Que) The mean life of a sample of 10 electric bulbs was found to be 1456 hours with s = 423 hours. A second sample of 17 bulbs chosen from a different batch showed a mean life of 1280 hours with s = 398 hours. Is there any significant difference between the means of the two batches?
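A sketch of the pooled two-sample t-test, $t = \dfrac{\bar{X}_1 - \bar{X}_2}{S}\sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}$ with $S = \sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}}$ and n1 + n2 - 2 degrees of freedom. It assumes the given s values were computed with divisors n1 and n2; the helper name `pooled_t` and the 5% level are assumptions.

```python
import math


def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Two-sample t with pooled S = sqrt((n1*s1^2 + n2*s2^2) / (n1 + n2 - 2)),
    assuming s1, s2 are sample S.D.s computed with divisors n1, n2."""
    S = math.sqrt((n1 * s1**2 + n2 * s2**2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / S * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2


t, df = pooled_t(10, 1456, 423, 17, 1280, 398)
t_crit = 2.060                              # t table: 5%, two-tailed, 25 d.f.
print(f"t = {t:.3f} with {df} d.f.; significant: {abs(t) > t_crit}")
```

S ≈ 423.4 and t ≈ 1.04 < 2.060, so the difference between the two batch means is not significant at 5%.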
Video 28- ANOVA
Analysis of Variance (ANOVA)
The ANOVA technique is used to compare the means of more than 2 populations, or of a population having more than two subgroups.
Assumptions:
1. Each population has a Normal Distribution.
2. All the populations from which the samples are drawn have equal variance.
3. Each sample is drawn randomly and the samples are independent.

The total variation in the data equals the sum of the variations due to its components (between samples and within samples).
Observations in the sample data in ANOVA are classified according to one factor or two factors. If they are classified according to one factor, the analysis is known as One-Way ANOVA; if they are classified according to two factors, it is known as Two-Way ANOVA.

Uses of ANOVA:
1. Test of significance between the Means of several Samples:
ANOVA is used to test the hypothesis whether the means of several
Samples are significantly different or not.
2. Test of significance between the Variance of two Samples: F-ratio
in ANOVA is used to test the significance of the difference between the
variance of two samples.
3. Study of Homogeneity in case of Two-way classification: Homogeneity of data can also be studied in ANOVA of two-way classification, because in this case the data are classified into different parts on 2 bases.
4. Test of Correlation & Regression: ANOVA is used to test the significance of the Multiple Correlation Coefficient. The linearity of regression is also tested with its help.
ANOVA (F-ratio): $F = \dfrac{\text{Variability between the means}}{\text{Variability within the distribution}}$

Total Variance = Variability between the Means + Variability within the distribution

One Way ANOVA: The data are classified according to only one factor or criterion.
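The split of total variation into "between" and "within" parts, and the F-ratio built from them, can be sketched for a one-way classification. The data below are hypothetical, since the document's own data are not reproduced; MSB and MSW are the between- and within-sample variances with k - 1 and N - k degrees of freedom, and the helper name `one_way_anova` is also hypothetical.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, SST, F) for a one-way classification."""
    all_values = [x for g in groups for x in g]
    N, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / N
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))  # between
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)          # within
    sst = sum((x - grand_mean) ** 2 for x in all_values)                       # total
    msb = ssb / (k - 1)          # between-samples variance, k - 1 d.f.
    msw = ssw / (N - k)          # within-samples variance, N - k d.f.
    return ssb, ssw, sst, msb / msw


# Hypothetical data: three samples of three observations each
groups = [[8, 10, 12], [12, 14, 16], [16, 18, 20]]
ssb, ssw, sst, F = one_way_anova(groups)
print(f"SSB = {ssb}, SSW = {ssw}, SST = {sst}, F = {F}")
```

Here SST = 120 splits into SSB = 96 and SSW = 24, and F = 48/4 = 12. With 2 and 6 degrees of freedom the 5% table value of F is 5.14, so these hypothetical sample means would differ significantly.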

Two way ANOVA:

The data give the number of hours 4 students studied on different days:
