Business Statistics & Analytics Overview

This document outlines the course plan for KMB104 Business Statistics & Analytics. It is divided into 5 units that will be covered over several class sessions: 1. Descriptive statistics including measures of central tendency and dispersion. 2. Time series analysis and index numbers. 3. Correlation and regression analysis. 4. Probability theory and distributions. 5. Hypothesis testing and business analytics including spreadsheet analysis. Suggested reading materials are also provided.

Uploaded by

Punit Singh
Copyright © All Rights Reserved

KMB104 Business Statistics & Analytics

Unit I: Descriptive Statistics

by
Dr. Pravin Kumar Agrawal
Unit I (10 Sessions)
Descriptive Statistics
Meaning, scope, types, functions and limitations of statistics; measures of central tendency: mean, median, mode, quartiles; measures of dispersion: range, inter-quartile range, mean deviation, standard deviation, variance, coefficient of variation, skewness and kurtosis.
UNIT 2 (8 Sessions)
Time Series & Index Numbers
• Time series analysis: concept, additive and multiplicative models, components of time series, trend analysis by the least squares method (linear and non-linear equations), applications in business decision-making.
• Index numbers: meaning, types of index numbers, uses of index numbers, construction of price, quantity and volume indices by the fixed base and chain base methods.
UNIT 3 (6 Sessions)
Correlation & Regression Analysis
• Correlation analysis: rank method and Karl Pearson's coefficient of correlation; properties of correlation.
• Regression analysis: fitting of a regression line and interpretation of results; properties of regression coefficients; relationship between regression and correlation.
UNIT 4 (8 Sessions)
Probability Theory & Distributions
• Probability: theory of probability, addition and multiplication laws, Bayes' theorem.
• Probability (theoretical) distributions: concept and application of binomial, Poisson and normal distributions.
UNIT 5 (8 Sessions)
Hypothesis Testing & Business Analytics
• Hypothesis testing: null and alternative hypotheses; Type I and Type II errors; testing of hypothesis: large sample tests, small sample tests (t, F, Z test and chi-square test); concept of
• Business analytics: meaning, types and applications of business analytics; use of spreadsheets to analyze data: descriptive analytics and predictive analytics.
Suggested Readings
1. G. C. Beri – Business Statistics, 3rd ed., Tata McGraw-Hill.
2. Chandrasekaran & Umaparvathi – Statistics for Managers, 1st ed., PHI Learning.
3. Davis, Pecar – Business Statistics Using Excel, Oxford.
4. Ken Black – Business Statistics, 5th ed., Wiley India.
5. Levin and Rubin – Statistics for Management, 7th ed., Pearson.
6. Lind, Marchal, Wathen – Statistical Techniques in Business and Economics, 13th ed., McGraw-Hill.
7. Newbold, Carlson, Thorne – Statistics for Business and Economics, 6th ed., Pearson.
8. S. C. Gupta – Fundamentals of Statistics, Himalaya Publishing.
9. Walpole – Probability and Statistics for Engineers and Scientists, 8th ed., Pearson.
Definition
• "Statistics is the science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of enquiry."
Seligman
• "Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data."
Croxton and Cowden
Definition
• “Statistics is that which deals with the
collection, classification and tabulation of
numerical facts as the basis for explanation,
description and comparison of phenomena.”
Lovitt
Definition
‘By Statistics we mean aggregates of facts
affected to a marked extent by multiplicity of
causes numerically expressed, enumerated or
estimated according to reasonable standards
of accuracy, collected in a systematic manner
for a pre-determined purpose and placed in
relation to each other.’
Horace Secrist
Characteristics
• They should be aggregates of facts
• They should be numerically expressed
• They should be enumerated or estimated
according to reasonable standards of accuracy
• They should be collected in a systematic manner
• They should be collected for some
predetermined purpose
• They should be placed in relation to each other
Characteristics
• They should be aggregates of facts.
Single and unconnected figures are not statistics. A single age of 25 years or 40 years is not statistics, but a series relating to the ages of a group of persons would be called statistics. A single figure relating to a birth, death, purchase, sale, accident, etc., does not form statistics, though aggregates of figures relating to births, deaths, purchases, sales, accidents, etc., would be called statistics because they can be studied in relation to each other and are capable of comparison. It is possible to study them in relation to time, place and frequency of occurrence.
Characteristics…
• They should be numerically expressed:
– Qualitative expressions like good, bad, young, old, etc., do not form part of statistical studies. If it is said that the production of wheat per acre was 100 kg in 2018 and only 60 kg in 2017, or that of two persons A and B, A is 20 years old and B is 60 years old, we are making statistical statements.
Characteristics …
• They should be enumerated or estimated according to reasonable
standards of accuracy
Numerical statements can either be enumerated, in which case they are expected to be accurate and precise, or they can be estimated by expert observers. Where the scope of a statistical enquiry is very wide or the numbers are very large, enumeration is usually out of the question, and in such cases figures can only be estimated. Estimated figures obviously cannot be absolutely accurate.
Characteristics …
• They should be collected in a systematic manner
– If data are collected in a haphazard fashion, one can never be sure about their degree of accuracy. It is therefore essential that statistics be collected in a systematic manner so that they conform to reasonable standards of accuracy.
Characteristics …
• They should be collected for some predetermined purpose
If statistical data are not collected with some predetermined aim, their usefulness would be almost negligible. Figures are usually collected with some end in view; without it, all the effort made in collecting them would be wasted and the figures so collected would not be useful in any way.
Characteristics …
• They should be placed in relation to each other:
Statistics are collected mostly for the purpose of comparison. If the collected figures are not capable of being compared with each other, they lose a very large part of their value. The figures collected should therefore be a homogeneous lot, because it is not possible to compare figures which are heterogeneous in character and cannot be placed in relationship to each other.
Functions of Statistics
1. It helps in collecting and presenting the data in a systematic
manner.
2. It helps to understand unwieldy and complex data by simplifying it.
3. It provides basis and techniques for making comparison.
4. It helps to study the relationship between different
phenomena.
5. It helps to formulate the hypothesis and test it.
6. It helps to indicate the trend of behavior.
7. It helps to classify the data.
8. It helps to draw rational conclusions.
Functions of Statistics
• Simplification of Complex Facts
– The foremost purpose of statistics is to simplify a huge collection of numerical data. It is beyond the capacity of the human mind to remember and recollect a huge mass of facts and figures. Statistical methods make it possible to understand the whole in a short span of time and in a better way.
• Comparison
– An objective of statistics is to enable comparison between past and present results with a view to ascertaining the reasons for the changes which have taken place and the effect of such changes in the future.
Functions of Statistics
• Relationship between Facts:
– Statistical methods are used to investigate the cause-and-effect relationship between two or more facts. The relationship between demand and supply, or between money supply and the price level, can best be understood with the help of statistical methods.
• Formulation and Testing of Hypothesis:
– The most theoretical function of statistics is to test various types of hypotheses and to discover new theories, e.g. testing whether a particular coin is fair or not.
Functions of Statistics
• To Indicate Trend Behaviour:
– Statistics helps to indicate trend behaviour in certain fields of enquiry. Statistical techniques such as time series analysis are widely used to identify the trend behaviour of the enquiry in question.
• Classification of Data:
– Classification refers to the process of splitting up the data into certain parts, which helps in comparing and interpreting the various features of the data. This is done using various statistical techniques.
Functions of Statistics
• To Measure Uncertainty:
– In most social fields, including business, commerce and economics, it becomes necessary to take decisions in the face of uncertainty and to study the chance of occurrence of certain events and their effect on the policy adopted.
• To Draw Rational Conclusions:
– In fields of uncertainty like business and commerce, it is essential to draw rational conclusions on the basis of the facts collected and analyzed. For this, the mind of the decision maker should be free from bias and prejudice.
Scope of Statistics
• Statistics and State
Statistical data relating to prices, production, consumption, income and expenditure, etc., are extensively used by governments the world over for formulating their economic and other policies.
To raise the standard of living of their populations, developing countries such as India follow a policy of planned economic development. For that purpose the government must base its decisions on a correct and sound analysis of statistical data.
For instance, in formulating its five-year plans, the government must have an idea about the availability of raw materials, capital goods and financial resources, and about the distribution of the population according to characteristics such as age, sex and income, in order to evolve various policies.
Statistics in Economics
• Statistical analysis is immensely useful in solving a variety of economic problems relating to production, consumption, distribution, etc.
• An analysis of data on consumption may reveal the pattern of consumption of various commodities by different sections of society. Data on prices, wages, consumption, savings and investment, etc., are vital in formulating various economic policies.
• Statistical tools such as index numbers, time series analysis and regression analysis are vital in economic planning. For example, the consumer price index is used for granting dearness allowance (DA) or bonus to workers.
• Demand forecasting can also be done using time series analysis.
Statistics in Business and Management

• With the growing size and increasing competition, the activities of modern business enterprises are becoming more complex and demanding. The separation of ownership and management in big enterprises has resulted in the emergence of professional management.
• The success of managerial decision-making depends upon the timely availability of relevant information.
Statistics in Business and Management…

• Statistical data have, therefore, been increasingly used in business and industry in all operations like sales, purchases, production, marketing, finance, etc.
• Statistical methods are now widely applied in market and production research, investment policies, quality control of manufactured products, economic forecasting, auditing and many other fields.
Research Activities
• Primarily, statistical techniques are used for collecting information in any research.
• Statistical methods are used for the analysis and interpretation of research findings.
• Thus there is hardly any branch of study where statistics is not used; it is used in all spheres of human activity.
Statistics and industry

• In industry, statistics is widely used in inventory control. In production engineering, it is used to find out whether a product conforms to specifications or not, e.g. through inspection plans and control charts.
Statistics, Psychology and Education
• In education and psychology, statistics has found wide application, such as determining the reliability and validity of a test, factor analysis, etc.
Statistics and Medical Science
• In medical science, statistical tools for the collection, presentation and analysis of observed facts relating to the causes and incidence of disease, and to the results of applying various drugs and medicines, are of great importance.
LIMITATIONS OF STATISTICS
• Does not study qualitative phenomena.

• Statistical results are true only on an average: Statistical methods reveal only the average behaviour of a phenomenon. The average income of the employees of a company will, therefore, not throw much light on the income of a specific individual. Statistical results are therefore useful for a general appraisal of a phenomenon.
LIMITATIONS

• Statistics does not deal with individuals: Since statistics deals with aggregates of facts, a single and isolated figure cannot be regarded as statistics. For example, the height of one individual is not of much relevance, but the average height of a group of people is relevant from the statistical point of view.
LIMITATIONS…
• Statistics deals only with the quantitative
characteristics: 
Statistics deals with facts which are expressed
in numerical terms. Therefore, those
phenomena that cannot be described in
numerical terms do not fall under the scope of
statistics. Beauty, colour of eyes, intelligence,
etc., are qualitative characteristics and hence
cannot be studied directly.
LIMITATIONS…
• Statistical laws are not exact: Unlike the laws of natural sciences, statistical laws are not exact. They hold only under certain conditions, and some element of chance is always associated with them. Therefore, conclusions based on them are only approximate, not exact, and cannot be applied universally, whereas laws of pure sciences like physics and chemistry are universal in their application.
Measures of Central Tendency
• A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.
• Measures of central tendency are also called measures of central location.
• The mean, median and mode are all valid measures of central tendency, but under different conditions some of them become more appropriate to use than others.
Arithmetic Mean
• Its value is obtained by adding together all the items and dividing the total by the number of observations. If X1, X2, . . . , Xn are n values of a variable X, then the arithmetic mean (A.M.) in the case of raw data is defined as
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Requisites of a good average
• It should be easy to understand and calculate.
•  It should be based on all the observations.
•  It should not be unduly affected by extreme
observations.
•  It should be suitable for further mathematical
treatment.
•  It should be least affected by fluctuations of
sampling.  
Properties of Arithmetic Mean
(i) The algebraic sum of the deviations of the given set of observations from their arithmetic mean is zero, i.e. $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.
(ii) If n1 and n2 are the sizes and $\bar{X}_1$ and $\bar{X}_2$ are the respective means of two series, then the pooled mean of the combined series of (n1 + n2) observations is given by:
$$\bar{X}_{12} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$$
(iii) The sum of the squares of the deviations of the given set of observations is minimum when the deviations are taken from their arithmetic mean, i.e. $\sum (X_i - \bar{X})^2 \le \sum (X_i - a)^2$ for any value a.
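Properties (i) and (iii) can be checked numerically. The sketch below is a minimal Python illustration, assuming we reuse the ten marks from the median example later in this unit; it uses exact fractions so that "zero" really means zero rather than a floating-point near-miss. The helper names are illustrative, not standard.

```python
from fractions import Fraction

# Marks of 10 students (the same data as the even-series median example)
marks = [17, 32, 35, 33, 15, 21, 41, 32, 11, 18]

def arithmetic_mean(values):
    """Raw-data A.M.: sum of the observations divided by their number."""
    return Fraction(sum(values), len(values))

def sum_of_deviations(values, a):
    """Algebraic sum of the deviations (X - a)."""
    return sum(Fraction(x) - a for x in values)

def sum_of_squared_deviations(values, a):
    """Sum of the squared deviations (X - a)^2."""
    return sum((Fraction(x) - a) ** 2 for x in values)

mean = arithmetic_mean(marks)
print(float(mean))                         # 25.5
print(sum_of_deviations(marks, mean))      # 0  (property i)

# Property (iii): no other point gives a smaller sum of squared deviations
print(all(sum_of_squared_deviations(marks, mean) <= sum_of_squared_deviations(marks, a)
          for a in range(0, 60)))          # True
```

Checking a range of integer points only illustrates property (iii); the general proof follows from $\sum (X - a)^2 = \sum (X - \bar{X})^2 + n(\bar{X} - a)^2$.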
Merits of Arithmetic Mean
• It is based on all the observations.
• It is easily calculated from the given data.
• It is least affected by fluctuations of sampling.
• It is suitable for further mathematical treatment. The average of two or more series can be obtained from the averages of the individual series.
Demerits of Arithmetic Mean
• The strongest drawback of arithmetic mean is that it
is very much affected by extreme observations.
• It can be located neither by inspection nor graphically.
• It cannot be used for qualitative data such as intelligence, honesty, etc.
• In a skewed distribution, the arithmetic mean is usually not representative of the distribution.
• The A.M. can be a value that does not exist in the series.
• The A.M. cannot be calculated if even a single observation is missing.
Calculation of Arithmetic Mean in Individual Series
• Direct Method: $\bar{X} = \frac{\sum X}{N}$
Calculation of Arithmetic Mean in a Discrete Frequency Distribution
• Direct Method: $\bar{X} = \frac{\sum fX}{N}$, where $N = \sum f$
• Short-cut Method
• Step Deviation Method
Direct Method
Question 2: The following table gives the wages paid to 125 workers in a factory. Calculate the arithmetic mean of the wages.

Wage (Rs.)    No. of workers
200           5
210           15
220           32
230           42
240           15
250           12
260           4
Short Cut Method
Formula: $\bar{X} = A + \frac{\sum fd}{N}$, where A is the assumed mean, d = X - A and N = Σf
Question 3: Calculate the arithmetic mean for
the following data by using short-cut method.
Wage (Rs.) No. of workers
200 5
210 15
220 32
230 42
240 15
250 12
260 4
Step Deviation Method
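Formula: $\bar{X} = A + \frac{\sum fd'}{N} \times i$, where A is the assumed mean, i is the common interval (step) and d' = (X - A)/i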
Question 3: Calculate the arithmetic mean for
the following data by using step deviation
method.
Wage (Rs.) No. of workers
200 5
210 15
220 32
230 42
240 15
250 12
260 4
Answer Using Step Deviation Method
Wage (Rs.)  No. of workers (f)  Deviation d = X - 230  Step deviation d' = d/10  fd'
200         5                   -30                    -3                        -15
210         15                  -20                    -2                        -30
220         32                  -10                    -1                        -32
230         42                  0                      0                         0
240         15                  10                     1                         15
250         12                  20                     2                         24
260         4                   30                     3                         12
            N = 125                                                              ∑fd' = -26
Answer
$$\bar{X} = A + \frac{\sum fd'}{N} \times i = 230 + \frac{-26}{125} \times 10 = 230 - 2.08 = \text{Rs. } 227.92$$
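The direct, short-cut and step deviation methods are algebraically equivalent, so all three must give the same mean for the wage data. A minimal Python sketch (the function names are illustrative), assuming A = 230 and i = 10 as in the worked answer:

```python
# Wage data of Questions 2 and 3: wage (Rs.) and number of workers
wages = [200, 210, 220, 230, 240, 250, 260]
workers = [5, 15, 32, 42, 15, 12, 4]

def direct_mean(x, f):
    """Direct method: sum(f * X) / N."""
    n = sum(f)
    return sum(fi * xi for xi, fi in zip(x, f)) / n

def shortcut_mean(x, f, a):
    """Short-cut method: A + sum(f * d) / N, with d = X - A."""
    n = sum(f)
    return a + sum(fi * (xi - a) for xi, fi in zip(x, f)) / n

def step_deviation_mean(x, f, a, i):
    """Step deviation method: A + (sum(f * d') / N) * i, with d' = (X - A) / i."""
    n = sum(f)
    fd_prime = sum(fi * (xi - a) / i for xi, fi in zip(x, f))
    return a + fd_prime / n * i

print(round(direct_mean(wages, workers), 2))                   # 227.92
print(round(shortcut_mean(wages, workers, 230), 2))            # 227.92
print(round(step_deviation_mean(wages, workers, 230, 10), 2))  # 227.92
```

The short-cut and step deviation methods only reduce hand arithmetic; rounding to two decimals hides the tiny floating-point differences between the three routes.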


Continuous Series
Question 4: Use the direct method to calculate the arithmetic mean from the following data.

Class Frequency
20-25 8
25-30 10
30-35 12
35-40 20
40-45 11
45-50 4
50-55 5

Do it by Step Deviation Method at Home.


2018-19
Question 4: Find out the mean from the following data.
Class Frequency
10-20 15
20-30 20
30-40 45
40-50 15
50-60 5

Question 5: The mean marks of 60 students in
section A is 40 and mean marks of 40 students in
section B is 45. Find the combined mean of the
100 students in both the sections.
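Applying the pooled-mean property (ii) of the arithmetic mean to Question 5 (n1 = 60 with mean 40, n2 = 40 with mean 45) gives the combined mean as a worked check:

$$\bar{X}_{12} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} = \frac{60 \times 40 + 40 \times 45}{60 + 40} = \frac{2400 + 1800}{100} = 42$$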
Question 6: Obtain arithmetic mean for the
following income distribution.

Income (Rs.)    Frequency
Below 10 5
10-15 8
15-20 7
20-25 10
25-30 6
30 and above 4
2017-18
Question 6: Obtain arithmetic mean for the
following data.
Marks below    No. of students
10 15
20 35
30 60
40 84
50 96
60 127
70 198
80 250
Answer
• This is an example of a less than cumulative frequency distribution. To compute the average, it should first be converted into a simple frequency distribution, as shown in the table
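The conversion step can be sketched in Python: each class frequency is the difference between successive "less than" cumulative frequencies, f_k = cf_k - cf_(k-1). This sketch assumes the classes are 0-10, 10-20, ..., 70-80 (lowest class starting at 0 and width 10), which the table implies but does not state; the helper name is hypothetical.

```python
# Question 6: upper class limits and "less than" cumulative frequencies
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80]
less_than_cf = [15, 35, 60, 84, 96, 127, 198, 250]

def cumulative_to_simple(cf):
    """Recover class frequencies: f_k = cf_k - cf_(k-1), with cf_0 = 0."""
    return [c - prev for prev, c in zip([0] + cf[:-1], cf)]

freq = cumulative_to_simple(less_than_cf)
print(freq)   # [15, 20, 25, 24, 12, 31, 71, 52]

# Assumption: classes 0-10, 10-20, ..., 70-80, so mid-point = upper limit - 5
width = 10
mids = [u - width / 2 for u in upper_limits]
mean = sum(f * m for f, m in zip(freq, mids)) / sum(freq)
print(mean)   # 50.4
```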
2019-20
Median
• The median is the central value of a set of observations arranged in ascending or descending order of magnitude.
– The median is that value of the variate which divides the group into two equal parts, one part comprising all the values greater than the median and the other all the values less than it.
• Unlike the arithmetic mean, which is based on all the items of the distribution, the median is only a positional average, i.e. its value depends on the position it occupies in the frequency distribution. The median is less affected by outliers and skewed data.
Calculation of median
Ungrouped data (Odd series)
• When the total number of observations is odd, the median is the middle value after the observations are arranged in ascending or descending order of magnitude. If the number of observations is n, the value of the ((n+1)/2)th item gives the median.
• E.g. the median of the 5 observations 65, 69, 52, 58, 45, i.e. 45, 52, 58, 65, 69 when arranged, is 58.
Calculation of Median…
• When the total number of observations is even, the median is obtained as the arithmetic mean of the two middle observations after they are arranged in ascending or descending order of magnitude.
• Determine the median of:
17, 32, 35, 33, 15, 21, 41, 32, 11, 18
First, arrange the values in ascending order:
11 15 17 18 21 32 32 33 35 41
The median is the value of the ((N+1)/2)th item:
$$\frac{N+1}{2} = \frac{10+1}{2} = 5.5\text{th item}$$
$$\text{Median} = \frac{\text{5th item} + \text{6th item}}{2} = \frac{21 + 32}{2} = \frac{53}{2} = 26.5 \text{ marks}$$
Grouped data
• Steps involved in its computation are:
1) Prepare a less than cumulative frequency (c.f.) distribution table.
2) Find N/2.
3) Find the cumulative frequency just greater than N/2.
4) The class corresponding to step 3 contains the median value and is called the median class.
The median for a grouped series is given by the following formula:
Median…
$$\text{Median} = L + \frac{N/2 - c.f.}{f} \times h$$
• L is the lower limit of the median class
• N is the total frequency
• f is the frequency of the median class
• h is the width of the class interval
• c.f. is the cumulative frequency of the class preceding the median class.
Merits of Median
• It is easily understood, readily calculated and can be located exactly.
• It is readily obtained without the necessity of measuring all the items.
• It is not affected by abnormally large or small values of the variable.
• It can be determined by mere inspection and can be computed graphically.
Demerits of Median
• The median does not lend itself to algebraic treatment: the median of a combined series cannot be computed from the medians of its component series.
•  Median being positional average is not based on
each and every item of the observations.
• Median is relatively less stable than mean,
particularly for small samples since it is affected
more by fluctuations of sampling as compared to
arithmetic mean.
Question 7: The following data relate to the
height of nine students in a class. Compute the
median height.
S. No. Heights (in cms)
1 153
2 142
3 151
4 144
5 149
6 146
7 147
8 150
9 154
Question 8: Calculate median from the following
data.
S. No. Marks
1 17
2 32
3 35
4 33
5 15
6 21
7 41
8 32
9 11
10 18
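Both median rules (odd and even n) fit in one small function. A Python sketch, cross-checked against the standard library's `statistics.median`; the function name `median` here is our own illustrative helper:

```python
import statistics

heights = [153, 142, 151, 144, 149, 146, 147, 150, 154]   # Question 7 (n = 9, odd)
marks = [17, 32, 35, 33, 15, 21, 41, 32, 11, 18]          # Question 8 (n = 10, even)

def median(values):
    """Median of raw data after sorting in ascending order."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # ((n+1)/2)th item; 0-based index n // 2
    return (s[mid - 1] + s[mid]) / 2   # mean of the (n/2)th and (n/2 + 1)th items

print(median(heights), median(marks))   # 149 26.5
print(statistics.median(marks))         # 26.5
```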
2018-19
Question 4: Find out the median from the following data.
Class Frequency
10-20 15
20-30 20
30-40 45
40-50 15
50-60 5
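The grouped-median steps (cumulate frequencies, locate the class where the c.f. first reaches N/2, then interpolate with Median = L + (N/2 - c.f.)/f × h) can be sketched in Python as below. This assumes continuous classes of the exclusive type; `grouped_median` is an illustrative name.

```python
# Question 4 (2018-19): class limits and frequencies
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freq = [15, 20, 45, 15, 5]

def grouped_median(classes, freq):
    """Median = L + (N/2 - c.f.) * h / f for the median class."""
    n = sum(freq)
    half = n / 2
    cf = 0                                 # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freq):
        if cf + f >= half:                 # first class whose c.f. reaches N/2
            return lower + (half - cf) * (upper - lower) / f
        cf += f
    raise ValueError("empty distribution")

print(round(grouped_median(classes, freq), 2))   # 33.33
```

Here N/2 = 50, the median class is 30-40, L = 30, c.f. = 35, f = 45 and h = 10.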

Quartiles
• Quartiles are those values of the variable which divide the total frequency into four equal parts. Obviously there will be three such points Q1, Q2 and Q3, such that Q1 ≤ Q2 ≤ Q3, termed the three quartiles.
• Q1 is known as the lower or first quartile; it is the value which has 25 percent of the items of the distribution below it and consequently 75 percent of the items above it.
• Q3 is known as the upper or third quartile; it has 75 percent of the observations below it and consequently 25 percent of the observations above it.
Quartile Deviation or Semi-Inter-Quartile Range
• The difference between the upper and lower quartiles, i.e. Q3 - Q1, is known as the inter-quartile range.
• The quartile deviation (QD) is half the difference between the upper and lower quartiles:
$$QD = \frac{Q_3 - Q_1}{2}$$
• For comparative studies of the variability of two distributions, the relative measure known as the coefficient of quartile deviation is used:
$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Merits of Quartile Deviation
• The quartile deviation is easy to compute and understand.
• It is a better measure of dispersion than the range because it makes use of the middle 50% of the data.
• It is not affected at all by extreme observations.
• It can be computed from a frequency distribution with open-end classes.
Demerits of Quartile Deviation
• It is not based on all the observations.
• It is affected considerably by fluctuations of sampling.
• It is not suitable for further mathematical treatment.
Question 11: Calculate Q1, Q3, D4 and P90 from
the following data:
Marks No. of Students
0-10 10
10-20 15
20-30 20
30-40 25
40-50 35
50-60 15
60-70 16
70-80 14
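Quartiles, deciles and percentiles of grouped data all use the same interpolation as the median, only with a different target position: Q1 at N/4, Q3 at 3N/4, D4 at 4N/10 and P90 at 90N/100. The Python sketch below assumes continuous exclusive classes; `partition_value` is a hypothetical helper, and the target is passed as a fraction k/m so that the positions stay exact.

```python
# Question 11: marks and number of students
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freq = [10, 15, 20, 25, 35, 15, 16, 14]

def partition_value(classes, freq, k, m):
    """Value below which k/m of the total frequency lies:
    L + (kN/m - c.f.) * h / f, interpolated inside the located class."""
    n = sum(freq)
    target = k * n / m
    cf = 0                                   # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freq):
        if cf + f >= target:
            return lower + (target - cf) * (upper - lower) / f
        cf += f
    raise ValueError("k/m must lie between 0 and 1")

q1 = partition_value(classes, freq, 1, 4)
q3 = partition_value(classes, freq, 3, 4)
d4 = partition_value(classes, freq, 4, 10)
p90 = partition_value(classes, freq, 90, 100)
print(q1, q3, d4, p90)                       # 26.25 55.0 36.0 69.375

qd = (q3 - q1) / 2                           # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)             # coefficient of quartile deviation
print(qd, round(coeff_qd, 4))                # 14.375 0.3538
```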
Mode
• The mode is the most frequent score in our data set. It is represented by the highest bar in a bar chart or histogram. The mode is used for categorical data where we wish to know which is the most common category.
• The mode is the value which occurs most often in a set of observations and around which the other items of the set cluster densely. It is defined as the size of the variable which occurs most frequently, i.e. the point of maximum frequency or the point of greatest density.
• The mode of a distribution is the value at the point around which the items tend to be most heavily concentrated.
Example

• Let the scores of 10 students be 40, 45, 46, 48, 48, 58, 58, 58, 70 and 81. Then the mode is 58, which occurs three times. A distribution having one mode is called unimodal.
Merits of Mode
• It is easily understood.
• It can be easily located by mere inspection of
certain items.
• It can be easily determined from the graph.
• The extreme items have no effect provided
they are not in the modal class.
Demerits of Mode
• It is ill defined. A clearly defined mode does not always exist, so the value of the mode cannot always be determined. A distribution can be bimodal or multimodal.
• It is not based on all the observations of a series.
• It is not suitable for further mathematical treatment.
• As compared to the mean, the mode is affected to a greater extent by fluctuations of sampling.
Empirical relation between Mean,
Median and Mode
• In a symmetrical distribution, the mean, median and mode coincide.
• For a moderately asymmetrical distribution, the empirical relationship is
Mode = 3 Median - 2 Mean
When to use the Mean, Median and
Mode
Type of Variable               Best measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median

Calculation of Mode for Grouped Data
$$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
Where,
L = the lower limit of the modal class
f1 = the frequency of the modal class
f0 = the frequency of the class preceding the modal class
f2 = the frequency of the class succeeding the modal class
h = the width of the modal class
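A Python sketch of both cases: `statistics.mode` for the raw scores in the example above, and an illustrative `grouped_mode` helper applying Mode = L + (f1 - f0)/(2f1 - f0 - f2) × h to the 2018-19 Question 4 distribution. Two assumptions are labelled in the code: the modal class is simply the class with the highest frequency (ties go to the first such class, so bimodal data are not handled), and a missing neighbour at either end is given frequency 0.

```python
import statistics

# Raw data: scores of 10 students from the example
scores = [40, 45, 46, 48, 48, 58, 58, 58, 70, 81]
print(statistics.mode(scores))   # 58

# Grouped data: Question 4 (2018-19)
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freq = [15, 20, 45, 15, 5]

def grouped_mode(classes, freq):
    """Mode = L + (f1 - f0) * h / (2*f1 - f0 - f2)."""
    # Assumption: modal class = first class with the highest frequency
    k = max(range(len(freq)), key=lambda i: freq[i])
    lower, upper = classes[k]
    f1 = freq[k]
    # Assumption: a missing neighbouring class counts as frequency 0
    f0 = freq[k - 1] if k > 0 else 0
    f2 = freq[k + 1] if k + 1 < len(freq) else 0
    return lower + (f1 - f0) * (upper - lower) / (2 * f1 - f0 - f2)

print(round(grouped_mode(classes, freq), 2))   # 34.55
```

Here the modal class is 30-40 with f1 = 45, f0 = 20 and f2 = 15.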


Geometric Mean
• The geometric mean (usually denoted by G.M.) of a set of n observations is the nth root of their product. If X1, X2, . . . , Xn are n values of a variable X, none of them being zero, then the geometric mean G.M. is defined by
$$G.M. = \left(X_1 X_2 \cdots X_n\right)^{1/n}$$
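Geometric Mean
• If X1, X2, . . . , Xn occur f1, f2, . . . , fn times respectively, then
$$G.M. = \left(X_1^{f_1} X_2^{f_2} \cdots X_n^{f_n}\right)^{1/N}, \qquad N = f_1 + f_2 + \cdots + f_n$$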
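In practice the G.M. is computed through logarithms, log G.M. = Σ f log X / N, which avoids multiplying many numbers together. A minimal Python sketch of the weighted and unweighted forms, assuming all values are positive (the helper name and the sample values 2 and 8 are illustrative); the standard library's `statistics.geometric_mean` is used as a cross-check for the unweighted case.

```python
import math
import statistics

def geometric_mean(values, weights=None):
    """G.M. = exp(sum(f * log X) / sum(f)); weights default to 1 (raw data).
    Assumption: every X is positive, so the logarithms are defined."""
    if weights is None:
        weights = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("this sketch handles positive values only")
    n = sum(weights)
    log_sum = sum(f * math.log(x) for x, f in zip(values, weights))
    return math.exp(log_sum / n)

print(round(geometric_mean([2, 8]), 6))                    # 4.0
print(round(geometric_mean([2, 8], weights=[1, 2]), 6))    # 5.039684, the cube root of 128
print(round(statistics.geometric_mean([2, 8]), 6))         # 4.0
```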
Merits of Geometric Mean
• It is rigidly defined.
• The G.M. is based on all the observations of a series.
• It is not much affected by fluctuations of sampling.
• It is suitable for further mathematical treatment.
• Unlike the arithmetic mean, which has a bias towards higher values, the geometric mean has a bias towards smaller observations.
• As compared with the arithmetic mean, the geometric mean is affected to a lesser extent by extreme observations.
Demerits of Geometric Mean
• Computations are difficult.
• It is not simple to understand.
• It does not give equal weight to every item.
• It cannot be calculated if an odd number of the values are negative, or if any value is zero.
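Harmonic Mean
• If X1, X2, . . . , Xn are n values of a variable X, then their harmonic mean, abbreviated as H.M., is defined by
$$H.M. = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$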
Merits of Harmonic Mean
• It is rigidly defined.
• The H.M. is based on all the observations of a series.
• It is not much affected by fluctuations of sampling.
• It is suitable for further mathematical treatment.
• Since the reciprocals of the values of the variable are involved, it gives greater weightage to smaller observations and as such is not very much affected by one or two big observations.
Demerits of Harmonic Mean
• Computations are difficult, and it is not simple to understand.
• It cannot be calculated if any one of the observations is zero.
• It is not a representative figure of the distribution unless the phenomenon requires greater weightage to be given to smaller values.
Use of Harmonic Mean
• The H.M. is used in finding averages involving speed, time, price and ratios. It is useful for computing the average rate of increase of profits of a concern, the average speed at which a journey has been performed or the average price at which an article has been sold. The rate usually indicates the relation between two different types of measuring units that can be expressed reciprocally. The H.M. is used for problems about work, time and rate, where the amount of work is held constant and the average rate is required; for problems about total cost, number of persons and per capita cost; and for other problems of a similar nature involving rates.
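The average-speed case shows why the H.M. is the right average when the distance (the "amount of work") is held constant. The Python sketch below uses a hypothetical journey of 120 km at 40 km/h followed by the same 120 km at 60 km/h, and compares the H.M. with total distance divided by total time; `harmonic_mean` is an illustrative helper implementing H.M. = Σf / Σ(f/X).

```python
def harmonic_mean(values, weights=None):
    """H.M. = sum(f) / sum(f / X); weights default to 1 (raw data)."""
    if weights is None:
        weights = [1] * len(values)
    if any(x == 0 for x in values):
        raise ValueError("H.M. cannot be calculated when an observation is zero")
    return sum(weights) / sum(f / x for x, f in zip(values, weights))

# Hypothetical journey: 120 km at 40 km/h, then the same 120 km at 60 km/h
speeds = [40, 60]
distance = 120
total_time = sum(distance / s for s in speeds)   # 3.0 h + 2.0 h
true_average_speed = 2 * distance / total_time   # total distance / total time

print(true_average_speed)                # 48.0
print(round(harmonic_mean(speeds), 6))   # 48.0
print(sum(speeds) / len(speeds))         # 50.0 (the A.M. overstates the average speed)
```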
MEASURES OF DISPERSION
• The measures of central tendency are just different types of averages and do not indicate the extent of variability in a distribution. Averages give us an idea of the concentration of the observations about the central part of the distribution. Consider two series I and II:
Series   Values                     Total   Mean
I        20  20  25  25  30  30     150     25
II       15  20  25  25  30  35     150     25
In the first case the observations vary from 20 to 30 and in the second case from 15 to 35, i.e. the greatest deviation from the mean is 5 in the first case and 10 in the second. Such a variation is called dispersion.
• Measures of dispersion help us to study the variability of the items, i.e. the extent to which the items vary from one another and from the central value.
Dispersion
• Firstly, dispersion refers to the variation of the items among themselves. If all the items of a series have the same value, there is no variation among them and the dispersion is zero. On the other hand, the greater the variation among the items of a series, the greater the dispersion.
• Secondly, dispersion refers to the variation of the items about an average. If the differences between the values of the items and the average are large, the dispersion is high; if they are small, the dispersion is low. Thus, dispersion is defined as the scatter around a central value, or the spread of the individual items in a given series.
Dispersion
• "Dispersion is the measure of the variation of the items."
– Bowley
• "The degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data."
– Spiegel
Objectives of Measuring Dispersion
• To determine the reliability of an average.
• To compare the variability of two or more
series.
• For facilitating the use of other statistical
measures.
• Basis of Statistical Quality Control.
Characteristics for an Ideal Measure of
Dispersion
• It should be rigidly defined.
• It should be based on all observations.
• It should be easily calculated.
• It should be amenable to further mathematical treatment.
• It should be affected as little as possible by
fluctuations of sampling.
• It should not be affected much by extreme
observations.
Absolute and Relative Measures of
Dispersion
• The measures of dispersion which are expressed in terms of the original units of a series are termed absolute measures.
• Such measures are not suitable for comparing the variability of two distributions which are expressed in different units of measurement.
• On the other hand, relative measures of dispersion are obtained as ratios or percentages and are thus pure numbers independent of the units of measurement. These measures are used to compare two series expressed in different units.
Range

• Simplest measure of dispersion.

• Difference between the greatest and the smallest observation of the distribution.

• Range = Xmax − Xmin, where Xmax is the greatest observation and Xmin is the smallest observation of the variable.
Grouped frequency distribution

• In a grouped frequency distribution, the range is the difference between the upper limit of the highest class and the lower limit of the lowest class.

• In order to compare the variability of two or more distributions given in different units of measurement, a relative measure called the coefficient of range is used. It is defined as follows:

Coefficient of Range = (Xmax − Xmin) / (Xmax + Xmin)
Merits / Demerits
MERITS
• It is rigidly defined, readily comprehensible and the easiest to compute.

DEMERITS
• It is not based on all the observations.
• It is an unreliable measure of dispersion, since it depends only on the two extreme values.
• Range is not suitable for further mathematical treatment.
Uses of Range
• In fields where the data have small variations, such as stock market fluctuations and variations in money rates and rates of exchange.
• It is used in industry for the statistical quality control of manufactured products through the construction of the R chart, i.e. the control chart for range.
• It is also used as a very convenient measure by the meteorological department for weather forecasts, in terms of the difference between the maximum and minimum temperature.
Question 1: Marks of 10 students in Mathematics and Statistics are given below:

Marks in Maths.   25 40 30 35 21 45 23 33 10 29
Marks in Stat.    30 39 23 42 20 40 25 30 18 19

(a) Compare the range of marks in the two subjects.
(b) Compare the coefficients of range for both the subjects.
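As a quick check of the two formulas, here is a minimal Python sketch applied to the marks in Question 1. The helper names value_range and coefficient_of_range are illustrative, not from the notes; the coefficient uses (Xmax − Xmin) / (Xmax + Xmin).

```python
# Range and coefficient of range for the marks in Question 1.
maths = [25, 40, 30, 35, 21, 45, 23, 33, 10, 29]
stats = [30, 39, 23, 42, 20, 40, 25, 30, 18, 19]


def value_range(values):
    """Range = Xmax - Xmin (named to avoid shadowing Python's built-in range)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of range = (Xmax - Xmin) / (Xmax + Xmin), a pure number."""
    x_max, x_min = max(values), min(values)
    return (x_max - x_min) / (x_max + x_min)


for subject, marks in (("Maths", maths), ("Stat.", stats)):
    print(subject, value_range(marks), round(coefficient_of_range(marks), 4))
```

Maths gives a range of 35 (coefficient 35/55 ≈ 0.6364) and Statistics a range of 24 (coefficient 24/60 = 0.4), so on both measures the Maths marks are more spread out.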
Question 2: Find the range and coefficient of
range from the following distribution:

Classes Frequency
6-10 7
11-15 8
16-20 15
21-25 35
26-30 18
31-35 7
36-40 5
Mean Deviation or Average Deviation

• This measure of dispersion is obtained by taking the arithmetic mean of the absolute deviations of the given values from a measure of central tendency.

– Average deviation is the average amount of scatter of the items in a distribution from either the mean or the median, ignoring the signs of the deviations. The average taken of the scatter is an arithmetic mean, which is why this measure is often called the mean deviation.
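Calculation of Mean Deviation

• If X1, X2, ..., Xn are n given observations, then the mean deviation (M.D.) about an average A is

M.D. = Σ|Xi − A| / n

where A is the mean, median or mode, and |Xi − A| is the deviation of Xi from A ignoring its sign.

Mean deviation is minimum when it is calculated from the median. In other words, the mean deviation calculated about the median will be less than or equal to the mean deviation about the mean or the mode. The relative measure of mean deviation is called the coefficient of mean deviation and is given by

Coefficient of Mean Deviation = M.D. / A

Mean Deviation for Grouped Data

M.D. = Σfi|Xi − A| / N, where Xi is the value of the variable (or the mid-value of the class), fi is the corresponding frequency and N = Σfi.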
Merits of Mean Deviation

• It is rigidly defined, and easy to understand and calculate.

• It is based on all observations and is a better measure than the range and the quartile deviation.

• The averaging of the absolute deviations from an average irons out the irregularities in the distribution and thus provides an accurate measure of dispersion.

• It is less affected by extreme observations than the standard deviation.


Demerits of Mean Deviation
• Ignoring the signs of the deviations is not justified from a mathematical point of view.
• It is not an accurate measure when it is calculated from the mode.
• It is not capable of further mathematical treatment.
Question 4: Compute mean deviation and its
coefficient from the following data relating to
the marks obtained by a batch of 11 students in
a class test:

Marks 10 70 50 53 20 95 55 42 60 48 80
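A minimal Python sketch of the mean deviation and its coefficient for the marks in Question 4, taken about both the mean and the median. The helper mean_deviation is an illustrative name; statistics.mean and statistics.median are from Python's standard library.

```python
import statistics

# Marks of the 11 students in Question 4.
marks = [10, 70, 50, 53, 20, 95, 55, 42, 60, 48, 80]


def mean_deviation(values, about):
    """Arithmetic mean of the absolute deviations |X - A| from the average A."""
    return sum(abs(x - about) for x in values) / len(values)


for name, average in (("mean", statistics.mean(marks)),
                      ("median", statistics.median(marks))):
    md = mean_deviation(marks, average)
    # Coefficient of mean deviation = M.D. / A
    print(f"about the {name} ({average}): M.D. = {md:.4f}, coefficient = {md / average:.4f}")
```

For these marks the mean (583/11) and the median both equal 53, so both lines give M.D. = 190/11 ≈ 17.27 and a coefficient of about 0.326.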
Question 5: Compute the mean deviation (M.D.)
and its coefficient from the following data:

Classes Frequency
0-20 5
20-40 50
40-60 84
60-80 32
80-100 10
100-120 6
Calculation of Mean
Classes   Mid-value (X)   Frequency (f)   fX
0-20      10              5               50
20-40     30              50              1500
40-60     50              84              4200
60-80     70              32              2240
80-100    90              10              900
100-120   110             6               660
Total                     187             9550

Mean = ΣfX / N = 9550 / 187 = 51.07
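The mean found above can be carried through to the grouped mean deviation of Question 5. This Python sketch assumes the M.D. is taken about the mean (the step the slide sets up); the variable names are illustrative.

```python
# Question 5: class mid-values X and frequencies f from the table above.
mid_values = [10, 30, 50, 70, 90, 110]
frequencies = [5, 50, 84, 32, 10, 6]

N = sum(frequencies)                                             # total frequency
mean = sum(f * x for f, x in zip(frequencies, mid_values)) / N  # sum(fX) / N

# Grouped mean deviation about the mean: sum of f|X - mean|, divided by N.
md = sum(f * abs(x - mean) for f, x in zip(frequencies, mid_values)) / N
coefficient = md / mean  # coefficient of mean deviation

print(f"Mean = {mean:.2f}, M.D. = {md:.2f}, coefficient of M.D. = {coefficient:.4f}")
```

With these frequencies it gives M.D. ≈ 14.42 and a coefficient of about 0.28.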
Standard Deviation
• Denoted by the Greek letter σ.
• Introduced by Karl Pearson as a measure of dispersion in 1893.
– It is defined as the positive square root of the mean of the squares of the deviations of the given observations from their arithmetic mean. If X1, X2, ..., Xn is a set of n observations, then its standard deviation is given by:

σ = √[ Σ(Xi − X̄)² / n ], where X̄ = ΣXi / n is the arithmetic mean.
Standard deviation for grouped data

• In the case of grouped data, the standard deviation is given by:

σ = √[ Σfi(Xi − X̄)² / N ]
Standard deviation
Where:
• Xi is the value of the variable, or the mid-value of the class in the case of a grouped frequency distribution;
• fi is the corresponding frequency of the value Xi;
• N = Σfi is the total frequency;
• X̄ = ΣfiXi/N is the arithmetic mean of the distribution.
• The square of the standard deviation, viz. σ², is called the variance.
Variance
• Other formulae for calculating the variance are:
σ² = ΣXi²/n − (ΣXi/n)²   (ungrouped data)
σ² = ΣfiXi²/N − (ΣfiXi/N)²   (grouped data)
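To see that the definition and the shortcut formula agree, here is a minimal Python sketch using the runs of Batsman A from Question 10. Both functions divide by n, as in these notes; the function names are illustrative.

```python
import math

# Runs scored by Batsman A in Question 10.
runs = [27, 16, 39, 45, 101, 80, 40, 52]


def variance_direct(values):
    """sigma^2 = sum((X - mean)^2) / n, dividing by n as in these notes."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_shortcut(values):
    """sigma^2 = sum(X^2) / n - (sum(X) / n)^2."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2


v1, v2 = variance_direct(runs), variance_shortcut(runs)
print(v1, v2, round(math.sqrt(v1), 4))  # 679.5 679.5 26.0672
```

The shortcut avoids computing each deviation separately, which is why it is convenient by hand; with non-integer means the two results can differ in the last floating-point digits.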
Merits of Standard Deviation
• It is rigidly defined.

• It is based on all observations and is generally regarded as the best measure of dispersion.

• The squaring of the deviations from the mean removes the drawback of ignoring the signs of deviations in computing the mean deviation. This makes it suitable for further mathematical treatment; for example, the variance of a combined series can also be computed.

• It is least affected by fluctuations of sampling and, therefore, it is widely used in sampling theory and tests of significance.
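Deviation Method
If d = X − A denotes the deviation of each value (or mid-value) from an assumed mean A, then
σ = √[ Σfd²/N − (Σfd/N)² ]

Step Deviation Method
If these deviations are divided by the common class width h, i.e. d' = (X − A)/h, then
σ = h × √[ Σfd'²/N − (Σfd'/N)² ]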
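A minimal Python sketch checking that the step deviation formula gives the same σ as the direct grouped formula, using the Question 5 mid-values (class width h = 20). The assumed mean A = 50 is chosen here for illustration; any value works.

```python
import math

# Question 5 data: class mid-values X (class width h = 20) and frequencies f.
mid_values = [10, 30, 50, 70, 90, 110]
frequencies = [5, 50, 84, 32, 10, 6]


def sd_direct(xs, fs):
    """sigma = sqrt(sum f (X - mean)^2 / N)."""
    n = sum(fs)
    mean = sum(f * x for f, x in zip(fs, xs)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(fs, xs)) / n)


def sd_step_deviation(xs, fs, assumed_mean, h):
    """sigma = h * sqrt(sum f d'^2 / N - (sum f d' / N)^2), with d' = (X - A) / h."""
    n = sum(fs)
    d = [(x - assumed_mean) / h for x in xs]
    sum_fd = sum(f * di for f, di in zip(fs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(fs, d))
    return h * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


print(round(sd_direct(mid_values, frequencies), 4),
      round(sd_step_deviation(mid_values, frequencies, assumed_mean=50, h=20), 4))
```

Both functions return about 20.45. The assumed mean and the class width only make the hand arithmetic smaller (here d' runs from −2 to 3); they do not change σ.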


Demerits of Standard Deviation

• Difficult to understand and calculate.


Coefficient of Variation
• Standard deviation is an absolute measure of dispersion. The relative measure of dispersion based on the standard deviation is called the coefficient of standard deviation and is given by

Coefficient of S.D. = σ / X̄, where X̄ is the arithmetic mean.

This is a pure number, independent of the units of measurement, and thus is suitable for comparing the variability, homogeneity or uniformity of two or more distributions.
Coefficient of Variation
• Suggested by Prof. Karl Pearson. It is 100 times the coefficient of standard deviation:

C.V. = (σ / X̄) × 100, where X̄ is the arithmetic mean.

• For comparing the variability of two distributions, we compute the coefficient of variation for each distribution.

• A distribution with a relatively smaller C.V. is said to be more homogeneous, more uniform, less variable or more consistent than the other, and the series with a relatively greater C.V. is said to be more heterogeneous, more variable or less consistent than the other.
Question 9: A sample of 5 items was taken from the output of a factory. The length and weight of the 5 items are given below:
Length X (inches)   Weight Y (ounces)
5                   13
6                   15
7                   18
9                   19
12                  20

State which of the two characteristics of the items is more variable.
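Because length and weight are in different units, the comparison needs the coefficient of variation. A minimal Python sketch; statistics.pstdev divides by n, matching the definition of σ in these notes, and coefficient_of_variation is an illustrative name.

```python
import statistics

# Question 9: lengths X (inches) and weights Y (ounces) of the 5 sampled items.
length = [5, 6, 7, 9, 12]
weight = [13, 15, 18, 19, 20]


def coefficient_of_variation(values):
    """C.V. = sigma / mean * 100, with sigma the population standard deviation."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


for name, values in (("Length", length), ("Weight", weight)):
    print(f"{name}: C.V. = {coefficient_of_variation(values):.2f}%")
```

This gives about 31.82% for length and 15.34% for weight, so on this measure length is the more variable characteristic. Some texts use the n − 1 divisor (statistics.stdev) for samples, which changes the figures slightly.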
Question 10: Below are given the number of runs scored
by two batsmen in eight matches.
Batsman A Batsman B
27 0
16 100
39 80
45 5
101 60
80 40
40 10
52 121
Who is the better run scorer? Also find which of the two
batsmen is more consistent in scoring.
Question 11: In the following table, the distribution of students is shown according to their weights in kg. Find the coefficient of variation for each series. Which series shows greater variation in weights?

Weight (kg)   Class A   Class B
20-30 7 5
30-40 10 9
40-50 20 21
50-60 18 15
60-70 7 6
Question 12: In factories A and B, the average weekly wages and standard deviations of wages are as follows:

Factory   Average weekly wages (Rs.)   S.D. of wages (Rs.)   No. of workers
A         460                          50                    100
B         490                          40                    80
Answer the following:
1. Which factory pays a larger amount as weekly wages?
2. Which factory shows greater variability in the
distribution of wages?
3. What is the mean of wages of all the workers in two
factories taken together?
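Question 12 needs only the summary figures. A minimal Python sketch: the total wage bill is the mean wage times the number of workers, the C.V. is σ / mean × 100, and the combined mean uses the standard formula (N1X̄1 + N2X̄2) / (N1 + N2), which is assumed here since this section does not state it.

```python
# Question 12 summary figures: (average weekly wage, S.D., number of workers).
factories = {"A": (460, 50, 100), "B": (490, 40, 80)}

for name, (mean, sd, n) in factories.items():
    total_wages = mean * n   # total weekly wage bill
    cv = sd / mean * 100     # coefficient of variation, in %
    print(f"Factory {name}: total wages = Rs. {total_wages}, C.V. = {cv:.2f}%")

# Combined mean = (N1*X1 + N2*X2) / (N1 + N2)
total_workers = sum(n for _, _, n in factories.values())
combined_mean = sum(mean * n for mean, _, n in factories.values()) / total_workers
print(f"Combined mean wage = Rs. {combined_mean:.2f}")
```

Factory A has the larger total wage bill (Rs. 46000 against Rs. 39200) while B has the higher average wage; A's C.V. (about 10.87%) exceeds B's (about 8.16%), and the combined mean is about Rs. 473.33.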
2016-17
Question 4: The following table gives the frequency
distribution of expenditure on education per family
among middle class families in two cities.

Expenditure         No. of Families
(thousand Rs.)      City ‘A’   City ‘B’

3-6 28 39
6-9 292 284
9-12 389 401
12-15 212 202
15-18 59 48
18-21 18 21
21-24 2 5
a) Find the standard deviation of the expenditure in both cities.
b) Find out which city shows greater variability.
SKEWNESS
• The measures of central tendency tell us about the concentration of the observations about the middle of the distribution, and the measures of dispersion give us an idea about the spread or scatter of the observations about some measure of central tendency.

• These measures, however, do not adequately describe a frequency distribution, in the sense that there could be two or more distributions with the same mean and standard deviation which still differ from each other in the shape or pattern of the distribution. Thus the measures of central tendency and dispersion are inadequate to characterize a distribution completely and must be supplemented by two more measures, viz. skewness and kurtosis.
SKEWNESS
• Skewness is lack of symmetry. It measures the degree of departure of a distribution from symmetry and reveals the direction of scatteredness of the items.

• If a frequency distribution is not symmetrical, it is said to be asymmetrical or skewed. Any deviation from symmetry is called skewness.
SKEWNESS
• "Asymmetry or lack of symmetry in the shape of a frequency distribution."
– Morris Hamburg

• "When a series is not symmetrical it is said to be asymmetrical or skewed."
– Croxton & Cowden

• In a symmetrical distribution the mean, median and mode are identical. The further the mean moves away from the mode, the larger the asymmetry or skewness.
Symmetrical curve
• The figure presents the shape of a symmetrical, bell-shaped curve having no skewness. The values of the mean (M), median (Md) and mode (Mo) for such a curve are identical.
SKEWNESS
• In a symmetrical distribution the values of the mean, median and mode coincide. The spread of the frequencies is the same on both sides of the centre point of the curve.

• For a symmetrical distribution, Mean = Median = Mode.
Positively skewed curve
• A positively skewed curve has a longer tail towards the higher values of X, i.e. the frequency curve gradually slopes down towards the higher values of X. In a positively skewed distribution the mean is greater than both the median and the mode, and the median lies between the mean and the mode. The frequencies are spread over a greater range of values at the high-value end of the curve (the right-hand side), as is clear from the figure. For a positively skewed distribution, Mean > Median > Mode.
Negatively skewed curve
• A negatively skewed curve has a longer tail towards the lower values of X, i.e. the frequency curve gradually slopes down towards the lower values of X, as shown below.

In a negatively skewed distribution the mode is the largest and the mean is the smallest of the three averages. The median lies between the mean and the mode. The elongated tail of a negatively skewed distribution is on the left-hand side, as would be clear from the above figure. For a negatively skewed distribution, Mean < Median < Mode.
Absolute SKEWNESS
• The absolute measures of skewness tell us the
extent of asymmetry and whether it is positive or
negative. The absolute skewness is based on the
difference between mean and mode.

Absolute skewness = Mean − Mode

• Skewness is positive if mean > mode.
• Skewness is negative if mean < mode.
Absolute SKEWNESS
• The difference between the mean and the mode, depending on whether it is positive or negative, indicates whether the distribution is positively or negatively skewed. However, such an absolute measure of skewness is not adequate, because it cannot be used to compare the skewness of two distributions expressed in different units, since the difference between the mean and the mode is in the units of the distribution. Thus, for comparison purposes, we use relative measures of skewness known as coefficients of skewness.
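Karl Pearson's coefficient of skewness

Sk(P) = (Mean − Mode) / σ

When the mode is ill-defined, it is commonly replaced using the empirical relation Mode ≈ 3 Median − 2 Mean, which gives Sk(P) = 3(Mean − Median) / σ.

Skewness will be positive if mean > mode and negative if mean < mode.

Its value generally lies between −1 and +1.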
Bowley's Coefficient of Skewness

• Prof. A.L. Bowley's coefficient of skewness is based on quartiles and is given by:

Sk(B) = (Q3 + Q1 − 2 Median) / (Q3 − Q1)

The value of this coefficient always lies between −1 and +1, because the median lies between Q1 and Q3.
Kurtosis
• The word kurtosis comes from the Greek for "bulginess". Kurtosis is used to describe the peakedness of a curve: it refers to the degree of flatness or peakedness in the region about the mean of a frequency curve. The degree of kurtosis of a distribution is measured relative to the peakedness of the normal curve.
Kurtosis
• β2 = μ4 / μ2² and γ2 = β2 − 3, where μ2 and μ4 are the second and fourth moments about the mean.

A curve which is neither flat nor peaked is known as the normal curve, and the shape of its hump is accepted as the standard.
Curves with humps of the form of the normal curve are said to have normal kurtosis and are termed Mesokurtic (β2 = 3 and γ2 = 0).

Curves of type A, which are more peaked than the normal curve, are known as Leptokurtic (β2 > 3 and γ2 > 0) and are said to possess kurtosis in excess, or to have positive kurtosis. Curves of type C, which are flatter than the normal curve, are called Platykurtic (β2 < 3 and γ2 < 0) and are said to lack kurtosis, or to have negative kurtosis.
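A minimal Python sketch of β2 and γ2 for ungrouped data, using moments about the mean with divisor n. The helper names and the five-value input are illustrative only, not taken from the notes.

```python
def moment_about_mean(values, r):
    """r-th central moment: mu_r = sum((X - mean)^r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def kurtosis_coefficients(values):
    """Return (beta2, gamma2) with beta2 = mu4 / mu2^2 and gamma2 = beta2 - 3."""
    mu2 = moment_about_mean(values, 2)
    mu4 = moment_about_mean(values, 4)
    beta2 = mu4 / mu2 ** 2
    return beta2, beta2 - 3


def classify(beta2, tol=1e-9):
    """Compare beta2 with the normal-curve value 3."""
    if abs(beta2 - 3) <= tol:
        return "Mesokurtic"
    return "Leptokurtic" if beta2 > 3 else "Platykurtic"


beta2, gamma2 = kurtosis_coefficients([1, 2, 3, 4, 5])
print(beta2, gamma2, classify(beta2))
```

For the evenly spread values 1 to 5, μ2 = 2 and μ4 = 6.8, so β2 = 1.7 < 3 and the data are classed as platykurtic, as expected for a flat, uniform-looking distribution.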
Question 1: Calculate Karl Pearson’s coefficient of
skewness for the following data.

25, 15, 23, 40, 27, 25, 23, 25, 20
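A minimal Python sketch of Karl Pearson's coefficient for this data. statistics.mode returns the most frequent value (25 occurs three times), and statistics.pstdev divides by n, matching the definition of σ in these notes.

```python
import statistics

data = [25, 15, 23, 40, 27, 25, 23, 25, 20]

mean = statistics.mean(data)
mode = statistics.mode(data)      # most frequent value
sigma = statistics.pstdev(data)   # population S.D., as defined in these notes

sk_p = (mean - mode) / sigma      # Karl Pearson: (Mean - Mode) / sigma
print(f"Mean = {mean:.4f}, Mode = {mode}, S.D. = {sigma:.4f}, Sk = {sk_p:.4f}")
```

Here the mean (223/9 ≈ 24.78) sits just below the mode of 25, so the coefficient is small and negative (about −0.035).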


Question 2: Find the coefficient of skewness from
the data given below:
Size Frequency
3 7
4 10
5 14
6 35
7 102
8 136
9 43
10 8
Question 3: Find Karl Pearson’s coefficient of
skewness for the given distribution:
x f
0-5 2
5-10 5
10-15 7
15-20 13
20-25 21
25-30 16
30-35 8
35-40 3
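For grouped data the mode must be interpolated. This Python sketch assumes the usual grouping formula Mode = L + (f1 − f0) / (2f1 − f0 − f2) × h, where L is the lower limit of the modal class, f1 its frequency, and f0, f2 the frequencies of the classes before and after it. That formula is not stated in this section, and the variable names are illustrative.

```python
import math

# Question 3: class intervals of width 5 and their frequencies.
lower_limits = [0, 5, 10, 15, 20, 25, 30, 35]
frequencies = [2, 5, 7, 13, 21, 16, 8, 3]
h = 5

N = sum(frequencies)
mids = [lower + h / 2 for lower in lower_limits]
mean = sum(f * x for f, x in zip(frequencies, mids)) / N
sigma = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(frequencies, mids)) / N)

# Mode by interpolation: L + (f1 - f0) / (2*f1 - f0 - f2) * h
k = frequencies.index(max(frequencies))        # index of the modal class
f1 = frequencies[k]
f0 = frequencies[k - 1] if k > 0 else 0
f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0
mode = lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * h

sk_p = (mean - mode) / sigma                   # Karl Pearson's coefficient
print(f"Mean = {mean:.2f}, Mode = {mode:.2f}, S.D. = {sigma:.2f}, Sk = {sk_p:.4f}")
```

With the modal class 20-25, the mean is 21.9, the mode about 23.08 and σ about 8.00, giving a coefficient of roughly −0.147 (slight negative skewness).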
2017-18
Question 2: Compute an appropriate measure of
skewness for the following data:

Sales (Rs. lakhs) No. of companies
Below 50 12
50-60 30
60-70 65
70-80 78
80-90 80
90-100 55
100-110 45
110-120 25
Above 120 10
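Because the first and last classes are open-ended, the mean and standard deviation cannot be computed without assuming class limits, so a quartile-based measure such as Bowley's coefficient is a natural choice here. This Python sketch interpolates each quartile with L + (pN − c.f.) / f × h, where c.f. is the cumulative frequency before the class containing the quartile; grouped_quantile is an illustrative name.

```python
# Question 2 (2017-18): sales classes (Rs. lakhs); first and last are open-ended.
classes = [(None, 50), (50, 60), (60, 70), (70, 80), (80, 90),
           (90, 100), (100, 110), (110, 120), (120, None)]
frequencies = [12, 30, 65, 78, 80, 55, 45, 25, 10]


def grouped_quantile(p):
    """Interpolated quantile: L + (p*N - c.f.) / f * h, where c.f. is the
    cumulative frequency before the class containing the (p*N)-th item."""
    n = sum(frequencies)
    target = p * n
    cumulative = 0
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f >= target:
            if lower is None or upper is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("p must lie in (0, 1]")


q1, median, q3 = (grouped_quantile(p) for p in (0.25, 0.5, 0.75))
sk_b = (q3 + q1 - 2 * median) / (q3 - q1)   # Bowley's coefficient
print(f"Q1 = {q1:.2f}, Median = {median:.2f}, Q3 = {q3:.2f}, Sk(B) = {sk_b:.4f}")
```

All three quartiles fall in closed classes (Q1 ≈ 68.92, Median = 81.875, Q3 ≈ 96.36), giving Sk(B) ≈ 0.056, a slight positive skewness.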
