Lecture 3 Sem 1 Edited

The document discusses various measures of central tendency including the mean, median, and mode. It provides examples of calculating the mean, weighted mean, and geometric mean. It also discusses how the median and mode are calculated and how they are not affected by outliers unlike the mean. The document then discusses using measures of central tendency and dispersion to describe the shape and spread of data distributions.

Uploaded by

tomshave28

LECTURE 3

Numerical Descriptive Techniques
Measures of Central Tendency

• An attribute of a distribution concerning where the values of the distribution tend to congregate.
• Central tendency is the extent to which all the data values group around a typical or central value.

• Arithmetic mean for ungrouped data:

Population mean (min populasi): $\mu = \frac{\sum x_i}{N}$

Sample mean (min sampel): $\bar{x} = \frac{\sum x_i}{n}$

where $x_i$ = observation $i$, $N$ = total observations in the population, and $n$ = total observations in the sample.
Measures of Central Tendency:
The Mean

• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)

Data 11, 12, 13, 14, 15: $\frac{11 + 12 + 13 + 14 + 15}{5} = \frac{65}{5} = 13$, so Mean = 13

Data 11, 12, 13, 14, 20: $\frac{11 + 12 + 13 + 14 + 20}{5} = \frac{70}{5} = 14$, so Mean = 14
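The effect of an outlier on the mean can be checked with a short Python sketch. It uses the two data sets from the slide (11, 12, 13, 14, 15 and 11, 12, 13, 14, 20); the variable names are illustrative only.

```python
from statistics import mean

# Two data sets that differ only in their largest value.
without_outlier = [11, 12, 13, 14, 15]
with_outlier = [11, 12, 13, 14, 20]

mean_a = mean(without_outlier)  # 65 / 5
mean_b = mean(with_outlier)     # 70 / 5

print(mean_a, mean_b)  # 13 14
```

Changing a single value from 15 to 20 moves the mean by a full unit, which is what "affected by extreme values" means in practice.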
Weighted Mean

Instead of each data point contributing equally to the final mean, the weighted mean is calculated by giving the values in a data set different weights according to some attribute of the data. These weights determine the relative importance of each value in the average.

The weighted mean of a set of numbers $x_1, x_2, \ldots, x_n$ with corresponding weights $w_1, w_2, \ldots, w_n$ is computed from the following formula:

$\mu_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$ or $\mu_w = \frac{\sum w_i x_i}{\sum w_i}$
Example 1

The shipments of peanuts (in thousands of bags) from an exporter to five cities and the profits from selling them ($ per thousand bags) are shown in the following table.

City                            A       B       C       D       E
Quantity (thousand bags)        64      15      285     228     45
Profit ($ per thousand bags)    15.00   13.50   15.50   12.00   14.00

Compute the mean profit per thousand bags for the shipment.

$\mu_w = \frac{64(15) + 15(13.5) + \cdots + 45(14)}{64 + 15 + 285 + 228 + 45} = \frac{8946}{637} \approx \$14.04$ per thousand bags
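The calculation in Example 1 can be reproduced with a small Python sketch. The function name `weighted_mean` is a hypothetical helper written for this illustration, not part of any library.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example 1: quantities are the weights, profits are the values.
quantity = [64, 15, 285, 228, 45]              # thousands of bags
profit = [15.00, 13.50, 15.50, 12.00, 14.00]   # $ per thousand bags

mu_w = weighted_mean(profit, quantity)
simple_mean = sum(profit) / len(profit)

print(round(mu_w, 2))  # 14.04
print(simple_mean)     # 14.0
```

The unweighted mean of the five profit figures is 14.00; the weighted mean differs because cities C and D account for most of the bags shipped.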
Geometric Mean

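The geometric mean of a set of $n$ positive numbers $x_1, x_2, \ldots, x_n$ is the $n$-th root of their product: $\sqrt[n]{x_1 x_2 \cdots x_n}$.

The geometric mean is used to measure the average growth rate or average rate of change in a variable over time.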
Geometric Mean of Growth Rate

Let $R_i$ denote the growth rate / rate of change (expressed in decimal form) in period $i$, for $i = 1, 2, \ldots, n$.

The geometric mean, $R_G$, of the growth rate / rate of change is defined such that

$(1 + R_G)^n = (1 + R_1)(1 + R_2) \cdots (1 + R_n)$

Solving for $R_G$, we obtain the following formula:

$R_G = \sqrt[n]{(1 + R_1)(1 + R_2) \cdots (1 + R_n)} - 1$

OR $R_G = [(1 + R_1)(1 + R_2) \cdots (1 + R_n)]^{1/n} - 1$

The average growth rate / average rate of change is expressed as a percentage.
Example 2

Suppose you made a 6-year investment of RM5000, and the annual rates of return (in percent) over the past six years were -22.1, 28.7, 4.9, 0.5, -1.3 and 15.8 respectively. What is the average annual rate of return over this period?

$R_G = \sqrt[6]{(1 - 0.221)(1 + 0.287)(1 + 0.049)(1 + 0.005)(1 - 0.013)(1 + 0.158)} - 1 = 0.032 = 3.2\%$

The value at the end of the investment period is $5000(1 + 0.032)^6 \approx$ RM6040.
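Example 2 can be checked with the following Python sketch, which applies the formula $R_G = [(1 + R_1) \cdots (1 + R_n)]^{1/n} - 1$ directly. The helper name `geometric_mean_rate` is made up for this illustration, and `math.prod` assumes Python 3.8 or later.

```python
import math


def geometric_mean_rate(rates):
    """Geometric mean of growth rates given in decimal form."""
    growth = math.prod(1 + r for r in rates)   # total growth factor
    return growth ** (1 / len(rates)) - 1


# Annual returns from Example 2, converted from percent to decimals.
returns = [-0.221, 0.287, 0.049, 0.005, -0.013, 0.158]

r_g = geometric_mean_rate(returns)
final_value = 5000 * math.prod(1 + r for r in returns)

print(round(r_g, 3))       # 0.032
print(round(final_value))  # about 6040
```

Compounding RM5000 at the geometric mean rate for six years gives the same final value as applying the six actual returns one after another, which is why this rate is the natural "average" for growth.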


Exercise

• An investment of $1,000 you made 4 years ago was worth $1,200 after the first year, $1,200 after the second year, $1,500 after the third year, and $2,000 today.
a. Compute the annual rates of return.
b. Compute the mean of the rates of return.
c. Compute the geometric mean.
d. Discuss whether the mean or the geometric mean is the best measure of the performance of the investment.
Answer

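(a) Annual rates of return: $R_1 = 0.20$, $R_2 = 0$, $R_3 = 0.25$, $R_4 = 0.33$

(b) $\bar{x} = \frac{\sum x_i}{n} = \frac{0.20 + 0 + 0.25 + 0.33}{4} = \frac{0.78}{4} = 0.195 = 19.5\%$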
Answer
(c) $R_G = \sqrt[4]{(1 + R_1)(1 + R_2)(1 + R_3)(1 + R_4)} - 1 = \sqrt[4]{(1 + 0.20)(1 + 0)(1 + 0.25)(1 + 0.33)} - 1 = 0.188 = 18.8\%$

(d) The geometric mean is better because compounding the initial investment at this rate for 4 periods reproduces the final value: $1000(1 + 0.188)^4 \approx 2000$.
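A rough Python sketch of the exercise follows. It derives the annual rates from the investment values and then compares what each average implies for the final value. Because it keeps the fourth rate as 1/3 rather than the rounded 0.33, its results differ slightly from the slide's 19.5% and 18.8%.

```python
import math

# Value of the investment at the start and after each of the 4 years.
values = [1000, 1200, 1200, 1500, 2000]

# (a) annual rates of return: (new - old) / old
rates = [(new - old) / old for old, new in zip(values, values[1:])]

# (b) arithmetic mean of the rates
arithmetic = sum(rates) / len(rates)

# (c) geometric mean of the rates
geometric = math.prod(1 + r for r in rates) ** (1 / len(rates)) - 1

# (d) final value implied by compounding each average for 4 years
implied_by_arithmetic = values[0] * (1 + arithmetic) ** len(rates)
implied_by_geometric = values[0] * (1 + geometric) ** len(rates)

print([round(r, 2) for r in rates])  # [0.2, 0.0, 0.25, 0.33]
print(round(arithmetic, 3), round(geometric, 3))
print(round(implied_by_arithmetic), round(implied_by_geometric))
```

Only the geometric mean compounds back to the actual final value of $2,000; the arithmetic mean overstates it.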
Measures of Central Tendency:
The Median
The median is the midpoint of the values after they have been ordered from the smallest to the largest.

• In an ordered array, the median is the “middle” number (50% above, 50% below)
• Not affected by extreme values (outliers)

[Figure: two dot plots on an axis from 11 to 20, each with Median = 13]
Measures of Central Tendency:
Locating the Median

• The location of the median when the values are in numerical order (smallest to largest):

Median position = $\frac{n + 1}{2}$ position in the ordered data

• If the number of values is odd, the median is the middle number
• If the number of values is even, the median is the average of the two middle numbers

Note that $\frac{n + 1}{2}$ is not the value of the median, only the position of the median in the ranked data.
Median

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22} N = 9 (odd)

Sort them in ascending order and find the middle value (position (9 + 1)/2 = 5):
0 0 5 7 8 9 12 14 22
median = 8

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N = 10 (even)

Sort them in ascending order; the median position is (10 + 1)/2 = 5.5, so the median is the simple average of the 5th and 6th values, 8 and 9:
0 0 5 7 8 9 12 14 22 33
median = (8 + 9) ÷ 2 = 8.5
Sample and population medians are computed the same way.
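The odd and even cases can be written as a short Python sketch. The function below is a hand-written illustration of the (n + 1)/2 position rule; Python's built-in `statistics.median` gives the same results.

```python
def median(data):
    """Median via the (n + 1) / 2 position rule on the sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: average the two values on either side of position (n + 1) / 2.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [0, 7, 12, 5, 14, 8, 0, 9, 22]
even_data = [0, 7, 12, 5, 14, 8, 0, 9, 22, 33]

print(median(odd_data))   # 8
print(median(even_data))  # 8.5
```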
NHMD, EPPD2023, Sem 2, Sesi 2020-2021
Measures of Central Tendency:
The Mode

• The mode is the value of the observation that appears most frequently
• Not affected by extreme values
• Used for either numerical or categorical (nominal) data
• There may be no mode
• There may be several modes

[Figure: two dot plots; one on an axis from 0 to 14 with Mode = 9, one on an axis from 0 to 6 with no mode]
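The three situations (one mode, several modes, no mode) can be illustrated with a small Python sketch. The data sets are invented for the illustration, and the convention that data with no repeated value has "no mode" is an assumption of this helper, not a library rule.

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: treat as "no mode"
    return sorted(value for value, count in counts.items() if count == highest)


one_mode = modes([1, 3, 5, 9, 9, 9, 12, 14])
no_mode = modes([0, 1, 2, 3, 4, 5, 6])
two_modes = modes(["red", "blue", "red", "blue", "green"])  # categorical data

print(one_mode)   # [9]
print(no_mode)    # []
print(two_modes)  # ['blue', 'red']
```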
Distribution Shape & Measures
of Central Tendency

• Symmetric: Mean = Median = Mode
• Left skewed: Mean < Median < Mode
• Right skewed: Mode < Median < Mean
Income level for 200 people

73 77 62 35 33 63 68 31 20 93
53 73 94 75 72 66 64 55 60 73
66 68 64 83 62 38 24 25 58 58
82 72 83 26 82 54 68 49 73 27
54 57 24 30 70 75 96 54 52 40
53 30 28 28 32 23 85 69 35 49
78 28 69 61 33 19 64 41 54 54
33 71 62 52 44 65 60 67 50 30
35 30 25 36 30 44 39 28 60 80
40 59 63 37 76 37 32 90 51 62
65 74 81 38 53 25 51 52 56 53
74 28 65 24 60 30 53 49 45 50
55 55 91 22 31 38 16 71 60 36
82 52 26 18 63 27 27 57 46 90
74 75 35 70 69 33 60 82 56 82
76 61 52 71 69 49 61 60 31 60
57 36 83 36 79 42 65 38 72 51
63 37 25 54 65 71 78 76 46 32
65 95 63 48 52 66 45 16 67 22
33 54 99 31 76 42 74 65 27 17

We can use Excel to get descriptive statistics for this data.
Refer to Age Data in Topic 2

a) Mean: $\bar{x} = \frac{10753}{200} = 53.765$

b) The location of the median is $\frac{200 + 1}{2} = 100.5$

100th observation = 54; 101st observation = 55

Median = $\frac{54 + 55}{2} = 54.5$

c) Mode = 60 (frequency of occurrence = 8)
Excel Output
Using Excel:
Click Data, then Data Analysis, then Descriptive Statistics.

Column1

Mean                  53.765
Standard Error        1.397358914
Median                54.5
Mode                  60
Standard Deviation    19.76163928
Sample Variance       390.5223869
Kurtosis              -0.885087385
Skewness              0.002533504
Range                 83
Minimum               16
Maximum               99
Sum                   10753
Count                 200

Note: Excel only shows the smallest data value if there is more than one mode in the data.
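Most of this summary can also be reproduced outside Excel. The sketch below uses Python's standard `statistics` module on the 200 observations listed in the data table; it is one possible approach, not the method used in the lecture. Kurtosis and skewness are left out because the standard library has no functions for them.

```python
import math
import statistics

# The 200 observations from the data table, row by row.
data = [
    73, 77, 62, 35, 33, 63, 68, 31, 20, 93,
    53, 73, 94, 75, 72, 66, 64, 55, 60, 73,
    66, 68, 64, 83, 62, 38, 24, 25, 58, 58,
    82, 72, 83, 26, 82, 54, 68, 49, 73, 27,
    54, 57, 24, 30, 70, 75, 96, 54, 52, 40,
    53, 30, 28, 28, 32, 23, 85, 69, 35, 49,
    78, 28, 69, 61, 33, 19, 64, 41, 54, 54,
    33, 71, 62, 52, 44, 65, 60, 67, 50, 30,
    35, 30, 25, 36, 30, 44, 39, 28, 60, 80,
    40, 59, 63, 37, 76, 37, 32, 90, 51, 62,
    65, 74, 81, 38, 53, 25, 51, 52, 56, 53,
    74, 28, 65, 24, 60, 30, 53, 49, 45, 50,
    55, 55, 91, 22, 31, 38, 16, 71, 60, 36,
    82, 52, 26, 18, 63, 27, 27, 57, 46, 90,
    74, 75, 35, 70, 69, 33, 60, 82, 56, 82,
    76, 61, 52, 71, 69, 49, 61, 60, 31, 60,
    57, 36, 83, 36, 79, 42, 65, 38, 72, 51,
    63, 37, 25, 54, 65, 71, 78, 76, 46, 32,
    65, 95, 63, 48, 52, 66, 45, 16, 67, 22,
    33, 54, 99, 31, 76, 42, 74, 65, 27, 17,
]

n = len(data)
s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

summary = {
    "Mean": statistics.mean(data),
    "Standard Error": s / math.sqrt(n),   # s / sqrt(n)
    "Median": statistics.median(data),
    "Mode": statistics.mode(data),        # first mode found if there are ties
    "Standard Deviation": s,
    "Sample Variance": statistics.variance(data),
    "Range": max(data) - min(data),
    "Minimum": min(data),
    "Maximum": max(data),
    "Sum": sum(data),
    "Count": n,
}

for name, value in summary.items():
    print(f"{name:<20}{value}")
```

The printed figures should agree with the Excel output above, up to the number of decimal places shown.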
Measures of Dispersion

An attribute of a distribution concerning the spread of the values from the mean.

[Figure: dot plots of Distribution A and Distribution B on a common axis from 11 to 21; both have Mean = 15.5]
Range

• The range is the distance between the smallest and the largest data value in the set
• range = largest observation - smallest observation
• Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points
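Variance

• Variance (varians) is one of the most frequently used measures of dispersion.
• population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2}{N} - \mu^2$
• sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$
• $(x_i - \mu)^2$ and $(x_i - \bar{x})^2$ are known as the squared deviations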
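The difference between the population and sample variance (dividing the sum of squared deviations by N versus n - 1) can be seen in a short Python sketch. The data set is a small made-up example, and the function names are illustrative only.

```python
def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Shortcut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)


data = [11, 12, 13, 14, 20]   # mean 14, squared deviations 9, 4, 1, 0, 36

print(population_variance(data))       # 10.0
print(sample_variance(data))           # 12.5
print(sample_variance_shortcut(data))  # 12.5
```

The shortcut form avoids computing each deviation separately, which is convenient by hand, although the definitional form is usually numerically safer in code.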
Standard Deviation

• has the same units as the original data
• cannot be negative
• most widely reported measure of dispersion
• $\sigma = \sqrt{\sigma^2}$ or $s = \sqrt{s^2}$
Refer to Age Data in Example 3.1

Range = 99 - 16 = 83

Variance, $s^2$ = 390.522

Standard deviation, $s$ = 19.762
The Empirical Rule

• The standard deviation can be used to compare the dispersion of several distributions and make a statement about the general shape of a distribution.
• If the histogram is bell-shaped, the Empirical Rule states that the interval of:
  • $\mu \pm 1\sigma$ contains about 68% of all observations
  • $\mu \pm 2\sigma$ contains about 95% of all observations
  • $\mu \pm 3\sigma$ contains about 99.7% of all observations


Illustration of the Empirical Rule

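General Form: $\mu \pm k\sigma$

[Figure: bell-shaped curve on an axis from 55 to 85, with the intervals $\mu \pm 1\sigma$, $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$ marked]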
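The 68%, 95% and 99.7% figures can be checked against the normal distribution, which is the usual model for a bell-shaped histogram. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); treating "bell-shaped" as exactly normal is an assumption made for the illustration.

```python
from statistics import NormalDist


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    # Roughly 68%, 95% and 99.7% for k = 1, 2, 3.
    print(k, f"{share_within(k):.2%}")
```

Real data is never exactly normal, so the rule gives approximate proportions only.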
Chebysheff’s Theorem

• A more general interpretation of the standard deviation is derived from Chebysheff’s Theorem, which applies to all shapes of histograms.
• For any set of observations, the proportion of the values that lie within $k$ standard deviations of the mean, $\mu \pm k\sigma$, is at least

$\left(1 - \frac{1}{k^2}\right) \times 100\%$, for $k > 1$


Example 3

Suppose that the mean and standard deviation of last year’s midterm test marks are 70 and 5, respectively.

$\mu \pm 2\sigma = 70 \pm 2(5) = (60, 80)$

a) If the distribution is bell-shaped, about 95% of the marks fell between 60 and 80 (Empirical Rule).

b) If the distribution is skewed, at least 75% of the marks fell between 60 and 80 (Chebysheff’s Theorem with $k = 2$: $1 - \frac{1}{2^2} = 0.75$).
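The interval and the Chebysheff lower bound used in Example 3 can be computed with a short Python sketch. Both function names are hypothetical helpers introduced for this illustration.

```python
def interval(mean, sd, k):
    """Return the interval mean ± k standard deviations as (low, high)."""
    return (mean - k * sd, mean + k * sd)


def chebysheff_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


low, high = interval(70, 5, 2)
print(low, high)                      # 60 80
print(chebysheff_bound(2))            # 0.75
print(round(chebysheff_bound(3), 3))  # 0.889
```

For a bell-shaped distribution the Empirical Rule gives the sharper statement (about 95% for k = 2), while the Chebysheff bound of at least 75% holds for any shape.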
Coefficient of Variation, CV

• Coefficient of Variation is the ratio of the standard deviation to the mean, expressed as a percentage:

$CV = \frac{\sigma}{\mu} \times 100$ (population), $CV = \frac{s}{\bar{x}} \times 100$ (sample)

• provides a proportionate measure of variation, which measures the relative dispersion of the data
• useful for comparing distributions that have different units of measurement or that differ substantially in magnitude
Example 4

Compare the dispersion of the gold and slab zinc prices.

             Gold                Slab Zinc
Mean         $810.6 per ounce    $1.04 per pound
Std. dev.    $49.0 per ounce     $0.08 per pound
CV           6.04%               7.69%

Although the standard deviation of gold prices is more than 612 times the standard deviation of zinc prices, from the relative dispersion perspective, the dispersion of gold prices is actually less than that of slab zinc.
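The figures in Example 4 can be verified with a minimal Python sketch. The function name `coefficient_of_variation` is an illustrative helper, not a library function.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


cv_gold = coefficient_of_variation(49.0, 810.6)  # $ per ounce
cv_zinc = coefficient_of_variation(0.08, 1.04)   # $ per pound

print(round(cv_gold, 2))      # 6.04
print(round(cv_zinc, 2))      # 7.69
print(round(49.0 / 0.08, 1))  # 612.5
```

Because the units cancel in the ratio, the two CV values can be compared directly even though one price is quoted per ounce and the other per pound.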
