DESCRIPTIVE
STATISTICS
Dr. Marlissa Omar
Senior Lecturer
Centre for STEM Enculturation,
Faculty of Education,
UKM
Nominal example: country, gender, race, hair color of a group of people, mode of transportation, etc.
Ordinal example: Likert scale; socioeconomic status (“low income”, “middle income”, “high income”); education level (“high school”, “BS”, “MS”, “PhD”); satisfaction rating (“extremely dislike”, “dislike”, “neutral”, “like”, “extremely like”).
Interval example: temperature (Fahrenheit), temperature (Celsius), pH, IQ.
Ratio example: temperature (Kelvin), age, distance, height, weight.
3/1/20XX SAMPLE FOOTER TEXT 2
KEY STATISTICAL CONCEPTS
Population
- The group of all items of interest
- Frequently very large; sometimes infinite
- Parameter: a descriptive measure of a population
Sample
- A set of data drawn from the population
- Potentially large, but smaller than the population
- Statistic: a descriptive measure of a sample
Example of Parameter vs Statistics
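A minimal Python sketch of the distinction, assuming a made-up population of 1,000 scores. The values, the seed and the sample size of 30 are illustrative assumptions, not data from these slides. The population mean plays the role of the parameter; the mean of a random sample drawn from it is the statistic.

```python
import random
from statistics import mean

# Hypothetical population: 1,000 made-up scores (not from the slides).
rng = random.Random(42)
population = [rng.randint(60, 140) for _ in range(1000)]

# Parameter: descriptive measure of the whole population.
population_mean = mean(population)

# Statistic: descriptive measure of a sample drawn from that population.
sample = rng.sample(population, 30)
sample_mean = mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The two values are usually close but not equal; a different sample would give a different statistic, while the parameter stays fixed.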
DESCRIPTIVE STATISTICS
Which Group is Smarter?
Class A--IQs of 13 Students    Class B--IQs of 13 Students
102  115                       127  162
128  109                       131  103
131   89                        96  111
 98  106                        80  109
140  119                        93   87
 93   97                       120  105
110                            109
Each individual may be different. If you try to understand a group by remembering the
qualities of each member, you become overwhelmed and fail to understand the group.
DESCRIPTIVE STATISTICS
Which group is smarter now?
Class A--Average IQ: 110.54
Class B--Average IQ: 110.23
They’re roughly the same!
With a summary descriptive statistic, it is much easier to answer our question.
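As a quick check, the two averages can be reproduced in Python. This sketch assumes the Class A and Class B listings are read column by column (the first two numbers of each row belong to Class A, the last two to Class B).

```python
from statistics import mean

# IQ scores read from the two-column listing (13 students per class).
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

mean_a = mean(class_a)
mean_b = mean(class_b)

print(f"Class A average IQ: {mean_a:.2f}")  # Class A average IQ: 110.54
print(f"Class B average IQ: {mean_b:.2f}")  # Class B average IQ: 110.23
```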
DESCRIPTIVE STATISTICS
Types of descriptive statistics:
• Organize Data
– Tables
– Graphs
• Summarize Data
– Central Tendency
– Variation
DESCRIPTIVE STATISTICS
• Organize Data
– Tables
• Frequency Distribution
• Relative Frequency Distribution
– Graphs
• Bar Chart
• Histogram
• Stem and Leaf Plot
• Frequency Polygon
• Pie Chart
• Scatter Plot
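To illustrate the two kinds of table, here is a small Python sketch that builds a grouped frequency distribution and the matching relative frequency distribution for the Class A IQ scores. The class width of 10 is an assumption chosen for illustration, and the row of asterisks is only a rough text stand-in for a histogram.

```python
from collections import Counter

scores = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
BIN_WIDTH = 10  # assumed class width


def grouped_frequencies(values, width):
    """Return {bin_start: count} for bins [bin_start, bin_start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


freq = grouped_frequencies(scores, BIN_WIDTH)
# Relative frequency = count in the bin / total number of cases.
rel_freq = {start: count / len(scores) for start, count in freq.items()}

for start, count in freq.items():
    label = f"{start}-{start + BIN_WIDTH - 1}"
    print(f"{label:>8}  {count:2d}  {rel_freq[start]:.2f}  {'*' * count}")
```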
SPSS OUTPUT FOR FREQUENCY DISTRIBUTION
GROUPED RELATIVE FREQUENCY DISTRIBUTION
HISTOGRAM
BAR GRAPH
STEM AND LEAF PLOT
SPSS OUTPUT OF A FREQUENCY POLYGON
PIE CHART
SCATTER PLOT
DESCRIPTIVE STATISTICS
Summarizing Data:
– Central Tendency (or Groups’ “Middle Values”)
• Mean
• Median
• Mode
– Variation (or Summary of Differences Within Groups)
• Range
• Interquartile Range
• Variance
• Standard Deviation
MEASURES OF CENTRAL TENDENCY
• A measure of central tendency is a single value that attempts to describe
a set of data by identifying the central position within that set of data. As
such, measures of central tendency are sometimes called measures of
central location (Laerd, 2018).
• The mean, median and mode are all valid measures of central tendency,
but under different conditions, some measures of central tendency
become more appropriate to use than others (Laerd, 2018).
MEAN
Most commonly called the “average”.
Add up the values for each case and divide by the total number of cases.
Mean = Total score / Number of scores
$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$$
Semester 1 examination results
(Mathematics score)
Class 3A      | Class 3B      | Class 3C
54  60  70    | 94   55  38   | 62  60  62
62  55  59    | 58   59  70   | 65  55  61
86  58  74    | 75   50  28   | 57  60  50
74  72  64    | 98   88  44   | 50  60  67
65  75  62    | 29   41  69   | 67  54  65
82  65  78    | 97   68  42   | 60  66  61
84  78  68    | 95   90  99   | 63  67  62
76  68  75    | 56   60  92   | 67  59  60
80  73  60    | 87  100  40   | 66  58  55
66  75  57    | 46   33  56   | 60  67  50
67  80  69    | 43   57  80   | 62  63  57
72  67  60    | 70   35  72   | 61  68  69
56  70  59    | 48   30  90   | 55  58  63
Mean for class 3A = 68.59
Mean for class 3B = 63.64
Mean for class 3C = 60.82
Which class performed best, and which performed worst?
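The three class means can be recomputed with a short Python sketch. It assumes each row of the results table holds nine scores, with the first three belonging to Class 3A, the next three to Class 3B and the last three to Class 3C (39 students per class).

```python
from statistics import mean

# Each row is one line of the results table: 3A (3 scores), 3B (3), 3C (3).
rows = [
    [54, 60, 70, 94, 55, 38, 62, 60, 62],
    [62, 55, 59, 58, 59, 70, 65, 55, 61],
    [86, 58, 74, 75, 50, 28, 57, 60, 50],
    [74, 72, 64, 98, 88, 44, 50, 60, 67],
    [65, 75, 62, 29, 41, 69, 67, 54, 65],
    [82, 65, 78, 97, 68, 42, 60, 66, 61],
    [84, 78, 68, 95, 90, 99, 63, 67, 62],
    [76, 68, 75, 56, 60, 92, 67, 59, 60],
    [80, 73, 60, 87, 100, 40, 66, 58, 55],
    [66, 75, 57, 46, 33, 56, 60, 67, 50],
    [67, 80, 69, 43, 57, 80, 62, 63, 57],
    [72, 67, 60, 70, 35, 72, 61, 68, 69],
    [56, 70, 59, 48, 30, 90, 55, 58, 63],
]

columns = {"3A": slice(0, 3), "3B": slice(3, 6), "3C": slice(6, 9)}

class_means = {}
for name, cols in columns.items():
    scores = [score for row in rows for score in row[cols]]
    class_means[name] = round(mean(scores), 2)

print(class_means)  # {'3A': 68.59, '3B': 63.64, '3C': 60.82}
```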
1. Means can be badly affected by outliers (data points with extreme
values unlike the rest)
2. Outliers can make the mean a bad measure of central tendency or
common experience
MEDIAN
• The middle value when a variable’s values are ranked in order; the point that divides a distribution into two equal halves.
• When data are listed in order, the median is the point at which 50% of the cases are above and 50% below it.
• The 50th percentile.
(Figure: Class A--IQs of 13 Students)
MEDIAN
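• If the first student were to drop out of Class A, there would be a new median.
All 13 Class A scores, ranked: 89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140
With that student removed, 12 scores remain, so the median is the average of the two middle scores:
Median = (109 + 110) / 2 = 219 / 2 = 109.5
(six cases above, six below)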
12 17 21 24 26 35 35 37 40
With nine scores, the median is the middle (fifth) score: 26.
12 17 21 24 26 35 35 37 40 41
With ten scores, the median is the average of the two middle scores, 26 and 35: (26 + 35) / 2 = 30.5.
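The rule for odd and even numbers of scores can be sketched in Python as follows. The function name `median_of` is just an illustrative choice; the standard library's `statistics.median` does the same job.

```python
def median_of(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


nine_scores = [12, 17, 21, 24, 26, 35, 35, 37, 40]
ten_scores = nine_scores + [41]

print(median_of(nine_scores))  # 26
print(median_of(ten_scores))   # 30.5
```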
MEDIAN
The median is largely unaffected by outliers, which makes it a better measure of central tendency than the mean when data are skewed: it better describes the “typical person”.
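A small Python sketch of this point, using the Class B IQ scores from the earlier listing. The value 262 is a hypothetical extreme score introduced only for illustration; it is not part of the original data.

```python
from statistics import mean, median

class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

# Hypothetical: make the highest score (162) far more extreme (262).
with_outlier = [262 if score == 162 else score for score in class_b]

print(f"mean:   {mean(class_b):.2f} -> {mean(with_outlier):.2f}")  # mean:   110.23 -> 117.92
print(f"median: {median(class_b)} -> {median(with_outlier)}")      # median: 109 -> 109
```

The mean moves by almost eight points while the median does not move at all, because only the rank order of the scores matters for the median.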
MEDIAN
If the recorded values for a variable form a symmetric distribution, the
median and mean are identical.
In skewed data, the mean lies further toward the skew than the median.
MODE
The most common data point is called the mode.
Used to state the category that appears with greatest frequency.
Usually used to state the demographic characteristics of research subjects
which comprise many categories such as:
Ages
Income
Education
Etc.
MODE
1. It may give you the most likely experience rather than the “typical” or
“central” experience.
2. In symmetric, unimodal distributions, the mean, median, and mode are the same.
3. In skewed data, the mean and median lie further toward the skew than
the mode.
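Finding the mode is a matter of counting. The sketch below uses the nine scores from the median example and, for the categorical case, a hypothetical list of education levels invented for illustration.

```python
from collections import Counter
from statistics import multimode

# Numeric data: the nine scores used in the median example.
scores = [12, 17, 21, 24, 26, 35, 35, 37, 40]
print(multimode(scores))  # [35]

# Hypothetical categorical data (education level of seven respondents).
education = ["BS", "MS", "BS", "PhD", "high school", "BS", "MS"]
level, count = Counter(education).most_common(1)[0]
print(level, count)  # BS 3
```

`multimode` returns a list because a data set can have more than one mode.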
Choosing a Measure of Central
Tendency
– If you want to know which score occurred most often, then the mode is the choice.
– The mean is usually a better choice as the representative score because it takes all the data in the distribution into account. However, it weights every score equally, so a few extreme scores can pull it away from the typical value; when the data are skewed, the median is the safer choice.
SUMMARY OF WHEN TO USE THE MEAN,
MEDIAN AND MODE
Type of Variable             | Type of Data | Best Measure of Central Tendency
Nominal                      | Discrete     | Mode
Ordinal                      | Discrete     | Median
Interval/Ratio (Not skewed)  | Continuous   | Mean
Interval/Ratio (Skewed)      | Continuous   | Median

Discrete data is a count that can't be made more precise. Typically it involves integers. For instance, the number of children (or adults, or pets) in your family is discrete data, because you are counting whole, indivisible entities: you can't have 2.5 kids, or 1.3 pets.

Continuous data is data that can take any value. Height, weight, temperature and length are all examples of continuous data. Some continuous data will change over time; the weight of a baby in its first year or the temperature in a room throughout the day.
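The summary table amounts to a simple lookup, which can be sketched in Python. The function name `best_measure` and its arguments are hypothetical, chosen only for this illustration.

```python
def best_measure(level, skewed=False):
    """Recommended measure of central tendency for a level of measurement."""
    level = level.lower()
    if level == "nominal":
        return "mode"
    if level == "ordinal":
        return "median"
    if level in ("interval", "ratio"):
        # Skewed interval/ratio data: the median resists extreme values.
        return "median" if skewed else "mean"
    raise ValueError(f"unknown level of measurement: {level!r}")


print(best_measure("nominal"))             # mode
print(best_measure("ratio"))               # mean
print(best_measure("ratio", skewed=True))  # median
```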
RANGE
The difference between the highest value and the lowest value in a
distribution
Range = Maximum score – Minimum score
Class A--IQs of 13 Students    Class B--IQs of 13 Students
102  115                       127  162
128  109                       131  103
131   89                        96  111
 98  106                        80  109
140  119                        93   87
 93   97                       120  105
110                            109
Class A Range = 140 – 89 = 51
Class B Range = 162 – 80 = 82
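The same calculation as a minimal Python sketch; the helper name `value_range` is an illustrative choice.

```python
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]


def value_range(values):
    """Range = maximum score - minimum score."""
    return max(values) - min(values)


print(value_range(class_a))  # 51
print(value_range(class_b))  # 82
```

Although the two classes have almost the same mean, Class B is spread over a much wider range.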
INTERQUARTILE RANGE
( 1 2 5 6 7 ) 9 ( 12 15 18 19 27 )
Median = 9
Q1 = 5 (median of the lower half); Q3 = 18 (median of the upper half)
Interquartile Range = Q3 – Q1
= 18 – 5
= 13
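A Python sketch of this "median of each half" method follows. Note that this is one of several quartile conventions: library routines such as `statistics.quantiles` or NumPy's percentile functions may use different interpolation rules and give slightly different quartiles for the same data.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    For an odd number of values the overall median is excluded from both halves.
    """
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    upper = ranked[half + 1:] if n % 2 else ranked[half:]
    return median(lower), median(ranked), median(upper)


data = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 5 9 18 13
```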
DETECTING POTENTIAL OUTLIERS
An observation is a potential outlier (Malay: pencilan) if it falls more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile.
• Cutoff value for LOW OUTLIERS: Q1 – 1.5 × IQR (any value less than this number is considered a low outlier)
• Cutoff value for HIGH OUTLIERS: Q3 + 1.5 × IQR (any value greater than this number is considered a high outlier)
IQR = Q3 – Q1 = 41 – 26
= 15
Cutoff value for LOW OUTLIERS = 26 – (1.5 × 15)
= 3.5
Cutoff value for HIGH OUTLIERS = 41 + (1.5 × 15)
= 63.5
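The cutoff rule can be sketched in Python, taking Q1 = 26 and Q3 = 41 from the IQR calculation above. The function names and the short list of test values are hypothetical, used only to show the rule in action.

```python
def outlier_cutoffs(q1, q3, k=1.5):
    """Return (low cutoff, high cutoff) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def potential_outliers(values, q1, q3):
    """Values falling below the low cutoff or above the high cutoff."""
    low, high = outlier_cutoffs(q1, q3)
    return [v for v in values if v < low or v > high]


low, high = outlier_cutoffs(26, 41)
print(low, high)  # 3.5 63.5

# Hypothetical values, only to show which ones get flagged.
print(potential_outliers([2, 30, 45, 70], 26, 41))  # [2, 70]
```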
LET'S TRY!
The mid-semester examination scores of several students in class KP2 are as follows (scores are the percentage of items answered correctly):
35 76 76 81 85 87 88 93 95 99
a) Find the mean, median, and mode.
b) Identify the range, interquartile range, and any outliers.
c) What do these results tell you about the students' performance in the KP2 mid-semester examination?
d) Is the distribution of the data skewed? State the type of skewness.
VARIANCE
A measure of how spread out the values of a variable are. Used to identify the dispersion of scores in a distribution.
The larger the variance, the further the individual cases are from the
mean.
The smaller the variance, the closer the individual scores are to the mean.
VARIANCE
The deviation of 102 from 110.54 is? Deviation of 115?
Class A--IQs of 13 Students
102 115
128 109
131 89
98 106
140 119
93 97
110
102 – 110.54 = –8.54
115 – 110.54 = 4.46
VARIANCE
• We need a way to eliminate negative signs.
Squaring the deviations will eliminate negative signs.
A deviation squared: $(Y_i - \bar{Y})^2$
Back to the IQ example, a deviation squared
a) for 102: $(102 - 110.54)^2 = (-8.54)^2 = 72.93$
b) for 115: $(115 - 110.54)^2 = (4.46)^2 = 19.89$
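The two worked deviations can be checked with a few lines of Python. This sketch assumes the Class A mean rounded to two decimals (110.54), as used on the slide; the helper name is an illustrative choice.

```python
MEAN_IQ = 110.54  # Class A mean, rounded to two decimals as on the slide


def squared_deviation(score, mean=MEAN_IQ):
    """Return (deviation, squared deviation) of one score from the mean."""
    deviation = score - mean
    return deviation, deviation ** 2


for score in (102, 115):
    dev, sq = squared_deviation(score)
    print(f"{score}: deviation = {dev:.2f}, squared = {sq:.2f}")
# 102: deviation = -8.54, squared = 72.93
# 115: deviation = 4.46, squared = 19.89
```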
VARIANCE
If you were to add all the squared deviations together, you’d get what we call the “Sum of Squares.”
Sum of Squares: $SS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
$SS = (Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \dots + (Y_n - \bar{Y})^2$
Class A--IQs of 13 Students: 102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110
$\bar{Y} = 110.54$
Class A, sum of squares:
$SS = (102 - 110.54)^2 + (115 - 110.54)^2 + (128 - 110.54)^2 + (109 - 110.54)^2 + (131 - 110.54)^2 + (89 - 110.54)^2 + (98 - 110.54)^2 + (106 - 110.54)^2 + (140 - 110.54)^2 + (119 - 110.54)^2 + (93 - 110.54)^2 + (97 - 110.54)^2 + (110 - 110.54)^2 = 2891.23$
The last step…
The variance is approximately the average of the squared deviations.
$SS / N$ = variance for a population.
$SS / (n - 1)$ = variance for a sample.
$\text{Variance} = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}$
For Class A, Variance = 2891.23 / (n – 1)
= 2891.23 / 12
= 240.94
STANDARD DEVIATION
The main indicator used in research to describe the dispersion of scores in a distribution.
The square root of the variance; it indicates roughly how far, on average, the observations lie from the mean.
$\text{s.d.} = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n - 1}}$
STANDARD DEVIATION
What is the standard deviation for Class A?
$\sqrt{240.94} = 15.52$
Scores typically deviate from the mean IQ of 110.54 by about 15.52 IQ points.
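The whole chain (sum of squares, sample variance, standard deviation) can be sketched in Python. The sketch assumes the 13 Class A scores as first listed in the "Which Group is Smarter?" table (102, 115, 128, ...), whose mean is 110.54, and cross-checks the hand-written steps against the standard library's `variance` and `stdev`, which also divide by n - 1.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Class A scores as first listed (mean 110.54).
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

y_bar = mean(class_a)
ss = sum((y - y_bar) ** 2 for y in class_a)  # sum of squared deviations
sample_variance = ss / (len(class_a) - 1)     # divide by n - 1 for a sample
sd = sqrt(sample_variance)                    # back to the original units

print(f"SS = {ss:.2f}, variance = {sample_variance:.2f}, s.d. = {sd:.2f}")

# Cross-check against the standard library (both use n - 1).
assert abs(sample_variance - variance(class_a)) < 1e-9
assert abs(sd - stdev(class_a)) < 1e-9
```

For a population rather than a sample, divide by N instead; `statistics.pvariance` and `statistics.pstdev` do this.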
VARIANCE VS STANDARD DEVIATION
• Both indicate how spread-out the data values are.
• The variance is measured in squared units, because it adds together squared differences. For example, if the sample data are measured in meters, the variance is in square meters.
• To eliminate the problem of squared units and obtain a measure of spread in the same units as the original sample, take the square root of the variance: this is the standard deviation.
1. Larger s.d. = greater amounts of variation around the mean.
For example, two distributions with the same mean:
Left: 19 25 31; $\bar{Y} = 25$; s.d. = 3
Right: 13 25 37; $\bar{Y} = 25$; s.d. = 6
2. s.d. = 0 only when all values are the same (only when you have a constant and not a “variable”).
3. If you were to “rescale” a variable, the s.d. would change by the same factor: if we changed units above so the mean equaled 250, the s.d. on the left would be 30, and on the right, 60.
4. Like the mean, the s.d. will be inflated by an outlier case value.
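Properties 2 to 4 can be illustrated with a short Python sketch. All the data here are hypothetical values invented for the demonstration (five scores with mean 25, the same scores multiplied by 10, a constant, and one added extreme case).

```python
from statistics import mean, stdev

values = [19, 22, 25, 28, 31]        # hypothetical data with mean 25
rescaled = [10 * v for v in values]  # change of units: mean becomes 250
constant = [25, 25, 25, 25, 25]      # no variation at all
with_outlier = values + [95]         # hypothetical extreme case added

print(mean(values), mean(rescaled))          # 25 250
print(stdev(constant))                       # 0.0
print(stdev(rescaled) / stdev(values))       # 10, up to floating-point rounding
print(stdev(with_outlier) > stdev(values))   # True
```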