DESCRIPTIVE STATISTICS

Dr. Marlissa Omar
Senior Lecturer
Centre for STEM Enculturation,
Faculty of Education,
UKM
Nominal. Example: country, gender, race, hair color of a group of people, mode of transportation, etc.

Ordinal. Example: Likert scale, socio-economic status ("low income", "middle income", "high income"), education level ("high school", "BS", "MS", "PhD"), satisfaction rating ("extremely dislike", "dislike", "neutral", "like", "extremely like").

Interval. Example: temperature (Fahrenheit), temperature (Celsius), pH, IQ.

Ratio. Example: temperature (Kelvin), age, distance, height, weight.

3/1/20XX SAMPLE FOOTER TEXT 2


KEY STATISTICAL CONCEPTS

Population
- The group of all items of interest
- Frequently very large; sometimes infinite
Parameter: a descriptive measure of a population

Sample
- A set of data drawn from the population
- Potentially large, but smaller than the population
Statistic: a descriptive measure of a sample

Example of Parameter vs Statistic
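A minimal Python sketch of the distinction, assuming (for illustration only) that the 26 IQ scores of Classes A and B on the following slides form a small population. The population mean is then a parameter, while the mean of a random sample of 5 drawn from it is a statistic. The seed and the sample size are arbitrary choices.

```python
import random
import statistics

# IQ scores of Classes A and B, treated here as a hypothetical "population"
population = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110,   # Class A
              127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]   # Class B

# Parameter: a descriptive measure computed from every member of the population
mu = statistics.mean(population)

# Statistic: the same measure computed from a sample drawn from the population
random.seed(1)                        # arbitrary seed so the sample is reproducible
sample = random.sample(population, 5)
x_bar = statistics.mean(sample)

print(f"Population mean (parameter): {mu:.2f}")
print(f"Sample {sample} -> sample mean (statistic): {x_bar:.2f}")
```

A different seed gives a different sample and usually a different sample mean, while the population mean stays fixed.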
DESCRIPTIVE STATISTICS
Which Group is Smarter?

Class A: IQs of 13 students
102 115 128 109 131 89 98 106 140 119 93 97 110

Class B: IQs of 13 students
127 162 131 103 96 111 80 109 93 87 120 105 109

Each individual may be different. If you try to understand a group by remembering the
qualities of each member, you become overwhelmed and fail to understand the group.



DESCRIPTIVE STATISTICS
Which group is smarter now?

Class A average IQ: 110.54
Class B average IQ: 110.23

They’re roughly the same!

With a summary descriptive statistic, it is much easier to answer our question.
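The same comparison can be sketched in Python with the standard-library `statistics` module; the score lists are transcribed from the Class A and Class B tables above.

```python
import statistics

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

# One summary number per group instead of 13 individual scores
mean_a = statistics.mean(class_a)
mean_b = statistics.mean(class_b)

print(f"Class A average IQ: {mean_a:.2f}")  # 110.54
print(f"Class B average IQ: {mean_b:.2f}")  # 110.23
```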

DESCRIPTIVE STATISTICS
Types of descriptive statistics:

• Organize Data
– Tables
– Graphs

• Summarize Data
– Central Tendency
– Variation

DESCRIPTIVE STATISTICS
• Organize Data
– Tables
• Frequency Distribution
• Relative Frequency Distribution
– Graphs
• Bar Chart
• Histogram
• Stem and Leaf Plot
• Frequency Polygon
• Pie Chart
• Scatter Plot
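As a rough sketch of the first two tables in this list, the Python below builds a grouped frequency distribution and a relative frequency distribution for the Class A IQ scores. The class width of 10 points is an assumption chosen for illustration; SPSS or other software may group the data differently.

```python
from collections import Counter

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

# Assumption: class intervals of width 10 (80-89, 90-99, ...), chosen for illustration
WIDTH = 10

# Frequency distribution: count the scores in each interval,
# keyed by the interval's lower limit
freq = Counter((score // WIDTH) * WIDTH for score in class_a)
n = len(class_a)

print(f"{'Interval':<10}{'Frequency':>10}{'Relative':>10}")
for lower in sorted(freq):
    interval = f"{lower}-{lower + WIDTH - 1}"
    # Relative frequency = frequency / total number of scores
    print(f"{interval:<10}{freq[lower]:>10}{freq[lower] / n:>10.2f}")
```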
[Figures: SPSS output for a frequency distribution; grouped relative frequency distribution; histogram; bar graph; stem-and-leaf plot; SPSS output of a frequency polygon; pie chart; scatter plot.]
DESCRIPTIVE STATISTICS
Summarizing Data:
– Central Tendency (or Groups’ “Middle Values”)
• Mean
• Median
• Mode

– Variation (or Summary of Differences Within Groups)


• Range
• Interquartile Range
• Variance
• Standard Deviation

MEASURES OF CENTRAL TENDENCY
• A measure of central tendency is a single value that attempts to describe
a set of data by identifying the central position within that set of data. As
such, measures of central tendency are sometimes called measures of
central location (Laerd, 2018).

• The mean, median and mode are all valid measures of central tendency,
but under different conditions, some measures of central tendency
become more appropriate to use than others (Laerd, 2018).

MEAN
Most commonly called the “average”.

Add up the values for each case and divide by the total number of cases.

Mean = Total score / Number of scores

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$$
Semester 1 examination results (Mathematics score)

Class 3A   | Class 3B   | Class 3C
54 60 70   | 94 55 38   | 62 60 62
62 55 59   | 58 59 70   | 65 55 61
86 58 74   | 75 50 28   | 57 60 50
74 72 64   | 98 88 44   | 50 60 67
65 75 62   | 29 41 69   | 67 54 65
82 65 78   | 97 68 42   | 60 66 61
84 78 68   | 95 90 99   | 63 67 62
76 68 75   | 56 60 92   | 67 59 60
80 73 60   | 87 100 40  | 66 58 55
66 75 57   | 46 33 56   | 60 67 50
67 80 69   | 43 57 80   | 62 63 57
72 67 60   | 70 35 72   | 61 68 69
56 70 59   | 48 30 90   | 55 58 63

Mean for Class 3A = 68.59
Mean for Class 3B = 63.64
Mean for Class 3C = 60.82

Which class performed best, and which performed weakest?
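The three means can be checked with a short Python sketch. It assumes the table is read as laid out above: in each row, the first three scores belong to Class 3A, the next three to 3B, and the last three to 3C.

```python
import statistics

# Each row reproduces one line of the table: three 3A scores, three 3B, three 3C
rows = [
    [54, 60, 70, 94, 55, 38, 62, 60, 62],
    [62, 55, 59, 58, 59, 70, 65, 55, 61],
    [86, 58, 74, 75, 50, 28, 57, 60, 50],
    [74, 72, 64, 98, 88, 44, 50, 60, 67],
    [65, 75, 62, 29, 41, 69, 67, 54, 65],
    [82, 65, 78, 97, 68, 42, 60, 66, 61],
    [84, 78, 68, 95, 90, 99, 63, 67, 62],
    [76, 68, 75, 56, 60, 92, 67, 59, 60],
    [80, 73, 60, 87, 100, 40, 66, 58, 55],
    [66, 75, 57, 46, 33, 56, 60, 67, 50],
    [67, 80, 69, 43, 57, 80, 62, 63, 57],
    [72, 67, 60, 70, 35, 72, 61, 68, 69],
    [56, 70, 59, 48, 30, 90, 55, 58, 63],
]

# Collect each class's 39 scores from its three columns
classes = {
    name: [score for row in rows for score in row[3 * k: 3 * k + 3]]
    for k, name in enumerate(["3A", "3B", "3C"])
}

means = {name: statistics.mean(scores) for name, scores in classes.items()}

# List the classes from highest to lowest mean
for name in sorted(means, key=means.get, reverse=True):
    print(f"Class {name}: n = {len(classes[name])}, mean = {means[name]:.2f}")
```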
1. Means can be strongly affected by outliers (data points with extreme values unlike the rest).
2. Outliers can make the mean a poor measure of central tendency, or of the common experience.


MEDIAN
• The middle value when a variable's values are ranked in order; the point that divides a distribution into two equal halves.

• When data are listed in order, the median is the point at which 50% of the cases are above and 50% below it.

• The median is the 50th percentile.
MEDIAN
• If the first student (IQ 102) were to drop out of Class A, there would be a new median. The remaining 12 scores, in order:
89 93 97 98 106 109 110 115 119 128 131 140

Median = (109 + 110) / 2 = 219 / 2 = 109.5 (six cases above, six below)
Odd number of scores (n = 9):
12 17 21 24 26 35 35 37 40
The median is the middle (5th) score, 26.

Even number of scores (n = 10):
12 17 21 24 26 35 35 37 40 41
The median is the average of the two middle scores: (26 + 35) / 2 = 30.5.
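Python's `statistics.median` applies the same rule: it sorts the data, then takes the middle value when n is odd and the average of the two middle values when n is even. A short sketch using the scores from the examples above:

```python
import statistics

odd = [12, 17, 21, 24, 26, 35, 35, 37, 40]        # n = 9
even = [12, 17, 21, 24, 26, 35, 35, 37, 40, 41]   # n = 10

print(statistics.median(odd))   # middle (5th) score: 26
print(statistics.median(even))  # mean of the 5th and 6th scores: 30.5

# Class A with the first student (IQ 102) removed: 12 scores remain
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
print(statistics.median(class_a[1:]))  # (109 + 110) / 2 = 109.5
```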
MEDIAN
The median is resistant to outliers, which makes it a better measure of central tendency than the mean, and a better description of the "typical person", when data are skewed.
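To see this resistance numerically, the sketch below hypothetically replaces the highest Class A score (140) with an extreme value of 400. The value 400 is invented purely for illustration and is not part of the original data.

```python
import statistics

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

# Hypothetical outlier: replace the highest score (140) with 400
with_outlier = [400 if iq == 140 else iq for iq in class_a]

for label, data in [("original", class_a), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean = {statistics.mean(data):.2f}, "
          f"median = {statistics.median(data)}")
```

The mean jumps by about 20 IQ points, while the median stays at 109.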


MEDIAN
If the recorded values for a variable form a symmetric distribution, the
median and mean are identical.

In skewed data, the mean lies further toward the skew than the median.

MODE
The most common data point is called the mode.

Used to state the category that appears with greatest frequency.

Usually used to state the demographic characteristics of research subjects, which comprise many categories, such as:
• Age
• Income
• Education
• etc.
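In Python, `statistics.multimode` (Python 3.8+) returns every most-frequent value, so it handles ties and works for categorical data as well as numbers. The education-level list below is hypothetical; the numeric lists come from earlier slides.

```python
import statistics

# Numeric data from the median example: 35 occurs twice
scores = [12, 17, 21, 24, 26, 35, 35, 37, 40]
print(statistics.multimode(scores))     # [35]

# Class B IQs: 109 appears twice
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]
print(statistics.multimode(class_b))    # [109]

# Hypothetical categorical data: education level of research subjects
education = ["BS", "MS", "BS", "PhD", "high school", "BS", "MS"]
print(statistics.multimode(education))  # ['BS']
```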
MODE
1. It may give you the most likely experience rather than the “typical” or
“central” experience.
2. In symmetric, unimodal distributions, the mean, median, and mode are the same.
3. In skewed data, the mean and median lie further toward the skew than
the mode.
Choosing a Measure of Central Tendency
– If you want to know which score occurred most often, then the mode is the choice.
– The mean is usually the better choice to serve as the representative score because it takes into account every score in the distribution. However, for the same reason, it is pulled toward extreme scores (outliers).
SUMMARY OF WHEN TO USE THE MEAN, MEDIAN AND MODE

Type of Variable             | Type of Data | Best Measure of Central Tendency
Nominal                      | Discrete     | Mode
Ordinal                      | Discrete     | Median
Interval/Ratio (not skewed)  | Continuous   | Mean
Interval/Ratio (skewed)      | Continuous   | Median

Discrete data is a count that can't be made more precise. Typically it involves integers. For instance, the number of children (or adults, or pets) in your family is discrete data, because you are counting whole, indivisible entities: you can't have 2.5 kids, or 1.3 pets.

Continuous data is data that can take any value. Height, weight, temperature and length are all examples of continuous data. Some continuous data will change over time, such as the weight of a baby in its first year or the temperature in a room throughout the day.
RANGE
The difference between the highest value and the lowest value in a distribution.

Range = Maximum score - Minimum score

Class A: IQs of 13 students
102 115 128 109 131 89 98 106 140 119 93 97 110

Class B: IQs of 13 students
127 162 131 103 96 111 80 109 93 87 120 105 109

Class A range = 140 - 89 = 51
Class B range = 162 - 80 = 82
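A one-line helper makes the formula concrete. The function name `value_range` is a hypothetical choice that avoids shadowing Python's built-in `range`.

```python
def value_range(scores):
    """Range = maximum score - minimum score."""
    return max(scores) - min(scores)

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

print(value_range(class_a))  # 140 - 89 = 51
print(value_range(class_b))  # 162 - 80 = 82
```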
INTERQUARTILE RANGE
(1 2 5 6 7) 9 (12 15 18 19 27)

Median = 9
Q1 = median of the lower half (1 2 5 6 7) = 5
Q3 = median of the upper half (12 15 18 19 27) = 18

Interquartile Range = Q3 - Q1
= 18 - 5
= 13
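The quartile rule used in the worked example above (Q1 and Q3 as medians of the lower and upper halves, excluding the overall median when n is odd) can be sketched in Python. This is one convention among several: statistical packages, including SPSS and Python's `statistics.quantiles`, use interpolation-based definitions and may give slightly different quartiles. The function name `quartiles` is hypothetical.

```python
import statistics

def quartiles(scores):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    For an odd number of scores the overall median is excluded from both halves.
    """
    data = sorted(scores)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return statistics.median(lower), statistics.median(upper)

data = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)  # 5 18 13
```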


DETECTING POTENTIAL OUTLIERS
An observation is a potential outlier if it falls more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile.

• Cutoff value for LOW OUTLIERS:
Q1 - 1.5 × IQR (any value less than this number is considered a low outlier)

• Cutoff value for HIGH OUTLIERS:
Q3 + 1.5 × IQR (any value greater than this number is considered a high outlier)


Q1 = 26, Q3 = 41
IQR = 41 - 26
= 15

Cutoff value for LOW OUTLIERS = 26 - (1.5 × 15)
= 3.5

Cutoff value for HIGH OUTLIERS = 41 + (1.5 × 15)
= 63.5
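A minimal Python sketch of the 1.5 × IQR rule. It reproduces the cutoffs from the worked example (IQR = 41 - 26) and applies the rule to the data from the interquartile-range slide, whose quartiles are 5 and 18. The helper names `outlier_fences` and `potential_outliers` are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(scores, q1, q3):
    """List the scores that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in scores if x < low or x > high]

# Worked example: Q1 = 26, Q3 = 41
low, high = outlier_fences(26, 41)
print(low, high)  # 3.5 63.5

# IQR example data (Q1 = 5, Q3 = 18): fences are -14.5 and 37.5, so nothing is flagged
print(potential_outliers([1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27], 5, 18))  # []
```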
LET'S TRY!
The mid-semester examination scores of several students in class KP2 are as follows (score = percentage of items marked correct):

35 76 76 81 85 87 88 93 95 99

a) Find the mean, median, and mode.
b) Identify the range, the interquartile range, and any outliers.
c) What does this information tell us about the students' performance in the KP2 mid-semester examination?
d) Is the data distribution skewed? State the type of skewness.


VARIANCE
A measure of how far a data set is spread out on a variable. Used to identify the
dispersion of scores in a distribution.

The larger the variance, the further the individual cases are from the
mean.

The smaller the variance, the closer the individual scores are to the mean.

VARIANCE
The deviation of 102 from 110.54 is? Deviation of 115?
102 - 110.54 = -8.54        115 - 110.54 = 4.46

Class A: IQs of 13 students
102 115 128 109 131 89 98 106 140 119 93 97 110
VARIANCE
• We need a way to eliminate negative signs. Squaring the deviations will eliminate negative signs.

A deviation squared:
$$(Y_i - \bar{Y})^2$$

Back to the IQ example, a deviation squared
a) for 102: (102 - 110.54)² = (-8.54)² = 72.93
b) for 115: (115 - 110.54)² = (4.46)² = 19.89


VARIANCE
If you were to add all the squared deviations together, you'd get what we call the "Sum of Squares."

$$SS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = (Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2$$


Class A, sum of squares (Ȳ = 110.54):
(102 - 110.54)² + (115 - 110.54)² + (128 - 110.54)² + (109 - 110.54)² +
(131 - 110.54)² + (89 - 110.54)² + (98 - 110.54)² + (106 - 110.54)² +
(140 - 110.54)² + (119 - 110.54)² + (93 - 110.54)² + (97 - 110.54)² +
(110 - 110.54)²
SS = 2891.23

The last step…
The variance is (approximately) the average squared deviation: the sum of squares divided by the number of cases, or by n - 1 for a sample.

$$\sigma^2 = \frac{SS}{N} \quad \text{(variance for a population)}$$

$$s^2 = \frac{SS}{n - 1} = \frac{\sum (Y_i - \bar{Y})^2}{n - 1} \quad \text{(variance for a sample)}$$

For Class A, variance = 2891.23 / (n - 1)
= 2891.23 / 12
= 240.94

STANDARD DEVIATION
The main measure used in research to describe the dispersion of scores in a distribution.

The standard deviation is the square root of the variance. It indicates the typical distance of the observations from the mean (strictly, it is the square root of the average squared deviation, not the average absolute deviation).

$$s = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n - 1}}$$

STANDARD DEVIATION
What is the standard deviation for Class A?

√240.94 = 15.52

On average, individual IQs deviate from the class mean of 110.54 by roughly 15.52 IQ points.
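The whole calculation, from deviations to sum of squares to variance and standard deviation, can be sketched in Python for the Class A scores as listed on the "Which Group is Smarter?" slide. The standard-library functions `statistics.variance` and `statistics.stdev` use the n - 1 (sample) formula, so they serve as a cross-check; `statistics.pvariance` would give the population version SS/N.

```python
import math
import statistics

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
n = len(class_a)
mean = sum(class_a) / n                      # about 110.54

deviations = [y - mean for y in class_a]     # Y_i - Y-bar
ss = sum(d ** 2 for d in deviations)         # sum of squares
sample_var = ss / (n - 1)                    # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_var)            # standard deviation

print(f"SS = {ss:.2f}, variance = {sample_var:.2f}, s.d. = {sample_sd:.2f}")
# SS = 2891.23, variance = 240.94, s.d. = 15.52

# Cross-check against the standard library (both use n - 1)
assert math.isclose(sample_var, statistics.variance(class_a))
assert math.isclose(sample_sd, statistics.stdev(class_a))
```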


VARIANCE VS STANDARD DEVIATION
• Both indicate how spread out the data values are.
• The variance is measured in squared units, because squared differences are added together in its calculation. For example, if sample data are measured in meters, the variance is in square meters.
• To eliminate the problem of squared units and obtain a measure of spread in the same units as the original sample, take the square root of the variance: this is the standard deviation.
1. Larger s.d. = greater amount of variation around the mean. For example:
[Figure: two distributions, both with Ȳ = 25; the left one has s.d. = 3 (axis marks at 19, 25, 31), the right one has s.d. = 6 (axis marks at 13, 25, 37).]

2. s.d. = 0 only when all values are the same (that is, when you have a constant and not a "variable").

3. If you were to "rescale" a variable, the s.d. would change by the same factor. If we changed units above so that the mean equaled 250, the s.d. on the left would be 30, and on the right, 60.

4. Like the mean, the s.d. will be inflated by an outlier case value.
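Points 2 and 3 can be checked with a brief Python sketch: a constant has s.d. = 0, and multiplying every score by 10 (a hypothetical change of units) multiplies the s.d. by 10.

```python
import math
import statistics

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
rescaled = [10 * y for y in class_a]  # hypothetical change of units (x 10)

sd = statistics.stdev(class_a)
sd_rescaled = statistics.stdev(rescaled)
print(f"s.d. = {sd:.2f}, rescaled s.d. = {sd_rescaled:.2f}")

# The s.d. scales by exactly the same factor as the data
assert math.isclose(sd_rescaled, 10 * sd)

# A constant (every value the same) has no variation at all
print(statistics.stdev([25, 25, 25]))  # 0.0
```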