Introduction to Statistics
Measures of Central Tendency
Written by
jwala
Two Types of Statistics
• Descriptive statistics of a POPULATION
• Relevant notation (Greek):
– µ mean
– N population size
– ∑ sum
• Inferential statistics of SAMPLES from a
population.
– Assumptions are made that the sample reflects
the population in an unbiased form.
• Relevant notation (Roman):
– X̄ mean
– n sample size
– ∑ sum
• Be careful though because you may
want to use inferential statistics even
when you are dealing with a whole
population.
• Measurement error or missing data may
mean that, if we treat a population as
complete, our estimates will be
inefficient.
– It depends on the type of data and project.
– Example of Democratic Peace.
• Also, be careful about the phrase
“descriptive statistics”. It is often used
generically for measures of central
tendency and dispersion, even when the
data are a sample used for inference.
• Another name is “summary statistics”,
which are univariate:
– Mean, Median, Mode, Range, Standard
Deviation, Variance, Min, Max, etc.
Measures of Central Tendency
• These measures describe the typical or
central value of a set of scores or values in
the data.
– Mean
– Median
– Mode
What do you “Mean”?
The “mean” of some data is the average
score or value, such as the average
age of an MPA student or average
weight of professors that like to eat
donuts.
Inferential mean of a sample: X̄ = (∑X)/n
Mean of a population: µ = (∑X)/N
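The sample-mean formula can be sketched in a few lines of Python. This is only an illustration: the list `weights` holds the twelve professor weights used in the examples in these slides, and the helper name `mean` is an assumption made for this sketch.

```python
# Weights (in pounds) of the twelve donut-eating professors from the slides.
weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


sample_mean = mean(weights)      # (sum of X) / n = 2335 / 12
print(round(sample_mean, 1))     # 194.6
```

The same arithmetic gives µ when the list holds every member of the population (divide by N instead of n).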
Problem of being “mean”
• The main problem associated with the
mean value of some data is that it is
sensitive to outliers.
• For example, the average weight of political
science professors would be affected if
there were one in the department who
weighed 600 pounds.
Donut-Eating Professors
Professor       Weight   Weight (with outliers)
Schmuggles      165      165
Bopsey          213      213
Pallitto        189      410
Homer           187      610
Schnickerson    165      165
Levin           148      148
Honkey-Doorey   251      251
Zingers         308      308
Boehmer         151      151
Queenie         132      132
Googles-Boop    199      199
Calzone         227      227
Mean            194.6    248.3
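A short Python sketch of how the two outliers in the table pull the mean upward. The variable names are assumptions made for this illustration; the numbers come from the table above.

```python
# Original weights, in the same order as the table.
original = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

# Second column: Pallitto and Homer are replaced by extreme values.
with_outliers = list(original)
with_outliers[2] = 410   # Pallitto
with_outliers[3] = 610   # Homer

mean_original = sum(original) / len(original)
mean_outliers = sum(with_outliers) / len(with_outliers)

print(round(mean_original, 1))   # 194.6
print(mean_outliers)             # 248.25, shown on the slide as 248.3

# Changing only two of twelve values moves the mean by more than 50 pounds.
shift = mean_outliers - mean_original
print(round(shift, 1))           # 53.7
```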
The Median (not the cement in the middle of
the road)
• Because the mean can be sensitive to
extreme values, the median is sometimes a
more useful measure of the typical value.
• The median is simply the middle value
among the ranked scores of a variable. (no
standard formula for its computation)
What is the Median?
Professor       Weight
Schmuggles      165
Bopsey          213
Pallitto        189
Homer           187
Schnickerson    165
Levin           148
Honkey-Doorey   251
Zingers         308
Boehmer         151
Queenie         132
Googles-Boop    199
Calzone         227
Mean            194.6

Weights in rank order:
132 148 151 165 165 187 189 199 213 227 251 308

Rank order the values and choose the middle
value. If the number of values is even, average
the two in the middle. Here n = 12, so the
median is (187 + 189)/2 = 188.
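The rank-and-pick-the-middle rule can be sketched in Python as follows. The function name `median` and the `with_outliers` list (the outlier column from the earlier Donut-Eating Professors table) are assumptions made for this illustration.

```python
def median(values):
    """Rank the values and return the middle one (or the mean of the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle two


weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]
with_outliers = [165, 213, 410, 610, 165, 148, 251, 308, 151, 132, 199, 227]

print(median(weights))         # 188.0
print(median(with_outliers))   # 206.0
```

With the two outliers the median moves from 188 to 206, a much smaller shift than the mean's move from 194.6 to about 248.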
Percentiles
• The median is the 50th percentile. From there
we can go up or down and rank the data as
being above or below other thresholds.
• You may be familiar with percentiles from
standardized tests: if you scored in the 90th
percentile, your score was higher than 90% of
the scores in the sample.
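A minimal sketch of a percentile rank in Python. It assumes the "percentage of scores strictly below yours" convention; other percentile definitions exist and can give slightly different answers. The function name `percentile_rank` is hypothetical.

```python
def percentile_rank(values, score):
    """Percentage of values that are strictly lower than `score`."""
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)


weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

# Ten of the twelve professors weigh less than 251 pounds.
rank_251 = percentile_rank(weights, 251)
print(round(rank_251, 1))   # 83.3
```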
The Mode (hold the pie and the ala)
(What does ‘ala’ taste like anyway??)
• The most frequent response or value 
for a variable.
• Multiple modes are possible: bimodal 
or multimodal.
Figuring the Mode
Professor Weight
Schmuggles 165
Bopsey 213
Pallitto 189
Homer 187
Schnickerson 165
Levin 148
Honkey-Doorey 251
Zingers 308
Boehmer 151
Queenie 132
Googles-Boop 199
Calzone 227
What is the mode?
Answer: 165 (it occurs twice; every other
weight occurs once).
The mode is important descriptive
information that may help inform your
research and diagnose problems like lack
of variability.
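Finding the mode amounts to counting how often each value occurs. A minimal Python sketch, assuming a hypothetical helper `modes` that returns every most-frequent value so that bimodal and multimodal data are handled too:

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

print(modes(weights))           # [165]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (bimodal)
```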
Measures of Dispersion (not something
you cast…)
• Measures of dispersion tell us about
variability in the data. Also univariate.
• Basic question: how much do the values of a
variable differ, both from the min to the max
and among the scores in between? We
use:
– Range
– Standard Deviation
– Variance
• Remember that we said in order to glean
information from data, i.e. to make an
inference, we need to see variability in
our variables.
• Measures of dispersion tell us how much
our variables vary around the mean. If
they do not vary, it is difficult to infer
anything from the data. Dispersion is
also known as the spread or range of
variability.
The Range (no Buffalo roaming!!)
• r = h – l
– Where h is the highest value and l is the lowest
• In other words, the range gives us the
distance between the minimum and maximum
values of a variable.
• Understanding this statistic is important in
understanding your data, especially for
management and diagnostic purposes.
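As a quick illustration of r = h – l, here is the range of the professors' weights in Python (the variable names simply mirror the slide's symbols):

```python
weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

h = max(weights)   # highest value: 308
l = min(weights)   # lowest value: 132
r = h - l          # range

print(r)           # 176
```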
The Standard Deviation
• A standardized measure of distance from
the mean.
• Very useful, and something you will read
about whenever predictions or other
statements are made about the data.
Formula for Standard Deviation
S = √( ∑(X − X̄)² / (n − 1) )
where:
√ = square root
∑ = sum (sigma)
X = score for each point in data
X̄ = mean of scores for the variable
n = sample size (number of observations or cases)
We can see that the standard deviation equals 49.8
pounds. The weight of Zingers is still likely skewing this
calculation (indirectly through the mean).
Professor       X      X − X̄    (X − X̄)²
Schmuggles      165    -29.6     875.2
Bopsey          213     18.4     339.2
Pallitto        189     -5.6      31.2
Homer           187     -7.6      57.5
Schnickerson    165    -29.6     875.2
Levin           148    -46.6    2170.0
Honkey-Doorey   251     56.4    3182.8
Zingers         308    113.4   12863.3
Boehmer         151    -43.6    1899.5
Queenie         132    -62.6    3916.7
Googles-Boop    199      4.4      19.5
Calzone         227     32.4    1050.8
Mean of X = 194.6
Variance (S²) = 2480.1
Standard deviation (S) = 49.8
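The worked table can be reproduced step by step in Python. This is a sketch using only the standard library; the variable names are assumptions. It should give the 2480.1 and 49.8 that appear in the table's final row.

```python
import math

weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

n = len(weights)
x_bar = sum(weights) / n                          # sample mean

# Squared distance of every score from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in weights]

# Divide by n - 1 (sample formula), then take the square root.
variance = sum(squared_deviations) / (n - 1)
s = math.sqrt(variance)

print(round(x_bar, 1))      # 194.6
print(round(variance, 1))   # 2480.1
print(round(s, 1))          # 49.8
```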
Example of S in use
• Boehmer-Sobek paper.
– A one standard deviation increase in
the value of an X variable changes the
probability of Y occurring by some
amount.
Table 2: Development and Relative Risk of Territorial Claim
Variable                    Probability*   % Change
Baseline                    0.0401
development                 0.0024         -94.3
pop density                 0.0332         -17.3
pop growth                  0.0469          16.8
Capability                  0.0813         102.5
Openness                    0.0393          -2
Capability and pop growth   0.0942         134.8
* % change in probability after a 1 SD change in the given X variable, holding the others at their means
Let’s go to computers!
• Type in data in the Excel sheet.
Variance
S² = ∑(X − X̄)² / (n − 1)
• Note that this is the same equation as the
standard deviation, except that no square root is taken.
• Its use is not often directly reported in research
but instead is a building block for other statistical
methods
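A brief Python check of the relationship between the two measures, using the standard library's `statistics` module, whose `variance` and `stdev` functions use the sample (n − 1) formula. The variable names are assumptions made for this sketch.

```python
import math
import statistics

weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

s_squared = statistics.variance(weights)   # sample variance, divides by n - 1
s = statistics.stdev(weights)              # sample standard deviation

print(round(s_squared, 1))                 # 2480.1
print(round(s, 1))                         # 49.8

# The standard deviation is the square root of the variance.
print(math.isclose(s ** 2, s_squared))     # True
```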
Organizing and Graphing
Data
Goal of Graphing?
1. Presentation of Descriptive Statistics
2. Presentation of Evidence
3. Some people understand subject
matter better with visual aids
4. Provide a sense of the underlying
data generating process (scatter plots)
What is the Distribution?
• Gives us a picture of
the variability and
central tendency.
• Can also show the
amount of skewness
and kurtosis.
Graphing Data: Types
Creating Frequencies
• We create frequencies by sorting data
by value or category and then
summing the cases that fall into those
values.
• How often do certain scores occur?
This is a basic descriptive data
question.
Ranking of Donut-Eating Profs.
(most to least)
Zingers         308
Honkey-Doorey   251
Calzone         227
Bopsey          213
Googles-Boop    199
Pallitto        189
Homer           187
Schnickerson    165
Schmuggles      165
Boehmer         151
Levin           148
Queenie         132
Weight Class Intervals of Donut-Munching Professors
[Column histogram. X-axis: weight class (130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+). Y-axis: Number, 0 to 3.5.]
Here we have placed the professors into
weight classes and depicted the counts with a
histogram in columns.
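The frequencies behind such a histogram can be tallied with a short Python sketch. The class boundaries are the ones on the chart's axis; the function name `class_frequencies` and the text-based bars are assumptions made for this illustration.

```python
weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

# (label, low, high) for each weight class; high=None marks the open-ended class.
classes = [
    ("130-150", 130, 150),
    ("151-185", 151, 185),
    ("186-210", 186, 210),
    ("211-240", 211, 240),
    ("241-270", 241, 270),
    ("271-310", 271, 310),
    ("311+", 311, None),
]


def class_frequencies(values, classes):
    """Count how many values fall into each class interval (bounds inclusive)."""
    counts = {label: 0 for label, _, _ in classes}
    for v in values:
        for label, low, high in classes:
            if v >= low and (high is None or v <= high):
                counts[label] += 1
                break
    return counts


freq = class_frequencies(weights, classes)
for label, count in freq.items():
    # One '#' per professor gives a rough text histogram.
    print(f"{label:>8} {'#' * count} ({count})")
```

For these data the counts are 2, 3, 3, 2, 1, 1 and 0, which fits the chart's vertical axis topping out just above 3.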
Weight Class Intervals of Donut-Munching Professors
[Horizontal bar graph. Vertical axis: weight class (130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+). Horizontal axis: Number, 0 to 3.5.]
Here is the same histogram depicted
as a bar graph.
Pie Charts:
Proportions of Donut-Eating Professors by Weight Class
[Pie chart. Slices: 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+.]
Actually, why not use a donut
graph? Duh!
Proportions of Donut-Eating Professors by Weight Class
[Donut chart. Slices: 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+.]
See Excel for other options!
Line Graphs: A Time Series
[Line graph, 1981 to 2001. X-axis: Month. Y-axis: Approval, 0 to 100. Series: Approval, Economic approval.]
Scatter Plot (Two Variables)
Presidential Approval and Unemployment
[Scatter plot. X-axis: Unemployment, 0 to 12. Y-axis: Approval, 0 to 100. Series: Approve.]
