Basic Statistics
(Stat 2011)
Suggested references
1. Bluman, A.G. (1995). Elementary Statistics: A Step by Step
Approach. Wm. C. Brown Communications, Inc.
2. Mekonnen Tadesse and Bedilu Alamirie (2018). Basic
Statistics, Aster Nega Publishing Enterprise.
Course Delivery and Evaluation
Contact Information
- Email: [email protected]
- Office: 103, Freshman building
Chapter one
INTRODUCTION
Chapter 1 Goals
After completing this chapter, you should be able
to:
Define statistics
Explain key definitions:
Population vs. Sample
Parameter vs. Statistic
Explain the difference between Descriptive and Inferential
statistics
Describe the stages in statistical investigation
Identify types of data and levels of measurement scale
Definition of statistics
What is statistics?
Currently, the word STATISTICS is used in two senses:
1) In its plural sense - Numerical data
2) In its singular sense - a subject/science which
studies principles and methods employed in the
collection, presentation, analysis and interpretation
of data.
Statistics as a subject is the study of making sense of
data in describing certain situations.
Data
- Facts/figures from which conclusions are drawn
- Raw materials of statistics
Population and Sample
[Figure: a population (items a through z) and a sample drawn from it (b, c, g, i, n, o, r, u, y)]
Inferential statistics
Provide the bases for predictions, forecasts, and
estimates that are used to transform information into
knowledge
Descriptive Statistics
Collect data
e.g., Survey
Present data
e.g., Tables and graphs
Summarize data
e.g., Sample mean = $\bar{x} = \frac{\sum x_i}{n}$
Inferential Statistics
Estimation
e.g., Estimate the population
mean weight using the sample
mean weight
Hypothesis testing
e.g., Test the claim that the
population mean weight of
statistics students is 58 Kg
Decision
Types of data
Qualitative / Categorical (defined categories or groups)
  Examples: Marital Status; Are you registered to vote?; Eye Color
Quantitative / Numerical
  Discrete (counted items)
    Examples: Number of Children; Number of laptops
  Continuous (measured characteristics)
    Examples: Weight; Voltage
Measurement scales for variables
Quantitative Data
  Ratio Data: differences between measurements, true zero exists
  Interval Data: differences between measurements but no true zero
Qualitative Data
  Ordinal Data: ordered categories (rankings, order, or scaling)
  Nominal Data: categories (no ordering or direction)
Data type summary
Exercise
Classify the following as nominal, ordinal, interval or ratio
data.
1. Ethnic group
2. Marital status
3. Health status: very sick, sick and cured
4. Data on temperature
5. Height of students in a college
6. Age of employees in a company
7. Student mark
Chapter Two
Data Presentation
Sources of data and methods of data collection
1. Observation or measurement
2. Interviews and questionnaires
3. The use of documentary sources
Example: Consider Commercial Bank of Ethiopia (CBE) data
Raw data do not facilitate the decision-making process!
89,17.21,100,11,3,90,45,41,67,87,34,69,3,39,63,
41,57,53,12,79, 91, 42 ,100, 62,73,1,38,56,45,25,
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13,
12, 38, 41, 43, 44, 27, 53, 27 …
We need to summarize the data to make them meaningful
Class Intervals
and Class Boundaries
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Frequency Distribution Example (continued)
Interval               Frequency   Relative Frequency   Percentage
10 but less than 20        3             .15                15
20 but less than 30        6             .30                30
30 but less than 40        5             .25                25
40 but less than 50        4             .20                20
50 but less than 60        2             .10                10
Total                     20            1.00               100
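The table above can be rebuilt from the 20 raw observations. The following is a minimal sketch in Python; the variable names and the choice of class width 10 starting at 10 are taken from the table, and everything else is illustrative.

```python
# Raw observations from the slide.
data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

lower_limits = [10, 20, 30, 40, 50]  # class "L but less than L + width"
width = 10

# Count the values with lower <= value < lower + width for each class.
frequencies = [sum(low <= x < low + width for x in data) for low in lower_limits]
n = len(data)
relative = [f / n for f in frequencies]
percentages = [100 * r for r in relative]

# Running totals give the cumulative frequencies.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

for low, f, r, p, c in zip(lower_limits, frequencies, relative, percentages, cumulative):
    print(f"{low} but less than {low + width}: {f:2d}  {r:.2f}  {p:3.0f}  {c:2d}")
```

The last printed column is the running total of the frequencies, which is what a cumulative frequency distribution records.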
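The Cumulative Frequency Distribution
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20        3           15                3                     15
20 but less than 30        6           30                9                     45
30 but less than 40        5           25               14                     70
40 but less than 50        4           20               18                     90
50 but less than 60        2           10               20                    100
Total                     20          100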
Steps for constructing Grouped frequency Distribution
Graphical
Presentation of Data
Graphs are chosen by variable type: categorical variables or numerical variables
Graphing categorical variables:
- Frequency distribution table
- Bar chart
- Pie chart
- Pictograms
The Frequency
Distribution Table
Summarize data by category
[Table: Hospital Unit against Number of Patients]
[Figure: bar chart of Number of Patients (axis 0 to 2000 shown) by Hospital Unit: Cardiac Care, Surgery, Emergency, Maternity, Intensive Care]
Pie Chart Example
Hospital Unit      Number of Patients   % of Total
Cardiac Care             1,052             11.93
Emergency                2,245             25.46
Intensive Care             340              3.86
Maternity                  552              6.26
Surgery                  4,630             52.50
[Figure: pie chart "Hospital Patients by Unit": Cardiac Care 12%, Emergency 25%, Intensive Care 4%, Maternity 6%, Surgery 53%]
(Percentages are rounded to the nearest percent)
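The "% of Total" column and the rounded pie labels can be recomputed from the patient counts. A minimal sketch in Python, with illustrative variable names:

```python
# Patient counts per hospital unit, as given in the table.
patients = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}

total = sum(patients.values())

# Share of each unit as a percentage of all patients.
percent = {unit: 100 * count / total for unit, count in patients.items()}

for unit, p in percent.items():
    # Two decimals for the table, nearest whole percent for the pie label.
    print(f"{unit:15s} {p:6.2f}%   pie label: {round(p)}%")
```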
Pictograms
Represent the data by means of some picture
symbols
Decide a suitable picture to represent a definite
number of units
Example: Number of patients in each department
Histogram: Daily High Temperature
Interval               Frequency
10 but less than 20        3
20 but less than 30        6
30 but less than 40        5
40 but less than 50        4
50 but less than 60        2
[Figure: histogram of Frequency against Temperature in Degrees, with class boundaries from 0 to 70 and bar heights 0, 3, 6, 5, 4, 2, 0 (no gaps between bars)]
Frequency Polygon
Example
Using the following histogram, explore what the
frequency polygon and the cumulative frequency
polygon (less than and more than type) look like.
Solution:
[Figure: cumulative frequency polygon]
Chapter Three
Measures of central Tendency
Objectives of Measures of Central
Tendency
To determine a single value around which the
other data concentrate
To summarize/reduce the volume of the data
To facilitate comparison within one group or
between groups of data
Desirable Properties of
measures of central tendency
[Figure: overview of numerical summary measures. Central tendency: mode, harmonic mean, geometric mean. Variation: variance, standard deviation, coefficient of variation]
The Summation Notation (∑)
General Notation
Some Properties
Measures of Central Tendency
Overview
Central Tendency
- Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Median: midpoint of ranked values
- Mode: most frequently observed value
Arithmetic Mean
The arithmetic mean (mean) is the most
common measure of central tendency
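For a population of N values:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
where $x_1, \ldots, x_N$ are the population values and N is the population size
For a sample of size n:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
where $x_1, \ldots, x_n$ are the observed values and n is the sample size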
Arithmetic Mean (continued)
[Figure: two dot plots on a 0 to 10 axis]
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Properties of Arithmetic mean
Approximations for Grouped
Data
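Suppose a data set contains values $m_1, m_2, \ldots, m_K$,
occurring with frequencies $f_1, f_2, \ldots, f_K$
For a population of N observations, the mean is
$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$, where $N = \sum_{i=1}^{K} f_i$
For a sample of n observations, the mean is
$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$, where $n = \sum_{i=1}^{K} f_i$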
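As an illustration of the grouped-data approximation, the sketch below applies it to the temperature classes used earlier in these slides. Taking each m_i to be the class midpoint (15, 25, ...) is an assumption made here for illustration; the variable names are also illustrative.

```python
# Class midpoints m_i (assumed) and frequencies f_i from the frequency table.
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)  # n is the sum of the f_i
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Mean of the raw observations behind the table, for comparison.
raw = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
       32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
raw_mean = sum(raw) / len(raw)

print(f"grouped approximation: {grouped_mean}, raw mean: {raw_mean}")
```

The two values differ slightly because the grouped formula treats every observation in a class as if it sat at the midpoint.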
Weighted Mean
$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum w_i}$
Where $w_i$ is the weight of the ith observation
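A short sketch of the weighted mean in Python. The marks and credit-hour weights below are hypothetical values chosen only to illustrate the formula; the function name is also illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical example: three course marks weighted by credit hours.
marks = [70, 80, 90]
credit_hours = [2, 3, 5]

result = weighted_mean(marks, credit_hours)
print(result)  # (2*70 + 3*80 + 5*90) / 10 = 83.0
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean.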
Median
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]
Median position = $\frac{n + 1}{2}$ position in the ordered data
If the number of values is odd, the median is the middle number
If the number of values is even, the median is the average of
the two middle numbers
Note that $\frac{n + 1}{2}$ is not the value of the median, only the
position of the median in the ranked data
Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
[Figure: two dot plots; one on a 0 to 14 axis with Mode = 9, one on a 0 to 6 axis with No Mode]
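The properties listed above (no mode, several modes, categorical data) can be sketched in Python. The convention that data in which every value occurs once has "no mode" follows the slide; the function name and sample data are hypothetical.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        # Every value occurs exactly once: treated here as "no mode".
        return []
    # Counter keeps first-seen order, so ties come back in data order.
    return [value for value, count in counts.items() if count == top]

print(modes([5, 9, 9, 9, 12, 13]))        # one mode
print(modes([1, 2, 3, 4, 5, 6]))          # no mode
print(modes([1, 1, 2, 2, 3]))             # several modes
print(modes(["brown", "blue", "brown"]))  # categorical data
```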
Review Example
Consider five houses in Addis Ababa
House Prices:
5,000,000
3,500,000
1,500,000
1,000,000
900,000
Review Example: Summary Statistics
House Prices: 5,000,000; 3,500,000; 1,500,000; 1,000,000; 900,000
Sum = 11,900,000
Mean: (11,900,000/5) = 2,380,000
Median: middle value of ranked data = 1,500,000
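The summary statistics for the five house prices can be checked with Python's standard `statistics` module. This is only a sketch; variable names are illustrative.

```python
from statistics import mean, median

prices = [5_000_000, 3_500_000, 1_500_000, 1_000_000, 900_000]

ranked = sorted(prices)
# Position of the median in the ranked data: (n + 1) / 2.
position = (len(ranked) + 1) / 2

average = mean(prices)
middle = median(prices)

print(f"sum = {sum(prices):,}")
print(f"mean = {average:,}")
print(f"median position = {position}, median = {middle:,}")
```

The mean is pulled well above the median by the two most expensive houses.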
Quantiles:
1. Quartiles
2. Deciles and
3. Percentiles
Chapter Four
Measures of
Variation
Introduction
Dispersion refers to the variations of the items among
themselves
Objectives of Measuring Dispersion
To determine the reliability of an average
Example:
[Figure: dot plot on a 0 to 14 axis, smallest value 1 and largest value 14]
Range = 14 - 1 = 13
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two differently distributed data sets on a 7 to 12 axis]
Range = 12 - 7 = 5 for both data sets
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
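The sensitivity of the range to a single outlier can be reproduced directly from the two data sets above. A minimal sketch, using the usual definition of the range as largest value minus smallest value; names are illustrative.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

# The 24 values the two data sets share: eleven 1s, eight 2s, four 3s, one 4.
common = [1] * 11 + [2] * 8 + [3] * 4 + [4]

ending_in_5 = common + [5]
ending_in_120 = common + [120]

print(data_range(ending_in_5))    # 5 - 1
print(data_range(ending_in_120))  # 120 - 1
```

Changing one observation out of 25 changes the range from 4 to 119, while the other 24 values are identical.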
Mean Deviation and Coefficient of mean
deviation
Example:
[Figure: box plot split into four segments, each holding 25% of the data]
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
Interquartile range = Q3 - Q1 = 57 - 30 = 27
Quartile Formulas
(n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values,
so Q1 = 12.5
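The position rule above can be sketched in Python. The slide does not list the nine observations, so the data below are hypothetical, chosen so that the 2nd and 3rd ranked values are 12 and 13; the function name is also illustrative.

```python
def value_at_position(ranked, q):
    """Value at the q*(n + 1) position (1-based) of ranked data.

    A fractional position is resolved by linear interpolation between
    the two neighbouring ranked values.
    """
    n = len(ranked)
    pos = q * (n + 1)
    whole = int(pos)      # rank just below the position
    frac = pos - whole    # how far towards the next rank
    if whole < 1:
        return ranked[0]
    if whole >= n:
        return ranked[-1]
    return ranked[whole - 1] + frac * (ranked[whole] - ranked[whole - 1])

# Hypothetical ranked sample with n = 9.
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]

q1 = value_at_position(sample, 0.25)  # position 2.5
q2 = value_at_position(sample, 0.50)  # position 5
q3 = value_at_position(sample, 0.75)  # position 7.5
print(q1, q2, q3, "IQR =", q3 - q1)
```

Other quartile conventions exist (statistical software often uses different interpolation rules), so results can differ slightly from this (n + 1) rule.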
Population Variance
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Where μ = population mean
N = population size
$x_i$ = ith value of the variable x
Sample Variance
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Where $\bar{x}$ = arithmetic mean
n = sample size
$x_i$ = ith value of the variable x
Population Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Sample standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Calculation Example:
Sample Standard Deviation
Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
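The calculation for this sample can be carried out step by step. The sketch below follows the sample standard deviation formula (divisor n - 1) and cross-checks the result against `statistics.stdev`; variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
x_bar = sum(data) / n

# Sum of squared deviations from the sample mean.
squared_deviations = [(x - x_bar) ** 2 for x in data]
ss = sum(squared_deviations)

sample_variance = ss / (n - 1)
s = sqrt(sample_variance)

print(f"n = {n}, mean = {x_bar}, sum of squares = {ss}")
print(f"s^2 = {sample_variance:.4f}, s = {s:.4f}")

# The standard library uses the same n - 1 divisor.
assert abs(s - stdev(data)) < 1e-12
```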
[Figure: three dot plots (Data A, Data B, Data C) on an 11 to 21 axis, all with the same mean but different spread]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
Advantages of Variance and
Standard Deviation
Coefficient of variation:
$CV = \frac{s}{\bar{x}} \cdot 100\%$
Comparing Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
CV_A = (s / x̄) · 100% = ($5 / $50) · 100% = 10%
Stock B:
Average price last year = $100
Standard deviation = $5
CV_B = (s / x̄) · 100% = ($5 / $100) · 100% = 5%
Both stocks have the same standard deviation, but stock B is less
variable relative to its price
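The stock comparison can be expressed as a small Python function. This is a sketch of the coefficient of variation as standard deviation relative to the mean, in percent; the function and variable names are illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B

print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")
print("Less variable relative to price:", "B" if cv_b < cv_a else "A")
```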
Standard Score
The standard score (z-score) of a data value x is computed by
$z = \frac{x - \bar{x}}{s}$
Solution:
Mean = 8, sd = 3.8
z-score = (14 - 8)/3.8 ≈ 1.58
Skewness and Kurtosis
Skewness Kurtosis
Exercise
Consider the following data and answer the subsequent
questions.
3 5 2 4 6 2 7
Basic Probability
Concepts
Chapter Goals
After completing this chapter, you should be
able to:
Counting techniques
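[Figure: Venn diagram of events A and B with the intersection A ∩ B]
Important Terms
(continued)
[Figure: Venn diagram of the union A ∪ B]
Important Terms
(continued)
[Figure: Venn diagram of event A and its complement $\bar{A}$ in the sample space S]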
Examples
Let the Sample Space be the collection of all
possible outcomes of rolling one die:
S = [1, 2, 3, 4, 5, 6]
Let A = [2, 4, 6] and B = [4, 5, 6]
Complements:
$\bar{A}$ = [1, 3, 5]    $\bar{B}$ = [1, 2, 3]
Intersections:
A ∩ B = [4, 6]    $\bar{A}$ ∩ B = [5]
Unions:
A ∪ B = [2, 4, 5, 6]
A ∪ $\bar{A}$ = [1, 2, 3, 4, 5, 6] = S
Examples
(continued)
Mutually exclusive:
A and B are not mutually exclusive
The outcomes 4 and 6 are common to both
Collectively exhaustive:
A and B are not collectively exhaustive
A ∪ B does not contain 1 or 3
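The die example maps directly onto Python sets. In this sketch the events A = {2, 4, 6} and B = {4, 5, 6} are inferred from the complements listed in the example, which is an assumption; variable names are illustrative.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
A = {2, 4, 6}            # assumed: inferred from the complement [1, 3, 5]
B = {4, 5, 6}            # assumed: inferred from the complement [1, 2, 3]

A_complement = S - A
B_complement = S - B

intersection = A & B
union = A | B

# Mutually exclusive: no outcomes in common.
mutually_exclusive = A.isdisjoint(B)
# Collectively exhaustive: together the events cover the sample space.
collectively_exhaustive = union == S

print(A_complement, B_complement, intersection, A_complement & B, union)
print(mutually_exclusive, collectively_exhaustive)
```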
Counting Rules
The Addition Rule
If A ∩ B = Ø, then n(A ∪ B) = n(A) + n(B)
If A1, A2, . . . , Ak are k pair-wise mutually exclusive
events, then n(A1∪A2 ∪ · · · ∪Ak )=∑ n(Ai)
For any events A & B, n (A ∪ B)=n (A) + n (B) – n(A∩ B).
Example
Rule 1
Example
K = 6; n = 7. Thus, Total = $K^n = 6^7$ = 279,936
The Multiplication Rule …
Rule-2
Example
K1 = 8; K2 = 6; K3 = 3
Total = 8 x 6 x 3 = 144
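Both counting examples can be checked by brute-force enumeration. The sketch below assumes Rule 1 means K equally available choices on each of n trials (K^n outcomes), which is inferred from the example rather than stated in the slide; variable names are illustrative.

```python
from itertools import product
from math import prod

# Rule 1 (assumed form): K choices repeated on n trials gives K ** n outcomes.
K, n = 6, 7
total_rule1 = K ** n
enumerated_rule1 = sum(1 for _ in product(range(K), repeat=n))

# Rule 2: K1, K2, K3 choices at successive stages multiply together.
stages = [8, 6, 3]
total_rule2 = prod(stages)
enumerated_rule2 = sum(1 for _ in product(*(range(k) for k in stages)))

print(total_rule1, enumerated_rule1)
print(total_rule2, enumerated_rule2)
```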
Permutation
An arrangement of n objects in a specific order.
• Factorial: n! = n x (n – 1) x (n – 2) x ... x 1
Note that 1! = 0! = 1 by definition.
The number of permutations of n objects taken all
together is given by nPn (read as n permutation n) = n!
The number of permutations of n objects taken r at a time is
$nPr = \frac{n!}{(n - r)!}$
Permutation . . .
Example
1. Suppose that a photographer must arrange three
people in a row for a photograph. How many different
possible ways can the arrangement be done?
n=3
Since the photo is going to be taken all together, the
total possibility is given by: 3P3 = 3! = 3 x 2 x 1 = 6
Permutation . . .
n = 7; r = 4
Total number of ways = 7P4 = 7!/ (7-4)!
= 7x6x5x4 = 840
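The permutation counts in these examples can be verified with `math.perm` (available from Python 3.8) and by listing the arrangements. The three names below are hypothetical stand-ins for the people in the photograph.

```python
from itertools import permutations
from math import factorial, perm

# Three people in a row: 3P3 = 3!
people = ["Abebe", "Sara", "Yonas"]  # hypothetical names
arrangements = list(permutations(people))
three_p_three = perm(3, 3)

# n = 7, r = 4: 7P4 = 7! / (7 - 4)!
seven_p_four = perm(7, 4)
by_formula = factorial(7) // factorial(7 - 4)

print(len(arrangements), three_p_three)
print(seven_p_four, by_formula)
```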
Combination
$nCr = \frac{n!}{(n - r)! \, r!} = \frac{nPr}{r!}$
Combination . . .
Example
1. Suppose you plan to invest equal amounts of money in
each of five business areas. If you have 20 business
areas from which to make the selection, how many
different samples of five business area can be selected
from the 20?
n = 20; r = 5;
Total = 20C5 = 20!/((20-5)! x 5!)
= 20!/(15! x 5!)
= 15,504
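The business-area example can be checked with `math.comb` (Python 3.8 or later). The sketch also confirms the relation nCr = nPr / r!; variable names are illustrative.

```python
from math import comb, factorial, perm

n, r = 20, 5

# Number of ways to choose r of n items when order does not matter.
total = comb(n, r)

# Same count from the factorial formula and from nPr / r!.
from_factorials = factorial(n) // (factorial(n - r) * factorial(r))
from_permutations = perm(n, r) // factorial(r)

print(total, from_factorials, from_permutations)
```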
Exercise
1. How many different 7-character license plates are
possible if the first 3 places are to be occupied by
letters and the final 4 by numbers?
2. In the above exercise, how many license plates
would be possible if repetition among letters or
numbers were prohibited?
3. Assume there are 10 men and 8 women on the staff of a
mathematics department. In how many ways can a
committee consisting of 4 men and 2 women be
selected?
Probability of an Event
[Figure: probability scale, with 0 marking an impossible event]
Four approaches to calculate a probability of an
event
1. The classical approach
2. The frequentist approach
3. The axiomatic approach and
4. The subjective approach
Assessing Probability
There are three approaches to assessing the
probability of an uncertain event:
1. Classical Probability
probability of event A = $\frac{N_A}{N}$ = (number of outcomes that satisfy the event) / (total number of outcomes in the sample space)
3. Subjective probability
an individual opinion or belief about the probability of occurrence
(the notation means that the summation is over all the basic outcomes in A)
3. P(S) = 1
$P(CD \mid AC) = \frac{P(CD \cap AC)}{P(AC)} = \frac{.2}{.7} = .2857$
Example 2
To study the proportion of smokers by sex in a
population, a random sample of 200 persons was
taken. The following table shows the result.
Sex       Non-Smoker   Smoker   Total
Male          64          16       80
Female        42          78      120
Total        106          94      200
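The questions for this example are not shown in the slide, so the sketch below only illustrates the kinds of probabilities the table supports: a marginal, a joint and a conditional probability, plus an independence check. Variable names are illustrative.

```python
# Cell counts from the table: (sex, smoking status) -> number of persons.
counts = {
    ("Male", "Non-Smoker"): 64,
    ("Male", "Smoker"): 16,
    ("Female", "Non-Smoker"): 42,
    ("Female", "Smoker"): 78,
}
n = sum(counts.values())

# Marginal probabilities: add the cells in a row or column.
p_male = (counts[("Male", "Non-Smoker")] + counts[("Male", "Smoker")]) / n
p_smoker = (counts[("Male", "Smoker")] + counts[("Female", "Smoker")]) / n

# Joint probability: one cell divided by the grand total.
p_male_and_smoker = counts[("Male", "Smoker")] / n

# Conditional probability: P(Smoker | Male) = P(Male and Smoker) / P(Male).
p_smoker_given_male = p_male_and_smoker / p_male

# Independence would require P(Male and Smoker) = P(Male) * P(Smoker).
independent = abs(p_male_and_smoker - p_male * p_smoker) < 1e-9

print(p_male, p_smoker, p_male_and_smoker, p_smoker_given_male, independent)
```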
P(AC) = 0.7, P(CD) = 0.4
P(AC)P(CD) = (0.7)(0.4) = 0.28
Since P(CD ∩ AC) = 0.2 ≠ 0.28, the events AC and CD are not statistically independent
Probability
Distribution
Chapter Overview
Random Variables
Random Variable
Represents a possible numerical value from a
random experiment
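Random variables are either discrete random variables or continuous random variables
Experiment: Toss 2 coins. Let x = # of heads.
4 possible outcomes: T T, T H, H T, H H
Probability Distribution
x Value   Probability
0         1/4 = .25
1         2/4 = .50
2         1/4 = .25
[Figure: bar chart of Probability (.25, .50, .25) against x = 0, 1, 2]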
Probability Distribution
Required Properties
$\sum_x P(x) = 1$
If X is continuous:
$E(X) = \int x f(x) \, dx$
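Expected Value (or mean) of a discrete
distribution (Weighted Average)
$\mu = E(x) = \sum_x x P(x)$
Example: Toss 2 coins, x = # of heads
x    P(x)
0    .25
1    .50
2    .25
Variance of a discrete random variable:
$\sigma^2 = E[(X - \mu)^2] = \sum_x (x - \mu)^2 P(x)$
Standard deviation:
$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_x (x - \mu)^2 P(x)}$
Standard Deviation Example
$\sigma = \sqrt{\sum_x (x - \mu)^2 P(x)}$
Variance of a linear function Y = a + bX:
$\sigma_Y^2 = Var(a + bX) = b^2 \sigma_X^2$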
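These formulas can be applied to the two-coin distribution (x = number of heads, with probabilities .25, .50, .25). The sketch below also checks the rule for a linear function Y = a + bX; the constants a = 3 and b = 2 are hypothetical, as are the variable names.

```python
from math import sqrt

# Distribution of x = number of heads in two coin tosses.
dist = {0: 0.25, 1: 0.50, 2: 0.25}

def expected_value(d):
    """mu = sum of x * P(x)."""
    return sum(x * p for x, p in d.items())

def variance(d):
    """sigma^2 = sum of (x - mu)^2 * P(x)."""
    mu = expected_value(d)
    return sum((x - mu) ** 2 * p for x, p in d.items())

mu = expected_value(dist)
var = variance(dist)
sigma = sqrt(var)

# Linear function Y = a + bX with hypothetical constants.
a, b = 3, 2
dist_y = {a + b * x: p for x, p in dist.items()}
var_y = variance(dist_y)

print(f"mu = {mu}, variance = {var}, sigma = {sigma:.4f}")
print(f"Var(a + bX) = {var_y}, b^2 * Var(X) = {b ** 2 * var}")
```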
Probability Distributions
- Discrete probability distributions: Binomial, Poisson
- Continuous probability distributions: Normal
The Binomial Distribution
Binomial Probability Distribution
A fixed number of observations, n
e.g., 15 tosses of a coin
Two mutually exclusive and collectively exhaustive
categories
e.g., head or tail in each toss of a coin; defective or not
defective light bulb
- Generally called “success” and “failure”
- Probability of success is P , probability of failure is 1 – P
Constant probability for each observation
e.g., Probability of getting a tail is the same each time we toss the coin
Observations are independent
The outcome of one observation does not affect the
outcome of the other
Possible Binomial Distribution
Settings
Binomial distribution formula:
$P(x) = \frac{n!}{x! \, (n - x)!} P^x (1 - P)^{n - x}$
P(x) = probability of x successes in n trials,
with probability of success P on each trial
x = number of 'successes' in sample (x = 0, 1, 2, ..., n)
n = sample size (number of trials or observations)
P = probability of "success"
Example: Flip a coin four times, let x = # heads:
n = 4
P = 0.5
1 - P = (1 - 0.5) = 0.5
x = 0, 1, 2, 3, 4
Example:
Calculating a Binomial Probability
What is the probability of one success in five
observations if the probability of success is 0.1?
x = 1, n = 5, and P = 0.1
$P(x = 1) = \frac{n!}{x! \, (n - x)!} P^x (1 - P)^{n - x}$
$= \frac{5!}{1! \, (5 - 1)!} (0.1)^1 (1 - 0.1)^{5 - 1}$
$= (5)(0.1)(0.9)^4$
$= .32805$
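The same calculation in Python, as a minimal sketch of the binomial formula using `math.comb` for n!/(x!(n - x)!). The function and variable names are illustrative; the mean and variance lines use the standard results nP and nP(1 - P).

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# One success in five observations with success probability 0.1.
p_one = binomial_pmf(1, 5, 0.1)

# Full distribution for four coin flips, x = number of heads.
coin = [binomial_pmf(x, 4, 0.5) for x in range(5)]

# Mean, variance and standard deviation for n = 5, P = 0.1.
n, p = 5, 0.1
mean = n * p
var = n * p * (1 - p)
sd = sqrt(var)

print(f"P(x = 1) = {p_one:.5f}")
print("coin flips:", coin)
print(f"mean = {mean}, variance = {var:.2f}, sd = {sd:.4f}")
```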
Binomial Distribution
Mean and Variance
Mean
$\mu = E(x) = nP$
Variance and Standard Deviation
$\sigma^2 = nP(1 - P)$
$\sigma = \sqrt{nP(1 - P)}$
Where n = sample size
P = probability of success
(1 – P) = probability of failure
Exercise
The Poisson Distribution
$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}$
where:
x = number of successes per unit
λ = expected number of successes per unit
e = base of the natural logarithm system (2.71828...)
Poisson Distribution
Characteristics
Mean
$\mu = E(x) = \lambda$
Variance and Standard Deviation
$\sigma^2 = E[(X - \mu)^2] = \lambda$
$\sigma = \sqrt{\lambda}$
where λ = expected number of successes per unit
Example
1. If x is a Poisson random variable with mean λ = 2,
find P(x = 0).
Solution
λ = 2
$P(x = 0) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-2} 2^0}{0!} = 0.135$
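Example
Given: λ = 5, Poisson distribution
Required: a) P(x = 10) = ?   b) P(x > 3) = ?
Solution:
a) $P(x = 10) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-5} 5^{10}}{10!} = 0.018$
b) P(x > 3) = 1 - P(x ≤ 3) = 1 - [P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3)]
$P(x = 0) = \frac{e^{-5} 5^0}{0!} = 0.0067$
$P(x = 1) = \frac{e^{-5} 5^1}{1!} = 0.0337$
$P(x = 2) = \frac{e^{-5} 5^2}{2!} = 0.0842$
$P(x = 3) = \frac{e^{-5} 5^3}{3!} = 0.1404$
=> P(x > 3) = 1 - (0.0067 + 0.0337 + 0.0842 + 0.1404) = 0.7350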
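The Poisson examples can be checked numerically. The sketch below implements the probability function directly and computes the upper tail P(x > 3) as 1 - P(x ≤ 3), where P(x ≤ 3) sums the terms for x = 0, 1, 2 and 3. Function and variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = e^(-lam) * lam^x / x! for a Poisson random variable."""
    return exp(-lam) * lam ** x / factorial(x)

# Mean 2: probability of zero successes.
p_zero = poisson_pmf(0, 2)

# Mean 5: probability of exactly ten successes.
p_ten = poisson_pmf(10, 5)

# Mean 5: P(x > 3) = 1 - [P(0) + P(1) + P(2) + P(3)].
terms = [poisson_pmf(k, 5) for k in range(4)]
p_more_than_three = 1 - sum(terms)

print(f"P(x = 0 | lambda = 2)  = {p_zero:.4f}")
print(f"P(x = 10 | lambda = 5) = {p_ten:.4f}")
print("terms for x = 0..3:", [round(t, 4) for t in terms])
print(f"P(x > 3 | lambda = 5)  = {p_more_than_three:.4f}")
```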
Exercise
Suppose the Addis Ababa Police Commission receives 15 phone calls
per day on average. Assume that the number of phone calls per
day follows a Poisson probability distribution. Find the probability
that:
a) There is no phone call on a given day.
Normal Distribution
Student’s t- distribution
Exponential distribution
Normal Random Variables
The most important type of random variable is the normal
random variable
The normal probability distribution is characterized by two
parameters: the mean and the variance
The formula for the normal probability density function is
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2 / (2\sigma^2)}$
Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
μ = the population mean
σ = the population standard deviation
x = any value of the continuous variable, $-\infty < x < \infty$
The Normal Distribution
(continued)
'Bell Shaped'
Symmetrical
Mean, Median and Mode are Equal
Location is determined by the mean, μ
Spread is determined by the standard deviation, σ
The random variable has an infinite theoretical range:
+∞ to -∞
[Figure: bell-shaped curve f(x) centred at μ with spread σ; Mean = Median = Mode]
Many Normal Distributions
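F(b) = P(X < b)
[Figure: area under the normal curve to the left of b]
F(a) = P(X < a)
[Figure: area under the normal curve to the left of a]
[Figure: area under the normal curve between a and b]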
The Standardized Normal
Any normal distribution (with any mean and variance
combination) can be transformed into the
standardized normal distribution (Z), with mean 0
and variance 1
Z ~ N(0, 1)
[Figure: standardized normal curve f(Z), centred at 0 with standard deviation 1]
Need to transform X units into Z units by subtracting the
mean of X and dividing by its standard deviation
$Z = \frac{X - \mu}{\sigma}$
General procedure to read
Probability from Z table
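Example:
P(Z < 2.00) = .9772
[Figure: standardized normal curve with area .9772 to the left of Z = 2.00]
The Standardized Normal Table
(continued)
Example:
P(Z < -2.00) = 1 - 0.9772
= 0.0228
[Figure: the area .0228 to the right of Z = 2.00 equals, by symmetry, the area to the left of Z = -2.00]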
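Finding Normal Probabilities
Suppose X is normal with mean 8.0 and
standard deviation 5.0. Find P(X < 8.6)
[Figure: normal curve with mean 8.0 and the area to the left of X = 8.6 shaded]
$Z = \frac{X - \mu}{\sigma} = \frac{8.6 - 8.0}{5.0} = 0.12$
[Figure: the X scale (μ = 8, σ = 5) mapped to the Z scale (μ = 0, σ = 1); X = 8.6 corresponds to Z = 0.12]
Standardized normal table (portion):
Z      F(Z)
.11    .5438
.12    .5478
.13    .5517
P(X < 8.6) = P(Z < 0.12) = .5478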
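The table lookup can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch assumes the stated mean 8.0 and standard deviation 5.0; variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 8.0, 5.0
x = 8.6

# Standardize: Z = (X - mu) / sigma.
z = (x - mu) / sigma

standard_normal = NormalDist()  # mean 0, standard deviation 1

# P(X < 8.6) computed on the Z scale and directly on the X scale.
p_via_z = standard_normal.cdf(z)
p_direct = NormalDist(mu, sigma).cdf(x)

# Upper tail: P(X > 8.6) = 1 - P(X < 8.6).
p_upper = 1 - p_direct

print(f"z = {z:.2f}")
print(f"P(X < 8.6) = {p_via_z:.4f} (via Z), {p_direct:.4f} (direct)")
print(f"P(X > 8.6) = {p_upper:.4f}")
print(f"P(Z < 2.00) = {standard_normal.cdf(2.0):.4f}")
```

Working on the Z scale and on the original X scale gives the same probability, which is the point of standardizing.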
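Upper Tail Probabilities
[Figure: normal curve with mean 8.0 and the area to the right of X = 8.6 shaded]
Upper Tail Probabilities
(continued)
P(Z > 0.12) = 1.0 - P(Z < 0.12) = 1.0 - 0.5478 = 0.4522
[Figure: total area 1.000 under the standardized normal curve, split at Z = 0.12 into 0.5478 (left) and 0.4522 (right)]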
Example
IQ examination scores for sixth-graders are normally
distributed with mean value 100 and standard
deviation 14.2.
1. What is the probability a randomly chosen sixth-grader has a
score greater than 130?
2. What is the probability a randomly chosen sixth-grader has a
score between 90 and 115?
Solution: Change the X values into Z-values
1. $P(X > 130) = P\left(Z > \frac{130 - 100}{14.2}\right) = P(Z > 2.11) = 1 - 0.9826 = 0.0174$
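Both parts of the IQ example follow the same pattern: convert each score to a Z-value, then take areas under the standard normal curve. A sketch using `statistics.NormalDist` (Python 3.8 or later) is given below; the computed values use the unrounded Z-values, so they can differ in the last digit from a table lookup with Z rounded to two decimals. Variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 100, 14.2
iq = NormalDist(mu, sigma)

# Part 1: P(X > 130) = 1 - P(X < 130).
z_130 = (130 - mu) / sigma
p_above_130 = 1 - iq.cdf(130)

# Part 2: P(90 < X < 115) = P(X < 115) - P(X < 90).
z_90 = (90 - mu) / sigma
z_115 = (115 - mu) / sigma
p_between = iq.cdf(115) - iq.cdf(90)

print(f"z for 130: {z_130:.2f}, P(X > 130) = {p_above_130:.4f}")
print(f"z for 90: {z_90:.2f}, z for 115: {z_115:.2f}, "
      f"P(90 < X < 115) = {p_between:.4f}")
```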
Exercise
Population and Sample
[Figure: a population (items a through z) and a sample drawn from it (b, c, g, i, n, o, r, u, y)]
Values calculated using population data are called parameters
Values computed from sample data are called statistics
Sampling Frame
Judgment sampling
Quota sampling
Convenience sampling
Probability Sampling
Simple Random Samples