
Basic Statistics

(Stat 2011)

By: Bedilu Alamirie (PhD)


Addis Ababa University, Ethiopia
April, 2021
Course Outline

 Suggested references
1. Bluman, A.G. (1995). Elementary Statistics: A Step by Step
Approach. Wm. C. Brown Communications, Inc.
2. Mekonnen Tadesse and Bedilu Alamirie (2018). Basic
Statistics. Aster Nega Publishing Enterprise.
Course Delivery and Evaluation

 Method of Course Delivery


- Class room lecturing
- Discussion
- Group Assignment
 Method of Evaluation

- Continuous Assessment (50%)

- Final Exam (50%)

 Contact Information

- Email: [email protected]
- Office: 103, Freshman building
Chapter one

INTRODUCTION
Chapter 1 Goals
After completing this chapter, you should be able
to:
 Define statistics
 Explain key definitions:
 Population vs. Sample
 Parameter vs. Statistic
 Explain the difference between Descriptive and Inferential
statistics
 Stages in Statistical Investigation
 Identify types of data and levels of measurement scale
Definition of statistics

What is statistics?
 Currently, the word STATISTICS is used in two senses:
1) In its plural sense - Numerical data
2) In its singular sense - a subject/science which
studies principles and methods employed in the
collection, presentation, analysis and interpretation
of data.
 Statistics as a subject is the study of making sense of
data in describing certain situations.
Cont’d

 Data
- Facts/figures from which conclusions are drawn
- Raw materials of statistics

 Example: Attrition Rate of Students by College


Definitions of basic terms
 A population is the collection of all items of interest or under
investigation
 N represents the population size
 A sample is an observed subset of the population
 n represents the sample size

 A parameter is a specific characteristic of a population

 A statistic is a specific characteristic of a sample


Population vs. Sample

[Figure: a population of items labeled a to z, and a sample drawn from it containing the items b, c, g, i, n, o, r, u and y]

Values calculated using population data are called parameters.
Values computed from sample data are called statistics.
Cont’d
 Main goal of study: Making statements about a
population by examining sample results

Sample statistics (known) → Inference → Population parameters (unknown, but can be estimated from sample evidence)

 A trial that is too small may not be representative of the
population
 Large trials are costly
Why Sample?
 Less time consuming than a census

 Less costly to administer than a census

 Require less manpower to execute.

 It is possible to obtain statistical results of a


sufficiently high precision based on samples.
Classifications of Statistics

Two branches of statistics:


 Descriptive statistics
 Collecting, summarizing, and processing data to
transform data into information

 Inferential statistics
 Provide the bases for predictions, forecasts, and
estimates that are used to transform information into
knowledge
Descriptive Statistics

 Collect data
 e.g., Survey

 Present data
 e.g., Tables and graphs

 Summarize data
 e.g., Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics

 Making statements about a population by


examining sample results
Sample statistics (known) → Inference → Population parameters (unknown, but can be estimated from sample evidence)
Inferential Statistics
 Estimation
 e.g., Estimate the population
mean weight using the sample
mean weight
 Hypothesis testing
 e.g., Test the claim that the
population mean weight of
statistics students is 58 Kg

Inference is the process of drawing conclusions or


making decisions about a population based on
sample results
Stages in Statistical Investigation

1. Identify the Problem (begin here)
2. Data
3. Organization & Presentation (tools: Descriptive Statistics, Probability, Computers)
4. Analysis & Interpretation (tools: Experience, Theory, Literature, Inferential Statistics, Computers)
5. Decision
Application and Limitation of Statistics
 Uses of statistics
1. To represent the facts in the form of numerical
data.
2. To summarize a mass of data into a few
presentable, understandable and precise figures.
3. To easily compare summarized figures.
4. To predict or forecast future trends.
5. To help select a course of action among a number
of alternatives.
6. To help in formulating policies.
Limitations of Statistics
1. It does not study qualitative characteristics directly,
e.g., beauty, honesty, and standard of living

2. It does not study a single individual but deals with
aggregates of facts.

3. It is open to misuse

Example: The number of car accidents caused in a city in a
particular year by women drivers is 5, while the number caused by
men drivers is 20. Hence women drivers are safer drivers.

 What if there are only 5 women drivers in that city?


Types of Data
Data
- Qualitative/Categorical (defined categories or groups)
  Examples: Marital status; Are you registered to vote?; Eye color
- Quantitative/Numerical
  - Discrete (counted items)
    Examples: Number of children; Number of laptops
  - Continuous (measured characteristics)
    Examples: Weight; Voltage
Measurement scales for variables
Quantitative Data
- Ratio Data: differences between measurements, true zero exists
- Interval Data: differences between measurements but no true zero
Qualitative Data
- Ordinal Data: ordered categories (rankings, order, or scaling)
- Nominal Data: categories (no ordering or direction)
Data type summary
Exercise
 Classify the following as nominal, ordinal, interval or ratio
data.

1. Ethnic group
2. Marital status
3. Health status: very sick, sick and cured
4. Data on temperature
5. Height of students in a college
6. Age of employees in a company
7. Student mark
Chapter Two
Data Presentation
Sources of data and methods of data collection

 Aggregated data are statistical data if they are


1. Comparable
2. Meaningful and
3. Collected for a well defined objective
 The required data can be obtained from either a

primary source or a secondary source.


 Methods of data collection

1. Observation or measurement
2. Interviews and questionnaires
3. The use of documentary sources
Example: Consider Commercial Bank of Ethiopia (CBE) data
 Raw data do not facilitate the decision-making process!

How will the CBE manager use these data for decision making?
Frequency Distributions

Frequency is the number of times a certain value or
set of values occurs in a specific group.
A frequency distribution is a table that presents data
according to some criteria with the corresponding
number of items falling in each class.
Example: Biology student age distribution

Age Group    Frequency
10 - 20      5
20 - 30      69
30 - 40      20
Types of Frequency Distribution
 Ungrouped frequency distribution
- Is a table of all potential values that could possibly occur in the
data collection along with their corresponding frequencies.
Example: Consider age of 20 students who read in library last night
30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30,
35, 37, 32, 30, and 41.
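The ungrouped table for the library example above can be tallied mechanically. The following is a minimal sketch, assuming the 20 ages are held in a plain Python list; the variable names are illustrative only.

```python
from collections import Counter

# Ages of the 20 students from the library example.
ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

# Count how many times each distinct age occurs.
frequency = Counter(ages)

# Print the ungrouped frequency table in ascending order of age.
print("Age  Frequency")
for age in sorted(frequency):
    print(f"{age:>3}  {frequency[age]:>9}")
```

Each distinct value gets its own row, which is practical here only because the ages span a narrow range.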
Grouped Frequency
Distribution

 When the range of the data is large, the data must be
grouped into classes
 Example: Marks of students in a class

89, 17, 21, 100, 11, 3, 90, 45, 41, 67, 87, 34, 69, 3, 39, 63,
41, 57, 53, 12, 79, 91, 42, 100, 62, 73, 1, 38, 56, 45, 25,
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13,
12, 38, 41, 43, 44, 27, 53, 27 …
 We need to summarize the data to make them meaningful
Class Intervals
and Class Boundaries

 Each class grouping has the same width
 Determine the width of each interval by

$$w = \text{interval width} = \frac{\text{largest number} - \text{smallest number}}{\text{number of desired intervals}}$$

 Use at least 5 but no more than 15-20 intervals
 Intervals never overlap
 Round up the interval width to get desirable
interval endpoints
Frequency Distribution Example

Example: A biologist at AAU randomly selects 20
winter days and counts the number of different
species in different locations

24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Frequency Distribution Example
(continued)

 Sort raw data in ascending order:


12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

 Steps to do Frequency Distribution


1. Find range: 58 - 12 = 46
2. Select number of classes: 5 (usually between 5 and 15)
3. Compute interval width: 10 (46/5 = 9.2, then round up)
4. Determine interval boundaries: 10 but less than 20, 20 but
less than 30, . . . , 50 but less than 60
5. Count observations & assign to classes


Frequency Distribution Example
(continued)
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Interval               Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100
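The counting step behind the table above can be sketched as follows. The helper `grouped_frequency` is a hypothetical name introduced for illustration; it assumes classes of the form "lower but less than upper", with the starting point 10 and width 10 chosen in the worked example.

```python
# The 20 observations from the frequency distribution example.
data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

def grouped_frequency(values, start, width, n_classes):
    """Count the values falling in each class [lower, lower + width)."""
    rows = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        count = sum(1 for v in values if lower <= v < upper)
        rows.append((lower, upper, count))
    return rows

table = grouped_frequency(data, start=10, width=10, n_classes=5)
n = len(data)
for lower, upper, count in table:
    # Relative frequency is count / n; percentage is 100 * count / n.
    print(f"{lower} but less than {upper}: frequency={count}, "
          f"relative={count / n:.2f}, percentage={100 * count / n:.0f}")
```

Half-open classes guarantee that intervals never overlap, so a value such as 30 is counted once, in the class that starts at 30.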
The Cumulative
Frequency Distribution
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20    3           15           3                      15
20 but less than 30    6           30           9                      45
30 but less than 40    5           25           14                     70
40 but less than 50    4           20           18                     90
50 but less than 60    2           10           20                     100
Total                  20          100
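The cumulative columns are running totals of the class frequencies. A short sketch, assuming the five class frequencies from the table are already known:

```python
from itertools import accumulate

# Class frequencies for the five classes, in order.
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

# Running total of frequencies ("less than" cumulative frequency).
cumulative = list(accumulate(frequencies))

# Each cumulative frequency expressed as a percentage of the total.
cumulative_pct = [100 * c / total for c in cumulative]

print(cumulative)       # [3, 9, 14, 18, 20]
print(cumulative_pct)   # [15.0, 45.0, 70.0, 90.0, 100.0]
```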
Common terms in Frequency Distribution table
Steps for constructing Grouped frequency Distribution
Graphical
Presentation of Data

 Data in raw form are usually not easy to use


for decision making
 Some type of organization is needed
 Table

 Graph

 The type of graph to use depends on the


variable/data type being summarized
Graphical
Presentation of Data
 Techniques reviewed in this Section:

Categorical Variables
• Frequency distribution
• Bar chart
• Pie chart
• Pictograms

Numerical Variables
• Line chart
• Frequency distribution
• Histogram and ogive
• Frequency polygon
Tables and Graphs for
Categorical Variables
Categorical Data
- Tabulating Data: Frequency Distribution Table
- Graphing Data: Bar Chart, Pie Chart, Pictograms
The Frequency
Distribution Table
Summarize data by category

Example: Hospital Patients by Unit


Hospital Unit     Number of Patients
Cardiac Care      1,052
Emergency         2,245
Intensive Care    340
Maternity         552
Surgery           4,630
(Variables are categorical)
Exercise
 The following data are taken from the medical
records department at a certain hospital. The data
include the blood type and gender(in bracket) of
patients.

 Construct a frequency distribution for the variable


blood type.
Bar and Pie Charts

 Bar charts and Pie charts are often used


for qualitative (category) data

 Height of bar or size of pie slice shows the


frequency or percentage for each
category
Bar Chart Example

Hospital Unit     Number of Patients
Cardiac Care      1,052
Emergency         2,245
Intensive Care    340
Maternity         552
Surgery           4,630

[Bar chart: "Hospital Patients by Unit"; horizontal axis: hospital unit; vertical axis: number of patients per year, 0 to 5000]
Pie Chart Example

Hospital Unit     Number of Patients   % of Total
Cardiac Care      1,052                11.93
Emergency         2,245                25.46
Intensive Care    340                  3.86
Maternity         552                  6.26
Surgery           4,630                52.50

[Pie chart: "Hospital Patients by Unit"; Cardiac Care 12%, Emergency 25%, Intensive Care 4%, Maternity 6%, Surgery 53%]
(Percentages are rounded to the nearest percent)
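The "% of Total" column, which also sets the size of each pie slice, is each unit's count divided by the grand total. A minimal sketch, assuming the counts are stored in a dictionary keyed by unit name:

```python
# Patient counts by hospital unit, from the table above.
patients = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}

total = sum(patients.values())

# Share of each unit as a percentage of all patients, to two decimals.
shares = {unit: round(100 * count / total, 2)
          for unit, count in patients.items()}

for unit, pct in shares.items():
    print(f"{unit:<15} {pct:6.2f}%")
```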
Pictograms
 Represent the data by means of some picture
symbols
 Decide a suitable picture to represent a definite
number of units
Example: Number of patients in each department

Year               2000   2001   2002   2003
No. of students    2000   3000   5000   7000
Histogram Example

Interval               Frequency
10 but less than 20    3
20 but less than 30    6
30 but less than 40    5
40 but less than 50    4
50 but less than 60    2

[Histogram: "Daily High Temperature"; horizontal axis: temperature in degrees, class boundaries from 0 to 70; vertical axis: frequency; no gaps between bars]
Frequency Polygon
Example
 By considering the following histogram, explore what
the frequency polygon and cumulative frequency
polygons (less than and more than type) look like.
Solution:
Cumulative frequency polygon
Chapter Three
Measures of central Tendency
Objectives of Measures of Central
Tendency
 To determine a single value around which the
other data will concentrate
 To summarize/reduce the volume of the data
 To facilitate comparison within one group or
between groups of data
Desirable Properties of
measures of central tendency

 Be simple to understand and easy to
calculate/interpret,
 Exist and be unique,
 Be rigidly defined by a mathematical formula,
 Be based on all observations,
 Not be seriously affected by extreme
observations,
 Be capable of further statistical analysis
Describing Data Numerically
- Central Tendency: Arithmetic Mean, Median, Mode, Harmonic Mean, Geometric Mean
- Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
The Summation Notation (∑)

 General Notation

 Some Properties
Measures of Central Tendency
Overview
- Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Median: midpoint of ranked values
- Mode: most frequently observed value
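Arithmetic Mean
 The arithmetic mean (mean) is the most
common measure of central tendency
 For a population of N values:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

where the $x_i$ are the population values and N is the population size

 For a sample of size n:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

where the $x_i$ are the observed values and n is the sample size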
Arithmetic Mean
(continued)

 The most common measure of central tendency


 Mean = sum of values divided by the number of values
 Affected by extreme values (outliers)

Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
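The effect of a single extreme value on the mean can be checked directly. This is a small sketch; `arithmetic_mean` is an illustrative helper name, not part of the course material.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # the last value is replaced by an extreme value

print(arithmetic_mean(without_outlier))  # 3.0
print(arithmetic_mean(with_outlier))     # 4.0
```

Changing one observation out of five moves the mean from 3 to 4, which is why the mean is described as sensitive to outliers.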
Properties of Arithmetic mean

 The sum of the deviations of the items from their
arithmetic mean is zero: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$


Approximations for Grouped
Data
Suppose a data set contains values m1, m2, . . ., mK,
occurring with frequencies f1, f2, . . ., fK

 For a population of N observations the mean is

$$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}, \quad \text{where } N = \sum_{i=1}^{K} f_i$$

 For a sample of n observations, the mean is

$$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}, \quad \text{where } n = \sum_{i=1}^{K} f_i$$
Weighted Mean

 The weighted mean of a set of data is

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum w_i}$$

 Where wi is the weight of the ith observation
 Use when data is already grouped into n classes, with
wi values in the ith class
Example

 The following table presents the 4th-year biology
students' assessment results in different examinations.
Compute the average mark of student A.
Geometric Mean
Geometric mean for frequency
Distribution
Properties of Geometric Mean
Harmonic Mean
Median
 In an ordered list, the median is the “middle”
number (50% above, 50% below)

[Figures: two dot plots on a 0 to 10 axis; Median = 3 in both]

 Not affected by extreme values


Finding the Median

 The location of the median:

$$\text{Median position} = \frac{n+1}{2} \text{ position in the ordered data}$$

 If the number of values is odd, the median is the middle number
 If the number of values is even, the median is the average of
the two middle numbers

 Note that $\frac{n+1}{2}$ is not the value of the median, only the
position of the median in the ranked data
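The position rule above translates into a few lines of code. The sketch below assumes a plain list of numbers; the function name `median` is illustrative, and Python's zero-based indexing means the (n + 1)/2-th ranked value sits at index n // 2 when n is odd.

```python
def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th ranked value is at zero-based index n // 2.
        return ordered[middle]
    # Even n: average the two values on either side of position (n + 1)/2.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([1, 2, 3, 4, 5]))    # 3
print(median([1, 2, 3, 4, 10]))   # 3
print(median([1, 2, 3, 4]))       # 2.5
```

Replacing 5 with 10 leaves the median at 3, in contrast to the mean.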
Mode
 A measure of central tendency
 Value that occurs most often
 Not affected by extreme values
 Used for either numerical or categorical data
 There may be no mode
 There may be several modes

[Figures: dot plot on a 0 to 14 axis with Mode = 9; dot plot on a 0 to 6 axis with no mode]
Review Example
 Consider five houses in Addis Ababa

House Prices:

5,000,000
3,500,000
1,500,000
1,000,000
900,000
Review Example:
Summary Statistics

House Prices:
5,000,000
3,500,000
1,500,000
1,000,000
900,000
_____________
Sum 11,900,000

 Mean: 11,900,000/5 = 2,380,000
 Median: middle value of ranked data = 1,500,000
 Mode: most frequent value = None
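The three summary statistics for the house prices can be reproduced with Python's standard `statistics` module. This is a sketch for checking the slide's numbers, not a prescribed method; `statistics.multimode` returns every most-frequent value, so getting all five prices back means no price repeats.

```python
import statistics

# The five house prices from the review example.
prices = [5_000_000, 3_500_000, 1_500_000, 1_000_000, 900_000]

mean_price = statistics.mean(prices)      # sum of prices divided by 5
median_price = statistics.median(prices)  # middle value of the ranked prices
modes = statistics.multimode(prices)      # all values tied for most frequent

print(mean_price)    # 2380000
print(median_price)  # 1500000

# Every price occurs exactly once, so there is no single mode.
has_mode = len(modes) < len(prices)
print(has_mode)      # False
```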
Which measure of location
is the “best”?

 Mean is generally used, unless extreme


values (outliers) exist

 Then median is often used, since the median


is not sensitive to extreme values.
 Example: Median home prices may be reported for
a region – less sensitive to outliers
Exercise
 The number of suits sold daily by a women’s boutique for
the past 6 days has been arranged in the following
frequency table:
 Number of suits sold/day: 3 4 5
 Number of days: 2 1 3
a) What is the sample mean of the number of suits sold
daily?
b) What is the sample median of the number of suits sold
daily?
c) What is the mode of the number of suits sold daily?
Cont’d
 Consider the following hypothetical data and answer
the subsequent question

 1. What is the median age of patients in hospital C?


 2. Compute the mean and harmonic mean of patients
height in Hospital A
Reading Assignment

 Quantiles:
1. Quartiles
2. Deciles and
3. Percentiles
Chapter Four
Measures of
Variation
Introduction
 Dispersion refers to the variation of the items among
themselves
 Dispersion refers to the variation of the items around
an average
 If all items of a series are the same, there is no variation
among the items of the series
Objectives of Measuring Dispersion
 To determine the reliability of an average

 To compare the variability of two or more series

 For facilitating the use of other statistical


measures

 Basis of statistical quality control


Absolute and Relative Measures

 Absolute measures of dispersion


- Expressed in the same unit in which the original
data are given, e.g., kg, mg, tonnes
- Suitable for comparing the variability in two
distributions having variables expressed in the same
units and of the same average size.
 Relative measures of dispersion

- It is the ratio of a measure of absolute dispersion to

an appropriate average/selected items of the data


Range
 Simplest measure of variation
 Difference between the largest and the smallest
observations:
Range = Xlargest – Xsmallest

Example:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Range = 14 - 1 = 13
Disadvantages of the Range
 Ignores the way in which data are distributed

7 8 9 10 11 12 7 8 9 10 11 12
Range = 12 - 7 = 5 Range = 12 - 7 = 5

 Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
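The sensitivity of the range to a single outlier is easy to reproduce. A minimal sketch using the two data sets above; `value_range` is an illustrative helper name.

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

# Eleven 1s, eight 2s, four 3s, then the last two values.
common = [1] * 11 + [2] * 8 + [3] * 4
data_a = common + [4, 5]
data_b = common + [4, 120]   # identical except for the final value

print(value_range(data_a))   # 4
print(value_range(data_b))   # 119
```

Only one of the 25 observations differs, yet the range changes from 4 to 119.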
Mean Deviation and Coefficient of mean
deviation

 The mean or average deviation is defined by

 Coefficient of Mean deviation

Example: Consider the variable X with values 2,4,5,3,5


- Compute the mean deviation and coefficient mean
deviation for X?
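The defining formulas did not survive on this slide, so the sketch below rests on an assumption: that the mean deviation is taken about the arithmetic mean, MD = Σ|x_i − x̄| / n, and that the coefficient of mean deviation is MD divided by the mean. Other texts take deviations about the median instead; adjust accordingly if the course uses that convention.

```python
def mean_deviation(values):
    """Average absolute deviation from the arithmetic mean (assumed definition)."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

x = [2, 4, 5, 3, 5]
mean_x = sum(x) / len(x)             # 19 / 5 = 3.8

md = mean_deviation(x)               # (1.8 + 0.2 + 1.2 + 0.8 + 1.2) / 5
coefficient_md = md / mean_x         # assumed: mean deviation relative to the mean

print(round(md, 2))                  # 1.04
print(round(coefficient_md, 4))      # 0.2737
```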
Interquartile Range

 Can eliminate some outlier problems by using


the interquartile range

 Eliminate high- and low-valued observations


and calculate the range of the middle 50% of
the data

 Interquartile range = 3rd quartile – 1st quartile


IQR = Q3 – Q1
Interquartile Range

Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(25% of the data lie in each of the four intervals between these values)

Interquartile range
= 57 – 30 = 27
Quartile Formulas

Find a quartile by determining the value in the


appropriate position in the ranked data, where

First quartile position: Q1 = 0.25(n+1)

Second quartile position: Q2 = 0.50(n+1)


(the median position)

Third quartile position: Q3 = 0.75(n+1)

where n is the number of observed values


Quartiles

 Example: Find the first quartile


Sample Ranked Data: 11 12 13 16 16 17 18 21 22

(n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values,

so Q1 = 12.5
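The position formulas can be sketched in code as follows. The slide only shows the half-way case, so linear interpolation between neighbouring ranked values for other fractional positions is an assumption here, and `quartile_value` is a hypothetical helper name.

```python
def quartile_value(ranked, fraction):
    """Value at 1-based position fraction * (n + 1) in already-ranked data.

    Fractional positions are resolved by linear interpolation between the
    two neighbouring ranked values (an assumption beyond the slide's example).
    """
    position = fraction * (len(ranked) + 1)
    lower = int(position)          # rank of the value just below the position
    weight = position - lower      # how far to move toward the next value
    if lower < 1:
        return ranked[0]
    if lower >= len(ranked):
        return ranked[-1]
    return ranked[lower - 1] + weight * (ranked[lower] - ranked[lower - 1])

ranked = [11, 12, 13, 16, 16, 17, 18, 21, 22]

q1 = quartile_value(ranked, 0.25)   # position 2.5: halfway between 12 and 13
q2 = quartile_value(ranked, 0.50)   # position 5: the 5th ranked value
q3 = quartile_value(ranked, 0.75)   # position 7.5: halfway between 18 and 21

print(q1, q2, q3)                   # 12.5 16.0 19.5
print(q3 - q1)                      # interquartile range: 7.0
```

Statistical packages use several different quartile conventions, so results from software may differ slightly from this (n + 1) rule.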
Population Variance

 Average of squared deviations of values from the
mean
 Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Where μ = population mean
N = population size
xi = ith value of the variable x
Sample Variance

 Average (approximately) of squared deviations
of values from the mean
 Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

Where $\bar{x}$ = arithmetic mean
n = sample size
xi = ith value of the variable x
Population Standard Deviation
 Most commonly used measure of variation
 Shows variation about the mean
 Has the same units as the original data
 Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Sample Standard Deviation
 Most commonly used measure of variation
 Shows variation about the mean
 Has the same units as the original data
 Sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
Calculation Example:
Sample Standard Deviation
Sample Data (xi): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16

$$s = \sqrt{\frac{(10-\bar{x})^2 + (12-\bar{x})^2 + (14-\bar{x})^2 + \cdots + (24-\bar{x})^2}{n-1}}$$

$$= \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}}$$

$$= \sqrt{\frac{130}{7}} = 4.3095$$

A measure of the “average” scatter around the mean
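The calculation can be verified step by step. The sketch below applies the sample standard deviation formula (divisor n − 1) to the eight data values; the variable names are illustrative.

```python
import math

# Sample data from the calculation example.
data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the sample mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

# Sample variance divides by n - 1; the standard deviation is its square root.
variance = squared_deviations / (n - 1)
s = math.sqrt(variance)

print(mean)                 # 16.0
print(squared_deviations)   # sum of the eight squared deviations
print(round(s, 4))          # sample standard deviation
```

Printing the intermediate sum makes it easy to compare each step against a hand calculation.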
Measuring variation

Small standard deviation

Large standard deviation


Comparing Standard Deviations

[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
Advantages of Variance and
Standard Deviation

 Each value in the data set is used in the


calculation

 Values far from the mean are given extra


weight
(because deviations from the mean are squared)
Coefficient of Variation

 Measures relative variation
 Always in percentage (%)
 Shows variation relative to mean
 Can be used to compare two or more sets of
data measured in different units

$$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$$
Comparing Coefficient
of Variation
 Stock A:
 Average price last year = $50
 Standard deviation = $5

$$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

 Stock B:
 Average price last year = $100
 Standard deviation = $5

$$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price
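The stock comparison reduces to one small function. A minimal sketch; `coefficient_of_variation` is an illustrative name.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * standard_deviation / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A: s = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: s = $5, mean = $100

print(cv_a)  # 10.0
print(cv_b)  # 5.0
```

Because the units cancel in the ratio, the same function can compare series measured in different units.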
Standard Score
 The standard score (Z-score) of a data value x is computed by

$$Z = \frac{x - \bar{x}}{s}$$

 Example: What is the Z-score for the value of 14 in the
following sample data set?
3 8 6 14 4 12 7 10

Solution:
Mean = 8, sd = 3.8
 Z-score = (14 - 8)/3.8 = 1.57
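The solution can be checked with the standard `statistics` module, assuming (as the solution's numbers suggest) that "sd" means the sample standard deviation with divisor n − 1, which is what `statistics.stdev` computes.

```python
import statistics

data = [3, 8, 6, 14, 4, 12, 7, 10]

mean = statistics.mean(data)    # 64 / 8 = 8
sd = statistics.stdev(data)     # sample standard deviation, about 3.8

# Standard score: distance of the value from the mean in standard deviations.
z = (14 - mean) / sd

print(mean)           # 8
print(round(sd, 1))   # 3.8
print(round(z, 2))    # 1.57
```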
Skewness and Kurtosis
 Skewness  Kurtosis
Exercise
 Consider the following data and answer the subsequent
questions.
3 5 2 4 6 2 7

1. Compute the IQR.
2. Find the mean deviation.
3. Find the sample standard deviation.
4. Compute the coefficient of variation and interpret the result.
Chapter Five

Basic Probability
Concepts
Chapter Goals
After completing this chapter, you should be
able to:

 Define basic terms
 Explain basic probability concepts
 Apply counting techniques
 Apply the addition and multiplication rules of probability


Introduction . . .
 Random Experiment – a process leading to an
uncertain outcome.
• Outcome is the result of a single trial of an
experiment, e.g., a head when a coin is tossed, or a four when a die is rolled.
Important Terms

 Sample Space – the collection of all possible


outcomes of a random experiment

 Event – any subset of basic outcomes from the


sample space

 Basic Outcome – a possible outcome of a random


experiment
Important Terms
(continued)

 Intersection of Events – If A and B are two
events in a sample space S, then the
intersection, A ∩ B, is the set of all outcomes in
S that belong to both A and B

[Venn diagram: overlapping circles A and B; the overlap is A ∩ B]
Important Terms
(continued)

 A and B are Mutually Exclusive Events if they
have no basic outcomes in common
 i.e., the set A ∩ B is empty

[Venn diagram: circles A and B do not overlap]
Important Terms
(continued)

 Union of Events – If A and B are two events in a
sample space S, then the union, A ∪ B, is the
set of all outcomes in S that belong to either
A or B

[Venn diagram: the entire shaded area covered by circles A and B in S represents A ∪ B]
Important Terms
(continued)

 The Complement of an event A is the set of all
basic outcomes in the sample space that do not
belong to A. The complement is denoted $\bar{A}$

[Venn diagram: $\bar{A}$ is the region of S outside A]
Examples
Let the Sample Space be the collection of all
possible outcomes of rolling one die:

S = [1, 2, 3, 4, 5, 6]

Let A be the event “Number rolled is even”


Let B be the event “Number rolled is at least 4”
Then
A = [2, 4, 6] and B = [4, 5, 6]
Examples
(continued)

S = [1, 2, 3, 4, 5, 6]   A = [2, 4, 6]   B = [4, 5, 6]

Complements:
$\bar{A}$ = [1, 3, 5]   $\bar{B}$ = [1, 2, 3]

Intersections:
A ∩ B = [4, 6]   $\bar{A}$ ∩ B = [5]
Unions:
A ∪ B = [2, 4, 5, 6]
A ∪ $\bar{A}$ = [1, 2, 3, 4, 5, 6] = S
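The die example maps directly onto Python's built-in set type. In this sketch the names `A_complement` and `B_complement` are illustrative; the complement is taken relative to the sample space S.

```python
# Sample space and events for one roll of a die.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # number rolled is even
B = {4, 5, 6}   # number rolled is at least 4

# Complement: outcomes in S that do not belong to the event.
A_complement = S - A
B_complement = S - B

print(A_complement)        # {1, 3, 5}
print(B_complement)        # {1, 2, 3}
print(A & B)               # intersection: {4, 6}
print(A_complement & B)    # {5}
print(A | B)               # union: {2, 4, 5, 6}
print(A | A_complement == S)   # True
```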
Examples
(continued)

S = [1, 2, 3, 4, 5, 6] A = [2, 4, 6] B = [4, 5, 6]

 Mutually exclusive:
 A and B are not mutually exclusive
 The outcomes 4 and 6 are common to both

 Collectively exhaustive:
 A and B are not collectively exhaustive
 A U B does not contain 1 or 3
Counting Rules
The Addition Rule
 If A ∩ B = Ø, then n(A ∪ B) = n(A) + n(B)
 If A1, A2, . . . , Ak are k pair-wise mutually exclusive
events, then n(A1∪A2 ∪ · · · ∪Ak )=∑ n(Ai)
 For any events A & B, n (A ∪ B)=n (A) + n (B) – n(A∩ B).
 Example

 In a survey, 100 people are asked whether they drink or


smoke or do both or neither. The results are: 60 drink, 30
smoke, 20 do both and 30 do neither. Are these numbers
compatible with each other?
Counting Rules

N = 100; N(D) = 60; N(S) = 30; N(D ∩ S) = 20; N(D^c ∩ S^c) = 30
N(D ∪ S) = N(S) + N(D) – N(D ∩ S) = 30 + 60 – 20 = 70
N = N(D ∪ S) + N(D^c ∩ S^c) = 70 + 30 = 100
They are compatible!
The Addition . . .
If two events A and B are not mutually exclusive, then
P (A U B) = P (A) + P (B) – P (A and B)
Exercise
1. There are 80 nurses and 40 physicians in a hospital. Of
these, 70 nurses and 15 physicians are females. If a staff
person is selected at random, find the probability that the
subject is a nurse or male.
            Male   Female   Total
Nurse       10     70       80
Physician   25     15       40
Total       35     85       120

P(N ∪ M) = P(N) + P(M) – P(N ∩ M)
= 80/120 + 35/120 – 10/120 = 105/120
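The hospital staff exercise can be checked with exact fractions, which avoids rounding. A minimal sketch, assuming the counts from the table above:

```python
from fractions import Fraction

total_staff = 120

p_nurse = Fraction(80, total_staff)            # P(N)
p_male = Fraction(35, total_staff)             # P(M)
p_nurse_and_male = Fraction(10, total_staff)   # P(N and M): male nurses

# Addition rule for events that are not mutually exclusive.
p_nurse_or_male = p_nurse + p_male - p_nurse_and_male

print(p_nurse_or_male)          # 7/8
print(float(p_nurse_or_male))   # 0.875
```

Subtracting the overlap prevents the 10 male nurses from being counted twice.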
The Multiplication Rule

 Rule 1
 If each event in a sequence of n events has K
possibilities, then the total number of possibilities will
be $K \times K \times \cdots \times K = K^n$

 Example
Seven dice are rolled. How many different outcomes
are there?

K = 6; n = 7. Thus, Total = $6^7$ = 279,936
The Multiplication Rule …

Rule 2
• If a first event can occur in m ways and a second event
can occur in n ways, the total number of ways the two
events can occur in sequence is given by m x n.
The Multiplication Rule …

Example

There are 8 different statistics, 6 different calculus and 3


different physics books. A student must select one book
of each type. How many different ways can this be done?

K1 = 8; K2 = 6; K3 = 3
Total = 8 x 6 x 3 = 144
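Both multiplication rules come down to multiplying the number of possibilities at each step. A brief sketch covering the dice and book examples; `math.prod` requires Python 3.8 or later.

```python
import math

# Rule 1: n events with K possibilities each give K ** n outcomes.
dice_outcomes = 6 ** 7          # seven dice, six faces each

# Rule 2 (extended to three steps): multiply the choices at each step.
book_choices = math.prod([8, 6, 3])   # statistics, calculus, physics books

print(dice_outcomes)   # 279936
print(book_choices)    # 144
```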
Permutation
 An arrangement of n objects in a specific order.
• Factorial: n! = n x (n – 1) x (n – 2) x ... x 1
Note that 1! = 0! = 1 by definition.
 The number of permutations of n objects taken all
together is given by nPn (read as n permutation n) = n!
 The number of arrangements of n objects in a specific order using r
objects at a time is given by:

$$nPr = \frac{n!}{(n-r)!}$$
Permutation . . .
 Example
1. Suppose that a photographer must arrange three
people in a row for a photograph. How many different
possible ways can the arrangement be done?

n=3
Since the photo is going to be taken all together, the
total possibility is given by: 3P3 = 3! = 3 x 2 x 1 = 6
Permutation . . .

2. How many different four-letter permutations can be
formed from the letters in the word DECAGON?

n = 7; r = 4
Total number of ways = 7P4 = 7!/(7-4)!
= 7 x 6 x 5 x 4 = 840
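Both permutation examples can be confirmed with the standard library: `math.perm(n, r)` evaluates n!/(n − r)!, and `itertools.permutations` enumerates the arrangements themselves. This is a sketch for checking the arithmetic; `math.perm` requires Python 3.8 or later.

```python
import math
from itertools import permutations

# Example 1: arranging three people in a row, 3P3 = 3!.
photo_arrangements = math.factorial(3)

# Example 2: four-letter permutations from DECAGON, 7P4 = 7!/(7-4)!.
by_formula = math.perm(7, 4)

# Brute-force check: list every ordered selection of 4 of the 7 distinct letters.
by_enumeration = len(list(permutations("DECAGON", 4)))

print(photo_arrangements)   # 6
print(by_formula)           # 840
print(by_enumeration)       # 840
```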
Combination

 It is a counting technique in which the order of the
objects is immaterial
 The number of combinations of n objects taken r at a time is
given by nCr = n! / ((n-r)! r!)

Example: In a club containing 7 members a
committee of 3 people is to be formed. In how many
ways can the committee be formed?
Solution: 7C3 = 7! / ((7-3)! 3!) = 35
Combination
 A selection of objects without regard to order.
 Example: Given the letters A, B, C and D. List the permutations
and combinations for selecting two letters.
Permutations: AB; AC; AD; BA; BC; BD; CA; CB; CD; DA; DB; DC
Combinations: AB; AC; AD; BC; BD; CD
 The number of combinations of r objects selected from n objects
is

$$nCr = \frac{n!}{(n-r)!\,r!} = \frac{nPr}{r!}$$
Combination . . .
Example
1. Suppose you plan to invest equal amounts of money in
each of five business areas. If you have 20 business
areas from which to make the selection, how many
different samples of five business areas can be selected
from the 20?
n = 20; r = 5;
Total = 20C5 = 20!/((20-5)! x 5!)
= 20!/(15! x 5!)
= 15,504
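The combination examples can be checked the same way: `math.comb(n, r)` evaluates n!/((n − r)! r!), and `itertools.combinations` lists the unordered selections. A short sketch; `math.comb` and `math.perm` require Python 3.8 or later.

```python
import math
from itertools import combinations

# Committee of 3 from 7 club members.
committees = math.comb(7, 3)

# Five business areas chosen from 20.
investment_samples = math.comb(20, 5)

# Two-letter selections from A, B, C, D where order does not matter.
pairs = ["".join(pair) for pair in combinations("ABCD", 2)]

print(committees)           # 35
print(investment_samples)   # 15504
print(pairs)                # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']

# Relationship between the two counts: nCr = nPr / r!.
print(math.perm(4, 2) // math.factorial(2) == len(pairs))   # True
```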
Exercise
1. How many different 7-character license plates are
possible if the first 3 places are to be occupied by
letters and the final 4 by numbers?
2. In the above exercise, how many license plates
would be possible if repetition among letters or
numbers were prohibited?
3. Assume there are 10 men and 8 women on the staff of the
mathematics department. In how many ways can a
committee consisting of 4 men and 2 women be
selected?
Probability of an Event

 Probability – the chance that
an uncertain event will occur
(always between 0 and 1)

0 ≤ P(A) ≤ 1 for any event A

[Scale: 0 = impossible, .5, 1 = certain]
Cont’d
 Four approaches to calculate a probability of an
event
1. The classical approach
2. The frequentist approach
3. The axiomatic approach and
4. The subjective approach
Assessing Probability
 There are four approaches to assessing the
probability of an uncertain event:

1. Classical Probability

$$\text{probability of event } A = \frac{N_A}{N} = \frac{\text{number of outcomes that satisfy the event}}{\text{total number of outcomes in the sample space}}$$

 Assumes all outcomes in the sample space are equally likely to
occur

Stat 2181 By: Bedilu A


Assessing Probability
Four approaches (continued)
2. Relative frequency probability

$$\text{probability of event } A = \frac{n_A}{n} = \frac{\text{number of events in the population that satisfy event } A}{\text{total number of events in the population}}$$

 the limit of the proportion of times that an event A occurs in a large
number of trials, n

3. Subjective probability
an individual opinion or belief about the probability of occurrence


4. Axiomatic Approach/Probability Postulates

1. If A is any event in the sample space S, then

$$0 \le P(A) \le 1$$

2. Let A be an event in S, and let $O_i$ denote the basic
outcomes. Then

$$P(A) = \sum_{A} P(O_i)$$

(the notation means that the summation is over all the basic outcomes in A)

3. P(S) = 1


Probability Rules

 The Complement rule:

$$P(\bar{A}) = 1 - P(A), \quad \text{i.e., } P(A) + P(\bar{A}) = 1$$

 The Addition rule:
 The probability of the union of two events is

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$


Conditional Probability
 A conditional probability is the probability of one
event, given that another event has occurred:

The conditional probability of A given that B has occurred:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

The conditional probability of B given that A has occurred:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
Conditional Probability Example

 Of the cars on a used car lot, 70% have air


conditioning (AC) and 40% have a CD player
(CD). 20% of the cars have both.

 What is the probability that a car has a CD


player, given that it has AC ?

i.e., we want to find P(CD | AC)


Conditional Probability Example
(continued)
 Of the cars on a used car lot, 70% have air conditioning
(AC) and 40% have a CD player (CD).
20% of the cars have both.
        CD    No CD   Total
AC      .2    .5      .7
No AC   .2    .1      .3
Total   .4    .6      1.0

$$P(CD \mid AC) = \frac{P(CD \cap AC)}{P(AC)} = \frac{.2}{.7} = .2857$$
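The used car calculation is a single division of the joint probability by the probability of the conditioning event. A minimal sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

p_ac = Fraction(7, 10)          # P(AC): car has air conditioning
p_cd_and_ac = Fraction(2, 10)   # P(CD and AC): car has both

# Conditional probability: P(CD | AC) = P(CD and AC) / P(AC).
p_cd_given_ac = p_cd_and_ac / p_ac

print(p_cd_given_ac)                    # 2/7
print(round(float(p_cd_given_ac), 4))   # 0.2857
```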
Example 2
 To study the proportion of smokers by sex from a
population a random sample of 200 persons was
taken, the following table shows the result.
Sex Non-Smoker Smoker Total
Male 64 16 80
Female 42 78 120
Total 106 94 200

a) What is the probability of getting a non smoker given that a


person selected is a female?
b) What is the probability of getting a male given that a person
selected is smoker?
Solution
 P(M) = 80/200, P(F) = 120/200
 P(S) = 94/200, P(N) = 106/200
 P(M ∩ S) = 16/200, P(F ∩ N) = 42/200

a) P(N | F) = P(N ∩ F)/P(F) = 42/120 = 0.35

b) P(M | S) = P(M ∩ S)/P(S) = 16/94 = 0.17
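The same two conditional probabilities can be computed directly from the cell counts. This is a small sketch, assuming the table is stored as a dictionary of counts; `counts` and `count_where` are hypothetical names used only for illustration.

```python
from fractions import Fraction

# Cell counts from the smoking-by-sex table (sample of 200 persons).
counts = {
    ("Male", "Non-smoker"): 64,
    ("Male", "Smoker"): 16,
    ("Female", "Non-smoker"): 42,
    ("Female", "Smoker"): 78,
}


def count_where(counts, sex=None, status=None):
    """Total count of cells matching the given sex and/or smoking status."""
    return sum(
        n for (s, t), n in counts.items()
        if (sex is None or s == sex) and (status is None or t == status)
    )


# P(Non-smoker | Female) = n(Female and Non-smoker) / n(Female)
p_nonsmoker_given_female = Fraction(
    count_where(counts, sex="Female", status="Non-smoker"),
    count_where(counts, sex="Female"),
)

# P(Male | Smoker) = n(Male and Smoker) / n(Smoker)
p_male_given_smoker = Fraction(
    count_where(counts, sex="Male", status="Smoker"),
    count_where(counts, status="Smoker"),
)

print(float(p_nonsmoker_given_female))       # 0.35
print(round(float(p_male_given_smoker), 2))  # 0.17
```

Dividing counts gives the same result as dividing probabilities, because the sample size of 200 cancels in the ratio.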



Statistical Independence
 Two events are statistically independent if
and only if:
$$P(A \cap B) = P(A)\,P(B)$$
 Events A and B are independent when the probability of one event
is not affected by the other event
 If A and B are independent, then

$$P(A \mid B) = P(A) \quad \text{if } P(B) > 0$$

$$P(B \mid A) = P(B) \quad \text{if } P(A) > 0$$



Statistical Independence Example
 Of the cars on a used car lot, 70% have air conditioning
(AC) and 40% have a CD player (CD).
20% of the cars have both.
          CD    No CD   Total
AC        .2    .5      .7
No AC     .2    .1      .3
Total     .4    .6      1.0

 Are the events AC and CD statistically independent?



Statistical Independence Example
(continued)
          CD    No CD   Total
AC        .2    .5      .7
No AC     .2    .1      .3
Total     .4    .6      1.0

P(AC ∩ CD) = 0.2
P(AC) = 0.7
P(CD) = 0.4
P(AC)P(CD) = (0.7)(0.4) = 0.28

P(AC ∩ CD) = 0.2 ≠ P(AC)P(CD) = 0.28


So the two events are not statistically independent
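The independence check is a one-line comparison. Below is a minimal sketch using the three probabilities from the example; the function name `are_independent` and the tolerance value are assumptions made for illustration.

```python
import math

p_ac, p_cd, p_ac_and_cd = 0.7, 0.4, 0.2


def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """True when P(A and B) equals P(A) * P(B) up to a small tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)


print(are_independent(p_ac, p_cd, p_ac_and_cd))  # False

# Equivalent view: P(CD | AC) differs from P(CD), so knowing AC changes the odds of CD.
p_cd_given_ac = p_ac_and_cd / p_ac
print(round(p_cd_given_ac, 4), p_cd)  # 0.2857 0.4
```

A tolerance is used instead of `==` because products of decimal fractions are not always exact in floating point.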
Chapter Six

Probability
Distribution
Chapter Overview

 Random Variables

 Expectation-Mean and Variance of a Random


Variable

 Discrete Probability Distribution

 Continuous Probability Distribution


Introduction
 A random variable, X, provides a means of assigning
numerical values to experimental outcomes.
 Probability distribution for a random variable
describes how the probabilities are distributed over the
values of the random variable
Example: Consider different ordering of boys and girls in a family



Random Variables

 Random Variable
 Represents a possible numerical value from a

random experiment
Random variables are of two types:
 Discrete Random Variable
 Continuous Random Variable



Discrete Random Variables
 Can only take on a countable number of values
Examples:

 Roll a die twice


Let X be the number of times 4 comes up
(then X could be 0, 1, or 2 times)

 Toss a coin 5 times.


Let X be the number of heads
(then X = 0, 1, 2, 3, 4, or 5)
Discrete Probability Distribution
Experiment: Toss 2 Coins. Let X = # heads.
Show P(x) , i.e., P(X = x) , for all values of x:

4 possible outcomes: TT, TH, HT, HH

Probability Distribution
x Value   Probability
0         1/4 = .25
1         2/4 = .50
2         1/4 = .25

(Figure: bar chart of Probability against x = 0, 1, 2, with bar heights .25, .50, .25)
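The distribution above can be built by listing the outcomes and counting heads. The sketch below assumes a fair coin, so that all four outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing 2 coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# X = number of heads in each outcome.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(head_counts.items())}

for x, p in pmf.items():
    print(x, p, float(p))
```

Changing `repeat=2` to another number of tosses gives the distribution of the number of heads for that many fair coins.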
Probability Distribution
Required Properties

 P(x) ≥ 0 for any value of x

 The individual probabilities sum to 1:
$$\sum_x P(x) = 1$$

(The notation indicates summation over all possible x values)



Introduction to Expectation- Mean


and Variance of a Random Variable

 If X is a discrete random variable
$$E(X) = \sum_i x_i \, P(X = x_i)$$

 If X is continuous
$$E(X) = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
Cont’d
 Expected Value (or mean) of a discrete
distribution (Weighted Average)
$$\mu = E(x) = \sum_x x \, P(x)$$

 Example: Toss 2 coins, x = # of heads, compute expected value of x:

x    P(x)
0    .25
1    .50
2    .25

E(x) = (0 x .25) + (1 x .50) + (2 x .25) = 1.0
Variance and Standard
Deviation
 Variance of a discrete random variable X
$$\sigma^2 = E\left[(X - \mu)^2\right] = \sum_x (x - \mu)^2 P(x)$$

 Standard Deviation of a discrete random variable X
$$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_x (x - \mu)^2 P(x)}$$
Standard Deviation Example

 Example: Toss 2 coins, X = # heads,


compute standard deviation (recall E(x) = 1)

$$\sigma = \sqrt{\sum_x (x - \mu)^2 P(x)}$$

Possible number of heads = 0, 1, or 2

$$\sigma = \sqrt{(0 - 1)^2(.25) + (1 - 1)^2(.50) + (2 - 1)^2(.25)} = \sqrt{.50} = .707$$
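The mean and standard deviation of the two-coin example follow directly from the definitions. This is a minimal sketch, assuming the distribution is stored as a dictionary mapping each value x to P(x); the helper names `mean` and `variance` are illustrative.

```python
import math


def mean(pmf):
    """Expected value: sum of x * P(x) over all values x."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Variance: sum of (x - mu)^2 * P(x) over all values x."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


two_coins = {0: 0.25, 1: 0.50, 2: 0.25}  # X = number of heads in 2 tosses

mu = mean(two_coins)
sigma = math.sqrt(variance(two_coins))
print(mu, round(sigma, 3))  # 1.0 0.707
```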
Properties of Expected values
(continued)
 Let random variable X have mean µx and variance σ2x
 Let a and b be any constants.
 Let Y = a + bX
 Then the mean and variance of Y are
$$\mu_Y = E(a + bX) = a + b\mu_X$$

$$\sigma^2_Y = Var(a + bX) = b^2 \sigma^2_X$$

 so that the standard deviation of Y is

$$\sigma_Y = |b| \, \sigma_X$$
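These rules can be verified on a small case by transforming every value of X and recomputing the mean and variance from scratch. The sketch below reuses the two-coin distribution; the constants a = 3 and b = -2 are hypothetical values chosen only to show that the standard deviation scales by |b|, not by b.

```python
import math


def mean(pmf):
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


x_pmf = {0: 0.25, 1: 0.50, 2: 0.25}
a, b = 3, -2  # hypothetical constants for illustration

# Distribution of Y = a + bX: same probabilities, transformed values.
y_pmf = {a + b * x: p for x, p in x_pmf.items()}

# Direct computation from the distribution of Y.
mean_direct = mean(y_pmf)
var_direct = variance(y_pmf)

# Computation through the rules for a linear function of X.
mean_rule = a + b * mean(x_pmf)
var_rule = b ** 2 * variance(x_pmf)
sd_rule = abs(b) * math.sqrt(variance(x_pmf))

print(mean_direct, mean_rule)  # 1.0 1.0
print(var_direct, var_rule)    # 2.0 2.0
```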
Example

 Find the expected value of the following random


variable
X       0      1      2      3      4
P(X)    0.18   0.34   0.23   0.21   0.04
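One way to check an answer to this example is to apply the weighted-average definition directly. The sketch below is only an illustration; the list names are assumptions.

```python
x_values = [0, 1, 2, 3, 4]
probabilities = [0.18, 0.34, 0.23, 0.21, 0.04]

# A valid probability distribution must sum to 1 (checked with a small tolerance).
total_probability = sum(probabilities)

# E(X) = sum of x * P(x)
expected_value = sum(x * p for x, p in zip(x_values, probabilities))
print(round(expected_value, 2))  # 1.59
```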
Probability Distributions

 Discrete Probability Distributions: Binomial, Poisson
 Continuous Probability Distributions: Normal
The Binomial Distribution

(Diagram: Probability Distributions, Discrete Probability Distributions branch: Binomial, Poisson)
Binomial Probability Distribution
 A fixed number of observations, n
e.g., 15 tosses of a coin
 Two mutually exclusive and collectively exhaustive
categories
e.g., head or tail in each toss of a coin; defective or not
defective light bulb
- Generally called “success” and “failure”
- Probability of success is P , probability of failure is 1 – P
 Constant probability for each observation
 e.g., Probability of getting a tail is the same each time we toss the coin
 Observations are independent
 The outcome of one observation does not affect the
outcome of the other
Possible Binomial Distribution
Settings

 A manufacturing plant labels items as either


defective or acceptable
 True/False exam
 A marketing research firm receives survey responses
of “yes I will buy” or “no I will not”
 New job applicants either accept the offer or reject it
Binomial Distribution Formula

$$P(x) = \frac{n!}{x!\,(n - x)!} P^x (1 - P)^{n - x}$$

P(x) = probability of x successes in n trials,
with probability of success P on each trial
x = number of ‘successes’ in sample (x = 0, 1, 2, ..., n)
n = sample size (number of trials or observations)
P = probability of “success”

Example: Flip a coin four times, let x = # heads:
n = 4
P = 0.5
1 - P = (1 - 0.5) = 0.5
x = 0, 1, 2, 3, 4
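The binomial formula translates almost literally into code. Below is a minimal sketch applied to the four-coin-flip example; the function name `binomial_pmf` is an assumption, and `math.comb` supplies the n! / (x! (n - x)!) term.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p on each."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 4, 0.5  # flip a fair coin four times, X = number of heads

distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
for x, prob in distribution.items():
    print(x, prob)
```

The five probabilities are 1/16, 4/16, 6/16, 4/16 and 1/16, and they sum to 1 as any probability distribution must.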
Example:
Calculating a Binomial Probability
What is the probability of one success in five
observations if the probability of success is 0.1?
x = 1, n = 5, and P = 0.1

$$P(x = 1) = \frac{n!}{x!\,(n - x)!} P^x (1 - P)^{n - x}$$
$$= \frac{5!}{1!\,(5 - 1)!} (0.1)^1 (1 - 0.1)^{5 - 1}$$
$$= (5)(0.1)(0.9)^4$$
$$= .32805$$
Binomial Distribution
Mean and Variance

 Mean
$$\mu = E(x) = nP$$

 Variance and Standard Deviation
$$\sigma^2 = nP(1 - P)$$
$$\sigma = \sqrt{nP(1 - P)}$$
Where n = sample size
P = probability of success
(1 – P) = probability of failure
Exercise

1. In a certain true or false exam, there are 10 questions


set for the candidate. If a candidate guesses the answer
at each time
a) What is the probability that the candidate will get 8 or
more correct answers?
b) What is the probability that the candidate will answer
6 questions correctly?
c) What is the mean number of correct answers you
would expect the candidate to obtain? Find the
variance?
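Answers to this exercise can be checked with a few lines of code. The sketch below assumes that guessing means each answer is correct with probability 0.5, independently of the others; that assumption and all names are illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p on each."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.5  # 10 true/false questions, pure guessing (assumed)

# a) P(X >= 8) = P(8) + P(9) + P(10)
p_at_least_8 = sum(binomial_pmf(x, n, p) for x in range(8, n + 1))

# b) P(X = 6)
p_exactly_6 = binomial_pmf(6, n, p)

# c) Mean nP and variance nP(1 - P)
mean_correct = n * p
variance_correct = n * p * (1 - p)

print(p_at_least_8, p_exactly_6, mean_correct, variance_correct)
```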
The Poisson Distribution

(Diagram: Probability Distributions, Discrete Probability Distributions branch: Binomial, Poisson)
The Poisson Distribution

 Apply the Poisson Distribution when:


 You wish to count the number of times an event occurs in
a given continuous interval
 The probability that an event occurs in one subinterval is
very small and is the same for all subintervals
 The number of events that occur in one subinterval is
independent of the number of events that occur in the
other subintervals
 There can be no more than one occurrence in each
subinterval
 The average number of events per unit is λ (lambda)
Poisson Distribution Formula

$$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}$$
where:
x = number of successes per unit
λ = expected number of successes per unit
e = base of the natural logarithm system (2.71828...)
Poisson Distribution
Characteristics

 Mean
$$\mu = E(x) = \lambda$$

 Variance and Standard Deviation
$$\sigma^2 = E\left[(X - \mu)^2\right] = \lambda$$
$$\sigma = \sqrt{\lambda}$$
where λ = expected number of successes per unit
Example
1. If x is a Poisson random variable with mean λ = 2, find P(x = 0).

Solution
$$P(x = 0) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-2}\, 2^0}{0!} = e^{-2} \approx 0.135$$
Example

Example 1: Observation over the past five years has shown that on average there are 5 car accidents per
day in Addis Ababa city. What is the probability that:

1. Exactly 10 car accidents will happen on any given day?

2. More than 3 car accidents will happen on any given day?


Solution

Given: λ = 5, Distribution: Poisson
Required: a) P(x = 10) = ?   b) P(x > 3) = ?

Solution:
a)
$$P(x = 10) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-5}\, 5^{10}}{10!} \approx 0.018$$
Cont’d
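 b) P(x > 3) = 1 − P(x ≤ 3) = 1 − [P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3)]
$$P(x = 0) = \frac{e^{-5}\, 5^0}{0!} \approx 0.0067$$
$$P(x = 1) = \frac{e^{-5}\, 5^1}{1!} \approx 0.0337$$
$$P(x = 2) = \frac{e^{-5}\, 5^2}{2!} \approx 0.0842$$
$$P(x = 3) = \frac{e^{-5}\, 5^3}{3!} \approx 0.1404$$

=> P(x > 3) ≈ 1 − (0.0067 + 0.0337 + 0.0842 + 0.1404) = 1 − 0.2650 = 0.7350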
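Both parts of the car-accident example can be computed from the Poisson formula. The sketch below is illustrative; `poisson_pmf` and `poisson_cdf` are assumed helper names. Note that the complement of "more than 3" is "3 or fewer", so the cumulative sum runs over x = 0, 1, 2 and 3.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x): sum of the probabilities for 0, 1, ..., x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 5  # average number of car accidents per day

p_exactly_10 = poisson_pmf(10, lam)
p_more_than_3 = 1 - poisson_cdf(3, lam)

print(round(p_exactly_10, 4))   # 0.0181
print(round(p_more_than_3, 4))  # 0.735
```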
Exercise
Suppose the Addis Ababa Police Commission receives 15 phone calls per day on
average. Assume that the number of phone calls received per
day follows a Poisson probability distribution. Find the probability
that:
a) There is no phone call on a given day?

b) There are exactly 10 phone calls on a given day?

c) There are more than 2 phone calls on a given day?


Continuous Probability
Distributions

 Normal Distribution

 Student’s t- distribution

 Exponential distribution
Normal Random Variables
 The most important type of random variable is the normal
random variable
 The normal probability distribution is characterized by two
parameters: mean and variance
 The formula for the normal probability density function is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x - \mu)^2 / 2\sigma^2}$$
Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
μ = the population mean
σ = the population standard deviation
x = any value of the continuous variable, −∞ < x < ∞
The Normal Distribution
(continued)

 ‘Bell Shaped’
 Symmetrical
 Mean, Median and Mode are Equal

Location is determined by the mean, μ
Spread is determined by the standard deviation, σ

The random variable has an infinite theoretical range: +∞ to −∞

(Figure: bell-shaped curve f(x) against x, centered at μ where Mean = Median = Mode, with spread σ)
Many Normal Distributions

By varying the parameters μ and σ, we obtain


different normal distributions
Finding Normal Probabilities
(continued)

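$$F(b) = P(X < b)$$

$$F(a) = P(X < a)$$

$$P(a < X < b) = F(b) - F(a)$$

(Figure: three normal curves over x with a, μ and b marked, shading the area below b, the area below a, and the area between a and b)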
The Standardized Normal
 Any normal distribution (with any mean and variance
combination) can be transformed into the
standardized normal distribution (Z), with mean 0
and variance 1
$$Z \sim N(0, 1)$$
(Figure: standard normal curve f(Z) centered at 0 with standard deviation 1)
 Need to transform X units into Z units by subtracting the
mean of X and dividing by its standard deviation

$$Z = \frac{X - \mu}{\sigma}$$
General procedure to read
Probability from Z table

 Draw the picture


 Translate X-values to Z-values

 Shade the area desired

 Find the correct figure

 Follow the direction


Example
 Find the area under the normal distribution curve
between Z=0, and Z=2.34
 Solution:

1. Draw the picture


0 2.34
2. Shade the area desired

3. Find the correct figure


4. Follow the direction
The Standardized Normal Table
 If X is distributed normally with mean of 100 and
standard deviation of 50, the Z value for X = 200
is
$$Z = \frac{X - \mu}{\sigma} = \frac{200 - 100}{50} = 2.0$$
 P(X < 200) = P(Z < 2)

Example:
P(Z < 2.00) = .9772
(Figure: standard normal curve with the area .9772 to the left of Z = 2.00 shaded)
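The table value can be reproduced without a printed Z table. The sketch below standardizes X and evaluates the standard normal cumulative probability through the error function `math.erf`; the helper names are assumptions made for illustration.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """P(Z < z) for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2): standardize, then use the Z distribution."""
    z = (x - mu) / sigma
    return standard_normal_cdf(z)


print(round(normal_cdf(200, 100, 50), 4))   # 0.9772
print(round(standard_normal_cdf(-2.0), 4))  # 0.0228
```

The second value illustrates the symmetry of the curve: the area below Z = -2 equals one minus the area below Z = 2.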
The Standardized Normal Table
(continued)

 For negative Z-values, use the fact that the


distribution is symmetric to find the needed
probability:
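Example:
P(Z < -2.00) = 1 – 0.9772
= 0.0228
(Figure: two standard normal curves; the area .0228 to the right of Z = 2.00 equals the area .0228 to the left of Z = -2.00, with .9772 as the remaining area in each)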
Finding Normal Probabilities

 Suppose X is normal with mean 8.0 and


standard deviation 5.0
 Find P(X < 8.6)

(Figure: normal curve over X with 8.0 and 8.6 marked, area to the left of 8.6 shaded)
Finding Normal Probabilities
(continued)
 Suppose X is normal with mean 8.0 and
standard deviation 5.0. Find P(X < 8.6)
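$$Z = \frac{X - \mu}{\sigma} = \frac{8.6 - 8.0}{5.0} = 0.12$$

(Figure: the X distribution with μ = 8, σ = 5, shaded to the left of 8.6, beside the Z distribution with μ = 0, σ = 1, shaded to the left of 0.12)

P(X < 8.6) = P(Z < 0.12)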


Solution: Finding P(Z < 0.12)
Standardized Normal Probability Table (Portion)

z      F(z)
.10    .5398
.11    .5438
.12    .5478
.13    .5517

P(X < 8.6) = P(Z < 0.12) = F(0.12) = 0.5478
Upper Tail Probabilities

 Suppose X is normal with mean 8.0 and


standard deviation 5.0.
 Now Find P(X > 8.6)

(Figure: normal curve over X with 8.0 and 8.6 marked, area to the right of 8.6 shaded)
Upper Tail Probabilities
(continued)

 Now Find P(X > 8.6)…


P(X > 8.6) = P(Z > 0.12) = 1.0 - P(Z ≤ 0.12)
= 1.0 - 0.5478 = 0.4522

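(Figure: the total area 1.000 under the standard normal curve, minus the area 0.5478 to the left of Z = 0.12, leaves 1.0 - 0.5478 = 0.4522 to the right of Z = 0.12)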
Example
 IQ examination scores for sixth-graders are normally
distributed with mean value 100 and standard
deviation 14.2.
1. What is the probability a randomly chosen sixth-grader has a
score greater than 130?
2. What is the probability a randomly chosen sixth-grader has
score between 90 and 115?
Solution: Change the X values into Z-values
1. Z = (130 − 100)/14.2 ≈ 2.11, so P(X > 130) = P(Z > 2.11) = 1 − 0.9826 = 0.0174
Cont’d
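Both parts of the IQ example can be checked with Python's standard library. This is a hedged sketch using `statistics.NormalDist` (available from Python 3.8); the variable names are illustrative. Because the code does not round Z to two decimals, its results can differ from Z-table answers in the last digit or two.

```python
from statistics import NormalDist

# IQ scores for sixth-graders: mean 100, standard deviation 14.2.
iq = NormalDist(mu=100, sigma=14.2)

# 1. P(X > 130) = 1 - P(X <= 130)
p_above_130 = 1 - iq.cdf(130)

# 2. P(90 < X < 115) = F(115) - F(90)
p_between_90_and_115 = iq.cdf(115) - iq.cdf(90)

print(round(p_above_130, 4), round(p_between_90_and_115, 4))
```

The first probability comes out at roughly 0.017 and the second at roughly 0.61.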
Exercise

Suppose the mid-test marks of mathematics students
follow a normal distribution with mean 18 and
standard deviation 7. Find the probability that
a. The mark of a student is less than 12?
b. The mark of a student is between 5 and 15
marks?
Chapter Seven

Sampling and Sampling distribution of the sample mean
Introduction: Basic Concepts
 Most researchers come to a conclusion of their study
by studying a small sample from the huge population
or universe.
 To draw conclusions about a population from a sample,
there are two major requirements for a sample:
1. the sample size should be adequately large.
2. the sample has to be representative of the population.

 Sampling technique is concerned with the
selection of a representative sample, especially for the
purposes of statistical inference.
Key Definitions
 A population is the collection of all items of interest
or under investigation
 N represents the population size
 A sample is an observed subset of the population
 n represents the sample size
Population vs. Sample

Population: a b c d e f g h i j k l m n o p q r s t u v w x y z
Sample: b c g i n o r u y

Values calculated using population data are called parameters
Values computed from sample data are called statistics
Sampling Frame

 Sampling frame is a list of all elements in the target


population.
 There is a risk of drawing wrong conclusion from the survey if
the sample has been selected from a sampling frame that
differs from the population. The problems are:
1. Under-coverage: occurs if the target population contains
elements that do not have a counterpart in the sampling frame.
e.g. an online survey, where respondents are selected via the
Internet. In this case, there will be under-coverage due to people
having no Internet access.
2. Over-coverage: sampling frame contains elements that do not
belong to the target population.
- If such elements end up in the sample and their data are used
in the analysis, estimates of population parameters may be
affected.
Reasons for Sampling
 Reduced Cost
 Greater Speed
 Greater Accuracy: Measurement errors typically
can be controlled more effectively in a small
undertaking than in a large one
 Greater Scope
 When a test involves the destruction of an item
 When a population is infinite, information about it
can be obtained only from a sample
Essentials of Samples
 Sample should be representative of the entire
parent universe

 Number of sample to be selected should be such


that the limits of variation between them may be
easily explained
 The first requirement of any sampling procedure is
the avoidance of human bias.

 How to select the sample


Types of Sampling
 Sampling techniques/methods can be grouped into
two categories
1. Random (probability) sampling methods
- Each member of the population has an equal and
known chance of being selected.

2. Non-random (non-probability) sampling


methods
- Not every member of the population has an equal or known
chance of being selected for the sample
Probability Sampling

 Simple random sampling (S.R.S)

 Stratified/cluster random sampling

 Systematic random sampling

 Multi-stage random sampling


Non-Probability Sampling

 Judgment sampling

 Quota sampling

 Convenience sampling
Probability Sampling
Simple Random Samples

 Every object in the population has an equal chance of


being selected
 Objects are selected independently
 Samples can be obtained from a table of random numbers
or computer random number generators

 A simple random sample is the ideal against which other


sample methods are compared
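A computer random number generator makes simple random sampling straightforward. The sketch below assumes a hypothetical sampling frame of 100 numbered units; the seed is fixed only so the illustration is reproducible.

```python
import random

# Hypothetical sampling frame: units numbered 1 to 100.
population = list(range(1, 101))

rng = random.Random(2021)  # seeded generator, for a reproducible illustration

# Draw n = 10 units without replacement; every unit is equally likely to be chosen.
sample = rng.sample(population, k=10)
print(sorted(sample))
```

`random.sample` draws without replacement, so no unit can appear twice in the sample.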
Developing a
Sampling Distribution

 Assume there is a population …
 Population size N = 4 (individuals A, B, C, D)
 Random variable, X, is age of individuals
 Values of X: 18, 20, 22, 24 (years)
Stratified random sampling
 It is preferred when the population is heterogeneous with
respect to the characteristic under study.
 In this method, the complete population is divided into
homogeneous subgroups called "strata", and then a stratified
sample is obtained by independently selecting a separate
simple random sample from each population stratum.

 Some of the criteria for dividing a population into strata are:
Sex (male, female); Age (under 18, 18 to 28, 29 to 39);
 Random samples taken within a stratum will have much less
variability than a random sample taken across all strata. This is
true because sample units within each stratum tend to have
characteristics that are similar.
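Stratified sampling can be sketched as a separate simple random sample inside each stratum. The example below uses a hypothetical population of 100 units labelled by sex and assumes proportional allocation (the same sampling fraction in every stratum), which is one common choice and is not prescribed by the text above.

```python
import random

# Hypothetical population: (unit id, stratum label) pairs, 60 males and 40 females.
population = [(i, "male") for i in range(60)] + [(i, "female") for i in range(60, 100)]


def stratified_sample(population, fraction, rng):
    """Draw a simple random sample of the same fraction from each stratum."""
    strata = {}
    for unit, label in population:
        strata.setdefault(label, []).append(unit)
    return {
        label: rng.sample(units, k=round(len(units) * fraction))
        for label, units in strata.items()
    }


rng = random.Random(2021)  # seeded only for reproducibility
sample = stratified_sample(population, fraction=0.10, rng=rng)

for label, units in sample.items():
    print(label, sorted(units))
```

With a 10% fraction the sample contains 6 males and 4 females, matching the composition of the population.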
Systematic Random Sampling
 Systematic sampling is a commonly employed
technique when a complete and up-to-date list of
sampling units is available.
 A systematic random sample is obtained by selecting
one unit on a random basis and then choosing
additional units at evenly spaced intervals until the
desired sample size is obtained.
 Let N = population size, n = sample size, and k = N/n the
sampling interval.
=> Then choose a random number between 1 and k as the starting unit, and select every k-th unit after it.
Example on blackboard!
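The procedure can be sketched in a few lines. The example assumes the usual sampling interval k = N/n (taken as an integer) and a hypothetical list of N = 100 numbered units; the function name is illustrative.

```python
import random


def systematic_sample(units, n, rng):
    """Pick a random start among the first k units, then every k-th unit after it."""
    k = len(units) // n        # sampling interval k = N / n (integer part)
    start = rng.randint(1, k)  # random start between 1 and k, inclusive
    return [units[start - 1 + i * k] for i in range(n)]


units = list(range(1, 101))  # hypothetical list of N = 100 numbered units
rng = random.Random(2021)    # seeded only for reproducibility

sample = systematic_sample(units, n=10, rng=rng)
print(sample)
```

Only the starting unit is random; once it is chosen, the remaining nine units are fixed by the interval k = 10.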
Cluster Sampling
 Clusters are formed by grouping units on the basis
of their geographical locations. Thus, elements
within a cluster are heterogeneous.
 It is obtained by selecting clusters from the
population on the basis of simple random sampling,
so that every unit in the selected clusters
is included in the sample.

 The advantage of cluster sampling is that a sampling
frame is not required; in practice, when complete
lists are rarely available, cluster sampling is suitable.
Multistage Sampling
 In this method, the whole population is divided into
first-stage sampling units, from which a random sample is
selected.

 The selected first-stage units are then subdivided into
second-stage units, from which another sample is
selected. Third- and fourth-stage sampling is done in
the same manner if necessary.

e.g. Studying malaria prevalence in Ethiopia:
Region => Zone => Woreda => Kebele
Non-Probability Sampling

 Judgment sampling

 Quota sampling

 Convenience sampling
Convenience sampling

 In convenience sampling, we select individuals into


our sample based on their availability to the
investigators rather than selecting subjects at
random from the entire population

 The extent to which the sample is representative of


the target population is not known
Quota Sampling
 We determine a specific number of individuals to
select into our sample in each of several specific
groups
 Similar to stratified sampling in that we develop non-
overlapping groups and sample a predetermined
number of individuals within each
 E.g. Suppose our desired sample size is n=300, and we wish to ensure
that the distribution of subjects' ages in the sample is similar to that in the
population. We know from census data that approximately 30% of the
population are under age 20; 40% are between 20 and 49; and 30% are
50 years of age and older. We would then sample n1=90 persons under
age 20, n2=120 between the ages of 20 and 49 and n3=90 who are 50
years of age and older.
Judgment Sampling
 Samples selected according to the opinion of an
expert.
 Use when you want a quick sample and you believe
you are able to select a sufficiently representative
sample for your purposes
 It is a biased method that is useful when some
members of a population make better subjects than
others.
 Judgment sampling is often a last-resort method that
may be used when there is no time to do a proper
study.
Sampling distribution of the sample
mean
 If we consider all the samples of size n that can be
drawn from a population, we can compute a sample
statistic, such as the mean or variance, for each sample.
 The value of the sample statistic will vary from sample to
sample

 The theoretical distribution that relates the possible
values of the sample mean to their probabilities over all
possible samples of size n is called the sampling
distribution of the mean
Example
 Suppose that all possible samples of size two are drawn from a
population, having the following N=4 elements.
3, 6, 8, 11.
Construct the sampling distribution of the mean.
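The sampling distribution for this small population can be enumerated exhaustively. The sketch below assumes sampling without replacement, which the example does not state explicitly; replacing `combinations` with `itertools.product(population, repeat=2)` would give the with-replacement version.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 6, 8, 11]
n = 2

# Every possible sample of size 2, drawn without replacement (an assumption).
samples = list(combinations(population, n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Each sample is equally likely, so P(mean) = (number of samples with that mean) / (number of samples).
counts = Counter(sample_means)
sampling_distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in sampling_distribution.items():
    print(float(m), p)

# The mean of the sample means equals the population mean.
mean_of_means = sum(m * p for m, p in sampling_distribution.items())
population_mean = Fraction(sum(population), len(population))
print(mean_of_means, population_mean)  # 7 7
```

There are 6 possible samples; the sample mean 7 occurs twice (from {3, 11} and {6, 8}), so it has probability 1/3, while 4.5, 5.5, 8.5 and 9.5 each have probability 1/6.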
Central limit theorem
 If the parent population is not normally distributed, the
distribution of the means of the samples still tends to
be normally distributed if the size of the samples and
the parent population are sufficiently large i.e., n>30
and N>2n.
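A small simulation illustrates the theorem. The sketch below builds a hypothetical, strongly right-skewed population of N = 10,000 values, draws 2,000 samples of size n = 40 from it, and examines the sample means; all sizes and the choice of an exponential population are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(2021)  # seeded only for reproducibility

# Hypothetical right-skewed (non-normal) population of N = 10,000 values.
population = [rng.expovariate(1.0) for _ in range(10_000)]

n = 40  # sample size, satisfying n > 30 and N > 2n
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(2_000)]

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# For a normal distribution, about 68% of values lie within one standard deviation of the mean.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / len(sample_means)

print(round(pop_mean, 3), round(mean_of_means, 3))
print(round(pop_sd / n ** 0.5, 3), round(sd_of_means, 3))
print(round(within_one_sd, 3))
```

The sample means should center on the population mean, have a spread close to the population standard deviation divided by the square root of n, and have roughly 68% of their values within one standard deviation of the center, even though the population itself is far from normal.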
Survey Error
 Sampling Error – Who are you sampling?
 Coverage Error – Does your list include everyone?
 Measurement Error – Does everyone answer a question
the same way?

 Non-response Error – Why did respondent not answer:

- Instrument (whole questionnaire not returned)


- Item (question not answered)
