Elementary Statistics
Chapter 1
Chapter 1 Introduction to Survey and Statistics
❑ Workflow of conducting a survey
❑ Census and Sample survey
❑ Sampling Methods
❑ Numerical variable vs categorical variable
❑ Presentation of numerical dataset
❑ Summarize the findings in a simple paragraph
❑ Linear function of a variable
Example 1:
Suppose the manager of a large company with 3000 employees
wants to collect information about employees' satisfaction level
towards the company. How should the manager plan this survey?
Q: Why do we have to conduct this survey?
Q: Who is eligible to participate in the study?
Q: How should the level of satisfaction be measured?
Q: How many employees should be involved in the study?
Q: How should the collected data be summarized to present the result?
Q: Why do we have to conduct this survey?
The objective: understand the employees' satisfaction level towards the company
Q: Who is eligible to participate in the study?
Subject: every employee in the company
Q: How should the level of satisfaction be measured?
Variable: a scale from 0 (very dissatisfied) to 10 (very satisfied)
Q: How many employees should be involved in the study?
Most ideal: conduct a census of all employees (3000 employees)
More convenient: conduct a sample survey of a portion of the employees (e.g. 500 employees)
Q: How should the collected data be summarized to present the result?
Summary statistics, e.g. mean, standard deviation, 25th percentile, 75th percentile, ...
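The summary statistics listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the ten scores are hypothetical values made up for illustration, not survey results, and percentile values depend on the interpolation convention used (here the module's default).

```python
import statistics

# Hypothetical satisfaction scores (0 to 10) from ten employees; illustration only.
scores = [7, 8, 5, 9, 6, 10, 4, 8, 7, 6]

mean_score = statistics.mean(scores)
sd_score = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

# quantiles(..., n=4) returns the three cut points that split the data into quarters:
# the 25th percentile, the median and the 75th percentile.
q1, median, q3 = statistics.quantiles(scores, n=4)

print(f"mean = {mean_score:.2f}, standard deviation = {sd_score:.2f}")
print(f"25th percentile = {q1}, median = {median}, 75th percentile = {q3}")
```

Other software may report slightly different percentiles for the same data because several interpolation rules are in common use.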
Difference between Census and Survey
Population: every employee in this company
Sample: some employees in this company
Limitations of Conducting a Census
❑ Time
❑ Manpower
❑ Budget
❑ Location
❑ Possibility of obtaining data
When a census is not feasible, a sample is selected by a fair
and random procedure, and data are then
collected and analysed.
When doing a census, data are collected from every employee (3000
observations). Once data collection is complete, we try to
understand the current situation through data analysis (for example, by
calculating the mean and standard deviation of the scores). A mean
satisfaction score of 9.5 and a mean of 2.4, for example, represent
very different situations.
When doing a sample survey, we also analyse the data in order to draw
conclusions about the current situation. However, as the data collection is
incomplete, we need to be very careful when drawing
conclusions. The reliability of a conclusion drawn from a sample
survey depends very much on how well the sample represents
the population.
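To see why representativeness matters, the sketch below compares a census mean with two sample means. The 3000 scores are simulated with a random number generator purely for illustration (an assumption, not company data); the "biased" sample of the 500 highest scores is a deliberately extreme, hypothetical case of a non-representative sample.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated (not real) satisfaction scores for N = 3000 employees.
population = [rng.randint(0, 10) for _ in range(3000)]
census_mean = statistics.mean(population)

# A random sample of 500 employees, drawn without replacement.
random_sample = rng.sample(population, 500)
sample_mean = statistics.mean(random_sample)

# A non-representative sample: only the 500 most satisfied employees.
biased_sample = sorted(population)[-500:]
biased_mean = statistics.mean(biased_sample)

print(f"census mean        = {census_mean:.2f}")
print(f"random sample mean = {sample_mean:.2f}")
print(f"biased sample mean = {biased_mean:.2f}")
```

The random sample mean will usually land close to the census mean, while the biased sample overstates satisfaction.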
Key Concepts in Statistics
A Population is the totality of elements (also called items or objects) under
consideration. An investigation based on data from the whole population is
called a census. Sometimes it is too expensive or impossible to obtain data
on every object in the population. In this case, we conduct a sample survey by
selecting some objects from the population for analysis in order to infer the
characteristics of the whole population. A Sample is a portion of the
population that is selected for analysis.
• All the elements under study in a survey are known as the population.
• A selected part of those elements is known as the sample.
• A survey which collects data from the whole population is called a census.
• A survey which collects data from part of the population is called a sample survey.
Types of Survey Sampling Methods
Sampling methods are classified into two types:
1. Non-Probability Sampling
• The probability of a member being selected into the
sample is unknown.
• Selection is made for convenience and to save time.
2. Probability Sampling
• Each member of the population has a known probability of
being selected into the sample.
• Selection is based on chance.
Sampling Methods
Sampling
• Selection probability unknown: Non-probability Sampling
• Selection probability known: Probability Sampling
  ▪ Simple Random Sampling
  ▪ Systematic Sampling
  ▪ Stratified Sampling
Probability Sampling
• When selecting a probability sample, we need to ensure that every element has a chance
of being selected.
• A Sampling Frame is a data file that contains information on the
population objects.
Examples: a telephone directory, a student registration
list, employment records, etc.
Method 1: Simple Random Sampling
▪ Selects objects such that every object in the population has an equal chance
of being selected.
▪ Give each element in the sampling frame a unique identity number, say; the
sample can then be selected by using a Random Number Table, a computer, etc.
[Figure: a population of ten elements numbered 1 to 10]
Example 1:
Select a sample of size 500 from 3000 employees by the
simple random sampling method, using a Random Number Table.
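Instead of a Random Number Table, the selection can also be done by computer. The following is a minimal sketch, assuming the employees carry identity numbers 1 to 3000; the function name `simple_random_sample` and the `seed` parameter are illustrative choices, not part of the course material.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct ID numbers from 1..population_size.

    Every ID has the same chance of being selected (sampling without replacement).
    """
    rng = random.Random(seed)  # a seed makes the draw repeatable
    ids = range(1, population_size + 1)
    return sorted(rng.sample(ids, sample_size))

sample_ids = simple_random_sample(3000, 500, seed=1)
print(len(sample_ids))   # number of employees selected
print(sample_ids[:10])   # the ten smallest selected ID numbers
```

The employees whose numbers appear in `sample_ids` form the sample.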
Simple Random Sampling
Advantage: ____________________
Disadvantage: ______________________________
Method 2: Systematic Sampling
Systematic Sampling selects the first object, a, at random and the
rest at a fixed interval k, the ratio of the population size to the sample size.
Example 1:
Select a sample of size 500 from 3000 employees by the
systematic sampling method.
Steps:
1. Assign a unique number to each employee
2. Find the ratio k = 3000 / 500 = 6
3. Randomly select a starting number a from 1 to k
4. Select a, a + k, a + 2k, ... and so on, until 500 numbers are chosen
5. The employees with these numbers are chosen as the
sample for the survey
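The steps above can be sketched in code. This is an illustration under stated assumptions: employees are numbered 1 to 3000, the population size is an exact multiple of the sample size, and the start a is drawn from 1 to k. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the ID numbers a, a + k, a + 2k, ... for a random start a."""
    k = population_size // sample_size  # sampling interval (assumes exact division)
    rng = random.Random(seed)
    a = rng.randint(1, k)               # random start between 1 and k inclusive
    return [a + i * k for i in range(sample_size)]

chosen = systematic_sample(3000, 500, seed=1)
print(chosen[:5])   # first five selected ID numbers, spaced k = 6 apart
print(len(chosen))  # number of employees selected
```

Only the starting number is random; once a is fixed, the rest of the sample is fully determined.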
Systematic Sampling
Advantage: ____________________
Disadvantage: ______________________________
_____________________________
Method 3: Stratified Sampling
▪ Stratified sampling divides the whole population into
distinct subgroups (called strata).
▪ Elements inside each stratum share a common characteristic.
▪ Individual samples are then selected randomly from each of the
strata.
▪ Strata are often sampled in proportion to their actual
shares of the overall population.
Stratified Sampling
Example 1:
Select a stratified sample of size 500 from 3000 employees, of
whom 600 are managers and the other 2400 are junior staff.
Steps:
1. Find the sample size for each subgroup
Sample size for managers: 500 × 600 / 3000 = 100
Sample size for junior staff: 500 × 2400 / 3000 = 400
2. Draw 100 managers and 400 junior staff randomly from their respective strata to
form a sample of 500 employees.
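Proportional allocation followed by a random draw within each stratum can be sketched as follows. The assumptions are illustrative: managers are numbered 1 to 600 and junior staff 601 to 3000, and the function name `stratified_sample` is hypothetical. Rounding works out exactly for these figures; in general the rounded stratum sizes may need adjusting so that they add up to the required total.

```python
import random

def stratified_sample(strata_sizes, sample_size, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(strata_sizes.values())
    result = {}
    next_id = 1
    for name, size in strata_sizes.items():
        n_stratum = round(sample_size * size / total)  # proportional allocation
        ids = range(next_id, next_id + size)           # ID numbers in this stratum
        result[name] = sorted(rng.sample(ids, n_stratum))
        next_id += size
    return result

strata = {"managers": 600, "junior staff": 2400}
stratified = stratified_sample(strata, 500, seed=1)
for name, ids in stratified.items():
    print(name, len(ids))
```

With these figures the draw contains 100 managers and 400 junior staff, matching the sample sizes in step 2.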
Stratified Sampling
Advantage: __________________________
Disadvantage: ___________________
Non-probability Samples
Select the sample in a convenient way (e.g.
street interviews). This is practical when no sampling
frame is available.
Example 2:
How should a sample of 500 teenagers be selected in
order to study their satisfaction level towards a brand of
cola?
Solution:
As the population (all teenagers in Hong Kong) is very
large, it is practically impossible to prepare a sampling
frame. A more practical way is to invite 500 teenagers to
join the survey by convenience sampling.
Variable
A variable is a characteristic of the individual to be
measured or observed.
e.g. age, weight, gender, nationality, income
Types of Data
Numerical variable: Data consist of numbers that represent counts or
measurements
Discrete: Data can take only particular values
(usually integers)
Continuous: Data can take any value within a range
Categorical variable: Data consist of names that represent
categories
Nominal: No natural order among the categories
Ordinal: A natural order exists among the
categories
Question
State whether each of the following questions provides numerical
or categorical data, and indicate the type of data for each
question.
(a) What is your age?
(b) Which school are you studying at?
(c) What type of video games do you frequently play?
(d) How much do you spend on video games per month?
Example 3:
This is the result of part of the survey. How many variables
are there? What are the data types of the variables?
We usually use a capital letter, e.g. X, to denote the
variable and a small letter, x, to denote the collected
data. Suppose X represents the gender of an
employee: x1 = "F", x2 = "M", x3 = "F", x4 = "F", x5 = "M".
The sample size is usually denoted by n (here n = 5) and the
population size by N (here N = 3000).
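The notation maps directly onto a list in code. A small sketch using the five observations above; note that Python counts positions from 0, so x1 is stored at `x[0]`.

```python
from collections import Counter

# Collected data for the variable X (gender): x1, ..., x5
x = ["F", "M", "F", "F", "M"]

n = len(x)   # sample size
N = 3000     # population size, from the company example

counts = Counter(x)  # frequency of each category
print(f"n = {n}, N = {N}")
print(dict(counts))
```

Because gender is a categorical variable, counting the categories is a natural way to summarize it; a mean would not be meaningful here.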