INTRODUCTION TO
STATISTICS
Lecturer: LE HONG VAN
Foreign Trade University – HCM Campus
Email: [email protected]
SYLLABUS
Chapter 1: Introduction to Statistics
Chapter 2: Data collection and Summarizing
Chapter 3: Descriptive Statistics
Chapter 4-5: Inferential Statistics
Chapter 6: Correlation and Regression
Chapter 7: Time-series analysis and Forecasting
Chapter 8: Indexes
Textbook
Business Statistics – 8th edition (David
F. Groebner)
References:
- Statistics for Business and Economics – 11th
edition, 2003 (Anderson, Sweeney, Williams)
- Handouts and Tutorials
STUDYING METHOD
Statistical thinking
Doing exercises
Group presentation
Self-study
GRADING BREAKDOWN
                 MARK (%)   FORM OF ASSESSMENT
ATTENDANCE       10%        ATTENDANCE CHECK
MID-TERM TEST    30%        WRITTEN TEST + GROUP PROJECTS
FINAL EXAM       60%        WRITTEN TEST (MULTIPLE CHOICE + PROBLEMS)
PLUS MARK
CLASSROOM ETIQUETTE
Do not lay your head on the desk or fall asleep.
Do not do your nails, apply makeup, or do
work for other classes.
Keep your mobile phones silent.
Bring good questions. We are not mind
readers, please ask questions if you do not
understand.
Food and beverages should not be
consumed in the classroom.
Do not use laptops for personal activities.
If you arrive late to class or you are
returning after an absence, please sit
down quietly without making a production.
TREAT OTHERS THE WAY YOU WANT TO BE TREATED!
PRESENTATION (choose one of 4 projects)
Project #1: Survey
Project #2: Inference
Project #3: Correlation and Regression
Project #4: Time-series analysis and Forecasting
In detail…
I. What is statistics?
- In a very general sense:
Statistics = numerical information
- More specifically:
Statistics = statistical methods used to
- collect
- describe
- summarize
- present
- analyze
In more detail, statistics covers several
major tasks:
Making sense of numerical information
Dealing with uncertainty
Sampling
Analyzing relationships
Forecasting
Decision making in an uncertain environment
WHO USES STATISTICS?
Areas where STATISTICS are used:
- Business: Economics, Engineering, Marketing, Computer Science
- Physical Sciences: Astronomy, Chemistry, Physics
- Health & Medicine: Genetics, Clinical Trials, Epidemiology, Pharmacology
- Environment: Agriculture, Ecology, Forestry, Animal Populations
- Government: Census, Law, National Defense
Source: American Statistical Association
Applications in
Business and Economics
Accounting
Public accounting firms use statistical
sampling procedures when conducting audits
for their clients.
Economics
Economists use statistical
information in making forecasts
about the future of the economy
or some aspects of it.
Marketing
Electronic point-of-sale scanners at retail
checkout counters are used to collect data
for a variety of marketing research
applications.
Production
A variety of statistical quality
control charts are used to monitor
the output of a production
process.
Finance
Financial advisors use price-earnings ratios and dividend
yields to guide their investment recommendations.
II/ Definitions
1/ A population is the WHOLE set of all items or
individuals of interest.
2/ A sample is an observed subset of population values.
3/ A variable is a characteristic that changes or varies over
time and/or across the different individuals or objects under
consideration.
Population vs. Sample
Population: a b c d e f g h i j k l m n o p q r s t u v w x y z
Sample: b c g i n o r u y
III/ Descriptive statistics and inferential
statistics
Statistics has two branches: Descriptive Statistics and Inferential Statistics.
1/ Descriptive statistics
Descriptive statistics: Methods used to summarize
and describe the main features of the whole population
in quantitative terms.
Tabular, graphical, and numerical methods (mean,
median, variance, standard deviation…)
Used when we can enumerate the whole population
Descriptive Statistics
- Collect data
e.g., Survey, Observation,
Experiments
- Present data
e.g., Charts and graphs
- Characterize data
e.g., Calculate the mean: $\bar{x} = \frac{\sum x_i}{n}$
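The "characterize data" step can be sketched in a few lines of Python. This is only an illustration: the list of weights is made-up data, and the population forms of variance and standard deviation are used on the assumption that the ten values are the whole population.

```python
import statistics

# Hypothetical data: weights (in pounds) of all ten members of a small group.
weights = [112, 118, 120, 121, 123, 125, 119, 117, 122, 123]

n = len(weights)
mean = sum(weights) / n                   # x-bar = (sum of x_i) / n
median = statistics.median(weights)       # middle value of the sorted data
variance = statistics.pvariance(weights)  # population variance (divides by n)
std_dev = statistics.pstdev(weights)      # square root of the population variance

print(f"n = {n}, mean = {mean}, median = {median}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

If the ten values were only a sample, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice.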
2/ Inferential Statistics
Inferential statistics: Procedures used to draw
conclusions or inferences about the characteristics of a
population from information obtained from the sample.
Making estimates, testing hypotheses…
Used when we cannot enumerate the whole population
Inferential Statistics
Drawing conclusions and/or making decisions
concerning a population based on sample results.
Estimation
e.g., Estimate the population mean weight
using the sample mean weight
Hypothesis Testing
e.g., Use sample evidence to test the claim
that the population mean weight is 120
pounds
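A minimal sketch of both ideas, using made-up sample data. The sample mean serves as the point estimate of the population mean weight, and a one-sample t statistic measures how far that estimate lies from the claimed value of 120 pounds. The choice of a t statistic is an assumption here; the slides do not name a specific test.

```python
import math
import statistics

# Hypothetical sample of 8 weights (pounds) drawn from a larger population.
sample = [118, 125, 121, 130, 116, 124, 127, 119]

n = len(sample)
sample_mean = statistics.mean(sample)      # point estimate of the population mean
sample_sd = statistics.stdev(sample)       # sample standard deviation (divides by n - 1)
standard_error = sample_sd / math.sqrt(n)  # estimated variability of the sample mean

claimed_mean = 120                         # the claim being tested
t_statistic = (sample_mean - claimed_mean) / standard_error

print(f"Estimated population mean: {sample_mean}")
print(f"t statistic for the claim 'mean = {claimed_mean}': {t_statistic:.3f}")
```

To reach a decision, the t statistic would be compared with a critical value from the t distribution with n - 1 degrees of freedom, which this sketch does not compute.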
IV. Quantitative and qualitative data
Data can be classified as being qualitative
or quantitative.
Depending on whether the data are qualitative or
quantitative, we choose the most
appropriate statistical methods.
In general, more statistical analyses are available for
quantitative data.
Qualitative Data
Labels or names used to identify an attribute of each
element.
Often referred to as categorical data
Measured on a nominal or ordinal scale
Usually nonnumeric
Therefore, the appropriate statistical analyses are rather limited
in comparison with those for quantitative data
Examples
Eye colors:
1.Brown 2.Black 3.Blue 4.Green
Marital status:
1. Single
2. Married
3. Divorced
4. Widowed
Quantitative Data
Quantitative data are data in numeric form. They indicate how many or how much.
There are two types of quantitative data:
Discrete data:
- Can be counted exactly.
- Only a finite (or countable) number of values is possible.
- Example:
Continuous data:
- Cannot be measured with perfect precision.
- An infinite number of values is possible.
- Example:
Quantitative Data
E.g.
(i)The number of students in a class
(ii)The number of correct answers in a test
(iii)People’s height, weight; students’ GPA
V. Scales of Measurement
Scales of measurement include:
- Nominal
- Ordinal
- Interval
- Ratio
The scale determines the amount of information
contained in the data.
The scale indicates the data summarization and
statistical analyses that are most appropriate.
Levels of measurement
Type of data                                    Scale                  Level           Analysis
Measurements                                    Ratio/Interval Scale   Highest Level   Complete Analysis
Rankings, Ordered Categories                    Ordinal Scale          Higher Level    Mid-level Analysis
Categorical Codes, ID Numbers, Category Names   Nominal Scale          Lowest Level    Basic Analysis
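As a rough illustration of how the scale limits the analysis, the sketch below picks one typical summary per scale: the mode for nominal data, the median for ordinal data, and the mean for ratio data. The data values are made up, and the pairing of one summary with each scale is a common convention rather than a rule stated in these slides.

```python
import statistics

# Nominal scale (hypothetical fuel types): only counting categories makes sense.
fuel = ["Gas", "Coal", "Gas", "Oil", "Gas", "Firewood"]
most_common_fuel = statistics.mode(fuel)

# Ordinal scale (hypothetical class standing, 1 = Freshman ... 4 = Senior):
# the order is meaningful, so the median can be used.
class_standing = [1, 2, 2, 3, 4, 4, 4]
median_standing = statistics.median(class_standing)

# Ratio scale (hypothetical credit hours earned): differences and ratios
# are meaningful, so the mean can be used.
credit_hours = [36, 72, 48, 60]
mean_credit_hours = statistics.mean(credit_hours)

print("Most common fuel:", most_common_fuel)
print("Median class standing:", median_standing)
print("Mean credit hours:", mean_credit_hours)
```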
Scales of Measurement
Nominal
Data are labels or names used to identify an
attribute of the element.
A nonnumeric label or numeric code may be used.
Example
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business, Humanities, Education,
and so on.
Alternatively, a numeric code could be used for the school
variable (e.g. 1 denotes Business,2 denotes Humanities, 3
denotes Education, and so on).
Example
Which fuel do you use at home?
1. Firewood
2. Coal
3. Oil
4. Gas
Scales of Measurement
Ordinal
The data have the properties of nominal data and
the order or rank of the data is meaningful.
A nonnumeric label or numeric code may be used.
Example
Students of a university are classified by their class
standing using a nonnumeric label such as Freshman,
Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for the
class standing variable (e.g. 1 denotes Freshman, 2
denotes Sophomore, and so on).
Example
Please rank the following kinds of fuel in order of
preference.
( ) Firewood
( ) Coal
( ) Oil
( ) Gas
Scales of Measurement
Interval
The data have the properties of ordinal data, and
the interval between observations is expressed in
terms of a fixed unit of measure.
Interval data are always numeric.
There is no true zero point: a value of zero does not indicate
that nothing exists for the variable.
The ratio of two values on an interval scale is not
meaningful because the scale has no true zero.
Example: Melissa has an SAT score of 800, while
Kevin has an SAT score of 400. Melissa scored
400 points more than Kevin, but we cannot conclude that
she performed twice as well.
Example
How would you rate the customer service at this
restaurant?
-3 -2 -1 +1 +2 +3
Not friendly Friendly
Scales of Measurement
Ratio
The data have all the properties of interval data
and the ratio of two values is meaningful.
This scale must contain a zero value that indicates
that nothing exists for the variable at the zero point.
Variables such as distance, height, weight, and time
use the ratio scale.
Example
Melissa’s college record shows 36 credit hours earned, while
Kevin’s record shows 72 credit hours earned. Kevin has
twice as many credit hours earned as Melissa.
Example
Assume that you spend VND 100,000 on fuel for your family.
Please allocate this amount among the following kinds of
fuel:
1. Firewood.................VND
2. Coal.........................VND
3. Oil............................VND
4. Gas..........................VND
Example: The following survey is given to FTU students. Classify
each item as quantitative or qualitative, and state its scale of
measurement.
1. Full name:..........................................
2. Sex: Male Female
3. Age :
4. Year of study:
1st 2nd 3rd 4th
5. a/ Have you got a part-time job?
Yes No
b/ If yes, how many hours per week?...........
c/ How well does your part-time job fit your field of
study?
Very suitable Not at all
5 4 3 2 1
DATA COLLECTION
Methods of Data Collection:
Census
Sample survey
Experiment
Observational study
Census. A census is a study that obtains data from every member of a
population. In most studies, a census is not practical, because of the cost
and/or time required.
Sample survey. A sample survey is a study that obtains data from a
subset of a population, in order to estimate population attributes.
Experiment. An experiment is a controlled study in which the
researcher attempts to understand cause-and-effect relationships. The
study is "controlled" in the sense that the researcher controls (1) how
subjects are assigned to groups and (2) which treatments each group
receives.
Observational study. Like experiments, observational studies
attempt to understand cause-and-effect relationships. However, unlike
experiments, the researcher is not able to control (1) how subjects are
assigned to groups and/or (2) which treatments each group receives.
Survey Design Steps
Define the issue
what are the purpose and objectives of the survey?
Define the population of interest
Formulate survey questions
make questions clear and unambiguous
use universally-accepted definitions
limit the number of questions
Pre-test the survey
pilot test with a small group of participants
assess clarity and length
Determine the sample size and sampling method
Select the sample and administer the survey
Types of Questions
Closed-end Questions
◦ Select from a short list of defined choices
Example: Major: __business __liberal arts
__science __other
Open-end Questions
◦ Respondents are free to respond with any value, words, or
statement
Example: What did you like best about this course?
Demographic Questions
◦ Questions about the respondents’ personal characteristics
Example: Gender: __Female __ Male
Populations and Samples
A Population is the set of all items or individuals of interest
◦ Examples: All likely voters in the next election
All parts produced today
All sales receipts for November
A Sample is a subset of the population
◦ Examples: 1000 voters selected at random for interview
A few parts selected for destructive testing
Every 100th receipt selected for audit
Why Sample?
Less time consuming than a census
Less costly to administer than a census
It is possible to obtain statistical results of a sufficiently high
precision based on samples.
Non-probability samples
Voluntary sample
• A voluntary sample is made up of people
who self-select into the survey
Convenience sample
• A convenience sample is made up of
people who are easy to reach
Statistical Sampling
Items of the sample are chosen based on known or calculable
probabilities
Probability Samples:
- Simple Random
- Stratified
- Systematic
- Cluster
Simple Random Samples
Every individual or item from the population has an equal
chance of being selected
Selection may be with replacement or without replacement
Samples can be obtained from a table of random numbers
or computer random number generators
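A simple random sample drawn with a computer random number generator might look like the sketch below. The population of 100 ID numbers and the fixed seed are assumptions made only so the example is reproducible.

```python
import random

rng = random.Random(42)           # fixed seed (assumption) for reproducibility
population = list(range(1, 101))  # hypothetical frame: ID numbers 1..100

# Without replacement: no individual can be selected more than once.
sample_without_replacement = rng.sample(population, k=10)

# With replacement: the same individual may be selected more than once.
sample_with_replacement = rng.choices(population, k=10)

print("Without replacement:", sample_without_replacement)
print("With replacement:   ", sample_with_replacement)
```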
Stratified Samples
Population divided into subgroups (called strata) according to
some common characteristic
Simple random sample selected from each subgroup
Samples from subgroups are combined into one
[Figure: population divided into 4 strata; a sample is drawn from each stratum]
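The stratified procedure can be sketched as follows. The four strata (years of study), their sizes, and the helper name `stratified_sample` are all hypothetical; the sketch takes the same number of individuals from each stratum, which is one possible allocation among several.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample from each stratum and combine them."""
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, per_stratum))
    return combined

# Hypothetical population divided into 4 strata by year of study.
strata = {
    "year1": [f"year1-{i}" for i in range(40)],
    "year2": [f"year2-{i}" for i in range(30)],
    "year3": [f"year3-{i}" for i in range(20)],
    "year4": [f"year4-{i}" for i in range(10)],
}

sample = stratified_sample(strata, per_stratum=5, rng=random.Random(0))
print(sample)
```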
Systematic Samples
Decide on sample size: n
Divide frame of N individuals into groups of k individuals:
k=N/n
Randomly select one individual from the 1st group
Select every kth individual thereafter
Example: N = 64, n = 8, so k = 8
[Figure: frame of 64 individuals in groups of 8, with the first group marked]
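The steps above can be sketched for the N = 64, n = 8 example. The function name `systematic_sample` and the numbered frame are illustrative assumptions, and the sketch assumes N is an exact multiple of n.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start in the first group, then every k-th individual."""
    k = len(frame) // n       # group size: k = N / n
    start = rng.randrange(k)  # random position within the first group
    return [frame[start + j * k] for j in range(n)]

frame = list(range(1, 65))    # hypothetical frame: individuals numbered 1..64
sample = systematic_sample(frame, n=8, rng=random.Random(1))
print(sample)
```

Whatever the random start, consecutive selected individuals are always k = 8 positions apart.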
Cluster Samples
Population is divided into several “clusters,” each
representative of the population
A simple random sample of clusters is selected
All items in the selected clusters can be used, or items can be
chosen from a cluster using another probability sampling
technique
[Figure: population divided into 16 clusters; randomly selected clusters form the sample]
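A minimal sketch of cluster sampling with 16 clusters. The cluster contents and the decision to select 4 clusters are assumptions for illustration; here every item in each selected cluster is used.

```python
import random

rng = random.Random(7)  # fixed seed (assumption) for reproducibility

# Hypothetical population divided into 16 clusters of 5 items each.
clusters = {c: [f"cluster{c}-item{i}" for i in range(5)] for c in range(1, 17)}

# Step 1: take a simple random sample of clusters.
chosen_clusters = rng.sample(sorted(clusters), k=4)

# Step 2: use all items in the selected clusters.
sample = [item for c in chosen_clusters for item in clusters[c]]

print("Selected clusters:", chosen_clusters)
print("Sample size:", len(sample))
```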
BIAS IN SURVEY SAMPLING
Bias often occurs when the survey sample does not accurately
represent the population
Two causes of bias:
Selection bias
Response bias
Selection bias
Results from an unrepresentative sample
3 types of selection bias
Undercoverage
Non-response
Voluntary response
To improve survey quality: use random sampling
Response bias
Results from problems in the measurement process
Two common causes:
Leading question
Social desirability
Learn to View Statistics with a
Critical Eye
There are three kinds of lies…..
Lies
Damn Lies
Statistics
You need to make statistics work for you, not lie for
you!
Alert
“Statistics don’t lie, statisticians do.”
Exercise 1
Classify the variable implicit in each of these 10 items as quantitative or
qualitative, and state its scale of measurement
1. Age of household head
2. Sex of household head
3. Number of people in household
4. Use of electric heating (yes/no)
5. Number of large appliances used daily
6. Average number of hours heating is on
7. Average number of heating days
8. Household income
9. Average monthly electric bill
10. Ranking of this electric company among 4 electricity suppliers
Problem
An auto analyst is conducting a satisfaction survey, sampling from a list of
10,000 new car buyers. The list includes 2,500 Ford buyers, 2,500 GM
buyers, 2,500 Honda buyers, and 2,500 Toyota buyers. The analyst selects a
sample of 400 car buyers, by randomly sampling 100 buyers of each brand.
Is this an example of a simple random sample?
(A) Yes, because each buyer in the sample was randomly sampled.
(B) Yes, because each buyer in the sample had an equal chance of being
sampled.
(C) Yes, because car buyers of every brand were equally represented in
the sample.
(D) No, because not every possible 400-buyer sample had an equal
chance of being chosen.
(E) No, because the population consisted of purchasers of four different
brands of car.
Problem
Which of the following statements are true?
I. Random sampling is a good way to reduce response bias.
II. To guard against bias from undercoverage, use a convenience
sample.
III. Increasing the sample size tends to reduce survey bias.
IV. To guard against nonresponse bias, use a mail-in survey.
(A) I only
(B) II only
(C) III only
(D) IV only
(E) None of the above.