03 Inferential Statistics2025

The document outlines the fundamentals of statistical inference, focusing on inferential statistics, estimation, and hypothesis testing. It explains key concepts such as populations, samples, parameters, and statistics, as well as the types of data and statistical procedures used in analysis. The document also details the process of hypothesis testing, including null and alternative hypotheses, test statistics, p-values, and the potential for Type I and Type II errors.


Statistical Inference

Dr. Emad Masuadi, BSc, MSc, MPhil, Ph.D.


Assistant Professor of Biostatistics
Institute of Public Health, College of Medicine
United Arab Emirates University
Objectives
At the end of this session, students should be able to:

1. Understand the main keywords in inferential statistics.
2. Differentiate between the two main areas of inferential statistics (estimation and hypothesis testing).
3. Identify the most appropriate statistical procedure to answer their research question.
What is Statistics?

› Statistics is a science that deals with collecting, managing, summarizing, and presenting data, and with making decisions.
› There are two main areas of Statistics:
– Descriptive statistics: provides tabular and graphical techniques and numerical measures for describing data.
– Inferential statistics: provides procedures for analyzing data and making decisions, using the sample (statistic) to infer about the population (parameter).

Descriptive Statistics

Describing data by:

› Tables: frequency tables or relative frequency tables.
› Graphs: pie, bar, line, histogram, boxplot, or stem-and-leaf.
› Numerical measures: mean, median, mode, SD, variance, percentiles, quartiles, or proportion (prevalence).

Inferential statistics: Key Definitions

› A population is the collection of all items or things under consideration (people or objects).
› A sample is a portion of the population selected for analysis (selected at random).
› A parameter is any numerical measure calculated from the population elements.
› A statistic is any numerical measure calculated from the sample elements.

Inferential Statistics

› Making statements about a population by examining sample results.
› We calculate the sample's statistics to infer about the population's parameters.

Sample statistics (known variable) → Inference → Population parameters (unknown constant, but can be estimated from the sample)

Types of data

› Data are the facts, figures, or records that are collected from the sample elements.
› Data can be classified as:
– Categorical (qualitative) data are labels or names used to identify attributes of the sample elements. The labels can be numbers with no real numerical meaning.
– Numerical (quantitative) data are numbers (with real meaning), representing measurements, obtained from the sample elements.

Statistical Inference
• Estimation:
What is the best estimate (approximation) of the value of a population characteristic, and how precise is the estimate?
(You are looking for a value or a range: point and interval estimation.)

• Hypothesis testing:
What is the evidence for or against the population characteristic having a specific value?
(You are looking for a yes or no: accept or reject a certain claim.)
Example
What is the mean height of adult females in UAE?
1. Is this an estimation or a hypothesis testing research question?
Estimation
2. What is the population of interest?
All adult females in UAE
3. What is the variable of interest?
The height
4. What is the type of this variable?
Numerical - Continuous
5. What is the parameter?
The mean height of all adult females in UAE
6. What is the statistic you will use to approximate the parameter?
The Sample mean

Estimation
› There are two types of estimation:
– Point estimation
– Interval estimation.

› Point estimation is the process of finding a point estimate for a parameter.

› A point estimate of a parameter is a value of some statistic. It is a number calculated from a random sample taken from the population, and it is an approximate value for the parameter.

Special point estimates

Parameter                                   Point estimate
$\mu$ = population mean                     $\bar{X}$ = sample mean
$\sigma$ = population standard deviation    $S$ = sample standard deviation
$\sigma^2$ = population variance            $S^2$ = sample variance
$P$ = population proportion                 $\hat{P}$ = sample proportion
$\mu_1 - \mu_2$                             $\bar{X}_1 - \bar{X}_2$
$P_1 - P_2$                                 $\hat{P}_1 - \hat{P}_2$

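As a small illustration of the point estimates listed above, the sketch below computes a sample mean, sample standard deviation, sample variance, and sample proportion with Python's standard library. The data values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import statistics

# Hypothetical sample of heights in cm (illustrative values only).
heights = [158.0, 162.5, 155.0, 160.0, 167.5, 159.0]

x_bar = statistics.mean(heights)      # sample mean, estimates the population mean
s = statistics.stdev(heights)         # sample SD (n - 1 denominator), estimates sigma
s2 = statistics.variance(heights)     # sample variance, estimates sigma squared

# Hypothetical binary outcomes: 1 = has the condition, 0 = does not.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
p_hat = sum(outcomes) / len(outcomes)  # sample proportion, estimates P

print(f"mean={x_bar:.2f}  SD={s:.2f}  variance={s2:.2f}  proportion={p_hat:.2f}")
```

Each printed number is a statistic: it would change from sample to sample, while the parameter it estimates stays fixed.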
Point and Interval Estimates
› A point estimate is a single number.
› A confidence interval provides additional information about variability.

[Figure: a confidence interval running from the Lower Confidence Limit (LCL) to the Upper Confidence Limit (UCL), with the point estimate between them; the distance from LCL to UCL is the width of the confidence interval.]

Confidence Interval Estimate
› An interval gives a range of values:
– Takes into consideration variation in sample statistics from sample to
sample
– Based on observation from 1 sample
– Gives information about closeness to unknown population
parameters
– Stated in terms of level of confidence

› Never 100% sure


› The general formula for confidence intervals is:

Point Estimate ± (Critical Value)(Standard Error)

Confidence Level, (1-α)
› The confidence that the interval will contain the unknown population parameter
› Example:
– Suppose confidence level = 95%
– Also written (1 - α) = .95
› A relative frequency interpretation:
– In the long run, 95% of all the confidence intervals that can be constructed will contain the unknown true parameter
› A specific interval either will contain or will not contain the true parameter
– No probability is involved in a specific interval

Common Levels of Confidence
› Commonly used confidence levels are 90%, 95%, and 99%. [The critical value is the z-value, which is calculated from the standard normal distribution.]

Confidence level    Confidence coefficient, 1−α    Critical value, z_{α/2}
80%                 .80                            1.28
90%                 .90                            1.645
95%                 .95                            1.96
98%                 .98                            2.33
99%                 .99                            2.576
99.8%               .998                           3.09
99.9%               .999                           3.29

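The critical values in this table can be recomputed from the standard normal distribution. The sketch below is one way to do it with Python's standard library; the function name `z_critical` is an illustrative choice, not something defined in the slides.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # The upper alpha/2 tail is cut off at the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  z = {z_critical(level):.3f}")
```

A higher confidence level leaves less probability in the tails, so the critical value grows.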
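Confidence Interval for population mean (μ)
› A (1-α)100% confidence interval for the population mean μ is given by

When σ is known:
$$\left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right)$$

When σ is unknown and estimated by S (t distribution with n − 1 degrees of freedom):
$$\left( \bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}} \right)$$
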
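A minimal sketch of the z-based interval, assuming σ is known. The function name and the input values (sample mean 160, σ = 6, n = 36) are hypothetical. The t-based interval has the same shape with the t critical value in place of z; it is not shown here because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) limits of the z-based CI for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)                        # critical value x standard error
    return x_bar - margin, x_bar + margin


# Hypothetical inputs, for illustration only.
low, high = mean_ci_known_sigma(x_bar=160.0, sigma=6.0, n=36)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```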
Interpretation

› We are 95% confident that the true mean age is between 65 and 83 years.
› Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean.
– (This interval either does or does not contain the true mean; there is no probability for a single interval.)

Confidence Interval for population prevalence (P)

› A (1-α)100% confidence interval for the population prevalence (P) is given by

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Example
› A study conducted in the UAE in 2020 included 300 Emiratis, of whom 57 were diabetic (HbA1c ≥ 6.5). Estimate the prevalence of diabetes in the UAE with a 95% confidence level.
› The point estimate of the prevalence is $\hat{p} = \frac{57}{300} = 0.19$
› The 95% CI for the prevalence is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.19 \pm 1.96\sqrt{\frac{0.19(1-0.19)}{300}}$$
› 0.19 ± 0.04439
› The 95% CI is (14.6%, 23.4%)
› We are 95% confident that the prevalence of diabetes in the UAE is between 14.6% and 23.4%.

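The arithmetic in this example can be checked with a short script. This is a sketch using Python's standard library; the function name `proportion_ci` is an illustrative choice, and the inputs are the 57 diabetic participants out of 300 from the example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Return the point estimate, margin of error, and CI for a prevalence."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


p_hat, margin, (low, high) = proportion_ci(57, 300)
print(f"p_hat = {p_hat:.2f}, margin = {margin:.4f}")
print(f"95% CI: ({low:.1%}, {high:.1%})")
```

The script uses the unrounded critical value rather than 1.96, so the margin can differ from the hand calculation in the fifth decimal place.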
Margin of Error (E) and interval width
› Margin of Error (E): the amount added to and subtracted from the point estimate to form the confidence interval

For the mean:
$$\bar{x} \pm \underbrace{t_{\alpha/2}\frac{s}{\sqrt{n}}}_{E}$$

For the prevalence:
$$\hat{p} \pm \underbrace{z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}_{E}$$

What is the effect of the following on the margin of error and interval width?

› Data variation, σ/s: E increases as σ/s increases
› Sample size, n: E decreases as n increases
› Level of confidence, 1 - α: E increases if 1 - α increases
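The three effects can be seen numerically. The sketch below uses the z critical value in place of t to stay within the Python standard library, which is a simplification of the formula for the mean; the function name and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(s, n, confidence=0.95):
    """Margin of error for a mean, using z instead of t as a simplification."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / sqrt(n)


base = margin_of_error(s=10, n=100)
more_spread = margin_of_error(s=20, n=100)                   # larger data variation
larger_n = margin_of_error(s=10, n=400)                      # larger sample size
higher_conf = margin_of_error(s=10, n=100, confidence=0.99)  # higher confidence level

print(f"base={base:.3f}  more_spread={more_spread:.3f}  "
      f"larger_n={larger_n:.3f}  higher_conf={higher_conf:.3f}")
```

Because n enters through its square root, quadrupling the sample size only halves the margin of error.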


Hypothesis Testing
Introduction
› Hypothesis testing is one of the two main types of statistical inference. It involves methods used to test a hypothesis or a set of hypotheses at the population level (not at the sample level).

› Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter.

General Testing Procedure
1) State the null and alternative hypotheses.
2) Carry out the experiment, collect the data,
verify the assumptions, and compute the
value of the test statistic.
3) Calculate the p-value.
4) Make a decision on the significance of the
test (reject or fail to reject H0). Make a
conclusion statement in the words of the
original problem.
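The four steps can be sketched end to end with a one-proportion z-test. The counts (57 of 300) are taken from the prevalence example earlier in the document; the hypothesized value p0 = 0.25 and the choice of a z-test are assumptions made only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: state the hypotheses. H0: P = p0 versus H1: P != p0.
p0 = 0.25      # hypothetical claimed prevalence (an assumption for illustration)
alpha = 0.05   # significance level

# Step 2: collect the data and compute the test statistic.
successes, n = 57, 300
p_hat = successes / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error evaluated under H0

# Step 3: calculate the two-sided p-value.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: make a decision and state it in the words of the problem.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
```

In words: under these assumed values, the sample gives evidence that the prevalence differs from 25%.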
Hypotheses
› A Statistical hypothesis is a conjecture about a
population parameter which may or may not be true.
a) There are two hypotheses
– The null hypothesis H0 is a claim (or statement) about a
population parameter that is assumed to be true until it is
declared false.
› The test is designed to assess the strength of the evidence against the null
hypothesis.
– The alternative hypothesis Ha (or H1), a claim about a
population parameter that will be true if the null hypothesis
is false.
› The test is designed to assess the strength of the evidence that supports Ha
b) Equality (no difference) is always in H0
c) The research question is always in H1
d) The alternative hypothesis Ha has three forms: a greater than sign (one-tailed test), a less than sign (one-tailed test), or a not equal to sign (two-tailed test).
– Greater than (>): results if the problem says increases, improves, better, result is higher, etc.
– Less than (<): results if the problem says decreases, reduces, worse than, lower, etc.
– Not equal to (≠): results if the problem says different from, no longer the same, changes, etc.

Example
› Research question:
Is the prevalence of vitamin D deficiency the same in diabetic patients as in non-diabetic patients?

H0: There is no difference in the prevalence of vitamin D deficiency between diabetic and non-diabetic patients
($H_0: P_D = P_{ND}$)
H1: There is a difference in the prevalence of vitamin D deficiency between diabetic and non-diabetic patients
($H_1: P_D \neq P_{ND}$)

Test Statistic and Rejection Region
› The test statistic is a quantity calculated from the sample data that we have collected. It is used to determine the strength of the evidence against H0.
› The rejection region is the set of all test statistic values for which H0 will be rejected (the null hypothesis is rejected if the test statistic value falls in this region).

P-value
› The decision to reject or fail to reject the null hypothesis is based on the p-value of the test.
– The p-value is the probability, assuming the null hypothesis is true, of observing a test statistic value as extreme as or more extreme than the value observed.
– We sometimes take one final step to assess the evidence against H0. We compare the p-value with a fixed value, called the significance level (α). Typical values of α are 0.05 and 0.01.
– The smaller the p-value is, the stronger the evidence against H0 provided by the data.
– We reject H0 if p-value < α or the test statistic falls in the rejection region.
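What counts as "as extreme or more extreme" depends on the form of the alternative hypothesis. The sketch below shows this for a test statistic that follows the standard normal distribution under H0, which is an assumption; the function name and the `alternative` labels are illustrative.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative):
    """p-value for a z statistic under a 'greater', 'less', or 'two-sided' Ha."""
    std_normal = NormalDist()
    if alternative == "greater":    # Ha uses >, so extreme means large z
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # Ha uses <, so extreme means small z
        return std_normal.cdf(z)
    if alternative == "two-sided":  # Ha uses not-equal, so both tails count
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


for alt in ("greater", "less", "two-sided"):
    print(f"{alt}: p-value = {p_value_from_z(1.96, alt):.4f}")
```

The same observed statistic can therefore lead to different decisions under a one-tailed and a two-tailed alternative.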
Errors
Four scenarios when making a decision based on a sample:

Decision                          H0 true          H0 false
Do not reject H0 (H0 accepted)    Great!           Type II Error
Reject H0                         Type I Error     Great!

P(Type I Error) = α (significance level): false rejection of H0
P(Type II Error) = β: H0 accepted given H0 false

Error Types
› A type I error occurs when the null hypothesis (H0) is rejected when in fact H0 is true.
P(Type I error) = P(Reject H0 | H0 is true) = α
› A type II error occurs when we fail to reject the null hypothesis (H0) when in fact H0 is false.
P(Type II error) = P(Fail to reject H0 | H0 is false) = β
› The power of the test is the probability of NOT making a Type II error.
Power = 1 - β = P(Reject H0 | H0 is false)

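To make α, β, and power concrete, the sketch below computes them for one specific setting that is not in the slides: an upper-tailed z-test of a mean with known σ, where H0 is rejected when the z statistic exceeds z_α. The function name and all numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tailed_z(mu0, mu1, sigma, n, alpha=0.05):
    """P(Reject H0) for an upper-tailed z-test when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # boundary of the rejection region
    shift = (mu1 - mu0) * sqrt(n) / sigma    # mean of the z statistic when mu = mu1
    return 1 - std_normal.cdf(z_alpha - shift)


# Hypothetical inputs, for illustration only.
power = power_upper_tailed_z(mu0=100, mu1=103, sigma=10, n=50)
beta = 1 - power                             # P(Type II error)
type_i = power_upper_tailed_z(mu0=100, mu1=100, sigma=10, n=50)  # H0 true

print(f"power = {power:.3f}, beta = {beta:.3f}, P(Type I error) = {type_i:.3f}")
```

When the true mean equals the null value, the rejection probability is exactly α; a larger sample raises the power without changing α.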
Statistical Procedures
Univariate Inference
› Qualitative
– Two groups (One Proportion)(prevalence)(𝑃)
– More than two groups (Chi-square goodness-of-fit test)
› Quantitative
– One mean (𝜇)
– T-test for normal data
– Sign test for non-normal data

Statistical Procedures
Bivariate Inference
1) Qualitative vs Qualitative
a) Two groups each (Two Proportions or Fisher's test)
b) More than two groups for one of them (Chi-square test
of independence)
2) Qualitative vs Quantitative
a) Two groups (Two means independent)
› T-test for normal data
› Mann-Whitney test for non-normal data
b) More than two groups
› (Analysis of variance) (ANOVA) for normal data
› Kruskal-Wallis test for non-normal data

3) Quantitative vs Quantitative
› Pearson correlation for normal data
› Linear regression for normal data
› Spearman correlation for non-normal data

The Three Skills (questions)
There are three skills you need to master in order to
get a proper and reliable data analysis:
› When to use? (next slide)
– When to use a specific statistical procedure
– Depends on types of data (quantitative vs qualitative)
› How to get?
– Apply the procedure using the formulae or software
› What to look for?
– Interpretation (comment on the output from the software)

The Four Questions for (When to use)
There are four questions you need to answer to know
“when to use?” (To select the appropriate statistical
procedure):
› How many variables in the research question?
– The same as the number of questions we ask each individual
› What are the types of these variables?
– Quantitative or qualitative
– Level of measurement (Nominal, ordinal, scale)
› How many groups in the research question?
– The same as number of categories in the qualitative variable
› Which variables are the dependent and independent?
– Dependent variable (Outcome or response)
– Independent variable (cause or risk factor)
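The answers to these questions can be turned into a simple lookup. The sketch below encodes only the bivariate procedures listed on the Statistical Procedures slides; the function name, argument names, and string labels are illustrative choices, and the function does not check any statistical assumptions.

```python
def choose_bivariate_procedure(var1, var2, groups=2, normal=True):
    """Suggest a procedure from two variable types ('qualitative'/'quantitative').

    groups: number of categories in the qualitative variable.
    normal: whether the quantitative data can be treated as normal.
    """
    types = sorted([var1, var2])
    if types == ["qualitative", "qualitative"]:
        if groups == 2:
            return "Two proportions test or Fisher's test"
        return "Chi-square test of independence"
    if types == ["qualitative", "quantitative"]:
        if groups == 2:
            return "Independent t-test" if normal else "Mann-Whitney test"
        return "ANOVA" if normal else "Kruskal-Wallis test"
    if types == ["quantitative", "quantitative"]:
        if normal:
            return "Pearson correlation or linear regression"
        return "Spearman correlation"
    raise ValueError("variable types must be 'qualitative' or 'quantitative'")


print(choose_bivariate_procedure("qualitative", "quantitative", groups=3))
print(choose_bivariate_procedure("quantitative", "quantitative", normal=False))
```

For instance, comparing a numerical outcome across three groups with normal data points to ANOVA, as on the slide.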

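Inferential Statistics
Choosing a procedure from the predictor variable ('Independent') and the outcome variable ('Dependent'):

› Categorical (Qualitative) predictor, two groups
– Categorical (Qualitative) outcome: Chi-Square / Fisher Exact test (small numbers)
– Numerical (Quantitative) outcome: unpaired t-test (independent samples) or paired t-test (same sample, two readings)
› Categorical (Qualitative) predictor, more than two groups
– Categorical (Qualitative) outcome: Chi-Square / Fisher Exact test (small numbers)
– Numerical (Quantitative) outcome: ANOVA
› Numerical (Quantitative) predictor
– Categorical (Qualitative) outcome: Logistic Regression (regroup the outcome variable into two categories if required)
– Numerical (Quantitative) outcome: Correlations (Pearson,..) / Linear Regression
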
4/8/2025
