Statistical Inferences

The document discusses various statistical inference techniques including hypothesis testing, p-values, confidence intervals, t-tests, and chi-square tests. It provides examples and steps for performing hypothesis testing, including stating hypotheses, choosing a significance level, computing test statistics, making decisions, and interpreting results.

Uploaded by

Kashaf Naveed

Statistical inferences

• Using the data from the sample to provide information
about the population parameter
• Accurate sample data make it possible to generalize
the findings to the population of interest.

• Common approaches:
1. Hypothesis testing
2. P-value
3. Confidence interval
Hypothesis testing
• Hypothesis testing is the process used to evaluate
the strength of evidence from the sample and
provides a framework for making determinations
related to the population
Steps for hypothesis testing
• Step 1: State the null and alternate hypotheses
• Step 2: Decide on the significance level, α
• Step 3: Compute/calculate the test statistic
• Step 4: Decision to reject (or fail to reject) the null
hypothesis
• Step 5: Interpretation
Hypothesis testing
• Helps us to choose between 2 conclusions
1. The treatment works versus the treatment does not work
2. There is an association versus there is no association

• Null Hypothesis: a tentative position that there is no association/difference
between the 2 sets of values
• Mathematically, it would have an equal sign
H0: µ1 = µ2
H0: µ1 ≥ µ2
H0: µ1 ≤ µ2
• Alternative hypothesis: a tentative position that there is an association or
difference between the 2 sets of values
• Mathematically, it would not have an equal sign
HA: µ1 ≠ µ2
HA: µ1 < µ2
HA: µ1 > µ2

We generally try to disprove the null hypothesis


Step 1: State the null and alternate hypotheses
Can we conclude that a certain population mean is not
50? (this is 2-tailed)
H0: µ = 50
Ha: µ ≠ 50
Can we conclude that a certain population mean is
greater than 50? (this is 1-tailed to the right)
H0: µ ≤ 50
Ha: µ > 50
Step 2: Decide on the significance level
• The level of significance (α) is the probability of rejecting
a true null hypothesis.
• It marks off the tail area of the normal distribution
curve that forms the rejection (significant) region
• We normally take this value as 0.05
Step 2: Decide on the significance level
• 1 tailed or 2 tailed?
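The choice between 1 tail and 2 tails changes where the rejection region starts. As a minimal sketch, assuming a test statistic that follows the standard normal distribution (the function name `critical_z` is made up for illustration), the critical values for α = 0.05 could be obtained like this:

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """Return the positive critical z value for significance level alpha."""
    # A 2-tailed test splits alpha between both tails; a 1-tailed test keeps it in one.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

z_two = critical_z(0.05, two_tailed=True)   # about 1.96
z_one = critical_z(0.05, two_tailed=False)  # about 1.645
print(round(z_two, 3), round(z_one, 3))
```

For the same α, the 1-tailed cut-off sits closer to the centre of the curve, because all of α is placed in a single tail.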
Step 3: Compute the test statistic
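The formula on this slide did not survive extraction. For a sample mean with a known population standard deviation, the statistic that matches the worked example in this deck (Z = 1.85/0.55) would presumably be the following, where x̄ is the sample mean, µ0 the mean stated in H0, σ the population standard deviation and n the sample size:

```latex
Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```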
Step 4: Decision
• If the computed value of the test statistic falls in the
rejection region, it is said to be significant and we
reject the null hypothesis. If not, we fail to reject
the null hypothesis
Step 5: Interpretation
• Based on the above steps, we provide the
interpretation as: “we have sufficient evidence to
conclude that…..”
P-value
• The p value for a test may be defined as the smallest
value of α for which the null hypothesis can be
rejected.

• Typically, an estimate that has a p value < 0.05 is considered
to be “statistically significant” or unlikely to occur due to
chance alone.
Confidence interval
• A range of values, calculated from the sample, that is expected to
contain the population parameter in a stated proportion (the
confidence level) of repeated samples.
• Analysts most often use 95% or 99% confidence levels.

Confidence interval = (Estimator) ± Z × (standard error)


Example
• In a study, a sample of 100 subjects was selected by
simple random sampling. Their mean age was found
to be 54.85. Perform a test of significance to
determine the likelihood that such a sample mean
comes from a population whose mean is 53, given
that σ = 5.50 and α = 0.05.
Example
• Step 1: State the H0 and Ha
H0: µ = 53
Ha: µ ≠ 53
• Step 2: Determine the α = 0.05
Example
• Step 3: Compute the test statistic

Z = (x̄ − µ) / (σ/√n)
Z = (54.85 − 53) / (5.50/√100)
Z = 1.85/0.55
Z = 3.36
Example
• Step 4: Decision

Z = 3.36 lies in the rejection region (beyond the 2-tailed
critical values of ±1.96 at α = 0.05), hence it is
significant and we reject the H0 (and accept Ha)
Example
• Step 5: Interpretation

We have sufficient evidence to conclude that the
sample mean DOES NOT come from a population
whose mean is 53
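The five steps of this example can be sketched in a few lines. This is only an illustration using the numbers given in the example; the variable names are invented here:

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example
sample_mean = 54.85
mu_0 = 53.0        # population mean stated in H0
sigma = 5.50       # known population standard deviation
n = 100
alpha = 0.05

standard_error = sigma / sqrt(n)              # 0.55
z = (sample_mean - mu_0) / standard_error     # about 3.36
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 2-tailed test

# Reject H0 when the statistic falls beyond either critical value
reject_h0 = abs(z) > z_crit
print(f"Z = {z:.2f}, critical value = ±{z_crit:.2f}, reject H0: {reject_h0}")
```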
Determining the p-value…
• Zcalc = 3.36
• The area under the curve between 0 and Zcalc comes out as = 0.4996
• Area in one tail: 0.5 – 0.4996 = 0.0004
• Because Ha is 2-tailed, p-value = 2 × 0.0004 = 0.0008
• As the p-value is < 0.05, we reject H0.
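The table lookup can be checked numerically. The sketch below assumes the standard normal distribution and uses Python's `statistics.NormalDist`; because Ha in this example is 2-sided (µ ≠ 53), it also reports the area in both tails:

```python
from statistics import NormalDist

z_calc = 3.36  # test statistic from the example

upper_tail = 1 - NormalDist().cdf(abs(z_calc))  # area beyond |Z| in one tail
p_two_tailed = 2 * upper_tail                   # both tails, since Ha is µ ≠ 53

print(f"one-tail area = {upper_tail:.4f}, two-tailed p = {p_two_tailed:.4f}")
```

Either value is far below 0.05, so the decision to reject H0 does not change.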
Determining the confidence interval…

Sample mean = 54.85
Standard deviation = 5.50
n = 100
z = 1.96 (at 95%)

95% CI = 54.85 ± 1.96 × (5.50/√100)
95% CI = 54.85 ± 1.078
95% CI = (53.772, 55.928)

As this range does not capture the value stated at H0 (53),
we reject H0
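The same interval can be reproduced from the formula Confidence interval = (Estimator) ± Z × (standard error). A minimal sketch, with variable names chosen here for illustration:

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 54.85
sigma = 5.50
n = 100
confidence = 0.95
mu_0 = 53.0  # value stated in H0

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
margin = z * sigma / sqrt(n)                        # Z × standard error
ci_lower, ci_upper = sample_mean - margin, sample_mean + margin

print(f"95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
print("H0 value inside the interval:", ci_lower <= mu_0 <= ci_upper)
```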
T-test
Chi-square test
T-test
• Developed by William Sealy Gosset and published in 1908 under the
pseudonym “Student”
• A method of testing hypotheses about the mean of a small sample
drawn from a normally distributed population when the standard
deviation of the population is unknown.
• The formula, t = (x̄ − µ) / (s/√n)
• The t-distribution has a single parameter called the number of
degrees of freedom (df); this is equal to the sample size minus 1.
df = n-1

• For large samples (> 30) the sample variance closely
approximates the population variance. In this situation, for all
practical purposes, the t-statistic behaves almost identically to the
z-statistic.
Example
• The level of phosphate, in mg/dl in the blood of a patient undergoing
dialysis treatment was measured on six consecutive visits.
5.6, 5.1, 4.6, 4.8, 5.7, 6.4.

Calculate the sample mean = ?
Calculate the sample standard deviation = ?
Example
Calculate the sample mean = 5.4 mg/dl
Calculate the sample standard deviation = 0.67 mg/dl
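These two values can be checked with Python's `statistics` module. Note that `stdev` divides by n − 1 (the sample standard deviation), and that the unrounded mean is about 5.37, which the slide reports as 5.4:

```python
from statistics import mean, stdev

phosphate = [5.6, 5.1, 4.6, 4.8, 5.7, 6.4]  # mg/dl, from the example

sample_mean = mean(phosphate)
sample_sd = stdev(phosphate)  # sample standard deviation (divides by n - 1)

print(f"mean = {sample_mean:.2f} mg/dl, SD = {sample_sd:.2f} mg/dl")
```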

• Test the hypothesis that the sample belongs to a population
whose mean phosphate level is 4.0 mg/dl. Assume α = 0.01; the
critical t value at df = 5 is 4.032
• Step 1: State the H0 and Ha
H0: µ = 4 mg/dl
Ha: µ ≠ 4 mg/dl (2-tailed)

• Step 2: Determine the α = 0.01. The t-table shows that the 2-tailed
critical t value at df = 5 is 4.032
• Step 3: Compute the test statistic

t = (5.4-4.0)/(0.67/√6) = 5.12

• Step 4: Decision
As tcal (5.12) > ttab (4.032), we reject H0

• Step 5: Interpretation
We have sufficient evidence to conclude that the sample
DID NOT belong to a population whose mean
phosphate level is 4 mg/dl at the 0.01 level of significance.
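A sketch of the same one-sample t-test in code follows. The critical value 4.032 is copied from the example rather than computed, since the standard library has no t-distribution. Working from the unrounded mean and standard deviation gives t of roughly 5.0 rather than 5.12, which leads to the same decision:

```python
from math import sqrt
from statistics import mean, stdev

phosphate = [5.6, 5.1, 4.6, 4.8, 5.7, 6.4]  # mg/dl
mu_0 = 4.0       # population mean stated in H0
t_table = 4.032  # 2-tailed critical value for α = 0.01, df = 5 (taken from the example)

n = len(phosphate)
df = n - 1
t_calc = (mean(phosphate) - mu_0) / (stdev(phosphate) / sqrt(n))

# 2-tailed decision: reject H0 when |t| exceeds the table value
reject_h0 = abs(t_calc) > t_table
print(f"df = {df}, t = {t_calc:.2f}, reject H0: {reject_h0}")
```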
Types of T-test
• One sample t test – we have only 1 group; want to
test against a hypothetical mean.

• Independent samples t test – we have 2 means, 2
groups; no relation between groups. Eg, mean heights
between male and female students of a college

• Paired t test – It consists of samples of matched pairs
of similar units or one group of units tested twice. Eg,
mean Hb among females before and after
administering intervention X.
Comparison of 2 Sample Means
• Independent samples’ T test
• Assumes normally distributed
continuous data.

T value = (difference between means) / (standard error of difference)

• T value then looked up in Table to
determine significance
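As an illustration of "difference between means over standard error of difference", here is a minimal sketch. The heights are invented for this example, and the pooled-variance form of the standard error is an assumption (it presumes both groups share a common variance); the slide does not say which form it intends:

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical heights in cm, invented purely for illustration
group_a = [172.0, 168.5, 175.2, 180.1, 169.8, 174.4]
group_b = [160.3, 165.1, 158.7, 162.9, 166.4, 161.0]

n_a, n_b = len(group_a), len(group_b)
mean_diff = mean(group_a) - mean(group_b)

# Pooled variance: assumes both groups share a common population variance
pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_value = mean_diff / se_diff
df = n_a + n_b - 2
print(f"t = {t_value:.2f} with df = {df}")
```

The resulting t value would then be compared with the t-table entry for df = n1 + n2 − 2 at the chosen α.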
Paired T Tests
• Uses the change before and
after intervention in a single
individual
• Reduces the degree of
variability between the
groups

T value = (difference between means) / (standard error of difference)

• Given the same number of
patients, has greater power to
detect a difference between
groups
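For the paired case, the calculation works on each individual's change. The sketch below uses made-up Hb values for six subjects measured before and after an intervention; the numbers are hypothetical and only show the mechanics:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical Hb values (g/dl) for the same six subjects, invented for illustration
before = [10.2, 11.0, 9.8, 10.5, 11.3, 10.0]
after = [11.1, 11.4, 10.9, 11.0, 12.0, 10.8]

# Work with the change within each individual
differences = [a - b for a, b in zip(after, before)]
n = len(differences)

mean_diff = mean(differences)
se_diff = stdev(differences) / sqrt(n)  # standard error of the mean difference

t_value = mean_diff / se_diff
df = n - 1
print(f"mean change = {mean_diff:.2f}, t = {t_value:.2f}, df = {df}")
```

Because each subject acts as their own control, the between-subject variability drops out of the standard error, which is where the extra power comes from.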
Chi-square tests
• Applicable for qualitative (categorical) data
• Performed to determine associations between 2 (categorical)
variables
• Chi-square test compares the observed frequencies
with the expected frequencies.
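• Formula, χ² = Σ (O − E)² / E
where O = observed frequency and E = expected frequency
• Expected value = (RT × CT) / TT
where RT = row total, CT = column total, TT = grand total
• df = (r-1)(c-1)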
Example
• Results from the Isoniazid drug trial after 6 months of follow up
yielded the following:

            Dead   Alive   TOTAL
Placebo      21     110     131
Isoniazid    11     121     132
TOTAL        32     231     263

• Test at α = 0.05, if an association is present between use of isoniazid
and mortality.
Chi-Squared (2) Test-an example
Step-1: Hypotheses
H0: There is no association between isoniazid and mortality
HA: There is an association between isoniazid and mortality

Step-2: State the α (α = 0.05). Critical chi-square value at α = 0.05
and df = 1 is 3.84

Step-3: Calculations
O      E       O-E    (O-E)²   (O-E)²/E
21     15.9     5.1    26.0     1.6
11     16.1    -5.1    26.0     1.6
110    115.1   -5.1    26.0     0.2
121    115.9    5.1    26.0     0.2
                       Total    3.6

Chi-square = Σ (O-E)²/E ≈ 3.6 (3.65 when the intermediate values are not rounded)
Chi-Squared (2) Test-an example
Step-4: Decision
As χ²cal < χ²tab (3.84), we fail to reject H0

Step-5: Interpretation
We do not have sufficient evidence to conclude that there is an
association between isoniazid and mortality at α = 0.05
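The whole chi-square example can be sketched in code as follows. The critical value 3.84 is copied from the example, not computed. Without rounding the intermediate values the statistic comes out near 3.65, slightly different from the hand calculation but leading to the same decision:

```python
# Observed counts from the isoniazid trial table: rows = treatment, columns = (dead, alive)
observed = [
    [21, 110],  # Placebo
    [11, 121],  # Isoniazid
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total  # RT × CT / TT
        chi_square += (o - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi_table = 3.84  # critical value for α = 0.05, df = 1 (taken from the example)

reject_h0 = chi_square > chi_table
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```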
Errors in statistical inferences
• 2 types:
• Type 1 error (α error)
reject H0 when H0 is true
• Type 2 error (β error)
fail to reject H0 when H0 is false
Type 1 error
• It means rejecting the null hypothesis when it’s
actually true.
• It means concluding that results are statistically
significant when, in reality, they came about purely by
chance or because of unrelated factors.

Example:
You decide to get tested for COVID-19 based on mild
symptoms.

Type 1 error: the test result says you have coronavirus,
but you actually do not.
Type 2 error
• It means not rejecting the null hypothesis when it’s
actually false.
• It means failing to conclude there was an effect when
there actually was.
Example:
You decide to get tested for COVID-19 based on mild
symptoms.

Type 2 error: the test result says you don’t have
coronavirus, but you actually do.
How to avoid these errors?
• Type 1 error:
• The α level is usually set at 0.05 or 5%.
• Lowering α (for example, to 0.01) reduces the probability of a type 1 error.
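The link between α and the type 1 error rate can be illustrated with a small simulation. This is a sketch under invented settings (samples of 30 drawn from a normal population with mean 53 and σ = 5.5, so that H0 is true by construction); the function name `type1_error_rate` is hypothetical:

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def type1_error_rate(alpha, trials=2000, n=30, mu=53.0, sigma=5.5, seed=0):
    """Estimate how often a 2-tailed z-test rejects H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # data drawn with H0 true
        z = (mean(sample) - mu) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a false rejection, i.e. a type 1 error
    return rejections / trials

rate_05 = type1_error_rate(0.05)
rate_01 = type1_error_rate(0.01)
print(f"alpha = 0.05 -> {rate_05:.3f}, alpha = 0.01 -> {rate_01:.3f}")
```

The observed rejection rates should land close to the chosen α values, and the stricter α gives fewer false rejections.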
How to avoid these errors?
• Type 2 error:
• The risk of making a type 2 error is inversely related to the
statistical power of a test.
• Statistical power is determined by:
• Effect size: Larger effects are more easily detected.
• Measurement error: Systematic and random errors in recorded
data reduce power.
• Sample size: Larger samples reduce sampling error and increase
power.
• Significance level: Increasing the significance level increases
power.
• Increasing the sample size or the significance level will
reduce the probability of type 2 error.
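To see how sample size and significance level affect power, the sketch below computes the power of a 2-tailed one-sample z-test. The scenario (a true difference of 1 unit with σ = 5.5) and the function name `z_test_power` are assumptions made for illustration:

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha):
    """Power of a 2-tailed one-sample z-test when the true mean differs from H0 by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / sqrt(n))  # true difference in standard-error units
    # Probability that Z lands in either rejection region under the alternative
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Hypothetical scenario: true difference of 1 unit, sigma = 5.5
for n, alpha in [(50, 0.05), (200, 0.05), (50, 0.10)]:
    power = z_test_power(effect=1.0, sigma=5.5, n=n, alpha=alpha)
    print(f"n = {n:3d}, alpha = {alpha:.2f}: power = {power:.2f}, beta = {1 - power:.2f}")
```

Here β = 1 − power is the probability of a type 2 error, so a larger n or a larger α both reduce it, at the cost of more data or more type 1 errors respectively.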
