
STAT 453 Study Guide

Nils DM
February 18, 2022

1 Introduction
1. Statistical Design of Experiments: The process of planning the
experiment so that appropriate data will be collected and analysed by
statistical methods.

2. Potential objectives of experiments:

(a) Determine which variables are most influential on the response y


(b) Determine where to set the influential x’s so that y is almost
always near the desired nominal value.
(c) Determine where to set the influential x’s so that variability in y
is small.
(d) Determine where to set the influential x’s so that the effects of the
uncontrollable variables z1 , . . . , zq are minimized.

3. The three basic principles:

(a) Randomization
(b) Replication
(c) Blocking (and the factorial principle)

4. Strategy of Experimentation:

(a) Best guess experiments


(b) One factor at a time experiments
(c) Statistically designed experiments based on Fisher’s factorial concept.

5. Guidelines for Designing Experiments:

(a) Recognition and statement of the problem.


(b) Selection of the response variable
(c) Choice of factors, levels, and range
(d) Choice of experimental design
(e) Performing the experiment
(f) Statistical analysis of the data
(g) Conclusions and recommendations

6. Factors: Independent variables in the model

7. Levels: The different options for each factor

2 Simple Comparative Experiments
1. Run: An observation.

2. Noise/Error: The fluctuation between runs.

3. Statistical model: one factor with n levels:

yij = µi + ϵij

Where:

(a) i = 1, 2, . . . , n indexes the n levels of the single factor


(b) j = 1, 2, . . . represents the index for the observations (or runs).

We then have:

(a) yij is the j th observation for the factor level i


(b) µi is the mean for the response at the ith factor level.
(c) ϵij is a normal random error associated with yij , usually written:
ϵij ∼ N (0, σi²), so that yij ∼ N (µi , σi²)
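
As a quick numerical illustration of this model (not from the course notes), the sketch below simulates yij = µi + ϵij for a single factor with three levels; the level means, the common error standard deviation, and the number of runs per level are all made-up values.

```python
import numpy as np

rng = np.random.default_rng(453)

mu = {1: 10.0, 2: 12.5, 3: 9.0}   # hypothetical level means mu_i
sigma = 2.0                        # made-up common error standard deviation
n_per_level = 5                    # made-up number of runs per level

# y_ij = mu_i + eps_ij, with eps_ij ~ N(0, sigma^2)
data = {i: m + rng.normal(0.0, sigma, size=n_per_level) for i, m in mu.items()}

for i, y in data.items():
    print(f"level {i}: mean = {y.mean():.2f}, sd = {y.std(ddof=1):.2f}")
```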

4. Two Independent Sample T-Test: Assumptions:

(a) Both samples are drawn from normally distributed populations.


(b) The two samples are independent.

The different cases are:

(a) Case 1: σ1² = σ2², with the common variance unknown

Hypotheses:

H0 : µ1 = µ2
H1 : µ1 ≠ µ2

Test Statistic:

t0 = (ȳ1 − ȳ2) / ( sp √(1/n1 + 1/n2) )

Where the pooled variance is:

sp² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)

If H0 is true, then t0 ∼ t_{n1+n2−2}
Confidence Interval:

(ȳ1 − ȳ2) ± t_{α/2, n1+n2−2} sp √(1/n1 + 1/n2)

(b) Case 2: σ1² ≠ σ2², both unknown

Hypotheses:

H0 : µ1 = µ2
H1 : µ1 ≠ µ2

Test Statistic:

t0 = (ȳ1 − ȳ2) / √(s1²/n1 + s2²/n2)

If H0 is true, then t0 is approximately t_{v}, where v is the Satterthwaite
approximate degrees of freedom:

v = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]

Confidence Interval:

(ȳ1 − ȳ2) ± t_{α/2, v} √(s1²/n1 + s2²/n2)

(c) Case 3: σ1² and σ2² are known
Hypotheses:

H0 : µ1 = µ2
H1 : µ1 ≠ µ2

Test Statistic:

Z0 = (ȳ1 − ȳ2) / √(σ1²/n1 + σ2²/n2)

Under H0 , Z0 ∼ N(0, 1)
Confidence Interval:

(ȳ1 − ȳ2) ± z_{α/2} √(σ1²/n1 + σ2²/n2)

(d) Case 4: Comparing one population mean to a specific value
Hypotheses:

H0 : µ = µ0
H1 : µ ≠ µ0

Test Statistic: if the population is normal with known variance, or
the sample size is large:

Z0 = (ȳ − µ0) / (σ/√n)

Under H0 , Z0 ∼ N(0, 1)
Confidence Interval:

ȳ ± z_{α/2} σ/√n

If the sample size is not large and the variance is unknown:

t0 = (ȳ − µ0) / (s/√n)

Under H0 , t0 ∼ t_{n−1}
Confidence Interval:

ȳ ± t_{α/2, n−1} s/√n
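
The four cases above map onto standard SciPy calls; the sketch below is a minimal illustration with made-up samples, assuming scipy and numpy are available. Cases 1, 2, and 4 use scipy.stats.ttest_ind and scipy.stats.ttest_1samp; SciPy has no built-in two-sample z-test, so case 3 is computed directly from the formula above with assumed known variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y1 = rng.normal(10.0, 2.0, size=12)   # made-up sample 1
y2 = rng.normal(11.0, 2.0, size=15)   # made-up sample 2

# Case 1: variances assumed equal but unknown -> pooled t-test
t_pooled, p_pooled = stats.ttest_ind(y1, y2, equal_var=True)

# Case 2: variances unequal and unknown -> Welch t-test (Satterthwaite df)
t_welch, p_welch = stats.ttest_ind(y1, y2, equal_var=False)

# Case 3: variances known -> z statistic from the formula above
sigma1_sq, sigma2_sq = 4.0, 4.0        # assumed known variances
z0 = (y1.mean() - y2.mean()) / np.sqrt(sigma1_sq / len(y1) + sigma2_sq / len(y2))
p_z = 2 * stats.norm.sf(abs(z0))

# Case 4: one sample compared with a specific value mu0
t_one, p_one = stats.ttest_1samp(y1, popmean=10.0)

print(p_pooled, p_welch, p_z, p_one)
```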

5. Checking for equal variance: To check whether two independent,
normally distributed populations have equal variances, we examine the
ratio of their sample variances, which is F distributed when σ1² = σ2²:

s1²/s2² ∼ F_{n1−1, n2−1}
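
A minimal sketch of this variance-ratio check with made-up samples; the two-sided p-value is taken as twice the smaller tail of the F distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y1 = rng.normal(0.0, 2.0, size=10)    # made-up sample 1
y2 = rng.normal(0.0, 2.0, size=12)    # made-up sample 2

s1_sq = np.var(y1, ddof=1)
s2_sq = np.var(y2, ddof=1)
f0 = s1_sq / s2_sq                    # ~ F(n1-1, n2-1) under H0: sigma1^2 = sigma2^2

df1, df2 = len(y1) - 1, len(y2) - 1
p_two_sided = 2 * min(stats.f.cdf(f0, df1, df2), stats.f.sf(f0, df1, df2))
print(f"F0 = {f0:.3f}, two-sided p = {p_two_sided:.3f}")
```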

6. Rejection Regions and P-values: There are three possible cases:

(a) Two-sided Hypothesis Test:
Rejection Region: reject H0 if |t0| > t_{α/2}
P-value: 2P(T > |t0|) for H1 : µ1 ≠ µ2
(b) One-sided Hypothesis Test (Greater):
Rejection Region: reject H0 if t0 > t_{α}
P-value: P(T > t0) for H1 : µ1 > µ2
(c) One-sided Hypothesis Test (Lower):
Rejection Region: reject H0 if t0 < −t_{α}
P-value: P(T < t0) for H1 : µ1 < µ2
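
These p-value and rejection-region rules are easy to compute from a t statistic and its degrees of freedom; the values of t0, the degrees of freedom, and α in the sketch below are made up.

```python
from scipy import stats

t0, df = 2.31, 18                      # made-up t statistic and degrees of freedom
alpha = 0.05

p_two = 2 * stats.t.sf(abs(t0), df)    # H1: mu1 != mu2
p_gt  = stats.t.sf(t0, df)             # H1: mu1 >  mu2
p_lt  = stats.t.cdf(t0, df)            # H1: mu1 <  mu2

# equivalent rejection-region checks
reject_two = abs(t0) > stats.t.ppf(1 - alpha / 2, df)
reject_gt  = t0 > stats.t.ppf(1 - alpha, df)
reject_lt  = t0 < -stats.t.ppf(1 - alpha, df)
print(p_two, p_gt, p_lt, reject_two, reject_gt, reject_lt)
```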

3 Experiments with a Single Factor: The Analysis of Variance
1. Treatments: The different levels of a factor.

2. ANOVA: A generalization of the two-sample t-test to a treatments;
it allows us to compare the differences among all factor-level means.

3. Calculating Total Runs: If we have a levels of a factor with n
replicates (full runs of the experiment), then we have:

a × n = N Runs.

If the runs are chosen in a random order, then the design is called a
completely randomized design (CRD).

4. ANOVA model:
We specify the model as:

Yij = µ + τi + ϵij
For:

(a) i = 1, . . . , a
(b) j = 1, . . . , n

Where τi is the ith treatment effect, and ϵij ∼ N (0, σ 2 )


We can simplify the model by setting:

µi = µ + τ i

Which is usually referred to as the means model.


Useful tips:

(a) SST = SSTreatments + SSE
(b) dfTotal = dfTreatments + dfError , i.e. an − 1 = (a − 1) + a(n − 1)
(c) MSTreatments = SSTreatments / (a − 1)
(d) MSError = SSError / [a(n − 1)]
(e) F0 = MSTreatments / MSError ∼ F_{a−1, a(n−1)} under H0 ;
reject H0 if F0 > F_{α, a−1, a(n−1)}
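
The identities in the tips above can be verified numerically. The sketch below builds the one-way ANOVA quantities by hand for made-up data with a = 3 levels and n = 4 replicates, then compares against scipy.stats.f_oneway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a, n = 3, 4
mu_i = np.array([10.0, 12.0, 9.5])                      # made-up treatment means
y = mu_i[:, None] + rng.normal(0.0, 1.5, size=(a, n))   # y[i, j]

grand_mean = y.mean()
ss_treat = n * ((y.mean(axis=1) - grand_mean) ** 2).sum()
ss_error = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum()
ss_total = ((y - grand_mean) ** 2).sum()

ms_treat = ss_treat / (a - 1)
ms_error = ss_error / (a * (n - 1))
f0 = ms_treat / ms_error
p = stats.f.sf(f0, a - 1, a * (n - 1))

print("SS identity holds:", np.isclose(ss_total, ss_treat + ss_error))
print(f"F0 = {f0:.3f}, p = {p:.4f}")
print(stats.f_oneway(*y))                               # same F and p from SciPy
```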

5. Model Adequacy: The following tests are used to determine if the
model is acceptable:

(a) Normality: QQ plot of the residuals.


(b) Constant Variance: Residuals versus fitted values.
(c) Independence: Residuals versus Run Order.
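
A minimal sketch of these three diagnostic plots for a one-way fit (residual = observation minus its level mean), assuming matplotlib and SciPy are available and that the made-up observations are stored in run order.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(4)
y = np.array([10.0, 12.0, 9.5])[:, None] + rng.normal(0.0, 1.5, size=(3, 4))
fitted = np.repeat(y.mean(axis=1, keepdims=True), y.shape[1], axis=1)
resid = (y - fitted).ravel()
run_order = np.arange(resid.size)                 # assumes data are listed in run order

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
stats.probplot(resid, dist="norm", plot=axes[0])  # (a) normality: QQ plot
axes[1].scatter(fitted.ravel(), resid)            # (b) constant variance
axes[1].set_xlabel("fitted value")
axes[1].set_ylabel("residual")
axes[2].scatter(run_order, resid)                 # (c) independence vs run order
axes[2].set_xlabel("run order")
plt.tight_layout()
plt.show()
```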

6. Multiple Comparisons Problem:
ANOVA can tell us that at least one of the means is significantly different,
but it cannot tell us which one(s). There are three methods used:

(a) Tukey’s method: Two means are significantly different if the
absolute value of the difference in their sample means exceeds:

Tα = q_{α}(a, f) √(MSE / n)

where q_{α}(a, f) is the upper-α point of the studentized range
distribution for a treatments and f error degrees of freedom.

(b) Bonferroni correction: If we are performing M pairwise t-tests,
compute each test’s p-value pm and reject the corresponding null
hypothesis only if pm ≤ α/M .
(c) FDR (false discovery rate) control.
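
Tukey's cutoff Tα = qα(a, f)·√(MSE/n) can be computed with the studentized range distribution, available in recent SciPy versions as scipy.stats.studentized_range; the sketch below uses made-up values of a, n, MSE, and α, and also prints the Bonferroni per-test level for the M = a(a − 1)/2 pairwise comparisons.

```python
import numpy as np
from scipy import stats

a, n = 4, 5                     # made-up: 4 treatments, 5 replicates each
f = a * (n - 1)                 # error degrees of freedom
mse = 2.3                       # made-up mean square error
alpha = 0.05

q = stats.studentized_range.ppf(1 - alpha, a, f)   # q_alpha(a, f)
T_alpha = q * np.sqrt(mse / n)
print(f"Tukey cutoff T_alpha = {T_alpha:.3f}")

# Bonferroni: with M = a(a-1)/2 pairwise tests, reject when p_m <= alpha / M
M = a * (a - 1) // 2
print(f"Bonferroni per-test level = {alpha / M:.4f}")
```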

4 Randomized Blocks, Latin Squares, and Related Designs
1. Nuisance Factors: factors that affect the response variable but that
we are not interested in. How do we address them?

(a) Unknown and uncontrollable: Randomization


(b) Known and uncontrollable: Analysis of covariance (ANCOVA)
(c) Known and controllable: Blocking

2. Randomized Complete Block Design (RCBD): An experimental


design where:

(a) Each block contains all treatments


(b) Within a block, the order in which the different levels are tested
is randomly determined

Suppose we have a treatments to be compared and b blocks. The effects
model is:
Yij = µ + τi + βj + ϵij
Where βj is the effect of the j th block, for j = 1, . . . , b. We can simplify
the model to:
yij = µij + ϵij
where µij = µ + τi + βj .
Degrees of Freedom:

SST = SSTreatments + SSBlocks + SSE

ab − 1 = (a − 1) + (b − 1) + (a − 1)(b − 1)

We reject H0 if F0 = MSTreatments / MSE > F_{α, a−1, (a−1)(b−1)}


Important: σ̂² = SSE / [(a − 1)(b − 1)] = MSE, and E(σ̂²) = σ²
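
A minimal by-hand sketch of the RCBD sums of squares and F test, using made-up treatment effects, block effects, and error variance for a = 3 treatments and b = 4 blocks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
a, b = 3, 4
tau  = np.array([0.0, 1.5, -1.0])[:, None]        # made-up treatment effects tau_i
beta = np.array([0.5, -0.5, 1.0, 0.0])[None, :]   # made-up block effects beta_j
y = 10.0 + tau + beta + rng.normal(0.0, 1.0, size=(a, b))   # y[i, j]

grand = y.mean()
ss_treat = b * ((y.mean(axis=1) - grand) ** 2).sum()
ss_block = a * ((y.mean(axis=0) - grand) ** 2).sum()
ss_total = ((y - grand) ** 2).sum()
ss_error = ss_total - ss_treat - ss_block

f0 = (ss_treat / (a - 1)) / (ss_error / ((a - 1) * (b - 1)))
p = stats.f.sf(f0, a - 1, (a - 1) * (b - 1))
print(f"F0 = {f0:.3f}, p = {p:.4f}")
```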

3. Latin Square Design:

(a) Utilize the blocking principle to eliminate two nuisance sources of


variability.
(b) Rules:
i. Each treatment letter occurs only once per row and column.

ii. The observations in the Latin square should be taken in ran-
dom order.
(c) Model:
Yijk = µ + αi + τj + βk + ϵijk
(d) Degrees of Freedom:

SST = SSRows + SSColumns + SSTreatments + SSE

p² − 1 = (p − 1) + (p − 1) + (p − 1) + (p − 2)(p − 1)

where p is the common number of rows, columns, and treatments
(i, j, k = 1, . . . , p).


(e) Test Statistic:

F0 = MSTreatments / MSE ∼ F_{p−1, (p−2)(p−1)} under H0 ;
reject H0 if F0 > F_{α, p−1, (p−2)(p−1)}
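
One simple way to satisfy the rules above is a cyclic construction; the sketch below (made-up p = 4) builds a p × p Latin square and then shuffles whole rows and columns, which preserves the Latin property, before the runs would be randomized.

```python
import numpy as np

rng = np.random.default_rng(6)
p = 4                                              # made-up square size
letters = np.array(list("ABCDEFGHIJ")[:p])

# cyclic Latin square: treatment in cell (i, k) is letter (i + k) mod p
square = letters[(np.arange(p)[:, None] + np.arange(p)[None, :]) % p]

# shuffling whole rows and whole columns preserves the Latin property
square = square[rng.permutation(p), :][:, rng.permutation(p)]
print(square)
```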

4. Balanced Incomplete Block Designs (BIBD):


(a) Used when we are considering a randomized block design but cannot
run all treatment combinations in each block.
(b) The treatment combinations in each block should be selected in a
balanced manner; any pair of treatments occurs together the same
number of times as any other pair.

(c) Assume we have a treatments and b blocks, where each block
contains k < a treatments.
(d) Each treatment occurs r times in the design, therefore there are:

N = ar = bk

total observations.
(e) The number of times each pair of treatments appears in the same
block is:

λ = r(k − 1) / (a − 1)
5. Model:
yij = µ + τi + βj + ϵij
Same assumptions as for the RCBD.
6. Degrees of freedom:

SST = SSTreatments(adjusted) + SSBlocks + SSE

7. Test Statistic:

F0 = MSTreatments(adjusted) / MSE ∼ F_{a−1, N−a−b+1} under H0 ;
reject H0 if F0 > F_{α, a−1, N−a−b+1}
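
The relations N = ar = bk and λ = r(k − 1)/(a − 1) are easy to check for candidate parameters; the sketch below uses the classic a = 4, b = 4, k = 3, r = 3 design (λ = 2) as a made-up example.

```python
a, b, k, r = 4, 4, 3, 3          # candidate BIBD parameters (made-up choice)

N = a * r                        # total observations
lam = r * (k - 1) / (a - 1)      # times each treatment pair shares a block

assert N == b * k, "requires N = ar = bk"
assert lam == int(lam), "lambda must be an integer for a BIBD to exist"
print(f"N = {N}, lambda = {int(lam)}")
```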

