Biostatistics Basics PDF
Thought Experiment Math†
Here’s the truth;
$Y \approx \beta_0 \mathbf{1} + \beta_1 X$
Thought Experiment Math?†
Thought Experiment #2
A different “design”
What is going on?
Plotting Y against X ;
... and not knowing Z
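$$
\begin{aligned}
Y &= \gamma_0 \mathbf{1} + \gamma_1 X + \gamma_2 Z \\
Z &= \theta_0 \mathbf{1} + \theta_1 X + \epsilon \\
Y &= \gamma_0 \mathbf{1} + \gamma_1 X + \gamma_2 (\theta_0 \mathbf{1} + \theta_1 X + \epsilon) \\
  &= (\gamma_0 + \gamma_2 \theta_0) \mathbf{1} + (\gamma_1 + \gamma_2 \theta_1) X + \gamma_2 \epsilon \\
  &\equiv \beta_0 \mathbf{1} + \beta_1 X + e
\end{aligned}
$$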
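The substitution above can be checked numerically. The following is a minimal sketch, not part of the original slides: the coefficient values, the sample size, and the normal distributions for X and the noise term in Z are all assumptions chosen for illustration. It simulates Y as exactly linear in X and Z, then regresses Y on X alone, ignoring Z.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical coefficient values, chosen only for illustration.
gamma0, gamma1, gamma2 = 1.0, 2.0, 3.0
theta0, theta1 = 0.5, -1.0

x = rng.normal(size=n)
noise = rng.normal(size=n)               # the noise term in the equation for Z
z = theta0 + theta1 * x + noise
y = gamma0 + gamma1 * x + gamma2 * z     # Y is exactly linear in X and Z

# Regress Y on X alone, i.e. "not knowing Z".
# np.polyfit returns coefficients with the highest degree first.
slope, intercept = np.polyfit(x, y, 1)

implied_slope = gamma1 + gamma2 * theta1       # 2 + 3 * (-1) = -1
implied_intercept = gamma0 + gamma2 * theta0   # 1 + 3 * 0.5 = 2.5

print(f"fitted slope     {slope:.3f}  vs implied {implied_slope}")
print(f"fitted intercept {intercept:.3f}  vs implied {implied_intercept}")
```

With these assumed values the slope of Y on X alone is negative even though the coefficient of X given Z is positive, which is one way the "what is going on?" picture can arise.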
Thought Experiment #3
A final “design”
... a real mess!
... plotting Y vs. X
(Starts to look like real data!)
[4] ...which probably don’t require a PhD statistician
Reality Check
With reasonable sample size n, inference (i.e. learning about β) is
possible without making strong assumptions about the distribution
of Y , and how it varies with X. It seems prudent to avoid these
assumptions as “modern” approaches do.
- If you have good a priori reasons to believe them,
  distributional assumptions may be okay and may help
  substantially.
- For small n this may be the only viable approach (other than
  quitting).
- For tasks other than inference (e.g. prediction), assumptions
  may be needed.
- Checking distributional assumptions after you’ve used them
  doesn’t actually work very well. Asking the data “was I right
  to trust you just now?” or “did you behave in the way I hoped
  you did?” is not reliable, in general.
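As one illustration of inference without distributional assumptions, the sketch below fits a line by least squares and computes a "sandwich" (HC0) standard error for the slope. The slides do not name a specific method here, so the choice of the sandwich estimator is an assumption, as are the function name `ols_with_sandwich`, the simulated data, and the true coefficient values.

```python
import numpy as np

def ols_with_sandwich(x, y):
    """Least-squares fit of y on (1, x) with HC0 sandwich standard errors.

    Hypothetical helper for illustration; not taken from the slides.
    """
    X = np.column_stack([np.ones_like(x), x])    # design matrix with intercept
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over i of resid_i^2 * x_i x_i^T
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 2, size=n)
# Skewed, non-constant-variance noise: neither Normal nor homoskedastic.
noise = (rng.exponential(1.0, size=n) - 1.0) * (0.5 + x)
y = 1.0 + 2.0 * x + noise                        # assumed true slope is 2

beta_hat, se = ols_with_sandwich(x, y)
print("slope estimate:", beta_hat[1], "sandwich SE:", se[1])
```

The noise is deliberately badly behaved, yet at this sample size the slope estimate should land close to the assumed true value, with a standard error that did not require modelling the noise distribution.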
What is a parameter?†
From previous courses you will be used to this kind of plot
[6] When discussing [most] practical problems with your co-authors, it won’t
hurt to replace the infinite super-population with a vast substitute, e.g. all men
aged 50-100 in the US, or in developed countries.
What is regression?†
In its most fundamental interpretation, regression estimates
differences in outcome Y , between subjects whose X values differ
in a specified manner.
We take differences in “Y” to mean differences in the expectation
of Y , on some scale. For example, with binary X, you might be
interested in;
$E_F[Y \mid X = 1] - E_F[Y \mid X = 0]$
or
$\log\left( E_F[Y \mid X = 1] / E_F[Y \mid X = 0] \right)$
or even
What is regression?: 2 X-values†
In a universe of only two points:
What is regression?: more X-values†
Default “averaging” uses weights $\propto (X_j - X_k)^2$:
What is regression?: many X-values†
Jacobi[7] showed there is a neater way to define the weighted mean
slope parameter:
$$\beta_X = \frac{\mathrm{Cov}_F[X, Y]}{\mathrm{Var}_F[X]}$$
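The equivalence between the weighted mean of pairwise slopes and the covariance-over-variance form can be checked numerically. This is a minimal sketch in which the empirical distribution of a simulated sample stands in for F; the data-generating curve, sample size, and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 10, size=n)
# A deliberately non-linear relationship: no straight line is "true" here.
y = np.sin(x) + 0.3 * x + rng.normal(scale=0.5, size=n)

# All pairs (j, k) with j < k.
j, k = np.triu_indices(n, k=1)
dx = x[j] - x[k]
dy = y[j] - y[k]

pair_slopes = dy / dx      # slope of the line through each pair of points
weights = dx ** 2          # weights proportional to (X_j - X_k)^2
weighted_mean_slope = np.sum(weights * pair_slopes) / np.sum(weights)

# Covariance over variance, both with divisor n so the divisors cancel.
cov_over_var = np.cov(x, y, bias=True)[0, 1] / np.var(x)

# Ordinary least-squares slope, for comparison.
ls_slope = np.polyfit(x, y, 1)[0]

print(weighted_mean_slope, cov_over_var, ls_slope)
```

All three numbers agree up to floating-point error, even though the simulated relationship is not a straight line: the slope is a summary of pairwise differences, not a claim that the mean of Y is linear in X.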
It can also be described as a (partial) solution to this system of
equations:
$$
\begin{aligned}
E_F[\beta_0 + X \beta_X] &= E_F[Y] \\
E_F[X (\beta_0 + X \beta_X)] &= E_F[XY],
\end{aligned}
$$