Biostatistics Basics PDF

This document discusses the concept of randomness through a series of thought experiments. It aims to show that what appears random may actually be the result of unknown or unmeasured factors. The experiments demonstrate how relationships between variables can change depending on whether other influencing variables are known or unknown. They highlight that randomness is often a result of imperfect information rather than truly stochastic phenomena. Parameters are also defined: a parameter is an operation on a hypothetical infinite population, mapping it to a parameter space (for example, the mean), regardless of sample size or knowledge of all variables.


What is Randomness?

Thought experiment #1; note that in this situation there is no “measurement error” or “noise”, and nothing random is going on. What is the difference between X and X+1?
Thought Experiment Math†
Here’s the truth;

Yn×1 = γ0 1n×1 + γ1 Xn×1 + γ2 Zn×1

where the n observations are evenly distributed between all X, Z combinations.


But not knowing Z , we will fit the relationship

Y ≈ β0 1 + β1 X

Here “fit” means that we will find e orthogonal to 1 and X such that

Y = β0 1 + β1 X + e
By linear algebra (i.e. projection onto 1 and X) we must have

e = Y − [ (Y · 1)/n − ((Y · (X − X̄1)) / ((X − X̄1) · (X − X̄1))) X̄ ] 1 − ((Y · (X − X̄1)) / ((X − X̄1) · (X − X̄1))) X

where X̄ = X · 1/(1 · 1) = X · 1/n, i.e. the mean of X - a scalar.
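The projection formula can be checked numerically. Below is a minimal numpy sketch (my own illustration, not from the slides; all names and values are made up) using the equivalent centered form e = Y − (Y·1/n)1 − [Y·(X − X̄1)/((X − X̄1)·(X − X̄1))](X − X̄1), and confirming that e is orthogonal to both 1 and X:

```python
# Sketch: verify the projection residual is orthogonal to 1 and X.
# Names (rng, Xc, bhat) and values are illustrative, not from the slides.
import numpy as np

rng = np.random.default_rng(0)
n = 12
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)  # any Y vector works here

ones = np.ones(n)
Xc = X - X.mean() * ones                # X - X̄1, centered X
bhat = (Y @ Xc) / (Xc @ Xc)             # Y·(X − X̄1) / ((X − X̄1)·(X − X̄1))
e = Y - (Y @ ones / n) * ones - bhat * Xc

print(np.allclose(e @ ones, 0.0))       # True: e orthogonal to 1
print(np.allclose(e @ X, 0.0))          # True: e orthogonal to X
```

No model assumptions are used: the orthogonality is pure linear algebra, so it holds for any Y.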


Thought Experiment Math?†

The fitted line, with e: note the orthogonality to 1 and X. What’s the slope of the line?
Thought Experiment Math?†

What to remember (in “real” experiments too);

- The “errors” represent everything that we didn’t measure.
- Nothing is random here - we just have imperfect information.
- If you are never going to know Z (or can’t assume you know a lot about it), this sort of “marginal” relationship is all that can be learned.

What you didn’t measure can’t be ignored...

Thought Experiment #2

A different “design”
What is going on?


Thought Experiment #2

Plotting Y against X;

Thought Experiment #2

Plotting Y against X;
... and not knowing Z

Thought Experiment #2

Here’s the fitted line;

... what’s the slope?
What would you conclude?

Thought Experiment #2

Here’s the truth, for both Y and Z;

Y = γ0 1 + γ1 X + γ2 Z
Z = θ0 1 + θ1 X + ε

where ε is orthogonal to 1, X. Therefore,

Y = γ0 1 + γ1 X + γ2 (θ0 1 + θ1 X + ε)
  = (γ0 + γ2 θ0) 1 + (γ1 + γ2 θ1) X + γ2 ε
  ≡ β0 1 + β1 X + e

and we get β1 = γ1 if (and only if) there’s “nothing going on” between Z and X. The change we saw in the Y − X slope (from #1 to #2) follows exactly this pattern.
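The identity β1 = γ1 + γ2 θ1 is easy to check by simulation. The sketch below (arbitrary made-up values for the γ’s and θ’s, not from the slides) fits the marginal slope of Y on X and compares it to γ1 + γ2 θ1:

```python
# Sketch: omitted-variable arithmetic; all parameter values are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
g0, g1, g2 = 1.0, 2.0, 3.0          # γ0, γ1, γ2: the "truth"
t0, t1 = 0.5, -0.8                  # θ0, θ1: how Z tracks X

X = rng.normal(size=n)
eps = rng.normal(size=n)            # ε, uncorrelated with X
Z = t0 + t1 * X + eps
Y = g0 + g1 * X + g2 * Z            # exactly the model on the slide

b1 = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)  # marginal slope of Y on X
print(b1, g1 + g2 * t1)             # close to each other; far from γ1 = 2
```

With these values the marginal slope lands near γ1 + γ2 θ1 = −0.4, nowhere near γ1 = 2: not knowing Z changes the answer entirely.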

Thought Experiment #2

- The marginal slope β1 is not the “wrong” answer, but it may not be the same as γ1.
- Which do you want? The Y − X slope if Z is fixed, or if Z varies with X in the same way it did in your experiment?
- No one needs to know that Y is being measured for β1 ≠ γ1 to occur.
- The “observed” e are actually γ2 ε here, so the “noise” doesn’t simply reflect the Z − X relationship alone.

Thought Experiment #3

A final “design”
... a real mess!

Thought Experiment #3

A final “design”
... plotting Y vs. X

Thought Experiment #3

A final “design”
... plotting Y vs. X
(Starts to look like real data!)

Thought Experiment #3

- Z and X were orthogonal - what happened to the slope?
- But the variability of Z depended on X. What happened to e, compared to #1 and #2?

We can extend all these arguments to Xn×p and Zn×q - see Jon Wakefield’s book for more. Reality also tends to have > 1 “un-pretty” phenomena per situation!

In general, the nature of what we call “randomness” depends heavily on what is going on unobserved. It’s only in extremely simple situations⁴ that unobserved patterns can be dismissed without careful thought. In some complex situations they can be dismissed, but only after careful thought.

⁴ ...which probably don’t require a PhD statistician
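This setup can also be simulated. In the sketch below (an assumed version of thought experiment #3 with made-up numbers: E[Z|X] = 0 everywhere, but the spread of Z grows with |X|), the marginal slope stays at γ1 while the residual spread varies with X:

```python
# Sketch: Z orthogonal to X in mean, but heteroscedastic in X; values made up.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
X = rng.uniform(-1.0, 1.0, size=n)
Z = (0.2 + np.abs(X)) * rng.normal(size=n)  # E[Z|X] = 0, spread grows with |X|
Y = 1.0 + 2.0 * X + 3.0 * Z                 # γ0 = 1, γ1 = 2, γ2 = 3

b1 = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
e = Y - Y.mean() - b1 * (X - X.mean())      # residuals from the marginal fit
inner, outer = np.abs(X) < 0.5, np.abs(X) >= 0.5
print(b1)                                   # ≈ γ1 = 2: the slope is untouched
print(e[inner].std(), e[outer].std())       # but the "noise" depends on X
```

Orthogonal Z leaves the slope alone; what it cannot hide is the X-dependent pattern in e.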

Reality Check

This is a realistically-complex “system” you might see in practice. Your “X” might be time (developmental) and “Y” expression of a particular gene. Knowing the Y−X relationship is clearly useful, but pretending that all the Z−X relationships are pretty is naïve (at best).

Reality Check
With reasonable sample size n, inference (i.e. learning about β) is possible without making strong assumptions about the distribution of Y, and how it varies with X. It seems prudent to avoid these assumptions, as “modern” approaches do.

- If you have good a priori reasons to believe them, distributional assumptions may be okay and may help substantially.
- For small n this may be the only viable approach (other than quitting).
- For tasks other than inference (e.g. prediction) assumptions may be needed.
- Checking distributional assumptions after you’ve used them doesn’t actually work very well. Asking the data “was I right to trust you just now?” or “did you behave in the way I hoped you did?” is not reliable, in general.

Reality Check

If you have to start making distributional assumptions:

- Adding lots of little effects → Normal distributions
- Binary events → Bernoulli, and Binomial
- Counting lots of rare events → Poisson
- Continual (small) hazard of an event → Weibull

... but note these are rather stylized; minor modifications break them, e.g. different event rates → overdispersed Poisson.

However, methods which use classical assumptions often have other interpretations. For example, using Ȳ (the sample mean) as an estimator can be motivated with Normality, but we don’t need this assumption in order to use Ȳ.
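The overdispersion point is easy to demonstrate: mixing event rates inflates the variance above the mean, breaking the Poisson’s variance = mean property. A small sketch (rates and group size are made up for illustration):

```python
# Sketch: two sub-groups with different Poisson rates -> overdispersed counts.
import numpy as np

rng = np.random.default_rng(3)
rates = rng.choice([2.0, 10.0], size=100_000)  # half low-rate, half high-rate
counts = rng.poisson(rates)

# For a single Poisson rate, mean and variance would match; here they don't:
print(counts.mean())   # ≈ 6  (average of the two rates)
print(counts.var())    # ≈ 22 (= 6 + between-group variance of 16)
```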
What is a parameter?†
From previous courses you will be used to this kind of plot

... and also used to “manipulating” the sample in several ways


What is a parameter?†
You may have seen larger sample sizes,

... this sample can also be “manipulated”


What is a parameter?†
To define parameters, think of an infinite “super”-population;

... and consider (simple) ways to manipulate what we see;


What is a parameter?†
The mean of X;

(note: requires finite moments of X to be well-defined)


What is a parameter?†
The mean of Y;

... mild regularity conditions also apply


What is a parameter?†
The mean of Y at a given value of X

... only sensible if you know the given value of X (!)


What is a parameter?†
Difference in mean of Y, between two values of X;

... which is unchanged, if Y → Y + c


Defining parameters†
A parameter is (formally) an operation on a super-population, mapping it to a “parameter space” Θ, such as R, or R^p, or {0, 1}. The parameter value (typically denoted β or θ) is the result of this operation⁵.

- “Inference” means making one or more conclusions about the parameter value
- These could be estimates, intervals, or binary (Yes/No) decisions
- “Statistical inference” means drawing conclusions without the full population’s data, i.e. in the face of uncertainty.

Parameter values themselves are fixed unknowns; they are not “uncertain” or “random” in any stochastic sense.

In previous courses, parameters may have been defined as linear operations on the super-population. In 754, we will generalize the idea.

⁵ The “true state of Nature” is a common expression for the same thing
Defining parameters†
In this course, we will typically assume relevant parameters can be identified in this way. But in some real situations, one cannot identify θ, even with an infinite sample (e.g. mean height of women, when you only have data on men).

If your data do not permit useful inference, you could;
- Switch target parameters
- Extrapolate cautiously, i.e. make assumptions
- Not do inference, but “hypothesis-generation”
- Give up

I will mainly discuss “sane” problems; this means ones we can reasonably address. Be aware not every problem is like this...

“The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.”
- John Tukey
Defining parameters†
Of course, infinite populations are an abstraction. But formally⁶, statistical inference is [mostly] about parameter values determined from e.g.

- The heights of all men aged 50-100, this year and in all years, ever
- The heights of all possible men aged 50-100, in this and all possible universes
- The heights of all possible men aged 50-100 in Maryland, in this and all possible universes

Naturally, these abstract notions are not usually discussed in practice, but thinking about n = ∞ will be helpful when deciding exactly what parameters are of interest.

⁶ When discussing [most] practical problems with your co-authors, it won’t hurt to replace the infinite super-population with a vast substitute, e.g. all men aged 50-100 in the US, or in developed countries
What is regression?†
In its most fundamental interpretation, regression estimates differences in outcome Y, between subjects whose X values differ in a specified manner.

We take differences in “Y” to mean differences in the expectation of Y, on some scale. For example, with binary X, you might be interested in;

EF[Y | X = 1] − EF[Y | X = 0]

or

log( EF[Y | X = 1] / EF[Y | X = 0] )

or even

exp{ EF[log(Y) | X = 1] − EF[log(Y) | X = 0] }

Note that these are all different! As before, none of them is “right”, “wrong”, “uniformly best”, or even “uniformly a great idea”.
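To see that the three contrasts above really are different numbers, here is a toy check on a made-up finite population (values chosen purely for illustration):

```python
# Toy example: three different "difference in Y" parameters for binary X.
import numpy as np

y0 = np.array([1.0, 2.0, 4.0])  # Y-values in the X = 0 group (made up)
y1 = np.array([2.0, 4.0, 8.0])  # Y-values in the X = 1 group (made up)

diff = y1.mean() - y0.mean()                               # E[Y|1] − E[Y|0]
log_ratio = np.log(y1.mean() / y0.mean())                  # log of ratio of means
geo_ratio = np.exp(np.log(y1).mean() - np.log(y0).mean())  # ratio of geometric means

print(diff, log_ratio, geo_ratio)  # 2.33..., 0.693..., 2.0: three distinct summaries
```

Here each Y|X=1 value is exactly double its Y|X=0 counterpart, so the geometric-mean ratio is exactly 2 and the log-ratio of means is log 2, while the difference in means is 7/3; which one you want depends on the question.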

What is regression? : continuous X-values

Q. How to concisely describe differences in Y over a range of X?

The most commonly-used regression parameter is;
“The difference in Y per 1-unit difference in X”
- which, most fundamentally, means:

- Take the difference in Y between two different X values, divided by the difference in those X values
- Rinse and repeat, averaging this “slope” over all pairs {Yj, Xj}, {Yk, Xk}.

(Other interpretations will be given later)
What is regression?: 2 X-values†
In a universe of only two points:

What is regression?: more X-values†
Default “averaging” uses weights ∝ (Xj − Xk)²:
What is regression?: many X-values†
Jacobi⁷ showed there is a neater way to define the weighted mean slope parameter:

βX = CovF[X, Y] / VarF[X]

It can also be described as a (partial) solution to this system of equations:

EF[β0 + X βX] = EF[Y]
EF[X(β0 + X βX)] = EF[XY],

where β0 is a “nuisance” parameter; without further information, its value doesn’t tell us anything about βX. Please don’t misinterpret the term “nuisance” to mean “totally useless” or “never of any interest”.

⁷ ... in 1841; the result is often overlooked. Jacobi CGJ; De formatione et proprietatibus Determinantium.
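The equivalence between the pairwise-weighted slope and CovF[X, Y]/VarF[X] can be verified directly. A small sketch on a made-up sample standing in for the super-population:

```python
# Sketch: weighted mean of pairwise slopes, with weights ∝ (Xj − Xk)^2,
# equals Cov[X, Y] / Var[X]. Data values are made up.
import numpy as np

X = np.array([0.0, 1.0, 3.0, 4.0])
Y = np.array([1.0, 3.0, 2.0, 5.0])

num = den = 0.0
for j in range(len(X)):
    for k in range(j + 1, len(X)):
        w = (X[j] - X[k]) ** 2                     # weight for this pair
        slope_jk = (Y[j] - Y[k]) / (X[j] - X[k])   # pairwise slope
        num += w * slope_jk
        den += w

beta_pairs = num / den
beta_cov = np.cov(X, Y, ddof=0)[0, 1] / np.var(X)
print(beta_pairs, beta_cov)  # identical: 0.7 and 0.7
```

Note the weight (Xj − Xk)² cancels one factor of (Xj − Xk) in each slope, so the numerator collapses to Σ(Xj − Xk)(Yj − Yk), which is proportional to the covariance; that is the whole identity.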
