Biostatistics Basics PDF

This document discusses the concept of randomness through a series of thought experiments. It aims to show that what appears random may actually be the result of unknown or unmeasured factors. The experiments demonstrate how relationships between variables can change depending on whether other influencing variables are known or unknown. They highlight that randomness is often a result of imperfect information rather than truly stochastic phenomena. Parameters are also defined: a parameter is an operation on a hypothetical infinite population, mapping it to a parameter space (for example, the mean), regardless of sample size or knowledge of all variables.


What is Randomness?

Thought experiment #1; note that in this situation there is no “measurement error” or “noise”, and nothing random is going on. What is the difference between X and X+1?
Thought Experiment Math†
Here’s the truth;

Yn×1 = γ0 1n×1 + γ1 Xn×1 + γ2 Zn×1

where the n observations are evenly distributed between all X, Z combinations.


But not knowing Z , we will fit the relationship

Y ≈ β0 1 + β1 X

Here “fit” means that we will find e orthogonal to 1 and X such that

Y = β0 1 + β1 X + e
By linear algebra (i.e. projection onto 1 and X) we must have

e = Y − [ (Y · 1)/n − ((Y · (X − X̄1)) / ((X − X̄1) · (X − X̄1))) X̄ ] 1 − ((Y · (X − X̄1)) / ((X − X̄1) · (X − X̄1))) X

where X̄ = X · 1/(1 · 1) = X · 1/n, i.e. the mean of X - a scalar.
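The projection formula can be checked numerically. Below is a minimal numpy sketch (my own illustration, not from the slides; all names and values are made up) using the equivalent centered form e = Y − (Y·1/n)1 − [Y·(X − X̄1)/((X − X̄1)·(X − X̄1))](X − X̄1), and confirming that e is orthogonal to both 1 and X:

```python
# Sketch: verify the projection residual is orthogonal to 1 and X.
# Names (rng, Xc, bhat) and values are illustrative, not from the slides.
import numpy as np

rng = np.random.default_rng(0)
n = 12
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)  # any Y vector works here

ones = np.ones(n)
Xc = X - X.mean() * ones                # X - X̄1, centered X
bhat = (Y @ Xc) / (Xc @ Xc)             # Y·(X − X̄1) / ((X − X̄1)·(X − X̄1))
e = Y - (Y @ ones / n) * ones - bhat * Xc

print(np.allclose(e @ ones, 0.0))       # True: e orthogonal to 1
print(np.allclose(e @ X, 0.0))          # True: e orthogonal to X
```

No model assumptions are used: the orthogonality is pure linear algebra, so it holds for any Y.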


Thought Experiment Math?†

The fitted line, with e: note the orthogonality to 1 and X. What’s the slope of the line?
Thought Experiment Math?†

What to remember (in “real” experiments too);

- The “errors” represent everything that we didn’t measure.
- Nothing is random here - we just have imperfect information.
- If you are never going to know Z (or can’t assume you know a lot about it), this sort of “marginal” relationship is all that can be learned.

What you didn’t measure can’t be ignored...

Thought Experiment #2

A different “design”
What is going on?


Thought Experiment #2

Plotting Y against X;

Thought Experiment #2

Plotting Y against X;
... and not knowing Z

Thought Experiment #2

Here’s the fitted line;

... what’s the slope?
What would you conclude?

Thought Experiment #2

Here’s the truth, for both Y and Z;

Y = γ0 1 + γ1 X + γ2 Z
Z = θ0 1 + θ1 X + ε

where ε is orthogonal to 1, X. Therefore,

Y = γ0 1 + γ1 X + γ2 (θ0 1 + θ1 X + ε)
  = (γ0 + γ2 θ0) 1 + (γ1 + γ2 θ1) X + γ2 ε
  ≡ β0 1 + β1 X + e

and we get β1 = γ1 if (and only if) there’s “nothing going on” between Z and X. The change we saw in the Y − X slope (from #1 to #2) follows exactly this pattern.
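The identity β1 = γ1 + γ2 θ1 is easy to check by simulation. The sketch below (arbitrary made-up values for the γ’s and θ’s, not from the slides) fits the marginal slope of Y on X and compares it to γ1 + γ2 θ1:

```python
# Sketch: omitted-variable arithmetic; all parameter values are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
g0, g1, g2 = 1.0, 2.0, 3.0          # γ0, γ1, γ2: the "truth"
t0, t1 = 0.5, -0.8                  # θ0, θ1: how Z tracks X

X = rng.normal(size=n)
eps = rng.normal(size=n)            # ε, uncorrelated with X
Z = t0 + t1 * X + eps
Y = g0 + g1 * X + g2 * Z            # exactly the model on the slide

b1 = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)  # marginal slope of Y on X
print(b1, g1 + g2 * t1)             # close to each other; far from γ1 = 2
```

With these values the marginal slope lands near γ1 + γ2 θ1 = −0.4, nowhere near γ1 = 2: not knowing Z changes the answer entirely.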

Thought Experiment #2

- The marginal slope β1 is not the “wrong” answer, but it may not be the same as γ1.
- Which do you want? The Y − X slope if Z is fixed, or if Z varies with X in the same way it did in your experiment?
- No one needs to know that Y is being measured for β1 ≠ γ1 to occur.
- The “observed” e are actually γ2 ε here, so the “noise” doesn’t simply reflect the Z − X relationship alone.

Thought Experiment #3

A final “design”
... a real mess!

Thought Experiment #3

A final “design”
... plotting Y vs. X

Thought Experiment #3

A final “design”
... plotting Y vs. X
(Starts to look like real data!)

Thought Experiment #3

- Z and X were orthogonal - what happened to the slope?
- But the variability of Z depended on X. What happened to e, compared to #1 and #2?

We can extend all these arguments to Xn×p and Zn×q - see Jon Wakefield’s book for more. Reality also tends to have > 1 “un-pretty” phenomena per situation!

In general, the nature of what we call “randomness” depends heavily on what is going on unobserved. It’s only in extremely simple situations⁴ that unobserved patterns can be dismissed without careful thought. In some complex situations they can be dismissed, but only after careful thought.

⁴ ...which probably don’t require a PhD statistician
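This setup can also be simulated. In the sketch below (an assumed version of thought experiment #3 with made-up numbers: E[Z|X] = 0 everywhere, but the spread of Z grows with |X|), the marginal slope stays at γ1 while the residual spread varies with X:

```python
# Sketch: Z orthogonal to X in mean, but heteroscedastic in X; values made up.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
X = rng.uniform(-1.0, 1.0, size=n)
Z = (0.2 + np.abs(X)) * rng.normal(size=n)  # E[Z|X] = 0, spread grows with |X|
Y = 1.0 + 2.0 * X + 3.0 * Z                 # γ0 = 1, γ1 = 2, γ2 = 3

b1 = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
e = Y - Y.mean() - b1 * (X - X.mean())      # residuals from the marginal fit
inner, outer = np.abs(X) < 0.5, np.abs(X) >= 0.5
print(b1)                                   # ≈ γ1 = 2: the slope is untouched
print(e[inner].std(), e[outer].std())       # but the "noise" depends on X
```

Orthogonal Z leaves the slope alone; what it cannot hide is the X-dependent pattern in e.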

Reality Check

This is a realistically-complex “system” you might see in practice. Your “X” might be time (developmental) and “Y” expression of a particular gene. Knowing the Y−X relationship is clearly useful, but pretending that all the Z−X relationships are pretty is naïve (at best).

Reality Check
With reasonable sample size n, inference (i.e. learning about β) is possible without making strong assumptions about the distribution of Y, and how it varies with X. It seems prudent to avoid these assumptions, as “modern” approaches do.

- If you have good a priori reasons to believe them, distributional assumptions may be okay and may help substantially.
- For small n this may be the only viable approach (other than quitting).
- For tasks other than inference (e.g. prediction) assumptions may be needed.
- Checking distributional assumptions after you’ve used them doesn’t actually work very well. Asking the data “was I right to trust you just now?” or “did you behave in the way I hoped you did?” is not reliable, in general.

Reality Check

If you have to start making distributional assumptions:

- Adding lots of little effects → Normal distributions
- Binary events → Bernoulli, and Binomial
- Counting lots of rare events → Poisson
- Continual (small) hazard of an event → Weibull

... but note these are rather stylized; minor modifications break them, e.g. different event rates → overdispersed Poisson.

However, methods which use classical assumptions often have other interpretations. For example, using Ȳ (the sample mean) as an estimator can be motivated with Normality, but we don’t need this assumption in order to use Ȳ.
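The overdispersion point is easy to demonstrate: mixing event rates inflates the variance above the mean, breaking the Poisson’s variance = mean property. A small sketch (rates and group size are made up for illustration):

```python
# Sketch: two sub-groups with different Poisson rates -> overdispersed counts.
import numpy as np

rng = np.random.default_rng(3)
rates = rng.choice([2.0, 10.0], size=100_000)  # half low-rate, half high-rate
counts = rng.poisson(rates)

# For a single Poisson rate, mean and variance would match; here they don't:
print(counts.mean())   # ≈ 6  (average of the two rates)
print(counts.var())    # ≈ 22 (= 6 + between-group variance of 16)
```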
What is a parameter?†
From previous courses you will be used to this kind of plot

... and also used to “manipulating” the sample in several ways


What is a parameter?†
You may have seen larger sample sizes,

... this sample can also be “manipulated”


What is a parameter?†
To define parameters, think of an infinite “super”-population;

... and consider (simple) ways to manipulate what we see;


What is a parameter?†
The mean of X;

(note: requires finite moments of X to be well-defined)


What is a parameter?†
The mean of Y;

... mild regularity conditions also apply


What is a parameter?†
The mean of Y at a given value of X

... only sensible if you know the given value of X (!)


What is a parameter?†
Difference in mean of Y, between two values of X;

... which is unchanged, if Y → Y + c


Defining parameters†
A parameter is (formally) an operation on a super-population, mapping it to a “parameter space” Θ, such as R, or R^p, or {0, 1}. The parameter value (typically denoted β or θ) is the result of this operation⁵.

- “Inference” means making one or more conclusions about the parameter value
- These could be estimates, intervals, or binary (Yes/No) decisions
- “Statistical inference” means drawing conclusions without the full population’s data, i.e. in the face of uncertainty.

Parameter values themselves are fixed unknowns; they are not “uncertain” or “random” in any stochastic sense.

In previous courses, parameters may have been defined as linear operations on the super-population. In 754, we will generalize the idea.

⁵ The “true state of Nature” is a common expression for the same thing
Defining parameters†
In this course, we will typically assume relevant parameters can be identified in this way. But in some real situations, one cannot identify θ, even with an infinite sample (e.g. mean height of women, when you only have data on men).

If your data do not permit useful inference, you could;
- Switch target parameters
- Extrapolate cautiously, i.e. make assumptions
- Not do inference, but “hypothesis-generation”
- Give up

I will mainly discuss “sane” problems; this means ones we can reasonably address. Be aware not every problem is like this...

“The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.”
- John Tukey
Defining parameters†
Of course, infinite populations are an abstraction. But formally⁶, statistical inference is [mostly] about parameter values determined from e.g.

- The heights of all men aged 50-100, this year and in all years, ever
- The heights of all possible men aged 50-100, in this and all possible universes
- The heights of all possible men aged 50-100 in Maryland, in this and all possible universes

Naturally, these abstract notions are not usually discussed in practice, but thinking about n = ∞ will be helpful when deciding exactly what parameters are of interest.

⁶ When discussing [most] practical problems with your co-authors, it won’t hurt to replace the infinite super-population with a vast substitute, e.g. all men aged 50-100 in the US, or in developed countries
What is regression?†
In its most fundamental interpretation, regression estimates differences in outcome Y, between subjects whose X values differ in a specified manner.

We take differences in “Y” to mean differences in the expectation of Y, on some scale. For example, with binary X, you might be interested in;

EF[Y | X = 1] − EF[Y | X = 0]

or

log( EF[Y | X = 1] / EF[Y | X = 0] )

or even

exp{ EF[log(Y) | X = 1] − EF[log(Y) | X = 0] }

Note that these are all different! As before, none of them is “right”, “wrong”, “uniformly best”, or even “uniformly a great idea”.
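To see that the three contrasts above really are different numbers, here is a toy check on a made-up finite population (values chosen purely for illustration):

```python
# Toy example: three different "difference in Y" parameters for binary X.
import numpy as np

y0 = np.array([1.0, 2.0, 4.0])  # Y-values in the X = 0 group (made up)
y1 = np.array([2.0, 4.0, 8.0])  # Y-values in the X = 1 group (made up)

diff = y1.mean() - y0.mean()                               # E[Y|1] − E[Y|0]
log_ratio = np.log(y1.mean() / y0.mean())                  # log of ratio of means
geo_ratio = np.exp(np.log(y1).mean() - np.log(y0).mean())  # ratio of geometric means

print(diff, log_ratio, geo_ratio)  # 2.33..., 0.693..., 2.0: three distinct summaries
```

Here each Y|X=1 value is exactly double its Y|X=0 counterpart, so the geometric-mean ratio is exactly 2 and the log-ratio of means is log 2, while the difference in means is 7/3; which one you want depends on the question.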

What is regression? : continuous X-values

Q. How to concisely describe differences in Y over a range of X?

The most commonly-used regression parameter is;
“The difference in Y per 1-unit difference in X”
- which, most fundamentally, means:

- Take the difference in Y between two different X values, divided by the difference in those X values
- Rinse and repeat, averaging this “slope” over all pairs {Yj, Xj}, {Yk, Xk}.

(Other interpretations will be given later)
What is regression?: 2 X-values†
In a universe of only two points:

What is regression?: more X-values†
Default “averaging” uses weights ∝ (Xj − Xk)²:
What is regression?: many X-values†
Jacobi⁷ showed there is a neater way to define the weighted mean slope parameter:

βX = CovF[X, Y] / VarF[X]

It can also be described as a (partial) solution to this system of equations:

EF[β0 + X βX] = EF[Y]
EF[X(β0 + X βX)] = EF[XY],

where β0 is a “nuisance” parameter; without further information, its value doesn’t tell us anything about βX. Please don’t misinterpret the term “nuisance” to mean “totally useless” or “never of any interest”.

⁷ ... in 1841; the result is often overlooked. Jacobi CGJ; De formatione et proprietatibus Determinantium.
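The equivalence between the pairwise-weighted slope and CovF[X, Y]/VarF[X] can be verified directly. A small sketch on a made-up sample standing in for the super-population:

```python
# Sketch: weighted mean of pairwise slopes, with weights ∝ (Xj − Xk)^2,
# equals Cov[X, Y] / Var[X]. Data values are made up.
import numpy as np

X = np.array([0.0, 1.0, 3.0, 4.0])
Y = np.array([1.0, 3.0, 2.0, 5.0])

num = den = 0.0
for j in range(len(X)):
    for k in range(j + 1, len(X)):
        w = (X[j] - X[k]) ** 2                     # weight for this pair
        slope_jk = (Y[j] - Y[k]) / (X[j] - X[k])   # pairwise slope
        num += w * slope_jk
        den += w

beta_pairs = num / den
beta_cov = np.cov(X, Y, ddof=0)[0, 1] / np.var(X)
print(beta_pairs, beta_cov)  # identical: 0.7 and 0.7
```

Note the weight (Xj − Xk)² cancels one factor of (Xj − Xk) in each slope, so the numerator collapses to Σ(Xj − Xk)(Yj − Yk), which is proportional to the covariance; that is the whole identity.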
