\mu_0^{(i+1)} = \mu_n^{(i)}
\Lambda_0^{(i+1)} = \Lambda_n^{(i)}
a_0^{(i+1)} = a_n^{(i)}
b_0^{(i+1)} = b_n^{(i)} .    (7)
The posterior distribution for Bayesian linear regression when observing a single data set is given by
the following hyperparameter equations (→ III/1.6.2):
\mu_n = \Lambda_n^{-1} (X^T P y + \Lambda_0 \mu_0)
\Lambda_n = X^T P X + \Lambda_0
a_n = a_0 + \frac{n}{2}
b_n = b_0 + \frac{1}{2} \left( y^T P y + \mu_0^T \Lambda_0 \mu_0 - \mu_n^T \Lambda_n \mu_n \right) .    (8)
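The update in (8) is straightforward to evaluate numerically. The following is a minimal sketch, assuming NumPy; the function name ng_posterior and all variable names are illustrative choices and not part of the text:

import numpy as np

def ng_posterior(y, X, P, mu0, Lambda0, a0, b0):
    """Normal-gamma posterior hyperparameters of eq. (8)."""
    Lambda_n = X.T @ P @ X + Lambda0
    mu_n = np.linalg.solve(Lambda_n, X.T @ P @ y + Lambda0 @ mu0)
    a_n = a0 + y.shape[0] / 2
    b_n = b0 + 0.5 * (y @ P @ y + mu0 @ Lambda0 @ mu0 - mu_n @ Lambda_n @ mu_n)
    return mu_n, Lambda_n, a_n, b_n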
We can apply (8) to calculate the posterior hyperparameters after seeing the first data set:
\mu_n^{(1)} = \left( \Lambda_n^{(1)} \right)^{-1} \left( X_1^T P_1 y_1 + \Lambda_0^{(1)} \mu_0^{(1)} \right)
            = \left( \Lambda_n^{(1)} \right)^{-1} \left( X_1^T P_1 y_1 + \Lambda_0 \mu_0 \right)
\Lambda_n^{(1)} = X_1^T P_1 X_1 + \Lambda_0^{(1)}
                = X_1^T P_1 X_1 + \Lambda_0
a_n^{(1)} = a_0^{(1)} + \frac{n_1}{2}
          = a_0 + \frac{n_1}{2}
b_n^{(1)} = b_0^{(1)} + \frac{1}{2} \left( y_1^T P_1 y_1 + \mu_0^{(1)T} \Lambda_0^{(1)} \mu_0^{(1)} - \mu_n^{(1)T} \Lambda_n^{(1)} \mu_n^{(1)} \right)
          = b_0 + \frac{1}{2} \left( y_1^T P_1 y_1 + \mu_0^T \Lambda_0 \mu_0 - \mu_n^{(1)T} \Lambda_n^{(1)} \mu_n^{(1)} \right) .    (9)
These are the prior hyperparameters before seeing the second data set:
\mu_0^{(2)} = \mu_n^{(1)}
\Lambda_0^{(2)} = \Lambda_n^{(1)}
a_0^{(2)} = a_n^{(1)}
b_0^{(2)} = b_n^{(1)} .    (10)
Thus, we can again use (8) to calculate the posterior hyperparameters after seeing the second data
set:
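A minimal numerical sketch of this two-step procedure, assuming NumPy and reusing the ng_posterior function from the sketch after (8) (all data below are simulated purely for illustration), also checks that the sequential analysis agrees with a single application of (8) to both data sets combined:

import numpy as np

rng = np.random.default_rng(0)
n1, n2, p = 30, 20, 3
X1, X2 = rng.normal(size=(n1, p)), rng.normal(size=(n2, p))
beta_true = np.array([1.0, -0.5, 2.0])
y1 = X1 @ beta_true + rng.normal(size=n1)
y2 = X2 @ beta_true + rng.normal(size=n2)
P1, P2 = np.eye(n1), np.eye(n2)                  # V = I, hence P = V^{-1} = I
mu0, Lambda0, a0, b0 = np.zeros(p), np.eye(p), 1.0, 1.0

# sequential analysis: eq. (9), then eq. (10) as the new prior, then eq. (8) again
post1 = ng_posterior(y1, X1, P1, mu0, Lambda0, a0, b0)
post2 = ng_posterior(y2, X2, P2, *post1)

# joint analysis of the concatenated data with the original prior
X, y, P = np.vstack([X1, X2]), np.concatenate([y1, y2]), np.eye(n1 + n2)
joint = ng_posterior(y, X, P, mu0, Lambda0, a0, b0)

print(all(np.allclose(s, j) for s, j in zip(post2, joint)))   # True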
Completing the square over β, we finally have
p(y, \beta, \tau) = \sqrt{ \frac{\tau^{n+p}}{(2\pi)^{n+p}} |P| |\Lambda_0| } \cdot \frac{b_0^{a_0}}{\Gamma(a_0)} \, \tau^{a_0 - 1} \exp[-b_0 \tau] \cdot
\exp\!\left[ -\frac{\tau}{2} \left( (\beta - \mu_n)^T \Lambda_n (\beta - \mu_n) + \left( y^T P y + \mu_0^T \Lambda_0 \mu_0 - \mu_n^T \Lambda_n \mu_n \right) \right) \right]    (12)
with the posterior hyperparameters (→ I/5.1.7)
\mu_n = \Lambda_n^{-1} (X^T P y + \Lambda_0 \mu_0)
\Lambda_n = X^T P X + \Lambda_0 .    (13)
Ergo, the joint likelihood is proportional to
p(y, \beta, \tau) \propto \tau^{p/2} \cdot \exp\!\left[ -\frac{\tau}{2} (\beta - \mu_n)^T \Lambda_n (\beta - \mu_n) \right] \cdot \tau^{a_n - 1} \cdot \exp[-b_n \tau]    (14)
with the posterior hyperparameters (→ I/5.1.7)
a_n = a_0 + \frac{n}{2}
b_n = b_0 + \frac{1}{2} \left( y^T P y + \mu_0^T \Lambda_0 \mu_0 - \mu_n^T \Lambda_n \mu_n \right) .    (15)
From the β-dependent factors in (14), we can isolate the posterior distribution over β given τ:

p(\beta|\tau, y) = \mathcal{N}(\beta; \mu_n, (\tau \Lambda_n)^{-1}) .    (16)

From the remaining factors, we can isolate the posterior distribution over τ:

p(\tau|y) = \mathrm{Gam}(\tau; a_n, b_n) .    (17)
Together, (16) and (17) constitute the joint (→ I/1.3.2) posterior distribution (→ I/5.1.7) of β and
τ.
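To illustrate how (16) and (17) can be used in practice, the following sketch, assuming NumPy, draws samples from the joint posterior by first sampling τ and then β given τ; the hyperparameter values would come from (13) and (15), and the function name is an illustrative choice only:

import numpy as np

def sample_posterior(mu_n, Lambda_n, a_n, b_n, size=1000, seed=None):
    """Draw (beta, tau) samples from the joint posterior in eqs. (16) and (17)."""
    rng = np.random.default_rng(seed)
    tau = rng.gamma(shape=a_n, scale=1.0 / b_n, size=size)       # eq. (17)
    Sigma = np.linalg.inv(Lambda_n)                              # Lambda_n^{-1}
    beta = np.stack([rng.multivariate_normal(mu_n, Sigma / t)    # eq. (16)
                     for t in tau])
    return beta, tau

The posterior mean of β, for instance, can then be approximated by beta.mean(axis=0).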
■
Sources:
• Bishop CM (2006): “Bayesian linear regression”; in: Pattern Recognition and Machine Learning, pp. 152–161, ex. 3.12, eq. 3.113; URL: https://www.springer.com/gp/book/9780387310732.
1.6.3 Log model evidence
Theorem: Let
m: \; y = X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 V)    (1)

be a linear regression model (→ III/1.5.1) with measured n × 1 data vector y, known n × p design matrix X, known n × n covariance structure V as well as unknown p × 1 regression coefficients β and unknown noise variance σ². Moreover, assume a normal-gamma prior distribution (→ III/1.6.1) over the model parameters β and τ = 1/σ²:
1.11.5 Variance of a constant
Theorem: The variance (→ I/1.11.1) of a constant (→ I/1.2.5) is zero
a = const. ⇒ Var(a) = 0 (1)
and if the variance (→ I/1.11.1) of X is zero, then X is a constant (→ I/1.2.5)
Var(X) = 0 ⇒ X = const. (2)
Proof:
1) A constant (→ I/1.2.5) is defined as a quantity that always has the same value. Thus, if understood
as a random variable (→ I/1.2.2), the expected value (→ I/1.10.1) of a constant is equal to itself:
E(a) = a . (3)
Plugged into the formula of the variance (→ I/1.11.1), we have
\mathrm{Var}(a) = \mathrm{E}\!\left[ (a - \mathrm{E}(a))^2 \right]
                = \mathrm{E}\!\left[ (a - a)^2 \right]    (4)
                = \mathrm{E}(0) .
Applied to the formula of the expected value (→ I/1.10.1), this gives
\mathrm{E}(0) = \sum_{x=0} x \cdot f_X(x) = 0 \cdot 1 = 0 .    (5)
Together, (4) and (5) imply (1).
2) The variance (→ I/1.11.1) is defined as
\mathrm{Var}(X) = \mathrm{E}\!\left[ (X - \mathrm{E}(X))^2 \right] .    (6)
Because (X − E(X))² is strictly non-negative (→ I/1.10.4), the only way for the variance to become zero is if the squared deviation is always zero:

(X - \mathrm{E}(X))^2 = 0 .    (7)
This, in turn, requires that X is equal to its expected value (→ I/1.10.1)
X = E(X) (8)
which can only be the case if X always has the same value (→ I/1.2.5):
X = const. (9)
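As a quick numerical illustration of (1), assuming NumPy:

import numpy as np
print(np.var(np.full(10, 3.0)))   # 0.0 -- the variance of a constant is zero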
1.3.6 Statistical independence
Definition: Generally speaking, random variables (→ I/1.2.2) are statistically independent, if their
joint probability (→ I/1.3.2) can be expressed in terms of their marginal probabilities (→ I/1.3.3).
1) A set of discrete random variables (→ I/1.2.2) X1, …, Xn with possible values 𝒳1, …, 𝒳n is called statistically independent, if

p(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} p(X_i = x_i) \quad \text{for all} \quad x_i \in \mathcal{X}_i, \; i = 1, \ldots, n    (1)
where p(x1 , . . . , xn ) are the joint probabilities (→ I/1.3.2) of X1 , . . . , Xn and p(xi ) are the marginal
probabilities (→ I/1.3.3) of Xi .
2) A set of continuous random variables (→ I/1.2.2) X1, …, Xn defined on the domains 𝒳1, …, 𝒳n is called statistically independent, if

F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i) \quad \text{for all} \quad x_i \in \mathcal{X}_i, \; i = 1, \ldots, n    (2)

or equivalently, if the probability densities (→ I/1.7.1) exist, if

f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i) \quad \text{for all} \quad x_i \in \mathcal{X}_i, \; i = 1, \ldots, n    (3)
where F are the joint (→ I/1.5.2) or marginal (→ I/1.5.3) cumulative distribution functions (→
I/1.8.1) and f are the respective probability density functions (→ I/1.7.1).
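As a small illustration of definition (1), assuming NumPy, consider two independent fair dice: every entry of the joint probability table equals the product of the corresponding marginal probabilities.

import numpy as np

p_joint = np.full((6, 6), 1 / 36)              # p(X1 = x1, X2 = x2) for two fair dice
p1 = p_joint.sum(axis=1)                       # marginal p(X1 = x1)
p2 = p_joint.sum(axis=0)                       # marginal p(X2 = x2)
print(np.allclose(p_joint, np.outer(p1, p2)))  # True: X1 and X2 are independent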
Sources:
• Wikipedia (2020): “Independence (probability theory)”; in: Wikipedia, the free encyclopedia, retrieved on 2020-06-06; URL: https://en.wikipedia.org/wiki/Independence_(probability_theory)#Definition.
1.3.7 Conditional independence
Definition: Generally speaking, random variables (→ I/1.2.2) are conditionally independent given
another random variable, if they are statistically independent (→ I/1.3.6) in their conditional prob-
ability distributions (→ I/1.5.4) given this random variable.
1) A set of discrete random variables (→ I/1.2.6) X1, …, Xn with possible values 𝒳1, …, 𝒳n is called conditionally independent given the random variable Y with possible values 𝒴, if
1) expressing the first k moments (→ I/1.18.1) of y in terms of θ
\mu_1 = f_1(\theta_1, \ldots, \theta_k)
\vdots    (1)
\mu_k = f_k(\theta_1, \ldots, \theta_k) ,
2) calculating the first k sample moments (→ I/1.18.1) from y
µ̂1 (y), . . . , µ̂k (y) (2)
3) and solving the system of k equations
\hat{\mu}_1(y) = f_1(\hat{\theta}_1, \ldots, \hat{\theta}_k)
\vdots    (3)
\hat{\mu}_k(y) = f_k(\hat{\theta}_1, \ldots, \hat{\theta}_k)
for θ̂1, …, θ̂k, which are subsequently referred to as “method-of-moments estimates”; a numerical sketch of these three steps is given below.
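As a concrete sketch of steps 1)–3), assuming NumPy, take a gamma distribution with shape k and scale s (this example is not from the text): its mean is ks and its variance is ks², so matching these to the first sample moment m1 and the second central sample moment m2 gives k̂ = m1²/m2 and ŝ = m2/m1.

import numpy as np

rng = np.random.default_rng(1)
y = rng.gamma(shape=2.0, scale=3.0, size=10_000)   # simulated data, true k = 2, s = 3

m1 = np.mean(y)                  # first sample moment
m2 = np.mean((y - m1) ** 2)      # second central sample moment
k_hat, s_hat = m1**2 / m2, m2 / m1
print(k_hat, s_hat)              # method-of-moments estimates, close to 2 and 3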
Sources:
• Wikipedia (2021): “Method of moments (statistics)”; in: Wikipedia, the free encyclopedia, retrieved
on 2021-04-29; URL: https://en.wikipedia.org/wiki/Method_of_moments_(statistics)#Method.
4.2 Statistical hypotheses
4.2.1 Statistical hypothesis
Definition: A statistical hypothesis is a statement about the parameters of a distribution describing
a population from which observations can be sampled as measured data.
More precisely, let m be a generative model (→ I/5.1.1) describing measured data y in terms of a
distribution D(θ) with model parameters θ ∈ Θ. Then, a statistical hypothesis is formally specified
as
H : θ ∈ Θ∗ where Θ∗ ⊂ Θ . (1)
Sources:
• Wikipedia (2021): “Statistical hypothesis testing”; in: Wikipedia, the free encyclopedia, retrieved on 2021-03-19; URL: https://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Definition_of_terms.
4.2.2 Simple vs. composite
Definition: Let H be a statistical hypothesis (→ I/4.2.1). Then,