WITMSE 2013

The MDL principle for arbitrary data:
either discrete or continuous or none of them
Joe Suzuki
Osaka University
WITMSE 2013
Sanjo-Kaikan, University of Tokyo, Japan
August 26, 2013

Road Map
1 Problem
2 The Ryabko measure
3 The Radon-Nikodym theorem
4 Generalization
5 Universal Histogram Sequence
6 Conclusion

The slides of this talk are available online.
Search keywords: Joe Suzuki slideshare
http://www.slideshare.net/prof-joe/

Problem
Given $\{(x_i, y_i)\}_{i=1}^n$, identify whether $X \perp\!\!\!\perp Y$ or not
$A, B$: finite sets
$x^n = (x_1, \cdots, x_n) \in A^n$, $y^n = (y_1, \cdots, y_n) \in B^n$
$P^n(x^n|\theta)$, $P^n(y^n|\theta)$, $P^n(x^n, y^n|\theta)$: expressed by parameter $\theta$
$p$: the prior probability of $X \perp\!\!\!\perp Y$
Bayesian solution
$$X \perp\!\!\!\perp Y \iff p\,Q^n(x^n)\,Q^n(y^n) \ge (1 - p)\,Q^n(x^n, y^n)$$
$$Q^n(x^n) := \int P^n(x^n|\theta)\,w(\theta)\,d\theta, \quad Q^n(y^n) := \int P^n(y^n|\theta)\,w(\theta)\,d\theta$$
$$Q^n(x^n, y^n) := \int P^n(x^n, y^n|\theta)\,w(\theta)\,d\theta$$
using a weight $w$ over $\theta$.

Q should be an alternative to P as n grows
$A$: the finite set in which $X$ takes values.
Q is a Bayesian measure
Kraft's inequality:
$$\sum_{x^n \in A^n} Q^n(x^n) \le 1 \quad (1)$$

For example, $Q^n(x^n) = |A|^{-n}$, $x^n \in A^n$, satisfies (1), but it does not converge to $P^n$ in any sense.
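The point about the uniform measure can be checked numerically. The sketch below assumes a binary alphabet and an i.i.d. source with P(X = 0) = 0.9; both are illustrative choices, and the function names are hypothetical. It sums $Q^n$ by brute force and evaluates $\frac{1}{n} \log \frac{P^n(x^n)}{Q^n(x^n)}$ on sequences whose composition matches the source.

```python
import math
from itertools import product

def uniform_q(xn, alphabet_size):
    """Q^n(x^n) = |A|^{-n}: the same probability for every sequence of length n."""
    return alphabet_size ** (-len(xn))

def kraft_sum(n, alphabet):
    """Sum of Q^n(x^n) over all x^n in A^n (brute force, so keep n small)."""
    return sum(uniform_q(xn, len(alphabet)) for xn in product(alphabet, repeat=n))

def per_symbol_gap(xn, theta, alphabet_size):
    """(1/n) log(P^n(x^n) / Q^n(x^n)) for an i.i.d. P with P(X = x) = theta[x]."""
    log_p = sum(math.log(theta[x]) for x in xn)
    log_q = -len(xn) * math.log(alphabet_size)
    return (log_p - log_q) / len(xn)

alphabet = (0, 1)
theta = {0: 0.9, 1: 0.1}        # illustrative source, not taken from the slides
total = kraft_sum(8, alphabet)  # the uniform measure meets (1) with equality
# Sequences with nine 0s for every 1, of lengths 10, 100 and 1000
gaps = [per_symbol_gap([0] * (9 * m) + [1] * m, theta, 2) for m in (1, 10, 100)]
print(total, gaps)              # the gap equals log 2 - H(P) at every length
```

The gap does not shrink as the sequences get longer, which is the sense in which the uniform measure fails to track $P^n$.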

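Universal Bayesian Measures
$$Q^n(x^n) := \int P^n(x^n|\theta)\,w(\theta)\,d\theta$$
$$w(\theta) \propto \prod_{x \in A} \theta[x]^{-a[x]}$$
with $\{a[x] = \tfrac{1}{2}\}_{x \in A}$ (Krichevsky-Trofimov)
$$-\frac{1}{n} \log Q^n(x^n) \to H(P)$$
for any $P^n(x^n|\theta) = \prod_{x \in A} \theta[x]^{c[x]}$, where $c[x]$ is the number of occurrences of $x \in A$ in $x^n \in A^n$.
Shannon-McMillan-Breiman:
$$-\frac{1}{n} \log P^n(x^n|\theta) \to H(P)$$
for any stationary ergodic $P$, so that for $P^n(x^n) := P^n(x^n|\theta)$,
$$\frac{1}{n} \log \frac{P^n(x^n)}{Q^n(x^n)} \to 0. \quad (2)$$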
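As a rough numerical illustration of (1) and (2), the sketch below evaluates the Krichevsky-Trofimov measure in closed form, assuming the prior is the Dirichlet distribution with all parameters 1/2, so that $Q^n$ depends on $x^n$ only through the counts. The source parameter (0.9, 0.1) and the function names are illustrative assumptions.

```python
import math

def log_kt(counts):
    """log Q^n(x^n) under the Dirichlet(1/2, ..., 1/2) prior, from the symbol counts."""
    m, n = len(counts), sum(counts)
    out = math.lgamma(m / 2) - math.lgamma(n + m / 2)
    return out + sum(math.lgamma(c + 0.5) - math.lgamma(0.5) for c in counts)

def per_symbol_gap(counts, theta):
    """(1/n) log(P^n(x^n) / Q^n(x^n)) for an i.i.d. P with parameter theta."""
    n = sum(counts)
    log_p = sum(c * math.log(t) for c, t in zip(counts, theta) if c > 0)
    return (log_p - log_kt(counts)) / n

theta = (0.9, 0.1)                              # illustrative source
gap_small = per_symbol_gap((90, 10), theta)     # n = 100
gap_large = per_symbol_gap((9000, 1000), theta) # n = 10000

# Kraft check: group the 2^n binary sequences by their number of zeros k
n = 12
kraft = sum(math.comb(n, k) * math.exp(log_kt((k, n - k))) for k in range(n + 1))
print(gap_small, gap_large, kraft)
```

The per-symbol gap shrinks roughly like $(\log n)/(2n)$ in this binary case, in line with (2).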

When $X$ has a density function $f$
There exists a $g$ s.t.
$$\int_{x^n \in \mathbb{R}^n} g^n(x^n)\,dx^n \le 1 \quad (3)$$
$$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0 \quad (4)$$
for any $f$ satisfying a condition mentioned later (Ryabko 2009).

The problem in this paper
Universal Bayesian measure in the general setting
What do (1)(2) and (3)(4) become for general random variables?
1 without assuming that the variables are either discrete or continuous
2 removing the constraint that Ryabko poses

The Ryabko measure
Ryabko measure: $X$ has a density function $f$
$A$: the set in which $X$ takes values.
$\{A_j\}_{j=0}^\infty$: $A_0 := \{A\}$, and $A_{j+1}$ is a refinement of $A_j$
For example, for $A = [0, 1)$,
$A_0 = \{[0, 1)\}$
$A_1 = \{[0, 1/2), [1/2, 1)\}$
$A_2 = \{[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1)\}$
. . .
$A_j = \{[0, 2^{-j}), [2^{-j}, 2 \cdot 2^{-j}), \cdots, [(2^j - 1) 2^{-j}, 1)\}$
. . .
$s_j : A \to A_j$: $x \in a \in A_j \implies s_j(x) = a$
$\lambda$: the Lebesgue measure
$$f_j(x) := \frac{P_j(s_j(x))}{\lambda(s_j(x))} \quad \text{for } x \in A$$
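A minimal sketch of the quantization map $s_j$ and the histogram density $f_j$ on $A = [0, 1)$, assuming level $j$ consists of $2^j$ dyadic cells of width $2^{-j}$ (as in the listed $A_1$ and $A_2$) and that the cell probability comes from a known distribution function. The density $f(x) = 2x$ and the function names are illustrative.

```python
def dyadic_cell(x, j):
    """s_j(x): the cell [i 2^-j, (i+1) 2^-j) at level j that contains x in [0, 1)."""
    width = 2.0 ** (-j)
    i = min(int(x / width), 2 ** j - 1)
    return i * width, (i + 1) * width

def histogram_density(x, j, cdf):
    """f_j(x) = P_j(s_j(x)) / lambda(s_j(x)), with the cell probability taken from cdf."""
    a, b = dyadic_cell(x, j)
    return (cdf(b) - cdf(a)) / (b - a)

def cdf(t):
    """Illustrative distribution function t^2, i.e. density f(x) = 2x on [0, 1)."""
    return t * t

values = [histogram_density(0.3, j, cdf) for j in (0, 2, 20)]
print(values)   # levels 0, 2 and 20 evaluated at x = 0.3
```

At $x = 0.3$ the values move from 1 (one cell) to 0.75 (cell $[1/4, 1/2)$) and then close to $f(0.3) = 0.6$ as the cells shrink.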

Given $x^n = (x_1, \cdots, x_n) \in A^n$ s.t. $(s_j(x_1), \cdots, s_j(x_n)) = (a_1, \cdots, a_n) \in A_j^n$,
$$f_j^n(x^n) := f_j(x_1) \cdots f_j(x_n) = \frac{P_j(a_1) \cdots P_j(a_n)}{\lambda(a_1) \cdots \lambda(a_n)}$$
$$g_j^n(x^n) := \frac{Q_j^n(a_1, \cdots, a_n)}{\lambda(a_1) \cdots \lambda(a_n)}$$
$Q_j$: a universal Bayesian measure w.r.t. the finite set $A_j$.

$f^n(x^n) := f(x_1) \cdots f(x_n)$
$$g^n(x^n) := \sum_{j=0}^\infty \omega_j\, g_j^n(x^n) \quad \text{for } \{\omega_j\}_{j=0}^\infty \text{ s.t. } \sum_j \omega_j = 1,\ \omega_j > 0$$
$$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0$$
for any $f$ s.t. the differential entropy $h(f_j) \to h(f)$ as $j \to \infty$ (Ryabko, 2009)
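The mixture $g^n$ can be sketched in a few lines under simplifying assumptions that are not part of the slides: $A = [0, 1)$ with dyadic cells, the Krichevsky-Trofimov measure as $Q_j$, the infinite sum truncated to levels $0, \ldots, $ `max_level`, and uniform weights over those levels. The test density $f(x) = 2x$ and the deterministic quantile points standing in for a sample are also illustrative.

```python
import math
from collections import Counter

def log_kt(counts, num_cells):
    """log of the Krichevsky-Trofimov probability given the non-empty cell counts."""
    n = sum(counts)
    out = math.lgamma(num_cells / 2) - math.lgamma(n + num_cells / 2)
    return out + sum(math.lgamma(c + 0.5) - math.lgamma(0.5) for c in counts)

def log_g(xs, max_level):
    """log g^n(x^n) for x_i in [0, 1), mixing levels 0..max_level with equal weights."""
    n = len(xs)
    log_weight = -math.log(max_level + 1)      # uniform weights: an assumption
    terms = []
    for j in range(max_level + 1):
        cells = 2 ** j
        counts = Counter(min(int(x * cells), cells - 1) for x in xs)
        # log g_j^n = log Q_j^n - sum_i log lambda(a_i), where lambda(a_i) = 2^-j
        terms.append(log_weight + log_kt(list(counts.values()), cells)
                     + n * j * math.log(2))
    top = max(terms)                           # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def per_symbol_gap(n, max_level=6):
    """(1/n) log(f^n(x^n) / g^n(x^n)) on n quantile points of f(x) = 2x."""
    xs = [math.sqrt((i + 0.5) / n) for i in range(n)]
    log_f = sum(math.log(2 * x) for x in xs)
    return (log_f - log_g(xs, max_level)) / n

gap_small, gap_large = per_symbol_gap(200), per_symbol_gap(2000)
print(gap_small, gap_large)
```

With a truncated mixture the gap cannot vanish exactly, but it should shrink as $n$ grows while the best level moves to finer partitions.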

The Radon-Nikodym theorem
In general, exactly when does a density function exist?
$\mathcal{B}$: the collection of all Borel sets of $\mathbb{R}$
$\mu(D) := P(X \in D)$: the probability of $(X \in D)$ for $D \in \mathcal{B}$
$F_X$: the distribution function of $X$
$\mu$ is absolutely continuous w.r.t. $\lambda$ ($\mu \ll \lambda$)
The following two are equivalent:
1 $f : \mathbb{R} \to \mathbb{R}$ exists s.t. $P(X \le x) = F_X(x) = \int_{t \le x} f(t)\,dt$
2 for any $D \in \mathcal{B}$, $\lambda(D) := \int_D dx = 0 \implies \mu(D) = 0$.
$$f(x) = \frac{dF_X(x)}{dx}$$

Even discrete variables have density functions!
$B$: a countable subset of $\mathbb{R}$
$\mu(D) := P(X \in D)$: the probability of $(X \in D)$ for $D \subseteq B$
$r : B \to \mathbb{R}$
$\mu$ is absolutely continuous w.r.t. $\eta$ ($\mu \ll \eta$)
1 $f : B \to \mathbb{R}$ exists s.t. $P(X \in D) = \sum_{x \in D} f(x) r(x)$, $D \subseteq B$
2 for any $D \subseteq B$, $\eta(D) := \sum_{x \in D} r(x) = 0 \implies \mu(D) = 0$.
$$f(x) = \frac{P(X = x)}{r(x)}$$

Radon-Nikodym
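$\mu, \eta$: $\sigma$-finite measures over a $\sigma$-field $\mathcal{F}$
$\mu$ is absolutely continuous w.r.t. $\eta$ ($\mu \ll \eta$)
1 an $\mathcal{F}$-measurable $f$ exists s.t. for any $A \in \mathcal{F}$, $\mu(A) = \int_A f(t)\,d\eta(t)$
2 for any $A \in \mathcal{F}$, $\eta(A) = 0 \implies \mu(A) = 0$
$$\int_A f(t)\,d\eta(t) := \sup_{\{A_i\}} \sum_i \Big[\inf_{x \in A_i} f(x)\Big] \eta(A_i)$$
$\frac{d\mu}{d\eta} := f$ is the density function w.r.t. $\eta$ when $\mu$ is a probability measure.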

Generalization
When $Y$ has a density function w.r.t. $\eta$ s.t. $\mu \ll \eta$
$B$: the set in which $Y$ takes values.
$\{B_k\}_{k=0}^\infty$: $B_0 := \{B\}$, and $B_{k+1}$ is a refinement of $B_k$
For example, for $B = \mathbb{N} := \{1, 2, \cdots\}$,
$B_0 = \{B\}$
$B_1 := \{\{1\}, \{2, 3, \cdots\}\}$
$B_2 := \{\{1\}, \{2\}, \{3, 4, \cdots\}\}$
. . .
$B_k := \{\{1\}, \{2\}, \cdots, \{k\}, \{k + 1, k + 2, \cdots\}\}$
. . .
$t_k : B \to B_k$: $y \in b \in B_k \implies t_k(y) = b$
$\eta$: a measure s.t. $\mu \ll \eta$
$$f_k(y) := \frac{P_k(t_k(y))}{\eta(t_k(y))} \quad \text{for } y \in B$$
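For the example $B = \mathbb{N}$, the map $t_k$ and the density $f_k$ can be written out directly once $\eta$ and the distribution of $Y$ are fixed. The sketch below assumes $\eta(\{y\}) = 1/(y(y+1))$, so that the tail cell has the finite measure $1/(k+1)$, and $P(Y = y) = 2^{-y}$; both choices, and the `"tail"` label, are hypothetical.

```python
def cell(y, k):
    """t_k(y): the singleton {y} if y <= k, otherwise the tail {k+1, k+2, ...}."""
    return y if y <= k else "tail"

def eta_of_cell(c, k):
    """eta of a cell for eta({y}) = 1/(y(y+1)); the tail sum telescopes to 1/(k+1)."""
    return 1.0 / (k + 1) if c == "tail" else 1.0 / (c * (c + 1))

def prob_of_cell(c, k):
    """P_k of a cell for P(Y = y) = 2^-y; the tail has probability 2^-k."""
    return 2.0 ** (-k) if c == "tail" else 2.0 ** (-c)

def f_k(y, k):
    """f_k(y) = P_k(t_k(y)) / eta(t_k(y))."""
    c = cell(y, k)
    return prob_of_cell(c, k) / eta_of_cell(c, k)

inside = f_k(3, 5)    # y <= k: already equal to P(Y = 3) / eta({3})
in_tail = f_k(7, 5)   # y > k: the tail cell is averaged
resolved = f_k(7, 7)  # once k >= y the value no longer changes
print(inside, in_tail, resolved)
```

For every fixed $y$ the value $f_k(y)$ stops changing once $k \ge y$, which is the discrete counterpart of refining dyadic intervals.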

Given $y^n = (y_1, \cdots, y_n) \in B^n$ s.t. $(t_k(y_1), \cdots, t_k(y_n)) = (b_1, \cdots, b_n) \in B_k^n$,
$$f_k^n(y^n) := f_k(y_1) \cdots f_k(y_n) = \frac{P_k(b_1) \cdots P_k(b_n)}{\eta(b_1) \cdots \eta(b_n)}$$
$$g_k^n(y^n) := \frac{Q_k^n(b_1, \cdots, b_n)}{\eta(b_1) \cdots \eta(b_n)}$$
$Q_k$: a universal Bayesian measure w.r.t. the finite set $B_k$
Similarly, with $g^n(y^n) := \sum_k \omega_k\, g_k^n(y^n)$,
$$\frac{1}{n} \log \frac{f^n(y^n)}{g^n(y^n)} \to 0$$
for any $f$ s.t. $h(f_k) \to h(f)$ as $k \to \infty$, where
$$h(f) := \int -f(y) \log f(y)\,d\eta(y)$$

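$$\mu^n(D^n) := \int_{D^n} f^n(y^n)\,d\eta^n(y^n), \quad D^n \in \mathcal{B}^n$$
$$\nu^n(D^n) := \int_{D^n} g^n(y^n)\,d\eta^n(y^n), \quad D^n \in \mathcal{B}^n$$
$$\frac{f^n(y^n)}{g^n(y^n)} = \frac{d\mu^n}{d\eta^n}(y^n) \Big/ \frac{d\nu^n}{d\eta^n}(y^n) = \frac{d\mu^n}{d\nu^n}(y^n)$$
$$D(\mu\|\nu) := \int d\mu \log \frac{d\mu}{d\nu}$$
$$h(f) := \int -f(y) \log f(y)\,d\eta(y) = -\int \frac{d\mu}{d\eta}(y) \log \frac{d\mu}{d\eta}(y) \cdot d\eta(y) = -D(\mu\|\eta)$$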

Result 1
Proposition 1 (Suzuki, 2011)
If $\mu \ll \eta$, then there exists $\nu \ll \eta$ s.t. $\nu^n(\mathbb{R}^n) \le 1$ and
$$\frac{1}{n} \log \frac{d\mu^n}{d\nu^n}(y^n) \to 0$$
for any $\mu$ s.t. $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$.

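The solution to the problem in the Introduction
$\{A_j \times B_k\}$
$$g_{j,k}^n(x^n, y^n) := \frac{Q_{j,k}^n(a_1, b_1, \cdots, a_n, b_n)}{\lambda(a_1) \cdots \lambda(a_n)\,\eta(b_1) \cdots \eta(b_n)}$$
$$g^n(x^n, y^n) := \sum_{j,k} \omega_{j,k}\, g_{j,k}^n(x^n, y^n) \quad \text{for } \{\omega_{j,k}\} \text{ s.t. } \sum_{j,k} \omega_{j,k} = 1,\ \omega_{j,k} > 0$$
$$\frac{1}{n} \log \frac{f^n(x^n, y^n)}{g^n(x^n, y^n)} \to 0$$
Solution
We estimate $\frac{f^n(x^n, y^n)}{f^n(x^n) f^n(y^n)}$ by $\frac{g^n(x^n, y^n)}{g^n(x^n) g^n(y^n)}$, extending $\frac{Q^n(x^n, y^n)}{Q^n(x^n) Q^n(y^n)}$.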
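The decision rule $p\,g^n(x^n)\,g^n(y^n) \ge (1 - p)\,g^n(x^n, y^n)$ can be sketched for two variables on $[0, 1)$. The assumptions here are mine, not the talk's: both $\lambda$ and $\eta$ are taken to be the Lebesgue measure, the cells are dyadic, $Q$ is the Krichevsky-Trofimov measure, the mixtures are truncated to levels $0, \ldots, $ `levels`, and the weights are uniform. All function names are hypothetical.

```python
import math
from collections import Counter

def log_kt(counts, num_cells):
    """log of the Krichevsky-Trofimov probability given the non-empty cell counts."""
    n = sum(counts)
    out = math.lgamma(num_cells / 2) - math.lgamma(n + num_cells / 2)
    return out + sum(math.lgamma(c + 0.5) - math.lgamma(0.5) for c in counts)

def log_sum_exp(terms):
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def cell(x, j):
    """Index of the dyadic cell of width 2^-j containing x in [0, 1)."""
    return min(int(x * 2 ** j), 2 ** j - 1)

def log_g_marginal(xs, levels):
    n, terms = len(xs), []
    for j in range(levels + 1):
        counts = Counter(cell(x, j) for x in xs)
        terms.append(-math.log(levels + 1) + log_kt(counts.values(), 2 ** j)
                     + n * j * math.log(2))
    return log_sum_exp(terms)

def log_g_joint(xs, ys, levels):
    n, terms = len(xs), []
    for j in range(levels + 1):
        for k in range(levels + 1):
            counts = Counter((cell(x, j), cell(y, k)) for x, y in zip(xs, ys))
            terms.append(-2 * math.log(levels + 1)
                         + log_kt(counts.values(), 2 ** (j + k))
                         + n * (j + k) * math.log(2))
    return log_sum_exp(terms)

def independent(xs, ys, p=0.5, levels=2):
    """Decide independence iff p g(x^n) g(y^n) >= (1 - p) g(x^n, y^n), in logs."""
    lhs = math.log(p) + log_g_marginal(xs, levels) + log_g_marginal(ys, levels)
    rhs = math.log(1 - p) + log_g_joint(xs, ys, levels)
    return lhs >= rhs

# A balanced 16 x 16 grid (no dependence at any level) versus the case Y = X
grid = [((i % 16 + 0.5) / 16, (i // 16 + 0.5) / 16) for i in range(256)]
xs, ys = zip(*grid)
print(independent(xs, ys), independent(xs, xs))
```

The grid data carry no dependence at any resolution, so the joint model only pays extra parameter cost; with $Y = X$ the joint histogram concentrates on the diagonal and its likelihood dominates.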

Further generalization
Proposition 1 assumes
a specific histogram sequence $\{B_k\}$; and
that $\mu$ satisfies $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$

$\{B_k\}$ should be universal
Construct $\{B_k\}$ s.t. $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$ for any $\mu$

Universal Histogram Sequence
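Universal histogram sequence $\{B_k\}$
$\mu, \sigma \in \mathbb{R}$, $\sigma > 0$.
$\{C_k\}_{k=0}^\infty$:
$C_0 = \{(-\infty, \infty)\}$
$C_1 = \{(-\infty, \mu], (\mu, \infty)\}$
$C_2 = \{(-\infty, \mu - \sigma], (\mu - \sigma, \mu], (\mu, \mu + \sigma], (\mu + \sigma, \infty)\}$
$\cdots$
$C_k \to C_{k+1}$:
$(-\infty, \mu - (k-1)\sigma] \to (-\infty, \mu - k\sigma], (\mu - k\sigma, \mu - (k-1)\sigma]$
$(a, b] \to (a, \frac{a+b}{2}], (\frac{a+b}{2}, b]$
$(\mu + (k-1)\sigma, \infty) \to (\mu + (k-1)\sigma, \mu + k\sigma], (\mu + k\sigma, \infty)$

$B$: the set in which $Y$ takes values
$B_k := \{B \cap c \mid c \in C_k\} \setminus \{\emptyset\}$.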
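The refinement rule $C_k \to C_{k+1}$ is easy to mechanize by tracking only the finite interval endpoints. The following is a sketch under that representation; the function names are hypothetical, and `restrict` works on a finite window of $B$ because an infinite set cannot be enumerated.

```python
from bisect import bisect_left

def breakpoints(k, mu=0.0, sigma=1.0):
    """Finite endpoints of the intervals in C_k, in increasing order (C_0 has none)."""
    if k == 0:
        return []
    pts = [mu]                                  # C_1 = {(-inf, mu], (mu, inf)}
    for step in range(1, k):                    # apply the rule C_step -> C_{step+1}
        mids = [(a + b) / 2 for a, b in zip(pts, pts[1:])]  # bisect each finite (a, b]
        # split the two unbounded intervals at mu - step*sigma and mu + step*sigma
        pts = sorted(pts + mids + [mu - step * sigma, mu + step * sigma])
    return pts

def cell_index(y, pts):
    """Index of the interval (a, b] that contains y, counted from the left."""
    return bisect_left(pts, y)

def restrict(values, k, mu, sigma):
    """Non-empty sets B intersected with c, c in C_k, for a finite window of B."""
    pts = breakpoints(k, mu, sigma)
    groups = {}
    for y in sorted(values):
        groups.setdefault(cell_index(y, pts), []).append(y)
    return [groups[i] for i in sorted(groups)]

print(breakpoints(3))               # endpoints of C_3 for mu = 0, sigma = 1
print(restrict(range(1, 8), 3, 1, 1))
```

With $\mu = 1$, $\sigma = 1$ and the window $\{1, \ldots, 7\}$ of $\mathbb{N}$, level 3 gives the singletons $\{1\}, \{2\}, \{3\}$ and one tail cell, since intervals containing no integer are dropped.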

$B = \mathbb{R}$ and $\mu \ll \lambda$
$\{B_k\} = \{C_k\}$
For each $y \in B$, there exist $K \in \mathbb{N}$ and a unique $\{(a_k, b_k]\}_{k=K}^\infty$ s.t.
$y \in (a_k, b_k] \in B_k$, $k = K, K + 1, \cdots$
$|a_k - b_k| \to 0$, $k \to \infty$
$F_Y$: the distribution function of $Y$
$$f_k(y) = \frac{P(Y \in (a_k, b_k])}{\lambda((a_k, b_k])} = \frac{F_Y(b_k) - F_Y(a_k)}{b_k - a_k} \to f(y), \quad y \in B$$
$h(f_k) \to h(f)$ as $k \to \infty$ for any $f$

$B = \mathbb{N}$ and $\mu \ll \eta$
$B_0 = \{B\}$
$B_1 := \{\{1\}, \{2, 3, \cdots\}\}$
$B_2 := \{\{1\}, \{2\}, \{3, 4, \cdots\}\}$
. . .
$B_k := \{\{1\}, \{2\}, \cdots, \{k\}, \{k + 1, k + 2, \cdots\}\}$
. . .
can be obtained via $\mu = 1$, $\sigma = 1$.
For each $y \in B$, there exist $K \in \mathbb{N}$ and a unique $\{D_k\}_{k=1}^\infty$ s.t.
$y \in D_k \in B_k$, $k = 1, 2, \cdots$
$\{y\} = D_k \in B_k$, $k = K, K + 1, \cdots$
$$f_k(y) = \frac{P(Y \in D_k)}{\eta(D_k)} \to f(y) = \frac{P(Y = y)}{\eta(\{y\})}, \quad y \in B$$
$h(f_k) \to h(f)$ as $k \to \infty$ for any $f$

Result 2
Theorem 1
If $\mu \ll \eta$, then there exists $\nu \ll \eta$ s.t. $\nu^n(\mathbb{R}^n) \le 1$ and, for any $\mu$,
$$\frac{1}{n} \log \frac{d\mu^n}{d\nu^n}(y^n) \to 0$$
The proof is based on the following observation:
Billingsley, Probability and Measure, Problem 32.13
$$\lim_{h \to 0} \frac{\mu((x - h, x + h])}{\eta((x - h, x + h])} = f(x), \quad x \in \mathbb{R}$$
which is used to remove the condition Ryabko posed:
"for any $\mu$ s.t. $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$"
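The differentiation result quoted from Billingsley can be illustrated numerically in the simplest case, assuming $\mu$ is the standard normal distribution and $\eta$ is the Lebesgue measure (an illustrative pair, not one used in the talk).

```python
import math

def normal_cdf(x):
    """Distribution function of N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def interval_ratio(x, h):
    """mu((x-h, x+h]) / eta((x-h, x+h]) for mu = N(0, 1) and eta = Lebesgue measure."""
    return (normal_cdf(x + h) - normal_cdf(x - h)) / (2.0 * h)

x = 0.5
density = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)   # f(x) for N(0, 1)
errors = [abs(interval_ratio(x, h) - density) for h in (1e-1, 1e-2, 1e-3)]
print(errors)   # shrinks as h decreases
```

The ratio of interval masses approaches the density as the interval shrinks, which is the pointwise convergence $f_k(y) \to f(y)$ that the universal histogram sequence relies on.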

Conclusion
Summary and Discussion
Universal Bayesian Measure
the random variables may be either discrete or continuous
a universal histogram sequence to remove Ryabko's condition
Many Applications
Bayesian network structure estimation (DCC 2012)
The Bayesian Chow-Liu Algorithm (PGM 2012)
Markov order estimation even when $\{X_i\}$ is continuous
Extending MDL:
$g^n(y^n|m)$: the universal Bayesian measure w.r.t. model $m$ given $y^n \in B^n$
$p_m$: the prior probability of model $m$
$$-\log g^n(y^n|m) - \log p_m \to \min$$
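The extended MDL criterion is an argmin over candidate models. A minimal sketch, assuming the values $\log g^n(y^n|m)$ have already been computed elsewhere; the model names and numbers below are made-up placeholders, not results from the talk.

```python
import math

def mdl_select(log_g_by_model, prior_by_model):
    """Return the model m minimizing -log g^n(y^n | m) - log p_m."""
    def score(m):
        return -log_g_by_model[m] - math.log(prior_by_model[m])
    return min(log_g_by_model, key=score)

# Hypothetical inputs: log g^n(y^n | m) and prior p_m for three candidate models
log_g = {"m0": -120.0, "m1": -112.5, "m2": -112.4}
prior = {"m0": 0.5, "m1": 0.3, "m2": 0.2}
best = mdl_select(log_g, prior)
print(best)
```

In this toy input the model with the largest $g^n$ is not selected, because its smaller prior outweighs the slight gain in fit.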
