The MDL principle for arbitrary data:
either discrete or continuous or none of them
Joe Suzuki
Osaka University
WITMSE 2013
Sanjo-Kaikan, University of Tokyo, Japan
August 26, 2013
Joe Suzuki (Osaka University) The MDL principle for arbitrary data: either discrete or continuous or none of them
WITMSE 2013Sanjo-Kaikan, University of To
/ 24
Road Map
1 Problem
2 The Ryabko measure
3 The Radon-Nikodym theorem
4 Generalization
5 Universal Histogram Sequence
6 Conclusion
Road Map
The slides of this talk are available online (search keywords: Joe Suzuki, slideshare):
http://www.slideshare.net/prof-joe/
Problem
Given $\{(x_i, y_i)\}_{i=1}^n$, identify whether $X \perp\!\!\!\perp Y$ or not
$A, B$: finite sets
$x^n = (x_1, \dots, x_n) \in A^n$, $y^n = (y_1, \dots, y_n) \in B^n$
$P^n(x^n|\theta)$, $P^n(y^n|\theta)$, $P^n(x^n, y^n|\theta)$: expressed by a parameter $\theta$
$p$: the prior probability of $X \perp\!\!\!\perp Y$
Bayesian solution
$$X \perp\!\!\!\perp Y \iff p\,Q^n(x^n)\,Q^n(y^n) \ge (1-p)\,Q^n(x^n, y^n)$$
$$Q^n(x^n) := \int P^n(x^n|\theta)\,w(\theta)\,d\theta, \qquad Q^n(y^n) := \int P^n(y^n|\theta)\,w(\theta)\,d\theta$$
$$Q^n(x^n, y^n) := \int P^n(x^n, y^n|\theta)\,w(\theta)\,d\theta$$
using a weight $w$ over $\theta$.
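A minimal sketch of this decision rule for finite A and B, assuming (as an illustration) that each of the three mixtures uses the Krichevsky-Trofimov weight of the following slides, i.e. a Dirichlet(1/2, ..., 1/2) prior on the corresponding θ. The helper names `log_kt` and `independent` are hypothetical, and everything is computed in logarithms to avoid underflow.

```python
import math
from collections import Counter


def log_kt(seq, alphabet_size):
    """log Q^n(seq) for the Krichevsky-Trofimov mixture over an alphabet of the given size."""
    a = 0.5  # Dirichlet(1/2, ..., 1/2) weight w(theta)
    m = alphabet_size
    val = math.lgamma(m * a) - math.lgamma(len(seq) + m * a)
    for c in Counter(seq).values():  # symbols with zero count contribute nothing
        val += math.lgamma(c + a) - math.lgamma(a)
    return val


def independent(xs, ys, size_x, size_y, p=0.5):
    """Declare X independent of Y iff p Q^n(x^n) Q^n(y^n) >= (1 - p) Q^n(x^n, y^n)."""
    log_indep = math.log(p) + log_kt(xs, size_x) + log_kt(ys, size_y)
    log_dep = math.log(1 - p) + log_kt(list(zip(xs, ys)), size_x * size_y)
    return log_indep >= log_dep


xs = [0, 1] * 50
print(independent(xs, xs, 2, 2))                                 # y = x: False
print(independent([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25, 2, 2))   # balanced pairs: True
```

With p = 1/2 the rule reduces to comparing log Q^n(x^n) + log Q^n(y^n) with log Q^n(x^n, y^n).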
Problem
Q should approximate P as n grows
A: the finite set in which X takes values.
Q is a Bayesian measure: Kraft's inequality holds,
$$\sum_{x^n \in A^n} Q^n(x^n) \le 1 \quad (1)$$
For example, $Q^n(x^n) = |A|^{-n}$, $x^n \in A^n$, satisfies (1), but does not converge to $P^n$ in any sense.
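The sketch below (hypothetical helper names) checks (1) by brute force for |A| = 3 and n = 4: both the uniform measure and the Krichevsky-Trofimov mixture of the next slide sum to 1 over A^n. It then contrasts their per-symbol code lengths on a constant sequence: the uniform measure always pays log|A| nats per symbol, whatever the data.

```python
import itertools
import math
from collections import Counter


def log_kt(seq, m):
    """log Q^n(seq): Krichevsky-Trofimov mixture over an alphabet of size m."""
    val = math.lgamma(m / 2) - math.lgamma(len(seq) + m / 2)
    for c in Counter(seq).values():
        val += math.lgamma(c + 0.5) - math.lgamma(0.5)
    return val


m, n = 3, 4
alphabet = range(m)
kt_total = sum(math.exp(log_kt(s, m)) for s in itertools.product(alphabet, repeat=n))
uniform_total = sum(m ** -n for _ in itertools.product(alphabet, repeat=n))
print(kt_total, uniform_total)  # both equal 1 up to rounding, so (1) holds

# Per-symbol code length -(1/n) log Q^n on a highly skewed sequence
skewed = [0] * 100
print(-log_kt(skewed, m) / len(skewed))  # small: the KT mixture adapts to the data
print(math.log(m))                        # uniform measure: always log|A| per symbol
```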
Problem
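Universal Bayesian Measures
$$Q^n(x^n) := \int P^n(x^n|\theta)\,w(\theta)\,d\theta, \qquad w(\theta) \propto \prod_{x \in A} \theta[x]^{-a[x]}$$
with $\{a[x] = 1/2\}_{x \in A}$ (Krichevsky-Trofimov):
$$-\frac{1}{n}\log Q^n(x^n) \to H(P)$$
for any $P^n(x^n|\theta) = \prod_{x \in A} \theta[x]^{c[x]}$, where $c[x]$ is the number of occurrences of $x$ in $x^n \in A^n$.
Shannon-McMillan-Breiman:
$$-\frac{1}{n}\log P^n(x^n|\theta) \to H(P)$$
for any stationary ergodic $P$, so that for $P^n(x^n) := P^n(x^n|\theta)$,
$$\frac{1}{n}\log\frac{P^n(x^n)}{Q^n(x^n)} \to 0. \quad (2)$$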
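Convergence (2) can be checked numerically for an i.i.d. source. In this sketch θ = (0.7, 0.2, 0.1) is an arbitrary illustrative choice and the function name is hypothetical; for the KT mixture, log P^n(x^n|θ) − log Q^n(x^n) grows at most roughly like ((|A| − 1)/2) log n, so the printed values should shrink toward 0.

```python
import math
import random
from collections import Counter


def log_kt(seq, m):
    """log Q^n(seq): Krichevsky-Trofimov mixture over an alphabet of size m."""
    val = math.lgamma(m / 2) - math.lgamma(len(seq) + m / 2)
    for c in Counter(seq).values():
        val += math.lgamma(c + 0.5) - math.lgamma(0.5)
    return val


def normalized_log_ratio(theta, n, seed=0):
    """(1/n) log P^n(x^n|theta) / Q^n(x^n) for an i.i.d. sample drawn from theta."""
    rng = random.Random(seed)
    xs = rng.choices(range(len(theta)), weights=theta, k=n)
    log_p = sum(math.log(theta[x]) for x in xs)
    return (log_p - log_kt(xs, len(theta))) / n


theta = [0.7, 0.2, 0.1]  # illustrative source, not from the talk
for n in (100, 1000, 10000):
    print(n, normalized_log_ratio(theta, n))
```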
Problem
When X has a density function f
There exists a $g$ such that
$$\int_{x^n \in \mathbb{R}^n} g^n(x^n)\,dx^n \le 1 \quad (3)$$
$$\frac{1}{n}\log\frac{f^n(x^n)}{g^n(x^n)} \to 0 \quad (4)$$
for any $f$ satisfying a condition stated later (Ryabko 2009).
Problem
The problem in this paper
Universal Bayesian measure in the general setting
What are the counterparts of (1)(2) and (3)(4) for general random variables,
1 without assuming that they are either discrete or continuous, and
2 removing the condition Ryabko imposes ($h(f_j) \to h(f)$ as $j \to \infty$)?
The Ryabko measure
Ryabko measure: X has a density function f
A: the set in which X takes values.
$\{A_j\}_{j=0}^\infty$: $A_0 := \{A\}$, and $A_{j+1}$ is a refinement of $A_j$
For example, for $A = [0, 1)$: $A_0 = \{[0, 1)\}$
$A_1 = \{[0, 1/2), [1/2, 1)\}$
$A_2 = \{[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1)\}$
...
$A_j = \{[0, 2^{-j}), [2^{-j}, 2 \cdot 2^{-j}), \dots, [(2^j - 1)2^{-j}, 1)\}$
...
$s_j : A \to A_j$: $x \in a \in A_j \implies s_j(x) = a$
λ: the Lebesgue measure
$$f_j(x) := \frac{P_j(s_j(x))}{\lambda(s_j(x))}, \quad x \in A$$
where $P_j(a) := P(X \in a)$ for each cell $a \in A_j$.
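The histogram density f_j is easy to evaluate when the distribution function F of X is known, since a cell [u, v) has P_j([u, v)) = F(v) − F(u) for continuous F. The sketch assumes, purely for illustration, F(x) = x² on [0, 1) (density f(x) = 2x) and the dyadic partition of the example; the function names are hypothetical. f_j(0.3) should approach f(0.3) = 0.6 as j grows.

```python
import math


def f_j(x, j, cdf):
    """Histogram density f_j(x) = P_j(s_j(x)) / lambda(s_j(x)) on the dyadic cells of [0, 1)."""
    width = 2.0 ** -j
    i = math.floor(x / width)          # s_j(x) = [i*width, (i+1)*width)
    a, b = i * width, (i + 1) * width
    return (cdf(b) - cdf(a)) / width   # P_j(cell) = F(b) - F(a)


def cdf(x):
    """Assumed example: F(x) = x^2 on [0, 1), i.e. density f(x) = 2x."""
    return x * x


for j in (0, 1, 2, 5, 10, 20):
    print(j, f_j(0.3, j, cdf))  # approaches f(0.3) = 0.6
```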
The Ryabko measure
Given $x^n = (x_1, \dots, x_n) \in A^n$ such that $(s_j(x_1), \dots, s_j(x_n)) = (a_1, \dots, a_n) \in A_j^n$,
$$f_j^n(x^n) := f_j(x_1)\cdots f_j(x_n) = \frac{P_j(a_1)\cdots P_j(a_n)}{\lambda(a_1)\cdots\lambda(a_n)}$$
$$g_j^n(x^n) := \frac{Q_j^n(a_1, \dots, a_n)}{\lambda(a_1)\cdots\lambda(a_n)}$$
$Q_j$: a universal Bayesian measure w.r.t. the finite set $A_j$.
$f^n(x^n) := f(x_1)\cdots f(x_n)$
$$g^n(x^n) := \sum_{j=0}^{\infty} \omega_j\, g_j^n(x^n) \quad \text{for } \{\omega_j\}_{j=0}^{\infty} \text{ with } \sum_j \omega_j = 1,\ \omega_j > 0$$
$$\frac{1}{n}\log\frac{f^n(x^n)}{g^n(x^n)} \to 0$$
for any $f$ whose differential entropy satisfies $h(f_j) \to h(f)$ as $j \to \infty$ (Ryabko, 2009)
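A minimal sketch of the Ryabko mixture on the dyadic partitions of [0, 1), assuming KT mixtures for Q_j, weights ω_j = 2^{−(j+1)}, and truncation of the sum to the first 12 levels (the truncated weights sum to less than 1, so g^n still integrates to at most 1). Since λ(a) = 2^{−j} for every cell at level j, the denominator is simply 2^{−jn}. The sample is drawn from the illustrative density f(x) = 2x; all names are hypothetical.

```python
import math
import random
from collections import Counter


def log_kt_counts(counts, m):
    """log Q^n for the KT mixture over m cells, given the nonzero cell counts."""
    counts = list(counts)
    val = math.lgamma(m / 2) - math.lgamma(sum(counts) + m / 2)
    for c in counts:
        val += math.lgamma(c + 0.5) - math.lgamma(0.5)
    return val


def log_g_j(xs, j):
    """log g_j^n(x^n) = log Q_j^n(a_1..a_n) - sum_i log lambda(a_i), cells of width 2^-j."""
    counts = Counter(math.floor(x * 2 ** j) for x in xs).values()
    return log_kt_counts(counts, 2 ** j) + len(xs) * j * math.log(2)


def log_g(xs, levels=12):
    """log g^n(x^n) with the assumed weights omega_j = 2^-(j+1), truncated to j < levels."""
    terms = [-(j + 1) * math.log(2) + log_g_j(xs, j) for j in range(levels)]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


rng = random.Random(1)
xs = [math.sqrt(rng.random()) for _ in range(2000)]  # X has density f(x) = 2x on [0, 1)
log_f = sum(math.log(2 * x) for x in xs)
print((log_f - log_g(xs)) / len(xs))  # (1/n) log f^n / g^n, expected to be small
```

Note that g_0^n ≡ 1 (a single cell, so Q_0 = 1), hence g^n(x^n) ≥ ω_0 always holds.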
The Radon-Nikodym theorem
In general, when exactly does a density function exist?
$\mathcal{B}$: the Borel sets of $\mathbb{R}$
$\mu(D) := P(X \in D)$: the probability of $X \in D$, for $D \in \mathcal{B}$
$F_X$: the distribution function of X
µ is absolutely continuous w.r.t. λ ($\mu \ll \lambda$)
The following two are equivalent:
1 there exists $f: \mathbb{R} \to \mathbb{R}$ such that $P(X \le x) = F_X(x) = \int_{t \le x} f(t)\,dt$
2 for any $D \in \mathcal{B}$, $\lambda(D) := \int_D dx = 0 \implies \mu(D) = 0$.
$$f(x) = \frac{dF_X(x)}{dx}$$
The Radon-Nikodym theorem
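Even discrete variables have density functions!
B: a countable subset of $\mathbb{R}$
$\mu(D) := P(X \in D)$: the probability of $X \in D$, for $D \subseteq B$
$r: B \to [0, \infty)$
µ is absolutely continuous w.r.t. η ($\mu \ll \eta$)
The following two are equivalent:
1 there exists $f: B \to \mathbb{R}$ such that $P(X \in D) = \sum_{x \in D} f(x)\,r(x)$, $D \subseteq B$
2 for any $D \subseteq B$, $\eta(D) := \sum_{x \in D} r(x) = 0 \implies \mu(D) = 0$.
$$f(x) = \frac{P(X = x)}{r(x)} \quad (r(x) > 0)$$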
The Radon-Nikodym theorem
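Radon-Nikodym
µ, η: σ-finite measures over a σ-field $\mathcal{F}$
µ is absolutely continuous w.r.t. η ($\mu \ll \eta$)
The following two are equivalent:
1 there exists an $\mathcal{F}$-measurable $f$ such that $\mu(A) = \int_A f(t)\,d\eta(t)$ for any $A \in \mathcal{F}$
2 for any $A \in \mathcal{F}$, $\eta(A) = 0 \implies \mu(A) = 0$
$$\int_A f(t)\,d\eta(t) := \sup_{\{A_i\}} \sum_i \Big[\inf_{x \in A_i} f(x)\Big]\,\eta(A_i) \quad (f \ge 0)$$
where the supremum is taken over finite partitions $\{A_i\}$ of $A$.
$\frac{d\mu}{d\eta} := f$ is the density function of µ w.r.t. η when µ is a probability measure.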
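To illustrate the "none of them" case, consider a hypothetical µ that puts mass 1/2 on the point 0 and spreads the other 1/2 uniformly over (0, 1]. It is not discrete, and it has no density w.r.t. λ because µ({0}) = 1/2 > 0 = λ({0}). With the assumed reference measure η = λ + δ_0 (Lebesgue measure plus a unit atom at 0), µ ≪ η and dµ/dη equals 1/2 at 0 and on (0, 1]. The sketch checks µ((a, b]) = ∫_(a,b] f dη numerically, approximating the λ-part by a midpoint rule; all names are illustrative.

```python
def f(x):
    """dmu/deta for mu = 0.5*delta_0 + 0.5*Uniform(0, 1] and eta = lambda + delta_0."""
    if x == 0.0:
        return 0.5          # mu({0}) / eta({0}) = 0.5 / 1
    if 0.0 < x <= 1.0:
        return 0.5          # density of the continuous half w.r.t. lambda
    return 0.0


def mu_interval(a, b):
    """mu((a, b]) straight from the definition of mu."""
    atom = 0.5 if a < 0.0 <= b else 0.0
    return atom + 0.5 * max(0.0, min(b, 1.0) - max(a, 0.0))


def integral_f_deta(a, b, steps=100_000):
    """Integral of f over (a, b] w.r.t. eta: midpoint rule for lambda plus the atom at 0."""
    h = (b - a) / steps
    lebesgue_part = sum(f(a + (i + 0.5) * h) for i in range(steps)) * h
    atom_part = f(0.0) if a < 0.0 <= b else 0.0
    return lebesgue_part + atom_part


for a, b in [(-0.5, 0.5), (0.25, 2.0), (-1.0, -0.5)]:
    print((a, b), mu_interval(a, b), integral_f_deta(a, b))
```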
Generalization
When Y has a density function w.r.t. η such that µ ≪ η
B: the set in which Y takes values.
$\{B_k\}_{k=0}^\infty$: $B_0 := \{B\}$, and $B_{k+1}$ is a refinement of $B_k$
For example, for $B = \mathbb{N} := \{1, 2, \dots\}$, $B_0 = \{B\}$
$B_1 := \{\{1\}, \{2, 3, \dots\}\}$
$B_2 := \{\{1\}, \{2\}, \{3, 4, \dots\}\}$
...
$B_k := \{\{1\}, \{2\}, \dots, \{k\}, \{k+1, k+2, \dots\}\}$
...
$t_k : B \to B_k$: $y \in b \in B_k \implies t_k(y) = b$
η: a measure such that $\mu \ll \eta$
$$f_k(y) := \frac{P_k(t_k(y))}{\eta(t_k(y))}, \quad y \in B$$
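For B = N the histogram density can be written in closed form. The sketch assumes, purely for illustration, a geometric distribution P(Y = y) = (1 − q)^{y−1} q and a reference measure η with η({y}) = 1/(y(y + 1)), so that η(N) = 1 and the tail cell has η({k+1, k+2, ...}) = 1/(k + 1) (a telescoping sum). The function names are hypothetical.

```python
def pmf(y, q):
    """P(Y = y) = (1 - q)^(y - 1) q, y = 1, 2, ... (assumed example distribution)."""
    return (1 - q) ** (y - 1) * q


def eta_point(y):
    """eta({y}) = 1 / (y (y + 1)) (assumed reference measure; eta(N) = 1)."""
    return 1.0 / (y * (y + 1))


def f_k(y, k, q):
    """f_k(y) = P_k(t_k(y)) / eta(t_k(y)) for B_k = {{1}, ..., {k}, {k+1, k+2, ...}}."""
    if y <= k:
        return pmf(y, q) / eta_point(y)   # t_k(y) = {y}
    # t_k(y) = {k+1, k+2, ...}: P(Y >= k+1) = (1-q)^k and eta = 1/(k+1)
    return (1 - q) ** k * (k + 1)


q = 0.3
print([round(f_k(3, k, q), 4) for k in range(6)])  # constant once k >= 3
```

f_k(3) changes only until k reaches 3; from then on t_k(3) = {3} and f_k(3) = P(Y = 3)/η({3}).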
Generalization
Given $y^n = (y_1, \dots, y_n) \in B^n$ such that $(t_k(y_1), \dots, t_k(y_n)) = (b_1, \dots, b_n) \in B_k^n$,
$$f_k^n(y^n) := f_k(y_1)\cdots f_k(y_n) = \frac{P_k(b_1)\cdots P_k(b_n)}{\eta(b_1)\cdots\eta(b_n)}$$
$$g_k^n(y^n) := \frac{Q_k^n(b_1, \dots, b_n)}{\eta(b_1)\cdots\eta(b_n)}$$
$Q_k$: a universal Bayesian measure w.r.t. the finite set $B_k$
Similarly, with $g^n(y^n) := \sum_k \omega_k\, g_k^n(y^n)$,
$$\frac{1}{n}\log\frac{f^n(y^n)}{g^n(y^n)} \to 0$$
for any $f$ such that $h(f_k) \to h(f)$ as $k \to \infty$, where
$$h(f) := \int -f(y)\log f(y)\,d\eta(y)$$
Generalization
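$$\mu^n(D^n) := \int_{D^n} f^n(y^n)\,d\eta^n(y^n), \quad D^n \in \mathcal{B}^n$$
$$\nu^n(D^n) := \int_{D^n} g^n(y^n)\,d\eta^n(y^n), \quad D^n \in \mathcal{B}^n$$
$$\frac{f^n(y^n)}{g^n(y^n)} = \frac{d\mu^n}{d\eta^n}(y^n) \Big/ \frac{d\nu^n}{d\eta^n}(y^n) = \frac{d\mu^n}{d\nu^n}(y^n)$$
$$D(\mu\|\nu) := \int d\mu \log\frac{d\mu}{d\nu}$$
$$h(f) := \int -f(y)\log f(y)\,d\eta(y) = -\int \frac{d\mu}{d\eta}(y)\log\frac{d\mu}{d\eta}(y)\,d\eta(y) = -D(\mu\|\eta)$$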
Generalization
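Result 1
Proposition 1 (Suzuki, 2011)
If $\mu \ll \eta$, then there exists $\nu \ll \eta$ such that $\nu^n(\mathbb{R}^n) \le 1$ and
$$\frac{1}{n}\log\frac{d\mu^n}{d\nu^n}(y^n) \to 0$$
for any µ such that $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$.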
Generalization
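Solution to the problem posed in the introduction
Using the partitions $\{A_j \times B_k\}$:
$$g_{j,k}^n(x^n, y^n) := \frac{Q_{j,k}^n(a_1, b_1, \dots, a_n, b_n)}{\lambda(a_1)\cdots\lambda(a_n)\,\eta(b_1)\cdots\eta(b_n)}$$
$$g^n(x^n, y^n) := \sum_{j,k} \omega_{j,k}\, g_{j,k}^n(x^n, y^n) \quad \text{for } \{\omega_{j,k}\} \text{ with } \sum_{j,k} \omega_{j,k} = 1,\ \omega_{j,k} > 0$$
$$\frac{1}{n}\log\frac{f^n(x^n, y^n)}{g^n(x^n, y^n)} \to 0$$
Solution
We estimate $\dfrac{f^n(x^n, y^n)}{f^n(x^n)\,f^n(y^n)}$ by $\dfrac{g^n(x^n, y^n)}{g^n(x^n)\,g^n(y^n)}$, extending $\dfrac{Q^n(x^n, y^n)}{Q^n(x^n)\,Q^n(y^n)}$.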
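A minimal sketch of the resulting test for two variables with values in [0, 1). It assumes dyadic partitions for both A_j and B_k (so λ(a) = 2^{−j}, and η = λ for Y as well), KT mixtures for Q_j and Q_{j,k}, hypothetical product weights ω_{j,k} = ω_j ω_k with ω_j = 2^{−(j+1)}, and truncation to 8 levels per variable. Independence is declared, as an assumed extension of the Bayesian rule of the Problem slide, when p g^n(x^n) g^n(y^n) ≥ (1 − p) g^n(x^n, y^n). All names are illustrative.

```python
import math
import random
from collections import Counter


def log_kt(counts, m):
    """log Q^n for the Krichevsky-Trofimov mixture over m cells, given the nonzero counts."""
    counts = list(counts)
    val = math.lgamma(m / 2) - math.lgamma(sum(counts) + m / 2)
    for c in counts:
        val += math.lgamma(c + 0.5) - math.lgamma(0.5)
    return val


def log_sum_exp(terms):
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def log_g_1d(xs, levels):
    """log sum_j omega_j Q_j^n(a^n) / prod_i lambda(a_i), dyadic cells of width 2^-j."""
    terms = []
    for j in range(levels):
        counts = Counter(math.floor(x * 2 ** j) for x in xs).values()
        terms.append(-(j + 1) * math.log(2) + log_kt(counts, 2 ** j)
                     + len(xs) * j * math.log(2))
    return log_sum_exp(terms)


def log_g_2d(xs, ys, levels):
    """log sum_{j,k} omega_j omega_k Q_{j,k}^n / prod_i lambda(a_i) lambda(b_i) on A_j x B_k."""
    terms = []
    for j in range(levels):
        for k in range(levels):
            cells = Counter((math.floor(x * 2 ** j), math.floor(y * 2 ** k))
                            for x, y in zip(xs, ys))
            terms.append(-(j + k + 2) * math.log(2) + log_kt(cells.values(), 2 ** (j + k))
                         + len(xs) * (j + k) * math.log(2))
    return log_sum_exp(terms)


def independent(xs, ys, p=0.5, levels=8):
    """Declare X independent of Y iff p g^n(x^n) g^n(y^n) >= (1 - p) g^n(x^n, y^n)."""
    lhs = math.log(p) + log_g_1d(xs, levels) + log_g_1d(ys, levels)
    rhs = math.log(1 - p) + log_g_2d(xs, ys, levels)
    return lhs >= rhs


rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [rng.random() for _ in range(200)]
print(independent(xs, xs))  # Y = X: strongly dependent
print(independent(xs, ys))  # independently drawn samples
```

The (0, 0) terms of both sides coincide (a single cell), so the decision is driven by whether finer joint cells explain the pairs better than the product of the marginal histograms.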
Generalization
Further generalization
Proposition 1 assumes
1 a specific histogram sequence $\{B_k\}$, and
2 that µ satisfies $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$.
Hence $\{B_k\}$ should be universal: construct $\{B_k\}$ such that $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$ for any µ.
Universal Histogram Sequence
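Universal histogram sequence $\{B_k\}$
$\mu, \sigma \in \mathbb{R}$, $\sigma > 0$ (here µ denotes a real number, not the measure)
$\{C_k\}_{k=0}^\infty$:
$C_0 = \{(-\infty, \infty)\}$
$C_1 = \{(-\infty, \mu], (\mu, \infty)\}$
$C_2 = \{(-\infty, \mu - \sigma], (\mu - \sigma, \mu], (\mu, \mu + \sigma], (\mu + \sigma, \infty)\}$
...
$C_k \to C_{k+1}$ ($k \ge 1$):
$(-\infty, \mu - (k-1)\sigma] \to (-\infty, \mu - k\sigma],\ (\mu - k\sigma, \mu - (k-1)\sigma]$
$(a, b] \to (a, \frac{a+b}{2}],\ (\frac{a+b}{2}, b]$
$(\mu + (k-1)\sigma, \infty) \to (\mu + (k-1)\sigma, \mu + k\sigma],\ (\mu + k\sigma, \infty)$
B: the set in which Y takes values
$$B_k := \{B \cap c \mid c \in C_k\} \setminus \{\emptyset\}$$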
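The refinement rule can be sketched directly. Cells are stored as pairs (a, b) meaning (a, b], with ±math.inf for the tails; the function names are hypothetical, and B = N is replaced by the finite window {1, ..., 10} only so that the blocks can be printed. With µ = 1 and σ = 1 the output reproduces B_0, ..., B_4 of the B = N case (the last block being the truncated tail {k+1, k+2, ...}).

```python
import math


def refine(cells, k, mu, sigma):
    """Apply the C_k -> C_{k+1} rule (k >= 1); a cell (a, b) stands for (a, b]."""
    out = []
    for a, b in cells:
        if a == -math.inf:            # left tail (-inf, mu-(k-1)sigma]
            out += [(-math.inf, mu - k * sigma), (mu - k * sigma, b)]
        elif b == math.inf:           # right tail (mu+(k-1)sigma, inf)
            out += [(a, mu + k * sigma), (mu + k * sigma, math.inf)]
        else:                         # bounded cell: halve it
            mid = (a + b) / 2
            out += [(a, mid), (mid, b)]
    return out


def sequence(K, mu, sigma):
    """Return [C_0, C_1, ..., C_K]."""
    seq = [[(-math.inf, math.inf)]]
    if K >= 1:
        seq.append([(-math.inf, mu), (mu, math.inf)])
    for k in range(1, K):
        seq.append(refine(seq[-1], k, mu, sigma))
    return seq


def restrict(cells, points):
    """B_k = {B ∩ c : c in C_k} without the empty set, for a finite set of points B."""
    blocks = [[y for y in points if a < y <= b] for a, b in cells]
    return [blk for blk in blocks if blk]


# B = {1, ..., 10} as a finite stand-in for N, with mu = 1, sigma = 1
for k, cells in enumerate(sequence(4, 1.0, 1.0)):
    print(k, restrict(cells, range(1, 11)))
```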
Universal Histogram Sequence
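$B = \mathbb{R}$ and $\mu \ll \lambda$
$\{B_k\} = \{C_k\}$
For each $y \in B$, there exist $K \in \mathbb{N}$ and a unique sequence $\{(a_k, b_k]\}_{k=K}^\infty$ such that
$y \in (a_k, b_k] \in B_k$, $k = K, K+1, \dots$, and $|a_k - b_k| \to 0$ as $k \to \infty$
$F_Y$: the distribution function of Y
$$f_k(y) = \frac{P(Y \in (a_k, b_k])}{\lambda((a_k, b_k])} = \frac{F_Y(b_k) - F_Y(a_k)}{b_k - a_k} \to f(y)$$
for λ-almost every $y \in B$, and
$h(f_k) \to h(f)$ as $k \to \infty$ for any $f$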
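For B = R, the cell of C_k containing y can be located without building all of C_k: the unit interval of the form (µ + iσ, µ + (i + 1)σ] that contains y first appears in C_{m+1} (m being its distance index from µ) and is halved once per step afterwards. The sketch below uses this to evaluate f_k(y) = (F_Y(b_k) − F_Y(a_k))/(b_k − a_k). A standard normal Y and the histogram parameters µ = 0, σ = 1 are assumed examples, the names are hypothetical, and f_k is set to 0 on an unbounded tail cell (infinite Lebesgue measure). The values should approach f(0.3) ≈ 0.3814.

```python
import math


def cell_containing(y, k, mu=0.0, sigma=1.0):
    """Return (a, b) such that y lies in the cell (a, b] of C_k."""
    if k == 0:
        return (-math.inf, math.inf)
    if y > mu:
        m = math.ceil((y - mu) / sigma)        # y in (mu+(m-1)sigma, mu+m*sigma]
        if m >= k:
            return (mu + (k - 1) * sigma, math.inf)   # still in the right tail
        left = mu + (m - 1) * sigma
    else:
        m = math.floor((mu - y) / sigma) + 1   # y in (mu-m*sigma, mu-(m-1)sigma]
        if m >= k:
            return (-math.inf, mu - (k - 1) * sigma)  # still in the left tail
        left = mu - m * sigma
    # this unit cell first appears in C_{m+1} and has been halved k-m-1 times since
    w = sigma / 2 ** (k - m - 1)
    i = math.ceil((y - left) / w) - 1
    return (left + i * w, left + (i + 1) * w)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def f_k(y, k, cdf=normal_cdf, mu=0.0, sigma=1.0):
    """f_k(y) = (F_Y(b_k) - F_Y(a_k)) / (b_k - a_k); 0 on an unbounded tail cell."""
    a, b = cell_containing(y, k, mu, sigma)
    if math.isinf(a) or math.isinf(b):
        return 0.0
    return (cdf(b) - cdf(a)) / (b - a)


for k in (1, 2, 5, 10, 25):
    print(k, f_k(0.3, k), normal_pdf(0.3))
```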
Universal Histogram Sequence
$B = \mathbb{N}$ and $\mu \ll \eta$
$B_0 = \{B\}$
$B_1 := \{\{1\}, \{2, 3, \dots\}\}$
$B_2 := \{\{1\}, \{2\}, \{3, 4, \dots\}\}$
...
$B_k := \{\{1\}, \{2\}, \dots, \{k\}, \{k+1, k+2, \dots\}\}$
...
can be obtained with µ = 1, σ = 1.
For each $y \in B$, there exist $K \in \mathbb{N}$ and a unique sequence $\{D_k\}_{k=1}^\infty$ such that
$y \in D_k \in B_k$, $k = 1, 2, \dots$, and $\{y\} = D_k \in B_k$, $k = K, K+1, \dots$
$$f_k(y) = \frac{P(Y \in D_k)}{\eta(D_k)} \to f(y) = \frac{P(Y = y)}{\eta(\{y\})}, \quad y \in B$$
$h(f_k) \to h(f)$ as $k \to \infty$ for any $f$
Universal Histogram Sequence
Result 2
Theorem 1
If $\mu \ll \eta$, then there exists $\nu \ll \eta$ such that $\nu^n(\mathbb{R}^n) \le 1$ and, for any µ,
$$\frac{1}{n}\log\frac{d\mu^n}{d\nu^n}(y^n) \to 0$$
The proof is based on the following observation (Billingsley, Probability and Measure, Problem 32.13):
$$\lim_{h \to 0}\frac{\mu((x-h, x+h])}{\eta((x-h, x+h])} = f(x), \quad x \in \mathbb{R}$$
which removes the condition Ryabko posed: "for any µ such that $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$".
Conclusion
Summary and Discussion
Universal Bayesian Measure
the random variables may be discrete, continuous, or neither
a universal histogram sequence removes Ryabko's condition
Many Applications
Bayesian network structure estimation (DCC 2012)
The Bayesian Chow-Liu Algorithm (PGM 2012)
Markov order estimation even when $\{X_i\}$ is continuous
Extending MDL:
$g^n(y^n|m)$: the universal Bayesian measure w.r.t. model $m$ given $y^n \in B^n$
$p_m$: the prior probability of model $m$
choose the model $m$ that minimizes $-\log g^n(y^n|m) - \log p_m$
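A minimal sketch of this criterion in the discrete case, selecting a Markov order for a finite-alphabet sequence. As assumptions for illustration, g^n(y^n | order) is a KT mixture per length-`order` context (the first `order` symbols coded uniformly) and p_m is uniform over orders 0 to 3; for continuous {X_i}, the KT factors would be replaced by histogram-based mixtures such as the g^n of the Generalization slides. All names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def log_kt_counts(counts, alphabet_size):
    """log Q^n for the KT mixture, given the nonzero symbol counts."""
    counts = list(counts)
    val = math.lgamma(alphabet_size / 2) - math.lgamma(sum(counts) + alphabet_size / 2)
    for c in counts:
        val += math.lgamma(c + 0.5) - math.lgamma(0.5)
    return val


def log_g_markov(xs, order, alphabet_size):
    """log g^n(x^n | order): one KT mixture per context; first `order` symbols uniform."""
    val = -order * math.log(alphabet_size)
    by_context = defaultdict(Counter)
    for i in range(order, len(xs)):
        by_context[tuple(xs[i - order:i])][xs[i]] += 1
    for counts in by_context.values():
        val += log_kt_counts(counts.values(), alphabet_size)
    return val


def select_order(xs, alphabet_size, max_order=3):
    """argmin over orders of -log g^n(x^n | order) - log p_order, with uniform p_order."""
    log_prior = -math.log(max_order + 1)
    scores = {k: -log_g_markov(xs, k, alphabet_size) - log_prior
              for k in range(max_order + 1)}
    return min(scores, key=scores.get)


print(select_order([0, 1] * 50, alphabet_size=2))        # alternating sequence
print(select_order([0, 0, 1, 1] * 25, alphabet_size=2))  # period-4 sequence
```

With a uniform prior the −log p_m term is the same for every order, so the choice is driven by −log g^n alone; a non-uniform p_m would add an explicit complexity penalty.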
