9 July 2008
Source Coding and Simulation
Robert M. Gray
Information Systems Laboratory
Department of Electrical Engineering
Stanford University, Stanford, CA 94305
rmgray@[Link]
[Link] gray
Historical and recent research described here was supported in part by
Source coding and simulation
Source coding/compression/quantization:

  source {Xn} → encoder → bits → decoder → reproduction {X̂n}
Simulation/synthesis/fake process:

  random bits → coder → simulation {X̃n}
Source
X = {Xn; n ∈ Z} stationary and ergodic random process, distribution µ
Xn ∈ AX = alphabet: discrete, continuous, or mixed
random vectors X^N = (X0, X1, . . . , XN−1), distribution µN
Shannon entropy

  H(X^N) = H(µN) = −Σ_{x^N} µN(x^N) log µN(x^N)  if AX discrete,
                   ∞                              otherwise
Shannon entropy (rate)  H(X) = H(µ) = inf_N H(X^N)/N = lim_{N→∞} H(X^N)/N
& other information measures
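The entropy rate is easy to estimate empirically. A minimal Python sketch, of my own construction (the binary Markov source, function names, and parameters are illustrative assumptions, not from the talk), showing H(X^N)/N settling toward H(X):

# Minimal sketch (illustrative, not from the talk): estimate H(X^N)/N
# for a binary Markov source; H(X^N)/N is nonincreasing in N and tends
# to the entropy rate H(X).
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def markov_chain(n, p01=0.1, p10=0.3):
    """Binary Markov chain with P(1|0) = p01 and P(0|1) = p10."""
    x = np.empty(n, dtype=int)
    x[0] = 0
    for i in range(1, n):
        u = rng.random()
        x[i] = (u < p01) if x[i - 1] == 0 else (u >= p10)
    return x

def block_entropy(x, N):
    """Empirical H(X^N) in bits from nonoverlapping length-N blocks."""
    blocks = [tuple(x[i:i + N]) for i in range(0, len(x) - N + 1, N)]
    counts = np.array(list(Counter(blocks).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

x = markov_chain(100_000)
for N in (1, 2, 4, 8):
    print(N, block_entropy(x, N) / N)   # roughly decreasing toward H(X)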
Source coding with a fidelity criterion
[Shannon (1959)]
Communicate a source {Xn} to a user through a bit pipe
  source {Xn} → encoder → bits → decoder → reproduction {X̂n}
What is the best tradeoff between the rate in bits per source sample
and the quality of the reproduction with respect to the input?
Shannon rate-distortion theory, source coding with a fidelity
criterion, lossy data compression, quantization
The simulation problem (1977)
Simulate (synthesize, imitate, model, fake) a source {Xn}
  random bits → coder → simulation {X̃n}
What is the best simulation of the source given
• a simple random bit generator, e.g., coin flips (iid),
• a stationary (time-invariant) coder, and
• a constraint on # of bits (possibly infinite) per simulated symbol?
Would like a simulated process to
• have key properties of the original process: stationarity, ergodicity,
mixing, 0-1 law (purely nondeterministic, K-process)
• “resemble” original as closely as possible
• be perfect if bitrate sufficient. I.e., same distributions as original
source. What stationary ergodic processes have exactly this form?
(Not all do!) (modeling, taxonomy of random processes)
An alternative notion of simulation was introduced by Steinberg and
Verdú (1996) and related to source coding. It does not require
stationarity and ergodicity (or preservation of such properties).
An information-theoretic “folk theorem”
If a source code is nearly optimal,
then its bits ≈ iid fair coin flips
  source {Xn} → encoder → bits → decoder → reproduction {X̂n}
Bits are maximally informative, maximum entropy.
True??
Coin flips provide simple input mechanism for simulation.
Suggests connection between source coding and simulation:
Source coding/compression
Source coding/compression:

  source {Xn} → encoder → bits → decoder → reproduction {X̂n}
                                     ↕ ?
Simulation/synthesis/fake process:

  random bits → coder → simulation {X̃n}
Does nearly optimal performance ⇒ “nearly” iid bits?
Are source decoders and source simulators equivalent?
Are source coding and simulation equivalent?
Coding
Two basic coding structures for coding a process X with alphabet
AX into Y with alphabet AY :
Block coding (BC) Map each nonoverlapping block of source
symbols into an index or block of encoded symbols (e.g., bits)
(standard for IT)
Sliding-block coding (SBC) Map overlapping blocks of source
symbols into single encoded symbol (e.g., bit)
(standard for ergodic theory)
There are constructions in IT and ergodic theory to get BC from
SBC & vice-versa.
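The two structures are easy to contrast in code. A minimal Python sketch with toy maps of my own choosing (neither E nor f is an optimal or standard code; they only show the mechanics):

# Sketch of the two coding structures. E and f below are toy maps,
# chosen only to show block vs. sliding-window mechanics.
import numpy as np

def block_code(x, E, N):
    """BC: apply E to each nonoverlapping length-N block of x."""
    return np.concatenate([E(x[i:i + N]) for i in range(0, len(x) - N + 1, N)])

def sliding_block_code(x, f, N1, N2):
    """SBC: Y_n = f(X_{n-N1}, ..., X_n, ..., X_{n+N2}), window sliding by 1."""
    return np.array([f(x[n - N1:n + N2 + 1])
                     for n in range(N1, len(x) - N2)])

x = np.random.default_rng(1).integers(0, 2, 32)
y_bc = block_code(x, lambda b: b[::-1], N=4)               # reverse each block
y_sbc = sliding_block_code(x, lambda w: int(w.sum()) % 2,  # window parity
                           N1=1, N2=1)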
Block Coding  E: A_X^N → A_Y^N (or other index set), N = block length

  · · · , (X−N, . . . , X−1), (X0, . . . , XN−1), (XN, . . . , X2N−1), · · ·
               ↓ E                 ↓ E                ↓ E
  · · · , (Y−N, . . . , Y−1), (Y0, . . . , YN−1), (YN, . . . , Y2N−1), · · ·

Sliding-block Coding  N = window length = N1 + N2 + 1,  f: A_X^N → AY

  · · · , Xn−N1, Xn−N1+1, · · · , Xn, Xn+1, · · · , Xn+N2, Xn+N2+1, · · ·
  (the length-N window slides to the right)

  Yn   = f(Xn−N1, . . . , Xn, . . . , Xn+N2)
  Yn+1 = f(Xn−N1+1, . . . , Xn+1, . . . , Xn+N2+1)
Block coding

+ Far more is known about design: e.g., transform codes, vector
quantization, clustering

− Does not preserve key properties (stationarity, ergodicity, mixing,
0-1 law). In general the output is neither stationary nor ergodic (it is
N-stationary and can have a periodic structure, not necessarily
N-ergodic). Can “stationarize” with a uniform random start, but this
retains possible periodicities. Not equivalent to an SBC of the input.

− Not defined for infinite block length; no limiting codes.
Sliding-block (stationary, sliding-window) coding
• preserves key properties of input process: stationarity, ergodicity,
mixing, 0-1 law
• well-defined for N = ∞. Infinite codes can be approximated by finite
codes, and a sequence of finite codes can converge.
• models many communication and signal processing techniques:
time-invariant convolutional codes, predictive quantization, nonlinear
and linear time-invariant filtering, wavelet coefficient evaluation
• used to prove fundamental results in ergodic theory, e.g., the
Kolmogorov-Sinai-Ornstein isomorphism theorem:
Sliding-block coding and isomorphism
A sliding-block code (SBC) has the form

  {Xn} → [ Xn+N2 · · · Xn · · · Xn−N1 ]  (shift register)
                     ↓ f
          Yn = f(Xn−N1, . . . , Xn, . . . , Xn+N2)
Infinite N1, N2 are allowed. Two processes are isomorphic if there
exists an invertible SBC from either process to the other.
A process is a B-process if it is an SBC of an iid process.
Ornstein proved (1970) that two B-processes are isomorphic iff
their entropy rates are equal.
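Sampling a B-process is then immediate: pass iid coin flips through a finite sliding-block code. A Python sketch of my own, with an arbitrary illustrative window function:

# Sketch: a B-process as a finite sliding-block code of iid fair bits.
# The window map f here is arbitrary, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(2)

def b_process(n, f, N1, N2):
    """X~_k = f(Z_{k-N1}, ..., Z_{k+N2}) with {Z_k} iid fair coin flips."""
    z = rng.integers(0, 2, n + N1 + N2)
    return np.array([f(z[k:k + N1 + N2 + 1]) for k in range(n)])

# A 3-bit window read as a number in [0, 1]: a stationary, mixing process.
xt = b_process(10_000, lambda w: (4 * w[0] + 2 * w[1] + w[2]) / 7.0,
               N1=1, N2=1)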
Source Coding: Block Coding
Distortion measure  dN(x^N, y^N) = (1/N) Σ_{i=0}^{N−1} d1(xi, yi)

Codebook/Decoder  CN = {DN(i); i ∈ I},  |I| = M

Encoder  EN: A_X^N → I

Distortion  D(EN, DN) = E dN(X^N, DN(EN(X^N)))

Rate  R(EN) = (1/N) log M  (fixed-rate),  or  N^{−1} H(EN(X^N))  (variable-rate)
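A concrete instance of these definitions in Python, with an arbitrary codebook, a nearest-neighbor encoder, and squared-error d1 (all illustrative choices of mine, not an optimal design):

# Sketch: empirical distortion D(E_N, D_N) and fixed rate of a block code.
# Codebook and source are illustrative; the encoder is nearest-neighbor.
import numpy as np

rng = np.random.default_rng(3)
N, M = 2, 4                               # block length, codebook size
codebook = rng.normal(size=(M, N))        # decoder outputs D_N(i), i in I

blocks = rng.normal(size=(5000, N))       # blocks X^N from an iid source
# E_N: index of the nearest codeword in squared error
idx = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(-1).argmin(1)
D = ((blocks - codebook[idx]) ** 2).sum(1).mean() / N  # E d_N(X^N, D_N(E_N(X^N)))
R = np.log2(M) / N                                     # fixed rate (1/N) log M
print(D, R)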
Optimal performance? Operational distortion-rate function (DRF):

  δBC^(N)(R) = inf_{EN,DN: R(EN) ≤ R} D(EN, DN)

  δBC(R) = inf_N δBC^(N)(R) = lim_{N→∞} δBC^(N)(R)

Not computable directly. Evaluate via the Shannon DRF:

  DX(R) = inf_N DN(R) = lim_{N→∞} DN(R)

  DN(R) = inf_{pN: pN ⇒ µN, N^{−1} I(X^N; Y^N) ≤ R} E dN(X^N, Y^N)

(pN ⇒ µN: joint distributions pN for (X^N, Y^N) whose X-marginal is µN)
Block Source Coding Theorem: For a stationary and
ergodic source∗, δBC(R) = DX (R)
*With the usual technical conditions.
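The Shannon DRF is computable in closed form for simple sources. As a sanity check on the definitions, a Python sketch for a Bernoulli(p) source with Hamming distortion, where R(D) = h(p) − h(D) for 0 ≤ D ≤ min(p, 1 − p) (a standard textbook result, not from the talk; the code inverts it numerically):

# Sketch: D_X(R) for a Bernoulli(p) source with Hamming distortion, found
# by inverting R(D) = h(p) - h(D) with h the binary entropy in bits.
import numpy as np
from scipy.optimize import brentq

def h(p):
    """Binary entropy function (bits)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def bernoulli_drf(R, p=0.4):
    """Shannon DRF D_X(R) for Bernoulli(p), Hamming distortion."""
    if R >= h(p):
        return 0.0
    # h is increasing on [0, 1/2]: solve h(D) = h(p) - R for D
    return brentq(lambda D: h(D) - (h(p) - R), 0.0, min(p, 1 - p))

print(bernoulli_drf(0.5))   # distortion at rate 1/2 bit per source symbol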
Source Coding: Sliding-Block Coding
Encoder  fN: A_X^N → AU,  Un = fN(Xn−N1, . . . , Xn+N2)

Decoder  gK: A_U^K → ÂX,  X̂n = gK(Un−K1, . . . , Un+K2)

Distortion  D(f, g) = E d1(X0, X̂0),  Rate  R(f) = log |AU|

Optimal performance:

  δSBC^(N,K)(R) = inf_{fN,gK: R(f) ≤ R} D(fN, gK)

  δSBC(R) = inf_{N,K} δSBC^(N,K)(R) = inf_{f,g: R(f) ≤ R} D(f, g)
Sliding-Block Source Coding Theorem: For a stationary
and ergodic source∗, δBC(R) = δSBC(R) = DX(R)
*ditto
Block coding:

  (X0, . . . , XN−1)  (XN, . . . , X2N−1)  (X2N, . . . , X3N−1)  · · ·
         ↓ EN                ↓ EN                 ↓ EN
  (U0, . . . , UN−1)  (UN, . . . , U2N−1)  (U2N, . . . , U3N−1)  · · ·
         ↓ DN                ↓ DN                 ↓ DN
  (X̂0, . . . , X̂N−1)  (X̂N, . . . , X̂2N−1)  (X̂2N, . . . , X̂3N−1)  · · ·

vs. sliding-block coding:

  · · · , Xn−N1−1, [Xn−N1, . . . , Xn, . . . , Xn+N2], Xn+N2+1, · · ·
                            ↓ f
  · · · , Un−K1−1, [Un−K1, . . . , Un, . . . , Un+K2], Un+K2+1, · · ·
                            ↓ g
                · · · , X̂n−1, X̂n, X̂n+1, · · ·
If coding nearly optimal, is Un nearly iid?
Process Distance Measures
How do we quantify “nearly iid”?
Related: how do we quantify the “best” simulation?
One approach: process distortion measures
Useful example in information theory and ergodic theory:
d̄-distance: Kantorovich/Vasershtein/Ornstein distance
Basic ideas:
Two stationary random processes, X with distribution µ, Y with
distribution ν. Vector distortion dN .
  d̄N(µN, νN) = inf_{pN ⇒ µN,νN} E_{pN} dN(X^N, Y^N)

  d̄(µ, ν) = sup_N d̄N(µN, νN) = inf_{p ⇒ µ,ν} E_p d1(X0, Y0)
The smallest achievable distortion between two processes with the given
marginals, over all joint distributions consistent with those marginals.
Many equivalent definitions exist: e.g., how much one must change a
typical sequence of one source to get a typical sequence of the other.
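For scalar marginals and d1(x, y) = |x − y| this infimum is the classical Kantorovich/Wasserstein distance, which scipy exposes directly. A small numerical sanity check in Python (the Gaussian example is my own illustration):

# Sketch: the first-order distance with d1(x, y) = |x - y| is the scalar
# Kantorovich/Wasserstein distance between the two marginals.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 10_000)    # samples from the marginal of X0
y = rng.normal(0.5, 1.0, 10_000)    # samples from the marginal of Y0
# For equal-variance Gaussians the optimal coupling is the mean shift:
print(wasserstein_distance(x, y))   # ≈ 0.5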
Historical aside
d̄N rediscovered and renamed numerous times.
Kantorovich (1942): metrics on compact metric spaces. Often
called the Kantorovich or transportation metric. Inseparable from
development of linear programming.
Early focus on the scalar case and ℓr norms: Dall’Aglio (1956), Fréchet
(1956), Vasershtein/Wasserstein (1969), Mallows (1972), Vallender
(1973).
Ornstein (1970-73) used the idea with the Hamming distance on
vectors and processes. Called the d̄ distance. First appearance as
distance measure on processes.
Gray, Neuhoff, and Shields (1975) considered the vector and process
cases using additive distortion measures, including d1(x, y) = |x − y|^2,
calling the distortion ρ̄ after Ornstein. The vector case is equivalent to
the subsequent extension of Kantorovich to ℓr norms on vectors (the
Lr-minimal metric):
  ρ̄^{1/r}(µN, νN) ≜ ℓr(µN, νN) = [N d̄N(µN, νN)]^{1/r}
                               = inf_{pN ⇒ µN,νN} [E(‖X^N − Y^N‖_r^r)]^{1/r}
Usually reserve the notation d̄ for Ornstein’s (Hamming) version; use ρ̄ for ℓr^r.
Rediscovered as the “earth mover’s distance” in the CS literature and
used in clustering algorithms for pattern recognition. Later renamed
(1981) the “Mallows distance” after the 1972 rediscovery of the scalar
Kantorovich metric.
Properties
• The Ornstein d̄ distance and the Lr-minimal distance ρ̄^{1/r} are metrics.
• Infimum is actually a minimum.
• The class of all B-processes of a given alphabet is the closure under
Ornstein’s d̄ of all k-step mixing Markov processes of that alphabet.
• Entropy rate is continuous in d̄, Shannon DRF in ρ̄
• Can evaluate ρ̄ for iid processes, purely nondeterministic Gaussian
processes, and filtered uniform iid processes, and d̄ for discrete iid
processes. In general it is a linear programming problem (see the sketch
after this list).
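Concretely, for first-order discrete marginals the optimal coupling is a small linear program. A Python sketch with Hamming d1 and illustrative pmfs of my own choosing:

# Sketch: d-bar between two discrete first-order marginals as a linear
# program over couplings p(x, y) with the given marginals (Hamming d1,
# illustrative pmfs). The optimal value equals the variational distance.
import numpy as np
from scipy.optimize import linprog

mu = np.array([0.5, 0.3, 0.2])      # pmf of X0 (illustrative)
nu = np.array([0.2, 0.2, 0.6])      # pmf of Y0 (illustrative)
K = len(mu)
cost = (1.0 - np.eye(K)).ravel()    # Hamming: d1(x, y) = 1 if x != y

# Equality constraints on the flattened coupling p(i, j) = z[i*K + j]:
A_eq = np.zeros((2 * K, K * K))
for i in range(K):
    A_eq[i, i * K:(i + 1) * K] = 1.0   # row sums:    sum_j p(i, j) = mu[i]
    A_eq[K + i, i::K] = 1.0            # column sums: sum_x p(x, i) = nu[i]
b_eq = np.concatenate([mu, nu])

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.fun)   # 0.4 = (1/2) * sum |mu - nu| for these marginals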
Application 1: Geometric view of source coding
  δBC(R) = δSBC(R) = DX(R) = inf_{ν: H(ν) ≤ R} ρ̄(µ, ν)
[Gray, Neuhoff, and Omura (1974)]
A form of simulation, but cannot say ν generated from iid.
Distance to “closest” process in ρ̄ with entropy rate ≤ R
Compare with process version of Shannon DRF [Marton (1972)]:
  DX(R) = inf_{p: p ⇒ µ, I(X; Y) ≤ R} E[d1(X0, Y0)]
Application 2: Quantization as distribution approximation
[Pollard (1982), Graf and Luschgy (2000)]
(Vector) Quantizer ⇔ probability distribution on codebook
Block coding/quantization: fixed rate
  δBC^(N)(R) = inf_{νN} ρ̄N(µN, νN)

The minimum is over all discrete distributions νN with 2^{NR} atoms.

Suppose the discrete distribution (π, CN) = {πi, yi; i = 1, . . . , 2^{NR}},
with Σ_{i=1}^{2^{NR}} πi = 1 and yi ∈ A_X^N, solves the minimization ⇒
a discrete simulation of X^N ⇒ a block-independent, N-stationary
process simulation
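A Python sketch of the correspondence (my own construction, using scipy’s k-means as a stand-in quantizer design): cluster source blocks, read off the atoms and their frequencies, then draw blocks iid for the block-independent simulation:

# Sketch: a vector quantizer as a discrete distribution (pi, C_N). Cluster
# source blocks (k-means as a stand-in quantizer design), take centroids
# as atoms y_i with cluster frequencies pi_i, then draw blocks iid from
# (pi, C_N): a block-independent, N-stationary simulation of X^N.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(5)
N, R = 2, 2                                  # block length and rate: 2^{NR} atoms
blocks = rng.normal(size=(20_000, N))        # X^N blocks from an iid source

centers, labels = kmeans2(blocks, 2 ** (N * R), minit='++', seed=5)
pi = np.bincount(labels, minlength=len(centers)) / len(labels)

draws = rng.choice(len(centers), size=1000, p=pi)   # iid atoms
x_sim = centers[draws].reshape(-1)                  # concatenated blocks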
Application 3: Optimal simulation and source coding
A definition of optimal simulation of process X ∼ µ using an SBC of
an iid process Z [Gray (1977)]:
  ∆(X|Z) = inf_{µ̃: Zn → g → X̃n ∼ µ̃} ρ̄(µ, µ̃)

Sliding-block coding reduces entropy ⇒ H(Z) ≥ H(µ̃) ⇒

  ∆(X|Z) ≥ inf_{stationary ergodic µ̂: H(µ̂) ≤ H(Z)} ρ̄(µ, µ̂) = DX(H(Z))
If X is a B-process, the converse is true and

  ∆(X|Z) = DX(H(Z)) = δSBC(H(Z))
⇒ the source coding and simulation problems are
equivalent if the source is a B-process
Proof: Choose f, g with E d1(X0, X̂0) ≈ DX(1), and iid Zn with
H(Z) = 1 ≥ H(U).

  Xn → f → Un → g → X̂n ∼ µ̂
           ↑ α   (Sinai theorem: an SBC α maps Zn onto Un)
  Zn

The cascade β = gα is an SBC producing X̂ from Z, so E d1(X0, X̂0) ≥ ∆(X|Z).
Bit behavior for near optimal codes
Suppose we use a block code CN to code the source, and let π denote the
induced index pmf.
What can be said about π if code performance near Shannon
optimal?
Approximately uniform over the 2^N indexes, like N fair coin flips?
Sort of . . .
Shannon ⇒ there is an asymptotically optimal sequence of block codes
C(N) for which DN = E dN(X^N, X̂^N) ↓ DX(1)
RX(D) is a continuous function, hence

  1 = N^{−1} log2 2^N ≥ N^{−1} H(E(X^N)) ≥ N^{−1} H(X̂^N)
    ≥ N^{−1} I(X^N; X̂^N) ≥ RN(DN) ≥ RX(DN) → 1 as N → ∞
As the blocklength grows, the indexes have maximal per-symbol entropy
and hence can be thought of as approximately uniformly distributed. But
they are not stationary or ergodic, and we cannot get a process theorem:
this does not determine the entropy rate or show that the overall process
behaves like coin flips, even after stationarizing.
If we use SBCs, we can get a rigorous process version:
Choose f^(N), g^(N) so that DN = D(f^(N), g^(N)) ↓ DX(1).

Let U^(N), X̂^(N) denote the encoded and reproduction processes
(necessarily stationary and ergodic).

  1 ≥ H(U^(N)) ≥ H(X̂^(N)) ≥ I(X; X̂^(N)) ≥ R(DN) → 1 as N → ∞

  lim_{N→∞} H(U^(N)) = 1 ⇒ lim_{N→∞} d̄(U^(N), Z) = 0
Proof: Marton’s inequality for relative entropy and d̄ (T. Linder)
As the average distortion nears the Shannon limit for a stationary
ergodic source, the binary channel process approaches coin flips in d̄.
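The statement can be eyeballed numerically from the other direction. A Python sketch with a deliberately crude rate-1 SBC (a sign quantizer of my own choosing, far from optimal for a correlated source): its bits are unbiased but visibly correlated, and driving the distortion toward the Shannon limit would drive such statistics toward those of fair coin flips:

# Sketch: bits from a crude rate-1 sliding-block encoder of an AR(1)
# Gaussian source. The sign quantizer is far from Shannon-optimal, and
# its bit process is visibly correlated; near-optimal codes would push
# these statistics toward those of iid fair coin flips.
import numpy as np

rng = np.random.default_rng(6)
n, a = 100_000, 0.9
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):                    # AR(1) source, unit variance
    x[i] = a * x[i - 1] + np.sqrt(1 - a * a) * rng.normal()

u = (x > 0).astype(int)                  # encoder: U_n = 1{X_n > 0}
print("bias      :", u.mean() - 0.5)                    # ~ 0 by symmetry
print("lag-1 corr:", np.corrcoef(u[:-1], u[1:])[0, 1])  # ~ 0.71, not 0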
Recap
Old: If the source is a stationary filtering of an iid process (a B-process,
discrete or continuous alphabet), then the source coding problem
and the simulation problem have the same solution (and the optimal
simulator and decoder are equivalent).
New: If a stationary source code performs close to the Shannon
optimum, the encoded process is close to iid in d̄.
Frosting: An excuse to present ideas of modeling, coding,
and process distance measures common to ergodic theory and
information theory.
A few final thoughts and questions
• The d̄-close-to-iid property is nice for intuition, but does it actually
help?
E.g., B-processes (SBCs of iid processes) have many special properties.
Are there weak versions of those properties for processes that are SBCs
of a process d̄-close to iid?
• Does the equivalence of source coding and simulation hold for
the more general case of stationary and ergodic sources? The
Steinberg/Verdú results hold more generally, but in ergodic theory
it is known that there are stationary, ergodic, mixing, purely
nondeterministic processes which are not d̄-close to a B-process.
• Source coding as “almost isomorphism”: it avoids the hard part
(invertibility).
• How does fitting a model using ρ̄ compare to the Itakura-Saito
distortion used in speech processing to fit autoregressive speech
models to real speech? Can Marton/Talagrand inequalities be
extended? (Steinberg/Verdú considered relative entropy rates in their
simulation problem formulation.)
• Shortcoming of B-processes: in speech, they model only unvoiced
sounds well. Voiced sounds are better modeled by a periodic input to the
same filter type: 0-entropy. Composite models? Connections to Pinsker’s
(disproved) conjecture regarding products of K processes and 0-entropy
processes?
• Simulator design, e.g., best fake Gaussian from bits? (a toy sketch below)
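One toy answer to the last question, a Python sketch of my own (nothing here claims optimality): filter iid ±1 coin flips with a long FIR window, a finite SBC, so the CLT makes the marginals nearly Gaussian while the taps shape the covariance.

# Sketch: a "fake Gaussian" B-process from coin flips. An FIR filter of
# iid +/-1 bits is a finite sliding-block code; with many taps the output
# marginal is nearly Gaussian (CLT) and the taps set the autocovariance.
import numpy as np

rng = np.random.default_rng(7)
bits = 2 * rng.integers(0, 2, 100_000) - 1      # iid +/-1 coin flips
taps = np.exp(-0.05 * np.arange(128))           # illustrative filter choice
taps /= np.sqrt((taps ** 2).sum())              # normalize: unit output variance
x_fake = np.convolve(bits, taps, mode='valid')  # the simulated process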