The Syntactic Aspect of Information
For a set of N equally probable messages, the number of binary either-or decisions needed to single out one message is

H = ld(N)   (1)

[Figure 10a: decision tree for eight equally probable messages, with the codes 111, 110, 101, 100, 011, 010, 001, 000.]
I_k = -ld(p_k)   (2)

In accordance with the number of binary steps (figure 10b), Shannon and Weaver, and independently of them Norbert Wiener, have named I_k the information content of a message x_k with prior probability p_k.2,33
The quantification of information, for example as attempted in equation (2), contains as an essential starting point some prior knowledge on the part of the recipient that is characterized by the probability distribution P. This prior knowledge is naturally a subjective property of the recipient, and the probabilities in equation (2) therefore also have a subjective character.

Figure 10
Decision-tree for the choice of a single message out of a set of eight messages. (a) All messages are equally probable. Any individual message can be chosen by a series of three either-or decisions. (b) The messages A to H have different probabilities of occurrence: P(A) = P(B) = 1/4; P(C) = P(D) = 1/8; P(E) = P(F) = P(G) = P(H) = 1/16. The set is subdivided into groups containing messages of equal probability. To choose message A, only two binary decisions are needed. The same is true for message B. For each of messages C and D, three decisions are needed; for the remainder, four.
A step to the left is assigned the binary character "1" and a step to the right the character "0." The code that results thus allots the fewest binary characters (bits) to the most frequent messages. This principle is also employed by languages: frequently used words are on average shorter than rarely used ones.
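To make the counting of decisions in figure 10 concrete, the following minimal sketch (Python; the message labels and helper names are illustrative, not from the text) computes the number of binary decisions -ld(p_k) for the probabilities given in the caption of figure 10b, together with the resulting average code length.

```python
from math import log2

# Probabilities of the messages A..H as given in the caption of figure 10b
p = {"A": 1/4, "B": 1/4, "C": 1/8, "D": 1/8,
     "E": 1/16, "F": 1/16, "G": 1/16, "H": 1/16}

# Number of binary either-or decisions needed for each message: I_k = -ld(p_k)
decisions = {m: -log2(pm) for m, pm in p.items()}
print(decisions)   # A, B: 2; C, D: 3; E..H: 4

# Average number of binary characters per message (cf. equation 6 below)
average = sum(pm * decisions[m] for m, pm in p.items())
print(average)     # 2.75 bits
```

The most frequent messages A and B thus receive the shortest codes, exactly as the caption describes.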
From the definition of the information content (equation 2), three properties follow immediately:

1. A message that is certain to occur (p = 1) carries no information:
I = -ld(1) = 0   (3)

2. The information content of a message is the greater, the less probable the message is:
I(p_1) > I(p_2) for p_1 < p_2   (4)

3. The information contents of independent messages are additive:
I_12 = -ld(p_1·p_2) = I_1 + I_2   (5)
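Properties (3) to (5) can be checked numerically in a few lines; the function name I below is an ad-hoc choice for this sketch.

```python
from math import log2, isclose

def I(p):
    # Information content of a message with prior probability p (equation 2)
    return -log2(p)

assert I(1.0) == 0.0                           # (3): a certain message carries no information
assert I(1/8) > I(1/4)                         # (4): the less probable, the more informative
assert isclose(I(1/4 * 1/8), I(1/4) + I(1/8))  # (5): independent messages add
```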
For the special case in which p_1 = ... = p_N = 1/N, equation (2) reduces to the simpler equation (1).
If we posit the existence of N messages {x_1, ..., x_N} with prior probabilities {p_1, ..., p_N}, where Σ_i p_i = 1, the expectation value H of a single message is given by

H = Σ_k p_k·I_k = -Σ_k p_k·ld(p_k)   (6)
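A short sketch of equation (6), with an ad-hoc helper name, shows how the expectation value H is computed and how it reduces to equation (1) for equal probabilities.

```python
from math import log2

def entropy(probs):
    # H = -sum_k p_k * ld(p_k), equation (6); terms with p_k = 0 are omitted
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/8] * 8))                         # 3.0 = ld(8), equation (1)
print(entropy([1/4, 1/4, 1/8, 1/8] + [1/16] * 4)) # 2.75 bits, the source of figure 10b
```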
2. The entropy function H attains a maximum when all the probability values p_i are equal.
Much confusion has arisen in connection with the sign of the Shannon entropy. The quantity defined by equation (2) gives the novelty value of the message x_k, while the entropy in Shannon's sense is the expectation value of the novelty content of the message.
Figure 11
Entropy of a source of information. The set of messages consists of two independent messages x_1 and x_2 with respective prior probabilities p_1 and p_2 = 1 - p_1. The expectation value of the information content of a message is then, according to equation (6): H = -p_1·ld(p_1) - (1 - p_1)·ld(1 - p_1). In the case of even distribution, p_1 = p_2 = 1/2, the entropy H reaches its maximum; that is, the uncertainty in the choice of a message is greatest. For p_1 = 0 and p_2 = 1, H becomes equal to zero, since a state of probability p_2 = 1 has no alternative and no further information is needed to make a choice. The same applies for p_1 = 1, p_2 = 0.
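The behaviour described in the caption of figure 11 can be reproduced with a few lines (again an illustrative sketch, with an ad-hoc function name).

```python
from math import log2

def H2(p1):
    # Entropy of a two-message source with p2 = 1 - p1 (figure 11)
    return -sum(p * log2(p) for p in (p1, 1 - p1) if p > 0)

for p1 in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p1, round(H2(p1), 3))   # maximum of 1 bit at p1 = 0.5, zero at p1 = 0 and p1 = 1
```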
Thus, according to Shannon, information and entropy have the same sign, i.e.,

information = entropy

and not, as is often assumed,

information = negentropy.
Shannon's measure can be extended to describe the gain in information when an observation changes a probability distribution. Let P = {p_1, ..., p_N} be the prior distribution and Q = {q_1, ..., q_N} the distribution after the observation, both subject to the constraints

Σ_k p_k = 1 and Σ_k q_k = 1   (7)

The information gained by the observation is then

H(Q|P) = Σ_k q_k·[I(p_k) - I(q_k)] = Σ_k q_k·ld(q_k/p_k)   (8)

In contradistinction to the Shannon entropy, this quantity obeys the important inequality

H(Q|P) ≥ 0   (9)
in which the equality holds when and only when the distributions
Q and P are identical, i.e., the observation has not modified but
merely confirmed the distribution (see also chapter 5).
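A small numerical sketch of equations (8) and (9), with freely chosen example distributions and an ad-hoc helper name, shows the inequality and its equality case.

```python
from math import log2

def info_gain(q, p):
    # H(Q|P) = sum_k q_k * ld(q_k / p_k), equation (8); assumes q_k > 0 wherever used
    return sum(qk * log2(qk / pk) for qk, pk in zip(q, p) if qk > 0)

prior     = [1/4, 1/4, 1/4, 1/4]     # P: knowledge before the observation
posterior = [1/2, 1/4, 1/8, 1/8]     # Q: knowledge after the observation

print(info_gain(posterior, prior))   # 0.25 bits > 0: the observation was informative
print(info_gain(prior, prior))       # 0.0: Q = P, the equality case of equation (9)
```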
The concept of information thus introduced can be greatly generalized with the help of the mathematical concept of semiorder and, based on it, the theory of mixing character. The expressions of Shannon and Rényi then appear as special cases of the general theory.38
A characteristic of Shannon information theory is that it always refers to an ensemble of possible events and analyzes the uncertainty with which the occurrence of these events is associated. In recent years, algorithmic information theory has been developed as an alternative to this. Here, a measure for the information content of an individual object, for example a single symbol sequence, is given by the length of the shortest algorithm that suffices to generate it.
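The idea can be illustrated with a rough sketch: a general-purpose compressor serves here only as a crude upper bound on the length of the shortest generating program (which is not itself computable); the example sequences and the use of zlib are illustrative assumptions, not part of the text.

```python
import random
import zlib

# A highly regular sequence and an irregular one of the same length
regular = ("01" * 500).encode()
random.seed(0)
irregular = bytes(random.getrandbits(8) for _ in range(1000))

# Compressed length as a stand-in for the length of the shortest description
print(len(zlib.compress(regular)))    # small: a short rule generates the sequence
print(len(zlib.compress(irregular)))  # close to 1000: hardly any shorter description exists
```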