Arithmetic Coding in Parallel

Jan Šupol and Bořivoj Melichar

e-mail: {supolj,melichar}@fel.cvut.cz
1 Introduction
There is still a need for data coding. The growing demand for network communication
and the need to store data signals from space are only two examples. Many algorithms
have been developed for text compression.
One of them is arithmetic coding [Mo98, Wi87], which is more efficient than
the widely known Huffman algorithm [Hu52]. The latter rarely produces the best
variable-size code; arithmetic coding overcomes this problem. An arithmetic code
can be generated sequentially in O(n) time, and we present a well-scalable NC parallel
algorithm that generates the code in O(log n) time on the EREW PRAM with n/log n
processors. This yields O(n) total cost, so the algorithm is cost-optimal.
Despite the large number of papers on the parallel Huffman algorithm (the most
recent we know of [Lb99] is work-optimal), there are only a few papers on parallel
arithmetic coding. Most of them are based on quasi-arithmetic coding [Ho92]. We know
of only two exceptions. The first [Yo98] is based on an N-processor hypercube and is not
cost-optimal. The second [Ji94] focuses mainly on a hardware implementation;
its authors expected their tree-based parallel structure to run eight
times as fast as a sequential coder, which is still O(n) parallel time.
This paper is organized as follows. Section 2 describes the sequential
arithmetic coding algorithm. Section 3 presents some basic definitions. Section 4
describes the parallel prefix computation needed by our algorithm. Section 5 presents
our parallel arithmetic coding algorithm. Section 6 analyzes the time complexity of
our algorithm. Section 7 contains our conclusions. Note that this paper does not
cover the decoding process.
2 Sequential arithmetic coding

Consider the string "SWISS MISS" over the alphabet A = {S, W, I, M, ␣}. Each symbol
a_y ∈ A is assigned its frequency f_y, its probability (range) r_y, and an interval
[l_y, h_y) of cumulative probabilities:

a_y        f_y   r_y          l_y   h_y
S          5     5/10 = 0.5   0.5   1.0
W          1     1/10 = 0.1   0.4   0.5
I          2     2/10 = 0.2   0.2   0.4
M          1     1/10 = 0.1   0.1   0.2
␣ (space)  1     1/10 = 0.1   0.0   0.1
The string of symbols S = [s_0, s_1, ..., s_{n−1}] is encoded as follows. The first
character s_0 can be encoded by a number within the interval [l_y, h_y) associated
with the character y = s_0, y ∈ A. The notation [a, b) means the range of real numbers
from a to b, not including b. Let us call these two bounds LowRange and HighRange.
As more symbols are read and processed, LowRange and HighRange are updated
according to
LowRange_j = LowRange_{j−1} + (HighRange_{j−1} − LowRange_{j−1}) × l_j,
HighRange_j = LowRange_{j−1} + (HighRange_{j−1} − LowRange_{j−1}) × h_j,

where l_j and h_j denote the bounds l_y, h_y of the character y = s_j, with initial
values LowRange_{−1} = 0 and HighRange_{−1} = 1. Any number within the final interval
[LowRange_{n−1}, HighRange_{n−1}) represents the encoded string S.
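As an illustration, here is a minimal Python sketch of this sequential update (the
symbol table encodes the figures above; all names, such as encode_interval, are ours,
not the paper's):

# Symbol table for "SWISS MISS": symbol -> (l_y, h_y).
TABLE = {'S': (0.5, 1.0), 'W': (0.4, 0.5), 'I': (0.2, 0.4),
         'M': (0.1, 0.2), ' ': (0.0, 0.1)}

def encode_interval(text):
    # Return the final interval [LowRange, HighRange) encoding `text`.
    low, high = 0.0, 1.0             # LowRange_{-1} = 0, HighRange_{-1} = 1
    for ch in text:
        l, h = TABLE[ch]
        width = high - low           # width of the current interval
        high = low + width * h       # shrink to the sub-interval of ch
        low = low + width * l
    return low, high

print(encode_interval("SWISS"))      # -> approximately (0.7175, 0.72)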
3 Definitions
Our parallel algorithm is designed to run on the Parallel Random Access Machine
(PRAM), a very simple synchronous model of a SIMD computer [Le92, Qu94, Tv94].
The PRAM has many submodels of parallel machines that differ from each other
in the conditions of access to the shared memory. Our algorithm works on the
Exclusive Read Exclusive Write (EREW) PRAM model, which means that no two
processors can access the same cell of the shared memory at the same time.
We define sequential time SU(n) as the worst-case time of the best known sequential
algorithm, where n is the size of the input data. Parallel time T(n, p) is the time
elapsed from the beginning of a p-processor parallel algorithm solving a problem
instance of size n until the last (slowest) processor finishes its execution. The cost
of the algorithm is defined as C(n, p) = p × T(n, p).
Consider a synchronous p-processor algorithm A with τ = T(n, p) parallel steps.
Let p_i be the number of processors active (working) at step i ∈ {1, 2, ..., τ} of A.
Then the synchronous parallel work of A is

W(n, p) = p_1 + p_2 + · · · + p_τ.
It is obvious that
SU(n) ≤ W (n, p) ≤ C(n, p).
If SU(n) = W (n, p) then the algorithm is work optimal. If SU(n) = C(n, p) then
the algorithm is cost optimal.
The efficiency of a parallel algorithm is defined as

E(n, p) = SU(n) / C(n, p).
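For the algorithm presented in this paper, taking the bounds claimed in the
introduction, SU(n) = O(n) and C(n, p) = (n/log n) × O(log n) = O(n), hence

E(n, n/log n) = SU(n) / C(n, p) = O(n) / O(n) = Θ(1).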
Let E_0 be a constant such that 0 < E_0 < 1. Then the isoefficiency function ψ_1(p)
is the asymptotically minimal function such that

E(ψ_1(p), p) ≥ E_0.

Hence, ψ_1(p) gives asymptotically the lower bound on the size of a problem
instance that can be solved by p processors with efficiency at least E_0.
Scalability is the ability of an algorithm to adapt to a changing number of processors
or to a changing size of the input data. Good scalability means that if we want to use
more processors, we only have to increase the size of our problem a little. Fast growth
of the function ψ_1 indicates poor scalability.
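As a back-of-the-envelope illustration (our own estimate, not taken from the original
text): a prefix computation over n items on p processors takes T(n, p) = O(n/p + log p)
time, so C(n, p) = O(n + p log p) and E(n, p) = Θ(n / (n + p log p)). The efficiency
stays above a fixed constant E_0 as soon as n = Ω(p log p), so ψ_1(p) = Θ(p log p):
the problem size has to grow only slightly faster than the number of processors,
which is good scalability.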
We say that the class NC (Nick's class) is the set of algorithms that run in at most
polylogarithmic time with at most a polynomial number of processors. These algorithms
provide a high degree of parallelization.
4 Parallel prefix computation

Given an array of elements [x_0, x_1, ..., x_{n−1}] and an associative operation ∘,
the parallel prefix problem [La80] is to compute all prefixes x_0 ∘ x_1 ∘ ... ∘ x_i,
0 ≤ i < n. With the operation + we speak of the parallel prefix sum, with × of the
parallel prefix product. The computation takes ⌈log n⌉ parallel steps: in step j,
every element at index i ≥ 2^j combines its current value with the value at index
i − 2^j computed in the previous step. The figure below illustrates the parallel
prefix sum of the array [3, 2, 4, 7, 1, 5, 2].
[Figure: parallel prefix sum of [3, 2, 4, 7, 1, 5, 2]. Initially: [3, 2, 4, 7, 1, 5, 2].
After step j = 0 (offset 1): [3, 5, 6, 11, 8, 6, 7]. After step j = 1 (offset 2):
[3, 5, 9, 16, 14, 17, 15]. After step j = 2 (offset 4): [3, 5, 9, 16, 17, 22, 24].]
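The steps in the figure can be reproduced with a short Python sketch (our own
illustration; each parallel step is simulated as one pass over the array):

import math

def prefix_sums(values):
    # Simulate the parallel prefix sum; print the array after each step.
    a = list(values)
    n = len(a)
    for j in range(math.ceil(math.log2(n))):
        offset = 2 ** j                 # in step j, look back 2^j positions
        a = [a[i] + a[i - offset] if i >= offset else a[i] for i in range(n)]
        print(a)
    return a

prefix_sums([3, 2, 4, 7, 1, 5, 2])
# [3, 5, 6, 11, 8, 6, 7]
# [3, 5, 9, 16, 14, 17, 15]
# [3, 5, 9, 16, 17, 22, 24]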
5 Parallel arithmetic coding

5.1 Preliminaries
We suppose that we have an array Range = [range_0, range_1, ..., range_{n−1}] for our
algorithm. Each range_j is initialized with the probability r_y of the character
a_y ∈ A such that a_y = s_j, where s_j is the j-th character of the input string.
We also suppose that we have an array Low = [low_0, low_1, ..., low_{n−1}]; each low_j
is initialized with the value l_y such that a_y = s_j. Finally, we need one variable
High initialized with the value h_y such that a_y = s_{n−1}.
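For the word "SWISS" and the symbol table of Section 2, this initialization can be
sketched in Python (TABLE is the dictionary from the earlier sketch; the variable
names are ours):

text = "SWISS"
Range = [TABLE[c][1] - TABLE[c][0] for c in text]  # r values: [0.5, 0.1, 0.2, 0.5, 0.5]
Low   = [TABLE[c][0] for c in text]                # l values: [0.5, 0.4, 0.2, 0.5, 0.5]
High  = TABLE[text[-1]][1]                         # h of the last character: 1.0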
The change in our algorithm is that we first compute the cumulative lower and
higher bounds LR and HR, and then we simply sum these cumulative bounds to obtain
the final bounds LowRange and HighRange; for the first character the two coincide
(LR_0 = LowRange_0), because LowRange_{−1} = 0.
Let us see how the variables LR and HR are computed for the word "SWISS". We
write LR_0 for the LR value of the first character s_0 = "S", and l_x, h_x, r_x for
the lower bound, higher bound and probability of a character x ∈ A. LR_{−1} and HR_{−1}
are the initial cumulative bounds of the number that represents the encoded text S.
In arithmetic coding this number lies by convention in the interval [0, 1); therefore
LR_{−1} = LowRange_{−1} = 0 and HR_{−1} = HighRange_{−1} = 1.
LR_{−1} = 0
HR_{−1} = 1
LR_0 = (HR_{−1} − LR_{−1}) × l_s = 1.0 × 0.5 = 0.5
HR_0 = (HR_{−1} − LR_{−1}) × h_s = 1.0 × 1.0 = 1.0
LR_1 = (HR_0 − LR_0) × l_w = (h_s − l_s) × l_w = r_s × l_w = 0.5 × 0.4 = 0.2
HR_1 = (HR_0 − LR_0) × h_w = (h_s − l_s) × h_w = r_s × h_w = 0.5 × 0.5 = 0.25
LR_2 = (HR_1 − LR_1) × l_i = (r_s × h_w − r_s × l_w) × l_i = r_s × r_w × l_i = 0.5 × 0.1 × 0.2 = 0.01
HR_2 = (HR_1 − LR_1) × h_i = (r_s × h_w − r_s × l_w) × h_i = r_s × r_w × h_i = 0.5 × 0.1 × 0.4 = 0.02
LR_3 = (HR_2 − LR_2) × l_s = (r_s × r_w × h_i − r_s × r_w × l_i) × l_s = r_s × r_w × r_i × l_s = 0.005
HR_3 = (HR_2 − LR_2) × h_s = (r_s × r_w × h_i − r_s × r_w × l_i) × h_s = r_s × r_w × r_i × h_s = 0.01
...
So it is obvious that the lower bound LR_j and the higher bound HR_j of the j-th
character can be computed as

LR_j = (∏_{x=0}^{j−1} r_x) × l_j,  j > 0,

HR_j = (∏_{x=0}^{j−1} r_x) × h_j,  j > 0,

where l_j, h_j denote the bounds of the character s_j and r_x the probability of the
character s_x.
The cumulative products are computed by a parallel prefix product over the array
Range:

for i := 0, 1, ..., n − 1 do in parallel
    y_i := Range[i];
for j := 0, 1, ..., log n − 1 do sequentially
begin
    for i := 2^j, 2^j + 1, ..., n − 1 do in parallel
        y_i := y_i × Range[i − 2^j];
    for i := 2^j, 2^j + 1, ..., n − 1 do in parallel
        Range[i] := y_i;
end
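Simulated in Python for the word "SWISS" (our own sketch; each parallel step is
executed as one pass over the array):

import math

def prefix_products(values):
    # Simulate the parallel prefix product over Range, one step at a time.
    a = list(values)
    n = len(a)
    for j in range(math.ceil(math.log2(n))):
        offset = 2 ** j                 # in step j, look back 2^j positions
        a = [a[i] * a[i - offset] if i >= offset else a[i] for i in range(n)]
    return a

print(prefix_products([0.5, 0.1, 0.2, 0.5, 0.5]))
# -> approximately [0.5, 0.05, 0.01, 0.005, 0.0025]  (the last row of Table 4)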
Table 4 (parallel prefix product of the ranges for "SWISS"):

           S     W      I      S      S
initial    0.5   0.1    0.2    0.5    0.5
j = 0      0.5   0.05   0.02   0.1    0.25
j = 1      0.5   0.05   0.01   0.005  0.005
j = 2      0.5   0.05   0.01   0.005  0.0025
The cumulative higher bound is needed only for the last character:

HR_{n−1} = (∏_{x=0}^{n−2} r_x) × h_{n−1}.
Table 4 shows this computation in our example for the word “SWISS”. Note that
the results correspond to the cumulative bounds in our sequential example.
The cumulative bounds are then obtained from the prefix products as follows:

do sequentially
begin
    y_{n−1} := High;
    y_{n−1} := y_{n−1} × Range[n − 2];
    High := y_{n−1};
    y_0 := 1;
end
for i := 1, 2, ..., n − 1 do in parallel
    y_i := Range[i − 1];
for i := 0, 1, ..., n − 1 do in parallel
begin
    y_i := y_i × Low[i];
    Low[i] := y_i;
end
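In Python, this phase can be sketched as follows (cumulative_bounds is our own name;
prefix holds the prefix products computed above):

def cumulative_bounds(prefix, lows, high_last):
    # prefix[i] = r_0 x ... x r_i, lows[j] = l of s_j, high_last = h of s_{n-1}.
    # Returns the array of LR values and the single value HR_{n-1}.
    n = len(prefix)
    hr_last = high_last * prefix[n - 2]                        # HR_{n-1}
    lr = [lows[0]] + [prefix[i - 1] * lows[i] for i in range(1, n)]
    return lr, hr_last

lr, hr_last = cumulative_bounds([0.5, 0.05, 0.01, 0.005, 0.0025],
                                [0.5, 0.4, 0.2, 0.5, 0.5], 1.0)
print(lr, hr_last)   # -> approximately [0.5, 0.2, 0.01, 0.005, 0.0025] and 0.005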
Now we have computed the cumulative high and low ranges: the array Low
contains the LR values and the variable High contains the value HR_{n−1}. Next we
have to compute the prefix sums of the cumulative bounds LR to obtain the required
bounds HighRange and LowRange for the arithmetic compression of the string S.
L/H   S     W      I      S      S
LR    0.5   0.2    0.01   0.005  0.0025
HR    1     0.25   0.02   0.01   0.005
LowRange_j = (∑_{x=0}^{j−1} LR_x) + LR_j,

HighRange_j = (∑_{x=0}^{j−1} LR_x) + HR_j.
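For "SWISS" these formulas give, for example,

LowRange_4 = 0.5 + 0.2 + 0.01 + 0.005 + 0.0025 = 0.7175,
HighRange_4 = (0.5 + 0.2 + 0.01 + 0.005) + 0.005 = 0.72,

which is exactly the final interval [0.7175, 0.72) produced by the sequential update
of Section 2.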
To compute these sums we use the parallel prefix algorithm once more, this time the
parallel prefix sum of Section 4. Finally, after computing the prefix sums, the value
HighRange_{n−1} is obtained as

HighRange_{n−1} = (∑_{x=0}^{n−2} LR_x) + HR_{n−1}.

This algorithm is shown in the pseudocode below. Afterwards, the array Low contains
the values LowRange and the variable High contains the value HighRange_{n−1}. Our
example for the word "SWISS" is shown in Table 5.
for i := 0, 1, ..., n − 1 do in parallel
    y_i := Low[i];
for j := 0, 1, ..., log n − 1 do sequentially
begin
    for i := 2^j, 2^j + 1, ..., n − 1 do in parallel
        y_i := y_i + Low[i − 2^j];
    for i := 2^j, 2^j + 1, ..., n − 1 do in parallel
        Low[i] := y_i;
end
do sequentially
begin
    y_{n−1} := High;
    y_{n−1} := y_{n−1} + Low[n − 2];
    High := y_{n−1};
end
Table 5 (parallel prefix sum of the LR values for "SWISS"):

           S     W     I      S      S
initial    0.5   0.2   0.01   0.005  0.0025
j = 0      0.5   0.7   0.21   0.015  0.0075
j = 1      0.5   0.7   0.71   0.715  0.2175
j = 2      0.5   0.7   0.71   0.715  0.7175
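Putting the three phases together in the Python sketches above (TABLE,
prefix_products, cumulative_bounds and prefix_sums are the names we introduced
earlier; prefix_sums also prints the rows of Table 5 as a side effect):

def parallel_bounds(text):
    # Simulate the three phases of the parallel algorithm for `text`.
    n = len(text)
    rng  = [TABLE[c][1] - TABLE[c][0] for c in text]     # array Range
    lows = [TABLE[c][0] for c in text]                   # array Low
    prefix = prefix_products(rng)                        # phase 1: prefix product
    lr, hr_last = cumulative_bounds(prefix, lows,
                                    TABLE[text[-1]][1])  # phase 2: LR and HR_{n-1}
    low_range = prefix_sums(lr)                          # phase 3: prefix sum
    return low_range[n - 1], hr_last + low_range[n - 2]

print(parallel_bounds("SWISS"))
# -> approximately (0.7175, 0.72), matching encode_interval("SWISS") from Section 2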
6 Time complexity

Our algorithm consists of a constant number of parallel prefix computations together
with parallel steps that take constant time. A parallel prefix computation over n
items can be performed in O(log n) time on n/log n EREW PRAM processors [La80], so
the whole algorithm runs in O(log n) time with total cost C(n, p) = (n/log n) ×
O(log n) = O(n).

7 Conclusions
We have presented an NC parallel algorithm for arithmetic coding. It solves the
problem in O(log n) time using n/log n processors on the EREW PRAM. Our algorithm
has O(n) total cost and is therefore cost-optimal.
The preliminary phase is a weakness of our algorithm. However, if we were able
to construct a good adaptive parallel arithmetic coder based on our algorithm, this
weakness could be removed.
Another open question is how to construct a good parallel arithmetic decoding
algorithm.
References
[Ho92] Howard, Paul G., Jeffrey Scott Vitter (1992): Parallel Lossless Image Com-
pression Using Huffman and Arithmetic Coding. Proceedings of the IEEE
Data Compression Conference, 299-308.
[Hu52] Huffman, David A. (1952): A Method for the Construction of Minimum-
Redundancy Codes. Proceedings of the IRE, 40(9):1098-1101, September.
[Ji94] Jiang J., S. Jones (1994): Parallel design of arithmetic coding. IEE
Proceedings-Computers and Digital Techniques, 141(6):327-333, November.
[La80] Ladner, Richard E. and Michael J. Fischer (1980): Parallel Prefix Computation.
Journal of the ACM, 27(4):831-838, October.
[Lb99] Laber, Eduardo Sany, Ruy Luiz Milidiú and Artur Alves Pessoa (1999): A
Work Efficient Parallel Algorithm for Constructing Huffman Codes. Pro-
ceedings of the IEEE Data Compression Conference DCC'99.
[Mo98] Moffat, Alistair, Radford Neal, and Ian H. Witten (1998): Arithmetic Coding
Revisited. ACM Transactions on Information Systems, 16(3):256-294, July.
[Tv94] Casavant, T. L., P. Tvrdík and F. Plášil, editors (1994): Parallel Computers:
Architectures, Languages, and Algorithms. IEEE CS Press.
[Wi87] Witten, Ian H., Radford Neal and John G. Cleary (1987): Arithmetic Coding
for Data Compression. Communications of the ACM, 30(6):520-540.
[Yo98] Youssef A. (1998): Parallel Algorithms for Entropy Coding Techniques. Pro-
ceedings of European Parallel and Distributed Systems. ACTA Press.