Multi-size circular shifting networks for decoders of structured LDPC codes

M. Rovini, G. Gentile and L. Fanucci

The need for circularly shifting an array of data is a distinguishing feature of decoders for structured low-density parity-check (LDPC) codes, as a result of an efficient trade-off between performance and parallelisation of the elaborations, or throughput. Since the decoder must typically cope with blocks of data with different size, described is an efficient architecture of a reconfigurable multi-size circular shifting network, used to circularly shift an array with arbitrary size.

Introduction: Modern wireless communication standards, such as DVB-S2, IEEE 802.16e and the forthcoming IEEE 802.11n, rely on LDPC codes for forward error correction. Particularly, they define structured or architecture-aware (AA-) LDPC codes, the parity-check matrix of which is arranged into smaller sub-matrices that can be either null or a circular shift of the identity matrix. This aims at designing very efficient, semi-parallel (or vectorised) decoder architectures, where several functional units work in parallel to increase the achievable throughput. Accordingly, the decoder needs to circularly shift an array of $N$ messages (or simply, data), with $N$ the size of the sub-matrix, also referred to as block-size.

The well-known $N \times N$ Benes network [1] is capable of performing any sort of permutation over a vector of $N$ data, with $N$ a power-of-2. The network is made of $2\log_2 N - 1$ stages of $N$ 2-to-1 multiplexers (MUX2). However, when only circular shifts are required, an $N \times N$ Banyan network [2] is preferable since it only needs $\log_2 N$ stages of MUX2, thus saving $N(\log_2 N - 1)$ multiplexers.

To properly drive all MUX2 in the network, configuration bits must be provided for each rotation $r = 0, 1, \ldots, N - 1$; these number $\sum_{i=0}^{\log_2 N - 1} 2^i = N - 1$ for each rotation, or $N(N - 1)$ overall, and can be either stored in a dedicated look-up table (LUT) or computed on-the-fly.

As a drawback, Banyan (and Benes) networks are suitable only for $N$ a power-of-2; unfortunately, modern communication standards do not meet this requirement, and $N$ is typically set up for communication performance. Anyway, an $N' \times N'$ Banyan network could still be used, with $N' = 2^{\lceil \log_2 N \rceil}$ the smallest power-of-2 not smaller than $N$, but, as mentioned in [3], an additional multiplexing stage must appear at the network input or output.

Furthermore, IEEE 802.11n and 802.16e define codes with different codeword lengths and block sizes; for this reason, a multi-size circular shifting (MS-CS) network, circularly shifting over an arbitrary number of data, is mandatory.

Recently, a network architecture solving these same issues has been presented in [3]; this is based on the use of two Benes networks, where one unit actually shuffles data, while the other, called auxiliary, configures the first one on the fly. This Letter describes a reconfigurable architecture of an MS-CS network, where the support of multiple sizes is not granted by the Benes topology, but is achieved by rearranging the system in smaller sub-networks, working on blocks of data with fixed size.

Single-size circular shifting network: Given an array of data with size $N$, circular shifting can be efficiently achieved with a barrel shifter, which is composed of $n = \lceil \log_2 N \rceil$ stages of $N$ MUX2 each. Multiplexers on the $i$th stage of the shifter, with $0 \le i < n$, are connected in such a way as to circularly shift (rotate) the input vector by $2^i$ if the control bit is set; otherwise, the stage is transparent to the input. In this way, the arbitrary rotation $r = 0, 1, \ldots, N - 1$ is decomposed into $n$ consecutive rotations by powers-of-2. An example of a $5 \times 5$ barrel shifter is shown in Fig. 1.

Fig. 1 Architecture of 5 × 5 SS-CS network

The advantage of a barrel shifter w.r.t. the Banyan network is that not only does it require fewer multiplexers by removing the power-of-2 constraint on $N$, but it also does not need to store or pre-compute any configuration bit, since its stages are directly driven by the bits $r_i$ of the binary representation of $r$, i.e. $r = \langle r_{n-1}, \ldots, r_1, r_0 \rangle$.

Multi-size circular shifting (MS-CS) network: LDPC codes of IEEE 802.11n and 802.16e are defined for different block-sizes $N_k$, with $0 \le k < K$ and $K$ the number of admissible sizes. To formalise the problem, let $B = \gcd\{N_k\}_{0 \le k < K}$ be the greatest common divisor between all block-sizes; then, the architecture of the MS-CS network can be efficiently arranged into smaller $B \times B$ barrel shifters. More precisely, given the maximum block-size $N_{max}$, the MS-CS architecture is composed of $Z = N_{max}/B$ barrel shifters of size $B \times B$, plus an additional stage of $N_{max}$ $Z$-to-1 multiplexers (MUXZ), also referred to as adaptation network (AN). Each barrel shifter rotates a sub-block of $B$ data by the same value $|r|_B$, and depending on the current block-size $N_k$, data are further shuffled in the AN to achieve the arbitrary rotation $r = 0, 1, \ldots, N_k - 1$. This is done by properly re-ordering the output data of the barrel shifters with MUXZ. If we denote with $x_{z,b}$ the $b$th element output by the $z$th unit, with $b = 0, 1, \ldots, B - 1$ and $z = 0, 1, \ldots, Z - 1$, then the AN implements the following rule:

$$
\begin{cases}
y_{z,b} = x_{z',b} & z < Z_k,\ \forall b, \quad \text{with } z' = \left| z + \left\lfloor \dfrac{b + r}{B} \right\rfloor \right|_{Z_k} \\
y_{z,b} = x_{z,b} & z \ge Z_k,\ \forall b
\end{cases}
\qquad (1)
$$

$y_{z,b}$ being the output of the AN and of the whole MS-CS network, and $Z_k = N_k/B$ the number of barrel shifters actually used for the current block-size. The term $\lfloor (b + r)/B \rfloor$ in (1) can be pre-computed off-line and stored in a bidimensional LUT addressed with $b$ and $r$. Note that the outputs of the shifters with index $z \ge Z_k$, actually unused, are not shuffled; however, with minor modifications to (1), several arrays of data, and also with different sizes, could be rotated in parallel. (By modifying the shuffling rule of the AN for $z \ge Z_k$, the network can rotate several arrays in parallel; particularly, if all arrays have the same block-size, up to $\lfloor Z/Z_k \rfloor = \lfloor N_{max}/N_k \rfloor$ arrays can be processed in parallel.) Fig. 2 shows an example of an MS-CS network with $B = 5$ and $N_{max} = 20$.

Fig. 2 Architecture of MS-CS network for B = 5 and Nmax = 20

Implementation results: Table 1 compares the results of the logical synthesis of the proposed network, tailored to the communication standards mentioned above, with other state-of-the-art implementations. In particular, the analysis focuses on the networks of a
ELECTRONICS LETTERS 16th August 2007 Vol. 43 No. 17
scaled-down architecture of a decoder for DVB-S2 ($N = 45$), and the MS-CS networks required by decoders for 802.11n and 802.16e. For the sake of fair comparison, syntheses have been run with registered inputs and for (i) the same data width, (ii) the same CMOS processes, and (iii) the same clock frequency of [4–7]. Note that, because of the requirement on speed, a proper number of pipeline stages has been considered. The analysis shows that the proposed architecture outperforms similar implementations, the saving ranging from 30.4 to 67.2%.

Table 1: RTL complexity of shifting networks for DVB-S2, IEEE 802.11n and IEEE 802.16e compared with state-of-the-art implementations

Standard     DVB-S2    802.11n   802.16e
N            45        81        96

Ref.         [4]       barrel    [5]       MS-CS     [6]       MS-CS     [7]       MS-CS
Proc.        CMOS 90 nm          CMOS 65 nm          CMOS 0.13 µm        CMOS 0.13 µm
fclk         400 MHz             400 MHz             412 MHz             500 MHz
Bits         8                   6                   8                   6
Area (µm²)   50 000    25 532    32 500    22 612    289 000   94 690    740 000   277 600

Further, in detail, the MS-CS network for IEEE 802.11n features $B = 27$ and $N_{max} = 81$, and its AN is composed of 81 3-to-1 multiplexers, which only amount to 16.7% of the whole area. On the contrary, higher flexibility is required by IEEE 802.16e, where $B = 4$ and $N_{max} = 96$. Therefore, the AN features 96 24-to-1 multiplexers, which account for 80% of the whole complexity. Note that in this case our design would support $Z = 24$ different sizes, while only 19 values are specified by the standard (from 24 to 96 in steps of 4).

Conclusion: The proposed MS-CS network is very efficient in terms of complexity and its regular architecture makes the introduction of pipelining straightforward when very high clock frequencies are required. Furthermore, as mentioned above, multiple arrays of data with equal or different size can be easily processed in parallel by one single network.

When $B = 1$, the MS-CS network turns into a direct implementation made of $N_{max}$ $N_{max}$-to-1 multiplexers, which compares unfavourably with [3]; however, this extreme flexibility is far beyond the requirement of practical applications, and for higher $B$ the MS-CS complexity is smaller than the Benes network proposed in [3] and used in [7] for WiMAX.

Although customised for a decoder of LDPC codes, the described MS-CS architecture is of broader utility, and it can be used whenever a set of data with variable size needs to be rotated in parallel.

© The Institution of Engineering and Technology 2007
20 April 2007
Electronics Letters online no: 20071157
doi: 10.1049/el:20071157

M. Rovini, G. Gentile and L. Fanucci (Department of Information Engineering, University of Pisa, via G. Caruso, I-56100, Pisa, Italy)
E-mail: [email protected]

References

1 Benes, V.E.: ‘Optimal rearrangeable multistage connecting networks’, Bell Syst. Tech. J., 1964, 43, pp. 1641–1656
2 Olcer, S.: ‘Decoder architecture for array-code-based LDPC codes’. Global Telecommunications Conf., December 2003, San Francisco, CA, USA, Vol. 4, pp. 2046–2050
3 Tang, J., Bhatt, T., Sundaramurthy, V., and Parhi, K.K.: ‘Reconfigurable shuffle network design in LDPC decoder’. Int. Conf. on Application-specific Systems, Architectures and Processors, (ASAP), September 2006, Colorado, USA, pp. 81–86
4 Dielissen, J., Hekstra, A.P., and Berg, V.: ‘Low cost LDPC decoder for DVB-S2’. Design, Automation and Test in Europe, (DATE’06), March 2006, Munich, Germany, Vol. 2, pp. 6–10
5 Brack, T., Alles, M., Lehnigk-Emden, T., Kienle, F., Wehn, N., et al.: ‘Low complexity LDPC code decoders for next generation standards’. Design, Automation and Test in Europe, (DATE’07), April 2007, Nice, France
6 Karkooti, M., Radosavljevic, P., and Cavallaro, J.R.: ‘Configurable, high throughput, irregular LDPC decoder architecture: tradeoff analysis and implementation’. Int. Conf. on Application-specific Systems, Architectures and Processors, (ASAP’06), September 2006, Colorado, USA
7 Gunnam, K., Choi, G., Yeary, M.B., and Atiquzzaman, M.: ‘VLSI architectures for layered decoding for irregular LDPC codes of WiMax’. IEEE Int. Conf. on Communications, (ICC2007), June 2007, Glasgow, UK
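
As an illustration of the single-size barrel shifter described in the Letter, the following is a minimal behavioural sketch in Python, not the authors' RTL. It models each of the ceil(log2 N) stages as a conditional rotation by 2^i driven by bit r_i of the rotation amount. The function name `barrel_shift` and the rotation direction (output j takes input j + r, modulo N) are assumptions made for the example; the Letter does not fix a direction.

```python
def barrel_shift(data, r):
    """Rotate `data` by r positions using power-of-2 stages.

    Assumed direction (not stated in the Letter): out[j] = data[(j + r) % N].
    """
    n_items = len(data)
    if not 0 <= r < n_items:
        raise ValueError("r must satisfy 0 <= r < N")
    # (N - 1).bit_length() equals ceil(log2 N) for N >= 1.
    n_stages = (n_items - 1).bit_length()
    out = list(data)
    for i in range(n_stages):
        # Stage i is driven directly by bit r_i; no configuration LUT is needed.
        if (r >> i) & 1:
            step = 1 << i
            out = [out[(j + step) % n_items] for j in range(n_items)]
        # Otherwise the stage is transparent to its input.
    return out


if __name__ == "__main__":
    # 5 x 5 example (the size of Fig. 1): rotation by r = 3 = <0, 1, 1>.
    print(barrel_shift([0, 1, 2, 3, 4], 3))
```

Because the stages only test the bits of r, the same model works for any N, including sizes that are not a power of 2, which is the property the Letter relies on.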
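
The adaptation-network rule (1) of the MS-CS section can also be checked with a small behavioural model. The sketch below is written under stated assumptions rather than taken from the Letter: input element z·B + b is assumed to feed input b of barrel shifter z, rotation is assumed to mean that output j takes input j + r (modulo N_k), and the names `ms_cs_rotate` and `n_max` are hypothetical. The per-unit barrel shifters are modelled by plain index arithmetic, and the term floor((b + r)/B) is computed directly instead of being read from a LUT.

```python
def ms_cs_rotate(data, r, B, n_max):
    """Behavioural model of an MS-CS network: Z barrel shifters of size B plus the AN.

    Assumptions (not from the Letter): element z*B + b feeds input b of
    shifter z, and the result satisfies out[j] = data[(j + r) % N_k].
    """
    n_k = len(data)
    if n_max % B or n_k % B or not 0 < n_k <= n_max:
        raise ValueError("block sizes must be multiples of B, with N_k <= N_max")
    if not 0 <= r < n_k:
        raise ValueError("r must satisfy 0 <= r < N_k")
    Z, Z_k = n_max // B, n_k // B

    # Unused shifters (z >= Z_k) receive idle inputs, modelled as None.
    padded = list(data) + [None] * (n_max - n_k)

    # Every B x B barrel shifter rotates its sub-block by the same amount |r|_B.
    s = r % B
    x = [[padded[z * B + (b + s) % B] for b in range(B)] for z in range(Z)]

    # Adaptation network, rule (1): pick the source unit z' for each output.
    y = [[None] * B for _ in range(Z)]
    for z in range(Z):
        for b in range(B):
            if z < Z_k:
                z_src = (z + (b + r) // B) % Z_k
                y[z][b] = x[z_src][b]
            else:
                y[z][b] = x[z][b]  # unused outputs are not shuffled

    flat = [y[z][b] for z in range(Z) for b in range(B)]
    return flat[:n_k]


if __name__ == "__main__":
    # Configuration of Fig. 2 (B = 5, Nmax = 20) with a block of N_k = 15 data.
    print(ms_cs_rotate(list(range(15)), 7, B=5, n_max=20))
```

Under these assumptions the model reproduces a plain rotation of the first N_k data for every admissible block-size, for example the 19 sizes from 24 to 96 in steps of 4 that the Letter quotes for IEEE 802.16e with B = 4 and Nmax = 96.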