Left To Right Serial Multiplier
Abstract- A new high-precision serial multiplier operating in Most Significant Digit First (MSDF) mode is presented. It uses a Borrow-Save (BS) adder to reduce the long partial products required by the multiplication of large numbers. The results are converted from BS form to 2's complement representation by on-the-fly conversion, which converts each result digit as soon as it is obtained. It is shown that the comparison between the residual and the constants (-3/2, -1/2, 1/2 and 3/2) needed in radix-2 on-line multiplication presents a problem in high-precision computation. In the proposed method, however, the operands are introduced digit by digit in MSDF mode and the results are obtained in the same manner with a fixed time delay, independently of the operand size. This approach is therefore well suited to long multiplication. The method has been tested by executing a program developed with Maple 9.5 on several test vectors. The implementation results of this multiplier for several operand sizes (128, 256, 512 and 1024 bits) on a Virtex-II FPGA circuit confirm that the multiplication is performed in constant time.

Keywords- Multiplication, On-line arithmetic, High precision, Architecture, VHDL, FPGA, Virtex-II.

I. INTRODUCTION

In the last two decades the focus of digital design has been primarily on computation performance, while precision has remained almost unchanged and limited to the single and double precision of the IEEE-754 standard [1]. Yet several scientific and technical computations are numerically intensive and require arithmetic operators with very high precision. A good example is found in cryptography [2], where, with the constant growth of data communications, security becomes an increasingly important characteristic. Data encryption and decryption need arithmetic operators with very high precision, about 200 to 2000 bits. Several other fields can also be cited, such as electronic signatures, medical imagery, the generation of random numbers, the reduction of fractions in infinite precision, and more critical applications such as nuclear station simulators and military air defense batteries. Therefore, high-precision calculation has been the object of several works [3, 4, 5, 6], which clearly reveal the suitability of on-line arithmetic for this kind of calculation. In on-line arithmetic, the operands as well as the results flow serially through the computation, digit by digit, starting from the most significant digit (MSDF). The advantage of on-line arithmetic is that it permits the computation of all operations in MSDF mode, which reduces the interconnection bandwidth between modules and allows parallelism between several operations. Moreover, manipulating large numbers in parallel always presents a hardware problem, because parallelism requires large processing circuits with a number of input-output pins that grows with the size of the operands, which increases circuit complexity.

Among these works, A. Guyot, Y. Herreros and J.-M. Muller presented in [4] the VLSI implementation of an on-line multiplier/divider for large numbers. Subsequently, Y. Hornik continued in the same context and implemented several on-line operators for high precision [5]. The drawback of these operators is that the size of their architectures is proportional to the operand size, which requires a very large amount of hardware. We can also cite the work of M. D. Ercegovac and A. F. Tenca in [6], where they elaborate a long-precision divider architecture using the on-line mode for high radix.

However, the use of binary on-line arithmetic in high precision generally presents a problem in the selection of the result digit. This digit is generated after a comparison between the residual, denoted H[j], which is represented in redundant notation, and constants represented in 2's complement. This comparison is possible only after the conversion of the whole residual to 2's complement, which requires a carry-propagate adder whose width depends on the operand size. Consequently, the delay for generating the result digit is proportional to the delay of this adder. Several works detailed in [7, 8, 9, 10] present solutions based on the overlapping of the selection intervals, so that the comparison is carried out only on a fixed part of the residual. This part is obtained by estimating the residual by a value that contains only its most significant digits.
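The selection problem described above can be illustrated with a small sketch. It is not taken from the cited works: it assumes a residual stored as a borrow-save pair (a positive part and a negative part, both hypothetical fixed-point integers) and the non-overlapping comparison constants -3/2, -1/2, 1/2 and 3/2 quoted in the abstract. The names `bs_value`, `select_digit` and `truncated_estimate` are illustrative only. The exact comparison needs the full-width subtraction of the two parts, whereas an estimate built from a few leading bits can fall on the other side of a constant.

```python
from fractions import Fraction

def bs_value(pos, neg, frac_bits):
    """Exact value of a borrow-save pair: (pos - neg) / 2^frac_bits.
    The subtraction spans the full operand width (carry propagation)."""
    return Fraction(pos - neg, 1 << frac_bits)

def select_digit(h):
    """Radix-2 digit selection with the constants -3/2, -1/2, 1/2, 3/2."""
    if not (Fraction(-3, 2) <= h < Fraction(3, 2)):
        raise ValueError("residual outside the selection range")
    if h < Fraction(-1, 2):
        return -1
    if h < Fraction(1, 2):
        return 0
    return 1

def truncated_estimate(pos, neg, frac_bits, kept_bits):
    """Estimate that keeps only `kept_bits` fractional bits of each part
    (an assumed truncation scheme, chosen for illustration)."""
    shift = frac_bits - kept_bits
    return Fraction((pos >> shift) - (neg >> shift), 1 << kept_bits)

FRAC_BITS = 16
pos = 1 << (FRAC_BITS - 1)  # positive part: 0.1000...0 = 1/2
neg = 1                     # negative part: 0.0000...1 = 2^-16

exact = bs_value(pos, neg, FRAC_BITS)                   # slightly below 1/2
estimate = truncated_estimate(pos, neg, FRAC_BITS, 4)   # exactly 1/2

exact_digit = select_digit(exact)
estimated_digit = select_digit(estimate)
```

In this example the exact residual lies just below 1/2 while the truncated estimate equals 1/2, so the two selections disagree. This is only a toy case under the stated assumptions; it does not model the overlapping-interval schemes of [7, 8, 9, 10].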
$$
Z_j = \begin{cases}
-1 & \text{if } -3/2 \le H[j] < -1/2 \\
0 & \text{if } -1/2 \le H[j] < 1/2 \\
1 & \text{if } 1/2 \le H[j] < 3/2
\end{cases} \qquad (7)
$$

E. On-line Multiplier architecture

The multiplier architecture corresponding to the algorithm described in the previous section is shown in figure 1. It is composed of:

- Two digit-by-vector multipliers, used for the calculation of the partial products $X[j] \times Y_{j+3}$ and $Y[j-1] \times X_{j+3}$.
- A reduction tree using CSA (carry save adder) adders, which reduces the following expression to two terms in CS notation:

$$
H[j] = 2 \times (H[j-1] - Z_j) + (X[j] \times Y_{j+3} + Y[j-1] \times X_{j+3})
$$

- A conversion block, which converts the whole residual from the redundant representation to 2's complement using a CLA adder.
- A block used for the calculation of the selection function $S(H[j])$.

[...] which the H[j] has been truncated. This error on H[j] can, in some cases, easily lead to the selection of a false result digit.

So, long-precision computation requires a new on-line multiplication in which the result digits are generated without the comparison stage and in which the carry propagation problem does not occur.

III. THE PROPOSED METHOD

The multiplication method that we develop in this section is a serial method in MSDF mode. The result is obtained in the same manner, in constant time, independently of the operand size. The principle of the method is based on classic multiplication, which is done in three steps: the generation of the partial products, the reduction of the partial products and the final addition. Our approach follows the same principle, except that it runs these three steps serially, most significant digit first. In the proposed method, the operands are represented in 2's complement and introduced bit by bit, most significant bit first.

In order to illustrate this method, figure 2 shows an example of the multiplication of two operands X and Y, each represented with 4 bits as follows:

$X = 0.X_1X_2X_3X_4$, $Y = 0.Y_1Y_2Y_3Y_4$, such that $X < 1$ and $Y < 1$.

We have to start the process with the initialization step, which consists in the addition of the four most significant bits
(two bits from each operand). This step may be done in parallel or in serial, and it determines the delay of the method, since the first result digit cannot be obtained until the end of this step. During the first step, the first digit $W_1$ is obtained, the input bits $X_3$ and $Y_3$ are introduced, and their corresponding partial products are added to the residual R[1]. Then the second result digit $W_2$ is obtained; it is equal to the most significant digit of the partial residual W[2]. In the same manner, all the result digits are computed until the last two bits of the operands X and Y appear. The final residual W[3] is taken as the final part of the multiplication result. It is important to note that all the result digits are represented in redundant form, so converting them to 2's complement is required. By using on-the-fly conversion [23], the result digits are converted as soon as they are obtained.

[Figure 2. Multiplication example for two 4-bit operands: initialisation, stage 1 and stage 2, showing the partial products, the residuals R[1] and R[2], the partial residuals W[1], W[2] and W[3], and the result digits Z1 to Z8.]

$$
\begin{cases}
W[0] = 0 \\
W[1] = X[2] \times Y[2] \\
W[j] = 2 \times R[j] + \mathrm{T2}\,(Y_j \times X[j] + X_j \times Y[j-1])
\end{cases} \qquad (8)
$$

[...] representation. The first block calculates the two partial products $X_j \times Y[j-1]$ and $Y_j \times X[j]$. As the bits $X_j$ and $Y_j$ are equal to 0 or 1, the multiplication by these bits is easily done by a group of AND gates of the same size as the operand. At each iteration, a new bit is added to the operands $Y[j-1]$ and $X[j]$, until the operands X and Y are completely obtained.

The reduction block reduces three numbers, two of which are represented in 2's complement and the third in the Borrow-Save (BS) system. As shown in Section II.B, a number in BS representation is equal to the sum of a positive number and a negative number. The block diagram is illustrated in figure 4. It contains two stages: the first consists of a carry save adder (CSA) and the second of PPM adders. Note that this reduction is done in constant time, independently of the operand size.

At iteration j we obtain the result digit $W_j$, which is represented in the BS system. To obtain the result digit in 2's complement representation, two solutions are possible. The first is to wait until all the result digits have been received and then perform the subtraction. This solution adds a large delay to the execution time, since the final addition requires a carry propagation that depends on the operand size. The second solution is to use on-the-fly conversion [23], which converts the result digits as soon as they are obtained. This architecture is illustrated in figure 5.
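The partial-product step described above rests on a simple identity: if X[j] and Y[j] denote the operands truncated to their first j bits, then X[j] × Y[j] = X[j-1] × Y[j-1] + 2^-j × (X_j × Y[j-1] + Y_j × X[j]). The following sketch checks this arithmetic only. It is an assumption-laden illustration: the function name `msb_first_product` is hypothetical, exact rationals replace the hardware registers, and the borrow-save reduction and the extraction of result digits are not modelled.

```python
from fractions import Fraction

def msb_first_product(x_bits, y_bits):
    """Accumulate X*Y while the bits of X and Y arrive MSB first.

    x_bits, y_bits: fractional bits of X = 0.X1X2...Xn and Y = 0.Y1Y2...Yn.
    """
    if len(x_bits) != len(y_bits):
        raise ValueError("operands must have the same number of bits")
    x_prefix = Fraction(0)  # holds X[j] once bit j has been added
    y_prefix = Fraction(0)  # holds Y[j-1] while bit j is processed
    product = Fraction(0)
    for j, (xj, yj) in enumerate(zip(x_bits, y_bits), start=1):
        weight = Fraction(1, 1 << j)       # 2^-j
        x_prefix += xj * weight            # X[j] = X[j-1] + Xj * 2^-j
        # Two partial products; in hardware each is a row of AND gates.
        partial = xj * y_prefix + yj * x_prefix
        product += weight * partial        # now equals X[j] * Y[j]
        y_prefix += yj * weight            # Y[j] = Y[j-1] + Yj * 2^-j
    return product

# X = 0.1011 (11/16) and Y = 0.1101 (13/16)
result = msb_first_product([1, 0, 1, 1], [1, 1, 0, 1])
```

Because each iteration only adds two operand-wide rows, the work per step does not depend on how many bits are still to come, which is consistent with the serial organisation described in this section.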
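The idea behind on-the-fly conversion [23] can be sketched for radix-2 signed digits as follows. The sketch keeps two registers, Q (the value converted so far) and QM = Q - 1, and each incoming digit only appends one bit to a copy of one of them, so no carry propagates. Python integers stand in for the hardware registers, the function name `on_the_fly_convert` is hypothetical, and the digits are assumed to be already available as -1, 0 or 1 rather than as borrow-save bit pairs.

```python
def on_the_fly_convert(digits):
    """Convert MSD-first radix-2 signed digits (-1, 0, 1) to an integer.

    Invariant after every step: qm == q - 1.
    """
    q, qm = 0, -1
    for d in digits:
        if d == 1:
            q, qm = 2 * q + 1, 2 * q        # append 1 to Q, 0 to Q
        elif d == 0:
            q, qm = 2 * q, 2 * qm + 1       # append 0 to Q, 1 to QM
        elif d == -1:
            q, qm = 2 * qm + 1, 2 * qm      # append 1 to QM, 0 to QM
        else:
            raise ValueError("digit must be -1, 0 or 1")
    return q

def direct_value(digits):
    """Reference value computed by plain weighted summation."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value

converted = on_the_fly_convert([1, -1, 0, 1])   # 8 - 4 + 0 + 1
```

For a fractional result of n digits, the value is the returned integer divided by 2^n. The reference function `direct_value` is included only to check the conversion; it is not part of the technique.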
Figure 5. The on-the-fly conversion architecture.

IV. IMPLEMENTATION RESULTS

The multiplier architecture presented in this paper was designed using the Xilinx Foundation Series 7.1i environment. All the blocks are described in VHDL. Since the method has to be implemented for several operand sizes, the architecture is described in generic mode using the parameter n, which gives the number of bits of the operands. To test our method in high precision and study its implementation on FPGA, we implemented four architectures, for 128, 256, 512 and 1024 bits. These architectures were simulated with ModelSim PE 6.0. In order to validate the simulation results we used the formal tool Maple 9.5 [24]. To implement the multiplier architecture we chose the FPGA circuit XC2V2000-6 (Xilinx Virtex-II). Table I and Table II show the iteration delay and the occupied area for operand sizes of 128, 256, 512 and 1024 bits. These results confirm that the left-to-right multiplication is done in constant time.

Table I. The iteration delay

Size (bits)   Iteration delay   Route delay          Logic delay
128           5.53 ns           3.75 ns (67.8 %)     1.783 ns (32.2 %)
256           6.40 ns           4.62 ns (72.2 %)     1.783 ns (27.8 %)
512           8.47 ns           6.69 ns (79.0 %)     1.783 ns (21.0 %)
1024          11.81 ns          10.02 ns (84.9 %)    1.783 ns (15.1 %)

Table II. The occupied area

Size (bits)   Area (slices)
1024          10750 (99 %)
512           59906 (54 %)
256           2949 (27 %)
128           1471 (13 %)

These results show that the logic delay of an iteration is constant and independent of the operand size. Nevertheless, the route delay differs between these architectures. This is due to the routing complexity, which increases with the operand size.

V. CONCLUSION

In this paper, we illustrated the problem of on-line multiplication, which lies in the selection of the result digit. Moreover, we showed that this selection probably cannot be done without a carry-propagate adder. We then presented a new multiplication method, which is used for the multiplication of non-redundant numbers. At each step, the proposed method first generates the partial products, which are computed after the inputs of the step have been received. Secondly, these partial products are reduced using a constant-time adder, so the result digits are generated in constant time. We conclude that our computation method is suitable for high-precision multiplication. The implementation of our multiplier architecture shows that, when the operand size exceeds the number of flip-flops available in one line of the chosen circuit, the on-the-fly conversion (used to convert the result from BS form to 2's complement) increases the routing delay, so the execution time is lengthened. Consequently, the main problem of this method is the routing complexity, which increases the route delay.

REFERENCES

[1] V. Lefevre and P. Zimmermann, "Arithmetique flottante," Research Report, INRIA, no. 5105, January 2004.
[2] B. Chevallier-Mames, "Cryptographie a cle publique : Constructions et preuves de securite," PhD Thesis, Paris VII University, November 2006.
[3] A. F. Tenca and M. D. Ercegovac, "A High-Radix multiplier design for variable long-precision computations," Proc. 31st Asilomar Conference on Signals, Systems and Computers, pp. 1173-1177, 1997.
[4] A. Guyot, Y. Herreros, J. M. Muller, "JANUS, An on-line multiplier/divider for manipulating large numbers," Computer Arithmetic, Proceedings of 9th Symposium, pp. 106-111, September 1989.
[5] Y. K. Hornik, "Operateurs Arithmetiques Standards en ligne a tres grande precision, conception et implementation," PhD Thesis, Grenoble, 93.
[6] M. D. Ercegovac, A. F. Tenca, "On the design of high-radix on-line division for long precision," Computer Arithmetic, Proceedings of the 14th IEEE Symposium, pp. 44-51, April 1999.
[7] M. D. Ercegovac and A. L. Grnarov, "On the Performance of On-Line Arithmetic," Proceedings of the International Conference on Parallel Processing, pp. 55-62, August 1980.
[8] M. D. Ercegovac, "On-Line Arithmetic: An Overview," Proceedings of the SPIE, Real Time Signal Processing VII, volume 495, pp. 86-93, 1984.
[9] P. K.-G. Tu, "On-line Arithmetic Algorithms for Efficient Implementation," PhD Thesis, University of California, Los Angeles, 1990.
[10] J. L. Beuchat, "Etude et conception d'operateurs arithmetiques optimises pour circuits programmables," PhD Thesis, Ecole Polytechnique Federale de Lausanne EPFL, 2001.
[11] J. L. Beuchat, J. M. Muller, "Automatic Generation of Modular Multipliers for FPGA Applications," LIP Research Report no. 2007-1, 2007.
[12] S. Rajagopal and J. R. Cavallaro, "Truncated online arithmetic with applications to communication systems," IEEE Transactions on Computers, vol. 55, no. 10, August 2006.
[13] Z. Huang and M. D. Ercegovac, "High-Performance Low-Power Left-to-Right Array Multiplier Design," IEEE Trans. Computers, vol. 54(3), pp. 272-283, 2005.
[14] A. Avizienis, "Signed-Digit Number Representations for Fast Parallel Arithmetic," IEEE Transactions on Electronic Computers, vol. 10, 1961.
[15] J. E. Robertson, "A new class of digital division methods," IRE Transactions on Electronic Computers, vol. 7, pp. 218-222, September 58.
[16] C. Y. Chow, J. E. Robertson, "Logical design of a redundant binary adder," IEEE International on Computer Arithmetic, pp. 109-115, October 1978.
[17] K. S. Trivedi, M. D. Ercegovac, "On Line Algorithm For Division And Multiplication," IEEE Transactions on Computers, vol. C-26, no. 7, pp. 681-687, July 77.
[18] M. D. Ercegovac and P. K.-G. Tu, "A radix-4 on-line algorithm," 8th Symposium on Computer Arithmetic, Como, Italy, May 1987.
[19] M. J. Irwin, "An arithmetic unit for on line computation," PhD Thesis, Tech. Report UIUCDCS-R-77-873, Dept. of Computer Science, University of Illinois, Champaign-Urbana, IL 61801, May 1977.
[20] M. J. Irwin, "An arithmetic unit for on line computation," PhD Thesis, Tech. Report UIUCDCS-R-77-873, Dept. of Computer Science, University of Illinois, Champaign-Urbana, IL 61801, May 1977.
[21] M. D. Ercegovac and T. Lang, "On-Line Arithmetic: A Design Methodology and Applications in Digital Signal Processing," IEEE Acoustics, Speech, and Signal Processing Society Workshop on VLSI Signal Processing, pp. 252-263, November 1988.
[22] M. D. Ercegovac and T. Lang, "On-Line Scheme for Computing Rotation Factors," Journal of Parallel and Distributed Computing, vol. 5(3), pp. 209-227, 1988.
[23] M. D. Ercegovac, T. Lang, "On the fly conversion of redundant into conventional representation," IEEE Transactions on Computers, vol. C-36, no. 7, pp. 895-897, July 87.
[24] www.maplesoft.com.
[25] Xilinx Inc, Virtex-II Platform User Guide: Chp 3, PAR (Design Considerations), 2004.