A05410105

IOSR Journal of Engineering (IOSRJEN) www.iosrjen.org
ISSN (e): 2250-3021, ISSN (p): 2278-8719
Vol. 05, Issue 04 (April. 2015), ||V1|| PP 01-05
International Organization of Scientific Research
Design And Implementation Of Modified Booth Recoder Using
Fused Add Multiply Operator
S. Vijaya, Mr. K. Santhakumar
M.E. (VLSI Design), PG Scholar, Nandha Engineering College, Perundurai, Erode
Associate Professor, Department of ECE, Nandha Engineering College, Perundurai, Erode.
Abstract: - Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications.
This paper presents the design and implementation of an efficient modified Booth multiplier. Low-cost finite
impulse response (FIR) designs are presented using the concept of faithfully rounded truncated multipliers. This
work focuses on optimizing the design of the fused Add-Multiply (FAM) operator for increased
performance.
We introduce a structured and efficient recoding technique and explore three different schemes that
incorporate it in FAM designs. Compared with FAM designs that use existing recoding
schemes, the proposed technique yields considerable reductions in critical delay, hardware complexity
and power consumption of the FAM unit.
Keywords: - Add-Multiply operation, Modified Booth recoding, FIR filter design.
I. INTRODUCTION
Modern consumer electronics make extensive use of Digital Signal Processing (DSP), providing custom
accelerators for domains such as multimedia and communications. Recent research in the field of
arithmetic optimization [1], [2] has shown that designing arithmetic components that combine operations
which share data can lead to significant performance improvements. Based on the observation that an addition
often follows a multiplication (e.g., in symmetric FIR filters), the Multiply-Accumulator (MAC) and
Multiply-Add (MAD) units were introduced [3], leading to more efficient implementations of DSP algorithms
than conventional ones, which use only primitive resources [4]. In [12], the author proposes a two-stage
recoder which converts a number in carry-save form to its Modified Booth (MB) representation.
Although the direct recoding of the sum of two numbers into its MB form leads to a more efficient
implementation of the fused Add-Multiply (FAM) unit than the conventional one, existing recoding
schemes are based on complex bit-level manipulations, which are implemented by dedicated gate-level
circuits. In this work, we propose a new recoding technique which decreases the critical path delay and reduces
area and power consumption. The proposed S-MB (Sum to Modified Booth) algorithm is structured, simple and can easily be modified
to apply to either signed (2's complement) or unsigned numbers comprising an
odd or even number of bits. We explore three alternative schemes of the proposed S-MB approach using
conventional and signed-bit Full Adders (FAs) and Half Adders (HAs) as building blocks. The proposed
recoding technique delivers optimized solutions for the FAM design, enabling the targeted operator to be timing
functional (no timing violations) over a larger range of frequencies. Also, under the same timing constraints, the
proposed designs deliver improvements in both area occupation and power consumption, thus outperforming the
existing S-MB recoding solutions.
An important design issue of FIR filter implementation is the optimization of the bit widths of the filter
coefficients, which has a direct impact on the area cost of arithmetic units and registers. Moreover, since
bit widths grow after multiplication, many DSP applications do not need full-precision outputs. Instead, it is
desirable to generate faithfully rounded outputs, where the total error introduced by quantization and rounding is
no more than one unit in the last place (ulp), defined as the weight of the least significant bit (LSB) of the
outputs. In this brief, we present low-cost implementations of FIR filters based on the direct structure in Fig.
1(a) with faithfully rounded truncated multipliers. The multiple constant multiplication/accumulation (MCMA) module is realized by accumulating all the
partial products (PPs), where unnecessary PP bits (PPBs) are removed without affecting the final precision of the
outputs. The bit widths of all the filter coefficients are minimized using nonuniform quantization with unequal
word lengths in order to reduce the hardware cost while still satisfying the frequency-response
specification.
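As a minimal sketch of the faithful-rounding criterion, assuming integer fixed-point values in which the output keeps all but the `drop` least significant bits of the full-precision result, the following Python checks whether a reduced-precision output stays within one ulp. The function names are hypothetical and plain truncation is used as the example rounding; this is not the authors' error analysis.

```python
def ulp(drop: int) -> int:
    """Weight of the output LSB, in units of the full-precision LSB."""
    return 1 << drop


def truncate(y_full: int, drop: int) -> int:
    """Discard the `drop` LSBs (arithmetic shift, i.e. rounding toward minus infinity)."""
    return y_full >> drop


def is_faithful(y_full: int, y_out: int, drop: int) -> bool:
    """True if y_out (counted in output LSBs) is at most one ulp away from y_full."""
    error = abs(y_out * ulp(drop) - y_full)
    return error <= ulp(drop)


# Truncating any 9-bit signed value by 3 bits keeps the error within one ulp.
assert all(is_faithful(y, truncate(y, 3), 3) for y in range(-256, 256))
```

For example, 183 = 0b10110111 truncated by 4 bits gives 11, i.e. 176 in full-precision units: an error of 7 against an ulp of 16, so the output is faithful.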

II. MOTIVATION AND FUSED AM IMPLEMENTATION
A. Motivation
In this paper, we focus on AM units which implement the operation Z = X·(A + B). The conventional design
of the AM operator (Fig. 1(a)) first drives its inputs A and B to an adder and then drives the input X
and the sum Y = A + B to a multiplier in order to get Z. The drawback of using an adder is that it inserts
a significant delay in the critical path of the AM. Because carry signals must propagate inside the adder, the
critical path depends on the bit-width of the inputs. A Carry-Look-Ahead (CLA) adder can be used to decrease
this delay, but it increases area occupation and power dissipation. Recoding the sum A + B directly into its MB
representation instead removes this adder from the datapath. As a result, significant area savings are observed,
and the critical path delay of the recoding process is reduced and decoupled from the bit-width of its inputs.
In this work, we present a new technique for the direct recoding of two numbers into the MB
representation of their sum.
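To illustrate why the adder's delay grows with the bit-width, the following Python sketch models the conventional AM datapath of Fig. 1(a) as a ripple-carry adder feeding a multiplier and reports the longest carry chain. The ripple-carry adder, the unsigned inputs, and the function names are assumptions for illustration; Python's `*` simply stands in for the MB multiplier.

```python
def ripple_add(a: int, b: int, n: int) -> tuple[int, int]:
    """Add unsigned n-bit a and b bit by bit.

    Returns (sum, longest_chain): longest_chain is the largest number of
    consecutive positions one carry passes through (generated, then propagated),
    a rough proxy for the adder's critical path.
    """
    carry = 0
    total = 0
    chain = 0
    longest = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        if ai & bi:                  # carry generated here: a new chain starts
            chain = 1
        elif carry and (ai ^ bi):    # incoming carry propagates: the chain grows
            chain += 1
        else:                        # carry killed or absent
            chain = 0
        carry = (ai & bi) | (carry & (ai ^ bi))
        longest = max(longest, chain)
    total |= carry << n
    return total, longest


def conventional_am(x: int, a: int, b: int, n: int) -> int:
    """Conventional AM: Y = A + B through the adder, then Z = X * Y."""
    y, _ = ripple_add(a, b, n)
    return x * y
```

Adding 1 to an all-ones operand makes the carry ripple through every position, so the chain length equals n. That growth with the bit-width is the delay the direct sum-to-MB recoding is meant to avoid.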
B. Review of the Modified Booth Form
Modified Booth (MB) is a prevalent form used in multiplication [15], [20], [24]. It is a redundant
signed-digit radix-4 encoding technique. Its main advantage is that it halves the number of partial
products in multiplication compared to any radix-2 representation.
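The radix-4 MB recoding can be sketched in Python as follows, assuming an n-bit two's-complement operand with n even and an implicit bit y_{-1} = 0. Each digit is y_j = -2·y_{2j+1} + y_{2j} + y_{2j-1}, taken from {-2, -1, 0, 1, 2}, so an n-bit operand yields n/2 digits and therefore n/2 partial products. The function names are hypothetical.

```python
def mb_recode(y: int, n: int) -> list[int]:
    """Radix-4 Modified Booth digits of the n-bit two's-complement value y, LSD first."""
    assert n % 2 == 0, "n is assumed to be even"
    bits = y & ((1 << n) - 1)  # two's-complement bit pattern of y

    def bit(k: int) -> int:
        # y_{-1} is the implicit 0 below the LSB.
        return 0 if k < 0 else (bits >> k) & 1

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(n // 2)]


def mb_value(digits: list[int]) -> int:
    """Reconstruct the integer value: sum of d_j * 4**j."""
    return sum(d * 4 ** j for j, d in enumerate(digits))
```

For instance, 7 = 0111 recodes to [-1, 2] (least significant digit first), since -1 + 2·4 = 7: two partial products instead of four.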
Fig. 1. FAM operator based on (a) the conventional design and (b) an implementation with a truncated multiplier.
The multiplier is a basic parallel multiplier based on the MB algorithm. The terms CT, CSA Tree and CLA
Adder refer to the Correction Term, the Carry-Save Adder Tree and the final Carry-Look-Ahead Adder
of the multiplier.
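The multiplier structure described above (MB partial products, CSA tree, final adder) can be modeled at the integer level as below. This is a behavioral Python sketch, not a gate-level design: partial products are plain integers, the CSA tree is a loop of 3:2 carry-save reductions, Python's `+` stands in for the final CLA, and the correction term is implicit because negative partial products are represented exactly. Function names are hypothetical.

```python
def mb_recode(y: int, n: int) -> list[int]:
    """Radix-4 MB digits of the n-bit two's-complement y (n even), LSD first."""
    bits = y & ((1 << n) - 1)

    def bit(k: int) -> int:
        return 0 if k < 0 else (bits >> k) & 1

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(n // 2)]


def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """One 3:2 carry-save step computed bitwise: a + b + c == sum_row + carry_row."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def mb_multiply(x: int, y: int, n: int) -> int:
    """Multiply x by the n-bit two's-complement y using MB partial products."""
    # One partial product per MB digit, weighted by 4**j (a shift by 2j).
    pps = [(d * x) << (2 * j) for j, d in enumerate(mb_recode(y, n))]
    # CSA tree: replace three rows by two until only two rows remain.
    while len(pps) > 2:
        s, cy = csa(pps[0], pps[1], pps[2])
        pps = pps[3:] + [s, cy]
    # Final carry-propagate addition (the CLA in hardware).
    return sum(pps)
```

Each CSA step keeps the total unchanged while removing one row, so carry propagation happens only once, in the final addition.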
Each MB digit is formed from three bits (for an operand Y, y_j = −2y_{2j+1} + y_{2j} + y_{2j−1}). The most
significant of them is negatively weighted, while the two least significant have positive weight. Consequently,
in order to transform the two aforementioned pairs of bits into MB form, we need to use signed-bit arithmetic.
For this purpose, we develop a set of bit-level signed Half Adders (HAs) and Full Adders (FAs) whose inputs
and outputs are considered signed.
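The signed HAs and FAs are not detailed here, so the following Python sketch is only one plausible construction, with hypothetical names and output polarities chosen for illustration: a negatively weighted bit n satisfies -n = (1 - n) - 1, so a conventional FA fed with the inverted bit, followed by an inverted sum output, absorbs the constant offset. The asserts at the end check each cell's arithmetic identity exhaustively.

```python
from itertools import product


def fa(a: int, b: int, c: int) -> tuple[int, int]:
    """Conventional full adder: a + b + c == 2*carry + sum."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def fa_ppn(p1: int, p2: int, n: int) -> tuple[int, int]:
    """Signed FA, inputs (+, +, -): p1 + p2 - n == 2*c - s (c positive, s negative)."""
    c, s = fa(p1, p2, 1 - n)
    return c, 1 - s


def fa_pnn(p: int, n1: int, n2: int) -> tuple[int, int]:
    """Signed FA, inputs (+, -, -): p - n1 - n2 == -2*c + s (c negative, s positive)."""
    c, s = fa(1 - p, n1, n2)
    return c, 1 - s


def ha_pn(p: int, n: int) -> tuple[int, int]:
    """Signed HA, inputs (+, -): p - n == 2*c - s (c positive, s negative)."""
    return p & (1 - n), p ^ n


bits = (0, 1)
assert all(p1 + p2 - n == 2 * fa_ppn(p1, p2, n)[0] - fa_ppn(p1, p2, n)[1]
           for p1, p2, n in product(bits, repeat=3))
assert all(p - n1 - n2 == -2 * fa_pnn(p, n1, n2)[0] + fa_pnn(p, n1, n2)[1]
           for p, n1, n2 in product(bits, repeat=3))
assert all(p - n == 2 * ha_pn(p, n)[0] - ha_pn(p, n)[1] for p, n in product(bits, repeat=2))
```

The design choice is reuse: each signed cell costs one conventional FA or HA plus inverters, which is why the signed variants can serve as drop-in building blocks.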
III. FIR FILTER IMPLEMENTATION
The finite impulse response (FIR) digital filter is one of the fundamental components in many digital
signal processing (DSP) and communication systems. It is also widely used in many portable applications with
limited area and power budgets.
A general FIR filter of order M can be expressed as
$$y[n] = \sum_{i=0}^{M-1} a_i \, x[n-i].$$
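The convolution sum above maps directly to code. The Python sketch below evaluates it on integer samples, assuming x[k] = 0 for k < 0 (zero initial state); it is a behavioral reference with a hypothetical function name, not a model of the MCMA hardware.

```python
def fir_direct(a: list[int], x: list[int]) -> list[int]:
    """Direct-form FIR: y[n] = sum_{i=0}^{M-1} a[i] * x[n - i], with x[k] = 0 for k < 0."""
    y = []
    for n in range(len(x)):
        # Tapped delay line: multiply each delayed sample x[n - i] by a[i] and accumulate.
        y.append(sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0))
    return y
```

Feeding a unit impulse returns the coefficients themselves, e.g. `fir_direct([1, 2, 3], [1, 0, 0, 0])` gives `[1, 2, 3, 0]`.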
There are two basic FIR structures, direct form and transposed form, as shown in Fig. 1 for a linear-phase
even-order FIR filter. In the direct form in Fig. 1(a), the multiple constant multiplication
(MCM)/accumulation (MCMA) module performs the concurrent multiplications of individual delayed signals
and the respective filter coefficients, followed by accumulation of all the products. Thus, the operands of the
multipliers in the MCMA are the delayed input signals x[n − i] and the coefficients a_i.
In the transposed form in Fig. 1(b), the operands of the multipliers in the MCM module are the current
input signal x[n] and the coefficients a_i. The results of the individual constant multiplications go through structure adders
(SAs) and delay elements. Over the past decades, many papers have addressed the design and implementation of
low-cost or high-speed FIR filters [1]–[13], [15]–[19]. To avoid costly multipliers, most prior hardware
implementations of digital FIR filters fall into two categories: multiplierless-based and memory-based.
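A behavioral Python sketch of the transposed form, assuming integer samples and zero initial state: each new input x[n] is multiplied by every coefficient (the MCM block), and the products are added to values held in delay elements (the SAs). A direct-form reference is included only to check that both structures produce the same output; function names are hypothetical and neither function models hardware cost.

```python
def fir_direct(a: list[int], x: list[int]) -> list[int]:
    """Reference: y[n] = sum_i a[i] * x[n - i], zero initial state."""
    return [sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0) for n in range(len(x))]


def fir_transposed(a: list[int], x: list[int]) -> list[int]:
    """Transposed-form FIR with M - 1 delay elements, zero initial state."""
    m = len(a)
    regs = [0] * (m - 1)  # delay elements between the structure adders
    y = []
    for xn in x:
        products = [ai * xn for ai in a]  # MCM: current input times every coefficient
        y.append(products[0] + (regs[0] if regs else 0))
        # Register k receives a[k+1]*x[n] plus register k+1's previous contents;
        # updating in ascending k means regs[k + 1] still holds its old value here.
        for k in range(m - 1):
            nxt = regs[k + 1] if k + 1 < m - 1 else 0
            regs[k] = products[k + 1] + nxt
    return y
```

Both functions compute the same sum; the transposed form simply moves the delays after the multipliers, so every multiplier sees the current input x[n].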

Fig. 2. Stages of digital FIR filter design and implementation.
IV. COEFFICIENT QUANTIZATION AND OPTIMIZATION
A generic flow of FIR filter design and implementation can be divided into three stages: finding the filter
order and coefficients, coefficient quantization, and hardware optimization, as shown in Fig. 2. In the first stage,
the filter order and the corresponding coefficients of infinite precision are determined to satisfy the specification
of the frequency response. Then, the coefficients are quantized to finite bit accuracy. Finally, various
optimization approaches such as common subexpression elimination (CSE) are used to minimize the area cost of
hardware implementations. Most prior FIR filter implementations focus on the hardware optimization stage.
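The quantization stage can be illustrated with uniform rounding to a fixed number of fractional bits, i.e. the equal-word-length baseline that nonuniform quantization is meant to improve on. This Python sketch works under that assumption; the function names and coefficient values are hypothetical, and a real flow would re-check the frequency response after quantization, which is omitted here.

```python
def quantize(coeffs: list[float], frac_bits: int) -> list[int]:
    """Uniformly quantize each coefficient to an integer code with `frac_bits` fractional bits.

    Python's round() resolves exact ties to the nearest even code.
    """
    scale = 1 << frac_bits
    return [round(c * scale) for c in coeffs]


def max_quant_error(coeffs: list[float], frac_bits: int) -> float:
    """Largest absolute difference between a coefficient and its quantized value."""
    scale = 1 << frac_bits
    codes = quantize(coeffs, frac_bits)
    return max(abs(c - q / scale) for c, q in zip(coeffs, codes))


# Hypothetical symmetric coefficients; the error never exceeds half an LSB, 2**-(frac_bits + 1).
h = [0.0234, -0.0612, 0.3071, 0.5, 0.3071, -0.0612, 0.0234]
for f in (6, 8, 10):
    assert max_quant_error(h, f) <= 2.0 ** -(f + 1)
```

Uniform quantization gives every coefficient the same word length; the nonuniform scheme described in the text instead lets each coefficient use only as many bits as the frequency-response specification requires.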
After FIR filtering, the output signals have a larger bit width due to bit-width expansion after
multiplication. In many practical situations, only some of the bits of the full-precision outputs are needed. For
example, assuming that the input signals of the FIR filter have 12 bits and the filter coefficients are quantized to
10 bits, the bit width of the resulting FIR filter output signals is at least 22 bits, but we might need only the 12
most significant bits for subsequent processing.
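The bit-width arithmetic in this example can be made explicit. Assuming signed two's-complement operands, the 12-bit by 10-bit product fits in 22 bits, and accumulating M such products adds at most ceil(log2 M) guard bits; keeping the 12 most significant bits is then an arithmetic right shift. The helper names in this Python sketch and the tap counts in the checks are hypothetical.

```python
def product_width(wx: int, wa: int) -> int:
    """Bits needed for the product of wx-bit and wa-bit signed operands."""
    return wx + wa


def output_width(wx: int, wa: int, taps: int) -> int:
    """Upper bound on the width of a sum of `taps` such products."""
    guard = (taps - 1).bit_length()  # equals ceil(log2(taps)) for taps >= 1
    return product_width(wx, wa) + guard


def keep_msbs(y: int, width: int, keep: int) -> int:
    """Keep the `keep` MSBs of a `width`-bit signed value by truncating the LSBs."""
    return y >> (width - keep)


# Worst case for 12-bit x and 10-bit a: (-2**11) * (-2**9) = 2**20, which needs 22 signed bits.
worst = (-(1 << 11)) * (-(1 << 9))
assert worst == 1 << 20 and worst <= (1 << (product_width(12, 10) - 1)) - 1
```

For a hypothetical 16-tap filter, `output_width(12, 10, 16)` gives 26 bits, of which `keep_msbs(y, 26, 12)` retains 12; the discarded LSBs are exactly the bits a faithfully rounded design avoids computing in full.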
Our proposed FIR filter design has four versions. MCMA is the baseline implementation using
combined PP compression [similar to Fig. 4(b)] with uniformly quantized coefficients. MCMA_opt is an
improved version that adopts the nonuniform quantization in Fig. 3 for coefficient optimization. MCMAT_I
and MCMAT_II faithfully truncate PPBs using the approaches in Fig. 6(a) and (b), respectively.
Although the area costs of the proposed designs are significantly reduced, the critical path delay is
increased because all the operations in the MCMA are executed within one clock cycle. It is possible to reduce
the delay by adding pipeline registers in the PP compression as suggested in [17], where the major goal is to
minimize the number of FAs, HAs, and registers (including algorithmic registers and pipelined registers) using
integer linear programming. In this brief, we focus on low-cost FIR filter designs with moderate speed
performance for mobile applications where area and power are important design considerations.
V. EXPERIMENTAL RESULTS AND COMPARISONS
In multiplierless designs with the transposed structure, CSE can effectively reduce the number of adders in the
MCM compared with canonical signed digit (CSD) recoding. Nonrecursive signed CSE (NRSCSE) [1] and the multiroot
binary partition graph (MBPG) [2] belong to the category of CSE methods. Note that the SAs are not optimized. In [11],
the constant multiplication is realized by storing the odd multiples of the constant in a LUT implemented with
dual-port segmented memory sharing memory cells. This approach needs a full-custom design of the LUT circuits.
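CSD recoding, the baseline against which CSE is compared here, can be sketched in Python as follows: each constant is rewritten with digits in {-1, 0, 1} and no two adjacent nonzero digits, and the multiplication becomes a sum of shifted copies of x, costing one adder or subtractor per nonzero digit after the first. The function names and constants are illustrative assumptions; CSE methods such as NRSCSE and MBPG would additionally share subexpressions across coefficients, which this sketch does not attempt.

```python
def csd(c: int) -> list[int]:
    """Canonical signed digit (non-adjacent form) of integer c, LSD first, digits in {-1, 0, 1}."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # c % 4 == 1 -> +1, c % 4 == 3 -> -1
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_multiply(x: int, c: int) -> tuple[int, int]:
    """Multiply x by constant c with shifts and adds/subtracts only.

    Returns (product, adders), where adders counts the add/subtract operations
    after the first nonzero term (a leading negative term is a negation, not counted).
    """
    terms = [(d, k) for k, d in enumerate(csd(c)) if d != 0]
    acc = 0
    for d, k in terms:
        acc = acc + (x << k) if d > 0 else acc - (x << k)
    return acc, max(len(terms) - 1, 0)
```

For 7, the binary form 111 needs two adders while the CSD form 8 - 1 needs one; that saving comes from recoding alone, before any subexpression sharing.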
Fig. 4. Result analysis of FIR filter implementation of FAM techniques.

A. Results Evaluation
Table 1. Comparison of outputs for the existing and proposed systems.
Most prior FIR filter designs are based on the transposed structure because their major goal is to
minimize the cost of the adders in the MCM, which takes less than 20% of the total area. Indeed, the MCM cost in
transposed-form NRSCSE and MBPG (and with further coefficient optimization [18], [19]) can be effectively
reduced.
Fig. 5. (a) Area, (b) delay, and (c) power of the proposed designs for filter C.
VI. CONCLUSION
This brief has presented low-cost FIR filter designs by jointly considering the optimization of
coefficient bit width and hardware resources in implementations. Although most prior designs are based on the
transposed form, we observe that the direct FIR structure with faithfully rounded MCMAT leads to the smallest
area cost and power consumption.
REFERENCES
[1] M. M. Peiro, E. I. Boemo, and L. Wanhammar, "Design of high-speed multiplierless filters using a nonrecursive
signed common subexpression algorithm," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49,
no. 3, pp. 196–203, Mar. 2002.
[2] C.-H. Chang, J. Chen, and A. P. Vinod, "Information theoretic approach to complexity reduction of FIR
filter design," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 8, pp. 2310–2321, Sep. 2008.
[3] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution—A new approach to versatile subexpressions
sharing in multiple constant multiplications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 2, pp.
559–571, Mar. 2008.
[4] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution algorithms for common subexpression
elimination in digital filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 10, pp. 695–700,
Oct. 2005.
[5] I.-C. Park and H.-J. Kang, "Digital filter synthesis based on an algorithm to generate all minimal signed
digit representations," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 12, pp.
1525–1529, Dec. 2002.
[6] C.-Y. Yao, H.-H. Chen, T.-F. Lin, C.-J. J. Chien, and X.-T. Hsu, "A novel common-subexpression-elimination
method for synthesizing fixed-point FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers,
vol. 51, no. 11, pp. 2215–2221, Sep. 2004.
[7] O. Gustafsson, "Lower bounds for constant multiplication problems," IEEE Trans. Circuits Syst. II, Exp.
Briefs, vol. 54, no. 11, pp. 974–978, Nov. 2007.
[8] Y. Voronenko and M. Püschel, "Multiplierless multiple constant multiplication," ACM Trans.
Algorithms, vol. 3, no. 2, pp. 1–38, May 2007.
[9] D. Shi and Y. J. Yu, "Design of linear phase FIR filters with high probability of achieving minimum
number of adders," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 1, pp. 126–136, Jan. 2011.
[10] R. Huang, C.-H. H. Chang, M. Faust, N. Lotze, and Y. Manoli, "Sign-extension avoidance and word-length
optimization by positive-offset representation for FIR filter design," IEEE Trans. Circuits Syst. II,
Exp. Briefs, vol. 58, no. 12, pp. 916–920, Oct. 2011.
[11] P. K. Meher, "New approach to look-up-table design and memory-based realization of FIR digital filter,"
IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592–603, Mar. 2010.
[12] P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible
systolization using distributed arithmetic," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–3017,
Jul. 2008.
[13] S. Hwang, G. Han, S. Kang, and J.-S. Kim, "New distributed arithmetic algorithm for low-power FIR
filter implementation," IEEE Signal Process. Lett., vol. 11, no. 5, pp. 463–466, May 2004.
